---
title: AWS EKS Setup
description: Create and configure an EKS cluster for Smallest Self-Host with GPU support
---


## Overview

This guide walks you through creating an Amazon EKS cluster optimized for running Smallest Self-Host with GPU acceleration.

## Prerequisites

<Steps>
  <Step title="AWS CLI">
    Verify that the AWS CLI is installed, then configure your credentials:

    ```bash
    aws --version
    aws configure
    ```
  </Step>

  <Step title="eksctl">
    Install eksctl (EKS cluster management tool):

    ```bash
    brew install eksctl
    ```

    Verify:

    ```bash
    eksctl version
    ```
  </Step>

  <Step title="kubectl">
    Install kubectl:

    ```bash
    brew install kubectl
    ```
  </Step>

  <Step title="IAM Permissions">
    Ensure your AWS user/role has permissions to:

    * Create EKS clusters
    * Manage EC2 instances
    * Create IAM roles
    * Manage VPC resources
  </Step>
</Steps>
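
Before creating the cluster, it can help to confirm that every tool is on your `PATH`. The following is a minimal sketch; `check_tools` is a hypothetical helper name, and `jq` is included only because later verification steps in this guide use it.

```bash
# Report each required tool that is missing from PATH.
# Returns non-zero if at least one tool is missing.
check_tools() {
  local missing
  local tool
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (tools referenced in this guide):
#   check_tools aws eksctl kubectl jq && aws sts get-caller-identity
```

Chaining `aws sts get-caller-identity` after the check also confirms that the configured credentials are valid.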

## Cluster Configuration

### Option 1: Quick Start with eksctl

Create the cluster with a managed CPU node group:

```bash
eksctl create cluster \
  --name smallest-cluster \
  --region us-east-1 \
  --version 1.28 \
  --nodegroup-name cpu-nodes \
  --node-type t3.large \
  --nodes 2 \
  --nodes-min 1 \
  --nodes-max 3 \
  --managed
```

Then add a GPU node group:

```bash
eksctl create nodegroup \
  --cluster smallest-cluster \
  --region us-east-1 \
  --name gpu-nodes \
  --node-type g5.xlarge \
  --nodes 1 \
  --nodes-min 0 \
  --nodes-max 5 \
  --managed \
  --node-labels "workload=gpu,nvidia.com/gpu=true" \
  --node-taints "nvidia.com/gpu=true:NoSchedule"
```

<Note>
  This creates a cluster with separate CPU and GPU node groups, allowing for cost-effective scaling.
</Note>

### Option 2: Using Cluster Config File

Create a cluster configuration file for more control:

```yaml cluster-config.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: smallest-cluster
  region: us-east-1
  version: "1.28"

iam:
  withOIDC: true

managedNodeGroups:
  - name: cpu-nodes
    instanceType: t3.large
    minSize: 1
    maxSize: 3
    desiredCapacity: 2
    volumeSize: 50
    ssh:
      allow: false
    labels:
      workload: cpu
    tags:
      Environment: production
      Application: smallest-self-host

  - name: gpu-nodes
    instanceType: g5.xlarge
    minSize: 0
    maxSize: 5
    desiredCapacity: 1
    volumeSize: 100
    ssh:
      allow: false
    labels:
      workload: gpu
      nvidia.com/gpu: "true"
      node.kubernetes.io/instance-type: g5.xlarge
    taints:
      - key: nvidia.com/gpu
        value: "true"
        effect: NoSchedule
    tags:
      Environment: production
      Application: smallest-self-host
      NodeType: gpu
    iam:
      withAddonPolicies:
        autoScaler: true
        ebs: true
        efs: true

addons:
  - name: vpc-cni
  - name: coredns
  - name: kube-proxy
  - name: aws-ebs-csi-driver
```

Create the cluster:

```bash
eksctl create cluster -f cluster-config.yaml
```

<Tip>
  Cluster creation takes 15-20 minutes. Monitor progress in the AWS CloudFormation console.
</Tip>

## GPU Instance Types

Choose the right GPU instance type for your workload:

<table>
  <thead>
    <tr>
      <th>
        Instance Type
      </th>

      <th>
        GPU
      </th>

      <th>
        VRAM
      </th>

      <th>
        vCPUs
      </th>

      <th>
        RAM
      </th>

      <th>
        $/hour*
      </th>

      <th>
        Recommended For
      </th>
    </tr>
  </thead>

  <tbody>
    <tr>
      <td>
        g5.xlarge
      </td>

      <td>
        1x A10G
      </td>

      <td>
        24 GB
      </td>

      <td>
        4
      </td>

      <td>
        16 GB
      </td>

      <td>
        $1.00
      </td>

      <td>
        Development, testing
      </td>
    </tr>

    <tr>
      <td>
        g5.2xlarge
      </td>

      <td>
        1x A10G
      </td>

      <td>
        24 GB
      </td>

      <td>
        8
      </td>

      <td>
        32 GB
      </td>

      <td>
        $1.21
      </td>

      <td>
        Small production
      </td>
    </tr>

    <tr>
      <td>
        g5.4xlarge
      </td>

      <td>
        1x A10G
      </td>

      <td>
        24 GB
      </td>

      <td>
        16
      </td>

      <td>
        64 GB
      </td>

      <td>
        $1.63
      </td>

      <td>
        Medium production
      </td>
    </tr>

    <tr>
      <td>
        g5.12xlarge
      </td>

      <td>
        4x A10G
      </td>

      <td>
        96 GB
      </td>

      <td>
        48
      </td>

      <td>
        192 GB
      </td>

      <td>
        $5.67
      </td>

      <td>
        High-volume production
      </td>
    </tr>

    <tr>
      <td>
        p3.2xlarge
      </td>

      <td>
        1x V100
      </td>

      <td>
        16 GB
      </td>

      <td>
        8
      </td>

      <td>
        61 GB
      </td>

      <td>
        $3.06
      </td>

      <td>
        Legacy workloads
      </td>
    </tr>
  </tbody>
</table>

<small>
  \* Approximate on-demand pricing in us-east-1, subject to change
</small>

<Note>
  **Recommendation**: Start with `g5.xlarge` for development and testing. Scale to `g5.2xlarge` or higher for production.
</Note>
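
To turn the hourly rates above into a budget figure, multiply rate, node count, and hours. The sketch below assumes the approximate prices from the table and a 730-hour month; `estimate_cost` is a hypothetical helper, not part of any AWS tooling.

```bash
# Estimate the on-demand cost of a node group in USD.
#   $1 = hourly rate, $2 = node count, $3 = hours (default 730, roughly one month)
estimate_cost() {
  awk -v rate="$1" -v nodes="$2" -v hours="${3:-730}" \
    'BEGIN { printf "%.2f\n", rate * nodes * hours }'
}

estimate_cost 1.00 1        # one g5.xlarge running all month  -> 730.00
estimate_cost 1.21 2 240    # two g5.2xlarge for 240 hours     -> 580.80
```

Actual bills also include EBS volumes, data transfer, and the EKS control plane, which this estimate leaves out.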

## Verify Cluster

### Check Cluster Status

```bash
eksctl get cluster --name smallest-cluster --region us-east-1
```

### Verify Node Groups

```bash
eksctl get nodegroup --cluster smallest-cluster --region us-east-1
```

### Configure kubectl

```bash
aws eks update-kubeconfig --name smallest-cluster --region us-east-1
```

Verify access:

```bash
kubectl get nodes
```

Expected output:

```
NAME                         STATUS   ROLES    AGE   VERSION
ip-xxx-cpu-1                 Ready    <none>   5m    v1.28.x
ip-xxx-cpu-2                 Ready    <none>   5m    v1.28.x
ip-xxx-gpu-1                 Ready    <none>   5m    v1.28.x
```

### Verify GPU Nodes

Check GPU availability:

```bash
kubectl get nodes -l workload=gpu -o json | \
  jq '.items[].status.capacity'
```

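Look for `nvidia.com/gpu` in the output. If the key is missing, the NVIDIA device plugin is not running on the node yet; see the next section.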

```json
{
  "cpu": "4",
  "memory": "15944904Ki",
  "nvidia.com/gpu": "1",
  "pods": "29"
}
```
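
Because the GPU node group carries the `nvidia.com/gpu=true:NoSchedule` taint, a pod lands there only if it tolerates that taint. As a sketch, the helper below prints a one-shot pod manifest that selects the `workload: gpu` label, tolerates the taint, and runs `nvidia-smi`. The pod name `gpu-smoke-test` and the CUDA image tag are illustrative assumptions, not part of the Smallest Self-Host chart.

```bash
# Print a test pod manifest that targets the GPU node group from this guide.
# Pod name and container image are assumptions chosen for illustration.
gpu_smoke_test_manifest() {
  cat <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  nodeSelector:
    workload: gpu
  tolerations:
    - key: nvidia.com/gpu
      operator: Equal
      value: "true"
      effect: NoSchedule
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
EOF
}

# Usage:
#   gpu_smoke_test_manifest | kubectl apply -f -
#   kubectl logs gpu-smoke-test       # once the pod has completed
#   kubectl delete pod gpu-smoke-test
```

If the pod stays in `Pending`, `kubectl describe pod gpu-smoke-test` usually shows whether the taint, the node selector, or the GPU resource request is the blocker.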

## Install NVIDIA Device Plugin

The NVIDIA device plugin enables GPU scheduling in Kubernetes.

### Using Helm (Recommended)

The Smallest Self-Host chart includes the NVIDIA GPU Operator. Enable it in your values:

```yaml values.yaml
gpu-operator:
  enabled: true
```

### Manual Installation

If installing separately:

```bash
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.0/nvidia-device-plugin.yml
```

Verify:

```bash
kubectl get pods -n kube-system | grep nvidia
```

## Install EBS CSI Driver

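The EBS CSI driver is required for persistent volumes. If you created the cluster from the config file in Option 2, the `aws-ebs-csi-driver` add-on is already listed there.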

### Using eksctl

```bash
eksctl create addon \
  --name aws-ebs-csi-driver \
  --cluster smallest-cluster \
  --region us-east-1
```

### Using AWS Console

1. Navigate to EKS → Clusters → smallest-cluster → Add-ons
2. Click "Add new"
3. Select "Amazon EBS CSI Driver"
4. Click "Add"

### Verify EBS CSI Driver

```bash
kubectl get pods -n kube-system -l app=ebs-csi-controller
```

## Install EFS CSI Driver (Optional)

Recommended for shared model storage across pods.

### Create IAM Policy

```bash
curl -o iam-policy.json https://raw.githubusercontent.com/kubernetes-sigs/aws-efs-csi-driver/master/docs/iam-policy-example.json

aws iam create-policy \
  --policy-name AmazonEKS_EFS_CSI_Driver_Policy \
  --policy-document file://iam-policy.json
```

### Create IAM Service Account

```bash
eksctl create iamserviceaccount \
  --cluster smallest-cluster \
  --region us-east-1 \
  --namespace kube-system \
  --name efs-csi-controller-sa \
  --attach-policy-arn arn:aws:iam::YOUR_ACCOUNT_ID:policy/AmazonEKS_EFS_CSI_Driver_Policy \
  --approve
```

Replace `YOUR_ACCOUNT_ID` with your AWS account ID.
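
Rather than pasting the account ID by hand, you can look it up from the active credentials. This is a minimal sketch; `efs_policy_arn` is a hypothetical helper that assumes the policy name created in the previous step.

```bash
# Build the ARN of the EFS CSI policy for a 12-digit AWS account ID.
efs_policy_arn() {
  case "$1" in
    ""|*[!0-9]*) echo "account ID must be numeric" >&2; return 1 ;;
  esac
  if [ "${#1}" -ne 12 ]; then
    echo "account ID must be 12 digits" >&2
    return 1
  fi
  echo "arn:aws:iam::$1:policy/AmazonEKS_EFS_CSI_Driver_Policy"
}

# Usage:
#   ACCOUNT_ID="$(aws sts get-caller-identity --query Account --output text)"
#   efs_policy_arn "$ACCOUNT_ID"    # pass the result to --attach-policy-arn
```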

### Install EFS CSI Driver

```bash
kubectl apply -k "github.com/kubernetes-sigs/aws-efs-csi-driver/deploy/kubernetes/overlays/stable/?ref=release-1.7"
```

Verify:

```bash
kubectl get pods -n kube-system -l app=efs-csi-controller
```

## Enable Cluster Autoscaler

See the [Cluster Autoscaler](/waves/self-host/kubernetes-setup/autoscaling/cluster-autoscaler) guide for detailed setup.

Quick setup:

```bash
eksctl create iamserviceaccount \
  --cluster smallest-cluster \
  --region us-east-1 \
  --namespace kube-system \
  --name cluster-autoscaler \
  --attach-policy-arn arn:aws:iam::aws:policy/AutoScalingFullAccess \
  --approve \
  --override-existing-serviceaccounts
```

## Cost Optimization

### Use Spot Instances for GPU Nodes

Reduce costs by up to 70% with Spot instances:

```yaml cluster-config.yaml
managedNodeGroups:
  - name: gpu-nodes-spot
    # Managed node groups take a list of instance types for Spot capacity.
    # `instancesDistribution` applies only to unmanaged `nodeGroups`.
    instanceTypes: ["g5.xlarge", "g5.2xlarge"]
    minSize: 0
    maxSize: 5
    desiredCapacity: 1
    spot: true
```

<Warning>
  Spot instances can be interrupted with a 2-minute warning. Ensure your application handles graceful shutdowns.
</Warning>

### Right-Size Node Groups

Start small and scale based on metrics:

```yaml
managedNodeGroups:
  - name: gpu-nodes
    minSize: 0
    maxSize: 10
    desiredCapacity: 1
```

Set `minSize: 0` to scale down to zero during off-hours.
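
If you are not running the Cluster Autoscaler, scaling to zero can also be done by hand or from a scheduled job. The sketch below wraps `eksctl scale nodegroup` in a dry-run helper; `run` and `scale_gpu_nodes` are hypothetical names, and the commands are only printed unless `DRY_RUN=0` is set.

```bash
# Dry-run wrapper: prints the command unless DRY_RUN=0 is set.
run() {
  if [ "${DRY_RUN:-1}" = "0" ]; then
    "$@"
  else
    echo "+ $*"
  fi
}

# Scale the gpu-nodes group from this guide to the requested node count.
scale_gpu_nodes() {
  run eksctl scale nodegroup \
    --cluster smallest-cluster \
    --region us-east-1 \
    --name gpu-nodes \
    --nodes "$1" \
    --nodes-min 0
}

scale_gpu_nodes 0   # off-hours: release all GPU nodes
scale_gpu_nodes 1   # working hours: bring one node back
```

Printing by default makes it easy to review the exact command before letting a cron job or CI pipeline execute it.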

### Enable Cluster Autoscaler

Automatically adjust node count based on demand:

```yaml values.yaml
cluster-autoscaler:
  enabled: true
  autoDiscovery:
    clusterName: smallest-cluster
  awsRegion: us-east-1
```

## Security Best Practices

### Enable Private Endpoint

```bash
eksctl utils update-cluster-endpoints \
  --cluster smallest-cluster \
  --region us-east-1 \
  --private-access=true \
  --public-access=false \
  --approve
```

<Warning>
  With public access disabled, `kubectl` can reach the API server only from inside the cluster's VPC or a connected network, such as a VPN or bastion host.
</Warning>

### Enable Logging

```bash
eksctl utils update-cluster-logging \
  --cluster smallest-cluster \
  --region us-east-1 \
  --enable-types all \
  --approve
```

### Update Security Groups

Restrict inbound access to the API server. Start by listing the security groups attached to the cluster:

```bash
aws ec2 describe-security-groups \
  --filters "Name=tag:aws:eks:cluster-name,Values=smallest-cluster"
```

Update rules to allow only specific IPs.
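
If you keep the public endpoint enabled, another option is to limit it with the EKS public access CIDR list rather than security-group rules. The sketch below validates a CIDR before it is passed to `aws eks update-cluster-config`; `is_ipv4_cidr` is a hypothetical helper, and `203.0.113.5/32` is a documentation-range placeholder that you would replace with your own address.

```bash
# Return success only for a well-formed IPv4 CIDR such as 203.0.113.5/32.
is_ipv4_cidr() {
  printf '%s\n' "$1" | awk -F'[./]' '
    NF != 5 { exit 1 }
    {
      for (i = 1; i <= 4; i++)
        if ($i !~ /^[0-9]+$/ || $i + 0 > 255) exit 1
      if ($5 !~ /^[0-9]+$/ || $5 + 0 > 32) exit 1
    }'
}

# Usage:
#   CIDR="203.0.113.5/32"
#   is_ipv4_cidr "$CIDR" && aws eks update-cluster-config \
#     --name smallest-cluster --region us-east-1 \
#     --resources-vpc-config endpointPublicAccess=true,publicAccessCidrs="$CIDR"
```

Validating first avoids sending a malformed value to the API, where the update would be rejected.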

## Troubleshooting

### GPU Nodes Not Ready

Check NVIDIA device plugin:

```bash
kubectl get pods -n kube-system | grep nvidia
kubectl describe node <gpu-node-name>
```

### Pods Stuck in Pending

Check node capacity:

```bash
kubectl describe pod <pod-name>
kubectl get nodes -o json | jq '.items[].status.allocatable'
```

### EBS Volumes Not Mounting

Verify EBS CSI driver:

```bash
kubectl get pods -n kube-system -l app=ebs-csi-controller
kubectl logs -n kube-system -l app=ebs-csi-controller
```

## What's Next?

<CardGroup cols={2}>
  <Card title="IAM & IRSA" href="/waves/self-host/kubernetes-setup/aws/iam-irsa">
    Configure IAM roles for service accounts
  </Card>

  <Card title="GPU Nodes" href="/waves/self-host/kubernetes-setup/aws/gpu-nodes">
    Advanced GPU node configuration and optimization
  </Card>

  <Card title="EFS Configuration" href="/waves/self-host/kubernetes-setup/storage-pvc/efs-configuration">
    Set up shared file storage for models
  </Card>

  <Card title="Cluster Autoscaler" href="/waves/self-host/kubernetes-setup/autoscaling/cluster-autoscaler">
    Enable automatic node scaling
  </Card>
</CardGroup>