AWS EKS Setup


Overview

This guide walks you through creating an Amazon EKS cluster optimized for running Smallest Self-Host with GPU acceleration.

Prerequisites

1. AWS CLI

Install and configure AWS CLI:

$aws --version
$aws configure

2. eksctl

Install eksctl (EKS cluster management tool):

$brew install eksctl

Verify:

$eksctl version

3. kubectl

Install kubectl:

$brew install kubectl

4. IAM Permissions

Ensure your AWS user/role has permissions to:

  • Create EKS clusters
  • Manage EC2 instances
  • Create IAM roles
  • Manage VPC resources
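
The exact policies depend on your organization, but a minimal sketch of the permission areas above could look like the following IAM policy fragment. It is intentionally broad and illustrative only; scope the Action and Resource lists down for production, and consult the eksctl documentation for the full set of permissions it requires:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "EksClusterCreation",
      "Effect": "Allow",
      "Action": [
        "eks:*",
        "ec2:*",
        "cloudformation:*",
        "iam:CreateRole",
        "iam:AttachRolePolicy",
        "iam:PassRole"
      ],
      "Resource": "*"
    }
  ]
}
```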

Cluster Configuration

Option 1: Quick Start with eksctl

Create a cluster with GPU nodes using a single command:

$eksctl create cluster \
> --name smallest-cluster \
> --region us-east-1 \
> --version 1.28 \
> --nodegroup-name cpu-nodes \
> --node-type t3.large \
> --nodes 2 \
> --nodes-min 1 \
> --nodes-max 3 \
> --managed

Then add GPU node group:

$eksctl create nodegroup \
> --cluster smallest-cluster \
> --region us-east-1 \
> --name gpu-nodes \
> --node-type g5.xlarge \
> --nodes 1 \
> --nodes-min 0 \
> --nodes-max 5 \
> --managed \
> --node-labels "workload=gpu,nvidia.com/gpu=true" \
> --node-taints "nvidia.com/gpu=true:NoSchedule"

This creates a cluster with separate CPU and GPU node groups, allowing for cost-effective scaling.
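Because the GPU nodes carry a NoSchedule taint, workloads must tolerate that taint (and should select the GPU label) to land on them. A minimal pod spec sketch, where the pod name and image are placeholders:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-workload              # placeholder name
spec:
  nodeSelector:
    workload: gpu                 # matches the --node-labels above
  tolerations:
    - key: nvidia.com/gpu
      operator: Equal
      value: "true"
      effect: NoSchedule          # matches the --node-taints above
  containers:
    - name: app
      image: your-image:latest    # placeholder image
      resources:
        limits:
          nvidia.com/gpu: 1       # request one GPU
```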

Option 2: Using Cluster Config File

Create a cluster configuration file for more control:

cluster-config.yaml
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: smallest-cluster
  region: us-east-1
  version: "1.28"

iam:
  withOIDC: true

managedNodeGroups:
  - name: cpu-nodes
    instanceType: t3.large
    minSize: 1
    maxSize: 3
    desiredCapacity: 2
    volumeSize: 50
    ssh:
      allow: false
    labels:
      workload: cpu
    tags:
      Environment: production
      Application: smallest-self-host

  - name: gpu-nodes
    instanceType: g5.xlarge
    minSize: 0
    maxSize: 5
    desiredCapacity: 1
    volumeSize: 100
    ssh:
      allow: false
    labels:
      workload: gpu
      nvidia.com/gpu: "true"
      node.kubernetes.io/instance-type: g5.xlarge
    taints:
      - key: nvidia.com/gpu
        value: "true"
        effect: NoSchedule
    tags:
      Environment: production
      Application: smallest-self-host
      NodeType: gpu
    iam:
      withAddonPolicies:
        autoScaler: true
        ebs: true
        efs: true

addons:
  - name: vpc-cni
  - name: coredns
  - name: kube-proxy
  - name: aws-ebs-csi-driver

Create the cluster:

$eksctl create cluster -f cluster-config.yaml

Cluster creation takes 15-20 minutes. Monitor progress in the AWS CloudFormation console.

GPU Instance Types

Choose the right GPU instance type for your workload:

| Instance Type | GPU     | VRAM  | vCPUs | RAM    | $/hour* | Recommended For        |
| ------------- | ------- | ----- | ----- | ------ | ------- | ---------------------- |
| g5.xlarge     | 1x A10G | 24 GB | 4     | 16 GB  | $1.00   | Development, testing   |
| g5.2xlarge    | 1x A10G | 24 GB | 8     | 32 GB  | $1.21   | Small production       |
| g5.4xlarge    | 1x A10G | 24 GB | 16    | 64 GB  | $1.63   | Medium production      |
| g5.12xlarge   | 4x A10G | 96 GB | 48    | 192 GB | $5.67   | High-volume production |
| p3.2xlarge    | 1x V100 | 16 GB | 8     | 61 GB  | $3.06   | Legacy workloads       |
* Approximate on-demand pricing in us-east-1, subject to change

Recommendation: Start with g5.xlarge for development and testing. Scale to g5.2xlarge or higher for production.

Verify Cluster

Check Cluster Status

$eksctl get cluster --name smallest-cluster --region us-east-1

Verify Node Groups

$eksctl get nodegroup --cluster smallest-cluster --region us-east-1

Configure kubectl

$aws eks update-kubeconfig --name smallest-cluster --region us-east-1

Verify access:

$kubectl get nodes

Expected output:

NAME           STATUS   ROLES    AGE   VERSION
ip-xxx-cpu-1   Ready    <none>   5m    v1.28.x
ip-xxx-cpu-2   Ready    <none>   5m    v1.28.x
ip-xxx-gpu-1   Ready    <none>   5m    v1.28.x

Verify GPU Nodes

Check GPU availability:

$kubectl get nodes -l workload=gpu -o json | \
> jq '.items[].status.capacity'

Look for nvidia.com/gpu in the output:

{
  "cpu": "4",
  "memory": "15944904Ki",
  "nvidia.com/gpu": "1",
  "pods": "29"
}

Install NVIDIA Device Plugin

The NVIDIA device plugin enables GPU scheduling in Kubernetes.

The Smallest Self-Host chart includes the NVIDIA GPU Operator. Enable it in your values:

values.yaml
gpu-operator:
  enabled: true

Manual Installation

If installing separately:

$kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v0.14.0/nvidia-device-plugin.yml

Verify:

$kubectl get pods -n kube-system | grep nvidia
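
Once the plugin pods are running, you can confirm end-to-end GPU scheduling with a throwaway pod that runs nvidia-smi. The CUDA image tag below is illustrative; pick one compatible with your node's driver version:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  tolerations:
    - key: nvidia.com/gpu
      operator: Equal
      value: "true"
      effect: NoSchedule
  containers:
    - name: cuda
      image: nvidia/cuda:12.2.0-base-ubuntu22.04   # illustrative tag
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1
```

Apply it, then check kubectl logs gpu-smoke-test for the nvidia-smi output table.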

Install EBS CSI Driver

Required for persistent volumes:

Using eksctl

$eksctl create addon \
> --name aws-ebs-csi-driver \
> --cluster smallest-cluster \
> --region us-east-1

Using AWS Console

  1. Navigate to EKS → Clusters → smallest-cluster → Add-ons
  2. Click “Add new”
  3. Select “Amazon EBS CSI Driver”
  4. Click “Add”

Verify EBS CSI Driver

$kubectl get pods -n kube-system -l app=ebs-csi-controller
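
With the driver running, define a StorageClass so PersistentVolumeClaims provision EBS volumes dynamically. A gp3 sketch (the name and parameters are suggestions, not chart requirements):

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: ebs-gp3                            # suggested name
provisioner: ebs.csi.aws.com
volumeBindingMode: WaitForFirstConsumer    # provision in the pod's AZ
parameters:
  type: gp3
  encrypted: "true"
```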

Install EFS CSI Driver (Optional)

Recommended for shared model storage across pods.

Create IAM Policy

$curl -o iam-policy.json https://raw.githubusercontent.com/kubernetes-sigs/aws-efs-csi-driver/master/docs/iam-policy-example.json
$
$aws iam create-policy \
> --policy-name AmazonEKS_EFS_CSI_Driver_Policy \
> --policy-document file://iam-policy.json

Create IAM Service Account

$eksctl create iamserviceaccount \
> --cluster smallest-cluster \
> --region us-east-1 \
> --namespace kube-system \
> --name efs-csi-controller-sa \
> --attach-policy-arn arn:aws:iam::YOUR_ACCOUNT_ID:policy/AmazonEKS_EFS_CSI_Driver_Policy \
> --approve

Replace YOUR_ACCOUNT_ID with your AWS account ID.

Install EFS CSI Driver

$kubectl apply -k "github.com/kubernetes-sigs/aws-efs-csi-driver/deploy/kubernetes/overlays/stable/?ref=release-1.7"

Verify:

$kubectl get pods -n kube-system -l app=efs-csi-controller
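
To use EFS for shared model storage, create an EFS file system in the cluster's VPC and reference its ID in a StorageClass. Here fs-XXXXXXXX is a placeholder for your file system ID:

```yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: efs-shared              # suggested name
provisioner: efs.csi.aws.com
parameters:
  provisioningMode: efs-ap      # dynamic provisioning via access points
  fileSystemId: fs-XXXXXXXX     # placeholder: your EFS file system ID
  directoryPerms: "700"
```

EFS volumes support ReadWriteMany, so multiple pods can mount the same model cache concurrently.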

Enable Cluster Autoscaler

See the Cluster Autoscaler guide for detailed setup.

Quick setup:

$eksctl create iamserviceaccount \
> --cluster smallest-cluster \
> --region us-east-1 \
> --namespace kube-system \
> --name cluster-autoscaler \
> --attach-policy-arn arn:aws:iam::aws:policy/AutoScalingFullAccess \
> --approve \
> --override-existing-serviceaccounts

Cost Optimization

Use Spot Instances for GPU Nodes

Reduce costs by up to 70% with Spot instances:

cluster-config.yaml
managedNodeGroups:
  - name: gpu-nodes-spot
    instanceType: g5.xlarge
    minSize: 0
    maxSize: 5
    desiredCapacity: 1
    spot: true
    instancesDistribution:
      maxPrice: 0.50
      instanceTypes: ["g5.xlarge", "g5.2xlarge"]
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0
      spotAllocationStrategy: capacity-optimized

Spot instances can be interrupted with only a two-minute warning. Ensure your application handles graceful shutdowns.
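
One way to handle interruptions is to give pods enough time to flush in-flight work before they are killed. A deployment pod-template fragment sketch, where the preStop command is a placeholder for your own drain logic:

```yaml
spec:
  terminationGracePeriodSeconds: 120   # time to drain before SIGKILL
  containers:
    - name: app
      lifecycle:
        preStop:
          exec:
            # placeholder: replace with your app's drain/flush command
            command: ["/bin/sh", "-c", "sleep 10"]
```

Pairing this with the AWS Node Termination Handler lets nodes cordon and drain automatically when the Spot interruption notice arrives.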

Right-Size Node Groups

Start small and scale based on metrics:

managedNodeGroups:
  - name: gpu-nodes
    minSize: 0
    maxSize: 10
    desiredCapacity: 1

Set minSize: 0 to scale down to zero during off-hours.

Enable Cluster Autoscaler

Automatically adjust node count based on demand:

values.yaml
cluster-autoscaler:
  enabled: true
  autoDiscovery:
    clusterName: smallest-cluster
  awsRegion: us-east-1

Security Best Practices

Enable Private Endpoint

$eksctl utils update-cluster-endpoints \
> --cluster smallest-cluster \
> --region us-east-1 \
> --private-access=true \
> --public-access=false \
> --approve

Enable Logging

$eksctl utils update-cluster-logging \
> --cluster smallest-cluster \
> --region us-east-1 \
> --enable-types all \
> --approve

Update Security Groups

Restrict inbound access to API server:

$aws ec2 describe-security-groups \
> --filters "Name=tag:aws:eks:cluster-name,Values=smallest-cluster"

Update rules to allow only specific IPs.

Troubleshooting

GPU Nodes Not Ready

Check NVIDIA device plugin:

$kubectl get pods -n kube-system | grep nvidia
$kubectl describe node <gpu-node-name>

Pods Stuck in Pending

Check node capacity:

$kubectl describe pod <pod-name>
$kubectl get nodes -o json | jq '.items[].status.allocatable'

EBS Volumes Not Mounting

Verify EBS CSI driver:

$kubectl get pods -n kube-system -l app=ebs-csi-controller
$kubectl logs -n kube-system -l app=ebs-csi-controller

What’s Next?