Cluster Autoscaler
Overview
The Cluster Autoscaler automatically adjusts the number of nodes in your EKS cluster based on pending pods and resource utilization. When combined with the Horizontal Pod Autoscaler (HPA), it provides end-to-end autoscaling, from application load to infrastructure capacity.
How It Works
Flow:
- HPA scales pods based on metrics
- New pods enter “Pending” state (insufficient resources)
- Cluster Autoscaler detects pending pods
- Adds nodes to Auto Scaling Group
- Pods scheduled on new nodes
- After scale-down period, removes underutilized nodes
Prerequisites
Installation
Using Helm Chart
The Smallest Self-Host chart includes Cluster Autoscaler as a dependency:
Deploy:
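A sketch of the deploy step; the release name, chart path, and value keys below are placeholders for your actual chart layout:

```shell
# Placeholder release/chart names and value keys -- adjust to your repository.
helm upgrade --install smallest-self-host ./charts/smallest-self-host \
  --namespace smallest --create-namespace \
  --set cluster-autoscaler.enabled=true \
  --set cluster-autoscaler.autoDiscovery.clusterName=my-eks-cluster
```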
Standalone Installation
Install Cluster Autoscaler separately:
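Using the community Helm chart from the kubernetes/autoscaler repository (cluster name and region are examples):

```shell
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set autoDiscovery.clusterName=my-eks-cluster \
  --set awsRegion=us-east-1
```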
Configuration
Auto-Discovery
Auto-discover Auto Scaling Groups by cluster name:
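A sketch of the relevant container arguments (the cluster name my-eks-cluster is an example); the tags referenced here must be present on the Auto Scaling Groups:

```yaml
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-eks-cluster
```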
Manual Configuration
Explicitly specify Auto Scaling Groups:
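Alternatively, list each group with a --nodes flag in min:max:ASG-name form (group names are examples):

```yaml
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --nodes=1:10:my-cpu-node-group
  - --nodes=0:4:my-gpu-node-group
```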
Scale-Down Configuration
Control when and how nodes are removed:
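A sketch of the relevant flags, shown with their upstream default values:

```yaml
  - --scale-down-delay-after-add=10m
  - --scale-down-unneeded-time=10m
  - --scale-down-utilization-threshold=0.5
  - --max-graceful-termination-sec=600
```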
Parameters:
- scale-down-delay-after-add: Wait time after adding a node before considering scale-down
- scale-down-unneeded-time: How long a node must be underutilized before removal
- scale-down-utilization-threshold: CPU/memory threshold (0.5 = 50%)
- max-graceful-termination-sec: Max time for pod eviction
Node Group Priorities
Scale specific node groups first:
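With the priority expander enabled (--expander=priority), priorities live in a ConfigMap named cluster-autoscaler-priority-expander; the regex patterns below are examples:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    50:
      - .*spot.*
    10:
      - .*on-demand.*
```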
Priorities:
- Higher number = higher priority
- Regex patterns match node group names
- Useful for preferring spot instances
Verify Installation
Check Cluster Autoscaler Pod
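Assuming the default kube-system deployment:

```shell
kubectl get pods -n kube-system | grep cluster-autoscaler
```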
Check Logs
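A sketch, assuming the Deployment is named cluster-autoscaler:

```shell
kubectl logs -n kube-system deployment/cluster-autoscaler --tail=100
```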
Look for:
Verify IAM Permissions
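One way to surface IAM failures is to grep the autoscaler logs (deployment name assumed):

```shell
kubectl logs -n kube-system deployment/cluster-autoscaler | grep -iE "accessdenied|unauthorized"
```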
Should show no permission errors.
Testing Cluster Autoscaler
Trigger Scale-Up
Create pods that exceed cluster capacity:
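An illustrative Deployment whose aggregate requests exceed free capacity (tune replicas and requests to your cluster size):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scale-test
spec:
  replicas: 10
  selector:
    matchLabels:
      app: scale-test
  template:
    metadata:
      labels:
        app: scale-test
    spec:
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "1"        # large requests force pods into Pending
              memory: 1Gi
```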
Watch nodes:
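For example:

```shell
kubectl get nodes -w
```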
Watch Cluster Autoscaler:
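A sketch, assuming the Deployment is named cluster-autoscaler:

```shell
kubectl logs -n kube-system deployment/cluster-autoscaler -f
```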
Expected behavior:
- Pods enter “Pending” state
- Cluster Autoscaler detects pending pods
- Logs show: “Scale-up: setting group size to X”
- New nodes appear in kubectl get nodes
- Pods transition to “Running”
Trigger Scale-Down
Delete test pods:
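Assuming the test Deployment was named scale-test:

```shell
kubectl delete deployment scale-test
```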
After scale-down-unneeded-time (default 10 minutes):
- Cluster Autoscaler marks underutilized nodes
- Drains pods gracefully
- Terminates EC2 instances
- Node count decreases
GPU Node Scaling
Configure GPU Node Groups
Tag GPU node groups for autoscaling:
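A sketch using the standard autoscaler tag keys (ASG and cluster names are placeholders). The node-template GPU tag lets the autoscaler know the group provides GPUs even when it is scaled to zero:

```shell
aws autoscaling create-or-update-tags --tags \
  "ResourceId=my-gpu-node-group,ResourceType=auto-scaling-group,PropagateAtLaunch=true,Key=k8s.io/cluster-autoscaler/enabled,Value=true" \
  "ResourceId=my-gpu-node-group,ResourceType=auto-scaling-group,PropagateAtLaunch=true,Key=k8s.io/cluster-autoscaler/my-eks-cluster,Value=owned" \
  "ResourceId=my-gpu-node-group,ResourceType=auto-scaling-group,PropagateAtLaunch=true,Key=k8s.io/cluster-autoscaler/node-template/resources/nvidia.com/gpu,Value=1"
```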
Prevent Cluster Autoscaler on GPU Nodes
Run the Cluster Autoscaler pod on CPU nodes so it does not occupy GPU capacity:
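One approach is a nodeSelector on the autoscaler's pod spec; the node-type label is an assumed example:

```yaml
spec:
  nodeSelector:
    node-type: cpu   # assumed label; adjust to your CPU node group's labels
```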
Scale to Zero
Allow GPU nodes to scale to zero during off-hours:
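Set the group's minimum size to zero (group name is a placeholder):

```shell
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-gpu-node-group \
  --min-size 0 --desired-capacity 0 --max-size 4
```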
Cluster Autoscaler will:
- Add GPU nodes when Lightning ASR pods are pending
- Remove GPU nodes when all GPU workloads complete
First startup after scale-to-zero takes longer (node provisioning + model download).
Spot Instance Integration
Mixed Instance Groups
Use spot and on-demand instances:
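A sketch of a MixedInstancesPolicy matching the distribution described here (launch template and group names are placeholders):

```shell
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-cpu-node-group \
  --mixed-instances-policy '{
    "LaunchTemplate": {
      "LaunchTemplateSpecification": {"LaunchTemplateName": "my-template", "Version": "$Latest"},
      "Overrides": [
        {"InstanceType": "m5.xlarge"},
        {"InstanceType": "m5a.xlarge"},
        {"InstanceType": "m4.xlarge"}
      ]
    },
    "InstancesDistribution": {
      "OnDemandBaseCapacity": 1,
      "OnDemandPercentageAboveBaseCapacity": 20,
      "SpotAllocationStrategy": "capacity-optimized"
    }
  }'
```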
Configuration:
- Base capacity: 1 on-demand node always
- Additional capacity: 20% on-demand, 80% spot
- Multiple instance types increase spot availability
Handle Spot Interruptions
Configure Cluster Autoscaler for spot:
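One common choice is the least-waste expander, which favors the node group that leaves the least idle CPU/memory after scheduling; whether this is the right expander depends on your setup:

```yaml
  - --expander=least-waste
```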
Add AWS Node Termination Handler:
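From the official eks-charts repository:

```shell
helm repo add eks https://aws.github.io/eks-charts
helm install aws-node-termination-handler eks/aws-node-termination-handler \
  --namespace kube-system
```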
Advanced Configuration
Multiple Node Groups
Scale different workloads independently:
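For example, one --nodes flag per group (names and bounds are illustrative):

```yaml
  - --nodes=2:10:cpu-node-group
  - --nodes=0:4:gpu-node-group
  - --nodes=1:6:memory-optimized-node-group
```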
Scale-Up Policies
Control scale-up behavior:
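A sketch of the relevant flags, shown with their upstream default values:

```yaml
  - --scan-interval=10s              # how often the cluster is re-evaluated
  - --max-node-provision-time=15m    # give up on nodes that never become ready
  - --new-pod-scale-up-delay=0s      # delay before reacting to newly created pods
```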
Resource Limits
Prevent runaway scaling:
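Cluster-wide caps can be set with flags (the values below are illustrative):

```yaml
  - --max-nodes-total=20
  - --cores-total=8:128     # min:max CPU cores cluster-wide
  - --memory-total=32:512   # min:max memory cluster-wide, in GiB
```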
Monitoring
CloudWatch Metrics
View Auto Scaling Group metrics in CloudWatch:
- GroupDesiredCapacity
- GroupInServiceInstances
- GroupPendingInstances
- GroupTerminatingInstances
Kubernetes Events
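Scaling decisions are also emitted as Kubernetes events (reasons such as TriggeredScaleUp); for example:

```shell
kubectl get events --all-namespaces --sort-by=.lastTimestamp | grep -i scale
```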
Cluster Autoscaler Status
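The autoscaler writes its status to a ConfigMap in kube-system:

```shell
kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml
```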
Grafana Dashboard
Import Cluster Autoscaler dashboard:
Dashboard ID: 3831
Troubleshooting
Nodes Not Scaling Up
Check pending pods:
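For example:

```shell
kubectl get pods --all-namespaces --field-selector status.phase=Pending
```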
Check Cluster Autoscaler logs:
Common issues:
- Max nodes reached (max-nodes-total)
- IAM permission denied
- Auto Scaling Group at max capacity
- Node group not tagged properly
Nodes Not Scaling Down
Check node utilization:
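Requires metrics-server:

```shell
kubectl top nodes
```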
Check for blocking conditions:
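One check is the scale-down-disabled annotation on nodes (the autoscaler logs also state why each node cannot be removed):

```shell
kubectl get nodes -o custom-columns='NAME:.metadata.name,SCALE_DOWN_DISABLED:.metadata.annotations.cluster-autoscaler\.kubernetes\.io/scale-down-disabled'
```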
Common causes:
- Pods without PodDisruptionBudget
- Pods with local storage
- System pods (unless skip-nodes-with-system-pods: false)
- Node utilization above the threshold
Permission Errors
Check service account:
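Assuming IRSA, the service account should carry a role-arn annotation:

```shell
kubectl describe serviceaccount cluster-autoscaler -n kube-system
# Expect an eks.amazonaws.com/role-arn annotation pointing at the autoscaler role
```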
Verify IAM role:
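A sketch; the role name is a placeholder:

```shell
aws iam list-attached-role-policies --role-name cluster-autoscaler-role
```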
Update the IAM policy if needed (see IAM & IRSA).
Best Practices
Tag Node Groups Properly
Always tag Auto Scaling Groups:
Set Realistic Limits
Configure appropriate min/max for each node group:
Use PodDisruptionBudgets
Protect critical workloads during scale-down:
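An illustrative PDB for the Lightning ASR workload (the label is an assumed example):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: lightning-asr-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: lightning-asr   # assumed label; match your Deployment's labels
```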
Monitor Scaling Events
- Track scaling decisions in Grafana
- Set alerts for scale failures
Test Regularly
Periodically test scale-up and scale-down:
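A sketch, assuming a throwaway Deployment named scale-test:

```shell
kubectl scale deployment scale-test --replicas=10   # force pending pods
kubectl get nodes -w                                # watch nodes join
kubectl scale deployment scale-test --replicas=0    # then watch scale-down
```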
Watch for proper node addition/removal

