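The Cluster Autoscaler automatically adjusts the number of nodes in your EKS cluster. It adds nodes when pods cannot be scheduled because no existing node has enough free resources, and it removes nodes that stay underutilized. When combined with the Horizontal Pod Autoscaler (HPA), it provides end-to-end autoscaling from application load to infrastructure capacity.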
Flow:
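The scale-up half of this flow can be approximated with a simplified first-fit estimate: given the CPU and memory requests of pending pods and the allocatable capacity of one node from a node group, count how many new nodes would be needed. This is an illustrative sketch, not Cluster Autoscaler's actual estimator. The `Pod` type and `estimate_new_nodes` function are hypothetical, and memory is expressed in MiB for simplicity.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    """Resource requests of a pending pod (hypothetical model)."""
    cpu_millicores: int
    memory_mib: int


def estimate_new_nodes(pending: list[Pod], node_cpu: int, node_mem: int) -> int:
    """Return how many identical new nodes are needed to fit all pending pods.

    First-fit decreasing: sort pods largest-first, place each on the first
    simulated node with room, and open a new node when none fits.
    """
    free: list[list[int]] = []  # remaining [cpu, mem] on each simulated node
    for pod in sorted(pending, key=lambda p: (p.cpu_millicores, p.memory_mib), reverse=True):
        if pod.cpu_millicores > node_cpu or pod.memory_mib > node_mem:
            raise ValueError("pod can never fit on this node type")
        for slot in free:
            if slot[0] >= pod.cpu_millicores and slot[1] >= pod.memory_mib:
                slot[0] -= pod.cpu_millicores
                slot[1] -= pod.memory_mib
                break
        else:  # no existing simulated node had room
            free.append([node_cpu - pod.cpu_millicores, node_mem - pod.memory_mib])
    return len(free)


# Five pods requesting 1.5 vCPU / 2 GiB each, nodes with 4 vCPU / 16 GiB allocatable.
pods = [Pod(1500, 2048) for _ in range(5)]
print(estimate_new_nodes(pods, node_cpu=4000, node_mem=16384))  # 3
```

Each node fits two such pods before CPU runs out, so five pods need three new nodes.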
The Smallest Self-Host chart includes Cluster Autoscaler as a dependency:
Deploy:
Install Cluster Autoscaler separately:
Auto-discover Auto Scaling Groups by cluster name:
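Auto-discovery works through ASG tags: Cluster Autoscaler selects every Auto Scaling Group that carries both the `k8s.io/cluster-autoscaler/enabled` tag key and the `k8s.io/cluster-autoscaler/<cluster-name>` tag key. The sketch below builds the matching `--node-group-auto-discovery` flag and the tags to apply to each ASG. The helper names and the cluster name `my-eks-cluster` are placeholders.

```python
def auto_discovery_flag(cluster_name: str) -> str:
    """Build the Cluster Autoscaler flag that discovers ASGs by tag keys."""
    return (
        "--node-group-auto-discovery=asg:tag="
        f"k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/{cluster_name}"
    )


def asg_discovery_tags(cluster_name: str) -> dict[str, str]:
    """Tags to put on each ASG that should be autoscaled.

    The flag above matches on tag keys only; "true" and "owned" are
    conventional values.
    """
    return {
        "k8s.io/cluster-autoscaler/enabled": "true",
        f"k8s.io/cluster-autoscaler/{cluster_name}": "owned",
    }


print(auto_discovery_flag("my-eks-cluster"))
```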
Explicitly specify Auto Scaling Groups:
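With explicit configuration, each ASG is passed as its own `--nodes=<min>:<max>:<asg-name>` flag. A small sketch that renders these flags and rejects inconsistent bounds; the `NodeGroup` type and the ASG names `cpu-nodes` and `gpu-nodes` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    """Hypothetical description of one Auto Scaling Group."""
    name: str
    min_size: int
    max_size: int


def nodes_flags(groups: list[NodeGroup]) -> list[str]:
    """Return one --nodes=<min>:<max>:<asg-name> flag per ASG, validating bounds."""
    flags = []
    for g in groups:
        if not 0 <= g.min_size <= g.max_size:
            raise ValueError(f"invalid bounds for {g.name}: {g.min_size}..{g.max_size}")
        flags.append(f"--nodes={g.min_size}:{g.max_size}:{g.name}")
    return flags


print(nodes_flags([NodeGroup("cpu-nodes", 2, 10), NodeGroup("gpu-nodes", 0, 4)]))
```

A minimum of 0 lets a group scale to zero, which is mainly useful for expensive GPU groups.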
Control when and how nodes are removed:
Parameters:
- scale-down-delay-after-add: How long to wait after adding a node before considering scale-down
- scale-down-unneeded-time: How long a node must stay underutilized before it is removed
- scale-down-utilization-threshold: Utilization threshold computed from CPU/memory requests (0.5 = 50%); nodes below it are candidates for removal
- max-graceful-termination-sec: Maximum time allowed for pod eviction during scale-down

Scale specific node groups first:
Priorities:
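With `--expander=priority`, Cluster Autoscaler reads a ConfigMap named `cluster-autoscaler-priority-expander` whose `priorities` key maps integer priorities to lists of regular expressions over node group names; higher numbers win. The selection rule can be sketched as follows. Treating the regex match as unanchored is an assumption of this sketch, and the group names are hypothetical.

```python
import re


def pick_node_groups(priorities: dict[int, list[str]], candidates: list[str]) -> list[str]:
    """Return the candidate node groups in the highest-priority tier that has a match.

    Each tier lists regular expressions tested against node group (ASG) names.
    The match here is unanchored (re.search), which is an assumption.
    """
    for priority in sorted(priorities, reverse=True):
        patterns = [re.compile(p) for p in priorities[priority]]
        matched = [c for c in candidates if any(p.search(c) for p in patterns)]
        if matched:
            return matched  # the real autoscaler then chooses among these ties
    return []


# Prefer spot groups, fall back to on-demand groups.
tiers = {50: [r".*-spot-.*"], 10: [r".*-on-demand-.*"]}
print(pick_node_groups(tiers, ["gpu-on-demand-a", "gpu-spot-a", "gpu-spot-b"]))
```

If no spot group can host the pending pods (so it is not a candidate), the on-demand tier is used instead.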
Look for:
The logs should show no permission errors.
Create pods that exceed cluster capacity:
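One way to generate such a load is a Deployment of pause containers whose requests add up to more than the cluster's free capacity. The sketch below emits the manifest as JSON, which `kubectl apply -f -` accepts. The Deployment name `autoscaler-test`, its label, and the pause image tag are assumptions; pick `replicas`, `cpu`, and `memory` so the total exceeds what your nodes can hold.

```python
import json


def capacity_test_deployment(replicas: int, cpu: str, memory: str) -> dict:
    """Deployment of idle pause containers whose requests force a scale-up."""
    labels = {"app": "autoscaler-test"}  # hypothetical label
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "autoscaler-test"},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [{
                        "name": "pause",
                        "image": "registry.k8s.io/pause:3.9",
                        # Requests, not actual usage, drive scheduling and scale-up.
                        "resources": {"requests": {"cpu": cpu, "memory": memory}},
                    }]
                },
            },
        },
    }


# Pipe the output into: kubectl apply -f -
print(json.dumps(capacity_test_deployment(10, "1", "1Gi"), indent=2))
```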
Watch nodes:
Watch Cluster Autoscaler:
Expected behavior:
kubectl get nodes
Delete test pods:
After scale-down-unneeded-time (default 10 minutes):
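Whether a node is actually removed at this point depends on the scale-down parameters described earlier. The simplified check below combines them for a single node: utilization is the larger of the CPU and memory request ratios, and the node is removable only if it is below the threshold, has stayed there for scale-down-unneeded-time, and no scale-up happened within scale-down-delay-after-add. The real autoscaler also verifies that every pod on the node can be rescheduled elsewhere. `NodeState` and `can_remove` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    """Hypothetical snapshot of one node as the autoscaler might see it."""
    cpu_requested: float    # sum of pod CPU requests (cores)
    cpu_allocatable: float  # allocatable CPU (cores)
    mem_requested: float    # sum of pod memory requests (GiB)
    mem_allocatable: float  # allocatable memory (GiB)
    unneeded_for_s: float   # seconds the node has stayed below the threshold


def utilization(node: NodeState) -> float:
    """Larger of the CPU and memory request ratios."""
    return max(node.cpu_requested / node.cpu_allocatable,
               node.mem_requested / node.mem_allocatable)


def can_remove(node: NodeState, since_last_scale_up_s: float,
               threshold: float = 0.5, unneeded_time_s: float = 600,
               delay_after_add_s: float = 600) -> bool:
    """Simplified scale-down check using the default 50% / 10-minute settings."""
    if since_last_scale_up_s < delay_after_add_s:
        return False  # scale-down-delay-after-add still in effect
    if utilization(node) >= threshold:
        return False  # busy enough to keep
    return node.unneeded_for_s >= unneeded_time_s  # scale-down-unneeded-time


node = NodeState(cpu_requested=0.5, cpu_allocatable=4, mem_requested=3,
                 mem_allocatable=16, unneeded_for_s=700)
print(utilization(node), can_remove(node, since_last_scale_up_s=1200))  # 0.1875 True
```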
Tag GPU node groups for autoscaling:
Run Cluster Autoscaler on CPU nodes to avoid wasting GPU capacity:
Allow GPU nodes to scale to zero during off-hours:
Cluster Autoscaler will:
First startup after scale-to-zero takes longer (node provisioning + model download).
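When a GPU group is at zero nodes, there is no running node for Cluster Autoscaler to inspect, so on AWS it reads the node's shape from `k8s.io/cluster-autoscaler/node-template/...` ASG tags (extended resources, labels, taints). The sketch builds such a tag set; the label key `workload`, the cluster name, and the GPU count are placeholders, and the exact tag keys should be confirmed against the Cluster Autoscaler AWS provider documentation for your version.

```python
def gpu_node_template_tags(cluster_name: str, gpus_per_node: int,
                           labels: dict[str, str],
                           taints: dict[str, str]) -> dict[str, str]:
    """ASG tags that let Cluster Autoscaler scale a GPU group up from zero."""
    prefix = "k8s.io/cluster-autoscaler/node-template"
    tags = {
        "k8s.io/cluster-autoscaler/enabled": "true",
        f"k8s.io/cluster-autoscaler/{cluster_name}": "owned",
        # Advertise the GPU resource so GPU-requesting pods trigger this group.
        f"{prefix}/resources/nvidia.com/gpu": str(gpus_per_node),
    }
    tags.update({f"{prefix}/label/{k}": v for k, v in labels.items()})
    # Taint values use the "<value>:<effect>" form, e.g. "true:NoSchedule".
    tags.update({f"{prefix}/taint/{k}": v for k, v in taints.items()})
    return tags


for key, value in gpu_node_template_tags(
        "my-eks-cluster", 1, {"workload": "gpu"},
        {"nvidia.com/gpu": "true:NoSchedule"}).items():
    print(f"{key}={value}")
```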
Use spot and on-demand instances:
Configuration:
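A mixed spot/on-demand ASG typically splits capacity with the `OnDemandBaseCapacity` and `OnDemandPercentageAboveBaseCapacity` settings of its instances distribution: the first instances up to the base are on-demand, and the percentage applies only to capacity above the base. The arithmetic can be sketched as below; rounding the on-demand share up is an assumption of this sketch, not a statement of AWS's exact rounding rule.

```python
def capacity_split(desired: int, on_demand_base: int,
                   on_demand_pct_above_base: int) -> tuple[int, int]:
    """Split desired capacity into (on_demand, spot) instance counts."""
    base = min(desired, on_demand_base)
    above = desired - base
    # Integer ceiling of above * pct / 100 (rounding up is an assumption).
    on_demand_above = (above * on_demand_pct_above_base + 99) // 100
    on_demand = base + on_demand_above
    return on_demand, desired - on_demand


# 10 nodes, 2 always on-demand, 25% of the remaining 8 on-demand.
print(capacity_split(10, 2, 25))  # (4, 6)
```

As Cluster Autoscaler changes the ASG's desired capacity, the split is recomputed, so scale-up mostly adds spot nodes once the base is filled.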
Configure Cluster Autoscaler for spot:
Add AWS Node Termination Handler:
Scale different workloads independently:
Control scale-up behavior:
Prevent runaway scaling:
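Scale-up is bounded both by each node group's maximum size and by the cluster-wide `--max-nodes-total` flag, where 0 (the default) means no limit; `--cores-total` and `--memory-total` add similar resource caps. The node-count part of that bound can be sketched as follows, with `capped_scale_up` as a hypothetical helper.

```python
def capped_scale_up(requested: int, group_size: int, group_max: int,
                    cluster_size: int, max_nodes_total: int) -> int:
    """Nodes actually added after applying the group max and --max-nodes-total."""
    room = max(0, group_max - group_size)
    if max_nodes_total > 0:  # 0 disables the cluster-wide limit
        room = min(room, max(0, max_nodes_total - cluster_size))
    return max(0, min(requested, room))


# 8 nodes wanted; the group has room for 7, the cluster only for 5.
print(capped_scale_up(8, group_size=3, group_max=10, cluster_size=45, max_nodes_total=50))  # 5
```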
View Auto Scaling Group metrics in CloudWatch:
- GroupDesiredCapacity
- GroupInServiceInstances
- GroupPendingInstances
- GroupTerminatingInstances

Import Cluster Autoscaler dashboard:
Dashboard ID: 3831
Check pending pods:
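Pods that are waiting on the autoscaler are `Pending` with a `PodScheduled` condition whose status is `False` and reason is `Unschedulable`. The helper below filters the JSON from `kubectl get pods -A -o json` (parsed with `json.loads`) down to those pods and the scheduler's message; the function name is hypothetical.

```python
def unschedulable_pods(pod_list: dict) -> list[tuple[str, str, str]]:
    """Return (namespace, name, message) for Pending pods the scheduler could not place."""
    result = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        if status.get("phase") != "Pending":
            continue
        for cond in status.get("conditions", []):
            if (cond.get("type") == "PodScheduled"
                    and cond.get("status") == "False"
                    and cond.get("reason") == "Unschedulable"):
                meta = pod.get("metadata", {})
                result.append((meta.get("namespace", ""), meta.get("name", ""),
                               cond.get("message", "")))
    return result
```

Pending pods that do not appear in this list (for example, ones still pulling images) are not waiting on the autoscaler.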
Check Cluster Autoscaler logs:
Common issues:
max-nodes-total)
Check node utilization:
Check for blocking conditions:
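Several pod properties can keep a node from being removed: a `cluster-autoscaler.kubernetes.io/safe-to-evict: "false"` annotation, no owning controller, running in `kube-system` (when skip-nodes-with-system-pods is enabled), or using local storage such as `emptyDir`/`hostPath` (when skip-nodes-with-local-storage is enabled). The sketch checks one pod's JSON for these. It is simplified: the real autoscaler also considers PodDisruptionBudgets and scheduling constraints, and flag defaults can differ between versions.

```python
SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"


def scale_down_blockers(pod: dict, skip_nodes_with_system_pods: bool = True,
                        skip_nodes_with_local_storage: bool = True) -> list[str]:
    """List reasons this pod (kubectl JSON) may stop its node from being removed."""
    meta = pod.get("metadata", {})
    owners = meta.get("ownerReferences", [])
    annotation = meta.get("annotations", {}).get(SAFE_TO_EVICT)
    if annotation == "true" or any(o.get("kind") == "DaemonSet" for o in owners):
        return []  # explicitly evictable, or a DaemonSet pod (not migrated)
    reasons = []
    if annotation == "false":
        reasons.append("annotated safe-to-evict: false")
    if not owners:
        reasons.append("not managed by a controller")
    if skip_nodes_with_system_pods and meta.get("namespace") == "kube-system":
        reasons.append("kube-system pod (skip-nodes-with-system-pods)")
    volumes = pod.get("spec", {}).get("volumes", [])
    if skip_nodes_with_local_storage and any("emptyDir" in v or "hostPath" in v for v in volumes):
        reasons.append("local storage (skip-nodes-with-local-storage)")
    return reasons
```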
Common causes:
skip-nodes-with-system-pods: false)
Check service account:
Verify IAM role:
Update the IAM policy if needed (see IAM & IRSA).
Always tag Auto Scaling Groups:
Configure appropriate min/max for each node group:
Protect critical workloads during scale-down:
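Two standard mechanisms help here: a PodDisruptionBudget, which limits how many matching pods can be evicted at once while a node is drained, and the `cluster-autoscaler.kubernetes.io/safe-to-evict: "false"` pod annotation, which stops Cluster Autoscaler from removing the node that hosts the pod. The helpers below build both as plain dicts ready for `json.dumps`; the names `api-pdb` and `api` are placeholders.

```python
def pdb_manifest(name: str, app_label: str, min_available: int) -> dict:
    """PodDisruptionBudget keeping at least `min_available` matching pods up during drains."""
    return {
        "apiVersion": "policy/v1",
        "kind": "PodDisruptionBudget",
        "metadata": {"name": name},
        "spec": {
            "minAvailable": min_available,
            "selector": {"matchLabels": {"app": app_label}},
        },
    }


def never_evict(template_metadata: dict) -> dict:
    """Copy pod template metadata with the annotation that blocks autoscaler eviction."""
    annotations = dict(template_metadata.get("annotations", {}))
    annotations["cluster-autoscaler.kubernetes.io/safe-to-evict"] = "false"
    return {**template_metadata, "annotations": annotations}


print(pdb_manifest("api-pdb", "api", min_available=1))
print(never_evict({"labels": {"app": "api"}}))
```

The PDB still allows scale-down as long as enough replicas stay running; the annotation pins the node entirely, so reserve it for pods that cannot tolerate any restart.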
Track scaling decisions in Grafana
Set alerts for scale failures
Periodically test scale-up and scale-down:
Watch for proper node addition/removal