Cluster Autoscaler
Overview
The Cluster Autoscaler automatically adjusts the number of nodes in your EKS cluster based on pending pods and resource utilization. When combined with the Horizontal Pod Autoscaler (HPA), it provides end-to-end autoscaling, from application load to infrastructure capacity.
How It Works
Flow:
- HPA scales pods based on metrics
- New pods enter “Pending” state (insufficient resources)
- Cluster Autoscaler detects pending pods
- Adds nodes to Auto Scaling Group
- Pods scheduled on new nodes
- After scale-down period, removes underutilized nodes
Prerequisites
Installation
Using Helm Chart
The Smallest Self-Host chart includes Cluster Autoscaler as a dependency:
Deploy:
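A sketch of the deploy step; the release name, chart path, and value keys below are placeholders for your actual chart layout:

```shell
# Placeholder release/chart names and value keys -- adjust to your repository.
helm upgrade --install smallest-self-host ./charts/smallest-self-host \
  --namespace smallest --create-namespace \
  --set cluster-autoscaler.enabled=true \
  --set cluster-autoscaler.autoDiscovery.clusterName=my-eks-cluster
```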
Standalone Installation
Install Cluster Autoscaler separately:
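Using the community Helm chart from the kubernetes/autoscaler repository (cluster name and region are examples):

```shell
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm repo update
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set autoDiscovery.clusterName=my-eks-cluster \
  --set awsRegion=us-east-1
```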
Configuration
Auto-Discovery
Auto-discover Auto Scaling Groups by cluster name:
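A sketch of the relevant container arguments (the cluster name my-eks-cluster is an example); the tags referenced here must be present on the Auto Scaling Groups:

```yaml
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/my-eks-cluster
```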
Manual Configuration
Explicitly specify Auto Scaling Groups:
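Alternatively, list each group with a --nodes flag in min:max:ASG-name form (group names are examples):

```yaml
command:
  - ./cluster-autoscaler
  - --cloud-provider=aws
  - --nodes=1:10:my-cpu-node-group
  - --nodes=0:4:my-gpu-node-group
```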
Scale-Down Configuration
Control when and how nodes are removed:
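A sketch of the relevant flags, shown with their upstream default values:

```yaml
  - --scale-down-delay-after-add=10m
  - --scale-down-unneeded-time=10m
  - --scale-down-utilization-threshold=0.5
  - --max-graceful-termination-sec=600
```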
Parameters:
- scale-down-delay-after-add: Wait time after adding a node before considering scale-down
- scale-down-unneeded-time: How long a node must be underutilized before removal
- scale-down-utilization-threshold: CPU/memory threshold (0.5 = 50%)
- max-graceful-termination-sec: Max time for pod eviction
Node Group Priorities
Scale specific node groups first:
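With the priority expander enabled (--expander=priority), priorities live in a ConfigMap named cluster-autoscaler-priority-expander; the regex patterns below are examples:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: cluster-autoscaler-priority-expander
  namespace: kube-system
data:
  priorities: |-
    50:
      - .*spot.*
    10:
      - .*on-demand.*
```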
Priorities:
- Higher number = higher priority
- Regex patterns match node group names
- Useful for preferring spot instances
Verify Installation
Check Cluster Autoscaler Pod
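Assuming the default kube-system deployment:

```shell
kubectl get pods -n kube-system | grep cluster-autoscaler
```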
Check Logs
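A sketch, assuming the Deployment is named cluster-autoscaler:

```shell
kubectl logs -n kube-system deployment/cluster-autoscaler --tail=100
```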
Look for:
Verify IAM Permissions
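One way to surface IAM failures is to grep the autoscaler logs (deployment name assumed):

```shell
kubectl logs -n kube-system deployment/cluster-autoscaler | grep -iE "accessdenied|unauthorized"
```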
Should show no permission errors.
Testing Cluster Autoscaler
Trigger Scale-Up
Create pods that exceed cluster capacity:
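An illustrative Deployment whose aggregate requests exceed free capacity (tune replicas and requests to your cluster size):

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: scale-test
spec:
  replicas: 10
  selector:
    matchLabels:
      app: scale-test
  template:
    metadata:
      labels:
        app: scale-test
    spec:
      containers:
        - name: pause
          image: registry.k8s.io/pause:3.9
          resources:
            requests:
              cpu: "1"        # large requests force pods into Pending
              memory: 1Gi
```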
Watch nodes:
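For example:

```shell
kubectl get nodes -w
```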
Watch Cluster Autoscaler:
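A sketch, assuming the Deployment is named cluster-autoscaler:

```shell
kubectl logs -n kube-system deployment/cluster-autoscaler -f
```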
Expected behavior:
- Pods enter “Pending” state
- Cluster Autoscaler detects pending pods
- Logs show: “Scale-up: setting group size to X”
- New nodes appear in kubectl get nodes
- Pods transition to “Running”
Trigger Scale-Down
Delete test pods:
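Assuming the test Deployment was named scale-test:

```shell
kubectl delete deployment scale-test
```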
After scale-down-unneeded-time (default 10 minutes):
- Cluster Autoscaler marks underutilized nodes
- Drains pods gracefully
- Terminates EC2 instances
- Node count decreases
GPU Node Scaling
Configure GPU Node Groups
Tag GPU node groups for autoscaling:
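A sketch using the standard autoscaler tag keys (ASG and cluster names are placeholders). The node-template GPU tag lets the autoscaler know the group provides GPUs even when it is scaled to zero:

```shell
aws autoscaling create-or-update-tags --tags \
  "ResourceId=my-gpu-node-group,ResourceType=auto-scaling-group,PropagateAtLaunch=true,Key=k8s.io/cluster-autoscaler/enabled,Value=true" \
  "ResourceId=my-gpu-node-group,ResourceType=auto-scaling-group,PropagateAtLaunch=true,Key=k8s.io/cluster-autoscaler/my-eks-cluster,Value=owned" \
  "ResourceId=my-gpu-node-group,ResourceType=auto-scaling-group,PropagateAtLaunch=true,Key=k8s.io/cluster-autoscaler/node-template/resources/nvidia.com/gpu,Value=1"
```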
Prevent Cluster Autoscaler on GPU Nodes
Run the Cluster Autoscaler pod on CPU nodes so it does not occupy GPU capacity:
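One approach is a nodeSelector on the autoscaler's pod spec; the node-type label is an assumed example:

```yaml
spec:
  nodeSelector:
    node-type: cpu   # assumed label; adjust to your CPU node group's labels
```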
Scale to Zero
Allow GPU nodes to scale to zero during off-hours:
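Set the group's minimum size to zero (group name is a placeholder):

```shell
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-gpu-node-group \
  --min-size 0 --desired-capacity 0 --max-size 4
```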
Cluster Autoscaler will:
- Add GPU nodes when Lightning ASR pods are pending
- Remove GPU nodes when all GPU workloads complete
First startup after scale-to-zero takes longer (node provisioning + model download).
Spot Instance Integration
Mixed Instance Groups
Use spot and on-demand instances:
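A sketch of a MixedInstancesPolicy matching the distribution described here (launch template and group names are placeholders):

```shell
aws autoscaling update-auto-scaling-group \
  --auto-scaling-group-name my-cpu-node-group \
  --mixed-instances-policy '{
    "LaunchTemplate": {
      "LaunchTemplateSpecification": {"LaunchTemplateName": "my-template", "Version": "$Latest"},
      "Overrides": [
        {"InstanceType": "m5.xlarge"},
        {"InstanceType": "m5a.xlarge"},
        {"InstanceType": "m4.xlarge"}
      ]
    },
    "InstancesDistribution": {
      "OnDemandBaseCapacity": 1,
      "OnDemandPercentageAboveBaseCapacity": 20,
      "SpotAllocationStrategy": "capacity-optimized"
    }
  }'
```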
Configuration:
- Base capacity: 1 on-demand node always
- Additional capacity: 20% on-demand, 80% spot
- Multiple instance types increase spot availability
Handle Spot Interruptions
Configure Cluster Autoscaler for spot:
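One common choice is the least-waste expander, which favors the node group that leaves the least idle CPU/memory after scheduling; whether this is the right expander depends on your setup:

```yaml
  - --expander=least-waste
```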
Add AWS Node Termination Handler:
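From the official eks-charts repository:

```shell
helm repo add eks https://aws.github.io/eks-charts
helm install aws-node-termination-handler eks/aws-node-termination-handler \
  --namespace kube-system
```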
Advanced Configuration
Multiple Node Groups
Scale different workloads independently:
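For example, one --nodes flag per group (names and bounds are illustrative):

```yaml
  - --nodes=2:10:cpu-node-group
  - --nodes=0:4:gpu-node-group
  - --nodes=1:6:memory-optimized-node-group
```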
Scale-Up Policies
Control scale-up behavior:
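A sketch of the relevant flags, shown with their upstream default values:

```yaml
  - --scan-interval=10s              # how often the cluster is re-evaluated
  - --max-node-provision-time=15m    # give up on nodes that never become ready
  - --new-pod-scale-up-delay=0s      # delay before reacting to newly created pods
```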
Resource Limits
Prevent runaway scaling:
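Cluster-wide caps can be set with flags (the values below are illustrative):

```yaml
  - --max-nodes-total=20
  - --cores-total=8:128     # min:max CPU cores cluster-wide
  - --memory-total=32:512   # min:max memory cluster-wide, in GiB
```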
Monitoring
CloudWatch Metrics
View Auto Scaling Group metrics in CloudWatch:
- GroupDesiredCapacity
- GroupInServiceInstances
- GroupPendingInstances
- GroupTerminatingInstances
Kubernetes Events
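Scaling decisions are also emitted as Kubernetes events (reasons such as TriggeredScaleUp); for example:

```shell
kubectl get events --all-namespaces --sort-by=.lastTimestamp | grep -i scale
```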
Cluster Autoscaler Status
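The autoscaler writes its status to a ConfigMap in kube-system:

```shell
kubectl get configmap cluster-autoscaler-status -n kube-system -o yaml
```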
Grafana Dashboard
Import Cluster Autoscaler dashboard:
Dashboard ID: 3831
Troubleshooting
Nodes Not Scaling Up
Check pending pods:
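For example:

```shell
kubectl get pods --all-namespaces --field-selector status.phase=Pending
```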
Check Cluster Autoscaler logs:
Common issues:
- Max nodes reached (max-nodes-total)
- IAM permission denied
- Auto Scaling Group at max capacity
- Node group not tagged properly
Nodes Not Scaling Down
Check node utilization:
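Requires metrics-server:

```shell
kubectl top nodes
```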
Check for blocking conditions:
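One check is the scale-down-disabled annotation on nodes (the autoscaler logs also state why each node cannot be removed):

```shell
kubectl get nodes -o custom-columns='NAME:.metadata.name,SCALE_DOWN_DISABLED:.metadata.annotations.cluster-autoscaler\.kubernetes\.io/scale-down-disabled'
```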
Common causes:
- Pods without PodDisruptionBudget
- Pods with local storage
- System pods (unless skip-nodes-with-system-pods: false)
- Node utilization above the threshold
Permission Errors
Check service account:
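Assuming IRSA, the service account should carry a role-arn annotation:

```shell
kubectl describe serviceaccount cluster-autoscaler -n kube-system
# Expect an eks.amazonaws.com/role-arn annotation pointing at the autoscaler role
```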
Verify IAM role:
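A sketch; the role name is a placeholder:

```shell
aws iam list-attached-role-policies --role-name cluster-autoscaler-role
```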
Update the IAM policy if needed (see IAM & IRSA).
Best Practices
Tag Node Groups Properly
Always tag Auto Scaling Groups:
Set Realistic Limits
Configure appropriate min/max for each node group:
Use PodDisruptionBudgets
Protect critical workloads during scale-down:
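An illustrative PDB for the Lightning ASR workload (the label is an assumed example):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: lightning-asr-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: lightning-asr   # assumed label; match your Deployment's labels
```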
Monitor Scaling Events
- Track scaling decisions in Grafana
- Set alerts for scale failures
Test Regularly
Periodically test scale-up and scale-down:
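A sketch, assuming a throwaway Deployment named scale-test:

```shell
kubectl scale deployment scale-test --replicas=10   # force pending pods
kubectl get nodes -w                                # watch nodes join
kubectl scale deployment scale-test --replicas=0    # then watch scale-down
```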
Watch for proper node addition/removal

