GPU Nodes Configuration
Overview
This guide covers advanced configuration and optimization for GPU nodes in AWS EKS, including node taints, tolerations, labels, and performance tuning.
GPU Node Configuration
Node Labels
Labels help Kubernetes schedule pods on the correct nodes.
Automatic Labels
EKS automatically adds these labels to GPU nodes:
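With managed node groups (and GPU Feature Discovery, if the GPU Operator is installed), GPU nodes typically carry labels like the following; exact values depend on your instance type and node group:

```
node.kubernetes.io/instance-type=g4dn.xlarge
topology.kubernetes.io/zone=us-west-2a
eks.amazonaws.com/nodegroup=gpu-nodes
eks.amazonaws.com/capacityType=ON_DEMAND
nvidia.com/gpu.present=true
```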
Custom Labels
Add custom labels when creating node groups:
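For example, with eksctl (cluster name, node group name, and label values here are placeholders):

```bash
eksctl create nodegroup \
  --cluster my-cluster \
  --name gpu-nodes \
  --node-type g4dn.xlarge \
  --nodes 2 \
  --node-labels "workload-type=gpu,team=ml"
```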
Or add labels to existing nodes:
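Replace the node name and label with your own:

```bash
kubectl label nodes <node-name> workload-type=gpu

# Verify the label was applied
kubectl get nodes --show-labels | grep workload-type
```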
Node Taints
Taints prevent non-GPU workloads from running on expensive GPU nodes.
Add Taints During Node Group Creation
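With an eksctl config file, taints are declared on the managed node group; the key and value below are illustrative:

```yaml
managedNodeGroups:
  - name: gpu-nodes
    instanceType: g4dn.xlarge
    taints:
      - key: nvidia.com/gpu
        value: "true"
        effect: NoSchedule
```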
Add Taints to Existing Nodes
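The same taint can be applied with kubectl:

```bash
kubectl taint nodes <node-name> nvidia.com/gpu=true:NoSchedule

# Remove the taint later by appending a trailing dash
kubectl taint nodes <node-name> nvidia.com/gpu=true:NoSchedule-
```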
Tolerations in Pod Specs
Pods must have matching tolerations to run on tainted nodes:
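A toleration matching a `nvidia.com/gpu=true:NoSchedule` taint (use whatever key and value you actually applied) looks like this in the pod spec:

```yaml
spec:
  tolerations:
    - key: nvidia.com/gpu
      operator: Equal
      value: "true"
      effect: NoSchedule
```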
Node Selectors
Using Instance Type
Most common approach for AWS:
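The instance type label is set automatically on every EKS node, so no extra labeling is needed:

```yaml
spec:
  nodeSelector:
    node.kubernetes.io/instance-type: g4dn.xlarge
```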
Using Custom Labels
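Assuming a custom label such as `workload-type=gpu` (the name is illustrative):

```yaml
spec:
  nodeSelector:
    workload-type: gpu
```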
Multiple Selectors
Combine multiple selectors for precise placement:
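All selectors must match for a node to be eligible; label values here are examples:

```yaml
spec:
  nodeSelector:
    node.kubernetes.io/instance-type: g4dn.xlarge
    workload-type: gpu
    topology.kubernetes.io/zone: us-west-2a
```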
NVIDIA Device Plugin
The NVIDIA device plugin makes GPUs available to Kubernetes pods.
Installation via GPU Operator
The recommended approach is using the NVIDIA GPU Operator (included in the Smallest Helm chart):
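If you install the operator standalone rather than through the bundled chart, the upstream installation looks like:

```bash
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
helm install gpu-operator nvidia/gpu-operator \
  --namespace gpu-operator --create-namespace --wait
```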
Manual Installation
Alternatively, install the device plugin directly:
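The standalone device plugin is published as a Helm chart:

```bash
helm repo add nvdp https://nvidia.github.io/k8s-device-plugin
helm repo update
helm upgrade -i nvdp nvdp/nvidia-device-plugin \
  --namespace nvidia-device-plugin --create-namespace
```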
Verify Device Plugin
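The plugin runs as a DaemonSet with one pod per GPU node; the namespace depends on how it was installed:

```bash
kubectl get daemonsets -A | grep -i nvidia
kubectl get pods -A -o wide | grep nvidia-device-plugin
```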
Check GPU Availability
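Each GPU node should report `nvidia.com/gpu` in its allocatable resources:

```bash
kubectl get nodes \
  -o custom-columns='NAME:.metadata.name,GPU:.status.allocatable.nvidia\.com/gpu'
```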
GPU Resource Limits
Request GPU in Pod Spec
The Lightning ASR deployment automatically requests a GPU:
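The container name below is illustrative; the key point is the `nvidia.com/gpu` resource limit (for extended resources like GPUs, the limit also serves as the request):

```yaml
containers:
  - name: lightning-asr
    resources:
      limits:
        nvidia.com/gpu: 1
```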
Multiple GPUs
For pods that need multiple GPUs:
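The request is an integer count of whole GPUs; fractional values are not allowed:

```yaml
resources:
  limits:
    nvidia.com/gpu: 4
```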
Smallest Self-Host Lightning ASR is optimized for single GPU per pod. Use multiple pods for scaling rather than multiple GPUs per pod.
GPU Performance Optimization
Enable GPU Persistence Mode
GPU persistence mode keeps the NVIDIA driver loaded, reducing initialization time:
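Persistence mode is set with `nvidia-smi` on the node itself, and it does not survive a reboot, which is why it is usually applied by a DaemonSet or boot script:

```bash
nvidia-smi -pm 1
```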
Use DaemonSet for GPU Configuration
Create a DaemonSet to configure GPU settings on all GPU nodes:
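A minimal sketch, assuming the `workload-type=gpu` label and `nvidia.com/gpu` taint used elsewhere in this guide (the image tag is one public CUDA base image; any image with `nvidia-smi` works):

```yaml
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: gpu-config
  namespace: kube-system
spec:
  selector:
    matchLabels:
      app: gpu-config
  template:
    metadata:
      labels:
        app: gpu-config
    spec:
      nodeSelector:
        workload-type: gpu          # assumed custom GPU-node label
      tolerations:
        - key: nvidia.com/gpu
          operator: Exists
          effect: NoSchedule
      containers:
        - name: gpu-config
          image: nvidia/cuda:12.2.0-base-ubuntu22.04
          securityContext:
            privileged: true        # required to change driver settings
          command: ["/bin/sh", "-c"]
          args:
            - nvidia-smi -pm 1 && sleep infinity
```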
Apply:
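Assuming the manifest is saved as `gpu-config-daemonset.yaml` (the filename is arbitrary):

```bash
kubectl apply -f gpu-config-daemonset.yaml
```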
Monitor GPU Utilization
Deploy NVIDIA DCGM exporter for Prometheus metrics:
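The exporter is available as a Helm chart from NVIDIA; the target namespace is your choice:

```bash
helm repo add gpu-helm-charts https://nvidia.github.io/dcgm-exporter/helm-charts
helm repo update
helm install dcgm-exporter gpu-helm-charts/dcgm-exporter \
  --namespace monitoring --create-namespace
```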
Multi-GPU Strategies
Strategy 1: One Pod per GPU (Recommended)
Scale horizontally with one pod per GPU:
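With one `nvidia.com/gpu` per pod, each replica is scheduled onto its own GPU; the deployment name here is a placeholder:

```bash
kubectl scale deployment lightning-asr --replicas=4
```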
Strategy 2: GPU Sharing (Time-Slicing)
Allow multiple pods to share a single GPU (reduces isolation):
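With the GPU Operator, time-slicing is configured through a ConfigMap referenced by the ClusterPolicy. This sketch advertises each physical GPU as four schedulable `nvidia.com/gpu` resources:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: time-slicing-config
  namespace: gpu-operator
data:
  any: |-
    version: v1
    sharing:
      timeSlicing:
        resources:
          - name: nvidia.com/gpu
            replicas: 4
```

Then point the operator at the config:

```bash
kubectl patch clusterpolicy/cluster-policy -n gpu-operator --type merge \
  -p '{"spec": {"devicePlugin": {"config": {"name": "time-slicing-config", "default": "any"}}}}'
```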
GPU sharing reduces isolation and can impact performance. Use only if cost is more critical than performance.
Strategy 3: Multi-Instance GPU (MIG)
For A100 and A30 GPUs, use MIG to partition GPUs:
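MIG is enabled per GPU with `nvidia-smi` on the node; profile IDs vary by GPU model, so check `nvidia-smi mig -lgip` for the ones your card supports:

```bash
# Enable MIG mode on GPU 0 (may require a GPU reset or node reboot)
nvidia-smi -i 0 -mig 1

# Create GPU instances with compute instances (9 = 3g.20gb on an A100-40GB)
nvidia-smi mig -cgi 9,9 -C
```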
Configure pods to use MIG instances:
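With the device plugin's mixed MIG strategy, each MIG profile is exposed as its own resource name:

```yaml
resources:
  limits:
    nvidia.com/mig-3g.20gb: 1
```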
Node Auto-Scaling
Configure Auto-Scaling Groups
When creating node groups, enable auto-scaling:
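For example, with eksctl (names are placeholders; `--asg-access` grants the IAM permissions the Cluster Autoscaler needs):

```bash
eksctl create nodegroup \
  --cluster my-cluster \
  --name gpu-nodes \
  --node-type g4dn.xlarge \
  --nodes 1 --nodes-min 0 --nodes-max 4 \
  --asg-access
```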
Install Cluster Autoscaler
See Cluster Autoscaler for full setup.
Quick enable:
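Using the upstream Helm chart (substitute your cluster name and region):

```bash
helm repo add autoscaler https://kubernetes.github.io/autoscaler
helm install cluster-autoscaler autoscaler/cluster-autoscaler \
  --namespace kube-system \
  --set autoDiscovery.clusterName=my-cluster \
  --set awsRegion=us-west-2
```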
Run Cluster Autoscaler on CPU nodes, not GPU nodes, to avoid wasting GPU resources.
Cost Optimization
Use Spot Instances
Spot Instances can cut GPU node costs substantially versus On-Demand (AWS quotes savings of up to 90%, though savings on GPU instance types are often lower):
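With eksctl, `--spot` creates a Spot managed node group; listing several instance types improves the chance of getting capacity:

```bash
eksctl create nodegroup \
  --cluster my-cluster \
  --name gpu-spot \
  --spot \
  --instance-types g4dn.xlarge,g5.xlarge \
  --nodes-min 0 --nodes-max 4
```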
Handle Spot Interruptions
Add pod disruption budget:
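A sketch assuming the deployment's pods carry an `app: lightning-asr` label (adjust the selector to your labels):

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: lightning-asr-pdb
spec:
  minAvailable: 1
  selector:
    matchLabels:
      app: lightning-asr
```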
Configure graceful shutdown:
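A generous termination grace period plus a short `preStop` delay gives in-flight requests time to drain before the container receives SIGTERM; the values below are illustrative:

```yaml
spec:
  terminationGracePeriodSeconds: 60
  containers:
    - name: lightning-asr
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "sleep 10"]
```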
Mixed On-Demand and Spot
Combine both for reliability:
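In an eksctl config file, this is two node groups: a small On-Demand baseline and a larger Spot pool:

```yaml
managedNodeGroups:
  - name: gpu-ondemand
    instanceType: g4dn.xlarge
    minSize: 1
    maxSize: 2
  - name: gpu-spot
    instanceTypes: ["g4dn.xlarge", "g5.xlarge"]
    spot: true
    minSize: 0
    maxSize: 4
```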
Use node affinity to prefer Spot capacity:
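EKS labels every node with `eks.amazonaws.com/capacityType` (`SPOT` or `ON_DEMAND`), so a preferred node affinity steers pods to Spot nodes when they exist while still allowing On-Demand fallback:

```yaml
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 100
          preference:
            matchExpressions:
              - key: eks.amazonaws.com/capacityType
                operator: In
                values: ["SPOT"]
```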
Monitoring GPU Nodes
View GPU Node Status
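Filter by any label your GPU nodes carry; the instance type label works out of the box:

```bash
kubectl get nodes -l node.kubernetes.io/instance-type=g4dn.xlarge -o wide
```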
Check GPU Allocation
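The node's `Allocated resources` section shows how many GPUs are in use (the label selector assumes the custom `workload-type=gpu` label):

```bash
kubectl describe nodes -l workload-type=gpu | grep -A 8 "Allocated resources"
```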
GPU Utilization
Using NVIDIA SMI:
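Run `nvidia-smi` inside any pod that has a GPU allocated:

```bash
kubectl exec -it <pod-name> -- nvidia-smi
```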
Troubleshooting
GPU Not Detected
Check NVIDIA device plugin:
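Confirm the plugin pod is running on the affected node and inspect its logs (the namespace depends on your install method):

```bash
kubectl get pods -A -o wide | grep nvidia-device-plugin
kubectl logs -n <plugin-namespace> <device-plugin-pod>
```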
Verify driver on node:
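From an SSH or SSM session on the node itself:

```bash
nvidia-smi
```

If the command is missing, the node is likely running a non-GPU AMI; managed GPU node groups should use the EKS GPU-optimized AMI, which ships the NVIDIA driver.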
Pods Not Scheduling on GPU Nodes
Check tolerations:
Check node selector:
Check node taints:
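The three checks above can be run as follows (pod and node names are placeholders):

```bash
kubectl get pod <pod-name> -o yaml | grep -A 6 tolerations
kubectl get pod <pod-name> -o yaml | grep -A 3 nodeSelector
kubectl describe node <node-name> | grep -i taints
```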
GPU Out of Memory
Check pod resource limits:
Monitor GPU memory:
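Both checks, with the pod name as a placeholder:

```bash
kubectl describe pod <pod-name> | grep -A 3 Limits
kubectl exec -it <pod-name> -- \
  nvidia-smi --query-gpu=memory.used,memory.total --format=csv
```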
Best Practices
Isolate GPU Workloads
Always use taints and tolerations to prevent non-GPU workloads from running on GPU nodes:
Set Resource Limits
Always specify GPU resource requests and limits:
Use Node Auto-Scaling
Configure auto-scaling to scale GPU nodes to zero during off-hours:
Monitor GPU Utilization
Use DCGM exporter and Grafana to monitor GPU metrics:
- GPU utilization
- Memory usage
- Temperature
- Power consumption
Test Spot Interruptions
Regularly test your application’s response to spot interruptions:
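A simple drill is to drain a Spot node and confirm traffic fails over cleanly; a more realistic test interrupts the instance itself, for example with AWS Fault Injection Simulator:

```bash
kubectl drain <spot-node-name> --ignore-daemonsets --delete-emptydir-data
```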

