Model Storage
Overview
AI models for Lightning ASR are large files (20-30 GB) that significantly impact startup time. This guide covers strategies for efficient model storage and caching to minimize download time and enable fast scaling.
Storage Strategies
Strategy 1: Shared EFS Volume (Recommended)
Best for production with autoscaling.
Advantages:
- Models downloaded once, shared across all pods
- New pods start in 30-60 seconds
- No storage duplication
- Enables horizontal scaling
Implementation:
See EFS Configuration for setup.
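As a rough sketch, the Helm values for a shared EFS-backed model cache might look like the fragment below. The value keys, storage class name, and size are illustrative assumptions, not the chart's actual schema:

```yaml
# Hypothetical values.yaml fragment: mount one EFS-backed PVC into every pod.
persistence:
  enabled: true
  storageClass: efs-sc        # EFS CSI storage class (assumed name)
  accessModes:
    - ReadWriteMany           # lets all replicas share one model cache
  size: 50Gi                  # headroom above the 20-30 GB model
  mountPath: /models
```

Because the volume is ReadWriteMany, only the first pod pays the download cost; every later replica finds the cache already warm.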
Strategy 2: Container Image with Baked Model
Best for fixed deployments with infrequent updates.
Advantages:
- Fastest startup (model pre-loaded)
- No external download required
- Works offline
Disadvantages:
- Very large container images (20+ GB)
- Slow image pulls
- Updates require new image build
Implementation:
Build custom image:
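A Dockerfile for this approach could look like the following sketch; the base image name, model path, and `MODEL_PATH` variable are assumptions for illustration:

```dockerfile
# Hypothetical Dockerfile: start from the serving image and copy the model in.
FROM lightning-asr:latest      # assumed base serving image
COPY models/ /models/          # 20-30 GB of weights baked into the image
ENV MODEL_PATH=/models         # assumed env var read by the server
```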
Build and push:
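For example (the registry name and tag are placeholders):

```shell
# Build and push the model-baked image.
docker build -t registry.example.com/lightning-asr-with-model:v1 .
docker push registry.example.com/lightning-asr-with-model:v1
```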
Update values:
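Then point the deployment at the baked image; the `model.download.enabled` flag below is an assumed chart option for skipping the runtime download:

```yaml
# Hypothetical values.yaml fragment: use the baked image, skip the download.
image:
  repository: registry.example.com/lightning-asr-with-model
  tag: v1
model:
  download:
    enabled: false   # model is already inside the image (assumed flag)
```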
Strategy 3: EmptyDir Volume
Best for development/testing only.
Advantages:
- Simple configuration
- No external storage required
Disadvantages:
- Model downloaded on every pod start
- No sharing between pods, so scale-out is slow and bandwidth-heavy
- Cached model lost whenever the pod is deleted or rescheduled
Implementation:
Each pod downloads the model independently.
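A minimal configuration sketch, assuming the chart exposes `extraVolumes`/`extraVolumeMounts` hooks (an assumption, not a documented option):

```yaml
# Hypothetical values.yaml fragment: per-pod scratch volume for the model cache.
persistence:
  enabled: false
extraVolumes:
  - name: model-cache
    emptyDir:
      sizeLimit: 50Gi      # cap the cache so it cannot fill the node disk
extraVolumeMounts:
  - name: model-cache
    mountPath: /models
```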
Strategy 4: Init Container with S3
Best for AWS deployments without EFS.
Advantages:
- Fast downloads from S3 within AWS
- No EFS cost
- Works with ReadWriteOnce volumes
Disadvantages:
- Each pod downloads independently
- Slower scaling than EFS
- Requires S3 bucket
Implementation:
Upload model to S3:
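For example (bucket name and prefix are placeholders):

```shell
# Upload the model directory to S3.
aws s3 sync ./models/ s3://my-model-bucket/lightning-asr/v1/
```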
Create custom deployment with init container:
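A pod template sketch for the init-container approach; the image tag, bucket, volume name, and paths are placeholders:

```yaml
# The init container fills the volume before the serving container starts.
spec:
  initContainers:
    - name: fetch-model
      image: amazon/aws-cli          # placeholder downloader image
      command: ["aws", "s3", "sync", "s3://my-model-bucket/lightning-asr/v1/", "/models/"]
      volumeMounts:
        - name: model-cache
          mountPath: /models
  containers:
    - name: lightning-asr
      volumeMounts:
        - name: model-cache
          mountPath: /models
  volumes:
    - name: model-cache
      emptyDir: {}
```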
Model Download Optimization
Parallel Downloads
For multiple model files, download in parallel:
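One common pattern is to fan the file list out through `xargs -P`. To keep this sketch runnable it copies from a local `file://` "bucket"; in production, `$base` would point at your S3 or HTTP endpoint instead:

```shell
# Parallel-download pattern: xargs -P runs up to four transfers at once.
src=$(mktemp -d); dst=$(mktemp -d)
printf 'encoder.bin\ndecoder.bin\nvocab.txt\n' > "$src/manifest.txt"
for f in encoder.bin decoder.bin vocab.txt; do echo "payload-$f" > "$src/$f"; done
base="file://$src"
xargs -n1 -P4 -I{} curl -fsS -o "$dst/{}" "$base/{}" < "$src/manifest.txt"
```

Tune `-P` to the bandwidth of the node; beyond a few concurrent streams the link, not the process count, is usually the bottleneck.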
Resume on Failure
Enable download resume for interrupted downloads:
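With standard tools this is a single flag; the URL below is a placeholder:

```shell
# wget -c continues from the bytes already on disk; curl's equivalent is -C -.
wget -c -O /models/model.bin "https://models.example.com/lightning-asr/model.bin"
# curl -C - -o /models/model.bin "https://models.example.com/lightning-asr/model.bin"
```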
CDN Acceleration
Use CloudFront for faster downloads:
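One approach is to put a CloudFront distribution in front of the model bucket and point the download URL at the distribution instead of S3. The `downloadUrl` key and the domain below are assumptions for illustration:

```yaml
# Hypothetical values.yaml fragment: fetch via CloudFront rather than S3 directly.
model:
  downloadUrl: https://d1234abcd.cloudfront.net/lightning-asr/v1/model.bin
```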
Model Versioning
Multiple Models
Support multiple model versions:
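A simple convention is one directory per version in the shared cache, so versions can coexist; the value keys are assumptions:

```yaml
# Hypothetical values.yaml fragment: version-scoped cache paths.
model:
  name: lightning-asr
  version: v2
  path: /models/lightning-asr/v2   # v1 stays at /models/lightning-asr/v1
```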
Blue-Green Deployments
Deploy new model version alongside old:
Test v2, then switch traffic:
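A sketch of both steps, assuming Helm releases and standard chart labels (release names, chart path, and the label key are placeholders):

```shell
# Run v2 as a second release next to v1.
helm install lightning-asr-v2 ./lightning-asr --set model.version=v2

# After validating v2, point the production Service at the new release's pods.
kubectl patch service lightning-asr \
  -p '{"spec":{"selector":{"app.kubernetes.io/instance":"lightning-asr-v2"}}}'
```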
Storage Quotas
Limit Model Cache Size
Prevent unbounded growth:
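For a PVC-backed cache the requested size is the hard ceiling; for an emptyDir cache, `sizeLimit` evicts the pod if it is exceeded. A sketch (assumed value keys):

```yaml
# Hypothetical values.yaml fragment: cap the per-pod cache.
extraVolumes:
  - name: model-cache
    emptyDir:
      sizeLimit: 40Gi   # pod is evicted if the cache grows past this
```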
Monitor Storage Usage
Check PVC usage:
Check actual usage in pod:
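For example (namespace and pod name are placeholders):

```shell
# PVC capacity and binding status:
kubectl get pvc -n lightning-asr

# Actual bytes used inside a running pod:
kubectl exec -n lightning-asr lightning-asr-0 -- df -h /models
kubectl exec -n lightning-asr lightning-asr-0 -- du -sh /models
```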
Pre-warming Models
Pre-download Before Scaling
Download models before peak traffic:
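A one-shot Job that fills the shared cache ahead of a scale-up might look like this sketch; the image, bucket, and PVC name are placeholders:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: prewarm-model
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: prewarm
          image: amazon/aws-cli    # placeholder downloader image
          command: ["aws", "s3", "sync", "s3://my-model-bucket/lightning-asr/v1/", "/models/"]
          volumeMounts:
            - name: model-cache
              mountPath: /models
      volumes:
        - name: model-cache
          persistentVolumeClaim:
            claimName: lightning-asr-models   # assumed PVC name
```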
Scheduled Pre-warming
Use CronJob for regular pre-warming:
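A CronJob sketch that refreshes the cache nightly before peak hours (the schedule, image, bucket, and PVC name are placeholders):

```yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: prewarm-model
spec:
  schedule: "0 4 * * *"   # 04:00 daily, example schedule
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
            - name: prewarm
              image: amazon/aws-cli   # placeholder downloader image
              command: ["aws", "s3", "sync", "s3://my-model-bucket/lightning-asr/v1/", "/models/"]
              volumeMounts:
                - name: model-cache
                  mountPath: /models
          volumes:
            - name: model-cache
              persistentVolumeClaim:
                claimName: lightning-asr-models   # assumed PVC name
```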
Model Integrity
Checksum Validation
Verify model integrity after download:
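A minimal SHA-256 check. This sketch generates its own file and checksum so it can run anywhere; in practice the `.sha256` file would be published alongside the model:

```shell
dir=$(mktemp -d)
echo "model-bytes" > "$dir/model.bin"
sha256sum "$dir/model.bin" | awk '{print $1}' > "$dir/model.bin.sha256"

# Compare the published checksum with the freshly computed one.
expected=$(cat "$dir/model.bin.sha256")
actual=$(sha256sum "$dir/model.bin" | awk '{print $1}')
if [ "$expected" = "$actual" ]; then
  echo "checksum OK"
else
  echo "checksum MISMATCH: re-download the model" >&2
  exit 1
fi
```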
Automatic Retry
Retry failed downloads:
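A retry loop with exponential backoff. `flaky_fetch` is a stand-in that fails twice before succeeding so the sketch is self-contained; replace it with your real download command (`wget -c`, `aws s3 cp`, ...):

```shell
attempts_file=$(mktemp)
echo 0 > "$attempts_file"
flaky_fetch() {
  n=$(($(cat "$attempts_file") + 1))
  echo "$n" > "$attempts_file"
  [ "$n" -ge 3 ]   # simulated: fail on attempts 1 and 2, succeed on 3
}

max_retries=5
delay=1
for i in $(seq 1 "$max_retries"); do
  if flaky_fetch; then
    echo "download succeeded on attempt $i"
    break
  fi
  echo "attempt $i failed; retrying in ${delay}s" >&2
  sleep "$delay"
  delay=$((delay * 2))   # exponential backoff: 1s, 2s, 4s, ...
done
```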
Performance Comparison

A qualitative summary of the four strategies:

| Strategy | First start | Scale-out start | Storage duplication |
| --- | --- | --- | --- |
| Shared EFS volume | One-time model download | 30-60 s (cache already warm) | None |
| Baked container image | Fastest once image is pulled | Slow 20+ GB image pulls | One copy per node |
| EmptyDir volume | Full download per pod | Full download per pod | One copy per pod |
| Init container + S3 | Full download per pod (fast within AWS) | Full download per pod | One copy per pod |
Best Practices
Use EFS for Production
Always use shared storage (EFS) for production deployments with autoscaling.
The time saved on downloads and scale-out typically more than offsets the EFS storage cost.
Monitor Download Progress
Watch logs during first deployment:
Look for download progress indicators.
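For example (the namespace and label selector are assumptions about the chart's labels):

```shell
kubectl logs -f -n lightning-asr -l app.kubernetes.io/name=lightning-asr --tail=100
```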
Set Resource Limits
Ensure sufficient storage for models:
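A sizing sketch with headroom above the 20-30 GB model plus scratch space for partial downloads (value keys are assumptions):

```yaml
persistence:
  size: 60Gi
resources:
  limits:
    ephemeral-storage: 10Gi   # scratch space for partial downloads
```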
Test Model Updates
Test new models in separate deployment before updating production:
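For example, using a throwaway Helm release (release names, chart path, and the `model.version` value are placeholders):

```shell
# Stand up the new model in a separate namespace first.
helm install lightning-asr-staging ./lightning-asr \
  --set model.version=v2 --namespace lightning-asr-staging --create-namespace

# Validate, then roll the change into the production release.
helm upgrade lightning-asr ./lightning-asr --set model.version=v2
```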
Troubleshooting
Model Download Stalled
Check pod logs:
Check network connectivity:
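For example (pod name, namespace, and model host are placeholders):

```shell
# Inspect the download logs:
kubectl logs -n lightning-asr lightning-asr-0

# Confirm the pod can reach the model host at all:
kubectl exec -n lightning-asr lightning-asr-0 -- \
  curl -sI --max-time 10 https://models.example.com/
```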
Insufficient Storage
Check available space:
Increase PVC size:
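For example (names are placeholders; growing a PVC works only if the StorageClass allows volume expansion):

```shell
# Free space on the model volume:
kubectl exec -n lightning-asr lightning-asr-0 -- df -h /models

# Grow the PVC:
kubectl patch pvc lightning-asr-models -n lightning-asr \
  -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'
```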
Model Corruption
Delete cached model and restart:
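For example (paths and names are placeholders):

```shell
# Remove the cached model, then restart so pods re-download a clean copy.
kubectl exec -n lightning-asr lightning-asr-0 -- rm -rf /models/lightning-asr
kubectl rollout restart deployment/lightning-asr -n lightning-asr
```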

