Lightning ASR model files are large (20-30 GB), so downloading them can dominate pod startup time. This guide covers storage and caching strategies that minimize download time and enable fast scaling.
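To make the startup cost concrete, here is a rough back-of-the-envelope estimate. The 1 Gbit/s link speed is an illustrative assumption, not a measured figure for any particular deployment, and `download_seconds` is a hypothetical helper name.

```python
def download_seconds(size_gb: float, link_gbit_per_s: float) -> float:
    """Ideal transfer time: gigabytes times 8 bits per byte, divided by link speed."""
    return size_gb * 8 / link_gbit_per_s


if __name__ == "__main__":
    # Assumed link speed of 1 Gbit/s; real throughput is usually lower.
    for size_gb in (20, 30):
        print(f"{size_gb} GB at 1 Gbit/s: about {download_seconds(size_gb, 1.0):.0f} s")
```

Under that assumption, every pod that downloads its own copy adds a few minutes to scale-up, which is the cost the caching strategies in this guide aim to avoid.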
Best for production with autoscaling.
Advantages:
Implementation:
See EFS Configuration for setup.
Best for fixed deployments with infrequent updates.
Advantages:
Disadvantages:
Implementation:
Build custom image:
Build and push:
Update values:
Best for development/testing only.
Advantages:
Disadvantages:
Implementation:
Each pod downloads the model independently.
Best for AWS deployments without EFS.
Advantages:
Disadvantages:
Implementation:
Upload model to S3:
Create custom deployment with init container:
For multiple model files, download in parallel:
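A minimal Python sketch of the idea, assuming the model is split into several files and that a per-file download function already exists. `download_all` and `fetch` are hypothetical names used only for illustration; `fetch` would wrap whatever S3 or HTTP client the init container uses.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable


def download_all(
    names: Iterable[str],
    fetch: Callable[[str], int],
    max_workers: int = 4,
) -> Dict[str, int]:
    """Run fetch(name) for every file concurrently.

    Returns a mapping of file name to the value fetch returned (for example,
    bytes written). An exception from a failed fetch is re-raised here.
    """
    results: Dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, name): name for name in names}
        for future in as_completed(futures):
            # result() re-raises any exception from the worker thread.
            results[futures[future]] = future.result()
    return results
```

Threads are a reasonable fit because downloads are I/O bound. `max_workers` would need tuning against available network bandwidth, since more workers do not help once the link is saturated.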
Enable download resume for interrupted downloads:
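One possible approach, sketched with the Python standard library, is an HTTP range request that asks only for the bytes missing from a partial file. It assumes the model is served over HTTP(S) by a server that supports `Range` (S3 and CloudFront generally do). The function names are hypothetical.

```python
import os
import urllib.request


def resume_request(url: str, dest: str) -> urllib.request.Request:
    """Build a GET request that asks only for the bytes missing from dest."""
    offset = os.path.getsize(dest) if os.path.exists(dest) else 0
    request = urllib.request.Request(url)
    if offset > 0:
        request.add_header("Range", f"bytes={offset}-")
    return request


def download_with_resume(url: str, dest: str, chunk_size: int = 1 << 20) -> int:
    """Download url to dest, appending to a partial file when the server allows it."""
    request = resume_request(url, dest)
    with urllib.request.urlopen(request) as response:
        # 206 means the server honoured the Range header; 200 means it sent
        # the whole file, so any partial data is overwritten instead.
        mode = "ab" if response.status == 206 else "wb"
        with open(dest, mode) as out:
            while chunk := response.read(chunk_size):
                out.write(chunk)
    return os.path.getsize(dest)
```

If the file on disk is already complete, the server typically answers 416, which `urlopen` raises as `urllib.error.HTTPError`. A caller would need to handle that case, for example by verifying the file and skipping the download.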
Use CloudFront for faster downloads:
Support multiple model versions:
Deploy new model version alongside old:
Test v2, then switch traffic:
Prevent unbounded growth:
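A cleanup job could look roughly like the sketch below. It assumes each model version lives in its own subdirectory of the cache root, which is an assumption about the layout rather than something this guide specifies. `prune_cache` and `dir_size` are hypothetical helpers.

```python
import shutil
from pathlib import Path
from typing import List


def dir_size(path: Path) -> int:
    """Total size in bytes of all regular files under path."""
    return sum(f.stat().st_size for f in path.rglob("*") if f.is_file())


def prune_cache(cache_root: Path, max_bytes: int, keep: int = 1) -> List[str]:
    """Delete the least recently modified version directories until the cache fits.

    Always keeps at least `keep` of the newest directories.
    Returns the names of the directories that were removed.
    """
    versions = sorted(
        (p for p in cache_root.iterdir() if p.is_dir()),
        key=lambda p: p.stat().st_mtime,
    )  # oldest first
    total = sum(dir_size(p) for p in versions)
    removed: List[str] = []
    while total > max_bytes and len(versions) > keep:
        oldest = versions.pop(0)
        total -= dir_size(oldest)
        shutil.rmtree(oldest)
        removed.append(oldest.name)
    return removed
```

Keeping at least one version guards against deleting the model that running pods depend on. On shared storage, a job like this should run from a single place so that two pods do not prune concurrently.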
Check PVC usage:
Check actual usage in pod:
Download models before peak traffic:
Use CronJob for regular pre-warming:
Verify model integrity after download:
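A common way to do this is to compare a SHA-256 digest of the downloaded file against a known value. The sketch below assumes the expected digest is published alongside the model, which this guide does not confirm. The function names are hypothetical.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so a 20-30 GB model never has to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model(path: str, expected_sha256: str) -> bool:
    """Return True when the file's digest matches the expected hex string."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

A failed check is a signal to delete the cached file and download it again rather than let a pod start with a corrupt model.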
Retry failed downloads:
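Retrying with exponential backoff is one reasonable policy. The sketch below is generic: `action` stands in for whatever download call is in use, and the attempt count and delays are illustrative defaults, not recommended values.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    action: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call action until it succeeds, doubling the wait after each failure.

    Re-raises the last exception once all attempts are used up.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except Exception:
            if attempt == attempts:
                raise
            # Waits base_delay, then 2x, 4x, ... between attempts.
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

The `sleep` parameter exists so the function can be exercised without real delays. In production use it would stay at its default.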
Always use shared storage (EFS) for production deployments with autoscaling.
The savings from shorter downloads and faster scaling typically outweigh the cost of EFS.
Watch logs during first deployment:
Look for download progress indicators.
Ensure sufficient storage for models:
Test new models in separate deployment before updating production:
Check pod logs:
Check network connectivity:
Check available space:
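Where a Python interpreter is available inside the container, free space can also be checked programmatically before a download starts. This is a sketch under that assumption. The 20% headroom is an arbitrary illustrative margin, and `has_room_for_model` is a hypothetical helper.

```python
import shutil

GIB = 1024 ** 3


def has_room_for_model(cache_dir: str, model_bytes: int, headroom: float = 0.2) -> bool:
    """Return True when the filesystem holding cache_dir has the model size plus headroom free."""
    free = shutil.disk_usage(cache_dir).free
    return free >= model_bytes * (1 + headroom)


if __name__ == "__main__":
    # Example: check the current directory's filesystem for a 30 GiB model.
    print(has_room_for_model(".", 30 * GIB))
```

Running a check like this before the download lets the pod fail fast with a clear message instead of running out of disk partway through a 20-30 GB transfer.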
Increase PVC size:
Delete cached model and restart: