Model Storage


Overview

AI models for Lightning ASR are large files (20-30 GB) that significantly impact startup time. This guide covers strategies for efficient model storage and caching to minimize download time and enable fast scaling.
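To see why files of this size dominate startup time, the sketch below estimates download time from model size and sustained bandwidth. The function name and the bandwidth figures are illustrative assumptions, not measurements from Lightning ASR.

```shell
#!/bin/sh
# Rough estimate of model download time in seconds.
# Usage: estimate_download_seconds <size_gb> <bandwidth_mbit_per_s>
# Assumes 1 GB = 8000 megabits and a sustained rate with no protocol overhead.
estimate_download_seconds() {
  size_gb=$1
  mbps=$2
  echo $(( size_gb * 8000 / mbps ))
}

estimate_download_seconds 25 1000   # 25 GB at 1 Gbit/s   -> prints 200
estimate_download_seconds 25 200    # 25 GB at 200 Mbit/s -> prints 1000
```

At these assumed rates a 25 GB model takes roughly 3 to 17 minutes per download, which is the cost each strategy below tries to avoid or pay only once.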

Storage Strategies

Strategy 1: Shared EFS Volume

Best for production with autoscaling.

Advantages:

  • Models downloaded once, shared across all pods
  • New pods start in 30-60 seconds
  • No storage duplication
  • Enables horizontal scaling

Implementation:

values.yaml
models:
  asrModelUrl: "https://example.com/model.bin"
  volumes:
    aws:
      efs:
        enabled: true
        fileSystemId: "fs-0123456789abcdef"
        namePrefix: "models"

scaling:
  auto:
    enabled: true
    lightningAsr:
      hpa:
        enabled: true
        maxReplicas: 10

See EFS Configuration for setup.

Strategy 2: Container Image with Baked Model

Best for fixed deployments with infrequent updates.

Advantages:

  • Fastest startup (model pre-loaded)
  • No external download required
  • Works offline

Disadvantages:

  • Very large container images (20+ GB)
  • Slow image pulls
  • Updates require new image build

Implementation:

Build custom image:

Dockerfile
FROM quay.io/smallestinc/lightning-asr:latest

RUN wget -O /app/models/model.bin https://example.com/model.bin

ENV MODEL_PATH=/app/models/model.bin

Build and push:

$ docker build -t myregistry/lightning-asr:with-model .
$ docker push myregistry/lightning-asr:with-model

Update values:

values.yaml
lightningAsr:
  image: "myregistry/lightning-asr:with-model"

models:
  asrModelUrl: ""

Strategy 3: EmptyDir Volume

Best for development/testing only.

Advantages:

  • Simple configuration
  • No external storage required

Disadvantages:

  • Model downloaded on every pod start
  • Cannot scale beyond single node
  • Data lost when the pod is deleted or rescheduled

Implementation:

values.yaml
models:
  asrModelUrl: "https://example.com/model.bin"
  volumes:
    aws:
      efs:
        enabled: false

lightningAsr:
  persistence:
    enabled: false

Each pod downloads the model independently.

Strategy 4: Init Container with S3

Best for AWS deployments without EFS.

Advantages:

  • Fast downloads from S3 within AWS
  • No EFS cost
  • Works with ReadWriteOnce volumes

Disadvantages:

  • Each pod downloads independently
  • Slower scaling than EFS
  • Requires S3 bucket

Implementation:

Upload model to S3:

$ aws s3 cp model.bin s3://my-bucket/models/model.bin

Create custom deployment with init container:

initContainers:
  - name: download-model
    image: amazon/aws-cli
    command:
      - sh
      - -c
      - |
        if [ ! -f /models/model.bin ]; then
          aws s3 cp s3://my-bucket/models/model.bin /models/model.bin
        fi
    volumeMounts:
      - name: model-cache
        mountPath: /models
    env:
      - name: AWS_REGION
        value: us-east-1
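The existence check in the init container treats any file at /models/model.bin as complete, so an interrupted copy could leave a truncated file that later starts skip over. One way to guard against that, sketched here, is to download to a temporary name and rename only on success. `fetch_model` and its arguments are hypothetical; the download command is passed in so that `aws s3 cp` can be substituted.

```shell
#!/bin/sh
# Download to a temporary file, then rename into place only if the download succeeded.
# Usage: fetch_model <destination> <command...>
# The command is invoked with the temporary path appended as its last argument.
fetch_model() {
  dest=$1
  shift
  if [ -f "$dest" ]; then
    echo "cached"
    return 0
  fi
  tmp="$dest.partial"
  if "$@" "$tmp"; then
    mv "$tmp" "$dest"    # rename is atomic within a single filesystem
    echo "downloaded"
  else
    rm -f "$tmp"         # discard the incomplete file
    echo "failed"
    return 1
  fi
}

# Example with the S3 copy from the init container (not executed here):
# fetch_model /models/model.bin aws s3 cp s3://my-bucket/models/model.bin
```

Because the final name only ever appears after a successful download, the simple `-f` check stays safe across pod restarts.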

Model Download Optimization

Parallel Downloads

For multiple model files, download in parallel:

lightningAsr:
  env:
    - name: MODEL_DOWNLOAD_WORKERS
      value: "4"

Resume on Failure

Enable download resume for interrupted downloads:

lightningAsr:
  env:
    - name: MODEL_DOWNLOAD_RESUME
      value: "true"

CDN Acceleration

Use CloudFront for faster downloads:

models:
  asrModelUrl: "https://d111111abcdef8.cloudfront.net/model.bin"

Model Versioning

Multiple Models

Support multiple model versions:

values.yaml
models:
  asrModelUrl: "https://example.com/model-v1.bin"

lightningAsr:
  env:
    - name: MODEL_VERSION
      value: "v1"
    - name: MODEL_CACHE_DIR
      value: "/app/models/v1"

Blue-Green Deployments

Deploy new model version alongside old:

$ helm install smallest-v2 smallest-self-host/smallest-self-host \
    -f values.yaml \
    --set models.asrModelUrl="https://example.com/model-v2.bin" \
    --set lightningAsr.namePrefix="lightning-asr-v2" \
    --namespace smallest

Test v2, then switch traffic:

apiServer:
  env:
    - name: LIGHTNING_ASR_BASE_URL
      value: "http://lightning-asr-v2:2269"

Storage Quotas

Limit Model Cache Size

Prevent unbounded growth:

lightningAsr:
  persistence:
    enabled: true
    size: 100Gi

  env:
    - name: MODEL_CACHE_MAX_SIZE
      value: "50GB"
    - name: MODEL_CACHE_EVICTION
      value: "lru"

Monitor Storage Usage

Check PVC usage:

$ kubectl get pvc -n smallest
$ kubectl describe pvc models-aws-efs-pvc -n smallest

Check actual usage in pod:

$ kubectl exec -it <lightning-asr-pod> -n smallest -- df -h /app/models
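To turn the `df` check into something a script can act on, the sketch below extracts the usage percentage and compares it to a threshold. The function names and the 80% threshold are assumptions for illustration; inside a pod the path would be /app/models rather than /.

```shell
#!/bin/sh
# Print the used-space percentage (integer, no % sign) of the filesystem holding a path.
usage_percent() {
  # -P forces the POSIX output format: one line per filesystem, capacity in column 5
  df -P "$1" | awk 'NR==2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when usage is at or above the threshold (default 80).
is_nearly_full() {
  [ "$(usage_percent "$1")" -ge "${2:-80}" ]
}

if is_nearly_full / 80; then
  echo "warning: filesystem is at least 80% full"
fi
```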

Pre-warming Models

Pre-download Before Scaling

Download models before peak traffic:

$ kubectl create job model-preload \
    --image=quay.io/smallestinc/lightning-asr:latest \
    --namespace=smallest \
    -- sh -c "wget -O /app/models/model.bin $MODEL_URL && exit 0"

Scheduled Pre-warming

Use CronJob for regular pre-warming:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: model-preload
  namespace: smallest
spec:
  schedule: "0 8 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: preload
              image: quay.io/smallestinc/lightning-asr:latest
              command:
                - sh
                - -c
                - wget -O /app/models/model.bin $MODEL_URL || true
              env:
                - name: MODEL_URL
                  value: "https://example.com/model.bin"
              volumeMounts:
                - name: models
                  mountPath: /app/models
          volumes:
            - name: models
              persistentVolumeClaim:
                claimName: models-aws-efs-pvc
          restartPolicy: OnFailure

Model Integrity

Checksum Validation

Verify model integrity after download:

lightningAsr:
  env:
    - name: MODEL_CHECKSUM
      value: "sha256:abc123..."
    - name: MODEL_VALIDATE
      value: "true"
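When the model is downloaded outside the container (for example in an init container or a preload job), a similar check can be done by hand. This is a sketch, assuming the `sha256:<hex>` format shown above and that coreutils `sha256sum` is available; `verify_model` is a hypothetical helper, not part of Lightning ASR.

```shell
#!/bin/sh
# Verify a file against a checksum written as "sha256:<hex digest>".
# Usage: verify_model <file> <sha256:hex>
verify_model() {
  file=$1
  expected=${2#sha256:}                              # strip the algorithm prefix
  actual=$(sha256sum "$file" | awk '{ print $1 }')   # first field is the hex digest
  [ "$actual" = "$expected" ]
}

# Example (not executed here; substitute the real digest):
# verify_model /app/models/model.bin "sha256:<digest>" || rm -f /app/models/model.bin
```

Removing the file on mismatch forces a fresh download on the next start instead of loading a corrupt model.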

Automatic Retry

Retry failed downloads:

lightningAsr:
  env:
    - name: MODEL_DOWNLOAD_RETRIES
      value: "3"
    - name: MODEL_DOWNLOAD_TIMEOUT
      value: "3600"
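Where the download runs in a custom script rather than in the container itself, comparable retry behaviour can be sketched in shell. `retry` is a hypothetical helper; the attempt count mirrors MODEL_DOWNLOAD_RETRIES above, and the delay between attempts is an assumption.

```shell
#!/bin/sh
# Run a command up to <attempts> times, sleeping <delay> seconds between tries.
# Usage: retry <attempts> <delay_seconds> <command...>
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while ! "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      echo "giving up after $n attempts" >&2
      return 1
    fi
    n=$(( n + 1 ))
    sleep "$delay"
  done
}

# Example (not executed here): three attempts, ten seconds apart.
# retry 3 10 wget -O /app/models/model.bin "$MODEL_URL"
```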

Performance Comparison

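| Strategy | First Start | Subsequent Starts | Scaling Speed | Cost |
|---|---|---|---|---|
| EFS Shared | 5-10 min | 30-60 sec | Fast | Medium |
| Baked Image | 3-5 min | 3-5 min | Medium | Low |
| EmptyDir | 5-10 min | 5-10 min | Slow | Low |
| S3 Init | 2-5 min | 2-5 min | Medium | Low |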

Best Practices

Always use shared storage (EFS) for production deployments with autoscaling.

The cost savings from reduced download time and faster scaling far outweigh EFS costs.

Watch logs during first deployment:

$ kubectl logs -f -l app=lightning-asr -n smallest

Look for download progress indicators.

Ensure sufficient storage for models:

models:
  volumes:
    aws:
      efs:
        enabled: true

lightningAsr:
  resources:
    ephemeral-storage: "50Gi"

Test new models in separate deployment before updating production:

$ helm install test smallest-self-host/smallest-self-host \
    --set models.asrModelUrl="new-model-url" \
    --namespace smallest-test

Troubleshooting

Model Download Stalled

Check pod logs:

$ kubectl logs -l app=lightning-asr -n smallest --tail=100

Check network connectivity:

$ kubectl exec -it <pod> -n smallest -- wget --spider $MODEL_URL

Insufficient Storage

Check available space:

$ kubectl exec -it <pod> -n smallest -- df -h

Increase PVC size:

models:
  volumes:
    aws:
      efs:
        enabled: true

lightningAsr:
  persistence:
    size: 200Gi

Model Corruption

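Delete the cached model and restart the pod. The `rm` runs through `sh -c` so that the `*` glob is expanded inside the container rather than by the local shell:

$ kubectl exec -it <pod> -n smallest -- sh -c 'rm -rf /app/models/*'
$ kubectl delete pod <pod> -n smallest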
