Model Storage

Overview

AI models for Lightning ASR are large files (20-30 GB) that significantly impact startup time. This guide covers strategies for efficient model storage and caching to minimize download time and enable fast scaling.
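
A rough estimate shows why download time dominates startup. The sketch below assumes a sustained throughput that you supply; the function name and the throughput values are illustrative, not measurements.

```shell
#!/bin/sh
# estimate_download_seconds <size_gb> <throughput_mbit_per_s>
# Rough transfer time for a model of the given size (decimal GB) at a
# sustained throughput in Mbit/s. 1 GB = 8000 Mbit. Integer arithmetic only.
estimate_download_seconds() {
  echo $(( $1 * 8000 / $2 ))
}

estimate_download_seconds 25 1000   # 25 GB at 1 Gbit/s, prints 200
estimate_download_seconds 25 500    # 25 GB at 500 Mbit/s, prints 400
```

At 500 Mbit/s a 25 GB model takes about 400 seconds, just under 7 minutes, which is in line with the 5-10 minute first-start times listed under Performance Comparison.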

Storage Strategies

Strategy 1: Shared EFS Volume (Recommended)

Best for production with autoscaling.

Advantages:

Models downloaded once, shared across all pods
New pods start in 30-60 seconds
No storage duplication
Enables horizontal scaling

Implementation:

values.yaml

models:
  asrModelUrl: "https://example.com/model.bin"
  volumes:
    aws:
      efs:
        enabled: true
        fileSystemId: "fs-0123456789abcdef"
        namePrefix: "models"

scaling:
  auto:
    enabled: true
    lightningAsr:
      hpa:
        enabled: true
        maxReplicas: 10

See EFS Configuration for setup.

Strategy 2: Container Image with Baked Model

Best for fixed deployments with infrequent updates.

Advantages:

Fastest startup (model pre-loaded)
No external download required
Works offline

Disadvantages:

Very large container images (20+ GB)
Slow image pulls
Updates require new image build

Implementation:

Build custom image:

Dockerfile

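# Start from the stock Lightning ASR image (pin a specific tag instead of latest for reproducible builds)
FROM quay.io/smallestinc/lightning-asr:latest

# Download the model at build time so it ships inside the image
RUN mkdir -p /app/models && wget -O /app/models/model.bin https://example.com/model.bin

# Tell the container where to find the baked-in model
ENV MODEL_PATH=/app/models/model.bin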

Build and push:

$ docker build -t myregistry/lightning-asr:with-model .
$ docker push myregistry/lightning-asr:with-model

Update values:

values.yaml

lightningAsr:
  image: "myregistry/lightning-asr:with-model"

models:
  asrModelUrl: ""

Strategy 3: EmptyDir Volume

Best for development/testing only.

Advantages:

Simple configuration
No external storage required

Disadvantages:

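Model downloaded on every pod start
No sharing between pods, so every replica downloads its own copy
Data lost when the pod is deleted or rescheduled (an emptyDir survives container restarts only)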

Implementation:

values.yaml

models:
  asrModelUrl: "https://example.com/model.bin"
  volumes:
    aws:
      efs:
        enabled: false

lightningAsr:
  persistence:
    enabled: false

Each pod downloads the model independently.

Strategy 4: Init Container with S3

Best for AWS deployments without EFS.

Advantages:

Fast downloads from S3 within AWS
No EFS cost
Works with ReadWriteOnce volumes

Disadvantages:

Each pod downloads independently
Slower scaling than EFS
Requires S3 bucket

Implementation:

Upload model to S3:

$ aws s3 cp model.bin s3://my-bucket/models/model.bin

Create custom deployment with init container:

initContainers:
  - name: download-model
    image: amazon/aws-cli
    command:
      - sh
      - -c
      - |
        if [ ! -f /models/model.bin ]; then
          aws s3 cp s3://my-bucket/models/model.bin /models/model.bin
        fi
    volumeMounts:
      - name: model-cache
        mountPath: /models
    env:
      - name: AWS_REGION
        value: us-east-1

Model Download Optimization

Parallel Downloads

For multiple model files, download in parallel:

lightningAsr:
  env:
    - name: MODEL_DOWNLOAD_WORKERS
      value: "4"

Resume on Failure

Enable download resume for interrupted downloads:

lightningAsr:
  env:
    - name: MODEL_DOWNLOAD_RESUME
      value: "true"

CDN Acceleration

Use CloudFront for faster downloads:

models:
  asrModelUrl: "https://d111111abcdef8.cloudfront.net/model.bin"

Model Versioning

Multiple Models

Support multiple model versions:

values.yaml

models:
  asrModelUrl: "https://example.com/model-v1.bin"

lightningAsr:
  env:
    - name: MODEL_VERSION
      value: "v1"
    - name: MODEL_CACHE_DIR
      value: "/app/models/v1"

Blue-Green Deployments

Deploy the new model version alongside the old one:

$ helm install smallest-v2 smallest-self-host/smallest-self-host \
>   -f values.yaml \
>   --set models.asrModelUrl="https://example.com/model-v2.bin" \
>   --set lightningAsr.namePrefix="lightning-asr-v2" \
>   --namespace smallest

Test v2, then switch traffic:

apiServer:
  env:
    - name: LIGHTNING_ASR_BASE_URL
      value: "http://lightning-asr-v2:2269"

Storage Quotas

Limit Model Cache Size

Prevent unbounded growth:

lightningAsr:
  persistence:
    enabled: true
    size: 100Gi

  env:
    - name: MODEL_CACHE_MAX_SIZE
      value: "50GB"
    - name: MODEL_CACHE_EVICTION
      value: "lru"

Monitor Storage Usage

Check PVC usage:

$ kubectl get pvc -n smallest
$ kubectl describe pvc models-aws-efs-pvc -n smallest

Check actual usage in pod:

$ kubectl exec -it <lightning-asr-pod> -n smallest -- df -h /app/models
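
On an EFS mount, df typically reports the file system's very large nominal capacity, so the directory size from du is often the more useful figure. A minimal sketch, assuming the whole cache lives under one directory; the function names and the KiB limit are illustrative:

```shell
#!/bin/sh
# cache_size_kb <dir>: total size of everything under <dir>, in KiB.
cache_size_kb() {
  # du -sk prints "<size><TAB><path>"; keep only the size.
  du -sk "$1" | cut -f1
}

# cache_over_limit <dir> <limit_kb>: succeed (exit 0) when the directory
# is larger than the limit, fail otherwise.
cache_over_limit() {
  [ "$(cache_size_kb "$1")" -gt "$2" ]
}
```

Inside a pod, a check against a 50 GiB budget could look like `cache_over_limit /app/models 52428800 && echo "model cache above 50 GiB"`.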

Pre-warming Models

Pre-download Before Scaling

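Download models before peak traffic. The double quotes let your local shell expand $MODEL_URL, so export it before running the command. A Job created this way mounts no volume, so the download lands on the Job container's own filesystem; to fill the shared volume, use the CronJob in the next section, which mounts the models PVC: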

$ kubectl create job model-preload \
>   --image=quay.io/smallestinc/lightning-asr:latest \
>   --namespace=smallest \
>   -- sh -c "wget -O /app/models/model.bin $MODEL_URL && exit 0"

Scheduled Pre-warming

Use CronJob for regular pre-warming:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: model-preload
  namespace: smallest
spec:
  # Run every day at 08:00 (cluster time zone)
  schedule: "0 8 * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: preload
            image: quay.io/smallestinc/lightning-asr:latest
            command:
              - sh
              - -c
              # "|| true" makes the Job report success even when the download fails
              - wget -O /app/models/model.bin "$MODEL_URL" || true
            env:
              - name: MODEL_URL
                value: "https://example.com/model.bin"
            volumeMounts:
              - name: models
                mountPath: /app/models
          volumes:
            - name: models
              persistentVolumeClaim:
                claimName: models-aws-efs-pvc
          restartPolicy: OnFailure

Model Integrity

Checksum Validation

Verify model integrity after download:

lightningAsr:
  env:
    - name: MODEL_CHECKSUM
      value: "sha256:abc123..."
    - name: MODEL_VALIDATE
      value: "true"
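
The MODEL_CHECKSUM value has to be computed from a known-good copy of the model. A minimal sketch for producing and checking it by hand, assuming the sha256:<hex> format shown above and the sha256sum tool from GNU coreutils; the function names are illustrative, and this is not necessarily how the container itself validates the file:

```shell
#!/bin/sh
# model_checksum <file>: print the file's SHA-256 digest as "sha256:<hex>".
model_checksum() {
  # Reading from stdin makes sha256sum print "<hex>  -"; keep only the digest.
  printf 'sha256:%s\n' "$(sha256sum < "$1" | cut -d' ' -f1)"
}

# verify_model <file> <expected>: succeed only when the digest matches
# the expected "sha256:<hex>" string.
verify_model() {
  [ "$(model_checksum "$1")" = "$2" ]
}
```

Running `model_checksum model.bin` on the original file gives the string to place in MODEL_CHECKSUM; `verify_model /app/models/model.bin "$EXPECTED"` can then confirm a cached copy before deciding to delete it.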

Automatic Retry

Retry failed downloads:

lightningAsr:
  env:
    - name: MODEL_DOWNLOAD_RETRIES
      value: "3"
    - name: MODEL_DOWNLOAD_TIMEOUT
      value: "3600"
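
For downloads run by hand or in an init container, the same retry idea can be sketched in plain shell. This is an illustration of the pattern, not the container's internal logic; the retry function and the RETRY_DELAY variable are hypothetical names introduced here:

```shell
#!/bin/sh
# retry <max_attempts> <command...>: run the command until it succeeds
# or the attempts run out. Waits RETRY_DELAY seconds (default 5) between tries.
retry() {
  max=$1
  shift
  attempt=1
  while :; do
    "$@" && return 0                       # success: stop retrying
    [ "$attempt" -ge "$max" ] && return 1  # out of attempts: report failure
    sleep "${RETRY_DELAY:-5}"
    attempt=$((attempt + 1))
  done
}
```

A typical call would be `retry 3 wget -c -O /app/models/model.bin "$MODEL_URL"`, where wget's -c flag continues a partially downloaded file instead of starting over.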

Performance Comparison

Strategy	First Start	Subsequent Starts	Scaling Speed	Cost
EFS Shared	5-10 min	30-60 sec	Fast	Medium
Baked Image	3-5 min	3-5 min	Medium	Low
EmptyDir	5-10 min	5-10 min	Slow	Low
S3 Init	2-5 min	2-5 min	Medium	Low

Best Practices

Use EFS for Production

Always use shared storage (EFS) for production deployments with autoscaling.

The time saved on downloads and the faster scaling typically outweigh EFS costs.

Monitor Download Progress

Watch logs during first deployment:

$ kubectl logs -f -l app=lightning-asr -n smallest

Look for download progress indicators.

Set Resource Limits

Ensure sufficient storage for models:

models:
  volumes:
    aws:
      efs:
        enabled: true

lightningAsr:
  resources:
    ephemeral-storage: "50Gi"

Test Model Updates

Test new models in a separate deployment before updating production:

$ helm install test smallest-self-host/smallest-self-host \
>   --set models.asrModelUrl="new-model-url" \
>   --namespace smallest-test

Troubleshooting

Model Download Stalled

Check pod logs:

$ kubectl logs -l app=lightning-asr -n smallest --tail=100

Check network connectivity from inside the pod (your local shell expands $MODEL_URL, so set it first):

$ kubectl exec -it <pod> -n smallest -- wget --spider $MODEL_URL

Insufficient Storage

Check available space:

$ kubectl exec -it <pod> -n smallest -- df -h

Increase PVC size:

models:
  volumes:
    aws:
      efs:
        enabled: true

lightningAsr:
  persistence:
    size: 200Gi

Model Corruption

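Delete the cached model and restart the pod. The rm runs through sh -c so that the wildcard expands inside the pod rather than in your local shell. On a shared EFS volume this clears the cache for every pod:

$ kubectl exec -it <pod> -n smallest -- sh -c 'rm -rf /app/models/*'
$ kubectl delete pod <pod> -n smallest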

What’s Next?

EFS Configuration

Set up EFS for shared model storage

Redis Persistence

Configure Redis data persistence

HPA Configuration

Enable autoscaling with fast pod startup
