Quick Start
Kubernetes deployment is currently available for ASR (Speech-to-Text) only. For TTS deployments, use Docker.
Ensure you’ve completed all prerequisites before starting.
Add Helm Repository
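A sketch of the commands, assuming the repository is named `smallest`; the actual chart repository URL comes from the prerequisites (the URL below is a placeholder):

```shell
# Register the Smallest.ai chart repository (URL is a placeholder --
# use the one provided by Smallest.ai support)
helm repo add smallest https://charts.smallest.ai
helm repo update
```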
Create Namespace
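For example, assuming a namespace named `smallest` (any name works, as long as you use it consistently in the commands that follow):

```shell
# Namespace name is illustrative -- match it to your own conventions
kubectl create namespace smallest
```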
Configure Values
Create a values.yaml file:
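A minimal sketch of what the file might contain; the key names below are illustrative, not the chart's actual schema, and every `<...>` placeholder must be filled in:

```yaml
# Example values.yaml -- key names are illustrative; follow the
# schema documented for the chart you were given
image:
  repository: <REGISTRY_URL>/lightning-asr
  tag: <IMAGE_TAG>
imagePullSecrets:
  - name: smallest-registry-creds
license:
  key: <LICENSE_KEY>
persistence:
  enabled: true
  storageClass: efs-sc   # shared storage (EFS) so model downloads are cached
```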
Replace placeholder values with credentials provided by Smallest.ai support.
Install
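Assuming the repository, release name, and namespace used above (all placeholders), the install looks roughly like:

```shell
# Release name, chart path, and namespace are examples
helm install lightning-asr smallest/lightning-asr \
  --namespace smallest \
  --values values.yaml
```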
Monitor the deployment:
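For example, watching pods come up in the example namespace:

```shell
# Watch pod status until everything reaches Running
kubectl get pods --namespace smallest --watch
```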
Model downloads are cached when using shared storage (EFS). Subsequent starts complete in under a minute.
Verify Installation
All pods should show Running status with the following services available:
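One way to check, assuming the example namespace from the install step:

```shell
# List pods and services; every pod should report STATUS Running
kubectl get pods,svc --namespace smallest
```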
Test the API
Port forward and send a health check:
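A sketch of the check; the service name, port, and health path are assumptions here, so adjust them to match your release:

```shell
# Forward a local port to the ASR service (names/ports are placeholders)
kubectl port-forward svc/lightning-asr 8080:80 --namespace smallest &

# Send a health check -- endpoint path is an assumption
curl http://localhost:8080/health
```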
Autoscaling
Enable automatic scaling based on real-time inference load:
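For example, via a values flag on upgrade; `autoscaling.enabled` is a hypothetical key, so check the chart's values schema for the real one:

```shell
# Flip the autoscaling toggle on an existing release
helm upgrade lightning-asr smallest/lightning-asr \
  --namespace smallest \
  --reuse-values \
  --set autoscaling.enabled=true
```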
This deploys HorizontalPodAutoscalers that scale based on active requests:
How It Works
- Lightning ASR exposes an `asr_active_requests` metric on port 9090
- Prometheus scrapes this metric via a ServiceMonitor
- Prometheus Adapter makes it available to the Kubernetes metrics API
- The HPA scales pods when the average active requests per pod exceeds the target
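The flow above corresponds to an HPA of roughly this shape; the names and the target value are illustrative, not the chart's actual output:

```yaml
# Sketch of the HPA the chart deploys (names/targets are examples)
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: lightning-asr
  namespace: smallest
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: lightning-asr
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: asr_active_requests   # scraped by Prometheus, served by the adapter
        target:
          type: AverageValue
          averageValue: "4"           # scale up when avg active requests/pod > 4
```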
Configuration
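The tunables typically live in `values.yaml`; the keys below are a hypothetical sketch, so confirm the real names against the chart's values schema:

```yaml
# Illustrative autoscaling settings -- key names may differ in the actual chart
autoscaling:
  enabled: true
  minReplicas: 1
  maxReplicas: 10
  targetActiveRequests: 4   # average active requests per pod before scaling up
```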
Verify Autoscaling
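Check the HPA in the example namespace:

```shell
# Inspect current vs. target metric values
kubectl get hpa --namespace smallest
```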
The TARGETS column shows current/target. When current exceeds target, pods scale up.
Autoscaling requires the Prometheus stack. It’s included as a dependency and enabled by default.
Helm Operations
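Common lifecycle commands, using the example release and namespace names from the install step:

```shell
helm status lightning-asr -n smallest                                    # release status
helm upgrade lightning-asr smallest/lightning-asr -n smallest -f values.yaml  # apply config changes
helm rollback lightning-asr 1 -n smallest                                # roll back to revision 1
helm uninstall lightning-asr -n smallest                                 # remove the release
```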
Troubleshooting
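A couple of first-line checks, assuming the example namespace:

```shell
kubectl describe pod <pod-name> -n smallest      # events, scheduling and image-pull errors
kubectl logs <pod-name> -n smallest --previous   # logs from a crashed container
```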
For detailed troubleshooting, see Troubleshooting Guide.