Kubernetes deployment is currently available for ASR (Speech-to-Text) only. For TTS deployments, use Docker.
Ensure you’ve completed all prerequisites before starting.
Create a values.yaml file:
Replace placeholder values with credentials provided by Smallest.ai support.
Monitor the deployment:
Model downloads are cached when using shared storage (EFS). Subsequent starts complete in under a minute.
All pods should show Running status with the following services available:
Port forward and send a health check:
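Once the port forward is running, the health check can also be scripted. The following is a minimal sketch, not the official client: the function name `is_healthy`, the local port, and the `/health` path in the usage comment are assumptions for illustration, so substitute the address and path that your deployment actually exposes.

```python
import urllib.error
import urllib.request


def is_healthy(url, timeout=5.0):
    """Return True if a GET request to `url` answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or a 4xx/5xx response
        # (urllib raises HTTPError, a URLError subclass, for those).
        return False


# Hypothetical usage; the port and path below are placeholders:
# print(is_healthy("http://localhost:8080/health"))
```

A `False` result only means the endpoint did not answer successfully within the timeout. Check the pod status and logs to find the cause.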
Enable automatic scaling based on real-time inference load:
This deploys HorizontalPodAutoscalers that scale based on active requests:
asr_active_requests metric on port 9090
The TARGETS column shows current/target. When current exceeds target, pods scale up.
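To make the current/target relationship concrete, the sketch below applies the standard Kubernetes HorizontalPodAutoscaler formula, `desired = ceil(currentReplicas * currentValue / targetValue)`. It is a simplified illustration: the function name, the example target of 4 active requests per pod, and the replica bounds are assumptions rather than values from this chart, and the real controller additionally applies a tolerance band and stabilization windows.

```python
import math


def desired_replicas(current_replicas, current_value, target_value,
                     min_replicas=1, max_replicas=10):
    """Approximate the HPA replica calculation for a single metric.

    current_value: observed average metric per pod (e.g. active requests)
    target_value:  configured target for that metric (must be positive)
    """
    if target_value <= 0:
        raise ValueError("target_value must be positive")
    raw = math.ceil(current_replicas * current_value / target_value)
    # Clamp to the configured bounds (hypothetical defaults here).
    return max(min_replicas, min(max_replicas, raw))


# Hypothetical example: 2 pods averaging 8 active requests, target of 4.
print(desired_replicas(2, 8, 4))  # 4
```

When current stays at or below target, the same formula yields the same or a lower replica count, which is why pods scale down after load subsides.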
Autoscaling requires the Prometheus stack. It’s included as a dependency and enabled by default.
For detailed troubleshooting, see the Troubleshooting Guide.