---
title: Quick Start
description: Deploy Smallest Self-Host on Kubernetes with Helm
---

<Warning>
  Kubernetes deployment is currently available for **ASR (Speech-to-Text)** only. For TTS deployments, use [Docker](/waves/self-host/docker-setup/tts-deployment/quick-start).
</Warning>

<Note>
  Ensure you've completed all [prerequisites](/waves/self-host/kubernetes-setup/prerequisites/hardware-requirements) before starting.
</Note>

## Add Helm Repository

```bash
helm repo add smallest-self-host https://smallest-inc.github.io/smallest-self-host
helm repo update
```

## Create Namespace

```bash
kubectl create namespace smallest
kubectl config set-context --current --namespace=smallest
```

## Configure Values

Create a `values.yaml` file:

```yaml values.yaml
global:
  licenseKey: "your-license-key-here"
  imageCredentials:
    create: true
    registry: quay.io
    username: "your-registry-username"
    password: "your-registry-password"
    email: "your-email@example.com"

models:
  asrModelUrl: "your-model-url-here"

scaling:
  replicas:
    lightningAsr: 1
    licenseProxy: 1

lightningAsr:
  nodeSelector: {} # e.g. pin pods to GPU nodes
  tolerations: []  # e.g. tolerate GPU node taints

redis:
  enabled: true
  auth:
    enabled: true
```

<Warning>
  Replace placeholder values with credentials provided by Smallest.ai support.
</Warning>
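If you'd rather not store the license key or registry password in `values.yaml`, Helm can override individual values at install time. A sketch, using the same value paths as the file above:

```shell
# Sketch: keep secrets out of values.yaml by overriding them on the
# command line. The --set paths mirror the keys shown above.
helm install smallest-self-host smallest-self-host/smallest-self-host \
  -f values.yaml \
  --namespace smallest \
  --set global.licenseKey="your-license-key-here" \
  --set global.imageCredentials.password="your-registry-password"
```

Values passed with `--set` take precedence over those in `-f values.yaml`.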

## Install

```bash
helm install smallest-self-host smallest-self-host/smallest-self-host \
  -f values.yaml \
  --namespace smallest
```

Monitor the deployment:

```bash
kubectl get pods -w
```

<table>
  <thead>
    <tr>
      <th>
        Component
      </th>

      <th>
        Startup Time
      </th>

      <th>
        Ready Indicator
      </th>
    </tr>
  </thead>

  <tbody>
    <tr>
      <td>
        Redis
      </td>

      <td>
        \~30s
      </td>

      <td>
        <code>1/1 Running</code>
      </td>
    </tr>

    <tr>
      <td>
        License Proxy
      </td>

      <td>
        \~1m
      </td>

      <td>
        <code>1/1 Running</code>
      </td>
    </tr>

    <tr>
      <td>
        Lightning ASR
      </td>

      <td>
        2-10m
      </td>

      <td>
        <code>1/1 Running</code>

         (model download on first run)
      </td>
    </tr>

    <tr>
      <td>
        API Server
      </td>

      <td>
        \~30s
      </td>

      <td>
        <code>1/1 Running</code>
      </td>
    </tr>
  </tbody>
</table>

<Tip>
  Model downloads are cached when using shared storage (EFS). Subsequent starts complete in under a minute.
</Tip>
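Instead of watching pods manually, you can block until everything reports Ready. A sketch, with the timeout sized for a first-run model download:

```shell
# Optional: block until all pods in the namespace report Ready.
# The 10m timeout allows for the first-run model download.
kubectl wait --for=condition=Ready pods --all \
  --namespace smallest --timeout=10m
```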

## Verify Installation

```bash
kubectl get pods,svc
```

All pods should show `Running` status with the following services available:

<table>
  <thead>
    <tr>
      <th>
        Service
      </th>

      <th>
        Port
      </th>

      <th>
        Description
      </th>
    </tr>
  </thead>

  <tbody>
    <tr>
      <td>
        api-server
      </td>

      <td>
        7100
      </td>

      <td>
        REST API endpoint
      </td>
    </tr>

    <tr>
      <td>
        lightning-asr-internal
      </td>

      <td>
        2269
      </td>

      <td>
        ASR inference service
      </td>
    </tr>

    <tr>
      <td>
        license-proxy
      </td>

      <td>
        3369
      </td>

      <td>
        License validation
      </td>
    </tr>

    <tr>
      <td>
        redis-master
      </td>

      <td>
        6379
      </td>

      <td>
        Request queue
      </td>
    </tr>
  </tbody>
</table>

## Test the API

Port forward and send a health check:

```bash
kubectl port-forward svc/api-server 7100:7100
```

```bash
curl http://localhost:7100/health
```
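Since `kubectl port-forward` blocks the terminal, you can also run it in the background and clean up afterwards. A sketch (the `/health` response body depends on the API server and is not shown here):

```shell
# Sketch: background the port-forward, probe the health endpoint,
# then tear the tunnel down.
kubectl port-forward svc/api-server 7100:7100 -n smallest &
PF_PID=$!
sleep 2   # give the tunnel a moment to establish
curl -fsS http://localhost:7100/health
kill $PF_PID
```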

## Autoscaling

Enable automatic scaling based on real-time inference load:

```yaml values.yaml
scaling:
  auto:
    enabled: true
```

This deploys HorizontalPodAutoscalers that scale based on active requests:

<table>
  <thead>
    <tr>
      <th>
        Component
      </th>

      <th>
        Metric
      </th>

      <th>
        Default Target
      </th>

      <th>
        Behavior
      </th>
    </tr>
  </thead>

  <tbody>
    <tr>
      <td>
        Lightning ASR
      </td>

      <td>
        <code>asr_active_requests</code>
      </td>

      <td>
        4 per pod
      </td>

      <td>
        Scales GPU workers based on inference queue depth
      </td>
    </tr>

    <tr>
      <td>
        API Server
      </td>

      <td>
        <code>lightning_asr_replica_count</code>
      </td>

      <td>
        2:1 ratio
      </td>

      <td>
        Maintains API capacity proportional to ASR workers
      </td>
    </tr>
  </tbody>
</table>
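Both autoscalers follow the standard HPA averaging rule: desired replicas = ceil(total metric value / per-pod target). A small sketch with illustrative numbers:

```shell
# Sketch of the HPA sizing rule: desired = ceil(current_total / target_per_pod).
# Numbers below are illustrative, not measured values.
hpa_desired() {
  local current=$1 target=$2
  echo $(( (current + target - 1) / target ))   # integer ceil division
}

hpa_desired 9 4   # 9 active ASR requests, target 4 per pod -> 3 ASR pods
hpa_desired 3 2   # 3 ASR replicas at the 2:1 ratio -> 2 API server pods
```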

### How It Works

1. **Lightning ASR** exposes `asr_active_requests` metric on port 9090
2. **Prometheus** scrapes this metric via ServiceMonitor
3. **Prometheus Adapter** makes it available to the Kubernetes metrics API
4. **HPA** scales pods when average requests per pod exceeds target
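The steps above can be spot-checked by querying the custom metrics API directly. A sketch (the exact resource path can vary with your Prometheus Adapter configuration):

```shell
# Sketch: confirm asr_active_requests is visible to the Kubernetes
# custom metrics API, which is what the HPA reads from.
kubectl get --raw \
  "/apis/custom.metrics.k8s.io/v1beta1/namespaces/smallest/pods/*/asr_active_requests"
```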

### Configuration

```yaml values.yaml
scaling:
  auto:
    enabled: true
    lightningAsr:
      hpa:
        minReplicas: 1
        maxReplicas: 10
        targetActiveRequests: 4
```

### Verify Autoscaling

```bash
kubectl get hpa
```

```
NAME            REFERENCE                  TARGETS   MINPODS   MAXPODS   REPLICAS
lightning-asr   Deployment/lightning-asr   0/4       1         10        1
api-server      Deployment/api-server      1/2       1         10        1
```

The `TARGETS` column shows `current/target`. When current exceeds target, pods scale up.

<Note>
  Autoscaling requires the Prometheus stack. It's included as a dependency and enabled by default.
</Note>

## Helm Operations

<CodeGroup>
  ```bash Upgrade
  helm upgrade smallest-self-host smallest-self-host/smallest-self-host \
    -f values.yaml -n smallest
  ```

  ```bash Rollback
  helm rollback smallest-self-host -n smallest
  ```

  ```bash Uninstall
  helm uninstall smallest-self-host -n smallest
  ```

  ```bash View Config
  helm get values smallest-self-host -n smallest
  ```
</CodeGroup>
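Before rolling back, it helps to see which revision you're returning to. A sketch (revision `2` is an example, not a real value):

```shell
# Sketch: list release revisions, then roll back to a specific one.
helm history smallest-self-host -n smallest
helm rollback smallest-self-host 2 -n smallest   # 2 = example revision number
```

Running `helm rollback` without a revision number reverts to the previous release.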

## Troubleshooting

<table>
  <thead>
    <tr>
      <th>
        Issue
      </th>

      <th>
        Cause
      </th>

      <th>
        Resolution
      </th>
    </tr>
  </thead>

  <tbody>
    <tr>
      <td>
        Pods 

        <code>Pending</code>
      </td>

      <td>
        Insufficient resources or missing GPU nodes
      </td>

      <td>
        Check 

<code>kubectl describe pod &lt;name&gt;</code>

         for scheduling errors
      </td>
    </tr>

    <tr>
      <td>
        <code>ImagePullBackOff</code>
      </td>

      <td>
        Invalid registry credentials
      </td>

      <td>
        Verify 

        <code>imageCredentials</code>

         in values.yaml
      </td>
    </tr>

    <tr>
      <td>
        <code>CrashLoopBackOff</code>
      </td>

      <td>
        Invalid license or insufficient memory
      </td>

      <td>
        Check logs with 

<code>kubectl logs &lt;pod&gt; --previous</code>
      </td>
    </tr>

    <tr>
      <td>
        Slow model download
      </td>

      <td>
        Large model size (~20GB)
      </td>

      <td>
        Use shared storage (EFS) for caching
      </td>
    </tr>
  </tbody>
</table>
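A quick triage pass covering the cases above might look like this. The deployment name and pod label are assumptions; adjust them to match what the chart actually deploys in your cluster:

```shell
# Sketch: first-pass diagnostics for a failing deployment.
kubectl get events -n smallest --sort-by=.lastTimestamp   # recent scheduling/pull errors
kubectl describe deployment lightning-asr -n smallest     # assumes deployment name "lightning-asr"
kubectl logs -l app=lightning-asr -n smallest --tail=100  # assumes an app=lightning-asr label
```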

For detailed troubleshooting, see [Troubleshooting Guide](/waves/self-host/kubernetes-setup/k8s-troubleshooting).

## Next Steps

<CardGroup cols={2}>
  <Card title="AWS Setup" href="/waves/self-host/kubernetes-setup/aws/eks-setup">
    EKS-specific configuration
  </Card>

  <Card title="Model Storage" href="/waves/self-host/kubernetes-setup/storage-pvc/model-storage">
    Shared storage for faster cold starts
  </Card>

  <Card title="Advanced Autoscaling" href="/waves/self-host/kubernetes-setup/autoscaling/hpa-configuration">
    Fine-tune scaling behavior and thresholds
  </Card>

  <Card title="Monitoring" href="/waves/self-host/kubernetes-setup/autoscaling/grafana-dashboards">
    Grafana dashboards and alerting
  </Card>
</CardGroup>