---
title: Quick Start
description: Deploy Smallest Self-Host on Kubernetes with Helm
---
Kubernetes deployment is currently available for **ASR (Speech-to-Text)** only. For TTS deployments, use [Docker](/waves/self-host/docker-setup/tts-deployment/quick-start).
Ensure you've completed all [prerequisites](/waves/self-host/kubernetes-setup/prerequisites) before starting.
## Add Helm Repository
```bash
helm repo add smallest-self-host https://smallest-inc.github.io/smallest-self-host
helm repo update
```
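To confirm the repository was added, you can search the local chart cache (assuming the chart name matches the repository name):

```bash
# Should list the chart and its latest version
helm search repo smallest-self-host
```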
## Create Namespace
```bash
kubectl create namespace smallest
kubectl config set-context --current --namespace=smallest
```
## Configure Values
Create a `values.yaml` file:
```yaml values.yaml
global:
  licenseKey: "your-license-key-here"

imageCredentials:
  create: true
  registry: quay.io
  username: "your-registry-username"
  password: "your-registry-password"
  email: "your-email@example.com"

models:
  asrModelUrl: "your-model-url-here"

scaling:
  replicas:
    lightningAsr: 1
    licenseProxy: 1

lightningAsr:
  nodeSelector: {}
  tolerations: []

redis:
  enabled: true
  auth:
    enabled: true
```
Replace placeholder values with credentials provided by Smallest.ai support.
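Before installing, it can help to render the chart locally to catch indentation or schema mistakes in `values.yaml` (a dry-run sketch; nothing is created in the cluster):

```bash
# Render all templates with your values; errors surface here instead of at install time
helm template smallest-self-host smallest-self-host/smallest-self-host \
  -f values.yaml --namespace smallest > /dev/null \
  && echo "values.yaml renders cleanly"
```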
## Install
```bash
helm install smallest-self-host smallest-self-host/smallest-self-host \
-f values.yaml \
--namespace smallest
```
Monitor the deployment:
```bash
kubectl get pods -w
```
| Component | Startup Time | Ready Indicator |
| ------------- | ------------ | ------------------------------------------- |
| Redis | \~30s | `1/1 Running` |
| License Proxy | \~1m | `1/1 Running` |
| Lightning ASR | 2-10m | `1/1 Running` (model download on first run) |
| API Server | \~30s | `1/1 Running` |
Model downloads are cached when using shared storage (EFS). Subsequent starts complete in under a minute.
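Instead of watching interactively, you can block until all pods in the release report ready. The label selector below is an assumption based on Helm's standard labels; check `kubectl get pods --show-labels` for the actual values:

```bash
# Block for up to 10 minutes to allow for the first-run model download
kubectl wait pod --for=condition=Ready \
  -l app.kubernetes.io/instance=smallest-self-host \
  --timeout=600s -n smallest
```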
## Verify Installation
```bash
kubectl get pods,svc
```
All pods should show `Running` status with the following services available:
| Service | Port | Description |
| ---------------------- | ---- | --------------------- |
| api-server | 7100 | REST API endpoint |
| lightning-asr-internal | 2269 | ASR inference service |
| license-proxy | 3369 | License validation |
| redis-master | 6379 | Request queue |
## Test the API
Port forward and send a health check:
```bash
kubectl port-forward svc/api-server 7100:7100
```
```bash
curl http://localhost:7100/health
```
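On a fresh install the API server may answer before the ASR workers finish loading, so a short retry loop is more robust than a single request (this assumes the `/health` endpoint above and the port-forward running in another terminal):

```bash
# Poll the health endpoint every 5s, for up to ~60s
for i in $(seq 1 12); do
  if curl -sf http://localhost:7100/health > /dev/null; then
    echo "API is healthy"
    break
  fi
  echo "waiting... ($i)"
  sleep 5
done
```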
## Autoscaling
Enable automatic scaling based on real-time inference load:
```yaml values.yaml
scaling:
  auto:
    enabled: true
```
This deploys HorizontalPodAutoscalers that scale based on active requests:
| Component | Metric | Default Target | Behavior |
| ------------- | ----------------------------- | -------------- | -------------------------------------------------- |
| Lightning ASR | `asr_active_requests` | 4 per pod | Scales GPU workers based on inference queue depth |
| API Server | `lightning_asr_replica_count` | 2:1 ratio | Maintains API capacity proportional to ASR workers |
### How It Works
1. **Lightning ASR** exposes `asr_active_requests` metric on port 9090
2. **Prometheus** scrapes this metric via ServiceMonitor
3. **Prometheus Adapter** makes it available to the Kubernetes metrics API
4. **HPA** scales pods when average requests per pod exceeds target
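The steps above can be spot-checked by querying the custom metrics API directly; if the adapter is serving the metric, this returns a `MetricValueList` (the metric path is inferred from the names above):

```bash
# Ask the Kubernetes custom metrics API for the ASR request gauge
kubectl get --raw \
  "/apis/custom.metrics.k8s.io/v1beta1/namespaces/smallest/pods/*/asr_active_requests"
```

An empty or 404 response usually means the ServiceMonitor or Prometheus Adapter is not yet picking up the metric.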
### Configuration
```yaml values.yaml
scaling:
  auto:
    enabled: true

lightningAsr:
  hpa:
    minReplicas: 1
    maxReplicas: 10
    targetActiveRequests: 4
```
### Verify Autoscaling
```bash
kubectl get hpa
```
```
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS
lightning-asr Deployment/lightning-asr 0/4 1 10 1
api-server Deployment/api-server 1/2 1 10 1
```
The `TARGETS` column shows `current/target`. When current exceeds target, pods scale up.
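The scale-up rule can be sketched as a small shell function: given a `current/target` pair from the `TARGETS` column, it reports whether the HPA would add pods. This is a simplification of the real HPA algorithm, which averages across pods and applies stabilization windows:

```bash
# Decide whether a TARGETS value like "6/4" implies a scale-up
needs_scale_up() {
  local current="${1%/*}" target="${1#*/}"
  if [ "$current" -gt "$target" ]; then
    echo "scale up"
  else
    echo "hold"
  fi
}

needs_scale_up "6/4"   # scale up
needs_scale_up "0/4"   # hold
```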
Autoscaling requires the Prometheus stack. It's included as a dependency and enabled by default.
## Helm Operations
```bash Upgrade
helm upgrade smallest-self-host smallest-self-host/smallest-self-host \
-f values.yaml -n smallest
```
```bash Rollback
helm rollback smallest-self-host -n smallest
```
```bash Uninstall
helm uninstall smallest-self-host -n smallest
```
```bash View Config
helm get values smallest-self-host -n smallest
```
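Helm records each upgrade as a numbered revision, so you can also inspect the release history and roll back to a specific point rather than just the previous release (revision `2` below is an example number):

```bash
# List release revisions, then roll back to a chosen one
helm history smallest-self-host -n smallest
helm rollback smallest-self-host 2 -n smallest
```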
## Troubleshooting
| Issue | Cause | Resolution |
| ------------------- | ------------------------------------------- | --------------------------------------------------------- |
| Pods `Pending` | Insufficient resources or missing GPU nodes | Check `kubectl describe pod <pod-name>` for scheduling errors |
| `ImagePullBackOff` | Invalid registry credentials | Verify `imageCredentials` in values.yaml |
| `CrashLoopBackOff` | Invalid license or insufficient memory | Check logs with `kubectl logs <pod-name> --previous` |
| Slow model download | Large model size (\~20GB) | Use shared storage (EFS) for caching |
For detailed troubleshooting, see [Troubleshooting Guide](/waves/self-host/kubernetes-setup/troubleshooting).
## Next Steps
- EKS-specific configuration
- Shared storage for faster cold starts
- Fine-tune scaling behavior and thresholds
- Grafana dashboards and alerting