*** title: Docker Troubleshooting description: Debug common issues and optimize your STT Docker deployment --------------------- For clean Markdown of any page, append .md to the page URL. For a complete documentation index, see https://docs.smallest.ai/waves/v-4-0-0/self-host/docker-setup/stt-deployment/llms.txt. For full documentation content, see https://docs.smallest.ai/waves/v-4-0-0/self-host/docker-setup/stt-deployment/llms-full.txt. ## Common Issues ### GPU Not Accessible **Symptoms:** * Error: `could not select device driver "nvidia"` * Error: `no NVIDIA GPU devices found` * Lightning ASR fails to start **Diagnosis:** ```bash docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi ``` ```bash sudo systemctl restart docker docker compose up -d ``` ```bash sudo apt-get remove nvidia-container-toolkit sudo apt-get update sudo apt-get install -y nvidia-container-toolkit sudo systemctl restart docker ``` ```bash nvidia-smi ``` If driver version is below 470, update: ```bash sudo ubuntu-drivers autoinstall sudo reboot ``` Verify `/etc/docker/daemon.json` contains: ```json { "runtimes": { "nvidia": { "path": "nvidia-container-runtime", "runtimeArgs": [] } } } ``` Restart Docker after changes: ```bash sudo systemctl restart docker ``` ### License Validation Failed **Symptoms:** * Error: `License validation failed` * Error: `Invalid license key` * Services fail to start **Diagnosis:** Check license-proxy logs: ```bash docker compose logs license-proxy ``` Check `.env` file: ```bash cat .env | grep LICENSE_KEY ``` Ensure there are no: * Extra spaces * Quotes around the key * Line breaks Correct format: ```bash LICENSE_KEY=abc123def456 ``` Test connection to license server: ```bash curl -v https://api.smallest.ai ``` If this fails, check: * Firewall rules * Proxy settings * DNS resolution If the key appears correct and network is accessible, your license may be: * Expired * Revoked * Invalid Contact **[support@smallest.ai](mailto:support@smallest.ai)** with: * Your license key * License-proxy logs * Error messages ### Model Download Failed **Symptoms:** * Lightning ASR stuck at startup * Error: `Failed to download model` * Error: `Connection timeout` **Diagnosis:** Check Lightning ASR logs: ```bash docker compose logs lightning-asr ``` Check `.env` file: ```bash cat .env | grep MODEL_URL ``` Test URL accessibility: ```bash curl -I "${MODEL_URL}" ``` Models require \~20-30 GB: ```bash df -h ``` Free up space if needed: ```bash docker system prune -a ``` Download model manually and use volume mount: ```bash mkdir -p ~/models cd ~/models wget "${MODEL_URL}" -O model.bin ``` Update docker-compose.yml: ```yaml lightning-asr: volumes: - ~/models:/app/models ``` For slow connections, increase download timeout: ```yaml lightning-asr: environment: - DOWNLOAD_TIMEOUT=3600 ``` ### Port Already in Use **Symptoms:** * Error: `port is already allocated` * Error: `bind: address already in use` **Diagnosis:** Find what's using the port: ```bash sudo lsof -i :7100 sudo netstat -tulpn | grep 7100 ``` If another service is using the port: ```bash sudo systemctl stop [service-name] ``` Or kill the process: ```bash sudo kill -9 [PID] ``` Modify docker-compose.yml to use different port: ```yaml api-server: ports: - "8080:7100" ``` Access API at [http://localhost:8080](http://localhost:8080) instead Old containers may still be bound: ```bash docker compose down docker container prune -f docker compose up -d ``` ### Out of Memory **Symptoms:** * Container killed unexpectedly * Error: `OOMKilled` * System becomes unresponsive **Diagnosis:** Check container status: ```bash docker compose ps docker inspect [container-name] | grep OOMKilled ``` Lightning ASR requires minimum 16 GB RAM Check current memory: ```bash free -h ``` Prevent one service from consuming all memory: ```yaml services: lightning-asr: deploy: resources: limits: memory: 14G reservations: memory: 12G ``` Add swap space (temporary solution): ```bash sudo fallocate -l 16G /swapfile sudo chmod 600 /swapfile sudo mkswap /swapfile sudo swapon /swapfile ``` Use smaller model or reduce batch size: ```yaml lightning-asr: environment: - BATCH_SIZE=1 - MODEL_PRECISION=fp16 ``` ### Container Keeps Restarting **Symptoms:** * Container status shows `Restarting` * Logs show crash loop **Diagnosis:** View recent logs: ```bash docker compose logs --tail=100 [service-name] ``` ```bash docker inspect [container-name] --format='{{.State.ExitCode}}' ``` Common exit codes: * `137`: Out of memory (OOMKilled) * `139`: Segmentation fault * `1`: General error Temporarily disable restart to debug: ```yaml lightning-asr: restart: "no" ``` Start manually and watch logs: ```bash docker compose up lightning-asr ``` Ensure required services are healthy: ```bash docker compose ps ``` All should show `Up (healthy)` or `Up` ### Slow Performance **Symptoms:** * High latency (>500ms) * Low throughput * GPU underutilized **Diagnosis:** Monitor GPU usage: ```bash watch -n 1 nvidia-smi ``` Check container resources: ```bash docker stats ``` Ensure GPU is not throttling: ```bash nvidia-smi -q -d PERFORMANCE ``` Enable persistence mode: ```bash sudo nvidia-smi -pm 1 ``` ```yaml lightning-asr: deploy: resources: limits: cpus: '8' ``` For maximum performance (loses isolation): ```yaml api-server: network_mode: host ``` Use Redis with persistence disabled for speed: ```yaml redis: command: redis-server --save "" ``` Scale Lightning ASR workers: ```bash docker compose up -d --scale lightning-asr=2 ``` ## Performance Optimization ### Best Practices Cache models to avoid re-downloading: ```yaml volumes: - model-cache:/app/models ``` Reduces GPU initialization time: ```bash sudo nvidia-smi -pm 1 ``` Allocate appropriate CPU/memory: ```yaml deploy: resources: limits: cpus: '8' memory: 14G ``` Use monitoring tools: ```bash docker stats nvidia-smi dmon ``` ### Benchmark Your Deployment Test transcription performance: ```bash time curl -X POST http://localhost:7100/v1/listen \ -H "Authorization: Token ${LICENSE_KEY}" \ -H "Content-Type: application/json" \ -d '{ "url": "https://example.com/test-audio-60s.wav" }' ``` Expected performance: * **Cold start**: First request after container start (5-10 seconds) * **Warm requests**: Subsequent requests (50-200ms) * **Real-time factor**: 0.05-0.15x (60s audio in 3-9 seconds) ## Debugging Tools ### View All Logs ```bash docker compose logs -f ``` ### Follow Specific Service ```bash docker compose logs -f lightning-asr ``` ### Last N Lines ```bash docker compose logs --tail=100 api-server ``` ### Save Logs to File ```bash docker compose logs > deployment-logs.txt ``` ### Execute Commands in Container ```bash docker compose exec lightning-asr bash ``` ### Check Container Configuration ```bash docker inspect lightning-asr-1 ``` ### Network Debugging Test connectivity between containers: ```bash docker compose exec api-server ping lightning-asr docker compose exec api-server curl http://lightning-asr:2233/health ``` ## Health Checks ### API Server ```bash curl http://localhost:7100/health ``` Expected: `{"status": "healthy"}` ### Lightning ASR ```bash curl http://localhost:2233/health ``` Expected: `{"status": "ready", "gpu": "NVIDIA A10"}` ### License Proxy ```bash docker compose exec license-proxy wget -q -O- http://localhost:6699/health ``` Expected: `{"status": "valid"}` ### Redis ```bash docker compose exec redis redis-cli ping ``` Expected: `PONG` ## Log Analysis ### Common Log Patterns ```log redis-1 | Ready to accept connections license-proxy | License validated successfully lightning-asr-1 | Model loaded successfully lightning-asr-1 | GPU: NVIDIA A10 (24GB) lightning-asr-1 | Server ready on port 2233 api-server | Connected to Lightning ASR api-server | API server listening on port 7100 ``` ```log license-proxy | ERROR: License validation failed license-proxy | ERROR: Invalid license key license-proxy | ERROR: Connection to license server failed ``` ```log lightning-asr-1 | ERROR: No CUDA-capable device detected lightning-asr-1 | ERROR: CUDA out of memory lightning-asr-1 | ERROR: GPU not accessible ``` ```log api-server | ERROR: Connection refused: lightning-asr:2233 api-server | ERROR: Timeout connecting to license-proxy ``` ## Getting Help ### Before Contacting Support Collect the following information: ```bash docker version docker compose version nvidia-smi uname -a ``` ```bash docker compose ps > status.txt docker stats --no-stream > resources.txt ``` ```bash docker compose logs > all-logs.txt ``` Sanitize and include: * docker-compose.yml * .env (remove license key) ### Contact Support Email: **[support@smallest.ai](mailto:support@smallest.ai)** Include: * Description of the issue * Steps to reproduce * System information * Logs and configuration * License key (via secure channel) ## What's Next? Advanced configuration options Integrate with your applications