Docker Troubleshooting

Common Issues

GPU Not Accessible

Symptoms:

Error: could not select device driver "nvidia"
Error: no NVIDIA GPU devices found
Lightning ASR fails to start

Diagnosis:

$ docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

Solution 1: Restart Docker

$ sudo systemctl restart docker
$ docker compose up -d

Solution 2: Reinstall NVIDIA Container Toolkit

$ sudo apt-get remove nvidia-container-toolkit
$ sudo apt-get update
$ sudo apt-get install -y nvidia-container-toolkit
$ sudo systemctl restart docker

Solution 3: Update NVIDIA Driver

$ nvidia-smi

If the driver version is below 470, update it:

$ sudo ubuntu-drivers autoinstall
$ sudo reboot

Solution 4: Check Docker Daemon Configuration

Verify /etc/docker/daemon.json contains:

{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

Restart Docker after changes:

$ sudo systemctl restart docker
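
To confirm that daemon.json actually declares the runtime shown above, a plain-text check along these lines may help. This is a minimal sketch: has_nvidia_runtime is a hypothetical helper name, and it only looks for the "runtimes" and "nvidia" keys rather than parsing the JSON.

```shell
# Hypothetical helper: succeed if FILE declares an "nvidia" entry under "runtimes".
# Returns 0 when found, 1 when not found, 2 when the file cannot be read.
has_nvidia_runtime() {
  local file="$1" compact
  [ -r "$file" ] || return 2
  # Strip all whitespace so the match does not depend on indentation.
  compact=$(tr -d '[:space:]' < "$file")
  case "$compact" in
    *'"runtimes":{'*'"nvidia":{'*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: has_nvidia_runtime /etc/docker/daemon.json && echo "nvidia runtime configured". After restarting Docker, the Runtimes line in the output of docker info should also list nvidia.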

License Validation Failed

Symptoms:

Error: License validation failed
Error: Invalid license key
Services fail to start

Diagnosis:

Check license-proxy logs:

$ docker compose logs license-proxy

Solution 1: Verify License Key

Check .env file:

$ grep LICENSE_KEY .env

Ensure there are no:

Extra spaces
Quotes around the key
Line breaks

Correct format:

LICENSE_KEY=abc123def456
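
The formatting rules above (no extra spaces, quotes, or line breaks) can be checked mechanically. The following is a sketch under assumptions: check_license_key is a hypothetical helper, and it assumes the key is written on a single line that starts with LICENSE_KEY= as in the example.

```shell
# Hypothetical helper: report the first formatting problem with LICENSE_KEY in FILE.
# Prints one of: ok, missing, empty, quoted, whitespace. Returns 0 only for "ok".
check_license_key() {
  local file="$1" value
  if ! grep -q '^LICENSE_KEY=' "$file"; then
    # Also reached when the name is indented or has spaces before "=".
    echo "missing"
    return 1
  fi
  # Take the value of the first LICENSE_KEY assignment.
  value=$(sed -n 's/^LICENSE_KEY=//p' "$file" | head -n 1)
  case "$value" in
    "")              echo "empty";      return 1 ;;
    *\"*|*\'*)       echo "quoted";     return 1 ;;
    *[[:space:]]*)   echo "whitespace"; return 1 ;;  # includes a stray carriage return
  esac
  echo "ok"
}
```

Usage: check_license_key .env. A result of "whitespace" on a file edited on Windows often points to CRLF line endings.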

Solution 2: Check Network Connectivity

Test connection to license server:

$ curl -v https://api.smallest.ai

If this fails, check:

Firewall rules
Proxy settings
DNS resolution

Solution 3: Contact Support

If the key appears correct and network is accessible, your license may be:

Expired
Revoked
Invalid

Contact support@smallest.ai with:

Your license key (via a secure channel)
License-proxy logs
Error messages

Model Download Failed

Symptoms:

Lightning ASR stuck at startup
Error: Failed to download model
Error: Connection timeout

Diagnosis:

Check Lightning ASR logs:

$ docker compose logs lightning-asr

Solution 1: Verify Model URL

Check .env file:

$ grep MODEL_URL .env

Test URL accessibility (the shell does not read .env, so export MODEL_URL first):

$ curl -I "${MODEL_URL}"

Solution 2: Check Disk Space

Models require ~20-30 GB:

$ df -h

Free up space if needed. Note that this removes all stopped containers and every image not used by a container:

$ docker system prune -a
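
Rather than reading the df -h output by eye, the free-space requirement can be tested directly. This is a sketch: has_free_space is a hypothetical helper, and the 30 GB threshold in the usage line is taken from the upper end of the estimate above.

```shell
# Hypothetical helper: succeed if the filesystem holding DIR has at least MIN_GB GiB free.
has_free_space() {
  local dir="$1" min_gb="$2" avail_kb
  # With POSIX "df -Pk", line 2 column 4 is the available space in 1 KiB blocks.
  avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
  [ "$avail_kb" -ge $(( min_gb * 1024 * 1024 )) ]
}
```

Usage: has_free_space /var/lib/docker 30 || echo "less than 30 GB free". The path is an assumption; /var/lib/docker is Docker's default data directory on Linux, but the model may be stored elsewhere if a volume is mounted from another location.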

Solution 3: Manual Download

Download the model manually and mount it as a volume:

$ mkdir -p ~/models
$ cd ~/models
$ wget "${MODEL_URL}" -O model.bin

Update docker-compose.yml:

lightning-asr:
  volumes:
    - ~/models:/app/models

Solution 4: Increase Timeout

For slow connections, increase download timeout:

lightning-asr:
  environment:
    - DOWNLOAD_TIMEOUT=3600

Port Already in Use

Symptoms:

Error: port is already allocated
Error: bind: address already in use

Diagnosis:

Find what’s using the port:

$ sudo lsof -i :7100
$ sudo netstat -tulpn | grep 7100

Solution 1: Stop Conflicting Service

If another service is using the port:

$ sudo systemctl stop [service-name]

Or kill the process:

$ sudo kill -9 [PID]

Solution 2: Change Port

Modify docker-compose.yml to map a different host port:

api-server:
  ports:
    - "8080:7100"

Access the API at http://localhost:8080 instead.

Solution 3: Remove Old Containers

Old containers may still be bound:

$ docker compose down
$ docker container prune -f
$ docker compose up -d

Out of Memory

Symptoms:

Container killed unexpectedly
Error: OOMKilled
System becomes unresponsive

Diagnosis:

Check container status:

$ docker compose ps
$ docker inspect [container-name] | grep OOMKilled
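
When several services are running, the same check can be applied to every container in the project at once. The sketch below uses a hypothetical helper name, list_oom_killed, and relies on the State.OOMKilled field that docker inspect reports; run it from the directory containing docker-compose.yml.

```shell
# Hypothetical helper: print the names of project containers that were OOM-killed.
list_oom_killed() {
  local id
  # "docker compose ps -aq" lists container IDs, including stopped containers.
  for id in $(docker compose ps -aq); do
    docker inspect --format '{{.Name}} {{.State.OOMKilled}}' "$id"
  done | awk '$2 == "true" { print $1 }'
}
```

Empty output means no container in the project has its OOMKilled flag set.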

Solution 1: Increase System Memory

Lightning ASR requires a minimum of 16 GB of RAM.

Check current memory:

$ free -h

Solution 2: Add Memory Limits

Prevent one service from consuming all memory:

services:
  lightning-asr:
    deploy:
      resources:
        limits:
          memory: 14G
        reservations:
          memory: 12G

Solution 3: Enable Swap

Add swap space (temporary solution):

$ sudo fallocate -l 16G /swapfile
$ sudo chmod 600 /swapfile
$ sudo mkswap /swapfile
$ sudo swapon /swapfile

Solution 4: Optimize Model Loading

Use a smaller model or reduce the batch size:

lightning-asr:
  environment:
    - BATCH_SIZE=1
    - MODEL_PRECISION=fp16

Container Keeps Restarting

Symptoms:

Container status shows Restarting
Logs show crash loop

Diagnosis:

View recent logs:

$ docker compose logs --tail=100 [service-name]

Solution 1: Check Exit Code

$ docker inspect [container-name] --format='{{.State.ExitCode}}'

Common exit codes:

137: Killed by SIGKILL (128 + 9), most often out of memory (OOMKilled)
139: Segmentation fault (128 + 11, SIGSEGV)
1: General error
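
Exit codes not listed here can often be decoded by hand: by convention, a code above 128 means the process was terminated by signal number (code - 128). A small sketch of that rule, with explain_exit_code as a hypothetical helper name:

```shell
# Hypothetical helper: turn a container exit code into a short description.
explain_exit_code() {
  local code="$1" sig name
  if [ "$code" -eq 0 ]; then
    echo "exited normally"
  elif [ "$code" -gt 128 ]; then
    sig=$(( code - 128 ))
    # "kill -l N" prints the signal name without the SIG prefix.
    name=$(kill -l "$sig" 2>/dev/null) || name="?"
    echo "killed by signal $sig (SIG$name)"
  else
    echo "application error (exit code $code)"
  fi
}
```

Usage: explain_exit_code "$(docker inspect [container-name] --format='{{.State.ExitCode}}')".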

Solution 2: Disable Auto-Restart

Temporarily disable restart to debug:

lightning-asr:
  restart: "no"

Start manually and watch logs:

$ docker compose up lightning-asr

Solution 3: Check Dependencies

Ensure required services are healthy:

$ docker compose ps

All services should show Up (healthy) or Up.

Slow Performance

Symptoms:

High latency (>500ms)
Low throughput
GPU underutilized

Diagnosis:

Monitor GPU usage:

$ watch -n 1 nvidia-smi

Check container resources:

$ docker stats

Solution 1: Optimize GPU Usage

Ensure GPU is not throttling:

$ nvidia-smi -q -d PERFORMANCE

Enable persistence mode:

$ sudo nvidia-smi -pm 1

Solution 2: Increase CPU Allocation

lightning-asr:
  deploy:
    resources:
      limits:
        cpus: '8'

Solution 3: Use Host Network

For maximum network performance, at the cost of network isolation:

api-server:
  network_mode: host

Solution 4: Optimize Redis

Disable RDB snapshots for speed (Redis data will not survive a restart):

redis:
  command: redis-server --save ""

Solution 5: Add More Workers

Scale Lightning ASR workers:

$ docker compose up -d --scale lightning-asr=2

Performance Optimization

Best Practices

Use Persistent Volumes

Cache models to avoid re-downloading:

volumes:
  - model-cache:/app/models

Enable GPU Persistence Mode

Reduces GPU initialization time:

$ sudo nvidia-smi -pm 1

Optimize Container Resources

Allocate appropriate CPU/memory:

deploy:
  resources:
    limits:
      cpus: '8'
      memory: 14G

Monitor and Tune

Use monitoring tools:

$ docker stats
$ nvidia-smi dmon

Benchmark Your Deployment

Test transcription performance:

$ time curl -X POST http://localhost:7100/v1/listen \
>   -H "Authorization: Token ${LICENSE_KEY}" \
>   -H "Content-Type: application/json" \
>   -d '{
>     "url": "https://example.com/test-audio-60s.wav"
>   }'

Expected performance:

Cold start: First request after container start (5-10 seconds)
Warm requests: Subsequent requests (50-200ms)
Real-time factor: 0.05-0.15x (60s audio in 3-9 seconds)
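
The real-time factor is the processing time divided by the audio duration, so the range above can be compared against the wall-clock time reported by the benchmark. A minimal sketch, with rtf as a hypothetical helper name:

```shell
# Hypothetical helper: real-time factor = processing seconds / audio seconds.
rtf() {
  awk -v p="$1" -v a="$2" 'BEGIN { printf "%.2f\n", p / a }'
}

rtf 6 60   # prints 0.10: a 60 s file transcribed in 6 s
```

Values within 0.05 to 0.15 match the expected range; a noticeably higher value suggests working through the Slow Performance steps.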

Debugging Tools

View All Logs

$ docker compose logs -f

Follow Specific Service

$ docker compose logs -f lightning-asr

Last N Lines

$ docker compose logs --tail=100 api-server

Save Logs to File

$ docker compose logs > deployment-logs.txt

Execute Commands in Container

$ docker compose exec lightning-asr bash

Check Container Configuration

$ docker inspect lightning-asr-1

Network Debugging

Test connectivity between containers:

$ docker compose exec api-server ping lightning-asr
$ docker compose exec api-server curl http://lightning-asr:2233/health

Health Checks

API Server

$ curl http://localhost:7100/health

Expected: {"status": "healthy"}

Lightning ASR

$ curl http://localhost:2233/health

Expected: {"status": "ready", "gpu": "NVIDIA A10"}

License Proxy

$ docker compose exec license-proxy wget -q -O- http://localhost:6699/health

Expected: {"status": "valid"}

Redis

$ docker compose exec redis redis-cli ping

Expected: PONG
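
The HTTP checks above can be combined into one pass that compares each response with its expected status. This is a sketch under assumptions: status_is and check_health are hypothetical helper names, the match is a simple text search rather than a JSON parse, and the endpoints are the ones listed above.

```shell
# Hypothetical helper: succeed if a response body contains "status": "<expected>".
status_is() {
  local body="$1" expected="$2"
  printf '%s' "$body" | grep -Eq "\"status\"[[:space:]]*:[[:space:]]*\"$expected\""
}

# Hypothetical helper: fetch URL and report whether NAME returned the expected status.
check_health() {
  local name="$1" url="$2" expected="$3" body
  body=$(curl -fsS --max-time 5 "$url") || { echo "$name: unreachable"; return 1; }
  if status_is "$body" "$expected"; then
    echo "$name: ok"
  else
    echo "$name: unexpected response: $body"
    return 1
  fi
}
```

Usage: check_health api-server http://localhost:7100/health healthy, then check_health lightning-asr http://localhost:2233/health ready. The license proxy and Redis checks still need docker compose exec, because they are run from inside their containers.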

Log Analysis

Common Log Patterns

Successful Startup

redis-1              | Ready to accept connections
license-proxy        | License validated successfully
lightning-asr-1      | Model loaded successfully
lightning-asr-1      | GPU: NVIDIA A10 (24GB)
lightning-asr-1      | Server ready on port 2233
api-server           | Connected to Lightning ASR
api-server           | API server listening on port 7100

Getting Help

Before Contacting Support

Collect the following information:

System Information

$ docker version
$ docker compose version
$ nvidia-smi
$ uname -a

Container Status

$ docker compose ps > status.txt
$ docker stats --no-stream > resources.txt

Logs

$ docker compose logs > all-logs.txt

Configuration

Sanitize and include:

docker-compose.yml
.env (remove license key)
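
Removing the license key from .env can be scripted so that the original file stays untouched. A minimal sketch, with sanitize_env as a hypothetical helper name and env-sanitized.txt as an example output file; it masks only LICENSE_KEY, so review the result for other sensitive values before sending it.

```shell
# Hypothetical helper: print FILE with the LICENSE_KEY value replaced by a placeholder.
sanitize_env() {
  sed -E 's/^([[:space:]]*LICENSE_KEY[[:space:]]*=).*/\1<redacted>/' "$1"
}
```

Usage: sanitize_env .env > env-sanitized.txt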

Contact Support

Email: support@smallest.ai

Include:

Description of the issue
Steps to reproduce
System information
Logs and configuration
License key (via secure channel)

What’s Next?

STT Configuration

Advanced configuration options

API Reference

Integrate with your applications
