Docker Troubleshooting | Smallest AI Docs

Common Issues

GPU Not Accessible

Symptoms:

Error: could not select device driver "nvidia"
Error: no NVIDIA GPU devices found
Lightning TTS fails to start

Diagnosis:

$ docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

Solution 1: Restart Docker

$ sudo systemctl restart docker
$ docker compose up -d

Solution 2: Reinstall NVIDIA Container Toolkit

$ sudo apt-get remove nvidia-container-toolkit
$ sudo apt-get update
$ sudo apt-get install -y nvidia-container-toolkit
$ sudo systemctl restart docker

Solution 3: Update NVIDIA Driver

$ nvidia-smi

If the reported driver version is below 470, update the driver:

$ sudo ubuntu-drivers autoinstall
$ sudo reboot
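
The version comparison above can be scripted. This is a minimal sketch that compares only the major version number; the helper name `driver_at_least` is illustrative, and the usage line assumes `nvidia-smi` is installed on the host.

```shell
# Returns 0 when the driver's major version is at least the minimum
# (470 by default, the threshold named in this guide).
driver_at_least() (
    version=$1             # e.g. "535.104.05"
    minimum=${2:-470}
    major=${version%%.*}   # keep everything before the first dot
    case $major in
        ''|*[!0-9]*)
            echo "unrecognized driver version: $version" >&2
            return 2 ;;
    esac
    [ "$major" -ge "$minimum" ]
)

# Usage (on the GPU host):
#   v=$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)
#   driver_at_least "$v" || echo "Driver $v is older than 470; update it."
```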

Solution 4: Check Docker Daemon Configuration

Verify /etc/docker/daemon.json contains:

{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

Restart Docker after changes:

$ sudo systemctl restart docker

License Validation Failed

Symptoms:

Error: License validation failed
Error: Invalid license key
Services fail to start

Diagnosis:

Check license-proxy logs:

$ docker compose logs license-proxy

Solution 1: Verify License Key

Check .env file:

$ grep LICENSE_KEY .env

Ensure there are no:

Extra spaces
Quotes around the key
Line breaks

Correct format:

LICENSE_KEY=abc123def456
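
The formatting rules above are easy to miss by eye, especially a trailing carriage return from a Windows editor. The following sketch checks a single line against them; `check_license_line` is a hypothetical helper, and it only checks formatting, not whether the key is valid.

```shell
# Prints "ok" or a description of the first formatting problem found.
check_license_line() (
    line=$1
    cr=$(printf '\r')
    case $line in
        LICENSE_KEY=*) value=${line#LICENSE_KEY=} ;;
        *) echo "line does not start with LICENSE_KEY="; return 1 ;;
    esac
    case $value in
        '')            echo "key is empty"; return 1 ;;
        *"$cr"*)       echo "key contains a carriage return"; return 1 ;;
        *\"*|*\'*)     echo "key contains quotes"; return 1 ;;
        *[[:space:]]*) echo "key contains whitespace"; return 1 ;;
    esac
    echo "ok"
)

# Usage:
#   check_license_line "$(grep LICENSE_KEY .env)"
```

If the .env file holds more than one matching line, the embedded newline is reported as whitespace, which is also worth fixing.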

Solution 2: Check Network Connectivity

Test connection to license server:

$ curl -v https://api.smallest.ai

If this fails, check:

Firewall rules
Proxy settings
DNS resolution
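
curl's exit code often hints at which of these three areas to look at first. The mapping below is a rough heuristic, not a diagnosis: the exit codes are curl's documented ones, but the suggested causes are assumptions, and `explain_curl_exit` is an illustrative helper name.

```shell
# Maps a curl exit code to the most likely area to investigate.
explain_curl_exit() {
    case $1 in
        0)     echo "connection succeeded" ;;
        5)     echo "proxy host could not be resolved: check proxy settings" ;;
        6)     echo "host could not be resolved: check DNS resolution" ;;
        7)     echo "could not connect: check firewall rules" ;;
        28)    echo "timed out: check firewall rules and proxy settings" ;;
        35|60) echo "TLS failure: check for an intercepting proxy or outdated CA certificates" ;;
        *)     echo "curl exit code $1: see EXIT CODES in 'man curl'" ;;
    esac
}

# Usage:
#   curl -sS -o /dev/null --max-time 10 https://api.smallest.ai
#   explain_curl_exit $?
```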

Solution 3: Contact Support

If the key appears correct and the license server is reachable, your license may be:

Expired
Revoked
Invalid

Contact support@smallest.ai with:

Your license key (via a secure channel)
License-proxy logs
Error messages

Model Loading Failed

Symptoms:

Lightning TTS stuck at startup
Error: Failed to load model
Container keeps restarting

Diagnosis:

Check Lightning TTS logs:

$ docker compose logs lightning-tts

Solution 1: Check GPU Memory

Verify GPU has enough VRAM:

$ nvidia-smi

Lightning TTS requires a minimum of 16 GB of VRAM.
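
On hosts with several GPUs, the VRAM check can be scripted per device. This sketch assumes the totals come from nvidia-smi's `--query-gpu=memory.total` option, which reports MiB. The default threshold of 16000 MiB is an assumption: cards marketed as 16 GB can report slightly less, so pass your own value if needed. `check_vram` is a hypothetical helper name.

```shell
# Reads one total-memory value in MiB per line (one line per GPU, numbered
# from 0 in input order) and fails if any GPU is below the threshold.
check_vram() {
    awk -v min="${1:-16000}" '
        {
            status = ($1 + 0 >= min) ? "ok" : "LOW"
            printf "GPU %d: %d MiB %s\n", NR - 1, $1, status
            if (status == "LOW") bad = 1
        }
        END { exit (bad ? 1 : 0) }'
}

# Usage:
#   nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits | check_vram
```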

Solution 2: Check Disk Space

Model files need free disk space. Check what is available:

$ df -h

Free up space if needed. Note that this removes all stopped containers, unused networks, unused images, and build cache:

$ docker system prune -a
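
To check free space from a script before starting the stack, something like the following may help. It is a sketch only: this guide does not state how much space the models need, so the threshold in the usage line is a placeholder, and `avail_gib` and `has_space` are hypothetical helpers. `/var/lib/docker` is Docker's default data directory on Linux.

```shell
# Prints the space available on the filesystem holding the given path,
# in whole GiB (POSIX df output, 1024-byte blocks).
avail_gib() {
    df -Pk "${1:-.}" | awk 'NR == 2 { printf "%d\n", $4 / 1024 / 1024 }'
}

# Returns 0 when at least the given number of GiB is available.
has_space() {
    [ "$(avail_gib "$1")" -ge "$2" ]
}

# Usage (50 is a placeholder, not a documented requirement):
#   has_space /var/lib/docker 50 || echo "Low disk space on the Docker data directory"
```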

Solution 3: Increase Startup Time

Models may need more time to load:

lightning-tts:
  healthcheck:
    start_period: 120s
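
When choosing a start_period value, it helps to measure how long the service actually takes to become healthy. The polling loop below is a minimal sketch; `wait_for` is a hypothetical helper, and the usage line assumes the Lightning TTS health endpoint on port 8876 listed under Health Checks on this page.

```shell
# Runs a command repeatedly until it succeeds or the timeout (in seconds)
# has elapsed. Arguments: TIMEOUT INTERVAL COMMAND [ARGS...]
wait_for() (
    timeout=$1
    interval=$2
    shift 2
    elapsed=0
    until "$@" >/dev/null 2>&1; do
        [ "$elapsed" -ge "$timeout" ] && return 1
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
)

# Usage:
#   wait_for 120 5 curl -fsS http://localhost:8876/health \
#     || docker compose logs --tail=50 lightning-tts
```

If the loop regularly times out at 120 seconds, a longer start_period is probably warranted.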

Port Already in Use

Symptoms:

Error: port is already allocated
Error: bind: address already in use

Diagnosis:

Find what’s using the port:

$ sudo lsof -i :7100
$ sudo netstat -tulpn | grep 7100
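
A plain `grep 7100` also matches unrelated lines, such as port 17100 or a process whose PID is 7100. The filter below is a small sketch that matches the local port exactly; it assumes the column layout printed by `netstat -tulpn`, and `listeners_on_port` is an illustrative name.

```shell
# Filters `netstat -tulpn` output for sockets bound to exactly the given
# port and prints protocol, local address, and PID/program.
listeners_on_port() {
    awk -v port="$1" '$4 ~ (":" port "$") { print $1, $4, $NF }'
}

# Usage:
#   sudo netstat -tulpn | listeners_on_port 7100
```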

Solution 1: Stop Conflicting Service

If another service is using the port:

$ sudo systemctl stop [service-name]

Or kill the process:

$ sudo kill -9 [PID]

Solution 2: Change Port

Modify docker-compose.yml to map a different host port:

api-server:
  ports:
    - "8080:7100"

Access the API at http://localhost:8080 instead.

Solution 3: Remove Old Containers

Old containers may still be bound:

$ docker compose down
$ docker container prune -f
$ docker compose up -d

Out of Memory

Symptoms:

Container killed unexpectedly
Error: OOMKilled
System becomes unresponsive

Diagnosis:

Check container status:

$ docker compose ps
$ docker inspect [container-name] | grep OOMKilled
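
Instead of grepping the full JSON, `docker inspect` can print just the relevant fields through its `--format` option. The helpers below are a sketch; `oom_report` and `oom_killed_only` are hypothetical names. An exit code of 137 usually accompanies a container that was killed by SIGKILL, which is what the kernel's OOM killer sends.

```shell
# Prints one line per container with its OOMKilled flag and exit code.
oom_report() {
    docker inspect \
        --format '{{.Name}} OOMKilled={{.State.OOMKilled}} ExitCode={{.State.ExitCode}}' \
        "$@"
}

# Prints only the names of containers that were OOM-killed.
oom_killed_only() {
    oom_report "$@" | awk '/OOMKilled=true/ { print $1 }'
}

# Usage (-a includes stopped containers, -q prints only their IDs):
#   oom_killed_only $(docker compose ps -aq)
```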

Solution 1: Increase System Memory

Lightning TTS requires a minimum of 16 GB of RAM.

Check current memory:

$ free -h

Solution 2: Add Memory Limits

Prevent one service from consuming all memory:

services:
  lightning-tts:
    deploy:
      resources:
        limits:
          memory: 14G
        reservations:
          memory: 12G

Solution 3: Enable Swap

Add swap space (temporary solution):

$ sudo fallocate -l 16G /swapfile
$ sudo chmod 600 /swapfile
$ sudo mkswap /swapfile
$ sudo swapon /swapfile

Slow Performance

Symptoms:

High latency (>500ms)
Low throughput
GPU underutilized

Diagnosis:

Monitor GPU usage:

$ watch -n 1 nvidia-smi

Check container resources:

$ docker stats

Solution 1: Optimize GPU Usage

Ensure GPU is not throttling:

$ nvidia-smi -q -d PERFORMANCE

Enable persistence mode:

$ sudo nvidia-smi -pm 1

Solution 2: Increase CPU Allocation

lightning-tts:
  deploy:
    resources:
      limits:
        cpus: '8'

Solution 3: Optimize Redis

Use Redis with persistence disabled for speed:

redis:
  command: redis-server --save ""

Performance Optimization

Best Practices

Enable GPU Persistence Mode

Reduces GPU initialization time:

$ sudo nvidia-smi -pm 1

Optimize Container Resources

Allocate appropriate CPU/memory:

deploy:
  resources:
    limits:
      cpus: '8'
      memory: 14G

Monitor and Tune

Use monitoring tools:

$ docker stats
$ nvidia-smi dmon

Benchmark Your Deployment

Test TTS performance:

$ time curl -X POST http://localhost:7100/v1/speak \
>   -H "Authorization: Token ${LICENSE_KEY}" \
>   -H "Content-Type: application/json" \
>   -d '{
>     "text": "This is a test of the text-to-speech service.",
>     "voice": "default"
>   }'

Expected performance:

Cold start: First request after container start (5-10 seconds)
Warm requests: Subsequent requests (100-300ms)
Real-time factor: 0.1-0.3x
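
The real-time factor is conventionally the synthesis time divided by the duration of the audio produced, so values below 1 mean faster than real time. The sketch below assumes that definition applies here; `rtf` is an illustrative helper, and the audio duration has to be measured separately from the returned file.

```shell
# Real-time factor = processing time / audio duration (both in seconds).
rtf() {
    awk -v proc="$1" -v audio="$2" 'BEGIN {
        if (audio + 0 <= 0) {
            print "audio duration must be positive" > "/dev/stderr"
            exit 1
        }
        printf "%.2f\n", proc / audio
    }'
}

# Usage: a clip lasting 2.5 s that took 0.5 s to synthesize
#   rtf 0.5 2.5    # prints 0.20
```

The processing time can be taken from the `real` figure reported by `time`, or from curl's `-w '%{time_total}'` option.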

Debugging Tools

View All Logs

$ docker compose logs -f

Follow Specific Service

$ docker compose logs -f lightning-tts

Last N Lines

$ docker compose logs --tail=100 api-server

Save Logs to File

$ docker compose logs > deployment-logs.txt

Execute Commands in Container

$ docker compose exec lightning-tts bash

Check Container Configuration

$ docker inspect lightning-tts

Network Debugging

Test connectivity between containers:

$ docker compose exec api-server ping lightning-tts
$ docker compose exec api-server curl http://lightning-tts:8876/health

Health Checks

API Server

$ curl http://localhost:7100/health

Expected: {"status": "healthy"}

Lightning TTS

$ curl http://localhost:8876/health

Expected: {"status": "ready", "gpu": "NVIDIA A10"}

License Proxy

$ docker compose exec license-proxy wget -q -O- http://localhost:3369/health

Expected: {"status": "valid"}

Redis

$ docker compose exec redis redis-cli ping

Expected: PONG
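
The four checks above can be run in one pass. This is a minimal sketch using the same endpoints and service names as this section; `check` and `run_health_checks` are hypothetical helpers. It looks only at each command's exit status, not at the response body, so compare the bodies by hand when a result looks suspicious.

```shell
# Runs one command and prints "ok" or "FAIL" next to its label.
check() {
    label=$1
    shift
    if "$@" >/dev/null 2>&1; then
        echo "ok    $label"
    else
        echo "FAIL  $label"
        return 1
    fi
}

# Runs every health check and returns 1 if any of them failed.
# -T disables pseudo-TTY allocation so exec works from scripts.
run_health_checks() {
    failed=0
    check "api-server"    curl -fsS http://localhost:7100/health || failed=1
    check "lightning-tts" curl -fsS http://localhost:8876/health || failed=1
    check "license-proxy" docker compose exec -T license-proxy \
        wget -q -O- http://localhost:3369/health || failed=1
    check "redis"         docker compose exec -T redis redis-cli ping || failed=1
    return "$failed"
}

# Usage:
#   run_health_checks || docker compose ps
```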

Getting Help

Before Contacting Support

Collect the following information:

System Information

$ docker version
$ docker compose version
$ nvidia-smi
$ uname -a

Container Status

$ docker compose ps > status.txt
$ docker stats --no-stream > resources.txt

Logs

$ docker compose logs > all-logs.txt

Configuration

Sanitize and include:

docker-compose.yml
.env (remove license key)
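
Removing the key by hand is error-prone, so a small filter can do it. This sketch masks only the LICENSE_KEY entry; `redact_env` is a hypothetical helper, and any other secrets in your files still need manual review.

```shell
# Replaces the value of LICENSE_KEY with a placeholder and passes every
# other line through unchanged.
redact_env() {
    sed -E 's/^([[:space:]]*LICENSE_KEY[[:space:]]*=).*/\1<redacted>/'
}

# Usage:
#   redact_env < .env > env-sanitized.txt
#   grep -n LICENSE_KEY docker-compose.yml   # confirm the key is not hard-coded there
```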

Contact Support

Email: support@smallest.ai

Include:

Description of the issue
Steps to reproduce
System information
Logs and configuration
License key (via secure channel)
