Docker Troubleshooting

Common Issues

GPU Not Accessible

Symptoms:

  • Error: could not select device driver "nvidia"
  • Error: no NVIDIA GPU devices found
  • Lightning TTS fails to start

Diagnosis:

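Test GPU access from inside a container:

$docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi

If the test fails, restart Docker and bring the stack back up:

$sudo systemctl restart docker
$docker compose up -d

If it still fails, reinstall the NVIDIA Container Toolkit:

$sudo apt-get remove nvidia-container-toolkit
$sudo apt-get update
$sudo apt-get install -y nvidia-container-toolkit
$sudo systemctl restart docker

Check the driver version on the host:

$nvidia-smi

If the driver version is below 470, update the driver: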

$sudo ubuntu-drivers autoinstall
$sudo reboot
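
The version check can also be scripted. The sketch below is an illustration only: `driver_ok` is a hypothetical helper name, and it compares just the major component of the version string against the 470 threshold named above.

```shell
# Succeeds when the major component of a driver version string
# (for example "535.104.05") is at least the required minimum.
# The default minimum of 470 is the threshold from this guide.
driver_ok() {
  version="$1"
  minimum="${2:-470}"
  major="${version%%.*}"
  case "$major" in
    ''|*[!0-9]*) return 2 ;;  # not a plain number
  esac
  [ "$major" -ge "$minimum" ]
}

# Usage on a GPU host (requires nvidia-smi):
# driver_ok "$(nvidia-smi --query-gpu=driver_version --format=csv,noheader | head -n 1)" \
#   || echo "driver is below 470, update it"
```

The function returns 1 for a driver that is too old and 2 when the input does not look like a version number, so a missing `nvidia-smi` is distinguishable from an outdated driver.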

Verify /etc/docker/daemon.json contains:

{
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

Restart Docker after changes:

$sudo systemctl restart docker
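
A quick scripted check for the runtime entry might look like the following. This is a rough sketch: `has_nvidia_runtime` is a hypothetical helper, and it does a plain text match rather than a real JSON parse, so it will not catch syntax errors in the file.

```shell
# Succeeds when the given file contains a JSON object keyed "nvidia",
# as in the runtimes block shown above. Text match only, not a JSON parse.
has_nvidia_runtime() {
  grep -Eq '"nvidia"[[:space:]]*:[[:space:]]*\{' "$1"
}

# Usage:
# has_nvidia_runtime /etc/docker/daemon.json || echo "nvidia runtime entry is missing"
#
# After restarting Docker, the runtimes it actually registered can be listed with:
# docker info --format '{{json .Runtimes}}'
```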

License Validation Failed

Symptoms:

  • Error: License validation failed
  • Error: Invalid license key
  • Services fail to start

Diagnosis:

Check license-proxy logs:

$docker compose logs license-proxy

Check .env file:

$cat .env | grep LICENSE_KEY

Ensure there are no:

  • Extra spaces
  • Quotes around the key
  • Line breaks

Correct format:

LICENSE_KEY=abc123def456
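
The formatting rules above can be checked mechanically. The helper below is a minimal sketch, not part of the product: `check_license_line` is a hypothetical name, and it only checks the three problems listed (spaces, quotes, stray line-break characters), not whether the key itself is valid.

```shell
# Checks one line from .env against the formatting rules above.
# Prints "ok" and succeeds, or prints the problem and returns 1.
check_license_line() {
  line="$1"
  value="${line#LICENSE_KEY=}"
  # If stripping the prefix changed nothing, the prefix was not there
  # (this also catches "LICENSE_KEY = ..." with a space before "=").
  [ "$value" != "$line" ] || { echo "missing LICENSE_KEY= prefix"; return 1; }
  [ -n "$value" ] || { echo "empty key"; return 1; }
  # Whitespace (including a trailing carriage return) or quote characters.
  if printf '%s' "$value" | grep -q "[[:space:]\"']"; then
    echo "key contains whitespace or quotes"
    return 1
  fi
  echo "ok"
}

# Usage:
# check_license_line "$(grep LICENSE_KEY .env)"
```

A trailing carriage return usually means the file was saved with Windows line endings, which is easy to miss when viewing the file with `cat`.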

Test connection to license server:

$curl -v https://console-api.smallest.ai

If this fails, check:

  • Firewall rules
  • Proxy settings
  • DNS resolution

If the key appears correct and network is accessible, your license may be:

  • Expired
  • Revoked
  • Invalid

Contact support@smallest.ai with:

  • Your license key (via a secure channel)
  • License-proxy logs
  • Error messages

Model Loading Failed

Symptoms:

  • Lightning TTS stuck at startup
  • Error: Failed to load model
  • Container keeps restarting

Diagnosis:

Check Lightning TTS logs:

$docker compose logs lightning-tts

Verify GPU has enough VRAM:

$nvidia-smi

Lightning TTS requires a minimum of 16 GB of VRAM.

Models also require disk space. Check what is available:

$df -h

Free up space if needed. Note that this removes all stopped containers, unused networks, unused images, and build cache:

$docker system prune -a

Models may need more time to load. Increase the health check start period in docker-compose.yml:

lightning-tts:
  healthcheck:
    start_period: 120s

Port Already in Use

Symptoms:

  • Error: port is already allocated
  • Error: bind: address already in use

Diagnosis:

Find what’s using the port:

$sudo lsof -i :7100
$sudo netstat -tulpn | grep 7100

If another service is using the port:

$sudo systemctl stop [service-name]

Or kill the process:

$sudo kill -9 [PID]

Modify docker-compose.yml to map a different host port:

api-server:
  ports:
    - "8080:7100"

Then access the API at http://localhost:8080 instead.

Old containers may still hold the port. Remove them and start again:

$docker compose down
$docker container prune -f
$docker compose up -d
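
To check a port from a script before starting the stack, one option is to filter the output of `ss` (from iproute2), used here in place of `lsof` and `netstat`. This is a sketch under assumptions: `port_listening` is a hypothetical helper, and it expects the column layout of `ss -ltnH`, where the fourth column is the local address and port.

```shell
# Reads `ss -ltnH` output on stdin and succeeds when some TCP socket
# is listening on the given port. The port is the last ":"-separated
# piece of the fourth column, which also handles IPv6 such as [::]:7100.
port_listening() {
  awk -v port="$1" '
    { n = split($4, parts, ":"); if (parts[n] == port) found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Usage:
# ss -ltnH | port_listening 7100 && echo "port 7100 is already in use"
```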

Out of Memory

Symptoms:

  • Container killed unexpectedly
  • Error: OOMKilled
  • System becomes unresponsive

Diagnosis:

Check container status:

$docker compose ps
$docker inspect [container-name] | grep OOMKilled
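
To check every container in the project at once, the OOM flag can be read with a Go template instead of `grep`. The filter below is a small sketch: `oom_killed_names` is a hypothetical helper that expects one "name flag" pair per line, as produced by the `docker inspect` call in the usage comment.

```shell
# Reads lines of the form "/container-name true|false" on stdin and
# prints the names (without the leading slash) whose flag is "true".
oom_killed_names() {
  awk '$2 == "true" { sub(/^\//, "", $1); print $1 }'
}

# Usage (run from the directory containing docker-compose.yml):
# docker inspect --format '{{.Name}} {{.State.OOMKilled}}' $(docker compose ps -aq) \
#   | oom_killed_names
```

The `-a` flag on `docker compose ps` includes stopped containers, which matters here because a container killed for memory may no longer be running.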

Lightning TTS requires a minimum of 16 GB of RAM.

Check current memory:

$free -h

Prevent one service from consuming all memory:

services:
  lightning-tts:
    deploy:
      resources:
        limits:
          memory: 14G
        reservations:
          memory: 12G

Add swap space (temporary solution):

$sudo fallocate -l 16G /swapfile
$sudo chmod 600 /swapfile
$sudo mkswap /swapfile
$sudo swapon /swapfile

Slow Performance

Symptoms:

  • High latency (>500ms)
  • Low throughput
  • GPU underutilized

Diagnosis:

Monitor GPU usage:

$watch -n 1 nvidia-smi

Check container resources:

$docker stats

Ensure GPU is not throttling:

$nvidia-smi -q -d PERFORMANCE

Enable persistence mode:

$sudo nvidia-smi -pm 1

Set an explicit CPU limit for the service:

lightning-tts:
  deploy:
    resources:
      limits:
        cpus: '8'

Use Redis with snapshot persistence disabled for speed. Data held in Redis is then lost when Redis restarts:

redis:
  command: redis-server --save ""

Performance Optimization

Best Practices

1. Enable GPU Persistence Mode

Reduces GPU initialization time:

$sudo nvidia-smi -pm 1

2. Optimize Container Resources

Allocate appropriate CPU/memory:

deploy:
  resources:
    limits:
      cpus: '8'
      memory: 14G

3. Monitor and Tune

Use monitoring tools:

$docker stats
$nvidia-smi dmon

Benchmark Your Deployment

Test TTS performance:

$time curl -X POST http://localhost:7100/v1/speak \
  -H "Authorization: Token ${LICENSE_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "This is a test of the text-to-speech service.",
    "voice": "default"
  }'

Expected performance:

  • Cold start: First request after container start (5-10 seconds)
  • Warm requests: Subsequent requests (100-300ms)
  • Real-time factor: 0.1-0.3x
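
To compare a deployment against these figures, it helps to average several warm requests and to compute the real-time factor explicitly. The helpers below are a sketch: `mean` and `rtf` are hypothetical names, and the real-time factor is assumed to follow the usual definition of synthesis time divided by the duration of the generated audio, so 1 second of processing for 4 seconds of audio gives 0.25.

```shell
# Prints the arithmetic mean of the numbers on stdin (one per line),
# rounded to three decimals. Prints nothing for empty input.
mean() {
  awk '{ sum += $1; n++ } END { if (n > 0) printf "%.3f\n", sum / n }'
}

# Real-time factor: synthesis time divided by audio duration, both in seconds.
# Fails when the audio duration is not positive.
rtf() {
  awk -v t="$1" -v d="$2" 'BEGIN { if (d <= 0) exit 1; printf "%.2f\n", t / d }'
}

# Usage: average latency of five requests, in seconds. Send one request
# first and discard it so the cold start does not skew the result.
# for i in 1 2 3 4 5; do
#   curl -s -o /dev/null -w '%{time_total}\n' -X POST http://localhost:7100/v1/speak \
#     -H "Authorization: Token ${LICENSE_KEY}" \
#     -H "Content-Type: application/json" \
#     -d '{"text": "This is a test of the text-to-speech service.", "voice": "default"}'
# done | mean
```

`curl -w '%{time_total}'` reports the total transfer time per request, which avoids parsing the output of `time`.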

Debugging Tools

View All Logs

$docker compose logs -f

Follow Specific Service

$docker compose logs -f lightning-tts

Last N Lines

$docker compose logs --tail=100 api-server

Save Logs to File

$docker compose logs > deployment-logs.txt

Execute Commands in Container

$docker compose exec lightning-tts bash

Check Container Configuration

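$docker inspect lightning-tts

If the container name differs from the service name, resolve it first:

$docker inspect $(docker compose ps -q lightning-tts)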

Network Debugging

Test connectivity between containers:

$docker compose exec api-server ping lightning-tts
$docker compose exec api-server curl http://lightning-tts:8876/health

Health Checks

API Server

$curl http://localhost:7100/health

Expected: {"status": "healthy"}

Lightning TTS

$curl http://localhost:8876/health

Expected: {"status": "ready", "gpu": "NVIDIA A10"}

License Proxy

$docker compose exec license-proxy wget -q -O- http://localhost:3369/health

Expected: {"status": "valid"}

Redis

$docker compose exec redis redis-cli ping

Expected: PONG
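
Right after startup these endpoints may fail until models have loaded, so a scripted check usually needs to retry. The helper below is a generic sketch: `retry` is a hypothetical name, and the attempt count and delay in the usage comment are arbitrary values, not recommendations from this guide.

```shell
# Runs a command up to <tries> times, sleeping <delay> seconds between
# attempts. Succeeds as soon as the command succeeds; returns 1 if
# every attempt fails.
retry() {
  tries="$1"
  delay="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$tries" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -lt "$tries" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Usage: wait up to about 150 seconds for each service.
# retry 30 5 curl -fsS http://localhost:7100/health
# retry 30 5 curl -fsS http://localhost:8876/health
# retry 30 5 docker compose exec redis redis-cli ping
```

The `-f` flag makes `curl` return a non-zero exit code on HTTP errors, so an error response counts as a failed attempt.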

Getting Help

Before Contacting Support

Collect the following information:

1. System Information

$docker version
$docker compose version
$nvidia-smi
$uname -a

2. Container Status

$docker compose ps > status.txt
$docker stats --no-stream > resources.txt

3. Logs

$docker compose logs > all-logs.txt

4. Configuration

Sanitize and include:

  • docker-compose.yml
  • .env (remove license key)
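
Removing the key by hand is easy to forget, so the copy sent to support can be generated instead. This is a minimal sketch: `sanitize_env` is a hypothetical helper and `REDACTED` is an arbitrary placeholder. It only masks `LICENSE_KEY`, so review the output for any other secrets your .env contains.

```shell
# Prints the given env file with the LICENSE_KEY value replaced by a
# placeholder. Other lines pass through unchanged; the original file
# is not modified.
sanitize_env() {
  sed -E 's/^([[:space:]]*LICENSE_KEY[[:space:]]*=).*/\1REDACTED/' "$1"
}

# Usage:
# sanitize_env .env > env-sanitized.txt
```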

Contact Support

Email: support@smallest.ai

Include:

  • Description of the issue
  • Steps to reproduce
  • System information
  • Logs and configuration
  • License key (via secure channel)
