Services Overview
Architecture
The Docker deployment consists of four main services that work together: API Server, Lightning ASR, License Proxy, and Redis.
API Server
The API Server is the main entry point for all client requests.
Purpose
- Routes incoming API requests to Lightning ASR workers
- Manages WebSocket connections for streaming
- Handles request queuing and load balancing
- Provides a unified API
Container Details
Image: quay.io/smallestinc/self-hosted-api-server:latest
Port: 7100 - Main API endpoint
Resources:
- CPU: 0.5-2 cores
- Memory: 512 MB - 2 GB
- No GPU required
Key Endpoints
Environment Variables
Logs
Key log messages:
Dependencies
- Requires Lightning ASR to be running
- Requires License Proxy for validation
- Optionally uses Redis for request coordination
Lightning ASR
The core speech recognition engine powered by GPU acceleration.
Purpose
- Performs audio-to-text transcription
- Processes both batch and streaming requests
- Manages GPU resources and model inference
- Handles audio preprocessing and postprocessing
Container Details
Image: quay.io/smallestinc/lightning-asr:latest
Port: 2233 - ASR service endpoint
Resources:
- CPU: 4-8 cores
- Memory: 12-16 GB
- GPU: 1x NVIDIA GPU (16+ GB VRAM)
GPU Requirements
Lightning ASR requires an NVIDIA GPU with CUDA support.
Environment Variables
Model Loading
On first startup, Lightning ASR:
- Downloads model from MODEL_URL (~20 GB)
- Validates model integrity
- Loads model into GPU memory
- Performs warmup inference
Use persistent volumes to cache models and avoid re-downloading on container restart.
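The benefit of a persistent volume can be illustrated with a minimal sketch of a download-once cache. This is not Lightning ASR's actual loader: the function names, the fetch callback, and the SHA-256 integrity check are all assumptions made for illustration.

```python
import hashlib
from pathlib import Path
from typing import Callable


def ensure_model(cache_dir: Path, filename: str, fetch: Callable[[Path], None]) -> bool:
    """Return True if a download was needed, False if the cached file was reused."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / filename
    if target.exists() and target.stat().st_size > 0:
        return False  # already on the persistent volume, skip the download
    tmp = target.with_name(target.name + ".part")
    fetch(tmp)           # hypothetical downloader writes to a temporary file
    tmp.replace(target)  # rename so a partial download is never treated as complete
    return True


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks; one possible way to validate integrity (assumed)."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

When cache_dir points at a mounted volume, a second container start finds the file and skips the ~20 GB download. Without a volume, the directory is empty on every restart.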
Logs
Key log messages:
Performance
Typical performance metrics:
Dependencies
- Requires License Proxy for validation
- Requires Redis for request coordination
- Requires NVIDIA GPU
License Proxy
Validates license keys and reports usage to Smallest servers.
Purpose
- Validates license keys on startup
- Reports usage metadata to Smallest
- Provides grace period for offline operation
- Acts as licensing gateway for all services
Container Details
Image: quay.io/smallestinc/license-proxy:latest
Port: 6699 - License validation endpoint (internal)
Resources:
- CPU: 0.25-1 core
- Memory: 256-512 MB
- No GPU required
Environment Variables
Network Requirements
License Proxy requires outbound HTTPS access to:
api.smallest.ai on port 443
Ensure your firewall allows these connections.
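A quick way to confirm the firewall rule from the Docker host is to attempt a TCP connection to the license endpoint. The sketch below is a minimal probe, not part of the product. It checks only that the port is reachable, not that TLS or license validation succeeds, and the function name is an assumption.

```python
import socket

LICENSE_HOST = "api.smallest.ai"  # host named in the network requirements
LICENSE_PORT = 443


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling can_reach(LICENSE_HOST, LICENSE_PORT) from the host that runs License Proxy returns False when outbound traffic is blocked or DNS resolution fails.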
Validation Process
- On startup, validates license key with Smallest servers
- Receives license terms and quotas
- Caches the validation result (valid for the grace period)
- Periodically reports usage metadata
Usage Reporting
License Proxy reports only metadata:
No audio or transcript data is transmitted to Smallest servers.
Offline Mode
If connection to license server fails:
- Uses cached validation (24-hour grace period)
- Continues serving requests
- Logs warning messages
- Retries connection periodically
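The offline behavior above amounts to a simple time comparison. The sketch below illustrates that rule under stated assumptions: the function name and its parameters are hypothetical, and treating the boundary as inclusive is a guess, not documented behavior.

```python
from datetime import datetime, timedelta, timezone

GRACE_PERIOD = timedelta(hours=24)  # grace period stated in the text


def license_usable(last_validated: datetime, now: datetime, online: bool) -> bool:
    """Decide whether requests may still be served."""
    if online:
        return True  # a fresh validation against the license server succeeded
    # Offline: fall back to the cached validation while it is inside the grace period.
    return now - last_validated <= GRACE_PERIOD


last_ok = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(license_usable(last_ok, last_ok + timedelta(hours=23), online=False))  # True
print(license_usable(last_ok, last_ok + timedelta(hours=25), online=False))  # False
```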
Logs
Key log messages:
Redis
Provides caching and state management for the system.
Purpose
- Request queuing and coordination
- Session state for streaming connections
- Caching of frequent requests
- Performance optimization
Container Details
Image: redis:latest or redis:7-alpine
Port: 6379 - Redis protocol
Resources:
- CPU: 0.5-1 core
- Memory: 512 MB - 1 GB
- No GPU required
Configuration Options
Embedded Redis
Default configuration with minimal setup:
With Persistence
With Authentication
External Redis
Data Stored
Redis stores:
- Request queue state
- WebSocket session data
- Temporary audio chunks (streaming)
- Worker status and health
Data in Redis is temporary and can be safely cleared. No persistent state is stored.
Health Check
Built-in health check:
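An equivalent probe from outside the container can be sketched in Python. It sends an inline PING over a plain socket and expects the standard +PONG reply. The function name and defaults are assumptions, and on an instance with authentication enabled the server returns an error instead, so the probe reports False.

```python
import socket


def redis_ping(host: str = "localhost", port: int = 6379, timeout: float = 2.0) -> bool:
    """Send an inline PING and return True if the server answers +PONG."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            conn.sendall(b"PING\r\n")
            reply = conn.recv(64)
    except OSError:
        # Connection refused, timed out, or host not resolvable.
        return False
    return reply.startswith(b"+PONG")
```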
Service Dependencies
Startup order and dependencies:
Recommended Startup Sequence
- Redis - Starts immediately (5 seconds)
- License Proxy - Validates license (10-15 seconds)
- Lightning ASR - Downloads/loads model (30-600 seconds)
- API Server - Connects to services (5-10 seconds)
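The sequence above can be checked from the host with a small script that waits for each service port in order. This is a minimal sketch. It assumes the ports are published on localhost, the service labels are illustrative, and an open port does not guarantee that Lightning ASR has finished loading its model.

```python
import socket
import time

# (label, port, upper startup estimate in seconds) taken from the sequence above
STARTUP_ORDER = [
    ("redis", 6379, 5),
    ("license-proxy", 6699, 15),
    ("lightning-asr", 2233, 600),
    ("api-server", 7100, 10),
]


def wait_for_port(host: str, port: int, deadline_s: float, interval_s: float = 1.0) -> bool:
    """Poll host:port until it accepts a TCP connection or the deadline passes."""
    end = time.monotonic() + deadline_s
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval_s):
                return True
        except OSError:
            if time.monotonic() >= end:
                return False
            time.sleep(interval_s)


def wait_for_stack(host: str = "localhost", slack: float = 2.0) -> list[str]:
    """Return the labels of services that did not open their port in time."""
    late = []
    for name, port, budget in STARTUP_ORDER:
        # Allow some slack on top of the documented estimate.
        if not wait_for_port(host, port, budget * slack):
            late.append(name)
    return late
```

An empty list from wait_for_stack() means all four ports opened within twice their estimated startup time.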
Resource Planning
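As a starting point for sizing, the per-service ranges from the Container Details sections can be summed. The sketch below does that arithmetic, treating 512 MB as 0.5 GB and 256 MB as 0.25 GB. Scaling only Lightning ASR with the worker count, at one GPU per worker, is an assumption made for illustration.

```python
# (cpu_min, cpu_max, mem_min_gb, mem_max_gb) per service, from Container Details
SERVICES = {
    "api-server": (0.5, 2.0, 0.5, 2.0),
    "lightning-asr": (4.0, 8.0, 12.0, 16.0),
    "license-proxy": (0.25, 1.0, 0.25, 0.5),
    "redis": (0.5, 1.0, 0.5, 1.0),
}


def totals(asr_workers: int = 1) -> dict:
    """Sum CPU and memory ranges across services for a given number of ASR workers."""
    out = {"cpu_min": 0.0, "cpu_max": 0.0, "mem_min_gb": 0.0, "mem_max_gb": 0.0}
    for name, (cpu_min, cpu_max, mem_min, mem_max) in SERVICES.items():
        count = asr_workers if name == "lightning-asr" else 1
        out["cpu_min"] += count * cpu_min
        out["cpu_max"] += count * cpu_max
        out["mem_min_gb"] += count * mem_min
        out["mem_max_gb"] += count * mem_max
    out["gpus"] = asr_workers  # assumed: one GPU per ASR worker
    return out


print(totals())
# {'cpu_min': 5.25, 'cpu_max': 12.0, 'mem_min_gb': 13.25, 'mem_max_gb': 19.5, 'gpus': 1}
```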
Minimum Configuration
For development/testing:
Production Configuration
For production workloads:
Multi-Worker Configuration
For high-volume production:
Monitoring
Container Health
Check container status:
Resource Usage
Monitor resource consumption:
GPU Usage
Monitor GPU utilization:
Logs
View service logs:
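The checks in this section map onto standard docker and nvidia-smi invocations. The sketch below wraps them in Python as one possible way to collect the output in a script. The helper names are assumptions, and the container name passed to logs_command is whatever name your deployment uses.

```python
import shutil
import subprocess

COMMANDS = {
    "containers": ["docker", "ps"],                  # container status
    "resources": ["docker", "stats", "--no-stream"],  # one-shot CPU and memory snapshot
    "gpu": ["nvidia-smi"],                           # GPU utilization and memory
}


def logs_command(container: str, tail: int = 100) -> list[str]:
    """Build a docker logs command for a container (name is deployment-specific)."""
    return ["docker", "logs", "--tail", str(tail), container]


def run(cmd: list[str]) -> str:
    """Run a command and return its output, or a note if the binary is missing."""
    if shutil.which(cmd[0]) is None:
        return f"{cmd[0]} not found on PATH"
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=30)
    return result.stdout or result.stderr
```

For example, run(COMMANDS["gpu"]) returns the nvidia-smi report as text on a host with the NVIDIA driver installed.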

