Architecture Overview | Smallest AI Docs

System Architecture

The API server routes requests to Lightning ASR/TTS workers, manages WebSocket connections, and provides a unified REST API interface.

Resources: 0.5-2 CPU cores, 512 MB - 2 GB RAM, no GPU

Lightning ASR is a GPU-accelerated speech-to-text engine with a real-time factor of 0.05-0.15x. It supports real-time and batch transcription.

Resources: 4-8 CPU cores, 12-16 GB RAM, 1x NVIDIA GPU (16+ GB VRAM)
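
To make the real-time factor concrete: it is conventionally defined as processing time divided by audio duration, so values below 1 mean faster than real time. The sketch below assumes that definition and uses a hypothetical helper name; it converts the quoted 0.05-0.15x range into estimated processing times for a clip.

```python
def processing_time_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimated processing time for a clip, given a real-time factor (RTF).

    Assumes RTF = processing time / audio duration, so lower is faster.
    """
    if audio_seconds < 0 or rtf <= 0:
        raise ValueError("audio_seconds must be >= 0 and rtf must be > 0")
    return audio_seconds * rtf


# RTF range quoted for the speech-to-text engine.
RTF_BEST, RTF_WORST = 0.05, 0.15

if __name__ == "__main__":
    clip = 60.0  # one minute of audio
    best = processing_time_seconds(clip, RTF_BEST)
    worst = processing_time_seconds(clip, RTF_WORST)
    print(f"{clip:.0f} s of audio: roughly {best:.0f}-{worst:.0f} s of processing")
```

Under that assumption, one minute of audio corresponds to roughly 3 to 9 seconds of processing. Actual latency also depends on queuing and network overhead, which this estimate ignores.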

Lightning TTS is a GPU-accelerated text-to-speech engine for natural voice synthesis. It supports streaming and batch generation.

Resources: 4-8 CPU cores, 12-16 GB RAM, 1x NVIDIA GPU (16+ GB VRAM)

The licensing component validates license keys and reports usage metadata. It supports offline grace periods.

Resources: 0.25-1 CPU core, 256-512 MB RAM, no GPU

The queue and cache layer handles request queuing, session state, and caching. It can run embedded or as an external service (for example, ElastiCache).

Resources: 0.5-1 CPU core, 512 MB - 2 GB RAM, no GPU
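
For capacity planning, the per-component ranges above can be added up. The following is a minimal sketch, assuming exactly one replica of each component and treating 256 MB and 512 MB as 0.25 GB and 0.5 GB; the dictionary keys and the `Resources` type are illustrative names, not part of the product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpu_min: float      # CPU cores
    cpu_max: float
    ram_min_gb: float   # RAM in GB
    ram_max_gb: float
    gpus: int = 0


# Ranges copied from the component descriptions. Keys are illustrative
# labels, not official service names.
COMPONENTS = {
    "api_server": Resources(0.5, 2, 0.5, 2),
    "speech_to_text": Resources(4, 8, 12, 16, gpus=1),
    "text_to_speech": Resources(4, 8, 12, 16, gpus=1),
    "licensing": Resources(0.25, 1, 0.25, 0.5),
    "queue_cache": Resources(0.5, 1, 0.5, 2),
}


def total(components: dict[str, Resources]) -> Resources:
    """Sum the ranges, assuming exactly one replica of every component."""
    parts = list(components.values())
    return Resources(
        cpu_min=sum(p.cpu_min for p in parts),
        cpu_max=sum(p.cpu_max for p in parts),
        ram_min_gb=sum(p.ram_min_gb for p in parts),
        ram_max_gb=sum(p.ram_max_gb for p in parts),
        gpus=sum(p.gpus for p in parts),
    )


if __name__ == "__main__":
    t = total(COMPONENTS)
    print(f"CPU: {t.cpu_min}-{t.cpu_max} cores")
    print(f"RAM: {t.ram_min_gb}-{t.ram_max_gb} GB")
    print(f"GPUs: {t.gpus}")
```

With one replica each, this comes to 9.25-20 CPU cores, 25.25-36.5 GB of RAM, and 2 GPUs. Running several ASR or TTS workers, or using an external cache, changes the totals accordingly.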

1. Client Request: Your application sends audio (STT) or text (TTS) via HTTP or WebSocket.
2. API Server: Routes the request to the appropriate worker and validates the license.
3. Worker Processing: Lightning ASR or TTS processes the request on GPU.
4. Response: Results stream back through the API server to your application.
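
The four steps above can be sketched as a small dispatcher. This is an illustration only: the `Request` type, the worker functions, and the `handle` function are hypothetical stand-ins, the license check is reduced to a boolean, and performing that check before routing is an assumption rather than documented behavior.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Request:
    kind: str       # "stt" (audio in) or "tts" (text in)
    payload: bytes


def asr_worker(payload: bytes) -> str:
    # Stand-in for the GPU speech-to-text worker.
    return f"transcript of {len(payload)} audio bytes"


def tts_worker(payload: bytes) -> str:
    # Stand-in for the GPU text-to-speech worker.
    return f"audio for {len(payload)} text bytes"


# Routing table: request kind -> worker (hypothetical).
WORKERS: Dict[str, Callable[[bytes], str]] = {
    "stt": asr_worker,
    "tts": tts_worker,
}


def handle(request: Request, license_valid: bool) -> str:
    """Check the license, pick a worker by request kind, return its result."""
    if not license_valid:
        raise PermissionError("license validation failed")
    try:
        worker = WORKERS[request.kind]
    except KeyError:
        raise ValueError(f"unsupported request kind: {request.kind!r}") from None
    return worker(request.payload)


if __name__ == "__main__":
    print(handle(Request("stt", b"\x00" * 1600), license_valid=True))
    print(handle(Request("tts", b"Hello there"), license_valid=True))
```

A real deployment would stream partial results over HTTP or WebSocket instead of returning a single string; the sketch only shows the routing decision.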

All processing happens within your infrastructure. Only license validation metadata is sent to Smallest Cloud.
