Architecture Overview

System Architecture

Components

API Server

Routes requests to the Lightning ASR and TTS workers, manages WebSocket connections, and exposes a unified REST API.

Resources: 0.5-2 CPU cores, 512 MB - 2 GB RAM, no GPU

Lightning ASR

GPU-accelerated speech-to-text engine with a real-time factor of 0.05-0.15x, meaning audio is transcribed in a fraction of its playback duration. Supports real-time and batch transcription.

Resources: 4-8 CPU cores, 12-16 GB RAM, 1x NVIDIA GPU (16+ GB VRAM)
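
As a rough illustration of what the real-time factor means in practice, the sketch below converts an audio duration into an estimated processing time. It assumes the conventional definition (real-time factor = processing time / audio duration); the function name `processing_seconds` is hypothetical and not part of any Lightning ASR API.

```python
# Assumption: real-time factor (RTF) = processing time / audio duration.
# The 0.05-0.15 range is taken from the component description above.
RTF_RANGE = (0.05, 0.15)


def processing_seconds(audio_seconds: float, rtf: float) -> float:
    """Estimate processing time for a clip of the given length (hypothetical helper)."""
    if audio_seconds < 0 or rtf <= 0:
        raise ValueError("audio_seconds must be >= 0 and rtf must be > 0")
    return audio_seconds * rtf


if __name__ == "__main__":
    audio = 60.0  # one minute of audio
    best, worst = (processing_seconds(audio, r) for r in RTF_RANGE)
    print(f"{audio:.0f} s of audio: about {best:.1f} to {worst:.1f} s of processing")
```

Under that assumption, a one-minute clip would take roughly 3 to 9 seconds of GPU time, before any network or queuing overhead.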

Lightning TTS

GPU-accelerated text-to-speech engine for natural voice synthesis. Supports streaming and batch generation.

Resources: 4-8 CPU cores, 12-16 GB RAM, 1x NVIDIA GPU (16+ GB VRAM)

License Proxy

Validates license keys and reports usage metadata. Supports offline grace periods.

Resources: 0.25-1 CPU core, 256-512 MB RAM, no GPU

Redis

Handles request queuing, session state, and caching. Can run embedded in the deployment or as an external service (for example, ElastiCache).

Resources: 0.5-1 CPU core, 512 MB - 2 GB RAM, no GPU
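
To get a feel for the combined footprint, the per-component figures above can be summed. The sketch below assumes exactly one replica of each component and treats 512 MB as 0.5 GB and 256 MB as 0.25 GB; the `Footprint` class and `total` function are illustrative names, not part of the product.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Footprint:
    cpu_min: float      # CPU cores, lower bound
    cpu_max: float      # CPU cores, upper bound
    ram_gb_min: float   # RAM in GB, lower bound
    ram_gb_max: float   # RAM in GB, upper bound
    gpus: int = 0       # number of NVIDIA GPUs (16+ GB VRAM each)


# Figures copied from the component list above; one replica each (assumption).
COMPONENTS: Dict[str, Footprint] = {
    "API Server": Footprint(0.5, 2, 0.5, 2),
    "Lightning ASR": Footprint(4, 8, 12, 16, gpus=1),
    "Lightning TTS": Footprint(4, 8, 12, 16, gpus=1),
    "License Proxy": Footprint(0.25, 1, 0.25, 0.5),
    "Redis": Footprint(0.5, 1, 0.5, 2),
}


def total(components: Dict[str, Footprint]) -> Footprint:
    """Sum the lower and upper bounds across all components."""
    parts = list(components.values())
    return Footprint(
        cpu_min=sum(p.cpu_min for p in parts),
        cpu_max=sum(p.cpu_max for p in parts),
        ram_gb_min=sum(p.ram_gb_min for p in parts),
        ram_gb_max=sum(p.ram_gb_max for p in parts),
        gpus=sum(p.gpus for p in parts),
    )


if __name__ == "__main__":
    t = total(COMPONENTS)
    print(f"CPU: {t.cpu_min}-{t.cpu_max} cores")
    print(f"RAM: {t.ram_gb_min}-{t.ram_gb_max} GB")
    print(f"GPUs: {t.gpus}")
```

With one replica of everything, this works out to 9.25-20 CPU cores, 25.25-36.5 GB of RAM, and 2 GPUs. Running only ASR or only TTS, or scaling workers horizontally, changes the totals accordingly.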

Data Flow

  1. Client Request — Your application sends audio (STT) or text (TTS) via HTTP or WebSocket
  2. API Server — Routes the request to the appropriate worker and validates the license
  3. Worker Processing — Lightning ASR or TTS processes the request on GPU
  4. Response — Results stream back through the API server to your application

All processing happens within your infrastructure. Only license validation metadata is sent to Smallest Cloud.
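
The four steps above can be sketched as a toy, in-process simulation. Everything here is hypothetical: the worker functions, the `license_is_valid` check, and the `"demo-key"` value are stand-ins for illustration and do not reflect the actual API Server implementation, its endpoints, or the order in which it performs validation and routing.

```python
from typing import Callable, Dict, Union

Payload = Union[bytes, str]


def asr_worker(payload: Payload) -> str:
    """Stand-in for Lightning ASR: audio bytes in, transcript out."""
    return f"<transcript of {len(payload)} audio bytes>"


def tts_worker(payload: Payload) -> str:
    """Stand-in for Lightning TTS: text in, audio out."""
    return f"<audio for {len(payload)} characters>"


# Step 2 routing table: request kind -> worker (hypothetical names).
WORKERS: Dict[str, Callable[[Payload], str]] = {
    "stt": asr_worker,
    "tts": tts_worker,
}


def license_is_valid(license_key: str) -> bool:
    """Placeholder for the License Proxy check; the real check is not shown here."""
    return license_key == "demo-key"


def handle_request(kind: str, payload: Payload, license_key: str) -> str:
    """Simulate the API Server: validate the license, route, return the result."""
    if not license_is_valid(license_key):
        raise PermissionError("license validation failed")
    worker = WORKERS.get(kind)
    if worker is None:
        raise ValueError(f"unknown request kind: {kind!r}")
    # Step 3: the worker processes the request (on GPU in a real deployment).
    # Step 4: the result flows back through the API server to the caller.
    return worker(payload)


if __name__ == "__main__":
    print(handle_request("stt", b"\x00" * 1600, "demo-key"))
    print(handle_request("tts", "hello", "demo-key"))
```

In a real deployment the client would reach the API Server over HTTP or WebSocket and results could stream back incrementally; the sketch only shows the routing and validation order in a single process.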

What’s Next?

Prerequisites

License key, credentials, and infrastructure requirements

Why Self-Host?

Benefits of self-hosting for your use case