This guide walks you through integrating Smallest AI TTS and STT into a LiveKit Agents voice pipeline. LiveKit Agents is an open-source Python framework for building production-grade, real-time voice AI agents over WebRTC.
The livekit-plugins-smallestai package provides two services:
smallestai.STT: real-time speech-to-text using the Pulse API, with streaming over WebSocket (~64ms TTFT) and batch transcription over HTTP
smallestai.TTS: ultra-low-latency text-to-speech using the Lightning API
The full runnable example is in the Smallest AI cookbook:
LiveKit Voice Agent — Smallest AI TTS + STT
Activate it:
livekit-plugins-smallestai is published on PyPI and includes both the STT and TTS services. livekit-plugins-silero provides the VAD used for turn detection, and livekit-plugins-openai provides the LLM.
Sign in to LiveKit Cloud, create a new project, and copy your project credentials.
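Before starting the worker, it can help to confirm that the credentials are actually visible to the process. The following is a minimal sketch, not part of the plugin: the three LIVEKIT_* names are the ones this guide uses, while SMALLEST_API_KEY and OPENAI_API_KEY are assumed names for the provider keys, and missing_settings is a hypothetical helper.

```python
import os
from typing import Iterable, Mapping

# The three LIVEKIT_* names come from this guide. The two provider key names
# are assumptions for illustration and may differ in your setup.
REQUIRED_SETTINGS = (
    "LIVEKIT_URL",
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "SMALLEST_API_KEY",  # assumed name
    "OPENAI_API_KEY",  # assumed name
)


def missing_settings(
    env: Mapping[str, str], required: Iterable[str] = REQUIRED_SETTINGS
) -> list[str]:
    """Return the required names that are absent or blank in env."""
    return [name for name in required if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_settings(os.environ)
    if missing:
        print("Missing settings: " + ", ".join(missing))
    else:
        print("All required settings are present.")
```

Running a check like this first surfaces a missing or blank key as one clear message instead of a connection error later in the session.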
.env File
smallestai.STT
Real-time transcription using the Smallest AI Pulse API. It connects over WebSocket for streaming and supports batch transcription over HTTP.
The STT service connects to wss://api.smallest.ai/waves/v1/pulse/get_text for streaming and https://api.smallest.ai/waves/v1/pulse/get_text for batch. Interim and final transcripts are both supported. START_OF_SPEECH is inferred from the first non-empty transcript.
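The START_OF_SPEECH inference described above can be illustrated with a small, self-contained sketch. This is not the plugin's source: the Transcript type, the event names other than START_OF_SPEECH, and the choice to reset after a final transcript are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass(frozen=True)
class Transcript:
    """Hypothetical shape of one transcription result."""

    text: str
    is_final: bool


def speech_events(transcripts: Iterable[Transcript]) -> Iterator[tuple[str, str]]:
    """Yield (event_name, text) pairs, inferring speech start from text."""
    speaking = False
    for transcript in transcripts:
        text = transcript.text.strip()
        if not text:
            continue  # an empty result carries no evidence of speech
        if not speaking:
            speaking = True
            yield ("START_OF_SPEECH", "")
        if transcript.is_final:
            yield ("FINAL_TRANSCRIPT", text)
            # Assumption: the next non-empty transcript starts a new utterance.
            speaking = False
        else:
            yield ("INTERIM_TRANSCRIPT", text)
```

Because the start event is derived from transcript text rather than from raw audio energy, it can only fire once the first recognized words arrive.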
smallestai.TTS
Text-to-speech using the Smallest AI Lightning API. Because the plugin synthesizes audio per request rather than streaming tokens, wrap it in tts.StreamAdapter with a SentenceTokenizer. The adapter splits LLM output at sentence boundaries and fires synthesis for each chunk, keeping first-audio latency low.
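To make the sentence-boundary idea concrete, here is a rough sketch of how a stream of LLM tokens could be grouped into sentences that are each ready for one synthesis request. It is a simplified stand-in, not LiveKit's SentenceTokenizer: the regular expression is deliberately naive (it would split on abbreviations such as "e.g."), and sentences_from_tokens is a hypothetical name.

```python
import re
from typing import Iterable, Iterator

# A sentence is treated as ending at '.', '!' or '?' followed by whitespace.
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def sentences_from_tokens(tokens: Iterable[str]) -> Iterator[str]:
    """Buffer streamed tokens and yield each sentence once it is complete."""
    buffer = ""
    for token in tokens:
        buffer += token
        parts = _SENTENCE_END.split(buffer)
        # Every part except the last is a finished sentence; the last part
        # may still be growing, so it stays in the buffer.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        buffer = parts[-1]
    tail = buffer.strip()
    if tail:
        yield tail  # flush whatever remains when the stream ends


if __name__ == "__main__":
    stream = ["Hello", " there.", " How", " are", " you?", " Fine"]
    for sentence in sentences_from_tokens(stream):
        print(sentence)
```

In this sketch the first sentence becomes available as soon as the token after it arrives, so synthesis of that sentence can start while the model is still generating the rest of the reply.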
consistency, similarity, and enhancement apply only to "lightning-v2" and are ignored for "lightning-v3.1".
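The model-specific behavior above can be mirrored on the caller's side so that a configuration never carries options the chosen model would ignore. The sketch below is hypothetical: effective_tts_options is not a plugin function, speed is a made-up extra option, and the numeric values are placeholders rather than recommended settings.

```python
from typing import Any

# Per this guide, these three options apply only to "lightning-v2".
V2_ONLY_PARAMS = ("consistency", "similarity", "enhancement")


def effective_tts_options(model: str, **options: Any) -> dict[str, Any]:
    """Return the options that would take effect for the given model."""
    if model == "lightning-v2":
        kept = dict(options)
    else:
        # Drop the v2-only options for any other model, e.g. "lightning-v3.1".
        kept = {k: v for k, v in options.items() if k not in V2_ONLY_PARAMS}
    return {"model": model, **kept}


if __name__ == "__main__":
    # 'speed' is a hypothetical option used only to show what is preserved.
    print(effective_tts_options("lightning-v3.1", speed=1.0, consistency=0.5))
```

Filtering up front makes it obvious in logs and configuration which settings are actually in play when switching between the two models.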
A minimal but production-ready voice agent using Smallest AI for both STT and TTS:
The dev flag starts the agent worker in development mode. To interact with it, open the LiveKit Agents Playground and enter your LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET. The agent will greet the user automatically on session start.
The pipeline is fully interruptible — if the user speaks while the bot is talking, audio stops immediately and the bot re-engages without any custom logic.
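For intuition about what interruption handling involves, the following sketch races audio playback against a "user spoke" signal and cancels playback as soon as the signal fires. It is an illustration built on plain asyncio, not LiveKit's implementation; play and speak_until_interrupted are hypothetical names, and the sleep is a stand-in for real audio output.

```python
import asyncio


async def play(chunks: list[str], played: list[str]) -> None:
    """Pretend to play audio chunks, recording each one as it starts."""
    for chunk in chunks:
        played.append(chunk)
        await asyncio.sleep(0.05)  # stand-in for the chunk's playback time


async def speak_until_interrupted(
    chunks: list[str], user_spoke: asyncio.Event
) -> list[str]:
    """Play chunks until done or until user_spoke is set; return what played."""
    played: list[str] = []
    playback = asyncio.create_task(play(chunks, played))
    interrupt = asyncio.create_task(user_spoke.wait())
    await asyncio.wait({playback, interrupt}, return_when=asyncio.FIRST_COMPLETED)
    # Whichever task lost the race is cancelled so nothing keeps running.
    for task in (playback, interrupt):
        if not task.done():
            task.cancel()
    await asyncio.gather(playback, interrupt, return_exceptions=True)
    return played


async def demo() -> list[str]:
    user_spoke = asyncio.Event()
    # Simulate the user starting to talk shortly after playback begins.
    asyncio.get_running_loop().call_later(0.12, user_spoke.set)
    return await speak_until_interrupted([f"chunk-{i}" for i in range(10)], user_spoke)


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

With the pipeline described here, none of this needs to be written by hand; the sketch only shows the kind of cancellation the framework performs on your behalf.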
The StreamAdapter + SentenceTokenizer wrapper is required for TTS: the Smallest AI plugin synthesizes audio per request. Without it, the agent waits for the entire LLM response before starting synthesis.
Keep eou_timeout_ms=0 (the default) when using LiveKit’s built-in turn detection. Setting it to a non-zero value adds server-side silence detection on top of LiveKit’s own logic, which increases end-of-turn latency.
"lightning-v3.1" is the recommended TTS model: it delivers ~100ms latency with 80+ voices. Switch to "lightning-v2" only if you need the consistency, similarity, and enhancement quality parameters.