LiveKit


This guide walks you through integrating Smallest AI TTS and STT into a LiveKit Agents voice pipeline. LiveKit Agents is an open-source Python framework for building production-grade, real-time voice AI agents over WebRTC.

The livekit-plugins-smallestai package provides two services:

  • smallestai.STT — real-time speech-to-text using the Pulse API, with streaming over WebSocket (~64ms TTFT) and batch transcription over HTTP
  • smallestai.TTS — ultra-low-latency text-to-speech using the Lightning API

Code Example

The full runnable example is in the Smallest AI cookbook:

LiveKit Voice Agent — Smallest AI TTS + STT

Setup

1. Create a Virtual Environment

```shell
python3.11 -m venv venv
```

Activate it:

  • On Linux/Mac: `source venv/bin/activate`
  • On Windows: `venv\Scripts\activate`

2. Install Dependencies

```shell
pip install livekit-plugins-smallestai livekit-plugins-openai livekit-plugins-silero python-dotenv
```

livekit-plugins-smallestai is published on PyPI and includes both the STT and TTS services. livekit-plugins-silero provides the VAD used for turn detection, and livekit-plugins-openai provides the LLM.

3. Create a LiveKit Project

Sign in to LiveKit Cloud, create a new project, and copy your project credentials.

4. Create a .env File

```shell
LIVEKIT_API_KEY=...
LIVEKIT_API_SECRET=...
LIVEKIT_URL=...
OPENAI_API_KEY=...
SMALLEST_API_KEY=...
```
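It can help to fail fast on startup when a credential is missing rather than debug a connection error later. A minimal sketch (the helper name and hard-coded variable list are illustrative, not part of any plugin):

```python
# Check that every credential this guide's .env file defines is present.
REQUIRED_VARS = [
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "LIVEKIT_URL",
    "OPENAI_API_KEY",
    "SMALLEST_API_KEY",
]

def missing_env_vars(env):
    """Return the names of required variables that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]
```

Call it with `os.environ` after `load_dotenv()` and exit with a clear message if the returned list is non-empty.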

Services

smallestai.STT

Real-time transcription using the Smallest AI Pulse API. Connects over WebSocket for streaming and supports batch transcription over HTTP.

```python
from livekit.plugins import smallestai

# Streaming transcription — English
stt = smallestai.STT(language="en")

# Automatic language detection across 39 languages
stt = smallestai.STT(language="multi")

# With speaker diarization
stt = smallestai.STT(language="en", diarize=True)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `api_key` | `str` | `$SMALLEST_API_KEY` | Your Smallest AI API key |
| `model` | `str` | `"pulse"` | STT model — currently only `"pulse"` is available |
| `language` | `str` | `"en"` | BCP-47 language code (e.g. `"en"`, `"hi"`, `"fr"`). Use `"multi"` for automatic detection across 39 languages |
| `sample_rate` | `int` | `16000` | Audio sample rate in Hz. Supported: 8000, 16000, 22050, 24000, 44100, 48000 |
| `encoding` | `str` | `"linear16"` | PCM encoding: `"linear16"`, `"linear32"`, `"alaw"`, `"mulaw"`, `"opus"`, `"ogg_opus"` |
| `word_timestamps` | `bool` | `True` | Include per-word start/end timestamps and confidence scores |
| `diarize` | `bool` | `False` | Enable speaker diarization — each word includes a speaker ID |
| `eou_timeout_ms` | `int` | `0` | Milliseconds of silence before the server emits a final transcript. `0` disables server-side end-of-utterance detection (recommended — lets LiveKit’s built-in turn detection control timing) |

The STT service connects to wss://api.smallest.ai/waves/v1/pulse/get_text for streaming and https://api.smallest.ai/waves/v1/pulse/get_text for batch. Interim and final transcripts are both supported. START_OF_SPEECH is inferred from the first non-empty transcript.
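With `word_timestamps` and `diarize` enabled, each final transcript carries per-word timing and speaker information, which makes simple analytics possible, such as per-speaker talk time. A sketch assuming a plain word shape with `speaker`, `start`, and `end` fields in seconds (the plugin's actual field names may differ):

```python
from collections import defaultdict

def talk_time_by_speaker(words):
    """Sum (end - start) per speaker over a list of diarized words.

    Each word is assumed to look like
    {"word": "hello", "start": 0.10, "end": 0.42, "speaker": 0}.
    """
    totals = defaultdict(float)
    for w in words:
        totals[w["speaker"]] += w["end"] - w["start"]
    return dict(totals)
```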


smallestai.TTS

Text-to-speech using the Smallest AI Lightning API. Because the plugin synthesizes audio per request rather than streaming tokens, wrap it in tts.StreamAdapter with a SentenceTokenizer. The adapter splits LLM output at sentence boundaries and fires synthesis for each chunk, keeping first-audio latency low.

```python
from livekit.agents import tts, tokenize
from livekit.plugins import smallestai

smallest_tts = tts.StreamAdapter(
    tts=smallestai.TTS(
        model="lightning-v3.1",
        voice_id="sophia",
        language="en",
        speed=1.0,
    ),
    sentence_tokenizer=tokenize.basic.SentenceTokenizer(),
)
```
| Parameter | Type | Default | Description |
|---|---|---|---|
| `api_key` | `str` | `$SMALLEST_API_KEY` | Your Smallest AI API key |
| `model` | `str` | `"lightning-v3.1"` | TTS model: `"lightning-v3.1"` (recommended, 80+ voices, ~100ms latency) or `"lightning-v2"` |
| `voice_id` | `str` | `"sophia"` | Voice ID for synthesis |
| `language` | `str` | `"en"` | Language code (`"en"` or `"hi"`) |
| `speed` | `float` | `1.0` | Playback speed multiplier |
| `sample_rate` | `int` | `24000` | Output audio sample rate in Hz |
| `output_format` | `str` | `"pcm"` | Output encoding: `"pcm"`, `"mp3"`, `"wav"`, `"mulaw"`, `"alaw"` |
| `consistency` | `float` | `0.5` | Voice consistency — lightning-v2 only |
| `similarity` | `float` | `0.0` | Voice similarity — lightning-v2 only |
| `enhancement` | `float` | `1.0` | Audio enhancement level — lightning-v2 only |

consistency, similarity, and enhancement apply only to "lightning-v2" and are ignored for "lightning-v3.1".
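To see why the sentence-splitting adapter lowers first-audio latency, consider how a streamed LLM response gets chunked: each complete sentence can be handed to TTS while the model is still generating the rest. The toy splitter below only illustrates the idea; the real `tokenize.basic.SentenceTokenizer` is more robust:

```python
import re

def split_sentences(text):
    """Naive sentence splitter: break after ., !, or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

# Each chunk is a complete sentence, so synthesis of the first sentence
# can start well before the full response has arrived.
```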


Complete Agent Example

A minimal but production-ready voice agent using Smallest AI for both STT and TTS:

```python
import logging
from dotenv import load_dotenv
from livekit.agents import (
    Agent,
    AgentSession,
    JobContext,
    JobProcess,
    RoomInputOptions,
    RoomOutputOptions,
    WorkerOptions,
    cli,
    tts,
    tokenize,
)
from livekit.plugins import openai, silero, smallestai

logger = logging.getLogger("voice-agent")
load_dotenv()


class MyAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="You are a helpful voice assistant built by Smallest AI.",
        )

    async def on_enter(self):
        self.session.generate_reply()


def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()


async def entrypoint(ctx: JobContext):
    session = AgentSession(
        vad=ctx.proc.userdata["vad"],
        stt=smallestai.STT(language="en"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=tts.StreamAdapter(
            tts=smallestai.TTS(),
            sentence_tokenizer=tokenize.basic.SentenceTokenizer(),
        ),
    )

    await session.start(
        agent=MyAgent(),
        room=ctx.room,
        room_input_options=RoomInputOptions(),
        room_output_options=RoomOutputOptions(transcription_enabled=True),
    )


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, prewarm_fnc=prewarm))
```

Running the Agent

```shell
python3 agent.py dev
```

The `dev` subcommand starts the agent worker in development mode. To interact with it, open the LiveKit Agents Playground and enter your `LIVEKIT_URL`, `LIVEKIT_API_KEY`, and `LIVEKIT_API_SECRET`. The agent greets the user automatically on session start.

The pipeline is fully interruptible — if the user speaks while the bot is talking, audio stops immediately and the bot re-engages without any custom logic.


Notes

  • The StreamAdapter + SentenceTokenizer wrapper is required for TTS — the Smallest AI plugin synthesizes audio per request. Without it, the agent waits for the entire LLM response before starting synthesis.
  • Set eou_timeout_ms=0 (the default) when using LiveKit’s built-in turn detection. Setting it to a non-zero value adds server-side silence detection on top of LiveKit’s own logic, which increases end-of-turn latency.
  • "lightning-v3.1" is the recommended TTS model — it delivers ~100ms latency with 80+ voices. Switch to "lightning-v2" only if you need the consistency/similarity/enhancement quality parameters.
  • For issues or questions, open an issue in the cookbook repository or reach out on Discord.