LiveKit | Smallest AI Docs

This guide walks you through integrating Smallest AI TTS and STT into a LiveKit Agents voice pipeline. LiveKit Agents is an open-source Python framework for building production-grade, real-time voice AI agents over WebRTC.

The livekit-plugins-smallestai package provides two services:

smallestai.STT — real-time speech-to-text using the Pulse API, with streaming over WebSocket (~64ms TTFT) and batch transcription over HTTP
smallestai.TTS — ultra-low-latency text-to-speech using the Lightning API

Code Example

The full runnable example is in the Smallest AI cookbook:

LiveKit Voice Agent — Smallest AI TTS + STT

Setup

1. Create a Virtual Environment

$ python3.11 -m venv venv

Activate it:

On Linux/Mac:

$ source venv/bin/activate

On Windows:

$ venv\Scripts\activate

2. Install Dependencies

$ pip install livekit-plugins-smallestai livekit-plugins-openai livekit-plugins-silero python-dotenv

livekit-plugins-smallestai is published on PyPI and includes both the STT and TTS services. livekit-plugins-silero provides the VAD used for turn detection, and livekit-plugins-openai provides the LLM.
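To confirm the install succeeded before writing any agent code, a small check like the one below can help. It is only a sketch: the helper name `missing_packages` is hypothetical, and it relies on the standard-library `importlib.metadata` module together with the distribution names from the `pip install` command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Distribution names copied from the pip install command above.
PACKAGES = (
    "livekit-plugins-smallestai",
    "livekit-plugins-openai",
    "livekit-plugins-silero",
    "python-dotenv",
)


def missing_packages(names=PACKAGES):
    """Return the distribution names that are not installed in this environment."""
    missing = []
    for name in names:
        try:
            version(name)  # raises PackageNotFoundError if the package is absent
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_packages()
    if absent:
        print("Missing packages:", ", ".join(absent))
    else:
        print("All dependencies are installed.")
```

Run it inside the activated virtual environment so that it inspects the same interpreter the agent will use.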

3. Create a LiveKit Project

4. Create a `.env` File

LIVEKIT_API_KEY=...
LIVEKIT_API_SECRET=...
LIVEKIT_URL=...
OPENAI_API_KEY=...
SMALLEST_API_KEY=...
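A missing key usually surfaces only when the first request fails, so it can be useful to check the environment at startup. The following is a minimal sketch, not part of the plugin: `missing_env_vars` is a hypothetical helper, and the variable names are the ones listed in the `.env` file above.

```python
import os

# Variable names taken from the .env file above.
REQUIRED_VARS = (
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "LIVEKIT_URL",
    "OPENAI_API_KEY",
    "SMALLEST_API_KEY",
)


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("Environment looks complete.")
```

In an agent script, a call such as `missing_env_vars()` would go right after `load_dotenv()`, so that values from the `.env` file are already present in `os.environ`.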

Services

`smallestai.STT`

Real-time transcription using the Smallest AI Pulse API. Connects over WebSocket for streaming and supports batch transcription over HTTP.

from livekit.plugins import smallestai

# Streaming transcription (English)
stt = smallestai.STT(language="en")

# Automatic language detection across 39 languages
stt = smallestai.STT(language="multi")

# With speaker diarization
stt = smallestai.STT(language="en", diarize=True)

Parameter	Type	Default	Description
`api_key`	`str`	`$SMALLEST_API_KEY`	Your Smallest AI API key
`model`	`str`	`"pulse"`	STT model — currently only `"pulse"` is available
`language`	`str`	`"en"`	BCP-47 language code (e.g. `"en"`, `"hi"`, `"fr"`). Use `"multi"` for automatic detection across 39 languages
`sample_rate`	`int`	`16000`	Audio sample rate in Hz. Supported: `8000`, `16000`, `22050`, `24000`, `44100`, `48000`
`encoding`	`str`	`"linear16"`	PCM encoding: `"linear16"`, `"linear32"`, `"alaw"`, `"mulaw"`, `"opus"`, `"ogg_opus"`
`word_timestamps`	`bool`	`True`	Include per-word `start`/`end` timestamps and confidence scores
`diarize`	`bool`	`False`	Enable speaker diarization — each word includes a speaker ID
`eou_timeout_ms`	`int`	`0`	Milliseconds of silence before the server emits a final transcript. `0` disables server-side end-of-utterance detection (recommended — lets LiveKit’s built-in turn detection control timing)

The STT service connects to wss://api.smallest.ai/waves/v1/pulse/get_text for streaming and https://api.smallest.ai/waves/v1/pulse/get_text for batch. Interim and final transcripts are both supported. START_OF_SPEECH is inferred from the first non-empty transcript.
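The idea of inferring START_OF_SPEECH from the first non-empty transcript can be illustrated with a small, library-free sketch. This is not the plugin's implementation: the `(text, is_final)` message shape, the function name, and the event labels are assumptions made for illustration only.

```python
from typing import Iterable, Iterator, Tuple


def infer_speech_events(
    transcripts: Iterable[Tuple[str, bool]],
) -> Iterator[Tuple[str, str]]:
    """Turn (text, is_final) messages into (event, text) pairs.

    START_OF_SPEECH is emitted once, just before the first non-empty
    transcript of each utterance. A final transcript closes the utterance.
    """
    speaking = False
    for text, is_final in transcripts:
        text = text.strip()
        if not text:
            if is_final:
                speaking = False  # an empty final still ends the utterance
            continue  # empty messages carry no evidence of speech
        if not speaking:
            speaking = True
            yield ("START_OF_SPEECH", "")
        yield ("FINAL" if is_final else "INTERIM", text)
        if is_final:
            speaking = False
```

Because no dedicated speech-start signal is involved in this scheme, the event can arrive no earlier than the first interim transcript.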

`smallestai.TTS`

Text-to-speech using the Smallest AI Lightning API. Because the plugin synthesizes audio per request rather than streaming tokens, wrap it in tts.StreamAdapter with a SentenceTokenizer. The adapter splits LLM output at sentence boundaries and fires synthesis for each chunk, keeping first-audio latency low.

from livekit.agents import tts, tokenize
from livekit.plugins import smallestai

smallest_tts = tts.StreamAdapter(
    tts=smallestai.TTS(
        model="lightning-v3.1",
        voice_id="sophia",
        language="en",
        speed=1.0,
    ),
    sentence_tokenizer=tokenize.basic.SentenceTokenizer(),
)

Parameter	Type	Default	Description
`api_key`	`str`	`$SMALLEST_API_KEY`	Your Smallest AI API key
`model`	`str`	`"lightning-v3.1"`	TTS model: `"lightning-v3.1"` (recommended, 80+ voices, ~100ms latency) or `"lightning-v2"`
`voice_id`	`str`	`"sophia"`	Voice ID for synthesis
`language`	`str`	`"en"`	Language code (`"en"` or `"hi"`)
`speed`	`float`	`1.0`	Playback speed multiplier
`sample_rate`	`int`	`24000`	Output audio sample rate in Hz
`output_format`	`str`	`"pcm"`	Output encoding: `"pcm"`, `"mp3"`, `"wav"`, `"mulaw"`, `"alaw"`
`consistency`	`float`	`0.5`	Voice consistency — `lightning-v2` only
`similarity`	`float`	`0.0`	Voice similarity — `lightning-v2` only
`enhancement`	`float`	`1.0`	Audio enhancement level — `lightning-v2` only

consistency, similarity, and enhancement apply only to "lightning-v2" and are ignored for "lightning-v3.1".
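To make the role of the StreamAdapter wrapper more concrete, the sketch below shows sentence-level chunking over a stream of LLM text fragments. It is a simplified stand-in, not LiveKit's SentenceTokenizer: the function name is hypothetical, and the boundary rule (`.`, `!`, or `?` followed by whitespace) is a deliberately naive assumption.

```python
import re
from typing import Iterable, Iterator

# Naive boundary: sentence-ending punctuation followed by whitespace.
# A real tokenizer also handles abbreviations, numbers, and so on.
_BOUNDARY = re.compile(r"(?<=[.!?])\s+")


def sentences_from_stream(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences as soon as they appear in a stream of text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        parts = _BOUNDARY.split(buffer)
        # Every part except the last is a complete sentence.
        for sentence in parts[:-1]:
            if sentence:
                yield sentence
        # The last part may still be growing, so keep it buffered.
        buffer = parts[-1]
    # Flush whatever remains once the stream ends.
    if buffer.strip():
        yield buffer.strip()
```

Each yielded sentence could be handed to a per-request synthesizer right away, which is why the first audio can start before the LLM has finished its full response.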

Complete Agent Example

A minimal but production-ready voice agent using Smallest AI for both STT and TTS:

import logging
from dotenv import load_dotenv
from livekit.agents import (
    Agent,
    AgentSession,
    JobContext,
    JobProcess,
    RoomInputOptions,
    RoomOutputOptions,
    WorkerOptions,
    cli,
    tts,
    tokenize,
)
from livekit.plugins import openai, silero, smallestai

logger = logging.getLogger("voice-agent")
# Load LIVEKIT_*, OPENAI_API_KEY, and SMALLEST_API_KEY from the .env file
load_dotenv()


class MyAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="You are a helpful voice assistant built by Smallest AI.",
        )

    async def on_enter(self):
        # Have the agent speak first when it enters the session
        self.session.generate_reply()


def prewarm(proc: JobProcess):
    # Load the Silero VAD model once per worker process and share it with jobs
    proc.userdata["vad"] = silero.VAD.load()


async def entrypoint(ctx: JobContext):
    session = AgentSession(
        vad=ctx.proc.userdata["vad"],
        stt=smallestai.STT(language="en"),
        llm=openai.LLM(model="gpt-4o-mini"),
        # Wrap the per-request TTS so synthesis starts sentence by sentence
        tts=tts.StreamAdapter(
            tts=smallestai.TTS(),
            sentence_tokenizer=tokenize.basic.SentenceTokenizer(),
        ),
    )

    await session.start(
        agent=MyAgent(),
        room=ctx.room,
        room_input_options=RoomInputOptions(),
        room_output_options=RoomOutputOptions(transcription_enabled=True),
    )


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, prewarm_fnc=prewarm))

Running the Agent

$ python3 agent.py dev

The dev subcommand starts the agent worker in development mode. To interact with it, open the LiveKit Agents Playground and enter your LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET. The agent greets the user automatically when the session starts.

The pipeline is fully interruptible — if the user speaks while the bot is talking, audio stops immediately and the bot re-engages without any custom logic.

Notes

The StreamAdapter + SentenceTokenizer wrapper is required for TTS — the Smallest AI plugin synthesizes audio per request. Without it, the agent waits for the entire LLM response before starting synthesis.
Set eou_timeout_ms=0 (the default) when using LiveKit’s built-in turn detection. Setting it to a non-zero value adds server-side silence detection on top of LiveKit’s own logic, which increases end-of-turn latency.
"lightning-v3.1" is the recommended TTS model — it delivers ~100ms latency with 80+ voices. Switch to "lightning-v2" only if you need the consistency/similarity/enhancement quality parameters.
For issues or questions, open an issue in the cookbook repository or reach out on Discord.