LiveKit


This guide walks you through integrating Smallest AI TTS and STT into a LiveKit Agents voice pipeline. LiveKit Agents is an open-source Python framework for building production-grade, real-time voice AI agents over WebRTC.

The livekit-plugins-smallestai package provides two services:

  • smallestai.STT — real-time speech-to-text using the Pulse API, with streaming over WebSocket (~64ms TTFT) and batch transcription over HTTP
  • smallestai.TTS — ultra-low-latency text-to-speech using the Lightning API

Code Example

The full runnable example is in the Smallest AI cookbook:

LiveKit Voice Agent — Smallest AI TTS + STT

Setup

1. Create a Virtual Environment

$ python3.11 -m venv venv

Activate it:

  • On Linux/Mac:
    $ source venv/bin/activate
  • On Windows:
    $ venv\Scripts\activate
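
To confirm that the interpreter you are about to use belongs to the virtual environment, you can compare `sys.prefix` with `sys.base_prefix`, which differ when Python runs from a `venv`. This is a minimal stdlib sketch; the helper name `in_virtualenv` is illustrative and not part of any package.

```python
import sys


def in_virtualenv() -> bool:
    # Inside a venv, sys.prefix points at the venv directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    print("python version:", sys.version.split()[0])
```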

2. Install Dependencies

$ pip install livekit-plugins-smallestai livekit-plugins-openai livekit-plugins-silero python-dotenv

livekit-plugins-smallestai is published on PyPI and includes both the STT and TTS services. livekit-plugins-silero provides the VAD used for turn detection, livekit-plugins-openai provides the LLM, and python-dotenv loads the variables from your .env file.
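
To verify the installation, the sketch below looks up each distribution with the standard library's `importlib.metadata`. The helper `installed_versions` and the `REQUIRED_PACKAGES` list are illustrative assumptions that simply mirror the pip command above.

```python
from importlib import metadata

# Distribution names taken from the pip install command above.
REQUIRED_PACKAGES = [
    "livekit-plugins-smallestai",
    "livekit-plugins-openai",
    "livekit-plugins-silero",
    "python-dotenv",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED_PACKAGES).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```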

3. Create a LiveKit Project

Sign in to LiveKit Cloud, create a new project, and copy its credentials: the project URL, API key, and API secret.

4. Create a .env File

LIVEKIT_API_KEY=...
LIVEKIT_API_SECRET=...
LIVEKIT_URL=...
OPENAI_API_KEY=...
SMALLEST_API_KEY=...
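
Before starting the worker, it can help to fail fast if any of these variables is missing. The sketch below is a hypothetical pre-flight check (`missing_env_vars` is not part of LiveKit or the plugin). When python-dotenv is installed it first loads `.env`, assuming the file sits in the current working directory, then reports variables that are unset or empty.

```python
import os

# Variable names from the .env file above.
REQUIRED_VARS = (
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "LIVEKIT_URL",
    "OPENAI_API_KEY",
    "SMALLEST_API_KEY",
)


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    try:
        from dotenv import load_dotenv

        # Assumes .env is in the current working directory.
        load_dotenv(".env")
    except ImportError:
        pass  # python-dotenv not installed; rely on the process environment

    missing = missing_env_vars()
    if missing:
        print("Missing environment variables:", ", ".join(missing))
    else:
        print("All required environment variables are set.")
```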

Services

smallestai.STT

Real-time transcription using the Smallest AI Pulse API. Connects over WebSocket for streaming and supports batch transcription over HTTP.

from livekit.plugins import smallestai

# Streaming transcription in English
stt = smallestai.STT(language="en")

# Automatic language detection across 38 languages
stt = smallestai.STT(language="multi")

# With speaker diarization
stt = smallestai.STT(language="en", diarize=True)

Parameter | Type | Default | Description
api_key | str | $SMALLEST_API_KEY | Your Smallest AI API key
model | str | "pulse" | STT model; currently only "pulse" is available
language | str | "en" | BCP-47 language code (e.g. "en", "hi", "fr"). Use "multi" for automatic detection across 38 languages
sample_rate | int | 16000 | Audio sample rate in Hz. Supported: 8000, 16000, 22050, 24000, 44100, 48000
encoding | str | "linear16" | Audio encoding: "linear16", "linear32", "alaw", "mulaw", "opus", "ogg_opus"
word_timestamps | bool | True | Include per-word start/end timestamps and confidence scores
diarize | bool | False | Enable speaker diarization; each word includes a speaker ID
eou_timeout_ms | int | 0 | Milliseconds of silence before the server emits a final transcript. 0 disables server-side end-of-utterance detection (recommended, so that LiveKit's built-in turn detection controls timing)
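
As an illustration of the constraints in this table, the hypothetical helper below checks a proposed `sample_rate` and `encoding` against the listed values and flags a non-zero `eou_timeout_ms`. It is a sketch built only from the table, not part of the plugin, and it cannot tell whether the server would accept values outside these lists.

```python
# Supported values as listed in the smallestai.STT parameter table.
STT_SAMPLE_RATES = {8000, 16000, 22050, 24000, 44100, 48000}
STT_ENCODINGS = {"linear16", "linear32", "alaw", "mulaw", "opus", "ogg_opus"}


def check_stt_options(sample_rate=16000, encoding="linear16", eou_timeout_ms=0):
    """Return warnings for a proposed smallestai.STT configuration.

    Hypothetical pre-flight helper; not part of livekit-plugins-smallestai.
    """
    warnings = []
    if sample_rate not in STT_SAMPLE_RATES:
        warnings.append(f"unsupported sample_rate: {sample_rate}")
    if encoding not in STT_ENCODINGS:
        warnings.append(f"unsupported encoding: {encoding!r}")
    if eou_timeout_ms > 0:
        # Server-side end-of-utterance detection stacks on top of LiveKit's own.
        warnings.append("eou_timeout_ms > 0 adds server-side end-of-utterance detection")
    return warnings
```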

The STT service connects to wss://api.smallest.ai/waves/v1/pulse/get_text for streaming and https://api.smallest.ai/waves/v1/pulse/get_text for batch. Interim and final transcripts are both supported. START_OF_SPEECH is inferred from the first non-empty transcript.
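
The START_OF_SPEECH behavior can be pictured with a simplified model. In the sketch below, the `Transcript` shape and the `speech_events` function are hypothetical; only the event names are borrowed from LiveKit's `stt.SpeechEventType`. It also assumes that a final transcript closes the utterance, so the next non-empty transcript starts a new one, which the description above does not state explicitly.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class Transcript:
    # Hypothetical simplified message; not the Pulse API wire format.
    text: str
    is_final: bool


def speech_events(transcripts: Iterable[Transcript]) -> Iterator[str]:
    """Yield event names, inferring START_OF_SPEECH from the first non-empty transcript."""
    in_speech = False
    for t in transcripts:
        if not t.text.strip():
            continue  # empty transcripts carry no evidence of speech
        if not in_speech:
            in_speech = True
            yield "START_OF_SPEECH"
        if t.is_final:
            yield "FINAL_TRANSCRIPT"
            in_speech = False  # assumption: a final transcript ends the utterance
        else:
            yield "INTERIM_TRANSCRIPT"
```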


smallestai.TTS

Text-to-speech using the Smallest AI Lightning API. The plugin uses persistent WebSocket streaming backed by a connection pool for low-latency audio delivery.
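
To see why a pre-warmed pool lowers first-audio latency, consider the generic asyncio sketch below. It is not the plugin's implementation: `WarmPool`, the `connect` callable, and the pool size are hypothetical, and the simulated delay stands in for a WebSocket handshake. Connections opened during `prewarm()` are handed out later without paying the handshake cost again.

```python
import asyncio


class WarmPool:
    """Hypothetical connection pool illustrating pre-warming (not the plugin's code)."""

    def __init__(self, connect, size=2):
        self._connect = connect  # async callable that opens one connection
        self._size = size
        self._idle = asyncio.Queue()
        self._created = 0

    async def prewarm(self):
        # Open connections up front so the first request finds one ready.
        while self._created < self._size:
            self._created += 1
            self._idle.put_nowait(await self._connect())

    async def acquire(self):
        # Open a new connection only if none is idle and the pool has room.
        if self._idle.empty() and self._created < self._size:
            self._created += 1
            return await self._connect()
        return await self._idle.get()

    def release(self, conn):
        self._idle.put_nowait(conn)


async def _demo():
    async def slow_connect():
        await asyncio.sleep(0.2)  # stand-in for a WebSocket handshake
        return object()

    pool = WarmPool(slow_connect, size=1)
    await pool.prewarm()  # pay the handshake cost during startup
    loop = asyncio.get_running_loop()
    start = loop.time()
    conn = await pool.acquire()
    print(f"acquire after prewarm took {loop.time() - start:.3f}s")
    pool.release(conn)


if __name__ == "__main__":
    asyncio.run(_demo())
```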

from livekit.plugins import smallestai

smallest_tts = smallestai.TTS(
    model="lightning_v3.1_pro",
    voice_id="meher",
    language="en",
    speed=1.0,
)

Parameter | Type | Default | Description
api_key | str | $SMALLEST_API_KEY | Your Smallest AI API key
model | str | "lightning_v3.1_pro" | TTS model. "lightning_v3.1_pro": premium 44.1 kHz pool with curated American, British, and Indian voices. "lightning_v3.1": standard pool with 217 voices across 12 languages
voice_id | str | model-dependent | Voice ID for synthesis. Defaults to "meher" for lightning_v3.1_pro and "sophia" for lightning_v3.1. Pro voices must be paired with lightning_v3.1_pro, and standard voices with lightning_v3.1
language | str | "en" | Language code. lightning_v3.1 supports 12 codes plus "auto"; lightning_v3.1_pro supports only "en", "hi", and "auto". See the model cards for the full lists
speed | float | 1.0 | Playback speed multiplier (0.5 to 2.0)
sample_rate | int | 24000 | Output audio sample rate in Hz. Supported: 8000, 16000, 24000, 44100
output_format | str | "pcm" | Output encoding for HTTP synthesis: "pcm", "mp3", "wav", "ulaw", "alaw". WebSocket streaming always returns PCM
ws_url | str | wss://api.smallest.ai/waves/v1/tts/live | WebSocket endpoint for low-latency streaming synthesis
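
The model-dependent defaults and limits in this table can be captured in a small helper. The sketch below (`resolve_tts_options` is hypothetical, not a plugin API) fills in the default voice for the chosen model and rejects values the table marks as unsupported. It cannot check voice/model pairing or the full lightning_v3.1 language list, since those come from the model cards.

```python
# Values taken from the smallestai.TTS parameter table.
TTS_DEFAULT_VOICES = {
    "lightning_v3.1_pro": "meher",
    "lightning_v3.1": "sophia",
}
TTS_SAMPLE_RATES = {8000, 16000, 24000, 44100}
PRO_LANGUAGES = {"en", "hi", "auto"}


def resolve_tts_options(model="lightning_v3.1_pro", voice_id=None, language="en",
                        speed=1.0, sample_rate=24000):
    """Fill in the model-dependent default voice and reject unsupported values.

    Hypothetical helper; not part of livekit-plugins-smallestai.
    """
    if model not in TTS_DEFAULT_VOICES:
        raise ValueError(f"unknown model: {model!r}")
    if model == "lightning_v3.1_pro" and language not in PRO_LANGUAGES:
        raise ValueError(f"{model} supports only {sorted(PRO_LANGUAGES)}, got {language!r}")
    if not 0.5 <= speed <= 2.0:
        raise ValueError(f"speed must be between 0.5 and 2.0, got {speed}")
    if sample_rate not in TTS_SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    return {
        "model": model,
        "voice_id": voice_id or TTS_DEFAULT_VOICES[model],
        "language": language,
        "speed": speed,
        "sample_rate": sample_rate,
    }
```

Because the returned keys match the parameter names above, the result could be passed on as `smallestai.TTS(**opts)`.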

Complete Agent Example

A minimal but production-ready voice agent using Smallest AI for both STT and TTS:

import logging
from dotenv import load_dotenv
from livekit.agents import (
    Agent,
    AgentSession,
    JobContext,
    JobProcess,
    RoomInputOptions,
    RoomOutputOptions,
    WorkerOptions,
    cli,
)
from livekit.plugins import openai, silero, smallestai

logger = logging.getLogger("voice-agent")
# Load LIVEKIT_*, OPENAI_API_KEY, and SMALLEST_API_KEY from .env
load_dotenv()


class MyAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="You are a helpful voice assistant built by Smallest AI.",
        )

    async def on_enter(self):
        # Greet the user as soon as the agent joins the session
        self.session.generate_reply()


def prewarm(proc: JobProcess):
    # Load the Silero VAD once per worker process so every job can reuse it
    proc.userdata["vad"] = silero.VAD.load()


async def entrypoint(ctx: JobContext):
    # Smallest AI handles STT and TTS; OpenAI handles the LLM
    session = AgentSession(
        vad=ctx.proc.userdata["vad"],
        stt=smallestai.STT(language="en"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=smallestai.TTS(),
    )

    await session.start(
        agent=MyAgent(),
        room=ctx.room,
        room_input_options=RoomInputOptions(),
        room_output_options=RoomOutputOptions(transcription_enabled=True),
    )


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, prewarm_fnc=prewarm))

Running the Agent

$ python3 agent.py dev

The dev flag starts the agent worker in development mode. To interact with it, open the LiveKit Agents Playground and enter your LIVEKIT_URL, LIVEKIT_API_KEY, and LIVEKIT_API_SECRET. The agent will greet the user automatically on session start.

The pipeline is interruptible out of the box: if the user starts speaking while the agent is talking, playback stops immediately and the agent responds to the new input, with no custom logic required.
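
Conceptually, barge-in amounts to cancelling the in-flight playback task when user speech is detected. The asyncio sketch below models only that idea; `Speaker`, `say`, and `interrupt` are hypothetical names, and LiveKit's AgentSession handles interruptions internally in a more involved way.

```python
import asyncio


class Speaker:
    """Hypothetical stand-in for the agent's audio output; not LiveKit's implementation."""

    def __init__(self):
        self._task = None
        self.played = []

    async def _play(self, frames):
        for frame in frames:
            self.played.append(frame)
            await asyncio.sleep(0.01)  # stand-in for streaming one audio frame

    def say(self, frames):
        self._task = asyncio.create_task(self._play(frames))
        return self._task

    def interrupt(self):
        # Called when the VAD reports user speech while the agent is talking.
        if self._task and not self._task.done():
            self._task.cancel()


async def demo():
    speaker = Speaker()
    task = speaker.say([f"frame{i}" for i in range(10)])
    await asyncio.sleep(0.035)  # the user starts speaking mid-utterance
    speaker.interrupt()
    try:
        await task
    except asyncio.CancelledError:
        pass  # playback was cut short, as intended
    return speaker.played


if __name__ == "__main__":
    print(asyncio.run(demo()))
```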


Notes

  • Set eou_timeout_ms=0 (the default) when using LiveKit’s built-in turn detection. Setting it to a non-zero value adds server-side silence detection on top of LiveKit’s own logic, which increases end-of-turn latency.
  • Call tts.prewarm() during worker startup to pre-warm the WebSocket connection pool and reduce first-audio latency on the initial request.
  • For issues or questions, open an issue in the cookbook repository or reach out on Discord.