# LiveKit

> Build real-time voice agents with LiveKit Agents using Smallest AI TTS and STT.

This guide walks you through integrating [Smallest AI](https://smallest.ai) TTS and STT into a [LiveKit Agents](https://docs.livekit.io/agents/) voice pipeline. LiveKit Agents is an open-source Python framework for building production-grade, real-time voice AI agents over WebRTC.

The `livekit-plugins-smallestai` package provides two services:

* **`smallestai.STT`** — real-time speech-to-text using the Pulse API, with streaming over WebSocket (\~64ms TTFT) and batch transcription over HTTP
* **`smallestai.TTS`** — ultra-low-latency text-to-speech using the Lightning API

## Code Example

The full runnable example is in the Smallest AI cookbook:

[LiveKit Voice Agent — Smallest AI TTS + STT](https://github.com/smallest-inc/cookbook/tree/main/voice-agents/livekit-voice-agent)

## Setup

### 1. Create a Virtual Environment

```bash
python3.11 -m venv venv
```

Activate it:

* On Linux/Mac:
  ```bash
  source venv/bin/activate
  ```
* On Windows:
  ```bash
  venv\Scripts\activate
  ```

### 2. Install Dependencies

```bash
pip install livekit-plugins-smallestai livekit-plugins-openai livekit-plugins-silero python-dotenv
```

`livekit-plugins-smallestai` is published on PyPI and includes both the STT and TTS services. `livekit-plugins-silero` provides the VAD used for turn detection, and `livekit-plugins-openai` provides the LLM.
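
To confirm that the install succeeded before writing any agent code, a quick check along these lines can help. This is a minimal sketch using only the Python standard library; the helper name `installed_versions` is illustrative and not part of any of the packages above.

```python
from importlib import metadata

# Distribution names taken from the pip command above.
REQUIRED = [
    "livekit-plugins-smallestai",
    "livekit-plugins-openai",
    "livekit-plugins-silero",
    "python-dotenv",
]


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Run it inside the activated virtual environment; any line reporting `NOT INSTALLED` means the corresponding `pip install` did not land in that environment.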

### 3. Create a LiveKit Project

Sign in to [LiveKit Cloud](https://cloud.livekit.io), create a new project, and copy your project credentials.

### 4. Create a `.env` File

```bash
LIVEKIT_API_KEY=...
LIVEKIT_API_SECRET=...
LIVEKIT_URL=...
OPENAI_API_KEY=...
SMALLEST_API_KEY=...
```
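
A missing or empty key typically only shows up as a connection or authentication error once the agent is running. The sketch below is one way to fail early instead. It assumes the five variable names listed above, and `missing_vars` is a hypothetical helper, not part of LiveKit or the Smallest AI plugin.

```python
import os

# Variable names from the .env file above.
REQUIRED_VARS = (
    "LIVEKIT_API_KEY",
    "LIVEKIT_API_SECRET",
    "LIVEKIT_URL",
    "OPENAI_API_KEY",
    "SMALLEST_API_KEY",
)


def missing_vars(env=None):
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_vars()
    if missing:
        raise SystemExit(f"Missing environment variables: {', '.join(missing)}")
    print("All required environment variables are set.")
```

In an agent script, call `missing_vars()` right after `load_dotenv()` so the values from `.env` are already in the environment when the check runs.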

***

## Services

### `smallestai.STT`

Real-time transcription using the Smallest AI Pulse API. Connects over WebSocket for streaming and supports batch transcription over HTTP.

```python
from livekit.plugins import smallestai

# Streaming transcription — English
stt = smallestai.STT(language="en")

# Automatic language detection across 38 languages
stt = smallestai.STT(language="multi")

# With speaker diarization
stt = smallestai.STT(language="en", diarize=True)
```

| Parameter         | Type   | Default             | Description                                                                                                                                                                                   |
| ----------------- | ------ | ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `api_key`         | `str`  | `$SMALLEST_API_KEY` | Your Smallest AI API key                                                                                                                                                                      |
| `model`           | `str`  | `"pulse"`           | STT model — currently only `"pulse"` is available                                                                                                                                             |
| `language`        | `str`  | `"en"`              | BCP-47 language code (e.g. `"en"`, `"hi"`, `"fr"`). Use `"multi"` for automatic detection across 38 languages                                                                                 |
| `sample_rate`     | `int`  | `16000`             | Audio sample rate in Hz. Supported: `8000`, `16000`, `22050`, `24000`, `44100`, `48000`                                                                                                       |
| `encoding`        | `str`  | `"linear16"`        | Audio encoding: `"linear16"`, `"linear32"`, `"alaw"`, `"mulaw"`, `"opus"`, `"ogg_opus"`                                                                                                       |
| `word_timestamps` | `bool` | `True`              | Include per-word `start`/`end` timestamps and confidence scores                                                                                                                               |
| `diarize`         | `bool` | `False`             | Enable speaker diarization — each word includes a speaker ID                                                                                                                                  |
| `eou_timeout_ms`  | `int`  | `0`                 | Milliseconds of silence before the server emits a final transcript. `0` disables server-side end-of-utterance detection (recommended — lets LiveKit's built-in turn detection control timing) |

The STT service connects to `wss://api.smallest.ai/waves/v1/pulse/get_text` for streaming and `https://api.smallest.ai/waves/v1/pulse/get_text` for batch. Interim and final transcripts are both supported. `START_OF_SPEECH` is inferred from the first non-empty transcript.
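
The start-of-speech rule described above can be modeled in a few lines. This is a simplified sketch of the behavior, not the plugin's actual implementation: it assumes transcripts arrive as `(text, is_final)` pairs, and the function name and the event labels (plain strings modeled on the event names used in this guide) are illustrative.

```python
from typing import Iterable, Iterator, Tuple


def speech_events(
    transcripts: Iterable[Tuple[str, bool]],
) -> Iterator[Tuple[str, str]]:
    """Turn (text, is_final) pairs into (event, text) pairs.

    START_OF_SPEECH is emitted once per utterance, just before the first
    non-empty transcript. A final transcript closes the utterance.
    """
    speaking = False
    for text, is_final in transcripts:
        if not text.strip():
            continue  # empty transcripts are ignored entirely
        if not speaking:
            speaking = True
            yield ("START_OF_SPEECH", "")
        yield ("FINAL_TRANSCRIPT" if is_final else "INTERIM_TRANSCRIPT", text)
        if is_final:
            speaking = False  # the next non-empty transcript starts a new utterance


if __name__ == "__main__":
    stream = [("", False), ("hel", False), ("hello", True), ("bye", True)]
    for event in speech_events(stream):
        print(event)
```

One consequence of inferring the start of speech this way is that the event cannot fire before the first transcript arrives, which is one reason the pipeline in this guide also runs a local VAD.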

***

### `smallestai.TTS`

Text-to-speech using the Smallest AI Lightning API. The plugin uses persistent WebSocket streaming backed by a connection pool for low-latency audio delivery.

```python
from livekit.plugins import smallestai

smallest_tts = smallestai.TTS(
    model="lightning_v3.1_pro",
    voice_id="meher",
    language="en",
    speed=1.0,
)
```

| Parameter       | Type    | Default                                   | Description                                                                                                                                                                                                    |
| --------------- | ------- | ----------------------------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `api_key`       | `str`   | `$SMALLEST_API_KEY`                       | Your Smallest AI API key                                                                                                                                                                                       |
| `model`         | `str`   | `"lightning_v3.1_pro"`                    | TTS model. `"lightning_v3.1_pro"` — premium 44.1 kHz pool with curated American, British, and Indian voices; `"lightning_v3.1"` — standard pool with 217 voices across 12 languages                            |
| `voice_id`      | `str`   | auto                                      | Voice ID for synthesis. Defaults to `"meher"` for `lightning_v3.1_pro` and `"sophia"` for `lightning_v3.1`. Pro voices must be paired with `lightning_v3.1_pro`; standard voices with `lightning_v3.1`         |
| `language`      | `str`   | `"en"`                                    | Language code. `lightning_v3.1` supports 12 codes plus `"auto"`; `lightning_v3.1_pro` supports `"en"`, `"hi"`, and `"auto"` only — see the [model cards](/waves/model-cards/text-to-speech) for the full lists |
| `speed`         | `float` | `1.0`                                     | Playback speed multiplier (0.5–2.0)                                                                                                                                                                            |
| `sample_rate`   | `int`   | `24000`                                   | Output audio sample rate in Hz. Supported: `8000`, `16000`, `24000`, `44100`                                                                                                                                   |
| `output_format` | `str`   | `"pcm"`                                   | Output encoding for HTTP synthesis: `"pcm"`, `"mp3"`, `"wav"`, `"ulaw"`, `"alaw"`. WebSocket streaming always returns PCM.                                                                                     |
| `ws_url`        | `str`   | `wss://api.smallest.ai/waves/v1/tts/live` | WebSocket endpoint for low-latency streaming synthesis                                                                                                                                                         |
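
The pairing rules in the table are easy to get wrong when options come from user configuration. The sketch below checks a set of options against the constraints stated above before they reach the plugin. `check_tts_options` is a hypothetical helper, not part of `livekit-plugins-smallestai`, and it does not validate languages for `lightning_v3.1` or voice-to-model pairing, because those lists are not reproduced on this page.

```python
# Values copied from the parameter table above.
DEFAULT_VOICES = {"lightning_v3.1_pro": "meher", "lightning_v3.1": "sophia"}
PRO_LANGUAGES = {"en", "hi", "auto"}
SAMPLE_RATES = {8000, 16000, 24000, 44100}


def check_tts_options(model="lightning_v3.1_pro", voice_id=None, language="en",
                      speed=1.0, sample_rate=24000):
    """Return (resolved_voice_id, problems) for a set of TTS options."""
    problems = []
    if model not in DEFAULT_VOICES:
        problems.append(f"unknown model: {model!r}")
    if model == "lightning_v3.1_pro" and language not in PRO_LANGUAGES:
        problems.append(f"{model} does not support language {language!r}")
    if not 0.5 <= speed <= 2.0:
        problems.append(f"speed {speed} is outside 0.5-2.0")
    if sample_rate not in SAMPLE_RATES:
        problems.append(f"unsupported sample rate: {sample_rate}")
    # Fall back to the documented per-model default when no voice is given.
    resolved = voice_id or DEFAULT_VOICES.get(model)
    return resolved, problems


if __name__ == "__main__":
    voice, problems = check_tts_options(language="fr", speed=3.0)
    print(voice, problems)
```

An empty `problems` list means the options are consistent with the table; the resolved voice can then be passed as `voice_id` when constructing `smallestai.TTS`.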

***

## Complete Agent Example

A minimal voice agent that uses Smallest AI for both STT and TTS. Save it as `agent.py`:

```python
import logging
from dotenv import load_dotenv
from livekit.agents import (
    Agent,
    AgentSession,
    JobContext,
    JobProcess,
    RoomInputOptions,
    RoomOutputOptions,
    WorkerOptions,
    cli,
)
from livekit.plugins import openai, silero, smallestai

logger = logging.getLogger("voice-agent")
load_dotenv()


class MyAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="You are a helpful voice assistant built by Smallest AI.",
        )

    async def on_enter(self):
        # Runs when this agent becomes active; ask the LLM for an opening greeting.
        self.session.generate_reply()


def prewarm(proc: JobProcess):
    # Load the Silero VAD model once per worker process and reuse it across jobs.
    proc.userdata["vad"] = silero.VAD.load()


async def entrypoint(ctx: JobContext):
    # VAD + STT -> LLM -> TTS pipeline; API keys are read from the environment.
    session = AgentSession(
        vad=ctx.proc.userdata["vad"],
        stt=smallestai.STT(language="en"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=smallestai.TTS(),
    )

    # Start the session in the job's room and publish transcriptions to it.
    await session.start(
        agent=MyAgent(),
        room=ctx.room,
        room_input_options=RoomInputOptions(),
        room_output_options=RoomOutputOptions(transcription_enabled=True),
    )


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, prewarm_fnc=prewarm))
```

***

## Running the Agent

```bash
python3 agent.py dev
```

The `dev` subcommand starts the agent worker in development mode. To interact with it, open the [LiveKit Agents Playground](https://agents-playground.livekit.io) and enter your `LIVEKIT_URL`, `LIVEKIT_API_KEY`, and `LIVEKIT_API_SECRET`. The agent greets the user automatically when the session starts.

The pipeline is fully interruptible — if the user speaks while the bot is talking, audio stops immediately and the bot re-engages without any custom logic.

***

## Notes

* Set `eou_timeout_ms=0` (the default) when using LiveKit's built-in turn detection. Setting it to a non-zero value adds server-side silence detection on top of LiveKit's own logic, which increases end-of-turn latency.
* Call `tts.prewarm()` during worker startup to pre-warm the WebSocket connection pool and reduce first-audio latency on the initial request.
* For issues or questions, open an issue in the [cookbook repository](https://github.com/smallest-inc/cookbook) or reach out on [Discord](https://discord.gg/9WtSXv26WE).