
# LiveKit

> Build real-time voice agents with LiveKit Agents using Smallest AI TTS and STT.

This guide walks you through integrating [Smallest AI](https://smallest.ai) TTS and STT into a [LiveKit Agents](https://docs.livekit.io/agents/) voice pipeline. LiveKit Agents is an open-source Python framework for building production-grade, real-time voice AI agents over WebRTC.

The `livekit-plugins-smallestai` package provides two services:

* **`smallestai.STT`** — real-time speech-to-text using the Pulse API, with streaming over WebSocket (\~64ms TTFT) and batch transcription over HTTP
* **`smallestai.TTS`** — ultra-low-latency text-to-speech using the Lightning API

## Code Example

The full runnable example is in the Smallest AI cookbook:

[LiveKit Voice Agent — Smallest AI TTS + STT](https://github.com/smallest-inc/cookbook/tree/main/voice-agents/livekit-voice-agent)

## Setup

### 1. Create a Virtual Environment

```bash
python3.11 -m venv venv
```

Activate it:

* On Linux/Mac:
  ```bash
  source venv/bin/activate
  ```
* On Windows:
  ```bash
  venv\Scripts\activate
  ```

### 2. Install Dependencies

```bash
pip install livekit-plugins-smallestai livekit-plugins-openai livekit-plugins-silero python-dotenv
```

`livekit-plugins-smallestai` is published on PyPI and includes both the STT and TTS services. `livekit-plugins-silero` provides the VAD used for turn detection, and `livekit-plugins-openai` provides the LLM.
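A quick way to confirm the install succeeded is to import the plugins from inside the activated venv — a minimal sanity check, nothing more:

```python
# Run inside the activated venv; an ImportError here means a package is missing.
from livekit.plugins import openai, silero, smallestai

print(smallestai.STT, smallestai.TTS)  # both services ship in the one plugin
print(silero.VAD, openai.LLM)
```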

### 3. Create a LiveKit Project

Sign in to [LiveKit Cloud](https://cloud.livekit.io), create a new project, and copy your project credentials.

### 4. Create a `.env` File

```bash
LIVEKIT_API_KEY=...
LIVEKIT_API_SECRET=...
LIVEKIT_URL=...
OPENAI_API_KEY=...
SMALLEST_API_KEY=...
```
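The agent loads these values with `python-dotenv`. If you want to fail fast on missing credentials before starting the worker, a small check like the following works (an optional sketch, not required by the plugin):

```python
import os
from dotenv import load_dotenv

load_dotenv()  # reads .env from the working directory

required = ("LIVEKIT_API_KEY", "LIVEKIT_API_SECRET", "LIVEKIT_URL",
            "OPENAI_API_KEY", "SMALLEST_API_KEY")
missing = [key for key in required if not os.getenv(key)]
if missing:
    raise RuntimeError(f"Missing environment variables: {', '.join(missing)}")
```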

***

## Services

### `smallestai.STT`

Real-time transcription using the Smallest AI Pulse API. Connects over WebSocket for streaming and supports batch transcription over HTTP.

```python
from livekit.plugins import smallestai

# Streaming transcription — English
stt = smallestai.STT(language="en")

# Automatic language detection across 39 languages
stt = smallestai.STT(language="multi")

# With speaker diarization
stt = smallestai.STT(language="en", diarize=True)
```

| Parameter         | Type   | Default             | Description                                                                                                                                                                                   |
| ----------------- | ------ | ------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `api_key`         | `str`  | `$SMALLEST_API_KEY` | Your Smallest AI API key                                                                                                                                                                      |
| `model`           | `str`  | `"pulse"`           | STT model — currently only `"pulse"` is available                                                                                                                                             |
| `language`        | `str`  | `"en"`              | BCP-47 language code (e.g. `"en"`, `"hi"`, `"fr"`). Use `"multi"` for automatic detection across 39 languages                                                                                 |
| `sample_rate`     | `int`  | `16000`             | Audio sample rate in Hz. Supported: `8000`, `16000`, `22050`, `24000`, `44100`, `48000`                                                                                                       |
| `encoding`        | `str`  | `"linear16"`        | PCM encoding: `"linear16"`, `"linear32"`, `"alaw"`, `"mulaw"`, `"opus"`, `"ogg_opus"`                                                                                                         |
| `word_timestamps` | `bool` | `True`              | Include per-word `start`/`end` timestamps and confidence scores                                                                                                                               |
| `diarize`         | `bool` | `False`             | Enable speaker diarization — each word includes a speaker ID                                                                                                                                  |
| `eou_timeout_ms`  | `int`  | `0`                 | Milliseconds of silence before the server emits a final transcript. `0` disables server-side end-of-utterance detection (recommended — lets LiveKit's built-in turn detection control timing) |

The STT service connects to `wss://api.smallest.ai/waves/v1/pulse/get_text` for streaming and `https://api.smallest.ai/waves/v1/pulse/get_text` for batch. Interim and final transcripts are both supported. `START_OF_SPEECH` is inferred from the first non-empty transcript.
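Inside an `AgentSession` the STT is wired up automatically (see the complete example below). If you want to drive the stream yourself, the sketch below shows the general shape using LiveKit's standard STT stream interface (`push_frame`/`end_input` plus async iteration over speech events); the `audio_frames` source is an assumption, not something the plugin provides:

```python
import asyncio

from livekit.agents import stt as agents_stt
from livekit.plugins import smallestai


async def transcribe(audio_frames):
    """audio_frames: any async iterable of rtc.AudioFrame (assumed to exist)."""
    pulse = smallestai.STT(language="en")
    stream = pulse.stream()

    async def feed() -> None:
        # Push raw audio frames into the recognizer, then signal end of input.
        async for frame in audio_frames:
            stream.push_frame(frame)
        stream.end_input()

    feeder = asyncio.create_task(feed())

    # Iterate interim and final transcripts as they arrive over the WebSocket.
    async for event in stream:
        if event.type == agents_stt.SpeechEventType.INTERIM_TRANSCRIPT:
            print("interim:", event.alternatives[0].text)
        elif event.type == agents_stt.SpeechEventType.FINAL_TRANSCRIPT:
            print("final:", event.alternatives[0].text)

    await feeder
```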

***

### `smallestai.TTS`

Text-to-speech using the Smallest AI Lightning API. Because the plugin synthesizes audio per request rather than streaming tokens, wrap it in `tts.StreamAdapter` with a `SentenceTokenizer`. The adapter splits LLM output at sentence boundaries and fires synthesis for each chunk, keeping first-audio latency low.

```python
from livekit.agents import tts, tokenize
from livekit.plugins import smallestai

smallest_tts = tts.StreamAdapter(
    tts=smallestai.TTS(
        model="lightning-v3.1",
        voice_id="sophia",
        language="en",
        speed=1.0,
    ),
    sentence_tokenizer=tokenize.basic.SentenceTokenizer(),
)
```

| Parameter       | Type    | Default             | Description                                                                                  |
| --------------- | ------- | ------------------- | -------------------------------------------------------------------------------------------- |
| `api_key`       | `str`   | `$SMALLEST_API_KEY` | Your Smallest AI API key                                                                     |
| `model`         | `str`   | `"lightning-v3.1"`  | TTS model: `"lightning-v3.1"` (recommended, 80+ voices, \~100ms latency) or `"lightning-v2"` |
| `voice_id`      | `str`   | `"sophia"`          | Voice ID for synthesis                                                                       |
| `language`      | `str`   | `"en"`              | Language code (`"en"` or `"hi"`)                                                             |
| `speed`         | `float` | `1.0`               | Playback speed multiplier                                                                    |
| `sample_rate`   | `int`   | `24000`             | Output audio sample rate in Hz                                                               |
| `output_format` | `str`   | `"pcm"`             | Output encoding: `"pcm"`, `"mp3"`, `"wav"`, `"mulaw"`, `"alaw"`                              |
| `consistency`   | `float` | `0.5`               | Voice consistency — `lightning-v2` only                                                      |
| `similarity`    | `float` | `0.0`               | Voice similarity — `lightning-v2` only                                                       |
| `enhancement`   | `float` | `1.0`               | Audio enhancement level — `lightning-v2` only                                                |

`consistency`, `similarity`, and `enhancement` apply only to `"lightning-v2"` and are ignored for `"lightning-v3.1"`.
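For example, a `lightning-v2` configuration that uses these quality parameters might look like this (the specific values are illustrative, not tuned recommendations):

```python
from livekit.plugins import smallestai

tts_v2 = smallestai.TTS(
    model="lightning-v2",
    voice_id="sophia",
    language="en",
    consistency=0.6,   # voice consistency (lightning-v2 only)
    similarity=0.2,    # voice similarity (lightning-v2 only)
    enhancement=1.0,   # audio enhancement level (lightning-v2 only)
)
```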

***

## Complete Agent Example

A minimal but production-ready voice agent using Smallest AI for both STT and TTS:

```python
import logging
from dotenv import load_dotenv
from livekit.agents import (
    Agent,
    AgentSession,
    JobContext,
    JobProcess,
    RoomInputOptions,
    RoomOutputOptions,
    WorkerOptions,
    cli,
    tts,
    tokenize,
)
from livekit.plugins import openai, silero, smallestai

logger = logging.getLogger("voice-agent")
load_dotenv()


class MyAgent(Agent):
    def __init__(self) -> None:
        super().__init__(
            instructions="You are a helpful voice assistant built by Smallest AI.",
        )

    async def on_enter(self):
        self.session.generate_reply()


def prewarm(proc: JobProcess):
    proc.userdata["vad"] = silero.VAD.load()


async def entrypoint(ctx: JobContext):
    session = AgentSession(
        vad=ctx.proc.userdata["vad"],
        stt=smallestai.STT(language="en"),
        llm=openai.LLM(model="gpt-4o-mini"),
        tts=tts.StreamAdapter(
            tts=smallestai.TTS(),
            sentence_tokenizer=tokenize.basic.SentenceTokenizer(),
        ),
    )

    await session.start(
        agent=MyAgent(),
        room=ctx.room,
        room_input_options=RoomInputOptions(),
        room_output_options=RoomOutputOptions(transcription_enabled=True),
    )


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint, prewarm_fnc=prewarm))
```

***

## Running the Agent

```bash
python3 agent.py dev
```

The `dev` command starts the agent worker in development mode. To interact with it, open the [LiveKit Agents Playground](https://agents-playground.livekit.io) and enter your `LIVEKIT_URL`, `LIVEKIT_API_KEY`, and `LIVEKIT_API_SECRET`. The agent greets the user automatically when the session starts.

The pipeline is fully interruptible — if the user speaks while the bot is talking, audio stops immediately and the bot re-engages without any custom logic.

***

## Notes

* The `StreamAdapter` + `SentenceTokenizer` wrapper is required for TTS — the Smallest AI plugin synthesizes audio per request. Without it, the agent waits for the entire LLM response before starting synthesis.
* Set `eou_timeout_ms=0` (the default) when using LiveKit's built-in turn detection. Setting it to a non-zero value adds server-side silence detection on top of LiveKit's own logic, which increases end-of-turn latency.
* `"lightning-v3.1"` is the recommended TTS model — it delivers \~100ms latency with 80+ voices. Switch to `"lightning-v2"` only if you need the `consistency`/`similarity`/`enhancement` quality parameters.
* For issues or questions, open an issue in the [cookbook repository](https://github.com/smallest-inc/cookbook) or reach out on [Discord](https://discord.gg/9WtSXv26WE).