Stream Speech (WebSocket) | Smallest AI Docs

Real-time text-to-speech over a persistent WebSocket connection. The model field in the request payload selects which Lightning pool serves the synthesis.

When to use this

- Use this when text arrives incrementally (LLM token streams, live captioning, conversational pipelines where playback should start as soon as the first chunk is ready).
- POST to /waves/v1/tts/live (SSE) when you have the full text up front but still want chunked playback. (Same URL, different protocol: an HTTP POST gets you SSE; a WSS connect gets you WebSocket.)
- Use /waves/v1/tts (sync) when total latency doesn’t matter.

Selecting the model

Pass "model": "lightning_v3.1" (default) or "model": "lightning_v3.1_pro" on each request. Concurrency and latency are identical across both. Voice catalogs differ — see the Lightning v3.1 and Lightning v3.1 Pro model cards for the per-model catalog.
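As a minimal sketch of the model selection described above, the helper below serializes one request message. The field names voice_id, text, and model come from this page; the helper name build_tts_request and the voice ID used in the usage note are hypothetical, and the real payload may accept more fields than shown here.

```python
import json

# Model identifiers as named on this page.
KNOWN_MODELS = {"lightning_v3.1", "lightning_v3.1_pro"}


def build_tts_request(voice_id: str, text: str, model: str = "lightning_v3.1") -> str:
    """Serialize one synthesis request; `model` selects the Lightning pool.

    Hypothetical helper: only voice_id, text, and model are taken from the docs.
    """
    if model not in KNOWN_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    return json.dumps({"voice_id": voice_id, "text": text, "model": model})
```

Calling build_tts_request("example_voice", "Hello", model="lightning_v3.1_pro") would produce a message routed to the Pro pool, while leaving model out falls back to the documented default.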

Language behaviour

On lightning_v3.1 the full 12-language catalog applies (see voice catalog).

On lightning_v3.1_pro:

- Pass language: en → UK + American accented English.
- Pass language: hi → Indian accented English + Hindi (code-switching).
- Pass the ISO 639-1 code of any other Pro language (e.g. ta, de, ja) together with a matching Pro voice. 27 additional languages (9 Indian, 8 Asian & Middle Eastern, 10 European) have dedicated Pro voices; see the Lightning v3.1 Pro model card for the full list.
- Omit language → defaults to en + hi (mixed Indian + Western English coverage).
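The Pro language rules above can be summarized in a small lookup. This is only an illustrative sketch: the function names are hypothetical, the coverage strings are copied from this page, and the code does not check whether a given code is actually in the Pro catalog.

```python
from typing import Optional


def pro_language_coverage(language: Optional[str] = None) -> str:
    """Describe what lightning_v3.1_pro covers for a given `language` value."""
    if language is None:
        # Omitting the field falls back to the documented default.
        return "en + hi (mixed Indian + Western English coverage)"
    if language == "en":
        return "UK + American accented English"
    if language == "hi":
        return "Indian accented English + Hindi (code-switching)"
    # Any other code needs a matching Pro voice; the catalog is not checked here.
    return f"dedicated Pro voice for '{language}', if listed on the Pro model card"


def with_language(payload: dict, language: Optional[str] = None) -> dict:
    """Return a copy of the payload; the key is left out when language is None."""
    out = dict(payload)
    if language is not None:
        out["language"] = language
    return out
```

Leaving the key out entirely, rather than sending an empty value, is the design choice here, since the page describes the default as what happens when language is omitted.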

Optional features

Set word_timestamps: true to receive per-word timing events interleaved with the audio chunks (status: "word_timestamp"). Supported on English + Hindi base-queue voices. See Word-level timestamps.
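Because timing events arrive interleaved with audio, a client typically needs to separate them from the rest of the stream. The sketch below assumes every server message is a JSON text frame carrying a status field; only the status value "word_timestamp" and the request flag word_timestamps come from this page, and the helper names are hypothetical.

```python
import json
from typing import Iterable, List, Tuple


def with_word_timestamps(payload: dict) -> dict:
    """Return a copy of the request payload with per-word timing enabled."""
    return {**payload, "word_timestamps": True}


def split_events(raw_messages: Iterable[str]) -> Tuple[List[dict], List[dict]]:
    """Separate word-timestamp events from every other server message.

    Assumes each message is a JSON object; fields other than `status`
    are passed through untouched because their shape is not documented here.
    """
    timestamps, others = [], []
    for raw in raw_messages:
        msg = json.loads(raw)
        if msg.get("status") == "word_timestamp":
            timestamps.append(msg)
        else:
            others.append(msg)
    return timestamps, others
```

Message order is preserved within each list, so the timing events can still be aligned with the audio chunks they were interleaved with.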

Connection timeout

The server closes idle WebSocket connections to free resources. The default idle timeout is 60 seconds — if your client does not send a message within that window the server closes the connection with:

{"status": "error", "message": "Connection timed out after 60 seconds of inactivity"}

Override the value with the timeout query parameter on the URL:

wss://api.smallest.ai/waves/v1/tts/live?timeout=120

Pass any positive integer (seconds). Smaller values are honored verbatim (e.g. ?timeout=5 closes after 5 s of silence). Use a larger value when your application has known pauses between turns — voice agents with long human-thinking windows, agentic pipelines waiting on an LLM round-trip, etc.
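One way to build the connection URL with an optional timeout is sketched below. The base URL and the timeout parameter are taken from this page; the function name live_url and the client-side validation are assumptions added for illustration.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "wss://api.smallest.ai/waves/v1/tts/live"


def live_url(timeout_s: Optional[int] = None) -> str:
    """Return the WebSocket URL, adding ?timeout=N when a value is given."""
    if timeout_s is None:
        return BASE_URL  # server default (60 seconds) applies
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(timeout_s, bool) or not isinstance(timeout_s, int) or timeout_s <= 0:
        raise ValueError("timeout must be a positive integer number of seconds")
    return f"{BASE_URL}?{urlencode({'timeout': timeout_s})}"
```

Validating before connecting keeps a bad value from turning into a failed or surprising connection later.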

The timeout is reset on every message you send (binary audio in, JSON control in), so keep-alive traffic restarts the clock.
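To reason about when keep-alive traffic is needed, a client can mirror the server's idle timer locally. The class below is a hypothetical client-side estimate only: it does not talk to the server, and it assumes the server restarts its clock at the moment a message is sent.

```python
import time
from typing import Callable


class IdleClock:
    """Client-side estimate of the seconds left before the idle timeout."""

    def __init__(self, timeout_s: int = 60, now: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self._now = now
        self._last_send = now()

    def mark_sent(self) -> None:
        """Call after every outgoing message; sending restarts the clock."""
        self._last_send = self._now()

    def remaining(self) -> float:
        """Seconds left before the server is expected to close the socket."""
        elapsed = self._now() - self._last_send
        return max(0.0, self.timeout_s - elapsed)
```

A sender loop could call mark_sent() after each message and send keep-alive traffic when remaining() drops below some margin; the margin itself is an application choice, not something this page specifies.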

Migrating from `/waves/v1/lightning-v3.1/get_speech/stream`

Same protocol, same payload shape — only the URL changes. Existing clients should:

1. Update the WebSocket URL to wss://api.smallest.ai/waves/v1/tts/live.
2. Optionally add "model": "lightning_v3.1_pro" to route to the Pro pool. Omitting model keeps the existing standard-pool behavior.

Voice IDs, sample rates, auth, and the response/streaming format are unchanged, so downstream audio handling, jitter buffers, and barge-in logic stay the same.
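Since only the URL changes, the migration can be expressed as a path rewrite. The sketch below uses the two paths named on this page; the function name is hypothetical, and the host used for the old endpoint in the usage note is assumed to be the same api.smallest.ai host.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_PATH = "/waves/v1/lightning-v3.1/get_speech/stream"
NEW_PATH = "/waves/v1/tts/live"


def migrate_url(url: str) -> str:
    """Rewrite the legacy stream path to the new one; leave other URLs alone."""
    parts = urlsplit(url)
    if parts.path.rstrip("/") != OLD_PATH:
        return url
    # Scheme, host, and any query string are kept as they were.
    return urlunsplit(parts._replace(path=NEW_PATH))
```

For example, migrate_url("wss://api.smallest.ai" + OLD_PATH) yields the new endpoint URL, and a URL that already points at the new path is returned unchanged.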

Handshake

WSS

wss://api.smallest.ai/waves/v1/tts/live

Authentication

Authorization: Bearer

Header authentication of the form Bearer <token>
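A small helper for the header described above might look like the following. The header name and the Bearer <token> form are from this page; the function name is hypothetical, and how the header is attached to the handshake depends on the WebSocket client library in use.

```python
from typing import Dict


def auth_headers(api_key: str) -> Dict[str, str]:
    """Build the Authorization header for the WebSocket handshake."""
    token = (api_key or "").strip()
    if not token:
        raise ValueError("API key is empty")
    return {"Authorization": f"Bearer {token}"}
```

Stripping whitespace guards against keys read from files or environment variables with a trailing newline.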

Send

TtsRequest (object, required)

Send a JSON message with voice_id, text, and optional parameters (including model) to generate speech audio.

Receive

TtsResponse (object, required)

Receive audio data chunks and completion status from the server.

URL	wss://api.smallest.ai/waves/v1/tts/live
Method	GET
Status	101 Switching Protocols
