Stream Speech (WebSocket)

# Live TTS WebSocket: `/waves/v1/tts/live`

Real-time text-to-speech over a persistent WebSocket connection. The `model` field in the request payload selects which Lightning pool serves the synthesis.

## When to use this

- **Use this** when text arrives incrementally (LLM token streams, live captioning, conversational pipelines where playback should start as soon as the first chunk is ready).
- POST to `/waves/v1/tts/live` (SSE) when you have the full text up front but still want chunked playback. The URL is the same; only the protocol differs: an HTTP POST returns Server-Sent Events, while a WSS connection opens a WebSocket.
- Use `/waves/v1/tts` (sync) when you can wait for the complete response and latency doesn't matter.

## Selecting the model

Pass `"model": "lightning_v3.1"` (default) or `"model": "lightning_v3.1_pro"` on each request. Concurrency and latency are identical for both models, but their voice catalogs differ. See the [Lightning v3.1](/waves/model-cards/text-to-speech/lightning-v-3-1) and [Lightning v3.1 Pro](/waves/model-cards/text-to-speech/lightning-v-3-1-pro) model cards for each model's voices.

## Optional features

Set `word_timestamps: true` to receive per-word timing events (`status: "word_timestamp"`) interleaved with the audio chunks. This is supported on English and Hindi base-queue voices. See [Word-level timestamps](/waves/documentation/text-to-speech-lightning/word-timestamps).

## Connection timeout

The server closes idle WebSocket connections to free resources. The default idle timeout is **60 seconds**: if your client does not send a message within that window, the server closes the connection with:

```json
{"status": "error", "message": "Connection timed out after 60 seconds of inactivity"}
```

Override the value with the `timeout` query parameter on the URL:

```
wss://api.smallest.ai/waves/v1/tts/live?timeout=120
```

Pass any positive integer (seconds). Values below the default are honored as given (e.g. `?timeout=5` closes the connection after 5 s without a client message).
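A minimal Python sketch of the client side of this contract, using only the standard library: it builds the connection URL with an optional `timeout` override and recognizes the server's JSON error frame. The helper names `live_tts_url` and `is_error_event` are hypothetical, and the sketch assumes that every JSON frame the server sends carries a top-level `status` field, as the error and word-timestamp events on this page do. Opening the socket itself is left to whichever WebSocket client you already use.

```python
import json
from typing import Optional
from urllib.parse import urlencode

# Endpoint from this page. The server's default idle timeout is 60 s.
LIVE_TTS_URL = "wss://api.smallest.ai/waves/v1/tts/live"


def live_tts_url(timeout_s: Optional[int] = None) -> str:
    """Return the WebSocket URL, optionally overriding the idle timeout.

    Hypothetical helper: `timeout_s` is sent as the `timeout` query
    parameter and must be a positive integer number of seconds.
    """
    if timeout_s is None:
        return LIVE_TTS_URL  # keep the server default (60 s)
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(timeout_s, bool) or not isinstance(timeout_s, int) or timeout_s <= 0:
        raise ValueError("timeout must be a positive integer (seconds)")
    return f"{LIVE_TTS_URL}?{urlencode({'timeout': timeout_s})}"


def is_error_event(frame: str) -> bool:
    """Return True if a server text frame is an error event.

    Assumption (not confirmed by this page): server JSON frames use a
    top-level `status` field, as the idle-timeout notice does.
    """
    try:
        msg = json.loads(frame)
    except json.JSONDecodeError:
        return False  # not JSON, so not an error event we recognize
    return isinstance(msg, dict) and msg.get("status") == "error"


if __name__ == "__main__":
    print(live_tts_url(120))  # wss://api.smallest.ai/waves/v1/tts/live?timeout=120
    notice = '{"status": "error", "message": "Connection timed out after 60 seconds of inactivity"}'
    print(is_error_event(notice))  # True
```

Building the query string with `urlencode` rather than string concatenation keeps the URL well-formed if you later add other query parameters.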
Use a larger value when your application has known pauses between turns, such as voice agents with long human-thinking windows or agentic pipelines waiting on an LLM round-trip. The timeout resets on every message you send (binary or JSON frames), so keep-alive traffic restarts the clock.

## Migrating from `/waves/v1/lightning-v3.1/get_speech/stream`

Same protocol, same payload shape; only the URL changes. Existing clients should:

1. Update the WebSocket URL to `wss://api.smallest.ai/waves/v1/tts/live`.
2. Optionally add `"model": "lightning_v3.1_pro"` to route to the Pro pool. Omitting `model` keeps the existing standard-pool behavior.

Voice IDs, sample rates, auth, and the response/streaming format are unchanged, so downstream audio handling, jitter buffers, and barge-in logic stay the same.
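A minimal Python sketch of how a migrated client might build its request frame, using only the standard library. Only `model`, its two values, and `word_timestamps` come from this page; the `text` and `voice_id` field names and the `build_request` helper are hypothetical, so keep whatever field names your existing client already sends. Leaving `model` unset omits the key entirely, which matches the default standard-pool behavior.

```python
import json
from typing import Any, Optional

# New endpoint; the previous path was /waves/v1/lightning-v3.1/get_speech/stream.
LIVE_TTS_URL = "wss://api.smallest.ai/waves/v1/tts/live"

# Model values documented on this page ("lightning_v3.1" is the default).
# Client-side guard only: this does not reflect server-side validation.
SUPPORTED_MODELS = {"lightning_v3.1", "lightning_v3.1_pro"}


def build_request(
    text: str,
    voice_id: str,
    *,
    model: Optional[str] = None,
    word_timestamps: bool = False,
    **extra: Any,
) -> str:
    """Serialize one synthesis request as a JSON text frame.

    Hypothetical helper: "text" and "voice_id" are assumed field names;
    pass any other fields your existing client sends through **extra.
    """
    payload = {"text": text, "voice_id": voice_id, **extra}
    if model is not None:
        if model not in SUPPORTED_MODELS:
            raise ValueError(f"unknown model: {model!r}")
        payload["model"] = model  # omit to keep the standard pool
    if word_timestamps:
        payload["word_timestamps"] = True  # server adds "word_timestamp" events
    return json.dumps(payload)


if __name__ == "__main__":
    # Existing payload, unchanged apart from the URL: no "model" key.
    print(build_request("Hello there.", "example_voice"))
    # Opt in to the Pro pool and per-word timings.
    print(build_request("Hello there.", "example_voice",
                        model="lightning_v3.1_pro", word_timestamps=True))
```

The returned string is what you would send as a text frame on a connection opened at `LIVE_TTS_URL`; the rest of the client, including audio handling, stays as it was.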