Stream Speech (WebSocket)
Real-time text-to-speech over a persistent WebSocket connection. The
model field in the request payload selects which Lightning pool serves
the synthesis.
Use /waves/v1/tts/live (SSE) instead when you have the full text up front but still want chunked playback. (Same URL, different protocol: an HTTP POST gets you SSE; a WSS connect gets you WebSocket.)
Use /waves/v1/tts (sync) instead when total latency doesn’t matter.
Pass "model": "lightning_v3.1" (default) or
"model": "lightning_v3.1_pro" on each request. Concurrency and latency
are identical across both. Voice catalogs differ — see the
Lightning v3.1 and
Lightning v3.1 Pro
model cards for the per-model catalog.
Set word_timestamps: true to receive per-word timing events
interleaved with the audio chunks (status: "word_timestamp").
Supported on English and Hindi base-queue voices. See
Word-level timestamps.
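As a rough sketch of how a client might separate timing events from the rest of the stream, assume each server message arrives as a JSON object with a status field. Only the value "word_timestamp" comes from the text above; any other status value or field name you see in practice should be taken from the Word-level timestamps page, not from this example.

```python
import json


def split_messages(raw_messages):
    """Partition raw JSON messages into word-timing events and everything else.

    Assumption: every message is a JSON object with a "status" key. Only
    the "word_timestamp" value is documented on this page.
    """
    timestamps, others = [], []
    for raw in raw_messages:
        message = json.loads(raw)
        if message.get("status") == "word_timestamp":
            timestamps.append(message)
        else:
            others.append(message)
    return timestamps, others
```

Keeping timing events in their own list lets audio playback proceed without waiting on caption or highlighting logic.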
The server closes idle WebSocket connections to free resources. The default idle timeout is 60 seconds — if your client does not send a message within that window the server closes the connection with:
Override the value with the timeout query parameter on the URL:
Pass any positive integer (seconds). Smaller values are honored
verbatim (e.g. ?timeout=5 closes after 5 s of silence). Use a larger
value when your application has known pauses between turns — voice
agents with long human-thinking windows, agentic pipelines waiting on
an LLM round-trip, etc.
The timeout is reset on every message you send (binary audio in, JSON control in), so keep-alive traffic restarts the clock.
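A minimal sketch of building the connection URL with an idle-timeout override. The base URL and the timeout parameter name are taken from this page; the helper name live_url is hypothetical.

```python
from urllib.parse import urlencode

# WebSocket URL as given on this page.
BASE_URL = "wss://api.smallest.ai/waves/v1/tts/live"


def live_url(timeout_seconds=None):
    """Return the WebSocket URL, optionally overriding the idle timeout.

    With no argument the server's default idle timeout (60 s) applies.
    """
    if timeout_seconds is None:
        return BASE_URL
    # The page asks for a positive integer number of seconds.
    is_int = isinstance(timeout_seconds, int) and not isinstance(timeout_seconds, bool)
    if not is_int or timeout_seconds <= 0:
        raise ValueError("timeout must be a positive integer (seconds)")
    return f"{BASE_URL}?{urlencode({'timeout': timeout_seconds})}"
```

For example, live_url(300) would suit an agent that may wait several minutes between turns, while live_url(5) closes the connection after 5 s of silence.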
Migrating from /waves/v1/lightning-v3.1/get_speech/stream
Same protocol, same payload shape; only the URL changes. Existing clients should:
Change the connection URL to wss://api.smallest.ai/waves/v1/tts/live.
Optionally add "model": "lightning_v3.1_pro" to route to the Pro pool. Omitting model keeps the existing standard-pool behavior.
Voice IDs, sample rates, auth, and the response/streaming format are unchanged, so downstream audio handling, jitter buffers, and barge-in logic stay the same.
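The migration can be sketched as a small helper that rewrites the path and, optionally, adds the Pro model to an existing payload. The two paths and the model value come from this page; the helper name migrate and the assumption that the legacy endpoint lives on the same host are illustrative only.

```python
from urllib.parse import urlsplit, urlunsplit

LEGACY_PATH = "/waves/v1/lightning-v3.1/get_speech/stream"
NEW_PATH = "/waves/v1/tts/live"


def migrate(url, payload, use_pro=False):
    """Return (new_url, new_payload) for a client moving off the legacy path.

    The payload is copied, never mutated. Without use_pro the payload is
    left as-is, which keeps the standard-pool behavior.
    """
    parts = urlsplit(url)
    if parts.path == LEGACY_PATH:
        parts = parts._replace(path=NEW_PATH)
    new_payload = dict(payload)
    if use_pro:
        new_payload["model"] = "lightning_v3.1_pro"
    return urlunsplit(parts), new_payload
```

Because only the URL and the optional model field change, the rest of the client (audio handling, buffering, barge-in) can stay untouched.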
Header authentication of the form Bearer <token>
Send a JSON message with voice_id, text, and optional parameters (including model) to generate speech audio.
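A minimal sketch of preparing the auth header and the JSON message described above. The field names voice_id, text, model, and word_timestamps come from this page; the helper names are hypothetical, and the use of the standard Authorization header to carry the Bearer token is an assumption. Opening the socket itself is left to whichever WebSocket client you use.

```python
import json


def auth_headers(token):
    """Build the Bearer auth header (assumes the standard Authorization header)."""
    return {"Authorization": f"Bearer {token}"}


def speech_message(voice_id, text, model=None, word_timestamps=False):
    """Serialize one synthesis request as a JSON string.

    Optional fields are included only when set, so the server defaults
    (for example the default model) apply otherwise.
    """
    message = {"voice_id": voice_id, "text": text}
    if model is not None:
        message["model"] = model
    if word_timestamps:
        message["word_timestamps"] = True
    return json.dumps(message)
```

Omitting unset optional fields keeps the message minimal and avoids overriding server-side defaults by accident.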