Lightning TTS WebSocket — documented ?timeout=N connection-timeout knob
The API reference for the Lightning TTS WebSocket (the "Stream Speech (WebSocket)" page) now documents the ?timeout=N query parameter. The endpoint has always accepted this parameter, but it was never surfaced in the developer-facing docs.
This is purely a docs change, with no protocol or wire-level change. The new section in the API reference clarifies:
- The default connection timeout on WSS /waves/v1/tts/live is 60 seconds, not the 20 seconds the legacy “WebSocket Support for TTS” page used to claim.
- You can override it with ?timeout=N on the connection URL (N is a positive integer number of seconds, e.g. wss://api.smallest.ai/waves/v1/tts/live?timeout=120).
- Both small values (?timeout=5 closes the connection after 5 seconds of silence) and large ones (verified up to ?timeout=999) are honored.

Voice agents with long human-thinking windows, agentic pipelines that round-trip to an LLM between TTS bursts, and any workflow with extended natural pauses now have a documented way to keep the WebSocket open past 60 seconds without resorting to dummy keep-alive frames.
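A minimal sketch of building the connection URL before handing it to whatever WebSocket client you use. The helper name live_tts_url is hypothetical, and the client-side check only mirrors the documented "positive integer seconds" rule. It is not a copy of any server-side validation.

```python
from typing import Optional
from urllib.parse import urlencode

LIVE_TTS_URL = "wss://api.smallest.ai/waves/v1/tts/live"


def live_tts_url(timeout_s: Optional[int] = None) -> str:
    """Build the Lightning TTS WebSocket URL, optionally adding ?timeout=N.

    Leaving timeout_s as None keeps the documented 60-second default.
    """
    if timeout_s is None:
        return LIVE_TTS_URL
    # Client-side check mirroring the documented rule: positive integer seconds.
    if isinstance(timeout_s, bool) or not isinstance(timeout_s, int) or timeout_s <= 0:
        raise ValueError("timeout must be a positive integer number of seconds")
    return f"{LIVE_TTS_URL}?{urlencode({'timeout': timeout_s})}"


# A voice agent expecting long pauses might ask for two minutes:
# live_tts_url(120) -> "wss://api.smallest.ai/waves/v1/tts/live?timeout=120"
```

Validating locally turns a typo such as timeout=0 into an immediate error instead of a connection that closes unexpectedly.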
The standalone /waves/api-reference/api-references/web-socket page (which previously held this info, with a stale 20-second default and a reference to the deprecated /waves/v1/lightning-v3.1/get_speech/stream URL) has been removed. A redirect from the old URL points at the new home.
Lightning v3.1 — auto language removed from docs and spec enums
The language: "auto" value is no longer documented or listed in the Lightning v3.1 or Lightning v3.1 Pro spec enums. Pass an explicit language code that matches the voice instead.
Why: code-switching guidance lives on the voice, not the request. Each voice in the catalog has a tags.language set returned by GET /waves/v1/lightning-v3.1/get_voices; pass a language the voice was trained on to get the pronunciation you expect. The auto value never actually drove language detection at the model level — it was a permissive enum value that resolved to the voice’s default behavior — so removing it from the contract is the honest move.
What changed:
- The language enum no longer lists auto.
- The language value changed from "auto" to "en" across the four specs (tts-openapi, lightning-v3.1-openapi, tts-ws, lightning-v3.1-ws).
- … English (en), Hindi (hi); the code-switching cell now points to tags.language rather than auto.
- … auto from their language parameter rows.
- … language) stands on its own.

Migration: if you were sending language: "auto", replace it with the language code that matches your voice (en, hi, ta, etc.; see tags.language on the voice via GET /waves/v1/lightning-v3.1/get_voices). Sending auto was not driving language detection in the first place, so switching to an explicit code makes the output predictable.
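The migration can be sketched as a small helper that swaps a legacy "auto" for a language the voice is actually tagged with. The record shape ({"voiceId": ..., "tags": {"language": [...]}}) is an assumption about the get_voices response, and the code assumes tags.language holds codes such as "en". If your catalog returns language names instead, map them to codes first. The helper name explicit_language is hypothetical.

```python
from typing import Any, Mapping


def explicit_language(requested: str, voice: Mapping[str, Any]) -> str:
    """Replace a legacy language="auto" with a code the voice is tagged with."""
    # Assumed record shape: {"voiceId": ..., "tags": {"language": ["en", "hi"]}}
    tagged = list(voice.get("tags", {}).get("language", []))
    if requested == "auto":
        if not tagged:
            raise ValueError("voice has no tags.language; choose a code by hand")
        # First tagged language; choose deliberately if the voice has several.
        return tagged[0]
    if tagged and requested not in tagged:
        raise ValueError(f"voice is not tagged with {requested!r}; tagged: {tagged}")
    return requested
```

Raising an error on a mismatch surfaces the "language the voice was never trained on" case during migration, rather than in production audio.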
Lightning v3.1 — per-word timestamps on WebSocket streaming
Lightning v3.1 now exposes per-word timing events to WebSocket clients. Opt in with one flag — useful for captioning UIs, karaoke-style word highlighting, avatar lip-sync, and word-level analytics.
Two changes to a WebSocket request: add word_timestamps: true and handle the new status: "word_timestamp" frame.
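A minimal sketch of the request side follows. Only word_timestamps is the documented opt-in flag. The text field name is an assumption for illustration (check the wire spec on the model card), and voice_id is the field named elsewhere in these notes.

```python
import json


def tts_ws_request(text: str, voice_id: str) -> str:
    """Serialize one WebSocket synthesis request with word timestamps enabled."""
    request = {
        "text": text,             # assumed field name for the input text
        "voice_id": voice_id,
        "word_timestamps": True,  # opt-in; omitting it behaves as false
    }
    return json.dumps(request)
```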
word is the exact substring from the input text — un-normalized. "$100" stays "$100", "25th" stays "25th", "3" stays "3". Non-Latin scripts come back verbatim (e.g., Devanagari for Hindi).
start and end are floats in seconds, relative to the start of the audio stream. word_timestamp frames are interleaved with chunk frames in audio-time order, and a single complete frame then terminates the session.
For unsupported voice families, the flag is accepted and audio works normally, but no word_timestamp frames are emitted. Detect this client-side by counting the word_timestamp frames received by the time complete arrives.
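The frame-handling rules above (interleaved chunk and word_timestamp frames, then one complete, and no word frames for unsupported voices) can be sketched as a pure function over decoded text frames. The status values come from these notes. The placement of word, start, and end under a "data" key is an assumption, with a fallback to top-level fields. Names such as consume_frames are hypothetical.

```python
import json
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple


@dataclass
class StreamResult:
    audio_chunks: int = 0
    words: List[Tuple[str, float, float]] = field(default_factory=list)
    complete: bool = False

    @property
    def has_word_timestamps(self) -> bool:
        # Unsupported voice families stream audio normally but emit no word frames,
        # so zero words after "complete" signals no timestamp support (for non-empty text).
        return self.complete and bool(self.words)


def consume_frames(raw_frames: Iterable[str]) -> StreamResult:
    """Sort decoded WebSocket text frames by their status field."""
    result = StreamResult()
    for raw in raw_frames:
        frame = json.loads(raw)
        status = frame.get("status")
        if status == "chunk":
            result.audio_chunks += 1  # decode and play the audio payload here
        elif status == "word_timestamp":
            data = frame.get("data", frame)  # assumed: fields nested under "data"
            result.words.append((data["word"], float(data["start"]), float(data["end"])))
        elif status == "complete":
            result.complete = True
            break
    return result
```

Because word values are un-normalized substrings of the input, a captioning UI can highlight them directly in the original text without reversing any normalization.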
word_timestamps defaults to false. Clients that don’t set the flag see no behavior change — same audio chunks, same completion frame, no new event type to handle. Purely opt-in.
Migration: none — pure addition. Existing integrations keep working untouched.
→ Word-level timestamps on the Lightning v3.1 model card — full wire spec, JS example, support matrix.
Lightning v2 and Lightning Large endpoints retired — 410 Gone with v3.1 migration pointer
The underlying inference pools for Lightning v2 and Lightning Large have been retired. Calls to these endpoints now return a fast 410 Gone with a migration pointer to Lightning v3.1.
Response shape on the deprecated endpoints:
Migration: replace any lightning-v2 or lightning-large calls with the equivalent lightning-v3.1 endpoint. Voice cloning with no model parameter now routes to v3.1 automatically — no client change needed for that case.
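One way to make a forgotten legacy call fail loudly is to translate the 410 into a dedicated exception. The sketch below uses only the standard library. The opener is injectable so the function can be exercised without network access. RetiredModelError and post_tts are hypothetical names, and the 410 response body is deliberately not parsed, because its exact shape is not reproduced here.

```python
import urllib.error
import urllib.request
from typing import Any, Callable


class RetiredModelError(RuntimeError):
    """Raised when a retired Lightning endpoint answers 410 Gone."""


def post_tts(req: urllib.request.Request,
             opener: Callable[..., Any] = urllib.request.urlopen) -> bytes:
    """Send a prepared request, turning 410 Gone into an actionable error."""
    try:
        with opener(req, timeout=30) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 410:
            raise RetiredModelError(
                f"{req.full_url} returned 410 Gone; move this call to lightning-v3.1"
            ) from err
        raise  # other HTTP errors propagate unchanged
```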
Not affected (so v3.1 voice cloning keeps working): lightning-v3.1/get_speech, voice-cloning clone-creation with v3.1, and the unified /tts and /tts/live routes.
Lightning v3.1 Pro — 35 new voices added to the catalog
35 new voices have been added to the Lightning v3.1 Pro voice catalog. They’re available immediately to any org with Lightning v3.1 Pro access.
What’s in this batch:
- … model parameter later.
- … model=lightning_v3.1_pro to see only the Pro catalog.

Migration: no action needed. This is an additive change, and existing voices are unchanged.
Lightning v3.1 Pro — premium voice catalog across American, British, and Indian accents
Lightning v3.1 now has a Pro tier with 39 curated voices across American, British, and Indian accents (both Male and Female). The Pro pool runs on dedicated inference capacity, delivering the same TTFB as standard Lightning v3.1.
What’s in the catalog:
Languages supported: Indian voices speak English and Hindi (with native code-switching when language="auto"). British and American voices speak English. See per-voice tags.language via GET /waves/v1/lightning-v3.1/get_voices.
How to use it:
- Call the unified POST /waves/v1/tts (sync), POST /waves/v1/tts/live (SSE), or WSS /waves/v1/tts/live (WebSocket) endpoint and pass "model": "lightning_v3.1_pro" in the request body alongside the chosen voice_id.
- The legacy /waves/v1/lightning-v3.1/* routes also accept the model field for backwards-compatible Pro opt-in.

Voice cloning: not available on Lightning v3.1 Pro. Voice clones continue to use Lightning v3.1 (standard) and the existing voice-cloning flow. No migration is required.
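A minimal sketch of a sync Pro request built with the standard library. The request is prepared but not sent. The "model" value and voice_id come from these notes. The text field name and the Bearer authorization scheme are assumptions. The helper name pro_tts_request is hypothetical.

```python
import json
import urllib.request


def pro_tts_request(text: str, voice_id: str, api_key: str) -> urllib.request.Request:
    """Prepare a POST /waves/v1/tts request that opts into Lightning v3.1 Pro."""
    body = {
        "text": text,                   # assumed field name for the input text
        "voice_id": voice_id,           # a voice from the Pro catalog
        "model": "lightning_v3.1_pro",  # routes the request to the Pro pool
    }
    return urllib.request.Request(
        "https://api.smallest.ai/waves/v1/tts",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it is then a matter of urllib.request.urlopen(req). Check the API reference for the response's audio format before writing the bytes to disk.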
For the full catalog, integration examples, and a Python WebSocket sample, see the Lightning v3.1 Pro model card.
Indic voices now produce clean audio regardless of the language field
Lightning v3.1 used to pick an inference pool from the request’s language field, which meant Indic voices (Aadya, Yuvan, Samarth, Nilesh, Arnab, Niharika, Gargi, and other voices whose latents are trained on north_indic / south_indic encoders) could be served from the wrong pool when called with language=en or language=hi — producing distorted or unintelligible audio.
Routing now derives from the voice itself, not the request language:
- odia / bengali / punjabi / gujarati / marathi route to the north_indic inference pool.
- kannada / malayalam / telugu / tamil route to the south_indic pool.

No code change is needed. If you previously worked around this by hard-coding language to match the voice family, you can remove that workaround. The platform now picks the correct pool automatically.
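To make the new rule concrete, here is an illustrative model of the routing described above. It is not the platform's routing code. It assumes the voice's language tags are spelled as in the list above. The request's language field is deliberately absent from the signature, which is the point of the fix. The helper name pool_for_voice is hypothetical.

```python
from typing import Iterable, Optional

NORTH_INDIC = {"odia", "bengali", "punjabi", "gujarati", "marathi"}
SOUTH_INDIC = {"kannada", "malayalam", "telugu", "tamil"}


def pool_for_voice(voice_languages: Iterable[str]) -> Optional[str]:
    """Illustrative only: choose an Indic pool from the voice's own language tags."""
    langs = {lang.lower() for lang in voice_languages}
    if langs & NORTH_INDIC:
        return "north_indic"
    if langs & SOUTH_INDIC:
        return "south_indic"
    return None  # not an Indic-family voice; default routing applies
```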
Voice clones are unaffected — clones bypass this lookup since they aren’t in the public voice catalog.
Lightning v3.1 — language list corrected to 12 (voice catalog source of truth)
Correction. Earlier in the week we expanded the Lightning v3.1 documented language list to 22 codes plus auto, sourced from the server-side lightningV3_1Schema enum in waves-platform. Live testing showed those 22 codes are accepted by the schema but only 12 of them have voices in the catalog — the other 10 (de, fr, it, pl, nl, ru, sv, pt, ar, he) silently fall back to the voice’s default language when called. We were lying to users.
Source of truth is now the voice catalog, not the schema enum. The actually-supported set is:
Plus auto for automatic language detection and code-switching across the above set. 217 voices total.
What changed:
- The language enum is narrowed to 12 codes + auto.
- Removed the Beta rows for German, French, Italian, Polish, Dutch, Russian, Swedish, Portuguese, Arabic, and Hebrew.
- Pulse STT (a separate model) is unaffected. Pulse genuinely supports its full European + Indic + Asian language set via the multi-eu, multi-indic, and multi-asian regional aggregators.
Reproducible verification. A live probe (scripts/spec-live-tests/spec_enum_vs_voice_catalog.py) now compares the spec’s language enum against the live GET /lightning-v3.1/get_voices response and fails CI if they drift. This catches the schema-vs-reality gap on every spec PR going forward.
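At its core, a check of this kind is a two-way set comparison. The sketch below is not the contents of spec_enum_vs_voice_catalog.py. It assumes each voice carries tags.language as a list of codes, and it treats auto as a mode rather than a voice language. The helper name enum_drift is hypothetical.

```python
from typing import Any, Iterable, Mapping, Set, Tuple


def enum_drift(
    spec_enum: Iterable[str],
    voices: Iterable[Mapping[str, Any]],
    ignore: Iterable[str] = ("auto",),
) -> Tuple[Set[str], Set[str]]:
    """Compare a spec's language enum with the languages the voice catalog covers.

    Returns (advertised_but_unvoiced, voiced_but_unadvertised).
    """
    advertised = set(spec_enum) - set(ignore)  # "auto" is a mode, not a voice language
    catalog: Set[str] = set()
    for voice in voices:
        # Assumed shape: each voice carries tags.language as a list of codes.
        catalog.update(voice.get("tags", {}).get("language", []))
    return advertised - catalog, catalog - advertised
```

A CI job would fail when either returned set is non-empty. The first set is exactly the "accepted by the schema, but no voice speaks it" gap described above.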
Legacy Lightning STT/TTS API reference orphans removed from docs
Several legacy API reference MDX files that had already been unlinked from the v4 API reference navigation have been removed from the docs.
STT (Speech-to-Text):
- Lightning (Pre-Recorded) HTTP reference (POST /waves/v1/lightning/get_text) and its OpenAPI spec (fern/apis/waves/openapi/asr-openapi.yaml). The current STT pre-recorded surface is Pulse (POST /waves/v1/pulse/get_text); see the Pulse pre-recorded reference. The legacy MDX page was a verbatim copy of the Pulse one with the URL substituted, so no functionality is lost.
- Lightning ASR WebSocket reference and its AsyncAPI spec.

TTS (Text-to-Speech):
- lightning-large HTTP TTS, SSE, and WebSocket reference pages (lightning-large.mdx, lightning-large-stream.mdx, lightning-large-ws.mdx). These were already unlinked from the v4 nav. The current TTS surface is lightning-v3.1, and the migration prose under Voice Cloning already documents the cutover.

Voice cloning impact: none. The current voice-cloning flow (POST /waves/v1/voice-cloning) is unchanged. The deprecated lightning-large endpoints that are still in use (add_voice, get_cloned_voices, DELETE /waves/v1/lightning-large) remain in the API reference under Voice Cloning with their existing (Deprecated) labels.
If your code calls https://api.smallest.ai/waves/v1/lightning/get_text for STT or https://api.smallest.ai/waves/v1/lightning-large/get_speech for TTS, switch to Pulse and Lightning v3.1 respectively.
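If legacy URLs are scattered across configs, you can apply the two substitutions above mechanically. This is a minimal sketch that maps only the URLs named in this entry and preserves any query string. Request bodies may still need changes (for example, v2-only parameters), so confirm them against the current reference. The helper name current_endpoint is hypothetical.

```python
LEGACY_TO_CURRENT = {
    "https://api.smallest.ai/waves/v1/lightning/get_text":
        "https://api.smallest.ai/waves/v1/pulse/get_text",
    "https://api.smallest.ai/waves/v1/lightning-large/get_speech":
        "https://api.smallest.ai/waves/v1/lightning-v3.1/get_speech",
}


def current_endpoint(url: str) -> str:
    """Swap a known legacy endpoint for its replacement, keeping any query string."""
    base, sep, query = url.partition("?")
    replacement = LEGACY_TO_CURRENT.get(base.rstrip("/"), base)
    return replacement + sep + query
```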
Lightning v2 is a legacy model. New integrations should use Lightning v3.1. The Lightning v2 endpoints remain available for existing callers but are not recommended for new work, and the docs have been updated to reflect that:
- The three Lightning v2 API reference pages (POST /waves/v1/lightning-v2/get_speech, POST /waves/v1/lightning-v2/stream, WSS /waves/v1/lightning-v2/get_speech/stream) now carry a (Deprecated) suffix in their nav titles.
- lightning-v2.mdx, lightning-v2-stream.mdx, and lightning-v2-ws.mdx (and their versions/v4.0.0 mirrors) now lead with a yellow Deprecated badge.
- TTS overview (text-to-speech/overview.mdx): the “Available Models” CardGroup is now Lightning v3.1 only, with a deprecation notice for v2. The “Supported Languages” comparison table is now v3.1-only.
- Models page (getting-started/models.mdx): the Lightning v2 card has been removed from the TTS section, and the model overview table has been reduced to Lightning v3.1.
- Integration guides no longer use lightning-v2 as a fallback or list it alongside lightning-v3.1. The LiveKit page also drops the consistency, similarity, and enhancement parameter rows that were v2-only.

Unchanged (intentional):
- The voice-cloning API guide (voice-cloning/how-to-vc-api.mdx) still references lightning-v2 in its deprecation-error rows. That reflects actual API behavior callers will see if they pass model=lightning-v2 to the cloning endpoint, and it is useful for the migration audience.
- The on-prem docs still reference the lightning-v2 container. That is the on-prem service name, which is separate from public API guidance.

If you're calling any lightning-v2 endpoint, plan a migration to lightning-v3.1. The voice catalog is different, so use GET /waves/v1/{model}/get_voices to enumerate the voices available for each model.
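Because the catalogs differ, a useful pre-migration step is to list which voice IDs you use today that are absent from the lightning-v3.1 catalog (for example, from GET /waves/v1/lightning-v3.1/get_voices). This is a minimal sketch. Fetching the catalog is omitted. The "voiceId" key is an assumption about the response shape, and the helper name voices_to_replace is hypothetical.

```python
from typing import Any, Iterable, List, Mapping


def voices_to_replace(target_catalog: Iterable[Mapping[str, Any]],
                      in_use: Iterable[str]) -> List[str]:
    """Return the voice IDs you use today that the target model's catalog lacks."""
    # Assumed shape: each catalog entry exposes its ID under "voiceId".
    available = {voice.get("voiceId") for voice in target_catalog}
    return sorted(set(in_use) - available)
```

Each ID this returns needs a deliberate replacement voice before you switch the endpoint.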