Lightning v3.1 — per-word timestamps on WebSocket streaming

Lightning v3.1 now exposes per-word timing events to WebSocket clients. The feature is opt-in via a single request flag and is useful for captioning UIs, karaoke-style word highlighting, avatar lip-sync, and word-level analytics.

What changed

A WebSocket client needs two changes: add word_timestamps: true to the request payload, and handle the new status: "word_timestamp" frame in the message handler.

ws.send(JSON.stringify({
  text: "I bought 3 cats for $100 on Dec 25th",
  voice_id: "meher",
  model: "lightning_v3.1_pro",
  sample_rate: 44100,
  output_format: "pcm",
  word_timestamps: true,        // ← ADDED
}));

ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  switch (msg.status) {
    case "chunk":
      // Audio arrives base64-encoded; decode before handing it to the player.
      audioPlayer.push(Buffer.from(msg.data.audio, 'base64'));
      break;
    case "word_timestamp": {     // ← NEW CASE (braces scope the const declaration)
      const { id, word, start, end } = msg.data;
      captionTrack.push({ id, word, startSec: start, endSec: end });
      break;
    }
    case "complete":
      audioPlayer.end();
      break;
  }
};

word is the exact substring from the input text — un-normalized. "$100" stays "$100", "25th" stays "25th", "3" stays "3". Non-Latin scripts come back verbatim (e.g., Devanagari for Hindi).
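Because each word is an exact substring of the input, a client can map word events back to character ranges in the original text, for example to highlight the spoken word in place. The following is a minimal sketch, not part of the API: locateWords is a hypothetical helper, it assumes words arrive in input order, and the word list in the demo is illustrative (the actual segmentation comes from the server).

```javascript
// Map each reported word back to its character range in the original input.
// Assumption: words arrive in input order and each is an exact substring.
function locateWords(text, words) {
  const spans = [];
  let cursor = 0; // search resumes after the previous match, so repeats resolve in order
  for (const word of words) {
    const start = text.indexOf(word, cursor);
    if (start === -1) {
      // Not found after the cursor: record the miss rather than guessing.
      spans.push({ word, start: -1, end: -1 });
      continue;
    }
    const end = start + word.length;
    spans.push({ word, start, end });
    cursor = end;
  }
  return spans;
}

const text = "I bought 3 cats for $100 on Dec 25th";
// Hypothetical word values standing in for received word_timestamp frames.
const spans = locateWords(text, ["I", "bought", "3", "cats", "for", "$100", "on", "Dec", "25th"]);
console.log(spans[5]); // { word: '$100', start: 20, end: 24 }
```

Searching forward from a moving cursor, rather than from index 0 each time, keeps repeated words (such as two occurrences of "the") attached to the correct position.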

start and end are floats in seconds, relative to the start of the audio stream. word_timestamp frames interleave with chunk frames in audio-time order; a single complete frame then terminates the session.
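Since the timings are relative to the start of the audio stream, a player can look up the word being spoken at any playback position. The sketch below assumes entries shaped like the captionTrack items in the example above (startSec, endSec), sorted by start time as the audio-time ordering suggests. activeWordAt is a hypothetical helper, and the timings and id values in the demo are made up for illustration.

```javascript
// Return the caption entry being spoken at playback time `t` (seconds), or null.
// Assumes `track` is sorted by startSec and entries do not overlap.
function activeWordAt(track, t) {
  let lo = 0;
  let hi = track.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const entry = track[mid];
    if (t < entry.startSec) hi = mid - 1;
    else if (t >= entry.endSec) lo = mid + 1;
    else return entry;
  }
  return null; // gap between words, or before/after speech
}

// Illustrative timings, not real server output.
const track = [
  { id: 0, word: "I", startSec: 0.0, endSec: 0.12 },
  { id: 1, word: "bought", startSec: 0.15, endSec: 0.48 },
  { id: 2, word: "3", startSec: 0.5, endSec: 0.8 },
];
console.log(activeWordAt(track, 0.3).word); // bought
console.log(activeWordAt(track, 0.13));     // null
```

Binary search keeps the lookup cheap enough to call on every animation frame, even for long passages.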

Where it works

Surface	Word timestamps
`WSS /waves/v1/tts/live` (unified)	✅
`WSS /waves/v1/lightning-v3.1/get_speech/stream` (legacy, retiring 2026-07-14)	✅
`POST /waves/v1/tts` (sync HTTP)	❌ — flag accepted, silently ignored
`POST /waves/v1/tts/live` (HTTP SSE)	❌ — same

Voice + language support

Language	Voice family	Word events
English (`en`)	Base-queue voices — `meher`, `devansh`, `kartik`, `maithili`, `liam`, `avery`	✅
Hindi (`hi`)	Base-queue voices (same list)	✅
Marathi / Bengali / Gujarati / Punjabi / Odia	north-Indic family	❌
Tamil / Telugu / Kannada / Malayalam	south-Indic family	❌

For unsupported voice families the flag is accepted — audio works normally, but no word_timestamp frames are emitted. Detect this client-side by counting received word events after complete arrives.
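The client-side check described above can be sketched as a small per-session counter. createWordEventTracker and its onResult callback are hypothetical names, not part of the API. The sketch only assumes the status values shown earlier ("chunk", "word_timestamp", "complete").

```javascript
// Count word_timestamp frames in a session and report once "complete" arrives.
// `onResult` is a hypothetical callback supplied by the caller.
function createWordEventTracker(onResult) {
  let wordEvents = 0;
  return function handleFrame(msg) {
    if (msg.status === "word_timestamp") {
      wordEvents += 1;
    } else if (msg.status === "complete") {
      // Zero word events after completion suggests the voice family is unsupported.
      onResult({ wordEvents, supported: wordEvents > 0 });
      wordEvents = 0; // reset in case the handler is reused for another request
    }
  };
}

// Simulated frames standing in for parsed WebSocket messages.
const results = [];
const handleFrame = createWordEventTracker((r) => results.push(r));
[{ status: "chunk" }, { status: "word_timestamp" }, { status: "complete" }].forEach(handleFrame);
[{ status: "chunk" }, { status: "complete" }].forEach(handleFrame);
console.log(results.map((r) => r.supported)); // [ true, false ]
```

In a real client, handleFrame would be called with the parsed message inside ws.onmessage. A zero count is only a heuristic: empty or whitespace-only input could presumably also produce no word events.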

Backward compatibility

word_timestamps defaults to false. Clients that don’t set the flag see no behavior change — same audio chunks, same completion frame, no new event type to handle. Purely opt-in.

Migration: none — pure addition. Existing integrations keep working untouched.

→ Word-level timestamps on the Lightning v3.1 model card — full wire spec, JS example, support matrix.