Lightning v3.1 — per-word timestamps on WebSocket streaming

Lightning v3.1 now exposes per-word timing events to WebSocket clients. Opt in with one flag — useful for captioning UIs, karaoke-style word highlighting, avatar lip-sync, and word-level analytics.

What changed

Two client-side changes: add word_timestamps: true to the WebSocket request payload, and handle the new frame whose status is "word_timestamp".

ws.send(JSON.stringify({
  text: "I bought 3 cats for $100 on Dec 25th",
  voice_id: "meher",
  model: "lightning_v3.1_pro",
  sample_rate: 44100,
  output_format: "pcm",
  word_timestamps: true, // ← ADDED
}));

ws.onmessage = (event) => {
  const msg = JSON.parse(event.data);
  switch (msg.status) {
    case "chunk":
      // Audio arrives base64-encoded; decode it before handing it to the player.
      audioPlayer.push(Buffer.from(msg.data.audio, "base64"));
      break;
    case "word_timestamp": { // ← NEW CASE (braces scope the const to this case)
      const { id, word, start, end } = msg.data;
      captionTrack.push({ id, word, startSec: start, endSec: end });
      break;
    }
    case "complete":
      audioPlayer.end();
      break;
  }
};

word is the exact substring from the input text — un-normalized. "$100" stays "$100", "25th" stays "25th", "3" stays "3". Non-Latin scripts come back verbatim (e.g., Devanagari for Hindi).
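Because each word is an exact substring of the input, a client can map word events back to character ranges in the original text, for example to highlight the source string in a caption UI. The sketch below shows one possible way to do that. locateWords is a hypothetical helper, not part of the API, and it assumes that word events arrive in the same order as the words appear in the text and that the input is split into whitespace-separated words.

```javascript
// Hypothetical helper (not part of the Lightning API): map each reported word
// to a character range in the original input text.
// Assumption: word events arrive in the same order as the words appear in the text.
function locateWords(text, words) {
  const ranges = [];
  let cursor = 0; // resume after the previous match so repeated words resolve in order
  for (const word of words) {
    const from = text.indexOf(word, cursor);
    if (from === -1) {
      // Not found after the cursor: record a miss instead of guessing a position.
      ranges.push({ word, from: -1, to: -1 });
      continue;
    }
    const to = from + word.length;
    ranges.push({ word, from, to });
    cursor = to;
  }
  return ranges;
}

const text = "I bought 3 cats for $100 on Dec 25th";
const ranges = locateWords(text, ["I", "bought", "3", "cats", "for", "$100", "on", "Dec", "25th"]);
console.log(ranges[5]); // { word: '$100', from: 20, to: 24 }
```

Searching from a moving cursor rather than from the start of the string is what keeps a repeated word from resolving to its first occurrence every time.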

start and end are floats in seconds, relative to the start of the audio stream. word_timestamp frames are interleaved with chunk frames in audio-time order, and a single complete frame then terminates the session.
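Since the timestamps are relative to the start of the audio stream, a captioning or karaoke UI can look up the word to highlight from the player's current position. The sketch below is one way to do that lookup, not a prescribed method. activeWordAt is a hypothetical helper, the track entries use the startSec/endSec shape from the handler above, and the timings are made-up illustrative values. It assumes the track is sorted by start time (which audio-time ordering would give) and that words do not overlap.

```javascript
// Hypothetical helper: find the word being spoken at playback time tSec.
// `track` holds { word, startSec, endSec } entries sorted by startSec.
function activeWordAt(track, tSec) {
  let lo = 0;
  let hi = track.length - 1;
  let candidate = -1;
  // Binary search for the last entry whose startSec <= tSec.
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    if (track[mid].startSec <= tSec) {
      candidate = mid;
      lo = mid + 1;
    } else {
      hi = mid - 1;
    }
  }
  if (candidate === -1) return null;
  // tSec may fall in a pause after the candidate word has already ended.
  return tSec < track[candidate].endSec ? track[candidate] : null;
}

// Illustrative timings only, not real API output.
const track = [
  { word: "I", startSec: 0.0, endSec: 0.12 },
  { word: "bought", startSec: 0.15, endSec: 0.48 },
  { word: "3", startSec: 0.5, endSec: 0.8 },
];
console.log(activeWordAt(track, 0.2)?.word); // bought
console.log(activeWordAt(track, 0.49));      // null (pause between words)
```

The end bound is treated as exclusive here, so a position exactly equal to a word's end time counts as silence; a UI that prefers to hold the highlight through pauses could return the candidate unconditionally instead.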

Where it works

Surface | Word timestamps
WSS /waves/v1/tts/live (unified) | ✅
WSS /waves/v1/lightning-v3.1/get_speech/stream (legacy, retiring 2026-07-14) | ✅
POST /waves/v1/tts (sync HTTP) | ❌ (flag accepted, silently ignored)
POST /waves/v1/tts/live (HTTP SSE) | ❌ (same)

Voice + language support

Language | Voice family | Word events
English (en) | Base-queue voices: meher, devansh, kartik, maithili, liam, avery |
Hindi (hi) | Base-queue voices (same list) |
Marathi / Bengali / Gujarati / Punjabi / Odia | north-Indic family |
Tamil / Telugu / Kannada / Malayalam | south-Indic family |

For unsupported voice families the flag is accepted — audio works normally, but no word_timestamp frames are emitted. Detect this client-side by counting received word events after complete arrives.
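The client-side check described above can be sketched as a small per-session tracker. createWordEventTracker and the summary field names are hypothetical, not part of the API; the only wire details assumed are the status values "word_timestamp" and "complete" used elsewhere in this note.

```javascript
// Hypothetical tracker: counts word_timestamp frames for one session and
// reports, once "complete" arrives, whether any were received.
function createWordEventTracker() {
  let wordEvents = 0;
  let done = false;
  return {
    // Feed every parsed frame through this; returns a summary on "complete", else null.
    handle(msg) {
      if (msg.status === "word_timestamp") wordEvents += 1;
      if (msg.status === "complete") {
        done = true;
        return { wordEvents, timestampsAvailable: wordEvents > 0 };
      }
      return null;
    },
    isDone: () => done,
  };
}

const tracker = createWordEventTracker();
tracker.handle({ status: "chunk", data: { audio: "" } });
const summary = tracker.handle({ status: "complete" });
console.log(summary); // { wordEvents: 0, timestampsAvailable: false }
```

A zero count only suggests an unsupported voice family when the request actually set word_timestamps: true and sent non-empty text, so a caller would likely want to check both before falling back to captions without word-level timing.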

Backward compatibility

word_timestamps defaults to false. Clients that don’t set the flag see no behavior change — same audio chunks, same completion frame, no new event type to handle. Purely opt-in.

Migration: none — pure addition. Existing integrations keep working untouched.

See "Word-level timestamps" on the Lightning v3.1 model card for the full wire spec, a JS example, and the support matrix.