Lightning v3.1 WebSocket

Synthesize speech over a persistent WebSocket. Audio chunks come back as text arrives, which makes this the right endpoint when the *text itself* is streaming, such as LLM token output.

## When to use this

- **Use this** when text is generated incrementally and you want audio to start playing as soon as the first words are produced: LLM streaming, live captioning, voice agents.
- **Use SSE streaming** when you have the full text up front but want low-latency playback.
- **Use sync `/get_speech`** when latency isn't critical and a single buffer is easier to handle.

## How it works

1. Open a WebSocket to `wss://api.smallest.ai/waves/v1/lightning-v3.1/get_speech/stream` with the header `Authorization: Bearer <key>`.
2. Send one JSON message per text chunk: `{ "voice_id": "magnus", "text": "...", "continue": true, "flush": false, ... }`. Set `continue: true` to keep the session open between sends.
3. The server pushes back JSON messages containing `data.audio` (base64-encoded PCM) and a `status` field (`chunk` or `complete`).
4. When you're done sending text, send one final message with `flush: true` and an empty `text`. The server finishes synthesizing the buffered text and then sends `status: "complete"`.

## Concurrency and rate limits

- 1 concurrency unit = 1 active TTS request at a time.
- You can open up to 5 WebSocket connections per concurrency unit.
- Requests beyond your concurrency limit are rejected with an error, so queue them on your side.

Example: with 3 concurrency units you can hold up to 15 open sockets, but only 3 synthesis requests can be active at once.

## Examples

**Python** (using `smallestai>=4.4.0` and the 4.3.1 compatibility shim):

```python
from smallestai.waves import WavesStreamingTTS, TTSConfig

config = TTSConfig(voice_id="magnus", api_key="YOUR_API_KEY", sample_rate=24000)
tts = WavesStreamingTTS(config)

def text_chunks():
    # Pretend this is your LLM streaming tokens.
    for word in ["Hello,", " I am", " streaming", " speech."]:
        yield word

# Write raw PCM chunks to disk as they arrive.
with open("speech.pcm", "wb") as out:
    for audio_chunk in tts.synthesize_streaming(text_chunks(), continue_stream=True, auto_flush=True):
        out.write(audio_chunk)
```

For new code, you can also use the namespaced Fern client, `client.waves.lightning_v31tts.connect(...)`, which returns a typed socket that you drive yourself.

**JavaScript / TypeScript** (using `ws`):

```typescript
import WebSocket from "ws";

const ws = new WebSocket("wss://api.smallest.ai/waves/v1/lightning-v3.1/get_speech/stream", {
  headers: { Authorization: `Bearer ${process.env.SMALLEST_API_KEY}` },
});

ws.on("open", () => {
  // One message per text chunk; `continue: true` keeps the session open.
  for (const text of ["Hello,", " I am", " streaming", " speech."]) {
    ws.send(JSON.stringify({
      voice_id: "magnus",
      text,
      sample_rate: 24000,
      continue: true,
      flush: false,
    }));
  }
  // Final flush: empty text tells the server to finish the buffer.
  ws.send(JSON.stringify({ voice_id: "magnus", text: "", flush: true }));
});

ws.on("message", (raw) => {
  const msg = JSON.parse(raw.toString());
  if (msg.status === "complete") return ws.close();
  // Each chunk carries base64-encoded PCM in data.audio.
  const pcm = Buffer.from(msg.data.audio, "base64");
  // … hand pcm to your audio pipeline
});

// Without an "error" listener, a connection failure throws and crashes the process.
ws.on("error", (err) => console.error("WebSocket error:", err));
```

## Common gotchas

- **`continue: true` keeps the session alive.** Without it, the server closes after the first chunk. Send `flush: true` with empty `text` when you're done.
- **44.1 kHz is supported, but most pipelines want 24 kHz.** Set `sample_rate` to match your downstream pipeline to avoid resampling.
- **Backpressure**: if you push text faster than the server can synthesize, audio chunks queue server-side. Watch for `complete_backoff_ms` in responses.
- **One concurrency unit = one in-flight synth.** Holding 5 sockets open doesn't give you 5× throughput unless you upgrade your concurrency.
- **JavaScript / TypeScript**: the official `smallestai` npm package predates Lightning v3.1, so connect with the `ws` library directly, as shown above.
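
Because requests over your concurrency limit are rejected rather than queued, a small client-side limiter keeps you inside the limits described above. This is a minimal sketch, assuming each synthesis request is wrapped in an async function; `SynthLimiter` and `maxOpenSockets` are hypothetical names, not part of the `smallestai` SDK. The 5-sockets-per-unit factor comes from the concurrency rules on this page.

```typescript
// Hypothetical client-side limiter (not part of the smallestai SDK).
// Per the limits on this page: each concurrency unit allows up to 5 open
// sockets but only 1 in-flight synthesis request.
const SOCKETS_PER_UNIT = 5;

function maxOpenSockets(concurrencyUnits: number): number {
  return concurrencyUnits * SOCKETS_PER_UNIT;
}

class SynthLimiter {
  private active = 0;
  private readonly waiting: Array<() => void> = [];
  private readonly limit: number;

  constructor(limit: number) {
    this.limit = limit;
  }

  // Number of requests currently holding a slot.
  get inFlight(): number {
    return this.active;
  }

  // Runs `task` once a slot is free. Excess requests wait in a FIFO queue on
  // the client instead of being sent and rejected by the server.
  async run<T>(task: () => Promise<T>): Promise<T> {
    if (this.active < this.limit) {
      this.active++;
    } else {
      // Wait until a finishing task hands its slot over to us.
      await new Promise<void>((resolve) => {
        this.waiting.push(() => resolve());
      });
    }
    try {
      return await task();
    } finally {
      const next = this.waiting.shift();
      if (next) {
        next(); // transfer the slot directly; `active` is unchanged
      } else {
        this.active--;
      }
    }
  }
}
```

For a 3-unit plan, `maxOpenSockets(3)` gives 15, and `new SynthLimiter(3)` lets at most 3 requests run at once. Handing a freed slot straight to the next waiter, instead of decrementing the counter and letting waiters race, keeps the in-flight count from ever exceeding the limit.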

## Handshake

**WSS** `wss://api.smallest.ai/waves/v1/lightning-v3.1/get_speech/stream`

## Authentication

`Authorization` (Bearer): header authentication of the form `Bearer <token>`.

## Headers

- `Authorization` (string, required): Bearer token for authentication. Format: `Bearer YOUR_API_KEY`.
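
The handshake needs only the endpoint URL and the `Authorization` header described above. A minimal sketch of a helper that builds both and fails fast when the key is missing; `buildHandshake` is a hypothetical name, and reading the key from `SMALLEST_API_KEY` mirrors the JavaScript example rather than any documented requirement.

```typescript
// Endpoint from the Handshake section above.
const LIGHTNING_V31_URL =
  "wss://api.smallest.ai/waves/v1/lightning-v3.1/get_speech/stream";

interface HandshakeOptions {
  url: string;
  headers: { Authorization: string };
}

// Hypothetical helper (not part of any SDK): builds the URL and the
// `Authorization: Bearer <token>` header the handshake requires.
function buildHandshake(apiKey: string | undefined): HandshakeOptions {
  // Trim stray whitespace, e.g. a trailing newline from a secrets file.
  const key = apiKey?.trim();
  // Fail before dialing rather than sending "Bearer undefined".
  if (!key) {
    throw new Error("Missing API key: set SMALLEST_API_KEY");
  }
  return {
    url: LIGHTNING_V31_URL,
    headers: { Authorization: `Bearer ${key}` },
  };
}
```

With the `ws` package, `const { url, headers } = buildHandshake(process.env.SMALLEST_API_KEY);` followed by `new WebSocket(url, { headers })` opens the same connection as the JavaScript example.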

## Send

- `LightningV31TtsRequest` (object, required): send a JSON message with `voice_id`, `text`, and optional parameters to generate speech audio.
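
The request sequence from "How it works" (one `continue: true` message per text chunk, then an empty-text `flush: true` message) can be captured in a typed builder. A minimal sketch: the interface lists only the fields used in this page's examples, so the real `LightningV31TtsRequest` schema may define more, and `buildMessages` is a hypothetical helper. Skipping empty pieces is a precaution of this sketch, not documented behavior.

```typescript
// Fields taken from the examples on this page; the full
// LightningV31TtsRequest schema may define further optional parameters.
interface LightningV31TtsRequest {
  voice_id: string;
  text: string;
  sample_rate?: number;
  continue?: boolean;
  flush?: boolean;
}

// Hypothetical helper: turns text pieces (e.g. LLM tokens) into the message
// sequence from "How it works": one `continue: true` message per piece,
// then a final `flush: true` message with empty text.
function buildMessages(
  voiceId: string,
  pieces: readonly string[],
  sampleRate = 24000,
): string[] {
  const messages: string[] = [];
  for (const text of pieces) {
    // Skip empty pieces as a precaution, so only the final message has empty text.
    if (text === "") continue;
    const msg: LightningV31TtsRequest = {
      voice_id: voiceId,
      text,
      sample_rate: sampleRate,
      continue: true,
      flush: false,
    };
    messages.push(JSON.stringify(msg));
  }
  const final: LightningV31TtsRequest = { voice_id: voiceId, text: "", flush: true };
  messages.push(JSON.stringify(final));
  return messages;
}
```

Each returned string can be passed to `ws.send(...)` in order. For a live LLM stream you would send each message as its token arrives instead of building the array first; the array form just makes the sequence easy to inspect.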

## Receive

- `LightningV31TtsResponse` (object, required): receive audio data chunks and completion status from the server.
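
On the receiving side, each message's base64 `data.audio` needs decoding, and the stream ends at `status: "complete"`. A minimal, runtime-agnostic sketch, assuming messages arrive as JSON strings (with `ws`, call `raw.toString()` first); `AudioCollector`, `decodeBase64`, and `pcmDurationMs` are hypothetical names. The duration helper assumes 16-bit mono PCM, which this page does not specify.

```typescript
// Shape inferred from this page; the full LightningV31TtsResponse schema
// may carry more fields.
interface LightningV31TtsResponse {
  status: string; // "chunk" or "complete"
  data?: { audio?: string }; // base64-encoded PCM
}

// Decode base64 with the global atob (browsers, Node 16+, Deno, Bun).
function decodeBase64(b64: string): Uint8Array {
  const bin = atob(b64);
  const bytes = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; i++) bytes[i] = bin.charCodeAt(i);
  return bytes;
}

// Hypothetical collector: feed it every raw message; `handle` returns true
// once the server reports status "complete".
class AudioCollector {
  private readonly chunks: Uint8Array[] = [];
  done = false;

  handle(raw: string): boolean {
    const msg = JSON.parse(raw) as LightningV31TtsResponse;
    const audio = msg.data?.audio;
    if (audio) this.chunks.push(decodeBase64(audio));
    if (msg.status === "complete") this.done = true;
    return this.done;
  }

  // Concatenate every chunk received so far into one PCM buffer.
  pcm(): Uint8Array {
    const total = this.chunks.reduce((n, c) => n + c.length, 0);
    const out = new Uint8Array(total);
    let offset = 0;
    for (const c of this.chunks) {
      out.set(c, offset);
      offset += c.length;
    }
    return out;
  }
}

// Playback length in ms, ASSUMING 16-bit mono samples (2 bytes per sample);
// this page does not state the PCM encoding, so confirm it first.
function pcmDurationMs(byteLength: number, sampleRate: number): number {
  return (byteLength / 2 / sampleRate) * 1000;
}
```

In the `ws` example, the `message` handler body could become `if (collector.handle(raw.toString())) ws.close();`. Buffering every chunk suits short clips; for real-time playback, hand each decoded chunk to your audio pipeline as it arrives instead.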