# Lightning v3.1 WebSocket
Synthesize speech over a persistent WebSocket. Audio chunks come back as text arrives, which makes this the right endpoint when the *text itself* is streaming, such as LLM token output.
## When to use this
- **Use this** when text is generated incrementally and you want audio to start playing as soon as the first words are produced — LLM streaming, live captioning, voice agents.
- **Use SSE streaming** when you have the full text up front but want low-latency playback.
- **Use sync `/get_speech`** when latency isn't critical and a single buffer is easier to handle.
## How it works
1. Open a WebSocket to `wss://api.smallest.ai/waves/v1/lightning-v3.1/get_speech/stream` with `Authorization: Bearer <key>`.
2. Send a JSON message per text chunk: `{ "voice_id": "magnus", "text": "...", "continue": true, "flush": false, ... }`. Set `continue: true` to keep the session open between sends.
3. The server pushes back JSON messages with `data.audio` (base64 PCM) and a `status` field (`chunk`, `complete`).
4. When you're done sending text, push one final message with `flush: true` and an empty `text` — the server finishes the buffer and sends `status: "complete"`.
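
The send side of the steps above can be sketched as a small generator that turns text chunks into the JSON messages from steps 2 and 4. This is a minimal sketch, not the SDK's implementation: `build_messages` is a hypothetical helper, and it uses only the fields named on this page (`voice_id`, `text`, `continue`, `flush`).

```python
import json
from typing import Iterable, Iterator


def build_messages(chunks: Iterable[str], voice_id: str = "magnus") -> Iterator[str]:
    """Yield one JSON message per text chunk, then a final flush message.

    Hypothetical helper: field names follow the steps above.
    """
    for text in chunks:
        # continue=True keeps the session open between sends.
        yield json.dumps(
            {"voice_id": voice_id, "text": text, "continue": True, "flush": False}
        )
    # Final message: empty text with flush=True asks the server to finish the buffer.
    yield json.dumps({"voice_id": voice_id, "text": "", "flush": True})
```

Each yielded string can be passed straight to whatever WebSocket client you use, for example `ws.send(message)`.
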
## Concurrency and rate limits
- 1 concurrency unit = 1 active TTS request at a time.
- You can open up to 5 WebSocket connections per concurrency unit.
- Requests beyond your concurrency limit are rejected with an error, so queue excess requests on the client side.

Example: with 3 concurrency units you can hold up to 15 open sockets, but only 3 synthesis requests can be active at once.
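
One way to queue on your side is to gate synthesis requests behind a semaphore sized to your concurrency units. The sketch below is an assumption about how a client might do this, not part of the API: `max_open_sockets` and `run_with_limit` are hypothetical names, and each job stands in for one synthesis request.

```python
import asyncio


def max_open_sockets(concurrency_units: int, sockets_per_unit: int = 5) -> int:
    """Socket budget implied by the limits above (5 sockets per unit)."""
    return concurrency_units * sockets_per_unit


async def run_with_limit(jobs, concurrency_units: int):
    """Run zero-argument coroutine functions with at most
    `concurrency_units` of them in flight at any moment."""
    semaphore = asyncio.Semaphore(concurrency_units)

    async def guarded(job):
        # Extra jobs wait here instead of being rejected by the server.
        async with semaphore:
            return await job()

    return await asyncio.gather(*(guarded(job) for job in jobs))
```

Results come back in the same order as `jobs`, because `asyncio.gather` preserves input order.
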
## Examples
**Python** (using `smallestai>=4.4.0`, through its 4.3.1 compatibility shim):
```python
from smallestai.waves import WavesStreamingTTS, TTSConfig

config = TTSConfig(voice_id="magnus", api_key="YOUR_API_KEY", sample_rate=24000)
tts = WavesStreamingTTS(config)

def text_chunks():
    # Pretend this is your LLM streaming tokens.
    for word in ["Hello,", " I am", " streaming", " speech."]:
        yield word

with open("speech.pcm", "wb") as out:
    for audio_chunk in tts.synthesize_streaming(text_chunks(), continue_stream=True, auto_flush=True):
        out.write(audio_chunk)
```
For new code, you can also use the namespaced Fern client: `client.waves.lightning_v31tts.connect(...)`, which returns a typed socket you drive yourself.
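
The `speech.pcm` file written above is headerless, so most players will not open it directly. A minimal sketch for wrapping it in a WAV container with the standard-library `wave` module follows. It assumes 16-bit mono samples, which this page does not state, so check the audio format before relying on it; `pcm_to_wav` is a hypothetical helper name.

```python
import wave


def pcm_to_wav(pcm_path: str, wav_path: str, sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> None:
    """Wrap raw PCM bytes in a WAV header.

    Assumption: 16-bit (2-byte) mono samples. The sample rate must match
    the `sample_rate` requested from the TTS endpoint.
    """
    with open(pcm_path, "rb") as src:
        frames = src.read()
    with wave.open(wav_path, "wb") as dst:
        dst.setnchannels(channels)
        dst.setsampwidth(sample_width)
        dst.setframerate(sample_rate)
        dst.writeframes(frames)
```

For example, `pcm_to_wav("speech.pcm", "speech.wav")` produces a file you can audition in any audio player, provided the assumptions hold.
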
**JavaScript / TypeScript** (using `ws`):
```typescript
import WebSocket from "ws";

const ws = new WebSocket("wss://api.smallest.ai/waves/v1/lightning-v3.1/get_speech/stream", {
  headers: { Authorization: `Bearer ${process.env.SMALLEST_API_KEY}` },
});

ws.on("open", () => {
  for (const text of ["Hello,", " I am", " streaming", " speech."]) {
    ws.send(JSON.stringify({
      voice_id: "magnus",
      text,
      sample_rate: 24000,
      continue: true,
      flush: false,
    }));
  }
  // Final flush
  ws.send(JSON.stringify({ voice_id: "magnus", text: "", flush: true }));
});

ws.on("message", (raw) => {
  const msg = JSON.parse(raw.toString());
  if (msg.status === "complete") return ws.close();
  const pcm = Buffer.from(msg.data.audio, "base64");
  // … hand pcm to your audio pipeline
});
```
## Common gotchas
- **`continue: true` keeps the session alive.** Without it, the server closes after the first chunk. Send `flush: true` with empty `text` when you're done.
- **44.1 kHz is supported but most pipelines want 24 kHz.** Match your downstream sample rate to avoid resampling.
- **Backpressure**: if you push text faster than the server can synthesize, audio chunks queue server-side. Watch for `complete_backoff_ms` in responses.
- **One concurrency unit = one in-flight synth.** Holding 5 sockets open doesn't get you 5× throughput unless you upgrade concurrency.
- **JavaScript / TypeScript**: the official `smallestai` npm package predates Lightning v3.1, so connect with the `ws` library directly as shown above.
## Handshake
**WSS** `wss://api.smallest.ai/waves/v1/lightning-v3.1/get_speech/stream`

### Authentication
`Authorization: Bearer`. Header authentication of the form `Bearer <token>`.

### Headers
- `Authorization`: Bearer token for authentication. Format: `Bearer YOUR_API_KEY`

## Send
`LightningV31TtsRequest`: Send a JSON message with `voice_id`, `text`, and optional parameters to generate speech audio.

## Receive
`LightningV31TtsResponse`: Receive audio data chunks and completion status from the server.
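
On the receive side, each message can be handled with a small function that decodes the base64 audio and reports when the stream is done. This is a sketch based only on the `data.audio` and `status` fields described in "How it works"; `handle_message` is a hypothetical helper, and any other response fields are ignored.

```python
import base64
import json


def handle_message(raw: str, pcm: bytearray) -> bool:
    """Append any audio in `raw` to `pcm`; return True once the stream is complete."""
    msg = json.loads(raw)
    if msg.get("status") == "complete":
        return True
    # Audio arrives base64-encoded under data.audio; skip messages without it.
    audio = (msg.get("data") or {}).get("audio")
    if audio:
        pcm.extend(base64.b64decode(audio))
    return False
```

Call it once per incoming message and close the socket when it returns `True`; the accumulated `pcm` buffer then holds the decoded audio.
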

