<Warning>**Endpoint scheduled for retirement.** This URL will stop accepting requests **60 days from the Lightning v3.1 Pro launch (2026-05-15)** — i.e. on **2026-07-14**. The Lightning v3.1 model itself is current and stays. Migrate to [`POST /waves/v1/tts/live`](/waves/api-reference/api-reference/text-to-speech/synthesize-speech-sse) and select Lightning v3.1 via the `model` body field (default).</Warning>
Synthesize speech and stream the audio back over Server-Sent Events (SSE). The request body and parameters are identical to the sync `/get_speech` endpoint; the only difference is that the response is a stream of base64-encoded audio chunks (PCM by default) instead of one binary blob.
## When to use this
- **Use this** when you want playback to start before synthesis is complete — long passages, latency-sensitive UI, live narration.
- **Use sync `/get_speech`** when total latency doesn't matter and you'd rather get one buffer.
- **Use the WebSocket endpoint** when the *text* arrives incrementally (LLM token stream). SSE assumes you have the full text up front.
## How it works
1. POST your text + voice settings — same payload as `/get_speech`.
2. The response is `Content-Type: text/event-stream`. Each chunk frame is `event: audio\n` followed by `data: {"audio": "<base64-pcm>"}\n\n`.
3. Base64-decode each chunk's `audio` field and feed the resulting PCM bytes to your audio pipeline (the Web Audio API in a browser, an ffmpeg pipe, a raw PCM player, etc.).
4. A final `data: {"done": true}\n\n` frame marks end of stream.
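The frame handling in the steps above can be sketched in a few lines of Python. This is a minimal illustration, not the SDK's implementation: `iter_pcm_chunks` is a hypothetical helper name, and the input is assumed to be the response body already split into text lines (for example by an HTTP client's line iterator). The sample frames are synthetic.

```python
import base64
import json
from typing import Iterable, Iterator


def iter_pcm_chunks(lines: Iterable[str]) -> Iterator[bytes]:
    """Yield decoded audio bytes from SSE lines; stop at the `done` frame."""
    for line in lines:
        line = line.strip()
        # Skip blank separators and non-data fields such as "event: audio".
        if not line.startswith("data:"):
            continue
        payload = json.loads(line[len("data:"):].strip())
        if payload.get("done"):
            return
        audio = payload.get("audio")
        if audio:
            yield base64.b64decode(audio)


# Synthetic frames standing in for a real response body.
sample = [
    "event: audio",
    'data: {"audio": "' + base64.b64encode(b"\x01\x02").decode() + '"}',
    "",
    "event: audio",
    'data: {"audio": "' + base64.b64encode(b"\x03\x04").decode() + '"}',
    "",
    'data: {"done": true}',
    "",
]
pcm = b"".join(iter_pcm_chunks(sample))
```

Working line by line rather than on whole frames keeps the parser independent of how the transport happens to split the stream.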
## Examples
**cURL**
```bash
curl -N -X POST "https://api.smallest.ai/waves/v1/lightning-v3.1/stream" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Streaming this paragraph chunk by chunk so playback can start sooner.",
    "voice_id": "magnus",
    "sample_rate": 24000,
    "output_format": "pcm"
  }'
```
**Python** (`pip install "smallestai>=4.4.0"`)
```python
import base64

from smallestai import SmallestAI

client = SmallestAI(api_key="YOUR_API_KEY")

with open("stream.pcm", "wb") as f:
    for chunk in client.waves.synthesize_sse_lightning_v3_1(
        text="Streaming this paragraph chunk by chunk so playback can start sooner.",
        voice_id="magnus",
        sample_rate=24000,
        output_format="pcm",
    ):
        # Each chunk is `{"audio": "<base64-encoded PCM>"}`.
        # Decode and pipe to your audio pipeline.
        if chunk.get("audio"):
            f.write(base64.b64decode(chunk["audio"]))
```
**JavaScript / TypeScript** (using `fetch` + a reader)
```typescript
const res = await fetch("https://api.smallest.ai/waves/v1/lightning-v3.1/stream", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.SMALLEST_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    text: "Streaming this paragraph chunk by chunk so playback can start sooner.",
    voice_id: "magnus",
    sample_rate: 24000,
    output_format: "pcm",
  }),
});
if (!res.ok || !res.body) {
  throw new Error(`Request failed with status ${res.status}`);
}

const reader = res.body.getReader();
const decoder = new TextDecoder();
let buf = "";
let finished = false;
while (!finished) {
  const { value, done } = await reader.read();
  if (done) break;
  // stream: true keeps multi-byte characters intact across chunk boundaries.
  buf += decoder.decode(value, { stream: true });
  const events = buf.split("\n\n");
  // The last element may be an incomplete frame; keep it for the next read.
  buf = events.pop() ?? "";
  for (const ev of events) {
    // SSE frames are "event: audio\ndata: {json}" or just "data: {json}".
    // We only care about the data line: pull it out and parse.
    const dataLine = ev.split("\n").find((l) => l.startsWith("data:"));
    if (!dataLine) continue;
    const payload = JSON.parse(dataLine.slice(5).trim());
    if (payload.done) { finished = true; break; }
    if (payload.audio) {
      // Buffer is Node.js-only; in a browser, decode with atob instead.
      const pcm = Buffer.from(payload.audio, "base64");
      // … hand pcm to your audio pipeline
    }
  }
}
```
## Common gotchas
- **Use a streaming-friendly client.** `curl -N`, Python `iter_lines`, or a `fetch` `ReadableStream` reader. Buffering clients will hide the latency win.
- **Audio is base64 inside the event payload**, not the raw event bytes. Decode the `data.audio` field per event.
- **`output_format=pcm`** gives the lowest overhead for streaming playback. `wav`/`mp3` work but add per-chunk framing bytes.
- **First-chunk latency** depends on model warm-up + network distance. Use `output_format=pcm` and a streaming-friendly client to minimize what you can control.
- **JavaScript / TypeScript**: the official `smallestai` npm package predates Lightning v3.1, so call this endpoint with `fetch` as shown above.
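Raw PCM carries no header, so a file of concatenated chunks will not open in most media players as-is. One way to make it playable after the fact is to wrap it in a WAV container with Python's standard `wave` module. The sketch below assumes 16-bit mono samples; this page does not state the sample width or channel count, so treat those defaults (and the helper name `pcm_to_wav`) as assumptions to verify against your actual output.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM bytes in a WAV container and return the file bytes.

    channels=1 and sample_width=2 (16-bit) are assumptions, not documented values.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()


# 0.1 s of silence at 24 kHz, 16-bit mono: 2400 frames of 2 bytes each.
silence = b"\x00\x00" * 2400
wav_bytes = pcm_to_wav(silence)
```

The `sample_rate` passed here must match the `sample_rate` sent in the request, otherwise playback will be pitched up or down.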
## Request

This endpoint expects an object.

- **`text`** (string, required; defaults to `Hey i am your a text to speech model`): The text to convert to speech.
- **`voice_id`** (string, required; defaults to `daniel`): The voice identifier to use for speech generation.
- **`model`** (enum, optional; defaults to `lightning_v3.1`): TTS model to route the request to.
  - `lightning_v3.1` (default): standard Lightning v3.1 pool.
  - `lightning_v3.1_pro`: Lightning v3.1 Pro pool with a curated voice catalog. See the Pro model card.

  New integrations should use the unified `/waves/v1/tts` route instead of this endpoint, but the `model` field is supported here for backwards-compatible Pro opt-in.
- **`sample_rate`** (enum, optional; defaults to `44100`): The sample rate for the generated audio.
- **`speed`** (double, optional; range 0.5-2; defaults to `1`): The speed of the generated speech.
- **`language`** (enum, optional; defaults to `en`): Language code for synthesis. Influences pronunciation, number/date normalization, and phoneme selection.
  - Indian: `en`, `hi`, `mr` (Marathi), `kn` (Kannada), `ta` (Tamil), `bn` (Bengali), `gu` (Gujarati), `te` (Telugu), `ml` (Malayalam), `pa` (Punjabi), `or` (Odia)
  - European: `es` (Spanish)
- **`output_format`** (enum, optional; defaults to `pcm`): Format of the returned audio. `pcm` is the lowest-latency option but requires a decoder to play; `mp3` and `wav` are directly playable in browsers and most media players. The server default is `pcm` when the field is omitted; the API playground uses `mp3` so the generated audio is directly playable.
- **`pronunciation_dicts`** (list of strings, optional): The IDs of the pronunciation dictionaries to use for speech generation.
- **`session_id`** (string, optional; format `^[a-zA-Z0-9_\-.]+$`; at most 128 characters): Optional client-provided session identifier for correlation. Only alphanumeric characters, hyphens, underscores, and dots are allowed. Echoed back in response headers as `X-External-Session-Id`.
- **`request_id`** (string, optional; format `^[a-zA-Z0-9_\-.]+$`; at most 128 characters): Optional client-provided request identifier for correlation. Only alphanumeric characters, hyphens, underscores, and dots are allowed. Echoed back in response headers as `X-External-Request-Id`.
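
Because `session_id` and `request_id` share the same format constraint, it can be convenient to check them client-side before sending a request. A minimal sketch, using the pattern and length limit from the parameter descriptions above; the helper name `is_valid_correlation_id` is hypothetical, and the server remains the authority on what it accepts.

```python
import re

# Pattern and length limit taken from the session_id / request_id descriptions.
_ID_PATTERN = re.compile(r"^[a-zA-Z0-9_\-.]+$")
_MAX_ID_LENGTH = 128


def is_valid_correlation_id(value: str) -> bool:
    """Return True if value fits the documented session_id/request_id format."""
    return (
        len(value) <= _MAX_ID_LENGTH
        and _ID_PATTERN.fullmatch(value) is not None
    )
```

For example, `is_valid_correlation_id("checkout-42_v1.0")` passes, while a value containing a space or exceeding 128 characters does not.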
## Response

Synthesized speech retrieved successfully.