Synthesize speech and stream the audio back over Server-Sent Events (SSE). The request body and parameters are identical to the sync `/get_speech` endpoint; the difference is that the response is a stream of base64-encoded PCM chunks instead of one binary blob.
## When to use this
- **Use this** when you want playback to start before synthesis is complete — long passages, latency-sensitive UI, live narration.
- **Use sync `/get_speech`** when playback does not need to start early and you'd rather receive one complete buffer.
- **Use the WebSocket endpoint** when the *text* arrives incrementally (LLM token stream). SSE assumes you have the full text up front.
## How it works
1. POST your text + voice settings — same payload as `/get_speech`.
2. The response is `Content-Type: text/event-stream`. Each chunk frame is `event: audio\n` followed by `data: {"audio": "<base64-pcm>"}\n\n`.
3. Base64-decode each chunk's `audio` field and feed the PCM bytes to your audio pipeline (the browser Web Audio API, an ffmpeg pipe, a raw PCM player, etc.). Note that browser `MediaSource` expects containerized media, not raw PCM.
4. A final `data: {"done": true}\n\n` frame marks end of stream.
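
The frame handling in steps 2 to 4 above can be sketched in Python as follows. This is a minimal illustration rather than an official client: `iter_pcm_chunks` is a hypothetical helper name, and it assumes frames look exactly as described (an optional `event:` line, one `data:` line of JSON, and a blank line between frames).

```python
import base64
import json
from typing import Iterable, Iterator


def iter_pcm_chunks(lines: Iterable[str]) -> Iterator[bytes]:
    """Yield decoded PCM bytes from SSE text lines until the done frame.

    Hypothetical helper: assumes one JSON `data:` line per frame.
    """
    for raw in lines:
        line = raw.rstrip("\r\n")
        # Skip blank frame separators and `event:` lines.
        if not line.startswith("data:"):
            continue
        payload = json.loads(line[len("data:"):].strip())
        if payload.get("done"):
            return  # final frame: end of stream
        if payload.get("audio"):
            yield base64.b64decode(payload["audio"])


# Usage with two in-memory audio frames followed by the end marker.
frames = [
    "event: audio\n",
    'data: {"audio": "' + base64.b64encode(b"\x01\x02").decode() + '"}\n',
    "\n",
    "event: audio\n",
    'data: {"audio": "' + base64.b64encode(b"\x03\x04").decode() + '"}\n',
    "\n",
    'data: {"done": true}\n',
    "\n",
]
pcm = b"".join(iter_pcm_chunks(frames))
```

Taking an iterable of lines keeps the parser independent of the HTTP client, so the same function can be fed from any line-oriented response reader.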
## Examples
**cURL**
```bash
curl -N -X POST "https://api.smallest.ai/waves/v1/lightning-v3.1/stream" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Streaming this paragraph chunk by chunk so playback can start sooner.",
    "voice_id": "magnus",
    "sample_rate": 24000,
    "output_format": "pcm"
  }'
```
**Python** (`pip install "smallestai>=4.4.0"`; the quotes stop the shell from treating `>=` as a redirect)
```python
import base64

from smallestai import SmallestAI

client = SmallestAI(token="YOUR_API_KEY")

with open("stream.pcm", "wb") as f:
    for chunk in client.waves.synthesize_sse_lightning_v31(
        text="Streaming this paragraph chunk by chunk so playback can start sooner.",
        voice_id="magnus",
        sample_rate=24000,
        output_format="pcm",
    ):
        # Each chunk is `{"audio": "<base64-encoded PCM>"}`.
        # Decode and pipe to your audio pipeline.
        if chunk.get("audio"):
            f.write(base64.b64decode(chunk["audio"]))
```
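
The `stream.pcm` file written above has no header, so most players will not open it directly. A minimal sketch of wrapping the bytes in a WAV container with the standard-library `wave` module follows. It assumes 16-bit mono samples, which this page does not state, so confirm the sample width and channel count before relying on it; `pcm_to_wav` is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000,
               sample_width: int = 2, channels: int = 1) -> bytes:
    """Wrap raw PCM bytes in a WAV container.

    ASSUMPTION: 16-bit (2-byte) mono samples; not confirmed by this page.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)  # must match the requested sample_rate
        wav.writeframes(pcm)
    return buf.getvalue()


# Usage: one second of silence at 24 kHz under the assumptions above.
wav_bytes = pcm_to_wav(b"\x00\x00" * 24000)
```

For a real file, read `stream.pcm` in binary mode, pass the bytes to `pcm_to_wav`, and write the result to a `.wav` file.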
**JavaScript / TypeScript** (using `fetch` + a reader)
```typescript
const res = await fetch("https://api.smallest.ai/waves/v1/lightning-v3.1/stream", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.SMALLEST_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    text: "Streaming this paragraph chunk by chunk so playback can start sooner.",
    voice_id: "magnus",
    sample_rate: 24000,
    output_format: "pcm",
  }),
});
if (!res.ok || !res.body) {
  throw new Error(`Stream request failed with status ${res.status}`);
}

const reader = res.body.getReader();
const decoder = new TextDecoder();
let buf = "";
let finished = false;

while (!finished) {
  const { value, done } = await reader.read();
  if (done) break;
  // stream: true keeps multi-byte characters split across reads intact.
  buf += decoder.decode(value, { stream: true });
  // Frames end with a blank line; keep any trailing partial frame in buf.
  const events = buf.split("\n\n");
  buf = events.pop() ?? "";
  for (const ev of events) {
    // SSE frames are "event: audio\ndata: {json}" or just "data: {json}".
    // We only care about the data line: pull it out and parse.
    const dataLine = ev.split("\n").find((l) => l.startsWith("data:"));
    if (!dataLine) continue;
    const payload = JSON.parse(dataLine.slice(5).trim());
    if (payload.done) { finished = true; break; }
    if (payload.audio) {
      // Buffer is Node.js-only; in a browser, decode with atob() instead.
      const pcm = Buffer.from(payload.audio, "base64");
      // … hand pcm to your audio pipeline
    }
  }
}
```
## Common gotchas
- **Use a streaming-friendly client.** Examples: `curl -N`, Python `requests` with `stream=True` and `iter_lines`, or a `fetch` `ReadableStream` reader. A client that buffers the whole response before returning hides the latency win.
- **Audio is base64 inside the event payload**, not the raw event bytes. Decode the `data.audio` field per event.
- **`output_format=pcm`** gives the lowest overhead for streaming playback. `wav`/`mp3` work but add per-chunk framing bytes.
- **First-chunk latency** depends on model warm-up + network distance. Use `output_format=pcm` and a streaming-friendly client to minimize what you can control.
- **JavaScript / TypeScript**: the official `smallestai` npm package predates Lightning v3.1, so call this endpoint with `fetch` as shown above.
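
To illustrate the first gotcha, here is a hedged sketch of a streaming-friendly request using only the Python standard library. The URL and body fields are copied from the cURL example; `build_request` and `stream_lines` are hypothetical helper names, and the sketch assumes that iterating the response line by line (instead of calling `read()`) delivers frames as they arrive.

```python
import json
import urllib.request
from typing import Iterator

STREAM_URL = "https://api.smallest.ai/waves/v1/lightning-v3.1/stream"


def build_request(api_key: str, text: str,
                  voice_id: str = "magnus") -> urllib.request.Request:
    """Build the POST request; the payload mirrors the cURL example."""
    body = json.dumps({
        "text": text,
        "voice_id": voice_id,
        "sample_rate": 24000,
        "output_format": "pcm",
    }).encode("utf-8")
    return urllib.request.Request(
        STREAM_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )


def stream_lines(req: urllib.request.Request) -> Iterator[str]:
    """Yield response lines one at a time instead of buffering the body.

    ASSUMPTION: line-by-line iteration returns each SSE line as soon as
    the server flushes it; verify against your own latency measurements.
    """
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            yield raw.decode("utf-8")


# Usage (performs a network call, so it is left commented out):
# for line in stream_lines(build_request("YOUR_API_KEY", "Hello there.")):
#     print(line, end="")
```

Calling `resp.read()` instead of iterating would wait for the whole stream to finish, which is the buffering behavior the gotcha warns about.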
## Authentication
- **Authorization** (Bearer): header authentication of the form `Bearer <token>`.
## Request
This endpoint expects an object.
- **`text`** (string, required, defaults to `Hey i am your a text to speech model`): The text to convert to speech.
- **`voice_id`** (string, required, defaults to `daniel`): The voice identifier to use for speech generation.
- **`sample_rate`** (enum, optional, defaults to `44100`): The sample rate for the generated audio.
- **`speed`** (double, optional, range 0.5-2, defaults to `1`): The speed of the generated speech.
- **`language`** (enum, optional, defaults to `en`): Language code for synthesis. Influences pronunciation, number/date normalization, and phoneme selection.
  - Indian: `en`, `hi`, `mr` (Marathi), `kn` (Kannada), `ta` (Tamil), `bn` (Bengali), `gu` (Gujarati), `te` (Telugu), `ml` (Malayalam), `pa` (Punjabi), `or` (Odia)
  - European: `es` (Spanish)
  - `auto`: auto-detect from input text (recommended for code-switching)
- **`output_format`** (enum, optional, defaults to `pcm`): Format of the returned audio. `pcm` is the lowest-latency option but requires a decoder to play; `mp3` and `wav` are directly playable in browsers and most media players. The server default is `pcm` when the field is omitted; the API playground uses `mp3` so the generated audio is directly playable.
- **`pronunciation_dicts`** (list of strings, optional): The IDs of the pronunciation dictionaries to use for speech generation.
- Optional client-provided session identifier for correlation. Only alphanumeric characters, hyphens, underscores, and dots are allowed. Max 128 characters. Echoed back in response headers as `X-External-Session-Id`.
- Optional client-provided request identifier for correlation. Only alphanumeric characters, hyphens, underscores, and dots are allowed. Max 128 characters. Echoed back in response headers as `X-External-Request-Id`.
## Response
Synthesized speech retrieved successfully.
## Errors
- **400**: Bad Request Error
- **401**: Unauthorized Error
- **500**: Internal Server Error