# Lightning v3.1 SSE (endpoint will be deprecated)

`POST https://api.smallest.ai/waves/v1/lightning-v3.1/stream`

<Warning>**Endpoint scheduled for retirement.** This URL will stop accepting requests **60 days after the Lightning v3.1 Pro launch (2026-05-15)**, i.e. on **2026-07-14**. The Lightning v3.1 model itself is current and is not being retired. Migrate to [`POST /waves/v1/tts/live`](/waves/api-reference/api-reference/text-to-speech/synthesize-speech-sse) and select Lightning v3.1 with the `model` body field (it is the default).</Warning>

Synthesize speech and stream the audio back over Server-Sent Events (SSE). The request body and parameters are identical to those of the synchronous `/get_speech` endpoint; the difference is that the response is a stream of base64-encoded audio chunks (raw PCM by default) instead of a single binary blob.

## When to use this

- **Use this endpoint** when you want playback to start before synthesis is complete: long passages, latency-sensitive UIs, live narration.
- **Use the synchronous `/get_speech` endpoint** when time to first audio doesn't matter and you'd rather receive one buffer.
- **Use the WebSocket endpoint** when the *text* arrives incrementally (for example, an LLM token stream). SSE assumes you have the full text up front.

## How it works

1. POST your text and voice settings, using the same payload as `/get_speech`.
2. The response has `Content-Type: text/event-stream`. Each chunk frame is `event: audio\n` followed by `data: {"audio": "<base64-pcm>"}\n\n`.
3. Base64-decode each chunk's `audio` field and feed the resulting PCM bytes to your audio pipeline (the browser Web Audio API, or `MediaSource` for `mp3` output; an ffmpeg pipe; a raw PCM player; etc.).
4. A final `data: {"done": true}\n\n` frame marks the end of the stream.
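
The frame layout above can be parsed without an SSE library. The sketch below is a minimal Python parser written against the standard SSE line rules (one `field: value` per line, a blank line ends a frame, multiple `data:` lines are joined with a newline, lines starting with `:` are comments). It assumes each frame's data is the JSON described in steps 2 to 4; `iter_pcm_chunks` is a hypothetical name, not part of the `smallestai` SDK.

```python
import base64
import json
from typing import Iterable, Iterator


def iter_pcm_chunks(lines: Iterable[str]) -> Iterator[bytes]:
    """Yield decoded audio bytes from the text lines of an SSE response.

    Assumes each frame carries JSON shaped like {"audio": "<base64>"} and that
    a {"done": true} frame ends the stream (hypothetical helper, not SDK API).
    """
    data_parts: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # Blank line: the frame is complete, so dispatch what was buffered.
            if data_parts:
                payload = json.loads("\n".join(data_parts))
                data_parts = []
                if payload.get("done"):
                    return
                if payload.get("audio"):
                    yield base64.b64decode(payload["audio"])
            continue
        if line.startswith(":"):
            continue  # SSE comment, e.g. a keep-alive line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # SSE strips exactly one leading space
        if field == "data":
            data_parts.append(value)
        # Other fields (event:, id:, retry:) are ignored in this sketch.
```

With `requests`, you could feed it `(raw.decode("utf-8") for raw in response.iter_lines())` from a request made with `stream=True`. Keeping the parser independent of the HTTP client makes it easy to test against canned frames.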
## Examples

**cURL**

```bash
curl -N -X POST "https://api.smallest.ai/waves/v1/lightning-v3.1/stream" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Streaming this paragraph chunk by chunk so playback can start sooner.",
    "voice_id": "magnus",
    "sample_rate": 24000,
    "output_format": "pcm"
  }'
```

**Python** (`pip install "smallestai>=4.4.0"`; the quotes stop the shell from treating `>` as a redirect)

```python
import base64

from smallestai import SmallestAI

client = SmallestAI(token="YOUR_API_KEY")

with open("stream.pcm", "wb") as f:
    for chunk in client.waves.synthesize_sse_lightning_v3_1(
        text="Streaming this paragraph chunk by chunk so playback can start sooner.",
        voice_id="magnus",
        sample_rate=24000,
        output_format="pcm",
    ):
        # Each chunk is `{"audio": "<base64-encoded PCM>"}`.
        # Decode it and append the raw PCM bytes to the output file.
        if chunk.get("audio"):
            f.write(base64.b64decode(chunk["audio"]))
```

**JavaScript / TypeScript** (using `fetch` and a stream reader)

```typescript
const res = await fetch("https://api.smallest.ai/waves/v1/lightning-v3.1/stream", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.SMALLEST_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    text: "Streaming this paragraph chunk by chunk so playback can start sooner.",
    voice_id: "magnus",
    sample_rate: 24000,
    output_format: "pcm",
  }),
});
// A non-2xx status means there is no event stream to parse; fail fast.
if (!res.ok) throw new Error(`HTTP ${res.status}: ${await res.text()}`);

const reader = res.body!.getReader();
const decoder = new TextDecoder();
let buf = "";
let finished = false;
while (!finished) {
  const { value, done } = await reader.read();
  if (done) break;
  // { stream: true } keeps multi-byte characters split across reads intact;
  // normalizing CRLF lets the "\n\n" frame split work with either line ending.
  buf = (buf + decoder.decode(value, { stream: true })).replace(/\r\n/g, "\n");
  const events = buf.split("\n\n");
  buf = events.pop() ?? "";
  for (const ev of events) {
    // SSE frames are "event: audio\ndata: {json}" or just "data: {json}".
    // Only the data line matters here: pull it out and parse it.
    const dataLine = ev.split("\n").find((l) => l.startsWith("data:"));
    if (!dataLine) continue;
    const payload = JSON.parse(dataLine.slice(5).trim());
    if (payload.done) { finished = true; break; }
    if (payload.audio) {
      const pcm = Buffer.from(payload.audio, "base64"); // Node.js; use atob() in browsers
      // … hand pcm to your audio pipeline
    }
  }
}
```

## Common gotchas

- **Use a streaming-friendly client**: `curl -N`, Python `requests` with `stream=True` and `iter_lines`, or a `fetch` `ReadableStream` reader. Clients that buffer the whole response hide the latency win.
- **Audio is base64 inside the event payload**, not the raw event bytes. Decode the `data.audio` field of each event.
- **`output_format=pcm`** has the lowest overhead for streaming playback. `wav` and `mp3` work but add per-chunk framing bytes.
- **First-chunk latency** depends on model warm-up and network distance. Use `output_format=pcm` and a streaming-friendly client to minimize the parts you control.
- **JavaScript / TypeScript**: the official `smallestai` npm package predates Lightning v3.1, so call this endpoint with `fetch` as shown above.
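
Raw PCM has no header, so most players cannot open `stream.pcm` directly. One option is to wrap it in a WAV container with Python's standard `wave` module. This reference does not state the PCM sample format, so the sketch **assumes 16-bit signed little-endian mono**; verify that against your actual output before relying on it. `pcm_to_wav` and `pcm_duration_seconds` are hypothetical helper names.

```python
import io
import wave

# Assumption (not stated in this reference): raw PCM from this endpoint is
# 16-bit signed little-endian, mono. Adjust if your output differs.
SAMPLE_WIDTH_BYTES = 2
CHANNELS = 1


def pcm_duration_seconds(pcm: bytes, sample_rate: int) -> float:
    """Playback length of a raw PCM buffer under the format assumption above."""
    return len(pcm) / (sample_rate * SAMPLE_WIDTH_BYTES * CHANNELS)


def pcm_to_wav(pcm: bytes, sample_rate: int) -> bytes:
    """Prepend a WAV (RIFF) header so standard players can open the audio."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(CHANNELS)
        w.setsampwidth(SAMPLE_WIDTH_BYTES)
        w.setframerate(sample_rate)  # must match the sample_rate you requested
        w.writeframes(pcm)
    return buf.getvalue()
```

Under that assumption, one second of audio at `sample_rate=24000` is 24000 × 2 = 48,000 bytes, and `pcm_to_wav(open("stream.pcm", "rb").read(), 24000)` would turn the file written by the Python SDK example into a playable WAV.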

## Authentication

**Authorization** (Bearer): header authentication of the form `Bearer <token>`.

## Request

This endpoint expects a JSON object.

- **`text`** (string, required; defaults to `Hey i am your a text to speech model`): the text to convert to speech.
- **`voice_id`** (string, required; defaults to `daniel`): the voice identifier to use for speech generation.
- **`model`** (enum, optional; defaults to `lightning_v3.1`): the TTS model to route the request to. Allowed values:
  - `lightning_v3.1` (default): the standard Lightning v3.1 pool.
  - `lightning_v3.1_pro`: the Lightning v3.1 Pro pool, with a curated voice catalog. See the Pro model card.

  New integrations should use the unified `/waves/v1/tts` route instead of this endpoint, but the `model` field is supported here for backwards-compatible Pro opt-in.
- **`sample_rate`** (enum, optional; defaults to `44100`): the sample rate of the generated audio.
- **`speed`** (double, optional, range 0.5 to 2; defaults to `1`): the speed of the generated speech.
- **`language`** (enum, optional; defaults to `en`): the language code for synthesis. It influences pronunciation, number/date normalization, and phoneme selection.
  - Indian: `en`, `hi`, `mr` (Marathi), `kn` (Kannada), `ta` (Tamil), `bn` (Bengali), `gu` (Gujarati), `te` (Telugu), `ml` (Malayalam), `pa` (Punjabi), `or` (Odia)
  - European: `es` (Spanish)
  - `auto`: auto-detect from the input text (recommended for code-switching)
- **`output_format`** (enum, optional; defaults to `pcm`): the format of the returned audio. `pcm` is the lowest-latency option but requires a decoder to play; `mp3` and `wav` are directly playable in browsers and most media players. The server uses `pcm` when the field is omitted; the API playground uses `mp3` so that the generated audio is directly playable.
- **`pronunciation_dicts`** (list of strings, optional): the IDs of the pronunciation dictionaries to use for speech generation.
- **`session_id`** (string, optional; format `^[a-zA-Z0-9_\-.]+$`, at most 128 characters): an optional client-provided session identifier for correlation. Only alphanumeric characters, hyphens, underscores, and dots are allowed. It is echoed back in the `X-External-Session-Id` response header.
- **`request_id`** (string, optional; format `^[a-zA-Z0-9_\-.]+$`, at most 128 characters): an optional client-provided request identifier for correlation. Only alphanumeric characters, hyphens, underscores, and dots are allowed. It is echoed back in the `X-External-Request-Id` response header.
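
A client-side check can catch the documented constraints (the `speed` range and the `session_id`/`request_id` format and length) before a request is sent. The Python sketch below is one way to do it; `build_payload` is a hypothetical helper, and it leaves unset fields out of the body so the server defaults listed above apply. It uses `fullmatch` because in Python a bare `$` also matches just before a trailing newline.

```python
import re
from typing import Any, Optional

# Constraints taken from the field reference above.
ID_PATTERN = re.compile(r"^[a-zA-Z0-9_\-.]+$")
ID_MAX_LEN = 128
SPEED_MIN, SPEED_MAX = 0.5, 2.0


def _check_id(name: str, value: str) -> None:
    # fullmatch rejects values like "abc\n" that "$" alone would let through.
    if len(value) > ID_MAX_LEN or ID_PATTERN.fullmatch(value) is None:
        raise ValueError(f"{name} must match {ID_PATTERN.pattern} and be at most {ID_MAX_LEN} chars")


def build_payload(
    text: str,
    voice_id: str,
    *,
    speed: Optional[float] = None,
    session_id: Optional[str] = None,
    request_id: Optional[str] = None,
    **extra: Any,
) -> dict:
    """Build a request body, omitting unset fields so server defaults apply."""
    if not text or not voice_id:
        raise ValueError("text and voice_id are required")
    body: dict = {"text": text, "voice_id": voice_id}
    if speed is not None:
        if not SPEED_MIN <= speed <= SPEED_MAX:
            raise ValueError(f"speed must be between {SPEED_MIN} and {SPEED_MAX}")
        body["speed"] = speed
    for name, value in (("session_id", session_id), ("request_id", request_id)):
        if value is not None:
            _check_id(name, value)
            body[name] = value
    # Other fields (model, sample_rate, language, ...) pass through unchecked.
    body.update({k: v for k, v in extra.items() if v is not None})
    return body
```

Validating only the constraints this page states keeps the helper from drifting out of sync with the server: enum fields such as `sample_rate` are passed through, because their full allowed-value lists are not reproduced here.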

## Response

Synthesized speech retrieved successfully.

## Errors

- `400`: Bad Request Error
- `401`: Unauthorized Error
- `500`: Internal Server Error
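
Check the HTTP status and `Content-Type` before parsing frames: an error response is not an event stream, so an SSE parser would at best yield nothing from it. The Python sketch below raises on anything other than a 2xx `text/event-stream` response. `SpeechStreamError` and `check_stream_response` are hypothetical names, and treating `500` as retryable while `400`/`401` are not is an assumption based on the usual meaning of these codes, not documented behavior of this endpoint.

```python
from typing import Optional


class SpeechStreamError(Exception):
    """Raised when the endpoint answers with something other than an SSE stream."""

    def __init__(self, status: int, detail: str, retryable: bool) -> None:
        super().__init__(f"HTTP {status}: {detail[:200]}")
        self.status = status
        self.retryable = retryable


def check_stream_response(status: int, content_type: Optional[str], body_preview: str = "") -> None:
    """Raise unless the response is a successful text/event-stream."""
    media_type = (content_type or "").split(";")[0].strip().lower()
    if 200 <= status < 300 and media_type == "text/event-stream":
        return
    # Assumption: 5xx (e.g. 500) may succeed on retry; 400 and 401 will not
    # until the request body or the API key is fixed.
    retryable = status >= 500
    raise SpeechStreamError(status, body_preview or media_type or "no content type", retryable)
```

With `requests`, you would call it with `response.status_code` and `response.headers.get("Content-Type")` right after `requests.post(..., stream=True)` returns and before iterating lines.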

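**Python** (plain `requests`, no SDK)

```python
import base64
import json

import requests

url = "https://api.smallest.ai/waves/v1/lightning-v3.1/stream"

payload = {
    "text": "Hey i am your a text to speech model",
    "voice_id": "daniel",
}
headers = {
    "Authorization": "Bearer <BearerAuth>",
    "Content-Type": "application/json",
}

# The response is text/event-stream, not JSON, so read it line by line
# with stream=True instead of calling response.json().
with requests.post(url, json=payload, headers=headers, stream=True) as response:
    response.raise_for_status()
    with open("stream.pcm", "wb") as f:
        for raw in response.iter_lines():
            line = raw.decode("utf-8")
            if not line.startswith("data:"):
                continue  # skip "event: audio" and blank separator lines
            event = json.loads(line[len("data:"):].strip())
            if event.get("done"):
                break
            if event.get("audio"):
                f.write(base64.b64decode(event["audio"]))
```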
