Lightning v3.1 SSE (endpoint will be deprecated)

POST https://api.smallest.ai/waves/v1/lightning-v3.1/stream
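import requests

url = "https://api.smallest.ai/waves/v1/lightning-v3.1/stream"

payload = {
    "text": "Hey i am your a text to speech model",
    "voice_id": "daniel",
}
headers = {
    "Authorization": "Bearer <BearerAuth>",
    "Content-Type": "application/json",
}

# The response is a Server-Sent Events stream, not a single JSON document,
# so read it line by line instead of calling response.json().
with requests.post(url, json=payload, headers=headers, stream=True) as response:
    response.raise_for_status()
    for line in response.iter_lines(decode_unicode=True):
        if line:
            print(line)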
Endpoint scheduled for retirement. This URL will stop accepting requests on 2026-07-14, which is 60 days after the Lightning v3.1 Pro launch (2026-05-15). The Lightning v3.1 model itself is current and stays. Migrate to POST /waves/v1/tts/live and select Lightning v3.1 via the model body field (it is the default value).

Synthesize speech and stream the audio back over Server-Sent Events. The body and parameters are identical to the sync /get_speech endpoint — the difference is the response is a stream of base64-encoded PCM chunks instead of one binary blob.

When to use this

  • Use this when you want playback to start before synthesis is complete — long passages, latency-sensitive UI, live narration.
  • Use sync /get_speech when total latency doesn’t matter and you’d rather get one buffer.
  • Use the WebSocket endpoint when the text arrives incrementally (LLM token stream). SSE assumes you have the full text up front.

How it works

  1. POST your text and voice settings, using the same payload as `/get_speech`.
  2. The response is `Content-Type: text/event-stream`. Each chunk frame is `event: audio\n` followed by `data: {"audio": "<base64-pcm>"}\n\n`.
  3. Decode each chunk's `audio` field with base64 and feed the PCM bytes to your audio pipeline (browser `MediaSource`, ffmpeg pipe, raw PCM player, etc.).
  4. A final `data: {"done": true}\n\n` frame marks end of stream.
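
The frame handling in the steps above can be sketched in a few lines of Python. This is a minimal illustration, not the official client: the helper name `iter_audio_chunks` is hypothetical, and the sample frames are hand-built stand-ins for a live response. It assumes each `data:` field arrives on a single line, as in the frame format described above.

```python
import base64
import json
from typing import Iterable, Iterator


def iter_audio_chunks(lines: Iterable[str]) -> Iterator[bytes]:
    """Yield decoded audio bytes from SSE lines until the done frame.

    Hypothetical helper for illustration; not part of any SDK.
    """
    for line in lines:
        # Skip blank separators and non-data fields such as "event: audio".
        if not line.startswith("data:"):
            continue
        payload = json.loads(line[len("data:"):].strip())
        if payload.get("done"):
            return  # final frame: stop reading
        if payload.get("audio"):
            yield base64.b64decode(payload["audio"])


# Hand-built frames standing in for a live response (assumed shape).
encoded = base64.b64encode(b"\x00\x01\x02\x03").decode()
sample = [
    "event: audio",
    'data: {"audio": "' + encoded + '"}',
    "",
    'data: {"done": true}',
    "",
]
chunks = list(iter_audio_chunks(sample))
```

With the `requests` library, the same helper could be fed from `response.iter_lines(decode_unicode=True)` on a request made with `stream=True`, provided the lines come back as `str`.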

Examples

cURL

curl -N -X POST "https://api.smallest.ai/waves/v1/lightning-v3.1/stream" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Streaming this paragraph chunk by chunk so playback can start sooner.",
    "voice_id": "magnus",
    "sample_rate": 24000,
    "output_format": "pcm"
  }'

Python (pip install "smallestai>=4.4.0"; the quotes stop the shell from treating >= as a redirect)

import base64
from smallestai import SmallestAI

client = SmallestAI(token="YOUR_API_KEY")

with open("stream.pcm", "wb") as f:
    for chunk in client.waves.synthesize_sse_lightning_v3_1(
        text="Streaming this paragraph chunk by chunk so playback can start sooner.",
        voice_id="magnus",
        sample_rate=24000,
        output_format="pcm",
    ):
        # Each chunk is `{"audio": "<base64-encoded PCM>"}`.
        # Decode and pipe to your audio pipeline.
        if chunk.get("audio"):
            f.write(base64.b64decode(chunk["audio"]))

JavaScript / TypeScript (using fetch + a reader)

const res = await fetch("https://api.smallest.ai/waves/v1/lightning-v3.1/stream", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.SMALLEST_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    text: "Streaming this paragraph chunk by chunk so playback can start sooner.",
    voice_id: "magnus",
    sample_rate: 24000,
    output_format: "pcm",
  }),
});

const reader = res.body!.getReader();
const decoder = new TextDecoder();
let buf = "";
let finished = false;
while (!finished) {
  const { value, done } = await reader.read();
  if (done) break;
  buf += decoder.decode(value);
  const events = buf.split("\n\n");
  buf = events.pop() ?? "";
  for (const ev of events) {
    // SSE frames are "event: audio\ndata: {json}" or just "data: {json}".
    // We only care about the data line, so pull it out and parse it.
    const dataLine = ev.split("\n").find((l) => l.startsWith("data:"));
    if (!dataLine) continue;
    const payload = JSON.parse(dataLine.slice(5).trim());
    if (payload.done) { finished = true; break; }
    if (payload.audio) {
      const pcm = Buffer.from(payload.audio, "base64");
      // … hand pcm to your audio pipeline
    }
  }
}

Common gotchas

  • Use a streaming-friendly client, such as curl -N, Python iter_lines, or a fetch ReadableStream reader. Clients that buffer the whole response will hide the latency win.
  • Audio is base64 inside the event payload, not the raw event bytes. Decode the data.audio field per event.
  • output_format=pcm gives the lowest overhead for streaming playback. wav/mp3 work but add per-chunk framing bytes.
  • First-chunk latency depends on model warm-up and network distance. Use output_format=pcm and a streaming-friendly client to minimize the part you can control.
  • JavaScript / TypeScript: the official smallestai npm package predates Lightning v3.1, so call this endpoint with fetch as shown above.
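
Raw PCM carries no header, so a file such as stream.pcm will not open in most media players as-is. The sketch below wraps collected PCM bytes in a WAV container using Python's standard `wave` module. The helper name `pcm_to_wav` is hypothetical, and the sample layout (16-bit, mono) is an assumption that this page does not state; check it against the audio you actually receive before relying on it.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM bytes in a WAV container.

    ASSUMPTION: 16-bit (sample_width=2) mono audio. The sample layout is
    not documented on this page, so adjust these defaults if needed.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)  # must match the requested sample_rate
        wav.writeframes(pcm)
    return buf.getvalue()


# One second of silence at 24000 Hz stands in for streamed audio.
silence = b"\x00\x00" * 24000
wav_bytes = pcm_to_wav(silence, sample_rate=24000)
```

Pass the same `sample_rate` you sent in the request; a mismatch makes the audio play too fast or too slow.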

Authentication

Authorization: Bearer

Header authentication of the form Bearer <token>

Request

This endpoint expects an object.
text (string, required). Defaults to "Hey i am your a text to speech model".
The text to convert to speech.
voice_id (string, required). Defaults to daniel.
The voice identifier to use for speech generation.
model (enum, optional). Defaults to lightning_v3.1.

TTS model to route the request to.

  • lightning_v3.1 (default) — standard Lightning v3.1 pool.
  • lightning_v3.1_pro — Lightning v3.1 Pro pool with a curated voice catalog. See the Pro model card.

New integrations should use the unified /waves/v1/tts route instead of this endpoint, but the model field is supported here for backwards-compatible Pro opt-in.

Allowed values:
sample_rate (enum, optional). Defaults to 44100.
The sample rate for the generated audio.
Allowed values:
speed (double, optional). Range 0.5-2. Defaults to 1.
The speed of the generated speech.
language (enum, optional). Defaults to en.

Language code for synthesis. Influences pronunciation, number/date normalization, and phoneme selection.

  • Indian: en, hi, mr (Marathi), kn (Kannada), ta (Tamil), bn (Bengali), gu (Gujarati), te (Telugu), ml (Malayalam), pa (Punjabi), or (Odia)
  • European: es (Spanish)
  • auto — auto-detect from input text (recommended for code-switching)
output_format (enum, optional). Defaults to pcm.

Format of the returned audio. pcm is the lowest-latency option but requires a decoder to play; mp3 and wav are directly playable in browsers and most media players. The server default is pcm when the field is omitted — the API playground uses mp3 so the generated audio is directly playable.

Allowed values:
pronunciation_dicts (list of strings, optional).
The IDs of the pronunciation dictionaries to use for speech generation.
session_id (string, optional). Format: "^[a-zA-Z0-9_\-.]+$", at most 128 characters.

Optional client-provided session identifier for correlation. Only alphanumeric characters, hyphens, underscores, and dots are allowed. Max 128 characters. Echoed back in response headers as X-External-Session-Id.

request_id (string, optional). Format: "^[a-zA-Z0-9_\-.]+$", at most 128 characters.

Optional client-provided request identifier for correlation. Only alphanumeric characters, hyphens, underscores, and dots are allowed. Max 128 characters. Echoed back in response headers as X-External-Request-Id.
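
Since session_id and request_id share the same format rule, checking them on the client before sending can save a round trip. The following is a minimal sketch using the pattern and length limit quoted above; the function name `is_valid_correlation_id` is hypothetical and not part of any SDK, and how the server reports an invalid ID is not covered here.

```python
import re

# Pattern and limit copied from the parameter descriptions above.
ID_PATTERN = re.compile(r"^[a-zA-Z0-9_\-.]+$")
MAX_ID_LENGTH = 128


def is_valid_correlation_id(value: str) -> bool:
    """Return True if value fits the documented session_id/request_id format."""
    return len(value) <= MAX_ID_LENGTH and ID_PATTERN.fullmatch(value) is not None


ok = is_valid_correlation_id("call-2026_05.a")       # letters, digits, - _ .
bad_space = is_valid_correlation_id("has space")     # spaces are not allowed
too_long = is_valid_correlation_id("a" * 129)        # over the 128-character limit
```

`fullmatch` is used so that a trailing newline, which `$` alone would tolerate, is also rejected.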

Response

Synthesized speech retrieved successfully.

Errors

400 Bad Request Error
401 Unauthorized Error
500 Internal Server Error