Lightning v3.1 SSE

Synthesize speech and stream the audio back over Server-Sent Events (SSE). The request body and parameters are identical to the sync `/get_speech` endpoint; the difference is that the response is a stream of base64-encoded PCM chunks instead of one binary blob.

## When to use this

- **Use this** when you want playback to start before synthesis is complete: long passages, latency-sensitive UI, live narration.
- **Use sync `/get_speech`** when total latency doesn't matter and you'd rather get one buffer.
- **Use the WebSocket endpoint** when the *text* arrives incrementally (LLM token stream). SSE assumes you have the full text up front.

## How it works

1. POST your text + voice settings, using the same payload as `/get_speech`.
2. The response is `Content-Type: text/event-stream`. Each chunk frame is `event: audio\n` followed by `data: {"audio": "<base64-pcm>"}\n\n`.
3. Decode each chunk's `audio` field with base64 and feed the PCM bytes to your audio pipeline (browser `MediaSource`, ffmpeg pipe, raw PCM player, etc.).
4. A final `data: {"done": true}\n\n` frame marks end of stream.

## Examples

**cURL**

```bash
curl -N -X POST "https://api.smallest.ai/waves/v1/lightning-v3.1/stream" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Streaming this paragraph chunk by chunk so playback can start sooner.",
    "voice_id": "magnus",
    "sample_rate": 24000,
    "output_format": "pcm"
  }'
```

**Python** (`pip install "smallestai>=4.4.0"`)

```python
import base64

from smallestai import SmallestAI

client = SmallestAI(token="YOUR_API_KEY")

with open("stream.pcm", "wb") as f:
    for chunk in client.waves.synthesize_sse_lightning_v31(
        text="Streaming this paragraph chunk by chunk so playback can start sooner.",
        voice_id="magnus",
        sample_rate=24000,
        output_format="pcm",
    ):
        # Each chunk is `{"audio": "<base64-encoded PCM>"}`.
        # Decode and pipe to your audio pipeline.
        if chunk.get("audio"):
            f.write(base64.b64decode(chunk["audio"]))
```

**JavaScript / TypeScript** (using `fetch` + a reader)

```typescript
const res = await fetch("https://api.smallest.ai/waves/v1/lightning-v3.1/stream", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.SMALLEST_API_KEY}`,
    "Content-Type": "application/json",
  },
  body: JSON.stringify({
    text: "Streaming this paragraph chunk by chunk so playback can start sooner.",
    voice_id: "magnus",
    sample_rate: 24000,
    output_format: "pcm",
  }),
});

const reader = res.body!.getReader();
const decoder = new TextDecoder();
let buf = "";
let finished = false;

while (!finished) {
  const { value, done } = await reader.read();
  if (done) break;
  // stream: true keeps multi-byte characters intact across chunk boundaries.
  buf += decoder.decode(value, { stream: true });
  const events = buf.split("\n\n");
  buf = events.pop() ?? "";
  for (const ev of events) {
    // SSE frames are "event: audio\ndata: {json}" or just "data: {json}".
    // We only care about the data line, so pull it out and parse.
    const dataLine = ev.split("\n").find((l) => l.startsWith("data:"));
    if (!dataLine) continue;
    const payload = JSON.parse(dataLine.slice(5).trim());
    if (payload.done) {
      finished = true;
      break;
    }
    if (payload.audio) {
      const pcm = Buffer.from(payload.audio, "base64");
      // … hand pcm to your audio pipeline
    }
  }
}
```

## Common gotchas

- **Use a streaming-friendly client.** `curl -N`, Python `iter_lines`, or a `fetch` `ReadableStream` reader. Buffering clients will hide the latency win.
- **Audio is base64 inside the event payload**, not the raw event bytes. Decode the `data.audio` field per event.
- **`output_format=pcm`** gives the lowest overhead for streaming playback. `wav`/`mp3` work but add per-chunk framing bytes.
- **First-chunk latency** depends on model warm-up + network distance. Use `output_format=pcm` and a streaming-friendly client to minimize what you can control.
- **JavaScript / TypeScript**: the official `smallestai` npm package predates Lightning v3.1, so call this endpoint with `fetch` as shown above.
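
The frame format described under "How it works" can also be parsed by hand when the SDK is not an option. The sketch below is a minimal, standard-library-only illustration. `iter_audio_chunks` is a hypothetical helper name, not part of the `smallestai` package, and it assumes frames are separated by a blank line exactly as described above. It ignores other SSE features such as comments, `id:` fields, and multi-line `data:` values.

```python
import base64
import json
from typing import Iterable, Iterator


def iter_audio_chunks(byte_stream: Iterable[bytes]) -> Iterator[bytes]:
    """Yield decoded audio bytes from an iterable of raw SSE byte chunks.

    Hypothetical helper: assumes blank-line-separated frames whose `data:`
    line carries either {"audio": "<base64>"} or {"done": true}.
    """
    buf = b""
    for piece in byte_stream:
        # Network reads can split a frame anywhere, so buffer until a
        # complete frame (terminated by a blank line) is available.
        # Normalizing on the whole buffer also handles a split "\r\n".
        buf = (buf + piece).replace(b"\r\n", b"\n")
        while b"\n\n" in buf:
            frame, buf = buf.split(b"\n\n", 1)
            for line in frame.decode("utf-8").split("\n"):
                if not line.startswith("data:"):
                    continue  # skip "event: audio" and any other field
                payload = json.loads(line[len("data:"):].strip())
                if payload.get("done"):
                    return  # final frame: stop consuming the stream
                if payload.get("audio"):
                    yield base64.b64decode(payload["audio"])
```

With `requests`, for instance, the byte chunks from `response.iter_content(chunk_size=None)` on a `stream=True` request could be passed in as `byte_stream`; any other source of raw response bytes should work the same way.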

Authentication

Authorization · Bearer

Header authentication of the form Bearer <token>

Request

This endpoint expects an object.
text · string · Required · Defaults to "Hey i am your a text to speech model"
The text to convert to speech.
voice_id · string · Required · Defaults to daniel
The voice identifier to use for speech generation.
sample_rate · enum · Optional · Defaults to 44100
The sample rate for the generated audio.
Allowed values:
speed · double · Optional · 0.5-2 · Defaults to 1
The speed of the generated speech.
language · enum · Optional · Defaults to en

Language code for synthesis. Influences pronunciation, number/date normalization, and phoneme selection.

  • Indian: en, hi, mr (Marathi), kn (Kannada), ta (Tamil), bn (Bengali), gu (Gujarati), te (Telugu), ml (Malayalam), pa (Punjabi), or (Odia)
  • European: es (Spanish)
  • auto — auto-detect from input text (recommended for code-switching)
output_format · enum · Optional · Defaults to pcm

Format of the returned audio. pcm is the lowest-latency option but requires a decoder to play; mp3 and wav are directly playable in browsers and most media players. The server default is pcm when the field is omitted — the API playground uses mp3 so the generated audio is directly playable.

Allowed values:
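
Because raw pcm carries no header, a saved stream is not directly playable. One way to make it playable is to wrap the bytes in a WAV container with Python's standard `wave` module. The sketch below assumes 16-bit mono samples, which this page does not state, so check the actual sample format before relying on it. `pcm_to_wav` is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM bytes in a WAV container and return the file bytes.

    Assumption (not confirmed by this page): 16-bit mono PCM, i.e.
    sample_width=2 bytes per sample and channels=1.
    """
    out = io.BytesIO()
    with wave.open(out, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)  # must match the request's sample_rate
        wav.writeframes(pcm)
    return out.getvalue()
```

For a file written at `sample_rate=24000`, calling `pcm_to_wav(open("stream.pcm", "rb").read(), 24000)` would produce bytes that can be saved with a `.wav` extension.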
pronunciation_dicts · list of strings · Optional
The IDs of the pronunciation dictionaries to use for speech generation.
session_id · string · Optional · format: "^[a-zA-Z0-9_\-.]+$" · <=128 characters

Optional client-provided session identifier for correlation. Only alphanumeric characters, hyphens, underscores, and dots are allowed. Max 128 characters. Echoed back in response headers as X-External-Session-Id.

request_id · string · Optional · format: "^[a-zA-Z0-9_\-.]+$" · <=128 characters

Optional client-provided request identifier for correlation. Only alphanumeric characters, hyphens, underscores, and dots are allowed. Max 128 characters. Echoed back in response headers as X-External-Request-Id.
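
The constraints on session_id and request_id can be checked client-side before sending, which avoids a round trip for an obviously malformed identifier. A minimal sketch that reuses the pattern and length limit given above; `is_valid_correlation_id` is a hypothetical helper, not part of any SDK.

```python
import re

# Pattern and length limit copied from the session_id / request_id fields.
_ID_PATTERN = re.compile(r"^[a-zA-Z0-9_\-.]+$")
_MAX_ID_LENGTH = 128


def is_valid_correlation_id(value: str) -> bool:
    """Return True if value is acceptable as a session_id or request_id."""
    # fullmatch rejects a trailing newline, which a plain match with "$"
    # would let through.
    return (
        len(value) <= _MAX_ID_LENGTH
        and _ID_PATTERN.fullmatch(value) is not None
    )
```

An identifier such as `call-42_a.b` passes, while an empty string, a value containing spaces, or one longer than 128 characters is rejected.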

Response

Synthesized speech retrieved successfully.

Errors

400
Bad Request Error
401
Unauthorized Error
500
Internal Server Error
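
A client can check the HTTP status before it starts reading the event stream. The sketch below maps the status codes listed above to their names and flags which ones might be worth retrying. Treating only 500 as retryable is an assumption made for illustration, not documented behaviour, and `classify_status` is a hypothetical helper.

```python
from typing import Tuple

# Status codes and names as listed in the Errors section.
ERROR_NAMES = {
    400: "Bad Request Error",
    401: "Unauthorized Error",
    500: "Internal Server Error",
}


def classify_status(status: int) -> Tuple[str, bool]:
    """Return (description, retryable) for an HTTP status code."""
    if 200 <= status < 300:
        return ("OK", False)
    name = ERROR_NAMES.get(status, "Unexpected status")
    # Assumption: 400 and 401 need a changed request or credentials, so
    # only the server-side 500 is marked as worth retrying.
    return (name, status == 500)
```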