Lightning v3.1 SSE (endpoint will be deprecated)
This endpoint will be deprecated. Use POST /waves/v1/tts/live instead and select Lightning v3.1 via the model body field (the default).

Synthesize speech and stream the audio back over Server-Sent Events. The body and parameters are identical to the sync /get_speech endpoint; the difference is that the response is a stream of base64-encoded PCM chunks instead of one binary blob.
- Use /get_speech when total latency doesn't matter and you'd rather get one buffer.
- The request body is the same as for /get_speech.
- The response has Content-Type: text/event-stream. Each chunk frame is event: audio\n followed by data: {"audio": "<base64-pcm>"}\n\n.
- Decode the audio field with base64 and feed the PCM bytes to your audio pipeline (browser MediaSource, ffmpeg pipe, raw PCM player, etc.).
- A final data: {"done": true}\n\n frame marks end of stream.

cURL
Python (pip install smallestai>=4.4.0)
JavaScript / TypeScript (using fetch + a reader)
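The frame format described above (an event: audio line, a data: line carrying base64 audio, and a final done frame) can be parsed roughly as follows. This is a minimal Python sketch using only the standard library, not the SDK's own client; the function name iter_pcm_chunks is hypothetical, and it assumes each data: payload arrives as a single JSON line.

```python
import base64
import json
from typing import Iterable, Iterator


def iter_pcm_chunks(lines: Iterable[bytes]) -> Iterator[bytes]:
    """Yield decoded PCM chunks from an iterable of raw SSE lines.

    Hypothetical helper: stops as soon as a {"done": true} frame arrives.
    """
    for raw in lines:
        line = raw.decode("utf-8").strip()
        # Blank separators and "event: audio" lines carry no payload.
        if not line.startswith("data:"):
            continue
        payload = json.loads(line[len("data:"):].strip())
        if payload.get("done"):
            return  # end-of-stream marker
        audio_b64 = payload.get("audio")
        if audio_b64:
            yield base64.b64decode(audio_b64)
```

With the requests library, the lines could come from a response opened with stream=True and read via iter_lines(), so that chunks are handled as they arrive instead of after the full body is buffered. The URL, headers, and request body are left out here because they depend on your account and the fields described below.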
- Use a streaming client: curl -N, Python iter_lines, or a fetch ReadableStream reader. Buffering clients will hide the latency win.
- Decode the data.audio field per event.
- output_format=pcm gives the lowest overhead for streaming playback. wav/mp3 work but add per-chunk framing bytes.
- Use output_format=pcm and a streaming-friendly client to minimize the latency you can control.
- The smallestai npm package predates Lightning v3.1, so call this endpoint directly with fetch.

Header authentication of the form Bearer <token>
TTS model to route the request to.
- lightning_v3.1 (default): standard Lightning v3.1 pool.
- lightning_v3.1_pro: Lightning v3.1 Pro pool with a curated voice catalog. See the Pro model card.

New integrations should use the unified /waves/v1/tts route instead of this endpoint, but the model field is supported here for backwards-compatible Pro opt-in.
Language code for synthesis. Influences pronunciation, number/date normalization, and phoneme selection.
- en, hi, mr (Marathi), kn (Kannada), ta (Tamil), bn (Bengali), gu (Gujarati), te (Telugu), ml (Malayalam), pa (Punjabi), or (Odia)
- es (Spanish)
- auto: auto-detect from input text (recommended for code-switching)

Format of the returned audio. pcm is the lowest-latency option but requires a decoder to play; mp3 and wav are directly playable in browsers and most media players. The server default is pcm when the field is omitted; the API playground uses mp3 so the generated audio is directly playable.
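Because raw pcm needs a decoder or container before most players will accept it, one option is to wrap the collected bytes in a WAV header after the stream ends. The sketch below uses Python's standard wave module. The sample rate, channel count, and sample width are not stated in this section, so they are explicit parameters here; the defaults of mono and 16-bit are assumptions, and pcm_to_wav is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int, channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM bytes in a WAV container.

    Assumption: the caller knows the stream's real sample rate and layout;
    mono 16-bit is only a guess used as the default.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

If the parameters do not match what the server actually sent, the file will still be written but will play at the wrong speed or as noise, so confirm them against the voice or model documentation first.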
Optional client-provided session identifier for correlation. Only alphanumeric characters, hyphens, underscores, and dots are allowed. Max 128 characters. Echoed back in response headers as X-External-Session-Id.
Optional client-provided request identifier for correlation. Only alphanumeric characters, hyphens, underscores, and dots are allowed. Max 128 characters. Echoed back in response headers as X-External-Request-Id.
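The character and length rules for both identifiers can be checked on the client before sending a request. This is a small sketch under two assumptions: "alphanumeric" is read as ASCII letters and digits, and an empty string is treated as invalid. The helper name is_valid_correlation_id is hypothetical.

```python
import re

# ASCII letters, digits, dot, underscore, hyphen; 1 to 128 characters.
_ID_PATTERN = re.compile(r"[A-Za-z0-9._-]{1,128}")


def is_valid_correlation_id(value: str) -> bool:
    """Return True if value satisfies the documented identifier rules."""
    return _ID_PATTERN.fullmatch(value) is not None
```

For example, is_valid_correlation_id("session-42.a_b") passes, while a value containing a space or a slash, or one longer than 128 characters, does not.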