Lightning v3.1 (endpoint will be deprecated)
Use POST /waves/v1/tts and select Lightning v3.1 via the model body field (default).
Synthesize speech from text in a single request. This is the simplest way to get audio when you have the full text up front: pass text + voice_id, get back binary audio.
voice_*) work, same param as catalog voices
pcm, mp3, wav, ulaw, alaw
pronunciation_dicts
cURL
Python (pip install smallestai>=4.4.0)
JavaScript / TypeScript (using fetch)
Accept: audio/wav. Omitting it can return an empty or unplayable response.
voice_* from add_voice) work on this endpoint and support pronunciation_dicts.
pronunciation_dicts validates IDs at request time. Passing an unknown ID returns Invalid input data; create the dict first via the pronunciation-dicts endpoint and save the returned id.
Synopsis and synopsis if your text uses both casings.
The smallestai npm package predates Lightning v3.1, so call this endpoint with fetch or axios as shown above.
Header authentication of the form Bearer <token>
Must be audio/wav to receive binary audio. Required for proper playback.
TTS model to route the request to.
lightning_v3.1 (default): standard Lightning v3.1 pool.
lightning_v3.1_pro: Lightning v3.1 Pro pool with a curated voice catalog. See the Pro model card.
New integrations should use the unified /waves/v1/tts route instead of this endpoint, but the model field is supported here for backwards-compatible Pro opt-in.
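As a rough illustration of how the authentication header, the Accept header, and the model field fit together, the sketch below builds (but does not send) a request to the unified route. The host in BASE_URL is a hypothetical placeholder, since this page does not state one. The Authorization header name and the JSON content type are also assumptions. Only text, voice_id, and model come from the description above.

```python
import json
import urllib.request

# Hypothetical placeholder host: this page does not state the real one.
BASE_URL = "https://api.example.com"


def build_tts_request(token, text, voice_id, model="lightning_v3.1", base_url=BASE_URL):
    """Build, without sending, a POST /waves/v1/tts request."""
    # Body fields named in the text above: text, voice_id, model.
    body = {"text": text, "voice_id": voice_id, "model": model}
    headers = {
        # Assumption: the bearer token travels in the Authorization header.
        "Authorization": f"Bearer {token}",
        # Must be audio/wav to receive binary audio.
        "Accept": "audio/wav",
        # Assumption: the body is sent as JSON.
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        url=f"{base_url}/waves/v1/tts",
        data=json.dumps(body).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

To send it, pass the returned object to urllib.request.urlopen and write the bytes from response.read() to a file. Switching to the Pro pool would only mean passing model="lightning_v3.1_pro".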
Language code for synthesis. Influences pronunciation, number/date normalization, and phoneme selection.
en, hi, mr (Marathi), kn (Kannada), ta (Tamil), bn (Bengali), gu (Gujarati), te (Telugu), ml (Malayalam), pa (Punjabi), or (Odia)
es (Spanish)
auto: auto-detect from input text (recommended for code-switching)
Format of the returned audio. pcm is the lowest-latency option but requires a decoder to play; mp3 and wav are directly playable in browsers and most media players. The server default is pcm when the field is omitted; the API playground uses mp3 so the generated audio is directly playable.
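Because raw pcm is not directly playable, one option is to wrap the returned bytes in a WAV container on the client. The sketch below does this with Python's standard wave module. The sample rate (24000 Hz), mono channel count, and 16-bit sample width are assumptions chosen for illustration. This page does not state the PCM parameters, so check them against the actual response before relying on this.

```python
import io
import wave


def pcm_to_wav(pcm_bytes, sample_rate=24000, channels=1, sample_width=2):
    """Wrap raw PCM samples in a WAV container and return the WAV bytes.

    The defaults (24000 Hz, mono, 16-bit) are assumptions, not values
    documented on this page.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()
```

Requesting mp3 or wav instead avoids this step, at the cost of the lower latency that pcm offers.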
Optional client-provided session identifier for correlation. Only alphanumeric characters, hyphens, underscores, and dots are allowed. Max 128 characters. Echoed back in response headers as X-External-Session-Id.
Optional client-provided request identifier for correlation. Only alphanumeric characters, hyphens, underscores, and dots are allowed. Max 128 characters. Echoed back in response headers as X-External-Request-Id.
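The character and length rules for session_id and request_id can be checked on the client before sending. This is a minimal sketch, assuming "alphanumeric" means ASCII letters and digits and that an empty string is not a valid identifier. The function name is hypothetical.

```python
import re

# Letters, digits, dot, underscore, hyphen; 1 to 128 characters.
# Assumption: "alphanumeric" is ASCII-only and empty IDs are rejected.
_CORRELATION_ID = re.compile(r"[A-Za-z0-9._-]{1,128}")


def is_valid_correlation_id(value):
    """Return True if value satisfies the session_id / request_id rules."""
    return isinstance(value, str) and _CORRELATION_ID.fullmatch(value) is not None
```

For example, is_valid_correlation_id("order-42_retry.1") passes, while a value containing a space or longer than 128 characters does not.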
Internal session identifier (system-generated UUID).
Internal request identifier (system-generated UUID).
Echoed client-provided session_id (empty if not provided).
Echoed client-provided request_id (empty if not provided).
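To confirm that a response belongs to the request you sent, the echoed values can be compared against what the client supplied. The sketch below assumes the response headers are available as a plain mapping. It uses only the X-External-Session-Id and X-External-Request-Id header names given above, and the function name is hypothetical.

```python
def correlation_matches(response_headers, session_id=None, request_id=None):
    """Check that the echoed IDs equal the ones the client sent.

    An ID that was not sent is expected to come back empty.
    """
    # HTTP header names are case-insensitive, so normalize before lookup.
    lowered = {name.lower(): value for name, value in response_headers.items()}
    echoed_session = lowered.get("x-external-session-id", "")
    echoed_request = lowered.get("x-external-request-id", "")
    return echoed_session == (session_id or "") and echoed_request == (request_id or "")
```

A mismatch here usually points to a mix-up in client-side bookkeeping, for example responses being matched to the wrong pending request.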