Synthesize speech from text in a single request. Pass text and voice_id; the response is binary audio.
Pick the model with the model body parameter: default lightning_v3.1, or lightning_v3.1_pro for the Pro pool. Other request parameters are identical across models.
Use /waves/v1/tts/live when you want playback to start before the full audio is ready (long passages, latency-sensitive apps).
Use /waves/v1/tts/live (WebSocket) when text arrives incrementally (LLM token streams, live captioning).
Model selection: model body parameter
Voice cloning: voice clones (voice_*) work on lightning_v3.1, using the same parameter as catalog voices
Languages: 12 on lightning_v3.1; English + Hindi on lightning_v3.1_pro
Output formats: pcm, mp3, wav, ulaw, alaw
Pronunciation dictionaries: pronunciation_dicts
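As a rough illustration of the request described above, the following Python sketch builds (but does not send) a single-shot synthesis request. The endpoint URL, the POST method, and the JSON content type are assumptions, since this page does not state them; only the text, voice_id, and model fields and the Accept and Bearer headers come from the description. The voice ID and token are placeholders.

```python
import json
import urllib.request

# Hypothetical placeholder: the endpoint URL is not given on this page.
TTS_URL = "https://api.example.com/REPLACE_WITH_TTS_ENDPOINT"


def build_tts_request(text, voice_id, token, model="lightning_v3.1", url=TTS_URL):
    """Build a synthesis request without sending it."""
    body = {"text": text, "voice_id": voice_id, "model": model}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",  # assumption: the body is sent as a JSON POST
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "audio/wav",  # needed to receive binary audio
            "Content-Type": "application/json",
        },
    )


# Default model, then the Pro pool; "example_voice" is a placeholder ID.
req = build_tts_request("Hello there.", "example_voice", "YOUR_API_KEY")
pro_req = build_tts_request(
    "Hello there.", "example_voice", "YOUR_API_KEY", model="lightning_v3.1_pro"
)

# To send it (requires a real URL and key):
# audio_bytes = urllib.request.urlopen(req).read()
```

Only the model value differs between the two requests; the remaining fields are built the same way.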
Set the request header Accept: audio/wav. Omitting it can return an empty or unplayable response.
Voice IDs are not interchangeable between lightning_v3.1 and lightning_v3.1_pro. The API does not reject mismatched pairings, but using a Pro-only voice_id with model=lightning_v3.1 (or omitting model) can return wrong or hallucinated audio. Pair Pro voices with model=lightning_v3.1_pro and standard catalog voices with model=lightning_v3.1 (the default).
Voice clones (voice_* from add_voice) work with lightning_v3.1 only; voice cloning is not available on lightning_v3.1_pro.
Header authentication of the form Bearer <token>.
Must be audio/wav to receive binary audio. Required for proper playback.
TTS model to route the request to. Controls which model pool serves this synthesis.
lightning_v3.1 (default): standard Lightning v3.1.
lightning_v3.1_pro: Lightning v3.1 Pro pool. Improved audio quality and naturalness, with a curated voice catalog. See the Lightning v3.1 Pro model card for supported voice IDs.
Both models share the same concurrency and latency profile. Other request parameters behave identically.
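Because the API accepts mismatched voice and model pairings without an error, a client-side guard can catch them before a request is sent. The sketch below is one possible check, not part of the API: check_pairing and the pro_voice_ids set are hypothetical, and the caller is assumed to have collected the Pro voice IDs from the model card. The voice_ prefix for clones follows the naming used on this page.

```python
STANDARD_MODEL = "lightning_v3.1"
PRO_MODEL = "lightning_v3.1_pro"


def check_pairing(voice_id, model, pro_voice_ids):
    """Raise ValueError when voice_id and model should not be combined.

    pro_voice_ids is a caller-supplied set of Pro catalog voice IDs
    (hypothetical input; this page does not list them).
    """
    if model not in (STANDARD_MODEL, PRO_MODEL):
        raise ValueError(f"unknown model: {model!r}")
    is_clone = voice_id.startswith("voice_")
    is_pro_voice = voice_id in pro_voice_ids
    if model == PRO_MODEL and is_clone:
        raise ValueError("voice clones work with lightning_v3.1 only")
    if model == PRO_MODEL and not is_pro_voice:
        raise ValueError("standard catalog voices belong with lightning_v3.1")
    if model == STANDARD_MODEL and is_pro_voice:
        raise ValueError("Pro voices belong with lightning_v3.1_pro")


def pairing_ok(voice_id, model, pro_voice_ids):
    """Boolean wrapper around check_pairing."""
    try:
        check_pairing(voice_id, model, pro_voice_ids)
    except ValueError:
        return False
    return True


# Placeholder IDs for illustration only.
example_pro_ids = {"pro_voice_a"}
```

Calling pairing_ok("voice_abc123", "lightning_v3.1_pro", example_pro_ids) returns False, since clones are limited to the standard model.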
Language code for synthesis. Influences pronunciation, number/date normalization, and phoneme selection.
Each voice has its own tags.language set in the voice catalog;
query GET /waves/v1/lightning-v3.1/get_voices to see it. Pass a language
the voice was trained on. Other codes are accepted by the
API but produce English-pronounced output.
On lightning_v3.1, the full 12-language catalog applies. On
lightning_v3.1_pro, Indian voices speak en and hi (with
auto for code-switching); British and American voices speak
English only.
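The language rules above can be checked on the client before sending a request. The following is a minimal sketch under stated assumptions: a voice is represented as a dict whose tags.language entry is a list of language codes (the exact response shape of get_voices is not shown on this page), and the accent argument used for the Pro rule is a hypothetical label supplied by the caller.

```python
# Pro rule from the text: Indian voices speak en and hi (plus auto for
# code-switching); British and American voices speak English only.
PRO_LANGUAGES_BY_ACCENT = {
    "indian": {"en", "hi", "auto"},
    "british": {"en"},
    "american": {"en"},
}


def voice_languages(voice):
    """Return the language codes listed under tags.language.

    Assumes tags.language is a list of codes; adjust if the catalog
    returns a different shape.
    """
    return set(voice.get("tags", {}).get("language", []))


def language_ok(voice, language, model="lightning_v3.1", accent=None):
    """Check a language code against the rules for the chosen model."""
    if model == "lightning_v3.1_pro":
        return language in PRO_LANGUAGES_BY_ACCENT.get(accent, set())
    return language in voice_languages(voice)


# Hypothetical catalog entry for illustration.
sample_voice = {"voice_id": "example_voice", "tags": {"language": ["en", "hi"]}}
```

A check like this matters because an unsupported code is not rejected by the API; it produces English-pronounced output instead.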
Format of the returned audio. pcm is the lowest-latency option
but requires a decoder to play; mp3 and wav are directly
playable in browsers and most media players. When the field is
omitted, the server defaults to pcm. The API playground uses
mp3 so that the generated audio is directly playable.
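Since raw pcm is not directly playable, one common approach is to wrap the bytes in a WAV container on the client. The sketch below uses Python's standard wave module. The sample rate, channel count, and sample width are assumptions (24000 Hz, mono, 16-bit), because this page does not state the PCM layout; check the actual values before relying on it.

```python
import io
import wave


def pcm_to_wav(pcm_bytes, sample_rate=24000, channels=1, sample_width=2):
    """Wrap raw PCM bytes in a WAV container and return the WAV bytes.

    The defaults are assumptions, not values documented on this page.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()


# 100 frames of 16-bit mono silence stand in for a real API response.
silence = b"\x00\x00" * 100
wav_bytes = pcm_to_wav(silence)
```

The result can be written to a .wav file or handed to any player that accepts WAV data.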
The IDs of the pronunciation dictionaries to use for speech generation. Available on both lightning_v3.1 and lightning_v3.1_pro.
Optional client-provided session identifier for correlation. Only alphanumeric characters, hyphens, underscores, and dots are allowed. Max 128 characters. Echoed back in response headers as X-External-Session-Id.
Optional client-provided request identifier for correlation. Only alphanumeric characters, hyphens, underscores, and dots are allowed. Max 128 characters. Echoed back in response headers as X-External-Request-Id.
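The character and length rules for these identifiers are simple enough to validate before sending. A minimal sketch, assuming "alphanumeric" means ASCII letters and digits (the page does not say whether other scripts are accepted); is_valid_correlation_id is a hypothetical helper name.

```python
import re

# Letters, digits, hyphens, underscores, and dots; 1 to 128 characters.
# ASCII-only matching is an assumption.
_CORRELATION_ID = re.compile(r"[A-Za-z0-9._-]{1,128}")


def is_valid_correlation_id(value):
    """Return True if value fits the documented session/request ID rules."""
    return isinstance(value, str) and _CORRELATION_ID.fullmatch(value) is not None
```

For example, is_valid_correlation_id("checkout-flow_01.retry") returns True, while a value containing a space or exceeding 128 characters returns False.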
Internal session identifier (system-generated UUID).
Internal request identifier (system-generated UUID).
Echoed client-provided session_id (empty if not provided).
Echoed client-provided request_id (empty if not provided).
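To correlate a response with the identifiers sent in the request, the echoed values can be read back from the response headers. The sketch below covers only the two header names stated on this page, X-External-Session-Id and X-External-Request-Id; the header names for the internal UUIDs are not given here, so they are left out. echoed_ids is a hypothetical helper that accepts any mapping of header names to values.

```python
def echoed_ids(headers):
    """Extract the echoed client IDs from a mapping of response headers.

    Header names are matched case-insensitively; a missing header yields
    an empty string, mirroring the "empty if not provided" behaviour.
    """
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        "session_id": lowered.get("x-external-session-id", ""),
        "request_id": lowered.get("x-external-request-id", ""),
    }


# Hypothetical response headers for illustration.
sample_headers = {"X-External-Session-Id": "sess-42", "Content-Type": "audio/wav"}
ids = echoed_ids(sample_headers)
```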