Synthesize Speech

POST https://api.smallest.ai/waves/v1/tts
import requests

url = "https://api.smallest.ai/waves/v1/tts"

payload = {
    "text": "Hello from Waves TTS.",
    "voice_id": "magnus",
    "model": "lightning_v3.1",
    "sample_rate": 44100,
    "speed": 1,
    "output_format": "mp3",
}
headers = {
    "Accept": "audio/wav",
    "Authorization": "Bearer <BearerAuth>",
    "Content-Type": "application/json",
}

response = requests.post(url, json=payload, headers=headers)
response.raise_for_status()  # fail loudly on 400/401/500 instead of saving an error body

# The endpoint returns binary audio, not JSON, so write the raw bytes to disk.
with open("speech.mp3", "wb") as f:
    f.write(response.content)

Synthesize speech from text in a single request. Pass text + voice_id, get back binary audio.

Pick the model with the model body parameter: default lightning_v3.1, or lightning_v3.1_pro for the Pro pool. Other request parameters are identical across models.

When to use this

  • Use this for short utterances you can render before playback (notifications, prompts, batch jobs, audio file generation).
  • Use /waves/v1/tts/live (Stream Speech, SSE) when you want playback to start before the full audio is ready (long passages, latency-sensitive apps).
  • Use /waves/v1/tts/live (WebSocket) when text arrives incrementally (LLM token streams, live captioning).

Key features

  • 44.1 kHz natural, expressive synthesis
  • Model selectable per request via model body parameter
  • Cloned voice IDs (voice_*) work on lightning_v3.1 — same param as catalog voices
  • 12 documented languages on lightning_v3.1; English + Hindi on lightning_v3.1_pro
  • Output formats: pcm, mp3, wav, ulaw, alaw
  • Sample rates: 8 kHz – 44.1 kHz
  • Speed: 0.5× – 2×
  • Per-call pronunciation dictionaries via pronunciation_dicts
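
The limits listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the Waves API: `validate_tts_payload` is a hypothetical helper, and treating the 8 kHz to 44.1 kHz span as a plain numeric range is an assumption, since `sample_rate` is an enum whose discrete values are not reproduced here.

```python
ALLOWED_MODELS = {"lightning_v3.1", "lightning_v3.1_pro"}
ALLOWED_FORMATS = {"pcm", "mp3", "wav", "ulaw", "alaw"}


def validate_tts_payload(payload: dict) -> list:
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    problems = []
    for field in ("text", "voice_id"):
        if not payload.get(field):
            problems.append(f"{field} is required")
    model = payload.get("model", "lightning_v3.1")
    if model not in ALLOWED_MODELS:
        problems.append(f"unknown model: {model}")
    fmt = payload.get("output_format", "pcm")
    if fmt not in ALLOWED_FORMATS:
        problems.append(f"unsupported output_format: {fmt}")
    speed = payload.get("speed", 1)
    if not 0.5 <= speed <= 2:
        problems.append(f"speed out of range (0.5 to 2): {speed}")
    # Assumption: a range check only; the real allowed sample rates are discrete.
    rate = payload.get("sample_rate", 44100)
    if not 8000 <= rate <= 44100:
        problems.append(f"sample_rate out of range (8000 to 44100): {rate}")
    return problems


print(validate_tts_payload({"text": "Hi", "voice_id": "magnus", "speed": 3}))
# prints ['speed out of range (0.5 to 2): 3']
```

Returning a list instead of raising on the first problem lets a caller report every issue at once; the server remains the authority on what it accepts.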

Examples

cURL — Lightning v3.1 (default)

curl -X POST "https://api.smallest.ai/waves/v1/tts" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: audio/wav" \
  -d '{
    "text": "Hello from Waves TTS.",
    "voice_id": "magnus",
    "sample_rate": 24000,
    "output_format": "wav"
  }' --output speech.wav

cURL — Lightning v3.1 Pro

curl -X POST "https://api.smallest.ai/waves/v1/tts" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: audio/wav" \
  -d '{
    "text": "Hello from the Lightning v3.1 Pro pool.",
    "voice_id": "meher",
    "model": "lightning_v3.1_pro",
    "sample_rate": 24000,
    "output_format": "wav"
  }' --output speech.wav

Common gotchas

  • Set Accept: audio/wav. Omitting it can return an empty or unplayable response.
  • Pair voice IDs with the right model. Voice catalogs differ between lightning_v3.1 and lightning_v3.1_pro. The API does not reject mismatched pairings, but using a Pro-only voice_id with model=lightning_v3.1 (or omitting model) can return wrong or hallucinated audio. Pair Pro voices with model=lightning_v3.1_pro; standard catalog voices with model=lightning_v3.1 (the default).
  • Cloned voices (voice_* from add_voice) work with lightning_v3.1 only; voice cloning is not available on lightning_v3.1_pro.
  • 44.1 kHz output is supported but most playback environments are happy with 24 kHz — drop the sample rate if bandwidth matters.
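
Because the API does not reject mismatched voice and model pairings, a client-side guard can catch them early. The sketch below is illustrative only: `pairing_warning` is a hypothetical helper, and the two catalog sets are placeholders seeded from the example requests on this page (treating `meher` as Pro-only is an assumption). The real catalogs are on the model cards.

```python
# Placeholder catalogs for illustration only; the real lists live on the model cards.
STANDARD_VOICES = {"magnus"}
PRO_VOICES = {"meher"}


def pairing_warning(voice_id: str, model: str = "lightning_v3.1"):
    """Return a warning string for a risky voice/model pairing, or None."""
    if voice_id.startswith("voice_"):
        # Cloned voices (voice_*) work with lightning_v3.1 only.
        if model != "lightning_v3.1":
            return f"cloned voice {voice_id} is not available on {model}"
        return None
    if model == "lightning_v3.1" and voice_id in PRO_VOICES:
        return f"{voice_id} is a Pro voice; set model=lightning_v3.1_pro"
    if model == "lightning_v3.1_pro" and voice_id in STANDARD_VOICES:
        return f"{voice_id} is a standard voice; use model=lightning_v3.1"
    return None


print(pairing_warning("meher"))  # Pro voice sent to the default model
```

Voices missing from both placeholder sets pass through without a warning, so the guard only flags pairings it knows to be wrong.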

Authentication

Authorization (Bearer)

Header authentication of the form Bearer <token>

Headers

Accept (enum, required, defaults to audio/wav)

Must be audio/wav to receive binary audio. Required for proper playback.

Allowed values: audio/wav

Request

This endpoint expects an object.
text (string, required, defaults to "Hello from Waves TTS.")
The text to convert to speech.

voice_id (string, required, defaults to "magnus")
The voice identifier to use for speech generation. See the model card for available voices per model.

model (enum, optional, defaults to lightning_v3.1)

TTS model to route the request to. Controls which model pool serves this synthesis.

  • lightning_v3.1 (default) — standard Lightning v3.1.
  • lightning_v3.1_pro — Lightning v3.1 Pro pool. Improved audio quality and naturalness, with a curated voice catalog. See the Lightning v3.1 Pro model card for supported voice IDs.

Both models share the same concurrency and latency profile, and other request parameters behave identically.

Allowed values: lightning_v3.1, lightning_v3.1_pro

sample_rate (enum, optional, defaults to 44100)
The sample rate for the generated audio, in Hz. Supported rates range from 8 kHz to 44.1 kHz.

speed (double, optional, 0.5 to 2, defaults to 1)
The speed of the generated speech.

language (enum, optional, defaults to en)

Language code for synthesis. Influences pronunciation, number/date normalization, and phoneme selection.

Each voice has its own tags.language set in the voice catalog — query GET /waves/v1/lightning-v3.1/get_voices. Pass a language the voice was trained on; passing other codes is accepted by the API but produces English-pronounced output.

On lightning_v3.1, the full 12-language catalog applies. On lightning_v3.1_pro, Indian voices speak en and hi (with auto for code-switching); British and American voices speak English only.
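
Since an unsupported language code is accepted but rendered with English pronunciation, it can help to check a voice's catalog entry before choosing `language`. The sketch below is a guess at how that check might look: the entry shape (a `tags` object holding a `language` list) is assumed from the `tags.language` wording above and is not a documented response schema, and `voice_supports_language` is a hypothetical helper.

```python
def voice_supports_language(voice: dict, language: str) -> bool:
    """Check a catalog entry's tags.language for a language code (shape assumed)."""
    languages = voice.get("tags", {}).get("language", [])
    # Accept either a single code or a list of codes, since the shape is assumed.
    if isinstance(languages, str):
        languages = [languages]
    return language in languages


# Made-up catalog entry for illustration; not a real voice.
sample_voice = {"voice_id": "example_voice", "tags": {"language": ["en", "hi"]}}

print(voice_supports_language(sample_voice, "hi"))  # prints True
print(voice_supports_language(sample_voice, "fr"))  # prints False
```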

output_format (enum, optional, defaults to pcm)

Format of the returned audio. pcm is the lowest-latency option but requires a decoder to play; mp3 and wav are directly playable in browsers and most media players. The server default is pcm when the field is omitted — the API playground uses mp3 so the generated audio is directly playable.

Allowed values: pcm, mp3, wav, ulaw, alaw

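
When `output_format` is left at `pcm`, the bytes need a container before most players will open them. One way to do that is to wrap them in a WAV header with Python's standard `wave` module, as sketched below. The sample layout (16-bit signed, little-endian, mono) is an assumption, because this page does not state the PCM encoding; verify it against real output before relying on it. `pcm_to_wav` is a hypothetical helper name.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 44100,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM bytes in a WAV container (layout defaults are assumptions)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample; 2 means 16-bit
        wav_file.setframerate(sample_rate)   # must match the requested sample_rate
        wav_file.writeframes(pcm)
    return buffer.getvalue()


# 100 frames of silence stand in for real audio returned by the endpoint.
wav_bytes = pcm_to_wav(b"\x00\x00" * 100, sample_rate=24000)
print(wav_bytes[:4])  # prints b'RIFF'
```

Pass the same `sample_rate` that was sent in the request; a mismatch plays the audio at the wrong pitch and speed.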
pronunciation_dicts (list of strings, optional)

The IDs of the pronunciation dictionaries to use for speech generation. Available on both lightning_v3.1 and lightning_v3.1_pro.

session_id (string, optional, format: "^[a-zA-Z0-9_\-.]+$", max 128 characters)

Optional client-provided session identifier for correlation. Only alphanumeric characters, hyphens, underscores, and dots are allowed. Max 128 characters. Echoed back in response headers as X-External-Session-Id.

request_id (string, optional, format: "^[a-zA-Z0-9_\-.]+$", max 128 characters)

Optional client-provided request identifier for correlation. Only alphanumeric characters, hyphens, underscores, and dots are allowed. Max 128 characters. Echoed back in response headers as X-External-Request-Id.
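
The format and length rules for `session_id` and `request_id` can be enforced before sending. This is a small sketch using the pattern quoted above; `is_valid_correlation_id` is a hypothetical helper name, not part of any client library.

```python
import re

# Pattern and length limit as documented for session_id and request_id.
ID_PATTERN = re.compile(r"^[a-zA-Z0-9_\-.]+$")
MAX_ID_LENGTH = 128


def is_valid_correlation_id(value: str) -> bool:
    """True if the value uses only allowed characters and fits the length limit."""
    return len(value) <= MAX_ID_LENGTH and ID_PATTERN.fullmatch(value) is not None


print(is_valid_correlation_id("checkout-flow_01.retry"))  # prints True
print(is_valid_correlation_id("has space"))               # prints False
```

`fullmatch` is used instead of `match` so that a trailing newline cannot slip past the `$` anchor.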

Response headers

X-Session-Id (string)

Internal session identifier (system-generated UUID).

X-Request-Id (string)

Internal request identifier (system-generated UUID).

X-External-Session-Id (string)

Echoed client-provided session_id (empty if not provided).

X-External-Request-Id (string)

Echoed client-provided request_id (empty if not provided).
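
For logging, the four correlation headers can be collected into one record. The sketch below works on any mapping of header names to values; `correlation_ids` and the key names in the result are illustrative choices, and mapping empty values to `None` is a convenience, not API behavior.

```python
CORRELATION_HEADERS = {
    "session_id": "X-Session-Id",
    "request_id": "X-Request-Id",
    "external_session_id": "X-External-Session-Id",
    "external_request_id": "X-External-Request-Id",
}


def correlation_ids(headers) -> dict:
    """Collect correlation headers; missing or empty values become None."""
    # Header names are case-insensitive, so compare in lowercase.
    lowered = {name.lower(): value for name, value in headers.items()}
    return {
        key: lowered.get(header.lower()) or None
        for key, header in CORRELATION_HEADERS.items()
    }


# Made-up header values for illustration.
example = {"x-session-id": "1111", "X-Request-Id": "2222", "X-External-Session-Id": ""}
print(correlation_ids(example))
```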

Response

Synthesized speech retrieved successfully.

Errors

400: Bad Request Error
401: Unauthorized Error
500: Internal Server Error
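
A caller will usually want to turn these status codes into actionable messages. The following is a rough sketch using only the Python standard library. The hint texts are suggestions and not wording from the API, `describe_error` and `synthesize` are hypothetical names, and the error response body format is not documented here, so it is not parsed.

```python
import json
import urllib.error
import urllib.request

TTS_URL = "https://api.smallest.ai/waves/v1/tts"

# Suggested troubleshooting hints; the wording is not taken from the API.
ERROR_HINTS = {
    400: "Bad Request: check the request body fields and their values",
    401: "Unauthorized: check the Bearer token in the Authorization header",
    500: "Internal Server Error: the failure is on the server side",
}


def describe_error(status: int) -> str:
    """Map a documented status code to a hint, with a fallback for others."""
    return ERROR_HINTS.get(status, f"Unexpected HTTP status {status}")


def synthesize(api_key: str, payload: dict, timeout: float = 30.0) -> bytes:
    """POST the payload and return audio bytes; raise RuntimeError on HTTP errors."""
    request = urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "audio/wav",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        raise RuntimeError(describe_error(err.code)) from err
```

A call such as `synthesize(key, {"text": "Hello", "voice_id": "magnus", "output_format": "wav"})` would return bytes ready to write to a file, assuming a valid key.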