Synthesize Speech

POST

https://api.smallest.ai/waves/v1/tts

Synthesize speech from text in a single request. Pass text + voice_id, get back binary audio.

Pick the model with the model body parameter: default lightning_v3.1, or lightning_v3.1_pro for the Pro pool. Other request parameters are identical across models.

When to use this

Use this for short utterances you can render before playback (notifications, prompts, batch jobs, audio file generation).
Use /waves/v1/tts/live when you want playback to start before the full audio is ready (long passages, latency-sensitive apps).
Use /waves/v1/tts/live (WebSocket) when text arrives incrementally (LLM token streams, live captioning).

Key features

44.1 kHz natural, expressive synthesis
Model selectable per request via model body parameter
Cloned voice IDs (voice_*) work on lightning_v3.1 — same param as catalog voices
12 documented languages on lightning_v3.1; English + Hindi on lightning_v3.1_pro
Output formats: pcm, mp3, wav, ulaw, alaw
Sample rates: 8 kHz – 44.1 kHz
Speed: 0.5× – 2×
Per-call pronunciation dictionaries via pronunciation_dicts
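
The limits listed above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any official SDK: `build_tts_payload` is a hypothetical helper that assembles the JSON body and rejects values outside the documented speed range, sample-rate range, and output formats. It treats the sample-rate range as continuous, which is an assumption; the server may accept only specific rates.

```python
OUTPUT_FORMATS = {"pcm", "mp3", "wav", "ulaw", "alaw"}  # from the feature list
MODELS = {"lightning_v3.1", "lightning_v3.1_pro"}


def build_tts_payload(text, voice_id, model="lightning_v3.1",
                      sample_rate=24000, speed=1.0, output_format="wav"):
    """Hypothetical helper: validate documented limits and build the JSON body."""
    if not text:
        raise ValueError("text must be non-empty")
    if model not in MODELS:
        raise ValueError(f"unknown model: {model}")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2")
    # Assumption: any rate inside the documented range passes this check;
    # the server may restrict sample_rate to a fixed set of values.
    if not 8000 <= sample_rate <= 44100:
        raise ValueError("sample_rate must be between 8000 and 44100 Hz")
    if output_format not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format}")
    return {
        "text": text,
        "voice_id": voice_id,
        "model": model,
        "sample_rate": sample_rate,
        "speed": speed,
        "output_format": output_format,
    }
```

The returned dictionary can be passed as the JSON body of the POST request, for example `requests.post(url, json=build_tts_payload("Hello", "magnus"), headers=headers)`.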

Examples

cURL — Lightning v3.1 (default)

curl -X POST "https://api.smallest.ai/waves/v1/tts" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: audio/wav" \
  -d '{
    "text": "Hello from Waves TTS.",
    "voice_id": "magnus",
    "sample_rate": 24000,
    "output_format": "wav"
  }' --output speech.wav

cURL — Lightning v3.1 Pro

curl -X POST "https://api.smallest.ai/waves/v1/tts" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: audio/wav" \
  -d '{
    "text": "Hello from the Lightning v3.1 Pro pool.",
    "voice_id": "meher",
    "model": "lightning_v3.1_pro",
    "sample_rate": 24000,
    "output_format": "wav"
  }' --output speech.wav

Common gotchas

Set Accept: audio/wav. Omitting it can return an empty or unplayable response.
Pair voice IDs with the right model. Voice catalogs differ between lightning_v3.1 and lightning_v3.1_pro. The API does not reject mismatched pairings, but using a Pro-only voice_id with model=lightning_v3.1 (or omitting model) can return wrong or hallucinated audio. Pair Pro voices with model=lightning_v3.1_pro; standard catalog voices with model=lightning_v3.1 (the default).
Cloned voices (voice_* from add_voice) work with lightning_v3.1 only; voice cloning is not available on lightning_v3.1_pro.
44.1 kHz output is supported but most playback environments are happy with 24 kHz — drop the sample rate if bandwidth matters.
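
Because the API accepts mismatched voice and model pairings without an error, a client-side guard can catch them early. The sketch below is illustrative only: the two catalogs are hypothetical and incomplete (seeded with the voice IDs used in the examples above, and assuming `meher` is a Pro voice), and `check_voice_model` is not an SDK function. A real client would load the catalogs from the model cards.

```python
# Hypothetical, incomplete catalogs for illustration; the real lists live in
# each model card. "magnus" and "meher" are taken from the examples above.
STANDARD_VOICES = {"magnus"}
PRO_VOICES = {"meher"}


def check_voice_model(voice_id, model="lightning_v3.1"):
    """Raise ValueError for pairings the gotchas above warn about."""
    if voice_id.startswith("voice_"):
        # Cloned voices are documented as lightning_v3.1 only.
        if model != "lightning_v3.1":
            raise ValueError("cloned voices work with lightning_v3.1 only")
        return
    catalog = PRO_VOICES if model == "lightning_v3.1_pro" else STANDARD_VOICES
    if voice_id not in catalog:
        raise ValueError(f"{voice_id!r} is not in the local catalog for {model}")
```

Calling `check_voice_model("meher")` without a `model` argument raises, which mirrors the case where `model` is omitted and the request silently falls back to `lightning_v3.1`.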

Authentication

Authorization: Bearer

Header authentication of the form Bearer <token>

Headers

Accept (enum, required, defaults to audio/wav)

Must be audio/wav to receive binary audio. Required for proper playback.

Allowed values: audio/wav

Request

This endpoint expects an object.

text (string, required, defaults to "Hello from Waves TTS.")

The text to convert to speech.

voice_id (string, required, defaults to magnus)

The voice identifier to use for speech generation. See the model card for available voices per model.

model (enum, optional, defaults to lightning_v3.1)

TTS model to route the request to. Controls which model pool serves this synthesis.

lightning_v3.1 (default) — standard Lightning v3.1.
lightning_v3.1_pro — Lightning v3.1 Pro pool. Improved audio quality and naturalness, with a curated voice catalog. See the Lightning v3.1 Pro model card for supported voice IDs.

Both models share the same concurrency and latency profile, and other request parameters behave identically.

Allowed values: lightning_v3.1, lightning_v3.1_pro

sample_rate (enum, optional, defaults to 44100)

The sample rate for the generated audio, in Hz. Supported rates range from 8000 to 44100.

speed (double, optional, range 0.5 to 2, defaults to 1)

The speed of the generated speech.

language (enum, optional, defaults to en)

Language code for synthesis. Influences pronunciation, number/date normalization, and phoneme selection.

Each voice has its own tags.language set in the voice catalog — query GET /waves/v1/lightning-v3.1/get_voices. Pass a language the voice was trained on; passing other codes is accepted by the API but produces English-pronounced output.

On lightning_v3.1, the full 12-language catalog applies. On lightning_v3.1_pro, Indian voices speak en and hi (with auto for code-switching); British and American voices speak English only.
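
Since an unsupported language code is accepted but yields English-pronounced output, it can help to filter the voice catalog by language before choosing a `voice_id`. The sketch below assumes the parsed catalog is a list of objects with a `tags.language` field, as described above. The `voiceId` key and the list-or-string shape of `tags.language` are assumptions about the response, and `voices_for_language` is a hypothetical helper.

```python
def voices_for_language(voices, language):
    """Return IDs of voices whose tags.language includes `language`.

    `voices` is the already-parsed catalog. The "voiceId" key is an
    assumption about the response shape, not a documented field name.
    """
    matches = []
    for voice in voices:
        langs = voice.get("tags", {}).get("language", [])
        if isinstance(langs, str):
            langs = [langs]  # tolerate a single code instead of a list
        if language in langs:
            matches.append(voice.get("voiceId"))
    return matches
```

With a catalog fetched from GET /waves/v1/lightning-v3.1/get_voices, `voices_for_language(catalog, "hi")` would list the voices that are safe to pair with `language` set to `hi`, provided the response matches the assumed shape.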

output_format (enum, optional, defaults to pcm)

Format of the returned audio. pcm is the lowest-latency option but requires a decoder to play; mp3 and wav are directly playable in browsers and most media players. The server default is pcm when the field is omitted — the API playground uses mp3 so the generated audio is directly playable.
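
Raw `pcm` output has no header, so most players cannot open it directly. One way to make it playable is to wrap the samples in a WAV container with Python's standard `wave` module. This is a sketch under stated assumptions: it treats the PCM stream as 16-bit mono, which this page does not confirm, and `sample_rate` must match the value sent in the request.

```python
import io
import wave


def pcm_to_wav(pcm_bytes, sample_rate=44100, channels=1, sample_width=2):
    """Wrap raw PCM samples in a WAV container and return the file bytes.

    Assumption: the API's pcm output is 16-bit mono (sample_width=2,
    channels=1). Adjust both if the actual stream differs.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample; 2 = 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    return buffer.getvalue()
```

Writing the result to disk, for example `open("speech.wav", "wb").write(pcm_to_wav(response.content, 24000))`, gives a file that standard media players can open.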

Allowed values: pcm, mp3, wav, ulaw, alaw

pronunciation_dicts (list of strings, optional)

The IDs of the pronunciation dictionaries to use for speech generation. Available on both lightning_v3.1 and lightning_v3.1_pro.

session_id (string, optional, format "^[a-zA-Z0-9_\-.]+$", at most 128 characters)

Optional client-provided session identifier for correlation. Only alphanumeric characters, hyphens, underscores, and dots are allowed. Max 128 characters. Echoed back in response headers as X-External-Session-Id.

request_id (string, optional, format "^[a-zA-Z0-9_\-.]+$", at most 128 characters)

Optional client-provided request identifier for correlation. Only alphanumeric characters, hyphens, underscores, and dots are allowed. Max 128 characters. Echoed back in response headers as X-External-Request-Id.
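
The format constraint on `session_id` and `request_id` can be checked locally before sending a request. The sketch below applies the documented pattern and the 128-character limit; `is_valid_correlation_id` is a hypothetical helper name.

```python
import re

# Pattern copied from the parameter description above.
ID_PATTERN = re.compile(r"^[a-zA-Z0-9_\-.]+$")


def is_valid_correlation_id(value):
    """Return True if `value` fits the documented session_id/request_id format."""
    # fullmatch avoids accepting a trailing newline, which `$` alone allows.
    return len(value) <= 128 and ID_PATTERN.fullmatch(value) is not None
```

For instance, `is_valid_correlation_id("order-42.retry_1")` passes, while an ID containing a space or exceeding 128 characters does not.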

Response headers

X-Session-Id (string)

Internal session identifier (system-generated UUID).

X-Request-Id (string)

Internal request identifier (system-generated UUID).

X-External-Session-Id (string)

Echoed client-provided session_id (empty if not provided).

X-External-Request-Id (string)

Echoed client-provided request_id (empty if not provided).
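
These four headers are useful for tying a synthesized clip back to logs on both sides. As a rough sketch, the hypothetical helper below collects them from any mapping of response headers, such as `response.headers` from the `requests` library, and turns the empty echoed values into `None`.

```python
def correlation_ids(headers):
    """Collect the documented correlation headers from a header mapping.

    Empty echoed values (sent when the client supplied no ID) become None.
    """
    return {
        "session_id": headers.get("X-Session-Id"),
        "request_id": headers.get("X-Request-Id"),
        "external_session_id": headers.get("X-External-Session-Id") or None,
        "external_request_id": headers.get("X-External-Request-Id") or None,
    }
```

A plain `dict` is case-sensitive, so the header names must match exactly; the header mapping returned by `requests` is case-insensitive and does not have this restriction.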

Response

Synthesized speech retrieved successfully.

Errors

400

Bad Request Error

401

Unauthorized Error

500

Internal Server Error
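
A client will usually want to turn these status codes into exceptions before treating the body as audio. The following is a minimal sketch: `TTSError` and `raise_for_tts_status` are hypothetical names, and because the format of the error body is not documented here, the body is attached only as truncated text.

```python
ERROR_LABELS = {
    400: "Bad Request Error",
    401: "Unauthorized Error",
    500: "Internal Server Error",
}


class TTSError(Exception):
    """Hypothetical exception carrying the HTTP status of a failed call."""

    def __init__(self, status_code, message):
        super().__init__(message)
        self.status_code = status_code


def raise_for_tts_status(status_code, body_text=""):
    """Raise TTSError for any non-2xx status; return None on success."""
    if 200 <= status_code < 300:
        return
    label = ERROR_LABELS.get(status_code, "Unexpected status")
    # The error body format is not documented, so keep it as short raw text.
    raise TTSError(status_code, f"{status_code} {label}: {body_text[:200]}")
```

With `requests`, this could be called as `raise_for_tts_status(response.status_code, response.text)` before writing `response.content` to a file.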

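import requests

url = "https://api.smallest.ai/waves/v1/tts"

payload = {
    "text": "Hello from Waves TTS.",
    "voice_id": "magnus",
    "model": "lightning_v3.1",
    "sample_rate": 44100,
    "speed": 1,
    "output_format": "mp3",
}
headers = {
    "Accept": "audio/wav",  # required by the endpoint regardless of output_format
    "Authorization": "Bearer <BearerAuth>",  # replace <BearerAuth> with your API key
    "Content-Type": "application/json",
}

response = requests.post(url, json=payload, headers=headers, timeout=30)
response.raise_for_status()  # surface 400/401/500 instead of saving an error body

# The response body is binary audio, not JSON, so write the raw bytes to a file.
with open("speech.mp3", "wb") as audio_file:
    audio_file.write(response.content)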
