Synthesize speech from text in a single request. Pass `text` + `voice_id`, get back binary audio.
Pick the model with the `model` body parameter: default `lightning_v3.1`, or `lightning_v3.1_pro` for the Pro pool. Other request parameters are identical across models.
**Language behaviour on `lightning_v3.1_pro`:** pass `language: en` for UK + American accented English, pass `language: hi` for Indian accented English + Hindi (code-switching), or omit `language` to default to `en + hi` (mixed Indian + Western English coverage). On `lightning_v3.1` the full 12-language catalog applies (see voice catalog).
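The `language` rules above mean a client should leave the key out of the body entirely when it wants the Pro default, rather than sending an empty value. A minimal sketch of that, assuming hypothetical helper names `json_escape` and `build_tts_body` (neither is part of any official SDK) and input text without newlines or control characters:

```sh
#!/bin/sh
# Hypothetical helpers for illustration; not part of an official SDK.

# Escape backslashes and double quotes so a value can sit inside a JSON string.
# Assumption: the value contains no newlines or other control characters.
json_escape() {
  printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g'
}

# Usage: build_tts_body TEXT VOICE_ID MODEL [LANGUAGE]
# When LANGUAGE is empty or missing, the "language" key is omitted,
# so lightning_v3.1_pro falls back to its documented en + hi default.
build_tts_body() {
  text=$(json_escape "$1")
  voice=$(json_escape "$2")
  model=$(json_escape "$3")
  body="{\"text\":\"$text\",\"voice_id\":\"$voice\",\"model\":\"$model\""
  if [ -n "${4:-}" ]; then
    lang=$(json_escape "$4")
    body="$body,\"language\":\"$lang\""
  fi
  printf '%s}\n' "$body"
}
```

The result can be passed straight to cURL, for example `-d "$(build_tts_body "Namaste" "meher" "lightning_v3.1_pro" "hi")"`.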
## When to use this
- **Use this** for short utterances you can render before playback (notifications, prompts, batch jobs, audio file generation).
- **Use `/waves/v1/tts/live`** when you want playback to start before the full audio is ready (long passages, latency-sensitive apps).
- **Use `/waves/v1/tts/live`** (WebSocket) when text arrives incrementally (LLM token streams, live captioning).
## Key features
- 44.1 kHz natural, expressive synthesis
- Model selectable per request via `model` body parameter
- Cloned voice IDs (`voice_*`) work on `lightning_v3.1` — same param as catalog voices
- 12 documented languages on `lightning_v3.1`. On `lightning_v3.1_pro`: `language: en` → UK + American accented English; `language: hi` → Indian accented English + Hindi; omit `language` → defaults to `en + hi`.
- Output formats: `pcm`, `mp3`, `wav`, `ulaw`, `alaw`
- Sample rates: 8 kHz – 44.1 kHz
- Speed: 0.5× – 2×
- Per-call pronunciation dictionaries via `pronunciation_dicts`
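The documented formats and ranges can be checked on the client before a request is sent. The sketch below uses a hypothetical function name, `check_tts_params`. It only tests the ranges listed above; `sample_rate` is an enum, so a value inside the range is not guaranteed to be accepted by the API.

```sh
#!/bin/sh
# Hypothetical pre-flight check; limits are taken from the feature list above.

# Usage: check_tts_params OUTPUT_FORMAT SAMPLE_RATE SPEED
# Returns 0 when all three values fall inside the documented ranges.
check_tts_params() {
  case $1 in
    pcm|mp3|wav|ulaw|alaw) ;;
    *) echo "unsupported output_format: $1" >&2; return 1 ;;
  esac

  # sample_rate must be a whole number of Hz between 8000 and 44100.
  case $2 in
    ''|*[!0-9]*) echo "sample_rate must be an integer: $2" >&2; return 1 ;;
  esac
  if [ "$2" -ge 8000 ] && [ "$2" -le 44100 ]; then
    :
  else
    echo "sample_rate out of range: $2" >&2; return 1
  fi

  # speed may be fractional, so compare with awk rather than the shell.
  if ! awk -v s="$3" 'BEGIN { exit !(s ~ /^[0-9]+(\.[0-9]+)?$/ && s + 0 >= 0.5 && s + 0 <= 2) }'; then
    echo "speed out of range: $3" >&2; return 1
  fi
  return 0
}
```

A typical call is `check_tts_params wav 24000 1 && curl ...`, so that an invalid combination fails locally instead of costing a round trip.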
## Examples
**cURL — Lightning v3.1 (default)**
```bash
curl -X POST "https://api.smallest.ai/waves/v1/tts" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: audio/wav" \
  -d '{
    "text": "Hello from Waves TTS.",
    "voice_id": "magnus",
    "sample_rate": 24000,
    "output_format": "wav"
  }' --output speech.wav
```
**cURL — Lightning v3.1 Pro (omit `language` → defaults to `en + hi`)**
```bash
curl -X POST "https://api.smallest.ai/waves/v1/tts" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: audio/wav" \
  -d '{
    "text": "Hello from the Lightning v3.1 Pro pool.",
    "voice_id": "meher",
    "model": "lightning_v3.1_pro",
    "sample_rate": 24000,
    "output_format": "wav"
  }' --output speech.wav
```
**cURL — Lightning v3.1 Pro with explicit `language: en` (UK + American accented English)**
```bash
curl -X POST "https://api.smallest.ai/waves/v1/tts" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: audio/wav" \
  -d '{
    "text": "Good morning, this is a Pro voice speaking.",
    "voice_id": "meher",
    "model": "lightning_v3.1_pro",
    "language": "en",
    "sample_rate": 24000,
    "output_format": "wav"
  }' --output speech.wav
```
**cURL — Lightning v3.1 Pro with explicit `language: hi` (Indian accented English + Hindi)**
```bash
curl -X POST "https://api.smallest.ai/waves/v1/tts" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: audio/wav" \
  -d '{
    "text": "Namaste, this is an Indian-accented Pro voice.",
    "voice_id": "meher",
    "model": "lightning_v3.1_pro",
    "language": "hi",
    "sample_rate": 24000,
    "output_format": "wav"
  }' --output speech.wav
```
## Common gotchas
- **Set `Accept: audio/wav`.** Omitting it can return an empty or unplayable response.
- **Pair voice IDs with the right model.** Voice catalogs differ between `lightning_v3.1` and `lightning_v3.1_pro`. The API does not reject mismatched pairings, but using a Pro-only `voice_id` with `model=lightning_v3.1` (or omitting `model`) can return wrong or hallucinated audio. Pair Pro voices with `model=lightning_v3.1_pro`; standard catalog voices with `model=lightning_v3.1` (the default).
- **Cloned voices** (`voice_*` from `add_voice`) work with `lightning_v3.1` only; voice cloning is not available on `lightning_v3.1_pro`.
- **44.1 kHz output** is supported but most playback environments are happy with 24 kHz — drop the sample rate if bandwidth matters.
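Because the API does not reject mismatched voice and model pairings, one of the gotchas above can be caught on the client: cloned `voice_*` IDs sent to `lightning_v3.1_pro`. The sketch below uses a hypothetical function name, `check_voice_model`. It cannot detect a Pro-only catalog voice sent to the standard model, since that would need the per-model voice lists, which are not reproduced here.

```sh
#!/bin/sh
# Hypothetical client-side guard; not part of an official SDK.

# Usage: check_voice_model VOICE_ID [MODEL]
# MODEL defaults to lightning_v3.1, matching the API default.
check_voice_model() {
  model=${2:-lightning_v3.1}
  case $model in
    lightning_v3.1|lightning_v3.1_pro) ;;
    *) echo "unknown model: $model" >&2; return 1 ;;
  esac

  # Cloned voices carry the voice_ prefix and work on lightning_v3.1 only.
  case $1 in
    voice_*)
      if [ "$model" = "lightning_v3.1_pro" ]; then
        echo "cloned voice $1 is not available on $model" >&2
        return 1
      fi
      ;;
  esac
  return 0
}
```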
## Request
This endpoint expects an object.
- **`text`** (string, required; defaults to `Hello from Waves TTS.`): The text to convert to speech.
- **`voice_id`** (string, required; defaults to `magnus`): The voice identifier to use for speech generation. See the model card for available voices per model.
- **`model`** (enum, optional; defaults to `lightning_v3.1`): TTS model to route the request to. Controls which model pool serves this synthesis. Both models share the same concurrency and latency profile, and other request parameters behave identically.
  - `lightning_v3.1` (default): standard Lightning v3.1.
  - `lightning_v3.1_pro`: Lightning v3.1 Pro pool. Improved audio quality and naturalness, with a curated voice catalog. See the Lightning v3.1 Pro model card for supported voice IDs.
- **`sample_rate`** (enum, optional; defaults to `44100`): The sample rate for the generated audio.
- **`speed`** (double, optional; range 0.5 to 2; defaults to `1`): The speed of the generated speech.
- **`language`** (enum, optional; defaults to `en`): Language code for synthesis. Influences pronunciation, number/date normalization, and phoneme selection. Each voice has its own `tags.language` set in the voice catalog, which you can query with `GET /waves/v1/lightning-v3.1/get_voices`. Pass a language the voice was trained on; other codes are accepted by the API but produce English-pronounced output. On `lightning_v3.1`, the full 12-language catalog applies. On `lightning_v3.1_pro`:
  - Pass `en` → UK + American accented English.
  - Pass `hi` → Indian accented English + Hindi (code-switching).
  - Omit `language` → defaults to `en + hi` (mixed Indian + Western English coverage).
- **`output_format`** (enum, optional; defaults to `pcm`): Format of the returned audio. `pcm` is the lowest-latency option but requires a decoder to play; `mp3` and `wav` are directly playable in browsers and most media players. The server default is `pcm` when the field is omitted; the API playground uses `mp3` so the generated audio is directly playable.
- **`pronunciation_dicts`** (list of strings, optional): The IDs of the pronunciation dictionaries to use for speech generation. Available on both `lightning_v3.1` and `lightning_v3.1_pro`.
- **`word_timestamps`** (boolean, optional; defaults to `false`): WebSocket-only feature. Accepted on this endpoint but ignored: no per-word timing information is returned in the sync HTTP or SSE response shape. To receive `status: "word_timestamp"` frames with per-word `{ id, word, start, end }` data, use the WebSocket endpoint `wss://api.smallest.ai/waves/v1/tts/live`. See Word-level timestamps.
- **`session_id`** (string, optional; format `^[a-zA-Z0-9_\-.]+$`; at most 128 characters): Optional client-provided session identifier for correlation. Only alphanumeric characters, hyphens, underscores, and dots are allowed. Echoed back in response headers as `X-External-Session-Id`.
- **`request_id`** (string, optional; format `^[a-zA-Z0-9_\-.]+$`; at most 128 characters): Optional client-provided request identifier for correlation. Only alphanumeric characters, hyphens, underscores, and dots are allowed. Echoed back in response headers as `X-External-Request-Id`.
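The `session_id` and `request_id` constraints can be enforced before sending, and the echoed headers read back from the response. The sketch below uses hypothetical names (`is_valid_correlation_id`, `tts_with_ids`, and the `headers.txt` scratch file). It assumes ASCII input and that both IDs go in the JSON body alongside the other request parameters listed above.

```sh
#!/bin/sh
# Hypothetical helpers for illustration; not part of an official SDK.

# Returns 0 when the ID is 1 to 128 characters long and contains only
# letters, digits, underscores, hyphens, and dots.
is_valid_correlation_id() {
  [ -n "$1" ] || return 1
  [ "${#1}" -le 128 ] || return 1
  case $1 in
    *[!a-zA-Z0-9_.-]*) return 1 ;;
  esac
  return 0
}

# Usage: tts_with_ids SESSION_ID REQUEST_ID OUT_FILE
# Sends one request, saves the audio, and prints the echoed correlation headers.
tts_with_ids() {
  is_valid_correlation_id "$1" || { echo "invalid session_id" >&2; return 1; }
  is_valid_correlation_id "$2" || { echo "invalid request_id" >&2; return 1; }
  # The IDs were validated above, so they are safe to interpolate into JSON.
  curl -sS -X POST "https://api.smallest.ai/waves/v1/tts" \
    -H "Authorization: Bearer $SMALLEST_API_KEY" \
    -H "Content-Type: application/json" \
    -H "Accept: audio/wav" \
    -D headers.txt \
    -d "{\"text\":\"Hello from Waves TTS.\",\"voice_id\":\"magnus\",\"output_format\":\"wav\",\"session_id\":\"$1\",\"request_id\":\"$2\"}" \
    --output "$3" || return 1
  # Header names are case-insensitive, so match without regard to case.
  grep -i '^x-external-' headers.txt
}
```

Calling `tts_with_ids checkout-42 req-0001 speech.wav` should print the `X-External-Session-Id` and `X-External-Request-Id` headers if the service echoes them as described.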
## Response
Synthesized speech retrieved successfully.