Synthesize Speech

Synthesize speech from text in a single request. Pass `text` + `voice_id`, get back binary audio.

Pick the model with the `model` body parameter: default `lightning_v3.1`, or `lightning_v3.1_pro` for the Pro pool. Other request parameters are identical across models.

**Language behaviour on `lightning_v3.1_pro`:** pass `language: en` for UK + American accented English, pass `language: hi` for Indian accented English + Hindi (code-switching), or omit `language` to default to `en + hi` (mixed Indian + Western English coverage). On `lightning_v3.1` the full 12-language catalog applies (see voice catalog).

## When to use this

- **Use this** for short utterances you can render before playback (notifications, prompts, batch jobs, audio file generation).
- **Use `/waves/v1/tts/live`** when you want playback to start before the full audio is ready (long passages, latency-sensitive apps).
- **Use `/waves/v1/tts/live`** (WebSocket) when text arrives incrementally (LLM token streams, live captioning).

## Key features

- 44.1 kHz natural, expressive synthesis
- Model selectable per request via the `model` body parameter
- Cloned voice IDs (`voice_*`) work on `lightning_v3.1` and use the same parameter as catalog voices
- 12 documented languages on `lightning_v3.1`. On `lightning_v3.1_pro`: `language: en` → UK + American accented English; `language: hi` → Indian accented English + Hindi; omit `language` → defaults to `en + hi`.
- Output formats: `pcm`, `mp3`, `wav`, `ulaw`, `alaw`
- Sample rates: 8 kHz – 44.1 kHz
- Speed: 0.5× – 2×
- Per-call pronunciation dictionaries via `pronunciation_dicts`

## Examples

**cURL: Lightning v3.1 (default)**

```bash
curl -X POST "https://api.smallest.ai/waves/v1/tts" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: audio/wav" \
  -d '{
    "text": "Hello from Waves TTS.",
    "voice_id": "magnus",
    "sample_rate": 24000,
    "output_format": "wav"
  }' --output speech.wav
```

**cURL: Lightning v3.1 Pro (omit `language` → defaults to `en + hi`)**

```bash
curl -X POST "https://api.smallest.ai/waves/v1/tts" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: audio/wav" \
  -d '{
    "text": "Hello from the Lightning v3.1 Pro pool.",
    "voice_id": "meher",
    "model": "lightning_v3.1_pro",
    "sample_rate": 24000,
    "output_format": "wav"
  }' --output speech.wav
```

**cURL: Lightning v3.1 Pro with explicit `language: en` (UK + American accented English)**

```bash
curl -X POST "https://api.smallest.ai/waves/v1/tts" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: audio/wav" \
  -d '{
    "text": "Good morning, this is a Pro voice speaking.",
    "voice_id": "meher",
    "model": "lightning_v3.1_pro",
    "language": "en",
    "sample_rate": 24000,
    "output_format": "wav"
  }' --output speech.wav
```

**cURL: Lightning v3.1 Pro with explicit `language: hi` (Indian accented English + Hindi)**

```bash
curl -X POST "https://api.smallest.ai/waves/v1/tts" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: audio/wav" \
  -d '{
    "text": "Namaste, this is an Indian-accented Pro voice.",
    "voice_id": "meher",
    "model": "lightning_v3.1_pro",
    "language": "hi",
    "sample_rate": 24000,
    "output_format": "wav"
  }' --output speech.wav
```

## Common gotchas

- **Set `Accept: audio/wav`.** Omitting it can return an empty or unplayable response.
- **Pair voice IDs with the right model.** Voice catalogs differ between `lightning_v3.1` and `lightning_v3.1_pro`. The API does not reject mismatched pairings, but using a Pro-only `voice_id` with `model=lightning_v3.1` (or omitting `model`) can return wrong or hallucinated audio. Pair Pro voices with `model=lightning_v3.1_pro` and standard catalog voices with `model=lightning_v3.1` (the default).
- **Cloned voices** (`voice_*` from `add_voice`) work with `lightning_v3.1` only; voice cloning is not available on `lightning_v3.1_pro`.
- **44.1 kHz output** is supported, but most playback environments are happy with 24 kHz; drop the sample rate if bandwidth matters.
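For callers who prefer a script over cURL, the requests above can be sketched in Python using only the standard library. This is a minimal sketch rather than an official client: the helper names `build_tts_request` and `synthesize_to_file` are hypothetical, while the URL, headers, and body fields are copied from the cURL examples.

```python
import json
import urllib.request

TTS_URL = "https://api.smallest.ai/waves/v1/tts"  # endpoint from the cURL examples


def build_tts_request(text, voice_id, api_key, model=None, language=None,
                      sample_rate=24000, output_format="wav"):
    """Build (but do not send) a POST request mirroring the cURL examples."""
    body = {
        "text": text,
        "voice_id": voice_id,
        "sample_rate": sample_rate,
        "output_format": output_format,
    }
    # Optional fields are left out entirely so the server applies its defaults.
    if model is not None:
        body["model"] = model
    if language is not None:
        body["language"] = language
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "audio/wav",  # see the first gotcha above
        },
        method="POST",
    )


def synthesize_to_file(request, path):
    """Send the request and write the binary audio response to `path`."""
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
```

A typical call would be `synthesize_to_file(build_tts_request("Hello from Waves TTS.", "magnus", api_key), "speech.wav")`, with `api_key` read from the `SMALLEST_API_KEY` environment variable as in the cURL examples.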

Authentication

Authorization: Bearer

Header authentication of the form Bearer <token>

Headers

Accept (enum, required; defaults to audio/wav)

Must be audio/wav to receive binary audio. Required for proper playback.

Allowed values: audio/wav

Request

This endpoint expects an object.
text (string, required; defaults to "Hello from Waves TTS.")
The text to convert to speech.
voice_id (string, required; defaults to magnus)
The voice identifier to use for speech generation. See the model card for available voices per model.
model (enum, optional; defaults to lightning_v3.1)

TTS model to route the request to. Controls which model pool serves this synthesis.

  • lightning_v3.1 (default) — standard Lightning v3.1.
  • lightning_v3.1_pro — Lightning v3.1 Pro pool. Improved audio quality and naturalness, with a curated voice catalog. See the Lightning v3.1 Pro model card for supported voice IDs.

Both models share the same concurrency and latency profile. Other request parameters behave identically.
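Because the API accepts mismatched voice and model pairings without an error, a client-side guard can catch them before the request is sent. The sketch below is one possible check, not part of the API: the function name is hypothetical, and the set of Pro-only voice IDs is an input the caller must fill from the Lightning v3.1 Pro model card.

```python
STANDARD_MODEL = "lightning_v3.1"
PRO_MODEL = "lightning_v3.1_pro"


def check_voice_model_pairing(voice_id, model, pro_only_voice_ids):
    """Return a warning string for a risky pairing, or None if it looks fine.

    `pro_only_voice_ids` is a caller-supplied set (an assumption of this
    sketch); the documentation does not list the Pro catalog here.
    """
    effective_model = model or STANDARD_MODEL  # an omitted model means the default
    # Cloned voices (voice_* prefix) are documented as lightning_v3.1 only.
    if voice_id.startswith("voice_") and effective_model == PRO_MODEL:
        return "cloned voices are only available on lightning_v3.1"
    # Pro-only voices on the standard model can yield wrong or hallucinated audio.
    if voice_id in pro_only_voice_ids and effective_model != PRO_MODEL:
        return "Pro-only voice used without model=lightning_v3.1_pro"
    return None
```

Returning a message instead of raising leaves it to the caller to decide whether a mismatch should block the request or only be logged.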

sample_rate (enum, optional; defaults to 44100)
The sample rate for the generated audio.
speed (double, optional; range 0.5-2; defaults to 1)
The speed of the generated speech.
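A small pre-flight check can catch out-of-range values before a request is sent. The sketch below assumes the documented 0.5 to 2 speed range and the 8 kHz to 44.1 kHz sample-rate span from the feature list. Since `sample_rate` is an enum whose exact values are not listed here, the range check is only a coarse filter, and the function name is hypothetical.

```python
def validate_tuning(speed=1.0, sample_rate=44100):
    """Return a list of problems with the given speed and sample rate."""
    problems = []
    # Documented speed range is 0.5 to 2 (inclusive is assumed).
    if not 0.5 <= speed <= 2.0:
        problems.append(f"speed {speed} is outside 0.5-2")
    # Coarse bound only: the server may accept just specific rates in this span.
    if not 8000 <= sample_rate <= 44100:
        problems.append(f"sample_rate {sample_rate} is outside 8000-44100")
    return problems
```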
language (enum, optional; defaults to en)

Language code for synthesis. Influences pronunciation, number/date normalization, and phoneme selection.

Each voice has its own tags.language set in the voice catalog — query GET /waves/v1/lightning-v3.1/get_voices. Pass a language the voice was trained on; passing other codes is accepted by the API but produces English-pronounced output.

On lightning_v3.1, the full 12-language catalog applies.

On lightning_v3.1_pro:

  • Pass en → UK + American accented English.
  • Pass hi → Indian accented English + Hindi (code-switching).
  • Omit language → defaults to en + hi (mixed Indian + Western English coverage).
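The three `lightning_v3.1_pro` cases above can be captured in a small helper that decides whether to include the `language` field at all. This is an illustrative sketch with hypothetical names. Rejecting other codes is a client-side choice made here, since the API itself is described as accepting them.

```python
# Coverage descriptions copied from the lightning_v3.1_pro list above.
PRO_LANGUAGE_COVERAGE = {
    "en": "UK + American accented English",
    "hi": "Indian accented English + Hindi (code-switching)",
    None: "en + hi (mixed Indian + Western English coverage)",
}


def pro_language_field(language=None):
    """Return the body fragment to merge into a lightning_v3.1_pro request."""
    if language not in PRO_LANGUAGE_COVERAGE:
        # Client-side strictness (an assumption of this sketch, not API behaviour).
        raise ValueError(f"unsupported Pro language: {language!r}")
    # Omitting the key entirely is what triggers the en + hi default.
    return {} if language is None else {"language": language}
```

Merging the result with `body.update(pro_language_field("hi"))` keeps the omit-to-default case distinct from sending an explicit value.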
output_format (enum, optional; defaults to pcm)

Format of the returned audio. pcm is the lowest-latency option but requires a decoder to play; mp3 and wav are directly playable in browsers and most media players. The server default is pcm when the field is omitted — the API playground uses mp3 so the generated audio is directly playable.
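If raw `pcm` is requested, one way to make it playable is to wrap the bytes in a WAV container with Python's standard `wave` module. The sketch below assumes 16-bit signed little-endian mono samples, which this page does not state; confirm the sample layout before relying on it. The sample rate must match the `sample_rate` sent in the request.

```python
import io
import wave


def pcm_to_wav_bytes(pcm, sample_rate=44100, channels=1, sample_width=2):
    """Wrap raw PCM bytes in a WAV header and return the WAV file as bytes.

    Defaults assume mono 16-bit samples (an assumption, not documented here).
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()
```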

pronunciation_dicts (list of strings, optional)

The IDs of the pronunciation dictionaries to use for speech generation. Available on both lightning_v3.1 and lightning_v3.1_pro.

word_timestamps (boolean, optional; defaults to false)

WebSocket-only feature. Accepted on this endpoint but ignored — no per-word timing information is returned in the sync HTTP or SSE response shape. To receive status: "word_timestamp" frames with per-word { id, word, start, end } data, use the WebSocket endpoint wss://api.smallest.ai/waves/v1/tts/live. See Word-level timestamps.

session_id (string, optional; format: "^[a-zA-Z0-9_\-.]+$"; <=128 characters)

Optional client-provided session identifier for correlation. Only alphanumeric characters, hyphens, underscores, and dots are allowed. Max 128 characters. Echoed back in response headers as X-External-Session-Id.

request_id (string, optional; format: "^[a-zA-Z0-9_\-.]+$"; <=128 characters)

Optional client-provided request identifier for correlation. Only alphanumeric characters, hyphens, underscores, and dots are allowed. Max 128 characters. Echoed back in response headers as X-External-Request-Id.
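The format rule shared by `session_id` and `request_id` can be checked locally before sending. A minimal sketch using the documented pattern and length limit follows; the function name is hypothetical.

```python
import re

# Pattern and limit as documented for session_id and request_id.
ID_PATTERN = re.compile(r"^[a-zA-Z0-9_\-.]+$")
MAX_ID_LENGTH = 128


def is_valid_correlation_id(value):
    """Return True if `value` satisfies the documented format and length."""
    return len(value) <= MAX_ID_LENGTH and ID_PATTERN.fullmatch(value) is not None
```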

Response headers

X-Session-Id (string)

Internal session identifier (system-generated UUID).

X-Request-Id (string)

Internal request identifier (system-generated UUID).

X-External-Session-Id (string)

Echoed client-provided session_id (empty if not provided).

X-External-Request-Id (string)

Echoed client-provided request_id (empty if not provided).
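For logging and support tickets, it can help to collect the four correlation headers into one record. The helper below is a hypothetical sketch that accepts any mapping with a `.get()` method, such as the `headers` attribute of a `urllib.request.urlopen()` response, and treats empty echoed values as absent.

```python
def correlation_ids(headers):
    """Collect the documented correlation headers into a plain dict.

    Empty or missing values become None, matching the "empty if not
    provided" note for the echoed external IDs.
    """
    def clean(name):
        return headers.get(name) or None

    return {
        "session_id": clean("X-Session-Id"),
        "request_id": clean("X-Request-Id"),
        "external_session_id": clean("X-External-Session-Id"),
        "external_request_id": clean("X-External-Request-Id"),
    }
```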

Response

Synthesized speech retrieved successfully.

Errors

400: Bad Request Error
401: Unauthorized Error
500: Internal Server Error
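The three documented statuses can be turned into readable failures on the client. The sketch below uses the standard library's `urllib`; the helper names are hypothetical, and treating only 5xx responses as retryable is an assumption of this sketch, not documented API guidance.

```python
import urllib.error
import urllib.request

# Status codes and labels as listed in the Errors section.
ERROR_MEANINGS = {
    400: "Bad Request Error",
    401: "Unauthorized Error",
    500: "Internal Server Error",
}


def describe_status(status):
    """Map an HTTP status code to the documented label, if there is one."""
    return ERROR_MEANINGS.get(status, f"Unexpected status {status}")


def is_retryable(status):
    """Assumed policy: retry server-side failures, never client errors."""
    return status >= 500


def send(request):
    """Send a prepared urllib request and return the audio bytes."""
    try:
        with urllib.request.urlopen(request) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        raise RuntimeError(
            f"TTS request failed: {err.code} {describe_status(err.code)}"
        ) from err
```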