<Warning>**Endpoint scheduled for retirement.** This URL will stop accepting requests **60 days from the Lightning v3.1 Pro launch (2026-05-15)** — i.e. on **2026-07-14**. The Lightning v3.1 model itself is current and stays. Migrate to [`POST /waves/v1/tts`](/waves/api-reference/api-reference/text-to-speech/synthesize-speech) and select Lightning v3.1 via the `model` body field (default).</Warning>
Synthesize speech from text in a single request. The simplest way to get audio when you have the full text up front — pass `text` + `voice_id`, get back binary audio.
## When to use this
- **Use this** for short utterances you can render before playback (notifications, prompts, batch jobs, audio file generation).
- **Use the SSE streaming endpoint** when you want playback to start before the full audio is ready (long passages, latency-sensitive apps).
- **Use the WebSocket endpoint** when text arrives incrementally (LLM token streams, live captioning).
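
The three cases above can be read as a small decision rule. The sketch below is illustrative only: `choose_endpoint` and its return labels are hypothetical names, not part of any SDK.

```python
def choose_endpoint(text_arrives_incrementally: bool, start_playback_early: bool) -> str:
    """Return which transport fits, following the guidance in the list above."""
    if text_arrives_incrementally:
        # LLM token streams, live captioning
        return "websocket"
    if start_playback_early:
        # Long passages, latency-sensitive apps
        return "sse"
    # Short utterances that can be fully rendered before playback
    return "http"
```

For a notification rendered ahead of time, `choose_endpoint(False, False)` selects the single-request endpoint documented on this page.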
## Key features
- Natural, expressive synthesis at up to 44.1 kHz
- Cloned voice IDs (`voice_*`) work through the same `voice_id` parameter as catalog voices
- 12 documented languages — see the model card for the full list
- Output formats: `pcm`, `mp3`, `wav`, `ulaw`, `alaw`
- Sample rates: 8 kHz – 44.1 kHz
- Speed: 0.5× – 2×
- Per-call pronunciation dictionaries via `pronunciation_dicts`
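
The limits listed above can be checked client-side before a request is sent. This is a minimal sketch, not the server's validation: `check_tts_options` is a hypothetical helper, and it only checks that the sample rate falls inside the documented 8 kHz to 44.1 kHz range, since the exact set of accepted rates is not listed here.

```python
OUTPUT_FORMATS = {"pcm", "mp3", "wav", "ulaw", "alaw"}
MIN_SAMPLE_RATE, MAX_SAMPLE_RATE = 8_000, 44_100
MIN_SPEED, MAX_SPEED = 0.5, 2.0


def check_tts_options(sample_rate: int, speed: float, output_format: str) -> list[str]:
    """Return a list of problems; an empty list means the options look valid."""
    problems = []
    if output_format not in OUTPUT_FORMATS:
        problems.append(f"unsupported output_format: {output_format!r}")
    # Range check only (assumption): the API may accept a fixed set of rates in this range.
    if not MIN_SAMPLE_RATE <= sample_rate <= MAX_SAMPLE_RATE:
        problems.append(f"sample_rate {sample_rate} is outside 8000-44100 Hz")
    if not MIN_SPEED <= speed <= MAX_SPEED:
        problems.append(f"speed {speed} is outside 0.5-2.0")
    return problems
```

Running the check before each call turns a server-side validation error into a local, descriptive message.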
## Examples
**cURL**
```bash
curl -X POST "https://api.smallest.ai/waves/v1/lightning-v3.1/get_speech" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: audio/wav" \
  -d '{
    "text": "Hello from Lightning v3.1.",
    "voice_id": "magnus",
    "sample_rate": 24000,
    "output_format": "wav"
  }' \
  --output speech.wav
```
**Python** (`pip install "smallestai>=4.4.0"`; the quotes stop the shell from treating `>=` as a redirect)
```python
from smallestai import SmallestAI

client = SmallestAI(token="YOUR_API_KEY")

# The call is iterated chunk by chunk; write each chunk to disk as it arrives.
with open("speech.wav", "wb") as f:
    for chunk in client.waves.synthesize_lightning_v3_1(
        text="Hello from Lightning v3.1.",
        voice_id="magnus",
        sample_rate=24000,
        output_format="wav",
        # Optional: cloned voice support
        # voice_id="voice_FlPKRWI7DX",
        # Optional: pin pronunciations for specific words
        # pronunciation_dicts=["<your dict id>"],
    ):
        f.write(chunk)
```
**JavaScript / TypeScript** (using `fetch`)
```typescript
import { writeFile } from "node:fs/promises";

const res = await fetch("https://api.smallest.ai/waves/v1/lightning-v3.1/get_speech", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.SMALLEST_API_KEY}`,
    "Content-Type": "application/json",
    Accept: "audio/wav",
  },
  body: JSON.stringify({
    text: "Hello from Lightning v3.1.",
    voice_id: "magnus",
    sample_rate: 24000,
    output_format: "wav",
  }),
});

// Fail loudly instead of writing an error payload to speech.wav.
if (!res.ok) {
  throw new Error(`TTS request failed: ${res.status} ${await res.text()}`);
}

const audio = Buffer.from(await res.arrayBuffer());
await writeFile("speech.wav", audio);
```
## Common gotchas
- **Set `Accept: audio/wav`.** Omitting it can return an empty or unplayable response.
- **Cloned voices** (`voice_*` from `add_voice`) work on this endpoint and support `pronunciation_dicts`.
- **`pronunciation_dicts` validates IDs at request time.** Passing an unknown ID returns `Invalid input data` — create the dict first via the pronunciation-dicts endpoint and save the returned `id`.
- **Pronunciation matching is case-sensitive.** Add both `Synopsis` and `synopsis` if your text uses both casings.
- **44.1 kHz output** is supported but most playback environments are happy with 24 kHz — drop the sample rate if bandwidth matters.
- **JavaScript / TypeScript**: the official `smallestai` npm package predates Lightning v3.1, so call this endpoint with `fetch` or `axios` as shown above.
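
Because pronunciation matching is case-sensitive, it can help to generate the casing variants before creating a dictionary. The sketch below works on a plain word-to-pronunciation mapping; that shape is an assumption made for illustration, not the pronunciation-dicts endpoint's actual schema, and `with_case_variants` is a hypothetical helper.

```python
def with_case_variants(entries: dict[str, str]) -> dict[str, str]:
    """Add lowercase and Capitalized spellings of each word to the mapping."""
    expanded: dict[str, str] = {}
    for word, pronunciation in entries.items():
        for variant in (word.lower(), word.capitalize()):
            # Keep the first pronunciation seen for a generated variant.
            expanded.setdefault(variant, pronunciation)
    # Entries the caller spelled out explicitly always win over generated ones.
    expanded.update(entries)
    return expanded
```

An all-caps entry such as `API` is kept as written and gains `api` and `Api` alongside it.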
## Request

This endpoint expects a JSON object with the following fields.

- **`text`** (string, required; defaults to `Hey i am your a text to speech model`): The text to convert to speech.
- **`voice_id`** (string, required; defaults to `daniel`): The voice identifier to use for speech generation.
- **`model`** (enum, optional; defaults to `lightning_v3.1`): TTS model to route the request to.
  - `lightning_v3.1` (default): standard Lightning v3.1 pool.
  - `lightning_v3.1_pro`: Lightning v3.1 Pro pool with a curated voice catalog. See the Pro model card.

  New integrations should use the unified `/waves/v1/tts` route instead of this endpoint, but the `model` field is supported here for backwards-compatible Pro opt-in.
- **`sample_rate`** (enum, optional; defaults to `44100`): The sample rate for the generated audio.
- **`speed`** (double, optional; range 0.5 to 2; defaults to `1`): The speed of the generated speech.
- **`language`** (enum, optional; defaults to `en`): Language code for synthesis. Influences pronunciation, number/date normalization, and phoneme selection.
  - Indian: `en`, `hi`, `mr` (Marathi), `kn` (Kannada), `ta` (Tamil), `bn` (Bengali), `gu` (Gujarati), `te` (Telugu), `ml` (Malayalam), `pa` (Punjabi), `or` (Odia)
  - European: `es` (Spanish)
- **`output_format`** (enum, optional; defaults to `pcm`): Format of the returned audio. `pcm` is the lowest-latency option but requires a decoder to play; `mp3` and `wav` are directly playable in browsers and most media players. The server default is `pcm` when the field is omitted. The API playground uses `mp3` so the generated audio is directly playable.
- **`pronunciation_dicts`** (list of strings, optional): The IDs of the pronunciation dictionaries to use for speech generation.
- **`session_id`** (string, optional; format `^[a-zA-Z0-9_\-.]+$`; at most 128 characters): Client-provided session identifier for correlation. Only alphanumeric characters, hyphens, underscores, and dots are allowed. Echoed back in the response headers as `X-External-Session-Id`.
- **`request_id`** (string, optional; format `^[a-zA-Z0-9_\-.]+$`; at most 128 characters): Client-provided request identifier for correlation. Only alphanumeric characters, hyphens, underscores, and dots are allowed. Echoed back in the response headers as `X-External-Request-Id`.
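
Since `output_format` defaults to `pcm`, a response saved as-is will not open in most players. One way to make it playable is to wrap the raw samples in a WAV container with Python's standard `wave` module. This is a sketch under stated assumptions: it treats the PCM payload as 16-bit mono samples, which this page does not confirm, and `pcm_to_wav` is a hypothetical helper. Requesting `output_format: "wav"` avoids the step entirely.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 44100,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV (RIFF) container without re-encoding."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)      # assumed mono
        wav_file.setsampwidth(sample_width)  # bytes per sample; 2 = 16-bit (assumed)
        wav_file.setframerate(sample_rate)   # must match the requested sample_rate
        wav_file.writeframes(pcm)
    return buffer.getvalue()
```

Pass the same `sample_rate` that was sent in the request; a mismatch plays the audio at the wrong pitch and speed.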
## Response

Synthesized speech retrieved successfully.
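
The `session_id` and `request_id` fields are echoed back in the response headers, so a client can validate them before sending and confirm the echo afterwards. The sketch below uses the documented pattern and length limit; the helper names are hypothetical, and the header lookup assumes a plain mapping of header names to values.

```python
import re
from typing import Mapping

ID_PATTERN = re.compile(r"^[a-zA-Z0-9_\-.]+$")
MAX_ID_LENGTH = 128


def is_valid_correlation_id(value: str) -> bool:
    """Check a session_id or request_id against the documented format."""
    return len(value) <= MAX_ID_LENGTH and ID_PATTERN.fullmatch(value) is not None


def echoed_ids_match(session_id: str, request_id: str,
                     response_headers: Mapping[str, str]) -> bool:
    """Confirm the response headers carry the IDs that were sent."""
    # Header names are case-insensitive, so normalize before comparing.
    lowered = {name.lower(): value for name, value in response_headers.items()}
    return (lowered.get("x-external-session-id") == session_id
            and lowered.get("x-external-request-id") == request_id)
```

Logging both IDs alongside the check result makes it easier to trace a specific synthesis call when reporting an issue.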