Lightning v3.1

Synthesize speech from text in a single request. This is the simplest way to get audio when you have the full text up front: pass `text` and `voice_id`, and get back binary audio.

## When to use this

- **Use this** for short utterances you can render before playback (notifications, prompts, batch jobs, audio file generation).
- **Use the SSE streaming endpoint** when you want playback to start before the full audio is ready (long passages, latency-sensitive apps).
- **Use the WebSocket endpoint** when text arrives incrementally (LLM token streams, live captioning).

## Key features

- 44.1 kHz natural, expressive synthesis
- Cloned voice IDs (`voice_*`) work with the same parameter as catalog voices
- 12 documented languages (see the model card for the full list)
- Output formats: `pcm`, `mp3`, `wav`, `ulaw`, `alaw`
- Sample rates: 8 kHz to 44.1 kHz
- Speed: 0.5× to 2×
- Per-call pronunciation dictionaries via `pronunciation_dicts`

## Examples

**cURL**

```bash
curl -X POST "https://api.smallest.ai/waves/v1/lightning-v3.1/get_speech" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: audio/wav" \
  -d '{
    "text": "Hello from Lightning v3.1.",
    "voice_id": "magnus",
    "sample_rate": 24000,
    "output_format": "wav"
  }' --output speech.wav
```

**Python** (`pip install "smallestai>=4.4.0"`)

```python
from smallestai import SmallestAI

client = SmallestAI(token="YOUR_API_KEY")

# The SDK yields the audio in chunks; append each one to the output file.
with open("speech.wav", "wb") as f:
    for chunk in client.waves.synthesize_lightning_v31(
        text="Hello from Lightning v3.1.",
        voice_id="magnus",
        sample_rate=24000,
        output_format="wav",
        # Optional: cloned voice support
        # voice_id="voice_FlPKRWI7DX",
        # Optional: pin pronunciations for specific words
        # pronunciation_dicts=["<your dict id>"],
    ):
        f.write(chunk)
```

**JavaScript / TypeScript** (using `fetch`)

```typescript
import { writeFileSync } from "node:fs";

const res = await fetch("https://api.smallest.ai/waves/v1/lightning-v3.1/get_speech", {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.SMALLEST_API_KEY}`,
    "Content-Type": "application/json",
    Accept: "audio/wav",
  },
  body: JSON.stringify({
    text: "Hello from Lightning v3.1.",
    voice_id: "magnus",
    sample_rate: 24000,
    output_format: "wav",
  }),
});

// Fail loudly on 4xx/5xx instead of writing an error payload to disk.
if (!res.ok) {
  throw new Error(`Request failed with status ${res.status}`);
}

// Buffer the binary body and write it out as a WAV file.
const audio = Buffer.from(await res.arrayBuffer());
writeFileSync("speech.wav", audio);
```

## Common gotchas

- **Set `Accept: audio/wav`.** Omitting it can return an empty or unplayable response.
- **Cloned voices** (`voice_*` from `add_voice`) work on this endpoint and support `pronunciation_dicts`.
- **`pronunciation_dicts` validates IDs at request time.** Passing an unknown ID returns `Invalid input data`. Create the dict first via the pronunciation-dicts endpoint and save the returned `id`.
- **Pronunciation matching is case-sensitive.** Add both `Synopsis` and `synopsis` if your text uses both casings.
- **44.1 kHz output** is supported, but most playback environments handle 24 kHz well. Lower the sample rate if bandwidth matters.
- **JavaScript / TypeScript**: the official `smallestai` npm package predates Lightning v3.1, so call this endpoint with `fetch` or `axios` as shown above.

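The first gotcha above (an empty or unplayable response when `Accept` is missing) can be caught before the bytes reach disk. The following is a minimal sketch, assuming the body is expected to be a standard RIFF/WAVE file when `output_format` is `wav`. `looks_like_wav` and `save_if_wav` are hypothetical helper names, not part of the SDK.

```python
def looks_like_wav(data: bytes) -> bool:
    """Return True if `data` starts with a RIFF/WAVE container header."""
    # A WAV file begins with b"RIFF", a 4-byte size field, then b"WAVE".
    return len(data) >= 12 and data[:4] == b"RIFF" and data[8:12] == b"WAVE"


def save_if_wav(data: bytes, path: str) -> bool:
    """Write `data` to `path` only when it looks like WAV audio."""
    if not looks_like_wav(data):
        # Empty body, a JSON error payload, or a different audio format.
        return False
    with open(path, "wb") as f:
        f.write(data)
    return True
```

A `False` result is a cue to inspect the status code and request headers rather than hand the file to a player.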
Authentication

Authorization (Bearer)

Header authentication of the form Bearer <token>

Headers

Accept (enum, required, defaults to audio/wav)

Must be audio/wav to receive binary audio. Required for proper playback.

Allowed values: audio/wav
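The authentication and header requirements above can be collected in one place. This is a minimal sketch using only values stated on this page; `build_headers` is a hypothetical helper name, and reading the key from `SMALLEST_API_KEY` simply mirrors the cURL example.

```python
def build_headers(api_key: str) -> dict[str, str]:
    """Headers for a get_speech request: bearer auth plus the Accept value."""
    if not api_key:
        raise ValueError("API key is empty")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
        # The page states Accept must be audio/wav to receive binary audio.
        "Accept": "audio/wav",
    }
```

A caller would typically pass `os.environ["SMALLEST_API_KEY"]` so that the key stays out of source control.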

Request

This endpoint expects an object.
text (string, required, defaults to "Hey i am your a text to speech model")
The text to convert to speech.
voice_id (string, required, defaults to daniel)
The voice identifier to use for speech generation.
sample_rate (enum, optional, defaults to 44100)
The sample rate for the generated audio, in Hz. Supported rates span 8 kHz to 44.1 kHz.
speed (double, optional, 0.5-2, defaults to 1)
The speed of the generated speech.
language (enum, optional, defaults to en)

Language code for synthesis. Influences pronunciation, number/date normalization, and phoneme selection.

  • Indian: en, hi, mr (Marathi), kn (Kannada), ta (Tamil), bn (Bengali), gu (Gujarati), te (Telugu), ml (Malayalam), pa (Punjabi), or (Odia)
  • European: es (Spanish)
  • auto — auto-detect from input text (recommended for code-switching)
output_format (enum, optional, defaults to pcm)

Format of the returned audio. pcm is the lowest-latency option but requires a decoder to play; mp3 and wav are directly playable in browsers and most media players. The server default is pcm when the field is omitted; the API playground uses mp3 so that the generated audio is directly playable.

Allowed values: pcm, mp3, wav, ulaw, alaw
pronunciation_dicts (list of strings, optional)
The IDs of the pronunciation dictionaries to use for speech generation.
session_id (string, optional, format: "^[a-zA-Z0-9_\-.]+$", <=128 characters)

Optional client-provided session identifier for correlation. Only alphanumeric characters, hyphens, underscores, and dots are allowed. Max 128 characters. Echoed back in response headers as X-External-Session-Id.

request_id (string, optional, format: "^[a-zA-Z0-9_\-.]+$", <=128 characters)

Optional client-provided request identifier for correlation. Only alphanumeric characters, hyphens, underscores, and dots are allowed. Max 128 characters. Echoed back in response headers as X-External-Request-Id.
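The request constraints above can be checked on the client before the call is made, so that a malformed field does not cost a round trip. The sketch below uses only the limits stated on this page: the 0.5 to 2 speed range, the listed language codes and output formats, and the ID pattern and length. `build_payload` is a hypothetical helper, and `sample_rate` is left out because its allowed values are not enumerated here.

```python
import re
from typing import Any, Optional

# Pattern and length limit as documented for session_id and request_id.
_ID_PATTERN = re.compile(r"[a-zA-Z0-9_\-.]+")
_MAX_ID_LENGTH = 128

# Values listed on this page.
_OUTPUT_FORMATS = {"pcm", "mp3", "wav", "ulaw", "alaw"}
_LANGUAGES = {"en", "hi", "mr", "kn", "ta", "bn", "gu", "te", "ml", "pa", "or", "es", "auto"}


def _check_id(name: str, value: str) -> str:
    """Validate a client-provided correlation ID against the documented rules."""
    # fullmatch() anchors both ends, so a trailing newline is rejected too.
    if len(value) > _MAX_ID_LENGTH or not _ID_PATTERN.fullmatch(value):
        raise ValueError(
            f"{name}: only letters, digits, '_', '-', '.' and at most "
            f"{_MAX_ID_LENGTH} characters are allowed"
        )
    return value


def build_payload(
    text: str,
    voice_id: str,
    *,
    speed: Optional[float] = None,
    language: Optional[str] = None,
    output_format: Optional[str] = None,
    session_id: Optional[str] = None,
    request_id: Optional[str] = None,
) -> dict[str, Any]:
    """Build the JSON body, omitting optional fields that were not provided."""
    if not text or not voice_id:
        raise ValueError("text and voice_id are required")
    payload: dict[str, Any] = {"text": text, "voice_id": voice_id}
    if speed is not None:
        if not 0.5 <= speed <= 2:
            raise ValueError("speed must be between 0.5 and 2")
        payload["speed"] = speed
    if language is not None:
        if language not in _LANGUAGES:
            raise ValueError(f"unsupported language: {language!r}")
        payload["language"] = language
    if output_format is not None:
        if output_format not in _OUTPUT_FORMATS:
            raise ValueError(f"unsupported output_format: {output_format!r}")
        payload["output_format"] = output_format
    if session_id is not None:
        payload["session_id"] = _check_id("session_id", session_id)
    if request_id is not None:
        payload["request_id"] = _check_id("request_id", request_id)
    return payload
```

Omitted fields are left out of the body entirely, so the server defaults (`pcm`, `en`, speed 1) still apply.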

Response headers

X-Session-Id (string)

Internal session identifier (system-generated UUID).

X-Request-Id (string)

Internal request identifier (system-generated UUID).

X-External-Session-Id (string)

Echoed client-provided session_id (empty if not provided).

X-External-Request-Id (string)

Echoed client-provided request_id (empty if not provided).
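For logging, the four correlation headers above can be pulled into a single record. This is a minimal sketch that accepts any mapping of response headers; `correlation_ids` and its output field names are hypothetical. HTTP header names are case-insensitive, so the lookup lower-cases them first.

```python
from typing import Mapping, Optional

# Response header name (lower-cased) -> field name used in the result.
_HEADER_TO_FIELD = {
    "x-session-id": "session_id",
    "x-request-id": "request_id",
    "x-external-session-id": "external_session_id",
    "x-external-request-id": "external_request_id",
}


def correlation_ids(headers: Mapping[str, str]) -> dict[str, Optional[str]]:
    """Collect internal and echoed client IDs from response headers."""
    lowered = {name.lower(): value for name, value in headers.items()}
    # Missing headers and empty echoes (no client ID sent) both become None.
    return {
        field: (lowered.get(header) or None)
        for header, field in _HEADER_TO_FIELD.items()
    }
```

Logging the internal IDs alongside your own `session_id` and `request_id` gives both sides a shared key when investigating a failed request.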

Response

Synthesized speech retrieved successfully.
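When `output_format` is `pcm`, the body has no container, so most players cannot open it directly. The sketch below wraps raw samples in a WAV header using Python's standard `wave` module. It assumes 16-bit mono PCM, which this page does not state, so confirm the sample width and channel count against the model card before relying on it. `pcm_to_wav` is a hypothetical helper.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int, channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container and return the file bytes.

    Assumption: samples are `sample_width` bytes each (2 = 16-bit) with
    `channels` interleaved channels. Neither value is documented on this page.
    """
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        # Must match the sample_rate sent in the request (server default 44100).
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buf.getvalue()
```

If the wrapped audio plays too fast or too slow, the `sample_rate` passed here most likely differs from the one used in the request.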

Errors

400 Bad Request Error
401 Unauthorized Error
500 Internal Server Error
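One way to handle these status codes is to retry only the server-side failure. The following is a sketch under stated assumptions: it treats 500 as transient and safe to retry, while 400 and 401 are returned immediately because the request or credentials need fixing first. This page does not specify a retry policy. `send` is a hypothetical callable that performs one request and returns `(status, body)`.

```python
import time
from typing import Callable, Tuple


def should_retry(status: int) -> bool:
    """Assumption: only 500 is worth retrying; 400 and 401 need a fix first."""
    return status == 500


def call_with_retry(
    send: Callable[[], Tuple[int, bytes]],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call `send` up to `attempts` times with exponential backoff on 500."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    status, body = send()
    for attempt in range(attempts - 1):
        if not should_retry(status):
            break
        # Wait base_delay, then 2x, then 4x, and so on between attempts.
        sleep(base_delay * 2 ** attempt)
        status, body = send()
    return status, body
```

The last status and body are returned even when every attempt fails, so the caller still decides how to surface the error.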