Lightning v3.1 Pro | Smallest AI Docs

Latest Release

Lightning v3.1 Pro is a premium 44.1 kHz text-to-speech pool with improved naturalness and a curated voice catalog. It runs on dedicated inference capacity, isolated from general traffic. Concurrency, latency, and rate limits are identical to standard Lightning v3.1; the difference is voice quality and the catalog.


44.1 kHz

Native sample rate

200ms

TTFB at 40 concurrent requests

29 Languages

English + Hindi, plus 27 more — 9 Indian, 8 Asian & Middle Eastern, 10 European

3.3x

Real-time factor (faster than playback)

Model Overview


Developed by	Smallest AI
Model type	Text-to-Speech over WebSocket, SSE, and HTTP
Languages	29 — English + Hindi, plus 27 more (9 Indian, 8 Asian & Middle Eastern, 10 European); see Supported Languages
Audio Output formats	PCM, MP3, WAV, ulaw, alaw
Pricing (Standard Plan)	~$0.195/10K characters
Concurrency (Standard Plan)	10
Native Sample Rate	44.1 kHz
Supported sample rates	8,000 / 16,000 / 24,000 / 44,100 Hz
Audio channels	Mono
Recommended GPU	NVIDIA L40S
Max chunk size	250 characters (optimal throughput at ~140 characters per request)
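
As a rough budgeting aid, the pricing and chunk-size figures in the table above can be turned into a small estimator. This is a minimal sketch: it assumes billing scales linearly with character count (no minimums, rounding, or plan discounts), and the function names `estimate_cost_usd` and `min_requests` are hypothetical helpers, not part of any Smallest AI SDK.

```python
import math

# Figures copied from the Model Overview table; the price is approximate.
PRICE_PER_10K_CHARS_USD = 0.195  # Standard Plan, ~$0.195 per 10K characters
MAX_CHUNK_CHARS = 250            # maximum characters per request
OPTIMAL_CHUNK_CHARS = 140        # approximate throughput sweet spot


def estimate_cost_usd(num_chars: int) -> float:
    """Approximate cost, ASSUMING strictly linear per-character billing."""
    if num_chars < 0:
        raise ValueError("num_chars must be non-negative")
    return num_chars / 10_000 * PRICE_PER_10K_CHARS_USD


def min_requests(num_chars: int, chunk_chars: int = OPTIMAL_CHUNK_CHARS) -> int:
    """Lower bound on request count if every chunk were filled to chunk_chars."""
    if not 0 < chunk_chars <= MAX_CHUNK_CHARS:
        raise ValueError(f"chunk_chars must be between 1 and {MAX_CHUNK_CHARS}")
    return math.ceil(num_chars / chunk_chars)


if __name__ == "__main__":
    script_chars = 50_000  # e.g. a long narration script
    print(f"Estimated cost: ${estimate_cost_usd(script_chars):.3f}")
    print(f"Requests at ~140 chars each: {min_requests(script_chars)}")
    print(f"Requests at 250 chars each:  {min_requests(script_chars, 250)}")
```

Real request counts will be somewhat higher than the lower bound, since chunks cut at sentence boundaries rarely fill to the exact target length.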

Key Capabilities

Real-Time Optimized

Ultra-low latency architecture designed for conversational AI and live streaming.

Streaming

HTTP, SSE, and WebSocket transports for real-time playback.

Multi-Language

language: en → UK + American accented English. language: hi → Indian accented English + Hindi (code-switching). Omit → defaults to en + hi. Plus 27 additional languages via dedicated Pro voices — pass the ISO 639-1 code (e.g. ta, de, ja).

High Fidelity

Broadcast-quality 44.1 kHz audio with natural prosody, intonation, and conversational rhythm.

Curated Voice Catalog

Premium voices across American, British, and Indian accents.

Pronunciation Control

Custom pronunciation dictionaries for specialized vocabulary, brand names, and domain-specific terms.

Performance & Benchmarks

Pro improves on standard Lightning v3.1 across accuracy, expressiveness, delivery, and MOS quality. The tables below compare Pro with the same competitor set documented on the Lightning v3.1 model card; refer to that card for Pro-vs-Standard comparisons. The list under each category explains what each metric measures.

Naturalness — higher is better

Metric	Lightning v3.1 Pro	GPT-4o-mini	ElevenLabs Turbo v2.5	ElevenLabs Multilingual v2	Sonic-3	Gemini 2.5 Pro	Gemini 2.5 Flash	MAI-Voice-1	Inworld 1.5	S2 Pro
Overall	3.16	3.13	3.16	3.17	3.20	3.07	3.28	3.17	3.06	3.02
Naturalness	2.55	2.41	2.52	2.55	2.57	2.42	2.58	2.57	2.41	2.37
Intonation	3.06	3.06	3.07	3.06	3.12	2.90	3.28	3.04	2.91	2.86
Prosody	2.81	2.73	2.82	2.86	2.83	2.65	3.09	2.76	2.61	2.58

What each Naturalness metric measures

Overall — Holistic listener rating of how natural the voice sounds end-to-end.
Naturalness — How human-like the voice sounds; penalizes robotic or synthetic quality.
Intonation — Whether pitch rises and falls appropriately for the sentence type (question, statement, exclamation).
Prosody — The broader umbrella of rhythm, stress, and melody, how well the voice “reads” the sentence as a human would.

Expressiveness — higher is better

Metric	Lightning v3.1 Pro	GPT-4o-mini	ElevenLabs Turbo v2.5	ElevenLabs Multilingual v2	Sonic-3	Gemini 2.5 Pro	Gemini 2.5 Flash	MAI-Voice-1	Inworld 1.5	S2 Pro
Overall	3.55	3.45	3.44	3.46	3.38	3.49	3.54	3.50	3.37	3.41
Paralinguistics	3.64	3.60	3.59	3.61	3.56	3.60	3.64	3.58	3.55	3.58
Emotions	3.47	3.30	3.28	3.31	3.19	3.38	3.44	3.41	3.19	3.23

What each Expressiveness metric measures

Overall — Holistic listener rating of how expressive the voice sounds given the context of the sentence.
Paralinguistics — Non-verbal vocal elements like laughter, sighs, or filler sounds (“um”, “uh”) and whether they’re rendered appropriately.
Emotions — How accurately the voice conveys the intended emotional tone (neutral, warm, urgent, etc.).

Delivery — higher is better

Metric	Lightning v3.1 Pro	GPT-4o-mini	ElevenLabs Turbo v2.5	ElevenLabs Multilingual v2	Sonic-3	Gemini 2.5 Pro	Gemini 2.5 Flash	MAI-Voice-1	Inworld 1.5	S2 Pro
Boundary Consistency	4.96	4.94	4.93	4.95	4.93	4.88	4.99	4.77	4.90	4.88
Pronunciation Style	4.98	4.96	4.95	4.96	4.96	4.93	4.99	4.91	4.94	4.89
Natural Pace	4.72	4.57	4.51	4.51	4.01	4.23	4.66	4.47	4.33	3.74
Pause Placement	4.66	4.54	4.49	4.51	4.28	4.34	4.59	4.41	4.38	4.09
Breathing Naturalness	3.82	3.06	3.14	3.14	2.79	2.88	3.43	3.28	2.77	2.42

What each Delivery metric measures

Boundary Consistency — Whether phrase and sentence boundaries are marked consistently with pauses or pitch shifts, without arbitrary breaks mid-phrase.
Pronunciation Style — Not just correctness, but stylistic choices i.e., formal vs. casual register, regional accent consistency, honorific handling.
Natural Pace — Whether the speaking rate feels comfortable and appropriate for the content type, neither rushed nor dragging.
Pause Placement — Whether silences appear at semantically correct points (after commas, between clauses) rather than mid-word or mid-phrase.
Breathing Naturalness — Whether breath sounds occur at realistic points and with realistic frequency, not absent entirely or inserted randomly.

Accuracy

Direction varies by metric: lower is better for WER, CER, Hallucination, and Deletion; higher is better for Pronunciation %.

Whisper jiwer

Metric	Direction	Lightning v3.1 Pro	GPT-4o-mini	ElevenLabs Turbo v2.5	ElevenLabs Multilingual v2	Sonic-3	Gemini 2.5 Pro	Gemini 2.5 Flash	MAI-Voice-1	Inworld 1.5	S2 Pro
WER	lower	1.36%	1.26%	1.35%	1.33%	1.43%	1.26%	1.37%	1.25%	1.10%	2.83%
CER	lower	0.40%	0.52%	0.60%	0.54%	0.59%	0.62%	0.61%	0.50%	0.47%	1.16%
Hallucination	lower	0.00%	0.07%	0.08%	0.01%	0.06%	0.04%	0.01%	0.06%	0.00%	0.22%
Deletion	lower	0.00%	0.14%	0.17%	0.18%	0.16%	0.24%	0.18%	0.15%	0.12%	0.33%
Pronunciation % (Whisper jiwer)	higher	98.68%	98.94%	98.90%	98.87%	98.79%	99.02%	98.82%	98.95%	99.02%	97.72%

Whisper LLM

Metric	Direction	Lightning v3.1 Pro	GPT-4o-mini	ElevenLabs Turbo v2.5	ElevenLabs Multilingual v2	Sonic-3	Gemini 2.5 Pro	Gemini 2.5 Flash	MAI-Voice-1	Inworld 1.5	S2 Pro
WER	lower	0.96%	0.82%	0.72%	0.57%	0.88%	0.70%	0.72%	0.60%	0.55%	2.15%
CER	lower	0.34%	0.30%	0.28%	0.21%	0.30%	0.35%	0.33%	0.23%	0.18%	1.03%
Hallucination	lower	0.00%	0.07%	0.07%	0.00%	0.02%	0.02%	0.01%	0.03%	0.00%	0.10%
Pronunciation % (Whisper LLM)	higher	99.04%	99.25%	99.35%	99.43%	99.14%	99.32%	99.29%	99.43%	99.45%	97.95%

What each Accuracy metric measures

WER (Word Error Rate) — Percentage of words in the transcript that differ from the reference; measures how faithfully the TTS renders the input text.
CER (Character Error Rate) — Like WER but at the character level.
Hallucination — Words or sounds the TTS generates that have no basis in the input text. Insertions, substitutions, or fabricated content.
Deletion — Words from the reference text that the TTS dropped entirely.
Pronunciation % — The proportion of words pronounced correctly out of total words.
Whisper jiwer vs Whisper LLM — Two judging methodologies. jiwer uses raw Whisper-decoded transcripts; LLM-judged uses a follow-on LLM to normalize transcription noise. Both report the same metric family; LLM-judged tends to give lower error rates by reducing false positives from punctuation/casing.
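
To make the WER and CER definitions above concrete, here is a minimal sketch of the textbook edit-distance formulation. It is not the evaluation pipeline behind the tables: the Whisper transcription step and any jiwer or LLM normalization are not reproduced, and the lowercasing plus whitespace tokenization used here are simplifying assumptions.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions, insertions, and deletions."""
    prev: List[int] = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_tok == hyp_tok else 1)
            deletion = prev[j] + 1       # reference token missing from hypothesis
            insertion = curr[j - 1] + 1  # extra token in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref_words = reference.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.lower().split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate: the same ratio computed over characters."""
    ref_chars = list(reference.lower())
    if not ref_chars:
        raise ValueError("reference must not be empty")
    return edit_distance(ref_chars, list(hypothesis.lower())) / len(ref_chars)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    transcript = "the cat sat on mat"  # one dropped word
    print(f"WER: {wer(reference, transcript):.2%}")
    print(f"CER: {cer(reference, transcript):.2%}")
```

In this formulation a dropped word counts as a deletion and an extra word as an insertion, which loosely corresponds to the Deletion and Hallucination rows when those error types are reported separately.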

MOS v2 — higher is better

Metric	Lightning v3.1 Pro	GPT-4o-mini	ElevenLabs Turbo v2.5	ElevenLabs Multilingual v2	Sonic-3	Gemini 2.5 Pro	Gemini 2.5 Flash	MAI-Voice-1	Inworld 1.5	S2 Pro
Mean MOS	4.22	4.16	3.98	4.02	3.76	4.11	4.24	3.97	3.73	3.99
UTMOS	3.76	3.76	3.37	3.41	2.77	3.57	3.71	3.33	2.54	3.50
WV-MOS	5.05	4.55	4.60	4.63	4.76	4.65	4.76	4.62	4.91	4.48

What each MOS metric measures

Mean MOS — Mean Opinion Score: average listener rating on a 1–5 scale across the test set; the canonical aggregate quality metric in TTS evaluation.
UTMOS — A predicted MOS from the UTMOS reference model — an automated proxy for subjective quality.
WV-MOS — A predicted MOS from the WavLM-based WV-MOS reference model — another automated proxy commonly reported alongside UTMOS for cross-validation.

Want to reproduce these results? See the TTS evaluation script to measure TTFB and synthesis quality in your own environment.

Supported Languages

Pass the language body parameter to steer the Pro pool’s output:

`language` value	Behaviour
`en`	UK + American accented English. Best paired with British or American Pro voices.
`hi`	Indian accented English + Hindi (mid-utterance code-switching). Best paired with Indian Pro voices.
ISO 639-1 code of an additional Pro language (e.g. `ta`, `de`, `ja`)	Native synthesis in that language. Pair with a Pro voice from that language’s catalog section below.
omitted	Defaults to `en + hi` — mixed Indian + Western English coverage.
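
The behaviour in the table above can be mirrored client-side before a request is sent. The sketch below is illustrative only: `language_field` is a hypothetical helper, and the code set is copied by hand from the language tables on this page, so it would need updating if the catalog changes.

```python
from typing import Dict, Optional

# The 29 codes listed on this page: en + hi, 9 Indian,
# 8 Asian & Middle Eastern, and 10 European languages.
PRO_LANGUAGE_CODES = frozenset({
    "en", "hi",
    "mr", "ta", "ml", "te", "kn", "pa", "bn", "or", "gu",
    "ar", "zh", "id", "ja", "ko", "ms", "tr", "vi",
    "de", "es", "fr", "it", "pt", "ru", "el", "fi", "no", "pl",
})


def language_field(code: Optional[str] = None) -> Dict[str, str]:
    """Return the fragment to merge into the request body.

    None means "omit the field", which the table says defaults to en + hi.
    """
    if code is None:
        return {}
    normalized = code.strip().lower()
    if normalized not in PRO_LANGUAGE_CODES:
        raise ValueError(f"{code!r} is not a Pro language code listed on this page")
    return {"language": normalized}


if __name__ == "__main__":
    print(language_field())      # field omitted
    print(language_field("ta"))  # Tamil, pair with a Tamil Pro voice
```

Rejecting unknown codes locally is a design choice of this sketch; the page does not state how the API itself responds to an unsupported code.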

Additional Pro languages

27 additional languages have dedicated Pro voices. Pass the ISO 639-1 code in the language body parameter and pick a voice_id from the matching Voice Catalog section.

Indian

Language	Code	Pro voices
Marathi	`mr`	6
Tamil	`ta`	12
Malayalam	`ml`	6
Telugu	`te`	8
Kannada	`kn`	10
Punjabi	`pa`	7
Bengali	`bn`	5
Odia	`or`	8
Gujarati	`gu`	5

Asian & Middle Eastern

Language	Code	Pro voices
Arabic	`ar`	2
Chinese (Mandarin)	`zh`	5
Indonesian	`id`	4
Japanese	`ja`	4
Korean	`ko`	1
Malay	`ms`	2
Turkish	`tr`	2
Vietnamese	`vi`	1

European

Language	Code	Pro voices
German	`de`	7
Spanish	`es`	6
French	`fr`	9
Italian	`it`	6
Portuguese (Brazilian + European)	`pt`	7
Russian	`ru`	7
Greek	`el`	5
Finnish	`fi`	6
Norwegian	`no`	4
Polish	`pl`	4

For other languages, use the standard Lightning v3.1 model (12 languages, full voice catalog).

Voice Catalog

The Pro voice catalog is distinct from the standard Lightning v3.1 catalog. Voices below are listed in recommended order within each accent group.

Indian — Female

Voice ID	Name
`rhea`	Rhea
`zariya`	Zariya
`kareena`	Kareena
`mishka`	Mishka
`inaaya`	Inaaya
`saira`	Saira
`meher`	Meher
`aarini`	Aarini

Indian — Male

Voice ID	Name
`aviraj`	Aviraj
`vyom`	Vyom
`zoravar`	Zoravar
`reyansh`	Reyansh
`ahan`	Ahan

British — Female

Voice ID	Name
`sophie`	Sophie
`ellie`	Ellie
`cressida`	Cressida
`ottilie`	Ottilie
`elowen`	Elowen
`seraphina`	Seraphina

British — Male

Voice ID	Name
`sam`	Sam
`henry`	Henry
`benedict`	Benedict
`cormac`	Cormac
`rupert`	Rupert
`finley`	Finley

American — Female

Voice ID	Name
`kaitlyn`	Kaitlyn
`savannah`	Savannah
`amelia`	Amelia
`zoe`	Zoe
`ruby`	Ruby
`leah`	Leah
`jenna`	Jenna
`kate`	Kate
`molly`	Molly
`sara`	Sara
`fiona`	Fiona

American — Male

Voice ID	Name
`blake`	Blake
`austin`	Austin
`henry`	Henry
`jack`	Jack
`leo`	Leo
`luke`	Luke
`owen`	Owen

Indian Languages — 67 voices

Pair each voice with its matching language code (e.g. "language": "ta" with a Tamil voice).

Language	Code	Female voices	Male voices
Marathi	`mr`	`mrunal`, `manasi`, `ketaki`, `tejaswini`	`mandar`, `tushar`
Tamil	`ta`	`malar`, `nila`, `tamilselvi`	`mathan`, `dinesh`, `prabhu`, `ezhil`, `kavin`, `tamizh`, `barath`, `sakthi`, `murugan`
Malayalam	`ml`	`parvathy`, `lakshmi`	`vishnu`, `sreenath`, `unni`, `aravindan`
Telugu	`te`	`sravani`, `swathi`	`naveen`, `charan`, `sasank`, `bhaskar`, `gopal`, `manohar`
Kannada	`kn`	`spoorthi`, `rashmi`, `varsha`, `sahana`	`rakshith`, `kishore`, `yogesh`, `gowtham`, `shankar`, `basava`
Punjabi	`pa`	`jasleen`, `manmeet`	`rajdeep`, `tejinder`, `sukhdeep`, `amrit`, `gagandeep`
Bengali	`bn`	—	`rajib`, `tanmoy`, `subhro`, `arghya`, `indranil`
Odia	`or`	`sasmita`, `ankita`	`subrat`, `debasish`, `sambit`, `pratik`, `rakesh`, `smruti`
Gujarati	`gu`	`krupa`, `riddhi`	`jignesh`, `mit`, `keval`

Asian & Middle Eastern Languages — 21 voices

Language	Code	Female voices	Male voices
Arabic	`ar`	`layla`	`adam`
Chinese (Mandarin)	`zh`	`hazel`, `vivian`	`dylan`, `silas`, `eli`
Indonesian	`id`	`nora`	`bryce`, `miles`, `cole`
Japanese	`ja`	`aria`, `mila`, `daisy`	`jasper`
Korean	`ko`	`june`	—
Malay	`ms`	`sasha`	`roman`
Turkish	`tr`	—	`beau`, `wes`
Vietnamese	`vi`	—	`kai`

European Languages — 61 voices

Language	Code	Female voices	Male voices
German	`de`	`hanna`, `lea`, `petra`	`max`, `ben`, `markus`, `finn`
Spanish	`es`	`martina`, `ines`, `paula`	`sebastian`, `mateo`, `gabriel`
French	`fr`	`manon`, `juliette`, `lucie`, `elise`, `amelie`	`louis`, `nicolas`, `maxime`, `raphael`
Italian	`it`	`silvia`, `concetta`, `arianna`	`davide`, `luca`, `leonardo`
Portuguese (Brazilian)	`pt`	`juliana`, `leticia`	`gustavo`, `thiago`, `bruno`
Portuguese (European)	`pt`	`catarina`	`francisco`
Russian	`ru`	`anastasia`, `ekaterina`, `olga`, `irina`	`andrei`, `nikolai`, `maksim`
Greek	`el`	`katerina`, `dimitra`, `athina`	`dimitris`, `vasilis`
Finnish	`fi`	`aino`, `helmi`, `venla`	`mika`, `timo`, `matti`
Norwegian	`no`	`solveig`, `marit`	`kristian`, `espen`
Polish	`pl`	`ewa`, `joanna`	`tomasz`, `jakub`

Need a voice not in this list? Use the standard Lightning v3.1 catalog (217 voices, more languages, voice cloning). Pass "model": "lightning_v3.1" (or omit the field) instead of lightning_v3.1_pro.

API Reference

Endpoints

Endpoint	Method	Use Case
`https://api.smallest.ai/waves/v1/tts`	POST	Synchronous synthesis
`https://api.smallest.ai/waves/v1/tts/live`	POST (SSE)	Server-sent events streaming
`wss://api.smallest.ai/waves/v1/tts/live`	WebSocket	Real-time streaming

See Synthesize Speech for the full request/response schema, supported parameters, and error codes. To route to the Pro pool you must set model to lightning_v3.1_pro explicitly — the field is optional but defaults to standard Lightning v3.1.
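
Because `model` silently defaults to standard Lightning v3.1, it can help to centralize endpoint selection and payload construction so the Pro model is never forgotten. The following is a sketch under assumptions: the URLs, `model`, and `voice_id` come from this page, while the `text` field name and the helper names are hypothetical and should be checked against the Synthesize Speech schema.

```python
from typing import Any, Dict

# URLs copied from the Endpoints table above.
ENDPOINTS: Dict[str, str] = {
    "http": "https://api.smallest.ai/waves/v1/tts",
    "sse": "https://api.smallest.ai/waves/v1/tts/live",
    "websocket": "wss://api.smallest.ai/waves/v1/tts/live",
}
PRO_MODEL = "lightning_v3.1_pro"


def endpoint_for(transport: str) -> str:
    """Map a transport name (http, sse, websocket) to its endpoint URL."""
    try:
        return ENDPOINTS[transport.lower()]
    except KeyError:
        raise ValueError(f"unknown transport {transport!r}") from None


def pro_payload(text: str, voice_id: str, **extra: Any) -> Dict[str, Any]:
    """Build a request body that always routes to the Pro pool.

    ASSUMPTION: "text" is the input field name; verify it in the API schema.
    """
    payload: Dict[str, Any] = {"text": text, "voice_id": voice_id, **extra}
    payload["model"] = PRO_MODEL  # set last so callers cannot override it
    return payload


if __name__ == "__main__":
    print(endpoint_for("sse"))
    print(pro_payload("Hello from the Pro pool.", "sophie", language="en"))
```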

Quickstart

Generate your first audio in under a minute with a single API call.
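
A single synchronous call might look like the sketch below, which uses only the Python standard library. Several details are assumptions not confirmed on this page: the `Authorization: Bearer` header scheme, the `text` field name, the `SMALLEST_API_KEY` environment variable, and that the response body is the raw audio. Check each against the Synthesize Speech reference before relying on it.

```python
import json
import os
import urllib.request

TTS_URL = "https://api.smallest.ai/waves/v1/tts"  # synchronous endpoint


def build_request(api_key: str, text: str, voice_id: str = "sophie",
                  language: str = "en") -> urllib.request.Request:
    """Prepare (but do not send) a POST request for the Pro pool."""
    body = {
        "text": text,                   # ASSUMED field name
        "voice_id": voice_id,           # a British Pro voice, paired with "en"
        "model": "lightning_v3.1_pro",  # must be explicit to reach the Pro pool
        "language": language,
    }
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # ASSUMED auth scheme
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(api_key: str, text: str, out_path: str = "output.wav") -> str:
    """Send the request and write the response bytes (ASSUMED to be audio)."""
    request = build_request(api_key, text)
    with urllib.request.urlopen(request, timeout=30) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return out_path


if __name__ == "__main__":
    key = os.environ.get("SMALLEST_API_KEY")  # hypothetical variable name
    if key:
        print("Saved", synthesize_to_file(key, "Hello from Lightning v3.1 Pro."))
    else:
        print("Set SMALLEST_API_KEY to run the live request.")
```

Keeping request construction separate from sending makes the payload easy to inspect or unit-test without network access.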

Best Practices

Voice ID + model pairing

Pair Pro voice IDs above with "model": "lightning_v3.1_pro". The API does not currently reject mismatched pairings, but pairing a Pro voice with "model": "lightning_v3.1" (or omitting model) can produce wrong or hallucinated audio. Server-side validation is on the roadmap.
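
Until server-side validation exists, a client-side guard can catch the mismatch described above. This is a minimal sketch: `check_pairing` is a hypothetical helper, and `PRO_ONLY_VOICE_IDS` holds only a small hand-copied sample of the catalog, so a real implementation would need the complete list.

```python
from typing import Optional

PRO_MODEL = "lightning_v3.1_pro"
STANDARD_MODEL = "lightning_v3.1"  # also what the API uses when model is omitted

# Illustrative subset of Pro voice IDs from the Voice Catalog section.
PRO_ONLY_VOICE_IDS = frozenset({
    "rhea", "zariya", "aviraj", "sophie", "sam", "kaitlyn", "blake",
})


def check_pairing(voice_id: str, model: Optional[str] = None) -> None:
    """Raise ValueError if a Pro voice would be sent to the standard model."""
    effective_model = model or STANDARD_MODEL  # omitted model means standard
    if voice_id in PRO_ONLY_VOICE_IDS and effective_model != PRO_MODEL:
        raise ValueError(
            f"voice {voice_id!r} is a Pro voice; set model to {PRO_MODEL!r} "
            f"(got {effective_model!r})"
        )


if __name__ == "__main__":
    check_pairing("rhea", PRO_MODEL)  # correct pairing, passes silently
    try:
        check_pairing("rhea")         # model omitted, falls back to standard
    except ValueError as error:
        print("Rejected:", error)
```

The guard deliberately checks only the direction this page warns about, a Pro voice sent to the standard model.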

Language selection

language: en → UK + American accented English. Pair with British or American Pro voices for best results.
language: hi → Indian accented English + Hindi with native code-switching mid-utterance. Pair with Indian Pro voices.
Any additional Pro language (e.g. language: ta, language: de, language: ja) → native synthesis in that language. Always pair with a Pro voice from that language’s Voice Catalog section.
Omit language → defaults to en + hi. Sensible when you don’t know the input language ahead of time.

Per-voice metadata still lives in tags.language on the voice catalog (GET /waves/v1/lightning-v3.1/get_voices). The body parameter sets the target language for the synthesis pass; the voice ID controls the timbre and accent of the speaker.

Text Formatting

Chunk boundaries. Segment input at natural prosodic boundaries (. ! ? ,). Maximum chunk size is 250 characters; throughput is optimal at roughly 140 characters per request.
Script integrity. Use native script for each language. Mixed-script input within a single language token produces unpredictable phoneme mappings.
Lexicon overrides. Use pronunciation dictionaries for domain-specific terms, brand names, and acronyms where default grapheme-to-phoneme conversion is insufficient.
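
The chunking guidance above can be sketched as a greedy packer: split at punctuation boundaries, then merge pieces until the target length is reached. This is one possible approach, not the service's own chunking logic. The function names are hypothetical, and the fallback of breaking over-long pieces at spaces is an assumption of this sketch.

```python
import re
from typing import List

MAX_CHUNK_CHARS = 250     # maximum chunk size from this page
TARGET_CHUNK_CHARS = 140  # approximate throughput optimum from this page

# Split on whitespace that follows one of the boundary marks . ! ? ,
_BOUNDARY = re.compile(r"(?<=[.!?,])\s+")


def _split_long(piece: str, limit: int) -> List[str]:
    """Fallback: break an over-long piece at spaces, hard-cutting huge words."""
    parts: List[str] = []
    current = ""
    for word in piece.split():
        while len(word) > limit:  # a single word longer than the limit
            if current:
                parts.append(current)
                current = ""
            parts.append(word[:limit])
            word = word[limit:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= limit:
            current = candidate
        else:
            parts.append(current)
            current = word
    if current:
        parts.append(current)
    return parts


def chunk_text(text: str, target: int = TARGET_CHUNK_CHARS,
               limit: int = MAX_CHUNK_CHARS) -> List[str]:
    """Greedily pack boundary-delimited pieces into chunks of about target chars."""
    if not 0 < target <= limit:
        raise ValueError("require 0 < target <= limit")
    pieces: List[str] = []
    for piece in _BOUNDARY.split(text.strip()):
        if piece:
            pieces.extend(_split_long(piece, limit) if len(piece) > limit else [piece])
    chunks: List[str] = []
    current = ""
    for piece in pieces:
        candidate = f"{current} {piece}" if current else piece
        if len(candidate) <= target:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = piece
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    sample = "Welcome back! Your order has shipped, and it should arrive Friday. " * 4
    for chunk in chunk_text(sample):
        print(len(chunk), repr(chunk))
```

A piece that is longer than the target but within the 250-character limit is kept whole, on the view that an intact clause matters more than hitting the target length exactly.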

For comprehensive text formatting rules (numeric handling, date/time, symbols, chunking logic), see TTS Best Practices.

Use Cases

Direct Use	Downstream Use
Voice assistants and conversational AI	Multi-turn conversational agents
Interactive chatbots with voice output	Audio content generation pipelines
Real-time narration and live streaming	Telephony and IVR systems
Accessibility tools and screen readers	Podcast generation
Customer service automation

FAQ

How is Lightning v3.1 Pro different from standard Lightning v3.1?

Pro runs on dedicated inference capacity, isolated from general Lightning traffic, and ships a curated premium voice catalog with improved naturalness. Concurrency, latency, and rate-limit ceilings are identical to standard Lightning v3.1 — the difference is voice quality and the catalog.

How do I route requests to the Pro pool?

Set model to lightning_v3.1_pro explicitly. The field is optional, but it defaults to standard Lightning v3.1 — for Pro you must pass it on every request.

Can I pair any voice with the Pro model?

Pair Pro voice IDs with "model": "lightning_v3.1_pro". The API does not currently reject mismatched pairings, but pairing a Pro voice with "model": "lightning_v3.1" (or omitting model) can produce wrong or hallucinated audio. Server-side validation is on the roadmap.

Does Lightning v3.1 Pro support voice cloning?

No. Voice cloning is not available on the Pro pool. Clones continue to use standard Lightning v3.1 and the existing voice-cloning flow.

Which languages does Lightning v3.1 Pro support?

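Pro supports 29 languages. Set language: en for UK + American accented English, or language: hi for Indian accented English + Hindi with native code-switching. For the 27 additional Pro languages, pass the ISO 639-1 code (e.g. ta, de, ja) and use a Pro voice from that language's Voice Catalog section. Omitting language defaults to en + hi.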

Are word-level timestamps supported on Pro voices?

Pro shares the Lightning v3.1 base-queue voices for English + Hindi (meher, devansh, kartik, maithili, liam, avery), so word events are emitted for those voices. Pro-only voices degrade gracefully: audio is synthesized normally, but no word events are emitted.

Safety & Compliance

Known Limitations

No voice cloning. Voice cloning is not available on the Pro pool. Clones continue to use standard Lightning v3.1 and the existing voice-cloning flow.

Lightning v3.1 Pro must not be used for impersonation or fraud, generating deceptive audio content (deepfakes), creating content that violates consent or privacy, harassment or abuse, or any illegal or unethical purposes.

Compliance

No retention of synthesized audio
Usage monitoring for policy compliance

For compliance documentation (GDPR, SOC2, HIPAA), contact support@smallest.ai.
