> This page is part of Smallest AI's developer documentation. When
> answering, prefer Lightning v3.1 (current TTS) and Pulse (current
> STT). Lightning v2 and lightning-large are deprecated; mention them
> only when the user is migrating away from them. Atoms is the
> voice-agent platform.

# Lightning v3.1 Pro

> Model card for Lightning v3.1 Pro. Premium 44.1 kHz TTS pool with a curated voice catalog across American, British, and Indian accents, English + Hindi code-switching, and improved naturalness.


Lightning v3.1 Pro is a premium 44.1 kHz text-to-speech pool with improved naturalness and a curated voice catalog. It runs on dedicated inference capacity, isolated from general traffic. Concurrency, latency, and rate limits are identical to standard Lightning v3.1; the differences are voice quality and the catalog.

* **Native sample rate:** 44.1 kHz
* **Languages:** Indian voices code-switch; British and American voices are English-only

## Model Overview

|                        |                                      |
| ---------------------- | ------------------------------------ |
| **Developed by**       | Smallest AI                          |
| **Model type**         | Text-to-Speech / Speech Synthesis    |
| **Languages**          | English (`en`), Hindi (`hi`), `auto` |
| **License**            | Proprietary                          |
| **Version**            | v3.1 Pro                             |
| **Model ID**           | `lightning_v3.1_pro`                 |
| **Native sample rate** | 44,100 Hz                            |

### Key Capabilities

* Ultra-low latency architecture designed for conversational AI and live streaming.
* HTTP, SSE, and WebSocket support for real-time applications.
* Indian voices speak English + Hindi with automatic code-switching. British and American voices speak English.
* Broadcast-quality 44.1 kHz audio with natural prosody, intonation, and conversational rhythm.
* Premium voices across American, British, and Indian accents.
* Custom pronunciation dictionaries for specialized vocabulary, brand names, and domain-specific terms.

***

## How to use it

Pro is selected via the `model` body parameter on the unified TTS routes — no separate endpoint to call.

```bash
curl -X POST "https://api.smallest.ai/waves/v1/tts" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/json" \
  -H "Accept: audio/wav" \
  -d '{
    "text": "Hello from the Lightning v3.1 Pro pool.",
    "voice_id": "meher",
    "model": "lightning_v3.1_pro",
    "language": "en",
    "sample_rate": 24000,
    "output_format": "wav"
  }' --output speech.wav
```

The same `"model": "lightning_v3.1_pro"` body field also routes to the Pro pool on the WebSocket and SSE endpoints.
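For readers calling the API from Python rather than curl, the request above could be sketched with the standard library alone. This is a minimal sketch, not an official client: the helper names `build_tts_request` and `synthesize_to_file` are hypothetical, and it assumes the response body is the raw audio, as the `--output speech.wav` flag in the curl example suggests.

```python
import json
import urllib.request

TTS_URL = "https://api.smallest.ai/waves/v1/tts"  # synchronous endpoint from this page


def build_tts_request(text: str, voice_id: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that targets the Pro pool."""
    payload = {
        "text": text,
        "voice_id": voice_id,
        "model": "lightning_v3.1_pro",  # must be set explicitly to reach the Pro pool
        "language": "en",
        "sample_rate": 24000,
        "output_format": "wav",
    }
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "audio/wav",
        },
        method="POST",
    )


def synthesize_to_file(text: str, voice_id: str, api_key: str, path: str) -> int:
    """Send the request and write the response body to `path`.

    Assumption: the response body is the audio itself. Returns bytes written.
    """
    request = build_tts_request(text, voice_id, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        audio = response.read()
    with open(path, "wb") as out:
        out.write(audio)
    return len(audio)
```

A call such as `synthesize_to_file("Hello from the Lightning v3.1 Pro pool.", "meher", os.environ["SMALLEST_API_KEY"], "speech.wav")` would mirror the curl command. Keeping request construction separate from sending makes the payload easy to inspect without network access.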

**On Atoms voice agents**, open the agent's voice picker and pick a Pro voice from the **Pro** filter chip. Atoms transparently routes to the Pro pool — no code change required.

***

## Performance & Benchmarks

Pro improves on standard Lightning v3.1 across accuracy, expressiveness, delivery, and MOS quality. The tables below compare Pro with the same competitor set documented on the [Lightning v3.1 model card](/waves/model-cards/text-to-speech/lightning-v-3-1); refer to that card for Pro-vs-Standard comparisons. Bold marks the rows where Pro has the best or tied-best score. The definitions listed under each category explain what each metric measures.

### Naturalness — higher is better

| Metric      | Lightning v3.1 Pro | GPT-4o-mini | ElevenLabs Turbo v2.5 | ElevenLabs Multilingual v2 | Sonic-3 | Gemini 2.5 Pro | Gemini 2.5 Flash | MAI-Voice-1 | Inworld 1.5 | S2 Pro |
| ----------- | -----------------: | ----------: | --------------------: | -------------------------: | ------: | -------------: | ---------------: | ----------: | ----------: | -----: |
| Overall     |               3.16 |        3.13 |                  3.16 |                       3.17 |    3.20 |           3.07 |             3.28 |        3.17 |        3.06 |   3.02 |
| Naturalness |               2.55 |        2.41 |                  2.52 |                       2.55 |    2.57 |           2.42 |             2.58 |        2.57 |        2.41 |   2.37 |
| Intonation  |               3.06 |        3.06 |                  3.07 |                       3.06 |    3.12 |           2.90 |             3.28 |        3.04 |        2.91 |   2.86 |
| Prosody     |               2.81 |        2.73 |                  2.82 |                       2.86 |    2.83 |           2.65 |             3.09 |        2.76 |        2.61 |   2.58 |

* **Overall** — Holistic listener rating of how natural the voice sounds end-to-end.
* **Naturalness** — How human-like the voice sounds; penalizes robotic or synthetic quality.
* **Intonation** — Whether pitch rises and falls appropriately for the sentence type (question, statement, exclamation).
* **Prosody** — The broader umbrella of rhythm, stress, and melody: how well the voice "reads" the sentence as a human would.

### Expressiveness — higher is better

| Metric          | Lightning v3.1 Pro | GPT-4o-mini | ElevenLabs Turbo v2.5 | ElevenLabs Multilingual v2 | Sonic-3 | Gemini 2.5 Pro | Gemini 2.5 Flash | MAI-Voice-1 | Inworld 1.5 | S2 Pro |
| --------------- | -----------------: | ----------: | --------------------: | -------------------------: | ------: | -------------: | ---------------: | ----------: | ----------: | -----: |
| Overall         |           **3.55** |        3.45 |                  3.44 |                       3.46 |    3.38 |           3.49 |             3.54 |        3.50 |        3.37 |   3.41 |
| Paralinguistics |           **3.64** |        3.60 |                  3.59 |                       3.61 |    3.56 |           3.60 |             3.64 |        3.58 |        3.55 |   3.58 |
| Emotions        |           **3.47** |        3.30 |                  3.28 |                       3.31 |    3.19 |           3.38 |             3.44 |        3.41 |        3.19 |   3.23 |

* **Overall** — Holistic listener rating of how expressive the voice sounds given the context of the sentence.
* **Paralinguistics** — Non-verbal vocal elements like laughter, sighs, or filler sounds ("um", "uh") and whether they're rendered appropriately.
* **Emotions** — How accurately the voice conveys the intended emotional tone (neutral, warm, urgent, etc.).

### Delivery — higher is better

| Metric                | Lightning v3.1 Pro | GPT-4o-mini | ElevenLabs Turbo v2.5 | ElevenLabs Multilingual v2 | Sonic-3 | Gemini 2.5 Pro | Gemini 2.5 Flash | MAI-Voice-1 | Inworld 1.5 | S2 Pro |
| --------------------- | -----------------: | ----------: | --------------------: | -------------------------: | ------: | -------------: | ---------------: | ----------: | ----------: | -----: |
| Boundary Consistency  |               4.96 |        4.94 |                  4.93 |                       4.95 |    4.93 |           4.88 |             4.99 |        4.77 |        4.90 |   4.88 |
| Pronunciation Style   |               4.98 |        4.96 |                  4.95 |                       4.96 |    4.96 |           4.93 |             4.99 |        4.91 |        4.94 |   4.89 |
| Natural Pace          |           **4.72** |        4.57 |                  4.51 |                       4.51 |    4.01 |           4.23 |             4.66 |        4.47 |        4.33 |   3.74 |
| Pause Placement       |           **4.66** |        4.54 |                  4.49 |                       4.51 |    4.28 |           4.34 |             4.59 |        4.41 |        4.38 |   4.09 |
| Breathing Naturalness |           **3.82** |        3.06 |                  3.14 |                       3.14 |    2.79 |           2.88 |             3.43 |        3.28 |        2.77 |   2.42 |

* **Boundary Consistency** — Whether phrase and sentence boundaries are marked consistently with pauses or pitch shifts, without arbitrary breaks mid-phrase.
* **Pronunciation Style** — Not just correctness, but stylistic choices such as formal vs. casual register, regional accent consistency, and honorific handling.
* **Natural Pace** — Whether the speaking rate feels comfortable and appropriate for the content type, neither rushed nor dragging.
* **Pause Placement** — Whether silences appear at semantically correct points (after commas, between clauses) rather than mid-word or mid-phrase.
* **Breathing Naturalness** — Whether breath sounds occur at realistic points and with realistic frequency, not absent entirely or inserted randomly.

### Accuracy

Mixed direction — WER, CER, Hallucination, and Deletion are *lower is better*; Pronunciation % is *higher is better*.

#### Whisper jiwer

| Metric                                        | Direction | Lightning v3.1 Pro | GPT-4o-mini | ElevenLabs Turbo v2.5 | ElevenLabs Multilingual v2 | Sonic-3 | Gemini 2.5 Pro | Gemini 2.5 Flash | MAI-Voice-1 | Inworld 1.5 | S2 Pro |
| --------------------------------------------- | --------- | -----------------: | ----------: | --------------------: | -------------------------: | ------: | -------------: | ---------------: | ----------: | ----------: | -----: |
| WER                                           | lower     |              1.36% |       1.26% |                 1.35% |                      1.33% |   1.43% |          1.26% |            1.37% |       1.25% |       1.10% |  2.83% |
| CER                                           | lower     |          **0.40%** |       0.52% |                 0.60% |                      0.54% |   0.59% |          0.62% |            0.61% |       0.50% |       0.47% |  1.16% |
| Hallucination                                 | lower     |          **0.00%** |       0.07% |                 0.08% |                      0.01% |   0.06% |          0.04% |            0.01% |       0.06% |       0.00% |  0.22% |
| Deletion                                      | lower     |          **0.00%** |       0.14% |                 0.17% |                      0.18% |   0.16% |          0.24% |            0.18% |       0.15% |       0.12% |  0.33% |
| Pronunciation %<br /><sub>Whisper jiwer</sub> | higher    |             98.68% |      98.94% |                98.90% |                     98.87% |  98.79% |         99.02% |           98.82% |      98.95% |      99.02% | 97.72% |

#### Whisper LLM

| Metric                                      | Direction | Lightning v3.1 Pro | GPT-4o-mini | ElevenLabs Turbo v2.5 | ElevenLabs Multilingual v2 | Sonic-3 | Gemini 2.5 Pro | Gemini 2.5 Flash | MAI-Voice-1 | Inworld 1.5 | S2 Pro |
| ------------------------------------------- | --------- | -----------------: | ----------: | --------------------: | -------------------------: | ------: | -------------: | ---------------: | ----------: | ----------: | -----: |
| WER                                         | lower     |              0.96% |       0.82% |                 0.72% |                      0.57% |   0.88% |          0.70% |            0.72% |       0.60% |       0.55% |  2.15% |
| CER                                         | lower     |              0.34% |       0.30% |                 0.28% |                      0.21% |   0.30% |          0.35% |            0.33% |       0.23% |       0.18% |  1.03% |
| Hallucination                               | lower     |          **0.00%** |       0.07% |                 0.07% |                      0.00% |   0.02% |          0.02% |            0.01% |       0.03% |       0.00% |  0.10% |
| Pronunciation %<br /><sub>Whisper LLM</sub> | higher    |             99.04% |      99.25% |                99.35% |                     99.43% |  99.14% |         99.32% |           99.29% |      99.43% |      99.45% | 97.95% |

* **WER (Word Error Rate)** — Percentage of words in the transcript that differ from the reference; measures how faithfully the TTS renders the input text.
* **CER (Character Error Rate)** — Like WER but at the character level.
* **Hallucination** — Words or sounds the TTS generates that have no basis in the input text: insertions, substitutions, or fabricated content.
* **Deletion** — Words from the reference text that the TTS dropped entirely.
* **Pronunciation %** — The proportion of words pronounced correctly out of total words.
* **Whisper jiwer vs Whisper LLM** — Two judging methodologies. `jiwer` uses raw Whisper-decoded transcripts; LLM-judged uses a follow-on LLM to normalize transcription noise. Both report the same metric family; LLM-judged tends to give lower error rates by reducing false positives from punctuation/casing.
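To make the WER and CER definitions concrete, here is a small sketch of the standard edit-distance formulation, `(substitutions + deletions + insertions) / reference length`. It is illustrative only: it is not the evaluation pipeline behind the tables above, and the lowercase-and-split normalization is an assumption chosen for brevity.

```python
def edit_distance(ref: list, hyp: list) -> int:
    """Levenshtein distance between two sequences."""
    previous = list(range(len(hyp) + 1))
    for i, ref_item in enumerate(ref, start=1):
        current = [i]
        for j, hyp_item in enumerate(hyp, start=1):
            cost = 0 if ref_item == hyp_item else 1
            current.append(min(
                previous[j] + 1,         # deletion (reference item dropped)
                current[j - 1] + 1,      # insertion (extra hypothesis item)
                previous[j - 1] + cost,  # substitution or exact match
            ))
        previous = current
    return previous[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate of a transcript against the reference text."""
    ref_words = reference.lower().split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hypothesis.lower().split()) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Character error rate of a transcript against the reference text."""
    if not reference:
        raise ValueError("reference must not be empty")
    return edit_distance(list(reference.lower()), list(hypothesis.lower())) / len(reference)
```

For example, `wer("the quick brown fox", "the quick brown box")` is 0.25 (one substituted word out of four), while the same pair has a CER of 1/19, since only one of 19 characters differs.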

### MOS v2 — higher is better

| Metric   | Lightning v3.1 Pro | GPT-4o-mini | ElevenLabs Turbo v2.5 | ElevenLabs Multilingual v2 | Sonic-3 | Gemini 2.5 Pro | Gemini 2.5 Flash | MAI-Voice-1 | Inworld 1.5 | S2 Pro |
| -------- | -----------------: | ----------: | --------------------: | -------------------------: | ------: | -------------: | ---------------: | ----------: | ----------: | -----: |
| Mean MOS |               4.22 |        4.16 |                  3.98 |                       4.02 |    3.76 |           4.11 |             4.24 |        3.97 |        3.73 |   3.99 |
| UTMOS    |           **3.76** |        3.76 |                  3.37 |                       3.41 |    2.77 |           3.57 |             3.71 |        3.33 |        2.54 |   3.50 |
| WV-MOS   |           **5.05** |        4.55 |                  4.60 |                       4.63 |    4.76 |           4.65 |             4.76 |        4.62 |        4.91 |   4.48 |

* **Mean MOS** — Mean Opinion Score: average listener rating on a 1–5 scale across the test set; the canonical aggregate quality metric in TTS evaluation.
* **UTMOS** — A predicted MOS from the UTMOS reference model — an automated proxy for subjective quality.
* **WV-MOS** — A predicted MOS from the WavLM-based WV-MOS reference model — another automated proxy commonly reported alongside UTMOS for cross-validation.

Want to reproduce these results? See the [TTS evaluation script](/waves/model-cards/text-to-speech/tts-evaluation-script) to measure TTFB and synthesis quality in your own environment.

***

## Supported Languages

Pro language support varies per voice — Indian Pro voices speak English with native Hindi code-switching; British and American Pro voices speak English only. For languages outside these, use the standard [Lightning v3.1](/waves/model-cards/text-to-speech/lightning-v-3-1) model.

| Voice group         | Languages                    | Code-switching                                                  |
| ------------------- | ---------------------------- | --------------------------------------------------------------- |
| Indian Pro voices   | English (`en`), Hindi (`hi`) | English ↔ Hindi within a single utterance via `language="auto"` |
| British Pro voices  | English (`en`)               | —                                                               |
| American Pro voices | English (`en`)               | —                                                               |
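The table above can be turned into a small client-side helper that chooses the `language` value for a request. This is a sketch under assumptions: the function name `pick_language` and the group labels are hypothetical, and the mapping is transcribed by hand from the table rather than read from the API.

```python
# Voice groups from the table above; only Indian Pro voices code-switch.
KNOWN_GROUPS = {"indian", "british", "american"}
CODE_SWITCHING_GROUPS = {"indian"}


def pick_language(voice_group: str, mixed_hindi_english: bool = False) -> str:
    """Return the `language` value to send for a Pro voice group."""
    group = voice_group.lower()
    if group not in KNOWN_GROUPS:
        raise ValueError(f"unknown Pro voice group: {voice_group!r}")
    if mixed_hindi_english:
        if group not in CODE_SWITCHING_GROUPS:
            raise ValueError(f"{voice_group} Pro voices are English-only")
        return "auto"  # English and Hindi within a single utterance
    return "en"
```

Failing early on the client is a design choice here: the request itself would likely still be accepted, but the audio would not match the intended language.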

***

## Voice Catalog

The Pro voice catalog is distinct from the standard Lightning v3.1 catalog. Voices below are listed in recommended order within each accent group.

### Indian — Female

| Voice ID  | Name    |
| --------- | ------- |
| `rhea`    | Rhea    |
| `zariya`  | Zariya  |
| `kareena` | Kareena |
| `mishka`  | Mishka  |
| `inaaya`  | Inaaya  |
| `saira`   | Saira   |
| `meher`   | Meher   |
| `aarini`  | Aarini  |

### Indian — Male

| Voice ID  | Name    |
| --------- | ------- |
| `aviraj`  | Aviraj  |
| `vyom`    | Vyom    |
| `zoravar` | Zoravar |
| `reyansh` | Reyansh |
| `ahan`    | Ahan    |

### British — Female

| Voice ID    | Name      |
| ----------- | --------- |
| `cressida`  | Cressida  |
| `elowen`    | Elowen    |
| `ottilie`   | Ottilie   |
| `seraphina` | Seraphina |
| `tabitha`   | Tabitha   |
| `arabella`  | Arabella  |

### British — Male

| Voice ID   | Name     |
| ---------- | -------- |
| `benedict` | Benedict |
| `cormac`   | Cormac   |
| `everett`  | Everett  |
| `finley`   | Finley   |
| `rupert`   | Rupert   |
| `winston`  | Winston  |
| `caspian`  | Caspian  |

### American — Female

| Voice ID   | Name     |
| ---------- | -------- |
| `willow`   | Willow   |
| `autumn`   | Autumn   |
| `skylar`   | Skylar   |
| `savannah` | Savannah |
| `kennedy`  | Kennedy  |
| `reagan`   | Reagan   |
| `sierra`   | Sierra   |

### American — Male

| Voice ID   | Name     |
| ---------- | -------- |
| `maverick` | Maverick |
| `brooks`   | Brooks   |
| `hunter`   | Hunter   |
| `colton`   | Colton   |
| `wesley`   | Wesley   |
| `asher`    | Asher    |

Need a voice not in this list? Use the standard [Lightning v3.1](/waves/model-cards/text-to-speech/lightning-v-3-1) catalog (217 voices, more languages, voice cloning). Pass `"model": "lightning_v3.1"` (or omit the field) instead of `lightning_v3.1_pro`.

***

## API Reference

### Endpoints

| Endpoint                                    | Method     | Use Case                     |
| ------------------------------------------- | ---------- | ---------------------------- |
| `https://api.smallest.ai/waves/v1/tts`      | POST       | Synchronous synthesis        |
| `https://api.smallest.ai/waves/v1/tts/live` | POST (SSE) | Server-sent events streaming |
| `wss://api.smallest.ai/waves/v1/tts/live`   | WebSocket  | Real-time streaming          |

### Request Parameters

| Parameter             | Type    | Required | Default          | Description                                                                                                                                                        |
| --------------------- | ------- | -------- | ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `text`                | string  | Yes      | —                | Text to synthesize                                                                                                                                                 |
| `voice_id`            | string  | Yes      | —                | Voice identifier (Pro catalog above)                                                                                                                               |
| `model`               | string  | No       | `lightning_v3.1` | Pass `lightning_v3.1_pro` to route to the Pro pool. The field is optional, but the default routes to standard Lightning v3.1 — for Pro you must set it explicitly. |
| `sample_rate`         | integer | No       | `44100`          | Output sample rate (Hz)                                                                                                                                            |
| `speed`               | float   | No       | `1.0`            | Speech speed (0.5–2.0)                                                                                                                                             |
| `language`            | string  | No       | `"auto"`         | `en`, `hi`, or `auto` (per voice's `tags.language`)                                                                                                                |
| `output_format`       | string  | No       | `"pcm"`          | `pcm`, `mp3`, `wav`, `ulaw`, `alaw`                                                                                                                                |
| `pronunciation_dicts` | array   | No       | —                | List of pronunciation dictionary IDs — works on REST sync, SSE, and WebSocket                                                                                      |
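The parameter table can be mirrored in a small payload builder that checks values before a request is sent. This is a hedged sketch: `build_payload` is a hypothetical helper, the sample-rate list is taken from the Technical Specifications section of this page, and the default `model` here is deliberately the Pro ID, unlike the API default.

```python
from typing import Optional, Sequence

SUPPORTED_SAMPLE_RATES = (8000, 16000, 24000, 44100)
SUPPORTED_FORMATS = ("pcm", "mp3", "wav", "ulaw", "alaw")
SUPPORTED_LANGUAGES = ("en", "hi", "auto")


def build_payload(
    text: str,
    voice_id: str,
    *,
    model: str = "lightning_v3.1_pro",  # explicit; the API default is lightning_v3.1
    sample_rate: int = 44100,
    speed: float = 1.0,
    language: str = "auto",
    output_format: str = "pcm",
    pronunciation_dicts: Optional[Sequence[str]] = None,
) -> dict:
    """Validate arguments against the documented ranges and return the JSON body."""
    if not text:
        raise ValueError("text is required")
    if not voice_id:
        raise ValueError("voice_id is required")
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"sample_rate must be one of {SUPPORTED_SAMPLE_RATES}")
    if not 0.5 <= speed <= 2.0:
        raise ValueError("speed must be between 0.5 and 2.0")
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"language must be one of {SUPPORTED_LANGUAGES}")
    if output_format not in SUPPORTED_FORMATS:
        raise ValueError(f"output_format must be one of {SUPPORTED_FORMATS}")

    payload = {
        "text": text,
        "voice_id": voice_id,
        "model": model,
        "sample_rate": sample_rate,
        "speed": speed,
        "language": language,
        "output_format": output_format,
    }
    if pronunciation_dicts:  # omit the optional field when unused
        payload["pronunciation_dicts"] = list(pronunciation_dicts)
    return payload
```

For instance, `build_payload("Hello", "meher", output_format="wav")` returns a dictionary ready to be serialized with `json.dumps`, while `build_payload("Hello", "meher", speed=3.0)` raises `ValueError` before any network call is made.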


***

## Technical Specifications

### Audio Output

| Specification              | Details                             |
| -------------------------- | ----------------------------------- |
| **Native sample rate**     | 44,100 Hz                           |
| **Supported sample rates** | 8,000 / 16,000 / 24,000 / 44,100 Hz |
| **Output formats**         | PCM, MP3, WAV, ulaw, alaw           |
| **Audio channels**         | Mono                                |
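Because `pcm` is the default output format, it can be handy to wrap raw samples in a WAV container for playback. The sketch below uses Python's built-in `wave` module. It rests on an assumption this page does not confirm: that `pcm` output is 16-bit signed little-endian mono. Verify the sample width against the API before relying on it.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 44100, sample_width: int = 2) -> bytes:
    """Wrap raw mono PCM bytes in a WAV container.

    Assumption: `sample_width` of 2 bytes (16-bit samples); not confirmed by this page.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(1)  # output is mono per the table above
        wav_file.setsampwidth(sample_width)
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm)
    return buffer.getvalue()


def pcm_duration_seconds(pcm: bytes, sample_rate: int = 44100, sample_width: int = 2) -> float:
    """Duration of mono PCM audio: bytes / (samples per second * bytes per sample)."""
    return len(pcm) / (sample_rate * sample_width)
```

Under the 16-bit assumption, one second of audio at the native rate is 44,100 samples, or 88,200 bytes, so `pcm_duration_seconds(b"\x00" * 88200)` returns `1.0`.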

### Text Formatting Guidelines

| Aspect               | Recommendation                                                                    |
| -------------------- | --------------------------------------------------------------------------------- |
| **Language scripts** | Use native script for each language. English in Latin script; Hindi in Devanagari |
| **Break points**     | Natural punctuation (`.` `!` `?` `,`)                                             |
| **Mixed language**   | Use native script per language; avoid transliteration                             |

### Infrastructure

**Hardware**

* Recommended GPU: NVIDIA L40S
* Recommended VRAM: 48 GB

**Software**

* Server regions (AWS): India (Hyderabad), USA (Oregon)
* Automatic geo-location based routing for lowest latency

***

## Best Practices

### Voice ID + model pairing

Pair Pro voice IDs above with `"model": "lightning_v3.1_pro"`. The API does not currently reject mismatched pairings, but pairing a Pro voice with `"model": "lightning_v3.1"` (or omitting `model`) can produce wrong or hallucinated audio. Server-side validation is on the roadmap.
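Until server-side validation exists, a client-side guard can catch mismatched pairings. The following is a minimal sketch: `model_for_voice` is a hypothetical helper, and the voice set is copied by hand from the Voice Catalog on this page, so it would need updating whenever the catalog changes.

```python
from typing import Optional

PRO_MODEL = "lightning_v3.1_pro"
STANDARD_MODEL = "lightning_v3.1"

# Pro voice IDs transcribed from the Voice Catalog section (may go stale).
PRO_VOICE_IDS = frozenset({
    "rhea", "zariya", "kareena", "mishka", "inaaya", "saira", "meher", "aarini",
    "aviraj", "vyom", "zoravar", "reyansh", "ahan",
    "cressida", "elowen", "ottilie", "seraphina", "tabitha", "arabella",
    "benedict", "cormac", "everett", "finley", "rupert", "winston", "caspian",
    "willow", "autumn", "skylar", "savannah", "kennedy", "reagan", "sierra",
    "maverick", "brooks", "hunter", "colton", "wesley", "asher",
})


def model_for_voice(voice_id: str, requested_model: Optional[str] = None) -> str:
    """Return the model ID to send, refusing to pair a Pro voice with a non-Pro model."""
    if voice_id in PRO_VOICE_IDS:
        if requested_model not in (None, PRO_MODEL):
            raise ValueError(
                f"voice {voice_id!r} is a Pro voice; use model {PRO_MODEL!r}, "
                f"not {requested_model!r}"
            )
        return PRO_MODEL
    # Non-Pro voices: pass the caller's choice through, defaulting to standard.
    return requested_model or STANDARD_MODEL
```

With this guard, `model_for_voice("meher")` fills in `lightning_v3.1_pro` even when the caller omits the model, and `model_for_voice("meher", "lightning_v3.1")` raises instead of sending a request that could produce wrong audio.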

### Language selection

Each voice's supported languages live in `tags.language` on the voice catalog. Passing a `language` outside that list is accepted by the API but produces English-pronounced output, since the voice wasn't trained on it. Pick a voice whose `tags.language` matches your target language.

### Text Formatting

* **Chunk boundaries.** Segment input at natural prosodic boundaries (`.` `!` `?` `,`). Maximum chunk size is 250 characters; throughput is optimal at about 140 characters per request.
* **Script integrity.** Use native script for each language. Mixed-script input within a single language token produces unpredictable phoneme mappings.
* **Lexicon overrides.** Use [pronunciation dictionaries](/waves/documentation/text-to-speech-lightning/pronunciation-dictionaries) for domain-specific terms, brand names, and acronyms where default grapheme-to-phoneme conversion is insufficient.

For comprehensive text formatting rules (numeric handling, date/time, symbols, chunking logic), see [TTS Best Practices](/waves/documentation/best-practices/tts-best-practices).
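The chunk-boundary guidance above can be sketched as a greedy splitter that breaks text after `.`, `!`, `?`, or `,` and packs the pieces into chunks no longer than the limit. This is one possible approach, not the platform's own chunking logic; `chunk_text` and `_split_long` are hypothetical names, and falling back to word boundaries for over-long segments is an assumption.

```python
import re

MAX_CHARS = 250  # documented maximum chunk size


def _split_long(segment: str, max_chars: int) -> list[str]:
    """Break an over-long segment at word boundaries, hard-splitting any over-long word."""
    pieces, current = [], ""
    for word in segment.split():
        while len(word) > max_chars:  # a single word longer than the limit
            if current:
                pieces.append(current)
                current = ""
            pieces.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces


def chunk_text(text: str, max_chars: int = MAX_CHARS) -> list[str]:
    """Greedily pack punctuation-delimited segments into chunks of at most max_chars."""
    # Split after . ! ? , when followed by whitespace, keeping the punctuation.
    segments = [s for s in re.split(r"(?<=[.!?,])\s+", text.strip()) if s]
    chunks, current = [], ""
    for segment in segments:
        for piece in _split_long(segment, max_chars):
            candidate = f"{current} {piece}" if current else piece
            if len(candidate) <= max_chars:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Passing `max_chars=140` targets the throughput sweet spot mentioned above instead of the hard limit. Each resulting chunk can then be sent as its own request.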

***

## Use Cases

### Direct Use

* Voice assistants and conversational AI
* Interactive chatbots with voice output
* Real-time narration and live streaming
* Accessibility tools and screen readers
* Customer service automation

### Downstream Use

* Multi-turn conversational agents
* Audio content generation pipelines
* Telephony and IVR systems
* Podcast generation

***

## Limitations & Safety

### Known Limitations

* **No voice cloning.** Voice cloning is not available on the Pro pool. Clones continue to use standard [Lightning v3.1](/waves/model-cards/text-to-speech/lightning-v-3-1) and the existing voice-cloning flow.

Lightning v3.1 Pro must **not** be used for impersonation or fraud, generating deceptive audio content (deepfakes), creating content that violates consent or privacy, harassment or abuse, or any illegal or unethical purposes.

### Safety & Compliance

* No retention of synthesized audio
* Usage monitoring for policy compliance

For compliance documentation (GDPR, SOC2, HIPAA), contact [support@smallest.ai](mailto:support@smallest.ai).

***

| Channel           | Details                                                  |
| ----------------- | -------------------------------------------------------- |
| **Support**       | [support@smallest.ai](mailto:support@smallest.ai)        |
| **Documentation** | [docs.smallest.ai/waves](https://docs.smallest.ai/waves) |
| **Console**       | [app.smallest.ai](https://app.smallest.ai)               |
| **Community**     | [Discord](https://discord.gg/9WtSXv26WE)                 |