
# Voice Cloning Best Practices

> Guidelines for recording reference audio and achieving high-quality voice clones.

High-quality reference audio is the single most important factor in clone quality. These guidelines cover recording environment, speaking style, multi-lingual cloning, and expressive control.

You can clone a voice directly in the console: upload 5 to 15 seconds of audio, no code required.

***

## Recording Reference Audio

### Environment

* Record in a quiet room with minimal background noise. Ambient noise, hiss, or rumble will be captured in the clone.
* Use a dedicated microphone when possible. MacBook and mobile device microphones are acceptable if positioned at an appropriate distance to avoid distortion.
* Avoid rooms with echo (large empty spaces, outdoor areas). Small treated rooms produce the best results.
* After recording, listen back to the audio before uploading. Verify it is free of interruptions, clipping, or background interference.
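
Listening back remains the primary check, but clipping can also be screened for numerically. The sketch below is one way to do that, assuming the recording is a 16-bit PCM WAV file; the function name `clipped_fraction` and the `threshold` parameter are illustrative and not part of any Smallest AI tooling.

```python
import array
import sys
import wave


def clipped_fraction(path: str, threshold: float = 0.999) -> float:
    """Return the fraction of samples at or near full scale (16-bit PCM WAV only)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    if not samples:
        return 0.0

    # Samples whose magnitude reaches this limit are counted as clipped.
    limit = int(32767 * threshold)
    clipped = sum(1 for s in samples if abs(s) >= limit)
    return clipped / len(samples)
```

A result well above zero suggests the input gain was too high or the microphone was too close. What fraction is acceptable is a judgment call, so treat any cutoff you pick as a heuristic and confirm by ear.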

### Speaking Style

* Speak naturally in your normal conversational voice. The model captures timbre, accent, emotional tone, rhythm, and pacing automatically.
* Maintain a consistent pace throughout the recording. Avoid long pauses, as they can degrade clone quality.
* Do not exaggerate emotion unless a specific tone is the intended output (see [Expressive Cloning](#expressive-cloning) below).
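
To spot long pauses before uploading, a rough silence scan can help. The following is a minimal sketch, assuming a mono 16-bit PCM WAV file. The name `longest_pause_seconds`, the `silence_peak` amplitude threshold, and the 20 ms window are all assumptions chosen for illustration, not values from the platform.

```python
import array
import sys
import wave


def longest_pause_seconds(path: str, silence_peak: int = 500, window_ms: int = 20) -> float:
    """Return the longest run of near-silent audio, in seconds (mono 16-bit PCM WAV)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2 or wav.getnchannels() != 1:
            raise ValueError("this sketch only handles mono 16-bit PCM WAV files")
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    window = max(1, rate * window_ms // 1000)
    longest = current = 0
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        # A window counts as silent when its peak amplitude stays under the threshold.
        if max(abs(s) for s in chunk) < silence_peak:
            current += len(chunk)
            longest = max(longest, current)
        else:
            current = 0
    return longest / rate
```

If the reported pause is noticeably longer than a natural breath between sentences, consider re-recording or trimming the gap. The right `silence_peak` depends on your room's noise floor.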

### Audio Length

* Provide **5 to 15 seconds** of clean, continuous speech.
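
The length requirement is easy to check before uploading. The sketch below uses Python's standard `wave` module and assumes the reference is a WAV file; the helper names and the constants `MIN_SECONDS` and `MAX_SECONDS` are illustrative and simply mirror the 5 to 15 second range stated above.

```python
import wave

MIN_SECONDS = 5.0   # lower bound from the guideline above
MAX_SECONDS = 15.0  # upper bound from the guideline above


def reference_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_reference_length(path: str) -> bool:
    """Check whether the recording falls inside the recommended range."""
    return MIN_SECONDS <= reference_duration_seconds(path) <= MAX_SECONDS
```

This only measures total file length. It does not confirm that the speech is clean or continuous, so it complements rather than replaces listening to the recording.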

***

## Multi-Lingual Voice Cloning

### Language Matching

For best results, record reference audio in the same language as your intended output. The model supports cross-lingual cloning (e.g., English reference audio used for Spanish output), but a language-matched reference will always produce higher fidelity.

| Scenario                                      | Expected Quality                                                               |
| --------------------------------------------- | ------------------------------------------------------------------------------ |
| Reference and output in the same language     | Best results. Highest phonetic accuracy.                                       |
| Reference in a different language than output | Functional. Voice characteristics transfer, but the source accent is retained. |

### Accent Retention

When synthesizing in a different language than the reference audio, the original accent is preserved. A clone from a South Indian English speaker will retain that accent when generating Hindi or Tamil output. This is by design: the clone reproduces *your* voice, including accent characteristics.

If accent-neutral output is required for a specific language, provide reference audio recorded by a native speaker of that language.

### Language Group Constraints

Cloned voices follow the same language group routing rules as standard synthesis. See [Code-Switching](/waves/model-cards/text-to-speech/lightning-v-3-1#code-switching) for details on Indic and Global group restrictions.

***

## Expressive Cloning

The model captures emotional and prosodic characteristics from the reference audio. The tone, pace, and volume of the reference directly influence the synthesized output.

### Emotional Control

The emotion conveyed in the reference audio (e.g., calm, happy, angry) is reflected in the generated speech. To produce an angry-sounding clone, provide an angry reference. To produce a neutral clone, provide a neutral reference.

### Speed Control

The pace of the reference audio determines the output speed. A fast-paced reference produces faster delivery; a slower reference produces more measured output.

### Volume Control

The volume level in the reference audio carries over to the output. A soft-spoken reference produces quieter output; a louder, more energetic reference produces louder, more energetic output.
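
Because reference loudness carries over, it can be useful to measure it when comparing takes. The sketch below computes the RMS level of a recording in dBFS (decibels relative to full scale), assuming a 16-bit PCM WAV file. The function name `rms_dbfs` is illustrative, and the platform does not document a target level, so use the number only to compare recordings against each other.

```python
import array
import math
import sys
import wave


def rms_dbfs(path: str) -> float:
    """Return the RMS level of a 16-bit PCM WAV file in dBFS (0 dBFS is full scale)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian

    if not samples:
        return float("-inf")

    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    # 10*log10 of the power ratio equals 20*log10 of the amplitude ratio.
    return 10 * math.log10(mean_square / (32768.0 ** 2))
```

A soft-spoken take will report a lower (more negative) value than an energetic one recorded with the same microphone and gain settings.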

***

## Reference Audio Examples

Audio samples are embedded as video due to platform constraints.

### Good Reference Audio

Clear, consistent tone with no background noise.

<video controls autoplay>
  <source src="https://files.buildwithfern.com/smallest-ai.docs.buildwithfern.com/f7851c2f18608e8fd4f11ef76b8cd04f1e07f3ee597937b00a87b877341a45c7/products/waves/pages/video/good_ref_t.mp4" type="video/mp4" />
</video>

### Bad Reference Audio

**Background noise present.**

<video controls autoplay>
  <source src="https://files.buildwithfern.com/smallest-ai.docs.buildwithfern.com/2844138a8ac40cfa4827b0f7e8a89695306d172ad1baa93c14a604ddb5507eef/products/waves/pages/video/bg_ref_t.mp4" type="video/mp4" />
</video>

**Inconsistent speaking style.**

<video controls autoplay>
  <source src="https://files.buildwithfern.com/smallest-ai.docs.buildwithfern.com/992a2173ab1e6b51c567e1a8d6f350a64cc224a9c33775ecf5375bdee9f6a443/products/waves/pages/video/inconsistent_ref_t.mp4" type="video/mp4" />
</video>

**Overlapping voices.**

<video controls autoplay>
  <source src="https://files.buildwithfern.com/smallest-ai.docs.buildwithfern.com/1a81040dce57fb03442b85c01def02abba1b182651bc19db4ba11a93e6e110dc/products/waves/pages/video/overlap_ref_t.mp4" type="video/mp4" />
</video>

***

## Expressive Audio Examples

### Angry Tone

**Reference:**

<video controls autoplay>
  <source src="https://files.buildwithfern.com/smallest-ai.docs.buildwithfern.com/9960bdaf32924873a755c5dbd8ff99d746de77eeadf12384b4726036a63c5fd6/products/waves/pages/video/angry_ref_t.mp4" type="video/mp4" />
</video>

**Output:**

<video controls autoplay>
  <source src="https://files.buildwithfern.com/smallest-ai.docs.buildwithfern.com/fc4a7130906315b5440124a33fe0927663a85adc17ae397ba20244ecf9908215/products/waves/pages/video/angry_gen_t.mp4" type="video/mp4" />
</video>

### Whisper Tone

**Reference:**

<video controls autoplay>
  <source src="https://files.buildwithfern.com/smallest-ai.docs.buildwithfern.com/5471a19b705186d022452c1e396bbba4841c8edce712718b4f1f71c043738037/products/waves/pages/video/whisper_ref_t.mp4" type="video/mp4" />
</video>

**Output:**

<video controls autoplay>
  <source src="https://files.buildwithfern.com/smallest-ai.docs.buildwithfern.com/bc349e0123a2c305749239e033a09e084efbe4840e9b7ca6ed389317e4354c0f/products/waves/pages/video/whisper_gen_t.mp4" type="video/mp4" />
</video>

### Fast-Paced Tone

**Reference:**

<video controls autoplay>
  <source src="https://files.buildwithfern.com/smallest-ai.docs.buildwithfern.com/153f7506c813ecd2e27fae64c1f7a0cb93d55e7dadda3e97de6a4296c877b3cd/products/waves/pages/video/fast_ref_t.mp4" type="video/mp4" />
</video>

**Output:**

<video controls autoplay>
  <source src="https://files.buildwithfern.com/smallest-ai.docs.buildwithfern.com/36efbe435c03c56095bca5511ba45e7ebee7802ef09a27b3ae4e4cabdcd47c75/products/waves/pages/video/fast_gen_t.mp4" type="video/mp4" />
</video>