To achieve the best results when cloning your voice, it’s essential to provide high-quality reference audio. Below are some best practices, dos and don’ts, and examples to guide you.

Ready to clone your voice? Try it out on our platform.

🎙️ How to Record Reference Audio

Environment
- Record in a quiet room with minimal background noise.
- Use a good-quality microphone. Dedicated mics are ideal, but MacBook and mobile microphones also work well for this purpose.
- When recording on a phone or laptop, place the device at a moderate distance, neither too far nor too close, so the sound stays clear and natural without distortion.
- Make sure the recording environment doesn’t introduce echo or distortion (e.g., avoid large empty rooms or outdoor spaces).
- After uploading the audio, listen to it to ensure it is clear and free of interruptions, background noise, or distortion.
Speaking Style
- Speak naturally and avoid excessive emotion unless a specific tone is required.
- Maintain a consistent pace and tone throughout the recording. Be mindful of long pauses, as they can impact the quality of the cloned voice.
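The advice about long pauses can be checked before uploading. The following is a minimal sketch, not part of the platform: it assumes the recording has already been decoded into samples normalized to the range -1.0 to 1.0, and the function name `find_long_pauses`, the silence threshold, and the one-second minimum pause are illustrative assumptions, since this guide does not define how long a "long pause" is.

```python
import math
from typing import List, Sequence, Tuple


def find_long_pauses(
    samples: Sequence[float],
    sample_rate: int,
    silence_threshold: float = 0.01,  # assumed RMS level treated as silence
    min_pause_seconds: float = 1.0,   # assumed definition of a "long" pause
    window_seconds: float = 0.02,     # analysis window of 20 ms
) -> List[Tuple[float, float]]:
    """Return (start, end) times in seconds of quiet stretches that last
    at least min_pause_seconds."""
    window = max(1, int(sample_rate * window_seconds))
    pauses: List[Tuple[float, float]] = []
    pause_start = None  # sample index where the current quiet stretch began

    for offset in range(0, len(samples), window):
        chunk = samples[offset:offset + window]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        if rms < silence_threshold:
            if pause_start is None:
                pause_start = offset
        elif pause_start is not None:
            # The quiet stretch just ended; keep it only if it was long enough.
            if (offset - pause_start) / sample_rate >= min_pause_seconds:
                pauses.append((pause_start / sample_rate, offset / sample_rate))
            pause_start = None

    # Handle a quiet stretch that runs to the end of the recording.
    if pause_start is not None:
        if (len(samples) - pause_start) / sample_rate >= min_pause_seconds:
            pauses.append((pause_start / sample_rate, len(samples) / sample_rate))
    return pauses
```

If the function returns any intervals, consider trimming those sections or re-recording before using the clip as reference audio.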
Length of Audio
- Provide 5 to 15 seconds of clean audio.
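A quick local check can confirm that a recording falls within the 5 and 15 second bounds mentioned above. This sketch assumes the reference audio is saved as an uncompressed PCM WAV file, which is what Python's standard `wave` module reads; the helper names are hypothetical and not part of the platform.

```python
import wave

MIN_SECONDS = 5.0   # lower bound from the guideline above
MAX_SECONDS = 15.0  # upper bound from the guideline above


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_valid_length(seconds: float) -> bool:
    """True when the clip length is within the recommended range."""
    return MIN_SECONDS <= seconds <= MAX_SECONDS
```

For example, `is_valid_length(wav_duration_seconds("reference.wav"))` reports whether a file named `reference.wav` (a placeholder name) is within range. Other formats such as MP3 would need a different decoder.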

🎧 Examples of Good and Bad Reference Audio

NOTE: Currently, there is no direct support for adding audio to Mintlify. As a workaround, we have embedded a video to include the necessary audio content.

Good Reference Audio

High-quality, clear, and consistent tone.

Bad Reference Audio

With Background Noise
Inconsistent Speaking Style
Overlapping Voices

🎭 Creating Expressive Voice Clones

Our platform supports emotional reference audio: the emotion, pitch, and tone of the reference audio influence the output. This is ideal for creating expressive clones that match your intended tone.

😄 Emotional Control

The emotions in the reference audio (e.g., angry, happy, sad) directly impact the tone of the generated voice.
For example, if the reference audio conveys happiness, the output will replicate that cheerful tone.

⚡ Speed Control

The pace of your reference audio determines the speed of the output.
A fast-paced reference will generate a similarly fast delivery, while a slower reference will produce a more measured delivery.

🔊 Loudness Control

The loudness or volume in your reference audio is reflected in the output.
For instance, a soft-spoken input will result in a quieter clone, while a louder, more energetic recording will produce a bolder output.
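Because the loudness of the reference carries over to the clone, it can help to compare takes by level before choosing one. The sketch below computes the RMS level in dBFS, a common rough measure of loudness. It assumes samples normalized to the range -1.0 to 1.0, the function name `rms_dbfs` is hypothetical, and nothing here reflects how the platform itself measures loudness.

```python
import math
from typing import Sequence


def rms_dbfs(samples: Sequence[float]) -> float:
    """Return the RMS level in dBFS for samples normalized to -1.0..1.0.

    0 dBFS corresponds to a full-scale signal; quieter audio gives more
    negative values. Silence or empty input returns negative infinity.
    """
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")
```

A soft-spoken take will show a noticeably lower value than an energetic one, which gives a simple way to pick the reference that matches the loudness you want in the output.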

🎧 Emotional Reference Audio Examples

Angry Tone

Reference Audio Sample:
Output Audio Example:

Soft-Spoken Tone

Reference Audio Sample:
Output Audio Example:

Fast-Paced Tone

Reference Audio Sample:
Output Audio Example:

By following these guidelines and leveraging emotional reference audio, you can achieve highly accurate and expressive voice clones tailored to your needs.