To achieve the best results when cloning your voice, it’s essential to provide high-quality reference audio. Below are some best practices, dos and don’ts, and examples to guide you.
Ready to Clone Your Voice? Try it out on our platform.
🎙️ How to Record Reference Audio
Environment
Record in a quiet room with minimal background noise.
Use a good-quality microphone. Dedicated mics are ideal, but MacBook and mobile microphones also work well for this purpose.
When recording on a phone or laptop, place the device at a moderate distance, neither too far nor too close, to capture clear, natural sound without distortion.
Make sure the recording environment doesn’t introduce echo or distortion (e.g., avoid large empty rooms or outdoor spaces).
After uploading the audio, listen to it to ensure it is clear and free of interruptions, background noise, or distortion.
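Listening remains the main check, but one kind of distortion, clipping, can also be spotted programmatically before you upload. The following is a minimal sketch, assuming the recording is a 16-bit PCM WAV file; the function names and the 0.999 margin are illustrative choices, not part of the platform.

```python
import array
import sys
import wave


def load_samples(path):
    """Read every sample from a 16-bit PCM WAV file (channels stay interleaved)."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM WAV files")
        samples = array.array("h")
        samples.frombytes(wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is stored little-endian
    return samples


def clipped_fraction(samples, full_scale=32767, margin=0.999):
    """Return the fraction of samples at or very near full scale.

    The margin is an assumed tolerance, not a platform requirement.
    """
    if len(samples) == 0:
        return 0.0
    limit = int(full_scale * margin)
    clipped = sum(1 for s in samples if abs(s) >= limit)
    return clipped / len(samples)
```

A result noticeably above zero suggests the microphone was too close or the input gain too high, so re-recording is usually the safer option.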
Speaking Style
Speak naturally and avoid excessive emotion unless a specific tone is required.
Maintain a consistent pace and tone throughout the recording. Be mindful of long pauses, as they can impact the quality of the cloned voice.
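Long pauses can be located automatically before uploading. The sketch below scans 16-bit samples in short windows and reports stretches whose RMS level stays below a threshold. The threshold (300), the maximum pause (0.8 s), and the window size (20 ms) are assumptions chosen for illustration; the platform does not publish specific limits.

```python
import math


def find_long_pauses(samples, rate, silence_rms=300.0, max_pause_s=0.8, window_s=0.02):
    """Return (start, end) times in seconds of quiet stretches of at least max_pause_s.

    `samples` is a sequence of 16-bit mono sample values and `rate` is the
    sample rate in Hz. All thresholds are illustrative defaults.
    """
    window = max(1, int(rate * window_s))
    pauses = []
    run_start = None  # start time of the current quiet run, if any
    for i in range(0, len(samples), window):
        chunk = samples[i:i + window]
        rms = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        t = i / rate
        if rms < silence_rms:
            if run_start is None:
                run_start = t
        else:
            if run_start is not None and t - run_start >= max_pause_s:
                pauses.append((run_start, t))
            run_start = None
    end = len(samples) / rate
    if run_start is not None and end - run_start >= max_pause_s:
        pauses.append((run_start, end))
    return pauses


if __name__ == "__main__":
    rate = 8000
    # Synthetic signal: 1 s of sound, 1 s of silence, 1 s of sound.
    signal = [10000] * rate + [0] * rate + [10000] * rate
    print(find_long_pauses(signal, rate))  # [(1.0, 2.0)]
```

Any reported stretch is a candidate for trimming or for re-recording that part of the take.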
Length of Audio
Provide 5 to 15 seconds of clean audio.
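The length guideline is easy to verify before uploading. Here is a minimal sketch using Python's standard `wave` module, assuming the recording is an uncompressed PCM WAV file; the function names are hypothetical helpers, not part of the platform.

```python
import wave

MIN_SECONDS = 5.0   # lower bound from the guideline above
MAX_SECONDS = 15.0  # upper bound from the guideline above


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def is_recommended_length(path):
    """True if the file's duration falls inside the recommended range."""
    return MIN_SECONDS <= wav_duration_seconds(path) <= MAX_SECONDS
```

Other formats such as MP3 or M4A would need a different decoder; this sketch does not cover them.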
🎧 Examples of Good and Bad Reference Audio
NOTE: Currently, there is no direct support for adding audio to Mintlify. As a workaround, we have embedded a video to include the necessary audio content.
Good Reference Audio
High-quality, clear, and consistent tone.
Bad Reference Audio
With Background Noise
Inconsistent Speaking Style
Overlapping Voices
🎭 Creating Expressive Voice Clones
Our platform supports emotional reference audio: the emotion, pitch, and tone of the reference audio influence the output. This is ideal for creating expressive clones that match your intended tone.
😄 Emotional Control
The emotions in the reference audio (e.g., angry, happy, sad) directly impact the tone of the generated voice.
For example, if the reference audio conveys happiness, the output will replicate that cheerful tone.
⚡ Speed Control
The pace of your reference audio determines the speed of the output.
A fast-paced reference will generate a similarly fast delivery, while a slower reference will produce a more measured one.
🔊 Loudness Control
The loudness or volume in your reference audio is reflected in the output.
For instance, a soft-spoken input will result in a quieter clone, while a louder, more energetic recording will produce a bolder output.
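Because loudness carries over to the clone, it can help to measure a recording's level before choosing it as a reference. The sketch below computes the RMS level in dBFS (decibels relative to full scale) for 16-bit samples. This is a generic signal-level measure, not the platform's own loudness metric, and the function name is illustrative.

```python
import math


def rms_dbfs(samples, full_scale=32768.0):
    """Return the RMS level of 16-bit samples in dBFS (0 dBFS = full scale).

    Returns negative infinity for empty or completely silent input.
    """
    if len(samples) == 0:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0:
        return float("-inf")
    return 20 * math.log10(rms / full_scale)


if __name__ == "__main__":
    soft = [2000, -2000] * 100    # stand-in for a soft-spoken take
    loud = [16000, -16000] * 100  # stand-in for an energetic take
    print(rms_dbfs(soft) < rms_dbfs(loud))  # True
```

Comparing the values for two candidate takes gives a rough sense of which one is likely to yield the quieter or the bolder clone.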
🎧 Emotional Reference Audio Examples
Angry Tone
Reference Audio Sample:
Output Audio Example:
Quiet Tone
Reference Audio Sample:
Output Audio Example:
Fast-Paced Tone
Reference Audio Sample:
Output Audio Example:
By following these guidelines and leveraging emotional reference audio, you can achieve highly accurate and expressive voice clones tailored to your needs.