How input text is written controls how a voice agent sounds out loud. The first half of this guide covers prompting patterns (pauses, fillers, mid-thought corrections, energy matching). The second half (Prompt Best Practices) covers low-level text formatting for numbers, dates, mixed-language input, units, and symbols.
Write for the ear, not the eye. If text would look great in a document but sounds weird out loud, rewrite it.
Use punctuation deliberately to control rhythm. Each mark creates a different pause in TTS.
These patterns break the illusion of natural speech in a TTS engine.
Fillers make a voice agent sound human. Place them where a real person would naturally pause, react, or think. Never scatter them randomly.
Use reaction fillers at the start of a response, before saying anything substantive.
"Oh wow...", "Wait, really?", "Oh nice nice...", "Ahh okay okay...", "Ooh...", "Ha, love that.", "Oh man...", "Yesss...", "Oh that's fun..."
Use thinking fillers before a recommendation or when shifting gears.
"Umm...", "Hmm...", "So like...", "Let me think...", "Okay so...", "I mean...", "Right right...", "Yeah so..."
Use transition fillers when pivoting to a question or moving the conversation forward.
"But yeah...", "Anyway though...", "So okay...", "But honestly...", "Point being..."
A mid-thought correction makes a voice agent sound like a real person thinking out loud. Use it at least once every few responses: start saying one thing, catch yourself, then redirect.
Read the user’s energy from their first message and calibrate the agent’s tone to match it.
Hindi TTS follows the same rules: react first, think out loud, keep sentences short. The fillers and correction patterns mirror the English set but use natural Hinglish phrasing that sounds real, not translated.
Hinglish is the natural mix of Hindi and English used in everyday speech.
"Ooh achha...", "Arre waah...", "Haan haan...", "Sach mein?", "Oh nice...", "Arre yaar...", "Woh toh sahi hai...", "Bhai waah..."
"Hmm...", "Dekho...", "Matlab...", "Suno...", "Thoda sochte hain...", "Toh basically...", "Acha toh..."
"Par haan...", "Toh basically...", "Baat yeh hai...", "Waise...", "Anyway..."
Paste this into a voice agent’s system prompt to apply every rule above.
The rules above shape how a voice agent talks. The rules below shape what it pronounces correctly. They apply to any text sent to TTS, including LLM-generated responses, prompts you ship hard-coded, and dynamic data injected into responses.
Use the correct script per language. Avoid transliteration.
Use Devanagari for Indian city and personal names. Keep non-Indian names in their original script.
Break long input into chunks for low-latency, accurate output.
… lightning, 140 for lightning-large. Break at sentence-ending punctuation (., !, ?) or at clause punctuation (;, :).
Send long digit strings as a separate request. Split the surrounding text around the number.
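The chunking and digit-splitting rules above can be sketched roughly as follows. This is a minimal illustration, not the engine's own logic: the function names are hypothetical, the limit of 140 is taken from the lightning-large figure above and assumed to be a character count, and the threshold of 7 digits for a "long" digit string is an assumption.

```python
import re

MAX_LEN = 140  # assumed to be characters; figure quoted above for lightning-large


def split_digits(text, min_digits=7):
    """Isolate long digit runs so each can be sent as its own request."""
    # min_digits is an assumed threshold; the guide does not define "long".
    parts = re.split(rf"(\d{{{min_digits},}})", text)
    return [p.strip() for p in parts if p.strip()]


def chunk_text(text, max_len=MAX_LEN):
    """Greedily pack sentences into chunks of at most max_len characters."""
    # Split after sentence-ending punctuation, keeping each mark with its sentence.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pieces = []
    for sentence in sentences:
        if len(sentence) <= max_len:
            pieces.append(sentence)
        else:
            # Over-long sentence: fall back to clause punctuation.
            pieces.extend(re.split(r"(?<=[;:])\s+", sentence))
    chunks, current = [], ""
    for piece in pieces:
        candidate = f"{current} {piece}".strip()
        if current and len(candidate) > max_len:
            chunks.append(current)
            current = piece
        else:
            current = candidate
    if current:
        chunks.append(current)
    # A piece with no usable punctuation stays whole even if it exceeds max_len.
    return chunks


def prepare_requests(text, max_len=MAX_LEN):
    """Return the list of strings to send, one TTS request each."""
    requests = []
    for part in split_digits(text):
        if part.isdigit():
            requests.append(part)  # digit string goes out on its own
        else:
            requests.extend(chunk_text(part, max_len))
    return requests
```

For example, `prepare_requests("Your order is confirmed. Call 9876543210 for help.")` yields three requests: the text before the number, the number itself, and the text after it.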
Numbers default to a 3-4-3 grouping. 9876543210 reads as 987-6543-210.
For a specific reading pattern, write out the exact pronunciation.
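One way to write out an exact reading pattern is to group the digits yourself and spell each one as a word before sending the text. The sketch below assumes that spelled-out digits separated by commas produce the grouping you want; the helper names are hypothetical and the comma-as-pause behavior should be checked against the actual engine.

```python
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]


def group_digits(number, sizes):
    """Cut a digit string into consecutive groups of the given sizes."""
    if not (number.isascii() and number.isdigit()) or sum(sizes) != len(number):
        raise ValueError("sizes must cover every digit of a digits-only string")
    groups, start = [], 0
    for size in sizes:
        groups.append(number[start:start + size])
        start += size
    return groups


def spell_digits(number, sizes):
    """Spell each digit as a word, with a comma between groups."""
    return ", ".join(
        " ".join(DIGIT_WORDS[int(d)] for d in group)
        for group in group_digits(number, sizes)
    )
```

`group_digits("9876543210", (3, 4, 3))` reproduces the default 987-6543-210 grouping, while `spell_digits("9876543210", (5, 5))` writes the number out as two groups of five.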
Ordinal suffixes (st, nd, rd, th) work in dates.
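When dates come from structured data, the ordinal suffix can be attached before the text reaches TTS. A minimal sketch follows; the helper names and the day-month-year word order are assumptions, not something the guide prescribes.

```python
from datetime import date

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]


def ordinal(n):
    """Return n with its English ordinal suffix, e.g. 1 -> '1st'."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"


def spoken_date(d):
    """Format a date as '<ordinal day> <month name> <year>'."""
    return f"{ordinal(d.day)} {MONTHS[d.month - 1]} {d.year}"
```

For instance, `spoken_date(date(2024, 3, 1))` gives `"1st March 2024"`.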
Spell out operations. For complex expressions, break into simpler parts.
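For simple arithmetic, the operators can be replaced with words mechanically. The sketch below handles only flat expressions made of numbers and basic operators; the operator table and function name are hypothetical, and anything else (parentheses, variables) is silently dropped, so complex expressions still need to be broken up by hand.

```python
import re

# Hypothetical operator table; extend it to match your content.
OPERATORS = {
    "+": "plus",
    "-": "minus",
    "*": "times",
    "×": "times",
    "/": "divided by",
    "÷": "divided by",
    "=": "equals",
}


def spell_operations(expr):
    """Rewrite a flat arithmetic expression with operators as words."""
    # Tokens are numbers (with optional decimals) or single operator symbols.
    tokens = re.findall(r"\d+(?:\.\d+)?|[+\-*×/÷=]", expr)
    return " ".join(OPERATORS.get(token, token) for token in tokens)
```

With this, `spell_operations("12 + 8 = 20")` becomes `"12 plus 8 equals 20"`.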
Write out the full word, such as "approximately". Avoid approximation symbols such as ~ and ≈.
Write out units in full.
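Unit abbreviations attached to numbers can be expanded with a small lookup before synthesis. This is a rough sketch: the unit table is a hypothetical starting point, only abbreviations that directly follow a number are touched, and the singular form is used only when the number is exactly "1".

```python
import re

# Hypothetical table: abbreviation -> (singular, plural).
UNITS = {
    "km": ("kilometer", "kilometers"),
    "kg": ("kilogram", "kilograms"),
    "cm": ("centimeter", "centimeters"),
    "ml": ("milliliter", "milliliters"),
    "g": ("gram", "grams"),
    "m": ("meter", "meters"),
}

# Longest abbreviations first so "km" is tried before "m".
_ALTERNATION = "|".join(re.escape(u) for u in sorted(UNITS, key=len, reverse=True))
_UNIT_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(" + _ALTERNATION + r")\b")


def expand_units(text):
    """Replace '<number><unit abbreviation>' with the unit written in full."""
    def replace(match):
        number, unit = match.group(1), match.group(2)
        singular, plural = UNITS[unit]
        return f"{number} {singular if number == '1' else plural}"
    return _UNIT_PATTERN.sub(replace, text)
```

For example, `expand_units("5kg of rice, 2 km away")` returns `"5 kilograms of rice, 2 kilometers away"`, while a phrase like `"10 minutes"` is left alone because the word boundary check fails after the "m".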
Spell out special characters in any context.
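Symbols, including approximation signs, can be spelled out with a plain substitution pass. The sketch below uses a deliberately small, hypothetical table; the right spoken form for some symbols depends on context, so treat the mapping as a starting point rather than a complete list.

```python
import re

# Hypothetical symbol table; adjust the spoken forms to your content.
SPOKEN = {
    "@": "at",
    "&": "and",
    "%": "percent",
    "~": "approximately",
    "≈": "approximately",
}


def spell_symbols(text):
    """Replace each known symbol with its spoken word."""
    out = text
    for symbol, word in SPOKEN.items():
        # Pad with spaces so the word never fuses with its neighbors.
        out = out.replace(symbol, f" {word} ")
    # Collapse the extra whitespace (this also flattens line breaks).
    return re.sub(r"\s+", " ", out).strip()
```

For instance, `spell_symbols("Save ~20% at A&B")` returns `"Save approximately 20 percent at A and B"`.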