TTS Best Practices
How input text is written controls how a voice agent sounds out loud. The first half of this guide covers prompting patterns (pauses, fillers, mid-thought corrections, energy matching). The second half (Prompt best practices) covers low-level text formatting for numbers, dates, mixed-language input, units, and symbols.
Write for the ear, not the eye. If text looks great in a document but sounds awkward out loud, rewrite it.
Pauses and pacing
Use punctuation deliberately to control rhythm. Each mark creates a different pause in TTS.
What to avoid
These patterns break the illusion of natural speech in a TTS engine.
- No parentheses. TTS reads them awkwardly. Rewrite as a separate clause or sentence.
- No bullet points, numbered lists, or markdown. A voice agent is speaking, not writing. Structure responses through sentence flow only.
- No back-to-back exclamation marks. One per response max. Overuse sounds manic.
- No ALL CAPS for emphasis. Use word choice and sentence structure to convey emphasis. Reserve caps for abbreviations.
- No consecutive sentences starting with the same word. Sounds repetitive and mechanical out loud.
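The checks above can be run automatically on a response before it reaches TTS. The following is a minimal sketch, not part of any TTS SDK. The function name lint_for_speech is hypothetical, and the rule that treats capitalised words of five or more letters as emphasis (and shorter ones as abbreviations) is an assumption made only for illustration.

```python
import re

def lint_for_speech(text: str) -> list[str]:
    """Return the names of the 'what to avoid' rules that the text breaks."""
    problems = []
    if re.search(r"[()]", text):
        problems.append("parentheses")
    # Bullets, numbered items, or headings at the start of a line, plus inline markers.
    starts_like_markdown = re.search(r"(?m)^\s*(?:[-*+]\s|\d+[.)]\s|#{1,6}\s)", text)
    if starts_like_markdown or re.search(r"\*\*|__|`", text):
        problems.append("markdown or list formatting")
    if text.count("!") > 1:
        problems.append("more than one exclamation mark")
    # Assumption: five or more capital letters in a row is emphasis, not an abbreviation.
    if re.search(r"\b[A-Z]{5,}\b", text):
        problems.append("all-caps emphasis")
    # Compare the first word of each sentence with the first word of the next one.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    first_words = [re.sub(r"\W", "", s.split()[0]).lower() for s in sentences]
    if any(a and a == b for a, b in zip(first_words, first_words[1:])):
        problems.append("consecutive sentences start with the same word")
    return problems
```

An empty list means none of the five patterns were detected. It does not mean the response sounds natural, so listening tests are still needed.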
Speech fillers and conversational cues
Fillers make a voice agent sound human. Place them where a real person would naturally pause, react, or think. Never scatter them randomly.
React fillers
Use at the start of a response, before saying anything substantive.
"Oh wow...", "Wait, really?", "Oh nice nice...", "Ahh okay okay...", "Ooh...", "Ha, love that.", "Oh man...", "Yesss...", "Oh that's fun..."
Thinking fillers
Use before a recommendation or when shifting gears.
"Umm...", "Hmm...", "So like...", "Let me think...", "Okay so...", "I mean...", "Right right...", "Yeah so..."
Transition fillers
Use when pivoting to a question or moving the conversation forward.
"But yeah...", "Anyway though...", "So okay...", "But honestly...", "Point being..."
Placement rules
- Start most responses with a react filler. This is the single biggest thing that makes a voice agent feel alive.
- Use one thinking filler per response, usually before the main point.
- Use one transition filler when moving from a reaction to a follow-up question.
- Aim for 2-4 fillers per response, spread naturally. Never cluster them together.
Mid-thought corrections
This pattern makes a voice agent sound like a real person thinking out loud. Use it at least once every few responses. Start saying one thing, catch yourself, redirect.
Energy matching
Read the user’s energy from their first message and calibrate.
Example responses
Good: natural, TTS-optimized
Bad: robotic, formal, unspeakable
Hindi templates
Hindi TTS follows the same rules: react first, think out loud, keep sentences short. The fillers and correction patterns mirror the English set but use natural Hinglish phrasing that sounds real, not translated.
Hinglish is the natural mix of Hindi and English used in everyday speech.
React fillers
"Ooh achha...", "Arre waah...", "Haan haan...", "Sach mein?", "Oh nice...", "Arre yaar...", "Woh toh sahi hai...", "Bhai waah..."
Thinking fillers
"Hmm...", "Dekho...", "Matlab...", "Suno...", "Thoda sochte hain...", "Toh basically...", "Acha toh..."
Transition fillers
"Par haan...", "Toh basically...", "Baat yeh hai...", "Waise...", "Anyway..."
Mid-thought correction templates
Good Hindi example responses
Copy the full prompt
Paste this into a voice agent’s system prompt to apply every rule above.
Prompt best practices
The rules above shape how a voice agent talks. The rules below determine whether it pronounces text correctly. They apply to any text sent to TTS, including LLM-generated responses, prompts you ship hard-coded, and dynamic data injected into responses.
Language and script
Use the correct script per language. Avoid transliteration.
- English in Latin script
- Hindi in Devanagari script
Proper nouns
Use Devanagari for Indian city and personal names. Keep non-Indian names in their original script.
Text chunking
Break long input into chunks for low-latency, accurate output.
- Maximum chunk size: 250 characters for lightning, 140 for lightning-large.
- Break at sentence-ending punctuation first (., !, ?).
- Then other punctuation (;, :).
- Then natural word breaks.
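The chunking rules above can be sketched as a small helper. This is an illustrative implementation, not the service's own chunker. The names MAX_CHARS, _cut_index, and chunk_text are hypothetical. Requiring whitespace after a punctuation mark before cutting there is an added assumption, made so that decimals such as 3.5 are not split.

```python
import re

# Character limits per model, taken from the rule above.
MAX_CHARS = {"lightning": 250, "lightning-large": 140}

# Break candidates in priority order. Each match ends where a cut may happen.
_BREAKS = (
    re.compile(r"[.!?](?=\s)"),  # sentence-ending punctuation
    re.compile(r"[;:](?=\s)"),   # other punctuation
    re.compile(r"\S(?=\s)"),     # end of any word
)

def _cut_index(text: str, limit: int) -> int:
    """Return the best cut position so that text[:cut] has at most limit characters."""
    window = text[: limit + 1]  # one extra character so the lookahead can see what follows
    for pattern in _BREAKS:
        ends = [m.end() for m in pattern.finditer(window) if m.end() <= limit]
        if ends:
            return ends[-1]
    return limit  # no natural break found: cut at the limit

def chunk_text(text: str, limit: int) -> list[str]:
    """Split text into chunks of at most limit characters (limit must be at least 1)."""
    chunks = []
    rest = text.strip()
    while len(rest) > limit:
        cut = _cut_index(rest, limit)
        chunks.append(rest[:cut])
        rest = rest[cut:].lstrip()
    if rest:
        chunks.append(rest)
    return chunks
```

Call it with the limit for the target model, for example chunk_text(reply, MAX_CHARS["lightning-large"]), and send each chunk as its own request.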
Numbers
Order IDs and large numbers
Send long digit strings as a separate request. Split the surrounding text around the number.
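One way to split text around a long number is sketched below. The source does not say how many digits count as "long", so the min_digits default of 6 is an assumption, and split_around_numbers is a hypothetical helper name.

```python
import re

def split_around_numbers(text: str, min_digits: int = 6) -> list[str]:
    """Split text so each run of min_digits or more digits becomes its own piece."""
    # The capturing group makes re.split keep the digit run in the result.
    pattern = re.compile(r"(\d{%d,})" % min_digits)
    pieces = pattern.split(text)
    return [piece.strip() for piece in pieces if piece.strip()]
```

Each returned piece is then sent as a separate TTS request, in order.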
Phone numbers
Phone numbers default to a 3-4-3 grouping. For example, 9876543210 reads as 987-6543-210.
For a specific reading pattern, write out the exact pronunciation.
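Writing out the pronunciation can be automated. The sketch below spells each digit as a word and separates groups with a comma. It assumes the engine pauses at the comma, which is not confirmed by the source. The helper name spell_phone_number and the default 5-5 grouping are illustrative choices.

```python
DIGIT_WORDS = ["zero", "one", "two", "three", "four",
               "five", "six", "seven", "eight", "nine"]

def spell_phone_number(number: str, groups: tuple[int, ...] = (5, 5)) -> str:
    """Spell a phone number digit by digit, with a comma between groups."""
    digits = [ch for ch in number if ch in "0123456789"]  # drop dashes, spaces, etc.
    if sum(groups) != len(digits):
        raise ValueError("group sizes must add up to the number of digits")
    spoken = []
    position = 0
    for size in groups:
        group = digits[position:position + size]
        spoken.append(" ".join(DIGIT_WORDS[int(d)] for d in group))
        position += size
    return ", ".join(spoken)
```

For example, spell_phone_number("9876543210") returns "nine eight seven six five, four three two one zero".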
Dates and time
Date formats
Ordinal suffixes (st, nd, rd, th) work in dates.
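Since ordinal suffixes work, dates from application data can be rendered with them before sending to TTS. This is a sketch only. The day-month-year word order is an assumption, as are the helper names ordinal and spoken_date. The source confirms only that the suffixes are supported.

```python
import datetime

MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]

def ordinal(n: int) -> str:
    """Return n with its English ordinal suffix, e.g. 1st, 2nd, 3rd, 11th."""
    if 10 <= n % 100 <= 20:
        suffix = "th"  # 11th, 12th, 13th and the rest of the teens
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(n % 10, "th")
    return f"{n}{suffix}"

def spoken_date(date: datetime.date) -> str:
    """Format a date as words with an ordinal day (assumed order: day month year)."""
    return f"{ordinal(date.day)} {MONTHS[date.month - 1]} {date.year}"
```

An explicit month list is used instead of strftime so the output does not depend on the system locale.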
Time formats
Mathematical expressions
Spell out operations. For complex expressions, break into simpler parts.
Approximate values
Write out the full word. Avoid approximation symbols.
Units and measurements
Write out units in full.
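Unit abbreviations in dynamic data can be expanded before the text is sent. The sketch below covers a handful of metric units. The table UNIT_NAMES and the function expand_units are hypothetical, and a real deployment would need a longer list.

```python
import re

# Singular unit names. The plural is formed by adding "s".
UNIT_NAMES = {"km": "kilometer", "kg": "kilogram", "cm": "centimeter",
              "ml": "milliliter", "g": "gram", "m": "meter"}

# Longest abbreviations first so "km" is tried before "m".
_alternatives = "|".join(sorted(UNIT_NAMES, key=len, reverse=True))
_UNIT_PATTERN = re.compile(rf"(\d+(?:\.\d+)?)\s*({_alternatives})\b")

def expand_units(text: str) -> str:
    """Replace '<number><unit abbreviation>' with the unit written out in full."""
    def replace(match: re.Match) -> str:
        value, unit = match.group(1), match.group(2)
        name = UNIT_NAMES[unit]
        if value != "1":
            name += "s"
        return f"{value} {name}"
    return _UNIT_PATTERN.sub(replace, text)
```

Abbreviations missing from the table are left untouched, so "5 mg" passes through unchanged.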
Symbols and special characters
Spell out special characters in any context.
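A minimal sketch of spelling out symbols in a single token such as an email address, URL fragment, or handle. The word chosen for each symbol in SYMBOL_WORDS is an assumption rather than a documented engine requirement, and spell_out_symbols is a hypothetical helper. Apply it to one token at a time, not to whole sentences, because it would also turn sentence-ending periods into "dot".

```python
SYMBOL_WORDS = {"@": "at", ".": "dot", "_": "underscore", "-": "dash",
                "/": "slash", "#": "hash", "&": "and", "+": "plus"}

def spell_out_symbols(token: str) -> str:
    """Replace each known symbol in token with its word, separated by spaces."""
    parts = []
    word = ""
    for ch in token:
        if ch in SYMBOL_WORDS:
            if word:  # flush the letters collected so far
                parts.append(word)
                word = ""
            parts.append(SYMBOL_WORDS[ch])
        else:
            word += ch
    if word:
        parts.append(word)
    return " ".join(parts)
```

For example, spell_out_symbols("support@example.com") returns "support at example dot com".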
URLs
Email addresses
Social media handles and tags
Ranges and intervals
Quick reference
- Stay consistent. Use the same format throughout the prompt.
- Spell things out when in doubt.
- Break long URLs and handles into smaller chunks.
- Avoid symbols that have multiple interpretations.

