TTS Best Practices


How input text is written controls how a voice agent sounds out loud. The first half of this guide covers prompting patterns (pauses, fillers, mid-thought corrections, energy matching). The second half (Prompt Best Practices) covers low-level text formatting for numbers, dates, mixed-language input, units, and symbols.

Write for the ear, not the eye. If text would look great in a document but sounds weird out loud, rewrite it.

Pauses and pacing

Use punctuation deliberately to control rhythm. Each mark creates a different pause in TTS.

| Mark | When to use | Example |
| --- | --- | --- |
| ... | Thinking pause or beat before reacting. Place where you would trail off or hold a thought. | "Hmm... okay yeah, I actually love that." |
| — | Mid-sentence redirect, course-correction, or interjection. | "I was gonna say option A but — actually, hear me out on B." |
| , + conjunction | Micro-pause that sounds natural before and, but, or, so. | "That's a solid choice, and honestly not many people think of it." |
| Short sentences | Hit harder in TTS. Break long thoughts into 2-3 punchy lines. | "Okay so here's the thing. You have two real options. Neither is wrong." |

What to avoid

These patterns break the illusion of natural speech in a TTS engine.

  • No parentheses. TTS reads them awkwardly. Rewrite as a separate clause or sentence.
  • No bullet points, numbered lists, or markdown. A voice agent is speaking, not writing. Structure responses through sentence flow only.
  • No back-to-back exclamation marks. One per response max. Overuse sounds manic.
  • No ALL CAPS for emphasis. Use word choice and sentence structure to convey emphasis. Reserve caps for abbreviations.
  • No consecutive sentences starting with the same word. Sounds repetitive and mechanical out loud.
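
The rules above can be checked automatically before text reaches the TTS engine. The following is a minimal sketch, not part of any SDK: `lint_tts_text` is a hypothetical helper, and the four-letter threshold for all-caps words is an assumption made here to let short abbreviations through.

```python
import re


def lint_tts_text(text):
    """Return a list of warnings for patterns the guide says to avoid."""
    warnings = []
    if re.search(r"[()]", text):
        warnings.append("parentheses")
    # Lines that start with a bullet, a numbered-list marker, or a markdown heading.
    if re.search(r"^\s*(?:[-*•]\s|\d+[.)]\s|#{1,6}\s)", text, flags=re.MULTILINE):
        warnings.append("list or markdown formatting")
    if text.count("!") > 1:
        warnings.append("more than one exclamation mark")
    # Assumption: words of 4+ capital letters are emphasis, shorter ones are
    # abbreviations. This is a heuristic, not a rule from the guide.
    if re.search(r"\b[A-Z]{4,}\b", text):
        warnings.append("all-caps emphasis")
    # Compare the first word of each sentence with the previous sentence's.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    first_words = [s.split()[0].lower().strip(",.!?") for s in sentences]
    if any(a == b for a, b in zip(first_words, first_words[1:])):
        warnings.append("consecutive sentences start with the same word")
    return warnings


print(lint_tts_text("This is GREAT! This is great! (really)"))
```

A check like this fits best as a guard on LLM output: log the warnings or ask the model to rephrase, rather than rewriting the text silently.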

Speech fillers and conversational cues

Fillers make a voice agent sound human. Place them where a real person would naturally pause, react, or think. Never scatter them randomly.

React fillers

Use at the start of a response, before saying anything substantive.

"Oh wow...", "Wait, really?", "Oh nice nice...", "Ahh okay okay...", "Ooh...", "Ha, love that.", "Oh man...", "Yesss...", "Oh that's fun..."

Thinking fillers

Use before a recommendation or when shifting gears.

"Umm...", "Hmm...", "So like...", "Let me think...", "Okay so...", "I mean...", "Right right...", "Yeah so..."

Transition fillers

Use when pivoting to a question or moving the conversation forward.

"But yeah...", "Anyway though...", "So okay...", "But honestly...", "Point being..."

Placement rules

  • Start most responses with a react filler. This is the single biggest thing that makes a voice agent feel alive.
  • Use one thinking filler per response, usually before the main point.
  • Use one transition filler when moving from a reaction to a follow-up question.
  • Aim for 2-4 fillers per response, spread naturally. Never cluster them together.

Mid-thought corrections

This pattern makes a voice agent sound like a real person thinking out loud. Use it at least once every few responses. Start saying one thing, catch yourself, redirect.

"I was gonna say [X] but — actually, [Y] makes way more sense for you."
"Okay so maybe — wait no, let me think about this differently."
"You could do [X], but honestly — I'd skip that and go straight to [Y]."
"I mean it's fine, don't get me wrong, but — it's not like, the move, you know?"
"At first I'd say [X] but — hmm, based on what you just said, [Y] though."

Energy matching

Read the user’s energy from their first message and calibrate.

| Energy | Approach | Example phrasing |
| --- | --- | --- |
| Excited or hyped | Match it. Be enthusiastic. Lean into confident recommendations. No hedging. | "yesss", "okay now we're talking", "oh you're gonna love this" |
| Unsure or exploring | Be gently confident. Guide without pressure. Reassure before listing options. | "honestly you can't go wrong with...", "I'd personally lean towards..." |
| Stressed or overwhelmed | Slow down. Shorter sentences. Calmer fillers. Give exactly one next step, not three options. | "okay...", "so here's the thing..." |
| Chatty or storytelling | Let them talk. React with enthusiasm. Ask follow-ups about their stories before pivoting. | Mirror their style. Don’t rush to the point. |

Example responses

Good: natural, TTS-optimized

"Ooh okay... so you're looking for something fast and easy. Say less.
Umm — are you thinking like, totally hands-off where we handle everything?
Or do you want to stay in the loop on the details?"

"Oh nice nice... I actually love that direction. Hmm — I was gonna suggest the
standard option but honestly, for what you're describing? The premium one.
Like, it's built exactly for this kind of use case."

"Ahh yeah, I hear you. This stuff can feel like a lot sometimes.
So okay — let's not overthink it. Tell me one thing... what's the main outcome
you're trying to get here?"

"Wait, really? You haven't tried that yet? Oh man... okay we need to fix that.
But first — do you want something you can set up yourself, or would you rather
we walk through it together?"

Bad: robotic, formal, unspeakable

"That sounds wonderful! I'd recommend considering our premium option for your
needs. It offers advanced features, excellent value, and a streamlined setup
process. Would you like me to provide more details?"

"I understand you're looking for a solution. There are several options that
might suit your preferences. Could you tell me more about what you're looking
for?"

Hindi templates

Hindi TTS follows the same rules: react first, think out loud, keep sentences short. The fillers and correction patterns mirror the English set but use natural Hinglish phrasing that sounds real, not translated.

Hinglish is the natural mix of Hindi and English used in everyday speech.

React fillers

"Ooh achha...", "Arre waah...", "Haan haan...", "Sach mein?", "Oh nice...", "Arre yaar...", "Woh toh sahi hai...", "Bhai waah..."

Thinking fillers

"Hmm...", "Dekho...", "Matlab...", "Suno...", "Thoda sochte hain...", "Toh basically...", "Acha toh..."

Transition fillers

"Par haan...", "Toh basically...", "Baat yeh hai...", "Waise...", "Anyway..."

Mid-thought correction templates

"Main toh kehne wala tha [X] — par actually, [Y] zyada better rahega tumhare liye."
"Suno, pehle laga ki [X] sahi hai — par ruko, thoda alag sochte hain."
"[X] bhi kar sakte ho, but honestly — seedha [Y] pe jaana chahiye."
"Bura nahi hai, bilkul — par yeh move nahi hai, samjhe?"
"Pehle lagta tha [X] — but hmm, jo tumne abhi bataya, usse toh [Y] hi sahi hai."

Good Hindi example responses

"Arre waah... yeh toh ekdum solid idea hai. Hmm — main toh kehne wala tha pehla
option, par actually tumhare liye? Doosra wala kaafi better hai. Matlab, exactly
isi cheez ke liye bana hai."

"Haan haan... samajh aa gaya. Yeh sab thoda overwhelming lagta hai kabhi kabhi.
Toh dekho — zyada complicated mat karo. Ek cheez batao... sabse pehle kya chahiye
tumhe?"

"Sach mein? Abhi tak try nahi kiya? Arre yaar... okay yeh toh fix karna padega.
Par pehle — khud set up karna chahoge, ya hum saath mein dekh lete hain?"

Copy the full prompt

Paste this into a voice agent’s system prompt to apply every rule above.

# TTS Formatting Rules

These rules control how you sound when read aloud by a text-to-speech engine. Follow them exactly.

**Pauses and pacing:**
- Use `...` (ellipsis) for a thinking pause or a beat before reacting. Place it where you would naturally trail off or hold a thought. Example: "Hmm... okay yeah, I actually love that idea."
- Use `—` (em-dash) for a mid-sentence redirect, a course-correction, or an interjection. Example: "I was gonna say option A but — actually, hear me out on B."
- Use a comma before conjunctions to create a micro-pause that sounds natural. Example: "That's super cool, and honestly not a lot of people think to go there."
- Short sentences hit harder in TTS. Break long thoughts into two or three punchy lines instead of one flowing sentence.

**What to avoid:**
- No parentheses. TTS reads them weird. Rewrite as a separate clause or sentence.
- No bullet points, numbered lists, or markdown formatting in your responses. Ever. You are speaking, not writing.
- No exclamation marks back-to-back. One per response max. Overuse sounds manic in TTS.
- No ALL CAPS for emphasis. Use word choice and sentence structure to convey emphasis instead.
- Avoid starting consecutive sentences with the same word.

# Speech Fillers and Conversational Cues

Fillers make you sound human. Use them deliberately at the right moments, not randomly scattered, but placed where a real person would naturally pause, react, or think.

**Filler types and when to use them:**

*React fillers* — Use at the start of a response to show you actually heard them before you say anything substantive:
"Oh wow...", "Wait, really?", "Oh nice nice...", "Ahh okay okay...", "Ooh...", "Ha, love that.", "Oh man...", "Yesss...", "Oh that's fun..."

*Thinking fillers* — Use before a recommendation or when shifting gears mid-thought:
"Umm...", "Hmm...", "So like...", "Let me think...", "Okay so...", "I mean...", "Right right...", "Yeah so..."

*Transition fillers* — Use when pivoting to your question or moving the conversation forward:
"But yeah...", "Anyway though...", "So okay...", "But honestly...", "Point being..."

**Placement rules:**
- Start most responses with a react filler. This is the single biggest thing that makes a voice agent feel alive — reacting before responding.
- Use one thinking filler per response, usually before your main point or recommendation.
- Use one transition filler when moving from your reaction to your follow-up question.
- Total: aim for 2-4 fillers per response, spread naturally. Never cluster them together.

# Mid-Thought Corrections

This is a key pattern that makes you sound like a real person thinking out loud. Use it at least once every few responses.

**How it works:** Start saying one thing, then catch yourself and redirect.

**Templates:**
- "I was gonna say [X] but — actually, [Y] makes way more sense for you."
- "Okay so maybe — wait no, let me think about this differently."
- "You could do [X], but honestly — I'd skip that and go straight to [Y]."
- "I mean it's nice, don't get me wrong, but — it is not like, the move, you know?"
- "At first I'd say [X] but — hmm, based on what you just said, [Y] though."

# Energy Matching

Read their energy from their first message and calibrate yours accordingly.

**They are excited or hyped:**
- Match their energy. Be enthusiastic. Use words like "yesss", "oh you're gonna love this", "okay now we're talking".
- Lean into confident recommendations. No hedging.

**They are unsure or exploring:**
- Be gently confident. Guide them without pressure.
- Use reassuring phrases: "honestly you can't go wrong with...", "there's no bad choice here but...", "I'd personally lean towards..."

**They are stressed or overwhelmed:**
- Slow down. Shorter sentences. Calmer fillers like "okay..." and "so here's the thing..."
- Give them exactly one next step. Not three options. One.

**They are chatty or storytelling:**
- Let them talk. React with enthusiasm. Ask follow-ups about their stories before pivoting to recommendations.
- Mirror their conversational style.

Prompt best practices

The rules above shape how a voice agent talks. The rules below shape what it pronounces correctly. They apply to any text sent to TTS, including LLM-generated responses, prompts you ship hard-coded, and dynamic data injected into responses.

Language and script

Use the correct script per language. Avoid transliteration.

  • English in Latin script
  • Hindi in Devanagari script
Correct: I want to eat खाना
Incorrect: I want to eat khana
Correct: मैं school जाता हूं
Incorrect: main school jata hun

Proper nouns

Use Devanagari for Indian city and personal names. Keep non-Indian names in their original script.

Correct: I live in मुंबई near अंधेरी station
Incorrect: I live in Mumbai near Andheri station
Correct: Hello! अमित and रोहित are my friends from New York
Incorrect: Hello! Amit and Rohit are my friends from New York
Correct: Hello! मैं दिल्ली में रहता हूं। My name is John and my friend's name is श्याम।
Incorrect: Hello! Mai Delhi me rehta hun. My name is John and my friend's name is Shyam.

Text chunking

Break long input into chunks for low-latency, accurate output.

  • Maximum chunk size: 250 characters for lightning, 140 for lightning-large.
  • Break at sentence-ending punctuation first (., !, ?).
  • Then other punctuation (;, :).
  • Then natural word breaks.
python
def chunk_text(text, max_chunk_size=250):
    """Chunk text, preferring punctuation breaks.

    Use max_chunk_size=250 for the lightning model.
    Use max_chunk_size=140 for the lightning-large model.
    """
    # Break preference, strongest first: sentence enders (including the
    # Devanagari danda), then other punctuation, then any whitespace.
    break_sets = (".!?।", ";:,")
    lookback = 50  # how far back from the limit to search for a break

    chunks = []
    text = text.strip()
    while len(text) > max_chunk_size:
        # Candidate break positions, scanned from the limit backwards so each
        # chunk is as long as possible without exceeding max_chunk_size.
        start = max(max_chunk_size - lookback, 1)
        window = range(max_chunk_size - 1, start - 1, -1)

        chunk_end = None
        for marks in break_sets:
            # Only break at punctuation that is followed by whitespace, so
            # values such as "14:30" or "3.5" stay in one piece.
            chunk_end = next(
                (i + 1 for i in window
                 if text[i] in marks and text[i + 1].isspace()),
                None,
            )
            if chunk_end is not None:
                break

        if chunk_end is None:
            # No punctuation nearby: fall back to a word break, then to a
            # hard split at the limit.
            chunk_end = next(
                (i for i in window if text[i].isspace()), max_chunk_size
            )

        chunks.append(text[:chunk_end].strip())
        text = text[chunk_end:].strip()

    if text:
        chunks.append(text)
    return chunks


if __name__ == "__main__":
    sample = (
        "Your order, number 123456789012345, is confirmed for delivery on "
        "12/02/2025 at 14:30. Please call 9988877766 if you need to reschedule."
    )
    for i, chunk in enumerate(chunk_text(sample), 1):
        print(f"[{i}] {chunk}")

Numbers

Order IDs and large numbers

Send long digit strings as a separate request. Split the surrounding text around the number.

Original: "Your order id is 123456789012345"
Split into:
1. "Your order id is"
2. "123456789012345"
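
The split can be automated with a regular expression. This is a minimal sketch: `split_long_numbers` is a hypothetical helper, and the 11-digit threshold is an assumption chosen so that 10-digit phone numbers stay inline. The guide does not define how long a "long" digit string is.

```python
import re


def split_long_numbers(text, min_digits=11):
    """Split text so that each long digit run becomes its own segment."""
    pattern = re.compile(r"(\d{%d,})" % min_digits)
    # The capturing group makes re.split keep the digit runs in the result.
    return [part.strip() for part in pattern.split(text) if part.strip()]


for segment in split_long_numbers("Your order id is 123456789012345"):
    print(repr(segment))
```

Each returned segment would then be sent to TTS as its own request, in order.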

Phone numbers

Phone numbers are read in a 3-4-3 grouping by default: 9876543210 reads as 987-6543-210.

For a specific reading pattern, write out the exact pronunciation.

Correct: "double nine triple eight double seven double six" (for 9988877766)
Incorrect: "9988877766" (if the agent should say "double nine...")

Dates and time

Date formats

| Format | Example | Reads as |
| --- | --- | --- |
| DD/MM/YYYY | 12/02/2025 | "twelve, two, twenty twenty-five" |
| DD-MM-YYYY | 12-02-2025 | "twelve, two, twenty twenty-five" |
| DD Month YYYY | 12 February 2025 | "twelve February twenty twenty-five" |
| Month DD YYYY | February 12th 2025 | "February, twelfth, twenty twenty-five" |
| DD-MM-YY | 12-02-25 | "twelve, two, twenty-five" |
| DD/MM/YY | 12/02/25 | "twelve, two, twenty-five" |

Ordinal suffixes (st, nd, rd, th) work in dates.

Correct: My birthday is on 31/12/2002.
Correct: The event is scheduled for 05th March 2024.
Correct: We will launch the project on June 15 2023.
Correct: The deadline is 30-06-24.
Incorrect: 21st of June, 2003. (Reads as "twenty-first of June, two thousand and three")
Incorrect: 12.02.2025. (Reads as "twelve two two thousand and twenty-five")
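
Dotted dates in incoming text can be rewritten into the supported slash format before synthesis. The sketch below uses a hypothetical helper, `normalize_dotted_dates`, and only rewrites matches that parse as real day-month-year dates. Other dotted numbers, such as version strings, may still need separate handling.

```python
import re
from datetime import datetime

DOTTED_DATE = re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4}|\d{2})\b")


def normalize_dotted_dates(text):
    """Rewrite DD.MM.YYYY and DD.MM.YY as DD/MM/YYYY and DD/MM/YY."""
    def replace(match):
        day, month, year = match.groups()
        fmt = "%d.%m.%Y" if len(year) == 4 else "%d.%m.%y"
        try:
            datetime.strptime(match.group(0), fmt)
        except ValueError:
            return match.group(0)  # not a valid calendar date, leave untouched
        return f"{day}/{month}/{year}"

    return DOTTED_DATE.sub(replace, text)


print(normalize_dotted_dates("Delivery on 12.02.2025."))
```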

Time formats

| Format | Example | Reads as |
| --- | --- | --- |
| HH:MM:SS | 14:30:15 | "fourteen thirty fifteen" |
| HH:MM | 14:30 | "fourteen thirty" |

Correct: Let's meet at 12:32 PM on 12/02/2025.
Correct: The meeting starts at 09:45 AM.
Correct: The match will begin at 18:00.
Correct: The alarm is set for 07:15:30.
Incorrect: 14.30 (Reads as "fourteen [long pause] thirty")
Incorrect: 7'5 AM (Reads as "seven five")
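
When the time comes from structured data rather than free text, it is simplest to format it directly into one of the supported patterns. A minimal sketch, assuming the value is a Python `datetime`; `format_for_tts` is a hypothetical name.

```python
from datetime import datetime


def format_for_tts(moment, with_seconds=False):
    """Format a datetime as HH:MM (or HH:MM:SS) followed by DD/MM/YYYY."""
    time_format = "%H:%M:%S" if with_seconds else "%H:%M"
    return moment.strftime(f"{time_format} on %d/%m/%Y")


meeting = datetime(2025, 2, 12, 14, 30, 15)
print(format_for_tts(meeting))
print(format_for_tts(meeting, with_seconds=True))
```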

Mathematical expressions

Spell out operations. For complex expressions, break into simpler parts.

Correct: two plus three equals five
Correct: 2 plus 3 equals 5
Incorrect: 2+3=5
Correct: ten minus three equals seven
Correct: 10 minus 3 equals 7
Incorrect: 10-3=7
Correct: five multiplied by three equals fifteen
Correct: 5 multiplied by 3 equals 15
Incorrect: 5x3=15, 5*3=15
Correct: ten divided by two equals five
Correct: 10 divided by 2 equals 5
Incorrect: 10/2=5, 10÷2=5
Correct: open parentheses five plus three close parentheses multiplied by two equals sixteen
Correct: open parentheses 5 plus 3 close parentheses multiplied by 2 equals 16
Incorrect: (5+3)*2=16
Correct: square root of sixteen equals four
Correct: square root of 16 equals 4
Incorrect: √16=4
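
Simple expressions can be spelled out mechanically by replacing each operator with its spoken form. The following sketch uses a hypothetical helper, `spell_out_expression`, and covers only the operators shown above. It does not handle negative numbers or variables, and it leaves digits as digits.

```python
import re

OPERATOR_WORDS = {
    "+": "plus",
    "-": "minus",
    "*": "multiplied by",
    "x": "multiplied by",
    "×": "multiplied by",
    "/": "divided by",
    "÷": "divided by",
    "=": "equals",
    "(": "open parentheses",
    ")": "close parentheses",
    "√": "square root of",
}


def spell_out_expression(expression):
    """Replace operators in a simple arithmetic expression with words."""
    # Tokens are numbers (optionally with a decimal part) or single operators.
    tokens = re.findall(r"\d+(?:\.\d+)?|[-+*x×/÷=()√]", expression)
    return " ".join(OPERATOR_WORDS.get(token, token) for token in tokens)


print(spell_out_expression("(5+3)*2=16"))
```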

Approximate values

Write out the full word. Avoid approximation symbols.

Correct: Your delivery will arrive in approximately twenty minutes
Correct: Your delivery will arrive in approximately 20 minutes
Incorrect: Your delivery will arrive in ~20 mins
Correct: around five hundred people attended
Correct: around 500 people attended
Incorrect: ~500 people attended

Units and measurements

Write out units in full.

Correct: five kilometers, 5 kilometers
Incorrect: 5km, 5 kms
Correct: twenty kilograms of rice, 20 kilograms of rice
Incorrect: 20kg rice, 20kgs rice
Correct: thirty degrees Celsius, 30 degrees Celsius
Incorrect: 30°C, 30 C
Correct: two liters of water, 2 liters of water
Incorrect: 2L water, 2l water
Correct: five feet six inches tall, 5 feet 6 inches tall
Incorrect: 5'6" tall, 5ft 6in tall
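
Approximation symbols and common unit abbreviations can be expanded with a small lookup table. This is a sketch under stated assumptions: `expand_units` and the `UNIT_WORDS` table are hypothetical, the table covers only a few of the units above, and feet and inches are left out because "ft" and "in" are easy to confuse with ordinary words.

```python
import re

# Abbreviation -> singular spoken form (assumed, partial list).
UNIT_WORDS = {
    "km": "kilometer", "kms": "kilometer",
    "kg": "kilogram", "kgs": "kilogram",
    "l": "liter",
    "min": "minute", "mins": "minute",
}

_UNITS = "|".join(sorted(UNIT_WORDS, key=len, reverse=True))
_UNIT_RE = re.compile(rf"(\d+(?:\.\d+)?)\s*({_UNITS})\b", re.IGNORECASE)
_CELSIUS_RE = re.compile(r"(\d+(?:\.\d+)?)\s*°\s*C\b")


def _plural(number, word):
    return word if number == "1" else word + "s"


def expand_units(text):
    """Expand '~' before numbers and a few unit abbreviations into words."""
    text = re.sub(r"~\s*(?=\d)", "approximately ", text)
    text = _CELSIUS_RE.sub(
        lambda m: f"{m.group(1)} {_plural(m.group(1), 'degree')} Celsius", text
    )
    return _UNIT_RE.sub(
        lambda m: f"{m.group(1)} "
                  f"{_plural(m.group(1), UNIT_WORDS[m.group(2).lower()])}",
        text,
    )


print(expand_units("Your delivery will arrive in ~20 mins"))
```

The expansion does not insert connecting words, so "20kg rice" becomes "20 kilograms rice" and may still need "of" added by hand.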

Symbols and special characters

Spell out special characters in any context.

| Symbol | Spoken |
| --- | --- |
| . | dot |
| @ | at |
| _ | underscore |
| - | dash |
| / | forward slash |
| # | hashtag |
| & | and |

URLs

Correct: visit docs dot example dot com forward slash guide
Incorrect: visit docs.example.com/guide
Correct: my dash website dot com forward slash about
Incorrect: my-website.com/about

Email addresses

Correct: support dot company at gmail dot com
Incorrect: support.company@gmail.com
Correct: info underscore help at company dot com
Incorrect: info_help@company.com
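
The symbol table can be applied to a URL or email address programmatically. In this sketch, `spell_out_symbols` is a hypothetical helper meant for a single URL or address, not a whole sentence, since it would also turn sentence-ending periods into "dot". Characters outside the table, such as the colon in "https://", are left unchanged.

```python
import re

SYMBOL_WORDS = {
    ".": "dot",
    "@": "at",
    "_": "underscore",
    "-": "dash",
    "/": "forward slash",
    "#": "hashtag",
    "&": "and",
}


def spell_out_symbols(token):
    """Spell out the symbols in a single URL or email address."""
    # The capturing group keeps the symbols in the split result.
    parts = re.split(r"([.@_\-/#&])", token)
    return " ".join(SYMBOL_WORDS.get(part, part) for part in parts if part)


print(spell_out_symbols("docs.example.com/guide"))
print(spell_out_symbols("info_help@company.com"))
```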

Social media handles and tags

Correct: at company underscore name
Incorrect: @company_name
Correct: hashtag trending now
Incorrect: #TrendingNow
Correct: follow us at tech underscore company hashtag latest news
Incorrect: follow us @tech_company #LatestNews
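
Handles and hashtags also need their CamelCase words separated, as in "#TrendingNow" becoming "hashtag trending now". A minimal sketch with hypothetical helpers `spell_out_tag` and `spell_out_tags_in`; it lowercases every word, which is an assumption that may not suit acronyms.

```python
import re


def spell_out_tag(tag):
    """Spell out one @handle or #hashtag, splitting CamelCase into words."""
    spoken = {"@": "at", "#": "hashtag", "_": "underscore"}
    words = []
    for piece in re.split(r"([@#_])", tag):
        if piece in spoken:
            words.append(spoken[piece])
        elif piece:
            # Capitalized words, acronyms, and digit runs become separate words.
            found = re.findall(r"[A-Z]?[a-z]+|[A-Z]+(?![a-z])|\d+", piece)
            words.extend(word.lower() for word in found)
    return " ".join(words)


def spell_out_tags_in(text):
    """Apply spell_out_tag to every whitespace-separated @ or # token."""
    return " ".join(
        spell_out_tag(token) if token[0] in "@#" else token
        for token in text.split()
    )


print(spell_out_tags_in("follow us @tech_company #LatestNews"))
```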

Ranges and intervals

Correct: five to eight days
Incorrect: 5-8 days
Correct: between ten and fifteen minutes
Incorrect: 10-15 minutes
Correct: temperatures from twenty to thirty degrees
Incorrect: temperatures 20-30°
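
Numeric ranges can be rewritten with a regular expression, as long as dates and phone numbers are left alone. The sketch below uses a hypothetical helper, `spell_out_ranges`, and skips any digit group that is part of a longer hyphenated sequence such as 30-06-24. It cannot tell a range from a subtraction, so it should not be run on mathematical expressions.

```python
import re

# A range is two digit groups joined by a hyphen or en dash, with no further
# digit or hyphen on either side (which would indicate a date or phone number).
_RANGE_RE = re.compile(r"(?<![\d-])(\d+)[-–](\d+)(?![\d-])")


def spell_out_ranges(text):
    """Rewrite numeric ranges like 5-8 as '5 to 8'."""
    return _RANGE_RE.sub(r"\1 to \2", text)


print(spell_out_ranges("Delivery takes 5-8 days"))
print(spell_out_ranges("The deadline is 30-06-24."))
```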

Quick reference

  • Stay consistent. Use the same format throughout the prompt.
  • Spell things out when in doubt.
  • Break long URLs and handles into smaller chunks.
  • Avoid symbols that have multiple interpretations.