Prompting voice agents

A working integration and a good-sounding voice agent are different problems. The integration is what the rest of these docs cover. This page is about the instructions string you send in session.configure.

The minimum useful prompt

You are a warm, concise voice assistant. Reply in one or two short sentences.

Three things this gets right:

  1. Voice-first framing — “voice assistant”, not “AI”, not “chatbot”. Sets the persona toward spoken language.
  2. Length discipline — “one or two short sentences”. Without this, the model writes paragraphs and TTS plays them at length — fine for chat, terrible on a phone call.
  3. Warm tone — single-word style cue. The model carries it through prosody.

Don’t micromanage prosody in prose. Hydra adapts tone from context. An instruction such as “speak slowly and carefully and pause between thoughts” mostly surfaces in the output as the words “slowly and carefully and pause” rather than as a change in how the agent sounds. Shape the content and length; let Hydra handle delivery.

Anti-patterns

| Don’t | Why |
| --- | --- |
| “Be helpful and answer the user’s question accurately” | Generic. Drives Hydra toward chat-style answers. Be specific about voice. |
| “Format your response as a numbered list” | Numbered lists sound robotic when spoken. Phrase as “First, … Then, …” instead. |
| “Use bullet points for clarity” | There are no bullet points in speech. The model will say the word “bullet”. |
| “Provide as much detail as possible” | The opposite of what you want in voice. Constrain output length explicitly. |
| Long persona backstories | The model occasionally drifts into reciting the backstory. Keep persona to one or two sentences. |
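
The table above can be turned into a quick pre-flight check on a prompt before it ships. The following is a minimal sketch, not part of any Hydra SDK: the `lint_prompt` helper and its phrase list are hypothetical, and the patterns only catch the literal wordings listed in the table.

```python
import re

# Hypothetical phrase list derived from the anti-pattern table.
# Each regex maps to the reason the phrasing hurts a voice agent.
ANTI_PATTERNS = {
    r"\bnumbered list\b": "Numbered lists sound robotic when spoken; use 'First, ... Then, ...'.",
    r"\bbullet points?\b": "There are no bullet points in speech.",
    r"\bas much detail as possible\b": "Constrain output length explicitly instead.",
    r"\bbe helpful\b": "Too generic; be specific about voice.",
}


def lint_prompt(instructions: str) -> list[str]:
    """Return one warning per anti-pattern found in the instructions string."""
    warnings = []
    for pattern, reason in ANTI_PATTERNS.items():
        if re.search(pattern, instructions, flags=re.IGNORECASE):
            warnings.append(reason)
    return warnings


if __name__ == "__main__":
    draft = "Use bullet points for clarity. Format your response as a numbered list."
    for warning in lint_prompt(draft):
        print("warning:", warning)
```

A check like this does not replace listening to the agent; it only flags chat-style phrasing that tends to slip in when a prompt is adapted from a text product.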

Turn-taking discipline

Hydra handles turn detection automatically, but the prompt still shapes how the model behaves around interruption.

You are a phone agent. If the user is mid-sentence, wait for them to finish.
Pause naturally between thoughts so the user can interject.
Never repeat yourself if interrupted — pick up where you left off.

Explicit instructions like these are more effective than relying on the model’s defaults, especially in noisy environments.

Tool use prompting

When you declare tools, also tell the model when to use them.

You are a weather assistant. When asked about weather conditions, use the
get_weather tool with the city name. Don't guess — call the tool every time.
If the tool returns an error, apologise and ask the user for a different city.

Without explicit instruction, the model sometimes answers from priors instead of calling the tool. Be direct.

Greetings and generate_initial_response

Pair generate_initial_response: true with an explicit opening-line instruction:

You are a hotel concierge at the Grand Pacific. Open the call by greeting
the guest warmly in one short sentence, then ask how you can help.

Without a specific instruction, the model picks a generic opener. With it, you get the line you want.
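
As a rough illustration of how the two settings travel together, the sketch below serializes a `session.configure` message carrying both the `instructions` string and `generate_initial_response`. Only those three names come from this page; the envelope shape (a `type` field with the settings at the top level) and the `build_session_configure` helper are assumptions, so check the WebSocket connection reference for the actual schema.

```python
import json


def build_session_configure(instructions: str, greet_first: bool = False) -> str:
    """Serialize a session.configure message as JSON.

    ASSUMPTION: the message layout here is illustrative only. The real
    schema may nest these fields differently.
    """
    message = {
        "type": "session.configure",
        "instructions": instructions,
        "generate_initial_response": greet_first,
    }
    return json.dumps(message)


if __name__ == "__main__":
    prompt = (
        "You are a hotel concierge at the Grand Pacific. Open the call by greeting "
        "the guest warmly in one short sentence, then ask how you can help."
    )
    print(build_session_configure(prompt, greet_first=True))
```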

Length and pacing

Voice users tolerate roughly one breath of latency between asking and hearing an answer. The model can’t make itself talk faster, but you can make it say less.

Default to one sentence. If the user asks for detail, take two. Three is too many.

For long-form content (legal disclaimers, addresses, phone numbers), tell the model explicitly how to break it up:

When reading back a phone number, say each digit with a brief pause:
"Six… one… seven… nine…"
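
If phone numbers reach the model through a tool result, one option is to pre-format them into the same spoken shape before returning them. This is a sketch under that assumption: `speak_digits` is a hypothetical helper, and whether Hydra reads pre-formatted text back verbatim is not confirmed by this page.

```python
# Spoken form of each decimal digit.
DIGIT_WORDS = {
    "0": "zero", "1": "one", "2": "two", "3": "three", "4": "four",
    "5": "five", "6": "six", "7": "seven", "8": "eight", "9": "nine",
}


def speak_digits(number: str) -> str:
    """Spell out the digits of a number, separated by ellipses.

    Non-digit characters such as spaces, dashes, and parentheses are skipped.
    """
    words = [DIGIT_WORDS[ch] for ch in number if ch in DIGIT_WORDS]
    if not words:
        return ""
    spoken = "… ".join(words) + "…"
    # Capitalize only the first word so the result reads as one utterance.
    return spoken[0].upper() + spoken[1:]


if __name__ == "__main__":
    print(speak_digits("617-9"))  # Six… one… seven… nine…
```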

Worked example: phone-banking concierge

You are Maya, a phone-banking assistant for Pacific Bank. Speak warmly and
concisely. One or two short sentences per turn.
Available tools:
- lookup_balance(account_id) — current balance
- lookup_recent_transactions(account_id, days) — list of transactions
Turn-taking:
- If the user interrupts, stop and listen. Don't repeat yourself.
- Pause naturally between sentences.
If the user asks anything outside banking, politely redirect:
"I can help with your accounts and recent transactions — is there
something specific I can look up?"
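
A prompt with this structure can also be assembled from parts, which keeps the tool list in the prompt in sync with the tools you declare. The sketch below is one way to do that; `build_prompt` and its parameters are hypothetical names, and the output uses a colon between each tool signature and its summary.

```python
def build_prompt(
    persona: str,
    tools: list[tuple[str, str]],
    turn_taking: list[str],
    redirect: str,
) -> str:
    """Assemble an instructions string from persona, tools, and rules.

    `tools` holds (signature, summary) pairs, for example
    ("lookup_balance(account_id)", "current balance").
    """
    lines = [persona]
    if tools:
        lines.append("Available tools:")
        lines.extend(f"- {signature}: {summary}" for signature, summary in tools)
    if turn_taking:
        lines.append("Turn-taking:")
        lines.extend(f"- {rule}" for rule in turn_taking)
    if redirect:
        lines.append("If the user asks anything outside banking, politely redirect:")
        lines.append(f'"{redirect}"')
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_prompt(
        persona=(
            "You are Maya, a phone-banking assistant for Pacific Bank. Speak warmly "
            "and concisely. One or two short sentences per turn."
        ),
        tools=[
            ("lookup_balance(account_id)", "current balance"),
            ("lookup_recent_transactions(account_id, days)", "list of transactions"),
        ],
        turn_taking=[
            "If the user interrupts, stop and listen. Don't repeat yourself.",
            "Pause naturally between sentences.",
        ],
        redirect=(
            "I can help with your accounts and recent transactions. "
            "Is there something specific I can look up?"
        ),
    ))
```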

Next

  • Tool calling — the mechanics of declaring and executing tools
  • Turn detection & barge-in — what the model handles vs what your prompt should influence