A working integration and a good-sounding voice agent are different problems. The integration is what the rest of these docs cover. This page is about the instructions string you send in session.configure.
Three things this gets right:
Don’t micromanage prosody in prose. Hydra adapts tone from context. Telling the model “speak slowly and carefully and pause between thoughts” mostly produces text that says “slowly and carefully and pause” rather than changing how it sounds. Shape the content and length; let Hydra handle delivery.
Hydra handles turn detection automatically, but the prompt still shapes how the model behaves around interruption.
Spelling out the interruption behavior you want in the instructions is more effective than relying on the model’s defaults, especially in noisy environments.
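One way to phrase interruption guidance is sketched below. Only `session.configure` and the instructions string come from this page. The payload layout (a `type` field next to `instructions`), the helper name, and the rule wording are assumptions made for illustration.

```python
# Illustrative interruption rules. The wording is an example,
# not text taken from the Hydra documentation.
INTERRUPTION_RULES = (
    "If the caller starts talking while you are speaking, stop and listen. "
    "Do not restart your previous sentence; respond to what they just said. "
    "If you only catch a fragment, ask one short clarifying question "
    "instead of guessing."
)


def build_configure_payload(base_instructions: str) -> dict:
    """Append the interruption rules to a base prompt.

    The returned dict mimics a session.configure message; its exact
    field layout is an assumption.
    """
    return {
        "type": "session.configure",
        "instructions": f"{base_instructions.strip()}\n\n{INTERRUPTION_RULES}",
    }


interruption_payload = build_configure_payload(
    "You are a phone agent for a dental clinic."
)
```

Note that the rules describe what to say and do after an interruption, not how the voice should sound.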
When you declare tools, also tell the model when to use them.
Without explicit instruction, the model sometimes answers from priors instead of calling the tool. Be direct.
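A minimal sketch of pairing a tool declaration with a direct usage rule follows. The tool name `lookup_order`, the schema shape, the `tools` field, and the helper `tool_usage_rules` are all hypothetical; check the tool-declaration reference for the format Hydra actually expects.

```python
# Hypothetical tool declaration. The schema shape is an assumption,
# not Hydra's documented format.
TOOLS = [
    {
        "name": "lookup_order",
        "description": "Fetch the current status of an order by its ID.",
        "parameters": {
            "type": "object",
            "properties": {"order_id": {"type": "string"}},
            "required": ["order_id"],
        },
    }
]

# Hypothetical trigger descriptions, keyed by tool name.
WHEN_TO_CALL = {
    "lookup_order": "the caller asks about an order's status or delivery date",
}


def tool_usage_rules(tools: list, when_to_call: dict) -> str:
    """Build one direct usage rule per declared tool."""
    rules = []
    for tool in tools:
        name = tool["name"]
        rules.append(
            f"When {when_to_call[name]}, call {name} before answering. "
            "Never guess or answer from memory."
        )
    return "\n".join(rules)


tool_payload = {
    "type": "session.configure",  # field layout is an assumption
    "tools": TOOLS,
    "instructions": (
        "You are a support agent for an online store.\n"
        + tool_usage_rules(TOOLS, WHEN_TO_CALL)
    ),
}
```

Generating the rule from the declaration keeps the two in sync: a tool that is declared but has no trigger description raises a `KeyError` instead of silently shipping without guidance.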
generate_initial_response
Pair generate_initial_response: true with an explicit opening-line instruction:
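A sketch of such a payload, written as a Python dict. The flag name `generate_initial_response` comes from this page; its placement at the top level of the message, the `type` field, and the clinic name in the opening line are assumptions for illustration.

```python
# Hypothetical opening line for an imaginary clinic.
OPENING_LINE = (
    "Thanks for calling Northside Dental, this is the booking line. "
    "How can I help?"
)

opening_payload = {
    "type": "session.configure",        # field layout is an assumption
    "generate_initial_response": True,  # the model speaks first
    "instructions": (
        "You are the booking assistant for a dental clinic.\n"
        f'Start the call by saying exactly: "{OPENING_LINE}"\n'
        "Then wait for the caller to respond."
    ),
}
```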
Without a specific instruction, the model picks a generic opener. With it, you get the line you want.
Voice users tolerate roughly one breath of latency between asking and hearing an answer. The model can’t make itself talk faster, but you can make it say less.
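As an illustration, length constraints can be stated as plain rules in the instructions string. The wording and the helper name `with_brevity` below are examples only, not recommended values from the Hydra documentation.

```python
# Illustrative brevity rules. They constrain content and length,
# not delivery.
BREVITY_RULES = (
    "Answer in one or two short sentences. "
    "Lead with the answer, then stop. "
    "Give more detail only if the caller asks for it."
)


def with_brevity(base_instructions: str) -> str:
    """Append the brevity rules to an existing instructions string."""
    return f"{base_instructions.strip()}\n\n{BREVITY_RULES}"


brief_instructions = with_brevity("You are a support agent for an online store.")
```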
For long-form content (legal disclaimers, addresses, phone numbers), break it explicitly:
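One possible set of rules is sketched below. The wording is an assumption, and it shapes how the content is split up rather than asking for pauses or pacing, in line with the advice on prosody above.

```python
# Illustrative rules for long-form content. Each rule splits the content
# into smaller pieces instead of describing how to speak.
LONG_FORM_RULES = "\n".join(
    [
        "Give phone numbers in groups of three or four digits, "
        "then offer to repeat the number.",
        "Give addresses in two parts: street first, then city and postal "
        "code once the caller confirms.",
        "Read legal disclaimers one sentence per turn and ask whether "
        "to continue after each one.",
    ]
)

long_form_payload = {
    "type": "session.configure",  # field layout is an assumption
    "instructions": "You are a billing agent.\n" + LONG_FORM_RULES,
}
```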