Turn detection & barge-in
Hydra does turn detection server-side. The client streams audio continuously — even while the model is speaking — and the server emits events as it detects the start and end of each user turn.
There’s nothing you need to do for turn detection to work. The events below are how you observe it.
Events per turn
You don’t send anything between user turns. Hydra decides when a turn ends.
Barge-in
Because the channel is full-duplex, the user can speak while the model is speaking. When that happens:
- The server emits input_audio_buffer.speech_started for the new user turn.
- The in-flight response is cancelled; response.done arrives with:
- The new user turn proceeds normally.
You don’t have to send anything to opt into this. It’s automatic.
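Although barge-in needs no opt-in, it can be useful to notice it on the client, for example to log interruptions. The following is a minimal sketch, assuming each server event has already been decoded into a dict with a "type" key; the TurnTracker class and its fields are hypothetical names for illustration, not part of Hydra.

```python
from dataclasses import dataclass, field


@dataclass
class TurnTracker:
    """Tracks whether a response is in flight and counts barge-ins."""

    response_in_flight: bool = False
    barge_ins: int = 0
    log: list = field(default_factory=list)

    def on_event(self, event: dict) -> None:
        etype = event.get("type")
        if etype == "input_audio_buffer.speech_started":
            if self.response_in_flight:
                # The user started speaking over the model: a barge-in.
                self.barge_ins += 1
                self.log.append("barge-in")
            else:
                self.log.append("turn-start")
        elif etype == "response.created":
            self.response_in_flight = True
        elif etype == "response.done":
            # Fires for both completed and cancelled responses.
            self.response_in_flight = False


tracker = TurnTracker()
for e in [
    {"type": "input_audio_buffer.speech_started"},
    {"type": "response.created"},
    {"type": "input_audio_buffer.speech_started"},  # user interrupts
    {"type": "response.done"},
]:
    tracker.on_event(e)
```

The tracker only observes events; it never sends anything, which matches the automatic behaviour described above.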
Dropping scheduled audio on barge-in
The hard part is client-side: when response.created arrives for a new turn, any audio chunks from the previous response that you’ve already scheduled for playback will continue to play unless you drop them. Closing the AudioContext alone leaves a tail.
The cleanest pattern in the browser is to reset your playback cursor on every fresh response.created. (playPCM16 and b64ToInt16 are defined in Audio I/O.)
If you've kept references to AudioBufferSourceNodes, also call stop() on each one. playCursor only controls where future chunks are scheduled; it doesn't cancel buffers that are already scheduled or playing.
In Python with sounddevice, drain your playback queue on response.created — shown here as the relevant branch of the event handler:
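A minimal sketch of that branch, assuming decoded audio chunks are handed to the playback side through a standard queue.Queue and that events arrive as dicts with a "type" key. The names drain, handle_event, and playback_q are hypothetical; the sounddevice output callback that would consume the queue is not shown.

```python
import queue


def drain(q: queue.Queue) -> int:
    """Discard everything currently queued; return how many chunks were dropped."""
    dropped = 0
    while True:
        try:
            q.get_nowait()
        except queue.Empty:
            return dropped
        dropped += 1


def handle_event(event: dict, playback_q: queue.Queue) -> None:
    etype = event.get("type")
    if etype == "response.created":
        # A new response is starting: drop stale audio from the previous one.
        drain(playback_q)
    # Other event branches are omitted from this sketch.


playback_q: queue.Queue = queue.Queue()
for chunk in (b"\x00\x00" * 480, b"\x01\x00" * 480):
    playback_q.put(chunk)  # audio left over from the interrupted response

handle_event({"type": "response.created"}, playback_q)
```

Draining with get_nowait() avoids blocking the event loop if the playback thread empties the queue at the same moment.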
Programmatic cancellation
To cancel an in-flight response without the user speaking — e.g. you got new info from your backend and want the model to stop — send response.cancel:
The server replies with response.done carrying status: "cancelled" and status_details.reason: "client_cancelled". The frame is a no-op if no response is in flight.
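The exchange could be sketched as follows. The wire shape is an assumption: this sketch treats frames as JSON objects whose "type" field names the frame, and reads status and status_details from the top level of response.done; the actual payload may nest them differently. cancel_frame and was_client_cancelled are hypothetical helpers.

```python
import json


def cancel_frame() -> str:
    # Assumed wire shape: a JSON object whose "type" names the frame.
    return json.dumps({"type": "response.cancel"})


def was_client_cancelled(event: dict) -> bool:
    """True for a response.done that reports a client-initiated cancel."""
    if event.get("type") != "response.done":
        return False
    # Assumption: status fields sit at the top level of the event.
    details = event.get("status_details") or {}
    return (
        event.get("status") == "cancelled"
        and details.get("reason") == "client_cancelled"
    )


reply = {
    "type": "response.done",
    "status": "cancelled",
    "status_details": {"reason": "client_cancelled"},
}
```

In use, the string from cancel_frame() would be sent over whatever WebSocket connection the client already holds, and was_client_cancelled(reply) checked in the event handler.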
Common gotchas
- Audio keeps playing after barge-in: you didn't reset playCursor (browser) or didn't drain your playback queue (Python).
- speech_started fires from background noise: Hydra's VAD is conservative but not perfect. Discarded turns arrive as conversation.item.done with status: "incomplete" and should be ignored.
- Sending input_audio_buffer.append only when the user is talking: don't. Stream continuously. Hydra needs the silence to detect turn boundaries.
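The last two gotchas can be illustrated with a short sketch. The payload layout is assumed, not confirmed by this page: audio is shown as base64 under an "audio" key, and the item status is read from the top level of the event. append_frame, frames_for, and is_discarded_turn are hypothetical helper names.

```python
import base64
import json


def append_frame(pcm16: bytes) -> str:
    # Assumed shape: base64-encoded PCM16 under an "audio" key.
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm16).decode("ascii"),
    })


def frames_for(chunks):
    """Yield one append frame per captured chunk, silence included."""
    for chunk in chunks:
        yield append_frame(chunk)  # no client-side voice gating


def is_discarded_turn(event: dict) -> bool:
    # Assumption: "status" sits at the top level of the event.
    return (
        event.get("type") == "conversation.item.done"
        and event.get("status") == "incomplete"
    )


silence = b"\x00\x00" * 160
speech = b"\x10\x00" * 160
frames = list(frames_for([silence, speech, silence]))
```

Every captured chunk becomes a frame, so the server sees the silence on both sides of the speech and can place the turn boundaries itself.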
Next
- Tool calling: what happens between speech_stopped and audio output when the model decides to call a function
- Prompting voice agents: how to write instructions so the model handles turn-taking gracefully

