Hydra is a voice model — it doesn’t execute tools. The client declares tool schemas in session.configure, Hydra decides when to call them and streams the arguments JSON, and the client executes the tool locally and posts the result back.
tools is a session.configure field. Each entry is a JSON Schema for a function the model may call.
You can also add or replace tools mid-session via session.update — see Managing sessions.
When you receive response.function_call_arguments.done, parse the JSON, run your tool, and post the result back as a function_call_output item.
After posting the tool output, you need to send a single response.create to tell Hydra to narrate the result. The next section explains the gotcha.
If the model calls one tool, the obvious code works: post the output, send response.create, done.
If the model calls multiple tools in one turn, the obvious code is wrong. The server emits one response.function_call_arguments.done per call, and if you send response.create after each one, the model starts narrating before all results are in — you get a half-formed answer.
Solution: debounce response.create. Only fire one, ~200 ms after the last tool output.
For single-tool turns the debounce adds 200 ms — well below the model’s own time-to-first-audio, so users won’t notice.
The model emits arguments as a stream of JSON fragments. If you want to act on each token as it arrives (rare for tool args, common for showing a “thinking” UI), concatenate delta strings per call_id:
The done event gives you the full string under arguments either way — so most clients just wait for done and parse once.
If you declare tools but don’t post function_call_output + response.create within the server’s timeout window, you get an error and the turn is abandoned:
Common causes: a long-running tool with no async dispatch, network call to your own backend that hangs, or forgetting to send response.create after the output.
Long-running tools. Tools that take more than a few seconds should return a synchronous “working on it” output immediately and emit real results as a follow-up message via a fresh conversation.item.create. This keeps Hydra responsive — the assistant acknowledges the request out loud while the actual work happens, instead of waiting silently and risking a tool_response_timeout.
response.create per turn, not per tool. Multi-tool turns require debounce. The model decides when to call multiple tools — your client decides when to request narration.name in session.configure matches the user prompt’s intent.instructions so the model reliably calls toolstool_response_timeout and other failure modes