For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
DocumentationAPI ReferenceSelf HostModel CardsClient LibrariesIntegrationsDeveloper ToolsChangelog
DocumentationAPI ReferenceSelf HostModel CardsClient LibrariesIntegrationsDeveloper ToolsChangelog
  • Getting Started
    • Introduction
    • Models
    • Authentication
  • Text to Speech (Lightning)
    • Quickstart
    • Overview
    • Sync & Async
    • Streaming
    • Pronunciation Dictionaries
    • Voices & Languages
    • HTTP vs Streaming vs WebSockets
  • Speech to Text (Pulse)
    • Quickstart
    • Overview
  • Speech to Speech (Hydra)
    • Overview
    • Quickstart
    • WebSocket connection
    • Managing sessions
    • Audio I/O
    • Turn detection & barge-in
    • Tool calling
    • Prompting voice agents
    • Errors & reconnection
  • LLM (Electron)
    • Quickstart
    • Overview
    • Chat Completions
    • Streaming
    • Tool / Function Calling
    • Prefix Caching
    • Supported Parameters
    • Migrate from OpenAI
    • Best Practices
  • Cookbooks
    • Speech to Text
    • Text to Speech
    • Voice Agent (Electron + Pulse + Lightning)
  • Voice Cloning
    • Instant Clone (UI)
    • Instant Clone (API)
    • Instant Clone (Python SDK)
    • Delete Cloned Voice
  • Best Practices
    • Voice Cloning Best Practices
    • TTS Best Practices
  • Troubleshooting
    • Error reference
LogoLogo
Voice AgentsModels
Voice AgentsModels
On this page
  • Event flow
  • Declare tools
  • Execute the tool and post the result
  • Single-tool vs multi-tool turns
  • Streaming arguments
  • Tool response timeout
  • Common gotchas
  • Next
Speech to Speech (Hydra)

Tool calling

||View as Markdown|
Was this page helpful?
Previous

Turn detection & barge-in

Next

Prompting voice agents

Built with

Hydra is a voice model — it doesn’t execute tools. The client declares tool schemas in session.configure, Hydra decides when to call them and streams the arguments JSON, and the client executes the tool locally and posts the result back.

Event flow

Declare tools

tools is a session.configure field. Each entry is a JSON Schema for a function the model may call.

1{
2 "type": "session.configure",
3 "session": {
4 "instructions": "You are a weather assistant. Use get_weather when asked.",
5 "voice": "wren",
6 "tools": [
7 {
8 "type": "function",
9 "name": "get_weather",
10 "description": "Look up current weather for a city.",
11 "parameters": {
12 "type": "object",
13 "properties": { "city": { "type": "string" } },
14 "required": ["city"]
15 }
16 }
17 ]
18 }
19}

You can also add or replace tools mid-session via session.update — see Managing sessions.

Execute the tool and post the result

When you receive response.function_call_arguments.done, parse the JSON, run your tool, and post the result back as a function_call_output item.

1async for raw in ws:
2 evt = json.loads(raw)
3
4 if evt["type"] == "response.function_call_arguments.done":
5 args = json.loads(evt["arguments"])
6 result = run_tool(evt["name"], args)
7
8 await ws.send(json.dumps({
9 "type": "conversation.item.create",
10 "item": {
11 "type": "function_call_output",
12 "call_id": evt["call_id"],
13 "output": result if isinstance(result, str) else json.dumps(result),
14 },
15 }))
16 # See "Multi-tool turns" below — don't send response.create here directly.
17 schedule_response_create()

After posting the tool output, you need to send a single response.create to tell Hydra to narrate the result. The next section explains the gotcha.

Single-tool vs multi-tool turns

If the model calls one tool, the obvious code works: post the output, send response.create, done.

If the model calls multiple tools in one turn, the obvious code is wrong. The server emits one response.function_call_arguments.done per call, and if you send response.create after each one, the model starts narrating before all results are in — you get a half-formed answer.

Solution: debounce response.create. Only fire one, ~200 ms after the last tool output.

1import asyncio, json
2
3pending_create: asyncio.Task | None = None
4DEBOUNCE_MS = 200
5
6async def _send_response_create():
7 await asyncio.sleep(DEBOUNCE_MS / 1000)
8 await ws.send(json.dumps({"type": "response.create"}))
9
10def schedule_response_create():
11 global pending_create
12 if pending_create and not pending_create.done():
13 pending_create.cancel()
14 pending_create = asyncio.create_task(_send_response_create())

For single-tool turns the debounce adds 200 ms — well below the model’s own time-to-first-audio, so users won’t notice.

Streaming arguments

The model emits arguments as a stream of JSON fragments. If you want to act on each token as it arrives (rare for tool args, common for showing a “thinking” UI), concatenate delta strings per call_id:

1args_buf: dict[str, str] = {}
2
3if evt["type"] == "response.function_call_arguments.delta":
4 args_buf.setdefault(evt["call_id"], "")
5 args_buf[evt["call_id"]] += evt.get("delta", "")

The done event gives you the full string under arguments either way — so most clients just wait for done and parse once.

Tool response timeout

If you declare tools but don’t post function_call_output + response.create within the server’s timeout window, you get an error and the turn is abandoned:

1{ "type": "error", "error": { "code": "tool_response_timeout", "type": "..." } }

Common causes: a long-running tool with no async dispatch, network call to your own backend that hangs, or forgetting to send response.create after the output.

Long-running tools. Tools that take more than a few seconds should return a synchronous “working on it” output immediately and emit real results as a follow-up message via a fresh conversation.item.create. This keeps Hydra responsive — the assistant acknowledges the request out loud while the actual work happens, instead of waiting silently and risking a tool_response_timeout.

Common gotchas

  • One response.create per turn, not per tool. Multi-tool turns require debounce. The model decides when to call multiple tools — your client decides when to request narration.
  • Tools execute on your side, not Hydra’s. Hydra streams arguments; you run the code. Same model as the OpenAI Realtime API.
  • Unknown tool names are accepted in the schema, then never called. If the model isn’t calling your tool, double-check that the name in session.configure matches the user prompt’s intent.

Next

  • Prompting voice agents — phrasing instructions so the model reliably calls tools
  • Errors & reconnection — tool_response_timeout and other failure modes