Tool calling | Smallest AI Docs

Hydra is a voice model — it doesn’t execute tools. The client declares tool schemas in session.configure, Hydra decides when to call them and streams the arguments JSON, and the client executes the tool locally and posts the result back.

Event flow

Declare tools

tools is a session.configure field. Each entry is a JSON Schema for a function the model may call.

1 {
2   "type": "session.configure",
3   "session": {
4     "instructions": "You are a weather assistant. Use get_weather when asked.",
5     "voice": "wren",
6     "tools": [
7       {
8         "type": "function",
9         "name": "get_weather",
10         "description": "Look up current weather for a city.",
11         "parameters": {
12           "type": "object",
13           "properties": { "city": { "type": "string" } },
14           "required": ["city"]
15         }
16       }
17     ]
18   }
19 }

You can also add or replace tools mid-session via session.update — see Managing sessions.

Execute the tool and post the result

When you receive response.function_call_arguments.done, parse the JSON, run your tool, and post the result back as a function_call_output item.

1 async for raw in ws:
2     evt = json.loads(raw)
3 
4     if evt["type"] == "response.function_call_arguments.done":
5         args = json.loads(evt["arguments"])
6         result = run_tool(evt["name"], args)
7 
8         await ws.send(json.dumps({
9             "type": "conversation.item.create",
10             "item": {
11                 "type": "function_call_output",
12                 "call_id": evt["call_id"],
13                 "output": result if isinstance(result, str) else json.dumps(result),
14             },
15         }))
16         # See "Multi-tool turns" below — don't send response.create here directly.
17         schedule_response_create()

After posting the tool output, you need to send a single response.create to tell Hydra to narrate the result. The next section explains the gotcha.

Single-tool vs multi-tool turns

If the model calls one tool, the obvious code works: post the output, send response.create, done.

If the model calls multiple tools in one turn, the obvious code is wrong. The server emits one response.function_call_arguments.done per call, and if you send response.create after each one, the model starts narrating before all results are in — you get a half-formed answer.

Solution: debounce response.create. Only fire one, ~200 ms after the last tool output.

1 import asyncio, json
2 
3 pending_create: asyncio.Task | None = None
4 DEBOUNCE_MS = 200
5 
6 async def _send_response_create():
7     await asyncio.sleep(DEBOUNCE_MS / 1000)
8     await ws.send(json.dumps({"type": "response.create"}))
9 
10 def schedule_response_create():
11     global pending_create
12     if pending_create and not pending_create.done():
13         pending_create.cancel()
14     pending_create = asyncio.create_task(_send_response_create())

For single-tool turns the debounce adds 200 ms — well below the model’s own time-to-first-audio, so users won’t notice.

Streaming arguments

The model emits arguments as a stream of JSON fragments. If you want to act on each token as it arrives (rare for tool args, common for showing a “thinking” UI), concatenate delta strings per call_id:

1 args_buf: dict[str, str] = {}
2 
3 if evt["type"] == "response.function_call_arguments.delta":
4     args_buf.setdefault(evt["call_id"], "")
5     args_buf[evt["call_id"]] += evt.get("delta", "")

The done event gives you the full string under arguments either way — so most clients just wait for done and parse once.

Tool response timeout

If you declare tools but don’t post function_call_output + response.create within the server’s timeout window, you get an error and the turn is abandoned:

1 { "type": "error", "error": { "code": "tool_response_timeout", "type": "..." } }

Common causes: a long-running tool with no async dispatch, network call to your own backend that hangs, or forgetting to send response.create after the output.

Long-running tools. Tools that take more than a few seconds should return a synchronous “working on it” output immediately and emit real results as a follow-up message via a fresh conversation.item.create. This keeps Hydra responsive — the assistant acknowledges the request out loud while the actual work happens, instead of waiting silently and risking a tool_response_timeout.

Common gotchas

One response.create per turn, not per tool. Multi-tool turns require debounce. The model decides when to call multiple tools — your client decides when to request narration.
Tools execute on your side, not Hydra’s. Hydra streams arguments; you run the code. Same model as the OpenAI Realtime API.
Unknown tool names are accepted in the schema, then never called. If the model isn’t calling your tool, double-check that the name in session.configure matches the user prompt’s intent.

Prompting voice agents — phrasing instructions so the model reliably calls tools
Errors & reconnection — tool_response_timeout and other failure modes