Managing sessions

A Hydra session is the stateful interaction between the model and one connected client. One WebSocket = one session.

Lifecycle

The handshake is one-shot. After session.created, the server waits for exactly one session.configure before accepting audio. Subsequent session.configure frames are ignored — use session.update for mid-session changes.
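
The ordering rules above can be enforced on the client with a small state machine. This is a minimal sketch, not part of any Hydra SDK: the `Handshake` class and its state names are hypothetical, and it models only the rule that one session.configure follows session.created and that later changes go through session.update.

```python
from enum import Enum


class HandshakeState(Enum):
    AWAITING_CREATED = "awaiting_created"      # socket open, no session.created yet
    AWAITING_CONFIGURE = "awaiting_configure"  # session.created seen, configure not sent
    CONFIGURED = "configured"                  # session.configure already sent


class Handshake:
    """Hypothetical client-side guard for the one-shot handshake."""

    def __init__(self) -> None:
        self.state = HandshakeState.AWAITING_CREATED

    def on_server_event(self, event_type: str) -> None:
        # Only session.created advances the handshake from its initial state.
        if event_type == "session.created" and self.state is HandshakeState.AWAITING_CREATED:
            self.state = HandshakeState.AWAITING_CONFIGURE

    def next_frame_type(self) -> str:
        """Return the frame type to use for a settings change right now."""
        if self.state is HandshakeState.AWAITING_CREATED:
            raise RuntimeError("wait for session.created before configuring")
        if self.state is HandshakeState.AWAITING_CONFIGURE:
            self.state = HandshakeState.CONFIGURED
            return "session.configure"
        # A second session.configure would be ignored, so use session.update.
        return "session.update"
```

Routing every settings change through `next_frame_type()` keeps a late session.configure from being sent and silently dropped.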

session.configure

Send this once, immediately after session.created. Every field is optional.

{
  "type": "session.configure",
  "session": {
    "instructions": "You are a warm, concise voice assistant. Reply in one short sentence.",
    "voice": "wren",
    "tools": [],
    "generate_initial_response": false
  }
}

| Field | Type | Notes |
| --- | --- | --- |
| instructions | string | System prompt. See Prompting voice agents. |
| voice | string | One of wren, sloane, marlowe, reed, knox, tate. Unknown values silently fall back to the default, so validate client-side. |
| tools | array | Function-calling tool schemas. See Tool calling. |
| generate_initial_response | boolean | true makes the model speak first, before any user audio. Honoured only at handshake. |

session.configure silently accepts unknown fields — a typo like instuctions is ignored, not rejected, and the default persona ships instead. Validate keys client-side. session.update is stricter and returns an invalid_frame error on unknown fields.
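
Because the server will not flag a misspelled key or an unknown voice at handshake, the check has to happen before the frame is sent. The sketch below shows one way to do that. The helper name `build_session_configure` is hypothetical, and the allowed fields and voices are copied from the field list above, so they would need updating if that list changes.

```python
import json

# Field names and voices as listed in the session.configure field table.
ALLOWED_FIELDS = {"instructions", "voice", "tools", "generate_initial_response"}
KNOWN_VOICES = {"wren", "sloane", "marlowe", "reed", "knox", "tate"}


def build_session_configure(**session) -> str:
    """Validate the session fields, then return the JSON frame as a string."""
    unknown = set(session) - ALLOWED_FIELDS
    if unknown:
        # The server would accept these silently, so fail loudly here instead.
        raise ValueError(f"unknown session.configure fields: {sorted(unknown)}")
    voice = session.get("voice")
    if voice is not None and voice not in KNOWN_VOICES:
        # An unknown voice would silently fall back to the default.
        raise ValueError(f"unknown voice: {voice!r}")
    return json.dumps({"type": "session.configure", "session": session})
```

With this in place, `build_session_configure(instuctions="...")` raises a `ValueError` instead of shipping the default persona.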

session.configured (server echo)

{
  "type": "session.configured",
  "event_id": "sv_df88e2e7ef6145c7",
  "session": {
    "instructions": "...",
    "voice": "wren",
    "tools": [],
    "generate_initial_response": false
  }
}
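
One possible use of the echo is to compare it with what the client sent, which can surface a silent fallback. This is a sketch under an assumption the page does not confirm: that the echoed session object reflects the values the server actually applied, in full. The function name `silently_changed` is hypothetical.

```python
def silently_changed(sent: dict, configured_event: dict) -> dict:
    """Return {field: (sent_value, echoed_value)} for fields the echo differs on."""
    # Assumption: the echo's "session" object holds the applied values.
    echoed = configured_event.get("session", {})
    return {
        key: (value, echoed.get(key))
        for key, value in sent.items()
        if echoed.get(key) != value
    }
```

A sent key that is missing from the echo, such as a misspelled field name, shows up with `None` as its echoed value.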

Mid-session updates

Use session.update to live-patch the session without reconnecting. Only the tools field is honoured today. Persona (instructions), voice, and audio formats are frozen at handshake; changing any of them requires a fresh connection.

{
  "type": "session.update",
  "session": {
    "tools": [
      { "type": "function", "name": "get_weather", "description": "...", "parameters": { ... } }
    ]
  }
}

The server replies with session.updated containing only the fields it actually applied. A no-op patch produces no echo.
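
A client can guard its patches and read back what was applied along these lines. This is a sketch only: both helper names are hypothetical, and it assumes session.updated carries its fields under a "session" key in the same way session.configured does.

```python
import json


def build_session_update(patch: dict) -> str:
    """Build a session.update frame, refusing anything other than tools."""
    rejected = set(patch) - {"tools"}
    if rejected:
        # Only tools is honoured mid-session; other changes need a new connection.
        raise ValueError(f"not patchable mid-session: {sorted(rejected)}")
    return json.dumps({"type": "session.update", "session": patch})


def applied_fields(event: dict) -> set:
    """Return the field names a session.updated event reports as applied."""
    if event.get("type") != "session.updated":
        return set()
    # Assumption: applied fields are nested under "session".
    return set(event.get("session", {}))
```

Since a no-op patch produces no echo, code that waits for session.updated should use a timeout rather than block indefinitely.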

Bot speaks first

Setting generate_initial_response: true on session.configure makes Hydra deliver an opening line before any user audio arrives. Useful for greetings and concierge openers.

{
  "type": "session.configure",
  "session": {
    "instructions": "You are a hotel concierge. Greet the guest warmly and ask how you can help.",
    "voice": "wren",
    "generate_initial_response": true
  }
}

Immediately after session.configured, the standard response.created → audio deltas → response.done sequence fires, with no preceding input_audio_buffer.speech_started.
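
The expected opening sequence can be checked in a test harness with something like the following. It is a rough sketch: the function name is hypothetical, the audio delta event name is taken from the Audio I/O entry at the end of this page, and it assumes other event types may be interleaved between response.created and response.done.

```python
AUDIO_DELTA = "response.output_audio.delta"


def is_bot_first_opening(event_types: list[str]) -> bool:
    """Check the event types received after session.configured.

    True when the first event is response.created and at least one audio
    delta arrives before the first response.done.
    """
    if not event_types or event_types[0] != "response.created":
        # Something else, such as speech_started, preceded the response.
        return False
    try:
        done_at = event_types.index("response.done")
    except ValueError:
        return False  # the response has not finished yet
    return AUDIO_DELTA in event_types[1:done_at]
```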

Conversation items

Most events carry a ConversationItem. The shape is intentionally flat: every field is optional, and which fields are present depends on type.

{
  "id": "item_…",
  "type": "message" | "function_call" | "function_call_output",
  "role": "user" | "assistant" | "system",
  "status": "in_progress" | "completed" | "incomplete",
  "content": [
    { "type": "input_audio" | "output_audio" | "input_text" | "output_text" }
  ],
  "call_id": "call_…",
  "name": "get_weather",
  "arguments": "{...json...}",
  "output": "..."
}

Discarded user turns — speech that VAD started but the turn detector later rejected — arrive as conversation.item.done with status: "incomplete". Silence and sub-VAD noise produce no events at all.
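
The flat, all-optional shape maps naturally onto a dataclass with defaults. The sketch below is illustrative: the class and helper are not from an official SDK, and it assumes the item is nested under an "item" key of the event, which this page does not state.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ConversationItem:
    # Every field is optional; which ones are set depends on `type`.
    id: Optional[str] = None
    type: Optional[str] = None
    role: Optional[str] = None
    status: Optional[str] = None
    content: list = field(default_factory=list)
    call_id: Optional[str] = None
    name: Optional[str] = None
    arguments: Optional[str] = None
    output: Optional[str] = None

    @classmethod
    def from_dict(cls, raw: dict) -> "ConversationItem":
        # Drop keys this sketch does not model so unexpected fields do not raise.
        known = {k: v for k, v in raw.items() if k in cls.__dataclass_fields__}
        return cls(**known)


def is_discarded_user_turn(event: dict) -> bool:
    """True for conversation.item.done with an incomplete item whose role,
    if present, is "user"."""
    if event.get("type") != "conversation.item.done":
        return False
    # Assumption: the item is nested under an "item" key of the event.
    item = ConversationItem.from_dict(event.get("item", {}))
    return item.status == "incomplete" and item.role in (None, "user")
```

A client might use `is_discarded_user_turn` to clear a "listening" indicator without treating the turn as a real user message.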

response.done

Every response ends with response.done:

{
  "type": "response.done",
  "response": {
    "id": "resp_…",
    "status": "completed" | "cancelled" | "incomplete" | "failed",
    "status_details": { "reason": "...", "type": "...", "error": { ... } },
    "output": [ /* ConversationItem */ ],
    "usage": { "input_tokens": 0, "output_tokens": 0, "total_tokens": 0 }
  }
}

| status | Meaning |
| --- | --- |
| completed | Turn finished normally |
| cancelled (reason: "interrupted") | The user barged in; handled automatically |
| cancelled (reason: "client_cancelled") | The client sent response.cancel |
| incomplete | Stop condition (max_output_tokens, content_filter) |
| failed | Internal error; see status_details.error |
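
For logging or metrics, the status and reason can be collapsed into a single label. This is a minimal sketch with a hypothetical function name. It assumes the stop condition for incomplete responses is reported in status_details.reason, which the status list above does not spell out.

```python
def summarize_response_done(event: dict) -> str:
    """Map a response.done event to a short label based on status and reason."""
    response = event.get("response", {})
    status = response.get("status")
    details = response.get("status_details") or {}
    reason = details.get("reason")
    if status == "completed":
        return "completed"
    if status == "cancelled":
        # "interrupted" is a barge-in; "client_cancelled" follows response.cancel.
        return f"cancelled:{reason}" if reason else "cancelled"
    if status == "incomplete":
        # Assumption: the stop condition appears in status_details.reason.
        return f"incomplete:{reason}" if reason else "incomplete"
    if status == "failed":
        return "failed"
    return "unknown"
```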

Next

  • Audio I/O — what to put in input_audio_buffer.append and how to play response.output_audio.delta
  • Turn detection & barge-in — how speech events fire and how to handle interruption on the client
  • Tool calling — declare and execute functions during a session