Managing sessions

A Hydra session is the stateful interaction between the model and one connected client. One WebSocket = one session.

Lifecycle

The handshake is one-shot. After session.created, the server waits for exactly one session.configure before accepting audio. Subsequent session.configure frames are ignored — use session.update for mid-session changes.
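The one-shot rule above can be enforced on the client with a small state tracker. This is a minimal sketch, not part of any Hydra SDK: `Phase`, `HandshakeTracker`, and `frame_for` are hypothetical names, and the tracker only decides which frame type to build; it does not open a socket.

```python
from enum import Enum, auto


class Phase(Enum):
    CONNECTING = auto()       # socket open, session.created not yet seen
    AWAITING_CONFIG = auto()  # session.created seen, configure not yet sent
    CONFIGURED = auto()       # configure sent; only session.update from here on


class HandshakeTracker:
    """Hypothetical client-side guard so session.configure is built only once."""

    def __init__(self) -> None:
        self.phase = Phase.CONNECTING

    def on_server_event(self, event: dict) -> None:
        # Only session.created moves the tracker out of CONNECTING.
        if event.get("type") == "session.created" and self.phase is Phase.CONNECTING:
            self.phase = Phase.AWAITING_CONFIG

    def frame_for(self, session: dict) -> dict:
        """Wrap `session` in the frame type that is valid for the current phase."""
        if self.phase is Phase.CONNECTING:
            raise RuntimeError("wait for session.created before configuring")
        if self.phase is Phase.AWAITING_CONFIG:
            self.phase = Phase.CONFIGURED
            return {"type": "session.configure", "session": session}
        # A second session.configure would be ignored, so fall back to session.update.
        return {"type": "session.update", "session": session}
```

The first call to `frame_for` after `session.created` yields a `session.configure` frame; every later call yields `session.update`, which avoids sending a frame the server would silently drop.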

session.configure

Send this once, immediately after session.created. Every field is optional.

{
  "type": "session.configure",
  "session": {
    "instructions": "You are a warm, concise voice assistant. Reply in one short sentence.",
    "voice": "wren",
    "tools": [],
    "generate_initial_response": false
  }
}

| Field | Type | Notes |
| --- | --- | --- |
| instructions | string | System prompt. See Prompting voice agents. |
| voice | string | One of wren, sloane, marlowe, reed, knox, tate. Unknown values silently fall back to the default, so validate client-side. |
| tools | array | Function-calling tool schemas. See Tool calling. |
| generate_initial_response | boolean | true makes the model speak first, before any user audio. Honoured only at handshake. |

session.configure silently accepts unknown fields — a typo like instuctions is ignored, not rejected, and the default persona ships instead. Validate keys client-side. session.update is stricter and returns an invalid_frame error on unknown fields.
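Because the server will not flag typos or unknown voices at handshake, a client-side check along these lines may help. It is a sketch under stated assumptions: the field and voice lists are copied from the table above, and `validate_configure` is a hypothetical helper, not a Hydra API.

```python
# Field names and voices as listed in the table above; update if the API changes.
KNOWN_FIELDS = {"instructions", "voice", "tools", "generate_initial_response"}
KNOWN_VOICES = {"wren", "sloane", "marlowe", "reed", "knox", "tate"}


def validate_configure(session: dict) -> list[str]:
    """Return a list of problems found in a session.configure payload."""
    problems = []
    for key in session:
        if key not in KNOWN_FIELDS:
            # The server would ignore this key instead of rejecting it.
            problems.append(f"unknown field: {key}")
    voice = session.get("voice")
    if voice is not None and voice not in KNOWN_VOICES:
        # The server would fall back to the default voice without an error.
        problems.append(f"unknown voice: {voice}")
    return problems
```

Calling `validate_configure(session)` before sending and refusing to connect when the list is non-empty turns a silent fallback into an explicit failure.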

session.configured (server echo)

{
  "type": "session.configured",
  "event_id": "sv_df88e2e7ef6145c7",
  "session": {
    "instructions": "...",
    "voice": "wren",
    "tools": [],
    "generate_initial_response": false
  }
}

Mid-session updates

Use session.update to live-patch the session without reconnecting. Only the tools field is honoured today. Persona, voice, and audio formats are frozen at handshake; changes to those require a fresh connection.

{
  "type": "session.update",
  "session": {
    "tools": [
      { "type": "function", "name": "get_weather", "description": "...", "parameters": { ... } }
    ]
  }
}

The server replies with session.updated containing only the fields it actually applied. A no-op patch produces no echo.
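A client can mirror these rules locally, for example as below. This is a sketch: `build_session_update` and `apply_session_updated` are hypothetical helpers, and the assumption that `session.updated` carries the applied fields under a `session` key is inferred from the other frames on this page, not confirmed by it.

```python
import json

# Per the text above, tools is the only field honoured mid-session today.
UPDATABLE_FIELDS = {"tools"}


def build_session_update(patch: dict) -> str:
    """Serialise a session.update frame, refusing fields frozen at handshake."""
    frozen = set(patch) - UPDATABLE_FIELDS
    if frozen:
        raise ValueError(f"cannot change mid-session, reconnect instead: {sorted(frozen)}")
    return json.dumps({"type": "session.update", "session": patch})


def apply_session_updated(local_session: dict, event: dict) -> dict:
    """Merge the fields echoed by session.updated into a local copy of the session.

    Assumes the echo nests applied fields under "session" (not confirmed by the docs).
    """
    if event.get("type") != "session.updated":
        return local_session
    return {**local_session, **event.get("session", {})}
```

Since a no-op patch produces no echo, a client should not block waiting for `session.updated`; treating the echo as an optional confirmation is the safer design.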

Bot speaks first

Setting generate_initial_response: true on session.configure makes Hydra deliver an opening line before any user audio arrives. Useful for greetings and concierge openers.

{
  "type": "session.configure",
  "session": {
    "instructions": "You are a hotel concierge. Greet the guest warmly and ask how you can help.",
    "voice": "wren",
    "generate_initial_response": true
  }
}

Immediately after session.configured, the standard response.created → audio deltas → response.done sequence fires, with no preceding input_audio_buffer.speech_started.
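One way to recognise the opening line on the client is to check whether the first `response.created` arrives before any `input_audio_buffer.speech_started`. The sketch below assumes received events have been collected in order into a list of dicts; `is_unprompted_response` is a hypothetical helper name.

```python
def is_unprompted_response(events: list[dict]) -> bool:
    """True if the first response.created precedes any speech_started event."""
    for event in events:
        kind = event.get("type")
        if kind == "input_audio_buffer.speech_started":
            # The user spoke first, so the response is a normal reply.
            return False
        if kind == "response.created":
            # The model started speaking with no user speech before it.
            return True
    return False
```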

Conversation items

Most events carry a ConversationItem. The shape is intentionally flat: every field is optional, and which fields are present depends on type.

{
  "id": "item_…",
  "type": "message" | "function_call" | "function_call_output",
  "role": "user" | "assistant" | "system",
  "status": "in_progress" | "completed" | "incomplete",
  "content": [
    { "type": "input_audio" | "output_audio" | "input_text" | "output_text" }
  ],
  "call_id": "call_…",
  "name": "get_weather",
  "arguments": "{...json...}",
  "output": "..."
}

Discarded user turns — speech that VAD started but the turn detector later rejected — arrive as conversation.item.done with status: "incomplete". Silence and sub-VAD noise produce no events at all.
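Because the item shape is flat, client code typically branches on `type` first and reads other fields defensively. The following is a minimal sketch of that pattern; `describe_item` and its return labels are hypothetical, and it assumes a discarded turn is a `message` item with `role: "user"`.

```python
def describe_item(item: dict) -> str:
    """Summarise a ConversationItem, using .get() because every field is optional."""
    kind = item.get("type")
    if kind == "function_call":
        return f"call {item.get('name')} ({item.get('call_id')})"
    if kind == "function_call_output":
        return f"output for {item.get('call_id')}"
    if kind == "message":
        # Assumption: a rejected turn is a user message with status "incomplete".
        if item.get("role") == "user" and item.get("status") == "incomplete":
            return "discarded user turn"
        return f"{item.get('role')} message"
    return "unknown item"
```

A transcript view could skip items labelled `discarded user turn` so that rejected speech never appears in the conversation history.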

response.done

Every response ends with response.done:

{
  "type": "response.done",
  "response": {
    "id": "resp_…",
    "status": "completed" | "cancelled" | "incomplete" | "failed",
    "status_details": { "reason": "...", "type": "...", "error": { ... } },
    "output": [ /* ConversationItem */ ],
    "usage": { "input_tokens": 0, "output_tokens": 0, "total_tokens": 0 }
  }
}

| status | Meaning |
| --- | --- |
| completed | Turn finished normally |
| cancelled (reason: "interrupted") | The user barged in; handled automatically |
| cancelled (reason: "client_cancelled") | The client sent response.cancel |
| incomplete | Stop condition (max_output_tokens, content_filter) |
| failed | Internal error; see status_details.error |
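The status table can be turned into a small dispatcher on the client. This sketch makes two assumptions that the page does not confirm: that the stop condition for `incomplete` is reported in `status_details.reason`, and that `status_details` may be absent. The function name and returned labels are hypothetical.

```python
def classify_response_done(event: dict) -> str:
    """Map a response.done event to a short client-side label."""
    if event.get("type") != "response.done":
        raise ValueError("not a response.done event")
    response = event.get("response") or {}
    status = response.get("status")
    details = response.get("status_details") or {}
    reason = details.get("reason")

    if status == "completed":
        return "finished"
    if status == "cancelled":
        # "interrupted" is a user barge-in; "client_cancelled" follows response.cancel.
        return reason or "cancelled"
    if status == "incomplete":
        # Assumption: the stop condition is carried in status_details.reason.
        return f"stopped:{reason}" if reason else "stopped"
    if status == "failed":
        return "failed"
    return "unknown"
```

Only the `failed` label usually needs user-visible handling; `interrupted` can be logged and otherwise ignored, since barge-in is handled automatically.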

Next

  • Audio I/O — what to put in input_audio_buffer.append and how to play response.output_audio.delta
  • Turn detection & barge-in — how speech events fire and how to handle interruption on the client
  • Tool calling — declare and execute functions during a session