A Hydra session is the stateful interaction between the model and one connected client. One WebSocket = one session.
The handshake is one-shot. After session.created, the server waits for exactly one session.configure before accepting audio. Subsequent session.configure frames are ignored — use session.update for mid-session changes.
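The one-shot rule is easy to enforce on the client side. The following is a minimal sketch, not an official client: `HandshakeGate` is a hypothetical helper, and it assumes each frame is a JSON object whose `type` key carries the event name, which this page does not specify.

```python
class HandshakeGate:
    """Tracks the one-shot handshake and picks the frame type for config changes."""

    def __init__(self) -> None:
        self.created = False     # set once session.created has arrived
        self.configured = False  # set once session.configure has been used

    def on_server_event(self, event: dict) -> None:
        # Assumption: the event name is stored under the "type" key.
        if event.get("type") == "session.created":
            self.created = True

    def frame_type_for_config(self) -> str:
        """Return the frame type to use for the next configuration change."""
        if not self.created:
            raise RuntimeError("wait for session.created before configuring")
        if not self.configured:
            self.configured = True
            return "session.configure"
        # A second session.configure would be ignored, so use session.update.
        return "session.update"
```

Routing every configuration change through one gate like this keeps a late session.configure from being dropped silently by the server.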
session.configure
Send this once, immediately after session.created. Every field is optional.
session.configure silently accepts unknown fields — a typo like instuctions is ignored, not rejected, and the default persona ships instead. Validate keys client-side. session.update is stricter and returns an invalid_frame error on unknown fields.
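Because the server will not flag a misspelled key, a client-side check can catch it before the frame is sent. This is a sketch under stated assumptions: `build_configure` and `KNOWN_CONFIGURE_KEYS` are hypothetical names, the key set is illustrative and incomplete (the voice and audio-format field names are not given on this page), and fields are assumed to sit at the top level of the frame next to `type`.

```python
import difflib

# Assumption: illustrative key set only. Extend it from the full field reference.
KNOWN_CONFIGURE_KEYS = ["generate_initial_response", "instructions", "tools"]


def build_configure(fields: dict) -> dict:
    """Build a session.configure frame, rejecting keys the client does not know."""
    unknown = sorted(set(fields) - set(KNOWN_CONFIGURE_KEYS))
    if unknown:
        hints = []
        for key in unknown:
            # Suggest the closest known key so typos are easy to spot.
            close = difflib.get_close_matches(key, KNOWN_CONFIGURE_KEYS, n=1)
            hint = f" (did you mean {close[0]!r}?)" if close else ""
            hints.append(f"{key!r}{hint}")
        raise ValueError("unknown session.configure field(s): " + ", ".join(hints))
    # Assumption: fields are top-level siblings of "type".
    return {"type": "session.configure", **fields}
```

With this in place, `build_configure({"instuctions": "Be brief."})` raises a `ValueError` that names `instructions` as the likely intended key, instead of the session starting with the default persona.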
session.configured (server echo)
Use session.update to live-patch the session without reconnecting. Only the tools field is honoured today. Persona, voice, and audio formats are frozen at handshake; changes to those require a fresh connection.
The server replies with session.updated containing only the fields it actually applied. A no-op patch produces no echo.
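Since a no-op patch produces no echo, a client should not block indefinitely waiting for session.updated. The sketch below shows one way to handle that, assuming incoming events are already parsed into dicts and placed on an `asyncio.Queue`. `build_update`, `wait_for_echo`, and the top-level frame layout are assumptions, not part of the documented API.

```python
import asyncio

PATCHABLE_FIELDS = {"tools"}  # per the text above, the only field honoured today


def build_update(patch: dict) -> dict:
    """Build a session.update frame, refusing fields that are frozen at handshake."""
    frozen = sorted(set(patch) - PATCHABLE_FIELDS)
    if frozen:
        raise ValueError(f"not patchable mid-session: {frozen}; reconnect instead")
    # Assumption: fields are top-level siblings of "type".
    return {"type": "session.update", **patch}


async def wait_for_echo(events: asyncio.Queue, timeout: float = 2.0):
    """Return the session.updated event, or None if no echo arrives in time."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    while True:
        remaining = deadline - loop.time()
        if remaining <= 0:
            return None  # treat as a no-op patch
        try:
            event = await asyncio.wait_for(events.get(), remaining)
        except asyncio.TimeoutError:
            return None
        if event.get("type") == "session.updated":
            return event
        # Other events are dropped here; a real client would dispatch them.
```

A `None` result is ambiguous between a no-op patch and a slow server, so the timeout value is a judgment call for the application.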
Setting generate_initial_response: true on session.configure makes Hydra deliver an opening line before any user audio arrives. Useful for greetings and concierge openers.
Immediately after session.configured, the standard response.created → audio deltas → response.done sequence fires, with no preceding input_audio_buffer.speech_started.
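A client can recognise the opening line by checking that a response starts before any speech is detected. The helper below is a hypothetical sketch that works on the list of event names received after session.configured. It tolerates other event types in between, since this page does not list everything that may be interleaved.

```python
def opening_line_events(types: list[str]) -> list[str]:
    """Return the event names that make up the opening response, or [] if none."""
    for start, name in enumerate(types):
        if name == "input_audio_buffer.speech_started":
            return []  # the user spoke first, so this is not an unprompted opener
        if name == "response.created":
            break
    else:
        return []  # no response has started yet
    try:
        end = types.index("response.done", start)
    except ValueError:
        return []  # the response is still in flight
    return types[start:end + 1]
```

For example, given `["response.created", "response.output_audio.delta", "response.done"]` the function returns the whole list, while any sequence that begins with `input_audio_buffer.speech_started` yields an empty list.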
Most events carry a ConversationItem. The shape is intentionally flat — every field is optional, presence is dictated by type.
Discarded user turns — speech that VAD started but the turn detector later rejected — arrive as conversation.item.done with status: "incomplete". Silence and sub-VAD noise produce no events at all.
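Filtering out discarded turns before they reach a transcript or UI can be sketched as follows. The nesting is an assumption: the code looks for the item under an `item` key and falls back to a top-level `status`, because this page does not show where the field sits. `is_discarded_turn` is a hypothetical helper.

```python
def is_discarded_turn(event: dict) -> bool:
    """True for conversation.item.done events whose status is "incomplete"."""
    if event.get("type") != "conversation.item.done":
        return False
    # Assumption: the ConversationItem is nested under "item"; fall back to
    # a top-level "status" in case the event itself is flat.
    item = event.get("item") or {}
    status = item.get("status", event.get("status"))
    return status == "incomplete"
```

Silence and sub-VAD noise never reach this function, since they produce no events to inspect.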
response.done
Every response ends with response.done:
input_audio_buffer.append and how to play response.output_audio.delta