A Hydra session is a single WebSocket connection. Open it once, stream audio in real time, close it when the user is done.
All frames are JSON text frames (UTF-8). Binary frames are not used.
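Because every frame is a JSON text frame, a client can route all of its traffic through two small helpers. This is a minimal sketch that assumes each frame is a JSON object carrying its event name in a "type" field. The field name is an assumption, since this section only names the events. encode_frame and decode_frame are hypothetical helper names.

```python
import json
from typing import Any, Union


def encode_frame(frame_type: str, **fields: Any) -> str:
    """Serialize one client event as the payload of a JSON text frame.

    Assumption: the event name travels in a "type" field.
    """
    # Returning str (not bytes) makes the websockets package send a text frame,
    # UTF-8 encoded on the wire.
    return json.dumps({"type": frame_type, **fields}, ensure_ascii=False)


def decode_frame(message: Union[str, bytes]) -> dict:
    """Parse one server frame, rejecting binary frames, which Hydra does not use."""
    if isinstance(message, (bytes, bytearray)):
        raise ValueError("unexpected binary frame")
    frame = json.loads(message)
    if not isinstance(frame, dict) or "type" not in frame:
        raise ValueError("expected a JSON object with a 'type' field")
    return frame
```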
Python clients need the websockets package: pip install websockets.
Auth happens during the WebSocket handshake. If api_key is missing or invalid, the server returns HTTP 401 and the WebSocket never opens — no event frames, no close code.
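This sketch shows one way to tell an auth failure apart from other connection errors when using the websockets package. Several details are assumptions: the base URL is a placeholder, and api_key is sent as a query parameter, because this section does not say how the key is transmitted. Because the 401 arrives before the WebSocket opens, it surfaces as an exception from websockets.connect rather than as a frame or a close code. The attribute holding the HTTP status differs between websockets releases, so handshake_status checks both spellings. HydraAuthError, session_url, handshake_status and connect_hydra are hypothetical names.

```python
from typing import Optional
from urllib.parse import urlencode


class HydraAuthError(Exception):
    """The server rejected the handshake with HTTP 401 (api_key missing or invalid)."""


def session_url(base_url: str, api_key: str) -> str:
    # Assumption: api_key is passed as a query parameter.
    return f"{base_url}?{urlencode({'api_key': api_key})}"


def handshake_status(exc: object) -> Optional[int]:
    """Best-effort HTTP status from a websockets handshake exception."""
    # websockets 14+ raises InvalidStatus, which carries exc.response.status_code;
    # older releases raise InvalidStatusCode, which carries exc.status_code.
    response = getattr(exc, "response", None)
    if response is not None:
        return getattr(response, "status_code", None)
    return getattr(exc, "status_code", None)


async def connect_hydra(base_url: str, api_key: str):
    """Open a connection, raising HydraAuthError on a handshake-time 401."""
    # Imported here so the helpers above work with only the standard library.
    import websockets
    from websockets.exceptions import InvalidHandshake

    try:
        return await websockets.connect(session_url(base_url, api_key))
    except InvalidHandshake as exc:
        if handshake_status(exc) == 401:
            # No frames and no close code will ever arrive in this case.
            raise HydraAuthError("api_key missing or invalid") from exc
        raise
```

Catching the error at connect time keeps the frame-handling loop free of auth logic, since that loop never runs when the handshake is rejected.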
Once a valid api_key has opened a session, the server tolerates roughly 30 seconds of inactivity; see Idle timeout below.
The first frame the server sends after the connection opens is always session.created.
The server then waits for a single session.configure from you before it accepts audio. Once the configuration is applied, the server replies with session.configured and starts accepting input_audio_buffer.append frames. See Managing sessions.
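The opening sequence fits in one coroutine. This sketch assumes ws is any object with async send(str) and recv() methods, such as the connection returned by websockets.connect, and that each frame is a JSON object with a "type" field. The configure payload is hypothetical; the real fields are documented in Managing sessions. For simplicity, the sketch treats any unexpected frame in this window as an error. open_session, expect_frame and SessionSetupError are hypothetical names.

```python
import json


class SessionSetupError(Exception):
    """The server deviated from session.created -> configure -> session.configured."""


async def expect_frame(ws, expected_type: str) -> dict:
    """Receive one frame and check its (assumed) "type" field."""
    frame = json.loads(await ws.recv())
    if frame.get("type") != expected_type:
        raise SessionSetupError(f"expected {expected_type}, got {frame.get('type')!r}")
    return frame


async def open_session(ws, config: dict) -> dict:
    """Run the opening handshake. Send audio only after this returns."""
    await expect_frame(ws, "session.created")
    # One configure goes out before any input_audio_buffer.append.
    await ws.send(json.dumps({"type": "session.configure", **config}))
    return await expect_frame(ws, "session.configured")
```

With the websockets package this would be used as `async with websockets.connect(url) as ws: await open_session(ws, config)`, where url and config are your own values.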
Idle timeout. If neither side sends a frame for about 30 seconds, the server closes the connection with code 1000 (normal closure). Reconnecting starts a new session. There is no resume, so include any state you still need in the session.configure you send on the new connection.
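A client can mirror the server's idle rule locally, for example to log or warn before the window runs out. In this minimal sketch, IdleClock is a hypothetical helper and 30.0 s stands in for the documented "~30 seconds". Call touch() on every frame sent or received, because traffic in either direction restarts the window.

```python
import time
from typing import Callable

IDLE_TIMEOUT_S = 30.0  # approximation of the documented "~30 seconds"


class IdleClock:
    """Tracks the time since the last frame in either direction."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._last = clock()

    def touch(self) -> None:
        """Record a frame sent or received; this restarts the idle window."""
        self._last = self._clock()

    def remaining(self) -> float:
        """Seconds left before the server may close the session with code 1000."""
        return max(0.0, IDLE_TIMEOUT_S - (self._clock() - self._last))
```

When the socket does close with 1000, nothing carries over: open a new connection and send the full session.configure again.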
Heartbeat with silence frames. If your UX has pauses where the user may think for a while (a long deliberation prompt, a UI confirmation step), keep sending input_audio_buffer.append frames with whatever the microphone is producing. Silence still counts as traffic and keeps the session alive. Stopping the mic stream and resuming it later is exactly the situation the 30-second timeout will catch.
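A sketch of streaming through a UX pause. Several details are assumptions not stated in this section: audio is 16-bit mono PCM at 16 kHz in 20 ms chunks; it is base64-encoded into a hypothetical "audio" field, since frames are JSON text and cannot carry raw bytes; and the audio source yields None while the app has muted capture. Check Managing sessions for the actual audio format and field names. silence_chunk, append_frame and stream_audio are hypothetical names.

```python
import base64
import json
from typing import AsyncIterator, Optional

SAMPLE_RATE_HZ = 16_000  # assumption
CHUNK_MS = 20            # assumption
BYTES_PER_SAMPLE = 2     # assumption: 16-bit mono PCM


def silence_chunk(ms: int = CHUNK_MS) -> bytes:
    """All-zero PCM16 samples covering `ms` milliseconds."""
    return bytes(SAMPLE_RATE_HZ * ms // 1000 * BYTES_PER_SAMPLE)


def append_frame(pcm: bytes) -> str:
    # Hypothetical field name "audio"; base64 because every frame is JSON text.
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm).decode("ascii"),
    })


async def stream_audio(ws, chunks: AsyncIterator[Optional[bytes]]) -> None:
    """Send one append frame per chunk, substituting silence while capture is paused.

    The source is expected to yield at real-time pace (one chunk per CHUNK_MS),
    as a live microphone does, so the session never goes quiet long enough to time out.
    """
    async for pcm in chunks:
        await ws.send(append_frame(pcm if pcm is not None else silence_chunk()))
```

The design choice is that a UI mute swaps real audio for silence instead of stopping the stream, which is the behavior the paragraph above recommends.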
Auth failures never produce a close code; they surface as HTTP 401 at handshake time, as noted above.