Hydra (Realtime / WebSocket)
Hydra is Smallest AI’s full-duplex speech-to-speech model. A single WebSocket carries microphone audio from your client to the model and streams synthesised response audio back. Audio in, audio out — Hydra does not emit a transcript stream. If your application needs text, transcribe the PCM you sent or received using the Pulse STT API.
Connect to wss://api.smallest.ai/waves/v1/s2s?model=hydra&api_key=<SMALLEST_API_KEY>.
1. Wait for session.created, then send session.configure with persona, voice, and optional tools.
2. Send input_audio_buffer.append continuously while the mic is open, even while the model is speaking. Hydra detects turn boundaries on its own.
3. Decode response.output_audio.delta chunks (base64 PCM16) and queue them for playback at 48000 Hz.
4. If response.created arrives before the previous response’s response.output_audio.done, drop any still-scheduled audio buffers from the previous response.
Input audio is sent with input_audio_buffer.append. Recommended chunk size: 20–40 ms (640–1280 samples).
Output audio arrives in response.output_audio.delta.
Voices currently supported: wren, sloane, marlowe, reed, knox, tate.
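The connection URL and the first client frame can be sketched as two small helpers. This is a minimal illustration, not the official client: the JSON field names (`type`, `persona`, `voice`, `tools`) and the helper names `build_url` and `build_configure` are assumptions made for the example; only the endpoint, the query parameters, the `session.configure` event name, and the voice list come from the text above.

```python
import json
from urllib.parse import urlencode

BASE_URL = "wss://api.smallest.ai/waves/v1/s2s"
SUPPORTED_VOICES = {"wren", "sloane", "marlowe", "reed", "knox", "tate"}


def build_url(api_key: str) -> str:
    """Return the WebSocket URL with the model and API key as query parameters."""
    query = urlencode({"model": "hydra", "api_key": api_key})
    return f"{BASE_URL}?{query}"


def build_configure(persona: str, voice: str, tools=None) -> str:
    """Serialise a session.configure frame.

    The payload layout is hypothetical; check the protocol reference
    for the real field names before relying on it.
    """
    if voice not in SUPPORTED_VOICES:
        raise ValueError(f"unsupported voice: {voice!r}")
    message = {"type": "session.configure", "persona": persona, "voice": voice}
    if tools:
        message["tools"] = tools  # tools are optional
    return json.dumps(message)
```

A client would open `build_url(key)` with its WebSocket library of choice, wait for session.created, and then send the string returned by `build_configure` before any audio.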
The server closes the connection after ~30 seconds of no traffic from either side. Reconnect to resume.
A missing or invalid api_key is rejected during the WebSocket handshake with HTTP 401: the upgrade never completes, so no error event or close code lands on the wire.
1000: normal close (including the ~30 s server-side idle timeout).
1013: server at capacity. An error event with code: "server_full" precedes the close frame; back off with jitter and retry.
See the Hydra realtime guide for the full protocol reference, tool-calling pattern, interruption handling, and complete Python + browser examples.
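One way to act on these close codes is a small decision function with exponential backoff and full jitter for the 1013 case. The sketch below is illustrative only: the function name `next_action`, the action labels, and the `base` and `cap` values are assumptions, not part of the Hydra protocol.

```python
import random


def next_action(close_code: int, attempt: int,
                base: float = 0.5, cap: float = 30.0,
                rng=random.random):
    """Map a WebSocket close code to (action, delay_seconds).

    1013 -> retry after a jittered, exponentially growing delay.
    1000 -> normal close (also the idle timeout); reconnect only if needed.
    Anything else is left for the caller to inspect.
    """
    if close_code == 1013:
        ceiling = min(cap, base * (2 ** attempt))
        return ("retry", rng() * ceiling)  # full jitter: uniform in [0, ceiling)
    if close_code == 1000:
        return ("reconnect_if_needed", 0.0)
    return ("inspect", 0.0)
```

An HTTP 401 never reaches this function, because the handshake fails before any close code exists; that case would surface as a connection error from the WebSocket library instead.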
Header authentication of the form Bearer <token>
Send session.configure exactly once, immediately after receiving session.created. The server will not accept audio until this frame is processed.
Apply a mid-session change. Today, only tools is honoured.
Send input_audio_buffer.append frames continuously while the microphone is open. There is no manual commit or end-of-turn event: Hydra decides when a turn ends.
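Splitting captured microphone audio into append frames might look like the following. It is a sketch under stated assumptions: the input is taken to be 16-bit PCM carried as base64 (mirroring the output format), and the JSON field names `type` and `audio` are hypothetical. The default of 640 samples per chunk is the lower end of the recommended range.

```python
import base64
import json

BYTES_PER_SAMPLE = 2  # assumes 16-bit PCM input


def append_frames(pcm: bytes, samples_per_chunk: int = 640):
    """Yield JSON input_audio_buffer.append frames for a PCM byte buffer.

    The last frame may be shorter than the others if the buffer length
    is not a multiple of the chunk size.
    """
    if samples_per_chunk <= 0:
        raise ValueError("samples_per_chunk must be positive")
    step = samples_per_chunk * BYTES_PER_SAMPLE
    for start in range(0, len(pcm), step):
        chunk = pcm[start:start + step]
        yield json.dumps({
            "type": "input_audio_buffer.append",  # event name from the docs
            "audio": base64.b64encode(chunk).decode("ascii"),  # assumed field
        })
```

In a live client the generator would be fed from the audio capture callback and each frame sent as soon as it is produced, including while the model is speaking.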
Decode the base64 delta and play at the negotiated output sample rate. If response.created arrives before the previous response’s response.output_audio.done, drop any still-scheduled audio from the previous response — the user has barged in.
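The barge-in rule can be modelled as a small state machine over incoming events. This is a minimal sketch, not the reference client: the class name `PlaybackQueue` is hypothetical, events are assumed to be JSON objects with a `type` field, and the audio payload is assumed to sit in a `delta` field. Real playback scheduling (for example through an audio output stream) is left out.

```python
import base64
from collections import deque


class PlaybackQueue:
    """Hold decoded PCM chunks and discard them when the user barges in."""

    def __init__(self):
        self.pending = deque()      # decoded chunks not yet played
        self.response_open = False  # True between response.created and ...done

    def handle(self, event: dict) -> None:
        kind = event.get("type")
        if kind == "response.created":
            if self.response_open:
                # Previous response never reached output_audio.done:
                # the user interrupted, so drop its remaining audio.
                self.pending.clear()
            self.response_open = True
        elif kind == "response.output_audio.delta":
            self.pending.append(base64.b64decode(event["delta"]))
        elif kind == "response.output_audio.done":
            self.response_open = False
```

A playback loop would pop chunks from `pending` and write them to the audio device at the output sample rate; clearing the deque is what stops stale audio from being heard after an interruption.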
The client is responsible for executing the tool and posting the result back via conversation.item.create.
Errors are non-fatal unless followed by a close frame. Treat them as diagnostics.