Hydra — Speech to Speech
Hydra is a realtime speech-to-speech model. The client streams microphone audio over a WebSocket, the model returns synthesised speech over the same socket, and turn-taking is handled server-side. There is no transcript on the wire: audio bytes are the payload.
If you’ve used the OpenAI Realtime API, Hydra fills the same role on the Smallest AI stack.
Sub-second latency from end-of-user-speech to first audio chunk. Drop-in for outbound and inbound voice flows.
Hands-free assistants embedded in web and mobile apps — barge-in handled by the model.
One WebSocket, predictable failure modes, no STT/LLM/TTS glue to maintain.
Companions, audio diaries, language tutors — natural turn-taking out of the box.
Two things to know up front:
There is no commit or end-of-turn message to send. Hydra detects turn boundaries on its own.
An interrupted response is reported with status: "cancelled", reason: "interrupted".
Clone the reference client, paste your API key, and talk to Hydra in your browser.
Connect URL, auth, idle timeout, close codes. Python + Node + Browser snippets.
Session lifecycle, persona, voice, mid-session updates, conversation items.
How the model detects speech, how to handle barge-in cleanly on the client.
Declare tools, stream arguments, post results back, narrate the answer.
System prompts, voice identity, length discipline, tool-call prompting.
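As a rough illustration of the client side of barge-in, the sketch below buffers synthesised audio and drops whatever is still unplayed when a response is reported with status "cancelled" and reason "interrupted". Only those two field values come from this page. The event dictionaries, the "type" and "data" keys, and the names PlaybackQueue, is_interrupted and handle_event are hypothetical stand-ins, not Hydra's actual message schema.

```python
from collections import deque
from typing import Deque, Optional


class PlaybackQueue:
    """Holds audio chunks that have arrived but have not been played yet."""

    def __init__(self) -> None:
        self._chunks: Deque[bytes] = deque()

    def push(self, chunk: bytes) -> None:
        """Append a chunk received from the server."""
        self._chunks.append(chunk)

    def pop(self) -> Optional[bytes]:
        """Return the next chunk to play, or None if nothing is buffered."""
        return self._chunks.popleft() if self._chunks else None

    def flush(self) -> int:
        """Drop all buffered audio and return how many chunks were discarded."""
        dropped = len(self._chunks)
        self._chunks.clear()
        return dropped

    def __len__(self) -> int:
        return len(self._chunks)


def is_interrupted(event: dict) -> bool:
    # Assumed event shape: a dict with "status" and "reason" keys.
    # The two values are the ones quoted on this page.
    return event.get("status") == "cancelled" and event.get("reason") == "interrupted"


def handle_event(queue: PlaybackQueue, event: dict) -> None:
    # "type" == "audio" and the "data" key are hypothetical placeholders
    # for however audio chunks are actually delivered.
    if event.get("type") == "audio":
        queue.push(event["data"])
    elif is_interrupted(event):
        # The user spoke over the model: stop playing stale speech.
        queue.flush()
```

Flushing on the client matters because audio already received may still be sitting in a local buffer when the server cancels the response. Without the flush, the user would keep hearing the interrupted reply.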