Hydra is Smallest AI’s speech-to-speech model. A single WebSocket carries microphone audio from your client to the model and streams synthesised response audio back. There is no STT → LLM → TTS pipeline in the middle, and no transcript on the wire.
One model. One socket. No glue code.
Barge-in handled by the model. In-flight responses cancel automatically.
Standard JSON-schema function calls, executed on your side.
generate_initial_response: true for greetings.
wren, sloane, marlowe, reed, knox, tate.
JSON-schema tools, streamed args, client-side execution.
Live-patch tools without reconnecting via session.update.
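The tool-calling notes above say that tools are described with JSON Schema and executed on the client. A minimal sketch of what that could look like follows. The tool definition fields (`name`, `description`, `parameters`), the `get_weather` tool, and the `run_tool_call` helper are all hypothetical and not taken from the Hydra event catalog, so check the tool-calling documentation for the real message shapes.

```python
import json

# Hypothetical tool definition: the field names are assumptions, not Hydra's
# confirmed wire format. "parameters" is an ordinary JSON Schema object.
GET_WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def get_weather(city: str) -> dict:
    # Stand-in implementation; a real handler would call your own backend.
    return {"city": city, "forecast": "sunny"}


# Registry mapping tool names to the local functions that execute them.
HANDLERS = {"get_weather": get_weather}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Execute a model-requested tool locally and return a JSON result string."""
    handler = HANDLERS.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        # Assumes the streamed argument fragments were already concatenated.
        args = json.loads(arguments_json)
        return json.dumps(handler(**args))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON or arguments that do not match the handler signature.
        return json.dumps({"error": str(exc)})
```

Returning errors as JSON rather than raising keeps the socket loop alive, so a bad argument payload can be reported to the model as a tool result.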
Hydra currently supports English only. Additional languages are on the roadmap.
Six voice IDs are accepted on session.configure.session.voice and frozen at handshake.
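Because the voice is frozen at handshake, it can help to validate it before the first message is sent. The sketch below builds a `session.configure` payload. The envelope shape (`type` plus a nested `session` object) is inferred from the `session.configure.session.voice` path, and placing `generate_initial_response` inside `session` is an assumption. `build_session_configure` is a hypothetical helper, not part of any SDK.

```python
import json

# The six voice IDs listed in this overview.
VOICES = {"wren", "sloane", "marlowe", "reed", "knox", "tate"}


def build_session_configure(voice: str, greet: bool = False) -> str:
    """Build a session.configure message as a JSON string.

    Assumed shape: {"type": "session.configure", "session": {...}}.
    """
    if voice not in VOICES:
        # Fail early: the voice cannot be changed after the handshake.
        raise ValueError(f"unsupported voice: {voice!r}")
    message = {
        "type": "session.configure",
        "session": {
            "voice": voice,
            # Assumed location of the greeting flag.
            "generate_initial_response": greet,
        },
    }
    return json.dumps(message)
```

Calling `build_session_configure("wren", greet=True)` yields a string ready to send as the first text frame on the socket, under the assumptions above.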
Hydra is benchmarked against eight other production-grade voice / realtime models. Full methodology and metric definitions live on the dedicated Performance and Metrics Overview pages.
aiwf_medium_context
Reading the table:
Latency numbers are computed from transcript.jsonl across all 10 runs (n = 224 non-tool turns, n = 64 tool turns). Pass rate is the fraction of turns that completed the expected interaction.
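As an illustration of the aggregation described above, the sketch below summarises latency and pass rate from JSONL lines, split into tool and non-tool turns. The record fields (`latency_ms`, `is_tool_turn`, `passed`) are hypothetical, since the schema of `transcript.jsonl` is not given here. The median is used only as an example statistic and is not necessarily the one reported in the table.

```python
import json
from statistics import median


def summarise(lines):
    """Return per-group turn count, median latency (ms), and pass rate."""
    groups = {"non_tool": [], "tool": []}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the file
        turn = json.loads(line)
        # Field names are assumptions about the transcript schema.
        key = "tool" if turn["is_tool_turn"] else "non_tool"
        groups[key].append(turn)

    summary = {}
    for key, turns in groups.items():
        if not turns:
            continue
        summary[key] = {
            "n": len(turns),
            "median_latency_ms": median(t["latency_ms"] for t in turns),
            # Pass rate: fraction of turns that completed the expected interaction.
            "pass_rate": sum(1 for t in turns if t["passed"]) / len(turns),
        }
    return summary
```

With a file on disk, this would be called as `summarise(open("transcript.jsonl"))`, pooling the lines from every run before summarising.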
Hydra is evaluated on voice-agent axes — voice-to-voice latency, turn-taking accuracy, barge-in handling, and tool-call reliability under realistic conditions. Generic LLM benchmarks (MMLU, IFEval) target a different objective and aren’t the right yardstick for a realtime voice model.
error with code: "server_full" followed by close code 1013; back off with jitter and retry.
For higher limits, contact your account manager.
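One common way to back off with jitter is exponential backoff with full jitter, sketched below. The base delay, cap, and attempt count are illustrative values rather than Hydra recommendations, and `backoff_delays` and `should_retry` are hypothetical helper names.

```python
import random


def backoff_delays(max_attempts=5, base=0.5, cap=30.0, rng=random.random):
    """Yield one delay in seconds per retry: exponential backoff, full jitter."""
    for attempt in range(max_attempts):
        # The ceiling doubles on each attempt until it reaches the cap.
        ceiling = min(cap, base * (2 ** attempt))
        # Full jitter: pick a point between 0 and the ceiling so that
        # many clients do not reconnect at the same instant.
        yield rng() * ceiling


def should_retry(error_code, close_code):
    """Retry only on the capacity signal: server_full or close code 1013."""
    return error_code == "server_full" or close_code == 1013
```

A reconnect loop would sleep for each yielded delay (for example with `time.sleep(delay)`) before reopening the socket, and give up once the generator is exhausted.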
Contact your Smallest AI account manager for current pricing. Hydra is billed by session minute; usage per turn is reported on response.done when available.
See the documentation hub for the full event catalog, session config, tool calling, interruption handling, and errors.
Hydra is intended for voice-agent and conversational workloads. Customers building user-facing applications should layer their own content moderation, prompt-injection defenses, and PII handling appropriate to their domain. Hydra does not currently apply content moderation server-side — outputs reflect the model’s training and the prompts you provide.
For voice-agent applications handling regulated content (financial, healthcare), the standard pattern applies: keep PII out of prompts where practical, apply post-processing redaction on outputs, and — if you need a transcript for compliance — transcribe the PCM you sent/received via the Pulse STT API and store that transcript with your moderation log.