Hydra
Hydra is Smallest AI’s speech-to-speech model. A single WebSocket carries microphone audio from your client to the model and streams synthesised response audio back. There is no STT → LLM → TTS pipeline in the middle, and no transcript on the wire.
One model. One socket. No glue code.
Barge-in handled by the model. In-flight responses cancel automatically.
Standard JSON-schema function calls, executed on your side.
Set generate_initial_response: true to have the model speak first, for example with a greeting.
Model Overview
Key capabilities
Six voices: wren, sloane, marlowe, reed, knox, tate.
JSON-schema tools, streamed args, client-side execution.
Live-patch tools via session.update, without reconnecting.
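Because tools run on the client, your application needs a small dispatcher that maps a tool name to a local function. The sketch below is one way to do that. It is illustrative only: the tool definition envelope, the `get_weather` tool, and the `run_tool_call` helper are hypothetical names, and the events that carry tool calls and results are not shown here (see the tool calling documentation for those).

```python
import json
from typing import Any, Callable, Dict

# Hypothetical tool definition in JSON-schema style.
# The exact envelope Hydra expects is an assumption; check the tool calling docs.
GET_WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def get_weather(city: str) -> Dict[str, Any]:
    # Stub implementation for illustration; replace with a real lookup.
    return {"city": city, "forecast": "sunny"}


# Maps tool names to the local functions that implement them.
TOOL_REGISTRY: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_call(name: str, arguments_json: str) -> str:
    """Run a tool locally and return its result as a JSON string.

    `arguments_json` is the complete argument string. If arguments arrive
    in streamed chunks, join the chunks before calling this function.
    """
    handler = TOOL_REGISTRY.get(name)
    if handler is None:
        return json.dumps({"error": f"unknown tool: {name}"})
    try:
        args = json.loads(arguments_json)
        return json.dumps(handler(**args))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, or arguments that do not match the function signature.
        return json.dumps({"error": str(exc)})
```

Returning an error object instead of raising keeps the session alive when the model calls a tool with bad arguments; whether to surface that error back to the model is an application decision.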
Audio formats
Languages
Hydra currently supports English only. Additional languages are on the roadmap.
Voices
Six voice IDs are accepted in session.configure.session.voice. The voice is frozen at handshake, so choose it before the session starts.
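Since the voice cannot change after the handshake, it can help to validate the ID on the client before sending anything. The following is a minimal sketch, assuming the configuration message is a JSON object with a `type` of `session.configure` and a nested `session.voice` field, as the path above suggests. The `build_session_configure` helper is hypothetical, and the real message may require additional fields.

```python
import json

# Voice IDs listed in this document.
VOICE_IDS = ("wren", "sloane", "marlowe", "reed", "knox", "tate")


def build_session_configure(voice: str) -> str:
    """Return a session.configure message as a JSON string.

    The envelope shape is an assumption inferred from the field path
    session.configure.session.voice; consult the session config docs.
    """
    if voice not in VOICE_IDS:
        raise ValueError(f"unknown voice {voice!r}; expected one of {VOICE_IDS}")
    return json.dumps({"type": "session.configure", "session": {"voice": voice}})
```

Failing fast on an unknown ID avoids opening a socket only to have the handshake rejected.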
Performance & benchmarks
Hydra is benchmarked against eight other production-grade voice / realtime models. Full methodology and metric definitions live on the dedicated Performance and Metrics Overview pages.
AIEWF S2S — 10 runs × 30 turns, aiwf_medium_context
Reading the table:
- Tool V2V mean latency: Hydra is the fastest of the 9 models at 1624 ms, ahead of nova-2-sonic (1689 ms), gpt-realtime-2 low (2005 ms), and ultravox (2406 ms).
- Non-tool V2V median latency: tied for fastest with ultravox at 864 ms.
- Pass rate: third of 8 models, within about 2 percentage points of the leader. Only 8 models are ranked because nova-2-sonic did not report a pass rate.
Latency numbers are computed from transcript.jsonl across all 10 runs (n = 224 non-tool turns, n = 64 tool turns). Pass rate is the fraction of turns that completed the expected interaction.
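The aggregation described above (mean latency over tool turns, median over non-tool turns, and pass rate as a fraction of turns) can be sketched as follows. The record fields `is_tool_turn`, `v2v_latency_ms`, and `passed` are hypothetical; the actual schema of transcript.jsonl is defined on the methodology pages, not here.

```python
import json
from statistics import mean, median
from typing import Dict, Iterable


def summarise_turns(jsonl_lines: Iterable[str]) -> Dict[str, float]:
    """Aggregate per-turn records from JSONL lines.

    Field names are assumptions for illustration. Raises
    statistics.StatisticsError if either latency group is empty.
    """
    tool_latencies = []
    non_tool_latencies = []
    passed = []
    for line in jsonl_lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        turn = json.loads(line)
        bucket = tool_latencies if turn["is_tool_turn"] else non_tool_latencies
        bucket.append(turn["v2v_latency_ms"])
        passed.append(bool(turn["passed"]))
    return {
        "tool_v2v_mean_ms": mean(tool_latencies),
        "non_tool_v2v_median_ms": median(non_tool_latencies),
        "pass_rate": sum(passed) / len(passed),
    }
```

To pool several runs, concatenate the lines of each run's file before calling `summarise_turns`, so the statistics are computed over all turns rather than averaged per run.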
Hydra is evaluated on voice-agent axes — voice-to-voice latency, turn-taking accuracy, barge-in handling, and tool-call reliability under realistic conditions. Generic LLM benchmarks (MMLU, IFEval) target a different objective and aren’t the right yardstick for a realtime voice model.
Operational metrics
Capacity & rate limits
- One voice session per WebSocket connection.
- Concurrency follows your plan’s WebSocket pool. Excess connections receive error with code: "server_full" followed by close code 1013. Back off with jitter and retry.
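A common way to back off with jitter is exponential backoff with "full jitter": each retry waits a random time between zero and an exponentially growing ceiling. The sketch below shows the idea. `ServerFullError`, `backoff_delays`, and `connect_with_retry` are hypothetical names, and the default `base`, `cap`, and `attempts` values are illustrative rather than recommended by Smallest AI. Your own client code is assumed to raise `ServerFullError` when it sees the `server_full` error or close code 1013.

```python
import random
import time
from typing import Callable, Iterator, Optional, TypeVar

T = TypeVar("T")


class ServerFullError(Exception):
    """Hypothetical: raised by your client on server_full / close code 1013."""


def backoff_delays(
    base: float = 0.5,
    cap: float = 30.0,
    attempts: int = 6,
    rng: Optional[random.Random] = None,
) -> Iterator[float]:
    """Yield `attempts` delays in seconds using exponential backoff with full jitter."""
    rng = rng or random.Random()
    for attempt in range(attempts):
        ceiling = min(cap, base * 2 ** attempt)
        yield rng.uniform(0.0, ceiling)


def connect_with_retry(
    connect: Callable[[], T],
    sleep: Callable[[float], None] = time.sleep,
    **backoff_kwargs: float,
) -> T:
    """Call `connect`, sleeping with jitter after each ServerFullError.

    Makes up to `attempts` tries with a sleep after each failure, then one
    final try whose exception, if any, propagates to the caller.
    """
    for delay in backoff_delays(**backoff_kwargs):
        try:
            return connect()
        except ServerFullError:
            sleep(delay)
    return connect()
```

Randomising the whole delay, rather than adding a small random offset to a fixed schedule, spreads reconnecting clients out so they do not all hit the pool again at the same moment.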
For higher limits, contact your account manager.
Pricing
Contact your Smallest AI account manager for current pricing. Hydra is billed by session minute; usage per turn is reported on response.done when available.
Use cases
Direct use
- Realtime voice assistants — companion apps, concierge bots, in-app tutors.
- Phone agents — restaurant reservations, banking concierges, customer support.
- Voice copilots embedded in web and mobile apps.
- Accessibility — voice-first interfaces for visually-impaired users.
- Voice-controlled IoT and games — kiosks, in-car assistants, gaming companions.
Downstream use
- Conversational analytics over recorded phone-call audio (transcribe the captured audio with Pulse STT afterwards).
- Multi-agent voice systems where Hydra is one specialised speaker.
- Hybrid voice + text agents where a text fallback is needed for compliance.
Specs & API surface
See the documentation hub for the full event catalog, session config, tool calling, interruption handling, and errors.
Safety & responsible use
Hydra is intended for voice-agent and conversational workloads. Customers building user-facing applications should layer their own content moderation, prompt-injection defenses, and PII handling appropriate to their domain. Hydra does not currently apply content moderation server-side — outputs reflect the model’s training and the prompts you provide.
For voice-agent applications handling regulated content (financial, healthcare), the standard pattern applies: keep PII out of prompts where practical, apply post-processing redaction on outputs, and — if you need a transcript for compliance — transcribe the PCM you sent/received via the Pulse STT API and store that transcript with your moderation log.
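As an illustration of the post-processing redaction step, the sketch below masks email addresses and long digit runs in transcript text before it is stored. It is a deliberately simple assumption-laden example: the `redact` helper and both patterns are hypothetical, and regex matching alone is not a complete PII solution for regulated domains.

```python
import re

# Simple illustrative patterns; real deployments usually need domain-specific rules.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Nine or more digits, optionally separated by single spaces or hyphens
# (covers typical card and account numbers).
LONG_NUMBER_RE = re.compile(r"\b(?:\d[ -]?){8,}\d\b")


def redact(text: str) -> str:
    """Replace emails and long digit sequences with placeholder tokens."""
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    text = LONG_NUMBER_RE.sub("[REDACTED_NUMBER]", text)
    return text
```

Running `redact` on the transcript before writing it to the moderation log keeps the raw identifiers out of long-term storage while leaving short numbers such as times and quantities intact.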

