Hydra | Smallest AI Docs

Latest Release

Hydra is Smallest AI’s speech-to-speech model. A single WebSocket carries microphone audio from your client to the model and streams synthesised response audio back. There is no STT → LLM → TTS pipeline in the middle, and no transcript on the wire.


Audio in, audio out

One model. One socket. No glue code.

Full-duplex

Barge-in handled by the model. In-flight responses cancel automatically.

Tool calling

Standard JSON-schema function calls, executed on your side.

6 voices

wren, sloane, marlowe, reed, knox, tate.

Model Overview


Developed by	Smallest AI
Model type	Full-duplex speech-to-speech
API surface	WebSocket (`wss://api.smallest.ai/waves/v1/s2s`)
Model ID (query param)	`model=hydra`
Wire version	v1
License	Proprietary, hosted API

Key Capabilities

Audio In, Audio Out

Single WebSocket carries microphone PCM in and response PCM out. No STT → LLM → TTS pipeline, no transcript on the wire.

Full-duplex Barge-in

Server handles interruption natively. In-flight responses cancel automatically when the user speaks over the bot.

Tool / Function Calling

Standard JSON-schema tools with streamed arguments, executed on your side.

6 Voices

wren, sloane, marlowe, reed, knox, tate. Frozen at handshake.

Bot Speaks First

generate_initial_response: true lets the bot open with a greeting before the user speaks.

Mid-session Updates

Live-patch tools and session config without reconnecting via session.update.
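As a rough illustration of the tool-calling and mid-session update capabilities above, the sketch below builds a JSON-schema tool definition and wraps it in a session.update text frame. The envelope shape (`type` plus a `session` object holding `tools`), the tool-definition keys, and the `get_weather` tool itself are assumptions made for illustration; check the event schema in the API reference for the actual field names.

```python
import json

# Hypothetical tool: the name, description, and parameters are illustrative only.
GET_WEATHER_TOOL = {
    "type": "function",
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {  # standard JSON Schema describing the arguments
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def build_session_update(tools):
    """Return a JSON text frame that patches the session's tool list.

    Assumption: the event is shaped as {"type": "session.update",
    "session": {"tools": [...]}}. Verify against the real event schema.
    """
    return json.dumps({"type": "session.update", "session": {"tools": list(tools)}})


frame = build_session_update([GET_WEATHER_TOOL])
```

The resulting string would be sent as a text frame on the already-open socket, which is what avoids the reconnect.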

How to use it

See the Hydra quickstart for a working end-to-end browser client: clone the reference repo, paste your API key, and talk to Hydra. Hydra is selected via the `model` query parameter on the unified Speech-to-Speech endpoint: `wss://api.smallest.ai/waves/v1/s2s?model=hydra&api_key=$SMALLEST_API_KEY`. For browser clients, mint a short-lived token server-side rather than embedding the long-lived key in the page.
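A minimal sketch of assembling that connection URL on a server, using only the endpoint and the two query parameters named above. The helper name `build_hydra_url` is hypothetical, and the key shown is a placeholder.

```python
from urllib.parse import urlencode

S2S_ENDPOINT = "wss://api.smallest.ai/waves/v1/s2s"


def build_hydra_url(api_key: str, model: str = "hydra") -> str:
    """Build the Speech-to-Speech WebSocket URL with URL-encoded query parameters."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    query = urlencode({"model": model, "api_key": api_key})
    return f"{S2S_ENDPOINT}?{query}"


# Placeholder key; in practice read it from the SMALLEST_API_KEY environment variable.
url = build_hydra_url("YOUR_KEY")
```

Encoding the parameters with `urlencode` keeps keys that contain reserved characters from corrupting the query string.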

The Overview covers the full event catalog (session config, tool calling, interruption, errors).

Performance & Benchmarks

Hydra is benchmarked against eight other production-grade voice / realtime models. Full methodology and metric definitions live on the dedicated Performance and Metrics Overview pages.

AIEWF S2S — 10 runs × 30 turns, `aiwf_medium_context`

Model	Pass rate	Non-tool V2V median	Non-tool V2V max	Tool V2V mean
ultravox-v0.7	97.7 %	864 ms	1888 ms	2406 ms
gpt-realtime-2 (low)	96.0 %	1728 ms	4032 ms	2005 ms
Hydra	95.9 %	864 ms	1984 ms	1624 ms
grok-voice-think-fast-1.0	95.3 %	2336 ms	4800 ms	2753 ms
gpt-realtime-1.5	93.3 %	1152 ms	2304 ms	2251 ms
gemini-3.1-flash-live-preview	91.7 %	1632 ms	5664 ms	3172 ms
gpt-realtime	86.7 %	1536 ms	4672 ms	2199 ms
gemini-live	86.0 %	2624 ms	30000 ms	4082 ms
nova-2-sonic	—	1280 ms	3232 ms	1689 ms

Reading the table:

Tool V2V mean latency: Hydra is the fastest of 9 at 1624 ms, ahead of nova-2-sonic (1689 ms), gpt-realtime-2 low (2005 ms), and ultravox-v0.7 (2406 ms).
Non-tool V2V median latency: tied for fastest with ultravox-v0.7 at 864 ms.
Pass rate: #3 of 8, 1.8 pp behind the leader (ultravox-v0.7 at 97.7 %); nova-2-sonic did not report a pass rate.

Latency numbers are computed from transcript.jsonl across all 10 runs (n = 224 non-tool turns, n = 64 tool turns). Pass rate is the fraction of turns that completed the expected interaction.
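The rankings quoted in "Reading the table" can be re-derived from the table itself. The sketch below copies the published rows into a dictionary and recomputes each claim; it does not reproduce the benchmark, only the arithmetic on the reported figures.

```python
# (pass rate %, non-tool V2V median ms, non-tool V2V max ms, tool V2V mean ms)
RESULTS = {
    "ultravox-v0.7": (97.7, 864, 1888, 2406),
    "gpt-realtime-2 (low)": (96.0, 1728, 4032, 2005),
    "Hydra": (95.9, 864, 1984, 1624),
    "grok-voice-think-fast-1.0": (95.3, 2336, 4800, 2753),
    "gpt-realtime-1.5": (93.3, 1152, 2304, 2251),
    "gemini-3.1-flash-live-preview": (91.7, 1632, 5664, 3172),
    "gpt-realtime": (86.7, 1536, 4672, 2199),
    "gemini-live": (86.0, 2624, 30000, 4082),
    "nova-2-sonic": (None, 1280, 3232, 1689),  # pass rate not reported
}

# Fastest tool-call turn: smallest tool V2V mean.
fastest_tool = min(RESULTS, key=lambda m: RESULTS[m][3])

# Models sharing the lowest non-tool median.
best_median = min(row[1] for row in RESULTS.values())
tied_fastest = sorted(m for m, row in RESULTS.items() if row[1] == best_median)

# Pass-rate ranking over the models that reported one.
ranked = sorted(
    (m for m, row in RESULTS.items() if row[0] is not None),
    key=lambda m: RESULTS[m][0],
    reverse=True,
)
hydra_rank = ranked.index("Hydra") + 1
gap_pp = round(RESULTS[ranked[0]][0] - RESULTS["Hydra"][0], 1)
```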

Hydra is evaluated on voice-agent axes — voice-to-voice latency, turn-taking accuracy, barge-in handling, and tool-call reliability under realistic conditions. Generic LLM benchmarks (MMLU, IFEval) target a different objective and aren’t the right yardstick for a realtime voice model.

Operational metrics

Metric	Value
Idle timeout	~30 s with no traffic from either side. Keep streaming audio (silence frames are fine) to hold the connection.

Supported Languages

Hydra currently supports English only. Additional languages are on the roadmap.

Language	ISO code	Status
English	`en`	✅ Production

Voices

Six voice IDs are accepted at session.configure.session.voice. The voice is frozen at handshake and cannot be changed for the rest of the session.

Voice ID	Notes
`wren`	Default (server-side default if `voice` is omitted).
`sloane`	—
`marlowe`	—
`reed`	—
`knox`	—
`tate`	—
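Because the voice cannot be changed after the handshake, it can help to validate it client-side before connecting. The following is a sketch only: the envelope (`type: "session.configure"` with a nested `session` object) is inferred from the `session.configure.session.voice` path, and placing `generate_initial_response` inside `session` is an assumption.

```python
import json

VOICES = ("wren", "sloane", "marlowe", "reed", "knox", "tate")


def build_session_configure(voice=None, generate_initial_response=False):
    """Return a session.configure text frame; omit `voice` to use the server default.

    Assumption: both fields live under a top-level "session" object.
    """
    session = {"generate_initial_response": bool(generate_initial_response)}
    if voice is not None:
        if voice not in VOICES:
            raise ValueError(f"unknown voice {voice!r}; expected one of {VOICES}")
        session["voice"] = voice
    return json.dumps({"type": "session.configure", "session": session})


configure_frame = build_session_configure("sloane", generate_initial_response=True)
```

Leaving `voice` out of the payload relies on the server-side default (`wren`) listed in the table.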

API Reference

Endpoint	Method	Use case
`wss://api.smallest.ai/waves/v1/s2s?model=hydra`	WebSocket	Realtime full-duplex speech-to-speech

See Hydra (Realtime / WebSocket) for the full event schema. The documentation hub covers the event catalog, session config, tool calling, interruption handling, and errors end-to-end.

Throughput, Latency & Pricing

Metric	Typical	Notes
Non-tool voice-to-voice median	864 ms	Tied-fastest of 9 models on AIEWF S2S `aiwf_medium_context`.
Tool voice-to-voice mean	1624 ms	Fastest of 9 models on the same benchmark.
Pass rate	95.9%	#3 of 8 on AIEWF S2S, within ~2 pp of the leader.

One voice session per WebSocket connection. Concurrency follows your plan’s WebSocket pool. Excess connections receive an error with code: "server_full", followed by close code 1013; back off with jitter and retry.
Idle timeout: ~30 s with no traffic from either side. Keep streaming audio (silence frames are fine) to hold the connection.
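One common way to "back off with jitter" is full-jitter exponential backoff, sketched below. The base delay, cap, and helper names are illustrative choices rather than values recommended by Smallest AI; only the `server_full` code and close code 1013 come from this page.

```python
import random

SERVER_FULL_CODE = "server_full"
SERVER_FULL_CLOSE_CODE = 1013


def is_server_full(error_code=None, close_code=None):
    """True when either the error code or the close code signals a full pool."""
    return error_code == SERVER_FULL_CODE or close_code == SERVER_FULL_CLOSE_CODE


def backoff_delay(attempt, base_s=0.5, cap_s=30.0, rng=random.random):
    """Full-jitter backoff: a uniform delay in [0, min(cap_s, base_s * 2**attempt)).

    `attempt` starts at 0. `rng` is injectable so the delay can be tested.
    """
    ceiling = min(cap_s, base_s * (2 ** attempt))
    return rng() * ceiling


# Upper bounds for the first five retries: 0.5, 1, 2, 4, 8 seconds.
ceilings = [backoff_delay(n, rng=lambda: 1.0) for n in range(5)]
```

Randomising the whole interval, rather than adding a small jitter to a fixed delay, spreads reconnect attempts out when many clients are rejected at once.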

Pricing: Contact your Smallest AI account manager. Hydra is billed by session minute; usage per turn is reported on response.done when available.

Best Practices

Keep the socket warm. Stream silence frames during pauses rather than letting the ~30 s idle timer fire.
Handle code: "server_full" with jittered backoff. Capacity is set by your plan’s WebSocket pool; surface a “please retry” UX rather than a hard error to the user.
Mint short-lived tokens for browser clients. Don’t embed the long-lived SMALLEST_API_KEY in client-side code — mint a session token server-side.
Live-patch tools via session.update rather than reconnecting. Reconnects pay handshake cost; session.update does not.
For compliance transcripts, mirror the PCM through Pulse STT after the session — Hydra itself does not emit transcripts on the wire.
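For the "keep the socket warm" practice, a silence frame is just zero-valued PCM16 at the input rate, base64-encoded into an `input_audio_buffer.append` message. In this sketch the 20 ms frame length is an arbitrary choice and the `audio` field name is an assumption; the event type, sample rate, and encoding follow the input-audio format documented on this page.

```python
import base64
import json

INPUT_RATE_HZ = 16_000   # input audio is 16 kHz mono
BYTES_PER_SAMPLE = 2     # PCM16


def silence_frame(duration_ms=20):
    """Return an input_audio_buffer.append text frame containing digital silence.

    Assumption: the base64 payload is carried in a field named "audio".
    """
    n_samples = INPUT_RATE_HZ * duration_ms // 1000
    pcm = b"\x00" * (n_samples * BYTES_PER_SAMPLE)  # all-zero samples
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm).decode("ascii"),
    })


frame_20ms = silence_frame(20)
```

Sending such frames on a timer while the microphone is muted keeps traffic flowing well inside the idle window.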

Technical Specifications

Specification	Details
Endpoint	`wss://api.smallest.ai/waves/v1/s2s?model=hydra&api_key=<KEY>`
Frame format	JSON (UTF-8 text frames). No binary frames.
Authentication	`api_key` query parameter (browser clients should mint a short-lived token server-side)
Input audio	PCM16 signed little-endian, 16 kHz, mono, base64 in `input_audio_buffer.append`
Output audio	PCM16 signed little-endian, 48 kHz, mono, base64 in `response.output_audio.delta`
Close codes	`1000` normal · `1013` server full · auth failures are HTTP 401 during the WS handshake (no close code)
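Since input and output use different sample rates, it is easy to mis-time playback. The sketch below decodes one `response.output_audio.delta` event and computes how much audio it holds. The `delta` field name for the base64 payload is an assumption; the event type and the 48 kHz PCM16 mono format come from the table above.

```python
import base64

OUTPUT_RATE_HZ = 48_000  # output audio is 48 kHz mono
BYTES_PER_SAMPLE = 2     # PCM16


def decode_output_delta(event: dict) -> bytes:
    """Return the raw PCM16 bytes carried by a response.output_audio.delta event.

    Assumption: the base64 audio lives in a field named "delta".
    """
    if event.get("type") != "response.output_audio.delta":
        raise ValueError("not an output audio delta event")
    return base64.b64decode(event["delta"])


def pcm16_duration_ms(pcm: bytes, rate_hz: int = OUTPUT_RATE_HZ) -> float:
    """Duration of mono PCM16 audio in milliseconds."""
    return len(pcm) * 1000 / (BYTES_PER_SAMPLE * rate_hz)


# Synthetic event holding 480 silent samples, i.e. 10 ms at 48 kHz.
example = {
    "type": "response.output_audio.delta",
    "delta": base64.b64encode(b"\x00" * 960).decode("ascii"),
}
chunk = decode_output_delta(example)
```

The same byte count at the 16 kHz input rate would represent three times as much audio, which is why the playback path should be configured for 48 kHz explicitly.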

Use Cases

Direct Use

Realtime voice assistants — companion apps, concierge bots, in-app tutors.
Phone agents — restaurant reservations, banking concierges, customer support.
Voice copilots embedded in web and mobile apps.
Accessibility — voice-first interfaces for visually impaired users.
Voice-controlled IoT and games — kiosks, in-car assistants, gaming companions.

Downstream Use

Conversational analytics over recorded phone-call audio (transcribe the captured audio with Pulse STT afterwards).
Multi-agent voice systems where Hydra is one specialised speaker.
Hybrid voice + text agents where a text fallback is needed for compliance.

Safety & Compliance

Hydra is intended for voice-agent and conversational workloads. Customers building user-facing applications should layer their own content moderation, prompt-injection defenses, and PII handling appropriate to their domain. Hydra does not currently apply content moderation server-side — outputs reflect the model’s training and the prompts you provide.

For voice-agent applications handling regulated content (financial, healthcare), the standard pattern applies: keep PII out of prompts where practical, apply post-processing redaction on outputs, and — if you need a transcript for compliance — transcribe the PCM you sent/received via the Pulse STT API and store that transcript with your moderation log.
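To hand captured audio to a transcription service afterwards, the raw PCM usually needs a container. A minimal sketch using Python's standard `wave` module wraps mono PCM16 bytes in a WAV file; the upload to Pulse STT is deliberately left out because its request format is not described here.

```python
import io
import wave


def pcm16_to_wav(pcm: bytes, sample_rate_hz: int) -> bytes:
    """Wrap mono PCM16 little-endian samples in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 16-bit samples
        wav.setframerate(sample_rate_hz)
        wav.writeframes(pcm)
    return buf.getvalue()


# Example: 10 ms of silence at the 48 kHz output rate (480 samples).
wav_bytes = pcm16_to_wav(b"\x00" * 960, 48_000)
```

Use 16000 for audio you sent and 48000 for audio you received, and keep the two streams in separate files so the rates do not get mixed.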

For compliance documentation (GDPR, SOC 2, HIPAA), contact support@smallest.ai.

Support

Email: support@smallest.ai
Community: Discord
Documentation: docs.smallest.ai/waves
Console: app.smallest.ai/dashboard