
Hydra


Hydra is Smallest AI’s speech-to-speech model. A single WebSocket carries microphone audio from your client to the model and streams synthesised response audio back. There is no STT → LLM → TTS pipeline in the middle, and no transcript on the wire.

  • Audio in, audio out: one model, one socket, no glue code.
  • Full-duplex: barge-in is handled by the model, and in-flight responses cancel automatically.
  • Tool calling: standard JSON-schema function calls, executed on your side.
  • Bot speaks first: set generate_initial_response: true to have Hydra open with a greeting.

Model Overview

| Property | Value |
| --- | --- |
| Developed by | Smallest AI |
| Model type | Full-duplex speech-to-speech |
| API surface | WebSocket (wss://api.smallest.ai/waves/v1/s2s) |
| Model ID (query param) | model=hydra |
| Wire version | v1 |
| License | Proprietary, hosted API |

Key capabilities

  • 6 voices: wren, sloane, marlowe, reed, knox, tate.
  • Tool calling: JSON-schema tools, streamed arguments, client-side execution.
  • Mid-session updates: live-patch tools via session.update, without reconnecting.
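A mid-session tool update could be assembled roughly as follows. Only the session.update event name comes from this page; the payload layout (a session object holding a tools list, and the name / description / parameters keys on each tool) and the book_table tool itself are assumptions borrowed from common JSON-schema function-calling conventions, so confirm the exact shape against the event reference.

```python
import json


def build_tool_update(tools: list) -> str:
    """Serialise a session.update frame that replaces the tool list.

    ASSUMPTION: the {"session": {"tools": [...]}} nesting is hypothetical.
    """
    for tool in tools:
        # JSON-schema function parameters are conventionally an object schema.
        if tool.get("parameters", {}).get("type") != "object":
            raise ValueError(f"tool {tool.get('name')!r} needs an object schema")
    return json.dumps({"type": "session.update", "session": {"tools": tools}})


# Hypothetical tool, for illustration only.
BOOK_TABLE = {
    "name": "book_table",
    "description": "Reserve a restaurant table.",
    "parameters": {
        "type": "object",
        "properties": {
            "party_size": {"type": "integer"},
            "time": {"type": "string"},
        },
        "required": ["party_size", "time"],
    },
}

frame = build_tool_update([BOOK_TABLE])
```

The frame is a UTF-8 JSON string, matching the text-frame wire format; the tool itself still runs on your side once the model requests it.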

Audio formats

| Direction | Format | Sample rate | Channels | Encoding |
| --- | --- | --- | --- | --- |
| Input (client → server) | PCM16, signed little-endian | 16 kHz | mono | base64 inside input_audio_buffer.append |
| Output (server → client) | PCM16, signed little-endian | 48 kHz | mono | base64 inside response.output_audio.delta |
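The input and output encodings in the table can be sketched as a pair of helpers. The event names are from the table; the JSON field names audio and delta are assumptions that this page does not confirm.

```python
import base64
import json
import struct

INPUT_RATE_HZ = 16_000   # client -> server
OUTPUT_RATE_HZ = 48_000  # server -> client


def encode_append(samples: list) -> str:
    """Pack int16 samples as little-endian PCM16 and wrap them in an append frame."""
    pcm = struct.pack(f"<{len(samples)}h", *samples)
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm).decode("ascii"),  # ASSUMPTION: field name
    })


def decode_delta(frame: str) -> list:
    """Extract int16 samples from a response.output_audio.delta frame."""
    msg = json.loads(frame)
    pcm = base64.b64decode(msg["delta"])  # ASSUMPTION: field name
    return list(struct.unpack(f"<{len(pcm) // 2}h", pcm))
```

Note the asymmetry: audio you send is sampled at 16 kHz, while audio you receive is at 48 kHz, so the playback path must not reuse the capture sample rate.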

Languages

Hydra currently supports English only. Additional languages are on the roadmap.

| Language | ISO code | Status |
| --- | --- | --- |
| English | en | ✅ Production |

Voices

Six voice IDs are accepted at session.configure.session.voice. The voice is frozen at handshake.

| Voice ID | Notes |
| --- | --- |
| wren | Default (server-side default if voice is omitted). |
| sloane | - |
| marlowe | - |
| reed | - |
| knox | - |
| tate | - |
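A minimal sketch of selecting a voice at handshake. It assumes the first frame is a JSON object with type session.configure and a nested session.voice field, which is one reading of the session.configure.session.voice path above; the helper name is hypothetical.

```python
import json
from typing import Optional

VOICES = ("wren", "sloane", "marlowe", "reed", "knox", "tate")


def build_configure(voice: Optional[str] = None) -> str:
    """Build a session.configure frame; omit voice to get the server default (wren)."""
    session = {}
    if voice is not None:
        if voice not in VOICES:
            raise ValueError(f"unknown voice {voice!r}; expected one of {VOICES}")
        session["voice"] = voice
    # ASSUMPTION: the "type" / "session" envelope is inferred, not documented here.
    return json.dumps({"type": "session.configure", "session": session})
```

Validating the ID client-side is a convenience: because the voice is frozen at handshake, a typo would otherwise only surface after a connection has been spent.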

Performance & benchmarks

Hydra is benchmarked against eight other production-grade voice / realtime models. Full methodology and metric definitions live on the dedicated Performance and Metrics Overview pages.

AIEWF S2S — 10 runs × 30 turns, aiwf_medium_context

| Model | Pass rate | Non-tool V2V median | Non-tool V2V max | Tool V2V mean |
| --- | --- | --- | --- | --- |
| ultravox-v0.7 | 97.7 % | 864 ms | 1888 ms | 2406 ms |
| gpt-realtime-2 (low) | 96.0 % | 1728 ms | 4032 ms | 2005 ms |
| Hydra | 95.9 % | 864 ms | 1984 ms | 1624 ms |
| grok-voice-think-fast-1.0 | 95.3 % | 2336 ms | 4800 ms | 2753 ms |
| gpt-realtime-1.5 | 93.3 % | 1152 ms | 2304 ms | 2251 ms |
| gemini-3.1-flash-live-preview | 91.7 % | 1632 ms | 5664 ms | 3172 ms |
| gpt-realtime | 86.7 % | 1536 ms | 4672 ms | 2199 ms |
| gemini-live | 86.0 % | 2624 ms | 30000 ms | 4082 ms |
| nova-2-sonic | not reported | 1280 ms | 3232 ms | 1689 ms |

Reading the table:

  • Tool V2V mean latency: Hydra is the fastest of the 9 models at 1624 ms, ahead of nova-2-sonic (1689 ms), gpt-realtime-2 low (2005 ms), and ultravox-v0.7 (2406 ms).
  • Non-tool V2V median latency: Hydra ties with ultravox-v0.7 for fastest at 864 ms.
  • Pass rate: Hydra ranks third of the 8 models with a reported pass rate, 1.8 pp behind the leader (ultravox-v0.7 at 97.7 %). nova-2-sonic did not report a pass rate.
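The three readings can be re-derived from the table with a few lines of Python. The rows are copied from the table above; the rank helper is illustrative only, and it gives tied models the same rank.

```python
# (model, pass rate %, non-tool V2V median ms, tool V2V mean ms), from the table.
ROWS = [
    ("ultravox-v0.7", 97.7, 864, 2406),
    ("gpt-realtime-2 (low)", 96.0, 1728, 2005),
    ("Hydra", 95.9, 864, 1624),
    ("grok-voice-think-fast-1.0", 95.3, 2336, 2753),
    ("gpt-realtime-1.5", 93.3, 1152, 2251),
    ("gemini-3.1-flash-live-preview", 91.7, 1632, 3172),
    ("gpt-realtime", 86.7, 1536, 2199),
    ("gemini-live", 86.0, 2624, 4082),
    ("nova-2-sonic", None, 1280, 1689),  # pass rate not reported
]


def rank(model: str, column: int, higher_is_better: bool = False) -> tuple:
    """Return (rank, number of models with a value); ties share a rank."""
    values = {row[0]: row[column] for row in ROWS if row[column] is not None}
    mine = values[model]
    if higher_is_better:
        better = sum(v > mine for v in values.values())
    else:
        better = sum(v < mine for v in values.values())
    return better + 1, len(values)


print(rank("Hydra", 3))                         # tool V2V mean
print(rank("Hydra", 2))                         # non-tool V2V median
print(rank("Hydra", 1, higher_is_better=True))  # pass rate
```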

Latency numbers are computed from transcript.jsonl across all 10 runs (n = 224 non-tool turns, n = 64 tool turns). Pass rate is the fraction of turns that completed the expected interaction.
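As a rough illustration of that aggregation, the sketch below computes the same four statistics from JSONL turn records. The record fields (is_tool_turn, v2v_ms, passed) are hypothetical; this page does not document the actual transcript.jsonl schema.

```python
import json
import statistics


def summarise(lines) -> dict:
    """Aggregate per-turn records into the table's four columns.

    ASSUMPTION: each line is a JSON object with is_tool_turn (bool),
    v2v_ms (number) and passed (bool). These names are illustrative.
    """
    turns = [json.loads(line) for line in lines if line.strip()]
    non_tool = [t["v2v_ms"] for t in turns if not t["is_tool_turn"]]
    tool = [t["v2v_ms"] for t in turns if t["is_tool_turn"]]
    return {
        "pass_rate_pct": 100 * sum(t["passed"] for t in turns) / len(turns),
        "non_tool_median_ms": statistics.median(non_tool),
        "non_tool_max_ms": max(non_tool),
        "tool_mean_ms": statistics.mean(tool),
    }
```

In practice you would pass an open file handle for each run's transcript.jsonl, or chain the lines of all 10 runs together before summarising.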

Hydra is evaluated on voice-agent axes — voice-to-voice latency, turn-taking accuracy, barge-in handling, and tool-call reliability under realistic conditions. Generic LLM benchmarks (MMLU, IFEval) target a different objective and aren’t the right yardstick for a realtime voice model.

Operational metrics

| Metric | Value |
| --- | --- |
| Idle timeout | ~30 s with no traffic from either side. Keep streaming audio (silence frames are fine) to hold the connection. |
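One way to hold the connection open is to send short frames of digital silence while the user is not speaking. The sketch below builds such a frame in the input format (PCM16, 16 kHz, mono); the audio field name and the 20 ms frame length are assumptions.

```python
import base64
import json

INPUT_RATE_HZ = 16_000
BYTES_PER_SAMPLE = 2  # PCM16


def silence_frame(duration_ms: int = 20) -> str:
    """Return an input_audio_buffer.append frame containing all-zero samples."""
    n_samples = INPUT_RATE_HZ * duration_ms // 1000
    pcm = bytes(n_samples * BYTES_PER_SAMPLE)  # zero bytes are silence in signed PCM
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm).decode("ascii"),  # ASSUMPTION: field name
    })
```

Sending one of these at the same cadence as real microphone frames keeps traffic flowing well inside the ~30 s idle window.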

Capacity & rate limits

  • One voice session per WebSocket connection.
  • Concurrency follows your plan’s WebSocket pool. Excess connections receive an error with code: "server_full", followed by close code 1013. Back off with jitter and retry.
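A minimal sketch of the retry policy suggested above, using exponential backoff with full jitter. The base delay, cap, and attempt count are illustrative values, not documented limits.

```python
import random


def should_retry(close_code=None, error_code=None) -> bool:
    """Retry only on the capacity signals named above."""
    return close_code == 1013 or error_code == "server_full"


def backoff_delays(max_attempts: int = 5, base_s: float = 0.5,
                   cap_s: float = 30.0, rng=random.random):
    """Yield one delay per attempt, uniform in [0, min(cap_s, base_s * 2**attempt))."""
    for attempt in range(max_attempts):
        ceiling = min(cap_s, base_s * 2 ** attempt)
        yield rng() * ceiling
```

A caller would sleep for each yielded delay before reconnecting, and stop as soon as a connection is accepted. Full jitter spreads retries from many clients across the whole window, so they do not all reconnect at the same instant.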

For higher limits, contact your account manager.

Pricing

Contact your Smallest AI account manager for current pricing. Hydra is billed by session minute; usage per turn is reported on response.done when available.

Use cases

Direct use

  • Realtime voice assistants — companion apps, concierge bots, in-app tutors.
  • Phone agents — restaurant reservations, banking concierges, customer support.
  • Voice copilots embedded in web and mobile apps.
  • Accessibility — voice-first interfaces for visually-impaired users.
  • Voice-controlled IoT and games — kiosks, in-car assistants, gaming companions.

Downstream use

  • Conversational analytics over recorded phone-call audio (transcribe the captured audio with Pulse STT afterwards).
  • Multi-agent voice systems where Hydra is one specialised speaker.
  • Hybrid voice + text agents where a text fallback is needed for compliance.

Specs & API surface

| Property | Value |
| --- | --- |
| Endpoint | wss://api.smallest.ai/waves/v1/s2s?model=hydra&api_key=<KEY> |
| Frame format | JSON (UTF-8 text frames). No binary frames. |
| Authentication | api_key query parameter (browser clients should mint a short-lived token server-side). |
| Close codes | 1000 normal, 1013 server full. Auth failures are HTTP 401 during the WS handshake (no close code). |
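The endpoint row can be turned into a small URL builder. The base URL and parameter names are from the table; the helper name is hypothetical. Percent-encoding the key guards against characters that are not URL-safe.

```python
from urllib.parse import urlencode

BASE_URL = "wss://api.smallest.ai/waves/v1/s2s"


def build_url(api_key: str, model: str = "hydra") -> str:
    """Return the WebSocket URL with model and api_key as query parameters."""
    if not api_key:
        raise ValueError("api_key is required")
    return f"{BASE_URL}?{urlencode({'model': model, 'api_key': api_key})}"
```

Because the key travels in the URL, build this on a trusted server and avoid logging the full string.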

See the documentation hub for the full event catalog, session config, tool calling, interruption handling, and errors.

Safety & responsible use

Hydra is intended for voice-agent and conversational workloads. Customers building user-facing applications should layer their own content moderation, prompt-injection defenses, and PII handling appropriate to their domain. Hydra does not currently apply content moderation server-side — outputs reflect the model’s training and the prompts you provide.

For voice-agent applications handling regulated content (financial, healthcare), the standard pattern applies: keep PII out of prompts where practical, apply post-processing redaction on outputs, and — if you need a transcript for compliance — transcribe the PCM you sent/received via the Pulse STT API and store that transcript with your moderation log.
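For the compliance path described above, captured PCM usually needs a container before it can be handed to a transcription service. A minimal sketch, assuming you have accumulated the decoded output bytes (48 kHz mono PCM16) in memory; uploading to Pulse STT is out of scope here and the helper name is hypothetical.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate_hz: int) -> bytes:
    """Wrap raw little-endian PCM16 mono audio in a WAV container."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(2)           # 16-bit samples
        wav.setframerate(sample_rate_hz)
        wav.writeframes(pcm)
    return buf.getvalue()


# Hydra output is 48 kHz; audio you sent would use 16_000 instead.
wav_bytes = pcm_to_wav(bytes(960), 48_000)  # 10 ms of silence as a stand-in
```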

Related

Quickstart

Clone the reference client, paste your API key, and talk to Hydra in your browser.

Overview

Full event reference, tool calling, interruption, errors.