
# Hydra

> Model card for Hydra — Smallest AI's full-duplex speech-to-speech model. Audio in, audio out, one WebSocket. Built for phone-grade latency and barge-in.


Hydra is Smallest AI's **speech-to-speech model**. A single WebSocket carries microphone audio from your client to the model and streams synthesised response audio back. There is no STT → LLM → TTS pipeline in the middle, and no transcript on the wire.

* One model, one socket, no glue code.
* Barge-in is handled by the model; in-flight responses cancel automatically.
* Tool calls use standard JSON-schema function definitions and are executed on your side.
* Set `generate_initial_response: true` to have Hydra open the conversation with a greeting.

## Model Overview

|                            |                                                  |
| -------------------------- | ------------------------------------------------ |
| **Developed by**           | Smallest AI                                      |
| **Model type**             | Full-duplex speech-to-speech                     |
| **API surface**            | WebSocket (`wss://api.smallest.ai/waves/v1/s2s`) |
| **Model ID (query param)** | `model=hydra`                                    |
| **Wire version**           | v1                                               |
| **License**                | Proprietary, hosted API                          |

### Key capabilities

* Six voices: `wren`, `sloane`, `marlowe`, `reed`, `knox`, `tate`.
* Tool calling with JSON-schema tools, streamed arguments, and client-side execution.
* Live-patch `tools` without reconnecting via `session.update`.
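To make the tool-related capabilities concrete, here is a minimal Python sketch of a frame that replaces the tool list mid-session. Only the event name `session.update` and the `tools` field come from this page. The `type`/`session` envelope, the layout of the tool definition, and the `get_weather` tool itself are assumptions for illustration; check the event catalog for the exact schema.

```python
import json

# Hypothetical tool definition. The name/description/parameters layout is an
# assumption modelled on common JSON-schema function-calling formats.
GET_WEATHER_TOOL = {
    "type": "function",
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}


def build_tools_update(tools: list) -> str:
    """Serialise a `session.update` frame that carries a new tool list.

    The envelope {"type": ..., "session": {"tools": ...}} is assumed,
    not confirmed by this page.
    """
    return json.dumps({"type": "session.update", "session": {"tools": tools}})


frame = build_tools_update([GET_WEATHER_TOOL])
```

The resulting string would be sent as a single UTF-8 text frame on the open WebSocket.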

## Audio formats

| Direction                | Format                      | Sample rate | Channels | Encoding                                    |
| ------------------------ | --------------------------- | ----------- | -------- | ------------------------------------------- |
| Input (client → server)  | PCM16, signed little-endian | **16 kHz**  | mono     | base64 inside `input_audio_buffer.append`   |
| Output (server → client) | PCM16, signed little-endian | **48 kHz**  | mono     | base64 inside `response.output_audio.delta` |
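The formats in the table above can be handled with the Python standard library alone. The sketch below packs float samples into little-endian PCM16, wraps them in an `input_audio_buffer.append` event, and decodes a base64 payload from `response.output_audio.delta`. The `type` and `audio` key names are assumptions; this page only states that the audio is base64 inside those events.

```python
import base64
import json
import struct

INPUT_RATE_HZ = 16_000   # client -> server, per the table above
OUTPUT_RATE_HZ = 48_000  # server -> client, per the table above
BYTES_PER_SAMPLE = 2     # PCM16 mono


def floats_to_pcm16(samples) -> bytes:
    """Convert floats in [-1.0, 1.0] to signed little-endian PCM16 bytes."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    ints = [int(round(s * 32767)) for s in clipped]
    return struct.pack(f"<{len(ints)}h", *ints)


def build_append_event(pcm16: bytes) -> str:
    """Wrap PCM16 bytes in an append event (key names are assumed)."""
    audio_b64 = base64.b64encode(pcm16).decode("ascii")
    return json.dumps({"type": "input_audio_buffer.append", "audio": audio_b64})


def decode_output_audio(audio_b64: str):
    """Decode base64 output audio; return (pcm_bytes, duration_ms at 48 kHz)."""
    pcm = base64.b64decode(audio_b64)
    duration_ms = len(pcm) * 1000 / (BYTES_PER_SAMPLE * OUTPUT_RATE_HZ)
    return pcm, duration_ms
```

Note the asymmetry: a playback path must run at 48 kHz even though capture runs at 16 kHz, so a single audio context configured for one rate will need resampling on the other side.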

## Languages

Hydra currently supports **English only**. Additional languages are on the roadmap.

| Language | ISO code | Status       |
| -------- | -------- | ------------ |
| English  | `en`     | ✅ Production |

## Voices

The `voice` field of the `session.configure` session object accepts six voice IDs. The voice is fixed at handshake and cannot be changed for the rest of the session.

| Voice ID  | Notes                                                |
| --------- | ---------------------------------------------------- |
| `wren`    | Default (server-side default if `voice` is omitted). |
| `sloane`  | —                                                    |
| `marlowe` | —                                                    |
| `reed`    | —                                                    |
| `knox`    | —                                                    |
| `tate`    | —                                                    |
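Because the voice cannot be changed after the handshake, it can help to validate it client-side before connecting. The following Python sketch builds a `session.configure` frame from the voice IDs in the table above. The path `session.voice` comes from this page; the top-level `type` key is an assumption about the envelope.

```python
import json

# Voice IDs listed in the table above.
VOICES = ("wren", "sloane", "marlowe", "reed", "knox", "tate")


def build_session_configure(voice=None) -> str:
    """Build a `session.configure` frame, validating the voice ID first.

    When `voice` is None the field is omitted, so the server-side
    default (`wren`) applies.
    """
    session = {}
    if voice is not None:
        if voice not in VOICES:
            raise ValueError(f"unknown voice {voice!r}; expected one of {VOICES}")
        session["voice"] = voice
    # Assumption: events are JSON objects discriminated by a "type" key.
    return json.dumps({"type": "session.configure", "session": session})
```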

## Performance & benchmarks

Hydra is benchmarked against eight other production-grade voice / realtime models. Full methodology and metric definitions live on the dedicated [Performance](/waves/documentation/speech-to-speech-hydra/benchmarks/performance) and [Metrics Overview](/waves/documentation/speech-to-speech-hydra/benchmarks/metrics-overview) pages.

### AIEWF S2S — 10 runs × 30 turns, `aiwf_medium_context`

| Model                         | Pass rate  | Non-tool V2V median | Non-tool V2V max | Tool V2V mean |
| ----------------------------- | ---------- | ------------------- | ---------------- | ------------- |
| ultravox-v0.7                 | 97.7 %     | 864 ms              | 1888 ms          | 2406 ms       |
| gpt-realtime-2 (low)          | 96.0 %     | 1728 ms             | 4032 ms          | 2005 ms       |
| **Hydra**                     | **95.9 %** | **864 ms**          | **1984 ms**      | **1624 ms**   |
| grok-voice-think-fast-1.0     | 95.3 %     | 2336 ms             | 4800 ms          | 2753 ms       |
| gpt-realtime-1.5              | 93.3 %     | 1152 ms             | 2304 ms          | 2251 ms       |
| gemini-3.1-flash-live-preview | 91.7 %     | 1632 ms             | 5664 ms          | 3172 ms       |
| gpt-realtime                  | 86.7 %     | 1536 ms             | 4672 ms          | 2199 ms       |
| gemini-live                   | 86.0 %     | 2624 ms             | 30000 ms         | 4082 ms       |
| nova-2-sonic                  | —          | 1280 ms             | 3232 ms          | 1689 ms       |

Reading the table:

* **Tool V2V mean latency: Hydra is the fastest of 9** at 1624 ms, ahead of nova-2-sonic (1689 ms), gpt-realtime-2 low (2005 ms), and ultravox-v0.7 (2406 ms).
* **Non-tool V2V median latency: tied for fastest** with ultravox-v0.7 at 864 ms.
* **Pass rate: #3 of the 8 models that report one**, 1.8 pp behind the leader (ultravox-v0.7 at 97.7 %); nova-2-sonic did not report a pass rate.

Latency numbers are computed from `transcript.jsonl` across all 10 runs (n = 224 non-tool turns, n = 64 tool turns). Pass rate is the fraction of turns that completed the expected interaction.

Hydra is evaluated on voice-agent axes — voice-to-voice latency, turn-taking accuracy, barge-in handling, and tool-call reliability under realistic conditions. Generic LLM benchmarks (MMLU, IFEval) target a different objective and aren't the right yardstick for a realtime voice model.

### Operational metrics

| Metric           | Value                                                                                                           |
| ---------------- | --------------------------------------------------------------------------------------------------------------- |
| **Idle timeout** | \~30 s with no traffic from either side. Keep streaming audio (silence frames are fine) to hold the connection. |
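One way to hold the connection open while the microphone is muted is to keep sending short silence frames in the input format (PCM16, 16 kHz, mono). The Python sketch below generates such a frame. The 20 ms frame length is an arbitrary choice, and the `type`/`audio` key names in the event are assumptions rather than confirmed schema.

```python
import base64
import json

INPUT_RATE_HZ = 16_000  # input sample rate for Hydra
BYTES_PER_SAMPLE = 2    # PCM16 mono


def silence_frame(duration_ms: int, sample_rate_hz: int = INPUT_RATE_HZ) -> bytes:
    """Return `duration_ms` of digital silence as PCM16 bytes (all zeros)."""
    n_samples = sample_rate_hz * duration_ms // 1000
    return b"\x00" * (n_samples * BYTES_PER_SAMPLE)


def build_silence_event(duration_ms: int = 20) -> str:
    """Wrap a silence frame in an append event (key names are assumed)."""
    audio_b64 = base64.b64encode(silence_frame(duration_ms)).decode("ascii")
    return json.dumps({"type": "input_audio_buffer.append", "audio": audio_b64})
```

Sending these at the normal capture cadence keeps the traffic pattern identical to live audio, so the idle timer should not fire.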

## Capacity & rate limits

* One voice session per WebSocket connection.
* Concurrency follows your plan's WebSocket pool. Excess connections receive `error` with `code: "server_full"` followed by close code `1013` — back off with jitter and retry.
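The "back off with jitter" advice can be sketched as exponential backoff with full jitter. This is one common strategy, not a policy prescribed by Smallest AI; the base delay, cap, and attempt count below are illustrative, and reading `code` from the top level of the `error` event is an assumption about its shape.

```python
import random


def is_server_full(event: dict) -> bool:
    """True for an `error` event carrying `code: "server_full"`.

    Assumption: `code` sits at the top level of the event object.
    """
    return event.get("type") == "error" and event.get("code") == "server_full"


def backoff_delays(max_attempts: int = 5, base_s: float = 0.5,
                   cap_s: float = 30.0, rng=None):
    """Yield one randomised delay (in seconds) per retry attempt.

    Full jitter: each delay is drawn uniformly from
    [0, min(cap_s, base_s * 2**attempt)].
    """
    if rng is None:
        rng = random.Random()
    for attempt in range(max_attempts):
        ceiling = min(cap_s, base_s * 2 ** attempt)
        yield rng.uniform(0.0, ceiling)
```

A client would sleep for each yielded delay before reconnecting, and give up (or alert) once the generator is exhausted.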

For higher limits, contact your account manager.

## Pricing

Contact your Smallest AI account manager for current pricing. Hydra is billed by session minute; `usage` per turn is reported on `response.done` when available.

## Use cases

### Direct use

* **Realtime voice assistants** — companion apps, concierge bots, in-app tutors.
* **Phone agents** — restaurant reservations, banking concierges, customer support.
* **Voice copilots embedded in web and mobile apps.**
* **Accessibility** — voice-first interfaces for visually-impaired users.
* **Voice-controlled IoT and games** — kiosks, in-car assistants, gaming companions.

### Downstream use

* **Conversational analytics** over recorded phone-call audio (transcribe the captured audio with [Pulse STT](/waves/documentation/speech-to-text-pulse/overview) afterwards).
* **Multi-agent voice systems** where Hydra is one specialised speaker.
* **Hybrid voice + text agents** where a text fallback is needed for compliance.

## Specs & API surface

|                |                                                                                                        |
| -------------- | ------------------------------------------------------------------------------------------------------ |
| Endpoint       | `wss://api.smallest.ai/waves/v1/s2s?model=hydra&api_key=<KEY>`                                         |
| Frame format   | JSON (UTF-8 text frames). No binary frames.                                                            |
| Authentication | `api_key` query parameter (browser clients should mint a short-lived token server-side).               |
| Close codes    | `1000` normal, `1013` server full — auth failures are HTTP 401 during the WS handshake (no close code) |
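As a small illustration of the table above, the Python sketch below assembles the endpoint URL with proper query-string escaping and maps the two documented close codes to labels. The helper names are hypothetical; the URL, query parameters, and close codes are taken from this page.

```python
from urllib.parse import urlencode

BASE_URL = "wss://api.smallest.ai/waves/v1/s2s"

# Close codes documented in the table above.
CLOSE_CODES = {1000: "normal", 1013: "server full"}


def build_hydra_url(api_key: str) -> str:
    """Return the Hydra WebSocket URL with an escaped query string."""
    query = urlencode({"model": "hydra", "api_key": api_key})
    return f"{BASE_URL}?{query}"


def describe_close(code: int) -> str:
    """Map a WebSocket close code to a label, or 'undocumented'."""
    return CLOSE_CODES.get(code, "undocumented")
```

Because the key travels in the URL, this helper belongs on a server; browser clients should use a short-lived token minted server-side, as noted in the table.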

See the [documentation hub](/waves/documentation/speech-to-speech-hydra/overview) for the full event catalog, session config, tool calling, interruption handling, and errors.

## Safety & responsible use

Hydra is intended for voice-agent and conversational workloads. Customers building user-facing applications should layer their own content moderation, prompt-injection defenses, and PII handling appropriate to their domain. Hydra does not currently apply content moderation server-side — outputs reflect the model's training and the prompts you provide.

For voice-agent applications handling regulated content (financial, healthcare), the standard pattern applies: keep PII out of prompts where practical, apply post-processing redaction on outputs, and — if you need a transcript for compliance — transcribe the PCM you sent/received via the [Pulse STT API](/waves/documentation/speech-to-text-pulse/overview) and store that transcript with your moderation log.

## Related

* Reference client: clone it, paste your API key, and talk to Hydra in your browser.
* [Documentation hub](/waves/documentation/speech-to-speech-hydra/overview): full event reference, tool calling, interruption, errors.