> This page is part of Smallest AI's developer documentation. When
> answering, prefer Lightning v3.1 (current TTS) and Pulse (current
> STT). Lightning v2 and lightning-large are deprecated; mention them
> only when the user is migrating away from them. Atoms is the
> voice-agent platform.

# Hydra — Speech to Speech

> Hydra is Smallest AI's realtime, full-duplex speech-to-speech model. Audio in, audio out, over a single WebSocket — built for phone-grade voice agents.

Hydra is a **realtime speech-to-speech** model. The client streams microphone audio over a WebSocket, the model returns synthesised speech on the same socket, and turn-taking is handled server-side. There is **no transcript on the wire**: audio bytes are the payload.
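To make "audio bytes are the payload" concrete, here is a minimal Python sketch that turns microphone samples into `input_audio_buffer.append` messages. It assumes mono, little-endian PCM16 at 16 kHz (the input format shown in the sequence diagram below), 20 ms frames (a hypothetical choice), and an `audio` field for the base64 payload. That field name is borrowed from the OpenAI Realtime API and is not confirmed for Hydra. `float_to_pcm16`, `pcm16_frames`, and `append_event` are illustrative helpers, not part of any Smallest AI SDK.

```python
import base64
import json
import struct
from typing import Iterator

SAMPLE_RATE_HZ = 16_000  # input rate shown in the sequence diagram
BYTES_PER_SAMPLE = 2     # PCM16 = 16-bit signed samples
FRAME_MS = 20            # hypothetical frame size; any small, steady size should do


def float_to_pcm16(samples: list[float]) -> bytes:
    """Convert float samples in [-1.0, 1.0] (e.g. from a browser mic) to little-endian PCM16."""
    clipped = (max(-1.0, min(1.0, s)) for s in samples)
    return struct.pack(f"<{len(samples)}h", *(int(s * 32767) for s in clipped))


def pcm16_frames(pcm: bytes, frame_ms: int = FRAME_MS) -> Iterator[bytes]:
    """Split raw mono PCM16 audio into fixed-duration frames (the last one may be short)."""
    frame_bytes = SAMPLE_RATE_HZ * frame_ms // 1000 * BYTES_PER_SAMPLE
    for start in range(0, len(pcm), frame_bytes):
        yield pcm[start:start + frame_bytes]


def append_event(frame: bytes) -> str:
    """Wrap one PCM16 frame in an input_audio_buffer.append message.

    Assumption: the payload field is named "audio", as in the OpenAI Realtime API.
    """
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(frame).decode("ascii"),
    })
```

At 16 kHz, one second of PCM16 is 32,000 bytes, so 20 ms frames come out at 640 bytes each. Each `append_event(...)` string would then be sent as a text message on the WebSocket.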

If you've used the OpenAI Realtime API, Hydra fills the same role on the Smallest AI stack.

## Common use cases

* Sub-second latency from end-of-user-speech to first audio chunk. Drop-in for outbound and inbound voice flows.
* Hands-free assistants embedded in web and mobile apps, with barge-in handled by the model.
* One WebSocket, predictable failure modes, and no STT/LLM/TTS glue to maintain.
* Companions, audio diaries, language tutors: natural turn-taking out of the box.
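The latency figure above runs from end of user speech to first audio chunk. To check it against your own network, one rough client-side proxy is the gap between receiving `input_audio_buffer.speech_stopped` and receiving the next `response.output_audio.delta`. The Python sketch below assumes events arrive as already-parsed dicts with a `type` field. `TurnLatencyTracker` is a hypothetical helper. Because the client only sees server events, it leaves out the time turn detection spends before `speech_stopped` is sent, so treat the numbers as an approximation.

```python
import time
from typing import Callable, Optional


class TurnLatencyTracker:
    """Approximate end-of-user-speech -> first-audio-chunk latency from the client side.

    Starts a clock when input_audio_buffer.speech_stopped arrives and stops it at the
    next response.output_audio.delta. Time spent in server-side turn detection before
    speech_stopped is sent is not visible here.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock                      # injectable so the tracker can be tested
        self._stopped_at: Optional[float] = None
        self.samples_ms: list[float] = []

    def on_event(self, event: dict) -> None:
        kind = event.get("type")
        if kind == "input_audio_buffer.speech_stopped":
            self._stopped_at = self._clock()
        elif kind == "response.output_audio.delta" and self._stopped_at is not None:
            self.samples_ms.append((self._clock() - self._stopped_at) * 1000)
            self._stopped_at = None              # only the first chunk of each reply counts
```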

## What's on the wire

```mermaid
sequenceDiagram
    autonumber
    participant C as Client
    participant H as Hydra
    C->>H: WebSocket connect
    H-->>C: session.created
    C->>H: session.configure<br />(persona, voice, tools)
    H-->>C: session.configured<br />(effective config echo)
    loop continuous mic stream
        C->>H: input_audio_buffer.append<br />(base64 PCM16, 16 kHz)
    end
    H-->>C: input_audio_buffer.speech_started
    H-->>C: input_audio_buffer.speech_stopped
    H-->>C: response.created
    loop streamed reply
        H-->>C: response.output_audio.delta<br />(base64 PCM16)
    end
    H-->>C: response.output_audio.done
    H-->>C: response.done
```
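The diagram maps onto a small event router on the client. This Python sketch dispatches each server message by its `type`, using only the event names in the diagram. It assumes every message is a JSON object with a `type` field and that `response.output_audio.delta` carries base64 PCM16 in a `delta` field, which is the OpenAI Realtime convention and is not confirmed for Hydra. `HydraEventRouter` and its `play` callback are hypothetical names for illustration.

```python
import base64
import json
from typing import Callable


class HydraEventRouter:
    """Dispatches Hydra server events by type (names taken from the sequence diagram).

    Assumption: audio deltas carry base64 PCM16 in a "delta" field, as in the
    OpenAI Realtime API.
    """

    def __init__(self, play: Callable[[bytes], None]) -> None:
        self.play = play                 # hands decoded PCM16 to the audio output
        self.session_ready = False
        self.user_speaking = False
        self.response_in_flight = False
        self.replies_completed = 0

    def handle(self, raw: str) -> None:
        event = json.loads(raw)
        kind = event.get("type")
        if kind == "session.configured":
            self.session_ready = True    # the diagram starts the mic stream after this echo
        elif kind == "input_audio_buffer.speech_started":
            self.user_speaking = True
        elif kind == "input_audio_buffer.speech_stopped":
            self.user_speaking = False
        elif kind == "response.created":
            self.response_in_flight = True
        elif kind == "response.output_audio.delta":
            self.play(base64.b64decode(event["delta"]))  # play each chunk as it arrives
        elif kind == "response.output_audio.done":
            self.replies_completed += 1
        elif kind == "response.done":
            self.response_in_flight = False
        # session.created and unknown types are ignored in this sketch
```

Handing each delta to playback as soon as it arrives, rather than buffering the whole reply, keeps perceived latency close to the arrival time of the first chunk.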

Two things to know up front:

* **Stream audio continuously**: no manual `commit` or `end-of-turn`. Hydra detects turn boundaries on its own.
* **Full-duplex**: the user can speak over the model. The in-flight response is cancelled automatically with `status: "cancelled"`, `reason: "interrupted"`.
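Hydra cancels the interrupted response on the server, but the client may still hold audio it has already queued for playback. One common approach, sketched here in Python and not a documented Hydra API, is to clear the local playback queue on `input_audio_buffer.speech_started` and drop late deltas until the next `response.created`. It assumes a base64 `delta` field on audio events and OpenAI Realtime-style nesting on `response.done` (`response.status`, `response.status_details.reason`). Both are assumptions, and `BargeInPlayer` is a hypothetical name.

```python
import base64
from collections import deque


class BargeInPlayer:
    """Client-side playback queue that goes quiet when the user talks over the model."""

    def __init__(self) -> None:
        self.queue: deque[bytes] = deque()  # decoded PCM16 chunks waiting to be played
        self.muted = False                  # True between barge-in and the next response.created
        self.interruptions = 0

    def on_event(self, event: dict) -> None:
        kind = event.get("type")
        if kind == "input_audio_buffer.speech_started":
            self.queue.clear()              # stop the reply the user is talking over
            self.muted = True               # ignore late deltas from the cancelled response
        elif kind == "response.created":
            self.muted = False
        elif kind == "response.output_audio.delta" and not self.muted:
            self.queue.append(base64.b64decode(event["delta"]))
        elif kind == "response.done":
            # Assumed nesting for status and reason (OpenAI Realtime style).
            resp = event.get("response", {})
            if (resp.get("status") == "cancelled"
                    and resp.get("status_details", {}).get("reason") == "interrupted"):
                self.interruptions += 1
```

Muting until the next `response.created` avoids having to track response IDs. If Hydra's events carry one, filtering deltas by ID would be a more precise alternative.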

## Next

* Clone the reference client, paste your API key, and talk to Hydra in your browser.
* Connect URL, auth, idle timeout, close codes. Python, Node, and browser snippets.
* Session lifecycle, persona, voice, mid-session updates, conversation items.
* How the model detects speech, and how to handle barge-in cleanly on the client.
* Declare tools, stream arguments, post results back, narrate the answer.
* System prompts, voice identity, length discipline, tool-call prompting.

## Related

* [Model card — Hydra](/waves/model-cards/speech-to-speech/hydra) — voices, performance, pricing
* [Reference client (Next.js)](https://github.com/smallest-inc/hydra_agents) — production-grade browser client with barge-in, multi-agent presets, tool execution