Hydra (Realtime / WebSocket)

# Hydra Speech-to-Speech WebSocket

Hydra is Smallest AI's full-duplex speech-to-speech model. A single WebSocket carries microphone audio from your client to the model and streams synthesised response audio back.

**Audio in, audio out.** Hydra does not emit a transcript stream. If your application needs text, transcribe the PCM you sent or received using the Pulse STT API.

## When to use this

- **Use Hydra** when latency-to-voice matters above all else: phone agents, kiosks, in-car assistants, real-time tutors.
- **Use the Pulse → Electron → Lightning v3.1 stack** when you need explicit text in the middle (analytics, custom RAG, regulated content moderation, BYOM).
- **Use just Lightning v3.1** when you already have text and only need TTS.

## How it works

1. Connect: `wss://api.smallest.ai/waves/v1/s2s?model=hydra&api_key=<SMALLEST_API_KEY>`.
2. Receive `session.created`, then send `session.configure` with persona, voice, and optional tools.
3. Stream `input_audio_buffer.append` continuously while the mic is open, even while the model is speaking. Hydra detects turn boundaries on its own.
4. Receive `response.output_audio.delta` chunks (base64 PCM16) and queue them for playback at 48 kHz.
5. Handle barge-in: if `response.created` arrives before the previous response's `response.output_audio.done`, drop any still-scheduled audio buffers from the previous response.

## Audio formats

- **Input** (client → server): PCM16, signed little-endian, mono, **16 kHz**, base64-encoded inside `input_audio_buffer.append`. Recommended chunk size is 20–40 ms (320–640 samples, i.e. 640–1280 bytes).
- **Output** (server → client): PCM16, signed little-endian, mono, **48 kHz**. Each chunk arrives base64-encoded inside `response.output_audio.delta`.

## Voices

Currently supported: `wren`, `sloane`, `marlowe`, `reed`, `knox`, `tate`.

## Idle timeout

The server closes the connection after ~30 seconds with no traffic from either side. Reconnect to resume.
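
As a rough illustration of steps 3 to 5 under "How it works", the sketch below frames microphone PCM into `input_audio_buffer.append` messages and drops queued playback on barge-in. It performs no network I/O. The event type names come from this page, but the JSON field names `audio` and `delta` are assumptions, and `frame_bytes`, `append_events`, and `PlaybackQueue` are hypothetical helpers; check the full protocol reference for the real message schema.

```python
import base64
import json

INPUT_RATE_HZ = 16_000   # input sample rate from "Audio formats"
BYTES_PER_SAMPLE = 2     # PCM16 is two bytes per mono sample


def frame_bytes(ms: int, rate_hz: int = INPUT_RATE_HZ) -> int:
    """Size in bytes of `ms` milliseconds of mono PCM16 audio."""
    return rate_hz * ms // 1000 * BYTES_PER_SAMPLE


def append_events(pcm: bytes, ms: int = 20):
    """Yield one JSON message per frame of captured PCM16 audio."""
    size = frame_bytes(ms)  # 20 ms at 16 kHz is 320 samples, i.e. 640 bytes
    for start in range(0, len(pcm), size):
        chunk = pcm[start:start + size]
        yield json.dumps({
            "type": "input_audio_buffer.append",
            # ASSUMPTION: the base64 payload lives in a field named "audio".
            "audio": base64.b64encode(chunk).decode("ascii"),
        })


class PlaybackQueue:
    """Holds decoded output audio and discards it on barge-in."""

    def __init__(self) -> None:
        self.chunks: list[bytes] = []   # PCM16 at 48 kHz awaiting playback
        self.response_open = False      # True until output_audio.done arrives

    def handle(self, event: dict) -> None:
        kind = event.get("type")
        if kind == "response.created":
            if self.response_open:
                # Barge-in: a new response started before the previous one
                # finished, so drop whatever is still scheduled.
                self.chunks.clear()
            self.response_open = True
        elif kind == "response.output_audio.delta":
            # ASSUMPTION: the base64 payload lives in a field named "delta".
            self.chunks.append(base64.b64decode(event["delta"]))
        elif kind == "response.output_audio.done":
            self.response_open = False
```

In a real client, each string yielded by `append_events` would be sent over the WebSocket, and every received message would be parsed with `json.loads` and passed to `PlaybackQueue.handle` before the audio callback drains `chunks`.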
## Connection rejection and close codes

- A bad `api_key` is rejected during the WebSocket *handshake* with **HTTP 401**. The upgrade never completes, so no `error` event or close code lands on the wire.
- `1000`: normal close (including the ~30 s server-side idle timeout).
- `1013`: server at capacity. An `error` event with `code: "server_full"` precedes the close frame; back off with jitter and retry.

See the [Hydra realtime guide](/waves/documentation/speech-to-speech-hydra/overview) for the full protocol reference, tool-calling pattern, interruption handling, and complete Python + browser examples.
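
One possible way to turn the rejection and close codes above into a reconnect policy is sketched here. The mapping of 401, `1000`, and `1013` follows this page; everything else is an assumption made for illustration: the helper names `backoff_delay` and `next_action`, the `base` and `cap` values, the choice of full-jitter exponential backoff, and the handling of close codes this page does not list.

```python
import random


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng=random.random) -> float:
    """Full-jitter backoff: a random delay in [0, min(cap, base * 2**attempt)).

    ASSUMPTION: base and cap are illustrative values, not documented limits.
    """
    return rng() * min(cap, base * (2 ** attempt))


def next_action(http_status=None, close_code=None, attempt: int = 0):
    """Return (action, delay_seconds) for a failed or closed connection."""
    if http_status == 401:
        # Handshake rejected: retrying with the same key cannot succeed.
        return ("fail", 0.0)
    if close_code == 1013:
        # Server at capacity: back off with jitter, then retry.
        return ("retry", backoff_delay(attempt))
    if close_code == 1000:
        # Normal close, including the idle timeout: reconnect when needed.
        return ("reconnect", 0.0)
    # ASSUMPTION: any other close code is treated like a transient failure.
    return ("retry", backoff_delay(attempt))
```

A caller would increment `attempt` on each consecutive `"retry"` and reset it to zero once a session has been established again.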