Hydra — full-duplex speech-to-speech model
Hydra, Smallest AI’s in-house speech-to-speech model, is now live. A single WebSocket carries microphone audio from your client to the model and streams synthesised response audio back — no STT → LLM → TTS pipeline in the middle.
What’s in this launch:
- Single WebSocket endpoint at wss://api.smallest.ai/waves/v1/s2s?model=hydra&api_key=.... JSON text frames only; no binary frames.
- Full-duplex with server-side VAD: stream input_audio_buffer.append continuously while the mic is open, and Hydra detects turn boundaries on its own. If the user speaks while the model is responding, the in-flight response is cancelled (response.done with reason: "interrupted") and a fresh turn begins.
- Six voices: wren, sloane, marlowe, reed, knox, tate.
- Tool calling: declare JSON-schema tools in session.configure. Hydra streams the call arguments via response.function_call_arguments.delta; your client executes the tool and posts the result back via conversation.item.create, then response.create.
- Bot speaks first: set generate_initial_response: true on session.configure for greetings and concierge openers.
- Mid-session updates: live-patch tools via session.update without reconnecting.
- Audio formats: input is PCM16 mono 16 kHz; output is PCM16 mono 48 kHz.
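A minimal Python sketch of the client side of the wire format. Since there are no binary frames, mic audio has to travel inside each input_audio_buffer.append JSON frame; the base64 encoding and the "audio" field name are assumptions, and the helpers hydra_url, chunk_bytes, append_frame and frames_from_pcm are hypothetical. The sketch also works through the PCM16 arithmetic: 20 ms of 16 kHz mono input is 320 samples at 2 bytes each, or 640 bytes.

```python
import base64
import json
from urllib.parse import urlencode

# Endpoint from the launch notes; the API key is supplied by the caller.
BASE_URL = "wss://api.smallest.ai/waves/v1/s2s"

INPUT_RATE_HZ = 16_000  # input audio: PCM16 mono 16 kHz
BYTES_PER_SAMPLE = 2    # PCM16 = 2 bytes per sample, mono = 1 channel


def hydra_url(api_key: str) -> str:
    """Build the WebSocket URL with the model and api_key query parameters."""
    return f"{BASE_URL}?{urlencode({'model': 'hydra', 'api_key': api_key})}"


def chunk_bytes(ms: int, rate_hz: int = INPUT_RATE_HZ) -> int:
    """Byte size of `ms` milliseconds of PCM16 mono audio at `rate_hz`."""
    return rate_hz * ms // 1000 * BYTES_PER_SAMPLE


def append_frame(pcm16: bytes) -> str:
    """Wrap raw PCM16 bytes in a JSON text frame.

    ASSUMPTION: audio is base64-encoded in an "audio" field; check the
    Overview event reference for the real field name.
    """
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm16).decode("ascii"),
    })


def frames_from_pcm(pcm16: bytes, ms: int = 20):
    """Split a PCM16 buffer into fixed-duration append frames (last may be short)."""
    size = chunk_bytes(ms)
    for start in range(0, len(pcm16), size):
        yield append_frame(pcm16[start:start + size])
```

The same chunk_bytes helper gives 1920 bytes for 20 ms of 48 kHz output, a convenient unit for sizing a playback buffer. Each frame is then sent as an ordinary text frame with whatever WebSocket library the client uses.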
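Session setup can be sketched as two small frame builders. Only the event names (session.configure, session.update), the generate_initial_response flag and the six voice names come from the notes; the nesting under a "session" key, the tool declaration layout and the lookup_order example tool are hypothetical and should be checked against the Overview.

```python
import json

VOICES = {"wren", "sloane", "marlowe", "reed", "knox", "tate"}


def session_configure(voice: str, tools: list, greet: bool = False) -> str:
    """Build a session.configure frame.

    ASSUMPTION: the "session" wrapper and the "voice"/"tools" keys are guesses;
    only the event name and generate_initial_response come from the notes.
    """
    if voice not in VOICES:
        raise ValueError(f"unknown voice: {voice!r}")
    return json.dumps({
        "type": "session.configure",
        "session": {
            "voice": voice,
            "tools": tools,
            "generate_initial_response": greet,  # True: the bot speaks first
        },
    })


def session_update_tools(tools: list) -> str:
    """Live-patch the tool list on an open session (session.update)."""
    return json.dumps({"type": "session.update", "session": {"tools": tools}})


# Hypothetical JSON-schema tool declaration, for illustration only.
LOOKUP_ORDER = {
    "type": "function",
    "name": "lookup_order",
    "description": "Fetch the status of an order by its ID.",
    "parameters": {
        "type": "object",
        "properties": {"order_id": {"type": "string"}},
        "required": ["order_id"],
    },
}
```

Sending a session_update_tools(...) frame mid-call swaps the tool list in place, so the WebSocket and the audio stream stay up.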
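Because turn detection runs on the server, the client only has to react to events. The hypothetical PlaybackState class below queues output audio and drops anything not yet played when response.done arrives with reason "interrupted". The "response.audio.delta" event name, its base64 "delta" field and the top-level placement of "reason" are assumptions; only response.done and the "interrupted" reason come from the notes.

```python
import base64
import json
from collections import deque


class PlaybackState:
    """Tracks queued output audio and clears it when a turn is interrupted.

    ASSUMPTIONS: audio arrives in "response.audio.delta" events with a base64
    "delta" field, and response.done carries "reason" at the top level.
    """

    def __init__(self) -> None:
        self.queue = deque()  # PCM16 mono 48 kHz chunks awaiting playback
        self.interruptions = 0

    def handle(self, frame: str) -> None:
        event = json.loads(frame)
        kind = event.get("type")
        if kind == "response.audio.delta":
            self.queue.append(base64.b64decode(event["delta"]))
        elif kind == "response.done" and event.get("reason") == "interrupted":
            # The user barged in: drop audio the speaker has not played yet,
            # otherwise the cancelled reply keeps talking over the new turn.
            self.queue.clear()
            self.interruptions += 1
```

Clearing the local queue matters because audio already buffered on the client would otherwise keep playing after the server has cancelled the response. Microphone frames keep flowing the whole time, so the client needs no VAD of its own.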
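The tool-calling round trip can be sketched as a small accumulator: argument fragments from response.function_call_arguments.delta are concatenated per call, parsed as JSON once complete, passed to a local function, and answered with conversation.item.create followed by response.create. The ToolRunner class, the "call_id"/"name"/"delta" fields and the "function_call_output" item type are assumptions modelled on common realtime APIs, not confirmed Hydra fields.

```python
import json
from collections import defaultdict
from typing import Callable


class ToolRunner:
    """Accumulates streamed tool-call arguments and builds the reply frames.

    ASSUMPTIONS: delta events carry "call_id", "name" and "delta" fields, and
    a result is posted as a "function_call_output" item. Only the three event
    names come from the launch notes.
    """

    def __init__(self, tools: dict[str, Callable[..., object]]) -> None:
        self.tools = tools                 # tool name -> local Python function
        self.buffers = defaultdict(list)   # call_id -> argument fragments
        self.names = {}                    # call_id -> tool name

    def on_delta(self, event: dict) -> None:
        """Record one response.function_call_arguments.delta event."""
        call_id = event["call_id"]
        self.buffers[call_id].append(event["delta"])
        if "name" in event:
            self.names[call_id] = event["name"]

    def finish(self, call_id: str) -> list[str]:
        """Run the tool once its arguments are complete; return frames to send."""
        args = json.loads("".join(self.buffers.pop(call_id)))
        result = self.tools[self.names.pop(call_id)](**args)
        item = {
            "type": "conversation.item.create",
            "item": {
                "type": "function_call_output",
                "call_id": call_id,
                "output": json.dumps(result),
            },
        }
        # response.create asks Hydra to start a new response using the result.
        return [json.dumps(item), json.dumps({"type": "response.create"})]
```

Call finish() when the server signals that a call's arguments are complete; check the Overview's event reference for the event that carries that signal.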
When to use Hydra vs the three-model stack:
- Use Hydra when latency-to-voice matters above all else — phone agents, kiosks, in-car assistants.
- Use Pulse → Electron → Lightning v3.1 when you need explicit text in the middle: analytics, custom retrieval, regulated content moderation, BYOM.
Docs:
- Quickstart — clone the reference client and talk to Hydra in your browser.
- Overview — full event reference, session config, tool calling, interruption, errors.
- Model Card — voices, performance, pricing.
- Reference client (Next.js) — production-grade browser client with live wire-log, multi-agent presets, tool execution.

