Hydra — full-duplex speech-to-speech model

Hydra, Smallest AI’s in-house speech-to-speech model, is now live. A single WebSocket carries microphone audio from your client to the model and streams synthesised response audio back — no STT → LLM → TTS pipeline in the middle.

import asyncio, base64, json, os, wave
import websockets

API_KEY = os.environ["SMALLEST_API_KEY"]
URL = f"wss://api.smallest.ai/waves/v1/s2s?model=hydra&api_key={API_KEY}"

async def main():
    async with websockets.connect(URL, max_size=None) as ws:
        async for raw in ws:
            evt = json.loads(raw)
            if evt["type"] == "session.created":
                await ws.send(json.dumps({
                    "type": "session.configure",
                    "session": {"instructions": "Be brief.", "voice": "wren"},
                }))
            elif evt["type"] == "response.output_audio.delta":
                ...  # decode base64 PCM16 and play

asyncio.run(main())
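
The placeholder in the response.output_audio.delta branch can be filled in many ways. As a minimal sketch, the helper below decodes one delta and appends it to a WAV file instead of playing it. It assumes the base64 payload sits in a field named `delta`, which is a guess; check the event reference for the actual key. The sample rate follows the output format stated for this launch (PCM16 mono, 48 kHz). `open_wav` and `write_delta` are illustrative names, not part of any SDK.

```python
import base64
import wave

OUTPUT_RATE_HZ = 48_000  # stated output format: PCM16 mono 48 kHz


def open_wav(path: str) -> wave.Wave_write:
    """Open a WAV file configured for the stated output format."""
    wav = wave.open(path, "wb")
    wav.setnchannels(1)   # mono
    wav.setsampwidth(2)   # 16-bit samples = 2 bytes each
    wav.setframerate(OUTPUT_RATE_HZ)
    return wav


def write_delta(wav: wave.Wave_write, evt: dict, field: str = "delta") -> int:
    """Decode one audio delta event and append it; return bytes written.

    ASSUMPTION: the base64 audio lives under `field`. The key name is not
    given in the announcement.
    """
    pcm = base64.b64decode(evt[field])
    wav.writeframes(pcm)
    return len(pcm)
```

One possible wiring: call `open_wav("reply.wav")` before connecting, call `write_delta(wav, evt)` in the delta branch, and call `wav.close()` when the session ends. Writing to disk avoids a dependency on an audio playback library, which makes this a convenient first check that audio is arriving intact.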

What’s in this launch:

  • Single WebSocket endpoint at wss://api.smallest.ai/waves/v1/s2s?model=hydra&api_key=... All messages are JSON text frames; there are no binary frames.
  • Full-duplex with server-side VAD — stream input_audio_buffer.append continuously while the mic is open. Hydra detects turn boundaries on its own. If the user speaks while the model is responding, the in-flight response is cancelled (response.done with reason: "interrupted") and a fresh turn begins.
  • Six voices: wren, sloane, marlowe, reed, knox, tate.
  • Tool calling: declare JSON-schema tools in session.configure. Hydra streams the call arguments via response.function_call_arguments.delta; your client executes the tool and posts the result back via conversation.item.create followed by response.create.
  • Bot speaks first: set generate_initial_response to true in session.configure for greetings and concierge openers.
  • Mid-session updates — live-patch tools via session.update without reconnecting.
  • Audio formats: input PCM16 mono 16 kHz; output PCM16 mono 48 kHz.
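
To make the message flow in the list above concrete, here is a rough sketch of builders for the client-to-server events it names. The event type strings, generate_initial_response, and the 16 kHz PCM16 input format come from this announcement. Every other field name (`tools` inside `session`, `audio`, `delta`, `item`, `function_call_output`, `call_id`, `output`) is an assumption borrowed from similar realtime APIs and should be checked against the event reference. The 20 ms frame size is an arbitrary choice, and all function and class names are hypothetical.

```python
import base64
import json

INPUT_RATE_HZ = 16_000   # stated input format: PCM16 mono 16 kHz
BYTES_PER_SAMPLE = 2


def configure(instructions: str, voice: str, tools: list, greet: bool = False) -> dict:
    """Build session.configure with tools and an optional bot-first greeting."""
    return {
        "type": "session.configure",
        # ASSUMPTION: tools and generate_initial_response sit inside "session".
        "session": {
            "instructions": instructions,
            "voice": voice,
            "tools": tools,
            "generate_initial_response": greet,
        },
    }


def audio_frames(pcm: bytes, frame_ms: int = 20):
    """Yield input_audio_buffer.append events, one per frame of mic audio."""
    step = INPUT_RATE_HZ * BYTES_PER_SAMPLE * frame_ms // 1000
    for start in range(0, len(pcm), step):
        chunk = pcm[start:start + step]
        # ASSUMPTION: the base64 payload key is "audio".
        yield {
            "type": "input_audio_buffer.append",
            "audio": base64.b64encode(chunk).decode("ascii"),
        }


class ToolCallBuffer:
    """Accumulate streamed argument fragments for a single tool call."""

    def __init__(self) -> None:
        self._parts = []

    def feed(self, evt: dict) -> None:
        # ASSUMPTION: each delta event carries its text fragment under "delta".
        if evt["type"] == "response.function_call_arguments.delta":
            self._parts.append(evt["delta"])

    def arguments(self) -> dict:
        """Parse the concatenated fragments as one JSON object."""
        return json.loads("".join(self._parts))


def tool_result(call_id: str, result: object) -> list:
    """Build the two events that post a tool result and request a reply."""
    return [
        {
            "type": "conversation.item.create",
            # ASSUMPTION: item shape follows common realtime-API conventions.
            "item": {
                "type": "function_call_output",
                "call_id": call_id,
                "output": json.dumps(result),
            },
        },
        {"type": "response.create"},
    ]
```

With an open connection `ws`, mic audio could be sent with `for msg in audio_frames(mic_bytes): await ws.send(json.dumps(msg))`. Because server-side VAD finds the turn boundaries, the client only needs to keep appending frames while the mic is open. Keeping the builders as pure functions that return dicts makes them easy to unit-test without a live socket.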

When to use Hydra vs the three-model stack:

  • Use Hydra when latency-to-voice matters above all else — phone agents, kiosks, in-car assistants.
  • Use Pulse → Electron → Lightning v3.1 when you need explicit text in the middle: analytics, custom retrieval, regulated content moderation, or bring-your-own-model (BYOM) setups.

Docs:

  • Quickstart — clone the reference client and talk to Hydra in your browser.
  • Overview — full event reference, session config, tool calling, interruption, errors.
  • Model Card — voices, performance, pricing.
  • Reference client (Next.js) — production-grade browser client with live wire-log, multi-agent presets, tool execution.