WebSocket connection | Smallest AI Docs

A Hydra session is a single WebSocket connection. Open it once, stream audio in real time, close it when the user is done.

Endpoint

wss://api.smallest.ai/waves/v1/s2s?model=hydra&api_key=<SMALLEST_API_KEY>

Parameter	Required	Notes
`model`	yes	Pass `hydra` (the only supported speech-to-speech model today). Setting it explicitly keeps your code stable when more models are added.
`api_key`	yes	Your `SMALLEST_API_KEY`. Treat as a secret. For browser apps, mint a short-lived token server-side rather than shipping a long-lived key.
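The query string can be assembled with the standard library so that keys containing reserved characters are percent-encoded. The sketch below is illustrative only: `BASE_URL` and `build_session_url` are hypothetical names, not part of any Smallest AI SDK.

```python
from urllib.parse import urlencode

# Endpoint from the section above, without the query string.
BASE_URL = "wss://api.smallest.ai/waves/v1/s2s"


def build_session_url(api_key: str, model: str = "hydra") -> str:
    """Return the session URL with percent-encoded query parameters."""
    if not api_key:
        raise ValueError("api_key is required")
    # urlencode escapes characters such as '+' and '&' inside the key.
    query = urlencode({"model": model, "api_key": api_key})
    return f"{BASE_URL}?{query}"
```

A typical call would be `build_session_url(os.environ["SMALLEST_API_KEY"])`. Passing `model` explicitly, even though it defaults to `hydra`, matches the advice in the table.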

All frames are JSON text frames (UTF-8). Binary frames are not used.
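Because every frame is JSON text, a client can decode incoming messages in one place and treat a binary frame as a protocol error. This is a minimal sketch; `parse_frame` is a hypothetical helper, and the check for a `type` field assumes every server frame carries one, as `session.created` does.

```python
import json


def parse_frame(raw) -> dict:
    """Decode one incoming frame into a dict, rejecting binary frames."""
    if isinstance(raw, (bytes, bytearray)):
        # The websockets library delivers text frames as str and binary as bytes.
        raise TypeError("unexpected binary frame; expected JSON text")
    event = json.loads(raw)
    # Assumption: every server frame is a JSON object with a "type" field.
    if not isinstance(event, dict) or "type" not in event:
        raise ValueError("frame is not a JSON object with a 'type' field")
    return event
```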

Connect

Python

import asyncio
import os

import websockets

URL = f"wss://api.smallest.ai/waves/v1/s2s?model=hydra&api_key={os.environ['SMALLEST_API_KEY']}"

async def main():
    async with websockets.connect(URL, max_size=None) as ws:
        async for raw in ws:
            print(raw)  # → session.created, then on to session.configure

asyncio.run(main())

Requires pip install websockets.

Authentication

Auth happens during the WebSocket handshake. If api_key is missing or invalid, the server returns HTTP 401 and the WebSocket never opens — no event frames, no close code.

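async def connect():
    try:
        async with websockets.connect(URL) as ws:  # URL as in the Connect example
            ...  # session logic goes here
    except websockets.exceptions.InvalidStatus as e:
        # Raised when the handshake is rejected, before the WebSocket opens.
        # Older websockets releases raise InvalidStatusCode instead.
        if e.response.status_code == 401:
            print("Bad API key")
        else:
            raise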

A valid api_key gives you ~30 seconds of idle tolerance per session — see Idle timeout below.

What you get back

The first frame after connect is always session.created:

{
  "type": "session.created",
  "event_id": "sv_588fd416b4a24fe3",
  "session_id": "4bf1f4f6-5fa4-4de5-aa29-83f5eed96c82"
}

The server now waits for one session.configure from you before accepting audio. Once configured, it echoes session.configured and starts accepting input_audio_buffer.append frames. See Managing sessions.
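The created, configure, configured exchange described above could be wrapped in a single coroutine. This is a sketch under stated assumptions: `open_session` is a hypothetical helper, the shape of the `session.configure` payload is not specified on this page (the code assumes the config fields sit at the top level next to `type`; Managing sessions has the actual schema), and a failed configure is assumed to surface as an `error` event.

```python
import asyncio
import json


async def open_session(ws, config: dict) -> str:
    """Run the created -> configure -> configured exchange; return session_id."""
    created = json.loads(await ws.recv())
    if created.get("type") != "session.created":
        raise RuntimeError(f"expected session.created, got {created.get('type')!r}")

    # Assumption: config fields are top-level siblings of "type".
    await ws.send(json.dumps({**config, "type": "session.configure"}))

    # Wait for the acknowledgement before any audio is sent.
    while True:
        event = json.loads(await ws.recv())
        if event.get("type") == "session.configured":
            return created["session_id"]
        if event.get("type") == "error":
            raise RuntimeError(f"configure failed: {event}")
```

The `ws` argument only needs async `recv()` and `send()` methods, which a connection from `websockets.connect` provides, so the exchange can be exercised against a stub in tests.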

Idle timeout

If neither side sends a frame for ~30 seconds, the server closes with code 1000 (normal close). A fresh connect starts a new session — there is no resume; replay any required state in the new session.configure.

Heartbeat with silence frames. If your client has a UX pause where the user might think for a while (a long deliberation prompt, a UI confirmation step), keep streaming input_audio_buffer.append with whatever the mic is producing — silence still counts as traffic and keeps the session alive. Stopping the mic stream and resuming it later is exactly the case the 30 s timeout will catch.
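If the microphone stream is genuinely unavailable during a pause, one option might be to synthesize silence and send it in its place. The sketch below builds such a frame; it is illustrative only. `silence_frame` is a hypothetical helper, and the `audio` field name, the base64 encoding, and the 16 kHz mono sample rate are all assumptions to check against Audio I/O.

```python
import base64
import json


def silence_frame(duration_ms: int = 100, sample_rate: int = 16000) -> str:
    """Build an input_audio_buffer.append frame carrying PCM16 mono silence."""
    num_samples = sample_rate * duration_ms // 1000
    pcm = b"\x00\x00" * num_samples  # 2 bytes per 16-bit sample, all zeros
    return json.dumps({
        "type": "input_audio_buffer.append",
        # Assumed field name and encoding; not confirmed by this page.
        "audio": base64.b64encode(pcm).decode("ascii"),
    })
```

Sending one such frame every few seconds with `await ws.send(silence_frame())` would stay well inside the ~30 second window, though continuing the real mic stream remains the simpler approach.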

Close codes

Code	Meaning	Action
`1000`	Normal close, including idle timeout	Reconnect if the user is still active
`1013`	Server at capacity. Preceded by an `error` event with `code: "server_full"`.	Back off with jitter; surface a queue message

Auth failures are not a close code — they’re HTTP 401 at handshake time, as noted above.
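The actions in the table can be expressed as a small policy function. This is one possible sketch, not a prescribed client behavior: `backoff_delay` and `reconnect_delay` are hypothetical names, the 0.5 s base and 30 s cap are arbitrary illustrative values, and the jitter scheme shown is "full jitter" (a uniform draw up to the exponential ceiling).

```python
import random


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0,
                  rng=random.random) -> float:
    """Full-jitter backoff: uniform in [0, min(cap, base * 2**attempt))."""
    return rng() * min(cap, base * (2 ** attempt))


def reconnect_delay(close_code: int, attempt: int, user_active: bool):
    """Return seconds to wait before reconnecting, or None to stay closed."""
    if close_code == 1000:
        # Normal close, including idle timeout: reconnect only if still in use.
        return 0.0 if user_active else None
    if close_code == 1013:
        # Server at capacity: back off with jitter before retrying.
        return backoff_delay(attempt)
    return None  # codes outside the table are left to the caller
```

A caller would increment `attempt` after each consecutive `1013` and reset it once a session is configured successfully. While waiting, the UI can show the queue message the table suggests.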

Managing sessions — session.configure, voices, persona, mid-session updates
Audio I/O — input PCM16, output rate negotiation, browser AudioWorklet
Errors & reconnection — full error catalog