WebSocket connection

A Hydra session is a single WebSocket connection. Open it once, stream audio in real time, close it when the user is done.

Endpoint

wss://api.smallest.ai/waves/v1/s2s?model=hydra&api_key=<SMALLEST_API_KEY>

| Parameter | Required | Notes |
| --- | --- | --- |
| model | yes | Pass hydra (the only supported speech-to-speech model today). Setting it explicitly keeps your code stable when more models are added. |
| api_key | yes | Your SMALLEST_API_KEY. Treat as a secret. For browser apps, mint a short-lived token server-side rather than shipping a long-lived key. |
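
As a small illustration of the two query parameters above, the endpoint URL can be assembled with the standard library so that the key is always percent-encoded. This is a sketch: `build_url` and `BASE_URL` are hypothetical names chosen for this example, not part of any SDK.

```python
from urllib.parse import urlencode

# Base endpoint taken from the section above; the helper name is illustrative.
BASE_URL = "wss://api.smallest.ai/waves/v1/s2s"


def build_url(api_key: str, model: str = "hydra") -> str:
    """Return the connection URL with both required query parameters set."""
    if not api_key:
        raise ValueError("api_key is required")
    # urlencode percent-encodes any characters that are unsafe in a query string.
    query = urlencode({"model": model, "api_key": api_key})
    return f"{BASE_URL}?{query}"
```

Because the resulting URL contains the key, it is worth keeping it out of logs and error messages.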

All frames are JSON text frames (UTF-8). Binary frames are not used.
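
Since every frame is a JSON text frame, a thin pair of helpers can keep encoding and decoding in one place. The sketch below assumes that every event carries a top-level `type` field, as the events shown on this page do; `encode_frame` and `decode_frame` are hypothetical helper names.

```python
import json


def encode_frame(event_type: str, **fields) -> str:
    """Serialize an event as a JSON string, which is sent as a text frame."""
    return json.dumps({"type": event_type, **fields})


def decode_frame(raw) -> dict:
    """Parse an incoming frame; reject binary frames, which are not used here."""
    if isinstance(raw, (bytes, bytearray)):
        raise TypeError("unexpected binary frame")
    event = json.loads(raw)
    # Assumption: every event is a JSON object with a top-level "type".
    if not isinstance(event, dict) or "type" not in event:
        raise ValueError("frame is not a typed JSON event")
    return event
```

With the `websockets` library, text frames arrive as `str` and binary frames as `bytes`, so the `isinstance` check is enough to tell them apart.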

Connect

import asyncio
import os

import websockets

URL = f"wss://api.smallest.ai/waves/v1/s2s?model=hydra&api_key={os.environ['SMALLEST_API_KEY']}"

async def main():
    async with websockets.connect(URL, max_size=None) as ws:
        async for raw in ws:
            print(raw)  # → session.created, then on to session.configure

asyncio.run(main())

Requires pip install websockets.

Authentication

Auth happens during the WebSocket handshake. If api_key is missing or invalid, the server returns HTTP 401 and the WebSocket never opens — no event frames, no close code.

try:
    async with websockets.connect(URL) as ws:
        ...
except websockets.exceptions.InvalidStatus as e:
    # The handshake was rejected, so no WebSocket was ever opened.
    if e.response.status_code == 401:
        print("Bad API key")

A valid api_key gives you ~30 seconds of idle tolerance per session — see Idle timeout below.

What you get back

The first frame after connect is always session.created:

{
  "type": "session.created",
  "event_id": "sv_588fd416b4a24fe3",
  "session_id": "4bf1f4f6-5fa4-4de5-aa29-83f5eed96c82"
}

The server now waits for one session.configure from you before accepting audio. Once configured, it echoes session.configured and starts accepting input_audio_buffer.append frames. See Managing sessions.
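
The ordering described above (session.created, then session.configure, then session.configured) can be sketched as a single coroutine. This is a minimal sketch under stated assumptions: the layout of the session.configure payload is not shown on this page, so the caller passes it in as `config` and it is placed at the top level of the frame; the function also assumes that session.configured is the very next frame the server sends. `configure_session` is a hypothetical name, and `ws` is any object with async `send` and `recv` methods, such as a `websockets` connection.

```python
import asyncio
import json


async def configure_session(ws, config: dict) -> str:
    """Run the opening exchange and return the session_id."""
    created = json.loads(await ws.recv())
    if created.get("type") != "session.created":
        raise RuntimeError(f"expected session.created, got {created.get('type')!r}")

    # Assumption: configuration fields sit at the top level of the frame.
    await ws.send(json.dumps({**config, "type": "session.configure"}))

    # Assumption: the echo is the next frame, with nothing in between.
    configured = json.loads(await ws.recv())
    if configured.get("type") != "session.configured":
        raise RuntimeError(f"expected session.configured, got {configured.get('type')!r}")

    return created["session_id"]
```

Only after this coroutine returns should the client start sending input_audio_buffer.append frames.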

Idle timeout

If neither side sends a frame for ~30 seconds, the server closes with code 1000 (normal close). A fresh connect starts a new session — there is no resume; replay any required state in the new session.configure.

Heartbeat with silence frames. If your client has a UX pause where the user might think for a while (a long deliberation prompt, a UI confirmation step), keep streaming input_audio_buffer.append with whatever the mic is producing — silence still counts as traffic and keeps the session alive. Stopping the mic stream and resuming it later is exactly the case the 30 s timeout will catch.
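
If the mic stream genuinely has to stop for a while, one option is to synthesize the silence yourself. The sketch below builds an input_audio_buffer.append frame containing zeroed samples. Everything about the payload is an assumption made for illustration: 16-bit mono PCM, a 16 kHz sample rate, and base64 audio in a field named `audio`. Check the audio format reference before relying on any of these.

```python
import base64
import json


def silence_append_frame(duration_ms: int = 20, sample_rate: int = 16000) -> str:
    """Build an input_audio_buffer.append frame holding digital silence."""
    n_samples = sample_rate * duration_ms // 1000
    # Assumption: 16-bit mono PCM, so each silent sample is two zero bytes.
    pcm = b"\x00\x00" * n_samples
    return json.dumps({
        "type": "input_audio_buffer.append",
        # Assumption: audio travels base64-encoded in an "audio" field.
        "audio": base64.b64encode(pcm).decode("ascii"),
    })
```

Sending one such frame every few seconds stays well inside the ~30 second window, though streaming real mic audio remains the simpler approach when it is available.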

Close codes

| Code | Meaning | Action |
| --- | --- | --- |
| 1000 | Normal close, including idle timeout | Reconnect if the user is still active |
| 1013 | Server at capacity. Preceded by an error event with code: "server_full". | Back off with jitter; surface a queue message |

Auth failures are not a close code — they’re HTTP 401 at handshake time, as noted above.
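
The actions in the close-code table can be sketched as a small decision function. This is an illustration, not a prescribed policy: `next_action` is a hypothetical name, the backoff uses exponential growth with full jitter, the `base` and `cap` values are arbitrary, and treating any other close code as "stop" is an assumption, since this page documents only 1000 and 1013.

```python
import random


def next_action(close_code: int, attempt: int, user_active: bool = True,
                base: float = 1.0, cap: float = 30.0) -> tuple[str, float]:
    """Map a close code to (action, delay_seconds)."""
    if close_code == 1013:
        # Server at capacity: wait a random time in [0, min(cap, base * 2**attempt)].
        delay = random.uniform(0.0, min(cap, base * 2 ** attempt))
        return ("retry_later", delay)
    if close_code == 1000:
        # Normal close or idle timeout: reconnect only if the user is still active.
        return ("reconnect", 0.0) if user_active else ("stop", 0.0)
    # Assumption: codes not listed in the table are not retried automatically.
    return ("stop", 0.0)
```

With the `websockets` library, the code should be readable from the connection's `close_code` attribute once the connection has closed. Every reconnect starts a new session, so session.configure has to be sent again.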
