Errors & reconnection | Smallest AI Docs

Errors arrive as JSON events with type: "error". The connection stays usable unless a close frame follows immediately.

Error frame

{
  "type": "error",
  "event_id": "sv_…",
  "error": {
    "type": "invalid_request_error",
    "code": "invalid_audio",
    "message": "audio frame too small (40 bytes, need 320)",
    "param": "audio",
    "event_id": "evt_…"
  }
}

error.event_id (when present) is the event_id of the client frame that triggered the error — correlate it with what you sent.

Two fields to switch on: `error.type` and `error.code`

An error frame has both a type (broad category) and a code (specific reason). Match on error.code for actionable handling.

if evt["type"] == "error":
    code = evt["error"]["code"]
    if code == "invalid_audio":
        ...  # fix audio encoding
    elif code == "tool_response_timeout":
        ...  # tool ran long
    elif code == "server_full":
        ...  # back off

`error.code` reference

| `error.code` | When it fires | Recovery |
| --- | --- | --- |
| `invalid_frame` | A frame failed schema validation on a strict path: `session.update` with an unknown field, malformed JSON, or wrong frame `type`. | Fix the frame and resend. Session continues. |
| `invalid_request_error` | Generic request validation failure: missing `type`, wrong type for a known field, unsupported audio format. | Fix the frame and resend. Session continues. |
| `invalid_audio` | The `input_audio_buffer.append.audio` payload was not valid base64, or not PCM16. | Stop sending audio, fix encoding, resume. Session continues. |
| `tool_response_timeout` | You declared tools, the model called one, and you didn’t post `function_call_output` + `response.create` within the timeout window. | The turn is abandoned. Next user turn proceeds normally. Investigate why your tool didn’t return in time. |
| `server_full` | The server is at capacity. Always followed by close code `1013`. | Back off with jitter, retry. Surface a queue message to the user. |
| `internal_error` | Unhandled server-side exception. | Reconnect once. If it persists, contact support. |
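
The reference table can be folded into a small lookup so that handlers stay declarative. This is a minimal sketch, not part of the API: the `Recovery` enum, the `RECOVERY_BY_CODE` mapping, and `plan_recovery` are hypothetical names, and handling unknown codes like `internal_error` is an assumption.

```python
from enum import Enum
from typing import Optional


class Recovery(Enum):
    FIX_AND_RESEND = "fix_and_resend"    # session continues
    FIX_AUDIO = "fix_audio"              # stop audio, fix encoding, resume
    SKIP_TURN = "skip_turn"              # turn abandoned, next turn proceeds
    BACK_OFF = "back_off"                # close code 1013 follows
    RECONNECT_ONCE = "reconnect_once"    # contact support if it persists


# Mirrors the error.code reference table.
RECOVERY_BY_CODE = {
    "invalid_frame": Recovery.FIX_AND_RESEND,
    "invalid_request_error": Recovery.FIX_AND_RESEND,
    "invalid_audio": Recovery.FIX_AUDIO,
    "tool_response_timeout": Recovery.SKIP_TURN,
    "server_full": Recovery.BACK_OFF,
    "internal_error": Recovery.RECONNECT_ONCE,
}


def plan_recovery(evt: dict) -> Optional[Recovery]:
    """Return the recovery for an error event, or None for any other event."""
    if evt.get("type") != "error":
        return None
    code = evt.get("error", {}).get("code")
    # Assumption: codes missing from the table are treated like internal_error.
    return RECOVERY_BY_CODE.get(code, Recovery.RECONNECT_ONCE)
```

Keeping the mapping in data rather than in an `if`/`elif` chain makes it easy to diff against the table when new codes are added.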

What’s NOT an error event

| Failure mode | What you actually see | Why |
| --- | --- | --- |
| Invalid `api_key` | HTTP 401 at WebSocket handshake; the connection never opens | Auth is enforced before the WebSocket upgrade completes. There’s no socket to send an `error` over. |
| Idle timeout | Close code `1000` with no preceding `error` | Normal close. Reconnect if the user is still active. |
| Network drop | Close code (various) or transport error | Treat like any WebSocket disconnect: reconnect logic on the client. |

See WebSocket connection for the full close-code reference.
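
The close-time cases above reduce to one small decision once the socket has gone away. The sketch below is illustrative only: `should_reconnect` and its `user_active` flag are hypothetical, and a `close_code` of `None` is assumed to stand for a transport drop with no close frame. An HTTP 401 never reaches this function, because the handshake fails before any socket exists.

```python
from typing import Optional


def should_reconnect(close_code: Optional[int], user_active: bool) -> bool:
    """Decide whether to open a new session after the socket closes.

    close_code is None when the transport dropped without a close frame.
    """
    if close_code == 1000:
        # Normal close (for example an idle timeout): only reconnect
        # when the user is still active.
        return user_active
    # 1013 (server full), any other close code, or a transport error:
    # reconnect, with backoff applied by the caller.
    return True
```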

Reconnection strategy

Hydra sessions are stateful, and that state does not survive a disconnect: when you reconnect, you get a new session_id and must send session.configure again. There’s no resume token.

A simple, correct strategy:

import asyncio

import websockets.exceptions

async def run_with_reconnect():
    # run_one_session() is your own coroutine: connect, configure, stream.
    backoff = 1.0
    while True:
        try:
            await run_one_session()
            backoff = 1.0          # success: reset backoff
        except websockets.exceptions.InvalidStatus as e:
            # Handshake rejected with an HTTP status; no socket was opened.
            if e.response.status_code == 401:
                raise              # bad key: don't retry
            await asyncio.sleep(backoff)
            backoff = min(backoff * 2, 30)
        except websockets.exceptions.ConnectionClosed as e:
            if e.code == 1013:     # server full
                await asyncio.sleep(backoff)
                backoff = min(backoff * 2, 60)
            elif e.code == 1000:   # normal close (idle / user done)
                return
            else:                  # any other close: retry with backoff
                await asyncio.sleep(backoff)
                backoff = min(backoff * 2, 30)

Two principles:

Cap the backoff. Don’t let a server-full event escalate to an unbounded retry storm.
Don’t retry HTTP 401. It’s a credential error — looping won’t fix it. Bubble it up.
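
The recovery guidance for `server_full` calls for jitter as well as a cap. One common way to get both is capped exponential backoff with full jitter, sketched below. `next_delay` and its parameters are hypothetical names chosen for illustration, and the defaults are assumptions rather than recommended values from the service.

```python
import random
from typing import Optional


def next_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
               rng: Optional[random.Random] = None) -> float:
    """Capped exponential backoff with full jitter.

    attempt is 0 for the first retry. The ceiling doubles with each attempt
    until it reaches cap; the delay is drawn uniformly from [0, ceiling].
    """
    rng = rng or random
    # Clamp the exponent so very large attempt counts cannot overflow.
    exponent = min(attempt, 16)
    ceiling = min(cap, base * (2 ** exponent))
    return rng.uniform(0.0, ceiling)
```

A retry loop would call `await asyncio.sleep(next_delay(attempt))` and reset `attempt` to 0 after a session that connected successfully. Randomizing the delay keeps many clients from retrying in lockstep after the same outage.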

Diagnosing in production

event_id correlation is the single most useful debugging tool. Always include a client-side event_id on outbound frames (UUIDs are fine):

import json
import uuid

# ws is an already-open WebSocket connection.
await ws.send(json.dumps({
    "event_id": f"evt_{uuid.uuid4().hex[:12]}",
    "type": "session.configure",
    "session": { ... },   # your session config goes here
}))

When an error references your event_id, you know exactly which frame caused it.
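
To make that lookup automatic, a client can remember what it sent, keyed by event_id. The following is a minimal sketch under stated assumptions: `SentFrames`, `stamp`, and `culprit` are hypothetical helpers, and the bounded size is an arbitrary choice to keep memory flat.

```python
import uuid
from typing import Optional


class SentFrames:
    """Remembers recently sent frames by event_id (hypothetical helper)."""

    def __init__(self, max_size: int = 256):
        self._frames: dict = {}
        self._max_size = max_size

    def stamp(self, frame: dict) -> dict:
        """Attach a client-side event_id (unless one is set) and remember the frame."""
        stamped = {"event_id": f"evt_{uuid.uuid4().hex[:12]}", **frame}
        self._frames[stamped["event_id"]] = stamped
        if len(self._frames) > self._max_size:
            # dicts preserve insertion order, so this evicts the oldest frame.
            self._frames.pop(next(iter(self._frames)))
        return stamped

    def culprit(self, evt: dict) -> Optional[dict]:
        """Return the sent frame that an error event points at, if still known."""
        if evt.get("type") != "error":
            return None
        return self._frames.get(evt.get("error", {}).get("event_id"))
```

In use, send `json.dumps(sent.stamp(frame))` instead of the raw frame, and log `sent.culprit(evt)` whenever an error event arrives. For audio frames it may be worth storing only the frame type and size rather than the full payload.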

Common gotchas

Treating every error as fatal. Most aren’t. Handle the error.code, and tear down the session only if a close frame follows.
Reconnecting on 1000. Idle close is normal. Reconnect only if the user is still engaged.
No backoff cap or jitter. Retrying a burst of server_full errors in lockstep, with no cap and no jitter, is how you turn a transient outage into your own outage.

WebSocket connection — close codes and idle timeout
Tool calling — tool_response_timeout deep dive