
WebSocket connection


A Hydra session is a single WebSocket connection. Open it once, stream audio in real time, close it when the user is done.

Endpoint

wss://api.smallest.ai/waves/v1/s2s?model=hydra&api_key=<SMALLEST_API_KEY>
| Parameter | Required | Notes |
|---|---|---|
| `model` | yes | Must be `hydra`, currently the only speech-to-speech model. Naming the model explicitly keeps your code stable when more models are added. |
| `api_key` | yes | Your `SMALLEST_API_KEY`. Treat it as a secret. For browser apps, mint a short-lived token server-side rather than shipping a long-lived key. |

All frames are JSON text frames (UTF-8). Binary frames are not used.
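Because the key travels in the query string, it is worth building the URL with the standard library and redacting the key before anything gets logged. The sketch below does that, plus a decoder that enforces the "JSON text frames only" rule. `build_url`, `redact_url`, and `decode_event` are hypothetical helper names, not part of any Smallest SDK.

```python
import json
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

BASE = "wss://api.smallest.ai/waves/v1/s2s"

def build_url(api_key: str, model: str = "hydra") -> str:
    # urlencode percent-escapes any reserved characters in the key
    return f"{BASE}?{urlencode({'model': model, 'api_key': api_key})}"

def redact_url(url: str) -> str:
    # Replace the api_key value so the URL can be logged safely
    parts = urlsplit(url)
    query = [(k, "REDACTED" if k == "api_key" else v) for k, v in parse_qsl(parts.query)]
    return urlunsplit(parts._replace(query=urlencode(query)))

def decode_event(raw) -> dict:
    # Every server frame should be a UTF-8 JSON text frame, never binary
    if isinstance(raw, (bytes, bytearray)):
        raise TypeError("unexpected binary frame")
    return json.loads(raw)
```

Escaping the key is defensive. A key with no reserved characters produces the same URL as plain f-string interpolation.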

Connect

```python
import asyncio, os
import websockets

URL = f"wss://api.smallest.ai/waves/v1/s2s?model=hydra&api_key={os.environ['SMALLEST_API_KEY']}"

async def main():
    async with websockets.connect(URL, max_size=None) as ws:
        async for raw in ws:
            print(raw)  # → session.created, then on to session.configure

asyncio.run(main())
```

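Requires `pip install websockets`. The authentication example below assumes websockets 14 or later, where `websockets.connect` is the new asyncio client and a rejected handshake raises `InvalidStatus`.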
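The first frame on every connection is `session.created`, so a client can check for it before doing anything else and keep the `session_id` for logging. This is a minimal sketch. `wait_for_session_created` is a hypothetical helper, and it only assumes that `ws` has an awaitable `recv()`, which websockets connections provide.

```python
import json

async def wait_for_session_created(ws) -> str:
    """Read the first frame and return its session_id."""
    event = json.loads(await ws.recv())
    if event.get("type") != "session.created":
        # Anything else first means the connection is not in the expected state
        raise RuntimeError(f"expected session.created, got {event.get('type')!r}")
    return event["session_id"]
```

In the Connect example, call `session_id = await wait_for_session_created(ws)` inside the `async with` block, before the `async for` loop.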

Authentication

Authentication happens during the WebSocket handshake. If `api_key` is missing or invalid, the server returns HTTP 401 and the WebSocket never opens: you receive no event frames and no close code.

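```python
import websockets

async def connect_or_report(url):
    try:
        async with websockets.connect(url) as ws:
            ...  # handshake succeeded; session.created arrives next
    except websockets.exceptions.InvalidStatus as e:
        # The handshake was rejected, so no WebSocket was ever opened
        if e.response.status_code == 401:
            print("Bad API key")
```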

Once the connection is open, the session tolerates about 30 seconds without traffic. See Idle timeout below.

What you get back

The first frame after connect is always session.created:

```json
{
  "type": "session.created",
  "event_id": "sv_588fd416b4a24fe3",
  "session_id": "4bf1f4f6-5fa4-4de5-aa29-83f5eed96c82"
}
```

The server then waits for exactly one `session.configure` from you before it accepts audio. Once the session is configured, the server replies with `session.configured` and starts accepting `input_audio_buffer.append` frames. See Managing sessions.
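You can enforce this ordering on the client with a small state machine that inspects only each event's `type`. This is a sketch. `SessionGate` and `Phase` are hypothetical names, and the contents of `session.configure` are covered in Managing sessions, not here.

```python
from enum import Enum, auto

class Phase(Enum):
    AWAIT_CREATED = auto()     # connected, waiting for session.created
    AWAIT_CONFIGURED = auto()  # session.created seen; send session.configure, await the reply
    STREAMING = auto()         # input_audio_buffer.append frames may now be sent

class SessionGate:
    """Tracks the startup handshake so audio is never sent too early."""

    def __init__(self):
        self.phase = Phase.AWAIT_CREATED

    def on_server_event(self, event: dict) -> None:
        kind = event.get("type")
        if kind == "session.created" and self.phase is Phase.AWAIT_CREATED:
            self.phase = Phase.AWAIT_CONFIGURED
        elif kind == "session.configured" and self.phase is Phase.AWAIT_CONFIGURED:
            self.phase = Phase.STREAMING
        # Out-of-order events leave the phase unchanged

    def can_send_audio(self) -> bool:
        return self.phase is Phase.STREAMING
```

Gate every audio send on `can_send_audio()` so that no audio is sent before `session.configured` arrives.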

Idle timeout

If neither side sends a frame for about 30 seconds, the server closes the connection with code 1000 (normal close). Sessions cannot be resumed, so reconnecting starts a new session; replay any state you need in the new `session.configure`.

Heartbeat with silence frames. If your UX has a pause where the user may think for a while (a long deliberation prompt, a UI confirmation step), keep streaming `input_audio_buffer.append` frames with whatever the mic is producing. Silence still counts as traffic and keeps the session alive. Stopping the mic stream and resuming it later is exactly the case the 30-second timeout catches.
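If the mic stream has to stop, a client could fill the gap with generated silence instead. This sketch rests on stated assumptions: mono PCM16 at whatever sample rate your session uses (16 kHz below is only an example), and an `audio` field that carries base64 data. That field name is hypothetical. Audio I/O documents the actual `input_audio_buffer.append` format.

```python
import base64
import json

def silence_pcm16(sample_rate_hz: int, duration_ms: int) -> bytes:
    # PCM16 mono is 2 bytes per sample; digital silence is all zero bytes
    samples = sample_rate_hz * duration_ms // 1000
    return bytes(2 * samples)

def append_frame(pcm: bytes) -> str:
    # ASSUMPTION: audio travels base64-encoded in an "audio" field;
    # check Audio I/O for the real input_audio_buffer.append schema
    return json.dumps({
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm).decode("ascii"),
    })
```

At 16 kHz, a 20 ms frame holds 16000 × 0.02 = 320 samples, or 640 bytes. Sending one frame per 20 ms of paused time keeps traffic flowing well inside the idle window.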

Close codes

| Code | Meaning | Action |
|---|---|---|
| 1000 | Normal close, including idle timeout | Reconnect if the user is still active |
| 1013 | Server at capacity. Preceded by an `error` event with `code: "server_full"`. | Back off with jitter; surface a queue message to the user |

Authentication failures never produce a close code. They surface as HTTP 401 during the handshake, as described in Authentication above.
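The table above can be turned into a small reconnect policy. This is a sketch, not an official client. `reconnect_delay` is a hypothetical helper, the 1 s base and 30 s cap are arbitrary example values, and exponential backoff with "full jitter" is one common way to implement "back off with jitter". The docs do not prescribe a particular scheme.

```python
import random
from typing import Optional

def reconnect_delay(close_code: int, attempt: int, user_active: bool,
                    base_s: float = 1.0, cap_s: float = 30.0,
                    rng: Optional[random.Random] = None) -> Optional[float]:
    """Seconds to wait before reconnecting, or None to stay disconnected.

    `attempt` counts consecutive retries, starting at 0.
    """
    if not user_active:
        return None  # nobody to reconnect for
    if close_code == 1000:
        # Normal close, including the idle timeout: reconnect right away
        return 0.0
    if close_code == 1013:
        # Server at capacity: full jitter, a uniform random wait
        # in [0, min(cap, base * 2**attempt)]
        rng = rng or random.Random()
        return rng.uniform(0.0, min(cap_s, base_s * 2 ** attempt))
    return None  # codes not in the table: see Errors & reconnection
```

With `websockets`, the code the server sent is available as `ws.close_code` once the connection has closed. A new connection is a new session, so send `session.configure` again after reconnecting.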

Next

  • Managing sessions — session.configure, voices, persona, mid-session updates
  • Audio I/O — input PCM16, output rate negotiation, browser AudioWorklet
  • Errors & reconnection — full error catalog