WebSocket SDK

A JavaScript library that opens a live voice conversation with a Smallest agent from a web page. Published on npm as @smallest-ai/agent-sdk, it wraps the Realtime Agent WebSocket API and handles microphone capture, PCM playback, and an event API.

When to use it

  • Voice widget or “click to talk” button on a marketing or product site.
  • In-app voice input in a web dashboard.
  • Agent demo, playground, or sandbox pages.
  • Any browser experience where the user speaks to a Smallest agent and hears a reply.

When to use something else

| Runtime | Use |
| --- | --- |
| React Native, iOS, Android, Flutter | Talk to the Realtime Agent WebSocket API directly. The SDK calls `navigator.mediaDevices.getUserMedia` and `AudioContext`, neither of which exists in those runtimes. |
| Python, Go, Node, any server-side process | Same. Use the raw WebSocket API with the platform's native WebSocket and audio libraries. |
| Outbound phone calls | Use Campaigns. |

You need an API key (from app.smallest.ai/dashboard/api-keys) and an agent ID. Create an agent from the Agents dashboard or follow the Developer Quickstart.

Install

```shell
$ npm install @smallest-ai/agent-sdk
```

Quickstart

```javascript
import { AtomsAgent } from "@smallest-ai/agent-sdk";

const agent = new AtomsAgent({
  apiKey: "sk_...", // Smallest API key
  agentId: "...",   // Agent ID
});

agent.on("session_started", (e) => {
  console.log("Session started:", e.session_id, e.call_id);
});
agent.on("agent_start_talking", () => console.log("Agent speaking"));
agent.on("agent_stop_talking", () => console.log("Agent stopped"));
agent.on("error", (e) => console.error(`[${e.code}] ${e.message}`));

await agent.connect();
// Microphone capture starts automatically. Speak, and the agent replies
// through the default audio output. Call agent.disconnect() when done.
```

The CDN equivalent uses the AtomsSdk global:

```html
<script src="https://cdn.jsdelivr.net/npm/@smallest-ai/agent-sdk/dist/agent-sdk.browser.js"></script>
<script>
  const agent = new AtomsSdk.AtomsAgent({
    apiKey: "sk_...",
    agentId: "...",
  });
  agent.on("session_started", (e) => console.log("connected:", e.session_id));
  agent.connect();
</script>
```

connect() calls navigator.mediaDevices.getUserMedia(). Browsers only expose the microphone on secure contexts: HTTPS in production, or http://localhost / http://127.0.0.1 in development. Serving the page from a non-loopback HTTP origin throws a permission error.
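The secure-context rule can be checked before calling connect() so the failure surfaces as a friendly UI message instead of a rejected promise. A minimal sketch; `micLikelyAvailable` is a hypothetical helper, not part of the SDK, and mirrors the browser rule described above (HTTPS, or a loopback host over HTTP):

```javascript
// Hypothetical pre-flight check mirroring the browser's secure-context
// rule for microphone access. `loc` is an object shaped like window.location.
function micLikelyAvailable(loc) {
  if (loc.protocol === "https:") return true;
  // Loopback origins are treated as secure even over plain HTTP.
  return loc.hostname === "localhost" || loc.hostname === "127.0.0.1";
}

// Usage in the page, before connecting:
// if (!micLikelyAvailable(window.location)) showHttpsWarning();
// else await agent.connect();
```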

Configuration

```typescript
interface AtomsAgentConfig {
  apiKey: string;           // Required. Smallest API key.
  agentId: string;          // Required. Agent to connect to.
  baseUrl?: string;         // Default: wss://api.smallest.ai. Override for local dev.
  sampleRate?: number;      // Default: 24000. Audio sample rate in Hz.
  autoCaptureMic?: boolean; // Default: true. If false, no microphone is captured and
                            // mute()/unmute() are no-ops. See the Push-to-talk pattern
                            // below for the correct way to start muted.
}
```

Methods

| Method | Returns | Description |
| --- | --- | --- |
| `connect()` | `Promise<void>` | Open the WebSocket and start the session. If `autoCaptureMic` is true, microphone capture starts on connect. |
| `disconnect()` | `void` | End the session and release the microphone and audio output. |
| `sendText(text)` | `void` | Send a text message to the agent. The reply returns as audio via `output_audio.delta`. |
| `mute()` | `void` | Stop sending microphone audio. The WebSocket stays open. |
| `unmute()` | `void` | Resume sending microphone audio. |
| `isConnected` (getter) | `boolean` | `true` between `connect()` resolving and the session ending. |
| `isMuted` (getter) | `boolean` | `true` while muted. |

Events

Subscribe with agent.on(eventName, handler).

| Event | Payload | Fires when |
| --- | --- | --- |
| `session_started` | `{ session_id: string, call_id: string }` | The server accepts the connection and creates a session. Use `call_id` to look up the call log or subscribe to live transcripts via SSE. |
| `session_ended` | `{ reason: string }` | The session has ended. `reason` is either a string sent by the server (`client_requested`, `websocket_closed`, or an error tag) or, when the WebSocket closes without a server-side `session.closed` frame, the numeric close code as a string (`"1005"`, `"1006"`, `"1000"`, etc.). |
| `agent_start_talking` | None | The server begins streaming agent TTS for a turn. Useful for "agent speaking" UI state. |
| `agent_stop_talking` | None | The server has finished the current turn. |
| `error` | `{ code: string, message: string }` | A non-fatal session error occurred. Fatal errors also trigger `session_ended`. |
| `transcript` | `{ role: "user" \| "agent", text: string }` | Reserved. The SDK wires this event up, but the current agent pipeline does not stream transcripts over this WebSocket. Use the post-call conversation API or the SSE live-transcripts stream for transcript data. |
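Because `session_ended` overloads `reason` with both server tags and stringified close codes, a small parser keeps handler logic readable. A sketch; `classifyEndReason` is a hypothetical helper built on the payload shape documented above, not an SDK export:

```javascript
// Hypothetical helper: split session_ended's `reason` into its two
// documented shapes — a server-sent tag ("client_requested", ...) or a
// WebSocket close code rendered as a string ("1000", "1006", ...).
function classifyEndReason(reason) {
  if (/^\d+$/.test(reason)) {
    const code = Number(reason);
    // 1000 is a normal closure; anything else (1005, 1006, ...) is not.
    return { kind: "close_code", code, abnormal: code !== 1000 };
  }
  return { kind: "server_reason", reason };
}

// Usage:
// agent.on("session_ended", (e) => {
//   const end = classifyEndReason(e.reason);
//   if (end.kind === "close_code" && end.abnormal) showReconnectBanner();
// });
```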

Patterns

Push-to-talk

Connect normally, mute immediately, then toggle on button events.

```javascript
const agent = new AtomsAgent({
  apiKey: "sk_...",
  agentId: "...",
});

await agent.connect();
agent.mute(); // silence the mic right after connect

talkButton.addEventListener("pointerdown", () => agent.unmute());
talkButton.addEventListener("pointerup", () => agent.mute());
```

Do not pass autoCaptureMic: false for push-to-talk. That mode skips microphone setup entirely and there is no public method to start the mic after connect(), so mute() and unmute() silently do nothing and isMuted always returns false. Use the default (autoCaptureMic: true) and call mute() right after connect() instead.
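The same pattern works with a keyboard instead of a button, using only the documented mute()/unmute() methods. A sketch, assuming spacebar as the talk key; note the guard against key-repeat, since keydown fires repeatedly while a key is held:

```javascript
// Hold-to-talk on the spacebar. `agent` is a connected AtomsAgent that was
// muted right after connect(), as in the pattern above.
function makeHoldToTalk(agent) {
  let held = false;
  return {
    keydown(e) {
      if (e.code !== "Space" || held) return; // ignore key-repeat
      held = true;
      agent.unmute();
    },
    keyup(e) {
      if (e.code !== "Space") return;
      held = false;
      agent.mute();
    },
  };
}

// const ptt = makeHoldToTalk(agent);
// window.addEventListener("keydown", ptt.keydown);
// window.addEventListener("keyup", ptt.keyup);
```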

Agent-speaking indicator

Reflect agent turn state in the UI.

```javascript
agent.on("agent_start_talking", () => setSpeaking(true));
agent.on("agent_stop_talking", () => setSpeaking(false));
```

Clean teardown on page unload

```javascript
window.addEventListener("beforeunload", () => {
  if (agent.isConnected) agent.disconnect();
});
```

Text input

Send a text message instead of speech. The reply still returns as audio.

```javascript
await agent.connect();
sendButton.addEventListener("click", () => {
  agent.sendText(inputEl.value);
});
```

Error handling

Errors arrive through three different paths. Handle each.

Handshake failure (bad API key, wrong agent ID, network issue, denied microphone permission) rejects the connect() promise with a generic Error("WebSocket connection failed"). The error event does not fire for these.

```javascript
try {
  await agent.connect();
} catch (err) {
  console.error("connect failed:", err.message);
  // Likely causes: invalid API key, unknown agent ID, microphone denied, network blocked.
}
```

Mid-session server error arrives as an error event with { code, message }. These are sent by the server during an active session, not during the handshake.

```javascript
agent.on("error", (e) => {
  console.error(`[${e.code}] ${e.message}`);
});
```

Session termination fires session_ended with a reason string. Handle this to detect both graceful ends and abnormal closes.

```javascript
agent.on("session_ended", (e) => {
  // e.reason is a server string (e.g. "client_requested") or a numeric
  // WebSocket close code as a string (e.g. "1006" for abnormal closure).
  console.log("session ended:", e.reason);
});
```
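An abnormal close (reason "1006") is the usual signal for an automatic reconnect. A sketch of a retry policy, assuming plain exponential backoff with a cap; `backoffDelay` and the wiring around it are illustrative, not part of the SDK:

```javascript
// Exponential backoff with a ceiling: 500ms, 1s, 2s, 4s, ... capped at 10s.
function backoffDelay(attempt, baseMs = 500, capMs = 10000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Illustrative wiring — retry only on abnormal closes, and reset the
// counter once a session starts cleanly:
// let attempt = 0;
// agent.on("session_started", () => { attempt = 0; });
// agent.on("session_ended", async (e) => {
//   if (e.reason !== "1006") return; // graceful end, don't reconnect
//   await new Promise((r) => setTimeout(r, backoffDelay(attempt++)));
//   await agent.connect();
// });
```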

Smoke-test the integration

A minimal standalone test to confirm the SDK reaches the agent and produces audio. No build step required.

```html
<!doctype html>
<meta charset="utf-8">
<title>Atoms WebSocket SDK smoke test</title>
<pre id="log"></pre>
<script src="https://cdn.jsdelivr.net/npm/@smallest-ai/agent-sdk/dist/agent-sdk.browser.js"></script>
<script>
  const log = (...args) => {
    document.getElementById('log').textContent +=
      args.map(a => typeof a === 'object' ? JSON.stringify(a) : String(a)).join(' ') + '\n';
  };
  const params = new URLSearchParams(location.search);
  const agent = new AtomsSdk.AtomsAgent({
    apiKey: params.get('key'),
    agentId: params.get('agent'),
  });
  agent.on("session_started", (e) => log("session_started", e));
  agent.on("agent_start_talking", () => log("agent_start_talking"));
  agent.on("agent_stop_talking", () => log("agent_stop_talking"));
  agent.on("error", (e) => log("error", e));
  agent.connect().then(() => log("connected"));
</script>
```

Serve it from localhost and open with the key and agent ID in the query string:

```shell
$ python3 -m http.server 8765
# then open: http://localhost:8765/smoke.html?key=sk_...&agent=...
```

Expected console output, in order: session_started with session_id and call_id, then agent_start_talking, then agent_stop_talking after the agent’s first turn completes. If the agent has a greeting, it plays through the default audio output.

Limitations

| Limitation | Detail |
| --- | --- |
| Browser runtime only | The SDK uses `navigator.mediaDevices.getUserMedia` and `AudioContext`. No React Native, Node, or Cordova build. |
| Transport | WebSocket only (no WebRTC). |
| TTS alignment | No per-character timing on `output_audio.delta`. Client-side caption sync is not supported. |
| Live transcripts | `transcript` events are not emitted on this WebSocket. Use the post-call conversation API or the SSE live-transcripts stream. |

For non-browser runtimes, connect to the Realtime Agent WebSocket API directly. A raw-protocol guide with tested reference clients for Python, React Native, Swift, and Kotlin is in progress.

Next steps