WebSocket SDK

A JavaScript library that opens a live voice conversation with a Smallest agent from a web page. Published on npm as @smallest-ai/agent-sdk, it wraps the Realtime Agent WebSocket API and handles microphone capture, PCM playback, and an event API.

When to use it

  • Voice widget or “click to talk” button on a marketing or product site.
  • In-app voice input in a web dashboard.
  • Agent demo, playground, or sandbox pages.
  • Any browser experience where the user speaks to a Smallest agent and hears a reply.

When to use something else

| Runtime | Use |
| --- | --- |
| React Native, iOS, Android, Flutter | Talk to the Realtime Agent WebSocket API directly. The SDK calls `navigator.mediaDevices.getUserMedia` and `AudioContext`, neither of which exists in those runtimes. |
| Python, Go, Node, any server-side process | Same. Use the raw WebSocket API with the platform's native WebSocket and audio libraries. |
| Outbound phone calls | Use Campaigns. |

You need an API key (from app.smallest.ai/dashboard/api-keys) and an agent ID. Create an agent from the Agents dashboard or follow the Developer Quickstart.

Install

```shell
$ npm install @smallest-ai/agent-sdk
```

Quickstart

```javascript
import { AtomsAgent } from "@smallest-ai/agent-sdk";

const agent = new AtomsAgent({
  apiKey: "sk_...", // Smallest API key
  agentId: "...",   // Agent ID
});

agent.on("session_started", (e) => {
  console.log("Session started:", e.session_id, e.call_id);
});
agent.on("agent_start_talking", () => console.log("Agent speaking"));
agent.on("agent_stop_talking", () => console.log("Agent stopped"));
agent.on("error", (e) => console.error(`[${e.code}] ${e.message}`));

await agent.connect();
// Microphone capture starts automatically. Speak, and the agent replies
// through the default audio output. Call agent.disconnect() when done.
```

The CDN equivalent uses the AtomsSdk global:

```html
<script src="https://cdn.jsdelivr.net/npm/@smallest-ai/agent-sdk/dist/agent-sdk.browser.js"></script>
<script>
  const agent = new AtomsSdk.AtomsAgent({
    apiKey: "sk_...",
    agentId: "...",
  });
  agent.on("session_started", (e) => console.log("connected:", e.session_id));
  agent.connect();
</script>
```

connect() calls navigator.mediaDevices.getUserMedia(). Browsers only expose the microphone on secure contexts: HTTPS in production, or http://localhost / http://127.0.0.1 in development. Serving the page from a non-loopback HTTP origin throws a permission error.
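The secure-context rule can be checked before calling connect() so the failure surfaces as a friendly UI message instead of a rejected promise. A minimal sketch; `micLikelyAvailable` is a hypothetical helper, not part of the SDK, and mirrors the browser rule described above (HTTPS, or a loopback host over HTTP):

```javascript
// Hypothetical pre-flight check mirroring the browser's secure-context
// rule for microphone access. `loc` is an object shaped like window.location.
function micLikelyAvailable(loc) {
  if (loc.protocol === "https:") return true;
  // Loopback origins are treated as secure even over plain HTTP.
  return loc.hostname === "localhost" || loc.hostname === "127.0.0.1";
}

// Usage in the page, before connecting:
// if (!micLikelyAvailable(window.location)) showHttpsWarning();
// else await agent.connect();
```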

Configuration

```typescript
interface AtomsAgentConfig {
  apiKey: string;           // Required. Smallest API key.
  agentId: string;          // Required. Agent to connect to.
  baseUrl?: string;         // Default: wss://api.smallest.ai. Override for local dev.
  sampleRate?: number;      // Default: 24000. Audio sample rate in Hz.
  autoCaptureMic?: boolean; // Default: true. If false, no microphone is captured and
                            // mute()/unmute() are no-ops. See the Push-to-talk pattern
                            // below for the correct way to start muted.
}
```

Methods

| Method | Returns | Description |
| --- | --- | --- |
| `connect()` | `Promise<void>` | Open the WebSocket and start the session. If `autoCaptureMic` is true, microphone capture starts on connect. |
| `disconnect()` | `void` | End the session and release the microphone and audio output. |
| `sendText(text)` | `void` | Send a text message to the agent. The reply returns as audio via `output_audio.delta`. |
| `mute()` | `void` | Stop sending microphone audio. The WebSocket stays open. |
| `unmute()` | `void` | Resume sending microphone audio. |
| `isConnected` (getter) | `boolean` | `true` between `connect()` resolving and the session ending. |
| `isMuted` (getter) | `boolean` | `true` while muted. |

Events

Subscribe with agent.on(eventName, handler).

| Event | Payload | Fires when |
| --- | --- | --- |
| `session_started` | `{ session_id: string, call_id: string }` | The server accepts the connection and creates a session. Use `call_id` to look up the call log or subscribe to live transcripts via SSE. |
| `session_ended` | `{ reason: string }` | The session has ended. `reason` is either a string sent by the server (`client_requested`, `websocket_closed`, or an error tag) or, when the WebSocket closes without a server-side `session.closed` frame, the numeric close code as a string (`"1005"`, `"1006"`, `"1000"`, etc.). |
| `agent_start_talking` | None | The server begins streaming agent TTS for a turn. Useful for "agent speaking" UI state. |
| `agent_stop_talking` | None | The server has finished the current turn. |
| `error` | `{ code: string, message: string }` | A non-fatal session error occurred. Fatal errors also trigger `session_ended`. |
| `transcript` | `{ role: "user" \| "agent", text: string }` | Reserved. The SDK wires this event up, but the current agent pipeline does not stream transcripts over this WebSocket. Use the post-call conversation API or the SSE live-transcripts stream for transcript data. |
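Because `session_ended` overloads `reason` with both server tags and stringified close codes, a small parser keeps handler logic readable. A sketch; `classifyEndReason` is a hypothetical helper built on the payload shape documented above, not an SDK export:

```javascript
// Hypothetical helper: split session_ended's `reason` into its two
// documented shapes — a server-sent tag ("client_requested", ...) or a
// WebSocket close code rendered as a string ("1000", "1006", ...).
function classifyEndReason(reason) {
  if (/^\d+$/.test(reason)) {
    const code = Number(reason);
    // 1000 is a normal closure; anything else (1005, 1006, ...) is not.
    return { kind: "close_code", code, abnormal: code !== 1000 };
  }
  return { kind: "server_reason", reason };
}

// Usage:
// agent.on("session_ended", (e) => {
//   const end = classifyEndReason(e.reason);
//   if (end.kind === "close_code" && end.abnormal) showReconnectBanner();
// });
```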

Patterns

Push-to-talk

Connect normally, mute immediately, then toggle on button events.

```javascript
const agent = new AtomsAgent({
  apiKey: "sk_...",
  agentId: "...",
});

await agent.connect();
agent.mute(); // silence the mic right after connect

talkButton.addEventListener("pointerdown", () => agent.unmute());
talkButton.addEventListener("pointerup", () => agent.mute());
```

Do not pass autoCaptureMic: false for push-to-talk. That mode skips microphone setup entirely and there is no public method to start the mic after connect(), so mute() and unmute() silently do nothing and isMuted always returns false. Use the default (autoCaptureMic: true) and call mute() right after connect() instead.
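The same pattern works with a keyboard instead of a button, using only the documented mute()/unmute() methods. A sketch, assuming spacebar as the talk key; note the guard against key-repeat, since keydown fires repeatedly while a key is held:

```javascript
// Hold-to-talk on the spacebar. `agent` is a connected AtomsAgent that was
// muted right after connect(), as in the pattern above.
function makeHoldToTalk(agent) {
  let held = false;
  return {
    keydown(e) {
      if (e.code !== "Space" || held) return; // ignore key-repeat
      held = true;
      agent.unmute();
    },
    keyup(e) {
      if (e.code !== "Space") return;
      held = false;
      agent.mute();
    },
  };
}

// const ptt = makeHoldToTalk(agent);
// window.addEventListener("keydown", ptt.keydown);
// window.addEventListener("keyup", ptt.keyup);
```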

Agent-speaking indicator

Reflect agent turn state in the UI.

```javascript
agent.on("agent_start_talking", () => setSpeaking(true));
agent.on("agent_stop_talking", () => setSpeaking(false));
```

Clean teardown on page unload

```javascript
window.addEventListener("beforeunload", () => {
  if (agent.isConnected) agent.disconnect();
});
```

Text input

Send a text message instead of speech. The reply still returns as audio.

```javascript
await agent.connect();
sendButton.addEventListener("click", () => {
  agent.sendText(inputEl.value);
});
```

Error handling

Errors arrive through three different paths. Handle each.

Handshake failure (bad API key, wrong agent ID, network issue, denied microphone permission) rejects the connect() promise with a generic Error("WebSocket connection failed"). The error event does not fire for these.

```javascript
try {
  await agent.connect();
} catch (err) {
  console.error("connect failed:", err.message);
  // Likely causes: invalid API key, unknown agent ID, microphone denied, network blocked.
}
```

Mid-session server error arrives as an error event with { code, message }. These are sent by the server during an active session, not during the handshake.

```javascript
agent.on("error", (e) => {
  console.error(`[${e.code}] ${e.message}`);
});
```

Session termination fires session_ended with a reason string. Handle this to detect both graceful ends and abnormal closes.

```javascript
agent.on("session_ended", (e) => {
  // e.reason is a server string (e.g. "client_requested") or a numeric
  // WebSocket close code as a string (e.g. "1006" for abnormal closure).
  console.log("session ended:", e.reason);
});
```
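An abnormal close (reason "1006") is the usual signal for an automatic reconnect. A sketch of a retry policy, assuming plain exponential backoff with a cap; `backoffDelay` and the wiring around it are illustrative, not part of the SDK:

```javascript
// Exponential backoff with a ceiling: 500ms, 1s, 2s, 4s, ... capped at 10s.
function backoffDelay(attempt, baseMs = 500, capMs = 10000) {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Illustrative wiring — retry only on abnormal closes, and reset the
// counter once a session starts cleanly:
// let attempt = 0;
// agent.on("session_started", () => { attempt = 0; });
// agent.on("session_ended", async (e) => {
//   if (e.reason !== "1006") return; // graceful end, don't reconnect
//   await new Promise((r) => setTimeout(r, backoffDelay(attempt++)));
//   await agent.connect();
// });
```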

Smoke-test the integration

A minimal standalone test to confirm the SDK reaches the agent and produces audio. No build step required.

```html
<!doctype html>
<meta charset="utf-8">
<title>Atoms WebSocket SDK smoke test</title>
<pre id="log"></pre>
<script src="https://cdn.jsdelivr.net/npm/@smallest-ai/agent-sdk/dist/agent-sdk.browser.js"></script>
<script>
  const log = (...args) => {
    document.getElementById('log').textContent +=
      args.map(a => typeof a === 'object' ? JSON.stringify(a) : String(a)).join(' ') + '\n';
  };
  const params = new URLSearchParams(location.search);
  const agent = new AtomsSdk.AtomsAgent({
    apiKey: params.get('key'),
    agentId: params.get('agent'),
  });
  agent.on("session_started", (e) => log("session_started", e));
  agent.on("agent_start_talking", () => log("agent_start_talking"));
  agent.on("agent_stop_talking", () => log("agent_stop_talking"));
  agent.on("error", (e) => log("error", e));
  agent.connect().then(() => log("connected"));
</script>
```

Serve it from localhost and open with the key and agent ID in the query string:

```shell
$ python3 -m http.server 8765
# then open: http://localhost:8765/smoke.html?key=sk_...&agent=...
```

Expected console output, in order: session_started with session_id and call_id, then agent_start_talking, then agent_stop_talking after the agent’s first turn completes. If the agent has a greeting, it plays through the default audio output.

Limitations

| Limitation | Detail |
| --- | --- |
| Browser runtime only | The SDK uses `navigator.mediaDevices.getUserMedia` and `AudioContext`. No React Native, Node, or Cordova build. |
| Transport | WebSocket only (no WebRTC). |
| TTS alignment | No per-character timing on `output_audio.delta`. Client-side caption sync is not supported. |
| Live transcripts | `transcript` events are not emitted on this WebSocket. Use the post-call conversation API or the SSE live-transcripts stream. |

For non-browser runtimes, connect to the Realtime Agent WebSocket API directly. A raw-protocol guide with tested reference clients for Python, React Native, Swift, and Kotlin is in progress.

Next steps