---
title: WebSocket SDK
sidebarTitle: WebSocket SDK
description: JavaScript SDK for connecting to a Smallest agent from a web page. Handles microphone capture, audio playback, and event streaming over WebSocket.
---


A JavaScript library that opens a live voice conversation with a Smallest agent from a web page. Published on npm as [`@smallest-ai/agent-sdk`](https://www.npmjs.com/package/@smallest-ai/agent-sdk). Wraps the [Realtime Agent WebSocket API](/atoms/api-reference/api-reference/realtime-agent/realtime-agent), plus microphone capture, PCM playback, and an event API.

## When to use it

* Voice widget or "click to talk" button on a marketing or product site.
* In-app voice input in a web dashboard.
* Agent demo, playground, or sandbox pages.
* Any browser experience where the user speaks to a Smallest agent and hears a reply.

## When to use something else

| Runtime                                   | Use                                                                                                                                                                                                                                       |
| ----------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| React Native, iOS, Android, Flutter       | Talk to the [Realtime Agent WebSocket API](/atoms/api-reference/api-reference/realtime-agent/realtime-agent) directly. The SDK calls `navigator.mediaDevices.getUserMedia` and `AudioContext`, neither of which exists in those runtimes. |
| Python, Go, Node, any server-side process | Same. Use the raw WebSocket API with the platform's native WebSocket and audio libraries.                                                                                                                                                 |
| Outbound phone calls                      | Use [Campaigns](/atoms/developer-guide/build/campaigns/creating-campaigns).                                                                                                                                                               |

<Info>
  You need an API key (from [app.smallest.ai/dashboard/api-keys](https://app.smallest.ai/dashboard/api-keys)) and an agent ID. Create an agent from the [Agents dashboard](https://app.smallest.ai/dashboard/agents) or follow the [Developer Quickstart](/atoms/developer-guide/get-started/quickstart).
</Info>

## Install

<CodeBlocks>
  ```bash npm
  npm install @smallest-ai/agent-sdk
  ```

  ```bash pnpm
  pnpm add @smallest-ai/agent-sdk
  ```

  ```bash yarn
  yarn add @smallest-ai/agent-sdk
  ```

  ```html CDN
  <script src="https://cdn.jsdelivr.net/npm/@smallest-ai/agent-sdk/dist/agent-sdk.browser.js"></script>
  <!-- Loaded via CDN, the SDK is exposed on window.AtomsSdk -->
  ```
</CodeBlocks>

## Quickstart

```typescript
import { AtomsAgent } from "@smallest-ai/agent-sdk";

const agent = new AtomsAgent({
  apiKey:  "sk_...",   // Smallest API key
  agentId: "...",      // Agent ID
});

agent.on("session_started", (e) => {
  console.log("Session started:", e.session_id, e.call_id);
});
agent.on("agent_start_talking", () => console.log("Agent speaking"));
agent.on("agent_stop_talking",  () => console.log("Agent stopped"));
agent.on("error",               (e) => console.error(`[${e.code}] ${e.message}`));

await agent.connect();
// Microphone capture starts automatically. Speak, and the agent replies
// through the default audio output. Call agent.disconnect() when done.
```

The CDN equivalent uses the `AtomsSdk` global:

```html
<script src="https://cdn.jsdelivr.net/npm/@smallest-ai/agent-sdk/dist/agent-sdk.browser.js"></script>
<script>
  const agent = new AtomsSdk.AtomsAgent({
    apiKey:  "sk_...",
    agentId: "...",
  });
  agent.on("session_started", (e) => console.log("connected:", e.session_id));
  agent.connect();
</script>
```

<Warning>
  `connect()` calls `navigator.mediaDevices.getUserMedia()`. Browsers only expose the microphone on **secure contexts**: HTTPS in production, or `http://localhost` / `http://127.0.0.1` in development. Serving the page from a non-loopback HTTP origin throws a permission error.
</Warning>

## Configuration

```typescript
interface AtomsAgentConfig {
  apiKey:          string;    // Required. Smallest API key.
  agentId:         string;    // Required. Agent to connect to.
  baseUrl?:        string;    // Default: wss://api.smallest.ai. Override for local dev.
  sampleRate?:     number;    // Default: 24000. Audio sample rate in Hz.
  autoCaptureMic?: boolean;   // Default: true. If false, no microphone is captured and mute()/unmute() are no-ops. See the Push-to-talk pattern below for the correct way to start muted.
}
```
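To make the resolution order explicit, here is a small sketch of how the defaults above combine with caller-supplied values. `withDefaults` is a hypothetical helper, not part of the SDK; the interface is repeated only so the snippet is self-contained:

```typescript
// Hypothetical helper -- NOT part of the SDK. It only mirrors the
// documented defaults so the resolution order is explicit.
interface AtomsAgentConfig {
  apiKey: string;
  agentId: string;
  baseUrl?: string;
  sampleRate?: number;
  autoCaptureMic?: boolean;
}

function withDefaults(config: AtomsAgentConfig): Required<AtomsAgentConfig> {
  return {
    apiKey: config.apiKey,
    agentId: config.agentId,
    baseUrl: config.baseUrl ?? "wss://api.smallest.ai",   // documented default
    sampleRate: config.sampleRate ?? 24000,               // documented default
    autoCaptureMic: config.autoCaptureMic ?? true,        // documented default
  };
}
```

Passing only `apiKey` and `agentId` resolves to `wss://api.smallest.ai`, 24000 Hz, and mic capture on; any field you set explicitly wins.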

## Methods

| Method                 | Returns         | Description                                                                                                    |
| ---------------------- | --------------- | -------------------------------------------------------------------------------------------------------------- |
| `connect()`            | `Promise<void>` | Open the WebSocket and start the session. If `autoCaptureMic` is `true`, microphone capture starts on connect. |
| `disconnect()`         | `void`          | End the session and release the microphone and audio output.                                                   |
| `sendText(text)`       | `void`          | Send a text message to the agent. The reply returns as audio via `output_audio.delta`.                         |
| `mute()`               | `void`          | Stop sending microphone audio. The WebSocket stays open.                                                       |
| `unmute()`             | `void`          | Resume sending microphone audio.                                                                               |
| `isConnected` (getter) | `boolean`       | `true` between `connect()` resolving and the session ending.                                                   |
| `isMuted` (getter)     | `boolean`       | `true` while muted.                                                                                            |

## Events

Subscribe with `agent.on(eventName, handler)`.

| Event                 | Payload                                     | Fires when                                                                                                                                                                                                                                                                                                                                                   |
| --------------------- | ------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `session_started`     | `{ session_id: string, call_id: string }`   | The server accepts the connection and creates a session. Use `call_id` to look up the call log or subscribe to [live transcripts via SSE](/atoms/developer-guide/operate/analytics/sse-for-live-transcripts).                                                                                                                                                |
| `session_ended`       | `{ reason: string }`                        | The session has ended. `reason` is either a string sent by the server (`client_requested`, `websocket_closed`, or an error tag) or, when the WebSocket closes without a server-side `session.closed` frame, the numeric close code as a string (`"1005"`, `"1006"`, `"1000"`, etc.).                                                                         |
| `agent_start_talking` | None                                        | The server begins streaming agent TTS for a turn. Useful for "agent speaking" UI state.                                                                                                                                                                                                                                                                      |
| `agent_stop_talking`  | None                                        | The server has finished the current turn.                                                                                                                                                                                                                                                                                                                    |
| `error`               | `{ code: string, message: string }`         | A non-fatal session error occurred. Fatal errors also trigger `session_ended`.                                                                                                                                                                                                                                                                               |
| `transcript`          | `{ role: "user" \| "agent", text: string }` | Reserved. The SDK wires this event up, but the current agent pipeline does not stream transcripts over this WebSocket. Use the [post-call conversation API](/atoms/api-reference/api-reference/logs/get-conversation-log-by-id) or the [SSE live-transcripts stream](/atoms/developer-guide/operate/analytics/sse-for-live-transcripts) for transcript data. |
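Because `session_ended` carries either a server-sent tag or a stringified numeric close code, it can help to classify the two forms before reacting. A minimal sketch, assuming only the reason formats documented above (`classifyEndReason` is a hypothetical helper, not an SDK export):

```typescript
// Hypothetical helper -- classifies the `reason` string from `session_ended`.
// Per the table above, `reason` is either a server-sent tag (e.g.
// "client_requested") or a numeric WebSocket close code as a string.
type EndKind = "graceful" | "abnormal" | "server_tag";

function classifyEndReason(reason: string): EndKind {
  if (/^\d+$/.test(reason)) {
    // Numeric close code. 1000 is a normal closure; 1005 only means the
    // close frame carried no status, so it is treated as clean here.
    // Anything else (e.g. 1006) is an abnormal closure.
    return reason === "1000" || reason === "1005" ? "graceful" : "abnormal";
  }
  // Server-sent tag: "client_requested" is a deliberate end; other tags
  // are surfaced as-is so callers can decide how to react.
  return reason === "client_requested" ? "graceful" : "server_tag";
}
```

Wire it into a `session_ended` handler to decide, for example, whether to show an error banner or end quietly.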

## Patterns

### Push-to-talk

Connect normally, mute immediately, then toggle on button events.

```typescript
const agent = new AtomsAgent({
  apiKey:  "sk_...",
  agentId: "...",
});

await agent.connect();
agent.mute();  // silence the mic right after connect

talkButton.addEventListener("pointerdown", () => agent.unmute());
talkButton.addEventListener("pointerup",   () => agent.mute());
```

<Warning>
  Do not pass `autoCaptureMic: false` for push-to-talk. That mode skips microphone setup entirely and there is no public method to start the mic after `connect()`, so `mute()` and `unmute()` silently do nothing and `isMuted` always returns `false`. Use the default (`autoCaptureMic: true`) and call `mute()` right after `connect()` instead.
</Warning>

### Agent-speaking indicator

Reflect agent turn state in the UI.

```typescript
agent.on("agent_start_talking", () => setSpeaking(true));
agent.on("agent_stop_talking",  () => setSpeaking(false));
```

### Clean teardown on page unload

```typescript
window.addEventListener("beforeunload", () => {
  if (agent.isConnected) agent.disconnect();
});
```

### Text input

Send a text message instead of speech. The reply still returns as audio.

```typescript
await agent.connect();
sendButton.addEventListener("click", () => {
  agent.sendText(inputEl.value);
});
```
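Since `sendText` is fire-and-forget, it is worth guarding against empty submissions on the client. A small sketch (`prepareText` is a hypothetical helper, not part of the SDK):

```typescript
// Hypothetical guard -- trims the raw input and returns null for
// whitespace-only strings so sendText() is never called with empty text.
function prepareText(raw: string): string | null {
  const text = raw.trim();
  return text.length > 0 ? text : null;
}
```

In the click handler above, call `prepareText(inputEl.value)` first and only invoke `agent.sendText(text)` when the result is non-null.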

## Error handling

Errors arrive through three different paths. Handle each.

**Handshake failure** (bad API key, wrong agent ID, network issue, denied microphone permission) rejects the `connect()` promise with a generic `Error("WebSocket connection failed")`. The `error` event does not fire for these.

```typescript
try {
  await agent.connect();
} catch (err) {
  console.error("connect failed:", err.message);
  // Likely causes: invalid API key, unknown agent ID, microphone denied, network blocked.
}
```

**Mid-session server error** arrives as an `error` event with `{ code, message }`. These are sent by the server during an active session, not during the handshake.

```typescript
agent.on("error", (e) => {
  console.error(`[${e.code}] ${e.message}`);
});
```

**Session termination** fires `session_ended` with a `reason` string. Handle this to detect both graceful ends and abnormal closes.

```typescript
agent.on("session_ended", (e) => {
  // e.reason is a server string (e.g. "client_requested") or a numeric
  // WebSocket close code as a string (e.g. "1006" for abnormal closure).
  console.log("session ended:", e.reason);
});
```
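For long-lived widgets you may want to reconnect after an abnormal close (a numeric close code other than `1000` or `1005`) while leaving graceful ends alone. The sketch below keeps the delay schedule as a pure function; the event wiring in the comment assumes an `agent` instance from the Quickstart and is an illustration, not a prescribed pattern:

```typescript
// Hypothetical backoff schedule -- capped exponential delay per retry.
// attempt 0 -> 500 ms, 1 -> 1000 ms, 2 -> 2000 ms, ... capped at 10 s.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 10_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Illustrative wiring (assumes an `agent` from the Quickstart above):
//
//   let attempt = 0;
//   agent.on("session_ended", (e) => {
//     const abnormal =
//       /^\d+$/.test(e.reason) && e.reason !== "1000" && e.reason !== "1005";
//     if (!abnormal) return;                       // graceful end: stop here
//     setTimeout(() => {
//       agent.connect().then(() => (attempt = 0)); // reset on success
//     }, backoffDelayMs(attempt++));
//   });
```

Capping the delay prevents a flapping network from pushing retries out indefinitely, while the reset on a successful `connect()` keeps later outages starting from the short end of the schedule.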

## Smoke-test the integration

A minimal standalone test to confirm the SDK reaches the agent and produces audio. No build step required.

```html
<!doctype html>
<meta charset="utf-8">
<title>Atoms WebSocket SDK smoke test</title>
<pre id="log"></pre>
<script src="https://cdn.jsdelivr.net/npm/@smallest-ai/agent-sdk/dist/agent-sdk.browser.js"></script>
<script>
  const log = (...args) => {
    document.getElementById('log').textContent +=
      args.map(a => typeof a === 'object' ? JSON.stringify(a) : String(a)).join(' ') + '\n';
  };
  const params = new URLSearchParams(location.search);
  const agent  = new AtomsSdk.AtomsAgent({
    apiKey:  params.get('key'),
    agentId: params.get('agent'),
  });
  agent.on("session_started",    (e) => log("session_started", e));
  agent.on("agent_start_talking",()  => log("agent_start_talking"));
  agent.on("agent_stop_talking", ()  => log("agent_stop_talking"));
  agent.on("error",              (e) => log("error", e));
  agent.connect().then(() => log("connected"));
</script>
```

Serve it from `localhost` and open with the key and agent ID in the query string:

```bash
python3 -m http.server 8765
# http://localhost:8765/smoke.html?key=sk_...&agent=...
```

Expected console output, in order: `session_started` with `session_id` and `call_id`, then `agent_start_talking`, then `agent_stop_talking` after the agent's first turn completes. If the agent has a greeting, it plays through the default audio output.

## Limitations

| Limitation           | Detail                                                                                                                        |
| -------------------- | ----------------------------------------------------------------------------------------------------------------------------- |
| Browser runtime only | The SDK uses `navigator.mediaDevices.getUserMedia` and `AudioContext`. No React Native, Node, or Cordova build.               |
| Transport            | WebSocket only (no WebRTC).                                                                                                   |
| TTS alignment        | No per-character timing on `output_audio.delta`. Client-side caption sync is not supported.                                   |
| Live transcripts     | `transcript` events are not emitted on this WebSocket. Use the post-call conversation API or the SSE live-transcripts stream. |

For non-browser runtimes, connect to the [Realtime Agent WebSocket API](/atoms/api-reference/api-reference/realtime-agent/realtime-agent) directly. A raw-protocol guide with tested reference clients for Python, React Native, Swift, and Kotlin is in progress.

## Next steps

<CardGroup cols={3}>
  <Card title="Realtime Agent WebSocket API" icon="plug" href="/atoms/api-reference/api-reference/realtime-agent/realtime-agent">
    Wire protocol: message types, payload shapes, error codes.
  </Card>

  <Card title="Post-Call Analytics" icon="chart-bar" href="/atoms/developer-guide/operate/analytics/post-call-analytics">
    Disposition metrics and full transcripts after each call.
  </Card>

  <Card title="Live Transcripts (SSE)" icon="rss" href="/atoms/developer-guide/operate/analytics/sse-for-live-transcripts">
    Stream user and agent utterances mid-call.
  </Card>
</CardGroup>