---
title: WebSocket SDK
sidebarTitle: WebSocket SDK
description: JavaScript SDK for connecting to a Smallest agent from a web page. Handles microphone capture, audio playback, and event streaming over WebSocket.
---


A JavaScript library that opens a live voice conversation with a Smallest agent from a web page. Published on npm as [`@smallest-ai/agent-sdk`](https://www.npmjs.com/package/@smallest-ai/agent-sdk). Wraps the [Realtime Agent WebSocket API](/atoms/api-reference/api-reference/realtime-agent/realtime-agent), plus microphone capture, PCM playback, and an event API.

## When to use it

* Voice widget or "click to talk" button on a marketing or product site.
* In-app voice input in a web dashboard.
* Agent demo, playground, or sandbox pages.
* Any browser experience where the user speaks to a Smallest agent and hears a reply.

## When to use something else

| Runtime                                   | Use                                                                                                                                                                                                                                       |
| ----------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| React Native, iOS, Android, Flutter       | Talk to the [Realtime Agent WebSocket API](/atoms/api-reference/api-reference/realtime-agent/realtime-agent) directly. The SDK calls `navigator.mediaDevices.getUserMedia` and `AudioContext`, neither of which exists in those runtimes. |
| Python, Go, Node, any server-side process | Same. Use the raw WebSocket API with the platform's native WebSocket and audio libraries.                                                                                                                                                 |
| Outbound phone calls                      | Use [Campaigns](/atoms/developer-guide/build/campaigns/creating-campaigns).                                                                                                                                                               |

<Info>
  You need an API key (from [app.smallest.ai/dashboard/api-keys](https://app.smallest.ai/dashboard/api-keys)) and an agent ID. Create an agent from the [Agents dashboard](https://app.smallest.ai/dashboard/agents) or follow the [Developer Quickstart](/atoms/developer-guide/get-started/quickstart).
</Info>

## Install

<CodeBlocks>
  ```bash npm
  npm install @smallest-ai/agent-sdk
  ```

  ```bash pnpm
  pnpm add @smallest-ai/agent-sdk
  ```

  ```bash yarn
  yarn add @smallest-ai/agent-sdk
  ```

  ```html CDN
  <script src="https://cdn.jsdelivr.net/npm/@smallest-ai/agent-sdk/dist/agent-sdk.browser.js"></script>
  <!-- Loaded via CDN, the SDK is exposed on window.AtomsSdk -->
  ```
</CodeBlocks>

## Quickstart

```typescript
import { AtomsAgent } from "@smallest-ai/agent-sdk";

const agent = new AtomsAgent({
  apiKey:  "sk_...",   // Smallest API key
  agentId: "...",      // Agent ID
});

agent.on("session_started", (e) => {
  console.log("Session started:", e.session_id, e.call_id);
});
agent.on("agent_start_talking", () => console.log("Agent speaking"));
agent.on("agent_stop_talking",  () => console.log("Agent stopped"));
agent.on("error",               (e) => console.error(`[${e.code}] ${e.message}`));

await agent.connect();
// Microphone capture starts automatically. Speak, and the agent replies
// through the default audio output. Call agent.disconnect() when done.
```

The CDN equivalent uses the `AtomsSdk` global:

```html
<script src="https://cdn.jsdelivr.net/npm/@smallest-ai/agent-sdk/dist/agent-sdk.browser.js"></script>
<script>
  const agent = new AtomsSdk.AtomsAgent({
    apiKey:  "sk_...",
    agentId: "...",
  });
  agent.on("session_started", (e) => console.log("connected:", e.session_id));
  agent.connect();
</script>
```

<Warning>
  `connect()` calls `navigator.mediaDevices.getUserMedia()`. Browsers only expose the microphone on **secure contexts**: HTTPS in production, or `http://localhost` / `http://127.0.0.1` in development. Serving the page from a non-loopback HTTP origin throws a permission error.
</Warning>

## Configuration

```typescript
interface AtomsAgentConfig {
  apiKey:          string;    // Required. Smallest API key.
  agentId:         string;    // Required. Agent to connect to.
  baseUrl?:        string;    // Default: wss://api.smallest.ai. Override for local dev.
  sampleRate?:     number;    // Default: 24000. Audio sample rate in Hz.
  autoCaptureMic?: boolean;   // Default: true. If false, no microphone is captured and mute()/unmute() are no-ops. See the Push-to-talk pattern below for the correct way to start muted.
}
```
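To make the resolution order explicit, here is a small sketch of how the defaults above combine with caller-supplied values. `withDefaults` is a hypothetical helper, not part of the SDK; the interface is repeated only so the snippet is self-contained:

```typescript
// Hypothetical helper -- NOT part of the SDK. It only mirrors the
// documented defaults so the resolution order is explicit.
interface AtomsAgentConfig {
  apiKey: string;
  agentId: string;
  baseUrl?: string;
  sampleRate?: number;
  autoCaptureMic?: boolean;
}

function withDefaults(config: AtomsAgentConfig): Required<AtomsAgentConfig> {
  return {
    apiKey: config.apiKey,
    agentId: config.agentId,
    baseUrl: config.baseUrl ?? "wss://api.smallest.ai",   // documented default
    sampleRate: config.sampleRate ?? 24000,               // documented default
    autoCaptureMic: config.autoCaptureMic ?? true,        // documented default
  };
}
```

Passing only `apiKey` and `agentId` resolves to `wss://api.smallest.ai`, 24000 Hz, and mic capture on; any field you set explicitly wins.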

## Methods

| Method                 | Returns         | Description                                                                                                    |
| ---------------------- | --------------- | -------------------------------------------------------------------------------------------------------------- |
| `connect()`            | `Promise<void>` | Open the WebSocket and start the session. If `autoCaptureMic` is `true`, microphone capture starts on connect. |
| `disconnect()`         | `void`          | End the session and release the microphone and audio output.                                                   |
| `sendText(text)`       | `void`          | Send a text message to the agent. The reply returns as audio via `output_audio.delta`.                         |
| `mute()`               | `void`          | Stop sending microphone audio. The WebSocket stays open.                                                       |
| `unmute()`             | `void`          | Resume sending microphone audio.                                                                               |
| `isConnected` (getter) | `boolean`       | `true` between `connect()` resolving and the session ending.                                                   |
| `isMuted` (getter)     | `boolean`       | `true` while muted.                                                                                            |

## Events

Subscribe with `agent.on(eventName, handler)`.

| Event                 | Payload                                     | Fires when                                                                                                                                                                                                                                                                                                                                                   |
| --------------------- | ------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ |
| `session_started`     | `{ session_id: string, call_id: string }`   | The server accepts the connection and creates a session. Use `call_id` to look up the call log or subscribe to [live transcripts via SSE](/atoms/developer-guide/operate/analytics/sse-for-live-transcripts).                                                                                                                                                |
| `session_ended`       | `{ reason: string }`                        | The session has ended. `reason` is either a string sent by the server (`client_requested`, `websocket_closed`, or an error tag) or, when the WebSocket closes without a server-side `session.closed` frame, the numeric close code as a string (`"1005"`, `"1006"`, `"1000"`, etc.).                                                                         |
| `agent_start_talking` | None                                        | The server begins streaming agent TTS for a turn. Useful for "agent speaking" UI state.                                                                                                                                                                                                                                                                      |
| `agent_stop_talking`  | None                                        | The server has finished the current turn.                                                                                                                                                                                                                                                                                                                    |
| `error`               | `{ code: string, message: string }`         | A non-fatal session error occurred. Fatal errors also trigger `session_ended`.                                                                                                                                                                                                                                                                               |
| `transcript`          | `{ role: "user" \| "agent", text: string }` | Reserved. The SDK wires this event up, but the current agent pipeline does not stream transcripts over this WebSocket. Use the [post-call conversation API](/atoms/api-reference/api-reference/logs/get-conversation-log-by-id) or the [SSE live-transcripts stream](/atoms/developer-guide/operate/analytics/sse-for-live-transcripts) for transcript data. |
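Because `session_ended` carries either a server-sent tag or a stringified numeric close code, it can help to classify the two forms before reacting. A minimal sketch, assuming only the reason formats documented above (`classifyEndReason` is a hypothetical helper, not an SDK export):

```typescript
// Hypothetical helper -- classifies the `reason` string from `session_ended`.
// Per the table above, `reason` is either a server-sent tag (e.g.
// "client_requested") or a numeric WebSocket close code as a string.
type EndKind = "graceful" | "abnormal" | "server_tag";

function classifyEndReason(reason: string): EndKind {
  if (/^\d+$/.test(reason)) {
    // Numeric close code. 1000 is a normal closure; 1005 only means the
    // close frame carried no status, so it is treated as clean here.
    // Anything else (e.g. 1006) is an abnormal closure.
    return reason === "1000" || reason === "1005" ? "graceful" : "abnormal";
  }
  // Server-sent tag: "client_requested" is a deliberate end; other tags
  // are surfaced as-is so callers can decide how to react.
  return reason === "client_requested" ? "graceful" : "server_tag";
}
```

Wire it into a `session_ended` handler to decide, for example, whether to show an error banner or end quietly.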

## Patterns

### Push-to-talk

Connect normally, mute immediately, then toggle on button events.

```typescript
const agent = new AtomsAgent({
  apiKey:  "sk_...",
  agentId: "...",
});

await agent.connect();
agent.mute();  // silence the mic right after connect

talkButton.addEventListener("pointerdown", () => agent.unmute());
talkButton.addEventListener("pointerup",   () => agent.mute());
```

<Warning>
  Do not pass `autoCaptureMic: false` for push-to-talk. That mode skips microphone setup entirely and there is no public method to start the mic after `connect()`, so `mute()` and `unmute()` silently do nothing and `isMuted` always returns `false`. Use the default (`autoCaptureMic: true`) and call `mute()` right after `connect()` instead.
</Warning>

### Agent-speaking indicator

Reflect agent turn state in the UI.

```typescript
agent.on("agent_start_talking", () => setSpeaking(true));
agent.on("agent_stop_talking",  () => setSpeaking(false));
```

### Clean teardown on page unload

```typescript
window.addEventListener("beforeunload", () => {
  if (agent.isConnected) agent.disconnect();
});
```

### Text input

Send a text message instead of speech. The reply still returns as audio.

```typescript
await agent.connect();
sendButton.addEventListener("click", () => {
  agent.sendText(inputEl.value);
});
```
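Since `sendText` is fire-and-forget, it is worth guarding against empty submissions on the client. A small sketch (`prepareText` is a hypothetical helper, not part of the SDK):

```typescript
// Hypothetical guard -- trims the raw input and returns null for
// whitespace-only strings so sendText() is never called with empty text.
function prepareText(raw: string): string | null {
  const text = raw.trim();
  return text.length > 0 ? text : null;
}
```

In the click handler above, call `prepareText(inputEl.value)` first and only invoke `agent.sendText(text)` when the result is non-null.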

## Error handling

Errors arrive through three different paths. Handle each.

**Handshake failure** (bad API key, wrong agent ID, network issue, denied microphone permission) rejects the `connect()` promise with a generic `Error("WebSocket connection failed")`. The `error` event does not fire for these.

```typescript
try {
  await agent.connect();
} catch (err) {
  console.error("connect failed:", err.message);
  // Likely causes: invalid API key, unknown agent ID, microphone denied, network blocked.
}
```

**Mid-session server error** arrives as an `error` event with `{ code, message }`. These are sent by the server during an active session, not during the handshake.

```typescript
agent.on("error", (e) => {
  console.error(`[${e.code}] ${e.message}`);
});
```

**Session termination** fires `session_ended` with a `reason` string. Handle this to detect both graceful ends and abnormal closes.

```typescript
agent.on("session_ended", (e) => {
  // e.reason is a server string (e.g. "client_requested") or a numeric
  // WebSocket close code as a string (e.g. "1006" for abnormal closure).
  console.log("session ended:", e.reason);
});
```
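For long-lived widgets you may want to reconnect after an abnormal close (a numeric close code other than `1000` or `1005`) while leaving graceful ends alone. The sketch below keeps the delay schedule as a pure function; the event wiring in the comment assumes an `agent` instance from the Quickstart and is an illustration, not a prescribed pattern:

```typescript
// Hypothetical backoff schedule -- capped exponential delay per retry.
// attempt 0 -> 500 ms, 1 -> 1000 ms, 2 -> 2000 ms, ... capped at 10 s.
function backoffDelayMs(attempt: number, baseMs = 500, capMs = 10_000): number {
  return Math.min(capMs, baseMs * 2 ** attempt);
}

// Illustrative wiring (assumes an `agent` from the Quickstart above):
//
//   let attempt = 0;
//   agent.on("session_ended", (e) => {
//     const abnormal =
//       /^\d+$/.test(e.reason) && e.reason !== "1000" && e.reason !== "1005";
//     if (!abnormal) return;                       // graceful end: stop here
//     setTimeout(() => {
//       agent.connect().then(() => (attempt = 0)); // reset on success
//     }, backoffDelayMs(attempt++));
//   });
```

Capping the delay prevents a flapping network from pushing retries out indefinitely, while the reset on a successful `connect()` keeps later outages starting from the short end of the schedule.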

## Smoke-test the integration

A minimal standalone test to confirm the SDK reaches the agent and produces audio. No build step required.

```html
<!doctype html>
<meta charset="utf-8">
<title>Atoms WebSocket SDK smoke test</title>
<pre id="log"></pre>
<script src="https://cdn.jsdelivr.net/npm/@smallest-ai/agent-sdk/dist/agent-sdk.browser.js"></script>
<script>
  const log = (...args) => {
    document.getElementById('log').textContent +=
      args.map(a => typeof a === 'object' ? JSON.stringify(a) : String(a)).join(' ') + '\n';
  };
  const params = new URLSearchParams(location.search);
  const agent  = new AtomsSdk.AtomsAgent({
    apiKey:  params.get('key'),
    agentId: params.get('agent'),
  });
  agent.on("session_started",    (e) => log("session_started", e));
  agent.on("agent_start_talking",()  => log("agent_start_talking"));
  agent.on("agent_stop_talking", ()  => log("agent_stop_talking"));
  agent.on("error",              (e) => log("error", e));
  agent.connect().then(() => log("connected"));
</script>
```

Serve it from `localhost` and open with the key and agent ID in the query string:

```bash
python3 -m http.server 8765
# http://localhost:8765/smoke.html?key=sk_...&agent=...
```

Expected console output, in order: `session_started` with `session_id` and `call_id`, then `agent_start_talking`, then `agent_stop_talking` after the agent's first turn completes. If the agent has a greeting, it plays through the default audio output.

## Limitations

| Limitation           | Detail                                                                                                                        |
| -------------------- | ----------------------------------------------------------------------------------------------------------------------------- |
| Browser runtime only | The SDK uses `navigator.mediaDevices.getUserMedia` and `AudioContext`. No React Native, Node, or Cordova build.               |
| Transport            | WebSocket only (no WebRTC).                                                                                                   |
| TTS alignment        | No per-character timing on `output_audio.delta`. Client-side caption sync is not supported.                                   |
| Live transcripts     | `transcript` events are not emitted on this WebSocket. Use the post-call conversation API or the SSE live-transcripts stream. |

For non-browser runtimes, connect to the [Realtime Agent WebSocket API](/atoms/api-reference/api-reference/realtime-agent/realtime-agent) directly. A raw-protocol guide with tested reference clients for Python, React Native, Swift, and Kotlin is in progress.

## Next steps

<CardGroup cols={3}>
  <Card title="Realtime Agent WebSocket API" icon="plug" href="/atoms/api-reference/api-reference/realtime-agent/realtime-agent">
    Wire protocol: message types, payload shapes, error codes.
  </Card>

  <Card title="Post-Call Analytics" icon="chart-bar" href="/atoms/developer-guide/operate/analytics/post-call-analytics">
    Disposition metrics and full transcripts after each call.
  </Card>

  <Card title="Live Transcripts (SSE)" icon="rss" href="/atoms/developer-guide/operate/analytics/sse-for-live-transcripts">
    Stream user and agent utterances mid-call.
  </Card>
</CardGroup>