React Native

React Native integrates with the Atoms agent over the raw WebSocket protocol. The runtime’s built-in WebSocket global handles transport, and a single audio library handles PCM16 capture and scheduled playback.

The browser WebSocket SDK cannot be used here. It calls navigator.mediaDevices.getUserMedia and the Web Audio API, both of which are DOM APIs and unavailable in the React Native JavaScript runtime.

The wire protocol is identical across runtimes. Client state machine, event types, and PCM16 payload encoding all match what the browser SDK does internally.

For a full working app, see Hearthside in the cookbook — a React Native (Expo) reference client built on this exact stack. It ships with a mute toggle, transport chunk counter, in-app settings sheet (voice / speed / language) wired to the draft → publish → activate REST flow, and the correct iOS audio session for full-volume speaker playback with hardware echo cancellation.

The quickstart is validated end-to-end on the iOS simulator with an Expo dev build: WebSocket connects, mic captures PCM, agent audio plays back. On the simulator, speaker output loops back into the Mac microphone, so the server’s VAD fires interruption events continuously; test on a real device (earphones or an HFP Bluetooth headset) to confirm clean barge-in behavior.

When to use React Native

  • Your existing app is React Native and you want to embed an in-app voice agent without bringing in WebRTC.
  • You are building a cross-platform mobile client and want the JavaScript-side logic to look similar to the browser SDK.
  • You do not need character-level TTS alignment timings (the raw protocol does not emit them).

For iOS-only apps with strict binary-size or battery budgets, prefer the iOS (Swift) native path. For Flutter, see the Flutter guide.

Dependencies

One audio library handles both capture and playback. react-native-audio-api ships an AudioRecorder for PCM frames and an AudioContext for scheduled playback, both backed by the same native session.

```bash
npx expo install react-native-audio-api react-native-permissions buffer
```
| Package | Role | Why this one |
| --- | --- | --- |
| React Native built-in `WebSocket` | Transport | Part of the RN runtime. No dependency. Works identically on iOS and Android. |
| `react-native-audio-api` | Microphone capture + PCM playback | A Web Audio API port for React Native (by Software Mansion). `AudioRecorder` delivers Float32 PCM frames; `AudioContext.createBuffer()` + `createBufferSource()` schedules agent audio back-to-back for gapless playback. A single library keeps the iOS audio session coherent. |
| `react-native-permissions` | Runtime microphone permission | Required on both iOS and Android. Single API across platforms. |
| `buffer` | Node `Buffer` polyfill | React Native does not ship `Buffer`. You need it for base64-encoding the PCM bytes. |

iOS setup (react-native-permissions)

react-native-permissions requires an explicit handler pod in the iOS Podfile. Add this near the top of ios/Podfile:

```ruby
require_relative '../node_modules/react-native-permissions/scripts/setup'
setup_permissions(['Microphone'])
```

Then run pod install in ios/. Without this, the library crashes at runtime with “No permission handler detected.”

Alternatives considered

  • expo-av. High-level recording and playback. Does not expose raw PCM frames at a fixed sample rate, so it is not suitable for realtime voice streaming.
  • Custom Expo native module. Wrap AVAudioEngine (iOS) and AudioRecord/AudioTrack (Android) in a minimal Expo module. Recommended only when binary-size or dependency-count constraints rule out react-native-audio-api.

Permissions

iOS

Add to ios/<AppName>/Info.plist:

```xml
<key>NSMicrophoneUsageDescription</key>
<string>We need the microphone to let you talk to the voice agent.</string>
```

Android

Add to android/app/src/main/AndroidManifest.xml:

```xml
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
```

Request at runtime

```typescript
import { PERMISSIONS, request, RESULTS } from "react-native-permissions";
import { Platform } from "react-native";

async function ensureMicPermission(): Promise<boolean> {
  const perm = Platform.OS === "ios"
    ? PERMISSIONS.IOS.MICROPHONE
    : PERMISSIONS.ANDROID.RECORD_AUDIO;
  const result = await request(perm);
  return result === RESULTS.GRANTED;
}
```

Quickstart

A single-file App.tsx covering permission, WebSocket, mic capture, and scheduled PCM playback. Drop it into an Expo dev build (not Expo Go; these packages ship native code).

```tsx
import { useEffect, useRef, useState } from 'react';
import { View, Text, Button, StyleSheet, Platform } from 'react-native';
import {
  AudioContext,
  AudioManager,
  AudioRecorder,
} from 'react-native-audio-api';
import { PERMISSIONS, request, RESULTS } from 'react-native-permissions';
import { Buffer } from 'buffer';

const API_KEY = 'sk_...';
const AGENT_ID = '...';
const SAMPLE_RATE = 24000;
const CHUNK_FRAMES = 480; // 20 ms at 24 kHz: small enough for low latency

async function ensureMicPermission(): Promise<boolean> {
  const perm = Platform.OS === 'ios'
    ? PERMISSIONS.IOS.MICROPHONE
    : PERMISSIONS.ANDROID.RECORD_AUDIO;
  return (await request(perm)) === RESULTS.GRANTED;
}

function float32ToInt16LE(float32: Float32Array): Uint8Array {
  const out = new Uint8Array(float32.length * 2);
  const view = new DataView(out.buffer);
  for (let i = 0; i < float32.length; i++) {
    const s = Math.max(-1, Math.min(1, float32[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return out;
}

export default function App() {
  const wsRef = useRef<WebSocket | null>(null);
  const recorderRef = useRef<AudioRecorder | null>(null);
  const audioCtxRef = useRef<AudioContext | null>(null);
  const nextPlayRef = useRef<number>(0);
  const [status, setStatus] = useState<'idle' | 'connecting' | 'connected' | 'error'>('idle');

  async function start() {
    if (!(await ensureMicPermission())) { setStatus('error'); return; }

    AudioManager.setAudioSessionOptions({
      iosCategory: 'playAndRecord',
      iosMode: 'voiceChat',
      iosOptions: ['allowBluetoothHFP', 'defaultToSpeaker'],
    });
    await AudioManager.setAudioSessionActivity(true);

    const url =
      'wss://api.smallest.ai/atoms/v1/agent/connect' +
      `?token=${encodeURIComponent(API_KEY)}` +
      `&agent_id=${encodeURIComponent(AGENT_ID)}` +
      `&mode=webcall&sample_rate=${SAMPLE_RATE}`;

    setStatus('connecting');
    const ws = new WebSocket(url);
    wsRef.current = ws;

    ws.onopen = () => startMic(ws);
    ws.onmessage = (e) => handleServerEvent(e.data as string);
    ws.onerror = () => setStatus('error');
    ws.onclose = () => {
      recorderRef.current?.stop();
      recorderRef.current = null;
      setStatus('idle');
    };

    const ctx = new AudioContext({ sampleRate: SAMPLE_RATE });
    audioCtxRef.current = ctx;
    nextPlayRef.current = ctx.currentTime;
  }

  function stop() { wsRef.current?.close(1000, 'client end'); }
  useEffect(() => () => { wsRef.current?.close(); }, []);

  // ---- mic capture ------------------------------------------------
  function startMic(ws: WebSocket) {
    const recorder = new AudioRecorder();
    recorderRef.current = recorder;

    recorder.onAudioReady(
      { sampleRate: SAMPLE_RATE, bufferLength: CHUNK_FRAMES, channelCount: 1 },
      ({ buffer }) => {
        if (ws.readyState !== WebSocket.OPEN) return;
        const float32 = buffer.getChannelData(0);
        const int16 = float32ToInt16LE(float32);
        ws.send(JSON.stringify({
          type: 'input_audio_buffer.append',
          audio: Buffer.from(int16).toString('base64'),
        }));
      },
    );
    recorder.onError((err) => console.error('mic error:', err.message));
    recorder.start();
  }

  // ---- server events ----------------------------------------------
  function handleServerEvent(raw: string) {
    const ev = JSON.parse(raw);
    switch (ev.type) {
      case 'session.created': setStatus('connected'); break;
      case 'output_audio.delta': playPcm16(Buffer.from(ev.audio, 'base64')); break;
      case 'agent_start_talking': /* UI: show "speaking" */ break;
      case 'agent_stop_talking': /* UI: hide "speaking" */ break;
      case 'interruption': flushPlayback(); break;
      case 'session.closed': setStatus('idle'); break;
      case 'error': console.error(`[${ev.code}] ${ev.message}`); break;
    }
  }

  // ---- playback ---------------------------------------------------
  function playPcm16(bytes: Buffer) {
    const ctx = audioCtxRef.current;
    if (!ctx) return;
    const sampleCount = Math.floor(bytes.length / 2);
    const buffer = ctx.createBuffer(1, sampleCount, SAMPLE_RATE);
    const channel = buffer.getChannelData(0);
    for (let i = 0; i < sampleCount; i++) {
      channel[i] = bytes.readInt16LE(i * 2) / 32768;
    }
    const source = ctx.createBufferSource();
    source.buffer = buffer;
    source.connect(ctx.destination);
    const startAt = Math.max(nextPlayRef.current, ctx.currentTime);
    source.start(startAt);
    nextPlayRef.current = startAt + buffer.duration;
  }

  function flushPlayback() {
    if (audioCtxRef.current) nextPlayRef.current = audioCtxRef.current.currentTime;
  }

  return (
    <View style={styles.container}>
      <Text style={styles.title}>Atoms voice agent</Text>
      <Text>Status: {status}</Text>
      {status === 'idle' && <Button title="Start call" onPress={start} />}
      {(status === 'connecting' || status === 'connected') && <Button title="End call" onPress={stop} />}
    </View>
  );
}

const styles = StyleSheet.create({
  container: { flex: 1, padding: 20, paddingTop: 80 },
  title: { fontSize: 20, fontWeight: '600', marginBottom: 20 },
});
```

AudioRecorder delivers Float32 PCM frames through onAudioReady. The Atoms wire protocol expects Int16 little-endian PCM, so convert each frame with float32ToInt16LE before base64-encoding and sending.

AudioContext from react-native-audio-api implements the Web Audio API. createBuffer + createBufferSource schedules PCM buffers back-to-back with accurate timing. The running next-play-time pointer (nextPlayRef in the quickstart) is the standard trick for gapless streaming playback.

AudioManager.setAudioSessionOptions configures iOS AVAudioSession with playAndRecord + voiceChat, which turns on the system’s AEC pipeline. On Android the recorder selects VOICE_COMMUNICATION internally, which enables the platform’s AEC + NS.

Server events (full list)

The quickstart’s switch handles everything, but here are the seven event types with their meanings for reference:

| Event | What to do |
| --- | --- |
| `session.created` | Connection accepted; you can start streaming mic audio. Contains `session_id` and `call_id`. |
| `output_audio.delta` | Decode base64, schedule on the `AudioContext`. |
| `agent_start_talking` | The agent’s TTS turn is starting. Show a speaking indicator; optionally mute the mic to cut self-feedback. |
| `agent_stop_talking` | The agent’s TTS turn is done. Unmute if you muted. |
| `session.closed` | Session ended. `reason` tells you why (`client_requested`, `websocket_closed`, or a server tag). |
| `interruption` | User barged in during the agent turn. Drop the playback queue and wait for a new `agent_start_talking`. |
| `error` | Server-side error during the session. Non-fatal errors keep the socket open. |
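
The raw protocol is untyped JSON, so the quickstart’s switch receives plain parsed objects. If you want it type-checked, a discriminated union over these seven events works well. This is a sketch derived from the table above, not an official SDK type; any field not listed in the table is an assumption:

```typescript
// Sketch: event typings derived from the event table, not an official SDK type.
type ServerEvent =
  | { type: 'session.created'; session_id: string; call_id: string }
  | { type: 'output_audio.delta'; audio: string } // base64-encoded PCM16
  | { type: 'agent_start_talking' }
  | { type: 'agent_stop_talking' }
  | { type: 'session.closed'; reason: string }
  | { type: 'interruption' }
  | { type: 'error'; code: string; message: string };

function parseServerEvent(raw: string): ServerEvent {
  // The server sends one JSON object per WebSocket text frame.
  return JSON.parse(raw) as ServerEvent;
}
```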

Full session handling

Turn lifecycle

A single conversational turn has this sequence on the wire:

  1. Client keeps streaming input_audio_buffer.append from the mic.
  2. Server detects end of user utterance, runs STT → LLM → TTS.
  3. Server sends agent_start_talking.
  4. Server sends a stream of output_audio.delta chunks.
  5. Server sends agent_stop_talking.
  6. Back to step 1.

The client does not send input_audio_buffer.commit in normal conversational flow. The server’s VAD handles turn boundaries. You only send commit if you implement explicit push-to-talk and want to force an immediate response.
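
If you do implement push-to-talk, the shape is simple: keep streaming input_audio_buffer.append while the button is held, then send a commit on release. A minimal sketch, assuming the commit message carries no payload beyond its type:

```typescript
// Hypothetical push-to-talk release handler; assumes `commit` needs no payload.
function onPushToTalkRelease(ws: WebSocket) {
  if (ws.readyState !== WebSocket.OPEN) return;
  // Force an immediate response instead of waiting for server-side VAD
  // to detect the end of the utterance.
  ws.send(JSON.stringify({ type: 'input_audio_buffer.commit' }));
}
```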

Backpressure on playback

The server emits output_audio.delta faster than realtime when the LLM finishes early. If you play each chunk as it arrives, audio will overlap. The AudioContext.createBufferSource() pattern in the quickstart avoids this by maintaining a nextPlayTime pointer: each new buffer is scheduled at max(nextPlayTime, currentTime), and nextPlayTime advances by the buffer’s duration. The audio engine schedules them back-to-back without overlap.

Interruptions

When the server emits interruption, the user has spoken while the agent was talking. The agent’s remaining TTS output for that turn is invalid; drop it. Call flushPlayback() from the quickstart, or your ring buffer’s equivalent. A new agent_start_talking will follow shortly.
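
Note that the quickstart’s flushPlayback() only rewinds the scheduling pointer; buffers already handed to the audio engine keep playing out their (short) remaining duration. For a harder cut, track the live sources and stop them. A sketch with the quickstart’s imports, assuming react-native-audio-api follows Web Audio semantics for AudioBufferSourceNode.stop() and onended:

```typescript
// Keep a reference to every scheduled source so an interruption can stop them.
const activeSources = new Set<AudioBufferSourceNode>();

function scheduleBuffer(ctx: AudioContext, buffer: AudioBuffer, startAt: number) {
  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.connect(ctx.destination);
  source.onended = () => activeSources.delete(source); // drop finished sources
  activeSources.add(source);
  source.start(startAt);
}

function hardFlush(ctx: AudioContext, nextPlayRef: { current: number }) {
  for (const source of activeSources) {
    try { source.stop(); } catch { /* already finished */ }
  }
  activeSources.clear();
  nextPlayRef.current = ctx.currentTime; // rewind the scheduling pointer
}
```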

Platform gotchas

iOS: AVAudioSession configuration

React Native’s JavaScript runtime does not interact with AVAudioSession directly; the audio library manages it through native code. In this stack, AudioManager.setAudioSessionOptions({ iosCategory: 'playAndRecord', iosMode: 'voiceChat', ... }) configures the session before the recorder starts. If the application also uses other audio libraries (media players, video), coordinate their session categories; two libraries fighting over the same session will cause one to silence the other.

iOS: background voice calls

If the call should continue when the user locks the screen or switches apps, add audio to UIBackgroundModes in Info.plist:

```xml
<key>UIBackgroundModes</key>
<array>
  <string>audio</string>
</array>
```

This is stricter than the typical background fetch; Apple reviews for actual VoIP use. If the app is rejected, you either need to end the call when backgrounded or use voip mode with CallKit integration.

Android: foreground service for long calls

Android kills background mic access aggressively. For calls longer than ~30 seconds in the background, run a foreground service. Reference the Android guide’s foreground services for phone calls page.
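
The service itself is native code (or a library), but the manifest side is small. A sketch of the Android 14+ declarations, with CallService standing in for whatever service class your app actually registers:

```xml
<!-- Hypothetical service name; Android 14+ also requires the typed permission. -->
<uses-permission android:name="android.permission.FOREGROUND_SERVICE" />
<uses-permission android:name="android.permission.FOREGROUND_SERVICE_MICROPHONE" />

<service
    android:name=".CallService"
    android:foregroundServiceType="microphone" />
```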

Incoming phone call interruption

Both platforms fire audio session interruptions when a phone call comes in. The capture library’s data callback stops firing. Listen for your app’s focus-loss event (AppState in React Native) and tear down the WebSocket gracefully:

```typescript
import { AppState } from "react-native";

useEffect(() => {
  const sub = AppState.addEventListener("change", (state) => {
    if (state !== "active" && wsRef.current) {
      wsRef.current.close(1000, "backgrounded");
    }
  });
  return () => sub.remove();
}, []);
```

Echo cancellation

On iOS, AudioManager.setAudioSessionOptions({ iosCategory: 'playAndRecord', iosMode: 'voiceChat', ... }) turns on the system AEC + NS pipeline. On Android, react-native-audio-api selects the VOICE_COMMUNICATION input source internally, which enables the platform’s AEC + NS.

Without AEC, the agent hears its own audio through the mic and the server’s VAD fires continuous interruption events. This is the expected behavior on the iOS simulator (speaker loops back into the Mac mic). On a real device with earphones or an HFP Bluetooth headset the feedback loop is broken. If your target devices have weaker AEC (older Androids, tablets with distant mics), stop the recorder on agent_start_talking and start it again on agent_stop_talking, as sketched below. The user’s speech during the agent’s turn goes undetected, but that is a safer trade-off than audible feedback.
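
A minimal half-duplex sketch of that fallback, reusing the quickstart’s recorderRef and the two talk events:

```typescript
// Half-duplex fallback for weak-AEC devices: pause capture while the agent
// speaks, resume when its turn ends. User barge-in goes undetected by design.
function handleTalkEvents(ev: { type: string }) {
  switch (ev.type) {
    case 'agent_start_talking':
      recorderRef.current?.stop();  // mic frames stop flowing to the server
      break;
    case 'agent_stop_talking':
      recorderRef.current?.start(); // resume capture for the user's turn
      break;
    // ...handle the remaining events as in the quickstart
  }
}
```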

Bluetooth audio routing

If the user connects a Bluetooth headset mid-call, the audio session reroutes automatically on both platforms. Test this flow: some devices buffer poorly and introduce perceptible lag for a few seconds after the route change.

Production hardening

Reconnect on transient network loss

ws.onclose with a non-1000 code is a network drop. Reconnect with exponential backoff up to 30 s. Do not retry on 1000 (clean close) or 4401 / 4403 (auth failure):

```typescript
let reconnectMs = 500;

ws.onclose = (e) => {
  if (e.code === 1000 || e.code === 4401 || e.code === 4403) return;
  setTimeout(start, Math.min(reconnectMs *= 2, 30000));
};
// Reset reconnectMs back to 500 in ws.onopen once a reconnect succeeds,
// so the next drop starts from a short delay again.
```

App lifecycle

Tear down the WebSocket and audio pipeline when AppState changes to background or inactive. Do not try to keep the call alive across a full suspension: iOS will kill the socket anyway, and the user hears only a confusing silence.

Battery

Realtime audio + open WebSocket consumes roughly 3–5 % battery per minute on modern phones. For support-use calls (short, bounded duration), this is fine. For background companion apps, design for short sessions.

Error events

Subscribe to the server’s error event and surface non-transient errors to the user:

1case "error":
2 console.error(`[${ev.code}] ${ev.message}`);
3 if (ev.code === "401" || ev.code === "403") {
4 // auth failure: user-visible, show "please sign in again"
5 }
6 break;

Next steps