React Native

React Native integrates with the Atoms agent over the raw WebSocket protocol. The runtime’s built-in WebSocket global handles transport, and a single audio library handles PCM16 capture and scheduled playback.

The browser WebSocket SDK cannot be used here. It calls navigator.mediaDevices.getUserMedia and the Web Audio API, both of which are DOM APIs and unavailable in the React Native JavaScript runtime.

The wire protocol is identical across runtimes. Client state machine, event types, and PCM16 payload encoding all match what the browser SDK does internally.

For a full working app, see Hearthside in the cookbook — a React Native (Expo) reference client built on this exact stack. It ships with a mute toggle, transport chunk counter, in-app settings sheet (voice / speed / language) wired to the draft → publish → activate REST flow, and the correct iOS audio session for full-volume speaker playback with hardware echo cancellation.

The quickstart is validated end-to-end on the iOS simulator with an Expo dev build: WebSocket connects, mic captures PCM, agent audio plays back. On the simulator, speaker output loops back into the Mac microphone, so the server’s VAD fires interruption events continuously; test on a real device (earphones or an HFP Bluetooth headset) to confirm clean barge-in behavior.

When to use React Native

  • Your existing app is React Native and you want to embed an in-app voice agent without bringing in WebRTC.
  • You are building a cross-platform mobile client and want the JavaScript-side logic to look similar to the browser SDK.
  • You do not need character-level TTS alignment timings (the raw protocol does not emit them).

For iOS-only apps with strict binary-size or battery budgets, prefer the iOS (Swift) native path. For Flutter, see the Flutter guide.

Dependencies

One audio library handles both capture and playback. react-native-audio-api ships an AudioRecorder for PCM frames and an AudioContext for scheduled playback, both backed by the same native session.

```bash
npx expo install react-native-audio-api react-native-permissions buffer
```
| Package | Role | Why this one |
| --- | --- | --- |
| React Native built-in `WebSocket` | Transport | Part of the RN runtime. No dependency. Works identically on iOS and Android. |
| `react-native-audio-api` | Microphone capture + PCM playback | A Web Audio API port for React Native (by Software Mansion). `AudioRecorder` delivers Float32 PCM frames; `AudioContext.createBuffer()` + `createBufferSource()` schedules agent audio back-to-back for gapless playback. A single library keeps the iOS audio session coherent. |
| `react-native-permissions` | Runtime microphone permission | Required on both iOS and Android. Single API across platforms. |
| `buffer` | Node `Buffer` polyfill | React Native does not ship `Buffer`. You need it for base64-encoding the PCM bytes. |

iOS setup (react-native-permissions)

react-native-permissions requires an explicit handler pod in the iOS Podfile. Add this near the top of ios/Podfile:

```ruby
require_relative '../node_modules/react-native-permissions/scripts/setup'
setup_permissions(['Microphone'])
```

Then run pod install in ios/. Without this, the library crashes at runtime with “No permission handler detected.”

Alternatives considered

  • expo-av. High-level recording and playback. Does not expose raw PCM frames at a fixed sample rate, so it is not suitable for realtime voice streaming.
  • Custom Expo native module. Wrap AVAudioEngine (iOS) and AudioRecord/AudioTrack (Android) in a minimal Expo module. Recommended only when binary-size or dependency-count constraints rule out react-native-audio-api.

Permissions

iOS

Add to ios/<AppName>/Info.plist:

```xml
<key>NSMicrophoneUsageDescription</key>
<string>We need the microphone to let you talk to the voice agent.</string>
```

Android

Add to android/app/src/main/AndroidManifest.xml:

```xml
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
```

Request at runtime

```typescript
import { PERMISSIONS, request, RESULTS } from "react-native-permissions";
import { Platform } from "react-native";

async function ensureMicPermission(): Promise<boolean> {
  const perm = Platform.OS === "ios"
    ? PERMISSIONS.IOS.MICROPHONE
    : PERMISSIONS.ANDROID.RECORD_AUDIO;
  const result = await request(perm);
  return result === RESULTS.GRANTED;
}
```

Quickstart

A single-file App.tsx covering permission, WebSocket, mic capture, and scheduled PCM playback. Drop it into an Expo dev build (not Expo Go; these packages ship native code).

```tsx
import { useEffect, useRef, useState } from 'react';
import { View, Text, Button, StyleSheet, Platform } from 'react-native';
import {
  AudioContext,
  AudioManager,
  AudioRecorder,
} from 'react-native-audio-api';
import { PERMISSIONS, request, RESULTS } from 'react-native-permissions';
import { Buffer } from 'buffer';

const API_KEY = 'sk_...';
const AGENT_ID = '...';
const SAMPLE_RATE = 24000;
const CHUNK_FRAMES = 480; // 20 ms at 24 kHz: small enough for low latency

async function ensureMicPermission(): Promise<boolean> {
  const perm = Platform.OS === 'ios'
    ? PERMISSIONS.IOS.MICROPHONE
    : PERMISSIONS.ANDROID.RECORD_AUDIO;
  return (await request(perm)) === RESULTS.GRANTED;
}

function float32ToInt16LE(float32: Float32Array): Uint8Array {
  const out = new Uint8Array(float32.length * 2);
  const view = new DataView(out.buffer);
  for (let i = 0; i < float32.length; i++) {
    const s = Math.max(-1, Math.min(1, float32[i]));
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return out;
}

export default function App() {
  const wsRef = useRef<WebSocket | null>(null);
  const recorderRef = useRef<AudioRecorder | null>(null);
  const audioCtxRef = useRef<AudioContext | null>(null);
  const nextPlayRef = useRef<number>(0);
  const [status, setStatus] = useState<'idle' | 'connecting' | 'connected' | 'error'>('idle');

  async function start() {
    if (!(await ensureMicPermission())) { setStatus('error'); return; }

    AudioManager.setAudioSessionOptions({
      iosCategory: 'playAndRecord',
      iosMode: 'voiceChat',
      iosOptions: ['allowBluetoothHFP', 'defaultToSpeaker'],
    });
    await AudioManager.setAudioSessionActivity(true);

    const url =
      'wss://api.smallest.ai/atoms/v1/agent/connect' +
      `?token=${encodeURIComponent(API_KEY)}` +
      `&agent_id=${encodeURIComponent(AGENT_ID)}` +
      `&mode=webcall&sample_rate=${SAMPLE_RATE}`;

    setStatus('connecting');
    const ws = new WebSocket(url);
    wsRef.current = ws;

    ws.onopen = () => startMic(ws);
    ws.onmessage = (e) => handleServerEvent(e.data as string);
    ws.onerror = () => setStatus('error');
    ws.onclose = () => {
      recorderRef.current?.stop();
      recorderRef.current = null;
      setStatus('idle');
    };

    const ctx = new AudioContext({ sampleRate: SAMPLE_RATE });
    audioCtxRef.current = ctx;
    nextPlayRef.current = ctx.currentTime;
  }

  function stop() { wsRef.current?.close(1000, 'client end'); }
  useEffect(() => () => { wsRef.current?.close(); }, []);

  // ---- mic capture ------------------------------------------------
  function startMic(ws: WebSocket) {
    const recorder = new AudioRecorder();
    recorderRef.current = recorder;

    recorder.onAudioReady(
      { sampleRate: SAMPLE_RATE, bufferLength: CHUNK_FRAMES, channelCount: 1 },
      ({ buffer }) => {
        if (ws.readyState !== WebSocket.OPEN) return;
        const float32 = buffer.getChannelData(0);
        const int16 = float32ToInt16LE(float32);
        ws.send(JSON.stringify({
          type: 'input_audio_buffer.append',
          audio: Buffer.from(int16).toString('base64'),
        }));
      },
    );
    recorder.onError((err) => console.error('mic error:', err.message));
    recorder.start();
  }

  // ---- server events ----------------------------------------------
  function handleServerEvent(raw: string) {
    const ev = JSON.parse(raw);
    switch (ev.type) {
      case 'session.created': setStatus('connected'); break;
      case 'output_audio.delta': playPcm16(Buffer.from(ev.audio, 'base64')); break;
      case 'agent_start_talking': /* UI: show "speaking" */ break;
      case 'agent_stop_talking': /* UI: hide "speaking" */ break;
      case 'interruption': flushPlayback(); break;
      case 'session.closed': setStatus('idle'); break;
      case 'error': console.error(`[${ev.code}] ${ev.message}`); break;
    }
  }

  // ---- playback ---------------------------------------------------
  function playPcm16(bytes: Buffer) {
    const ctx = audioCtxRef.current;
    if (!ctx) return;
    const sampleCount = Math.floor(bytes.length / 2);
    const buffer = ctx.createBuffer(1, sampleCount, SAMPLE_RATE);
    const channel = buffer.getChannelData(0);
    for (let i = 0; i < sampleCount; i++) {
      channel[i] = bytes.readInt16LE(i * 2) / 32768;
    }
    const source = ctx.createBufferSource();
    source.buffer = buffer;
    source.connect(ctx.destination);
    const startAt = Math.max(nextPlayRef.current, ctx.currentTime);
    source.start(startAt);
    nextPlayRef.current = startAt + buffer.duration;
  }

  function flushPlayback() {
    if (audioCtxRef.current) nextPlayRef.current = audioCtxRef.current.currentTime;
  }

  return (
    <View style={styles.container}>
      <Text style={styles.title}>Atoms voice agent</Text>
      <Text>Status: {status}</Text>
      {status === 'idle' && <Button title="Start call" onPress={start} />}
      {(status === 'connecting' || status === 'connected') && <Button title="End call" onPress={stop} />}
    </View>
  );
}

const styles = StyleSheet.create({
  container: { flex: 1, padding: 20, paddingTop: 80 },
  title: { fontSize: 20, fontWeight: '600', marginBottom: 20 },
});
```

AudioRecorder delivers Float32 PCM frames through onAudioReady. The Atoms wire protocol expects Int16 little-endian PCM, so convert each frame with float32ToInt16LE before base64-encoding and sending.

AudioContext from react-native-audio-api implements the Web Audio API. createBuffer + createBufferSource schedules PCM buffers back-to-back with accurate timing. The running next-play-time pointer (nextPlayRef in the quickstart) is the standard trick for gapless streaming playback.

AudioManager.setAudioSessionOptions configures iOS AVAudioSession with playAndRecord + voiceChat, which turns on the system’s AEC pipeline. On Android the recorder selects VOICE_COMMUNICATION internally, which enables the platform’s AEC + NS.

Server events (full list)

The quickstart’s switch handles everything, but here are the seven event types with their meanings for reference:

| Event | What to do |
| --- | --- |
| `session.created` | Connection accepted; you can start streaming mic audio. Contains `session_id` and `call_id`. |
| `output_audio.delta` | Decode base64, schedule on the `AudioContext`. |
| `agent_start_talking` | The agent’s TTS turn is starting. Show a speaking indicator; optionally mute the mic to cut self-feedback. |
| `agent_stop_talking` | The agent’s TTS turn is done. Unmute if you muted. |
| `session.closed` | Session ended. `reason` tells you why (`client_requested`, `websocket_closed`, or a server tag). |
| `interruption` | User barged in during the agent turn. Drop the playback queue and wait for a new `agent_start_talking`. |
| `error` | Server-side error during the session. Non-fatal errors keep the socket open. |
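
The raw protocol is untyped JSON, so the quickstart’s switch receives plain parsed objects. If you want it type-checked, a discriminated union over these seven events works well. This is a sketch derived from the table above, not an official SDK type; any field not listed in the table is an assumption:

```typescript
// Sketch: event typings derived from the event table, not an official SDK type.
type ServerEvent =
  | { type: 'session.created'; session_id: string; call_id: string }
  | { type: 'output_audio.delta'; audio: string } // base64-encoded PCM16
  | { type: 'agent_start_talking' }
  | { type: 'agent_stop_talking' }
  | { type: 'session.closed'; reason: string }
  | { type: 'interruption' }
  | { type: 'error'; code: string; message: string };

function parseServerEvent(raw: string): ServerEvent {
  // The server sends one JSON object per WebSocket text frame.
  return JSON.parse(raw) as ServerEvent;
}
```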

Full session handling

Turn lifecycle

A single conversational turn has this sequence on the wire:

  1. Client keeps streaming input_audio_buffer.append from the mic.
  2. Server detects end of user utterance, runs STT → LLM → TTS.
  3. Server sends agent_start_talking.
  4. Server sends a stream of output_audio.delta chunks.
  5. Server sends agent_stop_talking.
  6. Back to step 1.

The client does not send input_audio_buffer.commit in normal conversational flow. The server’s VAD handles turn boundaries. You only send commit if you implement explicit push-to-talk and want to force an immediate response.
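
If you do implement push-to-talk, the shape is simple: keep streaming input_audio_buffer.append while the button is held, then send a commit on release. A minimal sketch, assuming the commit message carries no payload beyond its type:

```typescript
// Hypothetical push-to-talk release handler; assumes `commit` needs no payload.
function onPushToTalkRelease(ws: WebSocket) {
  if (ws.readyState !== WebSocket.OPEN) return;
  // Force an immediate response instead of waiting for server-side VAD
  // to detect the end of the utterance.
  ws.send(JSON.stringify({ type: 'input_audio_buffer.commit' }));
}
```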

Backpressure on playback

The server emits output_audio.delta faster than realtime when the LLM finishes early. If you play each chunk as it arrives, audio will overlap. The AudioContext.createBufferSource() pattern in the quickstart avoids this by maintaining a nextPlayTime pointer: each new buffer is scheduled at max(nextPlayTime, currentTime), and nextPlayTime advances by the buffer’s duration. The audio engine schedules them back-to-back without overlap.

Interruptions

When the server emits interruption, the user has spoken while the agent was talking. The agent’s remaining TTS output for that turn is invalid; drop it. Call flushPlayback() from the quickstart, or your ring buffer’s equivalent. A new agent_start_talking will follow shortly.
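
Note that the quickstart’s flushPlayback() only rewinds the scheduling pointer; buffers already handed to the audio engine keep playing out their (short) remaining duration. For a harder cut, track the live sources and stop them. A sketch with the quickstart’s imports, assuming react-native-audio-api follows Web Audio semantics for AudioBufferSourceNode.stop() and onended:

```typescript
// Keep a reference to every scheduled source so an interruption can stop them.
const activeSources = new Set<AudioBufferSourceNode>();

function scheduleBuffer(ctx: AudioContext, buffer: AudioBuffer, startAt: number) {
  const source = ctx.createBufferSource();
  source.buffer = buffer;
  source.connect(ctx.destination);
  source.onended = () => activeSources.delete(source); // drop finished sources
  activeSources.add(source);
  source.start(startAt);
}

function hardFlush(ctx: AudioContext, nextPlayRef: { current: number }) {
  for (const source of activeSources) {
    try { source.stop(); } catch { /* already finished */ }
  }
  activeSources.clear();
  nextPlayRef.current = ctx.currentTime; // rewind the scheduling pointer
}
```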

Platform gotchas

iOS: AVAudioSession configuration

React Native’s JavaScript runtime does not interact with AVAudioSession directly; the audio library manages it through native code. In this stack, AudioManager.setAudioSessionOptions({ iosCategory: 'playAndRecord', iosMode: 'voiceChat', ... }) configures the session before the recorder starts. If the application also uses other audio libraries (media players, video), coordinate their session categories; two libraries fighting over the same session will cause one to silence the other.

iOS: background voice calls

If the call should continue when the user locks the screen or switches apps, add audio to UIBackgroundModes in Info.plist:

```xml
<key>UIBackgroundModes</key>
<array>
  <string>audio</string>
</array>
```

This is stricter than the typical background fetch; Apple reviews for actual VoIP use. If the app is rejected, you either need to end the call when backgrounded or use voip mode with CallKit integration.

Android: foreground service for long calls

Android kills background mic access aggressively. For calls longer than ~30 seconds in the background, run a foreground service. Reference the Android guide’s foreground services for phone calls page.
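
The service itself is native code (or a library), but the manifest side is small. A sketch of the Android 14+ declarations, with CallService standing in for whatever service class your app actually registers:

```xml
<!-- Hypothetical service name; Android 14+ also requires the typed permission. -->
<uses-permission android:name="android.permission.FOREGROUND_SERVICE" />
<uses-permission android:name="android.permission.FOREGROUND_SERVICE_MICROPHONE" />

<service
    android:name=".CallService"
    android:foregroundServiceType="microphone" />
```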

Incoming phone call interruption

Both platforms fire audio session interruptions when a phone call comes in. The capture library’s data callback stops firing. Listen for your app’s focus-loss event (AppState in React Native) and tear down the WebSocket gracefully:

```typescript
import { AppState } from "react-native";

useEffect(() => {
  const sub = AppState.addEventListener("change", (state) => {
    if (state !== "active" && wsRef.current) {
      wsRef.current.close(1000, "backgrounded");
    }
  });
  return () => sub.remove();
}, []);
```

Echo cancellation

On iOS, AudioManager.setAudioSessionOptions({ iosCategory: 'playAndRecord', iosMode: 'voiceChat', ... }) turns on the system AEC + NS pipeline. On Android, react-native-audio-api selects the VOICE_COMMUNICATION input source internally, which enables the platform’s AEC + NS.

Without AEC, the agent hears its own audio through the mic and the server’s VAD fires continuous interruption events. This is the expected behavior on the iOS simulator (speaker loops back into the Mac mic). On a real device with earphones or an HFP Bluetooth headset the feedback loop is broken. If your target devices have weaker AEC (older Androids, tablets with distant mics), stop the recorder on agent_start_talking and start it again on agent_stop_talking, as sketched below. The user’s speech during the agent’s turn goes undetected, but that is a safer trade-off than audible feedback.
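
A minimal half-duplex sketch of that fallback, reusing the quickstart’s recorderRef and the two talk events:

```typescript
// Half-duplex fallback for weak-AEC devices: pause capture while the agent
// speaks, resume when its turn ends. User barge-in goes undetected by design.
function handleTalkEvents(ev: { type: string }) {
  switch (ev.type) {
    case 'agent_start_talking':
      recorderRef.current?.stop();  // mic frames stop flowing to the server
      break;
    case 'agent_stop_talking':
      recorderRef.current?.start(); // resume capture for the user's turn
      break;
    // ...handle the remaining events as in the quickstart
  }
}
```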

Bluetooth audio routing

If the user connects a Bluetooth headset mid-call, the audio session reroutes automatically on both platforms. Test this flow: some devices buffer poorly and introduce perceptible lag for a few seconds after the route change.

Production hardening

Reconnect on transient network loss

ws.onclose with a non-1000 code is a network drop. Reconnect with exponential backoff up to 30 s. Do not retry on 1000 (clean close) or 4401 / 4403 (auth failure):

```typescript
let reconnectMs = 500;

ws.onclose = (e) => {
  if (e.code === 1000 || e.code === 4401 || e.code === 4403) return;
  setTimeout(start, Math.min(reconnectMs *= 2, 30000));
};
// Reset reconnectMs back to 500 in ws.onopen once a reconnect succeeds,
// so the next drop starts from a short delay again.
```

App lifecycle

Tear down the WebSocket and audio pipeline when AppState changes to background or inactive. Do not try to keep the call alive across a full suspension: iOS will kill the socket anyway, and the user hears only a confusing silence.

Battery

Realtime audio + open WebSocket consumes roughly 3–5 % battery per minute on modern phones. For support-use calls (short, bounded duration), this is fine. For background companion apps, design for short sessions.

Error events

Subscribe to the server’s error event and surface non-transient errors to the user:

1case "error":
2 console.error(`[${ev.code}] ${ev.message}`);
3 if (ev.code === "401" || ev.code === "403") {
4 // auth failure: user-visible, show "please sign in again"
5 }
6 break;

Next steps