React Native
React Native integrates with the Atoms agent over the raw WebSocket protocol. The runtime’s built-in WebSocket global handles transport, and a single audio library handles PCM16 capture and scheduled playback.
The browser WebSocket SDK cannot be used here. It calls navigator.mediaDevices.getUserMedia and the Web Audio API, both of which are DOM APIs and unavailable in the React Native JavaScript runtime.
The wire protocol is identical across runtimes. Client state machine, event types, and PCM16 payload encoding all match what the browser SDK does internally.
For a full working app, see Hearthside in the cookbook — a React Native (Expo) reference client built on this exact stack. It ships with a mute toggle, transport chunk counter, in-app settings sheet (voice / speed / language) wired to the draft → publish → activate REST flow, and the correct iOS audio session for full-volume speaker playback with hardware echo cancellation.
The quickstart is validated end-to-end on the iOS simulator with an Expo dev build: WebSocket connects, mic captures PCM, agent audio plays back. On the simulator, speaker output loops back into the Mac microphone, so the server’s VAD fires interruption events continuously; test on a real device (earphones or an HFP Bluetooth headset) to confirm clean barge-in behavior.
When to use React Native
- Your existing app is React Native and you want to embed an in-app voice agent without bringing in WebRTC.
- You are building a cross-platform mobile client and want the JavaScript-side logic to look similar to the browser SDK.
- You do not need character-level TTS alignment timings (the raw protocol does not emit them).
For iOS-only apps with strict binary-size or battery budgets, prefer the iOS (Swift) native path. For Flutter, see the Flutter guide.
Dependencies
One audio library handles both capture and playback. react-native-audio-api ships an AudioRecorder for PCM frames and an AudioContext for scheduled playback, both backed by the same native session.
iOS setup (react-native-permissions)
react-native-permissions requires an explicit handler pod in the iOS Podfile. Add this near the top of ios/Podfile:
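A sketch of the Podfile lines, following the react-native-permissions v4-style setup; verify the exact incantation against the library's README for the version you have installed:

```ruby
# ios/Podfile — react-native-permissions setup (v4-style; check the
# library's README for the lines matching your installed version)
def node_require(script)
  require Pod::Executable.execute_command('node', ['-p',
    "require.resolve('#{script}', {paths: [process.argv[1]]})", __dir__]).strip
end

node_require('react-native-permissions/scripts/setup.rb')

# Registers only the Microphone handler pod; add others if the app needs them.
setup_permissions(['Microphone'])
```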
Then run pod install in ios/. Without this, the library crashes at runtime with “No permission handler detected.”
Alternatives considered
- expo-av. High-level recording and playback. Does not expose raw PCM frames at a fixed sample rate, so it is not suitable for realtime voice streaming.
- Custom Expo native module. Wrap AVAudioEngine (iOS) and AudioRecord/AudioTrack (Android) in a minimal Expo module. Recommended only when binary-size or dependency-count constraints rule out react-native-audio-api.
Permissions
iOS
Add to ios/<AppName>/Info.plist:
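At minimum, the microphone usage description (the string below is a placeholder; write one that describes your app):

```xml
<key>NSMicrophoneUsageDescription</key>
<string>The microphone is used for voice conversations with the in-app agent.</string>
```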
Android
Add to android/app/src/main/AndroidManifest.xml:
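The record-audio permission is the one that matters here; INTERNET is already present in most templates:

```xml
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<!-- Usually already present; the WebSocket needs it. -->
<uses-permission android:name="android.permission.INTERNET" />
```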
Request at runtime
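A sketch of the runtime gate. The requester is injected so the react-native-permissions call (`request(PERMISSIONS.IOS.MICROPHONE)` on iOS, `request(PERMISSIONS.ANDROID.RECORD_AUDIO)` on Android) can be plugged in from the app:

```typescript
// Sketch: gate session start on mic permission. In the app, `requestMic`
// would wrap react-native-permissions, e.g.
//   () => request(Platform.OS === 'ios'
//     ? PERMISSIONS.IOS.MICROPHONE
//     : PERMISSIONS.ANDROID.RECORD_AUDIO)
type PermissionStatus = 'granted' | 'denied' | 'blocked' | 'unavailable';

async function ensureMicPermission(
  requestMic: () => Promise<PermissionStatus>,
): Promise<boolean> {
  const status = await requestMic();
  return status === 'granted'; // 'blocked' means the user must go to Settings
}
```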
Quickstart
A single-file App.tsx covering permission, WebSocket, mic capture, and scheduled PCM playback. Drop it into an Expo dev build (not Expo Go; these packages ship native code).
AudioRecorder delivers Float32 PCM frames through onAudioReady. The Atoms wire protocol expects Int16 little-endian PCM, so convert each frame with float32ToInt16LE before base64-encoding and sending.
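A minimal conversion sketch (the name matches the float32ToInt16LE referenced above; the base64 step is left to whichever encoder the app already uses):

```typescript
// Convert one Float32 frame ([-1, 1] samples) to Int16 little-endian bytes,
// the PCM16 layout the wire protocol expects. Out-of-range samples are clamped.
function float32ToInt16LE(frame: Float32Array): Uint8Array {
  const view = new DataView(new ArrayBuffer(frame.length * 2));
  for (let i = 0; i < frame.length; i++) {
    const s = Math.max(-1, Math.min(1, frame[i]));
    // Asymmetric scaling: negative full-scale is -32768, positive is 32767.
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true /* little-endian */);
  }
  return new Uint8Array(view.buffer);
}
```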
AudioContext from react-native-audio-api implements the Web Audio API. createBuffer + createBufferSource schedules PCM buffers back-to-back with accurate timing. The nextPlayTime running pointer is the standard trick for gapless streaming playback.
AudioManager.setAudioSessionOptions configures iOS AVAudioSession with playAndRecord + voiceChat, which turns on the system’s AEC pipeline. On Android the recorder selects VOICE_COMMUNICATION internally, which enables the platform’s AEC + NS.
Server events (full list)
The quickstart’s switch handles everything, but here are the six event types with their meanings for reference:
Full session handling
Turn lifecycle
A single conversational turn has this sequence on the wire:
1. Client keeps streaming input_audio_buffer.append from the mic.
2. Server detects the end of the user utterance, runs STT → LLM → TTS.
3. Server sends agent_start_talking.
4. Server sends a stream of output_audio.delta chunks.
5. Server sends agent_stop_talking.
6. Back to step 1.
The client does not send input_audio_buffer.commit in normal conversational flow. The server’s VAD handles turn boundaries. You only send commit if you implement explicit push-to-talk and want to force an immediate response.
Backpressure on playback
The server emits output_audio.delta faster than realtime when the LLM finishes early. If you play each chunk as it arrives, audio will overlap. The AudioContext.createBufferSource() pattern in the quickstart avoids this by maintaining a nextPlayTime pointer: each new buffer is scheduled at max(nextPlayTime, currentTime), and nextPlayTime advances by the buffer’s duration. The audio engine schedules them back-to-back without overlap.
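The scheduling arithmetic can be isolated as a pure helper (a sketch; in the quickstart the returned startAt would feed the source's start call and currentTime comes from the AudioContext):

```typescript
// Gapless scheduling: play each chunk at max(nextPlayTime, currentTime)
// and advance the pointer by the chunk's duration.
function scheduleChunk(
  nextPlayTime: number,
  currentTime: number,
  durationSec: number,
): { startAt: number; nextPlayTime: number } {
  const startAt = Math.max(nextPlayTime, currentTime);
  return { startAt, nextPlayTime: startAt + durationSec };
}
```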
Interruptions
When the server emits interruption, the user has spoken while the agent was talking. The agent’s remaining TTS output for that turn is invalid; drop it. Call playback.flush() or your ring buffer’s equivalent. A new agent_start_talking will follow shortly.
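A sketch of the flush: stop everything scheduled for the dead turn and reset the scheduling pointer. Tracking scheduled sources in an array is our own bookkeeping convention, not a library API; Stoppable stands in for AudioBufferSourceNode.

```typescript
interface Stoppable {
  stop(): void;
}

function flushPlayback(
  state: { sources: Stoppable[]; nextPlayTime: number },
  currentTime: number,
): void {
  for (const src of state.sources) src.stop(); // silence queued audio
  state.sources = [];
  state.nextPlayTime = currentTime; // the next agent turn schedules from "now"
}
```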
Platform gotchas
iOS: AVAudioSession configuration
React Native’s JavaScript runtime does not interact with AVAudioSession directly; the audio library manages it through native code. In this stack, AudioManager.setAudioSessionOptions({ iosCategory: 'playAndRecord', iosMode: 'voiceChat', ... }) configures the session before the recorder starts. If the application also uses other audio libraries (media players, video), coordinate their session categories; two libraries fighting over the same session will cause one to silence the other.
iOS: background voice calls
If the call should continue when the user locks the screen or switches apps, add audio to UIBackgroundModes in Info.plist:
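```xml
<key>UIBackgroundModes</key>
<array>
  <string>audio</string>
</array>
```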
This entitlement is reviewed more strictly than typical background fetch; Apple checks for genuine audio/VoIP use. If the app is rejected, either end the call when the app is backgrounded or adopt the voip background mode with CallKit integration.
Android: foreground service for long calls
Android kills background mic access aggressively. For calls longer than ~30 seconds in the background, run a foreground service. Reference the Android guide’s foreground services for phone calls page.
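As a sketch, the manifest side looks like the following; VoiceCallService is a hypothetical class name for your own service, and API 34+ additionally requires the typed foreground-service permission:

```xml
<uses-permission android:name="android.permission.FOREGROUND_SERVICE" />
<!-- Required on API 34+ for microphone-type foreground services. -->
<uses-permission android:name="android.permission.FOREGROUND_SERVICE_MICROPHONE" />

<application ...>
  <!-- Placeholder name; implement per the Android foreground-service guide. -->
  <service
      android:name=".VoiceCallService"
      android:foregroundServiceType="microphone" />
</application>
```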
Incoming phone call interruption
Both platforms fire audio session interruptions when a phone call comes in. The capture library’s data callback stops firing. Listen for your app’s focus-loss event (AppState in React Native) and tear down the WebSocket gracefully:
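A sketch of the handler; in the app it would be registered with AppState.addEventListener('change', handler) from react-native, and teardown would close the WebSocket and stop the recorder:

```typescript
type AppStateStatus = 'active' | 'background' | 'inactive';

// Returns a change handler that fires `teardown` once per focus loss.
function makeAppStateHandler(teardown: () => void): (s: AppStateStatus) => void {
  let active = true;
  return (state) => {
    if (active && state !== 'active') {
      active = false; // only tear down once until we regain focus
      teardown();
    } else if (state === 'active') {
      active = true;
    }
  };
}
```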
Echo cancellation
On iOS, AudioManager.setAudioSessionOptions({ iosCategory: 'playAndRecord', iosMode: 'voiceChat', ... }) turns on the system AEC + NS pipeline. On Android, react-native-audio-api selects the VOICE_COMMUNICATION input source internally, which enables the platform’s AEC + NS.
Without AEC, the agent hears its own audio through the mic and the server’s VAD fires continuous interruption events. This is the expected behavior on the iOS simulator (speaker loops back into the Mac mic). On a real device with earphones or an HFP Bluetooth headset the feedback loop is broken. If your target devices have weaker AEC (older Androids, tablets with distant mics), stop the recorder on agent_start_talking and start it again on agent_stop_talking. The user’s speech during the agent’s turn goes undetected, which is the safer trade-off than audible feedback.
Bluetooth audio routing
If the user connects a Bluetooth headset mid-call, the audio session reroutes automatically on both platforms. Test this flow: some devices buffer poorly and introduce perceptible lag for a few seconds after the route change.
Production hardening
Reconnect on transient network loss
ws.onclose with a non-1000 code usually means a network drop. Reconnect with exponential backoff capped at 30 s. Do not retry on 1000 (clean close) or on the auth-failure codes 4401/4403, where retrying cannot help:
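A sketch of the retry policy; the jitterless doubling is one common choice:

```typescript
// Wiring (comments only):
//   ws.onclose = (e) => {
//     if (shouldReconnect(e.code)) setTimeout(connect, backoffMs(attempt++));
//   };
function shouldReconnect(closeCode: number): boolean {
  // 1000 = clean close; 4401/4403 = auth failure (retrying cannot help)
  return closeCode !== 1000 && closeCode !== 4401 && closeCode !== 4403;
}

function backoffMs(attempt: number): number {
  return Math.min(30_000, 1_000 * 2 ** attempt); // 1 s, 2 s, 4 s, ... cap 30 s
}
```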
App lifecycle
Tear down the WebSocket and audio pipeline when AppState changes to background or inactive. Do not try to keep the call alive across a full suspension: iOS will kill the socket anyway, and the user sees a confusing silence.
Battery
Realtime audio + open WebSocket consumes roughly 3–5 % battery per minute on modern phones. For support-use calls (short, bounded duration), this is fine. For background companion apps, design for short sessions.
Error events
Subscribe to the server’s error event and surface non-transient errors to the user:
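A minimal sketch of surfacing the event; the payload field names (code, message) are assumptions to check against the wire-protocol reference:

```typescript
// Turn a server `error` event into user-facing text. Field names are
// assumed; check the protocol reference for the real payload shape.
function describeServerError(ev: { code?: string; message?: string }): string {
  return ev.message ?? (ev.code ? `Agent error: ${ev.code}` : 'Voice session error');
}
```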
Next steps
- The full wire protocol with every message type, payload, and error code.
- The JavaScript SDK for browser runtimes.
- Native Swift integration with URLSessionWebSocketTask and AVAudioEngine.
- HTTP status codes returned by every Atoms endpoint.

