Android (Kotlin)
Native Android applications integrate with the Atoms agent over the raw WebSocket protocol. OkHttp handles transport, and the platform AudioRecord and AudioTrack classes handle PCM16 capture and playback.
Initialize AudioRecord with the VOICE_COMMUNICATION audio source to engage the platform’s acoustic echo cancellation and noise suppression.
Minimum supported version is Android 7 (API 24), which matches OkHttp 5’s API floor.
Validated end-to-end on a Pixel 9 emulator (Android API 35): OkHttp WebSocket connects, AudioRecord streams PCM16 with VOICE_COMMUNICATION for AEC coupling, and AudioTrack plays back agent audio in MODE_STREAM with USAGE_MEDIA (the STREAM_VOICE_CALL path is system-controlled and inaudible on emulators). Verify foreground service and Bluetooth route behavior on physical devices if those flows matter for your app.
When to use native Android
- Android-only app, or a cross-platform app where Android is the priority.
- You need fine control over the audio pipeline (specific sample rates, buffer sizes, AEC routing).
- You need proper foreground-service handling for calls that continue when the app is backgrounded.
If your app is primarily React Native, see the React Native guide. For Flutter, see Flutter.
Dependencies
Manifest permissions
Request RECORD_AUDIO at runtime:
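One way to make the runtime request is the AndroidX Activity Result API. This is a minimal sketch, assuming an `androidx.activity` dependency; `CallActivity`, `startCall`, and `onMicDenied` are hypothetical names introduced for illustration.

```kotlin
import android.Manifest
import android.content.pm.PackageManager
import androidx.activity.ComponentActivity
import androidx.activity.result.contract.ActivityResultContracts
import androidx.core.content.ContextCompat

class CallActivity : ComponentActivity() {
    // Launchers must be registered before the activity is STARTED;
    // a property initializer satisfies that requirement.
    private val micPermission =
        registerForActivityResult(ActivityResultContracts.RequestPermission()) { granted ->
            if (granted) startCall() else onMicDenied()
        }

    fun ensureMicThenStart() {
        val state = ContextCompat.checkSelfPermission(this, Manifest.permission.RECORD_AUDIO)
        if (state == PackageManager.PERMISSION_GRANTED) {
            startCall()
        } else {
            micPermission.launch(Manifest.permission.RECORD_AUDIO)
        }
    }

    private fun startCall() { /* hypothetical: open the agent session */ }

    private fun onMicDenied() { /* hypothetical: explain why the mic is needed */ }
}
```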
Audio mode
Set AudioManager.MODE_IN_COMMUNICATION for the duration of the call. This signals the Android audio HAL that a bidirectional voice session is active; combined with MediaRecorder.AudioSource.VOICE_COMMUNICATION on the capture side, it enables the hardware AEC and NS pipeline. Restore the mode in the finally block of your session teardown.
Playback routing is handled separately by the AudioTrack’s AudioAttributes (see Playback). The quickstart uses USAGE_MEDIA on the player, which routes to the main speaker by default on both emulators and physical devices, so there is no need to toggle isSpeakerphoneOn.
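The mode handling described above can be sketched as a small wrapper that sets the mode, runs the session, and restores the previous mode in a `finally` block. The helper name `withCommunicationMode` is an assumption made for this example.

```kotlin
import android.content.Context
import android.media.AudioManager

// Hypothetical helper: runs `block` with MODE_IN_COMMUNICATION active and
// restores whatever mode was set before, even if `block` throws.
fun <T> withCommunicationMode(context: Context, block: () -> T): T {
    val audioManager = context.getSystemService(Context.AUDIO_SERVICE) as AudioManager
    val previousMode = audioManager.mode
    audioManager.mode = AudioManager.MODE_IN_COMMUNICATION
    try {
        return block()
    } finally {
        audioManager.mode = previousMode
    }
}
```

Restoring the previously observed mode, rather than hard-coding `MODE_NORMAL`, avoids clobbering a mode that some other component set before the call started.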
Quickstart
A full agent session: open the WebSocket, start mic capture, play agent audio, tear down.
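The transport half of such a session might look like the sketch below. It only covers opening, sending, and closing the socket with OkHttp. The class name `AgentSession`, the `url` parameter, and the choice to send microphone audio as binary frames are all assumptions; check the protocol reference for the real endpoint and framing.

```kotlin
import okhttp3.OkHttpClient
import okhttp3.Request
import okhttp3.WebSocket
import okhttp3.WebSocketListener
import okio.ByteString.Companion.toByteString

// Hypothetical wrapper around one agent WebSocket connection.
class AgentSession(
    private val url: String,                  // assumption: supplied by the caller
    private val listener: WebSocketListener,  // receives agent audio and events
) {
    private val client = OkHttpClient()

    @Volatile
    private var socket: WebSocket? = null

    fun open() {
        val request = Request.Builder().url(url).build()
        socket = client.newWebSocket(request, listener)
    }

    // Assumption: PCM16 chunks travel as binary frames.
    // Returns false if the socket is not open or its send queue is full.
    fun sendAudio(pcm: ByteArray): Boolean =
        socket?.send(pcm.toByteString()) ?: false

    fun close() {
        // 1000 is the standard "normal closure" code.
        socket?.close(1000, "client ended call")
        socket = null
    }
}
```

Capture and playback plug in through `sendAudio` and the listener, so the socket wrapper itself stays free of audio code.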
Microphone capture
MediaRecorder.AudioSource.VOICE_COMMUNICATION routes capture through the platform’s AEC/NS pipeline. Without it, the agent will hear its own output through the microphone and start looping.
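A capture loop using this audio source could be sketched as follows. The function name, the `isActive` and `onChunk` callbacks, and the buffer sizing are assumptions; `sampleRate` must be whatever rate the agent expects, which is not stated here.

```kotlin
import android.annotation.SuppressLint
import android.media.AudioFormat
import android.media.AudioRecord
import android.media.MediaRecorder

// Blocking loop: call it from a background thread or coroutine, never the main thread.
@SuppressLint("MissingPermission") // the caller must already hold RECORD_AUDIO
fun captureLoop(sampleRate: Int, isActive: () -> Boolean, onChunk: (ByteArray) -> Unit) {
    val minBuf = AudioRecord.getMinBufferSize(
        sampleRate, AudioFormat.CHANNEL_IN_MONO, AudioFormat.ENCODING_PCM_16BIT
    )
    require(minBuf > 0) { "Unsupported capture configuration" }

    val recorder = AudioRecord(
        MediaRecorder.AudioSource.VOICE_COMMUNICATION, // engages platform AEC/NS
        sampleRate,
        AudioFormat.CHANNEL_IN_MONO,
        AudioFormat.ENCODING_PCM_16BIT,
        minBuf * 2, // assumption: 2x headroom over the minimum
    )
    check(recorder.state == AudioRecord.STATE_INITIALIZED) { "AudioRecord failed to initialize" }

    val buffer = ByteArray(minBuf)
    recorder.startRecording()
    try {
        while (isActive()) {
            val read = recorder.read(buffer, 0, buffer.size)
            if (read > 0) onChunk(buffer.copyOf(read)) // copy: the buffer is reused
            else if (read < 0) break                   // negative values are error codes
        }
    } finally {
        recorder.stop()
        recorder.release()
    }
}
```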
Playback
AudioTrack in MODE_STREAM accepts writes until its internal buffer is full, at which point write blocks, and plays the queued PCM at the track’s configured sample rate. Run it on a dedicated thread and queue chunks from the WebSocket callback.
Use USAGE_MEDIA (not USAGE_VOICE_COMMUNICATION) for the AudioTrack. Even though this is a voice call, USAGE_VOICE_COMMUNICATION routes to the STREAM_VOICE_CALL stream, which is system-controlled: its volume is not settable by a normal app and it is silent on emulators. Capture still uses MediaRecorder.AudioSource.VOICE_COMMUNICATION for AEC coupling, which is what matters for echo cancellation.
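A player along these lines could be sketched as below: a queue fed from the WebSocket callback and a dedicated thread draining it into an `AudioTrack` built with `USAGE_MEDIA`. The class name `PcmPlayer`, the mono channel layout, the buffer sizing, and the 100 ms poll interval are assumptions.

```kotlin
import android.media.AudioAttributes
import android.media.AudioFormat
import android.media.AudioTrack
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.TimeUnit
import kotlin.concurrent.thread

class PcmPlayer(sampleRate: Int) {
    private val queue = LinkedBlockingQueue<ByteArray>()

    @Volatile
    private var running = true

    private val minBuf = AudioTrack.getMinBufferSize(
        sampleRate, AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT
    ).also { require(it > 0) { "Unsupported playback configuration" } }

    private val track = AudioTrack.Builder()
        .setAudioAttributes(
            AudioAttributes.Builder()
                .setUsage(AudioAttributes.USAGE_MEDIA) // not USAGE_VOICE_COMMUNICATION
                .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH)
                .build()
        )
        .setAudioFormat(
            AudioFormat.Builder()
                .setEncoding(AudioFormat.ENCODING_PCM_16BIT)
                .setSampleRate(sampleRate)
                .setChannelMask(AudioFormat.CHANNEL_OUT_MONO)
                .build()
        )
        .setBufferSizeInBytes(minBuf * 2)
        .setTransferMode(AudioTrack.MODE_STREAM)
        .build()

    // Declared last so the queue and track exist before the thread starts.
    private val worker = thread(name = "pcm-player") {
        track.play()
        try {
            while (running) {
                val chunk = queue.poll(100, TimeUnit.MILLISECONDS) ?: continue
                track.write(chunk, 0, chunk.size) // blocks while the track buffer is full
            }
        } finally {
            track.stop()
            track.release()
        }
    }

    // Safe to call from the WebSocket callback: it only enqueues.
    fun enqueue(pcm: ByteArray) { queue.offer(pcm) }

    fun stop() {
        running = false
        worker.join()
    }
}
```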
Handle server events
Threading model
- OkHttp WebSocketListener callbacks run on OkHttp’s internal executor. Do not block them. All UI work must cross to the main looper via Handler(Looper.getMainLooper()).post { ... } or a coroutine on Dispatchers.Main.
- AudioRecord.read in a loop must run off the main thread. Use a background coroutine as shown in the quickstart.
- AudioTrack.write is a blocking call when the internal buffer is full. Run it on its own thread (as shown) to avoid stalling your capture loop.
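The first rule can be illustrated with a listener that does no work on OkHttp's thread beyond handing data off. This is a sketch: `CallListener`, `enqueueAudio`, and `showStatus` are hypothetical names, and treating binary frames as audio and text frames as status is an assumption about the wire format.

```kotlin
import android.os.Handler
import android.os.Looper
import okhttp3.WebSocket
import okhttp3.WebSocketListener
import okio.ByteString

class CallListener(
    private val enqueueAudio: (ByteArray) -> Unit, // must be non-blocking, e.g. a queue offer
    private val showStatus: (String) -> Unit,      // touches UI, so it runs on the main thread
) : WebSocketListener() {
    private val mainHandler = Handler(Looper.getMainLooper())

    override fun onMessage(webSocket: WebSocket, bytes: ByteString) {
        // Runs on OkHttp's executor: hand off and return immediately.
        enqueueAudio(bytes.toByteArray())
    }

    override fun onMessage(webSocket: WebSocket, text: String) {
        // Cross to the main looper before touching any views.
        mainHandler.post { showStatus(text) }
    }
}
```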
Audio focus
If the user is playing music or on another call, request audio focus before starting:
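A minimal sketch of the focus request, assuming API 26+ for `AudioFocusRequest` (devices on API 24 and 25 need the older `requestAudioFocus` overload instead). The class name `CallAudioFocus`, the `onLoss` callback, and the choice of `AUDIOFOCUS_GAIN_TRANSIENT` are assumptions.

```kotlin
import android.content.Context
import android.media.AudioAttributes
import android.media.AudioFocusRequest
import android.media.AudioManager
import android.os.Build
import androidx.annotation.RequiresApi

@RequiresApi(Build.VERSION_CODES.O)
class CallAudioFocus(context: Context, private val onLoss: () -> Unit) {
    private val audioManager = context.getSystemService(Context.AUDIO_SERVICE) as AudioManager

    private val listener = AudioManager.OnAudioFocusChangeListener { change ->
        // Permanent loss: another app took over, so tear the call down.
        if (change == AudioManager.AUDIOFOCUS_LOSS) onLoss()
    }

    private val request = AudioFocusRequest.Builder(AudioManager.AUDIOFOCUS_GAIN_TRANSIENT)
        .setAudioAttributes(
            AudioAttributes.Builder()
                .setUsage(AudioAttributes.USAGE_MEDIA)
                .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH)
                .build()
        )
        .setOnAudioFocusChangeListener(listener)
        .build()

    // Returns true only if focus was granted immediately.
    fun request(): Boolean =
        audioManager.requestAudioFocus(request) == AudioManager.AUDIOFOCUS_REQUEST_GRANTED

    fun abandon() {
        audioManager.abandonAudioFocusRequest(request)
    }
}
```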
Abandon focus in stop().
Background calls
For calls that continue when the user backgrounds the app, run the agent in a foreground service. Without one, Android 12+ silently starves your mic capture once the app leaves the foreground.
The phoneCall foreground service type requires the FOREGROUND_SERVICE_PHONE_CALL manifest permission (Android 14+).
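A bare-bones service that promotes itself to the foreground with the phoneCall type might look like the sketch below. The class name, channel ID, notification text, and icon are placeholders, and the matching manifest entries (the service declaration with its foreground service type, plus the permission above) are not shown.

```kotlin
import android.app.Notification
import android.app.NotificationChannel
import android.app.NotificationManager
import android.app.Service
import android.content.Intent
import android.content.pm.ServiceInfo
import android.os.Build
import android.os.IBinder

class AgentCallService : Service() {

    @Suppress("DEPRECATION") // Notification.Builder(Context) is only used below API 26
    override fun onStartCommand(intent: Intent?, flags: Int, startId: Int): Int {
        val builder = if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.O) {
            // Channels exist from API 26; creating one twice is a no-op.
            getSystemService(NotificationManager::class.java).createNotificationChannel(
                NotificationChannel(CHANNEL_ID, "Agent call", NotificationManager.IMPORTANCE_LOW)
            )
            Notification.Builder(this, CHANNEL_ID)
        } else {
            Notification.Builder(this)
        }
        val notification = builder
            .setContentTitle("Call in progress")
            .setSmallIcon(android.R.drawable.stat_sys_phone_call) // placeholder icon
            .setOngoing(true)
            .build()

        if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.Q) {
            // The typed overload is available from API 29.
            startForeground(NOTIFICATION_ID, notification, ServiceInfo.FOREGROUND_SERVICE_TYPE_PHONE_CALL)
        } else {
            startForeground(NOTIFICATION_ID, notification)
        }
        return START_NOT_STICKY
    }

    override fun onBind(intent: Intent?): IBinder? = null

    private companion object {
        const val CHANNEL_ID = "agent_call" // placeholder
        const val NOTIFICATION_ID = 1       // placeholder
    }
}
```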
Interruption handling
Incoming phone calls and other communication apps revoke audio focus. Handle AUDIOFOCUS_LOSS in the listener as shown above and tear down cleanly. On AUDIOFOCUS_GAIN after a transient loss, decide whether to auto-resume or prompt the user.
Bluetooth route changes
Bluetooth headsets connect and disconnect during calls routinely. AudioRecord and AudioTrack switch routes transparently on most devices. You may want to observe AudioManager.ACTION_AUDIO_BECOMING_NOISY to pause the call if wired headphones are unplugged:
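A receiver for that broadcast can be sketched as follows. `NoisyReceiver` and the `onNoisy` callback are hypothetical names; what "pause" means for your call is left to the callback.

```kotlin
import android.content.BroadcastReceiver
import android.content.Context
import android.content.Intent
import android.content.IntentFilter
import android.media.AudioManager

class NoisyReceiver(private val onNoisy: () -> Unit) : BroadcastReceiver() {

    override fun onReceive(context: Context, intent: Intent) {
        // Sent by the system when audio is about to fall back to the speaker.
        if (intent.action == AudioManager.ACTION_AUDIO_BECOMING_NOISY) onNoisy()
    }

    // Register when the call starts.
    fun register(context: Context) {
        context.registerReceiver(this, IntentFilter(AudioManager.ACTION_AUDIO_BECOMING_NOISY))
    }

    // Unregister in teardown to avoid leaking the receiver.
    fun unregister(context: Context) {
        context.unregisterReceiver(this)
    }
}
```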
Production hardening
Reconnect on transient failure
OkHttp’s onFailure fires on network drops. Reconnect with exponential backoff up to 30 s. Do not retry on 4401/4403 codes (auth failure); check response?.code in the failure handler.
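The retry policy above can be sketched as two pure functions. The 30 s cap and the 4401/4403 codes come from the text; the 500 ms base delay, the doubling schedule, and the function names are assumptions. Adding random jitter is a common refinement that is left out here.

```kotlin
// Delay before reconnect attempt number `attempt` (0-based):
// base * 2^attempt, capped at capMs.
fun backoffDelayMs(attempt: Int, baseMs: Long = 500, capMs: Long = 30_000): Long {
    require(attempt >= 0) { "attempt must be non-negative" }
    val shift = attempt.coerceAtMost(20) // bound the shift so the Long cannot overflow
    return (baseMs shl shift).coerceAtMost(capMs)
}

// Auth failures will not fix themselves, so do not retry them.
// `code` is null when no code was available for the failure.
fun shouldRetry(code: Int?): Boolean = code != 4401 && code != 4403
```

With the defaults, attempts 0 through 5 wait 0.5 s, 1 s, 2 s, 4 s, 8 s, and 16 s, and every later attempt waits the full 30 s.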
Mic mute while agent speaks
Stop AudioRecord briefly on agent_start_talking and restart on agent_stop_talking if device AEC is underperforming. The user’s speech during the agent turn goes undetected, which is usually the right trade-off versus audible self-feedback.
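A simpler variant of this idea keeps AudioRecord running and drops captured chunks while the agent is speaking, which avoids stop/start races with the capture thread. The sketch below is that variant, not the stop-and-restart approach itself. `MicGate` is a hypothetical name, and how the event type string is extracted from a server message depends on the wire format.

```kotlin
import java.util.concurrent.atomic.AtomicBoolean

class MicGate {
    private val agentSpeaking = AtomicBoolean(false)

    // Call from the WebSocket callback with the event's type string.
    fun onServerEvent(type: String) {
        when (type) {
            "agent_start_talking" -> agentSpeaking.set(true)
            "agent_stop_talking" -> agentSpeaking.set(false)
            // Other event types leave the gate unchanged.
        }
    }

    // Call from the capture loop before sending each chunk.
    fun shouldSend(): Boolean = !agentSpeaking.get()
}
```

The capture loop then sends a chunk only when `shouldSend()` returns true.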
Battery
An open WebSocket + active AudioRecord + AudioTrack draws 3–5 % battery per minute. Design session duration accordingly. Always tear down promptly when the user ends the call.
Logging
Attach an interceptor to OkHttp for debugging the handshake. Remove before shipping.
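One way to do this without an extra artifact is a small application interceptor that logs the handshake request and response status. This is a sketch: `buildClient`, the `debug` flag, and the log tag are assumptions. It logs a redacted URL so tokens in the query string stay out of logcat.

```kotlin
import android.util.Log
import okhttp3.Interceptor
import okhttp3.OkHttpClient

// Pass debug = false (or drop the interceptor entirely) for release builds.
fun buildClient(debug: Boolean): OkHttpClient {
    val builder = OkHttpClient.Builder()
    if (debug) {
        builder.addInterceptor(Interceptor { chain ->
            val request = chain.request()
            Log.d("AgentWs", "-> ${request.method} ${request.url.redact()}")
            val response = chain.proceed(request)
            // A successful WebSocket upgrade reports HTTP 101.
            Log.d("AgentWs", "<- ${response.code} ${response.message}")
            response
        })
    }
    return builder.build()
}
```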

