Android (Kotlin)

Native Android applications integrate with the Atoms agent over the raw WebSocket protocol. OkHttp handles transport, and the platform AudioRecord and AudioTrack classes handle PCM16 capture and playback.

Initialize AudioRecord with the VOICE_COMMUNICATION audio source to engage the platform’s acoustic echo cancellation and noise suppression.

Minimum supported version is Android 7 (API 24), comfortably above OkHttp 4’s Android 5.0 (API 21) floor.

Validated end-to-end on a Pixel 9 emulator (Android API 35): OkHttp WebSocket connects, AudioRecord streams PCM16 with VOICE_COMMUNICATION for AEC coupling, and AudioTrack plays back agent audio in MODE_STREAM with USAGE_MEDIA (the STREAM_VOICE_CALL path is system-controlled and inaudible on emulators). Verify foreground service and Bluetooth route behavior on physical devices if those flows matter for your app.

When to use native Android

  • Android-only app, or a cross-platform app where Android is the priority.
  • You need fine control over the audio pipeline (specific sample rates, buffer sizes, AEC routing).
  • You need proper foreground-service handling for calls that continue when the app is backgrounded.

If your app is primarily React Native, see the React Native guide. For Flutter, see Flutter.

Dependencies

```kotlin
// build.gradle.kts (app)
dependencies {
    implementation("com.squareup.okhttp3:okhttp:4.12.0")
    implementation("org.jetbrains.kotlinx:kotlinx-coroutines-android:1.8.1")
}
```

| Dependency | Role | Why this one |
| --- | --- | --- |
| OkHttp | WebSocket client | Ubiquitous in Android, production-hardened, handles reconnect/backoff primitives. Alternative: Ktor client. |
| kotlinx-coroutines | Async orchestration | Bridges OkHttp’s callback model to suspending functions cleanly. |
| AudioRecord (platform) | Microphone PCM16 capture | Zero-dependency, gives raw Int16 access. |
| AudioTrack (platform) | PCM16 playback | Same. Streaming mode consumes your buffer at the device sample rate. |

Manifest permissions

```xml
<manifest xmlns:android="http://schemas.android.com/apk/res/android">
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.RECORD_AUDIO" />
    <uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />

    <!-- If you support background calls, add the foreground service perms: -->
    <uses-permission android:name="android.permission.FOREGROUND_SERVICE" />
    <uses-permission android:name="android.permission.FOREGROUND_SERVICE_PHONE_CALL" />
</manifest>
```

Request RECORD_AUDIO at runtime:

```kotlin
private val permissionLauncher = registerForActivityResult(
    ActivityResultContracts.RequestPermission()
) { granted -> onMicPermission(granted) }

private fun ensureMicPermission() {
    val status = ContextCompat.checkSelfPermission(this, Manifest.permission.RECORD_AUDIO)
    if (status == PackageManager.PERMISSION_GRANTED) onMicPermission(true)
    else permissionLauncher.launch(Manifest.permission.RECORD_AUDIO)
}
```

Audio mode

Set AudioManager.MODE_IN_COMMUNICATION for the duration of the call. This signals the Android audio HAL that a bidirectional voice session is active; combined with MediaRecorder.AudioSource.VOICE_COMMUNICATION on the capture side, it enables the hardware AEC and NS pipeline. Restore the mode in the finally block of your session teardown.

```kotlin
val audioManager = getSystemService(Context.AUDIO_SERVICE) as AudioManager
val previousMode = audioManager.mode
audioManager.mode = AudioManager.MODE_IN_COMMUNICATION

try {
    runAgentSession()
} finally {
    audioManager.mode = previousMode
}
```

Playback routing is handled separately by the AudioTrack’s AudioAttributes (see Playback). The quickstart uses USAGE_MEDIA on the player, which routes to the main speaker by default on both emulators and physical devices, so there is no need to toggle isSpeakerphoneOn.

Quickstart

A full agent session: open the WebSocket, start mic capture, play agent audio, tear down.

```kotlin
import kotlinx.coroutines.*
import okhttp3.*
import okio.ByteString
import java.util.concurrent.TimeUnit

class AtomsAgent(
    private val apiKey: String,
    private val agentId: String,
) {
    private val sampleRate = 24_000
    private val client = OkHttpClient.Builder()
        .readTimeout(0, TimeUnit.MILLISECONDS) // unlimited for long-lived WS
        .build()

    // Own the scope: the microphone coroutine is launched asynchronously from
    // OkHttp's onOpen callback, so it must outlive any caller stack frame.
    // A short-lived scope (for example, one passed in from lifecycleScope.launch
    // whose lambda returns immediately) would be cancelled before onOpen fires,
    // and the mic coroutine would silently no-op.
    private val scope = CoroutineScope(Dispatchers.Default + SupervisorJob())

    private var webSocket: WebSocket? = null
    private var micJob: Job? = null
    private val player = AudioPlayer(sampleRate)

    fun start() {
        val url = HttpUrl.Builder()
            .scheme("https") // OkHttp wraps wss:// via https://
            .host("api.smallest.ai")
            .addPathSegments("atoms/v1/agent/connect")
            .addQueryParameter("token", apiKey)
            .addQueryParameter("agent_id", agentId)
            .addQueryParameter("mode", "webcall")
            .addQueryParameter("sample_rate", sampleRate.toString())
            .build()

        val request = Request.Builder().url(url).build()

        webSocket = client.newWebSocket(request, object : WebSocketListener() {
            override fun onOpen(ws: WebSocket, response: Response) {
                micJob = scope.launch { streamMicrophone(ws) }
                player.start()
            }
            override fun onMessage(ws: WebSocket, text: String) = handleServerEvent(text)
            override fun onMessage(ws: WebSocket, bytes: ByteString) = handleServerEvent(bytes.utf8())
            override fun onClosing(ws: WebSocket, code: Int, reason: String) { stop() }
            override fun onFailure(ws: WebSocket, t: Throwable, r: Response?) { stop() }
        })
    }

    fun stop() {
        micJob?.cancel()
        player.stop()
        webSocket?.close(1000, "client stop")
        webSocket = null
        scope.cancel()
    }
}
```

Microphone capture

```kotlin
import android.media.AudioFormat
import android.media.AudioRecord
import android.media.MediaRecorder
import android.util.Base64
import org.json.JSONObject

private suspend fun streamMicrophone(ws: WebSocket) {
    val channelConfig = AudioFormat.CHANNEL_IN_MONO
    val encoding = AudioFormat.ENCODING_PCM_16BIT
    val minBuffer = AudioRecord.getMinBufferSize(sampleRate, channelConfig, encoding)
    val bufferSize = (minBuffer * 2).coerceAtLeast(4096)

    val record = AudioRecord(
        MediaRecorder.AudioSource.VOICE_COMMUNICATION, // enables platform AEC + NS
        sampleRate,
        channelConfig,
        encoding,
        bufferSize,
    )

    val chunk = ByteArray(bufferSize)
    try {
        record.startRecording()
        while (currentCoroutineContext().isActive) {
            val n = record.read(chunk, 0, chunk.size)
            if (n > 0) {
                val audio = Base64.encodeToString(chunk, 0, n, Base64.NO_WRAP)
                val payload = JSONObject().apply {
                    put("type", "input_audio_buffer.append")
                    put("audio", audio)
                }
                ws.send(payload.toString())
            }
        }
    } finally {
        record.stop()
        record.release()
    }
}
```

MediaRecorder.AudioSource.VOICE_COMMUNICATION routes capture through the platform’s AEC/NS pipeline. Without it, the agent will hear its own output through the microphone and start looping.
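
If you want to verify or reinforce that behavior on a particular device, the platform also exposes software AEC/NS effects that can be attached to the record session. A minimal sketch, assuming the `record` instance from the capture loop above (many devices already apply hardware AEC for VOICE_COMMUNICATION, so treat this as belt-and-braces):

```kotlin
import android.media.AudioRecord
import android.media.audiofx.AcousticEchoCanceler
import android.media.audiofx.NoiseSuppressor

// Attach software AEC/NS to the capture session where the device supports it.
// Safe to call right after constructing the AudioRecord in streamMicrophone.
fun attachCaptureEffects(record: AudioRecord) {
    if (AcousticEchoCanceler.isAvailable()) {
        AcousticEchoCanceler.create(record.audioSessionId)?.setEnabled(true)
    }
    if (NoiseSuppressor.isAvailable()) {
        NoiseSuppressor.create(record.audioSessionId)?.setEnabled(true)
    }
}
```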

Playback

AudioTrack in MODE_STREAM accepts writes as fast as you can feed it and plays at the hardware sample rate. Run it on a dedicated thread and queue chunks from the WebSocket callback.

Use USAGE_MEDIA (not USAGE_VOICE_COMMUNICATION) for the AudioTrack. Even though this is a voice call, USAGE_VOICE_COMMUNICATION routes to the STREAM_VOICE_CALL stream, which is system-controlled: its volume is not settable by a normal app and it is silent on emulators. Capture still uses MediaRecorder.AudioSource.VOICE_COMMUNICATION for AEC coupling, which is what matters for echo cancellation.

```kotlin
import android.media.AudioAttributes
import android.media.AudioFormat
import android.media.AudioTrack
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.TimeUnit

class AudioPlayer(private val sampleRate: Int) {
    private val channelConfig = AudioFormat.CHANNEL_OUT_MONO
    private val encoding = AudioFormat.ENCODING_PCM_16BIT
    private val minBuffer = AudioTrack.getMinBufferSize(sampleRate, channelConfig, encoding)

    private val queue = LinkedBlockingQueue<ByteArray>()
    @Volatile private var running = false
    private var thread: Thread? = null
    private var track: AudioTrack? = null

    fun start() {
        running = true
        track = AudioTrack.Builder()
            .setAudioAttributes(
                AudioAttributes.Builder()
                    .setUsage(AudioAttributes.USAGE_MEDIA)
                    .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH)
                    .build()
            )
            .setAudioFormat(
                AudioFormat.Builder()
                    .setSampleRate(sampleRate)
                    .setChannelMask(channelConfig)
                    .setEncoding(encoding)
                    .build()
            )
            .setBufferSizeInBytes(minBuffer * 4)
            .setTransferMode(AudioTrack.MODE_STREAM)
            .build()
            .also { it.play() }

        thread = Thread {
            while (running) {
                val chunk = try {
                    queue.poll(50, TimeUnit.MILLISECONDS)
                } catch (_: InterruptedException) {
                    null
                } ?: continue
                track?.write(chunk, 0, chunk.size)
            }
        }.also { it.start() }
    }

    fun enqueue(pcm: ByteArray) { queue.offer(pcm) }

    fun flush() {
        queue.clear()
        track?.flush()
    }

    fun stop() {
        running = false
        thread?.join()
        track?.stop(); track?.release(); track = null
    }
}
```

Handle server events

```kotlin
import android.util.Log

private fun handleServerEvent(text: String) {
    val json = JSONObject(text)
    when (json.optString("type")) {
        "session.created" -> { /* update UI on main thread */ }
        "output_audio.delta" -> {
            val pcm = Base64.decode(json.getString("audio"), Base64.NO_WRAP)
            player.enqueue(pcm)
        }
        "agent_start_talking" -> { /* UI: show "speaking" */ }
        "agent_stop_talking" -> { /* UI: hide "speaking" */ }
        "interruption" -> player.flush()
        "session.closed" -> stop()
        "error" -> Log.e("Atoms", "[${json.optString("code")}] ${json.optString("message")}")
    }
}
```

Threading model

  • OkHttp WebSocketListener callbacks run on OkHttp’s internal executor. Do not block them. All UI work must cross to the main looper via Handler(Looper.getMainLooper()).post { ... } or a coroutine on Dispatchers.Main (see the sketch after this list).
  • AudioRecord.read in a loop must run off the main thread. Use a background coroutine as shown in the quickstart.
  • AudioTrack.write is a blocking call when the internal buffer is full. Run it on its own thread (as shown) to avoid stalling your capture loop.
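
A minimal sketch of hopping from an OkHttp callback to the main thread, as referenced above. `showAgentSpeaking` is a hypothetical UI hook; `scope` is the session scope from the quickstart:

```kotlin
import android.os.Handler
import android.os.Looper

private val mainHandler = Handler(Looper.getMainLooper())

// Called from handleServerEvent, which runs on OkHttp's executor.
private fun onAgentStartTalking() {
    // Post to the main looper directly...
    mainHandler.post { showAgentSpeaking(true) }
    // ...or, equivalently, hop to Dispatchers.Main from the session scope:
    // scope.launch(Dispatchers.Main) { showAgentSpeaking(true) }
}
```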

Audio focus

If the user is playing music or on another call, request audio focus before starting:

```kotlin
val focusRequest = AudioFocusRequest.Builder(AudioManager.AUDIOFOCUS_GAIN_TRANSIENT_EXCLUSIVE)
    .setAudioAttributes(
        AudioAttributes.Builder()
            .setUsage(AudioAttributes.USAGE_VOICE_COMMUNICATION)
            .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH)
            .build()
    )
    .setAcceptsDelayedFocusGain(false)
    .setOnAudioFocusChangeListener { change ->
        when (change) {
            AudioManager.AUDIOFOCUS_LOSS,
            AudioManager.AUDIOFOCUS_LOSS_TRANSIENT -> stop()
        }
    }
    .build()

val result = audioManager.requestAudioFocus(focusRequest)
if (result != AudioManager.AUDIOFOCUS_REQUEST_GRANTED) {
    // another app has exclusive focus; don't start the call
}
```

Abandon focus in stop().
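
For example, assuming the focusRequest built above is kept as a nullable field on the session object:

```kotlin
fun stop() {
    // ... existing teardown from the quickstart ...
    focusRequest?.let { audioManager.abandonAudioFocusRequest(it) } // API 26+
}
```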

Background calls

For calls that continue when the user backgrounds the app, run the agent in a foreground service. Without this, Android will silently starve your mic capture on Android 12+.

```kotlin
// Declared in AndroidManifest.xml:
// <service
//     android:name=".AgentForegroundService"
//     android:foregroundServiceType="phoneCall"
//     android:exported="false" />

class AgentForegroundService : Service() {
    override fun onStartCommand(intent: Intent?, flags: Int, startId: Int): Int {
        val notification = buildOngoingCallNotification()
        startForeground(NOTIFICATION_ID, notification, ServiceInfo.FOREGROUND_SERVICE_TYPE_PHONE_CALL)
        return START_STICKY
    }

    override fun onBind(intent: Intent?): IBinder? = null

    // ... delegate to AtomsAgent from here ...
}
```

The phoneCall foreground service type requires the FOREGROUND_SERVICE_PHONE_CALL manifest permission (Android 14+).
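
Starting and stopping the service from your UI layer might look like the following sketch; the helper names are illustrative:

```kotlin
import android.content.Context
import android.content.Intent
import androidx.core.content.ContextCompat

// Promote the call to a foreground service when it starts, and stop it on hangup.
fun startAgentService(context: Context) {
    ContextCompat.startForegroundService(
        context, Intent(context, AgentForegroundService::class.java)
    )
}

fun stopAgentService(context: Context) {
    context.stopService(Intent(context, AgentForegroundService::class.java))
}
```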

Interruption handling

Incoming phone calls and other communication apps revoke audio focus. Handle AUDIOFOCUS_LOSS in the listener as shown above and tear down cleanly. On AUDIOFOCUS_GAIN after a transient loss, decide whether to auto-resume or prompt the user.
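
One possible shape for that listener, tracking whether the loss was transient; pauseCall and resumeCall are hypothetical hooks:

```kotlin
@Volatile private var pausedByFocusLoss = false

private val focusListener = AudioManager.OnAudioFocusChangeListener { change ->
    when (change) {
        AudioManager.AUDIOFOCUS_LOSS -> stop()
        AudioManager.AUDIOFOCUS_LOSS_TRANSIENT -> {
            pausedByFocusLoss = true
            pauseCall() // hypothetical: mute mic, pause playback
        }
        AudioManager.AUDIOFOCUS_GAIN -> {
            if (pausedByFocusLoss) {
                pausedByFocusLoss = false
                resumeCall() // hypothetical: or prompt the user instead
            }
        }
    }
}
```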

Bluetooth route changes

Bluetooth headsets connect and disconnect during calls routinely. AudioRecord and AudioTrack switch routes transparently on most devices. You may want to observe AudioManager.ACTION_AUDIO_BECOMING_NOISY to pause the call if wired headphones are unplugged:

```kotlin
val noisyReceiver = object : BroadcastReceiver() {
    override fun onReceive(context: Context, intent: Intent) {
        if (intent.action == AudioManager.ACTION_AUDIO_BECOMING_NOISY) {
            pauseCall()
        }
    }
}
context.registerReceiver(
    noisyReceiver,
    IntentFilter(AudioManager.ACTION_AUDIO_BECOMING_NOISY)
)
```

Production hardening

Reconnect on transient failure

OkHttp’s onFailure fires on network drops. Reconnect with exponential backoff up to 30 s. Do not retry on 4401/4403 codes (auth failure); check response?.code in the failure handler.
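
A minimal backoff sketch for the onFailure handler, assuming auth rejections surface as HTTP 401/403 on the handshake response and that `reconnectAttempts` is a new field on the quickstart class:

```kotlin
private var reconnectAttempts = 0

override fun onFailure(ws: WebSocket, t: Throwable, response: Response?) {
    // Handshake rejected for auth reasons (the 4401/4403 family): don't retry.
    if (response?.code == 401 || response?.code == 403) {
        stop()
        return
    }
    // Exponential backoff: 1 s, 2 s, 4 s, ... capped at 30 s.
    val delayMs = (1000L shl minOf(reconnectAttempts, 5)).coerceAtMost(30_000L)
    reconnectAttempts++
    scope.launch {
        delay(delayMs)
        start() // re-open the WebSocket; reset reconnectAttempts in onOpen
    }
}
```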

Mic mute while agent speaks

Stop AudioRecord briefly on agent_start_talking and restart on agent_stop_talking if device AEC is underperforming. The user’s speech during the agent turn goes undetected, which is usually the right trade-off versus audible self-feedback.
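
A lighter variant of the same idea keeps the recorder running but drops frames while the agent speaks, avoiding startRecording() restart latency. A sketch: wire `handleTalkingEvents` into handleServerEvent and guard `ws.send(...)` in the capture loop with `!agentSpeaking`.

```kotlin
// Half-duplex guard: while the agent talks, keep reading the mic (so the
// buffer doesn't back up) but skip sending the frames upstream.
@Volatile private var agentSpeaking = false

private fun handleTalkingEvents(type: String) {
    when (type) {
        "agent_start_talking" -> agentSpeaking = true
        "agent_stop_talking" -> agentSpeaking = false
    }
}

// In streamMicrophone's read loop:
//   if (n > 0 && !agentSpeaking) { ws.send(payload.toString()) }
```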

Battery

An open WebSocket + active AudioRecord + AudioTrack draws 3–5 % battery per minute. Design session duration accordingly. Always tear down promptly when the user ends the call.

Logging

Attach an interceptor to OkHttp for debugging the handshake. Remove before shipping.
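
For example, with OkHttp’s logging-interceptor artifact (an extra dependency, com.squareup.okhttp3:logging-interceptor, matched to your OkHttp version):

```kotlin
import java.util.concurrent.TimeUnit
import okhttp3.OkHttpClient
import okhttp3.logging.HttpLoggingInterceptor

// Debug builds only: logs the WebSocket upgrade request/response headers.
val client = OkHttpClient.Builder()
    .readTimeout(0, TimeUnit.MILLISECONDS)
    .addInterceptor(HttpLoggingInterceptor().apply {
        level = HttpLoggingInterceptor.Level.HEADERS // avoid BODY for the upgrade
    })
    .build()
```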

Next steps