Android (Kotlin)

Native Android applications integrate with the Atoms agent over the raw WebSocket protocol. OkHttp handles transport, and the platform AudioRecord and AudioTrack classes handle PCM16 capture and playback.

Initialize AudioRecord with the VOICE_COMMUNICATION audio source to engage the platform’s acoustic echo cancellation and noise suppression.

Minimum supported version is Android 7 (API 24), comfortably above OkHttp 4’s Android 5.0 (API 21) floor.

Validated end-to-end on a Pixel 9 emulator (Android API 35): OkHttp WebSocket connects, AudioRecord streams PCM16 with VOICE_COMMUNICATION for AEC coupling, and AudioTrack plays back agent audio in MODE_STREAM with USAGE_MEDIA (the STREAM_VOICE_CALL path is system-controlled and inaudible on emulators). Verify foreground service and Bluetooth route behavior on physical devices if those flows matter for your app.

When to use native Android

  • Android-only app, or a cross-platform app where Android is the priority.
  • You need fine control over the audio pipeline (specific sample rates, buffer sizes, AEC routing).
  • You need proper foreground-service handling for calls that continue when the app is backgrounded.

If your app is primarily React Native, see the React Native guide. For Flutter, see Flutter.

Dependencies

```kotlin
// build.gradle.kts (app)
dependencies {
    implementation("com.squareup.okhttp3:okhttp:4.12.0")
    implementation("org.jetbrains.kotlinx:kotlinx-coroutines-android:1.8.1")
}
```

| Dependency | Role | Why this one |
| --- | --- | --- |
| OkHttp | WebSocket client | Ubiquitous in Android, production-hardened, handles reconnect/backoff primitives. Alternative: Ktor client. |
| kotlinx-coroutines | Async orchestration | Bridges OkHttp’s callback model to suspending functions cleanly. |
| AudioRecord (platform) | Microphone PCM16 capture | Zero-dependency, gives raw Int16 access. |
| AudioTrack (platform) | PCM16 playback | Same. Streaming mode consumes your buffer at the device sample rate. |

Manifest permissions

```xml
<manifest xmlns:android="http://schemas.android.com/apk/res/android">
    <uses-permission android:name="android.permission.INTERNET" />
    <uses-permission android:name="android.permission.RECORD_AUDIO" />
    <uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />

    <!-- If you support background calls, add the foreground service perms: -->
    <uses-permission android:name="android.permission.FOREGROUND_SERVICE" />
    <uses-permission android:name="android.permission.FOREGROUND_SERVICE_PHONE_CALL" />
</manifest>
```

Request RECORD_AUDIO at runtime:

```kotlin
private val permissionLauncher = registerForActivityResult(
    ActivityResultContracts.RequestPermission()
) { granted -> onMicPermission(granted) }

private fun ensureMicPermission() {
    val status = ContextCompat.checkSelfPermission(this, Manifest.permission.RECORD_AUDIO)
    if (status == PackageManager.PERMISSION_GRANTED) onMicPermission(true)
    else permissionLauncher.launch(Manifest.permission.RECORD_AUDIO)
}
```

Audio mode

Set AudioManager.MODE_IN_COMMUNICATION for the duration of the call. This signals the Android audio HAL that a bidirectional voice session is active; combined with MediaRecorder.AudioSource.VOICE_COMMUNICATION on the capture side, it enables the hardware AEC and NS pipeline. Restore the mode in the finally block of your session teardown.

```kotlin
val audioManager = getSystemService(Context.AUDIO_SERVICE) as AudioManager
val previousMode = audioManager.mode
audioManager.mode = AudioManager.MODE_IN_COMMUNICATION

try {
    runAgentSession()
} finally {
    audioManager.mode = previousMode
}
```

Playback routing is handled separately by the AudioTrack’s AudioAttributes (see Playback). The quickstart uses USAGE_MEDIA on the player, which routes to the main speaker by default on both emulators and physical devices, so there is no need to toggle isSpeakerphoneOn.

Quickstart

A full agent session: open the WebSocket, start mic capture, play agent audio, tear down.

```kotlin
import kotlinx.coroutines.*
import okhttp3.*
import okio.ByteString
import java.util.concurrent.TimeUnit

class AtomsAgent(
    private val apiKey: String,
    private val agentId: String,
) {
    private val sampleRate = 24_000
    private val client = OkHttpClient.Builder()
        .readTimeout(0, TimeUnit.MILLISECONDS) // unlimited for long-lived WS
        .build()

    // Own the scope: the microphone coroutine is launched asynchronously from
    // OkHttp's onOpen callback, so it must outlive any caller stack frame.
    // A short-lived scope (for example, one passed in from lifecycleScope.launch
    // whose lambda returns immediately) would be cancelled before onOpen fires,
    // and the mic coroutine would silently no-op.
    private val scope = CoroutineScope(Dispatchers.Default + SupervisorJob())

    private var webSocket: WebSocket? = null
    private var micJob: Job? = null
    private val player = AudioPlayer(sampleRate)

    fun start() {
        val url = HttpUrl.Builder()
            .scheme("https") // OkHttp wraps wss:// via https://
            .host("api.smallest.ai")
            .addPathSegments("atoms/v1/agent/connect")
            .addQueryParameter("token", apiKey)
            .addQueryParameter("agent_id", agentId)
            .addQueryParameter("mode", "webcall")
            .addQueryParameter("sample_rate", sampleRate.toString())
            .build()

        val request = Request.Builder().url(url).build()

        webSocket = client.newWebSocket(request, object : WebSocketListener() {
            override fun onOpen(ws: WebSocket, response: Response) {
                micJob = scope.launch { streamMicrophone(ws) }
                player.start()
            }
            override fun onMessage(ws: WebSocket, text: String) = handleServerEvent(text)
            override fun onMessage(ws: WebSocket, bytes: ByteString) = handleServerEvent(bytes.utf8())
            override fun onClosing(ws: WebSocket, code: Int, reason: String) { stop() }
            override fun onFailure(ws: WebSocket, t: Throwable, r: Response?) { stop() }
        })
    }

    fun stop() {
        micJob?.cancel()
        player.stop()
        webSocket?.close(1000, "client stop")
        webSocket = null
        scope.cancel()
    }
}
```

Microphone capture

```kotlin
import android.media.AudioFormat
import android.media.AudioRecord
import android.media.MediaRecorder
import android.util.Base64
import org.json.JSONObject

private suspend fun streamMicrophone(ws: WebSocket) {
    val channelConfig = AudioFormat.CHANNEL_IN_MONO
    val encoding = AudioFormat.ENCODING_PCM_16BIT
    val minBuffer = AudioRecord.getMinBufferSize(sampleRate, channelConfig, encoding)
    val bufferSize = (minBuffer * 2).coerceAtLeast(4096)

    val record = AudioRecord(
        MediaRecorder.AudioSource.VOICE_COMMUNICATION, // enables platform AEC + NS
        sampleRate,
        channelConfig,
        encoding,
        bufferSize,
    )

    val chunk = ByteArray(bufferSize)
    try {
        record.startRecording()
        while (currentCoroutineContext().isActive) {
            val n = record.read(chunk, 0, chunk.size)
            if (n > 0) {
                val audio = Base64.encodeToString(chunk, 0, n, Base64.NO_WRAP)
                val payload = JSONObject().apply {
                    put("type", "input_audio_buffer.append")
                    put("audio", audio)
                }
                ws.send(payload.toString())
            }
        }
    } finally {
        record.stop()
        record.release()
    }
}
```

MediaRecorder.AudioSource.VOICE_COMMUNICATION routes capture through the platform’s AEC/NS pipeline. Without it, the agent will hear its own output through the microphone and start looping.
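
If you want to verify or reinforce that behavior on a particular device, the platform also exposes software AEC/NS effects that can be attached to the record session. A minimal sketch, assuming the `record` instance from the capture loop above (many devices already apply hardware AEC for VOICE_COMMUNICATION, so treat this as belt-and-braces):

```kotlin
import android.media.AudioRecord
import android.media.audiofx.AcousticEchoCanceler
import android.media.audiofx.NoiseSuppressor

// Attach software AEC/NS to the capture session where the device supports it.
// Safe to call right after constructing the AudioRecord in streamMicrophone.
fun attachCaptureEffects(record: AudioRecord) {
    if (AcousticEchoCanceler.isAvailable()) {
        AcousticEchoCanceler.create(record.audioSessionId)?.setEnabled(true)
    }
    if (NoiseSuppressor.isAvailable()) {
        NoiseSuppressor.create(record.audioSessionId)?.setEnabled(true)
    }
}
```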

Playback

AudioTrack in MODE_STREAM accepts writes as fast as you can feed it and plays at the hardware sample rate. Run it on a dedicated thread and queue chunks from the WebSocket callback.

Use USAGE_MEDIA (not USAGE_VOICE_COMMUNICATION) for the AudioTrack. Even though this is a voice call, USAGE_VOICE_COMMUNICATION routes to the STREAM_VOICE_CALL stream, which is system-controlled: its volume is not settable by a normal app and it is silent on emulators. Capture still uses MediaRecorder.AudioSource.VOICE_COMMUNICATION for AEC coupling, which is what matters for echo cancellation.

```kotlin
import android.media.AudioAttributes
import android.media.AudioFormat
import android.media.AudioTrack
import java.util.concurrent.LinkedBlockingQueue
import java.util.concurrent.TimeUnit

class AudioPlayer(private val sampleRate: Int) {
    private val channelConfig = AudioFormat.CHANNEL_OUT_MONO
    private val encoding = AudioFormat.ENCODING_PCM_16BIT
    private val minBuffer = AudioTrack.getMinBufferSize(sampleRate, channelConfig, encoding)

    private val queue = LinkedBlockingQueue<ByteArray>()
    @Volatile private var running = false
    private var thread: Thread? = null
    private var track: AudioTrack? = null

    fun start() {
        running = true
        track = AudioTrack.Builder()
            .setAudioAttributes(
                AudioAttributes.Builder()
                    .setUsage(AudioAttributes.USAGE_MEDIA)
                    .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH)
                    .build()
            )
            .setAudioFormat(
                AudioFormat.Builder()
                    .setSampleRate(sampleRate)
                    .setChannelMask(channelConfig)
                    .setEncoding(encoding)
                    .build()
            )
            .setBufferSizeInBytes(minBuffer * 4)
            .setTransferMode(AudioTrack.MODE_STREAM)
            .build()
            .also { it.play() }

        thread = Thread {
            while (running) {
                val chunk = try {
                    queue.poll(50, TimeUnit.MILLISECONDS)
                } catch (_: InterruptedException) {
                    null
                } ?: continue
                track?.write(chunk, 0, chunk.size)
            }
        }.also { it.start() }
    }

    fun enqueue(pcm: ByteArray) { queue.offer(pcm) }

    fun flush() {
        queue.clear()
        track?.flush()
    }

    fun stop() {
        running = false
        thread?.join()
        track?.stop(); track?.release(); track = null
    }
}
```

Handle server events

```kotlin
import android.util.Log

private fun handleServerEvent(text: String) {
    val json = JSONObject(text)
    when (json.optString("type")) {
        "session.created" -> { /* update UI on main thread */ }
        "output_audio.delta" -> {
            val pcm = Base64.decode(json.getString("audio"), Base64.NO_WRAP)
            player.enqueue(pcm)
        }
        "agent_start_talking" -> { /* UI: show "speaking" */ }
        "agent_stop_talking" -> { /* UI: hide "speaking" */ }
        "interruption" -> player.flush()
        "session.closed" -> stop()
        "error" -> Log.e("Atoms", "[${json.optString("code")}] ${json.optString("message")}")
    }
}
```

Threading model

  • OkHttp WebSocketListener callbacks run on OkHttp’s internal executor. Do not block them. All UI work must cross to the main looper via Handler(Looper.getMainLooper()).post { ... } or a coroutine on Dispatchers.Main (see the sketch after this list).
  • AudioRecord.read in a loop must run off the main thread. Use a background coroutine as shown in the quickstart.
  • AudioTrack.write is a blocking call when the internal buffer is full. Run it on its own thread (as shown) to avoid stalling your capture loop.
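
A minimal sketch of hopping from an OkHttp callback to the main thread, as referenced above. `showAgentSpeaking` is a hypothetical UI hook; `scope` is the session scope from the quickstart:

```kotlin
import android.os.Handler
import android.os.Looper

private val mainHandler = Handler(Looper.getMainLooper())

// Called from handleServerEvent, which runs on OkHttp's executor.
private fun onAgentStartTalking() {
    // Post to the main looper directly...
    mainHandler.post { showAgentSpeaking(true) }
    // ...or, equivalently, hop to Dispatchers.Main from the session scope:
    // scope.launch(Dispatchers.Main) { showAgentSpeaking(true) }
}
```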

Audio focus

If the user is playing music or on another call, request audio focus before starting:

```kotlin
val focusRequest = AudioFocusRequest.Builder(AudioManager.AUDIOFOCUS_GAIN_TRANSIENT_EXCLUSIVE)
    .setAudioAttributes(
        AudioAttributes.Builder()
            .setUsage(AudioAttributes.USAGE_VOICE_COMMUNICATION)
            .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH)
            .build()
    )
    .setAcceptsDelayedFocusGain(false)
    .setOnAudioFocusChangeListener { change ->
        when (change) {
            AudioManager.AUDIOFOCUS_LOSS,
            AudioManager.AUDIOFOCUS_LOSS_TRANSIENT -> stop()
        }
    }
    .build()

val result = audioManager.requestAudioFocus(focusRequest)
if (result != AudioManager.AUDIOFOCUS_REQUEST_GRANTED) {
    // another app has exclusive focus; don't start the call
}
```

Abandon focus in stop().
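
For example, assuming the focusRequest built above is kept as a nullable field on the session object:

```kotlin
fun stop() {
    // ... existing teardown from the quickstart ...
    focusRequest?.let { audioManager.abandonAudioFocusRequest(it) } // API 26+
}
```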

Background calls

For calls that continue when the user backgrounds the app, run the agent in a foreground service. Without this, Android will silently starve your mic capture on Android 12+.

```kotlin
// Declared in AndroidManifest.xml:
// <service
//     android:name=".AgentForegroundService"
//     android:foregroundServiceType="phoneCall"
//     android:exported="false" />

class AgentForegroundService : Service() {
    override fun onStartCommand(intent: Intent?, flags: Int, startId: Int): Int {
        val notification = buildOngoingCallNotification()
        startForeground(NOTIFICATION_ID, notification, ServiceInfo.FOREGROUND_SERVICE_TYPE_PHONE_CALL)
        return START_STICKY
    }

    override fun onBind(intent: Intent?): IBinder? = null

    // ... delegate to AtomsAgent from here ...
}
```

The phoneCall foreground service type requires the FOREGROUND_SERVICE_PHONE_CALL manifest permission (Android 14+).
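
Starting and stopping the service from your UI layer might look like the following sketch; the helper names are illustrative:

```kotlin
import android.content.Context
import android.content.Intent
import androidx.core.content.ContextCompat

// Promote the call to a foreground service when it starts, and stop it on hangup.
fun startAgentService(context: Context) {
    ContextCompat.startForegroundService(
        context, Intent(context, AgentForegroundService::class.java)
    )
}

fun stopAgentService(context: Context) {
    context.stopService(Intent(context, AgentForegroundService::class.java))
}
```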

Interruption handling

Incoming phone calls and other communication apps revoke audio focus. Handle AUDIOFOCUS_LOSS in the listener as shown above and tear down cleanly. On AUDIOFOCUS_GAIN after a transient loss, decide whether to auto-resume or prompt the user.
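
One possible shape for that listener, tracking whether the loss was transient; pauseCall and resumeCall are hypothetical hooks:

```kotlin
@Volatile private var pausedByFocusLoss = false

private val focusListener = AudioManager.OnAudioFocusChangeListener { change ->
    when (change) {
        AudioManager.AUDIOFOCUS_LOSS -> stop()
        AudioManager.AUDIOFOCUS_LOSS_TRANSIENT -> {
            pausedByFocusLoss = true
            pauseCall() // hypothetical: mute mic, pause playback
        }
        AudioManager.AUDIOFOCUS_GAIN -> {
            if (pausedByFocusLoss) {
                pausedByFocusLoss = false
                resumeCall() // hypothetical: or prompt the user instead
            }
        }
    }
}
```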

Bluetooth route changes

Bluetooth headsets connect and disconnect during calls routinely. AudioRecord and AudioTrack switch routes transparently on most devices. You may want to observe AudioManager.ACTION_AUDIO_BECOMING_NOISY to pause the call if wired headphones are unplugged:

```kotlin
val noisyReceiver = object : BroadcastReceiver() {
    override fun onReceive(context: Context, intent: Intent) {
        if (intent.action == AudioManager.ACTION_AUDIO_BECOMING_NOISY) {
            pauseCall()
        }
    }
}
context.registerReceiver(
    noisyReceiver,
    IntentFilter(AudioManager.ACTION_AUDIO_BECOMING_NOISY)
)
```

Production hardening

Reconnect on transient failure

OkHttp’s onFailure fires on network drops. Reconnect with exponential backoff up to 30 s. Do not retry on 4401/4403 codes (auth failure); check response?.code in the failure handler.
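
A minimal backoff sketch for the onFailure handler, assuming auth rejections surface as HTTP 401/403 on the handshake response and that `reconnectAttempts` is a new field on the quickstart class:

```kotlin
private var reconnectAttempts = 0

override fun onFailure(ws: WebSocket, t: Throwable, response: Response?) {
    // Handshake rejected for auth reasons (the 4401/4403 family): don't retry.
    if (response?.code == 401 || response?.code == 403) {
        stop()
        return
    }
    // Exponential backoff: 1 s, 2 s, 4 s, ... capped at 30 s.
    val delayMs = (1000L shl minOf(reconnectAttempts, 5)).coerceAtMost(30_000L)
    reconnectAttempts++
    scope.launch {
        delay(delayMs)
        start() // re-open the WebSocket; reset reconnectAttempts in onOpen
    }
}
```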

Mic mute while agent speaks

Stop AudioRecord briefly on agent_start_talking and restart on agent_stop_talking if device AEC is underperforming. The user’s speech during the agent turn goes undetected, which is usually the right trade-off versus audible self-feedback.
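
A lighter variant of the same idea keeps the recorder running but drops frames while the agent speaks, avoiding startRecording() restart latency. A sketch: wire `handleTalkingEvents` into handleServerEvent and guard `ws.send(...)` in the capture loop with `!agentSpeaking`.

```kotlin
// Half-duplex guard: while the agent talks, keep reading the mic (so the
// buffer doesn't back up) but skip sending the frames upstream.
@Volatile private var agentSpeaking = false

private fun handleTalkingEvents(type: String) {
    when (type) {
        "agent_start_talking" -> agentSpeaking = true
        "agent_stop_talking" -> agentSpeaking = false
    }
}

// In streamMicrophone's read loop:
//   if (n > 0 && !agentSpeaking) { ws.send(payload.toString()) }
```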

Battery

An open WebSocket + active AudioRecord + AudioTrack draws 3–5 % battery per minute. Design session duration accordingly. Always tear down promptly when the user ends the call.

Logging

Attach an interceptor to OkHttp for debugging the handshake. Remove before shipping.
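
For example, with OkHttp’s logging-interceptor artifact (an extra dependency, com.squareup.okhttp3:logging-interceptor, matched to your OkHttp version):

```kotlin
import java.util.concurrent.TimeUnit
import okhttp3.OkHttpClient
import okhttp3.logging.HttpLoggingInterceptor

// Debug builds only: logs the WebSocket upgrade request/response headers.
val client = OkHttpClient.Builder()
    .readTimeout(0, TimeUnit.MILLISECONDS)
    .addInterceptor(HttpLoggingInterceptor().apply {
        level = HttpLoggingInterceptor.Level.HEADERS // avoid BODY for the upgrade
    })
    .build()
```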

Next steps