iOS (Swift)


Native iOS applications integrate with the Atoms agent over the raw WebSocket protocol with zero third-party dependencies.

URLSessionWebSocketTask handles transport and has been available since iOS 13. AVAudioEngine captures microphone PCM16 through an input tap, and AVAudioPlayerNode plays agent audio with sample-accurate scheduling.

Configure AVAudioSession with the .playAndRecord category and .voiceChat mode to enable the system echo cancellation pipeline.

Validated end-to-end on the iOS simulator (iPhone 16 Pro, iOS 26.4): URLSessionWebSocketTask connects, AVAudioEngine input tap captures, AVAudioPlayerNode plays back. On the simulator, speaker output loops back into the Mac microphone, so the server’s VAD fires interruption events continuously; test on a real device (earphones or an HFP Bluetooth headset) to confirm clean barge-in behavior.

When to use native iOS

  • iOS-only app, or a cross-platform app where iOS is the priority platform.
  • You want zero external dependencies for audio and networking.
  • You need fine control over audio session routing (Bluetooth, CarPlay, external mics).

If your app is primarily React Native, the React Native guide is simpler. For Flutter, see Flutter.

Dependencies

None. URLSessionWebSocketTask and AVAudioEngine are part of the iOS SDK. Minimum deployment target: iOS 13.

Permissions

Add to Info.plist:

```xml
<key>NSMicrophoneUsageDescription</key>
<string>We need the microphone to let you talk to the voice agent.</string>
```

Request at runtime before starting a session:

```swift
import AVFoundation

func requestMicrophonePermission() async -> Bool {
    await withCheckedContinuation { continuation in
        AVAudioApplication.requestRecordPermission { granted in
            continuation.resume(returning: granted)
        }
    }
}
```

AVAudioApplication.requestRecordPermission (iOS 17+) replaces the deprecated AVAudioSession.sharedInstance().requestRecordPermission. If you target earlier versions, branch on #available and fall back to the older API.

Audio session setup

The audio session governs capture and playback routing. For a bidirectional voice call, configure it as .playAndRecord with the .voiceChat mode. .voiceChat enables the system’s echo cancellation and noise suppression pipeline.

```swift
import AVFoundation

func configureAudioSession() throws {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(
        .playAndRecord,
        mode: .voiceChat,
        options: [.allowBluetooth, .defaultToSpeaker]
    )
    try session.setPreferredSampleRate(24_000)
    try session.setPreferredIOBufferDuration(0.02) // 20 ms buffer, low latency
    try session.setActive(true)
}
```

setPreferredSampleRate(24000) asks the hardware to match the rate the server negotiates. The system may not honor it exactly on all devices. If the active sample rate differs, resample before sending or when receiving.
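If the active rate ends up different (say the hardware stays at 48 kHz), the fix is a rate conversion. AVAudioConverter is the production path, but a naive linear-interpolation sketch shows the underlying idea; the function name `resamplePCM16` is illustrative, not an API:

```swift
import Foundation

/// Naive linear-interpolation resampler for mono Int16 PCM.
/// Illustrative only — prefer AVAudioConverter in production.
func resamplePCM16(_ input: [Int16], from srcRate: Double, to dstRate: Double) -> [Int16] {
    guard !input.isEmpty, srcRate > 0, dstRate > 0 else { return [] }
    let ratio = srcRate / dstRate
    let outCount = Int(Double(input.count) / ratio)
    var output = [Int16]()
    output.reserveCapacity(outCount)
    for i in 0..<outCount {
        let pos = Double(i) * ratio
        let idx = Int(pos)
        let frac = pos - Double(idx)
        let a = Double(input[idx])
        let b = Double(input[min(idx + 1, input.count - 1)])
        output.append(Int16((a + (b - a) * frac).rounded()))
    }
    return output
}
```

Downsampling 48 kHz to 24 kHz halves the frame count; interpolation avoids the aliasing artifacts of plain decimation being even worse, though a real converter also low-pass filters.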

Quickstart

A full working session: configure the audio session, open the WebSocket, stream mic PCM16, play agent PCM16, close cleanly.

```swift
import AVFoundation
import Foundation

final class AtomsAgent: NSObject {
    private let apiKey: String
    private let agentId: String
    private let sampleRate: Double = 24_000
    private var webSocketTask: URLSessionWebSocketTask?
    private let audioEngine = AVAudioEngine()
    private var playerNode: AVAudioPlayerNode?
    private var playerFormat: AVAudioFormat?

    init(apiKey: String, agentId: String) {
        self.apiKey = apiKey
        self.agentId = agentId
    }

    func start() async throws {
        try configureAudioSession()
        connectWebSocket()
        try setupPlayback()
        try startMicrophoneTap()
    }

    func stop() {
        audioEngine.inputNode.removeTap(onBus: 0) // remove the tap so a restart can reinstall it
        audioEngine.stop()
        webSocketTask?.cancel(with: .goingAway, reason: nil)
    }
}
```

Open the WebSocket

```swift
private func connectWebSocket() {
    var components = URLComponents(string: "wss://api.smallest.ai/atoms/v1/agent/connect")!
    components.queryItems = [
        URLQueryItem(name: "token", value: apiKey),
        URLQueryItem(name: "agent_id", value: agentId),
        URLQueryItem(name: "mode", value: "webcall"),
        URLQueryItem(name: "sample_rate", value: "24000"),
    ]

    let session = URLSession(configuration: .default)
    webSocketTask = session.webSocketTask(with: components.url!)
    webSocketTask?.resume()
    listenForServerMessages()
}

private func listenForServerMessages() {
    webSocketTask?.receive { [weak self] result in
        switch result {
        case .success(.string(let text)):
            self?.handleServerEvent(text: text)
            self?.listenForServerMessages() // rearm
        case .success(.data(let data)):
            if let text = String(data: data, encoding: .utf8) {
                self?.handleServerEvent(text: text)
            }
            self?.listenForServerMessages()
        case .failure(let error):
            print("ws receive failed: \(error)")
            // handle reconnect or shutdown here
        @unknown default:
            self?.listenForServerMessages()
        }
    }
}
```

URLSessionWebSocketTask.receive is one-shot. Re-call it after every message to keep the stream flowing.

Microphone capture

Install an audio tap on the input node. The tap runs on a high-priority audio thread and hands you an AVAudioPCMBuffer every few milliseconds. Convert it to Int16 PCM and send as base64.

```swift
private func startMicrophoneTap() throws {
    let input = audioEngine.inputNode
    let hwFormat = input.inputFormat(forBus: 0)

    // Tap at the hardware format, resample to 24kHz mono PCM16 before sending.
    let targetFormat = AVAudioFormat(
        commonFormat: .pcmFormatInt16,
        sampleRate: sampleRate,
        channels: 1,
        interleaved: true
    )!
    let converter = AVAudioConverter(from: hwFormat, to: targetFormat)!

    input.installTap(onBus: 0, bufferSize: 1024, format: hwFormat) { [weak self] buffer, _ in
        guard let self, let task = self.webSocketTask else { return }

        // Size the output for this tap buffer, with slack for rate rounding.
        let ratio = targetFormat.sampleRate / hwFormat.sampleRate
        let frameCapacity = AVAudioFrameCount(Double(buffer.frameLength) * ratio) + 16
        guard let converted = AVAudioPCMBuffer(
            pcmFormat: targetFormat,
            frameCapacity: frameCapacity
        ) else { return }

        // Hand the tap buffer to the converter exactly once; returning it on
        // every input callback would duplicate audio.
        var consumed = false
        var error: NSError?
        converter.convert(to: converted, error: &error) { _, outStatus in
            if consumed {
                outStatus.pointee = .noDataNow
                return nil
            }
            consumed = true
            outStatus.pointee = .haveData
            return buffer
        }
        if error != nil { return }

        // Grab the Int16 bytes and base64-encode.
        guard let channelData = converted.int16ChannelData?[0] else { return }
        let byteCount = Int(converted.frameLength) * MemoryLayout<Int16>.size
        let data = Data(bytes: channelData, count: byteCount)
        let payload: [String: Any] = [
            "type": "input_audio_buffer.append",
            "audio": data.base64EncodedString(),
        ]
        guard let json = try? JSONSerialization.data(withJSONObject: payload) else { return }
        task.send(.data(json)) { _ in }
    }

    audioEngine.prepare()
    try audioEngine.start()
}
```

The tap’s closure runs on the audio thread. Keep it short. Do not block on UI updates or synchronous I/O. URLSessionWebSocketTask.send is asynchronous and non-blocking.
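One way to keep that closure minimal is to factor the JSON framing into a pure helper. A sketch (the helper name `makeAudioAppendMessage` is illustrative; the message shape matches the tap code above):

```swift
import Foundation

/// Builds the `input_audio_buffer.append` message for a chunk of Int16 samples.
/// Pure function: safe to call from the audio thread, easy to unit-test.
func makeAudioAppendMessage(_ samples: [Int16]) -> Data? {
    let bytes = samples.withUnsafeBufferPointer { Data(buffer: $0) }
    let payload: [String: Any] = [
        "type": "input_audio_buffer.append",
        "audio": bytes.base64EncodedString(),
    ]
    return try? JSONSerialization.data(withJSONObject: payload)
}
```

The tap closure then reduces to conversion plus `task.send(...)`, and the framing logic can be tested without an audio engine.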

Playback

Schedule incoming PCM16 chunks on an AVAudioPlayerNode. The player node manages its own queue, so you can schedule many buffers in sequence and they play gaplessly.

```swift
private func setupPlayback() throws {
    let player = AVAudioPlayerNode()
    playerNode = player

    // Player node feeds the main mixer, which feeds the output. Connect with
    // deinterleaved Float32 — player nodes do not accept raw Int16 buffers —
    // and convert incoming PCM16 on receipt.
    let format = AVAudioFormat(
        commonFormat: .pcmFormatFloat32,
        sampleRate: sampleRate,
        channels: 1,
        interleaved: false
    )!
    playerFormat = format

    audioEngine.attach(player)
    audioEngine.connect(player, to: audioEngine.mainMixerNode, format: format)
    // Do not call player.play() here: the engine is not running yet.
    // Playback is kicked off lazily in playPCM16 once audio arrives.
}

private func playPCM16(_ data: Data) {
    guard let player = playerNode, let format = playerFormat else { return }

    let frames = AVAudioFrameCount(data.count / MemoryLayout<Int16>.size)
    guard let buffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: frames) else { return }
    buffer.frameLength = frames

    // Convert Int16 samples to Float32 in [-1, 1].
    data.withUnsafeBytes { raw in
        guard let src = raw.bindMemory(to: Int16.self).baseAddress,
              let dst = buffer.floatChannelData?[0] else { return }
        for i in 0..<Int(frames) {
            dst[i] = Float(src[i]) / Float(Int16.max)
        }
    }

    player.scheduleBuffer(buffer, completionHandler: nil)
    if !player.isPlaying { player.play() } // engine is running by the time audio arrives
}

private func flushPlayback() {
    playerNode?.stop()
    playerNode?.play()
}
```
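Chunk duration follows directly from the byte count: mono Int16 is two bytes per frame, and seconds = frames / sampleRate. A small helper (the name `pcm16Duration` is illustrative) is handy for metering how much audio is queued on the player:

```swift
import Foundation

/// Duration in seconds of a mono PCM16 chunk at the given sample rate.
func pcm16Duration(byteCount: Int, sampleRate: Double) -> Double {
    let frames = Double(byteCount / MemoryLayout<Int16>.size)
    return frames / sampleRate
}
```

For example, a 4,800-byte delta at 24 kHz is 2,400 frames, i.e. 100 ms of audio.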

Handle server events

```swift
private func handleServerEvent(text: String) {
    guard
        let data = text.data(using: .utf8),
        let json = try? JSONSerialization.jsonObject(with: data) as? [String: Any],
        let type = json["type"] as? String
    else { return }

    switch type {
    case "session.created":
        // update UI on main queue
        break
    case "output_audio.delta":
        if let b64 = json["audio"] as? String,
           let audio = Data(base64Encoded: b64) {
            playPCM16(audio)
        }
    case "agent_start_talking", "agent_stop_talking":
        // update UI state
        break
    case "interruption":
        flushPlayback()
    case "session.closed":
        stop()
    case "error":
        let code = (json["code"] as? String) ?? ""
        let message = (json["message"] as? String) ?? ""
        print("agent error [\(code)]: \(message)")
    default:
        break
    }
}
```
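If you prefer typed decoding over dictionary casts, a minimal Codable sketch covers the same events. The struct and function names are illustrative; the field names match the events handled above:

```swift
import Foundation

/// Typed view of a server event. Optional fields are only present on
/// the event types that carry them.
struct ServerEvent: Decodable {
    let type: String
    let audio: String?    // base64 PCM16, on output_audio.delta
    let code: String?     // on error
    let message: String?  // on error
}

func decodeServerEvent(_ text: String) -> ServerEvent? {
    guard let data = text.data(using: .utf8) else { return nil }
    return try? JSONDecoder().decode(ServerEvent.self, from: data)
}
```

A typed event also makes the switch exhaustive-checkable if you later model `type` as an enum with a fallback case.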

Threading model

  • Audio callbacks (mic tap, player completion) run on a high-priority audio thread. Touch no UI state from there. Dispatch to MainActor for anything the user sees.
  • WebSocket callbacks run on URLSession’s delegate queue. Same rule: no UI on that queue; hop to main for anything visual.
  • handleServerEvent above only parses JSON and calls into the audio pipeline; it touches no UI state, so it is safe to call directly from the WS delegate queue.

A clean pattern:

```swift
@MainActor
final class AgentViewModel: ObservableObject {
    @Published var status: String = "idle"
    let agent: AtomsAgent

    init(agent: AtomsAgent) { self.agent = agent }

    func handleStateChange(_ newStatus: String) {
        status = newStatus // UI update, main actor
    }
}
```

From the audio or WS thread, hop onto the main actor with a task, e.g. Task { @MainActor in viewModel.handleStateChange("connected") }.

Interruption handling

When a phone call comes in or the user triggers Siri, the audio session posts an interruption notification. Pause capture and playback, resume on the “ended” notification.

```swift
import AVFoundation

NotificationCenter.default.addObserver(
    forName: AVAudioSession.interruptionNotification,
    object: nil,
    queue: .main
) { [weak self] notification in
    guard
        let info = notification.userInfo,
        let raw = info[AVAudioSessionInterruptionTypeKey] as? UInt,
        let type = AVAudioSession.InterruptionType(rawValue: raw)
    else { return }

    switch type {
    case .began:
        self?.audioEngine.pause()
    case .ended:
        if let rawOptions = info[AVAudioSessionInterruptionOptionKey] as? UInt,
           AVAudioSession.InterruptionOptions(rawValue: rawOptions).contains(.shouldResume) {
            try? self?.audioEngine.start()
        }
    @unknown default:
        break
    }
}
```

Route changes

Bluetooth connect/disconnect, headphone unplug, and CarPlay activation trigger AVAudioSession.routeChangeNotification. The audio engine handles most transitions transparently. Subscribe if you want to update UI (show “using Bluetooth” indicator, etc.).

Background modes

For calls that continue when the user locks the screen, add to Info.plist:

```xml
<key>UIBackgroundModes</key>
<array>
    <string>audio</string>
</array>
```

Apple’s review expects this to be used for VoIP-style apps. Combine with PushKit and CallKit for a compliant VoIP experience. For short in-app calls that end when backgrounded, skip this and tear down on UIApplication.didEnterBackgroundNotification.

Production hardening

Reconnect on transient failure

Transient network failures surface as URLError codes from the receive callback (.notConnectedToInternet, .timedOut, .networkConnectionLost); server-initiated closes arrive with a WebSocket close code. Retry only on the network-transient cases, never on auth rejections (close codes 4401, 4403) or a clean client close.

```swift
private func onWebSocketClosed(code: URLSessionWebSocketTask.CloseCode, reason: Data?) {
    switch code {
    case .normalClosure, .goingAway:
        return
    default:
        // exponential backoff 500 ms → 30 s
        retry()
    }
}
```
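The backoff schedule in the comment is easy to compute with a pure function. The base and cap below are the 500 ms and 30 s values mentioned above; the function name is illustrative, and production code would typically add random jitter:

```swift
import Foundation

/// Exponential backoff: 500 ms doubling per attempt, capped at 30 s.
func backoffDelay(attempt: Int, base: Double = 0.5, cap: Double = 30.0) -> Double {
    let raw = base * pow(2.0, Double(attempt))
    return min(raw, cap)
}
```

Attempt 0 waits 0.5 s, attempt 3 waits 4 s, and by attempt 6 the delay hits the 30 s cap and stays there.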

Mic mute while agent speaks

To reduce echo when headset AEC underperforms, stop the mic tap on agent_start_talking and reinstall it on agent_stop_talking. The user’s speech during that window goes undetected, which is usually preferable to the agent hearing itself.

Error event from server

The error event from the server carries actionable codes. Surface auth failures (401, 403) to the user immediately and stop retrying.
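A tiny classifier keeps that retry policy in one place. The function name is illustrative; the fatal set below collects the auth codes mentioned in this guide (HTTP 401/403 and WebSocket close codes 4401/4403):

```swift
/// Returns false for auth failures, which must never be retried;
/// other server errors may be transient and are worth a backoff retry.
func shouldRetry(errorCode: String) -> Bool {
    let fatalCodes: Set<String> = ["401", "403", "4401", "4403"]
    return !fatalCodes.contains(errorCode)
}
```

Route the `code` field of the error event through this before scheduling a reconnect, and surface fatal codes to the user instead.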

Next steps