iOS (Swift)


Native iOS applications integrate with the Atoms agent over the raw WebSocket protocol with zero third-party dependencies.

URLSessionWebSocketTask handles transport and has been available since iOS 13. AVAudioEngine captures microphone PCM16 through an input tap, and AVAudioPlayerNode plays agent audio with sample-accurate scheduling.

Configure AVAudioSession with the .playAndRecord category and .voiceChat mode to enable the system echo cancellation pipeline.

Validated end-to-end on the iOS simulator (iPhone 16 Pro, iOS 26.4): URLSessionWebSocketTask connects, AVAudioEngine input tap captures, AVAudioPlayerNode plays back. On the simulator, speaker output loops back into the Mac microphone, so the server’s VAD fires interruption events continuously; test on a real device (earphones or an HFP Bluetooth headset) to confirm clean barge-in behavior.

When to use native iOS

  • iOS-only app, or a cross-platform app where iOS is the priority platform.
  • You want zero external dependencies for audio and networking.
  • You need fine control over audio session routing (Bluetooth, CarPlay, external mics).

If your app is primarily React Native, the React Native guide is simpler. For Flutter, see Flutter.

Dependencies

None. URLSessionWebSocketTask and AVAudioEngine are part of the iOS SDK. Minimum deployment target: iOS 13.

Permissions

Add to Info.plist:

```xml
<key>NSMicrophoneUsageDescription</key>
<string>We need the microphone to let you talk to the voice agent.</string>
```

Request at runtime before starting a session:

```swift
import AVFoundation

func requestMicrophonePermission() async -> Bool {
    await withCheckedContinuation { continuation in
        AVAudioApplication.requestRecordPermission { granted in
            continuation.resume(returning: granted)
        }
    }
}
```

AVAudioApplication.requestRecordPermission (iOS 17+) replaces the deprecated AVAudioSession.sharedInstance().requestRecordPermission. If you target earlier versions, branch on #available and fall back to the older API.

Audio session setup

The audio session governs capture and playback routing. For a bidirectional voice call, configure it as .playAndRecord with the .voiceChat mode. .voiceChat enables the system’s echo cancellation and noise suppression pipeline.

```swift
import AVFoundation

func configureAudioSession() throws {
    let session = AVAudioSession.sharedInstance()
    try session.setCategory(
        .playAndRecord,
        mode: .voiceChat,
        options: [.allowBluetooth, .defaultToSpeaker]
    )
    try session.setPreferredSampleRate(24_000)
    try session.setPreferredIOBufferDuration(0.02) // 20 ms buffer, low latency
    try session.setActive(true)
}
```

setPreferredSampleRate(24000) asks the hardware to match the rate the server negotiates. The system may not honor it exactly on all devices. If the active sample rate differs, resample before sending or when receiving.
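If the active rate ends up different (say the hardware stays at 48 kHz), the fix is a rate conversion. AVAudioConverter is the production path, but a naive linear-interpolation sketch shows the underlying idea; the function name `resamplePCM16` is illustrative, not an API:

```swift
import Foundation

/// Naive linear-interpolation resampler for mono Int16 PCM.
/// Illustrative only — prefer AVAudioConverter in production.
func resamplePCM16(_ input: [Int16], from srcRate: Double, to dstRate: Double) -> [Int16] {
    guard !input.isEmpty, srcRate > 0, dstRate > 0 else { return [] }
    let ratio = srcRate / dstRate
    let outCount = Int(Double(input.count) / ratio)
    var output = [Int16]()
    output.reserveCapacity(outCount)
    for i in 0..<outCount {
        let pos = Double(i) * ratio
        let idx = Int(pos)
        let frac = pos - Double(idx)
        let a = Double(input[idx])
        let b = Double(input[min(idx + 1, input.count - 1)])
        output.append(Int16((a + (b - a) * frac).rounded()))
    }
    return output
}
```

Downsampling 48 kHz to 24 kHz halves the frame count; interpolation avoids the aliasing artifacts of plain decimation being even worse, though a real converter also low-pass filters.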

Quickstart

A full working session: configure the audio session, open the WebSocket, stream mic PCM16, play agent PCM16, close cleanly.

```swift
import AVFoundation
import Foundation

final class AtomsAgent: NSObject {
    private let apiKey: String
    private let agentId: String
    private let sampleRate: Double = 24_000
    private var webSocketTask: URLSessionWebSocketTask?
    private let audioEngine = AVAudioEngine()
    private var playerNode: AVAudioPlayerNode?
    private var playerFormat: AVAudioFormat?

    init(apiKey: String, agentId: String) {
        self.apiKey = apiKey
        self.agentId = agentId
    }

    func start() async throws {
        try configureAudioSession()
        connectWebSocket()
        try setupPlayback()
        try startMicrophoneTap()
    }

    func stop() {
        audioEngine.inputNode.removeTap(onBus: 0) // remove the tap so a restart can reinstall it
        audioEngine.stop()
        webSocketTask?.cancel(with: .goingAway, reason: nil)
    }
}
```

Open the WebSocket

```swift
private func connectWebSocket() {
    var components = URLComponents(string: "wss://api.smallest.ai/atoms/v1/agent/connect")!
    components.queryItems = [
        URLQueryItem(name: "token", value: apiKey),
        URLQueryItem(name: "agent_id", value: agentId),
        URLQueryItem(name: "mode", value: "webcall"),
        URLQueryItem(name: "sample_rate", value: "24000"),
    ]

    let session = URLSession(configuration: .default)
    webSocketTask = session.webSocketTask(with: components.url!)
    webSocketTask?.resume()
    listenForServerMessages()
}

private func listenForServerMessages() {
    webSocketTask?.receive { [weak self] result in
        switch result {
        case .success(.string(let text)):
            self?.handleServerEvent(text: text)
            self?.listenForServerMessages() // rearm
        case .success(.data(let data)):
            if let text = String(data: data, encoding: .utf8) {
                self?.handleServerEvent(text: text)
            }
            self?.listenForServerMessages()
        case .failure(let error):
            print("ws receive failed: \(error)")
            // handle reconnect or shutdown here
        @unknown default:
            self?.listenForServerMessages()
        }
    }
}
```

URLSessionWebSocketTask.receive is one-shot. Re-call it after every message to keep the stream flowing.

Microphone capture

Install an audio tap on the input node. The tap runs on a high-priority audio thread and hands you an AVAudioPCMBuffer every few milliseconds. Convert it to Int16 PCM and send as base64.

```swift
private func startMicrophoneTap() throws {
    let input = audioEngine.inputNode
    let hwFormat = input.inputFormat(forBus: 0)

    // Tap at the hardware format, resample to 24kHz mono PCM16 before sending.
    let targetFormat = AVAudioFormat(
        commonFormat: .pcmFormatInt16,
        sampleRate: sampleRate,
        channels: 1,
        interleaved: true
    )!
    let converter = AVAudioConverter(from: hwFormat, to: targetFormat)!

    input.installTap(onBus: 0, bufferSize: 1024, format: hwFormat) { [weak self] buffer, _ in
        guard let self, let task = self.webSocketTask else { return }

        // Size the output for this tap buffer, with slack for rate rounding.
        let ratio = targetFormat.sampleRate / hwFormat.sampleRate
        let frameCapacity = AVAudioFrameCount(Double(buffer.frameLength) * ratio) + 16
        guard let converted = AVAudioPCMBuffer(
            pcmFormat: targetFormat,
            frameCapacity: frameCapacity
        ) else { return }

        // Hand the tap buffer to the converter exactly once; returning it on
        // every input callback would duplicate audio.
        var consumed = false
        var error: NSError?
        converter.convert(to: converted, error: &error) { _, outStatus in
            if consumed {
                outStatus.pointee = .noDataNow
                return nil
            }
            consumed = true
            outStatus.pointee = .haveData
            return buffer
        }
        if error != nil { return }

        // Grab the Int16 bytes and base64-encode.
        guard let channelData = converted.int16ChannelData?[0] else { return }
        let byteCount = Int(converted.frameLength) * MemoryLayout<Int16>.size
        let data = Data(bytes: channelData, count: byteCount)
        let payload: [String: Any] = [
            "type": "input_audio_buffer.append",
            "audio": data.base64EncodedString(),
        ]
        guard let json = try? JSONSerialization.data(withJSONObject: payload) else { return }
        task.send(.data(json)) { _ in }
    }

    audioEngine.prepare()
    try audioEngine.start()
}
```

The tap’s closure runs on the audio thread. Keep it short. Do not block on UI updates or synchronous I/O. URLSessionWebSocketTask.send is asynchronous and non-blocking.
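One way to keep that closure minimal is to factor the JSON framing into a pure helper. A sketch (the helper name `makeAudioAppendMessage` is illustrative; the message shape matches the tap code above):

```swift
import Foundation

/// Builds the `input_audio_buffer.append` message for a chunk of Int16 samples.
/// Pure function: safe to call from the audio thread, easy to unit-test.
func makeAudioAppendMessage(_ samples: [Int16]) -> Data? {
    let bytes = samples.withUnsafeBufferPointer { Data(buffer: $0) }
    let payload: [String: Any] = [
        "type": "input_audio_buffer.append",
        "audio": bytes.base64EncodedString(),
    ]
    return try? JSONSerialization.data(withJSONObject: payload)
}
```

The tap closure then reduces to conversion plus `task.send(...)`, and the framing logic can be tested without an audio engine.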

Playback

Schedule incoming PCM16 chunks on an AVAudioPlayerNode. The player node manages its own queue, so you can schedule many buffers in sequence and they play gaplessly.

```swift
private func setupPlayback() throws {
    let player = AVAudioPlayerNode()
    playerNode = player

    // Player node feeds the main mixer, which feeds the output. Connect with
    // deinterleaved Float32 — player nodes do not accept raw Int16 buffers —
    // and convert incoming PCM16 on receipt.
    let format = AVAudioFormat(
        commonFormat: .pcmFormatFloat32,
        sampleRate: sampleRate,
        channels: 1,
        interleaved: false
    )!
    playerFormat = format

    audioEngine.attach(player)
    audioEngine.connect(player, to: audioEngine.mainMixerNode, format: format)
    // Do not call player.play() here: the engine is not running yet.
    // Playback is kicked off lazily in playPCM16 once audio arrives.
}

private func playPCM16(_ data: Data) {
    guard let player = playerNode, let format = playerFormat else { return }

    let frames = AVAudioFrameCount(data.count / MemoryLayout<Int16>.size)
    guard let buffer = AVAudioPCMBuffer(pcmFormat: format, frameCapacity: frames) else { return }
    buffer.frameLength = frames

    // Convert Int16 samples to Float32 in [-1, 1].
    data.withUnsafeBytes { raw in
        guard let src = raw.bindMemory(to: Int16.self).baseAddress,
              let dst = buffer.floatChannelData?[0] else { return }
        for i in 0..<Int(frames) {
            dst[i] = Float(src[i]) / Float(Int16.max)
        }
    }

    player.scheduleBuffer(buffer, completionHandler: nil)
    if !player.isPlaying { player.play() } // engine is running by the time audio arrives
}

private func flushPlayback() {
    playerNode?.stop()
    playerNode?.play()
}
```
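Chunk duration follows directly from the byte count: mono Int16 is two bytes per frame, and seconds = frames / sampleRate. A small helper (the name `pcm16Duration` is illustrative) is handy for metering how much audio is queued on the player:

```swift
import Foundation

/// Duration in seconds of a mono PCM16 chunk at the given sample rate.
func pcm16Duration(byteCount: Int, sampleRate: Double) -> Double {
    let frames = Double(byteCount / MemoryLayout<Int16>.size)
    return frames / sampleRate
}
```

For example, a 4,800-byte delta at 24 kHz is 2,400 frames, i.e. 100 ms of audio.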

Handle server events

```swift
private func handleServerEvent(text: String) {
    guard
        let data = text.data(using: .utf8),
        let json = try? JSONSerialization.jsonObject(with: data) as? [String: Any],
        let type = json["type"] as? String
    else { return }

    switch type {
    case "session.created":
        // update UI on main queue
        break
    case "output_audio.delta":
        if let b64 = json["audio"] as? String,
           let audio = Data(base64Encoded: b64) {
            playPCM16(audio)
        }
    case "agent_start_talking", "agent_stop_talking":
        // update UI state
        break
    case "interruption":
        flushPlayback()
    case "session.closed":
        stop()
    case "error":
        let code = (json["code"] as? String) ?? ""
        let message = (json["message"] as? String) ?? ""
        print("agent error [\(code)]: \(message)")
    default:
        break
    }
}
```
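If you prefer typed decoding over dictionary casts, a minimal Codable sketch covers the same events. The struct and function names are illustrative; the field names match the events handled above:

```swift
import Foundation

/// Typed view of a server event. Optional fields are only present on
/// the event types that carry them.
struct ServerEvent: Decodable {
    let type: String
    let audio: String?    // base64 PCM16, on output_audio.delta
    let code: String?     // on error
    let message: String?  // on error
}

func decodeServerEvent(_ text: String) -> ServerEvent? {
    guard let data = text.data(using: .utf8) else { return nil }
    return try? JSONDecoder().decode(ServerEvent.self, from: data)
}
```

A typed event also makes the switch exhaustive-checkable if you later model `type` as an enum with a fallback case.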

Threading model

  • Audio callbacks (mic tap, player completion) run on a high-priority audio thread. Touch no UI state from there. Dispatch to MainActor for anything the user sees.
  • WebSocket callbacks run on URLSession’s delegate queue. Same rule: no UI on that queue; hop to main for anything visual.
  • handleServerEvent above only parses JSON and calls into the audio pipeline; it touches no UI state, so it is safe to call directly from the WS delegate queue.

A clean pattern:

```swift
@MainActor
final class AgentViewModel: ObservableObject {
    @Published var status: String = "idle"
    let agent: AtomsAgent

    init(agent: AtomsAgent) { self.agent = agent }

    func handleStateChange(_ newStatus: String) {
        status = newStatus // UI update, main actor
    }
}
```

From the audio or WS thread, hop onto the main actor with a task, e.g. Task { @MainActor in viewModel.handleStateChange("connected") }.

Interruption handling

When a phone call comes in or the user triggers Siri, the audio session posts an interruption notification. Pause capture and playback, resume on the “ended” notification.

```swift
import AVFoundation

NotificationCenter.default.addObserver(
    forName: AVAudioSession.interruptionNotification,
    object: nil,
    queue: .main
) { [weak self] notification in
    guard
        let info = notification.userInfo,
        let raw = info[AVAudioSessionInterruptionTypeKey] as? UInt,
        let type = AVAudioSession.InterruptionType(rawValue: raw)
    else { return }

    switch type {
    case .began:
        self?.audioEngine.pause()
    case .ended:
        if let rawOptions = info[AVAudioSessionInterruptionOptionKey] as? UInt,
           AVAudioSession.InterruptionOptions(rawValue: rawOptions).contains(.shouldResume) {
            try? self?.audioEngine.start()
        }
    @unknown default:
        break
    }
}
```

Route changes

Bluetooth connect/disconnect, headphone unplug, and CarPlay activation trigger AVAudioSession.routeChangeNotification. The audio engine handles most transitions transparently. Subscribe if you want to update UI (show “using Bluetooth” indicator, etc.).

Background modes

For calls that continue when the user locks the screen, add to Info.plist:

```xml
<key>UIBackgroundModes</key>
<array>
    <string>audio</string>
</array>
```

Apple’s review expects this to be used for VoIP-style apps. Combine with PushKit and CallKit for a compliant VoIP experience. For short in-app calls that end when backgrounded, skip this and tear down on UIApplication.didEnterBackgroundNotification.

Production hardening

Reconnect on transient failure

Transient network failures surface as URLError codes from the receive callback (.notConnectedToInternet, .timedOut, .networkConnectionLost); server-initiated closes arrive with a WebSocket close code. Retry only on the network-transient cases, never on auth rejections (close codes 4401, 4403) or a clean client close.

```swift
private func onWebSocketClosed(code: URLSessionWebSocketTask.CloseCode, reason: Data?) {
    switch code {
    case .normalClosure, .goingAway:
        return
    default:
        // exponential backoff 500 ms → 30 s
        retry()
    }
}
```
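The backoff schedule in the comment is easy to compute with a pure function. The base and cap below are the 500 ms and 30 s values mentioned above; the function name is illustrative, and production code would typically add random jitter:

```swift
import Foundation

/// Exponential backoff: 500 ms doubling per attempt, capped at 30 s.
func backoffDelay(attempt: Int, base: Double = 0.5, cap: Double = 30.0) -> Double {
    let raw = base * pow(2.0, Double(attempt))
    return min(raw, cap)
}
```

Attempt 0 waits 0.5 s, attempt 3 waits 4 s, and by attempt 6 the delay hits the 30 s cap and stays there.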

Mic mute while agent speaks

To reduce echo when headset AEC underperforms, stop the mic tap on agent_start_talking and reinstall it on agent_stop_talking. The user’s speech during that window goes undetected, which is usually preferable to the agent hearing itself.

Error event from server

The error event from the server carries actionable codes. Surface auth failures (401, 403) to the user immediately and stop retrying.
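A tiny classifier keeps that retry policy in one place. The function name is illustrative; the fatal set below collects the auth codes mentioned in this guide (HTTP 401/403 and WebSocket close codes 4401/4403):

```swift
/// Returns false for auth failures, which must never be retried;
/// other server errors may be transient and are worth a backoff retry.
func shouldRetry(errorCode: String) -> Bool {
    let fatalCodes: Set<String> = ["401", "403", "4401", "4403"]
    return !fatalCodes.contains(errorCode)
}
```

Route the `code` field of the error event through this before scheduling a reconnect, and surface fatal codes to the user instead.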

Next steps