Flutter


Flutter applications integrate with the Atoms agent over the raw WebSocket protocol.

The Dart web_socket_channel package handles transport. mic_stream captures microphone PCM16, and flutter_pcm_sound plays agent audio with low-latency scheduling.

The stack is WebRTC-free. No LiveKit, no Daily, no platform-specific media engines.

Validated end-to-end on the iOS simulator: WebSocket connects, mic captures, agent audio plays back. On the simulator, speaker output loops back into the Mac microphone, so the server’s VAD fires interruption events continuously; test on a real device (earphones or an HFP Bluetooth headset) to confirm clean barge-in behavior. mic_stream on iOS does not configure the audio session for voice chat, so you get no echo cancellation out of the box. See the iOS audio session section.

When to use Flutter

  • Cross-platform mobile or desktop app with a shared Dart codebase.
  • You want a single audio pipeline that works across iOS, Android, and desktop targets.
  • You do not need character-level TTS alignment timings.

For single-platform native apps, the iOS (Swift) guide gives you full platform control with fewer intermediaries.

Dependencies

```yaml
# pubspec.yaml
dependencies:
  web_socket_channel: ^3.0.2
  mic_stream: ^0.7.1
  flutter_pcm_sound: ^2.1.0
  permission_handler: ^11.3.1
```
| Package | Role | Why this one |
| --- | --- | --- |
| `web_socket_channel` | Dart WebSocket client | Official Dart team package, supports both IO and HTML platforms, streams-based API. |
| `mic_stream` | Microphone PCM16 capture | Exposes a raw Int16 stream at a configurable sample rate. Works on iOS and Android. |
| `flutter_pcm_sound` | PCM16 playback | Purpose-built for realtime PCM playback. No buffering layer, no format conversion overhead. |
| `permission_handler` | Cross-platform runtime permissions | Single API for iOS NSMicrophoneUsageDescription prompts and Android RECORD_AUDIO flow. |

Platform configuration

iOS

Add to ios/Runner/Info.plist:

```xml
<key>NSMicrophoneUsageDescription</key>
<string>We need the microphone to let you talk to the voice agent.</string>
```

Android

Add to android/app/src/main/AndroidManifest.xml:

```xml
<uses-permission android:name="android.permission.INTERNET" />
<uses-permission android:name="android.permission.RECORD_AUDIO" />
<uses-permission android:name="android.permission.MODIFY_AUDIO_SETTINGS" />
```

Minimum Android SDK should be 24 (Android 7) for mic_stream compatibility. Set in android/app/build.gradle:

```groovy
defaultConfig {
    minSdkVersion 24
}
```

Request at runtime

```dart
import 'package:permission_handler/permission_handler.dart';

Future<bool> ensureMicPermission() async {
  final status = await Permission.microphone.request();
  return status.isGranted;
}
```

Quickstart

A full agent session: check permission, open the WebSocket, stream mic audio, play agent audio, clean up.

```dart
import 'dart:async';
import 'dart:convert';
import 'dart:typed_data';

import 'package:flutter/material.dart';
import 'package:flutter_pcm_sound/flutter_pcm_sound.dart';
import 'package:mic_stream/mic_stream.dart';
import 'package:web_socket_channel/web_socket_channel.dart';

class VoiceAgentScreen extends StatefulWidget {
  const VoiceAgentScreen({super.key});
  @override
  State<VoiceAgentScreen> createState() => _VoiceAgentScreenState();
}

class _VoiceAgentScreenState extends State<VoiceAgentScreen> {
  static const apiKey = 'sk_...';
  static const agentId = '...';
  static const sampleRate = 24000;

  WebSocketChannel? _channel;
  StreamSubscription<Uint8List>? _micSub;
  bool _connected = false;

  Future<void> _start() async {
    if (!await ensureMicPermission()) return;

    final uri = Uri.parse(
      'wss://api.smallest.ai/atoms/v1/agent/connect'
      '?token=${Uri.encodeComponent(apiKey)}'
      '&agent_id=${Uri.encodeComponent(agentId)}'
      '&mode=webcall'
      '&sample_rate=$sampleRate',
    );

    _channel = WebSocketChannel.connect(uri);
    _channel!.stream.listen(
      _handleServerEvent,
      onDone: _stop,
      onError: (_) => _stop(),
    );

    await FlutterPcmSound.setup(sampleRate: sampleRate, channelCount: 1);
    FlutterPcmSound.start();

    await _startMicStream();

    setState(() => _connected = true);
  }

  Future<void> _stop() async {
    await _micSub?.cancel();
    await FlutterPcmSound.release();
    await _channel?.sink.close();
    _channel = null;
    if (mounted) setState(() => _connected = false);
  }

  @override
  void dispose() {
    _stop();
    super.dispose();
  }

  @override
  Widget build(BuildContext context) {
    return Scaffold(
      body: Center(
        child: _connected
            ? ElevatedButton(onPressed: _stop, child: const Text('End call'))
            : ElevatedButton(onPressed: _start, child: const Text('Start call')),
      ),
    );
  }
}
```

Microphone capture

```dart
Future<void> _startMicStream() async {
  final isIOS = Theme.of(context).platform == TargetPlatform.iOS;
  final micStream = MicStream.microphone(
    // mic_stream's iOS plugin only supports AudioSource.DEFAULT. Passing
    // VOICE_COMMUNICATION on iOS crashes with a nil force-unwrap.
    audioSource: isIOS ? AudioSource.DEFAULT : AudioSource.VOICE_COMMUNICATION,
    sampleRate: sampleRate,
    channelConfig: ChannelConfig.CHANNEL_IN_MONO,
    audioFormat: AudioFormat.ENCODING_PCM_16BIT,
  );

  _micSub = micStream.listen((Uint8List bytes) {
    if (_channel == null) return;
    _channel!.sink.add(jsonEncode({
      'type': 'input_audio_buffer.append',
      'audio': base64Encode(bytes),
    }));
  });
}
```

MicStream.microphone returns a Stream<Uint8List> synchronously (not a Future); don’t await it.

On Android, AudioSource.VOICE_COMMUNICATION selects the platform’s echo-cancelled audio path. On iOS, mic_stream uses AVCaptureSession with a default capture device and does not configure AVAudioSession for voice chat. For iOS echo cancellation on a real device, configure the audio session yourself via the audio_session package (category .playAndRecord, mode .voiceChat) before starting the stream, or mute the mic while the agent speaks (see Mic mute while agent speaks).

Server events

```dart
void _handleServerEvent(dynamic raw) {
  final ev = jsonDecode(raw as String) as Map<String, dynamic>;
  switch (ev['type']) {
    case 'session.created':
      // update UI
      break;
    case 'output_audio.delta':
      final bytes = base64Decode(ev['audio'] as String);
      final byteData =
          bytes.buffer.asByteData(bytes.offsetInBytes, bytes.lengthInBytes);
      FlutterPcmSound.feed(PcmArrayInt16(bytes: byteData));
      break;
    case 'agent_start_talking':
      // UI: show "speaking" indicator
      break;
    case 'agent_stop_talking':
      // UI: hide "speaking" indicator
      break;
    case 'interruption':
      // No public flush API in flutter_pcm_sound. The residual buffer
      // (~100 ms at 24 kHz) will play out. Stop feeding and wait for
      // the next agent_start_talking.
      break;
    case 'session.closed':
      _stop();
      break;
    case 'error':
      debugPrint('agent error [${ev['code']}]: ${ev['message']}');
      break;
  }
}
```

FlutterPcmSound.feed queues the chunk for playback. Internally the plugin manages a ring buffer on the platform side and drains it at the hardware sample rate, so you can push chunks as fast as they arrive.

Platform differences

iOS audio session

mic_stream does not configure AVAudioSession. It uses AVCaptureSession directly, so you get whatever the system default category is (usually .soloAmbient), and no echo cancellation. Configure the session yourself with audio_session before starting the stream:

```dart
import 'package:audio_session/audio_session.dart';

final session = await AudioSession.instance;
await session.configure(const AudioSessionConfiguration(
  avAudioSessionCategory: AVAudioSessionCategory.playAndRecord,
  avAudioSessionMode: AVAudioSessionMode.voiceChat,
  avAudioSessionCategoryOptions:
      AVAudioSessionCategoryOptions.defaultToSpeaker |
      AVAudioSessionCategoryOptions.allowBluetooth,
));
await session.setActive(true);
```

.playAndRecord + .voiceChat enables the iOS system AEC pipeline and the voice-chat audio mode. Without it, the agent hears its own audio through the mic and the server’s VAD fires continuous interruption events. If your app uses other audio plugins (for example, just_audio for media playback), coordinate their session categories through the same audio_session package; two plugins fighting over the session will cause one to silence the other.

Android foreground service

If the call continues when the app is backgrounded, start a foreground service on the native Android side. mic_stream will continue capturing briefly when backgrounded but Android 12+ will revoke mic access within seconds without a foreground service declaring the phoneCall type. See the Android foreground services for voice calls reference for the service implementation.

Flutter-side, trigger the service from your MainActivity or via a plugin like flutter_background_service.
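A minimal Flutter-side sketch of the platform-channel trigger, assuming a hypothetical channel name and method that your MainActivity registers (both names are placeholders, not part of any plugin):

```dart
import 'package:flutter/foundation.dart';
import 'package:flutter/services.dart';

// Hypothetical channel; must match the MethodChannel your MainActivity
// registers before it can start the native foreground service.
const _serviceChannel = MethodChannel('com.example.voice/foreground_service');

Future<void> startCallService() async {
  try {
    await _serviceChannel.invokeMethod<void>('startForegroundService');
  } on PlatformException catch (e) {
    debugPrint('failed to start foreground service: ${e.message}');
  }
}
```

Call it from `_start()` before opening the WebSocket so the service is already running when capture begins.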

Desktop support

mic_stream and flutter_pcm_sound currently target mobile only. Desktop targets (macOS, Windows, Linux) need flutter_webrtc or platform-channel bridges. If you need desktop today, keep web_socket_channel for the WebSocket transport and write platform-channel code for capture and playback.
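The Dart side of such a bridge can stay small. This sketch assumes hypothetical channel names (`com.example.voice/mic_frames`, `com.example.voice/pcm_out`) whose native desktop hosts you would have to implement yourself:

```dart
import 'dart:typed_data';
import 'package:flutter/services.dart';

// Hypothetical channels; the native macOS/Windows/Linux host must register
// matching channels, stream PCM16 frames over the EventChannel, and accept
// PCM16 buffers on the MethodChannel.
const _captureEvents = EventChannel('com.example.voice/mic_frames');
const _playbackChannel = MethodChannel('com.example.voice/pcm_out');

// Drop-in replacement for MicStream.microphone on desktop.
Stream<Uint8List> desktopMicFrames() =>
    _captureEvents.receiveBroadcastStream().cast<Uint8List>();

// Drop-in replacement for FlutterPcmSound.feed on desktop.
Future<void> desktopPlay(Uint8List pcm16) =>
    _playbackChannel.invokeMethod('feed', pcm16);
```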

Threading and isolates

  • WebSocketChannel events arrive on the main isolate.
  • MicStream.microphone delivers its Uint8List frames on the main isolate as well.
  • FlutterPcmSound.feed is fast (enqueues to a native buffer) but avoid calling it from a blocking UI build method.

For CPU-intensive preprocessing (resampling, denoising beyond what the platform provides), use compute() or a dedicated isolate. The baseline pipeline shown here does not need one.
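If you do add preprocessing, keep it off the main isolate. A sketch using `compute()` for naive 2:1 decimation (a placeholder transform; a real resampler should low-pass filter first to avoid aliasing):

```dart
import 'dart:typed_data';
import 'package:flutter/foundation.dart';

// Top-level function so compute() can run it in a separate isolate.
// Drops every other sample: 48 kHz -> 24 kHz, host-endian PCM16.
Uint8List decimatePcm16(Uint8List bytes) {
  final input =
      bytes.buffer.asInt16List(bytes.offsetInBytes, bytes.lengthInBytes ~/ 2);
  final output = Int16List(input.length ~/ 2);
  for (var i = 0; i < output.length; i++) {
    output[i] = input[i * 2];
  }
  return output.buffer.asUint8List();
}

// Usage inside the mic listener:
// final downsampled = await compute(decimatePcm16, bytes);
```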

Interruption handling

Incoming phone calls and other audio-focus events interrupt the stream. On Android, subscribe to audio focus via a platform channel or the audio_session plugin. On iOS, the operating system pauses mic_stream automatically and resumes after the interruption ends.

```dart
import 'package:audio_session/audio_session.dart';

Future<void> _installInterruptionHandler() async {
  final session = await AudioSession.instance;
  await session.configure(const AudioSessionConfiguration.speech());
  session.interruptionEventStream.listen((event) {
    if (event.begin) {
      _stop();
    }
  });
}
```

Call _installInterruptionHandler once during app startup.

Production hardening

Reconnect on transient failure

The onError/onDone callbacks on the WebSocket stream fire when the connection drops. Retry with exponential backoff up to 30 s for transient network errors. Do not retry on close codes 1000, 4401, 4403.

```dart
int _retryMs = 500;

void _onWebSocketClosed(int? code) {
  if (code == 1000 || code == 4401 || code == 4403) return;
  Future.delayed(Duration(milliseconds: _retryMs), _start);
  _retryMs = (_retryMs * 2).clamp(500, 30000);
}
```
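One way to wire the handler into the quickstart's listener, so the close code reaches it and the backoff resets once traffic arrives (a sketch; `WebSocketChannel.closeCode` is null until the socket has closed):

```dart
_channel = WebSocketChannel.connect(uri);
_channel!.stream.listen(
  (raw) {
    _retryMs = 500; // server traffic means the connection is healthy
    _handleServerEvent(raw);
  },
  // closeCode is populated by the time onDone fires.
  onDone: () => _onWebSocketClosed(_channel?.closeCode),
  onError: (_) => _onWebSocketClosed(null),
);
```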

Mic mute while agent speaks

If your target devices have weaker echo cancellation, cancel the mic subscription on agent_start_talking and restart it on agent_stop_talking. The user's speech during the agent turn goes undetected, but that is a safer trade-off than an audible feedback loop.
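A minimal sketch of this mute strategy, using hypothetical helper names and the `_micSub` and `_startMicStream` members from the quickstart; call them from the agent_start_talking and agent_stop_talking cases in `_handleServerEvent`:

```dart
// Cancel capture entirely while the agent talks...
Future<void> _muteMic() async {
  await _micSub?.cancel();
  _micSub = null;
}

// ...and re-create the mic stream when it finishes.
Future<void> _unmuteMic() async {
  if (_micSub == null) await _startMicStream();
}
```

Cancelling and re-creating the stream, rather than pausing the subscription, avoids any buffered frames from the muted interval being delivered on resume.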

App lifecycle

Use WidgetsBindingObserver to tear down on background transitions:

```dart
class _VoiceAgentScreenState extends State<VoiceAgentScreen>
    with WidgetsBindingObserver {
  @override
  void initState() {
    super.initState();
    WidgetsBinding.instance.addObserver(this);
  }

  @override
  void didChangeAppLifecycleState(AppLifecycleState state) {
    if (state == AppLifecycleState.paused) _stop();
  }

  @override
  void dispose() {
    WidgetsBinding.instance.removeObserver(this);
    super.dispose();
  }
}
```

Battery

The pipeline draws 3–5 % battery per minute on mobile, comparable to the native implementations. Do not ship features that keep the session open while idle.

Next steps