Voice Agent (Electron + Pulse + Lightning)
Voice Agent (Electron + Pulse + Lightning)
This cookbook wires together all three Smallest AI products to build a working voice agent:
The same pattern underlies most production voice agents — customer support, sales calls, voice-driven UIs. Each piece is independently optimizable; this guide shows the minimum viable wiring.
If you want a full voice-agent platform with built-in telephony, campaigns, knowledge base, and call analytics, see Atoms — it’s built on top of this exact stack. Use this cookbook when you want to build the pipeline yourself.
Architecture
Streaming STT. Audio chunks in → partial + final transcripts out. Supports 17 streaming languages with regional aggregator auto-detection.
Chat completions + tool calling. Generates a filler phrase before tool calls so the user hears natural speech while tools run.
Streaming TTS. 44.1 kHz audio, ~200 ms TTFB, 12 TTS languages including Indic. See the model card for the full latency profile.
End-to-end flow
- Capture audio from the user (mic, telephony, WebRTC, etc.).
- Send to Pulse WebSocket in real-time. Receive partial transcripts as the user speaks, and a final transcript when they pause (end-of-utterance).
- Send the final transcript to Electron as a
usermessage in your ongoing conversation. Stream the response. - As Electron streams content, feed it to Lightning for TTS immediately — don’t wait for the full response.
- If Electron returns
tool_calls: the filler incontentis spoken via Lightning while you run the tool in parallel. When the tool returns, append the tool result and continue the conversation. - Play Lightning’s audio back to the user (speaker, telephony, WebRTC).
Minimal implementation (Python)
This is a sketch — production code needs proper async coordination, jitter handling, and error recovery, but the wiring shape is real.
The voice-agent latency win
This is the part to internalize: Electron’s filler phrase + parallel tool execution.
When the user asks “What’s my account balance?”, this is what happens in milliseconds:
Without the filler-phrase pattern, the user would hear silence from 0 ms to ~1100 ms. With it, they hear natural speech start at ~600 ms — feels conversational instead of robotic.
Production checklist
- Stream everything. Pulse WebSocket for STT, Electron
stream: truefor LLM, Lightning streaming for TTS. Any non-streaming hop adds hundreds of milliseconds. -
stream_options.include_usage: trueon Electron so you bill accurately on disconnects. - Run tool calls in parallel with TTS — never serialize the filler-then-tool path.
- Capture
X-Request-Idfrom every Electron response for support traceability. - Match Pulse
language, Electron prompts, and Lightning voice language. A Hindi caller should be transcribed in Hindi, prompted in Hindi, and synthesized with a Hindi voice. Don’t translate in the middle of the pipeline — Electron and Lightning both handle Indic natively. - Cap conversation history so prompts don’t grow unbounded. Truncate older turns once you exceed your token budget; prefix caching keeps recent turns cheap.
- Set a per-utterance timeout (~10 s on the full STT→LLM→TTS round). Voice users won’t wait longer; better to fall back to “Sorry, can you repeat?” than to hang silently.

