Voice Agent (Electron + Pulse + Lightning)

This cookbook wires together all three Smallest AI products to build a working voice agent:

Mic audio  →  Pulse (STT)  →  Electron (LLM + tools)  →  Lightning (TTS)  →  Speaker

The same pattern underlies most production voice agents — customer support, sales calls, voice-driven UIs. Each piece is independently optimizable; this guide shows the minimum viable wiring.

If you want a full voice-agent platform with built-in telephony, campaigns, knowledge base, and call analytics, see Atoms — it’s built on top of this exact stack. Use this cookbook when you want to build the pipeline yourself.

Architecture

Pulse

Streaming STT. Audio chunks in → partial + final transcripts out. Supports 17 streaming languages with regional aggregator auto-detection.

Electron

Chat completions + tool calling. Generates a filler phrase before tool calls so the user hears natural speech while tools run.

Lightning

Streaming TTS. 44.1 kHz audio, ~200 ms TTFB, 12 TTS languages including Indic. See the model card for the full latency profile.

End-to-end flow

Capture audio from the user (mic, telephony, WebRTC, etc.).
Send to Pulse WebSocket in real-time. Receive partial transcripts as the user speaks, and a final transcript when they pause (end-of-utterance).
Send the final transcript to Electron as a user message in your ongoing conversation. Stream the response.
As Electron streams content, feed it to Lightning for TTS immediately — don’t wait for the full response.
If Electron returns tool_calls, the filler phrase in content is spoken via Lightning while you run the tool in parallel. When the tool returns, append the tool result to the conversation and call Electron again to get the final spoken response.
Play Lightning’s audio back to the user (speaker, telephony, WebRTC).
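One way to feed streamed text to TTS without waiting for the full response is to release it sentence by sentence. The sketch below assumes your TTS session prefers phrase-sized inputs over single tokens, which this guide does not state; PhraseBuffer and its method names are hypothetical, not part of any Smallest AI SDK.

```python
import re

# Split after sentence-ending punctuation that is followed by whitespace.
# The Devanagari danda is included for Hindi text.
_BOUNDARY = re.compile(r"(?<=[.!?।])\s+")


class PhraseBuffer:
    """Collects streamed text deltas and releases complete sentences."""

    def __init__(self):
        self._pending = ""

    def push(self, delta: str) -> list[str]:
        """Add a delta; return any sentences that are now complete."""
        self._pending += delta
        parts = _BOUNDARY.split(self._pending)
        # The last part may still be mid-sentence, so keep it buffered.
        self._pending = parts.pop()
        return [p for p in parts if p]

    def flush(self) -> str:
        """Return whatever is left once the stream ends."""
        rest, self._pending = self._pending.strip(), ""
        return rest


buf = PhraseBuffer()
first = buf.push("Let me check")          # nothing complete yet
second = buf.push(" your account. One")   # one full sentence released
third = buf.push(" moment")
tail = buf.flush()                        # remainder at end of stream
```

Each string returned by push() can be handed to the TTS call as soon as it appears; call flush() once the LLM stream ends so the last fragment is not dropped.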

Minimal implementation (Python)

This is a sketch — production code needs proper async coordination, jitter handling, and error recovery, but the wiring shape is real.

import asyncio
import json
import os

import websockets
from openai import AsyncOpenAI

SMALLEST = os.environ["SMALLEST_API_KEY"]

# Async client, so streaming a completion does not block the event loop
# that is also running the Pulse WebSocket.
client = AsyncOpenAI(
    base_url="https://api.smallest.ai/waves/v1",
    api_key=SMALLEST,
)

# Helpers this sketch calls but does not define; supply your own:
#   tts_speak_delta(text)               async; forwards text to Lightning streaming TTS
#   accumulate_tool_calls(acc, deltas)  merges streamed tool-call fragments into acc
#   run_tool(name, args)                blocking; returns a JSON-serializable result

SYSTEM_PROMPT = """You are a friendly customer support agent for Acme Corp.
Keep responses concise: under three sentences.
Use the get_account_info tool when the user asks about their account.
"""

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_account_info",
            "description": "Look up account information for the current caller.",
            "parameters": {
                "type": "object",
                "properties": {
                    "account_id": {"type": "string"},
                },
                "required": ["account_id"],
            },
        },
    }
]

conversation = [{"role": "system", "content": SYSTEM_PROMPT}]


async def transcribe_loop(audio_in, transcripts_out):
    """Stream audio to Pulse, push final transcripts onto the queue."""
    url = "wss://api.smallest.ai/waves/v1/pulse/get_text?language=en"
    # `additional_headers` is the keyword in recent websockets releases;
    # the legacy client calls it `extra_headers`.
    async with websockets.connect(
        url, additional_headers={"Authorization": f"Bearer {SMALLEST}"}
    ) as ws:
        async def send():
            async for chunk in audio_in:
                await ws.send(chunk)

        async def recv():
            async for msg in ws:
                event = json.loads(msg)
                # Pulse WS frames: {"type": "transcription", "is_final": true|false, "transcript": "..."}
                # See /waves/documentation/speech-to-text-pulse/realtime-web-socket/response-format
                if event.get("type") == "transcription" and event.get("is_final"):
                    await transcripts_out.put(event["transcript"])

        # Send and receive concurrently on the same socket.
        await asyncio.gather(send(), recv())


async def respond(user_text):
    """Send to Electron, stream tokens. Returns the assistant message + any tool calls."""
    conversation.append({"role": "user", "content": user_text})

    stream = await client.chat.completions.create(
        model="electron",
        messages=conversation,
        tools=TOOLS,
        stream=True,
        stream_options={"include_usage": True},
    )

    content = ""
    tool_calls = []   # accumulated
    async for chunk in stream:
        # The usage-only chunk at the end of the stream has no choices.
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta
        if delta.content:
            content += delta.content
            # 🎙️ feed delta.content to Lightning for TTS immediately
            await tts_speak_delta(delta.content)
        if delta.tool_calls:
            # accumulate tool_calls (function name + arguments stream as deltas)
            accumulate_tool_calls(tool_calls, delta.tool_calls)

    # Whatever Electron emitted gets recorded into history.
    assistant_msg = {"role": "assistant", "content": content or None}
    if tool_calls:
        assistant_msg["tool_calls"] = tool_calls
    conversation.append(assistant_msg)
    return assistant_msg


async def handle_tools(assistant_msg):
    """Run any tool calls, append results, re-prompt Electron."""
    if not assistant_msg.get("tool_calls"):
        return

    for call in assistant_msg["tool_calls"]:
        args = json.loads(call["function"]["arguments"])
        # run_tool is your implementation; it is assumed to be blocking,
        # so run it in a worker thread to keep the event loop free.
        result = await asyncio.to_thread(run_tool, call["function"]["name"], args)
        conversation.append({
            "role": "tool",
            "tool_call_id": call["id"],
            "content": json.dumps(result),
        })

    # Re-prompt with tool results; this gives the final spoken response.
    await respond_continuation()


async def respond_continuation():
    """Stream the follow-up answer. Further tool calls are ignored in this sketch."""
    stream = await client.chat.completions.create(
        model="electron",
        messages=conversation,
        tools=TOOLS,
        stream=True,
        stream_options={"include_usage": True},
    )
    content = ""
    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            delta = chunk.choices[0].delta.content
            content += delta
            await tts_speak_delta(delta)
    conversation.append({"role": "assistant", "content": content})


async def turn_loop(audio_in):
    """One iteration per user utterance."""
    transcripts = asyncio.Queue()
    # Keep a reference so the task is not garbage-collected mid-run.
    stt_task = asyncio.create_task(transcribe_loop(audio_in, transcripts))

    try:
        while True:
            user_text = await transcripts.get()
            assistant_msg = await respond(user_text)
            await handle_tools(assistant_msg)
    finally:
        stt_task.cancel()
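The listing calls accumulate_tool_calls without defining it. A minimal sketch of that helper follows, assuming Electron streams tool calls in the OpenAI-compatible delta shape (an index per call, an id on the first fragment, and function.name / function.arguments arriving as string pieces). That shape is an assumption here, so check it against the Electron tool-calling reference. SimpleNamespace stands in for the SDK's delta objects so the example runs offline.

```python
from types import SimpleNamespace


def accumulate_tool_calls(acc, deltas):
    """Merge streamed tool-call fragments into `acc`, a list of dicts."""
    for d in deltas:
        # Each fragment carries the index of the call it belongs to.
        while len(acc) <= d.index:
            acc.append({
                "id": None,
                "type": "function",
                "function": {"name": "", "arguments": ""},
            })
        call = acc[d.index]
        if d.id:
            call["id"] = d.id
        if d.function is not None:
            # Name and arguments arrive as string pieces; concatenate them.
            if d.function.name:
                call["function"]["name"] += d.function.name
            if d.function.arguments:
                call["function"]["arguments"] += d.function.arguments


def frag(index, id=None, name=None, arguments=None):
    """Build a fake delta fragment for the demo below."""
    fn = SimpleNamespace(name=name, arguments=arguments)
    return SimpleNamespace(index=index, id=id, function=fn)


calls = []
accumulate_tool_calls(calls, [frag(0, id="call_1", name="get_account_info", arguments="")])
accumulate_tool_calls(calls, [frag(0, arguments='{"account_id": ')])
accumulate_tool_calls(calls, [frag(0, arguments='"A1"}')])
```

After the stream ends, each entry in calls has the same shape the listing stores under assistant_msg["tool_calls"], and its arguments string can be passed to json.loads.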

The voice-agent latency win

This is the part to internalize: Electron’s filler phrase + parallel tool execution.

When the user asks “What’s my account balance?”, the timeline looks roughly like this:

Time	Event
0 ms	User finishes speaking; Pulse emits final transcript
~5 ms	Transcript sent to Electron
~250 ms	Electron emits first `delta.content` — “Let me check your account…”
~250 ms	Lightning starts TTSing the filler immediately
~280 ms	Electron emits `delta.tool_calls` — `get_account_info(...)`
~280 ms	You start the tool call in parallel with TTS
~600 ms	User hears “Let me check your account…” through their speaker
~800 ms	Tool returns the balance
~850 ms	Electron emits final response — “Your balance is ₹12,450.”
~900 ms	Lightning TTSes the final response
~1200 ms	User hears the answer

Without the filler-phrase pattern, the user would hear silence from 0 ms to ~1100 ms. With it, they hear natural speech starting at ~600 ms, which feels conversational rather than robotic.
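The parallelism behind that timeline can be illustrated with plain asyncio. In this sketch speak_filler and lookup_account are hypothetical stand-ins that only sleep, and the durations are made up for the demo; the point is that the total wait is close to the slower of the two tasks, not their sum.

```python
import asyncio
import time


async def speak_filler():
    # Stand-in for streaming the filler phrase through TTS.
    await asyncio.sleep(0.2)
    return "filler spoken"


async def lookup_account():
    # Stand-in for a slow account lookup.
    await asyncio.sleep(0.3)
    return {"balance": 12450}


async def filler_and_tool():
    start = time.perf_counter()
    # Both coroutines start right away and run concurrently.
    spoken, result = await asyncio.gather(speak_filler(), lookup_account())
    elapsed = time.perf_counter() - start
    return spoken, result, elapsed


spoken, result, elapsed = asyncio.run(filler_and_tool())
print(f"{spoken}; tool returned {result} after {elapsed:.2f}s")
```

Run serially, the same two steps would take about the sum of both sleeps; gathered, the tool's latency is hidden behind the filler audio.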

Production checklist

Stream everything. Pulse WebSocket for STT, Electron stream: true for LLM, Lightning streaming for TTS. Any non-streaming hop adds hundreds of milliseconds.
Set stream_options.include_usage: true on Electron so you can bill accurately on disconnects.
Run tool calls in parallel with TTS — never serialize the filler-then-tool path.
Capture X-Request-Id from every Electron response for support traceability.
Match Pulse language, Electron prompts, and Lightning voice language. A Hindi caller should be transcribed in Hindi, prompted in Hindi, and synthesized with a Hindi voice. Don’t translate in the middle of the pipeline — Electron and Lightning both handle Indic natively.
Cap conversation history so prompts don’t grow unbounded. Truncate older turns once you exceed your token budget; prefix caching keeps recent turns cheap.
Set a per-utterance timeout (~10 s on the full STT→LLM→TTS round trip). Voice users won’t wait longer; better to fall back to “Sorry, can you repeat?” than to hang silently.
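The last two checklist items can be sketched in a few lines. Two simplifications are assumed: trim_history caps history by message count as a rough proxy for a token budget, and with_timeout wraps a whole turn in asyncio.wait_for. Both function names and the default limits are hypothetical.

```python
import asyncio

FALLBACK = "Sorry, can you repeat?"


def trim_history(messages, max_turn_messages=20):
    """Keep system messages plus the most recent non-system messages."""
    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    recent = rest[-max_turn_messages:]
    # Don't start the window on a tool result whose tool call was cut off.
    while recent and recent[0]["role"] == "tool":
        recent.pop(0)
    return system + recent


async def with_timeout(turn, seconds=10.0):
    """Await one turn; return the fallback phrase if it takes too long."""
    try:
        return await asyncio.wait_for(turn, timeout=seconds)
    except asyncio.TimeoutError:
        return FALLBACK


async def slow_turn():
    # Stand-in for an STT -> LLM -> TTS round trip that hangs.
    await asyncio.sleep(1.0)
    return "late answer"


history = [{"role": "system", "content": "s"}] + [
    {"role": "user", "content": str(i)} for i in range(5)
]
trimmed = trim_history(history, max_turn_messages=2)
reply = asyncio.run(with_timeout(slow_turn(), seconds=0.05))
```

In the pipeline above, you would pass trim_history(conversation) as messages and speak the fallback phrase through Lightning when with_timeout returns it.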

Pulse Realtime quickstart

Streaming STT setup.

Lightning Streaming

Streaming TTS setup.

Electron Tool Calling

Filler-phrase pattern in depth.

Atoms — managed voice agents

Skip the wiring and use the platform.