Voice Agent (Electron + Pulse + Lightning)

This cookbook wires together all three Smallest AI products to build a working voice agent:

Mic audio → Pulse (STT) → Electron (LLM + tools) → Lightning (TTS) → Speaker

The same pattern underlies most production voice agents — customer support, sales calls, voice-driven UIs. Each piece is independently optimizable; this guide shows the minimum viable wiring.

If you want a full voice-agent platform with built-in telephony, campaigns, knowledge base, and call analytics, see Atoms — it’s built on top of this exact stack. Use this cookbook when you want to build the pipeline yourself.

Architecture

Pulse

Streaming STT. Audio chunks in → partial + final transcripts out. Supports 38 languages with auto-detection.

Electron

Chat completions + tool calling. Generates a filler phrase before tool calls so the user hears natural speech while tools run.

Lightning

Streaming TTS. 44.1 kHz audio, ~200 ms time to first byte (TTFB), 12 languages including Indic languages. See the model card for the full latency profile.
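In message terms, Electron's filler-then-tool behavior leaves a conversation history shaped roughly like the one below. It uses the OpenAI-compatible chat format that the implementation in this cookbook sends to Electron. The tool-call ID, account number, and result payload are made-up placeholders, not real Electron output.

```python
# Conversation history after one tool round (illustrative values only).
history = [
    {"role": "system", "content": "You are a friendly customer support agent for Acme Corp."},
    {"role": "user", "content": "What's my account balance?"},
    # Electron's first streamed turn: a spoken filler plus a tool call.
    {
        "role": "assistant",
        "content": "Let me check your account...",
        "tool_calls": [
            {
                "id": "call_123",  # placeholder ID
                "type": "function",
                "function": {
                    "name": "get_account_info",
                    # Arguments arrive as a JSON-encoded string, not a dict.
                    "arguments": '{"account_id": "ACME-42"}',
                },
            }
        ],
    },
    # Your tool result, linked back to the call by tool_call_id.
    {"role": "tool", "tool_call_id": "call_123", "content": '{"balance": "12,450"}'},
    # Electron's follow-up turn after being re-prompted with the result.
    {"role": "assistant", "content": "Your balance is ₹12,450."},
]

roles = [m["role"] for m in history]
print(roles)
```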

End-to-end flow

  1. Capture audio from the user (mic, telephony, WebRTC, etc.).
  2. Stream the audio to the Pulse WebSocket in real time. Receive partial transcripts as the user speaks, and a final transcript when they pause (end-of-utterance).
  3. Send the final transcript to Electron as a user message in your ongoing conversation. Stream the response.
  4. As Electron streams content, feed it to Lightning for TTS immediately — don’t wait for the full response.
  5. If Electron returns tool_calls, the filler phrase in content is spoken via Lightning while you run the tool in parallel. When the tool returns, append its result to the conversation and re-prompt Electron for the final answer.
  6. Play Lightning’s audio back to the user (speaker, telephony, WebRTC).
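Step 2 depends on telling partial transcripts apart from final ones. A minimal sketch of that routing, assuming the Pulse frame shape noted in the implementation's code comment (`{"type": "transcription", "is_final": ..., "transcript": ...}`); `route_pulse_frame` and its callbacks are hypothetical names, not part of any SDK. Partials can drive live captions, while only finals are sent to Electron.

```python
import json


def route_pulse_frame(raw, on_partial, on_final):
    """Dispatch one Pulse WebSocket frame (hypothetical helper).

    Assumes frames shaped like
    {"type": "transcription", "is_final": bool, "transcript": str}.
    Frames of any other type are ignored.
    """
    event = json.loads(raw)
    if event.get("type") != "transcription":
        return
    text = event.get("transcript", "")
    if event.get("is_final"):
        on_final(text)    # end of utterance: hand off to Electron
    else:
        on_partial(text)  # user still speaking: e.g. update live captions


# Usage with two hand-written sample frames.
partials, finals = [], []
frames = [
    '{"type": "transcription", "is_final": false, "transcript": "what is my"}',
    '{"type": "transcription", "is_final": true, "transcript": "what is my balance"}',
]
for frame in frames:
    route_pulse_frame(frame, partials.append, finals.append)
```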

Minimal implementation (Python)

This is a sketch — production code needs proper async coordination, jitter handling, and error recovery, but the wiring shape is real.

```python
import asyncio
import json
import os

import websockets  # websockets >= 14: connect() takes additional_headers
from openai import AsyncOpenAI

SMALLEST = os.environ["SMALLEST_API_KEY"]

# Async client: a blocking stream would stall the Pulse receive loop,
# which runs on the same event loop.
client = AsyncOpenAI(
    base_url="https://api.smallest.ai/waves/v1",
    api_key=SMALLEST,
)

SYSTEM_PROMPT = """You are a friendly customer support agent for Acme Corp.
Keep responses concise — under three sentences.
Use the get_account_info tool when the user asks about their account.
"""

TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_account_info",
            "description": "Look up account information for the current caller.",
            "parameters": {
                "type": "object",
                "properties": {
                    "account_id": {"type": "string"},
                },
                "required": ["account_id"],
            },
        },
    }
]

conversation = [{"role": "system", "content": SYSTEM_PROMPT}]


# --- Your implementations (placeholders, not part of any SDK) ---

async def tts_speak_delta(text):
    """Send a text fragment to Lightning streaming TTS.

    Should enqueue audio for playback and return quickly, so speech
    keeps playing while tools run.
    """
    raise NotImplementedError


def run_tool(name, args):
    """Execute the named tool and return a JSON-serializable result."""
    raise NotImplementedError


async def transcribe_loop(audio_in, transcripts_out):
    """Stream audio to Pulse, push final transcripts onto the queue.

    audio_in is an async iterable of raw audio chunks (bytes).
    """
    url = "wss://api.smallest.ai/waves/v1/pulse/get_text?language=en"
    async with websockets.connect(
        url, additional_headers={"Authorization": f"Bearer {SMALLEST}"}
    ) as ws:
        async def send():
            async for chunk in audio_in:
                await ws.send(chunk)

        async def recv():
            async for msg in ws:
                event = json.loads(msg)
                # Pulse WS frames: {"type": "transcription", "is_final": true|false, "transcript": "..."}
                # See /waves/documentation/speech-to-text-pulse/realtime-web-socket/response-format
                if event.get("type") == "transcription" and event.get("is_final"):
                    await transcripts_out.put(event["transcript"])

        await asyncio.gather(send(), recv())


def accumulate_tool_calls(tool_calls, deltas):
    """Merge streamed tool-call fragments into complete calls, keyed by index."""
    for d in deltas:
        while len(tool_calls) <= d.index:
            tool_calls.append(
                {"id": "", "type": "function", "function": {"name": "", "arguments": ""}}
            )
        call = tool_calls[d.index]
        if d.id:
            call["id"] = d.id
        if d.function and d.function.name:
            call["function"]["name"] += d.function.name
        if d.function and d.function.arguments:
            call["function"]["arguments"] += d.function.arguments


async def respond(user_text=None):
    """Stream one Electron turn, speaking content as it arrives.

    Appends the user message (if any) and the assistant message to history,
    and returns the assistant message, including any tool calls.
    """
    if user_text is not None:
        conversation.append({"role": "user", "content": user_text})

    stream = await client.chat.completions.create(
        model="electron",
        messages=conversation,
        tools=TOOLS,
        stream=True,
        stream_options={"include_usage": True},
    )

    content = ""
    tool_calls = []
    async for chunk in stream:
        if not chunk.choices:  # the final usage-only chunk has no choices
            continue
        delta = chunk.choices[0].delta
        if delta.content:
            content += delta.content
            # Feed each fragment (possibly the filler phrase) to Lightning immediately.
            await tts_speak_delta(delta.content)
        if delta.tool_calls:
            # Function name and arguments stream in as fragments.
            accumulate_tool_calls(tool_calls, delta.tool_calls)

    # Whatever Electron emitted gets recorded into history.
    assistant_msg = {"role": "assistant", "content": content or None}
    if tool_calls:
        assistant_msg["tool_calls"] = tool_calls
    conversation.append(assistant_msg)
    return assistant_msg


async def handle_tools(assistant_msg):
    """Run tool calls, append results, and re-prompt Electron until it stops calling tools."""
    while assistant_msg.get("tool_calls"):
        for call in assistant_msg["tool_calls"]:
            args = json.loads(call["function"]["arguments"] or "{}")
            # Run in a worker thread so the event loop keeps streaming TTS audio.
            result = await asyncio.to_thread(run_tool, call["function"]["name"], args)
            conversation.append({
                "role": "tool",
                "tool_call_id": call["id"],
                "content": json.dumps(result),
            })

        # Re-prompt with tool results; this produces the final spoken response.
        assistant_msg = await respond()


async def turn_loop(audio_in):
    """One iteration per user utterance."""
    transcripts = asyncio.Queue()
    # Keep a reference so the background task is not garbage-collected.
    stt_task = asyncio.create_task(transcribe_loop(audio_in, transcripts))

    while True:
        user_text = await transcripts.get()
        assistant_msg = await respond(user_text)
        await handle_tools(assistant_msg)
```

The voice-agent latency win

This is the part to internalize: Electron’s filler phrase + parallel tool execution.

When the user asks “What’s my account balance?”, this is the sequence of events, with approximate timings:

| Time | Event |
|---|---|
| 0 ms | User finishes speaking; Pulse emits the final transcript |
| ~5 ms | Transcript sent to Electron |
| ~250 ms | Electron emits the first delta.content: “Let me check your account…” |
| ~250 ms | Lightning starts synthesizing the filler immediately |
| ~280 ms | Electron emits delta.tool_calls: get_account_info(...) |
| ~280 ms | You start the tool call in parallel with TTS |
| ~600 ms | User hears “Let me check your account…” through their speaker |
| ~800 ms | Tool returns the balance |
| ~850 ms | Electron emits the final response: “Your balance is ₹12,450.” |
| ~900 ms | Lightning synthesizes the final response |
| ~1200 ms | User hears the answer |

Without the filler-phrase pattern, the user would hear silence from 0 ms to ~1100 ms. With it, natural speech starts at ~600 ms, which feels conversational rather than robotic.
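The gain comes entirely from overlap: the filler plays while the tool runs, so the wait is the longer of the two instead of their sum. A toy asyncio simulation makes this concrete. Both stages are stand-ins built from `asyncio.sleep`, and the durations are assumed values loosely scaled from the timeline above, not measurements of Pulse, Electron, or Lightning.

```python
import asyncio
import time

# Assumed durations (seconds), loosely scaled from the timeline above.
FILLER_TTS_S = 0.35  # synthesize and play "Let me check your account..."
TOOL_S = 0.50        # get_account_info round trip


async def speak_filler():
    await asyncio.sleep(FILLER_TTS_S)  # stands in for Lightning TTS + playback


async def simulate_tool():
    await asyncio.sleep(TOOL_S)  # stands in for the account lookup
    return {"balance": "12,450"}


async def serialized():
    start = time.perf_counter()
    await speak_filler()
    result = await simulate_tool()  # tool starts only after the filler ends
    return result, time.perf_counter() - start


async def parallel():
    start = time.perf_counter()
    filler = asyncio.create_task(speak_filler())  # filler starts playing...
    result = await simulate_tool()                # ...while the tool runs
    await filler
    return result, time.perf_counter() - start


async def main():
    _, t_serial = await serialized()
    _, t_parallel = await parallel()
    print(f"serialized: {t_serial:.2f}s, parallel: {t_parallel:.2f}s")
    return t_serial, t_parallel


t_serial, t_parallel = asyncio.run(main())
```

The serialized path takes about 0.85 s (filler plus tool); the parallel path takes about 0.5 s (the longer of the two).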

Production checklist

  • Stream everything. Pulse WebSocket for STT, Electron stream: true for LLM, Lightning streaming for TTS. Any non-streaming hop adds hundreds of milliseconds.
  • Set stream_options.include_usage: true on Electron so you can bill accurately on disconnects.
  • Run tool calls in parallel with TTS — never serialize the filler-then-tool path.
  • Capture X-Request-Id from every Electron response for support traceability.
  • Match Pulse language, Electron prompts, and Lightning voice language. A Hindi caller should be transcribed in Hindi, prompted in Hindi, and synthesized with a Hindi voice. Don’t translate in the middle of the pipeline — Electron and Lightning both handle Indic natively.
  • Cap conversation history so prompts don’t grow unbounded. Truncate older turns once you exceed your token budget; prefix caching keeps recent turns cheap.
  • Set a per-utterance timeout (~10 s on the full STT→LLM→TTS round). Voice users won’t wait longer; better to fall back to “Sorry, can you repeat?” than to hang silently.
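The last two checklist items can be sketched with the standard library alone. `with_turn_timeout` and `trim_history` are hypothetical helper names. `trim_history` uses a count of user turns as a simple stand-in for a real token budget, and it cuts only at user-message boundaries so an assistant `tool_calls` message is never separated from its tool results.

```python
import asyncio

FALLBACK = "Sorry, can you repeat?"


async def with_turn_timeout(turn, timeout_s=10.0):
    """Await one STT->LLM->TTS round; return a fallback line if it takes too long."""
    try:
        return await asyncio.wait_for(turn, timeout=timeout_s)
    except asyncio.TimeoutError:
        return FALLBACK


def trim_history(messages, max_turns):
    """Keep the system prompt plus the last `max_turns` user turns.

    Cuts only where a user message starts, so tool-call / tool-result
    pairs inside a turn stay together.
    """
    system = [m for m in messages[:1] if m["role"] == "system"]
    rest = messages[len(system):]
    user_idx = [i for i, m in enumerate(rest) if m["role"] == "user"]
    if len(user_idx) <= max_turns:
        return system + rest
    return system + rest[user_idx[-max_turns]:]


# Usage: a two-turn history trimmed down to the latest turn.
history = [
    {"role": "system", "content": "You are a support agent."},
    {"role": "user", "content": "Hi"},
    {"role": "assistant", "content": "Hello!"},
    {"role": "user", "content": "What's my balance?"},
    {"role": "assistant", "content": "Let me check...",
     "tool_calls": [{"id": "call_1", "type": "function",
                     "function": {"name": "get_account_info", "arguments": "{}"}}]},
    {"role": "tool", "tool_call_id": "call_1", "content": "{}"},
    {"role": "assistant", "content": "Your balance is ..."},
]
trimmed = trim_history(history, max_turns=1)


async def slow_turn():
    await asyncio.sleep(1)  # simulates a round that hangs
    return "late answer"


reply = asyncio.run(with_turn_timeout(slow_turn(), timeout_s=0.05))
```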

Related

Pulse Realtime quickstart

Streaming STT setup.

Lightning Streaming

Streaming TTS setup.

Electron Tool Calling

Filler-phrase pattern in depth.

Atoms — managed voice agents

Skip the wiring and use the platform.