Tool / Function Calling

Electron implements the standard OpenAI function-calling API. Define tools in the request, and the model returns tool_calls in the response when it decides to invoke one.

What makes Electron’s tool calling distinctive: with a voice-agent-style system prompt, the model speaks a short filler phrase before invoking a tool (e.g. “Let me check that for you…”), so a downstream TTS layer can mask tool-call latency by speaking the filler while the tool runs.

The filler is prompt-driven, not magic. With no system prompt, Electron returns the standard OpenAI shape (content: null when tool_calls is present). To get the filler reliably, give the model a system prompt that asks for it — see the voice-agent example below.

Basic usage

import json
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.smallest.ai/waves/v1",
    api_key=os.environ["SMALLEST_API_KEY"],
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

resp = client.chat.completions.create(
    model="electron",
    messages=[
        {"role": "system", "content": "You are a friendly phone agent. Briefly acknowledge out loud before using any tool, like 'Let me check that' or 'One moment'."},
        {"role": "user", "content": "What's the weather in Mumbai?"},
    ],
    tools=tools,
)

msg = resp.choices[0].message
print("filler:", msg.content)    # e.g. "Let me check that for you!"
print("calls:", msg.tool_calls)  # list of tool calls

Response shape: the filler-phrase pattern

When Electron decides to call a tool, the assistant message returns both a conversational content filler and the structured tool_calls:

{
  "role": "assistant",
  "content": "Let me check that for you…",
  "tool_calls": [
    {
      "id": "call_abc",
      "type": "function",
      "function": {
        "name": "get_weather",
        "arguments": "{\"city\": \"Mumbai\"}"
      }
    }
  ]
}

In strict OpenAI shape, content is null when tool_calls is set. With a voice-agent-style system prompt, Electron instead emits a short natural-language sentence — designed so a downstream voice agent can speak it while the tool resolves in the background.

The filler-phrase pattern is prompt-driven: when your system prompt asks for it, Electron emits the filler reliably. Without that hint, you’ll get the standard content: null shape. Always handle content defensively — either a string or null.

finish_reason will be "tool_calls" on this turn.
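
One way to handle content defensively is to normalize the assistant message before anything reaches your TTS layer. The sketch below is a minimal illustration, not part of the Electron API: `split_assistant_message` is a hypothetical helper, and it works on a plain dict (with the Python SDK, `message.model_dump()` should give you one).

```python
import json


def split_assistant_message(message: dict) -> tuple:
    """Return (filler_text, parsed_tool_calls) from an assistant message dict.

    filler_text is None when content is null, missing, or only whitespace.
    """
    content = message.get("content")
    filler = content.strip() if isinstance(content, str) and content.strip() else None

    calls = []
    for call in message.get("tool_calls") or []:  # tool_calls may be absent or null
        fn = call["function"]
        calls.append({
            "id": call["id"],
            "name": fn["name"],
            # arguments arrive as a JSON-encoded string, not an object
            "arguments": json.loads(fn["arguments"] or "{}"),
        })
    return filler, calls


with_filler = {
    "role": "assistant",
    "content": "Let me check that for you…",
    "tool_calls": [{
        "id": "call_abc",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Mumbai\"}"},
    }],
}
without_filler = {**with_filler, "content": None}

print(split_assistant_message(with_filler))
print(split_assistant_message(without_filler)[0])
```

Only speak the filler when it is not None; the tool calls are handled the same way in both cases.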

Voice-agent pattern (the reason this exists)

For voice agents, tool calls add audible latency: the user hears silence while you call a database, hit a webhook, etc. The standard mitigation is to play a “thinking” sound or filler phrase yourself. With a system prompt that asks for it, Electron generates the filler for you as part of the same response.

Recommended pipeline:

  1. Stream the chat completion.
  2. As soon as delta.content tokens arrive, send them to your TTS engine in parallel — don’t wait for the tool call to complete.
  3. When you receive delta.tool_calls, accumulate the deltas; as soon as the arguments are complete (finish_reason is "tool_calls"), kick off the actual tool execution in parallel with TTS.
  4. Once the tool returns, append the tool role message and continue the conversation.

The user hears “Let me check the weather in Mumbai for you…” spoken naturally while your weather API call resolves in the background. End-to-end perceived latency drops by hundreds of milliseconds.
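
The pipeline above can be sketched roughly as follows. This is an illustration under assumptions, not a reference implementation: `run_voice_turn`, `speak`, and `run_tool` are hypothetical names, `speak` is assumed to queue text for TTS without blocking, and the chunks are plain dicts parsed from the stream rather than SDK objects. A scripted chunk list stands in for a live stream so the example runs offline.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def run_voice_turn(chunks, speak, run_tool):
    """Route one streamed turn: filler text to TTS, tool calls to a worker pool.

    chunks   -- iterable of parsed chunk dicts (the JSON after "data: ")
    speak    -- hypothetical callback that queues text for TTS without blocking
    run_tool -- hypothetical callback: run_tool(name, args_dict) -> result
    Returns a list of (tool_call_id, result) pairs.
    """
    pending = {}  # tool-call index -> {"id", "name", "arguments"}
    futures = []
    with ThreadPoolExecutor() as pool:
        for chunk in chunks:
            choice = chunk["choices"][0]
            delta = choice.get("delta") or {}
            if delta.get("content"):
                speak(delta["content"])  # forward filler text immediately
            for tc in delta.get("tool_calls") or []:
                slot = pending.setdefault(tc["index"], {"id": None, "name": None, "arguments": ""})
                fn = tc.get("function") or {}
                slot["id"] = tc.get("id") or slot["id"]
                slot["name"] = fn.get("name") or slot["name"]
                slot["arguments"] += fn.get("arguments") or ""
            if choice.get("finish_reason") == "tool_calls":
                # Arguments are complete only now; start tools while TTS keeps playing.
                for _, slot in sorted(pending.items()):
                    args = json.loads(slot["arguments"] or "{}")
                    futures.append((slot["id"], pool.submit(run_tool, slot["name"], args)))
        return [(call_id, fut.result()) for call_id, fut in futures]


# Scripted chunks standing in for a live stream.
demo_chunks = [
    {"choices": [{"index": 0, "delta": {"content": "Let me check "}}]},
    {"choices": [{"index": 0, "delta": {"content": "that for you."}}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "id": "call_abc", "type": "function",
         "function": {"name": "get_weather", "arguments": ""}}]}}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": "{\"city\": \"Mumbai\"}"}}]}}]},
    {"choices": [{"index": 0, "finish_reason": "tool_calls"}]},
]
spoken = []
results = run_voice_turn(
    demo_chunks,
    speak=spoken.append,
    run_tool=lambda name, args: {"tool": name, **args},  # stand-in for a real lookup
)
print("".join(spoken))
print(results)
```

The tool starts only after the arguments have fully arrived, so the filler has typically been handed to TTS before the tool begins running.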

Multi-turn with tool results

After the model returns tool_calls, run the tools and add tool role messages to the conversation, then make a follow-up call:

# Reuses json, client, and tools from the basic-usage example.
# turn 1: model returns tool_calls (content is a filler only if the system prompt asks for one)
resp1 = client.chat.completions.create(
    model="electron",
    messages=[{"role": "user", "content": "What's the weather in Mumbai?"}],
    tools=tools,
)
msg1 = resp1.choices[0].message

# execute the tool (assumes the model chose to call one; check msg1.tool_calls in production)
call = msg1.tool_calls[0]
args = json.loads(call.function.arguments)  # arguments is a JSON-encoded string
result = get_weather_impl(**args)  # your implementation

# turn 2: feed tool result back
resp2 = client.chat.completions.create(
    model="electron",
    messages=[
        {"role": "user", "content": "What's the weather in Mumbai?"},
        {
            "role": "assistant",
            "content": msg1.content,
            "tool_calls": [call.model_dump()],
        },
        {
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result),
        },
    ],
    tools=tools,
)
print(resp2.choices[0].message.content)
# e.g. "It's 31 °C and humid in Mumbai right now."

Chained tool calls

Electron can chain multiple tool calls within a conversation. The pattern interleaves:

filler → tool_call → tool_result → filler → tool_call → tool_result → final response

Each filler is short and natural (“Let me also check…”, “One moment…”). Handle each tool_calls turn the same way: execute, append tool message, call again.
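
The execute, append, call-again cycle can be written as a loop. The sketch below is one possible shape, with assumptions called out: `run_tool_loop` and `complete` are hypothetical names, `complete(messages)` is assumed to return the assistant message as a plain dict, and `max_rounds` is an arbitrary safety cap rather than an Electron limit. A scripted fake model replaces the network call so the example runs offline.

```python
import json


def run_tool_loop(complete, messages, tool_impls, max_rounds=5):
    """Call the model until it answers without requesting a tool.

    complete   -- hypothetical callable: complete(messages) -> assistant message dict
    tool_impls -- dict mapping tool name to a Python function taking keyword args
    max_rounds -- safety cap on model calls (an arbitrary choice, not an Electron limit)
    Returns (final_text, fillers_seen_along_the_way).
    """
    fillers = []
    for _ in range(max_rounds):
        msg = complete(messages)
        calls = msg.get("tool_calls") or []
        if not calls:
            return msg.get("content"), fillers
        if msg.get("content"):  # content may be None
            fillers.append(msg["content"])
        messages.append({"role": "assistant", "content": msg.get("content"), "tool_calls": calls})
        for call in calls:  # one tool message per call id, including parallel calls
            fn = call["function"]
            result = tool_impls[fn["name"]](**json.loads(fn["arguments"] or "{}"))
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)})
    raise RuntimeError("model kept requesting tools after max_rounds")


def _call(call_id, name, args):
    return {"id": call_id, "type": "function",
            "function": {"name": name, "arguments": json.dumps(args)}}


# Scripted responses standing in for the model; the temperatures are made up.
scripted = iter([
    {"role": "assistant", "content": "Let me check...",
     "tool_calls": [_call("call_1", "get_weather", {"city": "Mumbai"})]},
    {"role": "assistant", "content": "Let me also check...",
     "tool_calls": [_call("call_2", "get_weather", {"city": "Pune"})]},
    {"role": "assistant", "content": "Mumbai is 31 °C and Pune is 27 °C."},
])
history = [{"role": "user", "content": "Compare the weather in Mumbai and Pune."}]
temps = {"Mumbai": 31, "Pune": 27}
answer, fillers = run_tool_loop(
    lambda msgs: next(scripted),
    history,
    {"get_weather": lambda city: {"city": city, "temp_c": temps[city]}},
)
print(fillers)
print(answer)
```

With the Python SDK, `complete` could plausibly wrap `client.chat.completions.create(model="electron", messages=msgs, tools=tools).choices[0].message.model_dump()`; treat that wiring as an assumption to verify against your SDK version.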

Streaming with tool calls

When stream: true is set and the request includes tools, Electron streams in this order:

  1. Filler content first — delta.content chunks arrive as the filler is generated.
  2. Tool calls next — delta.tool_calls chunks follow, building up the function name and arguments incrementally.

For chained calls, the pattern repeats: filler → tool_call → (your tool runs) → next filler → next tool_call → … → final response.

Tool-call deltas use the standard OpenAI streaming shape:

data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"id":"call_abc","type":"function","function":{"name":"get_weather","arguments":""}}]}}]}
data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"city\":"}}]}}]}
data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\"Mumbai\"}"}}]}}]}
data: {"choices":[{"index":0,"finish_reason":"tool_calls"}]}
data: [DONE]

Concatenate the arguments deltas to reconstruct the final argument JSON.
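
As a rough illustration of that concatenation, the sketch below rebuilds the tool call from the raw stream lines shown above. `collect_tool_calls` is a hypothetical helper; it assumes each event arrives as a single "data: ..." line and keys the partial calls by their index field so parallel calls stay separate.

```python
import json


def collect_tool_calls(sse_lines):
    """Rebuild complete tool calls from streamed "data: ..." lines."""
    calls = {}  # tool-call index -> {"id", "name", "arguments"}
    for line in sse_lines:
        if not line.startswith("data: "):
            continue  # skip blank lines and comments
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break
        choice = json.loads(payload)["choices"][0]
        for tc in (choice.get("delta") or {}).get("tool_calls") or []:
            slot = calls.setdefault(tc["index"], {"id": None, "name": None, "arguments": ""})
            fn = tc.get("function") or {}
            slot["id"] = tc.get("id") or slot["id"]
            slot["name"] = fn.get("name") or slot["name"]
            slot["arguments"] += fn.get("arguments") or ""
    # Parse only after the stream ends; partial fragments are not valid JSON.
    return [
        {"id": c["id"], "name": c["name"], "arguments": json.loads(c["arguments"] or "{}")}
        for _, c in sorted(calls.items())
    ]


stream = [
    r'data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"id":"call_abc","type":"function","function":{"name":"get_weather","arguments":""}}]}}]}',
    r'data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"city\":"}}]}}]}',
    r'data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\"Mumbai\"}"}}]}}]}',
    r'data: {"choices":[{"index":0,"finish_reason":"tool_calls"}]}',
    "data: [DONE]",
]
print(collect_tool_calls(stream))
```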

Limits

| Limit | Value |
| --- | --- |
| Max tools per request | 64 |
| Tool name length | Standard OpenAI naming rules apply |
| Parallel tool calls in one turn | Supported (multiple entries in tool_calls) |
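
A client-side pre-flight check can catch limit violations before a request is sent. The sketch below is a hypothetical helper, not an Electron feature. The 64-tool cap comes from the limits listed above; the name pattern is an assumption based on OpenAI's published function-name rule (letters, digits, underscores, and dashes, up to 64 characters) and should be confirmed against Electron's behavior.

```python
import re

MAX_TOOLS = 64  # per-request limit from the Limits section
# Assumed from OpenAI's published function-name rule; confirm against Electron.
NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def check_tools(tools):
    """Return a list of problems found in a tools array (empty list means OK)."""
    problems = []
    if len(tools) > MAX_TOOLS:
        problems.append(f"{len(tools)} tools exceeds the limit of {MAX_TOOLS}")
    seen = set()
    for tool in tools:
        name = tool.get("function", {}).get("name", "")
        if not NAME_RE.fullmatch(name):
            problems.append(f"invalid tool name: {name!r}")
        if name in seen:
            problems.append(f"duplicate tool name: {name!r}")
        seen.add(name)
    return problems


ok = [{"type": "function", "function": {"name": "get_weather"}}]
bad = [{"type": "function", "function": {"name": "get weather!"}}] * 2
print(check_tools(ok))
print(check_tools(bad))
```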

Tips

  • Keep tool descriptions tight. They’re billed as input tokens on every turn. Aim for one sentence of intent + a clear list of parameters.
  • Use tool_choice: "required" if you want to force the model to call at least one tool. It does not limit the model to a single call, so still handle multiple entries in tool_calls.
  • Use tool_choice: {"type":"function","function":{"name":"…"}} to force a specific tool.
  • Don’t pass empty tools: [] — omit the field entirely if no tools are available this turn.
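
The tool_choice and empty-tools tips can be folded into one small request builder. This is a sketch under assumptions: `build_tool_kwargs` and its `force` parameter are hypothetical names, and the returned dict is meant to be spread into a chat completion call.

```python
def build_tool_kwargs(tools=None, force=None):
    """Build the tool-related keyword arguments for a chat completion request.

    force -- None (model decides), "required" (must call a tool),
             or a tool name (must call that specific tool)
    """
    if not tools:
        return {}  # omit `tools` entirely rather than sending []
    kwargs = {"tools": tools}
    if force == "required":
        kwargs["tool_choice"] = "required"
    elif force is not None:
        names = {t["function"]["name"] for t in tools}
        if force not in names:
            raise ValueError(f"unknown tool: {force!r}")
        kwargs["tool_choice"] = {"type": "function", "function": {"name": force}}
    return kwargs


weather_tool = {"type": "function", "function": {"name": "get_weather"}}
print(build_tool_kwargs([]))
print(build_tool_kwargs([weather_tool], force="get_weather")["tool_choice"])
```

A call site might then look like `client.chat.completions.create(model="electron", messages=messages, **build_tool_kwargs(tools, force="required"))`.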