Tool / Function Calling | Smallest AI Docs

Electron implements the standard OpenAI function-calling API. Define tools in the request, and the model returns tool_calls in the response when it decides to invoke one.

What makes Electron’s tool calling distinctive: with a voice-agent-style system prompt, the model speaks a short filler phrase before invoking a tool (e.g. “Let me check that for you…”), so a downstream TTS layer can mask tool-call latency by speaking the filler while the tool runs.

The filler is prompt-driven, not magic. With no system prompt, Electron returns the standard OpenAI shape (content: null when tool_calls is present). To get the filler reliably, give the model a system prompt that asks for it — see the voice-agent example below.

Basic usage

import json
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.smallest.ai/waves/v1",
    api_key=os.environ["SMALLEST_API_KEY"],
)

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string", "description": "City name"},
                },
                "required": ["city"],
            },
        },
    }
]

resp = client.chat.completions.create(
    model="electron",
    messages=[
        {"role": "system", "content": "You are a friendly phone agent. Briefly acknowledge out loud before using any tool, like 'Let me check that' or 'One moment'."},
        {"role": "user", "content": "What's the weather in Mumbai?"},
    ],
    tools=tools,
)

msg = resp.choices[0].message
print("filler:", msg.content)         # e.g. "Let me check that for you!"
print("calls:", msg.tool_calls)       # list of tool calls

Response shape: the filler-phrase pattern

When Electron decides to call a tool, the assistant message returns both a conversational content filler and the structured tool_calls:

{
  "role": "assistant",
  "content": "Let me check that for you…",
  "tool_calls": [
    {
      "id": "call_abc",
      "type": "function",
      "function": {
        "name": "get_weather",
        "arguments": "{\"city\": \"Mumbai\"}"
      }
    }
  ]
}

In strict OpenAI shape, content is null when tool_calls is set. With a voice-agent-style system prompt, Electron instead emits a short natural-language sentence that a downstream voice agent can speak while the tool resolves in the background.

Because the filler depends on the system prompt, always handle content defensively: it can be either a string or null.

finish_reason will be "tool_calls" on this turn.
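
Since content may be a string or null, one way to handle it defensively is to normalize the assistant message before passing it on. The following is a minimal sketch, not part of the Electron API: `split_assistant_message` is a hypothetical helper that works on the plain-dict form of the message shown above.

```python
from __future__ import annotations

import json
from typing import Any


def split_assistant_message(message: dict[str, Any]) -> tuple[str, list[dict[str, Any]]]:
    """Return (filler_text, parsed_calls) from an assistant message dict.

    Hypothetical helper: a null or missing content becomes an empty filler,
    and each call's JSON-encoded arguments string is decoded into a dict.
    """
    filler = (message.get("content") or "").strip()
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        calls.append({
            "id": call["id"],
            "name": fn["name"],
            "args": json.loads(fn["arguments"] or "{}"),
        })
    return filler, calls


# Standard OpenAI shape: content is null, tool_calls is set.
example = {
    "role": "assistant",
    "content": None,
    "tool_calls": [{
        "id": "call_abc",
        "type": "function",
        "function": {"name": "get_weather", "arguments": "{\"city\": \"Mumbai\"}"},
    }],
}
filler, calls = split_assistant_message(example)
if filler:
    print("speak:", filler)  # only reached when the model produced a filler
print("calls:", calls)
```

Callers can then branch on whether the filler is empty instead of checking for null at every call site.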

Voice-agent pattern (the reason this exists)

For voice agents, tool calls add audible latency: the user hears silence while you call a database, hit a webhook, and so on. The standard mitigation is to play a “thinking” sound or filler phrase yourself. With a system prompt that asks for it, Electron generates that filler for you.

Recommended pipeline:

1. Stream the chat completion.
2. As soon as delta.content tokens arrive, send them to your TTS engine. Don’t wait for the tool call to complete.
3. Accumulate the delta.tool_calls chunks. Once the arguments are complete (finish_reason is "tool_calls"), start the tool execution while TTS is still speaking.
4. Once the tool returns, append the tool role message and continue the conversation.
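
The pipeline above can be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: `drive_voice_turn`, `speak`, and `tools_impl` are hypothetical names, the chunks are plain dicts in the streaming shape documented on this page, and the weather function is a stub. With the OpenAI SDK you could obtain such dicts by calling `model_dump()` on each streamed chunk.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def drive_voice_turn(chunks, speak, tools_impl, pool):
    """Send filler text to TTS as it arrives, then start tools on a thread pool.

    Returns (filler_text, {tool_call_id: Future}).
    """
    filler_parts = []
    pending = {}  # tool-call index -> partially assembled call
    futures = {}
    for chunk in chunks:
        if not chunk.get("choices"):
            continue  # ignore chunks that carry no choices
        choice = chunk["choices"][0]
        delta = choice.get("delta") or {}
        text = delta.get("content")
        if text:
            filler_parts.append(text)
            speak(text)  # hand off to TTS immediately
        for tc in delta.get("tool_calls") or []:
            slot = pending.setdefault(tc["index"], {"id": None, "name": None, "arguments": ""})
            if tc.get("id"):
                slot["id"] = tc["id"]
            fn = tc.get("function") or {}
            if fn.get("name"):
                slot["name"] = fn["name"]
            slot["arguments"] += fn.get("arguments") or ""
        if choice.get("finish_reason") == "tool_calls":
            # Arguments are complete only now, so this is the earliest safe start.
            for slot in pending.values():
                args = json.loads(slot["arguments"] or "{}")
                futures[slot["id"]] = pool.submit(tools_impl[slot["name"]], **args)
    return "".join(filler_parts), futures


# Hand-written chunks standing in for a real stream.
fake_stream = [
    {"choices": [{"index": 0, "delta": {"content": "Let me check "}}]},
    {"choices": [{"index": 0, "delta": {"content": "that for you."}}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "id": "call_abc", "type": "function",
         "function": {"name": "get_weather", "arguments": ""}}]}}]},
    {"choices": [{"index": 0, "delta": {"tool_calls": [
        {"index": 0, "function": {"arguments": "{\"city\": \"Mumbai\"}"}}]}}]},
    {"choices": [{"index": 0, "delta": {}, "finish_reason": "tool_calls"}]},
]

spoken = []  # stand-in for a TTS queue
stub_tools = {"get_weather": lambda city: {"city": city, "temp_c": 31}}  # stub data
with ThreadPoolExecutor(max_workers=2) as pool:
    filler, futures = drive_voice_turn(fake_stream, spoken.append, stub_tools, pool)
    results = {call_id: f.result() for call_id, f in futures.items()}
print(filler, results)
```

Running tools on a thread pool keeps the loop free to continue feeding TTS; an asyncio-based design would work equally well.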

The user hears “Let me check the weather in Mumbai for you…” spoken naturally while your weather API call resolves in the background. Because the filler plays while the tool runs, the silence the user would otherwise hear shrinks by up to the duration of the filler.

Multi-turn with tool results

After the model returns tool_calls, run the tools and add tool role messages to the conversation, then make a follow-up call:

# Reuses client, tools, and json from the Basic usage example.
# The system prompt is what makes the model emit a filler.
history = [
    {"role": "system", "content": "You are a friendly phone agent. Briefly acknowledge out loud before using any tool."},
    {"role": "user", "content": "What's the weather in Mumbai?"},
]

# turn 1: model emits filler + tool_calls
resp1 = client.chat.completions.create(
    model="electron",
    messages=history,
    tools=tools,
)
msg1 = resp1.choices[0].message

# execute the tool (this example assumes a single tool call)
call = msg1.tool_calls[0]
args = json.loads(call.function.arguments)
result = get_weather_impl(**args)   # your implementation

# turn 2: feed tool result back
resp2 = client.chat.completions.create(
    model="electron",
    messages=history + [
        {
            "role": "assistant",
            "content": msg1.content,
            "tool_calls": [call.model_dump()],
        },
        {
            "role": "tool",
            "tool_call_id": call.id,
            "content": json.dumps(result),
        },
    ],
    tools=tools,
)
print(resp2.choices[0].message.content)
# e.g. "It's 31 °C and humid in Mumbai right now."

Chained tool calls

Electron can chain multiple tool calls within a conversation. The pattern interleaves:

filler → tool_call → tool_result → filler → tool_call → tool_result → final response

Each filler is short and natural (“Let me also check…”, “One moment…”). Handle each tool_calls turn the same way: execute, append tool message, call again.
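
The "execute, append tool message, call again" cycle can be written as a loop. The sketch below is an assumption-laden illustration: `run_tool_loop`, `create`, and `tools_impl` are hypothetical names, and `create` is any callable that returns the assistant message as a plain dict. With the OpenAI SDK it could wrap `client.chat.completions.create(...)` and return `choices[0].message.model_dump()`. A scripted fake stands in for the model here so the example runs offline.

```python
import json


def run_tool_loop(create, messages, tools, tools_impl, on_filler=print, max_turns=5):
    """Call the model until it answers without requesting a tool."""
    for _ in range(max_turns):
        msg = create(messages=messages, tools=tools)
        calls = msg.get("tool_calls") or []
        if not calls:
            return msg.get("content") or ""  # final response
        if msg.get("content"):
            on_filler(msg["content"])  # e.g. forward to TTS
        messages.append({"role": "assistant", "content": msg.get("content"), "tool_calls": calls})
        for call in calls:  # also covers several calls in one turn
            args = json.loads(call["function"]["arguments"] or "{}")
            result = tools_impl[call["function"]["name"]](**args)
            messages.append({"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)})
    raise RuntimeError("tool loop did not finish within max_turns")


def _call(call_id, city):
    """Build a tool_calls entry for the scripted responses below."""
    return {"id": call_id, "type": "function",
            "function": {"name": "get_weather", "arguments": json.dumps({"city": city})}}


# Scripted responses standing in for the model: two tool turns, then an answer.
scripted = iter([
    {"role": "assistant", "content": "Let me check that.", "tool_calls": [_call("call_1", "Mumbai")]},
    {"role": "assistant", "content": "One moment.", "tool_calls": [_call("call_2", "Pune")]},
    {"role": "assistant", "content": "Mumbai is warmer than Pune."},
])


def fake_create(messages, tools):
    return next(scripted)


fillers = []
history = [{"role": "user", "content": "Is Mumbai warmer than Pune?"}]
stub_tools = {"get_weather": lambda city: {"city": city}}  # stub implementation
final = run_tool_loop(fake_create, history, [], stub_tools, on_filler=fillers.append)
print(fillers, final)
```

The `max_turns` cap is a design choice to stop a runaway chain; pick a limit that suits your agent.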

Streaming with tool calls

When stream: true and tools are involved, Electron streams in this order:

Filler content first — delta.content chunks arrive as the filler is generated.
Tool calls next — delta.tool_calls chunks follow, building up the function name and arguments incrementally.

For chained calls, the pattern repeats: filler → tool_call → (your tool runs) → next filler → next tool_call → … → final response.

Tool-call deltas use the standard OpenAI streaming shape:

data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"id":"call_abc","type":"function","function":{"name":"get_weather","arguments":""}}]}}]}
data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"city\":"}}]}}]}
data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\"Mumbai\"}"}}]}}]}
data: {"choices":[{"index":0,"finish_reason":"tool_calls"}]}
data: [DONE]

Concatenate the arguments deltas to reconstruct the final argument JSON.
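
As a minimal sketch of that concatenation, the hypothetical helper `collect_tool_calls` below reads raw `data:` lines, joins the argument fragments per tool-call index, and parses the JSON only once the stream ends. The sample lines are the ones shown above.

```python
import json


def collect_tool_calls(sse_lines):
    """Rebuild complete tool calls from streamed `data:` lines."""
    slots = {}  # tool-call index -> partially assembled call
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # skip blank lines and anything that is not a data line
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        for choice in event.get("choices", []):
            delta = choice.get("delta") or {}
            for tc in delta.get("tool_calls") or []:
                slot = slots.setdefault(tc["index"], {"id": None, "name": None, "arguments": ""})
                if tc.get("id"):
                    slot["id"] = tc["id"]
                fn = tc.get("function") or {}
                if fn.get("name"):
                    slot["name"] = fn["name"]
                slot["arguments"] += fn.get("arguments") or ""
    # Parse the JSON only after every fragment has been joined.
    return [
        {"id": s["id"], "name": s["name"], "arguments": json.loads(s["arguments"] or "{}")}
        for _, s in sorted(slots.items())
    ]


lines = [
    r'data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"id":"call_abc","type":"function","function":{"name":"get_weather","arguments":""}}]}}]}',
    r'data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"city\":"}}]}}]}',
    r'data: {"choices":[{"index":0,"delta":{"tool_calls":[{"index":0,"function":{"arguments":"\"Mumbai\"}"}}]}}]}',
    r'data: {"choices":[{"index":0,"finish_reason":"tool_calls"}]}',
    "data: [DONE]",
]
rebuilt = collect_tool_calls(lines)
print(rebuilt)
```

Individual fragments such as `{"city":` are not valid JSON on their own, which is why parsing is deferred to the end.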

Limits


Limit	Value
Max tools per request	64
Tool name length	Standard OpenAI naming rules apply
Parallel tool calls in one turn	Supported (multiple entries in `tool_calls`)
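
A pre-flight check can catch limit violations before a request is sent. The sketch below is hypothetical: `validate_tools` is not part of any SDK, and the name pattern (letters, digits, underscores, and dashes, up to 64 characters) is an assumption about what "standard OpenAI naming rules" means here. Confirm it against the provider's reference before relying on it.

```python
import re

MAX_TOOLS = 64  # from the limits listed above
# Assumed naming rule: letters, digits, underscores, dashes, 1 to 64 characters.
NAME_RE = re.compile(r"[A-Za-z0-9_-]{1,64}")


def validate_tools(tools):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if len(tools) > MAX_TOOLS:
        problems.append(f"too many tools: {len(tools)} > {MAX_TOOLS}")
    seen = set()
    for i, tool in enumerate(tools):
        name = (tool.get("function") or {}).get("name") or ""
        if not NAME_RE.fullmatch(name):
            problems.append(f"tool {i}: invalid name {name!r}")
        elif name in seen:
            problems.append(f"tool {i}: duplicate name {name!r}")
        seen.add(name)
    return problems


def make_tool(name):
    """Build a minimal tool definition for the demo below."""
    return {"type": "function", "function": {"name": name, "parameters": {"type": "object", "properties": {}}}}


print(validate_tools([make_tool("get_weather"), make_tool("get weather")]))
```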

Tips

Keep tool descriptions tight. They’re billed as input tokens on every turn. Aim for one sentence of intent + a clear list of parameters.
Use tool_choice: "required" if you want to force the model to call at least one tool.
Use tool_choice: {"type":"function","function":{"name":"…"}} to force a specific tool.
Don’t pass empty tools: [] — omit the field entirely if no tools are available this turn.
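
The tool_choice and empty-tools tips can be combined in a small request builder. This is a sketch under assumptions: `build_request`, `require_tool`, and `force_tool` are hypothetical names, and the returned dict is meant to be unpacked into `client.chat.completions.create(**kwargs)`.

```python
def build_request(messages, tools=None, require_tool=False, force_tool=None):
    """Assemble keyword arguments for a chat completion call.

    Omits `tools` entirely when the list is empty, and sets `tool_choice`
    only when tools are present.
    """
    kwargs = {"model": "electron", "messages": messages}
    if not tools:
        return kwargs  # no tools this turn: leave the field out
    kwargs["tools"] = tools
    if force_tool is not None:
        # Force one specific tool by name.
        kwargs["tool_choice"] = {"type": "function", "function": {"name": force_tool}}
    elif require_tool:
        kwargs["tool_choice"] = "required"
    return kwargs


demo_messages = [{"role": "user", "content": "What's the weather in Mumbai?"}]
demo_tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Get current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]
print(build_request(demo_messages, tools=[]))
print(build_request(demo_messages, tools=demo_tools, force_tool="get_weather")["tool_choice"])
```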