Electron — Chat Completions

POST https://api.smallest.ai/waves/v1/chat/completions
Example request (Python, `requests`):

```python
import os

import requests

url = "https://api.smallest.ai/waves/v1/chat/completions"

payload = {
    "model": "electron",
    "messages": [
        {"role": "user", "content": "Tell me a one-sentence fun fact."}
    ],
}
headers = {
    # Read the API key from the environment instead of hard-coding it.
    "Authorization": f"Bearer {os.environ['SMALLEST_API_KEY']}",
    "Content-Type": "application/json",
}

# Non-streaming call: the body is a single chat.completion JSON object.
# (With "stream": True the body is an SSE stream and response.json() would fail.)
response = requests.post(url, json=payload, headers=headers, timeout=60)
response.raise_for_status()
print(response.json())
```
Example response:

```json
{
  "id": "chatcmpl-7aX9b3Q2vL9Yz8F1eJkPqR4TnM5oXc",
  "object": "chat.completion",
  "created": 1712345678,
  "model": "electron",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Here's a fun fact: Honey never spoils and can last thousands of years!"
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 8,
    "completion_tokens": 18,
    "total_tokens": 26,
    "prompt_tokens_details": {
      "cached_tokens": 5
    }
  }
}
```
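The following is a minimal sketch of how a client might pull out the fields it usually needs from a non-streaming body. It assumes the standard OpenAI `chat.completion` shape shown above. The helper name `summarize_completion` is hypothetical, and the sample body is a shortened, illustrative one rather than captured API output.

```python
import json
from typing import Any


def summarize_completion(body: dict[str, Any]) -> dict[str, Any]:
    """Extract assistant text, finish reason and token counts from a
    non-streaming chat.completion body (OpenAI-compatible shape assumed)."""
    choice = body["choices"][0]
    usage = body.get("usage") or {}
    details = usage.get("prompt_tokens_details") or {}
    return {
        "text": choice["message"].get("content") or "",
        "finish_reason": choice.get("finish_reason"),
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
        # In OpenAI's convention, cached_tokens is counted within prompt_tokens.
        "cached_prompt_tokens": details.get("cached_tokens", 0),
    }


sample = json.loads("""
{
  "object": "chat.completion",
  "model": "electron",
  "choices": [{"index": 0,
               "message": {"role": "assistant", "content": "Honey never spoils."},
               "finish_reason": "stop"}],
  "usage": {"prompt_tokens": 8, "completion_tokens": 18, "total_tokens": 26,
            "prompt_tokens_details": {"cached_tokens": 5}}
}
""")
print(summarize_completion(sample))
```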
Generate a chat completion with Electron. The request and response use the OpenAI chat completions shape, so any OpenAI SDK works once its base URL points at `https://api.smallest.ai/waves/v1`.

Set `stream: true` to receive tokens as Server-Sent Events (SSE). With `stream_options: { include_usage: true }`, the final SSE chunk carries the `usage` block, so token accounting stays exact even on client disconnects.
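To show how text and usage come together on a stream, here is a small sketch that works on already-decoded chunk objects. It assumes the OpenAI `chat.completion.chunk` shape, where content arrives in `choices[i].delta.content` and the usage chunk may carry an empty `choices` list. The function `collect_stream` and the hand-written chunks are illustrative, not captured Electron output.

```python
from typing import Any, Iterable, Optional


def collect_stream(chunks: Iterable[dict[str, Any]]) -> tuple[str, Optional[dict[str, Any]]]:
    """Join content deltas and keep the usage block, which arrives on the
    final chunk when stream_options.include_usage is true."""
    parts: list[str] = []
    usage: Optional[dict[str, Any]] = None
    for chunk in chunks:
        for choice in chunk.get("choices") or []:
            piece = (choice.get("delta") or {}).get("content")
            if piece:
                parts.append(piece)
        if chunk.get("usage"):  # null/absent on all but the usage chunk
            usage = chunk["usage"]
    return "".join(parts), usage


# Hand-written chunks in the OpenAI streaming shape (illustrative only).
chunks = [
    {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"role": "assistant"}}]},
    {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Honey "}}]},
    {"object": "chat.completion.chunk",
     "choices": [{"index": 0, "delta": {"content": "never spoils."}, "finish_reason": "stop"}]},
    {"object": "chat.completion.chunk", "choices": [],
     "usage": {"prompt_tokens": 8, "completion_tokens": 4, "total_tokens": 12}},
]
text, usage = collect_stream(chunks)
print(text, usage)
```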

Tool calling follows OpenAI's `tools` array convention. When you provide a voice-agent-style system prompt, Electron emits a short filler phrase in the assistant message's `content` field alongside `tool_calls`; see the [Tool Calling guide](/waves/documentation/llm-electron/tool-calling) for the voice-agent pattern.
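Below is a sketch of a request that declares one tool in OpenAI's `tools` shape. It also shows one way to split a reply that carries both filler `content` and `tool_calls`. The tool `get_order_status`, its schema, the prompts, and the sample reply are all hypothetical.

```python
import json
from typing import Any

# Hypothetical tool definition in OpenAI's `tools` shape.
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "get_order_status",
            "description": "Look up the status of an order by its ID.",
            "parameters": {
                "type": "object",
                "properties": {"order_id": {"type": "string"}},
                "required": ["order_id"],
            },
        },
    }
]

payload = {
    "model": "electron",
    "messages": [
        {"role": "system", "content": "You are a phone support agent."},
        {"role": "user", "content": "Where is order 42?"},
    ],
    "tools": TOOLS,
}


def split_reply(message: dict[str, Any]) -> tuple[str, list[tuple[str, dict[str, Any]]]]:
    """Return (filler text to speak now, [(tool name, parsed arguments), ...])."""
    filler = message.get("content") or ""
    calls = [
        (call["function"]["name"], json.loads(call["function"]["arguments"]))
        for call in message.get("tool_calls") or []
    ]
    return filler, calls


# Illustrative assistant message, not captured from the API.
reply = {
    "role": "assistant",
    "content": "One moment while I check that.",
    "tool_calls": [
        {
            "id": "call_1",
            "type": "function",
            "function": {"name": "get_order_status", "arguments": "{\"order_id\": \"42\"}"},
        }
    ],
}
filler, calls = split_reply(reply)
print(filler, calls)
```

In OpenAI's convention, each tool result goes back in the next request as a `{"role": "tool", "tool_call_id": ..., "content": ...}` message. The Tool Calling guide is the authority on Electron's exact flow.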

## Examples

**cURL**

```bash
curl -X POST "https://api.smallest.ai/waves/v1/chat/completions" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "electron",
    "messages": [
      {"role": "user", "content": "Write one sentence about why the sky is blue."}
    ]
  }'
```

**Python** (`pip install openai`)

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.smallest.ai/waves/v1",
    api_key=os.environ["SMALLEST_API_KEY"],
)

response = client.chat.completions.create(
    model="electron",
    messages=[
        {"role": "user", "content": "Write one sentence about why the sky is blue."}
    ],
)

print(response.choices[0].message.content)
```

**JavaScript / TypeScript** (`npm install openai`)

```typescript
import OpenAI from "openai";

const client = new OpenAI({
  baseURL: "https://api.smallest.ai/waves/v1",
  apiKey: process.env.SMALLEST_API_KEY,
});

const response = await client.chat.completions.create({
  model: "electron",
  messages: [
    { role: "user", content: "Write one sentence about why the sky is blue." },
  ],
});

console.log(response.choices[0].message.content);
```

**Streaming with usage** (Python)

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://api.smallest.ai/waves/v1",
    api_key=os.environ["SMALLEST_API_KEY"],
)

stream = client.chat.completions.create(
    model="electron",
    messages=[{"role": "user", "content": "Tell me a one-sentence fun fact."}],
    stream=True,
    stream_options={"include_usage": True},
)
for chunk in stream:
    # Text arrives as deltas; the usage chunk may have an empty choices list.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
    if chunk.usage:
        print(f"\n\nTokens: {chunk.usage.total_tokens}")
```

## Common gotchas

- **Base URL is `/waves/v1`**, not `/v1`. The OpenAI SDK appends `/chat/completions` for you.
- **`stream_options.include_usage: true`** is required for exact token accounting on streaming calls; the final SSE chunk carries the `usage` block.
- **`n > 1` and `prompt_logprobs` are rejected.** Send multiple requests if you need parallel completions.
- **The auth header is `Authorization: Bearer $SMALLEST_API_KEY`.** Get the key from the [Smallest AI Console](https://app.smallest.ai/dashboard/api-keys).
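The gotchas above can be caught on the client before a request is sent. The sketch below encodes them as a pre-flight check. `preflight` and `ENDPOINT` are hypothetical names, and the server remains the authority on what it accepts.

```python
from typing import Any

BASE_URL = "https://api.smallest.ai/waves/v1"  # not .../v1
ENDPOINT = f"{BASE_URL}/chat/completions"  # what the OpenAI SDK calls given BASE_URL


def preflight(payload: dict[str, Any]) -> list[str]:
    """Return problems the docs say the endpoint rejects, or that would lose
    usage data on a stream. An empty list means no known issues."""
    problems = []
    if (payload.get("n") or 1) > 1:
        problems.append("n > 1 is rejected; send separate requests instead")
    if "prompt_logprobs" in payload:
        problems.append("prompt_logprobs is rejected")
    include_usage = (payload.get("stream_options") or {}).get("include_usage")
    if payload.get("stream") and not include_usage:
        problems.append("streaming without stream_options.include_usage: no usage block")
    return problems


print(preflight({"model": "electron", "messages": [], "n": 2, "stream": True}))
```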

## Authentication

Bearer token, sent as the header `Authorization: Bearer <token>`.

## Request

The request body is a JSON object.

- `model` (string, required): Model ID. Currently only `"electron"`.
- `messages` (list of objects, required): Chat history, as a standard OpenAI message array.
- `temperature` (double, optional, 0 to 2, default 1): Sampling temperature.
- `top_p` (double, optional, 0 to 1, default 1): Nucleus sampling.
- `max_tokens` (integer, optional, >= 1): Maximum output tokens. The combined input + output context limit is 32,768 tokens.
- `stream` (boolean, optional, default false): When true, the response is `text/event-stream`. See the Streaming guide.
- `stream_options` (object, optional): Streaming options, for example `{"include_usage": true}` to receive the `usage` block in the final chunk.
- `tools` (list of objects, optional): Tool / function calling definitions in the standard OpenAI shape. See Tool Calling.
- `tool_choice` (enum or object, optional)
- `response_format` (object, optional): Output shape, either `{"type": "text"}` (default) or `{"type": "json_object"}`.
- `stop` (string or list of strings, optional)
- `seed` (integer, optional): Best-effort determinism.
- `logit_bias` (map from strings to doubles, optional)
- `logprobs` (boolean, optional, default false)
- `top_logprobs` (integer, optional, 0 to 20)
- `presence_penalty` (double, optional, -2 to 2, default 0)
- `frequency_penalty` (double, optional, -2 to 2, default 0)
- `user` (string, optional): Opaque end-user identifier. Not interpreted by Electron.
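The sketch below applies the ranges listed above on the client. It also caps `max_tokens` so that prompt plus output fits the 32,768-token ceiling. The caller must supply `prompt_tokens`, for example from a previous response's `usage.prompt_tokens` or its own estimate; the endpoint does not expose a counting function here. `check_params` is a hypothetical helper.

```python
from typing import Any

CONTEXT_LIMIT = 32_768  # combined input + output tokens, per the max_tokens description

# (min, max) bounds taken from the parameter list above.
RANGES = {
    "temperature": (0.0, 2.0),
    "top_p": (0.0, 1.0),
    "top_logprobs": (0, 20),
    "presence_penalty": (-2.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
}


def check_params(payload: dict[str, Any], prompt_tokens: int) -> dict[str, Any]:
    """Validate documented ranges and return a copy whose max_tokens fits the
    context window. prompt_tokens is the caller's estimate of the input size."""
    for name, (lo, hi) in RANGES.items():
        if name in payload and not lo <= payload[name] <= hi:
            raise ValueError(f"{name}={payload[name]} outside [{lo}, {hi}]")
    if "max_tokens" in payload and payload["max_tokens"] < 1:
        raise ValueError("max_tokens must be >= 1")
    fmt = (payload.get("response_format") or {}).get("type", "text")
    if fmt not in ("text", "json_object"):
        raise ValueError(f"unsupported response_format type: {fmt}")
    room = CONTEXT_LIMIT - prompt_tokens
    if room < 1:
        raise ValueError("prompt alone fills the context window")
    out = dict(payload)
    # If max_tokens is absent, this design choice fills all remaining room.
    out["max_tokens"] = min(payload.get("max_tokens", room), room)
    return out


req = {"model": "electron", "messages": [], "temperature": 0.7, "max_tokens": 4096}
print(check_params(req, prompt_tokens=30_000)["max_tokens"])  # 2768
```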

## Response headers

- `X-Request-Id` (string): Unique request identifier. Include it in support tickets.

## Response

Non-streaming: a standard OpenAI `chat.completion` object.

Streaming (`stream: true`): a `text/event-stream` SSE stream in which each event is a `chat.completion.chunk` delta, terminated by `data: [DONE]`.
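The following sketch parses that wire format without the OpenAI SDK. It assumes each event is a single `data:` line holding one JSON chunk, which is typical for OpenAI-compatible servers but is not spelled out on this page. The function name `iter_sse_chunks` is hypothetical.

```python
import json
from typing import Any, Iterable, Iterator


def iter_sse_chunks(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield decoded chat.completion.chunk objects from SSE text lines,
    stopping at the data: [DONE] sentinel."""
    for line in lines:
        if not line or not line.startswith("data:"):
            continue  # blank separators, ": comment" lines, other SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        yield json.loads(data)


# Live use with requests would look like this (not executed here):
#   resp = requests.post(url, json={**payload, "stream": True}, headers=headers, stream=True)
#   resp.encoding = "utf-8"  # SSE responses may not declare a charset
#   for chunk in iter_sse_chunks(resp.iter_lines(decode_unicode=True)): ...
canned = [
    'data: {"object": "chat.completion.chunk", "choices": [{"index": 0, "delta": {"content": "Hi"}}]}',
    "",
    "data: [DONE]",
    'data: {"ignored": true}',
]
print(list(iter_sse_chunks(canned)))
```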

- `id` (string)
- `object` (enum): `chat.completion`, or `chat.completion.chunk` when streaming
- `created` (integer)
- `model` (string)
- `choices` (list of objects)
- `usage` (object)

## Errors

- 400 Bad Request Error
- 401 Unauthorized Error
- 403 Forbidden Error
- 429 Too Many Requests Error
- 502 Bad Gateway Error
- 503 Service Unavailable Error
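The sketch below shows one way a client might act on these status codes. It fails fast on 400, 401 and 403, retries 429, 502 and 503 with exponential backoff, and puts `X-Request-Id` in the error message for support tickets. Which statuses count as retryable is an assumption, not something this page states; check the Concurrency and Limits page for actual limits. The `send` callable is a hypothetical stand-in for an HTTP call, so the policy can be exercised without a network.

```python
import time
from typing import Callable, Mapping

RETRYABLE = {429, 502, 503}  # assumption: rate limits and gateway/availability errors are transient

# A send() returns (status_code, headers, body); wrap requests.post(...) for real use.
Send = Callable[[], tuple[int, Mapping[str, str], str]]


def call_with_retries(send: Send, max_attempts: int = 4, base_delay: float = 0.5,
                      sleep: Callable[[float], None] = time.sleep) -> str:
    for attempt in range(1, max_attempts + 1):
        status, headers, body = send()
        if status < 400:
            return body
        request_id = headers.get("X-Request-Id", "unknown")
        if status not in RETRYABLE or attempt == max_attempts:
            raise RuntimeError(f"HTTP {status} (X-Request-Id: {request_id}): {body}")
        sleep(base_delay * 2 ** (attempt - 1))  # 0.5s, 1s, 2s, ...
    raise ValueError("max_attempts must be at least 1")  # loop body never ran


responses = iter([(503, {"X-Request-Id": "req-1"}, "busy"), (200, {"X-Request-Id": "req-2"}, "ok")])
delays: list[float] = []
print(call_with_retries(lambda: next(responses), sleep=delays.append), delays)
```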