Chat Completions

POST https://api.smallest.ai/waves/v1/chat/completions

OpenAI-compatible chat completion endpoint. Most OpenAI Chat Completions request fields are accepted and passed through to Electron verbatim. The response shape matches OpenAI’s chat.completion object.

Request

Authentication

Authorization: Bearer $SMALLEST_API_KEY

Required fields

| Field | Type | Description |
| --- | --- | --- |
| model | string | Always "electron". |
| messages | array | Standard OpenAI message array. Max 200 messages. Each message: {role, content, tool_calls?, tool_call_id?} where role is "system", "user", "assistant", or "tool". |
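The authentication header and the two required fields are enough for a complete request. The sketch below builds one with the Python standard library, without sending it. The helper name `build_chat_request` is hypothetical, and the `Content-Type: application/json` header is an assumption based on the JSON body rather than something this page states.

```python
import json
import urllib.request

API_URL = "https://api.smallest.ai/waves/v1/chat/completions"


def build_chat_request(api_key: str, messages: list) -> urllib.request.Request:
    """Build (but do not send) a POST request with the Bearer token and required fields."""
    body = json.dumps({"model": "electron", "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        API_URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumption: the endpoint expects a JSON content type.
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; the OpenAI SDK shown under Examples does the same work with less code.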

Common optional fields

| Field | Type | Default | Notes |
| --- | --- | --- | --- |
| temperature | number ≥ 0 | 1.0 | Sampling temperature. |
| top_p | number 0–1 | 1.0 | Nucleus sampling. |
| max_tokens | int | model max | Cap on output tokens for this completion. The 32,768-token model ceiling applies to combined input + output, so the effective maximum is 32,768 − prompt_tokens. |
| stream | bool | false | When true, the response is Server-Sent Events. See Streaming. |
| stream_options.include_usage | bool | false | When true (with stream: true), the final SSE chunk carries the usage block. |
| stop | string or array of strings | (none) | Up to 4 stop sequences. |
| seed | int | random | Best-effort determinism. |
| response_format | object | (none) | {"type":"text"} (default) or {"type":"json_object"}. |
| tools | array | (none) | Standard OpenAI function-calling tools array. Max 64 entries. See Tool Calling. |
| tool_choice | string or object | "auto" | "auto", "required", "none", or {"type":"function","function":{"name":"…"}}. |
| logit_bias | object | (none) | Per-token id biases. |
| logprobs | bool | false | Return log-probabilities. |
| top_logprobs | int 0–20 | (none) | Top-N log-probs (requires logprobs: true). |
| presence_penalty, frequency_penalty | number −2 to 2 | 0 | Standard OpenAI knobs. |

Parameters outside the supported set are accepted by the schema but may be ignored by the model. See Supported Parameters for the full passthrough table.
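Because the 32,768-token ceiling covers input and output together, a client that knows its prompt size can clamp max_tokens before sending. The following is a minimal sketch; the helper name `clamp_max_tokens` is hypothetical, and it assumes the caller already has a prompt token count (for example from a tokenizer or from the usage block of an earlier, similar request).

```python
from typing import Optional

CONTEXT_CEILING = 32_768  # combined input + output tokens, per the max_tokens note


def clamp_max_tokens(prompt_tokens: int, requested: Optional[int] = None) -> int:
    """Return a max_tokens value that fits in the space left after the prompt."""
    remaining = CONTEXT_CEILING - prompt_tokens
    if remaining <= 0:
        raise ValueError("prompt already fills the context window")
    # With no explicit request, use everything that is left.
    return remaining if requested is None else min(requested, remaining)
```

For a 30,000-token prompt, a requested cap of 4,096 is reduced to 2,768, the space that remains under the ceiling.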

Explicitly rejected fields

| Field | Why |
| --- | --- |
| n > 1 | Cost amplification with no use case in v1. |
| prompt_logprobs | Response-size amplification with no billing impact. |

Sending either returns HTTP 400 with invalid_request_error.
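A client can catch the rejected fields, along with the limits listed in the tables above, before spending a round trip. The sketch below is illustrative only: `find_request_problems` is a hypothetical helper, and this page documents HTTP 400 only for n > 1 and prompt_logprobs, so how the server treats the other limits is an assumption.

```python
def find_request_problems(payload: dict) -> list:
    """Return human-readable problems; an empty list means no documented limit is violated."""
    problems = []
    if payload.get("model") != "electron":
        problems.append('model must be "electron"')
    messages = payload.get("messages")
    if not isinstance(messages, list):
        problems.append("messages must be an array")
    elif len(messages) > 200:
        problems.append("messages exceeds the 200-message limit")
    # The two explicitly rejected fields.
    if payload.get("n", 1) > 1:
        problems.append("n > 1 is rejected")
    if "prompt_logprobs" in payload:
        problems.append("prompt_logprobs is rejected")
    # Limits from the optional-fields table; server behavior here is assumed.
    stop = payload.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        problems.append("stop allows at most 4 sequences")
    if len(payload.get("tools") or []) > 64:
        problems.append("tools allows at most 64 entries")
    if payload.get("top_logprobs") is not None and not payload.get("logprobs"):
        problems.append("top_logprobs requires logprobs: true")
    return problems
```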

Response

Non-streaming (default)

Standard OpenAI chat.completion object:

{
  "id": "chatcmpl-a670053f3ab2ed8f",
  "object": "chat.completion",
  "created": 1740000000,
  "model": "electron",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hi! How can I help you today?",
        "tool_calls": null
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 9,
    "total_tokens": 29,
    "prompt_tokens_details": {
      "cached_tokens": 0
    }
  }
}

usage block

| Field | Description |
| --- | --- |
| prompt_tokens | Total input tokens (cached + uncached). |
| prompt_tokens_details.cached_tokens | Subset of prompt_tokens served from prefix cache. Billed at the discounted rate. See Prefix Caching. |
| completion_tokens | Generated output tokens. |
| total_tokens | prompt_tokens + completion_tokens. |
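Since cached_tokens is a subset of prompt_tokens, the uncached portion has to be derived by subtraction. The sketch below works on the usage block as a plain dict; `split_prompt_tokens` is a hypothetical helper name, and treating a missing prompt_tokens_details as zero cached tokens is an assumption.

```python
def split_prompt_tokens(usage: dict) -> dict:
    """Break prompt_tokens into cached and uncached parts."""
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)  # assumption: absent means no cache hit
    prompt = usage["prompt_tokens"]
    return {
        "cached": cached,
        "uncached": prompt - cached,
        "cache_hit_ratio": cached / prompt if prompt else 0.0,
    }
```

Applied to the sample response above, this yields 0 cached and 20 uncached prompt tokens.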

finish_reason

| Value | Meaning |
| --- | --- |
| "stop" | Natural stop (end token or stop sequence hit). |
| "length" | max_tokens reached. |
| "tool_calls" | The model returned function calls; act on them and continue the conversation. |
| "content_filter" | Content blocked. |
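Callers usually branch on finish_reason before using the message. One possible way to structure that branch is sketched here; the action labels and the `next_action` helper are hypothetical names chosen for illustration, not part of the API.

```python
# Maps each documented finish_reason to an illustrative client-side action label.
ACTIONS = {
    "stop": "use_reply",           # complete answer
    "length": "handle_truncation", # max_tokens reached, output may be cut off
    "tool_calls": "run_tools",     # execute the calls, then continue the conversation
    "content_filter": "show_blocked_notice",
}


def next_action(choice: dict) -> str:
    """Pick a follow-up action for one entry of the choices array."""
    reason = choice["finish_reason"]
    if reason not in ACTIONS:
        raise ValueError(f"unexpected finish_reason: {reason!r}")
    return ACTIONS[reason]
```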

Streaming

When stream: true, the response is text/event-stream SSE. See Streaming for chunk format and example consumer code.
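As a rough illustration of consuming a stream, the sketch below reassembles text from already-parsed chunks. It assumes the chunks follow OpenAI's chat.completion.chunk shape (text under choices[].delta.content, and a final usage-only chunk when stream_options.include_usage is set); this page does not spell out the chunk format, so check the Streaming page before relying on it. `collect_stream` is a hypothetical helper name.

```python
def collect_stream(chunks):
    """Join streamed deltas into (text, finish_reason, usage) from parsed chunk dicts."""
    parts = []
    finish_reason = None
    usage = None
    for chunk in chunks:
        # Assumption: the usage block arrives on a final chunk, as in OpenAI's format.
        if chunk.get("usage"):
            usage = chunk["usage"]
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("content"):
                parts.append(delta["content"])
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return "".join(parts), finish_reason, usage
```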

Errors

All errors follow the OpenAI-style envelope. Validation failures additionally include a details array with per-field reasons:

{
  "error": {
    "message": "Invalid input data",
    "type": "invalid_request_error",
    "details": [
      { "code": "custom", "message": "n > 1 is not supported", "path": ["n"] }
    ],
    "request_id": "9db74058-2421-4793-8796-b2629ff3eb1c"
  }
}

| Field | Description |
| --- | --- |
| error.message | Human-readable summary. |
| error.type | Error class, e.g. invalid_request_error, authentication_error, rate_limit_error. |
| error.details | Present on schema-validation failures. Array of {code, message, path} entries naming the offending fields. Use the path array to highlight the bad field in your client. |
| error.request_id | Unique trace ID. Always include in support tickets. Also returned as the X-Request-Id response header. |

| Status | When |
| --- | --- |
| 400 | Bad request (validation, model rejected the body, context length exceeded). |
| 401 | Missing or invalid API key. |
| 403 | API key valid but no access to Electron on this plan. |
| 429 | Rate limit (RPM) or concurrency cap hit. See Concurrency and Limits. |
| 502 | Upstream model unavailable. Retry after a short backoff. |
| 503 | Endpoint temporarily disabled, or upstream model overloaded. |
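The envelope and status table can be folded into one small helper that a client logs or surfaces to users. This is a sketch under stated assumptions: `describe_error` is a hypothetical name, and treating 429 and 503 as retryable alongside 502 is a judgment call, since this page recommends a retry only for 502.

```python
# Assumption: 429 and 503 are transient and worth retrying; only 502 is documented as such.
RETRYABLE_STATUSES = {429, 502, 503}


def describe_error(status: int, body: dict) -> dict:
    """Flatten an error envelope into the fields a client typically needs."""
    err = body.get("error") or {}
    # details is present only on schema-validation failures.
    bad_fields = [
        ".".join(str(part) for part in detail.get("path", []))
        for detail in err.get("details", [])
    ]
    return {
        "type": err.get("type"),
        "message": err.get("message"),
        "request_id": err.get("request_id"),
        "bad_fields": bad_fields,
        "retryable": status in RETRYABLE_STATUSES,
    }
```

For the sample envelope above with status 400, bad_fields is ["n"] and retryable is False.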

Examples

Minimal

import os
from openai import OpenAI

# Point the OpenAI SDK at the Smallest endpoint and read the key from the environment.
client = OpenAI(
    base_url="https://api.smallest.ai/waves/v1",
    api_key=os.environ["SMALLEST_API_KEY"],
)

resp = client.chat.completions.create(
    model="electron",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain prefix caching in one sentence."},
    ],
)
print(resp.choices[0].message.content)

Multi-turn

# Reuses the client from the Minimal example. Earlier turns are resent on every call.
messages = [
    {"role": "system", "content": "You are a customer-support agent."},
    {"role": "user", "content": "I want to change my plan."},
    {"role": "assistant", "content": "Sure, which plan are you on right now?"},
    {"role": "user", "content": "Standard, and I want to upgrade to Enterprise."},
]

resp = client.chat.completions.create(model="electron", messages=messages)

JSON object output

# Reuses the client from the Minimal example.
resp = client.chat.completions.create(
    model="electron",
    messages=[
        {"role": "system", "content": "Reply with strict JSON."},
        {"role": "user", "content": "List three Indian state capitals as {\"capitals\": [...]}"},
    ],
    response_format={"type": "json_object"},
    temperature=0,
)
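The reply still arrives as a string in message.content, so the caller has to decode it. A minimal sketch of that step follows; `parse_json_reply` is a hypothetical helper, and the sample string in the usage note is made up for illustration, not real model output.

```python
import json


def parse_json_reply(content: str) -> dict:
    """Decode a json_object reply, raising ValueError if it is not a JSON object."""
    # json.JSONDecodeError is a ValueError subclass, so truncated output
    # (for example after finish_reason "length") surfaces the same way.
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data
```

With a reply such as '{"capitals": ["Jaipur", "Patna", "Bhopal"]}', calling parse_json_reply(resp.choices[0].message.content)["capitals"] gives the list of three names.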