Chat Completions | Smallest AI Docs

POST https://api.smallest.ai/waves/v1/chat/completions

OpenAI-compatible chat completion endpoint. Most OpenAI Chat Completions request fields are accepted and passed through to Electron verbatim. The response shape matches OpenAI’s chat.completion object.

Request

Authentication

Authorization: Bearer $SMALLEST_API_KEY
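For clients that do not use an SDK, the header can be attached by hand. The sketch below builds (but does not send) a request with Python's standard library. The `build_request` helper and the `sk-placeholder` fallback are illustrative names, and the `Content-Type: application/json` header is an assumption based on OpenAI-compatible conventions rather than something stated on this page.

```python
import json
import os
import urllib.request

API_URL = "https://api.smallest.ai/waves/v1/chat/completions"


def build_request(api_key: str, payload: dict) -> urllib.request.Request:
    """Build a POST request carrying the Bearer token (hypothetical helper)."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            # Assumed: JSON request bodies, as with other OpenAI-compatible APIs.
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_request(
    os.environ.get("SMALLEST_API_KEY", "sk-placeholder"),
    {"model": "electron", "messages": [{"role": "user", "content": "Hello"}]},
)
# Nothing is sent here; urllib.request.urlopen(req) would perform the call.
```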

Required fields

Field	Type	Description
`model`	string	Always `"electron"`.
`messages`	array	Standard OpenAI message array. Max 200 messages. Each message: `{role, content, tool_calls?, tool_call_id?}` where `role` is `"system"`, `"user"`, `"assistant"`, or `"tool"`.

Common optional fields

Field	Type	Default	Notes
`temperature`	number ≥ 0	`1.0`	Sampling temperature.
`top_p`	number 0–1	`1.0`	Nucleus sampling.
`max_tokens`	int	model max	Cap on output tokens for this completion. The 32,768-token model ceiling applies to input and output combined, so the largest usable value is `32,768 − prompt_tokens`.
`stream`	bool	`false`	When `true`, response is Server-Sent Events. See Streaming.
`stream_options.include_usage`	bool	`false`	When `true` (with `stream: true`), final SSE chunk carries the `usage` block.
`stop`	string \| array of strings	—	Up to 4 stop sequences.
`seed`	int	random	Best-effort determinism.
`response_format`	object	—	`{"type":"text"}` (default) or `{"type":"json_object"}`.
`tools`	array	—	Standard OpenAI function-calling tools array. Max 64 entries. See Tool Calling.
`tool_choice`	string \| object	`"auto"`	`"auto"`, `"required"`, `"none"`, or `{"type":"function","function":{"name":"…"}}`.
`logit_bias`	object	—	Per-token id biases.
`logprobs`	bool	`false`	Return log-probabilities.
`top_logprobs`	int 0–20	—	Top-N log-probs (requires `logprobs: true`).
`presence_penalty`, `frequency_penalty`	number −2 to 2	`0`	Standard OpenAI knobs.

Parameters outside the supported set are accepted by the schema but may be ignored by the model. See Supported Parameters for the full passthrough table.
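The `max_tokens` arithmetic from the table can be sketched as a small helper. This is only an illustration: `max_output_budget` and `clamp_max_tokens` are hypothetical names, and the prompt token count has to come from your own estimate or from a previous response's `usage.prompt_tokens`, since this page does not name a tokenizer.

```python
CONTEXT_CEILING = 32_768  # combined input + output tokens, per the table above


def max_output_budget(prompt_tokens: int, ceiling: int = CONTEXT_CEILING) -> int:
    """Largest max_tokens that still fits under the combined ceiling."""
    if prompt_tokens < 0:
        raise ValueError("prompt_tokens must be non-negative")
    return max(ceiling - prompt_tokens, 0)


def clamp_max_tokens(requested: int, prompt_tokens: int) -> int:
    """Reduce a requested max_tokens to the remaining budget if needed."""
    return min(requested, max_output_budget(prompt_tokens))


print(max_output_budget(20))         # remaining budget for a 20-token prompt
print(clamp_max_tokens(40_000, 768))  # request clamped to what still fits
```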

Explicitly rejected fields

Field	Why
`n > 1`	Cost amplification with no use case in v1.
`prompt_logprobs`	Response-size amplification with no billing impact.

Sending either returns HTTP 400 with invalid_request_error.
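A client can catch several of these problems before spending a round trip. The sketch below checks only the limits and rejections listed on this page (message count, tool count, stop sequences, `n > 1`, `prompt_logprobs`). `find_request_problems` is a hypothetical helper, not part of any SDK, and the server remains the source of truth for validation.

```python
MAX_MESSAGES = 200
MAX_TOOLS = 64
MAX_STOP_SEQUENCES = 4


def find_request_problems(payload: dict) -> list[str]:
    """Return human-readable problems found in a request payload."""
    problems = []
    if payload.get("model") != "electron":
        problems.append('model must be "electron"')
    messages = payload.get("messages")
    if not isinstance(messages, list):
        problems.append("messages must be an array")
    elif len(messages) > MAX_MESSAGES:
        problems.append(f"messages exceeds {MAX_MESSAGES} entries")
    if len(payload.get("tools") or []) > MAX_TOOLS:
        problems.append(f"tools exceeds {MAX_TOOLS} entries")
    stop = payload.get("stop")
    if isinstance(stop, list) and len(stop) > MAX_STOP_SEQUENCES:
        problems.append(f"stop exceeds {MAX_STOP_SEQUENCES} sequences")
    if (payload.get("n") or 1) > 1:
        problems.append("n > 1 is rejected")
    if "prompt_logprobs" in payload:
        problems.append("prompt_logprobs is rejected")
    return problems


ok = {"model": "electron", "messages": [{"role": "user", "content": "hi"}]}
bad = {**ok, "n": 2, "prompt_logprobs": 1}
print(find_request_problems(ok))
print(find_request_problems(bad))
```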

Response

Non-streaming (default)

Standard OpenAI chat.completion object:

{
  "id": "chatcmpl-a670053f3ab2ed8f",
  "object": "chat.completion",
  "created": 1740000000,
  "model": "electron",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Hi! How can I help you today?",
        "tool_calls": null
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 20,
    "completion_tokens": 9,
    "total_tokens": 29,
    "prompt_tokens_details": {
      "cached_tokens": 0
    }
  }
}

`usage` block

Field	Description
`prompt_tokens`	Total input tokens (cached + uncached).
`prompt_tokens_details.cached_tokens`	Subset of `prompt_tokens` served from prefix cache. Billed at the discounted rate. See Prefix Caching.
`completion_tokens`	Generated output tokens.
`total_tokens`	`prompt_tokens + completion_tokens`.
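Because `cached_tokens` is a subset of `prompt_tokens`, the uncached share has to be derived by subtraction. A minimal sketch, assuming the `usage` block has already been parsed into a dict; `split_usage` is a hypothetical helper and no prices are implied.

```python
def split_usage(usage: dict) -> dict:
    """Separate cached and uncached prompt tokens from a usage block."""
    prompt = usage["prompt_tokens"]
    # prompt_tokens_details may be missing or null; treat that as zero cached.
    details = usage.get("prompt_tokens_details") or {}
    cached = details.get("cached_tokens", 0)
    return {
        "cached_prompt": cached,
        "uncached_prompt": prompt - cached,
        "completion": usage["completion_tokens"],
    }


sample_usage = {
    "prompt_tokens": 20,
    "completion_tokens": 9,
    "total_tokens": 29,
    "prompt_tokens_details": {"cached_tokens": 0},
}
parts = split_usage(sample_usage)
print(parts)
```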

`finish_reason`

Value	Meaning
`"stop"`	Natural stop (end token or `stop` sequence hit).
`"length"`	`max_tokens` reached.
`"tool_calls"`	The model returned function calls; act on them and continue the conversation.
`"content_filter"`	Content blocked.

Streaming

When `stream: true`, the response is a `text/event-stream` (Server-Sent Events) body. See Streaming for chunk format and example consumer code.

Errors

All errors follow the OpenAI-style envelope. Validation failures additionally include a details array with per-field reasons:

{
  "error": {
    "message": "Invalid input data",
    "type": "invalid_request_error",
    "details": [
      { "code": "custom", "message": "n > 1 is not supported", "path": ["n"] }
    ],
    "request_id": "9db74058-2421-4793-8796-b2629ff3eb1c"
  }
}

Field	Description
`error.message`	Human-readable summary.
`error.type`	Error class — e.g. `invalid_request_error`, `authentication_error`, `rate_limit_error`.
`error.details`	Present on schema-validation failures. Array of `{code, message, path}` entries naming the offending fields. Use the `path` array to highlight the bad field in your client.
`error.request_id`	Unique trace ID. Always include in support tickets. Also returned as the `X-Request-Id` response header.
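The envelope can be flattened into something convenient for logging or UI highlighting. A minimal sketch that works on the raw response body; `summarize_error` is a hypothetical helper, and joining `path` segments with dots is a presentation choice made here, not something the API prescribes.

```python
import json


def summarize_error(body: str) -> dict:
    """Extract type, message, offending field paths, and request id."""
    err = json.loads(body)["error"]
    # details is only present on schema-validation failures.
    fields = [".".join(str(p) for p in d["path"]) for d in err.get("details", [])]
    return {
        "type": err["type"],
        "message": err["message"],
        "fields": fields,
        "request_id": err.get("request_id"),
    }


SAMPLE_BODY = """
{
  "error": {
    "message": "Invalid input data",
    "type": "invalid_request_error",
    "details": [
      {"code": "custom", "message": "n > 1 is not supported", "path": ["n"]}
    ],
    "request_id": "9db74058-2421-4793-8796-b2629ff3eb1c"
  }
}
"""
summary = summarize_error(SAMPLE_BODY)
print(summary)
```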

Status	When
400	Bad request (validation, model rejected the body, context length exceeded).
401	Missing or invalid API key.
403	API key valid but no access to Electron on this plan.
429	Rate limit (RPM) or concurrency cap hit. See Concurrency and Limits.
502	Upstream model unavailable. Retry after a short backoff.
503	Endpoint temporarily disabled, or upstream model overloaded.
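One way to act on these statuses is a small retry policy. The table only recommends retrying explicitly for 502; treating 429 and 503 as retryable as well is an assumption based on common practice, and the backoff constants below are arbitrary. `should_retry` and `backoff_delay` are hypothetical helpers.

```python
import random

# Assumption: 429 and 503 are retried alongside 502; 4xx validation and auth errors are not.
RETRYABLE_STATUSES = {429, 502, 503}


def should_retry(status: int) -> bool:
    """Return True for statuses worth retrying after a pause."""
    return status in RETRYABLE_STATUSES


def backoff_delay(attempt: int, base: float = 0.5, cap: float = 8.0) -> float:
    """Exponential backoff with full jitter; attempt starts at 0."""
    return random.uniform(0, min(cap, base * 2 ** attempt))


for attempt in range(3):
    print(f"attempt {attempt}: wait up to {min(8.0, 0.5 * 2 ** attempt)}s")
```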

Examples

Minimal

import os
from openai import OpenAI

# Point the OpenAI SDK at the Smallest AI endpoint.
client = OpenAI(
    base_url="https://api.smallest.ai/waves/v1",
    api_key=os.environ["SMALLEST_API_KEY"],
)

resp = client.chat.completions.create(
    model="electron",
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Explain prefix caching in one sentence."},
    ],
)
print(resp.choices[0].message.content)

Multi-turn

# Reuses `client` from the Minimal example.
# Send the full history on every call, including earlier assistant turns.
messages = [
    {"role": "system", "content": "You are a customer-support agent."},
    {"role": "user", "content": "I want to change my plan."},
    {"role": "assistant", "content": "Sure, which plan are you on right now?"},
    {"role": "user", "content": "Standard, and I want to upgrade to Enterprise."},
]

resp = client.chat.completions.create(model="electron", messages=messages)
print(resp.choices[0].message.content)

JSON object output

# Reuses `client` from the Minimal example.
resp = client.chat.completions.create(
    model="electron",
    messages=[
        {"role": "system", "content": "Reply with strict JSON."},
        {"role": "user", "content": "List three Indian state capitals as {\"capitals\": [...]}"},
    ],
    response_format={"type": "json_object"},
    temperature=0,
)
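The returned `content` is still a string and needs to be parsed. The sketch below is one defensive way to do that: it refuses to parse when `finish_reason` is `"length"`, on the assumption that a completion cut off by `max_tokens` may contain incomplete JSON. `parse_json_reply` is a hypothetical helper and the sample string is illustrative, not real model output.

```python
import json


def parse_json_reply(content: str, finish_reason: str = "stop") -> dict:
    """Parse a json_object completion into a dict, rejecting truncated output."""
    if finish_reason == "length":
        raise ValueError("completion hit max_tokens; JSON may be incomplete")
    data = json.loads(content)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data


# Illustrative content string standing in for resp.choices[0].message.content.
sample = '{"capitals": ["Jaipur", "Patna", "Bhopal"]}'
parsed = parse_json_reply(sample)
print(parsed["capitals"])
```

With a live response, the call would look like `parse_json_reply(resp.choices[0].message.content, resp.choices[0].finish_reason)`.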