Electron | Smallest AI Docs

Latest Release

Electron is Smallest AI’s in-house language model, optimized for voice agents and built as a drop-in replacement for the OpenAI Chat Completions API. Production-grade quality and sub-300 ms time-to-first-token, built for high-volume workloads.

Jump to: Benchmarks · Supported Languages · API Reference · Pricing & Throughput · Quickstart

< 300 ms TTFT

Time-to-first-token tuned for real-time UX.

32K context

Combined input + output context.

70 Languages

First-class Indic support.

OpenAI-Compatible

Drop-in replacement for /v1/chat/completions.

Model Overview


Developed by	Smallest AI
Model type	Causal language model — chat completions
API surface	OpenAI-compatible (`POST /waves/v1/chat/completions`)
Model ID (request body)	`"electron"`
Context window	32,768 tokens (combined input + output)
License	Proprietary, hosted API

Key Capabilities

OpenAI Wire Format

Same request/response shape as OpenAI Chat Completions. Use the official OpenAI SDKs by swapping base_url and api_key.

Streaming

Standard Server-Sent Events. Optional final usage chunk for accurate billing on client disconnect.

Tool / Function Calling

Standard OpenAI tools API, with voice-agent-optimized filler-phrase behavior before tool calls.

Prefix Caching

Automatic discount on cached input tokens. No flag needed.

70 Languages

Wide multilingual coverage, with particularly strong Indic-language performance.

JSON Mode

response_format: {type: "json_object"} for structured output.

How to use it

See the Electron quickstart for a working end-to-end example, including authentication, request shape, and streaming response handling. Electron is OpenAI-wire-compatible — swap base_url to https://api.smallest.ai/waves/v1 on the official OpenAI SDK, pass your SMALLEST_API_KEY as api_key, and set "model": "electron" on the request body.

Performance & Benchmarks

Metric	Value
Time to first token (TTFT)	< 300 ms (typical, warm connection)
End-to-end roundtrip overhead vs direct model	~20 ms with persistent HTTPS connection
Quality tier	Competitive with leading voice-agent LLMs on internal evaluations

Electron is trained for voice-agent workloads — instruction following on system prompts, conversational style, and holding long multi-turn dialogues without drift. We benchmark it internally against frontier alternatives on these tasks. General-purpose academic benchmarks like MMLU and IFEval target a different objective and are not the right yardstick for a model whose job is to drive a phone call.

Supported Languages

Electron is multilingual with strong out-of-the-box quality across 70 languages, with particularly strong performance on Indic languages including lower-resource ones.

Electron auto-detects the input language — there is no language parameter on the Chat Completions API. The ISO 639-1 codes below are for reference only (e.g., when tagging conversations downstream or routing across services); they are not passed to the model.

Region	Count	Languages
Western Europe	8	English, Spanish, French, German, Italian, Portuguese, Dutch, Catalan
Indic	11	Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia, Urdu
Central / Eastern Europe	14	Polish, Russian, Ukrainian, Belarusian, Czech, Slovak, Romanian, Hungarian, Bulgarian, Croatian, Serbian, Slovenian, Macedonian, Albanian
Baltic	3	Estonian, Latvian, Lithuanian
Nordic	5	Swedish, Norwegian, Danish, Finnish, Icelandic
Other Europe	2	Greek, Turkish
Middle East	4	Arabic, Hebrew, Persian (Farsi), Kurdish
East Asia	5	Chinese (Simplified), Chinese (Traditional), Japanese, Korean, Mongolian
Southeast Asia	8	Vietnamese, Thai, Indonesian, Malay, Filipino, Burmese, Khmer, Lao
South Asia	2	Nepali, Sinhala
Central Asia	2	Kazakh, Uzbek
Africa	6	Swahili, Amharic, Afrikaans, Yoruba, Hausa, Zulu

Western Europe (8)

Language	ISO 639-1
English	`en`
Spanish	`es`
French	`fr`
German	`de`
Italian	`it`
Portuguese	`pt`
Dutch	`nl`
Catalan	`ca`

Indic (11)

Language	ISO 639-1
Hindi	`hi`
Bengali	`bn`
Tamil	`ta`
Telugu	`te`
Marathi	`mr`
Gujarati	`gu`
Kannada	`kn`
Malayalam	`ml`
Punjabi	`pa`
Odia	`or`
Urdu	`ur`

Central / Eastern Europe (14)

Language	ISO 639-1
Polish	`pl`
Russian	`ru`
Ukrainian	`uk`
Belarusian	`be`
Czech	`cs`
Slovak	`sk`
Romanian	`ro`
Hungarian	`hu`
Bulgarian	`bg`
Croatian	`hr`
Serbian	`sr`
Slovenian	`sl`
Macedonian	`mk`
Albanian	`sq`

Baltic (3)

Language	ISO 639-1
Estonian	`et`
Latvian	`lv`
Lithuanian	`lt`

Nordic (5)

Language	ISO 639-1
Swedish	`sv`
Norwegian	`no`
Danish	`da`
Finnish	`fi`
Icelandic	`is`

Other Europe (2)

Language	ISO 639-1
Greek	`el`
Turkish	`tr`

Middle East (4)

Language	ISO 639-1
Arabic	`ar`
Hebrew	`he`
Persian (Farsi)	`fa`
Kurdish	`ku`

East Asia (5)

Language	ISO 639-1
Chinese (Simplified)	`zh-CN`
Chinese (Traditional)	`zh-TW`
Japanese	`ja`
Korean	`ko`
Mongolian	`mn`

Southeast Asia (8)

Language	ISO 639-1
Vietnamese	`vi`
Thai	`th`
Indonesian	`id`
Malay	`ms`
Filipino	`tl`
Burmese	`my`
Khmer	`km`
Lao	`lo`

South Asia (2)

Language	ISO 639-1
Nepali	`ne`
Sinhala	`si`

Central Asia (2)

Language	ISO 639-1
Kazakh	`kk`
Uzbek	`uz`

Africa (6)

Language	ISO 639-1
Swahili	`sw`
Amharic	`am`
Afrikaans	`af`
Yoruba	`yo`
Hausa	`ha`
Zulu	`zu`

API Reference

Endpoint	Method	Use case
`https://api.smallest.ai/waves/v1/chat/completions`	POST	Chat completions (sync + SSE streaming)

See Electron — Chat Completions for the full request/response schema, supported parameters, and error codes. The Chat Completions guide covers the OpenAI-compatible wire format end-to-end, and Supported Parameters lists the passthrough table.

Throughput, Latency & Pricing

Metric	Typical	Notes
Time-to-first-token (TTFT)	< 300 ms	Warm connection. Sub-300 ms tuned for voice-agent UX.
Roundtrip overhead vs direct model	~20 ms	With a persistent HTTPS connection.

Plan	Requests per minute	Concurrent in-flight requests
Standard	10	3
Enterprise	200	20

Both limits enforce strictly — over either cap returns HTTP 429. See Concurrency & Limits for full rate-limit semantics.

Pricing: Contact your Smallest AI account manager. Prefix-cache discounts apply automatically — see Prefix Caching. Every response reports usage.prompt_tokens_details.cached_tokens so you can audit cache hit rates.

Best Practices

Reuse HTTPS connections. Cold connections cost a TLS handshake on every request — voice-agent workloads should pool a single keep-alive connection per worker.
Stream when you can. Set "stream": true and start your TTS engine on the first delta.content chunk to mask end-to-end latency. See Streaming.
Put repeated context at the prompt prefix. System prompts, RAG context, and conversation history live in the cached prefix automatically. See Prefix Caching.
For voice agents, prompt for a filler phrase before tool calls. Electron emits the filler in content alongside tool_calls, so your TTS can speak it while the tool runs. See Tool Calling.
Use seed for best-effort determinism in eval pipelines and regression tests.

Technical Specifications

Specification	Details
Context window	32,768 tokens (combined input + output)
Model ID	`"electron"` (request body)
Wire format	OpenAI Chat Completions (v1)
Modalities	Text in / text out only (no vision, no audio)
Auth	`Authorization: Bearer $SMALLEST_API_KEY`
Error envelope	`{"error": {"message", "type", "details", "request_id"}}` — `details: [{code, message, path}]` on validation failures

Feature support

Capability	Status
Chat completions (text in / text out)	✅
Streaming (SSE)	✅
Tool / function calling	✅
Parallel tool calls	✅
Voice-agent filler-phrase before tool calls	✅ Electron-specific
JSON object mode (`response_format`)	✅
Prefix caching	✅ Automatic
System messages	✅
Multi-turn conversation	✅
`seed` for best-effort determinism	✅
Multilingual generation (70 languages)	✅

Known limitations

No vision / no audio in or out. Electron is text-only on the public API.
n > 1 not supported. Each request returns exactly one completion. Make multiple requests if you need multiple completions.
prompt_logprobs not supported.
Context cap of 32,768 tokens combined input + output. Inputs that exceed this are rejected with a clean 400.

Use Cases

Direct Use

Voice agents and conversational AI (phone, in-app, kiosk)
Drop-in OpenAI replacement for chat-completion workloads
Multilingual chatbots with first-class Indic-language coverage
RAG-style assistants over private knowledge bases (prefix-cache friendly)

Downstream Use

Multi-turn conversational agents
Voice-pipeline LLM stage (paired with Pulse STT + Lightning TTS)
JSON-structured output generation for downstream parsing

Safety & Compliance

Electron is intended for voice-agent and conversational workloads. Customers building user-facing applications should layer their own content moderation, prompt-injection defenses, and PII handling appropriate to their domain. Electron does not currently apply content moderation server-side — outputs reflect the model’s training and the prompts you provide.

For voice-agent applications handling regulated content (financial, healthcare), use the standard pattern: keep PII out of prompts where practical, apply post-processing redaction on outputs, and use Smallest AI’s Pulse PII redaction features on the transcription side.

For compliance documentation (GDPR, SOC2, HIPAA), contact support@smallest.ai.

Support

Email: support@smallest.ai
Community: Discord
Documentation: docs.smallest.ai/waves
Console: app.smallest.ai/dashboard