Electron
Electron is Smallest AI’s in-house language model, optimized for voice agents and built as a drop-in replacement for the OpenAI Chat Completions API. Production-grade quality and sub-300 ms time-to-first-token, built for high-volume workloads.
Time-to-first-token tuned for real-time UX.
Combined input + output context.
First-class Indic support.
Drop-in replacement for /v1/chat/completions.
Model Overview
Key capabilities
Same request/response shape as OpenAI Chat Completions. Use the official OpenAI SDKs by swapping base_url and api_key.
Standard Server-Sent Events. Optional final usage chunk for accurate billing on client disconnect.
Standard OpenAI tools API, with voice-agent-optimized filler-phrase behavior before tool calls.
Automatic discount on cached input tokens. No flag needed.
Wide multilingual coverage, with particularly strong Indic-language performance.
response_format: {type: "json_object"} for structured output.
Performance
Electron is trained for voice-agent workloads — instruction following on system prompts, conversational style, and holding long multi-turn dialogues without drift. We benchmark it internally against frontier alternatives on these tasks. General-purpose academic benchmarks like MMLU and IFEval target a different objective and are not the right yardstick for a model whose job is to drive a phone call.
Pricing
Contact your Smallest AI account manager for current pricing. Prefix-cache discounts apply automatically — see Prefix Caching. Every response reports usage.prompt_tokens_details.cached_tokens so you can audit cache hit rates.
Rate limits
Both limits enforce strictly — over either cap returns HTTP 429. See Concurrency and Limits.
Supported languages
Electron is multilingual with strong out-of-the-box quality across the following 70 languages. Particularly strong on Indic languages including lower-resource ones.
Western Europe (8)
English · Spanish · French · German · Italian · Portuguese · Dutch · Catalan
Indic (11)
Hindi · Bengali · Tamil · Telugu · Marathi · Gujarati · Kannada · Malayalam · Punjabi · Odia · Urdu
Central / Eastern Europe (14)
Polish · Russian · Ukrainian · Belarusian · Czech · Slovak · Romanian · Hungarian · Bulgarian · Croatian · Serbian · Slovenian · Macedonian · Albanian
Baltic (3)
Estonian · Latvian · Lithuanian
Nordic (5)
Swedish · Norwegian · Danish · Finnish · Icelandic
Other Europe (2)
Greek · Turkish
Middle East (4)
Arabic · Hebrew · Persian (Farsi) · Kurdish
East Asia (5)
Chinese (Simplified) · Chinese (Traditional) · Japanese · Korean · Mongolian
Southeast Asia (8)
Vietnamese · Thai · Indonesian · Malay · Filipino · Burmese · Khmer · Lao
South Asia (2)
Nepali · Sinhala
Central Asia (2)
Kazakh · Uzbek
Africa (6)
Swahili · Amharic · Afrikaans · Yoruba · Hausa · Zulu
Capabilities
Known limitations
- No vision / no audio in or out. Electron is text-only on the public API.
n > 1not supported. Each request returns exactly one completion. Make multiple requests if you need multiple completions.prompt_logprobsnot supported.- Context cap of 32,768 tokens combined input + output. Inputs that exceed this are rejected with a clean
400.
API surface
See Chat Completions for full request/response reference and Supported Parameters for the passthrough table.
Safety & responsible use
Electron is intended for voice-agent and conversational workloads. Customers building user-facing applications should layer their own content moderation, prompt-injection defenses, and PII handling appropriate to their domain. Electron does not currently apply content moderation server-side — outputs reflect the model’s training and the prompts you provide.
For voice-agent applications handling regulated content (financial, healthcare), use the standard pattern: keep PII out of prompts where practical, apply post-processing redaction on outputs, and use Smallest AI’s Pulse PII redaction features on the transcription side.

