Electron is Smallest AI’s in-house language model, optimized for voice agents. Built to be a drop-in replacement for the OpenAI Chat Completions API — same wire format, same SDKs — with production-grade quality, sub-300 ms time-to-first-token, and a price point built for high-volume workloads.
Drop-in replacement for /v1/chat/completions. Use the official OpenAI SDKs.
Time-to-first-token tuned for voice agents and real-time UX.
First-class Indic support — Hindi, Tamil, Bengali, Marathi, and more.
Prefix-cached input tokens billed at $0.10/M vs $0.40/M normal input.
Electron implements the standard OpenAI Chat Completions wire format. The official OpenAI SDKs (Python, JavaScript, Go, Java, etc.) work out of the box — change base_url to https://api.smallest.ai/waves/v1 and your existing code is portable. Most OpenAI request fields flow through verbatim: messages, temperature, top_p, max_tokens, tools, tool_choice, response_format, stream, stream_options, seed, stop, logit_bias, logprobs, and more.
Set "stream": true to receive tokens as they’re generated via Server-Sent Events (SSE). Include "stream_options": {"include_usage": true} to get a final usage chunk so you can bill exactly the tokens served, even if the client disconnects mid-stream. See Streaming.
Standard OpenAI tools array with tool_calls in the response. With a voice-agent-style system prompt, Electron also emits a short conversational filler in the content field alongside tool_calls, so voice agents can speak the filler while the tool runs in the background. Reduces perceived latency on actions like database lookups and webhook calls. See Tool Calling.
Repeated prompt prefixes (system prompts, RAG context, conversation history) are served from cache and billed at a 75% discount. usage.prompt_tokens_details.cached_tokens is returned on every response so you can audit the savings. See Prefix Caching.
Electron speaks 70 languages out of the box, with particularly strong fluency on Indian-state languages — Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia, Urdu — and credible coverage of Nepali, Sinhala, and many lower-resource South Asian languages. See the Model Card for the full list.
Electron is designed to slot into a complete voice pipeline alongside Pulse for transcription and Lightning for speech synthesis. See the Voice Agent cookbook for an end-to-end example.
Customer support, sales, scheduling. Electron + Pulse + Lightning is a complete stack.
Prefix caching makes long retrieved-context prompts cheap and fast on repeat queries.
Customer chat in Indic languages, European languages, Arabic, East Asian languages — all from one endpoint.
Standard OpenAI tools API, with built-in voice-friendly filler-before-call behavior.
Drop-in replacement for chat/completions. Switch by changing two strings.
Lower per-token cost than frontier models with comparable quality on most tasks.
Cached pricing applies automatically when prefix caching hits — no flag required. See Prefix Caching for guidance on structuring prompts to maximize cache hits.
Contact sales for higher limits.
Make your first call in 60 seconds.
Full request/response reference.
SSE format, partial chunks, usage at end-of-stream.
Including the voice-agent filler-phrase pattern.
Side-by-side diff.
Capabilities, limits, supported languages, pricing.