Overview

View as Markdown

Electron is Smallest AI’s in-house language model, optimized for voice agents. Built to be a drop-in replacement for the OpenAI Chat Completions API — same wire format, same SDKs — with production-grade quality and sub-300 ms time-to-first-token, built for high-volume workloads.

OpenAI-Compatible

Drop-in replacement for /v1/chat/completions. Use the official OpenAI SDKs.

< 300 ms TTFT

Time-to-first-token tuned for voice agents and real-time UX.

70 Languages

First-class Indic support — Hindi, Tamil, Bengali, Marathi, and more.

Prefix Caching

Repeated prompt prefixes (system prompts, RAG context) are served from cache automatically. No flag required.

Why Electron

One endpoint, one model fieldPOST /waves/v1/chat/completions with "model": "electron". Streaming, tool calls, and structured output all toggle via body flags — no separate URLs.
Voice-agent-nativePrompt the model to acknowledge before tool calls and it returns a short filler in content alongside tool_calls (e.g. “Let me check that for you…”), so your TTS layer can mask tool-call latency. See Tool Calling.
Multilingual, with Indic strength70 supported languages, with particularly strong performance on Indian-state languages — including lower-resource ones.
Cost-efficientBuilt for high-volume workloads. Automatic prefix-cache discount on repeated input. Contact your Smallest AI account manager for current rates.
Stack-friendlyPairs with Pulse (STT) and Lightning (TTS) to build a complete voice agent on Smallest AI’s stack.

Feature highlights

Electron implements the standard OpenAI Chat Completions wire format. The official OpenAI SDKs (Python, JavaScript, Go, Java, etc.) work out of the box — change base_url to https://api.smallest.ai/waves/v1 and your existing code is portable. Most OpenAI request fields flow through verbatim: messages, temperature, top_p, max_tokens, tools, tool_choice, response_format, stream, stream_options, seed, stop, logit_bias, logprobs, and more.

Set "stream": true to receive tokens as they’re generated via Server-Sent Events (SSE). Include "stream_options": {"include_usage": true} to get a final usage chunk so you can bill exactly the tokens served, even if the client disconnects mid-stream. See Streaming.

Standard OpenAI tools array with tool_calls in the response. With a voice-agent-style system prompt, Electron also emits a short conversational filler in the content field alongside tool_calls, so voice agents can speak the filler while the tool runs in the background. Reduces perceived latency on actions like database lookups and webhook calls. See Tool Calling.

Repeated prompt prefixes (system prompts, RAG context, conversation history) are served from cache automatically. usage.prompt_tokens_details.cached_tokens is returned on every response so you can audit cache hit rates. See Prefix Caching.

Electron speaks 70 languages out of the box, with particularly strong fluency on Indian-state languages — Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia, Urdu — and credible coverage of Nepali, Sinhala, and many lower-resource South Asian languages. See the Model Card for the full list.

Electron is designed to slot into a complete voice pipeline alongside Pulse for transcription and Lightning for speech synthesis. See the Voice Agent cookbook for an end-to-end example.

Use cases

Voice agents

Customer support, sales, scheduling. Electron + Pulse + Lightning is a complete stack.

RAG applications

Prefix caching makes long retrieved-context prompts cheap and fast on repeat queries.

Multilingual chat

Customer chat in Indic languages, European languages, Arabic, East Asian languages — all from one endpoint.

Agent / tool-use workflows

Standard OpenAI tools API, with built-in voice-friendly filler-before-call behavior.

Migration from OpenAI

Drop-in replacement for chat/completions. Switch by changing two strings.

High-throughput batch jobs

Built for volume; comparable quality to frontier models on most voice-agent tasks.

Pricing

Contact your Smallest AI account manager for current pricing. Prefix-cache discounts apply automatically when cached prefixes hit — no flag required. See Prefix Caching for guidance on structuring prompts to maximize cache hits.

Plan limits

RPMConcurrent requests
Standard103
Enterprise20020

Contact sales for higher limits.

What’s next