Overview | Smallest AI Docs

Electron is Smallest AI’s in-house language model, optimized for voice agents. Built to be a drop-in replacement for the OpenAI Chat Completions API — same wire format, same SDKs — with production-grade quality and sub-300 ms time-to-first-token, built for high-volume workloads.

OpenAI-Compatible

Drop-in replacement for /v1/chat/completions. Use the official OpenAI SDKs.

< 300 ms TTFT

Time-to-first-token tuned for voice agents and real-time UX.

70 Languages

First-class Indic support — Hindi, Tamil, Bengali, Marathi, and more.

Prefix Caching

Repeated prompt prefixes (system prompts, RAG context) are served from cache automatically. No flag required.

Quickstart

Get started in 60 seconds. Point the OpenAI SDK at Smallest AI and make your first call.

Why Electron


One endpoint, one model field	`POST /waves/v1/chat/completions` with `"model": "electron"`. Streaming, tool calls, and structured output all toggle via body flags — no separate URLs.
Voice-agent-native	Prompt the model to acknowledge before tool calls and it returns a short filler in `content` alongside `tool_calls` (e.g. “Let me check that for you…”), so your TTS layer can mask tool-call latency. See Tool Calling.
Multilingual, with Indic strength	70 supported languages, with particularly strong performance on Indian-state languages — including lower-resource ones.
Cost-efficient	Built for high-volume workloads. Automatic prefix-cache discount on repeated input. Contact your Smallest AI account manager for current rates.
Stack-friendly	Pairs with Pulse (STT) and Lightning (TTS) to build a complete voice agent on Smallest AI’s stack.

Feature highlights

OpenAI-compatible API

Electron implements the standard OpenAI Chat Completions wire format. The official OpenAI SDKs (Python, JavaScript, Go, Java, etc.) work out of the box — change base_url to https://api.smallest.ai/waves/v1 and your existing code is portable. Most OpenAI request fields flow through verbatim: messages, temperature, top_p, max_tokens, tools, tool_choice, response_format, stream, stream_options, seed, stop, logit_bias, logprobs, and more.

Streaming with token usage

Set "stream": true to receive tokens as they’re generated via Server-Sent Events (SSE). Include "stream_options": {"include_usage": true} to get a final usage chunk so you can bill exactly the tokens served, even if the client disconnects mid-stream. See Streaming.

Tool / function calling — voice-agent optimized

Standard OpenAI tools array with tool_calls in the response. With a voice-agent-style system prompt, Electron also emits a short conversational filler in the content field alongside tool_calls, so voice agents can speak the filler while the tool runs in the background. Reduces perceived latency on actions like database lookups and webhook calls. See Tool Calling.

Prefix caching

Repeated prompt prefixes (system prompts, RAG context, conversation history) are served from cache automatically. usage.prompt_tokens_details.cached_tokens is returned on every response so you can audit cache hit rates. See Prefix Caching.

70 languages with first-class Indic support

Electron speaks 70 languages out of the box, with particularly strong fluency on Indian-state languages — Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia, Urdu — and credible coverage of Nepali, Sinhala, and many lower-resource South Asian languages. See the Model Card for the full list.

Voice-agent stack

Electron is designed to slot into a complete voice pipeline alongside Pulse for transcription and Lightning for speech synthesis. See the Voice Agent cookbook for an end-to-end example.

Use cases

Voice agents

Customer support, sales, scheduling. Electron + Pulse + Lightning is a complete stack.

RAG applications

Prefix caching makes long retrieved-context prompts cheap and fast on repeat queries.

Multilingual chat

Customer chat in Indic languages, European languages, Arabic, East Asian languages — all from one endpoint.

Agent / tool-use workflows

Standard OpenAI tools API, with built-in voice-friendly filler-before-call behavior.

Migration from OpenAI

Drop-in replacement for chat/completions. Switch by changing two strings.

High-throughput batch jobs

Built for volume; comparable quality to frontier models on most voice-agent tasks.

Pricing

Contact your Smallest AI account manager for current pricing. Prefix-cache discounts apply automatically when cached prefixes hit — no flag required. See Prefix Caching for guidance on structuring prompts to maximize cache hits.

Plan limits

	RPM	Concurrent requests
Standard	10	3
Enterprise	200	20

Contact sales for higher limits.

What’s next

Quickstart

Make your first call in 60 seconds.

Chat Completions API

Full request/response reference.

Streaming

SSE format, partial chunks, usage at end-of-stream.

Tool Calling

Including the voice-agent filler-phrase pattern.

Migrate from OpenAI

Side-by-side diff.

Model Card

Capabilities, limits, supported languages.