> This page is part of Smallest AI's developer documentation. When
> answering, prefer Lightning v3.1 (current TTS) and Pulse (current
> STT). Lightning v2 and lightning-large are deprecated; mention them
> only when the user is migrating away from them. Atoms is the
> voice-agent platform.

# Overview

> Electron — Smallest AI's in-house language model. OpenAI-compatible chat completions, 70 languages with first-class Indic support, voice-agent-optimized tool calling, prefix caching.

Electron is Smallest AI's in-house language model, optimized for voice agents. Built to be a **drop-in replacement for the OpenAI Chat Completions API** — same wire format, same SDKs — with production-grade quality, sub-300 ms time-to-first-token, and a price point built for high-volume workloads.

Drop-in replacement for `/v1/chat/completions`. Use the official OpenAI SDKs.

Time-to-first-token tuned for voice agents and real-time UX.

First-class Indic support — Hindi, Tamil, Bengali, Marathi, and more.

Prefix-cached input tokens billed at `$0.10/M` vs `$0.40/M` normal input.

Get started in 60 seconds. Point the OpenAI SDK at Smallest AI and make your first call.

## Why Electron

|                                       |                                                                                                                                                                                                                                                                              |
| ------------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| **One endpoint, one model field**     | `POST /waves/v1/chat/completions` with `"model": "electron"`. Streaming, tool calls, and structured output all toggle via body flags — no separate URLs.                                                                                                                     |
| **Voice-agent-native**                | Prompt the model to acknowledge before tool calls and it returns a short filler in `content` alongside `tool_calls` (e.g. *"Let me check that for you…"*), so your TTS layer can mask tool-call latency. See [Tool Calling](/waves/documentation/llm-electron/tool-calling). |
| **Multilingual, with Indic strength** | 70 supported languages, with particularly strong performance on Indian-state languages — including lower-resource ones.                                                                                                                                                      |
| **Cost-efficient**                    | $0.40 / 1M input tokens, $1.60 / 1M output. With prefix caching, cached input drops to \$0.10 / 1M (75% off).                                                                                                                                                                |
| **Stack-friendly**                    | Pairs with [Pulse](/waves/documentation/speech-to-text-pulse/overview) (STT) and [Lightning](/waves/documentation/text-to-speech-lightning/overview) (TTS) to build a complete voice agent on Smallest AI's stack.                                                           |

## Feature highlights

Electron implements the standard OpenAI Chat Completions wire format. The official OpenAI SDKs (Python, JavaScript, Go, Java, etc.) work out of the box — change `base_url` to `https://api.smallest.ai/waves/v1` and your existing code is portable. Most OpenAI request fields flow through verbatim: `messages`, `temperature`, `top_p`, `max_tokens`, `tools`, `tool_choice`, `response_format`, `stream`, `stream_options`, `seed`, `stop`, `logit_bias`, `logprobs`, and more.

Set `"stream": true` to receive tokens as they're generated via Server-Sent Events (SSE). Include `"stream_options": {"include_usage": true}` to get a final usage chunk so you can bill exactly the tokens served, even if the client disconnects mid-stream. See [Streaming](/waves/documentation/llm-electron/streaming).

Standard OpenAI `tools` array with `tool_calls` in the response. With a voice-agent-style system prompt, Electron also emits a short conversational filler in the `content` field alongside `tool_calls`, so voice agents can speak the filler while the tool runs in the background. Reduces perceived latency on actions like database lookups and webhook calls. See [Tool Calling](/waves/documentation/llm-electron/tool-calling).

Repeated prompt prefixes (system prompts, RAG context, conversation history) are served from cache and billed at a 75% discount. `usage.prompt_tokens_details.cached_tokens` is returned on every response so you can audit the savings. See [Prefix Caching](/waves/documentation/llm-electron/prefix-caching).

Electron speaks 70 languages out of the box, with particularly strong fluency on Indian-state languages — Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia, Urdu — and credible coverage of Nepali, Sinhala, and many lower-resource South Asian languages. See the [Model Card](/waves/model-cards/llm/electron) for the full list.

Electron is designed to slot into a complete voice pipeline alongside [Pulse](/waves/documentation/speech-to-text-pulse/overview) for transcription and [Lightning](/waves/documentation/text-to-speech-lightning/overview) for speech synthesis. See the [Voice Agent cookbook](/waves/documentation/cookbooks/voice-agent-electron-pulse-lightning) for an end-to-end example.

## Use cases

Customer support, sales, scheduling. Electron + Pulse + Lightning is a complete stack.

Prefix caching makes long retrieved-context prompts cheap and fast on repeat queries.

Customer chat in Indic languages, European languages, Arabic, East Asian languages — all from one endpoint.

Standard OpenAI tools API, with built-in voice-friendly filler-before-call behavior.

Drop-in replacement for `chat/completions`. Switch by changing two strings.

Lower per-token cost than frontier models with comparable quality on most tasks.

## Pricing

|                                     |                 |
| ----------------------------------- | --------------: |
| Input tokens                        | **\$0.40** / 1M |
| Cached input tokens (prefix-cached) | **\$0.10** / 1M |
| Output tokens                       | **\$1.60** / 1M |

Cached pricing applies automatically when prefix caching hits — no flag required. See [Prefix Caching](/waves/documentation/llm-electron/prefix-caching) for guidance on structuring prompts to maximize cache hits.

## Plan limits

|            | RPM | Concurrent requests |
| ---------- | --: | ------------------: |
| Standard   |  10 |                   3 |
| Enterprise | 200 |                  20 |

[Contact sales](https://smallest.ai/contact) for higher limits.

## What's next

Make your first call in 60 seconds.

Full request/response reference.

SSE format, partial chunks, usage at end-of-stream.

Including the voice-agent filler-phrase pattern.

Side-by-side diff.

Capabilities, limits, supported languages, pricing.