For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
DocumentationAPI ReferenceSelf HostModel CardsClient LibrariesIntegrationsDeveloper ToolsChangelog
DocumentationAPI ReferenceSelf HostModel CardsClient LibrariesIntegrationsDeveloper ToolsChangelog
  • Getting Started
    • Introduction
    • Models
    • Authentication
  • Text to Speech (Lightning)
    • Quickstart
    • Overview
    • Sync & Async
    • Streaming
    • Pronunciation Dictionaries
    • Voices & Languages
    • HTTP vs Streaming vs WebSockets
  • Speech to Text (Pulse)
    • Quickstart
    • Overview
  • LLM (Electron)
    • Quickstart
    • Overview
    • Chat Completions
    • Streaming
    • Tool / Function Calling
    • Prefix Caching
    • Supported Parameters
    • Migrate from OpenAI
    • Best Practices
  • Cookbooks
    • Speech to Text
    • Text to Speech
    • Voice Agent (Electron + Pulse + Lightning)
  • Voice Cloning
    • Instant Clone (UI)
    • Instant Clone (API)
    • Instant Clone (Python SDK)
    • Delete Cloned Voice
  • Best Practices
    • Voice Cloning Best Practices
    • TTS Best Practices
  • Troubleshooting
    • Error reference
LogoLogo
Voice AgentsModels
Voice AgentsModels
On this page
  • Why Electron
  • Feature highlights
  • Use cases
  • Pricing
  • Plan limits
  • What’s next
LLM (Electron)

Overview

||View as Markdown|
Was this page helpful?
Previous

Quickstart

Next

Chat Completions

Built with

Electron is Smallest AI’s in-house language model, optimized for voice agents. Built to be a drop-in replacement for the OpenAI Chat Completions API — same wire format, same SDKs — with production-grade quality, sub-300 ms time-to-first-token, and a price point built for high-volume workloads.

OpenAI-Compatible

Drop-in replacement for /v1/chat/completions. Use the official OpenAI SDKs.

< 300 ms TTFT

Time-to-first-token tuned for voice agents and real-time UX.

70 Languages

First-class Indic support — Hindi, Tamil, Bengali, Marathi, and more.

75% Cached Discount

Prefix-cached input tokens billed at $0.10/M vs $0.40/M normal input.

Quickstart

Get started in 60 seconds. Point the OpenAI SDK at Smallest AI and make your first call.

Why Electron

One endpoint, one model fieldPOST /waves/v1/chat/completions with "model": "electron". Streaming, tool calls, and structured output all toggle via body flags — no separate URLs.
Voice-agent-nativePrompt the model to acknowledge before tool calls and it returns a short filler in content alongside tool_calls (e.g. “Let me check that for you…”), so your TTS layer can mask tool-call latency. See Tool Calling.
Multilingual, with Indic strength70 supported languages, with particularly strong performance on Indian-state languages — including lower-resource ones.
Cost-efficient0.40/1Minputtokens,0.40 / 1M input tokens, 0.40/1Minputtokens,1.60 / 1M output. With prefix caching, cached input drops to $0.10 / 1M (75% off).
Stack-friendlyPairs with Pulse (STT) and Lightning (TTS) to build a complete voice agent on Smallest AI’s stack.

Feature highlights

OpenAI-compatible API

Electron implements the standard OpenAI Chat Completions wire format. The official OpenAI SDKs (Python, JavaScript, Go, Java, etc.) work out of the box — change base_url to https://api.smallest.ai/waves/v1 and your existing code is portable. Most OpenAI request fields flow through verbatim: messages, temperature, top_p, max_tokens, tools, tool_choice, response_format, stream, stream_options, seed, stop, logit_bias, logprobs, and more.

Streaming with token usage

Set "stream": true to receive tokens as they’re generated via Server-Sent Events (SSE). Include "stream_options": {"include_usage": true} to get a final usage chunk so you can bill exactly the tokens served, even if the client disconnects mid-stream. See Streaming.

Tool / function calling — voice-agent optimized

Standard OpenAI tools array with tool_calls in the response. With a voice-agent-style system prompt, Electron also emits a short conversational filler in the content field alongside tool_calls, so voice agents can speak the filler while the tool runs in the background. Reduces perceived latency on actions like database lookups and webhook calls. See Tool Calling.

Prefix caching ($0.10 / 1M cached input tokens)

Repeated prompt prefixes (system prompts, RAG context, conversation history) are served from cache and billed at a 75% discount. usage.prompt_tokens_details.cached_tokens is returned on every response so you can audit the savings. See Prefix Caching.

70 languages with first-class Indic support

Electron speaks 70 languages out of the box, with particularly strong fluency on Indian-state languages — Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia, Urdu — and credible coverage of Nepali, Sinhala, and many lower-resource South Asian languages. See the Model Card for the full list.

Voice-agent stack

Electron is designed to slot into a complete voice pipeline alongside Pulse for transcription and Lightning for speech synthesis. See the Voice Agent cookbook for an end-to-end example.

Use cases

Voice agents

Customer support, sales, scheduling. Electron + Pulse + Lightning is a complete stack.

RAG applications

Prefix caching makes long retrieved-context prompts cheap and fast on repeat queries.

Multilingual chat

Customer chat in Indic languages, European languages, Arabic, East Asian languages — all from one endpoint.

Agent / tool-use workflows

Standard OpenAI tools API, with built-in voice-friendly filler-before-call behavior.

Migration from OpenAI

Drop-in replacement for chat/completions. Switch by changing two strings.

High-throughput batch jobs

Lower per-token cost than frontier models with comparable quality on most tasks.

Pricing

Input tokens$0.40 / 1M
Cached input tokens (prefix-cached)$0.10 / 1M
Output tokens$1.60 / 1M

Cached pricing applies automatically when prefix caching hits — no flag required. See Prefix Caching for guidance on structuring prompts to maximize cache hits.

Plan limits

RPMConcurrent requests
Standard103
Enterprise20020

Contact sales for higher limits.

What’s next

Quickstart

Make your first call in 60 seconds.

Chat Completions API

Full request/response reference.

Streaming

SSE format, partial chunks, usage at end-of-stream.

Tool Calling

Including the voice-agent filler-phrase pattern.

Migrate from OpenAI

Side-by-side diff.

Model Card

Capabilities, limits, supported languages, pricing.