For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
DocumentationAPI ReferenceSelf HostModel CardsClient LibrariesIntegrationsDeveloper ToolsChangelog
DocumentationAPI ReferenceSelf HostModel CardsClient LibrariesIntegrationsDeveloper ToolsChangelog
  • Text to Speech
    • Lightning v3.1 Pro
    • Lightning v3.1
    • TTS Evaluation Script
  • Speech to Text
    • Pulse
  • LLM
    • Electron
LogoLogo
Voice AgentsModels
Voice AgentsModels
On this page
  • Model Overview
  • Key capabilities
  • Performance
  • Pricing
  • Rate limits
  • Supported languages
  • Western Europe (8)
  • Indic (11)
  • Central / Eastern Europe (14)
  • Baltic (3)
  • Nordic (5)
  • Other Europe (2)
  • Middle East (4)
  • East Asia (5)
  • Southeast Asia (8)
  • South Asia (2)
  • Central Asia (2)
  • Africa (6)
  • Capabilities
  • Known limitations
  • API surface
  • Safety & responsible use
  • Related
LLM

Electron

||View as Markdown|
Was this page helpful?
Previous

Pulse

Built with

Electron is Smallest AI’s in-house language model, optimized for voice agents and built as a drop-in replacement for the OpenAI Chat Completions API. Production-grade quality, sub-300 ms time-to-first-token, and a price point built for high-volume workloads.

< 300 ms TTFT

Time-to-first-token tuned for real-time UX.

32K context

Combined input + output context.

70 Languages

First-class Indic support.

OpenAI-Compatible

Drop-in replacement for /v1/chat/completions.

Model Overview

Developed bySmallest AI
Model typeCausal language model — chat completions
API surfaceOpenAI-compatible (POST /waves/v1/chat/completions)
Model ID (request body)"electron"
Context window32,768 tokens (combined input + output)
LicenseProprietary, hosted API

Key capabilities

OpenAI Wire Format

Same request/response shape as OpenAI Chat Completions. Use the official OpenAI SDKs by swapping base_url and api_key.

Streaming

Standard Server-Sent Events. Optional final usage chunk for accurate billing on client disconnect.

Tool / Function Calling

Standard OpenAI tools API, with voice-agent-optimized filler-phrase behavior before tool calls.

Prefix Caching

Cached input tokens billed at $0.10 / 1M (75% off). Automatic — no flag needed.

70 Languages

Wide multilingual coverage, with particularly strong Indic-language performance.

JSON Mode

response_format: {type: "json_object"} for structured output.


Performance

MetricValue
Time to first token (TTFT)< 300 ms (typical, warm connection)
End-to-end roundtrip overhead vs direct model~20 ms with persistent HTTPS connection
Quality tierCompetitive with leading voice-agent LLMs on internal evaluations

Electron is trained for voice-agent workloads — instruction following on system prompts, conversational style, and holding long multi-turn dialogues without drift. We benchmark it internally against frontier alternatives on these tasks. General-purpose academic benchmarks like MMLU and IFEval target a different objective and are not the right yardstick for a model whose job is to drive a phone call.


Pricing

TypeRate
Input tokens$0.40 / 1M
Cached input tokens (prefix-cached)$0.10 / 1M
Output tokens$1.60 / 1M

Prefix-cache pricing applies automatically — see Prefix Caching. Every response reports usage.prompt_tokens_details.cached_tokens so you can audit savings.

Rate limits

PlanRequests per minute (RPM)Concurrent in-flight requests
Standard103
Enterprise20020

Both limits enforce strictly — over either cap returns HTTP 429. See Concurrency and Limits.


Supported languages

Electron is multilingual with strong out-of-the-box quality across the following 70 languages. Particularly strong on Indic languages including lower-resource ones.

Western Europe (8)

English · Spanish · French · German · Italian · Portuguese · Dutch · Catalan

Indic (11)

Hindi · Bengali · Tamil · Telugu · Marathi · Gujarati · Kannada · Malayalam · Punjabi · Odia · Urdu

Central / Eastern Europe (14)

Polish · Russian · Ukrainian · Belarusian · Czech · Slovak · Romanian · Hungarian · Bulgarian · Croatian · Serbian · Slovenian · Macedonian · Albanian

Baltic (3)

Estonian · Latvian · Lithuanian

Nordic (5)

Swedish · Norwegian · Danish · Finnish · Icelandic

Other Europe (2)

Greek · Turkish

Middle East (4)

Arabic · Hebrew · Persian (Farsi) · Kurdish

East Asia (5)

Chinese (Simplified) · Chinese (Traditional) · Japanese · Korean · Mongolian

Southeast Asia (8)

Vietnamese · Thai · Indonesian · Malay · Filipino · Burmese · Khmer · Lao

South Asia (2)

Nepali · Sinhala

Central Asia (2)

Kazakh · Uzbek

Africa (6)

Swahili · Amharic · Afrikaans · Yoruba · Hausa · Zulu


Capabilities

CapabilityStatus
Chat completions (text in / text out)✅
Streaming (SSE)✅
Tool / function calling✅
Parallel tool calls✅
Voice-agent filler-phrase before tool calls✅ Electron-specific
JSON object mode (response_format)✅
Prefix caching✅ Automatic
System messages✅
Multi-turn conversation✅
seed for best-effort determinism✅
Multilingual generation (70 languages)✅

Known limitations

  • No vision / no audio in or out. Electron is text-only on the public API.
  • n > 1 not supported. Each request returns exactly one completion. Make multiple requests if you need multiple completions.
  • prompt_logprobs not supported.
  • Context cap of 32,768 tokens combined input + output. Inputs that exceed this are rejected with a clean 400.

API surface

EndpointPOST https://api.smallest.ai/waves/v1/chat/completions
AuthAuthorization: Bearer $SMALLEST_API_KEY
Request shapeOpenAI Chat Completions wire format
Response shapeOpenAI chat.completion (non-streaming) or SSE chat.completion.chunk (streaming)
Error envelope{"error": {"message", "type", "details", "request_id"}} — details: [{code, message, path}] on validation failures

See Chat Completions for full request/response reference and Supported Parameters for the passthrough table.


Safety & responsible use

Electron is intended for voice-agent and conversational workloads. Customers building user-facing applications should layer their own content moderation, prompt-injection defenses, and PII handling appropriate to their domain. Electron does not currently apply content moderation server-side — outputs reflect the model’s training and the prompts you provide.

For voice-agent applications handling regulated content (financial, healthcare), use the standard pattern: keep PII out of prompts where practical, apply post-processing redaction on outputs, and use Smallest AI’s Pulse PII redaction features on the transcription side.


Related

Pulse — STT

Pair Electron with Pulse for speech-to-text input.

Lightning v3.1 — TTS

Pair Electron with Lightning for speech output.

Voice Agent cookbook

End-to-end pipeline: Pulse → Electron → Lightning.

Chat Completions API

Full request/response reference.