Electron LLM — chat completions on the Waves API
Electron LLM — chat completions on the Waves API
Electron, Smallest AI’s in-house language model, is now generally available on the Waves API. Use it as a drop-in replacement for OpenAI’s chat completions — point the OpenAI SDK at https://api.smallest.ai/waves/v1 and pass "model": "electron".
What’s in this launch:
- OpenAI-compatible endpoint —
POST /waves/v1/chat/completions. Same wire format asapi.openai.com/v1/chat/completions. Streaming (SSE with optional final usage chunk), tool/function calling, JSON mode, multi-turn — all work via standard OpenAI request bodies. The official OpenAI SDKs (Python / JavaScript / Go / Java / Ruby) work with no code changes beyond the base URL and API key. - Sub-300 ms time-to-first-token on warm connections.
- 32,768-token context (combined input + output).
- 70 languages with first-class Indic support — Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia, Urdu — plus broad coverage across Western/Eastern Europe, Middle East, East/Southeast/South/Central Asia, and Africa. See the Electron model card for the full list.
- Voice-agent-optimized tool calling — with a voice-agent-style system prompt, Electron emits a short filler phrase in
contentalongsidetool_calls(e.g. “Let me check that for you…”) so a downstream TTS layer can mask tool-call latency. See Tool Calling for the voice-agent pattern. - Automatic prefix caching — cached input tokens billed at a discounted rate vs normal input. Reported on every response as
usage.prompt_tokens_details.cached_tokensso you can audit cache hits. See Prefix Caching. - Cookbook: Voice Agent (Electron + Pulse + Lightning) wires Pulse (STT) + Electron (LLM + tools) + Lightning (TTS) into an end-to-end voice pipeline.
Pricing: Contact your Smallest AI account manager for the current rate card.
Plan limits: Standard 10 RPM / 3 concurrent; Enterprise 200 RPM / 20 concurrent.
Rejected parameters (vs OpenAI): n > 1 and prompt_logprobs — both return HTTP 400 with invalid_request_error.
No vision, no audio in/out on the public API — Electron is text-only.
→ Quickstart · Overview · Chat Completions API · Migrate from OpenAI · Model card
Fern CLI 5.28.2 + Python SDK generator 5.12.12
Fern CLI 5.28.2 + Python SDK generator 5.12.12
The Fern CLI has been updated to 5.28.2 and the Python SDK generator (fernapi/fern-python-sdk) bumped from 4.61.3 to 5.12.12.
Why: CLI 5.28.2 ships an updated @fern-api/replay that fixes a regression where customer commits made directly on a fern-bot regeneration PR branch could be silently dropped on the next regen if the PR was merged via the GitHub merge-commit button. Pinning the generator to the latest stable (5.12.12) aligns automated regenerations going forward.
Impact for SDK users: Next regeneration will use the v5 generator line, which restructured a few API surfaces compared to v4 (e.g., client.waves.transcribe_pulse → client.waves.speech_to_text.pulse; the format, punctuate, and capitalize query parameters were dropped from the v5 spec).

