Electron LLM — chat completions on the Waves API

Electron, Smallest AI’s in-house language model, is now generally available on the Waves API. Use it as a drop-in replacement for OpenAI’s chat completions — point the OpenAI SDK at https://api.smallest.ai/waves/v1 and pass "model": "electron".

import os
from openai import OpenAI

# Point the OpenAI SDK at the Waves API instead of api.openai.com.
client = OpenAI(
    base_url="https://api.smallest.ai/waves/v1",
    api_key=os.environ["SMALLEST_API_KEY"],
)

# Select Electron by model name; the rest is a standard chat completion call.
response = client.chat.completions.create(
    model="electron",
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(response.choices[0].message.content)

What’s in this launch:

OpenAI-compatible endpoint — POST /waves/v1/chat/completions. Same wire format as api.openai.com/v1/chat/completions. Streaming (SSE with optional final usage chunk), tool/function calling, JSON mode, multi-turn — all work via standard OpenAI request bodies. The official OpenAI SDKs (Python / JavaScript / Go / Java / Ruby) work with no code changes beyond the base URL and API key.
Sub-300 ms time-to-first-token on warm connections.
32,768-token context (combined input + output).
70 languages with first-class Indic support — Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia, Urdu — plus broad coverage across Western/Eastern Europe, Middle East, East/Southeast/South/Central Asia, and Africa. See the Electron model card for the full list.
Voice-agent-optimized tool calling — with a voice-agent-style system prompt, Electron emits a short filler phrase in content alongside tool_calls (e.g. “Let me check that for you…”) so a downstream TTS layer can mask tool-call latency. See Tool Calling for the voice-agent pattern.
Automatic prefix caching — cached input tokens billed at a discounted rate vs normal input. Reported on every response as usage.prompt_tokens_details.cached_tokens so you can audit cache hits. See Prefix Caching.
Cookbook: Voice Agent (Electron + Pulse + Lightning) wires Pulse (STT) + Electron (LLM + tools) + Lightning (TTS) into an end-to-end voice pipeline.
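
The tool-calling and prefix-caching items above both surface as fields on an ordinary chat completion response. The sketch below shows one way to read them, assuming the response has been converted to a plain dict (with the OpenAI Python SDK, `response.model_dump()` does this). The helper name `summarize_response` and the sample tool name `get_order_status` are illustrative, not part of the API.

```python
def summarize_response(response: dict) -> dict:
    """Pull out the voice-agent filler text, tool names, and cache stats."""
    message = response["choices"][0]["message"]
    tool_calls = message.get("tool_calls") or []
    usage = response.get("usage") or {}
    prompt_tokens = usage.get("prompt_tokens") or 0
    details = usage.get("prompt_tokens_details") or {}
    cached_tokens = details.get("cached_tokens") or 0

    return {
        # Content that arrives alongside tool_calls is the filler phrase a
        # TTS layer can speak while the tool runs.
        "filler_text": (message.get("content") or "") if tool_calls else "",
        "tool_names": [call["function"]["name"] for call in tool_calls],
        "cached_tokens": cached_tokens,
        # Share of input tokens served from the prefix cache.
        "cache_hit_ratio": cached_tokens / prompt_tokens if prompt_tokens else 0.0,
    }


# Hand-written sample in the OpenAI response shape (not real API output).
sample_response = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": "Let me check that for you...",
            "tool_calls": [{
                "id": "call_1",
                "type": "function",
                "function": {
                    "name": "get_order_status",  # hypothetical tool
                    "arguments": "{\"order_id\": \"A1\"}",
                },
            }],
        },
    }],
    "usage": {
        "prompt_tokens": 200,
        "completion_tokens": 20,
        "total_tokens": 220,
        "prompt_tokens_details": {"cached_tokens": 150},
    },
}

summary = summarize_response(sample_response)
```

In a voice pipeline, `filler_text` would typically go to the TTS layer right away, while the calls named in `tool_names` run in the background.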

Pricing: Contact your Smallest AI account manager for the current rate card.

Plan limits: Standard 10 requests per minute (RPM) / 3 concurrent requests; Enterprise 200 RPM / 20 concurrent requests.
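
A client can stay under these limits by throttling itself before it sends. The following is a minimal sketch, assuming the RPM limit behaves like a rolling 60-second window (the source does not say how the server counts requests). `SlidingWindowLimiter` and `throttled` are hypothetical helpers, not part of any SDK.

```python
import collections
import threading
import time


class SlidingWindowLimiter:
    """Schedules sends so at most max_requests fall in any rolling window."""

    def __init__(self, max_requests: int, window_seconds: float = 60.0):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._scheduled = collections.deque()  # send times, oldest first
        self._lock = threading.Lock()

    def reserve(self, now: float) -> float:
        """Book the next send slot and return how many seconds to wait for it."""
        with self._lock:
            # Forget sends that have already left the rolling window.
            while self._scheduled and now - self._scheduled[0] >= self.window_seconds:
                self._scheduled.popleft()
            if len(self._scheduled) < self.max_requests:
                self._scheduled.append(now)
                return 0.0
            # Window is full: the next slot opens when the oldest send expires.
            send_at = self._scheduled.popleft() + self.window_seconds
            self._scheduled.append(send_at)
            return send_at - now


# Standard plan figures from the limits above; use 200 and 20 for Enterprise.
rpm_limiter = SlidingWindowLimiter(max_requests=10)
concurrency_slots = threading.BoundedSemaphore(3)


def throttled(call, clock=time.monotonic, sleep=time.sleep):
    """Run call() while respecting both the RPM and the concurrency limit."""
    with concurrency_slots:
        delay = rpm_limiter.reserve(clock())
        if delay > 0:
            sleep(delay)
        return call()
```

Wrapping each request as `throttled(lambda: client.chat.completions.create(...))` keeps the slot held while waiting. This is deliberately conservative, so actual send times stay close to the scheduled ones.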

Rejected parameters (vs OpenAI): n > 1 and prompt_logprobs — both return HTTP 400 with invalid_request_error.
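
When migrating existing OpenAI code, it can help to catch these two parameters before the request leaves the client. This is a minimal sketch of such a pre-flight check. `find_rejected_params` and `check_request_body` are hypothetical helpers, and treating any non-null `prompt_logprobs` value as rejected is an assumption, since the source does not say which values trigger the 400.

```python
def find_rejected_params(body: dict) -> list[str]:
    """Return the names of parameters in body that Electron is documented to reject."""
    rejected = []
    n = body.get("n")
    if n is not None and n > 1:
        rejected.append("n")
    # Assumption: any explicitly set prompt_logprobs value is rejected.
    if body.get("prompt_logprobs") is not None:
        rejected.append("prompt_logprobs")
    return rejected


def check_request_body(body: dict) -> dict:
    """Raise locally instead of waiting for an HTTP 400 from the server."""
    rejected = find_rejected_params(body)
    if rejected:
        raise ValueError(f"Parameters not accepted by Electron: {', '.join(rejected)}")
    return body
```

Failing fast on the client gives a clearer stack trace than an `invalid_request_error` returned mid-pipeline. The server remains the source of truth for what is accepted.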

No vision, no audio in/out on the public API — Electron is text-only.

→ Quickstart · Overview · Chat Completions API · Migrate from OpenAI · Model card

Fern CLI 5.28.2 + Python SDK generator 5.12.12

The Fern CLI has been updated to 5.28.2, and the Python SDK generator (fernapi/fern-python-sdk) has been bumped from 4.61.3 to 5.12.12.

Why: CLI 5.28.2 ships an updated @fern-api/replay that fixes a regression where customer commits made directly on a fern-bot regeneration PR branch could be silently dropped on the next regen if the PR was merged via the GitHub merge-commit button. Pinning the generator to the latest stable (5.12.12) aligns automated regenerations going forward.

Impact for SDK users: The next regeneration will use the v5 generator line, which restructures a few API surfaces compared to v4. For example, client.waves.transcribe_pulse becomes client.waves.speech_to_text.pulse, and the format, punctuate, and capitalize query parameters were dropped from the v5 spec.
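
Code that has to run against both SDK generations during the transition could bridge the rename with a small shim. The sketch below only relies on the two attribute paths and the three dropped parameter names mentioned above. The method signatures are not documented here, so arguments are passed through untouched, and `resolve_pulse_transcriber` and `transcribe_pulse_compat` are hypothetical helpers.

```python
# Query parameters the v5 spec no longer accepts, per the note above.
DROPPED_V5_PARAMS = ("format", "punctuate", "capitalize")


def resolve_pulse_transcriber(client):
    """Return (method, is_v5) for whichever Pulse entry point the SDK exposes."""
    waves = client.waves
    speech_to_text = getattr(waves, "speech_to_text", None)
    if speech_to_text is not None and hasattr(speech_to_text, "pulse"):
        return speech_to_text.pulse, True   # v5 layout
    return waves.transcribe_pulse, False    # v4 layout


def transcribe_pulse_compat(client, **kwargs):
    """Call Pulse transcription, dropping parameters that v5 removed."""
    method, is_v5 = resolve_pulse_transcriber(client)
    if is_v5:
        kwargs = {k: v for k, v in kwargs.items() if k not in DROPPED_V5_PARAMS}
    return method(**kwargs)
```

Once every environment is on the v5 SDK, the shim can be deleted and callers can use client.waves.speech_to_text.pulse directly.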