
General

June 12, 2026

June 3, 2026

WebSocket auth and default URL fixes

Fixed authentication configuration and default server URLs across the WebSocket specs for Lightning TTS, Pulse STT, and related endpoints.

Additional endpoints for v2.2.0 and v3.0.1 are now marked deprecated in the API reference.


May 23, 2026

May 22, 2026

May 22, 2026

May 12, 2026

May 7, 2026

April 22, 2026

April 20, 2026

Lightning v3.1 + Pulse STT API-ref Python snippets — SmallestAI(api_key=...) (was token=...)

The Python “Try it” snippets attached to the Lightning v3.1 TTS and Pulse STT API reference operations used the pre-4.4.5 SDK constructor signature:

```python
client = SmallestAI(token="YOUR_API_KEY")
```

The SDK kwarg was renamed in 4.4.5. Customers copy-pasting the old snippet hit:

TypeError: SmallestAI.__init__() got an unexpected keyword argument 'token'

All affected Python snippets in lightning-v3.1-openapi.yaml, pulse-stt-openapi.yaml, and their v4 overrides now use SmallestAI(api_key="YOUR_API_KEY"). No wire-protocol change. Existing customer code on smallestai >= 4.4.5 is unaffected.
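
Some code has to run on both sides of the 4.4.5 rename. That code can choose the keyword at runtime. The sketch below uses the standard library's inspect.signature. make_client is a hypothetical helper. It assumes that SmallestAI declares api_key (4.4.5 and later) or token (earlier releases) as an explicit constructor parameter rather than accepting bare **kwargs.

```python
import inspect


def make_client(client_cls, key: str):
    """Construct client_cls with whichever credential keyword it declares.

    Hypothetical helper: smallestai >= 4.4.5 takes api_key=, older releases took token=.
    """
    params = inspect.signature(client_cls).parameters
    if "api_key" in params:
        return client_cls(api_key=key)
    if "token" in params:
        return client_cls(token=key)
    raise TypeError(f"{client_cls.__name__} declares neither api_key nor token")
```

Call it as make_client(SmallestAI, os.environ["SMALLEST_API_KEY"]). If you can pin smallestai >= 4.4.5, passing api_key directly is the simpler fix.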

Hydra — full-duplex speech-to-speech model

Hydra, Smallest AI’s in-house speech-to-speech model, is now live. A single WebSocket carries microphone audio from your client to the model and streams synthesised response audio back — no STT → LLM → TTS pipeline in the middle.

```python
import asyncio, base64, json, os, wave
import websockets

API_KEY = os.environ["SMALLEST_API_KEY"]
URL = f"wss://api.smallest.ai/waves/v1/s2s?model=hydra&api_key={API_KEY}"

async def main():
    async with websockets.connect(URL, max_size=None) as ws:
        async for raw in ws:
            evt = json.loads(raw)
            if evt["type"] == "session.created":
                await ws.send(json.dumps({
                    "type": "session.configure",
                    "session": {"instructions": "Be brief.", "voice": "wren"},
                }))
            elif evt["type"] == "response.output_audio.delta":
                ...  # decode base64 PCM16 and play

asyncio.run(main())
```

What’s in this launch:

  • Single WebSocket endpoint at wss://api.smallest.ai/waves/v1/s2s?model=hydra&api_key=.... JSON text frames; no binary frames.
  • Full-duplex with server-side VAD — stream input_audio_buffer.append continuously while the mic is open. Hydra detects turn boundaries on its own. If the user speaks while the model is responding, the in-flight response is cancelled (response.done with reason: "interrupted") and a fresh turn begins.
  • Six voices: wren, sloane, marlowe, reed, knox, tate.
  • Tool calling — declare JSON-schema tools in session.configure, Hydra streams arguments via response.function_call_arguments.delta, your client executes the tool and posts the result back via conversation.item.create + response.create.
  • Bot speaks first — set generate_initial_response: true on session.configure for greetings and concierge openers.
  • Mid-session updates — live-patch tools via session.update without reconnecting.
  • Audio formats: input PCM16 mono 16 kHz; output PCM16 mono 48 kHz.
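
The tool-calling round trip described above can be sketched as follows. The event and message type names come from this entry. The payload fields are assumptions modelled on OpenAI-style realtime APIs: call_id, name, and delta on the argument events, and the function_call_output item shape. Check the Overview's event reference before relying on them. get_weather is a stand-in tool.

```python
import json
from collections import defaultdict


def get_weather(city: str) -> dict:
    """Stand-in tool; replace with a real implementation."""
    return {"city": city, "forecast": "sunny"}


TOOLS = {"get_weather": get_weather}


class ToolCallBuffer:
    """Accumulates streamed tool-call arguments and builds the reply frames."""

    def __init__(self):
        self.args = defaultdict(str)  # call_id -> JSON argument text received so far
        self.names = {}               # call_id -> tool name

    def on_delta(self, evt: dict) -> None:
        # Assumed shape of response.function_call_arguments.delta:
        # {"call_id": ..., "name": ... (optional), "delta": "<JSON fragment>"}
        self.args[evt["call_id"]] += evt["delta"]
        if "name" in evt:
            self.names[evt["call_id"]] = evt["name"]

    def finish(self, call_id: str) -> list:
        """Run the tool and return the JSON text frames to send back, in order."""
        kwargs = json.loads(self.args.pop(call_id))
        result = TOOLS[self.names.pop(call_id)](**kwargs)
        item = {
            "type": "conversation.item.create",
            # Assumed item shape, modelled on OpenAI-style realtime APIs.
            "item": {
                "type": "function_call_output",
                "call_id": call_id,
                "output": json.dumps(result),
            },
        }
        # The result goes back first; response.create then asks Hydra to continue.
        return [json.dumps(item), json.dumps({"type": "response.create"})]
```

Call finish() once a call's arguments are complete; this entry does not name the event that signals that. Then send each returned frame with ws.send() on the open Hydra socket.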

When to use Hydra vs the three-model stack:

  • Use Hydra when latency-to-voice matters above all else — phone agents, kiosks, in-car assistants.
  • Use Pulse → Electron → Lightning v3.1 when you need explicit text in the middle: analytics, custom retrieval, regulated content moderation, or BYOM (bring your own model).

Docs:

  • Quickstart — clone the reference client and talk to Hydra in your browser.
  • Overview — full event reference, session config, tool calling, interruption, errors.
  • Model Card — voices, performance, pricing.
  • Reference client (Next.js) — production-grade browser client with live wire-log, multi-agent presets, tool execution.

Electron LLM — chat completions on the Waves API

Electron, Smallest AI’s in-house language model, is now generally available on the Waves API. Use it as a drop-in replacement for OpenAI’s chat completions — point the OpenAI SDK at https://api.smallest.ai/waves/v1 and pass "model": "electron".

```python
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.smallest.ai/waves/v1",
    api_key=os.environ["SMALLEST_API_KEY"],
)

response = client.chat.completions.create(
    model="electron",
    messages=[{"role": "user", "content": "Say hello in one short sentence."}],
)
print(response.choices[0].message.content)
```

What’s in this launch:

  • OpenAI-compatible endpoint — POST /waves/v1/chat/completions. Same wire format as api.openai.com/v1/chat/completions. Streaming (SSE with optional final usage chunk), tool/function calling, JSON mode, multi-turn — all work via standard OpenAI request bodies. The official OpenAI SDKs (Python / JavaScript / Go / Java / Ruby) work with no code changes beyond the base URL and API key.
  • Sub-300 ms time-to-first-token on warm connections.
  • 32,768-token context (combined input + output).
  • 70 languages with first-class Indic support — Hindi, Bengali, Tamil, Telugu, Marathi, Gujarati, Kannada, Malayalam, Punjabi, Odia, Urdu — plus broad coverage across Western/Eastern Europe, Middle East, East/Southeast/South/Central Asia, and Africa. See the Electron model card for the full list.
  • Voice-agent-optimized tool calling — with a voice-agent-style system prompt, Electron emits a short filler phrase in content alongside tool_calls (e.g. “Let me check that for you…”) so a downstream TTS layer can mask tool-call latency. See Tool Calling for the voice-agent pattern.
  • Automatic prefix caching — cached input tokens billed at a discounted rate vs normal input. Reported on every response as usage.prompt_tokens_details.cached_tokens so you can audit cache hits. See Prefix Caching.
  • Cookbook: Voice Agent (Electron + Pulse + Lightning) wires Pulse (STT) + Electron (LLM + tools) + Lightning (TTS) into an end-to-end voice pipeline.
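
The following is a minimal sketch of streaming an Electron reply with the standard OpenAI Python SDK. It keeps the optional final usage chunk so you can read usage.prompt_tokens_details.cached_tokens. The sketch assumes that Electron honors OpenAI's stream_options={"include_usage": True} flag, which is how the OpenAI SDK requests that chunk. The helper names stream_reply, collect_stream, and cached_tokens are our own.

```python
def stream_reply(client, prompt: str):
    """Stream an Electron reply and return (text, usage).

    client is an openai.OpenAI instance pointed at the Waves base URL.
    include_usage requests the final usage chunk, whose choices list is empty.
    """
    stream = client.chat.completions.create(
        model="electron",
        messages=[{"role": "user", "content": prompt}],
        stream=True,
        stream_options={"include_usage": True},
    )
    return collect_stream(stream)


def collect_stream(chunks):
    """Concatenate content deltas and keep the usage from the final chunk."""
    parts, usage = [], None
    for chunk in chunks:
        if chunk.choices:
            delta = chunk.choices[0].delta.content
            if delta:
                parts.append(delta)  # a voice agent would forward this to TTS here
        if getattr(chunk, "usage", None) is not None:
            usage = chunk.usage
    return "".join(parts), usage


def cached_tokens(usage) -> int:
    """Prompt tokens served from the prefix cache (0 if not reported)."""
    details = getattr(usage, "prompt_tokens_details", None)
    return getattr(details, "cached_tokens", 0) or 0
```

With the client from the example above, text, usage = stream_reply(client, "Say hello in one short sentence.") returns the full reply. cached_tokens(usage) then shows how much of the prompt hit the prefix cache.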

Pricing: Contact your Smallest AI account manager for the current rate card.

Plan limits: Standard 10 RPM / 3 concurrent; Enterprise 200 RPM / 20 concurrent.
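
If you batch requests, a client-side throttle keeps you inside these limits. The sketch below is our own asyncio helper; PlanLimiter is a hypothetical name, not part of any Smallest AI SDK. It caps in-flight requests with a semaphore and caps request starts per minute with a sliding window. This entry does not say how the API responds when a limit is exceeded, so treat the helper as a courtesy guard rather than a guarantee.

```python
import asyncio
import time
from collections import deque


class PlanLimiter:
    """Caps concurrent requests and request starts per sliding window."""

    def __init__(self, rpm: int, concurrent: int, window: float = 60.0):
        self.rpm = rpm
        self.window = window                      # seconds; 60 for a per-minute limit
        self.sem = asyncio.Semaphore(concurrent)  # in-flight cap
        self.starts = deque()                     # monotonic start times inside the window
        self.lock = asyncio.Lock()                # serializes window bookkeeping

    async def _reserve_start(self) -> None:
        async with self.lock:
            while True:
                now = time.monotonic()
                # Forget starts that have aged out of the window.
                while self.starts and now - self.starts[0] >= self.window:
                    self.starts.popleft()
                if len(self.starts) < self.rpm:
                    self.starts.append(now)
                    return
                # Sleep until the oldest start leaves the window, then re-check.
                await asyncio.sleep(self.window - (now - self.starts[0]))

    async def run(self, coro_fn, *args, **kwargs):
        """Await coro_fn(*args, **kwargs) once both limits allow it."""
        async with self.sem:
            await self._reserve_start()
            return await coro_fn(*args, **kwargs)
```

For the Standard plan, create PlanLimiter(rpm=10, concurrent=3) inside your running event loop. Then route each call through it, for example await limiter.run(ask_electron, prompt), where ask_electron is your own coroutine. The limiter takes the concurrency slot before the per-minute slot, so each timestamp is recorded when the request actually starts.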

Rejected parameters (vs OpenAI): n > 1 and prompt_logprobs — both return HTTP 400 with invalid_request_error.

No vision, no audio in/out on the public API — Electron is text-only.
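
Request bodies ported from OpenAI can be checked before they reach the wire. The hypothetical pre-flight helper below encodes the restrictions listed above. It refuses three things locally with ValueError instead of sending them to the API: n greater than 1, prompt_logprobs, and non-text content parts such as OpenAI-style image_url or input_audio parts.

```python
def check_electron_request(body: dict) -> None:
    """Raise ValueError for request features Electron does not accept.

    Based on this changelog: n > 1 and prompt_logprobs return HTTP 400,
    and the public API is text-only.
    """
    if (body.get("n") or 1) > 1:
        raise ValueError("Electron does not accept n > 1")
    if "prompt_logprobs" in body:
        raise ValueError("Electron does not accept prompt_logprobs")
    for message in body.get("messages", []):
        content = message.get("content")
        # OpenAI-style multimodal messages use a list of typed content parts.
        if isinstance(content, list):
            for part in content:
                if part.get("type") != "text":
                    raise ValueError(f"Electron is text-only; got a {part.get('type')!r} part")
```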

→ Quickstart · Overview · Chat Completions API · Migrate from OpenAI · Model card

Fern CLI 5.28.2 + Python SDK generator 5.12.12

The Fern CLI has been updated to 5.28.2 and the Python SDK generator (fernapi/fern-python-sdk) bumped from 4.61.3 to 5.12.12.

Why: CLI 5.28.2 ships an updated @fern-api/replay that fixes a regression where customer commits made directly on a fern-bot regeneration PR branch could be silently dropped on the next regen if the PR was merged via the GitHub merge-commit button. Pinning the generator to the latest stable (5.12.12) aligns automated regenerations going forward.

Impact for SDK users: the next regeneration will use the v5 generator line, which restructured a few API surfaces compared to v4. For example, client.waves.transcribe_pulse → client.waves.speech_to_text.pulse, and the format, punctuate, and capitalize query parameters were dropped from the v5 spec.
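
Some code has to run against SDKs built by either generator line. For that code, a small shim can pick whichever method path exists. This is a hypothetical helper, not generated SDK code. The two method paths come from this entry. Keyword arguments pass through untouched, except that the three dropped query parameters are removed, with a warning, on the v5 path.

```python
import warnings

# Query parameters this entry says were dropped from the v5 spec.
DROPPED_IN_V5 = {"format", "punctuate", "capitalize"}


def transcribe_pulse_any(client, **kwargs):
    """Call Pulse through whichever SDK layout the client exposes (hypothetical shim)."""
    stt = getattr(client.waves, "speech_to_text", None)
    if stt is not None and hasattr(stt, "pulse"):
        dropped = DROPPED_IN_V5.intersection(kwargs)
        if dropped:
            warnings.warn(f"ignoring parameters removed in v5: {sorted(dropped)}")
            kwargs = {k: v for k, v in kwargs.items() if k not in dropped}
        return stt.pulse(**kwargs)                  # v5: client.waves.speech_to_text.pulse
    return client.waves.transcribe_pulse(**kwargs)  # v4: client.waves.transcribe_pulse
```

Dropping those parameters silently changes output if you relied on them. If that matters for your transcripts, raise an error instead of warning.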

API reference rewrite — Lightning v3.1 + Pulse

The Lightning v3.1 (sync + SSE + WebSocket) and Pulse (REST + WebSocket) endpoints in the API reference have been rewritten end-to-end. Each page now opens with a one-paragraph “what”. It then covers a “when to use this” comparison against the alternatives, a “how it works” walkthrough, copy-paste examples for cURL, Python, and JavaScript, and a “common gotchas” section.

A note for JavaScript users: the official smallestai npm package (v1.0.1) predates Lightning v3.1 and Pulse, so the docs show fetch (REST/SSE) and ws (WebSocket) examples for those endpoints — they work in Node and the browser with no SDK install.

Waves API spec — corrected output_format enum and resynced base ↔ v4 overrides

The output_format enum on Lightning v3.1 (POST /waves/v1/lightning-v3.1/get_speech and /stream) is now correctly documented as ["pcm", "mp3", "wav", "ulaw", "alaw"]. The previously listed mulaw is rejected by the platform with an invalid_enum_value error and has been removed. Verified live against api.smallest.ai.
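
Callers that still send mulaw can normalize the value before building a request. The sketch below assumes that both spellings refer to G.711 μ-law audio. normalize_output_format is a hypothetical helper name. The sketch checks only output_format; it does not validate sample_rate.

```python
# Documented output_format values for Lightning v3.1 get_speech and /stream.
OUTPUT_FORMATS = ("pcm", "mp3", "wav", "ulaw", "alaw")


def normalize_output_format(fmt: str) -> str:
    """Return a documented output_format value, or raise ValueError."""
    value = fmt.strip().lower()
    if value == "mulaw":
        # Assumes both spellings mean G.711 mu-law; the platform only accepts "ulaw".
        value = "ulaw"
    if value not in OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format {fmt!r}; expected one of {OUTPUT_FORMATS}")
    return value
```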

The sample_rate enum on the same endpoints now correctly includes 44100 (the model’s native rate) on both the SDK schema and the rendered API reference.

Internally, the base API specs at fern/apis/waves/{openapi,asyncapi}/* and the v4 docs override at fern/apis/waves-v4/overrides/* have been resynced — descriptions, error response shapes, and channel summaries that had drifted between the two layers are now identical. A new CI workflow (spec-drift-check) blocks any future PR that edits one layer without the other.

No customer-facing request shape, response field, or default value changes beyond the output_format/sample_rate corrections.

OpenWhispr integration guide

Added a documentation page for using Smallest AI’s Pulse model as the speech-to-text engine inside OpenWhispr, the open-source desktop dictation app for macOS, Windows, and Linux.

→ OpenWhispr integration

Unified voice cloning API

Voice cloning now has a single cross-model endpoint: POST /waves/v1/voice-cloning. Upload a short audio sample, get back a voice_... ID that works across supported Lightning TTS models.

The legacy Lightning-large clone endpoint is deprecated and will be removed in a future release.

→ Voice Cloning guide