Hydra — full-duplex speech-to-speech model
Hydra, Smallest AI’s in-house speech-to-speech model, is now live. A single WebSocket carries microphone audio from your client to the model and streams synthesised response audio back — no STT → LLM → TTS pipeline in the middle.
What’s in this launch:
wss://api.smallest.ai/waves/v1/s2s?model=hydra&api_key=.... JSON text frames; no binary frames.input_audio_buffer.append continuously while the mic is open. Hydra detects turn boundaries on its own. If the user speaks while the model is responding, the in-flight response is cancelled (response.done with reason: "interrupted") and a fresh turn begins.wren, sloane, marlowe, reed, knox, tate.session.configure, Hydra streams arguments via response.function_call_arguments.delta, your client executes the tool and posts the result back via conversation.item.create + response.create.generate_initial_response: true on session.configure for greetings and concierge openers.tools via session.update without reconnecting.When to use Hydra vs the three-model stack:
Docs:
Electron LLM — chat completions on the Waves API
Electron, Smallest AI’s in-house language model, is now generally available on the Waves API. Use it as a drop-in replacement for OpenAI’s chat completions — point the OpenAI SDK at https://api.smallest.ai/waves/v1 and pass "model": "electron".
What’s in this launch:
POST /waves/v1/chat/completions. Same wire format as api.openai.com/v1/chat/completions. Streaming (SSE with optional final usage chunk), tool/function calling, JSON mode, multi-turn — all work via standard OpenAI request bodies. The official OpenAI SDKs (Python / JavaScript / Go / Java / Ruby) work with no code changes beyond the base URL and API key.content alongside tool_calls (e.g. “Let me check that for you…”) so a downstream TTS layer can mask tool-call latency. See Tool Calling for the voice-agent pattern.usage.prompt_tokens_details.cached_tokens so you can audit cache hits. See Prefix Caching.Pricing: Contact your Smallest AI account manager for the current rate card.
Plan limits: Standard 10 RPM / 3 concurrent; Enterprise 200 RPM / 20 concurrent.
Rejected parameters (vs OpenAI): n > 1 and prompt_logprobs — both return HTTP 400 with invalid_request_error.
No vision, no audio in/out on the public API — Electron is text-only.
→ Quickstart · Overview · Chat Completions API · Migrate from OpenAI · Model card
Fern CLI 5.28.2 + Python SDK generator 5.12.12
The Fern CLI has been updated to 5.28.2 and the Python SDK generator (fernapi/fern-python-sdk) bumped from 4.61.3 to 5.12.12.
Why: CLI 5.28.2 ships an updated @fern-api/replay that fixes a regression where customer commits made directly on a fern-bot regeneration PR branch could be silently dropped on the next regen if the PR was merged via the GitHub merge-commit button. Pinning the generator to the latest stable (5.12.12) aligns automated regenerations going forward.
Impact for SDK users: Next regeneration will use the v5 generator line, which restructured a few API surfaces compared to v4 (e.g., client.waves.transcribe_pulse → client.waves.speech_to_text.pulse; the format, punctuate, and capitalize query parameters were dropped from the v5 spec).
API reference rewrite — Lightning v3.1 + Pulse
The Lightning v3.1 (sync + SSE + WebSocket) and Pulse (REST + WebSocket) endpoints in the API reference have been rewritten end-to-end. Each page now opens with a one-paragraph what, a when to use this comparison against the alternatives, a how-it-works walkthrough, copy-paste examples for cURL / Python / JavaScript, and a common gotchas section.
A note for JavaScript users: the official smallestai npm package (v1.0.1) predates Lightning v3.1 and Pulse, so the docs show fetch (REST/SSE) and ws (WebSocket) examples for those endpoints — they work in Node and the browser with no SDK install.
Waves API spec — corrected output_format enum and resynced base ↔ v4 overrides
The output_format enum on Lightning v3.1 (POST /waves/v1/lightning-v3.1/get_speech and /stream) is now correctly documented as ["pcm", "mp3", "wav", "ulaw", "alaw"]. The previously listed mulaw is rejected by the platform with an invalid_enum_value error and has been removed. Verified live against api.smallest.ai.
The sample_rate enum on the same endpoints now correctly includes 44100 (the model’s native rate) on both the SDK schema and the rendered API reference.
Internally, the base API specs at fern/apis/waves/{openapi,asyncapi}/* and the v4 docs override at fern/apis/waves-v4/overrides/* have been resynced — descriptions, error response shapes, and channel summaries that had drifted between the two layers are now identical. A new CI workflow (spec-drift-check) blocks any future PR that edits one layer without the other.
No customer-facing request shape, response field, or default value changes beyond the output_format/sample_rate corrections.
Added a documentation page for using Smallest AI’s Pulse model as the speech-to-text engine inside OpenWhispr, the open-source desktop dictation app for macOS, Windows, and Linux.
Voice cloning now has a single cross-model endpoint: POST /waves/v1/voice-cloning. Upload a short audio sample, get back a voice_... ID that works across supported Lightning TTS models.
The legacy Lightning-large clone endpoint is deprecated and will be removed in a future release.