Pulse STT — encoding query param now documented on the pre-recorded REST endpoint
The pre-recorded REST endpoint (POST /waves/v1/pulse/get_text) now lists the encoding query parameter alongside the streaming WebSocket. Same 6-value enum: linear16, linear32, alaw, mulaw, opus, ogg_opus. Default is linear16.
When omitted, the server falls back to detecting the format from the file’s container header (works for .wav, .mp3, .flac, .ogg, .m4a, .webm).
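A minimal sketch of how a client might build the query parameters, assuming the six documented enum values; `build_params` is an illustrative helper, not part of any SDK:

```python
# Valid values for the encoding query param on POST /waves/v1/pulse/get_text.
VALID_ENCODINGS = {"linear16", "linear32", "alaw", "mulaw", "opus", "ogg_opus"}

def build_params(encoding=None):
    """Return the query params for the pre-recorded request.

    When encoding is None the param is omitted entirely, so the server
    falls back to detecting the format from the container header.
    """
    if encoding is None:
        return {}
    if encoding not in VALID_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding}")
    return {"encoding": encoding}
```

Omitting the parameter (rather than sending an empty value) is what triggers the container-header fallback.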
Pulse STT — age_detection removed from the pre-recorded HTTP API
The age_detection query parameter and the corresponding top-level age field in the response have been removed from the Pulse STT pre-recorded HTTP API (POST /waves/v1/pulse/get_text). Gender detection (gender_detection / gender) and emotion detection (emotion_detection / emotions) are unaffected.
Specs and reference docs updated:
- fern/apis/waves/openapi/pulse-stt-openapi.yaml — age_detection query param dropped; age response field and example value removed.
- fern/products/waves/pages/v4.0.0/api-references/pulse-stt.mdx (+ versions mirror) — cURL/Python/JavaScript samples for both raw-bytes and audio-URL methods no longer pass age_detection.
- fern/products/waves/pages/v4.0.0/speech-to-text/pre-recorded/code-examples.mdx (+ versions mirror) — Python end-to-end sample no longer requests or prints age.
- fern/products/waves/pages/v4.0.0/speech-to-text/features/age-and-gender-detection.mdx (+ versions mirror) — page retitled to Gender detection and trimmed to gender-only content. The file path is unchanged so existing /features/age-and-gender-detection links keep resolving.
- fern/products/waves/pages/v4.0.0/integrations/n8n.mdx, speech-to-text/overview.mdx, speech-to-text/pre-recorded/features.mdx, speech-to-text/model-cards/pulse.mdx, and the STT benchmarks metrics-overview.mdx — surrounding tables, accordions, and feature cards updated to drop age references.
- fern/products/waves/versions/v4.0.0.yml — sidebar entry retitled to Gender Detection.
If your code passes age_detection=true or reads response.age, drop both — the parameter is now ignored and the field will not be returned. No other Pulse STT request shape or response field changes.
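As a sketch of the client-side migration, a request-params dict can simply be filtered; `migrate_params` is a hypothetical helper for illustration, not an API surface:

```python
# Params removed from the pre-recorded HTTP API in this change.
REMOVED = {"age_detection"}

def migrate_params(params):
    """Drop the removed age_detection flag from an existing params dict.

    Everything else (gender_detection, emotion_detection, ...) passes
    through unchanged, matching the unaffected surfaces above.
    """
    return {k: v for k, v in params.items() if k not in REMOVED}
```

On the response side, replace any `response["age"]` read with nothing; the field is simply absent now.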
Pulse STT — recommend itn_normalize over numerals for new integrations
The numerals query parameter on the Pulse STT WebSocket API still works and continues to behave as documented. For new integrations we now recommend itn_normalize=true instead — it covers digits as well as dates, currencies, phone numbers, and other spoken-form entities, and gives more consistent results across languages.
Existing code that uses numerals does not need to change.
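A minimal sketch of building the WebSocket URL with the recommended flag; the `ws_url` helper is illustrative, only the base URL and the `itn_normalize` / `numerals` parameter names come from the docs above:

```python
from urllib.parse import urlencode

WS_BASE = "wss://api.smallest.ai/waves/v1/pulse/get_text"

def ws_url(**params):
    """Append query params to the Pulse STT WebSocket URL."""
    return f"{WS_BASE}?{urlencode(params)}" if params else WS_BASE

# New integrations: prefer itn_normalize over numerals -- it also
# normalizes dates, currencies, phone numbers, and similar entities.
url = ws_url(language="en", itn_normalize="true")
```

Existing connections that pass `numerals` can keep doing so; the two flags need not be combined.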
Pre-rename Pulse STT WebSocket spec removed from SDK generation
The pre-rename AsyncAPI spec for wss://api.smallest.ai/waves/v1/asr has been removed. This is the final piece of the legacy STT WebSocket cleanup. Pulse STT (wss://api.smallest.ai/waves/v1/pulse/get_text) has been the supported real-time STT surface throughout; the older channel was already unlinked from the v4 API reference navigation and was kept only for historical SDK generation.
Files removed:
- The legacy AsyncAPI spec and its overrides under fern/apis/waves/asyncapi/.
- The corresponding generator entries in fern/apis/waves/generators.yml and fern/apis/unified/generators.yml.
SDK impact: the next SDK release will no longer expose the legacy streaming-STT methods generated from the old spec. Existing SDK releases keep working until upgrade. Migrate to Pulse STT — see the Pulse STT WebSocket reference and the Pulse quickstart.
STT Performance page rebuilt against the FLEURS, ESB, and WildASR sources; speaker-cap language scrubbed from the Pulse model card
The Pulse STT Performance page (/waves/documentation/speech-to-text-pulse/benchmarks/performance) now reflects the canonical benchmark comparison across five dataset sections:
- FLEURS — pre-recorded (29 languages)
- FLEURS — streaming (8 languages)
- ESB — streaming (9 English domains)
- WildASR — streaming (8 robustness conditions)
- Internal English perturbation suite (12 categories)
Each dataset section includes the source description and the full Smallest Pulse vs Deepgram Nova 2 vs Nova 3 comparison. Several previously published sections that did not come from any source benchmark — “Performance by Language”, “High-Performance Languages”, “Regional Variations”, “Performance by Audio Format”, and “Feature Impact on Performance” — have been removed. The latency-per-language pairing (e.g. “Italian: 4.2% WER, ~64ms latency”) was incorrect; latency is reported once at the top of the page and is not language-specific.
The Pulse model card (/waves/model-cards/speech-to-text/pulse) is also updated:
- Speaker-diarization phrasing such as “capped at 4 speakers; known issues” has been removed from the Key Capabilities card, the Features — Non-streaming table, and the Limitations & Safety bullets.
- The ESB, WildASR, and Internal English perturbation tables are now mirrored into the model card so the per-feature page and the model card stay in sync.
- The single hyperlink on Keyword boosting in the Features — Streaming table has been removed for consistency with the rest of the table (none of the other features link out from the cell).
Legacy Pulse STT WebSocket reference removed from docs
The legacy WebSocket reference at wss://api.smallest.ai/waves/v1/lightning/get_text has been removed from the docs. Pulse STT (wss://api.smallest.ai/waves/v1/pulse/get_text) is the supported real-time STT surface; the legacy endpoint was already unlinked from the v4 API reference navigation and was only kept for historical reference.
Files removed:
- The legacy WebSocket API reference page (and its versions/v4.0.0 mirror).
- The legacy AsyncAPI spec and its overrides under fern/apis/waves/asyncapi/.
- The corresponding generator entries in fern/apis/waves/generators.yml and fern/apis/unified/generators.yml.
- The orphan-page allow-list entry in scripts/.nav-ignore.
If your code still calls wss://api.smallest.ai/waves/v1/lightning/get_text, migrate to Pulse STT — the request shape is similar; see the Pulse STT quickstart and the Pulse WebSocket reference.
Pulse STT — multi-indic and multi-asian clarified as pre-recorded HTTP only (correction)
The multi-indic and multi-asian regional auto-detection scopes announced earlier today apply only to the Pulse STT pre-recorded HTTP endpoint (POST /waves/v1/pulse/get_text). They are not supported on the WebSocket streaming endpoint (wss://api.smallest.ai/waves/v1/pulse/get_text).
Specs and reference docs corrected accordingly:
- fern/apis/waves/asyncapi/pulse-stt-ws.yaml — multi-indic / multi-asian removed from the WebSocket language enum.
- fern/apis/waves/asyncapi/pulse-stt-ws-overrides.yml — same.
- fern/apis/waves-v4/overrides/pulse-stt-ws-overrides.yml — same.
- fern/products/waves/pages/v4.0.0/api-references/pulse-stt-ws.mdx (+ versions mirror) — WebSocket reference table cell trimmed to multi-eu / multi only.
- fern/products/waves/pages/v4.0.0/speech-to-text/pre-recorded/quickstart.mdx (+ versions mirror) — language Note now lists all four scopes (multi-eu, multi-indic, multi-asian, multi).
The pre-recorded OpenAPI spec (fern/apis/waves/openapi/pulse-stt-openapi.yaml) and the rendered API reference page (pulse-stt.mdx) continue to advertise all four scopes — that surface is unchanged.
The Pulse STT WebSocket probe (scripts/probes/pulse_stt.py) was also stripped of the two invalid test cases that were probing the WS endpoint with these scopes; the probe baseline (scripts/probes/baseline-pulse-stt.json) was regenerated.
If your code already passes language=multi-indic or language=multi-asian to the pre-recorded HTTP endpoint, no action needed — that path is correct. If you wired either scope into a WebSocket streaming call, switch back to multi-eu or multi until streaming-side coverage ships.
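A small guard like the following can catch the wrong pairing at call time; `check_language` is an illustrative client-side helper, not an API feature — the scope-to-surface split it encodes is the one described above:

```python
# Which auto-detection scopes each surface accepts (per this correction).
HTTP_SCOPES = {"multi-eu", "multi-indic", "multi-asian", "multi"}
WS_SCOPES = {"multi-eu", "multi"}

def check_language(language, streaming):
    """Raise if a regional scope is sent to the WS surface, where it
    is not yet supported; otherwise return the value unchanged."""
    scopes = WS_SCOPES if streaming else HTTP_SCOPES
    if language.startswith("multi") and language not in scopes:
        raise ValueError(
            f"{language} is pre-recorded-HTTP only; "
            "use multi-eu or multi for streaming"
        )
    return language
```

Explicit single-language values (e.g. `en`, `hi`) pass through untouched on both surfaces.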
Pulse STT pre-recorded — multi-indic and multi-asian regional auto-detection modes
The language parameter on the Pulse STT pre-recorded HTTP endpoint now accepts two new regional auto-detection scopes alongside the existing multi-eu and multi:
- multi-indic — auto-detects across the Indic set: en, hi, mr, pa, gu, or, ka, ta, te, ml, bn. Use for Indian-language audio with optional English code-switching.
- multi-asian — auto-detects across the East Asian set: en, ja, ko, zh, yue. Use for Japanese / Korean / Mandarin / Cantonese audio with optional English code-switching.
The existing scopes are unchanged:
- multi-eu (default) — de, en, fr, it, nl, pt, ru, es.
- multi — full multilingual auto-detection across all supported languages.
Pick the narrowest scope that matches your audio for the best accuracy. Omitting language still routes to multi-eu and can mis-detect on non-European audio (e.g., returning Russian for English input).
The new multi-indic and multi-asian scopes are only available on the pre-recorded HTTP endpoint for now. The realtime WebSocket streaming endpoint continues to support only multi-eu and multi for auto-detection until streaming-side coverage ships.
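The "pick the narrowest scope" advice can be sketched as a lookup over the per-scope language sets listed above; `narrowest_scope` is a hypothetical helper (scopes are tried smallest set first), not part of the API:

```python
# Per-scope language sets, copied from the docs above.
ASIAN = {"en", "ja", "ko", "zh", "yue"}
EU = {"de", "en", "fr", "it", "nl", "pt", "ru", "es"}
INDIC = {"en", "hi", "mr", "pa", "gu", "or", "ka", "ta", "te", "ml", "bn"}

def narrowest_scope(expected):
    """Pick the narrowest auto-detection scope covering the set of
    languages you expect in the audio; fall back to full multi."""
    for scope, langs in (
        ("multi-asian", ASIAN),   # 5 languages
        ("multi-eu", EU),         # 8 languages
        ("multi-indic", INDIC),   # 11 languages
    ):
        if expected <= langs:
            return scope
    return "multi"
```

If you know the single language in advance, pass it directly instead of any auto-detection scope.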
Specs updated:
- OpenAPI (pre-recorded): fern/apis/waves/openapi/pulse-stt-openapi.yaml
Pulse STT — multi-eu is now the documented default; full_transcript removed
The language parameter now defaults to multi-eu in the Pulse STT WebSocket and HTTP specs. Set language explicitly to the known language for best accuracy. Use multi-eu when the source is unknown European-language audio; use multi for full multilingual auto-detection. Omitting language routes to multi-eu and can mis-detect on non-European audio (e.g., returning Russian for English input).
The full_transcript query parameter and response field were removed from the spec. The server no longer returns the field. Reconstruct a session-level transcript on the client by concatenating each is_final=true transcript value — every WS code sample on the realtime surface (Python, Node.js, browser JS, mic) was updated to show the pattern.
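The client-side concatenation pattern can be sketched as below; the message keys `is_final` and `transcript` come from the docs above, while `collect_transcript` itself is an illustrative helper:

```python
def collect_transcript(messages):
    """Rebuild a session-level transcript client-side, now that
    full_transcript is gone: keep only is_final=true segments,
    in arrival order, and join them with spaces."""
    parts = [m["transcript"] for m in messages if m.get("is_final")]
    return " ".join(p for p in parts if p)
```

Interim (non-final) messages are display-only and should not be accumulated, or words will be duplicated.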
Keyword boosting docs now spell out that the value is a single comma-separated string, not a JSON array. keywords=["I:20,smiling:26"] and keywords=['I:20,smiling:26'] silently pass and produce garbled transcripts because the API parses the brackets as keyword characters; the correct shape is keywords=I:20,smiling:26.
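A sketch of serializing boosts into that single string; `keywords_param` is an illustrative helper, only the `word:weight` comma-separated shape is from the docs:

```python
def keywords_param(boosts):
    """Serialize keyword boosts to the single comma-separated string
    the API expects (e.g. "I:20,smiling:26") -- never a JSON array,
    whose brackets would be parsed as keyword characters."""
    return ",".join(f"{word}:{weight}" for word, weight in boosts.items())
```

For example, `keywords_param({"I": 20, "smiling": 26})` yields the correct `keywords=I:20,smiling:26` shape.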
Pulse STT close_stream signal
Send {"type": "close_stream"} to end a Pulse STT session. The server flushes remaining audio, emits final transcripts, responds with is_last=true, then closes the WebSocket.
{"type": "finalize"} is now documented as a mid-session flush: it returns an is_final=true transcript for buffered audio while keeping the session open for more input. Prior docs conflated the two.
Code examples in Python, Node.js, and browser JavaScript were updated to use close_stream when ending the stream.
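The end-of-session handshake can be sketched as follows; the `{"type": "close_stream"}` / `{"type": "finalize"}` payloads and the `is_last` flag are from the docs above, while the `ws` object is assumed to be any async WebSocket client (e.g. from the websockets package) with `send` and async iteration:

```python
import json

CLOSE_STREAM = json.dumps({"type": "close_stream"})
FINALIZE = json.dumps({"type": "finalize"})  # mid-session flush; keeps session open

async def end_session(ws):
    """Send close_stream, then drain the socket: the server flushes
    remaining audio, emits final transcripts, and marks the last
    message with is_last=true before closing the WebSocket."""
    await ws.send(CLOSE_STREAM)
    async for raw in ws:
        msg = json.loads(raw)
        # final transcripts arrive here
        if msg.get("is_last"):
            break
```

Use `FINALIZE` instead when you only want an `is_final=true` transcript for buffered audio without ending the stream.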

