Pulse STT WebSocket — finalize operation restored on the API reference
The Pulse STT WebSocket API reference now correctly renders both control-message operations on docs.smallest.ai:
- sendFinalize (payload {"type":"finalize"}): flushes the current audio buffer and emits an is_final: true transcript while keeping the session open. Useful for per-turn finalization in agentic pipelines.
- sendCloseStream (payload {"type":"close_stream"}): flushes any buffered audio, emits the terminal is_final: true + is_last: true transcript, then closes the session.
No wire change — the server has accepted both control messages since launch (StreamControlType.FINALIZE and STREAM_CONTROL_TYPE_END are sibling enum values in the internal gRPC schema, and the WS controller has explicit handlers for both parsed.type === "finalize" and parsed.type === "close_stream"). Only the rendered documentation was incomplete.
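As a minimal sketch of the two control messages described above, the Python below builds each payload and classifies an incoming transcript by the is_final and is_last flags. The helper names (finalize_message, close_stream_message, classify_transcript) and the "not_final" label are illustrative assumptions, not part of the API; only the two payloads and the two flags come from the reference.

```python
import json


def finalize_message() -> str:
    """Turn-boundary control message: flush the buffer, keep the session open."""
    return json.dumps({"type": "finalize"})


def close_stream_message() -> str:
    """Session-end control message: flush, emit the terminal transcript, close."""
    return json.dumps({"type": "close_stream"})


def classify_transcript(event: dict) -> str:
    """Label a server transcript using the documented is_final / is_last flags.

    The label strings are hypothetical; they only exist to make the
    difference between the two control messages visible.
    """
    if event.get("is_final") and event.get("is_last"):
        return "session_end"  # terminal transcript that follows close_stream
    if event.get("is_final"):
        return "turn_final"   # transcript that follows finalize; session stays open
    return "not_final"        # anything else (assumed, not described in the reference)
```

A caller would send finalize_message() at each turn boundary and close_stream_message() once, when the session is over.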
Root cause for the doc reviewers who want to know: the v4 docs override (fern/apis/waves-v4/overrides/pulse-stt-ws-overrides.yml) and the SDK override (fern/apis/waves/asyncapi/pulse-stt-ws-overrides.yml) both used unprefixed message keys (audioData.message, finalizeSignal.message, …) while the base spec used a pulse* prefix (pulseAudioData.message, pulseFinalizeSignal.message, …). Fern’s merge silently dropped one of the three send operations when it couldn’t resolve refs cleanly. Convention going forward: override message keys must be identical to the base spec’s keys — every other Waves spec layer (TTS WS, Lightning v3.1 WS) already follows this rule.
Migration: nothing for customers. The server-side contract has always allowed both control messages; this is purely a docs render fix.
Also clarified: the ITN feature page’s “Recommended Setup for Agentic Use Cases” section had been recommending close_stream per utterance, which is correct for single-shot transcription but misleading for the multi-turn voice agents the section is named after. Split the recommendation into two paths: finalize per user turn (session stays open, lowest latency between turns) vs close_stream at end of session (terminal). The Python example was split into two snippets that match these two patterns directly.
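The two paths can be sketched as follows. This is not the snippet from the ITN feature page: it is a hedged illustration that assumes audio is sent as binary frames and that the connection object exposes an async send() method (as common Python WebSocket clients do). RecordingSocket is a hypothetical stand-in so the flow can be run without a network connection, and reading transcripts back is omitted.

```python
import asyncio
import json
from typing import Iterable, List, Union

Frame = Union[bytes, str]


class RecordingSocket:
    """Hypothetical stand-in for a WebSocket connection; records what is sent."""

    def __init__(self) -> None:
        self.sent: List[Frame] = []

    async def send(self, data: Frame) -> None:
        self.sent.append(data)


async def run_multi_turn(ws, turns: Iterable[Iterable[bytes]]) -> None:
    """Voice-agent pattern: finalize per user turn, close_stream once at the end."""
    for chunks in turns:
        for chunk in chunks:
            await ws.send(chunk)                            # audio for this turn
        await ws.send(json.dumps({"type": "finalize"}))     # turn boundary, session stays open
    await ws.send(json.dumps({"type": "close_stream"}))     # end of session


async def run_single_shot(ws, chunks: Iterable[bytes]) -> None:
    """Single-shot pattern: one utterance, then close the session."""
    for chunk in chunks:
        await ws.send(chunk)
    await ws.send(json.dumps({"type": "close_stream"}))
```

With a real client, the same functions would be passed the live connection in place of RecordingSocket, alongside a task that reads the transcripts the server emits.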
Also clarified, signal framing: finalize and close_stream are now documented as a turn-boundary signal and a session-end signal respectively, both on the API ref (operation summaries) and on the ITN feature page. The base-spec operation summaries used to read “Flush current audio buffer” / “End the audio stream”, which was accurate but not actionable. They now say what each signal does to the session lifecycle: keep listening vs hang up the socket.
CI lock-in: spec_drift_check.py was extended with an AsyncAPI override-key parity check. Any new override whose channels.<chan>.messages.<KEY> or operations.<KEY> doesn’t exist in the base will fail the gate. Existing deprecated-spec drift (Lightning v2, the legacy /streaming-tts/stream route) is allow-listed with a documented rationale and tracked separately. A new post-deploy smoke check (docs_render_smoke.py, wired into publish-docs.yml) re-fetches the rendered docs after every push to main and asserts every expected operation appears — the same check would have caught this exact bug the day it deployed.
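The parity rule can be illustrated with a short sketch. This is not the code in spec_drift_check.py; it is a minimal approximation that assumes both specs have already been loaded into dictionaries, and the function name missing_override_keys is hypothetical. It omits the allow-list for deprecated specs.

```python
from typing import List


def missing_override_keys(base: dict, override: dict) -> List[str]:
    """Return override paths whose keys do not exist in the base spec.

    Checks channels.<chan>.messages.<KEY> and operations.<KEY>.
    An empty list means the override is in parity with the base.
    """
    problems: List[str] = []

    base_channels = base.get("channels") or {}
    for chan, chan_body in (override.get("channels") or {}).items():
        base_messages = (base_channels.get(chan) or {}).get("messages") or {}
        for key in ((chan_body or {}).get("messages") or {}):
            if key not in base_messages:
                problems.append(f"channels.{chan}.messages.{key}")

    base_operations = base.get("operations") or {}
    for key in (override.get("operations") or {}):
        if key not in base_operations:
            problems.append(f"operations.{key}")

    return problems
```

A CI gate built on this would fail whenever the returned list is non-empty, so an unprefixed key such as finalizeSignal.message in an override would be reported against a base that only defines pulseFinalizeSignal.message.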
STT WebSocket — sendFinalize added to unified /stt/live API reference + doc fixes
The unified Speech-to-Text WebSocket API reference page now documents both control messages with their correct semantics:
- sendFinalize: turn-boundary signal. Flushes the current audio buffer, runs ITN, and emits one is_final: true transcript for that turn. The WebSocket stays open for the next user turn. Use this once per turn in any multi-turn flow.
- sendClose: session-end signal. Flushes remaining audio, emits the terminal is_final: true + is_last: true transcript, then closes the WebSocket. Use this once at the end of the session.
No wire change. The server has accepted both control messages on this endpoint since launch (confirmed via live probe and the platform repo’s pulse.asr.ws.controller.ts which has explicit handlers for both parsed.type === "finalize" and parsed.type === "close_stream"). Only the spec and docs were incomplete.
ITN feature page updated to match. The “Python — WebSocket with ITN” and “JavaScript — WebSocket with ITN” examples now clearly indicate that they show the single-shot pattern, with a pointer to the multi-turn voice-agent example for any flow where the same WebSocket handles multiple user turns. Sending close_stream per turn forces a WebSocket reconnect before every subsequent turn; that’s a documented anti-pattern for voice agents and adds hundreds of ms of connection overhead per turn.
Spec-level cleanup:
- Removed the spurious is_last: true field from the close_stream client payload schema. is_last is a server-emitted response field; it’s meaningless in the client-sent control message and the server ignores it.
- Added from_finalize, transcript, full_transcript, language, and languages fields to the TranscriptionEvent response schema (they were missing from the spec despite being emitted by the server; verified against pulse.asr.schema.ts’s lightningAsrWebsocketResponseDtoSchema).
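For client code that consumes these events, a hedged sketch of a parser is shown below. The field names come from the list above plus the is_final and is_last flags; the field types and defaults are assumptions (the changelog does not state them), and parse_event is a hypothetical helper rather than an SDK function.

```python
import json
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class TranscriptionEvent:
    # Types and defaults below are assumed; check the API reference for the real schema.
    transcript: str = ""
    full_transcript: str = ""
    is_final: bool = False
    is_last: bool = False
    from_finalize: bool = False
    language: Optional[str] = None
    languages: List[str] = field(default_factory=list)


def parse_event(raw: str) -> TranscriptionEvent:
    """Parse a JSON text frame, keeping only the fields modelled above."""
    data = json.loads(raw)
    known = {f.name for f in fields(TranscriptionEvent)}
    return TranscriptionEvent(**{k: v for k, v in data.items() if k in known})
```

Ignoring unknown keys keeps the parser tolerant of fields the server emits that this sketch does not model.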
Migration: none for customers. Any code already sending close_stream per turn will keep working but pays a reconnect cost on every turn. Switch to finalize per turn plus close_stream once at end of session to cut latency on the second and every later turn by 200–800 ms.

