Finalize Control


By default, Pulse STT auto-finalizes the live transcript every few words and again after each end-of-utterance (EOU) timeout. Agentic pipelines sometimes need precise control over when a final transcript is emitted, for example to hold a transcript open until the agent decides it has enough context. For those cases, Pulse exposes two parameters that disable or constrain automatic finalization.

When to use it

Most callers leave the defaults alone — automatic finalization gives you natural turn-taking out of the box. Reach for these flags when:

  • You’re building an agentic pipeline and want the model to keep accumulating audio until your code sends a finalize message.
  • You’re using inverse text normalization (ITN, itn_normalize=true) and want each ITN chunk to stay short, because long chunks degrade ITN quality.
  • You’re running a dictation or capture workflow and want one big final transcript, not many small ones.

Parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| finalize_on_words | "true" or "false" | "true" | When false, disables Pulse’s automatic word-count-based finalization. The transcript continues to accumulate until you send a {"type":"finalize"} message or {"type":"close_stream"}. |
| max_words | integer | unset (no cap) | Maximum number of words allowed in a single transcript chunk before Pulse forces finalization. Useful for keeping ITN chunks short and accurate. |

Both parameters are WebSocket-only — they don’t apply to the pre-recorded HTTP endpoint, which always returns one final transcript per request.
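
As a minimal sketch, both parameters can be attached to the connection URL with a small helper. buildPulseUrl and its option names are hypothetical and not part of any SDK. The language, encoding, and sample_rate values are copied from the connection example on this page, and the positive-integer check on max_words is a client-side guard assumed for illustration, not documented server behavior.

```javascript
// Sketch only: buildPulseUrl and its option names are hypothetical helpers.
const PULSE_WS_ENDPOINT = "wss://api.smallest.ai/waves/v1/pulse/get_text";

function buildPulseUrl({ finalizeOnWords = true, maxWords, itnNormalize = false } = {}) {
  const url = new URL(PULSE_WS_ENDPOINT);
  url.searchParams.set("language", "en");
  url.searchParams.set("encoding", "linear16");
  url.searchParams.set("sample_rate", "16000");

  // The parameter is documented as the string "true" or "false".
  url.searchParams.set("finalize_on_words", finalizeOnWords ? "true" : "false");
  if (itnNormalize) url.searchParams.set("itn_normalize", "true");

  // Leave max_words unset for "no cap"; otherwise require a positive integer
  // (assumed client-side guard, not a documented server rule).
  if (maxWords !== undefined) {
    if (!Number.isInteger(maxWords) || maxWords <= 0) {
      throw new RangeError("maxWords must be a positive integer");
    }
    url.searchParams.set("max_words", String(maxWords));
  }
  return url;
}

// Manual finalization with a 30-word safety cap and ITN enabled.
const url = buildPulseUrl({ finalizeOnWords: false, maxWords: 30, itnNormalize: true });
console.log(url.toString());
```

Building the URL in one place keeps the string-typed flag and the optional cap consistent across every connection your pipeline opens.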

Pattern: agentic pipeline with manual finalize

Disable automatic finalization, stream audio for as long as you need, then trigger finalization via the finalize control message. The session stays open and you can keep streaming audio after each finalize.

    // Assumes the Node.js "ws" package; the browser WebSocket constructor cannot set headers.
    import WebSocket from "ws";

    // SMALLEST_API_KEY is a placeholder variable name; supply your own key.
    const API_KEY = process.env.SMALLEST_API_KEY;

    const url = new URL("wss://api.smallest.ai/waves/v1/pulse/get_text");
    url.searchParams.append("language", "en");
    url.searchParams.append("encoding", "linear16");
    url.searchParams.append("sample_rate", "16000");
    url.searchParams.append("finalize_on_words", "false"); // disable auto-finalize
    url.searchParams.append("itn_normalize", "true"); // enable ITN

    const ws = new WebSocket(url.toString(), {
      headers: { Authorization: `Bearer ${API_KEY}` },
    });

    // Wait for the connection to open before sending anything.
    ws.on("open", () => {
      // Stream PCM audio as binary frames…
      // (audioChunk1 and audioChunk2 are Buffers of PCM audio that you supply.)
      ws.send(audioChunk1);
      ws.send(audioChunk2);

      // When YOUR pipeline decides the segment is complete:
      ws.send(JSON.stringify({ type: "finalize" }));
      // Pulse emits an is_final transcript with from_finalize: true.
      // Continue streaming for the next segment.
    });

When you want to end the session entirely, send {"type":"close_stream"} instead of finalize. Pulse flushes any remaining audio, returns one final transcript with is_last: true, and closes the connection.
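
One way to pair each control message with the transcript it triggers is a small promise-based helper. The sketch below assumes a Node.js ws-style socket (on, off, send), JSON text frames shaped like the response example on this page, and itn_normalize=true so that from_finalize is present. sendControlAndWait, finalizeSegment, closeStream, and the 5-second timeout are hypothetical choices, not part of the Pulse API.

```javascript
// Sketch only: send a control message, then wait for the transcript it triggers.
// `ws` is assumed to expose on/off/send like a socket from the Node "ws" package.
function sendControlAndWait(ws, type, isMatch, timeoutMs = 5000) {
  return new Promise((resolve, reject) => {
    const onMessage = (data) => {
      let msg;
      try {
        msg = JSON.parse(data.toString());
      } catch {
        return; // ignore frames that are not JSON
      }
      if (isMatch(msg)) {
        cleanup();
        resolve(msg);
      }
    };

    // Give up if no matching transcript arrives in time (timeout is an assumption).
    const timer = setTimeout(() => {
      cleanup();
      reject(new Error(`no response to "${type}" within ${timeoutMs} ms`));
    }, timeoutMs);

    function cleanup() {
      clearTimeout(timer);
      ws.off("message", onMessage);
    }

    ws.on("message", onMessage);
    ws.send(JSON.stringify({ type }));
  });
}

// Flush the current segment but keep the session open.
const finalizeSegment = (ws) =>
  sendControlAndWait(ws, "finalize", (m) => m.is_final === true && m.from_finalize === true);

// End the session and wait for the transcript marked is_last.
const closeStream = (ws) =>
  sendControlAndWait(ws, "close_stream", (m) => m.is_last === true);
```

A pipeline could then await finalizeSegment(ws) at each segment boundary and await closeStream(ws) once at the end, instead of tracking pending control messages by hand.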

Pattern: bounded chunks with max_words

Keep the default behavior but cap each chunk’s word count. This helps when ITN is on, because long chunks degrade ITN accuracy.

    url.searchParams.append("itn_normalize", "true");
    url.searchParams.append("max_words", "30"); // force finalization every 30 words

Trade-offs

| Setting | Behavior | Best for |
| --- | --- | --- |
| finalize_on_words=true (default) | Pulse auto-finalizes every few words + on EOU timeout. | Most use cases. Voice agents, conversational AI, transcription. |
| finalize_on_words=false | No auto-finalize. You must send {"type":"finalize"} or {"type":"close_stream"} to flush. | Agentic pipelines that gate finalization on external state (LLM done thinking, user done talking, etc.). |
| max_words=N (with default finalize_on_words) | Auto-finalize at the earlier of N words or EOU timeout. | Long-form ITN-heavy transcription where chunk-size accuracy matters. |
| finalize_on_words=false + max_words=N | No automatic finalize unless word count hits N. Manual finalize always works. | Defensive default for agent pipelines: manual control with a safety cap. |

Response field

The transcript response includes a from_finalize boolean when itn_normalize=true, indicating whether the final transcript was triggered by a manual finalize message (true) or by automatic finalization (false). Use it to attribute downstream logic to the correct trigger.

    {
      "session_id": "sess_…",
      "transcript": "Hello, how are you doing today?",
      "is_final": true,
      "is_last": false,
      "from_finalize": true,
      "language": "en"
    }
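
A handler might branch on these fields along the following lines. This is a sketch under assumptions: classifyTranscript and its category labels are hypothetical, it assumes non-final messages carry is_final: false, and it treats a missing from_finalize (for example, when itn_normalize is off) as an unknown trigger instead of guessing.

```javascript
// Sketch only: classify a parsed Pulse transcript message by what triggered it.
// The returned labels are hypothetical names chosen for this example.
function classifyTranscript(msg) {
  if (msg.is_last === true) return "last"; // session-ending transcript
  if (msg.is_final !== true) return "interim"; // transcript still accumulating
  if (msg.from_finalize === true) return "manual-final"; // your finalize message
  if (msg.from_finalize === false) return "auto-final"; // automatic finalization
  return "final-unknown-trigger"; // from_finalize absent
}

// The response example from this page, as a parsed object.
const sample = {
  session_id: "sess_…",
  transcript: "Hello, how are you doing today?",
  is_final: true,
  is_last: false,
  from_finalize: true,
  language: "en",
};

console.log(classifyTranscript(sample)); // prints "manual-final"
```

Checking is_last first means the session-ending transcript is handled once, regardless of how the other flags are set on that message.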