End-of-Utterance Timeout

Real-Time

The end-of-utterance (EOU) timeout controls how long the model waits in silence after a speaker stops talking before it flushes the transcript as final. Tuning this value lets you trade responsiveness against the risk of cutting users off mid-thought.

How It Works

When speech pauses, Pulse starts a silence timer. If no additional speech is detected within the eou_timeout_ms window, the current transcript segment is returned with is_final: true.

Lower values: faster turn detection, but more likely to split natural pauses
Higher values: more tolerant of pauses, but slower finalization
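
The effect of the timer can be illustrated with a small simulation. This is only a sketch of the behavior described above, not Pulse's actual implementation: `segmentTurns` and its interval format are hypothetical, and it assumes that speech resuming strictly before the timeout elapses continues the current segment.

```javascript
// Illustrative model of the silence timer (hypothetical helper, not part of the API).
// `speech` is a list of [startMs, endMs] intervals in which speech was detected.
function segmentTurns(speech, eouTimeoutMs) {
  const turns = [];
  let current = null;
  for (const [start, end] of speech) {
    if (current !== null && start - current.end < eouTimeoutMs) {
      // Speech resumed before the timer expired: the segment continues.
      current.end = end;
    } else {
      // First speech, or the timer expired and the previous segment was finalized.
      if (current !== null) turns.push(current);
      current = { start, end };
    }
  }
  if (current !== null) turns.push(current);
  // In this model a segment is finalized eouTimeoutMs after its last speech.
  return turns.map((t) => ({ ...t, finalizedAt: t.end + eouTimeoutMs }));
}

// A speaker pauses 500 ms mid-sentence, then 2000 ms before a new thought.
const speech = [[0, 1200], [1700, 3000], [5000, 6000]];
console.log(segmentTurns(speech, 300).length); // 3: the mid-sentence pause is split
console.log(segmentTurns(speech, 800).length); // 2: the mid-sentence pause is bridged
```

With the higher value the sentence stays in one piece, but its final transcript arrives 800 ms after the last word instead of 300 ms.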

Enabling EOU Timeout

Add eou_timeout_ms to the query parameters of your WebSocket connection URL. The value must be an integer from 100 to 10000 (milliseconds). The default is 800.

// Assumes the Node.js "ws" package, which accepts a headers option.
import WebSocket from "ws";

// Hypothetical environment variable name; load your API key however you prefer.
const API_KEY = process.env.SMALLEST_API_KEY;

const url = new URL("wss://api.smallest.ai/waves/v1/stt/live?model=pulse");
url.searchParams.append("language", "en");
url.searchParams.append("encoding", "linear16");
url.searchParams.append("sample_rate", "16000");
url.searchParams.append("eou_timeout_ms", "300"); // fast turn-taking

const ws = new WebSocket(url.toString(), {
  headers: {
    Authorization: `Bearer ${API_KEY}`,
  },
});
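
Because the value must be an integer between 100 and 10000, it can help to validate it on the client before connecting. The sketch below is one way to do that; `buildLiveUrl` and its option names are hypothetical helpers, and how the server itself treats out-of-range values is not covered here.

```javascript
const EOU_MIN_MS = 100;
const EOU_MAX_MS = 10000;
const EOU_DEFAULT_MS = 800;

// Hypothetical helper: builds the connection URL and rejects invalid timeouts early.
function buildLiveUrl({
  language = "en",
  encoding = "linear16",
  sampleRate = 16000,
  eouTimeoutMs = EOU_DEFAULT_MS,
} = {}) {
  if (
    !Number.isInteger(eouTimeoutMs) ||
    eouTimeoutMs < EOU_MIN_MS ||
    eouTimeoutMs > EOU_MAX_MS
  ) {
    throw new RangeError(
      `eou_timeout_ms must be an integer from ${EOU_MIN_MS} to ${EOU_MAX_MS}, got ${eouTimeoutMs}`
    );
  }
  const url = new URL("wss://api.smallest.ai/waves/v1/stt/live");
  url.searchParams.set("model", "pulse");
  url.searchParams.set("language", language);
  url.searchParams.set("encoding", encoding);
  url.searchParams.set("sample_rate", String(sampleRate));
  url.searchParams.set("eou_timeout_ms", String(eouTimeoutMs));
  return url.toString();
}

console.log(buildLiveUrl({ eouTimeoutMs: 300 }));
// wss://api.smallest.ai/waves/v1/stt/live?model=pulse&language=en&encoding=linear16&sample_rate=16000&eou_timeout_ms=300
```

Calling `buildLiveUrl()` with no arguments uses the documented default of 800.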

How to Tune It

Start at the default 800 ms, then tune based on your use case:

Decrease for voice agents that need faster turn-taking
Increase for meeting or dictation workflows where speakers pause mid-sentence

Tuning Guide

Value	Behavior	Best for
`200-400ms`	Aggressive - responds quickly after short silence	Voice agents, IVR systems, real-time assistants
`500-800ms`	Balanced (default range) - handles natural pauses	Conversational AI, general-purpose transcription
`1000-2000ms`	Patient - waits through longer pauses	Meeting transcription, dictation, accessibility
`3000ms+`	Very patient - rarely flushes early	Lecture capture, users who pause frequently
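
One way to apply the table in code is a small preset lookup. The preset names and the specific values below are hypothetical starting points picked from within each range above, not recommendations from the API; adjust them after testing with real audio.

```javascript
// Hypothetical presets, one per row of the tuning guide (values in milliseconds).
const EOU_PRESETS = new Map([
  ["voiceAgent", 300],      // 200-400ms: aggressive
  ["conversational", 800],  // 500-800ms: balanced (the default)
  ["meeting", 1500],        // 1000-2000ms: patient
  ["lecture", 3000],        // 3000ms+: very patient
]);

const FALLBACK_EOU_TIMEOUT_MS = 800; // documented default

// Returns the preset for a use case, or the default for unknown names.
function eouTimeoutFor(useCase) {
  return EOU_PRESETS.get(useCase) ?? FALLBACK_EOU_TIMEOUT_MS;
}

console.log(eouTimeoutFor("voiceAgent")); // 300
console.log(eouTimeoutFor("unknown"));    // 800
```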

Trade-offs

Dimension	Low timeout (e.g. `300ms`)	High timeout (e.g. `2000ms`)
Response speed	Fast - transcript finalizes quickly	Slow - waits longer before flushing
Turn accuracy	May split mid-sentence pauses into separate turns	Captures full thoughts including natural pauses
Best for	Voice agents that need snappy replies	Transcription where completeness matters

Example

A voice agent needs to detect when the caller is done speaking and respond immediately:

wss://api.smallest.ai/waves/v1/stt/live?model=pulse&language=en&encoding=linear16&sample_rate=16000&eou_timeout_ms=300

A meeting transcription system should wait through natural pauses before finalizing:

wss://api.smallest.ai/waves/v1/stt/live?model=pulse&language=en&encoding=linear16&sample_rate=16000&eou_timeout_ms=1500
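
On the client side, either configuration is typically consumed by acting only on segments marked is_final: true. The handler below is a minimal sketch under stated assumptions: it assumes each server message is a JSON object, and the `transcript` field name is hypothetical (only is_final is described on this page). `createFinalCollector` is an illustrative helper, not part of any SDK.

```javascript
// Hypothetical helper: invokes onFinal only for finalized transcript segments.
function createFinalCollector(onFinal) {
  return function handleMessage(raw) {
    let msg;
    try {
      // Accept strings as well as Buffer-like frames.
      msg = JSON.parse(typeof raw === "string" ? raw : raw.toString());
    } catch {
      return; // ignore frames that are not valid JSON
    }
    if (msg && msg.is_final === true) {
      onFinal(msg.transcript ?? ""); // `transcript` is an assumed field name
    }
  };
}

// Simulated frames; with a real socket this would be ws.on("message", handle).
const finals = [];
const handle = createFinalCollector((text) => finals.push(text));
handle('{"transcript": "I would like to", "is_final": false}');
handle('{"transcript": "I would like to book a table", "is_final": true}');
console.log(finals); // [ 'I would like to book a table' ]
```

A voice agent would trigger its reply inside the callback, so a lower eou_timeout_ms directly shortens the delay before that callback fires.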