
End-of-Utterance Timeout

Real-Time

End-of-utterance (EOU) timeout controls how long the model waits in silence after a speaker stops talking before it flushes the transcript as final. Tuning this value lets you balance responsiveness against cutting users off mid-thought.

How It Works

When speech pauses, Pulse starts a silence timer. If no additional speech is detected within the eou_timeout_ms window, the current transcript segment is returned with is_final: true.

  • Lower values: faster turn detection, but more likely to split natural pauses
  • Higher values: more tolerant of pauses, but slower finalization
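To make the timer behavior concrete, here is a minimal client-side sketch that models how a silence window groups speech into final segments. It is an illustration only, not Pulse's actual implementation: the function name segmentUtterances and the burst shape ({ startMs, endMs, text }) are hypothetical, and real finalization happens on the server.

```javascript
// Illustrative model of the silence timer; NOT Pulse's implementation.
// Each burst is a hypothetical { startMs, endMs, text } record, sorted by startMs.
function segmentUtterances(bursts, eouTimeoutMs = 800) {
  const finals = [];
  let current = [];
  let lastEndMs = null;

  for (const burst of bursts) {
    // If the silence since the previous burst reached the timeout,
    // the segment collected so far would already have been finalized.
    if (lastEndMs !== null && burst.startMs - lastEndMs >= eouTimeoutMs) {
      finals.push({ text: current.join(" "), is_final: true });
      current = [];
    }
    current.push(burst.text);
    lastEndMs = burst.endMs;
  }

  // Silence after the last burst eventually finalizes the remaining segment.
  if (current.length > 0) {
    finals.push({ text: current.join(" "), is_final: true });
  }
  return finals;
}

// Two bursts separated by a 500 ms pause.
const bursts = [
  { startMs: 0, endMs: 1000, text: "I'd like to book" },
  { startMs: 1500, endMs: 2500, text: "a table for two" },
];

console.log(segmentUtterances(bursts, 300).length); // 2: the pause splits the sentence
console.log(segmentUtterances(bursts, 800).length); // 1: the pause is tolerated
```

With a 300 ms window the 500 ms pause splits one sentence into two final segments, while the default 800 ms keeps it whole.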

Enabling EOU Timeout

EOU timeout is currently only available for the Real-Time WebSocket API.

Add eou_timeout_ms to your WebSocket connection query parameters. The value is in milliseconds and must be an integer from 100 to 10000. The default is 800.
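Because the allowed range is fixed, it can help to validate the value before opening a connection. The sketch below is one possible approach: buildPulseUrl and the constant names are hypothetical helpers, not part of any SDK, and the other query parameters simply mirror the ones used in the examples on this page.

```javascript
// Hypothetical helper: validates eou_timeout_ms client-side and builds the
// connection URL. Range and default are taken from the text above.
const EOU_MIN_MS = 100;
const EOU_MAX_MS = 10000;
const EOU_DEFAULT_MS = 800;

function buildPulseUrl({
  language = "en",
  encoding = "linear16",
  sampleRate = 16000,
  eouTimeoutMs = EOU_DEFAULT_MS,
} = {}) {
  if (
    !Number.isInteger(eouTimeoutMs) ||
    eouTimeoutMs < EOU_MIN_MS ||
    eouTimeoutMs > EOU_MAX_MS
  ) {
    throw new RangeError(
      `eou_timeout_ms must be an integer from ${EOU_MIN_MS} to ${EOU_MAX_MS}, got ${eouTimeoutMs}`
    );
  }

  const url = new URL("wss://api.smallest.ai/waves/v1/pulse/get_text");
  url.searchParams.set("language", language);
  url.searchParams.set("encoding", encoding);
  url.searchParams.set("sample_rate", String(sampleRate));
  url.searchParams.set("eou_timeout_ms", String(eouTimeoutMs));
  return url.toString();
}

console.log(buildPulseUrl({ eouTimeoutMs: 300 }));
```

Failing early with a RangeError keeps an out-of-range value from reaching the server, where the failure would be harder to trace.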

Real-Time WebSocket API

// Assumes Node.js with the `ws` package; the browser WebSocket API
// does not accept custom headers.
import WebSocket from "ws";

const url = new URL("wss://api.smallest.ai/waves/v1/pulse/get_text");
url.searchParams.append("language", "en");
url.searchParams.append("encoding", "linear16");
url.searchParams.append("sample_rate", "16000");
url.searchParams.append("eou_timeout_ms", "300"); // fast turn-taking

// API_KEY is assumed to hold your API key.
const ws = new WebSocket(url.toString(), {
  headers: {
    Authorization: `Bearer ${API_KEY}`,
  },
});
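Once connected, a client typically acts only on segments that the timer has finalized. The following is a rough sketch of that filtering step. It assumes each server message is a JSON string carrying the is_final flag described earlier; the transcript field name and the extractFinalTranscript helper are assumptions made for illustration, so check the API reference for the actual response schema.

```javascript
// Hypothetical helper: returns the text of a finalized segment, or null for
// interim results and unparseable messages.
// Assumption: messages are JSON with a boolean `is_final`; the `transcript`
// field name is NOT confirmed by this page.
function extractFinalTranscript(rawMessage) {
  let msg;
  try {
    msg = JSON.parse(rawMessage);
  } catch {
    return null; // not JSON, ignore
  }
  if (msg === null || typeof msg !== "object") {
    return null;
  }
  return msg.is_final === true ? (msg.transcript ?? "") : null;
}

// Possible wiring with the `ws` package (respond is a hypothetical callback):
// ws.on("message", (data) => {
//   const text = extractFinalTranscript(data.toString());
//   if (text !== null) respond(text);
// });

console.log(extractFinalTranscript('{"is_final":true,"transcript":"hello"}'));
console.log(extractFinalTranscript('{"is_final":false,"transcript":"hel"}'));
```

Keeping the parsing in a small pure function makes it easy to unit-test turn handling without a live connection.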

How to Tune It

Start at the default 800 ms, then tune based on your use case:

  • Decrease for voice agents that need faster turn-taking
  • Increase for meeting or dictation workflows where speakers pause mid-sentence

Tuning Guide

| Value | Behavior | Best for |
| --- | --- | --- |
| 200-400 ms | Aggressive - responds quickly after short silence | Voice agents, IVR systems, real-time assistants |
| 500-800 ms | Balanced (default range) - handles natural pauses | Conversational AI, general-purpose transcription |
| 1000-2000 ms | Patient - waits through longer pauses | Meeting transcription, dictation, accessibility |
| 3000 ms+ | Very patient - rarely flushes early | Lecture capture, users who pause frequently |
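One way to apply the guide in code is a small preset map, so that each part of an application picks a starting value by use case rather than by a hard-coded number. The preset names and the eouTimeoutFor helper below are hypothetical; the values are starting points chosen from the bands above and should still be tuned against real audio.

```javascript
// Hypothetical presets: one starting value per band in the tuning guide.
const EOU_PRESETS_MS = {
  voiceAgent: 300,     // aggressive band (200-400 ms)
  conversational: 800, // balanced band, also the documented default
  meeting: 1500,       // patient band (1000-2000 ms)
  lecture: 3000,       // very patient band (3000 ms and up)
};

// Returns the preset for a use case, falling back to the default 800 ms.
function eouTimeoutFor(useCase) {
  return Object.prototype.hasOwnProperty.call(EOU_PRESETS_MS, useCase)
    ? EOU_PRESETS_MS[useCase]
    : 800;
}

console.log(eouTimeoutFor("voiceAgent")); // 300
console.log(eouTimeoutFor("unknown"));    // 800
```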

Trade-offs

| Dimension | Low timeout (e.g. 300 ms) | High timeout (e.g. 2000 ms) |
| --- | --- | --- |
| Response speed | Fast - transcript finalizes quickly | Slow - waits longer before flushing |
| Turn accuracy | May split mid-sentence pauses into separate turns | Captures full thoughts including natural pauses |
| Best for | Voice agents that need snappy replies | Transcription where completeness matters |

Example

A voice agent needs to detect when the caller is done speaking and respond immediately:

wss://api.smallest.ai/waves/v1/pulse/get_text?language=en&encoding=linear16&sample_rate=16000&eou_timeout_ms=300

A meeting transcription system should wait for natural pauses:

wss://api.smallest.ai/waves/v1/pulse/get_text?language=en&encoding=linear16&sample_rate=16000&eou_timeout_ms=1500