Features

The Real-Time Pulse STT WebSocket API supports the following features:

Available Features

Word Timestamps

Get precise timing information for each word in the transcription, along with confidence scores

Language Detection

Automatically detect the language of the audio

Sentence Timestamps (Utterances)

Get sentence-level transcription segments with timing information

PII & PCI Redaction

Automatically redact personally identifiable information and payment card information

Speaker Diarization

Identify and label different speakers in the audio, with speaker confidence scores

Keyword Boosting

Boost recognition accuracy for specific words, brand names, and domain terms

Punctuation Formatting

Control punctuation and capitalization formatting in transcripts

End-of-Utterance Timeout

Control how long Pulse waits after speech ends before finalizing the transcript

Inverse Text Normalization

Convert spoken-form numbers, dates, and currencies into written form

Finalize Control

Take manual control of when transcripts are finalized using finalize_on_words and max_words
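
As a rough illustration of Finalize Control, the sketch below assembles a connection URL that carries finalize_on_words and max_words. Only those two parameter names come from this page. Everything else is an assumption made for illustration: that the options are passed as query parameters, that finalize_on_words is a boolean and max_words a positive integer, the placeholder endpoint, and the hypothetical helper build_stream_url. Check the API reference for the actual endpoint, types, and transport before relying on it.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption): NOT the real Pulse URL.
# Substitute the WebSocket URL given in the API reference.
BASE_URL = "wss://example.invalid/pulse/realtime"


def build_stream_url(base_url, finalize_on_words=None, max_words=None):
    """Hypothetical helper: append finalize-control options as query parameters.

    Passing the options in the query string is an assumption; the parameter
    names finalize_on_words and max_words are taken from the feature list.
    """
    params = {}
    if finalize_on_words is not None:
        # Assumed to be a boolean flag, serialized as "true" / "false".
        params["finalize_on_words"] = "true" if finalize_on_words else "false"
    if max_words is not None:
        # Assumed to be a positive integer word count.
        if max_words <= 0:
            raise ValueError("max_words must be a positive integer")
        params["max_words"] = str(max_words)
    # Leave the URL untouched when no option was set.
    return f"{base_url}?{urlencode(params)}" if params else base_url


url = build_stream_url(BASE_URL, finalize_on_words=True, max_words=12)
print(url)
```

Leaving both arguments unset returns the base URL unchanged, so the same helper works whether or not manual finalization is in use.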
