For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
DocumentationAPI ReferenceSelf HostModel CardsClient LibrariesIntegrationsDeveloper ToolsChangelog
DocumentationAPI ReferenceSelf HostModel CardsClient LibrariesIntegrationsDeveloper ToolsChangelog
  • Getting Started
    • Introduction
    • Models
    • Authentication
  • Text to Speech (Lightning)
    • Quickstart
    • Overview
    • Sync & Async
    • Streaming
    • Pronunciation Dictionaries
    • Voices & Languages
    • HTTP vs Streaming vs WebSockets
  • Speech to Text (Pulse)
    • Quickstart
    • Overview
      • Performance
      • Metrics Overview
      • Evaluation Walkthrough
  • LLM (Electron)
    • Quickstart
    • Overview
    • Chat Completions
    • Streaming
    • Tool / Function Calling
    • Prefix Caching
    • Supported Parameters
    • Migrate from OpenAI
    • Best Practices
  • Cookbooks
    • Speech to Text
    • Text to Speech
    • Voice Agent (Electron + Pulse + Lightning)
  • Voice Cloning
    • Instant Clone (UI)
    • Instant Clone (API)
    • Instant Clone (Python SDK)
    • Delete Cloned Voice
  • Best Practices
    • Voice Cloning Best Practices
    • TTS Best Practices
  • Troubleshooting
    • Error reference
LogoLogo
Voice AgentsModels
Voice AgentsModels
On this page
  • Accuracy metrics
  • Word Error Rate (WER)
  • Character Error Rate (CER)
  • Sentence accuracy
  • Latency & throughput
  • Time to First Result (TTFR)
  • End-to-end latency
  • Real-Time Factor (RTF)
  • Enrichment quality
  • Coverage & robustness
  • Operational metrics
  • Reporting checklist
Speech to Text (Pulse)Benchmarks

Metrics Overview

||View as Markdown|
Was this page helpful?
Previous

Performance

Next

Evaluation Walkthrough

Built with

Pulse STT evaluations revolve around four pillars:

  1. Accuracy – how close transcripts are to the ground truth.
  2. Latency & throughput – how quickly and efficiently results arrive.
  3. Enrichment quality – how reliable diarization, timestamps, and metadata are.

Accuracy metrics

Word Error Rate (WER)

  • Formula: WER = (Substitutions + Deletions + Insertions) / Total Words.
  • Interprets overall transcript fidelity; normalize casing/punctuation before computing.

Character Error Rate (CER)

  • Parallel to WER but at the character level; useful for languages with compact scripts or heavy compounding.

Sentence accuracy

  • Percentage of sentences that match ground truth exactly.
  • Tracks readability for QA/customer-support recaps.

Latency & throughput

Time to First Result (TTFR)

  • Measures the delay between request start and first interim token.
  • For real-time agents, keep TTFR below ~30 ms to maintain natural turn-taking.

End-to-end latency

  • Wall-clock time from submission to final transcript.
  • Report p50/p90/p95 to capture outliers introduced by long files or retries.

Real-Time Factor (RTF)

  • RTF = Processing Time / Audio Duration.
  • Values less than 1 indicate faster-than-real-time processing; Pulse STT typically runs near 0.4 RTF on clean inputs.

Enrichment quality

MetricWhat to watchWhy it matters
Diarization accuracy% of words with correct speaker_idCall-center QA, coaching, compliance
Word timestamp driftGap between predicted and reference timestampsSubtitle alignment and editing
Sentence-level timestamps% of audio covered by utterances segmentsChaptering, meeting notes
Emotion/gender precisionConfidence distributionRouting, analytics, compliance flags

Coverage & robustness

  • Language detection accuracy: share of files that land on the intended ISO 639-1 code or auto-detected language.
  • Noise robustness: WER delta at multiple SNR levels (clean vs. +5 dB noise, etc.).
  • Accent/domain diversity: track WER per accent or scenario (support, media, meetings) to avoid blind spots.

Operational metrics

  • Requests per second / concurrent sessions: validate you stay within quota and plan scaling needs.
  • Cost per minute: Pulse STT bills per second at $0.025/minute list price—include enrichment toggles when modeling cost.
  • Retry volume: differentiate infrastructure retries (HTTP 5xx) from transcription failures to spot upstream vs downstream issues.

Reporting checklist

  1. Describe dataset composition (language, accent, domain, duration).
  2. Publish WER/CER, TTFR, and RTF with averages and percentiles.
  3. Include enrichment coverage (how many segments include diarization/timestamps).
  4. Summarize cost/latency impact when enabling optional features.
  5. Link to reproducible scripts or notebooks for auditing.