---
title: Metrics Overview
description: Key Pulse STT metrics for quality and latency.
---

Pulse STT evaluations revolve around three pillars:

1. **Accuracy** – how close transcripts are to the ground truth.
2. **Latency & throughput** – how quickly and efficiently results arrive.
3. **Enrichment quality** – how reliable diarization, timestamps, and metadata are.

## Accuracy metrics

### Word Error Rate (WER)

* Formula: `WER = (Substitutions + Deletions + Insertions) / Total Words`, where `Total Words` is the word count of the ground-truth transcript.
* Reflects overall transcript fidelity; normalize casing and punctuation before computing.

### Character Error Rate (CER)

* Analogous to WER but computed at the character level; useful for languages with compact scripts or heavy compounding.

### Sentence accuracy

* Percentage of sentences that match the ground truth exactly.
* Tracks readability for QA and customer-support recaps.

## Latency & throughput

### Time to First Result (TTFR)

* Measures the delay between request start and the first interim token.
* For real-time agents, keep TTFR below \~30 ms to maintain natural turn-taking.

### End-to-end latency

* Wall-clock time from submission to final transcript.
* Report p50/p90/p95 to capture outliers introduced by long files or retries.

### Real-Time Factor (RTF)

* `RTF = Processing Time / Audio Duration`.
* Values below 1 indicate faster-than-real-time processing; Pulse STT typically runs near 0.4 RTF on clean inputs.

## Enrichment quality

| Metric | What to watch | Why it matters |
|---|---|---|
| Diarization accuracy | % of words with correct `speaker_id` | Call-center QA, coaching, compliance |
| Word timestamp drift | Gap between predicted and reference timestamps | Subtitle alignment and editing |
| Sentence-level timestamps | % of audio covered by `utterances` segments | Chaptering, meeting notes |
| Emotion/age/gender precision | Confidence distribution | Routing, analytics, compliance flags |
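
The WER, CER, RTF, and percentile-latency definitions on this page can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Pulse STT tooling: the function names (`wer`, `cer`, `rtf`, `percentile`), the ASCII-only normalization, and the nearest-rank percentile method are all choices made for this example.

```python
import math
import string
from typing import Sequence


def normalize(text: str) -> str:
    """Lowercase, strip ASCII punctuation, collapse whitespace (assumed scheme)."""
    stripped = text.lower().translate(str.maketrans("", "", string.punctuation))
    return " ".join(stripped.split())


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,                # deletion (reference token missing)
                curr[j - 1] + 1,            # insertion (extra hypothesis token)
                prev[j - 1] + (r != h),     # substitution, or match at cost 0
            ))
        prev = curr
    return prev[-1]


def wer(reference: str, hypothesis: str) -> float:
    """Word errors divided by the number of words in the reference."""
    ref_words = normalize(reference).split()
    hyp_words = normalize(hypothesis).split()
    if not ref_words:
        raise ValueError("reference must contain at least one word")
    return edit_distance(ref_words, hyp_words) / len(ref_words)


def cer(reference: str, hypothesis: str) -> float:
    """Same calculation as WER, applied to characters instead of words."""
    ref_chars = list(normalize(reference))
    hyp_chars = list(normalize(hypothesis))
    if not ref_chars:
        raise ValueError("reference must contain at least one character")
    return edit_distance(ref_chars, hyp_chars) / len(ref_chars)


def rtf(processing_seconds: float, audio_seconds: float) -> float:
    """Real-Time Factor: processing time divided by audio duration."""
    return processing_seconds / audio_seconds


def percentile(values: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile, e.g. pct=90 for p90 latency."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


if __name__ == "__main__":
    print(wer("The cat sat on the mat.", "the cat sit on mat"))
    print(rtf(processing_seconds=24.0, audio_seconds=60.0))
    latencies_ms = [120, 80, 100, 400, 90]  # made-up sample values
    print([percentile(latencies_ms, p) for p in (50, 90, 95)])
```

In the sample call, one substitution and one deletion against a six-word reference give a WER of 2/6. Nearest-rank is only one of several percentile conventions; with few samples, p90 and p95 can land on the same observation.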