STT Performance page rebuilt against the FLEURS, ESB, and WildASR sources; speaker-cap language scrubbed from the Pulse model card
The Pulse STT Performance page (/waves/documentation/speech-to-text-pulse/benchmarks/performance) now reflects the canonical benchmark comparison across five dataset sections:
- FLEURS — pre-recorded (29 languages)
- FLEURS — streaming (8 languages)
- ESB — streaming (9 English domains)
- WildASR — streaming (8 robustness conditions)
- Internal English perturbation suite (12 categories)
Each dataset section includes the source description and the full Smallest Pulse vs Deepgram Nova 2 vs Nova 3 comparison. Several previously published sections that were not backed by any source benchmark — “Performance by Language”, “High-Performance Languages”, “Regional Variations”, “Performance by Audio Format”, and “Feature Impact on Performance” — have been removed. The per-language latency pairings (e.g. “Italian: 4.2% WER, ~64ms latency”) were incorrect; latency is not language-specific and is now reported once at the top of the page.
The Pulse model card (/waves/model-cards/speech-to-text/pulse) is also updated:
- Speaker-diarization phrasing such as “capped at 4 speakers; known issues” has been removed from the Key Capabilities card, the Features — Non-streaming table, and the Limitations & Safety bullets.
- The ESB, WildASR, and Internal English perturbation tables are now mirrored into the model card so the per-feature page and the model card stay in sync.
- The single hyperlink on Keyword boosting in the Features — Streaming table has been removed for consistency: no other feature in that table links out from its cell.

