Pulse
Pulse is a high-accuracy, low-latency speech-to-text model built for real-time transcription across 38 languages, with streaming and non-streaming support.
Model Overview
Key Capabilities
Ultra-low latency architecture delivering 64ms TTFT at 1 concurrency and 300ms at 100 concurrent requests — designed for live transcription and conversational AI.
38 languages supported across streaming and non-streaming modes, with automatic language detection and code-switching within a single session.
Built-in redaction of personal and payment card data across both streaming and non-streaming use cases.
Automatic multi-speaker identification across both streaming and non-streaming modes, with per-word and per-utterance speaker labels.
Background noise handling built into the model.
Supports multi-language audio within a single session. For best results, set the known primary language; for example, es (Spanish) also handles English+Spanish code-switching automatically.
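To illustrate how per-word speaker labels relate to per-utterance labels, here is a minimal sketch that merges consecutive words from the same speaker into turns. The input shape (a list of dicts with `word` and `speaker` keys) and the function name `group_by_speaker` are assumptions made for this example; they are not taken from the Pulse response schema, so check the API reference for the real field names.

```python
from itertools import groupby

# Hypothetical per-word output. The "word" and "speaker" field names are
# assumptions for illustration, not the documented Pulse response format.
words = [
    {"word": "Hello", "speaker": 0},
    {"word": "there", "speaker": 0},
    {"word": "Hi", "speaker": 1},
    {"word": "how", "speaker": 0},
    {"word": "are", "speaker": 0},
    {"word": "you", "speaker": 0},
]


def group_by_speaker(words):
    """Merge consecutive words that share a speaker label into one turn."""
    turns = []
    for speaker, run in groupby(words, key=lambda w: w["speaker"]):
        turns.append({"speaker": speaker, "text": " ".join(w["word"] for w in run)})
    return turns


for turn in group_by_speaker(words):
    print(f"Speaker {turn['speaker']}: {turn['text']}")
```

Grouping only consecutive runs (rather than collecting all words per speaker) preserves the order of the conversation, which is usually what a transcript view needs.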
Performance & Benchmarks
Pulse STT is evaluated against three open-source datasets (FLEURS, ESB, and WildASR) and one internal English perturbation suite. Results are reported as Word Error Rate (WER) by language; lower is better. NA means not available or not supported by that provider.
For the full benchmark comparison across every dataset, see the Performance page.
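For readers unfamiliar with the metric, WER is conventionally the word-level edit distance (substitutions + deletions + insertions) divided by the number of words in the reference transcript. The sketch below shows that standard calculation. The function name `word_error_rate` is chosen for this example, and the text normalization (lowercasing and whitespace splitting only) is an assumption: the normalization used for the published benchmark numbers is not stated here and may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference words."""
    # Assumed normalization: lowercase and split on whitespace only.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Two substitutions against a four-word reference.
print(word_error_rate("turn on the lights", "turn of the light"))  # 0.5
```

Because the denominator is the reference length, WER can exceed 1.0 when the hypothesis contains many inserted words.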
FLEURS — Streaming
European Languages
Indic Languages
FLEURS — Pre-recorded
European Languages
Indic Languages
Hindi — multi-dataset (Streaming)
WER across seven Hindi datasets covering read speech, conversational speech, telephony / contact-center audio, and noise-augmented variants. Compared against IndicWhisper, Sarvam Saaras v3, and Deepgram Nova-3. Lower is better.
For the full breakdown including training-data and evaluation-protocol notes, see the Performance page.
English STT — ESB Dataset (Streaming)
A Hugging Face benchmark suite aggregating 8 English speech datasets across diverse domains (audiobooks, parliament, meetings, finance, etc.) to test STT generalization.
Evaluated on the open-source Hugging Face ESB datasets. Results for Smallest Pulse come from internal evaluation.
ASR Robustness — WildASR Dataset (Streaming)
An open-source robustness benchmark designed to stress-test STT under real-world degraded conditions: clipping, far-field capture, background noise, phone codec compression, reverberation, and accented speech.
Evaluated on the open-source WildASR dataset. Results for Smallest Pulse come from internal evaluation.
Internal English Perturbation Benchmark
Not a public dataset. The English audio is sliced by perturbation type (Emotion, Entity, Disfluency, Noise, Accent, Silence, Speaker Diversity, Speed, Boundary, Pitch, Audio Quality, Volume) to isolate model weaknesses.
Features — Non-streaming
Features — Streaming
Supported Languages — Non-streaming
Supported Languages — Streaming
Best Practices
Specify the language parameter when known
When the language of the audio is known in advance, always set it explicitly rather than relying on automatic detection. This yields better transcription accuracy because the model can optimize directly for that language without needing to first identify it.
For example, setting the language parameter to es (Spanish) tells the model to expect Spanish audio, which also handles English+Spanish code-switching scenarios. This produces more accurate outputs compared to using multi-eu or multi.
When to use multi-eu or multi:
- When the language is truly unknown beforehand
- When processing audio from varied or unpredictable sources
- Prefer multi-eu for European-language input; use multi only for truly mixed multilingual audio
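The selection guidance above can be summarized as a small decision helper. This is a sketch only: `choose_language` and its `european_only` flag are hypothetical names introduced for this example, and only the values es, multi-eu, and multi come from this page. How the chosen value is passed to the API is not shown here.

```python
from typing import Optional


def choose_language(known_language: Optional[str], european_only: bool = False) -> str:
    """Pick a value for the language parameter, following the guidance above.

    known_language: a language code such as "es" when the primary language
        is known in advance, otherwise None.
    european_only: hypothetical flag meaning the input is expected to
        contain only European languages.
    """
    if known_language:
        # A known primary language gives the best accuracy and, per this
        # page, still handles code-switching with English.
        return known_language
    # Language unknown: prefer the narrower European setting when it applies.
    return "multi-eu" if european_only else "multi"


print(choose_language("es"))                      # es
print(choose_language(None, european_only=True))  # multi-eu
print(choose_language(None))                      # multi
```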
Use Cases
Direct use
- Real-time call transcription
- Voice assistant input
- Meeting transcription
- Accessibility and captioning
- Customer support recording analysis
Downstream use
- Multi-turn conversational agents
- Voice-to-text pipelines
- Telephony and IVR systems
- Content indexing and search
- Compliance and audit logging
Safety & Compliance
Pulse must not be used for:
- Recording or transcribing individuals without their explicit consent
- Surveillance, stalking, or any form of unauthorized monitoring
- Any illegal or unethical purposes
Additionally:
- Usage is monitored for policy compliance
- For compliance documentation (GDPR, SOC2, HIPAA), contact support@smallest.ai

