Pulse

Pulse is a high-accuracy, low-latency speech-to-text model built for real-time transcription across 39 languages, with streaming and non-streaming support.

  • 64ms TTFT at 1 concurrency
  • 300ms TTFT at 100 concurrency
  • 39 languages (streaming + non-streaming)
  • 2 modes (streaming + non-streaming)

Model Overview

Developed by: Smallest AI
Model type: Speech-to-Text
Languages: 39 supported
License: Proprietary
Model format (non-streaming): pulse_offline_<lang>_<version>.smlst
Model format (streaming): pulse_streaming_<lang>_<version>.smlst
Documentation: docs.smallest.ai/waves
Console: console.smallest.ai
Support: support@smallest.ai
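
As an illustration of the two model-format patterns listed in the Model Overview, the sketch below fills in the <lang> and <version> placeholders. The function name, the mode labels, and the example version string are hypothetical; the source does not specify what a version string looks like.

```python
# File-name prefixes taken from the Model Overview entries.
# The mode labels used as keys are illustrative, not official identifiers.
PREFIXES = {"non-streaming": "pulse_offline", "streaming": "pulse_streaming"}


def model_filename(mode: str, lang: str, version: str) -> str:
    """Fill the <lang> and <version> placeholders of the documented pattern."""
    if mode not in PREFIXES:
        raise ValueError(f"mode must be one of {sorted(PREFIXES)}")
    if not lang or not version:
        raise ValueError("lang and version must be non-empty")
    return f"{PREFIXES[mode]}_{lang}_{version}.smlst"
```

With a made-up version string "v1", `model_filename("streaming", "es", "v1")` would yield `pulse_streaming_es_v1.smlst`.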

Key Capabilities

Real-Time Optimized

Ultra-low latency architecture delivering 64ms TTFT at 1 concurrency and 300ms at 100 concurrent requests — designed for live transcription and conversational AI.

Multi-Language

39 languages supported across streaming and non-streaming modes, with automatic language detection and code-switching within a single session.

PII / PCI Redaction

Built-in redaction of personal and payment card data, enterprise-ready for both streaming and non-streaming use cases.

Speaker Diarization

Automatic multi-speaker identification available across both modes. Streaming diarization is enterprise-ready; non-streaming is available with a cap of 4 speakers.

Noise Reduction

Background noise handling built into the model — enterprise-ready in streaming mode.

Code-Switching

Supports multi-language audio within a single session. It works best when the known primary language is set explicitly; for example, es (Spanish) also handles English+Spanish audio automatically.


Performance & Benchmarks

Word Error Rate (WER) by language evaluated on the FLEURS dataset. Lower is better. NA = not available or not supported by that provider.

Evaluation: FLEURS dataset across 32 languages. Competitor numbers are sourced from AssemblyAI published benchmarks and Deepgram internal benchmarks.
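
For readers unfamiliar with the metric, WER is conventionally the word-level edit distance between a reference transcript and the model output (substitutions + deletions + insertions), divided by the number of reference words. The sketch below illustrates that standard definition only. The text normalization behind the numbers reported here is not stated, so the lowercasing and whitespace split are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    # Assumed normalization: lowercase and split on whitespace.
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix processed
    # so far and hyp[:j]; it starts as the cost of inserting j words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the quick brown fox", "the quick brown box")
    print(f"{wer:.1%}")  # one substitution over four words: 25.0%
```

Note that WER can exceed 100% when the hypothesis contains many insertions, since the denominator is the reference length.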

Streaming

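| Language | Smallest Pulse | Deepgram Nova 2 | Deepgram Nova 3 | AssemblyAI Universal | AssemblyAI Uni-3 Pro |
| --- | --- | --- | --- | --- | --- |
| English | 4.5% | 7.9% | 6.7% | 4.38% | 8.43% |
| Italian | 6.47% | 10.7% | 6.2% | 3.29% | 5.60% |
| Spanish | 7.22% | 8.6% | 4.1% | 2.95% | 7.63% |
| Portuguese | 12.41% | 9.9% | 7.5% | 4.80% | 7.88% |
| German | 13.05% | 8.2% | 8.5% | 4.99% | 11.79% |
| French | 14.29% | 13.3% | 10.7% | 7.71% | 9.59% |
| Russian | 16.27% | 7.9% | 11.8% | 5.80% | NA |
| Ukrainian | NA | 12.4% | NA | 7.42% | NA |
| Polish | NA | 12.2% | NA | 6.63% | NA |
| Dutch | 16.54% | 16.3% | 12.5% | 7.79% | NA |
| Czech | 12.4% | 22.9% | 19.2% | NA | NA |
| Slovak | 13.5% | 31.2% | NA | NA | NA |
| Swedish | 18.7% | 17.7% | 14.3% | NA | NA |
| Finnish | 18.3% | 14.1% | 13.2% | 10.10% | NA |
| Latvian | 16.5% | 48.7% | NA | NA | NA |
| Romanian | 17.8% | 36.0% | NA | NA | NA |
| Estonian | 17.8% | 49.0% | NA | NA | NA |
| Bulgarian | 24.1% | 32.7% | NA | NA | NA |
| Danish | 19.8% | 21.1% | 16.1% | NA | NA |
| Hungarian | 22.5% | 31.8% | 28.6% | NA | NA |
| Maltese | 25.5% | NA | NA | NA | NA |
| Lithuanian | 25.1% | 44.9% | NA | NA | NA |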

Pre-recorded

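| Language | Smallest Pulse | Deepgram Nova 2 | Deepgram Nova 3 | AssemblyAI Universal |
| --- | --- | --- | --- | --- |
| English | 4.5% | 7.9% | 6.7% | 8.0% |
| Italian | 3.0% | 10.7% | 6.2% | NA |
| Spanish | 3.2% | 8.6% | 4.1% | NA |
| Portuguese | 5.0% | 9.9% | 7.5% | NA |
| German | 6.4% | 8.2% | 8.5% | NA |
| French | 7.1% | 13.3% | 10.7% | NA |
| Russian | 9.6% | 7.9% | 11.8% | NA |
| Ukrainian | 7.5% | 12.4% | NA | NA |
| Polish | 10.3% | 12.2% | NA | NA |
| Dutch | 15.0% | 16.3% | 12.5% | NA |
| Czech | 12.4% | 22.9% | 19.2% | NA |
| Slovak | 13.5% | 31.2% | NA | NA |
| Swedish | 18.7% | 17.7% | 14.3% | NA |
| Finnish | 18.3% | 14.1% | 13.2% | NA |
| Latvian | 16.5% | 48.7% | NA | NA |
| Romanian | 17.8% | 36.0% | NA | NA |
| Estonian | 17.8% | 49.0% | NA | NA |
| Bulgarian | 24.1% | 32.7% | NA | NA |
| Danish | 19.8% | 21.1% | 16.1% | NA |
| Hungarian | 22.5% | 31.8% | 28.6% | NA |
| Maltese | 25.5% | NA | NA | NA |
| Lithuanian | 25.1% | 44.9% | NA | NA |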

Features — Non-streaming

| Feature | Available | Notes |
| --- | --- | --- |
| Speaker diarization | Yes | Capped at 4 speakers; known issues |
| PII redaction | Yes | Personal info redaction |
| PCI redaction | Yes | Payment card data redaction |
| Word-level timestamps | Yes | Per-word timing |
| Sentence-level timestamps | Yes | Requires word_timestamps=true to be enabled |
| Punctuation | Yes | Auto punctuation |
| Profanity filter | Yes | Explicit content filtering |
| Language detection | Yes | Auto language ID; not enterprise-ready |
| Code-switching | Yes | Multi-language in same audio |
| Noise reduction | Yes | Background noise handling |
| Emotion, age and gender detection | Yes | Returns the percentage score of detected emotion, age, and gender |

Features — Streaming

| Feature | Available | Notes |
| --- | --- | --- |
| Speaker diarization | Yes | Multi-speaker identification |
| Keyword boosting | Yes | Custom vocabulary enhancement |
| PII redaction | Yes | Personal info redaction |
| PCI redaction | Yes | Payment card data redaction |
| Word-level timestamps | Yes | Per-word timing |
| Sentence-level timestamps | Yes | Per-sentence timing |
| Punctuation | Yes | Auto punctuation |
| Profanity filter | No | |
| Language detection | Yes | Auto language ID |
| Code-switching | Yes | Multi-language in same audio |
| Custom vocabulary | No | |
| Noise reduction | Yes | Background noise handling |
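
The two feature tables can be encoded as a lookup so that a set of requested features is checked against a mode before a request is built. This is a minimal sketch, not the Pulse API: the feature keys and mode labels are illustrative (only word_timestamps appears in the tables as a parameter name), and treating a feature that is absent from a table as unavailable is an assumption.

```python
# Features marked "Yes" in each table. Keys are illustrative labels,
# not confirmed API parameter names (except word_timestamps).
NON_STREAMING = {
    "diarization", "pii_redaction", "pci_redaction", "word_timestamps",
    "sentence_timestamps", "punctuation", "profanity_filter",
    "language_detection", "code_switching", "noise_reduction",
    "emotion_age_gender",
}
STREAMING = {
    "diarization", "keyword_boosting", "pii_redaction", "pci_redaction",
    "word_timestamps", "sentence_timestamps", "punctuation",
    "language_detection", "code_switching", "noise_reduction",
}
AVAILABLE = {"non-streaming": NON_STREAMING, "streaming": STREAMING}


def check_features(mode: str, requested: set[str]) -> list[str]:
    """Return a list of problems; an empty list means nothing was flagged."""
    if mode not in AVAILABLE:
        return [f"unknown mode: {mode}"]
    problems = [
        f"{name} is not available in {mode} mode"
        for name in sorted(requested - AVAILABLE[mode])
    ]
    # The non-streaming table notes that sentence-level timestamps
    # require word_timestamps=true.
    if (
        mode == "non-streaming"
        and "sentence_timestamps" in requested
        and "word_timestamps" not in requested
    ):
        problems.append("sentence_timestamps requires word_timestamps=true")
    return problems
```

For example, `check_features("streaming", {"profanity_filter"})` reports that the profanity filter is not available in streaming mode, matching the streaming table.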

Supported Languages — Non-streaming

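| Language | Code | Available |
| --- | --- | --- |
| English | en | Yes |
| Italian | it | Yes |
| Spanish | es | Yes |
| Portuguese | pt | Yes |
| Hindi | hi | Yes |
| German | de | Yes |
| French | fr | Yes |
| Ukrainian | uk | Yes |
| Russian | ru | Yes |
| Kannada | kn | Yes |
| Malayalam | ml | Yes |
| Polish | pl | Yes |
| Marathi | mr | Yes |
| Gujarati | gu | Yes |
| Czech | cs | Yes |
| Slovak | sk | Yes |
| Telugu | te | Yes |
| Oriya (Odia) | or | Yes |
| Dutch | nl | Yes |
| Bengali | bn | Yes |
| Latvian | lv | Yes |
| Estonian | et | Yes |
| Romanian | ro | Yes |
| Punjabi | pa | Yes |
| Finnish | fi | Yes |
| Swedish | sv | Yes |
| Bulgarian | bg | Yes |
| Tamil | ta | Yes |
| Hungarian | hu | Yes |
| Danish | da | Yes |
| Lithuanian | lt | Yes |
| Maltese | mt | Yes |
| Japanese | ja | Yes |
| Cantonese | yue | Yes |
| Mandarin | zh | Yes |
| Korean | ko | Yes |
| Tagalog | tl | Yes |
| Indonesian | id | Yes |
| Malay | ms | Yes |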

Supported Languages — Streaming

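| Language | Code | Available |
| --- | --- | --- |
| English | en | Yes |
| Italian | it | Yes |
| Spanish | es | Yes |
| Portuguese | pt | Yes |
| Hindi | hi | Yes |
| German | de | Yes |
| French | fr | Yes |
| Ukrainian | uk | Yes |
| Russian | ru | Yes |
| Kannada | kn | Yes |
| Malayalam | ml | Yes |
| Polish | pl | Yes |
| Marathi | mr | Yes |
| Gujarati | gu | Yes |
| Czech | cs | Yes |
| Slovak | sk | Yes |
| Telugu | te | Yes |
| Oriya (Odia) | or | Yes |
| Dutch | nl | Yes |
| Bengali | bn | Yes |
| Latvian | lv | Yes |
| Estonian | et | Yes |
| Romanian | ro | Yes |
| Punjabi | pa | Yes |
| Finnish | fi | Yes |
| Swedish | sv | Yes |
| Bulgarian | bg | Yes |
| Tamil | ta | Yes |
| Hungarian | hu | Yes |
| Danish | da | Yes |
| Lithuanian | lt | Yes |
| Maltese | mt | Yes |
| Japanese | ja | Yes |
| Cantonese | yue | Yes |
| Mandarin | zh | Yes |
| Korean | ko | Yes |
| Tagalog | tl | Yes |
| Indonesian | id | Yes |
| Malay | ms | Yes |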

Best Practices

Specify the language parameter when known

When the language of the audio is known in advance, always set it explicitly rather than relying on automatic detection. This yields better transcription accuracy because the model can optimize directly for that language without needing to first identify it.

For example, setting the language parameter to es (Spanish) tells the model to expect Spanish audio and also covers English+Spanish code-switching. This produces more accurate output than setting the language parameter to multi.

| language value | Use case |
| --- | --- |
| en | English |
| es | Spanish (handles English+Spanish) |
| hi | Hindi (handles English+Hindi) |
| multi | Unknown or mixed-language audio only |

When to use multi:

  • When the language is truly unknown beforehand
  • When processing audio from varied or unpredictable sources
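
The guidance above amounts to a simple rule: pass an explicit code whenever the language is known, and fall back to multi only when it is not. A minimal sketch of that rule follows. The helper name is hypothetical, and the code set is copied from the supported-languages tables.

```python
from typing import Optional

# Language codes from the supported-languages tables.
SUPPORTED = {
    "en", "it", "es", "pt", "hi", "de", "fr", "uk", "ru", "kn", "ml", "pl",
    "mr", "gu", "cs", "sk", "te", "or", "nl", "bn", "lv", "et", "ro", "pa",
    "fi", "sv", "bg", "ta", "hu", "da", "lt", "mt", "ja", "yue", "zh", "ko",
    "tl", "id", "ms",
}


def choose_language(known_language: Optional[str]) -> str:
    """Prefer an explicit code; fall back to "multi" only when unknown."""
    if known_language is None:
        return "multi"
    code = known_language.strip().lower()
    if code not in SUPPORTED:
        raise ValueError(f"unsupported language code: {known_language!r}")
    return code
```

Raising on an unsupported code, instead of silently falling back to multi, keeps a typo from quietly routing audio through automatic detection.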

Use features only when needed

Enable optional features (diarization, PII redaction, timestamps) only when the use case requires them. Unnecessary features add latency.


Use Cases

Direct use

  • Real-time call transcription
  • Voice assistant input
  • Meeting transcription
  • Accessibility and captioning
  • Customer support recording analysis

Downstream use

  • Multi-turn conversational agents
  • Voice-to-text pipelines
  • Telephony and IVR systems
  • Content indexing and search
  • Compliance and audit logging

Limitations & Safety

Known Limitations

Accuracy varies across languages. The following gaps are known and actively being addressed:

  • Hindi — still training on proper nouns and order IDs; not enterprise-ready for non-streaming
  • Low-resource languages — Kannada, Malayalam, Marathi, Gujarati, Telugu, Oriya, Bengali, Punjabi, Tamil, Japanese, Cantonese, Mandarin, Korean, Tagalog, Indonesian, and Malay are available but not yet enterprise-ready
  • Language detection (multi) — automatic language identification does not perform reliably enough for production workloads; specify the known language parameter instead
  • Non-streaming speaker diarization — capped at 4 speakers; known accuracy issues; contact support for higher speaker count requirements
  • Audio quality — transcription accuracy is directly affected by input audio quality; background noise, low bitrate, or overlapping speech may degrade results even with noise reduction enabled
  • Code-switching — works best when the primary language is explicitly set; fully automatic multi-language detection in a single audio stream is not enterprise-ready
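
The language-related entries in this list can be captured as a small lookup that flags a language and mode pair before it is used in production. This is an illustrative sketch only. The function name and mode labels are hypothetical, and a False result means only that the pair is not named in the list above, not that it is certified as enterprise-ready.

```python
# Codes for the languages listed above as not yet enterprise-ready,
# plus "multi" for automatic language detection.
FLAGGED_ALL_MODES = {
    "kn", "ml", "mr", "gu", "te", "or", "bn", "pa",
    "ta", "ja", "yue", "zh", "ko", "tl", "id", "ms",
    "multi",
}
# Hindi is flagged for non-streaming only.
FLAGGED_NON_STREAMING_ONLY = {"hi"}


def has_known_limitation(lang: str, mode: str) -> bool:
    """True if the language/mode pair appears in the known-limitations list."""
    if mode not in ("streaming", "non-streaming"):
        raise ValueError(f"unknown mode: {mode!r}")
    if lang in FLAGGED_ALL_MODES:
        return True
    return mode == "non-streaming" and lang in FLAGGED_NON_STREAMING_ONLY
```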

Safety & Compliance

Pulse must not be used for:

  • Recording or transcribing individuals without their explicit consent
  • Surveillance, stalking, or any form of unauthorized monitoring
  • Any illegal or unethical purposes

Additionally:

  • Usage is monitored for policy compliance
  • For compliance documentation (GDPR, SOC2, HIPAA), contact support@smallest.ai

Contact