Pulse

Pulse is a high-accuracy, low-latency speech-to-text model built for real-time transcription across 38 languages, with streaming and non-streaming support.

64ms	TTFT at 1 concurrency
300ms	TTFT at 100 concurrency
38 Languages	Streaming + Non-streaming
2 Modes	Streaming + Non-streaming

Model Overview


Developed by	Smallest AI
Model type	Speech-to-Text
Languages	38 supported (plus `multi`, `multi-eu`, `multi-indic`, `multi-asian` aggregators)
License	Proprietary
Model format (non-streaming)	`pulse_offline_<lang>_<version>.smlst`
Model format (streaming)	`pulse_streaming_<lang>_<version>.smlst`
Documentation	docs.smallest.ai/waves
Console	app.smallest.ai/dashboard
Support	support@smallest.ai
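
As a rough illustration of the naming pattern in the table above, the sketch below assembles a model file name from a mode, a language code, and a version. The helper name `pulse_model_filename` and the version string `v1` are hypothetical; the source does not say what real version strings look like.

```python
# Hypothetical helper: only the pulse_<mode>_<lang>_<version>.smlst pattern
# comes from the model card; everything else here is an assumption.
MODES = {"non-streaming": "offline", "streaming": "streaming"}


def pulse_model_filename(mode: str, lang: str, version: str) -> str:
    """Build a file name following the documented Pulse naming pattern."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {sorted(MODES)}")
    if not lang or not version:
        raise ValueError("lang and version must be non-empty")
    return f"pulse_{MODES[mode]}_{lang}_{version}.smlst"


# "v1" is a placeholder version string, not a real release name.
print(pulse_model_filename("streaming", "es", "v1"))
```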

Key Capabilities

Real-Time Optimized

Ultra-low latency architecture delivering 64ms TTFT at 1 concurrency and 300ms at 100 concurrent requests — designed for live transcription and conversational AI.

Multi-Language

38 languages supported across streaming and non-streaming modes, with automatic language detection and code-switching within a single session.

PII / PCI Redaction

Built-in redaction of personal and payment card data across both streaming and non-streaming use cases.

Speaker Diarization

Automatic multi-speaker identification across both streaming and non-streaming modes, with per-word and per-utterance speaker labels.

Noise Reduction

Background noise handling built into the model.

Code-Switching

Supports multi-language audio within a single session. For best results, set the known primary language; for example, `es` handles mixed English+Spanish audio automatically.

Performance & Benchmarks

Pulse STT is evaluated on three open-source benchmarks (FLEURS, ESB, and WildASR), on additional public Hindi datasets, and on internal English and Hindi perturbation suites. Every table reports Word Error Rate (WER) unless a row says otherwise; lower is better. NA = not available or not supported by that provider.

For the full benchmark comparison across every dataset, see the Performance page.
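
WER is conventionally defined as (substitutions + deletions + insertions) divided by the number of reference words. The sketch below computes it with a word-level edit distance. It assumes plain whitespace tokenization; the text normalization used for the published numbers is not described here, so this will not necessarily reproduce them.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") over 6 reference words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
print(f"{wer:.2%}")
```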

FLEURS Streaming — English

WER on the English subset of FLEURS across providers in streaming mode. Lower is better.

Provider	Smallest Pulse	Assembly Universal 3 Pro	AWS Transcribe	Azure	Deepgram Nova 3	Grok	Sarvam Saaras v3	ElevenLabs Scribe V2
WER	6.03%	3.13%	6.54%	13.79%	11.59%	60.00%	6.34%	3.88%

A note on audio amplitude normalization

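Audio amplitude normalization materially changes WER on FLEURS. Raw FLEURS audio has variable, often low amplitude, and results shift depending on whether peak audio is normalized to −10 dBFS before evaluation, so a score from a single amplitude regime can make a model look better than it is in practice. In the table below, Deepgram Nova 3 rises from 6.57% at −10 dBFS to 11.59% on raw audio and Grok from 7.58% to 60.00%, while Pulse stays between 5.81% and 6.06% across all three regimes.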

Model	Raw FLEURS	−10 dBFS	−20 dBFS	Stable across regimes?
Pulse	6.03%	6.06%	5.81%	Yes
Deepgram Nova 3	11.59%	6.57%	6.51%	Partial — 1.8× degradation on raw
Grok	60.00%	7.58%	8.59%	Collapses on raw
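
The exact normalization procedure behind these numbers is not given. As one plausible reading, peak normalization scales a clip so that its loudest sample sits at a target level in dBFS. The sketch below assumes floating-point samples where full scale is 1.0; the function names are illustrative.

```python
import math


def peak_dbfs(samples):
    """Peak level of float samples in dBFS (full scale = 1.0)."""
    peak = max((abs(s) for s in samples), default=0.0)
    return -math.inf if peak == 0.0 else 20.0 * math.log10(peak)


def peak_normalize(samples, target_dbfs=-10.0):
    """Scale samples so that the loudest one sits at target_dbfs."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence: no gain can change the level
    target_peak = 10.0 ** (target_dbfs / 20.0)  # dBFS -> linear amplitude
    gain = target_peak / peak
    return [s * gain for s in samples]


# A quiet clip (peak 0.02, roughly -34 dBFS) brought up to a -10 dBFS peak.
quiet_clip = [0.01, -0.02, 0.005]
normalized = peak_normalize(quiet_clip, target_dbfs=-10.0)
print(f"{peak_dbfs(quiet_clip):.1f} dBFS -> {peak_dbfs(normalized):.1f} dBFS")
```

Peak normalization applies one gain to the whole clip, so it changes loudness without changing the waveform's shape.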

FLEURS — Streaming

European Languages

Language	Smallest Pulse	Deepgram Nova 2	Deepgram Nova 3
Italian	4.41%	11.05%	6.99%
English	6.03%	15.59%	11.21%
Spanish	5.99%	10.67%	7.52%
Portuguese	8.32%	14.15%	11.46%
German	9.5%	11.1%	10.15%
French	10.71%	14.3%	12.07%
Russian	14.35%	NA	NA
Dutch	11.90%	NA	NA

FLEURS — Pre-recorded

European Languages

Language	Smallest Pulse	Deepgram Nova 2	Deepgram Nova 3
English	4.55%	7.9%	6.7%
Italian	3.0%	10.7%	6.2%
Spanish	3.2%	8.6%	4.1%
Portuguese	5.0%	9.9%	7.5%
German	6.4%	8.2%	8.5%
French	7.1%	13.3%	10.7%
Russian	9.6%	7.9%	11.8%
Ukrainian	7.5%	12.4%	NA
Polish	10.3%	12.2%	NA
Dutch	15.0%	16.3%	12.5%
Czech	12.4%	22.9%	19.2%
Slovak	13.5%	31.2%	NA
Swedish	18.7%	17.7%	14.3%
Finnish	18.3%	14.1%	13.2%
Latvian	16.5%	48.7%	NA
Romanian	17.8%	36.0%	NA
Estonian	17.8%	49.0%	NA
Bulgarian	24.1%	32.7%	NA
Danish	19.8%	21.1%	16.1%
Hungarian	22.5%	31.8%	28.6%
Maltese	25.5%	NA	NA
Lithuanian	25.1%	44.9%	NA

Hindi — multi-dataset (Streaming)

WER (%) across seven Hindi datasets covering read speech, conversational speech, telephony / contact-center audio, and noise-augmented variants. Compared against IndicWhisper, Sarvam Saaras v3, and Deepgram Nova 3. Lower is better.

Dataset	Smallest Pulse	IndicWhisper	Sarvam Saaras v3	Deepgram Nova 3
FLEURS	9.55	15.00	8.31	14.09
Kathbath	9.71	10.30	8.15	16.22
Kathbath (noisy)	10.94	12.00	10.81	17.06
Common Voice	11.20	11.40	11.36	23.55
Indic-TTS	6.39	7.60	6.49	10.72
MUCS	9.19	12.00	8.96	16.20
Gramvaani	21.43	26.80	21.80	31.44

For the full breakdown including training-data and evaluation-protocol notes, see the Performance page.

English STT — ESB Dataset (Streaming)

ESB is a Hugging Face benchmark suite that aggregates 9 English speech datasets across diverse domains (audiobooks, parliament, meetings, finance, etc.) to test STT generalization. Values are WER in percent; lower is better.

Evaluated on the open-source Hugging Face ESB datasets. Numbers from internal evaluation.

Dataset	Smallest Pulse	Assembly Universal 3 Pro	AWS Transcribe	Azure	Deepgram Nova 3	Grok	Sarvam Saaras v3	ElevenLabs Scribe V2
LibriSpeech Clean	2.46	1.65	2.16	2.48	3.20	3.61	3.09	1.97
LibriSpeech Other	5.31	2.86	4.88	5.74	6.60	7.28	6.85	4.45
Common Voice	10.89	6.73	10.69	47.28	14.22	43.46	11.37	9.83
VoxPopuli	7.16	7.28	7.07	14.10	9.55	11.49	7.77	7.91
TED-LIUM	4.07	2.95	2.66	3.81	3.59	6.90	2.89	3.16
GigaSpeech	10.43	9.12	10.09	5.35	10.05	10.05	9.57	9.66
SPGISpeech	2.86	1.74	4.18	3.53	2.99	9.70	3.89	4.40
Earnings22	12.25	11.52	12.21	8.54	15.79	27.02	11.97	12.20
AMI	10.58	14.60	13.19	8.46	17.04	19.19	13.08	12.23
Aggregate	7.33	6.49	7.46	11.03	9.23	15.41	7.83	7.31
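
The Aggregate row is consistent with an unweighted mean of the nine per-dataset WERs. The sketch below checks this for the Smallest Pulse column using the values from the table; whether the published aggregate was actually computed this way is an assumption.

```python
# Smallest Pulse WER (%) per ESB dataset, copied from the table above.
pulse_esb_wer = {
    "LibriSpeech Clean": 2.46,
    "LibriSpeech Other": 5.31,
    "Common Voice": 10.89,
    "VoxPopuli": 7.16,
    "TED-LIUM": 4.07,
    "GigaSpeech": 10.43,
    "SPGISpeech": 2.86,
    "Earnings22": 12.25,
    "AMI": 10.58,
}

# Unweighted (macro) average: every dataset counts equally, regardless of size.
aggregate = sum(pulse_esb_wer.values()) / len(pulse_esb_wer)
print(f"{aggregate:.2f}")  # prints 7.33, matching the Aggregate row
```

A macro average gives small and large datasets equal weight, so one difficult dataset can move the aggregate noticeably.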

ASR Robustness — WildASR Dataset (Streaming)

WildASR is an open-source robustness benchmark designed to stress-test STT under real-world degraded conditions: clipping, far-field capture, background noise, phone codec compression, reverberation, and accented speech. Values are WER in percent; lower is better. n/a = not supported by that provider.

Evaluated on the open-source WildASR dataset. Numbers from internal evaluation.

Dataset	Smallest Pulse	Assembly Universal 3 Pro	AWS Transcribe	Azure	Deepgram Nova 3	Sarvam Saaras v3	ElevenLabs Scribe
Clean	5.98	3.33	7.01	11.11	11.62	7.02	4.24
Clipping	14.03	6.59	42.10	4.35	47.35	28.74	11.20
Far-field	13.38	26.07	38.76	n/a	62.99	21.27	7.38
Noise Gap	8.90	4.04	9.77	n/a	15.04	9.74	6.30
Phone Codec	7.19	3.45	8.70	n/a	9.13	10.64	4.98
Reverberation	9.06	23.50	14.83	n/a	27.27	4.35	6.48
Accent	5.82	2.80	4.45	n/a	7.31	n/a	4.01
Aggregate	9.63	12.52	18.35	8.82	28.17	17.75	6.47

Internal English Perturbation Benchmark

Not a public dataset. The English audio is sliced by perturbation type (Noise, Silence, Telephony 911, Boundary, Disfluency, Long Audios, Repetition, Entity, Accent, Emotion, Speaker Diversity, Speed, Pitch, Volume, Audio Quality) to isolate model weaknesses. Lower WER is better.

Category	Smallest Pulse	Assembly Universal 3 Pro	AWS Transcribe	Deepgram Nova 3	ElevenLabs Scribe
Noise	10.53	11.93	14.19	14.58	10.05
Silence	5.81	4.22	8.22	13.28	10.61
Telephony 911	21.03	23.93	27.88	28.43	20.29
Boundary	2.83	3.09	3.18	3.66	1.73
Disfluency	7.68	7.81	9.23	8.62	9.29
Long Audios	12.81	8.58	11.66	11.16	9.25
Repetition	11.38	9.82	10.39	9.57	10.81
Entity	12.43	10.13	13.35	11.69	9.48
Accent	8.68	7.89	9.51	10.42	7.25
Emotion	13.92	16.34	18.57	18.07	11.84
Speaker Diversity	7.33	6.72	8.81	9.48	5.95
Speed	4.32	3.63	4.40	6.88	3.74
Pitch	2.93	3.07	3.21	4.07	1.61
Volume	2.37	3.05	2.41	3.67	1.47
Audio Quality	2.73	2.86	3.03	4.08	1.60
Average WER	8.45	8.20	9.87	10.51	7.66

Internal Hindi Perturbation Benchmark

Not a public dataset. Hindi audio sliced by perturbation type to isolate model weaknesses. Lower WER is better except for Entity EDR where higher is better (↑).

Category	Smallest Pulse	Sarvam Saaras v3	Deepgram Nova 3
Noise	15.75%	22.18%	21.52%
Silence	9.72%	11.38%	18.40%
Entity	10.82%	17.36%	14.67%
Entity NE-WER	13.32%	26.72%	26.58%
Entity EDR (↑)	83.13%	76.13%	67.80%
Boundary	11.99%	17.52%	17.36%
Long Audios	18.11%	18.42%	19.21%
Speed	16.37%	21.39%	38.21%
Pitch	11.81%	11.92%	19.59%
Audio Quality	10.65%	11.75%	19.51%
Volume	8.21%	15.25%	16.76%
Disfluency	11.83%	12.06%	18.44%
Repetition	11.44%	11.27%	20.40%

Features — Non-streaming

Feature	Available	Notes
Speaker diarization	Yes	Multi-speaker identification
PII redaction	Yes	Personal info redaction
PCI redaction	Yes	Payment card data redaction
Word-level timestamps	Yes	Per-word timing
Sentence-level timestamps	Yes	Requires `word_timestamps=true` to be enabled
Punctuation	Yes	Auto punctuation
Profanity filter	Yes	Explicit content filtering
Language detection	Yes	Auto language ID
Code-switching	Yes	Multi-language in same audio
Noise reduction	Yes	Background noise handling
Emotion and gender detection	Yes	Returns the percentage score of detected emotion and gender

Features — Streaming

Feature	Available	Notes
Speaker diarization	Yes	Multi-speaker identification
Keyword boosting	Yes	Custom vocabulary enhancement
PII redaction	Yes	Personal info redaction
PCI redaction	Yes	Payment card data redaction
Word-level timestamps	Yes	Per-word timing
Sentence-level timestamps	Yes	Per-sentence timing
Punctuation	Yes	Auto punctuation
Profanity filter	No	—
Language detection	Yes	Auto language ID
Code-switching	Yes	Multi-language in same audio
Custom vocabulary	No	—
Noise reduction	Yes	Background noise handling

Supported Languages — Non-streaming

Language	Code	Available
English	`en`	Yes
Italian	`it`	Yes
Spanish	`es`	Yes
Portuguese	`pt`	Yes
Hindi	`hi`	Yes
German	`de`	Yes
French	`fr`	Yes
Ukrainian	`uk`	Yes
Russian	`ru`	Yes
Kannada	`kn`	Yes
Malayalam	`ml`	Yes
Polish	`pl`	Yes
Marathi	`mr`	Yes
Gujarati	`gu`	Yes
Czech	`cs`	Yes
Slovak	`sk`	Yes
Telugu	`te`	Yes
Oriya (Odia)	`or`	Yes
Dutch	`nl`	Yes
Bengali	`bn`	Yes
Latvian	`lv`	Yes
Estonian	`et`	Yes
Romanian	`ro`	Yes
Punjabi	`pa`	Yes
Finnish	`fi`	Yes
Swedish	`sv`	Yes
Bulgarian	`bg`	Yes
Tamil	`ta`	Yes
Hungarian	`hu`	Yes
Danish	`da`	Yes
Lithuanian	`lt`	Yes
Maltese	`mt`	Yes
Japanese	`ja`	Yes
Cantonese	`yue`	Yes
Mandarin	`zh`	Yes
Korean	`ko`	Yes
Tagalog	`tl`	Yes
Indonesian	`id`	Yes
Malay	`ms`	Yes

Supported Languages — Streaming

Language	Code	Available
English	`en`	Yes
Italian	`it`	Yes
Spanish	`es`	Yes
Portuguese	`pt`	Yes
Hindi	`hi`	Yes
German	`de`	Yes
French	`fr`	Yes
Ukrainian	`uk`	Yes
Russian	`ru`	Yes
Kannada	`kn`	Yes
Malayalam	`ml`	Yes
Polish	`pl`	Yes
Marathi	`mr`	Yes
Gujarati	`gu`	Yes
Czech	`cs`	Yes
Slovak	`sk`	Yes
Telugu	`te`	Yes
Oriya (Odia)	`or`	Yes
Dutch	`nl`	Yes
Bengali	`bn`	Yes
Latvian	`lv`	Yes
Estonian	`et`	Yes
Romanian	`ro`	Yes
Punjabi	`pa`	Yes
Finnish	`fi`	Yes
Swedish	`sv`	Yes
Bulgarian	`bg`	Yes
Tamil	`ta`	Yes
Hungarian	`hu`	Yes
Danish	`da`	Yes
Lithuanian	`lt`	Yes
Maltese	`mt`	Yes
Japanese	`ja`	Yes
Cantonese	`yue`	Yes
Mandarin	`zh`	Yes
Korean	`ko`	Yes
Tagalog	`tl`	Yes
Indonesian	`id`	Yes
Malay	`ms`	Yes

Best Practices

Specify the language parameter when known

When the language of the audio is known in advance, always set it explicitly rather than relying on automatic detection. This yields better transcription accuracy because the model can optimize directly for that language without needing to first identify it.

For example, setting the language parameter to `es` tells the model to expect Spanish audio and also covers English+Spanish code-switching. This produces more accurate output than using `multi-eu` or `multi`.

Parameter	Use case
`en`	English
`es`	Spanish (handles English+Spanish)
`hi`	Hindi (handles English+Hindi)
`multi-eu`	Unknown European-language audio (auto-detects across the European set)
`multi`	Truly unknown or mixed-language audio (full multilingual auto-detection)

When to use `multi-eu` or `multi`:

When the language is truly unknown beforehand
When processing audio from varied or unpredictable sources
Prefer `multi-eu` for European-language input; use `multi` only for truly mixed multilingual audio
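
The guidance above can be summarized as a small decision rule. The sketch below is illustrative only: the function name and its arguments are hypothetical, and only the returned values (`multi-eu`, `multi`, and the per-language codes) come from this page.

```python
from typing import Optional


def choose_language_param(known_language: Optional[str] = None,
                          european_only: bool = False) -> str:
    """Pick a value for the language parameter following the guidance above."""
    if known_language:
        # A known primary language beats auto-detection, including for
        # English+Spanish (es) or English+Hindi (hi) code-switching.
        return known_language
    # Language unknown: prefer the narrower European aggregator when it applies.
    return "multi-eu" if european_only else "multi"


print(choose_language_param("es"))                # es
print(choose_language_param(european_only=True))  # multi-eu
print(choose_language_param())                    # multi
```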

Use Cases

Direct use

Real-time call transcription
Voice assistant input
Meeting transcription
Accessibility and captioning
Customer support recording analysis

Downstream use

Multi-turn conversational agents
Voice-to-text pipelines
Telephony and IVR systems
Content indexing and search
Compliance and audit logging

Safety & Compliance

Pulse must not be used for:

Recording or transcribing individuals without their explicit consent
Surveillance, stalking, or any form of unauthorized monitoring
Any illegal or unethical purposes

Additionally:

Usage is monitored for policy compliance
For compliance documentation (GDPR, SOC2, HIPAA), contact support@smallest.ai

Contact


Support	support@smallest.ai
Documentation	docs.smallest.ai/waves
Console	app.smallest.ai/dashboard
