Pulse STT — streaming now supports ja / yue / zh / ko + multi-asian aggregator (US region)

The Pulse streaming Speech-to-Text API now supports four Asian languages: Japanese (ja), Cantonese (yue), Mandarin (zh), and Korean (ko). The multi-asian aggregator is also available for East Asian audio whose language is not known in advance; it auto-detects across the same four languages.

These five language values are served from the US region only. Connect to wss://api.us.smallest.ai/waves/v1/stt/live?model=pulse (or the legacy wss://api.us.smallest.ai/waves/v1/pulse/get_text) instead of the default wss://api.smallest.ai/... host. If you request any of them on the default (ap-south-1) host, the connection closes without returning a transcription frame.
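As a rough illustration of the routing rule above, the sketch below checks whether a language value needs the US region and builds the US streaming URL for it. The query parameter name `language` and the helper names `requires_us_region` and `pulse_us_live_url` are assumptions made for this example; only the host, the path, and `model=pulse` come from the text above.

```python
from urllib.parse import urlencode

# The five language values that are served from the US region only.
US_ONLY_LANGUAGES = {"ja", "yue", "zh", "ko", "multi-asian"}

# US-region streaming endpoint as given in the release note.
US_LIVE_ENDPOINT = "wss://api.us.smallest.ai/waves/v1/stt/live"


def requires_us_region(language: str) -> bool:
    """Return True if this language value must be sent to the US host."""
    return language in US_ONLY_LANGUAGES


def pulse_us_live_url(language: str) -> str:
    """Build the US-region streaming URL for one of the five language values.

    The `language` query parameter name is an assumption, not confirmed here.
    """
    if not requires_us_region(language):
        raise ValueError(f"{language!r} is not one of the US-only language values")
    query = urlencode({"model": "pulse", "language": language})
    return f"{US_LIVE_ENDPOINT}?{query}"
```

Raising on any other language keeps the helper from silently sending traffic to the US host when the default host would have served it.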

All other Pulse streaming parameters work as before: punctuate, capitalize, numerals, word_timestamps, diarize, redact_pii, redact_pci, sample rates 8000/16000/22050/24000/44100/48000, encodings linear16 (default) / linear32 / alaw / mulaw / opus / ogg_opus.
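The supported values listed above can be checked on the client before a connection is opened. The following is a minimal sketch: the option keys `sample_rate` and `encoding`, the helper name `validate_stream_options`, and the idea of passing feature flags as keyword arguments are all assumptions for illustration, not the documented request format.

```python
# Values taken from the list of supported streaming parameters.
SUPPORTED_SAMPLE_RATES = {8000, 16000, 22050, 24000, 44100, 48000}
SUPPORTED_ENCODINGS = {"linear16", "linear32", "alaw", "mulaw", "opus", "ogg_opus"}
FEATURE_FLAGS = {
    "punctuate", "capitalize", "numerals", "word_timestamps",
    "diarize", "redact_pii", "redact_pci",
}


def validate_stream_options(sample_rate, encoding="linear16", **flags):
    """Check options against the supported values and return them as a dict.

    The key names "sample_rate" and "encoding" are hypothetical.
    """
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample rate: {sample_rate}")
    if encoding not in SUPPORTED_ENCODINGS:
        raise ValueError(f"unsupported encoding: {encoding}")
    unknown = set(flags) - FEATURE_FLAGS
    if unknown:
        raise ValueError(f"unknown options: {sorted(unknown)}")
    return {"sample_rate": sample_rate, "encoding": encoding, **flags}
```

For example, `validate_stream_options(16000, diarize=True)` falls back to the default `linear16` encoding, while a sample rate such as 11025 is rejected before any audio is sent.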

Pre-recorded (batch) is unchanged. Cantonese (yue) is not enabled on the batch endpoint at all. Japanese, Mandarin, and Korean are not formally supported on batch, and their batch behavior is separate from streaming. Use the streaming endpoint for these four languages.

Example response frame:

{
  "type": "transcription",
  "transcript": "讀書要從薄到厚再從厚到薄",
  "is_final": true,
  "is_last": false,
  "language": "zh"
}
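A client would typically decode each frame and keep only finalized text. The sketch below does that over an iterable of raw JSON strings. It assumes that `is_final` marks a finalized segment and that `is_last` marks the final frame of the session; neither meaning is spelled out above, and the helper name `final_transcripts` is hypothetical.

```python
import json


def final_transcripts(raw_frames):
    """Yield (language, transcript) for each finalized transcription frame.

    Assumes is_final marks a finalized segment and is_last ends the session.
    """
    for raw in raw_frames:
        frame = json.loads(raw)
        # Skip any frame that is not a transcription frame.
        if frame.get("type") != "transcription":
            continue
        if frame.get("is_final"):
            yield frame.get("language"), frame["transcript"]
        if frame.get("is_last"):
            break
```

Because the function takes any iterable of strings, it can be tested against recorded frames without opening a WebSocket.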

→ Pulse model card: Supported Languages
→ Speech-to-Text overview