Audio Specifications
Input Methods
Our API supports two input methods for transcribing audio:
Supported Formats
The Pulse STT API supports a wide range of audio formats for pre-recorded transcription.
Audio Requirements
Sample Rate
- Recommended: 16 kHz (16,000 Hz)
- Supported range: All sample rates
- Optimal: 16 kHz mono for speech recognition
Channels
Only single-channel (mono) transcription is currently supported. Multi-channel support is planned.
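To confirm that a WAV file matches the recommended 16 kHz mono configuration before sending it, a local check along these lines may help. This is a minimal sketch using only Python's standard `wave` module; the helper name `check_wav` and its return shape are illustrative assumptions, not part of the Pulse STT API.

```python
import wave


def check_wav(path, expected_rate=16000):
    """Return a list of mismatches; an empty list means the file is mono at expected_rate.

    `path` may be a filename or a binary file-like object.
    Hypothetical helper for illustration only.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        rate = wav.getframerate()
        if channels != 1:
            problems.append(f"expected 1 channel, found {channels}")
        if rate != expected_rate:
            problems.append(f"expected {expected_rate} Hz, found {rate} Hz")
    return problems
```

An empty result means the file can be sent as-is; otherwise, downmix or resample it first with the audio tool of your choice.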
Limits
- Maximum size: No limit on file size
- Session timeout: 10 minutes per session
For faster processing, split long files into chunks and upload them in parallel.
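One way to follow the chunking advice above is sketched below, assuming the input is an uncompressed WAV file. `split_wav` cuts the file into fixed-length, standalone WAV chunks, and `transcribe_in_parallel` hands each chunk to a caller-supplied `upload_chunk` function on a thread pool. All three names, the 300-second default chunk length, and the worker count are assumptions for illustration; the actual upload call depends on your client and is not shown.

```python
import io
import wave
from concurrent.futures import ThreadPoolExecutor


def split_wav(path, chunk_seconds=300):
    """Yield (index, wav_bytes) pairs; each chunk is a complete WAV file in memory."""
    with wave.open(path, "rb") as src:
        frames_per_chunk = int(src.getframerate() * chunk_seconds)
        index = 0
        while True:
            frames = src.readframes(frames_per_chunk)
            if not frames:
                break
            buf = io.BytesIO()
            with wave.open(buf, "wb") as dst:
                # Copy the source format so every chunk is self-describing.
                dst.setnchannels(src.getnchannels())
                dst.setsampwidth(src.getsampwidth())
                dst.setframerate(src.getframerate())
                dst.writeframes(frames)
            yield index, buf.getvalue()
            index += 1


def transcribe_in_parallel(path, upload_chunk, chunk_seconds=300, max_workers=4):
    """Send every chunk through upload_chunk (hypothetical callable) concurrently.

    Results are returned in chunk order, so transcripts can be joined directly.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(upload_chunk, data)
            for _, data in split_wav(path, chunk_seconds)
        ]
        return [future.result() for future in futures]
```

Fixed-length cuts can land in the middle of a word, so splitting at silences instead may give cleaner transcripts near chunk boundaries.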
Format Recommendations
Best Quality
Use 16 kHz mono Linear PCM (audio/wav) for the optimal mix of accuracy and processing speed. This configuration mirrors Waves’ recommended production setup for real-time speech workloads.
Balanced (Telephony & Voice)
Use 8 kHz μ-law audio with 8-bit encoding for low bandwidth usage. It provides standard quality for voice-only applications such as phone calls.
Web-Optimized / High Fidelity
For broadcast, captioning, or multimedia scenarios, capture at higher sample rates (44.1–48 kHz). The higher fidelity comes at the cost of more bandwidth and longer processing times.
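The bandwidth trade-off between these profiles can be estimated from the uncompressed bit rate, which is sample rate × bits per sample × channels. The sketch below compares the three profiles for mono audio. The 8-bit depth for μ-law comes from the text above; the 16-bit depth used for the Linear PCM and 48 kHz profiles is an assumption, since this page does not state a bit depth for them.

```python
def pcm_bitrate_kbps(sample_rate_hz, bits_per_sample, channels=1):
    """Uncompressed audio bit rate in kilobits per second."""
    return sample_rate_hz * bits_per_sample * channels / 1000


# (sample rate in Hz, bits per sample); 16-bit depths are assumed, not documented.
profiles = {
    "Best Quality (16 kHz Linear PCM)": (16000, 16),
    "Balanced (8 kHz mu-law)": (8000, 8),
    "High Fidelity (48 kHz PCM)": (48000, 16),
}

for name, (rate, bits) in profiles.items():
    print(f"{name}: {pcm_bitrate_kbps(rate, bits):.0f} kbps")
```

Under these assumptions, the high-fidelity profile needs three times the bandwidth of the 16 kHz profile and twelve times that of the telephony profile.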

