The Pulse STT WebSocket API supports several audio encoding formats for real-time streaming, including 16-bit Linear PCM (linear16), μ-law, and A-law.
Sample rate is the number of times per second the audio signal is measured. A higher sample rate captures more detail and generally yields higher-quality audio. However, it also increases the size of the audio data, and with it the bandwidth needed to stream it.
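To put numbers on that size trade-off: the raw data rate of uncompressed audio is sample rate × bytes per sample × channels. The sketch below assumes 2 bytes per sample for linear16 (16-bit PCM) and 1 byte per sample for μ-law and A-law (8-bit G.711). The encoding labels `"mulaw"` and `"alaw"` and the 48 kHz example rate are illustrative assumptions, not confirmed API values.

```python
# Bytes per sample for each encoding (assumption: linear16 is 16-bit PCM,
# mu-law and A-law are 8-bit G.711 companded samples).
BYTES_PER_SAMPLE = {"linear16": 2, "mulaw": 1, "alaw": 1}


def bytes_per_second(encoding: str, sample_rate: int, channels: int = 1) -> int:
    """Raw (uncompressed) data rate of an audio stream in bytes per second."""
    return sample_rate * BYTES_PER_SAMPLE[encoding] * channels


if __name__ == "__main__":
    # Illustrative encoding/rate pairs; 48000 Hz is an assumed "higher" rate.
    for enc, rate in [("mulaw", 8000), ("linear16", 16000), ("linear16", 48000)]:
        bps = bytes_per_second(enc, rate)
        print(f"{enc} @ {rate} Hz: {bps} B/s ({bps * 8 // 1000} kbit/s)")
```

For example, 8 kHz μ-law needs 8,000 bytes/s (64 kbit/s), while 16 kHz linear16 needs 32,000 bytes/s (256 kbit/s), four times as much.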
The WebSocket API supports several sample rates, including 8 kHz and 16 kHz, as well as higher rates for broadcast or other high-quality audio.
The recommended chunk size is 4096 bytes.
Sending audio in consistent 4096-byte chunks keeps latency low and processing efficient. It balances processing latency against network latency: smaller chunks mean more requests, each carrying its own overhead, while larger chunks mean more audio must be buffered before each request is sent.
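A minimal sketch of splitting a raw audio buffer into 4096-byte chunks, plus a helper showing how much audio each chunk holds. The function names `iter_chunks` and `chunk_duration_ms` are hypothetical, and the sketch assumes raw (headerless) samples.

```python
from typing import Iterator

CHUNK_SIZE = 4096  # recommended bytes per chunk


def iter_chunks(audio: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield consecutive chunk_size-byte slices; the final chunk may be shorter."""
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]


def chunk_duration_ms(sample_rate: int, bytes_per_sample: int,
                      chunk_size: int = CHUNK_SIZE, channels: int = 1) -> float:
    """Milliseconds of audio contained in one chunk of raw samples."""
    samples = chunk_size // (bytes_per_sample * channels)
    return samples * 1000 / sample_rate


if __name__ == "__main__":
    # 16 kHz linear16 (2 bytes/sample): 2048 samples per chunk = 128 ms
    print(chunk_duration_ms(16000, 2))
    # 8 kHz mu-law (1 byte/sample): 4096 samples per chunk = 512 ms
    print(chunk_duration_ms(8000, 1))
    one_second = bytes(32000)  # 1 s of silent 16 kHz linear16 audio
    print([len(c) for c in iter_chunks(one_second)])
```

The same byte size therefore carries a different amount of audio depending on the format: 128 ms at 16 kHz linear16, but 512 ms at 8 kHz μ-law.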
Currently, we support only single-channel (mono) transcription. Multi-channel support is coming soon.
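If your source audio is stereo, it must be downmixed to mono before streaming. The sketch below averages the left and right channels of interleaved 16-bit PCM. It assumes little-endian samples (the usual layout for linear16 WAV data); the function name is hypothetical.

```python
import struct


def stereo_to_mono_int16(pcm: bytes) -> bytes:
    """Downmix interleaved little-endian 16-bit stereo PCM to mono by averaging L and R."""
    if len(pcm) % 4:
        raise ValueError("expected whole stereo frames (4 bytes each)")
    n = len(pcm) // 2                       # total number of int16 samples
    samples = struct.unpack(f"<{n}h", pcm)  # L, R, L, R, ...
    # Floor-average each left/right pair; the result always fits in int16.
    mono = [(left + right) // 2 for left, right in zip(samples[0::2], samples[1::2])]
    return struct.pack(f"<{len(mono)}h", *mono)
```

Averaging avoids clipping that summing the channels could cause, at the cost of halving the level of sound present in only one channel.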
For optimal real-time performance:
Use 16 kHz mono Linear PCM (linear16) for the best balance of accuracy and processing speed.
Use 8 kHz μ-law or A-law encoding when bandwidth is limited.
For broadcast or other high-quality scenarios, use a higher sample rate.
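These three recommendations can be captured as connection presets. This is a hypothetical sketch: the preset names, the encoding strings `"mulaw"`/`"alaw"`, and the 48000 Hz high-quality rate are assumptions to check against the API reference. Only linear16, the 8 kHz and 16 kHz rates, and mono audio come from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StreamPreset:
    encoding: str
    sample_rate: int
    channels: int = 1  # only mono is currently supported


# Hypothetical presets mirroring the recommendations above.
PRESETS = {
    "balanced": StreamPreset(encoding="linear16", sample_rate=16000),
    "low_bandwidth": StreamPreset(encoding="mulaw", sample_rate=8000),  # or "alaw"
    "high_quality": StreamPreset(encoding="linear16", sample_rate=48000),  # assumed rate
}


def pick_preset(low_bandwidth: bool = False, high_quality: bool = False) -> StreamPreset:
    """Choose a preset; a bandwidth constraint takes priority over quality."""
    if low_bandwidth:
        return PRESETS["low_bandwidth"]
    if high_quality:
        return PRESETS["high_quality"]
    return PRESETS["balanced"]
```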
Before streaming audio to the WebSocket API, ensure your audio is single-channel (mono) and uses a supported encoding and sample rate.
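For WAV files, these properties can be checked from the file header with Python's standard `wave` module before you open a connection. This sketch assumes the file holds 16-bit PCM (linear16) data; `wave` does not read μ-law or A-law WAV files and raises `wave.Error` for them. The function name and `expected_rate` parameter are hypothetical.

```python
import wave


def check_wav_for_streaming(path: str, expected_rate: int) -> list[str]:
    """Return problems that would prevent streaming this WAV file as mono linear16."""
    problems = []
    with wave.open(path, "rb") as wav:
        if wav.getnchannels() != 1:
            problems.append(f"expected mono, got {wav.getnchannels()} channels")
        if wav.getsampwidth() != 2:
            problems.append(f"expected 16-bit samples, got {8 * wav.getsampwidth()}-bit")
        if wav.getframerate() != expected_rate:
            problems.append(
                f"file is {wav.getframerate()} Hz but sample_rate is {expected_rate}"
            )
    return problems
```

An empty list means the header matches; note that only the raw frames (for example, from `readframes`), not the WAV header, should be sent as audio.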
Also make sure the audio's actual sample rate matches the sample_rate parameter in your WebSocket URL.
Specify the encoding and sample rate in the WebSocket connection URL.
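A minimal sketch of building such a URL with the standard library. The endpoint `wss://api.example.com/pulse/stream` is a placeholder, and the `encoding` query-parameter name is an assumption; only `sample_rate` is named in the text above. Any other parameters the API requires are omitted.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: replace with the URL from the Pulse STT API reference.
BASE_URL = "wss://api.example.com/pulse/stream"


def build_stream_url(encoding: str = "linear16", sample_rate: int = 16000) -> str:
    """Append encoding and sample_rate as query parameters to the WebSocket URL."""
    query = urlencode({"encoding": encoding, "sample_rate": sample_rate})
    return f"{BASE_URL}?{query}"
```

For instance, `build_stream_url("linear16", 16000)` returns `wss://api.example.com/pulse/stream?encoding=linear16&sample_rate=16000`. Using `urlencode` rather than string concatenation keeps the values correctly escaped if more parameters are added later.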