Real-time streaming best practices
Follow these recommendations to keep Pulse STT latencies low while preserving transcript fidelity in real-time scenarios.
Chunk Size and Streaming Rate
Recommended Chunk Size
- Optimal: 4096 bytes per chunk
- Range: 1024 to 8192 bytes
- Consistency: Maintain consistent chunk sizes when possible
Sending audio in 4096-byte chunks provides the best balance between latency and processing efficiency.
Streaming Rate
- Interval: Send chunks every 50-100ms
- Avoid: Sending chunks too rapidly (< 20ms) or too slowly (> 200ms)
- Consistency: Maintain regular intervals for predictable latency
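The chunking and pacing guidance above can be sketched as a small helper. This is a minimal illustration, not an official client: `ws` is assumed to be any WebSocket object with a binary-capable `send` method (for example from the `websocket-client` package).

```python
import time

CHUNK_SIZE = 4096      # recommended bytes per chunk
SEND_INTERVAL = 0.05   # 50 ms between chunks, within the 50-100 ms window

def iter_chunks(audio: bytes, size: int = CHUNK_SIZE):
    """Split raw audio into fixed-size chunks; only the last chunk may be shorter."""
    for offset in range(0, len(audio), size):
        yield audio[offset:offset + size]

def stream_audio(ws, audio: bytes) -> None:
    """Send chunks at a steady cadence so the server sees predictable intervals."""
    for chunk in iter_chunks(audio):
        ws.send(chunk)          # assumed binary WebSocket send
        time.sleep(SEND_INTERVAL)
```

In a live capture loop you would read from the microphone instead of slicing a buffer, but the fixed chunk size and fixed interval are the parts that matter for latency.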
Handling Partial vs Final Transcripts
The API sends two types of transcripts:
Partial Transcripts (is_final: false)
- Purpose: Show interim results for immediate user feedback
- Behavior: May change as more audio is processed
- Use case: Display “live” transcription as the user speaks
Final Transcripts (is_final: true)
- Purpose: Confirmed transcription for a segment
- Behavior: Stable and won’t change
- Use case: Store in database, display as confirmed text
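A handler that routes on the is_final flag might look like the sketch below. The exact message shape is an assumption here (a JSON object carrying "is_final" and "text", per the fields described above); adapt the field access to the actual response schema.

```python
import json

def handle_message(raw: str, live_display: list, committed: list) -> None:
    """Route one transcript message by its is_final flag.

    live_display holds the current interim text (may still change);
    committed holds confirmed segments that are safe to persist.
    """
    msg = json.loads(raw)
    if msg.get("is_final"):
        committed.append(msg["text"])    # stable: store in the database
        live_display.clear()             # the interim text is now confirmed
    else:
        live_display[:] = [msg["text"]]  # interim: overwrite, don't append
```

The key point is that partial transcripts replace the previous partial, while final transcripts accumulate.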
Audio Preprocessing
Before Streaming
- Convert to correct format: Ensure audio matches the encoding parameter (linear16, linear32, alaw, mulaw, opus, ogg_opus)
- Set sample rate: Match the sample_rate parameter in your WebSocket URL
- Mono channel: Downmix stereo/multi-channel to mono
- Normalize levels: Prevent clipping and ensure consistent volume
Example Preprocessing
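As a sketch of the downmix and normalization steps for 16-bit linear16 audio: the helper name and the 0.9 peak target are illustrative, and resampling to the target sample_rate is assumed to happen upstream (e.g. with ffmpeg: `ffmpeg -i in.wav -ac 1 -ar 16000 -f s16le out.raw`).

```python
import array

def preprocess(stereo_pcm: bytes, target_peak: float = 0.9) -> bytes:
    """Downmix interleaved stereo 16-bit PCM to mono and peak-normalize."""
    samples = array.array("h")
    samples.frombytes(stereo_pcm)
    # Downmix: average each interleaved left/right pair into one mono sample.
    mono = array.array("h", ((samples[i] + samples[i + 1]) // 2
                             for i in range(0, len(samples), 2)))
    # Normalize: scale so the loudest sample sits at target_peak of full scale.
    peak = max(1, max(abs(s) for s in mono))
    gain = (target_peak * 32767) / peak
    return array.array(
        "h", (max(-32768, min(32767, int(s * gain))) for s in mono)
    ).tobytes()
```

In production you would typically delegate resampling and codec conversion to a dedicated library or ffmpeg rather than hand-rolling it.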
Error Handling and Reconnection
Connection Errors
Implement robust error handling for network issues:
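One common pattern is reconnection with exponential backoff and jitter. The sketch below is generic: `connect_fn` stands in for whatever call opens your Pulse STT WebSocket, and the retry counts and delays are illustrative defaults.

```python
import random
import time

def backoff_delays(max_retries: int = 5, base: float = 0.5, cap: float = 10.0):
    """Yield exponentially growing delays with jitter, capped at `cap` seconds."""
    for attempt in range(max_retries):
        yield min(cap, base * (2 ** attempt)) * (0.5 + random.random() / 2)

def connect_with_retry(connect_fn, max_retries: int = 5, base: float = 0.5):
    """Call connect_fn until it succeeds, sleeping between failed attempts.

    connect_fn is assumed to return an open connection or raise OSError.
    """
    last_err = None
    for delay in backoff_delays(max_retries, base):
        try:
            return connect_fn()
        except OSError as err:
            last_err = err
            time.sleep(delay)
    raise ConnectionError("could not reconnect") from last_err
```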
Handling Connection Drops
- Detect drops: Monitor connection state and implement heartbeat/ping
- Buffer audio: Store audio chunks during disconnection
- Resume streaming: Continue from where you left off after reconnection
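The buffering step can be as simple as a bounded queue that holds chunks while the connection is down and drains them after reconnecting. The class below is a sketch; the 500-chunk cap (about 25 seconds at 50 ms per chunk) is an arbitrary example limit.

```python
from collections import deque

class AudioBuffer:
    """Hold audio chunks during a disconnection; drain them on reconnect."""

    def __init__(self, max_chunks: int = 500):
        # Bounded so a long outage drops the oldest audio instead of growing forever.
        self._pending = deque(maxlen=max_chunks)

    def push(self, chunk: bytes) -> None:
        self._pending.append(chunk)

    def drain(self, send) -> int:
        """Send all buffered chunks via `send` and return how many were sent."""
        sent = 0
        while self._pending:
            send(self._pending.popleft())
            sent += 1
        return sent
```

On reconnect, call `drain(ws.send)` before resuming live capture so the transcript has no gap.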
Session Management
Session Lifecycle
- Establish connection: Create WebSocket with proper authentication
- Stream audio: Send chunks at regular intervals
- Handle responses: Process partial and final transcripts
- End session: Send {"type": "end"} when done
- Close connection: Gracefully close the WebSocket
Graceful Shutdown
To properly close a session, send the end token and wait for the server to respond with is_last=true before closing the WebSocket connection:
Do not close the WebSocket immediately after sending the end token. Always wait for the is_last=true response to ensure all audio has been processed and final transcripts are received.
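A shutdown helper following that rule might look like this sketch. The `ws.send`/`ws.close` methods and the `recv` callable are assumptions about your WebSocket client; the is_last and is_final fields come from the response format described above, and the message cap is just a safety bound.

```python
import json

def shutdown(ws, recv, max_messages: int = 100) -> list:
    """Send the end token, collect remaining final transcripts, then close.

    Keeps reading until the server marks a message with is_last=true,
    so no final transcript is lost.
    """
    ws.send(json.dumps({"type": "end"}))
    finals = []
    for _ in range(max_messages):  # safety bound; normally breaks on is_last
        msg = json.loads(recv())
        if msg.get("is_final"):
            finals.append(msg.get("text", ""))
        if msg.get("is_last"):
            break
    ws.close()
    return finals
```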
Latency Optimization
Minimize Processing Delays
- Preprocess offline: Convert audio format before streaming
- Use optimal encoding: linear16 at 16 kHz for best latency/quality balance
- Consistent chunking: Avoid variable chunk sizes that cause processing delays
Network Optimization
- Stable connection: Use reliable network connections
- Monitor bandwidth: Ensure sufficient bandwidth for audio streaming
- Reduce overhead: Minimize unnecessary data in WebSocket messages
Quality Checklist
- Use 16 kHz mono linear16 whenever possible for optimal latency
- Stream in 4096-byte chunks at 50-100ms intervals
- Handle partial transcripts for immediate user feedback
- Store final transcripts for accuracy and persistence
- Implement reconnection logic for production reliability
- Monitor session state to detect and handle errors gracefully
- Test with real audio to validate latency and accuracy
Performance Tips
For Low Latency
- Use linear16 encoding at 16 kHz
- Stream chunks every 50ms
- Process responses asynchronously
- Avoid blocking operations in message handlers
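One way to keep the message handler non-blocking is to hand each response to a worker thread through a queue, so the WebSocket callback only enqueues and returns. This is a generic sketch, not tied to any particular client library.

```python
import queue
import threading

def start_worker(handle):
    """Start a background worker; the returned queue is the handler's inbox.

    The WebSocket on-message callback should only call inbox.put(msg),
    which is fast and never blocks on downstream work.
    """
    inbox = queue.Queue()

    def worker():
        while True:
            msg = inbox.get()
            if msg is None:   # sentinel: shut the worker down
                break
            handle(msg)       # slow work (parsing, storage, UI) happens here

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return inbox, t
```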
For High Accuracy
- Use higher sample rates (44.1 kHz or 48 kHz) when latency allows
- Enable word_timestamps for precise timing
- Wait for is_final=true before committing transcripts
- Use full_transcript for complete session text
For Production
- Implement connection pooling for multiple sessions
- Add rate limiting to prevent overwhelming the API
- Log session IDs for debugging and support
- Monitor transcription quality and latency metrics

