Real-time streaming best practices
Follow these recommendations to keep Pulse STT latencies low while preserving transcript fidelity in real-time scenarios.
Chunk Size and Streaming Rate
Recommended Chunk Size
- Optimal: 4096 bytes per chunk
- Range: 1024 to 8192 bytes
- Consistency: Maintain consistent chunk sizes when possible
Sending audio in 4096-byte chunks provides the best balance between latency and processing efficiency.
Streaming Rate
- Interval: Send chunks every 50-100ms
- Avoid: Sending chunks too rapidly (< 20ms) or too slowly (> 200ms)
- Consistency: Maintain regular intervals for predictable latency
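The chunking and pacing guidance above can be sketched as a small helper. This is a minimal illustration, not an official client: `ws` is assumed to be any WebSocket object with a binary-capable `send` method (for example from the `websocket-client` package).

```python
import time

CHUNK_SIZE = 4096      # recommended bytes per chunk
SEND_INTERVAL = 0.05   # 50 ms between chunks, within the 50-100 ms window

def iter_chunks(audio: bytes, size: int = CHUNK_SIZE):
    """Split raw audio into fixed-size chunks; only the last chunk may be shorter."""
    for offset in range(0, len(audio), size):
        yield audio[offset:offset + size]

def stream_audio(ws, audio: bytes) -> None:
    """Send chunks at a steady cadence so the server sees predictable intervals."""
    for chunk in iter_chunks(audio):
        ws.send(chunk)          # assumed binary WebSocket send
        time.sleep(SEND_INTERVAL)
```

In a live capture loop you would read from the microphone instead of slicing a buffer, but the fixed chunk size and fixed interval are the parts that matter for latency.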
Handling Partial vs Final Transcripts
The API sends two types of transcripts:
Partial Transcripts (is_final: false)
- Purpose: Show interim results for immediate user feedback
- Behavior: May change as more audio is processed
- Use case: Display “live” transcription as the user speaks
Final Transcripts (is_final: true)
- Purpose: Confirmed transcription for a segment
- Behavior: Stable and won’t change
- Use case: Store in database, display as confirmed text
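A handler that routes on the is_final flag might look like the sketch below. The exact message shape is an assumption here (a JSON object carrying "is_final" and "text", per the fields described above); adapt the field access to the actual response schema.

```python
import json

def handle_message(raw: str, live_display: list, committed: list) -> None:
    """Route one transcript message by its is_final flag.

    live_display holds the current interim text (may still change);
    committed holds confirmed segments that are safe to persist.
    """
    msg = json.loads(raw)
    if msg.get("is_final"):
        committed.append(msg["text"])    # stable: store in the database
        live_display.clear()             # the interim text is now confirmed
    else:
        live_display[:] = [msg["text"]]  # interim: overwrite, don't append
```

The key point is that partial transcripts replace the previous partial, while final transcripts accumulate.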
Audio Preprocessing
Before Streaming
- Convert to correct format: Ensure audio matches the encoding parameter (linear16, linear32, alaw, mulaw, opus, ogg_opus)
- Set sample rate: Match the sample_rate parameter in your WebSocket URL
- Mono channel: Downmix stereo/multi-channel to mono
- Normalize levels: Prevent clipping and ensure consistent volume
Example Preprocessing
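As a sketch of the downmix and normalization steps for 16-bit linear16 audio: the helper name and the 0.9 peak target are illustrative, and resampling to the target sample_rate is assumed to happen upstream (e.g. with ffmpeg: `ffmpeg -i in.wav -ac 1 -ar 16000 -f s16le out.raw`).

```python
import array

def preprocess(stereo_pcm: bytes, target_peak: float = 0.9) -> bytes:
    """Downmix interleaved stereo 16-bit PCM to mono and peak-normalize."""
    samples = array.array("h")
    samples.frombytes(stereo_pcm)
    # Downmix: average each interleaved left/right pair into one mono sample.
    mono = array.array("h", ((samples[i] + samples[i + 1]) // 2
                             for i in range(0, len(samples), 2)))
    # Normalize: scale so the loudest sample sits at target_peak of full scale.
    peak = max(1, max(abs(s) for s in mono))
    gain = (target_peak * 32767) / peak
    return array.array(
        "h", (max(-32768, min(32767, int(s * gain))) for s in mono)
    ).tobytes()
```

In production you would typically delegate resampling and codec conversion to a dedicated library or ffmpeg rather than hand-rolling it.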
Error Handling and Reconnection
Connection Errors
Implement robust error handling for network issues:
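One common pattern is reconnection with exponential backoff and jitter. The sketch below is generic: `connect_fn` stands in for whatever call opens your Pulse STT WebSocket, and the retry counts and delays are illustrative defaults.

```python
import random
import time

def backoff_delays(max_retries: int = 5, base: float = 0.5, cap: float = 10.0):
    """Yield exponentially growing delays with jitter, capped at `cap` seconds."""
    for attempt in range(max_retries):
        yield min(cap, base * (2 ** attempt)) * (0.5 + random.random() / 2)

def connect_with_retry(connect_fn, max_retries: int = 5, base: float = 0.5):
    """Call connect_fn until it succeeds, sleeping between failed attempts.

    connect_fn is assumed to return an open connection or raise OSError.
    """
    last_err = None
    for delay in backoff_delays(max_retries, base):
        try:
            return connect_fn()
        except OSError as err:
            last_err = err
            time.sleep(delay)
    raise ConnectionError("could not reconnect") from last_err
```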
Handling Connection Drops
- Detect drops: Monitor connection state and implement heartbeat/ping
- Buffer audio: Store audio chunks during disconnection
- Resume streaming: Continue from where you left off after reconnection
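The buffering step can be as simple as a bounded queue that holds chunks while the connection is down and drains them after reconnecting. The class below is a sketch; the 500-chunk cap (about 25 seconds at 50 ms per chunk) is an arbitrary example limit.

```python
from collections import deque

class AudioBuffer:
    """Hold audio chunks during a disconnection; drain them on reconnect."""

    def __init__(self, max_chunks: int = 500):
        # Bounded so a long outage drops the oldest audio instead of growing forever.
        self._pending = deque(maxlen=max_chunks)

    def push(self, chunk: bytes) -> None:
        self._pending.append(chunk)

    def drain(self, send) -> int:
        """Send all buffered chunks via `send` and return how many were sent."""
        sent = 0
        while self._pending:
            send(self._pending.popleft())
            sent += 1
        return sent
```

On reconnect, call `drain(ws.send)` before resuming live capture so the transcript has no gap.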
Session Management
Session Lifecycle
- Establish connection: Create WebSocket with proper authentication
- Stream audio: Send chunks at regular intervals
- Handle responses: Process partial and final transcripts
- End session: Send {"type": "end"} when done
- Close connection: Gracefully close the WebSocket
Graceful Shutdown
To properly close a session, send the end token and wait for the server to respond with is_last=true before closing the WebSocket connection:
Do not close the WebSocket immediately after sending the end token. Always wait for the is_last=true response to ensure all audio has been processed and final transcripts are received.
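A shutdown helper following that rule might look like this sketch. The `ws.send`/`ws.close` methods and the `recv` callable are assumptions about your WebSocket client; the is_last and is_final fields come from the response format described above, and the message cap is just a safety bound.

```python
import json

def shutdown(ws, recv, max_messages: int = 100) -> list:
    """Send the end token, collect remaining final transcripts, then close.

    Keeps reading until the server marks a message with is_last=true,
    so no final transcript is lost.
    """
    ws.send(json.dumps({"type": "end"}))
    finals = []
    for _ in range(max_messages):  # safety bound; normally breaks on is_last
        msg = json.loads(recv())
        if msg.get("is_final"):
            finals.append(msg.get("text", ""))
        if msg.get("is_last"):
            break
    ws.close()
    return finals
```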
Latency Optimization
Minimize Processing Delays
- Preprocess offline: Convert audio format before streaming
- Use optimal encoding: linear16 at 16 kHz for best latency/quality balance
- Consistent chunking: Avoid variable chunk sizes that cause processing delays
Network Optimization
- Stable connection: Use reliable network connections
- Monitor bandwidth: Ensure sufficient bandwidth for audio streaming
- Reduce overhead: Minimize unnecessary data in WebSocket messages
Quality Checklist
- Use 16 kHz mono linear16 whenever possible for optimal latency
- Stream in 4096-byte chunks at 50-100ms intervals
- Handle partial transcripts for immediate user feedback
- Store final transcripts for accuracy and persistence
- Implement reconnection logic for production reliability
- Monitor session state to detect and handle errors gracefully
- Test with real audio to validate latency and accuracy
Performance Tips
For Low Latency
- Use linear16 encoding at 16 kHz
- Stream chunks every 50ms
- Process responses asynchronously
- Avoid blocking operations in message handlers
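One way to keep the message handler non-blocking is to hand each response to a worker thread through a queue, so the WebSocket callback only enqueues and returns. This is a generic sketch, not tied to any particular client library.

```python
import queue
import threading

def start_worker(handle):
    """Start a background worker; the returned queue is the handler's inbox.

    The WebSocket on-message callback should only call inbox.put(msg),
    which is fast and never blocks on downstream work.
    """
    inbox = queue.Queue()

    def worker():
        while True:
            msg = inbox.get()
            if msg is None:   # sentinel: shut the worker down
                break
            handle(msg)       # slow work (parsing, storage, UI) happens here

    t = threading.Thread(target=worker, daemon=True)
    t.start()
    return inbox, t
```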
For High Accuracy
- Use higher sample rates (44.1 kHz or 48 kHz) when latency allows
- Enable word_timestamps for precise timing
- Wait for is_final=true before committing transcripts
- Use full_transcript for complete session text
For Production
- Implement connection pooling for multiple sessions
- Add rate limiting to prevent overwhelming the API
- Log session IDs for debugging and support
- Monitor transcription quality and latency metrics

