---
title: Best Practices
description: >-
  Optimize your real-time WebSocket transcription for low latency and high accuracy
---

# Real-time streaming best practices

Follow these recommendations to keep Pulse STT latencies low while preserving transcript fidelity in real-time scenarios.

## Chunk Size and Streaming Rate

### Recommended Chunk Size

* **Optimal**: 4096 bytes per chunk
* **Range**: 1024 to 8192 bytes
* **Consistency**: Maintain consistent chunk sizes when possible

Sending audio in 4096-byte chunks provides the best balance between latency and processing efficiency.

### Streaming Rate

* **Interval**: Send chunks every 50-100ms
* **Avoid**: Sending chunks too rapidly (< 20ms) or too slowly (> 200ms)
* **Consistency**: Maintain regular intervals for predictable latency

```javascript
// Good: Consistent 50ms intervals
setTimeout(sendChunk, 50);

// Avoid: Variable or very short intervals
setTimeout(sendChunk, Math.random() * 10); // Too fast and inconsistent
```

## Handling Partial vs Final Transcripts

The API sends two types of transcripts:

### Partial Transcripts (`is_final: false`)

* **Purpose**: Show interim results for immediate user feedback
* **Behavior**: May change as more audio is processed
* **Use case**: Display "live" transcription as the user speaks

```javascript
if (!message.is_final) {
  // Show partial transcript with visual indicator (e.g., grayed out)
  displayPartialTranscript(message.transcript);
}
```

### Final Transcripts (`is_final: true`)

* **Purpose**: Confirmed transcription for a segment
* **Behavior**: Stable and won't change
* **Use case**: Store in database, display as confirmed text

```javascript
if (message.is_final) {
  // Store final transcript
  saveTranscript(message.full_transcript);

  // Update UI with confirmed text
  displayFinalTranscript(message.full_transcript);
}
```

## Audio Preprocessing

### Before Streaming

1. **Convert to correct format**: Ensure audio matches the `encoding` parameter (`linear16`, `linear32`, `alaw`, `mulaw`, `opus`, `ogg_opus`)
2. **Set sample rate**: Match the `sample_rate` parameter in your WebSocket URL
3. **Mono channel**: Downmix stereo/multi-channel audio to mono
4. **Normalize levels**: Prevent clipping and ensure consistent volume

### Example Preprocessing

```python
import numpy as np
import soundfile as sf
from scipy import signal

def preprocess_audio(input_path, target_sample_rate=16000):
    """Preprocess audio for WebSocket streaming"""
    audio, sample_rate = sf.read(input_path)

    # Convert to mono
    if len(audio.shape) > 1:
        audio = np.mean(audio, axis=1)

    # Resample if needed
    if sample_rate != target_sample_rate:
        audio = signal.resample(audio, int(len(audio) * target_sample_rate / sample_rate))

    # Normalize to prevent clipping
    max_val = np.abs(audio).max()
    if max_val > 0:
        audio = audio / max_val * 0.95

    # Convert to 16-bit PCM
    audio_int16 = (audio * 32767).astype(np.int16)

    return audio_int16, target_sample_rate
```

## Error Handling and Reconnection

### Connection Errors

Implement robust error handling for network issues:

```javascript
let reconnectAttempts = 0;
const maxReconnectAttempts = 5;

function connect() {
  const ws = new WebSocket(url.toString());

  ws.onerror = (error) => {
    console.error("WebSocket error:", error);
  };

  ws.onclose = (event) => {
    if (event.code !== 1000 && reconnectAttempts < maxReconnectAttempts) {
      reconnectAttempts++;
      // Exponential backoff, capped at 30 seconds
      const delay = Math.min(1000 * Math.pow(2, reconnectAttempts), 30000);
      console.log(`Reconnecting in ${delay}ms...`);
      setTimeout(connect, delay);
    }
  };

  ws.onopen = () => {
    reconnectAttempts = 0; // Reset on successful connection
  };

  return ws;
}
```

### Handling Connection Drops

* **Detect drops**: Monitor connection state and implement a heartbeat/ping
* **Buffer audio**: Store audio chunks during disconnection
* **Resume streaming**: Continue from where you left off after reconnection

## Session Management

### Session Lifecycle

1. **Establish connection**: Create the WebSocket with proper authentication
2. **Stream audio**: Send chunks at regular intervals
3. **Handle responses**: Process partial and final transcripts
4. **End session**: Send `{"type": "end"}` when done
5. **Close connection**: Gracefully close the WebSocket

### Graceful Shutdown

To properly close a session, send the end token and wait for the server to respond with `is_last: true` before closing the WebSocket connection:

```javascript
function endTranscription(ws) {
  // Send end signal
  ws.send(JSON.stringify({ type: "end" }));

  // Wait for the is_last: true response before closing
  ws.onmessage = (event) => {
    const message = JSON.parse(event.data);
    if (message.is_last === true) {
      ws.close(1000, "Transcription complete");
    }
  };
}
```

Do not close the WebSocket immediately after sending the end token. Always wait for the `is_last: true` response to ensure all audio has been processed and final transcripts have been received.

## Latency Optimization

### Minimize Processing Delays

* **Preprocess offline**: Convert audio format before streaming
* **Use optimal encoding**: `linear16` at 16 kHz for the best latency/quality balance
* **Consistent chunking**: Avoid variable chunk sizes that cause processing delays

### Network Optimization

* **Stable connection**: Use reliable network connections
* **Monitor bandwidth**: Ensure sufficient bandwidth for audio streaming
* **Reduce overhead**: Minimize unnecessary data in WebSocket messages

## Quality Checklist

1. **Use 16 kHz mono linear16** whenever possible for optimal latency
2. **Stream in 4096-byte chunks** at 50-100ms intervals
3. **Handle partial transcripts** for immediate user feedback
4. **Store final transcripts** for accuracy and persistence
5. **Implement reconnection logic** for production reliability
6. **Monitor session state** to detect and handle errors gracefully
7. **Test with real audio** to validate latency and accuracy

## Performance Tips

### For Low Latency

* Use `linear16` encoding at 16 kHz
* Stream chunks every 50ms
* Process responses asynchronously
* Avoid blocking operations in message handlers

### For High Accuracy

* Use higher sample rates (44.1 kHz or 48 kHz) when latency allows
* Enable `word_timestamps` for precise timing
* Wait for `is_final: true` before committing transcripts
* Use `full_transcript` for the complete session text

### For Production

* Implement connection pooling for multiple sessions
* Add rate limiting to prevent overwhelming the API
* Log session IDs for debugging and support
* Monitor transcription quality and latency metrics
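As a sanity check for the chunk-size and interval guidance above, you can compute how much audio a raw PCM chunk actually represents. The helper below is an illustrative sketch (the function name is ours, not part of the API): a 4096-byte chunk of 16 kHz mono `linear16` audio covers 128ms, so when draining a prerecorded file, sending one such chunk every 50-100ms streams faster than real time; a live microphone naturally paces itself instead.

```javascript
// Illustrative helper (not part of the API): duration of a raw PCM chunk.
// duration_ms = bytes / (sampleRate * bytesPerSample * channels) * 1000
function chunkDurationMs(bytes, sampleRate = 16000, bytesPerSample = 2, channels = 1) {
  return (bytes / (sampleRate * bytesPerSample * channels)) * 1000;
}

chunkDurationMs(4096); // 128 ms of 16 kHz mono linear16 audio
chunkDurationMs(1600); // 50 ms — matches a 50ms send interval exactly
```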
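The "buffer audio during disconnection" advice under Handling Connection Drops can be sketched as a small queue placed in front of the socket. This is a minimal illustration, not part of the Pulse STT SDK: chunks are queued while the connection is down and replayed in order once a fresh socket is attached (e.g., from `ws.onopen`), with a cap so the buffer cannot grow without bound during a long outage.

```javascript
// Minimal sketch (not part of the API): queue audio while disconnected,
// then replay it in order once a new socket is attached.
const WS_OPEN = 1; // WebSocket.OPEN

class BufferedAudioSender {
  constructor(maxBufferedChunks = 1000) {
    this.queue = [];
    this.maxBufferedChunks = maxBufferedChunks;
    this.ws = null;
  }

  // Call from ws.onopen after every (re)connect.
  attach(ws) {
    this.ws = ws;
    this.flush();
  }

  send(chunk) {
    if (this.ws && this.ws.readyState === WS_OPEN) {
      this.ws.send(chunk);
    } else {
      // Drop the oldest audio rather than grow without bound.
      if (this.queue.length >= this.maxBufferedChunks) this.queue.shift();
      this.queue.push(chunk);
    }
  }

  flush() {
    while (this.queue.length > 0 && this.ws && this.ws.readyState === WS_OPEN) {
      this.ws.send(this.queue.shift());
    }
  }
}
```

Routing all audio through `send()` keeps the streaming loop unchanged; only the reconnection handler needs to call `attach()` with the new socket.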