---
title: Best Practices
description: Prepare audio inputs before submitting them to Pulse STT
---

# Pre-recorded best practices

Follow these recommendations to keep Pulse STT latency low while preserving transcript fidelity.

## Audio preprocessing workflow

### Convert with FFmpeg

```bash
# Convert to 16 kHz mono, 16-bit PCM WAV (recommended ingest format)
ffmpeg -i input.mp3 -ar 16000 -ac 1 -sample_fmt s16 output.wav

# Convert to 16 kHz mono MP3 when a smaller, compressed upload is preferred
ffmpeg -i input.wav -ar 16000 -ac 1 -b:a 128k output.mp3
```

### Python example

```python
from pydub import AudioSegment  # pydub calls FFmpeg to decode MP3 input

audio = AudioSegment.from_file("input.mp3")
# Resample to 16 kHz, downmix to mono, and use 16-bit samples to match the FFmpeg command
audio = audio.set_frame_rate(16000).set_channels(1).set_sample_width(2)
audio.export("output.wav", format="wav")
```

### JavaScript example

```javascript
// Targets the @ffmpeg/ffmpeg 0.11.x API; 0.12 replaced createFFmpeg with an FFmpeg class.
// Top-level await requires an ES module context.
import { createFFmpeg, fetchFile } from '@ffmpeg/ffmpeg';

const ffmpeg = createFFmpeg({ log: true });
await ffmpeg.load();

// Copy the source file into FFmpeg's in-memory file system, then convert it
ffmpeg.FS('writeFile', 'input.mp3', await fetchFile('input.mp3'));
await ffmpeg.run('-i', 'input.mp3', '-ar', '16000', '-ac', '1', '-sample_fmt', 's16', 'output.wav');

// data is a Uint8Array holding the converted WAV bytes
const data = ffmpeg.FS('readFile', 'output.wav');
```

## Quality checklist

1. **Use 16 kHz mono** whenever possible; downsample recordings captured at higher sample rates.
2. **Normalize audio levels** so peaks stay consistent across large batches.
3. **Remove silence** at the beginning and end to avoid wasted compute.
4. **Handle multiple speakers** by enabling diarization when agents and customers share a channel.
5. **Test with a sample clip** before launching full backfills to validate accuracy and metadata.
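Checklist items 2 and 3 (normalize levels, trim leading and trailing silence) can be combined into a single pass. The following is a minimal sketch using only the Python standard library, assuming the file has already been converted to 16-bit mono PCM WAV with one of the commands above. The function names, the 0.9 peak target, and the silence threshold of 500 (roughly -36 dBFS) are illustrative assumptions, not Pulse STT requirements. It applies simple peak normalization rather than loudness (LUFS) normalization, and it trims per sample rather than over short windows.

```python
import array
import sys
import wave

FULL_SCALE = 32767  # largest positive 16-bit PCM sample value


def load_pcm16_mono(path):
    """Read a 16-bit mono PCM WAV file; return (samples, frame_rate)."""
    with wave.open(path, "rb") as wf:
        if wf.getsampwidth() != 2 or wf.getnchannels() != 1:
            raise ValueError("expected 16-bit mono PCM; convert with FFmpeg first")
        rate = wf.getframerate()
        samples = array.array("h", wf.readframes(wf.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV stores samples little-endian
    return samples, rate


def save_pcm16_mono(path, samples, rate):
    """Write samples as a 16-bit mono PCM WAV file."""
    out = array.array("h", samples)
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(rate)
        wf.writeframes(out.tobytes())


def peak_normalize(samples, target_peak=0.9):
    """Scale samples so the loudest one lands at target_peak of full scale."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return array.array("h", samples)  # pure silence: nothing to scale
    gain = target_peak * FULL_SCALE / peak
    # Clamp to the 16-bit range to guard against rounding overflow
    return array.array(
        "h", (max(-32768, min(FULL_SCALE, round(s * gain))) for s in samples)
    )


def trim_silence(samples, threshold=500):
    """Drop leading and trailing samples whose magnitude is below threshold."""
    start, end = 0, len(samples)
    while start < end and abs(samples[start]) < threshold:
        start += 1
    while end > start and abs(samples[end - 1]) < threshold:
        end -= 1
    return samples[start:end]


def preprocess(in_path, out_path, target_peak=0.9, threshold=500):
    """Normalize, then trim edge silence; the frame rate is left unchanged."""
    samples, rate = load_pcm16_mono(in_path)
    cleaned = trim_silence(peak_normalize(samples, target_peak), threshold)
    save_pcm16_mono(out_path, cleaned, rate)
    return len(samples), len(cleaned)  # sample counts before and after
```

For example, `preprocess("output.wav", "output_clean.wav")` writes a trimmed, normalized copy of a file produced by the FFmpeg command. Normalizing before trimming lets a single fixed threshold behave similarly across quiet and loud recordings. If you already depend on pydub, its `pydub.effects.normalize` and `pydub.silence.detect_leading_silence` helpers cover the same steps with windowed dBFS measurements.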