Best Practices

Pre-recorded best practices

Follow these recommendations to keep Pulse STT latencies low while preserving transcript fidelity.

Audio preprocessing workflow

Convert with FFmpeg

$ # Convert to 16 kHz mono WAV (recommended ingest format)
$ ffmpeg -i input.mp3 -ar 16000 -ac 1 -sample_fmt s16 output.wav
$ 
$ # Convert to MP3 with optimal speech settings
$ ffmpeg -i input.wav -ar 16000 -ac 1 -b:a 128k output.mp3

Python example

1 from pydub import AudioSegment
2 
3 audio = AudioSegment.from_file("input.mp3")
4 audio = audio.set_frame_rate(16000).set_channels(1)
5 audio.export("output.wav", format="wav")

JavaScript example

1 import { createFFmpeg, fetchFile } from '@ffmpeg/ffmpeg';
2 
3 const ffmpeg = createFFmpeg({ log: true });
4 await ffmpeg.load();
5 
6 ffmpeg.FS('writeFile', 'input.mp3', await fetchFile('input.mp3'));
7 await ffmpeg.run('-i', 'input.mp3', '-ar', '16000', '-ac', '1', 'output.wav');
8 const data = ffmpeg.FS('readFile', 'output.wav');

Quality checklist

Use 16 kHz mono whenever possible; downsample higher-fidelity recordings.
Normalize audio levels so peaks stay consistent across large batches.
Remove silence at the beginning and end to avoid wasted compute.
Handle multiple speakers by enabling diarization when agents and customers share a channel.
Test with a sample clip before launching full backfills to validate accuracy and metadata.