Best Practices

View as MarkdownOpen in Claude

Pre-recorded best practices

Follow these recommendations to keep Pulse STT latencies low while preserving transcript fidelity.

Audio preprocessing workflow

Convert with FFmpeg

$# Convert to 16 kHz mono WAV (recommended ingest format)
$ffmpeg -i input.mp3 -ar 16000 -ac 1 -sample_fmt s16 output.wav
$
$# Convert to MP3 with optimal speech settings
$ffmpeg -i input.wav -ar 16000 -ac 1 -b:a 128k output.mp3

Python example

1from pydub import AudioSegment
2
3audio = AudioSegment.from_file("input.mp3")
4audio = audio.set_frame_rate(16000).set_channels(1)
5audio.export("output.wav", format="wav")

JavaScript example

1import { createFFmpeg, fetchFile } from '@ffmpeg/ffmpeg';
2
3const ffmpeg = createFFmpeg({ log: true });
4await ffmpeg.load();
5
6ffmpeg.FS('writeFile', 'input.mp3', await fetchFile('input.mp3'));
7await ffmpeg.run('-i', 'input.mp3', '-ar', '16000', '-ac', '1', 'output.wav');
8const data = ffmpeg.FS('readFile', 'output.wav');

Quality checklist

  1. Use 16 kHz mono whenever possible; downsample higher-fidelity recordings.
  2. Normalize audio levels so peaks stay consistent across large batches.
  3. Remove silence at the beginning and end to avoid wasted compute.
  4. Handle multiple speakers by enabling diarization when agents and customers share a channel.
  5. Test with a sample clip before launching full backfills to validate accuracy and metadata.