Code Examples
Below is a complete Python example demonstrating audio preprocessing, transcription with age/gender detection, emotion detection, and sentence-level timestamps (utterances).
Prerequisites
Install required dependencies:
Key Features Demonstrated
- Audio Preprocessing: Converts audio to 16 kHz mono WAV, normalizes levels, and removes silence
- Age & Gender Detection: Predicts the speaker's age and gender
- Emotion Detection: Captures emotional tone with confidence scores
- Utterances: Retrieves sentence-level timestamps with speaker labels
- Diarization: Separates speakers for multi-speaker audio
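The preprocessing steps listed above (mono downmix, 16 kHz resampling, level normalization, silence removal) can be sketched in plain Python. This is a minimal illustration, not the script's actual implementation. It assumes samples are floats in [-1, 1], uses linear interpolation rather than a proper anti-aliasing resampler, and trims only leading and trailing silence. All function names, the 0.9 peak target, and the 0.01 silence threshold are hypothetical choices made for the example. Writing the result to a WAV file is left out.

```python
TARGET_RATE = 16_000  # 16 kHz, as stated in the feature list


def to_mono(frames):
    """Average the channels of each frame (a tuple of samples) into one sample."""
    return [sum(frame) / len(frame) for frame in frames]


def resample_linear(samples, src_rate, dst_rate=TARGET_RATE):
    """Resample by linear interpolation (no anti-aliasing filter; sketch only)."""
    if src_rate == dst_rate or len(samples) < 2:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    out = []
    for i in range(n_out):
        pos = i * src_rate / dst_rate       # position in the source signal
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)  # clamp at the last sample
        frac = pos - lo
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out


def normalize_peak(samples, peak=0.9):
    """Scale so the loudest sample has magnitude `peak` (assumed target)."""
    loudest = max((abs(s) for s in samples), default=0.0)
    if loudest == 0.0:
        return list(samples)  # all-silent input: nothing to scale
    return [s * peak / loudest for s in samples]


def trim_silence(samples, threshold=0.01):
    """Drop leading and trailing samples quieter than `threshold` (assumed value)."""
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []
    return samples[loud[0]:loud[-1] + 1]


def preprocess(frames, src_rate):
    """Run the four steps in order: mono, resample, normalize, trim."""
    mono = to_mono(frames)
    resampled = resample_linear(mono, src_rate)
    return trim_silence(normalize_peak(resampled))
```

In practice an audio library or a tool such as ffmpeg would handle these steps with better resampling quality; the sketch only shows the order of operations and what each step does to the signal.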
Expected Output
The script will output:
- Full transcription text
- Age and gender predictions
- Emotion scores (happiness, sadness, disgust, fear, anger)
- Sentence-level utterances with timestamps and speaker IDs
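As a rough illustration of how such output could be rendered, the sketch below formats utterances and picks the highest-scoring emotion. The result layout is an assumption: the field names `start`, `end`, `speaker`, and `text`, the helper names, and the sample values are all hypothetical and made up for the example, so they should be adapted to whatever structure the API actually returns.

```python
def format_timestamp(seconds):
    """Render a time in seconds as MM:SS.mmm."""
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes):02d}:{secs:06.3f}"


def format_utterance(utt):
    """Render one utterance dict (hypothetical keys) as a single line."""
    start = format_timestamp(utt["start"])
    end = format_timestamp(utt["end"])
    return f"[{start} - {end}] Speaker {utt['speaker']}: {utt['text']}"


def top_emotion(scores):
    """Return the (label, score) pair with the highest score."""
    return max(scores.items(), key=lambda item: item[1])


# Made-up sample values, for illustration only.
utterances = [
    {"start": 0.0, "end": 3.25, "speaker": 0, "text": "Hello there."},
    {"start": 75.5, "end": 78.0, "speaker": 1, "text": "Hi, how are you?"},
]
emotions = {"happiness": 0.6, "sadness": 0.1, "disgust": 0.05,
            "fear": 0.05, "anger": 0.2}

for utt in utterances:
    print(format_utterance(utt))
label, score = top_emotion(emotions)
print(f"Dominant emotion: {label} ({score:.2f})")
```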

