Audio I/O
Hydra is strict about audio formats. Get this wrong and you’ll either see invalid_audio errors or distorted playback.
Input (client → server)
Frames smaller than ~20 ms add overhead without helping latency. Frames larger than ~40 ms make barge-in detection feel sluggish.
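To make those durations concrete, the sketch below converts a frame length in milliseconds into bytes and splits a raw buffer accordingly. It assumes the 16 kHz mono PCM16 input format named elsewhere on this page; the helper names `frame_bytes` and `split_frames` are illustrative, not part of Hydra.

```python
SAMPLE_RATE_HZ = 16_000   # input rate named on this page
BYTES_PER_SAMPLE = 2      # PCM16 = 16 bits per sample
CHANNELS = 1              # mono


def frame_bytes(frame_ms: int, sample_rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Size in bytes of one raw PCM16 mono frame of the given duration."""
    samples = sample_rate_hz * frame_ms // 1000
    return samples * BYTES_PER_SAMPLE * CHANNELS


def split_frames(pcm: bytes, frame_ms: int = 20) -> list[bytes]:
    """Cut a raw PCM16 buffer into frames; the last frame may be shorter."""
    size = frame_bytes(frame_ms)
    return [pcm[i:i + size] for i in range(0, len(pcm), size)]
```

Under these assumptions a 20 ms frame is 320 samples (640 bytes) and a 40 ms frame is 640 samples (1280 bytes), before base64 encoding.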
Python
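A minimal sketch of preparing one input frame: clamp float samples to [-1, 1], scale them to int16, and base64-encode the result. The function names are hypothetical, little-endian byte order is an assumption, and the message envelope that carries the encoded string is deliberately left out because it is not described in this section.

```python
import base64
import struct


def float_to_pcm16(samples) -> bytes:
    """Convert floats in [-1.0, 1.0] to PCM16 bytes (little-endian assumed)."""
    ints = []
    for s in samples:
        s = max(-1.0, min(1.0, s))   # clamp so overdriven input cannot wrap around
        ints.append(int(s * 32767))
    return struct.pack(f"<{len(ints)}h", *ints)


def encode_frame(samples) -> str:
    """Return the base64 text for one frame of float samples."""
    return base64.b64encode(float_to_pcm16(samples)).decode("ascii")
```

If your capture library already delivers int16 bytes, skip `float_to_pcm16` and base64-encode the bytes directly.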
Browser (AudioWorklet)
The mic delivers float32 samples; you need to (a) convert to int16 and (b) base64-encode each chunk. Use an AudioWorklet — the deprecated ScriptProcessorNode works for a prototype but blocks the main thread under load.
Output (server → client)
Python
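A minimal sketch of handling output on the Python side: decode each base64 chunk and append it to a WAV file. It assumes output chunks mirror the input encoding (base64 PCM16, mono); this section does not state the output sample rate, so `sample_rate_hz` is a required argument you must set from your session configuration. `write_reply_wav` is a hypothetical helper name.

```python
import base64
import wave


def write_reply_wav(path: str, b64_chunks, sample_rate_hz: int) -> None:
    """Decode base64 PCM16 chunks and write them to a mono WAV file."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)               # assumption: mono output
        wav.setsampwidth(2)               # assumption: PCM16 (2 bytes per sample)
        wav.setframerate(sample_rate_hz)  # not stated here; caller supplies it
        for chunk in b64_chunks:
            wav.writeframes(base64.b64decode(chunk))
```

A wrong `sample_rate_hz` gives audio that plays at the wrong speed and pitch, which is a quick way to spot a rate mismatch.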
Browser (gapless playback)
Schedule each chunk against a running playCursor so chunks play back-to-back with no audible gap.
For barge-in, reset playCursor to playCtx.currentTime when a fresh response.created arrives, so that the new response starts immediately instead of queuing behind audio from the interrupted one. See Turn detection & barge-in.
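The cursor arithmetic does not depend on the Web Audio API, so it can be sketched in plain Python. `PlaybackScheduler` is a hypothetical name; `now` stands in for playCtx.currentTime, and clamping the start time to `now` after an underrun is an assumption about typical practice rather than something this page specifies.

```python
class PlaybackScheduler:
    """Tracks when the next audio chunk should start playing."""

    def __init__(self) -> None:
        self.play_cursor = 0.0   # seconds on the audio clock

    def schedule(self, now: float, duration: float) -> float:
        """Return the start time for a chunk and advance the cursor past it."""
        # Never schedule in the past: after an underrun, restart from `now`.
        start = max(self.play_cursor, now)
        self.play_cursor = start + duration
        return start

    def reset(self, now: float) -> None:
        """Barge-in: drop the queue position so new audio starts immediately."""
        self.play_cursor = now
```

Each chunk starts exactly where the previous one ends, which is what removes the audible gap. Resetting the cursor only changes where new chunks land; audio that is already scheduled still has to be stopped separately.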
Common gotchas
- Sending audio before session.configured: frames are silently dropped; the server does not queue them and does not emit an error. Always wait for the session.configured echo before starting the mic.
- Sample-rate mismatch: sending 24 kHz audio while claiming PCM16 16 kHz produces unintelligible transcription on the model side. Resample explicitly.
- Stereo input: Hydra expects mono. If you have stereo, downmix before encoding.
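For the stereo case, a downmix can be sketched as averaging each left/right pair. This assumes interleaved little-endian PCM16 input; `downmix_stereo_pcm16` is a hypothetical helper. Resampling is not shown because it needs a proper filter rather than simple arithmetic.

```python
import struct


def downmix_stereo_pcm16(stereo: bytes) -> bytes:
    """Average interleaved L/R PCM16 samples into a mono PCM16 stream."""
    n = len(stereo) // 2                                  # number of int16 samples
    samples = struct.unpack(f"<{n}h", stereo[: n * 2])
    # Pair up (L, R) and average; an unpaired trailing sample is dropped.
    mono = [(samples[i] + samples[i + 1]) // 2 for i in range(0, n - 1, 2)]
    return struct.pack(f"<{len(mono)}h", *mono)
```

The output is half the size of the input and is what you base64-encode before sending.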
Streaming a WAV file (for CI / regression tests)
Replay a known utterance through Hydra and capture the reply to disk
Hydra is built for live mic streams. For test fixtures, regression tests, or batch jobs you sometimes want to replay a known WAV instead. The pattern paces a 16 kHz mono PCM16 WAV at real-time speed, then collects the response audio to disk.
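The pacing half of that pattern might look like the sketch below: a generator that checks the WAV format, yields 20 ms frames, and sleeps between them so the file is delivered at roughly real-time speed. `paced_frames` is a hypothetical name, and sending each frame over your Hydra connection is left to the caller because the wire format is not covered in this section.

```python
import time
import wave


def paced_frames(path: str, frame_ms: int = 20, sleep=time.sleep):
    """Yield raw PCM16 frames from a 16 kHz mono WAV at real-time pace."""
    with wave.open(path, "rb") as wav:
        fmt = (wav.getframerate(), wav.getnchannels(), wav.getsampwidth())
        if fmt != (16_000, 1, 2):
            raise ValueError(f"expected 16 kHz mono PCM16 WAV, got {fmt}")
        samples_per_frame = wav.getframerate() * frame_ms // 1000
        deadline = time.monotonic()
        while True:
            chunk = wav.readframes(samples_per_frame)
            if not chunk:
                break
            yield chunk
            # Sleep until the next frame is due; absolute deadlines avoid drift.
            deadline += frame_ms / 1000
            sleep(max(0.0, deadline - time.monotonic()))
```

The `sleep` parameter exists so tests can pass a no-op and run instantly; leave it at the default when talking to a real server.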
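Don’t have a 16 kHz mono WAV? Convert with ffmpeg, for example: ffmpeg -i input.wav -ar 16000 -ac 1 -c:a pcm_s16le output.wav (the file names are placeholders).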
This pattern is for testing only — it doesn’t exercise full-duplex behaviour (no overlap, no barge-in). For interactive use, see the quickstart.
Next
- Turn detection & barge-in: VAD events and how to flush scheduled audio
- Errors & reconnection: invalid_audio and friends

