Audio I/O
Hydra is strict about audio formats. Get this wrong and you’ll either see invalid_audio errors or distorted playback.
Frames smaller than ~20 ms add overhead without helping latency. Frames larger than ~40 ms make barge-in detection feel sluggish.
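To make the 20 to 40 ms window concrete, here is a small sketch that turns a frame duration into a sample and byte count. It assumes the input format is 16 kHz mono PCM16 (the format this page names for WAV replay); `frameSize`, `SAMPLE_RATE_HZ` and `BYTES_PER_SAMPLE` are illustrative names, not part of any Hydra SDK.

```typescript
// Assumption: input audio is 16 kHz, mono, 16-bit PCM.
const SAMPLE_RATE_HZ = 16000;
const BYTES_PER_SAMPLE = 2;

// Hypothetical helper: size of one frame of the given duration.
function frameSize(frameMs: number): { samples: number; bytes: number } {
  const samples = Math.round((SAMPLE_RATE_HZ * frameMs) / 1000);
  return { samples, bytes: samples * BYTES_PER_SAMPLE };
}

frameSize(20); // { samples: 320, bytes: 640 }
frameSize(40); // { samples: 640, bytes: 1280 }
```

Under that assumption, each frame should carry between 640 and 1280 bytes of raw PCM before base64 encoding.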
The mic delivers float32 samples; you need to (a) convert to int16 and (b) base64-encode each chunk. Use an AudioWorklet — the deprecated ScriptProcessorNode works for a prototype but blocks the main thread under load.
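The two conversion steps above could look roughly like the following. This is a sketch, not Hydra's reference code: `floatToPcm16` and `toBase64` are hypothetical helper names, little-endian byte order is an assumption, and the base64 encoder is hand-rolled only so the example has no runtime dependencies.

```typescript
// Step (a): convert float32 samples in [-1, 1] to PCM16 bytes.
// Assumption: the server expects little-endian samples.
function floatToPcm16(samples: Float32Array): Uint8Array {
  const out = new Uint8Array(samples.length * 2);
  const view = new DataView(out.buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp first: values outside [-1, 1] would otherwise wrap around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Asymmetric scale so -1 maps to -32768 and +1 maps to 32767.
    view.setInt16(i * 2, Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), true);
  }
  return out;
}

const B64 = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Step (b): standard base64 with "=" padding.
function toBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const has1 = i + 1 < bytes.length;
    const has2 = i + 2 < bytes.length;
    // Pack up to three bytes into one 24-bit group.
    const n = (bytes[i] << 16) | ((has1 ? bytes[i + 1] : 0) << 8) | (has2 ? bytes[i + 2] : 0);
    out += B64[(n >> 18) & 63] + B64[(n >> 12) & 63];
    out += has1 ? B64[(n >> 6) & 63] : "=";
    out += has2 ? B64[n & 63] : "=";
  }
  return out;
}

// One chunk, ready to place in whatever message field carries audio.
const chunk = toBase64(floatToPcm16(new Float32Array([0, 1, -1])));
```

In an AudioWorklet, `process()` receives small blocks (128 samples per render quantum), so a real client would accumulate samples until a full frame is available before calling these helpers.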
Schedule each chunk against a running playCursor so chunks play back-to-back with no audible gap.
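The cursor arithmetic can be sketched as a pure function. `scheduleChunk` and its return shape are illustrative names; falling back to the current clock time when the cursor has fallen behind is an assumption about how a client would handle an underrun, not documented Hydra behaviour.

```typescript
interface Scheduled {
  startAt: number;    // when this chunk should begin, in audio-clock seconds
  nextCursor: number; // where the following chunk should begin
}

// Hypothetical helper: decide when the next chunk plays.
function scheduleChunk(playCursor: number, now: number, durationSec: number): Scheduled {
  // If the cursor is in the past (first chunk, or an underrun), restart from
  // "now"; otherwise start exactly where the previous chunk ends.
  const startAt = Math.max(playCursor, now);
  return { startAt, nextCursor: startAt + durationSec };
}

// Browser usage (not executed here), with playCtx being an AudioContext:
//   const { startAt, nextCursor } =
//     scheduleChunk(playCursor, playCtx.currentTime, buffer.duration);
//   source.start(startAt);
//   playCursor = nextCursor;
```

Because every start time derives from the previous chunk's end rather than from arrival time, network jitter does not open gaps between chunks as long as they arrive before their slot.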
For barge-in, reset playCursor = playCtx.currentTime when a fresh response.created arrives (see Turn detection & barge-in).
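A minimal sketch of that reset follows. `PlaybackQueue` and `Stoppable` are hypothetical names. The text above only mentions resetting the cursor; also stopping sources that are already scheduled is an added assumption, on the reasoning that queued audio would otherwise keep playing over the new response.

```typescript
// Minimal shape of an AudioBufferSourceNode as far as this sketch needs it.
interface Stoppable {
  stop(): void;
}

// Hypothetical bookkeeping for scheduled playback.
class PlaybackQueue {
  playCursor = 0;
  private pending: Stoppable[] = [];

  // Record a source that has been scheduled and ends at `endsAt`.
  // (A real client would also drop sources from `pending` once they finish.)
  track(source: Stoppable, endsAt: number): void {
    this.pending.push(source);
    this.playCursor = endsAt;
  }

  // Call when a fresh response.created arrives; `now` is playCtx.currentTime.
  bargeIn(now: number): void {
    for (const source of this.pending) source.stop(); // assumption: cut queued audio
    this.pending = [];
    this.playCursor = now; // the reset described in the text
  }
}
```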
Frames sent before session.configured are silently dropped: the server does not queue them and does not emit an error. Always wait for the session.configured echo before starting the mic.
Hydra is built for live mic streams. For test fixtures, regression tests, or batch jobs you sometimes want to replay a known WAV instead. The pattern paces a 16 kHz mono PCM16 WAV at real-time speed, then collects the response audio to disk.
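The pacing half of the replay pattern might be sketched as below. `splitFrames` and `replay` are hypothetical helpers, and `send` stands in for whatever function encodes a frame and writes it to the socket. The sketch assumes the WAV header has already been stripped so `pcm` holds only raw 16-bit samples, and that `replay` is called only after session.configured has arrived.

```typescript
// Split raw PCM16 bytes into fixed-duration frames (the last may be shorter).
function splitFrames(pcm: Uint8Array, frameMs: number, sampleRate = 16000): Uint8Array[] {
  const frameBytes = Math.round((sampleRate * frameMs) / 1000) * 2;
  const frames: Uint8Array[] = [];
  for (let off = 0; off < pcm.length; off += frameBytes) {
    frames.push(pcm.subarray(off, Math.min(off + frameBytes, pcm.length)));
  }
  return frames;
}

// Send one frame per frameMs of wall-clock time.
async function replay(
  frames: Uint8Array[],
  frameMs: number,
  send: (frame: Uint8Array) => void
): Promise<void> {
  const start = Date.now();
  for (let i = 0; i < frames.length; i++) {
    // Sleep until this frame's absolute slot so timer drift does not accumulate.
    const wait = start + i * frameMs - Date.now();
    if (wait > 0) await new Promise<void>((resolve) => setTimeout(resolve, wait));
    send(frames[i]);
  }
}
```

Pacing against an absolute start time rather than sleeping a fixed interval per frame keeps the overall send rate at real time even when individual timers fire late.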
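Don’t have a 16 kHz mono WAV? Convert with ffmpeg, for example (file names are placeholders): ffmpeg -i input.mp3 -ar 16000 -ac 1 -c:a pcm_s16le output.wav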
This pattern is for testing only — it doesn’t exercise full-duplex behaviour (no overlap, no barge-in). For interactive use, see the quickstart.
invalid_audio and friends