Hydra is benchmarked against eight other production-grade voice / realtime models on the AIEWF S2S benchmark. For each metric, the comparison set is identical across runs and Hydra is the only model under test from Smallest AI.
aiwf_medium_context: the same fixed prompt and transcript distribution is replayed against every model, ten times each. Latency numbers are computed from transcript.jsonl (n = 224 non-tool turns, n = 64 tool turns). Pass rate is the fraction of turns that completed the expected interaction.
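To make the split between tool and non-tool turns concrete, here is a minimal sketch of how latency and pass rate could be summarised from a transcript.jsonl-style file. It assumes each line is a JSON object with hypothetical fields latency_ms, tool_call and passed; the real schema is not shown here. It also assumes the median as the latency statistic. Metrics Overview holds the authoritative definitions.

```python
import json
import statistics
from typing import Iterable, Optional


def summarise_turns(lines: Iterable[str]) -> dict:
    """Summarise latency and pass rate from transcript.jsonl-style lines.

    ASSUMPTION: each line is a JSON object with hypothetical fields
    "latency_ms" (number), "tool_call" (bool) and "passed" (bool).
    """
    tool, non_tool = [], []
    passed, total = 0, 0
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        turn = json.loads(line)
        # Route the latency into the tool or non-tool bucket.
        (tool if turn["tool_call"] else non_tool).append(turn["latency_ms"])
        passed += bool(turn["passed"])
        total += 1

    def median(values: list) -> Optional[float]:
        # ASSUMPTION: median; the benchmark may use a different statistic.
        return statistics.median(values) if values else None

    return {
        "n_non_tool": len(non_tool),
        "n_tool": len(tool),
        "median_non_tool_ms": median(non_tool),
        "median_tool_ms": median(tool),
        # Fraction of all turns that completed the expected interaction.
        "pass_rate": passed / total if total else None,
    }
```

Reading from a file is then a matter of passing an open file handle, e.g. summarise_turns(open("transcript.jsonl")). Keeping the two latency buckets separate mirrors the separate n counts reported above.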
1624 ms — fastest of 9. Beats nova-2-sonic by 65 ms, gpt-realtime-2 low by 381 ms, ultravox by 782 ms.
864 ms — tied-fastest with ultravox. Beats gpt-realtime-2 low by 864 ms and gemini-live by 1760 ms.
95.9 % — #3 of 8 (nova-2-sonic did not report pass rate). Within ~2 pp of the leader.
See Metrics Overview for the exact definitions.
The model is purpose-built for realtime voice agents that call tools. Three architectural choices show up in the numbers:
Hydra emits response.function_call_arguments.delta events as soon as the arguments start materialising, so the client can begin executing tools before the arguments JSON has finished streaming. Most other models wait until the arguments are fully formed before emitting them.
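One way a client might exploit early argument deltas is sketched below. It assumes events shaped like the OpenAI Realtime API events (type, call_id, delta, and an arguments string on response.function_call_arguments.done). It also assumes that tool arguments are a flat JSON object. The ArgumentStreamer class and its on_field / on_done callbacks are hypothetical illustrations, not part of any SDK. The idea: fire a callback the moment a complete string field appears in the partial buffer (for example, to start a lookup on "city"), and hand over the fully parsed arguments when the done event arrives.

```python
import json
import re
from typing import Callable, Dict, Set

# Matches a complete "key": "string value" pair, honouring JSON escapes.
# ASSUMPTION: arguments are a flat JSON object, so every match is top-level.
_STRING_FIELD = re.compile(r'"(\w+)"\s*:\s*"((?:[^"\\]|\\.)*)"')


class ArgumentStreamer:
    """Hypothetical accumulator for streamed tool-call arguments, keyed by call_id."""

    def __init__(
        self,
        on_field: Callable[[str, str, str], None],
        on_done: Callable[[str, dict], None],
    ) -> None:
        self.on_field = on_field  # fired as soon as a string field is complete
        self.on_done = on_done    # fired once with the fully parsed arguments
        self.buffers: Dict[str, str] = {}
        self.seen: Dict[str, Set[str]] = {}

    def handle(self, event: dict) -> None:
        kind = event.get("type")
        call_id = event.get("call_id", "")
        if kind == "response.function_call_arguments.delta":
            buf = self.buffers.get(call_id, "") + event["delta"]
            self.buffers[call_id] = buf
            seen = self.seen.setdefault(call_id, set())
            # Rescan the partial buffer; only fully closed string values match.
            for match in _STRING_FIELD.finditer(buf):
                key, raw = match.group(1), match.group(2)
                if key not in seen:
                    seen.add(key)
                    # Decode JSON escapes before handing the value on.
                    self.on_field(call_id, key, json.loads(f'"{raw}"'))
        elif kind == "response.function_call_arguments.done":
            # Prefer the final "arguments" string; fall back to the buffer.
            args = event.get("arguments", self.buffers.get(call_id, ""))
            self.buffers.pop(call_id, None)
            self.seen.pop(call_id, None)
            self.on_done(call_id, json.loads(args))
```

Non-string values such as numbers or nested objects are only delivered through on_done in this sketch. A production client would more likely use a proper incremental JSON parser than a regex over the partial buffer.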