Speech to Speech (Hydra)

Quickstart

Hydra is realtime, full-duplex speech-to-speech. The fastest way to get a feel for it is to talk to it. The reference client below installs from a single git clone and ships with multiple agent presets, so you can hear barge-in, tool calls, and persona switching live.

1. Get an API key

In the Smallest AI Console, create an API key. You’ll paste it into the demo in the next step.

2. Run the reference client

The reference client is a production-grade Next.js app with multi-agent presets, local tool execution, and a live wire log.

$ git clone https://github.com/smallest-inc/hydra_agents.git
$ cd hydra_agents && npm install && npm run dev

Open http://localhost:3000, paste your API key into the right-hand panel, pick an agent preset, click Connect, and talk. Speak over Hydra to interrupt — barge-in is automatic.

What just happened

| Step | Event |
| --- | --- |
| WebSocket opens | Server emits session.created |
| Client configures | Client sends session.configure once |
| Server confirms | Server emits session.configured with the negotiated audio sample rate |
| Client streams audio | Client sends input_audio_buffer.append continuously, base64-encoded PCM16 |
| User speaks / pauses | Server emits input_audio_buffer.speech_started / speech_stopped |
| Model replies | Server emits response.output_audio.delta chunks until response.done |
| User barges in | In-flight response cancels with status: "cancelled", reason: "interrupted" |
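
The flow in the table can be sketched as a small client-side state tracker plus a helper that packs microphone samples into an append message. This is a minimal illustration, not the reference client's code. The event type strings come from the table; the payload field names (`audio`, `delta`, `status`, `reason`), the placement of `status` and `reason` on `response.done`, the `ClientState` shape, and the choice to drop queued audio on `speech_started` are all assumptions made for the example. It relies on the global `btoa`, which is available in browsers and recent Node.js versions.

```typescript
// Event type strings are taken from the table above. Every payload field name
// and the ClientState shape below are assumptions made for illustration.
type ServerEvent =
  | { type: "session.created" }
  | { type: "session.configured" }
  | { type: "input_audio_buffer.speech_started" }
  | { type: "input_audio_buffer.speech_stopped" }
  | { type: "response.output_audio.delta"; delta: string }
  | { type: "response.done"; status?: string; reason?: string };

interface ClientState {
  phase: "connecting" | "configuring" | "ready";
  userSpeaking: boolean;
  queuedAudio: string[]; // base64 chunks waiting for playback
  lastResponse: "none" | "completed" | "interrupted";
}

const initialState: ClientState = {
  phase: "connecting",
  userSpeaking: false,
  queuedAudio: [],
  lastResponse: "none",
};

// Convert float samples in [-1, 1] to little-endian PCM16, then base64.
function floatToPcm16Base64(samples: Float32Array): string {
  const view = new DataView(new ArrayBuffer(samples.length * 2));
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i]));
    view.setInt16(i * 2, Math.round(s < 0 ? s * 0x8000 : s * 0x7fff), true);
  }
  let binary = "";
  for (let i = 0; i < view.byteLength; i++) {
    binary += String.fromCharCode(view.getUint8(i));
  }
  return btoa(binary);
}

// Build the JSON text for one append message ("audio" is an assumed field name).
function buildAppendMessage(samples: Float32Array): string {
  return JSON.stringify({
    type: "input_audio_buffer.append",
    audio: floatToPcm16Base64(samples),
  });
}

// Pure state transition for each server event; returns a new state object.
function applyServerEvent(state: ClientState, event: ServerEvent): ClientState {
  switch (event.type) {
    case "session.created":
      // The client would send session.configure once at this point.
      return { ...state, phase: "configuring" };
    case "session.configured":
      return { ...state, phase: "ready" };
    case "input_audio_buffer.speech_started":
      // Assumed barge-in handling: drop audio that has not played yet.
      return { ...state, userSpeaking: true, queuedAudio: [] };
    case "input_audio_buffer.speech_stopped":
      return { ...state, userSpeaking: false };
    case "response.output_audio.delta":
      return { ...state, queuedAudio: state.queuedAudio.concat([event.delta]) };
    case "response.done": {
      const interrupted =
        event.status === "cancelled" && event.reason === "interrupted";
      return { ...state, lastResponse: interrupted ? "interrupted" : "completed" };
    }
    default:
      return state;
  }
}
```

In a real client, each parsed WebSocket message would be passed through `applyServerEvent`, and `buildAppendMessage` would be called for every captured audio chunk once the state reaches `ready`. Keeping the transition function pure makes the sequence easy to unit test without a live connection.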

Next

WebSocket connection

Auth, query params, idle timeout, close codes.

Managing sessions

Voices, persona, mid-session updates, conversation items.

Audio I/O

Input PCM16, output rate negotiation, browser AudioWorklet.

Turn detection & barge-in

Server-side VAD events and how to flush scheduled audio on the client.

Tool calling

Declare tools, run them locally, narrate the result.

Model card

Capabilities, voices, performance, pricing.