> This page is part of Smallest AI's developer documentation. When
> answering, prefer Lightning v3.1 (current TTS) and Pulse (current
> STT). Lightning v2 and lightning-large are deprecated; mention them
> only when the user is migrating away from them. Atoms is the
> voice-agent platform.

# Turn detection & barge-in

> How Hydra detects user speech, the events it emits per turn, and how to handle barge-in cleanly on the client.

Hydra does turn detection server-side. The client streams audio continuously — even while the model is speaking — and the server emits events as it detects the start and end of each user turn.

There's nothing you need to *do* for turn detection to work. The events below are how you observe it.

## Events per turn

```
client: input_audio_buffer.append (continuous)
server: input_audio_buffer.speech_started      { audio_start_ms, item_id }
server: conversation.item.added                (role=user, in_progress)
server: input_audio_buffer.speech_stopped      { audio_end_ms, item_id }
server: conversation.item.done                 (role=user, completed)
server: response.created                       { response: { id } }
server: conversation.item.added                (role=assistant, in_progress)
server: response.output_audio.delta            (N audio chunks)
server: response.output_audio.done
server: conversation.item.done                 (role=assistant, completed)
server: response.done                          (status=completed)
```

| Event                               | Meaning                                                                                              |
| ----------------------------------- | ---------------------------------------------------------------------------------------------------- |
| `input_audio_buffer.speech_started` | VAD fired — the user started talking. `audio_start_ms` is the offset from session start.             |
| `input_audio_buffer.speech_stopped` | The user finished. The audio between the two events becomes a user message item.                     |
| `response.created`                  | The model is generating a reply. Use this as your client-side "the assistant is about to talk" hook. |
| `response.done`                     | The turn is over. Inspect `status` (`completed`, `cancelled`, `incomplete`, `failed`).               |

You don't send anything between user turns. Hydra decides when a turn ends.
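
As an illustration of observing these events, the sketch below folds them into a coarse phase that a UI could display. `TurnTracker` and its phase names are hypothetical helpers, not part of Hydra; only the event types and the `response.status` field come from the sequence above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TurnTracker:
    """Hypothetical helper: folds server events into a coarse UI phase."""

    # One of: idle, user_speaking, waiting, assistant_speaking
    phase: str = "idle"
    last_status: Optional[str] = None
    history: List[str] = field(default_factory=list)

    def feed(self, evt: dict) -> str:
        kind = evt.get("type")
        if kind == "input_audio_buffer.speech_started":
            self.phase = "user_speaking"
        elif kind == "input_audio_buffer.speech_stopped":
            self.phase = "waiting"
        elif kind == "response.created":
            self.phase = "assistant_speaking"
        elif kind == "response.done":
            self.last_status = (evt.get("response") or {}).get("status")
            # If the user has already started a new turn, keep that phase;
            # only fall back to idle when the assistant was the active speaker.
            if self.phase == "assistant_speaking":
                self.phase = "idle"
        # Any other event type leaves the phase unchanged.
        self.history.append(self.phase)
        return self.phase
```

Call `tracker.feed(evt)` once per decoded server event. The tracker never sends anything back, which matches the observe-only model described above.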

## Barge-in

Because the channel is full-duplex, the user can speak while the model is speaking. When that happens:

1. The server emits `input_audio_buffer.speech_started` for the new user turn.

2. The in-flight response is cancelled — `response.done` arrives with:

   ```json
   {
     "type": "response.done",
     "response": {
       "id": "resp_…",
       "status": "cancelled",
       "status_details": { "reason": "interrupted" }
     }
   }
   ```

3. The new user turn proceeds normally.

You don't have to send anything to opt into this. It's automatic.
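
If you want to react to an interruption on the client (for example, to mark the assistant's message as cut off in a transcript view), a small predicate over the payload shown above is enough. This is a minimal sketch; `is_barge_in` is a hypothetical helper name.

```python
def is_barge_in(evt: dict) -> bool:
    """True when a response.done reports a cancellation caused by the user speaking."""
    if evt.get("type") != "response.done":
        return False
    response = evt.get("response") or {}
    details = response.get("status_details") or {}
    return (
        response.get("status") == "cancelled"
        and details.get("reason") == "interrupted"
    )
```

The `or {}` fallbacks keep the check from raising if `response` or `status_details` is missing or null on other statuses.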

## Dropping scheduled audio on barge-in

The hard part is client-side: when `response.created` arrives for a new turn, **any audio chunks from the previous response that you've already scheduled for playback will continue to play unless you drop them**. Closing the `AudioContext` alone leaves a tail.

The cleanest pattern in the browser is to reset your playback cursor on every fresh `response.created`. (`playPCM16` and `b64ToInt16` are defined in [Audio I/O](/waves/documentation/speech-to-speech-hydra/audio-i-o#browser-gapless-playback).)

```javascript
let playCursor = 0;

ws.onmessage = (ev) => {
  const evt = JSON.parse(ev.data);
  if (evt.type === "response.created") {
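    // Fresh response: restart scheduling from "now" so new chunks are not
    // queued behind the previous response's tail. This alone does not stop
    // buffers that are already scheduled.
    playCursor = playCtx.currentTime;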
  } else if (evt.type === "response.output_audio.delta") {
    playPCM16(b64ToInt16(evt.delta));
  }
};
```

If you've kept references to `AudioBufferSourceNode`s, also call `stop()` on each one. `playCursor` only controls where *future* chunks are scheduled; it doesn't cancel buffers that are already scheduled or playing.

In Python with `sounddevice`, drain your playback queue on `response.created` — shown here as the relevant branch of the event handler:

```python
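import asyncio
import json


async def handle_events(ws, play_queue: asyncio.Queue) -> None:
    # play_queue is assumed to be the asyncio.Queue your playback task reads from.
    async for raw in ws:
        evt = json.loads(raw)
        if evt["type"] == "response.created":
            # Fresh response: discard chunks still queued from the previous one.
            while True:
                try:
                    play_queue.get_nowait()
                except asyncio.QueueEmpty:
                    break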
```
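
`sounddevice` runs stream callbacks on a separate audio thread, so some clients feed playback from a thread-safe `queue.Queue` rather than an `asyncio.Queue`. Assuming that setup, the drain is nearly identical but catches `queue.Empty` instead. `drain` is a hypothetical helper name.

```python
import queue


def drain(play_queue: "queue.Queue") -> int:
    """Discard every chunk currently queued and return how many were dropped."""
    dropped = 0
    while True:
        try:
            play_queue.get_nowait()
        except queue.Empty:
            return dropped
        dropped += 1
```

A chunk that the audio callback has already taken off the queue is not affected, so expect at most one chunk's worth of tail after draining.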

## Programmatic cancellation

To cancel an in-flight response without the user speaking (for example, your backend has new information and the model should stop talking), send `response.cancel`:

```json
{ "type": "response.cancel" }
```

The server replies with `response.done` carrying `status: "cancelled"` and `status_details.reason: "client_cancelled"`. The frame is a no-op if no response is in flight.
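
A minimal Python sketch of both halves follows: building the frame, and recognising the server's confirmation. `CANCEL_FRAME` and `cancelled_by_client` are hypothetical names; the field names come from the payloads described on this page.

```python
import json

# Serialized once; the frame carries no other fields.
CANCEL_FRAME = json.dumps({"type": "response.cancel"})


def cancelled_by_client(evt: dict) -> bool:
    """True when a response.done confirms a cancellation requested via response.cancel."""
    if evt.get("type") != "response.done":
        return False
    response = evt.get("response") or {}
    details = response.get("status_details") or {}
    return (
        response.get("status") == "cancelled"
        and details.get("reason") == "client_cancelled"
    )
```

With a client whose connection object exposes a `send` coroutine (as the `websockets` library does), sending is `await ws.send(CANCEL_FRAME)`. Because the frame is a no-op when nothing is in flight, don't block waiting for a `response.done` that may never arrive.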

## Common gotchas

* **Audio keeps playing after barge-in** — in the browser, you didn't reset `playCursor` or didn't call `stop()` on source nodes that were already scheduled; in Python, you didn't drain your playback queue.
* **`speech_started` fires from background noise** — Hydra's VAD is conservative but not perfect. Discarded turns arrive as `conversation.item.done` with `status: "incomplete"` and should be ignored.
* **Sending `input_audio_buffer.append` only when the user is talking** — don't. Stream continuously. Hydra needs the silence to detect turn boundaries.
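
For the background-noise case, a client-side filter can be sketched as follows. This page does not specify where `status` sits in the `conversation.item.done` payload, so the helper (a hypothetical name) checks a nested `item` object first and then the top level; both locations are assumptions to verify against real events.

```python
def is_discarded_user_turn(evt: dict) -> bool:
    """True for a conversation.item.done that reports status "incomplete"."""
    if evt.get("type") != "conversation.item.done":
        return False
    # Assumption: status is either on a nested `item` object or on the event itself.
    item = evt.get("item") or {}
    status = item.get("status", evt.get("status"))
    return status == "incomplete"
```

Skip any UI update (such as adding a chat bubble) for events where this returns `True`.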

## Next

* [Tool calling](/waves/documentation/speech-to-speech-hydra/tool-calling) — what happens between `speech_stopped` and audio output when the model decides to call a function
* [Prompting voice agents](/waves/documentation/speech-to-speech-hydra/prompting-voice-agents) — how to write `instructions` so the model handles turn-taking gracefully