Set "stream": true and the response becomes Server-Sent Events (SSE). Each token (or small group of tokens) arrives as a data: {...} line. The stream ends with a data: [DONE] marker.
This is the same wire format OpenAI uses, so OpenAI client SDKs work without changes.
Always send stream_options.include_usage: true if you bill or log per-token. With it, the server emits a final usage chunk so you get exact token counts even when the client disconnects mid-stream.
Each delta chunk:
Field-by-field:
include_usage: true)After the final content delta and before [DONE], the server emits one extra chunk with empty choices and a usage object:
If the client closes the connection mid-stream, the server still finalizes usage and records the tokens that were actually generated. You’re billed only for what the model produced before the disconnect — no double-charge on retry.
For voice-agent pipelines, start your TTS engine on the first delta.content chunk to mask end-to-end latency. See Best Practices and the Voice Agent cookbook.