Streaming
Why Streaming Matters
Streaming sends response chunks to the user as they are generated, rather than waiting for the complete response. For voice agents, this is essential: users hear the first words immediately instead of waiting in silence.
Basic Streaming
Set stream=True and yield each chunk.content for instant TTS playback.
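A minimal sketch of that loop. The LLM stream is simulated with a plain generator so the example is self-contained; with the OpenAI SDK, the text of each chunk would come from chunk.choices[0].delta.content instead.

```python
def fake_llm_stream():
    # Simulated token stream. A real call would look like:
    # client.chat.completions.create(model=..., messages=..., stream=True)
    for token in ["Hello", ",", " how", " can", " I", " help", "?"]:
        yield token

def stream_response(llm_stream):
    """Yield each chunk's text the moment it arrives, for immediate TTS playback."""
    for chunk in llm_stream:
        text = chunk  # with the OpenAI SDK: chunk.choices[0].delta.content
        if text and text.strip():  # skip empty or whitespace-only chunks
            yield text

spoken = "".join(stream_response(fake_llm_stream()))
print(spoken)  # Hello, how can I help?
```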
Streaming with Tools
Collect tool calls while streaming, execute them, then stream the follow-up response.
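A sketch of that pattern, with simplified chunk and tool types (real APIs stream tool names and arguments as fragments that must be reassembled; the Chunk class, run_tool registry, and weather result here are illustrative assumptions):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Chunk:
    content: Optional[str] = None
    tool_call: Optional[str] = None  # simplified: real APIs stream name/argument fragments

def run_tool(name):
    # Hypothetical tool registry for illustration
    return {"get_weather": "It is sunny in Oslo."}[name]

def stream_with_tools(first_pass, follow_up):
    """Speak text as it arrives, collect tool calls, then stream the follow-up turn."""
    pending = []
    for chunk in first_pass:
        if chunk.content:
            yield chunk.content              # forward text to TTS immediately
        if chunk.tool_call:
            pending.append(chunk.tool_call)  # collect without blocking the stream
    results = [run_tool(name) for name in pending]
    # In a real agent, follow_up would be a second stream=True LLM call
    # that receives the tool results as messages.
    yield from follow_up(results)

chunks = [Chunk(content="Let me check the weather. "), Chunk(tool_call="get_weather")]
spoken = "".join(stream_with_tools(iter(chunks), lambda results: iter(results)))
print(spoken)  # Let me check the weather. It is sunny in Oslo.
```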
Chunking Strategies
Word-by-Word (Default)
LLMs typically stream tokens, which map roughly to words or word fragments:
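For example, a single word can arrive split across several tokens; simply concatenating chunks in arrival order reconstructs the text:

```python
# Tokens from an LLM often split words into fragments.
tokens = ["Stream", "ing", " feels", " instant", "."]
text = "".join(tokens)
print(text)  # Streaming feels instant.
```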
Sentence Buffering
Buffer complete sentences for more natural speech boundaries:
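One way to implement this, as a framework-agnostic sketch: accumulate tokens and yield only when sentence-ending punctuation followed by whitespace appears, flushing any trailing fragment when the stream ends.

```python
import re

_SENTENCE_END = re.compile(r'([.!?])\s')

def buffer_sentences(token_stream):
    """Accumulate tokens and yield only complete sentences."""
    buf = ""
    for token in token_stream:
        buf += token
        while True:
            m = _SENTENCE_END.search(buf)
            if not m:
                break
            end = m.end(1)
            yield buf[:end]          # emit the completed sentence
            buf = buf[end:].lstrip() # keep whatever follows it
    if buf.strip():
        yield buf                    # flush the trailing fragment

sentences = list(buffer_sentences(["Hi there. ", "How are ", "you? ", "Good"]))
print(sentences)  # ['Hi there.', 'How are you?', 'Good']
```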
Phrase Buffering
Buffer by phrase for smoother speech rhythm:
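A similar sketch that yields on clause boundaries (commas, semicolons, sentence ends) once a minimum amount of text has accumulated; the 15-character threshold is an arbitrary assumption to tune against your TTS engine.

```python
def buffer_phrases(token_stream, min_chars=15):
    """Yield at phrase boundaries once enough text has accumulated."""
    buf = ""
    for token in token_stream:
        buf += token
        # Emit when the buffer is long enough and ends at a natural pause.
        if len(buf) >= min_chars and buf.rstrip().endswith((",", ";", ".", "!", "?")):
            yield buf.strip()
            buf = ""
    if buf.strip():
        yield buf.strip()  # flush the remainder

phrases = list(buffer_phrases(["Sure, ", "I can help ", "with that, ", "right away."]))
print(phrases)  # ['Sure, I can help with that,', 'right away.']
```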
Intermediate Feedback
Provide feedback while processing long operations:
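A minimal sketch: yield a short acknowledgement before the slow work starts, so the TTS engine has something to play. slow_lookup is a stand-in for any long-running tool call, such as a database query or external API request.

```python
def answer_with_feedback(slow_lookup):
    """Yield a short acknowledgement immediately, then the result when ready."""
    yield "Let me look that up. "  # spoken while the slow operation runs
    result = slow_lookup()         # stand-in for a database query or API call
    yield f"Here is what I found: {result}"

parts = list(answer_with_feedback(lambda: "3 open tickets"))
print(parts[0] + parts[1])
```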
Streaming Best Practices
Do's and Don'ts
Do: Always set stream=True for LLM calls. Yield chunks as soon as they are available. Provide intermediate feedback during long operations. Keep responses concise, since shorter responses are spoken sooner.
Don't: Buffer the entire response before yielding. Leave users in silence for more than two seconds. Yield empty or whitespace-only chunks.
Measuring Stream Performance
Track time-to-first-chunk:
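A sketch of one way to do this: wrap the chunk stream in a pass-through generator that records how long the first chunk took to arrive. The metrics dict is an illustrative assumption; in production you would report this to your metrics system instead.

```python
import time

def measure_ttfc(stream, metrics):
    """Yield chunks unchanged while recording time-to-first-chunk in `metrics`."""
    start = time.monotonic()
    for i, chunk in enumerate(stream):
        if i == 0:
            metrics["ttfc_ms"] = (time.monotonic() - start) * 1000
        yield chunk

metrics = {}
out = list(measure_ttfc(iter(["Hi", " there"]), metrics))
print(out, round(metrics["ttfc_ms"], 1))
```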

