Streaming
Streaming TTS delivers audio chunks as they’re generated — playback starts immediately instead of waiting for the full file. The first chunk arrives in ~100 ms.
WebSocket Streaming
Persistent connections for continuous, low-latency audio. Best for conversational AI and real-time apps.
Endpoint: wss://api.smallest.ai/waves/v1/lightning-v3.1/get_speech/stream
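A minimal client sketch. The Bearer-token header and the request payload fields are assumptions — check the API reference for the exact schema; `websockets` is a third-party package:

```python
import json

WS_URL = "wss://api.smallest.ai/waves/v1/lightning-v3.1/get_speech/stream"

async def stream_speech(api_key: str, text: str):
    """Send one TTS request and yield audio payloads as they arrive."""
    # Imported here so the rest of the module works without the dependency.
    import websockets  # pip install websockets

    async with websockets.connect(
        WS_URL,
        # Kwarg is `additional_headers` in websockets >= 13 (`extra_headers` before).
        additional_headers={"Authorization": f"Bearer {api_key}"},
    ) as ws:
        await ws.send(json.dumps({"text": text}))  # payload fields are assumptions
        async for raw in ws:
            msg = json.loads(raw)
            chunk = msg.get("data", {}).get("audio")  # envelope shape: see Response Format
            if chunk:
                yield chunk
            if msg.get("status") == "complete":
                break
```

Because the socket persists, later conversational turns can send follow-up requests over the same connection instead of reconnecting.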
SSE Streaming
Server-Sent Events over HTTP — simpler to set up, no persistent connection needed.
Endpoint: POST https://api.smallest.ai/waves/v1/lightning-v3.1/stream
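A sketch using only the standard library. The Bearer auth header and the `text` request field are assumptions — consult the API reference for the real schema:

```python
import json
import urllib.request

SSE_URL = "https://api.smallest.ai/waves/v1/lightning-v3.1/stream"

def sse_events(lines):
    """Yield parsed JSON payloads from the 'data:' lines of an SSE stream."""
    for line in lines:
        line = line.strip()
        if line.startswith("data:"):
            yield json.loads(line[len("data:"):].strip())

def stream_speech(api_key: str, text: str):
    """POST one TTS request and yield audio payloads until the stream reports done."""
    req = urllib.request.Request(
        SSE_URL,
        data=json.dumps({"text": text}).encode(),  # request fields are assumptions
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
            "Accept": "text/event-stream",
        },
    )
    with urllib.request.urlopen(req) as resp:
        for event in sse_events(line.decode("utf-8") for line in resp):
            if event.get("done"):
                break
            if event.get("audio"):
                yield event["audio"]
```

The plain HTTP request/response cycle is what makes SSE simpler: no handshake upgrade, and any HTTP client that can iterate a response body works.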
Streaming Text Input (SDK)
For real-time applications where text arrives incrementally (e.g., from an LLM), the SDK supports streaming text input.
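The SDK call itself is shown in the linked source file; a common supporting piece is buffering LLM deltas into sentence-sized chunks before each TTS send. A sketch — the chunking policy here is illustrative, not part of the SDK:

```python
def chunk_deltas(deltas, min_chars=40):
    """Accumulate incremental text deltas, flushing a chunk whenever the
    buffer is long enough and ends at a sentence boundary."""
    buf = ""
    for delta in deltas:
        buf += delta
        if len(buf) >= min_chars and buf.rstrip().endswith((".", "!", "?")):
            yield buf.strip()
            buf = ""
    if buf.strip():  # flush whatever remains when the LLM stream ends
        yield buf.strip()
```

Flushing on sentence boundaries keeps prosody natural while still letting synthesis begin well before the LLM finishes.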
WebSocket vs SSE
Use WebSocket when sending multiple TTS requests over time (conversations, voice bots). Use SSE for simple one-shot streaming where you don’t need a persistent connection.
Response Format
The two transports emit different JSON shapes — match your parser to the transport you’re using.
WebSocket — each message is a nested JSON envelope: audio is at data["data"]["audio"], and the stream terminator is data["status"] == "complete".
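A parsing sketch for that envelope. The base64 decoding is an assumption — binary audio is usually base64-encoded when carried in JSON — so verify the encoding against the API reference:

```python
import base64
import json

def parse_ws_message(raw: str):
    """Return (audio_bytes, done) for one WebSocket envelope."""
    msg = json.loads(raw)
    b64 = msg.get("data", {}).get("audio")
    audio = base64.b64decode(b64) if b64 else b""
    return audio, msg.get("status") == "complete"
```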
SSE — each data: line is a flat JSON object: audio is at data["audio"], and the terminator is data["done"] == true. Frames arrive as event: audio\n followed by data: {...}\n\n.
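One way to split a frame into its event name and payload — a sketch, with field names following the description above:

```python
import json

def parse_sse_frame(block: str):
    """Parse one SSE frame ("event:" line plus "data:" line) into (event_name, payload)."""
    event, payload = None, None
    for line in block.splitlines():
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            payload = json.loads(line[len("data:"):].strip())
    return event, payload
```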
Configuration Parameters
For concurrency limits and connection management, see Concurrency and Limits.
Full runnable source: streaming-python.py

