Streaming


Streaming TTS delivers audio chunks as they’re generated — playback starts immediately instead of waiting for the full file. First chunk arrives in ~100ms.


WebSocket Streaming

Persistent connections for continuous, low-latency audio. Best for conversational AI and real-time apps.

Endpoint: wss://api.smallest.ai/waves/v1/lightning-v3.1/get_speech/stream

```python
import asyncio
import base64
import json
import os
import wave

import websockets

API_KEY = os.environ["SMALLEST_API_KEY"]
WS_URL = "wss://api.smallest.ai/waves/v1/lightning-v3.1/get_speech/stream"

async def stream_tts(text):
    audio_chunks = []

    async with websockets.connect(
        WS_URL,
        # websockets < 14; releases >= 14 rename this to additional_headers
        extra_headers={"Authorization": f"Bearer {API_KEY}"},
    ) as ws:
        await ws.send(json.dumps({
            "text": text,
            "voice_id": "magnus",
            "sample_rate": 24000,
        }))

        while True:
            response = await ws.recv()
            data = json.loads(response)

            if data["status"] == "chunk":
                audio = base64.b64decode(data["data"]["audio"])
                audio_chunks.append(audio)
            elif data["status"] == "complete":
                break

    # Save as WAV (16-bit mono PCM)
    raw = b"".join(audio_chunks)
    with wave.open("streamed.wav", "wb") as wf:
        wf.setnchannels(1)
        wf.setsampwidth(2)
        wf.setframerate(24000)
        wf.writeframes(raw)

    print(f"Saved streamed.wav ({len(audio_chunks)} chunks)")

asyncio.run(stream_tts("Streaming delivers audio in real-time for voice assistants and chatbots."))
```

SSE Streaming

Server-Sent Events over HTTP — simpler to set up, no persistent connection needed.

Endpoint: POST https://api.smallest.ai/waves/v1/lightning-v3.1/stream

```python
import base64
import json
import os
import wave

import requests

API_KEY = os.environ["SMALLEST_API_KEY"]

response = requests.post(
    "https://api.smallest.ai/waves/v1/lightning-v3.1/stream",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
        "Accept": "text/event-stream",
    },
    json={
        "text": "SSE streaming is simpler to set up than WebSocket.",
        "voice_id": "magnus",
        "sample_rate": 24000,
    },
    stream=True,
)

audio_chunks = []
for line in response.iter_lines():
    if not line:
        continue
    line = line.decode()
    if not line.startswith("data: "):
        continue

    data = json.loads(line[6:])
    if data["status"] == "chunk":
        audio_chunks.append(base64.b64decode(data["data"]["audio"]))
    elif data["status"] == "complete":
        break

raw = b"".join(audio_chunks)
with wave.open("sse_output.wav", "wb") as wf:
    wf.setnchannels(1)
    wf.setsampwidth(2)
    wf.setframerate(24000)
    wf.writeframes(raw)
```

Streaming Text Input (SDK)

For real-time applications where text arrives incrementally (e.g., from an LLM), the SDK supports streaming text input:

```python
from smallestai.waves import TTSConfig, WavesStreamingTTS

config = TTSConfig(voice_id="magnus", api_key="YOUR_API_KEY", sample_rate=24000)
streaming_tts = WavesStreamingTTS(config)

def text_stream():
    """Simulates text arriving word by word (e.g., from an LLM)."""
    text = "Streaming synthesis with chunked text input."
    for word in text.split():
        yield word + " "

audio_chunks = []
for chunk in streaming_tts.synthesize_streaming(text_stream()):
    audio_chunks.append(chunk)
    # In a real app, play each chunk immediately
```

WebSocket vs SSE

| | WebSocket | SSE |
|---|---|---|
| Connection | Persistent, bidirectional | New HTTP request each time |
| Multiple messages | Reuse same connection | New request per message |
| Best for | Voice assistants, chatbots | Simple one-off streaming |
| Latency | Lowest (no reconnect overhead) | Slightly higher |
| Concurrency | Up to 5 connections per unit | Per-request |

Use WebSocket when sending multiple TTS requests over time (conversations, voice bots). Use SSE for simple one-shot streaming where you don’t need a persistent connection.
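Since a WebSocket connection can serve multiple messages, a conversation can send a fresh request after each `complete` message instead of reconnecting. A minimal sketch of that pattern (the `speak` and `conversation` helpers are illustrative, not part of the SDK):

```python
import asyncio
import base64
import json
import os

WS_URL = "wss://api.smallest.ai/waves/v1/lightning-v3.1/get_speech/stream"

async def speak(ws, text):
    """Send one TTS request on an already-open socket and collect its audio."""
    await ws.send(json.dumps({"text": text, "voice_id": "magnus", "sample_rate": 24000}))
    chunks = []
    while True:
        data = json.loads(await ws.recv())
        if data["status"] == "chunk":
            chunks.append(base64.b64decode(data["data"]["audio"]))
        elif data["status"] == "complete":
            return b"".join(chunks)

async def conversation(lines):
    """One persistent connection serves every turn of the conversation."""
    import websockets  # imported lazily so speak() stays dependency-free

    async with websockets.connect(
        WS_URL,
        extra_headers={"Authorization": f"Bearer {os.environ['SMALLEST_API_KEY']}"},
    ) as ws:
        return [await speak(ws, line) for line in lines]

# audio_per_turn = asyncio.run(conversation(["First reply.", "Second reply."]))
```

Each turn pays no connection or TLS handshake cost, which is where the latency advantage over SSE comes from.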

Response Format

Each WebSocket/SSE message is JSON:

Audio chunk:

```json
{
  "status": "chunk",
  "data": { "audio": "base64_encoded_pcm_data" }
}
```

Stream complete:

```json
{
  "status": "complete",
  "message": "All chunks sent",
  "done": true
}
```
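Because both transports use the same message shape, one dispatcher can serve WebSocket frames and SSE `data:` payloads alike. A small sketch (`handle_message` is a hypothetical helper, not part of the API):

```python
import base64
import json

def handle_message(raw, chunks):
    """Decode one streaming message into chunks; return True once the stream is done."""
    data = json.loads(raw)
    if data["status"] == "chunk":
        chunks.append(base64.b64decode(data["data"]["audio"]))
        return False
    return data["status"] == "complete"

chunks = []
done = handle_message('{"status": "chunk", "data": {"audio": "aGVsbG8="}}', chunks)
done = handle_message('{"status": "complete", "message": "All chunks sent", "done": true}', chunks)
```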

Configuration Parameters

| Parameter | Default | Description |
|---|---|---|
| `voice_id` | required | Voice identifier |
| `sample_rate` | 44100 | Audio sample rate (8000–44100 Hz) |
| `speed` | 1.0 | Speech speed (0.5–2.0) |
| `language` | auto | Language code |
| `output_format` | pcm | `pcm`, `mp3`, `wav`, or `mulaw` |
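If you issue requests from several places, it can help to centralize these defaults in one place. A small illustrative helper (not part of the SDK) that merges the table's defaults with per-request overrides:

```python
def build_request(text, voice_id, **overrides):
    """Return a request payload using the documented defaults unless overridden."""
    payload = {
        "text": text,
        "voice_id": voice_id,
        "sample_rate": 44100,
        "speed": 1.0,
        "language": "auto",
        "output_format": "pcm",
    }
    payload.update(overrides)
    return payload

req = build_request("Hello!", "magnus", sample_rate=24000)
```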

For concurrency limits and connection management, see Concurrency and Limits.

Full runnable source: streaming-python.py