TTS Evaluation Script

A standalone Python script for evaluating Lightning v3.1 synthesis performance over WebSocket. It connects to the streaming endpoint, sends text for synthesis, measures time to first byte (TTFB), and saves the output as a WAV file.

Use this to benchmark latency in your own environment, validate audio output quality, or integrate into automated evaluation pipelines.

Prerequisites

$ pip install websocket-client

Configuration

Parameter	Default	Description
`WS_URL`	`wss://waves-api.smallest.ai/api/v1/lightning-v3.1/get_speech/stream`	WebSocket endpoint
`TOKEN`	—	Your Smallest AI API key
`VOICE_ID`	`quinn`	Voice identifier
`SAMPLE_TEXT`	`Hi, this is sample text.`	Input text to synthesize
`OUTPUT_PATH`	`output.wav`	Output file path
`SAMPLE_RATE`	`44100`	Audio sample rate in Hz
`SPEED`	`1.0`	Speech speed (0.5-2.0)
`LANGUAGE`	`auto`	Language code or `auto` for detection

Script

#!/usr/bin/env python3
"""Measure TTFB for Lightning v3.1 streaming TTS and save the audio as a WAV file."""

import base64
import json
import time
import wave

from websocket import WebSocketApp

# =========== CONFIG ===========
WS_URL = "wss://waves-api.smallest.ai/api/v1/lightning-v3.1/get_speech/stream"
TOKEN = "<YOUR_API_KEY>"

HEADERS = {
    "Authorization": f"Bearer {TOKEN}"
}

VOICE_ID    = "quinn"
SAMPLE_TEXT = "Hi, this is sample text."
OUTPUT_PATH = "output.wav"
SAMPLE_RATE = 44100
SPEED       = 1.0
LANGUAGE    = "auto"


def save_wav(chunks, path, sample_rate=SAMPLE_RATE):
    """Decode base64 audio chunks and write a 16-bit mono WAV file."""
    pcm_data = b"".join(base64.b64decode(c) for c in chunks)
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)   # mono
        wf.setsampwidth(2)   # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate)
        wf.writeframes(pcm_data)
    print(f"Saved audio to: {path}")


def tts_and_save(text, voice_id, output_path):
    audio_chunks = []
    start_time = None
    ttfb_ms = None

    def on_open(ws):
        nonlocal start_time
        payload = {
            "voice_id": voice_id,
            "text": text,
            "language": LANGUAGE,
            "sample_rate": SAMPLE_RATE,
            "speed": SPEED,
        }
        # perf_counter() is monotonic, so the interval is unaffected by clock changes.
        start_time = time.perf_counter()
        ws.send(json.dumps(payload))
        print("Request sent...")

    def on_message(ws, message):
        nonlocal ttfb_ms
        data = json.loads(message)
        status = data.get("status") or (data.get("payload") or {}).get("status")

        if status == "error":
            # Handle the error here instead of raising from inside a callback.
            print("Server error:", data.get("message", "Unknown error"))
            ws.close()
            return

        audio_b64 = (data.get("data") or {}).get("audio")

        if audio_b64:
            # Measure TTFB on the first audio chunk only.
            if ttfb_ms is None:
                ttfb_ms = (time.perf_counter() - start_time) * 1000
                print(f"Time to first byte: {ttfb_ms:.1f} ms")
            audio_chunks.append(audio_b64)

        if status == "complete":
            ws.close()

    def on_error(ws, error):
        print("WebSocket error:", error)
        ws.close()

    def on_close(ws, *args):
        # start_time is still None if the connection never opened.
        if start_time is not None:
            total_ms = (time.perf_counter() - start_time) * 1000
            print(f"Total time: {total_ms:.1f} ms")

        if audio_chunks:
            save_wav(audio_chunks, output_path, sample_rate=SAMPLE_RATE)
        else:
            print("No audio received.")

    ws = WebSocketApp(
        WS_URL,
        header=[f"{k}: {v}" for k, v in HEADERS.items()],
        on_open=on_open,
        on_message=on_message,
        on_error=on_error,
        on_close=on_close,
    )
    ws.run_forever()


if __name__ == "__main__":
    tts_and_save(SAMPLE_TEXT, VOICE_ID, OUTPUT_PATH)

Usage

Replace <YOUR_API_KEY> with your Smallest AI API key.
Adjust VOICE_ID, SAMPLE_TEXT, SAMPLE_RATE, and other parameters as needed.
Save the script as tts_eval.py, then run it:

$ python tts_eval.py

Example output (timings vary with network conditions and input text):

Request sent...
Time to first byte: 187.3 ms
Total time: 1243.6 ms
Saved audio to: output.wav
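
A single run yields one TTFB sample, which can be noisy. For benchmarking, a rough approach is to collect TTFB values from repeated runs and summarize them. The sketch below assumes you have already gathered the values into a list yourself, for example by changing `tts_and_save` to return `ttfb_ms`, which the script does not do as written. `summarize_latency` is a hypothetical helper, and the 95th percentile uses the nearest-rank method.

```python
import statistics


def summarize_latency(samples_ms):
    """Summarize latency samples (milliseconds) from repeated runs."""
    if not samples_ms:
        raise ValueError("no latency samples provided")

    ordered = sorted(samples_ms)
    n = len(ordered)

    # Nearest-rank p95: the smallest value with at least 95% of samples
    # at or below it. Integer arithmetic computes ceil(0.95 * n) exactly.
    rank = (95 * n + 99) // 100

    return {
        "n": n,
        "min": ordered[0],
        "median": statistics.median(ordered),
        "mean": statistics.fmean(ordered),
        "p95": ordered[rank - 1],
        "max": ordered[-1],
    }
```

Reporting the median and p95 alongside the mean helps separate typical latency from occasional slow responses.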

What It Measures

Metric	Description
TTFB	Time from sending the request over the already-open WebSocket to receiving the first audio chunk; connection setup is not included. Primary latency indicator for real-time applications.
Total time	Time from send to connection close (all chunks received). Reflects full synthesis duration.
Audio output	Saved WAV file for manual listening or automated quality evaluation (e.g., WVMOS, MOS scoring).
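
Before passing the saved file to a quality metric, an automated pipeline may want a basic sanity check on it. The sketch below uses only the standard `wave` module to confirm the file matches what `save_wav` writes (mono, 16-bit, the configured sample rate) and contains audio. `inspect_wav` and `check_wav` are hypothetical helper names, not part of the script.

```python
import wave


def inspect_wav(path):
    """Read basic properties of a WAV file."""
    with wave.open(path, "rb") as wf:
        frames = wf.getnframes()
        rate = wf.getframerate()
        return {
            "channels": wf.getnchannels(),
            "sample_width_bytes": wf.getsampwidth(),
            "sample_rate": rate,
            "duration_s": frames / rate,
        }


def check_wav(path, expected_rate=44100):
    """Return a list of problems; an empty list means the file looks sane."""
    info = inspect_wav(path)
    problems = []
    if info["channels"] != 1:
        problems.append(f"expected mono, got {info['channels']} channels")
    if info["sample_width_bytes"] != 2:
        problems.append(f"expected 16-bit samples, got {info['sample_width_bytes'] * 8}-bit")
    if info["sample_rate"] != expected_rate:
        problems.append(f"expected {expected_rate} Hz, got {info['sample_rate']} Hz")
    if info["duration_s"] <= 0:
        problems.append("file contains no audio frames")
    return problems
```

For example, `check_wav("output.wav", expected_rate=44100)` would be called after the script finishes, with `expected_rate` matching the `SAMPLE_RATE` used for synthesis.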