---
title: "How to use Streaming Text to Speech (TTS) with websockets"
description: "Learn how to convert text to speech with real-time streaming synthesis."
icon: "bars-staggered"
---

## Real-time Text to Speech Synthesis

The `WavesStreamingTTS` class provides high-performance text-to-speech conversion with configurable streaming parameters. It is optimized for low-latency applications where immediate audio feedback is critical, such as voice assistants, live narration, and interactive applications.

### Configuration Setup

The streaming TTS system uses a `TTSConfig` object to manage synthesis parameters:

```python
from smallestai.waves import TTSConfig, WavesStreamingTTS

config = TTSConfig(
    voice_id="aditi",
    api_key="YOUR_SMALLEST_API_KEY",
    sample_rate=24000,
    speed=1.0,
    max_buffer_flush_ms=100
)

streaming_tts = WavesStreamingTTS(config)
```

### Basic Text Synthesis

For straightforward text-to-speech conversion, use the `synthesize` method:

```python
text = "Hello world, this is a test of the Smallest AI streaming TTS SDK."

audio_chunks = []
for chunk in streaming_tts.synthesize(text):
    audio_chunks.append(chunk)
```

### Streaming Text Input

For real-time applications where text arrives incrementally, use `synthesize_streaming`, which accepts an iterable of text fragments:

```python
def text_stream():
    text = "Streaming synthesis with chunked text input for Smallest SDK."
    for word in text.split():
        yield word + " "

audio_chunks = []
for chunk in streaming_tts.synthesize_streaming(text_stream()):
    audio_chunks.append(chunk)
```

### Saving Audio to WAV File

Convert the raw PCM audio chunks to a standard WAV file:

```python
import wave

def save_audio_chunks_to_wav(audio_chunks, filename="output.wav"):
    with wave.open(filename, 'wb') as wf:
        wf.setnchannels(1)      # Mono
        wf.setsampwidth(2)      # 16-bit
        wf.setframerate(24000)  # 24 kHz
        wf.writeframes(b''.join(audio_chunks))

text = "Your text to synthesize here."
audio_chunks = list(streaming_tts.synthesize(text))
save_audio_chunks_to_wav(audio_chunks, "speech_output.wav")
```

### Configuration Parameters

* **`voice_id`**: Voice identifier (e.g., `"aditi"`, `"male-1"`, `"female-2"`)
* **`api_key`**: Your Smallest AI API key
* **`language`**: Language code for synthesis (default: `"en"`)
* **`sample_rate`**: Audio sample rate in Hz (default: `24000`)
* **`speed`**: Speech speed multiplier (default: `1.0` for normal speed; `0.5` is half speed, `2.0` is double speed)
* **`consistency`**: Voice consistency parameter (default: `0.5`, range: `0.0`-`1.0`)
* **`enhancement`**: Audio enhancement level (default: `1`)
* **`similarity`**: Voice similarity parameter (default: `0`, range: `0.0`-`1.0`)
* **`max_buffer_flush_ms`**: Maximum buffer time in milliseconds before audio output is forced (default: `0`)

### Output Format

The streaming TTS returns raw PCM audio data as `bytes` objects. Each chunk represents a portion of the synthesized audio and can be:

* Played directly through audio hardware
* Saved to audio files (WAV, MP3, etc.)
* Streamed over network protocols
* Processed with additional audio effects

The raw format ensures minimal latency and maximum flexibility for real-time applications where immediate audio feedback is essential.
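Because the output is headerless 16-bit mono PCM, the duration of the synthesized audio follows directly from the byte count: each frame is 2 bytes, so at 24 kHz one second of audio is 48,000 bytes. A small helper (illustrative, not part of the SDK) makes this concrete:

```python
def pcm_duration_seconds(pcm_bytes, sample_rate=24000, sample_width=2, channels=1):
    """Duration of raw PCM audio in seconds: bytes / (rate * width * channels)."""
    bytes_per_second = sample_rate * sample_width * channels
    return len(pcm_bytes) / bytes_per_second

# 48,000 bytes of 16-bit mono PCM at 24 kHz is exactly one second
print(pcm_duration_seconds(b"\x00" * 48000))  # → 1.0
```

This is handy for logging measured latency or for verifying that the amount of audio received matches the text you sent.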
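For long syntheses you can also write chunks to disk as they arrive instead of buffering them all in memory: the standard-library `wave` module allows repeated `writeframes` calls on an open file and fixes up the header sizes on close. A minimal sketch, where the placeholder chunk list stands in for the generator returned by `streaming_tts.synthesize`:

```python
import wave

def stream_to_wav(chunks, filename, sample_rate=24000):
    """Write 16-bit mono PCM chunks to a WAV file as they arrive."""
    with wave.open(filename, 'wb') as wf:
        wf.setnchannels(1)      # Mono
        wf.setsampwidth(2)      # 16-bit
        wf.setframerate(sample_rate)
        for chunk in chunks:        # e.g. streaming_tts.synthesize(text)
            wf.writeframes(chunk)   # appended incrementally, no full buffer needed

# Placeholder PCM data (silence) standing in for SDK output:
# 10 chunks of 1200 frames (2 bytes each) = 12,000 frames = 0.5 s at 24 kHz
fake_chunks = [b"\x00\x00" * 1200 for _ in range(10)]
stream_to_wav(fake_chunks, "streamed_output.wav")
```

Compared with `b''.join(audio_chunks)`, this keeps memory use constant regardless of how long the synthesized audio is.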