WebSocket Support for TTS API


Our Text-to-Speech (TTS) API supports WebSocket communication, providing a real-time, low-latency streaming experience for applications that require instant speech synthesis. WebSockets allow continuous data exchange, making them ideal for use cases that demand uninterrupted audio generation.


When to Use WebSockets

1. Real-Time Streaming

WebSockets suit applications that need real-time speech synthesis, avoiding the delays associated with traditional HTTP requests.

2. Interactive Applications

For voice assistants, chatbots, and live transcription services, WebSockets keep audio playback smooth and uninterrupted and keep response times low.

3. Reduced Latency

A persistent WebSocket connection reduces the need for repeated request-response cycles, significantly improving performance for applications requiring rapid audio generation.


How It Works

  1. Establish a Connection: The client opens a WebSocket connection to our TTS API.
  2. Send Text Data: The client sends the text payload to be synthesized.
  3. Process in Chunks: The API breaks the text into chunks and processes them individually.
  4. Receive Audio Stream: As each chunk is processed, it is sent back to the client as a base64-encoded audio buffer.
  5. Completion: Once all chunks are processed, a complete message is sent to indicate the end of the stream.
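
The five steps above can be sketched from the client's side. This is a minimal Python sketch under stated assumptions, not an official client: `send` and `receive` are hypothetical callables standing in for the send and receive calls of whichever WebSocket library is used, and the field names (`status`, `data.audio`, `done`) are taken from the example messages in the Example Request Flow section.

```python
import base64
import json
from typing import Callable, Optional


def run_session(
    send: Callable[[str], None],
    receive: Callable[[], Optional[str]],
    text: str,
    voice_id: str,
) -> bytes:
    """Send one text payload and collect the streamed audio.

    `send` and `receive` are hypothetical stand-ins for a WebSocket
    library's calls on an already-open connection (step 1).
    """
    # Step 2: send the text payload as a JSON message.
    send(json.dumps({"text": text, "voice_id": voice_id}))

    audio = bytearray()
    while True:
        raw = receive()
        if raw is None:  # connection closed before the final message
            raise ConnectionError("stream ended without a completion message")
        message = json.loads(raw)
        # Step 5: the final message carries "done": true.
        if message.get("done"):
            break
        # Step 4: each chunk message holds base64-encoded audio.
        if message.get("status") == "chunk":
            audio.extend(base64.b64decode(message["data"]["audio"]))
    return bytes(audio)
```

Keeping the transport behind two callables means the same loop can be exercised with an in-memory list of messages before it is wired to a real connection.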

Timeout Behavior

By default, the WebSocket connection enforces a 20-second inactivity timeout. This means that if the client does not send any data within 20 seconds, the server will automatically close the connection to free up resources.

For use cases where clients need more time between messages (e.g., long pauses), the timeout can be extended up to 60 seconds.

To extend the timeout:

You can include the timeout parameter in the WebSocket URL like so:

    wss://api.smallest.ai/waves/v1/lightning-v2/get_speech/stream?timeout=60

This sets the inactivity timeout to 60 seconds. Valid values range from 20 (default) to 60 seconds.
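
A client can check the value before connecting rather than discover an invalid timeout at connection time. The helper below is a small Python sketch: `build_stream_url` and the constant names are hypothetical, while the URL and the 20 to 60 second range come from the text above. It assumes that passing `timeout=20` explicitly behaves the same as omitting the parameter.

```python
from urllib.parse import urlencode

BASE_URL = "wss://api.smallest.ai/waves/v1/lightning-v2/get_speech/stream"
MIN_TIMEOUT = 20  # default inactivity timeout, in seconds
MAX_TIMEOUT = 60  # largest documented value, in seconds


def build_stream_url(timeout: int = MIN_TIMEOUT) -> str:
    """Return the WebSocket URL with a validated `timeout` query parameter."""
    if not MIN_TIMEOUT <= timeout <= MAX_TIMEOUT:
        raise ValueError(
            f"timeout must be between {MIN_TIMEOUT} and {MAX_TIMEOUT} seconds"
        )
    return f"{BASE_URL}?{urlencode({'timeout': timeout})}"
```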


Implementation Details

The WebSocket TTS API is optimized to handle real-time text-to-speech conversions efficiently. Key aspects include:

  • Input Validation: Ensures the provided text and voice ID are valid before processing.
  • Chunk Processing: Long texts are split into smaller chunks (e.g., 240 characters) to optimize processing.
  • Voice Caching: The API fetches and caches voice configurations to reduce redundant database queries.
  • Task Queue System: Tasks are pushed to a Redis-based queue for efficient processing and real-time audio generation.
  • Error Handling: If any chunk fails, an error message is logged and sent to the client.
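
To make the chunk-processing step concrete, here is one way text could be split into pieces of at most 240 characters. This is an illustrative Python sketch only: the page does not describe the server's actual splitting rules, so the word-boundary strategy and the `split_into_chunks` name are assumptions.

```python
def split_into_chunks(text: str, max_chars: int = 240) -> list[str]:
    """Split text into chunks of at most `max_chars`, breaking on whitespace."""
    chunks: list[str] = []
    current = ""
    for word in text.split():
        # A single word longer than the limit is cut into fixed-size pieces.
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = f"{current} {word}" if current else word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The word does not fit: close the current chunk and start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks
```

Breaking on whitespace keeps words intact, which matters for speech synthesis because a word cut in half would be pronounced as two fragments.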

Example Request Flow

  1. The client sends a WebSocket message:

    {
      "text": "Hello, world!",
      "voice_id": "12345",
      "speed": 1.0,
      "sample_rate": 24000
    }
  2. The API validates the request and retrieves the voice settings.

  3. The text is split into chunks and processed in the background.

  4. The client receives responses like:

    {
      "request_id": "047c9091-b770-41d8-b96b-907d1c8406c0",
      "status": "chunk",
      "data": {
        "audio": "<base64_encoded_audio_chunk>"
      }
    }
  5. Once all chunks are sent, a final message is returned:
    {
      "request_id": "047c9091-b770-41d8-b96b-907d1c8406c0",
      "status": "complete",
      "message": "All chunks sent",
      "done": true
    }
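
Every response carries a `request_id`, so a client can build the request message and then group incoming chunks by that ID. The Python sketch below shows both halves. `build_request` and `audio_by_request` are hypothetical helpers, the default `speed` and `sample_rate` simply reuse the example values rather than documented defaults, and handling several requests over one connection is an assumption this page does not confirm.

```python
import base64
import json
from collections import defaultdict
from typing import Iterable


def build_request(
    text: str, voice_id: str, speed: float = 1.0, sample_rate: int = 24000
) -> str:
    """Serialize a request using the four fields shown in the example."""
    if not text.strip():
        raise ValueError("text must not be empty")
    return json.dumps(
        {"text": text, "voice_id": voice_id, "speed": speed, "sample_rate": sample_rate}
    )


def audio_by_request(raw_messages: Iterable[str]) -> dict[str, bytes]:
    """Concatenate decoded audio per request_id, for finished requests only."""
    buffers: dict[str, bytearray] = defaultdict(bytearray)
    finished: dict[str, bytes] = {}
    for raw in raw_messages:
        message = json.loads(raw)
        request_id = message["request_id"]
        if message.get("status") == "chunk":
            # Chunk messages hold base64-encoded audio under data.audio.
            buffers[request_id].extend(base64.b64decode(message["data"]["audio"]))
        elif message.get("done"):
            # The final message marks the request as finished.
            finished[request_id] = bytes(buffers.pop(request_id, bytearray()))
    return finished
```

Requests that never receive a final message are left out of the result, so a caller can tell a finished stream from one that was cut off.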

For implementation details, check our WebSocket API documentation.