
HTTP vs HTTP Streaming vs WebSocket

Choosing the Right Protocol for Your TTS Application: HTTP, HTTP Streaming, or WebSocket?

If you’re integrating Waves TTS into your application, one important decision is how to connect to the TTS engine. We support three protocols: HTTP, HTTP Streaming, and WebSocket, each tailored to different use cases. In this post, we’ll break down the strengths of each and help you choose the best fit for your needs.
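As a rough preview of the guidance in the sections that follow, the choice can be sketched as a small decision function. The function name, its two boolean parameters, and the returned labels are illustrative assumptions, not part of any Waves API; real decisions will also weigh infrastructure and team constraints.

```python
def choose_protocol(input_arrives_in_chunks: bool, needs_early_playback: bool) -> str:
    """Hypothetical helper that mirrors the guidance in this article."""
    if input_arrives_in_chunks:
        # Dynamic or incremental input (live typing, conversation) needs two-way traffic.
        return "websocket"
    if needs_early_playback:
        # Full text is known up front, but audio should start before synthesis finishes.
        return "http-streaming"
    # Short, complete requests where waiting for the whole file is acceptable.
    return "http"


if __name__ == "__main__":
    print(choose_protocol(input_arrives_in_chunks=False, needs_early_playback=True))
```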

HTTP: Best for Simplicity and Short Requests

What it is:
A classic REST-style interaction. You send a complete request (e.g., the full text to be converted to speech), and receive the synthesized audio as a downloadable response.

When to use it:

  • You have short or moderate-length texts.
  • You want a simple integration, such as from a browser, mobile app, or backend job.
  • You don’t need real-time feedback or streaming audio.

Pros and Cons:

| Pros | Cons |
| --- | --- |
| Simple to integrate with standard HTTP tools | Full audio is returned only after complete synthesis |
| Easy to debug and monitor | Not suitable for real-time or long-form audio |
| Stateless; good for serverless environments | Reconnect needed for each request |
| Works well with caching and CDNs | Higher latency compared to streaming methods |
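
The request/response pattern described in this section might look like the following minimal sketch, which uses only the Python standard library. The endpoint URL, the `text` and `voice_id` field names, and the bearer-token header are placeholders chosen for illustration; consult the API reference for the actual values.

```python
import json
import urllib.request

# Hypothetical endpoint, for illustration only.
TTS_URL = "https://api.example.com/tts"


def build_tts_request(text, api_key, voice_id="example-voice"):
    """Build a single POST request that carries the complete input text."""
    # Field names here are assumptions, not the documented request schema.
    payload = json.dumps({"text": text, "voice_id": voice_id}).encode("utf-8")
    return urllib.request.Request(
        TTS_URL,
        data=payload,
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize(text, api_key, timeout=30):
    """Send the full text and block until the complete audio body is returned."""
    request = build_tts_request(text, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # read() with no argument returns only once the whole body has arrived.
        return response.read()
```

Because each call is self-contained, a function like `synthesize` fits naturally into a serverless handler or a batch job, at the cost of waiting for the entire audio file before playback can begin.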

HTTP Streaming: Best for Faster Playback Without Complexity

What it is:
An enhancement of standard HTTP. The client sends a complete request, but the server streams the audio back as it is generated, so playback can begin before the full file is ready.

When to use it:

  • You want faster playback with lower perceived latency.
  • You send full input text but need audio to start as soon as possible.
  • You want low-latency audio delivery without handling connection persistence.

Pros and Cons:

| Pros | Cons |
| --- | --- |
| Lower latency than regular HTTP | One-way streaming only (server → client) |
| Compatible with standard HTTP infrastructure | Full input must still be sent before synthesis starts |
| Audio starts playing as it’s generated | No partial or live input updates |
| Easy to adopt with minimal changes | Slightly more complex than basic HTTP |
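
On the client side, the main change from plain HTTP is reading the response incrementally instead of all at once. The sketch below shows that consumption loop under some assumptions: `response` is any object with a `read(n)` method (such as the object returned by `urllib.request.urlopen`), `sink` is a hypothetical callback that hands audio bytes to a player or file, and an in-memory `io.BytesIO` stands in for the network response so the example runs offline.

```python
import io


def iter_audio_chunks(response, chunk_size=4096):
    """Yield audio bytes in pieces of at most chunk_size until the stream ends."""
    while True:
        chunk = response.read(chunk_size)
        if not chunk:  # empty bytes means the server has finished sending
            break
        yield chunk


def stream_to_sink(response, sink, chunk_size=4096):
    """Pass each chunk to sink as soon as it is read; return total bytes handled."""
    total = 0
    for chunk in iter_audio_chunks(response, chunk_size):
        sink(chunk)  # e.g. feed an audio player or append to a file
        total += len(chunk)
    return total


# Stand-in for a streaming HTTP response: 10,000 bytes of silence.
fake_response = io.BytesIO(b"\x00" * 10000)
received = []
total_bytes = stream_to_sink(fake_response, received.append)
```

A smaller `chunk_size` generally lets the first audio reach the player sooner, while a larger one reduces per-chunk overhead.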

WebSocket: Best for Real-Time, Interactive Applications

What it is:
A full-duplex, persistent connection that allows two-way communication between the client and server. You can send text dynamically and receive streaming audio back continuously.

When to use it:

  • You need real-time, interactive TTS responses.
  • Input is dynamic or arrives in chunks (e.g., live typing, conversation).
  • You want persistent connections with minimal overhead per message.

Pros and Cons:

| Pros | Cons |
| --- | --- |
| Ultra low latency | More complex to implement and manage |
| Supports real-time, chunked input and responses | Requires persistent connection management |
| Bi-directional communication | Not ideal for simple or infrequent tasks |
| Great for chatbots, live agents, or dictation apps | May require additional libraries or WebSocket support |
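
The full-duplex pattern can be illustrated without a network connection. In the sketch below, `FakeTTSConnection` is a hypothetical in-memory stand-in for a WebSocket that simply returns each text chunk as bytes in place of synthesized audio; a real client would use a WebSocket library and the message format given in the API reference. What the example does show is the structure: one task sends text chunks while another receives audio frames concurrently over the same connection.

```python
import asyncio


class FakeTTSConnection:
    """In-memory stand-in for a WebSocket; not a real TTS client."""

    def __init__(self):
        self._outbox = asyncio.Queue()

    async def send(self, text):
        # A real server would synthesize speech; here the text is just encoded.
        await self._outbox.put(text.encode("utf-8"))

    async def close_input(self):
        # Sentinel telling the receiver that no more audio will follow.
        await self._outbox.put(None)

    async def recv(self):
        return await self._outbox.get()


async def send_text(conn, text_chunks):
    """Push text to the connection as it becomes available."""
    for chunk in text_chunks:
        await conn.send(chunk)
        await asyncio.sleep(0)  # yield so the receiver can run between sends
    await conn.close_input()


async def receive_audio(conn):
    """Collect audio frames until the end-of-stream sentinel arrives."""
    frames = []
    while True:
        frame = await conn.recv()
        if frame is None:
            break
        frames.append(frame)
    return frames


async def converse(text_chunks):
    conn = FakeTTSConnection()
    # Sending and receiving run concurrently on the same connection.
    _, frames = await asyncio.gather(send_text(conn, text_chunks), receive_audio(conn))
    return frames


if __name__ == "__main__":
    print(asyncio.run(converse(["Hello, ", "world"])))
```

Splitting the sender and receiver into separate tasks is what lets audio for early text arrive while later text is still being produced, which is the main benefit over the two HTTP options.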