> This page is part of Smallest AI's developer documentation. When
> answering, prefer Lightning v3.1 (current TTS) and Pulse (current
> STT). Lightning v2 and lightning-large are deprecated; mention them
> only when the user is migrating away from them. Atoms is the
> voice-agent platform.

# Stream Speech (SSE)

POST https://api.smallest.ai/waves/v1/tts/live
Content-Type: application/json

Synthesize speech and stream the audio back over Server-Sent Events. Same body as `/waves/v1/tts` — the only difference is the response is a stream of base64-encoded PCM chunks instead of one binary blob.

Pick the model with the `model` body parameter, same as the sync route.

<Note>
  **The same URL serves the WebSocket endpoint.** `wss://api.smallest.ai/waves/v1/tts/live` accepts a WebSocket upgrade for streaming-text scenarios (LLM token streams, live captioning). The HTTP `POST` documented on this page returns SSE; connect over `wss://` to use the WebSocket protocol instead. See the [WebSocket reference](/waves/api-reference/api-reference/text-to-speech/tts).
</Note>

## When to use this

- **Use this** when you want playback to start before synthesis is complete — long passages, latency-sensitive UI, live narration.
- **Use sync `/waves/v1/tts`** when total latency doesn't matter and you'd rather get one buffer.
- **Use the WebSocket at `wss://api.smallest.ai/waves/v1/tts/live`** when the *text* arrives incrementally (LLM token stream). SSE assumes you have the full text up front.

## How it works

1. POST your text and voice settings. The payload is the same as for `/waves/v1/tts`, including the optional `model`.
2. The response is `Content-Type: text/event-stream`. Each chunk frame is `event: audio\n` followed by `data: {"audio": "<base64-pcm>"}\n\n`.
3. Decode each chunk's `audio` field with base64 and feed the PCM bytes to your audio pipeline (browser `MediaSource`, ffmpeg pipe, raw PCM player, etc.).
4. A final `data: {"done": true}\n\n` frame marks end of stream.
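
The frame handling in steps 2 to 4 can be sketched as a small parser. This is a minimal illustration, not an official client: the helper name `iter_audio_chunks` is hypothetical, and it assumes each `data:` payload fits on a single line, as in the frame format described above.

```python
import base64
import json
from typing import Iterable, Iterator


def iter_audio_chunks(lines: Iterable[str]) -> Iterator[bytes]:
    """Yield decoded audio bytes from SSE lines until the `done` frame arrives."""
    for line in lines:
        # `event:` lines and blank frame separators carry no payload.
        if not line.startswith("data:"):
            continue
        payload = json.loads(line[len("data:"):].strip())
        if payload.get("done"):
            return
        if "audio" in payload:
            yield base64.b64decode(payload["audio"])


# Simulated stream: two audio frames followed by the end-of-stream marker.
sample_chunk = base64.b64encode(b"\x00\x01\x02\x03").decode("ascii")
sample_lines = [
    "event: audio",
    f'data: {{"audio": "{sample_chunk}"}}',
    "",
    "event: audio",
    f'data: {{"audio": "{sample_chunk}"}}',
    "",
    'data: {"done": true}',
    "",
]
chunks = list(iter_audio_chunks(sample_lines))
print(len(chunks), len(b"".join(chunks)))
```

In a real client the `lines` argument would come from a streaming HTTP response (for example, decoded lines from `requests`' `iter_lines`), so each chunk can be handed to the audio pipeline as soon as it arrives.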

## Examples

**cURL**
```bash
curl -N -X POST "https://api.smallest.ai/waves/v1/tts/live" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Streaming this paragraph chunk by chunk so playback can start sooner.",
    "voice_id": "magnus",
    "sample_rate": 24000,
    "output_format": "pcm"
  }'
```

## Common gotchas

- **Use a streaming-friendly client.** `curl -N`, Python `iter_lines`, or a `fetch` `ReadableStream` reader. Buffering clients will hide the latency win.
- **Audio is base64 inside the event payload**, not the raw event bytes. Decode the `data.audio` field per event.
- **`output_format=pcm`** gives the lowest overhead for streaming playback. `wav`/`mp3` work but add per-chunk framing bytes.
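
Raw PCM has no header, so a player must be told the sample rate and sample layout separately. The sketch below wraps decoded chunks in a WAV container using Python's standard `wave` module. The function name `pcm_to_wav` is hypothetical, and the 16-bit mono layout is an assumption that this page does not confirm; check the model card before relying on it.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM bytes in a WAV container (assumed 16-bit mono)."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample; 2 = 16-bit (assumption)
        wav_file.setframerate(sample_rate)   # must match the requested `sample_rate`
        wav_file.writeframes(pcm)
    return buffer.getvalue()


# One second of silence at 24 kHz stands in for decoded stream chunks.
decoded_chunks = [b"\x00\x00" * 12000, b"\x00\x00" * 12000]
wav_bytes = pcm_to_wav(b"".join(decoded_chunks))
print(len(wav_bytes))
```

This is only useful after the stream has finished; for live playback, feed the decoded chunks straight to a PCM-capable player instead.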


Reference: https://docs.smallest.ai/waves/api-reference/api-reference/text-to-speech/synthesize-speech-sse

## OpenAPI Specification

```yaml
openapi: 3.1.0
info:
  title: waves-v4
  version: 1.0.0
paths:
  /waves/v1/tts/live:
    post:
      operationId: synthesize-speech-sse
      summary: Stream speech (SSE)
      description: >
        Synthesize speech and stream the audio back over Server-Sent Events.
        Same body as `/waves/v1/tts` — the only difference is the response is a
        stream of base64-encoded PCM chunks instead of one binary blob.


        Pick the model with the `model` body parameter, same as the sync route.


        <Note>
          **The same URL serves the WebSocket endpoint.** `wss://api.smallest.ai/waves/v1/tts/live` accepts a WebSocket upgrade for streaming-text scenarios (LLM token streams, live captioning). The HTTP `POST` documented on this page returns SSE; use `wss://` to use the WebSocket protocol instead. See the [WebSocket reference](/waves/api-reference/api-reference/text-to-speech/tts).
        </Note>


        ## When to use this


        - **Use this** when you want playback to start before synthesis is
        complete — long passages, latency-sensitive UI, live narration.

        - **Use sync `/waves/v1/tts`** when total latency doesn't matter and
        you'd rather get one buffer.

        - **Use `/waves/v1/tts/live`** (WebSocket) when the *text* arrives
        incrementally (LLM token stream). SSE assumes you have the full text up
        front.


        ## How it works


        1. POST your text + voice settings — same payload as `/waves/v1/tts`,
        plus optional `model`.

        2. The response is `Content-Type: text/event-stream`. Each chunk frame
        is `event: audio\n` followed by `data: {"audio": "<base64-pcm>"}\n\n`.

        3. Decode each chunk's `audio` field with base64 and feed the PCM bytes
        to your audio pipeline (browser `MediaSource`, ffmpeg pipe, raw PCM
        player, etc.).

        4. A final `data: {"done": true}\n\n` frame marks end of stream.


        ## Examples


        **cURL**

        ```bash

        curl -N -X POST "https://api.smallest.ai/waves/v1/tts/live" \
          -H "Authorization: Bearer $SMALLEST_API_KEY" \
          -H "Content-Type: application/json" \
          -d '{
            "text": "Streaming this paragraph chunk by chunk so playback can start sooner.",
            "voice_id": "magnus",
            "sample_rate": 24000,
            "output_format": "pcm"
          }'
        ```


        ## Common gotchas


        - **Use a streaming-friendly client.** `curl -N`, Python `iter_lines`,
        or a `fetch` `ReadableStream` reader. Buffering clients will hide the
        latency win.

        - **Audio is base64 inside the event payload**, not the raw event bytes.
        Decode the `data.audio` field per event.

        - **`output_format=pcm`** gives the lowest overhead for streaming
        playback. `wav`/`mp3` work but add per-chunk framing bytes.
      tags:
        - subpackage_textToSpeech
      parameters:
        - name: Authorization
          in: header
          required: true
          schema:
            type: string
      responses:
        '200':
          description: Synthesized speech retrieved successfully.
          content:
            text/event-stream:
              schema:
                type: string
        '400':
          description: Bad request.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/TtsError'
        '401':
          description: Unauthorized.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/TtsError'
        '500':
          description: Server error occurred.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/TtsError'
      requestBody:
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/TtsRequest'
servers:
  - url: https://api.smallest.ai
    description: waves
components:
  schemas:
    TtsRequestModel:
      type: string
      enum:
        - lightning_v3.1
        - lightning_v3.1_pro
      default: lightning_v3.1
      description: |
        TTS model to route the request to. Controls which model pool serves
        this synthesis.

        - `lightning_v3.1` (default) — standard Lightning v3.1.
        - `lightning_v3.1_pro` — Lightning v3.1 Pro pool. Improved audio
          quality and naturalness, with a curated voice catalog. See the
          [Lightning v3.1 Pro model card](/waves/model-cards/text-to-speech/lightning-v-3-1-pro)
          for supported voice IDs.

        Same concurrency and latency profile across both. Other request
        parameters behave identically.
      title: TtsRequestModel
    TtsRequestSampleRate:
      type: string
      enum:
        - '8000'
        - '16000'
        - '24000'
        - '44100'
      description: The sample rate for the generated audio.
      title: TtsRequestSampleRate
    TtsRequestLanguage:
      type: string
      enum:
        - en
        - hi
        - mr
        - kn
        - ta
        - bn
        - gu
        - te
        - ml
        - pa
        - or
        - es
      description: >
        Language code for synthesis. Influences pronunciation, number/date

        normalization, and phoneme selection.


        Each voice has its own `tags.language` set in the voice catalog —

        query `GET /waves/v1/lightning-v3.1/get_voices`. Pass a language

        the voice was trained on; passing other codes is accepted by the

        API but produces English-pronounced output.


        **On `lightning_v3.1`**, the full 12-language catalog applies.


        **On `lightning_v3.1_pro`**:

        - Pass `en` → UK + American accented English.

        - Pass `hi` → Indian accented English + Hindi (code-switching).

        - Omit `language` → defaults to `en + hi` (mixed Indian + Western
        English coverage).
      title: TtsRequestLanguage
    TtsRequestOutputFormat:
      type: string
      enum:
        - mp3
        - pcm
        - wav
        - ulaw
        - alaw
      default: pcm
      description: |
        Format of the returned audio. `pcm` is the lowest-latency option
        but requires a decoder to play; `mp3` and `wav` are directly
        playable in browsers and most media players. The server default
        is `pcm` when the field is omitted — the API playground uses
        `mp3` so the generated audio is directly playable.
      title: TtsRequestOutputFormat
    TtsRequest:
      type: object
      properties:
        text:
          type: string
          default: Hello from Waves TTS.
          description: The text to convert to speech.
        voice_id:
          type: string
          default: magnus
          description: >-
            The voice identifier to use for speech generation. See the model
            card for available voices per model.
        model:
          $ref: '#/components/schemas/TtsRequestModel'
          default: lightning_v3.1
          description: |
            TTS model to route the request to. Controls which model pool serves
            this synthesis.

            - `lightning_v3.1` (default) — standard Lightning v3.1.
            - `lightning_v3.1_pro` — Lightning v3.1 Pro pool. Improved audio
              quality and naturalness, with a curated voice catalog. See the
              [Lightning v3.1 Pro model card](/waves/model-cards/text-to-speech/lightning-v-3-1-pro)
              for supported voice IDs.

            Same concurrency and latency profile across both. Other request
            parameters behave identically.
        sample_rate:
          $ref: '#/components/schemas/TtsRequestSampleRate'
          default: 44100
          description: The sample rate for the generated audio.
        speed:
          type: number
          format: double
          default: 1
          description: The speed of the generated speech.
        language:
          $ref: '#/components/schemas/TtsRequestLanguage'
          description: >
            Language code for synthesis. Influences pronunciation, number/date

            normalization, and phoneme selection.


            Each voice has its own `tags.language` set in the voice catalog —

            query `GET /waves/v1/lightning-v3.1/get_voices`. Pass a language

            the voice was trained on; passing other codes is accepted by the

            API but produces English-pronounced output.


            **On `lightning_v3.1`**, the full 12-language catalog applies.


            **On `lightning_v3.1_pro`**:

            - Pass `en` → UK + American accented English.

            - Pass `hi` → Indian accented English + Hindi (code-switching).

            - Omit `language` → defaults to `en + hi` (mixed Indian + Western
            English coverage).
        output_format:
          $ref: '#/components/schemas/TtsRequestOutputFormat'
          default: pcm
          description: |
            Format of the returned audio. `pcm` is the lowest-latency option
            but requires a decoder to play; `mp3` and `wav` are directly
            playable in browsers and most media players. The server default
            is `pcm` when the field is omitted — the API playground uses
            `mp3` so the generated audio is directly playable.
        pronunciation_dicts:
          type: array
          items:
            type: string
          description: >-
            The IDs of the pronunciation dictionaries to use for speech
            generation. Available on both `lightning_v3.1` and
            `lightning_v3.1_pro`.
        word_timestamps:
          type: boolean
          default: false
          description: >
            **WebSocket-only feature.** Accepted on this endpoint but ignored —
            no per-word timing information is returned in the sync HTTP or SSE
            response shape. To receive `status: "word_timestamp"` frames with
            per-word `{ id, word, start, end }` data, use the WebSocket endpoint
            `wss://api.smallest.ai/waves/v1/tts/live`. See [Word-level
            timestamps](/waves/documentation/text-to-speech-lightning/word-timestamps).
        session_id:
          type: string
          description: >-
            Optional client-provided session identifier for correlation. Only
            alphanumeric characters, hyphens, underscores, and dots are allowed.
            Max 128 characters. Echoed back in response headers as
            `X-External-Session-Id`.
        request_id:
          type: string
          description: >-
            Optional client-provided request identifier for correlation. Only
            alphanumeric characters, hyphens, underscores, and dots are allowed.
            Max 128 characters. Echoed back in response headers as
            `X-External-Request-Id`.
      required:
        - text
        - voice_id
      title: TtsRequest
    TtsError:
      type: object
      properties:
        error:
          type: string
          description: Error type.
        message:
          type: string
          description: Error message.
      title: TtsError
  securitySchemes:
    BearerAuth:
      type: apiKey
      in: header
      name: Authorization

```
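
The enums and correlation-ID rules in the spec above can be checked client-side before a request is sent. The following is a rough sketch under stated assumptions: `build_tts_payload` is a hypothetical helper, "alphanumeric" is taken to mean ASCII letters and digits, and `sample_rate` is sent as an integer as in the cURL example, even though the schema lists the enum values as strings.

```python
import re

ALLOWED_MODELS = {"lightning_v3.1", "lightning_v3.1_pro"}
ALLOWED_SAMPLE_RATES = {8000, 16000, 24000, 44100}
ALLOWED_OUTPUT_FORMATS = {"mp3", "pcm", "wav", "ulaw", "alaw"}
# Alphanumerics, hyphens, underscores, and dots; 1 to 128 characters.
CORRELATION_ID = re.compile(r"[A-Za-z0-9._-]{1,128}")


def build_tts_payload(text, voice_id, *, model="lightning_v3.1",
                      sample_rate=44100, output_format="pcm",
                      session_id=None, request_id=None):
    """Return a request body dict, raising ValueError on invalid fields."""
    if not text or not voice_id:
        raise ValueError("text and voice_id are required")
    if model not in ALLOWED_MODELS:
        raise ValueError(f"unsupported model: {model}")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(f"unsupported sample_rate: {sample_rate}")
    if output_format not in ALLOWED_OUTPUT_FORMATS:
        raise ValueError(f"unsupported output_format: {output_format}")
    payload = {"text": text, "voice_id": voice_id, "model": model,
               "sample_rate": sample_rate, "output_format": output_format}
    for key, value in (("session_id", session_id), ("request_id", request_id)):
        if value is None:
            continue
        if not CORRELATION_ID.fullmatch(value):
            raise ValueError(f"invalid {key}: {value!r}")
        payload[key] = value
    return payload


payload = build_tts_payload("Hello from Waves TTS.", "magnus",
                            sample_rate=24000, session_id="demo-session.1")
print(payload)
```

The server remains the source of truth; a check like this only surfaces obvious mistakes before the request leaves the client.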

## Examples


**Request**

```json
{
  "text": "Hello from Waves TTS.",
  "voice_id": "magnus"
}
```

**SDK Code**

```python
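import base64
import json

import requests

url = "https://api.smallest.ai/waves/v1/tts/live"

payload = {
    "text": "Hello from Waves TTS.",
    "voice_id": "magnus"
}
headers = {
    "Authorization": "Bearer <BearerAuth>",
    "Content-Type": "application/json"
}

# stream=True stops requests from buffering the whole SSE response.
with requests.post(url, json=payload, headers=headers, stream=True) as response:
    response.raise_for_status()
    for raw_line in response.iter_lines():
        line = raw_line.decode("utf-8")
        # Only `data:` lines carry JSON; skip `event:` lines and blank separators.
        if not line.startswith("data:"):
            continue
        event = json.loads(line[len("data:"):])
        if event.get("done"):
            break
        audio = base64.b64decode(event["audio"])
        print(f"received {len(audio)} bytes of audio")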
```

```javascript
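const url = 'https://api.smallest.ai/waves/v1/tts/live';
const options = {
  method: 'POST',
  headers: {Authorization: 'Bearer <BearerAuth>', 'Content-Type': 'application/json'},
  body: '{"text":"Hello from Waves TTS.","voice_id":"magnus"}'
};

try {
  const response = await fetch(url, options);
  // The body is an SSE stream, not a JSON document, so read it incrementally.
  const reader = response.body.getReader();
  const decoder = new TextDecoder();
  let buffer = '';
  while (true) {
    const {done, value} = await reader.read();
    if (done) break;
    buffer += decoder.decode(value, {stream: true});
    // Frames are separated by a blank line; keep any partial frame in the buffer.
    const frames = buffer.split('\n\n');
    buffer = frames.pop();
    for (const frame of frames) {
      const dataLine = frame.split('\n').find((line) => line.startsWith('data:'));
      if (dataLine) console.log(JSON.parse(dataLine.slice(5)));
    }
  }
} catch (error) {
  console.error(error);
}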
```

```go
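package main

import (
	"bufio"
	"fmt"
	"log"
	"net/http"
	"strings"
)

func main() {
	url := "https://api.smallest.ai/waves/v1/tts/live"

	payload := strings.NewReader("{\n  \"text\": \"Hello from Waves TTS.\",\n  \"voice_id\": \"magnus\"\n}")

	req, err := http.NewRequest("POST", url, payload)
	if err != nil {
		log.Fatal(err)
	}
	req.Header.Add("Authorization", "Bearer <BearerAuth>")
	req.Header.Add("Content-Type", "application/json")

	res, err := http.DefaultClient.Do(req)
	if err != nil {
		log.Fatal(err)
	}
	defer res.Body.Close()
	fmt.Println(res.Status)

	// Read the SSE stream line by line instead of buffering the whole body.
	scanner := bufio.NewScanner(res.Body)
	// Base64 audio lines can exceed the scanner's default 64 KiB token limit.
	scanner.Buffer(make([]byte, 0, 64*1024), 4*1024*1024)
	for scanner.Scan() {
		fmt.Println(scanner.Text())
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
}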
```

```ruby
require 'uri'
require 'net/http'

url = URI("https://api.smallest.ai/waves/v1/tts/live")

http = Net::HTTP.new(url.host, url.port)
http.use_ssl = true

request = Net::HTTP::Post.new(url)
request["Authorization"] = 'Bearer <BearerAuth>'
request["Content-Type"] = 'application/json'
request.body = "{\n  \"text\": \"Hello from Waves TTS.\",\n  \"voice_id\": \"magnus\"\n}"

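# Print the SSE body as it arrives instead of buffering the whole response.
http.request(request) do |response|
  response.read_body do |chunk|
    print chunk
  end
end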
```

```java
import com.mashape.unirest.http.HttpResponse;
import com.mashape.unirest.http.Unirest;

HttpResponse<String> response = Unirest.post("https://api.smallest.ai/waves/v1/tts/live")
  .header("Authorization", "Bearer <BearerAuth>")
  .header("Content-Type", "application/json")
  .body("{\n  \"text\": \"Hello from Waves TTS.\",\n  \"voice_id\": \"magnus\"\n}")
  .asString();
```

```php
<?php
require_once('vendor/autoload.php');

$client = new \GuzzleHttp\Client();

$response = $client->request('POST', 'https://api.smallest.ai/waves/v1/tts/live', [
  'body' => '{
  "text": "Hello from Waves TTS.",
  "voice_id": "magnus"
}',
  'headers' => [
    'Authorization' => 'Bearer <BearerAuth>',
    'Content-Type' => 'application/json',
  ],
]);

echo $response->getBody();
```

```csharp
using RestSharp;

var client = new RestClient("https://api.smallest.ai/waves/v1/tts/live");
var request = new RestRequest(Method.POST);
request.AddHeader("Authorization", "Bearer <BearerAuth>");
request.AddHeader("Content-Type", "application/json");
request.AddParameter("application/json", "{\n  \"text\": \"Hello from Waves TTS.\",\n  \"voice_id\": \"magnus\"\n}", ParameterType.RequestBody);
IRestResponse response = client.Execute(request);
```

```swift
import Foundation

let headers = [
  "Authorization": "Bearer <BearerAuth>",
  "Content-Type": "application/json"
]
let parameters = [
  "text": "Hello from Waves TTS.",
  "voice_id": "magnus"
] as [String : Any]

// JSONSerialization.data(withJSONObject:options:) throws, so it must be called with `try`.
let postData = try JSONSerialization.data(withJSONObject: parameters, options: [])

let request = NSMutableURLRequest(url: NSURL(string: "https://api.smallest.ai/waves/v1/tts/live")! as URL,
                                        cachePolicy: .useProtocolCachePolicy,
                                    timeoutInterval: 10.0)
request.httpMethod = "POST"
request.allHTTPHeaderFields = headers
request.httpBody = postData as Data

let session = URLSession.shared
let dataTask = session.dataTask(with: request as URLRequest, completionHandler: { (data, response, error) -> Void in
  if (error != nil) {
    print(error as Any)
  } else {
    let httpResponse = response as? HTTPURLResponse
    print(httpResponse)
  }
})

dataTask.resume()
```