Lightning v2 SSE | Smallest AI Docs

When to Use

Interactive Applications: Perfect for chatbots, virtual assistants, and other applications requiring immediate voice responses
Long-Form Content: Efficiently stream audio for articles, stories, or other long-form content without buffering delays
Voice User Interfaces: Create natural-sounding voice interfaces with minimal perceived latency
Accessibility Solutions: Provide real-time audio versions of written content for users with visual impairments

How It Works

1. Make a POST Request: Send your text and voice settings to the API endpoint.
2. Receive Audio Chunks: The API processes your text and streams the audio back as base64-encoded chunks of 1024 bytes each.
3. Process the Stream: Handle the SSE events, decoding and playing the audio chunks in the order they arrive.
4. End of Stream: The API sends a completion event once all audio has been delivered.
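The stream-handling steps can be sketched in Python as follows. This is a minimal sketch, not the official client: it parses generic SSE lines and base64-decodes each `data` payload. The exact event layout is not given on this page, so two things here are assumptions: that each `data` field carries a bare base64 string, and that the completion event is named `done` (a hypothetical name; adjust both to match the real stream).

```python
import base64
from typing import Iterable, Iterator, Tuple


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from SSE-formatted text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):
            continue  # comment or keep-alive line
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
    if data:
        # Flush a trailing event that was not followed by a blank line.
        yield event, "\n".join(data)


def collect_audio(lines: Iterable[str], done_event: str = "done") -> bytes:
    """Decode base64 audio chunks in arrival order until the completion event.

    ASSUMPTIONS: each data payload is a bare base64 string, and the
    completion event is called `done_event` (hypothetical name).
    """
    audio = bytearray()
    for event, data in iter_sse_events(lines):
        if event == done_event:
            break
        audio.extend(base64.b64decode(data))
    return bytes(audio)
```

In a real client, `lines` would come from the HTTP response body read line by line; for playback, each decoded chunk could be handed to the audio device as it arrives instead of being accumulated.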

The Lightning v2 SSE API provides real-time text-to-speech streaming with high-quality voice synthesis. It uses Server-Sent Events (SSE) to deliver audio chunks as they are generated, so playback can begin with low latency instead of waiting for the entire audio file to be processed. For an end-to-end example of how to use the Lightning v2 SSE API, see the [Text to Speech (SSE) Example](https://github.com/smallest-inc/waves-examples/blob/main/lightning_v2/http_streaming/http_streaming_api.py).

Authentication

Authorization: Bearer

Bearer authentication of the form Bearer <token>, where token is your auth token.
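As an illustration, the header could be built like this. The `Authorization` value follows the form described above; the `Content-Type` and `Accept` headers are assumptions (a JSON request body and an SSE response), and `build_headers` is a hypothetical helper, not part of any SDK.

```python
def build_headers(token: str) -> dict:
    """Return request headers using Bearer authentication."""
    token = token.strip()
    if not token:
        raise ValueError("token must be a non-empty string")
    return {
        "Authorization": f"Bearer {token}",
        # ASSUMPTION: the endpoint takes a JSON object and replies with SSE.
        "Content-Type": "application/json",
        "Accept": "text/event-stream",
    }
```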

Request

This endpoint expects an object.

text (string, Required)

The text to convert to speech.

voice_id (string, Required)

The voice identifier to use for speech generation.

sample_rate (integer, Optional, 8000-24000, Defaults to 24000)

The sample rate for the generated audio.

speed (double, Optional, 0.5-2, Defaults to 1)

The speed of the generated speech.

consistency (double, Optional, 0-1, Defaults to 0.5)

This parameter controls word repetition and skipping. Decrease it to prevent skipped words, and increase it to prevent repetition.

similarity (double, Optional, 0-1, Defaults to 0)

This parameter controls the similarity between the generated speech and the reference audio. Increase it to make the speech more similar to the reference audio.

enhancement (double, Optional, 0-2, Defaults to 1)

Enhances speech quality at the cost of increased latency.

language (enum, Optional, Defaults to en)

Determines how numbers are spelled out. If set to 'en', numbers will be read as individual digits in English. If set to 'hi', numbers will be read as individual digits in Hindi.

output_format (enum, Optional, Defaults to pcm)

The format of the output audio.

Allowed values:

pronunciation_dicts (list of strings, Optional)

The IDs of the pronunciation dictionaries to use for speech generation.
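The parameters above could be assembled and range-checked client-side before sending. This is a sketch only: `build_payload` is a hypothetical helper, the ranges are copied from the parameter list above, and optional fields that are not passed are simply omitted so that the server applies its documented defaults. Enum values are not validated because the allowed values are not listed here.

```python
# Numeric ranges as documented in the parameter list (inclusive bounds assumed).
RANGES = {
    "sample_rate": (8000, 24000),
    "speed": (0.5, 2),
    "consistency": (0, 1),
    "similarity": (0, 1),
    "enhancement": (0, 2),
}
OTHER_OPTIONS = {"language", "output_format", "pronunciation_dicts"}


def build_payload(text: str, voice_id: str, **options) -> dict:
    """Build the request object, rejecting unknown or out-of-range options."""
    if not text:
        raise ValueError("text is required")
    if not voice_id:
        raise ValueError("voice_id is required")
    payload = {"text": text, "voice_id": voice_id}
    for name, value in options.items():
        if name in RANGES:
            low, high = RANGES[name]
            if not low <= value <= high:
                raise ValueError(f"{name} must be between {low} and {high}")
        elif name not in OTHER_OPTIONS:
            raise ValueError(f"unknown parameter: {name}")
        payload[name] = value
    return payload
```

For example, `build_payload("Hello", "some_voice", speed=1.25, sample_rate=16000)` returns a dictionary ready to be serialized with `json.dumps`, where `"some_voice"` stands in for a real voice identifier.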

Response

Synthesized speech retrieved successfully.
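With the default `pcm` output format, the decoded audio has no file header, so most players need it wrapped in a container first. The sketch below uses Python's standard `wave` module to do that. The sample layout is not stated on this page, so 16-bit mono is an assumption; `pcm_to_wav` is a hypothetical helper, and `sample_rate` should match the value sent in the request.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM bytes in a WAV container.

    ASSUMPTION: 16-bit (sample_width=2) mono audio; change the arguments
    if the stream uses a different layout.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return buffer.getvalue()
```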

Errors

from smallest_ai import SmallestAI

# Authenticate with your API token.
client = SmallestAI(
    token="YOUR_TOKEN_HERE"
)

# Request streamed speech for the given text and voice.
client.waves.lightning_v2.stream_lightningv2speech(
    text="string",
    voice_id="string"
)