Concurrency and Limits

Overview

The Smallest AI API enforces concurrency limits to ensure fair usage and consistent performance across all users. Understanding these limits is essential for building robust applications on top of our services.

What is Concurrency?

Concurrency is the number of requests that can be processed simultaneously at any given moment. For the Smallest AI API:

  • TTS concurrency limit of 1: only 1 Text-to-Speech request can be actively processed at a time per account
  • This applies to every Lightning v3.1 TTS endpoint (sync, SSE, WebSocket) and the deprecated Lightning v2 endpoints

How Concurrency Works

HTTP API Requests

  • Each HTTP API call (POST request) counts as 1 concurrency unit while being processed
  • Once the request completes and returns a response, the concurrency slot is freed
  • If you attempt to make a second HTTP request while one is already being processed, you’ll receive a 429 Too Many Requests error
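The 429 behaviour described above can be absorbed on the client with a retry loop. The following is a minimal sketch, not an official client: `send_with_retry`, its parameters, and the `send` callable (which is assumed to perform one HTTP POST and return a status code and body) are hypothetical names introduced for illustration, and the backoff values are arbitrary starting points rather than documented recommendations.

```python
import random
import time
from typing import Callable, Tuple


def send_with_retry(
    send: Callable[[], Tuple[int, bytes]],
    max_attempts: int = 5,
    base_delay: float = 0.25,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call `send` until it returns a non-429 status or attempts run out.

    `send` is a hypothetical callable that issues one request and returns
    (status_code, body). Nothing here is specific to a real SDK.
    """
    status, body = send()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        # Exponential backoff with jitter: wait between `delay` and
        # 2 * `delay` seconds so that competing callers spread out.
        delay = base_delay * (2 ** (attempt - 1))
        sleep(delay + random.uniform(0, delay))
        status, body = send()
    return status, body
```

The caller still has to inspect the returned status: if every attempt is rejected, the function returns the last 429 instead of raising, which leaves the choice between queueing, dropping, or surfacing the error to the application.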

WebSocket Connections

  • You can keep up to 5 WebSocket connections open simultaneously (5 × concurrency limit)
  • However, only 1 request can be processed at a time across all WebSocket connections
  • Additional requests sent through any WebSocket while one is being processed will be rejected with an error
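Because the processing limit is shared across all open connections, one way to avoid rejected requests is to funnel every request through a single client-side gate. The sketch below uses `asyncio.Semaphore` for that purpose. `TTSGate` and `demo` are hypothetical names, and `fake_request` merely stands in for sending text over a WebSocket and awaiting the audio; no real Smallest AI client call is shown.

```python
import asyncio
from typing import Awaitable, Callable


class TTSGate:
    """Allows at most `limit` in-flight requests across all connections."""

    def __init__(self, limit: int = 1) -> None:
        self._slots = asyncio.Semaphore(limit)

    async def run(self, request: Callable[[], Awaitable[bytes]]) -> bytes:
        # Callers wait here until a slot is free, then issue the request.
        async with self._slots:
            return await request()


async def demo() -> int:
    """Run five simulated connections and return the peak in-flight count."""
    gate = TTSGate(limit=1)  # created inside the running event loop
    in_flight = 0
    peak = 0

    async def fake_request() -> bytes:
        # Placeholder for "send text on a WebSocket, await the audio".
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)
        in_flight -= 1
        return b"audio"

    # Five tasks stand in for five open WebSocket connections.
    await asyncio.gather(*(gate.run(fake_request) for _ in range(5)))
    return peak
```

Running `asyncio.run(demo())` shows that the simulated requests never overlap, even though five callers are active. A gate like this only coordinates requests within one process; separate processes or machines sharing an account would still need to handle rejections.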

Monitoring Your Usage

Dashboard Monitoring

Check your usage patterns in the Waves dashboard to:

  • Monitor request patterns
  • Identify peak usage times
  • Plan capacity requirements

Link to dashboard: https://app.smallest.ai/dashboard/developers/usage?utm_source=documentation&utm_medium=api-references

Parallel Conversational Bots

For conversational applications, you can typically support roughly 4× your concurrency limit in parallel conversations. This estimate relies on typical speaking patterns: users do not speak continuously, so each conversation needs TTS only intermittently.

How It Works

  • Concurrency limit: 1 active TTS request
  • Potential parallel conversations: ~4 conversations simultaneously
  • Reasoning: In natural conversation, users speak intermittently with pauses between responses

    This is a rough estimate: it breaks down when several conversations request TTS generation at the same moment. Your application must handle 429 errors gracefully when the actual concurrency limit is reached.
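The arithmetic behind the estimate, and the reason it is only an estimate, can be illustrated with a toy model. The sketch below is an assumption-laden illustration and not something the service publishes: both function names are hypothetical, and the model assumes each conversation independently needs TTS for a fixed fraction of the time (`duty_cycle`), which real traffic will not follow exactly.

```python
from math import comb


def estimated_parallel_conversations(concurrency_limit: int, multiplier: int = 4) -> int:
    """Rule of thumb from the text: about 4x the concurrency limit."""
    return concurrency_limit * multiplier


def overload_probability(
    conversations: int, duty_cycle: float, concurrency_limit: int = 1
) -> float:
    """Chance that more conversations need TTS at one instant than the limit allows.

    Toy binomial model: each conversation independently needs TTS with
    probability `duty_cycle`. This independence is an assumption.
    """
    within_limit = sum(
        comb(conversations, k)
        * duty_cycle ** k
        * (1 - duty_cycle) ** (conversations - k)
        for k in range(concurrency_limit + 1)
    )
    return 1.0 - within_limit
```

With an illustrative duty cycle of 0.25, `overload_probability(4, 0.25)` comes out at roughly 26%. That means that, under this model, collisions among four conversations are common, which is why 429 handling remains necessary even within the 4× guideline.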

Upgrading Limits

If your application requires higher concurrency limits, please contact our support team to discuss enterprise plans with increased limits.

Concurrency limits apply on a per-account basis. If you use multiple models, all of them share the same concurrency limit.