The Smallest AI API enforces concurrency limits to ensure fair usage and consistent performance for all users. Understanding these limits is essential for building robust applications that integrate with our services.
Concurrency refers to the number of simultaneous requests that can be processed at any given moment. In the context of Smallest AI API:
Requests that exceed the limit are rejected with a 429 Too Many Requests error.

Check your usage patterns in the Waves dashboard.
Link to dashboard: https://app.smallest.ai/dashboard/developers/usage?utm_source=documentation&utm_medium=api-references
For conversational applications, you can potentially support approximately 4x your concurrency limit in parallel conversations. This figure reflects typical speaking patterns: users do not speak continuously, so only some conversations need TTS generation at any given moment.
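As a quick illustration of the arithmetic, the sketch below turns a concurrency limit into a planning estimate. The function name and the `multiplier` parameter are hypothetical; the default of 4 simply mirrors the approximate figure above and is not a guarantee.

```python
def estimated_parallel_conversations(concurrency_limit: int, multiplier: float = 4.0) -> int:
    """Return a rough planning estimate of parallel conversations.

    `multiplier` defaults to the approximate 4x guidance; it is an
    assumption for capacity planning, not an enforced limit.
    """
    if concurrency_limit < 0:
        raise ValueError("concurrency_limit must be non-negative")
    return int(concurrency_limit * multiplier)


# Example: a concurrency limit of 5 suggests roughly 20 parallel conversations.
print(estimated_parallel_conversations(5))  # prints 20
```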
The 4x figure is a rough estimate and may not hold when multiple conversations request TTS generation at the same time. Your application must handle 429 errors gracefully when the actual concurrency limit is reached.
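One common way to handle 429 responses gracefully is to retry with exponential backoff and jitter. The sketch below is a minimal, transport-agnostic version: `send` is a hypothetical callable that performs one request with whatever HTTP client you use and returns a `(status_code, body)` tuple. The attempt count and delays are illustrative assumptions, not values recommended by the API.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, bytes]],  # hypothetical: performs one request
    max_attempts: int = 5,                  # assumed value, tune for your app
    base_delay: float = 0.5,                # seconds before the first retry
    max_delay: float = 8.0,                 # cap on the exponential delay
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, bytes]:
    """Call `send`, retrying only while it returns HTTP 429.

    Returns the first non-429 response, or the last 429 response if
    every attempt was rejected.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff: base_delay, 2x, 4x, ... capped at max_delay.
            delay = min(max_delay, base_delay * (2 ** attempt))
            # Add jitter so parallel conversations do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
    return status, body
```

If the final attempt still returns 429, the caller receives that response and can decide whether to queue the request, degrade the experience, or surface an error to the user.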
If your application requires higher concurrency limits, please contact our support team to discuss enterprise plans with increased limits.
Concurrency limits apply on a per-account basis. If you use multiple models, all of them share the same concurrency limit.
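Because the limit is shared across models, it can help to route every request through a single client-side gate so your application stays under the account limit instead of relying on 429 responses. The sketch below uses `asyncio.Semaphore` for this. `ConcurrencyGate` and `make_request` are hypothetical names, and the limit you pass in should be your own account's concurrency limit.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


class ConcurrencyGate:
    """Caps in-flight requests across all models with one shared semaphore.

    Create the gate inside a running event loop (for example, inside the
    coroutine passed to asyncio.run) so it binds to the correct loop.
    """

    def __init__(self, limit: int) -> None:
        self._semaphore = asyncio.Semaphore(limit)
        self.in_flight = 0  # requests currently running
        self.peak = 0       # highest in_flight value observed

    async def run(self, make_request: Callable[[], Awaitable[T]]) -> T:
        # Wait here until a slot is free, regardless of which model is called.
        async with self._semaphore:
            self.in_flight += 1
            self.peak = max(self.peak, self.in_flight)
            try:
                return await make_request()
            finally:
                self.in_flight -= 1
```

A typical use is to create one gate at startup and wrap every API call, for any model, in `await gate.run(...)`. The gate only coordinates requests within a single process; if several processes or servers share the account, they still need to handle 429 errors.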