Quickstart
This guide shows you how to transcribe streaming audio with Smallest AI's Pulse STT model over the WebSocket API. Pulse provides a state-of-the-art 64 ms Time to First Transcript (TTFT), making it an ideal choice for speech-to-text conversion during live conversations.
Real-Time Audio Transcription
The Real-Time API lets you stream audio data and receive transcription results as the audio is processed. This is ideal for live conversations, voice assistants, and other scenarios that need immediate transcription feedback. When minimizing latency is critical, stream audio in chunks of a few kilobytes over a live connection.
When to Use Real-Time Transcription
- Live conversations: Transcribe phone calls, video conferences, or live events.
- Voice assistants: Build interactive voice applications that respond immediately.
- Streaming workflows: Process audio as it is being captured or generated.
- Low-latency requirements: When you need transcription results with minimal delay.
Endpoint
Authentication
If you haven't already, head over to the Smallest console to generate an API key. See the Authentication guide for more information about API keys and their usage.
Include your API key in the Authorization header when establishing the WebSocket connection:
Example Connection
Example Response
The server responds with JSON messages containing transcription results:
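For illustration, here is a sketch of parsing one such message with Python's `json` module. The field names (`transcript`, `is_final`) are hypothetical stand-ins; see the response format documentation for the actual fields:

```python
import json

# Hypothetical message shape -- consult the response format
# documentation for the real field names.
raw = '{"transcript": "hello world", "is_final": true}'

msg = json.loads(raw)
print(msg["transcript"])  # -> hello world
print(msg["is_final"])    # -> True
```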
For detailed information about response fields, see the response format documentation.
Streaming Audio
Send raw audio bytes as binary WebSocket messages. The recommended chunk size is 4096 bytes:
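A sketch of the chunking and sending loop, where `ws` stands for any open WebSocket connection object with an async `send()` method (for example, one returned by `websockets.connect`):

```python
def chunk_audio(data: bytes, size: int = 4096) -> list[bytes]:
    """Split raw audio bytes into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


async def stream_audio(ws, data: bytes, size: int = 4096) -> None:
    """Send each chunk as a binary WebSocket message."""
    for chunk in chunk_audio(data, size):
        await ws.send(chunk)  # binary frame: raw bytes, not JSON
```

Sending raw bytes (rather than a JSON-wrapped string) is what makes each message a binary WebSocket frame.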
When you’re done streaming, send an end signal:
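A sketch of sending that end signal as a JSON control message. The payload shown, `{"type": "end"}`, is a hypothetical example; check the Pulse documentation for the exact message the server expects:

```python
import json

# Hypothetical end-of-stream payload -- replace with the exact
# message documented by the API.
END_SIGNAL = json.dumps({"type": "end"})


async def finish_stream(ws) -> None:
    """Send the end signal as a text frame after the last audio chunk."""
    await ws.send(END_SIGNAL)
```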
Next Steps
- Learn about supported audio formats for WebSocket streaming.
- Review complete code examples for Python, Node.js, and Browser JavaScript.
- Follow best practices for optimal streaming performance.
- Troubleshoot common issues in the troubleshooting guide.

