Transcription
Overview
The transcription endpoint converts audio files to text using Lightning ASR. It supports both batch processing and streaming.
Endpoint
Authentication
Every request requires Bearer token authentication using your license key.
See Authentication for details.
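As a minimal sketch, the Bearer scheme places the license key in the standard `Authorization` header. The helper name `build_auth_headers` is hypothetical and not part of any official SDK.

```python
def build_auth_headers(license_key):
    """Return request headers that carry the license key as a Bearer token."""
    key = (license_key or "").strip()
    if not key:
        raise ValueError("license key must be a non-empty string")
    # Standard HTTP Bearer scheme: "Authorization: Bearer <token>"
    return {"Authorization": "Bearer " + key}
```

Pass the returned dictionary as the headers of whichever HTTP client you use.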
Request
From URL
Transcribe audio from a publicly accessible URL:
From File Upload
Upload audio directly:
Parameters
- URL to audio file (mutually exclusive with file upload). Supported protocols: http://, https://, s3://
- Audio file upload (mutually exclusive with URL). Supported formats: WAV, MP3, FLAC, OGG, M4A
- Language code (ISO 639-1). Examples: en, es, fr, de, zh
- Add punctuation to transcript
- Enable speaker diarization (identify different speakers)
- Expected number of speakers (for diarization). If not specified, automatically detected
- Include word-level timestamps
- Webhook URL for async results delivery. If provided, returns immediately with job ID
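The two audio sources are mutually exclusive, so it can help to check the input on the client before sending a request. The sketch below assumes hypothetical argument names `url` and `file_path` (they are not the API's field names), and it treats the file extension as a proxy for the audio format.

```python
from pathlib import Path
from urllib.parse import urlparse

# Values taken from the parameter descriptions above.
SUPPORTED_SCHEMES = {"http", "https", "s3"}
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}


def validate_source(url=None, file_path=None):
    """Return "url" or "file" after checking that exactly one source is valid."""
    if (url is None) == (file_path is None):
        raise ValueError("provide exactly one of url or file_path")
    if url is not None:
        scheme = urlparse(url).scheme.lower()
        if scheme not in SUPPORTED_SCHEMES:
            raise ValueError("unsupported URL scheme: %r" % scheme)
        return "url"
    extension = Path(file_path).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        raise ValueError("unsupported audio format: %r" % extension)
    return "file"
```

For example, `validate_source(url="s3://bucket/call.wav")` returns `"url"`, while passing both arguments raises `ValueError`.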
Response
Successful Response
Response Fields
Examples
Basic Transcription
cURL
Python
JavaScript
With Punctuation and Timestamps
Response:
With Speaker Diarization
Response:
File Upload
Async with Callback
Immediate response:
Later, webhook receives:
Error Responses
400 Bad Request
415 Unsupported Media Type
422 Unprocessable Entity
503 Service Unavailable
Audio Format Requirements
Supported Formats
Recommended Specifications
- Sample Rate: 16 kHz or higher (44.1 kHz recommended)
- Bit Depth: 16-bit or higher
- Channels: Mono or stereo
- Max Duration: 2 hours
- Max File Size: 100 MB
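The specifications above can be checked locally before upload. The following is a sketch for WAV files only, using Python's standard `wave` module; other formats would need a third-party decoder. The function name `check_wav` is hypothetical, and the sketch assumes 1 MB means 1024 × 1024 bytes, which the specification does not state.

```python
import os
import wave

MIN_SAMPLE_RATE_HZ = 16_000
MIN_SAMPLE_WIDTH_BYTES = 2                # 16-bit
MAX_DURATION_S = 2 * 60 * 60              # 2 hours
MAX_FILE_SIZE_BYTES = 100 * 1024 * 1024   # assumption: 1 MB = 1024 * 1024 bytes


def check_wav(path):
    """Return a list of human-readable problems; an empty list means none found."""
    problems = []
    if os.path.getsize(path) > MAX_FILE_SIZE_BYTES:
        problems.append("file is larger than 100 MB")
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        if rate < MIN_SAMPLE_RATE_HZ:
            problems.append("sample rate %d Hz is below 16 kHz" % rate)
        if wav.getsampwidth() < MIN_SAMPLE_WIDTH_BYTES:
            problems.append("bit depth is below 16-bit")
        if wav.getnchannels() not in (1, 2):
            problems.append("audio is neither mono nor stereo")
        duration_s = wav.getnframes() / rate if rate else 0.0
        if duration_s > MAX_DURATION_S:
            problems.append("audio is longer than 2 hours")
    return problems
```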
Audio Preprocessing
For best results:
- Remove background noise
- Normalize audio levels
- Use mono audio when possible
- Encode at 16 kHz or 44.1 kHz
Rate Limits
Default rate limits:
- Requests per minute: 60
- Concurrent requests: 10
- Audio hours per day: 100
Contact support@smallest.ai to increase limits for your license.
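To stay under the default limits, a client can throttle itself before sending. The sketch below is one possible approach, not an official client: it uses a sliding one-minute window for the request rate and a semaphore for concurrency. The class name `ClientRateLimiter` is hypothetical, and the daily audio-hours limit is not tracked here.

```python
import threading
import time
from collections import deque


class ClientRateLimiter:
    """Context manager that enforces requests-per-minute and concurrency caps."""

    def __init__(self, max_per_minute=60, max_concurrent=10,
                 clock=time.monotonic, sleep=time.sleep):
        self._max_per_minute = max_per_minute
        self._slots = threading.BoundedSemaphore(max_concurrent)
        self._sent = deque()          # start times of requests in the last minute
        self._lock = threading.Lock()
        self._clock = clock
        self._sleep = sleep

    def _wait_for_window(self):
        while True:
            with self._lock:
                now = self._clock()
                # Drop timestamps that have left the 60-second window.
                while self._sent and now - self._sent[0] >= 60.0:
                    self._sent.popleft()
                if len(self._sent) < self._max_per_minute:
                    self._sent.append(now)
                    return
                wait = 60.0 - (now - self._sent[0])
            self._sleep(wait)

    def __enter__(self):
        self._slots.acquire()         # blocks while too many requests are in flight
        self._wait_for_window()
        return self

    def __exit__(self, exc_type, exc, tb):
        self._slots.release()
        return False
```

Wrap each request in `with limiter:` so that the concurrency slot is released when the request finishes, whether it succeeds or raises.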
Performance
Typical performance metrics:
Performance varies based on:
- Audio duration and complexity
- Number of speakers
- GPU instance type
- Current load
Best Practices
Optimize Audio Quality
- Use lossless formats (WAV, FLAC) when possible
- Ensure clear audio with minimal background noise
- Use an appropriate sample rate (16 kHz minimum)
Handle Errors Gracefully
Implement retry logic with exponential backoff:
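A minimal sketch of such retry logic follows. It assumes your own request function raises a hypothetical `TransientError` for failures worth retrying (for example a 503 Service Unavailable); client errors such as 400, 415, and 422 generally indicate a problem with the request itself and are not retried here.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical error type for retryable failures such as HTTP 503."""


def retry_with_backoff(func, max_attempts=5, base_delay=1.0,
                       max_delay=30.0, sleep=time.sleep):
    """Call func(); on TransientError, wait 1 s, 2 s, 4 s, ... and try again."""
    for attempt in range(max_attempts):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add up to 10% jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay * 0.1))
```

Capping the delay with `max_delay` keeps the total wait bounded when the service is unavailable for a longer period.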
Use Async for Long Audio
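For audio longer than 5 minutes, provide a callback URL so that the request returns immediately with a job ID and the result is delivered to your webhook.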
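One way to apply this rule is to decide per request, based on the audio duration, whether to attach a callback. The sketch below is illustrative only: the function name `build_options` and the `callback_url` key are assumptions, not the API's documented field names.

```python
ASYNC_THRESHOLD_S = 5 * 60  # 5 minutes, per the guidance above


def build_options(duration_s, callback_url=None):
    """Return request options, adding a callback for audio over 5 minutes."""
    options = {}
    if duration_s > ASYNC_THRESHOLD_S:
        if not callback_url:
            raise ValueError("a callback URL is needed for audio over 5 minutes")
        # Hypothetical field name; check the parameter reference for the real one.
        options["callback_url"] = callback_url
    return options
```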
Cache Results
Cache transcription results to avoid duplicate processing:
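A simple in-memory sketch of this idea keys each result by a hash of the audio bytes together with the request options, so the same audio requested with different options is transcribed separately. The class name `TranscriptCache` and the `transcribe` callable are hypothetical; a production setup would more likely use a persistent store.

```python
import hashlib
import json


class TranscriptCache:
    """Cache transcription results by audio content and request options."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def key(audio_bytes, options):
        audio_digest = hashlib.sha256(audio_bytes).hexdigest()
        # sort_keys makes the key independent of dictionary ordering.
        options_json = json.dumps(options, sort_keys=True)
        return hashlib.sha256((audio_digest + options_json).encode("utf-8")).hexdigest()

    def get_or_transcribe(self, audio_bytes, options, transcribe):
        """Return the cached result, calling transcribe() only on a cache miss."""
        cache_key = self.key(audio_bytes, options)
        if cache_key not in self._store:
            self._store[cache_key] = transcribe(audio_bytes, options)
        return self._store[cache_key]
```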

