
Transcription

Overview

The transcription endpoint converts audio files to text using Lightning ASR. It supports both batch processing and streaming.

Endpoint

POST /v1/listen

Authentication

Requires token authentication with your license key, sent in the Authorization header using the Token scheme:

Authorization: Token YOUR_LICENSE_KEY

See Authentication for details.

Request

From URL

Transcribe audio from a publicly accessible URL:

{
  "url": "https://example.com/audio.wav"
}
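
As a rough sketch, the same request could be built from Python using only the standard library. The base URL is taken from the cURL examples on this page; the function names `build_listen_request` and `transcribe_url` are illustrative and not part of any official client.

```python
import json
import urllib.request

# Assumption: same host and port as the cURL examples on this page.
BASE_URL = "http://localhost:7100"


def build_listen_request(license_key, audio_url, **options):
    """Build (but do not send) a POST /v1/listen request for a remote audio URL."""
    payload = {"url": audio_url, **options}
    return urllib.request.Request(
        f"{BASE_URL}/v1/listen",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Token {license_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def transcribe_url(license_key, audio_url, **options):
    """Send the request and return the decoded JSON response body."""
    request = build_listen_request(license_key, audio_url, **options)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Optional parameters such as `punctuate=True` or `timestamps=True` can be passed as keyword arguments and are merged into the JSON body.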

From File Upload

Upload audio directly:

curl -X POST http://localhost:7100/v1/listen \
  -H "Authorization: Token ${LICENSE_KEY}" \
  -F "audio=@/path/to/audio.wav"

Parameters

url
string

URL to audio file (mutually exclusive with file upload)

Supported protocols: http://, https://, s3://

audio
file

Audio file upload (mutually exclusive with URL)

Supported formats: WAV, MP3, FLAC, OGG, M4A

language
string (defaults to en)

Language code (ISO 639-1)

Examples: en, es, fr, de, zh

punctuate
boolean (defaults to true)

Add punctuation to transcript

diarize
boolean (defaults to false)

Enable speaker diarization (identify different speakers)

num_speakers
integer

Expected number of speakers (for diarization)

If not specified, the number of speakers is detected automatically

timestamps
boolean (defaults to false)

Include word-level timestamps

callback_url
string

Webhook URL for async results delivery

If provided, the request returns immediately with a job ID
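
The constraints above (exactly one of `url` or `audio`, the listed protocols, the listed formats) can be checked on the client before a request is sent. The following is a minimal sketch, not the server's own validation: the function name `validate_source` is hypothetical, and it assumes the file extension reflects the actual audio format.

```python
import os
from urllib.parse import urlparse

SUPPORTED_PROTOCOLS = {"http", "https", "s3"}
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}


def validate_source(url=None, audio_path=None):
    """Return a list of problems; an empty list means the source looks valid."""
    problems = []
    # url and audio are mutually exclusive, and one of them is required.
    if (url is None) == (audio_path is None):
        problems.append("provide exactly one of url or audio")
    if url is not None and urlparse(url).scheme not in SUPPORTED_PROTOCOLS:
        problems.append(f"unsupported protocol in {url!r}")
    if audio_path is not None:
        # Assumption: the extension matches the real container format.
        extension = os.path.splitext(audio_path)[1].lower()
        if extension not in SUPPORTED_EXTENSIONS:
            problems.append(f"unsupported audio format {extension!r}")
    return problems
```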

Response

Successful Response

{
  "request_id": "req_abc123",
  "text": "Hello, this is a sample transcription.",
  "confidence": 0.95,
  "duration": 3.2,
  "language": "en",
  "words": [
    {
      "word": "Hello",
      "start": 0.0,
      "end": 0.5,
      "confidence": 0.98
    },
    {
      "word": "this",
      "start": 0.6,
      "end": 0.8,
      "confidence": 0.97
    }
  ]
}
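
To illustrate how a response of this shape might be consumed, here is a small sketch that works on the decoded JSON. The helper names and the confidence threshold are hypothetical choices for the example; only the field names come from the response above.

```python
def format_word_timings(response):
    """Return one 'start-end  word  (confidence)' line per entry in `words`."""
    lines = []
    for entry in response.get("words", []):
        lines.append(
            f"{entry['start']:.2f}-{entry['end']:.2f}  "
            f"{entry['word']}  ({entry['confidence']:.2f})"
        )
    return lines


def low_confidence_words(response, threshold=0.9):
    """Return the words whose confidence falls below `threshold` (an arbitrary cutoff)."""
    return [
        entry["word"]
        for entry in response.get("words", [])
        if entry["confidence"] < threshold
    ]


sample = {
    "request_id": "req_abc123",
    "text": "Hello, this is a sample transcription.",
    "words": [
        {"word": "Hello", "start": 0.0, "end": 0.5, "confidence": 0.98},
        {"word": "this", "start": 0.6, "end": 0.8, "confidence": 0.97},
    ],
}

for line in format_word_timings(sample):
    print(line)
```

`response.get("words", [])` is used because `words` may be absent when timestamps are not requested.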

Response Fields

The response fields are those shown in the example above: request_id, text, confidence, duration, language, and words (each entry with word, start, end, and confidence).

Examples

Basic Transcription

cURL

curl -X POST http://localhost:7100/v1/listen \
  -H "Authorization: Token ${LICENSE_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/audio.wav"
  }'

With Punctuation and Timestamps

{
  "url": "https://example.com/audio.wav",
  "punctuate": true,
  "timestamps": true
}

Response:

{
  "request_id": "req_abc123",
  "text": "Hello, this is a sample transcription.",
  "confidence": 0.95,
  "duration": 3.2,
  "words": [
    {"word": "Hello", "start": 0.0, "end": 0.5, "confidence": 0.98},
    {"word": ",", "start": 0.5, "end": 0.5, "confidence": 1.0},
    {"word": "this", "start": 0.6, "end": 0.8, "confidence": 0.97}
  ]
}

With Speaker Diarization

{
  "url": "https://example.com/conversation.wav",
  "diarize": true,
  "num_speakers": 2
}

Response:

{
  "request_id": "req_abc123",
  "text": "Hello. Hi there!",
  "speakers": [
    {
      "speaker": "SPEAKER_00",
      "text": "Hello.",
      "start": 0.0,
      "end": 0.8
    },
    {
      "speaker": "SPEAKER_01",
      "text": "Hi there!",
      "start": 1.0,
      "end": 1.8
    }
  ]
}
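
A diarized response like the one above can be turned into a readable, speaker-labelled transcript. This is only a sketch: `format_speaker_turns` is a hypothetical helper, and the `start` and `end` values are printed as given, on the assumption that they are offsets from the beginning of the audio.

```python
def format_speaker_turns(response):
    """Return one '[start-end] SPEAKER: text' line per entry in `speakers`."""
    lines = []
    for turn in response.get("speakers", []):
        lines.append(
            f"[{turn['start']:.1f}-{turn['end']:.1f}] {turn['speaker']}: {turn['text']}"
        )
    return lines


diarized = {
    "request_id": "req_abc123",
    "text": "Hello. Hi there!",
    "speakers": [
        {"speaker": "SPEAKER_00", "text": "Hello.", "start": 0.0, "end": 0.8},
        {"speaker": "SPEAKER_01", "text": "Hi there!", "start": 1.0, "end": 1.8},
    ],
}

print("\n".join(format_speaker_turns(diarized)))
```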

File Upload

curl -X POST http://localhost:7100/v1/listen \
  -H "Authorization: Token ${LICENSE_KEY}" \
  -F "audio=@recording.wav" \
  -F "punctuate=true" \
  -F "language=en"

Async with Callback

{
  "url": "https://example.com/long-audio.wav",
  "callback_url": "https://myapp.com/webhook/transcription"
}

Immediate response:

{
  "job_id": "job_xyz789",
  "status": "processing"
}

Later, webhook receives:

{
  "job_id": "job_xyz789",
  "status": "completed",
  "result": {
    "text": "...",
    "confidence": 0.95
  }
}
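
On the receiving side, the webhook body can be matched against the job ID returned by the immediate response. The sketch below shows only that bookkeeping, independent of any web framework. The names `register_job` and `handle_webhook`, the in-memory dictionaries, and the returned status codes are all assumptions made for illustration; only the `processing` and `completed` statuses appear on this page.

```python
import json

pending_jobs = {}  # job_id -> last known status
results = {}       # job_id -> result payload


def register_job(immediate_response):
    """Remember a job from the immediate response to an async request."""
    pending_jobs[immediate_response["job_id"]] = immediate_response["status"]


def handle_webhook(body):
    """Process a raw webhook body; return the HTTP status code to reply with."""
    payload = json.loads(body)
    job_id = payload["job_id"]
    if job_id not in pending_jobs:
        return 404  # unknown job: not one we submitted
    if payload["status"] == "completed":
        results[job_id] = payload["result"]
        del pending_jobs[job_id]
    else:
        # Other statuses are not documented here, so just record them.
        pending_jobs[job_id] = payload["status"]
    return 200
```

In a real service `handle_webhook` would be called from the route registered at `callback_url`, and the dictionaries would be replaced by persistent storage.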

Error Responses

400 Bad Request

{
  "error": "Missing required parameter: url or audio file",
  "code": "MISSING_PARAMETER"
}

415 Unsupported Media Type

{
  "error": "Unsupported audio format",
  "code": "UNSUPPORTED_FORMAT",
  "supported_formats": ["wav", "mp3", "flac", "ogg", "m4a"]
}

422 Unprocessable Entity

{
  "error": "Audio file too large",
  "code": "FILE_TOO_LARGE",
  "max_size_mb": 100
}

503 Service Unavailable

{
  "error": "No ASR workers available",
  "code": "SERVICE_UNAVAILABLE",
  "retry_after": 30
}
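
One way a client might act on these error bodies is to retry only the 503 case and surface the others to the caller. The sketch below makes two assumptions that this page does not confirm: that `retry_after` is expressed in seconds, and that the 400, 415, and 422 errors are not worth retrying unchanged. The function name and the fallback delay are hypothetical.

```python
DEFAULT_RETRY_SECONDS = 30  # fallback if the body carries no retry_after


def plan_for_error(status_code, body):
    """Return ("retry", delay) for 503, otherwise ("fail", message)."""
    if status_code == 503:
        # Assumption: retry_after is a delay in seconds.
        return ("retry", body.get("retry_after", DEFAULT_RETRY_SECONDS))
    code = body.get("code", "UNKNOWN")
    message = body.get("error", "no message")
    return ("fail", f"{code}: {message}")


print(plan_for_error(503, {"code": "SERVICE_UNAVAILABLE", "retry_after": 30}))
print(plan_for_error(415, {"error": "Unsupported audio format", "code": "UNSUPPORTED_FORMAT"}))
```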

Audio Format Requirements

Supported Formats

| Format | Extension | Notes |
|--------|-----------|-------|
| WAV | .wav | Recommended for best quality |
| MP3 | .mp3 | Widely supported |
| FLAC | .flac | Lossless compression |
| OGG | .ogg | Open format |
| M4A | .m4a | Apple format |

Recommended Specifications

  • Sample Rate: 16 kHz or higher (44.1 kHz recommended)
  • Bit Depth: 16-bit or higher
  • Channels: Mono or stereo
  • Max Duration: 2 hours
  • Max File Size: 100 MB
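
The duration and file-size limits interact for uncompressed audio. As a back-of-the-envelope sketch, the size of PCM WAV data can be estimated from the sample rate, bit depth, and channel count. It assumes 1 MB = 1,000,000 bytes and ignores the small WAV header; the function names are illustrative.

```python
MAX_FILE_SIZE_MB = 100  # limit stated above


def pcm_size_mb(duration_seconds, sample_rate_hz=16_000, bit_depth=16, channels=1):
    """Approximate size of uncompressed PCM audio in MB (10^6 bytes)."""
    bytes_per_second = sample_rate_hz * (bit_depth // 8) * channels
    return duration_seconds * bytes_per_second / 1_000_000


def max_pcm_duration_seconds(sample_rate_hz=16_000, bit_depth=16, channels=1):
    """Longest uncompressed recording that stays within MAX_FILE_SIZE_MB."""
    bytes_per_second = sample_rate_hz * (bit_depth // 8) * channels
    return MAX_FILE_SIZE_MB * 1_000_000 / bytes_per_second


print(pcm_size_mb(2 * 60 * 60))          # two hours of 16 kHz, 16-bit mono
print(max_pcm_duration_seconds() / 60)   # minutes that fit in 100 MB
```

Under these assumptions, two hours of 16 kHz, 16-bit mono WAV comes to roughly 230 MB, so recordings near the maximum duration would likely need a compressed format such as FLAC or MP3 to stay under the file-size limit.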

Audio Preprocessing

For best results:

  • Remove background noise
  • Normalize audio levels
  • Use mono audio when possible
  • Encode at 16 kHz or 44.1 kHz

Rate Limits

Default rate limits:

  • Requests per minute: 60
  • Concurrent requests: 10
  • Audio hours per day: 100

Contact support@smallest.ai to increase limits for your license.
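
A client can stay under the per-minute and concurrency limits by throttling itself before sending. The sketch below uses a sliding 60-second window; how the service actually counts requests is not documented here, so treat the window as an assumption. The class name `RequestLimiter` is hypothetical, and the class is not thread-safe as written.

```python
import time
from collections import deque


class RequestLimiter:
    """Client-side guard for requests-per-minute and concurrent-request limits."""

    def __init__(self, max_per_minute=60, max_concurrent=10, clock=time.monotonic):
        self.max_per_minute = max_per_minute
        self.max_concurrent = max_concurrent
        self._clock = clock
        self._sent = deque()  # start times of requests in the last 60 seconds
        self._in_flight = 0

    def try_acquire(self):
        """Return True and reserve a slot if a request may be sent now."""
        now = self._clock()
        # Drop start times that have left the 60-second window.
        while self._sent and now - self._sent[0] >= 60:
            self._sent.popleft()
        if len(self._sent) >= self.max_per_minute:
            return False
        if self._in_flight >= self.max_concurrent:
            return False
        self._sent.append(now)
        self._in_flight += 1
        return True

    def release(self):
        """Mark one in-flight request as finished."""
        self._in_flight = max(0, self._in_flight - 1)
```

Call `try_acquire()` before each request and `release()` once the response (or an error) arrives; when `try_acquire()` returns False, wait briefly and try again.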

Performance

Typical performance metrics:

| Metric | Value |
|--------|-------|
| Real-time Factor | 0.05-0.15x |
| Latency (1 min audio) | 3-9 seconds |
| Concurrent capacity | 100+ requests |
| Throughput | 100+ hours/hour |

Performance varies based on:

  • Audio duration and complexity
  • Number of speakers
  • GPU instance type
  • Current load
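
The real-time factor relates processing time to audio length: processing time is approximately audio duration multiplied by the factor. A short sketch of that arithmetic, using the range from the table above (the function name is illustrative, and actual times depend on the factors just listed):

```python
RTF_RANGE = (0.05, 0.15)  # real-time factor range from the table above


def estimate_processing_seconds(audio_seconds, rtf_range=RTF_RANGE):
    """Return (best_case, worst_case) processing time in seconds."""
    low, high = rtf_range
    return (audio_seconds * low, audio_seconds * high)


best, worst = estimate_processing_seconds(60)
print(f"1 minute of audio: about {best:.0f}-{worst:.0f} seconds")
```

For one minute of audio this gives about 3 to 9 seconds, matching the latency row in the table.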

Best Practices

Optimize Audio Quality

  • Use lossless formats (WAV, FLAC) when possible
  • Ensure clear audio with minimal background noise
  • Use appropriate sample rate (16 kHz minimum)

Handle Errors Gracefully

Implement retry logic with exponential backoff:

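import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry up to 3 times on rate-limit and server errors, with exponential backoff.
# urllib3 does not retry POST by default, so it is enabled explicitly here;
# only do this if resubmitting the same audio is acceptable.
session = requests.Session()
retry = Retry(
    total=3,
    backoff_factor=1,
    status_forcelist=[429, 500, 502, 503, 504],
    allowed_methods=["POST"],
)
adapter = HTTPAdapter(max_retries=retry)
session.mount("http://", adapter)
session.mount("https://", adapter)

Use Async for Long Audio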

For audio longer than 5 minutes, use a callback URL:

{
  "url": "https://example.com/podcast.mp3",
  "callback_url": "https://myapp.com/webhook"
}

Cache Results

Cache transcription results to avoid duplicate processing:

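import hashlib

cache = {}  # in-memory cache; swap for a persistent store if needed

def get_cache_key(audio_url):
    # MD5 is used only to derive a compact key, not for security
    return hashlib.md5(audio_url.encode()).hexdigest()

def transcribe_cached(audio_url):
    # transcribe() stands in for your own function that calls POST /v1/listen
    cache_key = get_cache_key(audio_url)
    if cache_key in cache:
        return cache[cache_key]

    result = transcribe(audio_url)
    cache[cache_key] = result
    return result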

What’s Next?

Health Check

Monitor service availability

Examples

Complete integration examples