Transcription | Smallest AI Docs

Overview

The transcription endpoint converts audio files to text using Lightning ASR. It supports both batch processing and streaming.

Endpoint

POST /v1/listen

Authentication

Requires token authentication with your license key, passed in the Authorization header:

Authorization: Token YOUR_LICENSE_KEY

See Authentication for details.

Request

From URL

Transcribe audio from a publicly accessible URL:

{
  "url": "https://example.com/audio.wav"
}

From File Upload

Upload audio directly:

curl -X POST http://localhost:7100/v1/listen \
  -H "Authorization: Token ${LICENSE_KEY}" \
  -F "audio=@/path/to/audio.wav"

Parameters

url

string

URL to audio file (mutually exclusive with file upload)

Supported protocols: http://, https://, s3://

audio

file

Audio file upload (mutually exclusive with URL)

Supported formats: WAV, MP3, FLAC, OGG, M4A

language

string (defaults to en)

Language code (ISO 639-1)

Examples: en, es, fr, de, zh

punctuate

boolean (defaults to true)

Add punctuation to transcript

diarize

boolean (defaults to false)

Enable speaker diarization (identify different speakers)

num_speakers

integer

Expected number of speakers (for diarization)

If not specified, automatically detected

timestamps

boolean (defaults to false)

Include word-level timestamps

callback_url

string

Webhook URL for async results delivery

If provided, returns immediately with job ID
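The parameters above can be assembled and checked on the client before a request is sent. The following is a minimal sketch, not part of the API itself: the helper name `build_options` and its `audio_path` argument are hypothetical, and the only rule it enforces is the documented one that `url` and file upload are mutually exclusive.

```python
from typing import Optional


def build_options(
    url: Optional[str] = None,
    audio_path: Optional[str] = None,  # hypothetical: local file to upload instead of a URL
    *,
    language: str = "en",
    punctuate: bool = True,
    diarize: bool = False,
    num_speakers: Optional[int] = None,
    timestamps: bool = False,
    callback_url: Optional[str] = None,
) -> dict:
    """Return the request options, using the documented defaults."""
    # url and file upload are mutually exclusive, and one of them is required.
    if (url is None) == (audio_path is None):
        raise ValueError("Provide exactly one of url or audio_path")

    options = {
        "language": language,
        "punctuate": punctuate,
        "diarize": diarize,
        "timestamps": timestamps,
    }
    # Optional parameters are only included when explicitly set.
    if url is not None:
        options["url"] = url
    if num_speakers is not None:
        options["num_speakers"] = num_speakers
    if callback_url is not None:
        options["callback_url"] = callback_url
    return options
```

For a URL request the returned dictionary can be sent as the JSON body. For a file upload the same keys would presumably be sent as form fields alongside the `audio` file.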

Response

Successful Response

{
  "request_id": "req_abc123",
  "text": "Hello, this is a sample transcription.",
  "confidence": 0.95,
  "duration": 3.2,
  "language": "en",
  "words": [
    {
      "word": "Hello",
      "start": 0.0,
      "end": 0.5,
      "confidence": 0.98
    },
    {
      "word": "this",
      "start": 0.6,
      "end": 0.8,
      "confidence": 0.97
    }
  ]
}

Examples

Basic Transcription

cURL

curl -X POST http://localhost:7100/v1/listen \
  -H "Authorization: Token ${LICENSE_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/audio.wav"
  }'
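A rough Python equivalent of the cURL request above, using only the standard library. This is a sketch under the same assumptions as the cURL example (service at `http://localhost:7100`, license key supplied by the caller). The function names `build_listen_request` and `transcribe` are illustrative, not part of any SDK.

```python
import json
import urllib.request

# Assumption: same host and port as the cURL example above.
BASE_URL = "http://localhost:7100"


def build_listen_request(audio_url: str, license_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST /v1/listen request for a remote audio URL."""
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/v1/listen",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Token {license_key}",
            "Content-Type": "application/json",
        },
    )


def transcribe(audio_url: str, license_key: str) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_listen_request(audio_url, license_key)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `transcribe("https://example.com/audio.wav", license_key)` should return the parsed response body. Non-2xx responses raise `urllib.error.HTTPError`.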

With Punctuation and Timestamps

{
  "url": "https://example.com/audio.wav",
  "punctuate": true,
  "timestamps": true
}

Response:

{
  "request_id": "req_abc123",
  "text": "Hello, this is a sample transcription.",
  "confidence": 0.95,
  "duration": 3.2,
  "words": [
    {"word": "Hello", "start": 0.0, "end": 0.5, "confidence": 0.98},
    {"word": ",", "start": 0.5, "end": 0.5, "confidence": 1.0},
    {"word": "this", "start": 0.6, "end": 0.8, "confidence": 0.97}
  ]
}
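Once parsed, the `words` array in a response like the one above can be processed directly. The sketch below assumes the response has already been decoded into a Python dictionary with the field names shown in the example. The helper names and the confidence threshold are illustrative choices, not API features.

```python
from typing import List


def format_word_timings(response: dict) -> List[str]:
    """Return one 'start-end  word' line per entry in the words array."""
    return [
        f"{w['start']:.2f}-{w['end']:.2f}  {w['word']}"
        for w in response.get("words", [])
    ]


def low_confidence_words(response: dict, threshold: float = 0.9) -> List[str]:
    """Return words whose confidence is below the caller-chosen threshold."""
    return [
        w["word"]
        for w in response.get("words", [])
        if w["confidence"] < threshold
    ]
```

Note that in the example response punctuation appears as its own zero-length entry, so code that counts words may want to skip entries where `start` equals `end`.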

With Speaker Diarization

{
  "url": "https://example.com/conversation.wav",
  "diarize": true,
  "num_speakers": 2
}

Response:

{
  "request_id": "req_abc123",
  "text": "Hello. Hi there!",
  "speakers": [
    {
      "speaker": "SPEAKER_00",
      "text": "Hello.",
      "start": 0.0,
      "end": 0.8
    },
    {
      "speaker": "SPEAKER_01",
      "text": "Hi there!",
      "start": 1.0,
      "end": 1.8
    }
  ]
}
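A diarized response like the one above can be turned into a readable transcript or per-speaker statistics. This is a minimal sketch that assumes the `speakers` array has the shape shown in the example. Both helper names are hypothetical.

```python
from typing import Dict, List


def format_turns(response: dict) -> List[str]:
    """Render each speaker turn as '[start-end] SPEAKER: text'."""
    return [
        f"[{t['start']:.1f}s-{t['end']:.1f}s] {t['speaker']}: {t['text']}"
        for t in response.get("speakers", [])
    ]


def speaking_time(response: dict) -> Dict[str, float]:
    """Sum the duration of all turns for each speaker label, in seconds."""
    totals: Dict[str, float] = {}
    for turn in response.get("speakers", []):
        duration = turn["end"] - turn["start"]
        totals[turn["speaker"]] = totals.get(turn["speaker"], 0.0) + duration
    return totals
```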

File Upload

curl -X POST http://localhost:7100/v1/listen \
  -H "Authorization: Token ${LICENSE_KEY}" \
  -F "audio=@recording.wav" \
  -F "punctuate=true" \
  -F "language=en"

Async with Callback

{
  "url": "https://example.com/long-audio.wav",
  "callback_url": "https://myapp.com/webhook/transcription"
}

Immediate response:

{
  "job_id": "job_xyz789",
  "status": "processing"
}

Later, webhook receives:

{
  "job_id": "job_xyz789",
  "status": "completed",
  "result": {
    "text": "...",
    "confidence": 0.95
  }
}
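On the receiving side, the webhook needs an HTTP endpoint that accepts this payload. The sketch below uses Python's built-in `http.server` and assumes the payload is delivered as a JSON POST body with the fields shown above. The delivery method, the in-memory `completed_jobs` store, and the 204 reply are assumptions for illustration. Only the `completed` status appears in this document, so any other status is ignored.

```python
import json
from http.server import BaseHTTPRequestHandler

# Hypothetical in-memory store mapping job_id to its result.
completed_jobs: dict = {}


def record_webhook(payload: dict, store: dict) -> bool:
    """Store the result of a completed job; return True if something was stored."""
    job_id = payload.get("job_id")
    if job_id is None or payload.get("status") != "completed":
        return False
    store[job_id] = payload.get("result", {})
    return True


class TranscriptionWebhook(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
        except json.JSONDecodeError:
            # Reject bodies that are not valid JSON.
            self.send_response(400)
            self.end_headers()
            return
        record_webhook(payload, completed_jobs)
        # Acknowledge receipt with an empty response.
        self.send_response(204)
        self.end_headers()
```

To try it locally, the handler can be served with `http.server.HTTPServer(("0.0.0.0", 8000), TranscriptionWebhook).serve_forever()`. A production deployment would normally sit behind HTTPS and verify that the request really comes from the transcription service.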

Error Responses

400 Bad Request

{
  "error": "Missing required parameter: url or audio file",
  "code": "MISSING_PARAMETER"
}

415 Unsupported Media Type

{
  "error": "Unsupported audio format",
  "code": "UNSUPPORTED_FORMAT",
  "supported_formats": ["wav", "mp3", "flac", "ogg", "m4a"]
}

422 Unprocessable Entity

{
  "error": "Audio file too large",
  "code": "FILE_TOO_LARGE",
  "max_size_mb": 100
}

503 Service Unavailable

{
  "error": "No ASR workers available",
  "code": "SERVICE_UNAVAILABLE",
  "retry_after": 30
}
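One way a client might act on these error bodies is to retry only the 503 case and surface everything else to the caller. The sketch below assumes that `retry_after` is expressed in seconds, which this document does not state. The fallback delay of 30 seconds mirrors the example body and is also an assumption. `TranscriptionError` and `retry_delay_or_raise` are hypothetical names.

```python
class TranscriptionError(Exception):
    """Raised for error responses that should not be retried."""

    def __init__(self, status: int, code: str, message: str):
        super().__init__(f"{status} {code}: {message}")
        self.status = status
        self.code = code


def retry_delay_or_raise(status: int, body: dict) -> float:
    """Return how long to wait before retrying, or raise for non-retryable errors."""
    if status == 503:
        # Assumption: retry_after is in seconds; fall back to 30 if it is absent.
        return float(body.get("retry_after", 30))
    # 400, 415 and 422 describe a problem with the request itself,
    # so repeating the same request would not help.
    raise TranscriptionError(status, body.get("code", "UNKNOWN"), body.get("error", ""))
```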

Audio Format Requirements

Supported Formats

Format	Extension	Notes
WAV	`.wav`	Recommended for best quality
MP3	`.mp3`	Widely supported
FLAC	`.flac`	Lossless compression
OGG	`.ogg`	Open format
M4A	`.m4a`	Apple format

Recommended Specifications

Sample Rate: 16 kHz or higher (44.1 kHz recommended)
Bit Depth: 16-bit or higher
Channels: Mono or stereo
Max Duration: 2 hours
Max File Size: 100 MB
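Some of these limits can be checked locally before uploading, which avoids a round trip that would end in a 415 or 422 response. A minimal sketch follows. It assumes "100 MB" means 100 × 1024 × 1024 bytes, which the document does not specify, and it checks only the file extension and size, not the actual encoding or duration. `check_audio` is a hypothetical helper.

```python
import os
from typing import List

SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}
# Assumption: 1 MB is treated as 1024 * 1024 bytes.
MAX_FILE_SIZE_BYTES = 100 * 1024 * 1024


def check_audio(filename: str, size_bytes: int) -> List[str]:
    """Return a list of problems found; an empty list means the file looks acceptable."""
    problems = []
    extension = os.path.splitext(filename)[1].lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported extension: {extension or '(none)'}")
    if size_bytes > MAX_FILE_SIZE_BYTES:
        problems.append(f"file is larger than {MAX_FILE_SIZE_BYTES} bytes")
    return problems
```

For a file on disk, the size can be obtained with `os.path.getsize(path)`.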

Audio Preprocessing

For best results:

Remove background noise
Normalize audio levels
Use mono audio when possible
Encode at 16 kHz or 44.1 kHz

Rate Limits

Default rate limits:

Requests per minute: 60
Concurrent requests: 10
Audio hours per day: 100

Contact support@smallest.ai to increase limits for your license.
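To stay under the requests-per-minute limit, a client can throttle itself rather than wait for rejections. The following is a sketch of a simple sliding-window limiter. How the service actually counts requests is not described here, so the 60-second sliding window is an assumption, and `MinuteRateLimiter` is a hypothetical class.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Client-side sliding-window limiter (60 requests per 60 seconds by default)."""

    def __init__(self, max_requests: int = 60, window: float = 60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window = window
        self.clock = clock
        self.sent = deque()  # timestamps of recent requests, oldest first

    def wait_time(self) -> float:
        """Return how many seconds to wait before the next request is allowed."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.max_requests:
            return 0.0
        return self.window - (now - self.sent[0])

    def record(self) -> None:
        """Note that a request was just sent."""
        self.sent.append(self.clock())
```

A typical loop would call `time.sleep(limiter.wait_time())` before each request and `limiter.record()` right after sending it.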

Performance

Typical performance metrics:

Metric	Value
Real-time Factor	0.05-0.15x
Latency (1 min audio)	3-9 seconds
Concurrent capacity	100+ requests
Throughput	100+ hours/hour

Performance varies based on:

Audio duration and complexity
Number of speakers
GPU instance type
Current load
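The latency row follows from the real-time factor: processing time is roughly audio duration multiplied by the factor. A small sketch of that arithmetic, using the range from the table as defaults (the function name is illustrative, and actual times depend on the factors listed above):

```python
from typing import Tuple


def estimate_processing_seconds(
    audio_seconds: float,
    rtf_low: float = 0.05,
    rtf_high: float = 0.15,
) -> Tuple[float, float]:
    """Return the (best case, worst case) processing time for the given audio length."""
    return audio_seconds * rtf_low, audio_seconds * rtf_high


# One minute of audio gives roughly 3 to 9 seconds, matching the table.
low, high = estimate_processing_seconds(60)
```

By the same arithmetic, a 2-hour file (7200 seconds) would take roughly 6 to 18 minutes.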

Best Practices

Optimize Audio Quality

Use lossless formats (WAV, FLAC) when possible
Ensure clear audio with minimal background noise
Use appropriate sample rate (16 kHz minimum)

Handle Errors Gracefully

Implement retry logic with exponential backoff:

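import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry up to 3 times on rate limiting and transient server errors,
# with exponentially increasing delays between attempts.
session = requests.Session()
retry = Retry(
    total=3,
    backoff_factor=1,
    status_forcelist=[429, 500, 502, 503, 504],
    # POST is not retried by default; requires urllib3 1.26 or later.
    allowed_methods=["POST"],
)
adapter = HTTPAdapter(max_retries=retry)
session.mount("http://", adapter)
session.mount("https://", adapter)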

Use Async for Long Audio

For audio longer than 5 minutes, use a callback URL:

{
  "url": "https://example.com/podcast.mp3",
  "callback_url": "https://myapp.com/webhook"
}

Cache Results

Cache transcription results to avoid duplicate processing:

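import hashlib

cache = {}  # in-memory cache; replace with a persistent store if needed

def get_cache_key(audio_url):
    # MD5 is used only to derive a compact key, not for security
    return hashlib.md5(audio_url.encode()).hexdigest()

def transcribe_cached(audio_url):
    cache_key = get_cache_key(audio_url)
    if cache_key in cache:
        return cache[cache_key]

    # transcribe() is your own function that calls POST /v1/listen
    result = transcribe(audio_url)
    cache[cache_key] = result
    return result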

What’s Next?

Health Check

Monitor service availability

Examples

Complete integration examples