Transcription

Overview

The transcription endpoint converts audio files to text using Lightning ASR. It supports both batch processing and streaming.

Endpoint

POST /v1/listen

Authentication

Requires token authentication with your license key, sent in the Authorization header using the Token scheme:

Authorization: Token YOUR_LICENSE_KEY

See Authentication for details.

Request

From URL

Transcribe audio from a publicly accessible URL:

{
  "url": "https://example.com/audio.wav"
}

From File Upload

Upload audio directly:

curl -X POST http://localhost:7100/v1/listen \
  -H "Authorization: Token ${LICENSE_KEY}" \
  -F "audio=@/path/to/audio.wav"

Parameters

url
string

URL to audio file (mutually exclusive with file upload)

Supported protocols: http://, https://, s3://

audio
file

Audio file upload (mutually exclusive with URL)

Supported formats: WAV, MP3, FLAC, OGG, M4A

language
string, defaults to en

Language code (ISO 639-1)

Examples: en, es, fr, de, zh

punctuate
boolean, defaults to true

Add punctuation to transcript

diarize
boolean, defaults to false

Enable speaker diarization (identify different speakers)

num_speakers
integer

Expected number of speakers (for diarization)

If not specified, automatically detected

timestamps
boolean, defaults to false

Include word-level timestamps

callback_url
string

Webhook URL for async results delivery

If provided, returns immediately with job ID
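The parameter rules above can be checked on the client before a request is sent. The following is a minimal sketch, not part of any official SDK: `build_payload` is a hypothetical helper, and the rule that `num_speakers` requires `diarize` is an assumption drawn from the description "for diarization".

```python
from typing import Any, Dict, Optional

# Option names taken from the parameter list above.
ALLOWED_OPTIONS = {
    "language", "punctuate", "diarize",
    "num_speakers", "timestamps", "callback_url",
}


def build_payload(
    url: Optional[str] = None,
    audio_path: Optional[str] = None,
    **options: Any,
) -> Dict[str, Any]:
    """Validate the inputs and return the fields to send to POST /v1/listen.

    Hypothetical helper. When `audio_path` is used, the returned fields are
    meant to accompany the file in a multipart upload.
    """
    # url and audio are documented as mutually exclusive.
    if (url is None) == (audio_path is None):
        raise ValueError("provide exactly one of url or audio_path")
    unknown = set(options) - ALLOWED_OPTIONS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    # Assumption: num_speakers is only meaningful when diarize is enabled.
    if "num_speakers" in options and not options.get("diarize", False):
        raise ValueError("num_speakers only applies when diarize is true")
    payload = dict(options)
    if url is not None:
        payload["url"] = url
    return payload
```

For example, `build_payload(url="https://example.com/audio.wav", timestamps=True)` returns a dictionary that can be serialized as the JSON request body.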

Response

Successful Response

{
  "request_id": "req_abc123",
  "text": "Hello, this is a sample transcription.",
  "confidence": 0.95,
  "duration": 3.2,
  "language": "en",
  "words": [
    {
      "word": "Hello",
      "start": 0.0,
      "end": 0.5,
      "confidence": 0.98
    },
    {
      "word": "this",
      "start": 0.6,
      "end": 0.8,
      "confidence": 0.97
    }
  ]
}
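To illustrate how the `words` array in the response above might be consumed, here is a short sketch that turns each entry into a readable timing line. `format_word_timings` is a hypothetical helper, and treating `words` as optional is an assumption (the diarization example on this page omits it).

```python
from typing import Any, Dict, List


def format_word_timings(response: Dict[str, Any]) -> List[str]:
    """Return one 'start-end word (confidence)' line per entry in `words`."""
    lines = []
    # Assumption: `words` may be missing, so fall back to an empty list.
    for word in response.get("words", []):
        lines.append(
            f"{word['start']:.2f}-{word['end']:.2f} "
            f"{word['word']} ({word['confidence']:.2f})"
        )
    return lines
```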

Examples

Basic Transcription

curl -X POST http://localhost:7100/v1/listen \
  -H "Authorization: Token ${LICENSE_KEY}" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://example.com/audio.wav"
  }'
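The same request can be made from Python. The sketch below uses only the standard library and mirrors the curl command above; `build_listen_request` and `transcribe_url` are hypothetical names, and the base URL `http://localhost:7100` is taken from the examples on this page.

```python
import json
import urllib.request


def build_listen_request(
    base_url: str, license_key: str, payload: dict
) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request to /v1/listen."""
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/listen",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Token {license_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def transcribe_url(base_url: str, license_key: str, audio_url: str) -> dict:
    """Send the request and decode the JSON response body."""
    request = build_listen_request(base_url, license_key, {"url": audio_url})
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling `transcribe_url("http://localhost:7100", license_key, "https://example.com/audio.wav")` would return the parsed response, assuming the service is reachable. Building the request in a separate function keeps the header and body logic easy to inspect without a running server.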

With Punctuation and Timestamps

{
  "url": "https://example.com/audio.wav",
  "punctuate": true,
  "timestamps": true
}

Response:

{
  "request_id": "req_abc123",
  "text": "Hello, this is a sample transcription.",
  "confidence": 0.95,
  "duration": 3.2,
  "words": [
    {"word": "Hello", "start": 0.0, "end": 0.5, "confidence": 0.98},
    {"word": ",", "start": 0.5, "end": 0.5, "confidence": 1.0},
    {"word": "this", "start": 0.6, "end": 0.8, "confidence": 0.97}
  ]
}

With Speaker Diarization

{
  "url": "https://example.com/conversation.wav",
  "diarize": true,
  "num_speakers": 2
}

Response:

{
  "request_id": "req_abc123",
  "text": "Hello. Hi there!",
  "speakers": [
    {
      "speaker": "SPEAKER_00",
      "text": "Hello.",
      "start": 0.0,
      "end": 0.8
    },
    {
      "speaker": "SPEAKER_01",
      "text": "Hi there!",
      "start": 1.0,
      "end": 1.8
    }
  ]
}
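A diarized response like the one above can be rendered as a speaker-labelled transcript. This is a small sketch, assuming the `speakers` array has the shape shown in the example; `format_speaker_turns` is a hypothetical helper.

```python
def format_speaker_turns(response: dict) -> str:
    """Render each entry in `speakers` as '[start-end] SPEAKER: text'."""
    lines = []
    # Assumption: `speakers` is only present when diarize was requested.
    for turn in response.get("speakers", []):
        lines.append(
            f"[{turn['start']:.1f}s-{turn['end']:.1f}s] "
            f"{turn['speaker']}: {turn['text']}"
        )
    return "\n".join(lines)
```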

File Upload

curl -X POST http://localhost:7100/v1/listen \
  -H "Authorization: Token ${LICENSE_KEY}" \
  -F "audio=@recording.wav" \
  -F "punctuate=true" \
  -F "language=en"

Async with Callback

{
  "url": "https://example.com/long-audio.wav",
  "callback_url": "https://myapp.com/webhook/transcription"
}

Immediate response:

{
  "job_id": "job_xyz789",
  "status": "processing"
}

Later, webhook receives:

{
  "job_id": "job_xyz789",
  "status": "completed",
  "result": {
    "text": "...",
    "confidence": 0.95
  }
}
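On the receiving side, the webhook body can be handled with a small function. The sketch below assumes the callback arrives as a JSON body shaped like the example above. Statuses other than `completed` are not documented on this page, so they are simply treated as "no transcript yet". `handle_callback` is a hypothetical name.

```python
import json
from typing import Optional, Tuple


def handle_callback(body: bytes) -> Tuple[str, Optional[str]]:
    """Return (job_id, transcript text), with text None if not completed."""
    event = json.loads(body)
    job_id = event["job_id"]
    # Assumption: only "completed" events carry a `result` object.
    if event.get("status") != "completed":
        return job_id, None
    return job_id, event["result"]["text"]
```

The returned `job_id` can be matched against the one received in the immediate response to find the original request.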

Error Responses

400 Bad Request

{
  "error": "Missing required parameter: url or audio file",
  "code": "MISSING_PARAMETER"
}

415 Unsupported Media Type

{
  "error": "Unsupported audio format",
  "code": "UNSUPPORTED_FORMAT",
  "supported_formats": ["wav", "mp3", "flac", "ogg", "m4a"]
}

422 Unprocessable Entity

{
  "error": "Audio file too large",
  "code": "FILE_TOO_LARGE",
  "max_size_mb": 100
}

503 Service Unavailable

{
  "error": "No ASR workers available",
  "code": "SERVICE_UNAVAILABLE",
  "retry_after": 30
}
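The error bodies above suggest two kinds of handling: 4xx errors need a corrected request, while a 503 can be retried after a delay. The following sketch shows one way to separate the two. It assumes `retry_after` is expressed in seconds (the page does not state the unit), and both `plan_recovery` and the hint wording are illustrative.

```python
from typing import Optional, Tuple

# Suggested fixes for the client-side error codes shown above (our wording).
CLIENT_ERROR_HINTS = {
    "MISSING_PARAMETER": "send either url or an audio file",
    "UNSUPPORTED_FORMAT": "convert the audio to a supported format",
    "FILE_TOO_LARGE": "split or compress the audio",
}


def plan_recovery(status: int, body: dict) -> Tuple[Optional[float], str]:
    """Return (seconds to wait before retrying or None, human-readable hint)."""
    if status == 503:
        # Assumption: retry_after is in seconds; default to 30 if it is absent.
        return float(body.get("retry_after", 30)), "service busy, retry later"
    # Other errors shown above are not retryable without changing the request.
    hint = CLIENT_ERROR_HINTS.get(body.get("code"), body.get("error", "unknown error"))
    return None, hint
```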

Audio Format Requirements

Supported Formats

Format  Extension  Notes
WAV     .wav       Recommended for best quality
MP3     .mp3       Widely supported
FLAC    .flac      Lossless compression
OGG     .ogg       Open format
M4A     .m4a       Apple format

  • Sample Rate: 16 kHz or higher (44.1 kHz recommended)
  • Bit Depth: 16-bit or higher
  • Channels: Mono or stereo
  • Max Duration: 2 hours
  • Max File Size: 100 MB
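For WAV files, the limits listed above can be checked locally before uploading. The sketch below uses Python's standard `wave` module; `check_wav` is a hypothetical helper, it only handles uncompressed WAV, and it assumes "100 MB" means 100 × 1024 × 1024 bytes.

```python
import os
import wave

MIN_SAMPLE_RATE_HZ = 16_000
MIN_SAMPLE_WIDTH_BYTES = 2               # 16-bit samples
MAX_DURATION_SECONDS = 2 * 60 * 60       # 2 hours
MAX_FILE_SIZE_BYTES = 100 * 1024 * 1024  # assumption: MB read as MiB


def check_wav(path: str) -> list:
    """Return a list of problems; an empty list means the file passes."""
    problems = []
    if os.path.getsize(path) > MAX_FILE_SIZE_BYTES:
        problems.append("file larger than 100 MB")
    with wave.open(path, "rb") as wav:
        rate = wav.getframerate()
        if rate < MIN_SAMPLE_RATE_HZ:
            problems.append("sample rate below 16 kHz")
        if wav.getsampwidth() < MIN_SAMPLE_WIDTH_BYTES:
            problems.append("bit depth below 16-bit")
        if wav.getnchannels() not in (1, 2):
            problems.append("more than two channels")
        if rate > 0 and wav.getnframes() / rate > MAX_DURATION_SECONDS:
            problems.append("longer than 2 hours")
    return problems
```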

Audio Preprocessing

For best results:

  • Remove background noise
  • Normalize audio levels
  • Use mono audio when possible
  • Encode at 16 kHz or 44.1 kHz

Rate Limits

Default rate limits:

  • Requests per minute: 60
  • Concurrent requests: 10
  • Audio hours per day: 100

Contact support@smallest.ai to increase limits for your license.
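A client can stay under the per-minute limit by tracking its own request times. The sketch below is one possible client-side approach (a sliding window), not a description of how the server enforces limits. `MinuteRateLimiter` is a hypothetical class, and its default of 60 comes from the list above.

```python
import time
from collections import deque


class MinuteRateLimiter:
    """Sliding-window limiter: at most `max_requests` per `window_seconds`."""

    def __init__(self, max_requests=60, window_seconds=60.0, clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.clock = clock      # injectable so the limiter can be tested
        self._sent = deque()    # timestamps of recent requests, oldest first

    def wait_time(self) -> float:
        """Seconds to wait before the next request is allowed (0.0 if now)."""
        now = self.clock()
        # Drop timestamps that have left the window.
        while self._sent and now - self._sent[0] >= self.window_seconds:
            self._sent.popleft()
        if len(self._sent) < self.max_requests:
            return 0.0
        return self.window_seconds - (now - self._sent[0])

    def record(self) -> None:
        """Note that a request was just sent."""
        self._sent.append(self.clock())
```

Before each call, sleep for `limiter.wait_time()` seconds, send the request, then call `limiter.record()`.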

Performance

Typical performance metrics:

Metric                 Value
Real-time Factor       0.05-0.15x
Latency (1 min audio)  3-9 seconds
Concurrent capacity    100+ requests
Throughput             100+ hours/hour

Performance varies based on:

  • Audio duration and complexity
  • Number of speakers
  • GPU instance type
  • Current load
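The real-time factor relates processing time to audio length: processing time ≈ audio duration × real-time factor. As a rough sketch using the typical range from the table above (actual figures depend on the factors just listed), with `estimate_processing_seconds` as a hypothetical helper:

```python
def estimate_processing_seconds(audio_seconds, rtf_low=0.05, rtf_high=0.15):
    """Return a (best case, worst case) processing-time estimate in seconds."""
    return audio_seconds * rtf_low, audio_seconds * rtf_high
```

For one minute of audio this gives roughly 3 to 9 seconds, matching the latency row in the table. For the 2-hour maximum it gives roughly 6 to 18 minutes.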

Best Practices

  • Use lossless formats (WAV, FLAC) when possible
  • Ensure clear audio with minimal background noise
  • Use appropriate sample rate (16 kHz minimum)

Implement retry logic with exponential backoff:

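import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

# Retry up to 3 times on rate-limit and transient server errors,
# with exponential backoff between attempts.
session = requests.Session()
retry = Retry(
    total=3,
    backoff_factor=1,
    status_forcelist=[429, 500, 502, 503, 504],
    # POST is not retried by default; opt in explicitly (urllib3 1.26+).
    allowed_methods=["POST"],
)
adapter = HTTPAdapter(max_retries=retry)
session.mount("http://", adapter)
session.mount("https://", adapter)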

For audio longer than 5 minutes, use a callback URL:

{
  "url": "https://example.com/podcast.mp3",
  "callback_url": "https://myapp.com/webhook"
}

Cache transcription results to avoid duplicate processing:

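import hashlib

cache = {}  # in-memory cache; replace with a persistent store if needed

def get_cache_key(audio_url):
    # Hash the URL so cache keys have a fixed length
    return hashlib.md5(audio_url.encode()).hexdigest()

def transcribe_cached(audio_url, transcribe):
    # `transcribe` is your own function that calls POST /v1/listen
    cache_key = get_cache_key(audio_url)
    if cache_key in cache:
        return cache[cache_key]

    result = transcribe(audio_url)
    cache[cache_key] = result
    return result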
