Quickstart

This guide shows you how to convert an audio file into text using the unified Speech-to-Text endpoint. Both Pulse (multilingual, 38 languages) and Pulse Pro (leaderboard-ranked English) share the same path; you select the model with the ?model= query parameter.

Pre-Recorded Audio

Transcribe pre-recorded audio files using synchronous HTTPS POST requests. Perfect for batch processing, archived media, and offline transcription workflows.

The Pre-Recorded API takes an audio file and returns a complete transcript in a single request. Send raw bytes or, for the Pulse model, a URL.

Pick a model

| If your audio is… | Use | Why |
| --- | --- | --- |
| English, and you want highest accuracy | pulse-pro | Tied #2 on the public Open ASR Leaderboard (5.42% ESB avg WER). Pre-recorded HTTP only. |
| Multilingual, or you need streaming | pulse | 38 languages, runs on both HTTP and the live WebSocket endpoint. |

See the Pulse Pro model card and Pulse model card for full benchmarks and feature matrices.

Endpoint

POST https://api.smallest.ai/waves/v1/stt/?model={pulse|pulse-pro}

The existing path POST /waves/v1/pulse/get_text continues to work alongside the new unified path.
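As a minimal sketch of how the unified path and the ?model= selector fit together, the helper below assembles the request URL in Python. The function name build_stt_url and the validation step are illustrative assumptions, not part of any official SDK.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.smallest.ai/waves/v1/stt/"
MODELS = {"pulse", "pulse-pro"}  # the two values this guide documents for ?model=


def build_stt_url(model: str, **params: str) -> str:
    """Return the unified endpoint URL with ?model= plus any extra query parameters."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    query = {"model": model, **params}
    return f"{BASE_URL}?{urlencode(query)}"


print(build_stt_url("pulse-pro", language="en", word_timestamps="true"))
```

Rejecting unknown model names locally is a design choice made here so that a typo fails before any audio is uploaded.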

Authentication

If you have not already done so, generate an API key in the smallest console. See the Authentication guide for more information about API keys.

Include your API key in the Authorization header:

Authorization: Bearer SMALLEST_API_KEY
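In a script, the same header can be built from an environment variable so that the key never appears in source code. This is a small sketch; the helper name auth_headers is hypothetical, and it assumes the key is exported as SMALLEST_API_KEY, matching the shell examples in this guide.

```python
import os


def auth_headers(content_type: str = "application/octet-stream") -> dict[str, str]:
    """Build request headers, reading the API key from the environment."""
    api_key = os.environ.get("SMALLEST_API_KEY")
    if not api_key:
        raise RuntimeError("SMALLEST_API_KEY is not set")
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": content_type,
    }
```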

Example Request: Pulse Pro (English)

Send raw audio bytes to ?model=pulse-pro. Setting word_timestamps=true adds per-word timing and confidence scores to the response; omit it for higher throughput.

# Download sample audio
curl -L -o sample.wav "https://github.com/smallest-inc/cookbook/raw/main/speech-to-text/getting-started/samples/audio.wav"

# Transcribe with Pulse Pro
curl --request POST \
  --url "https://api.smallest.ai/waves/v1/stt/?model=pulse-pro&language=en&word_timestamps=true" \
  --header "Authorization: Bearer $SMALLEST_API_KEY" \
  --header "Content-Type: application/octet-stream" \
  --data-binary "@sample.wav"
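The same request can be sketched in Python using only the standard library. The function names build_request and transcribe are illustrative, not an official client; the URL, headers, and raw-bytes body mirror the cURL command above.

```python
import json
import os
import urllib.request

STT_URL = (
    "https://api.smallest.ai/waves/v1/stt/"
    "?model=pulse-pro&language=en&word_timestamps=true"
)


def build_request(audio: bytes, api_key: str) -> urllib.request.Request:
    """Prepare a POST carrying raw audio bytes, like curl --data-binary."""
    return urllib.request.Request(
        STT_URL,
        data=audio,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/octet-stream",
        },
    )


def transcribe(path: str) -> dict:
    """Read a local audio file, send it, and return the decoded JSON response."""
    with open(path, "rb") as f:
        audio = f.read()
    req = build_request(audio, os.environ["SMALLEST_API_KEY"])
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With SMALLEST_API_KEY exported and sample.wav downloaded, transcribe("sample.wav")["transcription"] would hold the transcript text, assuming the response shape shown under Example Response.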

Async via webhook (Pulse Pro)

For long audio files where you do not want to hold an HTTP connection open, pass webhook_url. The endpoint returns 200 immediately with {"status": "processing", "request_id": "..."}; the transcription is delivered to your webhook URL when it is ready.

curl --request POST \
  --url "https://api.smallest.ai/waves/v1/stt/?model=pulse-pro&language=en&webhook_url=https://your.app/cb" \
  --header "Authorization: Bearer $SMALLEST_API_KEY" \
  --header "Content-Type: application/octet-stream" \
  --data-binary "@longcall.wav"
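A client for the async flow has two small jobs: build the URL with the webhook address, and record the request_id from the immediate acknowledgement. The sketch below does both. The helper names are hypothetical, percent-encoding the webhook URL is a general query-string precaution rather than a documented requirement, and the idea that the later callback can be matched by request_id is an assumption, since this guide does not show the webhook payload.

```python
import json
from urllib.parse import urlencode

BASE_URL = "https://api.smallest.ai/waves/v1/stt/"


def async_url(webhook_url: str, language: str = "en") -> str:
    """Build the Pulse Pro URL, percent-encoding the webhook address."""
    query = {"model": "pulse-pro", "language": language, "webhook_url": webhook_url}
    return f"{BASE_URL}?{urlencode(query)}"


def parse_ack(body: str) -> str:
    """Return the request_id from the immediate 'processing' acknowledgement."""
    ack = json.loads(body)
    if ack.get("status") != "processing":
        raise ValueError(f"unexpected status: {ack.get('status')!r}")
    return ack["request_id"]


# Track submitted jobs so a later callback can be matched (assumed workflow).
pending: dict[str, str] = {}
request_id = parse_ack('{"status": "processing", "request_id": "abc-123"}')
pending[request_id] = "longcall.wav"
```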

Example Request: Pulse (multilingual)

For non-English audio, code-switching, or when you need streaming, use ?model=pulse. Set language explicitly to the known code (en, hi, es, etc.) for best accuracy, or use a multi-* aggregator for unknown audio.

Raw audio bytes

curl --request POST \
  --url "https://api.smallest.ai/waves/v1/stt/?model=pulse&language=hi&word_timestamps=true" \
  --header "Authorization: Bearer $SMALLEST_API_KEY" \
  --header "Content-Type: application/octet-stream" \
  --data-binary "@hindi-sample.wav"

Audio URL (Pulse only)

Pulse also accepts a URL for audio hosted in cloud storage. Pulse Pro does not support audio-by-URL.

curl --request POST \
  --url "https://api.smallest.ai/waves/v1/stt/?model=pulse&language=en&word_timestamps=true" \
  --header "Authorization: Bearer $SMALLEST_API_KEY" \
  --header "Content-Type: application/json" \
  --data '{
    "url": "https://github.com/smallest-inc/cookbook/raw/main/speech-to-text/getting-started/samples/audio.wav"
  }'
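For comparison with the raw-bytes form, here is a hedged Python sketch of the audio-by-URL request. The only differences are the JSON body and the application/json content type. The function name build_url_request is illustrative.

```python
import json
import urllib.request


def build_url_request(audio_url: str, api_key: str, language: str = "en") -> urllib.request.Request:
    """Prepare a Pulse request that points at hosted audio instead of uploading bytes."""
    endpoint = (
        "https://api.smallest.ai/waves/v1/stt/"
        f"?model=pulse&language={language}&word_timestamps=true"
    )
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
```

Passing the returned object to urllib.request.urlopen would send the request; the model is fixed to pulse because Pulse Pro does not support audio-by-URL.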

For Pulse, set language explicitly to match the audio (en, hi, es, etc.) for the best accuracy. For unknown audio, pick the regional auto-detect scope: multi-eu (de, en, fr, it, nl, pt, ru, es), multi-indic (en, hi, mr, pa, gu, or, ka, ta, te, ml, bn), multi-asian (en, ja, ko, zh, yue), or multi for full multilingual auto-detection.
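One way to apply this rule in code is to pass an exact code when the language is known and otherwise fall back to the narrowest listed scope that covers the likely candidates. The sketch below copies the scope memberships from the paragraph above; the pick_language helper and its first-match heuristic are assumptions for illustration, not documented API behavior.

```python
# Scope memberships as listed in this guide.
SCOPES = {
    "multi-eu": {"de", "en", "fr", "it", "nl", "pt", "ru", "es"},
    "multi-indic": {"en", "hi", "mr", "pa", "gu", "or", "ka", "ta", "te", "ml", "bn"},
    "multi-asian": {"en", "ja", "ko", "zh", "yue"},
}


def pick_language(candidates: set[str]) -> str:
    """Choose a value for the language parameter from the likely language codes."""
    if len(candidates) == 1:
        # Known language: pass the explicit code for best accuracy.
        return next(iter(candidates))
    for scope, langs in SCOPES.items():
        if candidates and candidates <= langs:
            return scope  # first regional scope covering every candidate
    return "multi"  # nothing regional fits: full multilingual auto-detection


print(pick_language({"hi"}))        # hi
print(pick_language({"en", "hi"}))  # multi-indic
print(pick_language({"ja", "hi"}))  # multi
```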

Example Response

Pulse Pro

{
  "status": "success",
  "transcription": "This is a sample audio file for testing speech-to-text transcription with the Pulse API.",
  "words": [
    {"word": "This", "start": 0.32, "end": 0.4, "confidence": 0.9625},
    {"word": "is", "start": 0.48, "end": 0.56, "confidence": 0.9344},
    {"word": "a", "start": 0.64, "end": 0.72, "confidence": 0.9695}
  ],
  "language": "en",
  "metadata": {
    "duration": 5.6,
    "processing_time_ms": 240.51,
    "rtfx": 23.3,
    "num_chunks": 1
  },
  "request_id": "87dd36c1-4267-472d-96ee-4113e0a770a6"
}
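A short sketch of reading this response in Python: it flags words below a confidence threshold and recomputes the real-time factor. The 0.95 threshold is an arbitrary example value, and the interpretation of rtfx as audio duration divided by processing time is an assumption that happens to match the sample numbers above.

```python
# Trimmed copy of the sample Pulse Pro response shown above.
response = {
    "words": [
        {"word": "This", "start": 0.32, "end": 0.4, "confidence": 0.9625},
        {"word": "is", "start": 0.48, "end": 0.56, "confidence": 0.9344},
        {"word": "a", "start": 0.64, "end": 0.72, "confidence": 0.9695},
    ],
    "metadata": {"duration": 5.6, "processing_time_ms": 240.51, "rtfx": 23.3},
}


def low_confidence(words: list[dict], threshold: float) -> list[str]:
    """Return the words whose confidence score falls below the threshold."""
    return [w["word"] for w in words if w["confidence"] < threshold]


def realtime_factor(metadata: dict) -> float:
    """Seconds of audio processed per second of compute (assumed meaning of rtfx)."""
    return metadata["duration"] / (metadata["processing_time_ms"] / 1000)


print(low_confidence(response["words"], 0.95))          # ['is']
print(round(realtime_factor(response["metadata"]), 1))  # 23.3
```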

Pulse

{
  "status": "success",
  "transcription": "This is a sample audio file for testing speech to text transcription with the Pulse API.",
  "words": [
    {"start": 0.48, "end": 1.12, "word": "This"},
    {"start": 1.12, "end": 1.28, "word": "is"}
  ],
  "utterances": [
    {"start": 0.48, "end": 4.96, "text": "This is a sample audio file for testing speech to text transcription with the Pulse API."}
  ],
  "metadata": {
    "duration": 5.6,
    "fileSize": 268844
  }
}
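The Pulse sample adds an utterances array and, unlike the Pulse Pro sample, its word entries carry no confidence field. As a minimal sketch, the helper below renders utterances as timestamped lines; format_utterances is a hypothetical name, and the code assumes only the fields visible in the sample.

```python
def format_utterances(utterances: list[dict]) -> list[str]:
    """Render each utterance as '[start - end] text' with times in seconds."""
    return [
        f"[{u['start']:.2f}s - {u['end']:.2f}s] {u['text']}"
        for u in utterances
    ]


sample = [
    {
        "start": 0.48,
        "end": 4.96,
        "text": "This is a sample audio file for testing speech to text "
                "transcription with the Pulse API.",
    }
]

for line in format_utterances(sample):
    print(line)
```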

Full runnable source files: Python | JavaScript | cURL

Next Steps