Transcribe (Pre-recorded) | Smallest AI Docs

Transcribe an audio file. The model is chosen via ?model=:

?model=pulse-pro: English-only, leaderboard-ranked accuracy. Raw bytes only; pass webhook_url to receive transcription asynchronously on long files.
?model=pulse: multilingual transcription (17 streaming + 26 pre-recorded languages), supports both raw bytes and audio-by-URL.

When to use this

Use this endpoint when you have a complete audio file (call recording, voicemail, podcast episode) and want the transcript back in one response. For live transcription as audio arrives, use the realtime WebSocket endpoint (WS /waves/v1/stt/live) instead.

Pulse Pro has no streaming worker today; calls to WS /waves/v1/stt/live?model=pulse-pro return 400 before the WebSocket upgrades.

Input methods

Raw bytes: Content-Type: application/octet-stream with the audio in the body. All knobs are query parameters.
URL (?model=pulse only): Content-Type: application/json with {"url": "..."} in the body.
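
The two input methods differ only in the Content-Type header and the request body. The sketch below shows one way to build the arguments for `requests.post` for either method. The helper name `build_stt_request` and its parameters are illustrative assumptions, not part of the API.

```python
from typing import Any, Optional

STT_URL = "https://api.smallest.ai/waves/v1/stt/"


def build_stt_request(
    api_key: str,
    model: str,
    language: str,
    audio: Optional[bytes] = None,
    audio_url: Optional[str] = None,
) -> dict[str, Any]:
    """Return keyword arguments for requests.post() (hypothetical helper)."""
    if (audio is None) == (audio_url is None):
        raise ValueError("pass exactly one of audio or audio_url")
    if audio_url is not None and model != "pulse":
        raise ValueError("audio-by-URL is only available with model=pulse")

    kwargs: dict[str, Any] = {
        "url": STT_URL,
        "params": {"model": model, "language": language},
        "headers": {"Authorization": f"Bearer {api_key}"},
    }
    if audio is not None:
        # Raw bytes: the audio itself is the request body.
        kwargs["headers"]["Content-Type"] = "application/octet-stream"
        kwargs["data"] = audio
    else:
        # URL: a small JSON document that points at the audio.
        kwargs["headers"]["Content-Type"] = "application/json"
        kwargs["json"] = {"url": audio_url}
    return kwargs
```

With the `requests` package installed, the result can be passed straight through: `requests.post(**build_stt_request(key, "pulse", "en", audio_url="https://..."))`.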

Examples

cURL: Pulse Pro, sync

curl -X POST "https://api.smallest.ai/waves/v1/stt/?model=pulse-pro&language=en&word_timestamps=true" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/octet-stream" \
  --data-binary "@./call.wav"

cURL: Pulse Pro, async via webhook

curl -X POST "https://api.smallest.ai/waves/v1/stt/?model=pulse-pro&language=en&webhook_url=https://your.app/cb" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/octet-stream" \
  --data-binary "@./call.wav"

Returns 200 { "status": "processing", "request_id": "..." } immediately. The webhook receives the full transcription when ready.
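
On the receiving side, the webhook is an ordinary HTTP endpoint. The sketch below uses only the Python standard library. It assumes the webhook body is the JSON TranscriptionResponse described on this page and that the default POST method is used. The class and function names are illustrative.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def extract_transcription(body: bytes) -> str:
    """Pull the transcript text out of a webhook body.

    Assumes the body is a JSON object with a "transcription" field.
    """
    payload = json.loads(body)
    return payload.get("transcription", "")


class TranscriptionWebhook(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Read exactly the number of bytes the sender announced.
        length = int(self.headers.get("Content-Length", 0))
        text = extract_transcription(self.rfile.read(length))
        print(f"transcript received: {text!r}")
        # Acknowledge receipt.
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8080) -> None:
    """Run the receiver until interrupted."""
    HTTPServer(("", port), TranscriptionWebhook).serve_forever()
```

Calling `serve()` starts the receiver. The public URL that routes to it is what you would pass as `webhook_url`.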

cURL: Pulse, audio-by-URL

curl -X POST "https://api.smallest.ai/waves/v1/stt/?model=pulse&language=en" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://your-bucket.s3.amazonaws.com/call.wav"}'

Python

import os

import requests

# Read the API key from the environment (same variable as the cURL examples).
API_KEY = os.environ["SMALLEST_API_KEY"]

with open("./call.wav", "rb") as f:
    audio = f.read()

r = requests.post(
    "https://api.smallest.ai/waves/v1/stt/",
    params={"model": "pulse-pro", "language": "en", "word_timestamps": "true"},
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/octet-stream"},
    data=audio,
)
r.raise_for_status()
print(r.json()["transcription"])

JavaScript / TypeScript

import { readFileSync } from "node:fs";

const audio = readFileSync("./call.wav");
const params = new URLSearchParams({ model: "pulse-pro", language: "en", word_timestamps: "true" });

const res = await fetch(`https://api.smallest.ai/waves/v1/stt/?${params}`, {
  method: "POST",
  headers: { Authorization: `Bearer ${process.env.SMALLEST_API_KEY}`, "Content-Type": "application/octet-stream" },
  body: audio,
});
// Surface HTTP errors (400, 401, 413, ...) instead of reading an error body as a transcript.
if (!res.ok) throw new Error(`STT request failed: ${res.status}`);
console.log((await res.json()).transcription);

Common gotchas

model is required. Missing or invalid values return 400 with an enum-validation error.
Pulse Pro is English only. Pass language=en. Other language codes are accepted at the wire level but produce unpredictable output.
Pulse Pro does not support audio-by-URL. Send raw bytes or use ?model=pulse for the URL flow.
Async (webhook) mode is Pulse Pro only. Pulse runs sync only on this endpoint.
Max payload 250 MB. Larger requests return 413. If you are close to the limit, downmix to mono and resample to 16 kHz PCM; transcription quality is unaffected.
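
Most of these gotchas can be caught before the request is sent. The sketch below is a client-side check that mirrors the rules listed above. The function name `preflight` is illustrative, and reading "250 MB" as 250,000,000 bytes is a conservative assumption, since the page does not say whether the limit is decimal or binary.

```python
from typing import Optional

MAX_PAYLOAD_BYTES = 250 * 1000 * 1000  # assumption: decimal megabytes


def preflight(
    model: str,
    language: str,
    payload_size: int = 0,
    audio_url: Optional[str] = None,
    webhook_url: Optional[str] = None,
) -> list[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if model not in ("pulse", "pulse-pro"):
        problems.append("model must be 'pulse' or 'pulse-pro' (server returns 400)")
    if model == "pulse-pro":
        if language != "en":
            problems.append("Pulse Pro is English only; pass language=en")
        if audio_url is not None:
            problems.append("Pulse Pro does not support audio-by-URL")
    if model == "pulse" and webhook_url is not None:
        problems.append("async (webhook) mode is Pulse Pro only")
    if payload_size > MAX_PAYLOAD_BYTES:
        problems.append("payload exceeds 250 MB (server returns 413)")
    return problems
```

For example, `preflight("pulse-pro", "de", audio_url="https://...")` reports two problems: the language and the URL input.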

Authentication

Authorization: Bearer

Header authentication of the form Bearer <token>

Query parameters

model (enum, required)

Selects which ASR model handles the request. Required; missing or invalid values return 400.

pulse-pro: English only, leaderboard-ranked accuracy, raw bytes only; supports async via webhook_url.
pulse: multilingual (39 languages), raw bytes OR URL.

Allowed values: pulse-pro, pulse

language (enum, required)

Language of the audio file. This endpoint is Pre-Recorded (HTTP); for streaming, switch to WSS /waves/v1/stt/live, which supports a different set of languages.

26 single-language codes: en, hi, de, es, ru, it, fr, nl, pt, uk, pl, cs, sk, lv, et, ro, fi, sv, bg, hu, da, lt, mt, zh, ja, ko.

Regional auto-detect aggregators for unknown audio:

multi-eu — auto-detects across all 21 European codes plus en.
multi-asian — auto-detects across zh, ko, ja, en.
Pulse Pro: pass en.
Pulse: pass any of the single-language codes above, or use the multi-eu / multi-asian aggregator for unknown audio. See the Pulse model card (/waves/model-cards/speech-to-text/pulse) for the full table with language names.

word_timestamps (boolean, optional, defaults to false)

Include the per-word words[] array in the response — each entry carries the recognized word, its start/end timestamps, and a per-word confidence score (0.0–1.0). With diarize=true, entries also include speaker. On Pulse Pro this costs roughly one-third of throughput.
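
One use for the per-word entries is flagging words the model was unsure about. The sketch below runs over the two `words[]` entries from the sample response on this page. The function name and the 0.95 threshold are arbitrary choices for illustration.

```python
def low_confidence_words(words: list[dict], threshold: float = 0.95) -> list[str]:
    """Return "start-end word" labels for words below the confidence threshold."""
    flagged = []
    for w in words:
        if w["confidence"] < threshold:
            flagged.append(f'{w["start"]:.2f}-{w["end"]:.2f} {w["word"]}')
    return flagged


# The two entries from the sample response on this page.
sample = [
    {"word": "Hi", "start": 0.32, "end": 0.4, "confidence": 0.96},
    {"word": "how", "start": 0.48, "end": 0.56, "confidence": 0.93},
]
print(low_confidence_words(sample))  # ['0.48-0.56 how']
```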

diarize (boolean, optional, defaults to false)

Multi-speaker identification; adds per-word and per-utterance speaker labels.

webhook_url (string, optional, format: uri)

Pulse Pro only. If set, the response is 200 with {"status": "processing", "request_id": "..."} immediately, and the full transcription is delivered to this URL when ready. Use for long files where you do not want to hold an HTTP connection open.

webhook_method (enum, optional, defaults to POST)

HTTP method to use when calling the webhook. Pulse Pro only.

Allowed values:

webhook_extra (string, optional)

Arbitrary metadata returned to the webhook in addition to the transcription payload. Pulse Pro only.

redact_pii (enum, optional, defaults to false)

Redact personally identifiable information from the transcript. Names → [FIRSTNAME_*] / [LASTNAME_*], phone numbers → [PHONENUMBER_*], addresses → [ADDRESS_*], etc. The redaction tokens use sequential indices so multiple occurrences of the same entity get distinct labels ([FIRSTNAME_1], [FIRSTNAME_2]).

Allowed values: true, false

redact_pci (enum, optional, defaults to false)

Redact payment card information (credit-card numbers, CVV, account numbers, etc.). Replaces matches with [ACCOUNTNUMBER_*] tokens. Use alongside redact_pii=true for full PCI-compliant transcript handling.

Allowed values: true, false
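
Downstream code often needs to know what was redacted. The sketch below counts redaction tokens per entity type in a transcript. The token pattern (uppercase name, underscore, index, in square brackets) is inferred from the examples above, and the sample sentence is made up.

```python
import re
from collections import Counter

# Matches tokens such as [FIRSTNAME_1] or [ACCOUNTNUMBER_2].
# Assumption: entity names are uppercase letters only.
REDACTION_TOKEN = re.compile(r"\[([A-Z]+)_(\d+)\]")


def count_redactions(transcript: str) -> Counter:
    """Count redaction tokens per entity type."""
    return Counter(entity for entity, _index in REDACTION_TOKEN.findall(transcript))


# Made-up transcript for illustration.
text = "Hi [FIRSTNAME_1] [LASTNAME_1], this is [FIRSTNAME_2]. Your card is [ACCOUNTNUMBER_1]."
counts = count_redactions(text)
print(counts["FIRSTNAME"], counts["LASTNAME"], counts["ACCOUNTNUMBER"])  # 2 1 1
```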

emotion_detection (enum, optional, defaults to false)

When true, the response adds an emotions object mapping detected emotion labels to confidence scores. Useful for voice-of-customer analytics on call recordings.

Allowed values: true, false
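
Because the emotions object maps labels to scores, picking the dominant emotion is a single `max` call. In the sketch below, the labels and scores are hypothetical; this page does not list the labels the API returns.

```python
from typing import Optional


def top_emotion(emotions: dict[str, float]) -> Optional[tuple[str, float]]:
    """Return the (label, score) pair with the highest score, or None if empty."""
    if not emotions:
        return None
    label = max(emotions, key=emotions.get)
    return label, emotions[label]


# Hypothetical labels and scores, for illustration only.
print(top_emotion({"neutral": 0.71, "frustrated": 0.22, "happy": 0.07}))
```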

gender_detection (enum, optional, defaults to false)

When true, the response adds a gender field with the detected speaker gender label. Pulse pre-recorded only.

Allowed values: true, false

Request

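This endpoint expects binary data of type application/octet-stream. With ?model=pulse, it also accepts a JSON body of the form {"url": "..."} sent as application/json.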

Response

Transcription succeeded. The response body has two shapes:

Sync: full TranscriptionResponse with transcription, words, metadata, etc. Returned when webhook_url is not set (all ?model=pulse requests, and ?model=pulse-pro requests without a webhook).
Async: { "status": "processing", "request_id": "..." }. Returned when ?model=pulse-pro is paired with webhook_url. The full TranscriptionResponse then arrives on the webhook when ready.
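
Client code that may run in either mode has to tell the two shapes apart. The sketch below keys on the `status` field shown above. The function name is illustrative.

```python
from typing import Any, Optional


def transcript_or_none(body: dict[str, Any]) -> Optional[str]:
    """Return the transcript for a sync response, or None for an async acknowledgement."""
    if body.get("status") == "processing":
        # Async shape: the transcript will arrive on the webhook instead.
        return None
    return body["transcription"]
```

A `None` result means the request was accepted and the caller should keep `request_id` to match the later webhook delivery.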

TranscriptionResponse (object)

AsyncAccepted (object)

Returned by Pulse Pro when webhook_url is set. The transcription arrives on the webhook when ready.

Errors

400

Bad Request Error

401

Unauthorized Error

403

Forbidden Error

413

Content Too Large Error

429

Too Many Requests Error

503

Service Unavailable Error
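
Of these, 429 and 503 are usually transient, while 400, 401, 403, and 413 will fail the same way on every attempt. The sketch below retries with exponential backoff. Treating only 429 and 503 as retryable is an assumption based on general HTTP semantics, not a documented guarantee of this API, and the function name and delay values are illustrative.

```python
import time
from typing import Any, Callable

RETRYABLE_STATUSES = {429, 503}  # assumption: only these two are transient


def send_with_retry(
    send: Callable[[], Any],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call send() until it returns a non-retryable status or attempts run out.

    send must return an object with a status_code attribute.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = send()
        if response.status_code not in RETRYABLE_STATUSES:
            return response
        if attempt < max_attempts - 1:
            # Wait 1x, 2x, 4x, ... the base delay between attempts.
            sleep(base_delay * 2 ** attempt)
    return response
```

With `requests`, the call might look like `send_with_retry(lambda: requests.post(url, params=params, headers=headers, data=audio))`, where `audio` is a bytes object so it can be sent again on each attempt.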

import requests

url = "https://api.smallest.ai/waves/v1/stt/"

querystring = {"model": "pulse-pro", "language": "en"}

headers = {
    "Authorization": "Bearer <BearerAuth>",  # replace <BearerAuth> with your API key
    "Content-Type": "application/octet-stream",
}

# Send the raw audio bytes as the request body.
with open("./call.wav", "rb") as f:
    response = requests.post(url, headers=headers, params=querystring, data=f)

print(response.json())

{
  "status": "success",
  "transcription": "Hi, how are you doing? Could you help me reschedule my appointment?",
  "words": [
    {
      "word": "Hi",
      "start": 0.32,
      "end": 0.4,
      "confidence": 0.96
    },
    {
      "word": "how",
      "start": 0.48,
      "end": 0.56,
      "confidence": 0.93
    }
  ],
  "language": "en",
  "metadata": {
    "duration": 5.6,
    "processing_time_ms": 240.51,
    "rtfx": 23.3,
    "num_chunks": 1
  },
  "request_id": "87dd36c1-4267-472d-96ee-4113e0a770a6"
}
