Transcribe (Pre-recorded) | Smallest AI Docs


Transcribe an audio file. The model is chosen via ?model=:

?model=pulse-pro: English-only, leaderboard-ranked accuracy. Raw bytes only; pass webhook_url to receive transcription asynchronously on long files.
?model=pulse: multilingual transcription (38 languages), supports both raw bytes and audio-by-URL.

When to use this

Use this endpoint when you have a complete audio file (call recording, voicemail, podcast episode) and want the transcript back in one response. For live transcription as audio arrives, use the realtime WebSocket endpoint (WS /waves/v1/stt/live) instead.

Pulse Pro has no streaming worker today; calls to WS /waves/v1/stt/live?model=pulse-pro return 400 before the WebSocket upgrades.

Input methods

Raw bytes: Content-Type: application/octet-stream with the audio in the body. All knobs are query parameters.
URL (?model=pulse only): Content-Type: application/json with {"url": "..."} in the body.
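The two input methods differ only in the `Content-Type` header and the body. A minimal sketch of a helper that builds the arguments for `requests.post()` for either method follows; the function name `build_request` and its parameters are hypothetical and not part of the API.

```python
from typing import Optional


def build_request(model: str, language: str, api_key: str,
                  audio: Optional[bytes] = None,
                  url: Optional[str] = None) -> dict:
    """Return keyword arguments for requests.post() (hypothetical helper)."""
    # Exactly one input method may be used per request.
    if (audio is None) == (url is None):
        raise ValueError("pass exactly one of audio or url")
    # Audio-by-URL is documented for ?model=pulse only.
    if url is not None and model != "pulse":
        raise ValueError("audio-by-URL is only available with model=pulse")

    kwargs = {
        "params": {"model": model, "language": language},
        "headers": {"Authorization": f"Bearer {api_key}"},
    }
    if audio is not None:
        # Raw bytes: the audio itself is the request body.
        kwargs["headers"]["Content-Type"] = "application/octet-stream"
        kwargs["data"] = audio
    else:
        # URL: a small JSON document pointing at the audio.
        kwargs["headers"]["Content-Type"] = "application/json"
        kwargs["json"] = {"url": url}
    return kwargs
```

It could be used as `requests.post("https://api.smallest.ai/waves/v1/stt/", **build_request("pulse", "en", api_key, url=audio_url))`.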

Examples

cURL: Pulse Pro, sync

curl -X POST "https://api.smallest.ai/waves/v1/stt/?model=pulse-pro&language=en&word_timestamps=true" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/octet-stream" \
  --data-binary "@./call.wav"

cURL: Pulse Pro, async via webhook

curl -X POST "https://api.smallest.ai/waves/v1/stt/?model=pulse-pro&language=en&webhook_url=https://your.app/cb" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/octet-stream" \
  --data-binary "@./call.wav"

Returns 200 { "status": "processing", "request_id": "..." } immediately. The webhook receives the full transcription when ready.
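On the receiving side, the webhook needs an HTTP endpoint that accepts the callback. The sketch below uses only the Python standard library. It assumes the callback arrives as a `POST` with a JSON object body that contains a `transcription` field; the exact webhook payload is not specified here, so treat the field access, the handler names, and the port as illustrative.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_webhook_body(raw: bytes) -> dict:
    """Decode a webhook body, assuming it is a UTF-8 JSON object."""
    payload = json.loads(raw.decode("utf-8"))
    if not isinstance(payload, dict):
        raise ValueError("expected a JSON object")
    return payload


class CallbackHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = parse_webhook_body(self.rfile.read(length))
        except (ValueError, UnicodeDecodeError):
            # Malformed body: reject it so the failure is visible.
            self.send_response(400)
            self.end_headers()
            return
        # Assumption: the transcript is under the "transcription" key.
        print(payload.get("transcription"))
        self.send_response(200)
        self.end_headers()


def serve(port: int = 8000) -> None:
    """Run the receiver until interrupted (hypothetical entry point)."""
    HTTPServer(("0.0.0.0", port), CallbackHandler).serve_forever()
```

Calling `serve()` starts the receiver; the public URL that routes to it is what you would pass as `webhook_url`.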

cURL: Pulse, audio-by-URL

curl -X POST "https://api.smallest.ai/waves/v1/stt/?model=pulse&language=en" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://your-bucket.s3.amazonaws.com/call.wav"}'

Python

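import os

import requests

# Read the API key from the environment, as the cURL examples do.
API_KEY = os.environ["SMALLEST_API_KEY"]

# The raw audio bytes are the request body.
with open("./call.wav", "rb") as f:
    audio = f.read()

r = requests.post(
    "https://api.smallest.ai/waves/v1/stt/",
    # All options are query parameters.
    params={"model": "pulse-pro", "language": "en", "word_timestamps": "true"},
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/octet-stream"},
    data=audio,
)
r.raise_for_status()  # raise on 4xx/5xx instead of parsing an error body
print(r.json()["transcription"])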

JavaScript / TypeScript

import { readFileSync } from "node:fs";

const audio = readFileSync("./call.wav");
const params = new URLSearchParams({ model: "pulse-pro", language: "en", word_timestamps: "true" });

const res = await fetch(`https://api.smallest.ai/waves/v1/stt/?${params}`, {
  method: "POST",
  headers: { Authorization: `Bearer ${process.env.SMALLEST_API_KEY}`, "Content-Type": "application/octet-stream" },
  body: audio,
});
console.log((await res.json()).transcription);

Common gotchas

model is required. Missing or invalid values return 400 with an enum-validation error.
Pulse Pro is English only. Pass language=en. Other language codes are accepted at the wire level but produce unpredictable output.
Pulse Pro does not support audio-by-URL. Send raw bytes or use ?model=pulse for the URL flow.
Async (webhook) mode is Pulse Pro only. Pulse runs sync only on this endpoint.
Max payload 250 MB. Larger requests return 413. If you are close to the limit, re-encode the audio as mono 16 kHz PCM to shrink it; transcription quality is unaffected.
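Most of these gotchas can be caught on the client before any bytes are uploaded. The following is a minimal sketch of such a pre-flight check; `preflight` and its parameters are hypothetical, and the limit is assumed to mean decimal megabytes (250 × 10⁶ bytes), which is the more conservative reading.

```python
from typing import List, Optional

# Assumption: "250 MB" means decimal megabytes.
MAX_PAYLOAD_BYTES = 250 * 1000 * 1000


def preflight(model: str, language: str, payload_size: int = 0,
              uses_url: bool = False,
              webhook_url: Optional[str] = None) -> List[str]:
    """Return a list of problems; an empty list means the request looks valid."""
    problems = []
    if model not in ("pulse", "pulse-pro"):
        problems.append("model must be 'pulse' or 'pulse-pro'")
    if not language:
        problems.append("language is required")
    elif model == "pulse-pro" and language != "en":
        problems.append("Pulse Pro is English only; pass language=en")
    if model == "pulse-pro" and uses_url:
        problems.append("Pulse Pro does not support audio-by-URL")
    if webhook_url and model != "pulse-pro":
        problems.append("webhook (async) mode is Pulse Pro only")
    if payload_size > MAX_PAYLOAD_BYTES:
        problems.append("payload exceeds 250 MB and would return 413")
    return problems
```

A caller might run `preflight(...)` and skip the upload whenever the returned list is non-empty.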

Authentication

Authorization: Bearer

Header authentication of the form Bearer <token>

Query parameters

model (enum, required)

Selects which ASR model handles the request. Missing or invalid values return 400.

pulse-pro: English only, leaderboard-ranked accuracy, raw bytes only; supports async via webhook_url.
pulse: multilingual (38 languages), raw bytes or URL.

Allowed values: pulse-pro, pulse

language (string, required)

Language of the audio file. For Pulse Pro pass en. For Pulse, see the Pulse model card for the full 38-language list, plus the multi, multi-eu, multi-indic, and multi-asian aggregators.

word_timestamps (boolean, optional, defaults to false)

Include per-word timestamps in the response. On Pulse Pro this costs roughly one-third of throughput.

diarize (boolean, optional, defaults to false)

Multi-speaker identification; adds per-word and per-utterance speaker labels.
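The examples on this page pass boolean options as the lowercase string `"true"`. Python's `str(True)` is `"True"`, so a Python bool placed directly in a query dictionary is serialized with a capital letter; whether the API accepts that spelling is not stated here. A small sketch that normalizes flags to the lowercase form the examples use (the helper name `encode_flags` is hypothetical):

```python
from urllib.parse import urlencode


def encode_flags(**flags: bool) -> dict:
    """Map Python booleans to the lowercase strings used in the examples."""
    return {name: "true" if value else "false" for name, value in flags.items()}


query = {"model": "pulse-pro", "language": "en"}
query.update(encode_flags(word_timestamps=True, diarize=False))
query_string = urlencode(query)
```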

webhook_url (string, optional, format: "uri")

Pulse Pro only. If set, the response is 200 with {"status": "processing", "request_id": "..."} immediately, and the full transcription is delivered to this URL when ready. Use for long files where you do not want to hold an HTTP connection open.

webhook_method (enum, optional, defaults to POST)

HTTP method to use when calling the webhook. Pulse Pro only.

Allowed values:

webhook_extra (string, optional)

Arbitrary metadata returned to the webhook in addition to the transcription payload. Pulse Pro only.
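Because `webhook_url` is itself a URL carried inside a query string, it is safest to percent-encode it rather than paste it in verbatim. A minimal sketch using the standard library follows; `async_request_url` is a hypothetical helper, and `webhook_extra` is passed through as an opaque string since its format is not specified beyond "string".

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.smallest.ai/waves/v1/stt/"


def async_request_url(webhook_url: str, language: str = "en",
                      webhook_method: Optional[str] = None,
                      webhook_extra: Optional[str] = None) -> str:
    """Build the request URL for a Pulse Pro async (webhook) call."""
    query = {"model": "pulse-pro", "language": language, "webhook_url": webhook_url}
    # Optional parameters are omitted unless set, so server defaults apply.
    if webhook_method is not None:
        query["webhook_method"] = webhook_method
    if webhook_extra is not None:
        query["webhook_extra"] = webhook_extra
    # urlencode percent-encodes ':' '/' '?' '&' inside the values.
    return f"{BASE_URL}?{urlencode(query)}"
```

For example, `async_request_url("https://your.app/cb?job=42", webhook_extra="batch-7")` keeps the `?job=42` part inside the `webhook_url` value instead of letting it leak into the outer query string.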

Request

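This endpoint expects binary data of type application/octet-stream. For audio-by-URL requests (?model=pulse only), send application/json with {"url": "..."} in the body instead.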

Response

Transcription succeeded. The response body has two shapes:

Sync: full TranscriptionResponse with transcription, words, metadata, etc. Returned when webhook_url is not set (all ?model=pulse requests, and ?model=pulse-pro requests without a webhook).
Async: { "status": "processing", "request_id": "..." }. Returned when ?model=pulse-pro is paired with webhook_url. The full TranscriptionResponse then arrives on the webhook when ready.

TranscriptionResponse (object)

AsyncAccepted (object)

Returned by Pulse Pro when webhook_url is set. The transcription arrives on the webhook when ready.
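Since both shapes come back with status 200, client code has to inspect the body to tell them apart. A minimal sketch, relying only on the fields named above (`status`, `request_id`, `transcription`); the function name `classify_response` is hypothetical.

```python
def classify_response(body: dict) -> str:
    """Return "async" for an AsyncAccepted body, "sync" for a full transcript."""
    # Async acknowledgement: {"status": "processing", "request_id": "..."}
    if body.get("status") == "processing" and "request_id" in body:
        return "async"
    # Sync result: a TranscriptionResponse carries the transcript text.
    if "transcription" in body:
        return "sync"
    raise ValueError("unrecognized response shape")
```

With `requests`, this would be called as `classify_response(r.json())` after `r.raise_for_status()`.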

Errors

400: Bad Request Error

401: Unauthorized Error

403: Forbidden Error

413: Content Too Large Error

429: Too Many Requests Error

503: Service Unavailable Error
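Of these, 429 and 503 usually indicate a transient condition, while the other codes point at a problem with the request itself. This page does not specify a retry policy, so the following is only a sketch of one common approach, exponential backoff. The helper name, attempt count, and delays are assumptions; `send` is any zero-argument callable that performs the request and returns an object with a `status_code` attribute, such as a `requests.Response`.

```python
import time
from typing import Any, Callable

# Assumption: only these two statuses are worth retrying.
RETRYABLE_STATUSES = {429, 503}


def post_with_retry(send: Callable[[], Any], max_attempts: int = 4,
                    base_delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> Any:
    """Call send() until it returns a non-retryable status or attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = send()
        last_attempt = attempt == max_attempts - 1
        if response.status_code not in RETRYABLE_STATUSES or last_attempt:
            return response
        # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
        sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the function easy to test without real waiting.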