Transcribe (Pre-recorded)

Transcribe an audio file. The model is chosen via `?model=`:

- `?model=pulse-pro`: English-only, leaderboard-ranked accuracy. Raw bytes only; pass `webhook_url` to receive transcription asynchronously on long files.
- `?model=pulse`: multilingual transcription (17 streaming + 26 pre-recorded languages), supports both raw bytes and audio-by-URL.

## When to use this

Use this endpoint when you have a complete audio file (call recording, voicemail, podcast episode) and want the transcript back in one response. For live transcription as audio arrives, use the realtime WebSocket endpoint (`WS /waves/v1/stt/live`) instead. Pulse Pro has no streaming worker today; calls to `WS /waves/v1/stt/live?model=pulse-pro` return `400` before the WebSocket upgrades.

## Input methods

- **Raw bytes**: `Content-Type: application/octet-stream` with the audio in the body. All knobs are query parameters.
- **URL (`?model=pulse` only)**: `Content-Type: application/json` with `{"url": "..."}` in the body.

## Examples

**cURL**: Pulse Pro, sync

```bash
curl -X POST "https://api.smallest.ai/waves/v1/stt/?model=pulse-pro&language=en&word_timestamps=true" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/octet-stream" \
  --data-binary "@./call.wav"
```

**cURL**: Pulse Pro, async via webhook

```bash
curl -X POST "https://api.smallest.ai/waves/v1/stt/?model=pulse-pro&language=en&webhook_url=https://your.app/cb" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/octet-stream" \
  --data-binary "@./call.wav"
```

Returns `200 { "status": "processing", "request_id": "..." }` immediately. The webhook receives the full transcription when ready.

**cURL**: Pulse, audio-by-URL

```bash
curl -X POST "https://api.smallest.ai/waves/v1/stt/?model=pulse&language=en" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://your-bucket.s3.amazonaws.com/call.wav"}'
```

**Python**

```python
import os

import requests

# Read the API key from the environment (same variable as the cURL examples).
API_KEY = os.environ["SMALLEST_API_KEY"]

# Load the whole file; the endpoint takes the raw bytes as the request body.
with open("./call.wav", "rb") as f:
    audio = f.read()

r = requests.post(
    "https://api.smallest.ai/waves/v1/stt/",
    params={"model": "pulse-pro", "language": "en", "word_timestamps": "true"},
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/octet-stream"},
    data=audio,
)
r.raise_for_status()  # surface 4xx/5xx responses as exceptions
print(r.json()["transcription"])
```

**JavaScript / TypeScript**

```typescript
import { readFileSync } from "node:fs";

const audio = readFileSync("./call.wav");
const params = new URLSearchParams({ model: "pulse-pro", language: "en", word_timestamps: "true" });

const res = await fetch(`https://api.smallest.ai/waves/v1/stt/?${params}`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.SMALLEST_API_KEY}`,
    "Content-Type": "application/octet-stream",
  },
  body: audio,
});
// Fail loudly on 4xx/5xx instead of reading an error body as a transcript.
if (!res.ok) throw new Error(`Request failed with status ${res.status}`);
console.log((await res.json()).transcription);
```

## Common gotchas

- **`model` is required.** Missing or invalid values return `400` with an enum-validation error.
- **Pulse Pro is English only.** Pass `language=en`. Other language codes are accepted at the wire level but produce unpredictable output.
- **Pulse Pro does not support audio-by-URL.** Send raw bytes or use `?model=pulse` for the URL flow.
- **Async (webhook) mode is Pulse Pro only.** Pulse runs sync only on this endpoint.
- **Max payload 250 MB.** Larger requests return `413`. Re-encode to mono 16 kHz PCM if you are close to the limit; quality is unaffected.
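
The gotchas above can be caught client-side before any audio is uploaded. The following is a minimal sketch, not part of any official SDK: `check_request` and `MAX_PAYLOAD_BYTES` are hypothetical names, the byte value of "250 MB" is an assumption (decimal megabytes, the more conservative reading), and rejecting a request that carries both bytes and a URL is also an assumption.

```python
from typing import List, Optional

# Assumption: "250 MB" is read as decimal megabytes; the page does not specify.
MAX_PAYLOAD_BYTES = 250 * 1000 * 1000


def check_request(
    model: Optional[str],
    *,
    audio_size: Optional[int] = None,
    audio_url: Optional[str] = None,
    webhook_url: Optional[str] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the combination looks valid."""
    problems = []
    if model not in ("pulse-pro", "pulse"):
        problems.append("model must be 'pulse-pro' or 'pulse'")
    # Assumption: exactly one input method (raw bytes or URL) is used per request.
    if (audio_size is None) == (audio_url is None):
        problems.append("send either raw bytes or a URL, not both or neither")
    if model == "pulse-pro" and audio_url is not None:
        problems.append("pulse-pro does not support audio-by-URL")
    if model == "pulse" and webhook_url is not None:
        problems.append("webhook_url is pulse-pro only")
    if audio_size is not None and audio_size > MAX_PAYLOAD_BYTES:
        problems.append("payload exceeds 250 MB")
    return problems
```

Calling `check_request("pulse-pro", audio_url="https://example.com/a.wav")` reports the audio-by-URL restriction locally rather than waiting for the server to reject the request.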

Authentication

`Authorization` (Bearer)

Header authentication of the form `Authorization: Bearer <token>`.

Query parameters

`model` (enum, required)
Selects which ASR model handles the request. Required; missing or invalid values return `400`.

- `pulse-pro`: English only, leaderboard-ranked accuracy, raw bytes only; supports async via `webhook_url`.
- `pulse`: multilingual (39 languages), raw bytes OR URL.

Allowed values: `pulse-pro`, `pulse`
`language` (enum, required)
Language of the audio file. This endpoint is **Pre-Recorded (HTTP)**; for streaming, switch to `WSS /waves/v1/stt/live` (different supported language set).

**26 single-language codes:** `en`, `hi`, `de`, `es`, `ru`, `it`, `fr`, `nl`, `pt`, `uk`, `pl`, `cs`, `sk`, `lv`, `et`, `ro`, `fi`, `sv`, `bg`, `hu`, `da`, `lt`, `mt`, `zh`, `ja`, `ko`.

**Regional auto-detect aggregators** for unknown audio:

- `multi-eu`: auto-detects across all 21 European codes plus `en`.
- `multi-asian`: auto-detects across `zh`, `ko`, `ja`, `en`.

Per model:

- **Pulse Pro**: pass `en`.
- **Pulse**: pass any of the single-language codes above, or use the `multi-eu` / `multi-asian` aggregator for unknown audio.

See the [Pulse model card](/waves/model-cards/speech-to-text/pulse) for the full table with language names.

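
The language rules above can be sketched as a small client-side helper. This is an illustration under assumptions, not an official API: `pick_language` and its `region_hint` argument are hypothetical, and refusing non-`en` codes for Pulse Pro is a local design choice (the server accepts them but the output is unpredictable).

```python
from typing import Optional

# The 21 European codes, the three Asian codes, plus `en` and `hi`, as listed above.
EUROPEAN_CODES = {
    "de", "es", "ru", "it", "fr", "nl", "pt", "uk", "pl", "cs", "sk",
    "lv", "et", "ro", "fi", "sv", "bg", "hu", "da", "lt", "mt",
}
ASIAN_CODES = {"zh", "ja", "ko"}
SINGLE_CODES = EUROPEAN_CODES | ASIAN_CODES | {"en", "hi"}
AGGREGATORS = {"eu": "multi-eu", "asian": "multi-asian"}


def pick_language(model: str, code: Optional[str] = None,
                  region_hint: Optional[str] = None) -> str:
    """Return the value to send as `language` for the given model."""
    if model == "pulse-pro":
        # Pulse Pro is English only; reject anything else locally.
        if code not in (None, "en"):
            raise ValueError("pulse-pro only supports language=en")
        return "en"
    if code in SINGLE_CODES or code in AGGREGATORS.values():
        return code
    if code is None and region_hint in AGGREGATORS:
        # Unknown audio: fall back to the regional auto-detect aggregator.
        return AGGREGATORS[region_hint]
    raise ValueError(f"unsupported language selection: {code!r}, {region_hint!r}")
```

For example, `pick_language("pulse", region_hint="eu")` yields `multi-eu` for audio whose language is not known in advance.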
`word_timestamps` (boolean, optional, defaults to `false`)

Include per-word timestamps in the response. On Pulse Pro this costs roughly one-third of throughput.

`diarize` (boolean, optional, defaults to `false`)

Multi-speaker identification; adds per-word and per-utterance speaker labels.
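
As an illustration of consuming per-word speaker labels, the sketch below merges consecutive words from the same speaker into turns. The field names `word` and `speaker` are assumptions made for this example; this page does not document the word-level schema, so check the actual response before relying on them.

```python
from itertools import groupby
from typing import Any, Dict, List, Tuple


def speaker_turns(words: List[Dict[str, Any]]) -> List[Tuple[Any, str]]:
    """Collapse consecutive words with the same speaker label into one turn.

    Assumes each item carries hypothetical `word` and `speaker` keys.
    """
    turns = []
    for speaker, group in groupby(words, key=lambda w: w["speaker"]):
        turns.append((speaker, " ".join(w["word"] for w in group)))
    return turns
```

Because `groupby` only merges adjacent items, a speaker who talks twice with someone else in between produces two separate turns, which preserves the conversation order.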

`webhook_url` (string, optional, format: `uri`)

Pulse Pro only. If set, the response is `200` with `{"status": "processing", "request_id": "..."}` immediately, and the full transcription is delivered to this URL when ready. Use for long files where you do not want to hold an HTTP connection open.

`webhook_method` (enum, optional, defaults to `POST`)
HTTP method to use when calling the webhook. Pulse Pro only.
Allowed values:
`webhook_extra` (string, optional)
Arbitrary metadata returned to the webhook in addition to the transcription payload. Pulse Pro only.
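
A webhook URL contains `:` and `/`, so it is safest to let a URL library encode the query string rather than concatenating it by hand. The sketch below builds the request URL for the async Pulse Pro flow; `async_request_url` is a hypothetical helper, and the plain-string value passed as `webhook_extra` is an assumption, since this page does not describe the expected format of that field.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.smallest.ai/waves/v1/stt/"


def async_request_url(webhook_url: str, extra: Optional[str] = None,
                      method: str = "POST") -> str:
    """Build the Pulse Pro URL for async delivery via webhook."""
    params = {
        "model": "pulse-pro",   # webhook parameters are Pulse Pro only
        "language": "en",
        "webhook_url": webhook_url,
        "webhook_method": method,
    }
    if extra is not None:
        # Assumption: an opaque string that is echoed back to the webhook.
        params["webhook_extra"] = extra
    # urlencode percent-encodes reserved characters in each value.
    return f"{BASE_URL}?{urlencode(params)}"
```

The resulting URL can be passed to the same `requests.post(...)` call used for the sync flow, with the audio bytes as the body.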
`redact_pii` (enum, optional, defaults to `false`)
Redact personally identifiable information from the transcript. Names → `[FIRSTNAME_*]` / `[LASTNAME_*]`, phone numbers → `[PHONENUMBER_*]`, addresses → `[ADDRESS_*]`, etc. The redaction tokens use sequential indices so multiple occurrences of the same entity get distinct labels (`[FIRSTNAME_1]`, `[FIRSTNAME_2]`).
Allowed values: `true`, `false`
`redact_pci` (enum, optional, defaults to `false`)

Redact payment card information (credit-card numbers, CVV, account numbers, etc.). Replaces matches with `[ACCOUNTNUMBER_*]` tokens. Use alongside `redact_pii=true` for full PCI-compliant transcript handling.

Allowed values: `true`, `false`
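
Downstream code often needs to know what was redacted without seeing the original values. The sketch below counts distinct redaction tokens per type in a transcript. It assumes every token follows the `[TYPE_N]` shape shown in the examples above (uppercase type name, underscore, numeric index); `redaction_summary` is a hypothetical helper name.

```python
import re
from collections import Counter
from typing import Dict

# Assumption: tokens look like [FIRSTNAME_1], [PHONENUMBER_2], [ACCOUNTNUMBER_1].
TOKEN_RE = re.compile(r"\[([A-Z]+)_(\d+)\]")


def redaction_summary(transcript: str) -> Dict[str, int]:
    """Count distinct redaction tokens per type, ignoring repeated mentions."""
    distinct_tokens = set(TOKEN_RE.findall(transcript))
    return dict(Counter(kind for kind, _index in distinct_tokens))
```

A transcript that mentions `[FIRSTNAME_1]` twice and `[FIRSTNAME_2]` once is summarized as two distinct `FIRSTNAME` tokens.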
`emotion_detection` (enum, optional, defaults to `false`)

When true, the response adds an `emotions` object mapping detected emotion labels to confidence scores. Useful for voice-of-customer analytics on call recordings.

Allowed values: `true`, `false`
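
Since `emotions` maps labels to confidence scores, picking the dominant emotion is a one-line `max` over the mapping. The sketch below is illustrative only: `top_emotion` and `min_score` are hypothetical names, and the concrete labels and score range returned by the API are not documented on this page.

```python
from typing import Any, Dict, Optional, Tuple


def top_emotion(response: Dict[str, Any],
                min_score: float = 0.0) -> Optional[Tuple[str, float]]:
    """Return the highest-scoring (label, score) pair, or None if absent or too weak."""
    emotions = response.get("emotions") or {}
    if not emotions:
        return None  # emotion_detection was off, or nothing was detected
    label, score = max(emotions.items(), key=lambda item: item[1])
    return (label, score) if score >= min_score else None
```

Setting `min_score` lets an analytics pipeline ignore low-confidence detections instead of tagging every call.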
`gender_detection` (enum, optional, defaults to `false`)

When true, the response adds a `gender` field with the detected speaker gender label. Pulse pre-recorded only.

Allowed values: `true`, `false`

Request

This endpoint expects binary data of type `application/octet-stream`. With `?model=pulse`, it also accepts `application/json` with `{"url": "..."}` in the body.

Response

Transcription succeeded. The response body has two shapes:

- **Sync**: full `TranscriptionResponse` with `transcription`, `words`, `metadata`, etc. Returned when `webhook_url` is not set (all `?model=pulse` requests, and `?model=pulse-pro` requests without a webhook).
- **Async**: `{ "status": "processing", "request_id": "..." }`. Returned when `?model=pulse-pro` is paired with `webhook_url`. The full `TranscriptionResponse` then arrives on the webhook when ready.

`TranscriptionResponse` (object)
OR
`AsyncAccepted` (object)

Returned by Pulse Pro when `webhook_url` is set. The transcription arrives on the webhook when ready.
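
Because both shapes come back with status `200`, client code has to inspect the body to tell them apart. A minimal sketch, using only the fields named above (`status`, `request_id`, `transcription`); `classify_response` is a hypothetical helper name:

```python
from typing import Any, Dict


def classify_response(body: Dict[str, Any]) -> str:
    """Return "async" for an AsyncAccepted body, "sync" for a full transcription."""
    if body.get("status") == "processing" and "request_id" in body:
        return "async"  # the transcript will arrive on the webhook later
    if "transcription" in body:
        return "sync"   # the transcript is already in this response
    raise ValueError("unrecognized response shape")
```

In the async case, storing `request_id` alongside the job lets the webhook handler match the delivered transcription to the original upload.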

Errors

- `400`: Bad Request Error
- `401`: Unauthorized Error
- `403`: Forbidden Error
- `413`: Content Too Large Error
- `429`: Too Many Requests Error
- `503`: Service Unavailable Error
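
One way to handle these codes is to retry only the transient ones with exponential backoff. The sketch below treats `429` and `503` as retryable, which follows general HTTP convention rather than anything this page states, and all names (`RETRYABLE`, `backoff_delays`, `next_action`) are hypothetical.

```python
from typing import List

# Assumption: rate limiting and temporary unavailability are worth retrying;
# 400, 401, 403 and 413 will fail again until the request itself is changed.
RETRYABLE = {429, 503}


def backoff_delays(max_attempts: int = 4, base: float = 1.0,
                   cap: float = 30.0) -> List[float]:
    """Exponential backoff schedule in seconds: base, 2*base, 4*base, ..., capped."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_attempts)]


def next_action(status: int, attempt: int, max_attempts: int = 4) -> str:
    """Decide what to do after a response; `attempt` is zero-based."""
    if 200 <= status < 300:
        return "done"
    if status in RETRYABLE and attempt + 1 < max_attempts:
        return "retry"
    return "fail"
```

A caller would sleep for `backoff_delays()[attempt]` seconds whenever `next_action` returns `"retry"`, then resend the same request.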