Transcribe an audio file. The model is chosen via `?model=`:
- `?model=pulse-pro`: English-only, leaderboard-ranked accuracy. Raw bytes only; pass `webhook_url` to receive transcription asynchronously on long files.
- `?model=pulse`: multilingual transcription (17 streaming + 26 pre-recorded languages), supports both raw bytes and audio-by-URL.
## When to use this
Use this endpoint when you have a complete audio file (call recording, voicemail, podcast episode) and want the transcript back in one response. For live transcription as audio arrives, use the realtime WebSocket endpoint (`WS /waves/v1/stt/live`) instead.
Pulse Pro has no streaming worker today; calls to `WS /waves/v1/stt/live?model=pulse-pro` return `400` before the WebSocket upgrades.
## Input methods
- **Raw bytes**: `Content-Type: application/octet-stream` with the audio in the body. All knobs are query parameters.
- **URL (`?model=pulse` only)**: `Content-Type: application/json` with `{"url": "..."}` in the body.
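The two input methods differ only in the `Content-Type` header and the body. The sketch below builds both request shapes without sending anything. `build_stt_request` and `STT_ENDPOINT` are hypothetical names introduced for illustration and are not part of any SDK; placing the options in the query string for the URL flow follows the cURL examples on this page.

```python
import json
from urllib.parse import urlencode

# Endpoint taken from the cURL examples on this page.
STT_ENDPOINT = "https://api.smallest.ai/waves/v1/stt/"


def build_stt_request(api_key, model, language, audio=None, audio_url=None, **query):
    """Return (url, headers, body) for exactly one of the two input methods."""
    if (audio is None) == (audio_url is None):
        raise ValueError("pass exactly one of audio (bytes) or audio_url")
    if audio_url is not None and model != "pulse":
        raise ValueError("audio-by-URL is only available with model=pulse")

    # Options travel in the query string, as in the cURL examples.
    params = {"model": model, "language": language, **query}
    url = f"{STT_ENDPOINT}?{urlencode(params)}"
    headers = {"Authorization": f"Bearer {api_key}"}

    if audio is not None:
        # Raw bytes: the audio itself is the request body.
        headers["Content-Type"] = "application/octet-stream"
        body = audio
    else:
        # URL flow: a small JSON document pointing at the audio.
        headers["Content-Type"] = "application/json"
        body = json.dumps({"url": audio_url}).encode("utf-8")
    return url, headers, body
```

The returned triple can be handed to any HTTP client, for example `requests.post(url, headers=headers, data=body)`.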
## Examples
**cURL**: Pulse Pro, sync
```bash
curl -X POST "https://api.smallest.ai/waves/v1/stt/?model=pulse-pro&language=en&word_timestamps=true" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/octet-stream" \
  --data-binary "@./call.wav"
```
**cURL**: Pulse Pro, async via webhook
```bash
curl -X POST "https://api.smallest.ai/waves/v1/stt/?model=pulse-pro&language=en&webhook_url=https://your.app/cb" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/octet-stream" \
  --data-binary "@./call.wav"
```
Returns `200 { "status": "processing", "request_id": "..." }` immediately. The webhook receives the full transcription when ready.
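On the receiving side, a webhook endpoint only needs to accept the callback and keep the payload. The following is a minimal standard-library sketch, assuming the callback arrives as a `POST` (the default `webhook_method`) with a JSON object in the body. The body format is an assumption, and `parse_webhook_body`, `TranscriptionWebhook`, and `received` are hypothetical names; no payload fields are relied on.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

received = []  # parsed webhook payloads, oldest first


def parse_webhook_body(raw: bytes):
    """Decode a webhook body; return None unless it is a JSON object."""
    try:
        payload = json.loads(raw)
    except ValueError:  # covers malformed JSON and undecodable bytes
        return None
    return payload if isinstance(payload, dict) else None


class TranscriptionWebhook(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        payload = parse_webhook_body(self.rfile.read(length))
        if payload is None:
            self.send_response(400)  # reject bodies we cannot parse
        else:
            received.append(payload)
            self.send_response(204)  # acknowledge without a response body
        self.end_headers()


def make_server(port=8000):
    """Create (but do not start) a local server for the webhook handler."""
    return HTTPServer(("127.0.0.1", port), TranscriptionWebhook)
```

Calling `make_server().serve_forever()` starts the listener. A production receiver would also need TLS and some way to authenticate the caller, neither of which is covered here.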
**cURL**: Pulse, audio-by-URL
```bash
curl -X POST "https://api.smallest.ai/waves/v1/stt/?model=pulse&language=en" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://your-bucket.s3.amazonaws.com/call.wav"}'
```
**Python**
```python
import os

import requests

# Read the key from the environment, as the cURL examples do.
API_KEY = os.environ["SMALLEST_API_KEY"]

with open("./call.wav", "rb") as f:
    audio = f.read()

r = requests.post(
    "https://api.smallest.ai/waves/v1/stt/",
    params={"model": "pulse-pro", "language": "en", "word_timestamps": "true"},
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/octet-stream"},
    data=audio,
)
r.raise_for_status()  # surface 4xx/5xx responses as exceptions
print(r.json()["transcription"])
```
**JavaScript / TypeScript**
```typescript
import { readFileSync } from "node:fs";

const audio = readFileSync("./call.wav");
const params = new URLSearchParams({ model: "pulse-pro", language: "en", word_timestamps: "true" });

const res = await fetch(`https://api.smallest.ai/waves/v1/stt/?${params}`, {
  method: "POST",
  headers: { Authorization: `Bearer ${process.env.SMALLEST_API_KEY}`, "Content-Type": "application/octet-stream" },
  body: audio,
});
// Fail loudly on 4xx/5xx instead of trying to read a transcription from an error body.
if (!res.ok) throw new Error(`STT request failed: ${res.status}`);
console.log((await res.json()).transcription);
```
## Common gotchas
- **`model` is required.** Missing or invalid values return `400` with an enum-validation error.
- **Pulse Pro is English only.** Pass `language=en`. Other language codes are accepted at the wire level but produce unpredictable output.
- **Pulse Pro does not support audio-by-URL.** Send raw bytes or use `?model=pulse` for the URL flow.
- **Async (webhook) mode is Pulse Pro only.** Pulse runs sync only on this endpoint.
- **Max payload 250 MB.** Larger requests return `413`. If you are close to the limit, downmix to mono and resample to 16 kHz PCM; transcription quality is unaffected.
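The gotchas above can be checked on the client before any bytes are uploaded. This is a rough pre-flight sketch, not an official validator: `preflight` is a hypothetical helper, and reading "250 MB" as 250,000,000 bytes is a conservative assumption, since the page does not say whether the limit is decimal or binary.

```python
# Assumption: "250 MB" read as decimal megabytes (the more conservative reading).
MAX_PAYLOAD_BYTES = 250 * 1000 * 1000


def preflight(model, language, audio=None, audio_url=None, webhook_url=None):
    """Return a list of problems; an empty list means the request looks fine."""
    problems = []
    if model not in ("pulse", "pulse-pro"):
        problems.append("model must be 'pulse' or 'pulse-pro'")
    if model == "pulse-pro":
        if language != "en":
            problems.append("Pulse Pro is English only; pass language=en")
        if audio_url is not None:
            problems.append("Pulse Pro does not support audio-by-URL")
    if webhook_url is not None and model != "pulse-pro":
        problems.append("webhook (async) mode is Pulse Pro only")
    if audio is not None and len(audio) > MAX_PAYLOAD_BYTES:
        problems.append("payload exceeds 250 MB; the API would return 413")
    return problems
```

Returning a list rather than raising on the first issue lets a caller report every problem at once.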
## Authentication
**`Authorization`** (Bearer): header authentication of the form `Bearer <token>`.
## Query parameters
**`model`** (enum, required)
Selects which ASR model handles the request. Required; missing or invalid values return `400`.
- `pulse-pro`: English only, leaderboard-ranked accuracy, raw bytes only; supports async via `webhook_url`.
- `pulse`: multilingual (39 languages), raw bytes OR URL.
Allowed values: `pulse-pro`, `pulse`.
**`language`** (enum, required)
Language of the audio file. This endpoint is **Pre-Recorded (HTTP)** — for streaming, switch to `WSS /waves/v1/stt/live` (different supported language set).
**26 single-language codes:** `en`, `hi`, `de`, `es`, `ru`, `it`, `fr`, `nl`, `pt`, `uk`, `pl`, `cs`, `sk`, `lv`, `et`, `ro`, `fi`, `sv`, `bg`, `hu`, `da`, `lt`, `mt`, `zh`, `ja`, `ko`.
**Regional auto-detect aggregators** for unknown audio:
- `multi-eu` — auto-detects across all 21 European codes plus `en`.
- `multi-asian` — auto-detects across `zh`, `ko`, `ja`, `en`.
- **Pulse Pro**: pass `en`.
- **Pulse**: pass any of the single-language codes above, or use the `multi-eu` / `multi-asian` aggregator for unknown audio. See the [Pulse model card](/waves/model-cards/speech-to-text/pulse) for the full table with language names.
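As a sketch of how a client might choose the `language` value, the helper below maps the set of languages a recording could contain to a single code or an aggregator. `pick_language` is a hypothetical name, and treating the 21 codes other than `en`, `hi`, `zh`, `ja`, and `ko` as the "European" set is an inference from the list above, not something the page states outright.

```python
# Assumption: the 21 "European codes" are the listed codes minus en, hi, zh, ja, ko.
EUROPEAN = {
    "de", "es", "ru", "it", "fr", "nl", "pt", "uk", "pl", "cs", "sk",
    "lv", "et", "ro", "fi", "sv", "bg", "hu", "da", "lt", "mt",
}
SINGLE_CODES = EUROPEAN | {"en", "hi", "zh", "ja", "ko"}
AGGREGATORS = {
    "multi-eu": EUROPEAN | {"en"},
    "multi-asian": {"zh", "ko", "ja", "en"},
}


def pick_language(model, candidates):
    """Choose a `language` value for the languages the audio might contain."""
    candidates = set(candidates)
    if not candidates or candidates - SINGLE_CODES:
        raise ValueError(f"unsupported or empty language set: {sorted(candidates)}")
    if model == "pulse-pro":
        if candidates != {"en"}:
            raise ValueError("Pulse Pro is English only")
        return "en"
    if len(candidates) == 1:
        return next(iter(candidates))
    # Several possible languages: use the first aggregator that covers them all.
    for name, covered in AGGREGATORS.items():
        if candidates <= covered:
            return name
    raise ValueError("no single aggregator covers these languages")
```

For example, `pick_language("pulse", ["de", "fr"])` selects `multi-eu`, while a mix such as Hindi and German has no covering aggregator and raises.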
**`word_timestamps`** (boolean, optional, defaults to `false`)
Include per-word timestamps in the response. On Pulse Pro this costs roughly one-third of throughput.
**`diarize`** (boolean, optional, defaults to `false`)
Multi-speaker identification; adds per-word and per-utterance speaker labels.
**`webhook_url`** (string, optional, format: `uri`)
Pulse Pro only. If set, the response is `200` with `{"status": "processing", "request_id": "..."}` immediately, and the full transcription is delivered to this URL when ready. Use for long files where you do not want to hold an HTTP connection open.
**`webhook_method`** (enum, optional, defaults to `POST`)
HTTP method to use when calling the webhook. Pulse Pro only.
**`webhook_extra`** (string, optional)
Arbitrary metadata returned to the webhook in addition to the transcription payload. Pulse Pro only.
**`redact_pii`** (enum, optional, defaults to `false`)
Redact personally identifiable information from the transcript. Names → `[FIRSTNAME_*]` / `[LASTNAME_*]`, phone numbers → `[PHONENUMBER_*]`, addresses → `[ADDRESS_*]`, etc. The redaction tokens use sequential indices so multiple occurrences of the same entity get distinct labels (`[FIRSTNAME_1]`, `[FIRSTNAME_2]`).
**`redact_pci`** (enum, optional, defaults to `false`)
Redact payment card information (credit-card numbers, CVV, account numbers, etc.). Replaces matches with `[ACCOUNTNUMBER_*]` tokens. Use alongside `redact_pii=true` for full PCI-compliant transcript handling.
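Downstream code sometimes needs to know what was redacted, for instance to audit how many entities a transcript contained. The sketch below counts distinct redaction tokens per label. It assumes every token has the form `[LABEL_<index>]` with an uppercase label, which matches the examples above but is not a documented grammar; `count_redactions` is a hypothetical helper.

```python
import re
from collections import Counter

# Assumption: tokens look like [FIRSTNAME_1], i.e. uppercase label, underscore, index.
REDACTION_TOKEN = re.compile(r"\[([A-Z]+)_(\d+)\]")


def count_redactions(transcript):
    """Count distinct redaction tokens per label in a redacted transcript."""
    distinct = {(label, int(index)) for label, index in REDACTION_TOKEN.findall(transcript)}
    return Counter(label for label, _ in distinct)
```

A token that appears several times in the text is counted once, because the set collapses repeated `(label, index)` pairs.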
**`emotion_detection`** (enum, optional, defaults to `false`)
When `true`, the response adds an `emotions` object mapping detected emotion labels to confidence scores. Useful for voice-of-customer analytics on call recordings.
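Because `emotions` maps labels to confidence scores, picking the dominant emotion comes down to taking the maximum over that mapping. A minimal sketch, assuming the scores are numeric: `top_emotion` and the `min_confidence` threshold are hypothetical, and the label names used in any test data are placeholders, not the API's actual labels.

```python
def top_emotion(response, min_confidence=0.5):
    """Return the highest-scoring emotion label, or None if nothing clears the threshold."""
    emotions = response.get("emotions") or {}
    if not emotions:
        return None
    # Pick the (label, score) pair with the largest score.
    label, score = max(emotions.items(), key=lambda item: item[1])
    return label if score >= min_confidence else None
```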
**`gender_detection`** (enum, optional, defaults to `false`)
When `true`, the response adds a `gender` field with the detected speaker gender label. Pulse pre-recorded only.
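## Request
This endpoint expects binary data of type `application/octet-stream`. With `?model=pulse`, a JSON body of the form `{"url": "..."}` sent as `application/json` is also accepted.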
## Response
Transcription succeeded. The response body has two shapes:
- **Sync**: full `TranscriptionResponse` with `transcription`, `words`, `metadata`, etc. Returned when `webhook_url` is not set (all `?model=pulse` requests, and `?model=pulse-pro` requests without a webhook).
- **Async**: `{ "status": "processing", "request_id": "..." }`. Returned when `?model=pulse-pro` is paired with `webhook_url`. The full `TranscriptionResponse` then arrives on the webhook when ready.
Response schemas:
- `TranscriptionResponse` (object)
- `AsyncAccepted` (object): returned by Pulse Pro when `webhook_url` is set. The transcription arrives on the webhook when ready.
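Since a `200` can carry either shape, client code may want to branch on the body before reading `transcription`. One possible check, relying only on the fields named on this page (`status`, `request_id`, `transcription`); `classify_response` is a hypothetical helper.

```python
def classify_response(body):
    """Return "async" for an accepted webhook job, "sync" for a full transcription."""
    if body.get("status") == "processing" and "request_id" in body:
        return "async"  # the transcription will arrive on the webhook
    if "transcription" in body:
        return "sync"  # the transcription is in this response
    raise ValueError("unrecognized response body")
```

In the async case the caller would typically store `request_id` so the later webhook delivery can be matched to the original upload.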
## Errors
- `400`: Bad Request Error
- `401`: Unauthorized Error
- `403`: Forbidden Error
- `413`: Content Too Large Error
- `429`: Too Many Requests Error
- `503`: Service Unavailable Error