For AI agents: a documentation index is available at the root level at /llms.txt and /llms-full.txt. Append /llms.txt to any URL for a page-level index, or .md for the markdown version of any page.
DocumentationAPI ReferenceSelf HostModel CardsClient LibrariesIntegrationsDeveloper ToolsChangelog
DocumentationAPI ReferenceSelf HostModel CardsClient LibrariesIntegrationsDeveloper ToolsChangelog
  • API References
    • Authentication
    • Concurrency and Limits
    • WebSocket
  • Text to Speech
    • POSTSynthesize Speech
    • STREAMStream Speech (SSE)
    • WSSStream Speech (WebSocket)
    • POSTLightning v3.1 (endpoint will be deprecated)
    • POSTLightning v3.1 SSE (endpoint will be deprecated)
    • WSSLightning v3.1 WebSocket (endpoint will be deprecated)
    • POSTLightning v2 (Deprecated)
    • POSTLightning v2 SSE (Deprecated)
    • WSSLightning v2 WebSocket (Deprecated)
    • GETGet Voices
    • POSTCreate a Voice Clone
    • GETList Voice Clones
    • DELDelete a Voice Clone
    • POSTAdd Voice (Deprecated)
    • GETGet Cloned Voices (Deprecated)
    • GETGet Pronunciation Dictionaries
    • POSTCreate Pronunciation Dictionary
    • PUTUpdate Pronunciation Dictionary
    • DELDelete Pronunciation Dictionary
  • Speech to Text
    • POSTTranscribe (Pre-recorded)
    • WSSTranscribe (Realtime / WebSocket)
  • LLM (Chat Completions)
    • POSTElectron — Chat Completions
  • Speech to Speech
    • WSSHydra (Realtime / WebSocket)
LogoLogo
Voice AgentsModels
Voice AgentsModels
Speech to Text

Transcribe (Pre-recorded)

||View as Markdown|
POST
https://api.smallest.ai/waves/v1/stt/
POST
/waves/v1/stt/
1import requests
2
3url = "https://api.smallest.ai/waves/v1/stt/"
4
5querystring = {"model":"pulse-pro","language":"language"}
6
7headers = {
8 "Authorization": "Bearer <BearerAuth>",
9 "Content-Type": "application/octet-stream"
10}
11
12response = requests.post(url, headers=headers, params=querystring)
13
14print(response.json())
1{
2 "status": "success",
3 "transcription": "Hi, how are you doing? Could you help me reschedule my appointment?",
4 "words": [
5 {
6 "word": "Hi",
7 "start": 0.32,
8 "end": 0.4,
9 "confidence": 0.96
10 },
11 {
12 "word": "how",
13 "start": 0.48,
14 "end": 0.56,
15 "confidence": 0.93
16 }
17 ],
18 "language": "en",
19 "metadata": {
20 "duration": 5.6,
21 "processing_time_ms": 240.51,
22 "rtfx": 23.3,
23 "num_chunks": 1
24 },
25 "request_id": "87dd36c1-4267-472d-96ee-4113e0a770a6"
26}
Transcribe an audio file. The model is chosen via `?model=`: - `?model=pulse-pro`: English-only, leaderboard-ranked accuracy. Raw bytes only; pass `webhook_url` to receive transcription asynchronously on long files. - `?model=pulse`: multilingual transcription (38 languages), supports both raw bytes and audio-by-URL. ## When to use this Use this endpoint when you have a complete audio file (call recording, voicemail, podcast episode) and want the transcript back in one response. For live transcription as audio arrives, use the realtime WebSocket endpoint (`WS /waves/v1/stt/live`) instead. Pulse Pro has no streaming worker today; calls to `WS /waves/v1/stt/live?model=pulse-pro` return `400` before the WebSocket upgrades. ## Input methods - **Raw bytes**: `Content-Type: application/octet-stream` with the audio in the body. All knobs are query parameters. - **URL (`?model=pulse` only)**: `Content-Type: application/json` with `{"url": "..."}` in the body. ## Examples **cURL**: Pulse Pro, sync ```bash curl -X POST "https://api.smallest.ai/waves/v1/stt/?model=pulse-pro&language=en&word_timestamps=true" \ -H "Authorization: Bearer $SMALLEST_API_KEY" \ -H "Content-Type: application/octet-stream" \ --data-binary "@./call.wav" ``` **cURL**: Pulse Pro, async via webhook ```bash curl -X POST "https://api.smallest.ai/waves/v1/stt/?model=pulse-pro&language=en&webhook_url=https://your.app/cb" \ -H "Authorization: Bearer $SMALLEST_API_KEY" \ -H "Content-Type: application/octet-stream" \ --data-binary "@./call.wav" ``` Returns `200 { "status": "processing", "request_id": "..." }` immediately. The webhook receives the full transcription when ready. **cURL**: Pulse, audio-by-URL ```bash curl -X POST "https://api.smallest.ai/waves/v1/stt/?model=pulse&language=en" \ -H "Authorization: Bearer $SMALLEST_API_KEY" \ -H "Content-Type: application/json" \ -d '{"url": "https://your-bucket.s3.amazonaws.com/call.wav"}' ``` **Python** ```python import requests with open("./call.wav", "rb") as f: audio = f.read() r = requests.post( "https://api.smallest.ai/waves/v1/stt/", params={"model": "pulse-pro", "language": "en", "word_timestamps": "true"}, headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/octet-stream"}, data=audio, ) r.raise_for_status() print(r.json()["transcription"]) ``` **JavaScript / TypeScript** ```typescript import { readFileSync } from "node:fs"; const audio = readFileSync("./call.wav"); const params = new URLSearchParams({ model: "pulse-pro", language: "en", word_timestamps: "true" }); const res = await fetch(`https://api.smallest.ai/waves/v1/stt/?${params}`, { method: "POST", headers: { Authorization: `Bearer ${process.env.SMALLEST_API_KEY}`, "Content-Type": "application/octet-stream" }, body: audio, }); console.log((await res.json()).transcription); ``` ## Common gotchas - **`model` is required.** Missing or invalid values return `400` with an enum-validation error. - **Pulse Pro is English only.** Pass `language=en`. Other language codes are accepted at the wire level but produce unpredictable output. - **Pulse Pro does not support audio-by-URL.** Send raw bytes or use `?model=pulse` for the URL flow. - **Async (webhook) mode is Pulse Pro only.** Pulse runs sync only on this endpoint. - **Max payload 250 MB.** Larger requests return `413`. Compress to mono 16 kHz PCM if you are close to the limit; quality is unaffected.
Was this page helpful?
Previous

Delete Pronunciation Dictionary

Next

Transcribe (Realtime / WebSocket)

Built with

Transcribe an audio file. The model is chosen via ?model=:

  • ?model=pulse-pro: English-only, leaderboard-ranked accuracy. Raw bytes only; pass webhook_url to receive transcription asynchronously on long files.
  • ?model=pulse: multilingual transcription (38 languages), supports both raw bytes and audio-by-URL.

When to use this

Use this endpoint when you have a complete audio file (call recording, voicemail, podcast episode) and want the transcript back in one response. For live transcription as audio arrives, use the realtime WebSocket endpoint (WS /waves/v1/stt/live) instead.

Pulse Pro has no streaming worker today; calls to WS /waves/v1/stt/live?model=pulse-pro return 400 before the WebSocket upgrades.

Input methods

  • Raw bytes: Content-Type: application/octet-stream with the audio in the body. All knobs are query parameters.
  • URL (?model=pulse only): Content-Type: application/json with {"url": "..."} in the body.

Examples

cURL: Pulse Pro, sync

$curl -X POST "https://api.smallest.ai/waves/v1/stt/?model=pulse-pro&language=en&word_timestamps=true" \
> -H "Authorization: Bearer $SMALLEST_API_KEY" \
> -H "Content-Type: application/octet-stream" \
> --data-binary "@./call.wav"

cURL: Pulse Pro, async via webhook

$curl -X POST "https://api.smallest.ai/waves/v1/stt/?model=pulse-pro&language=en&webhook_url=https://your.app/cb" \
> -H "Authorization: Bearer $SMALLEST_API_KEY" \
> -H "Content-Type: application/octet-stream" \
> --data-binary "@./call.wav"

Returns 200 { "status": "processing", "request_id": "..." } immediately. The webhook receives the full transcription when ready.

cURL: Pulse, audio-by-URL

$curl -X POST "https://api.smallest.ai/waves/v1/stt/?model=pulse&language=en" \
> -H "Authorization: Bearer $SMALLEST_API_KEY" \
> -H "Content-Type: application/json" \
> -d '{"url": "https://your-bucket.s3.amazonaws.com/call.wav"}'

Python

1import requests
2
3with open("./call.wav", "rb") as f:
4 audio = f.read()
5
6r = requests.post(
7 "https://api.smallest.ai/waves/v1/stt/",
8 params={"model": "pulse-pro", "language": "en", "word_timestamps": "true"},
9 headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/octet-stream"},
10 data=audio,
11)
12r.raise_for_status()
13print(r.json()["transcription"])

JavaScript / TypeScript

1import { readFileSync } from "node:fs";
2
3const audio = readFileSync("./call.wav");
4const params = new URLSearchParams({ model: "pulse-pro", language: "en", word_timestamps: "true" });
5
6const res = await fetch(`https://api.smallest.ai/waves/v1/stt/?${params}`, {
7 method: "POST",
8 headers: { Authorization: `Bearer ${process.env.SMALLEST_API_KEY}`, "Content-Type": "application/octet-stream" },
9 body: audio,
10});
11console.log((await res.json()).transcription);

Common gotchas

  • model is required. Missing or invalid values return 400 with an enum-validation error.
  • Pulse Pro is English only. Pass language=en. Other language codes are accepted at the wire level but produce unpredictable output.
  • Pulse Pro does not support audio-by-URL. Send raw bytes or use ?model=pulse for the URL flow.
  • Async (webhook) mode is Pulse Pro only. Pulse runs sync only on this endpoint.
  • Max payload 250 MB. Larger requests return 413. Compress to mono 16 kHz PCM if you are close to the limit; quality is unaffected.

Authentication

AuthorizationBearer

Header authentication of the form Bearer <token>

Query parameters

modelenumRequired
Selects which ASR model handles the request. Required; missing or invalid values return `400`. - `pulse-pro`: English only, leaderboard-ranked accuracy, raw bytes only; supports async via `webhook_url`. - `pulse`: multilingual (38 languages), raw bytes OR URL.
Allowed values:
languagestringRequired

Language of the audio file. For Pulse Pro pass en. For Pulse, see the Pulse model card for the full 38-language list, plus the multi, multi-eu, multi-indic, and multi-asian aggregators.

word_timestampsbooleanOptionalDefaults to false

Include per-word timestamps in the response. On Pulse Pro this costs roughly one-third of throughput.

diarizebooleanOptionalDefaults to false

Multi-speaker identification; adds per-word and per-utterance speaker labels.

webhook_urlstringOptionalformat: "uri"

Pulse Pro only. If set, the response is 200 with {"status": "processing", "request_id": "..."} immediately, and the full transcription is delivered to this URL when ready. Use for long files where you do not want to hold an HTTP connection open.

webhook_methodenumOptionalDefaults to POST
HTTP method to use when calling the webhook. Pulse Pro only.
Allowed values:
webhook_extrastringOptional
Arbitrary metadata returned to the webhook in addition to the transcription payload. Pulse Pro only.

Request

This endpoint expects binary data of type application/octet-stream.

Response

Transcription succeeded. The response body has two shapes: - **Sync**: full `TranscriptionResponse` with `transcription`, `words`, `metadata`, etc. Returned when `webhook_url` is not set (all `?model=pulse` requests, and `?model=pulse-pro` requests without a webhook). - **Async**: `{ "status": "processing", "request_id": "..." }`. Returned when `?model=pulse-pro` is paired with `webhook_url`. The full `TranscriptionResponse` then arrives on the webhook when ready.
TranscriptionResponseobject
OR
AsyncAcceptedobject

Returned by Pulse Pro when webhook_url is set. The transcription arrives on the webhook when ready.

Errors

400
Bad Request Error
401
Unauthorized Error
403
Forbidden Error
413
Content Too Large Error
429
Too Many Requests Error
503
Service Unavailable Error

Selects which ASR model handles the request. Required; missing or invalid values return 400.

  • pulse-pro: English only, leaderboard-ranked accuracy, raw bytes only; supports async via webhook_url.
  • pulse: multilingual (38 languages), raw bytes OR URL.

Transcription succeeded. The response body has two shapes:

  • Sync: full TranscriptionResponse with transcription, words, metadata, etc. Returned when webhook_url is not set (all ?model=pulse requests, and ?model=pulse-pro requests without a webhook).
  • Async: { "status": "processing", "request_id": "..." }. Returned when ?model=pulse-pro is paired with webhook_url. The full TranscriptionResponse then arrives on the webhook when ready.