Transcribe an audio file to text using the Pulse model. The fastest way to get a transcript when you already have a recording — pass either the raw bytes or a URL.
## When to use this
Use this endpoint when you have a complete audio file (call recording, voicemail, podcast episode) and want the transcript back in one response. For live transcription as audio arrives, use the realtime WebSocket endpoint (`WSS /waves/v1/pulse/get_text`) instead.
## Input methods
Send the audio in one of two ways:
1. **Raw bytes** — `Content-Type: application/octet-stream` with the audio in the body. All knobs (`language`, `word_timestamps`, etc.) are query parameters.
2. **URL** — `Content-Type: application/json` with `{"url": "..."}` in the body. Useful when the audio already lives in object storage. Same query parameters apply.
Pulse autodetects the language across 30+ supported locales. Pass `language` explicitly when you already know it — detection is fast but skipping it is faster.
## Examples
**cURL** (raw bytes)
```bash
curl -X POST "https://api.smallest.ai/waves/v1/pulse/get_text?language=en&word_timestamps=true" \
-H "Authorization: Bearer $SMALLEST_API_KEY" \
-H "Content-Type: application/octet-stream" \
--data-binary "@./call.wav"
```
**cURL** (URL)
```bash
curl -X POST "https://api.smallest.ai/waves/v1/pulse/get_text?language=en" \
-H "Authorization: Bearer $SMALLEST_API_KEY" \
-H "Content-Type: application/json" \
-d '{"url": "https://your-bucket.s3.amazonaws.com/call.wav"}'
```
**Python** (`pip install smallestai>=4.4.0`)
```python
from smallestai import SmallestAI
client = SmallestAI(token="YOUR_API_KEY")
with open("./call.wav", "rb") as f:
    result = client.waves.transcribe_pulse(
        request=f.read(),
        language="en",
        word_timestamps=True,
        diarize=True,
    )
print(result.status) # "success"
print(result.transcription) # the transcript string
```
**JavaScript / TypeScript** (using `fetch`)
```typescript
import { readFileSync } from "node:fs";
const audio = readFileSync("./call.wav");
const params = new URLSearchParams({ language: "en", word_timestamps: "true", diarize: "true" });
const res = await fetch(`https://api.smallest.ai/waves/v1/pulse/get_text?${params}`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.SMALLEST_API_KEY}`,
    "Content-Type": "application/octet-stream",
  },
  body: audio,
});
const result = await res.json();
console.log(result.transcription);
```
## Common gotchas
- **Max file size is 25 MB.** Larger files return HTTP `413`. Convert to mono 16 kHz PCM if you're close to the limit; quality is unaffected.
- **Formatting flags (`format`, `punctuate`, `capitalize`)** are accepted at the wire level and exposed in the Python SDK as of `smallestai>=4.4.0`. Today they return the same transcript regardless of value — wire them up now so your integration keeps working when the behavior ships.
- **Webhook-driven flow**: pass `webhook_url` to receive the transcript asynchronously. The endpoint returns immediately; the transcript hits your webhook when ready. Useful for long files where you don't want to hold an HTTP connection open.
- **Speaker diarization** (`diarize=true`) adds latency. Skip it if you only need the words.
- **JavaScript / TypeScript**: the official `smallestai` npm package predates the Pulse model, so call this endpoint with `fetch` or `axios` as shown above.
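The 25 MB cap implies a hard duration ceiling for uncompressed uploads. A quick sketch of the arithmetic for mono 16 kHz 16-bit PCM (the conversion target suggested above):

```python
# Rough duration budget under the 25 MB upload limit, assuming
# mono 16 kHz 16-bit PCM (2 bytes per sample, 1 channel).
MAX_BYTES = 25 * 1024 * 1024
BYTES_PER_SECOND = 16_000 * 2  # sample_rate * bytes_per_sample * channels

def max_pcm_seconds(limit_bytes: int = MAX_BYTES) -> float:
    """Longest mono 16 kHz 16-bit PCM clip that fits under the limit."""
    return limit_bytes / BYTES_PER_SECOND

print(f"{max_pcm_seconds():.0f} s (~{max_pcm_seconds() / 60:.1f} min)")
# prints: 819 s (~13.7 min)
```

Anything longer than roughly 13–14 minutes of raw PCM needs a compressed codec (e.g. Opus) or the webhook flow with a URL upload.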
## Authentication

**`Authorization`** (Bearer)
Header authentication of the form `Bearer <token>`.
## Query parameters

**`language`** (enum, optional, defaults to `multi-eu`)
Language of the audio file. Set explicitly to the known language for best accuracy.
Auto-detection scopes:
- `multi-eu` (default) — European set: de, en, fr, it, nl, pt, ru, es.
- `multi-indic` — Indic set: en, hi, mr, pa, gu, or, ka, ta, te, ml, bn.
- `multi-asian` — East Asian set: en, ja, ko, zh, yue.
- `multi` — full multilingual auto-detection across all supported languages.
Omitting `language` routes to `multi-eu`, which can mis-detect on non-European audio. Always pass `language` explicitly when the source language is known, or pick the regional `multi-*` scope that matches your audio.
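The scope tables above can be encoded as a small lookup. A sketch of picking the narrowest `multi-*` scope for the set of languages you expect to hear — `narrowest_scope` is an illustrative helper, not part of the SDK:

```python
# Auto-detection scopes as documented above; "multi" covers everything.
SCOPES = {
    "multi-asian": {"en", "ja", "ko", "zh", "yue"},
    "multi-eu": {"de", "en", "fr", "it", "nl", "pt", "ru", "es"},
    "multi-indic": {"en", "hi", "mr", "pa", "gu", "or", "ka", "ta", "te", "ml", "bn"},
}

def narrowest_scope(expected: set[str]) -> str:
    """Smallest multi-* scope covering every language you expect to hear."""
    for scope in ("multi-asian", "multi-eu", "multi-indic"):  # smallest first
        if expected <= SCOPES[scope]:
            return scope
    return "multi"  # fall back to full auto-detection

print(narrowest_scope({"de", "fr"}))  # multi-eu
print(narrowest_scope({"hi", "ja"}))  # multi
```

When you know the single source language, skip the scopes entirely and pass that language code directly, as the docs recommend.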
**`encoding`** (enum, optional)
Audio encoding of the bytes you upload. Mirrors the `encoding`
parameter on the realtime WS endpoint.
- `linear16`, `linear32` — raw PCM (16-bit and 32-bit)
- `alaw`, `mulaw` — 8 kHz telephony codecs
- `opus`, `ogg_opus` — Opus compressed audio (raw and Ogg container)
When omitted, the server detects the format from the file's
container header (works for `.wav`, `.mp3`, `.flac`, `.ogg`,
`.m4a`, `.webm`).
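If you'd rather not rely on container detection, you can strip a WAV file down to its raw PCM frames yourself and pass `encoding=linear16` explicitly. A minimal sketch using only the standard library (`wav_to_linear16` is a hypothetical helper, not an SDK function):

```python
import wave

def wav_to_linear16(path: str) -> bytes:
    """Extract raw 16-bit PCM frames from a WAV file (drops the container header)."""
    with wave.open(path, "rb") as w:
        assert w.getsampwidth() == 2, "expected 16-bit samples for linear16"
        return w.readframes(w.getnframes())

# pcm = wav_to_linear16("./call.wav")
# POST pcm with Content-Type: application/octet-stream and ?encoding=linear16
```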
**`webhook_url`** (string, optional, format: `uri`)
**`webhook_extra`** (string, optional)
**`word_timestamps`** (boolean, optional, defaults to `false`)
Whether to include word- and utterance-level timestamps in the response.
**`diarize`** (boolean, optional, defaults to `false`)
Whether to perform speaker diarization.
**`gender_detection`** (enum, optional, defaults to `false`)
Whether to predict the gender of the speaker.
**`emotion_detection`** (enum, optional, defaults to `false`)
Whether to predict speaker emotions.
**`format`** (enum, optional, defaults to `true`)
Master formatting switch for the transcript. When `false`, forces `punctuate=false`, `capitalize=false`, and also disables Inverse Text Normalization (ITN) so it cannot silently reintroduce punctuation or casing.
When `true`, the `punctuate` and `capitalize` params take effect independently. Leave `format=true` and use those two to fine-tune.
**`punctuate`** (enum, optional, defaults to `true`)
When `false`, strips end-of-sentence punctuation (`.`, `,`, `?`, `!`) from `transcript`, `words[].word`, and `utterances[].transcript`. Does not affect casing — use `capitalize` for that. Overridden to `false` when `format=false`.
**`capitalize`** (enum, optional, defaults to `true`)
When `false`, lowercases the entire transcript output (`transcript`, `words[].word`, and `utterances[].transcript`). Does not affect punctuation — use `punctuate` for that. Overridden to `false` when `format=false`.
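Since these flags currently return the same transcript regardless of value (see the gotchas above), a client-side sketch of the documented semantics can stand in until the server-side behavior lands — `apply_formatting` is a hypothetical helper, not an SDK function:

```python
def apply_formatting(text: str, punctuate: bool = True, capitalize: bool = True,
                     format: bool = True) -> str:
    """Emulate the documented flag semantics on a transcript string."""
    if not format:             # master switch forces both off
        punctuate = capitalize = False
    if not punctuate:          # strip end-of-sentence punctuation: . , ? !
        text = text.translate(str.maketrans("", "", ".,?!"))
    if not capitalize:         # lowercase the whole transcript
        text = text.lower()
    return text

print(apply_formatting("Hello, world! How are you?", format=False))
# prints: hello world how are you
```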
## Request

This endpoint expects binary data of type `application/octet-stream`.
## Response

Speech transcribed successfully.

**`status`** (string)
Status of the transcription request.
**`transcription`** (string)
The transcribed text from the audio file.
**`audio_length`** (double)
Duration of the audio file in seconds.
**`words`** (list of objects)
Word-level timestamps in seconds.
**`utterances`** (list of objects)
List of utterances with start and end times.
**`gender`** (enum)
Predicted gender of the speaker, if requested.
**`emotions`** (object)
Predicted emotions of the speaker, if requested.
**`metadata`** (object)
Metadata about the transcription.
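The `words` and `utterances` fields are lists of timestamped objects. A sketch of rendering utterances as timed captions, assuming each object carries the `start`, `end`, and `transcript` fields named in the parameter docs above (the sample values and numeric formatting are illustrative — check a live response for the exact shape):

```python
def to_captions(utterances: list[dict]) -> str:
    """Render utterances[].{start,end,transcript} as simple timed caption lines."""
    return "\n".join(
        f"[{u['start']:6.2f} -> {u['end']:6.2f}] {u['transcript']}"
        for u in utterances
    )

# Hypothetical sample shaped like the documented fields:
sample = [
    {"start": 0.0, "end": 2.4, "transcript": "Hi, thanks for calling."},
    {"start": 2.6, "end": 4.1, "transcript": "How can I help?"},
]
print(to_captions(sample))
```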
## Errors

- `400` Bad Request Error
- `401` Unauthorized Error
- `413` Content Too Large Error
- `429` Too Many Requests Error
- `500` Internal Server Error
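`429` responses are worth retrying with backoff rather than failing outright. A sketch of generating jittered exponential delays — the retry loop and `post_audio` helper in the comment are hypothetical, not part of the SDK:

```python
import random

def backoff_delays(retries: int = 5, base: float = 0.5, cap: float = 30.0):
    """Yield exponential backoff delays with jitter, for retrying HTTP 429."""
    for attempt in range(retries):
        yield min(cap, base * 2 ** attempt) * random.uniform(0.5, 1.0)

# Sketch of the retry loop (post_audio is a hypothetical request helper):
# import time
# for delay in backoff_delays():
#     resp = post_audio()
#     if resp.status_code != 429:
#         break
#     time.sleep(delay)
```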