# Pulse (Pre-Recorded)

Transcribe an audio file to text using the Pulse model. The fastest way to get a transcript when you already have a recording — pass either the raw bytes or a URL.

## When to use this

Use this endpoint when you have a complete audio file (call recording, voicemail, podcast episode) and want the transcript back in one response. For live transcription as audio arrives, use the realtime WebSocket endpoint (`WSS /waves/v1/pulse/get_text`) instead.

## Input methods

Send the audio in one of two ways:

1. **Raw bytes** — `Content-Type: application/octet-stream` with the audio in the body. All knobs (`language`, `word_timestamps`, etc.) are query parameters.
2. **URL** — `Content-Type: application/json` with `{"url": "..."}` in the body. Useful when the audio already lives in object storage. The same query parameters apply.

Pulse autodetects the language across 30+ supported locales. Pass `language` explicitly when you already know it — detection is fast, but skipping it is faster.

## Examples

**cURL** (raw bytes)

```bash
curl -X POST "https://api.smallest.ai/waves/v1/pulse/get_text?language=en&word_timestamps=true" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/octet-stream" \
  --data-binary "@./call.wav"
```

**cURL** (URL)

```bash
curl -X POST "https://api.smallest.ai/waves/v1/pulse/get_text?language=en" \
  -H "Authorization: Bearer $SMALLEST_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://your-bucket.s3.amazonaws.com/call.wav"}'
```

**Python** (`pip install smallestai>=4.4.0`)

```python
from smallestai import SmallestAI

client = SmallestAI(token="YOUR_API_KEY")

with open("./call.wav", "rb") as f:
    result = client.waves.transcribe_pulse(
        request=f.read(),
        language="en",
        word_timestamps=True,
        diarize=True,
    )

print(result.status)         # "success"
print(result.transcription)  # the transcript string
```

**JavaScript / TypeScript** (using `fetch`)

```typescript
import { readFileSync } from "node:fs";

const audio = readFileSync("./call.wav");
const params = new URLSearchParams({ language: "en", word_timestamps: "true", diarize: "true" });

const res = await fetch(`https://api.smallest.ai/waves/v1/pulse/get_text?${params}`, {
  method: "POST",
  headers: {
    Authorization: `Bearer ${process.env.SMALLEST_API_KEY}`,
    "Content-Type": "application/octet-stream",
  },
  body: audio,
});

const result = await res.json();
console.log(result.transcription);
```

## Common gotchas

- **Max file size is 25 MB.** Larger files return HTTP `413`. Compress to mono 16 kHz PCM if you're close to the limit; quality is unaffected.
- **Formatting flags (`format`, `punctuate`, `capitalize`)** are accepted at the wire level and exposed in the Python SDK as of `smallestai>=4.4.0`. They currently return the same transcript regardless of value; pass them in your integration now so it picks up the intended behavior when it lands.
- **Webhook-driven flow**: pass `webhook_url` to receive the transcript asynchronously. The endpoint returns immediately; the transcript hits your webhook when ready. Useful for long files where you don't want to hold an HTTP connection open (see the sketch after this list).
- **Speaker diarization** (`diarize=true`) adds latency. Skip it if you only need the words.
- **JavaScript / TypeScript**: the official `smallestai` npm package predates the Pulse model, so call this endpoint with `fetch` or `axios` as shown above.
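For the webhook flow mentioned in the gotchas, the request is the same as the raw-bytes example; you only add `webhook_url` (and optionally `webhook_extra`) as query parameters and stop waiting for the transcript in the HTTP response. A minimal sketch using `requests`; the callback URL and the `webhook_extra` value are placeholders, and the exact payload delivered to your webhook should be checked against the Response schema below.

```python
import os

import requests

# Async transcription: the transcript is POSTed to webhook_url when ready,
# so the immediate HTTP response here does not contain it.
params = {
    "language": "en",
    "diarize": "true",
    "webhook_url": "https://example.com/pulse-callback",  # hypothetical receiver endpoint
    "webhook_extra": "ticket-4821",                       # assumed to be passed through to the webhook
}

with open("./call.wav", "rb") as f:
    res = requests.post(
        "https://api.smallest.ai/waves/v1/pulse/get_text",
        params=params,
        headers={
            "Authorization": f"Bearer {os.environ['SMALLEST_API_KEY']}",
            "Content-Type": "application/octet-stream",
        },
        data=f.read(),
    )

res.raise_for_status()
print(res.json())  # returns immediately; the transcript arrives at the webhook later
```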

## Authentication

`Authorization` (Bearer)

Header authentication of the form `Bearer <token>`

## Query parameters

`language` (enum, optional, defaults to `multi-eu`)

Language of the audio file. Set explicitly to the known language for best accuracy. Auto-detection scopes:

- `multi-eu` (default) — European set: de, en, fr, it, nl, pt, ru, es.
- `multi-indic` — Indic set: en, hi, mr, pa, gu, or, ka, ta, te, ml, bn.
- `multi-asian` — East Asian set: en, ja, ko, zh, yue.
- `multi` — full multilingual auto-detection across all supported languages.

Omitting `language` routes to `multi-eu`, which can mis-detect on non-European audio. Always pass `language` explicitly when the source language is known, or pick the regional `multi-*` scope that matches your audio (see the sketch after this parameter list).
`encoding` (enum, optional)

Audio encoding of the bytes you upload. Mirrors the `encoding` parameter on the realtime WS endpoint.

- `linear16`, `linear32` — raw PCM (16-bit and 32-bit)
- `alaw`, `mulaw` — 8 kHz telephony codecs
- `opus`, `ogg_opus` — Opus compressed audio (raw and Ogg container)

When omitted, the server detects the format from the file's container header (works for `.wav`, `.mp3`, `.flac`, `.ogg`, `.m4a`, `.webm`).
`webhook_url` (string, optional, format: `uri`)

URL that receives the transcript asynchronously when it is ready; the endpoint itself returns immediately.
`webhook_extra` (string, optional)
`word_timestamps` (boolean, optional, defaults to `false`)
Whether to include word and utterance level timestamps in the response
`diarize` (boolean, optional, defaults to `false`)
Whether to perform speaker diarization
`gender_detection` (enum, optional, defaults to `false`)
Whether to predict the gender of the speaker
`emotion_detection` (enum, optional, defaults to `false`)
Whether to predict speaker emotions
`format` (enum, optional, defaults to `true`)
Master formatting switch for the transcript. When `false`, forces `punctuate=false`, `capitalize=false`, and also disables Inverse Text Normalization (ITN) so it cannot silently reintroduce punctuation or casing. When `true`, the `punctuate` and `capitalize` params take effect independently. Leave `format=true` and use those two to fine-tune.
`punctuate` (enum, optional, defaults to `true`)

When `false`, strips punctuation marks (`.`, `,`, `?`, `!`) from the transcript, `words[].word`, and `utterances[].transcript`. Does not affect casing — use `capitalize` for that. Overridden to `false` when `format=false`.

`capitalize` (enum, optional, defaults to `true`)

When `false`, lowercases the entire transcript output (the transcript, `words[].word`, and `utterances[].transcript`). Does not affect punctuation — use `punctuate` for that. Overridden to `false` when `format=false`.

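Since all of the above are plain query parameters, they compose in a single request. A minimal sketch, assuming an Indic-language recording captured as raw 8 kHz mu-law telephony audio; the file path, scope choice, and flag values are illustrative, not requirements of the API.

```python
import os

import requests

# Detection scope, encoding, and formatting controls are all query parameters.
params = {
    "language": "multi-indic",   # scope auto-detection to the Indic language set
    "encoding": "mulaw",         # raw 8 kHz mu-law bytes, no container header to sniff
    "word_timestamps": "true",
    "format": "true",            # leave the master switch on...
    "punctuate": "true",
    "capitalize": "false",       # ...and fine-tune: keep punctuation, lowercase the text
}

with open("./call.raw", "rb") as f:  # hypothetical raw telephony capture
    res = requests.post(
        "https://api.smallest.ai/waves/v1/pulse/get_text",
        params=params,
        headers={
            "Authorization": f"Bearer {os.environ['SMALLEST_API_KEY']}",
            "Content-Type": "application/octet-stream",
        },
        data=f.read(),
    )

res.raise_for_status()
print(res.json()["transcription"])
```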

## Request

This endpoint expects binary data of type `application/octet-stream`; for the JSON `{"url": "..."}` alternative, see Input methods above.

## Response

Speech transcribed successfully
`status` (string)
Status of the transcription request
`transcription` (string)
The transcribed text from the audio file
`audio_length` (double)
Duration of the audio file in seconds
`words` (list of objects)

Word-level timestamps in seconds (see the parsing sketch after this list).

`utterances` (list of objects)
List of utterances with start and end times
`gender` (enum)
Predicted gender of the speaker if requested
`emotions` (object)
Predicted emotions of the speaker if requested
`metadata` (object)
Metadata about the transcription
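A small sketch of reading these fields from a decoded response body. The top-level names match the list above; the nested keys inside `words` and `utterances` (`word`, `start`, `end`, `transcript`) are inferred from the descriptions and should be verified against a real response.

```python
def summarize(result: dict) -> None:
    """Print the documented fields of a Pulse response body (already-decoded JSON)."""
    print(result["status"])                   # status of the transcription request
    print(result["audio_length"], "seconds")  # duration of the audio file
    print(result["transcription"])            # the transcript string

    # Word-level timestamps, present when word_timestamps=true.
    # Nested keys here are assumptions, not confirmed field names.
    for w in result.get("words", []):
        print(w.get("word"), w.get("start"), w.get("end"))

    # Utterances with start and end times.
    for u in result.get("utterances", []):
        print(u.get("start"), u.get("end"), u.get("transcript"))

    # Optional fields, returned only when the corresponding flag was set.
    print(result.get("gender"))     # gender_detection
    print(result.get("emotions"))   # emotion_detection
    print(result.get("metadata"))
```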

## Errors

- `400` Bad Request Error
- `401` Unauthorized Error
- `413` Content Too Large Error
- `429` Too Many Requests Error
- `500` Internal Server Error
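Of these, `429` is the one worth retrying with backoff and `413` the one worth catching explicitly so you can compress or split the file before trying again. A defensive sketch around the raw-bytes request; the retry count and sleep times are arbitrary choices, not documented limits.

```python
import os
import time

import requests

URL = "https://api.smallest.ai/waves/v1/pulse/get_text"
HEADERS = {
    "Authorization": f"Bearer {os.environ['SMALLEST_API_KEY']}",
    "Content-Type": "application/octet-stream",
}


def transcribe(audio: bytes, **params) -> dict:
    """POST audio bytes, retrying on 429 and surfacing the other documented errors."""
    for attempt in range(5):
        res = requests.post(URL, params=params, headers=HEADERS, data=audio)
        if res.status_code == 429:   # Too Many Requests: back off and retry
            time.sleep(2 ** attempt)
            continue
        if res.status_code == 413:   # Content Too Large: the 25 MB limit was exceeded
            raise ValueError("Audio exceeds the 25 MB limit; compress or split it first")
        res.raise_for_status()       # raises for 400, 401, 500
        return res.json()
    raise RuntimeError("Still rate-limited after 5 attempts")
```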