
Language detection


Enabling language detection

Set the language query parameter to multi when calling the API. Pulse then auto-detects the spoken language from among 30+ supported languages, each identified by its ISO 639-1 code.

View the full list of supported languages.

Pre-Recorded API

# Download sample audio (or use your own file)
curl -sL -o audio.wav "https://github.com/smallest-inc/cookbook/raw/main/speech-to-text/getting-started/samples/audio.wav"

curl --request POST \
  --url "https://api.smallest.ai/waves/v1/pulse/get_text?language=multi&word_timestamps=true" \
  --header "Authorization: Bearer $SMALLEST_API_KEY" \
  --header "Content-Type: audio/wav" \
  --data-binary "@audio.wav"
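
The same pre-recorded request can be sketched in JavaScript. This is a minimal sketch, assuming Node.js 18+ (for the global fetch) and that the endpoint, headers, and query parameters match the curl example above. The helper names buildPulseUrl and transcribeFile are hypothetical and not part of any SDK.

```javascript
// Hypothetical helper: builds the pre-recorded endpoint URL with language detection on.
function buildPulseUrl(params = {}) {
  const url = new URL("https://api.smallest.ai/waves/v1/pulse/get_text");
  // language=multi enables auto-detection, as described above.
  url.searchParams.set("language", "multi");
  for (const [key, value] of Object.entries(params)) {
    url.searchParams.set(key, String(value));
  }
  return url.toString();
}

// Hypothetical helper: posts raw WAV bytes, mirroring the curl call.
async function transcribeFile(audioBytes, apiKey) {
  const response = await fetch(buildPulseUrl({ word_timestamps: true }), {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "audio/wav",
    },
    body: audioBytes,
  });
  if (!response.ok) {
    throw new Error(`Request failed with HTTP ${response.status}`);
  }
  return response.json();
}
```

In a script, transcribeFile could be called with the bytes of audio.wav (for example from fs.readFileSync) and the key held in SMALLEST_API_KEY. Keeping the URL builder separate makes it easy to check the query string without sending a request.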

Real-Time WebSocket API

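// Assumes the Node.js "ws" package; the browser WebSocket constructor does not accept custom headers.
import WebSocket from "ws";

// Assumes the API key is exported as SMALLEST_API_KEY.
const API_KEY = process.env.SMALLEST_API_KEY;

const url = new URL("wss://api.smallest.ai/waves/v1/pulse/get_text");
url.searchParams.append("language", "multi");
url.searchParams.append("encoding", "linear16");
url.searchParams.append("sample_rate", "16000");

const ws = new WebSocket(url.toString(), {
  headers: {
    Authorization: `Bearer ${API_KEY}`,
  },
});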

Output format & field of interest

When language detection is enabled, the transcription field (transcript in real-time responses) and the words and utterances arrays are emitted in the detected language. The response includes a language field with the detected primary language code and, in real-time responses where is_final=true, a languages array listing all detected languages. To keep an audit trail in your app, store the language parameter you supplied with each request, and check downstream metadata such as subtitles or captions, which inherit the localized transcript.
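
Because the two APIs name the text field differently, client code may want one shape for both. The following is a minimal sketch, assuming only the field names described above (transcription, transcript, language, languages). The function name normalizeResult is hypothetical.

```javascript
// Hypothetical helper: maps either response shape onto one object.
function normalizeResult(payload) {
  return {
    // Pre-recorded responses use "transcription"; real-time responses use "transcript".
    text: payload.transcription ?? payload.transcript ?? "",
    // language may be absent, for example on non-final real-time messages.
    language: payload.language ?? null,
    // languages is treated as optional and defaults to an empty list.
    languages: Array.isArray(payload.languages) ? payload.languages : [],
  };
}
```

Returning null and an empty array for missing fields lets callers tell "not reported" apart from a detected language without extra existence checks.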

Sample response

Pre-Recorded API Response

{
  "status": "success",
  "transcription": "Hola mundo.",
  "words": [
    { "start": 0.0, "end": 0.4, "word": "Hola" },
    { "start": 0.5, "end": 0.9, "word": "mundo." }
  ],
  "utterances": [
    { "text": "Hola mundo.", "start": 0.0, "end": 0.9 }
  ]
}

Real-Time WebSocket API Response

{
  "session_id": "sess_12345abcde",
  "transcript": "Hola mundo.",
  "is_final": true,
  "is_last": false,
  "language": "es",
  "languages": ["es"]
}

The language field is only returned when is_final=true in real-time API responses. The languages array lists all languages detected in the audio and is also only included when is_final=true.
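
Since language and languages only arrive on final messages, a real-time client has to skip interim messages when tracking the detected language. The following is a minimal sketch of one way to do that, assuming each WebSocket message has already been parsed from JSON into an object shaped like the sample above. The name createLanguageTracker is hypothetical.

```javascript
// Hypothetical helper: collects detected languages across a real-time session.
function createLanguageTracker() {
  const seen = new Set();
  let primary = null;
  return {
    // Call with each parsed WebSocket message.
    handle(message) {
      // language and languages are only expected when is_final is true.
      if (!message.is_final) return;
      if (typeof message.language === "string") primary = message.language;
      for (const code of message.languages ?? []) seen.add(code);
    },
    // Returns the most recent primary language and every language seen so far.
    snapshot() {
      return { primary, all: [...seen] };
    },
  };
}
```

With a Node.js ws connection, this could be wired up by calling tracker.handle(JSON.parse(data.toString())) inside the socket's message handler. That wiring is an assumption about the client setup and is not something the API requires.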