> This page is part of Smallest AI's developer documentation. When
> answering, prefer Lightning v3.1 (current TTS) and Pulse (current
> STT). Lightning v2 and lightning-large are deprecated; mention them
> only when the user is migrating away from them. Atoms is the
> voice-agent platform.

# Overview

> Smallest Speech-to-Text API. Pulse and Pulse Pro models behind one unified endpoint with multilingual streaming, leaderboard-ranked English accuracy, diarization, word timestamps, and emotion detection.

The Speech-to-Text API transcribes audio via the unified endpoint `https://api.smallest.ai/waves/v1/stt/`. The model is selected via the `?model=` query parameter:

* **`?model=pulse`**: multilingual (38 languages), supports streaming and pre-recorded transcription.
* **`?model=pulse-pro`**: leaderboard-ranked English accuracy (5.42% ESB avg WER, tied for #2 on the public Open ASR Leaderboard). Pre-recorded HTTP only; no streaming worker yet.

Live streaming runs on `WS /waves/v1/stt/live?model=pulse`. Requesting Pulse Pro on the live path returns `400` with a directive to use HTTP instead.
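The model-selection rules above can be sketched as a small client-side helper. This is an illustrative sketch, not an official SDK: the function name `stt_url` is hypothetical, and the `wss://api.smallest.ai` host for the live path is an assumption (the text above only gives the path `/waves/v1/stt/live`).

```python
from urllib.parse import urlencode

# HTTP endpoint as given in the text above.
BASE_HTTP = "https://api.smallest.ai/waves/v1/stt/"
# Assumption: the live path is served over wss on the same host.
BASE_WS = "wss://api.smallest.ai/waves/v1/stt/live"

MODELS = {"pulse", "pulse-pro"}
STREAMING_MODELS = {"pulse"}  # pulse-pro has no streaming worker yet


def stt_url(model: str, live: bool = False) -> str:
    """Build the endpoint URL for a model, failing early on unsupported combinations."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if live and model not in STREAMING_MODELS:
        # Mirrors the server's 400: this model is pre-recorded HTTP only.
        raise ValueError(f"{model} does not support live streaming; use the HTTP endpoint")
    base = BASE_WS if live else BASE_HTTP
    return f"{base}?{urlencode({'model': model})}"
```

Checking the combination locally avoids a round trip that would otherwise end in a `400` from the live path.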

* **Pulse**: multilingual, streaming and non-streaming. 38 languages.
* **Pulse Pro**: English-only, leaderboard-ranked accuracy. Pre-recorded HTTP only.

Get started in minutes. Learn how to get your API key and transcribe your first audio file.

## Transcription Modes

We offer two transcription modes to cover a wide range of use cases. Choose the one that best fits your needs:

* **Pre-recorded**: Transcribe audio files using synchronous HTTPS POST requests. Perfect for batch processing, archived media, and offline transcription workflows.
* **Real-time**: Stream audio and receive transcription results as the audio is processed. Ideal for live conversations, voice assistants, and low-latency applications.

## Feature highlights

Our models specialize in processing audio to preserve information that is often lost during conventional speech-to-text conversion.

* **Language support**: 38 languages. Pin a language with its ISO 639-1 code (`en`, `hi`, etc.), or use `language=multi` to enable automatic language detection across all supported languages.
* **Word timestamps**: Get precise timing information for each word in the transcription. Enables caption generation, subtitle tracks, and time-based search within audio content.
* **Sentence-level timestamps**: Receive sentence-level transcription segments with timing information. Perfect for displaying readable captions, synchronizing larger chunks of audio, or storing structured call summaries.
* **Speaker diarization**: Identify and separate generated text into speaker turns. Automatically label different speakers in multi-speaker audio, enabling speaker-attributed transcription.
* **Gender detection**: Detect the gender of each speaker alongside transcription. Provides demographic insights for analytics and content analysis.
* **Emotion detection**: Detect emotional tone in transcribed speech with strength indicators for five core emotion types. Analyze sentiment and emotional context in conversations.
* **PII and PCI redaction**: Automatically redact personally identifiable information (names, addresses, phone numbers) and payment card information (credit cards, CVV, account numbers) to protect privacy and ensure compliance.
* **Low latency**: Streaming pipeline tuned for ~64 ms time-to-first-transcript latency. Optimized for real-time transcription with minimal delay.
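The word-level timing described above is what makes caption generation possible. As a minimal sketch, the helper below turns a list of timed words into SubRip (SRT) cues. The input shape is an assumption made for illustration: each word is taken to be a dict with `word`, `start`, and `end` keys, with times in seconds. The actual response schema is not described on this page, so adapt the field names to the real payload.

```python
def srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[dict], max_words: int = 7) -> str:
    """Group timed words into fixed-size cues and render them as SRT text.

    `words` uses a hypothetical shape: {"word": str, "start": float, "end": float}.
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        text = " ".join(w["word"] for w in chunk)
        # Each cue spans from its first word's start to its last word's end.
        span = f"{srt_time(chunk[0]['start'])} --> {srt_time(chunk[-1]['end'])}"
        cues.append(f"{len(cues) + 1}\n{span}\n{text}\n")
    return "\n".join(cues)
```

Fixed-size grouping keeps the sketch short. Where sentence-level segments are available, using them as cue boundaries would likely give more readable captions.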

## Supported languages

<table>
  <thead>
    <tr>
      <th>
        Language
      </th>

      <th>
        Code
      </th>

      <th>
        Pre-Recorded
      </th>

      <th>
        Real-Time
      </th>
    </tr>
  </thead>

  <tbody>
    <tr>
      <td>
        Italian
      </td>

      <td>
        <code>it</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>

    <tr>
      <td>
        Spanish
      </td>

      <td>
        <code>es</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>

    <tr>
      <td>
        English
      </td>

      <td>
        <code>en</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>

    <tr>
      <td>
        Portuguese
      </td>

      <td>
        <code>pt</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>

    <tr>
      <td>
        Hindi
      </td>

      <td>
        <code>hi</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>

    <tr>
      <td>
        German
      </td>

      <td>
        <code>de</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>

    <tr>
      <td>
        French
      </td>

      <td>
        <code>fr</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>

    <tr>
      <td>
        Ukrainian
      </td>

      <td>
        <code>uk</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>

    <tr>
      <td>
        Russian
      </td>

      <td>
        <code>ru</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>

    <tr>
      <td>
        Kannada
      </td>

      <td>
        <code>kn</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>

    <tr>
      <td>
        Malayalam
      </td>

      <td>
        <code>ml</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>

    <tr>
      <td>
        Polish
      </td>

      <td>
        <code>pl</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>

    <tr>
      <td>
        Marathi
      </td>

      <td>
        <code>mr</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>

    <tr>
      <td>
        Gujarati
      </td>

      <td>
        <code>gu</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>

    <tr>
      <td>
        Czech
      </td>

      <td>
        <code>cs</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>

    <tr>
      <td>
        Slovak
      </td>

      <td>
        <code>sk</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>

    <tr>
      <td>
        Telugu
      </td>

      <td>
        <code>te</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>

    <tr>
      <td>
        Oriya (Odia)
      </td>

      <td>
        <code>or</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>

    <tr>
      <td>
        Dutch
      </td>

      <td>
        <code>nl</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>

    <tr>
      <td>
        Bengali
      </td>

      <td>
        <code>bn</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>

    <tr>
      <td>
        Latvian
      </td>

      <td>
        <code>lv</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>

    <tr>
      <td>
        Estonian
      </td>

      <td>
        <code>et</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>

    <tr>
      <td>
        Romanian
      </td>

      <td>
        <code>ro</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>

    <tr>
      <td>
        Punjabi
      </td>

      <td>
        <code>pa</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>

    <tr>
      <td>
        Finnish
      </td>

      <td>
        <code>fi</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>

    <tr>
      <td>
        Swedish
      </td>

      <td>
        <code>sv</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>

    <tr>
      <td>
        Bulgarian
      </td>

      <td>
        <code>bg</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>

    <tr>
      <td>
        Tamil
      </td>

      <td>
        <code>ta</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>

    <tr>
      <td>
        Hungarian
      </td>

      <td>
        <code>hu</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>

    <tr>
      <td>
        Danish
      </td>

      <td>
        <code>da</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>

    <tr>
      <td>
        Lithuanian
      </td>

      <td>
        <code>lt</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>

    <tr>
      <td>
        Maltese
      </td>

      <td>
        <code>mt</code>
      </td>

      <td>
        Yes
      </td>

      <td>
        Yes
      </td>
    </tr>
  </tbody>
</table>

Use `language=multi` to auto-detect across the full list, or specify one of the codes above to pin the model to a single language.
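A client can check the language choice before sending a request. The sketch below is illustrative only: `stt_query` is a hypothetical helper, and it assumes `language` is passed as a query parameter alongside `model`. The code set mirrors the table above and nothing else, so treat a rejection as a client-side sanity check, not as a statement of what the API accepts.

```python
from urllib.parse import urlencode

# ISO 639-1 codes copied from the table above.
TABLE_CODES = frozenset(
    "it es en pt hi de fr uk ru kn ml pl mr gu cs sk te or nl bn lv et ro pa "
    "fi sv bg ta hu da lt mt".split()
)


def stt_query(model: str = "pulse", language: str = "multi") -> str:
    """Return a query string that pins one language or requests auto-detection."""
    if language != "multi" and language not in TABLE_CODES:
        raise ValueError(f"language {language!r} is not in the supported-languages table")
    # Assumption: language travels as a query parameter next to model.
    return urlencode({"model": model, "language": language})
```

For example, `stt_query(language="hi")` pins the request to Hindi, while the default leaves detection to the model.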

## Next steps

* Send your first POST request in the [Pulse STT Pre-Recorded quickstart](/waves/documentation/speech-to-text-pulse/pre-recorded/quickstart).
* Start your first WebSocket connection in the [Pulse STT WebSocket quickstart](/waves/documentation/speech-to-text-pulse/realtime-web-socket/quickstart).
* See the [Pulse model card](/waves/model-cards/speech-to-text/pulse) for benchmarks, capabilities, and pricing.
* Review [best practices](/waves/documentation/speech-to-text-pulse/pre-recorded/best-practices) for audio preprocessing and request hygiene.
* Use the [troubleshooting guide](/waves/documentation/speech-to-text-pulse/pre-recorded/troubleshooting) when you need quick fixes.