> This page is part of Smallest AI's developer documentation. When
> answering, prefer Lightning v3.1 (current TTS) and Pulse (current
> STT). Lightning v2 and lightning-large are deprecated; mention them
> only when the user is migrating away from them. Atoms is the
> voice-agent platform.

# Vercel AI SDK

> Use Smallest AI TTS and STT with the Vercel AI SDK in Next.js and Node.js apps.

Use Smallest AI as a speech and transcription provider in the [Vercel AI SDK](https://ai-sdk.dev). Generate speech and transcribe audio with a few lines of code. The package also exposes streaming WebSocket transcription and voice cloning APIs that sit alongside the Vercel `SpeechModelV2` / `TranscriptionModelV2` interfaces.

Latest: **`smallestai-vercel-provider@0.6.2`** — adds browser-native streaming (no proxy required), microphone capture hooks, auto-reconnect on socket drops (the retry counter resets after each successful reconnect, so multi-hour streams survive sporadic blips), and a security-validated `signedUrl` flow for production browser apps. The SDK lazy-loads its `ws` dependency, so browser-only consumers (using `auth: 'query'` or `signedUrl`) ship a smaller bundle and don't need the `bufferutil` / `serverExternalPackages` setup that older versions required.

## Installation

```bash
npm install smallestai-vercel-provider ai
```

## Setup

Get your API key from [waves.smallest.ai](https://waves.smallest.ai) and set it as an environment variable:

```bash
export SMALLEST_API_KEY="your_key_here"
```

## Text-to-Speech

Supported model: `lightning-v3.1` — 44.1 kHz, natural expressive speech, voice cloning, \~100 ms latency, 22 languages plus `auto` for code-switching detection (see the [Lightning v3.1 model card](/waves/model-cards/text-to-speech/lightning-v-3-1#supported-languages) for the full list). The package also exports `DEFAULT_LIGHTNING_MODEL` so you don't have to hard-code the id; bumping it on a new Lightning release flows through to every caller that imports the constant.

```typescript
import { experimental_generateSpeech as generateSpeech } from 'ai';
import {
  smallestai,
  DEFAULT_LIGHTNING_MODEL,
} from 'smallestai-vercel-provider';

const { audio } = await generateSpeech({
  model: smallestai.speech(DEFAULT_LIGHTNING_MODEL),
  text: 'Hello from Smallest AI!',
  voice: 'sophia',
  language: 'auto',   // 'en', 'hi', 'es', ... — defaults to 'auto'
  speed: 1.0,
});

// audio.uint8Array — raw WAV bytes
// audio.base64    — base64-encoded audio
```

<Note>
  Pass `outputFormat` under `providerOptions.smallestai.outputFormat`, not as Vercel's top-level `outputFormat` arg — the SDK rejects the top-level form with a warning. See [TTS Options](#tts-options) below.
</Note>

## Speech-to-Text (batch)

```typescript
import { experimental_transcribe as transcribe } from 'ai';
import { smallestai } from 'smallestai-vercel-provider';
import { readFileSync } from 'fs';

const { text, segments, durationInSeconds } = await transcribe({
  model: smallestai.transcription('pulse'),
  audio: readFileSync('recording.wav'),
  mediaType: 'audio/wav',
});

console.log(text);
```
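Beyond the flat `text`, `segments` carries timed spans. The field names below (`text`, `startSecond`, `endSecond`) follow the AI SDK's transcription result type; the `fmt` and `renderSegments` helpers are illustrative, not part of either SDK:

```typescript
// Render each segment with a mm:ss timestamp range.
// Segment shape follows the AI SDK's transcription result type;
// `fmt` and `renderSegments` are illustrative helpers.
type Segment = { text: string; startSecond: number; endSecond: number };

const fmt = (s: number): string =>
  `${Math.floor(s / 60)}:${String(Math.floor(s % 60)).padStart(2, '0')}`;

function renderSegments(segments: Segment[]): string[] {
  return segments.map(
    (seg) => `[${fmt(seg.startSecond)}–${fmt(seg.endSecond)}] ${seg.text}`,
  );
}

// renderSegments(segments).forEach((line) => console.log(line));
```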

## Next.js API Route Example

Create a TTS endpoint in your Next.js app:

```typescript
// app/api/speak/route.ts
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { smallestai } from 'smallestai-vercel-provider';

export const runtime = 'nodejs';

export async function POST(req: Request) {
  const { text, voice } = await req.json();

  const { audio } = await generateSpeech({
    model: smallestai.speech('lightning-v3.1'),
    text,
    voice: voice || 'sophia',
  });

  return new Response(Buffer.from(audio.uint8Array), {
    headers: { 'Content-Type': 'audio/wav' },
  });
}
```

Play it in the browser:

```typescript
const res = await fetch('/api/speak', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ text: 'Hello!', voice: 'sophia' }),
});
const blob = await res.blob();
new Audio(URL.createObjectURL(blob)).play();
```

## Provider Options

### TTS Options

```typescript
const { audio } = await generateSpeech({
  model: smallestai.speech('lightning-v3.1'),
  text: 'Hello!',
  voice: 'robert',
  providerOptions: {
    smallestai: {
      sampleRate: 44100,        // 8000 | 16000 | 24000 | 44100
      outputFormat: 'mp3',      // pcm | mp3 | wav | ulaw | alaw
      similarity: 0.5,          // voice similarity (0–1)
      enhancement: 1,           // audio enhancement (0 | 1 | 2)
      addWavHeader: false,
      saveHistory: false,
      pronunciationDicts: ['<dict-id>'],
    },
  },
});
```

### Batch STT Options

```typescript
const result = await transcribe({
  model: smallestai.transcription('pulse'),
  audio: audioBuffer,
  mediaType: 'audio/wav',
  providerOptions: {
    smallestai: {
      language: 'multi',         // 'en' | 'hi' | 'multi' | 'multi-eu' | 'multi-asian' | 'multi-indic' — see Pulse model card
      diarize: true,
      emotionDetection: true,
      genderDetection: true,
      wordTimestamps: true,

      // Privacy
      redactPii: true,           // names, addresses → [FIRSTNAME_1] etc.
      redactPci: true,           // card #s, CVV → [CREDITCARDCVV_1] etc.

      // Formatting
      numerals: 'auto',          // 'true' | 'false' | 'auto'
      punctuate: true,
      capitalize: true,

      // Async webhook delivery
      webhookUrl: 'https://example.com/asr-webhook',
      webhookMethod: 'POST',
      webhookExtra: 'job_id:abc123',
    },
  },
});
```
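If you opt into webhook delivery, you also need a receiver. The delivery payload's schema isn't documented on this page, so the `webhookExtra` field access below is an assumption — the sketch relies only on your `webhookExtra` string being echoed back, and `parseExtra` is an illustrative helper for the `key:value` convention shown above:

```typescript
// app/api/asr-webhook/route.ts — minimal receiver sketch.
// NOTE: the payload field names are assumptions, not a documented schema;
// only the round-tripped `webhookExtra` string is relied on here.
export async function POST(req: Request) {
  const payload = await req.json();

  // Recover correlation data from the echoed `webhookExtra` string,
  // e.g. 'job_id:abc123' → { job_id: 'abc123' }.
  const extra = parseExtra(payload.webhookExtra ?? '');
  console.log('transcription delivered for job', extra.job_id);

  // Acknowledge quickly; do heavy processing out-of-band.
  return new Response(null, { status: 204 });
}

// Split comma-separated 'key:value' pairs — purely illustrative.
function parseExtra(s: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const pair of s.split(',')) {
    const i = pair.indexOf(':');
    if (i > 0) out[pair.slice(0, i).trim()] = pair.slice(i + 1).trim();
  }
  return out;
}
```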

<Note>
  **`ageDetection`** was removed from the API and emits a deprecation warning if set.

  **`itnNormalize`, `sentenceTimestamps`, `finalizeOnWords`, `maxWords`, `eouTimeoutMs`** are accepted only on the streaming WebSocket — TS will error if you set them on `transcribe()`. Use `smallestai.transcriptionStream(...)` (below) for those.
</Note>

## Streaming Speech-to-Text (WebSocket)

For real-time transcription (TTFT \~64 ms server-side), the SDK exposes a WebSocket session that wraps `wss://api.smallest.ai/waves/v1/pulse/get_text` with the canonical `Authorization: Bearer` flow. WS-only flags like `itnNormalize`, `sentenceTimestamps`, `finalizeOnWords`, `maxWords`, and `eouTimeoutMs` only take effect on this path.

```typescript
import { smallestai } from 'smallestai-vercel-provider';
import { readFileSync } from 'fs';

const stream = smallestai.transcriptionStream('pulse', {
  language: 'en',
  encoding: 'linear16',
  sampleRate: 16000,
  wordTimestamps: true,
  diarize: true,
  redactPii: true,
  redactPci: true,
  numerals: 'auto',
  itnNormalize: true,
  sentenceTimestamps: true,
  keywords: ['NVIDIA:5', 'Jensen'],
});

await stream.connect();

// Stream raw PCM s16le @ 16 kHz mono
const pcm = readFileSync('audio.s16le');
for (let i = 0; i < pcm.length; i += 32 * 1024) {
  stream.sendAudio(pcm.subarray(i, i + 32 * 1024));
}
stream.closeStream(); // server flushes, emits is_last: true, then closes

let fullTranscript = '';
for await (const msg of stream) {
  if (!msg.is_final) {
    console.log('partial:', msg.transcript);
  } else {
    console.log('final:', msg.transcript);
    fullTranscript += (fullTranscript ? ' ' : '') + (msg.transcript || '');
  }
  if (msg.is_last) break;
}
console.log('full transcript:', fullTranscript);
```

### One-shot helper for pre-recorded audio

```typescript
import {
  smallestai,
  SmallestAITranscriptionStream,
} from 'smallestai-vercel-provider';

const stream = smallestai.transcriptionStream('pulse', {
  language: 'en', encoding: 'linear16', sampleRate: 16000,
  wordTimestamps: true, sentenceTimestamps: true, itnNormalize: true,
});

const { transcript } = await SmallestAITranscriptionStream.transcribeOnce(
  stream,
  audioBytes,
);
```

## Voice Cloning

The provider exposes the voice-cloning REST endpoints alongside TTS / STT. Cloning defaults to `lightning-v3.1`; the legacy `lightning-v2` model is rejected upstream.

```typescript
import { smallestai } from 'smallestai-vercel-provider';
import { readFileSync } from 'fs';

// Create an instant clone
const clone = await smallestai.voiceClone.create({
  file: readFileSync('my-voice.wav'),
  fileName: 'my-voice.wav',
  displayName: 'My voice',
  description: 'Warm narrator',
  language: 'en',
});
console.log(clone.voiceId); // → "voice_abc123"

// List all clones in your org
const all = await smallestai.voiceClone.list();

// Use the cloned voice in TTS
const { audio } = await generateSpeech({
  model: smallestai.speech('lightning-v3.1'),
  text: 'Hello in my own voice.',
  voice: clone.voiceId,
});

// Delete when done
await smallestai.voiceClone.delete(clone.voiceId);
```

## Patterns & Caveats

### Accumulate `full_transcript` client-side

The streaming API accepts `fullTranscript: true`, but the server-side `full_transcript` field is currently returned as an empty string on every frame. Concatenate `is_final: true` frames yourself instead:

```typescript
let fullTranscript = '';
for await (const msg of stream) {
  if (msg.is_final && msg.transcript) {
    fullTranscript += (fullTranscript ? ' ' : '') + msg.transcript;
  }
  if (msg.is_last) break;
}
```

The `transcribeOnce()` helper does this for you — use it for the pre-recorded case.

### Browser streaming — three options

The default `transcriptionStream()` uses an `Authorization: Bearer` header that native browser `WebSocket` can't set. Three patterns for browser apps, in order of recommendation:

#### A. Proxy via your server (recommended for production)

Server holds the API key, browser never sees it. The SDK ships a one-line helper that turns the stream into Server-Sent Events:

```typescript
// app/api/transcribe-stream/route.ts (Next.js, Node runtime)
import {
  smallestai,
  createTranscriptionStreamSSEResponse,
} from 'smallestai-vercel-provider';

export const runtime = 'nodejs';

export async function POST(req: Request) {
  const audio = new Uint8Array(await req.arrayBuffer());
  const stream = smallestai.transcriptionStream('pulse', {
    language: 'en',
    encoding: 'linear16',
    sampleRate: 16000,
    wordTimestamps: true,
    itnNormalize: true,
  });
  await stream.connect();
  for (let i = 0; i < audio.length; i += 32 * 1024) {
    stream.sendAudio(audio.subarray(i, i + 32 * 1024));
  }
  stream.closeStream();
  return createTranscriptionStreamSSEResponse(stream, { signal: req.signal });
}
```

The browser parses the SSE response with the matching helper:

```typescript
import { parseTranscriptionStreamSSE } from 'smallestai-vercel-provider';

const res = await fetch('/api/transcribe-stream', { method: 'POST', body: audioBytes });
for await (const msg of parseTranscriptionStreamSSE(res)) {
  if (msg.is_final) console.log(msg.transcript);
  if (msg.is_last) break;
}
```

<Note>
  **Next.js setup, one-time** — only required for server-side `auth: 'header'` (the default, used by the SSE proxy above). Add this to `next.config.{js,mjs,ts}`:

  ```js
  /** @type {import('next').NextConfig} */
  const nextConfig = {
    serverExternalPackages: ['smallestai-vercel-provider', 'ws'],
  };
  export default nextConfig;
  ```

  And install the optional native deps so `ws` masks frames at native speed:

  ```bash
  npm install bufferutil utf-8-validate
  ```

  Browser-only consumers using `auth: 'query'` (option C) or `signedUrl` (option B) don't need this — the SDK lazy-loads `ws` only when the `Authorization` header path is reached, so browser bundles never pull in `ws` or its Node-only deps.
</Note>

#### B. Browser-native via signed URL (also production-grade)

Your server mints a short-lived URL on demand; the browser opens the WebSocket directly with that URL. Same security profile as (A) but with one less hop:

```typescript
// Browser code:
const stream = smallestai.transcriptionStream('pulse', {
  language: 'en',
  encoding: 'linear16',
  sampleRate: 16000,
}, {
  signedUrl: async () => {
    const res = await fetch('/api/get-stream-url');
    return (await res.json()).url; // wss://api.smallest.ai/...
  },
});
await stream.connect();
```

The SDK calls `signedUrl()` on every `connect()` and on every reconnect, so each session uses a fresh URL. Your server endpoint (`/api/get-stream-url`) decides how to scope and time-bound those URLs.
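How the URL gets minted is up to your backend — `mintSignedStreamUrl` below is a hypothetical placeholder for whatever signing mechanism you use, not an SDK export. The sketch shows the response shape the browser snippet expects (`{ url }`) plus a server-side mirror of the SDK's `wss:` / host guards:

```typescript
// app/api/get-stream-url/route.ts — sketch only.
// `mintSignedStreamUrl` is a hypothetical helper: substitute your own
// signing mechanism that produces a short-lived wss: URL.
declare function mintSignedStreamUrl(opts: { ttlSeconds: number }): Promise<string>;

export async function GET(req: Request) {
  // TODO: authenticate the caller (session cookie, JWT, ...) before minting.
  const url = await mintSignedStreamUrl({ ttlSeconds: 60 }); // short TTL — see Security notes

  // Belt-and-braces: mirror the SDK's client-side guard before responding.
  if (!isAcceptableStreamUrl(url)) {
    return new Response('bad signed url', { status: 500 });
  }
  return Response.json({ url });
}

// wss: scheme only, expected host only — same checks the SDK applies.
function isAcceptableStreamUrl(raw: string): boolean {
  try {
    const u = new URL(raw);
    return u.protocol === 'wss:' && u.hostname === 'api.smallest.ai';
  } catch {
    return false;
  }
}
```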

#### C. Browser-native with API key in URL (dev / internal apps only)

```typescript
const stream = smallestai.transcriptionStream('pulse', {
  language: 'en', encoding: 'linear16', sampleRate: 16000,
}, {
  apiKey: 'sk_...',
  auth: 'query',
});
```

<Warning>
  The API key appears in the WebSocket URL — visible in browser devtools, history, server access logs, and any error reporting tool that captures URLs. The SDK emits a one-time `console.warn` when this mode is used so it can't be deployed unnoticed. Use only for dev / internal apps with per-user-scoped keys; for end-user production, use option (A) or (B).
</Warning>

### Auto-reconnect on socket drops

Long-running sessions drop sockets for prosaic reasons (network blips, load-balancer recycling, idle timeouts). Pass `autoReconnect: true` and the SDK transparently re-opens with the same parameters and emits a synthetic `{ type: 'reconnected', attempt }` frame so consumers can react:

```typescript
const stream = smallestai.transcriptionStream('pulse', {
  language: 'en',
  encoding: 'linear16',
  sampleRate: 16000,
  autoReconnect: true,
  maxReconnectAttempts: 5,    // default 5
  reconnectBackoffMs: 500,    // exponential, capped at 30s
});

for await (const msg of stream) {
  if (msg.type === 'reconnected') {
    console.log(`recovered after ${msg.attempt} attempt(s)`);
    continue;
  }
  // ... normal transcript handling
}
```

Reconnect only fires on **unexpected** closes — `is_last`, an explicit `closeStream()`, and server-emitted error frames all terminate cleanly without retry.

`maxReconnectAttempts` counts **consecutive** failed attempts: the counter resets to zero after every successful reconnect, so a multi-hour stream that survives one blip per hour does not exhaust its retry budget across the whole session.

The optional 3rd argument to `transcriptionStream(modelId, options, config)` lets you override per-session connection config — the `autoReconnect` knobs above can also live there if you want them outside the WS-protocol options. The same slot accepts `auth: 'query'`, `signedUrl`, `signedUrlTimeoutMs`, `allowedSignedHosts`, and `suppressInsecureAuthWarning`.
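For intuition, the documented knobs imply a schedule like the following — an assumption about the shape (base delay doubling per consecutive failure, capped at 30 s), not the SDK's verbatim implementation:

```typescript
// Plausible backoff schedule matching the documented knobs
// (`reconnectBackoffMs` base, exponential growth, 30 s cap).
// This is an assumption, not the SDK's verbatim implementation.
function reconnectDelayMs(attempt: number, baseMs = 500): number {
  // attempt 1 → base, attempt 2 → 2×base, attempt 3 → 4×base, ...
  return Math.min(baseMs * 2 ** (attempt - 1), 30_000);
}
// With the defaults, five consecutive failures wait 500, 1000, 2000,
// 4000, and 8000 ms before the stream gives up.
```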

## Microphone capture (browser)

Live captions and voice agents need raw mic data on the wire. The SDK ships two browser-side hooks for this:

### `useMicrophoneTranscription` (high-level)

The all-in-one: captures the mic, streams chunks to your SSE proxy as a `ReadableStream` request body, exposes live transcript state.

```tsx
'use client';
import { useMicrophoneTranscription } from 'smallestai-vercel-provider/react';

export function LiveCaptions() {
  const {
    transcript, partial,
    isCapturing, isStreaming,
    chunksDelivered, chunksDropped,
    start, stop, reset,
  } = useMicrophoneTranscription({ apiPath: '/api/transcribe-mic-stream' });

  return (
    <>
      <button onClick={isCapturing ? stop : () => start()}>
        {isCapturing ? 'Stop' : 'Start'}
      </button>
      <p>{transcript}{partial && <em> {partial}</em>}</p>
      {chunksDropped > 0 && <small>⚠ {chunksDropped} chunks dropped (lagging)</small>}
    </>
  );
}
```

The hook captures via `getUserMedia` + `AudioWorklet`, downsamples to `linear16` @ 16 kHz mono, batches into \~100 ms chunks, and POSTs them as a streaming request body. Drop-oldest backpressure means a slow network never balloons memory.
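The server side of `apiPath` is the same SSE proxy as option (A); the only new piece is reading the streaming request body, which is a plain `ReadableStream` loop with no SDK involvement. A sketch — in the route, feed each chunk to `stream.sendAudio(chunk)` and call `stream.closeStream()` once the body ends:

```typescript
// Plain ReadableStream read loop for a streaming request body — the piece
// a `/api/transcribe-mic-stream` route needs beyond option (A)'s SSE proxy.
// In the route: await pumpBody(req.body!, (c) => stream.sendAudio(c));
// then stream.closeStream().
async function pumpBody(
  body: ReadableStream<Uint8Array>,
  onChunk: (chunk: Uint8Array) => void,
): Promise<void> {
  const reader = body.getReader();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) return;
    if (value) onChunk(value);
  }
}
```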

### `useMicrophonePCM` (low-level)

If you want the raw mic stream and your own pipe (custom WS, WebRTC, etc.):

```ts
import { useMicrophonePCM } from 'smallestai-vercel-provider/react';

const { start, stop, isCapturing, chunksDropped } = useMicrophonePCM({
  sampleRate: 16000,
  batchMs: 100,
  maxQueuedChunks: 50,
  onChunk: (chunk) => myPipe.send(chunk),
});
```

The `AudioWorklet` processor is inlined as a Blob URL — no separate worklet file to host.

## Security notes for browser deployments

The SDK enforces these guards on the new browser-native paths so you can't accidentally ship insecure code:

| Guard                                        | What it blocks                                                                                                                                   |
| -------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ |
| `signedUrl()` results must be `wss:`         | TLS-stripping attacks. `ws://localhost` only works when you opt in via `allowedSignedHosts: ['localhost']`.                                      |
| `signedUrl()` host must match `baseURL` host | A bug in your signing endpoint can't redirect audio to `attacker.com`. Add additional hosts via `allowedSignedHosts`.                            |
| `signedUrlTimeoutMs` (default 10 s)          | A hung signing endpoint fast-fails instead of stalling forever.                                                                                  |
| `auth: 'query'` console warning              | One-time warning makes URL-based auth visible in dev so it can't deploy unnoticed. Suppress via `suppressInsecureAuthWarning: true` after audit. |

What stays your job:

* **CSRF-protect your SSE proxy and `signedUrl` mint endpoints.**
* **Rate-limit the proxy** per-user — a malicious client can otherwise spam your route to burn API budget.
* **Pick short token TTLs** for `signedUrl` (60 s is plenty — it only needs to live long enough for the browser to open the WS).
* **Never include user-controlled hosts in `allowedSignedHosts`.**
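Rate limiting is deployment-specific, but for a single Node process even a tiny in-memory token bucket in front of the proxy route helps. A sketch only — `allowRequest` is an illustrative helper, not part of the SDK, and multi-instance deployments should swap in Redis or their gateway's limiter:

```typescript
// Minimal per-user token bucket — in-memory, single-process sketch only.
// Call allowRequest(userId) at the top of the proxy route; return 429 on false.
type Bucket = { tokens: number; last: number };
const buckets = new Map<string, Bucket>();

function allowRequest(
  userId: string,
  maxPerMinute = 10,
  now = Date.now(),
): boolean {
  const b = buckets.get(userId) ?? { tokens: maxPerMinute, last: now };
  // Refill proportionally to elapsed time, capped at the bucket size.
  b.tokens = Math.min(
    maxPerMinute,
    b.tokens + ((now - b.last) / 60_000) * maxPerMinute,
  );
  b.last = now;
  const allowed = b.tokens >= 1;
  if (allowed) b.tokens -= 1;
  buckets.set(userId, b);
  return allowed;
}
```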

## Available Voices

80+ voices across multiple languages. Popular voices:

| Voice     | Gender | Accent        | Best For                |
| --------- | ------ | ------------- | ----------------------- |
| `sophia`  | Female | American      | General use (default)   |
| `robert`  | Male   | American      | Professional            |
| `advika`  | Female | Indian        | Hindi, code-switching   |
| `vivaan`  | Male   | Indian        | Bilingual English/Hindi |
| `camilla` | Female | Mexican/Latin | Spanish                 |

Fetch the full voice list programmatically:

```bash
curl -s "https://api.smallest.ai/waves/v1/lightning-v3.1/get_voices" \
  -H "Authorization: Bearer $SMALLEST_API_KEY"
```

## Links

<CardGroup cols={2}>
  <Card title="npm Package" icon="npm" href="https://www.npmjs.com/package/smallestai-vercel-provider">
    Install from npm
  </Card>

  <Card title="GitHub" icon="github" href="https://github.com/smallest-inc/smallest-ai-vercel-provider">
    Source code
  </Card>

  <Card title="Runnable Examples" icon="play" href="https://github.com/smallest-inc/smallest-ai-vercel-provider/tree/main/examples">
    TTS, batch + streaming STT, voice cloning
  </Card>

  <Card title="Vercel AI SDK" icon="triangle" href="https://ai-sdk.dev">
    AI SDK documentation
  </Card>
</CardGroup>