> This page is part of Smallest AI's developer documentation. When
> answering, prefer Lightning v3.1 (current TTS) and Pulse (current
> STT). Lightning v2 and lightning-large are deprecated; mention them
> only when the user is migrating away from them. Atoms is the
> voice-agent platform.

# Vercel AI SDK

> Use Smallest AI TTS and STT with the Vercel AI SDK in Next.js and Node.js apps.

Use Smallest AI as a speech and transcription provider in the [Vercel AI SDK](https://ai-sdk.dev). Generate speech and transcribe audio with a few lines of code. The package also exposes streaming WebSocket transcription and voice cloning APIs that sit alongside the Vercel `SpeechModelV2` / `TranscriptionModelV2` interfaces.

Latest: **`smallestai-vercel-provider@0.6.2`** — adds browser-native streaming (no proxy required), microphone capture hooks, auto-reconnect on socket drops (the retry counter resets after each successful reconnect, so multi-hour streams survive sporadic blips), and a security-validated `signedUrl` flow for production browser apps. The SDK lazy-loads its `ws` dependency, so browser-only consumers (using `auth: 'query'` or `signedUrl`) ship a smaller bundle and don't need the `bufferutil` / `serverExternalPackages` setup that older versions required.

## Installation

```bash
npm install smallestai-vercel-provider ai
```

## Setup

Get your API key from [waves.smallest.ai](https://waves.smallest.ai) and set it as an environment variable:

```bash
export SMALLEST_API_KEY="your_key_here"
```

## Text-to-Speech

Supported model: `lightning-v3.1` — 44.1 kHz, natural expressive speech, voice cloning, \~100 ms latency, 22 languages plus `auto` for code-switching detection (see the [Lightning v3.1 model card](/waves/model-cards/text-to-speech/lightning-v-3-1#supported-languages) for the full list). The package also exports `DEFAULT_LIGHTNING_MODEL` so you don't have to hard-code the id; bumping it on a new Lightning release flows through to every caller that imports the constant.

```typescript
import { experimental_generateSpeech as generateSpeech } from 'ai';
import {
  smallestai,
  DEFAULT_LIGHTNING_MODEL,
} from 'smallestai-vercel-provider';

const { audio } = await generateSpeech({
  model: smallestai.speech(DEFAULT_LIGHTNING_MODEL),
  text: 'Hello from Smallest AI!',
  voice: 'sophia',
  language: 'auto',   // 'en', 'hi', 'es', ... — defaults to 'auto'
  speed: 1.0,
});

// audio.uint8Array — raw WAV bytes
// audio.base64    — base64-encoded audio
```

<Note>
  Pass `outputFormat` under `providerOptions.smallestai.outputFormat`, not as Vercel's top-level `outputFormat` arg — the SDK rejects the top-level form with a warning. See [TTS Options](#tts-options) below.
</Note>

## Speech-to-Text (batch)

```typescript
import { experimental_transcribe as transcribe } from 'ai';
import { smallestai } from 'smallestai-vercel-provider';
import { readFileSync } from 'fs';

const { text, segments, durationInSeconds } = await transcribe({
  model: smallestai.transcription('pulse'),
  audio: readFileSync('recording.wav'),
  mediaType: 'audio/wav',
});

console.log(text);
```
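Beyond the flat `text`, `segments` carries timed spans. The field names below (`text`, `startSecond`, `endSecond`) follow the AI SDK's transcription result type; the `fmt` and `renderSegments` helpers are illustrative, not part of either SDK:

```typescript
// Render each segment with a mm:ss timestamp range.
// Segment shape follows the AI SDK's transcription result type;
// `fmt` and `renderSegments` are illustrative helpers.
type Segment = { text: string; startSecond: number; endSecond: number };

const fmt = (s: number): string =>
  `${Math.floor(s / 60)}:${String(Math.floor(s % 60)).padStart(2, '0')}`;

function renderSegments(segments: Segment[]): string[] {
  return segments.map(
    (seg) => `[${fmt(seg.startSecond)}–${fmt(seg.endSecond)}] ${seg.text}`,
  );
}

// renderSegments(segments).forEach((line) => console.log(line));
```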

## Next.js API Route Example

Create a TTS endpoint in your Next.js app:

```typescript
// app/api/speak/route.ts
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { smallestai } from 'smallestai-vercel-provider';

export const runtime = 'nodejs';

export async function POST(req: Request) {
  const { text, voice } = await req.json();

  const { audio } = await generateSpeech({
    model: smallestai.speech('lightning-v3.1'),
    text,
    voice: voice || 'sophia',
  });

  return new Response(Buffer.from(audio.uint8Array), {
    headers: { 'Content-Type': 'audio/wav' },
  });
}
```

Play it in the browser:

```typescript
const res = await fetch('/api/speak', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ text: 'Hello!', voice: 'sophia' }),
});
const blob = await res.blob();
new Audio(URL.createObjectURL(blob)).play();
```

## Provider Options

### TTS Options

```typescript
const { audio } = await generateSpeech({
  model: smallestai.speech('lightning-v3.1'),
  text: 'Hello!',
  voice: 'robert',
  providerOptions: {
    smallestai: {
      sampleRate: 44100,        // 8000 | 16000 | 24000 | 44100
      outputFormat: 'mp3',      // pcm | mp3 | wav | ulaw | alaw
      similarity: 0.5,          // voice similarity (0–1)
      enhancement: 1,           // audio enhancement (0 | 1 | 2)
      addWavHeader: false,
      saveHistory: false,
      pronunciationDicts: ['<dict-id>'],
    },
  },
});
```

### Batch STT Options

```typescript
const result = await transcribe({
  model: smallestai.transcription('pulse'),
  audio: audioBuffer,
  mediaType: 'audio/wav',
  providerOptions: {
    smallestai: {
      language: 'multi',         // 'en' | 'hi' | 'multi' | 'multi-eu' | 'multi-asian' | 'multi-indic' — see Pulse model card
      diarize: true,
      emotionDetection: true,
      genderDetection: true,
      wordTimestamps: true,

      // Privacy
      redactPii: true,           // names, addresses → [FIRSTNAME_1] etc.
      redactPci: true,           // card #s, CVV → [CREDITCARDCVV_1] etc.

      // Formatting
      numerals: 'auto',          // 'true' | 'false' | 'auto'
      punctuate: true,
      capitalize: true,

      // Async webhook delivery
      webhookUrl: 'https://example.com/asr-webhook',
      webhookMethod: 'POST',
      webhookExtra: 'job_id:abc123',
    },
  },
});
```
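If you opt into webhook delivery, you also need a receiver. The delivery payload's schema isn't documented on this page, so the `webhookExtra` field access below is an assumption — the sketch relies only on your `webhookExtra` string being echoed back, and `parseExtra` is an illustrative helper for the `key:value` convention shown above:

```typescript
// app/api/asr-webhook/route.ts — minimal receiver sketch.
// NOTE: the payload field names are assumptions, not a documented schema;
// only the round-tripped `webhookExtra` string is relied on here.
export async function POST(req: Request) {
  const payload = await req.json();

  // Recover correlation data from the echoed `webhookExtra` string,
  // e.g. 'job_id:abc123' → { job_id: 'abc123' }.
  const extra = parseExtra(payload.webhookExtra ?? '');
  console.log('transcription delivered for job', extra.job_id);

  // Acknowledge quickly; do heavy processing out-of-band.
  return new Response(null, { status: 204 });
}

// Split comma-separated 'key:value' pairs — purely illustrative.
function parseExtra(s: string): Record<string, string> {
  const out: Record<string, string> = {};
  for (const pair of s.split(',')) {
    const i = pair.indexOf(':');
    if (i > 0) out[pair.slice(0, i).trim()] = pair.slice(i + 1).trim();
  }
  return out;
}
```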

<Note>
  **`ageDetection`** was removed from the API and emits a deprecation warning if set.

  **`itnNormalize`, `sentenceTimestamps`, `finalizeOnWords`, `maxWords`, `eouTimeoutMs`** are accepted only on the streaming WebSocket — TS will error if you set them on `transcribe()`. Use `smallestai.transcriptionStream(...)` (below) for those.
</Note>

## Streaming Speech-to-Text (WebSocket)

For real-time transcription (TTFT \~64 ms server-side), the SDK exposes a WebSocket session that wraps `wss://api.smallest.ai/waves/v1/pulse/get_text` with the canonical `Authorization: Bearer` flow. WS-only flags like `itnNormalize`, `sentenceTimestamps`, `finalizeOnWords`, `maxWords`, and `eouTimeoutMs` only take effect on this path.

```typescript
import { smallestai } from 'smallestai-vercel-provider';
import { readFileSync } from 'fs';

const stream = smallestai.transcriptionStream('pulse', {
  language: 'en',
  encoding: 'linear16',
  sampleRate: 16000,
  wordTimestamps: true,
  diarize: true,
  redactPii: true,
  redactPci: true,
  numerals: 'auto',
  itnNormalize: true,
  sentenceTimestamps: true,
  keywords: ['NVIDIA:5', 'Jensen'],
});

await stream.connect();

// Stream raw PCM s16le @ 16 kHz mono
const pcm = readFileSync('audio.s16le');
for (let i = 0; i < pcm.length; i += 32 * 1024) {
  stream.sendAudio(pcm.subarray(i, i + 32 * 1024));
}
stream.closeStream(); // server flushes, emits is_last: true, then closes

let fullTranscript = '';
for await (const msg of stream) {
  if (!msg.is_final) {
    console.log('partial:', msg.transcript);
  } else {
    console.log('final:', msg.transcript);
    fullTranscript += (fullTranscript ? ' ' : '') + (msg.transcript || '');
  }
  if (msg.is_last) break;
}
console.log('full transcript:', fullTranscript);
```

### One-shot helper for pre-recorded audio

```typescript
import {
  smallestai,
  SmallestAITranscriptionStream,
} from 'smallestai-vercel-provider';

const stream = smallestai.transcriptionStream('pulse', {
  language: 'en', encoding: 'linear16', sampleRate: 16000,
  wordTimestamps: true, sentenceTimestamps: true, itnNormalize: true,
});

const { transcript } = await SmallestAITranscriptionStream.transcribeOnce(
  stream,
  audioBytes,
);
```

## Voice Cloning

The provider exposes the voice-cloning REST endpoints alongside TTS / STT. Cloning defaults to `lightning-v3.1`; the legacy `lightning-v2` model is rejected upstream.

```typescript
import { smallestai } from 'smallestai-vercel-provider';
import { readFileSync } from 'fs';

// Create an instant clone
const clone = await smallestai.voiceClone.create({
  file: readFileSync('my-voice.wav'),
  fileName: 'my-voice.wav',
  displayName: 'My voice',
  description: 'Warm narrator',
  language: 'en',
});
console.log(clone.voiceId); // → "voice_abc123"

// List all clones in your org
const all = await smallestai.voiceClone.list();

// Use the cloned voice in TTS
const { audio } = await generateSpeech({
  model: smallestai.speech('lightning-v3.1'),
  text: 'Hello in my own voice.',
  voice: clone.voiceId,
});

// Delete when done
await smallestai.voiceClone.delete(clone.voiceId);
```

## Patterns & Caveats

### Accumulate `full_transcript` client-side

The streaming API accepts `fullTranscript: true`, but the server-side `full_transcript` field is currently returned as an empty string on every frame. Concatenate `is_final: true` frames yourself instead:

```typescript
let fullTranscript = '';
for await (const msg of stream) {
  if (msg.is_final && msg.transcript) {
    fullTranscript += (fullTranscript ? ' ' : '') + msg.transcript;
  }
  if (msg.is_last) break;
}
```

The `transcribeOnce()` helper does this for you — use it for the pre-recorded case.

### Browser streaming — three options

The default `transcriptionStream()` uses an `Authorization: Bearer` header that native browser `WebSocket` can't set. Three patterns for browser apps, in order of recommendation:

#### A. Proxy via your server (recommended for production)

Server holds the API key, browser never sees it. The SDK ships a one-line helper that turns the stream into Server-Sent Events:

```typescript
// app/api/transcribe-stream/route.ts (Next.js, Node runtime)
import {
  smallestai,
  createTranscriptionStreamSSEResponse,
} from 'smallestai-vercel-provider';

export const runtime = 'nodejs';

export async function POST(req: Request) {
  const audio = new Uint8Array(await req.arrayBuffer());
  const stream = smallestai.transcriptionStream('pulse', {
    language: 'en',
    encoding: 'linear16',
    sampleRate: 16000,
    wordTimestamps: true,
    itnNormalize: true,
  });
  await stream.connect();
  for (let i = 0; i < audio.length; i += 32 * 1024) {
    stream.sendAudio(audio.subarray(i, i + 32 * 1024));
  }
  stream.closeStream();
  return createTranscriptionStreamSSEResponse(stream, { signal: req.signal });
}
```

The browser parses the SSE response with the matching helper:

```typescript
import { parseTranscriptionStreamSSE } from 'smallestai-vercel-provider';

const res = await fetch('/api/transcribe-stream', { method: 'POST', body: audioBytes });
for await (const msg of parseTranscriptionStreamSSE(res)) {
  if (msg.is_final) console.log(msg.transcript);
  if (msg.is_last) break;
}
```

<Note>
  **Next.js setup, one-time** — only required for server-side `auth: 'header'` (the default, used by the SSE proxy above). Add this to `next.config.{js,mjs,ts}`:

  ```js
  /** @type {import('next').NextConfig} */
  const nextConfig = {
    serverExternalPackages: ['smallestai-vercel-provider', 'ws'],
  };
  export default nextConfig;
  ```

  And install the optional native deps so `ws` masks frames at native speed:

  ```bash
  npm install bufferutil utf-8-validate
  ```

  Browser-only consumers using `auth: 'query'` (option C) or `signedUrl` (option B) don't need this — the SDK lazy-loads `ws` only when the `Authorization` header path is reached, so browser bundles never pull in `ws` or its Node-only deps.
</Note>

#### B. Browser-native via signed URL (also production-grade)

Your server mints a short-lived URL on demand; the browser opens the WebSocket directly with that URL. Same security profile as (A) but with one less hop:

```typescript
// Browser code:
const stream = smallestai.transcriptionStream('pulse', {
  language: 'en',
  encoding: 'linear16',
  sampleRate: 16000,
}, {
  signedUrl: async () => {
    const res = await fetch('/api/get-stream-url');
    return (await res.json()).url; // wss://api.smallest.ai/...
  },
});
await stream.connect();
```

The SDK calls `signedUrl()` on every `connect()` and on every reconnect, so each session uses a fresh URL. Your server endpoint (`/api/get-stream-url`) decides how to scope and time-bound those URLs.
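How the URL gets minted is up to your backend — `mintSignedStreamUrl` below is a hypothetical placeholder for whatever signing mechanism you use, not an SDK export. The sketch shows the response shape the browser snippet expects (`{ url }`) plus a server-side mirror of the SDK's `wss:` / host guards:

```typescript
// app/api/get-stream-url/route.ts — sketch only.
// `mintSignedStreamUrl` is a hypothetical helper: substitute your own
// signing mechanism that produces a short-lived wss: URL.
declare function mintSignedStreamUrl(opts: { ttlSeconds: number }): Promise<string>;

export async function GET(req: Request) {
  // TODO: authenticate the caller (session cookie, JWT, ...) before minting.
  const url = await mintSignedStreamUrl({ ttlSeconds: 60 }); // short TTL — see Security notes

  // Belt-and-braces: mirror the SDK's client-side guard before responding.
  if (!isAcceptableStreamUrl(url)) {
    return new Response('bad signed url', { status: 500 });
  }
  return Response.json({ url });
}

// wss: scheme only, expected host only — same checks the SDK applies.
function isAcceptableStreamUrl(raw: string): boolean {
  try {
    const u = new URL(raw);
    return u.protocol === 'wss:' && u.hostname === 'api.smallest.ai';
  } catch {
    return false;
  }
}
```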

#### C. Browser-native with API key in URL (dev / internal apps only)

```typescript
const stream = smallestai.transcriptionStream('pulse', {
  language: 'en', encoding: 'linear16', sampleRate: 16000,
}, {
  apiKey: 'sk_...',
  auth: 'query',
});
```

<Warning>
  The API key appears in the WebSocket URL — visible in browser devtools, history, server access logs, and any error reporting tool that captures URLs. The SDK emits a one-time `console.warn` when this mode is used so it can't be deployed unnoticed. Use only for dev / internal apps with per-user-scoped keys; for end-user production, use option (A) or (B).
</Warning>

### Auto-reconnect on socket drops

Long-running sessions drop sockets for prosaic reasons (network blips, load-balancer recycling, idle timeouts). Pass `autoReconnect: true` and the SDK transparently re-opens with the same parameters and emits a synthetic `{ type: 'reconnected', attempt }` frame so consumers can react:

```typescript
const stream = smallestai.transcriptionStream('pulse', {
  language: 'en',
  encoding: 'linear16',
  sampleRate: 16000,
  autoReconnect: true,
  maxReconnectAttempts: 5,    // default 5
  reconnectBackoffMs: 500,    // exponential, capped at 30s
});

for await (const msg of stream) {
  if (msg.type === 'reconnected') {
    console.log(`recovered after ${msg.attempt} attempt(s)`);
    continue;
  }
  // ... normal transcript handling
}
```

Reconnect only fires on **unexpected** closes — `is_last`, an explicit `closeStream()`, and server-emitted error frames all terminate cleanly without retry.

`maxReconnectAttempts` counts **consecutive** failed attempts: the counter resets to zero after every successful reconnect, so a multi-hour stream that survives one blip per hour does not exhaust its retry budget across the whole session.

The optional 3rd argument to `transcriptionStream(modelId, options, config)` lets you override per-session connection config — the `autoReconnect` knobs above can also live there if you want them outside the WS-protocol options. The same slot accepts `auth: 'query'`, `signedUrl`, `signedUrlTimeoutMs`, `allowedSignedHosts`, and `suppressInsecureAuthWarning`.
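For intuition, the documented knobs imply a schedule like the following — an assumption about the shape (base delay doubling per consecutive failure, capped at 30 s), not the SDK's verbatim implementation:

```typescript
// Plausible backoff schedule matching the documented knobs
// (`reconnectBackoffMs` base, exponential growth, 30 s cap).
// This is an assumption, not the SDK's verbatim implementation.
function reconnectDelayMs(attempt: number, baseMs = 500): number {
  // attempt 1 → base, attempt 2 → 2×base, attempt 3 → 4×base, ...
  return Math.min(baseMs * 2 ** (attempt - 1), 30_000);
}
// With the defaults, five consecutive failures wait 500, 1000, 2000,
// 4000, and 8000 ms before the stream gives up.
```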

## Microphone capture (browser)

Live captions and voice agents need raw mic data on the wire. The SDK ships two browser-side hooks for this:

### `useMicrophoneTranscription` (high-level)

The all-in-one: captures the mic, streams chunks to your SSE proxy as a `ReadableStream` request body, exposes live transcript state.

```tsx
'use client';
import { useMicrophoneTranscription } from 'smallestai-vercel-provider/react';

export function LiveCaptions() {
  const {
    transcript, partial,
    isCapturing, isStreaming,
    chunksDelivered, chunksDropped,
    start, stop, reset,
  } = useMicrophoneTranscription({ apiPath: '/api/transcribe-mic-stream' });

  return (
    <>
      <button onClick={isCapturing ? stop : () => start()}>
        {isCapturing ? 'Stop' : 'Start'}
      </button>
      <p>{transcript}{partial && <em> {partial}</em>}</p>
      {chunksDropped > 0 && <small>⚠ {chunksDropped} chunks dropped (lagging)</small>}
    </>
  );
}
```

The hook captures via `getUserMedia` + `AudioWorklet`, downsamples to `linear16` @ 16 kHz mono, batches into \~100 ms chunks, and POSTs them as a streaming request body. Drop-oldest backpressure means a slow network never balloons memory.
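The server side of `apiPath` is the same SSE proxy as option (A); the only new piece is reading the streaming request body, which is a plain `ReadableStream` loop with no SDK involvement. A sketch — in the route, feed each chunk to `stream.sendAudio(chunk)` and call `stream.closeStream()` once the body ends:

```typescript
// Plain ReadableStream read loop for a streaming request body — the piece
// a `/api/transcribe-mic-stream` route needs beyond option (A)'s SSE proxy.
// In the route: await pumpBody(req.body!, (c) => stream.sendAudio(c));
// then stream.closeStream().
async function pumpBody(
  body: ReadableStream<Uint8Array>,
  onChunk: (chunk: Uint8Array) => void,
): Promise<void> {
  const reader = body.getReader();
  for (;;) {
    const { done, value } = await reader.read();
    if (done) return;
    if (value) onChunk(value);
  }
}
```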

### `useMicrophonePCM` (low-level)

If you want the raw mic stream and your own pipe (custom WS, WebRTC, etc.):

```ts
import { useMicrophonePCM } from 'smallestai-vercel-provider/react';

const { start, stop, isCapturing, chunksDropped } = useMicrophonePCM({
  sampleRate: 16000,
  batchMs: 100,
  maxQueuedChunks: 50,
  onChunk: (chunk) => myPipe.send(chunk),
});
```

The `AudioWorklet` processor is inlined as a Blob URL — no separate worklet file to host.

## Security notes for browser deployments

The SDK enforces these guards on the new browser-native paths so you can't accidentally ship insecure code:

| Guard                                        | What it blocks                                                                                                                                   |
| -------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------ |
| `signedUrl()` results must be `wss:`         | TLS-stripping attacks. `ws://localhost` only works when you opt in via `allowedSignedHosts: ['localhost']`.                                      |
| `signedUrl()` host must match `baseURL` host | A bug in your signing endpoint can't redirect audio to `attacker.com`. Add additional hosts via `allowedSignedHosts`.                            |
| `signedUrlTimeoutMs` (default 10 s)          | A hung signing endpoint fast-fails instead of stalling forever.                                                                                  |
| `auth: 'query'` console warning              | One-time warning makes URL-based auth visible in dev so it can't deploy unnoticed. Suppress via `suppressInsecureAuthWarning: true` after audit. |

What stays your job:

* **CSRF-protect your SSE proxy and `signedUrl` mint endpoints.**
* **Rate-limit the proxy** per-user — a malicious client can otherwise spam your route to burn API budget.
* **Pick short token TTLs** for `signedUrl` (60 s is plenty — it only needs to live long enough for the browser to open the WS).
* **Never include user-controlled hosts in `allowedSignedHosts`.**
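Rate limiting is deployment-specific, but for a single Node process even a tiny in-memory token bucket in front of the proxy route helps. A sketch only — `allowRequest` is an illustrative helper, not part of the SDK, and multi-instance deployments should swap in Redis or their gateway's limiter:

```typescript
// Minimal per-user token bucket — in-memory, single-process sketch only.
// Call allowRequest(userId) at the top of the proxy route; return 429 on false.
type Bucket = { tokens: number; last: number };
const buckets = new Map<string, Bucket>();

function allowRequest(
  userId: string,
  maxPerMinute = 10,
  now = Date.now(),
): boolean {
  const b = buckets.get(userId) ?? { tokens: maxPerMinute, last: now };
  // Refill proportionally to elapsed time, capped at the bucket size.
  b.tokens = Math.min(
    maxPerMinute,
    b.tokens + ((now - b.last) / 60_000) * maxPerMinute,
  );
  b.last = now;
  const allowed = b.tokens >= 1;
  if (allowed) b.tokens -= 1;
  buckets.set(userId, b);
  return allowed;
}
```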

## Available Voices

80+ voices across multiple languages. Popular voices:

| Voice     | Gender | Accent        | Best For                |
| --------- | ------ | ------------- | ----------------------- |
| `sophia`  | Female | American      | General use (default)   |
| `robert`  | Male   | American      | Professional            |
| `advika`  | Female | Indian        | Hindi, code-switching   |
| `vivaan`  | Male   | Indian        | Bilingual English/Hindi |
| `camilla` | Female | Mexican/Latin | Spanish                 |

Fetch the full voice list programmatically:

```bash
curl -s "https://api.smallest.ai/waves/v1/lightning-v3.1/get_voices" \
  -H "Authorization: Bearer $SMALLEST_API_KEY"
```

## Links

<CardGroup cols={2}>
  <Card title="npm Package" icon="npm" href="https://www.npmjs.com/package/smallestai-vercel-provider">
    Install from npm
  </Card>

  <Card title="GitHub" icon="github" href="https://github.com/smallest-inc/smallest-ai-vercel-provider">
    Source code
  </Card>

  <Card title="Runnable Examples" icon="play" href="https://github.com/smallest-inc/smallest-ai-vercel-provider/tree/main/examples">
    TTS, batch + streaming STT, voice cloning
  </Card>

  <Card title="Vercel AI SDK" icon="triangle" href="https://ai-sdk.dev">
    AI SDK documentation
  </Card>
</CardGroup>