Vercel AI SDK
Use Smallest AI as a speech and transcription provider in the Vercel AI SDK. Generate speech and transcribe audio with a few lines of code. The package also exposes streaming WebSocket transcription and voice cloning APIs that sit alongside the Vercel SpeechModelV2 / TranscriptionModelV2 interfaces.
Latest: smallestai-vercel-provider@0.6.2 — adds browser-native streaming (no proxy required), microphone capture hooks, auto-reconnect on socket drops (with counter reset across the session, so multi-hour streams survive sporadic blips), and a security-validated signedUrl flow for production browser apps. The SDK lazy-loads its ws dependency so browser-only consumers (using auth: 'query' or signedUrl) ship a smaller bundle and don’t need the bufferutil / serverExternalPackages setup that older versions required.
Installation
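Assuming the package name shown above (with the ai package as a peer dependency):

```bash
npm install smallestai-vercel-provider ai
```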
Setup
Get your API key from waves.smallest.ai and set it as an environment variable:
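For example, in .env.local (the variable name SMALLEST_API_KEY is an assumption; match whatever name the provider factory reads, or pass the key to the factory explicitly):

```bash
SMALLEST_API_KEY=your-api-key-here
```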
Text-to-Speech
Supported model: lightning-v3.1 — 44.1 kHz, natural expressive speech, voice cloning, ~100 ms latency, 22 languages plus auto for code-switching detection (see the Lightning v3.1 model card for the full list). The package also exports DEFAULT_LIGHTNING_MODEL so you don’t have to hard-code the id; bumping it on a new Lightning release flows through to every caller that imports the constant.
Pass outputFormat under providerOptions.smallestai.outputFormat, not as Vercel’s top-level outputFormat arg — the SDK rejects the top-level form with a warning. See TTS Options below.
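A minimal sketch using the AI SDK's experimental_generateSpeech. The smallestai provider import and the speech() factory shape are assumptions based on this page; outputFormat goes under providerOptions.smallestai as noted above:

```ts
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { smallestai, DEFAULT_LIGHTNING_MODEL } from 'smallestai-vercel-provider';

const { audio } = await generateSpeech({
  model: smallestai.speech(DEFAULT_LIGHTNING_MODEL), // currently lightning-v3.1
  text: 'Hello from Smallest AI!',
  providerOptions: {
    smallestai: {
      outputFormat: 'mp3', // here, not as the top-level outputFormat arg
    },
  },
});

// audio holds the generated bytes (e.g. audio.uint8Array) plus content-type metadata
```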
Speech-to-Text (batch)
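A minimal batch sketch using the AI SDK's experimental_transcribe; the transcription() factory and the 'pulse' model id are assumptions:

```ts
import { readFile } from 'node:fs/promises';
import { experimental_transcribe as transcribe } from 'ai';
import { smallestai } from 'smallestai-vercel-provider';

const result = await transcribe({
  model: smallestai.transcription('pulse'), // model id assumed; check the provider's model list
  audio: await readFile('./meeting.wav'),
});

console.log(result.text);
```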
Next.js API Route Example
Create a TTS endpoint in your Next.js app:
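A sketch of a route handler, reusing the assumed provider import from above:

```ts
// app/api/tts/route.ts
import { experimental_generateSpeech as generateSpeech } from 'ai';
import { smallestai, DEFAULT_LIGHTNING_MODEL } from 'smallestai-vercel-provider';

export async function POST(req: Request) {
  const { text } = await req.json();

  const { audio } = await generateSpeech({
    model: smallestai.speech(DEFAULT_LIGHTNING_MODEL),
    text,
    providerOptions: { smallestai: { outputFormat: 'mp3' } },
  });

  return new Response(audio.uint8Array, {
    headers: { 'Content-Type': 'audio/mpeg' }, // matches the requested mp3 outputFormat
  });
}
```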
Play it in the browser:
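Plain browser APIs, nothing provider-specific:

```ts
const res = await fetch('/api/tts', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ text: 'Hello from the browser!' }),
});

const url = URL.createObjectURL(await res.blob());
await new Audio(url).play();
```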
Provider Options
TTS Options
Batch STT Options
ageDetection was removed from the API and emits a deprecation warning if set.
itnNormalize, sentenceTimestamps, finalizeOnWords, maxWords, eouTimeoutMs are accepted only on the streaming WebSocket — TS will error if you set them on transcribe(). Use smallestai.transcriptionStream(...) (below) for those.
Streaming Speech-to-Text (WebSocket)
For real-time transcription (TTFT ~64 ms server-side), the SDK exposes a WebSocket session that wraps wss://api.smallest.ai/waves/v1/pulse/get_text with the canonical Authorization: Bearer flow. WS-only flags like itnNormalize, sentenceTimestamps, finalizeOnWords, maxWords, and eouTimeoutMs only take effect on this path.
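A server-side sketch. transcriptionStream, connect(), and closeStream() are named on this page; the frame-event subscription, sendAudio(), the 'pulse' model id, and the text field on frames are assumptions:

```ts
import { createReadStream } from 'node:fs';
import { smallestai } from 'smallestai-vercel-provider';

// WS-only flags (itnNormalize, sentenceTimestamps, eouTimeoutMs, ...) are accepted here
const session = smallestai.transcriptionStream('pulse', {
  itnNormalize: true,
  sentenceTimestamps: true,
  eouTimeoutMs: 800,
});

session.on('frame', (frame) => {
  if (frame.is_final) console.log('final:', frame.text);
  if (frame.is_last) console.log('stream finished');
});

await session.connect();

// Feed PCM chunks as they arrive (e.g. linear16 @ 16 kHz mono)
for await (const chunk of createReadStream('./audio.raw')) {
  session.sendAudio(chunk as Buffer);
}

// Signal end of audio; the server replies with a final is_last frame
session.closeStream();
```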
One-shot helper for pre-recorded audio
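A sketch of the helper named at the end of this section; the exact call signature and return shape are assumptions:

```ts
import { readFile } from 'node:fs/promises';
import { smallestai } from 'smallestai-vercel-provider';

// Accepts the same WS-only options as transcriptionStream (signature assumed)
const transcript = await smallestai.transcribeOnce('pulse', {
  audio: await readFile('./podcast.wav'),
  itnNormalize: true,
});

// Returns the is_final frames already concatenated for you
console.log(transcript);
```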
Voice Cloning
The provider exposes the voice-cloning REST endpoints alongside TTS / STT. Defaults to lightning-v3.1; the legacy lightning-v2 model is rejected upstream.
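A sketch of the cloning flow. Only the endpoints' existence and the lightning-v3.1 default are stated above; the cloneVoice method name and its argument shape are hypothetical:

```ts
import { readFile } from 'node:fs/promises';
import { smallestai } from 'smallestai-vercel-provider';

// Hypothetical helper name and shape; consult the provider's voice-cloning reference
const voice = await smallestai.cloneVoice({
  name: 'support-agent',
  sample: await readFile('./reference.wav'), // a clean reference recording
  model: 'lightning-v3.1',                   // the legacy lightning-v2 id is rejected upstream
});

console.log(voice.id); // use this voice id in subsequent TTS calls
```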
Patterns & Caveats
Accumulate full_transcript client-side
The streaming API accepts fullTranscript: true, but the server-side full_transcript field is currently returned as an empty string on every frame. Concatenate is_final: true frames yourself instead:
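A sketch, reusing the session and frame shape from the streaming example above (the event API and the text field name are assumptions):

```ts
let fullTranscript = '';

session.on('frame', (frame) => {
  // full_transcript arrives empty on every frame, so accumulate finals yourself
  if (frame.is_final && frame.text) {
    fullTranscript += (fullTranscript ? ' ' : '') + frame.text;
  }
  if (frame.is_last) {
    console.log('complete transcript:', fullTranscript);
  }
});
```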
The transcribeOnce() helper does this for you — use it for the pre-recorded case.
Browser streaming — three options
The default transcriptionStream() uses an Authorization: Bearer header that native browser WebSocket can’t set. Three patterns for browser apps, in order of recommendation:
A. Proxy via your server (recommended for production)
The server holds the API key; the browser never sees it. The SDK ships a one-line helper that turns the stream into Server-Sent Events:
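A route-handler sketch. The page only states that a one-line SSE helper exists; the transcriptionStreamToSSE name and its signature here are hypothetical:

```ts
// app/api/transcribe-stream/route.ts
import { smallestai, transcriptionStreamToSSE } from 'smallestai-vercel-provider'; // helper name hypothetical

export async function POST(req: Request) {
  // The browser POSTs audio chunks as a streaming request body (see the mic hooks below)
  const session = smallestai.transcriptionStream('pulse', { itnNormalize: true });

  // Pipe incoming audio into the WS session and return a text/event-stream Response
  return transcriptionStreamToSSE(session, req.body!);
}
```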
The browser parses the SSE response with the matching helper:
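A client-side sketch; the parseTranscriptionSSE name is hypothetical, and duplex: 'half' is the standard fetch requirement for streaming request bodies:

```ts
import { parseTranscriptionSSE } from 'smallestai-vercel-provider'; // helper name hypothetical

declare const micChunks: ReadableStream<Uint8Array>; // produced by the mic hooks below

const res = await fetch('/api/transcribe-stream', {
  method: 'POST',
  body: micChunks,
  duplex: 'half', // required by fetch when the request body is a stream
} as RequestInit & { duplex: 'half' });

for await (const frame of parseTranscriptionSSE(res)) {
  if (frame.is_final) console.log(frame.text);
}
```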
Next.js setup, one-time — only required for server-side auth: 'header' (the default, used by the SSE proxy above). Add this to next.config.{js,mjs,ts}:
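Assuming the setup referenced above means externalizing ws and its native companions (the exact package list may differ):

```ts
// next.config.ts
import type { NextConfig } from 'next';

const nextConfig: NextConfig = {
  // Keep ws and its optional native addons out of Next.js bundling
  serverExternalPackages: ['ws', 'bufferutil', 'utf-8-validate'],
};

export default nextConfig;
```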
And install the optional native deps so ws masks frames at native speed:
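bufferutil is named above; utf-8-validate is ws's other optional native addon and is usually installed alongside it:

```bash
npm install bufferutil utf-8-validate
```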
Browser-only consumers using auth: 'query' (option C) or signedUrl (option B) don’t need this — the SDK lazy-loads ws only when the Authorization header path is reached, so browser bundles never pull in ws or its Node-only deps.
B. Browser-native via signed URL (also production-grade)
Your server mints a short-lived URL on demand; the browser opens the WebSocket directly with that URL. Same security profile as (A) but with one less hop:
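A client-side sketch using the connection-config slot described later in this section; the mint endpoint's response shape ({ url }) is an assumption:

```ts
import { smallestai } from 'smallestai-vercel-provider';

const session = smallestai.transcriptionStream(
  'pulse',
  { sentenceTimestamps: true },
  {
    // Called on every connect() and on every reconnect, so each session gets a fresh URL
    signedUrl: async () => {
      const res = await fetch('/api/get-stream-url', { method: 'POST' });
      const { url } = await res.json(); // response shape assumed
      return url as string;
    },
    allowedSignedHosts: ['api.smallest.ai'],
  },
);

await session.connect();
```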
The SDK calls signedUrl() on every connect() and on every reconnect, so each session uses a fresh URL. Your server endpoint (/api/get-stream-url) decides how to scope and time-bound those URLs.
C. Browser-native with API key in URL (dev / internal apps only)
The API key appears in the WebSocket URL — visible in browser devtools, history, server access logs, and any error reporting tool that captures URLs. The SDK emits a one-time console.warn when this mode is used so it can’t be deployed unnoticed. Use only for dev / internal apps with per-user-scoped keys; for end-user production, use option (A) or (B).
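A browser-only sketch of the query-auth mode; the createSmallestAI factory name is an assumption:

```ts
import { createSmallestAI } from 'smallestai-vercel-provider'; // factory name assumed

declare const DEV_SCOPED_KEY: string; // a per-user, dev-scoped key, never a production secret

const smallestai = createSmallestAI({ apiKey: DEV_SCOPED_KEY });

const session = smallestai.transcriptionStream(
  'pulse',
  { itnNormalize: true },
  {
    auth: 'query', // puts the key in the WS URL; triggers the one-time console.warn
    // suppressInsecureAuthWarning: true, // only if you have accepted the exposure
  },
);

await session.connect();
```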
Auto-reconnect on socket drops
Long-running sessions drop sockets for prosaic reasons (network blips, load-balancer recycling, idle timeouts). Pass autoReconnect: true and the SDK transparently re-opens with the same parameters and emits a synthetic { type: 'reconnected', attempt } frame so consumers can react:
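A sketch of the reconnect flow (frame-handling API assumed as before):

```ts
const session = smallestai.transcriptionStream('pulse', {
  autoReconnect: true,
  maxReconnectAttempts: 5, // consecutive failures; resets after every successful reconnect
});

session.on('frame', (frame) => {
  if (frame.type === 'reconnected') {
    console.log(`socket re-opened (attempt ${frame.attempt})`);
    return;
  }
  if (frame.is_final) console.log(frame.text);
});

await session.connect();
```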
Reconnect only fires on unexpected closes — is_last, an explicit closeStream(), and server-emitted error frames all terminate cleanly without retry.
maxReconnectAttempts counts consecutive failed attempts: the counter resets to zero after every successful reconnect, so a multi-hour stream that survives one blip per hour does not exhaust its retry budget across the whole session.
The optional 3rd argument to transcriptionStream(modelId, options, config) lets you override per-session connection config — the autoReconnect knobs above can also live there if you want them outside the WS-protocol options. The same slot accepts auth: 'query', signedUrl, signedUrlTimeoutMs, allowedSignedHosts, and suppressInsecureAuthWarning.
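For example, moving the reconnect knobs into the connection-config slot instead of the WS options:

```ts
const session = smallestai.transcriptionStream(
  'pulse',                 // modelId
  { itnNormalize: true },  // WS-protocol options
  {                        // per-session connection config
    auth: 'header',
    autoReconnect: true,
    maxReconnectAttempts: 5,
  },
);
```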
Microphone capture (browser)
Live captions and voice agents need raw mic data on the wire. The SDK ships two browser-side hooks for this:
useMicrophoneTranscription (high-level)
The all-in-one: captures the mic, streams chunks to your SSE proxy as a ReadableStream request body, exposes live transcript state.
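A React sketch; the import path, the hook's return shape (transcript, isRecording, start, stop), and its option names are assumptions:

```tsx
'use client';
import { useMicrophoneTranscription } from 'smallestai-vercel-provider'; // subpath may differ (e.g. a /react export)

export function LiveCaptions() {
  const { transcript, isRecording, start, stop } = useMicrophoneTranscription({
    endpoint: '/api/transcribe-stream', // your SSE proxy route from option (A)
  });

  return (
    <div>
      <button onClick={isRecording ? stop : start}>
        {isRecording ? 'Stop' : 'Start'} captions
      </button>
      <p>{transcript}</p>
    </div>
  );
}
```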
The hook captures via getUserMedia + AudioWorklet, downsamples to linear16 @ 16 kHz mono, batches into ~100 ms chunks, and POSTs them as a streaming request body. Drop-oldest backpressure means a slow network never balloons memory.
useMicrophonePCM (low-level)
If you want the raw mic stream and your own pipe (custom WS, WebRTC, etc.):
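A React sketch; the hook's option and return names are assumptions:

```tsx
'use client';
import { useEffect } from 'react';
import { useMicrophonePCM } from 'smallestai-vercel-provider'; // subpath may differ

declare const myWebSocket: WebSocket; // whatever transport you are piping into

export function RawMicPipe() {
  const { start, stop } = useMicrophonePCM({
    // linear16 @ 16 kHz mono chunks; forward them over your own transport
    onChunk: (pcm: Int16Array) => myWebSocket.send(pcm.buffer),
  });

  useEffect(() => {
    start();
    return stop;
  }, [start, stop]);

  return <p>Streaming raw PCM from the mic</p>;
}
```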
The AudioWorklet processor is inlined as a Blob URL — no separate worklet file to host.
Security notes for browser deployments
The SDK enforces guards on the new browser-native paths (host allow-listing of signed URLs via allowedSignedHosts, and the one-time insecure-auth warning for query mode) so you can't accidentally ship insecure code.
What stays your job:
- CSRF-protect your SSE proxy and signedUrl mint endpoints.
- Rate-limit the proxy per-user — a malicious client can otherwise spam your route to burn API budget.
- Pick short token TTLs for signedUrl (60 s is plenty — it only needs to live long enough for the browser to open the WS).
- Never include user-controlled hosts in allowedSignedHosts.
Available Voices
More than 80 voices are available across multiple languages.
Fetch the full voice list programmatically:
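A sketch; the voices() method name and the fields on the returned entries are hypothetical, so check the provider reference for the real call:

```ts
import { smallestai } from 'smallestai-vercel-provider';

// Hypothetical helper; field names on the returned entries are also assumed
const voices = await smallestai.voices();

for (const v of voices) {
  console.log(v.id, v.language);
}
```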

