OpenClaw

Add voice capabilities to your OpenClaw agent. Generate speech with sub-100ms latency and transcribe audio with the Smallest AI skill.

Installation

# Via ClawHub (recommended)
$ clawhub install smallest-ai

# Or manually
$ git clone https://github.com/smallest-inc/smallest-ai-openclaw.git
$ cp -r smallest-ai-openclaw ~/.openclaw/skills/smallest-ai
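
After a manual install, a quick check that the copy landed where the gateway expects it can save a confusing restart. The following is a minimal sketch: `skill_installed` is a hypothetical helper, not part of OpenClaw or ClawHub, and it only verifies that the target directory exists and is not empty. It assumes nothing about the files inside the skill.

```shell
# Hypothetical helper (not an OpenClaw command): succeeds when the skill
# directory exists and contains at least one entry.
# The default path is the one used in the manual install step; pass a
# different directory as the first argument to override it.
skill_installed() {
  si_dir="${1:-$HOME/.openclaw/skills/smallest-ai}"
  if [ -d "$si_dir" ] && [ -n "$(ls -A "$si_dir" 2>/dev/null)" ]; then
    echo "skill present at $si_dir"
  else
    echo "skill missing or empty at $si_dir" >&2
    return 1
  fi
}
```

Running `skill_installed` with no arguments checks the default location. A non-zero exit status suggests the `cp -r` step did not complete.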

Setup

Set your API key:

$ export SMALLEST_API_KEY="your_key_here"

Get a free key at waves.smallest.ai.
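
Because `export` only affects the current shell session, it can help to confirm the variable is actually set, and no longer the placeholder, before restarting the gateway. This is a sketch under that assumption: `check_smallest_key` is a hypothetical helper, and it does not contact the Smallest AI service or validate the key itself.

```shell
# Hypothetical helper: verifies SMALLEST_API_KEY is set in this shell and
# is not the "your_key_here" placeholder from the setup step.
# It does NOT check that the key is valid with the service.
check_smallest_key() {
  if [ -z "${SMALLEST_API_KEY:-}" ]; then
    echo "SMALLEST_API_KEY is not set" >&2
    return 1
  fi
  if [ "$SMALLEST_API_KEY" = "your_key_here" ]; then
    echo "SMALLEST_API_KEY still holds the placeholder value" >&2
    return 1
  fi
  echo "SMALLEST_API_KEY is set"
}

# Possible usage: only restart the gateway when the check passes.
#   check_smallest_key && openclaw gateway stop && openclaw gateway start
```

To keep the key across sessions, the same `export` line can be added to your shell profile, for example `~/.bashrc` or `~/.zshrc`.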

Restart the gateway:

$ openclaw gateway stop && openclaw gateway start

Usage

The skill triggers automatically when you ask your agent to generate speech or transcribe audio. Just talk naturally:

Text-to-Speech:

  • “Say good morning in a male voice”
  • “Read this aloud: The meeting is at 3pm”
  • “Generate a voice note saying hello in Hindi”

Speech-to-Text:

  • “Transcribe this audio file”
  • “What did they say in this recording?”

Multilingual:

  • “Say ‘namaste, kaise hain aap’ in advika’s voice”
  • “Say ‘hola buenos dias’ using camilla”

Voices

The skill auto-selects voices based on your request:

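| Voice   | Gender | Accent        | Best for                    |
|---------|--------|---------------|-----------------------------|
| sophia  | Female | American      | General use (default)       |
| robert  | Male   | American      | Professional (default male) |
| advika  | Female | Indian        | Hindi, code-switching       |
| vivaan  | Male   | Indian        | Bilingual English/Hindi     |
| camilla | Female | Mexican/Latin | Spanish                     |
| ella    | Female | American      | Conversational              |
| mia     | Female | American      | Storytelling                |
| arjun   | Male   | Indian        | English/Hindi bilingual     |
| vanessa | Female | American      | Expressive, warm            |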

More than 80 additional voices are available. The agent picks a voice based on the language and gender preference in your request.
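
To make the selection rule concrete, here is an illustrative sketch of a language-and-gender lookup built only from the voices listed above. `pick_voice` is a hypothetical function, not the skill's actual selection logic, which is not documented here. Combinations that the table does not cover fall back to the documented defaults (`sophia`, or `robert` for male).

```shell
# Illustrative only: maps (language, gender) to one of the listed voices.
# The real skill may weigh other signals; this mirrors the table, nothing more.
pick_voice() {
  pv_lang="$(printf '%s' "${1:-english}" | tr '[:upper:]' '[:lower:]')"
  pv_gender="$(printf '%s' "${2:-female}" | tr '[:upper:]' '[:lower:]')"
  case "$pv_lang:$pv_gender" in
    hindi:male)     echo "vivaan" ;;   # arjun would be an equally valid pick
    hindi:*)        echo "advika" ;;
    spanish:female) echo "camilla" ;;
    *:male)         echo "robert" ;;   # documented default male voice
    *)              echo "sophia" ;;   # documented default voice
  esac
}
```

For example, `pick_voice hindi female` selects `advika`, matching the multilingual prompt shown in the Usage section.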

Features

  • Text-to-speech with sub-100ms latency via Lightning v3.1
  • 64ms speech-to-text via Pulse
  • Supports WAV, MP3, OGG, FLAC, M4A, and WebM audio formats (STT)
  • 30+ languages with automatic language detection
  • Speaker diarization and emotion detection (STT)
  • Hindi-English code-switching
  • Voice cloning: clone any voice from just 5 seconds of audio (Basic plan and above)
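
Since speech-to-text accepts a fixed set of audio formats, a pre-flight check on the file extension can catch unsupported files before they reach the agent. The sketch below uses a hypothetical helper, `is_supported_audio`, and looks only at the file name. It does not inspect the actual container or codec, so treat it as a convenience check rather than a guarantee.

```shell
# Hypothetical helper: returns 0 when the file name ends in one of the
# STT formats listed above (WAV, MP3, OGG, FLAC, M4A, WebM), 1 otherwise.
# Extension-based only; the file contents are not examined.
is_supported_audio() {
  case "$1" in
    *.*) ;;            # has an extension, keep going
    *)   return 1 ;;   # no extension at all
  esac
  sa_ext="$(printf '%s' "${1##*.}" | tr '[:upper:]' '[:lower:]')"
  case "$sa_ext" in
    wav|mp3|ogg|flac|m4a|webm) return 0 ;;
    *) return 1 ;;
  esac
}

# Possible usage:
#   is_supported_audio "meeting.m4a" || echo "convert this file first" >&2
```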