OpenClaw
Add voice capabilities to your OpenClaw agent. Generate speech with sub-100ms latency and transcribe audio with the Smallest AI skill.
Installation
Setup
Set your API key:
Get a free key at waves.smallest.ai.
Restart the gateway:
Usage
The skill triggers automatically when you ask your agent to generate speech or transcribe audio. Just talk naturally:
Text-to-Speech:
- “Say good morning in a male voice”
- “Read this aloud: The meeting is at 3pm”
- “Generate a voice note saying hello in Hindi”
Speech-to-Text:
- “Transcribe this audio file”
- “What did they say in this recording?”
Multilingual:
- “Say ‘namaste, kaise hain aap’ in advika’s voice”
- “Say ‘hola buenos dias’ using camilla”
Voices
The skill auto-selects voices based on your request:
More than 80 additional voices are available. The agent picks a voice that matches the language and gender preference in your request.
Features
- Sub-100ms text-to-speech via Lightning v3.1
- 64ms speech-to-text via Pulse
- Supports WAV, MP3, OGG, FLAC, M4A, and WebM audio formats (STT)
- 30+ languages with automatic language detection
- Speaker diarization and emotion detection (STT)
- Hindi-English code-switching
- Voice cloning: clone any voice with just 5 seconds of audio (Basic plan+)
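If you want to check a file before asking the agent to transcribe it, a small local pre-check along these lines may help. This is only a sketch and not part of the skill: the function name `is_supported_stt_file` is hypothetical, and the mapping from the speech-to-text formats listed above to file extensions (for example `.webm` for WebM) is an assumption.

```python
from pathlib import Path

# Extensions derived from the STT formats in the feature list
# (WAV, MP3, OGG, FLAC, M4A, WebM). The exact suffixes are an assumption.
SUPPORTED_STT_EXTENSIONS = {".wav", ".mp3", ".ogg", ".flac", ".m4a", ".webm"}


def is_supported_stt_file(path: str) -> bool:
    """Return True if the file's suffix matches one of the listed STT formats.

    Hypothetical helper: it inspects only the file name, not the audio data.
    """
    return Path(path).suffix.lower() in SUPPORTED_STT_EXTENSIONS
```

The check is case-insensitive and looks only at the final suffix, so `call.WAV` passes while `notes.txt` does not. It cannot tell whether the file actually contains valid audio.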

