Unified Speech-to-Text endpoint, Pulse Pro model

The Speech-to-Text API now lives at the unified path /waves/v1/stt/, mirroring the structure of the unified TTS API. The model is selected with the ?model= query parameter. Two models are live today:

  • ?model=pulse: multilingual (17 streaming + 26 pre-recorded languages), HTTP + WebSocket streaming.
  • ?model=pulse-pro: leaderboard-ranked English STT (5.42% average WER on the ESB benchmark, tied for #2 on the public Open ASR Leaderboard). HTTP only.
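Because both models share one path, a client only needs to vary the ?model= value and pick a transport the model supports. The sketch below builds the URL and refuses combinations the list above rules out (Pulse Pro is HTTP only). The host names are hypothetical placeholders, since this note gives paths only, and the /live streaming route is taken from the Pulse Pro streaming note in this changelog.

```python
from urllib.parse import urlencode

# Hypothetical base URLs: the release note gives paths only, not a host.
HTTP_BASE = "https://api.example.com"
WS_BASE = "wss://api.example.com"

# Transports each model supports, per the release note.
SUPPORTED = {
    "pulse": {"http", "ws"},
    "pulse-pro": {"http"},  # HTTP only for now
}


def stt_url(model: str, transport: str = "http") -> str:
    """Build a unified STT URL, selecting the model with ?model=."""
    if transport not in SUPPORTED.get(model, set()):
        raise ValueError(f"{model!r} is not available over {transport!r}")
    query = urlencode({"model": model})
    if transport == "ws":
        # Streaming goes through the /live route under the same prefix.
        return f"{WS_BASE}/waves/v1/stt/live?{query}"
    return f"{HTTP_BASE}/waves/v1/stt/?{query}"
```

For example, stt_url("pulse-pro") yields the HTTP URL with ?model=pulse-pro, while stt_url("pulse-pro", "ws") raises locally instead of spending a request on a guaranteed rejection.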

Pulse Pro is not yet available for streaming: a request to WS /waves/v1/stt/live?model=pulse-pro is rejected with HTTP 400 before the WebSocket upgrade, because the streaming worker has not been deployed yet. Use the HTTP endpoint instead, and pass webhook_url for long files.
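Since the streaming request fails before the upgrade, a client can detect the case locally and route Pulse Pro to HTTP from the start. A minimal sketch, assuming the HTTP endpoint is POST /waves/v1/stt/ and, as an unconfirmed assumption, that webhook_url is sent as a query parameter (it may belong in the request body instead; check the API reference). SttRequest and plan_request are hypothetical helper names.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlencode

# Models with a deployed streaming worker, per the release note.
STREAMING_MODELS = {"pulse"}


@dataclass
class SttRequest:
    method: str  # "WS" for a WebSocket upgrade, "POST" for HTTP
    path: str


def plan_request(model: str, want_streaming: bool,
                 webhook_url: Optional[str] = None) -> SttRequest:
    """Choose a transport that avoids the 400 for pulse-pro over WS."""
    if want_streaming and model in STREAMING_MODELS:
        return SttRequest("WS", "/waves/v1/stt/live?" + urlencode({"model": model}))
    # Fall back to HTTP. Sending webhook_url as a query parameter is an
    # ASSUMPTION made for illustration only.
    params = {"model": model}
    if webhook_url is not None:
        params["webhook_url"] = webhook_url
    return SttRequest("POST", "/waves/v1/stt/?" + urlencode(params))
```

Once a Pulse Pro streaming worker ships, adding "pulse-pro" to STREAMING_MODELS is the only change this sketch would need.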

Customer pricing (Standard plan):

  • Pulse, streaming (WebSocket): $0.006 / minute
  • Pulse, non-streaming (HTTP): $0.0035 / minute
  • Pulse Pro, non-streaming (HTTP): $0.004 / minute
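The per-minute rates above make cost estimates a single multiplication. The sketch below assumes pro-rata billing on exact audio duration; the note does not say whether usage is rounded (for example, up to the next second or minute), so treat the result as an estimate. Decimal avoids floating-point drift on sub-cent prices.

```python
from decimal import Decimal

# Standard-plan per-minute prices from the table above, keyed by (model, transport).
PRICE_PER_MINUTE = {
    ("pulse", "ws"): Decimal("0.006"),
    ("pulse", "http"): Decimal("0.0035"),
    ("pulse-pro", "http"): Decimal("0.004"),
}


def estimate_cost(model: str, transport: str, seconds: float) -> Decimal:
    """Estimate Standard-plan cost in USD, assuming pro-rata billing."""
    rate = PRICE_PER_MINUTE[(model, transport)]  # KeyError for unpriced combos
    minutes = Decimal(str(seconds)) / Decimal(60)
    return rate * minutes
```

For instance, one hour of pre-recorded audio on Pulse over HTTP is 60 × $0.0035 = $0.21, and the same hour streamed over WebSocket is 60 × $0.006 = $0.36.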

Standard plan rate limits default to 25 requests per minute (RPM) per model and 100 concurrent WebSocket sessions. On Enterprise, limits are uncapped and configurable per customer.
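To stay under the Standard plan's 25 RPM per model, a client can throttle itself before the server does. This is a sketch of a sliding-window limiter, one instance per model; the note does not say whether the server counts requests in a sliding or fixed window, so the client-side window here is an assumption. RpmLimiter is a hypothetical name, and the clock is injectable so the behavior can be tested without sleeping.

```python
import time
from collections import deque
from typing import Callable, Deque


class RpmLimiter:
    """Client-side sliding-window limiter: at most `limit` requests per window.

    Defaults to the Standard plan's 25 RPM; keep one instance per model.
    """

    def __init__(self, limit: int = 25, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.limit = limit
        self.window = window
        self.clock = clock
        self._sent: Deque[float] = deque()

    def try_acquire(self) -> bool:
        """Record a request and return True if it fits in the current window."""
        now = self.clock()
        # Drop timestamps that have aged out of the window.
        while self._sent and now - self._sent[0] >= self.window:
            self._sent.popleft()
        if len(self._sent) < self.limit:
            self._sent.append(now)
            return True
        return False
```

The 100-session WebSocket cap is a concurrency limit rather than a rate, so it is better enforced separately, for example by holding an asyncio.Semaphore(100) for the lifetime of each streaming session.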

The existing endpoints (POST /waves/v1/pulse/get_text and WS /waves/v1/pulse/get_text) continue to work alongside the new unified path. New integrations are encouraged to use /waves/v1/stt/ since it carries both models behind one path.
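Moving an existing integration is mostly a route swap. The mapping below assumes the legacy get_text routes correspond to Pulse (hence ?model=pulse by default) and that the HTTP and WebSocket routes map to /waves/v1/stt/ and /waves/v1/stt/live respectively; other query parameters and request bodies are not covered here and should be checked against the API reference. migrate is a hypothetical helper.

```python
from urllib.parse import urlencode

# Legacy Pulse routes and their assumed unified equivalents.
LEGACY_TO_UNIFIED = {
    ("POST", "/waves/v1/pulse/get_text"): ("POST", "/waves/v1/stt/"),
    ("WS", "/waves/v1/pulse/get_text"): ("WS", "/waves/v1/stt/live"),
}


def migrate(method: str, path: str, model: str = "pulse") -> str:
    """Return the unified-path equivalent of a legacy STT route."""
    new_method, new_path = LEGACY_TO_UNIFIED[(method, path)]
    return f"{new_method} {new_path}?{urlencode({'model': model})}"
```

Because the legacy routes keep working, the swap can be rolled out gradually, one call site at a time.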