Cloud Deployment
This page lists the recommended cloud instance types for self-hosting Pulse and Pulse Pro. L4 is the recommended GPU class for speech-to-text (STT) on every cloud where it is generally available; larger GPUs (L40S, A100, H100) deliver higher throughput when needed. The hardware requirements page lists every supported GPU class.
Pricing varies by region and commitment term (on-demand, 1-year reserved, 3-year reserved, spot), and GPU quota varies by account. Confirm the current rate and your quota in your own account before sizing a deployment.
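To compare commitment terms before sizing, the arithmetic can be sketched as below. This is a rough estimate only: the hourly rate is a placeholder rather than a real price, the 30% and 60% discounts are simply the two ends of the range quoted under "Picking between on-demand, reserved, and spot", and `monthly_cost` is a hypothetical helper name.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)


def monthly_cost(hourly_rate: float, discount: float = 0.0,
                 hours: float = HOURS_PER_MONTH) -> float:
    """Estimate the monthly cost of one instance at a discounted hourly rate."""
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in [0, 1)")
    return hourly_rate * hours * (1.0 - discount)


# Placeholder rate, NOT a real price: substitute the on-demand rate
# shown in your own cloud account for your region.
on_demand_rate = 1.00

for label, discount in [
    ("on-demand", 0.0),
    ("reserved (30% off)", 0.30),
    ("reserved (60% off)", 0.60),
]:
    print(f"{label:>20}: {monthly_cost(on_demand_rate, discount):,.2f} per month")
```

The estimate assumes the instance runs all month; for bursty workloads, pass the expected number of running hours as `hours`.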
AWS
Region availability: The L4 (G6), L40S (G6e), and T4 (G4dn) families are available in major regions (us-east-1, us-west-2, eu-west-1, ap-south-1). Check the EC2 instance availability matrix for your target region before deployment.
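The availability check can also be scripted. The sketch below assumes you use boto3 and calls the EC2 `DescribeInstanceTypeOfferings` API; the helper name `zones_offering` and the example size `g6.xlarge` are illustrative choices, not recommendations from this page.

```python
from typing import Any, List


def zones_offering(ec2_client: Any, instance_type: str) -> List[str]:
    """Return the Availability Zones in the client's region that offer instance_type."""
    zones: List[str] = []
    kwargs = {
        "LocationType": "availability-zone",
        "Filters": [{"Name": "instance-type", "Values": [instance_type]}],
    }
    while True:
        page = ec2_client.describe_instance_type_offerings(**kwargs)
        zones.extend(o["Location"] for o in page.get("InstanceTypeOfferings", []))
        token = page.get("NextToken")
        if not token:
            return sorted(zones)
        # Follow pagination until the API stops returning a token.
        kwargs["NextToken"] = token


# Usage (requires boto3 and AWS credentials; not executed here):
#   import boto3
#   ec2 = boto3.client("ec2", region_name="us-east-1")
#   print(zones_offering(ec2, "g6.xlarge"))
```

An empty list means the instance type is not offered in that region. A non-empty list does not guarantee capacity or quota, so the account-level check still applies.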
GCP
Region availability: The G2 (L4) family is available in us-central1, us-east4, us-west4, europe-west4, and asia-southeast1, among others. Confirm in GCP regions & zones.
Azure
Region availability: NC A100 v4 is available in eastus2, southcentralus, westeurope, and southeastasia, among others. Confirm in the Azure GPU regions list.
Picking between on-demand, reserved, and spot
- On-demand for proof-of-concept and bursty workloads.
- 1-year or 3-year reserved for steady-state production. This materially reduces the hourly rate, commonly by 30–60% versus on-demand.
- Spot / preemptible for batch or overnight transcription. This is the cheapest option, but the worker can be preempted, so build your queue to tolerate restarts.
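One way to make a batch queue tolerate spot preemption is to checkpoint completed jobs to disk and skip them on restart. The sketch below is a minimal illustration under assumptions: `run_batch`, `load_done`, and `save_done` are hypothetical helpers, `transcribe` stands in for whatever call submits audio to your worker, and the checkpoint is a local JSON file (on a real spot instance it would need to live on storage that survives the instance).

```python
import json
import os
import tempfile
from typing import Callable, Iterable, Set


def load_done(path: str) -> Set[str]:
    """Load the set of completed job IDs, or an empty set on first run."""
    if not os.path.exists(path):
        return set()
    with open(path, "r", encoding="utf-8") as f:
        return set(json.load(f))


def save_done(path: str, done: Set[str]) -> None:
    """Write the checkpoint atomically so a preemption cannot leave it half-written."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(sorted(done), f)
    os.replace(tmp, path)  # atomic rename over the old checkpoint


def run_batch(jobs: Iterable[str], transcribe: Callable[[str], None],
              checkpoint: str) -> None:
    """Process each job once per successful completion, resuming after restarts."""
    done = load_done(checkpoint)
    for job in jobs:
        if job in done:
            continue  # finished before the last preemption
        transcribe(job)
        done.add(job)
        save_done(checkpoint, done)
```

A job that was in flight when the worker was preempted is run again on restart, so `transcribe` should be safe to repeat for the same input.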
Container image
Once you have the GPU host, follow the Quick Start to pull the Pulse / Pulse Pro container image and run the worker. The image runs identically across AWS, GCP, and Azure; the only difference is the GPU driver version on the host.
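Because the host driver is the one thing that differs between clouds, it can help to record it before starting the worker. The sketch below assumes `nvidia-smi` is installed on the host and uses its `--query-gpu` option; `parse_gpu_lines` and `host_gpus` are hypothetical helper names.

```python
import subprocess
from typing import List, Tuple

# Ask nvidia-smi for one "name, driver_version" CSV line per GPU.
QUERY = ["nvidia-smi", "--query-gpu=name,driver_version", "--format=csv,noheader"]


def parse_gpu_lines(output: str) -> List[Tuple[str, str]]:
    """Parse nvidia-smi CSV output into (gpu_name, driver_version) pairs."""
    gpus = []
    for line in output.splitlines():
        if not line.strip():
            continue
        # Split on the last comma: the driver version is the final field.
        name, driver = (part.strip() for part in line.rsplit(",", 1))
        gpus.append((name, driver))
    return gpus


def host_gpus() -> List[Tuple[str, str]]:
    """Run nvidia-smi on this host and return its GPUs and driver version."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_lines(result.stdout)
```

Calling `host_gpus()` on the GPU host returns one pair per GPU, which you can compare against the driver requirement documented for the image, assuming one is published.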
L4 is the recommended GPU class for self-hosting Pulse and Pulse Pro wherever it is generally available. On Azure, where L4 is not yet GA, the recommendation falls back to T4. If your workload is latency-sensitive or you have committed capacity for a different GPU class, contact support@smallest.ai for sizing guidance.
Next steps
- Hardware requirements: minimum and recommended GPU specs.
- Parallelism and latency: RTFx, RPS, and latency by mode.
- Quick Start: deploy the STT worker.

