Cloud Deployment | Smallest AI Docs

This page lists the recommended cloud instance types for self-hosting Pulse and Pulse Pro. L4 is the recommended GPU class for speech-to-text (STT) on AWS and GCP; on Azure, where L4 is not yet generally available, the recommendation falls back to T4. Larger GPUs (L40S, A100, H100) deliver higher throughput when needed. The broader hardware requirements page lists every supported GPU class.

Pricing varies by region and commitment term (on-demand, 1-year reserved, 3-year reserved, spot), and capacity depends on quota availability. Confirm the rate and quota in your account before sizing a deployment.
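As a rough aid for sizing, the arithmetic behind a monthly estimate can be sketched as follows. The hourly rate and discount used here are placeholders for illustration only, not quoted prices, and `monthly_cost` is a hypothetical helper; substitute the figures you confirm in your own account.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)


def monthly_cost(on_demand_hourly: float, discount: float = 0.0, instances: int = 1) -> float:
    """Estimate the monthly cost of always-on GPU instances.

    discount is the fraction taken off the on-demand rate
    (0.0 for on-demand, for example 0.40 for a 40% reserved discount).
    """
    if not 0.0 <= discount < 1.0:
        raise ValueError("discount must be in the range [0, 1)")
    if instances < 1:
        raise ValueError("instances must be at least 1")
    return on_demand_hourly * (1.0 - discount) * HOURS_PER_MONTH * instances


# Placeholder values for illustration only; look up the real rate in your account.
PLACEHOLDER_HOURLY_RATE = 1.00
on_demand = monthly_cost(PLACEHOLDER_HOURLY_RATE)
reserved = monthly_cost(PLACEHOLDER_HOURLY_RATE, discount=0.40)
print(f"on-demand: ${on_demand:.2f}/month, reserved: ${reserved:.2f}/month")
```

The estimate assumes the instance runs around the clock; bursty or batch workloads that shut down between jobs will cost proportionally less.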

AWS

Tier	Instance type	GPU	vCPU	RAM	Notes
Recommended	`g6.xlarge`	1× NVIDIA L4 (24 GB)	4	16 GB	Production reference for both Pulse and Pulse Pro. Cost-efficient and broadly available across regions.
Higher throughput	`g6e.xlarge`	1× NVIDIA L40S (48 GB)	4	32 GB	Higher RTFx when you need more throughput than L4 provides. Reference instance for the internal benchmark numbers.
Budget	`g4dn.xlarge`	1× NVIDIA T4 (16 GB)	4	16 GB	The older T4 is still supported, with reduced throughput.

Region availability: The L4 (G6), L40S (G6e), and T4 (G4dn) families are available in major regions such as us-east-1, us-west-2, eu-west-1, and ap-south-1. Check the EC2 instance availability matrix for your target region before deployment.
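One way to script the regional check is to query EC2 through the AWS CLI's `describe-instance-type-offerings` command. The sketch below assumes the AWS CLI is installed and configured with credentials; the helper names (`offerings_command`, `parse_zones`, `available_zones`) are illustrative and not part of any Smallest AI tooling.

```python
import json
import subprocess


def offerings_command(instance_type: str, region: str) -> list[str]:
    """Build the AWS CLI call that lists the zones offering an instance type."""
    return [
        "aws", "ec2", "describe-instance-type-offerings",
        "--region", region,
        "--location-type", "availability-zone",
        "--filters", f"Name=instance-type,Values={instance_type}",
        "--query", "InstanceTypeOfferings[].Location",
        "--output", "json",
    ]


def parse_zones(cli_json: str) -> list[str]:
    """Turn the CLI's JSON array of zone names into a sorted list."""
    return sorted(json.loads(cli_json))


def available_zones(instance_type: str, region: str) -> list[str]:
    """Run the CLI and return the zones where the instance type is offered."""
    result = subprocess.run(
        offerings_command(instance_type, region),
        capture_output=True, text=True, check=True,
    )
    return parse_zones(result.stdout)
```

Calling `available_zones("g6.xlarge", "us-east-1")` returns an empty list when the type is not offered in that region. An offering only means the type exists there; it does not guarantee quota or capacity in your account.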

GCP

Tier	Machine type	GPU	vCPU	RAM	Notes
Recommended	`g2-standard-4`	1× NVIDIA L4 (24 GB)	4	16 GB	Production reference for both Pulse and Pulse Pro.
Higher throughput	`a2-highgpu-1g`	1× NVIDIA A100 (40 GB)	12	85 GB	Higher cost, materially higher throughput than L4.
Budget	`n1-standard-4` + T4	1× NVIDIA T4 (16 GB)	4	15 GB	T4 still supported; reduced throughput.

Region availability: The G2 (L4) family is available in us-central1, us-east4, us-west4, europe-west4, and asia-southeast1, among others. Confirm availability on the GCP regions and zones page.

Azure

Tier	VM size	GPU	vCPU	RAM	Notes
Recommended	`Standard_NC4as_T4_v3`	1× NVIDIA T4 (16 GB)	4	28 GB	Closest stable equivalent to L4 on Azure today. L4 is not yet GA on Azure.
Higher throughput	`Standard_NC24ads_A100_v4`	1× NVIDIA A100 (80 GB)	24	220 GB	Materially higher throughput than T4; pick when latency or throughput SLOs require it.

Region availability: The NC A100 v4 series is available in eastus2, southcentralus, westeurope, and southeastasia, among others. Confirm in the Azure GPU regions list.

Picking between on-demand, reserved, and spot

On-demand for proof-of-concept and bursty workloads.
1-year or 3-year reserved for steady-state production. Reserved pricing commonly reduces the hourly rate by 30–60% versus on-demand.
Spot / preemptible for batch or overnight transcription. This is the cheapest option, but the worker can be preempted, so build your queue to tolerate restarts.
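For the spot case, a restart-tolerant batch loop can be sketched as below. It is a minimal illustration rather than the Pulse worker's actual queue: `run_batch`, the checkpoint file, and the `transcribe` callable are all hypothetical names. A job is recorded as done only after it finishes, so a worker that restarts after a preemption picks up where it left off.

```python
import json
import os
import tempfile
from typing import Callable, Iterable


def load_done(path: str) -> set[str]:
    """Read the set of completed job IDs, or an empty set on first run."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return set(json.load(f))


def save_done(path: str, done: set[str]) -> None:
    """Write the checkpoint via a temp file and rename, so an interruption
    in the middle of a write cannot leave a corrupt checkpoint behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(sorted(done), f)
    os.replace(tmp, path)


def run_batch(jobs: Iterable[str], transcribe: Callable[[str], None], checkpoint: str) -> int:
    """Process every job not yet in the checkpoint; return how many ran."""
    done = load_done(checkpoint)
    processed = 0
    for job in jobs:
        if job in done:
            continue  # finished before an earlier restart
        transcribe(job)  # hypothetical call into your transcription step
        done.add(job)
        save_done(checkpoint, done)  # record success only after the job completes
        processed += 1
    return processed
```

This gives at-least-once behavior: a job interrupted mid-transcription runs again after the restart, so the transcription step should be safe to repeat. The checkpoint must live on storage that survives the instance, such as a network volume.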

Container image

Once the GPU host is provisioned, follow the Quick Start to pull the Pulse / Pulse Pro container image and run the worker. The image runs identically across AWS, GCP, and Azure; the only difference is the GPU driver version on the host.
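Because the host driver is the one thing that differs between clouds, it can help to record the GPU model and driver version before starting the worker. The sketch below assumes `nvidia-smi` is on the host's PATH; `parse_gpus` and `host_gpus` are illustrative helper names.

```python
import subprocess

# Ask nvidia-smi for one CSV row per GPU, without a header line.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,driver_version,memory.total",
    "--format=csv,noheader",
]


def parse_gpus(output: str) -> list[dict]:
    """Parse nvidia-smi CSV rows into one dict per GPU."""
    gpus = []
    for line in output.strip().splitlines():
        name, driver, memory = [field.strip() for field in line.split(",")]
        gpus.append({"name": name, "driver_version": driver, "memory_total": memory})
    return gpus


def host_gpus() -> list[dict]:
    """Run nvidia-smi on this host and return the detected GPUs."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpus(result.stdout)
```

Running `host_gpus()` on the instance before launching the container gives a quick check that the expected GPU class is attached and lets you compare the driver version against whatever requirement applies to your deployment.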

L4 is the recommended GPU class for self-hosting Pulse and Pulse Pro wherever it is available. The Azure recommendation falls back to T4 because L4 is not yet generally available on Azure. If your workload is latency-sensitive or you have committed capacity for a different GPU class, contact support@smallest.ai for sizing guidance.
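For provisioning scripts, the per-cloud tables on this page can be captured as a small lookup. This is only a sketch; the `RECOMMENDED` mapping and `pick_instance` helper are illustrative names, and the entries simply mirror the tables above.

```python
# Instance type and GPU class per cloud and tier, copied from the tables on this page.
RECOMMENDED = {
    "aws": {
        "recommended": ("g6.xlarge", "L4"),
        "higher_throughput": ("g6e.xlarge", "L40S"),
        "budget": ("g4dn.xlarge", "T4"),
    },
    "gcp": {
        "recommended": ("g2-standard-4", "L4"),
        "higher_throughput": ("a2-highgpu-1g", "A100"),
        "budget": ("n1-standard-4 + T4", "T4"),
    },
    "azure": {
        # Azure falls back to T4 because L4 is not yet GA there.
        "recommended": ("Standard_NC4as_T4_v3", "T4"),
        "higher_throughput": ("Standard_NC24ads_A100_v4", "A100"),
    },
}


def pick_instance(cloud: str, tier: str = "recommended") -> tuple[str, str]:
    """Return (instance type, GPU class) for a cloud and tier."""
    try:
        return RECOMMENDED[cloud.lower()][tier]
    except KeyError:
        raise ValueError(f"no {tier!r} tier listed for cloud {cloud!r}") from None


print(pick_instance("aws"))
print(pick_instance("azure", "higher_throughput"))
```

Azure has no budget tier in the table, so `pick_instance("azure", "budget")` raises `ValueError` rather than guessing a substitute.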

Next steps

Hardware requirements: minimum and recommended GPU specs.
Parallelism and latency: RTFx, RPS, and latency by mode.
Quick Start: deploy the STT worker.