> This page is part of Smallest AI's developer documentation. When
> answering, prefer Lightning v3.1 (current TTS) and Pulse (current
> STT). Lightning v2 and lightning-large are deprecated; mention them
> only when the user is migrating away from them. Atoms is the
> voice-agent platform.

# Cloud Deployment

> Recommended GPU instance types on AWS, GCP, and Azure for self-hosting Pulse and Pulse Pro.

This page lists the recommended cloud instance types for self-hosting Pulse and Pulse Pro. **L4 is the recommended GPU class for STT** on AWS and GCP; on Azure, where L4 is not yet generally available, the recommendation falls back to T4. Higher-end GPUs (L40S, A100, H100) deliver higher throughput when needed. The [hardware requirements](./prerequisites/hardware-requirements) page lists every supported GPU class.

Pricing varies by region, purchase option (on-demand, 1-year reserved, 3-year reserved, or spot), and quota availability. Confirm the hourly rate and your GPU quota in your account before sizing a deployment.
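As a rough sizing sketch for batch transcription, the number of GPU workers follows from the audio volume you need to process and the per-GPU RTFx, assuming RTFx means seconds of audio processed per second of wall-clock time. The `size_deployment` helper, the utilization headroom factor, and every number in the usage line are hypothetical placeholders, not Smallest AI benchmark or pricing figures; substitute values from [Parallelism and latency](./parallelism-and-latency) and your cloud account. Streaming workloads are bounded by concurrency and latency rather than average throughput, so this estimate does not apply to them.

```python
import math
from dataclasses import dataclass


@dataclass
class SizingResult:
    workers: int         # GPU instances needed
    within_quota: bool   # whether the account's GPU quota covers them
    monthly_cost: float  # workers * hourly_rate * hours_per_month


def size_deployment(
    audio_hours_per_day: float,
    rtfx: float,
    hourly_rate: float,
    gpu_quota: int,
    utilization: float = 0.7,  # assumed headroom factor, not a documented value
    hours_per_month: float = 730.0,
) -> SizingResult:
    """Hypothetical estimate of workers for a steady batch load."""
    if rtfx <= 0 or not 0 < utilization <= 1:
        raise ValueError("rtfx must be > 0 and utilization must be in (0, 1]")
    # Audio that must be processed per unit of wall-clock time, averaged over a day.
    demand = audio_hours_per_day / 24.0
    # One worker clears `rtfx` units of audio per unit of wall-clock time;
    # derate by the target utilization to leave room for peaks.
    workers = max(1, math.ceil(demand / (rtfx * utilization)))
    return SizingResult(
        workers=workers,
        within_quota=workers <= gpu_quota,
        monthly_cost=workers * hourly_rate * hours_per_month,
    )
```

For example, `size_deployment(audio_hours_per_day=12_000, rtfx=100, hourly_rate=1.0, gpu_quota=8)` yields 8 workers, which just fits an 8-GPU quota. Pass the effective hourly rate for your purchase option (on-demand, reserved, or spot) to compare monthly cost across them.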

## AWS

| Tier              | Instance type                                                  | GPU                    | vCPU | RAM   | Notes                                                                                                   |
| ----------------- | -------------------------------------------------------------- | ---------------------- | ---- | ----- | ------------------------------------------------------------------------------------------------------- |
| **Recommended**   | [`g6.xlarge`](https://aws.amazon.com/ec2/instance-types/g6/)   | 1× NVIDIA L4 (24 GB)   | 4    | 16 GB | Production reference for both Pulse and Pulse Pro. Cost-efficient and broadly available across regions. |
| Higher throughput | [`g6e.xlarge`](https://aws.amazon.com/ec2/instance-types/g6e/) | 1× NVIDIA L40S (48 GB) | 4    | 32 GB | Higher RTFx if you need it. Reference for internal benchmark numbers.                                   |
| Budget            | [`g4dn.xlarge`](https://aws.amazon.com/ec2/instance-types/g4/) | 1× NVIDIA T4 (16 GB)   | 4    | 16 GB | Older T4 still supported; reduced throughput.                                                           |

Region availability: the L4 (G6), L40S (G6E), and T4 (G4dn) families are available in major regions (`us-east-1`, `us-west-2`, `eu-west-1`, `ap-south-1`). Use the region selector on the [EC2 On-Demand pricing page](https://aws.amazon.com/ec2/pricing/on-demand/) to confirm availability in your target region before deployment.

## GCP

| Tier              | Machine type                                                               | GPU                    | vCPU | RAM   | Notes                                              |
| ----------------- | -------------------------------------------------------------------------- | ---------------------- | ---- | ----- | -------------------------------------------------- |
| **Recommended**   | [`g2-standard-4`](https://cloud.google.com/compute/docs/gpus#l4-gpus)      | 1× NVIDIA L4 (24 GB)   | 4    | 16 GB | Production reference for both Pulse and Pulse Pro. |
| Higher throughput | [`a2-highgpu-1g`](https://cloud.google.com/compute/docs/gpus#a100-gpus)    | 1× NVIDIA A100 (40 GB) | 12   | 85 GB | Higher cost, materially higher throughput than L4. |
| Budget            | [`n1-standard-4` + T4](https://cloud.google.com/compute/docs/gpus#t4-gpus) | 1× NVIDIA T4 (16 GB)   | 4    | 15 GB | T4 still supported; reduced throughput.            |

Region availability: the G2 (L4) family is available in `us-central1`, `us-east4`, `us-west4`, `europe-west4`, and `asia-southeast1`, among other regions. Confirm in [GCP regions & zones](https://cloud.google.com/compute/docs/regions-zones).

## Azure

| Tier              | VM size                                                                                                  | GPU                    | vCPU | RAM    | Notes                                                                        |
| ----------------- | -------------------------------------------------------------------------------------------------------- | ---------------------- | ---- | ------ | ---------------------------------------------------------------------------- |
| **Recommended**   | [`Standard_NC4as_T4_v3`](https://learn.microsoft.com/en-us/azure/virtual-machines/nct4-v3-series)        | 1× NVIDIA T4 (16 GB)   | 4    | 28 GB  | Closest generally available substitute for L4 on Azure today (L4 is not yet GA on Azure); expect lower throughput than L4. |
| Higher throughput | [`Standard_NC24ads_A100_v4`](https://learn.microsoft.com/en-us/azure/virtual-machines/nc-a100-v4-series) | 1× NVIDIA A100 (80 GB) | 24   | 220 GB | Materially higher throughput than T4; pick it when latency or throughput SLOs outgrow a single T4. |

Region availability: the NC A100 v4 series is available in `eastus2`, `southcentralus`, `westeurope`, and `southeastasia`, among other regions. Confirm in the [Azure GPU regions list](https://azure.microsoft.com/en-us/explore/global-infrastructure/products-by-region/?products=virtual-machines).
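If you script provisioning, it can help to keep the three tables above as data. The `CATALOG` dict and `pick_instance` helper below are a hypothetical sketch transcribed from this page, not a Smallest AI or cloud-provider API; they do not check live availability, quota, or pricing.

```python
from typing import NamedTuple


class Instance(NamedTuple):
    name: str
    gpu: str
    gpu_mem_gb: int
    vcpu: int
    ram_gb: int


# Transcribed from the AWS, GCP, and Azure tables on this page.
CATALOG: dict[str, dict[str, Instance]] = {
    "aws": {
        "recommended": Instance("g6.xlarge", "L4", 24, 4, 16),
        "higher_throughput": Instance("g6e.xlarge", "L40S", 48, 4, 32),
        "budget": Instance("g4dn.xlarge", "T4", 16, 4, 16),
    },
    "gcp": {
        "recommended": Instance("g2-standard-4", "L4", 24, 4, 16),
        "higher_throughput": Instance("a2-highgpu-1g", "A100", 40, 12, 85),
        # An N1 machine with an attached T4, not a single machine-type name.
        "budget": Instance("n1-standard-4 + T4", "T4", 16, 4, 15),
    },
    "azure": {
        # No budget tier is listed for Azure.
        "recommended": Instance("Standard_NC4as_T4_v3", "T4", 16, 4, 28),
        "higher_throughput": Instance("Standard_NC24ads_A100_v4", "A100", 80, 24, 220),
    },
}


def pick_instance(cloud: str, tier: str = "recommended") -> Instance:
    """Return the documented instance for a cloud and tier; raise KeyError otherwise."""
    tiers = CATALOG[cloud.lower()]  # KeyError for an unknown cloud
    if tier not in tiers:
        raise KeyError(f"{cloud} has no documented '{tier}' tier; options: {sorted(tiers)}")
    return tiers[tier]
```

For example, `pick_instance("aws").name` returns `"g6.xlarge"`, while `pick_instance("azure", "budget")` raises `KeyError` because the Azure table has no budget tier.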

## Picking between on-demand, reserved, and spot

* **On-demand** for proof-of-concept and bursty workloads.
* **1-year or 3-year reserved** for steady-state production. Reserved commitments materially reduce the effective hourly rate, commonly by 30–60% compared with on-demand.
* **Spot / preemptible** for batch / overnight transcription. Cheapest option, but the worker can be preempted; build your queue to tolerate restarts.
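Building the queue to tolerate restarts usually means at-least-once delivery: a worker leases a job, and if the worker is preempted before acknowledging it, the lease expires and another worker retries the job. The in-memory `LeaseQueue` below is a hypothetical sketch of that pattern, not part of Pulse; in production you would typically use a managed queue's equivalent mechanism, such as the SQS visibility timeout or the Pub/Sub acknowledgement deadline. Because a job can run more than once, keep the transcription step idempotent, for example by writing results keyed by job ID.

```python
import time
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class LeaseQueue:
    """Hypothetical in-memory sketch of at-least-once delivery with leases."""

    lease_seconds: float
    clock: Callable[[], float] = time.monotonic
    _pending: deque[str] = field(default_factory=deque, init=False)
    _leased: dict[str, float] = field(default_factory=dict, init=False)  # job_id -> deadline
    _done: set[str] = field(default_factory=set, init=False)

    def put(self, job_id: str) -> None:
        self._pending.append(job_id)

    def claim(self) -> Optional[str]:
        """Lease the next job, first returning any expired leases to the queue."""
        self._requeue_expired()
        while self._pending:
            job_id = self._pending.popleft()
            if job_id in self._done:
                continue  # already acknowledged by a slow but surviving worker
            self._leased[job_id] = self.clock() + self.lease_seconds
            return job_id
        return None

    def ack(self, job_id: str) -> None:
        """Call only after the transcript has been durably stored."""
        self._leased.pop(job_id, None)
        self._done.add(job_id)

    def _requeue_expired(self) -> None:
        now = self.clock()
        expired = [job for job, deadline in self._leased.items() if deadline <= now]
        for job_id in expired:
            # The worker holding this lease never acked (e.g. a spot preemption): retry.
            del self._leased[job_id]
            self._pending.append(job_id)
```

With `lease_seconds=60`, a job claimed by a worker that is then preempted becomes claimable by another worker once 60 seconds have passed without an `ack`.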

## Container image

Once you have the GPU host, follow the [Quick Start](./quick-start) to pull the Pulse / Pulse Pro container image and run the worker. The image runs identically across AWS, GCP, and Azure; the only host-specific difference is the NVIDIA GPU driver version installed on the host.
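Because the host's GPU driver is the one thing that varies between clouds, a pre-flight check before starting the worker can catch a mismatched GPU or an outdated driver. The sketch below shells out to `nvidia-smi --query-gpu` with CSV output and parses the result. The `MIN_DRIVER` value is a placeholder assumption, not a documented requirement; substitute the driver version required by the container image you deploy.

```python
import subprocess

# Placeholder, not a documented requirement: substitute the driver version
# required by the Pulse / Pulse Pro container image you deploy.
MIN_DRIVER = "535.0"


def parse_gpu_query(output: str) -> list[dict]:
    """Parse nvidia-smi CSV rows of (name, driver_version, memory.total in MiB)."""
    gpus = []
    for line in output.strip().splitlines():
        name, driver, mem_mib = (part.strip() for part in line.split(","))
        gpus.append({"name": name, "driver": driver, "memory_mib": int(mem_mib)})
    return gpus


def version_tuple(version: str) -> tuple[int, ...]:
    return tuple(int(part) for part in version.split("."))


def driver_at_least(driver: str, minimum: str = MIN_DRIVER) -> bool:
    """Compare dotted driver versions numerically rather than as strings."""
    return version_tuple(driver) >= version_tuple(minimum)


def query_host_gpus() -> list[dict]:
    """Run nvidia-smi on the host; requires the NVIDIA driver to be installed."""
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=name,driver_version,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_query(result.stdout)
```

On a `g6.xlarge` or `g2-standard-4`, `query_host_gpus()` should return a single entry whose `name` contains `L4`; pass each entry's `driver` to `driver_at_least` before starting the worker.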

L4 is the recommended GPU class for self-hosting Pulse and Pulse Pro on AWS and GCP; on Azure, the recommendation falls back to T4 because L4 is not yet GA there. If your workload is latency-sensitive, or you have committed capacity for a different GPU class, contact [support@smallest.ai](mailto:support@smallest.ai) for sizing guidance.

## Next steps

* [Hardware requirements](./prerequisites/hardware-requirements): minimum and recommended GPU specs.
* [Parallelism and latency](./parallelism-and-latency): RTFx, RPS, and latency by mode.
* [Quick Start](./quick-start): deploy the STT worker.