Why Self-Host?
Using Smallest as a managed service has many benefits: it is quick to start developing with, requires no infrastructure setup, and removes the cost of hardware, installation, configuration, backups, and maintenance. However, there are situations where a self-hosted deployment makes more sense.
Some use cases have strict latency and load requirements. If you need ultra-low latency, with voice AI services colocated with your other services, self-hosting can meet these requirements.
When you self-host, your speech services run within your own infrastructure, whether that is the same data center, the same VPC, or even the same machine as your application. Requests no longer make a round trip to an external API.
For real-time voice applications like AI agents, every millisecond matters. Self-hosting keeps the latency between your application and the speech services low and predictable, independent of the state of the public internet.
One of the most common use cases for self-hosting Smallest is to satisfy security or data privacy requirements. In a typical self-hosted deployment, no audio, transcripts, or other identifying markers of the request content are sent to Smallest servers.
Only usage metadata is reported to the license server, for license validation and billing purposes.
Metadata reported:
Never reported:
For high-volume or predictable workloads, self-hosting can be more cost-effective than per-request API pricing.
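As a rough illustration of the cost comparison, the sketch below estimates the monthly request volume above which self-hosting becomes cheaper than per-request pricing. The function name and every figure are hypothetical placeholders, not Smallest pricing; substitute your own infrastructure costs and your actual API rate.

```python
def break_even_requests(
    fixed_monthly_cost: float,
    api_price_per_request: float,
    self_hosted_cost_per_request: float = 0.0,
) -> float:
    """Return the monthly request count at which both options cost the same.

    fixed_monthly_cost: assumed fixed self-hosting cost (hardware, license, ops).
    api_price_per_request: assumed managed API price per request.
    self_hosted_cost_per_request: assumed marginal cost per self-hosted request.
    """
    saving_per_request = api_price_per_request - self_hosted_cost_per_request
    if saving_per_request <= 0:
        # Self-hosting never catches up if each request costs as much or more.
        raise ValueError("self-hosting has no per-request saving at these rates")
    return fixed_monthly_cost / saving_per_request


if __name__ == "__main__":
    # Placeholder numbers for illustration only.
    volume = break_even_requests(
        fixed_monthly_cost=1000.0,
        api_price_per_request=0.5,
        self_hosted_cost_per_request=0.25,
    )
    print(f"Self-hosting is cheaper above {volume:.0f} requests per month")
```

Below the break-even volume, the managed API is the cheaper option under these assumptions.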
Self-hosted deployments include built-in resilience against unforeseen network errors and temporary outages. The deployment won’t suddenly stop working due to a transient network issue or external service disruption.
This means:
The License Proxy supports grace periods that allow your deployment to continue operating even if connectivity to the Smallest license server is temporarily lost.
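The general idea behind a grace period can be sketched as follows. This is a conceptual illustration only, not the License Proxy's actual implementation: the class name, method names, and the one-hour window are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GracePeriodTracker:
    """Hypothetical tracker: keep serving while the last successful
    license check is no older than the grace window."""

    grace_seconds: float
    last_success: Optional[float] = None  # timestamp of last successful check

    def record_success(self, now: float) -> None:
        # Call whenever the license server was reached and validation passed.
        self.last_success = now

    def may_serve(self, now: float) -> bool:
        # Never validated: refuse to serve.
        if self.last_success is None:
            return False
        # Serve as long as the outage is still inside the grace window.
        return (now - self.last_success) <= self.grace_seconds


if __name__ == "__main__":
    import time

    tracker = GracePeriodTracker(grace_seconds=3600.0)  # assumed 1-hour window
    tracker.record_success(time.time())
    print("may serve:", tracker.may_serve(time.time()))
```

Passing timestamps in explicitly, rather than reading the clock inside the class, keeps the logic easy to test.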
Self-hosting provides complete control over your deployment:
Optimize compute resources for your specific workload patterns. Allocate more GPU power during peak hours and scale down during off-peak times.
Upgrade on your schedule. Test new versions in staging before production rollout. Roll back instantly if needed.
Deploy in private networks, VPCs, or air-gapped environments. Full control over ingress and egress traffic.
Direct integration with your monitoring, logging, and alerting infrastructure. Custom Prometheus metrics, Grafana dashboards, and alerting rules.
Self-hosting isn’t always the right choice. Consider the managed Smallest API if: