Why Self-Host? | Smallest AI Docs

Overview

Using Smallest as a managed service has many benefits: it’s fast to start developing with, requires no infrastructure setup, and eliminates all hardware, installation, configuration, backup, and maintenance-related costs. However, there are situations where a self-hosted deployment makes more sense.

Performance Requirements

Some use cases have strict latency and throughput requirements. If you need very low latency, with voice AI services colocated alongside your other services, self-hosting can meet these requirements.

Ideal Use Cases

Real-time AI voicebots requiring sub-100ms response times
Live transcription systems for broadcasts or conferences
High-volume processing with predictable costs
Edge deployments with limited internet connectivity

Key Benefits

Colocate speech services with your application infrastructure
Scale independently based on your specific workload patterns
No network round trips to external APIs
Consistent performance regardless of internet conditions

Zero Network Latency

When you self-host, your speech services run within your own infrastructure—whether that’s the same data center, VPC, or even the same machine as your application. This eliminates the round-trip time to external APIs entirely.

Scenario	Typical Network Latency
Self-hosted	1-5 ms
Same region	20-50 ms
Cross-region	100-200 ms
Edge/on-premises	200-500+ ms

For real-time voice applications like AI agents, every millisecond matters. Self-hosting keeps your latency predictable and minimal, regardless of where your users are located or the state of the public internet.
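To make the comparison concrete, the sketch below subtracts each network range in the table above from a 100 ms response target (the sub-100ms figure listed under Ideal Use Cases). It assumes the table's values are the network share of a single request. The function name and the dictionary are illustrative and not part of any Smallest API.

```python
# Network latency ranges in milliseconds, copied from the table above.
NETWORK_LATENCY_MS = {
    "Self-hosted": (1, 5),
    "Same region": (20, 50),
    "Cross-region": (100, 200),
    "Edge/on-premises": (200, 500),
}


def remaining_budget_ms(target_ms, scenario):
    """Return (worst_case, best_case) time left for processing after the network hop."""
    low, high = NETWORK_LATENCY_MS[scenario]
    return target_ms - high, target_ms - low


for scenario in NETWORK_LATENCY_MS:
    worst, best = remaining_budget_ms(100, scenario)
    print(f"{scenario}: {worst} to {best} ms left for processing")
```

A negative result means the network hop alone exceeds the target before any speech processing happens.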

Security & Data Privacy

One of the most common use cases for self-hosting Smallest is to satisfy security or data privacy requirements. In a typical self-hosted deployment, no audio, transcripts, or other identifying markers of the request content are sent to Smallest servers.

Ideal For

Healthcare applications requiring HIPAA compliance
Financial services with strict data governance
Government and defense applications
Enterprise environments with air-gapped networks

Data Privacy Guarantees

Your audio data never leaves your infrastructure
Transcripts remain entirely within your control
No data stored beyond the duration of the API request
Self-hosted deployments do not persist request/response data

What Data is Reported?

In a typical self-hosted deployment, no audio or transcript data is sent to Smallest servers. Only usage metadata is reported to the license server for validation and billing purposes.

Metadata reported:

Audio duration and character count
Features requested (diarization, timestamps, etc.)
Success/error response codes

Never reported:

Audio content
Transcripts or synthesis output
Personally identifiable information
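As a rough illustration of the split above, a usage record could carry only counts, feature flags, and a status code. The class, field names, and helper below are hypothetical; the actual payload sent to the license server is not specified here.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class UsageReport:
    """Hypothetical usage record: metadata only, no audio or text content."""
    audio_duration_seconds: float
    character_count: int
    features: tuple = ()
    status_code: int = 200


def build_report(audio_duration_seconds, text, features, status_code):
    # Only the length of the text is recorded; the text itself is not stored.
    return UsageReport(
        audio_duration_seconds=audio_duration_seconds,
        character_count=len(text),
        features=tuple(features),
        status_code=status_code,
    )


report = build_report(2.5, "hello", ["diarization", "timestamps"], 200)
print(asdict(report))
```

Because the record has no field for audio or transcript content, that content cannot be included in a report by accident.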

Cost Optimization

For high-volume or predictable workloads, self-hosting can be more cost-effective than per-request API pricing.

Benefit	Description
Predictable costs	Infrastructure-based pricing, not usage-based
Efficient utilization	Predictable autoscaling maximizes resource efficiency
Long-term savings	Significant cost reduction for sustained high volumes
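One way to reason about the trade-off is a break-even calculation: the monthly volume at which a fixed infrastructure cost equals the per-unit API bill. The sketch below is a simplification with hypothetical inputs. It uses no real Smallest prices and ignores staffing and other operational costs.

```python
def break_even_units(fixed_monthly_cost, price_per_unit):
    """Monthly volume at which fixed infrastructure cost equals per-unit API cost."""
    if price_per_unit <= 0:
        raise ValueError("price_per_unit must be positive")
    return fixed_monthly_cost / price_per_unit


def cheaper_option(monthly_units, fixed_monthly_cost, price_per_unit):
    """Name the cheaper option for a given monthly volume (ties go to the managed API)."""
    api_cost = monthly_units * price_per_unit
    return "self-hosted" if fixed_monthly_cost < api_cost else "managed API"


# Placeholder numbers for illustration only.
print(break_even_units(1000, 0.5))
print(cheaper_option(3000, 1000, 0.5))
```

Volumes well above the break-even point favor self-hosting, while low or unpredictable volumes favor per-request pricing.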

Reliability & Grace Periods

Self-hosted deployments include built-in resilience against unforeseen network errors and temporary outages. The deployment won’t suddenly stop working due to a transient network issue or external service disruption.

This means:

Continuous operation during network interruptions or license server maintenance
Protection against unforeseen errors: your services keep running while issues are resolved
Time to recover: grace periods provide a buffer to restore connectivity without impacting your users

The License Proxy supports grace periods that allow your deployment to continue operating even if connectivity to the Smallest license server is temporarily lost.
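Conceptually, a grace period is a time window measured from the last successful license validation. The sketch below shows that check in isolation. The 24-hour default and all names are placeholders; the License Proxy's actual grace period length and logic are not described here.

```python
from datetime import datetime, timedelta


def within_grace_period(last_validated, now, grace=timedelta(hours=24)):
    """Return True while the time since the last successful validation is within the grace window."""
    return now - last_validated <= grace


# Hypothetical timeline: the license server becomes unreachable after 12:00.
last_ok = datetime(2024, 1, 1, 12, 0)
print(within_grace_period(last_ok, last_ok + timedelta(hours=3)))
print(within_grace_period(last_ok, last_ok + timedelta(hours=30)))
```

Under this model, requests continue to be served during a short outage, and service is affected only if connectivity is not restored before the window closes.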

Customization & Control

Self-hosting provides complete control over your deployment:

Resource Allocation

Optimize compute resources for your specific workload patterns. Allocate more GPU power during peak hours and scale down during off-peak times.
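A time-based scaling policy of this kind might look like the following sketch. The peak window and replica counts are hypothetical defaults, and the function only computes a target; applying it to a real cluster depends on your orchestration tooling.

```python
def desired_gpu_replicas(hour, peak_start=9, peak_end=18,
                         peak_replicas=4, off_peak_replicas=1):
    """Return the target replica count for a given hour of the day (0-23)."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be between 0 and 23")
    # Peak window is half-open: peak_start is included, peak_end is not.
    in_peak = peak_start <= hour < peak_end
    return peak_replicas if in_peak else off_peak_replicas


for hour in (3, 9, 17, 18):
    print(hour, desired_gpu_replicas(hour))
```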

Version Control

Upgrade on your schedule. Test new versions in staging before production rollout. Roll back instantly if needed.

Network Isolation

Deploy in private networks, VPCs, or air-gapped environments. You keep full control over ingress and egress traffic.

Integration Flexibility

Integrate directly with your monitoring, logging, and alerting infrastructure, including custom Prometheus metrics, Grafana dashboards, and alerting rules.

When to Use the Managed Service Instead

Self-hosting isn’t always the right choice. Consider the managed Smallest API if:

You’re building a prototype or MVP
Your audio processing volume is low or unpredictable
You don’t have DevOps resources to manage infrastructure
You need to get started quickly without infrastructure setup

Ready to Self-Host?

Get Started

Return to the introduction for deployment options

Docker Quick Start

Deploy in 15 minutes with Docker