---
title: Why Self-Host?
description: Understand when self-hosting our models makes sense for your organization
---


## Overview

Using Smallest as a managed service has many benefits: it's fast to start developing with, requires no infrastructure setup, and eliminates all hardware, installation, configuration, backup, and maintenance-related costs. However, there are situations where a self-hosted deployment makes more sense.

## Performance Requirements

Some use cases have strict latency and throughput requirements. If you need very low latency, self-hosting lets you colocate the voice AI services with the rest of your stack, so requests never leave your network.

<CardGroup cols={2}>
  <Card title="Ideal Use Cases">
    * **Real-time AI voicebots** requiring sub-100ms response times
    * **Live transcription systems** for broadcasts or conferences
    * **High-volume processing** with predictable costs
    * **Edge deployments** with limited internet connectivity
  </Card>

  <Card title="Key Benefits">
    * Colocate speech services with your application infrastructure
    * Scale independently based on your specific workload patterns
    * No network round trips to external APIs
    * Consistent performance regardless of internet conditions
  </Card>
</CardGroup>

### Zero Network Latency

When you self-host, your speech services run within your own infrastructure, whether that is the same data center, the same VPC, or even the same machine as your application. Requests no longer make a round trip to an external API, so network latency drops to the few milliseconds of a local hop.

<table>
  <thead>
    <tr>
      <th>
        Scenario
      </th>

      <th>
        Network Latency
      </th>
    </tr>
  </thead>

  <tbody>
    <tr>
      <td>
        <strong>Self-hosted</strong>
      </td>

      <td>
        1-5ms
      </td>
    </tr>

    <tr>
      <td>
        Managed API, same region
      </td>

      <td>
        20-50ms
      </td>
    </tr>

    <tr>
      <td>
        Managed API, cross-region
      </td>

      <td>
        100-200ms
      </td>
    </tr>

    <tr>
      <td>
        Managed API, from edge/on-premises
      </td>

      <td>
        200-500ms+
      </td>
    </tr>
  </tbody>
</table>

For real-time voice applications like AI agents, every millisecond matters. Self-hosting keeps your latency predictable and minimal, regardless of where your users are located or the state of the public internet.
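
To see how these network figures eat into a response-time target, the sketch below subtracts the network round trip from a latency budget. The ranges are the table's figures and the 100 ms target comes from the voicebot use case above. The function and dictionary names are illustrative and not part of any Smallest SDK.

```python
# Illustrative only: the names below are hypothetical, not a Smallest API.

# Network round-trip ranges in milliseconds, copied from the table above.
NETWORK_LATENCY_MS = {
    "self_hosted": (1, 5),
    "same_region": (20, 50),
    "cross_region": (100, 200),
    "edge_on_premises": (200, 500),
}


def remaining_budget_ms(target_ms: float, scenario: str) -> tuple[float, float]:
    """Return (worst_case, best_case) time left for everything except the network.

    A negative value means the network alone already exceeds the target.
    """
    low, high = NETWORK_LATENCY_MS[scenario]
    return (target_ms - high, target_ms - low)


if __name__ == "__main__":
    for name in NETWORK_LATENCY_MS:
        worst, best = remaining_budget_ms(100, name)
        print(f"{name}: {worst} to {best} ms left of a 100 ms target")
```

With a 100 ms target, the self-hosted row leaves 95 to 99 ms for inference and application logic, while the cross-region row leaves at most 0 ms.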

## Security & Data Privacy

One of the most common reasons to self-host Smallest is to satisfy security or data privacy requirements. In a typical self-hosted deployment, no audio, transcripts, or other identifying details of request content are sent to Smallest servers.

<CardGroup cols={2}>
  <Card title="Ideal For">
    * **Healthcare applications** requiring HIPAA compliance
    * **Financial services** with strict data governance
    * **Government and defense** applications
    * **Enterprise environments** with air-gapped networks
  </Card>

  <Card title="Data Privacy Guarantees">
    * Your audio data never leaves your infrastructure
    * Transcripts remain entirely within your control
    * No data stored beyond the duration of the API request
    * Self-hosted deployments do not persist request/response data
  </Card>
</CardGroup>

### What Data is Reported?

<Note>
  In a typical self-hosted deployment, no audio or transcript data is sent to Smallest servers. Only usage metadata is reported to the license server for validation and billing purposes.
</Note>

**Metadata reported:**

* Audio duration and character count
* Features requested (diarization, timestamps, etc.)
* Success/error response codes

**Never reported:**

* Audio content
* Transcripts or synthesis output
* Personally identifiable information
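
As a rough illustration of this split, a usage report could be modelled as a record with fields only for the metadata listed above and no field that can carry audio or text. The class and field names are hypothetical. The actual payload exchanged with the license server is not documented on this page.

```python
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class UsageReport:
    """Hypothetical shape of a usage record; all field names are assumptions."""

    audio_duration_seconds: float  # length of the processed audio
    character_count: int           # number of characters processed
    features: tuple[str, ...]      # e.g. ("diarization", "timestamps")
    status_code: int               # success/error response code


def build_report(
    duration_s: float, text: str, features: list[str], status: int
) -> UsageReport:
    """Reduce a request to metadata: the text is counted, never copied."""
    return UsageReport(
        audio_duration_seconds=duration_s,
        character_count=len(text),
        features=tuple(features),
        status_code=status,
    )


if __name__ == "__main__":
    report = build_report(12.5, "hello world", ["timestamps"], 200)
    print(asdict(report))
```

Because the record has no content field, there is nowhere for audio or transcript text to be placed by accident.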

## Cost Optimization

For high-volume or predictable workloads, self-hosting can be more cost-effective than per-request API pricing.

<table>
  <thead>
    <tr>
      <th>
        Benefit
      </th>

      <th>
        Description
      </th>
    </tr>
  </thead>

  <tbody>
    <tr>
      <td>
        <strong>Predictable costs</strong>
      </td>

      <td>
        Infrastructure-based pricing, not usage-based
      </td>
    </tr>

    <tr>
      <td>
        <strong>Efficient utilization</strong>
      </td>

      <td>
        Autoscaling against a predictable load keeps provisioned hardware well utilized
      </td>
    </tr>

    <tr>
      <td>
        <strong>Long-term savings</strong>
      </td>

      <td>
        Significant cost reduction for sustained high volumes
      </td>
    </tr>
  </tbody>
</table>
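
Whether self-hosting is cheaper depends on where your monthly volume sits relative to a break-even point. The sketch below works that out for a flat monthly infrastructure cost against a per-unit API price. Every number in it is a placeholder, not Smallest pricing, and the function names are illustrative. Substitute your own quotes.

```python
import math


def break_even_units(fixed_monthly_cost: float, price_per_unit: float) -> int:
    """Smallest monthly volume at which per-unit pricing costs at least the flat cost."""
    if price_per_unit <= 0:
        raise ValueError("price_per_unit must be positive")
    return math.ceil(fixed_monthly_cost / price_per_unit)


def monthly_costs(
    units: float, fixed_monthly_cost: float, price_per_unit: float
) -> dict[str, float]:
    """Compare both options at a given monthly volume."""
    return {
        "managed": units * price_per_unit,   # scales with usage
        "self_hosted": fixed_monthly_cost,   # flat, whatever the usage
    }


if __name__ == "__main__":
    # Placeholder figures: 3000 per month for infrastructure (hardware, licence,
    # operations) and 0.25 per audio hour on the managed API.
    print(break_even_units(3000, 0.25))
    print(monthly_costs(20000, 3000, 0.25))
```

With these placeholder figures the break-even point is 12,000 audio hours per month. Below that volume the managed API is the cheaper option in this simple model.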

## Reliability & Grace Periods

Self-hosted deployments include built-in resilience against unforeseen network errors and temporary outages. The deployment won't suddenly stop working due to a transient network issue or external service disruption.

This means:

* **Continuous operation** during network interruptions or license server maintenance
* **Protection against unforeseen errors**: your services keep running while issues are resolved
* **Time to recover**: grace periods provide a buffer to restore connectivity without impacting your users

<Note>
  The License Proxy supports **grace periods** that allow your deployment to continue operating even if connectivity to the Smallest license server is temporarily lost.
</Note>
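
Grace-period behaviour can be pictured as a simple time check: requests keep being served as long as the last successful license validation is recent enough. This is a conceptual sketch, not the License Proxy's implementation. The class name, the method names, and the 24-hour window are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class GracePeriodGate:
    """Hypothetical gate: allow traffic while the last validation is recent."""

    grace_period: timedelta
    last_validated_at: Optional[datetime] = None

    def record_success(self, now: datetime) -> None:
        """Call whenever the license server was reached successfully."""
        self.last_validated_at = now

    def allows_requests(self, now: datetime) -> bool:
        """True if a validation succeeded within the grace period."""
        if self.last_validated_at is None:
            return False  # never validated, so there is nothing to extend
        return now - self.last_validated_at <= self.grace_period


if __name__ == "__main__":
    start = datetime(2024, 1, 1, tzinfo=timezone.utc)
    gate = GracePeriodGate(grace_period=timedelta(hours=24))  # assumed window
    gate.record_success(start)
    print(gate.allows_requests(start + timedelta(hours=3)))   # inside the window
    print(gate.allows_requests(start + timedelta(hours=30)))  # window has expired
```

Passing the current time in explicitly, rather than reading the clock inside the class, keeps the check easy to test.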

## Customization & Control

Self-hosting provides complete control over your deployment:

<AccordionGroup>
  <Accordion title="Resource Allocation">
    Optimize compute resources for your specific workload patterns. Allocate more GPU power during peak hours and scale down during off-peak times.
  </Accordion>

  <Accordion title="Version Control">
    Upgrade on your schedule. Test new versions in staging before production rollout. Roll back instantly if needed.
  </Accordion>

  <Accordion title="Network Isolation">
    Deploy in private networks, VPCs, or air-gapped environments. Full control over ingress and egress traffic.
  </Accordion>

  <Accordion title="Integration Flexibility">
    Direct integration with your monitoring, logging, and alerting infrastructure. Custom Prometheus metrics, Grafana dashboards, and alerting rules.
  </Accordion>
</AccordionGroup>

## When to Use Managed Service Instead

Self-hosting isn't always the right choice. Consider the managed Smallest API if:

* You're building a prototype or MVP
* Your audio processing volume is low or unpredictable
* You don't have DevOps resources to manage infrastructure
* You need to get started quickly without infrastructure setup

## Ready to Self-Host?

<CardGroup cols={2}>
  <Card title="Get Started" href="/waves/self-host/getting-started/introduction">
    Return to the introduction for deployment options
  </Card>

  <Card title="Docker Quick Start" href="/waves/self-host/docker-setup/stt-deployment/quick-start">
    Deploy in 15 minutes with Docker
  </Card>
</CardGroup>