---
title: Metrics Setup
description: 'Configure Prometheus, ServiceMonitor, and custom metrics for autoscaling'
---
## Overview
The metrics setup enables autoscaling by collecting Lightning ASR metrics with Prometheus and exposing them to Kubernetes HPA through the Prometheus Adapter.
## Architecture
```mermaid
graph LR
ASR[Lightning ASR] -->|Exports| Metrics
Metrics[/metrics Endpoint/] -->|Discovered By| SM[ServiceMonitor]
SM -->|Scraped By| Prom[Prometheus]
Prom -->|Queried By| Adapter[Prometheus Adapter]
Adapter -->|Supplies Metrics| HPA[HPA Controller]
HPA -->|Scales| ASR
style Prom fill:#E6522C
style ASR fill:#0D9373
```
## Components
### Prometheus
Collects and stores metrics from Lightning ASR pods.
**Included in chart**:
```yaml values.yaml
scaling:
  auto:
    enabled: true

kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      # Allow Prometheus to discover ServiceMonitors created outside the
      # kube-prometheus-stack release (such as the Lightning ASR one)
      serviceMonitorSelectorNilUsesHelmValues: false
      retention: 7d
      resources:
        requests:
          memory: 2Gi
```
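After editing `values.yaml`, roll the change out with a Helm upgrade. A sketch, assuming a release named `smallest` and a local chart path (both are placeholders; substitute the names from your actual install):
```bash
# Release name, chart reference, and namespace are illustrative; use your own.
helm upgrade --install smallest ./charts/lightning-asr -f values.yaml -n smallest
```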
### ServiceMonitor
A custom resource that tells Prometheus which Services to scrape.
**Enabled for Lightning ASR**:
```yaml values.yaml
scaling:
  auto:
    lightningAsr:
      servicemonitor:
        enabled: true
```
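With this enabled, the chart renders a ServiceMonitor roughly like the following (a sketch; the exact metadata depends on the chart's templates):
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: lightning-asr
  namespace: smallest
spec:
  selector:
    matchLabels:
      app: lightning-asr   # must match the Lightning ASR Service's labels
  endpoints:
    - port: metrics        # named port on the Service
      path: /metrics
```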
### Prometheus Adapter
Exposes Prometheus metrics through the Kubernetes custom metrics API so the HPA can consume them.
**Configuration**:
```yaml values.yaml
prometheus-adapter:
  prometheus:
    url: http://smallest-prometheus-stack-prometheus.default.svc
    port: 9090
  rules:
    custom:
      # Expose asr_active_requests as a per-pod custom metric
      - seriesQuery: "asr_active_requests"
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_active_requests{<<.LabelMatchers>>}"
```
## Available Metrics
Lightning ASR exposes the following metrics:
| Metric | Type | Description |
| ------------------------------ | --------- | ----------------------------------------------- |
| `asr_active_requests` | Gauge | Current number of active transcription requests |
| `asr_total_requests` | Counter | Total requests processed |
| `asr_failed_requests` | Counter | Total failed requests |
| `asr_request_duration_seconds` | Histogram | Request processing time |
| `asr_model_load_time_seconds` | Gauge | Time to load model on startup |
| `asr_gpu_utilization` | Gauge | GPU utilization percentage |
| `asr_gpu_memory_used_bytes` | Gauge | GPU memory used |
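For example, a minimal HPA that scales on `asr_active_requests` (the target value and replica bounds below are illustrative, not chart defaults) looks like this:
```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: lightning-asr
  namespace: smallest
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: lightning-asr
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: asr_active_requests
        target:
          type: AverageValue
          averageValue: "10"   # scale out above ~10 active requests per pod
```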
## Verify Metrics Setup
### Check Prometheus
Forward Prometheus port:
```bash
kubectl port-forward -n default svc/smallest-prometheus-stack-prometheus 9090:9090
```
Open [http://localhost:9090](http://localhost:9090) and verify:
1. **Status → Targets**: Lightning ASR endpoints should be "UP"
2. **Graph**: Query `asr_active_requests`; it should return data
3. **Status → Service Discovery**: Should show ServiceMonitor
### Check ServiceMonitor
```bash
kubectl get servicemonitor -n smallest
```
Expected output:
```
NAME AGE
lightning-asr 5m
```
Describe ServiceMonitor:
```bash
kubectl describe servicemonitor lightning-asr -n smallest
```
Should show:
```yaml
Spec:
  Endpoints:
    Port:  metrics
    Path:  /metrics
  Selector:
    Match Labels:
      app: lightning-asr
```
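If the `Match Labels` here don't match the labels on the Lightning ASR Service, Prometheus will never discover the target. Compare them directly:
```bash
kubectl get svc lightning-asr -n smallest --show-labels
```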
### Check Prometheus Adapter
Verify custom metrics are available:
```bash
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq -r '.resources[].name' | grep asr
```
Expected output:
```
pods/asr_active_requests
pods/asr_total_requests
pods/asr_failed_requests
```
Query specific metric:
```bash
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/smallest/pods/*/asr_active_requests" | jq .
```
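A healthy response contains one item per pod; the shape is roughly as follows (pod names and values are illustrative):
```json
{
  "kind": "MetricValueList",
  "apiVersion": "custom.metrics.k8s.io/v1beta1",
  "items": [
    {
      "describedObject": {
        "kind": "Pod",
        "namespace": "smallest",
        "name": "lightning-asr-xxx"
      },
      "metricName": "asr_active_requests",
      "value": "3"
    }
  ]
}
```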
## Custom Metric Configuration
### Add New Custom Metrics
To expose additional metrics to the HPA, extend the adapter's rule list:
```yaml values.yaml
prometheus-adapter:
  rules:
    custom:
      - seriesQuery: "asr_active_requests"
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_active_requests{<<.LabelMatchers>>}"
      - seriesQuery: "asr_gpu_utilization"
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          as: "gpu_utilization"
        metricsQuery: "avg_over_time(asr_gpu_utilization{<<.LabelMatchers>>}[2m])"
```
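Counters should be converted to rates before the HPA sees them. A sketch of a rule that appends to the `rules.custom` list above and exposes the request counter as a per-second rate (the exposed name `asr_requests_per_second` is our choice, not something the chart defines):
```yaml
- seriesQuery: "asr_total_requests"
  resources:
    overrides:
      namespace: {resource: "namespace"}
      pod: {resource: "pod"}
  name:
    matches: "^asr_total_requests$"
    as: "asr_requests_per_second"
  metricsQuery: "rate(asr_total_requests{<<.LabelMatchers>>}[2m])"
```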
### External Metrics
For cluster-wide metrics:
```yaml values.yaml
prometheus-adapter:
  rules:
    external:
      - seriesQuery: 'kube_deployment_status_replicas{deployment="lightning-asr"}'
        metricsQuery: 'sum(kube_deployment_status_replicas{deployment="lightning-asr"})'
        name:
          as: "lightning_asr_replica_count"
        resources:
          overrides:
            namespace: {resource: "namespace"}
```
Use in HPA:
```yaml
spec:
  metrics:
    - type: External
      external:
        metric:
          name: lightning_asr_replica_count
        target:
          type: Value
          value: "5"
```
## Prometheus Configuration
### Retention Policy
Configure how long metrics are stored:
```yaml values.yaml
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      retention: 15d
      retentionSize: "50GB"
```
### Storage
Persist Prometheus data:
```yaml values.yaml
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      storageSpec:
        volumeClaimTemplate:
          spec:
            storageClassName: gp3
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 100Gi
```
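Confirm the claim was created and bound:
```bash
kubectl get pvc -n default | grep prometheus
```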
### Scrape Interval
Adjust how frequently metrics are collected:
```yaml values.yaml
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      scrapeInterval: 30s
      evaluationInterval: 30s
```
Lower intervals (e.g., 15s) provide faster HPA response but increase storage.
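If you only need faster scraping for Lightning ASR, you can override the interval on its ServiceMonitor instead of globally. A sketch of the relevant spec (whether the chart exposes this knob in `values.yaml` depends on its templates):
```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: lightning-asr
  namespace: smallest
spec:
  selector:
    matchLabels:
      app: lightning-asr
  endpoints:
    - port: metrics
      path: /metrics
      interval: 15s   # overrides the global scrapeInterval for this target only
```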
## Recording Rules
Pre-compute expensive queries:
```yaml
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      additionalScrapeConfigs:
        - job_name: 'lightning-asr-aggregated'
          scrape_interval: 15s
          static_configs:
            - targets: ['lightning-asr:2269']
  # Rules live at the chart level, not under prometheusSpec
  additionalPrometheusRulesMap:
    asr-rules:
      groups:
        - name: asr_aggregations
          interval: 30s
          rules:
            - record: asr:requests:rate5m
              expr: rate(asr_total_requests[5m])
            - record: asr:requests:active_avg
              expr: avg(asr_active_requests) by (namespace)
            - record: asr:gpu:utilization_avg
              expr: avg(asr_gpu_utilization) by (namespace)
```
Referencing the recorded series from the adapter and HPA avoids re-evaluating expensive queries on every scaling sync.
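To surface a recorded series through the adapter, add a matching rule. A sketch (the exposed name `asr_active_requests_avg` is our choice; the recorded series carries only a `namespace` label, so it maps to the namespace resource):
```yaml
prometheus-adapter:
  rules:
    custom:
      - seriesQuery: 'asr:requests:active_avg'
        resources:
          overrides:
            namespace: {resource: "namespace"}
        name:
          matches: "^asr:requests:active_avg$"
          as: "asr_active_requests_avg"   # colons are not valid in API metric names
        metricsQuery: "asr:requests:active_avg{<<.LabelMatchers>>}"
```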
## Alerting Rules
Create alerts for anomalies:
```yaml
kube-prometheus-stack:
  additionalPrometheusRulesMap:
    asr-alerts:
      groups:
        - name: asr_alerts
          rules:
            - alert: HighErrorRate
              expr: rate(asr_failed_requests[5m]) > 0.1
              for: 5m
              labels:
                severity: warning
              annotations:
                summary: "High ASR error rate"
                description: "Error rate is {{ $value }} errors/sec"
            - alert: HighQueueLength
              expr: asr_active_requests > 50
              for: 2m
              labels:
                severity: warning
              annotations:
                summary: "ASR queue backing up"
                description: "{{ $value }} requests queued"
            - alert: GPUMemoryHigh
              # Threshold assumes a 24 GB GPU; adjust the denominator for your hardware
              expr: asr_gpu_memory_used_bytes / 24000000000 > 0.9
              for: 5m
              labels:
                severity: warning
              annotations:
                summary: "GPU memory usage high"
                description: "GPU memory at {{ $value | humanizePercentage }}"
```
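To actually deliver these alerts, route them through Alertmanager. A minimal sketch assuming a Slack receiver (the channel and webhook URL are placeholders):
```yaml
kube-prometheus-stack:
  alertmanager:
    config:
      route:
        receiver: 'slack-asr'
      receivers:
        - name: 'slack-asr'
          slack_configs:
            - channel: '#asr-alerts'   # placeholder channel
              api_url: 'https://hooks.slack.com/services/REPLACE_ME'
```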
## Debugging Metrics
### Check Metrics Endpoint
Directly query Lightning ASR metrics:
```bash
kubectl port-forward -n smallest svc/lightning-asr 2269:2269
curl http://localhost:2269/metrics
```
Expected output:
```
# HELP asr_active_requests Current active requests
# TYPE asr_active_requests gauge
asr_active_requests{pod="lightning-asr-xxx"} 3
# HELP asr_total_requests Total requests processed
# TYPE asr_total_requests counter
asr_total_requests{pod="lightning-asr-xxx"} 1523
...
```
### Test Prometheus Query
Access Prometheus UI and test queries:
```promql
asr_active_requests
rate(asr_total_requests[5m])
histogram_quantile(0.95, rate(asr_request_duration_seconds_bucket[5m]))
```
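Another useful sanity check is the error ratio, which should stay near zero:
```promql
rate(asr_failed_requests[5m]) / rate(asr_total_requests[5m])
```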
### Check Prometheus Targets
```bash
kubectl port-forward -n default svc/smallest-prometheus-stack-prometheus 9090:9090
```
Navigate to: [http://localhost:9090/targets](http://localhost:9090/targets)
Verify that the Lightning ASR targets are "UP".
### View Prometheus Logs
```bash
kubectl logs -n default -l app.kubernetes.io/name=prometheus --tail=100
```
Look for scrape errors.
## Troubleshooting
### Metrics Not Appearing
**Check ServiceMonitor is created**:
```bash
kubectl get servicemonitor -n smallest
```
**Check Prometheus is discovering**:
```bash
kubectl logs -n default -l app.kubernetes.io/name=prometheus | grep lightning-asr
```
**Check service has metrics port**:
```bash
kubectl get svc lightning-asr -n smallest -o yaml
```
Should show:
```yaml
ports:
  - name: metrics
    port: 2269
```
### Custom Metrics Not Available
**Check Prometheus Adapter logs**:
```bash
kubectl logs -n kube-system -l app.kubernetes.io/name=prometheus-adapter
```
**Verify adapter configuration**:
```bash
kubectl get configmap prometheus-adapter -n kube-system -o yaml
```
**Test API manually**:
```bash
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .
```
### High Cardinality Issues
If Prometheus is using too much memory:
1. Reduce label cardinality
2. Lower retention (or cap it with `retentionSize`)
3. Use recording rules for complex queries
If Prometheus still needs more headroom, raise its resource requests and limits:
```yaml
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      resources:
        requests:
          memory: 4Gi
        limits:
          memory: 8Gi
```
## Best Practices
**Use recording rules.** Pre-compute expensive queries:
```yaml
- record: asr:requests:rate5m
  expr: rate(asr_total_requests[5m])
```
Then reference the recorded series in the HPA instead of the raw query.

**Tune the scrape interval.** Balance responsiveness against storage:
* Fast autoscaling: 15s
* Normal: 30s
* Cost-optimized: 60s

**Persist Prometheus data.** Always configure storage:
```yaml
storageSpec:
  volumeClaimTemplate:
    spec:
      resources:
        requests:
          storage: 100Gi
```

**Monitor Prometheus itself.** Track its performance:
* Query duration
* Scrape duration
* Memory usage
* TSDB size

**Use Grafana for operations.** Don't rely on the Prometheus UI for day-to-day work; see [Grafana Dashboards](/waves/self-host/kubernetes-setup/autoscaling/grafana-dashboards).
## What's Next?
* **Use metrics for autoscaling**: wire the custom metrics above into your HPA configuration.
* **Visualize metrics**: build dashboards on top of them; see [Grafana Dashboards](/waves/self-host/kubernetes-setup/autoscaling/grafana-dashboards).