Metrics Setup

Overview

This page focuses on collecting and validating Lightning ASR metrics with Prometheus and exposing them through the Prometheus Adapter.

Autoscaling documentation is currently under active development. Use this page as a metrics reference. If you need autoscaling now, configure your own HPA/KEDA rules using these metrics.

Architecture

Metrics flow: Lightning ASR pods expose metrics, Prometheus scrapes them via the ServiceMonitor, and the Prometheus Adapter surfaces them through the Kubernetes custom metrics API.

Components

Prometheus

Collects and stores metrics from Lightning ASR pods.

Included in the chart:

values.yaml
scaling:
  auto:
    enabled: true

kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      serviceMonitorSelectorNilUsesHelmValues: false
      retention: 7d
      resources:
        requests:
          memory: 2Gi

ServiceMonitor

Custom resource that tells Prometheus which services and ports to scrape.

Enabled for Lightning ASR:

values.yaml
scaling:
  auto:
    lightningAsr:
      servicemonitor:
        enabled: true

Prometheus Adapter

Exposes Prometheus metrics through the Kubernetes custom metrics API (custom.metrics.k8s.io) so that HPA and other controllers can consume them.

Configuration:

values.yaml
prometheus-adapter:
  prometheus:
    url: http://smallest-prometheus-stack-prometheus.default.svc
    port: 9090
  rules:
    custom:
      - seriesQuery: "asr_active_requests"
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_active_requests{<<.LabelMatchers>>}"
      - seriesQuery: "asr_batch_queue_depth"
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_batch_queue_depth{<<.LabelMatchers>>}"
      - seriesQuery: "asr_active_streams"
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_active_streams{<<.LabelMatchers>>}"
      - seriesQuery: "asr_stream_queue_depth"
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_stream_queue_depth{<<.LabelMatchers>>}"
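Each rule follows the same pattern: `seriesQuery` selects the Prometheus series, `resources.overrides` maps the `namespace` and `pod` labels back to Kubernetes objects, and `name` rewrites the series name with a regex (here `^(.*)$` → `${1}`, the identity mapping). The renaming logic can be sketched in Python for intuition; note the adapter itself uses Go regex substitution (`${1}`) while Python uses `\1`:

```python
import re

def adapter_metric_name(series: str, matches: str = r"^(.*)$", as_: str = r"\1") -> str:
    """Mimic the adapter's name rewriting: match `matches` against the
    series name and substitute into the replacement template."""
    return re.sub(matches, as_, series)

# The identity rule used above exposes the series under its own name.
print(adapter_metric_name("asr_active_requests"))  # asr_active_requests

# A hypothetical rule (matches: "^asr_(.*)$", as: "${1}") would strip the prefix.
print(adapter_metric_name("asr_batch_queue_depth", r"^asr_(.*)$", r"\1"))  # batch_queue_depth
```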

Available Metrics

Lightning ASR exposes the following metrics:

| Metric | Type | Description |
| --- | --- | --- |
| asr_active_requests | Gauge | Active batch requests currently being processed on GPU |
| asr_batch_queue_depth | Gauge | Requests waiting in the batch queue |
| asr_active_streams | Gauge | Active streaming sessions |
| asr_stream_queue_depth | Gauge | Pending sessions in the streaming Redis queue |
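All four are plain gauges served in the Prometheus text exposition format. For intuition, here is a minimal sketch of equivalent gauges using the Python prometheus_client library; this is illustrative only, not Lightning ASR's actual server code:

```python
from prometheus_client import CollectorRegistry, Gauge, generate_latest

registry = CollectorRegistry()

# The four gauges from the table above, registered on a private registry.
active_requests = Gauge("asr_active_requests",
                        "Active batch requests currently being processed on GPU",
                        registry=registry)
batch_queue = Gauge("asr_batch_queue_depth",
                    "Requests waiting in the batch queue", registry=registry)
active_streams = Gauge("asr_active_streams",
                       "Active streaming sessions", registry=registry)
stream_queue = Gauge("asr_stream_queue_depth",
                     "Pending sessions in the streaming Redis queue",
                     registry=registry)

active_requests.set(3)
batch_queue.set(2)

# generate_latest renders the standard text exposition format.
print(generate_latest(registry).decode())
```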

Verify Metrics Setup

Check Prometheus

Forward Prometheus port:

$ kubectl port-forward -n default svc/smallest-prometheus-stack-prometheus 9090:9090

Open http://localhost:9090 and verify:

  1. Status → Targets: Lightning ASR endpoints should be “UP”
  2. Graph: queries for asr_active_requests or asr_batch_queue_depth should return data
  3. Status → Service Discovery: the Lightning ASR ServiceMonitor should be listed

Check ServiceMonitor

$ kubectl get servicemonitor -n smallest

Expected output:

NAME            AGE
lightning-asr   5m

Describe ServiceMonitor:

$ kubectl describe servicemonitor lightning-asr -n smallest

Should show:

Spec:
  Endpoints:
    Port:  metrics
    Path:  /metrics
  Selector:
    Match Labels:
      app: lightning-asr

Check Prometheus Adapter

Verify custom metrics are available:

$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq -r '.resources[].name' | grep asr

Expected output:

pods/asr_active_requests
pods/asr_batch_queue_depth
pods/asr_active_streams
pods/asr_stream_queue_depth

Query specific metric:

$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/smallest/pods/*/asr_active_requests" | jq .
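The response is a MetricValueList: each item carries the pod reference and the metric value as a Kubernetes quantity string. A small sketch of pulling per-pod values out of such a response; the JSON below is hand-written to show the v1beta1 shape, not captured from a real cluster:

```python
def pod_metric_values(metric_list: dict) -> dict[str, float]:
    """Map pod name -> metric value from a custom.metrics.k8s.io
    MetricValueList. Values arrive as quantity strings; float() is
    enough for the plain integers these gauges report."""
    return {
        item["describedObject"]["name"]: float(item["value"])
        for item in metric_list["items"]
    }

# Hand-written response following the v1beta1 shape (not a real capture).
response = {
    "kind": "MetricValueList",
    "apiVersion": "custom.metrics.k8s.io/v1beta1",
    "items": [
        {"describedObject": {"kind": "Pod", "name": "lightning-asr-abc"},
         "metricName": "asr_active_requests", "value": "3"},
        {"describedObject": {"kind": "Pod", "name": "lightning-asr-def"},
         "metricName": "asr_active_requests", "value": "1"},
    ],
}
print(pod_metric_values(response))  # {'lightning-asr-abc': 3.0, 'lightning-asr-def': 1.0}
```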

Custom Metric Configuration

Add New Custom Metrics

To expose additional metrics for your own autoscaling setup:

values.yaml
prometheus-adapter:
  rules:
    custom:
      - seriesQuery: "asr_active_requests"
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_active_requests{<<.LabelMatchers>>}"

      - seriesQuery: "asr_batch_queue_depth"
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_batch_queue_depth{<<.LabelMatchers>>}"

      - seriesQuery: "asr_active_streams"
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_active_streams{<<.LabelMatchers>>}"

      - seriesQuery: "asr_stream_queue_depth"
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_stream_queue_depth{<<.LabelMatchers>>}"

Prometheus Configuration

Retention Policy

Configure how long metrics are stored:

values.yaml
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      retention: 15d
      retentionSize: "50GB"

Storage

Persist Prometheus data:

values.yaml
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      storageSpec:
        volumeClaimTemplate:
          spec:
            storageClassName: gp3
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 100Gi

Scrape Interval

Adjust how frequently metrics are collected:

values.yaml
kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      scrapeInterval: 30s
      evaluationInterval: 30s

Shorter intervals (e.g., 15s) give fresher metrics but increase storage and scrape load.
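The trade-off is easy to quantify: sample ingest scales inversely with the interval. A quick back-of-the-envelope helper (the series count is a made-up example):

```python
def daily_samples(num_series: int, scrape_interval_s: int) -> int:
    """Samples ingested per day across all series at a given interval."""
    return num_series * (86_400 // scrape_interval_s)

# 1,000 series (a made-up fleet size) at 30s vs 15s:
print(daily_samples(1_000, 30))  # 2880000
print(daily_samples(1_000, 15))  # 5760000 -> halving the interval doubles ingest
```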

Recording Rules

Pre-compute expensive queries:

kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      additionalScrapeConfigs:
        - job_name: 'lightning-asr-aggregated'
          scrape_interval: 15s
          static_configs:
            - targets: ['lightning-asr:2269']

  additionalPrometheusRulesMap:
    asr-rules:
      groups:
        - name: asr_aggregations
          interval: 30s
          rules:
            - record: asr:requests:active_avg
              expr: avg(asr_active_requests) by (namespace)

            - record: asr:batch_queue:depth_avg
              expr: avg(asr_batch_queue_depth) by (namespace)

            - record: asr:streams:active_avg
              expr: avg(asr_active_streams) by (namespace)

            - record: asr:stream_queue:depth_avg
              expr: avg(asr_stream_queue_depth) by (namespace)

Use recording rules in your autoscaling queries for better performance.
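Recorded series can be queried like any other metric. A sketch of extracting values from a Prometheus instant-query (/api/v1/query) response; the JSON is hand-written to show the shape, not captured from a live server:

```python
def instant_values(resp: dict) -> dict[str, float]:
    """Extract `namespace` label -> value pairs from a Prometheus
    /api/v1/query instant-vector response."""
    return {
        r["metric"].get("namespace", ""): float(r["value"][1])
        for r in resp["data"]["result"]
    }

# Hand-written response showing the API shape (not a live capture).
resp = {
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"__name__": "asr:batch_queue:depth_avg",
                        "namespace": "smallest"},
             "value": [1700000000.0, "2.5"]},
        ],
    },
}
print(instant_values(resp))  # {'smallest': 2.5}
```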

Alerting Rules

Create alerts for anomalies:

kube-prometheus-stack:
  additionalPrometheusRulesMap:
    asr-alerts:
      groups:
        - name: asr_alerts
          rules:
            - alert: HighBatchQueueDepth
              expr: asr_batch_queue_depth > 20
              for: 5m
              labels:
                severity: warning
              annotations:
                summary: "ASR batch queue depth is high"
                description: "{{ $value }} requests are waiting in the batch queue"

            - alert: HighStreamQueueDepth
              expr: asr_stream_queue_depth > 30
              for: 2m
              labels:
                severity: warning
              annotations:
                summary: "ASR stream queue depth is high"
                description: "{{ $value }} streaming sessions are waiting in Redis"

            - alert: HighActiveStreams
              expr: asr_active_streams > 100
              for: 5m
              labels:
                severity: warning
              annotations:
                summary: "ASR active streams are high"
                description: "{{ $value }} active streaming sessions"
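The `for:` clause means the condition must hold continuously for the whole window before the alert fires; a single dip below the threshold resets the timer. The semantics can be sketched as follows (illustrative only, not Prometheus's actual implementation):

```python
def alert_fires(samples, threshold: float, for_seconds: int, step: int = 30) -> bool:
    """True if `value > threshold` held for `for_seconds` straight, given
    evenly spaced samples `step` seconds apart. A dip resets the streak."""
    needed = for_seconds // step
    streak = 0
    for value in samples:
        streak = streak + 1 if value > threshold else 0
        if streak >= needed:
            return True
    return False

# asr_batch_queue_depth > 20 for 5m, evaluated every 30s:
print(alert_fires([25] * 10, threshold=20, for_seconds=300))                   # True
print(alert_fires([25] * 5 + [10] + [25] * 5, threshold=20, for_seconds=300))  # False
```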

Debugging Metrics

Check Metrics Endpoint

Directly query Lightning ASR metrics:

$ kubectl port-forward -n smallest svc/lightning-asr 2269:2269
$ curl http://localhost:2269/metrics

Expected output:

# HELP asr_active_requests Current active requests
# TYPE asr_active_requests gauge
asr_active_requests{pod="lightning-asr-xxx"} 3
# HELP asr_batch_queue_depth Requests waiting in the batch queue
# TYPE asr_batch_queue_depth gauge
asr_batch_queue_depth{pod="lightning-asr-xxx"} 2
# HELP asr_active_streams Active streaming sessions
# TYPE asr_active_streams gauge
asr_active_streams{pod="lightning-asr-xxx"} 14
# HELP asr_stream_queue_depth Pending sessions in the streaming Redis queue
# TYPE asr_stream_queue_depth gauge
asr_stream_queue_depth{pod="lightning-asr-xxx"} 1
...
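To sanity-check these values in a script, the exposition format is easy to parse. A minimal stdlib sketch (for anything serious, prefer prometheus_client's parser); the sample text mirrors the output above:

```python
def parse_gauges(text: str) -> dict[str, float]:
    """Pull samples out of the Prometheus text exposition format.
    Minimal: skips HELP/TYPE lines and keeps labels as part of the key."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.rpartition(" ")
        samples[name] = float(value)
    return samples

sample = """\
# HELP asr_active_requests Current active requests
# TYPE asr_active_requests gauge
asr_active_requests{pod="lightning-asr-xxx"} 3
# TYPE asr_batch_queue_depth gauge
asr_batch_queue_depth{pod="lightning-asr-xxx"} 2
"""
print(parse_gauges(sample))
```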

Test Prometheus Query

Access Prometheus UI and test queries:

1asr_active_requests
2asr_batch_queue_depth
3asr_active_streams
4asr_stream_queue_depth

Check Prometheus Targets

$ kubectl port-forward -n default svc/smallest-prometheus-stack-prometheus 9090:9090

Navigate to: http://localhost:9090/targets

Verify that the Lightning ASR targets are “UP”.

View Prometheus Logs

$ kubectl logs -n default -l app.kubernetes.io/name=prometheus --tail=100

Look for scrape errors.

Troubleshooting

Metrics Not Appearing

Check ServiceMonitor is created:

$ kubectl get servicemonitor -n smallest

Check Prometheus is discovering:

$ kubectl logs -n default -l app.kubernetes.io/name=prometheus | grep lightning-asr

Check service has metrics port:

$ kubectl get svc lightning-asr -n smallest -o yaml

Should show:

ports:
  - name: metrics
    port: 2269

Custom Metrics Not Available

Check Prometheus Adapter logs:

$ kubectl logs -n kube-system -l app.kubernetes.io/name=prometheus-adapter

Verify adapter configuration:

$ kubectl get configmap prometheus-adapter -n kube-system -o yaml

Test API manually:

$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .

High Cardinality Issues

If Prometheus is using too much memory:

  1. Reduce label cardinality
  2. Lower the retention limits (retention / retentionSize)
  3. Use recording rules for complex queries

If that isn't enough, raise the memory requests and limits:

kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      resources:
        requests:
          memory: 4Gi
        limits:
          memory: 8Gi

Best Practices

Pre-compute expensive queries:

- record: asr:batch_queue:depth_avg
  expr: avg(asr_batch_queue_depth) by (namespace)

Then reference the recorded series in your autoscaling queries instead of the raw query.

Balance responsiveness vs storage:

  • Fast autoscaling: 15s
  • Normal: 30s
  • Cost-optimized: 60s

Always persist Prometheus data:

1storageSpec:
2 volumeClaimTemplate:
3 spec:
4 resources:
5 requests:
6 storage: 100Gi

Track Prometheus performance:

  • Query duration
  • Scrape duration
  • Memory usage
  • TSDB size

Don’t rely on the Prometheus UI for day-to-day operations; use Grafana dashboards instead.

See Grafana Dashboards

What’s Next?