Metrics Setup | Smallest AI Docs

Overview

This page focuses on collecting and validating Lightning ASR metrics with Prometheus and exposing them through the Prometheus Adapter.

Autoscaling documentation is currently under active development. Use this page as a metrics reference. If you need autoscaling now, configure your own HPA/KEDA rules using these metrics.

Architecture

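Lightning ASR pods expose metrics at /metrics on port 2269 → the ServiceMonitor tells Prometheus to scrape them → Prometheus stores the series → the Prometheus Adapter serves them through the Kubernetes custom metrics API (custom.metrics.k8s.io) → your HPA or other autoscaling rules read them.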

Components

Prometheus

Collects and stores metrics from Lightning ASR pods.

Included in the chart through the kube-prometheus-stack dependency:

values.yaml

scaling:
  auto:
    enabled: true

kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      # Select all ServiceMonitors, not only those carrying this Helm release's labels
      serviceMonitorSelectorNilUsesHelmValues: false
      retention: 7d
      resources:
        requests:
          memory: 2Gi

ServiceMonitor

A custom resource (defined by the Prometheus Operator's ServiceMonitor CRD) that tells Prometheus which Services to scrape.

Enabled for Lightning ASR:

values.yaml

scaling:
  auto:
    lightningAsr:
      servicemonitor:
        enabled: true
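The chart generates this ServiceMonitor for you. For reference, a standalone manifest with the same effect might look like the sketch below. The field values mirror the `kubectl describe servicemonitor lightning-asr` output shown under Check ServiceMonitor; the manifest the chart actually renders may contain additional fields. No `release` label is included because `serviceMonitorSelectorNilUsesHelmValues: false` lets Prometheus select ServiceMonitors without it.

```yaml
# Sketch of a ServiceMonitor equivalent to the chart-generated one (not the chart's exact output).
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: lightning-asr
  namespace: smallest
spec:
  selector:
    matchLabels:
      app: lightning-asr      # must match the labels on the lightning-asr Service
  endpoints:
    - port: metrics           # named Service port (2269 in this setup)
      path: /metrics
```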

Prometheus Adapter

Serves Prometheus metrics through the Kubernetes custom metrics API (custom.metrics.k8s.io), where autoscalers such as the HPA can read them.

Configuration:

values.yaml

prometheus-adapter:
  prometheus:
    url: http://smallest-prometheus-stack-prometheus.default.svc
    port: 9090
  rules:
    custom:
      - seriesQuery: "asr_active_requests"
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_active_requests{<<.LabelMatchers>>}"
      - seriesQuery: "asr_batch_queue_depth"
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_batch_queue_depth{<<.LabelMatchers>>}"
      - seriesQuery: "asr_active_streams"
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_active_streams{<<.LabelMatchers>>}"
      - seriesQuery: "asr_stream_queue_depth"
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_stream_queue_depth{<<.LabelMatchers>>}"

Available Metrics

Lightning ASR exposes the following metrics:

Metric	Type	Description
`asr_active_requests`	Gauge	Active batch requests currently being processed on GPU
`asr_batch_queue_depth`	Gauge	Requests waiting in the batch queue
`asr_active_streams`	Gauge	Active streaming sessions
`asr_stream_queue_depth`	Gauge	Pending sessions in the streaming Redis queue

Verify Metrics Setup

Check Prometheus

Forward Prometheus port:

$ kubectl port-forward -n default svc/smallest-prometheus-stack-prometheus 9090:9090

Open http://localhost:9090 and verify:

Status → Targets: the Lightning ASR endpoints should be “UP”
Graph: querying asr_active_requests or asr_batch_queue_depth should return data
Status → Service Discovery: the lightning-asr ServiceMonitor should be listed

Check ServiceMonitor

$ kubectl get servicemonitor -n smallest

Expected output:

NAME            AGE
lightning-asr   5m

Describe ServiceMonitor:

$ kubectl describe servicemonitor lightning-asr -n smallest

Should show:

Spec:
  Endpoints:
    Port: metrics
    Path: /metrics
  Selector:
    Match Labels:
      app: lightning-asr

Check Prometheus Adapter

Verify custom metrics are available:

$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq -r '.resources[].name' | grep asr

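Expected output (namespaces/asr_* entries may also appear, because each rule maps the namespace label as well as the pod label):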

pods/asr_active_requests
pods/asr_batch_queue_depth
pods/asr_active_streams
pods/asr_stream_queue_depth

Query specific metric:

$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/smallest/pods/*/asr_active_requests" | jq .
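Once this query returns values, a HorizontalPodAutoscaler can consume the metrics directly. The sketch below assumes Lightning ASR runs as a Deployment named `lightning-asr` in the `smallest` namespace; the HPA name, replica bounds, and target averages are placeholders to tune against your own load tests, not recommended values.

```yaml
# Sketch: HPA driven by the per-pod custom metrics exposed by the Prometheus Adapter.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: lightning-asr            # hypothetical name
  namespace: smallest
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment             # assumption: Lightning ASR is a Deployment named lightning-asr
    name: lightning-asr
  minReplicas: 1
  maxReplicas: 4                 # placeholder; size to your GPU capacity
  metrics:
    - type: Pods
      pods:
        metric:
          name: asr_active_requests
        target:
          type: AverageValue
          averageValue: "2"      # placeholder: in-flight batch requests per pod
    - type: Pods
      pods:
        metric:
          name: asr_active_streams
        target:
          type: AverageValue
          averageValue: "10"     # placeholder: streaming sessions per pod
```

When several metrics are listed, the HPA computes a desired replica count for each one and scales to the largest.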

Custom Metric Configuration

Add New Custom Metrics

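To expose additional metrics for your own autoscaling setup, add rules under prometheus-adapter.rules.custom. Helm replaces lists instead of merging them, so the list you set must repeat the existing ASR rules alongside any new entries: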

values.yaml

prometheus-adapter:
  rules:
    custom:
      - seriesQuery: "asr_active_requests"
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_active_requests{<<.LabelMatchers>>}"

      - seriesQuery: "asr_batch_queue_depth"
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_batch_queue_depth{<<.LabelMatchers>>}"

      - seriesQuery: "asr_active_streams"
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_active_streams{<<.LabelMatchers>>}"

      - seriesQuery: "asr_stream_queue_depth"
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^(.*)$"
          as: "${1}"
        metricsQuery: "asr_stream_queue_depth{<<.LabelMatchers>>}"
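New rules follow the same shape. As an illustration, the entry below would publish a per-second rate for a counter. `asr_requests_total` is a hypothetical name used only for this sketch; it is not one of the metrics listed under Available Metrics, so substitute a counter your deployment actually exports and keep the four asr_* rules above in the same list.

```yaml
prometheus-adapter:
  rules:
    custom:
      # ...the four existing asr_* rules stay in this list...
      # Hypothetical counter: replace asr_requests_total with a real metric name.
      - seriesQuery: 'asr_requests_total{namespace!="",pod!=""}'
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          # Strip the _total suffix and publish the result as <name>_per_second
          matches: "^(.*)_total$"
          as: "${1}_per_second"
        # Per-second rate over a 2-minute window, grouped by the labels the API request needs
        metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'
```

If the rule matches, pods/asr_requests_per_second would appear in the custom metrics API listing.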

Prometheus Configuration

Retention Policy

Configure how long metrics are stored. When both limits are set, Prometheus removes the oldest data as soon as either one is reached:

values.yaml

kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      retention: 15d
      retentionSize: "50GB"

Storage

Persist Prometheus data:

values.yaml

kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      storageSpec:
        volumeClaimTemplate:
          spec:
            storageClassName: gp3   # use a StorageClass available in your cluster
            accessModes: ["ReadWriteOnce"]
            resources:
              requests:
                storage: 100Gi

Scrape Interval

Adjust how frequently metrics are collected:

values.yaml

kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      scrapeInterval: 30s
      evaluationInterval: 30s

Shorter intervals (e.g., 15s) make metrics, and any autoscaling built on them, react faster, but they increase storage use and scrape load.
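The global scrapeInterval applies to every target. To scrape only Lightning ASR more often, a ServiceMonitor endpoint can carry its own interval. Whether the chart's scaling.auto.lightningAsr.servicemonitor values expose this setting is not covered on this page, so the fragment below is written against the ServiceMonitor resource itself, and the values are illustrative.

```yaml
# Fragment of a ServiceMonitor (monitoring.coreos.com/v1) spec.
spec:
  endpoints:
    - port: metrics
      path: /metrics
      interval: 15s        # overrides prometheusSpec.scrapeInterval for this endpoint only
      scrapeTimeout: 10s   # must not be longer than interval
```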

Recording Rules

Pre-compute expensive queries:

kube-prometheus-stack:
  # Top-level chart key, not part of prometheus.prometheusSpec
  additionalPrometheusRulesMap:
    asr-rules:
      groups:
        - name: asr_aggregations
          interval: 30s
          rules:
            - record: asr:requests:active_avg
              expr: avg(asr_active_requests) by (namespace)

            - record: asr:batch_queue:depth_avg
              expr: avg(asr_batch_queue_depth) by (namespace)

            - record: asr:streams:active_avg
              expr: avg(asr_active_streams) by (namespace)

            - record: asr:stream_queue:depth_avg
              expr: avg(asr_stream_queue_depth) by (namespace)

Use recording rules in your autoscaling queries for better performance.
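As one example, a KEDA ScaledObject can query a recorded series through KEDA's Prometheus scaler. This sketch assumes KEDA is installed, the Deployment is named `lightning-asr` in `smallest`, and Prometheus is reachable at the same address configured for the Prometheus Adapter; the object name, threshold, and replica bounds are placeholders.

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: lightning-asr-batch          # hypothetical name
  namespace: smallest
spec:
  scaleTargetRef:
    name: lightning-asr              # assumed Deployment name
  minReplicaCount: 1
  maxReplicaCount: 4                 # placeholder; size to your GPU capacity
  triggers:
    - type: prometheus
      # The recorded series is already a per-pod average, so compare it as a plain
      # Value: desired replicas = ceil(current replicas * value / threshold).
      metricType: Value
      metadata:
        serverAddress: http://smallest-prometheus-stack-prometheus.default.svc:9090
        query: 'asr:batch_queue:depth_avg{namespace="smallest"}'
        threshold: "5"               # placeholder: queued requests per pod
```

KEDA's default metric type, AverageValue, treats the query result as a total to spread across pods, which fits a sum() query better than an average. KEDA also creates and manages its own HPA for the target, so do not attach a separate HPA to the same Deployment.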

Alerting Rules

Create alerts for anomalies:

kube-prometheus-stack:
  # Top-level chart key, not part of prometheus.prometheusSpec
  additionalPrometheusRulesMap:
    asr-alerts:
      groups:
        - name: asr_alerts
          rules:
            - alert: HighBatchQueueDepth
              expr: asr_batch_queue_depth > 20
              for: 5m
              labels:
                severity: warning
              annotations:
                summary: "ASR batch queue depth is high"
                description: "{{ $value }} requests are waiting in the batch queue"

            - alert: HighStreamQueueDepth
              expr: asr_stream_queue_depth > 30
              for: 2m
              labels:
                severity: warning
              annotations:
                summary: "ASR stream queue depth is high"
                description: "{{ $value }} streaming sessions are waiting in Redis"

            - alert: HighActiveStreams
              expr: asr_active_streams > 100
              for: 5m
              labels:
                severity: warning
              annotations:
                summary: "ASR active streams are high"
                description: "{{ $value }} active streaming sessions"
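Threshold alerts stay silent if scraping stops altogether, because an expression that matches no series never fires. A common complement is an absent() check. The rule below is a sketch to add to the same rules list of the asr_alerts group; its name, duration, and wording are illustrative.

```yaml
# Fires when no asr_active_requests series has been ingested for 10 minutes.
- alert: ASRMetricsAbsent          # hypothetical alert name
  expr: absent(asr_active_requests)
  for: 10m
  labels:
    severity: warning
  annotations:
    summary: "Lightning ASR metrics are missing"
    description: "No asr_active_requests series has been scraped for 10 minutes; check the ServiceMonitor and the metrics port"
```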

Debugging Metrics

Check Metrics Endpoint

Directly query Lightning ASR metrics:

$ kubectl port-forward -n smallest svc/lightning-asr 2269:2269
$ curl http://localhost:2269/metrics

Expected output:

# HELP asr_active_requests Current active requests
# TYPE asr_active_requests gauge
asr_active_requests{pod="lightning-asr-xxx"} 3
# HELP asr_batch_queue_depth Requests waiting in the batch queue
# TYPE asr_batch_queue_depth gauge
asr_batch_queue_depth{pod="lightning-asr-xxx"} 2
# HELP asr_active_streams Active streaming sessions
# TYPE asr_active_streams gauge
asr_active_streams{pod="lightning-asr-xxx"} 14
# HELP asr_stream_queue_depth Pending sessions in the streaming Redis queue
# TYPE asr_stream_queue_depth gauge
asr_stream_queue_depth{pod="lightning-asr-xxx"} 1
...

Test Prometheus Query

Access Prometheus UI and test queries:

asr_active_requests
asr_batch_queue_depth
asr_active_streams
asr_stream_queue_depth

Check Prometheus Targets

$ kubectl port-forward -n default svc/smallest-prometheus-stack-prometheus 9090:9090

Navigate to: http://localhost:9090/targets

Verify Lightning ASR targets are “UP”

View Prometheus Logs

$ kubectl logs -n default -l app.kubernetes.io/name=prometheus -c prometheus --tail=100

Look for scrape errors.

Troubleshooting

Metrics Not Appearing

Check that the ServiceMonitor exists:

$ kubectl get servicemonitor -n smallest

Check that Prometheus is discovering the target:

$ kubectl logs -n default -l app.kubernetes.io/name=prometheus -c prometheus | grep lightning-asr

Check that the Service exposes the metrics port:

$ kubectl get svc lightning-asr -n smallest -o yaml

Should show:

ports:
  - name: metrics
    port: 2269

Custom Metrics Not Available

Check Prometheus Adapter logs:

$ kubectl logs -n kube-system -l app.kubernetes.io/name=prometheus-adapter

Verify adapter configuration:

$ kubectl get configmap prometheus-adapter -n kube-system -o yaml

Test API manually:

$ kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq .

High Cardinality Issues

If Prometheus is using too much memory:

Reduce label cardinality
Raise the Prometheus memory requests and limits
Use recording rules for complex queries

kube-prometheus-stack:
  prometheus:
    prometheusSpec:
      resources:
        requests:
          memory: 4Gi
        limits:
          memory: 8Gi
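For reducing label cardinality, one option is to drop unneeded series at scrape time. The fragment below, written against a ServiceMonitor endpoint (monitoring.coreos.com/v1), keeps only metrics whose names start with asr_. It assumes you need nothing else from the Lightning ASR endpoint, such as process or runtime metrics; widen the regex if you do.

```yaml
# Fragment of a ServiceMonitor (monitoring.coreos.com/v1) spec.
spec:
  endpoints:
    - port: metrics
      path: /metrics
      metricRelabelings:
        # Applied after the scrape, before ingestion: series whose metric
        # name does not match asr_.* are discarded.
        - sourceLabels: [__name__]
          regex: "asr_.*"
          action: keep
```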

Best Practices

Use Recording Rules

Pre-compute expensive queries:

- record: asr:batch_queue:depth_avg
  expr: avg(asr_batch_queue_depth) by (namespace)

Then reference asr:batch_queue:depth_avg in your autoscaling logic instead of the raw query.

Set Appropriate Scrape Intervals

Balance responsiveness vs storage:

Fast autoscaling: 15s
Normal: 30s
Cost-optimized: 60s

Enable Persistence

Always persist Prometheus data:

# Under kube-prometheus-stack.prometheus.prometheusSpec
storageSpec:
  volumeClaimTemplate:
    spec:
      resources:
        requests:
          storage: 100Gi

Monitor Prometheus Itself

Track Prometheus performance:

Query duration
Scrape duration
Memory usage
TSDB size
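These signals come from Prometheus's own metrics and from the kubelet's cAdvisor endpoint, which kube-prometheus-stack scrapes by default. A sketch of recording rules for them follows. The rule-map key and rule names are illustrative, and the container="prometheus" selector assumes the Prometheus Operator's default container name. additionalPrometheusRulesMap is a top-level key of the kube-prometheus-stack chart.

```yaml
kube-prometheus-stack:
  additionalPrometheusRulesMap:
    prometheus-self:                       # hypothetical rule-map key
      groups:
        - name: prometheus_self_monitoring
          rules:
            # Query duration: 90th percentile per query stage (summary metric)
            - record: prometheus:query_duration_seconds:p90
              expr: max(prometheus_engine_query_duration_seconds{quantile="0.9"}) by (slice)
            # Scrape duration: slowest target per job
            - record: job:scrape_duration_seconds:max
              expr: max(scrape_duration_seconds) by (job)
            # Memory usage: working set of the prometheus container (cAdvisor)
            - record: prometheus:memory_working_set_bytes:max
              expr: max(container_memory_working_set_bytes{container="prometheus"})
            # TSDB size: bytes used by persisted blocks
            - record: prometheus:tsdb_blocks_bytes:max
              expr: max(prometheus_tsdb_storage_blocks_bytes)
```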

Use Grafana for Visualization

Don’t rely on the Prometheus UI for day-to-day monitoring

Use Grafana dashboards for operations

See Grafana Dashboards

What’s Next?

Grafana Dashboards

Visualize metrics