Metrics Setup
Overview
The metrics setup enables autoscaling by collecting Lightning ASR metrics with Prometheus and exposing them to Kubernetes HPA through the Prometheus Adapter.
Architecture
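The flow described in the overview can be sketched as a diagram (node names are illustrative):

```mermaid
flowchart LR
  ASR[Lightning ASR pods] -->|/metrics| Prom[Prometheus]
  Prom --> Adapter[Prometheus Adapter]
  Adapter -->|custom.metrics.k8s.io| HPA[Kubernetes HPA]
  HPA -->|scales| ASR
```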
Components
Prometheus
Collects and stores metrics from Lightning ASR pods.
Included in chart:
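A minimal sketch of the Helm values — the exact key names depend on how the chart vendors Prometheus, so treat these as assumptions:

```yaml
# Hypothetical chart values — adjust key names to the actual chart.
prometheus:
  enabled: true
```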
ServiceMonitor
A custom resource (defined by the Prometheus Operator's CRD) that tells Prometheus which services and ports to scrape.
Enabled for Lightning ASR:
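A sketch of the values that enable the ServiceMonitor (key names are assumptions):

```yaml
# Hypothetical chart values for the Lightning ASR ServiceMonitor.
serviceMonitor:
  enabled: true
  interval: 30s
  path: /metrics
```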
Prometheus Adapter
Translates Prometheus queries into the Kubernetes custom metrics API so the HPA can consume them.
Configuration:
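A sketch mapping `asr_active_requests` into the custom metrics API, in the values-file form used by the prometheus-community/prometheus-adapter chart (the Prometheus URL and namespace are assumptions):

```yaml
prometheus-adapter:
  prometheus:
    url: http://prometheus-server.monitoring.svc   # assumption — point at your Prometheus
    port: 80
  rules:
    custom:
      - seriesQuery: 'asr_active_requests{namespace!="",pod!=""}'
        resources:
          overrides:
            namespace: {resource: "namespace"}
            pod: {resource: "pod"}
        name:
          matches: "^asr_active_requests$"
          as: "asr_active_requests"
        metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```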
Available Metrics
Lightning ASR exposes the following metrics:
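The full metric list depends on your Lightning ASR build; the one metric used for autoscaling elsewhere in this guide is:

- `asr_active_requests` (gauge): requests currently being processed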
Verify Metrics Setup
Check Prometheus
Forward Prometheus port:
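A sketch of the port-forward (service name and namespace are assumptions — adjust to your install):

```shell
kubectl port-forward -n monitoring svc/prometheus-server 9090:80
```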
Open http://localhost:9090 and verify:
- Status → Targets: Lightning ASR endpoints should be “UP”
- Graph: querying `asr_active_requests` should return data
- Status → Service Discovery: should list the ServiceMonitor
Check ServiceMonitor
Expected output:
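A sketch of the check (the ServiceMonitor name and namespace are assumptions):

```shell
kubectl get servicemonitor -n monitoring
# NAME            AGE
# lightning-asr   5m
```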
Describe ServiceMonitor:
Should show:
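A sketch of the describe output to look for (name, namespace, and values are assumptions):

```shell
kubectl describe servicemonitor lightning-asr -n monitoring
# Endpoints:
#   Port:      metrics
#   Path:      /metrics
#   Interval:  30s
```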
Check Prometheus Adapter
Verify custom metrics are available:
Expected output:
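The custom metrics API can be listed directly; the pod metric name shown is the one configured for Lightning ASR:

```shell
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1" | jq '.resources[].name'
# "pods/asr_active_requests"
```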
Query specific metric:
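A sketch of a per-pod query (the `default` namespace is an assumption):

```shell
kubectl get --raw \
  "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/asr_active_requests" | jq .
```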
Custom Metric Configuration
Add New Custom Metrics
To expose additional metrics to HPA:
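A sketch of an additional adapter rule, using a hypothetical `asr_queue_depth` gauge as the new metric (values-file form):

```yaml
rules:
  custom:
    - seriesQuery: 'asr_queue_depth{namespace!="",pod!=""}'   # hypothetical metric
      resources:
        overrides:
          namespace: {resource: "namespace"}
          pod: {resource: "pod"}
      name:
        matches: "^asr_queue_depth$"
        as: "asr_queue_depth"
      metricsQuery: 'avg(<<.Series>>{<<.LabelMatchers>>}) by (<<.GroupBy>>)'
```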
External Metrics
For cluster-wide metrics:
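A sketch of an external-metrics rule, using a hypothetical `asr_requests_pending_total` series:

```yaml
rules:
  external:
    - seriesQuery: 'asr_requests_pending_total'   # hypothetical metric
      resources:
        overrides:
          namespace: {resource: "namespace"}
      name:
        matches: "^asr_requests_pending_total$"
        as: "asr_requests_pending"
      metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>})'
```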
Use in HPA:
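A sketch of an HPA consuming the external metric (names and the target value are assumptions):

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: lightning-asr
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: lightning-asr
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: External
      external:
        metric:
          name: asr_requests_pending   # hypothetical metric exposed by the adapter
        target:
          type: Value
          value: "100"
```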
Prometheus Configuration
Retention Policy
Configure how long metrics are stored:
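A sketch in the values-file form of the prometheus-community/prometheus chart (key path is an assumption):

```yaml
prometheus:
  server:
    retention: 15d
```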
Storage
Persist Prometheus data:
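A sketch of persistence settings (key path, size, and storage class are assumptions):

```yaml
prometheus:
  server:
    persistentVolume:
      enabled: true
      size: 50Gi
      storageClass: standard   # assumption — use a class available in your cluster
```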
Scrape Interval
Adjust how frequently metrics are collected:
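A sketch of a per-target interval override (key names are assumptions; a global `scrape_interval` may also apply):

```yaml
serviceMonitor:
  interval: 15s
```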
Lower intervals (e.g., 15s) provide faster HPA response but increase storage.
Recording Rules
Pre-compute expensive queries:
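A sketch of a recording-rule group over `asr_active_requests` (the recorded name and interval are assumptions; where the rule file is mounted is chart-specific):

```yaml
groups:
  - name: asr.rules
    interval: 30s
    rules:
      - record: asr:active_requests:avg
        expr: avg(asr_active_requests) by (namespace)
```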
Use recording rules in HPA for better performance.
Alerting Rules
Create alerts for anomalies:
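A sketch of an alerting rule (alert name, threshold, and labels are assumptions):

```yaml
groups:
  - name: asr.alerts
    rules:
      - alert: ASRHighActiveRequests
        expr: avg(asr_active_requests) > 50   # threshold is an assumption
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Lightning ASR active requests are high"
```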
Debugging Metrics
Check Metrics Endpoint
Directly query Lightning ASR metrics:
Expected output:
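A sketch of the check (service name and port are assumptions; the `# HELP`/`# TYPE` lines are literal Prometheus exposition output):

```shell
kubectl port-forward svc/lightning-asr 8080:80 &
curl -s http://localhost:8080/metrics | grep asr_
# HELP asr_active_requests Requests currently being processed
# TYPE asr_active_requests gauge
# asr_active_requests 3
```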
Test Prometheus Query
Access Prometheus UI and test queries:
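Some queries to try, from the raw gauge to aggregates (PromQL):

```promql
asr_active_requests
avg(asr_active_requests) by (pod)
avg_over_time(asr_active_requests[5m])
```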
Check Prometheus Targets
Navigate to: http://localhost:9090/targets
Verify Lightning ASR targets are “UP”
View Prometheus Logs
Look for scrape errors.
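A sketch of pulling the logs (deployment name and namespace are assumptions):

```shell
kubectl logs -n monitoring deploy/prometheus-server | grep -iE "scrape|err"
```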
Troubleshooting
Metrics Not Appearing
Check ServiceMonitor is created:
Check Prometheus is discovering:
Check service has metrics port:
Should show:
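The three checks above can be sketched as follows (resource names and namespaces are assumptions):

```shell
# 1. ServiceMonitor exists
kubectl get servicemonitor -A | grep -i asr

# 2. Prometheus is discovering it (then check Status → Service Discovery in the UI)
kubectl port-forward -n monitoring svc/prometheus-server 9090:80

# 3. The Service exposes a named metrics port matching the ServiceMonitor
kubectl get svc lightning-asr -o jsonpath='{.spec.ports[*].name}'
```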
Custom Metrics Not Available
Check Prometheus Adapter logs:
Verify adapter configuration:
Test API manually:
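The three steps above can be sketched as follows (deployment and ConfigMap names are assumptions):

```shell
kubectl logs -n monitoring deploy/prometheus-adapter
kubectl get configmap -n monitoring prometheus-adapter -o yaml
kubectl get --raw \
  "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/asr_active_requests"
```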
High Cardinality Issues
If Prometheus is using too much memory:
- Reduce label cardinality
- Lower the retention period (or set a TSDB size limit)
- Use recording rules for complex queries
Best Practices
Use Recording Rules
Pre-compute expensive queries:
Then reference the recorded metric in the HPA instead of the raw query.
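Recorded series names typically contain colons, so they are commonly renamed when exposed through the adapter. A sketch in values-file form (rule and metric names are assumptions):

```yaml
rules:
  custom:
    - seriesQuery: 'asr:active_requests:avg{namespace!=""}'
      resources:
        overrides:
          namespace: {resource: "namespace"}
      name:
        matches: "asr:active_requests:avg"
        as: "asr_active_requests_avg"
      metricsQuery: 'sum(<<.Series>>{<<.LabelMatchers>>})'
```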
Set Appropriate Scrape Intervals
Balance responsiveness vs storage:
- Fast autoscaling: 15s
- Normal: 30s
- Cost-optimized: 60s
Enable Persistence
Always persist Prometheus data:
Monitor Prometheus Itself
Track Prometheus performance:
- Query duration
- Scrape duration
- Memory usage
- TSDB size
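The four signals above map onto Prometheus's own self-metrics; for example (PromQL):

```promql
prometheus_engine_query_duration_seconds{quantile="0.99"}  # query duration
scrape_duration_seconds                                    # scrape duration
process_resident_memory_bytes{job="prometheus"}            # memory usage
prometheus_tsdb_storage_blocks_bytes                       # TSDB size
```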

