---
title: Grafana Dashboards
description: 'Visualize metrics, autoscaling behavior, and system performance'
---
## Overview
Grafana provides powerful visualization of Lightning ASR metrics, autoscaling behavior, and system performance. This guide covers accessing Grafana, importing dashboards, and creating custom visualizations.
## Access Grafana
### Enable Grafana
Ensure Grafana is enabled in your Helm values:
```yaml values.yaml
scaling:
  auto:
    enabled: true

kube-prometheus-stack:
  grafana:
    enabled: true
    adminPassword: "admin-password"
```
### Port Forward
Access Grafana locally:
```bash
kubectl port-forward -n default svc/smallest-prometheus-stack-grafana 3000:80
```
Open [http://localhost:3000](http://localhost:3000) in your browser.
### Default Credentials
* **Username**: `admin`
* **Password**: `prom-operator` (or custom password from `adminPassword`)
Change the default password immediately in production:
```yaml values.yaml
kube-prometheus-stack:
  grafana:
    adminPassword: "your-secure-password"
```
### Expose Externally
For permanent access, expose Grafana through a LoadBalancer service or an Ingress.

Using a LoadBalancer service:
```yaml values.yaml
kube-prometheus-stack:
  grafana:
    service:
      type: LoadBalancer
```
Using an Ingress:
```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: default
spec:
  rules:
    - host: grafana.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: smallest-prometheus-stack-grafana
                port:
                  number: 80
```
## Import ASR Dashboard
The Smallest Self-Host repository includes a pre-built ASR dashboard.
### Import from File
The dashboard is available at `grafana/dashboards/asr-dashboard.json` in the repository. In Grafana, go to **Dashboards → Import**, then:
* Click "Upload JSON file"
* Select `asr-dashboard.json`
* Click "Load"
* Select the `Prometheus` data source when prompted
* Click "Import"
### Import via ConfigMap
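Load the dashboard automatically on Grafana startup. The Grafana dashboard sidecar enabled by default in kube-prometheus-stack picks up ConfigMaps labeled `grafana_dashboard: "1"`:
```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: asr-dashboard
  namespace: default
  labels:
    grafana_dashboard: "1"
data:
  # The value is the dashboard JSON model itself (the contents of asr-dashboard.json),
  # not the {"dashboard": ..., "overwrite": true} payload used by Grafana's HTTP import API.
  asr-dashboard.json: |
    {
      ...
    }
```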
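Pasting a large dashboard into YAML by hand is error-prone. A minimal sketch that wraps an exported dashboard in a ConfigMap manifest, assuming the sidecar watches the `grafana_dashboard: "1"` label and reads the raw dashboard model; the helper names and output file name are hypothetical:

```python
import json


def build_dashboard_configmap(dashboard: dict, name: str = "asr-dashboard",
                              namespace: str = "default") -> dict:
    """Wrap a dashboard JSON model in a ConfigMap the Grafana sidecar can discover."""
    return {
        "apiVersion": "v1",
        "kind": "ConfigMap",
        "metadata": {
            "name": name,
            "namespace": namespace,
            # Label the dashboard sidecar is assumed to watch for.
            "labels": {"grafana_dashboard": "1"},
        },
        # The data key becomes the file name the sidecar writes for Grafana.
        "data": {f"{name}.json": json.dumps(dashboard, indent=2)},
    }


def write_configmap(dashboard_path: str, manifest_path: str) -> None:
    """Read an exported dashboard and write the ConfigMap as a JSON manifest."""
    with open(dashboard_path, encoding="utf-8") as src:
        dashboard = json.load(src)
    with open(manifest_path, "w", encoding="utf-8") as dst:
        # kubectl accepts JSON manifests, so no YAML library is required.
        json.dump(build_dashboard_configmap(dashboard), dst, indent=2)
```

For example, `write_configmap("grafana/dashboards/asr-dashboard.json", "asr-dashboard-configmap.json")` produces a manifest you can apply with `kubectl apply -f asr-dashboard-configmap.json`.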
Or enable via Helm:
```yaml values.yaml
kube-prometheus-stack:
  grafana:
    dashboardProviders:
      dashboardproviders.yaml:
        apiVersion: 1
        providers:
          - name: 'default'
            folder: 'Smallest'
            type: file
            options:
              path: /var/lib/grafana/dashboards/default
    dashboards:
      default:
        asr-dashboard:
          file: dashboards/asr-dashboard.json
```
## ASR Dashboard Overview
The pre-built dashboard includes the following panels:
### Active Requests
Shows current requests being processed:
* **Metric**: `asr_active_requests`
* **Visualization**: Stat panel with thresholds
* **Colors**:
  * Green: below 5 requests
  * Yellow: 5 to below 10 requests
  * Orange: 10 to below 20 requests
  * Red: 20 or more requests
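Each color applies from its lower bound upward, which is how Grafana's "absolute" thresholds behave. The sketch below mirrors those steps and renders them in the shape Grafana uses under a panel's `fieldConfig.defaults.thresholds`; the constant and function names are illustrative, not taken from the shipped dashboard:

```python
# Threshold steps from the Active Requests panel: (lower bound, color).
ACTIVE_REQUEST_STEPS = [(0, "green"), (5, "yellow"), (10, "orange"), (20, "red")]


def threshold_color(value: float) -> str:
    """Return the color of the highest step whose lower bound is <= value."""
    color = ACTIVE_REQUEST_STEPS[0][1]  # Base step covers everything below 5.
    for lower_bound, step_color in ACTIVE_REQUEST_STEPS:
        if value >= lower_bound:
            color = step_color
    return color


def grafana_thresholds() -> dict:
    """Render the same steps as a panel's fieldConfig thresholds object."""
    # The base step has no lower bound, encoded as null in dashboard JSON.
    steps = [{"color": ACTIVE_REQUEST_STEPS[0][1], "value": None}]
    steps += [{"color": c, "value": v} for v, c in ACTIVE_REQUEST_STEPS[1:]]
    return {"mode": "absolute", "steps": steps}


for active in (3, 5, 12, 25):
    print(active, threshold_color(active))
```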
### Request Rate
Requests per second over time:
* **Metric**: `rate(asr_total_requests[5m])`
* **Visualization**: Time series graph
* **Use**: Track traffic patterns
### Error Rate
Failed requests percentage:
* **Metric**: `rate(asr_failed_requests[5m]) / rate(asr_total_requests[5m]) * 100`
* **Visualization**: Stat panel + time series
* **Alert**: Warning if > 5%
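Because both `rate()` calls use the same 5-minute window, the ratio reduces to the increase in failed requests divided by the increase in total requests. A minimal sketch of that arithmetic from two counter samples (it ignores counter resets, which `rate()` handles for you; the function names are hypothetical):

```python
from typing import Optional


def error_rate_percent(failed_start: float, failed_end: float,
                       total_start: float, total_end: float) -> Optional[float]:
    """Percentage of failed requests between two samples of the counters."""
    total_increase = total_end - total_start
    if total_increase <= 0:
        # No traffic (or a counter reset, not handled here). Prometheus
        # would return NaN for 0/0 in this case.
        return None
    return 100 * (failed_end - failed_start) / total_increase


def should_warn(error_percent: Optional[float], threshold: float = 5.0) -> bool:
    """Mirror the panel's warning: trigger only when the error rate exceeds 5%."""
    return error_percent is not None and error_percent > threshold


# 6 failures out of 100 requests in the window -> 6.0%, above the 5% warning.
pct = error_rate_percent(10, 16, 1_000, 1_100)
print(pct, should_warn(pct))
```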
### Response Time
Request duration percentiles:
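* **Metrics** (bucket counters are converted to 5-minute rates and summed by `le`, so the percentiles reflect recent traffic rather than everything since the counters started):
  * P50: `histogram_quantile(0.50, sum(rate(asr_request_duration_seconds_bucket[5m])) by (le))`
  * P95: `histogram_quantile(0.95, sum(rate(asr_request_duration_seconds_bucket[5m])) by (le))`
  * P99: `histogram_quantile(0.99, sum(rate(asr_request_duration_seconds_bucket[5m])) by (le))`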
* **Visualization**: Time series graph
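Percentiles from a histogram are estimates: Prometheus finds the bucket containing the target rank and interpolates linearly inside it. The sketch below reproduces that estimation for cumulative `(upper_bound, count)` pairs such as the per-`le` values of `sum(rate(asr_request_duration_seconds_bucket[5m])) by (le)`. It assumes non-negative bucket bounds and `0 < q <= 1`; the bucket values are made up:

```python
import math


def histogram_quantile(q: float, buckets: list) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets."""
    if not 0 < q <= 1:
        raise ValueError("this sketch only handles 0 < q <= 1")
    buckets = sorted(buckets)
    if len(buckets) < 2 or buckets[-1][0] != math.inf:
        return math.nan  # Needs at least one finite bucket plus the +Inf bucket.
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    lower_bound, count_below = 0.0, 0.0
    for upper_bound, cumulative in buckets[:-1]:
        if cumulative >= rank:
            # Interpolate linearly inside the bucket that contains the rank.
            fraction = (rank - count_below) / (cumulative - count_below)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, count_below = upper_bound, cumulative
    # The rank falls in the +Inf bucket: report the highest finite bound.
    return lower_bound


# Hypothetical bucket rates for le = 0.1, 0.5, 1.0 and +Inf seconds.
rates = [(0.1, 50.0), (0.5, 90.0), (1.0, 99.0), (math.inf, 100.0)]
for q in (0.50, 0.95, 0.99):
    print(q, round(histogram_quantile(q, rates), 3))
```

The estimate can never be more precise than the bucket layout: any quantile that lands in the `+Inf` bucket is reported as the largest finite bound.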
### Pod Count
Number of Lightning ASR replicas:
* **Metric**: `count(asr_active_requests)`
* **Visualization**: Stat panel
* **Use**: Monitor autoscaling
### GPU Utilization
GPU usage per pod:
* **Metric**: `asr_gpu_utilization`
* **Visualization**: Time series graph
* **Use**: Ensure GPUs are utilized
### GPU Memory
GPU memory usage per pod, in GiB:
* **Metric**: `asr_gpu_memory_used_bytes / 1024 / 1024 / 1024`
* **Visualization**: Gauge + time series
* **Use**: Spot memory leaks (usage that keeps climbing instead of leveling off)
## Create Custom Dashboards
### Add New Dashboard
Grafana → Dashboards → New Dashboard
Click "Add panel"
* Data source: Prometheus
* Metric: `asr_active_requests`
* Legend: `{{pod}}`
* Choose visualization type (Time series, Stat, Gauge, etc.)
* Configure thresholds
* Set units and decimals
Click "Save dashboard"
Enter name: "Custom ASR Dashboard"
### Useful Queries
#### Average Active Requests
```promql
avg(asr_active_requests)
```
#### Total Throughput (requests/hour)
```promql
sum(rate(asr_total_requests[1h])) * 3600
```
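#### Pod Memory Usage (GiB)
```promql
sum(container_memory_usage_bytes{pod=~"lightning-asr.*", container!=""}) by (pod) / 1024 / 1024 / 1024
```
The `container!=""` filter drops the pod-level cgroup series that cAdvisor also exports, which would otherwise double-count each pod's memory.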
#### Autoscaling Events
```promql
kube_deployment_status_replicas{deployment="lightning-asr"}
```
#### GPU Temperature
```promql
asr_gpu_temperature_celsius
```
## Dashboard Variables
Add variables for dynamic filtering:
### Namespace Variable
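Open dashboard settings (gear icon) → **Variables** → **Add variable**, then set:
* **Name**: `namespace`
* **Type**: Query
* **Data source**: Prometheus
* **Query**: `label_values(asr_active_requests, namespace)`
* **Multi-value**: Enabled

Update panels to use the variable. Because multi-value is enabled, Grafana substitutes a regex such as `(ns1|ns2)` when several namespaces are selected, so use the regex matcher `=~`:
```promql
asr_active_requests{namespace=~"$namespace"}
```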
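With multi-value enabled, Grafana renders `$namespace` as a regex alternation when several values are selected, and Prometheus regex matchers are fully anchored. The sketch below approximates that substitution; it uses Python's `re.escape`, which is close to but not identical to Grafana's own escaping, and the function names are hypothetical:

```python
import re


def render_multi_value(values: list) -> str:
    """Approximate how Grafana renders a multi-value variable for Prometheus.

    One selection is inserted as-is (regex-escaped); several selections
    become an alternation group such as (default|staging).
    """
    escaped = [re.escape(v) for v in values]
    return escaped[0] if len(escaped) == 1 else "(" + "|".join(escaped) + ")"


def regex_matcher_selects(label_value: str, rendered: str) -> bool:
    """Prometheus =~ matchers are fully anchored, like re.fullmatch."""
    return re.fullmatch(rendered, label_value) is not None


rendered = render_multi_value(["default", "staging"])
print(f'asr_active_requests{{namespace=~"{rendered}"}}')
```

An equality matcher (`=`) would compare the label against the literal string `(default|staging)` and match nothing, which is why multi-value variables pair with `=~`.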
### Pod Variable
Add a second query variable whose options follow the selected namespaces:
```
label_values(asr_active_requests{namespace=~"$namespace"}, pod)
```
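### Interval Variable
Grafana provides built-in interval variables that adapt to the dashboard's time range, so you do not need to create them. `$__interval` follows the panel's resolution, and `$__rate_interval` is the safer choice inside `rate()`:
```promql
sum(rate(asr_total_requests[$__rate_interval]))
```
Use them in queries for dynamic aggregation.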
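## Alerting
The steps below use Grafana's legacy dashboard alerting. In Grafana 9 and later, where unified alerting is the default, create the equivalent rule under **Alerting → Alert rules** and send notifications through **Alerting → Contact points**.
### Configure Alert Rules
Open the panel editor and select the **Alert** tab, then set:
* **Name**: High Active Requests
* **Evaluate every**: 1m
* **For**: 5m (the condition must hold for 5 minutes before the alert fires)

Set the condition:
```
WHEN avg() OF query(A, 5m, now) IS ABOVE 20
```
Finally:
* Choose a notification channel
* Add a message template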
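The **Evaluate every** and **For** settings interact: the rule turns pending the first time the condition holds and fires only after it has held for the full `For` duration, resetting whenever an evaluation falls back under the threshold. A simplified simulation of that behavior, assuming one value of `avg()` per evaluation (the function name is hypothetical and Grafana's exact timing may differ slightly):

```python
from datetime import timedelta


def alert_states(values: list, threshold: float = 20.0,
                 every: timedelta = timedelta(minutes=1),
                 for_duration: timedelta = timedelta(minutes=5)) -> list:
    """Return the rule state ('ok', 'pending', 'alerting') after each evaluation."""
    states = []
    pending_since = None
    for i, value in enumerate(values):
        now = i * every
        if value > threshold:  # "IS ABOVE" is a strict comparison.
            if pending_since is None:
                pending_since = now
            held = now - pending_since
            states.append("alerting" if held >= for_duration else "pending")
        else:
            pending_since = None  # Any healthy evaluation resets the timer.
            states.append("ok")
    return states


# Seven evaluations above 20: pending for the first five minutes, then alerting.
print(alert_states([25, 26, 24, 30, 28, 27, 29]))
```

A short spike that subsides within five minutes therefore never reaches your notification channel.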
### Alert Notification Channels
Configure where alerts are sent under **Grafana → Alerting → Notification channels → Add channel**. For example:
* **Email**
  * **Addresses**: [ops@example.com](mailto:ops@example.com)
* **Slack**
  * **Webhook URL**: `https://hooks.slack.com/...`
  * **Channel**: #alerts
* **PagerDuty**
  * **Integration Key**: your PagerDuty integration key
## Pre-Built Dashboard Examples
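### System Overview Dashboard
A simplified dashboard model showing the key queries (a complete export also carries fields such as each panel's `type`, `gridPos`, and `datasource`):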
```json
{
"title": "Smallest Self-Host Overview",
"panels": [
{
"title": "Active Requests",
"targets": [{"expr": "sum(asr_active_requests)"}]
},
{
"title": "Request Rate",
"targets": [{"expr": "sum(rate(asr_total_requests[5m]))"}]
},
{
"title": "Pod Count",
"targets": [{"expr": "count(asr_active_requests)"}]
},
{
"title": "Error Rate %",
"targets": [{"expr": "sum(rate(asr_failed_requests[5m])) / sum(rate(asr_total_requests[5m])) * 100"}]
}
]
}
```
### Autoscaling Dashboard
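Track HPA behavior by adding each query below as a separate query on one time series panel, so desired, current, and available replicas can be compared: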
```promql
kube_deployment_status_replicas{deployment="lightning-asr"}
kube_deployment_status_replicas_available{deployment="lightning-asr"}
kube_horizontalpodautoscaler_status_desired_replicas{horizontalpodautoscaler="lightning-asr"}
kube_horizontalpodautoscaler_status_current_replicas{horizontalpodautoscaler="lightning-asr"}
```
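Gaps between desired and current replicas are easier to read with the HPA's scaling rule in mind. Kubernetes documents it as `desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue)`, skipping any change while the ratio is within a tolerance (0.1 by default) and clamping the result to the HPA's replica bounds. The sketch below reproduces that arithmetic only; the scaling metric (average active requests per pod), the target of 5, and the replica bounds are hypothetical, and stabilization windows and scaling policies are not modeled:

```python
import math


def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         target_metric: float, min_replicas: int = 1,
                         max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Core HPA calculation for a single metric, without stabilization."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # Close enough to target: no change.
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 2 pods averaging 12 active requests against a target of 5 -> scale to 5 pods.
print(hpa_desired_replicas(2, 12, 5))
```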
### Cost Dashboard
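Monitor resource usage and estimated cost with two separate queries. The first shows the resources requested by Lightning ASR pods, split by resource type. The second counts GPU nodes (assuming their names contain `gpu`) and multiplies by `1.00`, a placeholder hourly price per node that you should replace with your actual rate: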
```promql
sum(kube_pod_container_resource_requests{pod=~"lightning-asr.*"}) by (resource)
count(kube_node_info{node=~".*gpu.*"}) * 1.00
```
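The GPU cost query is plain arithmetic, and dividing it by throughput gives a per-request view. A small sketch, where the node count, the 2.50 hourly price, and the 15,000 requests/hour (the kind of figure the Total Throughput query reports) are all hypothetical inputs:

```python
def hourly_gpu_cost(gpu_nodes: int, price_per_node_hour: float) -> float:
    """Same arithmetic as count(kube_node_info{...}) * price in the panel."""
    return gpu_nodes * price_per_node_hour


def cost_per_1k_requests(hourly_cost: float, requests_per_hour: float) -> float:
    """Spread the hourly GPU cost across one hour of throughput."""
    if requests_per_hour <= 0:
        raise ValueError("requests_per_hour must be positive")
    return hourly_cost * 1000 / requests_per_hour


# Hypothetical: 3 GPU nodes at 2.50 per hour serving 15,000 requests per hour.
cost = hourly_gpu_cost(3, 2.50)
print(cost, cost_per_1k_requests(cost, 15_000))
```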
## Best Practices
Organize dashboards into folders by category:
* **Smallest Overview**: High-level metrics
* **Lightning ASR**: Detailed ASR metrics
* **Infrastructure**: Node and cluster metrics
* **Autoscaling**: HPA and scaling behavior
Suggested default time ranges for different views:
* **Real-time monitoring**: Last 15 minutes
* **Troubleshooting**: Last 1 hour
* **Analysis**: Last 24 hours
* **Trends**: Last 7 days
Use annotations to mark important events:
* Deployments
* Scaling events
* Incidents
* Configuration changes
Use variables to build template dashboards that work across:
* Different environments (dev, staging, prod)
* Different namespaces
* Different models
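Keep dashboard JSON in version control. For a dashboard deployed from the `asr-dashboard` ConfigMap, export it and commit:
```bash
kubectl get configmap asr-dashboard -o jsonpath='{.data.asr-dashboard\.json}' > asr-dashboard.json
git add asr-dashboard.json
git commit -m "Update ASR dashboard"
```
Changes made in the Grafana UI are not written back to the ConfigMap, so export edited dashboards from Grafana as JSON before committing them.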
## Troubleshooting
### Grafana Not Showing Data
**Check Prometheus data source**:
Grafana → Configuration → Data Sources → Prometheus
* **URL**: `http://smallest-prometheus-stack-prometheus:9090`
* **Access**: Server (default)
Test connection with "Save & Test" button.
**Check Prometheus is running**:
```bash
kubectl get pods -l app.kubernetes.io/name=prometheus
```
### Queries Returning No Data
**Verify metric exists in Prometheus**:
```bash
kubectl port-forward svc/smallest-prometheus-stack-prometheus 9090:9090
```
Open [http://localhost:9090](http://localhost:9090) and query the metric.
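To check a metric from a script instead of the browser, you can call the Prometheus HTTP API (`GET /api/v1/query`) through the same port-forward. A minimal standard-library sketch, assuming Prometheus is reachable at `http://localhost:9090`; the helper names are hypothetical:

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """URL for an instant query against the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def parse_instant_vector(payload: dict) -> list:
    """Turn an instant-query response into (labels, value) pairs."""
    if payload.get("status") != "success":
        raise RuntimeError(payload.get("error", "query failed"))
    # Each result carries "value": [timestamp, "string value"].
    return [(series["metric"], float(series["value"][1]))
            for series in payload["data"]["result"]]


def query_prometheus(base_url: str, promql: str, timeout: float = 10) -> list:
    """Run an instant query and return the parsed series."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        return parse_instant_vector(json.load(resp))
```

For example, `query_prometheus("http://localhost:9090", "asr_active_requests")` returns an empty list when the metric has no recent samples, which points to a scrape problem or a misspelled metric name rather than a Grafana issue.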
**Check time range**: Ensure time range includes data.
### Dashboard Not Loading
**Check Grafana logs**:
```bash
kubectl logs -l app.kubernetes.io/name=grafana -c grafana
```
**Increase memory if the Grafana pod is being OOMKilled**:
```yaml values.yaml
kube-prometheus-stack:
  grafana:
    resources:
      limits:
        memory: 512Mi
```
## What's Next?
* Use metrics for autoscaling
* Configure Prometheus metrics
* Configure license validation