Logs Analysis


Overview

Understanding log messages is crucial for diagnosing issues. This guide helps you interpret logs from each component and identify common error patterns.

Log Levels

All components use standard log levels:

| Level | Description | Example |
|---|---|---|
| DEBUG | Detailed diagnostic info | Variable values, function calls |
| INFO | Normal operation events | Request received, model loaded |
| WARNING | Potential issues | Slow response, retry attempt |
| ERROR | Error that needs attention | Failed request, connection error |
| CRITICAL | Severe error | Service crash, unrecoverable error |
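
Because the levels form a severity order, a simple filter can suppress everything below a chosen threshold. A minimal sketch over sample lines in the level-prefixed format used throughout this guide (the sample data is hypothetical):

```shell
# Hypothetical sample in the level-prefixed format shown in this guide.
cat > /tmp/sample.log <<'EOF'
INFO: Request received: req_abc123
DEBUG: Audio duration: 60.5s
WARNING: Retrying download...
ERROR: Download failed after 3 attempts
CRITICAL: Cannot initialize GPU, exiting
EOF

# Keep only WARNING and above.
grep -E '^(WARNING|ERROR|CRITICAL):' /tmp/sample.log
```

The same pattern works on live output, e.g. piped from `kubectl logs -f <pod>`.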

Lightning ASR Logs

Successful Startup

```
INFO: Starting Lightning ASR v1.0.0
INFO: GPU detected: NVIDIA A10 (24GB)
INFO: Downloading model from URL...
INFO: Model downloaded: 23.5GB
INFO: Loading model into GPU memory...
INFO: Model loaded successfully (5.2GB GPU memory)
INFO: Warmup inference completed in 3.2s
INFO: Server ready on port 2269
```

Request Processing

```
INFO: Request received: req_abc123
DEBUG: Audio duration: 60.5s, sample_rate: 44100
DEBUG: Preprocessing audio...
DEBUG: Running inference...
INFO: Transcription completed in 3.1s (RTF: 0.05x)
INFO: Confidence: 0.95
```

Common Errors

```
ERROR: No CUDA-capable device detected
ERROR: nvidia-smi command not found
CRITICAL: Cannot initialize GPU, exiting
```

Cause: GPU not available or drivers not installed

Solution:

  • Check nvidia-smi works
  • Verify GPU device plugin (Kubernetes)
  • Check NVIDIA Container Toolkit (Docker)

```
ERROR: CUDA out of memory
ERROR: Tried to allocate 2.5GB but only 1.2GB available
WARNING: Reducing batch size
```

Cause: Not enough GPU memory

Solution:

  • Reduce concurrent requests
  • Use larger GPU (A10 vs T4)
  • Scale horizontally (more pods)

```
INFO: Downloading model from https://example.com/model.bin
WARNING: Download attempt 1 failed: Connection timeout
WARNING: Retrying download...
ERROR: Download failed after 3 attempts
```

Cause: Network issues, invalid URL, disk full

Solution:

  • Verify MODEL_URL
  • Check disk space: df -h
  • Test URL: curl -I $MODEL_URL
  • Use shared storage (EFS)

```
ERROR: Failed to process audio: req_xyz789
ERROR: Unsupported audio format: audio/webm
ERROR: Audio file corrupted or invalid
```

Cause: Invalid audio file

Solution:

  • Verify audio format (WAV, MP3, FLAC supported)
  • Check file is not corrupted
  • Ensure proper sample rate (16kHz+)
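
As a quick local sanity check before re-submitting, you can verify that a file at least carries a WAV (RIFF) header. This is a rough sketch, not full validation; `ffprobe`, if installed, gives a complete report:

```shell
# Rough check: WAV files begin with the 4-byte magic "RIFF"
# (followed later by "WAVE"). A passing check does not prove the
# file is playable; a failing one means it is certainly not a WAV.
check_wav() {
  if [ "$(head -c 4 "$1")" = "RIFF" ]; then
    echo "looks like WAV"
  else
    echo "not a WAV"
  fi
}

# Hypothetical sample file for illustration.
printf 'RIFFxxxxWAVEfmt ' > /tmp/fake.wav
check_wav /tmp/fake.wav   # prints "looks like WAV"
```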

API Server Logs

Successful Startup

```
INFO: Starting API Server v1.0.0
INFO: Connecting to Lightning ASR at http://lightning-asr:2269
INFO: Connected to Lightning ASR (2 replicas)
INFO: Connecting to License Proxy at http://license-proxy:3369
INFO: License validated
INFO: API server listening on port 7100
```

Request Handling

```
INFO: POST /v1/listen from 10.0.1.5
DEBUG: Request ID: req_abc123
DEBUG: Audio URL: https://example.com/audio.wav
DEBUG: Routing to Lightning ASR pod: lightning-asr-0
INFO: Response time: 3.2s
INFO: Status: 200 OK
```

Common Errors

```
WARNING: Invalid license key from 10.0.1.5
WARNING: Missing Authorization header
ERROR: License validation failed: expired
```

Cause: Invalid, missing, or expired license key

Solution:

  • Verify Authorization: Token <key> header
  • Check license key is correct
  • Renew expired license

```
ERROR: No Lightning ASR workers available
WARNING: Request queued: req_abc123
WARNING: Queue size: 15
```

Cause: All Lightning ASR pods busy or down

Solution:

  • Check Lightning ASR pods: kubectl get pods
  • Scale up replicas
  • Check HPA configuration

```
ERROR: Request timeout after 300s
ERROR: Lightning ASR pod not responding: lightning-asr-0
WARNING: Retrying with different pod
```

Cause: Lightning ASR overloaded or crashed

Solution:

  • Check Lightning ASR logs
  • Increase timeout
  • Scale up pods

License Proxy Logs

Successful Validation

```
INFO: Starting License Proxy v1.0.0
INFO: License key loaded
INFO: Connecting to console-api.smallest.ai
INFO: License validated successfully
INFO: License valid until: 2025-12-31T23:59:59Z
INFO: Grace period: 24 hours
INFO: Server listening on port 3369
```

Usage Reporting

```
DEBUG: Reporting usage batch: 150 requests
DEBUG: Total duration: 3600s
DEBUG: Features: [streaming, punctuation]
INFO: Usage reported successfully
```

Common Errors

```
ERROR: License validation failed: Invalid license key
ERROR: License server returned 401 Unauthorized
CRITICAL: Cannot start without valid license
```

Cause: Invalid or expired license

Solution:

  • Verify the license key is correct
  • Renew an expired license
  • Contact support if the key should be valid


```
WARNING: Connection to console-api.smallest.ai failed
WARNING: Connection timeout after 10s
INFO: Using cached validation
INFO: Grace period active (23h remaining)
```

Cause: Network connectivity issue

Solution:

  • Test: curl https://console-api.smallest.ai
  • Check firewall allows HTTPS
  • Restore connectivity before grace period expires

```
WARNING: Grace period expires in 1 hour
WARNING: Cannot connect to license server
ERROR: Grace period expired
CRITICAL: Service will stop accepting requests
```

Cause: Extended network outage

Solution:

  • Restore network connectivity immediately
  • Check firewall rules
  • Contact support if persistent
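
Because the grace-period warnings above are the only early signal before the service stops accepting requests, it can be worth grepping for them proactively. A minimal sketch over a sample log (the warning text is taken from the example above; the sample file is hypothetical):

```shell
# Hypothetical sample of License Proxy output.
cat > /tmp/proxy.log <<'EOF'
INFO: Using cached validation
WARNING: Grace period expires in 1 hour
EOF

# Emit an alert line if the grace period is running out.
if grep -q 'Grace period expires' /tmp/proxy.log; then
  echo "ALERT: license grace period is running out"
fi
```

In a cluster, replace the sample file with output from `kubectl logs <license-proxy-pod> --since=10m` and wire the alert into whatever notifier you already use.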

Redis Logs

Normal Operation

```
Ready to accept connections
Client connected from 10.0.1.5:45678
DB 0: 1523 keys (expires: 0)
```

Common Errors

```
WARNING: Memory usage: 95%
ERROR: OOM command not allowed when used memory > 'maxmemory'
```

Solution:

  • Increase memory limit
  • Enable eviction policy
  • Clear old keys

```
ERROR: Failed writing the RDB file
ERROR: Disk is full
```

Solution:

  • Increase disk space
  • Disable persistence if not needed
  • Clean up old snapshots

Log Pattern Analysis

Error Rate Analysis

Count errors in last 1000 lines:

```shell
kubectl logs <pod> --tail=1000 | grep -c "ERROR"
```

Group errors by type:

```shell
kubectl logs <pod> | grep "ERROR" | sort | uniq -c | sort -rn
```

Performance Analysis

Extract response times:

```shell
kubectl logs <pod> | grep "Response time" | awk '{print $NF}' | sort -n
```

Calculate average:

```shell
kubectl logs <pod> | grep "Response time" | awk '{sum+=$NF; count++} END {print sum/count}'
```
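
Averages hide tail latency, so a percentile is often more useful. A sketch that computes an approximate nearest-rank p95 from already-extracted times (the sample numbers are made up; in practice, feed it the `awk '{print $NF}'` output from above):

```shell
# Hypothetical sample response times in seconds, one per line.
printf '%s\n' 1.2 3.4 2.1 0.9 5.6 2.8 3.1 1.7 4.2 2.5 > /tmp/times.txt

# Nearest-rank p95: sort numerically, then take the value at
# index int(NR * 0.95), clamped to at least 1.
sort -n /tmp/times.txt | awk '
  { v[NR] = $1 }
  END {
    idx = int(NR * 0.95)
    if (idx < 1) idx = 1
    print "p95:", v[idx]
  }'
# → p95: 4.2
```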

Request Tracking

Follow a specific request ID:

```shell
kubectl logs <pod> | grep "req_abc123"
```

Across all pods:

```shell
kubectl logs -l app=lightning-asr | grep "req_abc123"
```
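
When pods log with leading timestamps, the per-pod streams can be merged back into one chronological trace for a request. A sketch over hypothetical two-pod sample files, assuming each line starts with an ISO-8601 timestamp (which sorts lexicographically in time order; adjust the sort key if your format differs):

```shell
# Hypothetical per-pod log excerpts.
cat > /tmp/pod-a.log <<'EOF'
2024-01-15T10:00:02Z INFO: req_abc123 routed to lightning-asr-0
2024-01-15T10:00:09Z INFO: req_abc123 status 200
EOF
cat > /tmp/pod-b.log <<'EOF'
2024-01-15T10:00:03Z INFO: req_abc123 inference started
2024-01-15T10:00:08Z INFO: req_abc123 transcription completed
EOF

# -h suppresses filename prefixes so sort keys on the timestamp.
grep -h 'req_abc123' /tmp/pod-a.log /tmp/pod-b.log | sort
```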

Log Aggregation

Using stern

Install stern:

```shell
brew install stern
```

Follow logs from all Lightning ASR pods:

```shell
stern lightning-asr -n smallest
```

Filter by pattern:

```shell
stern lightning-asr -n smallest --grep "ERROR"
```

Using Loki (if installed)

Query logs via LogQL:

```
{app="lightning-asr"} |= "ERROR"
{app="api-server"} |= "req_abc123"
rate({app="lightning-asr"}[5m])
```

Structured Logging

Parse JSON Logs

If logs are in JSON format:

```shell
kubectl logs <pod> | jq 'select(.level=="ERROR")'
kubectl logs <pod> | jq 'select(.duration > 1000)'
kubectl logs <pod> | jq -r '.message'
```

Filter by Field

```shell
kubectl logs <pod> | jq 'select(.request_id=="req_abc123")'
kubectl logs <pod> | jq 'select(.component=="license_proxy")'
```
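
If `jq` is not available on the box, a rough error-rate figure can still be pulled from JSON logs with `grep` and `awk` alone (the sample lines below are hypothetical):

```shell
# Hypothetical JSON-lines log sample.
cat > /tmp/json.log <<'EOF'
{"level":"INFO","message":"ok"}
{"level":"ERROR","message":"Request failed"}
{"level":"INFO","message":"ok"}
{"level":"ERROR","message":"timeout"}
EOF

total=$(wc -l < /tmp/json.log)
errors=$(grep -c '"level":"ERROR"' /tmp/json.log)
awk -v e="$errors" -v t="$total" 'BEGIN { printf "error rate: %.0f%%\n", 100 * e / t }'
# → error rate: 50%
```

This is string matching, not JSON parsing; it will miscount if the field order or spacing differs, which is why `jq` is preferable when available.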

Log Retention

Configure Log Rotation

Docker:

docker-compose.yml:

```yaml
services:
  lightning-asr:
    logging:
      driver: "json-file"
      options:
        max-size: "10m"
        max-file: "3"
```

Kubernetes:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: lightning-asr
spec:
  containers:
  - name: lightning-asr
    imagePullPolicy: Always
```

Kubernetes automatically rotates logs via kubelet.

Export Logs

Save logs for analysis:

```shell
kubectl logs <pod> > logs.txt
kubectl logs <pod> --since=1h > logs-last-hour.txt
kubectl logs <pod> --since-time=2024-01-15T10:00:00Z > logs-since.txt
```

Debugging Log Issues

No Logs Appearing

Check pod is running:

```shell
kubectl get pods -n smallest
kubectl describe pod <pod-name>
```

Check stdout/stderr:

```shell
kubectl exec -it <pod> -- sh -c "ls -la /proc/1/fd/"
```

Logs Truncated

Increase log size limits:

```yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/psp: privileged
spec:
  containers:
  - name: app
    env:
    - name: LOG_MAX_SIZE
      value: "100M"
```

Best Practices

Prefer JSON format for easier parsing:

```json
{
  "timestamp": "2024-01-15T10:30:00Z",
  "level": "ERROR",
  "message": "Request failed",
  "request_id": "req_abc123",
  "duration_ms": 3200
}
```

Always include relevant context in logs:

  • Request ID
  • Component name
  • Timestamp
  • User/session info (if applicable)

Use correct log levels:

  • DEBUG: Development only
  • INFO: Normal operation
  • WARNING: Potential issues
  • ERROR: Actual problems
  • CRITICAL: Service-breaking issues

Use centralized logging:

  • ELK Stack (Elasticsearch, Logstash, Kibana)
  • Loki + Grafana
  • CloudWatch Logs (AWS)
  • Cloud Logging (GCP)

What’s Next?