# Monitoring & Alerts

## Health Endpoints

| Endpoint | Description |
|----------|-------------|
| `/api/health` | API server health |
| `/api/health/db` | Database connectivity |
| `/api/health/blockchain` | Besu node status |

### Health Response

```json
{
  "status": "healthy",
  "timestamp": "2026-02-09T10:00:00Z",
  "components": {
    "database": "healthy",
    "blockchain": "healthy",
    "cache": "healthy"
  }
}
```
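
A consumer of this response usually wants to know which components are failing. The sketch below is one possible way to extract that from the JSON shown above. The function name `unhealthy_components` is hypothetical, and the rule that any value other than `"healthy"` counts as a failure is an assumption, since only the `"healthy"` value is documented here.

```python
import json


# Hypothetical helper: the name and the "anything other than 'healthy' is a
# failure" rule are assumptions, not part of the documented API.
def unhealthy_components(payload: str) -> list[str]:
    """Return the names of components whose status is not "healthy"."""
    body = json.loads(payload)
    components = body.get("components", {})
    failing = [name for name, state in components.items() if state != "healthy"]
    # Report a top-level failure even when no individual component is flagged.
    if body.get("status") != "healthy" and not failing:
        failing.append("status")
    return sorted(failing)


# Sample payload copied from the Health Response example.
sample = """
{
  "status": "healthy",
  "timestamp": "2026-02-09T10:00:00Z",
  "components": {"database": "healthy", "blockchain": "healthy", "cache": "healthy"}
}
"""
print(unhealthy_components(sample))  # []
```

An empty list means every component reported `"healthy"`. Fetching the payload from `/api/health` is left out so the sketch stays independent of any HTTP client.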

## Key Metrics

### Application Metrics

| Metric | Description | Alert Threshold |
|--------|-------------|-----------------|
| `http_request_duration_seconds` | API response time | > 2s |
| `http_requests_total` | Request count | - |
| `active_sessions` | Logged-in users | - |
| `queue_depth` | Pending jobs | > 1000 |

### Infrastructure Metrics

| Metric | Description | Alert Threshold |
|--------|-------------|-----------------|
| `cpu_usage_percent` | CPU utilization | > 80% |
| `memory_usage_percent` | Memory utilization | > 85% |
| `disk_usage_percent` | Disk utilization | > 90% |
| `db_connection_pool` | Active connections | > 80% of max |

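
The alert thresholds listed for the application and infrastructure metrics can be read as a simple lookup table. The following is a minimal sketch of that idea, not the alerting system's actual configuration. The `THRESHOLDS` dictionary, the `breached` function, and the sample readings are all hypothetical, and treating `db_connection_pool` as a percentage of the pool maximum is an assumption about its unit.

```python
# Thresholds copied from the metric tables. db_connection_pool is expressed
# as a percentage of the pool maximum, which is an assumption about its unit.
THRESHOLDS = {
    "http_request_duration_seconds": 2.0,
    "queue_depth": 1000,
    "cpu_usage_percent": 80,
    "memory_usage_percent": 85,
    "disk_usage_percent": 90,
    "db_connection_pool": 80,
}


def breached(samples: dict[str, float]) -> list[str]:
    """Return metric names whose sample is strictly above its threshold."""
    return sorted(
        name
        for name, value in samples.items()
        if name in THRESHOLDS and value > THRESHOLDS[name]
    )


# Hypothetical readings; metrics without a threshold (active_sessions) are ignored.
readings = {
    "http_request_duration_seconds": 2.4,
    "queue_depth": 1000,
    "cpu_usage_percent": 91.0,
    "active_sessions": 5000,
}
print(breached(readings))  # ['cpu_usage_percent', 'http_request_duration_seconds']
```

The comparison is strict, matching the `>` in the tables, so a `queue_depth` of exactly 1000 does not trigger an alert.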

### Business Metrics

| Metric | Description |
|--------|-------------|
| `applications_submitted` | New applications |
| `applications_processed` | Completed processing |
| `sla_breaches` | SLA violations |
| `certificates_issued` | Licenses issued |

## Alert Configuration

### Critical Alerts

- API health check failing
- Database unreachable
- Blockchain node disconnected
- Free disk space < 10%

### Warning Alerts

- Response time > 2 seconds
- Error rate > 1%
- SLA breach count increasing
- Certificate minting failures

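
Two of the warning conditions above are derived values rather than raw readings: an error rate and a rising count. The sketch below shows one way they could be computed. It assumes the error rate is failed requests divided by total requests, and that "increasing" means each sample is strictly larger than the one before it. Neither definition is stated in this document, and both function names are hypothetical.

```python
def error_rate_exceeded(errors: int, total: int, limit: float = 0.01) -> bool:
    """True when errors / total is above the limit (1% by default)."""
    if total <= 0:
        return False  # no traffic, nothing to alert on
    return errors / total > limit


def is_increasing(counts: list[int]) -> bool:
    """True when every sample is strictly larger than the previous one."""
    return len(counts) >= 2 and all(b > a for a, b in zip(counts, counts[1:]))


print(error_rate_exceeded(errors=15, total=1000))  # True  (1.5%)
print(error_rate_exceeded(errors=10, total=1000))  # False (exactly 1%)
print(is_increasing([2, 3, 5]))                    # True
print(is_increasing([2, 2, 5]))                    # False
```

A production rule would normally evaluate these over a time window. The window length is a tuning choice that this document does not specify.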

## Dashboard

Access Grafana dashboards at:

```
https://monitoring.tlas.gov.in/grafana
```

Dashboards available:

- System Overview
- Application Processing
- Blockchain Status
- SLA Compliance