Monitoring & Status Guide
LogWisp provides comprehensive monitoring capabilities through status endpoints, operational logs, and metrics.
Status Endpoints
Stream Status
Each stream exposes its own status endpoint:
# Standalone mode
curl http://localhost:8080/status
# Router mode
curl http://localhost:8080/streamname/status
Example response:
{
  "service": "LogWisp",
  "version": "1.0.0",
  "server": {
    "type": "http",
    "port": 8080,
    "active_clients": 5,
    "buffer_size": 1000,
    "uptime_seconds": 3600,
    "mode": {
      "standalone": true,
      "router": false
    }
  },
  "monitor": {
    "active_watchers": 3,
    "total_entries": 152341,
    "dropped_entries": 12,
    "start_time": "2024-01-20T10:00:00Z",
    "last_entry_time": "2024-01-20T11:00:00Z"
  },
  "filters": {
    "filter_count": 2,
    "total_processed": 152341,
    "total_passed": 48234,
    "filters": [
      {
        "type": "include",
        "logic": "or",
        "pattern_count": 3,
        "total_processed": 152341,
        "total_matched": 48234,
        "total_dropped": 0
      }
    ]
  },
  "features": {
    "heartbeat": {
      "enabled": true,
      "interval": 30,
      "format": "comment"
    },
    "rate_limit": {
      "enabled": true,
      "total_requests": 8234,
      "blocked_requests": 89,
      "active_ips": 12,
      "total_connections": 5
    }
  }
}
Global Status (Router Mode)
In router mode, a global status endpoint provides aggregated information:
curl http://localhost:8080/status
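Because the global endpoint and the per-stream endpoints differ only in the path, a poller can build both from one base URL. The helper below is a minimal sketch; `status_url` is a name introduced here for illustration, and the stream names in the usage comment are hypothetical.

```bash
# Build the status URL for a stream.
# $1 = base URL, $2 = stream name (empty for standalone mode or the global endpoint)
status_url() {
    if [ -n "$2" ]; then
        printf '%s/%s/status\n' "$1" "$2"
    else
        printf '%s/status\n' "$1"
    fi
}

# Hypothetical usage: poll the global endpoint, then two streams.
# for s in "" app system; do
#     curl -s "$(status_url http://localhost:8080 "$s")"
# done
```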
Key Metrics
Monitor Metrics
Track file watching performance:
| Metric | Description | Healthy Range |
|---|---|---|
| `active_watchers` | Number of files being watched | 1-1000 |
| `total_entries` | Total log entries processed | Increasing |
| `dropped_entries` | Entries dropped due to buffer full | < 1% of total |
| `entries_per_second` | Current processing rate | Varies |
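The example status response above does not show an `entries_per_second` field. If your build does not report it, an average rate can be derived from two samples of `total_entries`. This is a sketch under that assumption; `entries_per_second` here is a local helper function, not a LogWisp command.

```bash
# Average entries per second between two samples of total_entries.
# $1 = earlier total, $2 = later total, $3 = seconds between the samples
entries_per_second() {
    if [ "$3" -le 0 ] || [ "$2" -lt "$1" ]; then
        # Non-positive interval, or the counter went backwards (e.g. a restart).
        echo 0
        return 1
    fi
    # Integer division: rates below one entry per second round down to 0.
    echo $(( ($2 - $1) / $3 ))
}
```

To use it, read `.monitor.total_entries` with `jq`, wait a fixed interval, read it again, and pass both values along with the interval.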
Connection Metrics
Monitor client connections:
| Metric | Description | Warning Signs |
|---|---|---|
| `active_clients` | Current SSE connections | Near limit |
| `tcp_connections` | Current TCP connections | Near limit |
| `total_connections` | All active connections | > 80% of max |
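The "> 80% of max" warning can be checked without floating point by cross-multiplying. The sketch below assumes you know the configured connection limit yourself, since the example status response does not expose it; `near_connection_limit` is a hypothetical helper name.

```bash
# Succeeds (exit 0) when active connections exceed the given percentage of the maximum.
# $1 = active connections, $2 = configured maximum, $3 = threshold percent
near_connection_limit() {
    # active / max > pct / 100  is equivalent to  active * 100 > max * pct
    [ "$2" -gt 0 ] && [ $(( $1 * 100 )) -gt $(( $2 * $3 )) ]
}

# Example: warn when more than 80% of a 100-connection limit is in use.
# near_connection_limit "$TOTAL_CONNECTIONS" 100 80 && echo "WARNING: near connection limit"
```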
Filter Metrics
Understand filtering effectiveness:
| Metric | Description | Optimization |
|---|---|---|
| `total_processed` | Entries checked | - |
| `total_passed` | Entries that passed | Very low = too restrictive |
| `total_dropped` | Entries filtered out | Very high = review patterns |
Rate Limit Metrics
Track rate limiting impact:
| Metric | Description | Action Needed |
|---|---|---|
| `blocked_requests` | Rejected requests | High = increase limits |
| `active_ips` | Unique clients | High = scale out |
| `blocked_percentage` | Rejection rate | > 10% = review |
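The example status response reports `total_requests` and `blocked_requests` but no `blocked_percentage` field, so the rejection rate may need to be computed from those two. A minimal sketch, using `awk` for the decimal arithmetic; the function name is chosen here for illustration.

```bash
# Percentage of requests rejected by the rate limiter, to two decimals.
# $1 = blocked_requests, $2 = total_requests
blocked_percentage() {
    awk -v b="$1" -v t="$2" 'BEGIN {
        if (t <= 0) { print "0.00"; exit }   # no requests yet, avoid dividing by zero
        printf "%.2f\n", b * 100 / t
    }'
}
```

With the values from the example response (89 blocked out of 8234 requests) this gives 1.08, well under the 10% review threshold.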
Operational Logging
Log Levels
Configure LogWisp's operational logging:
[logging]
output = "both" # file and stderr
level = "info" # info for production
Log levels and their use:
- DEBUG: Detailed internal operations
- INFO: Normal operations, connections
- WARN: Recoverable issues
- ERROR: Errors requiring attention
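Assuming LogWisp follows the usual convention that the configured level is a minimum severity (so `level = "info"` suppresses DEBUG messages), the ordering can be sketched as follows. Both function names are illustrative only, not part of LogWisp.

```bash
# Map a level name to a numeric rank (higher = more severe).
level_rank() {
    case "$1" in
        debug|DEBUG) printf '%s\n' 0 ;;
        info|INFO)   printf '%s\n' 1 ;;
        warn|WARN)   printf '%s\n' 2 ;;
        error|ERROR) printf '%s\n' 3 ;;
        *)           printf '%s\n' -1; return 1 ;;   # unknown level name
    esac
}

# Succeeds when a message at level $1 would be emitted with configured level $2.
would_log() {
    [ "$(level_rank "$1")" -ge "$(level_rank "$2")" ]
}
```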
Important Log Messages
Startup Messages
LogWisp starting version=1.0.0 config_file=/etc/logwisp.toml
Stream registered with router stream=app
TCP endpoint configured transport=system port=9090
HTTP endpoints configured transport=app stream_url=http://localhost:8080/stream
Connection Events
HTTP client connected remote_addr=192.168.1.100:54231 active_clients=6
HTTP client disconnected remote_addr=192.168.1.100:54231 active_clients=5
TCP connection opened remote_addr=192.168.1.100:54232 active_connections=3
Error Conditions
Failed to open file for checking path=/var/log/app.log error=permission denied
Scanner error while reading file path=/var/log/huge.log error=token too long
Request rate limited ip=192.168.1.100
Connection limit exceeded ip=192.168.1.100 connections=5 limit=5
Performance Warnings
Dropped log entry - subscriber buffer full
Dropped entry for slow client remote_addr=192.168.1.100
Check interval too small: 5ms (min: 10ms)
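The messages above carry their details as `key=value` fields, which makes them easy to pick apart in scripts. The helper below is a sketch that assumes the field is preceded by whitespace and that its value contains no spaces. A value such as `error=permission denied` would be cut at the first space. `kv_field` is a name introduced here.

```bash
# Print the value of the key=value field named $1 from log line $2.
# Prints nothing if the field is absent.
kv_field() {
    printf '%s\n' "$2" | sed -n "s/.*[[:space:]]$1=\([^[:space:]]*\).*/\1/p"
}

# Example:
# kv_field active_clients "HTTP client connected remote_addr=192.168.1.100:54231 active_clients=6"
```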
Health Checks
Basic Health Check
Simple up/down check:
#!/bin/bash
# health_check.sh
STATUS=$(curl -s -o /dev/null -w "%{http_code}" http://localhost:8080/status)
if [ "$STATUS" -eq 200 ]; then
    echo "LogWisp is healthy"
    exit 0
else
    echo "LogWisp is unhealthy (status: $STATUS)"
    exit 1
fi
Advanced Health Check
Check specific conditions:
#!/bin/bash
# advanced_health_check.sh

# Fail early if the status endpoint is unreachable or returns an HTTP error
RESPONSE=$(curl -sf --max-time 5 http://localhost:8080/status) || {
    echo "CRITICAL: Status endpoint not responding"
    exit 1
}

# Check if processing logs (missing fields default to 0)
TOTAL=$(echo "$RESPONSE" | jq -r '.monitor.total_entries // 0')
if [ "$TOTAL" -eq 0 ]; then
    echo "WARNING: No log entries processed"
    exit 1
fi

# Check dropped entries (integer percentage; TOTAL is non-zero here)
DROPPED=$(echo "$RESPONSE" | jq -r '.monitor.dropped_entries // 0')
DROP_PERCENT=$(( DROPPED * 100 / TOTAL ))
if [ "$DROP_PERCENT" -gt 5 ]; then
    echo "WARNING: High drop rate: ${DROP_PERCENT}%"
    exit 1
fi

# Check connections
CONNECTIONS=$(echo "$RESPONSE" | jq -r '.server.active_clients // 0')
echo "OK: Processing logs, $CONNECTIONS active clients"
exit 0
Container Health Check
Docker/Kubernetes configuration:
# Dockerfile
HEALTHCHECK --interval=30s --timeout=3s --retries=3 \
    CMD curl -f http://localhost:8080/status || exit 1
# Kubernetes
livenessProbe:
  httpGet:
    path: /status
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 30
readinessProbe:
  httpGet:
    path: /status
    port: 8080
  initialDelaySeconds: 5
  periodSeconds: 10
Monitoring Integration
Prometheus Metrics
Export metrics in Prometheus format:
#!/bin/bash
# prometheus_exporter.sh
while true; do
    STATUS=$(curl -s http://localhost:8080/status)
    # Extract metrics
    CLIENTS=$(echo "$STATUS" | jq -r '.server.active_clients')
    ENTRIES=$(echo "$STATUS" | jq -r '.monitor.total_entries')
    DROPPED=$(echo "$STATUS" | jq -r '.monitor.dropped_entries')
    # Output Prometheus format
    cat << EOF
# HELP logwisp_active_clients Number of active streaming clients
# TYPE logwisp_active_clients gauge
logwisp_active_clients $CLIENTS
# HELP logwisp_total_entries Total log entries processed
# TYPE logwisp_total_entries counter
logwisp_total_entries $ENTRIES
# HELP logwisp_dropped_entries Total log entries dropped
# TYPE logwisp_dropped_entries counter
logwisp_dropped_entries $DROPPED
EOF
    sleep 60
done
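The exporter script emits three metrics, while the Grafana panels below also query `logwisp_blocked_requests`. One way to add further metrics without repeating the heredoc is a small formatter for the Prometheus text exposition format. This is a sketch; `prom_metric` is a helper name introduced here, and the help text is only an example.

```bash
# Print one metric in Prometheus text exposition format.
# $1 = metric name, $2 = type (gauge or counter), $3 = help text, $4 = value
prom_metric() {
    printf '# HELP %s %s\n# TYPE %s %s\n%s %s\n' "$1" "$3" "$1" "$2" "$1" "$4"
}

# Example, using the rate limit field from the status response:
# BLOCKED=$(echo "$STATUS" | jq -r '.features.rate_limit.blocked_requests')
# prom_metric logwisp_blocked_requests counter "Requests rejected by rate limiting" "$BLOCKED"
```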
Grafana Dashboard
Key panels for Grafana:
- Active Connections
  - Query: `logwisp_active_clients`
  - Visualization: Graph
  - Alert: > 80% of max
- Log Processing Rate
  - Query: `rate(logwisp_total_entries[5m])`
  - Visualization: Graph
  - Alert: < 1 entry/min
- Drop Rate
  - Query: `rate(logwisp_dropped_entries[5m]) / rate(logwisp_total_entries[5m])`
  - Visualization: Gauge
  - Alert: > 5%
- Rate Limit Rejections
  - Query: `rate(logwisp_blocked_requests[5m])`
  - Visualization: Graph
  - Alert: > 10/min
Datadog Integration
Send custom metrics:
#!/bin/bash
# datadog_metrics.sh
while true; do
    STATUS=$(curl -s http://localhost:8080/status)
    # Send metrics to Datadog. total_entries and dropped_entries are cumulative
    # totals, so they are sent as gauges (|g); a StatsD counter (|c) would add
    # the full total again on every iteration.
    echo "$STATUS" | jq -r '
        "logwisp.connections:\(.server.active_clients)|g",
        "logwisp.entries:\(.monitor.total_entries)|g",
        "logwisp.dropped:\(.monitor.dropped_entries)|g"
    ' | while read -r metric; do
        echo "$metric" | nc -u -w1 localhost 8125
    done
    sleep 60
done
Performance Monitoring
CPU Usage
Monitor CPU usage by component:
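# Check process CPU (pgrep -d, joins multiple PIDs with commas, as top expects)
top -p "$(pgrep -d, logwisp)" -b -n 1
# Profile CPU usage (assumes a pprof endpoint is enabled on port 6060)
go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"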
Common CPU consumers:
- File watching (increase check_interval_ms so files are polled less often)
- Regex filtering (simplify patterns)
- JSON encoding (reduce clients)
Memory Usage
Track memory consumption:
# Check process memory (the bracket keeps grep from matching itself)
ps aux | grep '[l]ogwisp'
# Detailed memory stats (pgrep -o selects the oldest matching process)
grep -E "Vm(RSS|Size)" "/proc/$(pgrep -o logwisp)/status"
Memory optimization:
- Reduce buffer sizes
- Limit connections
- Simplify filters
Network Bandwidth
Monitor streaming bandwidth:
# Network statistics
netstat -i
iftop -i eth0 -f "port 8080"
# Connection count
ss -tan | grep :8080 | wc -l
Alerting
Basic Alerts
Essential alerts to configure:
| Alert | Condition | Severity |
|---|---|---|
| Service Down | Status endpoint fails | Critical |
| High Drop Rate | > 10% entries dropped | Warning |
| No Log Activity | 0 entries/min for 5 min | Warning |
| Connection Limit | > 90% of max connections | Warning |
| Rate Limit High | > 20% requests blocked | Warning |
Alert Script
Example monitoring script:
#!/bin/bash
# monitor_alerts.sh

MAX_CLIENTS=100   # adjust to your configured connection limit

check_alert() {
    local name=$1
    local condition=$2
    local message=$3
    if eval "$condition"; then
        echo "ALERT: $name - $message"
        # Send to alerting system
        # curl -X POST https://alerts.example.com/...
    fi
}

while true; do
    STATUS=$(curl -s http://localhost:8080/status)
    if [ -z "$STATUS" ]; then
        check_alert "SERVICE_DOWN" "true" "LogWisp not responding"
        sleep 60
        continue
    fi

    # Extract metrics (missing fields default to 0)
    DROPPED=$(echo "$STATUS" | jq -r '.monitor.dropped_entries // 0')
    TOTAL=$(echo "$STATUS" | jq -r '.monitor.total_entries // 0')
    CLIENTS=$(echo "$STATUS" | jq -r '.server.active_clients // 0')

    # Check conditions; skip the drop-rate check until entries exist,
    # otherwise the division fails with TOTAL=0
    if [ "$TOTAL" -gt 0 ]; then
        check_alert "HIGH_DROP_RATE" \
            "[ $((DROPPED * 100 / TOTAL)) -gt 10 ]" \
            "Drop rate above 10%"
    fi
    check_alert "HIGH_CONNECTIONS" \
        "[ $((CLIENTS * 100 / MAX_CLIENTS)) -gt 90 ]" \
        "Near connection limit: $CLIENTS/$MAX_CLIENTS"
    sleep 60
done
Troubleshooting with Monitoring
No Logs Appearing
Check monitor stats:
curl -s http://localhost:8080/status | jq '.monitor'
Look for:
- `active_watchers` = 0 (no files found)
- `total_entries` not increasing (files not updating)
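Both symptoms can be told apart from two status samples taken some time apart. The sketch below takes the values as arguments so it stays independent of how they are fetched; `diagnose_monitor` and its messages are illustrative, not LogWisp output.

```bash
# Classify the monitor state from two samples of the status endpoint.
# $1 = active_watchers, $2 = earlier total_entries, $3 = later total_entries
diagnose_monitor() {
    if [ "$1" -eq 0 ]; then
        echo "no files watched: check paths, patterns, and permissions"
    elif [ "$3" -le "$2" ]; then
        echo "files watched but no new entries: check that the files are being written"
    else
        echo "ok: $(( $3 - $2 )) new entries"
    fi
}

# Example: sample .monitor.total_entries with jq, sleep 60, sample again,
# then call: diagnose_monitor "$WATCHERS" "$BEFORE" "$AFTER"
```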
High CPU Usage
Enable debug logging:
logwisp --log-level debug --log-output stderr
Watch for:
- Frequent "checkFile" messages (increase check_interval_ms so files are polled less often)
- Many filter operations (optimize patterns)
Memory Growth
Monitor over time:
while true; do
    ps aux | grep logwisp | grep -v grep
    curl -s http://localhost:8080/status | jq '.server.active_clients'
    sleep 10
done
Connection Issues
Check connection stats:
# Current connections
curl -s http://localhost:8080/status | jq '.server'
# Rate limit stats
curl -s http://localhost:8080/status | jq '.features.rate_limit'
Best Practices
- Regular Monitoring: Check status endpoints every 30-60 seconds
- Set Alerts: Configure alerts for critical conditions
- Log Rotation: Rotate LogWisp's own logs to prevent disk fill
- Baseline Metrics: Establish normal ranges for your environment
- Capacity Planning: Monitor trends for scaling decisions
- Test Monitoring: Verify alerts work before issues occur
See Also
- Performance Tuning - Optimization guide
- Troubleshooting - Common issues
- Configuration Guide - Monitoring configuration
- Integration Examples - Monitoring system integration