# Heartbeat Monitoring [← Disk Management](disk-management.md) | [← Back to README](../README.md) | [Performance →](performance.md) Guide to using heartbeat messages for operational monitoring and system health tracking. ## Table of Contents - [Overview](#overview) - [Heartbeat Levels](#heartbeat-levels) - [Configuration](#configuration) - [Heartbeat Messages](#heartbeat-messages) - [Monitoring Integration](#monitoring-integration) - [Use Cases](#use-cases) ## Overview Heartbeats are periodic log messages that provide operational statistics about the logger and system. They bypass normal log level filtering, ensuring visibility even when running at higher log levels. ### Key Features - **Always Visible**: Heartbeats use special log levels that bypass filtering - **Multi-Level Detail**: Choose from process, disk, or system statistics - **Production Monitoring**: Track logger health without debug logs - **Metrics Source**: Parse heartbeats for monitoring dashboards ## Heartbeat Levels ### Level 0: Disabled (Default) No heartbeat messages are generated. ```go logger.InitWithDefaults( "heartbeat_level=0", // No heartbeats ) ``` ### Level 1: Process Statistics (PROC) Basic logger operation metrics: ```go logger.InitWithDefaults( "heartbeat_level=1", "heartbeat_interval_s=300", // Every 5 minutes ) ``` **Output:** ``` 2024-01-15T10:30:00Z PROC type="proc" sequence=1 uptime_hours="24.50" processed_logs=1847293 dropped_logs=0 ``` **Fields:** - `sequence`: Incrementing counter - `uptime_hours`: Logger uptime - `processed_logs`: Successfully written logs - `dropped_logs`: Logs lost due to buffer overflow ### Level 2: Process + Disk Statistics (DISK) Includes file and disk usage information: ```go logger.InitWithDefaults( "heartbeat_level=2", "heartbeat_interval_s=300", ) ``` **Additional Output:** ``` 2024-01-15T10:30:00Z DISK type="disk" sequence=1 rotated_files=12 deleted_files=5 total_log_size_mb="487.32" log_file_count=8 current_file_size_mb="23.45" disk_status_ok=true disk_free_mb="5234.67" ``` **Additional Fields:** - `rotated_files`: Total file rotations - `deleted_files`: Files removed by cleanup - `total_log_size_mb`: Size of all log files - `log_file_count`: Number of log files - `current_file_size_mb`: Active file size - `disk_status_ok`: Disk health status - `disk_free_mb`: Available disk space ### Level 3: Process + Disk + System Statistics (SYS) Includes runtime and memory metrics: ```go logger.InitWithDefaults( "heartbeat_level=3", "heartbeat_interval_s=60", // Every minute for detailed monitoring ) ``` **Additional Output:** ``` 2024-01-15T10:30:00Z SYS type="sys" sequence=1 alloc_mb="45.23" sys_mb="128.45" num_gc=1523 num_goroutine=42 ``` **Additional Fields:** - `alloc_mb`: Allocated memory - `sys_mb`: System memory reserved - `num_gc`: Garbage collection runs - `num_goroutine`: Active goroutines ## Configuration ### Basic Configuration ```go logger.InitWithDefaults( "heartbeat_level=2", // Process + Disk stats "heartbeat_interval_s=300", // Every 5 minutes ) ``` ### Interval Recommendations | Environment | Level | Interval | Rationale | |-------------|-------|----------|-----------| | Development | 3 | 30s | Detailed debugging info | | Staging | 2 | 300s | Balance detail vs noise | | Production | 1-2 | 300-600s | Minimize overhead | | High-Load | 1 | 600s | Reduce I/O impact | ### Dynamic Adjustment ```go // Start with basic monitoring logger.InitWithDefaults( "heartbeat_level=1", "heartbeat_interval_s=600", ) // During incident, increase detail logger.InitWithDefaults( "heartbeat_level=3", "heartbeat_interval_s=60", ) // After resolution, reduce back logger.InitWithDefaults( "heartbeat_level=1", "heartbeat_interval_s=600", ) ``` ## Heartbeat Messages ### JSON Format Example With `format=json`, heartbeats are structured for easy parsing: ```json { "time": "2024-01-15T10:30:00.123456789Z", "level": "PROC", "fields": [ "type", "proc", "sequence", 42, "uptime_hours", "24.50", "processed_logs", 1847293, "dropped_logs", 0 ] } ``` ### Text Format Example With `format=txt`, heartbeats are human-readable: ``` 2024-01-15T10:30:00.123456789Z PROC type="proc" sequence=42 uptime_hours="24.50" processed_logs=1847293 dropped_logs=0 ``` ## Monitoring Integration ### Prometheus Exporter ```go type LoggerMetrics struct { logger *log.Logger uptime prometheus.Gauge processedTotal prometheus.Counter droppedTotal prometheus.Counter diskUsageMB prometheus.Gauge diskFreeSpace prometheus.Gauge fileCount prometheus.Gauge } func (m *LoggerMetrics) ParseHeartbeat(line string) { if strings.Contains(line, "type=\"proc\"") { // Extract and update process metrics if match := regexp.MustCompile(`processed_logs=(\d+)`).FindStringSubmatch(line); match != nil { if val, err := strconv.ParseFloat(match[1], 64); err == nil { m.processedTotal.Set(val) } } } if strings.Contains(line, "type=\"disk\"") { // Extract and update disk metrics if match := regexp.MustCompile(`total_log_size_mb="([0-9.]+)"`).FindStringSubmatch(line); match != nil { if val, err := strconv.ParseFloat(match[1], 64); err == nil { m.diskUsageMB.Set(val) } } } } ``` ### Grafana Dashboard Create alerts based on heartbeat metrics: ```yaml # Dropped logs alert - alert: HighLogDropRate expr: rate(logger_dropped_total[5m]) > 10 annotations: summary: "High log drop rate detected" description: "Logger dropping {{ $value }} logs/sec" # Disk space alert - alert: LogDiskSpaceLow expr: logger_disk_free_mb < 1000 annotations: summary: "Low log disk space" description: "Only {{ $value }}MB free on log disk" # Logger health alert - alert: LoggerUnhealthy expr: logger_disk_status_ok == 0 annotations: summary: "Logger disk status unhealthy" ``` ### ELK Stack Integration Logstash filter for parsing heartbeats: ```ruby filter { if [message] =~ /type="(proc|disk|sys)"/ { grok { match => { "message" => [ '%{TIMESTAMP_ISO8601:timestamp} %{WORD:level} type="%{WORD:heartbeat_type}" sequence=%{NUMBER:sequence:int} uptime_hours="%{NUMBER:uptime_hours:float}" processed_logs=%{NUMBER:processed_logs:int} dropped_logs=%{NUMBER:dropped_logs:int}', '%{TIMESTAMP_ISO8601:timestamp} %{WORD:level} type="%{WORD:heartbeat_type}" sequence=%{NUMBER:sequence:int} rotated_files=%{NUMBER:rotated_files:int} deleted_files=%{NUMBER:deleted_files:int} total_log_size_mb="%{NUMBER:total_log_size_mb:float}"' ] } } mutate { add_tag => [ "heartbeat", "metrics" ] } } } ``` ## Use Cases ### 1. Production Health Monitoring ```go // Production configuration logger.InitWithDefaults( "level=4", // Warn and Error only "heartbeat_level=2", // But still get disk stats "heartbeat_interval_s=300", // Every 5 minutes ) // Monitor for: // - Dropped logs (buffer overflow) // - Disk space issues // - File rotation frequency // - Logger uptime (crash detection) ``` ### 2. Performance Tuning ```go // Detailed monitoring during load test logger.InitWithDefaults( "heartbeat_level=3", // All stats "heartbeat_interval_s=10", // Frequent updates ) // Track: // - Memory usage trends // - Goroutine leaks // - GC frequency // - Log throughput ``` ### 3. Capacity Planning ```go // Long-term trending logger.InitWithDefaults( "heartbeat_level=2", "heartbeat_interval_s=3600", // Hourly ) // Analyze: // - Log growth rate // - Rotation frequency // - Disk usage trends // - Seasonal patterns ``` ### 4. Debugging Logger Issues ```go // When investigating logger problems logger.InitWithDefaults( "level=-4", // Debug everything "heartbeat_level=3", // All heartbeats "heartbeat_interval_s=5", // Very frequent "enable_stdout=true", // Console output ) ``` ### 5. Alerting Script ```bash #!/bin/bash # Monitor heartbeats for issues tail -f /var/log/myapp/*.log | while read line; do if [[ $line =~ type=\"proc\" ]]; then if [[ $line =~ dropped_logs=([0-9]+) ]] && [[ ${BASH_REMATCH[1]} -gt 0 ]]; then alert "Logs being dropped: ${BASH_REMATCH[1]}" fi fi if [[ $line =~ type=\"disk\" ]]; then if [[ $line =~ disk_status_ok=false ]]; then alert "Logger disk unhealthy!" fi if [[ $line =~ disk_free_mb=\"([0-9.]+)\" ]]; then free_mb=${BASH_REMATCH[1]} if (( $(echo "$free_mb < 500" | bc -l) )); then alert "Low disk space: ${free_mb}MB" fi fi fi done ``` --- [← Disk Management](disk-management.md) | [← Back to README](../README.md) | [Performance →](performance.md)