lixen/log

SHA256

Fork 0

Files

Lixen Wraith 52d6a3c86d e1.6.0 Documentation update.

2025-07-10 13:51:27 -04:00

8.8 KiB

Raw Blame History

Heartbeat Monitoring

← Disk Management | ← Back to README | Performance →

Guide to using heartbeat messages for operational monitoring and system health tracking.

Overview
Heartbeat Levels
Configuration
Heartbeat Messages
Monitoring Integration
Use Cases

Overview

Heartbeats are periodic log messages that provide operational statistics about the logger and system. They bypass normal log level filtering, ensuring visibility even when running at higher log levels.

Key Features

Always Visible: Heartbeats use special log levels that bypass filtering
Multi-Level Detail: Choose from process, disk, or system statistics
Production Monitoring: Track logger health without debug logs
Metrics Source: Parse heartbeats for monitoring dashboards

Heartbeat Levels

Level 0: Disabled (Default)

No heartbeat messages are generated.

logger.InitWithDefaults(
    "heartbeat_level=0",  // No heartbeats
)

Level 1: Process Statistics (PROC)

Basic logger operation metrics:

logger.InitWithDefaults(
    "heartbeat_level=1",
    "heartbeat_interval_s=300",  // Every 5 minutes
)

Output:

2024-01-15T10:30:00Z PROC type="proc" sequence=1 uptime_hours="24.50" processed_logs=1847293 dropped_logs=0

Fields:

sequence: Incrementing counter
uptime_hours: Logger uptime
processed_logs: Successfully written logs
dropped_logs: Logs lost due to buffer overflow

Level 2: Process + Disk Statistics (DISK)

Includes file and disk usage information:

logger.InitWithDefaults(
    "heartbeat_level=2",
    "heartbeat_interval_s=300",
)

Additional Output:

2024-01-15T10:30:00Z DISK type="disk" sequence=1 rotated_files=12 deleted_files=5 total_log_size_mb="487.32" log_file_count=8 current_file_size_mb="23.45" disk_status_ok=true disk_free_mb="5234.67"

Additional Fields:

rotated_files: Total file rotations
deleted_files: Files removed by cleanup
total_log_size_mb: Size of all log files
log_file_count: Number of log files
current_file_size_mb: Active file size
disk_status_ok: Disk health status
disk_free_mb: Available disk space

Level 3: Process + Disk + System Statistics (SYS)

Includes runtime and memory metrics:

logger.InitWithDefaults(
    "heartbeat_level=3",
    "heartbeat_interval_s=60",  // Every minute for detailed monitoring
)

Additional Output:

2024-01-15T10:30:00Z SYS type="sys" sequence=1 alloc_mb="45.23" sys_mb="128.45" num_gc=1523 num_goroutine=42

Additional Fields:

alloc_mb: Allocated memory
sys_mb: System memory reserved
num_gc: Garbage collection runs
num_goroutine: Active goroutines

Configuration

Basic Configuration

logger.InitWithDefaults(
    "heartbeat_level=2",         // Process + Disk stats
    "heartbeat_interval_s=300",  // Every 5 minutes
)

Interval Recommendations

Environment	Level	Interval	Rationale
Development	3	30s	Detailed debugging info
Staging	2	300s	Balance detail vs noise
Production	1-2	300-600s	Minimize overhead
High-Load	1	600s	Reduce I/O impact

Dynamic Adjustment

// Start with basic monitoring
logger.InitWithDefaults(
    "heartbeat_level=1",
    "heartbeat_interval_s=600",
)

// During incident, increase detail
logger.InitWithDefaults(
    "heartbeat_level=3",
    "heartbeat_interval_s=60",
)

// After resolution, reduce back
logger.InitWithDefaults(
    "heartbeat_level=1",
    "heartbeat_interval_s=600",
)

Heartbeat Messages

JSON Format Example

With format=json, heartbeats are structured for easy parsing:

{
  "time": "2024-01-15T10:30:00.123456789Z",
  "level": "PROC",
  "fields": [
    "type", "proc",
    "sequence", 42,
    "uptime_hours", "24.50",
    "processed_logs", 1847293,
    "dropped_logs", 0
  ]
}

Text Format Example

With format=txt, heartbeats are human-readable:

2024-01-15T10:30:00.123456789Z PROC type="proc" sequence=42 uptime_hours="24.50" processed_logs=1847293 dropped_logs=0

Monitoring Integration

Prometheus Exporter

type LoggerMetrics struct {
    logger *log.Logger
    
    uptime          prometheus.Gauge
    processedTotal  prometheus.Counter
    droppedTotal    prometheus.Counter
    diskUsageMB     prometheus.Gauge
    diskFreeSpace   prometheus.Gauge
    fileCount       prometheus.Gauge
}

func (m *LoggerMetrics) ParseHeartbeat(line string) {
    if strings.Contains(line, "type=\"proc\"") {
        // Extract and update process metrics
        if match := regexp.MustCompile(`processed_logs=(\d+)`).FindStringSubmatch(line); match != nil {
            if val, err := strconv.ParseFloat(match[1], 64); err == nil {
                m.processedTotal.Set(val)
            }
        }
    }
    
    if strings.Contains(line, "type=\"disk\"") {
        // Extract and update disk metrics
        if match := regexp.MustCompile(`total_log_size_mb="([0-9.]+)"`).FindStringSubmatch(line); match != nil {
            if val, err := strconv.ParseFloat(match[1], 64); err == nil {
                m.diskUsageMB.Set(val)
            }
        }
    }
}

Grafana Dashboard

Create alerts based on heartbeat metrics:

# Dropped logs alert
- alert: HighLogDropRate
  expr: rate(logger_dropped_total[5m]) > 10
  annotations:
    summary: "High log drop rate detected"
    description: "Logger dropping {{ $value }} logs/sec"

# Disk space alert
- alert: LogDiskSpaceLow
  expr: logger_disk_free_mb < 1000
  annotations:
    summary: "Low log disk space"
    description: "Only {{ $value }}MB free on log disk"

# Logger health alert
- alert: LoggerUnhealthy
  expr: logger_disk_status_ok == 0
  annotations:
    summary: "Logger disk status unhealthy"

ELK Stack Integration

Logstash filter for parsing heartbeats:

filter {
  if [message] =~ /type="(proc|disk|sys)"/ {
    grok {
      match => {
        "message" => [
          '%{TIMESTAMP_ISO8601:timestamp} %{WORD:level} type="%{WORD:heartbeat_type}" sequence=%{NUMBER:sequence:int} uptime_hours="%{NUMBER:uptime_hours:float}" processed_logs=%{NUMBER:processed_logs:int} dropped_logs=%{NUMBER:dropped_logs:int}',
          '%{TIMESTAMP_ISO8601:timestamp} %{WORD:level} type="%{WORD:heartbeat_type}" sequence=%{NUMBER:sequence:int} rotated_files=%{NUMBER:rotated_files:int} deleted_files=%{NUMBER:deleted_files:int} total_log_size_mb="%{NUMBER:total_log_size_mb:float}"'
        ]
      }
    }
    
    mutate {
      add_tag => [ "heartbeat", "metrics" ]
    }
  }
}

Use Cases

1. Production Health Monitoring

// Production configuration
logger.InitWithDefaults(
    "level=4",                   // Warn and Error only
    "heartbeat_level=2",         // But still get disk stats
    "heartbeat_interval_s=300",  // Every 5 minutes
)

// Monitor for:
// - Dropped logs (buffer overflow)
// - Disk space issues
// - File rotation frequency
// - Logger uptime (crash detection)

2. Performance Tuning

// Detailed monitoring during load test
logger.InitWithDefaults(
    "heartbeat_level=3",        // All stats
    "heartbeat_interval_s=10",  // Frequent updates
)

// Track:
// - Memory usage trends
// - Goroutine leaks
// - GC frequency
// - Log throughput

3. Capacity Planning

// Long-term trending
logger.InitWithDefaults(
    "heartbeat_level=2",
    "heartbeat_interval_s=3600",  // Hourly
)

// Analyze:
// - Log growth rate
// - Rotation frequency
// - Disk usage trends
// - Seasonal patterns

4. Debugging Logger Issues

// When investigating logger problems
logger.InitWithDefaults(
    "level=-4",                 // Debug everything
    "heartbeat_level=3",        // All heartbeats
    "heartbeat_interval_s=5",   // Very frequent
    "enable_stdout=true",       // Console output
)

5. Alerting Script

#!/bin/bash
# Monitor heartbeats for issues

tail -f /var/log/myapp/*.log | while read line; do
    if [[ $line =~ type=\"proc\" ]]; then
        if [[ $line =~ dropped_logs=([0-9]+) ]] && [[ ${BASH_REMATCH[1]} -gt 0 ]]; then
            alert "Logs being dropped: ${BASH_REMATCH[1]}"
        fi
    fi
    
    if [[ $line =~ type=\"disk\" ]]; then
        if [[ $line =~ disk_status_ok=false ]]; then
            alert "Logger disk unhealthy!"
        fi
        
        if [[ $line =~ disk_free_mb=\"([0-9.]+)\" ]]; then
            free_mb=${BASH_REMATCH[1]}
            if (( $(echo "$free_mb < 500" | bc -l) )); then
                alert "Low disk space: ${free_mb}MB"
            fi
        fi
    fi
done