Files
log/doc/heartbeat-monitoring.md

4.0 KiB

Heartbeat Monitoring

Guide to using heartbeat messages for operational monitoring and system health tracking.

Overview

Heartbeats are periodic log messages that provide operational statistics about the logger and system. They bypass normal log level filtering, ensuring visibility even when running at higher log levels.

Key Features

  • Always Visible: Heartbeats use special log levels that bypass filtering
  • Multi-Level Detail: Choose from process, disk, or system statistics
  • Production Monitoring: Track logger health without debug logs
  • Metrics Source: Parse heartbeats for monitoring dashboards

Heartbeat Levels

Level 0: Disabled (Default)

No heartbeat messages are generated.

logger.ApplyOverride(
    "heartbeat_level=0",  // No heartbeats
)

Level 1: Process Statistics (PROC)

Basic logger operation metrics:

logger.ApplyOverride(
    "heartbeat_level=1",
    "heartbeat_interval_s=300",  // Every 5 minutes
)

Output:

2024-01-15T10:30:00Z PROC type="proc" sequence=1 uptime_hours="24.50" processed_logs=1847293 dropped_logs=0

Fields:

  • sequence: Incrementing counter
  • uptime_hours: Logger uptime
  • processed_logs: Successfully written logs
  • dropped_logs: Logs lost due to buffer overflow

Level 2: Process + Disk Statistics (DISK)

Includes file and disk usage information:

logger.ApplyOverride(
    "heartbeat_level=2",
    "heartbeat_interval_s=300",
)

Additional Output:

2024-01-15T10:30:00Z DISK type="disk" sequence=1 rotated_files=12 deleted_files=5 total_log_size_mb="487.32" log_file_count=8 current_file_size_mb="23.45" disk_status_ok=true disk_free_mb="5234.67"

Additional Fields:

  • rotated_files: Total file rotations
  • deleted_files: Files removed by cleanup
  • total_log_size_mb: Size of all log files
  • log_file_count: Number of log files
  • current_file_size_mb: Active file size
  • disk_status_ok: Disk health status
  • disk_free_mb: Available disk space

Level 3: Process + Disk + System Statistics (SYS)

Includes runtime and memory metrics:

logger.ApplyOverride(
    "heartbeat_level=3",
    "heartbeat_interval_s=60",  // Every minute for detailed monitoring
)

Additional Output:

2024-01-15T10:30:00Z SYS type="sys" sequence=1 alloc_mb="45.23" sys_mb="128.45" num_gc=1523 num_goroutine=42

Additional Fields:

  • alloc_mb: Allocated memory
  • sys_mb: System memory reserved
  • num_gc: Garbage collection runs
  • num_goroutine: Active goroutines

Configuration

Basic Configuration

logger.ApplyOverride(
    "heartbeat_level=2",         // Process + Disk stats
    "heartbeat_interval_s=300",  // Every 5 minutes
)

Interval Recommendations

Environment Level Interval Rationale
Development 3 30s Detailed debugging info
Staging 2 300s Balance detail vs noise
Production 1-2 300-600s Minimize overhead
High-Load 1 600s Reduce I/O impact

Dynamic Adjustment

// Start with basic monitoring
logger.ApplyOverride(
    "heartbeat_level=1",
    "heartbeat_interval_s=600",
)

// During incident, increase detail
logger.ApplyOverride(
    "heartbeat_level=3",
    "heartbeat_interval_s=60",
)

// After resolution, reduce back
logger.ApplyOverride(
    "heartbeat_level=1",
    "heartbeat_interval_s=600",
)

Heartbeat Messages

JSON Format Example

With format=json, heartbeats are structured for easy parsing:

{
  "time": "2024-01-15T10:30:00.123456789Z",
  "level": "PROC",
  "fields": [
    "type", "proc",
    "sequence", 42,
    "uptime_hours", "24.50",
    "processed_logs", 1847293,
    "dropped_logs", 0
  ]
}

Text Format Example

With format=txt, heartbeats are human-readable:

2024-01-15T10:30:00.123456789Z PROC type="proc" sequence=42 uptime_hours="24.50" processed_logs=1847293 dropped_logs=0

← Disk Management | ← Back to README | Compatibility Adapters →