e1.6.0 Documentation update.

2025-07-10 13:51:27 -04:00
parent 43c98b08f9
commit 52d6a3c86d
12 changed files with 3747 additions and 313 deletions
--- a/doc/disk-management.md
+++ b/doc/disk-management.md
@ -0,0 +1,348 @@
+# Disk Management
+
+[← Logging Guide](logging-guide.md) | [← Back to README](../README.md) | [Heartbeat Monitoring →](heartbeat-monitoring.md)
+
+Comprehensive guide to log file rotation, retention policies, and disk space management.
+
+## Table of Contents
+
+- [File Rotation](#file-rotation)
+- [Disk Space Management](#disk-space-management)
+- [Retention Policies](#retention-policies)
+- [Adaptive Monitoring](#adaptive-monitoring)
+- [Recovery Behavior](#recovery-behavior)
+- [Best Practices](#best-practices)
+
+## File Rotation
+
+### Automatic Rotation
+
+Log files are automatically rotated when they reach the configured size limit:
+
+```go
+logger.InitWithDefaults(
+    "max_size_mb=100",  // Rotate at 100MB
+)
+```
+
+### Rotation Behavior
+
+1. **Size Check**: Before each write, the logger checks if the file would exceed `max_size_mb`
+2. **New File Creation**: Creates a new file with timestamp: `appname_240115_103045_123456789.log`
+3. **Seamless Transition**: No logs are lost during rotation
+4. **Old File Closure**: Previous file is properly closed and synced
+
+### File Naming Convention
+
+```
+{name}_{YYMMDD}_{HHMMSS}_{nanoseconds}.{extension}
+
+Example: myapp_240115_143022_987654321.log
+```
+
+Components:
+- `name`: Configured log name
+- `YYMMDD`: Date (year, month, day)
+- `HHMMSS`: Time (hour, minute, second)
+- `nanoseconds`: For uniqueness
+- `extension`: Configured extension
+
+## Disk Space Management
+
+### Space Limits
+
+The logger enforces two types of space limits:
+
+```go
+logger.InitWithDefaults(
+    "max_total_size_mb=1000",   // Total log directory size
+    "min_disk_free_mb=5000",    // Minimum free disk space
+)
+```
+
+### Automatic Cleanup
+
+When limits are exceeded, the logger:
+1. Identifies oldest log files
+2. Deletes them until space requirements are met
+3. Preserves the current active log file
+4. Logs cleanup actions for audit
+
+### Example Configuration
+
+```go
+// Conservative: Strict limits
+logger.InitWithDefaults(
+    "max_size_mb=50",          // 50MB files
+    "max_total_size_mb=500",   // 500MB total
+    "min_disk_free_mb=1000",   // 1GB free required
+)
+
+// Generous: Large files, external archival
+logger.InitWithDefaults(
+    "max_size_mb=1000",        // 1GB files
+    "max_total_size_mb=0",     // No total limit
+    "min_disk_free_mb=100",    // 100MB free required
+)
+
+// Balanced: Production defaults
+logger.InitWithDefaults(
+    "max_size_mb=100",         // 100MB files
+    "max_total_size_mb=5000",  // 5GB total
+    "min_disk_free_mb=500",    // 500MB free required
+)
+```
+
+## Retention Policies
+
+### Time-Based Retention
+
+Automatically delete logs older than a specified duration:
+
+```go
+logger.InitWithDefaults(
+    "retention_period_hrs=168",    // Keep 7 days
+    "retention_check_mins=60",     // Check hourly
+)
+```
+
+### Retention Examples
+
+```go
+// Daily logs, keep 30 days
+logger.InitWithDefaults(
+    "retention_period_hrs=720",    // 30 days
+    "retention_check_mins=60",     // Check hourly
+    "max_size_mb=1000",           // 1GB daily files
+)
+
+// High-frequency logs, keep 24 hours
+logger.InitWithDefaults(
+    "retention_period_hrs=24",     // 1 day
+    "retention_check_mins=15",     // Check every 15 min
+    "max_size_mb=100",            // 100MB files
+)
+
+// Compliance: Keep 90 days
+logger.InitWithDefaults(
+    "retention_period_hrs=2160",   // 90 days
+    "retention_check_mins=360",    // Check every 6 hours
+    "max_total_size_mb=100000",   // 100GB total
+)
+```
+
+### Retention Priority
+
+When multiple policies conflict, cleanup priority is:
+1. **Disk free space** (highest priority)
+2. **Total size limit**
+3. **Retention period** (lowest priority)
+
+## Adaptive Monitoring
+
+### Adaptive Disk Checks
+
+The logger adjusts disk check frequency based on logging volume:
+
+```go
+logger.InitWithDefaults(
+    "enable_adaptive_interval=true",
+    "disk_check_interval_ms=5000",    // Base: 5 seconds
+    "min_check_interval_ms=100",      // Minimum: 100ms
+    "max_check_interval_ms=60000",    // Maximum: 1 minute
+)
+```
+
+### How It Works
+
+1. **Low Activity**: Interval increases (up to max)
+2. **High Activity**: Interval decreases (down to min)
+3. **Reactive Checks**: Immediate check after 10MB written
+
+### Monitoring Disk Usage
+
+Check disk-related heartbeat messages:
+
+```go
+logger.InitWithDefaults(
+    "heartbeat_level=2",           // Enable disk stats
+    "heartbeat_interval_s=300",    // Every 5 minutes
+)
+```
+
+Output:
+```
+2024-01-15T10:30:00Z DISK type="disk" sequence=1 rotated_files=5 deleted_files=2 total_log_size_mb="487.32" log_file_count=8 current_file_size_mb="23.45" disk_status_ok=true disk_free_mb="5234.67"
+```
+
+## Recovery Behavior
+
+### Disk Full Handling
+
+When disk space is exhausted:
+
+1. **Detection**: Write failure or space check triggers recovery
+2. **Cleanup Attempt**: Delete oldest logs to free space
+3. **Status Update**: Set `disk_status_ok=false` if cleanup fails
+4. **Log Dropping**: New logs dropped until space available
+5. **Recovery**: Automatic retry on next disk check
+
+### Monitoring Recovery
+
+```go
+// Check for disk issues in logs
+grep "disk full" /var/log/myapp/*.log
+grep "cleanup failed" /var/log/myapp/*.log
+
+// Monitor disk status in heartbeats
+grep "disk_status_ok=false" /var/log/myapp/*.log
+```
+
+### Manual Intervention
+
+If automatic cleanup fails:
+
+```bash
+# Check disk usage
+df -h /var/log
+
+# Find large log files
+find /var/log/myapp -name "*.log" -size +100M
+
+# Manual cleanup (oldest first)
+ls -t /var/log/myapp/*.log | tail -n 20 | xargs rm
+
+# Verify space
+df -h /var/log
+```
+
+## Best Practices
+
+### 1. Plan for Growth
+
+Estimate log volume and set appropriate limits:
+
+```go
+// Calculate required space:
+// - Average log entry: 200 bytes
+// - Entries per second: 100
+// - Daily volume: 200 * 100 * 86400 = 1.7GB
+
+logger.InitWithDefaults(
+    "max_size_mb=2000",          // 2GB files (~ 1 day)
+    "max_total_size_mb=15000",   // 15GB (~ 1 week)
+    "retention_period_hrs=168",   // 7 days
+)
+```
+
+### 2. External Archival
+
+For long-term storage, implement external archival:
+
+```go
+// Configure for archival
+logger.InitWithDefaults(
+    "max_size_mb=1000",          // 1GB files for easy transfer
+    "max_total_size_mb=10000",   // 10GB local buffer
+    "retention_period_hrs=48",    // 2 days local
+)
+
+// Archive completed files
+func archiveCompletedLogs(archivePath string) error {
+    files, _ := filepath.Glob("/var/log/myapp/*.log")
+    for _, file := range files {
+        if !isCurrentLogFile(file) {
+            // Move to archive storage (S3, NFS, etc.)
+            if err := archiveFile(file, archivePath); err != nil {
+                return err
+            }
+            os.Remove(file)
+        }
+    }
+    return nil
+}
+```
+
+### 3. Monitor Disk Health
+
+Set up alerts for disk issues:
+
+```go
+// Parse heartbeat logs for monitoring
+type DiskStats struct {
+    TotalSizeMB    float64
+    FileCount      int
+    DiskFreeMB     float64
+    DiskStatusOK   bool
+}
+
+func monitorDiskHealth(logLine string) {
+    if strings.Contains(logLine, "type=\"disk\"") {
+        stats := parseDiskHeartbeat(logLine)
+        
+        if !stats.DiskStatusOK {
+            alert("Log disk unhealthy")
+        }
+        
+        if stats.DiskFreeMB < 1000 {
+            alert("Low disk space: %.0fMB free", stats.DiskFreeMB)
+        }
+        
+        if stats.FileCount > 100 {
+            alert("Too many log files: %d", stats.FileCount)
+        }
+    }
+}
+```
+
+### 4. Separate Log Volumes
+
+Use dedicated volumes for logs:
+
+```bash
+# Create dedicated log volume
+mkdir -p /mnt/logs
+mount /dev/sdb1 /mnt/logs
+
+# Configure logger
+logger.InitWithDefaults(
+    "directory=/mnt/logs/myapp",
+    "max_total_size_mb=50000",   # Use most of volume
+    "min_disk_free_mb=1000",     # Leave 1GB free
+)
+```
+
+### 5. Test Cleanup Behavior
+
+Verify cleanup works before production:
+
+```go
+// Test configuration
+func TestDiskCleanup(t *testing.T) {
+    logger := log.NewLogger()
+    logger.InitWithDefaults(
+        "directory=./test_logs",
+        "max_size_mb=1",             // Small files
+        "max_total_size_mb=5",       // Low limit
+        "retention_period_hrs=0.01", // 36 seconds
+        "retention_check_mins=0.5",  // 30 seconds
+    )
+    
+    // Generate logs to trigger cleanup
+    for i := 0; i < 1000; i++ {
+        logger.Info(strings.Repeat("x", 1000))
+    }
+    
+    time.Sleep(45 * time.Second)
+    
+    // Verify cleanup occurred
+    files, _ := filepath.Glob("./test_logs/*.log")
+    if len(files) > 5 {
+        t.Errorf("Cleanup failed: %d files remain", len(files))
+    }
+}
+```
+
+---
+
+[← Logging Guide](logging-guide.md) | [← Back to README](../README.md) | [Heartbeat Monitoring →](heartbeat-monitoring.md)