e1.6.0 Documentation update.

This commit is contained in:
2025-07-10 13:51:27 -04:00
parent 43c98b08f9
commit 52d6a3c86d
12 changed files with 3747 additions and 313 deletions

348
doc/disk-management.md Normal file
View File

@ -0,0 +1,348 @@
# Disk Management
[← Logging Guide](logging-guide.md) | [← Back to README](../README.md) | [Heartbeat Monitoring →](heartbeat-monitoring.md)
Comprehensive guide to log file rotation, retention policies, and disk space management.
## Table of Contents
- [File Rotation](#file-rotation)
- [Disk Space Management](#disk-space-management)
- [Retention Policies](#retention-policies)
- [Adaptive Monitoring](#adaptive-monitoring)
- [Recovery Behavior](#recovery-behavior)
- [Best Practices](#best-practices)
## File Rotation
### Automatic Rotation
Log files are automatically rotated when they reach the configured size limit:
```go
logger.InitWithDefaults(
"max_size_mb=100", // Rotate at 100MB
)
```
### Rotation Behavior
1. **Size Check**: Before each write, the logger checks if the file would exceed `max_size_mb`
2. **New File Creation**: Creates a new file with timestamp: `appname_240115_103045_123456789.log`
3. **Seamless Transition**: No logs are lost during rotation
4. **Old File Closure**: Previous file is properly closed and synced
### File Naming Convention
```
{name}_{YYMMDD}_{HHMMSS}_{nanoseconds}.{extension}
Example: myapp_240115_143022_987654321.log
```
Components:
- `name`: Configured log name
- `YYMMDD`: Date (year, month, day)
- `HHMMSS`: Time (hour, minute, second)
- `nanoseconds`: For uniqueness
- `extension`: Configured extension
## Disk Space Management
### Space Limits
The logger enforces two types of space limits:
```go
logger.InitWithDefaults(
"max_total_size_mb=1000", // Total log directory size
"min_disk_free_mb=5000", // Minimum free disk space
)
```
### Automatic Cleanup
When limits are exceeded, the logger:
1. Identifies oldest log files
2. Deletes them until space requirements are met
3. Preserves the current active log file
4. Logs cleanup actions for audit
### Example Configuration
```go
// Conservative: Strict limits
logger.InitWithDefaults(
"max_size_mb=50", // 50MB files
"max_total_size_mb=500", // 500MB total
"min_disk_free_mb=1000", // 1GB free required
)
// Generous: Large files, external archival
logger.InitWithDefaults(
"max_size_mb=1000", // 1GB files
"max_total_size_mb=0", // No total limit
"min_disk_free_mb=100", // 100MB free required
)
// Balanced: Production defaults
logger.InitWithDefaults(
"max_size_mb=100", // 100MB files
"max_total_size_mb=5000", // 5GB total
"min_disk_free_mb=500", // 500MB free required
)
```
## Retention Policies
### Time-Based Retention
Automatically delete logs older than a specified duration:
```go
logger.InitWithDefaults(
"retention_period_hrs=168", // Keep 7 days
"retention_check_mins=60", // Check hourly
)
```
### Retention Examples
```go
// Daily logs, keep 30 days
logger.InitWithDefaults(
"retention_period_hrs=720", // 30 days
"retention_check_mins=60", // Check hourly
"max_size_mb=1000", // 1GB daily files
)
// High-frequency logs, keep 24 hours
logger.InitWithDefaults(
"retention_period_hrs=24", // 1 day
"retention_check_mins=15", // Check every 15 min
"max_size_mb=100", // 100MB files
)
// Compliance: Keep 90 days
logger.InitWithDefaults(
"retention_period_hrs=2160", // 90 days
"retention_check_mins=360", // Check every 6 hours
"max_total_size_mb=100000", // 100GB total
)
```
### Retention Priority
When multiple policies conflict, cleanup priority is:
1. **Disk free space** (highest priority)
2. **Total size limit**
3. **Retention period** (lowest priority)
## Adaptive Monitoring
### Adaptive Disk Checks
The logger adjusts disk check frequency based on logging volume:
```go
logger.InitWithDefaults(
"enable_adaptive_interval=true",
"disk_check_interval_ms=5000", // Base: 5 seconds
"min_check_interval_ms=100", // Minimum: 100ms
"max_check_interval_ms=60000", // Maximum: 1 minute
)
```
### How It Works
1. **Low Activity**: Interval increases (up to max)
2. **High Activity**: Interval decreases (down to min)
3. **Reactive Checks**: Immediate check after 10MB written
### Monitoring Disk Usage
Check disk-related heartbeat messages:
```go
logger.InitWithDefaults(
"heartbeat_level=2", // Enable disk stats
"heartbeat_interval_s=300", // Every 5 minutes
)
```
Output:
```
2024-01-15T10:30:00Z DISK type="disk" sequence=1 rotated_files=5 deleted_files=2 total_log_size_mb="487.32" log_file_count=8 current_file_size_mb="23.45" disk_status_ok=true disk_free_mb="5234.67"
```
## Recovery Behavior
### Disk Full Handling
When disk space is exhausted:
1. **Detection**: Write failure or space check triggers recovery
2. **Cleanup Attempt**: Delete oldest logs to free space
3. **Status Update**: Set `disk_status_ok=false` if cleanup fails
4. **Log Dropping**: New logs dropped until space available
5. **Recovery**: Automatic retry on next disk check
### Monitoring Recovery
```go
// Check for disk issues in logs
grep "disk full" /var/log/myapp/*.log
grep "cleanup failed" /var/log/myapp/*.log
// Monitor disk status in heartbeats
grep "disk_status_ok=false" /var/log/myapp/*.log
```
### Manual Intervention
If automatic cleanup fails:
```bash
# Check disk usage
df -h /var/log
# Find large log files
find /var/log/myapp -name "*.log" -size +100M
# Manual cleanup (oldest first)
ls -t /var/log/myapp/*.log | tail -n 20 | xargs rm
# Verify space
df -h /var/log
```
## Best Practices
### 1. Plan for Growth
Estimate log volume and set appropriate limits:
```go
// Calculate required space:
// - Average log entry: 200 bytes
// - Entries per second: 100
// - Daily volume: 200 * 100 * 86400 = 1.7GB
logger.InitWithDefaults(
"max_size_mb=2000", // 2GB files (~ 1 day)
"max_total_size_mb=15000", // 15GB (~ 1 week)
"retention_period_hrs=168", // 7 days
)
```
### 2. External Archival
For long-term storage, implement external archival:
```go
// Configure for archival
logger.InitWithDefaults(
"max_size_mb=1000", // 1GB files for easy transfer
"max_total_size_mb=10000", // 10GB local buffer
"retention_period_hrs=48", // 2 days local
)
// Archive completed files
func archiveCompletedLogs(archivePath string) error {
files, _ := filepath.Glob("/var/log/myapp/*.log")
for _, file := range files {
if !isCurrentLogFile(file) {
// Move to archive storage (S3, NFS, etc.)
if err := archiveFile(file, archivePath); err != nil {
return err
}
os.Remove(file)
}
}
return nil
}
```
### 3. Monitor Disk Health
Set up alerts for disk issues:
```go
// Parse heartbeat logs for monitoring
type DiskStats struct {
TotalSizeMB float64
FileCount int
DiskFreeMB float64
DiskStatusOK bool
}
func monitorDiskHealth(logLine string) {
if strings.Contains(logLine, "type=\"disk\"") {
stats := parseDiskHeartbeat(logLine)
if !stats.DiskStatusOK {
alert("Log disk unhealthy")
}
if stats.DiskFreeMB < 1000 {
alert("Low disk space: %.0fMB free", stats.DiskFreeMB)
}
if stats.FileCount > 100 {
alert("Too many log files: %d", stats.FileCount)
}
}
}
```
### 4. Separate Log Volumes
Use dedicated volumes for logs:
```bash
# Create dedicated log volume
mkdir -p /mnt/logs
mount /dev/sdb1 /mnt/logs
# Configure logger
logger.InitWithDefaults(
"directory=/mnt/logs/myapp",
"max_total_size_mb=50000", # Use most of volume
"min_disk_free_mb=1000", # Leave 1GB free
)
```
### 5. Test Cleanup Behavior
Verify cleanup works before production:
```go
// Test configuration
func TestDiskCleanup(t *testing.T) {
logger := log.NewLogger()
logger.InitWithDefaults(
"directory=./test_logs",
"max_size_mb=1", // Small files
"max_total_size_mb=5", // Low limit
"retention_period_hrs=0.01", // 36 seconds
"retention_check_mins=0.5", // 30 seconds
)
// Generate logs to trigger cleanup
for i := 0; i < 1000; i++ {
logger.Info(strings.Repeat("x", 1000))
}
time.Sleep(45 * time.Second)
// Verify cleanup occurred
files, _ := filepath.Glob("./test_logs/*.log")
if len(files) > 5 {
t.Errorf("Cleanup failed: %d files remain", len(files))
}
}
```
---
[← Logging Guide](logging-guide.md) | [← Back to README](../README.md) | [Heartbeat Monitoring →](heartbeat-monitoring.md)