Files
logwisp/doc/troubleshooting.md

9.6 KiB

Troubleshooting Guide

This guide helps diagnose and resolve common issues with LogWisp.

Diagnostic Tools

Enable Debug Logging

The first step in troubleshooting is enabling debug logs:

# Via command line
logwisp --log-level debug --log-output stderr

# Via environment
export LOGWISP_LOGGING_LEVEL=debug
logwisp

# Via config
[logging]
level = "debug"
output = "stderr"

Check Status Endpoint

Verify LogWisp is running and processing:

# Basic check
curl http://localhost:8080/status

# Pretty print
curl -s http://localhost:8080/status | jq .

# Check specific metrics
curl -s http://localhost:8080/status | jq '.monitor'

Test Log Streaming

Verify streams are working:

# Test SSE stream (should show heartbeats if enabled)
curl -N http://localhost:8080/stream

# Test with timeout
timeout 5 curl -N http://localhost:8080/stream

# Test TCP stream
nc localhost 9090

Common Issues

No Logs Appearing

Symptoms:

  • Stream connects but no log entries appear
  • Status shows total_entries: 0

Diagnosis:

  1. Check monitor configuration:

    curl -s http://localhost:8080/status | jq '.monitor'
    
  2. Verify file paths exist:

    # Check your configured paths
    ls -la /var/log/myapp/
    
  3. Check file permissions:

    # LogWisp user must have read access
    sudo -u logwisp ls /var/log/myapp/
    
  4. Verify files match pattern:

    # If pattern is "*.log"
    ls /var/log/myapp/*.log
    
  5. Check if files are being updated:

    # Should show recent timestamps
    ls -la /var/log/myapp/*.log
    tail -f /var/log/myapp/app.log
    

Solutions:

  • Fix file permissions:

    sudo chmod 644 /var/log/myapp/*.log
    sudo usermod -a -G adm logwisp  # Add to log group
    
  • Correct path configuration:

    targets = [
        { path = "/correct/path/to/logs", pattern = "*.log" }
    ]
    
  • Use absolute paths:

    # Bad: Relative path
    targets = [{ path = "./logs", pattern = "*.log" }]
    
    # Good: Absolute path
    targets = [{ path = "/var/log/app", pattern = "*.log" }]
    

High CPU Usage

Symptoms:

  • LogWisp process using excessive CPU
  • System slowdown

Diagnosis:

  1. Check process CPU:

    top -p $(pgrep logwisp)
    
  2. Review check intervals:

    grep check_interval /etc/logwisp/logwisp.toml
    
  3. Count active watchers:

    curl -s http://localhost:8080/status | jq '.monitor.active_watchers'
    
  4. Check filter complexity:

    curl -s http://localhost:8080/status | jq '.filters'
    

Solutions:

  • Increase check interval:

    [streams.monitor]
    check_interval_ms = 1000  # Was 50ms
    
  • Reduce watched files:

    # Instead of watching entire directory
    targets = [
        { path = "/var/log/specific-app.log", is_file = true }
    ]
    
  • Simplify filter patterns:

    # Complex regex (slow)
    patterns = ["^\\[\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\]\\s+\\[(ERROR|WARN)\\]"]
    
    # Simple patterns (fast)
    patterns = ["ERROR", "WARN"]
    

Memory Growth

Symptoms:

  • Increasing memory usage over time
  • Eventually runs out of memory

Diagnosis:

  1. Monitor memory usage:

    watch -n 10 'ps aux | grep logwisp'
    
  2. Check connection count:

    curl -s http://localhost:8080/status | jq '.server.active_clients'
    
  3. Check for dropped entries:

    curl -s http://localhost:8080/status | jq '.monitor.dropped_entries'
    

Solutions:

  • Limit connections:

    [streams.httpserver.rate_limit]
    enabled = true
    max_connections_per_ip = 5
    max_total_connections = 100
    
  • Reduce buffer sizes:

    [streams.httpserver]
    buffer_size = 500  # Was 5000
    
  • Enable rate limiting:

    [streams.httpserver.rate_limit]
    enabled = true
    requests_per_second = 10.0
    

Connection Refused

Symptoms:

  • Cannot connect to LogWisp
  • curl: (7) Failed to connect

Diagnosis:

  1. Check if LogWisp is running:

    ps aux | grep logwisp
    systemctl status logwisp
    
  2. Verify listening ports:

    sudo netstat -tlnp | grep logwisp
    # or
    sudo ss -tlnp | grep logwisp
    
  3. Check firewall:

    sudo iptables -L -n | grep 8080
    sudo ufw status
    

Solutions:

  • Start the service:

    sudo systemctl start logwisp
    
  • Fix port configuration:

    [streams.httpserver]
    enabled = true  # Must be true
    port = 8080     # Correct port
    
  • Open firewall:

    sudo ufw allow 8080/tcp
    

Rate Limit Errors

Symptoms:

  • HTTP 429 responses
  • "Rate limit exceeded" errors

Diagnosis:

  1. Check rate limit stats:

    curl -s http://localhost:8080/status | jq '.features.rate_limit'
    
  2. Test rate limits:

    # Rapid requests
    for i in {1..20}; do curl -s -o /dev/null -w "%{http_code}\n" http://localhost:8080/status; done
    

Solutions:

  • Increase rate limits:

    [streams.httpserver.rate_limit]
    requests_per_second = 50.0  # Was 10.0
    burst_size = 100            # Was 20
    
  • Use per-IP limiting:

    limit_by = "ip"  # Instead of "global"
    
  • Disable for internal use:

    enabled = false
    

Filter Not Working

Symptoms:

  • Unwanted logs still appearing
  • Wanted logs being filtered out

Diagnosis:

  1. Check filter configuration:

    curl -s http://localhost:8080/status | jq '.filters'
    
  2. Test patterns:

    # Test regex pattern
    echo "ERROR: test message" | grep -E "your-pattern"
    
  3. Enable debug logging to see filter decisions:

    logwisp --log-level debug 2>&1 | grep filter
    

Solutions:

  • Fix pattern syntax:

    # Word boundaries
    patterns = ["\\bERROR\\b"]  # Not "ERROR" which matches "TERROR"
    
    # Case insensitive
    patterns = ["(?i)error"]
    
  • Check filter order:

    # Include filters run first
    [[streams.filters]]
    type = "include"
    patterns = ["ERROR", "WARN"]
    
    # Then exclude filters
    [[streams.filters]]
    type = "exclude"
    patterns = ["IGNORE_THIS"]
    
  • Use correct logic:

    logic = "or"   # Match ANY pattern
    # not
    logic = "and"  # Match ALL patterns
    

Logs Dropping

Symptoms:

  • dropped_entries counter increasing
  • Missing log entries in stream

Diagnosis:

  1. Check drop statistics:

    curl -s http://localhost:8080/status | jq '{
      dropped: .monitor.dropped_entries,
      total: .monitor.total_entries,
      percent: (.monitor.dropped_entries / .monitor.total_entries * 100)
    }'
    
  2. Monitor drop rate:

    watch -n 5 'curl -s http://localhost:8080/status | jq .monitor.dropped_entries'
    

Solutions:

  • Increase buffer sizes:

    [streams.httpserver]
    buffer_size = 5000  # Was 1000
    
  • Add flow control:

    [streams.monitor]
    check_interval_ms = 500  # Slow down reading
    
  • Reduce clients:

    [streams.httpserver.rate_limit]
    max_total_connections = 50
    

Performance Issues

Slow Response Times

Diagnosis:

# Measure response time
time curl -s http://localhost:8080/status > /dev/null

# Check system load
uptime
top

Solutions:

  • Reduce concurrent operations
  • Increase system resources
  • Use TCP instead of HTTP for high volume

Network Bandwidth

Diagnosis:

# Monitor network usage
iftop -i eth0 -f "port 8080"

# Check connection count
ss -tan | grep :8080 | wc -l

Solutions:

  • Enable compression (future feature)
  • Filter more aggressively
  • Use TCP for local connections

Debug Commands

System Information

# LogWisp version
logwisp --version

# System resources
free -h
df -h
ulimit -a

# Network state
ss -tlnp
netstat -anp | grep logwisp

Process Inspection

# Process details
ps aux | grep logwisp

# Open files
lsof -p $(pgrep logwisp)

# System calls (Linux)
strace -p $(pgrep logwisp) -e trace=open,read,write

# File system activity
inotifywait -m /var/log/myapp/

Configuration Validation

# Test configuration
logwisp --config test.toml --log-level debug --log-output stderr

# Check file syntax
cat /etc/logwisp/logwisp.toml | grep -E "^\s*\["

# Validate TOML
python3 -m pip install toml
python3 -c "import toml; toml.load('/etc/logwisp/logwisp.toml'); print('Valid')"

Getting Help

Collect Diagnostic Information

Create a diagnostic bundle:

#!/bin/bash
# diagnostic.sh

DIAG_DIR="logwisp-diag-$(date +%Y%m%d-%H%M%S)"
mkdir -p "$DIAG_DIR"

# Version
logwisp --version > "$DIAG_DIR/version.txt" 2>&1

# Configuration (sanitized)
grep -v "password\|secret\|token" /etc/logwisp/logwisp.toml > "$DIAG_DIR/config.toml"

# Status
curl -s http://localhost:8080/status > "$DIAG_DIR/status.json"

# System info
uname -a > "$DIAG_DIR/system.txt"
free -h >> "$DIAG_DIR/system.txt"
df -h >> "$DIAG_DIR/system.txt"

# Process info
ps aux | grep logwisp > "$DIAG_DIR/process.txt"
lsof -p $(pgrep logwisp) > "$DIAG_DIR/files.txt" 2>&1

# Recent logs
journalctl -u logwisp -n 1000 > "$DIAG_DIR/logs.txt" 2>&1

# Create archive
tar -czf "$DIAG_DIR.tar.gz" "$DIAG_DIR"
rm -rf "$DIAG_DIR"

echo "Diagnostic bundle created: $DIAG_DIR.tar.gz"

Report Issues

When reporting issues, include:

  1. LogWisp version
  2. Configuration (sanitized)
  3. Error messages
  4. Steps to reproduce
  5. Diagnostic bundle

See Also