v0.2.0 restructured to pipeline architecture, dirty

This commit is contained in:
2025-07-11 04:52:41 -04:00
parent 5936f82970
commit b503816de3
51 changed files with 4132 additions and 5936 deletions

View File

@ -1,21 +1,17 @@
# Filter Guide
LogWisp's filtering system allows you to control which log entries are streamed to clients, reducing noise and focusing on what matters.
LogWisp filters control which log entries pass through pipelines using regular expressions.
## How Filters Work
Filters use regular expressions to match log entries. Each filter can either:
- **Include**: Only matching logs pass through (whitelist)
- **Include**: Only matching logs pass (whitelist)
- **Exclude**: Matching logs are dropped (blacklist)
- Multiple filters apply sequentially - all must pass
Multiple filters are applied sequentially - a log entry must pass ALL filters to be streamed.
## Filter Configuration
### Basic Structure
## Configuration
```toml
[[streams.filters]]
[[pipelines.filters]]
type = "include" # or "exclude"
logic = "or" # or "and"
patterns = [
@ -26,115 +22,79 @@ patterns = [
### Filter Types
#### Include Filter (Whitelist)
Only logs matching the patterns are streamed:
#### Include Filter
```toml
[[streams.filters]]
[[pipelines.filters]]
type = "include"
logic = "or"
patterns = [
"ERROR",
"WARN",
"CRITICAL"
]
# Result: Only ERROR, WARN, or CRITICAL logs are streamed
patterns = ["ERROR", "WARN", "CRITICAL"]
# Only ERROR, WARN, or CRITICAL logs pass
```
#### Exclude Filter (Blacklist)
Logs matching the patterns are dropped:
#### Exclude Filter
```toml
[[streams.filters]]
[[pipelines.filters]]
type = "exclude"
patterns = [
"DEBUG",
"TRACE",
"/health"
]
# Result: DEBUG, TRACE, and health check logs are filtered out
patterns = ["DEBUG", "TRACE", "/health"]
# DEBUG, TRACE, and health checks are dropped
```
### Logic Operators
#### OR Logic (Default)
Log matches if ANY pattern matches:
- **OR**: Match ANY pattern (default)
- **AND**: Match ALL patterns
```toml
[[streams.filters]]
type = "include"
# OR Logic
logic = "or"
patterns = ["ERROR", "FAIL", "EXCEPTION"]
# Matches: "ERROR: disk full" OR "FAIL: connection timeout" OR "NullPointerException"
```
patterns = ["ERROR", "FAIL"]
# Matches: "ERROR: disk full" OR "FAIL: timeout"
#### AND Logic
Log matches only if ALL patterns match:
```toml
[[streams.filters]]
type = "include"
# AND Logic
logic = "and"
patterns = ["database", "timeout", "ERROR"]
# Matches: "ERROR: database connection timeout"
# Doesn't match: "ERROR: file not found" (missing "database" and "timeout")
# Not: "ERROR: file not found"
```
## Pattern Syntax
LogWisp uses Go's regular expression syntax (RE2):
### Basic Patterns
Go regular expressions (RE2):
```toml
"ERROR" # Substring match
"(?i)error" # Case-insensitive
"\\berror\\b" # Word boundaries
"^ERROR" # Start of line
"ERROR$" # End of line
"error|fail|warn" # Alternatives
```
## Common Patterns
### Log Levels
```toml
patterns = [
"ERROR", # Exact substring match
"(?i)error", # Case-insensitive
"\\berror\\b", # Word boundaries
"^ERROR", # Start of line
"ERROR$", # End of line
"ERR(OR)?", # Optional group
"error|fail|exception" # Alternatives
"\\[(ERROR|WARN|INFO)\\]", # [ERROR] format
"(?i)\\b(error|warning)\\b", # Word boundaries
"level=(error|warn)", # key=value format
]
```
### Common Pattern Examples
#### Log Levels
### Application Errors
```toml
# Standard log levels
patterns = [
"\\[(ERROR|WARN|INFO|DEBUG)\\]", # [ERROR] format
"(?i)\\b(error|warning|info|debug)\\b", # Word boundaries
"level=(error|warn|info|debug)", # key=value format
"<(Error|Warning|Info|Debug)>" # XML-style
]
# Severity patterns
patterns = [
"(?i)(fatal|critical|severe)",
"(?i)(error|fail|exception)",
"(?i)(warn|warning|caution)",
"panic:", # Go panics
"Traceback", # Python errors
]
```
#### Application Errors
```toml
# Java/JVM
# Java
patterns = [
"Exception",
"\\.java:[0-9]+", # Stack trace lines
"at com\\.mycompany\\.", # Company packages
"NullPointerException|ClassNotFoundException"
"at .+\\.java:[0-9]+",
"NullPointerException"
]
# Python
patterns = [
"Traceback \\(most recent call last\\)",
"Traceback",
"File \".+\\.py\", line [0-9]+",
"(ValueError|TypeError|KeyError)"
"ValueError|TypeError"
]
# Go
@ -143,297 +103,73 @@ patterns = [
"goroutine [0-9]+",
"runtime error:"
]
```
# Node.js
### Performance Issues
```toml
patterns = [
"Error:",
"at .+ \\(.+\\.js:[0-9]+:[0-9]+\\)",
"UnhandledPromiseRejection"
"took [0-9]{4,}ms", # >999ms operations
"timeout|timed out",
"slow query",
"high cpu|cpu usage: [8-9][0-9]%"
]
```
#### Performance Issues
### HTTP Patterns
```toml
patterns = [
"took [0-9]{4,}ms", # Operations over 999ms
"duration>[0-9]{3,}s", # Long durations
"timeout|timed out", # Timeouts
"slow query", # Database
"memory pressure", # Memory issues
"high cpu|cpu usage: [8-9][0-9]%" # CPU issues
]
```
#### Security Patterns
```toml
patterns = [
"(?i)(unauthorized|forbidden|denied)",
"(?i)(auth|authentication) fail",
"invalid (token|session|credentials)",
"SQL injection|XSS|CSRF",
"brute force|rate limit",
"suspicious activity"
]
```
#### HTTP Patterns
```toml
# Error status codes
patterns = [
"status[=:][4-5][0-9]{2}", # status=404, status:500
"HTTP/[0-9.]+ [4-5][0-9]{2}", # HTTP/1.1 404
"\"status\":\\s*[4-5][0-9]{2}" # JSON "status": 500
]
# Specific endpoints
patterns = [
"\"(GET|POST|PUT|DELETE) /api/",
"/api/v[0-9]+/users",
"path=\"/admin"
"status[=:][4-5][0-9]{2}", # 4xx/5xx codes
"HTTP/[0-9.]+ [4-5][0-9]{2}",
"\"/api/v[0-9]+/", # API paths
]
```
## Filter Chains
Multiple filters create a processing chain. Each filter must pass for the log to be streamed.
### Example: Error Monitoring
### Error Monitoring
```toml
# Step 1: Include only errors and warnings
[[streams.filters]]
# Include errors
[[pipelines.filters]]
type = "include"
logic = "or"
patterns = [
"(?i)\\b(error|fail|exception)\\b",
"(?i)\\b(warn|warning)\\b",
"(?i)\\b(critical|fatal|severe)\\b"
]
patterns = ["(?i)\\b(error|fail|critical)\\b"]
# Step 2: Exclude known non-issues
[[streams.filters]]
# Exclude known non-issues
[[pipelines.filters]]
type = "exclude"
patterns = [
"Error: Expected behavior",
"Warning: Deprecated API",
"INFO.*error in message" # INFO logs talking about errors
]
# Step 3: Exclude noisy sources
[[streams.filters]]
type = "exclude"
patterns = [
"/health",
"/metrics",
"ELB-HealthChecker",
"Googlebot"
]
patterns = ["Error: Expected", "/health"]
```
### Example: API Monitoring
### API Monitoring
```toml
# Include only API calls
[[streams.filters]]
# Include API calls
[[pipelines.filters]]
type = "include"
patterns = [
"/api/",
"/v[0-9]+/"
]
patterns = ["/api/", "/v[0-9]+/"]
# Exclude successful requests
[[streams.filters]]
# Exclude successful
[[pipelines.filters]]
type = "exclude"
patterns = [
"\" 200 ", # HTTP 200 OK
"\" 201 ", # HTTP 201 Created
"\" 204 ", # HTTP 204 No Content
"\" 304 " # HTTP 304 Not Modified
]
# Exclude OPTIONS requests (CORS)
[[streams.filters]]
type = "exclude"
patterns = [
"OPTIONS "
]
patterns = ["\" 2[0-9]{2} "]
```
### Example: Security Audit
```toml
# Include security-relevant events
[[streams.filters]]
type = "include"
logic = "or"
patterns = [
"(?i)auth",
"(?i)login|logout",
"(?i)sudo|root",
"(?i)ssh|sftp|ftp",
"(?i)firewall|iptables",
"COMMAND=", # sudo commands
"USER=", # user actions
"SELINUX"
]
## Performance Tips
# Must also contain failure/success indicators
[[streams.filters]]
type = "include"
logic = "or"
patterns = [
"(?i)(fail|denied|error)",
"(?i)(success|accepted|granted)",
"(?i)(invalid|unauthorized)"
]
```
## Performance Considerations
### Pattern Complexity
Simple patterns are fast (~1μs per check):
```toml
patterns = ["ERROR", "WARN", "FATAL"]
```
Complex patterns are slower (~10-100μs per check):
```toml
patterns = [
"^\\[\\d{4}-\\d{2}-\\d{2}T\\d{2}:\\d{2}:\\d{2}\\]\\s+\\[(ERROR|WARN)\\]\\s+\\[([^\\]]+)\\]\\s+(.+)$"
]
```
### Optimization Tips
1. **Use anchors when possible**:
```toml
"^ERROR" # Faster than "ERROR"
```
2. **Avoid nested quantifiers**:
```toml
# BAD: Can cause exponential backtracking
"((a+)+)+"
# GOOD: Linear time
"a+"
```
3. **Use non-capturing groups**:
```toml
"(?:error|warn)" # Instead of "(error|warn)"
```
4. **Order patterns by frequency**:
```toml
# Most common first
patterns = ["ERROR", "WARN", "INFO", "DEBUG"]
```
5. **Prefer character classes**:
```toml
"[0-9]" # Instead of "\\d"
"[a-zA-Z]" # Instead of "\\w"
```
1. **Use anchors**: `^ERROR` faster than `ERROR`
2. **Avoid nested quantifiers**: `((a+)+)+`
3. **Non-capturing groups**: `(?:error|warn)`
4. **Order by frequency**: Most common first
5. **Simple patterns**: Faster than complex regex
## Testing Filters
### Test Configuration
Create a test configuration with sample logs:
```toml
[[streams]]
name = "test"
[streams.monitor]
targets = [{ path = "./test-logs", pattern = "*.log" }]
[[streams.filters]]
type = "include"
patterns = ["YOUR_PATTERN_HERE"]
[streams.httpserver]
enabled = true
port = 8888
```
### Generate Test Logs
```bash
# Create test log entries
echo "[ERROR] Database connection failed" >> test-logs/app.log
echo "[INFO] User logged in" >> test-logs/app.log
echo "[WARN] High memory usage: 85%" >> test-logs/app.log
# Test configuration
echo "[ERROR] Test" >> test.log
echo "[INFO] Test" >> test.log
# Run LogWisp with debug logging
logwisp --config test.toml --log-level debug
# Run with debug
logwisp --log-level debug
# Check what passes through
curl -N http://localhost:8888/stream
```
### Debug Filter Behavior
Enable debug logging to see filter decisions:
```bash
logwisp --log-level debug --log-output stderr
```
Look for messages like:
```
Entry filtered out component=filter_chain filter_index=0 filter_type=include
Entry passed all filters component=filter_chain
```
## Common Pitfalls
### Case Sensitivity
By default, patterns are case-sensitive:
```toml
# Won't match "error" or "Error"
patterns = ["ERROR"]
# Use case-insensitive flag
patterns = ["(?i)error"]
```
### Partial Matches
Patterns match substrings by default:
```toml
# Matches "ERROR", "ERRORS", "TERROR"
patterns = ["ERROR"]
# Use word boundaries for exact words
patterns = ["\\bERROR\\b"]
```
### Special Characters
Remember to escape regex special characters:
```toml
# Won't work as expected
patterns = ["[ERROR]"]
# Correct: escape brackets
patterns = ["\\[ERROR\\]"]
```
### Performance Impact
Too many complex patterns can impact performance:
```toml
# Consider splitting into multiple streams instead
[[streams.filters]]
patterns = [
# 50+ complex patterns...
]
```
## Best Practices
1. **Start Simple**: Begin with basic patterns and refine as needed
2. **Test Thoroughly**: Use test logs to verify filter behavior
3. **Monitor Performance**: Check filter statistics in `/status`
4. **Document Patterns**: Comment complex patterns for maintenance
5. **Use Multiple Streams**: Instead of complex filters, consider separate streams
6. **Regular Review**: Periodically review and optimize filter rules
## See Also
- [Configuration Guide](configuration.md) - Complete configuration reference
- [Performance Tuning](performance.md) - Optimization guidelines
- [Examples](examples/) - Real-world filter configurations
# Check output
curl -N http://localhost:8080/stream
```