# LogWisp Architecture and Project Structure ## Directory Structure ``` logwisp/ ├── Makefile # Build automation with version injection ├── go.mod # Go module definition ├── go.sum # Go module checksums ├── README.md # Project documentation ├── config/ │ ├── logwisp.toml.defaults # Default configuration and guide │ ├── logwisp.toml.example # Example configuration │ └── logwisp.toml.minimal # Minimal configuration template ├── doc/ │ └── architecture.md # This file - architecture documentation └── src/ ├── cmd/ │ └── logwisp/ │ └── main.go # Application entry point, CLI handling └── internal/ ├── config/ │ ├── auth.go # Authentication configuration structures │ ├── config.go # Main configuration structures │ ├── loader.go # Configuration loading with lixenwraith/config │ ├── server.go # TCP/HTTP server configurations with rate limiting │ ├── ssl.go # SSL/TLS configuration structures │ ├── stream.go # Stream-specific configurations with filters │ └── validation.go # Configuration validation including filters and rate limits ├── filter/ │ ├── filter.go # Regex-based log filtering implementation │ └── chain.go # Sequential filter chain management ├── logstream/ │ ├── httprouter.go # HTTP router for path-based routing │ ├── logstream.go # Stream lifecycle management │ ├── routerserver.go # Router server implementation │ └── service.go # Multi-stream service orchestration ├── monitor/ │ ├── file_watcher.go # File watching and rotation detection │ └── monitor.go # Log monitoring interface and implementation ├── ratelimit/ │ ├── ratelimit.go # Token bucket algorithm implementation │ └── limiter.go # Per-stream rate limiter with IP tracking ├── stream/ │ ├── httpstreamer.go # HTTP/SSE streaming with rate limiting │ ├── noop_logger.go # Silent logger for gnet │ ├── tcpserver.go # TCP server with rate limiting (gnet) │ └── tcpstreamer.go # TCP streaming implementation └── version/ └── version.go # Version information management ``` ## Configuration System ### Configuration Hierarchy (Highest to Lowest Priority) 1. **CLI Arguments**: Direct command-line flags 2. **Environment Variables**: `LOGWISP_` prefixed variables 3. **Configuration File**: TOML format configuration 4. **Built-in Defaults**: Hardcoded default values ### Configuration Locations ```bash # Default configuration file location ~/.config/logwisp.toml # Override via environment variable export LOGWISP_CONFIG_FILE=/etc/logwisp/production.toml # Override config directory export LOGWISP_CONFIG_DIR=/etc/logwisp export LOGWISP_CONFIG_FILE=production.toml # Relative to CONFIG_DIR # Direct CLI override ./logwisp --config /path/to/config.toml ``` ### Environment Variable Mapping Environment variables follow a structured naming pattern: - Prefix: `LOGWISP_` - Path separator: `_` (underscore) - Array index: Numeric suffix (0-based) Examples: ```bash # Stream-specific settings LOGWISP_STREAMS_0_NAME=app LOGWISP_STREAMS_0_MONITOR_CHECK_INTERVAL_MS=50 LOGWISP_STREAMS_0_HTTPSERVER_PORT=8080 LOGWISP_STREAMS_0_HTTPSERVER_BUFFER_SIZE=2000 LOGWISP_STREAMS_0_HTTPSERVER_HEARTBEAT_ENABLED=true LOGWISP_STREAMS_0_HTTPSERVER_HEARTBEAT_FORMAT=json # Filter configuration LOGWISP_STREAMS_0_FILTERS_0_TYPE=include LOGWISP_STREAMS_0_FILTERS_0_LOGIC=or LOGWISP_STREAMS_0_FILTERS_0_PATTERNS='["ERROR","WARN"]' # Rate limiting configuration LOGWISP_STREAMS_0_HTTPSERVER_RATE_LIMIT_ENABLED=true LOGWISP_STREAMS_0_HTTPSERVER_RATE_LIMIT_REQUESTS_PER_SECOND=10.0 LOGWISP_STREAMS_0_HTTPSERVER_RATE_LIMIT_BURST_SIZE=20 LOGWISP_STREAMS_0_HTTPSERVER_RATE_LIMIT_LIMIT_BY=ip # Multiple streams LOGWISP_STREAMS_1_NAME=system LOGWISP_STREAMS_1_MONITOR_CHECK_INTERVAL_MS=1000 LOGWISP_STREAMS_1_TCPSERVER_PORT=9090 ``` ## Component Architecture ### Core Components 1. **Service (`logstream.Service`)** - Manages multiple log streams - Handles lifecycle (creation, shutdown) - Provides global statistics - Thread-safe stream registry 2. **LogStream (`logstream.LogStream`)** - Represents a single log monitoring pipeline - Contains: Monitor + Filter Chain + Rate Limiter + Servers (TCP/HTTP) - Independent configuration - Per-stream statistics with filter and rate limit metrics 3. **Monitor (`monitor.Monitor`)** - Watches files and directories - Detects log rotation - Publishes log entries to subscribers - Configurable check intervals 4. **Filter (`filter.Filter`)** - Regex-based log filtering - Include (whitelist) or Exclude (blacklist) modes - OR/AND logic for multiple patterns - Per-filter statistics (processed, matched, dropped) 5. **Filter Chain (`filter.Chain`)** - Sequential application of multiple filters - All filters must pass for entry to be streamed - Aggregate statistics across filter chain 6. **Rate Limiter (`ratelimit.Limiter`)** - Token bucket algorithm for smooth rate limiting - Per-IP or global limiting strategies - Connection tracking and limits - Automatic cleanup of stale entries - Non-blocking rejection of excess requests 7. **Streamers** - **HTTPStreamer**: SSE-based streaming over HTTP - Rate limit enforcement before request handling - Connection tracking for per-IP limits - Configurable 429 responses - **TCPStreamer**: Raw JSON streaming over TCP - Silent connection drops when rate limited - Per-IP connection tracking - Both support configurable heartbeats - Non-blocking client management 8. **HTTPRouter (`logstream.HTTPRouter`)** - Optional component for path-based routing - Consolidates multiple HTTP streams on shared ports - Provides global status endpoint - Longest-prefix path matching - Dynamic stream registration/deregistration ### Data Flow ``` File System → Monitor → LogEntry Channel → Filter Chain → [Rate Limiter] → Streamer → Network Client ↑ ↓ ↓ ↓ └── Rotation Detection Pattern Match Rate Limit Check ↓ ↓ Pass/Drop Accept/Reject ``` ### Filter Architecture ``` Log Entry → Filter Chain → Filter 1 → Filter 2 → ... → Output ↓ ↓ Include? Exclude? ↓ ↓ OR/AND OR/AND Logic Logic ``` ### Rate Limiting Architecture ``` Client Request → Rate Limiter → Token Bucket Check → Allow/Deny ↓ ↓ IP Tracking Refill Rate ↓ Cleanup Timer ``` ### Configuration Structure ```toml [[streams]] name = "stream-name" [streams.monitor] check_interval_ms = 100 # Per-stream check interval targets = [ { path = "/path/to/logs", pattern = "*.log", is_file = false }, { path = "/path/to/file.log", is_file = true } ] # Filter configuration (optional) [[streams.filters]] type = "include" # "include" or "exclude" logic = "or" # "or" or "and" patterns = [ "(?i)error", # Case-insensitive error matching "(?i)warn" # Case-insensitive warning matching ] [[streams.filters]] type = "exclude" patterns = ["DEBUG", "TRACE"] [streams.httpserver] enabled = true port = 8080 buffer_size = 1000 stream_path = "/stream" status_path = "/status" [streams.httpserver.heartbeat] enabled = true interval_seconds = 30 format = "comment" # or "json" include_timestamp = true include_stats = false [streams.httpserver.rate_limit] enabled = false # Disabled by default requests_per_second = 10.0 # Token refill rate burst_size = 20 # Token bucket capacity limit_by = "ip" # "ip" or "global" response_code = 429 # HTTP response code response_message = "Rate limit exceeded" max_connections_per_ip = 5 # Concurrent connection limit max_total_connections = 100 # Global connection limit [streams.tcpserver] enabled = true port = 9090 buffer_size = 5000 [streams.tcpserver.heartbeat] enabled = true interval_seconds = 60 include_timestamp = true include_stats = true [streams.tcpserver.rate_limit] enabled = false requests_per_second = 5.0 burst_size = 10 limit_by = "ip" ``` ## Filter Implementation ### Filter Types 1. **Include Filter**: Only logs matching patterns are streamed (whitelist) 2. **Exclude Filter**: Logs matching patterns are dropped (blacklist) ### Pattern Logic - **OR Logic**: Log matches if ANY pattern matches - **AND Logic**: Log matches only if ALL patterns match ### Filter Chain - Multiple filters are applied sequentially - All filters must pass for a log to be streamed - Efficient short-circuit evaluation ### Performance Considerations - Regex patterns compiled once at startup - Cached for efficient matching - Statistics tracked without locks in hot path ## Rate Limiting Implementation ### Token Bucket Algorithm - Each IP (or global limiter) gets a bucket with configurable capacity - Tokens refill at `requests_per_second` rate - Each request/connection consumes one token - Smooth rate limiting without hard cutoffs ### Limiting Strategies 1. **Per-IP**: Each client IP gets its own token bucket 2. **Global**: All clients share a single token bucket ### Connection Limits - Per-IP connection limits prevent single client resource exhaustion - Global connection limits protect overall system resources - Checked before rate limits to prevent connection hanging ### Cleanup - IP entries older than 5 minutes are automatically removed - Prevents unbounded memory growth - Runs every minute in background ## Build System ### Makefile Targets ```bash make build # Build with version information make install # Install to /usr/local/bin make clean # Remove built binary make test # Run test suite make release TAG=v1.0.0 # Create and push git tag ``` ### Version Management Version information is injected at compile time: ```bash # Automatic version detection from git VERSION := $(shell git describe --tags --always --dirty) GIT_COMMIT := $(shell git rev-parse --short HEAD) BUILD_TIME := $(shell date -u '+%Y-%m-%d_%H:%M:%S') # Manual build with version go build -ldflags "-X 'logwisp/src/internal/version.Version=v1.0.0'" \ -o logwisp ./src/cmd/logwisp ``` ## Operating Modes ### 1. Standalone Mode (Default) - Each stream runs its own HTTP/TCP servers - Direct port access per stream - Simple configuration - Best for single-stream or distinct-port setups ### 2. Router Mode (`--router`) - HTTP streams share ports via path-based routing - Consolidated access through URL paths - Global status endpoint with aggregated statistics - Best for multi-stream setups with limited ports - Streams accessible at `/{stream_name}/{path}` ## Testing ### Test Suites 1. **Router Testing** (`test_router.sh`) - Path routing verification - Client isolation between streams - Statistics aggregation - Graceful shutdown - Port conflict handling 2. **Rate Limiting Testing** (`test_ratelimit.sh`) - Per-IP rate limiting - Global rate limiting - Connection limits - Rate limit recovery - Statistics accuracy - Stress testing 3. **Filter Testing** (recommended) - Pattern matching accuracy - Include/exclude logic - OR/AND combination logic - Performance with complex patterns - Filter chain behavior ### Running Tests ```bash # Test router functionality ./test_router.sh # Test rate limiting ./test_ratelimit.sh # Run all tests make test ``` ## Performance Considerations ### Filter Overhead - Regex compilation: One-time cost at startup - Pattern matching: O(n*m) where n=patterns, m=text length - Use simple patterns when possible - Consider pattern order (most likely matches first) ### Rate Limiting Overhead - Token bucket checks: O(1) time complexity - Memory: ~100 bytes per tracked IP - Cleanup: Runs asynchronously every minute - Minimal impact when disabled ### Optimization Guidelines - Use specific patterns to reduce regex complexity - Place most selective filters first in chain - Use per-IP limiting for fairness - Use global limiting for resource protection - Set burst size to 2-3x requests_per_second - Monitor rate limit statistics for tuning - Higher check_interval_ms for low-activity logs ## Security Architecture ### Current Security Features - Read-only file access - Rate limiting for DDoS protection - Connection limits for resource protection - Non-blocking request rejection - Regex pattern validation at startup