LogWisp Architecture and Project Structure

Directory Structure

logwisp/
├── Makefile                      # Build automation with version injection
├── go.mod                        # Go module definition
├── go.sum                        # Go module checksums
├── README.md                     # Project documentation
├── config/
│   ├── logwisp.toml.defaults     # Default configuration and guide
│   ├── logwisp.toml.example      # Example configuration
│   └── logwisp.toml.minimal      # Minimal configuration template
├── doc/
│   └── architecture.md           # This file - architecture documentation
└── src/
    ├── cmd/
    │   └── logwisp/
    │       └── main.go           # Application entry point, CLI handling
    └── internal/
        ├── config/
        │   ├── auth.go           # Authentication configuration structures
        │   ├── config.go         # Main configuration structures
        │   ├── loader.go         # Configuration loading with lixenwraith/config
        │   ├── server.go         # TCP/HTTP server configurations with rate limiting
        │   ├── ssl.go            # SSL/TLS configuration structures
        │   ├── stream.go         # Stream-specific configurations with filters
        │   └── validation.go     # Configuration validation including filters and rate limits
        ├── filter/
        │   ├── filter.go         # Regex-based log filtering implementation
        │   └── chain.go          # Sequential filter chain management
        ├── logstream/
        │   ├── httprouter.go     # HTTP router for path-based routing
        │   ├── logstream.go      # Stream lifecycle management
        │   ├── routerserver.go   # Router server implementation
        │   └── service.go        # Multi-stream service orchestration
        ├── monitor/
        │   ├── file_watcher.go   # File watching and rotation detection
        │   └── monitor.go        # Log monitoring interface and implementation
        ├── ratelimit/
        │   ├── ratelimit.go      # Token bucket algorithm implementation
        │   └── limiter.go        # Per-stream rate limiter with IP tracking
        ├── stream/
        │   ├── httpstreamer.go   # HTTP/SSE streaming with rate limiting
        │   ├── noop_logger.go    # Silent logger for gnet
        │   ├── tcpserver.go      # TCP server with rate limiting (gnet)
        │   └── tcpstreamer.go    # TCP streaming implementation
        └── version/
            └── version.go        # Version information management

Configuration System

Configuration Hierarchy (Highest to Lowest Priority)

  1. CLI Arguments: Direct command-line flags
  2. Environment Variables: LOGWISP_ prefixed variables
  3. Configuration File: TOML format configuration
  4. Built-in Defaults: Hardcoded default values
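
The priority order above can be illustrated with a small sketch. LogWisp itself loads configuration through lixenwraith/config; the `resolve` helper below is hypothetical and only demonstrates the "first source that provides a value wins" rule, assuming an empty string means "not set".

```go
package main

import "fmt"

// resolve returns the value from the highest-priority source that set one.
// Order: CLI argument, environment variable, configuration file, default.
// Hypothetical helper for illustration; it is not part of LogWisp.
func resolve(cli, env, file, def string) string {
	for _, v := range []string{cli, env, file} {
		if v != "" {
			return v
		}
	}
	return def
}

func main() {
	// No CLI flag is given, so the environment value beats the file value.
	fmt.Println(resolve("", "9090", "8080", "80")) // prints 9090
}
```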

Configuration Locations

# Default configuration file location
~/.config/logwisp.toml

# Override via environment variable
export LOGWISP_CONFIG_FILE=/etc/logwisp/production.toml

# Override config directory
export LOGWISP_CONFIG_DIR=/etc/logwisp
export LOGWISP_CONFIG_FILE=production.toml  # Relative to CONFIG_DIR

# Direct CLI override
./logwisp --config /path/to/config.toml

Environment Variable Mapping

Environment variables follow a structured naming pattern:

  • Prefix: LOGWISP_
  • Path separator: _ (underscore)
  • Array index: Numeric path segment (0-based), e.g. STREAMS_0

Examples:

# Stream-specific settings
LOGWISP_STREAMS_0_NAME=app
LOGWISP_STREAMS_0_MONITOR_CHECK_INTERVAL_MS=50
LOGWISP_STREAMS_0_HTTPSERVER_PORT=8080
LOGWISP_STREAMS_0_HTTPSERVER_BUFFER_SIZE=2000
LOGWISP_STREAMS_0_HTTPSERVER_HEARTBEAT_ENABLED=true
LOGWISP_STREAMS_0_HTTPSERVER_HEARTBEAT_FORMAT=json

# Filter configuration
LOGWISP_STREAMS_0_FILTERS_0_TYPE=include
LOGWISP_STREAMS_0_FILTERS_0_LOGIC=or
LOGWISP_STREAMS_0_FILTERS_0_PATTERNS='["ERROR","WARN"]'

# Rate limiting configuration
LOGWISP_STREAMS_0_HTTPSERVER_RATE_LIMIT_ENABLED=true
LOGWISP_STREAMS_0_HTTPSERVER_RATE_LIMIT_REQUESTS_PER_SECOND=10.0
LOGWISP_STREAMS_0_HTTPSERVER_RATE_LIMIT_BURST_SIZE=20
LOGWISP_STREAMS_0_HTTPSERVER_RATE_LIMIT_LIMIT_BY=ip

# Multiple streams
LOGWISP_STREAMS_1_NAME=system
LOGWISP_STREAMS_1_MONITOR_CHECK_INTERVAL_MS=1000
LOGWISP_STREAMS_1_TCPSERVER_PORT=9090
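
As a rough illustration of the naming pattern, the sketch below derives a variable name from a dotted configuration path. The dotted input form and the `envName` helper are assumptions made for this example, not LogWisp's loader API.

```go
package main

import (
	"fmt"
	"strings"
)

// envName converts a dotted configuration path into the environment
// variable name: add the LOGWISP_ prefix, upper-case the path, and replace
// each path separator with an underscore.
// Hypothetical helper; the dotted-path input is an assumption.
func envName(path string) string {
	return "LOGWISP_" + strings.ToUpper(strings.ReplaceAll(path, ".", "_"))
}

func main() {
	fmt.Println(envName("streams.0.httpserver.port"))
	// prints LOGWISP_STREAMS_0_HTTPSERVER_PORT
}
```

Because the underscore is both the path separator and a character inside key names such as rate_limit, a variable name cannot be split back into a path without knowing the configuration schema.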

Component Architecture

Core Components

  1. Service (logstream.Service)

    • Manages multiple log streams
    • Handles lifecycle (creation, shutdown)
    • Provides global statistics
    • Thread-safe stream registry
  2. LogStream (logstream.LogStream)

    • Represents a single log monitoring pipeline
    • Contains: Monitor + Filter Chain + Rate Limiter + Servers (TCP/HTTP)
    • Independent configuration
    • Per-stream statistics with filter and rate limit metrics
  3. Monitor (monitor.Monitor)

    • Watches files and directories
    • Detects log rotation
    • Publishes log entries to subscribers
    • Configurable check intervals
  4. Filter (filter.Filter)

    • Regex-based log filtering
    • Include (whitelist) or Exclude (blacklist) modes
    • OR/AND logic for multiple patterns
    • Per-filter statistics (processed, matched, dropped)
  5. Filter Chain (filter.Chain)

    • Sequential application of multiple filters
    • All filters must pass for entry to be streamed
    • Aggregate statistics across filter chain
  6. Rate Limiter (ratelimit.Limiter)

    • Token bucket algorithm for smooth rate limiting
    • Per-IP or global limiting strategies
    • Connection tracking and limits
    • Automatic cleanup of stale entries
    • Non-blocking rejection of excess requests
  7. Streamers

    • HTTPStreamer: SSE-based streaming over HTTP
      • Rate limit enforcement before request handling
      • Connection tracking for per-IP limits
      • Configurable 429 responses
    • TCPStreamer: Raw JSON streaming over TCP
      • Silent connection drops when rate limited
      • Per-IP connection tracking
    • Both support configurable heartbeats
    • Non-blocking client management
  8. HTTPRouter (logstream.HTTPRouter)

    • Optional component for path-based routing
    • Consolidates multiple HTTP streams on shared ports
    • Provides global status endpoint
    • Longest-prefix path matching
    • Dynamic stream registration/deregistration

Data Flow

File System → Monitor → LogEntry Channel → Filter Chain → [Rate Limiter] → Streamer → Network Client
     ↑            ↓                              ↓                ↓
     └── Rotation Detection              Pattern Match    Rate Limit Check
                                               ↓                ↓
                                         Pass/Drop        Accept/Reject
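
The flow above maps naturally onto Go channels. The following is a minimal sketch under simplifying assumptions: `LogEntry`, `filterStage`, and `runPipeline` are hypothetical names, the monitor is replaced by a slice of lines, and the streamer is replaced by a slice that collects what a client would receive.

```go
package main

import (
	"fmt"
	"strings"
)

// LogEntry is a simplified stand-in for the entries a monitor publishes.
type LogEntry struct {
	Source  string
	Message string
}

// filterStage forwards only the entries accepted by keep and closes its
// output channel once the input channel is closed.
func filterStage(in <-chan LogEntry, keep func(LogEntry) bool) <-chan LogEntry {
	out := make(chan LogEntry)
	go func() {
		defer close(out)
		for e := range in {
			if keep(e) {
				out <- e
			}
		}
	}()
	return out
}

// runPipeline pushes lines through monitor -> filter -> "client" and
// returns the messages that reach the client.
func runPipeline(lines []string) []string {
	entries := make(chan LogEntry)
	go func() { // stands in for the monitor
		defer close(entries)
		for _, l := range lines {
			entries <- LogEntry{Source: "app.log", Message: l}
		}
	}()

	filtered := filterStage(entries, func(e LogEntry) bool {
		return strings.Contains(e.Message, "ERROR")
	})

	var delivered []string
	for e := range filtered { // stands in for the streamer
		delivered = append(delivered, e.Message)
	}
	return delivered
}

func main() {
	fmt.Println(runPipeline([]string{"INFO start", "ERROR disk full", "DEBUG tick"}))
	// prints [ERROR disk full]
}
```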

Filter Architecture

Log Entry → Filter Chain → Filter 1 → Filter 2 → ... → Output
                              ↓          ↓
                          Include?    Exclude?
                              ↓          ↓
                          OR/AND     OR/AND
                           Logic      Logic

Rate Limiting Architecture

Client Request → Rate Limiter → Token Bucket Check → Allow/Deny
                      ↓                    ↓
                 IP Tracking         Refill Rate
                      ↓
                Cleanup Timer

Configuration Structure

[[streams]]
name = "stream-name"

[streams.monitor]
check_interval_ms = 100  # Per-stream check interval
targets = [
    { path = "/path/to/logs", pattern = "*.log", is_file = false },
    { path = "/path/to/file.log", is_file = true }
]

# Filter configuration (optional)
[[streams.filters]]
type = "include"         # "include" or "exclude"
logic = "or"            # "or" or "and"
patterns = [
    "(?i)error",        # Case-insensitive error matching
    "(?i)warn"          # Case-insensitive warning matching
]

[[streams.filters]]
type = "exclude"
patterns = ["DEBUG", "TRACE"]

[streams.httpserver]
enabled = true
port = 8080
buffer_size = 1000
stream_path = "/stream"
status_path = "/status"

[streams.httpserver.heartbeat]
enabled = true
interval_seconds = 30
format = "comment"  # or "json"
include_timestamp = true
include_stats = false

[streams.httpserver.rate_limit]
enabled = false                  # Disabled by default
requests_per_second = 10.0       # Token refill rate
burst_size = 20                  # Token bucket capacity
limit_by = "ip"                  # "ip" or "global"
response_code = 429              # HTTP response code
response_message = "Rate limit exceeded"
max_connections_per_ip = 5       # Concurrent connection limit
max_total_connections = 100      # Global connection limit

[streams.tcpserver]
enabled = true
port = 9090
buffer_size = 5000

[streams.tcpserver.heartbeat]
enabled = true
interval_seconds = 60
include_timestamp = true
include_stats = true

[streams.tcpserver.rate_limit]
enabled = false
requests_per_second = 5.0
burst_size = 10
limit_by = "ip"

Filter Implementation

Filter Types

  1. Include Filter: Only logs matching patterns are streamed (whitelist)
  2. Exclude Filter: Logs matching patterns are dropped (blacklist)

Pattern Logic

  • OR Logic: Log matches if ANY pattern matches
  • AND Logic: Log matches only if ALL patterns match

Filter Chain

  • Multiple filters are applied sequentially
  • All filters must pass for a log to be streamed
  • Efficient short-circuit evaluation
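
The filter types, pattern logic, and chain behavior described above can be sketched as follows. This is an illustration rather than the code in filter.go or chain.go: the `Filter` and `Chain` types, their fields, and the decision that a filter with no patterns matches nothing are all assumptions.

```go
package main

import (
	"fmt"
	"regexp"
)

// Filter is a simplified, hypothetical stand-in for a single regex filter.
type Filter struct {
	include  bool             // true = include (whitelist), false = exclude (blacklist)
	and      bool             // true = AND logic, false = OR logic
	patterns []*regexp.Regexp // compiled once, reused for every entry
}

// NewFilter compiles all patterns up front and fails on the first invalid one.
func NewFilter(include, and bool, patterns ...string) (*Filter, error) {
	f := &Filter{include: include, and: and}
	for _, p := range patterns {
		re, err := regexp.Compile(p)
		if err != nil {
			return nil, fmt.Errorf("invalid pattern %q: %w", p, err)
		}
		f.patterns = append(f.patterns, re)
	}
	return f, nil
}

// matches applies OR/AND logic. A filter without patterns matches nothing here.
func (f *Filter) matches(line string) bool {
	for _, re := range f.patterns {
		hit := re.MatchString(line)
		if f.and && !hit {
			return false // AND: one miss settles it
		}
		if !f.and && hit {
			return true // OR: one hit settles it
		}
	}
	return f.and && len(f.patterns) > 0
}

// Pass reports whether the entry survives this filter: an include filter
// passes matching entries, an exclude filter passes non-matching ones.
func (f *Filter) Pass(line string) bool {
	return f.matches(line) == f.include
}

// Chain applies filters in order and stops at the first rejection.
type Chain []*Filter

func (c Chain) Pass(line string) bool {
	for _, f := range c {
		if !f.Pass(line) {
			return false
		}
	}
	return true
}

func main() {
	inc, err := NewFilter(true, false, "(?i)error", "(?i)warn")
	if err != nil {
		panic(err)
	}
	exc, err := NewFilter(false, false, "DEBUG", "TRACE")
	if err != nil {
		panic(err)
	}
	chain := Chain{inc, exc}
	fmt.Println(chain.Pass("WARN: disk at 91%"))       // true
	fmt.Println(chain.Pass("INFO: started"))           // false: include filter rejects
	fmt.Println(chain.Pass("DEBUG: error path taken")) // false: exclude filter rejects
}
```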

Performance Considerations

  • Regex patterns compiled once at startup
  • Cached for efficient matching
  • Statistics tracked without locks in hot path
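
One common way to track statistics without locks is to use atomic counters; the document does not say which mechanism LogWisp uses, so the sketch below is an assumption. The `Stats` type and `countConcurrently` helper are hypothetical, and the example requires Go 1.19 or later for `atomic.Uint64`.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Stats holds per-filter counters that can be updated from many
// goroutines without a mutex.
type Stats struct {
	processed atomic.Uint64
	matched   atomic.Uint64
	dropped   atomic.Uint64
}

// Record counts one processed entry. For simplicity this sketch treats
// every non-matching entry as dropped.
func (s *Stats) Record(matched bool) {
	s.processed.Add(1)
	if matched {
		s.matched.Add(1)
	} else {
		s.dropped.Add(1)
	}
}

// countConcurrently records perWorker entries from each of workers
// goroutines and returns the final counter values.
func countConcurrently(workers, perWorker int) (processed, matched, dropped uint64) {
	var s Stats
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < perWorker; i++ {
				s.Record(i%2 == 0) // every other entry "matches"
			}
		}()
	}
	wg.Wait()
	return s.processed.Load(), s.matched.Load(), s.dropped.Load()
}

func main() {
	fmt.Println(countConcurrently(4, 1000)) // prints 4000 2000 2000
}
```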

Rate Limiting Implementation

Token Bucket Algorithm

  • Each IP (or global limiter) gets a bucket with configurable capacity
  • Tokens refill at requests_per_second rate
  • Each request/connection consumes one token
  • Smooth rate limiting without hard cutoffs

Limiting Strategies

  1. Per-IP: Each client IP gets its own token bucket
  2. Global: All clients share a single token bucket

Connection Limits

  • Per-IP connection limits prevent single client resource exhaustion
  • Global connection limits protect overall system resources
  • Checked before rate limits, so an over-limit client is rejected immediately instead of holding a connection open

Cleanup

  • IP entries older than 5 minutes are automatically removed
  • Prevents unbounded memory growth
  • Runs every minute in background

Build System

Makefile Targets

make build          # Build with version information
make install        # Install to /usr/local/bin
make clean          # Remove built binary
make test           # Run test suite
make release TAG=v1.0.0  # Create and push git tag

Version Management

Version information is injected at compile time:

# Automatic version detection from git
VERSION := $(shell git describe --tags --always --dirty)
GIT_COMMIT := $(shell git rev-parse --short HEAD)
BUILD_TIME := $(shell date -u '+%Y-%m-%d_%H:%M:%S')

# Manual build with version
go build -ldflags "-X 'logwisp/src/internal/version.Version=v1.0.0'" \
    -o logwisp ./src/cmd/logwisp
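
On the Go side, the linker's -X flag overwrites package-level string variables. The sketch below shows the general shape; only `Version` is named in the build command above, while `GitCommit`, `BuildTime`, and `versionString` are assumed names, and the variables are placed in package main here only to keep the example in one file.

```go
package main

import "fmt"

// These defaults apply when no -ldflags are passed, for example with a
// plain `go run`. The -X flag can only set string variables.
var (
	Version   = "dev"
	GitCommit = "unknown" // assumed name
	BuildTime = "unknown" // assumed name
)

// versionString formats the injected values for display.
func versionString() string {
	return fmt.Sprintf("%s (commit: %s, built: %s)", Version, GitCommit, BuildTime)
}

func main() {
	fmt.Println(versionString())
	// Without -ldflags this prints: dev (commit: unknown, built: unknown)
}
```

For this single-file form the flag would read -X 'main.Version=v1.0.0'; in the project layout the import path is logwisp/src/internal/version, as in the command above.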

Operating Modes

1. Standalone Mode (Default)

  • Each stream runs its own HTTP/TCP servers
  • Direct port access per stream
  • Simple configuration
  • Best for single-stream or distinct-port setups

2. Router Mode (--router)

  • HTTP streams share ports via path-based routing
  • Consolidated access through URL paths
  • Global status endpoint with aggregated statistics
  • Best for multi-stream setups with limited ports
  • Streams accessible at /{stream_name}/{path}
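
Longest-prefix path matching, which the router uses to pick a stream, can be sketched as below. The `route` function is hypothetical, and matching on whole path segments (so that /app does not capture /application) is an assumption of this example rather than documented router behavior.

```go
package main

import (
	"fmt"
	"strings"
)

// route returns the longest registered prefix that matches path,
// or "" if no prefix matches.
func route(prefixes []string, path string) string {
	best := ""
	for _, p := range prefixes {
		// Match the prefix exactly or as a complete leading path segment.
		if path == p || strings.HasPrefix(path, p+"/") {
			if len(p) > len(best) {
				best = p
			}
		}
	}
	return best
}

func main() {
	prefixes := []string{"/app", "/app/debug", "/system"}
	fmt.Println(route(prefixes, "/app/debug/stream")) // prints /app/debug
	fmt.Println(route(prefixes, "/app/stream"))       // prints /app
	fmt.Println(route(prefixes, "/application") == "") // prints true: no match
}
```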

Testing

Test Suites

  1. Router Testing (test_router.sh)

    • Path routing verification
    • Client isolation between streams
    • Statistics aggregation
    • Graceful shutdown
    • Port conflict handling
  2. Rate Limiting Testing (test_ratelimit.sh)

    • Per-IP rate limiting
    • Global rate limiting
    • Connection limits
    • Rate limit recovery
    • Statistics accuracy
    • Stress testing
  3. Filter Testing (recommended)

    • Pattern matching accuracy
    • Include/exclude logic
    • OR/AND combination logic
    • Performance with complex patterns
    • Filter chain behavior

Running Tests

# Test router functionality
./test_router.sh

# Test rate limiting
./test_ratelimit.sh

# Run all tests
make test

Performance Considerations

Filter Overhead

  • Regex compilation: One-time cost at startup
  • Pattern matching: O(n*m) per log entry, where n is the number of patterns and m is the entry length
  • Use simple patterns when possible
  • Consider pattern order: with OR logic, put the most likely matches first so evaluation can stop early

Rate Limiting Overhead

  • Token bucket checks: O(1) time complexity
  • Memory: ~100 bytes per tracked IP
  • Cleanup: Runs asynchronously every minute
  • Minimal impact when disabled

Optimization Guidelines

  • Use specific patterns to reduce regex complexity
  • Place most selective filters first in chain
  • Use per-IP limiting for fairness
  • Use global limiting for resource protection
  • Set burst size to 2-3x requests_per_second
  • Monitor rate limit statistics for tuning
  • Use a higher check_interval_ms for low-activity logs

Security Architecture

Current Security Features

  • Read-only file access
  • Rate limiting to mitigate request floods
  • Connection limits for resource protection
  • Non-blocking request rejection
  • Regex pattern validation at startup