Coroot — How It Works¶
How Coroot uses eBPF for zero-instrumentation data collection, automated service discovery, and AI-powered root cause analysis.
Data Collection Pipeline¶
eBPF-Based Auto-Instrumentation¶
Coroot's core differentiator is kernel-level telemetry collection via eBPF (extended Berkeley Packet Filter). The coroot-node-agent runs as a DaemonSet on every Kubernetes node and attaches eBPF programs to kernel tracepoints, kprobes, and traffic-control hooks:
```mermaid
flowchart LR
    subgraph Kernel["Linux Kernel (4.16+)"]
        TP["Tracepoints"]
        KP["kprobes/kretprobes"]
        TC["Traffic Control (tc)"]
    end
    subgraph Agent["coroot-node-agent"]
        eBPF["eBPF Programs"]
        Perf["Perf Buffer"]
        Agg["Userspace Aggregation"]
    end
    TP --> eBPF
    KP --> eBPF
    TC --> eBPF
    eBPF --> Perf --> Agg
    Agg -->|OTLP / Prom RW| Server["Coroot Server"]
```
What eBPF Captures (Without Code Changes)¶
| Signal | Kernel Attachment Point | Data Collected |
|---|---|---|
| Network metrics | `tcp_sendmsg`, `tcp_recvmsg`, `tcp_connect` | Latency, throughput, error rates per connection |
| HTTP/gRPC traces | Socket read/write | Request method, path, status code, duration |
| DNS | UDP socket | Resolution time, failures |
| Disk I/O | `blk_mq_start_request` | IOPS, latency, bandwidth per container |
| CPU profiling | `perf_event_open` | On-CPU flame graphs per process |
| Memory profiling | Allocation tracepoints | Heap allocation patterns |
| Container lifecycle | cgroup events | Start/stop times, resource limits |
| Log collection | Container stdout/stderr | Application log lines |
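The kernel-side programs emit raw per-request events; the node agent rolls them up in userspace before export. A minimal sketch of that aggregation step, using a hypothetical event shape (the field names are illustrative, not coroot-node-agent's actual wire format):

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical shape of a per-request event surfaced by the eBPF side.
@dataclass
class HttpEvent:
    src: str           # client IP:port
    dst: str           # server IP:port
    status: int        # HTTP status code parsed from the socket payload
    duration_ms: float

def aggregate(events):
    """Roll raw events up into per-destination RED-style counters."""
    stats = defaultdict(lambda: {"requests": 0, "errors": 0, "total_ms": 0.0})
    for e in events:
        s = stats[e.dst]
        s["requests"] += 1
        s["total_ms"] += e.duration_ms
        if e.status >= 500:
            s["errors"] += 1
    return {dst: {**s, "avg_ms": s["total_ms"] / s["requests"]}
            for dst, s in stats.items()}

events = [
    HttpEvent("10.0.1.5:41232", "10.0.2.9:8080", 200, 12.0),
    HttpEvent("10.0.1.5:41233", "10.0.2.9:8080", 503, 30.0),
]
print(aggregate(events))
```

Aggregating in userspace rather than in the kernel keeps the eBPF programs small and verifier-friendly; only compact counters leave the node.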
Cluster Agent Discovery¶
The coroot-cluster-agent complements eBPF data by connecting directly to databases:
| Database | Discovery Method | Metrics Collected |
|---|---|---|
| PostgreSQL | SQL queries via `pg_stat_*` views | Active connections, query latency, replication lag |
| MySQL | `SHOW STATUS` / `information_schema` | Thread count, slow queries, buffer pool hit rate |
| Redis | `INFO` command | Memory usage, connected clients, hit rate |
| MongoDB | `serverStatus` command | Operations/sec, document counts, lock percentages |
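The Redis case is the simplest to illustrate: `INFO` returns flat `key:value` lines that the agent parses into numeric gauges and then derives ratios from. A sketch of that parsing (the sample `INFO` text and derived hit-rate formula are illustrative, not Coroot's exact implementation):

```python
def parse_info(raw: str) -> dict:
    """Parse Redis INFO output: 'key:value' lines, '#' section headers."""
    metrics = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        try:
            metrics[key] = float(value)  # numeric fields become gauges
        except ValueError:
            metrics[key] = value         # keep non-numeric fields verbatim
    return metrics

sample = """\
# Stats
keyspace_hits:980
keyspace_misses:20
connected_clients:14
used_memory:1048576
"""
info = parse_info(sample)
hit_rate = info["keyspace_hits"] / (info["keyspace_hits"] + info["keyspace_misses"])
print(hit_rate)  # 0.98
```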
Service Map Generation¶
Coroot automatically builds a real-time service dependency graph by correlating eBPF network traces:
- Connection tracking: eBPF programs track every TCP connection (source IP:port ↔ dest IP:port)
- Container resolution: IP addresses are mapped to Kubernetes pods via the container runtime
- Service grouping: Pods are grouped by Deployment/StatefulSet/DaemonSet
- Protocol detection: L7 protocol (HTTP, gRPC, MySQL, PostgreSQL, Redis, Kafka, etc.) is identified from payload patterns
- Dependency graph: Directed edges between services are weighted by request rate, latency, and error rate
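The first three steps can be sketched as a fold over observed connections: resolve each endpoint IP to its owning workload, then count connections per service pair to weight the edges. The pod-to-workload mapping and connection list below are illustrative:

```python
from collections import Counter

# Hypothetical pod-IP -> workload mapping resolved via the container runtime.
pod_to_workload = {
    "10.0.1.5": "Deployment/frontend",
    "10.0.2.9": "Deployment/api",
    "10.0.3.2": "StatefulSet/postgres",
}

# TCP connections observed by eBPF: (source IP, destination IP).
connections = [
    ("10.0.1.5", "10.0.2.9"),
    ("10.0.1.5", "10.0.2.9"),
    ("10.0.2.9", "10.0.3.2"),
]

def build_service_map(conns, resolve):
    """Collapse pod-level connections into weighted service-level edges."""
    edges = Counter()
    for src_ip, dst_ip in conns:
        src = resolve.get(src_ip, "external")  # unknown IPs fall outside the cluster
        dst = resolve.get(dst_ip, "external")
        edges[(src, dst)] += 1
    return edges

graph = build_service_map(connections, pod_to_workload)
print(dict(graph))
```

In the real pipeline the edge weights are request rate, latency, and error rate rather than a plain connection count, but the grouping logic is the same.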
AI-Powered Root Cause Analysis¶
When an SLO violation or anomaly is detected, Coroot's AI RCA engine automatically:
- Identifies the impacted service from SLO breach alerts
- Walks the dependency graph upstream and downstream
- Correlates signals across metrics, traces, logs, and profiles for each service in the path
- Ranks root causes using statistical anomaly detection (e.g., sudden CPU spike, disk saturation, memory leak, new deployment)
- Generates remediation suggestions (e.g., "Service X shows 95th percentile latency spike correlated with disk I/O saturation on node Y — consider increasing PVC size or migrating to SSD-backed storage class")
AI RCA Integration (Enterprise)¶
The Enterprise edition integrates with LLM APIs to provide natural-language explanations of incidents, parse log patterns for error classification, and suggest specific remediations based on historical incident patterns.
SLO Monitoring¶
Coroot provides built-in SLO tracking based on RED metrics (Rate, Error, Duration):
- Automatically calculates availability and latency SLOs per service
- Tracks error budgets in real-time
- Fires alerts when burn rate exceeds thresholds
- No manual SLO configuration required — automatically derived from eBPF data
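Burn rate is the standard way to express how fast an error budget is being consumed: the observed error rate divided by the budget the SLO allows. A minimal sketch (the threshold of 14 is a widely used fast-burn paging convention, not a Coroot-specific value):

```python
def burn_rate(error_rate: float, slo_target: float) -> float:
    """Error-budget consumption speed.
    1.0 means the budget runs out exactly at the end of the SLO window."""
    budget = 1.0 - slo_target  # e.g. a 99.9% SLO leaves a 0.1% budget
    return error_rate / budget

# A 1.4% error rate against a 99.9% availability SLO:
rate = burn_rate(error_rate=0.014, slo_target=0.999)
print(rate)  # ≈ 14, a classic fast-burn paging threshold
```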
Data Flow Summary¶
```mermaid
sequenceDiagram
    participant App as Application
    participant Kernel as Linux Kernel
    participant NA as coroot-node-agent
    participant CA as coroot-cluster-agent
    participant Server as Coroot Server
    participant Prom as Prometheus / VM
    participant CH as ClickHouse
    App->>Kernel: syscalls (normal operation)
    Kernel->>NA: eBPF events (TCP, DNS, disk)
    NA->>Server: Metrics (Prometheus format)
    NA->>Server: Traces (OTLP)
    NA->>Server: Logs (container stdout)
    NA->>Server: Profiles (pprof)
    CA->>Server: DB metrics (SQL/INFO)
    Server->>Prom: Store metrics
    Server->>CH: Store logs, traces, profiles
    Server->>Server: Build service map
    Server->>Server: Run inspections & AI RCA
```