
Coroot — How It Works

How Coroot uses eBPF for zero-instrumentation data collection, automated service discovery, and AI-powered root cause analysis.

Data Collection Pipeline

eBPF-Based Auto-Instrumentation

Coroot's core differentiator is kernel-level telemetry collection via eBPF (extended Berkeley Packet Filter). The coroot-node-agent runs as a DaemonSet on every Kubernetes node and attaches eBPF programs to kernel tracepoints and kprobes:

flowchart LR
    subgraph Kernel["Linux Kernel (4.16+)"]
        TP["Tracepoints"]
        KP["kprobes/kretprobes"]
        TC["Traffic Control (tc)"]
    end

    subgraph Agent["coroot-node-agent"]
        eBPF["eBPF Programs"]
        Perf["Perf Buffer"]
        Agg["Userspace Aggregation"]
    end

    TP --> eBPF
    KP --> eBPF
    TC --> eBPF
    eBPF --> Perf --> Agg
    Agg -->|OTLP / Prom RW| Server["Coroot Server"]
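The userspace aggregation step can be sketched as follows. This is an illustrative sketch only: the `TcpEvent` fields and the per-connection metric names are assumptions for the example, not Coroot's actual data structures.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical shape of an event read from the eBPF perf buffer.
@dataclass
class TcpEvent:
    src: str          # "ip:port" of the local socket
    dst: str          # "ip:port" of the peer
    bytes_sent: int
    latency_us: int
    failed: bool

def aggregate(events):
    """Fold per-event kernel data into per-connection metrics,
    the way a node agent would before exporting them."""
    stats = defaultdict(lambda: {"bytes": 0, "requests": 0,
                                 "errors": 0, "latency_sum_us": 0})
    for e in events:
        s = stats[(e.src, e.dst)]
        s["bytes"] += e.bytes_sent
        s["requests"] += 1
        s["errors"] += e.failed
        s["latency_sum_us"] += e.latency_us
    return stats

events = [
    TcpEvent("10.0.1.5:41234", "10.0.2.9:5432", 512, 800, False),
    TcpEvent("10.0.1.5:41234", "10.0.2.9:5432", 256, 1200, True),
]
agg = aggregate(events)[("10.0.1.5:41234", "10.0.2.9:5432")]
print(agg["requests"], agg["errors"], agg["latency_sum_us"])  # 2 1 2000
```

Aggregating in userspace keeps the eBPF programs themselves tiny: the kernel side only emits raw events into the perf buffer, and all stateful accounting happens in the agent.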

What eBPF Captures (Without Code Changes)

| Signal | Kernel Attachment Point | Data Collected |
|---|---|---|
| Network metrics | tcp_sendmsg, tcp_recvmsg, tcp_connect | Latency, throughput, error rates per connection |
| HTTP/gRPC traces | Socket read/write | Request method, path, status code, duration |
| DNS | UDP socket | Resolution time, failures |
| Disk I/O | blk_mq_start_request | IOPS, latency, bandwidth per container |
| CPU profiling | perf_event_open | On-CPU flame graphs per process |
| Memory profiling | Allocation tracepoints | Heap allocation patterns |
| Container lifecycle | cgroup events | Start/stop times, resource limits |
| Log collection | Container stdout/stderr | Application log lines |
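Because eBPF sees only raw socket reads and writes, L7 metadata such as the HTTP method, path, and status code has to be recovered from the first bytes of each payload. The parser below is a minimal sketch of that idea, not Coroot's actual implementation:

```python
def parse_http_request(payload: bytes):
    """Extract method and path from the request line of a raw HTTP/1.x
    request captured on a socket write. Returns None for non-HTTP data."""
    try:
        line = payload.split(b"\r\n", 1)[0].decode("ascii")
        method, path, version = line.split(" ")
    except ValueError:  # not three space-separated fields, or non-ASCII
        return None
    if not version.startswith("HTTP/"):
        return None
    return method, path

def parse_http_status(payload: bytes):
    """Extract the status code from the status line of a raw HTTP/1.x response."""
    line = payload.split(b"\r\n", 1)[0].decode("ascii", "replace")
    parts = line.split(" ")
    if len(parts) >= 2 and parts[0].startswith("HTTP/") and parts[1].isdigit():
        return int(parts[1])
    return None

print(parse_http_request(b"GET /api/users HTTP/1.1\r\nHost: x\r\n\r\n"))  # ('GET', '/api/users')
print(parse_http_status(b"HTTP/1.1 503 Service Unavailable\r\n\r\n"))     # 503
```

Payloads that do not match a known protocol signature (for example, the first bytes of a TLS handshake) simply fall through as unparsed TCP traffic.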

Cluster Agent Discovery

The coroot-cluster-agent complements eBPF data by connecting directly to databases:

| Database | Discovery Method | Metrics Collected |
|---|---|---|
| PostgreSQL | SQL queries via pg_stat_* | Active connections, query latency, replication lag |
| MySQL | SHOW STATUS / information_schema | Thread count, slow queries, buffer pool hit rate |
| Redis | INFO command | Memory usage, connected clients, hit rate |
| MongoDB | serverStatus command | Operations/sec, document counts, lock percentages |
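The cluster agent's job is largely turning command output into metrics. As a concrete illustration, Redis `INFO` returns `key:value` lines grouped under `# Section` headers; a minimal parser (a sketch, not the agent's actual code) looks like this:

```python
def parse_redis_info(raw: str) -> dict:
    """Parse the key:value lines of a Redis INFO response into a dict,
    skipping section headers ('# Memory') and blank lines."""
    metrics = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, value = line.split(":", 1)
        metrics[key] = value
    return metrics

# Abbreviated sample of real INFO output fields.
sample = """# Memory
used_memory:1048576
# Clients
connected_clients:42
keyspace_hits:900
keyspace_misses:100
"""
info = parse_redis_info(sample)
hit_rate = int(info["keyspace_hits"]) / (
    int(info["keyspace_hits"]) + int(info["keyspace_misses"]))
print(info["connected_clients"], hit_rate)  # 42 0.9
```

Derived metrics like the hit rate above are computed agent-side, so the server only ever sees ready-to-store time series.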

Service Map Generation

Coroot automatically builds a real-time service dependency graph by correlating eBPF network traces:

  1. Connection tracking: eBPF programs track every TCP connection (source IP:port ↔ dest IP:port)
  2. Container resolution: IP addresses are mapped to Kubernetes pods via the container runtime
  3. Service grouping: Pods are grouped by Deployment/StatefulSet/DaemonSet
  4. Protocol detection: L7 protocol (HTTP, gRPC, MySQL, PostgreSQL, Redis, Kafka, etc.) is identified from payload patterns
  5. Dependency graph: Directed edges between services are weighted by request rate, latency, and error rate
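Steps 1 through 5 can be condensed into a small sketch. The pod-to-IP table and workload names below are invented for the example; in reality they come from the container runtime and the Kubernetes API:

```python
from collections import defaultdict

# Hypothetical cluster state: IP -> (pod name, owning workload).
PODS = {
    "10.0.1.5": ("frontend-7d9f-abc", "frontend"),
    "10.0.2.9": ("postgres-0", "postgres"),
    "10.0.3.2": ("frontend-7d9f-xyz", "frontend"),
}

def build_service_map(connections):
    """connections: (src_ip, dst_ip, requests, errors, latency_ms_sum).
    Resolve IPs to workloads and aggregate directed, weighted edges."""
    edges = defaultdict(lambda: {"requests": 0, "errors": 0, "latency_ms": 0.0})
    for src_ip, dst_ip, req, err, lat in connections:
        src = PODS.get(src_ip, (None, "external"))[1]
        dst = PODS.get(dst_ip, (None, "external"))[1]
        e = edges[(src, dst)]
        e["requests"] += req
        e["errors"] += err
        e["latency_ms"] += lat
    return edges

conns = [
    ("10.0.1.5", "10.0.2.9", 100, 2, 450.0),  # frontend pod A -> postgres
    ("10.0.3.2", "10.0.2.9", 50, 0, 200.0),   # frontend pod B -> postgres
]
edge = build_service_map(conns)[("frontend", "postgres")]
print(edge)  # {'requests': 150, 'errors': 2, 'latency_ms': 650.0}
```

Note how the two frontend pods collapse into a single `frontend -> postgres` edge: grouping by workload rather than by pod is what keeps the map readable as replicas scale.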

AI-Powered Root Cause Analysis

When an SLO violation or anomaly is detected, Coroot's AI RCA engine automatically:

  1. Identifies the impacted service from SLO breach alerts
  2. Walks the dependency graph upstream and downstream
  3. Correlates signals across metrics, traces, logs, and profiles for each service in the path
  4. Ranks root causes using statistical anomaly detection (e.g., sudden CPU spike, disk saturation, memory leak, new deployment)
  5. Generates remediation suggestions (e.g., "Service X shows 95th percentile latency spike correlated with disk I/O saturation on node Y — consider increasing PVC size or migrating to SSD-backed storage class")
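The ranking in step 4 can be as simple as scoring each signal's latest value against its own baseline. The z-score ranker below is a generic sketch of that idea; Coroot's actual detector is not documented here, and the signal names are invented:

```python
import statistics

def zscore(history, latest):
    """Standard score of the latest sample against its history."""
    mean = statistics.fmean(history)
    stdev = statistics.pstdev(history)
    if stdev == 0:
        return 0.0
    return (latest - mean) / stdev

def rank_root_causes(signals):
    """signals: {name: (history, latest)}. Most anomalous first."""
    scored = [(name, abs(zscore(hist, latest)))
              for name, (hist, latest) in signals.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)

signals = {
    "svc-a cpu":     ([20, 21, 19, 20], 22),  # mild drift
    "svc-b disk io": ([5, 6, 5, 6], 60),      # saturation spike
    "svc-c mem":     ([70, 70, 70, 70], 70),  # flat
}
print(rank_root_causes(signals)[0][0])  # svc-b disk io
```

Scoring every signal on every service along the dependency path, then sorting globally, is what lets the engine surface a disk spike two hops upstream instead of blaming the service that merely breached its SLO.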

AI RCA Integration (Enterprise)

The Enterprise edition integrates with LLM APIs to provide natural-language explanations of incidents, parse log patterns for error classification, and suggest specific remediations based on historical incident patterns.

SLO Monitoring

Coroot provides built-in SLO tracking based on RED metrics (Rate, Error, Duration):

  • Automatically calculates availability and latency SLOs per service
  • Tracks error budgets in real time
  • Fires alerts when burn rate exceeds thresholds
  • No manual SLO configuration required — automatically derived from eBPF data
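The burn-rate arithmetic behind the alerting bullet is standard SRE practice; a minimal sketch (the thresholds and numbers are illustrative, not Coroot defaults):

```python
def burn_rate(error_rate: float, slo_target: float) -> float:
    """How fast the error budget is being consumed: 1.0 means the
    budget lasts exactly the SLO window, 14.4 means ~14x too fast."""
    budget = 1.0 - slo_target  # e.g. 0.1% allowed errors for 99.9%
    return error_rate / budget

# A 99.9% availability SLO leaves a 0.1% error budget.
# A sustained 1.44% error rate burns it 14.4x too fast,
# a common page-level threshold in SRE practice.
rate = burn_rate(error_rate=0.0144, slo_target=0.999)
print(round(rate, 1))  # 14.4
```

An alert fires when the measured burn rate stays above such a threshold over its evaluation window, which catches fast outages quickly while tolerating brief error blips.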

Data Flow Summary

sequenceDiagram
    participant App as Application
    participant Kernel as Linux Kernel
    participant NA as coroot-node-agent
    participant CA as coroot-cluster-agent
    participant Server as Coroot Server
    participant Prom as Prometheus / VM
    participant CH as ClickHouse

    App->>Kernel: syscalls (normal operation)
    Kernel->>NA: eBPF events (TCP, DNS, disk)
    NA->>Server: Metrics (Prometheus format)
    NA->>Server: Traces (OTLP)
    NA->>Server: Logs (container stdout)
    NA->>Server: Profiles (pprof)
    CA->>Server: DB metrics (SQL/INFO)
    Server->>Prom: Store metrics
    Server->>CH: Store logs, traces, profiles
    Server->>Server: Build service map
    Server->>Server: Run inspections & AI RCA
