Coroot — How It Works¶
How Coroot uses eBPF for zero-instrumentation data collection, automated service discovery, and AI-powered root cause analysis.
Data Collection Pipeline¶
eBPF-Based Auto-Instrumentation¶
Coroot's core differentiator is kernel-level telemetry collection via eBPF (extended Berkeley Packet Filter). The coroot-node-agent runs as a DaemonSet on every Kubernetes node and attaches eBPF programs to kernel tracepoints, kprobes, and traffic-control hooks:
```mermaid
flowchart LR
    subgraph Kernel["Linux Kernel (4.16+)"]
        TP["Tracepoints"]
        KP["kprobes/kretprobes"]
        TC["Traffic Control (tc)"]
    end
    subgraph Agent["coroot-node-agent"]
        eBPF["eBPF Programs"]
        Perf["Perf Buffer"]
        Agg["Userspace Aggregation"]
    end
    TP --> eBPF
    KP --> eBPF
    TC --> eBPF
    eBPF --> Perf --> Agg
    Agg -->|OTLP / Prom RW| Server["Coroot Server"]
```
What eBPF Captures (Without Code Changes)¶
| Signal | Kernel Attachment Point | Data Collected |
|---|---|---|
| Network metrics | `tcp_sendmsg`, `tcp_recvmsg`, `tcp_connect` | Latency, throughput, error rates per connection |
| HTTP/gRPC traces | Socket read/write | Request method, path, status code, duration |
| DNS | UDP socket | Resolution time, failures |
| Disk I/O | `blk_mq_start_request` | IOPS, latency, bandwidth per container |
| CPU profiling | `perf_event_open` | On-CPU flame graphs per process |
| Memory profiling | Allocation tracepoints | Heap allocation patterns |
| Container lifecycle | cgroup events | Start/stop times, resource limits |
| Log collection | Container stdout/stderr | Application log lines |
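The kernel-side programs emit raw per-request events; the node agent rolls them up in userspace before export. A minimal sketch of that aggregation step, using a hypothetical event shape (the field names are illustrative, not coroot-node-agent's actual wire format):

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical shape of a per-request event surfaced by the eBPF side.
@dataclass
class HttpEvent:
    src: str           # client IP:port
    dst: str           # server IP:port
    status: int        # HTTP status code parsed from the socket payload
    duration_ms: float

def aggregate(events):
    """Roll raw events up into per-destination RED-style counters."""
    stats = defaultdict(lambda: {"requests": 0, "errors": 0, "total_ms": 0.0})
    for e in events:
        s = stats[e.dst]
        s["requests"] += 1
        s["total_ms"] += e.duration_ms
        if e.status >= 500:
            s["errors"] += 1
    return {dst: {**s, "avg_ms": s["total_ms"] / s["requests"]}
            for dst, s in stats.items()}

events = [
    HttpEvent("10.0.1.5:41232", "10.0.2.9:8080", 200, 12.0),
    HttpEvent("10.0.1.5:41233", "10.0.2.9:8080", 503, 30.0),
]
print(aggregate(events))
```

Aggregating in userspace rather than in the kernel keeps the eBPF programs small and verifier-friendly; only compact counters leave the node.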
Cluster Agent Discovery¶
The coroot-cluster-agent complements eBPF data by connecting directly to databases:
| Database | Discovery Method | Metrics Collected |
|---|---|---|
| PostgreSQL | SQL queries via `pg_stat_*` views | Active connections, query latency, replication lag |
| MySQL | `SHOW STATUS` / `information_schema` | Thread count, slow queries, buffer pool hit rate |
| Redis | `INFO` command | Memory usage, connected clients, hit rate |
| MongoDB | `serverStatus` command | Operations/sec, document counts, lock percentages |
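The Redis case is the simplest to illustrate: `INFO` returns flat `key:value` lines that the agent parses into numeric gauges and then derives ratios from. A sketch of that parsing (the sample `INFO` text and derived hit-rate formula are illustrative, not Coroot's exact implementation):

```python
def parse_info(raw: str) -> dict:
    """Parse Redis INFO output: 'key:value' lines, '#' section headers."""
    metrics = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        try:
            metrics[key] = float(value)  # numeric fields become gauges
        except ValueError:
            metrics[key] = value         # keep non-numeric fields verbatim
    return metrics

sample = """\
# Stats
keyspace_hits:980
keyspace_misses:20
connected_clients:14
used_memory:1048576
"""
info = parse_info(sample)
hit_rate = info["keyspace_hits"] / (info["keyspace_hits"] + info["keyspace_misses"])
print(hit_rate)  # 0.98
```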
Service Map Generation¶
Coroot automatically builds a real-time service dependency graph by correlating eBPF network traces:
- Connection tracking: eBPF programs track every TCP connection (source IP:port ↔ dest IP:port)
- Container resolution: IP addresses are mapped to Kubernetes pods via the container runtime
- Service grouping: Pods are grouped by Deployment/StatefulSet/DaemonSet
- Protocol detection: L7 protocol (HTTP, gRPC, MySQL, PostgreSQL, Redis, Kafka, etc.) is identified from payload patterns
- Dependency graph: Directed edges between services are weighted by request rate, latency, and error rate
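The first three steps can be sketched as a fold over observed connections: resolve each endpoint IP to its owning workload, then count connections per service pair to weight the edges. The pod-to-workload mapping and connection list below are illustrative:

```python
from collections import Counter

# Hypothetical pod-IP -> workload mapping resolved via the container runtime.
pod_to_workload = {
    "10.0.1.5": "Deployment/frontend",
    "10.0.2.9": "Deployment/api",
    "10.0.3.2": "StatefulSet/postgres",
}

# TCP connections observed by eBPF: (source IP, destination IP).
connections = [
    ("10.0.1.5", "10.0.2.9"),
    ("10.0.1.5", "10.0.2.9"),
    ("10.0.2.9", "10.0.3.2"),
]

def build_service_map(conns, resolve):
    """Collapse pod-level connections into weighted service-level edges."""
    edges = Counter()
    for src_ip, dst_ip in conns:
        src = resolve.get(src_ip, "external")  # unknown IPs fall outside the cluster
        dst = resolve.get(dst_ip, "external")
        edges[(src, dst)] += 1
    return edges

graph = build_service_map(connections, pod_to_workload)
print(dict(graph))
```

In the real pipeline the edge weights are request rate, latency, and error rate rather than a plain connection count, but the grouping logic is the same.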
AI-Powered Root Cause Analysis¶
When an SLO violation or anomaly is detected, Coroot's AI RCA engine automatically:
- Identifies the impacted service from SLO breach alerts
- Walks the dependency graph upstream and downstream
- Correlates signals across metrics, traces, logs, and profiles for each service in the path
- Ranks root causes using statistical anomaly detection (e.g., sudden CPU spike, disk saturation, memory leak, new deployment)
- Generates remediation suggestions (e.g., "Service X shows 95th percentile latency spike correlated with disk I/O saturation on node Y — consider increasing PVC size or migrating to SSD-backed storage class")
AI RCA Integration (Enterprise)¶
The Enterprise edition integrates with LLM APIs to provide natural-language explanations of incidents, parse log patterns for error classification, and suggest specific remediations based on historical incident patterns.
SLO Monitoring¶
Coroot provides built-in SLO tracking based on RED metrics (Rate, Error, Duration):
- Automatically calculates availability and latency SLOs per service
- Tracks error budgets in real-time
- Fires alerts when burn rate exceeds thresholds
- No manual SLO configuration required — automatically derived from eBPF data
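Burn rate is the standard way to express how fast an error budget is being consumed: the observed error rate divided by the budget the SLO allows. A minimal sketch (the threshold of 14 is a widely used fast-burn paging convention, not a Coroot-specific value):

```python
def burn_rate(error_rate: float, slo_target: float) -> float:
    """Error-budget consumption speed.
    1.0 means the budget runs out exactly at the end of the SLO window."""
    budget = 1.0 - slo_target  # e.g. a 99.9% SLO leaves a 0.1% budget
    return error_rate / budget

# A 1.4% error rate against a 99.9% availability SLO:
rate = burn_rate(error_rate=0.014, slo_target=0.999)
print(rate)  # ≈ 14, a classic fast-burn paging threshold
```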
Data Flow Summary¶
```mermaid
sequenceDiagram
    participant App as Application
    participant Kernel as Linux Kernel
    participant NA as coroot-node-agent
    participant CA as coroot-cluster-agent
    participant Server as Coroot Server
    participant Prom as Prometheus / VM
    participant CH as ClickHouse
    App->>Kernel: syscalls (normal operation)
    Kernel->>NA: eBPF events (TCP, DNS, disk)
    NA->>Server: Metrics (Prometheus format)
    NA->>Server: Traces (OTLP)
    NA->>Server: Logs (container stdout)
    NA->>Server: Profiles (pprof)
    CA->>Server: DB metrics (SQL/INFO)
    Server->>Prom: Store metrics
    Server->>CH: Store logs, traces, profiles
    Server->>Server: Build service map
    Server->>Server: Run inspections & AI RCA
```