Coroot — Architecture

Detailed component breakdown, deployment topologies, and data flow diagrams for Coroot.

Component Architecture

flowchart TB
    subgraph K8s["Kubernetes Cluster"]
        subgraph Agents["Data Plane (DaemonSet + Deployment)"]
            NA["coroot-node-agent\n(eBPF DaemonSet)\nPer node"]
            CA["coroot-cluster-agent\n(Deployment)\nDatabase discovery"]
        end

        subgraph Control["Control Plane"]
            OP["Coroot Operator\n(Lifecycle management)"]
            CR["Coroot CR\n(Custom Resource)"]
        end

        subgraph Server["Coroot Server (StatefulSet)"]
            direction TB
            InspEng["Inspection Engine\n18+ auto-inspections"]
            AIRCA["AI RCA Engine\n(pattern detection)"]
            SvcMap["Service Map Builder\n(eBPF topology)"]
            SLOEng["SLO Engine\n(error budget tracking)"]
            APIGW["API / Web UI\n(port 8080)"]
        end

        subgraph Storage["Storage (StatefulSet / External)"]
            Prom["Prometheus / VM / Mimir\n(metrics)"]
            CH["ClickHouse\n(logs, traces, profiles)"]
        end
    end

    subgraph External["Optional"]
        OTEL["OTel SDK\n(app-level traces)"]
        LLM["LLM API\n(Enterprise AI RCA)"]
    end

    NA -->|"metrics, traces,\nlogs, profiles"| Server
    CA -->|"DB metrics\n(pg_stat, INFO)"| Server
    OTEL -->|OTLP| Server
    Server -->|"remote_write /\nPromQL"| Prom
    Server -->|"clickhouse-native"| CH
    LLM -.->|"API"| AIRCA
    OP -->|"reconcile"| CR
    CR -->|"manages"| Agents
    CR -->|"manages"| Server
    CR -->|"manages"| Storage

    style Server fill:#1565c0,color:#fff
    style Agents fill:#2e7d32,color:#fff
    style Storage fill:#e65100,color:#fff
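
The data-flow edges in the diagram can also be read as a small directed graph. The sketch below transcribes only the telemetry edges (the operator's "manages" edges and the optional LLM call are left out) and answers one question: which components can feed a given store. The component names are shortened labels from the diagram, and `upstream_of` is a hypothetical helper for illustration, not part of Coroot.

```python
# Telemetry edges transcribed from the component diagram (illustrative labels).
EDGES = {
    "coroot-node-agent": ["coroot-server"],
    "coroot-cluster-agent": ["coroot-server"],
    "otel-sdk": ["coroot-server"],
    "coroot-server": ["prometheus", "clickhouse"],
}


def upstream_of(target, edges=EDGES):
    """Return every component whose data can reach `target`, directly or transitively."""
    # Invert the edge map so we can walk from the target back to its sources.
    reverse = {}
    for src, dsts in edges.items():
        for dst in dsts:
            reverse.setdefault(dst, set()).add(src)

    found, stack = set(), [target]
    while stack:
        node = stack.pop()
        for src in reverse.get(node, ()):
            if src not in found:
                found.add(src)
                stack.append(src)
    return found


print(sorted(upstream_of("clickhouse")))
```

Because every agent reports to the Coroot server and only the server writes to storage, the agents appear upstream of both stores while nothing is upstream of the agents themselves.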

18 Built-In Inspections

Coroot continuously runs 18 automated inspections against every discovered service, grouped into the following categories:

| Category | Inspections |
|---|---|
| SLOs | Availability SLO, Latency SLO |
| Instances | Pod restarts, unavailable replicas |
| CPU | CPU throttling, CPU usage near limits |
| GPU | GPU utilization, memory usage |
| Memory | OOM kills, memory near limits |
| Storage | Disk usage, I/O latency |
| Network | Connection errors, DNS failures, TCP retransmits |
| Logs | Error log rate spikes, warning patterns |
| Runtime | JVM heap/GC, .NET GC, Python GIL contention |
| Databases | Postgres, MySQL, MongoDB, Redis, Memcached health |
| Deployments | Rollout tracking, canary detection |
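
To make the SLO category concrete, here is a minimal sketch of how an availability SLO and its error budget could be evaluated over one time window. It is an illustration only: `availability_budget`, `BudgetStatus`, the 99.9% objective, and the request counts are assumptions, not Coroot's implementation or defaults.

```python
from dataclasses import dataclass


@dataclass
class BudgetStatus:
    availability: float     # fraction of requests that succeeded
    budget_consumed: float  # share of the error budget used (1.0 = exhausted)
    breached: bool          # True when availability is below the objective


def availability_budget(total: int, failed: int, objective: float = 0.999) -> BudgetStatus:
    """Evaluate an availability SLO for one window (hypothetical helper)."""
    if total <= 0:
        # No traffic in the window, so nothing can violate the objective.
        return BudgetStatus(1.0, 0.0, False)

    availability = (total - failed) / total
    # The error budget is the number of failures the objective tolerates.
    allowed_failures = total * (1.0 - objective)
    if allowed_failures <= 0:
        # A 100% objective leaves no budget: any failure exhausts it.
        consumed = float("inf") if failed else 0.0
    else:
        consumed = failed / allowed_failures
    return BudgetStatus(availability, consumed, availability < objective)


status = availability_budget(total=100_000, failed=250)
print(status)
```

With 100,000 requests and a 99.9% objective, roughly 100 failures are tolerated, so 250 failures consume about 2.5 times the budget and the SLO counts as breached.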

Deployment Topologies

Single-Cluster (Standard)

flowchart LR
    subgraph Cluster["K8s Cluster"]
        NA1["node-agent<br/>(node 1)"]
        NA2["node-agent<br/>(node 2)"]
        NAN["node-agent<br/>(node N)"]
        CA["cluster-agent"]
        CS["Coroot Server"]
        CH["ClickHouse<br/>(2 shards × 2 replicas)"]
        Prom["Prometheus / VM"]
    end

    NA1 --> CS
    NA2 --> CS
    NAN --> CS
    CA --> CS
    CS --> CH
    CS --> Prom

Multi-Cluster (Hub and Spoke)

flowchart TB
    subgraph Central["Central Cluster"]
        CS["Coroot Server\n(full install)"]
        CH["ClickHouse"]
        Prom["Prometheus / VM"]
    end

    subgraph Remote1["Remote Cluster 1"]
        NA_R1["node-agents"]
        CA_R1["cluster-agent"]
    end

    subgraph Remote2["Remote Cluster 2"]
        NA_R2["node-agents"]
        CA_R2["cluster-agent"]
    end

    NA_R1 -->|"agentsOnly=true"| CS
    CA_R1 --> CS
    NA_R2 -->|"agentsOnly=true"| CS
    CA_R2 --> CS
    CS --> CH
    CS --> Prom

    style Central fill:#1565c0,color:#fff

Sequence: Incident Detection → RCA

sequenceDiagram
    participant App as Application
    participant Kernel as Linux Kernel
    participant Agent as node-agent (eBPF)
    participant Server as Coroot Server
    participant Insp as Inspection Engine
    participant RCA as AI RCA
    participant Alert as Alert Channel

    Kernel->>Agent: eBPF events (TCP, DNS, disk)
    Agent->>Server: Metrics + traces + logs
    Server->>Insp: Run 18 inspections
    Insp->>Insp: SLO breach detected
    Insp->>RCA: Trigger root cause analysis
    RCA->>RCA: Walk dependency graph
    RCA->>RCA: Correlate metrics ↔ traces ↔ logs
    RCA->>RCA: Rank root causes
    RCA->>Alert: Send alert with RCA summary
    Note over Alert: Slack / PagerDuty / Webhook
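
The "walk dependency graph" and "rank root causes" steps in the sequence above can be sketched as a breadth-first traversal. This is a simplified illustration under stated assumptions, not Coroot's RCA algorithm: the shape of `deps` and `failing`, the function name `rank_root_causes`, the example services, and the "deepest failing dependency first" heuristic are all hypothetical.

```python
from collections import deque


def rank_root_causes(deps, failing, start):
    """Walk the dependency graph from the service whose SLO breached.

    deps:    service -> list of services it calls (assumed shape)
    failing: service -> list of failed inspection names (assumed shape)
    Returns (service, failed_inspections, depth) tuples, deepest first, on the
    heuristic that the furthest failing dependency is the likeliest origin.
    """
    seen = {start}
    queue = deque([(start, 0)])
    candidates = []
    while queue:
        service, depth = queue.popleft()
        if failing.get(service):
            candidates.append((service, failing[service], depth))
        for dep in deps.get(service, []):
            if dep not in seen:  # guard against cycles and shared dependencies
                seen.add(dep)
                queue.append((dep, depth + 1))
    # Deeper services first; more failed inspections breaks ties, then name.
    return sorted(candidates, key=lambda c: (-c[2], -len(c[1]), c[0]))


deps = {
    "frontend": ["cart", "catalog"],
    "cart": ["postgres"],
    "catalog": ["postgres", "redis"],
}
failing = {
    "frontend": ["Latency SLO"],
    "cart": ["Connection errors"],
    "postgres": ["I/O latency", "Disk usage"],
}
ranked = rank_root_causes(deps, failing, "frontend")
for service, inspections, depth in ranked:
    print(depth, service, inspections)
```

In this example the breach surfaces at `frontend`, but the walk ranks `postgres` first because it is the deepest dependency with failed inspections. A real engine would also weigh timing and signal correlation, which this sketch leaves out.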
