
Architecture

1. Default Topology / Flow

flowchart TB
    subgraph Probes["Probes & Agents"]
        JA["Java Agent\n(bytecode injection)\nauto-attach"]
        DotNet[".NET Agent"]
        GoAgent["Go Agent\n(compile-time)"]
        PyAgent["Python Agent"]
        NodeAgent["Node.js Agent"]
        PHPAgent["PHP Agent"]
        Rover["Rover\n(eBPF network profiling)"]
        Satellite["Satellite\n(Go edge proxy)"]
        OTelP["OTel Collector\n(OTLP receiver)"]
        EnvoyALS["Envoy ALS\n(service mesh)"]
    end

    subgraph OAP["OAP Server Cluster"]
        direction TB
        subgraph Receivers["Receiver Layer"]
            gRPC_R["gRPC Receiver\n(port 11800)"]
            HTTP_R["HTTP Receiver\n(port 12800)"]
            Kafka_R["Kafka Consumer\n(optional)"]
        end

        subgraph Analysis["Analysis Core (V2 Engine)"]
            OAL["OAL V2\n(ANTLR4 + Javassist)\nMetric aggregation"]
            MAL["MAL V2\n(Prometheus → metrics)"]
            LAL["LAL V2\n(Log analysis)"]
            MQE["MQE\n(Metric Query Engine)"]
        end

        subgraph Services["Service Layer"]
            Topo["Topology Builder"]
            Alert["Alerting Engine\n(rules + ML)"]
            Profile["Profiling Service"]
            Hierarchy["Service Hierarchy"]
        end
    end

    subgraph StorageLayer["Storage (Pluggable)"]
        BDB["BanyanDB\n(recommended)\nLiaison + Data nodes"]
        ES["Elasticsearch\n/ OpenSearch"]
        CH_SW["ClickHouse"]
        PG["PostgreSQL"]
    end

    subgraph UI["Presentation"]
        SWUI["SkyWalking UI\n(Vue.js)"]
        Grafana_SW["Grafana\n(PromQL plugin)"]
    end

    Probes -->|"native SW / OTLP\n/ Envoy ALS"| Receivers
    Receivers --> Analysis
    Analysis --> StorageLayer
    Services --> StorageLayer
    MQE --> SWUI
    OAP -.->|"PromQL"| Grafana_SW

    style OAP fill:#0d47a1,color:#fff
    style BDB fill:#1b5e20,color:#fff
    style Probes fill:#4a148c,color:#fff

Component breakdown, deployment topologies, and the V2 engine internals for Apache SkyWalking.

System Architecture

Sub-Project Ecosystem

flowchart LR
    subgraph Core["Apache SkyWalking"]
        OAP_C["OAP Server\n(Java)"]
        UI_C["UI\n(Vue.js)"]
    end

    subgraph Agents["Language Agents"]
        JA_C["Java Agent"]
        NET["SkyWalking .NET"]
        GO["SkyWalking Go"]
        PY["SkyWalking Python"]
        NODE["SkyWalking Node.js"]
        PHP_C["SkyWalking PHP"]
        RUST_C["SkyWalking Rust"]
        CPP["SkyWalking C++"]
    end

    subgraph Infra["Infrastructure"]
        BDB_C["BanyanDB\n(Go)"]
        SAT_C["Satellite\n(Go, edge proxy)"]
        ROV["Rover\n(Go, eBPF)"]
        GraalVM_C["GraalVM Distro\n(native image)"]
    end

    subgraph Tools["Tools"]
        SWCTL["swctl\n(CLI)"]
        Helm_C["Helm Charts\n(OCI)"]
        Eyes["SkyWalking Eyes\n(license checker)"]
        Infra_E2E["Infra E2E\n(test framework)"]
    end

    Core --- Agents
    Core --- Infra
    Core --- Tools

BanyanDB Cluster Architecture

flowchart TB
    subgraph OAPCluster["OAP Cluster"]
        OAP1["OAP 1"]
        OAP2["OAP 2"]
        OAP3["OAP 3"]
    end

    subgraph BanyanCluster["BanyanDB Cluster"]
        L1["Liaison Node 1\n(query routing)"]
        L2["Liaison Node 2"]
        D1["Data Node 1\n(shard owner)"]
        D2["Data Node 2\n(shard owner)"]
        D3["Data Node 3\n(shard owner)"]
    end

    OAPCluster -->|"gRPC"| L1
    OAPCluster -->|"gRPC"| L2
    L1 --> D1
    L1 --> D2
    L1 --> D3
    L2 --> D1
    L2 --> D2
    L2 --> D3

    style BanyanCluster fill:#1b5e20,color:#fff
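Either liaison node in the diagram can accept a request and forward it to the data node that owns the relevant shard. The following is a minimal Python sketch of that routing step. It assumes a fixed shard count, hash-based placement of series keys onto shards, and round-robin assignment of shards to nodes. `SHARD_COUNT`, `shard_for`, `owner_of`, and the key format are hypothetical and do not describe BanyanDB's actual placement algorithm.

```python
import hashlib

SHARD_COUNT = 6  # hypothetical shard count for illustration
DATA_NODES = ["data-1", "data-2", "data-3"]


def shard_for(series_key: str, shard_count: int = SHARD_COUNT) -> int:
    """Map a series key (e.g. metric name + entity) to a shard id."""
    digest = hashlib.sha256(series_key.encode("utf-8")).digest()
    # First 8 bytes of the digest as an unsigned int give a stable hash.
    return int.from_bytes(digest[:8], "big") % shard_count


def owner_of(shard_id: int, nodes: list[str] = DATA_NODES) -> str:
    """Assign shards to data nodes round-robin (an assumption, not BanyanDB's rule)."""
    return nodes[shard_id % len(nodes)]


def route(series_key: str) -> tuple[int, str]:
    """What a liaison node does here: pick the shard, then the node that owns it."""
    shard = shard_for(series_key)
    return shard, owner_of(shard)


if __name__ == "__main__":
    for key in ["service_cpm|checkout", "service_cpm|payments", "endpoint_cpm|/api/orders"]:
        print(key, "->", route(key))
```

The mapping is deterministic, so both liaison nodes resolve the same owner for a given key. That property is what lets either one serve a request. A stable digest (SHA-256) is used because Python's built-in `hash()` is randomized per process.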

V2 DSL Engine Pipeline

sequenceDiagram
    participant Build as Build Phase
    participant Startup as OAP Startup
    participant Runtime as Runtime

    Note over Build: V2 (ANTLR4 + Javassist)
    Build->>Build: Parse OAL/MAL/LAL rules
    Build->>Build: Generate immutable AST
    Build->>Build: Compile to bytecode (Javassist)
    Build->>Startup: Load precompiled classes

    Note over Startup: Deterministic loading
    Startup->>Startup: Load manifests
    Startup->>Startup: Register metering functions
    Startup->>Runtime: Ready (no Groovy runtime)

    Note over Runtime: Thread-safe execution
    Runtime->>Runtime: Process incoming telemetry
    Runtime->>Runtime: Execute compiled OAL rules
    Runtime->>Runtime: Write to storage
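The pipeline separates parsing and compilation from execution, so runtime threads only run rules that are already compiled. The following is a toy Python analogue of that split. It assumes a tiny arithmetic rule language. A hand-written recursive-descent parser stands in for ANTLR4, and compiling the AST into closures stands in for Javassist bytecode generation. All names are illustrative, not SkyWalking classes.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Union


# Immutable AST nodes: frozen dataclasses cannot be mutated after construction.
@dataclass(frozen=True)
class Num:
    value: float


@dataclass(frozen=True)
class Field:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str  # "+" or "*"
    left: "Node"
    right: "Node"


Node = Union[Num, Field, BinOp]
TOKEN = re.compile(r"\s*(?:(\d+(?:\.\d+)?)|([A-Za-z_]\w*)|(\S))")


def parse(src: str) -> Node:
    """Recursive-descent parser: '*' binds tighter than '+', parentheses group."""
    tokens = [num or name or op for num, name, op in TOKEN.findall(src)]
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def atom() -> Node:
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError("unexpected end of expression")
        pos += 1
        if tok[0].isdigit():
            return Num(float(tok))
        if tok[0].isalpha() or tok[0] == "_":
            return Field(tok)
        if tok == "(":
            node = expr()
            if peek() != ")":
                raise SyntaxError("expected ')'")
            pos += 1
            return node
        raise SyntaxError(f"unexpected token {tok!r}")

    def term() -> Node:
        nonlocal pos
        node = atom()
        while peek() == "*":
            pos += 1
            node = BinOp("*", node, atom())
        return node

    def expr() -> Node:
        nonlocal pos
        node = term()
        while peek() == "+":
            pos += 1
            node = BinOp("+", node, term())
        return node

    tree = expr()
    if peek() is not None:
        raise SyntaxError(f"trailing token {peek()!r}")
    return tree


def compile_node(node: Node) -> Callable[[dict], float]:
    """Turn the AST into closures once; evaluation only reads its input record."""
    if isinstance(node, Num):
        value = node.value
        return lambda record: value
    if isinstance(node, Field):
        name = node.name
        return lambda record: record[name]
    left, right = compile_node(node.left), compile_node(node.right)
    if node.op == "+":
        return lambda record: left(record) + right(record)
    return lambda record: left(record) * right(record)


if __name__ == "__main__":
    rule = compile_node(parse("latency * 2 + 1"))  # "build" phase: parse + compile once
    records = [{"latency": float(i)} for i in range(5)]
    with ThreadPoolExecutor(max_workers=4) as pool:  # "runtime": shared compiled rule
        print(list(pool.map(rule, records)))
```

The compiled `rule` captures only values copied out of the frozen AST. It can therefore be shared across the thread pool without locks or ThreadLocal state.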


Data Model


erDiagram
    SkyWalking_CORE ||--o{ CONFIG : requires
    SkyWalking_CORE ||--o{ STATE : writes
    CONFIG {
        string runtime_params
        string limits
    }
    STATE {
        string metric_id
        json payload
    }

How It Works

How SkyWalking's OAP server processes telemetry, the new V2 engine architecture, and BanyanDB's purpose-built storage model.

Architecture Overview

flowchart TB
    subgraph Probes["Probes & Agents"]
        JA["Java Agent\n(bytecode injection)"]
        LA["Language Agents\n(.NET, Go, Python, etc.)"]
        ROVER["Rover\n(eBPF network profiling)"]
        OTEL["OTel Collector\n(OTLP receiver)"]
        ENVOY["Envoy ALS\n(access log service)"]
        SAT["Satellite\n(edge proxy)"]
    end

    subgraph OAP["OAP Server"]
        direction TB
        RECV["Receiver Layer\n(gRPC, REST, Kafka)"]
        ANAL["Analysis Core"]
        subgraph DSL["V2 DSL Engines"]
            OAL["OAL V2\n(metric aggregation)"]
            MAL["MAL V2\n(Prometheus → metrics)"]
            LAL["LAL V2\n(log analysis)"]
        end
        ALERT["Alerting Engine"]
        TOPO["Topology Builder"]
        QUERY["MQE Query Engine"]
    end

    subgraph Storage["Storage (Pluggable)"]
        BDB["BanyanDB\n(recommended)"]
        ES["Elasticsearch\n/ OpenSearch"]
        CHSW["ClickHouse"]
        PG["PostgreSQL"]
    end

    Probes -->|gRPC/REST| RECV
    RECV --> ANAL
    ANAL --> DSL
    DSL --> Storage
    ALERT --> Storage
    TOPO --> Storage
    QUERY --> Storage
    QUERY --> UI["SkyWalking UI"]

V2 Engine Architecture (v10.4.0)

The v10.4.0 release introduces a major engine overhaul, replacing the Groovy-based DSL runtime with an ANTLR4 parser and Javassist bytecode generation.

OAL V2 (Observability Analysis Language)

| Feature | V1 (Groovy) | V2 (ANTLR4 + Javassist) |
|---|---|---|
| AST model | Mutable, Groovy closures | Immutable, type-safe |
| Thread safety | ThreadLocal-dependent | No shared mutable state |
| Error reporting | Runtime exceptions | File, line, and column at parse time |
| Testability | Requires parsing | Models constructible without parsing |
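Two rows of the table lend themselves to a short illustration: error positions reported at parse time, and models that can be constructed without parsing. The sketch below assumes a simplified OAL-style rule file, with a parenthesis-balance check standing in for a real grammar. `RuleSyntaxError`, `check_parentheses`, and `MetricRule` are hypothetical names, not SkyWalking classes.

```python
from dataclasses import dataclass


class RuleSyntaxError(Exception):
    """Parse-time error that carries the rule file plus 1-based line and column."""

    def __init__(self, file: str, line: int, column: int, message: str) -> None:
        super().__init__(f"{file}:{line}:{column}: {message}")
        self.file, self.line, self.column = file, line, column


def position(text: str, offset: int) -> tuple[int, int]:
    """Convert a 0-based character offset into a 1-based (line, column) pair."""
    line = text.count("\n", 0, offset) + 1
    last_newline = text.rfind("\n", 0, offset)  # -1 when offset is on line 1
    return line, offset - last_newline


def check_parentheses(file: str, text: str) -> None:
    """Stand-in for a real grammar: reject unbalanced parentheses with a position."""
    open_offsets: list[int] = []
    for offset, ch in enumerate(text):
        if ch == "(":
            open_offsets.append(offset)
        elif ch == ")":
            if not open_offsets:
                raise RuleSyntaxError(file, *position(text, offset), "unmatched ')'")
            open_offsets.pop()
    if open_offsets:
        raise RuleSyntaxError(file, *position(text, open_offsets[-1]), "unclosed '('")


@dataclass(frozen=True)
class MetricRule:
    """Immutable rule model; a unit test can build one directly, no parser needed."""
    name: str
    source: str
    function: str


if __name__ == "__main__":
    rules = (
        "service_cpm = from(Service.*).cpm();\n"
        "service_sla = from(Service.*.status.percent();\n"  # missing ')' after '*'
    )
    try:
        check_parentheses("core.oal", rules)
    except RuleSyntaxError as err:
        print(err)  # core.oal:2:19: unclosed '('
    print(MetricRule(name="service_cpm", source="Service", function="cpm"))
```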

MAL V2 (Meter Analysis Language)

Converts Prometheus metrics into SkyWalking's internal metric model:

  • Speedup: ~6.8x faster execution vs Groovy V1
  • Compile-time validation: Syntax errors caught at startup
  • Immutable AST: Thread-safe without ThreadLocal
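MAL turns scraped Prometheus samples into SkyWalking metrics, typically by filtering and aggregating over labels. The following is a minimal Python sketch of the aggregation idea. It assumes the simple text exposition format, with no escaping edge cases and timestamps ignored. `parse_samples` and `sum_by` are illustrative helpers, loosely analogous to a MAL expression that sums a metric by its `service` label; they are not MAL itself.

```python
import re
from collections import defaultdict

# Simple exposition-format sample: name{k="v",...} value [timestamp]
SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_samples(text: str):
    """Yield (name, labels, value) for each sample line, skipping comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        m = SAMPLE.match(line)
        if not m:
            continue
        name, raw_labels, value = m.groups()
        yield name, dict(LABEL.findall(raw_labels or "")), float(value)


def sum_by(text: str, metric: str, keep: list[str]) -> dict[tuple, float]:
    """Sum one metric's samples, keeping only the `keep` labels as the group key."""
    totals: dict[tuple, float] = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == metric:
            totals[tuple(labels.get(k, "") for k in keep)] += value
    return dict(totals)


if __name__ == "__main__":
    scrape = """
# TYPE http_requests_total counter
http_requests_total{service="checkout",instance="a",code="200"} 120
http_requests_total{service="checkout",instance="b",code="200"} 80
http_requests_total{service="payments",instance="c",code="500"} 5
"""
    print(sum_by(scrape, "http_requests_total", ["service"]))
    # {('checkout',): 200.0, ('payments',): 5.0}
```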

LAL V2 (Log Analysis Language)

Processes log streams for extraction, filtering, and routing:

  • Compile: ~39x faster than Groovy V1
  • Execute: ~2.8x faster
  • Breaking change: the slowSql {} and sampledTrace {} sub-DSLs are replaced by an outputType mechanism
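The extract, filter, and route stages can be sketched as a single function. The sketch assumes a hypothetical log layout and slow-query threshold. The `output_type` field is only an analogue of LAL's outputType mechanism, and its values `"slow_sql"` and `"log"` are made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical log layout: "<LEVEL> <service> <message> [took=<ms>ms]"
PATTERN = re.compile(
    r"^(?P<level>[A-Z]+) (?P<service>\S+) (?P<message>.*?)(?: took=(?P<ms>\d+)ms)?$"
)
SLOW_THRESHOLD_MS = 500  # hypothetical threshold


@dataclass(frozen=True)
class Output:
    output_type: str  # hypothetical routing key standing in for LAL's outputType
    service: str
    message: str
    duration_ms: Optional[int]


def process(line: str) -> Optional[Output]:
    """Extract fields, drop noise, then choose an output type for routing."""
    m = PATTERN.match(line)
    if m is None or m["level"] == "DEBUG":
        return None  # filter: unparseable and debug lines are discarded
    ms = int(m["ms"]) if m["ms"] else None
    slow_sql = m["message"].startswith("SQL") and ms is not None and ms >= SLOW_THRESHOLD_MS
    return Output("slow_sql" if slow_sql else "log", m["service"], m["message"], ms)


if __name__ == "__main__":
    for line in [
        "INFO orders SQL SELECT * FROM orders took=812ms",
        "DEBUG orders cache miss",
        "INFO web GET /home took=20ms",
    ]:
        print(process(line))
```

Routing on a single output-type value keeps one extraction pipeline per log source. The alternative is a separate sub-DSL for each kind of derived output.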

BanyanDB

BanyanDB is SkyWalking's purpose-built observability database, combining columnar storage with a time-series data model.

Architecture

flowchart LR
    subgraph BanyanDB["BanyanDB Cluster"]
        Liaison["Liaison Node\n(query routing)"]
        Data1["Data Node 1\n(shard owner)"]
        Data2["Data Node 2\n(shard owner)"]
        DataN["Data Node N"]
    end

    OAP["OAP Server"] -->|gRPC| Liaison
    Liaison --> Data1
    Liaison --> Data2
    Liaison --> DataN

Storage Model

| Concept | Description |
|---|---|
| Group | Logical namespace (e.g., sw_metric, sw_trace) |
| Measure | Metric storage: columnar format optimized for aggregation |
| Stream | Log/trace storage: append-only, time-ordered |
| IndexRule | Secondary index definitions for query acceleration |
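Modeling the four storage concepts in memory shows how they differ. A measure keeps aligned per-field columns, so an aggregation reads only one column. A stream accepts only time-ordered appends, so range scans can binary-search. An index rule maps tag values to element positions. The following toy Python sketch works under those assumptions. The class names follow the table, but none of them mirror BanyanDB's on-disk format or API, and the `segment` stream name is illustrative.

```python
from bisect import bisect_left, bisect_right
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Measure:
    """Columnar metric storage: one list per field, aligned by row index."""
    name: str
    fields: tuple
    columns: dict = field(init=False)

    def __post_init__(self) -> None:
        self.columns = {"ts": [], **{name: [] for name in self.fields}}

    def write(self, ts: int, **values: float) -> None:
        if set(values) != set(self.fields):
            raise ValueError(f"expected exactly the fields {self.fields}")
        self.columns["ts"].append(ts)
        for name, value in values.items():
            self.columns[name].append(value)

    def total(self, field_name: str) -> float:
        return sum(self.columns[field_name])  # reads only the one column it needs


@dataclass
class Stream:
    """Append-only, time-ordered element storage (e.g. logs or trace segments)."""
    name: str
    timestamps: list = field(default_factory=list)
    elements: list = field(default_factory=list)

    def append(self, ts: int, element: dict) -> None:
        if self.timestamps and ts < self.timestamps[-1]:
            raise ValueError("stream appends must be in time order")
        self.timestamps.append(ts)
        self.elements.append(element)

    def scan(self, start: int, end: int) -> list:
        """Return elements with start <= ts <= end via binary search."""
        lo, hi = bisect_left(self.timestamps, start), bisect_right(self.timestamps, end)
        return self.elements[lo:hi]


@dataclass(frozen=True)
class IndexRule:
    """Secondary index definition: which tag to index."""
    tag: str

    def build(self, stream: Stream) -> dict:
        index = defaultdict(list)
        for pos, element in enumerate(stream.elements):
            index[element.get(self.tag)].append(pos)
        return dict(index)


@dataclass
class Group:
    """Logical namespace holding related measures and streams."""
    name: str
    measures: dict = field(default_factory=dict)
    streams: dict = field(default_factory=dict)


if __name__ == "__main__":
    metrics, traces = Group("sw_metric"), Group("sw_trace")
    cpm = Measure("service_cpm", ("value",))
    metrics.measures[cpm.name] = cpm
    for minute, value in enumerate([120.0, 95.0, 130.0]):
        cpm.write(minute, value=value)
    segments = Stream("segment")
    traces.streams[segments.name] = segments
    segments.append(10, {"service": "checkout", "latency_ms": 40})
    segments.append(20, {"service": "payments", "latency_ms": 310})
    segments.append(30, {"service": "checkout", "latency_ms": 55})
    print(cpm.total("value"))                    # 345.0
    print(segments.scan(15, 30))                 # the last two elements
    print(IndexRule("service").build(segments))  # {'checkout': [0, 2], 'payments': [1]}
```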

BanyanDB vs Elasticsearch

| Dimension | BanyanDB | Elasticsearch |
|---|---|---|
| RAM usage | ~5x less | Baseline |
| Disk usage | ~30% less | Baseline |
| Hot/Warm/Cold | Built-in lifecycle stages | Requires ILM policies |
| Guardrails | Disk-usage thresholds, query memory protectors | External monitoring |
| Purpose | Designed for observability | General-purpose search |

CLI Tool: bydbctl

# List groups
bydbctl group list

# Query a measure
bydbctl measure query --group sw_metric --name service_cpm

# Create an index rule
bydbctl indexrule create -f index-rule.yaml

# Check cluster status
# BanyanDB exposes HTTP UI on port 17913
curl http://banyandb:17913/api/healthz

Virtual Thread Support (JDK 25+)

v10.4.0 adds virtual thread support for JDK 25+:

| Pool | JDK < 25 | JDK 25+ |
|---|---|---|
| gRPC server handlers | Cached platform threads (unbounded) | Virtual threads |
| HTTP blocking handlers | Cached platform threads (max 200) | Virtual threads |
| Total OAP threads | 150+ | ~72 (~50% reduction) |

BatchQueue (Replaces DataCarrier)

v10.4.0 replaces the legacy DataCarrier with BatchQueue:

| Queue | Old Threads | New Threads | Old Buffer Slots | New Buffer Slots |
|---|---|---|---|---|
| L1 Aggregation | 26 | 10 (unified OAL+MAL) | ~12.5M | ~6.6M |
| L2 Persistence | 3 | 4 (unified) | ~1.34M | ~660K |
| TopN Persistence | 4 | 1 | 4K | 4K |
| Total | 36 | 15 | ~13.9M | ~7.3M |
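The general pattern behind a batch queue is a bounded buffer whose consumer drains several items per wake-up, so one thread can serve many producers. The following is a minimal Python sketch of that pattern. It assumes a single consumer thread and a caller-supplied `handler`. The `BatchQueue` class and its `capacity` and `batch_size` parameters are hypothetical and do not reflect SkyWalking's actual BatchQueue code.

```python
import queue
import threading
from typing import Callable, List


class BatchQueue:
    """Bounded buffer drained by one consumer thread in batches (generic sketch)."""

    def __init__(self, capacity: int, batch_size: int, handler: Callable[[List], None]):
        self._buffer: queue.Queue = queue.Queue(maxsize=capacity)
        self._batch_size = batch_size
        self._handler = handler
        self._stopped = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def put(self, item, timeout: float = 1.0) -> None:
        # Blocks while the buffer is full (back-pressure); raises queue.Full on timeout.
        self._buffer.put(item, timeout=timeout)

    def _drain(self, first_wait: float) -> List:
        """Wait up to `first_wait` for one item, then take whatever else is ready."""
        batch: List = []
        try:
            batch.append(self._buffer.get(timeout=first_wait))
            while len(batch) < self._batch_size:
                batch.append(self._buffer.get_nowait())
        except queue.Empty:
            pass
        return batch

    def _run(self) -> None:
        while not self._stopped.is_set():
            batch = self._drain(first_wait=0.05)
            if batch:
                self._handler(batch)

    def close(self) -> None:
        """Stop the worker, then flush anything still buffered on the caller's thread."""
        self._stopped.set()
        self._worker.join()
        while batch := self._drain(first_wait=0):
            self._handler(batch)


if __name__ == "__main__":
    sizes: List[int] = []
    q = BatchQueue(capacity=1000, batch_size=64, handler=lambda b: sizes.append(len(b)))
    for i in range(500):
        q.put(i)
    q.close()
    print(sum(sizes), max(sizes))  # all 500 items delivered; no batch exceeds 64
```

Batching amortizes each wake-up and each downstream call over many items. This is why a small number of consumer threads can keep up with the same input volume.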



Benchmarks

Performance data for GraalVM Distro, BanyanDB vs Elasticsearch, and V2 engine improvements.

GraalVM Distro Benchmarks

Test Environment

| Parameter | Value |
|---|---|
| Hardware | Apple M3 Max |
| OS | macOS, Docker Desktop |
| Resources | 10 CPUs, 62.7 GB RAM |
| Storage | BanyanDB |
| Load | Kind + Istio 1.25.2 + Bookinfo at ~20 RPS |
| Replicas | 2 OAP replicas |
| Samples | 30 at 10 s intervals after a 60 s warm-up |

Boot Test (median of 3 runs)

| Metric | JVM OAP | GraalVM Native | Improvement |
|---|---|---|---|
| Startup time | ~635 ms | ~5 ms | 127x faster |
| Idle memory (RSS) | ~1.2 GiB | ~41 MiB | 97% reduction |

Under Sustained Load (~20 RPS)

| Metric | JVM OAP | GraalVM Native | Improvement |
|---|---|---|---|
| Memory under load | ~2.0 GiB | ~629 MiB | 70% reduction |
| Warm-up required | Yes (JIT) | No | Instant peak throughput |
| Traffic processing | Baseline | Identical CPM | No throughput penalty |
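The Improvement column follows directly from the two measured columns in the boot and sustained-load tables. The quick Python check below assumes 1 GiB = 1024 MiB and uses the approximate values from the tables.

```python
MIB_PER_GIB = 1024


def speedup(before: float, after: float) -> float:
    """How many times shorter `after` is than `before` (for durations)."""
    return before / after


def reduction_pct(before: float, after: float) -> float:
    """Percentage decrease from `before` to `after`."""
    return (1 - after / before) * 100


startup_x = speedup(635, 5)                       # milliseconds
idle_pct = reduction_pct(1.2 * MIB_PER_GIB, 41)   # MiB, idle RSS
load_pct = reduction_pct(2.0 * MIB_PER_GIB, 629)  # MiB, RSS at ~20 RPS
print(f"startup {startup_x:.0f}x, idle RSS -{idle_pct:.1f}%, loaded RSS -{load_pct:.1f}%")
```

It prints `startup 127x, idle RSS -96.7%, loaded RSS -69.3%`. The tables' 97% and 70% figures are therefore rounded approximations of these values.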

Operational Benefits

| Aspect | Impact |
|---|---|
| Pod rescheduling | ~5 ms startup → near-zero telemetry gap |
| Edge/sidecar deployment | ~41 MiB idle RSS makes it practical |
| CVE surface | No JDK runtime → fewer CVEs to patch |
| Cost | ~70% less RAM per pod under the sustained-load benchmark (~20 RPS) |

Source: SkyWalking GraalVM Distro Blog, March 2026.

BanyanDB vs Elasticsearch

Test Conditions

| Parameter | Value |
|---|---|
| BanyanDB version | v0.6.1 |
| Elasticsearch version | 8.13.2 |
| Workload | SkyWalking APM telemetry |

Results

| Metric | BanyanDB relative to Elasticsearch | Advantage |
|---|---|---|
| Memory usage | ~5x less | BanyanDB |
| Disk usage | ~30% less | BanyanDB |
| Disk IOPS | ~5x less | BanyanDB |
| Disk throughput | ~4x less | BanyanDB |
| CPU usage | Slightly higher (compression overhead) | Elasticsearch, slightly |

Source: BanyanDB benchmark blog — official SkyWalking community testing.

V2 Engine Performance (v10.4.0)

MAL V2 (Meter Analysis Language)

| Operation | V1 (Groovy) | V2 (ANTLR4) | Notes |
|---|---|---|---|
| Execution | Baseline | ~6.8x faster | Immutable AST, no ThreadLocal |

LAL V2 (Log Analysis Language)

| Operation | V1 (Groovy) | V2 (ANTLR4) | Notes |
|---|---|---|---|
| Compile | Baseline | ~39x faster | ANTLR4 parse |
| Execute | Baseline | ~2.8x faster | Javassist bytecode |

Thread Reduction

| Pool | V1 Threads | V2 Threads | Reduction |
|---|---|---|---|
| Total OAP threads | 150+ | ~72 | ~50% |
| L1 Aggregation | 26 | 10 | 62% |
| L2 Persistence | 3 | 4 (unified) | None (one thread more) |
| TopN Persistence | 4 | 1 | 75% |

Buffer Slot Reduction

| Queue | V1 Slots | V2 Slots | Reduction |
|---|---|---|---|
| L1 Aggregation | ~12.5M | ~6.6M | 47% |
| L2 Persistence | ~1.34M | ~660K | 51% |
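The Thread Reduction and Buffer Slot Reduction tables give before and after values, and the percentages can be re-derived from them. The short Python check below treats "150+" as 150, so the total-thread figure is a lower-bound estimate of the reduction. An increase is reported as a negative reduction.

```python
# (before, after) pairs taken from the Thread Reduction and Buffer Slot Reduction tables
THREADS = {
    "Total OAP threads": (150, 72),  # "150+" treated as 150
    "L1 Aggregation": (26, 10),
    "L2 Persistence": (3, 4),
    "TopN Persistence": (4, 1),
}
SLOTS = {
    "L1 Aggregation": (12.5e6, 6.6e6),
    "L2 Persistence": (1.34e6, 660e3),
}


def change_pct(before: float, after: float) -> int:
    """Percentage reduction relative to `before`, rounded; negative means an increase."""
    return round((before - after) / before * 100)


for title, table in (("threads", THREADS), ("buffer slots", SLOTS)):
    for name, (before, after) in table.items():
        print(f"{title} / {name}: {change_pct(before, after)}%")
```

The output is 52%, 62%, -33%, and 75% for threads, and 47% and 51% for buffer slots. These match the tables, with the L2 Persistence pool gaining one thread.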

Caveats

  • GraalVM Distro is experimental — current scope: BanyanDB storage, standalone + K8s only
  • BanyanDB benchmarks used v0.6.1; production version is now v0.10.1 with additional optimizations
  • V2 engine speedups measured against Groovy-based V1; production workload impact will vary
