Apache SkyWalking — How It Works

How SkyWalking's OAP server processes telemetry, the new V2 engine architecture, and BanyanDB's purpose-built storage model.

Architecture Overview

flowchart TB
    subgraph Probes["Probes & Agents"]
        JA["Java Agent\n(bytecode injection)"]
        LA["Language Agents\n(.NET, Go, Python, etc.)"]
        ROVER["Rover\n(eBPF network profiling)"]
        OTEL["OTel Collector\n(OTLP receiver)"]
        ENVOY["Envoy ALS\n(access log service)"]
        SAT["Satellite\n(edge proxy)"]
    end

    subgraph OAP["OAP Server"]
        direction TB
        RECV["Receiver Layer\n(gRPC, REST, Kafka)"]
        ANAL["Analysis Core"]
        subgraph DSL["V2 DSL Engines"]
            OAL["OAL V2\n(metric aggregation)"]
            MAL["MAL V2\n(Prometheus → metrics)"]
            LAL["LAL V2\n(log analysis)"]
        end
        ALERT["Alerting Engine"]
        TOPO["Topology Builder"]
        QUERY["MQE Query Engine"]
    end

    subgraph Storage["Storage (Pluggable)"]
        BDB["BanyanDB\n(recommended)"]
        ES["Elasticsearch\n/ OpenSearch"]
        MYSQL["MySQL"]
        PG["PostgreSQL"]
    end

    Probes -->|gRPC/REST| RECV
    RECV --> ANAL
    ANAL --> DSL
    DSL --> Storage
    ALERT --> Storage
    TOPO --> Storage
    QUERY --> Storage
    QUERY --> UI["SkyWalking UI"]

V2 Engine Architecture (v10.4.0)

The v10.4.0 release overhauls the DSL engines, replacing the Groovy-based runtime with ANTLR4 parsers and Javassist bytecode generation.

OAL V2 (Observability Analysis Language)

| Feature | V1 (Groovy) | V2 (ANTLR4 + Javassist) |
|---|---|---|
| AST model | Mutable, Groovy closures | Immutable, type-safe |
| Thread safety | ThreadLocal-dependent | No shared mutable state |
| Error reporting | Runtime exceptions | File, line, column at parse time |
| Testability | Requires parsing | Models constructible without parsing |
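
To make the "immutable, type-safe" and "constructible without parsing" rows concrete, here is a minimal Python sketch of the idea. It is not the OAL V2 implementation (which is Java); the class names, the `validate` helper, and the function list are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SourceLocation:
    """Where a node came from, so errors can cite file, line, and column."""
    file: str
    line: int
    column: int


@dataclass(frozen=True)
class MetricDefinition:
    """Hypothetical immutable AST node for a rule of the form name = from(source).function()."""
    name: str
    source: str
    function: str
    filters: Tuple[str, ...] = ()
    location: Optional[SourceLocation] = None


class RuleError(ValueError):
    """Validation failure that carries the source position when one is known."""


# Illustrative subset only; this is an assumption, not the engine's real function list.
KNOWN_FUNCTIONS = {"cpm", "sum", "count", "percentile"}


def validate(node: MetricDefinition) -> MetricDefinition:
    """Reject unknown functions up front instead of failing later at runtime."""
    if node.function not in KNOWN_FUNCTIONS:
        loc = node.location
        where = f"{loc.file}:{loc.line}:{loc.column}: " if loc else ""
        raise RuleError(f"{where}unknown function '{node.function}' in metric '{node.name}'")
    return node


# Built directly in code: no parser is involved, which keeps unit tests cheap.
service_cpm = validate(MetricDefinition(name="service_cpm", source="Service", function="cpm"))
```

Because the dataclasses are frozen, a node can be shared between threads without locks or ThreadLocal state: any attempt to assign to `service_cpm.name` raises an error instead of mutating shared data.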

MAL V2 (Meter Analysis Language)

Converts Prometheus metrics into SkyWalking's internal metric model:

  • Speedup: ~6.8x faster execution vs Groovy V1
  • Compile-time validation: Syntax errors caught at startup
  • Immutable AST: Thread-safe without ThreadLocal
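
As a rough illustration of what "converting Prometheus metrics" involves, the Python sketch below groups Prometheus-style samples by a subset of their labels and sums the values, which is the kind of aggregation a MAL rule typically expresses. The `Sample` type, `make_sample`, and `sum_by` are hypothetical names for this example, not part of MAL.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Sample:
    """One Prometheus-style sample: metric name, label set, value."""
    name: str
    labels: Tuple[Tuple[str, str], ...]  # sorted (key, value) pairs keep the sample hashable
    value: float


def make_sample(name: str, value: float, **labels: str) -> Sample:
    return Sample(name, tuple(sorted(labels.items())), value)


def sum_by(samples: Iterable[Sample], keep: List[str]) -> Dict[Tuple[str, ...], float]:
    """Sum sample values, grouping by the labels in `keep` and dropping the rest."""
    totals: Dict[Tuple[str, ...], float] = defaultdict(float)
    for s in samples:
        labels = dict(s.labels)
        key = tuple(labels.get(k, "") for k in keep)
        totals[key] += s.value
    return dict(totals)


scraped = [
    make_sample("http_requests_total", 120.0, service="checkout", instance="a", code="200"),
    make_sample("http_requests_total", 80.0, service="checkout", instance="b", code="200"),
    make_sample("http_requests_total", 5.0, service="cart", instance="a", code="500"),
]

per_service = sum_by(scraped, keep=["service"])
print(per_service)  # {('checkout',): 200.0, ('cart',): 5.0}
```

Dropping the `instance` and `code` labels collapses per-instance samples into one value per service, which is the shape a service-level metric needs.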

LAL V2 (Log Analysis Language)

Processes log streams for extraction, filtering, and routing:

  • Compile: ~39x faster than Groovy V1
  • Execute: ~2.8x faster
  • Breaking change: the slowSql {} and sampledTrace {} sub-DSLs are replaced by the outputType mechanism
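
The extract, filter, and route stages can be sketched in a few lines of Python. This is a toy pipeline under stated assumptions: the log line shape, the regular expressions, and the `output_type` field are invented for the example and only mirror the idea of tagging a record with an output type rather than nesting a dedicated sub-DSL.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical log shape for this sketch: "<LEVEL> <service> <message>"
LINE_PATTERN = re.compile(r"^(?P<level>[A-Z]+)\s+(?P<service>\S+)\s+(?P<message>.*)$")
SLOW_SQL_PATTERN = re.compile(r"slow query \((?P<ms>\d+)ms\)")


@dataclass
class Record:
    level: str
    service: str
    message: str
    output_type: str = "log"  # routing tag; the name mirrors the idea, not the real API
    tags: Dict[str, str] = field(default_factory=dict)


def extract(line: str) -> Optional[Record]:
    """Parse a raw line into a structured record, or None if it does not match."""
    m = LINE_PATTERN.match(line)
    return Record(**m.groupdict()) if m else None


def keep(record: Record) -> bool:
    """Filter stage: drop debug noise."""
    return record.level != "DEBUG"


def classify(record: Record) -> Record:
    """Routing stage: tag slow-SQL records with a different output type."""
    m = SLOW_SQL_PATTERN.search(record.message)
    if m:
        record.output_type = "slow_sql"
        record.tags["latency_ms"] = m.group("ms")
    return record


def process(lines: List[str]) -> Dict[str, List[Record]]:
    routed: Dict[str, List[Record]] = {}
    for line in lines:
        record = extract(line)
        if record is None or not keep(record):
            continue
        record = classify(record)
        routed.setdefault(record.output_type, []).append(record)
    return routed


routed = process([
    "INFO checkout order placed",
    "DEBUG checkout cache hit",
    "WARN db slow query (1530ms) SELECT * FROM orders",
    "not a structured line",
])
```

Routing on a single tag keeps the pipeline flat: adding a new destination means adding a classifier, not a new nested block.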

BanyanDB

BanyanDB is SkyWalking's purpose-built observability database. It combines columnar and time-series storage.

Architecture

flowchart LR
    subgraph BanyanDB["BanyanDB Cluster"]
        Liaison["Liaison Node\n(query routing)"]
        Data1["Data Node 1\n(shard owner)"]
        Data2["Data Node 2\n(shard owner)"]
        DataN["Data Node N"]
    end

    OAP["OAP Server"] -->|gRPC| Liaison
    Liaison --> Data1
    Liaison --> Data2
    Liaison --> DataN
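
The diagram labels the liaison as the routing tier and the data nodes as shard owners. The Python sketch below shows one way such routing can work. Hash-based placement and round-robin shard assignment are assumptions made for illustration; the source does not describe BanyanDB's actual placement algorithm, and all names here are hypothetical.

```python
import hashlib
from typing import Dict, List


def shard_for(series_key: str, shard_count: int) -> int:
    """Map a series key to a shard with a stable hash (Python's hash() is salted per process)."""
    digest = hashlib.sha256(series_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


class Liaison:
    """Toy routing tier: knows which data node owns each shard."""

    def __init__(self, data_nodes: List[str], shard_count: int) -> None:
        if not data_nodes or shard_count <= 0:
            raise ValueError("need at least one data node and one shard")
        self.shard_count = shard_count
        # Assumption: shards are spread round-robin across data nodes.
        self.shard_owner: Dict[int, str] = {
            shard: data_nodes[shard % len(data_nodes)] for shard in range(shard_count)
        }

    def route_write(self, series_key: str) -> str:
        """A write goes to the single node that owns the key's shard."""
        return self.shard_owner[shard_for(series_key, self.shard_count)]

    def route_query(self) -> List[str]:
        """A query without a series key fans out to every node that owns a shard."""
        return sorted(set(self.shard_owner.values()))


liaison = Liaison(["data-1", "data-2", "data-3"], shard_count=6)
target = liaison.route_write("service=checkout")
```

Keeping the shard map in the liaison means the OAP server only needs one gRPC endpoint, and data nodes can be added without the client knowing about them.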

Storage Model

| Concept | Description |
|---|---|
| Group | Logical namespace (e.g., sw_metric, sw_trace) |
| Measure | Metric storage: columnar format optimized for aggregation |
| Stream | Log/trace storage: append-only, time-ordered |
| IndexRule | Secondary index definitions for query acceleration |
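
The difference between a Measure, a Stream, and an IndexRule is easier to see in a toy model. The Python sketch below is an in-memory analogy only; the class names and methods are hypothetical and say nothing about BanyanDB's on-disk format or API.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class ToyMeasure:
    """Columnar layout: one list per field, so an aggregation scans a single column."""

    def __init__(self, field_names: List[str]) -> None:
        self.timestamps: List[int] = []
        self.columns: Dict[str, List[float]] = {name: [] for name in field_names}

    def write(self, ts: int, **fields: float) -> None:
        if set(fields) != set(self.columns):
            raise ValueError("every write must supply exactly the declared fields")
        self.timestamps.append(ts)
        for name, value in fields.items():
            self.columns[name].append(value)

    def average(self, field_name: str) -> float:
        column = self.columns[field_name]
        return sum(column) / len(column)


class ToyStream:
    """Append-only, time-ordered elements plus a secondary index on one tag."""

    def __init__(self, indexed_tag: str) -> None:
        self.indexed_tag = indexed_tag
        self.elements: List[Tuple[int, Dict[str, str]]] = []
        self.index: Dict[str, List[int]] = defaultdict(list)  # tag value -> positions

    def append(self, ts: int, tags: Dict[str, str]) -> None:
        if self.elements and ts < self.elements[-1][0]:
            raise ValueError("elements must arrive in time order in this toy model")
        position = len(self.elements)
        self.elements.append((ts, tags))
        if self.indexed_tag in tags:
            self.index[tags[self.indexed_tag]].append(position)

    def find(self, tag_value: str) -> List[Tuple[int, Dict[str, str]]]:
        """Look up by indexed tag without scanning every element."""
        return [self.elements[p] for p in self.index.get(tag_value, [])]


cpm = ToyMeasure(["value"])
cpm.write(1000, value=120.0)
cpm.write(1060, value=80.0)

traces = ToyStream(indexed_tag="trace_id")
traces.append(1000, {"trace_id": "t1", "service": "checkout"})
traces.append(1005, {"trace_id": "t2", "service": "cart"})
traces.append(1010, {"trace_id": "t1", "service": "payment"})
```

Here `cpm.average("value")` touches only the `value` column, while `traces.find("t1")` uses the index to jump straight to the two matching elements.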

BanyanDB vs Elasticsearch

| Dimension | BanyanDB | Elasticsearch |
|---|---|---|
| RAM usage | ~5x less | Baseline |
| Disk usage | ~30% less | Baseline |
| Hot/Warm/Cold | Built-in lifecycle stages | Requires ILM policies |
| Guardrails | Disk-usage thresholds, query memory protectors | External monitoring |
| Purpose | Designed for observability | General-purpose search |

CLI Tool: bydbctl

# List groups
bydbctl group list

# Query a measure
bydbctl measure query --group sw_metric --name service_cpm

# Create an index rule
bydbctl indexrule create -f index-rule.yaml

# Check node health over HTTP
# (BanyanDB's HTTP server, which also serves the UI, listens on port 17913)
curl http://banyandb:17913/api/healthz

Virtual Thread Support (JDK 25+)

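v10.4.0 adds virtual thread support when the OAP server runs on JDK 25 or later. Handler pools that previously used platform threads switch to virtual threads: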

| Pool | JDK < 25 | JDK 25+ |
|---|---|---|
| gRPC server handlers | Cached platform (unbounded) | Virtual threads |
| HTTP blocking handlers | Cached platform (max 200) | Virtual threads |
| Total OAP threads | 150+ | ~72 (~50% reduction) |

BatchQueue (Replaces DataCarrier)

v10.4.0 replaces the legacy DataCarrier with BatchQueue, which unifies the OAL and MAL queues and reduces both thread count and buffer slots:

| Queue | Old Threads | New Threads | Old Buffer Slots | New Buffer Slots |
|---|---|---|---|---|
| L1 Aggregation | 26 | 10 (unified OAL+MAL) | ~12.5M | ~6.6M |
| L2 Persistence | 3 | 4 (unified) | ~1.34M | ~660K |
| TopN Persistence | 4 | 1 | 4K | 4K |
| Total | 36 | 15 | ~13.9M | ~7.3M |
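
The general pattern behind a batch queue, a bounded buffer drained in batches by a small fixed set of threads, can be sketched in Python as follows. This illustrates the pattern only and is not SkyWalking's BatchQueue; the class name, parameters, and shutdown protocol are assumptions made for the example.

```python
import queue
import threading
from typing import Any, Callable, List


class ToyBatchQueue:
    """Bounded buffer drained in batches by a small, fixed set of worker threads."""

    def __init__(self, capacity: int, batch_size: int, workers: int,
                 handler: Callable[[List[Any]], None]) -> None:
        self._buffer: "queue.Queue[Any]" = queue.Queue(maxsize=capacity)
        self._batch_size = batch_size
        self._handler = handler
        self._stop = object()  # sentinel telling one worker to exit
        self._threads = [threading.Thread(target=self._drain, daemon=True)
                         for _ in range(workers)]
        for t in self._threads:
            t.start()

    def offer(self, item: Any) -> bool:
        """Non-blocking produce; returns False when the buffer is full."""
        try:
            self._buffer.put_nowait(item)
            return True
        except queue.Full:
            return False

    def _drain(self) -> None:
        while True:
            item = self._buffer.get()  # block until at least one item arrives
            if item is self._stop:
                return
            batch, saw_stop = [item], False
            while len(batch) < self._batch_size:
                try:
                    nxt = self._buffer.get_nowait()
                except queue.Empty:
                    break
                if nxt is self._stop:
                    saw_stop = True
                    break
                batch.append(nxt)
            self._handler(batch)  # one handler call per batch, not per item
            if saw_stop:
                return

    def close(self) -> None:
        """Flush remaining items, then stop every worker."""
        for _ in self._threads:
            self._buffer.put(self._stop)
        for t in self._threads:
            t.join()


batches: List[List[int]] = []
lock = threading.Lock()


def record(batch: List[int]) -> None:
    with lock:
        batches.append(batch)


q = ToyBatchQueue(capacity=1000, batch_size=16, workers=2, handler=record)
accepted = [q.offer(i) for i in range(50)]
q.close()
```

Handling items in batches amortizes per-item overhead, which is one reason a design like this can serve the same load with fewer threads and smaller buffers.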
