
Architecture of an LLM Wiki

The architecture of an LLM Wiki is fundamentally different from a standard Retrieval-Augmented Generation (RAG) pipeline. While RAG operates as a stateless lookup mechanism attached to a generic LLM, the LLM Wiki pattern establishes a stateful, hierarchical system where the LLM acts as an active maintainer of an intermediate knowledge graph.

This document breaks down the structural components, data flows, and performance considerations of the LLM Wiki architecture.

Component Breakdown

The architecture is built upon three primary layers:

1. The Raw Sources Layer

This is the foundational layer, consisting of immutable, curated documents.

- Raw Files (raw/): A directory containing the actual source material. This includes PDFs, raw markdown clippings (e.g., from Obsidian Web Clipper), meeting transcripts, and raw data files.
- Media Assets (raw/assets/): Downloaded images, charts, and media referenced by the raw files. Keeping these local ensures that external URLs do not break and allows vision-capable LLMs to process them securely.
- Immutability Contract: The LLM is granted read-only access to this layer. It must never modify, overwrite with a summary, or delete a raw source. This ensures the ground truth is always preserved.
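
The immutability contract can be checked mechanically around an agent run. The following is a minimal sketch, not part of the pattern itself: it assumes raw sources live in a local directory, and the helper names `snapshot` and `violations` are hypothetical. It hashes every raw file before and after a run and reports any file that was modified or deleted, while allowing new files to appear.

```python
import hashlib
from pathlib import Path


def snapshot(raw_dir: Path) -> dict[str, str]:
    """Map each file under raw_dir to the SHA-256 digest of its contents."""
    return {
        p.relative_to(raw_dir).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(raw_dir.rglob("*"))
        if p.is_file()
    }


def violations(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """Return raw files that were modified or deleted since `before`.

    Files that only exist in `after` are new sources and are allowed.
    """
    return sorted(
        name for name, digest in before.items() if after.get(name) != digest
    )
```

Taking a snapshot before the agent starts and comparing it afterwards gives a cheap way to fail a run that touched the raw layer.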

2. The Wiki Layer

This is the "compiled" persistence layer owned by the LLM. It acts as the synthesized interface between the raw data and the user.

- Topic Hubs (index.md files): Directory-level landing pages that summarize a specific domain, link to child pages, and outline outstanding questions.
- Entity & Concept Pages: Dedicated markdown files representing specific people, products, architectural patterns, or ideas (e.g., kubernetes.md or andrej-karpathy.md).
- Global Index (index.md / wiki-index.md): The master catalog of the entire wiki. It lists every page, a one-line summary, and metadata (like source count).
- Audit Log (log.md): A chronological, append-only record of every action the LLM has taken (ingests, queries, lint passes), allowing for system rollback and state tracking.
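
To make the Global Index concrete, here is a small sketch of how its entries might be rendered. The line format (wikilink, one-line summary, source count in parentheses), the `PageEntry` type, and the `render_index` helper are all assumptions for illustration; the pattern only requires that each page appears with a summary and metadata. The summaries in the usage are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageEntry:
    path: str          # wiki-relative path, e.g. "kubernetes.md"
    summary: str       # one-line summary shown in the index
    source_count: int  # number of raw sources feeding the page


def render_index(entries: list[PageEntry]) -> str:
    """Render a flat, alphabetically sorted index (hypothetical format)."""
    lines = ["# Wiki Index", ""]
    for entry in sorted(entries, key=lambda e: e.path):
        slug = entry.path.removesuffix(".md")
        lines.append(
            f"- [[{slug}]]: {entry.summary} (sources: {entry.source_count})"
        )
    return "\n".join(lines) + "\n"
```

Sorting by path keeps the file stable between runs, so each ingest produces a small, reviewable diff.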

3. The Schema / Control Plane

This layer dictates how the LLM interacts with the Wiki layer.

- Instruction Schema (CLAUDE.md, AGENTS.md): The configuration file containing the rules of engagement. It defines the folder shapes, required markdown conventions (e.g., MkDocs admonitions), linking rules, and anti-hallucination guardrails.
- Agent Runtime (Claude Code / Codex): The execution engine that processes tasks according to the schema.
- Local Search Engine (qmd): A component that is optional at small scale but critical for scaling. It provides BM25 and vector search capabilities to the Agent Runtime via the Model Context Protocol (MCP), so the agent does not have to load the whole wiki into its context window.

How It Works: The Ingestion Lifecycle

The defining feature of the LLM Wiki is the ingestion process: the act of "compiling" raw sources into the knowledge graph.

Ingestion Data Flow

When a new file is dropped into the Raw Sources layer, the LLM initiates the ingestion workflow:

sequenceDiagram
    participant User
    participant Agent as LLM Agent (Claude Code)
    participant Raw as Raw Sources Layer
    participant Wiki as Wiki Layer
    participant Search as Search Engine (qmd)

    User->>Raw: Adds `new-article.md`
    User->>Agent: "Ingest this source"
    Agent->>Raw: Reads `new-article.md`
    Agent->>Search: Queries existing concepts related to article
    Search-->>Agent: Returns relevant `entity.md` paths

    rect rgb(30, 40, 50)
        Note over Agent,Wiki: The Synthesis Phase
        Agent->>Wiki: Creates `ref-new-article.md` (Provenance)
        Agent->>Wiki: Updates `entity.md` (Integrates new facts)
        Agent->>Wiki: Flags contradictions in `entity.md` (if any)
    end

    Agent->>Wiki: Updates Global `index.md`
    Agent->>Wiki: Appends entry to `log.md`
    Agent-->>User: "Ingestion complete. Updated 3 pages."

1. Source Reading and Fact Extraction

The LLM parses the raw markdown. If images are present, it invokes vision capabilities to extract diagrams or charts. It identifies key entities, claims, and architectural patterns.

2. Context Retrieval

Before writing, the LLM reads the Global index.md (or uses qmd) to determine if pages already exist for the identified entities.
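
The index lookup in this step can be sketched as follows. This is an illustrative approximation that assumes the index references pages with `[[slug]]` wikilinks and that page slugs are lowercase, hyphen-separated names; `slugify` and `split_known_unknown` are hypothetical helpers, and a real agent would typically make this decision by reading the index itself.

```python
import re

# Captures the target of a wikilink, ignoring any "|alias" or "#anchor" part.
WIKILINK = re.compile(r"\[\[([^\]|#]+)")


def slugify(name: str) -> str:
    """Turn an entity name into a lowercase, hyphen-separated slug."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def split_known_unknown(
    index_text: str, entities: list[str]
) -> tuple[list[str], list[str]]:
    """Split entities into those with an existing page and those without."""
    known_slugs = {target.strip().lower() for target in WIKILINK.findall(index_text)}
    known: list[str] = []
    unknown: list[str] = []
    for entity in entities:
        (known if slugify(entity) in known_slugs else unknown).append(entity)
    return known, unknown
```

Entities in the first list lead to page updates during synthesis; entities in the second list are candidates for new pages.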

3. Incremental Synthesis

This is the core "compilation" step. The LLM does not just copy the source. It actively merges the new facts into existing pages.

- If the new source states that a tool has a new feature, the LLM appends it to that tool's architecture.md.
- If the new source contradicts an existing claim, the LLM explicitly documents the contradiction (e.g., using a > [!WARNING] admonition).
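
The two outcomes above differ only in what gets written to the page. The sketch below shows that write step under stated assumptions: deciding whether a claim is new or contradictory is left to the LLM, and the function name `integrate_claim`, the bullet format, and the wording inside the admonition are hypothetical.

```python
from typing import Optional


def integrate_claim(
    page: str, claim: str, source: str, contradicts: Optional[str] = None
) -> str:
    """Append a claim to a page, or flag it as a contradiction.

    `source` is the slug of the provenance page (e.g. "ref-new-article").
    `contradicts` is the existing statement the claim conflicts with, if any.
    Existing page content is never rewritten, only extended.
    """
    body = page.rstrip("\n")
    if contradicts is None:
        addition = f"- {claim} ([[{source}]])"
    else:
        addition = "\n".join(
            [
                "> [!WARNING]",
                f'> Contradiction: [[{source}]] states "{claim}",',
                f'> but this page states "{contradicts}".',
            ]
        )
    return f"{body}\n\n{addition}\n"
```

Keeping the old statement in place next to the warning preserves both claims with their provenance, so a later lint pass or a human can resolve the conflict.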

4. Bookkeeping

The LLM ensures the graph remains navigable. It adds bidirectional wikilinks, updates the YAML frontmatter (e.g., last_checked: 2026-07-01), and writes a timestamped summary of its actions to log.md.
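
The frontmatter and log updates are simple text operations. The following sketch uses only the standard library and assumes a leading `---` delimited frontmatter block with one `key: value` pair per line; the helper names and the log line format are hypothetical.

```python
import re
from datetime import date, datetime

# Matches a leading frontmatter block and captures the lines between the fences.
FRONTMATTER = re.compile(r"\A---\n(.*?)^---\n", re.DOTALL | re.MULTILINE)


def set_last_checked(page: str, day: date) -> str:
    """Set `last_checked` in the frontmatter, creating the block if absent."""
    stamp = f"last_checked: {day.isoformat()}"
    match = FRONTMATTER.match(page)
    if match is None:
        return f"---\n{stamp}\n---\n{page}"
    lines = match.group(1).splitlines()
    replaced = False
    for i, line in enumerate(lines):
        if line.startswith("last_checked:"):
            lines[i] = stamp
            replaced = True
    if not replaced:
        lines.append(stamp)
    front = "\n".join(lines)
    return f"---\n{front}\n---\n{page[match.end():]}"


def log_entry(when: datetime, action: str, detail: str) -> str:
    """Format one append-only line for log.md (hypothetical format)."""
    return f"- {when.isoformat(timespec='seconds')} {action}: {detail}\n"
```

Writing the result of `log_entry` with the file opened in append mode (`open(path, "a")`) keeps log.md append-only at the file level.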

System Architecture: RAG vs. LLM Wiki

Understanding the LLM Wiki requires contrasting it with standard stateless RAG.

flowchart TD
    subgraph "Stateless RAG Pattern"
        R1[Raw Document] --> R2[(Vector DB)]
        R3[User Query] --> R4[Similarity Search]
        R2 -.-> R4
        R4 --> R5[LLM Context Window]
        R5 --> R6[Ephemeral Answer]
    end

    subgraph "LLM Wiki Pattern (Stateful)"
        W1[Raw Document] --> W2[LLM Agent]
        W2 -- Ingests & Synthesizes --> W3[(Markdown Wiki)]
        W3 -- Cross-References --> W3
        W4[User Query] --> W5[LLM Agent]
        W5 -- Reads Compiled Wiki --> W3
        W3 -.-> W5
        W5 --> W6[Answer / New Wiki Page]
        W6 -- Files Back --> W3
    end

In the stateless pattern, the LLM must perform the reasoning and synthesis at query time, working from fragmented and sometimes conflicting chunks returned by a similarity search that has no notion of how those chunks relate. In the LLM Wiki pattern, the reasoning and synthesis are performed at ingest time. The search step (if used) retrieves structured, pre-synthesized markdown, which reduces both the work left for query time and the opportunity for hallucination.

Scalability and Benchmarks

The primary constraint on an LLM Wiki is not the size of the raw data, but the LLM's ability to navigate the compiled markdown structure.

1. Small Scale (0 - 100 Sources, ~200 Pages)

At this scale, the architecture relies purely on the Global index.md and log.md.

- Mechanism: The LLM reads the index (which contains a one-line summary of every page), decides which 3-5 pages are relevant to the query, and reads them directly.
- Performance: Highly efficient. Token usage is minimal because the index file remains small (approx. 5,000 - 10,000 tokens). File read I/O adds negligible latency on local SSDs.
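
A rough way to tell when a wiki is outgrowing this tier is to estimate the token size of the index. The sketch below is a heuristic only: the 4-characters-per-token ratio is a common approximation rather than a real tokenizer, and the 10,000-token budget simply reuses the upper figure quoted above. Both are assumptions, as are the function names.

```python
def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Approximate token count from character count (heuristic, not a tokenizer)."""
    return int(len(text) / chars_per_token)


def index_strategy(index_text: str, budget_tokens: int = 10_000) -> str:
    """Suggest 'flat' while the index fits the budget, else 'hierarchical'."""
    if estimate_tokens(index_text) <= budget_tokens:
        return "flat"
    return "hierarchical"
```

For exact numbers, the estimate would need to be replaced with the tokenizer of the model actually in use.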

2. Medium Scale (100 - 1,000 Sources, ~2,000 Pages)

The global index becomes too large for efficient context-window usage.

- Mechanism: Introduction of hierarchical indices. The domain landing pages (e.g., knowledge/databases/index.md) act as routing nodes. The LLM reads the root index, jumps to the domain index, and then to the specific page.
- Performance: Slower due to sequential tool calls (read root -> read domain -> read page). Token usage increases slightly.
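
The root -> domain -> page traversal can be sketched as a short loop. This is a simplified model, not the agent's actual implementation: it assumes index pages use standard markdown links with wiki-root-relative paths, and the `choose` callback stands in for the LLM's decision about which link to follow. All names here are hypothetical.

```python
import re
from typing import Callable

# Captures the target path of a markdown link: [text](path)
LINK_TARGET = re.compile(r"\]\(([^)]+)\)")


def traverse(
    read: Callable[[str], str],
    choose: Callable[[str, list[str]], str],
    root: str = "index.md",
    max_hops: int = 3,
) -> list[str]:
    """Follow index pages from `root` until a non-index page is reached.

    Returns every path read along the way, in order. Each entry corresponds
    to one sequential tool call in the agent.
    """
    path = root
    visited = [root]
    for _ in range(max_hops):
        if not path.endswith("index.md"):
            break  # reached a content page
        links = LINK_TARGET.findall(read(path))
        if not links:
            break  # dead-end index
        path = choose(path, links)
        visited.append(path)
    return visited
```

The length of the returned list makes the performance cost visible: every extra level of hierarchy adds one more sequential read before the agent sees the target page.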

3. Large Scale (1,000+ Sources, 5,000+ Pages)

Hierarchical traversal becomes brittle and slow. The architecture requires a dedicated retrieval engine.

- Mechanism: Integration of tools like qmd (Query Markup Documents). The LLM drops manual traversal and issues MCP queries to qmd.
- qmd Internals: qmd utilizes a hybrid BM25 (keyword) and Vector (semantic) pipeline, processed entirely on-device via GGUF models. It returns the top-K relevant markdown files directly to the agent.
- Performance: Query time stabilizes. The LLM only processes the highly relevant files returned by qmd. The local reranker ensures that context windows are not flooded with tangential data.
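
To illustrate what "hybrid" retrieval means in general, the sketch below merges a keyword ranking and a semantic ranking with reciprocal rank fusion (RRF). This is a generic stand-in chosen for illustration: how qmd actually combines and reranks its BM25 and vector results is not described here, and the function name, the constant `k = 60`, and the input rankings are assumptions.

```python
def reciprocal_rank_fusion(
    rankings: list[list[str]], k: int = 60, top_k: int = 5
) -> list[str]:
    """Merge several ranked lists of document paths into one top-K list.

    Each document scores 1 / (k + rank) per list it appears in, so documents
    ranked well by both keyword and semantic search rise to the top.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    # Sort by descending score; break ties by path for deterministic output.
    return sorted(scores, key=lambda doc: (-scores[doc], doc))[:top_k]
```

Because RRF works on ranks rather than raw scores, it needs no calibration between BM25 scores and vector similarities, which is one reason it is a common default for hybrid search.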