Operating an LLM Wiki

Operating an LLM Wiki requires treating the knowledge base as a codebase and the LLM as your primary engineer. As a human, your role shifts from manual data entry and cross-referencing to curation, orchestration, and review.

This document outlines the standard operating procedures, daily workflows, and tool configurations required to keep an LLM Wiki healthy.

The Three Core Workflows

1. Ingestion

The Ingestion phase is triggered when you add new raw material. It is best to process sources individually or in small batches to maintain high synthesis quality.

Best Practices:
- Place raw files in a dedicated 00 inbox/ or raw/ directory.
- Invoke the agent and explicitly tell it the domain it should focus on.
- Review the agent's summary and specific file modifications before moving the raw source to an archive or permanent sources/ folder.
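The review-then-archive step can be sketched as a small shell helper. The function name archive_source and the default sources/ destination are assumptions for illustration, not part of any tool; adjust them to your vault layout.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move a reviewed raw file into the permanent sources/ folder.
# Usage: archive_source <raw-file> [sources-dir]
archive_source() {
  local src="$1"
  local dest_dir="${2:-sources}"

  # Refuse to continue if the raw file is missing.
  if [ ! -f "$src" ]; then
    echo "archive_source: no such file: $src" >&2
    return 1
  fi

  mkdir -p "$dest_dir"

  # -n: never overwrite a source archived earlier under the same name.
  mv -n "$src" "$dest_dir/"

  # If the file is still in place, the destination already existed.
  if [ -e "$src" ]; then
    echo "archive_source: $dest_dir/$(basename "$src") already exists" >&2
    return 1
  fi

  # Print the new location so it can be logged or committed.
  echo "$dest_dir/$(basename "$src")"
}
```

Run it only after you have reviewed the agent's changes, for example: archive_source raw/article.md sources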

> [!TIP]
> Use the Obsidian Web Clipper to save articles as Markdown. Configure Obsidian to save all image attachments to a central raw/assets/ folder to prevent broken links.

2. Querying

Queries are how you extract value from the wiki. Instead of asking the agent to search the web, you ask it to search the local wiki.

Best Practices:
- Force the agent to cite its sources using [[wikilinks]].
- If the agent synthesizes a particularly valuable insight (e.g., a comparison matrix between two tools), instruct it to save that response as a permanent page in the wiki. This ensures that analytical work is captured and compounds over time.
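Both practices can be combined in one guard: only save an answer as a permanent page if it actually cites the wiki. The sketch below is illustrative; the function name save_answer, the frontmatter fields, and the target path are all assumptions rather than a fixed schema.

```shell
#!/usr/bin/env bash
# Hypothetical helper: save an agent answer (read from stdin) as a permanent page.
# The frontmatter fields below are assumptions, not a required schema.
# Usage: some_command | save_answer "<title>" <output-file>
save_answer() {
  local title="$1"
  local out_file="$2"
  local body
  body="$(cat)"

  # Reject answers that contain no [[wikilink]] citation at all.
  if ! printf '%s\n' "$body" | grep -qE '\[\[[^]]+\]\]'; then
    echo "save_answer: no [[wikilinks]] found, refusing to save" >&2
    return 1
  fi

  mkdir -p "$(dirname "$out_file")"
  {
    printf -- '---\n'
    printf 'title: "%s"\n' "$title"
    printf 'created: %s\n' "$(date +%Y-%m-%d)"
    printf -- '---\n\n'
    printf '%s\n' "$body"
  } > "$out_file"
}
```

The check only proves that a citation exists, not that it is correct, so a human review of the saved page is still needed.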

3. Linting

Just like code, knowledge rots. The wiki requires periodic maintenance passes to remain coherent.

Best Practices:
- Schedule a weekly "lint pass" where the agent scans the wiki for issues.
- Instruct the agent to look for:
  - Orphan pages (no inbound links).
  - Stale claims that have been superseded by newer ingestions.
  - Structural drift (files placed outside of designated domain folders).
  - Missing metadata (missing YAML frontmatter).
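Two of these checks, orphan pages and missing frontmatter, are mechanical enough to run without the agent. The following is a minimal sketch, assuming pages are .md files under a knowledge/ root and that links use the page's file name, as in [[Page]], [[Page|alias]], or [[Page#heading]]. The function names are hypothetical, and path-style links such as [[folder/Page]] are not handled.

```shell
#!/usr/bin/env bash
# Hypothetical lint helpers; "knowledge" is an assumed wiki root.

# Print pages whose file name never appears as a wikilink target.
# Simplification: a page that links only to itself still counts as linked.
find_orphans() {
  local root="${1:-knowledge}"
  local targets page name
  # Collect every link target once, cutting at "]", "|" or "#".
  targets="$(grep -rhoE --include='*.md' '\[\[[^]|#]+' "$root" | sed 's/^\[\[//' | sort -u)"
  find "$root" -type f -name '*.md' | sort | while IFS= read -r page; do
    name="$(basename "$page" .md)"
    # -x: whole-line match, -F: literal string (names may contain dots).
    if ! printf '%s\n' "$targets" | grep -qxF -- "$name"; then
      echo "$page"
    fi
  done
}

# Print pages that do not open with a YAML frontmatter delimiter.
find_missing_frontmatter() {
  local root="${1:-knowledge}"
  local page
  find "$root" -type f -name '*.md' | sort | while IFS= read -r page; do
    # Frontmatter must start with '---' on the very first line.
    if [ "$(head -n 1 "$page")" != "---" ]; then
      echo "$page"
    fi
  done
}
```

Stale claims and structural drift need judgment about content, so those checks are better left to the agent's lint pass.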

Commands & Recipes

Because the wiki is stored as plain-text markdown, you can leverage standard developer tools to manage and query the knowledge base.

Managing the Log File

The log.md file tracks all agent actions chronologically. If the agent starts every entry with a consistent prefix (the commands below assume a heading that begins with ## [), you can parse recent actions with standard Unix tools.

# View the last 5 actions taken by the LLM agent
grep "^## \[" log.md | tail -5

# Find all times the agent ingested sources related to "Docker"
grep -A 2 -i "ingest.*docker" log.md
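These grep recipes only work if every entry is written in the same shape. As a sketch of what that could look like, the helper below appends entries as a heading of the form ## [YYYY-MM-DD] action | title. That exact layout and the function names are assumptions chosen to match the grep pattern above, not a format the source prescribes.

```shell
#!/usr/bin/env bash
# Hypothetical helper: append one entry to the log in a grep-friendly format.
# Assumed entry layout:  ## [YYYY-MM-DD] <action> | <title>
log_action() {
  local action="$1"
  local title="$2"
  local log_file="${3:-log.md}"
  # The heading prefix "## [" is what the pattern "^## \[" matches.
  printf '## [%s] %s | %s\n\n' "$(date +%Y-%m-%d)" "$action" "$title" >> "$log_file"
}

# Print the headings of the last N entries (default 5).
recent_actions() {
  local count="${1:-5}"
  local log_file="${2:-log.md}"
  grep '^## \[' "$log_file" | tail -n "$count"
}
```

For example, log_action ingest "Docker networking guide" adds an entry that both grep commands above would find.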

Git Operations

An LLM Wiki is fundamentally a git repository of markdown files. This provides version control, branching (for experimental research), and collaboration out of the box.

# Check what the agent modified during its last ingestion run
git status

# Review the specific changes the agent made to existing concept pages
git diff knowledge/

# Commit a successful ingestion session
git add knowledge/
git commit -m "chore(knowledge): ingest Karpathy LLM Wiki concepts and update indices"
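Branching for experimental research, mentioned above, might look like the sketch below. The research/ branch prefix and the function name start_experiment are assumed conventions for illustration only.

```shell
#!/usr/bin/env bash
# Hypothetical helper: isolate an experimental ingestion on its own branch.
# The "research/" branch prefix is an assumed naming convention.
start_experiment() {
  local topic="$1"
  git checkout -b "research/$topic"
}

# Typical flow, run from the wiki's repository root:
#   start_experiment vector-databases   # branch off before invoking the agent
#   git diff knowledge/                 # review what the agent changed
#   git checkout -                      # return to the previous branch
# Then either keep the work:
#   git merge research/vector-databases
# or discard it:
#   git branch -D research/vector-databases
```

Keeping speculative ingestions on a branch means a poor synthesis can be thrown away without touching the main wiki history.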

As your wiki scales beyond a few hundred pages, having the agent traverse index.md becomes inefficient. At that point, install qmd, a local, on-device hybrid search engine that supports both exact-keyword (BM25) and semantic search.

# 1. Install qmd globally via npm
npm install -g @tobilu/qmd

# 2. Add your Obsidian knowledge folder to the qmd index
qmd collection add ~/Documents/obsidian-vault/knowledge --name knowledge

# 3. Perform a manual semantic search from the terminal
qmd query "how does the ingestion phase work in an LLM Wiki?"

# 4. Perform a fast, exact-keyword BM25 search
qmd search "RAG vs LLM Wiki"

Once installed, you can configure your agent (e.g., Claude Code) to use the qmd MCP server natively, allowing the agent to execute these searches autonomously.

Troubleshooting

Issue: The Agent is Hallucinating Concepts

Cause: The agent is bypassing the index.md or qmd search and relying on its parametric memory (training data) instead of the local wiki.

Fix: Update your AGENTS.md or CLAUDE.md schema file with strict instructions: "Always execute a qmd query or read the global index.md before writing a response. Never answer based on general knowledge."

Issue: Conflicting Information on Pages

Cause: A new source contradicted an older source, and the agent blindly appended the new fact without reconciling the conflict.

Fix: Run a linting pass. Instruct the agent: "Review the architecture.md file for Kubernetes. Look for contradictory claims regarding networking. Use > [!WARNING] to highlight conflicts and cite the conflicting sources."
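Once conflicts are marked with a [!WARNING] callout, they can be found again with a plain text search. A minimal sketch, assuming the wiki lives under knowledge/ and using a hypothetical function name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: list pages that contain a [!WARNING] callout,
# i.e. conflicts that were flagged but may not be resolved yet.
list_flagged_conflicts() {
  local root="${1:-knowledge}"
  # -r: recurse, -l: print file names only, -F: treat the pattern literally.
  grep -rlF --include='*.md' '[!WARNING]' "$root" | sort
}
```

The resulting list is a natural work queue for the next review session.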

Issue: Token Limits Exceeded During Query

Cause: The index.md file has grown too large, and the agent is trying to load the entire wiki directory structure into context.

Fix: Transition from manual file traversal to qmd local search. Prohibit the agent from reading raw directory listings (ls -R) and require it to use qmd queries instead.
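To see this problem coming, you can track roughly how many tokens index.md would consume. The sketch below assumes about 4 bytes per token, which is only a rule of thumb for English text and not an exact tokenizer count; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: rough token estimate for a file.
# Assumes ~4 bytes per token (a heuristic, not a real tokenizer).
estimate_tokens() {
  local file="$1"
  local bytes
  # Read via stdin so wc prints only the byte count, not the file name.
  bytes="$(wc -c < "$file")"
  echo $(( bytes / 4 ))
}
```

If estimate_tokens index.md approaches a meaningful share of your model's context window, that is a reasonable signal to move to qmd search.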