
Security & Data Sovereignty

The LLM Wiki pattern fundamentally alters the security and privacy model of AI-assisted knowledge management. By shifting from cloud-based RAG platforms to a local-first, agent-driven workflow, you regain data sovereignty while introducing new risks related to agent filesystem access.

This document outlines the threat model, access controls, and encryption strategies necessary to secure an LLM Wiki.

1. Identity & Authentication (Auth)

In a traditional cloud RAG system (e.g., ChatGPT or NotebookLM), authentication is handled by the vendor. In a local LLM Wiki, authentication is decoupled into two distinct domains: Human Auth and Agent Auth.

Human Authentication

Because the wiki is stored as a directory of markdown files (often inside Obsidian), human access is governed by the host operating system.

  • Local Access: Authentication relies on standard OS-level login mechanisms (biometrics, passwords).
  • Remote Access: If the wiki is synced across devices (e.g., via Obsidian Sync or Git), authentication is handled by the sync provider. Obsidian Sync uses E2EE with a custom password, while Git relies on SSH keys or Personal Access Tokens (PATs).

Agent Authentication

The AI agent (Claude Code, Codex, or local models) must be authenticated to interact with your data.

  • API Keys: For cloud-backed agents (like Claude Code), API keys should be supplied through environment variables (e.g., exported from a shell profile such as .zshrc, or loaded from a .env file that is excluded from version control) and never committed to the wiki repository.
  • Model Context Protocol (MCP): If the agent uses qmd or other MCP servers to read the wiki, the MCP server runs locally under the user's OS permissions. No separate authentication is required between the agent CLI and the local MCP server, provided both run on the same machine under the same user profile.
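One way to catch a key that has slipped into the wiki is to scan the markdown files before committing. The sketch below is a minimal illustration, not a complete secret scanner. The function names are hypothetical, and the single pattern assumes keys shaped like the `sk-`-prefixed strings some providers issue, so other key formats will go undetected.

```python
import re
from pathlib import Path

# Assumption: a key looks like "sk-" followed by 20 or more key characters.
KEY_PATTERN = re.compile(r"\bsk-[A-Za-z0-9_-]{20,}")


def find_probable_keys(text):
    """Return the substrings of `text` that look like API keys."""
    return KEY_PATTERN.findall(text)


def scan_wiki(root):
    """Map each markdown file under `root` to the suspicious strings it contains."""
    hits = {}
    for path in Path(root).rglob("*.md"):
        found = find_probable_keys(path.read_text(encoding="utf-8", errors="ignore"))
        if found:
            hits[str(path)] = found
    return hits
```

Calling `scan_wiki` on the vault directory from a pre-commit hook and refusing the commit when the result is non-empty would be one way to use it.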

2. Access Control (Authz)

Authorization is the primary security challenge when using autonomous agents. Giving an LLM agent read/write access to your local filesystem carries inherent risks.

Agent Permissions Model

  • Scoped Read/Write: Agents should be restricted to the specific directory containing the knowledge base (e.g., ~/Documents/obsidian-vault/). They should never be run from the root directory (/) or the home directory (~), as this could allow them to read SSH keys or system configurations.
  • The Raw Sources Immutability Contract: As defined in the architecture, the agent must treat the raw/ directory as read-only. Operating systems do not easily enforce read-only access for one specific process running under the user's own account, so the constraint must be stated explicitly and prominently in the agent's instructions (e.g., inside AGENTS.md). An instruction of this kind is a convention the agent is asked to follow, not a hard boundary.
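The two rules above can also be checked in code, assuming some wrapper or tool layer sits between the agent and the filesystem (an assumption, since the pattern itself does not prescribe one). The sketch below uses a hypothetical `is_write_allowed` helper that rejects any write target outside the vault or inside `raw/`.

```python
from pathlib import Path


def is_write_allowed(target, vault_root, read_only_dirs=("raw",)):
    """Return True only if `target` is inside the vault and outside read-only dirs."""
    root = Path(vault_root).resolve()
    path = Path(target)
    if not path.is_absolute():
        path = root / path          # relative targets are interpreted inside the vault
    path = path.resolve()           # collapses ".." segments and follows symlinks
    try:
        relative = path.relative_to(root)
    except ValueError:
        return False                # the target escapes the vault
    if not relative.parts:
        return False                # the vault root itself is not a writable file
    return relative.parts[0] not in read_only_dirs
```

Resolving the path before comparing matters: a target such as `wiki/../raw/paper.md` looks harmless as a string but lands in the protected directory.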

Multi-User / Internal Team Wikis

If the LLM Wiki pattern is deployed for a business or team, authorization becomes more complex:

  • Role-Based Access Control (RBAC): In a team setting, the wiki is typically hosted in a central Git repository. The agent runs in a CI/CD pipeline, processing PRs or new documents. Human engineers review the agent's generated markdown before merging.
  • Segregation of Duties: The agent has write access to the feature branch, but only human maintainers have merge permissions to main.
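In such a pipeline, human review can be backed by an automated check that flags any agent PR touching the raw sources. The sketch below assumes the CI job captures the text output of `git diff --name-status` for the PR and passes it in. The function name and the `raw/` prefix default are illustrative, and paths that Git quotes because of special characters are not handled.

```python
def raw_violations(name_status_output, protected_prefix="raw/"):
    """Return paths under `protected_prefix` listed in `git diff --name-status` output."""
    violations = []
    for line in name_status_output.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        # fields[0] is the status (M, A, D, R100, ...); the rest are paths.
        # Renames and copies list both the old and the new path.
        for path in fields[1:]:
            if path.startswith(protected_prefix):
                violations.append(path)
    return violations
```

A CI step could fail the build whenever the returned list is non-empty, leaving the merge decision with the human maintainers.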

3. Threat Model & Encryption

Data Leakage via Prompts

When using cloud-backed agents (e.g., Anthropic API, OpenAI API), the contents of your local markdown files are transmitted to the cloud provider during ingestion and query phases.

  • Risk: Sensitive personal data, trade secrets, or proprietary code within the wiki could be logged or retained by the API provider.
  • Mitigation: Rely on enterprise API agreements that guarantee zero data retention and exclude your data from model training. To keep wiki content on your own machine entirely, replace the cloud agent with a local GGUF model running on llama.cpp or Ollama.
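As a rough illustration of the local-model option, the sketch below sends a prompt to an Ollama server on the same machine instead of a cloud API. It assumes Ollama is running on its default port (11434) and that a model has already been pulled. The model name `llama3` is only a placeholder, and the helper names are hypothetical.

```python
import json
import urllib.request

# Ollama's default local endpoint; the request never leaves the machine.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_local_request(prompt, model="llama3"):
    """Build a non-streaming generate request for a local Ollama server."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask_local_model(prompt, model="llama3"):
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_local_request(prompt, model)) as resp:
        return json.loads(resp.read())["response"]
```

Keeping request construction separate from the network call makes it easy to verify that the target host is `localhost` before any wiki content is sent.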

Malicious Prompt Injection via Raw Sources

  • Risk: A user ingests a malicious webpage or PDF containing prompt injection attacks (e.g., hidden text instructing the agent to delete the workspace).
  • Mitigation: Modern agents (like Claude Code) employ tool-use confirmation and sandboxing, requiring user approval for destructive commands (like rm). The agent must never have permission to execute code found within a raw source without explicit human consent.
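The approval-gate idea can be illustrated with a small check that decides whether a proposed shell command needs human sign-off. This is not how Claude Code or any other agent implements confirmation. It is a sketch with a hypothetical function name and a deliberately incomplete list of destructive executables.

```python
import shlex
from pathlib import PurePosixPath

# Assumption: an illustrative, incomplete list of destructive executables.
DESTRUCTIVE = {"rm", "rmdir", "mv", "dd", "chmod", "chown", "truncate"}


def requires_approval(command):
    """Return True if `command` mentions a destructive executable or cannot be parsed."""
    lexer = shlex.shlex(command, posix=True, punctuation_chars=True)
    lexer.whitespace_split = True  # split on whitespace and shell punctuation only
    try:
        tokens = list(lexer)
    except ValueError:
        return True  # unparseable input (e.g. unbalanced quotes): ask a human
    # Compare basenames so that "/bin/rm" is treated the same as "rm".
    return any(PurePosixPath(token).name in DESTRUCTIVE for token in tokens)
```

The check is intentionally conservative: it flags a command such as `echo rm` even though nothing is deleted, on the view that an unnecessary prompt is cheaper than a missed deletion.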

Data at Rest Encryption

Because the wiki is stored as plain-text markdown, anyone who gains physical or local access to the machine can read it.

  • Full Disk Encryption (FDE): Ensure FileVault (macOS), BitLocker (Windows), or LUKS (Linux) is enabled on the host machine.
  • Vault-Level Encryption: For highly sensitive wikis, use tools like Cryptomator to create an encrypted virtual drive where the markdown files are stored. The LLM agent can only access the files when the drive is mounted and decrypted by the user.
  • Sync Encryption: If utilizing cloud sync (Git, iCloud, Dropbox), ensure End-to-End Encryption (E2EE) is applied before the data leaves the local machine.
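For the vault-level option, a launcher script could refuse to start the agent unless the encrypted drive is actually mounted. The sketch below assumes the decrypted vault appears as a regular mount point on the host. Both function names are hypothetical, and the mount path is whatever the user configured.

```python
import os


def vault_is_mounted(mount_point):
    """Return True if `mount_point` is currently a mounted filesystem."""
    return os.path.ismount(mount_point)


def require_unlocked_vault(mount_point):
    """Return `mount_point` if the vault is mounted, otherwise raise."""
    if not vault_is_mounted(mount_point):
        raise RuntimeError(f"Vault is not mounted at {mount_point}; unlock it first.")
    return mount_point
```

Failing early in this way avoids the agent silently creating files in an empty, unencrypted directory that merely shares the vault's path.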

Summary Checklist

  • Ensure API keys are stored securely in environment variables, not in the wiki.
  • Run the agent ONLY within the scoped knowledge directory.
  • Enforce the "Read-Only Raw Sources" rule in the AGENTS.md schema.
  • Enable Full Disk Encryption on the host machine.
  • Ensure the agent requires human approval for terminal commands.