Architecture

Hermes Agent is a Python-based AI agent built around a synchronous orchestration loop with pluggable memory, skills, terminal backends, and platform adapters. This page covers the internals of each major subsystem.

Agent Loop

The core of Hermes Agent is run_conversation() — a synchronous orchestration engine that handles provider selection, prompt construction, tool execution, retries, fallback mechanisms, context compression, and session persistence.

Turn Lifecycle

Each iteration follows a defined sequence:

  1. Generate task ID and append the user message
  2. Build system prompt — assembles stable prompt components plus Honcho context layers
  3. Preflight compression check — if token usage is near the limit, compress context before the API call
  4. Build API messages — convert internal message format to OpenAI-format messages with tool schemas
  5. Inject ephemeral prompt layers — session overlays and prefill messages added at call time (not baked into the stable prefix, to preserve provider-side prompt caching)
  6. Interruptible API call — send to the configured LLM provider
  7. Parse response — branch on tool calls vs. text
  8. If tool calls — dispatch each via handle_function_call(), append results, continue loop
  9. If text response — persist session, flush memory, return

flowchart TD
    A[User Message] --> B[Build System Prompt]
    B --> C{Token Limit Near?}
    C -->|Yes| D[Compress Context]
    C -->|No| E[Build API Messages]
    D --> E
    E --> F[Inject Ephemeral Layers]
    F --> G[LLM API Call]
    G --> H{Response Type}
    H -->|Tool Calls| I[Dispatch via handle_function_call]
    I --> J[Append Results]
    J --> G
    H -->|Text| K[Persist Session]
    K --> L[Flush Memory]
    L --> M[Return Response]
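The lifecycle above can be sketched as a minimal loop. This is a simplified stand-in, not the real implementation: call_llm, compress_context, handle_function_call, and token_count are hypothetical injected dependencies.

```python
# Minimal sketch of a run_conversation-style loop; all helper names here
# are hypothetical stand-ins for the real Hermes internals.

def run_conversation(messages, call_llm, handle_function_call,
                     compress_context, token_count, token_limit=100_000):
    """Loop until the model returns plain text instead of tool calls."""
    while True:
        # Preflight compression: shrink context before the API call if needed.
        if token_count(messages) > 0.9 * token_limit:
            messages = compress_context(messages)

        response = call_llm(messages)  # interruptible API call in practice

        if response.get("tool_calls"):
            # Dispatch each tool call, append results, then loop again.
            for call in response["tool_calls"]:
                result = handle_function_call(call)
                messages.append({"role": "tool", "content": result})
            continue

        # Text response: the caller persists the session and flushes memory.
        return response["content"]
```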

Prompt Architecture

The prompt system separates stable and ephemeral components to maximize provider-side prompt caching:

  • Stable prefix — system instructions, tool schemas, skill context, Honcho base context. Remains identical across turns within a session.
  • Ephemeral layers — session overlays, prefill messages, dialectic supplement. Injected only at API call time to avoid invalidating cached tokens.

The system supports three API modes for different provider backends (OpenAI-compatible, Anthropic native, and custom endpoints).
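The stable/ephemeral split can be illustrated with a small sketch; the function names below (build_stable_prefix, build_prompt) are hypothetical, not Hermes APIs.

```python
# Sketch of stable vs. ephemeral prompt assembly. The stable prefix is
# identical across turns so provider-side prompt caching can reuse it.

def build_stable_prefix(system_instructions, tool_schemas, skill_context,
                        honcho_base):
    # Identical across turns within a session -> cacheable by the provider.
    return "\n\n".join([system_instructions, tool_schemas, skill_context,
                        honcho_base])

def build_prompt(stable_prefix, session_overlays=(), prefill=(),
                 dialectic_supplement=""):
    # Ephemeral layers are appended at call time, after the cached prefix,
    # so they never invalidate previously cached tokens.
    ephemeral = list(session_overlays) + list(prefill)
    if dialectic_supplement:
        ephemeral.append(dialectic_supplement)
    return stable_prefix + "".join(f"\n\n{layer}" for layer in ephemeral)
```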

Memory System

Hermes uses a four-layer memory architecture that provides both immediate recall and long-term learning.

flowchart TB
    subgraph L1["Layer 1: Session Context"]
        SC[In-Memory Messages]
        SC_NOTE["Scope: current conversation<br>Retrieval: immediate"]
    end
    subgraph L2["Layer 2: Session History"]
        SH[SQLite + FTS5]
        SH_NOTE["Scope: all past sessions<br>Retrieval: full-text search"]
    end
    subgraph L3["Layer 3: User Model"]
        UM[Honcho Dialectic]
        UM_NOTE["Scope: cross-session identity<br>Retrieval: dialectic modeling"]
    end
    subgraph L4["Layer 4: Skills"]
        SK[Markdown Files]
        SK_NOTE["Scope: persistent knowledge<br>Retrieval: pattern matching"]
    end

    L1 --> L2
    L2 --> L3
    L3 --> L4

Layer 1 -- Session Context

In-memory message list for the current conversation. Subject to automatic context compression when approaching the provider's token limit.

Layer 2 -- Session History (SQLite + FTS5)

All past sessions are persisted to SQLite with FTS5 full-text search. Sessions include lineage tracking across compressions, per-platform isolation, and atomic writes with contention handling. Users can search their own conversation history via hermes search and receive LLM-powered summarization of results.
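The storage pattern can be illustrated with Python's built-in sqlite3 module; the schema and table names below are illustrative, not Hermes internals.

```python
# Illustrative SQLite + FTS5 pattern: persist session text, then run a
# ranked full-text search across all past sessions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE sessions USING fts5(session_id, platform, body)")
conn.execute(
    "INSERT INTO sessions VALUES ('s1', 'cli', 'deployed the staging cluster')")
conn.execute(
    "INSERT INTO sessions VALUES ('s2', 'telegram', 'wrote a backup script')")

# Full-text search, ordered by FTS5 relevance rank.
rows = conn.execute(
    "SELECT session_id FROM sessions WHERE sessions MATCH ? ORDER BY rank",
    ("backup",),
).fetchall()
```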

Layer 3 -- Honcho User Modeling

Honcho provides AI-native cross-session user modeling with multi-pass dialectic reasoning. It operates in three modes, configurable via hermes honcho mode:

  • local -- SQLite-only memory, no Honcho calls
  • honcho -- full Honcho cloud integration
  • hybrid -- local memory plus Honcho context injection (default)

On every turn (in hybrid or honcho mode), Honcho assembles two context layers that are injected into the system prompt:

  • Base context — session summary, user representation, user peer card, AI self-representation, AI identity card
  • Dialectic supplement — LLM-synthesized reasoning about the user's current state and needs

Both layers are concatenated and truncated to the contextTokens budget if set.
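A sketch of the mode-dependent assembly and truncation; the function name is hypothetical and the whitespace-split "token" count is a crude proxy for a real tokenizer.

```python
# Sketch of hybrid-mode context assembly: concatenate both Honcho layers,
# then clip to the contextTokens budget (word count stands in for tokens).

def assemble_honcho_context(mode, base_context, dialectic_supplement,
                            context_tokens=None):
    if mode == "local":
        return ""  # SQLite-only memory, no Honcho calls
    combined = base_context + "\n\n" + dialectic_supplement
    if context_tokens is not None:
        words = combined.split()
        combined = " ".join(words[:context_tokens])
    return combined
```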

Layer 4 -- Skills

Structured markdown files stored in ~/.hermes/skills/. See the Skill Engine section below.

Skill Engine

The skill engine is Hermes Agent's core differentiator — it enables autonomous creation, storage, retrieval, and self-improvement of reusable task knowledge.

Skill Lifecycle

flowchart LR
    A[Complex Task<br>5+ tool calls] --> B[Agent Creates<br>Skill Document]
    B --> C[Stored as<br>SKILL.md]
    C --> D[Pattern-Matched<br>on Future Tasks]
    D --> E{Skill Correct?}
    E -->|Yes| F[Used As-Is]
    E -->|Outdated/Wrong| G[Self-Improve:<br>Patch In-Place]
    G --> C
    F --> H{Eligible for<br>Evolution?}
    H -->|Yes| I[DSPy + GEPA<br>Optimization]
    I --> C

Skill Document Format

Skills are stored as structured markdown with YAML frontmatter:

---
name: my-skill
description: Brief description of what this skill does
version: 1.0.0
platforms: [macos, linux]
metadata:
  hermes:
    tags: [python, automation]
    category: devops
    requires_toolsets: [terminal]
    config:
      - key: my.setting
        description: "What this controls"
        default: "value"
---

# Skill Title

## When to Use
Trigger conditions for this skill.

## Procedure
1. Step one
2. Step two

## Pitfalls
- Known failure modes and fixes

## Verification
How to confirm it worked.
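A loader for this format only needs to separate the frontmatter delimiter from the markdown body. A minimal sketch, assuming the standard "---" fencing shown above (a real implementation would parse the frontmatter with a YAML library such as PyYAML):

```python
# Minimal sketch of reading a SKILL.md document: split the '---'-delimited
# YAML frontmatter from the markdown body.

def split_skill(text):
    """Return (frontmatter, body) from a '---'-delimited skill document."""
    _, frontmatter, body = text.split("---", 2)
    return frontmatter.strip(), body.strip()
```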

Autonomous Creation

After a complex task finishes (defined as 5+ tool calls), the agent writes a skill document capturing:

  • The approach it took
  • Edge cases encountered
  • Domain knowledge reconstructed during the task

Self-Improvement

Skills are patched in real time when the agent detects:

  • Outdated content — an API changed or a dependency was updated
  • Incomplete coverage — a missing edge case was encountered
  • Incorrect output — the skill produced wrong results

Skill Discovery

Skills are loaded from three locations:

  1. User skills — ~/.hermes/skills/
  2. Project skills — .hermes/skills/ in the current directory
  3. Hub skills — installed via hermes skills install from registries (official, skills.sh, well-known)

Skills are compatible with the agentskills.io open standard.
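The multi-source lookup could be sketched as follows; the glob pattern and the precedence order (project skills overriding user skills) are assumptions for illustration.

```python
# Sketch of skill discovery across user and project directories. Hub skills
# would be resolved from installed registries; omitted here.
from pathlib import Path

def discover_skills(home=Path.home(), cwd=Path(".")):
    sources = [
        home / ".hermes" / "skills",  # user skills
        cwd / ".hermes" / "skills",   # project skills
    ]
    skills = {}
    for root in sources:
        if root.is_dir():
            for skill_md in root.glob("*/SKILL.md"):
                # Later sources win: project skills shadow user skills.
                skills[skill_md.parent.name] = skill_md
    return skills
```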

Self-Evolution System (DSPy + GEPA)

The companion repository hermes-agent-self-evolution uses DSPy + GEPA (Genetic-Pareto Prompt Evolution) to automatically evolve skills, tool descriptions, system prompts, and agent code.

Evolution Pipeline

flowchart TD
    A[Current Skills/Prompts] --> B[Read Execution Traces]
    B --> C[Understand Why Things Fail]
    C --> D[LLM Generates Text Variants<br>via Mutation]
    D --> E[Evaluate Variants<br>Against Test Cases]
    E --> F{Multi-Objective<br>Pareto-Optimal?}
    F -->|Yes| G[Keep Variant]
    F -->|No| H[Discard]
    G --> I[Selection:<br>Quality + Cost + Speed]
    I --> A

GEPA Process

  1. Mutate — LLM generates text variants of skills/prompts. The GEPA optimizer reads execution traces to understand why things fail, not just that they failed, then proposes targeted improvements.
  2. Evaluate — Run variants against test cases using DSPy evaluation frameworks (COPRO for gradient-free search, MIPRO for instruction tuning with validation sets).
  3. Select — Keep Pareto-optimal variants across multiple objectives: quality, cost, and speed.
  4. Repeat — Evolutionary pressure produces measurably better versions over successive runs.
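The selection step can be illustrated with a small Pareto-front filter over the three stated objectives; the dictionary field names are hypothetical.

```python
# Sketch of Pareto-optimal selection: keep every variant that no other
# variant dominates on quality (higher is better), cost, and latency
# (lower is better).

def pareto_front(variants):
    """variants: list of dicts with 'quality', 'cost', 'latency' keys."""
    def dominates(a, b):
        no_worse = (a["quality"] >= b["quality"] and a["cost"] <= b["cost"]
                    and a["latency"] <= b["latency"])
        strictly_better = (a["quality"] > b["quality"] or a["cost"] < b["cost"]
                           or a["latency"] < b["latency"])
        return no_worse and strictly_better

    return [v for v in variants
            if not any(dominates(other, v) for other in variants)]
```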

Operational Characteristics

  • No GPU training required — operates entirely via LLM API calls
  • Cost: $2--10 per optimization run
  • Underlying research: ICLR 2026 Oral Paper
  • MIT licensed

Multi-Platform Gateway

The messaging gateway is a single background process that manages connections to all configured platforms, handles user sessions, executes cron jobs, and delivers voice messages.

Platform Adapters

Each platform has a dedicated adapter in gateway/platforms/ extending BaseAdapter:

  • telegram.py — Telegram Bot API (long polling or webhook)
  • discord.py — Discord bot via discord.py
  • slack.py — Slack Socket Mode
  • whatsapp.py — WhatsApp Business Cloud API
  • signal.py — Signal via signal-cli REST API
  • matrix.py — Matrix via mautrix (optional E2EE)
  • mattermost.py — Mattermost WebSocket API
  • email.py — Email via IMAP/SMTP
  • sms.py — SMS via Twilio
  • dingtalk.py — DingTalk WebSocket
  • feishu.py — Feishu/Lark WebSocket or webhook
  • wecom.py — WeCom (WeChat Work) callback
  • weixin.py — Weixin (personal WeChat) via iLink Bot API
  • bluebubbles.py — Apple iMessage via BlueBubbles macOS server
  • qqbot.py — QQ Bot (Tencent QQ) via Official API v2
  • webhook.py — inbound/outbound webhook adapter
  • api_server.py — REST API server adapter
  • homeassistant.py — Home Assistant conversation integration

All platforms get full tool access, not just chat — the same agent capabilities are available from Telegram as from the CLI.

Gateway Architecture

The gateway routes incoming messages from any platform adapter through a unified session manager to the agent loop. Sessions are isolated per-platform, and each adapter handles media attachments and platform-specific message formatting independently.
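The adapter pattern might look like the following sketch; the method names and message shape are assumptions, not the actual BaseAdapter interface in gateway/platforms/.

```python
# Sketch of a BaseAdapter-style interface: each platform adapter normalizes
# inbound events and handles outbound formatting itself.
from abc import ABC, abstractmethod

class BaseAdapter(ABC):
    @abstractmethod
    def receive(self, raw_event) -> dict:
        """Convert a platform event into {user_id, text, attachments}."""

    @abstractmethod
    def send(self, user_id: str, text: str) -> None:
        """Deliver agent output using platform-specific formatting."""

class EchoAdapter(BaseAdapter):
    """Toy adapter that records outgoing messages in memory."""
    def __init__(self):
        self.outbox = []

    def receive(self, raw_event):
        return {"user_id": raw_event["from"], "text": raw_event["body"],
                "attachments": []}

    def send(self, user_id, text):
        self.outbox.append((user_id, text))
```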

Terminal Backends

Six execution backends determine where the agent's shell commands run:

  • Local — no isolation (runs on the host machine); for development and personal use; persistent
  • Docker — hardened container; for isolation and reproducibility; long-lived container with docker exec per command, cleaned up on session end
  • SSH — remote server; for remote execution; persistent remote session
  • Daytona — sandbox; for serverless persistence; hibernates when idle
  • Singularity — HPC container; for HPC clusters; per-command or persistent
  • Modal — serverless sandbox; for cloud pay-per-use; near-zero idle cost

Configuration is via config.yaml or the TERMINAL_ENV environment variable. Container-based backends (Docker, Singularity, Modal, Daytona) default to the nikolaik/python-nodejs:python3.11-nodejs20 image.
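As an illustration, a config.yaml fragment might look like this. The key names are assumptions, not a verbatim Hermes schema; only the TERMINAL_ENV variable name and the default image come from this page.

```yaml
# Hypothetical config.yaml sketch — key names are illustrative.
terminal:
  backend: docker   # local | docker | ssh | daytona | singularity | modal
  image: nikolaik/python-nodejs:python3.11-nodejs20
```

The same choice could alternatively be made via the TERMINAL_ENV environment variable.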

Dangerous command handling

In the local backend, Hermes checks every command against a curated list of dangerous patterns (recursive deletes, SQL drops, piping curl to shell, etc.) and prompts for approval. In container backends, dangerous command checks are skipped because the container itself is the security boundary.
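The check can be sketched as a pattern scan; the patterns below are illustrative examples drawn from the categories named above, not the curated list itself.

```python
# Sketch of a dangerous-command check: scan each command against a list of
# known-risky patterns and flag it for user approval.
import re

DANGEROUS_PATTERNS = [
    r"rm\s+-rf\s+/",              # recursive delete from root
    r"DROP\s+TABLE",              # SQL drop statement
    r"curl\s+[^|]*\|\s*(ba)?sh",  # piping curl output into a shell
]

def needs_approval(command: str) -> bool:
    return any(re.search(p, command, re.IGNORECASE)
               for p in DANGEROUS_PATTERNS)
```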

Plugin System

The plugin system supports three discovery sources:

  1. User plugins — ~/.hermes/plugins/
  2. Project plugins — .hermes/plugins/
  3. pip entry points — installed Python packages that register as Hermes plugins

Plugins can register:

  • Tools — custom tool schemas and handlers
  • Hooks — event callbacks (e.g., post_tool_call)
  • CLI commands — custom subcommands added to the hermes CLI

Two specialized plugin types exist with single-select semantics:

  • Memory providers — alternative memory backends (e.g., the Honcho plugin)
  • Context engines — custom context injection systems

Plugin loading occurs at startup via the register(ctx) function, which receives a context object for registering tools, hooks, and commands.
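A plugin module's entry point might look like this sketch; the ctx method names are assumptions modeled on the registration targets listed above.

```python
# Sketch of a plugin's register(ctx) entry point: register one tool and
# one hook via the (hypothetical) context methods.

def my_tool(args):
    return f"echo: {args['text']}"

def register(ctx):
    # Tools: custom schema plus handler.
    ctx.register_tool(
        name="echo",
        schema={"type": "object",
                "properties": {"text": {"type": "string"}}},
        handler=my_tool,
    )
    # Hooks: event callbacks such as post_tool_call.
    ctx.register_hook("post_tool_call", lambda event: None)
```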
