Architecture¶
Hermes Agent is a Python-based AI agent built around a synchronous orchestration loop with pluggable memory, skills, terminal backends, and platform adapters. This page covers the internals of each major subsystem.
Agent Loop¶
The core of Hermes Agent is run_conversation() — a synchronous orchestration engine that handles provider selection, prompt construction, tool execution, retries, fallback mechanisms, context compression, and session persistence.
Turn Lifecycle¶
Each iteration follows a defined sequence:
- Generate task ID and append the user message
- Build system prompt — assembles stable prompt components plus Honcho context layers
- Preflight compression check — if token usage is near the limit, compress context before the API call
- Build API messages — convert internal message format to OpenAI-format messages with tool schemas
- Inject ephemeral prompt layers — session overlays and prefill messages added at call time (not baked into the stable prefix, to preserve provider-side prompt caching)
- Interruptible API call — send to the configured LLM provider
- Parse response — branch on tool calls vs. text
- If tool calls — dispatch each via handle_function_call(), append results, continue the loop
- If text response — persist the session, flush memory, return (a Python sketch of this loop follows the diagram below)
flowchart TD
A[User Message] --> B[Build System Prompt]
B --> C{Token Limit Near?}
C -->|Yes| D[Compress Context]
C -->|No| E[Build API Messages]
D --> E
E --> F[Inject Ephemeral Layers]
F --> G[LLM API Call]
G --> H{Response Type}
H -->|Tool Calls| I[Dispatch via handle_function_call]
I --> J[Append Results]
J --> G
H -->|Text| K[Persist Session]
K --> L[Flush Memory]
L --> M[Return Response]
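A minimal Python sketch of this loop is shown below. Only run_conversation() and handle_function_call() are names taken from the documentation above; the session object, helper functions, and the OpenAI-style client call are illustrative assumptions, not the actual Hermes internals.

```python
# Hypothetical sketch of the turn lifecycle; helper names other than
# run_conversation() and handle_function_call() are placeholders.

def run_conversation(session, user_message, client, token_limit):
    session.messages.append({"role": "user", "content": user_message})

    while True:
        system_prompt = build_system_prompt(session)           # stable prefix + Honcho context
        if estimate_tokens(session.messages) > 0.9 * token_limit:
            compress_context(session)                          # preflight compression check
        api_messages = to_openai_messages(system_prompt, session.messages)
        api_messages += ephemeral_layers(session)              # overlays / prefill added at call time

        response = client.chat.completions.create(             # interruptible API call
            model=session.model,
            messages=api_messages,
            tools=session.tool_schemas,
        )
        choice = response.choices[0].message

        if choice.tool_calls:                                  # tool-call branch
            session.messages.append(assistant_message(choice))
            for call in choice.tool_calls:
                result = handle_function_call(call, session)   # dispatch one tool call
                session.messages.append(tool_result_message(call, result))
            continue                                           # loop back for another API call

        persist_session(session)                               # text branch: persist, flush, return
        flush_memory(session)
        return choice.content
```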
Prompt Architecture¶
The prompt system separates stable and ephemeral components to maximize provider-side prompt caching:
- Stable prefix — system instructions, tool schemas, skill context, Honcho base context. Remains identical across turns within a session.
- Ephemeral layers — session overlays, prefill messages, dialectic supplement. Injected only at API call time to avoid invalidating cached tokens.
The system supports three API modes for different provider backends (OpenAI-compatible, Anthropic native, and custom endpoints).
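The caching split can be pictured with a small sketch. The PromptBuilder class below and its methods are hypothetical illustrations of the idea, not the Hermes API.

```python
# Illustrative only: shows why stable and ephemeral parts are kept separate.

class PromptBuilder:
    def __init__(self, system_instructions, tool_schemas, skill_context, honcho_base):
        # Stable prefix: identical across turns in a session, so the provider
        # can reuse its cached tokens on every call.
        self.stable_prefix = "\n\n".join(
            [system_instructions, tool_schemas, skill_context, honcho_base]
        )

    def build(self, session_overlays=None, prefill=None, dialectic_supplement=None):
        # Ephemeral layers are appended per call and never mixed into the
        # stable prefix, so they cannot invalidate the cache.
        ephemeral = [x for x in (session_overlays, dialectic_supplement, prefill) if x]
        return self.stable_prefix, ephemeral
```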
Memory System¶
Hermes uses a four-layer memory architecture that provides both immediate recall and long-term learning.
flowchart TB
subgraph L1["Layer 1: Session Context"]
SC[In-Memory Messages]
SC_NOTE["Scope: current conversation<br>Retrieval: immediate"]
end
subgraph L2["Layer 2: Session History"]
SH[SQLite + FTS5]
SH_NOTE["Scope: all past sessions<br>Retrieval: full-text search"]
end
subgraph L3["Layer 3: User Model"]
UM[Honcho Dialectic]
UM_NOTE["Scope: cross-session identity<br>Retrieval: dialectic modeling"]
end
subgraph L4["Layer 4: Skills"]
SK[Markdown Files]
SK_NOTE["Scope: persistent knowledge<br>Retrieval: pattern matching"]
end
L1 --> L2
L2 --> L3
L3 --> L4
Layer 1 -- Session Context¶
In-memory message list for the current conversation. Subject to automatic context compression when approaching the provider's token limit.
Layer 2 -- Session History (SQLite + FTS5)¶
All past sessions are persisted to SQLite with FTS5 full-text search. Sessions include lineage tracking across compressions, per-platform isolation, and atomic writes with contention handling. Users can search their own conversation history via hermes search and receive LLM-powered summarization of results.
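A compact sketch of what this layer amounts to, using Python's built-in sqlite3 module with an FTS5 virtual table. The schema and column names are illustrative, not the actual Hermes schema.

```python
# Hypothetical Layer 2 sketch: persist transcripts and search them with FTS5.
import sqlite3

conn = sqlite3.connect("sessions.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS sessions (
    id INTEGER PRIMARY KEY,
    platform TEXT,            -- per-platform isolation
    parent_id INTEGER,        -- lineage tracking across compressions
    created_at TEXT
);
CREATE VIRTUAL TABLE IF NOT EXISTS session_fts
    USING fts5(session_id UNINDEXED, platform UNINDEXED, transcript);
""")

# Conceptually, `hermes search <query>` reduces to an FTS5 MATCH query;
# Hermes then summarizes the hits with the LLM.
rows = conn.execute(
    "SELECT session_id, snippet(session_fts, 2, '[', ']', ' … ', 12) "
    "FROM session_fts WHERE session_fts MATCH ? ORDER BY rank LIMIT 10",
    ("docker compose",),
).fetchall()
```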
Layer 3 -- Honcho User Modeling¶
Honcho provides AI-native cross-session user modeling with multi-pass dialectic reasoning. It operates in three modes, configurable via hermes honcho mode:
| Mode | Behavior |
|---|---|
| `local` | SQLite-only memory, no Honcho calls |
| `honcho` | Full Honcho cloud integration |
| `hybrid` | Local memory + Honcho context injection (default) |
Every turn (in hybrid or honcho mode), Honcho assembles two layers of context injected into the system prompt:
- Base context — session summary, user representation, user peer card, AI self-representation, AI identity card
- Dialectic supplement — LLM-synthesized reasoning about the user's current state and needs
Both layers are concatenated and truncated to the contextTokens budget if set.
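As a rough sketch of that budgeting step (the function below and its truncation strategy are assumptions, not Honcho's or Hermes's actual logic):

```python
# Illustrative truncation of the combined Honcho context to a token budget.

def assemble_honcho_context(base_context: str, dialectic_supplement: str,
                            context_tokens: int | None, count_tokens) -> str:
    combined = base_context + "\n\n" + dialectic_supplement
    if context_tokens is None:            # no contextTokens budget configured
        return combined
    while count_tokens(combined) > context_tokens:
        combined = combined[: int(len(combined) * 0.9)]   # trim from the end
    return combined
```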
Layer 4 -- Skills¶
Structured markdown files stored in ~/.hermes/skills/. See the Skill Engine section below.
Skill Engine¶
The skill engine is Hermes Agent's core differentiator — it enables autonomous creation, storage, retrieval, and self-improvement of reusable task knowledge.
Skill Lifecycle¶
flowchart LR
A[Complex Task<br>5+ tool calls] --> B[Agent Creates<br>Skill Document]
B --> C[Stored as<br>SKILL.md]
C --> D[Pattern-Matched<br>on Future Tasks]
D --> E{Skill Correct?}
E -->|Yes| F[Used As-Is]
E -->|Outdated/Wrong| G[Self-Improve:<br>Patch In-Place]
G --> C
F --> H{Eligible for<br>Evolution?}
H -->|Yes| I[DSPy + GEPA<br>Optimization]
I --> C
Skill Document Format¶
Skills are stored as structured markdown with YAML frontmatter:
---
name: my-skill
description: Brief description of what this skill does
version: 1.0.0
platforms: [macos, linux]
metadata:
  hermes:
    tags: [python, automation]
    category: devops
    requires_toolsets: [terminal]
    config:
      - key: my.setting
        description: "What this controls"
        default: "value"
---
# Skill Title
## When to Use
Trigger conditions for this skill.
## Procedure
1. Step one
2. Step two
## Pitfalls
- Known failure modes and fixes
## Verification
How to confirm it worked.
Autonomous Creation¶
After a complex task finishes (defined as 5+ tool calls), the agent writes a skill document capturing:
- The approach it took
- Edge cases encountered
- Domain knowledge reconstructed during the task
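The trigger can be pictured as a simple threshold check. In the sketch below, the 5-call threshold comes from the description above, while the task dictionary and static template stand in for content the agent actually generates.

```python
# Illustrative trigger for autonomous skill creation; everything except the
# 5-tool-call threshold is a placeholder, not the Hermes implementation.

from pathlib import Path

SKILL_CREATION_THRESHOLD = 5  # tool calls in the completed task

def maybe_create_skill(task: dict, skills_dir: Path) -> Path | None:
    if len(task["tool_calls"]) < SKILL_CREATION_THRESHOLD:
        return None
    # In Hermes the agent itself writes the document; this static template
    # stands in for that LLM-generated content.
    body = (
        f"# {task['title']}\n\n"
        "## When to Use\n...\n\n"
        "## Procedure\n" + "\n".join(f"1. {step}" for step in task["approach"]) + "\n\n"
        "## Pitfalls\n" + "\n".join(f"- {p}" for p in task["edge_cases"]) + "\n"
    )
    skill_dir = skills_dir / task["slug"]
    skill_dir.mkdir(parents=True, exist_ok=True)
    path = skill_dir / "SKILL.md"
    path.write_text(body)
    return path
```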
Self-Improvement¶
Skills are patched in real-time when the agent detects:
- Outdated content — an API changed or a dependency was updated
- Incomplete coverage — a missing edge case was encountered
- Incorrect output — the skill produced wrong results
Skill Discovery¶
Skills are loaded from three locations:
- User skills — `~/.hermes/skills/`
- Project skills — `.hermes/skills/` in the current directory
- Hub skills — installed via `hermes skills install` from registries (official, skills.sh, well-known)
Skills are compatible with the agentskills.io open standard.
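A discovery pass over these locations might look like the sketch below. The merge order and precedence shown are assumptions, and the directory used for hub-installed skills is not specified here.

```python
# Hypothetical discovery sketch: merge skills from the documented locations,
# with later sources of the same name overriding earlier ones (an assumption).

from pathlib import Path

def discover_skills(project_root: Path) -> dict[str, Path]:
    sources = [
        Path.home() / ".hermes" / "skills",   # user skills (also assumed home of hub installs)
        project_root / ".hermes" / "skills",  # project skills
    ]
    skills: dict[str, Path] = {}
    for src in sources:
        if not src.is_dir():
            continue
        for skill_md in src.glob("*/SKILL.md"):
            skills[skill_md.parent.name] = skill_md
    return skills
```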
Self-Evolution System (DSPy + GEPA)¶
The companion repository hermes-agent-self-evolution uses DSPy + GEPA (Genetic-Pareto Prompt Evolution) to automatically evolve skills, tool descriptions, system prompts, and agent code.
Evolution Pipeline¶
flowchart TD
A[Current Skills/Prompts] --> B[Read Execution Traces]
B --> C[Understand Why Things Fail]
C --> D[LLM Generates Text Variants<br>via Mutation]
D --> E[Evaluate Variants<br>Against Test Cases]
E --> F{Multi-Objective<br>Pareto-Optimal?}
F -->|Yes| G[Keep Variant]
F -->|No| H[Discard]
G --> I[Selection:<br>Quality + Cost + Speed]
I --> A
GEPA Process¶
- Mutate — LLM generates text variants of skills/prompts. The GEPA optimizer reads execution traces to understand why things fail, not just that they failed, then proposes targeted improvements.
- Evaluate — Run variants against test cases using DSPy evaluation frameworks (COPRO for gradient-free search, MIPRO for instruction tuning with validation sets).
- Select — Keep Pareto-optimal variants across multiple objectives: quality, cost, and speed.
- Repeat — Evolutionary pressure produces measurably better versions over successive runs.
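The selection step is standard multi-objective Pareto filtering. The snippet below is a generic illustration of that rule over quality, cost, and speed, not code from the GEPA repository.

```python
# Illustrative Pareto selection; higher quality is better, lower cost and
# latency are better.

def dominates(a: dict, b: dict) -> bool:
    """True if variant a is at least as good as b on every objective and
    strictly better on at least one."""
    at_least_as_good = (a["quality"] >= b["quality"] and
                        a["cost"] <= b["cost"] and
                        a["latency"] <= b["latency"])
    strictly_better = (a["quality"] > b["quality"] or
                       a["cost"] < b["cost"] or
                       a["latency"] < b["latency"])
    return at_least_as_good and strictly_better

def pareto_front(variants: list[dict]) -> list[dict]:
    # Keep only variants that no other variant dominates.
    return [v for v in variants
            if not any(dominates(other, v) for other in variants)]
```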
Operational Characteristics¶
- No GPU training required — operates entirely via LLM API calls
- Cost: $2--10 per optimization run
- Underlying research: ICLR 2026 Oral Paper
- MIT licensed
Multi-Platform Gateway¶
The messaging gateway is a single background process that manages connections to all configured platforms, handles user sessions, executes cron jobs, and delivers voice messages.
Platform Adapters¶
Each platform has a dedicated adapter in gateway/platforms/ extending BaseAdapter:
| Adapter | Protocol |
|---|---|
| `telegram.py` | Telegram Bot API (long polling or webhook) |
| `discord.py` | Discord bot via discord.py |
| `slack.py` | Slack Socket Mode |
| `whatsapp.py` | WhatsApp Business Cloud API |
| `signal.py` | Signal via signal-cli REST API |
| `matrix.py` | Matrix via mautrix (optional E2EE) |
| `mattermost.py` | Mattermost WebSocket API |
| `email.py` | Email via IMAP/SMTP |
| `sms.py` | SMS via Twilio |
| `dingtalk.py` | DingTalk WebSocket |
| `feishu.py` | Feishu/Lark WebSocket or webhook |
| `wecom.py` | WeCom (WeChat Work) callback |
| `weixin.py` | Weixin (personal WeChat) via iLink Bot API |
| `bluebubbles.py` | Apple iMessage via BlueBubbles macOS server |
| `qqbot.py` | QQ Bot (Tencent QQ) via Official API v2 |
| `webhook.py` | Inbound/outbound webhook adapter |
| `api_server.py` | REST API server adapter |
| `homeassistant.py` | Home Assistant conversation integration |
All platforms get full tool access, not just chat — the same agent capabilities are available from Telegram as from the CLI.
Gateway Architecture¶
The gateway routes incoming messages from any platform adapter through a unified session manager to the agent loop. Sessions are isolated per-platform, and each adapter handles media attachments and platform-specific message formatting independently.
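A hypothetical shape for an adapter and its routing hand-off is sketched below. The real BaseAdapter interface in gateway/platforms/ may expose different hooks; the stub base class here only stands in for it, and the gateway.route() call is an assumption.

```python
# Illustrative adapter contract; not the actual Hermes gateway API.
import abc

class BaseAdapter(abc.ABC):              # stand-in for the real base class
    def __init__(self, gateway):
        self.gateway = gateway           # unified session manager / router

    @abc.abstractmethod
    async def connect(self): ...         # open the platform connection

    @abc.abstractmethod
    async def handle_incoming(self, chat_id, user_id, text, attachments=()): ...

class ConsoleAdapter(BaseAdapter):
    """Toy adapter: takes text locally and prints the agent's reply."""
    platform = "console"

    async def connect(self):
        print("console adapter ready")

    async def handle_incoming(self, chat_id, user_id, text, attachments=()):
        # Route through the shared session manager to the agent loop, then
        # deliver the reply back on this platform with its own formatting.
        reply = await self.gateway.route(self.platform, chat_id, user_id, text, attachments)
        print(reply)
```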
Terminal Backends¶
Six execution backends determine where the agent's shell commands run:
| Backend | Isolation | Use Case | Lifecycle |
|---|---|---|---|
| Local | None (host machine) | Development, personal use | Persistent |
| Docker | Container (hardened) | Isolation, reproducibility | Long-lived container, docker exec per command, cleaned up on session end |
| SSH | Remote server | Remote execution | Persistent remote session |
| Daytona | Sandbox | Serverless persistence | Hibernates when idle |
| Singularity | Container (HPC) | HPC clusters | Per-command or persistent |
| Modal | Serverless sandbox | Cloud pay-per-use | Near-zero idle cost |
Configuration is via config.yaml or the TERMINAL_ENV environment variable. Container-based backends (Docker, Singularity, Modal, Daytona) default to the nikolaik/python-nodejs:python3.11-nodejs20 image.
Dangerous command handling
In the local backend, Hermes checks every command against a curated list of dangerous patterns (recursive deletes, SQL drops, piping curl to shell, etc.) and prompts for approval. In container backends, dangerous command checks are skipped because the container itself is the security boundary.
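A minimal sketch of such a check, assuming a hypothetical pattern list and requires_approval() helper (the actual patterns and approval flow in Hermes may differ):

```python
# Illustrative dangerous-command check for the local backend.
import re

DANGEROUS_PATTERNS = [
    r"\brm\s+-rf\s+/",                # recursive deletes near the filesystem root
    r"\bDROP\s+(TABLE|DATABASE)\b",   # SQL drops
    r"curl\s+[^|]*\|\s*(sh|bash)",    # piping curl output into a shell
]

def requires_approval(command: str, backend: str) -> bool:
    if backend != "local":
        return False  # container backends: the container is the security boundary
    return any(re.search(p, command, re.IGNORECASE) for p in DANGEROUS_PATTERNS)
```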
Plugin System¶
The plugin system supports three discovery sources:
- User plugins — `~/.hermes/plugins/`
- Project plugins — `.hermes/plugins/`
- pip entry points — installed Python packages that register as Hermes plugins
Plugins can register:
- Tools — custom tool schemas and handlers
- Hooks — event callbacks (e.g., `post_tool_call`)
- CLI commands — custom subcommands added to the `hermes` CLI
Two specialized plugin types exist with single-select semantics:
- Memory providers — alternative memory backends (e.g., the Honcho plugin)
- Context engines — custom context injection systems
Plugin loading occurs at startup via the register(ctx) function, which receives a context object for registering tools, hooks, and commands.
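A hypothetical plugin module built around that entry point might look like the sketch below. The ctx method names (register_tool, register_hook, register_command) are assumptions, since only register(ctx) itself is named above.

```python
# Illustrative plugin module; only register(ctx) is the documented entry point.

def register(ctx):
    # Tools: a custom tool schema plus its handler.
    ctx.register_tool(
        name="say_hello",
        description="Return a greeting for the given name.",
        parameters={
            "type": "object",
            "properties": {"name": {"type": "string"}},
            "required": ["name"],
        },
        handler=lambda args: f"Hello, {args['name']}!",
    )

    # Hooks: event callbacks such as post_tool_call.
    ctx.register_hook("post_tool_call", lambda event: print(event.tool_name))

    # CLI commands: custom subcommands added to the hermes CLI.
    ctx.register_command("hello", lambda argv: print("hello from a plugin"))
```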