Architecture

Hermes Agent is a Python-based AI agent built around a synchronous orchestration loop with pluggable memory, skills, terminal backends, and platform adapters. This page covers the internals of each major subsystem.

Agent Loop

The core of Hermes Agent is run_conversation() — a synchronous orchestration engine that handles provider selection, prompt construction, tool execution, retries, fallback mechanisms, context compression, and session persistence.

Turn Lifecycle

Each iteration follows a defined sequence:

  1. Generate task ID and append the user message
  2. Build system prompt — assembles stable prompt components plus Honcho context layers
  3. Preflight compression check — if token usage is near the limit, compress context before the API call
  4. Build API messages — convert internal message format to OpenAI-format messages with tool schemas
  5. Inject ephemeral prompt layers — session overlays and prefill messages added at call time (not baked into the stable prefix, to preserve provider-side prompt caching)
  6. Interruptible API call — send to the configured LLM provider
  7. Parse response — branch on tool calls vs. text
  8. If tool calls — dispatch each via handle_function_call(), append results, continue loop
  9. If text response — persist session, flush memory, return

flowchart TD
    A[User Message] --> B[Build System Prompt]
    B --> C{Token Limit Near?}
    C -->|Yes| D[Compress Context]
    C -->|No| E[Build API Messages]
    D --> E
    E --> F[Inject Ephemeral Layers]
    F --> G[LLM API Call]
    G --> H{Response Type}
    H -->|Tool Calls| I[Dispatch via handle_function_call]
    I --> J[Append Results]
    J --> G
    H -->|Text| K[Persist Session]
    K --> L[Flush Memory]
    L --> M[Return Response]
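The lifecycle above can be sketched as a minimal loop. This is a simplified stand-in, not the real implementation: call_llm, compress_context, handle_function_call, and token_count are hypothetical injected dependencies.

```python
# Minimal sketch of a run_conversation-style loop; all helper names here
# are hypothetical stand-ins for the real Hermes internals.

def run_conversation(messages, call_llm, handle_function_call,
                     compress_context, token_count, token_limit=100_000):
    """Loop until the model returns plain text instead of tool calls."""
    while True:
        # Preflight compression: shrink context before the API call if needed.
        if token_count(messages) > 0.9 * token_limit:
            messages = compress_context(messages)

        response = call_llm(messages)  # interruptible API call in practice

        if response.get("tool_calls"):
            # Dispatch each tool call, append results, then loop again.
            for call in response["tool_calls"]:
                result = handle_function_call(call)
                messages.append({"role": "tool", "content": result})
            continue

        # Text response: the caller persists the session and flushes memory.
        return response["content"]
```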

Prompt Architecture

The prompt system separates stable and ephemeral components to maximize provider-side prompt caching:

  • Stable prefix — system instructions, tool schemas, skill context, Honcho base context. Remains identical across turns within a session.
  • Ephemeral layers — session overlays, prefill messages, dialectic supplement. Injected only at API call time to avoid invalidating cached tokens.

The system supports three API modes for different provider backends (OpenAI-compatible, Anthropic native, and custom endpoints).
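The stable/ephemeral split can be illustrated with a small sketch; the function names below (build_stable_prefix, build_prompt) are hypothetical, not Hermes APIs.

```python
# Sketch of stable vs. ephemeral prompt assembly. The stable prefix is
# identical across turns so provider-side prompt caching can reuse it.

def build_stable_prefix(system_instructions, tool_schemas, skill_context,
                        honcho_base):
    # Identical across turns within a session -> cacheable by the provider.
    return "\n\n".join([system_instructions, tool_schemas, skill_context,
                        honcho_base])

def build_prompt(stable_prefix, session_overlays=(), prefill=(),
                 dialectic_supplement=""):
    # Ephemeral layers are appended at call time, after the cached prefix,
    # so they never invalidate previously cached tokens.
    ephemeral = list(session_overlays) + list(prefill)
    if dialectic_supplement:
        ephemeral.append(dialectic_supplement)
    return stable_prefix + "".join(f"\n\n{layer}" for layer in ephemeral)
```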

Memory System

Hermes uses a four-layer memory architecture that provides both immediate recall and long-term learning.

flowchart TB
    subgraph L1["Layer 1: Session Context"]
        SC[In-Memory Messages]
        SC_NOTE["Scope: current conversation<br>Retrieval: immediate"]
    end
    subgraph L2["Layer 2: Session History"]
        SH[SQLite + FTS5]
        SH_NOTE["Scope: all past sessions<br>Retrieval: full-text search"]
    end
    subgraph L3["Layer 3: User Model"]
        UM[Honcho Dialectic]
        UM_NOTE["Scope: cross-session identity<br>Retrieval: dialectic modeling"]
    end
    subgraph L4["Layer 4: Skills"]
        SK[Markdown Files]
        SK_NOTE["Scope: persistent knowledge<br>Retrieval: pattern matching"]
    end

    L1 --> L2
    L2 --> L3
    L3 --> L4

Layer 1 -- Session Context

In-memory message list for the current conversation. Subject to automatic context compression when approaching the provider's token limit.

Layer 2 -- Session History (SQLite + FTS5)

All past sessions are persisted to SQLite with FTS5 full-text search. Sessions include lineage tracking across compressions, per-platform isolation, and atomic writes with contention handling. Users can search their own conversation history via hermes search and receive LLM-powered summarization of results.
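The storage pattern can be illustrated with Python's built-in sqlite3 module; the schema and table names below are illustrative, not Hermes internals.

```python
# Illustrative SQLite + FTS5 pattern: persist session text, then run a
# ranked full-text search across all past sessions.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE VIRTUAL TABLE sessions USING fts5(session_id, platform, body)")
conn.execute(
    "INSERT INTO sessions VALUES ('s1', 'cli', 'deployed the staging cluster')")
conn.execute(
    "INSERT INTO sessions VALUES ('s2', 'telegram', 'wrote a backup script')")

# Full-text search, ordered by FTS5 relevance rank.
rows = conn.execute(
    "SELECT session_id FROM sessions WHERE sessions MATCH ? ORDER BY rank",
    ("backup",),
).fetchall()
```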

Layer 3 -- Honcho User Modeling

Honcho provides AI-native cross-session user modeling with multi-pass dialectic reasoning. It operates in three modes, configurable via hermes honcho mode:

  • local -- SQLite-only memory, no Honcho calls
  • honcho -- full Honcho cloud integration
  • hybrid -- local memory plus Honcho context injection (default)

On every turn (in hybrid or honcho mode), Honcho assembles two context layers that are injected into the system prompt:

  • Base context — session summary, user representation, user peer card, AI self-representation, AI identity card
  • Dialectic supplement — LLM-synthesized reasoning about the user's current state and needs

Both layers are concatenated and truncated to the contextTokens budget if set.
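A sketch of the mode-dependent assembly and truncation; the function name is hypothetical and the whitespace-split "token" count is a crude proxy for a real tokenizer.

```python
# Sketch of hybrid-mode context assembly: concatenate both Honcho layers,
# then clip to the contextTokens budget (word count stands in for tokens).

def assemble_honcho_context(mode, base_context, dialectic_supplement,
                            context_tokens=None):
    if mode == "local":
        return ""  # SQLite-only memory, no Honcho calls
    combined = base_context + "\n\n" + dialectic_supplement
    if context_tokens is not None:
        words = combined.split()
        combined = " ".join(words[:context_tokens])
    return combined
```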

Layer 4 -- Skills

Structured markdown files stored in ~/.hermes/skills/. See the Skill Engine section below.

Skill Engine

The skill engine is Hermes Agent's core differentiator — it enables autonomous creation, storage, retrieval, and self-improvement of reusable task knowledge.

Skill Lifecycle

flowchart LR
    A[Complex Task<br>5+ tool calls] --> B[Agent Creates<br>Skill Document]
    B --> C[Stored as<br>SKILL.md]
    C --> D[Pattern-Matched<br>on Future Tasks]
    D --> E{Skill Correct?}
    E -->|Yes| F[Used As-Is]
    E -->|Outdated/Wrong| G[Self-Improve:<br>Patch In-Place]
    G --> C
    F --> H{Eligible for<br>Evolution?}
    H -->|Yes| I[DSPy + GEPA<br>Optimization]
    I --> C

Skill Document Format

Skills are stored as structured markdown with YAML frontmatter:

---
name: my-skill
description: Brief description of what this skill does
version: 1.0.0
platforms: [macos, linux]
metadata:
  hermes:
    tags: [python, automation]
    category: devops
    requires_toolsets: [terminal]
    config:
      - key: my.setting
        description: "What this controls"
        default: "value"
---

# Skill Title

## When to Use
Trigger conditions for this skill.

## Procedure
1. Step one
2. Step two

## Pitfalls
- Known failure modes and fixes

## Verification
How to confirm it worked.
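A loader for this format only needs to separate the frontmatter delimiter from the markdown body. A minimal sketch, assuming the standard "---" fencing shown above (a real implementation would parse the frontmatter with a YAML library such as PyYAML):

```python
# Minimal sketch of reading a SKILL.md document: split the '---'-delimited
# YAML frontmatter from the markdown body.

def split_skill(text):
    """Return (frontmatter, body) from a '---'-delimited skill document."""
    _, frontmatter, body = text.split("---", 2)
    return frontmatter.strip(), body.strip()
```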

Autonomous Creation

After a complex task finishes (defined as 5+ tool calls), the agent writes a skill document capturing:

  • The approach it took
  • Edge cases encountered
  • Domain knowledge reconstructed during the task

Self-Improvement

Skills are patched in real time when the agent detects:

  • Outdated content — an API changed or a dependency was updated
  • Incomplete coverage — a missing edge case was encountered
  • Incorrect output — the skill produced wrong results

Skill Discovery

Skills are loaded from three locations:

  1. User skills — ~/.hermes/skills/
  2. Project skills — .hermes/skills/ in the current directory
  3. Hub skills — installed via hermes skills install from registries (official, skills.sh, well-known)

Skills are compatible with the agentskills.io open standard.
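The multi-source lookup could be sketched as follows; the glob pattern and the precedence order (project skills overriding user skills) are assumptions for illustration.

```python
# Sketch of skill discovery across user and project directories. Hub skills
# would be resolved from installed registries; omitted here.
from pathlib import Path

def discover_skills(home=Path.home(), cwd=Path(".")):
    sources = [
        home / ".hermes" / "skills",  # user skills
        cwd / ".hermes" / "skills",   # project skills
    ]
    skills = {}
    for root in sources:
        if root.is_dir():
            for skill_md in root.glob("*/SKILL.md"):
                # Later sources win: project skills shadow user skills.
                skills[skill_md.parent.name] = skill_md
    return skills
```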

Self-Evolution System (DSPy + GEPA)

The companion repository hermes-agent-self-evolution uses DSPy + GEPA (Genetic-Pareto Prompt Evolution) to automatically evolve skills, tool descriptions, system prompts, and agent code.

Evolution Pipeline

flowchart TD
    A[Current Skills/Prompts] --> B[Read Execution Traces]
    B --> C[Understand Why Things Fail]
    C --> D[LLM Generates Text Variants<br>via Mutation]
    D --> E[Evaluate Variants<br>Against Test Cases]
    E --> F{Multi-Objective<br>Pareto-Optimal?}
    F -->|Yes| G[Keep Variant]
    F -->|No| H[Discard]
    G --> I[Selection:<br>Quality + Cost + Speed]
    I --> A

GEPA Process

  1. Mutate — LLM generates text variants of skills/prompts. The GEPA optimizer reads execution traces to understand why things fail, not just that they failed, then proposes targeted improvements.
  2. Evaluate — Run variants against test cases using DSPy evaluation frameworks (COPRO for gradient-free search, MIPRO for instruction tuning with validation sets).
  3. Select — Keep Pareto-optimal variants across multiple objectives: quality, cost, and speed.
  4. Repeat — Evolutionary pressure produces measurably better versions over successive runs.
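The selection step can be illustrated with a small Pareto-front filter over the three stated objectives; the dictionary field names are hypothetical.

```python
# Sketch of Pareto-optimal selection: keep every variant that no other
# variant dominates on quality (higher is better), cost, and latency
# (lower is better).

def pareto_front(variants):
    """variants: list of dicts with 'quality', 'cost', 'latency' keys."""
    def dominates(a, b):
        no_worse = (a["quality"] >= b["quality"] and a["cost"] <= b["cost"]
                    and a["latency"] <= b["latency"])
        strictly_better = (a["quality"] > b["quality"] or a["cost"] < b["cost"]
                           or a["latency"] < b["latency"])
        return no_worse and strictly_better

    return [v for v in variants
            if not any(dominates(other, v) for other in variants)]
```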

Operational Characteristics

  • No GPU training required — operates entirely via LLM API calls
  • Cost: $2--10 per optimization run
  • Underlying research: ICLR 2026 Oral Paper
  • MIT licensed

Multi-Platform Gateway

The messaging gateway is a single background process that manages connections to all configured platforms, handles user sessions, executes cron jobs, and delivers voice messages.

Platform Adapters

Each platform has a dedicated adapter in gateway/platforms/ extending BaseAdapter:

  • telegram.py — Telegram Bot API (long polling or webhook)
  • discord.py — Discord bot via discord.py
  • slack.py — Slack Socket Mode
  • whatsapp.py — WhatsApp Business Cloud API
  • signal.py — Signal via signal-cli REST API
  • matrix.py — Matrix via mautrix (optional E2EE)
  • mattermost.py — Mattermost WebSocket API
  • email.py — Email via IMAP/SMTP
  • sms.py — SMS via Twilio
  • dingtalk.py — DingTalk WebSocket
  • feishu.py — Feishu/Lark WebSocket or webhook
  • wecom.py — WeCom (WeChat Work) callback
  • weixin.py — Weixin (personal WeChat) via iLink Bot API
  • bluebubbles.py — Apple iMessage via BlueBubbles macOS server
  • qqbot.py — QQ Bot (Tencent QQ) via Official API v2
  • webhook.py — inbound/outbound webhook adapter
  • api_server.py — REST API server adapter
  • homeassistant.py — Home Assistant conversation integration

All platforms get full tool access, not just chat — the same agent capabilities are available from Telegram as from the CLI.

Gateway Architecture

The gateway routes incoming messages from any platform adapter through a unified session manager to the agent loop. Sessions are isolated per-platform, and each adapter handles media attachments and platform-specific message formatting independently.
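The adapter pattern might look like the following sketch; the method names and message shape are assumptions, not the actual BaseAdapter interface in gateway/platforms/.

```python
# Sketch of a BaseAdapter-style interface: each platform adapter normalizes
# inbound events and handles outbound formatting itself.
from abc import ABC, abstractmethod

class BaseAdapter(ABC):
    @abstractmethod
    def receive(self, raw_event) -> dict:
        """Convert a platform event into {user_id, text, attachments}."""

    @abstractmethod
    def send(self, user_id: str, text: str) -> None:
        """Deliver agent output using platform-specific formatting."""

class EchoAdapter(BaseAdapter):
    """Toy adapter that records outgoing messages in memory."""
    def __init__(self):
        self.outbox = []

    def receive(self, raw_event):
        return {"user_id": raw_event["from"], "text": raw_event["body"],
                "attachments": []}

    def send(self, user_id, text):
        self.outbox.append((user_id, text))
```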

Terminal Backends

Six execution backends determine where the agent's shell commands run:

  • Local — no isolation (runs on the host machine); for development and personal use; persistent
  • Docker — hardened container; for isolation and reproducibility; long-lived container with docker exec per command, cleaned up on session end
  • SSH — remote server; for remote execution; persistent remote session
  • Daytona — sandbox; for serverless persistence; hibernates when idle
  • Singularity — HPC container; for HPC clusters; per-command or persistent
  • Modal — serverless sandbox; for cloud pay-per-use; near-zero idle cost

Configuration is via config.yaml or the TERMINAL_ENV environment variable. Container-based backends (Docker, Singularity, Modal, Daytona) default to the nikolaik/python-nodejs:python3.11-nodejs20 image.
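As an illustration, a config.yaml fragment might look like this. The key names are assumptions, not a verbatim Hermes schema; only the TERMINAL_ENV variable name and the default image come from this page.

```yaml
# Hypothetical config.yaml sketch — key names are illustrative.
terminal:
  backend: docker   # local | docker | ssh | daytona | singularity | modal
  image: nikolaik/python-nodejs:python3.11-nodejs20
```

The same choice could alternatively be made via the TERMINAL_ENV environment variable.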

Dangerous command handling

In the local backend, Hermes checks every command against a curated list of dangerous patterns (recursive deletes, SQL drops, piping curl to shell, etc.) and prompts for approval. In container backends, dangerous command checks are skipped because the container itself is the security boundary.
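The check can be sketched as a pattern scan; the patterns below are illustrative examples drawn from the categories named above, not the curated list itself.

```python
# Sketch of a dangerous-command check: scan each command against a list of
# known-risky patterns and flag it for user approval.
import re

DANGEROUS_PATTERNS = [
    r"rm\s+-rf\s+/",              # recursive delete from root
    r"DROP\s+TABLE",              # SQL drop statement
    r"curl\s+[^|]*\|\s*(ba)?sh",  # piping curl output into a shell
]

def needs_approval(command: str) -> bool:
    return any(re.search(p, command, re.IGNORECASE)
               for p in DANGEROUS_PATTERNS)
```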

Plugin System

The plugin system supports three discovery sources:

  1. User plugins — ~/.hermes/plugins/
  2. Project plugins — .hermes/plugins/
  3. pip entry points — installed Python packages that register as Hermes plugins

Plugins can register:

  • Tools — custom tool schemas and handlers
  • Hooks — event callbacks (e.g., post_tool_call)
  • CLI commands — custom subcommands added to the hermes CLI

Two specialized plugin types exist with single-select semantics:

  • Memory providers — alternative memory backends (e.g., the Honcho plugin)
  • Context engines — custom context injection systems

Plugin loading occurs at startup via the register(ctx) function, which receives a context object for registering tools, hooks, and commands.
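A plugin module's entry point might look like this sketch; the ctx method names are assumptions modeled on the registration targets listed above.

```python
# Sketch of a plugin's register(ctx) entry point: register one tool and
# one hook via the (hypothetical) context methods.

def my_tool(args):
    return f"echo: {args['text']}"

def register(ctx):
    # Tools: custom schema plus handler.
    ctx.register_tool(
        name="echo",
        schema={"type": "object",
                "properties": {"text": {"type": "string"}}},
        handler=my_tool,
    )
    # Hooks: event callbacks such as post_tool_call.
    ctx.register_hook("post_tool_call", lambda event: None)
```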
