Skip to content

Architecture

MySQL is a relational database management system built around a pluggable storage engine architecture. The server handles connection management, query parsing, optimisation, and execution, while the storage engine layer manages how data is physically stored and retrieved. InnoDB has been the default storage engine since MySQL 5.5 and is the only engine that supports ACID transactions, foreign keys, and crash recovery.

See also: index, architecture, operations, security


Server Component Architecture

MySQL's server layer is shared across all storage engines. It handles connection establishment, authentication, SQL parsing, optimisation, and execution plan dispatch.

graph TB
    CLIENT["Client Connection<br/>(mysql / connector)"]
    CONN["Connection Handler<br/>(Thread-per-connection or ThreadPool)"]
    PARSE["SQL Parser<br/>(Lexer + Yacc/Bison)"]
    PREP["Preprocessor<br/>(Resolve tables, columns, views)"]
    OPT["Query Optimizer<br/>(Cost-based)"]
    EXEC["Query Executor<br/>(Iterator executor)"]
    CACHE["Query Cache<br/>(Removed in 8.0)"]
    API["Handler API<br/>(Storage Engine Interface)"]
    INNODB["InnoDB Storage Engine"]
    MYISAM["MyISAM Engine"]
    OTHER["Other Engines<br/>(Memory, Archive, NDB)"]

    CLIENT --> CONN
    CONN --> PARSE
    PARSE --> PREP
    PREP --> OPT
    OPT --> EXEC
    EXEC --> API
    API --> INNODB
    API --> MYISAM
    API --> OTHER

    style INNODB fill:#e8f5e9,stroke:#4caf50,color:#000
    style OPT fill:#fff3e0,stroke:#ff9800,color:#000

Connection Management

Each client connection gets its own thread (or is serviced by a thread pool when using the thread pool plugin). The connection thread handles authentication, receives SQL statements, and returns results. MySQL supports both TCP/IP and Unix socket connections.

Query Optimizer

MySQL uses a cost-based optimizer that estimates the cost of alternative execution plans by considering table statistics, index cardinality, and available access paths. Key optimizer features include:

  • Range optimisation -- scans only the relevant index range.
  • Index merge -- uses multiple indexes for a single table access.
  • Hash joins -- available since MySQL 8.0.20 for equi-join conditions, replacing the older nested-loop-only approach.
  • Subquery materialisation -- materialises subquery results into a temporary table.
  • Window functions -- natively supported since 8.0.

InnoDB Storage Engine

InnoDB is the transactional storage engine that provides ACID compliance, row-level locking, and crash recovery. It is the default and recommended engine for most workloads.

graph TB
    subgraph "InnoDB Memory Structures"
        BP["Buffer Pool<br/>(data + index pages)"]
        CB["Change Buffer<br/>(secondary index modifications)"]
        AHI["Adaptive Hash Index"]
        LB["Log Buffer<br/>(redo log entries)"]
    end

    subgraph "InnoDB Disk Structures"
        TS["System Tablespace<br/>(ibdata1)"]
        FPT["File-Per-Table<br/>Tablespaces (.ibd)"]
        RL["Redo Log Files<br/>(ib_logfile0/1)"]
        UL["Undo Tablespaces<br/>(mysql.ibd or undo_001)"]
        DW["Doublewrite Buffer"]
    end

    BP --- TS
    BP --- FPT
    LB --> RL
    BP --> DW
    CB --> TS

    style BP fill:#e3f2fd,stroke:#2196f3,color:#000
    style LB fill:#fce4ec,stroke:#e91e63,color:#000
    style RL fill:#fce4ec,stroke:#e91e63,color:#000

Buffer Pool

The buffer pool is the most critical memory structure in InnoDB. It caches data pages (16 KiB each), index pages, and undo log pages in memory to reduce disk I/O.

Parameter Purpose
innodb_buffer_pool_size Total buffer pool memory. Recommended 50-80% of system RAM on dedicated servers.
innodb_buffer_pool_instances Number of partitioned instances (default 1 or 8 depending on size). Reduces latch contention on multi-core systems.
innodb_old_blocks_time Milliseconds a page stays in the old sublist before being promoted. Protects against full table scans evicting hot data.

The buffer pool uses a modified LRU algorithm split into a young sublist (frequently accessed) and an old sublist (recently read). Pages promoted from old to young survive longer.

Change Buffer

The change buffer caches modifications to secondary index pages when those pages are not currently in the buffer pool. Changes are merged later when the affected pages are read into memory during a SELECT.

  • Applies to INSERT, UPDATE, and DELETE operations on non-unique secondary indexes.
  • Reduces random disk I/O by amortising secondary index updates.
  • Monitored via INFORMATION_SCHEMA.INNODB_CMP and INNODB_CMP_RESET.

Deprecation Notice

The change buffer is deprecated as of MySQL 8.0 and is removed in MySQL 8.4+. For new deployments on fast storage (SSD/NVMe), the benefit is marginal.

Adaptive Hash Index (AHI)

InnoDB monitors index lookups and, if it detects that certain index pages are frequently accessed with equality predicates, automatically builds an in-memory hash index on top of the B-tree. This optimises point queries at the cost of additional overhead for DML operations.

  • Controlled by innodb_adaptive_hash_index (default ON).
  • Can cause contention under high concurrency with many writes.
  • Removed in MySQL 9.0.

Doublewrite Buffer

The doublewrite buffer protects against torn page writes (partial 16 KiB page writes caused by a crash mid-I/O). Before writing a page to its final position on disk, InnoDB first writes it to a contiguous area in the system tablespace called the doublewrite buffer. If a torn page is detected during crash recovery, the complete copy is retrieved from the doublewrite buffer.

  • Since MySQL 8.0.20, doublewrite files can be stored in a separate directory via innodb_doublewrite_dir.
  • Controlled by innodb_doublewrite (default ON). Do not disable in production.

Redo Log and Undo Log

Redo Log

The redo log records all changes made to InnoDB data pages. It enables crash recovery by replaying committed transactions that have not yet been flushed to data files.

  • Circular log -- consists of a fixed number of files (historically ib_logfile0, ib_logfile1). Old entries are overwritten once they are no longer needed for recovery.
  • Log buffer -- in-memory buffer where redo entries are first written. Flushed to disk based on innodb_flush_log_at_trx_commit:
    • 1 (default, safest): flush to disk on every transaction commit.
    • 2: flush to OS cache on commit, fsynced once per second.
    • 0: flushed once per second regardless of commit.
  • Capacity control -- since MySQL 8.0.30, innodb_redo_log_capacity replaces the older innodb_log_file_size and innodb_log_files_in_group parameters.

Undo Log

Undo logs (rollback segments) store the pre-modification state of rows. They serve two purposes:

  1. Rollback -- restoring data to its prior state on ROLLBACK.
  2. MVCC -- providing a consistent read view for transactions that started before the modifying transaction committed.

Undo logs are stored in dedicated undo tablespaces. When no active transaction needs a given undo record, the purge thread reclaims the space.


Binary Log (Binlog)

The binary log is a server-level log (separate from the InnoDB redo log) that records all data modification events in a format suitable for replication and point-in-time recovery.

Format Description
STATEMENT Logs the SQL statement. Compact but may produce different results on replicas with different data distributions.
ROW (default) Logs the actual row changes. Larger but deterministic.
MIXED Uses statement format by default, switches to row for non-deterministic statements.

Binlog events include: Query, Gtid, Table_map, Write_rows, Update_rows, Delete_rows, Xid (commit marker).


Replication Architecture

MySQL supports multiple replication modes, each with different consistency and availability trade-offs.

graph TB
    subgraph "Source (Primary)"
        SRC_SQL["SQL Thread"]
        BINLOG["Binary Log"]
        DUMP["Binlog Dump Thread"]
    end

    subgraph "Replica (Secondary)"
        IO["IO Thread<br/>(receives binlog events)"]
        RELAY["Relay Log"]
        SQL_R["SQL Thread<br/>(applies events)"]
        REPL_DATA["Data Files"]
    end

    SRC_SQL --> BINLOG
    BINLOG --> DUMP
    DUMP -->|"TCP/IP"| IO
    IO --> RELAY
    RELAY --> SQL_R
    SQL_R --> REPL_DATA

    style BINLOG fill:#fff3e0,stroke:#ff9800,color:#000
    style RELAY fill:#e8f5e9,stroke:#4caf50,color:#000

GTID (Global Transaction Identifiers)

GTIDs uniquely identify every transaction committed on the source. Each GTID has the form source_id:transaction_sequence, for example 3E11FA47-71CA-11E1-9E33-C80AA9429562:1-5. GTIDs simplify:

  • Change promotion -- promoting a replica to source without manually computing binlog positions.
  • Failover -- orchestration tools can determine which replica has the most complete transaction history.
  • Idempotent replay -- a transaction with an already-applied GTID is skipped.

Enable with: gtid_mode=ON, enforce_gtid_consistency=ON.

Replication Topologies

Topology Consistency Latency Impact Notes
Asynchronous (default) Eventual None on source Source does not wait for replica acknowledgement. Replica may lag.
Semi-synchronous Near-strong Source waits for 1 ACK Source waits for at least one replica to acknowledge receipt of the transaction. Falls back to async if timeout is reached.
Group Replication Strong (single-primary) Consensus overhead Uses Paxos-based consensus. All members must agree on transaction ordering. Supports single-primary (automatic failover) or multi-primary modes.

Group Replication Requirements

Group Replication requires: binlog_format=ROW, gtid_mode=ON, enforce_gtid_consistency=ON, log_slave_updates=ON, and transaction_write_set_extraction=XXHASH64. Only InnoDB tables are supported.

Group Replication Flow

sequenceDiagram
    participant C as Client
    participant P as Primary (Source)
    participant G as Group Communication<br/>(Paxos Consensus)
    participant R1 as Replica 1
    participant R2 as Replica 2

    C->>P: BEGIN; INSERT ...; COMMIT
    P->>P: Write to binlog (GTID assigned)
    P->>G: Broadcast transaction write set
    G->>R1: Deliver write set
    G->>R2: Deliver write set
    R1-->>G: Acknowledge (certify + apply)
    R2-->>G: Acknowledge (certify + apply)
    G-->>P: Majority reached
    P->>P: Commit transaction
    P-->>C: COMMIT OK

InnoDB Data Storage

Tablespaces

Tablespace Type Description
System tablespace (ibdata1) Data dictionary, doublewrite buffer, change buffer. Configured via innodb_data_file_path.
File-per-table (table_name.ibd) Each InnoDB table has its own .ibd file. Default since MySQL 5.6.6 (innodb_file_per_table=ON).
General tablespace Shared tablespace created with CREATE TABLESPACE. Can hold multiple tables.
Undo tablespace Dedicated undo logs. Configured via innodb_undo_tablespaces.
Temporary tablespace Temporary tables and intermediate sort results (ibtmp1).

Page and Extent Structure

  • Page size -- default 16 KiB (configurable: 4 KiB, 8 KiB, 16 KiB, 32 KiB, 64 KiB).
  • Extent -- 1 MiB contiguous block (64 pages of 16 KiB each).
  • Segment -- a collection of extents. Each index has two segments: one for non-leaf pages and one for leaf pages.

Key Configuration Parameters

Parameter Default Purpose
innodb_buffer_pool_size 128 MiB Buffer pool size. Tune to 50-80% of RAM.
innodb_flush_log_at_trx_commit 1 Redo log durability. Set to 1 for ACID.
sync_binlog 1 Binlog fsync frequency. Set to 1 for maximum durability.
innodb_io_capacity 200 IOPS hint for background tasks (compaction, flush).
innodb_io_capacity_max 2000 Upper bound for adaptive flushing.
gtid_mode OFF Enable GTIDs for replication.
binlog_format ROW Binary log format. ROW recommended for Group Replication.

Sources


How It Works

InnoDB storage engine, buffer pool, redo/undo logs, Group Replication, and HeatWave accelerator.

InnoDB Architecture

flowchart TB
    subgraph Server["MySQL Server Layer"]
        Parser["SQL Parser"]
        Optimizer["Query Optimizer"]
        Executor["Executor"]
    end

    subgraph InnoDB["InnoDB Storage Engine"]
        BP["Buffer Pool\n(page cache)"]
        CL["Change Buffer\n(secondary index updates)"]
        AHI["Adaptive Hash Index"]
        Redo["Redo Log\n(WAL for crash recovery)"]
        Undo["Undo Log\n(MVCC rollback)"]
        TS["Tablespaces\n(.ibd files)"]
    end

    Server --> InnoDB
    BP --> TS
    Redo --> TS

    style BP fill:#1565c0,color:#fff
    style Redo fill:#e65100,color:#fff

Group Replication

flowchart LR
    subgraph GR["Group Replication (Paxos)"]
        P["Primary\n(read-write)"]
        S1["Secondary 1\n(read-only)"]
        S2["Secondary 2\n(read-only)"]
    end

    P <-->|"Paxos consensus"| S1
    P <-->|"Paxos consensus"| S2
    S1 <-->|"Paxos consensus"| S2

    Client_M["Client"] --> P
    Client_M -.->|"read replicas"| S1
    Client_M -.->|"read replicas"| S2

    style P fill:#e65100,color:#fff

InnoDB Buffer Pool

The buffer pool is InnoDB's primary memory structure -- it caches table and index data to avoid disk reads:

  • Default size: 128 MiB (production typically sets to 50-80% of available RAM)
  • Page size: 16 KiB by default (configurable: 4K, 8K, 16K, 32K, 64K)
  • LRU eviction: Uses a modified LRU algorithm with a midpoint insertion strategy. New pages are inserted at the "midpoint" (5/8 from the tail) rather than the head, preventing a full table scan from evicting useful data
  • Flush strategy: Dirty pages are flushed in the background by the page cleaner thread. The innodb_io_capacity setting controls flush rate (default: 200 IOPS)

Redo Log (WAL)

InnoDB uses a Write-Ahead Log for crash recovery:

  1. Before modifying a data page, InnoDB writes the modification to the redo log
  2. Redo log is sequential: All writes append to a circular buffer (ib_logfile0, ib_logfile1), avoiding random I/O
  3. Group commit: Multiple transactions can share a single redo log write, reducing I/O operations
  4. Checkpoint: InnoDB advances the checkpoint position as dirty pages are flushed to disk, reclaiming redo log space

The redo log uses the innodb_flush_log_at_trx_commit setting: | Value | Durability | Performance | |-------|-----------|-------------| | 1 (default) | ACID-safe: fsync per commit | Slower (1 I/O per commit) | | 2 | fsync to OS cache per commit, flushed to disk ~1/sec | Moderate | | 0 | Written to cache, flushed ~1/sec | Fastest (can lose ~1 sec of data) |

Undo Log and MVCC

MySQL implements Multi-Version Concurrency Control via undo logs:

  • When a row is updated, InnoDB writes the old version to the undo log before writing the new version
  • Read consistency: Each transaction sees a snapshot of data as of its start time by following the undo log chain to find the appropriate version
  • Purge thread: A background thread removes undo logs when no active transaction needs them. If the purge thread falls behind, the undo tablespace grows

Sources


Benchmarks

Scope

Performance characteristics, scaling limits, and resource consumption for MySQL.

sysbench Results

Hardware Threads Read QPS Write QPS Read/Write QPS
4 vCPU, 16Gi 16 25,000 5,000 15,000
8 vCPU, 32Gi 32 60,000 12,000 35,000
16 vCPU, 64Gi 64 120,000 25,000 70,000

InnoDB Buffer Pool Efficiency

Pool Size vs Data Read Hit Ratio Random I/O
Pool > Data > 99.9% Minimal
Pool = 50% Data 95-99% Moderate
Pool = 10% Data 60-80% Heavy

Group Replication

Metric 3 nodes 5 nodes 9 nodes
Certification latency < 1ms 1-2ms 2-5ms
Write throughput 90% of standalone 80% 70%
Failover time 5-30s 5-30s 10-60s

Scaling Limits

Dimension Limit Notes
Table size 1TB comfortable Partition beyond 100M rows
Concurrent connections 150,000 (max) Use connection pooling
Queries/sec 100k+ point reads Depends on working set vs RAM

Sourcing Status

Unsourced Performance Data

The performance numbers in this document are estimated from vendor documentation, community benchmarks, and engineering judgment. They do not represent controlled benchmarks with documented test conditions. Specific hardware configurations, software versions, and test methodologies were not recorded.

Use these figures as rough guidance only. For production capacity planning, run your own benchmarks against your specific workload and infrastructure.

Sources