Architecture¶

MySQL is a relational database management system built around a pluggable storage engine architecture. The server handles connection management, query parsing, optimisation, and execution, while the storage engine layer manages how data is physically stored and retrieved. InnoDB has been the default storage engine since MySQL 5.5 and is the only engine that supports ACID transactions, foreign keys, and crash recovery.

See also: index, architecture, operations, security

Server Component Architecture¶

MySQL's server layer is shared across all storage engines. It handles connection establishment, authentication, SQL parsing, optimisation, and execution plan dispatch.

graph TB
    CLIENT["Client Connection<br/>(mysql / connector)"]
    CONN["Connection Handler<br/>(Thread-per-connection or ThreadPool)"]
    PARSE["SQL Parser<br/>(Lexer + Yacc/Bison)"]
    PREP["Preprocessor<br/>(Resolve tables, columns, views)"]
    OPT["Query Optimizer<br/>(Cost-based)"]
    EXEC["Query Executor<br/>(Iterator executor)"]
    CACHE["Query Cache<br/>(Removed in 8.0)"]
    API["Handler API<br/>(Storage Engine Interface)"]
    INNODB["InnoDB Storage Engine"]
    MYISAM["MyISAM Engine"]
    OTHER["Other Engines<br/>(Memory, Archive, NDB)"]

    CLIENT --> CONN
    CONN --> PARSE
    PARSE --> PREP
    PREP --> OPT
    OPT --> EXEC
    EXEC --> API
    API --> INNODB
    API --> MYISAM
    API --> OTHER

    style INNODB fill:#e8f5e9,stroke:#4caf50,color:#000
    style OPT fill:#fff3e0,stroke:#ff9800,color:#000

Connection Management¶

Each client connection gets its own thread (or is serviced by a thread pool when using the thread pool plugin). The connection thread handles authentication, receives SQL statements, and returns results. MySQL supports both TCP/IP and Unix socket connections.

Query Optimizer¶

MySQL uses a cost-based optimizer that estimates the cost of alternative execution plans by considering table statistics, index cardinality, and available access paths. Key optimizer features include:

Range optimisation -- scans only the relevant index range.
Index merge -- uses multiple indexes for a single table access.
Hash joins -- available since MySQL 8.0.20 for equi-join conditions, replacing the older nested-loop-only approach.
Subquery materialisation -- materialises subquery results into a temporary table.
Window functions -- natively supported since 8.0.

InnoDB Storage Engine¶

InnoDB is the transactional storage engine that provides ACID compliance, row-level locking, and crash recovery. It is the default and recommended engine for most workloads.

graph TB
    subgraph "InnoDB Memory Structures"
        BP["Buffer Pool<br/>(data + index pages)"]
        CB["Change Buffer<br/>(secondary index modifications)"]
        AHI["Adaptive Hash Index"]
        LB["Log Buffer<br/>(redo log entries)"]
    end

    subgraph "InnoDB Disk Structures"
        TS["System Tablespace<br/>(ibdata1)"]
        FPT["File-Per-Table<br/>Tablespaces (.ibd)"]
        RL["Redo Log Files<br/>(ib_logfile0/1)"]
        UL["Undo Tablespaces<br/>(mysql.ibd or undo_001)"]
        DW["Doublewrite Buffer"]
    end

    BP --- TS
    BP --- FPT
    LB --> RL
    BP --> DW
    CB --> TS

    style BP fill:#e3f2fd,stroke:#2196f3,color:#000
    style LB fill:#fce4ec,stroke:#e91e63,color:#000
    style RL fill:#fce4ec,stroke:#e91e63,color:#000

Buffer Pool¶

The buffer pool is the most critical memory structure in InnoDB. It caches data pages (16 KiB each), index pages, and undo log pages in memory to reduce disk I/O.

Parameter	Purpose
`innodb_buffer_pool_size`	Total buffer pool memory. Recommended 50-80% of system RAM on dedicated servers.
`innodb_buffer_pool_instances`	Number of partitioned instances (default 1 or 8 depending on size). Reduces latch contention on multi-core systems.
`innodb_old_blocks_time`	Milliseconds a page stays in the old sublist before being promoted. Protects against full table scans evicting hot data.

The buffer pool uses a modified LRU algorithm split into a young sublist (frequently accessed) and an old sublist (recently read). Pages promoted from old to young survive longer.

Change Buffer¶

The change buffer caches modifications to secondary index pages when those pages are not currently in the buffer pool. Changes are merged later when the affected pages are read into memory during a SELECT.

Applies to INSERT, UPDATE, and DELETE operations on non-unique secondary indexes.
Reduces random disk I/O by amortising secondary index updates.
Monitored via INFORMATION_SCHEMA.INNODB_CMP and INNODB_CMP_RESET.

Deprecation Notice

The change buffer is deprecated as of MySQL 8.0 and is removed in MySQL 8.4+. For new deployments on fast storage (SSD/NVMe), the benefit is marginal.

Adaptive Hash Index (AHI)¶

InnoDB monitors index lookups and, if it detects that certain index pages are frequently accessed with equality predicates, automatically builds an in-memory hash index on top of the B-tree. This optimises point queries at the cost of additional overhead for DML operations.

Controlled by innodb_adaptive_hash_index (default ON).
Can cause contention under high concurrency with many writes.
Removed in MySQL 9.0.

Doublewrite Buffer¶

The doublewrite buffer protects against torn page writes (partial 16 KiB page writes caused by a crash mid-I/O). Before writing a page to its final position on disk, InnoDB first writes it to a contiguous area in the system tablespace called the doublewrite buffer. If a torn page is detected during crash recovery, the complete copy is retrieved from the doublewrite buffer.

Since MySQL 8.0.20, doublewrite files can be stored in a separate directory via innodb_doublewrite_dir.
Controlled by innodb_doublewrite (default ON). Do not disable in production.

Redo Log and Undo Log¶

Redo Log¶

The redo log records all changes made to InnoDB data pages. It enables crash recovery by replaying committed transactions that have not yet been flushed to data files.

Circular log -- consists of a fixed number of files (historically ib_logfile0, ib_logfile1). Old entries are overwritten once they are no longer needed for recovery.
Log buffer -- in-memory buffer where redo entries are first written. Flushed to disk based on innodb_flush_log_at_trx_commit:
- 1 (default, safest): flush to disk on every transaction commit.
- 2: flush to OS cache on commit, fsynced once per second.
- 0: flushed once per second regardless of commit.
Capacity control -- since MySQL 8.0.30, innodb_redo_log_capacity replaces the older innodb_log_file_size and innodb_log_files_in_group parameters.

Undo Log¶

Undo logs (rollback segments) store the pre-modification state of rows. They serve two purposes:

Rollback -- restoring data to its prior state on ROLLBACK.
MVCC -- providing a consistent read view for transactions that started before the modifying transaction committed.

Undo logs are stored in dedicated undo tablespaces. When no active transaction needs a given undo record, the purge thread reclaims the space.

Binary Log (Binlog)¶

The binary log is a server-level log (separate from the InnoDB redo log) that records all data modification events in a format suitable for replication and point-in-time recovery.

Format	Description
STATEMENT	Logs the SQL statement. Compact but may produce different results on replicas with different data distributions.
ROW (default)	Logs the actual row changes. Larger but deterministic.
MIXED	Uses statement format by default, switches to row for non-deterministic statements.

Binlog events include: Query, Gtid, Table_map, Write_rows, Update_rows, Delete_rows, Xid (commit marker).

Replication Architecture¶

MySQL supports multiple replication modes, each with different consistency and availability trade-offs.

graph TB
    subgraph "Source (Primary)"
        SRC_SQL["SQL Thread"]
        BINLOG["Binary Log"]
        DUMP["Binlog Dump Thread"]
    end

    subgraph "Replica (Secondary)"
        IO["IO Thread<br/>(receives binlog events)"]
        RELAY["Relay Log"]
        SQL_R["SQL Thread<br/>(applies events)"]
        REPL_DATA["Data Files"]
    end

    SRC_SQL --> BINLOG
    BINLOG --> DUMP
    DUMP -->|"TCP/IP"| IO
    IO --> RELAY
    RELAY --> SQL_R
    SQL_R --> REPL_DATA

    style BINLOG fill:#fff3e0,stroke:#ff9800,color:#000
    style RELAY fill:#e8f5e9,stroke:#4caf50,color:#000

GTID (Global Transaction Identifiers)¶

GTIDs uniquely identify every transaction committed on the source. Each GTID has the form source_id:transaction_sequence, for example 3E11FA47-71CA-11E1-9E33-C80AA9429562:1-5. GTIDs simplify:

Change promotion -- promoting a replica to source without manually computing binlog positions.
Failover -- orchestration tools can determine which replica has the most complete transaction history.
Idempotent replay -- a transaction with an already-applied GTID is skipped.

Enable with: gtid_mode=ON, enforce_gtid_consistency=ON.

Replication Topologies¶

Topology	Consistency	Latency Impact	Notes
Asynchronous (default)	Eventual	None on source	Source does not wait for replica acknowledgement. Replica may lag.
Semi-synchronous	Near-strong	Source waits for 1 ACK	Source waits for at least one replica to acknowledge receipt of the transaction. Falls back to async if timeout is reached.
Group Replication	Strong (single-primary)	Consensus overhead	Uses Paxos-based consensus. All members must agree on transaction ordering. Supports single-primary (automatic failover) or multi-primary modes.

Group Replication Requirements

Group Replication requires: binlog_format=ROW, gtid_mode=ON, enforce_gtid_consistency=ON, log_slave_updates=ON, and transaction_write_set_extraction=XXHASH64. Only InnoDB tables are supported.

Group Replication Flow¶

sequenceDiagram
    participant C as Client
    participant P as Primary (Source)
    participant G as Group Communication<br/>(Paxos Consensus)
    participant R1 as Replica 1
    participant R2 as Replica 2

    C->>P: BEGIN; INSERT ...; COMMIT
    P->>P: Write to binlog (GTID assigned)
    P->>G: Broadcast transaction write set
    G->>R1: Deliver write set
    G->>R2: Deliver write set
    R1-->>G: Acknowledge (certify + apply)
    R2-->>G: Acknowledge (certify + apply)
    G-->>P: Majority reached
    P->>P: Commit transaction
    P-->>C: COMMIT OK

InnoDB Data Storage¶

Tablespaces¶

Tablespace Type	Description
System tablespace (`ibdata1`)	Data dictionary, doublewrite buffer, change buffer. Configured via `innodb_data_file_path`.
File-per-table (`table_name.ibd`)	Each InnoDB table has its own `.ibd` file. Default since MySQL 5.6.6 (`innodb_file_per_table=ON`).
General tablespace	Shared tablespace created with `CREATE TABLESPACE`. Can hold multiple tables.
Undo tablespace	Dedicated undo logs. Configured via `innodb_undo_tablespaces`.
Temporary tablespace	Temporary tables and intermediate sort results (`ibtmp1`).

Page and Extent Structure¶

Page size -- default 16 KiB (configurable: 4 KiB, 8 KiB, 16 KiB, 32 KiB, 64 KiB).
Extent -- 1 MiB contiguous block (64 pages of 16 KiB each).
Segment -- a collection of extents. Each index has two segments: one for non-leaf pages and one for leaf pages.

Key Configuration Parameters¶

Parameter	Default	Purpose
`innodb_buffer_pool_size`	128 MiB	Buffer pool size. Tune to 50-80% of RAM.
`innodb_flush_log_at_trx_commit`	1	Redo log durability. Set to 1 for ACID.
`sync_binlog`	1	Binlog fsync frequency. Set to 1 for maximum durability.
`innodb_io_capacity`	200	IOPS hint for background tasks (compaction, flush).
`innodb_io_capacity_max`	2000	Upper bound for adaptive flushing.
`gtid_mode`	OFF	Enable GTIDs for replication.
`binlog_format`	ROW	Binary log format. ROW recommended for Group Replication.

Sources¶

How It Works¶

InnoDB storage engine, buffer pool, redo/undo logs, Group Replication, and HeatWave accelerator.

InnoDB Architecture¶

flowchart TB
    subgraph Server["MySQL Server Layer"]
        Parser["SQL Parser"]
        Optimizer["Query Optimizer"]
        Executor["Executor"]
    end

    subgraph InnoDB["InnoDB Storage Engine"]
        BP["Buffer Pool\n(page cache)"]
        CL["Change Buffer\n(secondary index updates)"]
        AHI["Adaptive Hash Index"]
        Redo["Redo Log\n(WAL for crash recovery)"]
        Undo["Undo Log\n(MVCC rollback)"]
        TS["Tablespaces\n(.ibd files)"]
    end

    Server --> InnoDB
    BP --> TS
    Redo --> TS

    style BP fill:#1565c0,color:#fff
    style Redo fill:#e65100,color:#fff

Group Replication¶

flowchart LR
    subgraph GR["Group Replication (Paxos)"]
        P["Primary\n(read-write)"]
        S1["Secondary 1\n(read-only)"]
        S2["Secondary 2\n(read-only)"]
    end

    P <-->|"Paxos consensus"| S1
    P <-->|"Paxos consensus"| S2
    S1 <-->|"Paxos consensus"| S2

    Client_M["Client"] --> P
    Client_M -.->|"read replicas"| S1
    Client_M -.->|"read replicas"| S2

    style P fill:#e65100,color:#fff

InnoDB Buffer Pool¶

The buffer pool is InnoDB's primary memory structure -- it caches table and index data to avoid disk reads:

Default size: 128 MiB (production typically sets to 50-80% of available RAM)
Page size: 16 KiB by default (configurable: 4K, 8K, 16K, 32K, 64K)
LRU eviction: Uses a modified LRU algorithm with a midpoint insertion strategy. New pages are inserted at the "midpoint" (5/8 from the tail) rather than the head, preventing a full table scan from evicting useful data
Flush strategy: Dirty pages are flushed in the background by the page cleaner thread. The innodb_io_capacity setting controls flush rate (default: 200 IOPS)

Redo Log (WAL)¶

InnoDB uses a Write-Ahead Log for crash recovery:

Before modifying a data page, InnoDB writes the modification to the redo log
Redo log is sequential: All writes append to a circular buffer (ib_logfile0, ib_logfile1), avoiding random I/O
Group commit: Multiple transactions can share a single redo log write, reducing I/O operations
Checkpoint: InnoDB advances the checkpoint position as dirty pages are flushed to disk, reclaiming redo log space

The redo log uses the innodb_flush_log_at_trx_commit setting: | Value | Durability | Performance | |-------|-----------|-------------| | 1 (default) | ACID-safe: fsync per commit | Slower (1 I/O per commit) | | 2 | fsync to OS cache per commit, flushed to disk ~1/sec | Moderate | | 0 | Written to cache, flushed ~1/sec | Fastest (can lose ~1 sec of data) |

Undo Log and MVCC¶

MySQL implements Multi-Version Concurrency Control via undo logs:

When a row is updated, InnoDB writes the old version to the undo log before writing the new version
Read consistency: Each transaction sees a snapshot of data as of its start time by following the undo log chain to find the appropriate version
Purge thread: A background thread removes undo logs when no active transaction needs them. If the purge thread falls behind, the undo tablespace grows

Sources¶

Benchmarks¶

Scope

Performance characteristics, scaling limits, and resource consumption for MySQL.

sysbench Results¶

Hardware	Threads	Read QPS	Write QPS	Read/Write QPS
4 vCPU, 16Gi	16	25,000	5,000	15,000
8 vCPU, 32Gi	32	60,000	12,000	35,000
16 vCPU, 64Gi	64	120,000	25,000	70,000

InnoDB Buffer Pool Efficiency¶

Pool Size vs Data	Read Hit Ratio	Random I/O
Pool > Data	> 99.9%	Minimal
Pool = 50% Data	95-99%	Moderate
Pool = 10% Data	60-80%	Heavy

Group Replication¶

Metric	3 nodes	5 nodes	9 nodes
Certification latency	< 1ms	1-2ms	2-5ms
Write throughput	90% of standalone	80%	70%
Failover time	5-30s	5-30s	10-60s

Scaling Limits¶

Dimension	Limit	Notes
Table size	1TB comfortable	Partition beyond 100M rows
Concurrent connections	150,000 (max)	Use connection pooling
Queries/sec	100k+ point reads	Depends on working set vs RAM

Sourcing Status¶

Unsourced Performance Data

The performance numbers in this document are estimated from vendor documentation, community benchmarks, and engineering judgment. They do not represent controlled benchmarks with documented test conditions. Specific hardware configurations, software versions, and test methodologies were not recorded.

Use these figures as rough guidance only. For production capacity planning, run your own benchmarks against your specific workload and infrastructure.