Architecture¶
MySQL is a relational database management system built around a pluggable storage engine architecture. The server handles connection management, query parsing, optimisation, and execution, while the storage engine layer manages how data is physically stored and retrieved. InnoDB has been the default storage engine since MySQL 5.5 and is the only engine that supports ACID transactions, foreign keys, and crash recovery.
See also: index, architecture, operations, security
Server Component Architecture¶
MySQL's server layer is shared across all storage engines. It handles connection establishment, authentication, SQL parsing, optimisation, and execution plan dispatch.
graph TB
CLIENT["Client Connection<br/>(mysql / connector)"]
CONN["Connection Handler<br/>(Thread-per-connection or ThreadPool)"]
PARSE["SQL Parser<br/>(Lexer + Yacc/Bison)"]
PREP["Preprocessor<br/>(Resolve tables, columns, views)"]
OPT["Query Optimizer<br/>(Cost-based)"]
EXEC["Query Executor<br/>(Iterator executor)"]
CACHE["Query Cache<br/>(Removed in 8.0)"]
API["Handler API<br/>(Storage Engine Interface)"]
INNODB["InnoDB Storage Engine"]
MYISAM["MyISAM Engine"]
OTHER["Other Engines<br/>(Memory, Archive, NDB)"]
CLIENT --> CONN
CONN --> PARSE
PARSE --> PREP
PREP --> OPT
OPT --> EXEC
EXEC --> API
API --> INNODB
API --> MYISAM
API --> OTHER
style INNODB fill:#e8f5e9,stroke:#4caf50,color:#000
style OPT fill:#fff3e0,stroke:#ff9800,color:#000
Connection Management¶
Each client connection gets its own thread (or is serviced by a thread pool when using the thread pool plugin). The connection thread handles authentication, receives SQL statements, and returns results. MySQL supports both TCP/IP and Unix socket connections.
Query Optimizer¶
MySQL uses a cost-based optimizer that estimates the cost of alternative execution plans by considering table statistics, index cardinality, and available access paths. Key optimizer features include:
- Range optimisation -- scans only the relevant index range.
- Index merge -- uses multiple indexes for a single table access.
- Hash joins -- available since MySQL 8.0.20 for equi-join conditions, replacing the older nested-loop-only approach.
- Subquery materialisation -- materialises subquery results into a temporary table.
- Window functions -- natively supported since 8.0.
InnoDB Storage Engine¶
InnoDB is the transactional storage engine that provides ACID compliance, row-level locking, and crash recovery. It is the default and recommended engine for most workloads.
graph TB
subgraph "InnoDB Memory Structures"
BP["Buffer Pool<br/>(data + index pages)"]
CB["Change Buffer<br/>(secondary index modifications)"]
AHI["Adaptive Hash Index"]
LB["Log Buffer<br/>(redo log entries)"]
end
subgraph "InnoDB Disk Structures"
TS["System Tablespace<br/>(ibdata1)"]
FPT["File-Per-Table<br/>Tablespaces (.ibd)"]
RL["Redo Log Files<br/>(ib_logfile0/1)"]
UL["Undo Tablespaces<br/>(mysql.ibd or undo_001)"]
DW["Doublewrite Buffer"]
end
BP --- TS
BP --- FPT
LB --> RL
BP --> DW
CB --> TS
style BP fill:#e3f2fd,stroke:#2196f3,color:#000
style LB fill:#fce4ec,stroke:#e91e63,color:#000
style RL fill:#fce4ec,stroke:#e91e63,color:#000
Buffer Pool¶
The buffer pool is the most critical memory structure in InnoDB. It caches data pages (16 KiB each), index pages, and undo log pages in memory to reduce disk I/O.
| Parameter | Purpose |
|---|---|
innodb_buffer_pool_size |
Total buffer pool memory. Recommended 50-80% of system RAM on dedicated servers. |
innodb_buffer_pool_instances |
Number of partitioned instances (default 1 or 8 depending on size). Reduces latch contention on multi-core systems. |
innodb_old_blocks_time |
Milliseconds a page stays in the old sublist before being promoted. Protects against full table scans evicting hot data. |
The buffer pool uses a modified LRU algorithm split into a young sublist (frequently accessed) and an old sublist (recently read). Pages promoted from old to young survive longer.
Change Buffer¶
The change buffer caches modifications to secondary index pages when those pages are not currently in the buffer pool. Changes are merged later when the affected pages are read into memory during a SELECT.
- Applies to INSERT, UPDATE, and DELETE operations on non-unique secondary indexes.
- Reduces random disk I/O by amortising secondary index updates.
- Monitored via
INFORMATION_SCHEMA.INNODB_CMPandINNODB_CMP_RESET.
Deprecation Notice
The change buffer is deprecated as of MySQL 8.0 and is removed in MySQL 8.4+. For new deployments on fast storage (SSD/NVMe), the benefit is marginal.
Adaptive Hash Index (AHI)¶
InnoDB monitors index lookups and, if it detects that certain index pages are frequently accessed with equality predicates, automatically builds an in-memory hash index on top of the B-tree. This optimises point queries at the cost of additional overhead for DML operations.
- Controlled by
innodb_adaptive_hash_index(default ON). - Can cause contention under high concurrency with many writes.
- Removed in MySQL 9.0.
Doublewrite Buffer¶
The doublewrite buffer protects against torn page writes (partial 16 KiB page writes caused by a crash mid-I/O). Before writing a page to its final position on disk, InnoDB first writes it to a contiguous area in the system tablespace called the doublewrite buffer. If a torn page is detected during crash recovery, the complete copy is retrieved from the doublewrite buffer.
- Since MySQL 8.0.20, doublewrite files can be stored in a separate directory via
innodb_doublewrite_dir. - Controlled by
innodb_doublewrite(default ON). Do not disable in production.
Redo Log and Undo Log¶
Redo Log¶
The redo log records all changes made to InnoDB data pages. It enables crash recovery by replaying committed transactions that have not yet been flushed to data files.
- Circular log -- consists of a fixed number of files (historically
ib_logfile0,ib_logfile1). Old entries are overwritten once they are no longer needed for recovery. - Log buffer -- in-memory buffer where redo entries are first written. Flushed to disk based on
innodb_flush_log_at_trx_commit:1(default, safest): flush to disk on every transaction commit.2: flush to OS cache on commit, fsynced once per second.0: flushed once per second regardless of commit.
- Capacity control -- since MySQL 8.0.30,
innodb_redo_log_capacityreplaces the olderinnodb_log_file_sizeandinnodb_log_files_in_groupparameters.
Undo Log¶
Undo logs (rollback segments) store the pre-modification state of rows. They serve two purposes:
- Rollback -- restoring data to its prior state on
ROLLBACK. - MVCC -- providing a consistent read view for transactions that started before the modifying transaction committed.
Undo logs are stored in dedicated undo tablespaces. When no active transaction needs a given undo record, the purge thread reclaims the space.
Binary Log (Binlog)¶
The binary log is a server-level log (separate from the InnoDB redo log) that records all data modification events in a format suitable for replication and point-in-time recovery.
| Format | Description |
|---|---|
| STATEMENT | Logs the SQL statement. Compact but may produce different results on replicas with different data distributions. |
| ROW (default) | Logs the actual row changes. Larger but deterministic. |
| MIXED | Uses statement format by default, switches to row for non-deterministic statements. |
Binlog events include: Query, Gtid, Table_map, Write_rows, Update_rows, Delete_rows, Xid (commit marker).
Replication Architecture¶
MySQL supports multiple replication modes, each with different consistency and availability trade-offs.
graph TB
subgraph "Source (Primary)"
SRC_SQL["SQL Thread"]
BINLOG["Binary Log"]
DUMP["Binlog Dump Thread"]
end
subgraph "Replica (Secondary)"
IO["IO Thread<br/>(receives binlog events)"]
RELAY["Relay Log"]
SQL_R["SQL Thread<br/>(applies events)"]
REPL_DATA["Data Files"]
end
SRC_SQL --> BINLOG
BINLOG --> DUMP
DUMP -->|"TCP/IP"| IO
IO --> RELAY
RELAY --> SQL_R
SQL_R --> REPL_DATA
style BINLOG fill:#fff3e0,stroke:#ff9800,color:#000
style RELAY fill:#e8f5e9,stroke:#4caf50,color:#000
GTID (Global Transaction Identifiers)¶
GTIDs uniquely identify every transaction committed on the source. Each GTID has the form source_id:transaction_sequence, for example 3E11FA47-71CA-11E1-9E33-C80AA9429562:1-5. GTIDs simplify:
- Change promotion -- promoting a replica to source without manually computing binlog positions.
- Failover -- orchestration tools can determine which replica has the most complete transaction history.
- Idempotent replay -- a transaction with an already-applied GTID is skipped.
Enable with: gtid_mode=ON, enforce_gtid_consistency=ON.
Replication Topologies¶
| Topology | Consistency | Latency Impact | Notes |
|---|---|---|---|
| Asynchronous (default) | Eventual | None on source | Source does not wait for replica acknowledgement. Replica may lag. |
| Semi-synchronous | Near-strong | Source waits for 1 ACK | Source waits for at least one replica to acknowledge receipt of the transaction. Falls back to async if timeout is reached. |
| Group Replication | Strong (single-primary) | Consensus overhead | Uses Paxos-based consensus. All members must agree on transaction ordering. Supports single-primary (automatic failover) or multi-primary modes. |
Group Replication Requirements
Group Replication requires: binlog_format=ROW, gtid_mode=ON, enforce_gtid_consistency=ON, log_slave_updates=ON, and transaction_write_set_extraction=XXHASH64. Only InnoDB tables are supported.
Group Replication Flow¶
sequenceDiagram
participant C as Client
participant P as Primary (Source)
participant G as Group Communication<br/>(Paxos Consensus)
participant R1 as Replica 1
participant R2 as Replica 2
C->>P: BEGIN; INSERT ...; COMMIT
P->>P: Write to binlog (GTID assigned)
P->>G: Broadcast transaction write set
G->>R1: Deliver write set
G->>R2: Deliver write set
R1-->>G: Acknowledge (certify + apply)
R2-->>G: Acknowledge (certify + apply)
G-->>P: Majority reached
P->>P: Commit transaction
P-->>C: COMMIT OK
InnoDB Data Storage¶
Tablespaces¶
| Tablespace Type | Description |
|---|---|
System tablespace (ibdata1) |
Data dictionary, doublewrite buffer, change buffer. Configured via innodb_data_file_path. |
File-per-table (table_name.ibd) |
Each InnoDB table has its own .ibd file. Default since MySQL 5.6.6 (innodb_file_per_table=ON). |
| General tablespace | Shared tablespace created with CREATE TABLESPACE. Can hold multiple tables. |
| Undo tablespace | Dedicated undo logs. Configured via innodb_undo_tablespaces. |
| Temporary tablespace | Temporary tables and intermediate sort results (ibtmp1). |
Page and Extent Structure¶
- Page size -- default 16 KiB (configurable: 4 KiB, 8 KiB, 16 KiB, 32 KiB, 64 KiB).
- Extent -- 1 MiB contiguous block (64 pages of 16 KiB each).
- Segment -- a collection of extents. Each index has two segments: one for non-leaf pages and one for leaf pages.
Key Configuration Parameters¶
| Parameter | Default | Purpose |
|---|---|---|
innodb_buffer_pool_size |
128 MiB | Buffer pool size. Tune to 50-80% of RAM. |
innodb_flush_log_at_trx_commit |
1 | Redo log durability. Set to 1 for ACID. |
sync_binlog |
1 | Binlog fsync frequency. Set to 1 for maximum durability. |
innodb_io_capacity |
200 | IOPS hint for background tasks (compaction, flush). |
innodb_io_capacity_max |
2000 | Upper bound for adaptive flushing. |
gtid_mode |
OFF | Enable GTIDs for replication. |
binlog_format |
ROW | Binary log format. ROW recommended for Group Replication. |
Sources¶
- MySQL 8.0 Reference Manual -- InnoDB Architecture
- MySQL 8.0 Reference Manual -- Replication
- MySQL 8.0 Reference Manual -- Group Replication
- MySQL 8.0 Reference Manual -- Binary Log
How It Works¶
InnoDB storage engine, buffer pool, redo/undo logs, Group Replication, and HeatWave accelerator.
InnoDB Architecture¶
flowchart TB
subgraph Server["MySQL Server Layer"]
Parser["SQL Parser"]
Optimizer["Query Optimizer"]
Executor["Executor"]
end
subgraph InnoDB["InnoDB Storage Engine"]
BP["Buffer Pool\n(page cache)"]
CL["Change Buffer\n(secondary index updates)"]
AHI["Adaptive Hash Index"]
Redo["Redo Log\n(WAL for crash recovery)"]
Undo["Undo Log\n(MVCC rollback)"]
TS["Tablespaces\n(.ibd files)"]
end
Server --> InnoDB
BP --> TS
Redo --> TS
style BP fill:#1565c0,color:#fff
style Redo fill:#e65100,color:#fff
Group Replication¶
flowchart LR
subgraph GR["Group Replication (Paxos)"]
P["Primary\n(read-write)"]
S1["Secondary 1\n(read-only)"]
S2["Secondary 2\n(read-only)"]
end
P <-->|"Paxos consensus"| S1
P <-->|"Paxos consensus"| S2
S1 <-->|"Paxos consensus"| S2
Client_M["Client"] --> P
Client_M -.->|"read replicas"| S1
Client_M -.->|"read replicas"| S2
style P fill:#e65100,color:#fff
InnoDB Buffer Pool¶
The buffer pool is InnoDB's primary memory structure -- it caches table and index data to avoid disk reads:
- Default size: 128 MiB (production typically sets to 50-80% of available RAM)
- Page size: 16 KiB by default (configurable: 4K, 8K, 16K, 32K, 64K)
- LRU eviction: Uses a modified LRU algorithm with a midpoint insertion strategy. New pages are inserted at the "midpoint" (5/8 from the tail) rather than the head, preventing a full table scan from evicting useful data
- Flush strategy: Dirty pages are flushed in the background by the page cleaner thread. The
innodb_io_capacitysetting controls flush rate (default: 200 IOPS)
Redo Log (WAL)¶
InnoDB uses a Write-Ahead Log for crash recovery:
- Before modifying a data page, InnoDB writes the modification to the redo log
- Redo log is sequential: All writes append to a circular buffer (
ib_logfile0,ib_logfile1), avoiding random I/O - Group commit: Multiple transactions can share a single redo log write, reducing I/O operations
- Checkpoint: InnoDB advances the checkpoint position as dirty pages are flushed to disk, reclaiming redo log space
The redo log uses the innodb_flush_log_at_trx_commit setting:
| Value | Durability | Performance |
|-------|-----------|-------------|
| 1 (default) | ACID-safe: fsync per commit | Slower (1 I/O per commit) |
| 2 | fsync to OS cache per commit, flushed to disk ~1/sec | Moderate |
| 0 | Written to cache, flushed ~1/sec | Fastest (can lose ~1 sec of data) |
Undo Log and MVCC¶
MySQL implements Multi-Version Concurrency Control via undo logs:
- When a row is updated, InnoDB writes the old version to the undo log before writing the new version
- Read consistency: Each transaction sees a snapshot of data as of its start time by following the undo log chain to find the appropriate version
- Purge thread: A background thread removes undo logs when no active transaction needs them. If the purge thread falls behind, the undo tablespace grows
Sources¶
Benchmarks¶
Scope
Performance characteristics, scaling limits, and resource consumption for MySQL.
sysbench Results¶
| Hardware | Threads | Read QPS | Write QPS | Read/Write QPS |
|---|---|---|---|---|
| 4 vCPU, 16Gi | 16 | 25,000 | 5,000 | 15,000 |
| 8 vCPU, 32Gi | 32 | 60,000 | 12,000 | 35,000 |
| 16 vCPU, 64Gi | 64 | 120,000 | 25,000 | 70,000 |
InnoDB Buffer Pool Efficiency¶
| Pool Size vs Data | Read Hit Ratio | Random I/O |
|---|---|---|
| Pool > Data | > 99.9% | Minimal |
| Pool = 50% Data | 95-99% | Moderate |
| Pool = 10% Data | 60-80% | Heavy |
Group Replication¶
| Metric | 3 nodes | 5 nodes | 9 nodes |
|---|---|---|---|
| Certification latency | < 1ms | 1-2ms | 2-5ms |
| Write throughput | 90% of standalone | 80% | 70% |
| Failover time | 5-30s | 5-30s | 10-60s |
Scaling Limits¶
| Dimension | Limit | Notes |
|---|---|---|
| Table size | 1TB comfortable | Partition beyond 100M rows |
| Concurrent connections | 150,000 (max) | Use connection pooling |
| Queries/sec | 100k+ point reads | Depends on working set vs RAM |
Sourcing Status¶
Unsourced Performance Data
The performance numbers in this document are estimated from vendor documentation, community benchmarks, and engineering judgment. They do not represent controlled benchmarks with documented test conditions. Specific hardware configurations, software versions, and test methodologies were not recorded.
Use these figures as rough guidance only. For production capacity planning, run your own benchmarks against your specific workload and infrastructure.