Architecture¶

MinIO is a high-performance, S3-compatible object storage server designed for cloud-native and on-premises deployments. It uses erasure coding as its foundational resiliency mechanism and delivers multi-petabyte scale with deterministic performance on commodity hardware.


Deployment Architecture¶

graph TB
    subgraph Clients
        SDK1[S3 SDK<br/>Java/Python/Go/JS]
        SDK2[MinIO mc CLI]
        SDK3[S3-compatible app]
    end

    subgraph LB[Load Balancer<br/>NGINX / HAProxy]
    end

    subgraph Pool1["Server Pool 1 (4 nodes)"]
        N1A[Node 1<br/>minio server]
        N2A[Node 2<br/>minio server]
        N3A[Node 3<br/>minio server]
        N4A[Node 4<br/>minio server]

        subgraph ES1["Erasure Set 1 (16 drives)"]
            D1A[Drive 1]
            D2A[Drive 2]
            D15A[...Drive 16]
        end
    end

    subgraph Pool2["Server Pool 2 (expansion)"]
        N1B[Node 5]
        N2B[Node 6]
        N3B[Node 7]
        N4B[Node 8]
    end

    SDK1 --> LB
    SDK2 --> LB
    SDK3 --> LB
    LB --> N1A
    LB --> N2A
    LB --> N3A
    LB --> N4A
    LB --> N1B
    LB --> N2B
    LB --> N3B
    LB --> N4B

Server Pools¶

A production MinIO deployment consists of at least 4 homogeneous nodes (matching CPU, RAM, storage, network). MinIO aggregates all nodes in the initial deployment into a single server pool.

Key properties:

Locally-attached storage: MinIO performs best with direct-attached NVMe or SSD drives. Drives should be formatted as XFS, presented in JBOD configuration with no RAID, pooling, or hardware caching.
Any node can serve any request: Every MinIO server has a complete picture of the distributed topology. The receiving node handles internode routing transparently.
Pool expansion: New pools (groups of nodes) can be added to increase capacity. Each pool has its own independent erasure sets. MinIO queries each pool to locate the correct erasure set for a given object, which means each additional pool adds some internode coordination overhead.

Erasure Coding¶

How Erasure Coding Works¶

MinIO automatically groups all drives in a pool into erasure sets -- the foundational unit of availability and resiliency. Each erasure set consists of up to 16 drives striped symmetrically across nodes.

graph LR
    OBJ["Object<br/>(binary data)"]
    subgraph EC["Erasure Encoding EC:4"]
        SH1["Data shard 1"]
        SH2["Data shard 2"]
        SH3["Data shard 3"]
        SH4["Data shard 4"]
        SH5["Data shard 5"]
        SH6["Data shard 6"]
        SH7["Data shard 7"]
        SH8["Data shard 8"]
        SH9["Parity shard 1"]
        SH10["Parity shard 2"]
        SH11["Parity shard 3"]
        SH12["Parity shard 4"]
    end
    OBJ --> SH1
    OBJ --> SH2
    OBJ --> SH3
    OBJ --> SH4
    OBJ --> SH5
    OBJ --> SH6
    OBJ --> SH7
    OBJ --> SH8
    OBJ --> SH9
    OBJ --> SH10
    OBJ --> SH11
    OBJ --> SH12

    SH1 --> DRV1[(Drive 1)]
    SH2 --> DRV2[(Drive 2)]
    SH3 --> DRV3[(Drive 3)]
    SH4 --> DRV4[(Drive 4)]
    SH5 --> DRV5[(Drive 5)]
    SH6 --> DRV6[(Drive 6)]
    SH7 --> DRV7[(Drive 7)]
    SH8 --> DRV8[(Drive 8)]
    SH9 --> DRV9[(Drive 9)]
    SH10 --> DRV10[(Drive 10)]
    SH11 --> DRV11[(Drive 11)]
    SH12 --> DRV12[(Drive 12)]

MinIO partitions each object into data shards and parity shards based on the configured parity level (EC:N). With the maximum parity of EC:8, an object is split into 8 data and 8 parity blocks across the erasure set.

Parity Levels and Fault Tolerance¶

Parity Setting	Data Shards	Parity Shards	Storage Overhead	Drives Tolerated
EC:0	16	0	0% (replication only)	0
EC:2	14	2	~14%	2
EC:4	12	4	~33%	4
EC:8	8	8	100%	8
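
The figures in the table follow directly from the erasure set size and the parity count. A minimal sketch of that arithmetic, assuming a 16-drive erasure set (`parity_profile` is a hypothetical helper name, not a MinIO API):

```python
def parity_profile(set_size: int, parity: int) -> dict:
    """Derive shard counts and overhead for one erasure set."""
    if not 0 <= parity <= set_size // 2:
        raise ValueError("parity must be between 0 and half the set size")
    data = set_size - parity
    return {
        "data_shards": data,
        "parity_shards": parity,
        # Overhead is the extra raw capacity relative to user data.
        "overhead_pct": round(100 * parity / data, 1),
        # Each parity shard covers the loss of one drive.
        "drives_tolerated": parity,
    }


for p in (0, 2, 4, 8):
    print(f"EC:{p}", parity_profile(16, p))
```

Overhead here is measured against user data (parity divided by data shards), which is why EC:8 shows 100% while its storage efficiency is 50%.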

Read and Write Quorum¶

Read quorum: MinIO needs at least data_shards intact shards (data or parity) to serve an object. With EC:4 on a 16-drive erasure set, 12 of 16 drives must be available.
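Write quorum: MinIO needs at least data_shards drives available to accept a write. When parity is exactly half of the erasure set (for example, EC:8 on 16 drives), the write quorum rises to data_shards + 1, preventing split-brain writes to the same object.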
Bitrot protection: MinIO computes HighwayHash-256 checksums on every shard, detecting silent data corruption at the drive level.
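
The quorum arithmetic can be sketched as follows. This is an illustrative helper, not MinIO code: `quorums` is a hypothetical name, and the sketch assumes that write quorum equals the data shard count and increases by one only when parity is exactly half of the erasure set.

```python
def quorums(set_size: int, parity: int) -> tuple[int, int]:
    """Return (read_quorum, write_quorum) for one erasure set."""
    data = set_size - parity
    read_q = data
    # Assumed rule: one extra drive only when data and parity counts are equal,
    # so two disjoint halves of the set can never both accept a write.
    write_q = data + 1 if data == parity else data
    return read_q, write_q


print(quorums(16, 4))
print(quorums(16, 8))
```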

Object Healing¶

When drives fail or shards become corrupted, MinIO heals objects automatically:

Detects damaged or missing shards during read or scrub operations.
Uses remaining data and parity shards to reconstruct lost shards.
Writes healed shards to healthy drives (or replacement drives).
Healing is transparent to the client; the requesting node reconstructs the full object before returning it.
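
To illustrate how a lost shard can be rebuilt from the survivors, here is a deliberately simplified sketch. It uses a single XOR parity shard instead of the Reed-Solomon coding MinIO uses, so it can repair only one missing shard; the function names are hypothetical.

```python
from functools import reduce
from typing import Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def encode(data_shards: list[bytes]) -> bytes:
    """Compute a single XOR parity shard over equal-length data shards."""
    return reduce(xor_bytes, data_shards)


def heal(shards: list[Optional[bytes]], parity: bytes) -> list[Optional[bytes]]:
    """Rebuild at most one missing data shard from the survivors plus parity."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can repair only one lost shard")
    if missing:
        survivors = [s for s in shards if s is not None]
        # parity XOR all surviving shards yields the missing shard.
        shards[missing[0]] = reduce(xor_bytes, survivors, parity)
    return shards


data = [b"abcd", b"efgh", b"ijkl"]
parity = encode(data)
print(heal([b"abcd", None, b"ijkl"], parity))
```

Reed-Solomon generalizes the same idea: with M parity shards, any M lost shards can be recomputed from the remaining ones.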

Erasure Set Selection¶

MinIO uses a deterministic hashing algorithm based on the object name and path (BUCKET/PREFIX/.../OBJECT) to select the erasure set. For any given object namespace, MinIO always selects the same erasure set, ensuring consistency. No single drive contains only data or only parity for all objects; shards are randomized across drives for even load distribution.
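
A minimal sketch of deterministic placement, assuming a hash-modulo scheme. SHA-256 stands in for whatever hash MinIO actually uses, and `select_erasure_set` is a hypothetical name; the point is only that the same path always maps to the same set.

```python
import hashlib


def select_erasure_set(bucket: str, key: str, set_count: int) -> int:
    """Map an object path to an erasure set index, the same way every time."""
    digest = hashlib.sha256(f"{bucket}/{key}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer and reduce it to a set index.
    return int.from_bytes(digest[:8], "big") % set_count


first = select_erasure_set("photos", "2024/01/cat.jpg", set_count=4)
second = select_erasure_set("photos", "2024/01/cat.jpg", set_count=4)
print(first, second)  # both values are identical
```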

Identity and Access Management (IAM)¶

MinIO implements a full IAM subsystem:

Root credentials: Set via environment variables at startup. Equivalent to AWS root account.
Users: Created via mc admin user add. Authenticated by access key + secret key.
Groups: Logical groupings of users for policy attachment.
Policies: JSON policy documents (AWS IAM policy format) attached to users or groups. Support policy variables like ${aws:username} and ${jwt:preferred_username} for OIDC-integrated policies.
Built-in policies: readwrite, readonly, writeonly, diagnostics, consoleAdmin, etc.
OIDC / LDAP integration: External identity providers can be configured for federated authentication.
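
As an illustration of policy variables, the sketch below builds a policy document that confines each user to a prefix named after their own username. The bucket name `home` and the helper `home_prefix_policy` are assumptions made for the example; the resulting JSON would be attached with the usual policy tooling.

```python
import json


def home_prefix_policy(bucket: str) -> str:
    """Build an IAM policy limiting each user to <bucket>/<username>/."""
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": ["${aws:username}/*"]}},
            },
            {
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                # ${aws:username} is left literal; the server substitutes it per request.
                "Resource": [f"arn:aws:s3:::{bucket}/${{aws:username}}/*"],
            },
        ],
    }
    return json.dumps(policy, indent=2)


print(home_prefix_policy("home"))
```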

Security Token Service (STS)¶

MinIO supports STS for issuing temporary credentials:

Web Identity (OIDC): Exchange an OIDC token for temporary MinIO credentials.
Client Grants: Exchange a client credentials grant for temporary access.
AssumeRole: Similar to AWS STS AssumeRole, allowing a user to assume a specific policy for a duration.
LDAP STS: Bind LDAP credentials to temporary S3 access.

Temporary credentials include an access key, secret key, and session token with a configurable expiration.
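
A sketch of the Web Identity exchange, showing only how the request is assembled. The endpoint `https://minio.example.net` and the token value are placeholders, and `web_identity_request` is a hypothetical helper; the form body would be sent as an HTTP POST, and the temporary credentials come back in the response.

```python
from urllib.parse import urlencode


def web_identity_request(endpoint: str, oidc_token: str, duration_s: int = 3600) -> tuple[str, str]:
    """Return (url, form_body) for an AssumeRoleWithWebIdentity call."""
    body = urlencode({
        "Action": "AssumeRoleWithWebIdentity",
        "Version": "2011-06-15",
        "WebIdentityToken": oidc_token,
        "DurationSeconds": str(duration_s),
    })
    return endpoint, body


url, body = web_identity_request("https://minio.example.net", "abc.def", duration_s=900)
print(url)
print(body)
```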

Bucket Notifications¶

MinIO supports event notifications on bucket operations:

Event types: s3:ObjectCreated:*, s3:ObjectRemoved:*, s3:ObjectAccessed:*
Targets: AMQP, Elasticsearch, Kafka, MQTT, MySQL, NATS, PostgreSQL, Redis, Webhooks.
Configuration via mc event add or the S3-compatible notification API.
Notifications are not replicated across sites in a site replication configuration.
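
The wildcard event types behave like simple glob patterns. A minimal sketch of that matching, using `fnmatch` from the standard library as a stand-in for the server's own matcher (`matches` is a hypothetical name):

```python
from fnmatch import fnmatchcase


def matches(event_name: str, subscribed: list[str]) -> bool:
    """Return True if the event name matches any subscribed pattern."""
    return any(fnmatchcase(event_name, pattern) for pattern in subscribed)


rules = ["s3:ObjectCreated:*", "s3:ObjectRemoved:*"]
print(matches("s3:ObjectCreated:Put", rules))
print(matches("s3:ObjectAccessed:Get", rules))
```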

Information Lifecycle Management (ILM)¶

ILM rules define automated object tiering and expiration:

Transition rules: Move objects between storage tiers (for example, from NVMe to HDD-based MinIO or to a remote S3 tier) after a specified number of days.
Expiration rules: Delete objects or incomplete multipart uploads after a specified period.
Noncurrent version expiration: Manage lifecycle of noncurrent object versions in versioned buckets.
ILM configurations are not replicated across sites in a site replication configuration.
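
How transition and expiration rules interact can be sketched with a small evaluator. This is a simplified model under stated assumptions: rules match by key prefix only, age is counted in whole days, and expiration takes precedence over transition. `Rule` and `action_for` are hypothetical names, not part of any MinIO API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    prefix: str
    transition_days: Optional[int] = None
    expiration_days: Optional[int] = None


def action_for(key: str, age_days: int, rules: list[Rule]) -> str:
    """Return 'expire', 'transition', or 'none' for an object of the given age."""
    verdict = "none"
    for rule in rules:
        if not key.startswith(rule.prefix):
            continue
        if rule.expiration_days is not None and age_days >= rule.expiration_days:
            return "expire"  # expiration wins over transition
        if rule.transition_days is not None and age_days >= rule.transition_days:
            verdict = "transition"
    return verdict


rules = [Rule(prefix="logs/", transition_days=30, expiration_days=365)]
for age in (10, 45, 400):
    print(age, action_for("logs/app.log", age, rules))
```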

Site Replication¶

MinIO supports multi-site replication for BC/DR and geo-distributed access:

graph TB
    subgraph SiteA["Site A (Primary)"]
        MA[MinIO Server]
    end
    subgraph SiteB["Site B (Peer)"]
        MB[MinIO Server]
    end
    subgraph SiteC["Site C (Peer)"]
        MC[MinIO Server]
    end

    GLB[Global Load Balancer<br/>Geo-local / failover]

    GLB --> MA
    GLB --> MB
    GLB --> MC

    MA <-->|Bidirectional replication| MB
    MB <-->|Bidirectional replication| MC
    MA <-->|Bidirectional replication| MC

Bidirectional: All sites are peers; writes to any site replicate to all others.
Replicated objects: Buckets, objects, and IAM configuration replicate automatically.
Not replicated: Bucket notifications, ILM configurations, site-level settings.
Setup: mc admin replicate add site1 site2 site3
Latency consideration: Replication lag depends on inter-site network latency. A 100 ms round trip means at least 100 ms before an object is available on all peers.
Queued replication: Transient failures are handled by queuing objects for retry.
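
The queued-retry behavior can be pictured with a short sketch. It is a conceptual model only: `replicate_all`, the `send` callback, and the attempt limit are assumptions for illustration and do not describe MinIO's internal replication queue.

```python
from collections import deque


def replicate_all(objects, send, max_attempts=3):
    """Push each object to a peer, re-queuing transient failures."""
    queue = deque((obj, 1) for obj in objects)
    failed = []
    while queue:
        obj, attempt = queue.popleft()
        try:
            send(obj)
        except ConnectionError:
            if attempt < max_attempts:
                # Put the object back at the end of the queue for a later retry.
                queue.append((obj, attempt + 1))
            else:
                failed.append(obj)
    return failed


attempts = {}


def flaky_send(obj):
    """Fail the first attempt for object 'b', then succeed."""
    attempts[obj] = attempts.get(obj, 0) + 1
    if obj == "b" and attempts[obj] == 1:
        raise ConnectionError("transient network failure")


print(replicate_all(["a", "b", "c"], flaky_send))
print(attempts)
```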

Key Architectural Properties¶

S3 API strict compatibility: Requires AWS Signature V4 (or V2). All operations are signed, so a proxy or load balancer that modifies signed headers in transit causes signature validation to fail.
Erasure coding by default: No separate replication layer; erasure coding provides both resiliency and storage efficiency.
Deterministic placement: Object-to-erasure-set mapping is hash-based and consistent.
No RAID, no caching: MinIO expects direct access to raw XFS drives. Hardware RAID or drive-level caching introduces unpredictable performance.
Any-to-any routing: Any node can handle any request and internally routes to the correct erasure set.
Pool-based horizontal scaling: Add capacity by deploying additional pools; existing data stays in place.

How It Works¶

Erasure coding internals, distributed object placement, bitrot protection, and S3-native architecture.

Architecture Overview¶

flowchart TB
    subgraph MinIO_C["MinIO Cluster"]
        subgraph ES["Erasure Set (4+4 example)"]
            D1["Drive 1<br/>(data)"]
            D2["Drive 2<br/>(data)"]
            D3["Drive 3<br/>(data)"]
            D4["Drive 4<br/>(data)"]
            P1["Drive 5<br/>(parity)"]
            P2["Drive 6<br/>(parity)"]
            P3["Drive 7<br/>(parity)"]
            P4["Drive 8<br/>(parity)"]
        end
    end

    Client_M["S3 Client"] -->|"PUT object"| MinIO_C

    Tolerance["With 4 parity drives,<br/>survives loss of any 4 drives"]
    ES -.- Tolerance

    style ES fill:#c62828,color:#fff

Erasure Set Organization¶

MinIO groups drives into erasure sets -- fixed-size groups of drives (default: 16 drives per set, configurable from 4 to 16). A cluster with 64 drives has 4 erasure sets of 16 drives each.

Each object is placed on exactly one erasure set using a deterministic hash of the object name
Within the erasure set, the object is split into data and parity shards using Reed-Solomon coding
The ratio of data to parity shards is set through MinIO storage classes (for example, the MINIO_STORAGE_CLASS_STANDARD environment variable) and can be selected per object with the x-amz-storage-class request header, rather than per bucket
Default parity: EC:4 for production deployments. The maximum parity is N/2, where N is the erasure set size

Erasure Coding Math¶

For a 16-drive erasure set with 8 data + 8 parity:

A 16MB object is split into 8 x 2MB data shards
8 x 2MB parity shards are computed from the data shards
All 16 shards (32MB total) are written to 16 drives
Storage efficiency: 50% (32MB stored for 16MB of user data)
Durability: tolerates loss of any 8 drives simultaneously
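
The same arithmetic as a short worked example. It treats 1MB as 1 MiB and ignores per-shard metadata and checksum overhead, both of which are simplifying assumptions; `stripe_layout` is a hypothetical helper.

```python
MIB = 1024 * 1024


def stripe_layout(object_bytes: int, data: int, parity: int) -> dict:
    """Compute shard size, raw bytes stored, and storage efficiency."""
    shard = -(-object_bytes // data)  # ceiling division
    stored = shard * (data + parity)
    return {
        "shard_bytes": shard,
        "stored_bytes": stored,
        "efficiency": object_bytes / stored,
    }


layout = stripe_layout(16 * MIB, data=8, parity=8)
print(layout["shard_bytes"] // MIB, "MiB per shard")
print(layout["stored_bytes"] // MIB, "MiB stored in total")
print(layout["efficiency"])
```

For the 8 + 8 case this gives 2 MiB shards, 32 MiB of raw storage, and an efficiency of 0.5.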

Write Path¶

sequenceDiagram
    participant Client_MW as S3 Client
    participant MinIO_H as MinIO Node
    participant EC as Erasure Coder
    participant Drives as Local Drives (NVMe/SSD)

    Client_MW->>MinIO_H: PUT /bucket/object (multipart)
    MinIO_H->>EC: Split into data + parity shards
    EC->>Drives: Write shards across drives
    Drives-->>MinIO_H: All shards written
    MinIO_H-->>Client_MW: 200 OK (ETag)

Write Internals¶

Hash to erasure set: Object name + bucket is hashed to select the target erasure set
Bitrot hashing: A HighwayHash-256 checksum is computed for each shard. HighwayHash is designed for SIMD execution and is considerably faster than MD5 or SHA-256 on modern CPUs, which keeps checksumming off the critical path
Parallel shard write: All data and parity shards are written to their respective drives simultaneously
Quorum check: Write succeeds once the write quorum of shards is confirmed. The quorum equals the number of data shards, or data shards + 1 when parity is exactly half of the erasure set
Metadata: Object metadata (size, ETag, content-type, custom headers) is stored alongside the data shards in xl.meta format
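
A compact sketch of the checksum, write, and quorum steps above. It is a model under stated assumptions: drives are plain dictionaries (with `None` standing for an offline drive), BLAKE2b from the standard library stands in for HighwayHash, writes are sequential rather than parallel, and `write_shards` is a hypothetical name.

```python
import hashlib
from typing import Optional


def write_shards(shards: list[bytes], drives: list[Optional[dict]], quorum: int) -> bool:
    """Store shard i on drive i with its checksum; succeed only at quorum."""
    confirmed = 0
    for index, (shard, drive) in enumerate(zip(shards, drives)):
        if drive is None:  # offline drive, shard cannot be written
            continue
        # Stand-in checksum; MinIO uses HighwayHash-256 here.
        checksum = hashlib.blake2b(shard, digest_size=32).hexdigest()
        drive[index] = {"data": shard, "checksum": checksum}
        confirmed += 1
    return confirmed >= quorum


shards = [b"aa", b"bb", b"cc", b"dd"]
drives = [{}, {}, None, {}]
print(write_shards(shards, drives, quorum=3))
```

A real implementation would also clean up partially written shards when quorum is not reached; that step is omitted here.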

Bitrot Protection¶

MinIO uses HighwayHash for bitrot detection at the shard level. Each shard gets a checksum stored in its metadata:

Traditional bitrot (silent data corruption from disk firmware, cosmic rays, etc.) is detected on every read
If a corrupted shard is detected during read, MinIO reconstructs it from the remaining healthy shards
The reconstruction happens transparently -- the S3 client sees no error
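
Read-time verification amounts to recomputing a shard's checksum and comparing it with the stored value. A minimal sketch, again using BLAKE2b as a stand-in because HighwayHash is not in the Python standard library (`checksum` and `is_intact` are hypothetical names):

```python
import hashlib


def checksum(shard: bytes) -> str:
    """Compute the stand-in bitrot checksum for a shard."""
    return hashlib.blake2b(shard, digest_size=32).hexdigest()


def is_intact(shard: bytes, stored_checksum: str) -> bool:
    """Return True if the shard still matches the checksum recorded at write time."""
    return checksum(shard) == stored_checksum


original = b"shard payload"
recorded = checksum(original)

# Flip a single bit to simulate silent corruption on disk.
corrupted = bytes([original[0] ^ 0x01]) + original[1:]

print(is_intact(original, recorded))
print(is_intact(corrupted, recorded))
```

A shard that fails this check is treated like a missing shard and rebuilt from the others.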

Read Path¶

Client sends GET /bucket/object
MinIO hashes the object name to find the erasure set
Reads data shards from the fastest drives (measured by recent latency)
If any data shard is corrupted or missing, reads enough parity shards to take its place and reconstructs the missing data
Returns the reassembled object to the client
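
The shard selection step can be sketched as picking the lowest-latency healthy drives until the read quorum is met. The latency map, the `healthy` set, and `pick_shards` are illustrative assumptions rather than MinIO internals.

```python
def pick_shards(latency_ms: dict[int, float], healthy: set[int], data_shards: int) -> list[int]:
    """Choose the data_shards healthy drives with the lowest recent latency."""
    candidates = sorted((d for d in latency_ms if d in healthy), key=latency_ms.get)
    if len(candidates) < data_shards:
        raise RuntimeError("read quorum not met")
    return candidates[:data_shards]


latency = {0: 5.0, 1: 1.0, 2: 3.0, 3: 9.0}
print(pick_shards(latency, healthy={0, 1, 3}, data_shards=2))
```

Drive 2 is skipped because it is not healthy, so the two fastest remaining drives are chosen.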

⚠️ OSS Archive Notice¶

The MinIO open-source repository was archived February 13, 2026. MinIO Inc. now develops the commercial AIStor product. For open-source S3 storage, consider Ceph RGW, SeaweedFS, or Garage.

Benchmarks¶

Scope

Performance characteristics, scaling limits, and resource consumption for MinIO.

Object Storage Performance¶

Configuration	PUT (obj/s)	GET (obj/s)	Throughput
4 nodes, HDD	500-1,000	1,000-2,000	1-2 GB/s
4 nodes, SSD	2,000-5,000	5,000-10,000	5-10 GB/s
16 nodes, NVMe	10,000-30,000	30,000-80,000	30-80 GB/s

Erasure Coding Overhead¶

Parity	Storage Efficiency	Write Penalty	Failure Tolerance
EC:2	87.5%	+15%	2 drives
EC:4 (default)	75%	+30%	4 drives
EC:8	50%	+60%	8 drives

Scaling Limits¶

Dimension	Limit	Notes
Objects per bucket	Billions	No practical limit
Object size	5TB (single PUT)	Multipart for larger
Buckets per server	1,000+	Metadata overhead
Server pools	32	Horizontal expansion
Total capacity	Exabytes	Linear scaling

Resource Requirements¶

Nodes	CPU/Node	Memory/Node	Network
4 (minimum)	4 vCPU	8Gi	10Gbps
8 (production)	8 vCPU	16Gi	25Gbps
16 (large)	16 vCPU	32Gi	25-100Gbps

Sourcing Status¶

Unsourced Performance Data

The performance numbers in this document are estimated from vendor documentation, community benchmarks, and engineering judgment. They do not represent controlled benchmarks with documented test conditions. Specific hardware configurations, software versions, and test methodologies were not recorded.

Use these figures as rough guidance only. For production capacity planning, run your own benchmarks against your specific workload and infrastructure.