Architecture

MinIO is a high-performance, S3-compatible object storage server designed for cloud-native and on-premises deployments. It uses erasure coding as its foundational resiliency mechanism and delivers multi-petabyte scale with deterministic performance on commodity hardware.

Deployment Architecture

graph TB
    subgraph Clients
        SDK1[S3 SDK<br/>Java/Python/Go/JS]
        SDK2[MinIO mc CLI]
        SDK3[S3-compatible app]
    end

    subgraph LB[Load Balancer<br/>NGINX / HAProxy]
    end

    subgraph Pool1["Server Pool 1 (4 nodes)"]
        N1A[Node 1<br/>minio server]
        N2A[Node 2<br/>minio server]
        N3A[Node 3<br/>minio server]
        N4A[Node 4<br/>minio server]

        subgraph ES1["Erasure Set 1 (16 drives)"]
            D1A[Drive 1]
            D2A[Drive 2]
            D15A[...Drive 16]
        end
    end

    subgraph Pool2["Server Pool 2 (expansion)"]
        N1B[Node 5]
        N2B[Node 6]
        N3B[Node 7]
        N4B[Node 8]
    end

    SDK1 --> LB
    SDK2 --> LB
    SDK3 --> LB
    LB --> N1A
    LB --> N2A
    LB --> N3A
    LB --> N4A
    LB --> N1B
    LB --> N2B
    LB --> N3B
    LB --> N4B

Server Pools

A production MinIO deployment consists of at least 4 homogeneous nodes (matching CPU, RAM, storage, network). MinIO aggregates all nodes in the initial deployment into a single server pool.

Key properties:

  • Locally-attached storage: MinIO performs best with direct-attached NVMe or SSD drives. Drives should be formatted as XFS, presented in JBOD configuration with no RAID, pooling, or hardware caching.
  • Any node can serve any request: Every MinIO server has a complete picture of the distributed topology. The receiving node handles internode routing transparently.
  • Pool expansion: New pools (groups of nodes) can be added to increase capacity. Each pool has its own independent erasure sets. MinIO queries each pool to locate the correct erasure set for a given object, which means each additional pool adds some internode coordination overhead.

Erasure Coding

How Erasure Coding Works

MinIO automatically groups all drives in a pool into erasure sets -- the foundational unit of availability and resiliency. Each erasure set consists of up to 16 drives striped symmetrically across nodes.

graph LR
    OBJ["Object<br/>(binary data)"]
    subgraph EC["Erasure Encoding EC:4"]
        SH1["Data shard 1"]
        SH2["Data shard 2"]
        SH3["Data shard 3"]
        SH4["Data shard 4"]
        SH5["Data shard 5"]
        SH6["Data shard 6"]
        SH7["Data shard 7"]
        SH8["Data shard 8"]
        SH9["Parity shard 1"]
        SH10["Parity shard 2"]
        SH11["Parity shard 3"]
        SH12["Parity shard 4"]
    end
    OBJ --> SH1
    OBJ --> SH2
    OBJ --> SH3
    OBJ --> SH4
    OBJ --> SH5
    OBJ --> SH6
    OBJ --> SH7
    OBJ --> SH8
    OBJ --> SH9
    OBJ --> SH10
    OBJ --> SH11
    OBJ --> SH12

    SH1 --> DRV1[(Drive 1)]
    SH2 --> DRV2[(Drive 2)]
    SH3 --> DRV3[(Drive 3)]
    SH4 --> DRV4[(Drive 4)]
    SH5 --> DRV5[(Drive 5)]
    SH6 --> DRV6[(Drive 6)]
    SH7 --> DRV7[(Drive 7)]
    SH8 --> DRV8[(Drive 8)]
    SH9 --> DRV9[(Drive 9)]
    SH10 --> DRV10[(Drive 10)]
    SH11 --> DRV11[(Drive 11)]
    SH12 --> DRV12[(Drive 12)]

MinIO partitions each object into data shards and parity shards based on the configured parity level (EC:N), where N is the number of parity shards and the remaining drives in the erasure set hold data shards. On a 16-drive erasure set at the maximum parity of EC:8, an object is split into 8 data and 8 parity shards.

Parity Levels and Fault Tolerance

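Values for a 16-drive erasure set:

| Parity Setting | Data Shards | Parity Shards | Storage Overhead | Drive Failures Tolerated |
|---|---|---|---|---|
| EC:0 | 16 | 0 | 0% (no parity, no redundancy) | 0 |
| EC:2 | 14 | 2 | ~14% | 2 |
| EC:4 | 12 | 4 | ~33% | 4 |
| EC:8 | 8 | 8 | 100% | 8 |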
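
The table rows follow directly from the set size and the parity count. A minimal sketch of that arithmetic, assuming a 16-drive erasure set; `ParityProfile` and `parity_profile` are illustrative names, not part of any MinIO API:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParityProfile:
    parity: int
    data_shards: int
    overhead_pct: float    # parity capacity relative to data capacity
    drives_tolerated: int  # drives that can fail while objects stay readable


def parity_profile(set_size: int, parity: int) -> ParityProfile:
    """Derive one row of the parity table for an erasure set."""
    if not 0 <= parity <= set_size // 2:
        raise ValueError("parity must be between 0 and half the erasure set size")
    data = set_size - parity
    return ParityProfile(parity, data, 100.0 * parity / data, parity)


if __name__ == "__main__":
    for p in (0, 2, 4, 8):
        row = parity_profile(16, p)
        print(f"EC:{p}: {row.data_shards} data shards, "
              f"{row.overhead_pct:.0f}% overhead, tolerates {row.drives_tolerated} drives")
```

Overhead here is parity capacity divided by data capacity, which is why EC:8 shows 100% while using half of the raw capacity for data.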

Read and Write Quorum

  • Read quorum: MinIO needs at least data_shards intact shards (data or parity) to serve an object. With EC:4 on a 16-drive erasure set, 12 of 16 drives must be available.
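  • Write quorum: MinIO needs at least data_shards drives available to accept a write. When parity is exactly half of the erasure set (for example, EC:8 on 16 drives), the requirement rises to data_shards + 1, preventing split-brain writes to the same object.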
  • Bitrot protection: MinIO computes HighwayHash-256 checksums on every shard, detecting silent data corruption at the drive level.
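
The quorum rules can be checked with a few lines of arithmetic. This is a sketch under one stated assumption: write quorum equals the data-shard count and rises by one only when parity is exactly half of the set, which is the split-brain case. The function names are hypothetical.

```python
def read_quorum(set_size: int, parity: int) -> int:
    """Shards (data or parity) needed to serve an object."""
    return set_size - parity


def write_quorum(set_size: int, parity: int) -> int:
    """Drives needed to accept a write (assumed rule, see lead-in)."""
    data = set_size - parity
    # With parity == data, two disjoint halves could each accept a write,
    # so one extra drive is required to break the tie.
    return data + 1 if parity * 2 == set_size else data


def can_read(set_size: int, parity: int, offline: int) -> bool:
    return set_size - offline >= read_quorum(set_size, parity)


def can_write(set_size: int, parity: int, offline: int) -> bool:
    return set_size - offline >= write_quorum(set_size, parity)


if __name__ == "__main__":
    # EC:4 on 16 drives with 4 drives offline: reads and writes still succeed.
    print(can_read(16, 4, 4), can_write(16, 4, 4))
    # EC:8 on 16 drives with 8 drives offline: readable, but not writable.
    print(can_read(16, 8, 8), can_write(16, 8, 8))
```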

Object Healing

When drives fail or shards become corrupted, MinIO heals objects automatically:

  1. Detects damaged or missing shards during read or scrub operations.
  2. Uses remaining data and parity shards to reconstruct lost shards.
  3. Writes healed shards to healthy drives (or replacement drives).
  4. Returns the full object to the client. Healing is transparent because the requesting node reconstructs the object before returning it.
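
To make the reconstruction step concrete, here is a deliberately simplified sketch that uses a single XOR parity shard instead of the Reed-Solomon coding MinIO uses. XOR parity can rebuild only one missing shard, whereas Reed-Solomon with N parity shards can rebuild up to N; the detect, rebuild, and write-back flow is the part being illustrated. All names are hypothetical.

```python
from functools import reduce
from typing import List, Optional


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(shards: List[bytes]) -> bytes:
    """Compute one parity shard as the XOR of all data shards."""
    return reduce(xor_bytes, shards)


def heal(shards: List[Optional[bytes]], parity: bytes) -> List[Optional[bytes]]:
    """Rebuild at most one missing shard (None) from the survivors and parity."""
    missing = [i for i, s in enumerate(shards) if s is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can rebuild at most one shard")
    healed = list(shards)
    if missing:
        survivors = [s for s in shards if s is not None]
        # parity XOR all surviving shards leaves exactly the missing shard.
        healed[missing[0]] = reduce(xor_bytes, survivors, parity)
    return healed


if __name__ == "__main__":
    data = [b"abcd", b"efgh", b"ijkl"]
    parity = make_parity(data)
    damaged = [data[0], None, data[2]]  # the drive holding shard 2 failed
    print(heal(damaged, parity) == data)
```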

Erasure Set Selection

MinIO uses a deterministic hashing algorithm based on the object name and path (BUCKET/PREFIX/.../OBJECT) to select the erasure set. For any given object namespace, MinIO always selects the same erasure set, ensuring consistency. No single drive contains only data or only parity for all objects; shards are randomized across drives for even load distribution.
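
A minimal sketch of deterministic placement, assuming a generic hash-then-modulo scheme. SHA-256 is used here only because it ships with Python; the hash function MinIO actually uses is not stated in this document, and `erasure_set_index` is a hypothetical name.

```python
import hashlib


def erasure_set_index(bucket: str, key: str, set_count: int) -> int:
    """Map BUCKET/PREFIX/.../OBJECT to one of set_count erasure sets."""
    if set_count < 1:
        raise ValueError("set_count must be positive")
    digest = hashlib.sha256(f"{bucket}/{key}".encode("utf-8")).digest()
    # Stand-in hash: the first 8 bytes are reduced modulo the set count.
    return int.from_bytes(digest[:8], "big") % set_count


if __name__ == "__main__":
    first = erasure_set_index("photos", "2024/06/cat.jpg", 4)
    second = erasure_set_index("photos", "2024/06/cat.jpg", 4)
    print(first == second)  # the same name always maps to the same set
```

Because the result depends only on the object name and the number of sets, any node can compute the target set without consulting a central index.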

Identity and Access Management (IAM)

MinIO implements a full IAM subsystem:

  • Root credentials: Set via the MINIO_ROOT_USER and MINIO_ROOT_PASSWORD environment variables at startup. Equivalent to the AWS root account.
  • Users: Created via mc admin user add. Authenticated by access key + secret key.
  • Groups: Logical groupings of users for policy attachment.
  • Policies: JSON policy documents (AWS IAM policy format) attached to users or groups. Support policy variables like ${aws:username} and ${jwt:preferred_username} for OIDC-integrated policies.
  • Built-in policies: readwrite, readonly, writeonly, diagnostics, consoleAdmin, etc.
  • OIDC / LDAP integration: External identity providers can be configured for federated authentication.
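
As an illustration of a policy that uses a policy variable, the sketch below builds an AWS IAM-format document that confines each user to a prefix named after their own username. The bucket name and the `home_prefix_policy` helper are assumptions for the example; the resulting JSON would be saved to a file and registered with the policy tooling of your choice.

```python
import json


def home_prefix_policy(bucket: str) -> dict:
    """Build a policy limiting each user to BUCKET/<username>/."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow listing, but only under the user's own prefix.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {"StringLike": {"s3:prefix": ["${aws:username}/*"]}},
            },
            {
                # Allow object reads and writes under the user's own prefix.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/${{aws:username}}/*"],
            },
        ],
    }


if __name__ == "__main__":
    print(json.dumps(home_prefix_policy("home"), indent=2))
```

The variable is left unexpanded in the document; the server substitutes it per request, so one policy attached to a group can serve every member.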

Security Token Service (STS)

MinIO supports STS for issuing temporary credentials:

  • Web Identity (OIDC): Exchange an OIDC token for temporary MinIO credentials.
  • Client Grants: Exchange a client credentials grant for temporary access.
  • AssumeRole: Similar to AWS STS AssumeRole, allowing a user to assume a specific policy for a duration.
  • LDAP STS: Bind LDAP credentials to temporary S3 access.

Temporary credentials include an access key, secret key, and session token with a configurable expiration.
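
A sketch of how a Web Identity exchange request can be assembled, assuming the endpoint accepts the same query parameters as the AWS STS `AssumeRoleWithWebIdentity` action. The endpoint URL and token are placeholders, and `web_identity_sts_url` is a hypothetical helper; the request is built but not sent.

```python
from urllib.parse import urlencode


def web_identity_sts_url(endpoint: str, oidc_token: str,
                         duration_seconds: int = 3600) -> str:
    """Build the URL for an AssumeRoleWithWebIdentity request (sent as POST)."""
    params = {
        "Action": "AssumeRoleWithWebIdentity",
        "Version": "2011-06-15",
        "WebIdentityToken": oidc_token,
        "DurationSeconds": str(duration_seconds),
    }
    return f"{endpoint.rstrip('/')}/?{urlencode(params)}"


if __name__ == "__main__":
    # Placeholder endpoint and token, for illustration only.
    print(web_identity_sts_url("https://minio.example.net", "eyJ.test", 900))
```

The response carries the access key, secret key, session token, and expiration described above; an S3 SDK can then be configured with those three credential values.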

Bucket Notifications

MinIO supports event notifications on bucket operations:

  • Event types: s3:ObjectCreated:*, s3:ObjectRemoved:*, s3:ObjectAccessed:*
  • Targets: AMQP, Elasticsearch, Kafka, MQTT, MySQL, NATS, PostgreSQL, Redis, Webhooks.
  • Configuration via mc event add or the S3-compatible notification API.
  • Notifications are not replicated across sites in a site replication configuration.
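
A webhook target receives a JSON payload per event. The sketch below extracts the event name, bucket, and object key, assuming the payload follows the S3 event record layout (`Records[].eventName`, `Records[].s3.bucket.name`, `Records[].s3.object.key`) with URL-encoded keys. The sample payload is invented for the example.

```python
from typing import List, Tuple
from urllib.parse import unquote_plus


def summarize_event(payload: dict) -> List[Tuple[str, str, str]]:
    """Return (event name, bucket, decoded object key) for each record."""
    summary = []
    for record in payload.get("Records", []):
        s3 = record["s3"]
        summary.append((
            record["eventName"],
            s3["bucket"]["name"],
            unquote_plus(s3["object"]["key"]),  # keys are assumed URL-encoded
        ))
    return summary


if __name__ == "__main__":
    sample = {
        "Records": [{
            "eventName": "s3:ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "reports"},
                "object": {"key": "2024/q1+summary.pdf"},
            },
        }]
    }
    print(summarize_event(sample))
```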

Information Lifecycle Management (ILM)

ILM rules define automated object tiering and expiration:

  • Transition rules: Move objects between storage tiers (for example, from NVMe to HDD-based MinIO or to a remote S3 tier) after a specified number of days.
  • Expiration rules: Delete objects or incomplete multipart uploads after a specified period.
  • Noncurrent version expiration: Manage lifecycle of noncurrent object versions in versioned buckets.
  • ILM configurations are not replicated across sites in a site replication configuration.
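
The rule types above map onto the S3 lifecycle configuration schema. The sketch below shows one such configuration as a Python dictionary, plus a simplified evaluator for current object versions. The tier name `COLDTIER`, the rule ID, and `action_for` are assumptions for illustration; the evaluator ignores noncurrent versions and multipart uploads.

```python
lifecycle = {
    "Rules": [
        {
            "ID": "tier-then-expire-logs",
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            # Hypothetical remote tier name configured by the administrator.
            "Transitions": [{"Days": 30, "StorageClass": "COLDTIER"}],
            "Expiration": {"Days": 365},
            "NoncurrentVersionExpiration": {"NoncurrentDays": 90},
            "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": 7},
        }
    ]
}


def action_for(key: str, age_days: int, config: dict) -> str:
    """Return the action the first matching enabled rule implies."""
    for rule in config["Rules"]:
        if rule["Status"] != "Enabled":
            continue
        if not key.startswith(rule.get("Filter", {}).get("Prefix", "")):
            continue
        expire_after = rule.get("Expiration", {}).get("Days")
        if expire_after is not None and age_days >= expire_after:
            return "expire"
        # Check the latest applicable transition first.
        transitions = sorted(rule.get("Transitions", []),
                             key=lambda t: t["Days"], reverse=True)
        for transition in transitions:
            if age_days >= transition["Days"]:
                return f"transition:{transition['StorageClass']}"
    return "none"


if __name__ == "__main__":
    for age in (10, 45, 400):
        print(age, action_for("logs/app.log", age, lifecycle))
```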

Site Replication

MinIO supports multi-site replication for BC/DR and geo-distributed access:

graph TB
    subgraph SiteA["Site A (Primary)"]
        MA[MinIO Server]
    end
    subgraph SiteB["Site B (Peer)"]
        MB[MinIO Server]
    end
    subgraph SiteC["Site C (Peer)"]
        MC[MinIO Server]
    end

    GLB[Global Load Balancer<br/>Geo-local / failover]

    GLB --> MA
    GLB --> MB
    GLB --> MC

    MA <-->|Bidirectional replication| MB
    MB <-->|Bidirectional replication| MC
    MA <-->|Bidirectional replication| MC

  • Bidirectional: All sites are peers; writes to any site replicate to all others.
  • Replicated objects: Buckets, objects, and IAM configuration replicate automatically.
  • Not replicated: Bucket notifications, ILM configurations, site-level settings.
  • Setup: mc admin replicate add site1 site2 site3
  • Latency consideration: Replication lag depends on inter-site network latency. A 100 ms round trip means at least 100 ms before an object is available on all peers.
  • Queued replication: Transient failures are handled by queuing objects for retry.

Key Architectural Properties

  • S3 API strict compatibility: Requires AWS Signature V4 (or V2). Every request is signed, so an intermediary such as a proxy or load balancer cannot modify signed headers without invalidating the signature.
  • Erasure coding by default: No separate replication layer; erasure coding provides both resiliency and storage efficiency.
  • Deterministic placement: Object-to-erasure-set mapping is hash-based and consistent.
  • No RAID, no caching: MinIO expects direct access to raw XFS drives. Hardware RAID or drive-level caching introduces unpredictable performance.
  • Any-to-any routing: Any node can handle any request and internally routes to the correct erasure set.
  • Pool-based horizontal scaling: Add capacity by deploying additional pools; existing data stays in place.

How It Works

Erasure coding internals, distributed object placement, bitrot protection, and S3-native architecture.

Architecture Overview

flowchart TB
    subgraph MinIO_C["MinIO Cluster"]
        subgraph ES["Erasure Set (4+4 example)"]
            D1["Drive 1\n(data)"]
            D2["Drive 2\n(data)"]
            D3["Drive 3\n(data)"]
            D4["Drive 4\n(data)"]
            P1["Drive 5\n(parity)"]
            P2["Drive 6\n(parity)"]
            P3["Drive 7\n(parity)"]
            P4["Drive 8\n(parity)"]
        end
    end

    Client_M["S3 Client"] -->|"PUT object"| MinIO_C

    Note_ES["With 4 parity shards,<br/>survives loss of any 4 drives"]
    ES -.- Note_ES

    style ES fill:#c62828,color:#fff

Erasure Set Organization

MinIO groups drives into erasure sets -- fixed-size groups of drives (default: 16 drives per set, configurable from 4 to 16). A cluster with 64 drives has 4 erasure sets of 16 drives each.

  • Each object is placed on exactly one erasure set using a deterministic hash of the object name
  • Within the erasure set, the object is split into data and parity shards using Reed-Solomon coding
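  • The number of parity shards is set through storage classes rather than a server flag: the deployment-wide default comes from the MINIO_STORAGE_CLASS_STANDARD environment variable (for example, EC:4), and a client can select a storage class per object with the x-amz-storage-class request header
  • Default parity: EC:4 for production deployments. The maximum parity is N/2, where N is the erasure set size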
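
The 64-drive example can be reproduced with a simplified sizing rule: pick the largest set size within the documented 4 to 16 range that divides the drive count evenly. This is a sketch only; the real selection also has to keep sets symmetric across nodes, which this hypothetical `erasure_set_layout` function ignores.

```python
from typing import Tuple


def erasure_set_layout(total_drives: int, min_set_size: int = 4,
                       max_set_size: int = 16) -> Tuple[int, int]:
    """Return (drives per set, number of sets) under the simplified rule."""
    for size in range(max_set_size, min_set_size - 1, -1):
        if total_drives % size == 0:
            return size, total_drives // size
    raise ValueError(f"{total_drives} drives cannot be split into equal sets")


if __name__ == "__main__":
    print(erasure_set_layout(64))  # 4 sets of 16 drives, as in the text
    print(erasure_set_layout(24))  # falls back to a smaller set size
```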

Erasure Coding Math

For a 16-drive erasure set with 8 data + 8 parity shards:

  • A 16 MB object is split into 8 x 2 MB data shards
  • 8 x 2 MB parity shards are computed from the data shards
  • All 16 shards (32 MB total) are written to 16 drives, one shard per drive
  • Storage efficiency: 50% (32 MB stored for 16 MB of user data)
  • Durability: the object stays readable after the loss of any 8 drives
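
The same arithmetic generalizes to any object size and shard split. A minimal sketch, which ignores metadata and any per-block padding; `shard_footprint` is an illustrative name:

```python
import math
from typing import Tuple

MIB = 1024 * 1024


def shard_footprint(object_bytes: int, data_shards: int,
                    parity_shards: int) -> Tuple[int, int, float]:
    """Return (bytes per shard, total bytes on disk, storage efficiency)."""
    shard = math.ceil(object_bytes / data_shards)
    total = shard * (data_shards + parity_shards)
    return shard, total, object_bytes / total


if __name__ == "__main__":
    shard, total, efficiency = shard_footprint(16 * MIB, 8, 8)
    print(shard // MIB, total // MIB, efficiency)
```

For the 8 + 8 case this yields 2 MiB shards, 32 MiB on disk, and 0.5 efficiency; switching to 12 + 4 on the same set raises efficiency to 0.75.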

Write Path

sequenceDiagram
    participant Client_MW as S3 Client
    participant MinIO_H as MinIO Node
    participant EC as Erasure Coder
    participant Drives as Local Drives (NVMe/SSD)

    Client_MW->>MinIO_H: PUT /bucket/object (multipart)
    MinIO_H->>EC: Split into data + parity shards
    EC->>Drives: Write shards across drives
    Drives-->>MinIO_H: All shards written
    MinIO_H-->>Client_MW: 200 OK (ETag)

Write Internals

  1. Hash to erasure set: Object name + bucket is hashed to select the target erasure set
  2. Bitrot hashing: A HighwayHash-256 checksum is computed for each shard. HighwayHash is designed around SIMD instructions, so it is typically much faster than cryptographic hashes such as MD5 or SHA-256 and adds little overhead to the write path
  3. Parallel shard write: All data and parity shards are written to their respective drives simultaneously
  4. Quorum check: The write succeeds once write quorum is met. At least as many shards as there are data shards must be confirmed, or (N/2)+1 shards when parity is exactly half of the N drives in the erasure set
  5. Metadata: Object metadata (size, ETag, content-type, custom headers) is stored alongside the data shards in xl.meta format

Bitrot Protection

MinIO uses HighwayHash for bitrot detection at the shard level. Each shard gets a checksum stored in its metadata:

  • Traditional bitrot (silent data corruption from disk firmware, cosmic rays, etc.) is detected on every read
  • If a corrupted shard is detected during read, MinIO reconstructs it from the remaining healthy shards
  • The reconstruction happens transparently -- the S3 client sees no error
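
Detection works by recomputing each shard's checksum on read and comparing it with the stored value. The sketch below uses BLAKE2b from the Python standard library as a stand-in, since HighwayHash is not available there; the function names are hypothetical.

```python
import hashlib
from typing import List


def checksum(shard: bytes) -> str:
    """Stand-in for HighwayHash-256: a 256-bit BLAKE2b digest."""
    return hashlib.blake2b(shard, digest_size=32).hexdigest()


def find_corrupt(shards: List[bytes], expected: List[str]) -> List[int]:
    """Return the indexes of shards whose checksum no longer matches."""
    return [i for i, (shard, digest) in enumerate(zip(shards, expected))
            if checksum(shard) != digest]


if __name__ == "__main__":
    shards = [b"alpha", b"bravo", b"charlie"]
    stored = [checksum(s) for s in shards]  # written alongside each shard
    shards[1] = b"brav0"                    # simulate silent corruption
    print(find_corrupt(shards, stored))
```

Shards flagged this way are treated as missing and rebuilt from the remaining healthy shards.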

Read Path

  1. Client sends GET /bucket/object
  2. MinIO hashes the object name to find the erasure set
  3. Reads data shards from the fastest drives (measured by recent latency)
  4. If any data shard is corrupted or missing, reads enough parity shards to make up the shortfall and reconstructs the missing data
  5. Returns the reassembled object to the client

⚠️ OSS Archive Notice

The MinIO open-source repository was archived February 13, 2026. MinIO Inc. now develops the commercial AIStor product. For open-source S3 storage, consider Ceph RGW, SeaweedFS, or Garage.

Benchmarks

Scope

Performance characteristics, scaling limits, and resource consumption for MinIO.

Object Storage Performance

| Configuration | PUT (obj/s) | GET (obj/s) | Throughput |
|---|---|---|---|
| 4 nodes, HDD | 500-1,000 | 1,000-2,000 | 1-2 GB/s |
| 4 nodes, SSD | 2,000-5,000 | 5,000-10,000 | 5-10 GB/s |
| 16 nodes, NVMe | 10,000-30,000 | 30,000-80,000 | 30-80 GB/s |

Erasure Coding Overhead

| Parity | Storage Efficiency | Write Penalty | Failure Tolerance |
|---|---|---|---|
| EC:2 | 87.5% | +15% | 2 drives |
| EC:4 (default) | 75% | +30% | 4 drives |
| EC:8 | 50% | +60% | 8 drives |

Scaling Limits

| Dimension | Limit | Notes |
|---|---|---|
| Objects per bucket | Billions | No practical limit |
| Object size | 5 TB (single PUT) | Multipart for larger |
| Buckets per server | 1,000+ | Metadata overhead |
| Server pools | 32 | Horizontal expansion |
| Total capacity | Exabytes | Linear scaling |

Resource Requirements

| Nodes | CPU/Node | Memory/Node | Network |
|---|---|---|---|
| 4 (minimum) | 4 vCPU | 8 GiB | 10 Gbps |
| 8 (production) | 8 vCPU | 16 GiB | 25 Gbps |
| 16 (large) | 16 vCPU | 32 GiB | 25-100 Gbps |

Sourcing Status

Unsourced Performance Data

The performance numbers in this document are estimated from vendor documentation, community benchmarks, and engineering judgment. They do not represent controlled benchmarks with documented test conditions. Specific hardware configurations, software versions, and test methodologies were not recorded.

Use these figures as rough guidance only. For production capacity planning, run your own benchmarks against your specific workload and infrastructure.
