Erasure Coding
MinIO Erasure Coding is a data redundancy and availability feature that allows MinIO deployments to automatically reconstruct objects on-the-fly despite the loss of multiple drives or nodes in the cluster. Erasure Coding provides object-level healing with less overhead than adjacent technologies such as RAID or replication.
MinIO splits each new object into data and parity blocks, where parity blocks support reconstruction of missing or corrupted data blocks. MinIO writes these blocks to a single erasure set <minio-ec-erasure-set> in the deployment. Since erasure set drives are striped across the deployment, a given node typically contains only a portion of data or parity blocks for each object. MinIO can therefore tolerate the loss of multiple drives or nodes in the deployment depending on the configured parity and deployment topology.
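To make the reconstruction idea concrete, the following minimal sketch rebuilds a lost data block from the surviving blocks. It deliberately uses a single XOR parity block as a simplified stand-in; it is not MinIO's implementation, and the function names are hypothetical. XOR parity can rebuild only one missing block, whereas an erasure code with several parity blocks can rebuild several.

```python
from functools import reduce


def split_into_blocks(payload: bytes, data_blocks: int) -> list[bytes]:
    """Split payload into equally sized data blocks, zero-padding the tail."""
    size = -(-len(payload) // data_blocks)  # ceiling division
    padded = payload.ljust(size * data_blocks, b"\0")
    return [padded[i * size:(i + 1) * size] for i in range(data_blocks)]


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equally sized blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


payload = b"erasure coding example"
data = split_into_blocks(payload, 4)   # four data blocks
parity = xor_blocks(data)              # one parity block (stand-in only)

# Simulate losing the drive that held data block 2, then rebuild it
# from the three surviving data blocks plus the parity block.
survivors = [block for i, block in enumerate(data) if i != 2]
rebuilt = xor_blocks(survivors + [parity])
print(rebuilt == data[2])
```

The same principle, generalized to multiple parity blocks, is what lets a deployment keep serving an object while some of its blocks are unavailable.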
At maximum parity, MinIO can tolerate the loss of up to half the drives per erasure set (N/2) and still perform read operations, and the loss of one fewer drive (N/2-1) and still perform write operations, where N here is the number of drives in the erasure set. MinIO defaults to 4 parity blocks per object with tolerance for the loss of 4 drives per erasure set. For more complete information on selecting erasure code parity, see minio-ec-parity.
Erasure coding requires a minimum of 4 drives and is only available with distributed <minio-installation-comparison> MinIO deployments. Erasure coding is a core requirement for the following MinIO features:
- Object Versioning <minio-bucket-versioning>
- Server-Side Replication <minio-bucket-replication>
- Write-Once Read-Many Locking <minio-bucket-locking>
Use the MinIO Erasure Code Calculator when planning and designing your MinIO deployment to explore the effect of erasure code settings on your intended topology.
Erasure Sets
An Erasure Set is a set of drives in a MinIO deployment that supports Erasure Coding. MinIO randomly and uniformly distributes each object's data and parity blocks across the drives in the Erasure Set with no overlap: each unique object has no more than one data or parity block per drive in the set.
MinIO calculates the number and size of Erasure Sets by dividing the total number of drives in the Server Pool <minio-intro-server-pool> into sets consisting of between 4 and 16 drives each.
Use the MinIO Erasure Code Calculator to determine the optimal erasure set size for your preferred MinIO topology.
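As a rough illustration of the sizing rule, the sketch below lists the set sizes between 4 and 16 that divide a pool's drive count evenly. The even-division requirement and the function name `candidate_set_sizes` are assumptions made for this example; the text above does not say how MinIO chooses among candidates, so treat the calculator as the authority for real planning.

```python
def candidate_set_sizes(total_drives: int) -> list[int]:
    """Return set sizes in [4, 16] that divide total_drives with no remainder.

    Assumption for illustration: every erasure set in a pool has the same
    number of drives, so the set size must divide the drive count evenly.
    """
    return [size for size in range(4, 17) if total_drives % size == 0]


for drives in (16, 24, 40):
    print(drives, candidate_set_sizes(drives))
```

A drive count with no divisor in that range (for example 17) yields an empty list under this assumption.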
Erasure Code Parity (EC:N)
MinIO uses a Reed-Solomon algorithm to split objects into data and parity blocks based on the Erasure Set <minio-ec-erasure-set> size in the deployment. For a given erasure set of size M, MinIO splits objects into N parity blocks and M-N data blocks.
MinIO uses the EC:N notation to refer to the number of parity blocks (N) in the deployment. MinIO defaults to EC:4 or 4 parity blocks per object. MinIO uses the same EC:N value for all erasure sets and server pools <minio-intro-server-pool> in the deployment.
MinIO can tolerate the loss of up to N drives per erasure set and continue performing read and write operations ("quorum"). If N is equal to exactly 1/2 the drives in the erasure set, MinIO write quorum requires N+1 drives to avoid data inconsistency ("split-brain").
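The quorum rule can be expressed as a few lines of arithmetic. This is a minimal sketch derived only from the description above, with M as the erasure set size and N as the parity; the function name `quorum` is hypothetical.

```python
def quorum(set_size: int, parity: int) -> tuple[int, int]:
    """Return (read_quorum, write_quorum) in drives for one erasure set.

    set_size is M and parity is N in the EC:N notation.
    """
    if not 0 <= parity <= set_size // 2:
        raise ValueError("parity must be between 0 and half the set size")
    read_quorum = set_size - parity      # M-N data blocks must be readable
    write_quorum = read_quorum
    if parity * 2 == set_size:
        write_quorum += 1                # N+1 drives when N is exactly M/2
    return read_quorum, write_quorum


print(quorum(16, 4))
print(quorum(16, 8))
```

With 16 drives, EC:4 needs 12 drives for both reads and writes, while EC:8 needs 8 for reads and 9 for writes.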
Setting the parity for a deployment is a balance between availability and total usable storage. Higher parity values increase resiliency to drive or node failure at the cost of usable storage, while lower parity provides maximum storage with reduced tolerance for drive/node failures. Use the MinIO Erasure Code Calculator to explore the effect of parity on your planned cluster deployment.
The following table lists the outcome of varying erasure code parity levels on a MinIO deployment consisting of 1 node and 16 1TB drives:
Parity | Total Storage | Storage Ratio | Minimum Drives for Read Operations | Minimum Drives for Write Operations |
---|---|---|---|---|
EC:4 (Default) | 12 Tebibytes | 0.750 | 12 | 12 |
EC:6 | 10 Tebibytes | 0.625 | 10 | 10 |
EC:8 | 8 Tebibytes | 0.500 | 8 | 9 |
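The storage columns of the table follow from the ratio of data blocks to total blocks. The sketch below reproduces them under two assumptions that are not stated in the table itself: the 16 drives form a single erasure set, and each drive provides exactly 1 TiB. The function name `usable_storage` is hypothetical.

```python
def usable_storage(drives: int, drive_size_tib: float, parity: int) -> tuple[float, float]:
    """Return (usable capacity in TiB, storage ratio) for one erasure set."""
    ratio = (drives - parity) / drives   # share of each object that is data
    return drives * drive_size_tib * ratio, ratio


for parity in (4, 6, 8):
    total, ratio = usable_storage(16, 1.0, parity)
    print(f"EC:{parity}  {total:.0f} TiB  ratio {ratio:.3f}")
```

Each step up in parity trades usable capacity for tolerance of additional drive failures.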
Storage Classes
MinIO supports storage classes with Erasure Coding to allow applications to specify per-object parity <minio-ec-parity>. Each storage class specifies an EC:N parity setting to apply to objects created with that class.
MinIO storage classes are distinct from Amazon Web Services storage classes <storage-class-intro.html>. MinIO storage classes define parity settings per object, while AWS storage classes define storage tiers per object.
MinIO provides the following two storage classes:
STANDARD
The STANDARD storage class is the default class for all objects. MinIO sets the STANDARD parity based on the number of volumes in the Erasure Set:
Erasure Set Size | Default Parity (EC:N) |
---|---|
5 or Fewer | EC:2 |
6 - 7 | EC:3 |
8 or more | EC:4 |
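The table above maps directly to a small lookup. This is a sketch of the table only, not MinIO's code; the function name `default_standard_parity` is hypothetical, and the 4 to 16 drive bounds are taken from the erasure set size range described earlier.

```python
def default_standard_parity(set_size: int) -> int:
    """Return the default STANDARD parity N (as in EC:N) for an erasure set."""
    if not 4 <= set_size <= 16:
        raise ValueError("erasure sets contain between 4 and 16 drives")
    if set_size <= 5:
        return 2
    if set_size <= 7:
        return 3
    return 4


for size in (4, 6, 8, 16):
    print(f"{size} drives -> EC:{default_standard_parity(size)}")
```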
You can override the default STANDARD parity using either:
- The MINIO_STORAGE_CLASS_STANDARD environment variable, or
- The mc admin config command to modify the storage_class.standard configuration setting.
The maximum value is half of the total drives in the Erasure Set <minio-ec-erasure-set>. The minimum value is 2.
STANDARD parity must be greater than or equal to REDUCED_REDUNDANCY. If REDUCED_REDUNDANCY is unset, STANDARD parity must be greater than 2.
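The range and ordering constraints can be checked before applying a configuration. The sketch below parses values written in the EC:N notation and validates the minimum of 2, the maximum of half the erasure set, and the requirement that STANDARD is not lower than REDUCED_REDUNDANCY. It intentionally leaves out the rule for an unset REDUCED_REDUNDANCY, and both function names are hypothetical helpers rather than MinIO code.

```python
from typing import Optional


def parse_parity(value: str) -> int:
    """Parse a string in EC:N notation, such as "EC:4", into the integer N."""
    prefix, sep, number = value.partition(":")
    if prefix != "EC" or not sep or not number.isdigit():
        raise ValueError(f"expected EC:N, got {value!r}")
    return int(number)


def check_parity(set_size: int, standard: str, rrs: Optional[str] = None) -> None:
    """Raise ValueError if the parity values break the constraints above."""
    std = parse_parity(standard)
    if not 2 <= std <= set_size // 2:
        raise ValueError("STANDARD parity must be between 2 and half the set size")
    if rrs is not None and parse_parity(rrs) > std:
        raise ValueError("REDUCED_REDUNDANCY parity must not exceed STANDARD")


check_parity(16, "EC:4", "EC:2")       # passes silently
try:
    check_parity(16, "EC:9")           # more than half of 16 drives
except ValueError as err:
    print(err)
```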
REDUCED_REDUNDANCY
The REDUCED_REDUNDANCY storage class allows creating objects with lower parity than STANDARD.
REDUCED_REDUNDANCY requires at least 5 drives in the MinIO deployment.
MinIO sets the REDUCED_REDUNDANCY parity to EC:2 by default. You can override REDUCED_REDUNDANCY storage class parity using either:
- The MINIO_STORAGE_CLASS_RRS environment variable, or
- The mc admin config command to modify the storage_class.rrs configuration setting.
REDUCED_REDUNDANCY parity must be less than or equal to STANDARD.
MinIO references the x-amz-storage-class header in request metadata to determine which storage class to assign to an object. The specific syntax or method for setting headers depends on your preferred method for interfacing with the MinIO server.
- For the mc command line tool, certain commands include a specific option for setting the storage class. For example, the mc cp command has the --storage-class option for specifying the storage class to assign to the object being copied.
- For MinIO SDKs, the S3Client object has specific methods for setting request headers. For example, the minio-go SDK S3Client.PutObject method takes a PutObjectOptions data structure as a parameter. The PutObjectOptions data structure includes the StorageClass option for specifying the storage class to assign to the object being created.
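Whatever client is used, the request ends up carrying the x-amz-storage-class header. The sketch below only builds that header and rejects names other than the two classes described on this page. It is independent of any SDK, and the helper `storage_class_header` is hypothetical, not part of a MinIO library.

```python
STORAGE_CLASSES = ("STANDARD", "REDUCED_REDUNDANCY")


def storage_class_header(storage_class: str = "STANDARD") -> dict[str, str]:
    """Build the request header that selects a storage class for an upload."""
    if storage_class not in STORAGE_CLASSES:
        raise ValueError(f"unknown storage class: {storage_class!r}")
    return {"x-amz-storage-class": storage_class}


print(storage_class_header("REDUCED_REDUNDANCY"))
```

How the header is attached to the upload request is specific to each client or SDK.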
BitRot Protection
Silent data corruption, or bitrot, is a serious problem for disk drives: data becomes corrupted without the user's knowledge. The causes are manifold (ageing drives, current spikes, bugs in disk firmware, phantom writes, misdirected reads/writes, driver errors, accidental overwrites), but the result is the same: compromised data.
MinIO's optimized implementation of the HighwayHash algorithm is designed to ensure that it never returns corrupted data: it detects and heals corrupted objects on the fly. Integrity is ensured from end to end by computing a hash when an object is written and verifying it when the object is read, from the application, across the network, and to the memory/drive. The implementation is designed for speed and can achieve hashing speeds over 10 GB/sec on a single core on Intel CPUs.
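Checksum-based corruption detection can be sketched in a few lines: store a digest next to each block and recompute it whenever the block is read back. The example uses BLAKE2b from the Python standard library as a stand-in, because HighwayHash is not available there. The in-memory `store` dictionary and both function names are hypothetical and do not reflect MinIO's on-disk format.

```python
import hashlib


def write_block(store: dict, key: str, block: bytes) -> None:
    """Store a block together with a digest computed at write time."""
    store[key] = (block, hashlib.blake2b(block).digest())


def read_block(store: dict, key: str) -> bytes:
    """Return the block, or raise OSError if its digest no longer matches."""
    block, expected = store[key]
    if hashlib.blake2b(block).digest() != expected:
        raise OSError(f"bitrot detected in {key}")
    return block


store = {}
write_block(store, "shard-0", b"object data")
print(read_block(store, "shard-0"))

# Simulate silent corruption by flipping one bit of the stored block
# while leaving the stored digest untouched.
block, digest = store["shard-0"]
store["shard-0"] = (bytes([block[0] ^ 0x01]) + block[1:], digest)
try:
    read_block(store, "shard-0")
except OSError as err:
    print(err)
```

In an erasure-coded system, a block that fails this check can be treated as missing and rebuilt from the remaining data and parity blocks.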