.. _minio-erasure-coding:

==============
Erasure Coding
==============

.. default-domain:: minio

.. contents:: Table of Contents
   :local:
   :depth: 2

MinIO Erasure Coding is a data redundancy and availability feature that allows MinIO deployments to automatically reconstruct objects on-the-fly despite the loss of multiple drives or nodes in the cluster.
Erasure Coding provides object-level healing with significantly less overhead than adjacent technologies such as RAID or replication.

MinIO splits each new object into data and parity blocks, where parity blocks support reconstruction of missing or corrupted data blocks.
MinIO writes these blocks to a single :ref:`erasure set <minio-ec-erasure-set>` in the deployment.
Since erasure set drives are striped across the server pool, a given node contains only a portion of data or parity blocks for each object.
MinIO can therefore tolerate the loss of multiple drives or nodes in the deployment depending on the configured parity and deployment topology.

.. image:: /images/erasure-code.jpg
   :width: 600px
   :alt: MinIO Erasure Coding example
   :align: center

At maximum parity, MinIO can tolerate the loss of up to half the drives in an erasure set and still serve read operations.
Write operations require one additional drive, so an erasure set of ``M`` drives at maximum parity tolerates the loss of up to :math:`(M / 2) - 1` drives while still performing both read and write operations.
MinIO defaults to 4 parity blocks per object with tolerance for the loss of 4 drives per erasure set.
For more complete information on selecting erasure code parity, see :ref:`minio-ec-parity`.

Use the MinIO `Erasure Code Calculator <https://min.io/product/erasure-code-calculator?ref=docs>`__ when planning and designing your MinIO deployment to explore the effect of erasure code settings on your intended topology.

Zero-Parity Deployments
-----------------------

Starting with :minio-release:`RELEASE.2022-06-02T02-11-04Z`, MinIO supports a Single-Node Single-Drive (SNSD) topology with a zero-parity erasure coding backend.

Erasure Coding protections do not apply to the zero-parity backend of |SNSD| deployments.
Zero-parity deployments depend on the underlying storage for resiliency and availability.

.. _minio-ec-erasure-set:

Erasure Sets
------------

An *Erasure Set* is a set of drives in a MinIO deployment that support Erasure Coding.
MinIO randomly and uniformly distributes the data and parity blocks of each object across the drives in the erasure set with *no overlap*.
Each unique object has no more than one data or parity block per drive in the set.

MinIO calculates the number and size of *Erasure Sets* by dividing the total number of drives in the :ref:`Server Pool <minio-intro-server-pool>` into sets consisting of between 4 and 16 drives each.

For clusters, pools, or deployments with more than 16 drives, MinIO divides the drives into multiple erasure sets of the same number of drives.
For this reason, the total number of drives in a deployment must be evenly divisible by a number between 4 and 16.

For example, 20 drives are divided into 2 erasure sets of 10 drives each.
28 drives are divided into 2 erasure sets of 14 drives each.
40 drives are divided into 4 erasure sets of 10 drives each.

Because numbers such as 17, 19, or 34 cannot be evenly divided by any number between 4 and 16, you cannot have a deployment with such a number of drives.
Add or remove drives to return to an allowable number of drives.

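The divisibility rule above can be checked with a short sketch.
The function name ``erasure_set_layout`` is hypothetical, and the sketch assumes the largest valid set size is preferred, which agrees with the examples in this section; the server may weigh other factors, such as node symmetry, when it selects a set size.

```python
def erasure_set_layout(total_drives, min_size=4, max_size=16):
    """Return (set_count, set_size) for a drive count, or None if none is valid.

    Assumption: the largest divisor between min_size and max_size is chosen.
    """
    for size in range(min(max_size, total_drives), min_size - 1, -1):
        if total_drives % size == 0:
            return total_drives // size, size
    return None


for drives in (20, 28, 40, 17, 19, 34):
    print(drives, erasure_set_layout(drives))
```

Drive counts that return ``None`` have no divisor between 4 and 16 and are therefore not usable as-is.
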
Use the MinIO `Erasure Code Calculator <https://min.io/product/erasure-code-calculator>`__ to determine the optimal erasure set size for your preferred MinIO topology.

.. _minio-ec-parity:

Erasure Code Parity (``EC:N``)
------------------------------

MinIO uses a Reed-Solomon algorithm to split objects into data and parity blocks based on the :ref:`Erasure Set <minio-ec-erasure-set>` size in the deployment.
For a given erasure set of size ``M``, MinIO splits objects into ``N`` parity blocks and ``M-N`` data blocks.

MinIO uses the ``EC:N`` notation to refer to the number of parity blocks (``N``) in the deployment.
MinIO defaults to ``EC:4`` or 4 parity blocks per object.
MinIO uses the same ``EC:N`` value for all erasure sets and :ref:`server pools <minio-intro-server-pool>` in the deployment.

MinIO can tolerate the loss of up to ``N`` drives per erasure set and continue performing read and write operations, as long as the remaining ``M-N`` drives are available ("quorum").
If ``N`` is equal to exactly 1/2 the drives in the erasure set, MinIO write quorum requires :math:`N + 1` drives to avoid data inconsistency ("split-brain").

Setting the parity for a deployment is a balance between availability and total usable storage.
Higher parity values increase resiliency to drive or node failure at the cost of usable storage, while lower parity provides maximum storage with reduced tolerance for drive or node failures.
Use the MinIO `Erasure Code Calculator <https://min.io/product/erasure-code-calculator?ref=docs>`__ to explore the effect of parity on your planned cluster deployment.

The following table lists the outcome of varying erasure code parity levels on a MinIO deployment consisting of 1 node and 16 drives of 1 TB each:

.. list-table:: Outcome of Parity Settings on a 16 Drive MinIO Cluster
   :header-rows: 1
   :widths: 20 20 20 20 20
   :width: 100%

   * - Parity
     - Total Usable Storage
     - Storage Ratio
     - Minimum Drives for Read Operations
     - Minimum Drives for Write Operations

   * - ``EC:4`` (Default)
     - 12 TB
     - 0.750
     - 12
     - 12

   * - ``EC:6``
     - 10 TB
     - 0.625
     - 10
     - 10

   * - ``EC:8``
     - 8 TB
     - 0.500
     - 8
     - 9

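The figures in the table follow from the ``M-N`` split and the write quorum rule described earlier in this section.
The following sketch reproduces them; the function name ``parity_outcome`` and its return keys are hypothetical and chosen for illustration only.

```python
def parity_outcome(set_size, parity, drive_tb=1):
    """Compute usable capacity and quorum sizes for a single erasure set."""
    if set_size < 1 or not 0 <= parity <= set_size // 2:
        raise ValueError("parity must be between 0 and half the erasure set size")
    data_blocks = set_size - parity
    # When parity is exactly half the set, writes need one additional drive.
    write_quorum = data_blocks + 1 if parity * 2 == set_size else data_blocks
    return {
        "usable_tb": data_blocks * drive_tb,
        "storage_ratio": data_blocks / set_size,
        "read_quorum": data_blocks,
        "write_quorum": write_quorum,
    }


for parity in (4, 6, 8):
    print(f"EC:{parity}", parity_outcome(16, parity))
```
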
.. _minio-ec-storage-class:

Storage Classes
~~~~~~~~~~~~~~~

MinIO supports redundancy storage classes with Erasure Coding to allow applications to specify per-object :ref:`parity <minio-ec-parity>`.
Each storage class specifies an ``EC:N`` parity setting to apply to objects created with that class.

MinIO storage classes for erasure coding are *distinct* from Amazon Web Services :s3-docs:`storage classes <storage-class-intro.html>` used for tiering.
MinIO erasure coding storage classes define *parity settings per object*, while AWS storage classes define *storage tiers per object*.

.. note::
   For transitioning objects between storage classes for tiering purposes in MinIO, refer to the documentation on :ref:`lifecycle management <minio-lifecycle-management-tiering>`.

MinIO provides the following two storage classes:

.. tab-set::

   .. tab-item:: STANDARD

      The ``STANDARD`` storage class is the default class for all objects.
      MinIO sets the ``STANDARD`` parity based on the number of drives in the Erasure Set:

      .. list-table::
         :header-rows: 1
         :widths: 30 70
         :width: 100%

         * - Erasure Set Size
           - Default Parity (EC:N)

         * - 5 or fewer
           - EC:2

         * - 6 - 7
           - EC:3

         * - 8 or more
           - EC:4

      You can override the default ``STANDARD`` parity using either:

      - The :envvar:`MINIO_STORAGE_CLASS_STANDARD` environment variable, *or*
      - The :mc:`mc admin config` command to modify the ``storage_class.standard`` configuration setting.

      The maximum value is half of the total drives in the :ref:`Erasure Set <minio-ec-erasure-set>`.
      The minimum value is ``2``.

      ``STANDARD`` parity *must* be greater than or equal to ``REDUCED_REDUNDANCY``.
      If ``REDUCED_REDUNDANCY`` is unset, ``STANDARD`` parity *must* be greater than or equal to ``2``, the default ``REDUCED_REDUNDANCY`` parity.

   .. tab-item:: REDUCED_REDUNDANCY

      The ``REDUCED_REDUNDANCY`` storage class allows creating objects with lower parity than ``STANDARD``.
      ``REDUCED_REDUNDANCY`` requires *at least* 5 drives in the MinIO deployment.

      MinIO sets the ``REDUCED_REDUNDANCY`` parity to ``EC:2`` by default.
      You can override ``REDUCED_REDUNDANCY`` storage class parity using either:

      - The :envvar:`MINIO_STORAGE_CLASS_RRS` environment variable, *or*
      - The :mc:`mc admin config` command to modify the ``storage_class.rrs`` configuration setting.

      ``REDUCED_REDUNDANCY`` parity *must* be less than or equal to ``STANDARD``.

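The default ``STANDARD`` parity and the constraints on both storage classes can be expressed as a small check.
This is a sketch under stated assumptions rather than the server's validation logic: the function names are hypothetical, and ``rrs`` defaults to ``2`` because that is the documented ``REDUCED_REDUNDANCY`` default.

```python
def default_standard_parity(set_size):
    """Default STANDARD parity for an erasure set of the given size."""
    if set_size <= 5:
        return 2
    if set_size <= 7:
        return 3
    return 4


def check_parity(set_size, standard=None, rrs=2):
    """Return (standard_parity, problems) for a proposed configuration."""
    if standard is None:
        standard = default_standard_parity(set_size)
    problems = []
    if standard < 2:
        problems.append("STANDARD parity is below the minimum of 2")
    if standard > set_size // 2:
        problems.append("STANDARD parity exceeds half the erasure set")
    if rrs > standard:
        problems.append("REDUCED_REDUNDANCY parity exceeds STANDARD parity")
    return standard, problems


print(check_parity(16))               # default parity, no problems reported
print(check_parity(16, standard=9))   # exceeds half of a 16-drive set
```

An empty ``problems`` list means the proposed values satisfy every rule listed for the two storage classes.
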
MinIO references the ``x-amz-storage-class`` header in request metadata for determining which storage class to assign an object.
The specific syntax or method for setting headers depends on your preferred method for interfacing with the MinIO server.

- For the :mc:`mc` command line tool, certain commands include a specific option for setting the storage class.
  For example, the :mc:`mc cp` command has the :mc-cmd:`~mc cp storage-class` option for specifying the storage class to assign to the object being copied.

- For MinIO SDKs, the client object has specific methods or options for setting request headers.
  For example, the ``minio-go`` SDK ``Client.PutObject`` method takes a ``PutObjectOptions`` data structure as a parameter.
  The ``PutObjectOptions`` data structure includes the ``StorageClass`` option for specifying the storage class to assign to the object being created.

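Whatever client you use, the request ultimately carries the same header.
The sketch below builds only that header; the helper ``storage_class_headers`` is hypothetical, and how the resulting dictionary is attached to a signed upload request depends on the SDK or HTTP client in use.

```python
STORAGE_CLASS_HEADER = "x-amz-storage-class"
STORAGE_CLASSES = ("STANDARD", "REDUCED_REDUNDANCY")


def storage_class_headers(storage_class="STANDARD"):
    """Build the request header that selects a MinIO storage class."""
    if storage_class not in STORAGE_CLASSES:
        raise ValueError(f"unsupported storage class: {storage_class!r}")
    return {STORAGE_CLASS_HEADER: storage_class}


print(storage_class_headers("REDUCED_REDUNDANCY"))
```
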
.. _minio-ec-bitrot-protection:

Bit Rot Protection
------------------

.. TODO- ReWrite w/ more detail.

Silent data corruption, or bit rot, is a serious problem for disk drives in which data becomes corrupted without the user’s knowledge.
The corruption occurs when the electrical charge on a portion of the disk disperses or changes with no notification to or input from the user.
Many events can lead to such silent corruption of stored data.
For example, ageing drives, current spikes, bugs in disk firmware, phantom writes, misdirected reads or writes, driver errors, accidental overwrites, or a random cosmic ray can each lead to a bit change.
Whatever the cause, the result is the same: compromised data.

MinIO’s optimized implementation of the :minio-git:`HighwayHash algorithm <highwayhash/blob/master/README.md>` allows it to detect and heal corrupted objects on the fly.
MinIO ensures integrity from end to end by computing a hash on WRITE and verifying it on READ, from the application, across the network, and to the memory or drive.
The implementation is designed for speed and can achieve hashing speeds over 10 GB/sec on a single core on Intel CPUs.

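The general idea of storing a checksum at write time and verifying it at read time can be sketched as follows.
This is an illustration only: BLAKE2b from the Python standard library stands in for HighwayHash, the in-memory ``store`` dictionary stands in for a drive, and the function names are hypothetical.

```python
import hashlib


def digest(data):
    # BLAKE2b is a stand-in here; it is not the HighwayHash algorithm.
    return hashlib.blake2b(data, digest_size=32).digest()


def write_block(store, key, data):
    """Store a block together with a checksum computed at write time."""
    store[key] = (bytes(data), digest(data))


def read_block(store, key):
    """Return a block only if it still matches its stored checksum."""
    data, expected = store[key]
    if digest(data) != expected:
        raise ValueError(f"checksum mismatch for block {key!r}")
    return data


store = {}
write_block(store, "shard-0", b"object data")
print(read_block(store, "shard-0"))

# Flip one bit to simulate silent corruption of the stored block.
data, checksum = store["shard-0"]
store["shard-0"] = (bytes([data[0] ^ 0x01]) + data[1:], checksum)
try:
    read_block(store, "shard-0")
except ValueError as err:
    print("detected:", err)
```

In an erasure-coded deployment, a block that fails verification can be treated like a missing block and rebuilt from the remaining data and parity blocks.
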