mirror of
				https://github.com/minio/docs.git
				synced 2025-10-31 06:30:27 +03:00 
			
		
		
		
	
		
			
				
	
	
		
			238 lines
		
	
	
		
			8.8 KiB
		
	
	
	
		
			ReStructuredText
		
	
	
	
	
	
			
		
		
	
	
.. _minio-erasure-coding:

==============
Erasure Coding
==============

.. default-domain:: minio

.. contents:: Table of Contents
   :local:
   :depth: 2

MinIO Erasure Coding is a data redundancy and availability feature that allows
MinIO deployments to automatically reconstruct objects on-the-fly despite the
loss of multiple drives or nodes in the cluster. Erasure Coding provides
object-level healing with less overhead than adjacent technologies such as
RAID or replication.

MinIO splits each new object into data and parity blocks, where parity blocks
support reconstruction of missing or corrupted data blocks. MinIO writes these
blocks to a single :ref:`erasure set <minio-ec-erasure-set>` in the deployment.
Since erasure set drives are striped across the deployment, a given node
typically contains only a portion of data or parity blocks for each object.
MinIO can therefore tolerate the loss of multiple drives or nodes in the
deployment depending on the configured parity and deployment topology.

.. image:: /images/erasure-code.jpg
   :width: 600px
   :alt: MinIO Erasure Coding example
   :align: center

At maximum parity, where half the drives in an erasure set hold parity blocks,
MinIO can tolerate the loss of up to half the drives in the set and still serve
read operations, or one fewer than half (``N/2-1`` of ``N`` drives) and still
serve both read and write operations. MinIO defaults to 4 parity blocks per
object, with tolerance for the loss of 4 drives per erasure set. For more
complete information on selecting erasure code parity, see
:ref:`minio-ec-parity`.

Erasure coding requires a minimum of 4 drives and is only available with
:ref:`distributed <minio-installation-comparison>` MinIO deployments. Erasure
coding is a core requirement for the following MinIO features:

- :ref:`Object Versioning <minio-bucket-versioning>`
- :ref:`Server-Side Replication <minio-bucket-replication>`
- :ref:`Write-Once Read-Many Locking <minio-bucket-locking>`

Use the MinIO `Erasure Code Calculator
<https://min.io/product/erasure-code-calculator?ref=docs>`__ when planning and
designing your MinIO deployment to explore the effect of erasure code settings
on your intended topology.

.. _minio-ec-erasure-set:

Erasure Sets
------------

An *Erasure Set* is a set of drives in a MinIO deployment that supports Erasure
Coding. MinIO randomly and uniformly distributes object data and parity blocks
across the drives in the erasure set with *no overlap*: each unique object has
no more than one data or parity block per drive in the set.

MinIO calculates the number and size of *Erasure Sets* by dividing the total
number of drives in the :ref:`Server Pool <minio-intro-server-pool>` into sets
consisting of between 4 and 16 drives each.

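As a rough illustration of the division described above, the following sketch
picks the largest set size between 4 and 16 that evenly divides the drive
count. The function name and the "largest divisor" rule are assumptions made
for this example only. MinIO's actual selection also takes the node count and
topology into account, so treat the calculator as authoritative.

```python
def erasure_set_layout(total_drives, min_size=4, max_size=16):
    """Return (set_size, set_count) for a hypothetical single server pool.

    Assumption for illustration: choose the largest set size in
    [min_size, max_size] that divides total_drives evenly.
    """
    for size in range(max_size, min_size - 1, -1):
        if total_drives % size == 0:
            return size, total_drives // size
    raise ValueError(
        f"{total_drives} drives cannot be split evenly into sets of "
        f"{min_size}-{max_size} drives"
    )


for drives in (4, 16, 24, 32):
    size, count = erasure_set_layout(drives)
    print(f"{drives} drives -> {count} set(s) of {size} drives")
```

Under this simplified rule, 24 drives form 2 sets of 12 drives, and 32 drives
form 2 sets of 16 drives.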
Use the MinIO
`Erasure Code Calculator <https://min.io/product/erasure-code-calculator>`__
to determine the optimal erasure set size for your preferred MinIO topology.

.. _minio-ec-parity:

Erasure Code Parity (``EC:N``)
------------------------------

MinIO uses a Reed-Solomon algorithm to split objects into data and parity blocks
based on the :ref:`Erasure Set <minio-ec-erasure-set>` size in the deployment.
For a given erasure set of size ``M``, MinIO splits objects into ``N`` parity
blocks and ``M-N`` data blocks.

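Reed-Solomon arithmetic is beyond the scope of this page, but the role of a
parity block can be shown with a much simpler stand-in: a single XOR parity
block, which can rebuild any one lost block. This is *not* the algorithm MinIO
uses, and every function name below is invented for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def split_with_parity(data, data_blocks):
    """Split data into equal-size blocks (zero padded) plus one XOR parity block."""
    size = -(-len(data) // data_blocks)  # ceiling division
    padded = data.ljust(size * data_blocks, b"\0")
    blocks = [padded[i * size:(i + 1) * size] for i in range(data_blocks)]
    return blocks + [xor_blocks(blocks)]


def rebuild(blocks):
    """Recover the single missing block (marked None) from the survivors."""
    missing = blocks.index(None)
    survivors = [b for b in blocks if b is not None]
    restored = list(blocks)
    restored[missing] = xor_blocks(survivors)
    return restored


original = b"erasure coding demo!"
stored = split_with_parity(original, data_blocks=4)  # 4 data + 1 parity
damaged = stored.copy()
damaged[2] = None  # simulate a lost drive
recovered = rebuild(damaged)
assert b"".join(recovered[:4]).rstrip(b"\0") == original
```

A single XOR block tolerates only one loss. Reed-Solomon generalizes the idea
so that ``N`` parity blocks can cover the loss of up to ``N`` blocks.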
MinIO uses the ``EC:N`` notation to refer to the number of parity blocks (``N``)
in the deployment. MinIO defaults to ``EC:4``, or 4 parity blocks per object.
MinIO uses the same ``EC:N`` value for all erasure sets and
:ref:`server pools <minio-intro-server-pool>` in the deployment.

MinIO can tolerate the loss of up to ``N`` drives per erasure set and
continue performing read and write operations ("quorum"). If ``N`` is equal
to exactly 1/2 the drives in the erasure set, MinIO write quorum requires
``N+1`` drives to avoid data inconsistency ("split-brain"), so write operations
tolerate the loss of only ``N-1`` drives.

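The quorum rule above can be written as a short calculation. This is a minimal
sketch of the rule as stated in this section; the ``quorum`` function is a
hypothetical helper, not part of MinIO or its SDKs.

```python
def quorum(set_size, parity):
    """Return (read_quorum, write_quorum) in drives for one erasure set."""
    if not 0 < parity <= set_size // 2:
        raise ValueError("parity must be between 1 and half the set size")
    read_quorum = set_size - parity
    write_quorum = read_quorum
    if parity * 2 == set_size:
        # At maximum parity, one extra drive is required for writes
        # to avoid split-brain.
        write_quorum += 1
    return read_quorum, write_quorum


for parity in (4, 6, 8):
    reads, writes = quorum(16, parity)
    print(f"EC:{parity}: read quorum {reads}, write quorum {writes}")
```

For a 16-drive erasure set this gives 12 and 12 drives for ``EC:4``, 10 and 10
for ``EC:6``, and 8 and 9 for ``EC:8``.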
Setting the parity for a deployment is a balance between availability
and total usable storage. Higher parity values increase resiliency to drive
or node failure at the cost of usable storage, while lower parity provides
maximum storage with reduced tolerance for drive or node failures.
Use the MinIO `Erasure Code Calculator
<https://min.io/product/erasure-code-calculator?ref=docs>`__ to explore the
effect of parity on your planned cluster deployment.

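The storage side of this trade-off follows directly from the block split: with
``N`` parity blocks in a set of ``M`` drives, the usable fraction is
``(M-N)/M``. The sketch below assumes raw drive capacity with no filesystem or
metadata overhead, and the ``usable_storage`` helper is hypothetical.

```python
def usable_storage(set_size, parity, drive_size_tb=1.0):
    """Return (usable_tb, storage_ratio) for one erasure set.

    Assumes every drive has the same raw capacity and ignores any
    filesystem or metadata overhead.
    """
    ratio = (set_size - parity) / set_size
    return set_size * drive_size_tb * ratio, ratio


for parity in (4, 6, 8):
    usable, ratio = usable_storage(16, parity)
    print(f"EC:{parity}: {usable:g} TB usable, ratio {ratio:.3f}")
```

Raising parity from ``EC:4`` to ``EC:8`` on 16 drives of 1 TB each lowers
usable capacity from 12 TB to 8 TB in exchange for tolerating more drive
failures.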
The following table lists the outcome of varying erasure code parity levels on
a MinIO deployment consisting of 1 node and 16 1 TB drives:

.. list-table:: Outcome of Parity Settings on a 16 Drive MinIO Cluster
   :header-rows: 1
   :widths: 20 20 20 20 20
   :width: 100%

   * - Parity
     - Total Usable Storage
     - Storage Ratio
     - Minimum Drives for Read Operations
     - Minimum Drives for Write Operations

   * - ``EC:4`` (Default)
     - 12 TB
     - 0.750
     - 12
     - 12

   * - ``EC:6``
     - 10 TB
     - 0.625
     - 10
     - 10

   * - ``EC:8``
     - 8 TB
     - 0.500
     - 8
     - 9

.. _minio-ec-storage-class:

Storage Classes
~~~~~~~~~~~~~~~

MinIO supports storage classes with Erasure Coding to allow applications to
specify per-object :ref:`parity <minio-ec-parity>`. Each storage class specifies
an ``EC:N`` parity setting to apply to objects created with that class.

MinIO storage classes are *distinct* from Amazon Web Services
:s3-docs:`storage classes <storage-class-intro.html>`. MinIO storage classes
define *parity settings per object*, while AWS storage classes define *storage
tiers per object*.

MinIO provides the following two storage classes:

.. tab-set::

   .. tab-item:: STANDARD

      The ``STANDARD`` storage class is the default class for all objects.
      MinIO sets the ``STANDARD`` parity based on the number of drives
      in the Erasure Set:

      .. list-table::
         :header-rows: 1
         :widths: 30 70
         :width: 100%

         * - Erasure Set Size
           - Default Parity (EC:N)

         * - 5 or fewer
           - EC:2

         * - 6 - 7
           - EC:3

         * - 8 or more
           - EC:4

      You can override the default ``STANDARD`` parity using either:

      - The :envvar:`MINIO_STORAGE_CLASS_STANDARD` environment variable, *or*
      - The :mc:`mc admin config` command to modify the
        ``storage_class.standard`` configuration setting.

      The maximum value is half of the total drives in the
      :ref:`Erasure Set <minio-ec-erasure-set>`. The minimum value is ``2``.

      ``STANDARD`` parity *must* be greater than or equal to
      ``REDUCED_REDUNDANCY``. If ``REDUCED_REDUNDANCY`` is unset, ``STANDARD``
      parity *must* be greater than or equal to 2.

   .. tab-item:: REDUCED_REDUNDANCY

      The ``REDUCED_REDUNDANCY`` storage class allows creating objects with
      lower parity than ``STANDARD``. ``REDUCED_REDUNDANCY`` requires
      *at least* 5 drives in the MinIO deployment.

      MinIO sets the ``REDUCED_REDUNDANCY`` parity to ``EC:2`` by default.
      You can override ``REDUCED_REDUNDANCY`` storage class parity using
      either:

      - The :envvar:`MINIO_STORAGE_CLASS_RRS` environment variable, *or*
      - The :mc:`mc admin config` command to modify the
        ``storage_class.rrs`` configuration setting.

      ``REDUCED_REDUNDANCY`` parity *must* be less than or equal to
      ``STANDARD``.

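The default and the constraints for the two storage classes can be summarized
in a small validation sketch. Both helpers are hypothetical and encode only the
rules listed in the tabs above; they do not reproduce MinIO's own validation.

```python
def default_standard_parity(set_size):
    """Default STANDARD parity for an erasure set of the given size."""
    if set_size <= 5:
        return 2
    if set_size <= 7:
        return 3
    return 4


def validate_parity(set_size, standard=None, rrs=2):
    """Return (standard, rrs) if the documented constraints hold."""
    if standard is None:
        standard = default_standard_parity(set_size)
    # STANDARD must be at least 2 and at most half the erasure set.
    if not 2 <= standard <= set_size // 2:
        raise ValueError("STANDARD parity must be between 2 and half the set size")
    # REDUCED_REDUNDANCY may never exceed STANDARD.
    if rrs > standard:
        raise ValueError("REDUCED_REDUNDANCY parity must not exceed STANDARD")
    return standard, rrs


print(validate_parity(16))                      # defaults for a 16-drive set
print(validate_parity(16, standard=8, rrs=4))   # maximum STANDARD parity
```

With the defaults, a 16-drive erasure set resolves to ``EC:4`` for ``STANDARD``
and ``EC:2`` for ``REDUCED_REDUNDANCY``.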
MinIO references the ``x-amz-storage-class`` header in request metadata to
determine which storage class to assign an object. The specific syntax
or method for setting headers depends on your preferred method for
interfacing with the MinIO server.

- For the :mc:`mc` command line tool, certain commands include a specific
  option for setting the storage class. For example, the :mc:`mc cp` command
  has the :mc-cmd:`~mc cp storage-class` option for specifying the
  storage class to assign to the object being copied.

- For MinIO SDKs, the client object has specific methods for setting
  request headers. For example, the ``minio-go`` SDK ``Client.PutObject``
  method takes a ``PutObjectOptions`` data structure as a parameter.
  The ``PutObjectOptions`` data structure includes the ``StorageClass``
  field for specifying the storage class to assign to the object being
  created.

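Whichever client you use, the end result is the same header on the upload
request. The sketch below only builds that header. The
``storage_class_headers`` helper is hypothetical and not part of any MinIO SDK,
and a real request also needs authentication and signing, which the SDKs and
:mc:`mc` handle for you.

```python
STORAGE_CLASSES = {"STANDARD", "REDUCED_REDUNDANCY"}


def storage_class_headers(storage_class="STANDARD"):
    """Build the request header that selects a storage class for an upload."""
    if storage_class not in STORAGE_CLASSES:
        raise ValueError(f"unknown storage class: {storage_class!r}")
    return {"x-amz-storage-class": storage_class}


headers = storage_class_headers("REDUCED_REDUNDANCY")
print(headers)
```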
.. _minio-ec-bitrot-protection:

BitRot Protection
-----------------

.. TODO- ReWrite w/ more detail.

Silent data corruption, or bitrot, is a serious problem for disk drives:
data becomes corrupted without the user’s knowledge. The causes are manifold
(ageing drives, current spikes, bugs in disk firmware, phantom writes,
misdirected reads/writes, driver errors, accidental overwrites), but the
result is the same: compromised data.

MinIO’s optimized implementation of the HighwayHash algorithm ensures that it
never returns corrupted data: it detects and heals corrupted objects on the
fly. MinIO ensures integrity from end to end by computing a hash on WRITE and
verifying it on READ from the application, across the network, and to the
memory/drive. The implementation is designed for speed and can achieve hashing
speeds over 10 GB/sec on a single core on Intel CPUs.
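The hash-on-write, verify-on-read pattern can be sketched in a few lines.
HighwayHash is not in the Python standard library, so this example substitutes
``hashlib.blake2b`` as a stand-in. The in-memory ``store`` dictionary and the
function names are likewise assumptions for illustration, not MinIO's on-disk
format.

```python
import hashlib


def checksum(block):
    """Hash a block (BLAKE2b here, standing in for HighwayHash)."""
    return hashlib.blake2b(block, digest_size=32).hexdigest()


def write_block(store, key, block):
    """Store the block together with the hash computed at write time."""
    store[key] = (block, checksum(block))


def read_block(store, key):
    """Return the block only if it still matches its stored hash."""
    block, expected = store[key]
    if checksum(block) != expected:
        raise IOError(f"bitrot detected in block {key!r}")
    return block


store = {}
write_block(store, "obj/part.1", b"hello world")
assert read_block(store, "obj/part.1") == b"hello world"

# Simulate silent corruption: flip one bit of the stored block.
block, digest = store["obj/part.1"]
store["obj/part.1"] = (bytes([block[0] ^ 0x01]) + block[1:], digest)
try:
    read_block(store, "obj/part.1")
except IOError as err:
    print(err)
```

In this sketch a failed check simply raises an error. In an erasure-coded
deployment, a block that fails verification can instead be rebuilt from the
remaining data and parity blocks.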