From 0a47beb2528a4b6d8c4f9f2fa67aeedae8d94f39 Mon Sep 17 00:00:00 2001 From: ravindk89 Date: Thu, 12 Nov 2020 20:35:00 -0500 Subject: [PATCH] Erasure Coding First Pass Eco fixes more fixes --- source/introduction/bitrot-protection.rst | 18 -- source/introduction/deployment-topologies.rst | 67 ----- source/introduction/erasure-coding.rst | 19 -- source/introduction/minio-overview.rst | 79 ++++- source/minio-features/bucket-versioning.rst | 2 +- source/minio-features/erasure-coding.rst | 280 ++++++++++++++++++ source/minio-features/overview.rst | 1 + source/minio-server/minio-server.rst | 71 ++++- 8 files changed, 409 insertions(+), 128 deletions(-) delete mode 100644 source/introduction/bitrot-protection.rst delete mode 100644 source/introduction/deployment-topologies.rst delete mode 100644 source/introduction/erasure-coding.rst create mode 100644 source/minio-features/erasure-coding.rst diff --git a/source/introduction/bitrot-protection.rst b/source/introduction/bitrot-protection.rst deleted file mode 100644 index 2e3fc74b..00000000 --- a/source/introduction/bitrot-protection.rst +++ /dev/null @@ -1,18 +0,0 @@ -.. _minio-bitrot-protection: - -================= -Bitrot Protection -================= - -Silent data corruption or bitrot is a serious problem faced by disk drives -resulting in data getting corrupted without the user’s knowledge. The reasons -are manifold (ageing drives, current spikes, bugs in disk firmware, phantom -writes, misdirected reads/writes, driver errors, accidental overwrites) but the -result is the same - compromised data. - -MinIO’s optimized implementation of the HighwayHash algorithm ensures that it -will never read corrupted data - it captures and heals corrupted objects on the -fly. Integrity is ensured from end to end by computing a hash on READ and -verifying it on WRITE from the application, across the network and to the -memory/drive. The implementation is designed for speed and can achieve hashing -speeds over 10 GB/sec on a single core on Intel CPUs. \ No newline at end of file diff --git a/source/introduction/deployment-topologies.rst b/source/introduction/deployment-topologies.rst deleted file mode 100644 index f4d25afd..00000000 --- a/source/introduction/deployment-topologies.rst +++ /dev/null @@ -1,67 +0,0 @@ -===================== -Deployment Topologies -===================== - -.. default-domain:: minio - -MinIO supports three deployment topologies: - -.. list-table:: - :widths: 30 70 - :header-rows: 1 - - * - Deployment Type - - Description - - * - :ref:`Standalone ` - - A single MinIO server. - - Standalone deployments are ideal for local development and evaluation. - - * - :ref:`Distributed ` - - Multiple MinIO servers allow for horizontal scaling of storage while - allowing applications to treat the deployment as a single MinIO - instance. - - Distributed deployments are ideal for production environments. - - * - :ref:`Active-Active ` - - Multiple distributed deployments with intra-deployment - replication to synchronize :ref:`objects ` across - deployments. - - Active-Active Distributed deployments are ideal for production - environments with globally distributed applications, where applications - prefer routing to the geographically-nearest MinIO instance. - -.. _minio-deployment-standalone: - -Standalone Deployment ---------------------- - -TBD: -- Add a diagram of a standalone deployment -- List the drawbacks (if any) -- Link to deployment tutorials (kubernetes, bare-metal) - -.. _minio-deployment-distributed: -.. 
_minio-zones: - -Distributed Deployment ----------------------- - -TBD: -- Add a diagram of a distributed deployment -- List the drawbacks (if any) -- Link to deployment tutorials (kubernetes, bare-metal) -- Discuss horizontal expansion / zones - -.. _minio-deployment-active-active: - -Active-Active -------------- - -TBD: -- Add a diagram of a distributed deployment -- List the drawbacks (if any) -- Link to deployment tutorials (kubernetes, bare-metal) \ No newline at end of file diff --git a/source/introduction/erasure-coding.rst b/source/introduction/erasure-coding.rst deleted file mode 100644 index 67cdfe7b..00000000 --- a/source/introduction/erasure-coding.rst +++ /dev/null @@ -1,19 +0,0 @@ -.. _minio-erasure-coding: - -============== -Erasure Coding -============== - -.. default-domain:: minio - -MinIO protects data with per-object, inline erasure coding, which is written in -assembly code to deliver the highest performance possible. MinIO uses -Reed-Solomon code to stripe objects into `n/2` data and ``n/2`` parity blocks - -although these can be configured to any desired redundancy level. - -This means that in a 12 drive setup, an object is sharded across as 6 data and 6 -parity blocks. Even if you lose as many as 5 ((n/2)–1) drives, be it parity or -data, you can still reconstruct the data reliably from the remaining drives. -MinIO's implementation ensures that objects can be read or new objects are -written even if multiple devices are lost or unavailable. Finally, MinIO's -erasure code is at the object level and can heal one object at a time. \ No newline at end of file diff --git a/source/introduction/minio-overview.rst b/source/introduction/minio-overview.rst index fe6867c6..148e25bc 100644 --- a/source/introduction/minio-overview.rst +++ b/source/introduction/minio-overview.rst @@ -46,6 +46,78 @@ following: 2020-01-02-blog-comments.json 2020-01-02-blog-comments.json +Deployment Architecture +----------------------- + +The following diagram describes the individual components in a MinIO +deployment: + + ServerSet -> Cluster > + +:ref:`Erasure Set ` + A set of disks that supports MinIO :ref:`Erasure Coding + `. Erasure Coding provides high availability, + reliability, and redundancy of data stored on a MinIO deployment. + + MinIO divides objects into chunks and evenly distributes them among each + drive in the Erasure Set. MinIO can continue seamlessly serving read and + write requests despite the loss of any single drive. At the highest + redundancy levels, MinIO can serve read requests with minimal performance + impact despite the loss of up to half (``N/2``) of the total drives in the + deployment. + +.. _minio-intro-server-set: + +:ref:`Server Set ` + A set of MinIO :mc-cmd:`minio server` nodes which pool their drives and + resources for supporting object storage/retrieval requests. The + :mc-cmd:`~minio server HOSTNAME` argument passed to the + :mc-cmd:`minio server` command represents a Server Set: + + .. code-block:: shell + + minio server https://minio{1...4}.example.net/mnt/disk{1...4} + + | Server Set | + + The above example describes a single Server Set with + 4 :mc:`minio server` nodes and 4 drives each for a total of 16 drives. + MinIO requires starting each :mc:`minio server` in the set with the same + startup command to enable awareness of all set peers. + + See :mc-cmd:`minio server` for complete syntax and usage. 
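+
+   Once every node in the set is online, one way to sanity-check the
+   deployment is to query it with the :mc:`mc` tool. The following is a
+   minimal sketch that assumes you have already created an :mc:`mc` alias
+   named ``myminio`` for the deployment:
+
+   .. code-block:: shell
+
+      # List each node in the Server Set along with its drive status.
+      mc admin info myminio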
+
+   MinIO calculates the size and number of Erasure Sets in the Server Set based
+   on the total number of drives in the set *and* the number of :mc:`minio`
+   servers in the set. See :ref:`minio-ec-erasure-set` for more information.
+
+.. _minio-intro-cluster:
+
+:ref:`Cluster <minio-intro-cluster>`
+   The whole MinIO deployment consisting of one or more Server Sets. Each
+   :mc-cmd:`~minio server HOSTNAME` argument passed to the
+   :mc-cmd:`minio server` command represents one Server Set:
+
+   .. code-block:: shell
+
+      minio server https://minio{1...4}.example.net/mnt/disk{1...4} \
+                   https://minio{5...8}.example.net/mnt/disk{1...4}
+
+      |                    Server Set                     |
+
+   The above example describes two Server Sets, each consisting of 4
+   :mc:`minio server` nodes with 4 drives each for a total of 32 drives.
+
+   Server Set expansion is a function of Horizontal Scaling, where each new set
+   expands the cluster storage and compute resources. Server Set expansion
+   is not intended to support migrating existing sets to newer hardware.
+
+   MinIO Standalone clusters consist of a single Server Set with a single
+   :mc:`minio server` node. Standalone clusters are best suited for initial
+   development and evaluation. MinIO strongly recommends production
+   clusters consist of a *minimum* of 4 :mc:`minio server` nodes in a
+   Server Set.
+
 Deploying MinIO
 ---------------
 
@@ -57,10 +129,3 @@ or containerized environments, install and run the :mc:`minio server` on each
 host in the MinIO deployment. See :ref:`minio-baremetal` for more
 information.
 
-.. toctree::
-   :hidden:
-   :titlesonly:
-
-   /introduction/deployment-topologies.rst
-   /introduction/erasure-coding.rst
-   /introduction/bitrot-protection.rst
\ No newline at end of file
diff --git a/source/minio-features/bucket-versioning.rst b/source/minio-features/bucket-versioning.rst
index e4a03b5a..aeb2992f 100644
--- a/source/minio-features/bucket-versioning.rst
+++ b/source/minio-features/bucket-versioning.rst
@@ -38,7 +38,7 @@ Enable Bucket Versioning
 
 Enabling bucket versioning on a MinIO deployment requires that the deployment
 have *at least* four disks. Specifically, Bucket Versioning depends on
-:doc:`Erasure Coding `. For MinIO deployments that
+:ref:`Erasure Coding <minio-erasure-coding>`. For MinIO deployments that
 meet the disk requirements, use the :mc-cmd:`mc version enable` command to
 enable versioning on a specific bucket.
 
diff --git a/source/minio-features/erasure-coding.rst b/source/minio-features/erasure-coding.rst
new file mode 100644
index 00000000..09047bdd
--- /dev/null
+++ b/source/minio-features/erasure-coding.rst
@@ -0,0 +1,280 @@
+.. _minio-erasure-coding:
+
+==============
+Erasure Coding
+==============
+
+.. default-domain:: minio
+
+.. contents:: Table of Contents
+   :local:
+   :depth: 2
+
+MinIO Erasure Coding is a data redundancy and availability feature that allows
+MinIO deployments to automatically reconstruct objects on the fly despite the
+loss of multiple drives or nodes in the cluster. Erasure Coding provides
+object-level healing with less overhead than adjacent technologies such as
+RAID or replication.
+
+Erasure Coding splits objects into data and parity blocks, where parity blocks
+support reconstruction of missing or corrupted data blocks. MinIO distributes
+both data and parity blocks across :mc:`minio server` nodes and drives in an
+:ref:`Erasure Set <minio-ec-erasure-set>`. Depending on the configured parity,
+number of nodes, and number of drives per node in the Erasure Set, MinIO can
+tolerate the loss of up to half (``N/2``) of the drives and still retrieve
+stored objects.
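+
+You can inspect the parity configured for a deployment by querying the
+``storage_class`` configuration subsystem. The following is a minimal sketch
+that assumes an :mc:`mc` alias named ``myminio`` for the deployment; see the
+Storage Classes section below for details on the available settings:
+
+.. code-block:: shell
+
+   # Print the current STANDARD and REDUCED_REDUNDANCY parity settings.
+   mc admin config get myminio storage_class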
+
+For example, consider the following small-scale MinIO deployment consisting of a
+single :ref:`Server Set <minio-intro-server-set>` with 4 :mc:`minio server`
+nodes. Each node in the deployment has 4 locally attached ``1Ti`` drives for a
+total of 16 drives:
+
+
+
+MinIO creates :ref:`Erasure Sets <minio-ec-erasure-set>` by dividing the total
+number of drives in the deployment into sets consisting of between 4 and 16
+drives each. In the example deployment, the largest possible Erasure Set size
+that evenly divides into the total number of drives is ``16``:
+
+
+
+MinIO uses a Reed-Solomon algorithm to split objects into data and parity blocks
+based on the size of the Erasure Set. MinIO then uniformly distributes the
+data and parity blocks across the Erasure Set drives such that each drive
+in the set contains no more than one block per object. MinIO uses
+the ``EC:N`` notation to refer to the number of parity blocks (``N``) in the
+Erasure Set.
+
+
+
+The number of parity blocks in a deployment controls the deployment's relative
+data redundancy. Higher levels of parity allow for higher tolerance of drive
+loss at the cost of total available storage. For example, using ``EC:4`` in the
+example deployment results in 12 data blocks and 4 parity blocks. The parity
+blocks consume storage that would otherwise hold data, reducing the total
+usable storage. *However*, the parity blocks allow MinIO to reconstruct an
+object from any 12 of its 16 blocks, increasing resilience to the corruption or
+loss of up to 4 drives.
+
+The following table lists the outcome of varying EC levels on the example
+deployment:
+
+.. list-table:: Outcome of Parity Settings on a 16-Drive MinIO Cluster
+   :header-rows: 1
+   :widths: 20 20 20 20 20
+   :width: 100%
+
+   * - Parity
+     - Total Storage
+     - Storage Ratio
+     - Minimum Drives for Read Operations
+     - Minimum Drives for Write Operations
+
+   * - ``EC:4`` (Default)
+     - 12 Tebibytes
+     - 0.750
+     - 12
+     - 13
+
+   * - ``EC:6``
+     - 10 Tebibytes
+     - 0.625
+     - 10
+     - 11
+
+   * - ``EC:8``
+     - 8 Tebibytes
+     - 0.500
+     - 8
+     - 9
+
+- For more information on Erasure Sets, see :ref:`minio-ec-erasure-set`.
+
+- For more information on selecting Erasure Code Parity, see
+  :ref:`minio-ec-parity`.
+
+- For more information on Erasure Code Object Healing, see
+  :ref:`minio-ec-object-healing`.
+
+.. _minio-ec-erasure-set:
+
+Erasure Sets
+------------
+
+An *Erasure Set* is a set of drives in a MinIO deployment that support
+Erasure Coding. MinIO evenly distributes object data and parity blocks among
+the drives in the Erasure Set.
+
+MinIO calculates the number and size of *Erasure Sets* by dividing the total
+number of drives in the :ref:`Server Set <minio-intro-server-set>` into sets
+consisting of between 4 and 16 drives each. MinIO considers two factors when
+selecting the Erasure Set size:
+
+- The Greatest Common Divisor (GCD) of the total number of drives.
+
+- The number of :mc:`minio server` nodes in the Server Set.
+
+For an even number of nodes, MinIO uses the GCD to calculate the Erasure Set
+size and ensure the minimum number of Erasure Sets possible. For an odd number
+of nodes, MinIO selects a common divisor that results in an odd number of
+Erasure Sets to facilitate more uniform distribution of Erasure Set drives
+among nodes in the Server Set.
+
+For example, consider a Server Set consisting of 4 nodes with 8 drives each
+for a total of 32 drives. Using the GCD, MinIO creates 2 Erasure Sets of 16
+drives each with uniform distribution of Erasure Set drives across all 4 nodes.
+
+Now consider a Server Set consisting of 5 nodes with 8 drives each for a total
+of 40 drives. Using the GCD, MinIO would create 4 Erasure Sets with 10 drives
+each. However, this would result in an uneven distribution, with one node
+contributing more drives to the Erasure Sets than the others. MinIO instead
+creates 5 Erasure Sets with 8 drives each to ensure a uniform distribution of
+Erasure Set drives per node.
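+
+For illustration, a Server Set like the 40-drive example above corresponds to
+a startup command similar to the following. The hostnames and mount paths are
+hypothetical placeholders:
+
+.. code-block:: shell
+
+   # 5 nodes with 8 drives each for a total of 40 drives.
+   minio server https://minio{1...5}.example.net/mnt/disk{1...8}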
+
+MinIO generally recommends maintaining an even number of nodes in a Server Set
+to simplify calculating the number and size of Erasure Sets in the
+Server Set.
+
+.. _minio-ec-parity:
+
+Erasure Code Parity (``EC:N``)
+------------------------------
+
+MinIO uses a Reed-Solomon algorithm to split objects into data and parity blocks
+based on the size of the Erasure Set. MinIO uses parity blocks to automatically
+heal damaged or missing data blocks when reconstructing an object. MinIO uses
+the ``EC:N`` notation to refer to the number of parity blocks (``N``) in the
+Erasure Set.
+
+MinIO uses a hash of an object's name to determine into which Erasure Set to
+store that object. MinIO always uses that Erasure Set for objects with a
+matching name. For example, MinIO stores all :ref:`versions
+<minio-bucket-versioning>` of an object in the same Erasure Set.
+
+After MinIO selects an object's Erasure Set, it divides the object based on the
+number of drives in the set and the configured parity. MinIO creates:
+
+- ``(Erasure Set Drives) - EC:N`` Data Blocks, *and*
+- ``EC:N`` Parity Blocks.
+
+MinIO randomly and uniformly distributes the data and parity blocks across
+drives in the Erasure Set with *no overlap*. While a drive may contain both data
+and parity blocks for multiple unique objects, a single unique object has no
+more than one block per drive in the set. For versioned objects, MinIO selects
+the same drives for both data and parity storage while maintaining zero overlap
+on any single drive.
+
+The specified parity for an object also dictates the minimum number of Erasure
+Set drives ("Quorum") required for MinIO to either read or write that object:
+
+Read Quorum
+   The minimum number of Erasure Set drives required for MinIO to
+   serve read operations. MinIO can automatically reconstruct an object
+   with corrupted or missing data blocks if enough drives are online to
+   provide Read Quorum for that object.
+
+   MinIO Read Quorum is ``DRIVES - (EC:N)``.
+
+Write Quorum
+   The minimum number of Erasure Set drives required for MinIO
+   to serve write operations. MinIO requires enough available drives to
+   eliminate the risk of split-brain scenarios.
+
+   MinIO Write Quorum is ``DRIVES - (EC:N-1)``.
+
+Storage Classes
+~~~~~~~~~~~~~~~
+
+MinIO supports storage classes with Erasure Coding to allow applications to
+specify per-object :ref:`parity <minio-ec-parity>`. Each storage class specifies
+an ``EC:N`` parity setting to apply to objects created with that class.
+
+MinIO storage classes are *distinct* from Amazon Web Services :s3-docs:`storage
+classes `. MinIO storage classes define
+*parity settings per object*, while AWS storage classes define
+*storage tiers per object*.
+
+MinIO provides the following two storage classes:
+
+``STANDARD``
+   The ``STANDARD`` storage class is the default class for all objects.
+
+   You can configure the ``STANDARD`` storage class parity using either:
+
+   - The :envvar:`MINIO_STORAGE_CLASS_STANDARD` environment variable, *or*
+   - The :mc:`mc admin config` command to modify the ``storage_class.standard``
+     configuration setting.
+
+   Starting with , MinIO defaults the ``STANDARD`` storage class to
+   ``EC:4``.
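+
+   For example, the following is a minimal sketch of setting ``STANDARD``
+   parity to ``EC:6``. The alias name ``myminio`` and the choice of parity are
+   illustrative assumptions:
+
+   .. code-block:: shell
+
+      # Set the environment variable before starting each minio server node:
+      export MINIO_STORAGE_CLASS_STANDARD="EC:6"
+
+      # Or update the setting on a running deployment using mc:
+      mc admin config set myminio storage_class standard=EC:6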
+
+   The maximum value is half of the total drives in the
+   :ref:`Erasure Set <minio-ec-erasure-set>`.
+
+   ``STANDARD`` parity *must* be greater than or equal to the
+   ``REDUCED_REDUNDANCY`` parity. If ``REDUCED_REDUNDANCY`` is unset,
+   ``STANDARD`` parity *must* be greater than 2.
+
+``REDUCED_REDUNDANCY``
+   The ``REDUCED_REDUNDANCY`` storage class allows creating objects with
+   lower parity than ``STANDARD``.
+
+   You can configure the ``REDUCED_REDUNDANCY`` storage class parity using
+   either:
+
+   - The :envvar:`MINIO_STORAGE_CLASS_REDUCED` environment variable, *or*
+   - The :mc:`mc admin config` command to modify the
+     ``storage_class.rrs`` configuration setting.
+
+   The default value is ``EC:2``.
+
+   ``REDUCED_REDUNDANCY`` parity *must* be less than or equal to ``STANDARD``.
+   If ``STANDARD`` is unset, ``REDUCED_REDUNDANCY`` must be less than half of
+   the total drives in the :ref:`Erasure Set <minio-ec-erasure-set>`.
+
+   ``REDUCED_REDUNDANCY`` is not supported for MinIO deployments with
+   4 or fewer drives.
+
+MinIO references the ``x-amz-storage-class`` header in request metadata for
+determining which storage class to assign an object. The specific syntax for
+setting this header depends on your preferred method of interfacing with the
+MinIO server.
+
+- For the :mc:`mc` command line tool, certain commands include a specific
+  option for setting the storage class. For example, the :mc:`mc cp` command
+  has the :mc-cmd-option:`~mc cp storage-class` option for specifying the
+  storage class to assign to the object being copied.
+
+- For MinIO SDKs, the SDK client object has specific methods for setting
+  request headers. For example, the ``minio-go`` SDK ``Client.PutObject``
+  method takes a ``PutObjectOptions`` data structure as a parameter.
+  The ``PutObjectOptions`` data structure includes the ``StorageClass``
+  option for specifying the storage class to assign to the object being
+  created.
+
+
+.. _minio-ec-object-healing:
+
+Object Healing
+--------------
+
+TODO
+
+.. _minio-ec-bitrot-protection:
+
+BitRot Protection
+-----------------
+
+TODO: Rewrite with more detail.
+
+Silent data corruption or bitrot is a serious problem faced by disk drives,
+resulting in data getting corrupted without the user's knowledge. The reasons
+are manifold (ageing drives, current spikes, bugs in disk firmware, phantom
+writes, misdirected reads/writes, driver errors, accidental overwrites), but
+the result is the same: compromised data.
+
+MinIO's optimized implementation of the HighwayHash algorithm ensures that it
+never reads corrupted data: MinIO captures and heals corrupted objects on the
+fly. MinIO ensures integrity from end to end by computing a hash when an
+object is written and verifying it when the object is read, from the
+application, across the network, and to the memory or drive. The
+implementation is designed for speed and can achieve hashing speeds over
+10 GB/sec on a single core on Intel CPUs.
\ No newline at end of file diff --git a/source/minio-features/overview.rst b/source/minio-features/overview.rst index 813fbb29..ca2c6def 100644 --- a/source/minio-features/overview.rst +++ b/source/minio-features/overview.rst @@ -33,3 +33,4 @@ The following table lists MinIO features and their corresponding documentation: /minio-features/bucket-notifications /minio-features/bucket-versioning + /minio-features/erasure-coding \ No newline at end of file diff --git a/source/minio-server/minio-server.rst b/source/minio-server/minio-server.rst index ba41f971..030fe7cb 100644 --- a/source/minio-server/minio-server.rst +++ b/source/minio-server/minio-server.rst @@ -58,36 +58,40 @@ The command accepts the following arguments: The hostname of a :mc:`minio server` process. For standalone deployments, this field is *optional*. You can start a - standalone :mc:`minio ` process with only the + standalone :mc:`~minio server` process with only the :mc-cmd:`~minio server DIRECTORIES` argument. - For distributed deployments, specify the hostname of each - :mc:`minio ` in the deployment. + For distributed deployments, specify the hostname of each :mc:`minio server` + in the deployment. The group of :mc:`minio server` processes represent a + single :ref:`Server Set `. :mc-cmd:`~minio server HOSTNAME` supports MinIO expansion notation - ``{x...y}`` to denote a sequential series of hostnames. For example, + ``{x...y}`` to denote a sequential series of hostnames. MinIO *requires* + sequential hostnames to identify each :mc:`minio server` process in the set. + + For example, ``https://minio{1...4}.example.net`` expands to: - ``https://minio1.example.net`` - ``https://minio2.example.net`` - ``https://minio3.example.net`` - ``https://minio4.example.net`` + + You must run the :mc:`minio server` command with the *same* combination of + :mc-cmd:`~minio server HOSTNAME` and :mc-cmd:`~minio server DIRECTORIES` on + each host in the Server Set. - The set of :mc:`minio server` processes in :mc-cmd:`~minio server HOSTNAME` - define a single :ref:`zone `. MinIO *requires* sequential - hostnames to identify each :mc:`minio server` process in the zone. - - Each additional ``HOSTNAME/DIRECTORIES`` pair denotes an additional zone for - the purpose of horizontal expansion of the MinIO deployment. For more - information on zones, see :ref:`minio-zones`. + Each additional ``HOSTNAME/DIRECTORIES`` pair denotes an additional Server + Set for the purpose of horizontal expansion of the MinIO deployment. For more + information on Server Sets, see :ref:`Server Set `. .. mc-cmd:: DIRECTORIES - The directories or disks the :mc:`minio server` process uses as the + The directories or drives the :mc:`minio server` process uses as the storage backend. :mc-cmd:`~minio server DIRECTORIES` supports MinIO expansion notation - ``{x...y}`` to denote a sequential series of folders or disks. For example, + ``{x...y}`` to denote a sequential series of folders or drives. For example, ``/mnt/disk{1...4}`` expands to: - ``/mnt/disk1`` @@ -98,14 +102,16 @@ The command accepts the following arguments: The :mc-cmd:`~minio server DIRECTORIES` path(s) *must* be empty when first starting the :mc:`minio ` process. - The :mc:`minio server` process requires *at least* 4 disks or directories + The :mc:`minio server` process requires *at least* 4 drives or directories to enable :ref:`erasure coding `. .. 
important:: - MinIO recommends locally-attached disks, where the + MinIO recommends locally-attached drives, where the :mc-cmd:`~minio server DIRECTORIES` path points to each disk on the - host machine. + host machine. MinIO recommends *against* using network-attached + storage, as network latency reduces performance of those drives + compared to locally-attached storage. For development or evaluation, you can specify multiple logical directories or partitions on a single physical volume to enable erasure @@ -321,3 +327,36 @@ Root Credentials the server configuration with the new credentials. After the process restarts successfully, you can restart it without :envvar:`MINIO_SECRET_KEY_OLD`. + +Storage Class +~~~~~~~~~~~~~ + +These environment variables configure the :ref:`parity ` +to use for objects written to the MinIO cluster. + +MinIO Storage Classes are distinct from AWS Storage Classes, where the latter +refers to the specific storage tier on which to store a given object. + +.. envvar:: MINIO_STORAGE_CLASS_STANDARD + + The number of :ref:`parity blocks ` to create for + objects with the standard (default) storage class. MinIO uses the + ``EC:N`` notation to refer to the number of parity blocks (``N``). + This environment variable only applies to deployments with + :ref:`Erasure Coding ` enabled. + + Defaults to ``4``. + +.. envvar:: MINIO_STORAGE_CLASS_REDUCED + + The number of :ref:`parity blocks ` to create for objects + with the reduced redundancy storage class. MinIO uses the ``EC:N`` + notation to refer to the number of parity blocks (``N``). This environment + variable only applies to deployments with :ref:`Erasure Coding + ` enabled. + + Defaults to ``2``. + +.. envvar:: MINIO_STORAGE_CLASS_COMMENT + + Adds a comment to the storage class settings. \ No newline at end of file
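+
+The following is a minimal sketch of setting all three storage class
+environment variables before starting the :mc:`minio server` process. The
+parity values, comment text, hostnames, and paths are illustrative
+assumptions:
+
+.. code-block:: shell
+
+   # Parity settings use the EC:N notation.
+   export MINIO_STORAGE_CLASS_STANDARD="EC:4"
+   export MINIO_STORAGE_CLASS_REDUCED="EC:2"
+   export MINIO_STORAGE_CLASS_COMMENT="default parity settings"
+
+   minio server https://minio{1...4}.example.net/mnt/disk{1...4}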