DOCS-412: Node and Drive Recovery Procedures
Klaus suggestions
Adding Eco's suggestions
Co-authored-by: Eco <41090896+eco-minio@users.noreply.github.com>
CR updates
Additional review comments
final work
.gitignore (1 addition)

@@ -4,3 +4,4 @@ venv
 __pycache__
 node_modules
 npm-debug.log
+.python-version
@@ -141,5 +141,6 @@ rst_prolog = """
 .. |minio-latest| replace:: RELEASE.2022-01-08T03-11-54Z
 .. |minio-rpm| replace:: https://dl.min.io/server/minio/release/linux-amd64/minio-20220108031154.0.0.x86_64.rpm
 .. |minio-deb| replace:: https://dl.min.io/server/minio/release/linux-amd64/minio_20220108031154.0.0_amd64.deb
+.. |subnet| replace:: `MinIO SUBNET <https://min.io/pricing?jmp=docs>`__

 """
@@ -141,5 +141,6 @@ rst_prolog = """
 .. |minio-latest| replace:: MINIOLATEST
 .. |minio-rpm| replace:: RPMURL
 .. |minio-deb| replace:: DEBURL
+.. |subnet| replace:: `MinIO SUBNET <https://min.io/pricing?jmp=docs>`__

 """
@@ -197,10 +197,18 @@ Console.

 .. start-local-jbod-desc

-MinIO strongly recommends local :abbr:`JBOD (Just a Bunch of Disks)` arrays with
-XFS-formatted disks for best performance. RAID or similar technologies do not
-provide additional resilience or availability benefits when used with
-distributed MinIO deployments, and typically reduce system performance.
+MinIO strongly recommends direct-attached :abbr:`JBOD (Just a Bunch of Disks)`
+arrays with XFS-formatted disks for best performance.
+
+- Direct-Attached Storage (DAS) has significant performance and consistency
+  advantages over networked storage (NAS, SAN, NFS).
+
+- Deployments using non-XFS filesystems (ext4, btrfs, zfs) tend to have
+  lower performance while exhibiting unexpected or undesired behavior.
+
+- RAID or similar technologies do not provide additional resilience or
+  availability benefits when used with distributed MinIO deployments, and
+  typically reduce system performance.

 Ensure all nodes in the |deployment| use the same type (NVMe, SSD, or HDD) of
 drive with identical capacity (e.g. ``N`` TB). MinIO does not distinguish drive
@@ -98,6 +98,8 @@ You can specify the entire range of hostnames using the expansion notation

 Configuring DNS to support MinIO is out of scope for this procedure.

+.. _deploy-minio-distributed-prereqs-storage:
+
 Local JBOD Storage with Sequential Mounts
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

@@ -111,7 +113,7 @@ Local JBOD Storage with Sequential Mounts
    :class: note

    MinIO's strict **read-after-write** and **list-after-write** consistency
-   model requires local disk filesystems (``xfs``, ``ext4``, etc.).
+   model requires local disk filesystems.

    MinIO cannot provide consistency guarantees if the underlying storage
    volumes are NFS or a similar network-attached storage volume.
@@ -129,4 +129,5 @@ source-based installations in production environments.
    /installation/deploy-minio-distributed
    /installation/expand-minio-distributed
    /installation/deploy-minio-standalone
    /installation/upgrade-minio
+   /installation/restore-minio
source/installation/restore-minio.rst (new file, 261 lines)

@@ -0,0 +1,261 @@
.. _minio-restore-hardware-failure:

==============================
Recover after Hardware Failure
==============================

.. default-domain:: minio

.. contents:: Table of Contents
   :local:
   :depth: 1

Distributed MinIO deployments rely on :ref:`Erasure Coding
<minio-erasure-coding>` to provide built-in tolerance for multiple disk or node
failures. Depending on the deployment topology and the selected erasure code
parity, MinIO can tolerate the loss of up to half the drives or nodes in the
deployment while maintaining read access ("read quorum") to objects.

The following table lists the typical types of failure in a MinIO deployment
and links to procedures for recovering from each:

.. list-table::
   :header-rows: 1
   :widths: 30 70
   :width: 100%

   * - Failure Type
     - Description

   * - :ref:`Drive Failure <minio-restore-hardware-failure-drive>`
     - MinIO supports hot-swapping failed drives with new healthy drives.

   * - :ref:`Node Failure <minio-restore-hardware-failure-node>`
     - Shortly after a node rejoins the deployment, MinIO detects it and begins
       proactively healing the data previously stored on that node.

Since MinIO can operate in a degraded state without significant performance
loss, administrators can schedule hardware replacement in proportion to the rate
of hardware failure. "Normal" failure rates (single drive or node failure) may
allow for a more reasonable replacement timeframe, while "critical" failure
rates (multiple drives or nodes) may require a faster response.

For nodes with one or more drives that are either partially failed or operating
in a degraded state (increasing disk errors, SMART warnings, timeouts in MinIO
logs, etc.), you can safely unmount the drive *if* the cluster has sufficient
remaining healthy drives to maintain
:ref:`read and write quorum <minio-ec-parity>`. Missing drives are less
disruptive to the deployment than drives that consistently produce read and
write errors.
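
Before unmounting a degraded drive, you can check how many drives each server
pool currently reports online. The following is a minimal sketch that assumes
you have configured an ``mc`` alias named ``myminio`` (a hypothetical name) for
the deployment:

.. code-block:: shell

   # Summarize server and drive status, including online/offline drive counts
   mc admin info myminio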

.. admonition:: MinIO Professional Support
   :class: note

   |subnet| users can `log in <https://subnet.min.io/>`__ and create a new issue
   related to drive or node failures. Coordination with MinIO Engineering via
   SUBNET can ensure successful recovery operations of production MinIO
   deployments, including root-cause analysis and health diagnostics.

   Community users can seek support on the `MinIO Community Slack
   <https://minio.slack.com>`__. Community Support is best-effort only and has
   no SLAs around responsiveness.

.. _minio-restore-hardware-failure-drive:

Drive Failure Recovery
----------------------

MinIO supports hot-swapping failed drives with new healthy drives. MinIO
detects and heals the replaced drives without requiring any node or
deployment-level restart. MinIO healing occurs only on the replaced drive(s)
and does not typically impact deployment performance.

MinIO healing ensures consistency and correctness of all data restored onto the
drive. **Do not** attempt to manually recover or migrate data from the failed
drive onto the new healthy drive.

The following steps provide a more detailed walkthrough of drive replacement.
These steps assume a MinIO deployment where each node manages drives using
``/etc/fstab`` with per-drive labels as per the
:ref:`documented prerequisites <minio-installation>`.

1) Unmount the failed drive(s)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Unmount each failed drive using ``umount``. For example, the following
command unmounts the drive at ``/dev/sdb``:

.. code-block:: shell

   umount /dev/sdb
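
If you are unsure which device backs a given mount point, a quick check before
unmounting can help. This sketch assumes the drive is mounted at ``/mnt/disk1``
(a hypothetical path matching the later ``fstab`` example):

.. code-block:: shell

   # Show the source device, filesystem type, and options for the mount point
   findmnt /mnt/disk1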

2) Replace the failed drive(s)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Remove the failed drive(s) from the node hardware and replace them with known
healthy drive(s). Replacement drives *must* meet the following requirements:

- :ref:`XFS formatted <deploy-minio-distributed-prereqs-storage>` and empty.
- Same drive type (e.g. HDD, SSD, NVMe).
- Equal or greater performance.
- Equal or greater capacity.

Using a replacement drive with greater capacity does not increase the total
cluster storage. MinIO uses the *smallest* drive's capacity as the ceiling for
all drives in the :ref:`Server Pool <minio-intro-server-pool>`.

The following command formats a drive as XFS and assigns it a label to match
the failed drive:

.. code-block:: shell

   mkfs.xfs /dev/sdb -L DISK1

MinIO **strongly recommends** using label-based mounting to ensure consistent
drive order that persists through system restarts.
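
To confirm that the new filesystem carries the expected label before updating
``fstab``, you can inspect the device metadata. This is a minimal check that
assumes the replacement drive is at ``/dev/sdb`` as in the example above:

.. code-block:: shell

   # Print the filesystem type, label, and UUID for the replacement drive
   blkid /dev/sdb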

3) Review and Update ``fstab``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Review the ``/etc/fstab`` file and update as needed such that the entry for
the failed disk points to the newly formatted replacement.

- If using label-based disk assignment, ensure that each label points to the
  correct newly formatted disk.

- If using UUID-based disk assignment, update the UUID for each mount point
  based on the newly formatted disk. You can use ``lsblk`` to view disk UUIDs.

For example, consider the following ``/etc/fstab`` file:

.. code-block:: shell

   $ cat /etc/fstab

   # <file system>  <mount point>  <type>  <options>         <dump>  <pass>
   LABEL=DISK1      /mnt/disk1     xfs     defaults,noatime  0       2
   LABEL=DISK2      /mnt/disk2     xfs     defaults,noatime  0       2
   LABEL=DISK3      /mnt/disk3     xfs     defaults,noatime  0       2
   LABEL=DISK4      /mnt/disk4     xfs     defaults,noatime  0       2

Given the previous example ``mkfs.xfs`` command, no changes are required to
``fstab`` since the replacement disk at ``/mnt/disk1`` uses the same
label ``DISK1`` as the failed disk.
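
If you need to cross-check labels and UUIDs across all drives before
remounting, ``lsblk`` can print them in one view. This is a sketch using
commonly available output columns:

.. code-block:: shell

   # List block devices with their labels, UUIDs, and current mount points
   lsblk -o NAME,LABEL,UUID,MOUNTPOINT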

4) Remount the Replaced Drive(s)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use ``mount -a`` to remount the drives unmounted at the beginning of this
procedure:

.. code-block:: shell
   :class: copyable

   mount -a

The command should remount all of the replaced drives.
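
To verify that the replaced drives remounted at their expected paths, you can
check the reported capacity and usage. ``/mnt/disk1`` below is the hypothetical
mount point from the earlier example:

.. code-block:: shell

   # Confirm the remounted drive reports the expected size and mount point
   df -h /mnt/disk1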

5) Monitor MinIO for Drive Detection and Healing Status
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use the :mc-cmd:`mc admin console` command *or* ``journalctl -u minio`` for
``systemd``-managed installations to monitor the server log output after
remounting drives. The output should include messages identifying each formatted
and empty drive.
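
For example, either of the following can stream the server output. The alias
``myminio`` is a hypothetical name for the deployment; substitute your own:

.. code-block:: shell

   # Stream server logs through the MinIO admin API
   mc admin console myminio

   # Or follow the service logs on a systemd-managed node
   journalctl -u minio -f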

Use :mc-cmd:`mc admin heal` to monitor the overall healing status on the
deployment. MinIO aggressively heals replaced drive(s) to ensure rapid recovery
from the degraded state.
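
One possible status check, again assuming the hypothetical ``myminio`` alias:

.. code-block:: shell

   # Report healing status across the deployment
   mc admin heal myminio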

6) Next Steps
~~~~~~~~~~~~~

Monitor the cluster for any further drive failures. Some drive batches may fail
in close proximity to each other. Deployments seeing higher than expected drive
failure rates should schedule dedicated maintenance around replacing the known
bad batch. Consider using |subnet| to coordinate with MinIO engineering around
guidance for any such operations.

.. _minio-restore-hardware-failure-node:

Node Failure Recovery
---------------------

If a MinIO node suffers complete hardware failure (e.g. loss of all drives,
data, etc.), the node begins healing operations once it rejoins the deployment.
MinIO healing occurs only on the replaced hardware and does not typically impact
deployment performance.

MinIO healing ensures consistency and correctness of all data restored onto the
node. **Do not** attempt to manually recover or migrate data from the failed
node onto the new healthy node.

The replacement node hardware should be substantially similar to the failed
node. There are no negative performance implications to using improved hardware.

The replacement drive hardware should be substantially similar to the failed
drive. For example, replace a failed SSD with another SSD drive of the same
capacity. While you can use drives with larger capacity, MinIO uses the
*smallest* drive's capacity as the ceiling for all drives in the :ref:`Server
Pool <minio-intro-server-pool>`.

The following steps provide a more detailed walkthrough of node replacement.
These steps assume a MinIO deployment where each node has a DNS hostname
as per the :ref:`documented prerequisites <minio-installation>`.

1) Start the Replacement Node
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Ensure the new node has received all necessary security, firmware, and OS
updates as per industry, regulatory, or organizational standards and
requirements.

The new node software configuration *must* match that of the other nodes in the
deployment, including but not limited to the OS and kernel versions and
configurations. Heterogeneous software configurations may result in unexpected
or undesired behavior in the deployment.
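
A quick way to compare the OS and kernel configuration of the new node against
an existing node is to capture the same details on both and compare them. This
is a sketch, not an exhaustive comparison:

.. code-block:: shell

   # Record the kernel version and OS release on each node for comparison
   uname -r
   cat /etc/os-release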

2) Update Hostname for the New Node
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

*Optional:* This step is only required if the replacement node has a
different IP address from the failed host.

Ensure the hostname associated with the failed node now resolves to the new node.

For example, if ``https://minio-1.example.net`` previously resolved to the
failed host, it should now resolve to the new host.
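
You can confirm that DNS now points at the new host before starting MinIO. The
hostname below comes from the example above; substitute your own:

.. code-block:: shell

   # Verify the hostname resolves to the replacement node's IP address
   dig +short minio-1.example.net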

3) Download and Prepare the MinIO Server
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Follow the :ref:`deployment procedure <minio-installation>` to download
and run the MinIO server with a configuration matching that of all other nodes
in the deployment.

- The MinIO server version *must* match across all nodes.

- The MinIO service and environment file configurations *must* match across
  all nodes; see the example check below.
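
For example, the following hedged checks assume a ``systemd``-managed
installation that uses ``/etc/default/minio`` as its environment file; adjust
the path if your deployment stores its configuration elsewhere:

.. code-block:: shell

   # Confirm the server binary version matches the other nodes
   minio --version

   # Compare the environment file contents against an existing node
   cat /etc/default/minio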

4) Rejoin the node to the deployment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Start the MinIO server process on the node and monitor the process output
using :mc-cmd:`mc admin console` or by monitoring the MinIO service logs
using ``journalctl -u minio`` for ``systemd``-managed installations.
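
For a ``systemd``-managed installation, starting and following the service
might look like the following sketch (the unit name ``minio.service`` assumes
the standard service file from the deployment procedure):

.. code-block:: shell

   # Start the MinIO service on the replacement node
   sudo systemctl start minio.service

   # Follow the service logs while the node rejoins the deployment
   journalctl -u minio -f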

The server output should indicate that it has detected the other nodes
in the deployment and begun healing operations.

Use :mc-cmd:`mc admin heal` to monitor overall healing status on the
deployment. MinIO aggressively heals the node to ensure rapid recovery
from the degraded state.

5) Next Steps
~~~~~~~~~~~~~

Continue monitoring the deployment until healing completes. Deployments with
persistent and repeated node failures should schedule dedicated maintenance to
identify the root cause. Consider using |subnet| to coordinate with MinIO
engineering around guidance for any such operations.
@@ -31,6 +31,7 @@ SEC17a-4(f), FINRA 4511(C), and CFTC 1.31(c)-(d) requirements as per
 .. image:: /images/retention/minio-versioning-delete-object.svg
    :alt: Deleting an Object
    :align: center
+   :width: 600px

 MinIO versioning preserves the full history of object mutations.
 However, applications can explicitly delete specific object versions.
@@ -40,6 +41,7 @@ SEC17a-4(f), FINRA 4511(C), and CFTC 1.31(c)-(d) requirements as per
 .. image:: /images/retention/minio-object-locking.svg
    :alt: 30 Day Locked Objects
    :align: center
+   :width: 600px

 Applying a default 30 Day WORM lock to objects in the bucket ensures
 a minimum period of retention and protection for all object versions.
@@ -49,6 +51,7 @@ SEC17a-4(f), FINRA 4511(C), and CFTC 1.31(c)-(d) requirements as per
 .. image:: /images/retention/minio-object-locking-delete.svg
    :alt: Delete Operation in Locked Bucket
    :align: center
+   :width: 600px

 Delete operations follow normal behavior in
 :ref:`versioned buckets <minio-bucket-versioning-delete>`, where MinIO
@@ -61,6 +64,7 @@ SEC17a-4(f), FINRA 4511(C), and CFTC 1.31(c)-(d) requirements as per
 .. image:: /images/retention/minio-object-locking-delete-version.svg
    :alt: Versioned Delete Operation in a Locked Bucket
    :align: center
+   :width: 600px

 MinIO blocks any attempt to delete a specific object version held under
 WORM lock. The earliest possible time after which a client may delete