.. _minio-restore-hardware-failure:

==============================
Recover after Hardware Failure
==============================

.. default-domain:: minio

.. contents:: Table of Contents
   :local:
   :depth: 1

Distributed MinIO deployments rely on :ref:`Erasure Coding
<minio-erasure-coding>` to provide built-in tolerance for multiple disk or node
failures. Depending on the deployment topology and the selected erasure code
parity, MinIO can tolerate the loss of up to half the drives or nodes in the
deployment while maintaining read access ("read quorum") to objects.

The following table lists the typical types of failure in a MinIO deployment
and links to procedures for recovering from each:

.. list-table::
   :header-rows: 1
   :widths: 30 70
   :width: 100%

   * - Failure Type
     - Description

   * - :ref:`Drive Failure <minio-restore-hardware-failure-drive>`
     - MinIO supports hot-swapping failed drives with new healthy drives.

   * - :ref:`Node Failure <minio-restore-hardware-failure-node>`
     - MinIO detects when a node rejoins the deployment and begins proactively
       healing data previously stored on that node.

Since MinIO can operate in a degraded state without significant performance
loss, administrators can schedule hardware replacement in proportion to the
rate of hardware failure. "Normal" failure rates (a single drive or node
failure) may allow for a more relaxed replacement timeframe, while "critical"
failure rates (multiple drives or nodes) may require a faster response.

For nodes with one or more drives that are either partially failed or operating
in a degraded state (increasing disk errors, SMART warnings, timeouts in MinIO
logs, etc.), you can safely unmount the affected drive(s) *if* the deployment
has sufficient remaining healthy drives to maintain
:ref:`read and write quorum <minio-ec-parity>`. Missing drives are less
disruptive to the deployment than drives that reliably produce read and write
errors.

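Before unmounting a degraded drive, you can check the current drive and quorum
status for the deployment. The following is a minimal sketch, assuming an
``mc`` alias named ``myminio`` already points to the deployment:

.. code-block:: shell

   # Review per-node and per-drive status for the deployment
   # (assumes an alias named "myminio" configured via `mc alias set`)
   mc admin info myminio

The output lists each server with its online and offline drive counts, which
helps confirm that enough healthy drives remain before taking a drive offline.
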
.. admonition:: MinIO Professional Support
   :class: note

   |subnet| users can `log in <https://subnet.min.io/>`__ and create a new issue
   related to drive or node failures. Coordination with MinIO Engineering via
   SUBNET can ensure successful recovery operations for production MinIO
   deployments, including root-cause analysis and health diagnostics.

   Community users can seek support on the `MinIO Community Slack
   <https://minio.slack.com>`__. Community Support is best-effort only and has
   no SLAs around responsiveness.

.. _minio-restore-hardware-failure-drive:

Drive Failure Recovery
----------------------

MinIO supports hot-swapping failed drives with new healthy drives. MinIO
detects the new drives and heals them without requiring any node or
deployment-level restart. MinIO healing occurs only on the replaced drive(s)
and does not typically impact deployment performance.

MinIO healing ensures the consistency and correctness of all data restored onto
the drive. **Do not** attempt to manually recover or migrate data from the
failed drive onto the new healthy drive.

The following steps provide a more detailed walkthrough of drive replacement.
These steps assume a MinIO deployment where each node manages drives using
``/etc/fstab`` with per-drive labels as per the
:ref:`documented prerequisites <minio-installation>`.

1) Unmount the failed drive(s)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Unmount each failed drive using ``umount``. For example, the following
command unmounts the drive at ``/dev/sdb``:

.. code-block:: shell

   umount /dev/sdb

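If you are not certain which device backs a given mount point, you can list
the block devices before unmounting. A short sketch with illustrative device
and mount point names:

.. code-block:: shell

   # List block devices with label and mount point to identify the failed drive
   lsblk -o NAME,LABEL,SIZE,MOUNTPOINT

   # Alternatively, unmount by mount point rather than device path
   umount /mnt/disk1
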
2) Replace the failed drive(s)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Remove the failed drive(s) from the node hardware and replace them with known
healthy drive(s). Replacement drives *must* meet the following requirements:

- :ref:`XFS formatted <deploy-minio-distributed-prereqs-storage>` and empty.
- Same drive type (e.g. HDD, SSD, NVMe).
- Equal or greater performance.
- Equal or greater capacity.

Using a replacement drive with greater capacity does not increase the total
cluster storage. MinIO uses the *smallest* drive's capacity as the ceiling for
all drives in the :ref:`Server Pool <minio-intro-server-pool>`.

The following command formats a drive as XFS and assigns it a label to match
the failed drive:

.. code-block:: shell

   mkfs.xfs /dev/sdb -L DISK1

MinIO **strongly recommends** using label-based mounting to ensure consistent
drive order that persists through system restarts.

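You can verify the filesystem type and label on the replacement drive before
mounting it. A minimal check, where the device path and label are illustrative:

.. code-block:: shell

   # Confirm the replacement drive reports XFS and the expected label
   blkid /dev/sdb
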
3) Review and Update ``fstab``
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Review the ``/etc/fstab`` file and update it as needed such that the entry for
the failed disk points to the newly formatted replacement.

- If using label-based disk assignment, ensure that each label points to the
  correct newly formatted disk.

- If using UUID-based disk assignment, update the UUID for each mount point
  based on the newly formatted disk. You can use ``lsblk`` to view disk UUIDs.

For example, consider the following ``/etc/fstab`` file:

.. code-block:: shell

   $ cat /etc/fstab

   # <file system>  <mount point>  <type>  <options>         <dump>  <pass>
   LABEL=DISK1      /mnt/disk1     xfs     defaults,noatime  0       2
   LABEL=DISK2      /mnt/disk2     xfs     defaults,noatime  0       2
   LABEL=DISK3      /mnt/disk3     xfs     defaults,noatime  0       2
   LABEL=DISK4      /mnt/disk4     xfs     defaults,noatime  0       2

Given this example ``fstab``, no changes are required since the replacement
disk mounted at ``/mnt/disk1`` uses the same label ``DISK1`` as the failed
disk.

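If the deployment uses UUID-based assignment instead of labels, the replacement
drive has a new UUID that you must record in ``fstab``. A sketch of that
workflow, with illustrative device names and a placeholder UUID:

.. code-block:: shell

   # Look up the UUID assigned to the newly formatted drive
   lsblk -o NAME,UUID /dev/sdb

   # Update the matching fstab entry to reference the new UUID, for example:
   # UUID=<new-drive-uuid>   /mnt/disk1   xfs   defaults,noatime   0   2
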
4) Remount the Replaced Drive(s)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use ``mount -a`` to remount the drives unmounted at the beginning of this
procedure:

.. code-block:: shell
   :class: copyable

   mount -a

The command remounts all drives listed in ``/etc/fstab``, including the
replaced drive(s).

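To confirm that a replaced drive mounted at the expected path, you can inspect
the mount table. A minimal check using an illustrative mount point:

.. code-block:: shell

   # Verify the replacement drive is mounted at the expected path
   findmnt /mnt/disk1
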
5) Monitor MinIO for Drive Detection and Healing Status
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use the :mc-cmd:`mc admin console` command, *or* ``journalctl -u minio`` for
``systemd``-managed installations, to monitor the server log output after
remounting drives. The output should include messages identifying each
formatted and empty drive.

Use :mc-cmd:`mc admin heal` to monitor the overall healing status on the
deployment. MinIO aggressively heals replaced drive(s) to ensure rapid recovery
from the degraded state.

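The following commands sketch one way to follow drive detection and healing
from a shell, assuming an alias named ``myminio`` and a ``systemd`` service
named ``minio``:

.. code-block:: shell

   # Follow the server log output on a systemd-managed host
   journalctl -u minio -f

   # Tail the server console output through the admin API
   mc admin console myminio

   # Check healing status for the deployment
   mc admin heal myminio
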
6) Next Steps
~~~~~~~~~~~~~

Monitor the cluster for any further drive failures. Some drive batches may fail
in close proximity to each other. Deployments seeing higher than expected drive
failure rates should schedule dedicated maintenance around replacing the known
bad batch. Consider using |subnet| to coordinate with MinIO engineering around
guidance for any such operations.

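If you suspect a bad drive batch, you can spot-check the health of the
remaining drives in that batch. A minimal example that assumes the
``smartmontools`` package is installed and uses an illustrative device path:

.. code-block:: shell

   # Report the overall SMART health assessment for a suspect drive
   smartctl -H /dev/sdc
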
.. _minio-restore-hardware-failure-node:

Node Failure Recovery
---------------------

If a MinIO node suffers complete hardware failure (e.g. loss of all drives,
data, etc.), the node begins healing operations once it rejoins the deployment.
MinIO healing occurs only on the replaced hardware and does not typically
impact deployment performance.

MinIO healing ensures the consistency and correctness of all data restored onto
the node. **Do not** attempt to manually recover or migrate data from the
failed node onto the new healthy node.

The replacement node hardware should be substantially similar to the failed
node. There are no negative performance implications to using improved
hardware.

The replacement drive hardware should be substantially similar to the failed
drive. For example, replace a failed SSD with another SSD of the same
capacity. While you can use drives with larger capacity, MinIO uses the
*smallest* drive's capacity as the ceiling for all drives in the :ref:`Server
Pool <minio-intro-server-pool>`.

The following steps provide a more detailed walkthrough of node replacement.
These steps assume a MinIO deployment where each node has a DNS hostname
as per the :ref:`documented prerequisites <minio-installation>`.

1) Start the Replacement Node
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Ensure the new node has received all necessary security, firmware, and OS
updates as per industry, regulatory, or organizational standards and
requirements.

The new node software configuration *must* match that of the other nodes in the
deployment, including but not limited to the OS and kernel versions and
configurations. Heterogeneous software configurations may result in unexpected
or undesired behavior in the deployment.

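One quick way to compare the software configuration of the replacement node
against an existing node is to capture the OS release and kernel version on
each host:

.. code-block:: shell

   # Record the OS release and kernel version for comparison across nodes
   cat /etc/os-release
   uname -r
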
2) Update Hostname for the New Node
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

*Optional:* This step is required only if the replacement node has a
different IP address from the failed host.

Ensure the hostname associated with the failed node now resolves to the new
node.

For example, if ``https://minio-1.example.net`` previously resolved to the
failed host, it should now resolve to the new host.

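You can confirm the resolution from another node in the deployment. A minimal
check using the hostname from the example above:

.. code-block:: shell

   # Confirm the hostname now resolves to the replacement node's IP address
   getent hosts minio-1.example.net
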
3) Download and Prepare the MinIO Server
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Follow the :ref:`deployment procedure <minio-installation>` to download and
run the MinIO server using a configuration that matches all other nodes in
the deployment:

- The MinIO server version *must* match across all nodes.
- The MinIO service and environment file configurations *must* match across
  all nodes.

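As a quick consistency check, you can compare the server binary version and
the environment file on the replacement node against an existing node. A
sketch that assumes a ``systemd``-managed install with the environment file at
``/etc/default/minio``:

.. code-block:: shell

   # Compare the MinIO server version against the other nodes
   minio --version

   # Compare the environment file checksum against the other nodes
   md5sum /etc/default/minio
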
4) Rejoin the Node to the Deployment
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Start the MinIO server process on the node and monitor the process output
using :mc-cmd:`mc admin console`, or by monitoring the MinIO service logs
using ``journalctl -u minio`` for ``systemd``-managed installations.

The server output should indicate that it has detected the other nodes
in the deployment and begun healing operations.

Use :mc-cmd:`mc admin heal` to monitor the overall healing status on the
deployment. MinIO aggressively heals the node to ensure rapid recovery
from the degraded state.

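A sketch of starting and monitoring the replacement node, assuming a
``systemd`` service named ``minio`` and an alias named ``myminio``:

.. code-block:: shell

   # Start the MinIO service on the replacement node
   systemctl start minio

   # Follow the service logs as the node rejoins and healing begins
   journalctl -u minio -f

   # Check healing progress from any host with the alias configured
   mc admin heal myminio
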
5) Next Steps
~~~~~~~~~~~~~

Continue monitoring the deployment until healing completes. Deployments with
persistent and repeated node failures should schedule dedicated maintenance to
identify the root cause. Consider using |subnet| to coordinate with MinIO
engineering around guidance for any such operations.