1
0
mirror of https://github.com/minio/docs.git synced 2025-07-31 18:04:52 +03:00

Docs Multiplatform Slice

This commit is contained in:
Ravind Kumar
2022-05-06 16:44:42 -04:00
parent df33ddee6a
commit b99c20a16f
134 changed files with 3689 additions and 2200 deletions

View File

@ -0,0 +1,122 @@
.. _minio-restore-hardware-failure-drive:
======================
Drive Failure Recovery
======================
.. default-domain:: minio
.. contents:: Table of Contents
:local:
:depth: 1
MinIO supports hot-swapping failed drives with new healthy drives. MinIO detects
and heals those drives without requiring any node or deployment-level restart.
MinIO healing occurs only on the replaced drive(s) and does not typically impact
deployment performance.
MinIO healing ensures consistency and correctness of all data restored onto the
drive. **Do not** attempt to manually recover or migrate data from the failed
drive onto the new healthy drive.
The following steps provide a more detailed walkthrough of drive replacement.
These steps assume a MinIO deployment where each node manages drives using
``/etc/fstab`` with per-drive labels as per the
:ref:`documented prerequisites <minio-installation>`.
1) Unmount the failed drive(s)
------------------------------
Unmount each failed drive using ``umount``. For example, the following
command unmounts the drive at ``/dev/sdb``:
.. code-block:: shell
umount /dev/sdb
2) Replace the failed drive(s)
------------------------------
Remove the failed drive(s) from the node hardware and replace it with known
healthy drive(s). Replacement drives *must* meet the following requirements:
- :ref:`XFS formatted <deploy-minio-distributed-prereqs-storage>` and empty.
- Same drive type (e.g. HDD, SSD, NVMe).
- Equal or greater performance.
- Equal or greater capacity.
Using a replacement drive with greater capacity does not increase the total
cluster storage. MinIO uses the *smallest* drive's capacity as the ceiling for
all drives in the :ref:`Server Pool <minio-intro-server-pool>`.
The following command formats a drive as XFS and assigns it a label to match
the failed drive.
.. code-block:: shell
mkfs.xfs /dev/sdb -L DISK1
MinIO **strongly recommends** using label-based mounting to ensure consistent
drive order that persists through system restarts.
3) Review and Update ``fstab``
------------------------------
Review the ``/etc/fstab`` file and update as needed such that the entry for
the failed disk points to the newly formatted replacement.
- If using label-based disk assignment, ensure that each label points to the
correct newly formatted disk.
- If using UUID-based disk assignment, update the UUID for each point based on
the newly formatted disk. You can use ``lsblk`` to view disk UUIDs.
For example, consider
.. code-block:: shell
$ cat /etc/fstab
# <file system> <mount point> <type> <options> <dump> <pass>
LABEL=DISK1 /mnt/disk1 xfs defaults,noatime 0 2
LABEL=DISK2 /mnt/disk2 xfs defaults,noatime 0 2
LABEL=DISK3 /mnt/disk3 xfs defaults,noatime 0 2
LABEL=DISK4 /mnt/disk4 xfs defaults,noatime 0 2
Given the previous example command, no changes are required to
``fstab`` since the replacement disk at ``/mnt/disk1`` uses the same
label ``DISK1`` as the failed disk.
4) Remount the Replaced Drive(s)
--------------------------------
Use ``mount -a`` to remount the drives unmounted at the beginning of this
procedure:
.. code-block:: shell
:class: copyable
mount -a
The command should result in remounting of all of the replaced drives.
5) Monitor MinIO for Drive Detection and Healing Status
-------------------------------------------------------
Use :mc-cmd:`mc admin console` command *or* ``journalctl -u minio`` for
``systemd``-managed installations to monitor the server log output after
remounting drives. The output should include messages identifying each formatted
and empty drive.
Use :mc-cmd:`mc admin heal` to monitor the overall healing status on the
deployment. MinIO aggressively heals replaced drive(s) to ensure rapid recovery
from the degraded state.
6) Next Steps
-------------
Monitor the cluster for any further drive failures. Some drive batches may fail
in close proximity to each other. Deployments seeing higher than expected drive
failure rates should schedule dedicated maintenance around replacing the known
bad batch. Consider using `MinIO SUBNET <https://min.io/pricing?jmp=docs>`__ to
coordinate with MinIO engineering around guidance for any such operations.

View File

@ -0,0 +1,90 @@
.. _minio-restore-hardware-failure-node:
=====================
Node Failure Recovery
=====================
.. default-domain:: minio
.. contents:: Table of Contents
:local:
:depth: 1
If a MinIO node suffers complete hardware failure (e.g. loss of all drives,
data, etc.), the node begins healing operations once it rejoins the deployment.
MinIO healing occurs only on the replaced hardware and does not typically impact
deployment performance.
MinIO healing ensures consistency and correctness of all data restored onto the
drive. **Do not** attempt to manually recover or migrate data from the failed
node onto the new healthy node.
The replacement node hardware should be substantially similar to the failed
node. There are no negative performance implications to using improved hardware.
The replacement drive hardware should be substantially similar to the failed
drive. For example, replace a failed SSD with another SSD drive of the same
capacity. While you can use drives with larger capacity, MinIO uses the
*smallest* drive's capacity as the ceiling for all drives in the
:ref:`Server Pool <minio-intro-server-pool>`.
The following steps provide a more detailed walkthrough of node replacement.
These steps assume a MinIO deployment where each node has a DNS hostname
as per the :ref:`documented prerequisites <minio-installation>`.
1) Start the Replacement Node
-----------------------------
Ensure the new node has received all necessary security, firmware, and OS
updates as per industry, regulatory, or organizational standards and
requirements.
The new node software configuration *must* match that of the other nodes in the
deployment, including but not limited to the OS and Kernel versions and
configurations. Heterogeneous software configurations may result in unexpected
or undesired behavior in the deployment.
2) Update Hostname for the New Node
-----------------------------------
*Optional* This step is only required if the replacement node has a
different IP address from the failed host.
Ensure the hostname associated to the failed node now resolves to the new node.
For example, if ``https://minio-1.example.net`` previously resolved to the
failed host, it should now resolve to the new host.
3) Download and Prepare the MinIO Server
----------------------------------------
Follow the :ref:`deployment procedure <minio-installation>` to download
and run the MinIO server using a matching configuration as all other nodes
in the deployment.
- The MinIO server version *must* match across all nodes
- The MinIO service and environment file configurations *must* match across
all nodes.
4) Rejoin the node to the deployment
------------------------------------
Start the MinIO server process on the node and monitor the process output
using :mc-cmd:`mc admin console` or by monitoring the MinIO service logs
using ``journalctl -u minio`` for ``systemd`` managed installations.
The server output should indicate that it has detected the other nodes
in the deployment and begun healing operations.
Use :mc-cmd:`mc admin heal` to monitor overall healing status on the
deployment. MinIO aggressively heals the node to ensure rapid recovery
from the degraded state.
5) Next Steps
-------------
Continue monitoring the deployment until healing completes. Deployments with
persistent and repeated node failures should schedule dedicated maintenance to
identify the root cause. Consider using
`MinIO SUBNET <https://min.io/pricing?jmp=docs>`__ to coordinate with MinIO
engineering around guidance for any such operations.

View File

@ -0,0 +1,13 @@
.. _minio-restore-hardware-failure-site:
==========================
Recover after Site Failure
==========================
.. default-domain:: minio
.. contents:: Table of Contents
:local:
:depth: 1
ToDo- site replication recovery procedure