Improvements to replication conceptual docs (#885)
@@ -134,18 +134,6 @@ MinIO :ref:`site replication <minio-site-replication-overview>` provides support
A MinIO multi-site deployment with three peers.
Write operations on one peer replicate to all other peers in the configuration automatically.

Each peer site consists of an independent set of MinIO hosts, ideally with matching pool configurations.
The architecture of each peer site should closely match that of the others to ensure consistent performance and behavior between sites.
All peer sites must use the same primary identity provider, and during initial configuration only one peer site can have any data.

.. figure:: /images/architecture/architecture-multi-site-setup.svg
   :figwidth: 100%
   :alt: Diagram of a multi-site deployment during initial setup

   The initial setup of a MinIO multi-site deployment.
   The first peer site replicates all required information to other peers in the configuration.
   Adding new peers uses the same sequence for synchronizing data.
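
The following is a minimal sketch of that initial setup using the ``mc`` command line tool, assuming three hypothetical aliases ``site1``, ``site2``, and ``site3`` that each point at one peer deployment and share the same root credentials:

.. code-block:: shell

   # Register an alias for each peer deployment (hostnames are placeholders)
   mc alias set site1 https://minio1.example.net ROOTUSER ROOTPASSWORD
   mc alias set site2 https://minio2.example.net ROOTUSER ROOTPASSWORD
   mc alias set site3 https://minio3.example.net ROOTUSER ROOTPASSWORD

   # Add all three peers to one site replication configuration.
   # Only site1 may hold data at this point; it seeds the other peers.
   mc admin replicate add site1 site2 site3

   # Verify the resulting configuration
   mc admin replicate info site1
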
Replication performance primarily depends on the network latency between peer sites.
With geographically distributed peer sites, high latency between sites can result in significant replication lag.
This effect compounds with workloads that are near or at the deployment's overall performance capacity, as the replication process itself requires sufficient free :abbr:`I/O (Input / Output)` to synchronize objects.
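
One way to check for replication backlog between peers is the ``mc admin replicate status`` command, sketched here against a hypothetical ``site1`` alias:

.. code-block:: shell

   # Summarize site replication status for every peer in the configuration
   mc admin replicate status site1
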
@@ -162,27 +150,10 @@ Deploying a global load balancer or similar network appliance with support for s
.. figure:: /images/architecture/architecture-load-balancer-multi-site.svg
   :figwidth: 100%
   :alt: Diagram of a site replication deployment with two sites

   The load balancer automatically routes client requests using configured logic (geo-local, latency, etc.).
   Data written to one site automatically replicates to the other peer site.
   If one peer site fails completely, the load balancer automatically routes requests to the remaining healthy peer site.

The load balancer should meet the same requirements as single-site deployments regarding connection balancing and header preservation.
MinIO replication handles transient failures by queuing objects for replication.
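
For load balancer health checks, MinIO exposes unauthenticated health endpoints; the following sketch uses ``curl`` against a placeholder hostname:

.. code-block:: shell

   # Node liveness: returns 200 OK while the MinIO process is running
   curl -I https://minio.example.net/minio/health/live

   # Cluster health: returns 200 OK only while the deployment has read and write quorum
   curl -I https://minio.example.net/minio/health/cluster
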
MinIO replication can automatically heal a site that has partial or total data loss due to transient or sustained downtime.
If a peer site completely fails, you can remove that site from the configuration entirely.
The load balancer configuration should also remove that site to avoid routing client requests to the offline site.

You can then restore the peer site, either after repairing the original hardware or replacing it entirely, by adding it back to the site replication configuration.
MinIO automatically begins resynchronizing existing data while continuously replicating new data.

.. figure:: /images/architecture/architecture-load-balancer-multi-site-healing.svg
   :figwidth: 100%
   :alt: Diagram of a multi-site deployment with a healing site

   The peer site has recovered and reestablished connectivity with its healthy peers.
   MinIO automatically works through the replication queue to catch the site back up.

Once all data synchronizes, you can restore normal connectivity to that site.
Depending on the amount of replication lag, the latency between sites, and the overall workload :abbr:`I/O (Input / Output)`, you may need to temporarily stop write operations to allow the sites to completely catch up.

@@ -24,6 +24,9 @@ This page provides an overview of MinIO's availability and resiliency design and
Community users can seek support on the `MinIO Community Slack <https://slack.min.io>`__.
Community Support is best-effort only and has no SLAs around responsiveness.

Distributed MinIO Deployments
-----------------------------

MinIO implements :ref:`erasure coding <minio-erasure-coding>` as the core component in providing availability and resiliency during drive or node-level failure events.
MinIO partitions each object into data and :ref:`parity <minio-ec-parity>` shards and distributes those shards across a single :ref:`erasure set <minio-ec-erasure-set>`.
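
The split between data and parity shards follows the configured storage class.
For example, the following sketch (assuming an alias ``myminio``) sets 4 parity shards per object for the default storage class; the new parity value applies to objects written after the change:

.. code-block:: shell

   # Set the default (STANDARD) storage class to 4 parity shards per object
   mc admin config set myminio storage_class standard=EC:4

   # Restart the deployment so the new setting takes effect
   mc admin service restart myminio
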
@@ -158,4 +161,48 @@ For multi-pool MinIO deployments, each pool requires at least one erasure set ma
Use replicated remotes to restore the lost data to the deployment.
All data stored on the healthy pools remains safe on disk.

Replicated MinIO Deployments
----------------------------

MinIO implements :ref:`site replication <minio-site-replication-overview>` as the primary measure for ensuring Business Continuity and Disaster Recovery (BC/DR) in the case of both small- and large-scale data loss in a MinIO deployment.

.. figure:: /images/availability/availability-multi-site-setup.svg
   :figwidth: 100%
   :alt: Diagram of a multi-site deployment during initial setup

   Each peer site is deployed to an independent datacenter to provide protection from large-scale failure or disaster.
   If one datacenter goes completely offline, clients can fail over to the other site.

MinIO replication can automatically heal a site that has partial or total data loss due to transient or sustained downtime.

.. figure:: /images/availability/availability-multi-site-healing.svg
   :figwidth: 100%
   :alt: Diagram of a multi-site deployment while healing

   Datacenter 2 was down and Site B requires resynchronization.
   The load balancer routes operations to Site A in Datacenter 1.
   Site A continuously replicates data to Site B.

Once all data synchronizes, you can restore normal connectivity to that site.
Depending on the amount of replication lag, the latency between sites, and the overall workload :abbr:`I/O (Input / Output)`, you may need to temporarily stop write operations to allow the sites to completely catch up.

If a peer site completely fails, you can remove that site from the configuration entirely.
The load balancer configuration should also remove that site to avoid routing client requests to the offline site.

You can then restore the peer site, either after repairing the original hardware or replacing it entirely, by :ref:`adding it back to the site replication configuration <minio-expand-site-replication>`.
MinIO automatically begins resynchronizing existing data while continuously replicating new data.
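
As a rough sketch of that remove-and-restore cycle, assuming hypothetical aliases ``sitea`` (the healthy peer) and ``siteb`` (the failed peer, later rebuilt); the exact arguments depend on your existing configuration:

.. code-block:: shell

   # Remove the failed peer from the site replication configuration
   mc admin replicate rm sitea siteb --force

   # After rebuilding siteb, add it back; MinIO then begins
   # resynchronizing existing data to the restored peer
   mc admin replicate add sitea siteb
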
Sites can continue processing operations during resynchronization by proxying ``GET/HEAD`` requests to healthy peer sites.

.. figure:: /images/availability/availability-multi-site-proxy.svg
   :figwidth: 100%
   :alt: Diagram of a multi-site deployment while healing

   Site B does not have the requested object, possibly due to replication lag.
   It proxies the ``GET`` request to Site A.
   Site A returns the object, which Site B then returns to the requesting client.

The client receives the results from the first peer site to return *any* version of the requested object.

``PUT`` and ``DELETE`` operations synchronize using the regular replication process.
``LIST`` operations do not proxy and require clients to issue them exclusively against healthy peers.

@@ -88,6 +88,25 @@ Any MinIO deployment in the site replication configuration can resynchronize dam
If one site loses data for any reason, resynchronize the data from another healthy site with :mc-cmd:`mc admin replicate resync`.
This launches an active process that resynchronizes the data without waiting for the passive MinIO scanner to recognize the missing data.
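
For example, assuming hypothetical aliases ``site1`` (a healthy peer) and ``site2`` (the site that lost data), a resynchronization might look like the following:

.. code-block:: shell

   # Actively push missing data from the healthy site to the damaged site
   mc admin replicate resync start site1 site2

   # Track the progress of the resynchronization
   mc admin replicate resync status site1 site2
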
Proxy to Other Sites
~~~~~~~~~~~~~~~~~~~~

MinIO peer sites can proxy ``GET/HEAD`` requests for an object to other peers to check if it exists.
This allows a site that is healing or lagging behind other peers to still return an object persisted to other sites.

For example:

1. A client issues ``GET("data/invoices/january.xls")`` to ``Site1``.
2. ``Site1`` cannot locate the object.
3. ``Site1`` proxies the request to ``Site2``.
4. ``Site2`` returns the latest version of the requested object.
5. ``Site1`` returns the proxied object to the client.

For ``GET/HEAD`` requests that do *not* include a unique version ID, the proxy request returns the *latest* version of that object on the peer site.
This may result in retrieval of a non-current version of the object, such as when the responding peer site is also experiencing replication lag.
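
Clients that need an exact version rather than whatever the proxying peer considers latest can pin the request to a version ID.
The following sketch uses ``mc`` with a placeholder version ID:

.. code-block:: shell

   # Request metadata for one specific object version rather than
   # the latest version returned by the proxy
   mc stat --version-id "dfbd25b3-abec-4184-9eb5-819a0024cc67" site1/data/invoices/january.xls
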
MinIO does not proxy ``LIST``, ``DELETE``, and ``PUT`` operations.

Prerequisites
-------------