MinIO Server [RELEASE.2024-08-03T04-33-23Z](https://github.com/minio/minio/releases/tag/RELEASE.2024-08-03T04-33-23Z) adds a v2 for the batch job replicate API. This new version supports including multiple prefixes on the source for replication. Not linked to a docs issue.
Batch Replication
MinIO RELEASE.2022-10-08T20-11-00Z
The Batch Framework was introduced with the replicate
job type in the mc RELEASE.2022-10-08T20-11-00Z.
The MinIO Batch Framework allows you to create, manage, monitor, and
execute jobs using a YAML-formatted job definition file (a "batch
file"). Batch jobs run directly on the MinIO deployment, so they use
server-side processing power and are not limited by the
local machine where you run the MinIO Client <minio-client>.
The replicate batch job replicates objects from one
MinIO deployment (the source deployment) to another MinIO
deployment (the target deployment). Either the
source or the target must be
the local <minio-batch-local> deployment.
Batch Replication between MinIO deployments has the following
advantages over using mc mirror:
- Removes the client-to-cluster network as a potential bottleneck
- A user only needs permission to start a batch job and no other permissions, because the job runs entirely server side on the cluster
- The job provides for retry attempts in the event that objects do not replicate
- Batch jobs are one-time, curated processes that allow fine control over replication
- (MinIO to MinIO only) The replication process copies object versions from source to target
MinIO Server RELEASE.2023-02-17T17-52-43Z
Run batch replication with multiple workers in parallel by specifying
the MINIO_BATCH_REPLICATION_WORKERS environment
variable.
Starting with the MinIO Server
RELEASE.2023-05-04T21-44-30Z, the other deployment can be
either another MinIO deployment or any S3-compatible location using a
realtime storage class. Use filtering options in the replication
YAML file to exclude objects stored in locations that
require rehydration or other restoration methods before serving the
requested object. Batch replication to these types of remotes uses
mc mirror behavior.
Behavior
Access Control and Requirements
Batch replication shares similar access and permission requirements
as bucket replication <minio-bucket-replication-requirements>.
The credentials for the "source" deployment must have a policy similar to the following:
/extra/examples/ReplicationAdminPolicy.json
The credentials for the "remote" deployment must have a policy similar to the following:
/extra/examples/ReplicationRemoteUserPolicy.json
See mc admin user,
mc admin user svcacct,
and mc admin policy for
more complete documentation on adding users, access keys, and policies
to a MinIO deployment.
MinIO deployments configured for Active Directory/LDAP <minio-external-identity-management-ad-ldap>
or OpenID Connect <minio-external-identity-management-openid>
user management can instead create dedicated access keys <minio-idp-service-account> for
supporting batch replication.
Filter Replication Targets
The batch job definition file can limit the replication by bucket, prefix, and/or filters to only replicate certain objects. The credentials you provide in the YAML for the source or target deployment may further restrict which objects and buckets the replication process can access.
MinIO Server RELEASE.2023-04-07T05-28-58Z
You can replicate from a remote MinIO deployment to the local deployment that runs the batch job.
For example, you can use a batch job to perform a one-time
replication sync to push objects from a bucket on a local deployment at
minio-local/invoices/ to a bucket on a remote deployment at
minio-remote/invoices. You can also pull objects from the
remote deployment at minio-remote/invoices to the local
deployment at minio-local/invoices.
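As a rough sketch of the push case described above, a job definition might look like the following. The field names follow the reference section of this page. The endpoint URL and the credential values are placeholders chosen for illustration, and apiVersion: v1 is assumed as the version that precedes v2.

```yaml
# Sketch only: push objects from the invoices bucket on the local
# deployment to the invoices bucket on a remote deployment.
replicate:
  apiVersion: v1                 # assumed; v2 is only needed for multiple source prefixes
  source:
    type: minio
    bucket: invoices
    # No endpoint or credentials here: the source is the local deployment.
  target:
    type: minio
    bucket: invoices
    endpoint: https://minio-remote.example.net   # placeholder endpoint
    credentials:
      accessKey: REMOTE-ACCESS-KEY               # placeholder
      secretKey: REMOTE-SECRET-KEY               # placeholder
```

For the pull direction, the endpoint and credentials move to the source section and the target section omits them.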
Small File Optimization
Starting with RELEASE.2023-12-09T18-17-51Z, batch
replication by default automatically batches and compresses objects
smaller than 5MiB to efficiently transfer data between the source and
remote. The remote MinIO deployment can check and immediately apply
lifecycle management tiering rules to batched objects. The functionality
resembles that offered by S3 Snowball Edge small file batching.
You can modify the compression settings in the replicate <minio-batch-job-types> job
configuration.
Replicate Batch Job Reference
The YAML must define the source and target
deployments. If the source deployment is remote, then the
target deployment must be local.
Optionally, the YAML can also define flags to filter which objects
replicate, send notifications for the job, or define retry attempts for
the job.
MinIO RELEASE.2023-04-07T05-28-58Z
You can replicate from a remote MinIO deployment to the local deployment that runs the batch job.
MinIO RELEASE.2024-08-03T04-33-23Z
This release introduces a new version of the Batch Job Replicate API,
v2. The updated API allows you to list multiple prefixes on
the source to replicate from. To replicate multiple prefixes from a
source, specify replicate.apiVersion as
v2.
replicate:
  apiVersion: v2
  source:
    type: minio
    bucket: mybucket
    prefix:
      - prefix1
      - prefix2
...
For the source deployment
- Required information
  - type: Must be minio.
  - bucket: The bucket on the deployment.
- Optional information
  - prefix: The prefix on the object(s) that should replicate. Beginning with MinIO Server RELEASE.2024-08-03T04-33-23Z, v2 of the Batch Job Replicate API allows you to list multiple prefixes. Specify replicate.apiVersion as v2 to replicate from multiple prefixes.
  - endpoint: Location of the deployment to use for either the source or the target of a replication batch job. For example, https://minio.example.net. If the deployment is the alias specified to the command, omit this field to direct MinIO to use that alias for the endpoint and credentials values. Either the source deployment or the remote deployment must be the "local" <minio-batch-local> alias. The non-"local" deployment must specify the endpoint and credentials.
  - path: Directs MinIO to use Path or Virtual Style (DNS) lookup of the bucket. Defaults to auto.
    - Specify on for Path style
    - Specify off for Virtual style
    - Specify auto to let MinIO determine the correct lookup style.
  - credentials: The accessKey: and secretKey: or the sessionToken: that grants access to the object(s). Only specify for the deployment that is not the local <minio-batch-local> deployment.
  - snowball: (version added: RELEASE.2023-12-09T18-17-51Z) Configuration options for controlling the batch-and-compress functionality.
    - snowball.disable: Specify true to disable the batch-and-compress functionality during replication. Defaults to false.
    - snowball.batch: Specify the maximum integer number of objects to batch for compression. Defaults to 100.
    - snowball.inmemory: Specify false to stage archives using local storage or true to stage to memory (RAM). Defaults to true.
    - snowball.compress: Specify true to compress batched objects over the wire using the S2/Snappy compression algorithm. Defaults to false (no compression).
    - snowball.smallerThan: Specify the object size in mebibytes (MiB) under which MinIO should batch objects. Defaults to 5MiB.
    - snowball.skipErrs: Specify false to direct MinIO to halt on any object which produces errors on read. Defaults to true.
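To illustrate how the snowball options might sit in a job file, the fragment below nests them under the source section, which is an assumption based on their placement in the source reference above. Values match the defaults listed above except compress, which is switched on for the example. The bucket name is a placeholder.

```yaml
# Sketch only: source section with batch-and-compress tuning.
replicate:
  apiVersion: v1
  source:
    type: minio
    bucket: mybucket         # placeholder bucket name
    snowball:
      disable: false         # keep batch-and-compress enabled (default)
      batch: 100             # maximum objects per batch (default)
      inmemory: true         # stage archives in RAM (default)
      compress: true         # changed from the default of false for this example
      smallerThan: 5MiB      # batch objects under this size (default)
      skipErrs: true         # skip objects that error on read (default)
```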
For the target deployment
- Required information
  - type: Must be minio.
  - bucket: The bucket on the deployment.
- Optional information
  - prefix: The prefix on the object(s) to replicate.
  - endpoint: The location of the target deployment. If the target is the alias <alias> specified to the command, you can omit this and the credentials fields. If the target is "local", the source must specify the remote deployment with endpoint and credentials.
  - credentials: The accessKey and secretKey or the sessionToken that grants access to the object(s).
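A minimal sketch of the case where the target is the local deployment follows. Because the target omits endpoint and credentials, the source carries them instead. The endpoint URL, bucket names, and credential values are placeholders, and path: auto simply restates the documented default.

```yaml
# Sketch only: pull objects from a remote deployment into the local one.
replicate:
  apiVersion: v1
  source:
    type: minio
    bucket: invoices
    endpoint: https://minio-remote.example.net   # placeholder endpoint
    path: auto                                   # documented default lookup style
    credentials:
      accessKey: REMOTE-ACCESS-KEY               # placeholder
      secretKey: REMOTE-SECRET-KEY               # placeholder
  target:
    type: minio
    bucket: invoices
    # No endpoint or credentials: the target is the local deployment.
```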
For filters
| newerThan: | A string representing a length of time. Only objects newer than the specified length of time replicate. | 
| olderThan: | A string representing a length of time. Only objects older than the specified length of time replicate. | 
| createdAfter: | A date. Only objects created after the date replicate. | 
| createdBefore: | A date. Only objects created prior to the date replicate. | 
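The filters could be combined in a job file roughly as follows. The key names newerThan and olderThan and their nesting under flags.filter are assumptions based on the upstream sample job file, and the duration strings are illustrative values whose exact accepted format should be checked against the generated template.

```yaml
# Sketch only: replicate objects between one and seven days old.
replicate:
  apiVersion: v1
  source:
    type: minio
    bucket: mybucket       # placeholder bucket name
  flags:
    filter:
      newerThan: "7d"      # assumed duration syntax: only objects newer than 7 days
      olderThan: "1d"      # assumed duration syntax: only objects older than 1 day
```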
For notifications
| endpoint: | The predefined endpoint to send events for notifications. | 
| token: | An optional JWT <JSON Web Token> to access the endpoint. | 
For retry attempts
If something interrupts the job, you can define how many times to retry the batch job. You can also define how long to wait between attempts.
| attempts: | Number of tries to complete the batch job before giving up. | 
| delay: | The least amount of time to wait between each attempt. | 
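The notification and retry settings might appear together in a job file as sketched here. The nesting under flags.notify and flags.retry is an assumption based on the upstream sample job file, and the endpoint, token, attempt count, and delay are placeholder values.

```yaml
# Sketch only: send job events to a webhook and retry on interruption.
replicate:
  flags:
    notify:
      endpoint: https://notify.example.net   # placeholder notification endpoint
      token: PLACEHOLDER-JWT                 # optional; placeholder value
    retry:
      attempts: 10                           # illustrative number of tries
      delay: "500ms"                         # illustrative minimum wait between tries
```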
Sample YAML Description File for a replicate Job Type
Use mc batch generate
to create a basic replicate batch job for further
customization.
For the local <minio-batch-local> deployment, do not
specify the endpoint or credentials. Either delete or comment out those
lines in the source or the target section, depending on which one is the
local deployment.
/includes/code/replicate.yaml