7.1 KiB
Monitoring and alerting using Prometheus
minio
Table of Contents
MinIO publishes metrics using the Prometheus Data Model <concepts/data_model/#data-model>
.
The procedure on this page documents the following:
- Configure a Prometheus service to scrape and display metrics from a MinIO deployment
- Configure an Alert Rule on a MinIO Metric to trigger an AlertManager action
Prerequisites
This procedure requires the following:
- An existing
Prometheus deployment <prometheus/latest/installation/>
with backingAlert Manager <alerting/latest/overview/>
- An existing MinIO deployment with network access to the Prometheus deployment
- An
mc
installation on your local host configured toaccess <alias>
the MinIO deployment
Metrics Version 2 Deprecated
Starting with MinIO Server RELEASE.2024-07-15T19-02-30Z
and MinIO
Client RELEASE.2024-07-11T18-01-28Z
, metrics version 3
replaces the deprecated metrics version 2 <minio-metrics-v2>
.
Collect and alert on metrics
Generate the scrape configuration
Use mc admin prometheus generate --api-version v3 <mc admin prometheus generate --api-version>
to generate a scrape configuration for each type of metric <minio-metrics-and-alerts-available-metrics>
you want to scrape with Prometheus.
For example, the following command scrapes all version 3 audit metrics for the MinIO cluster:
mc admin prometheus generate ALIAS audit --api-version v3
Replace ALIAS <mc admin prometheus generate ALIAS>
with the alias <mc alias>
of the MinIO deployment.
The command returns output similar to the following:
scrape_configs:
- job_name: minio-job
bearer_token: TOKEN
metrics_path: /minio/metrics/v3
scheme: https
static_configs:
- targets: [minio.example.net]
To scrape multiple types of metrics, run mc admin prometheus generate --api-version v3 <mc admin prometheus generate --api-version>
for each type and add the job_name
section to the
scrape_configs
in your Prometheus configuration.
The following example scrapes audit and system metrics every 60 seconds:
global:
scrape_interval: 60s
scrape_configs:
- job_name: minio-job-audit
bearer_token: TOKEN
metrics_path: /minio/metrics/v3/audit
scheme: https
static_configs:
- targets: [minio.example.net]
- job_name: minio-job-system
bearer_token: TOKEN
metrics_path: /minio/metrics/v3/system
scheme: https
static_configs:
- targets: [minio.example.net]
If needed, edit the generated configuration for your environment. Common changes include:
Set an appropriate
scrape_interval
value to ensure each scraping operation completes before the next one begins. The recommended value is 60 seconds.Some deployments require a longer scrape interval due to the number of metrics being scraped. To reduce the load on your MinIO and Prometheus servers, choose the longest interval that meets your monitoring requirements.
You can specify a global
scrape_interval
for all jobs or override a global value by setting ascrape_interval
in an individualjob_name
section.Set the
job_name
to a value associated to the MinIO deployment.Use a unique value for each job to ensure isolation of the deployment metrics from any others collected by that Prometheus service.
MinIO deployments started with
MINIO_PROMETHEUS_AUTH_TYPE
set to"public"
can omit thebearer_token
field.Set the
scheme
to http for MinIO deployments not using TLS.Set the
targets
array with a hostname that resolves to the MinIO deployment.This can be any single node, or a load balancer/proxy which handles connections to the MinIO nodes.
k8s
For Prometheus deployments in the same cluster as the MinIO Tenant, you can specify the service DNS name for the
minio
service.For Prometheus deployments external to the cluster, you must specify an ingress or load balancer endpoint configured to route connections to and from the MinIO Tenant.
Restart Prometheus with the updated configuration
Add the desired scrape_configs
jobs to your Prometheus
configuration file and start the Prometheus cluster:
prometheus --config.file=prometheus.yaml
Analyze collected metrics
Prometheus includes an expression browser <prometheus/latest/getting_started/#using-the-expression-browser>
.
You can execute queries here to analyze the collected metrics.
The following query examples return metrics collected by Prometheus
every five minutes for a scrape job named minio-job
:
minio_system_drive_used_bytes{job-"minio-job"}[5m]
minio_system_drive_used_inodes{job-"minio-job"}[5m]
minio_cluster_usage_buckets_total_bytes{job-"minio-job"}[5m]
minio_cluster_usage_buckets_objects_count{job-"minio-job"}[5m]
minio_api_requests_total{job-"minio-job"}[5m]
minio_api_requests_errors_total{job-"minio-job"}[5m]
Configure an alert rule
To trigger alerts based on metrics, configure Alert Rules <prometheus/latest/configuration/alerting_rules/>
on the Prometheus deployment.
The following example alert provides a baseline of alerts for a MinIO deployment. You can modify or use these examples as guidance for building your own alerts.
groups:
- name: minio-alerts
rules:
- alert: NodesOffline
expr: avg_over_time(minio_cluster_health_nodes_offline_count{job="minio-job"}[5m]) > 0
for: 10m
labels:
severity: warn
annotations:
summary: "Node down in MinIO deployment"
description: "Node(s) in cluster {{ $labels.instance }} offline for more than 5 minutes"
- alert: DisksOffline
expr: avg_over_time(minio_system_drive_offline_count{job="minio-job"}[5m]) > 0
for: 10m
labels:
severity: warn
annotations:
summary: "Disks down in MinIO deployment"
description: "Disks(s) in cluster {{ $labels.instance }} offline for more than 5 minutes"
In the Prometheus configuration, specify the path to the alert file
in the rule_files
key:
rule_files:
- minio-alerting.yml
Once triggered, Prometheus sends the alert to the configured AlertManager service.