Monitoring and Alerting using Prometheus
MinIO publishes cluster and node metrics using the Prometheus Data Model <data_model/>.
The procedure on this page documents the following:
- Configuring a Prometheus service to scrape and display metrics from a MinIO deployment
- Configuring an Alert Rule on a MinIO Metric to trigger an AlertManager action
Prerequisites
This procedure requires the following:
- An existing Prometheus deployment with backing Alert Manager <alerting/latest/overview/>
- An existing MinIO deployment with network access to the Prometheus deployment
- An mc installation on your local host configured to access <alias> the MinIO deployment
The MinIO Operator supports deploying a per-tenant Prometheus instance <create-tenant-configure-section> configured to support metrics and visualizations. This includes automatically configuring the Tenant to enable the Tenant Console historical metric view <minio-console-metrics>.
You can still use this procedure to configure an external Prometheus
service for supporting monitoring and alerting for a MinIO Tenant. You
must configure all necessary network control components, such as Ingress
or a Load Balancer, to facilitate access between the Tenant and the
Prometheus service. This procedure assumes your local host machine can
access the Tenant via mc.
Configure Prometheus to Collect and Alert using MinIO Metrics
1) Generate the Scrape Configuration
Use the mc admin prometheus generate command to generate the scrape configuration that Prometheus uses when making scrape requests:
mc admin prometheus generate ALIAS
Replace ALIAS <mc admin prometheus generate TARGET> with the alias <mc alias> of the MinIO deployment.
The command returns output similar to the following:
scrape_configs:
- job_name: minio-job
  bearer_token: TOKEN
  metrics_path: /minio/v2/metrics/cluster
  scheme: https
  static_configs:
  - targets: [minio.example.net]
- Set the job_name to a value associated with the MinIO deployment. Use a unique value to ensure isolation of the deployment metrics from any others collected by that Prometheus service.
- MinIO deployments started with MINIO_PROMETHEUS_AUTH_TYPE set to "public" can omit the bearer_token field.
- Set the scheme to http for MinIO deployments not using TLS.
- Set the targets array with a hostname that resolves to the MinIO deployment. This can be any single node, or a load balancer/proxy which handles connections to the MinIO nodes.
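As an illustration of the adjustments described above, the following sketch shows how the generated job might look for a deployment started with MINIO_PROMETHEUS_AUTH_TYPE set to "public" and running without TLS. The hostname and the port 9000 are assumptions for this example only; substitute the address of your own node or load balancer.

```yaml
scrape_configs:
- job_name: minio-job                    # keep this unique per MinIO deployment
  # bearer_token omitted: assumes MINIO_PROMETHEUS_AUTH_TYPE="public"
  metrics_path: /minio/v2/metrics/cluster
  scheme: http                           # assumes the deployment does not use TLS
  static_configs:
  - targets: [minio.example.net:9000]    # hypothetical host and port
```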
2) Restart Prometheus with the Updated Configuration
Append the scrape_configs job generated in the previous step to the configuration file:
global:
  scrape_interval: 15s

scrape_configs:
- job_name: minio-job
  bearer_token: TOKEN
  metrics_path: /minio/v2/metrics/cluster
  scheme: https
  static_configs:
  - targets: [minio.example.net]
Start the Prometheus cluster using the configuration file:
prometheus --config.file=prometheus.yaml
3) Analyze Collected Metrics
Prometheus includes an expression browser <prometheus/latest/getting_started/#using-the-expression-browser>. You can execute queries here to analyze the collected metrics.
The following query examples return metrics collected by Prometheus:
minio_cluster_disk_online_total{job="minio-job"}[5m]
minio_cluster_disk_offline_total{job="minio-job"}[5m]
minio_bucket_usage_object_total{job="minio-job"}[5m]
minio_cluster_capacity_usable_free_bytes{job="minio-job"}[5m]
See minio-metrics-and-alerts-available-metrics for a complete list of published metrics.
4) Configure an Alert Rule using MinIO Metrics
You must configure Alert Rules <prometheus/latest/configuration/alerting_rules/>
on the Prometheus deployment to trigger alerts based on collected MinIO
metrics.
The following example alert rule file provides a baseline of alerts for a MinIO deployment. You can modify it or use it as guidance in building your own alerts.
groups:
- name: minio-alerts
  rules:
  - alert: NodesOffline
    expr: avg_over_time(minio_cluster_nodes_offline_total{job="minio-job"}[5m]) > 0
    for: 10m
    labels:
      severity: warn
    annotations:
      summary: "Node down in MinIO deployment"
      description: "Node(s) in cluster {{ $labels.instance }} offline for more than 5 minutes"
  - alert: DisksOffline
    expr: avg_over_time(minio_cluster_disk_offline_total{job="minio-job"}[5m]) > 0
    for: 10m
    labels:
      severity: warn
    annotations:
      summary: "Disks down in MinIO deployment"
      description: "Disk(s) in cluster {{ $labels.instance }} offline for more than 5 minutes"
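As one possible way to adapt these rules, the sketch below adds a capacity alert built on the minio_cluster_capacity_usable_free_bytes metric queried in the previous step. The group name, alert name, 1 TB threshold, and 15 minute duration are illustrative assumptions, not recommended values; choose thresholds that match the size of your deployment.

```yaml
groups:
- name: minio-capacity-alerts            # hypothetical group name
  rules:
  - alert: LowUsableCapacity             # hypothetical alert name
    # Fires when usable free capacity stays below 1e12 bytes (1 TB, an assumed threshold)
    expr: minio_cluster_capacity_usable_free_bytes{job="minio-job"} < 1e12
    for: 15m
    labels:
      severity: warn
    annotations:
      summary: "Low free capacity in MinIO deployment"
      description: "Usable free capacity reported by {{ $labels.instance }} is below 1 TB"
```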
Add the path to the alert rule file to the Prometheus configuration under the rule_files key:
global:
  scrape_interval: 5s

rule_files:
- minio-alerting.yml
Once triggered, Prometheus sends the alert to the configured AlertManager service.
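Prometheus locates the AlertManager service through the alerting section of the same configuration file. If your deployment does not already define one, a minimal sketch looks like the following. The hostname is a placeholder for this example; 9093 is the default AlertManager port.

```yaml
alerting:
  alertmanagers:
  - static_configs:
    - targets: ["alertmanager.example.net:9093"]   # hypothetical AlertManager address
```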
5) (Optional) Configure MinIO Console to Query Prometheus
The Console also supports displaying time-series and historical data
by querying a Prometheus <prometheus/latest/getting_started/>
service configured to scrape data from the MinIO deployment.
To enable historical data visualization in MinIO Console, set the following environment variables on each node in the MinIO deployment:
- Set MINIO_PROMETHEUS_URL to the URL of the Prometheus service
- Set MINIO_PROMETHEUS_JOB_ID to the unique job ID assigned to the collected metrics
Restart the MinIO deployment and visit the Monitoring <minio-console-monitoring> pane to see the historical data views.