mirror of https://github.com/minio/docs.git synced 2025-08-05 03:41:24 +03:00
docs/source/operations/monitoring/collect-minio-metrics-using-prometheus.rst
Daryl White 54584b290c Adds recommended metrics to Prometheus procedure (#1147)
Partially addresses #1135

To consider:
I added the tabs as part of step 3 of the procedure, but we might want
to consider having a recommended alerts section separate from the
procedure, perhaps above the "Dashboards" heading. Let me know your
thoughts.
2024-03-08 12:29:40 -05:00

.. _minio-metrics-collect-using-prometheus:

========================================
Monitoring and Alerting using Prometheus
========================================

.. default-domain:: minio

.. contents:: Table of Contents
   :local:
   :depth: 1

.. container:: extlinks-video

   - `Monitoring with MinIO and Prometheus: Overview <https://youtu.be/A3vCDaFWNNs?ref=docs>`__
   - `Monitoring with MinIO and Prometheus: Lab <https://youtu.be/Oix9iXndSUY?ref=docs>`__

MinIO publishes cluster, node, bucket, and resource metrics using the :prometheus-docs:`Prometheus Data Model <concepts/data_model/#data-model>`.
The procedure on this page documents the following:

- Configuring a Prometheus service to scrape and display metrics from a MinIO deployment
- Configuring an Alert Rule on a MinIO Metric to trigger an AlertManager action

.. admonition:: Prerequisites
   :class: note

   This procedure requires the following:

   - An existing :prometheus-docs:`Prometheus deployment <prometheus/latest/installation/>` with backing :prometheus-docs:`Alert Manager <alerting/latest/overview/>`
   - An existing MinIO deployment with network access to the Prometheus deployment
   - An :mc:`mc` installation on your local host configured to :ref:`access <alias>` the MinIO deployment

Configure Prometheus to Collect and Alert using MinIO Metrics
-------------------------------------------------------------

1) Generate the Scrape Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use the :mc-cmd:`mc admin prometheus generate` command to generate the scrape configuration that Prometheus uses when making scrape requests:

.. tab-set::

   .. tab-item:: MinIO Server

      The following command generates the scrape configuration for MinIO cluster metrics.

      .. code-block:: shell
         :class: copyable

         mc admin prometheus generate ALIAS

      Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the :mc:`alias <mc alias>` of the MinIO deployment.

   .. tab-item:: Nodes

      The following command generates the scrape configuration for node metrics.

      .. code-block:: shell
         :class: copyable

         mc admin prometheus generate ALIAS node

      Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the :mc:`alias <mc alias>` of the MinIO deployment.

   .. tab-item:: Buckets

      The following command generates the scrape configuration for bucket metrics.

      .. code-block:: shell
         :class: copyable

         mc admin prometheus generate ALIAS bucket

      Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the :mc:`alias <mc alias>` of the MinIO deployment.

   .. tab-item:: Resources

      .. versionadded:: RELEASE.2023-10-07T15-07-38Z

      The following command generates the scrape configuration for resource metrics.

      .. code-block:: shell
         :class: copyable

         mc admin prometheus generate ALIAS resource

      Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the :mc:`alias <mc alias>` of the MinIO deployment.

The command returns output similar to the following:

.. code-block:: yaml
   :class: copyable

   scrape_configs:
   - job_name: minio-job
     bearer_token: TOKEN
     metrics_path: /minio/v2/metrics/cluster
     scheme: https
     static_configs:
     - targets: [minio.example.net]

- Set the ``job_name`` to a value associated with the MinIO deployment.

  Use a unique value to ensure isolation of the deployment metrics from any others collected by that Prometheus service.

- MinIO deployments started with :envvar:`MINIO_PROMETHEUS_AUTH_TYPE` set to ``"public"`` can omit the ``bearer_token`` field.

- Set the ``scheme`` to ``http`` for MinIO deployments not using TLS.

- Set the ``targets`` array to a hostname that resolves to the MinIO deployment.

  This can be any single node, or a load balancer/proxy which handles connections to the MinIO nodes.

  .. cond:: k8s

     For Prometheus deployments in the same cluster as the MinIO Tenant, you can specify the service DNS name for the ``minio`` service.

     For Prometheus deployments external to the cluster, you must specify an ingress or load balancer endpoint configured to route connections to and from the MinIO Tenant.

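
As an illustration of the adjustments listed above, the following sketch shows a scrape job for a deployment that does not use TLS and was started with :envvar:`MINIO_PROMETHEUS_AUTH_TYPE` set to ``"public"``.
The job name ``minio-job-public`` and the target ``minio.example.net:9000`` are placeholder assumptions rather than output from the command; substitute the values for your own deployment.

.. code-block:: yaml
   :class: copyable

   scrape_configs:
   - job_name: minio-job-public             # hypothetical name; keep it unique per deployment
     # bearer_token is omitted because the deployment allows unauthenticated metrics requests
     metrics_path: /minio/v2/metrics/cluster
     scheme: http                           # the deployment does not use TLS
     static_configs:
     - targets: ['minio.example.net:9000']  # assumed hostname and port
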
2) Restart Prometheus with the Updated Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Append the desired ``scrape_configs`` job generated in the previous step to the configuration file:

.. tab-set::

   .. tab-item:: Cluster

      Cluster metrics aggregate node-level metrics and, where appropriate, attach labels to metrics for the originating node.

      If you are already collecting ``cluster`` metrics, you do not need to add an additional ``scrape_configs`` job for ``node``.

      .. code-block:: yaml
         :class: copyable

         global:
           scrape_interval: 15s

         scrape_configs:
         - job_name: minio-job
           bearer_token: TOKEN
           metrics_path: /minio/v2/metrics/cluster
           scheme: https
           static_configs:
           - targets: [minio.example.net]

   .. tab-item:: Bucket

      .. code-block:: yaml
         :class: copyable

         global:
           scrape_interval: 15s

         scrape_configs:
         - job_name: minio-job-bucket
           bearer_token: TOKEN
           metrics_path: /minio/v2/metrics/bucket
           scheme: https
           static_configs:
           - targets: [minio.example.net]

   .. tab-item:: Resource

      .. code-block:: yaml
         :class: copyable

         global:
           scrape_interval: 15s

         scrape_configs:
         - job_name: minio-job-resource
           bearer_token: TOKEN
           metrics_path: /minio/v2/metrics/resource
           scheme: https
           static_configs:
           - targets: [minio.example.net]

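
Each tab above shows a complete file with a single job.
When collecting more than one metric type from the same deployment, the jobs share one ``global`` block and one ``scrape_configs`` list.
The following is a minimal sketch of such a combined file, assuming you want cluster, bucket, and resource metrics; ``TOKEN`` and ``minio.example.net`` are the same placeholders used in the tabs.

.. code-block:: yaml
   :class: copyable

   global:
     scrape_interval: 15s

   scrape_configs:
   # one entry per metrics endpoint, each with its own unique job_name
   - job_name: minio-job
     bearer_token: TOKEN
     metrics_path: /minio/v2/metrics/cluster
     scheme: https
     static_configs:
     - targets: [minio.example.net]

   - job_name: minio-job-bucket
     bearer_token: TOKEN
     metrics_path: /minio/v2/metrics/bucket
     scheme: https
     static_configs:
     - targets: [minio.example.net]

   - job_name: minio-job-resource
     bearer_token: TOKEN
     metrics_path: /minio/v2/metrics/resource
     scheme: https
     static_configs:
     - targets: [minio.example.net]
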
Start the Prometheus service using the configuration file:

.. code-block:: shell
   :class: copyable

   prometheus --config.file=prometheus.yaml

3) Analyze Collected Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Prometheus includes an :prometheus-docs:`expression browser <prometheus/latest/getting_started/#using-the-expression-browser>`.
You can execute queries here to analyze the collected metrics.

.. tab-set::

   .. tab-item:: Examples

      The following query examples return the samples collected over the last five minutes for a scrape job named ``minio-job``:

      .. code-block:: shell
         :class: copyable

         minio_node_drive_free_bytes{job="minio-job"}[5m]
         minio_node_drive_free_inodes{job="minio-job"}[5m]
         minio_node_drive_latency_us{job="minio-job"}[5m]
         minio_node_drive_offline_total{job="minio-job"}[5m]
         minio_node_drive_online_total{job="minio-job"}[5m]
         minio_node_drive_total{job="minio-job"}[5m]
         minio_node_drive_total_bytes{job="minio-job"}[5m]
         minio_node_drive_used_bytes{job="minio-job"}[5m]
         minio_node_drive_errors_timeout{job="minio-job"}[5m]
         minio_node_drive_errors_availability{job="minio-job"}[5m]
         minio_node_drive_io_waiting{job="minio-job"}[5m]

   .. tab-item:: Recommended Metrics

      MinIO recommends the following as a basic set of metrics to monitor.

      See :ref:`minio-metrics-and-alerts` for information about all available metrics.

      .. list-table::
         :header-rows: 1
         :widths: 40 60
         :width: 100%

         * - Metric
           - Description

         * - ``minio_node_drive_free_bytes``
           - Total storage available on a drive.

         * - ``minio_node_drive_free_inodes``
           - Total free inodes.

         * - ``minio_node_drive_latency_us``
           - Average last minute latency in µs for drive API storage operations.

         * - ``minio_node_drive_offline_total``
           - Total drives offline in this node.

         * - ``minio_node_drive_online_total``
           - Total drives online in this node.

         * - ``minio_node_drive_total``
           - Total drives in this node.

         * - ``minio_node_drive_total_bytes``
           - Total storage on a drive.

         * - ``minio_node_drive_used_bytes``
           - Total storage used on a drive.

         * - ``minio_node_drive_errors_timeout``
           - Total number of drive timeout errors since server start.

         * - ``minio_node_drive_errors_availability``
           - Total number of drive I/O errors, permission denied and timeouts since server start.

         * - ``minio_node_drive_io_waiting``
           - Total number of I/O operations waiting on drive.

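
Some of the recommended metrics are easier to read in combination.
As a sketch only, the following Prometheus recording rule derives a per-drive utilization ratio from ``minio_node_drive_used_bytes`` and ``minio_node_drive_total_bytes``.
The group name ``minio-recording-rules`` and the rule name ``minio:node_drive_used:ratio`` are hypothetical, and the rule assumes both metrics carry identical label sets so that the division matches one-to-one; check this against your own scraped data before relying on it.

.. code-block:: yaml
   :class: copyable

   groups:
   - name: minio-recording-rules            # hypothetical group name
     rules:
     - record: minio:node_drive_used:ratio  # hypothetical rule name
       # fraction of each drive's capacity currently in use (0.0 to 1.0)
       expr: |
         minio_node_drive_used_bytes{job="minio-job"}
           /
         minio_node_drive_total_bytes{job="minio-job"}

Prometheus loads recording rule files through the same ``rule_files`` configuration key that it uses for alert rules.
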
4) Configure an Alert Rule using MinIO Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

You must configure :prometheus-docs:`Alert Rules <prometheus/latest/configuration/alerting_rules/>` on the Prometheus deployment to trigger alerts based on collected MinIO metrics.

The following example alert rule file provides a baseline of alerts for a MinIO deployment.
You can modify or otherwise use these examples as guidance in building your own alerts.

.. code-block:: yaml
   :class: copyable

   groups:
   - name: minio-alerts
     rules:
     - alert: NodesOffline
       expr: avg_over_time(minio_cluster_nodes_offline_total{job="minio-job"}[5m]) > 0
       for: 10m
       labels:
         severity: warn
       annotations:
         summary: "Node down in MinIO deployment"
         description: "Node(s) in cluster {{ $labels.instance }} offline for more than 5 minutes"

     - alert: DisksOffline
       expr: avg_over_time(minio_cluster_drive_offline_total{job="minio-job"}[5m]) > 0
       for: 10m
       labels:
         severity: warn
       annotations:
         summary: "Disks down in MinIO deployment"
         description: "Disk(s) in cluster {{ $labels.instance }} offline for more than 5 minutes"

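
The recommended drive metrics can back additional rules in the same way.
The following is a minimal sketch, not an official MinIO rule: it assumes ``minio_node_drive_errors_timeout`` behaves as a counter (the recommended metrics table describes it as a total since server start) and warns when new timeout errors keep appearing.
The group name, alert name, window, and threshold are illustrative choices.

.. code-block:: yaml
   :class: copyable

   groups:
   - name: minio-drive-alerts               # hypothetical group name
     rules:
     - alert: DriveTimeoutErrors            # hypothetical alert name
       # increase() estimates how much the counter grew over the 10 minute window
       expr: increase(minio_node_drive_errors_timeout{job="minio-job"}[10m]) > 0
       for: 5m
       labels:
         severity: warn
       annotations:
         summary: "Drive timeout errors in MinIO deployment"
         description: "Drive(s) reachable through {{ $labels.instance }} reported new timeout errors"
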
Add the path to the alert rule file to the Prometheus configuration under the ``rule_files`` key:

.. code-block:: yaml

   global:
     scrape_interval: 5s

   rule_files:
   - minio-alerting.yml

Once triggered, Prometheus sends the alert to the configured AlertManager service.

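
Prometheus locates the AlertManager service through the ``alerting`` section of its configuration file.
As a minimal sketch, assuming an AlertManager instance reachable at the placeholder address ``alertmanager.example.net:9093`` (9093 is the AlertManager default port), the section might look like the following:

.. code-block:: yaml
   :class: copyable

   alerting:
     alertmanagers:
     - static_configs:
       - targets: ['alertmanager.example.net:9093']  # assumed address; replace with your own
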
5) (Optional) Configure MinIO Console to Query Prometheus
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The Console also supports displaying time-series and historical data by querying a :prometheus-docs:`Prometheus <prometheus/latest/getting_started/>` service configured to scrape data from the MinIO deployment.

.. image:: /images/minio-console/console-metrics.png
   :width: 600px
   :alt: MinIO Console displaying Prometheus-backed Monitoring Data
   :align: center

To enable historical data visualization in MinIO Console, set the following environment variables on each node in the MinIO deployment:

- Set :envvar:`MINIO_PROMETHEUS_URL` to the URL of the Prometheus service
- Set :envvar:`MINIO_PROMETHEUS_JOB_ID` to the unique job ID assigned to the collected metrics

Restart the MinIO deployment and visit the :ref:`Monitoring <minio-console-monitoring>` pane to see the historical data views.

Dashboards
----------

MinIO provides Grafana Dashboards to display metrics collected by Prometheus.
For more information, see :ref:`minio-grafana`.