Grafana and metric updates (#953)

- Adds a new page for Grafana to overview.
- Replaces the list of metrics in the Metrics and Alerts page with an include to pull the list of metrics maintained in GitHub.
- Removes use of the :metric: role throughout the docs.
- Adds note about the introduction of a new bucket metric endpoint.

Partially addresses #930
Partially addresses #931
Partially addresses #898
Closes #864

Staged:

- http://192.241.195.202:9000/staging/grafana/operations/monitoring/grafana.html
2026-01-04 02:44:36 +03:00 · 2023-08-17 09:01:46 -05:00
parent 1a1c340c3c
commit 20644952de
17 changed files with 634 additions and 524 deletions
--- a/source/operations/monitoring/collect-minio-metrics-using-prometheus.rst
+++ b/source/operations/monitoring/collect-minio-metrics-using-prometheus.rst
@@ -15,7 +15,7 @@ Monitoring and Alerting using Prometheus
   - `Monitoring with MinIO and Prometheus: Overview <https://youtu.be/A3vCDaFWNNs?ref=docs>`__
   - `Monitoring with MinIO and Prometheus: Lab <https://youtu.be/Oix9iXndSUY?ref=docs>`__

-MinIO publishes cluster and node metrics using the :prometheus-docs:`Prometheus Data Model <concepts/data_model/#data-model>`.
+MinIO publishes cluster, node, and bucket metrics using the :prometheus-docs:`Prometheus Data Model <concepts/data_model/#data-model>`.
 The procedure on this page documents the following:

 - Configuring a Prometheus service to scrape and display metrics from a MinIO deployment
@@ -40,12 +40,40 @@ Configure Prometheus to Collect and Alert using MinIO Metrics

 Use the :mc-cmd:`mc admin prometheus generate` command to generate the scrape configuration for use by Prometheus in making scraping requests:

-.. code-block:: shell
-   :class: copyable
+.. tab-set::

-   mc admin prometheus generate ALIAS
+   .. tab-item:: MinIO Server

-Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the :mc:`alias <mc alias>` of the MinIO deployment.
+      The following command generates the scrape configuration for MinIO cluster metrics.
+
+      .. code-block:: shell
+         :class: copyable
+      
+         mc admin prometheus generate ALIAS
+
+      Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the :mc:`alias <mc alias>` of the MinIO deployment.
+
+   .. tab-item:: Nodes
+
+      The following command generates the scrape configuration for node metrics on the MinIO Server.
+
+      .. code-block:: shell
+         :class: copyable
+      
+         mc admin prometheus generate ALIAS node
+
+      Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the :mc:`alias <mc alias>` of the MinIO deployment.
+
+   .. tab-item:: Buckets
+
+      The following command generates the scrape configuration for bucket metrics on the MinIO Server.
+
+      .. code-block:: shell
+         :class: copyable
+      
+         mc admin prometheus generate ALIAS bucket
+
+      Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the :mc:`alias <mc alias>` of the MinIO deployment.

 The command returns output similar to the following:

@@ -81,21 +109,44 @@ The command returns output similar to the following:
 2) Restart Prometheus with the Updated Configuration
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Append the ``scrape_configs`` job generated in the previous step to the configuration file:
+Append the desired ``scrape_configs`` job generated in the previous step to the configuration file:

-.. code-block:: yaml
-   :class: copyable
+.. tab-set::
+
+   .. tab-item:: Cluster metrics
+
+      For cluster metrics:
+      
+      .. code-block:: yaml
+         :class: copyable
+      
+         global:
+            scrape_interval: 15s
+         
+         scrape_configs:
+            - job_name: minio-job
+              bearer_token: TOKEN
+              metrics_path: /minio/v2/metrics/cluster
+              scheme: https
+              static_configs:
+              - targets: [minio.example.net]
+
+   .. tab-item:: Bucket metrics
+
+      .. code-block:: yaml
+         :class: copyable
+      
+         global:
+            scrape_interval: 15s
+         
+         scrape_configs:
+            - job_name: minio-job-bucket
+              bearer_token: TOKEN
+              metrics_path: /minio/v2/metrics/bucket
+              scheme: https
+              static_configs:
+              - targets: [minio.example.net]

-   global:
-      scrape_interval: 15s
-   
-   scrape_configs:
-      - job_name: minio-job
-        bearer_token: TOKEN
-        metrics_path: /minio/v2/metrics/cluster
-        scheme: https
-        static_configs:
-        - targets: [minio.example.net]

 Start the Prometheus cluster using the configuration file:

@@ -122,9 +173,9 @@ The following query examples return metrics collected by Prometheus:

   minio_cluster_capacity_usable_free_bytes{job="minio-job"}[5m]

-See :ref:`minio-metrics-and-alerts-available-metrics` for a complete list of published metrics.
+See :ref:`minio-metrics-and-alerts` for information about metrics.

-4) Configure an Alert Rule using MinIO Metrics
+4) Configure an Alert Rule using MinIO Metrics
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 You must configure :prometheus-docs:`Alert Rules <prometheus/latest/configuration/alerting_rules/>` on the Prometheus deployment to trigger alerts based on collected MinIO metrics.
@@ -184,3 +235,9 @@ To enable historical data visualization in MinIO Console, set the following envi
 - Set :envvar:`MINIO_PROMETHEUS_JOB_ID` to the unique job ID assigned to the collected metrics

 Restart the MinIO deployment and visit the :ref:`Monitoring <minio-console-monitoring>` pane to see the historical data views.
+
+Dashboards
+----------
+
+MinIO provides Grafana Dashboards to display metrics collected by Prometheus.
+For more information, see :ref:`minio-grafana`.
--- a/source/operations/monitoring/grafana.rst
+++ b/source/operations/monitoring/grafana.rst
@@ -0,0 +1,60 @@
+.. _minio-grafana:
+
+===================================
+Monitor a MinIO Server with Grafana 
+===================================
+
+.. default-domain:: minio
+
+.. contents:: Table of Contents
+   :local:
+   :depth: 2
+   
+`Grafana <https://grafana.com/>`__ allows you to query, visualize, alert on, and understand your metrics no matter where they are stored.
+Create, explore, and share dashboards with your team and foster a data-driven culture.
+
+Prerequisites
+-------------
+
+- An existing :prometheus-docs:`Prometheus deployment <prometheus/latest/installation/>` with backing :prometheus-docs:`Alert Manager <alerting/latest/overview/>`
+- An existing MinIO deployment with network access to the Prometheus deployment
+- `Grafana installed <https://grafana.com/grafana/download>`__
+
+MinIO Grafana Dashboard
+-----------------------
+
+MinIO provides two official Grafana Dashboards you can download from the Grafana Dashboard portal.
+
+1. :ref:`MinIO Server metrics <minio-server-grafana-metrics>`
+2. :ref:`MinIO Bucket metrics <minio-buckets-grafana-metrics>`
+
+To track changes to the Grafana dashboards, inspect the JSON files for the `server <https://github.com/minio/minio/blob/master/docs/metrics/prometheus/grafana/minio-dashboard.json>`__ or `bucket <https://github.com/minio/minio/blob/master/docs/metrics/prometheus/grafana/minio-bucket.json>`__ dashboards in the MinIO Server GitHub repository.
+
+.. _minio-server-grafana-metrics:
+
+MinIO Server Metrics Dashboard
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Visualize MinIO metrics with the official MinIO Grafana dashboard for the MinIO Server available on the `Grafana dashboard portal <https://grafana.com/grafana/dashboards/13502-minio-dashboard/>`__.
+
+MinIO provides a Grafana Dashboard for MinIO Server metrics.
+For specifics on the dashboard's configuration, see the `JSON file on GitHub <https://raw.githubusercontent.com/minio/minio/master/docs/metrics/prometheus/grafana/minio-dashboard.json>`__.
+
+.. image:: /images/grafana-minio.png
+   :width: 600px
+   :alt: A sample of the MinIO Grafana dashboard showing many different captured metrics on a MinIO Server.
+   :align: center
+
+.. _minio-buckets-grafana-metrics:
+
+MinIO Bucket Metrics Dashboard
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Visualize MinIO bucket metrics with the official MinIO Grafana dashboard for buckets available on the `Grafana dashboard portal <https://grafana.com/grafana/dashboards/19237-minio-bucket-dashboard/>`__.
+
+Bucket metrics can be viewed in the Grafana dashboard using the `bucket JSON file on GitHub <https://raw.githubusercontent.com/minio/minio/master/docs/metrics/prometheus/grafana/minio-bucket.json>`__.
+
+.. image:: /images/grafana-bucket.png
+   :width: 600px
+   :alt: A sample of the MinIO Grafana dashboard showing many different captured metrics for MinIO buckets.
+   :align: center
--- a/source/operations/monitoring/healthcheck-probe.rst
+++ b/source/operations/monitoring/healthcheck-probe.rst
@@ -35,8 +35,8 @@ the server, such as a transient network issue or potential downtime.

 The healthcheck probe alone cannot determine if a MinIO server is offline - only
 that the current host machine cannot reach the server. Consider configuring
-a Prometheus :ref:`alert <minio-metrics-and-alerts-alerting>` using the 
-:metric:`minio_cluster_nodes_offline_total` metric to detect whether one or
+a Prometheus :ref:`alert <minio-metrics-and-alerts>` using the 
+``minio_cluster_nodes_offline_total`` metric to detect whether one or
 more MinIO nodes are offline.

 Cluster Write Quorum
@@ -63,13 +63,13 @@ The healthcheck probe alone cannot determine if a MinIO server is offline or
 processing write operations normally - only whether enough MinIO servers are
 online to meet write quorum  requirements based on the configured 
 :ref:`erasure code parity <minio-ec-parity>`. Consider configuring a Prometheus
-:ref:`alert <minio-metrics-and-alerts-alerting>` using one of the following
+:ref:`alert <minio-metrics-and-alerts>` using one of the following
 metrics to detect potential issues or errors on the MinIO cluster:

-- :metric:`minio_cluster_nodes_offline_total` to alert if one or more
+- ``minio_cluster_nodes_offline_total`` to alert if one or more
  MinIO nodes are offline.

-- :metric:`minio_node_disk_free_bytes` to alert if the cluster is running
+- ``minio_node_disk_free_bytes`` to alert if the cluster is running
  low on free drive space.

 Cluster Read Quorum
@@ -96,8 +96,8 @@ The healthcheck probe alone cannot determine if a MinIO server is offline or
 processing read operations normally - only whether enough MinIO servers are
 online to meet read quorum requirements based on the configured 
 :ref:`erasure code parity <minio-ec-parity>`. Consider configuring a Prometheus
-:ref:`alert <minio-metrics-and-alerts-alerting>` using the
-:metric:`minio_cluster_nodes_offline_total` metric to detect whether one or more
+:ref:`alert <minio-metrics-and-alerts>` using the
+``minio_cluster_nodes_offline_total`` metric to detect whether one or more
 MinIO nodes are offline.

 Cluster Maintenance Check
@@ -125,6 +125,5 @@ The healthcheck probe alone cannot determine if a MinIO server is offline - only
 whether enough MinIO servers will be online after taking the node down for
 maintenance to meet read and write quorum requirements based on the configured
 :ref:`erasure code parity <minio-ec-parity>`. Consider configuring a Prometheus
-:ref:`alert <minio-metrics-and-alerts-alerting>` using the
-:metric:`minio_cluster_nodes_offline_total` metric to detect whether one or more
+:ref:`alert <minio-metrics-and-alerts>` using the ``minio_cluster_nodes_offline_total`` metric to detect whether one or more
 MinIO nodes are offline.
--- a/source/operations/monitoring/metrics-and-alerts.rst
+++ b/source/operations/monitoring/metrics-and-alerts.rst
--- a/source/operations/monitoring/monitor-and-alert-using-influxdb.rst
+++ b/source/operations/monitoring/monitor-and-alert-using-influxdb.rst
@@ -94,7 +94,7 @@ Configure InfluxDB to Collect and Alert using MinIO Metrics

      Use the :influxdb-docs:`DataExplorer <query-data/execute-queries/data-explorer/>` to visualize the collected MinIO data.

-      For example, you can set a filter on :metric:`minio_cluster_capacity_usable_total_bytes` and :metric:`minio_cluster_capacity_usable_free_bytes` to compare the total usable against total free space on the MinIO deployment.
+      For example, you can set a filter on ``minio_cluster_capacity_usable_total_bytes`` and ``minio_cluster_capacity_usable_free_bytes`` to compare the total usable against total free space on the MinIO deployment.

   #. Configure a Check

@@ -105,13 +105,13 @@ Configure InfluxDB to Collect and Alert using MinIO Metrics

      - Create a :guilabel:`Threshold Check` named ``MINIO_NODE_DOWN``. 
      
-        Set the filter for the :metric:`minio_cluster_nodes_offline_total` key.
+        Set the filter for the ``minio_cluster_nodes_offline_total`` key.
        
        Set the :guilabel:`Thresholds` to :guilabel:`WARN` when the value is greater than :guilabel:`1`

      - Create a :guilabel:`Threshold Check` named ``MINIO_QUORUM_WARNING``.

-        Set the filter for the :metric:`minio_cluster_disk_offline_total` key.
+        Set the filter for the ``minio_cluster_disk_offline_total`` key.

        Set the :guilabel:`Thresholds` to :guilabel:`CRITICAL` when the value is one less than your configured :ref:`Erasure Code Parity <minio-erasure-coding>` setting.
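
For reference, the Prometheus alert that the healthcheck-probe changes in this commit recommend could look roughly like the rule below. This is a minimal sketch, not part of the commit. The group name, alert name, 5m window, 10m hold time, and severity label are illustrative assumptions, and the ``job`` label assumes the ``minio-job`` scrape job shown in the Prometheus changes.

```yaml
# Hypothetical rules file, loaded through rule_files in the Prometheus configuration.
groups:
  - name: minio-alerts            # assumed group name
    rules:
      - alert: MinioNodesOffline  # assumed alert name
        # Fires when any node was reported offline on average over the last 5 minutes.
        expr: avg_over_time(minio_cluster_nodes_offline_total{job="minio-job"}[5m]) > 0
        for: 10m                  # assumed hold time before the alert fires
        labels:
          severity: warn          # assumed severity label
        annotations:
          summary: "One or more MinIO nodes are offline"
```

Adjust the threshold and hold time to the deployment's tolerance for transient node restarts.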