Grafana and metric updates (#953)

- Adds a new page that provides an overview of Grafana.
- Replaces the list of metrics on the Metrics and Alerts page with an include that pulls the list of metrics maintained in GitHub.
- Removes use of the :metric: role throughout the docs.
- Adds a note about the introduction of a new bucket metrics endpoint.

Partially addresses #930
Partially addresses #931
Partially addresses #898
Closes #864

Staged:
- http://192.241.195.202:9000/staging/grafana/operations/monitoring/grafana.html
2025-07-28 19:42:10 +03:00 · 2023-08-17 09:01:46 -05:00
parent 1a1c340c3c
commit 20644952de
17 changed files with 634 additions and 524 deletions
--- a/.gitignore
+++ b/.gitignore
@ -19,4 +19,5 @@ source/developers/haskell/*.md
 source/developers/java/*.md
 source/developers/javascript/*.md
 source/developers/python/*.md
+source/operations/monitoring/*.md
 *.inv
--- a/source/administration/identity-access-management/policy-based-access-control.rst
+++ b/source/administration/identity-access-management/policy-based-access-control.rst
@ -864,7 +864,7 @@ services:

 .. policy-action:: admin:Prometheus

-   Allows access to MinIO :ref:`metrics <minio-metrics-and-alerts-endpoints>`. 
+   Allows access to MinIO :ref:`metrics <minio-metrics-and-alerts>`. 
   Only required if MinIO requires authentication for scraping metrics.

 .. policy-action:: admin:ListBatchJobs
--- a/source/administration/monitoring.rst
+++ b/source/administration/monitoring.rst
@ -37,7 +37,7 @@ Deployment Metrics

 MinIO provides a Prometheus-compatible endpoint for supporting time-series querying of metrics.

-MinIO deployments :ref:`configured to enable Prometheus scraping <minio-metrics-and-alerts-endpoints>` provide a detailed metrics view through the MinIO Console.
+MinIO deployments :ref:`configured to enable Prometheus scraping <minio-metrics-and-alerts>` provide a detailed metrics view through the MinIO Console.

 Server Logs
 -----------
--- a/source/administration/monitoring/publish-events-to-webhook.rst
+++ b/source/administration/monitoring/publish-events-to-webhook.rst
@ -311,9 +311,3 @@ a notification.
   :class: copyable

   mc cp ~/data/new-object.txt ALIAS/BUCKET
-
-Webhook Metrics
----------------
-
-MinIO publishes several :ref:`metrics <minio-metrics-and-alerts>` for monitoring webhook endpoints.
-See :ref:`minio-metrics-and-alerts-webhook` for a list of available metrics.
--- a/source/administration/object-management/object-lifecycle-management.rst
+++ b/source/administration/object-management/object-lifecycle-management.rst
@ -125,9 +125,7 @@ As the cluster or workload increases, scanner performance decreases as it yields

 Consider regularly checking cluster metrics, capacity, and resource usage to ensure the cluster hardware is scaling alongside cluster and workload growth:

-- :ref:`minio-metrics-and-alerts-capacity`
-- :ref:`minio-metrics-and-alerts-lifecycle-management`
-- :ref:`minio-metrics-and-alerts-scanner`
+- :ref:`minio-metrics-and-alerts`

 .. toctree::
   :hidden:
--- a/source/design.rst
+++ b/source/design.rst
@ -535,5 +535,11 @@ for display. This is intentional (For now).

      These are nested and linked.

+Images
+------

+.. image:: /images/minio-console/minio-console.png
+   :width: 600px
+   :alt: MinIO Console Landing Page provides a view of the Object Browser for the authenticated user
+   :align: center

--- a/source/images/grafana-bucket.png
+++ b/source/images/grafana-bucket.png
--- a/source/images/grafana-minio.png
+++ b/source/images/grafana-minio.png
--- a/source/operations/checklists/software.rst
+++ b/source/operations/checklists/software.rst
@ -38,7 +38,10 @@ MinIO Pre-requisites
     - Load balancer to handle routing of requests (for example, `NGINX <https://www.nginx.com/>`__)

   * - :octicon:`circle`
-     - :ref:`Prometheus / Grafana <minio-metrics-collect-using-prometheus>` setup for monitoring and metrics
+     - :ref:`Prometheus <minio-metrics-collect-using-prometheus>` setup for monitoring and metrics
+
+   * - :octicon:`circle`
+     - :ref:`Grafana <minio-grafana>` configured for dashboards

   * - :octicon:`circle` 
     - (optional) :mc:`mc` installed on the local host system
--- a/source/operations/monitoring.rst
+++ b/source/operations/monitoring.rst
@ -70,4 +70,6 @@ See :ref:`minio-healthcheck-api` for more information.

   /operations/monitoring/metrics-and-alerts
   /operations/monitoring/minio-logging
-   /operations/monitoring/healthcheck-probe
+   /operations/monitoring/healthcheck-probe
+   /operations/monitoring/grafana
+   
--- a/source/operations/monitoring/collect-minio-metrics-using-prometheus.rst
+++ b/source/operations/monitoring/collect-minio-metrics-using-prometheus.rst
@ -15,7 +15,7 @@ Monitoring and Alerting using Prometheus
   - `Monitoring with MinIO and Prometheus: Overview <https://youtu.be/A3vCDaFWNNs?ref=docs>`__
   - `Monitoring with MinIO and Prometheus: Lab <https://youtu.be/Oix9iXndSUY?ref=docs>`__

-MinIO publishes cluster and node metrics using the :prometheus-docs:`Prometheus Data Model <concepts/data_model/#data-model>`.
+MinIO publishes cluster, node, and bucket metrics using the :prometheus-docs:`Prometheus Data Model <concepts/data_model/#data-model>`.
 The procedure on this page documents the following:

 - Configuring a Prometheus service to scrape and display metrics from a MinIO deployment
@ -40,12 +40,40 @@ Configure Prometheus to Collect and Alert using MinIO Metrics

 Use the :mc-cmd:`mc admin prometheus generate` command to generate the scrape configuration for use by Prometheus in making scraping requests:

-.. code-block:: shell
-   :class: copyable
+.. tab-set::

-   mc admin prometheus generate ALIAS
+   .. tab-item:: MinIO Server

-Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the :mc:`alias <mc alias>` of the MinIO deployment.
+      The following command generates a scrape configuration for cluster metrics.
+
+      .. code-block:: shell
+         :class: copyable
+      
+         mc admin prometheus generate ALIAS
+
+      Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the :mc:`alias <mc alias>` of the MinIO deployment.
+
+   .. tab-item:: Nodes
+
+      The following command generates a scrape configuration for node metrics.
+
+      .. code-block:: shell
+         :class: copyable
+      
+         mc admin prometheus generate ALIAS node
+
+      Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the :mc:`alias <mc alias>` of the MinIO deployment.
+
+   .. tab-item:: Buckets
+
+      The following command generates a scrape configuration for bucket metrics.
+
+      .. code-block:: shell
+         :class: copyable
+      
+         mc admin prometheus generate ALIAS bucket
+
+      Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the :mc:`alias <mc alias>` of the MinIO deployment.

 The command returns output similar to the following:

@ -81,21 +109,44 @@ The command returns output similar to the following:
 2) Restart Prometheus with the Updated Configuration
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Append the ``scrape_configs`` job generated in the previous step to the configuration file:
+Append the desired ``scrape_configs`` job generated in the previous step to the configuration file:

-.. code-block:: yaml
-   :class: copyable
+.. tab-set::
+
+   .. tab-item:: Cluster metrics
+
+      For cluster metrics:
+      
+      .. code-block:: yaml
+         :class: copyable
+      
+         global:
+            scrape_interval: 15s
+         
+         scrape_configs:
+            - job_name: minio-job
+              bearer_token: TOKEN
+              metrics_path: /minio/v2/metrics/cluster
+              scheme: https
+              static_configs:
+              - targets: [minio.example.net]
+
+   .. tab-item:: Bucket metrics
+
+      For bucket metrics:
+
+      .. code-block:: yaml
+         :class: copyable
+      
+         global:
+            scrape_interval: 15s
+         
+         scrape_configs:
+            - job_name: minio-job-bucket
+              bearer_token: TOKEN
+              metrics_path: /minio/v2/metrics/bucket
+              scheme: https
+              static_configs:
+              - targets: [minio.example.net]

-   global:
-      scrape_interval: 15s
-   
-   scrape_configs:
-      - job_name: minio-job
-        bearer_token: TOKEN
-        metrics_path: /minio/v2/metrics/cluster
-        scheme: https
-        static_configs:
-        - targets: [minio.example.net]

 Start the Prometheus cluster using the configuration file:

@ -122,9 +173,9 @@ The following query examples return metrics collected by Prometheus:

   minio_cluster_capacity_usable_free_bytes{job="minio-job"}[5m]

-See :ref:`minio-metrics-and-alerts-available-metrics` for a complete list of published metrics.
+See :ref:`minio-metrics-and-alerts` for a list of published metrics.

-4) Configure an Alert Rule using MinIO Metrics
+4) Configure an Alert Rule using MinIO Metrics
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 You must configure :prometheus-docs:`Alert Rules <prometheus/latest/configuration/alerting_rules/>` on the Prometheus deployment to trigger alerts based on collected MinIO metrics.
@ -184,3 +235,9 @@ To enable historical data visualization in MinIO Console, set the following envi
 - Set :envvar:`MINIO_PROMETHEUS_JOB_ID` to the unique job ID assigned to the collected metrics

 Restart the MinIO deployment and visit the :ref:`Monitoring <minio-console-monitoring>` pane to see the historical data views.
+
+Dashboards
+----------
+
+MinIO provides Grafana Dashboards to display metrics collected by Prometheus.
+For more information, see :ref:`minio-grafana`.
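The Cluster metrics and Bucket metrics tabs above each show a complete file with its own ``global`` block. When a deployment collects both, a single Prometheus configuration can hold both jobs under one ``scrape_configs`` list. The following is a minimal sketch, assuming the host ``minio.example.net`` and a bearer token copied from the ``mc admin prometheus generate`` output; the helper name ``write_prometheus_config`` is hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: write one prometheus.yml that scrapes both the cluster and the
# bucket metrics endpoints. write_prometheus_config is a hypothetical helper.
set -eu

write_prometheus_config() {
  local out="$1"      # path of the configuration file to write
  local target="$2"   # MinIO host, for example minio.example.net
  local token="$3"    # bearer token from `mc admin prometheus generate`

  # One global block, two scrape jobs that differ only in name and path.
  cat > "$out" <<EOF
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: minio-job
    bearer_token: ${token}
    metrics_path: /minio/v2/metrics/cluster
    scheme: https
    static_configs:
      - targets: [${target}]

  - job_name: minio-job-bucket
    bearer_token: ${token}
    metrics_path: /minio/v2/metrics/bucket
    scheme: https
    static_configs:
      - targets: [${target}]
EOF
}
```

For example, ``write_prometheus_config prometheus.yml minio.example.net TOKEN`` writes the file. If ``promtool`` is installed, ``promtool check config prometheus.yml`` can validate the result before restarting Prometheus.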
--- a/source/operations/monitoring/grafana.rst
+++ b/source/operations/monitoring/grafana.rst
@ -0,0 +1,60 @@
+.. _minio-grafana:
+
+===================================
+Monitor a MinIO Server with Grafana 
+===================================
+
+.. default-domain:: minio
+
+.. contents:: Table of Contents
+   :local:
+   :depth: 2
+   
+`Grafana <https://grafana.com/>`__ lets you query, visualize, alert on, and explore metrics regardless of where they are stored.
+Use Grafana dashboards to create, explore, and share views of the MinIO metrics that Prometheus collects.
+
+Prerequisites
+-------------
+
+- An existing :prometheus-docs:`Prometheus deployment <prometheus/latest/installation/>` with backing :prometheus-docs:`Alert Manager <alerting/latest/overview/>`
+- An existing MinIO deployment with network access to the Prometheus deployment
+- `Grafana installed <https://grafana.com/grafana/download>`__
+
+MinIO Grafana Dashboards
+------------------------
+
+MinIO provides two official Grafana Dashboards you can download from the Grafana Dashboard portal.
+
+1. :ref:`MinIO Server metrics <minio-server-grafana-metrics>`
+2. :ref:`MinIO Bucket metrics <minio-buckets-grafana-metrics>`
+
+To track changes to the dashboards, review the JSON files for the `server <https://github.com/minio/minio/blob/master/docs/metrics/prometheus/grafana/minio-dashboard.json>`__ and `bucket <https://github.com/minio/minio/blob/master/docs/metrics/prometheus/grafana/minio-bucket.json>`__ dashboards in the MinIO Server GitHub repository.
+
+.. _minio-server-grafana-metrics:
+
+MinIO Server Metrics Dashboard
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Visualize MinIO Server metrics with the official MinIO Server dashboard, available on the `Grafana dashboard portal <https://grafana.com/grafana/dashboards/13502-minio-dashboard/>`__.
+
+For specifics on the dashboard's configuration, see the `JSON file on GitHub <https://raw.githubusercontent.com/minio/minio/master/docs/metrics/prometheus/grafana/minio-dashboard.json>`__.
+
+.. image:: /images/grafana-minio.png
+   :width: 600px
+   :alt: A sample of the MinIO Grafana dashboard showing many different captured metrics on a MinIO Server.
+   :align: center
+
+.. _minio-buckets-grafana-metrics:
+
+MinIO Bucket Metrics Dashboard
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Visualize MinIO bucket metrics with the official MinIO bucket dashboard, available on the `Grafana dashboard portal <https://grafana.com/grafana/dashboards/19237-minio-bucket-dashboard/>`__.
+
+For specifics on the dashboard's configuration, see the `bucket JSON file on GitHub <https://raw.githubusercontent.com/minio/minio/master/docs/metrics/prometheus/grafana/minio-bucket.json>`__.
+
+.. image:: /images/grafana-bucket.png
+   :width: 600px
+   :alt: A sample of the MinIO Grafana bucket dashboard showing metrics captured for MinIO buckets.
+   :align: center
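Besides importing dashboards through the Grafana portal, Grafana can load a Prometheus data source and dashboard JSON files from provisioning files at startup. The following is a minimal sketch, assuming Grafana's file-based provisioning, a provisioning directory passed as an argument (commonly ``/etc/grafana/provisioning`` on package installs), and the Prometheus URL ``http://prometheus.example.net:9090``; the helper name ``write_grafana_provisioning`` is hypothetical and this is not an official MinIO procedure.

```shell
#!/usr/bin/env bash
# Sketch: provision a Prometheus data source and a file-based dashboard
# provider for Grafana. write_grafana_provisioning is a hypothetical helper.
set -eu

write_grafana_provisioning() {
  local prov_dir="$1"        # Grafana provisioning directory
  local prometheus_url="$2"  # URL of the Prometheus server that scrapes MinIO
  local dash_dir="$3"        # directory holding the downloaded dashboard JSON files

  mkdir -p "$prov_dir/datasources" "$prov_dir/dashboards" "$dash_dir"

  # Data source that the MinIO dashboards query.
  cat > "$prov_dir/datasources/minio-prometheus.yaml" <<EOF
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: ${prometheus_url}
EOF

  # Dashboard provider that loads every JSON file found in dash_dir.
  cat > "$prov_dir/dashboards/minio.yaml" <<EOF
apiVersion: 1
providers:
  - name: minio
    type: file
    options:
      path: ${dash_dir}
EOF
}
```

After running the helper, place the dashboard JSON files listed on this page in the dashboard directory (for example with ``curl -fsSL -o minio-dashboard.json`` against the raw GitHub URL) and restart Grafana. If a dashboard JSON file references its data source through an import variable, importing it through the Grafana UI may be simpler.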
--- a/source/operations/monitoring/healthcheck-probe.rst
+++ b/source/operations/monitoring/healthcheck-probe.rst
@ -35,8 +35,8 @@ the server, such as a transient network issue or potential downtime.

 The healthcheck probe alone cannot determine if a MinIO server is offline - only
 that the current host machine cannot reach the server. Consider configuring
-a Prometheus :ref:`alert <minio-metrics-and-alerts-alerting>` using the 
-:metric:`minio_cluster_nodes_offline_total` metric to detect whether one or
+a Prometheus :ref:`alert <minio-metrics-and-alerts>` using the 
+``minio_cluster_nodes_offline_total`` metric to detect whether one or
 more MinIO nodes are offline.

 Cluster Write Quorum
@ -63,13 +63,13 @@ The healthcheck probe alone cannot determine if a MinIO server is offline or
 processing write operations normally - only whether enough MinIO servers are
 online to meet write quorum requirements based on the configured 
 :ref:`erasure code parity <minio-ec-parity>`. Consider configuring a Prometheus
-:ref:`alert <minio-metrics-and-alerts-alerting>` using one of the following
+:ref:`alert <minio-metrics-and-alerts>` using one of the following
 metrics to detect potential issues or errors on the MinIO cluster:

-- :metric:`minio_cluster_nodes_offline_total` to alert if one or more
+- ``minio_cluster_nodes_offline_total`` to alert if one or more
  MinIO nodes are offline.

-- :metric:`minio_node_disk_free_bytes` to alert if the cluster is running
+- ``minio_node_disk_free_bytes`` to alert if the cluster is running
  low on free drive space.

 Cluster Read Quorum
@ -96,8 +96,8 @@ The healthcheck probe alone cannot determine if a MinIO server is offline or
 processing read operations normally - only whether enough MinIO servers are
 online to meet read quorum requirements based on the configured 
 :ref:`erasure code parity <minio-ec-parity>`. Consider configuring a Prometheus
-:ref:`alert <minio-metrics-and-alerts-alerting>` using the
-:metric:`minio_cluster_nodes_offline_total` metric to detect whether one or more
+:ref:`alert <minio-metrics-and-alerts>` using the
+``minio_cluster_nodes_offline_total`` metric to detect whether one or more
 MinIO nodes are offline.

 Cluster Maintenance Check
@ -125,6 +125,5 @@ The healthcheck probe alone cannot determine if a MinIO server is offline - only
 whether enough MinIO servers will be online after taking the node down for
 maintenance to meet read and write quorum requirements based on the configured
 :ref:`erasure code parity <minio-ec-parity>`. Consider configuring a Prometheus
-:ref:`alert <minio-metrics-and-alerts-alerting>` using the
-:metric:`minio_cluster_nodes_offline_total` metric to detect whether one or more
+:ref:`alert <minio-metrics-and-alerts>` using the ``minio_cluster_nodes_offline_total`` metric to detect whether one or more
 MinIO nodes are offline.
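As a sketch of the alert these sections recommend, the following writes a Prometheus alerting rule that fires when ``minio_cluster_nodes_offline_total`` stays above zero. The job label ``minio-job`` matches the scrape job used in the Prometheus procedure; the rule group name, alert name, ``for`` duration, severity label, and the helper name ``write_minio_node_alert`` are assumptions to adapt.

```shell
#!/usr/bin/env bash
# Sketch: write a Prometheus alerting rule for offline MinIO nodes.
# write_minio_node_alert, the group name, and the durations are illustrative.
set -eu

write_minio_node_alert() {
  local out="$1"   # path of the rule file to write
  local job="$2"   # Prometheus job name that scrapes the cluster endpoint

  cat > "$out" <<EOF
groups:
  - name: minio-alerts
    rules:
      - alert: MinIONodesOffline
        # Averaging over 5m smooths out a single missed scrape.
        expr: avg_over_time(minio_cluster_nodes_offline_total{job="${job}"}[5m]) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "One or more MinIO nodes are offline"
EOF
}
```

Reference the generated file from ``rule_files`` in the Prometheus configuration; if ``promtool`` is available, ``promtool check rules minio-alerts.yml`` validates its syntax.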
--- a/source/operations/monitoring/metrics-and-alerts.rst
+++ b/source/operations/monitoring/metrics-and-alerts.rst
--- a/source/operations/monitoring/monitor-and-alert-using-influxdb.rst
+++ b/source/operations/monitoring/monitor-and-alert-using-influxdb.rst
@ -94,7 +94,7 @@ Configure InfluxDB to Collect and Alert using MinIO Metrics

      Use the :influxdb-docs:`DataExplorer <query-data/execute-queries/data-explorer/>` to visualize the collected MinIO data.

-      For example, you can set a filter on :metric:`minio_cluster_capacity_usable_total_bytes` and :metric:`minio_cluster_capacity_usable_free_bytes` to compare the total usable against total free space on the MinIO deployment.
+      For example, you can set a filter on ``minio_cluster_capacity_usable_total_bytes`` and ``minio_cluster_capacity_usable_free_bytes`` to compare the total usable against total free space on the MinIO deployment.

   #. Configure a Check

@ -105,13 +105,13 @@ Configure InfluxDB to Collect and Alert using MinIO Metrics

      - Create a :guilabel:`Threshold Check` named ``MINIO_NODE_DOWN``. 
      
-        Set the filter for the :metric:`minio_cluster_nodes_offline_total` key.
+        Set the filter for the ``minio_cluster_nodes_offline_total`` key.
        
        Set the :guilabel:`Thresholds` to :guilabel:`WARN` when the value is greater than :guilabel:`1`

      - Create a :guilabel:`Threshold Check` named ``MINIO_QUORUM_WARNING``.

-        Set the filter for the :metric:`minio_cluster_disk_offline_total` key.
+        Set the filter for the ``minio_cluster_disk_offline_total`` key.

        Set the :guilabel:`Thresholds` to :guilabel:`CRITICAL` when the value is one less than your configured :ref:`Erasure Code Parity <minio-erasure-coding>` setting.

--- a/source/reference/minio-mc-admin/mc-admin-prometheus.rst
+++ b/source/reference/minio-mc-admin/mc-admin-prometheus.rst
@ -43,7 +43,7 @@ Syntax
   .. code-block:: shell
      :class: copyable

-      mc admin prometheus generate TARGET
+      mc admin prometheus generate TARGET [TYPE]

   The command accepts the following arguments:

@ -52,3 +52,11 @@ Syntax
      The :mc:`alias <mc alias>` of a configured MinIO deployment for which
      the command generates a Prometheus-compatible configuration file.

+   .. mc-cmd:: TYPE
+
+      The type of metrics for which to generate a scrape configuration.
+
+      Valid values are ``cluster``, ``node``, or ``bucket``.
+
+      If not specified, the command generates a configuration for cluster metrics.
+
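Because ``TYPE`` selects a single metrics endpoint, a deployment that collects cluster, node, and bucket metrics runs the command once per type. The following is a sketch that loops over the three documented values; ``ALIAS`` stands for your configured alias and ``generate_all_scrape_configs`` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch: print a Prometheus scrape configuration for each documented metric type.
# generate_all_scrape_configs is a hypothetical helper name.
set -eu

generate_all_scrape_configs() {
  local target="$1"   # alias of the MinIO deployment, as configured with `mc alias set`
  local type
  for type in cluster node bucket; do
    # Label each section so the jobs are easy to copy into prometheus.yml.
    echo "# scrape configuration for ${type} metrics"
    mc admin prometheus generate "$target" "$type"
  done
}
```

For example, ``generate_all_scrape_configs ALIAS > minio-scrape-configs.txt`` collects all three outputs in one file, from which you can copy the ``scrape_configs`` entries you need.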
--- a/source/reference/minio-server/minio-server.rst
+++ b/source/reference/minio-server/minio-server.rst
@ -601,7 +601,7 @@ logging. See :ref:`minio-metrics-and-alerts` for more information.
 .. envvar:: MINIO_PROMETHEUS_AUTH_TYPE

   Specifies the authentication mode for the Prometheus
-   :ref:`scraping endpoints <minio-metrics-and-alerts-endpoints>`.
+   :ref:`scraping endpoints <minio-metrics-and-alerts>`.

   - ``jwt`` - *Default* MinIO requires that the scraping client specify a JWT
     token for authenticating requests. Use