mirror of https://github.com/minio/docs.git synced 2026-01-04 02:44:36 +03:00

Grafana and metric updates (#953)

- Adds a new page for Grafana to overview.
- Replaces the list of metrics in the Metrics and Alerts page with an
include to pull the list of metrics maintained in GitHub.
- Removes use of the :metric: role throughout the docs.
- Adds note about the introduction of a new bucket metric endpoint.

Partially addresses #930
Partially addresses #931
Partially addresses #898
Closes #864

Staged:
- http://192.241.195.202:9000/staging/grafana/operations/monitoring/grafana.html
Daryl White
2023-08-17 09:01:46 -05:00
committed by GitHub
parent 1a1c340c3c
commit 20644952de
17 changed files with 634 additions and 524 deletions

@@ -15,7 +15,7 @@ Monitoring and Alerting using Prometheus
- `Monitoring with MinIO and Prometheus: Overview <https://youtu.be/A3vCDaFWNNs?ref=docs>`__
- `Monitoring with MinIO and Prometheus: Lab <https://youtu.be/Oix9iXndSUY?ref=docs>`__
MinIO publishes cluster and node metrics using the :prometheus-docs:`Prometheus Data Model <concepts/data_model/#data-model>`.
MinIO publishes cluster, node, and bucket metrics using the :prometheus-docs:`Prometheus Data Model <concepts/data_model/#data-model>`.
The procedure on this page documents the following:
- Configuring a Prometheus service to scrape and display metrics from a MinIO deployment
@@ -40,12 +40,40 @@ Configure Prometheus to Collect and Alert using MinIO Metrics
Use the :mc-cmd:`mc admin prometheus generate` command to generate the scrape configuration for use by Prometheus in making scraping requests:
.. code-block:: shell
:class: copyable
.. tab-set::
mc admin prometheus generate ALIAS
.. tab-item:: MinIO Server
Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the :mc:`alias <mc alias>` of the MinIO deployment.
The following command generates the scrape configuration for cluster metrics.
.. code-block:: shell
:class: copyable
mc admin prometheus generate ALIAS
Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the :mc:`alias <mc alias>` of the MinIO deployment.
.. tab-item:: Nodes
The following command generates the scrape configuration for node metrics.
.. code-block:: shell
:class: copyable
mc admin prometheus generate ALIAS node
Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the :mc:`alias <mc alias>` of the MinIO deployment.
.. tab-item:: Buckets
The following command generates the scrape configuration for bucket metrics.
.. code-block:: shell
:class: copyable
mc admin prometheus generate ALIAS bucket
Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the :mc:`alias <mc alias>` of the MinIO deployment.
The command returns output similar to the following:
@@ -81,21 +109,44 @@ The command returns output similar to the following:
2) Restart Prometheus with the Updated Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Append the ``scrape_configs`` job generated in the previous step to the configuration file:
Append the desired ``scrape_configs`` job generated in the previous step to the configuration file:
.. code-block:: yaml
:class: copyable
.. tab-set::
.. tab-item:: Cluster metrics
For server metrics:
.. code-block:: yaml
:class: copyable
global:
scrape_interval: 15s
scrape_configs:
- job_name: minio-job
bearer_token: TOKEN
metrics_path: /minio/v2/metrics/cluster
scheme: https
static_configs:
- targets: [minio.example.net]
.. tab-item:: Bucket metrics
.. code-block:: yaml
:class: copyable
global:
scrape_interval: 15s
scrape_configs:
- job_name: minio-job-bucket
bearer_token: TOKEN
metrics_path: /minio/v2/metrics/bucket
scheme: https
static_configs:
- targets: [minio.example.net]
global:
scrape_interval: 15s
scrape_configs:
- job_name: minio-job
bearer_token: TOKEN
metrics_path: /minio/v2/metrics/cluster
scheme: https
static_configs:
- targets: [minio.example.net]
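The scrape jobs above differ only in their job name and ``metrics_path``, so they can be generated from one template. The following is a minimal Python sketch of that idea, not part of MinIO or Prometheus. The helper names ``build_scrape_job`` and ``build_config`` are hypothetical, and the ``node`` path is an assumption that follows the pattern of the cluster and bucket paths shown above.

```python
import json

# Metrics paths per scope. The cluster and bucket paths come from the
# configuration above; the node path is an assumption following the same pattern.
METRICS_PATHS = {
    "cluster": "/minio/v2/metrics/cluster",
    "bucket": "/minio/v2/metrics/bucket",
    "node": "/minio/v2/metrics/node",
}


def build_scrape_job(scope, target, token, scheme="https"):
    """Return one Prometheus scrape job (as a dict) for the given metrics scope."""
    if scope not in METRICS_PATHS:
        raise ValueError(f"unknown metrics scope: {scope!r}")
    # Reuse the job names from the examples: "minio-job" for cluster metrics,
    # "minio-job-<scope>" for every other scope.
    job_name = "minio-job" if scope == "cluster" else f"minio-job-{scope}"
    return {
        "job_name": job_name,
        "bearer_token": token,
        "metrics_path": METRICS_PATHS[scope],
        "scheme": scheme,
        "static_configs": [{"targets": [target]}],
    }


def build_config(scopes, target, token, scrape_interval="15s"):
    """Return a full configuration with one scrape job per requested scope."""
    return {
        "global": {"scrape_interval": scrape_interval},
        "scrape_configs": [build_scrape_job(s, target, token) for s in scopes],
    }


if __name__ == "__main__":
    config = build_config(["cluster", "bucket"], "minio.example.net", "TOKEN")
    # Printed as JSON, which YAML parsers generally accept.
    print(json.dumps(config, indent=2))
```

Keeping each scope in its own job keeps the ``job`` label distinct in queries, which matches the separate ``minio-job`` and ``minio-job-bucket`` names used in the tabs.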
Start the Prometheus cluster using the configuration file:
@@ -122,9 +173,9 @@ The following query examples return metrics collected by Prometheus:
minio_cluster_capacity_usable_free_bytes{job="minio-job"}[5m]
See :ref:`minio-metrics-and-alerts-available-metrics` for a complete list of published metrics.
See :ref:`minio-metrics-and-alerts` for information about metrics.
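The same query can also be sent to the Prometheus HTTP API instead of the web UI. The following Python sketch assumes a Prometheus server reachable at ``http://localhost:9090`` (an illustrative address) and uses the standard ``/api/v1/query`` instant-query endpoint. The helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(prometheus_base, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{prometheus_base.rstrip('/')}/api/v1/query?{query}"


def run_query(prometheus_base, promql, timeout=10):
    """Send the query and return the parsed JSON response body."""
    url = build_query_url(prometheus_base, promql)
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # The PromQL expression from the example above.
    promql = 'minio_cluster_capacity_usable_free_bytes{job="minio-job"}[5m]'
    # Only the URL is printed here; call run_query() against a live server.
    print(build_query_url("http://localhost:9090", promql))
```

URL-encoding the expression matters because PromQL selectors contain braces, quotes, and brackets that are not valid in a raw query string.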
4) Configure an Alert Rule using MinIO Metrics
1) Configure an Alert Rule using MinIO Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You must configure :prometheus-docs:`Alert Rules <prometheus/latest/configuration/alerting_rules/>` on the Prometheus deployment to trigger alerts based on collected MinIO metrics.
@@ -184,3 +235,9 @@ To enable historical data visualization in MinIO Console, set the following envi
- Set :envvar:`MINIO_PROMETHEUS_JOB_ID` to the unique job ID assigned to the collected metrics
Restart the MinIO deployment and visit the :ref:`Monitoring <minio-console-monitoring>` pane to see the historical data views.
Dashboards
----------
MinIO provides Grafana Dashboards to display metrics collected by Prometheus.
For more information, see :ref:`minio-grafana`.

@@ -0,0 +1,60 @@
.. _minio-grafana:
===================================
Monitor a MinIO Server with Grafana
===================================
.. default-domain:: minio
.. contents:: Table of Contents
:local:
:depth: 2
`Grafana <https://grafana.com/>`__ allows you to query, visualize, alert on, and understand your metrics no matter where they are stored.
Create, explore, and share dashboards with your team and foster a data-driven culture.
Prerequisites
-------------
- An existing :prometheus-docs:`Prometheus deployment <prometheus/latest/installation/>` with backing :prometheus-docs:`Alert Manager <alerting/latest/overview/>`
- An existing MinIO deployment with network access to the Prometheus deployment
- `Grafana installed <https://grafana.com/grafana/download>`__
MinIO Grafana Dashboard
-----------------------
MinIO provides two official Grafana Dashboards you can download from the Grafana Dashboard portal.
1. :ref:`MinIO Server metrics <minio-server-grafana-metrics>`
2. :ref:`MinIO Bucket metrics <minio-buckets-grafana-metrics>`
To track changes to the Grafana dashboards, inspect the JSON files for the `server <https://github.com/minio/minio/blob/master/docs/metrics/prometheus/grafana/minio-dashboard.json>`__ or `bucket <https://github.com/minio/minio/blob/master/docs/metrics/prometheus/grafana/minio-bucket.json>`__ dashboards in the MinIO Server GitHub repository.
.. _minio-server-grafana-metrics:
MinIO Server Metrics Dashboard
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Visualize MinIO metrics with the official MinIO Grafana dashboard for the MinIO Server available on the `Grafana dashboard portal <https://grafana.com/grafana/dashboards/13502-minio-dashboard/>`__.
MinIO provides a Grafana Dashboard for MinIO Server metrics.
For specifics on the dashboard's configuration, see the `JSON file on GitHub <https://raw.githubusercontent.com/minio/minio/master/docs/metrics/prometheus/grafana/minio-dashboard.json>`__.
.. image:: /images/grafana-minio.png
:width: 600px
:alt: A sample of the MinIO Grafana dashboard showing many different captured metrics on a MinIO Server.
:align: center
.. _minio-buckets-grafana-metrics:
MinIO Bucket Metrics Dashboard
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Visualize MinIO bucket metrics with the official MinIO Grafana dashboard for buckets available on the `Grafana dashboard portal <https://grafana.com/grafana/dashboards/19237-minio-bucket-dashboard/>`__.
Bucket metrics can be viewed in the Grafana dashboard using the `bucket JSON file on GitHub <https://raw.githubusercontent.com/minio/minio/master/docs/metrics/prometheus/grafana/minio-bucket.json>`__.
.. image:: /images/grafana-bucket.png
:width: 600px
:alt: A sample of the MinIO Grafana dashboard showing many different captured metrics for MinIO buckets.
:align: center

@@ -35,8 +35,8 @@ the server, such as a transient network issue or potential downtime.
The healthcheck probe alone cannot determine if a MinIO server is offline - only
that the current host machine cannot reach the server. Consider configuring
a Prometheus :ref:`alert <minio-metrics-and-alerts-alerting>` using the
:metric:`minio_cluster_nodes_offline_total` metric to detect whether one or
a Prometheus :ref:`alert <minio-metrics-and-alerts>` using the
``minio_cluster_nodes_offline_total`` metric to detect whether one or
more MinIO nodes are offline.
Cluster Write Quorum
@@ -63,13 +63,13 @@ The healthcheck probe alone cannot determine if a MinIO server is offline or
processing write operations normally - only whether enough MinIO servers are
online to meet write quorum requirements based on the configured
:ref:`erasure code parity <minio-ec-parity>`. Consider configuring a Prometheus
:ref:`alert <minio-metrics-and-alerts-alerting>` using one of the following
:ref:`alert <minio-metrics-and-alerts>` using one of the following
metrics to detect potential issues or errors on the MinIO cluster:
- :metric:`minio_cluster_nodes_offline_total` to alert if one or more
- ``minio_cluster_nodes_offline_total`` to alert if one or more
MinIO nodes are offline.
- :metric:`minio_node_disk_free_bytes` to alert if the cluster is running
- ``minio_node_disk_free_bytes`` to alert if the cluster is running
low on free drive space.
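To see what an alert on these metrics evaluates, it can help to read the raw values from a scrape. The following Python sketch parses Prometheus text-format output and extracts the samples for one metric. It is a simplified parser that assumes label values contain no ``}`` characters, and the ``SAMPLE`` text, including its ``server`` label and values, is made up for illustration.

```python
def read_metric(exposition_text, metric_name):
    """Return the values of all samples of metric_name in Prometheus text format."""
    values = []
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        if "{" in line:
            # Sample with labels: name{labels} value [timestamp]
            name = line.split("{", 1)[0]
            rest = line.rsplit("}", 1)[1]
        else:
            # Sample without labels: name value [timestamp]
            name, _, rest = line.partition(" ")
        if name != metric_name:
            continue
        fields = rest.split()
        if fields:
            values.append(float(fields[0]))
    return values


def nodes_offline(exposition_text):
    """Return True if any sample reports one or more offline nodes."""
    return sum(read_metric(exposition_text, "minio_cluster_nodes_offline_total")) > 0


# Illustrative scrape output; the label and values are invented for this sketch.
SAMPLE = """\
# TYPE minio_cluster_nodes_offline_total gauge
minio_cluster_nodes_offline_total{server="minio.example.net"} 1
minio_node_disk_free_bytes{server="minio.example.net"} 1048576
"""

if __name__ == "__main__":
    print(nodes_offline(SAMPLE))
```

In production, the equivalent condition belongs in a Prometheus alert rule so that it is evaluated continuously rather than on demand.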
Cluster Read Quorum
@@ -96,8 +96,8 @@ The healthcheck probe alone cannot determine if a MinIO server is offline or
processing read operations normally - only whether enough MinIO servers are
online to meet read quorum requirements based on the configured
:ref:`erasure code parity <minio-ec-parity>`. Consider configuring a Prometheus
:ref:`alert <minio-metrics-and-alerts-alerting>` using the
:metric:`minio_cluster_nodes_offline_total` metric to detect whether one or more
:ref:`alert <minio-metrics-and-alerts>` using the
``minio_cluster_nodes_offline_total`` metric to detect whether one or more
MinIO nodes are offline.
Cluster Maintenance Check
@@ -125,6 +125,5 @@ The healthcheck probe alone cannot determine if a MinIO server is offline - only
whether enough MinIO servers will be online after taking the node down for
maintenance to meet read and write quorum requirements based on the configured
:ref:`erasure code parity <minio-ec-parity>`. Consider configuring a Prometheus
:ref:`alert <minio-metrics-and-alerts-alerting>` using the
:metric:`minio_cluster_nodes_offline_total` metric to detect whether one or more
:ref:`alert <minio-metrics-and-alerts>` using the ``minio_cluster_nodes_offline_total`` metric to detect whether one or more
MinIO nodes are offline.

@@ -94,7 +94,7 @@ Configure InfluxDB to Collect and Alert using MinIO Metrics
Use the :influxdb-docs:`DataExplorer <query-data/execute-queries/data-explorer/>` to visualize the collected MinIO data.
For example, you can set a filter on :metric:`minio_cluster_capacity_usable_total_bytes` and :metric:`minio_cluster_capacity_usable_free_bytes` to compare the total usable against total free space on the MinIO deployment.
For example, you can set a filter on ``minio_cluster_capacity_usable_total_bytes`` and ``minio_cluster_capacity_usable_free_bytes`` to compare the total usable against total free space on the MinIO deployment.
#. Configure a Check
@@ -105,13 +105,13 @@ Configure InfluxDB to Collect and Alert using MinIO Metrics
- Create a :guilabel:`Threshold Check` named ``MINIO_NODE_DOWN``.
Set the filter for the :metric:`minio_cluster_nodes_offline_total` key.
Set the filter for the ``minio_cluster_nodes_offline_total`` key.
Set the :guilabel:`Thresholds` to :guilabel:`WARN` when the value is greater than :guilabel:`1`.
- Create a :guilabel:`Threshold Check` named ``MINIO_QUORUM_WARNING``.
Set the filter for the :metric:`minio_cluster_disk_offline_total` key.
Set the filter for the ``minio_cluster_disk_offline_total`` key.
Set the :guilabel:`Thresholds` to :guilabel:`CRITICAL` when the value is one less than your configured :ref:`Erasure Code Parity <minio-erasure-coding>` setting.
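The two threshold checks above can be expressed as a small decision function. The following Python sketch is only an illustration of the arithmetic and is not how InfluxDB evaluates checks. The function name ``check_status`` is hypothetical, and it assumes that "one less than" means the check fires at or above ``parity - 1`` offline drives. It rejects parity values below 2, where that threshold would be zero or negative.

```python
def check_status(nodes_offline, drives_offline, parity):
    """Classify a deployment using the two threshold checks described above."""
    if parity < 2:
        # With parity below 2 the critical threshold (parity - 1) is not positive,
        # so this sketch does not attempt to classify the deployment.
        raise ValueError("this sketch only covers parity values of 2 or more")
    # MINIO_QUORUM_WARNING: offline drives reach one less than the parity setting.
    if drives_offline >= parity - 1:
        return "CRITICAL"
    # MINIO_NODE_DOWN: the number of offline nodes is greater than 1.
    if nodes_offline > 1:
        return "WARN"
    return "OK"


if __name__ == "__main__":
    # Example: parity of 4, with 2 nodes offline and 1 drive offline.
    print(check_status(nodes_offline=2, drives_offline=1, parity=4))
```

The critical check is evaluated first so that a deployment close to losing quorum is never reported as a mere warning.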