mirror of
https://github.com/minio/docs.git
synced 2025-07-28 19:42:10 +03:00
Grafana and metric updates (#953)
- Adds a new page for Grafana to overview.
- Replaces the list of metrics in the Metrics and Alerts page with an include to pull the list of metrics maintained in GitHub.
- Removes use of the :metric: role throughout the docs.
- Adds note about the introduction of a new bucket metric endpoint.

Partially addresses #930
Partially addresses #931
Partially addresses #898
Closes #864

Staged:

- http://192.241.195.202:9000/staging/grafana/operations/monitoring/grafana.html
@ -864,7 +864,7 @@ services:
|
||||
|
||||
.. policy-action:: admin:Prometheus
|
||||
|
||||
Allows access to MinIO :ref:`metrics <minio-metrics-and-alerts-endpoints>`.
|
||||
Allows access to MinIO :ref:`metrics <minio-metrics-and-alerts>`.
|
||||
Only required if MinIO requires authentication for scraping metrics.
|
||||
|
||||
.. policy-action:: admin:ListBatchJobs
|
||||
|
@ -37,7 +37,7 @@ Deployment Metrics
|
||||
|
||||
MinIO provides a Prometheus-compatible endpoint for supporting time-series querying of metrics.
|
||||
|
||||
MinIO deployments :ref:`configured to enable Prometheus scraping <minio-metrics-and-alerts-endpoints>` provide a detailed metrics view through the MinIO Console.
|
||||
MinIO deployments :ref:`configured to enable Prometheus scraping <minio-metrics-and-alerts>` provide a detailed metrics view through the MinIO Console.
|
||||
|
||||
Server Logs
|
||||
-----------
|
||||
|
@ -311,9 +311,3 @@ a notification.
|
||||
:class: copyable
|
||||
|
||||
mc cp ~/data/new-object.txt ALIAS/BUCKET
|
||||
|
||||
Webhook Metrics
|
||||
---------------
|
||||
|
||||
MinIO publishes several :ref:`metrics <minio-metrics-and-alerts>` for monitoring webhook endpoints.
|
||||
See :ref:`minio-metrics-and-alerts-webhook` for a list of available metrics.
|
||||
|
@ -125,9 +125,7 @@ As the cluster or workload increases, scanner performance decreases as it yields
|
||||
|
||||
Consider regularly checking cluster metrics, capacity, and resource usage to ensure the cluster hardware is scaling alongside cluster and workload growth:
|
||||
|
||||
- :ref:`minio-metrics-and-alerts-capacity`
|
||||
- :ref:`minio-metrics-and-alerts-lifecycle-management`
|
||||
- :ref:`minio-metrics-and-alerts-scanner`
|
||||
- :ref:`minio-metrics-and-alerts`
|
||||
|
||||
.. toctree::
|
||||
:hidden:
|
||||
|
@ -535,5 +535,11 @@ for display. This is intentional (For now).
|
||||
|
||||
These are nested and linked.
|
||||
|
||||
Images
|
||||
------
|
||||
|
||||
.. image:: /images/minio-console/minio-console.png
|
||||
:width: 600px
|
||||
:alt: MinIO Console Landing Page provides a view of the Object Browser for the authenticated user
|
||||
:align: center
|
||||
|
||||
|
BIN
source/images/grafana-bucket.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 474 KiB |
BIN
source/images/grafana-minio.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 469 KiB |
@ -38,7 +38,10 @@ MinIO Pre-requisites
|
||||
- Load balancer to handle routing of requests (for example, `NGINX <https://www.nginx.com/>`__)
|
||||
|
||||
* - :octicon:`circle`
|
||||
- :ref:`Prometheus / Grafana <minio-metrics-collect-using-prometheus>` setup for monitoring and metrics
|
||||
- :ref:`Prometheus <minio-metrics-collect-using-prometheus>` setup for monitoring and metrics
|
||||
|
||||
* - :octicon:`circle`
|
||||
- :ref:`Grafana configured <minio-grafana>` for dashboards
|
||||
|
||||
* - :octicon:`circle`
|
||||
- (optional) :mc:`mc` installed on the local host system
|
||||
|
@ -70,4 +70,6 @@ See :ref:`minio-healthcheck-api` for more information.
|
||||
|
||||
/operations/monitoring/metrics-and-alerts
|
||||
/operations/monitoring/minio-logging
|
||||
/operations/monitoring/healthcheck-probe
|
||||
/operations/monitoring/healthcheck-probe
|
||||
/operations/monitoring/grafana
|
||||
|
@ -15,7 +15,7 @@ Monitoring and Alerting using Prometheus
|
||||
- `Monitoring with MinIO and Prometheus: Overview <https://youtu.be/A3vCDaFWNNs?ref=docs>`__
|
||||
- `Monitoring with MinIO and Prometheus: Lab <https://youtu.be/Oix9iXndSUY?ref=docs>`__
|
||||
|
||||
MinIO publishes cluster and node metrics using the :prometheus-docs:`Prometheus Data Model <concepts/data_model/#data-model>`.
|
||||
MinIO publishes cluster, node, and bucket metrics using the :prometheus-docs:`Prometheus Data Model <concepts/data_model/#data-model>`.
|
||||
The procedure on this page documents the following:
|
||||
|
||||
- Configuring a Prometheus service to scrape and display metrics from a MinIO deployment
|
||||
@ -40,12 +40,40 @@ Configure Prometheus to Collect and Alert using MinIO Metrics
|
||||
|
||||
Use the :mc-cmd:`mc admin prometheus generate` command to generate the scrape configuration for use by Prometheus in making scraping requests:
|
||||
|
||||
.. code-block:: shell
|
||||
:class: copyable
|
||||
.. tab-set::
|
||||
|
||||
mc admin prometheus generate ALIAS
|
||||
.. tab-item:: MinIO Server
|
||||
|
||||
Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the :mc:`alias <mc alias>` of the MinIO deployment.
|
||||
The following command generates the scrape configuration for MinIO cluster metrics.
|
||||
|
||||
.. code-block:: shell
|
||||
:class: copyable
|
||||
|
||||
mc admin prometheus generate ALIAS
|
||||
|
||||
Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the :mc:`alias <mc alias>` of the MinIO deployment.
|
||||
|
||||
.. tab-item:: Nodes
|
||||
|
||||
The following command generates the scrape configuration for node metrics on the MinIO Server.
|
||||
|
||||
.. code-block:: shell
|
||||
:class: copyable
|
||||
|
||||
mc admin prometheus generate ALIAS node
|
||||
|
||||
Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the :mc:`alias <mc alias>` of the MinIO deployment.
|
||||
|
||||
.. tab-item:: Buckets
|
||||
|
||||
The following command generates the scrape configuration for bucket metrics on the MinIO Server.
|
||||
|
||||
.. code-block:: shell
|
||||
:class: copyable
|
||||
|
||||
mc admin prometheus generate ALIAS bucket
|
||||
|
||||
Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the :mc:`alias <mc alias>` of the MinIO deployment.
|
||||
|
||||
The command returns output similar to the following:
|
||||
|
||||
@ -81,21 +109,44 @@ The command returns output similar to the following:
|
||||
2) Restart Prometheus with the Updated Configuration
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Append the ``scrape_configs`` job generated in the previous step to the configuration file:
|
||||
Append the desired ``scrape_configs`` job generated in the previous step to the configuration file:
|
||||
|
||||
.. code-block:: yaml
|
||||
:class: copyable
|
||||
.. tab-set::
|
||||
|
||||
.. tab-item:: Cluster metrics
|
||||
|
||||
For server metrics:
|
||||
|
||||
.. code-block:: yaml
|
||||
:class: copyable
|
||||
|
||||
global:
|
||||
scrape_interval: 15s
|
||||
|
||||
scrape_configs:
|
||||
- job_name: minio-job
|
||||
bearer_token: TOKEN
|
||||
metrics_path: /minio/v2/metrics/cluster
|
||||
scheme: https
|
||||
static_configs:
|
||||
- targets: [minio.example.net]
|
||||
|
||||
.. tab-item:: Bucket metrics
|
||||
|
||||
.. code-block:: yaml
|
||||
:class: copyable
|
||||
|
||||
global:
|
||||
scrape_interval: 15s
|
||||
|
||||
scrape_configs:
|
||||
- job_name: minio-job-bucket
|
||||
bearer_token: TOKEN
|
||||
metrics_path: /minio/v2/metrics/bucket
|
||||
scheme: https
|
||||
static_configs:
|
||||
- targets: [minio.example.net]
|
||||
|
||||
global:
|
||||
scrape_interval: 15s
|
||||
|
||||
scrape_configs:
|
||||
- job_name: minio-job
|
||||
bearer_token: TOKEN
|
||||
metrics_path: /minio/v2/metrics/cluster
|
||||
scheme: https
|
||||
static_configs:
|
||||
- targets: [minio.example.net]
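Prometheus accepts several jobs under a single ``scrape_configs`` key, so the cluster and bucket jobs can share one configuration file. The following is a minimal sketch rather than an official procedure: the output path ``./prometheus.yml`` is an assumption, and ``TOKEN`` and ``minio.example.net`` are the same placeholders used in the generated output.

```shell
#!/bin/sh
# Hypothetical output path; adjust for your Prometheus installation.
CONFIG_FILE="${CONFIG_FILE:-./prometheus.yml}"

# Write one file containing both the cluster and the bucket scrape jobs.
# TOKEN and minio.example.net are placeholders, not working values.
cat > "$CONFIG_FILE" <<'EOF'
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: minio-job
    bearer_token: TOKEN
    metrics_path: /minio/v2/metrics/cluster
    scheme: https
    static_configs:
      - targets: [minio.example.net]

  - job_name: minio-job-bucket
    bearer_token: TOKEN
    metrics_path: /minio/v2/metrics/bucket
    scheme: https
    static_configs:
      - targets: [minio.example.net]
EOF

echo "Wrote $CONFIG_FILE"
```

If ``promtool`` is installed alongside Prometheus, ``promtool check config ./prometheus.yml`` can validate the file before you restart the service.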
|
||||
|
||||
Start the Prometheus cluster using the configuration file:
|
||||
|
||||
@ -122,9 +173,9 @@ The following query examples return metrics collected by Prometheus:
|
||||
|
||||
minio_cluster_capacity_usable_free_bytes{job="minio-job"}[5m]
|
||||
|
||||
See :ref:`minio-metrics-and-alerts-available-metrics` for a complete list of published metrics.
|
||||
See :ref:`minio-metrics-and-alerts` for information about metrics.
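The same kind of query can also be sent to the Prometheus HTTP API instead of the web UI. This is a sketch under assumptions: the Prometheus address ``http://localhost:9090`` and the helper name ``prom_query`` are illustrative, and the job label must match the ``job_name`` in your scrape configuration.

```shell
#!/bin/sh
# Assumed address of the Prometheus server; not taken from this page.
PROMETHEUS_URL="${PROMETHEUS_URL:-http://localhost:9090}"

# Hypothetical helper: run an instant query against the Prometheus HTTP API.
# $1 is a PromQL expression; curl URL-encodes it into the query string.
prom_query() {
  curl --silent --get "$PROMETHEUS_URL/api/v1/query" \
    --data-urlencode "query=$1"
}

# Example expression using the job name from the scrape configuration.
QUERY='minio_cluster_capacity_usable_free_bytes{job="minio-job"}'
```

Calling ``prom_query "$QUERY"`` returns the result as JSON, which you can pipe to a tool of your choice for formatting.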
|
||||
|
||||
4) Configure an Alert Rule using MinIO Metrics
|
||||
1) Configure an Alert Rule using MinIO Metrics
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
You must configure :prometheus-docs:`Alert Rules <prometheus/latest/configuration/alerting_rules/>` on the Prometheus deployment to trigger alerts based on collected MinIO metrics.
|
||||
@ -184,3 +235,9 @@ To enable historical data visualization in MinIO Console, set the following envi
|
||||
- Set :envvar:`MINIO_PROMETHEUS_JOB_ID` to the unique job ID assigned to the collected metrics
|
||||
|
||||
Restart the MinIO deployment and visit the :ref:`Monitoring <minio-console-monitoring>` pane to see the historical data views.
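As an illustration, the environment variables might be set as follows before restarting MinIO. The Prometheus address is a placeholder, and ``minio-job`` assumes the job name used in the scrape configuration examples. ``MINIO_PROMETHEUS_URL`` is the companion setting that points MinIO at the Prometheus service.

```shell
#!/bin/sh
# Assumed Prometheus address reachable from the MinIO hosts (placeholder).
export MINIO_PROMETHEUS_URL="http://prometheus.example.net:9090"
# Must match the job_name in the Prometheus scrape configuration.
export MINIO_PROMETHEUS_JOB_ID="minio-job"
```

Set the same values on every MinIO host, for example in the environment file your service manager reads.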
|
||||
|
||||
Dashboards
|
||||
----------
|
||||
|
||||
MinIO provides Grafana Dashboards to display metrics collected by Prometheus.
|
||||
For more information, see :ref:`minio-grafana`.
|
||||
|
60
source/operations/monitoring/grafana.rst
Normal file
@ -0,0 +1,60 @@
|
||||
.. _minio-grafana:
|
||||
|
||||
===================================
|
||||
Monitor a MinIO Server with Grafana
|
||||
===================================
|
||||
|
||||
.. default-domain:: minio
|
||||
|
||||
.. contents:: Table of Contents
|
||||
:local:
|
||||
:depth: 2
|
||||
|
||||
`Grafana <https://grafana.com/>`__ allows you to query, visualize, alert on and understand your metrics no matter where they are stored.
|
||||
Create, explore, and share dashboards with your team and foster a data-driven culture.
|
||||
|
||||
Prerequisites
|
||||
-------------
|
||||
|
||||
- An existing :prometheus-docs:`Prometheus deployment <prometheus/latest/installation/>` with backing :prometheus-docs:`Alert Manager <alerting/latest/overview/>`
|
||||
- An existing MinIO deployment with network access to the Prometheus deployment
|
||||
- `Grafana installed <https://grafana.com/grafana/download>`__
|
||||
|
||||
MinIO Grafana Dashboard
|
||||
-----------------------
|
||||
|
||||
MinIO provides two official Grafana Dashboards you can download from the Grafana Dashboard portal.
|
||||
|
||||
1. :ref:`MinIO Server metrics <minio-server-grafana-metrics>`
|
||||
2. :ref:`MinIO Bucket metrics <minio-buckets-grafana-metrics>`
|
||||
|
||||
To track changes to the Grafana dashboards, inspect the JSON files for the `server <https://github.com/minio/minio/blob/master/docs/metrics/prometheus/grafana/minio-dashboard.json>`__ or `bucket <https://github.com/minio/minio/blob/master/docs/metrics/prometheus/grafana/minio-bucket.json>`__ dashboards in the MinIO Server GitHub repository.
|
||||
|
||||
.. _minio-server-grafana-metrics:
|
||||
|
||||
MinIO Server Metrics Dashboard
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Visualize MinIO metrics with the official MinIO Grafana dashboard for the MinIO Server available on the `Grafana dashboard portal <https://grafana.com/grafana/dashboards/13502-minio-dashboard/>`__.
|
||||
|
||||
MinIO provides a Grafana Dashboard for MinIO Server metrics.
|
||||
For specifics on the dashboard's configuration, see the `JSON file on GitHub <https://raw.githubusercontent.com/minio/minio/master/docs/metrics/prometheus/grafana/minio-dashboard.json>`__.
|
||||
|
||||
.. image:: /images/grafana-minio.png
|
||||
:width: 600px
|
||||
:alt: A sample of the MinIO Grafana dashboard showing many different captured metrics on a MinIO Server.
|
||||
:align: center
|
||||
|
||||
.. _minio-buckets-grafana-metrics:
|
||||
|
||||
MinIO Bucket Metrics Dashboard
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
Visualize MinIO bucket metrics with the official MinIO Grafana dashboard for buckets available on the `Grafana dashboard portal <https://grafana.com/grafana/dashboards/19237-minio-bucket-dashboard/>`__.
|
||||
|
||||
Bucket metrics can be viewed in the Grafana dashboard using the `bucket JSON file on GitHub <https://raw.githubusercontent.com/minio/minio/master/docs/metrics/prometheus/grafana/minio-bucket.json>`__.
|
||||
|
||||
.. image:: /images/grafana-bucket.png
|
||||
:width: 600px
|
||||
:alt: A sample of the MinIO Grafana dashboard showing many different captured metrics for MinIO buckets.
|
||||
:align: center
|
@ -35,8 +35,8 @@ the server, such as a transient network issue or potential downtime.
|
||||
|
||||
The healthcheck probe alone cannot determine if a MinIO server is offline - only
|
||||
that the current host machine cannot reach the server. Consider configuring
|
||||
a Prometheus :ref:`alert <minio-metrics-and-alerts-alerting>` using the
|
||||
:metric:`minio_cluster_nodes_offline_total` metric to detect whether one or
|
||||
a Prometheus :ref:`alert <minio-metrics-and-alerts>` using the
|
||||
``minio_cluster_nodes_offline_total`` metric to detect whether one or
|
||||
more MinIO nodes are offline.
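A sketch of such an alert rule is shown here as a shell snippet that writes a Prometheus rules file. Everything except the metric name is an assumption for illustration: the file path, the group and alert names, the ``minio-job`` job label, and the ``5m`` and ``10m`` windows should be adapted to your deployment.

```shell
#!/bin/sh
# Hypothetical rules file path; reference it from rule_files in prometheus.yml.
RULES_FILE="${RULES_FILE:-./minio-alerting.yml}"

cat > "$RULES_FILE" <<'EOF'
groups:
  - name: minio-alerts
    rules:
      - alert: NodesOffline
        # True when the 5-minute average of offline nodes is above zero;
        # the alert fires once this has held for 10 minutes.
        expr: avg_over_time(minio_cluster_nodes_offline_total{job="minio-job"}[5m]) > 0
        for: 10m
        labels:
          severity: warn
        annotations:
          summary: "One or more MinIO nodes are offline"
EOF

echo "Wrote $RULES_FILE"
```

Pairing this rule with the healthcheck probe helps separate a client-side reachability problem from a node that the cluster itself reports as offline.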
|
||||
|
||||
Cluster Write Quorum
|
||||
@ -63,13 +63,13 @@ The healthcheck probe alone cannot determine if a MinIO server is offline or
|
||||
processing write operations normally - only whether enough MinIO servers are
|
||||
online to meet write quorum requirements based on the configured
|
||||
:ref:`erasure code parity <minio-ec-parity>`. Consider configuring a Prometheus
|
||||
:ref:`alert <minio-metrics-and-alerts-alerting>` using one of the following
|
||||
:ref:`alert <minio-metrics-and-alerts>` using one of the following
|
||||
metrics to detect potential issues or errors on the MinIO cluster:
|
||||
|
||||
- :metric:`minio_cluster_nodes_offline_total` to alert if one or more
|
||||
- ``minio_cluster_nodes_offline_total`` to alert if one or more
|
||||
MinIO nodes are offline.
|
||||
|
||||
- :metric:`minio_node_disk_free_bytes` to alert if the cluster is running
|
||||
- ``minio_node_disk_free_bytes`` to alert if the cluster is running
|
||||
low on free drive space.
|
||||
|
||||
Cluster Read Quorum
|
||||
@ -96,8 +96,8 @@ The healthcheck probe alone cannot determine if a MinIO server is offline or
|
||||
processing read operations normally - only whether enough MinIO servers are
|
||||
online to meet read quorum requirements based on the configured
|
||||
:ref:`erasure code parity <minio-ec-parity>`. Consider configuring a Prometheus
|
||||
:ref:`alert <minio-metrics-and-alerts-alerting>` using the
|
||||
:metric:`minio_cluster_nodes_offline_total` metric to detect whether one or more
|
||||
:ref:`alert <minio-metrics-and-alerts>` using the
|
||||
``minio_cluster_nodes_offline_total`` metric to detect whether one or more
|
||||
MinIO nodes are offline.
|
||||
|
||||
Cluster Maintenance Check
|
||||
@ -125,6 +125,5 @@ The healthcheck probe alone cannot determine if a MinIO server is offline - only
|
||||
whether enough MinIO servers will be online after taking the node down for
|
||||
maintenance to meet read and write quorum requirements based on the configured
|
||||
:ref:`erasure code parity <minio-ec-parity>`. Consider configuring a Prometheus
|
||||
:ref:`alert <minio-metrics-and-alerts-alerting>` using the
|
||||
:metric:`minio_cluster_nodes_offline_total` metric to detect whether one or more
|
||||
:ref:`alert <minio-metrics-and-alerts>` using the ``minio_cluster_nodes_offline_total`` metric to detect whether one or more
|
||||
MinIO nodes are offline.
|
||||
|
File diff suppressed because it is too large
@ -94,7 +94,7 @@ Configure InfluxDB to Collect and Alert using MinIO Metrics
|
||||
|
||||
Use the :influxdb-docs:`DataExplorer <query-data/execute-queries/data-explorer/>` to visualize the collected MinIO data.
|
||||
|
||||
For example, you can set a filter on :metric:`minio_cluster_capacity_usable_total_bytes` and :metric:`minio_cluster_capacity_usable_free_bytes` to compare the total usable against total free space on the MinIO deployment.
|
||||
For example, you can set a filter on ``minio_cluster_capacity_usable_total_bytes`` and ``minio_cluster_capacity_usable_free_bytes`` to compare the total usable against total free space on the MinIO deployment.
|
||||
|
||||
#. Configure a Check
|
||||
|
||||
@ -105,13 +105,13 @@ Configure InfluxDB to Collect and Alert using MinIO Metrics
|
||||
|
||||
- Create a :guilabel:`Threshold Check` named ``MINIO_NODE_DOWN``.
|
||||
|
||||
Set the filter for the :metric:`minio_cluster_nodes_offline_total` key.
|
||||
Set the filter for the ``minio_cluster_nodes_offline_total`` key.
|
||||
|
||||
Set the :guilabel:`Thresholds` to :guilabel:`WARN` when the value is greater than :guilabel:`1`
|
||||
|
||||
- Create a :guilabel:`Threshold Check` named ``MINIO_QUORUM_WARNING``.
|
||||
|
||||
Set the filter for the :metric:`minio_cluster_disk_offline_total` key.
|
||||
Set the filter for the ``minio_cluster_disk_offline_total`` key.
|
||||
|
||||
Set the :guilabel:`Thresholds` to :guilabel:`CRITICAL` when the value is one less than your configured :ref:`Erasure Code Parity <minio-erasure-coding>` setting.
|
||||
|
||||
|
@ -43,7 +43,7 @@ Syntax
|
||||
.. code-block:: shell
|
||||
:class: copyable
|
||||
|
||||
mc admin prometheus generate TARGET
|
||||
mc admin prometheus generate TARGET TYPE
|
||||
|
||||
The command accepts the following arguments:
|
||||
|
||||
@ -52,3 +52,11 @@ Syntax
|
||||
The :mc:`alias <mc alias>` of a configured MinIO deployment for which
|
||||
the command generates a Prometheus-compatible configuration file.
|
||||
|
||||
.. mc-cmd:: TYPE
|
||||
|
||||
The type of metrics to scrape.
|
||||
|
||||
Valid values are ``cluster``, ``node``, or ``bucket``.
|
||||
|
||||
If not specified, the command returns cluster metrics.
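To review the scrape configuration for every metric type in one pass, the valid ``TYPE`` values can be looped over. This is a convenience sketch: the function name ``generate_all_scrape_configs`` and the alias ``myminio`` are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the scrape configuration for each metric type.
# $1 is the alias of a configured MinIO deployment.
generate_all_scrape_configs() {
  for metric_type in cluster node bucket; do
    echo "# --- $metric_type ---"
    mc admin prometheus generate "$1" "$metric_type"
  done
}
```

For example, ``generate_all_scrape_configs myminio`` prints three configurations, one per metric type, each preceded by a comment line naming the type.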
|
||||
|
||||
|
@ -601,7 +601,7 @@ logging. See :ref:`minio-metrics-and-alerts` for more information.
|
||||
.. envvar:: MINIO_PROMETHEUS_AUTH_TYPE
|
||||
|
||||
Specifies the authentication mode for the Prometheus
|
||||
:ref:`scraping endpoints <minio-metrics-and-alerts-endpoints>`.
|
||||
:ref:`scraping endpoints <minio-metrics-and-alerts>`.
|
||||
|
||||
- ``jwt`` - *Default* MinIO requires that the scraping client specify a JWT
|
||||
token for authenticating requests. Use
|
||||
|