mirror of
https://github.com/minio/docs.git
synced 2025-07-28 19:42:10 +03:00
Major overhaul for Monitoring docs: Part 1
committed by
Ravind Kumar
parent
1735b77d8f
commit
4e4cc97f45
@ -283,6 +283,8 @@ Some subsections may not be visible if the authenticated user does not have the
|
|||||||
|
|
||||||
Use the :guilabel:`Users` and :guilabel:`Groups` views to assign a created policy to users and groups, respectively.
|
Use the :guilabel:`Users` and :guilabel:`Groups` views to assign a created policy to users and groups, respectively.
|
||||||
|
|
||||||
|
.. _minio-console-monitoring:
|
||||||
|
|
||||||
Monitoring
|
Monitoring
|
||||||
----------
|
----------
|
||||||
|
|
||||||
@ -295,25 +297,23 @@ Some subsections may not be visible if the authenticated user does not have the
|
|||||||
|
|
||||||
.. tab-item:: Metrics
|
.. tab-item:: Metrics
|
||||||
|
|
||||||
.. image:: /images/minio-console/console-metrics.png
|
.. image:: /images/minio-console/console-metrics-simple.png
|
||||||
:width: 600px
|
:width: 600px
|
||||||
:alt: MinIO Console Metrics displaying detailed data using Prometheus
|
:alt: MinIO Console Metrics displaying point-in-time data
|
||||||
:align: center
|
:align: center
|
||||||
|
|
||||||
The Console :guilabel:`Dashboard` section displays metrics for the MinIO deployment.
|
The Console :guilabel:`Dashboard` section displays metrics for the MinIO deployment.
|
||||||
|
The default view provides a high-level overview of the deployment status, including the uptime and availability of individual servers and drives.
|
||||||
The Console depends on a :ref:`configured Prometheus service <minio-metrics-collect-using-prometheus>` to generate the detailed metrics shown above.
|
|
||||||
|
|
||||||
The default metrics view provides a high-level overview of the deployment status, including the uptime and availability of individual servers and drives.
|
The Console also supports displaying time-series and historical data by querying a :prometheus-docs:`Prometheus <prometheus/latest/getting_started/>` service configured to scrape data from the MinIO deployment.
|
||||||
|
Specifically, the MinIO Console uses the :prometheus-docs:`Prometheus query API <prometheus/latest/querying/api/>` to retrieve stored metrics data and display historical metrics:
|
||||||
|
|
||||||
.. image:: /images/minio-console/console-metrics-simple.png
|
.. image:: /images/minio-console/console-metrics.png
|
||||||
:width: 600px
|
:width: 600px
|
||||||
:alt: MinIO Console Metrics displaying simplified data
|
:alt: MinIO Console Metrics displaying historical data using Prometheus
|
||||||
:align: center
|
:align: center
|
||||||
|
|
||||||
This view requires configuring a Prometheus service to scrape the deployment metrics.
|
See :ref:`minio-console-metrics` for more information on the historical metric visualization.
|
||||||
You can download these metrics as a ``.png`` image or ``.csv`` file.
|
|
||||||
See :ref:`minio-metrics-collect-using-prometheus` for complete instructions.
|
|
||||||
|
|
||||||
.. tab-item:: Logs
|
.. tab-item:: Logs
|
||||||
|
|
||||||
|
@ -79,6 +79,7 @@ extlinks = {
|
|||||||
'podman-git' : ('https://github.com/containers/podman/%s',''),
|
'podman-git' : ('https://github.com/containers/podman/%s',''),
|
||||||
'docker-docs' : ('https://docs.docker.com/%s', ''),
|
'docker-docs' : ('https://docs.docker.com/%s', ''),
|
||||||
'openshift-docs' : ('https://docs.openshift.com/container-platform/4.11/%s', ''),
|
'openshift-docs' : ('https://docs.openshift.com/container-platform/4.11/%s', ''),
|
||||||
|
'influxdb-docs' : ('https://docs.influxdata.com/influxdb/v2.4/%s',''),
|
||||||
|
|
||||||
}
|
}
|
||||||
|
|
||||||
|
Binary file not shown.
Before Width: | Height: | Size: 225 KiB After Width: | Height: | Size: 126 KiB |
Binary file not shown.
Before Width: | Height: | Size: 197 KiB After Width: | Height: | Size: 196 KiB |
@ -1,5 +1,5 @@
|
|||||||
=====================
|
=====================
|
||||||
Prometheus Monitoring
|
Monitoring and Alerts
|
||||||
=====================
|
=====================
|
||||||
|
|
||||||
.. default-domain:: minio
|
.. default-domain:: minio
|
||||||
@ -12,22 +12,27 @@ Metrics and Alerts
|
|||||||
------------------
|
------------------
|
||||||
|
|
||||||
MinIO provides point-in-time metrics on cluster status and operations.
|
MinIO provides point-in-time metrics on cluster status and operations.
|
||||||
MinIO publishes collected metrics data using Prometheus-compatible data structures.
|
The :ref:`MinIO Console <minio-console-metrics>` provides a graphical display of these metrics.
|
||||||
|
|
||||||
For alerts, time-series metric data, or additional metrics, MinIO can leverage `Prometheus <https://prometheus.io/>`__.
|
For historical metrics and analytics, MinIO publishes cluster and node metrics using the :prometheus-docs:`Prometheus Data Model <data_model/>`.
|
||||||
Prometheus is an Open Source systems and service monitoring system which supports analyzing and alerting based on collected metrics.
|
You can use any scraping tool that supports this data model to pull metrics data from MinIO for further analysis and alerting.
|
||||||
The Prometheus ecosystem includes multiple :prometheus-docs:`integrations <operating/integrations/>`, allowing wide latitude in processing and storing collected metrics.
|
|
||||||
|
|
||||||
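As a rough illustration of what a scraping tool receives from a Prometheus-compatible endpoint, the following Python sketch parses a few lines of the text exposition format. The ``SAMPLE`` payload and the ``server`` label value are made up for illustration, and the parser is deliberately simplified: it ignores timestamps and escaped characters in label values.

```python
import re

# Simplified patterns for one sample line: name, optional {labels}, value.
_SAMPLE_LINE = re.compile(r'^([A-Za-z_:][A-Za-z0-9_:]*)(?:\{(.*)\})?\s+(\S+)')
_LABEL = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)="([^"]*)"')

# Hypothetical payload; real output from a deployment will differ.
SAMPLE = """\
# HELP minio_cluster_nodes_offline_total Total number of MinIO nodes offline.
# TYPE minio_cluster_nodes_offline_total gauge
minio_cluster_nodes_offline_total{server="minio1.example.net:9000"} 0
minio_cluster_capacity_usable_free_bytes 1.5e+12
"""


def parse_metrics(text):
    """Return a list of (name, labels, value) tuples from exposition text."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and HELP/TYPE comments.
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_LINE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


for name, labels, value in parse_metrics(SAMPLE):
    print(name, labels, value)
```

A production scraper should rely on an existing Prometheus client or parser library rather than hand-written patterns like these.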
- MinIO publishes Prometheus-compatible scraping endpoints for cluster and node-level metrics.
|
The following table lists tutorials for integrating MinIO metrics with select third-party monitoring software.
|
||||||
Any Prometheus-compatible scraping software can ingest and process MinIO metrics for analysis, visualization, and alerting.
|
|
||||||
See :ref:`minio-metrics-and-alerts-endpoints` for more information.
|
|
||||||
|
|
||||||
- For alerts, use Prometheus :prometheus-docs:`Alerting Rules <prometheus/latest/configuration/alerting_rules/>` and the
|
.. list-table::
|
||||||
:prometheus-docs:`Alert Manager <alerting/latest/overview/>` to trigger alerts based on collected metrics.
|
:stub-columns: 1
|
||||||
See :ref:`minio-metrics-and-alerts-alerting` for more information.
|
:widths: 30 70
|
||||||
|
:width: 100%
|
||||||
|
|
||||||
When configured, the :ref:`MinIO Console <minio-console-metrics>` shows some metrics in the :guilabel:`Monitoring > Metrics` page.
|
* - :ref:`minio-metrics-collect-using-prometheus`
|
||||||
You can download these metrics as either ``.png`` images or ``.csv`` files.
|
- Configure Prometheus to Monitor and Alert for a MinIO deployment
|
||||||
|
|
||||||
|
Configure MinIO to query the Prometheus deployment to enable historical metrics via the MinIO Console
|
||||||
|
|
||||||
|
* - :ref:`minio-metrics-influxdb`
|
||||||
|
- Configure InfluxDB to Monitor and Alert for a MinIO deployment.
|
||||||
|
|
||||||
|
Other metrics and analytics software suites that support the Prometheus data model may also work, even if they do not appear in the list above.
|
||||||
|
|
||||||
Logging
|
Logging
|
||||||
-------
|
-------
|
||||||
@ -58,6 +63,6 @@ See :ref:`minio-healthcheck-api` for more information.
|
|||||||
:titlesonly:
|
:titlesonly:
|
||||||
:hidden:
|
:hidden:
|
||||||
|
|
||||||
/operations/monitoring/collect-minio-metrics-using-prometheus
|
/operations/monitoring/metrics-and-alerts
|
||||||
/operations/monitoring/minio-logging
|
/operations/monitoring/minio-logging
|
||||||
/operations/monitoring/healthcheck-probe
|
/operations/monitoring/healthcheck-probe
|
@ -1,9 +1,8 @@
|
|||||||
.. _minio-metrics-collect-using-prometheus:
|
.. _minio-metrics-collect-using-prometheus:
|
||||||
.. _minio-metrics-and-alerts:
|
|
||||||
|
|
||||||
======================================
|
========================================
|
||||||
Collect MinIO Metrics Using Prometheus
|
Monitoring and Alerting using Prometheus
|
||||||
======================================
|
========================================
|
||||||
|
|
||||||
.. default-domain:: minio
|
.. default-domain:: minio
|
||||||
|
|
||||||
@ -11,60 +10,46 @@ Collect MinIO Metrics Using Prometheus
|
|||||||
:local:
|
:local:
|
||||||
:depth: 1
|
:depth: 1
|
||||||
|
|
||||||
MinIO leverages `Prometheus <https://prometheus.io/>`__ for metrics and alerts.
|
MinIO publishes cluster and node metrics using the :prometheus-docs:`Prometheus Data Model <data_model/>`.
|
||||||
MinIO publishes Prometheus-compatible scraping endpoints for cluster and
|
The procedure on this page documents the following:
|
||||||
node-level metrics. See :ref:`minio-metrics-and-alerts-endpoints` for more
|
|
||||||
information.
|
|
||||||
|
|
||||||
The procedure on this page documents scraping the MinIO metrics
|
- Configuring a Prometheus service to scrape and display metrics from a MinIO deployment
|
||||||
endpoints using a Prometheus instance, including deploying and configuring
|
- Configuring an Alert Rule on a MinIO Metric to trigger an AlertManager action
|
||||||
a simple Prometheus server for collecting metrics.
|
|
||||||
|
|
||||||
This procedure is not a replacement for the official
|
.. admonition:: Prerequisites
|
||||||
:prometheus-docs:`Prometheus Documentation <>`. Any specific guidance
|
:class: note
|
||||||
related to configuring, deploying, and using Prometheus is made on a best-effort
|
|
||||||
basis.
|
|
||||||
|
|
||||||
Requirements
|
This procedure requires the following:
|
||||||
------------
|
|
||||||
|
|
||||||
Install and Configure ``mc`` with Access to the MinIO Cluster
|
- An existing Prometheus deployment with a backing :prometheus-docs:`Alert Manager <alerting/latest/overview/>`
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
This procedure uses :mc:`mc` for performing operations on the MinIO
|
- An existing MinIO deployment with network access to the Prometheus deployment
|
||||||
deployment. Install ``mc`` on a machine with network access to the
|
|
||||||
deployment. See the ``mc`` :ref:`Installation Quickstart <mc-install>` for
|
|
||||||
more complete instructions.
|
|
||||||
|
|
||||||
Prometheus Service
|
- An :mc:`mc` installation on your local host configured to :ref:`access <alias>` the MinIO deployment
|
||||||
~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
This procedure provides instruction for deploying Prometheus for rapid local
|
.. cond:: k8s
|
||||||
evaluation and development. All other environments should have an existing
|
|
||||||
Prometheus or Prometheus-compatible service with access to the MinIO cluster.
|
|
||||||
|
|
||||||
Procedure
|
The MinIO Operator supports deploying a :ref:`per-tenant Prometheus instance <create-tenant-configure-section>` configured to support metrics and visualizations.
|
||||||
---------
|
This includes automatically configuring the Tenant to enable the :ref:`Tenant Console historical metric view <minio-console-metrics>`.
|
||||||
|
|
||||||
1) Generate the Bearer Token
|
You can still use this procedure to configure an external Prometheus service for supporting monitoring and alerting for a MinIO Tenant.
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
You must configure all necessary network control components, such as Ingress or a Load Balancer, to facilitate access between the Tenant and the Prometheus service.
|
||||||
|
This procedure assumes your local host machine can access the Tenant via :mc:`mc`.
|
||||||
|
|
||||||
MinIO by default requires authentication for requests made to the metrics
|
Configure Prometheus to Collect and Alert using MinIO Metrics
|
||||||
endpoints. While this step is not required for MinIO deployments started with
|
-------------------------------------------------------------
|
||||||
:envvar:`MINIO_PROMETHEUS_AUTH_TYPE` set to ``"public"``, you can still use the
|
|
||||||
command output for retrieving a Prometheus ``scrape_configs`` entry.
|
|
||||||
|
|
||||||
Use the :mc-cmd:`mc admin prometheus generate` command to generate a
|
1) Generate the Scrape Configuration
|
||||||
JWT bearer token for use by Prometheus in making authenticated scraping
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
requests:
|
|
||||||
|
Use the :mc-cmd:`mc admin prometheus generate` command to generate the scrape configuration for use by Prometheus in making scraping requests:
|
||||||
|
|
||||||
.. code-block:: shell
|
.. code-block:: shell
|
||||||
:class: copyable
|
:class: copyable
|
||||||
|
|
||||||
mc admin prometheus generate ALIAS
|
mc admin prometheus generate ALIAS
|
||||||
|
|
||||||
Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the
|
Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the :mc:`alias <mc alias>` of the MinIO deployment.
|
||||||
:mc:`alias <mc alias>` of the MinIO deployment.
|
|
||||||
|
|
||||||
The command returns output similar to the following:
|
The command returns output similar to the following:
|
||||||
|
|
||||||
@ -72,31 +57,29 @@ The command returns output similar to the following:
|
|||||||
:class: copyable
|
:class: copyable
|
||||||
|
|
||||||
scrape_configs:
|
scrape_configs:
|
||||||
- job_name: minio-job
|
- job_name: minio-job
|
||||||
bearer_token: TOKEN
|
bearer_token: TOKEN
|
||||||
metrics_path: /minio/v2/metrics/cluster
|
metrics_path: /minio/v2/metrics/cluster
|
||||||
scheme: https
|
scheme: https
|
||||||
static_configs:
|
static_configs:
|
||||||
- targets: [minio.example.net]
|
- targets: [minio.example.net]
|
||||||
|
|
||||||
The ``targets`` array can contain the hostname for any node in the deployment.
|
- Set the ``job_name`` to a value associated with the MinIO deployment.
|
||||||
For clusters with a load balancer managing connections between MinIO nodes,
|
|
||||||
specify the address of the load balancer.
|
|
||||||
|
|
||||||
Specify the output block to the
|
Use a unique value to ensure isolation of the deployment metrics from any others collected by that Prometheus service.
|
||||||
:prometheus-docs:`scrape_config
|
|
||||||
<prometheus/latest/configuration/configuration/#scrape_config>` section of
|
|
||||||
the Prometheus configuration.
|
|
||||||
|
|
||||||
2) Configure and Run Prometheus
|
- MinIO deployments started with :envvar:`MINIO_PROMETHEUS_AUTH_TYPE` set to ``"public"`` can omit the ``bearer_token`` field.
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
Follow the Prometheus :prometheus-docs:`Getting Started
|
- Set the ``scheme`` to ``http`` for MinIO deployments not using TLS.
|
||||||
<prometheus/latest/getting_started/#downloading-and-running-prometheus>` guide
|
|
||||||
to download and run Prometheus locally.
|
|
||||||
|
|
||||||
Append the ``scrape_configs`` job generated in the previous step to the
|
- Set the ``targets`` array to a hostname that resolves to the MinIO deployment.
|
||||||
configuration file:
|
|
||||||
|
This can be any single node, or a load balancer/proxy which handles connections to the MinIO nodes.
|
||||||
|
|
||||||
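The adjustments described above can be sketched in Python as a small helper that assembles the scrape job. The function name ``build_scrape_job`` and its parameters are hypothetical and not part of ``mc`` or Prometheus. The sketch prints JSON, which most YAML parsers accept, to avoid depending on a third-party YAML library.

```python
import json


def build_scrape_job(job_name, target, bearer_token=None, use_tls=True):
    """Assemble a scrape_configs entry mirroring the generated output.

    Hypothetical helper for illustration only.
    """
    job = {
        "job_name": job_name,
        "metrics_path": "/minio/v2/metrics/cluster",
        # Use http only for deployments that do not use TLS.
        "scheme": "https" if use_tls else "http",
        "static_configs": [{"targets": [target]}],
    }
    # Deployments with MINIO_PROMETHEUS_AUTH_TYPE set to "public"
    # can omit the bearer token.
    if bearer_token is not None:
        job["bearer_token"] = bearer_token
    return {"scrape_configs": [job]}


config = build_scrape_job("minio-job", "minio.example.net", bearer_token="TOKEN")
print(json.dumps(config, indent=2))
```

Calling ``build_scrape_job("minio-job", "minio.example.net:9000", use_tls=False)`` would produce the equivalent entry for a deployment with public metrics and no TLS.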
|
2) Restart Prometheus with the Updated Configuration
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
|
Append the ``scrape_configs`` job generated in the previous step to the configuration file:
|
||||||
|
|
||||||
.. code-block:: yaml
|
.. code-block:: yaml
|
||||||
:class: copyable
|
:class: copyable
|
||||||
@ -122,10 +105,8 @@ Start the Prometheus cluster using the configuration file:
|
|||||||
3) Analyze Collected Metrics
|
3) Analyze Collected Metrics
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
Prometheus includes a
|
Prometheus includes an :prometheus-docs:`expression browser <prometheus/latest/getting_started/#using-the-expression-browser>`.
|
||||||
:prometheus-docs:`expression browser
|
You can execute queries here to analyze the collected metrics.
|
||||||
<prometheus/latest/getting_started/#using-the-expression-browser>`. You can
|
|
||||||
execute queries here to analyze the collected metrics.
|
|
||||||
|
|
||||||
The following query examples return metrics collected by Prometheus:
|
The following query examples return metrics collected by Prometheus:
|
||||||
|
|
||||||
@ -139,386 +120,65 @@ The following query examples return metrics collected by Prometheus:
|
|||||||
|
|
||||||
minio_cluster_capacity_usable_free_bytes{job="minio-job"}[5m]
|
minio_cluster_capacity_usable_free_bytes{job="minio-job"}[5m]
|
||||||
|
|
||||||
See :ref:`minio-metrics-and-alerts-available-metrics` for a complete
|
See :ref:`minio-metrics-and-alerts-available-metrics` for a complete list of published metrics.
|
||||||
list of published metrics.
|
|
||||||
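The same queries can also be issued programmatically through the Prometheus HTTP query API at ``/api/v1/query``. The following Python sketch assumes a Prometheus service reachable at ``https://prometheus.example.net`` without authentication; the helper names are illustrative only.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url, promql):
    """Build an instant-query URL for the Prometheus HTTP API."""
    query = urllib.parse.urlencode({"query": promql})
    return f"{base_url.rstrip('/')}/api/v1/query?{query}"


def instant_query(base_url, promql, timeout=10):
    """Run an instant query and return the list of result series."""
    url = build_query_url(base_url, promql)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        body = json.load(response)
    if body.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {body}")
    return body["data"]["result"]


# Example call (requires network access to the assumed Prometheus service):
# instant_query("https://prometheus.example.net",
#               'minio_cluster_capacity_usable_free_bytes{job="minio-job"}')
```

Adjust the ``job`` label in the query to match the ``job_name`` of your scrape configuration.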
|
|
||||||
.. _minio-console-metrics:
|
4) Configure an Alert Rule using MinIO Metrics
|
||||||
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
4) Visualize Collected Metrics
|
You must configure :prometheus-docs:`Alert Rules <prometheus/latest/configuration/alerting_rules/>` on the Prometheus deployment to trigger alerts based on collected MinIO metrics.
|
||||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
The :minio-git:`MinIO Console <console>` supports visualizing collected metrics from Prometheus.
|
The following example alert rule files provide a baseline of alerts for a MinIO deployment.
|
||||||
Specify the URL of the Prometheus service to the :envvar:`MINIO_PROMETHEUS_URL` environment variable to each MinIO server in the deployment:
|
You can modify or otherwise use these examples as guidance in building your own alerts.
|
||||||
|
|
||||||
.. code-block:: shell
|
|
||||||
:class: copyable
|
|
||||||
|
|
||||||
export MINIO_PROMETHEUS_URL="https://prometheus.example.net"
|
|
||||||
|
|
||||||
If you set a custom ``job_name`` for the Prometheus scraping job, you must also set :envvar:`MINIO_PROMETHEUS_JOB_ID` to match that job name.
|
|
||||||
|
|
||||||
Restart the deployment using :mc-cmd:`mc admin service restart` to apply the changes.
|
|
||||||
|
|
||||||
The MinIO Console uses the metrics collected by the ``minio-job`` scraping job to populate the Dashboard metrics available from :guilabel:`Monitoring > Metrics`.
|
|
||||||
You can download the metrics from the MinIO Console as either a ``.png`` image or a ``.csv`` file.
|
|
||||||
|
|
||||||
.. image:: /images/minio-console/console-metrics.png
|
|
||||||
:width: 600px
|
|
||||||
:alt: MinIO Console Dashboard displaying Monitoring Data
|
|
||||||
:align: center
|
|
||||||
|
|
||||||
MinIO also publishes a `Grafana Dashboard <https://grafana.com/grafana/dashboards/13502>`_ for visualizing collected metrics.
|
|
||||||
For more complete documentation on configuring a Prometheus data source for Grafana, see :prometheus-docs:`Grafana Support for Prometheus <visualization/grafana/>`.
|
|
||||||
|
|
||||||
Prometheus includes a :prometheus-docs:`graphing interface <prometheus/latest/getting_started/#using-the-graphing-interface>` for visualizing collected metrics.
|
|
||||||
|
|
||||||
.. _minio-metrics-and-alerts-endpoints:
|
|
||||||
|
|
||||||
Metrics
|
|
||||||
-------
|
|
||||||
|
|
||||||
MinIO provides a scraping endpoint for cluster-level metrics:
|
|
||||||
|
|
||||||
.. code-block:: shell
|
|
||||||
:class: copyable
|
|
||||||
|
|
||||||
http://minio.example.net:9000/minio/v2/metrics/cluster
|
|
||||||
|
|
||||||
Replace ``http://minio.example.net`` with the hostname of any node in the MinIO
|
|
||||||
deployment. For deployments with a load balancer managing connections between
|
|
||||||
MinIO nodes, specify the address of the load balancer.
|
|
||||||
|
|
||||||
Create a new :prometheus-docs:`scraping configuration
|
|
||||||
<prometheus/latest/configuration/configuration/#scrape_config>` to begin
|
|
||||||
collecting metrics from the MinIO deployment. See
|
|
||||||
:ref:`minio-metrics-collect-using-prometheus` for a complete tutorial.
|
|
||||||
|
|
||||||
The following example describes a ``scrape_configs`` entry for collecting
|
|
||||||
cluster metrics.
|
|
||||||
|
|
||||||
.. code-block:: yaml
|
.. code-block:: yaml
|
||||||
:class: copyable
|
:class: copyable
|
||||||
|
|
||||||
scrape_configs:
|
groups:
|
||||||
- job_name: minio-job
|
- name: minio-alerts
|
||||||
bearer_token: <secret>
|
rules:
|
||||||
metrics_path: /minio/v2/metrics/cluster
|
- alert: NodesOffline
|
||||||
scheme: https
|
expr: avg_over_time(minio_cluster_nodes_offline_total{job="minio-job"}[5m]) > 0
|
||||||
static_configs:
|
for: 10m
|
||||||
- targets: ['minio.example.net:9000']
|
labels:
|
||||||
|
severity: warn
|
||||||
|
annotations:
|
||||||
|
summary: "Node down in MinIO deployment"
|
||||||
|
description: "Node(s) in cluster {{ $labels.instance }} offline for more than 5 minutes"
|
||||||
|
|
||||||
.. list-table::
|
- alert: DisksOffline
|
||||||
:stub-columns: 1
|
expr: avg_over_time(minio_cluster_disk_offline_total{job="minio-job"}[5m]) > 0
|
||||||
:widths: 20 80
|
for: 10m
|
||||||
:width: 100%
|
labels:
|
||||||
|
severity: warn
|
||||||
|
annotations:
|
||||||
|
summary: "Disks down in MinIO deployment"
|
||||||
|
description: "Disk(s) in cluster {{ $labels.instance }} offline for more than 5 minutes"
|
||||||
|
|
||||||
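To make the interaction between the rule expression and the ``for`` clause more concrete, the following Python sketch models, in simplified form, how an alert moves from pending to firing. It assumes one boolean result per rule evaluation and a fixed evaluation interval; the function name is hypothetical and this is not how Prometheus is implemented internally.

```python
def alert_states(condition_per_eval, eval_interval_s, for_s):
    """Return the alert state after each rule evaluation.

    condition_per_eval: one boolean per evaluation (True if the expr matched).
    eval_interval_s: seconds between evaluations (assumed constant).
    for_s: the rule's ``for`` duration in seconds.
    """
    states = []
    active_since = None
    for index, condition in enumerate(condition_per_eval):
        now = index * eval_interval_s
        if not condition:
            # The expression stopped matching, so the alert resets.
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        # The alert fires only once the condition has held for the full duration.
        held_for = now - active_since
        states.append("firing" if held_for >= for_s else "pending")
    return states


# Evaluate every 5 minutes with ``for: 10m`` (600 seconds).
timeline = alert_states([False, True, True, True, False], 300, 600)
print(timeline)
```

Under this model, an alert with ``for: 10m`` fires only after the expression has matched continuously for ten minutes, and any non-matching evaluation in between resets the timer.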
* - ``job_name``
|
Add the path to the alert file to the Prometheus configuration under the ``rule_files`` key:
|
||||||
- The name of the scraping job.
|
|
||||||
|
|
||||||
* - ``bearer_token``
|
.. code-block:: yaml
|
||||||
- The JWT token generated by :mc-cmd:`mc admin prometheus generate`.
|
|
||||||
|
|
||||||
Omit this field if the MinIO deployment was started with
|
global:
|
||||||
:envvar:`MINIO_PROMETHEUS_AUTH_TYPE` set to ``public``.
|
scrape_interval: 5s
|
||||||
|
|
||||||
* - ``targets``
|
rule_files:
|
||||||
- The endpoint for the MinIO deployment. You can specify any node in the
|
- minio-alerting.yml
|
||||||
deployment for collecting cluster metrics. For clusters with a load
|
|
||||||
balancer managing connections between MinIO nodes, specify the
|
|
||||||
address of the load balancer.
|
|
||||||
|
|
||||||
MinIO by default requires authentication for scraping the metrics endpoints.
|
Once triggered, Prometheus sends the alert to the configured AlertManager service.
|
||||||
Use the :mc-cmd:`mc admin prometheus generate` command to generate the
|
|
||||||
necessary bearer tokens for use with configuring the
|
|
||||||
``scrape_configs.bearer_token`` field. You can alternatively disable
|
|
||||||
metrics endpoint authentication by setting
|
|
||||||
:envvar:`MINIO_PROMETHEUS_AUTH_TYPE` to ``public``.
|
|
||||||
|
|
||||||
Visualizing Metrics
|
5) (Optional) Configure MinIO Console to Query Prometheus
|
||||||
~~~~~~~~~~~~~~~~~~~
|
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||||
|
|
||||||
The MinIO Console uses the metrics collected by Prometheus to populate the
|
The Console also supports displaying time-series and historical data by querying a :prometheus-docs:`Prometheus <prometheus/latest/getting_started/>` service configured to scrape data from the MinIO deployment.
|
||||||
Dashboard metrics:
|
|
||||||
|
|
||||||
.. image:: /images/minio-console/console-metrics.png
|
.. image:: /images/minio-console/console-metrics.png
|
||||||
:width: 600px
|
:width: 600px
|
||||||
:alt: MinIO Console displaying Prometheus-backed Monitoring Data
|
:alt: MinIO Console displaying Prometheus-backed Monitoring Data
|
||||||
:align: center
|
:align: center
|
||||||
|
|
||||||
Set the :envvar:`MINIO_PROMETHEUS_URL` environment variable to the URL of the
|
To enable historical data visualization in MinIO Console, set the following environment variables on each node in the MinIO deployment:
|
||||||
Prometheus service to allow the Console to retrieve and display collected
|
|
||||||
metrics. See :ref:`minio-metrics-collect-using-prometheus` for a complete
|
|
||||||
example.
|
|
||||||
|
|
||||||
MinIO also publishes a `Grafana Dashboard
|
- Set :envvar:`MINIO_PROMETHEUS_URL` to the URL of the Prometheus service
|
||||||
<https://grafana.com/grafana/dashboards/13502>`_ for visualizing collected
|
- Set :envvar:`MINIO_PROMETHEUS_JOB_ID` to the unique job ID assigned to the collected metrics
|
||||||
metrics. For more complete documentation on configuring a Prometheus data source
|
|
||||||
for Grafana, see :prometheus-docs:`Grafana Support for Prometheus
|
|
||||||
<visualization/grafana/>`.
|
|
||||||
|
|
||||||
.. _minio-metrics-and-alerts-available-metrics:
|
Restart the MinIO deployment and visit the :ref:`Monitoring <minio-console-monitoring>` pane to see the historical data views.
|
||||||
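Before restarting, a quick consistency check of the two environment variables can catch common mistakes. The following Python sketch is a hypothetical pre-flight helper, not part of MinIO. It assumes that ``MINIO_PROMETHEUS_JOB_ID`` falls back to ``minio-job`` when unset; confirm that default against your MinIO version.

```python
from urllib.parse import urlparse

# Assumed default job ID when MINIO_PROMETHEUS_JOB_ID is not set.
DEFAULT_JOB_ID = "minio-job"


def check_console_metrics_env(env, scrape_job_name):
    """Return a list of problems found in the Console metrics settings.

    env: a mapping such as os.environ.
    scrape_job_name: the job_name used in the Prometheus scrape configuration.
    """
    problems = []
    parsed = urlparse(env.get("MINIO_PROMETHEUS_URL", ""))
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        problems.append("MINIO_PROMETHEUS_URL is missing or is not an http(s) URL")
    job_id = env.get("MINIO_PROMETHEUS_JOB_ID", DEFAULT_JOB_ID)
    if job_id != scrape_job_name:
        problems.append(
            f"MINIO_PROMETHEUS_JOB_ID is {job_id!r} "
            f"but the scrape job is named {scrape_job_name!r}"
        )
    return problems


example_env = {"MINIO_PROMETHEUS_URL": "https://prometheus.example.net"}
print(check_console_metrics_env(example_env, "minio-job"))
```

On a MinIO node, pass ``os.environ`` as the first argument to check the variables visible to the current shell.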
|
|
||||||
Available Metrics
|
|
||||||
~~~~~~~~~~~~~~~~~
|
|
||||||
|
|
||||||
MinIO publishes the following metrics, where each metric includes a label for
|
|
||||||
the MinIO server which generated that metric.
|
|
||||||
|
|
||||||
Object Metrics
|
|
||||||
++++++++++++++
|
|
||||||
|
|
||||||
.. metric:: minio_bucket_objects_size_distribution
|
|
||||||
|
|
||||||
Distribution of object sizes in the bucket, includes label for the bucket
|
|
||||||
name.
|
|
||||||
|
|
||||||
Replication Metrics
|
|
||||||
+++++++++++++++++++
|
|
||||||
|
|
||||||
These metrics are only populated for MinIO clusters with
|
|
||||||
:ref:`minio-bucket-replication-serverside` enabled.
|
|
||||||
|
|
||||||
.. metric:: minio_bucket_replication_failed_bytes
|
|
||||||
|
|
||||||
Total number of bytes failed at least once to replicate.
|
|
||||||
|
|
||||||
.. metric:: minio_bucket_replication_pending_bytes
|
|
||||||
|
|
||||||
Total bytes pending to replicate.
|
|
||||||
|
|
||||||
.. metric:: minio_bucket_replication_received_bytes
|
|
||||||
|
|
||||||
Total number of bytes replicated to this bucket from another source bucket.
|
|
||||||
|
|
||||||
.. metric:: minio_bucket_replication_sent_bytes
|
|
||||||
|
|
||||||
Total number of bytes replicated to the target bucket.
|
|
||||||
|
|
||||||
.. metric:: minio_bucket_replication_pending_count
|
|
||||||
|
|
||||||
Total number of replication operations pending for this bucket.
|
|
||||||
|
|
||||||
.. metric:: minio_bucket_replication_failed_count
|
|
||||||
|
|
||||||
Total number of replication operations failed for this bucket.
|
|
||||||
|
|
||||||
Bucket Metrics
|
|
||||||
++++++++++++++
|
|
||||||
|
|
||||||
.. metric:: minio_bucket_usage_object_total
|
|
||||||
|
|
||||||
Total number of objects
|
|
||||||
|
|
||||||
.. metric:: minio_bucket_usage_total_bytes
|
|
||||||
|
|
||||||
Total bucket size in bytes
|
|
||||||
|
|
||||||
Cache Metrics
|
|
||||||
+++++++++++++
|
|
||||||
|
|
||||||
.. metric:: minio_cache_hits_total
|
|
||||||
|
|
||||||
Total number of disk cache hits
|
|
||||||
|
|
||||||
.. metric:: minio_cache_missed_total
|
|
||||||
|
|
||||||
Total number of disk cache misses
|
|
||||||
|
|
||||||
.. metric:: minio_cache_sent_bytes
|
|
||||||
|
|
||||||
Total number of bytes served from cache
|
|
||||||
|
|
||||||
.. metric:: minio_cache_total_bytes
|
|
||||||
|
|
||||||
Total size of cache disk in bytes
|
|
||||||
|
|
||||||
.. metric:: minio_cache_usage_info
|
|
||||||
|
|
||||||
Total percentage cache usage, value of 1 indicates high and 0 low, label
|
|
||||||
level is set as well
|
|
||||||
|
|
||||||
.. metric:: minio_cache_used_bytes
|
|
||||||
|
|
||||||
Current cache usage in bytes
|
|
||||||
|
|
||||||
Cluster Metrics
|
|
||||||
+++++++++++++++
|
|
||||||
|
|
||||||
.. metric:: minio_cluster_capacity_raw_free_bytes
|
|
||||||
|
|
||||||
Total free capacity online in the cluster.
|
|
||||||
|
|
||||||
.. metric:: minio_cluster_capacity_raw_total_bytes
|
|
||||||
|
|
||||||
Total capacity online in the cluster.
|
|
||||||
|
|
||||||
.. metric:: minio_cluster_capacity_usable_free_bytes
|
|
||||||
|
|
||||||
Total free usable capacity online in the cluster.
|
|
||||||
|
|
||||||
.. metric:: minio_cluster_capacity_usable_total_bytes
|
|
||||||
|
|
||||||
Total usable capacity online in the cluster.
|
|
||||||
|
|
||||||
Node Metrics
|
|
||||||
++++++++++++
|
|
||||||
|
|
||||||
.. metric:: minio_cluster_nodes_offline_total
|
|
||||||
|
|
||||||
Total number of MinIO nodes offline.
|
|
||||||
|
|
||||||
.. metric:: minio_cluster_nodes_online_total
|
|
||||||
|
|
||||||
Total number of MinIO nodes online.
|
|
||||||
|
|
||||||
.. metric:: minio_heal_objects_error_total
|
|
||||||
|
|
||||||
Objects for which healing failed in current self healing run
|
|
||||||
|
|
||||||
.. metric:: minio_heal_objects_heal_total
|
|
||||||
|
|
||||||
Objects healed in current self healing run
|
|
||||||
|
|
||||||
.. metric:: minio_heal_objects_total
|
|
||||||
|
|
||||||
Objects scanned in current self healing run
|
|
||||||
|
|
||||||
.. metric:: minio_heal_time_last_activity_nano_seconds
|
|
||||||
|
|
||||||
Time elapsed (in nano seconds) since last self healing activity. This is set
|
|
||||||
to -1 until initial self heal
|
|
||||||
|
|
||||||
.. metric:: minio_inter_node_traffic_received_bytes
|
|
||||||
|
|
||||||
Total number of bytes received from other peer nodes.
|
|
||||||
|
|
||||||
.. metric:: minio_inter_node_traffic_sent_bytes
|
|
||||||
|
|
||||||
Total number of bytes sent to the other peer nodes.
|
|
||||||
|
|
||||||
.. metric:: minio_node_disk_free_bytes
|
|
||||||
|
|
||||||
Total storage available on a disk.
|
|
||||||
|
|
||||||
.. metric:: minio_node_disk_total_bytes
|
|
||||||
|
|
||||||
Total storage on a disk.
|
|
||||||
|
|
||||||
.. metric:: minio_node_disk_used_bytes
|
|
||||||
|
|
||||||
Total storage used on a disk.
|
|
||||||
|
|
||||||
.. metric:: minio_node_file_descriptor_limit_total
|
|
||||||
|
|
||||||
Limit on total number of open file descriptors for the MinIO Server process.
|
|
||||||
|
|
||||||
.. metric:: minio_node_file_descriptor_open_total
|
|
||||||
|
|
||||||
Total number of open file descriptors by the MinIO Server process.
|
|
||||||
|
|
||||||
.. metric:: minio_node_io_rchar_bytes
|
|
||||||
|
|
||||||
Total bytes read by the process from the underlying storage system including
|
|
||||||
cache, ``/proc/[pid]/io rchar``
|
|
||||||
|
|
||||||
.. metric:: minio_node_io_read_bytes
|
|
||||||
|
|
||||||
Total bytes read by the process from the underlying storage system,
|
|
||||||
``/proc/[pid]/io read_bytes``
|
|
||||||
|
|
||||||
.. metric:: minio_node_io_wchar_bytes
|
|
||||||
|
|
||||||
Total bytes written by the process to the underlying storage system including
|
|
||||||
page cache, ``/proc/[pid]/io wchar``
|
|
||||||
|
|
||||||
.. metric:: minio_node_io_write_bytes
|
|
||||||
|
|
||||||
Total bytes written by the process to the underlying storage system,
|
|
||||||
``/proc/[pid]/io write_bytes``
|
|
||||||
|
|
||||||
.. metric:: minio_node_process_starttime_seconds
|
|
||||||
|
|
||||||
Start time for MinIO process per node, time in seconds since Unix epoch.
|
|
||||||
|
|
||||||
.. metric:: minio_node_process_uptime_seconds
|
|
||||||
|
|
||||||
Uptime for MinIO process per node in seconds.
|
|
||||||
|
|
||||||
.. metric:: minio_node_scanner_bucket_scans_finished
|
|
||||||
|
|
||||||
Total number of bucket scans finished since server start.
|
|
||||||
|
|
||||||
.. metric:: minio_node_scanner_bucket_scans_started
|
|
||||||
|
|
||||||
Total number of bucket scans started since server start.
|
|
||||||
|
|
||||||
.. metric:: minio_node_scanner_directories_scanned
|
|
||||||
|
|
||||||
Total number of directories scanned since server start.
|
|
||||||
|
|
||||||
.. metric:: minio_node_scanner_objects_scanned
|
|
||||||
|
|
||||||
Total number of unique objects scanned since server start.
|
|
||||||
|
|
||||||
.. metric:: minio_node_scanner_versions_scanned
|
|
||||||
|
|
||||||
Total number of object versions scanned since server start.
|
|
||||||
|
|
||||||
.. metric:: minio_node_syscall_read_total
|
|
||||||
|
|
||||||
Total read SysCalls to the kernel. ``/proc/[pid]/io syscr``
|
|
||||||
|
|
||||||
.. metric:: minio_node_syscall_write_total
|
|
||||||
|
|
||||||
Total write SysCalls to the kernel. ``/proc/[pid]/io syscw``
|
|
||||||
|
|
||||||
S3 Metrics
|
|
||||||
++++++++++
|
|
||||||
|
|
||||||
.. metric:: minio_s3_requests_error_total
|
|
||||||
|
|
||||||
Total number of S3 requests with errors
|
|
||||||
|
|
||||||
.. metric:: minio_s3_requests_inflight_total
|
|
||||||
|
|
||||||
Total number of S3 requests currently in flight
|
|
||||||
|
|
||||||
.. metric:: minio_s3_requests_total
|
|
||||||
|
|
||||||
Total number S3 requests
|
|
||||||
|
|
||||||
.. metric:: minio_s3_time_ttbf_seconds_distribution
|
|
||||||
|
|
||||||
Distribution of the time to first byte across API calls.
|
|
||||||
|
|
||||||
.. metric:: minio_s3_traffic_received_bytes
|
|
||||||
|
|
||||||
Total number of s3 bytes received.
|
|
||||||
|
|
||||||
.. metric:: minio_s3_traffic_sent_bytes
|
|
||||||
|
|
||||||
Total number of s3 bytes sent
|
|
||||||
|
|
||||||
Software Metrics
|
|
||||||
++++++++++++++++
|
|
||||||
|
|
||||||
.. metric:: minio_software_commit_info
|
|
||||||
|
|
||||||
Git commit hash for the MinIO release.
|
|
||||||
|
|
||||||
.. metric:: minio_software_version_info
|
|
||||||
|
|
||||||
MinIO Release tag for the server
|
|
||||||
|
|
||||||

.. _minio-metrics-and-alerts-alerting:

Alerts
------

You can configure alerts using Prometheus :prometheus-docs:`Alerting Rules
<prometheus/latest/configuration/alerting_rules/>` based on the collected MinIO
metrics. The Prometheus :prometheus-docs:`Alertmanager
<alerting/latest/overview/>` supports managing alerts produced by the configured
alerting rules. Prometheus also supports a :prometheus-docs:`Webhook Receiver
<operating/integrations/#alertmanager-webhook-receiver>` for publishing alerts
to mechanisms not supported by Prometheus Alertmanager.
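
As an illustration of an alerting rule built on the collected metrics, the following sketch writes a Prometheus rules file that fires when any MinIO node reports as offline.
The file name ``minio-alerts.yml``, the group and alert names, the ``for`` duration, and the ``severity`` label are assumptions chosen for this example, not values required by MinIO.

.. code-block:: shell
   :class: copyable

   # Write an example Prometheus alerting rules file.
   # All names and thresholds below are hypothetical; adjust them for your deployment.
   printf '%s\n' \
     'groups:' \
     '  - name: minio-alerts' \
     '    rules:' \
     '      - alert: MinioNodesOffline' \
     '        expr: minio_cluster_nodes_offline_total > 0' \
     '        for: 5m' \
     '        labels:' \
     '          severity: warning' \
     '        annotations:' \
     '          summary: "One or more MinIO nodes are offline"' \
     > minio-alerts.yml

Reference the resulting file from the ``rule_files`` list in the Prometheus configuration so that Prometheus loads the rule.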
377
source/operations/monitoring/metrics-and-alerts.rst
Normal file
@ -0,0 +1,377 @@
.. _minio-metrics-and-alerts-endpoints:
.. _minio-metrics-and-alerts-alerting:
.. _minio-metrics-and-alerts:

==================
Metrics and Alerts
==================

.. default-domain:: minio

.. contents:: Table of Contents
   :local:
   :depth: 2

MinIO publishes cluster and node metrics using the :prometheus-docs:`Prometheus Data Model <data_model/>`.
You can use any scraping tool to pull metrics data from MinIO for further analysis and alerting.

MinIO provides a scraping endpoint for cluster-level metrics:

.. code-block:: shell
   :class: copyable

   http://minio.example.net:9000/minio/v2/metrics/cluster

Replace ``minio.example.net:9000`` with the hostname and API port of any node in the MinIO deployment.
For deployments with a load balancer managing connections between MinIO nodes, specify the address of the load balancer.

MinIO by default requires authentication for scraping the metrics endpoints.
Use the :mc-cmd:`mc admin prometheus generate` command to generate the necessary bearer tokens.
You can alternatively disable metrics endpoint authentication by setting :envvar:`MINIO_PROMETHEUS_AUTH_TYPE` to ``public``.
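
The following is a minimal sketch of scraping the endpoint manually with ``curl``, assuming the generated token is sent as a standard ``Authorization: Bearer`` header.
The function names and the ``MINIO_METRICS_TOKEN`` variable are hypothetical, and the token is assumed to be the bearer token value printed by :mc-cmd:`mc admin prometheus generate`.

.. code-block:: shell
   :class: copyable

   # Build the cluster metrics URL from a base URL such as http://minio.example.net:9000
   minio_metrics_url() {
     printf '%s/minio/v2/metrics/cluster\n' "${1%/}"
   }

   # Scrape the endpoint, sending the bearer token only when one is provided.
   # Usage: scrape_minio_metrics BASE_URL [TOKEN]
   scrape_minio_metrics() {
     if [ -n "${2:-}" ]; then
       curl --silent --fail --header "Authorization: Bearer $2" "$(minio_metrics_url "$1")"
     else
       curl --silent --fail "$(minio_metrics_url "$1")"
     fi
   }

   # Example invocation (not run here):
   #   scrape_minio_metrics http://minio.example.net:9000 "$MINIO_METRICS_TOKEN"

Omitting the token argument corresponds to a deployment where the endpoint has been made public.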

.. _minio-console-metrics:

MinIO Console Metrics Dashboard
-------------------------------

The :ref:`MinIO Console <minio-console-monitoring>` provides a point-in-time metrics dashboard by default:

.. image:: /images/minio-console/console-metrics-simple.png
   :width: 600px
   :alt: MinIO Console with Point-In-Time Metrics
   :align: center

The Console also supports displaying time-series and historical data by querying a :prometheus-docs:`Prometheus <prometheus/latest/getting_started/>` service configured to scrape data from the MinIO deployment.
Specifically, the MinIO Console uses the :prometheus-docs:`Prometheus query API <prometheus/latest/querying/api/>` to retrieve stored metrics data and display the following visualizations:

- :guilabel:`Usage` - provides historical and on-demand visualization of overall usage and status
- :guilabel:`Traffic` - provides historical and on-demand visualization of network traffic
- :guilabel:`Resources` - provides historical and on-demand visualization of resources (compute and storage)
- :guilabel:`Info` - provides point-in-time status of the deployment

.. image:: /images/minio-console/console-metrics.png
   :width: 600px
   :alt: MinIO Console displaying Prometheus-backed Monitoring Data
   :align: center

.. cond:: k8s

   The MinIO Operator supports deploying a per-tenant Prometheus instance configured to support metrics and visualization.

   If you deploy the Tenant with this feature disabled *but* still want the historical metric views, you can instead configure an external Prometheus service to scrape the Tenant metrics.
   Once configured, you can update the Tenant to query that Prometheus service to retrieve metric data.

.. cond:: linux or container or macos or windows

   To enable historical data visualization in MinIO Console, set the following environment variables on each node in the MinIO deployment:

   - Set :envvar:`MINIO_PROMETHEUS_URL` to the URL of the Prometheus service
   - Set :envvar:`MINIO_PROMETHEUS_JOB_ID` to the unique job ID assigned to the collected metrics
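
   For example, the two variables might be set as follows before starting the MinIO server process.
   Both values are placeholders: the sketch assumes a Prometheus service reachable at ``prometheus.example.net:9090`` and a scrape job named ``minio-job``.

   .. code-block:: shell
      :class: copyable

      # Hypothetical values; substitute the URL and scrape job name of your own Prometheus service
      export MINIO_PROMETHEUS_URL="http://prometheus.example.net:9090"
      export MINIO_PROMETHEUS_JOB_ID="minio-job"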

MinIO also publishes a `Grafana Dashboard <https://grafana.com/grafana/dashboards/13502>`_ for visualizing collected metrics.
For more complete documentation on configuring a Prometheus-compatible data source for Grafana, see :prometheus-docs:`Grafana Support for Prometheus <visualization/grafana/>`.

.. _minio-metrics-and-alerts-available-metrics:

Available Metrics
-----------------

MinIO publishes the following metrics, where each metric includes a label for
the MinIO server that generated that metric.

Object and Bucket Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~

.. metric:: minio_bucket_objects_size_distribution

   Distribution of object sizes in a given bucket.
   You can identify the bucket using the ``{ bucket="STRING" }`` label.

.. metric:: minio_bucket_usage_object_total

   Total number of objects in a given bucket.
   You can identify the bucket using the ``{ bucket="STRING" }`` label.

.. metric:: minio_bucket_usage_total_bytes

   Total size in bytes of a given bucket.
   You can identify the bucket using the ``{ bucket="STRING" }`` label.
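
As one way to use the ``bucket`` label, the sketch below builds a PromQL selector for a single bucket and sends it to the instant-query endpoint (``/api/v1/query``) of the Prometheus HTTP API.
The function names, the ``PROMETHEUS_URL`` variable, and its default value are assumptions made for illustration.

.. code-block:: shell
   :class: copyable

   # Build a PromQL selector that restricts a metric to one bucket
   bucket_selector() {
     printf '%s{bucket="%s"}\n' "$1" "$2"
   }

   # Ask Prometheus for the current value of a bucket-level metric.
   # PROMETHEUS_URL is a hypothetical variable pointing at your Prometheus service.
   query_bucket_metric() {
     curl --silent --get "${PROMETHEUS_URL:-http://prometheus.example.net:9090}/api/v1/query" \
       --data-urlencode "query=$(bucket_selector "$1" "$2")"
   }

   # Example invocation (not run here):
   #   query_bucket_metric minio_bucket_usage_total_bytes mybucket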

Replication Metrics
~~~~~~~~~~~~~~~~~~~

These metrics are only populated for MinIO clusters with
:ref:`minio-bucket-replication-serverside` enabled.

.. metric:: minio_bucket_replication_failed_bytes

   Total number of bytes that failed at least once to replicate for a given bucket.
   You can identify the bucket using the ``{ bucket="STRING" }`` label.

.. metric:: minio_bucket_replication_pending_bytes

   Total number of bytes pending to replicate for a given bucket.
   You can identify the bucket using the ``{ bucket="STRING" }`` label.

.. metric:: minio_bucket_replication_received_bytes

   Total number of bytes replicated to this bucket from another source bucket.
   You can identify the bucket using the ``{ bucket="STRING" }`` label.

.. metric:: minio_bucket_replication_sent_bytes

   Total number of bytes replicated to the target bucket.
   You can identify the bucket using the ``{ bucket="STRING" }`` label.

.. metric:: minio_bucket_replication_pending_count

   Total number of replication operations pending for a given bucket.
   You can identify the bucket using the ``{ bucket="STRING" }`` label.

.. metric:: minio_bucket_replication_failed_count

   Total number of replication operations failed for a given bucket.
   You can identify the bucket using the ``{ bucket="STRING" }`` label.

Capacity Metrics
~~~~~~~~~~~~~~~~

.. metric:: minio_cluster_capacity_raw_free_bytes

   Total free capacity online in the cluster.

.. metric:: minio_cluster_capacity_raw_total_bytes

   Total capacity online in the cluster.

.. metric:: minio_cluster_capacity_usable_free_bytes

   Total free usable capacity online in the cluster.

.. metric:: minio_cluster_capacity_usable_total_bytes

   Total usable capacity online in the cluster.

.. metric:: minio_node_disk_free_bytes

   Total storage available on a specific drive for a node in the MinIO deployment.
   You can identify the drive and node using the ``{ disk="/path/to/disk",server="STRING"}`` labels respectively.

.. metric:: minio_node_disk_total_bytes

   Total storage on a specific drive for a node in the MinIO deployment.
   You can identify the drive and node using the ``{ disk="/path/to/disk",server="STRING"}`` labels respectively.

.. metric:: minio_node_disk_used_bytes

   Total storage used on a specific drive for a node in a MinIO deployment.
   You can identify the drive and node using the ``{ disk="/path/to/disk",server="STRING"}`` labels respectively.
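
Because the per-drive metrics emit one sample per ``disk`` and ``server`` label pair, a scrape can be aggregated with ordinary text tools.
The following sketch sums every sample of one metric from Prometheus text-format output read on standard input.
The ``sum_metric`` name is hypothetical, and the sketch assumes each sample line ends with its value (no trailing timestamp).

.. code-block:: shell
   :class: copyable

   # Sum all samples of one metric from Prometheus text-format input on stdin
   sum_metric() {
     awk -v metric="$1" '
       # Skip comment lines, then match "metric{labels} value" and bare "metric value" lines
       $0 !~ /^#/ && (index($0, metric "{") == 1 || $1 == metric) { total += $NF }
       END { printf "%.0f\n", total }
     '
   }

   # Example invocation (not run here), for a deployment with public metrics access:
   #   curl --silent http://minio.example.net:9000/minio/v2/metrics/cluster | sum_metric minio_node_disk_used_bytes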

Lifecycle Management Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. metric:: minio_cluster_ilm_transitioned_bytes

   Total number of bytes transitioned using :ref:`tiering/transition lifecycle management rules <minio-lifecycle-management-tiering>`.

.. metric:: minio_cluster_ilm_transitioned_objects

   Total number of objects transitioned using :ref:`tiering/transition lifecycle management rules <minio-lifecycle-management-tiering>`.

.. metric:: minio_cluster_ilm_transitioned_versions

   Total number of non-current object versions transitioned using :ref:`tiering/transition lifecycle management rules <minio-lifecycle-management-tiering>`.

.. metric:: minio_node_ilm_transition_pending_tasks

   Total number of pending :ref:`object transition <minio-lifecycle-management-tiering>` tasks.

.. metric:: minio_node_ilm_expiry_pending_tasks

   Total number of pending :ref:`object expiration <minio-lifecycle-management-expiration>` tasks.

.. metric:: minio_node_ilm_expiry_active_tasks

   Total number of active :ref:`object expiration <minio-lifecycle-management-expiration>` tasks.

Node and Disk Health Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. metric:: minio_cluster_disk_online_total

   The total number of disks online.

.. metric:: minio_cluster_disk_offline_total

   The total number of disks offline.

.. metric:: minio_cluster_disk_total

   The total number of disks.

.. metric:: minio_cluster_nodes_offline_total

   Total number of MinIO nodes offline.

.. metric:: minio_cluster_nodes_online_total

   Total number of MinIO nodes online.

.. metric:: minio_heal_objects_error_total

   Objects for which healing failed in the current self-healing run.

.. metric:: minio_heal_objects_heal_total

   Objects healed in the current self-healing run.

.. metric:: minio_heal_objects_total

   Objects scanned in the current self-healing run.

.. metric:: minio_heal_time_last_activity_nano_seconds

   Time elapsed (in nanoseconds) since the last self-healing activity.
   This is set to -1 until the initial self-heal activity.

Scanner Metrics
~~~~~~~~~~~~~~~

.. metric:: minio_node_scanner_bucket_scans_finished

   Total number of bucket scans finished since server start.

.. metric:: minio_node_scanner_bucket_scans_started

   Total number of bucket scans started since server start.

.. metric:: minio_node_scanner_directories_scanned

   Total number of directories scanned since server start.

.. metric:: minio_node_scanner_objects_scanned

   Total number of unique objects scanned since server start.

.. metric:: minio_node_scanner_versions_scanned

   Total number of object versions scanned since server start.

.. metric:: minio_node_syscall_read_total

   Total number of read SysCalls to the kernel. ``/proc/[pid]/io syscr``

.. metric:: minio_node_syscall_write_total

   Total number of write SysCalls to the kernel. ``/proc/[pid]/io syscw``

S3 Metrics
~~~~~~~~~~

.. metric:: minio_bucket_traffic_sent_bytes

   Total number of bytes of S3 traffic sent per bucket.
   You can identify the bucket using the ``{ bucket="STRING" }`` label.

.. metric:: minio_bucket_traffic_received_bytes

   Total number of bytes of S3 traffic received per bucket.
   You can identify the bucket using the ``{ bucket="STRING" }`` label.

.. metric:: minio_s3_requests_inflight_total

   Total number of S3 requests currently in flight.

.. metric:: minio_s3_requests_total

   Total number of S3 requests.

.. metric:: minio_s3_time_ttfb_seconds_distribution

   Distribution of the time to first byte across API calls.

.. metric:: minio_s3_traffic_received_bytes

   Total number of S3 bytes received.

.. metric:: minio_s3_traffic_sent_bytes

   Total number of S3 bytes sent.

.. metric:: minio_s3_requests_errors_total

   Total number of S3 requests with 4xx and 5xx errors.

.. metric:: minio_s3_requests_4xx_errors_total

   Total number of S3 requests with 4xx errors.

.. metric:: minio_s3_requests_5xx_errors_total

   Total number of S3 requests with 5xx errors.
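
The request and error totals are often more useful as a ratio than as raw counts.
The sketch below prints a PromQL expression for the share of S3 requests that returned 5xx errors over a chosen window, assuming both metrics behave as monotonically increasing counters so that ``rate()`` applies.
The function name is hypothetical; paste the resulting expression into the Prometheus expression browser or an alerting rule.

.. code-block:: shell
   :class: copyable

   # Print a PromQL expression for the fraction of S3 requests that ended in a 5xx error.
   # The single argument is the rate window, for example 5m.
   s3_5xx_ratio_query() {
     printf 'sum(rate(minio_s3_requests_5xx_errors_total[%s])) / sum(rate(minio_s3_requests_total[%s]))\n' "$1" "$1"
   }

   # Example invocation:
   #   s3_5xx_ratio_query 5m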

Internal Metrics
~~~~~~~~~~~~~~~~

.. metric:: minio_inter_node_traffic_received_bytes

   Total number of bytes received from other peer nodes.

.. metric:: minio_inter_node_traffic_sent_bytes

   Total number of bytes sent to the other peer nodes.

.. metric:: minio_node_file_descriptor_limit_total

   Limit on total number of open file descriptors for the MinIO Server process.

.. metric:: minio_node_file_descriptor_open_total

   Total number of open file descriptors by the MinIO Server process.

.. metric:: minio_node_io_rchar_bytes

   Total bytes read by the process from the underlying storage system including
   cache, ``/proc/[pid]/io rchar``

.. metric:: minio_node_io_read_bytes

   Total bytes read by the process from the underlying storage system,
   ``/proc/[pid]/io read_bytes``

.. metric:: minio_node_io_wchar_bytes

   Total bytes written by the process to the underlying storage system including
   page cache, ``/proc/[pid]/io wchar``

.. metric:: minio_node_io_write_bytes

   Total bytes written by the process to the underlying storage system,
   ``/proc/[pid]/io write_bytes``

Software and Process Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. metric:: minio_software_commit_info

   Git commit hash for the MinIO release.

.. metric:: minio_software_version_info

   MinIO release tag for the server.

.. metric:: minio_node_process_starttime_seconds

   Start time for MinIO process per node, time in seconds since Unix epoch.

.. metric:: minio_node_process_uptime_seconds

   Uptime for MinIO process per node in seconds.

.. toctree::
   :titlesonly:
   :hidden:

   /operations/monitoring/collect-minio-metrics-using-prometheus
   /operations/monitoring/monitor-and-alert-using-influxdb
@ -0,0 +1,121 @@
.. _minio-metrics-influxdb:

======================================
Monitoring and Alerting using InfluxDB
======================================

.. default-domain:: minio

.. contents:: Table of Contents
   :local:
   :depth: 1

MinIO publishes cluster and node metrics using the :prometheus-docs:`Prometheus Data Model <data_model/>`.
`InfluxDB <https://www.influxdata.com/?ref=minio>`__ supports scraping MinIO metrics data for monitoring and alerting.

The procedure on this page documents the following:

- Configuring an InfluxDB service to scrape and display metrics from a MinIO deployment
- Configuring an Alert on a MinIO metric

.. admonition:: Prerequisites
   :class: note

   This procedure requires the following:

   - An existing InfluxDB deployment configured with one or more :influxdb-docs:`notification endpoints <notification-endpoints/>`
   - An existing MinIO deployment with network access to the InfluxDB deployment
   - An :mc:`mc` installation on your local host configured to :ref:`access <alias>` the MinIO deployment

.. cond:: k8s

   This procedure assumes you have already configured all necessary network control components, such as Ingress or Load Balancers, to facilitate access between the MinIO Tenant and the InfluxDB service.

Configure InfluxDB to Collect and Alert using MinIO Metrics
-----------------------------------------------------------

.. important::

   This procedure specifically uses the InfluxDB UI to create a scraping endpoint.

   The InfluxDB UI does not provide the same level of configuration as using `Telegraf <https://docs.influxdata.com/telegraf/v1.24/>`__ and the corresponding `Prometheus plugin <https://github.com/influxdata/telegraf/blob/release-1.24/plugins/inputs/prometheus/README.md>`__.
   Specifically:

   - You cannot enable authenticated access to the MinIO metrics endpoint via the InfluxDB UI
   - You cannot set a tag for collected metrics (e.g. ``url_tag``) for uniquely identifying the metrics for a given MinIO deployment

   .. cond:: k8s

      The Telegraf Prometheus plugin also supports Kubernetes-specific features, such as scraping the ``minio`` service for a given MinIO Tenant.

   Configuring Telegraf is out of scope for this procedure.
   You can use this procedure as general guidance for configuring Telegraf to scrape MinIO metrics.

.. container:: procedure

   1. Configure Public Access to MinIO Metrics

      Set the :envvar:`MINIO_PROMETHEUS_AUTH_TYPE` environment variable to ``"public"`` for all nodes in the MinIO deployment.
      Restart the deployment to apply the change and allow public access to MinIO metrics.

      You can validate the change by attempting to ``curl`` the metrics endpoint:

      .. code-block:: shell
         :class: copyable

         curl https://HOSTNAME/minio/v2/metrics/cluster

      Replace ``HOSTNAME`` with the hostname of the load balancer or reverse proxy through which you access the MinIO deployment.
      You can alternatively specify any single node as ``HOSTNAME:PORT``, specifying the MinIO server API port in addition to the node hostname.

      The response body should include a list of collected MinIO metrics.
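
      As a quick sanity check on that response, the following sketch counts the distinct metric names that start with ``minio_`` in Prometheus text-format output read on standard input.
      The ``count_minio_metrics`` name is hypothetical; a result of ``0`` suggests the endpoint did not return metrics.

      .. code-block:: shell
         :class: copyable

         # Count the distinct MinIO metric names in Prometheus text-format input on stdin
         count_minio_metrics() {
           grep -v '^#' | grep -o '^minio_[a-zA-Z0-9_]*' | sort -u | wc -l | tr -d ' '
         }

         # Example invocation (not run here):
         #   curl --silent https://HOSTNAME/minio/v2/metrics/cluster | count_minio_metrics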

   #. Log into the InfluxDB UI and Create a Bucket

      Select the :influxdb-docs:`Organization <organizations/view-orgs/>` under which you want to store MinIO metrics.

      Create a :influxdb-docs:`New Bucket <organizations/buckets/create-bucket/>` in which to store metrics for the MinIO deployment.

   #. Create a new Scraping Source

      Create a :influxdb-docs:`new InfluxDB Scraper <write-data/no-code/scrape-data/manage-scrapers/create-a-scraper/>`.

      Specify the full URL to the MinIO deployment, including the metrics endpoint:

      .. code-block:: shell
         :class: copyable

         https://HOSTNAME/minio/v2/metrics/cluster

      Replace ``HOSTNAME`` with the hostname of the load balancer or reverse proxy through which you access the MinIO deployment.
      You can alternatively specify any single node as ``HOSTNAME:PORT``, specifying the MinIO server API port in addition to the node hostname.

   #. Validate the Data

      Use the :influxdb-docs:`Data Explorer <query-data/execute-queries/data-explorer/>` to visualize the collected MinIO data.

      For example, you can set a filter on :metric:`minio_cluster_capacity_usable_total_bytes` and :metric:`minio_cluster_capacity_usable_free_bytes` to compare the total usable space against the total free space on the MinIO deployment.

   #. Configure a Check

      Create a :influxdb-docs:`new Check <monitor-alert/checks/create/>` on a MinIO metric.

      The following example check rules provide a baseline of alerts for a MinIO deployment.
      You can modify or otherwise use these examples for guidance in building your own checks.

      - Create a :guilabel:`Threshold Check` named ``MINIO_NODE_DOWN``.

        Set the filter for the :metric:`minio_cluster_nodes_offline_total` key.

        Set the :guilabel:`Thresholds` to :guilabel:`WARN` when the value is greater than :guilabel:`1`.

      - Create a :guilabel:`Threshold Check` named ``MINIO_QUORUM_WARNING``.

        Set the filter for the :metric:`minio_cluster_disk_offline_total` key.

        Set the :guilabel:`Thresholds` to :guilabel:`CRITICAL` when the value is one less than your configured :ref:`Erasure Code Parity <minio-erasure-coding>` setting.

        For example, a deployment using EC:4 should set this value to ``3``.

      Configure your :influxdb-docs:`Notification endpoints <monitor-alert/notification-endpoints/>` and :influxdb-docs:`Notification rules <monitor-alert/notification-rules/>` such that checks of each type trigger an appropriate response.