
Major overhaul for Monitoring docs: Part 1

Ravind Kumar
2022-10-20 17:39:54 -04:00
committed by Ravind Kumar
parent 1735b77d8f
commit 4e4cc97f45
8 changed files with 607 additions and 443 deletions

View File

@ -283,6 +283,8 @@ Some subsections may not be visible if the authenticated user does not have the
Use the :guilabel:`Users` and :guilabel:`Groups` views to assign a created policy to users and groups, respectively.
.. _minio-console-monitoring:
Monitoring
----------
@ -295,25 +297,23 @@ Some subsections may not be visible if the authenticated user does not have the
.. tab-item:: Metrics
.. image:: /images/minio-console/console-metrics.png
.. image:: /images/minio-console/console-metrics-simple.png
:width: 600px
:alt: MinIO Console Metrics displaying detailed data using Prometheus
:alt: MinIO Console Metrics displaying point-in-time data
:align: center
The Console :guilabel:`Dashboard` section displays metrics for the MinIO deployment.
The default view provides a high-level overview of the deployment status, including the uptime and availability of individual servers and drives.
The Console depends on a :ref:`configured Prometheus service <minio-metrics-collect-using-prometheus>` to generate the detailed metrics shown above.
The Console also supports displaying time-series and historical data by querying a :prometheus-docs:`Prometheus <prometheus/latest/getting_started/>` service configured to scrape data from the MinIO deployment.
Specifically, the MinIO Console uses :prometheus-docs:`Prometheus query API <prometheus/latest/querying/api/>` to retrieve stored metrics data and display historical metrics:
The default metrics view provides a high-level overview of the deployment status, including the uptime and availability of individual servers and drives.
.. image:: /images/minio-console/console-metrics-simple.png
.. image:: /images/minio-console/console-metrics.png
:width: 600px
:alt: MinIO Console Metrics displaying simplified data
:align: center
This view requires configuring a Prometheus service to scrape the deployment metrics.
You can download these metrics as a ``.png`` image or ``.csv`` file.
See :ref:`minio-metrics-collect-using-prometheus` for complete instructions.
See :ref:`minio-console-metrics` for more information on the historical metric visualization.
.. tab-item:: Logs

View File

@ -79,6 +79,7 @@ extlinks = {
'podman-git' : ('https://github.com/containers/podman/%s',''),
'docker-docs' : ('https://docs.docker.com/%s', ''),
'openshift-docs' : ('https://docs.openshift.com/container-platform/4.11/%s', ''),
'influxdb-docs' : ('https://docs.influxdata.com/influxdb/v2.4/%s',''),
}

Binary file not shown (image changed: 225 KiB before, 126 KiB after).

Binary file not shown (image changed: 197 KiB before, 196 KiB after).

View File

@ -1,5 +1,5 @@
=====================
Prometheus Monitoring
Monitoring and Alerts
=====================
.. default-domain:: minio
@ -12,22 +12,27 @@ Metrics and Alerts
------------------
MinIO provides point-in-time metrics on cluster status and operations.
MinIO publishes collected metrics data using Prometheus-compatible data structures.
The :ref:`MinIO Console <minio-console-metrics>` provides a graphical display of these metrics.
For alerts, time-series metric data, or additional metrics, MinIO can leverage `Prometheus <https://prometheus.io/>`__.
Prometheus is an open source systems and service monitoring platform that supports analyzing and alerting based on collected metrics.
The Prometheus ecosystem includes multiple :prometheus-docs:`integrations <operating/integrations/>`, allowing wide latitude in processing and storing collected metrics.
For historical metrics and analytics, MinIO publishes cluster and node metrics using the :prometheus-docs:`Prometheus Data Model <data_model/>`.
You can use any scraping tool which supports that data model to pull metrics data from MinIO for further analysis and alerting.
- MinIO publishes Prometheus-compatible scraping endpoints for cluster and node-level metrics.
Any Prometheus-compatible scraping software can ingest and process MinIO metrics for analysis, visualization, and alerting.
See :ref:`minio-metrics-and-alerts-endpoints` for more information.
The following table lists tutorials for integrating MinIO metrics with select third-party monitoring software.
- For alerts, use Prometheus :prometheus-docs:`Alerting Rules <prometheus/latest/configuration/alerting_rules/>` and the
:prometheus-docs:`Alert Manager <alerting/latest/overview/>` to trigger alerts based on collected metrics.
See :ref:`minio-metrics-and-alerts-alerting` for more information.
.. list-table::
:stub-columns: 1
:widths: 30 70
:width: 100%
When configured, the :ref:`MinIO Console <minio-console-metrics>` shows some metrics in the :guilabel:`Monitoring > Metrics` page.
You can download these metrics as either ``.png`` images or ``.csv`` files.
* - :ref:`minio-metrics-collect-using-prometheus`
- Configure Prometheus to Monitor and Alert for a MinIO deployment.
Configure MinIO to query the Prometheus deployment to enable historical metrics via the MinIO Console.
* - :ref:`minio-metrics-influxdb`
- Configure InfluxDB to Monitor and Alert for a MinIO deployment.
Other metrics and analytics software that supports the Prometheus data model may also work, even if not included in the table above.
Logging
-------
@ -58,6 +63,6 @@ See :ref:`minio-healthcheck-api` for more information.
:titlesonly:
:hidden:
/operations/monitoring/collect-minio-metrics-using-prometheus
/operations/monitoring/metrics-and-alerts
/operations/monitoring/minio-logging
/operations/monitoring/healthcheck-probe

View File

@ -1,9 +1,8 @@
.. _minio-metrics-collect-using-prometheus:
.. _minio-metrics-and-alerts:
======================================
Collect MinIO Metrics Using Prometheus
======================================
========================================
Monitoring and Alerting using Prometheus
========================================
.. default-domain:: minio
@ -11,60 +10,46 @@ Collect MinIO Metrics Using Prometheus
:local:
:depth: 1
MinIO leverages `Prometheus <https://prometheus.io/>`__ for metrics and alerts.
MinIO publishes Prometheus-compatible scraping endpoints for cluster and
node-level metrics. See :ref:`minio-metrics-and-alerts-endpoints` for more
information.
MinIO publishes cluster and node metrics using the :prometheus-docs:`Prometheus Data Model <data_model/>`.
The procedure on this page documents the following:
The procedure on this page documents scraping the MinIO metrics
endpoints using a Prometheus instance, including deploying and configuring
a simple Prometheus server for collecting metrics.
- Configuring a Prometheus service to scrape and display metrics from a MinIO deployment
- Configuring an Alert Rule on a MinIO Metric to trigger an AlertManager action
This procedure is not a replacement for the official
:prometheus-docs:`Prometheus Documentation <>`. Any specific guidance
related to configuring, deploying, and using Prometheus is made on a best-effort
basis.
.. admonition:: Prerequisites
:class: note
Requirements
------------
This procedure requires the following:
Install and Configure ``mc`` with Access to the MinIO Cluster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- An existing Prometheus deployment with backing :prometheus-docs:`Alert Manager <alerting/latest/overview/>`
This procedure uses :mc:`mc` for performing operations on the MinIO
deployment. Install ``mc`` on a machine with network access to the
deployment. See the ``mc`` :ref:`Installation Quickstart <mc-install>` for
more complete instructions.
- An existing MinIO deployment with network access to the Prometheus deployment
Prometheus Service
~~~~~~~~~~~~~~~~~~
- An :mc:`mc` installation on your local host configured to :ref:`access <alias>` the MinIO deployment
This procedure provides instruction for deploying Prometheus for rapid local
evaluation and development. All other environments should have an existing
Prometheus or Prometheus-compatible service with access to the MinIO cluster.
.. cond:: k8s
Procedure
---------
The MinIO Operator supports deploying a :ref:`per-tenant Prometheus instance <create-tenant-configure-section>` configured to support metrics and visualizations.
This includes automatically configuring the Tenant to enable the :ref:`Tenant Console historical metric view <minio-console-metrics>`.
1) Generate the Bearer Token
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can still use this procedure to configure an external Prometheus service for supporting monitoring and alerting for a MinIO Tenant.
You must configure all necessary network control components, such as Ingress or a Load Balancer, to facilitate access between the Tenant and the Prometheus service.
This procedure assumes your local host machine can access the Tenant via :mc:`mc`.
MinIO by default requires authentication for requests made to the metrics
endpoints. While this step is not required for MinIO deployments started with
:envvar:`MINIO_PROMETHEUS_AUTH_TYPE` set to ``"public"``, you can still use the
command output for retrieving a Prometheus ``scrape_configs`` entry.
Configure Prometheus to Collect and Alert using MinIO Metrics
-------------------------------------------------------------
Use the :mc-cmd:`mc admin prometheus generate` command to generate a
JWT bearer token for use by Prometheus in making authenticated scraping
requests:
1) Generate the Scrape Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Use the :mc-cmd:`mc admin prometheus generate` command to generate the scrape configuration for use by Prometheus in making scraping requests:
.. code-block:: shell
:class: copyable
mc admin prometheus generate ALIAS
Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the
:mc:`alias <mc alias>` of the MinIO deployment.
Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the :mc:`alias <mc alias>` of the MinIO deployment.
The command returns output similar to the following:
@ -79,24 +64,22 @@ The command returns output similar to the following:
static_configs:
- targets: [minio.example.net]
The ``targets`` array can contain the hostname for any node in the deployment.
For clusters with a load balancer managing connections between MinIO nodes,
specify the address of the load balancer.
- Set the ``job_name`` to a value associated with the MinIO deployment.
Specify the output block to the
:prometheus-docs:`scrape_config
<prometheus/latest/configuration/configuration/#scrape_config>` section of
the Prometheus configuration.
Use a unique value to ensure isolation of the deployment metrics from any others collected by that Prometheus service.
2) Configure and Run Prometheus
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- MinIO deployments started with :envvar:`MINIO_PROMETHEUS_AUTH_TYPE` set to ``"public"`` can omit the ``bearer_token`` field.
Follow the Prometheus :prometheus-docs:`Getting Started
<prometheus/latest/getting_started/#downloading-and-running-prometheus>` guide
to download and run Prometheus locally.
- Set the ``scheme`` to ``http`` for MinIO deployments not using TLS.
Append the ``scrape_configs`` job generated in the previous step to the
configuration file:
- Set the ``targets`` array with a hostname that resolves to the MinIO deployment.
This can be any single node, or a load balancer or proxy that handles connections to the MinIO nodes. A complete example entry appears in the sketch below.
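Putting these settings together, a complete ``scrape_configs`` entry typically resembles the following sketch. The hostname, port, and ``TOKEN`` value are placeholders; use the exact block emitted by :mc-cmd:`mc admin prometheus generate` for your deployment rather than hand-writing the token.
.. code-block:: yaml
:class: copyable
# Sketch only: replace TOKEN and the target hostname with your deployment's values
scrape_configs:
  - job_name: minio-job
    bearer_token: TOKEN
    metrics_path: /minio/v2/metrics/cluster
    scheme: https
    static_configs:
      - targets: ["minio.example.net:9000"]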
2) Restart Prometheus with the Updated Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Append the ``scrape_configs`` job generated in the previous step to the configuration file:
.. code-block:: yaml
:class: copyable
@ -122,10 +105,8 @@ Start the Prometheus cluster using the configuration file:
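A minimal sketch of that command, assuming the ``prometheus`` binary is on the ``PATH`` and the configuration file is saved as ``prometheus.yaml``:
.. code-block:: shell
:class: copyable
# Start a local Prometheus server with the updated configuration
prometheus --config.file=prometheus.yaml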
3) Analyze Collected Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Prometheus includes a
:prometheus-docs:`expression browser
<prometheus/latest/getting_started/#using-the-expression-browser>`. You can
execute queries here to analyze the collected metrics.
Prometheus includes an :prometheus-docs:`expression browser <prometheus/latest/getting_started/#using-the-expression-browser>`.
You can execute queries there to analyze the collected metrics.
The following query examples return metrics collected by Prometheus:
@ -139,386 +120,65 @@ The following query examples return metrics collected by Prometheus:
minio_cluster_capacity_usable_free_bytes{job="minio-job"}[5m]
See :ref:`minio-metrics-and-alerts-available-metrics` for a complete
list of published metrics.
See :ref:`minio-metrics-and-alerts-available-metrics` for a complete list of published metrics.
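As an additional sketch, the following query returns the per-second rate of S3 requests over the trailing five minutes, assuming the ``minio-job`` job name used in the examples above:
.. code-block:: shell
:class: copyable
# Per-second rate of S3 requests handled by the deployment
rate(minio_s3_requests_total{job="minio-job"}[5m])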
.. _minio-console-metrics:
4) Configure an Alert Rule using MinIO Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4) Visualize Collected Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You must configure :prometheus-docs:`Alert Rules <prometheus/latest/configuration/alerting_rules/>` on the Prometheus deployment to trigger alerts based on collected MinIO metrics.
The :minio-git:`MinIO Console <console>` supports visualizing collected metrics from Prometheus.
Specify the URL of the Prometheus service to the :envvar:`MINIO_PROMETHEUS_URL` environment variable on each MinIO server in the deployment:
.. code-block:: shell
:class: copyable
export MINIO_PROMETHEUS_URL="https://prometheus.example.net"
If you set a custom ``job_name`` for the Prometheus scraping job, you must also set :envvar:`MINIO_PROMETHEUS_JOB_ID` to match that job name.
Restart the deployment using :mc-cmd:`mc admin service restart` to apply the changes.
The MinIO Console uses the metrics collected by the ``minio-job`` scraping job to populate the Dashboard metrics available from :guilabel:`Monitoring > Metrics`.
You can download the metrics from the MinIO Console as either a ``.png`` image or a ``.csv`` file.
.. image:: /images/minio-console/console-metrics.png
:width: 600px
:alt: MinIO Console Dashboard displaying Monitoring Data
:align: center
MinIO also publishes a `Grafana Dashboard <https://grafana.com/grafana/dashboards/13502>`_ for visualizing collected metrics.
For more complete documentation on configuring a Prometheus data source for Grafana, see :prometheus-docs:`Grafana Support for Prometheus <visualization/grafana/>`.
Prometheus includes a :prometheus-docs:`graphing interface <prometheus/latest/getting_started/#using-the-graphing-interface>` for visualizing collected metrics.
.. _minio-metrics-and-alerts-endpoints:
Metrics
-------
MinIO provides a scraping endpoint for cluster-level metrics:
.. code-block:: shell
:class: copyable
http://minio.example.net:9000/minio/v2/metrics/cluster
Replace ``http://minio.example.net`` with the hostname of any node in the MinIO
deployment. For deployments with a load balancer managing connections between
MinIO nodes, specify the address of the load balancer.
Create a new :prometheus-docs:`scraping configuration
<prometheus/latest/configuration/configuration/#scrape_config>` to begin
collecting metrics from the MinIO deployment. See
:ref:`minio-metrics-collect-using-prometheus` for a complete tutorial.
The following example describes a ``scrape_configs`` entry for collecting
cluster metrics.
The following example alert rule files provide a baseline of alerts for a MinIO deployment.
You can modify or otherwise use these examples as guidance in building your own alerts.
.. code-block:: yaml
:class: copyable
scrape_configs:
- job_name: minio-job
bearer_token: <secret>
metrics_path: /minio/v2/metrics/cluster
scheme: https
static_configs:
- targets: ['minio.example.net:9000']
groups:
- name: minio-alerts
rules:
- alert: NodesOffline
expr: avg_over_time(minio_cluster_nodes_offline_total{job="minio-job"}[5m]) > 0
for: 10m
labels:
severity: warn
annotations:
summary: "Node down in MinIO deployment"
description: "Node(s) in cluster {{ $labels.instance }} offline for more than 5 minutes"
.. list-table::
:stub-columns: 1
:widths: 20 80
:width: 100%
- alert: DisksOffline
expr: avg_over_time(minio_cluster_disk_offline_total{job="minio-job"}[5m]) > 0
for: 10m
labels:
severity: warn
annotations:
summary: "Disks down in MinIO deployment"
description: "Disks(s) in cluster {{ $labels.instance }} offline for more than 5 minutes"
* - ``job_name``
- The name of the scraping job.
Specify the path to the alert file to the Prometheus configuration as part of the ``rule_files`` key:
* - ``bearer_token``
- The JWT token generated by :mc-cmd:`mc admin prometheus generate`.
.. code-block:: yaml
Omit this field if the MinIO deployment was started with
:envvar:`MINIO_PROMETHEUS_AUTH_TYPE` set to ``public``.
global:
scrape_interval: 5s
* - ``targets``
- The endpoint for the MinIO deployment. You can specify any node in the
deployment for collecting cluster metrics. For clusters with a load
balancer managing connections between MinIO nodes, specify the
address of the load balancer.
rule_files:
- minio-alerting.yml
MinIO by default requires authentication for scraping the metrics endpoints.
Use the :mc-cmd:`mc admin prometheus generate` command to generate the
necessary bearer tokens for use with configuring the
``scrape_configs.bearer_token`` field. You can alternatively disable
metrics endpoint authentication by setting
:envvar:`MINIO_PROMETHEUS_AUTH_TYPE` to ``public``.
Once triggered, Prometheus sends the alert to the configured AlertManager service.
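If the Prometheus configuration does not already point to the Alertmanager, a minimal sketch of the ``alerting`` block follows, where the hostname is a placeholder for your Alertmanager endpoint:
.. code-block:: yaml
:class: copyable
# Sketch only: add alongside the rule_files entry in the Prometheus configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager.example.net:9093"]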
Visualizing Metrics
~~~~~~~~~~~~~~~~~~~
5) (Optional) Configure MinIO Console to Query Prometheus
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The MinIO Console uses the metrics collected by Prometheus to populate the
Dashboard metrics:
The Console also supports displaying time-series and historical data by querying a :prometheus-docs:`Prometheus <prometheus/latest/getting_started/>` service configured to scrape data from the MinIO deployment.
.. image:: /images/minio-console/console-metrics.png
:width: 600px
:alt: MinIO Console displaying Prometheus-backed Monitoring Data
:align: center
Set the :envvar:`MINIO_PROMETHEUS_URL` environment variable to the URL of the
Prometheus service to allow the Console to retrieve and display collected
metrics. See :ref:`minio-metrics-collect-using-prometheus` for a complete
example.
To enable historical data visualization in MinIO Console, set the following environment variables on each node in the MinIO deployment, as shown in the sketch after this list:
MinIO also publishes a `Grafana Dashboard
<https://grafana.com/grafana/dashboards/13502>`_ for visualizing collected
metrics. For more complete documentation on configuring a Prometheus data source
for Grafana, see :prometheus-docs:`Grafana Support for Prometheus
<visualization/grafana/>`.
- Set :envvar:`MINIO_PROMETHEUS_URL` to the URL of the Prometheus service
- Set :envvar:`MINIO_PROMETHEUS_JOB_ID` to the unique job ID assigned to the collected metrics
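A hedged shell sketch of those settings, reusing the placeholder Prometheus URL and the ``minio-job`` job name from the earlier examples:
.. code-block:: shell
:class: copyable
# Set in the environment of each MinIO server process (for example, the MinIO environment file)
export MINIO_PROMETHEUS_URL="https://prometheus.example.net"
export MINIO_PROMETHEUS_JOB_ID="minio-job"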
.. _minio-metrics-and-alerts-available-metrics:
Available Metrics
~~~~~~~~~~~~~~~~~
MinIO publishes the following metrics, where each metric includes a label for
the MinIO server which generated that metric.
Object Metrics
++++++++++++++
.. metric:: minio_bucket_objects_size_distribution
Distribution of object sizes in the bucket, includes label for the bucket
name.
Replication Metrics
+++++++++++++++++++
These metrics are only populated for MinIO clusters with
:ref:`minio-bucket-replication-serverside` enabled.
.. metric:: minio_bucket_replication_failed_bytes
Total number of bytes failed at least once to replicate.
.. metric:: minio_bucket_replication_pending_bytes
Total bytes pending to replicate.
.. metric:: minio_bucket_replication_received_bytes
Total number of bytes replicated to this bucket from another source bucket.
.. metric:: minio_bucket_replication_sent_bytes
Total number of bytes replicated to the target bucket.
.. metric:: minio_bucket_replication_pending_count
Total number of replication operations pending for this bucket.
.. metric:: minio_bucket_replication_failed_count
Total number of replication operations failed for this bucket.
Bucket Metrics
++++++++++++++
.. metric:: minio_bucket_usage_object_total
Total number of objects
.. metric:: minio_bucket_usage_total_bytes
Total bucket size in bytes
Cache Metrics
+++++++++++++
.. metric:: minio_cache_hits_total
Total number of disk cache hits
.. metric:: minio_cache_missed_total
Total number of disk cache misses
.. metric:: minio_cache_sent_bytes
Total number of bytes served from cache
.. metric:: minio_cache_total_bytes
Total size of cache disk in bytes
.. metric:: minio_cache_usage_info
Total percentage cache usage, value of 1 indicates high and 0 low, label
level is set as well
.. metric:: minio_cache_used_bytes
Current cache usage in bytes
Cluster Metrics
+++++++++++++++
.. metric:: minio_cluster_capacity_raw_free_bytes
Total free capacity online in the cluster.
.. metric:: minio_cluster_capacity_raw_total_bytes
Total capacity online in the cluster.
.. metric:: minio_cluster_capacity_usable_free_bytes
Total free usable capacity online in the cluster.
.. metric:: minio_cluster_capacity_usable_total_bytes
Total usable capacity online in the cluster.
Node Metrics
++++++++++++
.. metric:: minio_cluster_nodes_offline_total
Total number of MinIO nodes offline.
.. metric:: minio_cluster_nodes_online_total
Total number of MinIO nodes online.
.. metric:: minio_heal_objects_error_total
Objects for which healing failed in current self healing run
.. metric:: minio_heal_objects_heal_total
Objects healed in current self healing run
.. metric:: minio_heal_objects_total
Objects scanned in current self healing run
.. metric:: minio_heal_time_last_activity_nano_seconds
Time elapsed (in nano seconds) since last self healing activity. This is set
to -1 until initial self heal
.. metric:: minio_inter_node_traffic_received_bytes
Total number of bytes received from other peer nodes.
.. metric:: minio_inter_node_traffic_sent_bytes
Total number of bytes sent to the other peer nodes.
.. metric:: minio_node_disk_free_bytes
Total storage available on a disk.
.. metric:: minio_node_disk_total_bytes
Total storage on a disk.
.. metric:: minio_node_disk_used_bytes
Total storage used on a disk.
.. metric:: minio_node_file_descriptor_limit_total
Limit on total number of open file descriptors for the MinIO Server process.
.. metric:: minio_node_file_descriptor_open_total
Total number of open file descriptors by the MinIO Server process.
.. metric:: minio_node_io_rchar_bytes
Total bytes read by the process from the underlying storage system including
cache, ``/proc/[pid]/io rchar``
.. metric:: minio_node_io_read_bytes
Total bytes read by the process from the underlying storage system,
``/proc/[pid]/io read_bytes``
.. metric:: minio_node_io_wchar_bytes
Total bytes written by the process to the underlying storage system including
page cache, ``/proc/[pid]/io wchar``
.. metric:: minio_node_io_write_bytes
Total bytes written by the process to the underlying storage system,
``/proc/[pid]/io write_bytes``
.. metric:: minio_node_process_starttime_seconds
Start time for MinIO process per node, time in seconds since Unix epoch.
.. metric:: minio_node_process_uptime_seconds
Uptime for MinIO process per node in seconds.
.. metric:: minio_node_scanner_bucket_scans_finished
Total number of bucket scans finished since server start.
.. metric:: minio_node_scanner_bucket_scans_started
Total number of bucket scans started since server start.
.. metric:: minio_node_scanner_directories_scanned
Total number of directories scanned since server start.
.. metric:: minio_node_scanner_objects_scanned
Total number of unique objects scanned since server start.
.. metric:: minio_node_scanner_versions_scanned
Total number of object versions scanned since server start.
.. metric:: minio_node_syscall_read_total
Total read SysCalls to the kernel. ``/proc/[pid]/io syscr``
.. metric:: minio_node_syscall_write_total
Total write SysCalls to the kernel. ``/proc/[pid]/io syscw``
S3 Metrics
++++++++++
.. metric:: minio_s3_requests_error_total
Total number S3 requests with errors
.. metric:: minio_s3_requests_inflight_total
Total number of S3 requests currently in flight
.. metric:: minio_s3_requests_total
Total number S3 requests
.. metric:: minio_s3_time_ttbf_seconds_distribution
Distribution of the time to first byte across API calls.
.. metric:: minio_s3_traffic_received_bytes
Total number of s3 bytes received.
.. metric:: minio_s3_traffic_sent_bytes
Total number of s3 bytes sent
Software Metrics
++++++++++++++++
.. metric:: minio_software_commit_info
Git commit hash for the MinIO release.
.. metric:: minio_software_version_info
MinIO Release tag for the server
.. _minio-metrics-and-alerts-alerting:
Alerts
------
You can configure alerts using Prometheus :prometheus-docs:`Alerting Rules
<prometheus/latest/configuration/alerting_rules/>` based on the collected MinIO
metrics. The Prometheus :prometheus-docs:`Alert Manager
<alerting/latest/overview/>` supports managing alerts produced by the configured
alerting rules. Prometheus also supports a :prometheus-docs:`Webhook Receiver
<operating/integrations/#alertmanager-webhook-receiver>` for publishing alerts
to mechanisms not supported by Prometheus AlertManager.
Restart the MinIO deployment and visit the :ref:`Monitoring <minio-console-monitoring>` pane to see the historical data views.

View File

@ -0,0 +1,377 @@
.. _minio-metrics-and-alerts-endpoints:
.. _minio-metrics-and-alerts-alerting:
.. _minio-metrics-and-alerts:
==================
Metrics and Alerts
==================
.. default-domain:: minio
.. contents:: Table of Contents
:local:
:depth: 2
MinIO publishes cluster and node metrics using the :prometheus-docs:`Prometheus Data Model <data_model/>`.
You can use any scraping tool to pull metrics data from MinIO for further analysis and alerting.
MinIO provides a scraping endpoint for cluster-level metrics:
.. code-block:: shell
:class: copyable
http://minio.example.net:9000/minio/v2/metrics/cluster
Replace ``http://minio.example.net`` with the hostname of any node in the MinIO deployment.
For deployments with a load balancer managing connections between MinIO nodes, specify the address of the load balancer.
MinIO by default requires authentication for scraping the metrics endpoints.
Use the :mc-cmd:`mc admin prometheus generate` command to generate the necessary bearer tokens.
You can alternatively disable metrics endpoint authentication by setting :envvar:`MINIO_PROMETHEUS_AUTH_TYPE` to ``public``.
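For example, a hedged sketch of an authenticated scrape request, where ``TOKEN`` is a placeholder for the JWT returned by :mc-cmd:`mc admin prometheus generate`:
.. code-block:: shell
:class: copyable
# Manually scrape the cluster metrics endpoint using the generated bearer token
curl -H "Authorization: Bearer TOKEN" https://minio.example.net:9000/minio/v2/metrics/cluster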
.. _minio-console-metrics:
MinIO Console Metrics Dashboard
-------------------------------
The :ref:`MinIO Console <minio-console-monitoring>` provides a point-in-time metrics dashboard by default:
.. image:: /images/minio-console/console-metrics-simple.png
:width: 600px
:alt: MinIO Console with Point-In-Time Metrics
:align: center
The Console also supports displaying time-series and historical data by querying a :prometheus-docs:`Prometheus <prometheus/latest/getting_started/>` service configured to scrape data from the MinIO deployment.
Specifically, the MinIO Console uses :prometheus-docs:`Prometheus query API <prometheus/latest/querying/api/>` to retrieve stored metrics data and display the following visualizations:
- :guilabel:`Usage` - provides historical and on-demand visualization of overall usage and status
- :guilabel:`Traffic` - provides historical and on-demand visualization of network traffic
- :guilabel:`Resources` - provides historical and on-demand visualization of resources (compute and storage)
- :guilabel:`Info` - provides point-in-time status of the deployment
.. image:: /images/minio-console/console-metrics.png
:width: 600px
:alt: MinIO Console displaying Prometheus-backed Monitoring Data
:align: center
.. cond:: k8s
The MinIO Operator supports deploying a per-tenant Prometheus instance configured to support metrics and visualization.
If you deploy the Tenant with this feature disabled *but* still want the historical metric views, you can instead configure an external Prometheus service to scrape the Tenant metrics.
Once configured, you can update the Tenant to query that Prometheus service to retrieve metric data.
.. cond:: linux or container or macos or windows
To enable historical data visualization in MinIO Console, set the following environment variables on each node in the MinIO deployment:
- Set :envvar:`MINIO_PROMETHEUS_URL` to the URL of the Prometheus service
- Set :envvar:`MINIO_PROMETHEUS_JOB_ID` to the unique job ID assigned to the collected metrics
MinIO also publishes a `Grafana Dashboard <https://grafana.com/grafana/dashboards/13502>`_ for visualizing collected metrics.
For more complete documentation on configuring a Prometheus-compatible data source for Grafana, see :prometheus-docs:`Grafana Support for Prometheus <visualization/grafana/>`.
.. _minio-metrics-and-alerts-available-metrics:
Available Metrics
-----------------
MinIO publishes the following metrics, where each metric includes a label for
the MinIO server which generated that metric.
Object and Bucket Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~
.. metric:: minio_bucket_objects_size_distribution
Distribution of object sizes in a given bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
.. metric:: minio_bucket_usage_object_total
Total number of objects in a given bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
.. metric:: minio_bucket_usage_total_bytes
Total size in bytes of a given bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
Replication Metrics
~~~~~~~~~~~~~~~~~~~
These metrics are only populated for MinIO clusters with
:ref:`minio-bucket-replication-serverside` enabled.
.. metric:: minio_bucket_replication_failed_bytes
Total number of bytes that failed at least once to replicate for a given bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
.. metric:: minio_bucket_replication_pending_bytes
Total number of bytes pending to replicate for a given bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
.. metric:: minio_bucket_replication_received_bytes
Total number of bytes replicated to this bucket from another source bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
.. metric:: minio_bucket_replication_sent_bytes
Total number of bytes replicated to the target bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
.. metric:: minio_bucket_replication_pending_count
Total number of replication operations pending for a given bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
.. metric:: minio_bucket_replication_failed_count
Total number of replication operations failed for a given bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
Capacity Metrics
~~~~~~~~~~~~~~~~
.. metric:: minio_cluster_capacity_raw_free_bytes
Total free capacity online in the cluster.
.. metric:: minio_cluster_capacity_raw_total_bytes
Total capacity online in the cluster.
.. metric:: minio_cluster_capacity_usable_free_bytes
Total free usable capacity online in the cluster.
.. metric:: minio_cluster_capacity_usable_total_bytes
Total usable capacity online in the cluster.
.. metric:: minio_node_disk_free_bytes
Total storage available on a specific drive for a node in the MinIO deployment.
You can identify the drive and node using the ``{ disk="/path/to/disk",server="STRING"}`` labels respectively.
.. metric:: minio_node_disk_total_bytes
Total storage on a specific drive for a node in the MinIO deployment.
You can identify the drive and node using the ``{ disk="/path/to/disk",server="STRING"}`` labels respectively.
.. metric:: minio_node_disk_used_bytes
Total storage used on a specific drive for a node in a MinIO deployment.
You can identify the drive and node using the ``{ disk="/path/to/disk",server="STRING"}`` labels respectively.
Lifecycle Management Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. metric:: minio_cluster_ilm_transitioned_bytes
Total number of bytes transitioned using :ref:`tiering/transition lifecycle management rules <minio-lifecycle-management-tiering>`
.. metric:: minio_cluster_ilm_transitioned_objects
Total number of objects transitioned using :ref:`tiering/transition lifecycle management rules <minio-lifecycle-management-tiering>`
.. metric:: minio_cluster_ilm_transitioned_versions
Total number of non-current object versions transitioned using :ref:`tiering/transition lifecycle management rules <minio-lifecycle-management-tiering>`
.. metric:: minio_node_ilm_transition_pending_tasks
Total number of pending :ref:`object transition <minio-lifecycle-management-tiering>` tasks
.. metric:: minio_node_ilm_expiry_pending_tasks
Total number of pending :ref:`object expiration <minio-lifecycle-management-expiration>` tasks
.. metric:: minio_node_ilm_expiry_active_tasks
Total number of active :ref:`object expiration <minio-lifecycle-management-expiration>` tasks
Node and Disk Health Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. metric:: minio_cluster_disk_online_total
The total number of disks online
.. metric:: minio_cluster_disk_offline_total
The total number of disks offline
.. metric:: minio_cluster_disk_total
The total number of disks
.. metric:: minio_cluster_nodes_offline_total
Total number of MinIO nodes offline.
.. metric:: minio_cluster_nodes_online_total
Total number of MinIO nodes online.
.. metric:: minio_heal_objects_error_total
Objects for which healing failed in current self healing run
.. metric:: minio_heal_objects_heal_total
Objects healed in current self healing run
.. metric:: minio_heal_objects_total
Objects scanned in current self healing run
.. metric:: minio_heal_time_last_activity_nano_seconds
Time elapsed (in nano seconds) since last self healing activity. This is set
to -1 until initial self heal
Scanner Metrics
~~~~~~~~~~~~~~~
.. metric:: minio_node_scanner_bucket_scans_finished
Total number of bucket scans finished since server start.
.. metric:: minio_node_scanner_bucket_scans_started
Total number of bucket scans started since server start.
.. metric:: minio_node_scanner_directories_scanned
Total number of directories scanned since server start.
.. metric:: minio_node_scanner_objects_scanned
Total number of unique objects scanned since server start.
.. metric:: minio_node_scanner_versions_scanned
Total number of object versions scanned since server start.
.. metric:: minio_node_syscall_read_total
Total number of read SysCalls to the kernel. ``/proc/[pid]/io syscr``
.. metric:: minio_node_syscall_write_total
Total number of write SysCalls to the kernel. ``/proc/[pid]/io syscw``
S3 Metrics
~~~~~~~~~~
.. metric:: minio_bucket_traffic_sent_bytes
Total number of bytes of S3 traffic sent per bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
.. metric:: minio_bucket_traffic_received_bytes
Total number of bytes of S3 traffic received per bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
.. metric:: minio_s3_requests_inflight_total
Total number of S3 requests currently in flight.
.. metric:: minio_s3_requests_total
Total number of S3 requests.
.. metric:: minio_s3_time_ttfb_seconds_distribution
Distribution of the time to first byte across API calls.
.. metric:: minio_s3_traffic_received_bytes
Total number of S3 bytes received.
.. metric:: minio_s3_traffic_sent_bytes
Total number of S3 bytes sent.
.. metric:: minio_s3_requests_errors_total
Total number of S3 requests with 4xx and 5xx errors.
.. metric:: minio_s3_requests_4xx_errors_total
Total number of S3 requests with 4xx errors.
.. metric:: minio_s3_requests_5xx_errors_total
Total number of S3 requests with 5xx errors.
Internal Metrics
~~~~~~~~~~~~~~~~
.. metric:: minio_inter_node_traffic_received_bytes
Total number of bytes received from other peer nodes.
.. metric:: minio_inter_node_traffic_sent_bytes
Total number of bytes sent to the other peer nodes.
.. metric:: minio_node_file_descriptor_limit_total
Limit on total number of open file descriptors for the MinIO Server process.
.. metric:: minio_node_file_descriptor_open_total
Total number of open file descriptors by the MinIO Server process.
.. metric:: minio_node_io_rchar_bytes
Total bytes read by the process from the underlying storage system including
cache, ``/proc/[pid]/io rchar``
.. metric:: minio_node_io_read_bytes
Total bytes read by the process from the underlying storage system,
``/proc/[pid]/io read_bytes``
.. metric:: minio_node_io_wchar_bytes
Total bytes written by the process to the underlying storage system including
page cache, ``/proc/[pid]/io wchar``
.. metric:: minio_node_io_write_bytes
Total bytes written by the process to the underlying storage system,
``/proc/[pid]/io write_bytes``
Software and Process Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. metric:: minio_software_commit_info
Git commit hash for the MinIO release.
.. metric:: minio_software_version_info
MinIO Release tag for the server
.. metric:: minio_node_process_starttime_seconds
Start time for MinIO process per node, time in seconds since Unix epoch.
.. metric:: minio_node_process_uptime_seconds
Uptime for MinIO process per node in seconds.
.. toctree::
:titlesonly:
:hidden:
/operations/monitoring/collect-minio-metrics-using-prometheus
/operations/monitoring/monitor-and-alert-using-influxdb

View File

@ -0,0 +1,121 @@
.. _minio-metrics-influxdb:
======================================
Monitoring and Alerting using InfluxDB
======================================
.. default-domain:: minio
.. contents:: Table of Contents
:local:
:depth: 1
MinIO publishes cluster and node metrics using the :prometheus-docs:`Prometheus Data Model <data_model/>`.
`InfluxDB <https://www.influxdata.com/?ref=minio>`__ supports scraping MinIO metrics data for monitoring and alerting.
The procedure on this page documents the following:
- Configuring an InfluxDB service to scrape and display metrics from a MinIO deployment
- Configuring an Alert on a MinIO metric
.. admonition:: Prerequisites
:class: note
This procedure requires the following:
- An existing InfluxDB deployment configured with one or more :influxdb-docs:`notification endpoints <notification-endpoints/>`
- An existing MinIO deployment with network access to the InfluxDB deployment
- An :mc:`mc` installation on your local host configured to :ref:`access <alias>` the MinIO deployment
.. cond:: k8s
This procedure assumes that all necessary network control components, such as Ingress or Load Balancers, are configured to facilitate access between the MinIO Tenant and the InfluxDB service.
Configure InfluxDB to Collect and Alert using MinIO Metrics
-----------------------------------------------------------
.. important::
This procedure specifically uses the InfluxDB UI to create a scraping endpoint.
The InfluxDB UI does not provide the same level of configuration as using `Telegraf <https://docs.influxdata.com/telegraf/v1.24/>`__ and the corresponding `Prometheus plugin <https://github.com/influxdata/telegraf/blob/release-1.24/plugins/inputs/prometheus/README.md>`__.
Specifically:
- You cannot enable authenticated access to the MinIO metrics endpoint via the InfluxDB UI
- You cannot set a tag for collected metrics (e.g. ``url_tag``) for uniquely identifying the metrics for a given MinIO deployment
.. cond:: k8s
The Telegraf Prometheus plugin also supports Kubernetes-specific features, such as scraping the ``minio`` service for a given MinIO Tenant.
Configuring Telegraf is out of scope for this procedure.
You can use this procedure as general guidance for configuring Telegraf to scrape MinIO metrics.
.. container:: procedure
1. Configure Public Access to MinIO Metrics
Set the :envvar:`MINIO_PROMETHEUS_AUTH_TYPE` environment variable to ``"public"`` for all nodes in the MinIO deployment, then restart the deployment to allow public access to MinIO metrics, as in the sketch below.
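A minimal sketch, assuming ``ALIAS`` is the configured :mc:`mc` alias for the deployment and that the variable is set in each node's environment (for example, the MinIO environment file) before the restart:
.. code-block:: shell
:class: copyable
# Set on every MinIO host, then restart the deployment
export MINIO_PROMETHEUS_AUTH_TYPE="public"
mc admin service restart ALIAS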
You can validate the change by attempting to ``curl`` the metrics endpoint:
.. code-block:: shell
:class: copyable
curl https://HOSTNAME/minio/v2/metrics/cluster
Replace ``HOSTNAME`` with the URL of the load balancer or reverse proxy through which you access the MinIO deployment.
You can alternatively specify any single node as ``HOSTNAME:PORT``, specifying the MinIO server API port in addition to the node hostname.
The response body should include a list of collected MinIO metrics.
#. Log into the InfluxDB UI and Create a Bucket
Select the :influxdb-docs:`Organization <organizations/view-orgs/>` under which you want to store MinIO metrics.
Create a :influxdb-docs:`New Bucket <organizations/buckets/create-bucket/>` in which to store metrics for the MinIO deployment.
#. Create a new Scraping Source
Create a :influxdb-docs:`new InfluxDB Scraper <write-data/no-code/scrape-data/manage-scrapers/create-a-scraper/>`.
Specify the full URL to the MinIO deployment, including the metrics endpoint:
.. code-block:: shell
:class: copyable
https://HOSTNAME/minio/v2/metrics/cluster
Replace ``HOSTNAME`` with the URL of the load balancer or reverse proxy through which you access the MinIO deployment.
You can alternatively specify any single node as ``HOSTNAME:PORT``, specifying the MinIO server API port in addition to the node hostname.
#. Validate the Data
Use the :influxdb-docs:`Data Explorer <query-data/execute-queries/data-explorer/>` to visualize the collected MinIO data.
For example, you can set a filter on :metric:`minio_cluster_capacity_usable_total_bytes` and :metric:`minio_cluster_capacity_usable_free_bytes` to compare the total usable against total free space on the MinIO deployment.
#. Configure a Check
Create a :influxdb-docs:`new Check <monitor-alert/checks/create/>` on a MinIO metric.
The following example check rules provide a baseline of alerts for a MinIO deployment.
You can modify or otherwise use these examples for guidance in building your own checks.
- Create a :guilabel:`Threshold Check` named ``MINIO_NODE_DOWN``.
Set the filter for the :metric:`minio_cluster_nodes_offline_total` key.
Set the :guilabel:`Thresholds` to :guilabel:`WARN` when the value is greater than :guilabel:`1`.
- Create a :guilabel:`Threshold Check` named ``MINIO_QUORUM_WARNING``.
Set the filter for the :metric:`minio_cluster_disk_offline_total` key.
Set the :guilabel:`Thresholds` to :guilabel:`CRITICAL` when the value is one less than your configured :ref:`Erasure Code Parity <minio-erasure-coding>` setting.
For example, a deployment using EC:4 should set this value to ``3``.
Configure your :influxdb-docs:`Notification endpoints <monitor-alert/notification-endpoints/>` and :influxdb-docs:`Notification rules <monitor-alert/notification-rules/>` such that checks of each type trigger an appropriate response.