mirror of
https://github.com/minio/docs.git
synced 2025-07-28 19:42:10 +03:00
Grafana and metric updates (#953)
- Adds a new page for Grafana to overview.
- Replaces the list of metrics in the Metrics and Alerts page with an include to pull the list of metrics maintained in GitHub.
- Removes use of the :metric: role throughout the docs.
- Adds note about the introduction of a new bucket metric endpoint.

Partially addresses #930
Partially addresses #931
Partially addresses #898
Closes #864

Staged:
- http://192.241.195.202:9000/staging/grafana/operations/monitoring/grafana.html
1
.gitignore
vendored
@@ -19,4 +19,5 @@ source/developers/haskell/*.md
 source/developers/java/*.md
 source/developers/javascript/*.md
 source/developers/python/*.md
+source/operations/monitoring/*.md
 *.inv
@@ -864,7 +864,7 @@ services:

 .. policy-action:: admin:Prometheus

-   Allows access to MinIO :ref:`metrics <minio-metrics-and-alerts-endpoints>`.
+   Allows access to MinIO :ref:`metrics <minio-metrics-and-alerts>`.
    Only required if MinIO requires authentication for scraping metrics.

 .. policy-action:: admin:ListBatchJobs
@@ -37,7 +37,7 @@ Deployment Metrics

 MinIO provides a Prometheus-compatible endpoint for supporting time-series querying of metrics.

-MinIO deployments :ref:`configured to enable Prometheus scraping <minio-metrics-and-alerts-endpoints>` provide a detailed metrics view through the MinIO Console.
+MinIO deployments :ref:`configured to enable Prometheus scraping <minio-metrics-and-alerts>` provide a detailed metrics view through the MinIO Console.

 Server Logs
 -----------
@@ -311,9 +311,3 @@ a notification.
    :class: copyable

    mc cp ~/data/new-object.txt ALIAS/BUCKET

-Webhook Metrics
----------------

-MinIO publishes several :ref:`metrics <minio-metrics-and-alerts>` for monitoring webhook endpoints.
-See :ref:`minio-metrics-and-alerts-webhook` for a list of available metrics.
@@ -125,9 +125,7 @@ As the cluster or workload increases, scanner performance decreases as it yields

 Consider regularly checking cluster metrics, capacity, and resource usage to ensure the cluster hardware is scaling alongside cluster and workload growth:

-- :ref:`minio-metrics-and-alerts-capacity`
-- :ref:`minio-metrics-and-alerts-lifecycle-management`
-- :ref:`minio-metrics-and-alerts-scanner`
+- :ref:`minio-metrics-and-alerts`

 .. toctree::
    :hidden:
@@ -535,5 +535,11 @@ for display. This is intentional (For now).

 These are nested and linked.

+Images
+------

+.. image:: /images/minio-console/minio-console.png
+   :width: 600px
+   :alt: MinIO Console Landing Page provides a view of the Object Browser for the authenticated user
+   :align: center
BIN
source/images/grafana-bucket.png
Normal file
Binary file not shown. Size: 474 KiB
BIN
source/images/grafana-minio.png
Normal file
Binary file not shown. Size: 469 KiB
@@ -38,7 +38,10 @@ MinIO Pre-requisites
      - Load balancer to handle routing of requests (for example, `NGINX <https://www.nginx.com/>`__)

    * - :octicon:`circle`
-     - :ref:`Prometheus / Grafana <minio-metrics-collect-using-prometheus>` setup for monitoring and metrics
+     - :ref:`Prometheus <minio-metrics-collect-using-prometheus>` setup for monitoring and metrics

+   * - :octicon:`circle`
+     - :ref:`Grafana configured <minio-grafana>` for dashboards

    * - :octicon:`circle`
      - (optional) :mc:`mc` installed on the local host system
@@ -71,3 +71,5 @@ See :ref:`minio-healthcheck-api` for more information.
    /operations/monitoring/metrics-and-alerts
    /operations/monitoring/minio-logging
    /operations/monitoring/healthcheck-probe
+   /operations/monitoring/grafana
@@ -15,7 +15,7 @@ Monitoring and Alerting using Prometheus
 - `Monitoring with MinIO and Prometheus: Overview <https://youtu.be/A3vCDaFWNNs?ref=docs>`__
 - `Monitoring with MinIO and Prometheus: Lab <https://youtu.be/Oix9iXndSUY?ref=docs>`__

-MinIO publishes cluster and node metrics using the :prometheus-docs:`Prometheus Data Model <concepts/data_model/#data-model>`.
+MinIO publishes cluster, node, and bucket metrics using the :prometheus-docs:`Prometheus Data Model <concepts/data_model/#data-model>`.
 The procedure on this page documents the following:

 - Configuring a Prometheus service to scrape and display metrics from a MinIO deployment
@@ -40,12 +40,40 @@ Configure Prometheus to Collect and Alert using MinIO Metrics

 Use the :mc-cmd:`mc admin prometheus generate` command to generate the scrape configuration for use by Prometheus in making scraping requests:

-.. code-block:: shell
-   :class: copyable
+.. tab-set::

-   mc admin prometheus generate ALIAS
+   .. tab-item:: MinIO Server

-Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the :mc:`alias <mc alias>` of the MinIO deployment.
+      The following command scrapes metrics for the MinIO cluster.

+      .. code-block:: shell
+         :class: copyable

+         mc admin prometheus generate ALIAS

+      Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the :mc:`alias <mc alias>` of the MinIO deployment.

+   .. tab-item:: Nodes

+      The following command scrapes metrics for nodes on the MinIO Server.

+      .. code-block:: shell
+         :class: copyable

+         mc admin prometheus generate ALIAS node

+      Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the :mc:`alias <mc alias>` of the MinIO deployment.

+   .. tab-item:: Buckets

+      The following command scrapes metrics for buckets on the MinIO Server.

+      .. code-block:: shell
+         :class: copyable

+         mc admin prometheus generate ALIAS bucket

+      Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the :mc:`alias <mc alias>` of the MinIO deployment.

 The command returns output similar to the following:

@@ -81,21 +109,44 @@ The command returns output similar to the following:
 2) Restart Prometheus with the Updated Configuration
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-Append the ``scrape_configs`` job generated in the previous step to the configuration file:
+Append the desired ``scrape_configs`` job generated in the previous step to the configuration file:

-.. code-block:: yaml
-   :class: copyable
+.. tab-set::

-   global:
-     scrape_interval: 15s
+   .. tab-item:: Cluster metrics

+      For server metrics:

+      .. code-block:: yaml
+         :class: copyable

+         global:
+           scrape_interval: 15s

+         scrape_configs:
+           - job_name: minio-job
+             bearer_token: TOKEN
+             metrics_path: /minio/v2/metrics/cluster
+             scheme: https
+             static_configs:
+               - targets: [minio.example.net]

+   .. tab-item:: Bucket metrics

+      .. code-block:: yaml
+         :class: copyable

+         global:
+           scrape_interval: 15s

+         scrape_configs:
+           - job_name: minio-job-bucket
+             bearer_token: TOKEN
+             metrics_path: /minio/v2/metrics/bucket
+             scheme: https
+             static_configs:
+               - targets: [minio.example.net]

-   scrape_configs:
-     - job_name: minio-job
-       bearer_token: TOKEN
-       metrics_path: /minio/v2/metrics/cluster
-       scheme: https
-       static_configs:
-         - targets: [minio.example.net]
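
A single Prometheus configuration file can hold several MinIO jobs: ``global`` appears once, and each generated job is appended under the same ``scrape_configs`` key. The sketch below combines the cluster and bucket jobs from this step with a node job. The ``minio-job-node`` name and the ``/minio/v2/metrics/node`` path are assumptions to confirm against the output of ``mc admin prometheus generate ALIAS node``; ``TOKEN`` and ``minio.example.net`` remain placeholders.

```yaml
# Sketch only: one prometheus.yml carrying several MinIO scrape jobs.
# TOKEN and minio.example.net are placeholders. The node job name and
# path are assumptions; compare them with the generated output.
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: minio-job
    bearer_token: TOKEN
    metrics_path: /minio/v2/metrics/cluster
    scheme: https
    static_configs:
      - targets: [minio.example.net]

  - job_name: minio-job-node
    bearer_token: TOKEN
    metrics_path: /minio/v2/metrics/node
    scheme: https
    static_configs:
      - targets: [minio.example.net]

  - job_name: minio-job-bucket
    bearer_token: TOKEN
    metrics_path: /minio/v2/metrics/bucket
    scheme: https
    static_configs:
      - targets: [minio.example.net]
```

Keeping each endpoint in its own job lets queries and alerts select one endpoint through the ``job`` label.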

 Start the Prometheus cluster using the configuration file:

@@ -122,9 +173,9 @@ The following query examples return metrics collected by Prometheus:

    minio_cluster_capacity_usable_free_bytes{job="minio-job"}[5m]

-See :ref:`minio-metrics-and-alerts-available-metrics` for a complete list of published metrics.
+See :ref:`minio-metrics-and-alerts` for information about metrics.

-4) Configure an Alert Rule using MinIO Metrics
+1) Configure an Alert Rule using MinIO Metrics
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

 You must configure :prometheus-docs:`Alert Rules <prometheus/latest/configuration/alerting_rules/>` on the Prometheus deployment to trigger alerts based on collected MinIO metrics.
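
As an illustration of what such a rule can look like, the following sketch of a Prometheus rule file alerts when free usable capacity drops below a fraction of total usable capacity. The file name, group name, alert name, 10% threshold, and durations are all assumptions chosen for the example; only the two metric names and the ``minio-job`` job label come from this page.

```yaml
# Sketch of a rule file, for example minio-alerting.yml (hypothetical name).
# Prometheus loads it when prometheus.yml lists the file under rule_files.
groups:
  - name: minio-capacity
    rules:
      - alert: MinioLowUsableCapacity
        # Ratio of free to total usable capacity; 0.10 is an example threshold.
        expr: |
          minio_cluster_capacity_usable_free_bytes{job="minio-job"}
            / minio_cluster_capacity_usable_total_bytes{job="minio-job"} < 0.10
        # Require the condition to hold for 10 minutes before firing.
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "MinIO usable capacity below 10%"
```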
@@ -184,3 +235,9 @@ To enable historical data visualization in MinIO Console, set the following envi
 - Set :envvar:`MINIO_PROMETHEUS_JOB_ID` to the unique job ID assigned to the collected metrics

 Restart the MinIO deployment and visit the :ref:`Monitoring <minio-console-monitoring>` pane to see the historical data views.
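
How these variables are set depends on how MinIO is launched. As one hedged illustration, a Docker Compose service could declare them as shown below. The service name, image reference, command, and Prometheus address are hypothetical, and the job ID is assumed to match the ``job_name`` used in the Prometheus scrape configuration.

```yaml
# Hypothetical Docker Compose fragment; adapt names and addresses.
services:
  minio:
    image: quay.io/minio/minio
    command: server /data --console-address ":9001"
    environment:
      # URL of the Prometheus service that scrapes this deployment
      MINIO_PROMETHEUS_URL: "http://prometheus.example.net:9090"
      # Assumed to match the job_name in the scrape configuration
      MINIO_PROMETHEUS_JOB_ID: "minio-job"
```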

+Dashboards
+----------

+MinIO provides Grafana Dashboards to display metrics collected by Prometheus.
+For more information, see :ref:`minio-grafana`.
60
source/operations/monitoring/grafana.rst
Normal file
@@ -0,0 +1,60 @@
+.. _minio-grafana:

+===================================
+Monitor a MinIO Server with Grafana
+===================================

+.. default-domain:: minio

+.. contents:: Table of Contents
+   :local:
+   :depth: 2

+`Grafana <https://grafana.com/>`__ allows you to query, visualize, alert on, and understand your metrics no matter where they are stored.
+Create, explore, and share dashboards with your team and foster a data-driven culture.

+Prerequisites
+-------------

+- An existing :prometheus-docs:`Prometheus deployment <prometheus/latest/installation/>` with backing :prometheus-docs:`Alert Manager <alerting/latest/overview/>`
+- An existing MinIO deployment with network access to the Prometheus deployment
+- `Grafana installed <https://grafana.com/grafana/download>`__
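
Grafana also needs a Prometheus data source before any dashboard can display MinIO metrics. The data source can be added through the Grafana UI; a minimal file-based provisioning sketch is shown here as an alternative. The file location, data source name, and Prometheus URL are assumptions for illustration.

```yaml
# Hypothetical provisioning file, for example
# /etc/grafana/provisioning/datasources/prometheus.yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    # Grafana's backend proxies queries to Prometheus
    access: proxy
    url: http://prometheus.example.net:9090
    isDefault: true
```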

+MinIO Grafana Dashboard
+-----------------------

+MinIO provides two official Grafana Dashboards you can download from the Grafana Dashboard portal.

+1. :ref:`MinIO Server metrics <minio-server-grafana-metrics>`
+2. :ref:`MinIO Bucket metrics <minio-buckets-grafana-metrics>`

+To track changes to the Grafana dashboards, inspect the JSON files for the `server <https://github.com/minio/minio/blob/master/docs/metrics/prometheus/grafana/minio-dashboard.json>`__ or `bucket <https://github.com/minio/minio/blob/master/docs/metrics/prometheus/grafana/minio-bucket.json>`__ dashboards in the MinIO Server GitHub repository.
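
If the JSON files are kept on disk rather than imported through the portal, Grafana's file-based dashboard provisioning can load them at startup. The sketch below assumes a hypothetical provider name and directory, and that the two JSON files were copied into that directory beforehand.

```yaml
# Hypothetical provisioning file, for example
# /etc/grafana/provisioning/dashboards/minio.yaml
apiVersion: 1
providers:
  - name: minio
    type: file
    options:
      # Directory assumed to contain minio-dashboard.json and minio-bucket.json
      path: /var/lib/grafana/dashboards
```

Dashboards exported for sharing may reference a data source placeholder, so confirm that the panels resolve to the intended Prometheus data source after loading.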

+.. _minio-server-grafana-metrics:

+MinIO Server Metrics Dashboard
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+Visualize MinIO metrics with the official MinIO Grafana dashboard for the MinIO Server available on the `Grafana dashboard portal <https://grafana.com/grafana/dashboards/13502-minio-dashboard/>`__.

+MinIO provides a Grafana Dashboard for MinIO Server metrics.
+For specifics on the dashboard's configuration, see the `JSON file on GitHub <https://raw.githubusercontent.com/minio/minio/master/docs/metrics/prometheus/grafana/minio-dashboard.json>`__.

+.. image:: /images/grafana-minio.png
+   :width: 600px
+   :alt: A sample of the MinIO Grafana dashboard showing many different captured metrics on a MinIO Server.
+   :align: center

+.. _minio-buckets-grafana-metrics:

+MinIO Bucket Metrics Dashboard
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

+Visualize MinIO bucket metrics with the official MinIO Grafana dashboard for buckets available on the `Grafana dashboard portal <https://grafana.com/grafana/dashboards/19237-minio-bucket-dashboard//>`__.

+Bucket metrics can be viewed in the Grafana dashboard using the `bucket JSON file on GitHub <https://raw.githubusercontent.com/minio/minio/master/docs/metrics/prometheus/grafana/minio-bucket.json>`__.

+.. image:: /images/grafana-bucket.png
+   :width: 600px
+   :alt: A sample of the MinIO Grafana dashboard showing many different captured metrics for MinIO buckets.
+   :align: center
@@ -35,8 +35,8 @@ the server, such as a transient network issue or potential downtime.

 The healthcheck probe alone cannot determine if a MinIO server is offline - only
 that the current host machine cannot reach the server. Consider configuring
-a Prometheus :ref:`alert <minio-metrics-and-alerts-alerting>` using the
-:metric:`minio_cluster_nodes_offline_total` metric to detect whether one or
+a Prometheus :ref:`alert <minio-metrics-and-alerts>` using the
+``minio_cluster_nodes_offline_total`` metric to detect whether one or
 more MinIO nodes are offline.
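
A rule of that kind might look like the following sketch. The group name, alert name, five-minute averaging window, and ``for`` duration are illustrative assumptions, and ``minio-job`` is assumed to be the job that scrapes the cluster endpoint.

```yaml
# Sketch of an alert on offline nodes; names and durations are examples.
groups:
  - name: minio-nodes
    rules:
      - alert: MinioNodesOffline
        # Averaging over 5 minutes smooths out a single missed scrape.
        expr: avg_over_time(minio_cluster_nodes_offline_total{job="minio-job"}[5m]) > 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "One or more MinIO nodes are offline"
```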

 Cluster Write Quorum
@@ -63,13 +63,13 @@ The healthcheck probe alone cannot determine if a MinIO server is offline or
 processing write operations normally - only whether enough MinIO servers are
 online to meet write quorum requirements based on the configured
 :ref:`erasure code parity <minio-ec-parity>`. Consider configuring a Prometheus
-:ref:`alert <minio-metrics-and-alerts-alerting>` using one of the following
+:ref:`alert <minio-metrics-and-alerts>` using one of the following
 metrics to detect potential issues or errors on the MinIO cluster:

-- :metric:`minio_cluster_nodes_offline_total` to alert if one or more
+- ``minio_cluster_nodes_offline_total`` to alert if one or more
   MinIO nodes are offline.

-- :metric:`minio_node_disk_free_bytes` to alert if the cluster is running
+- ``minio_node_disk_free_bytes`` to alert if the cluster is running
   low on free drive space.
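
For the free-space case, one possible rule compares free bytes to total bytes per drive. This sketch assumes ``minio_node_disk_total_bytes`` is collected alongside ``minio_node_disk_free_bytes`` with the same labels; the alert name, threshold, and durations are examples, and no ``job`` selector is used so the rule applies to whichever job collects these metrics.

```yaml
# Sketch of a per-drive free space alert; threshold and names are examples.
groups:
  - name: minio-drives
    rules:
      - alert: MinioDriveLowFreeSpace
        # Fires for each drive with less than 10% of its space free.
        expr: minio_node_disk_free_bytes / minio_node_disk_total_bytes < 0.10
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "A MinIO drive has less than 10% free space"
```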

 Cluster Read Quorum
@@ -96,8 +96,8 @@ The healthcheck probe alone cannot determine if a MinIO server is offline or
 processing read operations normally - only whether enough MinIO servers are
 online to meet read quorum requirements based on the configured
 :ref:`erasure code parity <minio-ec-parity>`. Consider configuring a Prometheus
-:ref:`alert <minio-metrics-and-alerts-alerting>` using the
-:metric:`minio_cluster_nodes_offline_total` metric to detect whether one or more
+:ref:`alert <minio-metrics-and-alerts>` using the
+``minio_cluster_nodes_offline_total`` metric to detect whether one or more
 MinIO nodes are offline.

 Cluster Maintenance Check
@@ -125,6 +125,5 @@ The healthcheck probe alone cannot determine if a MinIO server is offline - only
 whether enough MinIO servers will be online after taking the node down for
 maintenance to meet read and write quorum requirements based on the configured
 :ref:`erasure code parity <minio-ec-parity>`. Consider configuring a Prometheus
-:ref:`alert <minio-metrics-and-alerts-alerting>` using the
-:metric:`minio_cluster_nodes_offline_total` metric to detect whether one or more
+:ref:`alert <minio-metrics-and-alerts>` using the ``minio_cluster_nodes_offline_total`` metric to detect whether one or more
 MinIO nodes are offline.
@@ -68,571 +68,553 @@ Specifically, the MinIO Console uses :prometheus-docs:`Prometheus query API <pro
 - Set :envvar:`MINIO_PROMETHEUS_URL` to the URL of the Prometheus service
 - Set :envvar:`MINIO_PROMETHEUS_JOB_ID` to the unique job ID assigned to the collected metrics

-MinIO also publishes a `Grafana Dashboard <https://grafana.com/grafana/dashboards/13502>`_ for visualizing collected metrics.
-For more complete documentation on configuring a Prometheus-compatible data source for Grafana, see :prometheus-docs:`Grafana Support for Prometheus <visualization/grafana/>`.
+MinIO Grafana Dashboard
+-----------------------

+MinIO also publishes two :ref:`Grafana Dashboards <minio-grafana>` for visualizing collected metrics.
+For more complete documentation on configuring a Prometheus-compatible data source for Grafana, see the :prometheus-docs:`Prometheus documentation on Grafana Support <visualization/grafana/>`.

 .. _minio-metrics-and-alerts-available-metrics:

 Available Metrics
 -----------------

-MinIO publishes the following metrics, where each metric includes a label for
-the MinIO server which generated that metric.
+MinIO publishes a number of metrics at the cluster, node, or bucket levels.
+Each metric includes a label for the MinIO server which generated that metric.

-Object and Bucket Metrics
-~~~~~~~~~~~~~~~~~~~~~~~~~
+.. versionchanged:: MinIO RELEASE.2023-07-21T21-12-44Z

-.. metric:: minio_bucket_objects_size_distribution
+   Bucket metrics have moved to use their own, separate endpoint.

-   Distribution of object sizes in a given bucket.
-   You can identify the bucket using the ``{ bucket="STRING" }`` label.
+- :ref:`Cluster Metrics <minio_available_cluster_metrics>`
+- :ref:`Node Metrics <minio_available_node_metrics>`
+- :ref:`Bucket Metrics <minio_available_bucket_metrics>`

-.. metric:: minio_bucket_objects_version_distribution
+.. _minio_available_cluster_metrics:

-   Distribution of number of versions per object in a given bucket.
-   You can identify the bucket using the ``{ bucket="STRING" }`` label.

-.. metric:: minio_bucket_usage_object_total

-   Total number of objects in a given bucket.
-   You can identify the bucket using the ``{ bucket="STRING" }`` label.

-.. metric:: minio_bucket_usage_total_bytes

-   Total bucket size in bytes in a given bucket.
-   You can identify the bucket using the ``{ bucket="STRING" }`` label.

-.. metric:: minio_bucket_quota_total_bytes

-   Total bucket quota size in bytes.
-   You can identify the bucket using the ``{ bucket="STRING" }`` label.

-.. metric:: minio_bucket_usage_version_total

-   Total number of object versions contained in a bucket.
-   You can identify the bucket using the ``{ bucket="STRING" }`` label.


-Replication Metrics
-~~~~~~~~~~~~~~~~~~~

-These metrics are only populated for MinIO clusters with
-:ref:`minio-bucket-replication-serverside` enabled.

-.. metric:: minio_bucket_replication_failed_bytes

-   Total number of bytes that failed at least once to replicate for a given bucket.
-   You can identify the bucket using the ``{ bucket="STRING" }`` label

-.. metric:: minio_bucket_replication_latency

-   Replication latency in milliseconds.

-.. metric:: minio_bucket_replication_pending_bytes

-   Total number of bytes pending to replicate for a given bucket.
-   You can identify the bucket using the ``{ bucket="STRING" }`` label

-.. metric:: minio_bucket_replication_received_bytes

-   Total number of bytes replicated to this bucket from another source bucket.
-   You can identify the bucket using the ``{ bucket="STRING" }`` label.

-.. metric:: minio_bucket_replication_sent_bytes

-   Total number of bytes replicated to the target bucket.
-   You can identify the bucket using the ``{ bucket="STRING" }`` label.

-.. metric:: minio_bucket_replication_pending_count

-   Total number of replication operations pending for a given bucket.
-   You can identify the bucket using the ``{ bucket="STRING" }`` label.

-.. metric:: minio_bucket_replication_failed_count

-   Total number of replication operations failed for a given bucket.
-   You can identify the bucket using the ``{ bucket="STRING" }`` label.

-.. _minio-metrics-and-alerts-capacity:

-Capacity Metrics
-~~~~~~~~~~~~~~~~

-.. metric:: minio_cluster_capacity_raw_free_bytes

-   Total free capacity online in the cluster.

-.. metric:: minio_cluster_capacity_raw_total_bytes

-   Total capacity online in the cluster.

-.. metric:: minio_cluster_capacity_usable_free_bytes

-   Total free usable capacity online in the cluster.

-.. metric:: minio_cluster_capacity_usable_total_bytes

-   Total usable capacity online in the cluster.

-.. metric:: minio_node_disk_free_bytes

-   Total storage available on a specific drive for a node in the MinIO deployment.
-   You can identify the drive and node using the ``{ disk="/path/to/disk",server="STRING"}`` labels respectively.

-.. metric:: minio_node_disk_total_bytes

-   Total storage on a specific drive for a node in the MinIO deployment.
-   You can identify the drive and node using the ``{ disk="/path/to/disk",server="STRING"}`` labels respectively.

-.. metric:: minio_node_disk_used_bytes

-   Total storage used on a specific drive for a node in a MinIO deployment.
-   You can identify the drive and node using the ``{ disk="/path/to/disk",server="STRING"}`` labels respectively.

-.. _minio-metrics-and-alerts-lifecycle-management:

-Lifecycle Management Metrics
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-.. metric:: minio_cluster_ilm_transitioned_bytes

-   Total number of bytes transitioned using :ref:`tiering/transition lifecycle management rules <minio-lifecycle-management-tiering>`

-.. metric:: minio_cluster_ilm_transitioned_objects

-   Total number of objects transitioned using :ref:`tiering/transition lifecycle management rules <minio-lifecycle-management-tiering>`

-.. metric:: minio_cluster_ilm_transitioned_versions

-   Total number of non-current object versions transitioned using :ref:`tiering/transition lifecycle management rules <minio-lifecycle-management-tiering>`

-.. metric:: minio_node_ilm_transition_pending_tasks

-   Total number of pending :ref:`object transition <minio-lifecycle-management-tiering>` tasks

-.. metric:: minio_node_ilm_transition_active_tasks

-   Number of active ILM transition tasks

-.. metric:: minio_node_ilm_expiry_pending_tasks

-   Total number of pending :ref:`object expiration <minio-lifecycle-management-expiration>` tasks

-.. metric:: minio_node_ilm_expiry_active_tasks

-   Total number of active :ref:`object expiration <minio-lifecycle-management-expiration>` tasks

-.. metric:: minio_node_ilm_versions_scanned

-   Total number of object versions checked for ilm actions since server start

-Node and Drive Health Metrics
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

-.. metric:: minio_cluster_disk_online_total

-   The total number of drives online

-.. metric:: minio_cluster_disk_offline_total

-   The total number of drives offline

-.. metric:: minio_cluster_disk_total

-   The total number of drives

-.. metric:: minio_cluster_nodes_offline_total

-   Total number of MinIO nodes offline.

-.. metric:: minio_cluster_nodes_online_total

-   Total number of MinIO nodes online.

-.. metric:: minio_node_disk_free_inodes

-   Total free inodes.

-.. metric:: minio_node_disk_latency_us

-   Average last minute latency in µs for drive API storage operations.

-.. metric:: minio_node_disk_offline_total

-   Total drives offline.

-.. metric:: minio_node_disk_online_total

-   Total drives online.

-.. metric:: minio_node_disk_total

-   Total drives.

-.. metric:: minio_heal_objects_errors_total

-   Objects for which healing failed in current self healing run

-.. metric:: minio_heal_objects_heal_total

-   Objects healed in current self healing run

-.. metric:: minio_heal_objects_total

-   Objects scanned in current self healing run

-.. metric:: minio_heal_time_last_activity_nano_seconds

-   Time elapsed (in nano seconds) since last self healing activity. This is set
-   to -1 until initial self heal

-.. metric:: minio_node_storage_class_standard_parity

-   The configured value of :envvar:`MINIO_STORAGE_CLASS_STANDARD`.

-   Use this to alert for changes to the Standard :ref:`erasure parity <minio-erasure-coding>`.

-.. metric:: minio_node_storage_class_rrs_parity

-   The configured value of :envvar:`MINIO_STORAGE_CLASS_RRS`.

-   Use this to alert for changes to the Reduced :ref:`erasure parity <minio-erasure-coding>`.

-Notification Queue Metrics
-~~~~~~~~~~~~~~~~~~~~~~~~~~

-.. metric:: minio_audit_target_queue_length

-   Total number of unsent audit messages in the queue.

-.. metric:: minio_audit_total_messages

-   Total number of audit messages sent since last server start.

-.. metric:: minio_audit_failed_messages

-   Total number of audit messages which failed to send since last server start.

-.. metric:: minio_notify_current_send_in_progress

-   Total number of notification messages in progress to configured targets.

-.. metric:: minio_notify_target_queue_length

-   Total number of unsent notification messages in the queue.

-.. _minio-metrics-and-alerts-scanner:

-Scanner Metrics
+Cluster Metrics
 ~~~~~~~~~~~~~~~

-.. metric:: minio_node_scanner_bucket_scans_finished
+Each metric includes the following labels:

-   Total number of bucket scans finished since server start.
+- Server that generated the metric.
+- Server that calculated the metric.

-.. metric:: minio_node_scanner_bucket_scans_started
+These metrics can be obtained from any MinIO server once per collection.

-   Total number of bucket scans started since server start.
+Audit Metrics
++++++++++++++

-.. metric:: minio_node_scanner_directories_scanned
+``minio_audit_failed_messages``
+  Total number of messages that failed to send since start.

-   Total number of directories scanned since server start.
+``minio_audit_target_queue_length``
+  Number of unsent messages in queue for target.

-.. metric:: minio_node_scanner_objects_scanned
+``minio_audit_total_messages``
+  Total number of messages sent since start.

-   Total number of unique objects scanned since server start.
+Cluster Capacity Metrics
+++++++++++++++++++++++++

-.. metric:: minio_node_scanner_versions_scanned
+``minio_cluster_capacity_raw_free_bytes``
+  Total free capacity online in the cluster.

-   Total number of object versions scanned since server start.
+``minio_cluster_capacity_raw_total_bytes``
+  Total capacity online in the cluster.

-.. metric:: minio_node_syscall_read_total
+``minio_cluster_capacity_usable_free_bytes``
+  Total free usable capacity online in the cluster.

-   Total number of read SysCalls to the kernel. ``/proc/[pid]/io syscr``
+``minio_cluster_capacity_usable_total_bytes``
+  Total usable capacity online in the cluster.

-.. metric:: minio_node_syscall_write_total
+Cluster Usage Metrics
++++++++++++++++++++++

-   Total number of write SysCalls to the kernel. ``/proc/[pid]/io syscw``
+``minio_cluster_objects_size_distribution``
+  Distribution of object sizes across a cluster.

-.. metric:: minio_usage_last_activity_nano_seconds
+``minio_cluster_objects_version_distribution``
+  Distribution of object versions across a cluster.

-   Time elapsed since last scan activity.
-   This is set to ``0`` until first scan cycle.
+``minio_cluster_usage_object_total``
+  Total number of objects in a cluster.

-S3 Metrics
-~~~~~~~~~~
+``minio_cluster_usage_total_bytes``
+  Total cluster usage in bytes.

-.. metric:: minio_bucket_traffic_sent_bytes
+``minio_cluster_usage_version_total``
+  Total number of versions (includes delete markers) in a cluster.

-   Total number of bytes of S3 traffic sent per bucket.
-   You can identify the bucket using the ``{ bucket="STRING" }`` label.
+``minio_cluster_usage_deletemarker_total``
+  Total number of delete markers in a cluster.

-.. metric:: minio_bucket_traffic_received_bytes
+``minio_cluster_usage_total_bytes``
+  Total cluster usage in bytes.

-   Total number of bytes of S3 traffic received per bucket.
-   You can identify the bucket using the ``{ bucket="STRING" }`` label.
+``minio_cluster_buckets_total``
+  Total number of buckets in the cluster.

-.. metric:: minio_s3_requests_incoming_total
Drive Metrics
|
||||
+++++++++++++
|
||||
|
||||
Volatile number of total incoming S3 requests.
|
||||
``minio_cluster_disk_offline_total``
|
||||
Total drives offline.
|
||||
|
||||
.. metric:: minio_s3_requests_canceled_total
|
||||
``minio_cluster_disk_online_total``
|
||||
Total drives online.
|
||||
|
||||
Total number S3 requests that were canceled from the client while processing.
|
||||
``minio_cluster_disk_total``
|
||||
Total drives.
|
||||
|
||||
.. metric:: minio_s3_requests_inflight_total
|
||||
ILM Metrics
|
||||
+++++++++++
|
||||
|
||||
Total number of S3 requests currently in flight.
|
||||
``minio_cluster_ilm_transitioned_bytes``
|
||||
Total bytes transitioned to a tier.
|
||||
|
||||
.. metric:: minio_s3_requests_total
|
||||
``minio_cluster_ilm_transitioned_objects``
|
||||
Total number of objects transitioned to a tier.
|
||||
|
||||
Total number of S3 requests.
|
||||
``minio_cluster_ilm_transitioned_versions``
|
||||
Total number of versions transitioned to a tier.
|
||||
|
||||
.. metric:: minio_s3_requests_rejected_auth_total
|
||||
``minio_node_ilm_expiry_active_tasks``
|
||||
Total number of active :ref:`object expiration <minio-lifecycle-management-expiration>` tasks.
|
||||
|
||||
Total number S3 requests rejected for auth failure.
|
||||
|
||||
.. metric:: minio_s3_requests_rejected_header_total
|
||||
Key Management Metrics
|
||||
++++++++++++++++++++++
|
||||
|
||||
Total number S3 requests rejected for invalid header.
|
||||
``minio_cluster_kms_online``
|
||||
Reports whether the KMS is online (1) or offline (0).
|
||||
|
||||
.. metric:: minio_s3_requests_rejected_invalid_total
|
||||
``minio_cluster_kms_request_error``
|
||||
Number of KMS requests that failed due to some error.
|
||||
(HTTP 4xx status code).
|
||||
|
||||
Total number S3 invalid requests.
|
||||
``minio_cluster_kms_request_failure``
|
||||
Number of KMS requests that failed due to some internal failure.
|
||||
(HTTP 5xx status code).
|
||||
|
||||
.. metric:: minio_s3_requests_rejected_timestamp_total
|
||||
``minio_cluster_kms_request_success``
|
||||
Number of KMS requests that succeeded.
|
||||
|
||||
Total number of S3 requests rejected for invalid timestamp.
|
||||
``minio_cluster_kms_uptime``
|
||||
The time the KMS has been up and running in seconds.
|
||||
|
||||
.. metric:: minio_s3_requests_waiting_total
|
||||
Cluster Health Metrics
|
||||
++++++++++++++++++++++
|
||||
|
||||
Number of S3 requests in the waiting queue.
|
||||
``minio_cluster_nodes_offline_total``
|
||||
Total number of MinIO nodes offline.
|
||||
|
||||
.. metric:: minio_s3_time_ttfb_seconds_distribution
|
||||
``minio_cluster_nodes_online_total``
|
||||
Total number of MinIO nodes online.
|
||||
|
||||
Distribution of the time to first byte across API calls.
|
||||
``minio_cluster_write_quorum``
|
||||
Maximum write quorum across all pools and sets.
|
||||
|
||||
.. metric:: minio_s3_traffic_received_bytes
|
||||
``minio_cluster_health_status``
|
||||
Get current cluster health status.
|
||||
|
||||
Total number of S3 bytes received.
|
||||
``minio_heal_objects_errors_total``
|
||||
Objects for which healing failed in current self healing run.
|
||||
|
||||
.. metric:: minio_s3_traffic_sent_bytes
|
||||
``minio_heal_objects_heal_total``
|
||||
Objects healed in current self healing run.
|
||||
|
||||
Total number of S3 bytes sent.
|
||||
``minio_heal_objects_total``
|
||||
Objects scanned in current self healing run.
|
||||
|
||||
.. metric:: minio_s3_requests_errors_total
|
||||
``minio_heal_time_last_activity_nano_seconds``
|
||||
Time elapsed (in nanoseconds) since last self healing activity.
|
||||
|
||||
.. versionchanged:: MinIO RELEASE.2023-04-28T18-11-17Z
|
||||
``minio_minio_update_percent``
|
||||
Total percentage cache usage.
|
||||
|
||||
This metric has been removed.
|
||||
Use ``minio_s3_requests_4xx_errors_total`` and ``minio_s3_requests_5xx_errors_total`` instead.
|
||||
``minio_software_commit_info``
|
||||
Git commit hash for the MinIO release.
|
||||
|
||||
Total number of S3 requests with 4xx and 5xx errors.
|
||||
``minio_software_version_info``
|
||||
MinIO Release tag for the server.
|
||||
|
||||
.. metric:: minio_s3_requests_4xx_errors_total
|
||||
``minio_usage_last_activity_nano_seconds``
|
||||
Time elapsed (in nanoseconds) since last scan activity.
|
||||
|
||||
Total number of S3 requests with 4xx errors.
|
||||
Inter Node Metrics
|
||||
++++++++++++++++++
|
||||
|
||||
.. metric:: minio_s3_requests_5xx_errors_total
|
||||
``minio_inter_node_traffic_dial_avg_time``
|
||||
Average time of internode TCP dial calls.
|
||||
|
||||
Total number of S3 requests with 5xx errors.
|
||||
``minio_inter_node_traffic_dial_errors``
|
||||
Total number of internode TCP dial timeouts and errors.
|
||||
|
||||
IAM Metrics
|
||||
~~~~~~~~~~~
|
||||
``minio_inter_node_traffic_errors_total``
|
||||
Total number of failed internode calls.
|
||||
|
||||
.. metric:: minio_node_iam_last_sync_duration_millis
|
||||
``minio_inter_node_traffic_received_bytes``
|
||||
Total number of bytes received from other peer nodes.
|
||||
|
||||
Last successful IAM data sync duration in milliseconds.
|
||||
``minio_inter_node_traffic_sent_bytes``
|
||||
Total number of bytes sent to the other peer nodes.
|
||||
|
||||
.. metric:: minio_node_iam_since_last_sync_millis
|
||||
S3 Request Metrics
|
||||
++++++++++++++++++
|
||||
|
||||
Time (in milliseconds) since last successful IAM data sync.
|
||||
``minio_s3_requests_4xx_errors_total``
|
||||
Total number of S3 requests with (4xx) errors.
|
||||
|
||||
This value starts at zero and only increments after the first sync after server start.
|
||||
``minio_s3_requests_5xx_errors_total``
|
||||
Total number of S3 requests with (5xx) errors.
|
||||
|
||||
.. metric:: minio_node_iam_sync_failures
|
||||
``minio_s3_requests_canceled_total``
|
||||
Total number of S3 requests canceled by the client.
|
||||
|
||||
Number of failed IAM data syncs since server start.
|
||||
``minio_s3_requests_errors_total``
|
||||
Total number of S3 requests with (4xx and 5xx) errors.
|
||||
|
||||
.. metric:: minio_node_iam_sync_successes
|
||||
``minio_s3_requests_incoming_total``
|
||||
Volatile number of total incoming S3 requests.
|
||||
|
||||
Number of successful IAM data syncs since server start.
|
||||
``minio_s3_requests_inflight_total``
|
||||
Total number of S3 requests currently in flight.
|
||||
|
||||
``minio_s3_requests_rejected_auth_total``
|
||||
Total number of S3 requests rejected for auth failure.
|
||||
|
||||
``minio_s3_requests_rejected_header_total``
|
||||
Total number of S3 requests rejected for invalid header.
|
||||
|
||||
``minio_s3_requests_rejected_invalid_total``
|
||||
Total number of invalid S3 requests.
|
||||
|
||||
``minio_s3_requests_rejected_timestamp_total``
|
||||
Total number of S3 requests rejected for invalid timestamp.
|
||||
|
||||
``minio_s3_requests_total``
|
||||
Total number of S3 requests.
|
||||
|
||||
``minio_s3_requests_waiting_total``
|
||||
Number of S3 requests in the waiting queue.
|
||||
|
||||
``minio_s3_requests_ttfb_seconds_distribution``
|
||||
Distribution of the time to first byte across API calls.
|
||||
|
||||
``minio_s3_traffic_received_bytes``
|
||||
Total number of S3 bytes received.
|
||||
|
||||
``minio_s3_traffic_sent_bytes``
|
||||
Total number of S3 bytes sent.
|
||||
|
||||
Lock Metrics
|
||||
++++++++++++
|
||||
|
||||
``minio_locks_total``
|
||||
Total number of current locks on the peer.
|
||||
|
||||
``minio_locks_write_total``
|
||||
Number of current WRITE locks on the peer.
|
||||
|
||||
``minio_locks_read_total``
|
||||
Number of current READ locks on the peer.
|
||||
|
||||
Webhook Metrics
|
||||
+++++++++++++++
|
||||
|
||||
``minio_cluster_webhook_failed_messages``
|
||||
Number of messages that failed to send.
|
||||
|
||||
``minio_cluster_webhook_online``
|
||||
Reports whether the webhook endpoint is online (1) or offline (0).
|
||||
|
||||
``minio_cluster_webhook_queue_length``
|
||||
Number of messages in the webhook queue.
|
||||
|
||||
``minio_cluster_webhook_total_messages``
|
||||
Number of messages sent to this webhook endpoint.
|
||||
|
||||
|
||||
.. _minio_available_node_metrics:
|
||||
|
||||
Node Metrics
|
||||
~~~~~~~~~~~~
|
||||
|
||||
Each metric includes the following labels:
|
||||
|
||||
- Server that generated the metric.
|
||||
- Server that calculated the metric.
|
||||
|
||||
These metrics can be obtained from any MinIO server once per collection.
|
||||
|
||||
Drive Metrics
|
||||
+++++++++++++
|
||||
|
||||
``minio_node_disk_free_bytes``
|
||||
Total storage available on a drive.
|
||||
|
||||
``minio_node_disk_free_inodes``
|
||||
Total free inodes.
|
||||
|
||||
``minio_node_disk_latency_us``
|
||||
Average last minute latency in µs for drive API storage operations.
|
||||
|
||||
``minio_node_disk_offline_total``
|
||||
Total drives offline.
|
||||
|
||||
``minio_node_disk_online_total``
|
||||
Total drives online.
|
||||
|
||||
``minio_node_disk_total``
|
||||
Total drives.
|
||||
|
||||
``minio_node_disk_total_bytes``
|
||||
Total storage on a drive.
|
||||
|
||||
``minio_node_disk_used_bytes``
|
||||
Total storage used on a drive.
|
||||
|
||||
File Metrics
|
||||
++++++++++++
|
||||
|
||||
``minio_node_file_descriptor_limit_total``
|
||||
Limit on total number of open file descriptors for the MinIO Server process.
|
||||
|
||||
``minio_node_file_descriptor_open_total``
|
||||
Total number of open file descriptors by the MinIO Server process.
|
||||
|
||||
Go Metrics
|
||||
++++++++++
|
||||
|
||||
``minio_node_go_routine_total``
|
||||
Total number of goroutines running.
|
||||
|
||||
Access Management (IAM) Metrics
|
||||
+++++++++++++++++++++++++++++++
|
||||
|
||||
``minio_node_iam_last_sync_duration_millis``
|
||||
Last successful IAM data sync duration in milliseconds.
|
||||
|
||||
``minio_node_iam_since_last_sync_millis``
|
||||
Time (in milliseconds) since last successful IAM data sync.
|
||||
|
||||
``minio_node_iam_sync_failures``
|
||||
Number of failed IAM data syncs since server start.
|
||||
|
||||
``minio_node_iam_sync_successes``
|
||||
Number of successful IAM data syncs since server start.
|
||||
|
||||
Lifecycle Management (ILM) Metrics
|
||||
++++++++++++++++++++++++++++++++++
|
||||
|
||||
``minio_node_ilm_expiry_pending_tasks``
|
||||
Number of pending ILM expiry tasks in the queue.
|
||||
|
||||
``minio_node_ilm_transition_active_tasks``
|
||||
Number of active ILM transition tasks.
|
||||
|
||||
``minio_node_ilm_transition_pending_tasks``
|
||||
Number of pending ILM transition tasks in the queue.
|
||||
|
||||
``minio_node_ilm_versions_scanned``
|
||||
Total number of object versions checked for ILM actions since server start.
|
||||
|
||||
I/O Metrics
|
||||
+++++++++++
|
||||
|
||||
``minio_node_io_rchar_bytes``
|
||||
Total bytes read by the process from the underlying storage system including cache, ``/proc/[pid]/io`` rchar.
|
||||
|
||||
``minio_node_io_read_bytes``
|
||||
Total bytes read by the process from the underlying storage system, ``/proc/[pid]/io`` read_bytes.
|
||||
|
||||
``minio_node_io_wchar_bytes``
|
||||
Total bytes written by the process to the underlying storage system including page cache, ``/proc/[pid]/io`` wchar.
|
||||
|
||||
``minio_node_io_write_bytes``
|
||||
Total bytes written by the process to the underlying storage system, ``/proc/[pid]/io`` write_bytes.
|
||||
|
||||
Process Metrics
|
||||
+++++++++++++++
|
||||
|
||||
``minio_node_process_cpu_total_seconds``
|
||||
Total user and system CPU time spent in seconds.
|
||||
|
||||
``minio_node_process_resident_memory_bytes``
|
||||
Resident memory size in bytes.
|
||||
|
||||
``minio_node_process_starttime_seconds``
|
||||
Start time for MinIO process per node, time in seconds since Unix epoch.
|
||||
|
||||
``minio_node_process_uptime_seconds``
|
||||
Uptime for MinIO process per node in seconds.
|
||||
|
||||
Scanner Metrics
|
||||
+++++++++++++++
|
||||
|
||||
``minio_node_scanner_bucket_scans_finished``
|
||||
Total number of bucket scans finished since server start.
|
||||
|
||||
``minio_node_scanner_bucket_scans_started``
|
||||
Total number of bucket scans started since server start.
|
||||
|
||||
``minio_node_scanner_directories_scanned``
|
||||
Total number of directories scanned since server start.
|
||||
|
||||
``minio_node_scanner_objects_scanned``
|
||||
Total number of unique objects scanned since server start.
|
||||
|
||||
``minio_node_scanner_versions_scanned``
|
||||
Total number of object versions scanned since server start.
|
||||
|
||||
Read and Write Metrics
|
||||
++++++++++++++++++++++
|
||||
|
||||
``minio_node_syscall_read_total``
|
||||
Total read SysCalls to the kernel.
|
||||
``/proc/[pid]/io`` syscr.
|
||||
|
||||
``minio_node_syscall_write_total``
|
||||
Total write SysCalls to the kernel.
|
||||
``/proc/[pid]/io`` syscw.
|
||||
|
||||
Notification Metrics
|
||||
++++++++++++++++++++
|
||||
|
||||
``minio_notify_current_send_in_progress``
|
||||
Number of concurrent async Send calls active to all targets.
|
||||
|
||||
``minio_notify_target_queue_length``
|
||||
Number of unsent notifications in queue for target.
|
||||
|
||||
IAM Plugin Metrics
|
||||
~~~~~~~~~~~~~~~~~~
|
||||
++++++++++++++++++
|
||||
|
||||
.. note::
|
||||
|
||||
The metrics in this section require that you have configured the :ref:`MinIO External Identity Management Plugin <minio-external-identity-management-plugin>`.
|
||||
|
||||
.. metric:: minio_node_iam_plugin_authn_service_last_succ_seconds
|
||||
``minio_node_iam_plugin_authn_service_last_succ_seconds``
|
||||
Time (in seconds) since last successful request to the external IDP service.
|
||||
|
||||
Time (in seconds) since last successful request to the external IDP service.
|
||||
``minio_node_iam_plugin_authn_service_last_fail_seconds``
|
||||
Time (in seconds) since last failed request to the external IDP service.
|
||||
|
||||
.. metric:: minio_node_iam_plugin_authn_service_last_fail_seconds
|
||||
``minio_node_iam_plugin_authn_service_total_requests_minute``
|
||||
Total requests count to the external IDP service in the last full minute.
|
||||
|
||||
Time (in seconds) since last failed request to the external IDP service.
|
||||
``minio_node_iam_plugin_authn_service_failed_requests_minute``
|
||||
Count of the failed requests to the external IDP service in the last full minute.
|
||||
|
||||
.. metric:: minio_node_iam_plugin_authn_service_total_requests_minute
|
||||
``minio_node_iam_plugin_authn_service_succ_avg_rtt_ms_minute``
|
||||
Average round trip time (RTT) of successful requests to the IDP service in the last full minute.
|
||||
|
||||
Total requests count to the external IDP service in the last full minute.
|
||||
``minio_node_iam_plugin_authn_service_succ_max_rtt_ms_minute``
|
||||
Maximum round trip time (RTT) of successful requests to the IDP service in the last full minute.
|
||||
|
||||
.. metric:: minio_node_iam_plugin_authn_service_failed_requests_minute
|
||||
|
||||
Count of the failed requests to the external IDP service in the last full minute.
|
||||
.. _minio_available_bucket_metrics:
|
||||
|
||||
.. metric:: minio_node_iam_plugin_authn_service_succ_avg_rtt_ms_minute
|
||||
Bucket Metrics
|
||||
~~~~~~~~~~~~~~
|
||||
|
||||
Average round trip time (RTT) of successful requests to the IDP service in the last full minute.
|
||||
Each bucket metric includes the following labels:
|
||||
|
||||
.. metric:: minio_node_iam_plugin_authn_service_succ_max_rtt_ms_minute
|
||||
- The server that calculated the metric.
|
||||
- The server that generated the metric.
|
||||
- The bucket the metric is for.
|
||||
|
||||
Maximum round trip time (RTT) of successful requests to the IDP service in the last full minute.
|
||||
These metrics can be obtained from any MinIO server once per collection.
|
||||
|
||||
Internal Metrics
|
||||
~~~~~~~~~~~~~~~~
|
||||
Distribution Metrics
|
||||
++++++++++++++++++++
|
||||
|
||||
.. metric:: minio_inter_node_traffic_received_bytes
|
||||
``minio_bucket_objects_size_distribution``
|
||||
Distribution of object sizes in the bucket, includes label for the bucket name.
|
||||
|
||||
Total number of bytes received from other peer nodes.
|
||||
``minio_bucket_objects_version_distribution``
|
||||
Distribution of object sizes in a bucket, by number of versions.
|
||||
|
||||
.. metric:: minio_inter_node_traffic_sent_bytes
|
||||
``minio_bucket_quota_total_bytes``
|
||||
Total bucket quota size in bytes.
|
||||
|
||||
Total number of bytes sent to the other peer nodes.
|
||||
Replication Metrics
|
||||
+++++++++++++++++++
|
||||
|
||||
.. metric:: minio_inter_node_traffic_dial_avg_time
|
||||
.. note::
|
||||
|
||||
Average time of internode TCP dial calls.
|
||||
The metrics for bucket replication only populate for MinIO clusters with :ref:`minio-bucket-replication-serverside` enabled.
|
||||
|
||||
.. metric:: minio_inter_node_traffic_dial_errors
|
||||
``minio_bucket_replication_failed_count``
|
||||
Total number of objects which failed replication.
|
||||
|
||||
Total number of internode TCP dial timeouts and errors.
|
||||
``minio_bucket_replication_latency_ms``
|
||||
Replication latency in milliseconds.
|
||||
|
||||
.. versionadded:: MinIO RELEASE.2023-04-28T18-11-17Z
|
||||
``minio_bucket_replication_received_bytes``
|
||||
Total number of bytes replicated to this bucket from another source bucket.
|
||||
|
||||
This metric is available on the MinIO Dashboard if :ref:`Prometheus <minio-metrics-collect-using-prometheus>` and Grafana are enabled.
|
||||
``minio_bucket_replication_sent_bytes``
|
||||
Total number of bytes replicated to the target bucket.
|
||||
|
||||
.. metric:: minio_inter_node_traffic_errors_total
|
||||
``minio_bucket_replication_failed_bytes``
|
||||
Total number of bytes that failed at least once to replicate for a given bucket.
|
||||
You can identify the bucket using the ``{ bucket="STRING" }`` label.
|
||||
|
||||
Total number of failed internode calls.
|
||||
``minio_bucket_replication_pending_bytes``
|
||||
Total number of bytes pending to replicate for a given bucket.
|
||||
You can identify the bucket using the ``{ bucket="STRING" }`` label.
|
||||
|
||||
.. metric:: minio_node_file_descriptor_limit_total
|
||||
``minio_bucket_replication_pending_count``
|
||||
Total number of replication operations pending for a given bucket.
|
||||
You can identify the bucket using the ``{ bucket="STRING" }`` label.
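As an illustration of how the ``bucket`` label can be used, the following is a minimal Python sketch that sums ``minio_bucket_replication_pending_bytes`` per bucket from Prometheus text-format output. The sample text, the ``server`` label values, and the helper names ``parse_sample`` and ``pending_bytes_by_bucket`` are hypothetical and exist only for this example; the parser is simplified and assumes label values do not contain ``}``.

```python
import re

# Simplified patterns for one line of Prometheus text exposition format.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')

# Hypothetical scrape output; values and server names are made up.
SAMPLE = """\
# TYPE minio_bucket_replication_pending_bytes gauge
minio_bucket_replication_pending_bytes{bucket="photos",server="minio1:9000"} 1024
minio_bucket_replication_pending_bytes{bucket="photos",server="minio2:9000"} 512
minio_bucket_replication_pending_bytes{bucket="logs",server="minio1:9000"} 0
"""


def parse_sample(line):
    """Return (metric name, labels dict, float value), or None if unparsable."""
    match = SAMPLE_RE.match(line)
    if match is None:
        return None
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    return match.group("name"), labels, float(match.group("value"))


def pending_bytes_by_bucket(text):
    """Sum pending replication bytes per value of the ``bucket`` label."""
    totals = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        parsed = parse_sample(line)
        if parsed is None:
            continue
        name, labels, value = parsed
        if name == "minio_bucket_replication_pending_bytes" and "bucket" in labels:
            bucket = labels["bucket"]
            totals[bucket] = totals.get(bucket, 0.0) + value
    return totals
```

Calling ``pending_bytes_by_bucket(SAMPLE)`` groups the three sample lines into one total per bucket. In practice a Prometheus query that filters or groups on the ``bucket`` label would normally do this work instead.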
|
||||
|
||||
Limit on total number of open file descriptors for the MinIO Server process.
|
||||
Traffic Metrics
|
||||
+++++++++++++++
|
||||
|
||||
.. metric:: minio_node_file_descriptor_open_total
|
||||
``minio_bucket_traffic_received_bytes``
|
||||
Total number of S3 bytes received for this bucket.
|
||||
|
||||
Total number of open file descriptors by the MinIO Server process.
|
||||
``minio_bucket_traffic_sent_bytes``
|
||||
Total number of S3 bytes sent for this bucket.
|
||||
|
||||
.. metric:: minio_node_io_rchar_bytes
|
||||
Usage Metrics
|
||||
+++++++++++++
|
||||
|
||||
Total bytes read by the process from the underlying storage system including
|
||||
cache, ``/proc/[pid]/io rchar``
|
||||
``minio_bucket_usage_object_total``
|
||||
Total number of objects.
|
||||
|
||||
.. metric:: minio_node_io_read_bytes
|
||||
``minio_bucket_usage_version_total``
|
||||
Total number of versions (includes delete marker).
|
||||
|
||||
Total bytes read by the process from the underlying storage system,
|
||||
``/proc/[pid]/io read_bytes``
|
||||
``minio_bucket_usage_deletemarker_total``
|
||||
Total number of delete markers.
|
||||
|
||||
.. metric:: minio_node_io_wchar_bytes
|
||||
``minio_bucket_usage_total_bytes``
|
||||
Total bucket size in bytes.
|
||||
|
||||
Total bytes written by the process to the underlying storage system including
|
||||
page cache, ``/proc/[pid]/io wchar``
|
||||
Requests Metrics
|
||||
++++++++++++++++
|
||||
|
||||
.. metric:: minio_node_io_write_bytes
|
||||
``minio_bucket_requests_4xx_errors_total``
|
||||
Total number of S3 requests with (4xx) errors on a bucket.
|
||||
|
||||
Total bytes written by the process to the underlying storage system,
|
||||
``/proc/[pid]/io write_bytes``
|
||||
``minio_bucket_requests_5xx_errors_total``
|
||||
Total number of S3 requests with (5xx) errors on a bucket.
|
||||
|
||||
Key Management System (KMS) Metrics
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
``minio_bucket_requests_inflight_total``
|
||||
Total number of S3 requests currently in flight on a bucket.
|
||||
|
||||
.. metric:: minio_cluster_kms_online
|
||||
``minio_bucket_requests_total``
|
||||
Total number of S3 requests on a bucket.
|
||||
|
||||
Reports whether the KMS is online (1) or offline (0).
|
||||
``minio_bucket_requests_canceled_total``
|
||||
Total number of S3 requests canceled by the client.
|
||||
|
||||
.. metric:: minio_cluster_kms_request_error
|
||||
|
||||
Number of KMS requests that failed due to some error. (HTTP 4xx status code).
|
||||
|
||||
.. metric:: minio_cluster_kms_request_failure
|
||||
|
||||
Number of KMS requests that failed due to some internal failure. (HTTP 5xx status code).
|
||||
|
||||
.. metric:: minio_cluster_kms_request_success
|
||||
|
||||
Number of KMS requests that succeeded.
|
||||
|
||||
.. metric:: minio_cluster_kms_uptime
|
||||
|
||||
The time the KMS has been up and running in seconds.
|
||||
|
||||
Software and Process Metrics
|
||||
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
|
||||
|
||||
.. metric:: minio_software_commit_info
|
||||
|
||||
Git commit hash for the MinIO release.
|
||||
|
||||
.. metric:: minio_software_version_info
|
||||
|
||||
MinIO Release tag for the server.
|
||||
|
||||
.. metric:: minio_node_go_routine_total
|
||||
|
||||
Total number of goroutines running.
|
||||
|
||||
.. metric:: minio_node_process_starttime_seconds
|
||||
|
||||
Start time for MinIO process per node, time in seconds since Unix epoch.
|
||||
|
||||
.. metric:: minio_node_process_uptime_seconds
|
||||
|
||||
Uptime for MinIO process per node in seconds.
|
||||
|
||||
.. metric:: minio_node_process_cpu_total_seconds
|
||||
|
||||
Total user and system CPU time spent in seconds.
|
||||
|
||||
.. metric:: minio_node_process_resident_memory_bytes
|
||||
|
||||
Resident memory size in bytes.
|
||||
|
||||
Lock Metrics
|
||||
~~~~~~~~~~~~
|
||||
|
||||
.. metric:: minio_locks_total
|
||||
|
||||
Total number of current locks on the peer.
|
||||
|
||||
.. metric:: minio_locks_write_total
|
||||
|
||||
Number of current WRITE locks on the peer.
|
||||
|
||||
.. metric:: minio_locks_read_total
|
||||
|
||||
Number of current READ locks on the peer.
|
||||
|
||||
.. _minio-metrics-and-alerts-webhook:
|
||||
|
||||
Webhook Metrics
|
||||
~~~~~~~~~~~~~~~
|
||||
|
||||
.. metric:: minio_cluster_webhook_failed_messages
|
||||
|
||||
Number of messages that failed to send.
|
||||
|
||||
.. metric:: minio_cluster_webhook_online
|
||||
|
||||
Reports whether the webhook endpoint is online (1) or offline (0).
|
||||
|
||||
.. metric:: minio_cluster_webhook_queue_length
|
||||
|
||||
Number of messages in the webhook queue.
|
||||
|
||||
.. metric:: minio_cluster_webhook_total_messages
|
||||
|
||||
Number of messages sent to this webhook endpoint.
|
||||
``minio_bucket_requests_ttfb_seconds_distribution``
|
||||
Distribution of time to first byte across API calls per bucket.
|
||||
|
||||
.. toctree::
|
||||
:titlesonly:
|
||||
|
@ -94,7 +94,7 @@ Configure InfluxDB to Collect and Alert using MinIO Metrics
|
||||
|
||||
Use the :influxdb-docs:`DataExplorer <query-data/execute-queries/data-explorer/>` to visualize the collected MinIO data.
|
||||
|
||||
For example, you can set a filter on :metric:`minio_cluster_capacity_usable_total_bytes` and :metric:`minio_cluster_capacity_usable_free_bytes` to compare the total usable against total free space on the MinIO deployment.
|
||||
For example, you can set a filter on ``minio_cluster_capacity_usable_total_bytes`` and ``minio_cluster_capacity_usable_free_bytes`` to compare the total usable against total free space on the MinIO deployment.
|
||||
|
||||
#. Configure a Check
|
||||
|
||||
@ -105,13 +105,13 @@ Configure InfluxDB to Collect and Alert using MinIO Metrics
|
||||
|
||||
- Create a :guilabel:`Threshold Check` named ``MINIO_NODE_DOWN``.
|
||||
|
||||
Set the filter for the :metric:`minio_cluster_nodes_offline_total` key.
|
||||
Set the filter for the ``minio_cluster_nodes_offline_total`` key.
|
||||
|
||||
Set the :guilabel:`Thresholds` to :guilabel:`WARN` when the value is greater than :guilabel:`1`
|
||||
|
||||
- Create a :guilabel:`Threshold Check` named ``MINIO_QUORUM_WARNING``.
|
||||
|
||||
Set the filter for the :metric:`minio_cluster_disk_offline_total` key.
|
||||
Set the filter for the ``minio_cluster_disk_offline_total`` key.
|
||||
|
||||
Set the :guilabel:`Thresholds` to :guilabel:`CRITICAL` when the value is one less than your configured :ref:`Erasure Code Parity <minio-erasure-coding>` setting.
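The two threshold checks above can be sketched in Python to make the arithmetic explicit. This is only an illustration of the comparison logic, not how InfluxDB evaluates checks. The function names are hypothetical, and the sketch assumes that "one less than" the parity setting acts as a floor, so any offline-drive count at or above that value is treated as critical.

```python
def node_down_status(nodes_offline: int) -> str:
    """MINIO_NODE_DOWN: WARN when minio_cluster_nodes_offline_total exceeds 1."""
    return "WARN" if nodes_offline > 1 else "OK"


def quorum_status(drives_offline: int, parity: int) -> str:
    """MINIO_QUORUM_WARNING: CRITICAL at one less than the parity setting.

    Assumption of this sketch: counts at or above (parity - 1) are critical.
    """
    if parity < 2:
        # Choice made for this sketch: a threshold of zero would always fire.
        raise ValueError("parity must be at least 2 for this check")
    critical_at = parity - 1
    return "CRITICAL" if drives_offline >= critical_at else "OK"
```

For a deployment with a parity setting of 4, ``quorum_status(2, 4)`` stays ``"OK"`` while ``quorum_status(3, 4)`` reports ``"CRITICAL"``.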
|
||||
|
||||
|
@ -43,7 +43,7 @@ Syntax
|
||||
.. code-block:: shell
|
||||
:class: copyable
|
||||
|
||||
mc admin prometheus generate TARGET
|
||||
mc admin prometheus generate TARGET TYPE
|
||||
|
||||
The command accepts the following arguments:
|
||||
|
||||
@ -52,3 +52,11 @@ Syntax
|
||||
The :mc:`alias <mc alias>` of a configured MinIO deployment for which
|
||||
the command generates a Prometheus-compatible configuration file.
|
||||
|
||||
.. mc-cmd:: TYPE
|
||||
|
||||
The type of metrics to scrape.
|
||||
|
||||
Valid values are ``cluster``, ``node``, or ``bucket``.
|
||||
|
||||
If not specified, the command returns cluster metrics.
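A small Python sketch of how a script might assemble this command line, based only on the syntax and ``TYPE`` values described above. The helper name ``build_generate_command`` and the alias ``myminio`` are hypothetical; the sketch builds the argument list and does not run ``mc``.

```python
from typing import List, Optional

# TYPE values listed in the argument description above.
VALID_TYPES = ("cluster", "node", "bucket")


def build_generate_command(alias: str, metric_type: Optional[str] = None) -> List[str]:
    """Build the argv for ``mc admin prometheus generate TARGET [TYPE]``."""
    if not alias:
        raise ValueError("alias must be a non-empty string")
    command = ["mc", "admin", "prometheus", "generate", alias]
    if metric_type is not None:
        if metric_type not in VALID_TYPES:
            raise ValueError(f"metric_type must be one of {VALID_TYPES}")
        # Omitting TYPE leaves the default (cluster metrics) in effect.
        command.append(metric_type)
    return command
```

On a host where ``mc`` is installed and the alias is configured, the returned list could be passed to ``subprocess.run(command, check=True)``.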
|
||||
|
||||
|
@ -601,7 +601,7 @@ logging. See :ref:`minio-metrics-and-alerts` for more information.
|
||||
.. envvar:: MINIO_PROMETHEUS_AUTH_TYPE
|
||||
|
||||
Specifies the authentication mode for the Prometheus
|
||||
:ref:`scraping endpoints <minio-metrics-and-alerts-endpoints>`.
|
||||
:ref:`scraping endpoints <minio-metrics-and-alerts>`.
|
||||
|
||||
- ``jwt`` - *Default* MinIO requires that the scraping client specify a JWT
|
||||
token for authenticating requests. Use
|
||||
|