mirror of https://github.com/minio/docs.git synced 2025-07-28 19:42:10 +03:00

Grafana and metric updates (#953)

- Adds a new page for Grafana to overview.
- Replaces the list of metrics in the Metrics and Alerts page with an
include to pull the list of metrics maintained in GitHub.
- Removes use of the :metric: role throughout the docs.
- Adds note about the introduction of a new bucket metric endpoint.

Partially addresses #930
Partially addresses #931
Partially addresses #898
Closes #864

Staged:
- http://192.241.195.202:9000/staging/grafana/operations/monitoring/grafana.html
This commit is contained in:
Daryl White
2023-08-17 09:01:46 -05:00
committed by GitHub
parent 1a1c340c3c
commit 20644952de
17 changed files with 634 additions and 524 deletions

1
.gitignore vendored
View File

@ -19,4 +19,5 @@ source/developers/haskell/*.md
source/developers/java/*.md
source/developers/javascript/*.md
source/developers/python/*.md
source/operations/monitoring/*.md
*.inv

View File

@ -864,7 +864,7 @@ services:
.. policy-action:: admin:Prometheus
Allows access to MinIO :ref:`metrics <minio-metrics-and-alerts-endpoints>`.
Allows access to MinIO :ref:`metrics <minio-metrics-and-alerts>`.
Only required if MinIO requires authentication for scraping metrics.
.. policy-action:: admin:ListBatchJobs

View File

@ -37,7 +37,7 @@ Deployment Metrics
MinIO provides a Prometheus-compatible endpoint for supporting time-series querying of metrics.
MinIO deployments :ref:`configured to enable Prometheus scraping <minio-metrics-and-alerts-endpoints>` provide a detailed metrics view through the MinIO Console.
MinIO deployments :ref:`configured to enable Prometheus scraping <minio-metrics-and-alerts>` provide a detailed metrics view through the MinIO Console.
Server Logs
-----------

View File

@ -311,9 +311,3 @@ a notification.
:class: copyable
mc cp ~/data/new-object.txt ALIAS/BUCKET
Webhook Metrics
---------------
MinIO publishes several :ref:`metrics <minio-metrics-and-alerts>` for monitoring webhook endpoints.
See :ref:`minio-metrics-and-alerts-webhook` for a list of available metrics.

View File

@ -125,9 +125,7 @@ As the cluster or workload increases, scanner performance decreases as it yields
Consider regularly checking cluster metrics, capacity, and resource usage to ensure the cluster hardware is scaling alongside cluster and workload growth:
- :ref:`minio-metrics-and-alerts-capacity`
- :ref:`minio-metrics-and-alerts-lifecycle-management`
- :ref:`minio-metrics-and-alerts-scanner`
- :ref:`minio-metrics-and-alerts`
.. toctree::
:hidden:

View File

@ -535,5 +535,11 @@ for display. This is intentional (For now).
These are nested and linked.
Images
------
.. image:: /images/minio-console/minio-console.png
:width: 600px
:alt: MinIO Console Landing Page provides a view of the Object Browser for the authenticated user
:align: center

Binary file not shown (image, 474 KiB).

Binary file not shown (image, 469 KiB).

View File

@ -38,7 +38,10 @@ MinIO Pre-requisites
- Load balancer to handle routing of requests (for example, `NGINX <https://www.nginx.com/>`__)
* - :octicon:`circle`
- :ref:`Prometheus / Grafana <minio-metrics-collect-using-prometheus>` setup for monitoring and metrics
- :ref:`Prometheus <minio-metrics-collect-using-prometheus>` setup for monitoring and metrics
* - :octicon:`circle`
- :ref:`Grafana configured <minio-grafana>` for dashboards
* - :octicon:`circle`
- (optional) :mc:`mc` installed on the local host system

View File

@ -71,3 +71,5 @@ See :ref:`minio-healthcheck-api` for more information.
/operations/monitoring/metrics-and-alerts
/operations/monitoring/minio-logging
/operations/monitoring/healthcheck-probe
/operations/monitoring/grafana

View File

@ -15,7 +15,7 @@ Monitoring and Alerting using Prometheus
- `Monitoring with MinIO and Prometheus: Overview <https://youtu.be/A3vCDaFWNNs?ref=docs>`__
- `Monitoring with MinIO and Prometheus: Lab <https://youtu.be/Oix9iXndSUY?ref=docs>`__
MinIO publishes cluster and node metrics using the :prometheus-docs:`Prometheus Data Model <concepts/data_model/#data-model>`.
MinIO publishes cluster, node, and bucket metrics using the :prometheus-docs:`Prometheus Data Model <concepts/data_model/#data-model>`.
The procedure on this page documents the following:
- Configuring a Prometheus service to scrape and display metrics from a MinIO deployment
@ -40,12 +40,40 @@ Configure Prometheus to Collect and Alert using MinIO Metrics
Use the :mc-cmd:`mc admin prometheus generate` command to generate the scrape configuration for use by Prometheus in making scraping requests:
.. code-block:: shell
.. tab-set::
.. tab-item:: MinIO Server
The following command scrapes metrics for the MinIO cluster.
.. code-block:: shell
:class: copyable
mc admin prometheus generate ALIAS
Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the :mc:`alias <mc alias>` of the MinIO deployment.
Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the :mc:`alias <mc alias>` of the MinIO deployment.
.. tab-item:: Nodes
The following command scrapes metrics for nodes on the MinIO Server.
.. code-block:: shell
:class: copyable
mc admin prometheus generate ALIAS node
Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the :mc:`alias <mc alias>` of the MinIO deployment.
.. tab-item:: Buckets
The following command scrapes metrics for buckets on the MinIO Server.
.. code-block:: shell
:class: copyable
mc admin prometheus generate ALIAS bucket
Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the :mc:`alias <mc alias>` of the MinIO deployment.
The command returns output similar to the following:
@ -81,9 +109,15 @@ The command returns output similar to the following:
2) Restart Prometheus with the Updated Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Append the ``scrape_configs`` job generated in the previous step to the configuration file:
Append the desired ``scrape_configs`` job generated in the previous step to the configuration file:
.. code-block:: yaml
.. tab-set::
.. tab-item:: Cluster metrics
For server metrics:
.. code-block:: yaml
:class: copyable
global:
@ -97,6 +131,23 @@ Append the ``scrape_configs`` job generated in the previous step to the configur
static_configs:
- targets: [minio.example.net]
.. tab-item:: Bucket metrics
.. code-block:: yaml
:class: copyable
global:
scrape_interval: 15s
scrape_configs:
- job_name: minio-job-bucket
bearer_token: TOKEN
metrics_path: /minio/v2/metrics/bucket
scheme: https
static_configs:
- targets: [minio.example.net]
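The cluster and bucket jobs above differ only in the job name and ``metrics_path``. The following Python sketch builds one job per metrics scope under that assumption. The helper name ``scrape_job`` is hypothetical, the ``/minio/v2/metrics/<scope>`` pattern for the cluster and node scopes is assumed from the bucket example, and ``TOKEN`` and ``minio.example.net`` are the placeholders used above. Prefer the output of ``mc admin prometheus generate`` for real deployments.

```python
def scrape_job(scope, target="minio.example.net", token="TOKEN"):
    """Build one Prometheus scrape job (as a dict) for a MinIO metrics scope.

    scope is "cluster", "node", or "bucket". The path pattern is an
    assumption extrapolated from the bucket example, not generated output.
    """
    if scope not in ("cluster", "node", "bucket"):
        raise ValueError(f"unknown metrics scope: {scope}")
    # The cluster job keeps the plain "minio-job" name used in the query examples.
    job_name = "minio-job" if scope == "cluster" else f"minio-job-{scope}"
    return {
        "job_name": job_name,
        "bearer_token": token,
        "metrics_path": f"/minio/v2/metrics/{scope}",
        "scheme": "https",
        "static_configs": [{"targets": [target]}],
    }


jobs = [scrape_job(scope) for scope in ("cluster", "node", "bucket")]
for job in jobs:
    print(job["job_name"], job["metrics_path"])
```

Each dict mirrors the keys of one ``scrape_configs`` entry, so it can be serialized to YAML with the library of your choice.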
Start the Prometheus cluster using the configuration file:
.. code-block:: shell
@ -122,9 +173,9 @@ The following query examples return metrics collected by Prometheus:
minio_cluster_capacity_usable_free_bytes{job="minio-job"}[5m]
See :ref:`minio-metrics-and-alerts-available-metrics` for a complete list of published metrics.
See :ref:`minio-metrics-and-alerts` for information about metrics.
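The same kind of query can also be issued programmatically through the Prometheus HTTP API. The sketch below only builds the request URL for an instant query. The Prometheus address ``http://localhost:9090`` and the helper name ``instant_query_url`` are assumptions for illustration.

```python
from urllib.parse import urlencode


def instant_query_url(base_url, promql):
    """Return the URL for a Prometheus instant query (GET /api/v1/query)."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


url = instant_query_url(
    "http://localhost:9090",  # assumed address of the Prometheus service
    'minio_cluster_capacity_usable_free_bytes{job="minio-job"}',
)
print(url)
```

Fetching the URL with any HTTP client returns a JSON document whose ``data.result`` field holds the matching series.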
4) Configure an Alert Rule using MinIO Metrics
1) Configure an Alert Rule using MinIO Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You must configure :prometheus-docs:`Alert Rules <prometheus/latest/configuration/alerting_rules/>` on the Prometheus deployment to trigger alerts based on collected MinIO metrics.
@ -184,3 +235,9 @@ To enable historical data visualization in MinIO Console, set the following envi
- Set :envvar:`MINIO_PROMETHEUS_JOB_ID` to the unique job ID assigned to the collected metrics
Restart the MinIO deployment and visit the :ref:`Monitoring <minio-console-monitoring>` pane to see the historical data views.
Dashboards
----------
MinIO provides Grafana Dashboards to display metrics collected by Prometheus.
For more information, see :ref:`minio-grafana`.

View File

@ -0,0 +1,60 @@
.. _minio-grafana:
===================================
Monitor a MinIO Server with Grafana
===================================
.. default-domain:: minio
.. contents:: Table of Contents
:local:
:depth: 2
`Grafana <https://grafana.com/>`__ allows you to query, visualize, alert on and understand your metrics no matter where they are stored.
Create, explore, and share dashboards with your team and foster a data-driven culture.
Prerequisites
-------------
- An existing :prometheus-docs:`Prometheus deployment <prometheus/latest/installation/>` with backing :prometheus-docs:`Alert Manager <alerting/latest/overview/>`
- An existing MinIO deployment with network access to the Prometheus deployment
- `Grafana installed <https://grafana.com/grafana/download>`__
MinIO Grafana Dashboard
-----------------------
MinIO provides two official Grafana Dashboards you can download from the Grafana Dashboard portal.
1. :ref:`MinIO Server metrics <minio-server-grafana-metrics>`
2. :ref:`MinIO Bucket metrics <minio-buckets-grafana-metrics>`
To track changes to the Grafana dashboard, introspect the JSON files for the `server <https://github.com/minio/minio/blob/master/docs/metrics/prometheus/grafana/minio-dashboard.json>`__ or `bucket <https://github.com/minio/minio/blob/master/docs/metrics/prometheus/grafana/minio-bucket.json>`__ dashboards in the MinIO Server GitHub repository.
.. _minio-server-grafana-metrics:
MinIO Server Metrics Dashboard
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Visualize MinIO metrics with the official MinIO Grafana dashboard for the MinIO Server available on the `Grafana dashboard portal <https://grafana.com/grafana/dashboards/13502-minio-dashboard/>`__.
MinIO provides a Grafana Dashboard for MinIO Server metrics.
For specifics on the dashboard's configuration, see the `JSON file on GitHub <https://raw.githubusercontent.com/minio/minio/master/docs/metrics/prometheus/grafana/minio-dashboard.json>`__.
.. image:: /images/grafana-minio.png
:width: 600px
:alt: A sample of the MinIO Grafana dashboard showing many different captured metrics on a MinIO Server.
:align: center
.. _minio-buckets-grafana-metrics:
MinIO Bucket Metrics Dashboard
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Visualize MinIO bucket metrics with the official MinIO Grafana dashboard for buckets available on the `Grafana dashboard portal <https://grafana.com/grafana/dashboards/19237-minio-bucket-dashboard//>`__.
Bucket metrics can be viewed in the Grafana dashboard using the `bucket JSON file on GitHub <https://raw.githubusercontent.com/minio/minio/master/docs/metrics/prometheus/grafana/minio-bucket.json>`__.
.. image:: /images/grafana-bucket.png
:width: 600px
:alt: A sample of the MinIO Grafana dashboard showing many different captured metrics for MinIO buckets.
:align: center

View File

@ -35,8 +35,8 @@ the server, such as a transient network issue or potential downtime.
The healthcheck probe alone cannot determine if a MinIO server is offline - only
that the current host machine cannot reach the server. Consider configuring
a Prometheus :ref:`alert <minio-metrics-and-alerts-alerting>` using the
:metric:`minio_cluster_nodes_offline_total` metric to detect whether one or
a Prometheus :ref:`alert <minio-metrics-and-alerts>` using the
``minio_cluster_nodes_offline_total`` metric to detect whether one or
more MinIO nodes are offline.
Cluster Write Quorum
@ -63,13 +63,13 @@ The healthcheck probe alone cannot determine if a MinIO server is offline or
processing write operations normally - only whether enough MinIO servers are
online to meet write quorum requirements based on the configured
:ref:`erasure code parity <minio-ec-parity>`. Consider configuring a Prometheus
:ref:`alert <minio-metrics-and-alerts-alerting>` using one of the following
:ref:`alert <minio-metrics-and-alerts>` using one of the following
metrics to detect potential issues or errors on the MinIO cluster:
- :metric:`minio_cluster_nodes_offline_total` to alert if one or more
- ``minio_cluster_nodes_offline_total`` to alert if one or more
MinIO nodes are offline.
- :metric:`minio_node_disk_free_bytes` to alert if the cluster is running
- ``minio_node_disk_free_bytes`` to alert if the cluster is running
low on free drive space.
Cluster Read Quorum
@ -96,8 +96,8 @@ The healthcheck probe alone cannot determine if a MinIO server is offline or
processing read operations normally - only whether enough MinIO servers are
online to meet read quorum requirements based on the configured
:ref:`erasure code parity <minio-ec-parity>`. Consider configuring a Prometheus
:ref:`alert <minio-metrics-and-alerts-alerting>` using the
:metric:`minio_cluster_nodes_offline_total` metric to detect whether one or more
:ref:`alert <minio-metrics-and-alerts>` using the
``minio_cluster_nodes_offline_total`` metric to detect whether one or more
MinIO nodes are offline.
Cluster Maintenance Check
@ -125,6 +125,5 @@ The healthcheck probe alone cannot determine if a MinIO server is offline - only
whether enough MinIO servers will be online after taking the node down for
maintenance to meet read and write quorum requirements based on the configured
:ref:`erasure code parity <minio-ec-parity>`. Consider configuring a Prometheus
:ref:`alert <minio-metrics-and-alerts-alerting>` using the
:metric:`minio_cluster_nodes_offline_total` metric to detect whether one or more
:ref:`alert <minio-metrics-and-alerts>` using the ``minio_cluster_nodes_offline_total`` metric to detect whether one or more
MinIO nodes are offline.
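To illustrate the check such an alert performs, the following Python sketch reads ``minio_cluster_nodes_offline_total`` from Prometheus text-format output and flags any non-zero value. The sample text, its label values, and the helper names are hypothetical. A production setup would express this as a Prometheus alert rule rather than client-side parsing.

```python
SAMPLE_METRICS = """\
# HELP minio_cluster_nodes_offline_total Total number of MinIO nodes offline.
# TYPE minio_cluster_nodes_offline_total gauge
minio_cluster_nodes_offline_total{server="minio1:9000"} 1
minio_cluster_nodes_online_total{server="minio1:9000"} 3
"""


def metric_values(exposition_text, metric_name):
    """Collect the sample values reported for metric_name."""
    values = []
    for line in exposition_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The metric name ends at the label block or at the first space.
        name = line.split("{", 1)[0].split(" ", 1)[0]
        if name != metric_name:
            continue
        # The value is the first field after the (optional) label block.
        rest = line.rsplit("}", 1)[1] if "}" in line else line.split(" ", 1)[1]
        values.append(float(rest.split()[0]))
    return values


def nodes_offline(exposition_text):
    """Return True if any sample reports one or more offline nodes."""
    return any(v > 0 for v in metric_values(exposition_text, "minio_cluster_nodes_offline_total"))


print(nodes_offline(SAMPLE_METRICS))
```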

View File

@ -68,571 +68,553 @@ Specifically, the MinIO Console uses :prometheus-docs:`Prometheus query API <pro
- Set :envvar:`MINIO_PROMETHEUS_URL` to the URL of the Prometheus service
- Set :envvar:`MINIO_PROMETHEUS_JOB_ID` to the unique job ID assigned to the collected metrics
MinIO also publishes a `Grafana Dashboard <https://grafana.com/grafana/dashboards/13502>`_ for visualizing collected metrics.
For more complete documentation on configuring a Prometheus-compatible data source for Grafana, see :prometheus-docs:`Grafana Support for Prometheus <visualization/grafana/>`.
MinIO Grafana Dashboard
-----------------------
MinIO also publishes two :ref:`Grafana Dashboards <minio-grafana>` for visualizing collected metrics.
For more complete documentation on configuring a Prometheus-compatible data source for Grafana, see the :prometheus-docs:`Prometheus documentation on Grafana Support <visualization/grafana/>`.
.. _minio-metrics-and-alerts-available-metrics:
Available Metrics
-----------------
MinIO publishes the following metrics, where each metric includes a label for
the MinIO server which generated that metric.
MinIO publishes a number of metrics at the cluster, node, or bucket levels.
Each metric includes a label for the MinIO server which generated that metric.
Object and Bucket Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~
.. versionchanged:: MinIO RELEASE.2023-07-21T21-12-44Z
.. metric:: minio_bucket_objects_size_distribution
Bucket metrics have moved to use their own, separate endpoint.
Distribution of object sizes in a given bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
- :ref:`Cluster Metrics <minio_available_cluster_metrics>`
- :ref:`Node Metrics <minio_available_node_metrics>`
- :ref:`Bucket Metrics <minio_available_bucket_metrics>`
.. metric:: minio_bucket_objects_version_distribution
.. _minio_available_cluster_metrics:
Distribution of number of versions per object in a given bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
.. metric:: minio_bucket_usage_object_total
Total number of objects in a given bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
.. metric:: minio_bucket_usage_total_bytes
Total bucket size in bytes in a given bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
.. metric:: minio_bucket_quota_total_bytes
Total bucket quota size in bytes.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
.. metric:: minio_bucket_usage_version_total
Total number of object versions contained in a bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
Replication Metrics
~~~~~~~~~~~~~~~~~~~
These metrics are only populated for MinIO clusters with
:ref:`minio-bucket-replication-serverside` enabled.
.. metric:: minio_bucket_replication_failed_bytes
Total number of bytes that failed at least once to replicate for a given bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label
.. metric:: minio_bucket_replication_latency
Replication latency in milliseconds.
.. metric:: minio_bucket_replication_pending_bytes
Total number of bytes pending to replicate for a given bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label
.. metric:: minio_bucket_replication_received_bytes
Total number of bytes replicated to this bucket from another source bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
.. metric:: minio_bucket_replication_sent_bytes
Total number of bytes replicated to the target bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
.. metric:: minio_bucket_replication_pending_count
Total number of replication operations pending for a given bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
.. metric:: minio_bucket_replication_failed_count
Total number of replication operations failed for a given bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
.. _minio-metrics-and-alerts-capacity:
Capacity Metrics
~~~~~~~~~~~~~~~~
.. metric:: minio_cluster_capacity_raw_free_bytes
Total free capacity online in the cluster.
.. metric:: minio_cluster_capacity_raw_total_bytes
Total capacity online in the cluster.
.. metric:: minio_cluster_capacity_usable_free_bytes
Total free usable capacity online in the cluster.
.. metric:: minio_cluster_capacity_usable_total_bytes
Total usable capacity online in the cluster.
.. metric:: minio_node_disk_free_bytes
Total storage available on a specific drive for a node in the MinIO deployment.
You can identify the drive and node using the ``{ disk="/path/to/disk",server="STRING"}`` labels respectively.
.. metric:: minio_node_disk_total_bytes
Total storage on a specific drive for a node in the MinIO deployment.
You can identify the drive and node using the ``{ disk="/path/to/disk",server="STRING"}`` labels respectively.
.. metric:: minio_node_disk_used_bytes
Total storage used on a specific drive for a node in a MinIO deployment.
You can identify the drive and node using the ``{ disk="/path/to/disk",server="STRING"}`` labels respectively.
.. _minio-metrics-and-alerts-lifecycle-management:
Lifecycle Management Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. metric:: minio_cluster_ilm_transitioned_bytes
Total number of bytes transitioned using :ref:`tiering/transition lifecycle management rules <minio-lifecycle-management-tiering>`
.. metric:: minio_cluster_ilm_transitioned_objects
Total number of objects transitioned using :ref:`tiering/transition lifecycle management rules <minio-lifecycle-management-tiering>`
.. metric:: minio_cluster_ilm_transitioned_versions
Total number of non-current object versions transitioned using :ref:`tiering/transition lifecycle management rules <minio-lifecycle-management-tiering>`
.. metric:: minio_node_ilm_transition_pending_tasks
Total number of pending :ref:`object transition <minio-lifecycle-management-tiering>` tasks
.. metric:: minio_node_ilm_transition_active_tasks
Number of active ILM transition tasks
.. metric:: minio_node_ilm_expiry_pending_tasks
Total number of pending :ref:`object expiration <minio-lifecycle-management-expiration>` tasks
.. metric:: minio_node_ilm_expiry_active_tasks
Total number of active :ref:`object expiration <minio-lifecycle-management-expiration>` tasks
.. metric:: minio_node_ilm_versions_scanned
Total number of object versions checked for ilm actions since server start
Node and Drive Health Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. metric:: minio_cluster_disk_online_total
The total number of drives online
.. metric:: minio_cluster_disk_offline_total
The total number of drives offline
.. metric:: minio_cluster_disk_total
The total number of drives
.. metric:: minio_cluster_nodes_offline_total
Total number of MinIO nodes offline.
.. metric:: minio_cluster_nodes_online_total
Total number of MinIO nodes online.
.. metric:: minio_node_disk_free_inodes
Total free inodes.
.. metric:: minio_node_disk_latency_us
Average last minute latency in µs for drive API storage operations.
.. metric:: minio_node_disk_offline_total
Total drives offline.
.. metric:: minio_node_disk_online_total
Total drives online.
.. metric:: minio_node_disk_total
Total drives.
.. metric:: minio_heal_objects_errors_total
Objects for which healing failed in current self healing run
.. metric:: minio_heal_objects_heal_total
Objects healed in current self healing run
.. metric:: minio_heal_objects_total
Objects scanned in current self healing run
.. metric:: minio_heal_time_last_activity_nano_seconds
Time elapsed (in nano seconds) since last self healing activity. This is set
to -1 until initial self heal
.. metric:: minio_node_storage_class_standard_parity
The configured value of :envvar:`MINIO_STORAGE_CLASS_STANDARD`.
Use this to alert for changes to the Standard :ref:`erasure parity <minio-erasure-coding>`.
.. metric:: minio_node_storage_class_rrs_parity
The configured value of :envvar:`MINIO_STORAGE_CLASS_RRS`.
Use this to alert for changes to the Reduced :ref:`erasure parity <minio-erasure-coding>`.
Notification Queue Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~
.. metric:: minio_audit_target_queue_length
Total number of unsent audit messages in the queue.
.. metric:: minio_audit_total_messages
Total number of audit messages sent since last server start.
.. metric:: minio_audit_failed_messages
Total number of audit messages which failed to send since last server start.
.. metric:: minio_notify_current_send_in_progress
Total number of notification messages in progress to configured targets.
.. metric:: minio_notify_target_queue_length
Total number of unsent notification messages in the queue.
.. _minio-metrics-and-alerts-scanner:
Scanner Metrics
Cluster Metrics
~~~~~~~~~~~~~~~
.. metric:: minio_node_scanner_bucket_scans_finished
Each metric includes the following labels:
Total number of bucket scans finished since server start.
- Server that generated the metric.
- Server that calculated the metric.
.. metric:: minio_node_scanner_bucket_scans_started
These metrics can be obtained from any MinIO server once per collection.
Total number of bucket scans started since server start.
Audit Metrics
+++++++++++++
.. metric:: minio_node_scanner_directories_scanned
``minio_audit_failed_messages``
Total number of messages that failed to send since start.
Total number of directories scanned since server start.
``minio_audit_target_queue_length``
Number of unsent messages in queue for target.
.. metric:: minio_node_scanner_objects_scanned
``minio_audit_total_messages``
Total number of messages sent since start.
Total number of unique objects scanned since server start.
Cluster Capacity Metrics
++++++++++++++++++++++++
.. metric:: minio_node_scanner_versions_scanned
``minio_cluster_capacity_raw_free_bytes``
Total free capacity online in the cluster.
Total number of object versions scanned since server start.
``minio_cluster_capacity_raw_total_bytes``
Total capacity online in the cluster.
.. metric:: minio_node_syscall_read_total
``minio_cluster_capacity_usable_free_bytes``
Total free usable capacity online in the cluster.
Total number of read SysCalls to the kernel. ``/proc/[pid]/io syscr``
``minio_cluster_capacity_usable_total_bytes``
Total usable capacity online in the cluster.
.. metric:: minio_node_syscall_write_total
Cluster Usage Metrics
+++++++++++++++++++++
Total number of write SysCalls to the kernel. ``/proc/[pid]/io syscw``
``minio_cluster_objects_size_distribution``
Distribution of object sizes across a cluster.
.. metric:: minio_usage_last_activity_nano_seconds
``minio_cluster_objects_version_distribution``
Distribution of object versions across a cluster.
Time elapsed since last scan activity.
This is set to ``0`` until first scan cycle.
``minio_cluster_usage_object_total``
Total number of objects in a cluster.
S3 Metrics
~~~~~~~~~~
``minio_cluster_usage_total_bytes``
Total cluster usage in bytes.
.. metric:: minio_bucket_traffic_sent_bytes
``minio_cluster_usage_version_total``
Total number of versions (includes delete marker) in a cluster.
Total number of bytes of S3 traffic sent per bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
``minio_cluster_usage_deletemarker_total``
Total number of delete markers in a cluster.
.. metric:: minio_bucket_traffic_received_bytes
Total number of bytes of S3 traffic received per bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
``minio_cluster_buckets_total``
Total number of buckets in the cluster.
.. metric:: minio_s3_requests_incoming_total
Drive Metrics
+++++++++++++
``minio_cluster_disk_offline_total``
Total drives offline.
``minio_cluster_disk_online_total``
Total drives online.
``minio_cluster_disk_total``
Total drives.
ILM Metrics
+++++++++++
``minio_cluster_ilm_transitioned_bytes``
Total bytes transitioned to a tier.
``minio_cluster_ilm_transitioned_objects``
Total number of objects transitioned to a tier.
``minio_cluster_ilm_transitioned_versions``
Total number of versions transitioned to a tier.
``minio_node_ilm_expiry_active_tasks``
Total number of active :ref:`object expiration <minio-lifecycle-management-expiration>` tasks.
Key Management Metrics
++++++++++++++++++++++
``minio_cluster_kms_online``
Reports whether the KMS is online (1) or offline (0).
``minio_cluster_kms_request_error``
Number of KMS requests that failed due to some error.
(HTTP 4xx status code).
``minio_cluster_kms_request_failure``
Number of KMS requests that failed due to some internal failure.
(HTTP 5xx status code).
``minio_cluster_kms_request_success``
Number of KMS requests that succeeded.
``minio_cluster_kms_uptime``
The time the KMS has been up and running in seconds.
Cluster Health Metrics
++++++++++++++++++++++
``minio_cluster_nodes_offline_total``
Total number of MinIO nodes offline.
``minio_cluster_nodes_online_total``
Total number of MinIO nodes online.
``minio_cluster_write_quorum``
Maximum write quorum across all pools and sets.
``minio_cluster_health_status``
Get current cluster health status.
``minio_heal_objects_errors_total``
Objects for which healing failed in current self healing run.
``minio_heal_objects_heal_total``
Objects healed in current self healing run.
``minio_heal_objects_total``
Objects scanned in current self healing run.
``minio_heal_time_last_activity_nano_seconds``
Time elapsed (in nano seconds) since last self healing activity.
``minio_minio_update_percent``
Total percentage cache usage.
``minio_software_commit_info``
Git commit hash for the MinIO release.
``minio_software_version_info``
MinIO Release tag for the server.
``minio_usage_last_activity_nano_seconds``
Time elapsed (in nano seconds) since last scan activity.
Inter Node Metrics
++++++++++++++++++
``minio_inter_node_traffic_dial_avg_time``
Average time of internode TCP dial calls.
``minio_inter_node_traffic_dial_errors``
Total number of internode TCP dial timeouts and errors.
``minio_inter_node_traffic_errors_total``
Total number of failed internode calls.
``minio_inter_node_traffic_received_bytes``
Total number of bytes received from other peer nodes.
``minio_inter_node_traffic_sent_bytes``
Total number of bytes sent to the other peer nodes.
S3 Request Metrics
++++++++++++++++++
``minio_s3_requests_4xx_errors_total``
Total number of S3 requests with (4xx) errors.
``minio_s3_requests_5xx_errors_total``
Total number of S3 requests with (5xx) errors.
``minio_s3_requests_canceled_total``
Total number of S3 requests canceled by the client.
``minio_s3_requests_errors_total``
Total number of S3 requests with (4xx and 5xx) errors.
``minio_s3_requests_incoming_total``
Volatile number of total incoming S3 requests.
.. metric:: minio_s3_requests_canceled_total
Total number S3 requests that were canceled from the client while processing.
.. metric:: minio_s3_requests_inflight_total
``minio_s3_requests_inflight_total``
Total number of S3 requests currently in flight.
.. metric:: minio_s3_requests_total
Total number of S3 requests.
.. metric:: minio_s3_requests_rejected_auth_total
``minio_s3_requests_rejected_auth_total``
Total number of S3 requests rejected for auth failure.
.. metric:: minio_s3_requests_rejected_header_total
``minio_s3_requests_rejected_header_total``
Total number of S3 requests rejected for invalid header.
.. metric:: minio_s3_requests_rejected_invalid_total
``minio_s3_requests_rejected_invalid_total``
Total number of invalid S3 requests.
.. metric:: minio_s3_requests_rejected_timestamp_total
``minio_s3_requests_rejected_timestamp_total``
Total number of S3 requests rejected for invalid timestamp.
.. metric:: minio_s3_requests_waiting_total
``minio_s3_requests_total``
Total number of S3 requests.
``minio_s3_requests_waiting_total``
Number of S3 requests in the waiting queue.
.. metric:: minio_s3_time_ttfb_seconds_distribution
``minio_s3_requests_ttfb_seconds_distribution``
Distribution of the time to first byte across API calls.
.. metric:: minio_s3_traffic_received_bytes
``minio_s3_traffic_received_bytes``
Total number of s3 bytes received.
Total number of S3 bytes received.
``minio_s3_traffic_sent_bytes``
Total number of s3 bytes sent.
.. metric:: minio_s3_traffic_sent_bytes
Lock Metrics
++++++++++++
Total number of S3 bytes sent.
``minio_locks_total``
Total number of current locks on the peer.
.. metric:: minio_s3_requests_errors_total
``minio_locks_write_total``
Number of current WRITE locks on the peer.
.. versionchanged:: MinIO RELEASE.2023-04-28T18-11-17Z
``minio_locks_read_total``
Number of current READ locks on the peer.
This metric has been removed.
Use ``minio_s3_requests_4xx_errors_total`` and ``minio_s3_requests_5xx_errors_total`` instead.
Webhook Metrics
+++++++++++++++
Total number of S3 requests with 4xx and 5xx errors.
``minio_cluster_webhook_failed_messages``
Number of messages that failed to send.
.. metric:: minio_s3_requests_4xx_errors_total
``minio_cluster_webhook_online``
Reports whether the webhook endpoint is online (1) or offline (0).
Total number of S3 requests with 4xx errors.
``minio_cluster_webhook_queue_length``
Number of messages in the webhook queue.
.. metric:: minio_s3_requests_5xx_errors_total
``minio_cluster_webhook_total_messages``
Number of messages sent to this webhook endpoint.
Total number of S3 requests with 5xx errors.
IAM Metrics
~~~~~~~~~~~
.. _minio_available_node_metrics:
.. metric:: minio_node_iam_last_sync_duration_millis
Node Metrics
~~~~~~~~~~~~
Each metric includes the following labels:
- Server that generated the metric.
- Server that calculated the metric.
These metrics can be obtained from any MinIO server once per collection.
Drive Metrics
+++++++++++++
``minio_node_disk_free_bytes``
Total storage available on a drive.
``minio_node_disk_free_inodes``
Total free inodes.
``minio_node_disk_latency_us``
Average last minute latency in µs for drive API storage operations.
``minio_node_disk_offline_total``
Total drives offline.
``minio_node_disk_online_total``
Total drives online.
``minio_node_disk_total``
Total drives.
``minio_node_disk_total_bytes``
Total storage on a drive.
``minio_node_disk_used_bytes``
Total storage used on a drive.
File Metrics
++++++++++++
``minio_node_file_descriptor_limit_total``
Limit on total number of open file descriptors for the MinIO Server process.
``minio_node_file_descriptor_open_total``
Total number of open file descriptors by the MinIO Server process.
Go Metrics
++++++++++
``minio_node_go_routine_total``
Total number of go routines running.
Access Management (IAM) Metrics
+++++++++++++++++++++++++++++++
``minio_node_iam_last_sync_duration_millis``
Last successful IAM data sync duration in milliseconds.
.. metric:: minio_node_iam_since_last_sync_millis
``minio_node_iam_since_last_sync_millis``
Time (in milliseconds) since last successful IAM data sync.
This value starts at zero and only increments after the first sync after server start.
.. metric:: minio_node_iam_sync_failures
``minio_node_iam_sync_failures``
Number of failed IAM data syncs since server start.
.. metric:: minio_node_iam_sync_successes
``minio_node_iam_sync_successes``
Number of successful IAM data syncs since server start.
Lifecycle Management (ILM) Metrics
++++++++++++++++++++++++++++++++++
``minio_node_ilm_expiry_pending_tasks``
Number of pending ILM expiry tasks in the queue.
``minio_node_ilm_transition_active_tasks``
Number of active ILM transition tasks.
``minio_node_ilm_transition_pending_tasks``
Number of pending ILM transition tasks in the queue.
``minio_node_ilm_versions_scanned``
Total number of object versions checked for ILM actions since server start.
I/O Metrics
+++++++++++
``minio_node_io_rchar_bytes``
Total bytes read by the process from the underlying storage system including cache, ``/proc/[pid]/io`` rchar.
``minio_node_io_read_bytes``
Total bytes read by the process from the underlying storage system, ``/proc/[pid]/io`` read_bytes.
``minio_node_io_wchar_bytes``
Total bytes written by the process to the underlying storage system including page cache, ``/proc/[pid]/io`` wchar.
``minio_node_io_write_bytes``
Total bytes written by the process to the underlying storage system, ``/proc/[pid]/io`` write_bytes.
Process Metrics
+++++++++++++++
``minio_node_process_cpu_total_seconds``
Total user and system CPU time spent in seconds.
``minio_node_process_resident_memory_bytes``
Resident memory size in bytes.
``minio_node_process_starttime_seconds``
Start time for MinIO process per node, time in seconds since Unix epoch.
``minio_node_process_uptime_seconds``
Uptime for MinIO process per node in seconds.
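Because ``minio_node_process_starttime_seconds`` is expressed in seconds since the Unix epoch, it converts directly to a timestamp.
The following sketch shows the conversion; deriving an uptime from the start time and a current time is an assumption for illustration, not a statement of how MinIO computes ``minio_node_process_uptime_seconds``.

.. code-block:: python
   :class: copyable

   from datetime import datetime, timezone


   def start_time_utc(starttime_seconds):
       """Convert an epoch-seconds start time into a timezone-aware datetime."""
       return datetime.fromtimestamp(starttime_seconds, tz=timezone.utc)


   def uptime_seconds(starttime_seconds, now_seconds):
       """Seconds elapsed since process start, never negative."""
       return max(0.0, now_seconds - starttime_seconds)


   # Hypothetical metric value.
   started = 1_692_280_906
   print(start_time_utc(started).isoformat())
   print(uptime_seconds(started, started + 3600))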
Scanner Metrics
+++++++++++++++
``minio_node_scanner_bucket_scans_finished``
Total number of bucket scans finished since server start.
``minio_node_scanner_bucket_scans_started``
Total number of bucket scans started since server start.
``minio_node_scanner_directories_scanned``
Total number of directories scanned since server start.
``minio_node_scanner_objects_scanned``
Total number of unique objects scanned since server start.
``minio_node_scanner_versions_scanned``
Total number of object versions scanned since server start.
Read and Write Metrics
++++++++++++++++++++++
``minio_node_syscall_read_total``
Total read SysCalls to the kernel.
``/proc/[pid]/io`` syscr.
``minio_node_syscall_write_total``
Total write SysCalls to the kernel.
``/proc/[pid]/io`` syscw.
Notification Metrics
++++++++++++++++++++
``minio_notify_current_send_in_progress``
Number of concurrent async Send calls active to all targets.
``minio_notify_target_queue_length``
Number of unsent notifications in queue for target.
IAM Plugin Metrics
~~~~~~~~~~~~~~~~~~
++++++++++++++++++
.. note::
The metrics in this section require that you have configured the :ref:`MinIO External Identity Management Plugin <minio-external-identity-management-plugin>`.
.. metric:: minio_node_iam_plugin_authn_service_last_succ_seconds
``minio_node_iam_plugin_authn_service_last_succ_seconds``
Time (in seconds) since last successful request to the external IDP service.
.. metric:: minio_node_iam_plugin_authn_service_last_fail_seconds
``minio_node_iam_plugin_authn_service_last_fail_seconds``
Time (in seconds) since last failed request to the external IDP service.
.. metric:: minio_node_iam_plugin_authn_service_total_requests_minute
``minio_node_iam_plugin_authn_service_total_requests_minute``
Total requests count to the external IDP service in the last full minute.
.. metric:: minio_node_iam_plugin_authn_service_failed_requests_minute
``minio_node_iam_plugin_authn_service_failed_requests_minute``
Count of the failed requests to the external IDP service in the last full minute.
.. metric:: minio_node_iam_plugin_authn_service_succ_avg_rtt_ms_minute
``minio_node_iam_plugin_authn_service_succ_avg_rtt_ms_minute``
Average round trip time (RTT) of successful requests to the IDP service in the last full minute.
.. metric:: minio_node_iam_plugin_authn_service_succ_max_rtt_ms_minute
``minio_node_iam_plugin_authn_service_succ_max_rtt_ms_minute``
Maximum round trip time (RTT) of successful requests to the IDP service in the last full minute.
Internal Metrics
~~~~~~~~~~~~~~~~
.. metric:: minio_inter_node_traffic_received_bytes
.. _minio_available_bucket_metrics:
Total number of bytes received from other peer nodes.
Bucket Metrics
~~~~~~~~~~~~~~
.. metric:: minio_inter_node_traffic_sent_bytes
Each bucket metric includes the following labels:
Total number of bytes sent to the other peer nodes.
- The server that calculated the metric.
- The server that generated the metric.
- The bucket the metric is for.
.. metric:: minio_inter_node_traffic_dial_avg_time
These metrics can be obtained from any MinIO server once per collection.
Average time of internode TCP dial calls.
Distribution Metrics
++++++++++++++++++++
.. metric:: minio_inter_node_traffic_dial_errors
``minio_bucket_objects_size_distribution``
Distribution of object sizes in the bucket, includes label for the bucket name.
Total number of internode TCP dial timeouts and errors.
``minio_bucket_objects_version_distribution``
Distribution of object sizes in a bucket, by number of versions.
.. versionadded:: MinIO RELEASE.2023-04-28T18-11-17Z
``minio_bucket_quota_total_bytes``
Total bucket quota size in bytes.
This metric is available on the MinIO Dashboard if :ref:`Prometheus <minio-metrics-collect-using-prometheus>` and Grafana are enabled.
Replication Metrics
+++++++++++++++++++
.. metric:: minio_inter_node_traffic_errors_total
.. note::
Total number of failed internode calls.
The metrics for bucket replication only populate for MinIO clusters with :ref:`minio-bucket-replication-serverside` enabled.
.. metric:: minio_node_file_descriptor_limit_total
``minio_bucket_replication_failed_count``
Total number of objects which failed replication.
Limit on total number of open file descriptors for the MinIO Server process.
``minio_bucket_replication_latency_ms``
Replication latency in milliseconds.
.. metric:: minio_node_file_descriptor_open_total
``minio_bucket_replication_received_bytes``
Total number of bytes replicated to this bucket from another source bucket.
Total number of open file descriptors by the MinIO Server process.
``minio_bucket_replication_sent_bytes``
Total number of bytes replicated to the target bucket.
.. metric:: minio_node_io_rchar_bytes
``minio_bucket_replication_failed_bytes``
Total number of bytes that failed at least once to replicate for a given bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
Total bytes read by the process from the underlying storage system including
cache, ``/proc/[pid]/io rchar``
``minio_bucket_replication_pending_bytes``
Total number of bytes pending to replicate for a given bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
.. metric:: minio_node_io_read_bytes
``minio_bucket_replication_pending_count``
Total number of replication operations pending for a given bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
Total bytes read by the process from the underlying storage system,
``/proc/[pid]/io read_bytes``
Traffic Metrics
+++++++++++++++
.. metric:: minio_node_io_wchar_bytes
``minio_bucket_traffic_received_bytes``
Total number of S3 bytes received for this bucket.
Total bytes written by the process to the underlying storage system including
page cache, ``/proc/[pid]/io wchar``
``minio_bucket_traffic_sent_bytes``
Total number of S3 bytes sent for this bucket.
.. metric:: minio_node_io_write_bytes
Usage Metrics
+++++++++++++
Total bytes written by the process to the underlying storage system,
``/proc/[pid]/io write_bytes``
``minio_bucket_usage_object_total``
Total number of objects.
Key Management System (KMS) Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
``minio_bucket_usage_version_total``
Total number of versions (includes delete markers).
.. metric:: minio_cluster_kms_online
``minio_bucket_usage_deletemarker_total``
Total number of delete markers.
Reports whether the KMS is online (1) or offline (0).
``minio_bucket_usage_total_bytes``
Total bucket size in bytes.
.. metric:: minio_cluster_kms_request_error
Requests Metrics
++++++++++++++++
Number of KMS requests that failed due to some error (HTTP 4xx status code).
``minio_bucket_requests_4xx_errors_total``
Total number of S3 requests with (4xx) errors on a bucket.
.. metric:: minio_cluster_kms_request_failure
``minio_bucket_requests_5xx_errors_total``
Total number of S3 requests with (5xx) errors on a bucket.
Number of KMS requests that failed due to some internal failure (HTTP 5xx status code).
``minio_bucket_requests_inflight_total``
Total number of S3 requests currently in flight on a bucket.
.. metric:: minio_cluster_kms_request_success
``minio_bucket_requests_total``
Total number of S3 requests on a bucket.
Number of KMS requests that succeeded.
``minio_bucket_requests_canceled_total``
Total number of S3 requests canceled by the client.
.. metric:: minio_cluster_kms_uptime
The time the KMS has been up and running in seconds.
Software and Process Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. metric:: minio_software_commit_info
Git commit hash for the MinIO release.
.. metric:: minio_software_version_info
MinIO Release tag for the server.
.. metric:: minio_node_go_routine_total
Total number of go routines running.
.. metric:: minio_node_process_starttime_seconds
Start time for MinIO process per node, time in seconds since Unix epoch.
.. metric:: minio_node_process_uptime_seconds
Uptime for MinIO process per node in seconds.
.. metric:: minio_node_process_cpu_total_seconds
Total user and system CPU time spent in seconds.
.. metric:: minio_node_process_resident_memory_bytes
Resident memory size in bytes.
Lock Metrics
~~~~~~~~~~~~
.. metric:: minio_locks_total
Total number of current locks on the peer.
.. metric:: minio_locks_write_total
Number of current WRITE locks on the peer.
.. metric:: minio_locks_read_total
Number of current READ locks on the peer.
.. _minio-metrics-and-alerts-webhook:
Webhook Metrics
~~~~~~~~~~~~~~~
.. metric:: minio_cluster_webhook_failed_messages
Number of messages that failed to send.
.. metric:: minio_cluster_webhook_online
Reports whether the webhook endpoint is online (1) or offline (0).
.. metric:: minio_cluster_webhook_queue_length
Number of messages in the webhook queue.
.. metric:: minio_cluster_webhook_total_messages
Number of messages sent to this webhook endpoint.
``minio_bucket_requests_ttfb_seconds_distribution``
Distribution of time to first byte across API calls per bucket.
.. toctree::
:titlesonly:


@ -94,7 +94,7 @@ Configure InfluxDB to Collect and Alert using MinIO Metrics
Use the :influxdb-docs:`DataExplorer <query-data/execute-queries/data-explorer/>` to visualize the collected MinIO data.
For example, you can set a filter on :metric:`minio_cluster_capacity_usable_total_bytes` and :metric:`minio_cluster_capacity_usable_free_bytes` to compare the total usable against total free space on the MinIO deployment.
For example, you can set a filter on ``minio_cluster_capacity_usable_total_bytes`` and ``minio_cluster_capacity_usable_free_bytes`` to compare the total usable against total free space on the MinIO deployment.
#. Configure a Check
@ -105,13 +105,13 @@ Configure InfluxDB to Collect and Alert using MinIO Metrics
- Create a :guilabel:`Threshold Check` named ``MINIO_NODE_DOWN``.
Set the filter for the :metric:`minio_cluster_nodes_offline_total` key.
Set the filter for the ``minio_cluster_nodes_offline_total`` key.
Set the :guilabel:`Thresholds` to :guilabel:`WARN` when the value is greater than :guilabel:`1`.
- Create a :guilabel:`Threshold Check` named ``MINIO_QUORUM_WARNING``.
Set the filter for the :metric:`minio_cluster_disk_offline_total` key.
Set the filter for the ``minio_cluster_disk_offline_total`` key.
Set the :guilabel:`Thresholds` to :guilabel:`CRITICAL` when the value is one less than your configured :ref:`Erasure Code Parity <minio-erasure-coding>` setting.
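The two checks above amount to simple threshold comparisons.
The sketch below expresses them as plain functions, assuming a hypothetical parity setting of ``4`` and using ``OK`` as a placeholder for the non-alerting state.
It treats the quorum check as firing at or above parity minus one, which is an assumption slightly broader than the exact-match wording of the check.

.. code-block:: python
   :class: copyable

   def node_down_level(nodes_offline, warn_above=1):
       """MINIO_NODE_DOWN: WARN when offline nodes exceed the threshold."""
       return "WARN" if nodes_offline > warn_above else "OK"


   def quorum_level(drives_offline, parity):
       """MINIO_QUORUM_WARNING: CRITICAL once offline drives reach parity - 1."""
       return "CRITICAL" if drives_offline >= parity - 1 else "OK"


   # Hypothetical readings of minio_cluster_nodes_offline_total and
   # minio_cluster_disk_offline_total, with an assumed parity of 4.
   print(node_down_level(2))
   print(quorum_level(3, parity=4))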


@ -43,7 +43,7 @@ Syntax
.. code-block:: shell
:class: copyable
mc admin prometheus generate TARGET
mc admin prometheus generate TARGET TYPE
The command accepts the following arguments:
@ -52,3 +52,11 @@ Syntax
The :mc:`alias <mc alias>` of a configured MinIO deployment for which
the command generates a Prometheus-compatible configuration file.
.. mc-cmd:: TYPE
The type of metrics to scrape.
Valid values are ``cluster``, ``node``, or ``bucket``.
If not specified, the command returns cluster metrics.
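As a small sketch of the argument rules described above, the following helper assembles the command line and validates the optional ``TYPE`` value.
The helper name and the ``myminio`` alias are hypothetical; the function only builds the argument list and does not invoke ``mc``.

.. code-block:: python
   :class: copyable

   VALID_TYPES = ("cluster", "node", "bucket")


   def prometheus_generate_args(alias, metric_type=None):
       """Build the argument list for ``mc admin prometheus generate``."""
       args = ["mc", "admin", "prometheus", "generate", alias]
       if metric_type is not None:
           if metric_type not in VALID_TYPES:
               raise ValueError(f"TYPE must be one of {VALID_TYPES}, got {metric_type!r}")
           args.append(metric_type)
       # With no TYPE given, the command returns cluster metrics.
       return args


   print(" ".join(prometheus_generate_args("myminio", "bucket")))

On a host with ``mc`` installed and the alias configured, the returned list could be passed to ``subprocess.run``.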


@ -601,7 +601,7 @@ logging. See :ref:`minio-metrics-and-alerts` for more information.
.. envvar:: MINIO_PROMETHEUS_AUTH_TYPE
Specifies the authentication mode for the Prometheus
:ref:`scraping endpoints <minio-metrics-and-alerts-endpoints>`.
:ref:`scraping endpoints <minio-metrics-and-alerts>`.
- ``jwt`` - *Default* MinIO requires that the scraping client specify a JWT
token for authenticating requests. Use