
Major overhaul for Monitoring docs: Part 1

Ravind Kumar
2022-10-20 17:39:54 -04:00
committed by Ravind Kumar
parent 1735b77d8f
commit 4e4cc97f45
8 changed files with 607 additions and 443 deletions

View File

@ -283,6 +283,8 @@ Some subsections may not be visible if the authenticated user does not have the
Use the :guilabel:`Users` and :guilabel:`Groups` views to assign a created policy to users and groups, respectively.
.. _minio-console-monitoring:
Monitoring
----------
@ -295,25 +297,23 @@ Some subsections may not be visible if the authenticated user does not have the
.. tab-item:: Metrics
.. image:: /images/minio-console/console-metrics.png
.. image:: /images/minio-console/console-metrics-simple.png
:width: 600px
:alt: MinIO Console Metrics displaying detailed data using Prometheus
:alt: MinIO Console Metrics displaying point-in-time data
:align: center
The Console :guilabel:`Dashboard` section displays metrics for the MinIO deployment.
The default view provides a high-level overview of the deployment status, including the uptime and availability of individual servers and drives.
The Console depends on a :ref:`configured Prometheus service <minio-metrics-collect-using-prometheus>` to generate the detailed metrics shown above.
The Console also supports displaying time-series and historical data by querying a :prometheus-docs:`Prometheus <prometheus/latest/getting_started/>` service configured to scrape data from the MinIO deployment.
Specifically, the MinIO Console uses :prometheus-docs:`Prometheus query API <prometheus/latest/querying/api/>` to retrieve stored metrics data and display historical metrics:
The default metrics view provides a high-level overview of the deployment status, including the uptime and availability of individual servers and drives.
.. image:: /images/minio-console/console-metrics-simple.png
.. image:: /images/minio-console/console-metrics.png
:width: 600px
:alt: MinIO Console Metrics displaying simplified data
:align: center
This view requires configuring a Prometheus service to scrape the deployment metrics.
You can download these metrics as a ``.png`` image or ``.csv`` file.
See :ref:`minio-metrics-collect-using-prometheus` for complete instructions.
See :ref:`minio-console-metrics` for more information on the historical metric visualization.
.. tab-item:: Logs

View File

@ -79,6 +79,7 @@ extlinks = {
'podman-git' : ('https://github.com/containers/podman/%s',''),
'docker-docs' : ('https://docs.docker.com/%s', ''),
'openshift-docs' : ('https://docs.openshift.com/container-platform/4.11/%s', ''),
'influxdb-docs' : ('https://docs.influxdata.com/influxdb/v2.4/%s',''),
}

Binary file not shown (image changed: 225 KiB before, 126 KiB after).

Binary file not shown (image changed: 197 KiB before, 196 KiB after).

View File

@ -1,5 +1,5 @@
=====================
Prometheus Monitoring
Monitoring and Alerts
=====================
.. default-domain:: minio
@ -12,22 +12,27 @@ Metrics and Alerts
------------------
MinIO provides point-in-time metrics on cluster status and operations.
MinIO publishes collected metrics data using Prometheus-compatible data structures.
The :ref:`MinIO Console <minio-console-metrics>` provides a graphical display of these metrics.
For alerts, time-series metric data, or additional metrics, MinIO can leverage `Prometheus <https://prometheus.io/>`__.
Prometheus is an open source systems and service monitoring platform that supports analyzing and alerting based on collected metrics.
The Prometheus ecosystem includes multiple :prometheus-docs:`integrations <operating/integrations/>`, allowing wide latitude in processing and storing collected metrics.
For historical metrics and analytics, MinIO publishes cluster and node metrics using the :prometheus-docs:`Prometheus Data Model <data_model/>`.
You can use any scraping tool which supports that data model to pull metrics data from MinIO for further analysis and alerting.
- MinIO publishes Prometheus-compatible scraping endpoints for cluster and node-level metrics.
Any Prometheus-compatible scraping software can ingest and process MinIO metrics for analysis, visualization, and alerting.
See :ref:`minio-metrics-and-alerts-endpoints` for more information.
The following table lists tutorials for integrating MinIO metrics with select third-party monitoring software.
- For alerts, use Prometheus :prometheus-docs:`Alerting Rules <prometheus/latest/configuration/alerting_rules/>` and the
:prometheus-docs:`Alert Manager <alerting/latest/overview/>` to trigger alerts based on collected metrics.
See :ref:`minio-metrics-and-alerts-alerting` for more information.
.. list-table::
:stub-columns: 1
:widths: 30 70
:width: 100%
When configured, the :ref:`MinIO Console <minio-console-metrics>` shows some metrics in the :guilabel:`Monitoring > Metrics` page.
You can download these metrics as either ``.png`` images or ``.csv`` files.
* - :ref:`minio-metrics-collect-using-prometheus`
- Configure Prometheus to Monitor and Alert for a MinIO deployment.
Configure MinIO to query the Prometheus deployment to enable historical metrics via the MinIO Console.
* - :ref:`minio-metrics-influxdb`
- Configure InfluxDB to Monitor and Alert for a MinIO deployment.
Other metrics and analytics software that supports the Prometheus data model may also work, even if not included in the table above.
Logging
-------
@ -58,6 +63,6 @@ See :ref:`minio-healthcheck-api` for more information.
:titlesonly:
:hidden:
/operations/monitoring/collect-minio-metrics-using-prometheus
/operations/monitoring/metrics-and-alerts
/operations/monitoring/minio-logging
/operations/monitoring/healthcheck-probe

View File

@ -1,9 +1,8 @@
.. _minio-metrics-collect-using-prometheus:
.. _minio-metrics-and-alerts:
======================================
Collect MinIO Metrics Using Prometheus
======================================
========================================
Monitoring and Alerting using Prometheus
========================================
.. default-domain:: minio
@ -11,60 +10,46 @@ Collect MinIO Metrics Using Prometheus
:local:
:depth: 1
MinIO leverages `Prometheus <https://prometheus.io/>`__ for metrics and alerts.
MinIO publishes Prometheus-compatible scraping endpoints for cluster and
node-level metrics. See :ref:`minio-metrics-and-alerts-endpoints` for more
information.
MinIO publishes cluster and node metrics using the :prometheus-docs:`Prometheus Data Model <data_model/>`.
The procedure on this page documents the following:
The procedure on this page documents scraping the MinIO metrics
endpoints using a Prometheus instance, including deploying and configuring
a simple Prometheus server for collecting metrics.
- Configuring a Prometheus service to scrape and display metrics from a MinIO deployment
- Configuring an Alert Rule on a MinIO Metric to trigger an AlertManager action
This procedure is not a replacement for the official
:prometheus-docs:`Prometheus Documentation <>`. Any specific guidance
related to configuring, deploying, and using Prometheus is made on a best-effort
basis.
.. admonition:: Prerequisites
:class: note
Requirements
------------
This procedure requires the following:
Install and Configure ``mc`` with Access to the MinIO Cluster
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- An existing Prometheus deployment with backing :prometheus-docs:`Alert Manager <alerting/latest/overview/>`
This procedure uses :mc:`mc` for performing operations on the MinIO
deployment. Install ``mc`` on a machine with network access to the
deployment. See the ``mc`` :ref:`Installation Quickstart <mc-install>` for
more complete instructions.
- An existing MinIO deployment with network access to the Prometheus deployment
Prometheus Service
~~~~~~~~~~~~~~~~~~
- An :mc:`mc` installation on your local host configured to :ref:`access <alias>` the MinIO deployment
This procedure provides instruction for deploying Prometheus for rapid local
evaluation and development. All other environments should have an existing
Prometheus or Prometheus-compatible service with access to the MinIO cluster.
.. cond:: k8s
Procedure
---------
The MinIO Operator supports deploying a :ref:`per-tenant Prometheus instance <create-tenant-configure-section>` configured to support metrics and visualizations.
This includes automatically configuring the Tenant to enable the :ref:`Tenant Console historical metric view <minio-console-metrics>`.
1) Generate the Bearer Token
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You can still use this procedure to configure an external Prometheus service for supporting monitoring and alerting for a MinIO Tenant.
You must configure all necessary network control components, such as Ingress or a Load Balancer, to facilitate access between the Tenant and the Prometheus service.
This procedure assumes your local host machine can access the Tenant via :mc:`mc`.
MinIO by default requires authentication for requests made to the metrics
endpoints. While this step is not required for MinIO deployments started with
:envvar:`MINIO_PROMETHEUS_AUTH_TYPE` set to ``"public"``, you can still use the
command output for retrieving a Prometheus ``scrape_configs`` entry.
Configure Prometheus to Collect and Alert using MinIO Metrics
-------------------------------------------------------------
Use the :mc-cmd:`mc admin prometheus generate` command to generate a
JWT bearer token for use by Prometheus in making authenticated scraping
requests:
1) Generate the Scrape Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Use the :mc-cmd:`mc admin prometheus generate` command to generate the scrape configuration for use by Prometheus in making scraping requests:
.. code-block:: shell
:class: copyable
mc admin prometheus generate ALIAS
Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the
:mc:`alias <mc alias>` of the MinIO deployment.
Replace :mc-cmd:`ALIAS <mc admin prometheus generate TARGET>` with the :mc:`alias <mc alias>` of the MinIO deployment.
The command returns output similar to the following:
@ -79,24 +64,22 @@ The command returns output similar to the following:
static_configs:
- targets: [minio.example.net]
The ``targets`` array can contain the hostname for any node in the deployment.
For clusters with a load balancer managing connections between MinIO nodes,
specify the address of the load balancer.
- Set the ``job_name`` to a value associated with the MinIO deployment.
Specify the output block to the
:prometheus-docs:`scrape_config
<prometheus/latest/configuration/configuration/#scrape_config>` section of
the Prometheus configuration.
Use a unique value to ensure isolation of the deployment metrics from any others collected by that Prometheus service.
2) Configure and Run Prometheus
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
- MinIO deployments started with :envvar:`MINIO_PROMETHEUS_AUTH_TYPE` set to ``"public"`` can omit the ``bearer_token`` field.
Follow the Prometheus :prometheus-docs:`Getting Started
<prometheus/latest/getting_started/#downloading-and-running-prometheus>` guide
to download and run Prometheus locally.
- Set the ``scheme`` to ``http`` for MinIO deployments not using TLS.
Append the ``scrape_configs`` job generated in the previous step to the
configuration file:
- Set the ``targets`` array with a hostname that resolves to the MinIO deployment.
This can be any single node, or a load balancer or proxy that handles connections to the MinIO nodes. A complete example entry appears in the sketch below.
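Putting these settings together, a complete ``scrape_configs`` entry typically resembles the following sketch. The hostname, port, and ``TOKEN`` value are placeholders; use the exact block emitted by :mc-cmd:`mc admin prometheus generate` for your deployment rather than hand-writing the token.
.. code-block:: yaml
:class: copyable
# Sketch only: replace TOKEN and the target hostname with your deployment's values
scrape_configs:
  - job_name: minio-job
    bearer_token: TOKEN
    metrics_path: /minio/v2/metrics/cluster
    scheme: https
    static_configs:
      - targets: ["minio.example.net:9000"]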
2) Restart Prometheus with the Updated Configuration
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Append the ``scrape_configs`` job generated in the previous step to the configuration file:
.. code-block:: yaml
:class: copyable
@ -122,10 +105,8 @@ Start the Prometheus cluster using the configuration file:
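A minimal sketch of that command, assuming the ``prometheus`` binary is on the ``PATH`` and the configuration file is saved as ``prometheus.yaml``:
.. code-block:: shell
:class: copyable
# Start a local Prometheus server with the updated configuration
prometheus --config.file=prometheus.yaml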
3) Analyze Collected Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Prometheus includes a
:prometheus-docs:`expression browser
<prometheus/latest/getting_started/#using-the-expression-browser>`. You can
execute queries here to analyze the collected metrics.
Prometheus includes an :prometheus-docs:`expression browser <prometheus/latest/getting_started/#using-the-expression-browser>`.
You can execute queries there to analyze the collected metrics.
The following query examples return metrics collected by Prometheus:
@ -139,386 +120,65 @@ The following query examples return metrics collected by Prometheus:
minio_cluster_capacity_usable_free_bytes{job="minio-job"}[5m]
See :ref:`minio-metrics-and-alerts-available-metrics` for a complete
list of published metrics.
See :ref:`minio-metrics-and-alerts-available-metrics` for a complete list of published metrics.
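As an additional sketch, the following query returns the per-second rate of S3 requests over the trailing five minutes, assuming the ``minio-job`` job name used in the examples above:
.. code-block:: shell
:class: copyable
# Per-second rate of S3 requests handled by the deployment
rate(minio_s3_requests_total{job="minio-job"}[5m])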
.. _minio-console-metrics:
4) Configure an Alert Rule using MinIO Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4) Visualize Collected Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You must configure :prometheus-docs:`Alert Rules <prometheus/latest/configuration/alerting_rules/>` on the Prometheus deployment to trigger alerts based on collected MinIO metrics.
The :minio-git:`MinIO Console <console>` supports visualizing collected metrics from Prometheus.
Specify the URL of the Prometheus service to the :envvar:`MINIO_PROMETHEUS_URL` environment variable on each MinIO server in the deployment:
.. code-block:: shell
:class: copyable
export MINIO_PROMETHEUS_URL="https://prometheus.example.net"
If you set a custom ``job_name`` for the Prometheus scraping job, you must also set :envvar:`MINIO_PROMETHEUS_JOB_ID` to match that job name.
Restart the deployment using :mc-cmd:`mc admin service restart` to apply the changes.
The MinIO Console uses the metrics collected by the ``minio-job`` scraping job to populate the Dashboard metrics available from :guilabel:`Monitoring > Metrics`.
You can download the metrics from the MinIO Console as either a ``.png`` image or a ``.csv`` file.
.. image:: /images/minio-console/console-metrics.png
:width: 600px
:alt: MinIO Console Dashboard displaying Monitoring Data
:align: center
MinIO also publishes a `Grafana Dashboard <https://grafana.com/grafana/dashboards/13502>`_ for visualizing collected metrics.
For more complete documentation on configuring a Prometheus data source for Grafana, see :prometheus-docs:`Grafana Support for Prometheus <visualization/grafana/>`.
Prometheus includes a :prometheus-docs:`graphing interface <prometheus/latest/getting_started/#using-the-graphing-interface>` for visualizing collected metrics.
.. _minio-metrics-and-alerts-endpoints:
Metrics
-------
MinIO provides a scraping endpoint for cluster-level metrics:
.. code-block:: shell
:class: copyable
http://minio.example.net:9000/minio/v2/metrics/cluster
Replace ``http://minio.example.net`` with the hostname of any node in the MinIO
deployment. For deployments with a load balancer managing connections between
MinIO nodes, specify the address of the load balancer.
Create a new :prometheus-docs:`scraping configuration
<prometheus/latest/configuration/configuration/#scrape_config>` to begin
collecting metrics from the MinIO deployment. See
:ref:`minio-metrics-collect-using-prometheus` for a complete tutorial.
The following example describes a ``scrape_configs`` entry for collecting
cluster metrics.
The following example alert rule files provide a baseline of alerts for a MinIO deployment.
You can modify or otherwise use these examples as guidance in building your own alerts.
.. code-block:: yaml
:class: copyable
scrape_configs:
- job_name: minio-job
bearer_token: <secret>
metrics_path: /minio/v2/metrics/cluster
scheme: https
static_configs:
- targets: ['minio.example.net:9000']
groups:
- name: minio-alerts
rules:
- alert: NodesOffline
expr: avg_over_time(minio_cluster_nodes_offline_total{job="minio-job"}[5m]) > 0
for: 10m
labels:
severity: warn
annotations:
summary: "Node down in MinIO deployment"
description: "Node(s) in cluster {{ $labels.instance }} offline for more than 5 minutes"
.. list-table::
:stub-columns: 1
:widths: 20 80
:width: 100%
- alert: DisksOffline
expr: avg_over_time(minio_cluster_disk_offline_total{job="minio-job"}[5m]) > 0
for: 10m
labels:
severity: warn
annotations:
summary: "Disks down in MinIO deployment"
description: "Disks(s) in cluster {{ $labels.instance }} offline for more than 5 minutes"
* - ``job_name``
- The name of the scraping job.
Specify the path to the alert file to the Prometheus configuration as part of the ``rule_files`` key:
* - ``bearer_token``
- The JWT token generated by :mc-cmd:`mc admin prometheus generate`.
.. code-block:: yaml
Omit this field if the MinIO deployment was started with
:envvar:`MINIO_PROMETHEUS_AUTH_TYPE` set to ``public``.
global:
scrape_interval: 5s
* - ``targets``
- The endpoint for the MinIO deployment. You can specify any node in the
deployment for collecting cluster metrics. For clusters with a load
balancer managing connections between MinIO nodes, specify the
address of the load balancer.
rule_files:
- minio-alerting.yml
MinIO by default requires authentication for scraping the metrics endpoints.
Use the :mc-cmd:`mc admin prometheus generate` command to generate the
necessary bearer tokens for use with configuring the
``scrape_configs.bearer_token`` field. You can alternatively disable
metrics endpoint authentication by setting
:envvar:`MINIO_PROMETHEUS_AUTH_TYPE` to ``public``.
Once triggered, Prometheus sends the alert to the configured AlertManager service.
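If the Prometheus configuration does not already point to the Alertmanager, a minimal sketch of the ``alerting`` block follows, where the hostname is a placeholder for your Alertmanager endpoint:
.. code-block:: yaml
:class: copyable
# Sketch only: add alongside the rule_files entry in the Prometheus configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager.example.net:9093"]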
Visualizing Metrics
~~~~~~~~~~~~~~~~~~~
5) (Optional) Configure MinIO Console to Query Prometheus
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The MinIO Console uses the metrics collected by Prometheus to populate the
Dashboard metrics:
The Console also supports displaying time-series and historical data by querying a :prometheus-docs:`Prometheus <prometheus/latest/getting_started/>` service configured to scrape data from the MinIO deployment.
.. image:: /images/minio-console/console-metrics.png
:width: 600px
:alt: MinIO Console displaying Prometheus-backed Monitoring Data
:align: center
Set the :envvar:`MINIO_PROMETHEUS_URL` environment variable to the URL of the
Prometheus service to allow the Console to retrieve and display collected
metrics. See :ref:`minio-metrics-collect-using-prometheus` for a complete
example.
To enable historical data visualization in MinIO Console, set the following environment variables on each node in the MinIO deployment, as shown in the sketch after this list:
MinIO also publishes a `Grafana Dashboard
<https://grafana.com/grafana/dashboards/13502>`_ for visualizing collected
metrics. For more complete documentation on configuring a Prometheus data source
for Grafana, see :prometheus-docs:`Grafana Support for Prometheus
<visualization/grafana/>`.
- Set :envvar:`MINIO_PROMETHEUS_URL` to the URL of the Prometheus service
- Set :envvar:`MINIO_PROMETHEUS_JOB_ID` to the unique job ID assigned to the collected metrics
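A hedged shell sketch of those settings, reusing the placeholder Prometheus URL and the ``minio-job`` job name from the earlier examples:
.. code-block:: shell
:class: copyable
# Set in the environment of each MinIO server process (for example, the MinIO environment file)
export MINIO_PROMETHEUS_URL="https://prometheus.example.net"
export MINIO_PROMETHEUS_JOB_ID="minio-job"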
.. _minio-metrics-and-alerts-available-metrics:
Available Metrics
~~~~~~~~~~~~~~~~~
MinIO publishes the following metrics, where each metric includes a label for
the MinIO server which generated that metric.
Object Metrics
++++++++++++++
.. metric:: minio_bucket_objects_size_distribution
Distribution of object sizes in the bucket, includes label for the bucket
name.
Replication Metrics
+++++++++++++++++++
These metrics are only populated for MinIO clusters with
:ref:`minio-bucket-replication-serverside` enabled.
.. metric:: minio_bucket_replication_failed_bytes
Total number of bytes failed at least once to replicate.
.. metric:: minio_bucket_replication_pending_bytes
Total bytes pending to replicate.
.. metric:: minio_bucket_replication_received_bytes
Total number of bytes replicated to this bucket from another source bucket.
.. metric:: minio_bucket_replication_sent_bytes
Total number of bytes replicated to the target bucket.
.. metric:: minio_bucket_replication_pending_count
Total number of replication operations pending for this bucket.
.. metric:: minio_bucket_replication_failed_count
Total number of replication operations failed for this bucket.
Bucket Metrics
++++++++++++++
.. metric:: minio_bucket_usage_object_total
Total number of objects
.. metric:: minio_bucket_usage_total_bytes
Total bucket size in bytes
Cache Metrics
+++++++++++++
.. metric:: minio_cache_hits_total
Total number of disk cache hits
.. metric:: minio_cache_missed_total
Total number of disk cache misses
.. metric:: minio_cache_sent_bytes
Total number of bytes served from cache
.. metric:: minio_cache_total_bytes
Total size of cache disk in bytes
.. metric:: minio_cache_usage_info
Total percentage cache usage, value of 1 indicates high and 0 low, label
level is set as well
.. metric:: minio_cache_used_bytes
Current cache usage in bytes
Cluster Metrics
+++++++++++++++
.. metric:: minio_cluster_capacity_raw_free_bytes
Total free capacity online in the cluster.
.. metric:: minio_cluster_capacity_raw_total_bytes
Total capacity online in the cluster.
.. metric:: minio_cluster_capacity_usable_free_bytes
Total free usable capacity online in the cluster.
.. metric:: minio_cluster_capacity_usable_total_bytes
Total usable capacity online in the cluster.
Node Metrics
++++++++++++
.. metric:: minio_cluster_nodes_offline_total
Total number of MinIO nodes offline.
.. metric:: minio_cluster_nodes_online_total
Total number of MinIO nodes online.
.. metric:: minio_heal_objects_error_total
Objects for which healing failed in current self healing run
.. metric:: minio_heal_objects_heal_total
Objects healed in current self healing run
.. metric:: minio_heal_objects_total
Objects scanned in current self healing run
.. metric:: minio_heal_time_last_activity_nano_seconds
Time elapsed (in nano seconds) since last self healing activity. This is set
to -1 until initial self heal
.. metric:: minio_inter_node_traffic_received_bytes
Total number of bytes received from other peer nodes.
.. metric:: minio_inter_node_traffic_sent_bytes
Total number of bytes sent to the other peer nodes.
.. metric:: minio_node_disk_free_bytes
Total storage available on a disk.
.. metric:: minio_node_disk_total_bytes
Total storage on a disk.
.. metric:: minio_node_disk_used_bytes
Total storage used on a disk.
.. metric:: minio_node_file_descriptor_limit_total
Limit on total number of open file descriptors for the MinIO Server process.
.. metric:: minio_node_file_descriptor_open_total
Total number of open file descriptors by the MinIO Server process.
.. metric:: minio_node_io_rchar_bytes
Total bytes read by the process from the underlying storage system including
cache, ``/proc/[pid]/io rchar``
.. metric:: minio_node_io_read_bytes
Total bytes read by the process from the underlying storage system,
``/proc/[pid]/io read_bytes``
.. metric:: minio_node_io_wchar_bytes
Total bytes written by the process to the underlying storage system including
page cache, ``/proc/[pid]/io wchar``
.. metric:: minio_node_io_write_bytes
Total bytes written by the process to the underlying storage system,
``/proc/[pid]/io write_bytes``
.. metric:: minio_node_process_starttime_seconds
Start time for MinIO process per node, time in seconds since Unix epoch.
.. metric:: minio_node_process_uptime_seconds
Uptime for MinIO process per node in seconds.
.. metric:: minio_node_scanner_bucket_scans_finished
Total number of bucket scans finished since server start.
.. metric:: minio_node_scanner_bucket_scans_started
Total number of bucket scans started since server start.
.. metric:: minio_node_scanner_directories_scanned
Total number of directories scanned since server start.
.. metric:: minio_node_scanner_objects_scanned
Total number of unique objects scanned since server start.
.. metric:: minio_node_scanner_versions_scanned
Total number of object versions scanned since server start.
.. metric:: minio_node_syscall_read_total
Total read SysCalls to the kernel. ``/proc/[pid]/io syscr``
.. metric:: minio_node_syscall_write_total
Total write SysCalls to the kernel. ``/proc/[pid]/io syscw``
S3 Metrics
++++++++++
.. metric:: minio_s3_requests_error_total
Total number S3 requests with errors
.. metric:: minio_s3_requests_inflight_total
Total number of S3 requests currently in flight
.. metric:: minio_s3_requests_total
Total number S3 requests
.. metric:: minio_s3_time_ttbf_seconds_distribution
Distribution of the time to first byte across API calls.
.. metric:: minio_s3_traffic_received_bytes
Total number of s3 bytes received.
.. metric:: minio_s3_traffic_sent_bytes
Total number of s3 bytes sent
Software Metrics
++++++++++++++++
.. metric:: minio_software_commit_info
Git commit hash for the MinIO release.
.. metric:: minio_software_version_info
MinIO Release tag for the server
.. _minio-metrics-and-alerts-alerting:
Alerts
------
You can configure alerts using Prometheus :prometheus-docs:`Alerting Rules
<prometheus/latest/configuration/alerting_rules/>` based on the collected MinIO
metrics. The Prometheus :prometheus-docs:`Alert Manager
<alerting/latest/overview/>` supports managing alerts produced by the configured
alerting rules. Prometheus also supports a :prometheus-docs:`Webhook Receiver
<operating/integrations/#alertmanager-webhook-receiver>` for publishing alerts
to mechanisms not supported by Prometheus AlertManager.
Restart the MinIO deployment and visit the :ref:`Monitoring <minio-console-monitoring>` pane to see the historical data views.

View File

@ -0,0 +1,377 @@
.. _minio-metrics-and-alerts-endpoints:
.. _minio-metrics-and-alerts-alerting:
.. _minio-metrics-and-alerts:
==================
Metrics and Alerts
==================
.. default-domain:: minio
.. contents:: Table of Contents
:local:
:depth: 2
MinIO publishes cluster and node metrics using the :prometheus-docs:`Prometheus Data Model <data_model/>`.
You can use any scraping tool to pull metrics data from MinIO for further analysis and alerting.
MinIO provides a scraping endpoint for cluster-level metrics:
.. code-block:: shell
:class: copyable
http://minio.example.net:9000/minio/v2/metrics/cluster
Replace ``http://minio.example.net`` with the hostname of any node in the MinIO deployment.
For deployments with a load balancer managing connections between MinIO nodes, specify the address of the load balancer.
MinIO by default requires authentication for scraping the metrics endpoints.
Use the :mc-cmd:`mc admin prometheus generate` command to generate the necessary bearer tokens.
You can alternatively disable metrics endpoint authentication by setting :envvar:`MINIO_PROMETHEUS_AUTH_TYPE` to ``public``.
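For example, a hedged sketch of an authenticated scrape request, where ``TOKEN`` is a placeholder for the JWT returned by :mc-cmd:`mc admin prometheus generate`:
.. code-block:: shell
:class: copyable
# Manually scrape the cluster metrics endpoint using the generated bearer token
curl -H "Authorization: Bearer TOKEN" https://minio.example.net:9000/minio/v2/metrics/cluster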
.. _minio-console-metrics:
MinIO Console Metrics Dashboard
-------------------------------
The :ref:`MinIO Console <minio-console-monitoring>` provides a point-in-time metrics dashboard by default:
.. image:: /images/minio-console/console-metrics-simple.png
:width: 600px
:alt: MinIO Console with Point-In-Time Metrics
:align: center
The Console also supports displaying time-series and historical data by querying a :prometheus-docs:`Prometheus <prometheus/latest/getting_started/>` service configured to scrape data from the MinIO deployment.
Specifically, the MinIO Console uses :prometheus-docs:`Prometheus query API <prometheus/latest/querying/api/>` to retrieve stored metrics data and display the following visualizations:
- :guilabel:`Usage` - provides historical and on-demand visualization of overall usage and status
- :guilabel:`Traffic` - provides historical and on-demand visualization of network traffic
- :guilabel:`Resources` - provides historical and on-demand visualization of resources (compute and storage)
- :guilabel:`Info` - provides point-in-time status of the deployment
.. image:: /images/minio-console/console-metrics.png
:width: 600px
:alt: MinIO Console displaying Prometheus-backed Monitoring Data
:align: center
.. cond:: k8s
The MinIO Operator supports deploying a per-tenant Prometheus instance configured to support metrics and visualization.
If you deploy the Tenant with this feature disabled *but* still want the historical metric views, you can instead configure an external Prometheus service to scrape the Tenant metrics.
Once configured, you can update the Tenant to query that Prometheus service to retrieve metric data.
.. cond:: linux or container or macos or windows
To enable historical data visualization in MinIO Console, set the following environment variables on each node in the MinIO deployment:
- Set :envvar:`MINIO_PROMETHEUS_URL` to the URL of the Prometheus service
- Set :envvar:`MINIO_PROMETHEUS_JOB_ID` to the unique job ID assigned to the collected metrics
MinIO also publishes a `Grafana Dashboard <https://grafana.com/grafana/dashboards/13502>`_ for visualizing collected metrics.
For more complete documentation on configuring a Prometheus-compatible data source for Grafana, see :prometheus-docs:`Grafana Support for Prometheus <visualization/grafana/>`.
.. _minio-metrics-and-alerts-available-metrics:
Available Metrics
-----------------
MinIO publishes the following metrics, where each metric includes a label for
the MinIO server which generated that metric.
Object and Bucket Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~
.. metric:: minio_bucket_objects_size_distribution
Distribution of object sizes in a given bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
.. metric:: minio_bucket_usage_object_total
Total number of objects in a given bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
.. metric:: minio_bucket_usage_total_bytes
Total size in bytes of a given bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
Replication Metrics
~~~~~~~~~~~~~~~~~~~
These metrics are only populated for MinIO clusters with
:ref:`minio-bucket-replication-serverside` enabled.
.. metric:: minio_bucket_replication_failed_bytes
Total number of bytes that failed at least once to replicate for a given bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
.. metric:: minio_bucket_replication_pending_bytes
Total number of bytes pending to replicate for a given bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
.. metric:: minio_bucket_replication_received_bytes
Total number of bytes replicated to this bucket from another source bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
.. metric:: minio_bucket_replication_sent_bytes
Total number of bytes replicated to the target bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
.. metric:: minio_bucket_replication_pending_count
Total number of replication operations pending for a given bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
.. metric:: minio_bucket_replication_failed_count
Total number of replication operations failed for a given bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
Capacity Metrics
~~~~~~~~~~~~~~~~
.. metric:: minio_cluster_capacity_raw_free_bytes
Total free capacity online in the cluster.
.. metric:: minio_cluster_capacity_raw_total_bytes
Total capacity online in the cluster.
.. metric:: minio_cluster_capacity_usable_free_bytes
Total free usable capacity online in the cluster.
.. metric:: minio_cluster_capacity_usable_total_bytes
Total usable capacity online in the cluster.
.. metric:: minio_node_disk_free_bytes
Total storage available on a specific drive for a node in the MinIO deployment.
You can identify the drive and node using the ``{ disk="/path/to/disk",server="STRING"}`` labels respectively.
.. metric:: minio_node_disk_total_bytes
Total storage on a specific drive for a node in the MinIO deployment.
You can identify the drive and node using the ``{ disk="/path/to/disk",server="STRING"}`` labels respectively.
.. metric:: minio_node_disk_used_bytes
Total storage used on a specific drive for a node in a MinIO deployment.
You can identify the drive and node using the ``{ disk="/path/to/disk",server="STRING"}`` labels respectively.
Lifecycle Management Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. metric:: minio_cluster_ilm_transitioned_bytes
Total number of bytes transitioned using :ref:`tiering/transition lifecycle management rules <minio-lifecycle-management-tiering>`
.. metric:: minio_cluster_ilm_transitioned_objects
Total number of objects transitioned using :ref:`tiering/transition lifecycle management rules <minio-lifecycle-management-tiering>`
.. metric:: minio_cluster_ilm_transitioned_versions
Total number of non-current object versions transitioned using :ref:`tiering/transition lifecycle management rules <minio-lifecycle-management-tiering>`
.. metric:: minio_node_ilm_transition_pending_tasks
Total number of pending :ref:`object transition <minio-lifecycle-management-tiering>` tasks
.. metric:: minio_node_ilm_expiry_pending_tasks
Total number of pending :ref:`object expiration <minio-lifecycle-management-expiration>` tasks
.. metric:: minio_node_ilm_expiry_active_tasks
Total number of active :ref:`object expiration <minio-lifecycle-management-expiration>` tasks
Node and Disk Health Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. metric:: minio_cluster_disk_online_total
The total number of disks online
.. metric:: minio_cluster_disk_offline_total
The total number of disks offline
.. metric:: minio_cluster_disk_total
The total number of disks
.. metric:: minio_cluster_nodes_offline_total
Total number of MinIO nodes offline.
.. metric:: minio_cluster_nodes_online_total
Total number of MinIO nodes online.
.. metric:: minio_heal_objects_error_total
Objects for which healing failed in current self healing run
.. metric:: minio_heal_objects_heal_total
Objects healed in current self healing run
.. metric:: minio_heal_objects_total
Objects scanned in current self healing run
.. metric:: minio_heal_time_last_activity_nano_seconds
Time elapsed (in nano seconds) since last self healing activity. This is set
to -1 until initial self heal
Scanner Metrics
~~~~~~~~~~~~~~~
.. metric:: minio_node_scanner_bucket_scans_finished
Total number of bucket scans finished since server start.
.. metric:: minio_node_scanner_bucket_scans_started
Total number of bucket scans started since server start.
.. metric:: minio_node_scanner_directories_scanned
Total number of directories scanned since server start.
.. metric:: minio_node_scanner_objects_scanned
Total number of unique objects scanned since server start.
.. metric:: minio_node_scanner_versions_scanned
Total number of object versions scanned since server start.
.. metric:: minio_node_syscall_read_total
Total number of read SysCalls to the kernel. ``/proc/[pid]/io syscr``
.. metric:: minio_node_syscall_write_total
Total number of write SysCalls to the kernel. ``/proc/[pid]/io syscw``
S3 Metrics
~~~~~~~~~~
.. metric:: minio_bucket_traffic_sent_bytes
Total number of bytes of S3 traffic sent per bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
.. metric:: minio_bucket_traffic_received_bytes
Total number of bytes of S3 traffic received per bucket.
You can identify the bucket using the ``{ bucket="STRING" }`` label.
.. metric:: minio_s3_requests_inflight_total
Total number of S3 requests currently in flight.
.. metric:: minio_s3_requests_total
Total number of S3 requests.
.. metric:: minio_s3_time_ttfb_seconds_distribution
Distribution of the time to first byte across API calls.
.. metric:: minio_s3_traffic_received_bytes
Total number of S3 bytes received.
.. metric:: minio_s3_traffic_sent_bytes
Total number of S3 bytes sent.
.. metric:: minio_s3_requests_errors_total
Total number of S3 requests with 4xx and 5xx errors.
.. metric:: minio_s3_requests_4xx_errors_total
Total number of S3 requests with 4xx errors.
.. metric:: minio_s3_requests_5xx_errors_total
Total number of S3 requests with 5xx errors.
Internal Metrics
~~~~~~~~~~~~~~~~
.. metric:: minio_inter_node_traffic_received_bytes
Total number of bytes received from other peer nodes.
.. metric:: minio_inter_node_traffic_sent_bytes
Total number of bytes sent to the other peer nodes.
.. metric:: minio_node_file_descriptor_limit_total
Limit on total number of open file descriptors for the MinIO Server process.
.. metric:: minio_node_file_descriptor_open_total
Total number of open file descriptors by the MinIO Server process.
.. metric:: minio_node_io_rchar_bytes
Total bytes read by the process from the underlying storage system including
cache, ``/proc/[pid]/io rchar``
.. metric:: minio_node_io_read_bytes
Total bytes read by the process from the underlying storage system,
``/proc/[pid]/io read_bytes``
.. metric:: minio_node_io_wchar_bytes
Total bytes written by the process to the underlying storage system including
page cache, ``/proc/[pid]/io wchar``
.. metric:: minio_node_io_write_bytes
Total bytes written by the process to the underlying storage system,
``/proc/[pid]/io write_bytes``
Software and Process Metrics
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
.. metric:: minio_software_commit_info
Git commit hash for the MinIO release.
.. metric:: minio_software_version_info
MinIO Release tag for the server
.. metric:: minio_node_process_starttime_seconds
Start time for MinIO process per node, time in seconds since Unix epoch.
.. metric:: minio_node_process_uptime_seconds
Uptime for MinIO process per node in seconds.
.. toctree::
:titlesonly:
:hidden:
/operations/monitoring/collect-minio-metrics-using-prometheus
/operations/monitoring/monitor-and-alert-using-influxdb

View File

@ -0,0 +1,121 @@
.. _minio-metrics-influxdb:
======================================
Monitoring and Alerting using InfluxDB
======================================
.. default-domain:: minio
.. contents:: Table of Contents
:local:
:depth: 1
MinIO publishes cluster and node metrics using the :prometheus-docs:`Prometheus Data Model <data_model/>`.
`InfluxDB <https://www.influxdata.com/?ref=minio>`__ supports scraping MinIO metrics data for monitoring and alerting.
The procedure on this page documents the following:
- Configuring an InfluxDB service to scrape and display metrics from a MinIO deployment
- Configuring an Alert on a MinIO metric
.. admonition:: Prerequisites
:class: note
This procedure requires the following:
- An existing InfluxDB deployment configured with one or more :influxdb-docs:`notification endpoints <notification-endpoints/>`
- An existing MinIO deployment with network access to the InfluxDB deployment
- An :mc:`mc` installation on your local host configured to :ref:`access <alias>` the MinIO deployment
.. cond:: k8s
This procedure assumes that all necessary network control components, such as Ingress or Load Balancers, are configured to facilitate access between the MinIO Tenant and the InfluxDB service.
Configure InfluxDB to Collect and Alert using MinIO Metrics
-----------------------------------------------------------
.. important::
This procedure specifically uses the InfluxDB UI to create a scraping endpoint.
The InfluxDB UI does not provide the same level of configuration as using `Telegraf <https://docs.influxdata.com/telegraf/v1.24/>`__ and the corresponding `Prometheus plugin <https://github.com/influxdata/telegraf/blob/release-1.24/plugins/inputs/prometheus/README.md>`__.
Specifically:
- You cannot enable authenticated access to the MinIO metrics endpoint via the InfluxDB UI
- You cannot set a tag for collected metrics (e.g. ``url_tag``) for uniquely identifying the metrics for a given MinIO deployment
.. cond:: k8s
The Telegraf Prometheus plugin also supports Kubernetes-specific features, such as scraping the ``minio`` service for a given MinIO Tenant.
Configuring Telegraf is out of scope for this procedure.
You can use this procedure as general guidance for configuring Telegraf to scrape MinIO metrics.
.. container:: procedure
1. Configure Public Access to MinIO Metrics
Set the :envvar:`MINIO_PROMETHEUS_AUTH_TYPE` environment variable to ``"public"`` for all nodes in the MinIO deployment, then restart the deployment to allow public access to MinIO metrics, as in the sketch below.
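A minimal sketch, assuming ``ALIAS`` is the configured :mc:`mc` alias for the deployment and that the variable is set in each node's environment (for example, the MinIO environment file) before the restart:
.. code-block:: shell
:class: copyable
# Set on every MinIO host, then restart the deployment
export MINIO_PROMETHEUS_AUTH_TYPE="public"
mc admin service restart ALIAS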
You can validate the change by attempting to ``curl`` the metrics endpoint:
.. code-block:: shell
:class: copyable
curl https://HOSTNAME/minio/v2/metrics/cluster
Replace ``HOSTNAME`` with the URL of the load balancer or reverse proxy through which you access the MinIO deployment.
You can alternatively specify any single node as ``HOSTNAME:PORT``, specifying the MinIO server API port in addition to the node hostname.
The response body should include a list of collected MinIO metrics.
#. Log into the InfluxDB UI and Create a Bucket
Select the :influxdb-docs:`Organization <organizations/view-orgs/>` under which you want to store MinIO metrics.
Create a :influxdb-docs:`New Bucket <organizations/buckets/create-bucket/>` in which to store metrics for the MinIO deployment.
#. Create a new Scraping Source
Create a :influxdb-docs:`new InfluxDB Scraper <write-data/no-code/scrape-data/manage-scrapers/create-a-scraper/>`.
Specify the full URL to the MinIO deployment, including the metrics endpoint:
.. code-block:: shell
:class: copyable
https://HOSTNAME/minio/v2/metrics/cluster
Replace ``HOSTNAME`` with the URL of the load balancer or reverse proxy through which you access the MinIO deployment.
You can alternatively specify any single node as ``HOSTNAME:PORT``, specifying the MinIO server API port in addition to the node hostname.
#. Validate the Data
Use the :influxdb-docs:`Data Explorer <query-data/execute-queries/data-explorer/>` to visualize the collected MinIO data.
For example, you can set a filter on :metric:`minio_cluster_capacity_usable_total_bytes` and :metric:`minio_cluster_capacity_usable_free_bytes` to compare the total usable against total free space on the MinIO deployment.
#. Configure a Check
Create a :influxdb-docs:`new Check <monitor-alert/checks/create/>` on a MinIO metric.
The following example check rules provide a baseline of alerts for a MinIO deployment.
You can modify or otherwise use these examples for guidance in building your own checks.
- Create a :guilabel:`Threshold Check` named ``MINIO_NODE_DOWN``.
Set the filter for the :metric:`minio_cluster_nodes_offline_total` key.
Set the :guilabel:`Thresholds` to :guilabel:`WARN` when the value is greater than :guilabel:`1`.
- Create a :guilabel:`Threshold Check` named ``MINIO_QUORUM_WARNING``.
Set the filter for the :metric:`minio_cluster_disk_offline_total` key.
Set the :guilabel:`Thresholds` to :guilabel:`CRITICAL` when the value is one less than your configured :ref:`Erasure Code Parity <minio-erasure-coding>` setting.
For example, a deployment using EC:4 should set this value to ``3``.
Configure your :influxdb-docs:`Notification endpoints <monitor-alert/notification-endpoints/>` and :influxdb-docs:`Notification rules <monitor-alert/notification-rules/>` such that checks of each type trigger an appropriate response.