Metrics and Alerts
MinIO leverages Prometheus for metrics and alerts. Prometheus is an open-source systems and service monitoring system that supports analyzing and alerting based on collected metrics. The Prometheus ecosystem includes multiple integrations <operating/integrations/>, allowing wide latitude in processing and storing collected metrics.
- MinIO publishes Prometheus-compatible scraping endpoints for cluster and node-level metrics. See the Metrics section for more information.
- For alerts, use Prometheus Alerting Rules <prometheus/latest/configuration/alerting_rules/> and the Alertmanager <alerting/latest/overview/> to trigger alerts based on collected metrics. See the Alerts section for more information.
MinIO publishes collected metrics data using Prometheus-compatible data structures. Any Prometheus-compatible scraping software can ingest and process MinIO metrics for analysis, visualization, and alerting.
Metrics
MinIO provides a scraping endpoint for cluster-level metrics:
http://minio.example.net:9000/minio/v2/metrics/cluster
Replace http://minio.example.net with the hostname of any node in the MinIO deployment. For deployments with a load balancer managing connections between MinIO nodes, specify the address of the load balancer.
Create a new scraping configuration <prometheus/latest/configuration/configuration/#scrape_config> to begin collecting metrics from the MinIO deployment. See minio-metrics-collect-using-prometheus for a complete tutorial.
The following example describes a scrape_configs entry for collecting cluster metrics.
scrape_configs:
- job_name: minio-job
  bearer_token: <secret>
  metrics_path: /minio/v2/metrics/cluster
  scheme: https
  static_configs:
  - targets: ['minio.example.net:9000']
Field | Description
job_name | The name of the scraping job.
bearer_token | The JWT token generated by mc admin prometheus generate. Omit this field if the MinIO deployment was started with MINIO_PROMETHEUS_AUTH_TYPE set to public.
targets | The endpoint for the MinIO deployment. You can specify any node in the deployment for collecting cluster metrics. For clusters with a load balancer managing connections between MinIO nodes, specify the address of the load balancer.
By default, MinIO requires authentication for scraping the metrics endpoints. Use the mc admin prometheus generate command to generate the bearer token for the scrape_configs.bearer_token field. Alternatively, you can disable metrics endpoint authentication by setting MINIO_PROMETHEUS_AUTH_TYPE to public.
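If the deployment sets MINIO_PROMETHEUS_AUTH_TYPE to public, the scrape configuration can drop the bearer_token field entirely. The following is a minimal sketch that assumes the same hostname, port, and scheme as the earlier example; the job name minio-job-public is a hypothetical placeholder.

```yaml
scrape_configs:
- job_name: minio-job-public   # hypothetical job name
  # No bearer_token: assumes MinIO was started with MINIO_PROMETHEUS_AUTH_TYPE=public
  metrics_path: /minio/v2/metrics/cluster
  scheme: https
  static_configs:
  - targets: ['minio.example.net:9000']
```

Leaving the endpoint public trades convenience for exposure: anyone who can reach the MinIO address can read the metrics.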
Visualizing Metrics
The MinIO Console uses the metrics collected by Prometheus to populate its Dashboard metrics.
Set the MINIO_PROMETHEUS_URL environment variable to the URL of the Prometheus service to allow the Console to retrieve and display collected metrics. See minio-metrics-collect-using-prometheus for a complete example.
MinIO also publishes a Grafana Dashboard for visualizing collected metrics. For more complete documentation on configuring a Prometheus data source for Grafana, see Grafana Support for Prometheus <visualization/grafana/>.
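As a sketch of the Grafana side, Grafana can provision a Prometheus data source from a YAML file placed in its provisioning/datasources directory. The data source name and the Prometheus URL http://prometheus.example.net:9090 below are assumptions; substitute the address of your own Prometheus service.

```yaml
apiVersion: 1
datasources:
- name: Prometheus            # hypothetical data source name
  type: prometheus
  access: proxy               # Grafana's backend queries Prometheus on behalf of the browser
  url: http://prometheus.example.net:9090   # assumed Prometheus address
  isDefault: true
```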
Available Metrics
MinIO publishes the following metrics, where each metric includes a label for the MinIO server which generated that metric.
Object Metrics
minio_bucket_objects_size_distribution
Distribution of object sizes in the bucket. Includes a label for the bucket name.
Replication Metrics
These metrics are only populated for MinIO clusters with server-side bucket replication enabled.
minio_bucket_replication_failed_bytes
Total number of bytes that failed to replicate at least once.
minio_bucket_replication_pending_bytes
Total number of bytes pending replication.
minio_bucket_replication_received_bytes
Total number of bytes replicated to this bucket from another source bucket.
minio_bucket_replication_sent_bytes
Total number of bytes replicated to the target bucket.
minio_bucket_replication_pending_count
Total number of replication operations pending for this bucket.
minio_bucket_replication_failed_count
Total number of replication operations failed for this bucket.
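These replication metrics can back a Prometheus alerting rule, loaded through the rule_files setting of prometheus.yml. The following is a minimal sketch; the alert name, the 30-minute window, the severity label, and the assumption that the metric carries a bucket label are all hypothetical choices rather than MinIO recommendations.

```yaml
groups:
- name: minio-replication
  rules:
  - alert: MinIOReplicationBacklog      # hypothetical alert name
    # Fires when a bucket has had pending replication operations
    # continuously for 30 minutes (example window).
    expr: minio_bucket_replication_pending_count > 0
    for: 30m
    labels:
      severity: warning
    annotations:
      # Assumes the metric carries a "bucket" label identifying the bucket.
      summary: "Replication backlog on bucket {{ $labels.bucket }}"
```

Under sustained write load a small backlog may be normal, so tune the window to your replication latency expectations.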
Bucket Metrics
minio_bucket_usage_object_total
Total number of objects in the bucket.
minio_bucket_usage_total_bytes
Total bucket size in bytes.
Cache Metrics
minio_cache_hits_total
Total number of disk cache hits.
minio_cache_missed_total
Total number of disk cache misses.
minio_cache_sent_bytes
Total number of bytes served from cache.
minio_cache_total_bytes
Total size of cache disk in bytes.
minio_cache_usage_info
Total percentage of cache usage. A value of 1 indicates high usage and 0 indicates low usage. The metric also sets a level label.
minio_cache_used_bytes
Current cache usage in bytes.
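The hit and miss counters combine naturally into a cache hit ratio. The following is a sketch of a Prometheus recording rule; the rule name minio:cache_hit_ratio:rate5m and the 5-minute range are hypothetical, and the division assumes both counters share the same label set.

```yaml
groups:
- name: minio-cache
  rules:
  # Fraction of cache lookups served from the cache over the last 5 minutes.
  # Evaluates to NaN when there were no lookups in the window (0/0).
  - record: minio:cache_hit_ratio:rate5m   # hypothetical rule name
    expr: |
      rate(minio_cache_hits_total[5m])
        /
      (rate(minio_cache_hits_total[5m]) + rate(minio_cache_missed_total[5m]))
```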
Cluster Metrics
minio_cluster_capacity_raw_free_bytes
Total free capacity online in the cluster.
minio_cluster_capacity_raw_total_bytes
Total capacity online in the cluster.
minio_cluster_capacity_usable_free_bytes
Total free usable capacity online in the cluster.
minio_cluster_capacity_usable_total_bytes
Total usable capacity online in the cluster.
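The usable-capacity pair lends itself to a free-space alert. The following is a minimal sketch of a Prometheus alerting rule; the alert name, the 10% threshold, and the 15-minute window are example assumptions.

```yaml
groups:
- name: minio-capacity
  rules:
  - alert: MinIOClusterUsableCapacityLow   # hypothetical alert name
    # Ratio of free to total usable capacity; 0.10 (10%) is an example threshold.
    expr: minio_cluster_capacity_usable_free_bytes / minio_cluster_capacity_usable_total_bytes < 0.10
    for: 15m
    labels:
      severity: warning
    annotations:
      summary: "MinIO cluster has less than 10% usable capacity free"
```

The usable metrics are a better basis than the raw ones here, since usable capacity reflects what remains for object data rather than total disk space.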
Node Metrics
minio_cluster_nodes_offline_total
Total number of MinIO nodes offline.
minio_cluster_nodes_online_total
Total number of MinIO nodes online.
minio_heal_objects_error_total
Objects for which healing failed in the current self-healing run.
minio_heal_objects_heal_total
Objects healed in the current self-healing run.
minio_heal_objects_total
Objects scanned in the current self-healing run.
minio_heal_time_last_activity_nano_seconds
Time elapsed (in nanoseconds) since the last self-healing activity. Set to -1 until the initial self-healing run.
minio_inter_node_traffic_received_bytes
Total number of bytes received from other peer nodes.
minio_inter_node_traffic_sent_bytes
Total number of bytes sent to the other peer nodes.
minio_node_disk_free_bytes
Total storage available on a disk.
minio_node_disk_total_bytes
Total storage on a disk.
minio_node_disk_used_bytes
Total storage used on a disk.
minio_node_file_descriptor_limit_total
Limit on total number of open file descriptors for the MinIO Server process.
minio_node_file_descriptor_open_total
Total number of open file descriptors by the MinIO Server process.
minio_node_io_rchar_bytes
Total bytes read by the process from the underlying storage system, including cache (/proc/[pid]/io rchar).
minio_node_io_read_bytes
Total bytes read by the process from the underlying storage system (/proc/[pid]/io read_bytes).
minio_node_io_wchar_bytes
Total bytes written by the process to the underlying storage system, including page cache (/proc/[pid]/io wchar).
minio_node_io_write_bytes
Total bytes written by the process to the underlying storage system (/proc/[pid]/io write_bytes).
minio_node_process_starttime_seconds
Start time for MinIO process per node, time in seconds since Unix epoch.
minio_node_process_uptime_seconds
Uptime for MinIO process per node in seconds.
minio_node_syscall_read_total
Total read syscalls to the kernel (/proc/[pid]/io syscr).
minio_node_syscall_write_total
Total write syscalls to the kernel (/proc/[pid]/io syscw).
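Several node metrics map directly onto health alerts. The following is a sketch of a Prometheus rule group covering offline nodes, low disk space, and file descriptor pressure; the alert names, thresholds, and windows are hypothetical examples to adapt.

```yaml
groups:
- name: minio-nodes
  rules:
  - alert: MinIONodesOffline            # hypothetical alert name
    # Total number of MinIO nodes offline, per the metric description.
    expr: minio_cluster_nodes_offline_total > 0
    for: 5m
    labels:
      severity: critical
  - alert: MinIODiskSpaceLow            # hypothetical alert name
    # Free space on an individual disk below 10% (example threshold).
    expr: minio_node_disk_free_bytes / minio_node_disk_total_bytes < 0.10
    for: 15m
    labels:
      severity: warning
  - alert: MinIOFileDescriptorsHigh     # hypothetical alert name
    # Open descriptors above 80% of the process limit (example threshold).
    expr: minio_node_file_descriptor_open_total / minio_node_file_descriptor_limit_total > 0.8
    for: 10m
    labels:
      severity: warning
```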
S3 Metrics
minio_s3_requests_error_total
Total number of S3 requests with errors.
minio_s3_requests_inflight_total
Total number of S3 requests currently in flight.
minio_s3_requests_total
Total number of S3 requests.
minio_s3_time_ttbf_seconds_distribution
Distribution of the time to first byte across API calls.
minio_s3_traffic_received_bytes
Total number of S3 bytes received.
minio_s3_traffic_sent_bytes
Total number of S3 bytes sent.
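The request counters support an error-ratio alert. The following is a minimal sketch; it sums across all label dimensions to get one cluster-wide ratio, and the alert name, 5% threshold, and windows are example assumptions.

```yaml
groups:
- name: minio-s3
  rules:
  - alert: MinIOS3ErrorRateHigh         # hypothetical alert name
    # Share of S3 requests with errors over 5 minutes, aggregated into a
    # single series; "/" binds tighter than ">" in PromQL.
    expr: |
      sum(rate(minio_s3_requests_error_total[5m]))
        /
      sum(rate(minio_s3_requests_total[5m])) > 0.05
    for: 10m
    labels:
      severity: warning
```

Aggregating with sum by (server) instead would surface a single misbehaving node rather than the cluster-wide average.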
Software Metrics
minio_software_commit_info
Git commit hash for the MinIO release.
minio_software_version_info
MinIO release tag for the server.
Alerts
You can configure alerts using Prometheus Alerting Rules <prometheus/latest/configuration/alerting_rules/> based on the collected MinIO metrics. The Prometheus Alertmanager <alerting/latest/overview/> supports managing alerts produced by the configured alerting rules. Prometheus also supports a Webhook Receiver <operating/integrations/#alertmanager-webhook-receiver> for publishing alerts to mechanisms not supported by the Prometheus Alertmanager.
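To wire these pieces together, Prometheus loads rule files through rule_files and forwards firing alerts to Alertmanager through the alerting section of prometheus.yml. The sketch below assumes a hypothetical rule file path and Alertmanager hostname.

```yaml
# prometheus.yml (fragment)
rule_files:
- /etc/prometheus/rules/minio-alerts.yml   # hypothetical file holding MinIO alerting rule groups
alerting:
  alertmanagers:
  - static_configs:
    - targets: ['alertmanager.example.net:9093']   # assumed Alertmanager address
```

Alertmanager then routes alerts to receivers. The following alertmanager.yml sketch sends every alert to a webhook receiver; the receiver name and the webhook URL are hypothetical placeholders for your own endpoint.

```yaml
# alertmanager.yml
route:
  receiver: minio-webhook      # hypothetical receiver name
  group_by: ['alertname']
receivers:
- name: minio-webhook
  webhook_configs:
  - url: 'http://webhook.example.net:5001/alerts'   # assumed webhook endpoint
    send_resolved: true        # also notify when an alert resolves
```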