Monitoring by Prometheus
This article describes how to monitor Fluentd via Prometheus.
Since both Prometheus and Fluentd are CNCF (Cloud Native Computing Foundation) projects, the Fluentd project recommends Prometheus as the default way to monitor Fluentd.
Installation
First, install the fluent-plugin-prometheus gem.
If you are using td-agent, use td-agent-gem for the installation.
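As a minimal sketch of the installation step, the snippet below runs whichever gem wrapper it finds on the PATH. The lookup order and the error message are illustrative assumptions; calling td-agent-gem or fluent-gem directly with "install fluent-plugin-prometheus" has the same effect.

```shell
# Prefer td-agent-gem when td-agent is installed; otherwise fall back to fluent-gem.
gem_cmd=""
for candidate in td-agent-gem fluent-gem; do
  if command -v "$candidate" >/dev/null 2>&1; then
    gem_cmd="$candidate"
    break
  fi
done

if [ -n "$gem_cmd" ]; then
  "$gem_cmd" install fluent-plugin-prometheus
else
  echo "Neither td-agent-gem nor fluent-gem was found on PATH" >&2
fi
```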
Example Fluentd Configuration
To expose the Fluentd metrics to Prometheus, we need to configure 3 parts:
Step 1: Prometheus Filter Plugin to count Incoming Records
Step 2: Prometheus Output Plugin to count Outgoing Records
Step 3: Prometheus Input Plugin to expose metrics via HTTP
Step 1: Counting Incoming Records by Prometheus Filter Plugin
First, add a <filter> section to count the incoming records per tag. With this configuration, the prometheus filter increments an internal counter each time a record comes in.
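A filter section along these lines might look like the sketch below, which appends to a configuration file from the shell. The file path ./fluent.conf, the forward input, and the tag pattern company.* are assumptions chosen for illustration; the metric name matches the one used later in this article.

```shell
# Append a sample input and the prometheus filter to ./fluent.conf (hypothetical path).
# The quoted 'EOF' keeps ${tag} and ${hostname} literal so Fluentd expands them, not the shell.
cat >> fluent.conf <<'EOF'
# Assumed input: accept records over the forward protocol.
# Skip this block if your configuration already has a source.
<source>
  @type forward
  bind 0.0.0.0
  port 24224
</source>

# Count incoming records per tag. "company.*" is a placeholder tag pattern.
<filter company.*>
  @type prometheus
  <metric>
    name fluentd_input_status_num_records_total
    type counter
    desc The total number of incoming records
    <labels>
      tag ${tag}
      hostname ${hostname}
    </labels>
  </metric>
</filter>
EOF
```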
Step 2: Counting Outgoing Records by Prometheus Output Plugin
Second, use the copy plugin together with the prometheus output plugin to count the outgoing records per tag. With this configuration, the prometheus output increments an internal counter each time a record goes out.
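One way to combine copy with the prometheus output is sketched below. The stdout store is only a stand-in for a real destination, and the file path ./fluent.conf and the tag pattern company.* are assumptions.

```shell
# Append a copy output with two stores to ./fluent.conf (hypothetical path).
cat >> fluent.conf <<'EOF'
# Count outgoing records per tag while still delivering them.
<match company.*>
  @type copy
  # First store: the real destination. stdout is only a stand-in here.
  <store>
    @type stdout
  </store>
  # Second store: increment the outgoing-records counter.
  <store>
    @type prometheus
    <metric>
      name fluentd_output_status_num_records_total
      type counter
      desc The total number of outgoing records
      <labels>
        tag ${tag}
        hostname ${hostname}
      </labels>
    </metric>
  </store>
</match>
EOF
```

The copy plugin hands every record to each store, so counting does not interfere with delivery.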
Step 3: Expose Metrics by Prometheus Input Plugin via HTTP
Finally, use the prometheus input plugin to expose the internal counters via HTTP.
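The exposing side could be configured roughly as follows. The port 24231 and the /metrics path match the URL used in this article; the prometheus_output_monitor source and its 10-second interval are included on the assumption that the fluentd_output_status_* metrics come from it, and ./fluent.conf is a hypothetical path.

```shell
# Append the metrics endpoint to ./fluent.conf (hypothetical path).
cat >> fluent.conf <<'EOF'
# Serve collected metrics at http://<host>:24231/metrics.
<source>
  @type prometheus
  bind 0.0.0.0
  port 24231
  metrics_path /metrics
</source>

# Assumed addition: export per-output buffer and retry metrics every 10 seconds.
<source>
  @type prometheus_output_monitor
  interval 10
  <labels>
    hostname ${hostname}
  </labels>
</source>
EOF
```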
Step 4: Check the Configuration
After making these three changes, restart Fluentd.
Let's send some records.
Then open http://localhost:24231/metrics, the URL that serves the metrics in Prometheus format.
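The check can also be done from a terminal, as in the sketch below. It assumes a forward input on the default port 24224, uses the placeholder tag company.test, and assumes curl is available; none of these come from the article itself.

```shell
# Send one test record with fluent-cat, which ships with Fluentd.
if command -v fluent-cat >/dev/null 2>&1; then
  echo '{"message":"hello"}' | fluent-cat company.test \
    || echo "Could not reach the forward input" >&2
fi

# Fetch the metrics page and keep only the Fluentd metric lines.
metrics="$(curl -s --max-time 5 http://localhost:24231/metrics 2>/dev/null | grep '^fluentd_' || true)"
if [ -n "$metrics" ]; then
  printf '%s\n' "$metrics"
else
  echo "No fluentd_* metrics returned; check that Fluentd restarted cleanly" >&2
fi
```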
Example Prometheus Configuration
Create a file named prometheus.yml that tells Prometheus to scrape the Fluentd metrics endpoint.
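A minimal sketch of such a file, written from the shell, is shown here. The job name fluentd and the 10-second scrape interval are assumptions; the target matches the port exposed by the prometheus input plugin.

```shell
# Write a minimal scrape configuration to ./prometheus.yml.
cat > prometheus.yml <<'EOF'
global:
  scrape_interval: 10s   # assumed interval; the Prometheus default is 1 minute

scrape_configs:
  - job_name: 'fluentd'
    static_configs:
      - targets: ['localhost:24231']   # port served by the prometheus input plugin
EOF
```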
Then, launch the prometheus process.
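Launching might look like the sketch below. PROMETHEUS_BIN is a hypothetical variable introduced for this example, and the default location ./prometheus assumes the binary was unpacked into the current directory.

```shell
# PROMETHEUS_BIN is a hypothetical variable; point it at your prometheus binary.
PROMETHEUS_BIN="${PROMETHEUS_BIN:-./prometheus}"

if [ -f "$PROMETHEUS_BIN" ] && [ -x "$PROMETHEUS_BIN" ]; then
  # Runs in the foreground and serves the web UI on port 9090 by default.
  "$PROMETHEUS_BIN" --config.file=./prometheus.yml
else
  echo "Prometheus binary not found at $PROMETHEUS_BIN" >&2
fi
```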
Now open http://localhost:9090/ in your browser.
How to use Prometheus to monitor Fluentd
List of Fluentd nodes
If you go to http://localhost:9090/targets, Prometheus shows a list of Fluentd nodes and their status.
List of Fluentd metrics
Then, visit http://localhost:9090/graph to explore Fluentd's internal metrics. There, you'll see 8 metrics in the metric list:
fluentd_input_status_num_records_total
fluentd_output_status_buffer_queue_length
fluentd_output_status_buffer_total_bytes
fluentd_output_status_emit_count
fluentd_output_status_num_errors
fluentd_output_status_num_records_total
fluentd_output_status_retry_count
fluentd_output_status_retry_wait
Pick fluentd_input_status_num_records_total, and you'll see the total incoming records per tag.
Example Prometheus Queries
Since fluentd_input_status_num_records_total and fluentd_output_status_num_records_total are monotonically increasing counters, they need a little calculation in PromQL (Prometheus Query Language) to become meaningful.
The views most people want are per-second rates of incoming and outgoing records.
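Two such expressions are sketched below as shell variables so they can be pasted into the expression box on the graph page. The 1-minute window and the grouping by the tag label are assumptions; adjust them to your scrape interval and labels.

```shell
# Records received per second, per tag, averaged over the last minute.
incoming_rate='sum(rate(fluentd_input_status_num_records_total[1m])) by (tag)'

# Records sent per second, per tag, averaged over the last minute.
outgoing_rate='sum(rate(fluentd_output_status_num_records_total[1m])) by (tag)'

printf '%s\n' "$incoming_rate" "$outgoing_rate"

# With Prometheus running, the same expressions can be evaluated over its HTTP API:
# curl -sG http://localhost:9090/api/v1/query --data-urlencode "query=$incoming_rate"
```

rate() turns an ever-growing counter into a per-second figure and tolerates counter resets after a Fluentd restart, which is why it is used here rather than the raw value.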
Metrics to Monitor
In addition to the traffic metrics introduced above, it is important to monitor the queue length and error count.
If these values keep increasing, Fluentd cannot flush its buffer to the destination, and you risk losing data once the buffer becomes full.
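To spot that condition, expressions like the ones below may help. They are a sketch only: the 5-minute window and the threshold of zero are assumptions, and the metric names are taken from the list earlier in this article.

```shell
# True for outputs whose buffer queue grew over the last 5 minutes.
queue_growing='delta(fluentd_output_status_buffer_queue_length[5m]) > 0'

# True for outputs that reported new errors over the last 5 minutes.
errors_growing='delta(fluentd_output_status_num_errors[5m]) > 0'

printf '%s\n' "$queue_growing" "$errors_growing"
```

An expression that returns no series means nothing is growing for that window.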
Grafana for Advanced Visualization / Alerting
For more advanced visualization and alerting, we recommend using Grafana as a visualization frontend for Prometheus.
If this article is incorrect or outdated, or omits critical information, please let us know. Fluentd is an open source project under the Cloud Native Computing Foundation (CNCF). All components are available under the Apache 2 License.