To expose the Fluentd metrics to Prometheus, we need to configure 3 parts:
Step 1: Prometheus Filter Plugin to count Incoming Records
Step 2: Prometheus Output Plugin to count Outgoing Records
Step 3: Prometheus Input Plugin to expose metrics via HTTP
Step 1: Counting Incoming Records by Prometheus Filter Plugin
First, please add the <filter> section like below, to count the incoming records per tag. With this configuration, prometheus filter starts adding the internal counter as the record comes in.
# source
<source>
@type forward
bind 0.0.0.0
port 24224
</source>
# count number of incoming records per tag
<filter company.*>
@type prometheus
<metric>
name fluentd_input_status_num_records_total
type counter
desc The total number of incoming records
<labels>
tag ${tag}
hostname ${hostname}
</labels>
</metric>
</filter>
Step 2: Counting Outgoing Records by Prometheus Output Plugin
Second, please use copy plugin with prometheus output plugin, to count the outgoing records per tag. With this configuration, prometheus output starts adding the internal counter as the record goes out.
# count number of outgoing records per tag
<match company.*>
@type copy
<store>
@type forward
<server>
name myserver1
hostname 192.168.1.3
port 24224
weight 60
</server>
</store>
<store>
@type prometheus
<metric>
name fluentd_output_status_num_records_total
type counter
desc The total number of outgoing records
<labels>
tag ${tag}
hostname ${hostname}
</labels>
</metric>
</store>
</match>
Step 3: Expose Metrics by Prometheus Input Plugin via HTTP
Finally, please use prometheus input plugin to expose internal counter information via HTTP.
# expose metrics in prometheus format
<source>
@type prometheus
bind 0.0.0.0
port 24231
metrics_path /metrics
</source>
<source>
@type prometheus_output_monitor
interval 10
<labels>
hostname ${hostname}
</labels>
</source>
Step 4: Check the Configuration
After you have done 3 changes, please restart fluentd.
# For stand-alone Fluentd installations
$ fluentd -c fluentd.conf
# For td-agent users
$ sudo /etc/init.d/td-agent restart
curl http://localhost:24231/metrics
# TYPE fluentd_input_status_num_records_total counter
# HELP fluentd_input_status_num_records_total The total number of incoming records
fluentd_input_status_num_records_total{tag="company.test",host="KZK.local"} 3.0
fluentd_input_status_num_records_total{tag="company.test2",host="KZK.local"} 1.0
# TYPE fluentd_output_status_num_records_total counter
# HELP fluentd_output_status_num_records_total The total number of outgoing records
fluentd_output_status_num_records_total{tag="company.test",host="KZK.local"} 3.0
fluentd_output_status_num_records_total{tag="company.test2",host="KZK.local"} 1.0
# TYPE fluentd_output_status_buffer_queue_length gauge
# HELP fluentd_output_status_buffer_queue_length Current buffer queue length.
fluentd_output_status_buffer_queue_length{hostname="KZK.local",plugin_id="object:3fcbccc6d388",type="forward"} 1.0
....
Example Prometheus Configuration
Please prepare the file below as prometheus.yml.
global:
scrape_interval: 10s # Set the scrape interval to every 10 seconds. Default is every 1 minute.
# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
- job_name: 'fluentd'
static_configs:
- targets: ['localhost:24231']
Then, launch prometheus process.
$ ./prometheus --config.file="prometheus.yml"
Now please open your browser and access to http://localhost:9090/.
How to use Prometheus to monitor Fluentd
List of Fluentd nodes
If you go to http://localhost:9090/targets, Prometheus will show you a list of Fluentd nodes and its status.
List of Fluentd metrics
Then, visit http://localhost:9090/graph to explore Fluentd internal metrics. There, you'll see 8 metrics in the metric list:
fluentd_input_status_num_records_total
fluentd_output_status_buffer_queue_length
fluentd_output_status_buffer_total_bytes
fluentd_output_status_emit_count
fluentd_output_status_num_errors
fluentd_output_status_num_records_total
fluentd_output_status_retry_count
fluentd_output_status_retry_wait
Please pick fluentd_input_status_num_records_total, and you'll see the total incoming records per tag.
Example Prometheus Queries
Here are the example PromQLs for common metrics everyone wants to see.
# number of available nodes
up
# incoming records / sec / host
sum(rate(fluentd_input_status_num_records_total[1m])) by (hostname)
# incoming records / sec / tag
sum(rate(fluentd_input_status_num_records_total[1m])) by (tag)
# outgoing records / sec / host
sum(rate(fluentd_output_status_num_records_total[1m])) by (hostname)
# outgoing records / sec / tag
sum(rate(fluentd_output_status_num_records_total[1m])) by (tag)
# emit count / sec
rate(fluentd_output_status_emit_count[1m])
Metrics to Monitor
In addition to the traffic metrics introduced above, it is important to monitor the queue length and error count.
If these values are increasing, it means Fluentd cannot flush the buffer to the destination. Thus you will lose the data once the buffer becomes full.
# maximum buffer length in last 1min
max_over_time(fluentd_output_status_buffer_queue_length[1m])
# maximum buffer bytes in last 1min
max_over_time(fluentd_output_status_buffer_total_bytes[1m])
# maximum retry wait in last 1min
max_over_time(fluentd_output_status_retry_wait[1m])
# retry count / sec
rate(fluentd_output_status_retry_count[1m])
Grafana for Advanced Visualization / Alerting
Further Readings
Then, please access to http://localhost:24231/metrics, which is the URL to receive metrics in .
Since fluentd_input_status_num_records_total and fluentd_output_status_num_records_total are monotonically increasing numbers, it requires a little bit of calculation by to make them meaningful.
For more advanced visualization and alerting, we recommend to use as a visualization frontend for Prometheus.
If this article is incorrect or outdated, or omits critical information, please . is a open source project under . All components are available under the Apache 2 License.