# Monitoring by Prometheus

This article describes how to monitor Fluentd via [Prometheus](https://prometheus.io/).

Since both Prometheus and Fluentd are [CNCF (Cloud Native Computing Foundation)](https://www.cncf.io/) projects, the Fluentd project recommends Prometheus as the default tool for monitoring Fluentd.

## Installation

First, install the `fluent-plugin-prometheus` gem.

```
$ fluent-gem install fluent-plugin-prometheus --version=0.4.0
```

If you are using td-agent, use `td-agent-gem` for installation.

```
$ sudo td-agent-gem install fluent-plugin-prometheus --version=0.4.0
```

## Example Fluentd Configuration

To expose Fluentd metrics to Prometheus, you need to configure three parts:

* Step 1: Prometheus Filter Plugin to count Incoming Records
* Step 2: Prometheus Output Plugin to count Outgoing Records
* Step 3: Prometheus Input Plugin to expose metrics via HTTP

### Step 1: Counting Incoming Records by Prometheus Filter Plugin

First, add a `<filter>` section like the one below to count incoming records per tag. With this configuration, the `prometheus` filter increments an internal counter each time a record comes in.

```
# source
<source>
  @type forward
  bind 0.0.0.0
  port 24224
</source>

# count number of incoming records per tag
<filter company.*>
  @type prometheus
  <metric>
    name fluentd_input_status_num_records_total
    type counter
    desc The total number of incoming records
    <labels>
      tag ${tag}
      hostname ${hostname}
    </labels>
  </metric>
</filter>
```
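The per-tag counting above can be pictured as one counter series per unique label combination. The following Python sketch is only a conceptual model, not the plugin's actual implementation; `records_total`, `count_record`, and the host name `host-a` are hypothetical names chosen for illustration.

```python
from collections import Counter

# Hypothetical stand-in for the filter's internal counter:
# one series per unique (tag, hostname) label combination.
records_total = Counter()


def count_record(tag: str, hostname: str) -> None:
    """Increment the series identified by this label combination."""
    records_total[(tag, hostname)] += 1


# Simulate three incoming records on two tags from one (made-up) host.
for tag in ["company.test1", "company.test1", "company.test2"]:
    count_record(tag, "host-a")

# Print the series in a shape similar to the Prometheus text format.
for (tag, hostname), value in sorted(records_total.items()):
    print(
        "fluentd_input_status_num_records_total"
        f'{{tag="{tag}",hostname="{hostname}"}} {float(value)}'
    )
```

Each distinct tag produces its own series, which is why the label set should stay small and bounded.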

### Step 2: Counting Outgoing Records by Prometheus Output Plugin

Second, use the `copy` plugin together with the `prometheus` output plugin to count outgoing records per tag. With this configuration, the `prometheus` output increments an internal counter each time a record goes out.

```
# count number of outgoing records per tag
<match company.*>
  @type copy
  <store>
    @type forward
    <server>
      name myserver1
      hostname 192.168.1.3
      port 24224
      weight 60
    </server>
  </store>
  <store>
    @type prometheus
    <metric>
      name fluentd_output_status_num_records_total
      type counter
      desc The total number of outgoing records
      <labels>
        tag ${tag}
        hostname ${hostname}
      </labels>
    </metric>
  </store>
</match>
```

### Step 3: Expose Metrics by Prometheus Input Plugin via HTTP

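Finally, use the `prometheus` input plugin to expose the internal counters via HTTP. The `prometheus_output_monitor` source additionally exports metrics about the output plugins, such as buffer queue length and retry count.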

```
# expose metrics in prometheus format
<source>
  @type prometheus
  bind 0.0.0.0
  port 24231
  metrics_path /metrics
</source>
<source>
  @type prometheus_output_monitor
  interval 10
  <labels>
    hostname ${hostname}
  </labels>
</source>
```

### Step 4: Check the Configuration

After you have made these three changes, restart Fluentd.

```
# For stand-alone Fluentd installations
$ fluentd -c fluentd.conf
# For td-agent users
$ sudo /etc/init.d/td-agent restart
```

Let's send some records.

```
$ echo '{"message":"hello"}' | bundle exec fluent-cat company.test1
$ echo '{"message":"hello"}' | bundle exec fluent-cat company.test1
$ echo '{"message":"hello"}' | bundle exec fluent-cat company.test1
$ echo '{"message":"hello"}' | bundle exec fluent-cat company.test2
```

Then, access `http://localhost:24231/metrics`, the URL that serves the metrics in [Prometheus format](https://prometheus.io/docs/instrumenting/exposition_formats/).

```
$ curl http://localhost:24231/metrics
# TYPE fluentd_input_status_num_records_total counter
# HELP fluentd_input_status_num_records_total The total number of incoming records
fluentd_input_status_num_records_total{tag="company.test1",hostname="KZK.local"} 3.0
fluentd_input_status_num_records_total{tag="company.test2",hostname="KZK.local"} 1.0
# TYPE fluentd_output_status_num_records_total counter
# HELP fluentd_output_status_num_records_total The total number of outgoing records
fluentd_output_status_num_records_total{tag="company.test1",hostname="KZK.local"} 3.0
fluentd_output_status_num_records_total{tag="company.test2",hostname="KZK.local"} 1.0
# TYPE fluentd_output_status_buffer_queue_length gauge
# HELP fluentd_output_status_buffer_queue_length Current buffer queue length.
fluentd_output_status_buffer_queue_length{hostname="KZK.local",plugin_id="object:3fcbccc6d388",type="forward"} 1.0
....
```
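To check such output programmatically, the text format can be parsed with a few lines of code. The sketch below is a simplified parser, assuming samples of the form `name{labels} value` without timestamps or escaped quotes in label values; `SAMPLE`, `parse_metrics`, and `total_by_label` are hypothetical names, and the sample text is modeled on the configuration above rather than fetched from a live endpoint.

```python
import re
from collections import defaultdict

# Sample text modeled on the endpoint output; not fetched from a live server.
SAMPLE = """\
# TYPE fluentd_input_status_num_records_total counter
# HELP fluentd_input_status_num_records_total The total number of incoming records
fluentd_input_status_num_records_total{tag="company.test1",hostname="KZK.local"} 3.0
fluentd_input_status_num_records_total{tag="company.test2",hostname="KZK.local"} 1.0
"""

# Simplified patterns: no timestamps, no escaped quotes inside label values.
LINE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$"
)
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Return a list of (name, labels_dict, value) tuples, one per sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and TYPE/HELP comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip lines this simplified parser does not understand
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def total_by_label(samples, metric, label):
    """Sum the values of one metric, grouped by the given label."""
    totals = defaultdict(float)
    for name, labels, value in samples:
        if name == metric and label in labels:
            totals[labels[label]] += value
    return dict(totals)


samples = parse_metrics(SAMPLE)
print(total_by_label(samples, "fluentd_input_status_num_records_total", "tag"))
```

For production use, an established client library is likely a better choice than a hand-written parser.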

## Example Prometheus Configuration

Please prepare the file below as `prometheus.yml`.

```
global:
  scrape_interval: 10s # Set the scrape interval to every 10 seconds. Default is every 1 minute.

# A scrape configuration containing exactly one endpoint to scrape:
# here it is the Fluentd metrics endpoint exposed by the prometheus input plugin.
scrape_configs:
  - job_name: 'fluentd'
    static_configs:
      - targets: ['localhost:24231']
```

Then, launch the `prometheus` process.

```
$ ./prometheus --config.file="prometheus.yml"
```

Now open `http://localhost:9090/` in your browser.

## How to use Prometheus to monitor Fluentd

### List of Fluentd nodes

If you go to `http://localhost:9090/targets`, Prometheus shows a list of Fluentd nodes and their status.

![](https://3804023877-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LR7OsqPORtP86IQxs6E%2Fsync%2F4301d6998b9cfeccd20694398440ecad74fe2027.png?generation=1622681095673859\&alt=media)

### List of Fluentd metrics

Then, visit `http://localhost:9090/graph` to explore Fluentd's internal metrics. With the configuration above, the metric list contains eight metrics:

![](https://3804023877-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LR7OsqPORtP86IQxs6E%2F-LR7PDOnAgulIFNQiUNJ%2F-LR7PQZd2A4I-eB7jsJD%2Fprometheus-metrics.png?generation=1542034410893577\&alt=media)

* fluentd\_input\_status\_num\_records\_total
* fluentd\_output\_status\_buffer\_queue\_length
* fluentd\_output\_status\_buffer\_total\_bytes
* fluentd\_output\_status\_emit\_count
* fluentd\_output\_status\_num\_errors
* fluentd\_output\_status\_num\_records\_total
* fluentd\_output\_status\_retry\_count
* fluentd\_output\_status\_retry\_wait

Pick `fluentd_input_status_num_records_total` to see the total number of incoming records per tag.

![](https://3804023877-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LR7OsqPORtP86IQxs6E%2F-LR7PDOnAgulIFNQiUNJ%2F-LR7PQZf0HSTGY9D0DLH%2Fprometheus-graph.png?generation=1542034410369478\&alt=media)

### Example Prometheus Queries

Since `fluentd_input_status_num_records_total` and `fluentd_output_status_num_records_total` are monotonically increasing counters, their raw values are not very informative. A little calculation in [PromQL (Prometheus Query Language)](https://prometheus.io/docs/prometheus/latest/querying/basics/), typically with `rate()`, turns them into meaningful per-second figures.

Here are example PromQL queries for commonly needed metrics.

```
# number of available nodes
up

# incoming records / sec / host
sum(rate(fluentd_input_status_num_records_total[1m])) by (hostname)

# incoming records / sec / tag
sum(rate(fluentd_input_status_num_records_total[1m])) by (tag)

# outgoing records / sec / host
sum(rate(fluentd_output_status_num_records_total[1m])) by (hostname)

# outgoing records / sec / tag
sum(rate(fluentd_output_status_num_records_total[1m])) by (tag)

# emit count / sec
rate(fluentd_output_status_emit_count[1m])
```
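The idea behind `rate()` on a monotonically increasing counter can be sketched in a few lines of Python. This is a simplified approximation, not Prometheus's actual algorithm, which also extrapolates to the window boundaries; `simple_rate` and the sample values are hypothetical.

```python
def simple_rate(samples):
    """Approximate the per-second increase of a counter.

    `samples` is a list of (unix_timestamp, counter_value) pairs ordered by time.
    A drop in value is treated as a counter reset (for example, a Fluentd
    restart), in which case the new value counts as the increase since the reset.
    """
    if len(samples) < 2:
        return None  # a rate needs at least two samples
    increase = 0.0
    for (_, prev), (_, curr) in zip(samples, samples[1:]):
        increase += (curr - prev) if curr >= prev else curr
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Made-up samples scraped every 10 seconds over a 1-minute window.
steady = [(0, 100), (10, 130), (20, 160), (30, 190), (40, 220), (50, 250), (60, 280)]
print(simple_rate(steady))  # 180 records over 60 seconds

# The counter restarts from zero between the second and third sample.
with_reset = [(0, 100), (10, 130), (20, 5), (30, 35)]
print(simple_rate(with_reset))
```

Handling resets this way is what makes a rate usable across Fluentd restarts, whereas a plain difference between the first and last sample would go negative.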

### Metrics to Monitor

In addition to the traffic metrics introduced above, it is important to monitor the buffer queue length, the retry count, and the error count.

If these values keep increasing, Fluentd is failing to flush its buffer to the destination, and you risk losing data once the buffer becomes full.

```
# maximum buffer length in last 1min
max_over_time(fluentd_output_status_buffer_queue_length[1m])

# maximum buffer bytes in last 1min
max_over_time(fluentd_output_status_buffer_total_bytes[1m])

# maximum retry wait in last 1min
max_over_time(fluentd_output_status_retry_wait[1m])

# retry count / sec
rate(fluentd_output_status_retry_count[1m])
```
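Queries like these can also be evaluated outside the browser through the Prometheus HTTP API endpoint `/api/v1/query`. The sketch below builds the request URL and flags series above a threshold; it assumes Prometheus runs at `http://localhost:9090` as in this article. The helper names, the threshold of 5, and `SAMPLE_RESPONSE` are hypothetical, and the response is hand-written in the API's instant-vector shape rather than captured from a real server.

```python
import json
from urllib.parse import urlencode

PROMETHEUS_URL = "http://localhost:9090"  # address used in this article


def build_query_url(promql, base_url=PROMETHEUS_URL):
    """Build an instant-query URL for the Prometheus HTTP API."""
    params = urlencode({"query": promql})
    return f"{base_url}/api/v1/query?{params}"


def series_over_threshold(response_body, threshold):
    """Return (labels, value) pairs whose value exceeds `threshold`."""
    payload = json.loads(response_body)
    if payload.get("status") != "success":
        raise ValueError("Prometheus query did not succeed")
    flagged = []
    for series in payload["data"]["result"]:
        _timestamp, raw_value = series["value"]  # values arrive as strings
        value = float(raw_value)
        if value > threshold:
            flagged.append((series["metric"], value))
    return flagged


# Hand-written illustrative response; not output from a real server.
SAMPLE_RESPONSE = json.dumps({
    "status": "success",
    "data": {
        "resultType": "vector",
        "result": [
            {"metric": {"hostname": "KZK.local", "type": "forward"},
             "value": [1542034410.0, "12"]},
            {"metric": {"hostname": "KZK.local", "type": "stdout"},
             "value": [1542034410.0, "0"]},
        ],
    },
})

print(build_query_url("max_over_time(fluentd_output_status_buffer_queue_length[1m])"))
print(series_over_threshold(SAMPLE_RESPONSE, threshold=5))
```

To run this against a live server, fetch the built URL with any HTTP client and pass the response body to `series_over_threshold`; a suitable threshold depends on your buffer settings.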

## Grafana for Advanced Visualization / Alerting

For more advanced visualization and alerting, we recommend using [Grafana](https://grafana.com/) as a visualization frontend for Prometheus.

* [Grafana Support for Prometheus](https://prometheus.io/docs/visualization/grafana/)

![](https://3804023877-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-LR7OsqPORtP86IQxs6E%2Fsync%2F531d32ecc3e3b0d3e687e328f160a893247eff15.png?generation=1622681095358509\&alt=media)

## Further Reading

* [Prometheus Documentation](https://prometheus.io/docs/introduction/overview/)
* [Grafana Documentation](http://docs.grafana.org/)

If this article is incorrect or outdated, or omits critical information, please [let us know](https://github.com/fluent/fluentd-docs-gitbook/issues?state=open). [Fluentd](http://www.fluentd.org/) is an open source project under the [Cloud Native Computing Foundation (CNCF)](https://cncf.io/). All components are available under the Apache 2 License.

