Life of a Fluentd event
This article gives a general overview, with examples, of how Fluentd processes events. It covers the complete lifecycle: Setup, Inputs, Filters, Matches, and Labels.

Basic Setup

The configuration file is the fundamental piece that connects everything together: it defines which Inputs (listeners) Fluentd will have and sets up the matching rules that route Event data to a specific Output.
We will use the in_http and out_stdout plugins as examples to describe the event cycle. The following is a basic configuration that defines an HTTP input; in short, Fluentd will listen for HTTP requests:
<source>
  @type http
  port 8888
  bind 0.0.0.0
</source>
The definition specifies that an HTTP server will be listening on TCP port 8888.
Now, let's define a Matching rule to print the incoming requests to the standard output:
<match test.cycle>
  @type stdout
</match>
The Match sets a rule: each incoming Event whose Tag equals test.cycle is handed to the Output plugin of type stdout. At this point we have an Input, a Match, and an Output.
Let's test this setup using curl. The in_http plugin takes the Tag from the URL path, so a request to /test.cycle produces an Event tagged test.cycle:
$ curl -i -X POST -d 'json={"action":"login","user":2}' http://localhost:8888/test.cycle
HTTP/1.1 200 OK
Content-type: text/plain
Connection: Keep-Alive
Content-length: 0
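For readers who prefer to send the test event from a script, the curl call above can be approximated in Python. This is only a sketch: the helper names build_request and send_event are made up for illustration, and it assumes in_http decodes a URL-encoded json form field the same way it handles the raw curl payload.

```python
import json
import urllib.parse
import urllib.request


def build_request(tag, record, host="localhost", port=8888):
    """Build a POST request equivalent to the curl command.

    The tag goes into the URL path and the record is sent as the
    form field `json`, mirroring `-d 'json={...}'`.
    """
    url = f"http://{host}:{port}/{tag}"
    body = urllib.parse.urlencode({"json": json.dumps(record)}).encode("utf-8")
    return urllib.request.Request(url, data=body, method="POST")


def send_event(tag, record):
    """Send the event and return the HTTP status code.

    Requires a Fluentd instance listening with the configuration above.
    """
    with urllib.request.urlopen(build_request(tag, record), timeout=5) as resp:
        return resp.status
```

With Fluentd running, send_event("test.cycle", {"action": "login", "user": 2}) should be answered with the same 200 status that curl reports.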
The Fluentd logs should look like this:
$ fluentd -c in_http.conf
2019-12-16 18:58:15 +0900 [info]: parsing config file is succeeded path="in_http.conf"
2019-12-16 18:58:15 +0900 [info]: gem 'fluentd' version '1.8.0'
2019-12-16 18:58:15 +0900 [info]: using configuration file: <ROOT>
  <source>
    @type http
    port 8888
    bind "0.0.0.0"
  </source>
  <match test.cycle>
    @type stdout
  </match>
</ROOT>
2019-12-16 18:58:15 +0900 [info]: starting fluentd-1.8.0 pid=44323 ruby="2.4.6"
2019-12-16 18:58:15 +0900 [info]: spawn command to main: cmdline=["/path/to/ruby", "-Eascii-8bit:ascii-8bit", "/path/to/fluentd", "-c", "in_http.conf", "--under-supervisor"]
2019-12-16 18:58:16 +0900 [info]: adding match pattern="test.cycle" type="stdout"
2019-12-16 18:58:16 +0900 [info]: adding source type="http"
2019-12-16 18:58:16 +0900 [info]: #0 starting fluentd worker pid=44336 ppid=44323 worker=0
2019-12-16 18:58:16 +0900 [info]: #0 fluentd worker is now running worker=0
2019-12-16 18:58:27.888557000 +0900 test.cycle: {"action":"login","user":2}

Event Structure

A Fluentd event consists of three components:
    tag: Specifies the origin where an event comes from. It is used for message routing.
    time: Specifies the time when an event happens with nanosecond resolution.
    record: Specifies the actual log as a JSON object.
The input plugin is responsible for generating the Fluentd event from data sources. For example, in_tail generates events from text lines. If you have the following Apache log:
192.168.0.1 - - [28/Feb/2013:12:00:00 +0900] "GET / HTTP/1.1" 200 777
You get the following Fluentd event:
tag: apache.access # set by configuration
time: 1362020400.000000000 # 28/Feb/2013:12:00:00 +0900
record: {"user":"-","method":"GET","code":200,"size":777,"host":"192.168.0.1","path":"/"}
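To make the three components concrete, the conversion from the Apache log line to a tag, time, and record can be sketched in Python. This is an illustration only, not Fluentd's own parser: the Event class, the parse_line helper, and the simplified regular expression are all assumptions introduced here.

```python
import re
from dataclasses import dataclass
from datetime import datetime

# Simplified pattern for this one log format; not Fluentd's own parser.
LOG_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) \S+" (?P<code>\d+) (?P<size>\d+)$'
)


@dataclass
class Event:
    tag: str      # used for routing
    time: float   # seconds since the Unix epoch
    record: dict  # the structured log itself


def parse_line(tag, line):
    """Turn one access-log line into an Event, or return None if it does not match."""
    m = LOG_PATTERN.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    # The timestamp becomes the event time and is removed from the record.
    when = datetime.strptime(fields.pop("time"), "%d/%b/%Y:%H:%M:%S %z")
    fields["code"] = int(fields["code"])
    fields["size"] = int(fields["size"])
    return Event(tag=tag, time=when.timestamp(), record=fields)


event = parse_line(
    "apache.access",
    '192.168.0.1 - - [28/Feb/2013:12:00:00 +0900] "GET / HTTP/1.1" 200 777',
)
```

The tag is not derived from the log line; as in the listing above, it is supplied by configuration, which is why parse_line takes it as an argument.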

Processing Events

When a Setup is defined, the Router Engine contains several predefined rules to apply to different input data. Internally, an Event passes through a chain of procedures that may alter its lifecycle.
Now we will expand on the previous basic example and add more steps to the Setup to demonstrate how the Event cycle can be altered. We will do this with Filters.

Filters

A Filter behaves like a rule to pass or reject an event. The following configuration adds a Filter definition:
<source>
  @type http
  port 8888
  bind 0.0.0.0
</source>

<filter test.cycle>
  @type grep
  <exclude>
    key action
    pattern ^logout$
  </exclude>
</filter>

<match test.cycle>
  @type stdout
</match>
Fluentd configuration visualization link: https://link.calyptia.com/gjl (sign-up required)
As you can see, the new Filter definition is a mandatory step that an Event must pass before control goes to the Match section. The Filter accepts or rejects the Event based on its type and rule. In this example we want to discard any user logout action; we only care about logins. To accomplish this, the Filter uses grep to exclude any message whose action key has the value logout.
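The accept-or-reject decision can be sketched in a few lines of Python. This is not the grep plugin's code: the function name passes_exclude is hypothetical, the sketch assumes that records lacking the key are kept, and Python's regular expressions differ from Ruby's in some details (for example, how ^ and $ treat multi-line strings).

```python
import re


def passes_exclude(record, key, pattern):
    """Return True if the record survives an <exclude> rule.

    Assumption for this sketch: the record is dropped only when `key`
    is present and its value matches `pattern`.
    """
    if key not in record:
        return True
    return re.search(pattern, str(record[key])) is None


events = [
    {"action": "login", "user": 2},
    {"action": "logout", "user": 2},
]
# Keep only the events that the exclude rule does not reject.
kept = [e for e in events if passes_exclude(e, "action", r"^logout$")]
```

Here kept ends up holding only the login record, which corresponds to the single event that reaches the Match section.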
From a terminal, run the following two curl commands containing different action values:
$ curl -i -X POST -d 'json={"action":"login","user":2}' http://localhost:8888/test.cycle
HTTP/1.1 200 OK
Content-type: text/plain
Connection: Keep-Alive
Content-length: 0

$ curl -i -X POST -d 'json={"action":"logout","user":2}' http://localhost:8888/test.cycle
HTTP/1.1 200 OK
Content-type: text/plain
Connection: Keep-Alive
Content-length: 0
Fluentd logs show only one login message. The logout event has been discarded:
$ fluentd -c in_http.conf
2019-12-16 19:07:39 +0900 [info]: parsing config file is succeeded path="in_http.conf"
2019-12-16 19:07:39 +0900 [info]: gem 'fluentd' version '1.8.0'
2019-12-16 19:07:39 +0900 [info]: using configuration file: <ROOT>
  <source>
    @type http
    port 8888
    bind "0.0.0.0"
  </source>
  <filter test.cycle>
    @type grep
    <exclude>
      key "action"
      pattern ^logout$
    </exclude>
  </filter>
  <match test.cycle>
    @type stdout
  </match>
</ROOT>
2019-12-16 19:07:39 +0900 [info]: starting fluentd-1.8.0 pid=44435 ruby="2.4.6"
2019-12-16 19:07:39 +0900 [info]: spawn command to main: cmdline=["/path/to/ruby", "-Eascii-8bit:ascii-8bit", "/path/to/fluentd", "-c", "in_http.conf", "--under-supervisor"]
2019-12-16 19:07:40 +0900 [info]: adding filter pattern="test.cycle" type="grep"
2019-12-16 19:07:40 +0900 [info]: adding match pattern="test.cycle" type="stdout"
2019-12-16 19:07:40 +0900 [info]: adding source type="http"
2019-12-16 19:07:40 +0900 [info]: #0 starting fluentd worker pid=44448 ppid=44435 worker=0
2019-12-16 19:07:40 +0900 [info]: #0 fluentd worker is now running worker=0
2019-12-16 19:08:06.934660000 +0900 test.cycle: {"action":"login","user":2}
As you can see, Events follow a step-by-step cycle in which they are processed in order, from top to bottom. The engine lets you add as many Filters as required. Because the configuration file may grow and become hard to read, a feature called Labels was introduced to address this.
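The top-to-bottom order can be pictured as a loop over a list of filters. The Python sketch below is a conceptual model only, not how the Fluentd engine is implemented; run_pipeline, drop_logout, and drop_anonymous are hypothetical names, and the second rule exists purely to show that several filters can be chained.

```python
def run_pipeline(record, filters):
    """Apply filters in the order given; stop as soon as one rejects the record."""
    for f in filters:
        record = f(record)
        if record is None:
            return None
    return record


def drop_logout(record):
    """Reject logout events, pass everything else through unchanged."""
    return None if record.get("action") == "logout" else record


def drop_anonymous(record):
    """Hypothetical second rule: reject records that carry no user field."""
    return None if "user" not in record else record


# The list order plays the role of the top-to-bottom order in the file.
filters = [drop_logout, drop_anonymous]
results = [
    run_pipeline({"action": "login", "user": 2}, filters),
    run_pipeline({"action": "logout", "user": 2}, filters),
    run_pipeline({"action": "login"}, filters),
]
```

Only the first record survives both filters; the other two are rejected by the first and second filter respectively and never reach an output.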

Labels

Labels aim to reduce configuration file complexity. They let you define new Routing sections that do not follow the top-to-bottom order; instead, they act like linked references. Taking the previous example, we will modify the setup as follows:
<source>
  @type http
  bind 0.0.0.0
  port 8888
  @label @STAGING
</source>

<filter test.cycle>
  @type grep
  <exclude>
    key action
    pattern ^login$
  </exclude>
</filter>

<label @STAGING>
  <filter test.cycle>
    @type grep
    <exclude>
      key action
      pattern ^logout$
    </exclude>
  </filter>

  <match test.cycle>
    @type stdout
  </match>
</label>
Fluentd configuration visualization: https://link.calyptia.com/guh (sign-up required)
The new configuration contains a @label parameter under source, indicating that further steps take place in the @STAGING label section. Every Event reported by the Source is handed by the Routing Engine to @STAGING for further processing. Hence, it skips the old top-level filter definition.
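The effect of the label can be modelled as a lookup that selects which filter chain an Event enters. The following Python sketch mirrors the configuration above under simplifying assumptions; the names exclude, route, TOP_LEVEL, and LABELS are invented for illustration and do not correspond to Fluentd internals.

```python
import re


def exclude(key, pattern):
    """Build a filter that returns None for records whose `key` matches `pattern`."""
    def apply(record):
        value = str(record.get(key, ""))
        return None if re.search(pattern, value) else record
    return apply


# Top-level filters apply only to events that carry no label.
TOP_LEVEL = [exclude("action", r"^login$")]
# Each label owns its own independent filter chain.
LABELS = {"@STAGING": [exclude("action", r"^logout$")]}


def route(record, label=None):
    """Run the record through the chain selected by its label."""
    chain = LABELS[label] if label is not None else TOP_LEVEL
    for f in chain:
        record = f(record)
        if record is None:
            return None
    return record


login = {"action": "login", "user": 2}
logout = {"action": "logout", "user": 2}
staged = [route(login, "@STAGING"), route(logout, "@STAGING")]
unlabeled = [route(login), route(logout)]
```

In this model a labeled login event passes even though the top-level filter would have excluded it, because the top-level chain is never consulted for events routed to @STAGING.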

Buffers

In this example we use stdout, a non-buffered output. In production, however, you typically use outputs in buffered mode, e.g. forward, mongodb, or s3. An output plugin in buffered mode first stores the received events in buffers and then writes the buffers out to the destination once flush conditions are met. So, with a buffered output, you do not see received events immediately, unlike with the non-buffered stdout output.
Buffers are important for reliability and throughput. See the Output and Buffer articles.
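The store-then-flush behavior can be illustrated with a small Python class. This is a deliberately reduced model: BufferedOutput and its max_events parameter are hypothetical, and a count-based flush stands in for whatever flush conditions a real buffered output is configured with.

```python
class BufferedOutput:
    """Collect events in memory and write them out in batches."""

    def __init__(self, write, max_events=3):
        self._write = write            # callable that receives a list of events
        self._max_events = max_events  # hypothetical flush condition
        self._chunk = []

    def emit(self, event):
        """Store the event; write the chunk out once it is full."""
        self._chunk.append(event)
        if len(self._chunk) >= self._max_events:
            self.flush()

    def flush(self):
        """Write out whatever is buffered and start a new chunk."""
        if self._chunk:
            self._write(self._chunk)
            self._chunk = []


batches = []
out = BufferedOutput(batches.append, max_events=3)
for i in range(4):
    out.emit({"n": i})
written_before_flush = len(batches)  # the fourth event is still buffered here
out.flush()                          # for example on shutdown or on a timer
```

The fourth event stays invisible to the destination until the explicit flush, which is the delay described above when comparing buffered outputs with stdout.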

Conclusion

Once the Fluentd engine receives events from the Source, they are processed step by step or inside a referenced Label. Any Event may be filtered out at any point. The Routing Engine provides more flexibility and makes processing easier before Events reach the Output plugin.

Learn More

If this article is incorrect or outdated, or omits critical information, please let us know. Fluentd is an open-source project under Cloud Native Computing Foundation (CNCF). All components are available under the Apache 2 License.