Buffer Plugins
Fluentd has nine (9) types of plugins. This article gives an overview of the Buffer plugin.
Overview
Buffer plugins are used by output plugins. For example, out_s3 uses buf_file by default to store the incoming stream temporarily before transmitting it to S3.
Buffer plugins are, as you can tell by the name, pluggable. So you can choose a suitable backend based on your system requirements.
How Buffer Works
A buffer is essentially a set of "chunks". A chunk is a collection of events concatenated into a single blob. Each chunk is managed one by one in the form of files (buf_file) or contiguous memory blocks (buf_memory).
The Lifecycle of Chunks
You can think of a chunk as a cargo box. A buffer plugin uses a chunk as a lightweight container, and fills it with events incoming from input sources. If a chunk becomes full, then it gets "shipped" to the destination.
Internally, a buffer plugin has two separate places to store its chunks: a "stage", where chunks get filled with events, and a "queue", where chunks wait before transportation. Every newly created chunk starts on the stage, then proceeds to the queue in time (and subsequently gets transferred to the destination).
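For example, with the file buffer, the stage-to-queue transition can be controlled by chunk size and flush interval. The following is a minimal sketch; the match pattern and path are placeholders:

```
<match app.**>
  @type s3
  # ... destination-specific parameters ...
  <buffer>
    @type file
    path /var/log/fluentd/buffer   # where staged chunks are persisted
    chunk_limit_size 8MB           # a chunk is enqueued once it grows to this size...
    flush_interval 10s             # ...or when this interval elapses, whichever comes first
  </buffer>
</match>
```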
Control Retry Behavior
A chunk can fail to be written out to the destination for a number of reasons. The network can go down, or the traffic volumes can exceed the capacity of the destination node. To handle such common failures gracefully, buffer plugins are equipped with a built-in retry mechanism.
How Exponential Backoff Works
By default, Fluentd increases the wait interval exponentially for each retry attempt. For example, assuming that the initial wait interval is set to 1 second and the exponential factor is 2, the successive attempts wait 1, 2, 4, 8, 16, ... seconds: the n-th retry waits 2^(n-1) seconds after the previous failure.
Note that, in practice, Fluentd tweaks this algorithm in a few aspects:
- Wait intervals are randomized by default. That is, Fluentd diversifies the wait interval by multiplying it by a randomly chosen number between 0.875 and 1.125. You can turn off this behavior by setting retry_randomize to false.
- Wait intervals can be capped to a certain limit. For example, if you set retry_max_interval to 5 seconds in the example above, the 4th retry will wait for 5 seconds instead of 8 seconds.
If you want to disable the exponential backoff, set the retry_type option to periodic.
Handling Successive Failures
Fluentd will abort the attempt to transfer the failing chunks on the following conditions:

- The number of retries exceeds retry_max_times (default: none)
- The seconds elapsed since the first retry exceeds retry_timeout (default: 72h)
In these events, all chunks in the queue are discarded. If you want to avoid this, you can enable retry_forever to make Fluentd retry indefinitely.
Handling Unrecoverable Errors
Not all errors are recoverable in nature. For example, if the content of a chunk file gets corrupted, you obviously cannot fix anything just by redoing the write operation. Rather, a blind retry attempt will just make the situation worse.
Since v1.2.0, Fluentd can detect these non-recoverable failures. If such a fatal error occurs, Fluentd aborts the chunk immediately and moves it into secondary or the backup directory. The exact location of the backup directory is determined by the root_dir parameter in <system>:
If you do not need to back up chunks, you can enable disable_chunk_backup (available since v1.2.6) in the <buffer> section.
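A sketch of such a buffer section, with all other parameters omitted:

```
<buffer>
  disable_chunk_backup true   # discard unrecoverable chunks instead of backing them up
</buffer>
```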
Fluentd maintains a list of exception classes that are considered unrecoverable (for example, Fluent::UnrecoverableError, which plugins raise to signal an error that should not be retried).
Here are the patterns when an unrecoverable error happens:
- If the plugin does not have a secondary, the chunk is moved to the backup directory.
- If the plugin has a secondary which is of a different type from the primary, the chunk is moved to secondary.
- If the unrecoverable error happens inside secondary, the chunk is moved to the backup directory.
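For example, a secondary of a different type can be set up by pairing the primary output with the bundled secondary_file output; the paths below are placeholders:

```
<match app.**>
  @type elasticsearch
  # ... primary destination parameters ...
  <secondary>
    @type secondary_file
    directory /var/log/fluentd/error   # chunks that failed unrecoverably are dumped here
    basename dump.${chunk_id}
  </secondary>
</match>
```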
Detecting chunk file corruption when Fluentd starts up
When starting up, Fluentd loads all remaining chunk files. Some chunk files may be corrupted if Fluentd stopped abnormally, for example due to a power failure. Since v1.16.0, such corrupted files are also considered unrecoverable and are moved to the backup directory at startup. (Before v1.16.0, those files were simply deleted.)

Note that, depending on how corrupt a file is, the corruption may not be detected. In such cases, some corrupted data will flow to subsequent processes and cause unexpected errors.

Since v1.16.0, in order to narrow down the range of data that may be corrupted, if corruption is detected in even one file, information on the other files remaining at startup is also output to the log. If data corruption occurs due to an abnormal termination, please use this information to take the necessary recovery steps.
Configuration Example
A complete configuration can specify all the parameters controlling the retry behavior. Normally, you do not need to specify every option, because these options are, in fact, optional. For the details of each option, please read this article.
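As a sketch, such a configuration might look like the following; the parameter names are the real retry parameters, while the values shown are only illustrative:

```
<buffer>
  retry_type exponential_backoff      # or "periodic" to disable the backoff
  retry_wait 1s                       # initial wait interval
  retry_exponential_backoff_base 2    # factor by which the wait interval grows
  retry_max_interval 1h               # cap on the wait interval
  retry_randomize true                # jitter intervals by a factor in [0.875, 1.125]
  retry_timeout 72h                   # give up retrying after this much time
  retry_max_times 17                  # ...or after this many attempts
  retry_forever false                 # set to true to retry indefinitely
  retry_secondary_threshold 0.8       # fraction of retry_timeout after which <secondary> is used
</buffer>
```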
Parameters
FAQ
Why are the buffer's chunk size and the output's payload size sometimes different?
Because the format of a buffer chunk is different from the output's payload. Let's use the Elasticsearch output plugin, out_elasticsearch, for a detailed explanation.

out_elasticsearch uses MessagePack for the buffer's serialization (note that this depends on the plugin). On the other hand, Elasticsearch's Bulk API requires a JSON-based payload, and one MessagePack-ed record is converted into two JSON lines. So the payload size is larger than the buffer's chunk size.

This sometimes causes a problem when the output destination has a payload size limitation. If you hit such a limit, check the chunk size configuration and the destination's API specification.
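If the destination enforces a payload cap, one practical mitigation is to keep chunks well below that cap; the value below is illustrative:

```
<buffer>
  chunk_limit_size 10MB   # keep chunks comfortably below the destination's payload limit
</buffer>
```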
List of Buffer Plugins

- buf_memory
- buf_file
If this article is incorrect or outdated, or omits critical information, please let us know. Fluentd is an open-source project under Cloud Native Computing Foundation (CNCF). All components are available under the Apache 2 License.