Document version: v0.12

tail Input Plugin

The in_tail Input plugin allows Fluentd to read events from the tail of text files. Its behavior is similar to the tail -F command.

Example Configuration

in_tail is included in Fluentd’s core. No additional installation process is required.

<source>
  @type tail
  path /var/log/httpd-access.log
  pos_file /var/log/td-agent/httpd-access.log.pos
  tag apache.access
  format apache2
</source>
Please see the Config File article for the basic structure and syntax of the configuration file.

How it Works

  • When Fluentd is first configured with in_tail, it will start reading from the tail of that log, not the beginning.
  • Once the log is rotated, Fluentd starts reading the new file from the beginning. It keeps track of the current inode number.
  • If td-agent restarts, it starts reading from the last position td-agent read before the restart. This position is recorded in the position file specified by the pos_file parameter.

Parameters

type (required)

The value must be tail.

tag (required)

The tag of the event.

* can be used as a placeholder that expands to the actual file path, replacing ‘/’ with ‘.’. For example, if you have the following configuration

path /path/to/file
tag foo.*

in_tail emits the parsed events with the ‘foo.path.to.file’ tag.

path (required)

The paths to read. Multiple paths can be specified, separated by ‘,’.

* and strftime format can be included to add/remove watch file dynamically. At interval of refresh_interval, Fluentd refreshes the list of watch file.

path /path/to/%Y/%m/%d/*

If the date is 20140401, Fluentd starts to watch the files in /path/to/2014/04/01 directory. See also read_from_head parameter.

You should not use '*' with log rotation because it may cause the log duplication. In such case, you should separate in_tail plugin configuration.

exclude_path

The paths to exclude the files from watcher list. For example, if you want to remove compressed files, you can use following pattern.

path /path/to/*
exclude_path ["/path/to/*.gz", "/path/to/*.zip"]

refresh_interval

The interval of refreshing the list of watch file. Default is 60 seconds.

read_from_head

Start to read the logs from the head of file, not bottom. The default is false.

If you want to tail all contents with * or strftime dynamic path, set this parameter to true. Instead, you should guarantee that log rotation will not occur in * directory.

When this is true, in_tail tries to read a file during start up phase. If target file is large, it takes long time and starting other plugins isn't executed until reading file is finished.

read_lines_limit

The number of reading lines at each IO. Default is 1000 lines.

If you see “Size of the emitted data exceeds buffer_chunk_limit.” log with in_tail, set smaller value.

multiline_flush_interval

The interval of flushing the buffer for multiline format. The default is disabled.

If you set multiline_flush_interval 5s, in_tail flushes buffered event after 5 seconds from last emit. This option is useful when you use format_firstline option. Since v0.12.20 or later.

pos_file (highly recommended)

This parameter is highly recommended. Fluentd will record the position it last read into this file.

pos_file /var/log/td-agent/tmp/access.log.pos

pos_file handles multiple positions in one file so no need multiple pos_file parameters per source.

Don't share pos_file between in_tail configurations. It causes unexpected behavior, e.g. corrupt pos_file content.
in_tail removes untracked file position during startup phase. It means the content of pos_file is growing until restart when you tails lots of files with dynamic path setting. I will fix this problem in the future. Check this issue.

format (required)

The format of the log. The following templates are supported:

  • regexp

The regexp for the format parameter can be specified. If the parameter value starts and ends with “/”, it is considered to be a regexp. The regexp must have at least one named capture (?<NAME>PATTERN). If the regexp has a capture named ‘time’, it is used as the time of the event. You can specify the time format using the time_format parameter.

fluentd-ui’s in_tail editor helps your regexp testing. Another way, Fluentular is a great website to test your regexp for Fluentd configuration.

You may hit Application Error at Fluentular due to heroku free plan limitation. Retry a few hours later or use fluentd-ui instead.
  • apache2

Reads apache’s log file for the following fields: host, user, time, method, path, code, size, referer and agent. This template is analogous to the following configuration:

format /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$/
time_format %d/%b/%Y:%H:%M:%S %z
  • apache_error

Reads apache’s error log file for the following fields: time, level, pid, client and (error) message. This template is analogous to the following configuration:

format /^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\](?: \[pid (?<pid>[^\]]*)\])? \[client (?<client>[^\]]*)\] (?<message>.*)$/
  • nginx

Reads Nginx’s log file for the following fields: remote, user, time, method, path, code, size, referer and agent. This template is analogous to the following configuration:

format /^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$/
time_format %d/%b/%Y:%H:%M:%S %z
  • syslog

Reads syslog’s output file (e.g. /var/log/syslog) for the following fields: time, host, ident, and message. This template is analogous to the following configuration:

format /^(?<time>[^ ]*\s*[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$/
time_format %b %d %H:%M:%S
  • tsv or csv

If you use tsv or csv format, please also specify the keys parameter.

format tsv
keys key1,key2,key3
time_key key2

If you specify the time_key parameter, it will be used to identify the timestamp of the record. The timestamp when Fluentd reads the record is used by default.

format csv
keys key1,key2,key3
time_key key3
keys parameter is ',' separated value and you should not add spaces between `,`. keys parameter keeps any characters. For example, "key1,key2" are parsed as ["key1", "key2"] but "key1, key2" are parsed as ["key1", " key2"]. We will support json array based parameter to avoid this problem.
  • ltsv

ltsv (Labeled Tab-Separated Value) is a tab-delimited key-value pair format. You can learn more about it on its webpage.

format ltsv
delimiter =               # Optional. ':' is used by default
time_key time_field_name

If you specify the time_key parameter, it will be used to identify the timestamp of the record. The timestamp when Fluentd reads the record is used by default.

  • json

One JSON map, per line. This is the most straight forward format :).

format json

The time_key parameter can also be specified. The default is ‘time’ and if there is no time field in the record, json parser uses current time as an event time.

format json
time_key key3

Without time_format, json parser assumes time field is a second integer.

  • none

You can use the none format to defer parsing/structuring the data. This will parse the line as-is with the key name “message”. For example, if you had a line

hello world. I am a line of log!

It will be parsed as

{"message":"hello world. I am a line of log!"}

The key field is “message” by default, but you can specify a different value using the message_key parameter as shown below:

format none
message_key my_message
  • multiline

Read multiline log with formatN and format_firstline parameters. format_firstline is for detecting start line of multiline log. formatN, N’s range is 1..20, is the list of Regexp format for multiline log. Here is Rails log example:

format multiline
format_firstline /^Started/
format1 /Started (?<method>[^ ]+) "(?<path>[^"]+)" for (?<host>[^ ]+) at (?<time>[^ ]+ [^ ]+ [^ ]+)\n/
format2 /Processing by (?<controller>[^\u0023]+)\u0023(?<controller_method>[^ ]+) as (?<format>[^ ]+?)\n/
format3 /(  Parameters: (?<parameters>[^ ]+)\n)?/
format4 /  Rendered (?<template>[^ ]+) within (?<layout>.+) \([\d\.]+ms\)\n/
format5 /Completed (?<code>[^ ]+) [^ ]+ in (?<runtime>[\d\.]+)ms \(Views: (?<view_runtime>[\d\.]+)ms \| ActiveRecord: (?<ar_runtime>[\d\.]+)ms\)/

If you have a multiline log

Started GET "/users/123/" for 127.0.0.1 at 2013-06-14 12:00:11 +0900
Processing by UsersController#show as HTML
  Parameters: {"user_id"=>"123"}
  Rendered users/show.html.erb within layouts/application (0.3ms)
Completed 200 OK in 4ms (Views: 3.2ms | ActiveRecord: 0.0ms)

It will be parsed as

{"method":"GET","path":"/users/123/","host":"127.0.0.1","controller":"UsersController","controller_method":"show","format":"HTML","parameters":"{ \"user_id\" = >\"123\"}", ...}

One more example, you can parse Java like stacktrace logs with multiline. Here is a configuration example.

format multiline
format_firstline /\d{4}-\d{1,2}-\d{1,2}/
format1 /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}) \[(?<thread>.*)\] (?<level>[^\s]+)(?<message>.*)/

If you have a following log:

2013-3-03 14:27:33 [main] INFO  Main - Start
2013-3-03 14:27:33 [main] ERROR Main - Exception
javax.management.RuntimeErrorException: null
    at Main.main(Main.java:16) ~[bin/:na]
2013-3-03 14:27:33 [main] INFO  Main - End

It will be parsed as:

2013-03-03 14:27:33 +0900 zimbra.mailbox: {"thread":"main","level":"INFO","message":"  Main - Start"}
2013-03-03 14:27:33 +0900 zimbra.mailbox: {"thread":"main","level":"ERROR","message":" Main - Exception\njavax.management.RuntimeErrorException: null\n    at Main.main(Main.java:16) ~[bin/:na]"}
2013-03-03 14:27:33 +0900 zimbra.mailbox: {"thread":"main","level":"INFO","message":"  Main - End"}
With `format_firstline`, in_tail delays record emit until next `format_firstline` matched because in_tail can't judge multiline logs are ended or not without `format_firstline` trigger. If your regexps represent log pattern correctly like above Rails example, you may remove `format_firstline` for emitting records immediately.
`multiline` works with only in_tail plugin.

keep_time_key

Parser removes time field from event record by default. If you want to keep time field in record, set true to keep_time_key. Default is false.

types (optional)

Although every parsed field has type string by default, you can specify other types. This is useful when filtering particular fields numerically or storing data with sensible type information.

The syntax is

types <field_name_1>:<type_name_1>,<field_name_2>:<type_name_2>,...

e.g.,

types user_id:integer,paid:bool,paid_usd_amount:float

As demonstrated above, “,” is used to delimit field-type pairs while “:” is used to separate a field name with its intended type.

Unspecified fields are parsed at the default string type.

The list of supported types are shown below:

  • string
  • bool
  • integer (“int” would NOT work!)
  • float
  • time
  • array

For the time and array types, there is an optional third field after the type name. For the “time” type, you can specify a time format like you would in time_format.

For the “array” type, the third field specifies the delimiter (the default is “,”). For example, if a field called “item_ids” contains the value “3,4,5”, types item_ids:array parses it as [“3”, “4”, “5”]. Alternatively, if the value is “Adam|Alice|Bob”, types item_ids:array:| parses it as [“Adam”, “Alice”, “Bob”].

json and none parsers don’t support types parameter.

time_format

The format of the time field. This parameter is required only if the format includes a ‘time’ capture and it cannot be parsed automatically. Please see Time#strftime for additional information.

rotate_wait

in_tail actually does a bit more than tail -F itself. When rotating a file, some data may still need to be written to the old file as opposed to the new one.

in_tail takes care of this by keeping a reference to the old file (even after it has been rotated) for some time before transitioning completely to the new file. This helps prevent data designated for the old file from getting lost. By default, this time interval is 5 seconds.

The rotate_wait parameter accepts a single integer representing the number of seconds you want this time interval to be.

enable_watch_timer

Enable the additional watch timer. Setting this parameter to false will significantly reduce CPU and I/O consumption when tailing a large number of files on systems with inotify support. The default is true which results in an additional 1 second timer being used.

in_tail (via Cool.io) uses inotify on systems which support it. Earlier versions of libev on some platforms (eg Mac OS X) did not work properly; therefore, an explicit 1 second timer was used. Even on systems with inotify support, this results in additional I/O each second, for every file being tailed.

Early testing demonstrates that modern Cool.io and in_tail work properly without the additional watch timer. At some point in the future, depending on feedback and testing, the additional watch timer may be disabled by default.

log_level option

The log_level option allows the user to set different levels of logging for each plugin. The supported log levels are: fatal, error, warn, info, debug, and trace.

Please see the logging article for further details.

Last updated: 2016-05-12 20:15:28 UTC

Available versions | v0.10 | v0.12 |

If this article is incorrect or outdated, or omits critical information, please let us know.

Interested in the Fluentd Newsletters?