Versions | v0.12 (td-agent2) | v0.10 (td-agent1)

syslog Input Plugin

The in_syslog Input plugin enables Fluentd to retrieve records via the syslog protocol on UDP or TCP.

Example Configuration

in_syslog is included in Fluentd’s core. No additional installation process is required.

<source>
  @type syslog
  port 5140
  bind 0.0.0.0
  tag system
</source>
Please see the Config File article for the basic structure and syntax of the configuration file.

Example Usage

The retrieved data is organized as follows. Fluentd’s tag is generated from the tag parameter (tag prefix), the facility, and the priority. The record is parsed by a built-in regexp.

tag = "#{@tag}.#{facility}.#{priority}"

record = {
  "pri": "0",
  "time": 1353436518,
  "host": "host",
  "ident": "ident",
  "pid": "12345",
  "message": "text"
}
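
The tag construction above can be sketched in Ruby as follows. This is a minimal illustration, not the plugin's own code: it assumes the standard syslog encoding in which the numeric priority value equals facility * 8 + severity, and the helper name `syslog_tag` and the two name tables are hypothetical (they follow common syslog naming).

```ruby
# Build "<prefix>.<facility>.<priority>" from a numeric syslog PRI value.
# The name tables are illustrative assumptions based on common syslog naming.
def syslog_tag(prefix, pri)
  facilities = %w[kern user mail daemon auth syslog lpr news uucp cron authpriv ftp
                  ntp audit alert at local0 local1 local2 local3 local4 local5 local6 local7]
  priorities = %w[emerg alert crit err warn notice info debug]

  facility = facilities[pri >> 3] # upper bits hold the facility code
  priority = priorities[pri & 7]  # lowest three bits hold the severity
  "#{prefix}.#{facility}.#{priority}"
end

puts syslog_tag("system", 13) # PRI 13 is facility 1 (user), severity 5 (notice)
```

With `tag system`, a message carrying `<13>` would therefore be tagged `system.user.notice` under these assumptions.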

Parameters

type (required)

The value must be syslog.

port

The port to listen on. Default Value = 5140

bind

The address to bind to and listen on. Default Value = 0.0.0.0 (all addresses)

protocol_type

The transport protocol used to receive logs. “udp” and “tcp” are supported. The default is “udp”.

tag (required)

The prefix of the tag. The tag itself is generated by the tag prefix, facility level, and priority.

format

The format of the log. This option is used to parse non-standard syslog formats using a regexp.

<source>
  @type syslog
  tag system
  format FORMAT_PARAMETER
</source>
Your `format` regexp should not match the 'priority' prefix of the log, because the prefix is stripped before parsing. For example, if in_syslog receives the log below:
 <1>Feb 20 00:00:00 192.168.0.1 fluentd[11111]: [error] hogehoge

then the format parser receives the following log:

 Feb 20 00:00:00 192.168.0.1 fluentd[11111]: [error] hogehoge
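
The prefix stripping described above can be sketched in Ruby. This is an illustration under assumptions, not the plugin's implementation: the helper name `split_priority` is hypothetical, and the prefix is assumed to be one to three digits in angle brackets at the start of the line.

```ruby
# Split a raw syslog line into its numeric priority and the remaining text.
# Returns [nil, line] when the line has no leading "<N>" prefix.
def split_priority(line)
  m = /\A<(\d{1,3})>/.match(line)
  return [nil, line] unless m

  [m[1].to_i, m.post_match]
end

pri, body = split_priority("<1>Feb 20 00:00:00 192.168.0.1 fluentd[11111]: [error] hogehoge")
puts pri
puts body # this remainder is what a custom format regexp would see
```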

If the format parameter is missing, then the log data is assumed to have the canonical syslog format (see with_priority).

FORMAT_PARAMETER supports the following options:

  • regexp

The regexp for the format parameter can be specified. If the parameter value starts and ends with “/”, it is considered to be a regexp. The regexp must have at least one named capture (?<NAME>PATTERN). If the regexp has a capture named ‘time’, it is used as the time of the event. You can specify the time format using the time_format parameter.

fluentd-ui’s in_tail editor can help you test your regexp. Alternatively, Fluentular is a useful website for testing regexps for Fluentd configuration.

You may see an Application Error on Fluentular because of Heroku free plan limitations. If so, retry a few hours later or use fluentd-ui instead.
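
As a rough local alternative to those tools, a format regexp can be tried directly in Ruby. The sketch below is an assumption-laden illustration: the helper `parse_line`, the sample regexp, and the sample time format are all hypothetical, and `Time.strptime` stands in for whatever the parser does with time_format.

```ruby
require "time"

# Apply a format regexp with named captures to one line.
# The "time" capture becomes the event time; the rest becomes the record.
def parse_line(line, format, time_format)
  m = format.match(line)
  return nil unless m

  record = m.names.zip(m.captures).to_h
  time = Time.strptime(record.delete("time"), time_format)
  [time, record]
end

format = /^(?<time>[^ ]+ [^ ]+) (?<level>[^ ]+) (?<message>.*)$/
time, record = parse_line("2016-12-08 03:15:51 warn disk almost full",
                          format, "%Y-%m-%d %H:%M:%S")
puts time
puts record
```

A line that does not match returns nil, which is a convenient way to spot patterns that are too strict.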
  • apache2

Reads Apache’s log file for the following fields: host, user, time, method, path, code, size, referer and agent. This template is analogous to the following configuration:

format /^(?<host>[^ ]*) [^ ]* (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^ ]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$/
time_format %d/%b/%Y:%H:%M:%S %z
  • apache_error

Reads Apache’s error log file for the following fields: time, level, pid, client and (error) message. This template is analogous to the following configuration:

format /^\[[^ ]* (?<time>[^\]]*)\] \[(?<level>[^\]]*)\](?: \[pid (?<pid>[^\]]*)\])? \[client (?<client>[^\]]*)\] (?<message>.*)$/
  • nginx

Reads Nginx’s log file for the following fields: remote, host, user, time, method, path, code, size, referer and agent. This template is analogous to the following configuration:

format /^(?<remote>[^ ]*) (?<host>[^ ]*) (?<user>[^ ]*) \[(?<time>[^\]]*)\] "(?<method>\S+)(?: +(?<path>[^\"]*) +\S*)?" (?<code>[^ ]*) (?<size>[^ ]*)(?: "(?<referer>[^\"]*)" "(?<agent>[^\"]*)")?$/
time_format %d/%b/%Y:%H:%M:%S %z
  • syslog

Reads syslog’s output file (e.g. /var/log/syslog) for the following fields: time, host, ident, pid (optional), and message. This template is analogous to the following configuration:

format /^(?<time>[^ ]*\s*[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$/
time_format %b %d %H:%M:%S
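
To see which fields the syslog regexp above extracts, it can be run in plain Ruby. This is only a sketch: the helper names `syslog_format` and `parse_syslog` are hypothetical, and the time field is left as a string rather than converted with time_format.

```ruby
# The regexp quoted above, unchanged.
def syslog_format
  /^(?<time>[^ ]*\s*[^ ]* [^ ]*) (?<host>[^ ]*) (?<ident>[a-zA-Z0-9_\/\.\-]*)(?:\[(?<pid>[0-9]+)\])?(?:[^\:]*\:)? *(?<message>.*)$/
end

# Return a hash of the named captures, or nil when the line does not match.
def parse_syslog(line)
  m = syslog_format.match(line)
  m && m.names.zip(m.captures).to_h
end

puts parse_syslog("Feb 20 00:00:00 192.168.0.1 fluentd[11111]: [error] hogehoge")
# The pid capture is optional, so it is nil when no "[pid]" follows ident.
puts parse_syslog("Feb 20 00:00:00 myhost sshd: session opened")
```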
  • tsv or csv

If you use tsv or csv format, please also specify the keys parameter.

format tsv
keys key1,key2,key3
time_key key2

If you specify the time_key parameter, it will be used to identify the timestamp of the record. The timestamp when Fluentd reads the record is used by default.

format csv
keys key1,key2,key3
time_key key3
The keys parameter is a comma-separated list, and you should not add spaces around the commas because every character is kept as part of the key. For example, "key1,key2" is parsed as ["key1", "key2"], but "key1, key2" is parsed as ["key1", " key2"]. A JSON-array-based parameter is planned to avoid this problem.
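
The effect of a stray space in keys can be reproduced with a plain split in Ruby. The sketch below is illustrative only: `split_keys` and `to_record` are hypothetical helpers, and the value splitting is naive (no quoting or escaping), unlike a real CSV parser.

```ruby
# Split the keys parameter on "," only, without trimming whitespace.
def split_keys(keys_param)
  keys_param.split(",")
end

# Pair the keys with the values of one delimited line (naive split, no quoting).
def to_record(keys_param, line, delimiter = ",")
  split_keys(keys_param).zip(line.chomp.split(delimiter, -1)).to_h
end

p split_keys("key1,key2")  # keys as intended
p split_keys("key1, key2") # the second key keeps its leading space
p to_record("key1,key2,key3", "a,b,c")
```

Passing "\t" as the delimiter gives the tsv equivalent under the same assumptions.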
  • ltsv

ltsv (Labeled Tab-Separated Value) is a tab-delimited key-value pair format. You can learn more about it on its webpage.

format ltsv
delimiter =               # Optional. ':' is used by default
time_key time_field_name

If you specify the time_key parameter, it will be used to identify the timestamp of the record. The timestamp when Fluentd reads the record is used by default.
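
For readers unfamiliar with LTSV, the line structure can be sketched in a few lines of Ruby. This is a simplified illustration, not the parser Fluentd uses: `parse_ltsv` is a hypothetical helper, and it splits each field on the first occurrence of the delimiter only.

```ruby
# Parse one LTSV line into a hash. Fields are tab-separated; each field is
# "label<delimiter>value", split on the first delimiter only.
def parse_ltsv(line, delimiter = ":")
  line.chomp.split("\t").each_with_object({}) do |pair, record|
    key, value = pair.split(delimiter, 2)
    record[key] = value
  end
end

p parse_ltsv("time:2013-06-14 12:00:11\thost:127.0.0.1\tpath:/users/123/")
p parse_ltsv("host=127.0.0.1\tcode=200", "=")
```

Splitting on the first delimiter only is what lets a value such as a timestamp contain further ":" characters.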

  • json

One JSON map per line. This is the most straightforward format :).

format json

The time_key parameter can also be specified. The default is ‘time’. If the record has no time field, the json parser uses the current time as the event time.

format json
time_key key3

Without time_format, the json parser assumes that the time field is an integer number of seconds.
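
The time handling described for the json format can be sketched as follows. This is a hedged illustration rather than Fluentd's code: `parse_json_line` is a hypothetical helper, the integer is assumed to be seconds since the Unix epoch, and the time field is removed from the record as keep_time_key's default suggests.

```ruby
require "json"

# Parse one JSON line. The event time comes from time_key (integer seconds);
# when the field is absent, fall back to the supplied current time.
def parse_json_line(line, time_key = "time", now = Time.now)
  record = JSON.parse(line)
  value = record.delete(time_key)
  time = value ? Time.at(value.to_i) : now
  [time, record]
end

time, record = parse_json_line('{"time":1353436518,"host":"host","message":"text"}')
puts time.to_i
puts record
```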

  • none

You can use the none format to defer parsing/structuring the data. This will parse the line as-is with the key name “message”. For example, if you had a line

hello world. I am a line of log!

It will be parsed as

{"message":"hello world. I am a line of log!"}

The key field is “message” by default, but you can specify a different value using the message_key parameter as shown below:

format none
message_key my_message
  • multiline

Reads multiline logs using the formatN and format_firstline parameters. format_firstline detects the first line of a multiline log entry. formatN, where N ranges from 1 to 20, is the list of regexps that together describe the entry. Here is an example for Rails logs:

format multiline
format_firstline /^Started/
format1 /Started (?<method>[^ ]+) "(?<path>[^"]+)" for (?<host>[^ ]+) at (?<time>[^ ]+ [^ ]+ [^ ]+)\n/
format2 /Processing by (?<controller>[^\u0023]+)\u0023(?<controller_method>[^ ]+) as (?<format>[^ ]+?)\n/
format3 /(  Parameters: (?<parameters>[^ ]+)\n)?/
format4 /  Rendered (?<template>[^ ]+) within (?<layout>.+) \([\d\.]+ms\)\n/
format5 /Completed (?<code>[^ ]+) [^ ]+ in (?<runtime>[\d\.]+)ms \(Views: (?<view_runtime>[\d\.]+)ms \| ActiveRecord: (?<ar_runtime>[\d\.]+)ms\)/

If you have a multiline log

Started GET "/users/123/" for 127.0.0.1 at 2013-06-14 12:00:11 +0900
Processing by UsersController#show as HTML
  Parameters: {"user_id"=>"123"}
  Rendered users/show.html.erb within layouts/application (0.3ms)
Completed 200 OK in 4ms (Views: 3.2ms | ActiveRecord: 0.0ms)

It will be parsed as

{"method":"GET","path":"/users/123/","host":"127.0.0.1","controller":"UsersController","controller_method":"show","format":"HTML","parameters":"{\"user_id\"=>\"123\"}", ...}

As another example, you can parse Java-style stack trace logs with multiline. Here is a configuration example:

format multiline
format_firstline /\d{4}-\d{1,2}-\d{1,2}/
format1 /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}) \[(?<thread>.*)\] (?<level>[^\s]+)(?<message>.*)/

If you have the following log:

2013-3-03 14:27:33 [main] INFO  Main - Start
2013-3-03 14:27:33 [main] ERROR Main - Exception
javax.management.RuntimeErrorException: null
    at Main.main(Main.java:16) ~[bin/:na]
2013-3-03 14:27:33 [main] INFO  Main - End

It will be parsed as:

2013-03-03 14:27:33 +0900 zimbra.mailbox: {"thread":"main","level":"INFO","message":"  Main - Start"}
2013-03-03 14:27:33 +0900 zimbra.mailbox: {"thread":"main","level":"ERROR","message":" Main - Exception\njavax.management.RuntimeErrorException: null\n    at Main.main(Main.java:16) ~[bin/:na]"}
2013-03-03 14:27:33 +0900 zimbra.mailbox: {"thread":"main","level":"INFO","message":"  Main - End"}
With `format_firstline`, in_tail delays emitting a record until the next line matching `format_firstline` arrives, because without that trigger it cannot tell whether a multiline log entry has ended. If your regexps describe the log pattern precisely, as in the Rails example above, you may remove `format_firstline` so that records are emitted immediately.
`multiline` works only with the in_tail plugin.
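
The delayed emit caused by `format_firstline` can be illustrated with a small Ruby sketch. It is a simplification under stated assumptions, not in_tail's implementation: `group_multiline` is a hypothetical helper that only groups lines and does not apply the formatN regexps.

```ruby
# Group lines into events. A line matching `firstline` starts a new event,
# and only then is the previously buffered event considered complete.
def group_multiline(lines, firstline)
  events = []
  buffer = nil
  lines.each do |line|
    if firstline.match(line)
      events << buffer if buffer
      buffer = line.dup
    elsif buffer
      buffer << line
    end
  end
  [events, buffer] # buffer is the event still waiting for the next first line
end

lines = [
  "2013-3-03 14:27:33 [main] INFO  Main - Start\n",
  "2013-3-03 14:27:33 [main] ERROR Main - Exception\n",
  "javax.management.RuntimeErrorException: null\n",
  "    at Main.main(Main.java:16) ~[bin/:na]\n",
  "2013-3-03 14:27:33 [main] INFO  Main - End\n"
]
events, pending = group_multiline(lines, /\d{4}-\d{1,2}-\d{1,2}/)
puts events.length # complete events so far
puts pending       # the last entry is still buffered
```

In this sketch the final "End" line stays in the buffer, which mirrors why the last record appears only after another matching line is read.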

keep_time_key

By default, the parser removes the time field from the event record. To keep the time field in the record, set keep_time_key to true. The default is false.

with_priority

This option matters only when format is absent. If with_priority is true, then syslog messages are assumed to be prefixed with a priority tag like “<3>”. This option exists since some syslog daemons output logs without the priority tag preceding the message body.

If you wish to parse syslog messages of arbitrary formats, in_tcp or in_udp are recommended.

include_source_host

If true, add source host to event record. The default is false.

source_host_key

The key used for the source host when include_source_host is true. The default is source_host.

priority_key

The field name of the priority. If a value is set, the priority is added to the record under that key. The default is nil (the priority is not added).

facility_key

The field name of the facility. If a value is set, the facility is added to the record under that key. The default is nil (the facility is not added).

types (optional)

Although every parsed field has type string by default, you can specify other types. This is useful when filtering particular fields numerically or storing data with sensible type information.

The syntax is

types <field_name_1>:<type_name_1>,<field_name_2>:<type_name_2>,...

e.g.,

types user_id:integer,paid:bool,paid_usd_amount:float

As demonstrated above, “,” is used to delimit field-type pairs while “:” is used to separate a field name from its intended type.

Unspecified fields are parsed as the default string type.

The supported types are listed below:

  • string
  • bool
  • integer (“int” would NOT work!)
  • float
  • time
  • array

For the time and array types, there is an optional third field after the type name. For the “time” type, you can specify a time format like you would in time_format.

For the “array” type, the third field specifies the delimiter (the default is “,”). For example, if a field called “item_ids” contains the value “3,4,5”, types item_ids:array parses it as [“3”, “4”, “5”]. Alternatively, if the value is “Adam|Alice|Bob”, types item_ids:array:| parses it as [“Adam”, “Alice”, “Bob”].

The json and none parsers do not support the types parameter.
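
The types syntax can be illustrated with a small Ruby sketch. It is not Fluentd's converter: `apply_types` is a hypothetical helper, the set of strings treated as true for bool is an assumption, and the time type is omitted for brevity.

```ruby
# Convert string fields according to a "field:type[:option]" list.
# Fields that are not listed keep their string value.
def apply_types(record, types_param)
  types_param.split(",").each_with_object(record.dup) do |spec, out|
    field, type, option = spec.split(":", 3)
    next unless out.key?(field)

    value = out[field]
    out[field] =
      case type
      when "integer" then value.to_i
      when "float"   then value.to_f
      when "bool"    then %w[true yes 1].include?(value.downcase) # assumed truthy strings
      when "array"   then value.split(option || ",")
      else value
      end
  end
end

record = { "user_id" => "42", "paid" => "true", "paid_usd_amount" => "9.5" }
p apply_types(record, "user_id:integer,paid:bool,paid_usd_amount:float")
p apply_types({ "item_ids" => "Adam|Alice|Bob" }, "item_ids:array:|")
```

Note that this sketch splits the whole parameter on ",", so it cannot express an explicit "," delimiter for the array type; it relies on the default instead.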

log_level option

The log_level option allows the user to set different levels of logging for each plugin. The supported log levels are: fatal, error, warn, info, debug, and trace.

Please see the logging article for further details.

TCP protocol and message delimiter

This plugin assumes that \n is the delimiter between syslog messages within one TCP connection. If your application uses a syslog library with protocol_type tcp, append \n to each syslog message.
See also RFC 6587.
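
The newline framing can be sketched from both sides in Ruby. The helpers `frame` and `split_messages` are hypothetical and only illustrate the delimiter convention; they do not open a socket or reproduce the plugin's receive loop.

```ruby
# Sender side: make sure a syslog message ends with "\n" before it is
# written to a TCP connection.
def frame(message)
  message.end_with?("\n") ? message : "#{message}\n"
end

# Receiver side: split buffered TCP data into complete messages and
# the trailing partial message (empty string when nothing is pending).
def split_messages(buffer)
  *complete, rest = buffer.split("\n", -1)
  [complete, rest.to_s]
end

stream = frame("<13>Feb 20 00:00:00 host app: first") + "<13>Feb 20 00:00:01 host app: sec"
complete, rest = split_messages(stream)
p complete # one complete message
p rest     # the second message has not been terminated yet
```

Without the trailing \n, the second message in this sketch stays in the pending part, which is the situation the note above warns about.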

Last updated: 2016-12-08 03:15:51 UTC

If this article is incorrect or outdated, or omits critical information, please let us know. Fluentd is an open source project under the Cloud Native Computing Foundation (CNCF), originally invented by Treasure Data, Inc. All components are available under the Apache 2 License.