
Amazon S3 Output Plugin

The out_s3 TimeSliced Output plugin writes records into the Amazon S3 cloud object storage service. By default, it creates files on an hourly basis. This means that when you first import records using the plugin, no file is created immediately. The file will be created when the time_slice_format condition has been met. To change the output frequency, please modify the time_slice_format value.


Installation

out_s3 is included in td-agent by default. Fluentd gem users will need to install the fluent-plugin-s3 gem using the following command.

$ fluent-gem install fluent-plugin-s3

Example Configuration

<match pattern>
  type s3

  aws_key_id YOUR_AWS_KEY_ID
  aws_sec_key YOUR_AWS_SECRET_KEY
  s3_bucket YOUR_S3_BUCKET_NAME
  s3_endpoint s3-us-west-1.amazonaws.com
  path logs/
  buffer_path /var/log/fluent/s3

  time_slice_format %Y%m%d%H
  time_slice_wait 10m
  utc

  buffer_chunk_limit 256m
</match>

Please see the Store Apache Logs into Amazon S3 article for real-world use cases.

Please see the Config File article for the basic structure and syntax of the configuration file.
Please make sure that you have enough space in the buffer_path directory. Running out of disk space is a problem frequently reported by users.

Parameters

type (required)

The value must be s3.

aws_key_id (required/optional)

The AWS access key ID. This parameter is required when your agent is not running on an EC2 instance with an IAM Instance Profile.

aws_sec_key (required/optional)

The AWS secret key. This parameter is required when your agent is not running on an EC2 instance with an IAM Instance Profile.
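For example, when Fluentd runs on an EC2 instance with an IAM Instance Profile, both keys can simply be omitted (a minimal sketch; the remaining values follow the example configuration above):

<match pattern>
  type s3
  # aws_key_id and aws_sec_key are omitted: credentials come from
  # the instance's IAM Instance Profile
  s3_bucket YOUR_S3_BUCKET_NAME
  buffer_path /var/log/fluent/s3
</match>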

s3_bucket (required)

The Amazon S3 bucket name.

buffer_path (required)

The path prefix of the log buffer files.

s3_endpoint

The Amazon S3 endpoint name. Please select the appropriate endpoint name from the list below and confirm that your bucket has been created in the correct region.

  • s3.amazonaws.com
  • s3-us-west-1.amazonaws.com
  • s3-us-west-2.amazonaws.com
  • s3.sa-east-1.amazonaws.com
  • s3-eu-west-1.amazonaws.com
  • s3-ap-southeast-1.amazonaws.com
  • s3-ap-northeast-1.amazonaws.com

The most recent list of endpoints can be found in the AWS Regions and Endpoints documentation.
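For example, for a bucket created in the Tokyo region (the bucket name is hypothetical):

s3_bucket my-tokyo-bucket
s3_endpoint s3-ap-northeast-1.amazonaws.com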

format

The format of the S3 object. The default is out_file.

  • out_file

Dump time, tag and json record separated by a delimiter:

time[delimiter]tag[delimiter]record\n

Actual example is:

2014-06-08T23:59:40[TAB]file.server.logs[TAB]{"field1":"value1","field2":"value2"}\n

out_file format has several options to customize the content.

delimiter SPACE   # Optional. "\t"(TAB) is used by default
output_tag false  # Optional. Set to false to remove tag[delimiter] from the content. The default is true
output_time true  # Optional. Set to false to remove time[delimiter] from the content. The default is true
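For example (a sketch based on the sample record above), setting output_tag false drops the tag portion:

2014-06-08T23:59:40[TAB]{"field1":"value1","field2":"value2"}\n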

You can also add the time and tag to the record itself by setting the “include_tag_key” / “tag_key” and “include_time_key” / “time_key” options. For example, if you set the following options:

include_time_key true
time_key log_time  # default is time

then each record will contain a log_time field:

{"field1":"value1","field2":"value2","log_time":"time string",...}

  • json

Dump json record without time and tag:

{"field1":"value1","field2":"value2"}\n

The json format also supports the “include_xxx” options. See the out_file section above. Since this format omits the time and tag by default, set those options if you want to keep that information in the record.

In addition, the json format supports the time_as_epoch option. If this option is true, the time value is kept as a number (a Unix timestamp) rather than a formatted string (see the example after this list).

  • ltsv

Dump record as LTSV:

field1[label_delimiter]value1[delimiter]field2[label_delimiter]value2\n

ltsv format supports delimiter and label_delimiter options.

format ltsv
delimiter SPACE   # Optional. "\t"(TAB) is used by default
label_delimiter = # Optional. ":" is used by default

The ltsv format also supports the “include_xxx” options. See the out_file section above.

  • single_value

Output the value of a single field instead of the entire record. This is useful with in_tail’s none format:

value1\n

The single_value format accepts the add_newline option.

add_newline false # default is true. If your value already has "\n", please set "false"
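As referenced in the json section above, here is a minimal sketch that keeps the time in each record as a Unix timestamp (the record content is illustrative):

format json
include_time_key true
time_as_epoch true  # keep the time as a number instead of a formatted string

{"field1":"value1","field2":"value2","time":1402271980}\n

Here 1402271980 corresponds to 2014-06-08T23:59:40 UTC.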

time_slice_format

The time format used as part of the file name. The following characters are replaced with actual values when the file is created:

  • %Y: year including the century (at least 4 digits)
  • %m: month of the year (01..12)
  • %d: day of the month (01..31)
  • %H: hour of the day, 24-hour clock (00..23)
  • %M: minute of the hour (00..59)
  • %S: second of the minute (00..60)

The default format is %Y%m%d%H, which creates one file per hour.
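For example, to create one file per day instead of one per hour, a minimal sketch:

time_slice_format %Y%m%d

With an empty path prefix, this would produce object names such as 20140608_0.gz (see the path parameter below for how the full name is assembled).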

time_slice_wait

The amount of time Fluentd will wait for old logs to arrive. This is used to account for delays in logs arriving at your Fluentd node. The default wait time is 10 minutes (‘10m’), meaning that Fluentd will wait until 10 minutes past the hour for any logs that occurred within the past hour.

For example, when splitting files on an hourly basis, a log recorded at 1:59 but arriving at the Fluentd node between 2:00 and 2:10 will be uploaded together with all the other logs from 1:00 to 1:59 in one transaction, avoiding extra overhead. Larger values can be set as needed.
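For example, if logs regularly take up to 30 minutes to reach your aggregator, you could raise the wait accordingly (the value below is only illustrative):

time_slice_wait 30m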

time_format

The format of the time written in files. The default format is ISO-8601.

path

The path prefix of the files on S3. The default is “” (no prefix).

The actual path on S3 will be: “{path}{time_slice_format}_{sequential_number}.gz”
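For example, with the settings from the example configuration above (path logs/ and time_slice_format %Y%m%d%H), a chunk covering the 23:00 hour of 2014-06-08 would be stored under a key such as:

logs/2014060823_0.gz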

utc

Uses UTC for path formatting. The default is localtime.

store_as

The compression type. The default is “gzip”, but you can also choose “lzo”, “json”, or “txt”.

proxy_uri

The proxy URL. The default is nil.

use_ssl

Enable/disable SSL for data transfers between Fluentd and S3. The default is “yes”.

Buffer Parameters

For advanced usage, you can tune Fluentd’s internal buffering mechanism with these parameters.

buffer_type

The buffer type is memory by default (buf_memory). The file (buf_file) buffer type can be chosen as well. Note that this plugin exposes the file buffer’s path parameter as buffer_path.
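For example, to use the file buffer explicitly (a minimal sketch; the buffer chunks are written under the prefix given by buffer_path):

buffer_type file
buffer_path /var/log/fluent/s3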

buffer_queue_limit, buffer_chunk_limit

The length of the chunk queue and the size of each chunk, respectively. Please see the Buffer Plugin Overview article for the basic buffer structure. The default values are 64 and 8m, respectively. The suffixes “k” (KB), “m” (MB), and “g” (GB) can be used for buffer_chunk_limit.
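As a rough sizing sketch, the queue length multiplied by the chunk size bounds how much data can be buffered at once, which is why the disk-space note above matters:

buffer_queue_limit 64    # default: up to 64 chunks may be queued
buffer_chunk_limit 256m  # each chunk holds up to 256 MB (as in the example configuration)
# Worst case: 64 x 256 MB = 16 GB buffered under buffer_path.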

flush_interval

The interval between data flushes. The default is 60s. The suffixes “s” (seconds), “m” (minutes), and “h” (hours) can be used.

retry_wait, retry_limit and max_retry_wait

The interval between write retries, and the number of retries. The default values are 1.0 and 17, respectively. retry_wait doubles every retry (e.g. the last retry waits for 131072 sec, roughly 36 hours), and max_retry_wait may be used to limit the maximum retry interval.

num_threads

The number of threads to flush the buffer. This option can be used to parallelize writes into the output(s) designated by the output plugin. The default is 1.

log_level option (Fluentd v0.10.43 and above)

The log_level option allows the user to set different levels of logging for each plugin. The supported log levels are: fatal, error, warn, info, debug, and trace.
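For example, to get verbose output from this plugin alone (a minimal sketch):

<match pattern>
  type s3
  log_level debug
  # ... other parameters as in the example configuration above
</match>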

Please see the logging article for further details.

Further Reading

This page doesn’t describe all the possible configurations. For other configurations, please check the fluent-plugin-s3 repository.
