webhdfs

The out_webhdfs Output plugin writes records into HDFS (Hadoop Distributed File System). By default, it creates files on an hourly basis. This means that when you first import records using the plugin, no file is created immediately.

The file will be created when the timekey condition has been met. To change the output frequency, please modify the timekey value.
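
For example, a minimal sketch of a <buffer> section that switches to hourly files (the values here are illustrative, not defaults):

<buffer time>
  timekey 1h       # flush one chunk, and hence one HDFS file, per hour
  timekey_wait 1m  # wait one minute for late-arriving events before flushing
</buffer>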

This document does not describe all the parameters. For more details, see the Further Reading section.

Please see the Fluentd + HDFS: Instant Big Data Collection article for real-world use cases.

Install

out_webhdfs is included in td-agent by default (v1.1.10 or later). Fluentd gem users will have to install the fluent-plugin-webhdfs gem using the following command:

$ fluent-gem install fluent-plugin-webhdfs
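
To confirm that the plugin is visible to your installation (assuming a gem-based setup), one quick check is:

$ fluent-gem list | grep webhdfs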

HDFS Configuration

Append operations are not enabled by default on CDH. Please put these configurations into your hdfs-site.xml file and restart the whole cluster:

<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>

<property>
  <name>dfs.support.append</name>
  <value>true</value>
</property>

<property>
  <name>dfs.support.broken.append</name>
  <value>true</value>
</property>
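
After restarting, one way to confirm that WebHDFS responds is a plain HTTP request against its REST endpoint (the host and port here match the example configuration below; adjust them for your cluster):

$ curl -i "http://namenode.your.cluster.local:50070/webhdfs/v1/?op=LISTSTATUS"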

Example Configuration

<match access.**>
  @type webhdfs
  host namenode.your.cluster.local
  port 50070
  path "/path/on/hdfs/access.log.%Y%m%d_%H.#{Socket.gethostname}.log"
  <buffer>
    flush_interval 10s
  </buffer>
</match>

Please see the Configuration File article for the basic structure and syntax of the configuration file. For <buffer>, please check the Buffer Section Configuration article.

Plugin Helpers

  • inject

  • formatter

  • compat_parameters

Parameters

Common Parameters

@type (required)

The value must be webhdfs.

host (required)

The namenode hostname.

port (required)

The namenode port number.

path (required)

The path on HDFS. Please include "#{Socket.gethostname}" in your path to avoid writing into the same HDFS file from multiple Fluentd instances. This conflict could result in data loss.

The path value can contain time placeholders. The following characters are replaced with actual values when the file is created:

  • %Y: year including the century (at least 4 digits)

  • %m: month of the year (01..12)

  • %d: day of the month (01..31)

  • %H: hour of the day, 24-hour clock (00..23)

  • %M: minute of the hour (00..59)

  • %S: second of the minute (00..60)

Although the path configuration can contain time placeholders, it is recommended to specify the format using the <format> section instead.
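
For example, assuming a hypothetical host named web01, the path "/path/on/hdfs/access.log.%Y%m%d_%H.#{Socket.gethostname}.log" would resolve, for the 13:00 chunk on 2024-01-01, to:

/path/on/hdfs/access.log.20240101_13.web01.log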

Output Parameters (and overwritten values by out_webhdfs)

For advanced usage, you can tune Fluentd's internal buffering mechanism with these parameters.

timekey

This plugin flushes chunks per the time period specified by the timekey parameter. The default value is 86400 (one file per day), which applies when the path does not contain time placeholders.

When the path does contain time placeholders, this parameter is derived from the path configuration instead. For example, when the path contains %H, the value becomes 3600, which creates one file per hour.

timekey_wait

The amount of time Fluentd will wait for old logs to arrive. This is used to account for delays in logs arriving at your Fluentd node. The default wait time is 10 minutes (10m): with hourly output, Fluentd waits until 10 minutes past the hour for any logs that occurred within the previous hour.

For example, when splitting files on an hourly basis, a log recorded at 1:59 but arriving at the Fluentd node between 2:00 and 2:10 will be uploaded together with all the other logs from 1:00 to 1:59 in one single transaction, avoiding extra overhead. Larger values can be set as needed.

queue_limit_length, chunk_limit_size

The length of the chunk queue and the size of each chunk, respectively. Please see the Buffer Plugin Overview article for the basic buffer structure. The default values are 64 and 8m, respectively. The suffixes k (KB), m (MB), and g (GB) can be used for chunk_limit_size.

flush_interval

The interval between data flushes. The default is unspecified, and the buffer chunks will be flushed at the end of the time slices. The suffixes s (seconds), m (minutes), and h (hours) can be used.

flush_at_shutdown

A boolean value specifying whether to flush buffer chunks at shutdown. The default is true. Specify true if you use the memory buffer type; otherwise, chunks still held in memory are lost when Fluentd stops.

retry_wait, retry_max_interval

The initial and maximum intervals between write retries. The default values are 1.0 and unset (no limit). The interval doubles (with +/-12.5% randomness) every retry until retry_max_interval is reached.

Since td-agent retries 17 times before giving up by default (see the retry_max_times parameter for details), the sleep interval can grow to approximately 131072 seconds (roughly 36 hours) in the default configuration.
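
As an illustration (the values are arbitrary, not recommendations), the backoff can be bounded in the <buffer> section:

<buffer>
  retry_wait 1s            # initial interval before the first retry
  retry_max_interval 300s  # cap the exponential backoff at 5 minutes
  retry_max_times 20       # give up and discard the chunk after 20 retries
</buffer>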

retry_max_times, retry_forever

The limit on the number of retries before buffered data is discarded, and an option to disable that limit (if true, the value of retry_max_times is ignored and there is no limit). The default values are 17 and false (not disabled). If the limit is reached, buffered data is discarded and the retry interval is reset to its initial value (retry_wait).

flush_thread_count

The number of threads to flush the buffer. This option can be used to parallelize writes into the output(s) designated by the output plugin. The default is 1.

@log_level

The @log_level option allows the user to set different levels of logging for each plugin.

Supported log levels: fatal, error, warn, info, debug, trace

Please see the logging article for further details.
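
For example, a sketch raising the verbosity of this plugin only (host, port, and path reuse the earlier example):

<match access.**>
  @type webhdfs
  @log_level debug
  host namenode.your.cluster.local
  port 50070
  path "/path/on/hdfs/access.log.%Y%m%d_%H.#{Socket.gethostname}.log"
</match>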

Common Output / Buffer parameters

For common output / buffer parameters, please check the following articles:

  • Output Plugin Overview

  • Buffer Section Configuration

Further Reading

  • fluent-plugin-webhdfs

  • Slides: Fluentd and WebHDFS

If this article is incorrect or outdated, or omits critical information, please let us know. Fluentd is an open-source project under Cloud Native Computing Foundation (CNCF). All components are available under the Apache 2 License.
