I’ve been playing around with flume-ng and its HDFS sink recently to try to understand how I can stream data into HDFS and work with it using Hadoop. The documentation for flume-ng is unfortunately lacking, so I’ve typed up some quick notes on how to configure and test the HDFS sink.
This document assumes that you have Hadoop installed and running, version 1.2.0 or above.
In this example, the name of our agent is just agent. First, let’s define a channel for the agent: a channel named memory-channel, of type memory.
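Assuming the configuration lives in a properties file (I’ll call it flume.conf here; the filename is my choice), the channel definition might look like this:

```properties
# An in-memory channel named memory-channel
agent.channels.memory-channel.type = memory
```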
Next, let’s configure a source for the agent that reads the system.log file, and assign it to the memory-channel.
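One way to read the log file is an exec source running tail; the source name tail-source and the exact path and tail command below are my own assumptions:

```properties
# Tail the system log and feed each line into memory-channel as an event
agent.sources.tail-source.type = exec
agent.sources.tail-source.command = tail -F /var/log/system.log
agent.sources.tail-source.channels = memory-channel
```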
Now, configure two sinks: a logger sink and an HDFS sink. For the HDFS sink, we specify the path to the name node for HDFS, pointing to the output path of where we want the files stored.
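The sink definitions might look like the following; the sink names, the name node host and port, and the /flume output path are assumptions you would adjust for your own cluster:

```properties
# Log every event to the console, useful for debugging
agent.sinks.log-sink.channel = memory-channel
agent.sinks.log-sink.type = logger

# Write events to HDFS under /flume on the name node
agent.sinks.hdfs-sink.channel = memory-channel
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.path = hdfs://localhost:8020/flume
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
```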
Then, we configure the agent’s channels, sources and sinks.
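Wiring it all together, assuming the component names used above:

```properties
agent.channels = memory-channel
agent.sources = tail-source
agent.sinks = log-sink hdfs-sink
```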
Finally, let’s start the Flume agent, logging all output to the console and running the agent named agent.
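Assuming the configuration was saved as flume.conf, the command might look like this:

```shell
flume-ng agent --conf-file flume.conf --name agent \
  -Dflume.root.logger=INFO,console
```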
Once the agent is running, you should see output in the console as data is written to the filesystem.