# Christopher Meiklejohn


## Flume-ng configuration with an HDFS sink

18 Jun 2013

I’ve been playing around with flume-ng and its HDFS sink recently to try to understand how I can stream data into HDFS and work with it using Hadoop. The documentation for flume-ng is unfortunately lacking, so I’ve typed up some quick notes on how to configure and test the HDFS sink.

This document assumes that you have Hadoop installed and running locally, with flume-ng version 1.2.0 or above.

In this example, the name of our agent is simply `agent`. First, let's define a channel for `agent` named `memory-channel`, of type `memory`.
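A minimal channel definition looks like this in the Flume properties file:

```properties
# Define a memory-backed channel named memory-channel.
agent.channels.memory-channel.type = memory
```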

Next, let's configure a source for `agent`, called `tail-source`, which watches the `system.log` file. Let's also assign it to `memory-channel`.
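One way to watch a log file is an `exec` source running `tail -F`; the log path here is an assumption, so adjust it for your system:

```properties
# Tail the system log and feed events into memory-channel.
agent.sources.tail-source.type = exec
agent.sources.tail-source.command = tail -F /var/log/system.log
agent.sources.tail-source.channels = memory-channel
```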

Now, we configure two sinks: a logger sink and an HDFS sink. For the HDFS sink, we specify the path to the name node, pointing at the output directory where we want the files stored.
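A sketch of the two sink definitions; the sink names, name node address, and output path are assumptions for illustration:

```properties
# Logger sink: writes events to the Flume log for debugging.
agent.sinks.log-sink.channel = memory-channel
agent.sinks.log-sink.type = logger

# HDFS sink: writes events into HDFS under the given path.
agent.sinks.hdfs-sink.channel = memory-channel
agent.sinks.hdfs-sink.type = hdfs
agent.sinks.hdfs-sink.hdfs.path = hdfs://localhost:8020/flume
agent.sinks.hdfs-sink.hdfs.fileType = DataStream
```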

Then, we configure the agent’s channels, sources and sinks.
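Finally, the agent's component lists tie everything together (the sink names match the assumed ones above):

```properties
# Register the channel, source, and sinks with the agent.
agent.channels = memory-channel
agent.sources = tail-source
agent.sinks = log-sink hdfs-sink
```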

Finally, let's start the Flume agent, logging all output to the console, and starting the agent named `agent`.
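Something like the following should work, assuming the configuration above was saved as `conf/flume.conf` and you are running from the Flume install directory:

```shell
# Start the agent named "agent", logging to the console.
bin/flume-ng agent \
  --conf ./conf/ \
  -f conf/flume.conf \
  -n agent \
  -Dflume.root.logger=INFO,console
```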

Once the agent is running, you should see output like this as data is written to the filesystem.

Success!