Today I present a modified HDFS sink for Flume, purely a
prototype, that writes each event to its own file.
This sink assumes that every event carries a header naming its
destination file, set either when the event is ingested or
later by an interceptor.
First, we define a new configuration variable to indicate that, for a
particular HDFS sink, we want one file created per event.
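As a rough illustration of such a flag, the sketch below reads a boolean property from the sink's settings and leaves it off by default. The property name `hdfs.filePerEvent` and the `SinkSettings` class are assumptions made for this example, not names taken from the prototype, and a plain `Map` stands in for Flume's configuration context.

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for a sink's configuration: a flat map of string properties.
final class SinkSettings {
    // Hypothetical property name; the prototype's actual key may differ.
    static final String FILE_PER_EVENT_KEY = "hdfs.filePerEvent";

    final boolean filePerEvent;

    SinkSettings(Map<String, String> properties) {
        // Default to false so sinks that never set the flag keep their usual behaviour.
        this.filePerEvent = Boolean.parseBoolean(
                properties.getOrDefault(FILE_PER_EVENT_KEY, "false"));
    }

    public static void main(String[] args) {
        Map<String, String> properties = new HashMap<>();
        properties.put(FILE_PER_EVENT_KEY, "true");
        System.out.println(new SinkSettings(properties).filePerEvent);
    }
}
```

Defaulting to `false` means the flag has to be switched on explicitly for each sink that should use the alternate write.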
Next, let’s modify the HDFS Event Sink that comes standard with Flume to
check this configuration variable and, when it is set, perform the
alternate type of write.
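The branching described above might look roughly like the following sketch. It is not the actual change to Flume's HDFS sink: `DispatchingSink` and both writer parameters are hypothetical, and the event type is left generic so the example does not depend on Flume's classes.

```java
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sink skeleton that picks its write path once, at construction time.
final class DispatchingSink<E> {
    private final Consumer<E> activeWriter;

    DispatchingSink(boolean filePerEvent,
                    Consumer<E> standardWriter,
                    Consumer<E> perEventWriter) {
        // The configuration flag decides which writer handles every event.
        this.activeWriter = filePerEvent ? perEventWriter : standardWriter;
    }

    // Hands each event in the batch to the selected writer, in order.
    void process(List<E> batch) {
        for (E event : batch) {
            activeWriter.accept(event);
        }
    }
}
```

Choosing the writer once in the constructor keeps the per-event path free of repeated flag checks, which is one possible design; checking the flag inside the processing loop would behave the same way.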
Finally, let’s implement the writer to create one file per event, named by
the header that’s been attached to the event object.
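A minimal sketch of such a writer follows, under several assumptions: the header key `filename` and the class `FilePerEventWriter` are hypothetical, the event is reduced to a header map plus a byte array, and the local filesystem (via `java.nio.file`) stands in for HDFS so the example runs without a cluster.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Map;

// Writes each event body to its own file, named by a header on the event.
final class FilePerEventWriter {
    // Hypothetical header key; the prototype may use a different one.
    static final String FILENAME_HEADER = "filename";

    private final Path baseDir;

    FilePerEventWriter(Path baseDir) {
        this.baseDir = baseDir.toAbsolutePath().normalize();
    }

    Path write(Map<String, String> headers, byte[] body) throws IOException {
        String name = headers.get(FILENAME_HEADER);
        if (name == null || name.isEmpty()) {
            throw new IllegalArgumentException(
                    "event has no '" + FILENAME_HEADER + "' header");
        }
        Path target = baseDir.resolve(name).normalize();
        // Reject names such as "../x" or "." that do not point to a file under the base directory.
        if (target.equals(baseDir) || !target.startsWith(baseDir)) {
            throw new IllegalArgumentException("invalid destination: " + name);
        }
        Files.createDirectories(target.getParent());
        // One event, one file: an existing file with the same name is overwritten.
        return Files.write(target, body);
    }
}
```

Events that lack the header are rejected here rather than written to a fallback file. Whether the prototype does the same is not something this sketch claims.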
The code for this prototype is available on GitHub.