Hadoop allows you to modify the way data is loaded from disk in two major
ways: configuring how contiguous chunks of input are generated from blocks
in HDFS (or maybe more exotic sources), and configuring how records appear
in the map phase. The two classes you’ll be playing with to do this are
RecordReader and InputFormat. These work with the Hadoop MapReduce framework in a very similar
way to how mappers and reducers are plugged in.
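To make this division of labor concrete, here is a minimal sketch in plain Java that mimics the two roles without depending on Hadoop at all. Everything in it is a hypothetical stand-in: `InputSketch`, `Split`, `getSplits`, and `readRecords` are illustrative names, not Hadoop's classes, and the input is an in-memory string rather than blocks in HDFS. One method cuts the input into contiguous chunks (the InputFormat's job), and the other turns a single chunk into records (the RecordReader's job).

```java
import java.util.ArrayList;
import java.util.List;

public class InputSketch {
    /** Hypothetical stand-in for an input split: a contiguous range of characters. */
    static final class Split {
        final int start;
        final int length;

        Split(int start, int length) {
            this.start = start;
            this.length = length;
        }
    }

    /** Role of the InputFormat: cut the input into contiguous chunks. */
    static List<Split> getSplits(String input, int splitSize) {
        List<Split> splits = new ArrayList<>();
        for (int start = 0; start < input.length(); start += splitSize) {
            int length = Math.min(splitSize, input.length() - start);
            splits.add(new Split(start, length));
        }
        return splits;
    }

    /** Role of the RecordReader: turn one chunk into records (comma-separated tokens here). */
    static List<String> readRecords(String input, Split split) {
        List<String> records = new ArrayList<>();
        String chunk = input.substring(split.start, split.start + split.length);
        for (String token : chunk.split(",")) {
            if (!token.isEmpty()) {
                records.add(token);
            }
        }
        return records;
    }

    public static void main(String[] args) {
        // The input is chosen so that no record straddles a chunk boundary.
        String input = "a,b,c,d,";
        for (Split split : getSplits(input, 4)) {
            System.out.println(readRecords(input, split));
        }
    }
}
```

The sketch deliberately uses input whose records line up with the chunk boundaries. A real reader also has to cope with records that straddle two chunks, which this simplified model does not attempt.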
Analogously, Hadoop allows you to modify the way data is stored,
using an OutputFormat and a RecordWriter.
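The output side can be sketched the same way. The plain-Java example below is a simplified model under stated assumptions, not Hadoop's API: `OutputSketch`, `TabSeparatedWriter`, `getRecordWriter`, and `demo` are hypothetical names, and the destination is an in-memory `StringWriter` rather than a file system. One piece decides where the output goes and hands back a writer (the OutputFormat's job), and the writer serializes each key/value pair (the RecordWriter's job).

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.io.Writer;

public class OutputSketch {
    /** Role of the RecordWriter: serialize one key/value pair at a time. */
    static final class TabSeparatedWriter {
        private final Writer out;

        TabSeparatedWriter(Writer out) {
            this.out = out;
        }

        void write(String key, String value) {
            try {
                // Tab between key and value, one pair per line (an arbitrary choice for this sketch).
                out.write(key + "\t" + value + "\n");
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
    }

    /** Role of the OutputFormat: decide where output goes and hand back a writer for it. */
    static TabSeparatedWriter getRecordWriter(Writer destination) {
        return new TabSeparatedWriter(destination);
    }

    /** Demo helper: write two pairs into memory and return the resulting text. */
    static String demo() {
        StringWriter sink = new StringWriter();
        TabSeparatedWriter writer = getRecordWriter(sink);
        writer.write("apple", "3");
        writer.write("pear", "1");
        return sink.toString();
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```

Swapping the serialization or the destination means changing only one of the two pieces, which is the point of keeping the roles separate.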