Customizing Input and Output in Hadoop

Hadoop allows you to modify the way data is loaded from disk in two major ways: by configuring how contiguous chunks of input are generated from blocks in HDFS (or from more exotic sources), and by configuring how records appear in the map phase. The two classes you’ll work with to do this are InputFormat and RecordReader. They plug into the Hadoop MapReduce framework in much the same way that mappers and reducers do.

Hadoop also allows you to modify the way data is stored in an analogous way: with an OutputFormat and a RecordWriter.
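To make the input side of this division of labor concrete, here is a minimal sketch in plain Java. It is a toy model, not the Hadoop API: the names ToyInputDemo, ToySplit, getSplits, and readRecords are hypothetical, the "file" is just an in-memory string, and the boundary rule (a line belongs to the split in which it starts) is an assumption chosen for illustration. One method plays the role of the input format by carving the input into contiguous chunks, and another plays the role of the record reader by turning one chunk into key/value records.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy model only: these types mimic the ROLES of an input format and a
// record reader. They are hypothetical and are not Hadoop classes.
class ToyInputDemo {

    /** A contiguous chunk of the input: [start, start + length). */
    static final class ToySplit {
        final int start;
        final int length;

        ToySplit(int start, int length) {
            this.start = start;
            this.length = length;
        }
    }

    /** Input-format role: carve the input into fixed-size chunks. */
    static List<ToySplit> getSplits(String data, int splitSize) {
        List<ToySplit> splits = new ArrayList<>();
        for (int off = 0; off < data.length(); off += splitSize) {
            splits.add(new ToySplit(off, Math.min(splitSize, data.length() - off)));
        }
        return splits;
    }

    /** Record-reader role: turn one chunk into (offset, line) records. */
    static List<Map.Entry<Integer, String>> readRecords(String data, ToySplit split) {
        List<Map.Entry<Integer, String>> records = new ArrayList<>();
        int end = split.start + split.length;
        int pos = split.start;
        // Assumed rule: if the chunk begins mid-line, that line belongs to
        // the previous chunk, so skip ahead to the next line start.
        if (pos > 0 && data.charAt(pos - 1) != '\n') {
            int nl = data.indexOf('\n', pos);
            pos = (nl < 0) ? data.length() : nl + 1;
        }
        // Emit every line that STARTS inside this chunk, reading past the
        // chunk's end if the line runs over the boundary.
        while (pos < end && pos < data.length()) {
            int nl = data.indexOf('\n', pos);
            int lineEnd = (nl < 0) ? data.length() : nl;
            records.add(new SimpleEntry<>(pos, data.substring(pos, lineEnd)));
            pos = lineEnd + 1;
        }
        return records;
    }

    public static void main(String[] args) {
        String data = "alpha\nbeta\ngamma\n";
        for (ToySplit split : getSplits(data, 7)) {
            System.out.println("split at " + split.start + ": " + readRecords(data, split));
        }
    }
}
```

The point of the sketch is that chunk boundaries and record boundaries are decided separately: chunks are cut by size without looking at the content, and the reader is responsible for making sure each record is still produced exactly once.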

InputFormat

Hadoop relies on the input format of the job to do three things:

  1. Validate the input configuration for the job (i.e., checking that the data is there).


  
