Free Trial

Safari Books Online is a digital library providing on-demand subscription access to thousands of learning resources.


  • Create BookmarkCreate Bookmark
  • Create Note or TagCreate Note or Tag
  • PrintPrint
Share this Page URL
Help

7. Input and Output Patterns

Chapter 7. Input and Output Patterns

In this chapter, we’ll be focusing on what is probably the most often overlooked way to improve the value of MapReduce: customizing input and output. You will not always want to load or store data the way Hadoop MapReduce does out of the box. Sometimes you can skip the time-consuming step of storing data in HDFS and just accept data from some original source, or feed it directly to some process that uses it after MapReduce is finished. Sometimes the basic Hadoop paradigm of file blocks and input splits doesn’t do what you need, so this is where a custom InputFormat or OutputFormat comes into play.

Three patterns in this chapter deal with input: generating data, external source input, and partition pruning. All three input patterns share an interesting property: the map phase is completely unaware that tricky things are going on before it gets its input pairs. Customizing an input format is a great way to abstract away details of the method you use to load data.


  

You are currently reading a PREVIEW of this book.

                                                                                                                    

Get instant access to over $1 million worth of books and videos.

  

Start a Free Trial


  
  • Safari Books Online
  • Create BookmarkCreate Bookmark
  • Create Note or TagCreate Note or Tag
  • PrintPrint