Free Trial

Safari Books Online is a digital library providing on-demand subscription access to thousands of learning resources.

  • Create BookmarkCreate Bookmark
  • Create Note or TagCreate Note or Tag
  • PrintPrint
Share this Page URL
Help

1. Design Patterns and MapReduce > MapReduce and Hadoop Refresher

MapReduce and Hadoop Refresher

The point of this section is to provide a quick refresher on MapReduce in the Hadoop context, since the code examples in this book are written in Hadoop. Some beginners might want to refer to a more in-depth resource such as Tom White’s excellent Hadoop: The Definitive Guide or the Apache Hadoop website. These resources will help you get started in setting up a development or fully productionalized environment that will allow you to follow along the code examples in this book.

Hadoop MapReduce jobs are divided into a set of map tasks and reduce tasks that run in a distributed fashion on a cluster of computers. Each task works on the small subset of the data it has been assigned so that the load is spread across the cluster. The map tasks generally load, parse, transform, and filter data. Each reduce task is responsible for handling a subset of the map task output. Intermediate data is then copied from mapper tasks by the reducer tasks in order to group and aggregate the data. It is incredible what a wide range of problems can be solved with such a straightforward paradigm, from simple numerical aggregations to complex join operations and Cartesian products.


  

You are currently reading a PREVIEW of this book.

                                                                                                                    

Get instant access to over $1 million worth of books and videos.

  

Start a Free 10-Day Trial


  
  • Safari Books Online
  • Create BookmarkCreate Bookmark
  • Create Note or TagCreate Note or Tag
  • PrintPrint