A Refresher on Joins

If you come from a strong SQL background, you can probably skip this section, but for those of us who started with Hadoop, joins may be a bit of a foreign concept.

Joins are possibly one of the most complex operations one can execute in MapReduce. By design, MapReduce is very good at processing large data sets by looking at every record or group in isolation, so joining two very large data sets together does not fit into the paradigm gracefully. Before we dive into the patterns themselves, let’s go over what we mean when we say join and the different types of joins that exist.

A join is an operation that combines records from two or more data sets based on a field or set of fields, known as the foreign key. The foreign key is the field in a relational table that matches the column of another table, and is used as a means to cross-reference between tables. Examples are the simplest way to go about explaining joins, so let’s dive right in. To simplify explanations of the join types, two data sets will be used, A and B, with the foreign key defined as f. As the different types of joins are described, keep the two tables A (Table 5-1) and B (Table 5-2) in mind, as they will be used in the upcoming descriptions.
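As a rough illustration of the definition above, the following sketch pairs records from two small data sets whenever they share the same value of the foreign key f. It is plain in-memory Python rather than MapReduce, and everything in it is an assumption made for illustration: the records are hypothetical (they are not the contents of Table 5-1 or Table 5-2), and `join_on_key` is a made-up helper name.

```python
from collections import defaultdict

# Hypothetical records for illustration only; these are NOT the
# contents of Table 5-1 or Table 5-2.
A = [
    {"f": 1, "name": "alice"},
    {"f": 2, "name": "bob"},
    {"f": 4, "name": "dana"},
]
B = [
    {"f": 1, "comment": "first post"},
    {"f": 1, "comment": "second post"},
    {"f": 3, "comment": "orphan"},
]


def join_on_key(left, right, key="f"):
    """Pair each left record with every right record that has the same key value."""
    # Index the right-hand data set by its foreign key value.
    index = defaultdict(list)
    for rec in right:
        index[rec[key]].append(rec)

    joined = []
    for rec in left:
        # Emit one combined record per matching pair; left records
        # without a match produce nothing in this sketch.
        for match in index.get(rec[key], []):
            joined.append({**rec, **match})
    return joined


result = join_on_key(A, B)
for row in result:
    print(row)
```

Only the records with f equal to 1 appear in the result, once for each matching record in B. This sketch simply drops records whose key has no counterpart in the other data set; how such unmatched records are treated is one of the choices that distinguishes one type of join from another.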


  
