Cartesian Product

Pattern Description

The Cartesian product pattern is an effective way to pair every record of multiple inputs with every other record. This functionality comes at a cost, though: a job using this pattern can take an extremely long time to complete.

Intent

Pair up and compare every single record with every other record in a data set.
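As a small, single-machine illustration of this intent (not a MapReduce job), the sketch below compares every record in one data set with every other record. The `compare` function and the sample records are hypothetical. It also assumes the comparison is symmetric, so each unordered pair of distinct records is visited once (n(n-1)/2 pairs) instead of all n² ordered pairs.

```python
from itertools import combinations

def compare(a: str, b: str) -> int:
    """Hypothetical comparison: the number of words two records share."""
    return len(set(a.split()) & set(b.split()))

def compare_all_pairs(records):
    """Yield (left, right, score) for every unordered pair of distinct records."""
    for left, right in combinations(records, 2):
        yield left, right, compare(left, right)

records = ["the quick fox", "the lazy dog", "a quick dog"]
for left, right, score in compare_all_pairs(records):
    print(f"{left!r} vs {right!r}: {score}")
```

With three records this visits three pairs. Doubling the input roughly quadruples the number of comparisons, which is the source of the long runtimes mentioned above.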

Motivation

A Cartesian product allows the relationship between every possible pair of records, within one data set or across several, to be analyzed. Rather than pairing data sets together by a foreign key, a Cartesian product simply pairs every record of a data set with every record of all the other data sets.
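The difference from a key-based join can be seen in a few lines of Python. This is a sketch on in-memory lists (the data sets are made up), using the standard library's `itertools.product`: no key is consulted, and the output size is the product of the input sizes.

```python
from itertools import product

def cartesian_product(*data_sets):
    """Pair every record of each data set with every record of all the others.

    No join key is used: one output tuple is produced per combination.
    """
    return product(*data_sets)

users = ["alice", "bob"]
cities = ["Paris", "Lima", "Oslo"]
pairs = list(cartesian_product(users, cities))
print(len(pairs))  # 6 == 2 * 3
print(pairs[0])    # ('alice', 'Paris')
```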

With that in mind, a Cartesian product does not fit the MapReduce paradigm very well: the operation is not intuitively splittable, cannot be parallelized very well, and therefore requires a lot of computation time and a lot of network traffic. Any preprocessing that reduces the amount of data, and therefore the byte count flowing through the job, should be done beforehand to improve runtimes.
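To make the cost concrete: two inputs of m and n records produce m·n pairs, so the work grows multiplicatively rather than additively. The back-of-the-envelope sketch below estimates the pair count and raw output bytes, under the simplifying assumption that every output pair carries both records in full. The record counts and sizes are made-up numbers for illustration, not measurements, and the filtering and field-dropping step is a hypothetical example of preprocessing.

```python
def cartesian_cost(m: int, n: int, bytes_a: int, bytes_b: int):
    """Estimate (pair_count, output_bytes) for the product of two inputs.

    Assumes each output pair stores one full record from each side.
    """
    pairs = m * n
    return pairs, pairs * (bytes_a + bytes_b)

# Hypothetical inputs: 100,000 records of 1,000 bytes on each side.
before = cartesian_cost(100_000, 100_000, 1_000, 1_000)

# Hypothetical preprocessing: filter out half of each input and drop
# unneeded fields so that each record shrinks to 200 bytes.
after = cartesian_cost(50_000, 50_000, 200, 200)

print(before)  # (10000000000, 20000000000000)
print(after)   # (2500000000, 1000000000000)
```

Halving each input cuts the number of pairs by a factor of four, and shrinking the records reduces the bytes per pair as well. In this example the combined effect is a twentyfold drop in output bytes, which is why reducing the data before the product is computed pays off so well.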