Now that you’re refreshed on the steps of the whole MapReduce process, let’s dive into a quick and simple example. The “Word Count” program is the canonical example in MapReduce, and for good reason: it is a straightforward application of the framework, and one that MapReduce handles extremely efficiently. Many people complain that “Word Count” is overused as an example, but hopefully the rest of the book makes up for that!
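Before turning to real data, it may help to see the shape of the computation in miniature. The following is a minimal sketch in plain Java that imitates the map, shuffle, and reduce steps in memory. It does not use the Hadoop API, and the class and method names (`WordCountSketch`, `map`, `shuffle`, `reduce`, `wordCount`) are hypothetical names chosen for illustration only.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: an in-memory imitation of word count, not Hadoop code.
class WordCountSketch {

    // Map step: emit a (word, 1) pair for every whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                pairs.add(new SimpleEntry<>(token, 1));
            }
        }
        return pairs;
    }

    // Shuffle step: group the emitted values by key, with keys kept sorted.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                   .add(pair.getValue());
        }
        return grouped;
    }

    // Reduce step: sum all the values collected for a single key.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Chain the three steps over a list of input lines.
    static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> entry : shuffle(pairs).entrySet()) {
            counts.put(entry.getKey(), reduce(entry.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints {be=3, not=1, or=1, quick=1, to=2}
        System.out.println(wordCount(Arrays.asList("to be or not to be", "be quick")));
    }
}
```

In a real job the framework performs the grouping across machines; the sketch only shows which piece of logic belongs to which step.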
In this particular example, we’re going to run a word count over user-submitted comments on StackOverflow. The content of the Text field will be pulled out and preprocessed a bit, and then we’ll count how many times we see each word. An example record from this data set is: