As the most basic pattern, filtering serves as an abstract pattern for some of the other patterns. Filtering simply evaluates each record separately and decides, based on some condition, whether it should stay or go.
Filter out records that are not of interest and keep ones that are.
Consider an evaluation function f that takes a record and returns a Boolean value of true or false. If this function returns true, keep the record; otherwise, toss it out.
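The evaluation function can be sketched in a few lines of plain Python. This is only an illustration of the idea, not Hadoop code: the names `filter_records` and `is_error` and the sample log lines are hypothetical and chosen for the example.

```python
from typing import Callable, Iterable, Iterator, TypeVar

T = TypeVar("T")


def filter_records(records: Iterable[T], f: Callable[[T], bool]) -> Iterator[T]:
    """Yield each record for which the evaluation function f returns True."""
    for record in records:
        # Each record is judged on its own; no other record is consulted.
        if f(record):
            yield record


def is_error(line: str) -> bool:
    """Hypothetical evaluation function: keep lines that mention ERROR."""
    return "ERROR" in line


# Hypothetical sample data standing in for a much larger input.
log_lines = [
    "INFO service started",
    "ERROR disk full",
    "INFO request served",
    "ERROR timeout",
]

kept = list(filter_records(log_lines, is_error))
print(kept)
```

Because `f` looks at one record at a time and keeps no state between calls, the same function can be applied to any portion of the input in any order.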
Your data set is large and you want to take a subset of it to focus on and perhaps use for follow-on analysis. The subset might be a significant portion of the data set or just a needle in the haystack. Either way, you need to use the parallelism of MapReduce to wade through all of your data and find the keepers.
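To illustrate why filtering parallelizes so easily, here is a rough sketch that cuts the input into splits and filters each split independently, much as separate map tasks would. It is not MapReduce itself: a thread pool stands in for the cluster (and, in CPython, threads will not speed up CPU-bound predicates), and the names `filter_split`, `parallel_filter`, and `num_splits` are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")


def filter_split(split: Sequence[T], keep: Callable[[T], bool]) -> List[T]:
    """Filter one split; needs nothing from any other split."""
    return [record for record in split if keep(record)]


def parallel_filter(
    records: Sequence[T], keep: Callable[[T], bool], num_splits: int = 4
) -> List[T]:
    """Filter records split by split, then concatenate the survivors."""
    # Ceiling division so that at most num_splits chunks are produced.
    size = max(1, -(-len(records) // num_splits))
    splits = [records[i:i + size] for i in range(0, len(records), size)]
    with ThreadPoolExecutor(max_workers=num_splits) as pool:
        # map() returns results in the order of the splits.
        parts = list(pool.map(lambda s: filter_split(s, keep), splits))
    return [record for part in parts for record in part]


# Hypothetical haystack: find the few multiples of 7 among 20 numbers.
needles = parallel_filter(list(range(20)), lambda n: n % 7 == 0)
print(needles)
```

No step combines information across splits, so the only work left after filtering is gathering the kept records.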