MapReduce systems such as Hadoop aren’t being used just for text analysis anymore. An increasing number of users are deploying MapReduce jobs that analyze data once thought to be too hard for the paradigm. New design patterns are sure to arise to handle these workloads, turning solutions that currently push the limits of the system into daily practice.
One of the most obvious trends in the nature of data is the rise of image, audio, and video analysis. This kind of data is a good candidate for a distributed system using MapReduce because the files are typically very large. Retailers want to analyze their security video to detect which stores are busiest. Medical imaging analysis is becoming harder as image resolutions grow astronomically. Unfortunately, because MapReduce began as a text-processing platform, some artifacts of that origin remain and make this type of analysis challenging. Since this is a MapReduce book, we’ll acknowledge that analyzing this type of data is really hard, even on a single node with little data, but we will not go into more detail.