CHAPTER 4
Big Data Market Survey
By Edd Dumbill

The big data ecosystem can be confusing. The popularity of "big data" as an industry buzzword has created a broad category. As Hadoop steamrolls through the industry, solutions from the business intelligence and data warehousing fields are also attracting the big data label. To confuse matters, Hadoop-based solutions such as Hive are at the same time evolving toward being competitive data warehousing solutions.

Understanding the nature of your big data problem is a helpful first step in evaluating potential solutions. Let's remind ourselves of the definition of big data:

"Big data is data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn't fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it."

Big data problems vary in how heavily they weigh in on the axes of volume, velocity and variability. Predominantly structured yet large data, for example, may be most suited to an analytical database approach.

This survey assumes that a data warehousing solution alone is not the answer to your problems, and concentrates on analyzing the commercial Hadoop ecosystem. We'll focus on the solutions that incorporate storage and data processing, excluding products that sit only above those layers, such as visualization or analytical workbench software.

Getting started with Hadoop doesn't require a large investment, because the software is open source and is also available instantly through the Amazon Web Services cloud. But for production environments, support, professional services and training are often required.