Overview

There's been a massive amount of innovation in data tools over the last few years, thanks to a few key trends:

Learning from the web. Techniques originally developed by website developers coping with scaling issues are increasingly being applied to other domains.

CS+?=$$$. Google have proven that research techniques from computer science can be effective at solving problems and creating value in many real-world situations. That's led to increased interest in cross-pollination and investment in academic research from commercial organizations.

Cheap hardware. Now that machines with a decent amount of processing power can be hired for just a few cents an hour, many more people can afford to do large-scale data processing. They can't afford the traditional high prices of professional data software though, so they've turned to open-source alternatives.

These trends have led to a Cambrian Explosion of new tools, which means when you're planning a new data project you have a lot to choose from. This guide aims to help you make those choices by describing each tool from the perspective of a developer looking to use them in an application. Wherever possible, this will be from my first-hand experiences, or from colleagues who have used the systems in production environments. I've made a deliberate choice to include my own opinions and impressions, so you should see this guide as a starting point for exploring the tools, not the final word. I'll do my best to explain what I like about each service but your tastes and requirements may well be quite different.

Since the goal is to help experienced engineers navigate the new data landscape, the guide only covers tools that have been created or risen to prominence in the last few years. For example, PostgreSQL is not covered because it's been widely used for over a decade, but its Greenplum derivative is newer and less well-known, so it is included.

Subscriber Reviews

Average Rating: 3.7 out of 5, based on 11 ratings

"Good survey, quick read" - by kirkwon on 12-APR-2013
Covers the basics, decent for what it is and gives you pointers for more details.

"Good quick overview" - by farmkittie on 29-NOV-2012
I wanted to give a review to try to counteract the Anonymous hater who gave it one star.  This book is fine as a fast foundation.

"Awful book" - by Anonymous on 23-NOV-2012
absolutely nothing useful, you can get more information from Wikipedia! Never ever ever read this book, it's wasted money and time

"Nice short big data tool overview" - by Anonymous on 21-JUL-2012
A very quick way to learn the highlights about a number of interesting tools out there in the big data space. I learned a few new tools that seem like they are worth further investigation.

"Big Data Glossary" - by PWayne on 15-NOV-2011
I am getting involved in Hadoop, Tableau, Hive, and other big data applications. This book was just what I needed for a quick survey of Big Data alternative tools. It explains the Big Data trends, what tools are popular, and what they do for you. This is good background material.

Extras

The publisher has provided additional content related to this title.

Visit the catalog page for Big Data Glossary

  • Catalog Page

Visit the errata page for Big Data Glossary

  • Errata