Hadoop: The Definitive Guide helps you harness the power of your data. Ideal for processing large datasets, the Apache Hadoop framework is an open source implementation of the MapReduce algorithm on which Google built its empire. This comprehensive resource demonstrates how to use Hadoop to build reliable, scalable, distributed systems: programmers will find details for analyzing large datasets, and administrators will learn how to set up and run Hadoop clusters. Complete with case studies that illustrate how Hadoop solves specific problems, this book helps you:

  • Use the Hadoop Distributed File System (HDFS) for storing large datasets, and run distributed computations over those datasets using MapReduce

  • Become familiar with Hadoop's data and I/O building blocks for compression, data integrity, serialization, and persistence

  • Discover common pitfalls and advanced features for writing real-world MapReduce programs

  • Design, build, and administer a dedicated Hadoop cluster, or run Hadoop in the cloud

  • Use Pig, a high-level query language for large-scale data processing

  • Take advantage of HBase, Hadoop's database for structured and semi-structured data

  • Learn ZooKeeper, a toolkit of coordination primitives for building distributed systems

If you have lots of data -- whether it's gigabytes or petabytes -- Hadoop is the perfect solution. Hadoop: The Definitive Guide is the most thorough book available on the subject.

"Now you have the opportunity to learn about Hadoop from a master, not only of the technology, but also of common sense and plain talk." -- Doug Cutting, Hadoop Founder, Yahoo!

Subscriber Reviews

Average Rating: 4.75 out of 5, based on 8 ratings

"Good introduction into Hadoop" - by Alex Ott on 20-JUL-2011
A very good book that gives a high-level overview of Hadoop, together with descriptions of related projects such as Pig, HBase, and others.

I recommend this book (preferably the 2nd edition) to all developers who want to learn about Hadoop, its usage, and programming for it.

"Good for understanding how hadoop works" - by Adam on 14-JAN-2010
This book is fairly well laid out and describes much of the Hadoop file structure. The case studies at the end help to show real-world examples but could have been written better. It would have been nice to see some extra visual explanations as well.

The publisher has provided additional content related to Hadoop: The Definitive Guide: a catalog page, an errata page, and downloadable supplemental electronic content.