Safari Books Online is a digital library providing on-demand subscription access to thousands of learning resources.

Hadoop


1. Using Flume

By: 

Publisher: O'Reilly Media, Inc.

Publication Date: 15-DEC-2014

Insert Date: 31-JUL-2014

Slots: 1.0

Looking to use Apache Flume to stream data to Hadoop? This complete reference guide shows operations engineers how to configure, deploy, and monitor a Flume cluster, and teaches developers how to write Flume plugins and custom components for their specific use cases....

2. Hadoop Application Architectures (Rough Cuts)

By: ; ; ;

Publisher: O'Reilly Media, Inc.

Publication Date: 15-APR-2015

Insert Date: 10-JUL-2014

Get expert guidance on architecting end-to-end data management solutions with Apache Hadoop. While many sources explain how to use various components in the Hadoop ecosystem, this practical book takes you through architectural considerations necessary to tie those components together into a complete tailored application, based on your particular use case. You'll also get detailed examples of architecture used in some of the most commonly found Hadoop applications....

3. Hadoop For Dummies

By: 

Publisher: For Dummies

Publication Date: 14-APR-2014

Insert Date: 09-MAY-2014

Slots: 1.0

Let Hadoop For Dummies help harness the power of your data and rein in the information overload. Big data has become big business, and companies and organizations of all sizes are struggling to find ways to retrieve valuable information from their massive data sets without becoming overwhelmed. Enter Hadoop and this easy-to-understand For Dummies guide. Hadoop For Dummies helps readers understand the value of big data, make a business case for using Hadoop, navigate the Hadoop ecosystem, and build and manage Hadoop applications and clusters. Explains the origins of Hadoop, its economic...

4. Pig Design Patterns

By: Pradeep Pasupuleti

Publisher: Packt Publishing

Publication Date: 17-APR-2014

Insert Date: 19-APR-2014

Slots: 1.0

Simplify Hadoop programming to create complex end-to-end Enterprise Big Data solutions with Pig. Quickly understand how to use Pig to design end-to-end Big Data systems. Implement a hands-on programming approach using design patterns to solve commonly occurring enterprise Big Data challenges. Enhances users’ capabilities to utilize Pig and create their own design patterns wherever applicable. In Detail: Pig Design Patterns is a comprehensive guide that will enable readers to readily use design patterns that simplify the creation of complex data pipelines in various stages of...

5. Virtualizing Hadoop (Rough Cuts)

By: George Trujillo; Charles Kim; Steve Jones

Publisher: VMware Press

Publication Date: 25-DEC-2014

Insert Date: 02-APR-2014

This is the Rough Cut version of the printed book. This is the only complete foundational guide to virtualizing Hadoop and deploying it in the cloud. The authors demystify all aspects of virtualizing Hadoop at scale, empowering DBAs, BI specialists, integrators, architects, and managers to deploy quickly and achieve outstanding performance. Virtualizing Hadoop combines exceptional clarity for Hadoop newcomers with realistic examples for building deep technical skill. Drawing on their immense experience, the authors identify specific obstacles and challenges in virtualizing Hadoop, helping...

6. Pro Microsoft HDInsight: Hadoop on Windows

By: Debarchan Sarkar

Publisher: Apress

Publication Date: 18-FEB-2014

Insert Date: 28-FEB-2014

Slots: 1.0

Pro Microsoft HDInsight is a complete guide to deploying and using Apache Hadoop on the Microsoft Windows Azure Platforms. The information in this book enables you to process enormous volumes of structured as well as non-structured data easily using HDInsight, which is Microsoft's own distribution of Apache Hadoop. Furthermore, the blend of Infrastructure as a Service (IaaS) and Platform as a Service (PaaS) offerings available through Windows Azure lets you take advantage of Hadoop's processing power without the worry of creating, configuring, maintaining, or managing your own...

7. Optimizing Hadoop for MapReduce

By: Khaled Tannir

Publisher: Packt Publishing

Publication Date: 21-FEB-2014

Insert Date: 25-FEB-2014

Slots: 1.0

Learn how to configure your Hadoop cluster to run optimal MapReduce jobs. Optimize your MapReduce job performance. Identify your Hadoop cluster’s weaknesses. Tune your MapReduce configuration. In Detail: MapReduce is the distribution system that the Hadoop MapReduce engine uses to distribute work around a cluster by working in parallel on smaller data sets. It is useful in a wide range of applications, including distributed pattern-based searching, distributed sorting, web link-graph reversal, term-vector per host, web access log stats, inverted index construction, document...

8. Learning Cloudera Impala

By: Avkash Chauhan

Publisher: Packt Publishing

Publication Date: 24-DEC-2013

Insert Date: 27-DEC-2013

Slots: 1.0

Perform interactive, real-time in-memory analytics on large amounts of data using the massively parallel processing engine Cloudera Impala. Step-by-step guidance to get you started with Impala on your Hadoop cluster. Manipulate your data rapidly by writing proper SQL statements. Explore the concepts of Impala security, administration, and troubleshooting in detail to maintain your Impala cluster. In Detail: If you have always wanted to crunch billions of rows of raw data on Hadoop in a couple of seconds, then Cloudera Impala is the number one choice for you. Cloudera Impala...

9. Programming Elastic MapReduce

By: ;

Publisher: O'Reilly Media, Inc.

Publication Date: 27-DEC-2013

Insert Date: 13-DEC-2013

Slots: 1.0

Although you don’t need a large computing infrastructure to process massive amounts of data with Apache Hadoop, it can still be difficult to get started. This practical guide shows you how to quickly launch data analysis projects in the cloud by using Amazon Elastic MapReduce (EMR), the hosted Hadoop framework in Amazon Web Services (AWS)....

10. Cloudera Impala

By: John Russell

Publisher: O'Reilly Media, Inc.

Publication Date: 25-NOV-2013

Insert Date: 30-NOV-2013

Slots: 0.0

Learn about Cloudera Impala--an open source project that's opening up the Apache Hadoop software stack to a wide audience of database analysts, users, and developers. The Impala massively parallel processing (MPP) engine makes SQL queries of Hadoop data simple enough to be accessible to analysts familiar with SQL and to users of business intelligence tools--and it’s fast enough to be used for interactive exploration and experimentation....

11. Hadoop Cluster Deployment

By: Danil Zburivsky

Publisher: Packt Publishing

Publication Date: 25-NOV-2013

Insert Date: 28-NOV-2013

Slots: 1.0

Construct a modern Hadoop data platform effortlessly and gain insights into how to manage clusters efficiently. Choose the hardware and Hadoop distribution that best suits your needs. Get more value out of your Hadoop cluster with Hive, Impala, and Sqoop. Learn useful tips for performance optimization and security. In Detail: Big Data is the hottest trend in the IT industry at the moment. Companies are realizing the value of collecting, retaining, and analyzing as much data as possible. They are therefore rushing to implement the next generation of data platform, and Hadoop...

12. Big Data Analytics with R and Hadoop

By: Vignesh Prajapati

Publisher: Packt Publishing

Publication Date: 25-NOV-2013

Insert Date: 28-NOV-2013

Slots: 1.0

Set up an integrated infrastructure of R and Hadoop to turn your data analytics into big data analytics. Write Hadoop MapReduce within R. Learn data analytics with R and the Hadoop platform. Handle HDFS data within R. Understand Hadoop streaming with R. Encode and enrich datasets into R. In Detail: Big data analytics is the process of examining large amounts of data of a variety of types to uncover hidden patterns, unknown correlations, and other useful information. Such information can provide competitive advantages over rival organizations and result in business...

13. Securing Hadoop

By: Sudheesh Narayanan

Publisher: Packt Publishing

Publication Date: 22-NOV-2013

Insert Date: 23-NOV-2013

Slots: 1.0

Implement robust end-to-end security for your Hadoop ecosystem. Master the key concepts behind Hadoop security as well as how to secure a Hadoop-based Big Data ecosystem. Understand and deploy authentication, authorization, and data encryption in a Hadoop-based Big Data platform. Administer the auditing and security event monitoring system. In Detail: Security of Big Data is one of the biggest concerns for enterprises today. How do we protect the sensitive information in a Hadoop ecosystem? How can we integrate Hadoop security with existing enterprise security systems? What...

14. Professional Hadoop Solutions

By: 

Publisher: Wrox

Publication Date: 23-SEP-2013

Insert Date: 14-NOV-2013

Slots: 1.0

The go-to guidebook for deploying Big Data solutions with Hadoop. Today's enterprise architects need to understand how the Hadoop frameworks and APIs fit together, and how they can be integrated to deliver real-world solutions. This book is a practical, detailed guide to building and implementing those solutions, with code-level instruction in the popular Wrox tradition. It covers storing data with HDFS and HBase, processing data with MapReduce, and automating data processing with Oozie. Hadoop security, running Hadoop with Amazon Web Services, best practices, and automating Hadoop...

15. HDInsight Essentials

By: Rajesh Nadipalli;

Publisher: Packt Publishing

Publication Date: 23-SEP-2013

Insert Date: 01-OCT-2013

Slots: 1.0

Tap your unstructured Big Data and empower your business using the Hadoop distribution from Windows. Architect a Hadoop solution with a modular design for data collection, distributed processing, analysis, and reporting. Build a multi-node Hadoop cluster on Windows servers. Establish a Big Data solution using HDInsight with open source software, and provide useful Excel reports. Run Pig scripts and build simple charts using Interactive JS (Azure). In Detail: We live in an era in which data is generated with every action and a lot of these are unstructured; from Twitter...

16. Scaling Big Data with Hadoop and Solr

By: Hrishikesh Karambelkar;

Publisher: Packt Publishing

Publication Date: 26-AUG-2013

Insert Date: 28-AUG-2013

Slots: 1.0

Learn exciting new ways to build efficient, high-performance enterprise search repositories for Big Data using Hadoop and Solr. Understand the different approaches to making Solr work on Big Data as well as the benefits and drawbacks. Learn from interesting, real-life use cases for Big Data search along with sample code. Work with the Distributed Enterprise Search without prior knowledge of Hadoop and Solr. In Detail: As data grows exponentially day-by-day, extracting information becomes a tedious activity in itself. Technologies like Hadoop are trying to address some of the...

17. Hadoop Operations and Cluster Management Cookbook

By: Shumin Guo;

Publisher: Packt Publishing

Publication Date: 24-JUL-2013

Insert Date: 30-JUL-2013

Slots: 1.0

Over 60 recipes showing you how to design, configure, manage, monitor, and tune a Hadoop cluster. Hands-on recipes to configure a Hadoop cluster from bare metal hardware nodes. Practical and in-depth explanation of cluster management commands. Easy-to-understand recipes for securing and monitoring a Hadoop cluster, and design considerations. Recipes showing you how to tune the performance of a Hadoop cluster. Learn how to build a Hadoop cluster in the cloud. In Detail: We are facing an avalanche of data. The unstructured data we gather can contain many insights that could...

18. Apache Flume: Distributed Log Collection for Hadoop

By: Steve Hoffman;

Publisher: Packt Publishing

Publication Date: 16-JUL-2013

Insert Date: 17-JUL-2013

Slots: 1.0

Stream data to Hadoop using Apache Flume. Integrate Flume with your data sources. Transcode your data en-route in Flume. Route and separate your data using regular expression matching. Configure failover paths and load-balancing to remove single points of failure. Utilize Gzip Compression for files written to HDFS. In Detail: Apache Flume is a distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. Its main goal is to deliver data from applications to Apache Hadoop's HDFS. It has a simple and flexible...

19. Enterprise Data Workflows with Cascading

By: Paco Nathan

Publisher: O'Reilly Media, Inc.

Publication Date: 25-JUL-2013

Insert Date: 17-JUL-2013

Slots: 1.0

Despite its growing use in the enterprise, building applications for Hadoop is notoriously difficult. But there is a solution. This hands-on book introduces you to Cascading, the framework that enables you to build powerful data processing applications on Hadoop without having to spend months learning the intricacies of MapReduce....

20. Apache Hadoop™ YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop™ 2

By: Arun C. Murthy; Vinod Kumar Vavilapalli; Doug Eadline; Joseph Niemiec; Jeff Markham

Publisher: Addison-Wesley Professional

Publication Date: 24-MAR-2014

Insert Date: 02-JUL-2013

Slots: 1.0

“This book is a critically needed resource for the newly released Apache Hadoop 2.0, highlighting YARN as the significant breakthrough that broadens Hadoop beyond the MapReduce paradigm.” (From the Foreword by Raymie Stata, CEO of Altiscale.) The Insider’s Guide to Building Distributed, Big Data Applications with Apache Hadoop™ YARN. Apache Hadoop is helping drive the Big Data revolution. Now, its data processing has been completely overhauled: Apache Hadoop YARN provides resource management at data center scale and easier ways to create distributed applications that process petabytes of...