As noted in Chapter 12, for many types of analysis, using a system like Hive to handle relational operations can dramatically ease the development of the analytic pipeline. Especially for data originally from a relational data source, using Hive makes a lot of sense. Hive and Sqoop together form a powerful toolchain for performing analysis.
Suppose we had another log of data in our system, coming from a web-based widget purchasing system. Each record in this log contains a widget ID, a quantity, a shipping address, and an order date.
Here is a snippet from an example log of this type:
1,15,120 Any St.,Los Angeles,CA,90210,2010-08-01
3,4,120 Any St.,Los Angeles,CA,90210,2010-08-01
2,5,400 Some Pl.,Cupertino,CA,95014,2010-07-30
2,7,88 Mile Rd.,Manhattan,NY,10005,2010-07-18
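To make the record layout concrete, here is a minimal Python sketch that splits each comma-separated record into seven fields, with the shipping address taking up four of them (street, city, state, and ZIP code). The `Sale` type, its field names, and the `parse_sale` helper are assumptions introduced for illustration; they are not part of the purchasing system or of Hive.

```python
import csv
from datetime import date
from typing import NamedTuple


class Sale(NamedTuple):
    """One purchase record. Field names are illustrative assumptions."""
    widget_id: int
    quantity: int
    street: str
    city: str
    state: str
    zip_code: str      # kept as text so leading zeros would survive
    order_date: date


def parse_sale(line: str) -> Sale:
    """Parse one comma-separated log line into a Sale."""
    fields = next(csv.reader([line]))
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}: {line!r}")
    widget_id, quantity, street, city, state, zip_code, order_date = fields
    return Sale(
        int(widget_id),
        int(quantity),
        street,
        city,
        state,
        zip_code,
        date.fromisoformat(order_date),
    )


# The four sample records from the log snippet.
SAMPLE_LOG = """\
1,15,120 Any St.,Los Angeles,CA,90210,2010-08-01
3,4,120 Any St.,Los Angeles,CA,90210,2010-08-01
2,5,400 Some Pl.,Cupertino,CA,95014,2010-07-30
2,7,88 Mile Rd.,Manhattan,NY,10005,2010-07-18
"""

sales = [parse_sale(line) for line in SAMPLE_LOG.splitlines() if line.strip()]

for sale in sales:
    print(sale.widget_id, sale.quantity, sale.zip_code, sale.order_date)
```

Because no field in the sample contains an embedded comma, a plain comma delimiter is enough here. A log with quoted or comma-bearing addresses would need a more careful format definition.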