them inexpensively, possibly sorting them into categories, possibly drawing circles around faces, cars, or whatever interests you. It's an excellent way to classify a few thousand data points at a cost of a few cents each. Even a relatively large job only costs a few hundred dollars.

While I haven't stressed traditional statistics, building statistical models plays an important role in any data analysis. According to Mike Driscoll (@dataspora), statistics is the "grammar of data science." It is crucial to "making data speak coherently." We've all heard the joke that eating pickles causes death, because everyone who dies has eaten pickles. That joke doesn't work if you understand what correlation means. More to the point, it's easy to notice that one advertisement for R in a Nutshell generated 2 percent more conversions than another. But it takes statistics to know whether this difference is significant, or just a random fluctuation. Data science isn't just about the existence of data, or making guesses about what that data might mean; it's about testing hypotheses and making sure that the conclusions you're drawing from the data are valid. Statistics plays a role in everything from traditional business intelligence (BI) to understanding how Google's ad auctions work. Statistics has become a basic skill. It isn't superseded by newer techniques from machine learning and other disciplines; it complements them.

While there are many commercial statistical packages, the open source R language (and its comprehensive package library, CRAN) is an essential tool. Although R is an odd and quirky language, particularly to someone with a