Safari Books Online is a digital library providing on-demand subscription access to thousands of learning resources.
It’s not all about infrastructure. Big data practitioners consistently report that 80% of the effort involved in dealing with data is cleaning it up in the first place, as Pete Warden observes in his Big Data Glossary: “I probably spend more time turning messy source data into something usable than I do on the rest of the data analysis process combined.”
Because of the high cost of data acquisition and cleaning, it’s worth considering what you actually need to source yourself. Data marketplaces are a means of obtaining common data, and you are often able to contribute improvements back. Quality can of course be variable, but will increasingly be a benchmark on which data marketplaces compete.