Generating Data

Pattern Description

The generating data pattern is interesting because, instead of loading data from an outside source, it generates the data on the fly and in parallel.

Intent

You want to generate a lot of data from scratch.

Motivation

This pattern is different from all of the others in the book in that it doesn’t load data. With this pattern, you generate the data and write it to the distributed file system.
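To illustrate the shape of such a job, here is a minimal Python sketch. It is not Hadoop code. Each call to the hypothetical `generate_split` plays the role of one map task. That task reads no input, seeds its own random number generator, and writes its own `part-m-NNNNN` file into a local directory that stands in for the distributed file system.

The function names, record layout, and seeding scheme are all assumptions made for illustration. Threads only stand in for map tasks here; on a cluster, each task would run in its own container.

```python
import os
import random
import tempfile
from concurrent.futures import ThreadPoolExecutor


def generate_split(task_id, num_records, out_dir, base_seed=42):
    """Hypothetical stand-in for one map task: it reads no input, generates
    num_records records from scratch, and writes them to its own part file."""
    # Each task gets its own seed, so tasks never need to coordinate, yet the
    # whole data set can be regenerated exactly from base_seed.
    rng = random.Random(base_seed + task_id)
    path = os.path.join(out_dir, f"part-m-{task_id:05d}")
    with open(path, "w") as f:
        for i in range(num_records):
            # Record layout (task id, sequence number, random value) is illustrative.
            f.write(f"{task_id}\t{i}\t{rng.randint(0, 1_000_000)}\n")
    return path


def generate_data(num_tasks, records_per_task, out_dir, base_seed=42):
    """Fan generation out over num_tasks parallel workers, like a map-only job."""
    os.makedirs(out_dir, exist_ok=True)
    with ThreadPoolExecutor(max_workers=num_tasks) as pool:
        futures = [
            pool.submit(generate_split, t, records_per_task, out_dir, base_seed)
            for t in range(num_tasks)
        ]
        # Return the part-file paths in task order.
        return sorted(f.result() for f in futures)


if __name__ == "__main__":
    out = tempfile.mkdtemp()
    for p in generate_data(num_tasks=4, records_per_task=1000, out_dir=out):
        print(p)
```

Deriving every task's seed from one base seed keeps the tasks fully independent, which is what lets the work scale out. It also keeps the generated data set reproducible.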

Generating data isn’t common. Typically you generate a large batch of data once and then reuse it over and over. However, when you do need to generate data, MapReduce is an excellent system for doing it.

The most common use case for this pattern is generating random data. A representative synthetic data set can be useful for large-scale testing when the real data set is still too small. It can also be useful for building “toy domains” to research a proof of concept for an analytic at scale.
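To make randomly generated data representative, field values can be drawn from distributions that resemble the real data. The Python sketch below rests on several illustrative assumptions:

- a hypothetical comment record with a user id, a timestamp, and text;
- a made-up word list;
- a Zipf-like skew in which user r is chosen with probability proportional to 1/r.

Replace each of these with whatever the real data actually looks like.

```python
import itertools
import random

# Hypothetical vocabulary; a real generator might load a dictionary file instead.
WORDS = ["hadoop", "data", "pattern", "map", "reduce", "random", "test", "cluster"]


def generate_comments(seed, count, num_users=1000, max_words=12):
    """Deterministically generate `count` synthetic comment records from `seed`."""
    rng = random.Random(seed)
    users = list(range(1, num_users + 1))
    # Cumulative weights for P(user r) proportional to 1/r: an assumed skew
    # in which a few heavy users produce most of the records.
    cum_weights = list(itertools.accumulate(1 / r for r in users))
    records = []
    for _ in range(count):
        user_id = rng.choices(users, cum_weights=cum_weights)[0]
        # Arbitrary range of epoch seconds, chosen only for illustration.
        timestamp = rng.randint(1_300_000_000, 1_400_000_000)
        n_words = rng.randint(1, max_words)
        text = " ".join(rng.choice(WORDS) for _ in range(n_words))
        records.append({"user_id": user_id, "timestamp": timestamp, "text": text})
    return records
```

In a generating job, each map task would call `generate_comments` with a different seed. That way the tasks produce different records without coordinating.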