Safari Books Online is a digital library providing on-demand subscription access to thousands of learning resources.
Cross-validation is the name given to a set of techniques that divide up data into training sets and test sets. The training set is given to the algorithm, along with the correct answers (in this case, prices), and becomes the set used to make predictions. The algorithm is then asked to make predictions for each item in the test set. The answers it gives are compared to the correct answers, and an overall score for how well the algorithm did is calculated.
Usually this procedure is performed several times, dividing the
data up differently each time. Typically, the test set will be a small
portion, perhaps 5 percent of the all the data, with the remaining 95
percent making up the training set. To start, create a function called
dividedata in numpredict.py, which divides up the dataset
into two smaller sets given a ratio that you specify: