CHAPTER 11. COLLABORATIVE FILTERING

11.3 Experiments

Breese et al. [1998] compared the predictive accuracy of the models we've discussed using three data sets. After describing the data sets, we explain their experimental method and results.

11.3.1 The Data Sets

The following three data sets were used in the study:

1. MSWeb: This data set contains users' visits to various areas of the Microsoft corporate web site. The voting here is implicit. That is, the vote is 1 if the area is visited and 0 if it is not visited.
2. Nielsen: This data set contains Nielsen network television viewing data for users during a two-week period in the summer of 1996. Again the vote is implicit, being 1 if the show is watched and 0 otherwise.
3. EachMovie: This data set contains explicit votes obtained from the EachMovie collaborative filtering site maintained by Digital Equipment Research Center. The votes are from the period 1995 through 1997. Votes range in value from 0 to 5.

The following table provides details about these data sets:

Data Set     Users   Items   Mean votes per user
MSWeb        3453    294     3.95
Nielsen      1463    203     9.55
EachMovie    4119    1623    46.4

11.3.2 Method

Breese et al. [1998] divided each data set into a training set and a test set. The training set was used as the collaborative filtering data set for the memory-based algorithms and as the learning set for the probabilistic models. After the models were learned, each user in the test set was treated as an active user. The user's votes were divided into a set $V_a$ of votes treated as observed and a set $P_a$ of votes to be predicted. The algorithms were then used to predict the votes in $P_a$ from the votes in $V_a$. The following protocols were used to choose the votes in $P_a$:

1. All but 1: A single vote was predicted (in $P_a$), and all of the active user's other votes were treated as observed (in $V_a$).
2. Given 2: Two votes were treated as observed (in $V_a$), and all other votes were predicted (in $P_a$).
3. Given 5: Five votes were treated as observed (in $V_a$), and all other votes were predicted (in $P_a$).
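The three protocols differ only in how an active user's votes are partitioned into the observed set $V_a$ and the to-be-predicted set $P_a$. The Python sketch below shows that partitioning for a single test user. Several details are assumptions made for illustration and are not taken from Breese et al. [1998]. These include the function name `split_votes`, the protocol strings `"all_but_1"` and `"given_<n>"`, and the choice of which votes to hold out uniformly at random. Another assumption is that the sketch skips any user who would be left with nothing observed or nothing to predict.

```python
import random
from typing import Dict, Optional, Tuple

# Hypothetical representation: item id -> vote (0/1 for implicit data
# such as MSWeb and Nielsen, 0..5 for EachMovie).
Votes = Dict[str, int]


def split_votes(votes: Votes, protocol: str,
                rng: random.Random) -> Optional[Tuple[Votes, Votes]]:
    """Split one active user's votes into (V_a, P_a).

    protocol is "all_but_1" or "given_<n>", e.g. "given_2" or "given_5".
    Returns None when the user has too few votes for the protocol
    (a convention assumed by this sketch, not by the original study).
    """
    items = list(votes)
    rng.shuffle(items)  # assumption: held-out votes are chosen at random

    if protocol == "all_but_1":
        # Predict one vote; observe all of the user's other votes.
        if len(items) < 2:
            return None
        predicted_items, observed_items = items[:1], items[1:]
    elif protocol.startswith("given_"):
        # Observe n votes; predict all of the user's remaining votes.
        n = int(protocol[len("given_"):])
        if len(items) <= n:
            return None
        observed_items, predicted_items = items[:n], items[n:]
    else:
        raise ValueError(f"unknown protocol: {protocol}")

    observed = {item: votes[item] for item in observed_items}    # V_a
    predicted = {item: votes[item] for item in predicted_items}  # P_a
    return observed, predicted
```

For example, `split_votes({"a": 5, "b": 3, "c": 0, "d": 4, "e": 2, "f": 1}, "given_2", random.Random(0))` returns two observed votes and four votes to predict. A collaborative filtering model would then be scored on how well it predicts the second dictionary from the first. Passing a seeded `random.Random` keeps each split reproducible across the algorithms being compared.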