

Predicting Web Traffic

Now that we’ve prepared you to work with regressions, this chapter’s case study will focus on using regression to predict the number of page views for the top 1,000 websites on the Internet as of 2011. The first five rows of this data set, which was provided to us by Neil Kodner, are shown in Table 5-3.

For our purposes, we’re going to work with only a subset of the columns of this data set. We’ll focus on five columns: Rank, PageViews, UniqueVisitors, HasAdvertising, and IsEnglish.
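As a rough illustration of narrowing a table to these five columns, here is a minimal Python sketch using only the standard library. It is not the book's own code. The inline CSV text is a hypothetical stand-in for the real file: every value other than the two ranks is a placeholder, and the `Site` and `Extra` column names are assumptions made only so that there is something to discard.

```python
import csv
import io

# Hypothetical stand-in for the data file. Only the ranks of the two sites
# come from the text; all other values are placeholders, not real figures,
# and the "Site" and "Extra" columns are assumed for illustration.
RAW = """Rank,Site,PageViews,UniqueVisitors,HasAdvertising,IsEnglish,Extra
1,facebook.com,1000,100,Yes,Yes,x
2,youtube.com,900,90,Yes,Yes,y
"""

# The five columns the chapter says it will focus on.
KEEP = ["Rank", "PageViews", "UniqueVisitors", "HasAdvertising", "IsEnglish"]


def select_columns(text, keep):
    """Parse CSV text and return one dict per row holding only `keep` columns."""
    reader = csv.DictReader(io.StringIO(text))
    return [{name: row[name] for name in keep} for row in reader]


subset = select_columns(RAW, KEEP)
for row in subset:
    print(row)
```

Note that `csv.DictReader` returns every field as a string, so the numeric columns would still need to be converted before any regression is fit.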

The Rank column tells us the website’s position in the top 1,000 list. As you can see, Facebook is the number one site in this data set, and YouTube is the second. Rank is an interesting sort of measurement because it’s an ordinal value: the numbers are used not for their magnitudes, but simply for their order. One way to see that the magnitudes don’t matter is to notice that there’s no real answer to questions like, “What’s the 1.578th website in this list?” Such a question would have an answer if the numbers were cardinal values. Another way to emphasize this distinction is to note that we could replace the ranks 1, 2, 3, and 4 with A, B, C, and D and not lose any information.
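The relabeling argument can be checked with a small Python sketch. It is an illustration, not code from the book, and the variable names are hypothetical. It maps the ranks 1 through 4 to the letters A through D and confirms that sorting the two sites named above gives the same order under either labeling.

```python
import string

# Ranks 1..4 and their letter replacements A..D.
ranks = [1, 2, 3, 4]
labels = {rank: string.ascii_uppercase[rank - 1] for rank in ranks}

# The two sites whose ranks are stated in the text, deliberately unsorted.
sites_by_rank = {2: "YouTube", 1: "Facebook"}

# Order the sites using the numeric ranks.
order_by_rank = [site for _, site in sorted(sites_by_rank.items())]

# Order the same sites using the letter labels instead.
order_by_label = [
    site
    for _, site in sorted((labels[rank], site) for rank, site in sites_by_rank.items())
]

# The ordering survives the relabeling, so nothing ordinal was lost.
print(order_by_rank == order_by_label)
```

What the letters cannot support is arithmetic, and that is the point: an average of A and B is as meaningless as a 1.578th website.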


  
