Word Vectors

The normal way of preparing data for clustering is to determine a common set of numerical attributes that can be used to compare the items. This is very similar to what was shown in Chapter 2, when critics’ rankings were compared over a common set of movies, and when the presence or absence of a bookmark was translated to a 1 or a 0 for del.icio.us users.
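As a rough illustration of this idea, the sketch below turns the presence or absence of an attribute into a 1 or a 0 over a common attribute set. The function name `to_binary_vectors`, the user names, and the example URLs are hypothetical and are not taken from the book.

```python
def to_binary_vectors(items):
    """Map each item to a 0/1 vector over the attributes seen across all items.

    items: dict mapping an item name to the set of attributes it has.
    """
    # The common attribute set is the union over all items,
    # sorted so that every vector uses the same column order.
    attributes = sorted(set().union(*items.values()))
    vectors = {
        name: [1 if attr in present else 0 for attr in attributes]
        for name, present in items.items()
    }
    return attributes, vectors


# Hypothetical data: which links each user has bookmarked.
users = {
    "alice": {"http://a.example", "http://b.example"},
    "bob": {"http://b.example", "http://c.example"},
}

attributes, vectors = to_binary_vectors(users)
for name, vector in vectors.items():
    print(name, vector)
```

Because every vector shares the same columns, any two items can be compared position by position.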

Pigeonholing the Bloggers

This chapter will work through a couple of example datasets. In the first dataset, the items that will be clustered are a set of 120 of the top blogs, and the data they’ll be clustered on is the number of times a particular set of words appears in each blog’s feed. A small subset of what this looks like is shown in Table 3-1.
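To make the shape of this data concrete, here is a minimal sketch that counts how often each word in a fixed vocabulary appears in a blog's text. The helper names `count_words` and `build_matrix`, the sample texts, and the simple letters-only tokenizer are assumptions made for illustration; the text is assumed to have already been extracted from each feed.

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter of lowercase words found in the text."""
    # Simplifying assumption: a word is any run of ASCII letters.
    return Counter(re.findall(r"[a-z]+", text.lower()))


def build_matrix(feeds, vocabulary):
    """Build one row of word counts per blog, with one column per vocabulary word."""
    matrix = {}
    for blog, text in feeds.items():
        counts = count_words(text)
        # A Counter returns 0 for words that never appear.
        matrix[blog] = [counts[word] for word in vocabulary]
    return matrix


# Hypothetical feed text, standing in for real blog feeds.
feeds = {
    "Blog A": "Python code and more Python",
    "Blog B": "Music, music, and film",
}
vocabulary = ["python", "music", "film"]

matrix = build_matrix(feeds, vocabulary)
for blog, row in matrix.items():
    print(blog, row)
```

Each row is a word vector for one blog, so blogs that use similar words with similar frequency end up with similar rows.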


  
