
Exercises

  1. Varying news sources. The example in this chapter used mostly pure news sources. Try adding some top political blogs. (http://technorati.com is a good place to find blogs.) How does this affect the results? Are there features that apply strongly to political commentary? Are news stories with related commentary grouped easily?

  2. K-means clustering. Hierarchical clustering was used on the articles matrix, but what happens if you use K-means clustering? How many clusters do you need to get good separation of different stories? How does this compare to the number of features you need to use to extract all the themes?

  3. Optimizing for factorization. Can you use the optimization code that you built in Chapter 5 to factorize the matrix? Is this a lot faster or slower? How do the results compare?
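
As a possible starting point for Exercise 3, the factorization can be framed as an optimization problem by flattening the weights and features matrices into a single list of numbers and scoring that list by the squared reconstruction error. The sketch below is only one way to set this up. The names `make_cost` and `hill_climb` are hypothetical, and `hill_climb` is a simple stand-in written for this illustration, not the Chapter 5 code; your own optimizers may expect a different solution format.

```python
import random


def make_cost(matrix, k):
    """Build a cost function for factorizing `matrix` into k features.

    A candidate solution is a flat list: the first rows*k values are the
    weights matrix (rows x k), the remaining k*cols values are the
    features matrix (k x cols). This layout is an assumption of the sketch.
    """
    rows, cols = len(matrix), len(matrix[0])

    def cost(solution):
        # Unpack the flat list into the two factor matrices.
        w = [solution[i * k:(i + 1) * k] for i in range(rows)]
        offset = rows * k
        h = [solution[offset + f * cols:offset + (f + 1) * cols]
             for f in range(k)]
        # Sum of squared differences between the matrix and w * h.
        total = 0.0
        for i in range(rows):
            for j in range(cols):
                approx = sum(w[i][f] * h[f][j] for f in range(k))
                total += (matrix[i][j] - approx) ** 2
        return total

    return cost


def hill_climb(cost, size, steps=2000, step_size=0.1, seed=0):
    """Hypothetical stand-in optimizer: nudge one value at a time and
    keep the change only if the cost goes down."""
    rng = random.Random(seed)
    best = [rng.random() for _ in range(size)]
    best_cost = cost(best)
    for _ in range(steps):
        i = rng.randrange(size)
        candidate = best[:]
        # Clamp at zero so the factors stay non-negative.
        candidate[i] = max(0.0, candidate[i] + rng.uniform(-step_size, step_size))
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return best, best_cost


if __name__ == "__main__":
    data = [[1.0, 2.0], [2.0, 4.0]]  # toy matrix, exactly rank 1
    cost = make_cost(data, k=1)
    solution, error = hill_climb(cost, size=2 * 1 + 1 * 2)
    print(solution, error)
```

Each cost evaluation here rebuilds the full product, so timing it against the factorization code from this chapter on the same matrix is one way to answer the "faster or slower" part of the exercise.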


  
