Using the del.icio.us API from Chapter 2, create a dataset of bookmarks suitable for clustering. Run hierarchical and K-means clustering on it.
Modify the blog parsing code to cluster individual entries instead of entire blogs. Do entries from the same blog cluster together? What about entries from the same date?
Try using actual (Euclidean) distance for blog clustering. How does this change the results?
Find out what Manhattan distance is. Create a function for it and see how it changes the results for the Zebo dataset.
Modify the K-means clustering function to return, along with the cluster results, the total distance between all the items and their respective centroids.
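As a starting point for the last two exercises, here is a minimal sketch of a Manhattan distance function and a K-means routine that also returns the total distance between items and their centroids. It is not the chapter's own clustering code: the names `manhattan`, `euclidean`, and `kcluster_with_total`, the `seed` and `max_iter` parameters, and the choice to start from randomly sampled rows are all assumptions made for illustration.

```python
import math
import random


def euclidean(v1, v2):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(v1, v2)))


def manhattan(v1, v2):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(a - b) for a, b in zip(v1, v2))


def kcluster_with_total(rows, distance=euclidean, k=2, max_iter=100, seed=0):
    """Hypothetical K-means variant returning (clusters, total_distance).

    clusters is a list of k lists of row indices; total_distance is the
    sum of distances from every row to the centroid of its cluster.
    """
    rng = random.Random(seed)
    # Assumption: start from k distinct rows picked at random.
    centroids = [list(row) for row in rng.sample(rows, k)]
    assignments = None

    for _ in range(max_iter):
        # Assign each row to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: distance(centroids[c], row))
            for row in rows
        ]
        if new_assignments == assignments:
            break  # nothing moved, so the centroids are already current
        assignments = new_assignments

        # Move each centroid to the mean of its members.
        for c in range(k):
            members = [rows[i] for i, a in enumerate(assignments) if a == c]
            if members:  # leave an empty cluster's centroid where it is
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]

    clusters = [[i for i, a in enumerate(assignments) if a == c] for c in range(k)]
    total = sum(distance(centroids[a], rows[i]) for i, a in enumerate(assignments))
    return clusters, total


if __name__ == "__main__":
    data = [[0, 0], [0, 1], [10, 10], [10, 11]]
    for metric in (euclidean, manhattan):
        clusters, total = kcluster_with_total(data, distance=metric, k=2)
        print(metric.__name__, clusters, total)
```

The total distance gives a rough way to compare runs: for a fixed k and distance function, a lower total means the items sit closer to their centroids. Note that this sketch still averages members to place each centroid even when `manhattan` is passed in, which is a simplification rather than a true k-medians method.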