Unless you’ve been cryogenically frozen for the past couple of years, you’ve no doubt heard of Twitter—a microblogging service that can be used to broadcast short (maximum 140 characters) status updates. Whether you love it, hate it, or are indifferent, it’s undeniable that Twitter has reshaped the way people communicate on the Web. This chapter makes a modest attempt to introduce some rudimentary analytic functions that you can implement by taking advantage of the Twitter APIs to answer a number of interesting questions, such as:
How many friends/followers do I have?
Who am I following that is not following me back?
Who is following me that I am not following back?
Who are the friendliest and least friendly people in my network?
Who are my “mutual friends” (people I’m following that are also following me)?
Given all of my followers and all of their followers, what is my potential influence if I get retweeted?
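Most of these questions reduce to set arithmetic once you have two collections of user IDs in hand: the accounts you follow (your "friends") and the accounts that follow you. The following is a minimal sketch of that idea, not the chapter's utility itself. The IDs are made-up sample data, and the names `relationship_summary`, `potential_reach`, and `followers_of` are hypothetical helpers introduced only for illustration; fetching real IDs from the Twitter API is a separate step not shown here.

```python
# Made-up sample data: these user IDs are illustrative, not real accounts.
friend_ids = {1, 2, 3, 4, 5}    # accounts I follow
follower_ids = {4, 5, 6, 7}     # accounts that follow me


def relationship_summary(friends, followers):
    """Answer the basic friend/follower questions with set operations."""
    friends, followers = set(friends), set(followers)
    return {
        "num_friends": len(friends),
        "num_followers": len(followers),
        # People I follow who do not follow me back.
        "not_following_back": friends - followers,
        # People who follow me whom I do not follow back.
        "not_followed_back": followers - friends,
        # People I follow who also follow me.
        "mutual_friends": friends & followers,
    }


def potential_reach(followers, followers_of):
    """Count distinct accounts reached if every follower retweets.

    `followers_of` is a hypothetical mapping of follower ID to the set of
    that follower's own follower IDs. The result is an upper bound: it
    assumes every follower retweets and every recipient sees the tweet.
    """
    reach = set(followers)
    for user_id in followers:
        reach |= set(followers_of.get(user_id, ()))
    return len(reach)


summary = relationship_summary(friend_ids, follower_ids)

# Made-up second-degree data for the sample followers above.
followers_of = {4: {10, 11}, 5: {11, 12}, 6: {1}, 7: set()}
reach = potential_reach(follower_ids, followers_of)
```

Taking the union rather than summing follower counts matters here: accounts that follow several of your followers would otherwise be counted more than once.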
Twitter’s API is constantly evolving. It is highly recommended that you follow the Twitter API account, @TwitterAPI, and that you consult the official docs whenever the behavior you are seeing differs from what the text describes.
This chapter analyzes relationships among Twitterers, while the next chapter homes in on the actual content of tweets. The code we’ll develop for this chapter is relatively robust in that it takes into consideration common issues such as the infamous Twitter rate limits,[23] network I/O errors, and the management of potentially large volumes of data. The final result is a fairly powerful command-line utility that you should be able to adapt easily for your own custom uses (http://github.com/ptwobrussell/Mining-the-Social-Web/blob/master/python_code/TwitterSocialGraphUtility.py).
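To give a feel for what handling rate limits and network I/O errors can look like, here is a minimal sketch of a retry wrapper. It is an illustration under stated assumptions rather than the approach taken in the linked utility: `RateLimitError` is a hypothetical exception standing in for however your client library signals a rate-limit response, and `call_with_retries` and its parameters are likewise invented for this example.

```python
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a client library's rate-limit signal."""

    def __init__(self, retry_after):
        super().__init__("rate limited")
        # Seconds to wait before the next request is allowed.
        self.retry_after = retry_after


def call_with_retries(func, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on rate limits and network I/O errors.

    Rate limits wait for the interval the error reports. Network errors
    (OSError and its subclasses) wait with exponential backoff. The last
    failure is re-raised once max_attempts is exhausted.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except RateLimitError as error:
            if attempt == max_attempts:
                raise
            sleep(error.retry_after)
        except OSError:
            if attempt == max_attempts:
                raise
            # Backoff doubles each time: base_delay, 2x, 4x, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing `sleep` in as a parameter is a design choice that keeps the wrapper testable: a test can substitute a function that records the requested delays instead of actually waiting.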
Having the tools on hand to harvest and mine your own tweets is essential. However, be advised that initiatives to archive historical Twitter data in the U.S. Library of Congress may soon render the inconveniences and headaches associated with harvesting and API rate-limiting non-issues for many forms of analysis. Firms such as Infochimps are also emerging and providing a medium for acquiring various kinds of Twitter data (among other things). A query for Twitter data at Infochimps turns up everything from archives for #worldcup tweets to analyses of how smileys are used.