Calculating Similarity by Computing Common Friends and Followers

Another piece of low-hanging fruit is computing the friends and followers that two or more Twitterers have in common. Within a given universe, these folks might be interesting for a couple of reasons. One reason is that they’re the “common thread” connecting otherwise disparate networks, and you might interpret the amount of overlap as a type of similarity metric. For example, if two users both follow a large number of the same people, you might conclude that those two users have very similar interests. From there, you might analyze the tweets of the common friends to gain more insight into what those people have in common, if anything, or to draw other conclusions. It turns out that computing common friends and followers is just a set operation away: a set intersection.
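As a minimal sketch of that set operation, independent of Redis, the following uses plain Python sets. The screen names and ID values are made up for illustration, and ids_in_common is a hypothetical helper, not part of the book's code.

```python
# Hypothetical friend-ID sets keyed by screen name. In the book's
# workflow these IDs live in Redis sets; plain Python sets are used
# here only to illustrate the intersection itself.
friend_ids = {
    'alice': {1, 2, 3, 5, 8},
    'bob': {2, 3, 5, 7, 11},
    'carol': {3, 5, 8, 13},
}


def ids_in_common(ids_by_user, screen_names):
    """Return the IDs that appear in every named user's set."""
    sets = [ids_by_user[name] for name in screen_names]
    return set.intersection(*sets)


common = ids_in_common(friend_ids, ['alice', 'bob', 'carol'])
print(sorted(common))  # [3, 5]
```

The size of the resulting set is the raw count of common friends. Whether and how to normalize that count into a similarity score is a separate design choice.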

Example 4-7 illustrates the use of Redis’s sinterstore function, which computes a set intersection and stores the result under a destination key, and introduces locale.format (wrapped in the pp helper imported from twitter__util) for pretty-printing numbers so that the output is easier to read.

Example 4-7. Finding common friends/followers for multiple Twitterers, with output that’s easier on the eyes (friends_followers__friends_followers_in_common.py)

# -*- coding: utf-8 -*-

import sys
import redis

from twitter__util import getRedisIdByScreenName

# A pretty-print function for numbers
from twitter__util import pp

r = redis.Redis()

def friendsFollowersInCommon(screen_names):
    # Intersect the friend ID sets of all the screen names and store
    # the result in a temporary Redis set
    r.sinterstore('temp$friends_in_common',
                  [getRedisIdByScreenName(screen_name, 'friend_ids')
                      for screen_name in screen_names]
                 )

    # Do the same for the follower ID sets
    r.sinterstore('temp$followers_in_common',
                  [getRedisIdByScreenName(screen_name, 'follower_ids')
                      for screen_name in screen_names]
                 )

    # scard returns the number of members in each result set
    print 'Friends in common for %s: %s' % (', '.join(screen_names),
            pp(r.scard('temp$friends_in_common')))

    print 'Followers in common for %s: %s' % (', '.join(screen_names),
            pp(r.scard('temp$followers_in_common')))

    # Clean up scratch workspace
    r.delete('temp$friends_in_common')
    r.delete('temp$followers_in_common')

if __name__ == "__main__":
    if len(sys.argv) < 3:
        print >> sys.stderr, "Please supply at least two screen names."
        sys.exit(1)

    # Note:
    # The assumption is that the screen names you are 
    # supplying have already been added to Redis.
    # See friends_followers__get_friends__refactored.py

    friendsFollowersInCommon(sys.argv[1:])
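The pp helper is imported from twitter__util and its body is not shown in this section. A minimal stand-in, assuming all it needs to do is insert thousands separators, might look like the sketch below. Note that it uses Python's built-in format specifier rather than locale.format, so the separator is always a comma regardless of locale settings.

```python
def pp(n):
    """Format an integer with comma thousands separators.

    Stand-in for the pp helper from twitter__util (an assumption; the
    original helper is not shown in this section).
    """
    return format(n, ',d')


print(pp(1234567))  # 1,234,567
```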

Note that although the values in the working sets are ID values, you could easily use Redis’s srandmember function to sample friends and followers from those sets (before the cleanup step deletes them), and then use the getUserInfo function from Example 4-5 to resolve useful information such as screen names, most recent tweets, and locations.
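The sampling idea can be sketched without a live Redis server. Below, an in-memory set stands in for the contents of a working set such as temp$friends_in_common, and a small dictionary plays the role of a user-info resolver. Both are hypothetical placeholders with made-up values. Against a real server, the set-level sampling command is SRANDMEMBER, which redis-py exposes as srandmember.

```python
import random

# Stand-in for the members of a working set of common friend IDs
# (made-up values for illustration).
common_friend_ids = {101, 202, 303, 404, 505}

# Hypothetical lookup table standing in for a resolver that maps
# user IDs to screen names.
screen_names_by_id = {
    101: 'user_a',
    202: 'user_b',
    303: 'user_c',
    404: 'user_d',
    505: 'user_e',
}


def sample_ids(ids, k, seed=None):
    """Return up to k distinct IDs chosen at random from ids."""
    rng = random.Random(seed)
    pool = sorted(ids)  # random.sample needs a sequence, not a set
    return rng.sample(pool, min(k, len(pool)))


for user_id in sample_ids(common_friend_ids, 3):
    print('%s -> %s' % (user_id, screen_names_by_id.get(user_id, 'unknown')))
```

Passing a seed makes the sample repeatable, which can help when comparing runs. Leaving it as None gives a different sample each time.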