Free Trial

Safari Books Online is a digital library providing on-demand subscription access to thousands of learning resources.


  • Create BookmarkCreate Bookmark
  • Create Note or TagCreate Note or Tag
  • DownloadDownload
  • PrintPrint
Share this Page URL
Help

A Lean, Mean Data-Collecting Machine > Elementary Set Operations

Elementary Set Operations

The most common set operations you’ll likely encounter are the union, intersection, and difference operations. Recall that the difference between a set and a list is that a set is unordered and contains only unique members, while a list is ordered and may contain duplicate members. As of Version 2.6, Python provides built-in support for sets via the set data structure. Table 4-1 illustrates some examples of common set operations for a trivially small universe of discourse involving friends and followers:

Friends = {Abe, Bob}, Followers = {Bob, Carol}

Table 4-1. Sample set operations for Friends and Followers

OperationResultComment
Friends ? FollowersAbe, Bob, CarolSomeone’s overall network
Friends n FollowersBobSomeone’s mutual friends
Friends – FollowersAbePeople a person is following, but who are not following that person back
Followers – FriendsCarolPeople who are following someone but are not being followed back

As previously mentioned, Redis provides native operations for computing common set operations. A few of the most relevant ones for the upcoming work at hand include:

smembers

Returns all of the members of a set

scard

Returns the cardinality of a set (the number of members in the set)

sinter

Computes the intersection for a list of sets

sdiff

Computes the difference for a list of sets

mget

Returns a list of string values for a list of keys

mset

Stores a list of string values against a list of keys

sadd

Adds an item to a set (and creates the set if it doesn’t already exist)

keys

Returns a list of keys matching a regex-like pattern

Skimming the pydoc for Python’s built-in set data type should convince you of the close mapping between it and the Redis APIs.

We’re Gonna Analyze Like It’s 1874

Although the concepts involved in set theory are as old as time itself, it is Georg Cantor who is generally credited with inventing set theory. His paper, “On a Characteristic Property of All Real Algebraic Numbers,” written in 1874, formalized set theory as part of his work on answering questions related to the concept of infinity. For example:

  • Are there more natural numbers (zero and the positive integers) than integers (positive and negative numbers)?

  • Are there more rational numbers (numbers that can be expressed as fractions) than integers?

  • Are there more irrational numbers (numbers that cannot be expressed as fractions, such as pi, v2, etc.) than rational numbers?

The gist of Cantor’s work around infinity as it relates to the first two questions is that the cardinalities of the sets of natural numbers, integers, and rational numbers are all equal because you can map these numbers such that they form a sequence with a definite starting point that extends forever in one direction. Even though there is never an ending point, the cardinalities of these sets are said to be countably infinite because there is a definite sequence that could be followed deterministically if you simply had enough time to count them. The cardinality of a countably infinite set became known by mathematicians as ?0, an official definition of infinity. Consider the following numeric patterns that convey the basic idea behind countably infinite sets. Each pattern shows a starting point that can extend infinitely:

  • Natural numbers: 0, 1, 2, 3, 4, …

  • Positive integers: 1, 2, 3, 4, 5, …

  • Negative integers: ?1, ?2, ?3, ?4, ?5, …

  • Integers: 0, 1, ?1, 2, ?2, 3, ?3, 4, ?4, …

  • Rational numbers: 0/0, 0/1, ?1/1, ?1/0, ?1/-1, 0/?1, 1/?1, 1/0, 1/1, …

Notice that the pattern for the rational numbers is that you can start at the origin of the Cartesian plane and build out a spiral in which each x/y coordinate pair is expressed as a fraction, which is a rational number. (The two cases where it is undefined because of division by zero are of no consequence to the cardinality of the set as a whole.)

As it runs out, however, the cardinality of the set of irrational numbers is not equal to ?0, because it is impossible to arrange them in such a way that they are countable and form a one-to-one correspondence back to a set having cardinality ?0. Cantor used what became known as the famous diagonalization argument as the proof. The gist of the diagonalization proof is that just when you think you’ve mapped out a sequence that makes the irrational numbers countable, it can be shown that a whole slew of numbers are missing—and when you put these missing numbers into the sequence, there are still a slew of numbers missing. As it turns out, you can never form the one-to-one correspondence that’s necessary. Thus, the cardinality of the set of irrational numbers is not the same as the cardinalities of the sets of natural numbers, positive integers, and irrational numbers, because a one-to-one correspondence from one of these sets cannot be derived.

So what is the cardinality of the set of irrational numbers? It can be shown that the power set of the set having cardinality ?0 is the cardinality of the set of all irrational numbers. This value is known as ?1. Further, the power set of the set having cardinality ?1 is known as ?2, etc. Computing the power set of a set of infinite numbers is admittedly a difficult concept to wrap one’s head around, but it’s one well worth pondering when you’re having trouble sleeping at night.