Entropy is another way to see how mixed a set is. It comes from information theory and measures the amount of disorder in a set. Loosely defined, entropy is how surprising, on average, a randomly selected item from the set is. If the entire set were As, you would never be surprised to see an A, so the entropy would be 0. The formula is shown in Figure B-7.
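Figure B-7 is not reproduced here, so the standard (Shannon) entropy formula is written out below. Here p(i) stands for the fraction of the set made up of item i; that symbol is our own labeling, not necessarily the figure's. As a quick check, a set of two As and two Bs gives p = 1/2 for each item, so its entropy is exactly 1.

```latex
H = -\sum_{i} p(i) \log_2 p(i)

% Worked example: two As and two Bs, so p(A) = p(B) = 1/2
H = -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{2}\log_2\tfrac{1}{2}\right) = -\left(-\tfrac{1}{2} - \tfrac{1}{2}\right) = 1
```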
This function takes a list of items and calculates the entropy:
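def entropy(l):
    from math import log
    log2 = lambda x: log(x) / log(2)
    total = len(l)
    # Count how many times each distinct item appears
    counts = {}
    for item in l:
        counts.setdefault(item, 0)
        counts[item] += 1
    # Sum -p * log2(p) over the relative frequency p of each item
    ent = 0.0
    for i in counts:
        p = float(counts[i]) / total
        ent -= p * log2(p)
    return ent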
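One way to get a feel for the function is to run it on a few small lists. The sketch below re-declares it so it runs standalone. It swaps in `collections.Counter` for the hand-built dictionary and `math.log2` for the lambda; both are standard-library equivalents that should give the same result.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Entropy, in bits, of the item frequencies in `items`."""
    total = len(items)
    ent = 0.0
    for count in Counter(items).values():
        p = count / total       # relative frequency of this item
        ent -= p * log2(p)      # accumulate -p * log2(p)
    return ent


print(entropy(['A', 'A', 'A', 'A']))            # 0.0  (a pure set)
print(entropy(['A', 'A', 'B', 'B']))            # 1.0  (an even two-way mix)
print(entropy(['A', 'B', 'C', 'D']))            # 2.0  (four equally likely items)
print(round(entropy(['A', 'A', 'A', 'B']), 3))  # 0.811
```

Adding more distinct, equally common items raises the entropy, while a set dominated by one item stays close to 0.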
In Chapter 7, entropy is used in decision tree modeling to determine whether dividing a set reduces the amount of disorder.
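A minimal sketch of that idea, assuming the common "information gain" formulation: compare the parent set's entropy with the size-weighted average entropy of the two subsets produced by a split. The helper name `split_gain` is hypothetical, and this is not necessarily the exact code Chapter 7 uses.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Entropy, in bits, of the item frequencies in `items`."""
    total = len(items)
    ent = 0.0
    for count in Counter(items).values():
        p = count / total
        ent -= p * log2(p)
    return ent


def split_gain(parent, left, right):
    """Hypothetical helper: how much splitting `parent` into
    `left` and `right` reduces entropy (weighted by subset size)."""
    n = len(parent)
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(parent) - weighted


parent = ['A', 'A', 'B', 'B']
print(split_gain(parent, ['A', 'A'], ['B', 'B']))  # 1.0  (both subsets are pure)
print(split_gain(parent, ['A', 'B'], ['A', 'B']))  # 0.0  (no reduction in disorder)
```

A split that separates the items cleanly removes all of the disorder, while a split that leaves each subset as mixed as the parent gains nothing.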