This is one of the simplest classifiers to construct, but it’s a good basis for further work. It works by finding the average of all the data in each class and constructing a point that represents the center of the class. It can then classify new points by determining to which center point they are closest.
To do this, you’ll first need a function that calculates the average point in the classes. In this case, the classes are just 0 and 1. Add lineartrain to advancedclassify.py:
def lineartrain(rows):
  averages={}
  counts={}

  for row in rows:
    # Get the class of this point
    cl=row.match

    averages.setdefault(cl,[0.0]*(len(row.data)))
    counts.setdefault(cl,0)

    # Add this point to the averages
    for i in range(len(row.data)):
      averages[cl][i]+=float(row.data[i])

    # Keep track of how many points in each class
    counts[cl]+=1

  # Divide sums by counts to get the averages
  for cl,avg in averages.items():
    for i in range(len(avg)):
      avg[i]/=counts[cl]

  return averages
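To see the idea end to end, here is a minimal sketch that trains on four made-up points and then assigns new points to the closest center. The Row type and the nearest_center helper are hypothetical stand-ins introduced only for this illustration: the sketch assumes nothing about the real row objects beyond their data and match attributes, and it assumes plain Euclidean distance as the measure of closeness. lineartrain is repeated so that the snippet runs on its own.

```python
from collections import namedtuple
import math

# Hypothetical stand-in for the row objects; only .data and .match are assumed.
Row = namedtuple("Row", ["data", "match"])


def lineartrain(rows):
    # Same logic as the listing above, repeated so this sketch is self-contained.
    averages = {}
    counts = {}
    for row in rows:
        cl = row.match
        averages.setdefault(cl, [0.0] * len(row.data))
        counts.setdefault(cl, 0)
        # Accumulate per-dimension sums for this class
        for i in range(len(row.data)):
            averages[cl][i] += float(row.data[i])
        counts[cl] += 1
    # Turn the sums into averages
    for cl, avg in averages.items():
        for i in range(len(avg)):
            avg[i] /= counts[cl]
    return averages


def nearest_center(point, averages):
    # Hypothetical helper: return the class whose center is closest to point,
    # using Euclidean distance (an assumption made for this sketch).
    def dist(center):
        return math.sqrt(sum((p - c) ** 2 for p, c in zip(point, center)))

    return min(averages, key=lambda cl: dist(averages[cl]))


# Made-up sample data: two points in class 0, two in class 1.
rows = [
    Row([1.0, 2.0], 0),
    Row([3.0, 4.0], 0),
    Row([10.0, 10.0], 1),
    Row([12.0, 14.0], 1),
]

averages = lineartrain(rows)
print(averages)                                # {0: [2.0, 3.0], 1: [11.0, 12.0]}
print(nearest_center([2.0, 2.0], averages))    # 0
print(nearest_center([9.0, 11.0], averages))   # 1
```

Each class ends up represented by a single point, so classifying a new item only requires one distance calculation per class, no matter how many rows were used for training.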