The classifiers discussed in this chapter learn how to classify a document by being trained. Like many of the other algorithms in this book, such as the neural network you saw in Chapter 4, they learn by reading examples of correct answers. The more examples of documents and their correct classifications a classifier sees, the better its predictions become. Each classifier is also specifically designed to start off very uncertain and to grow more certain as it learns which features are important for making a distinction.
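To make "start off very uncertain and increase in certainty" concrete, here is a minimal sketch of one common way to get that behavior: blend a neutral assumed value with the fraction actually observed in training, giving the observations more say as they accumulate. The function name `weighted_estimate`, the assumed value of 0.5, and the weight of 1.0 are illustrative assumptions, not names or values taken from the text above.

```python
def weighted_estimate(observed_fraction, n_observations, assumed=0.5, weight=1.0):
    """Blend a neutral starting guess with what training has shown so far.

    observed_fraction: share of training examples supporting the category (0.0 to 1.0)
    n_observations:    how many training examples that share is based on
    assumed:           hypothetical neutral starting value (an assumption of this sketch)
    weight:            how many observations the starting value counts for (also assumed)
    """
    return (weight * assumed + n_observations * observed_fraction) / (weight + n_observations)


# With no training data the estimate stays at the neutral starting value.
print(weighted_estimate(0.0, 0))   # 0.5

# One supporting example moves the estimate only part of the way.
print(weighted_estimate(1.0, 1))   # 0.75

# More supporting examples move the estimate closer to the observed fraction.
for n in (5, 10, 100):
    print(n, weighted_estimate(1.0, n))
```

The design choice here is that a single example cannot push the estimate to an extreme; only repeated evidence does, which matches the behavior the paragraph describes.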
The first thing you’ll need is a class to represent the classifier. This class will encapsulate what the classifier has learned so far. The advantage of structuring the module this way is that you can instantiate multiple classifiers for different users, groups, or queries, and train them differently to respond to a particular group’s needs. Create a class called classifier in docclass.py: