Varying assumed probabilities. Change the classifier class so it supports different assumed probabilities for different features. Change the init method so that it will take another classifier and start with a better guess than 0.5 for the assumed probabilities.
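As a starting point, here is a minimal sketch of a weighted probability that looks up the assumed probability per feature instead of using a single constant. It assumes the weighted probability is a weighted average of the assumed probability and the observed one. The function name `weighted_prob`, its parameters, and the `assumed_probs` dictionary are hypothetical and not part of the classifier class itself.

```python
def weighted_prob(basic_prob, total_count, feature, assumed_probs,
                  default_ap=0.5, weight=1.0):
    """Blend an observed probability with a per-feature assumed probability.

    basic_prob    -- observed Pr(feature | category) from training data
    total_count   -- how many times the feature has been seen in all categories
    assumed_probs -- hypothetical dict mapping feature -> assumed probability,
                     e.g. estimates taken from another, already trained classifier
    default_ap    -- fallback used for features missing from assumed_probs
    weight        -- how many observations the assumed probability is worth
    """
    ap = assumed_probs.get(feature, default_ap)
    # Weighted average: the more often a feature has been seen,
    # the less the assumed probability matters.
    return (weight * ap + total_count * basic_prob) / (weight + total_count)


if __name__ == "__main__":
    priors = {"money": 0.9}
    print(weighted_prob(1.0, 1, "money", priors))   # pulled toward 0.9
    print(weighted_prob(1.0, 1, "python", priors))  # pulled toward 0.5
```

An init method that accepts another classifier could fill `assumed_probs` from that classifier's estimates, so that a feature never seen by the new classifier starts from an informed guess rather than 0.5.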
Calculate Pr(Document). In the naïve Bayesian classifier, the calculation of Pr(Document) was skipped since it wasn’t required to compare the probabilities. When the features are independent, however, Pr(Document) can be calculated and used to obtain the actual overall probability. How would you calculate Pr(Document)?
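One possible approach, sketched below, treats the features as independent and multiplies the probability of each feature appearing in any document. The function name `document_probability` and its inputs (`feature_doc_counts`, `total_docs`) are assumptions made for illustration. They stand for counts that a trained classifier would already hold.

```python
def document_probability(features, feature_doc_counts, total_docs):
    """Estimate Pr(Document) as the product of Pr(feature) over its features.

    features           -- iterable of features extracted from the document
    feature_doc_counts -- hypothetical dict: feature -> number of training
                          documents (in any category) containing that feature
    total_docs         -- total number of training documents
    """
    if total_docs == 0:
        return 0.0
    p = 1.0
    for f in set(features):  # count each distinct feature once
        p *= feature_doc_counts.get(f, 0) / total_docs
    return p


if __name__ == "__main__":
    counts = {"money": 2, "python": 1}
    print(document_probability(["money", "python"], counts, 4))
```

Note that a feature never seen in training drives the product to zero. Substituting a weighted probability for the raw fraction would avoid that.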
A POP-3 email filter. Python comes with a library called poplib for downloading email messages. Write a script that downloads email messages from a server and attempts to classify them. What are the different properties of an email message, and how might you build a feature-extraction function to take advantage of these?
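The following is a rough sketch of one way to begin, using only the standard library modules poplib and email. The function names `fetch_messages` and `email_features`, the `from:` and `subject:` feature prefixes, and the word-length limits are illustrative assumptions. The server details must be supplied by the caller, and the classification step is left out.

```python
import email
import poplib
import re
from email.utils import parseaddr


def fetch_messages(host, user, password):
    """Yield each message on a POP3-over-SSL server as raw bytes."""
    conn = poplib.POP3_SSL(host)
    try:
        conn.user(user)
        conn.pass_(password)
        count, _size = conn.stat()
        for i in range(1, count + 1):      # POP3 message numbers start at 1
            _resp, lines, _octets = conn.retr(i)
            yield b"\r\n".join(lines)
    finally:
        conn.quit()


def email_features(raw_bytes):
    """Return a set of features: sender, subject words, and body words."""
    msg = email.message_from_bytes(raw_bytes)
    features = set()

    # The sender address is kept as a single feature of its own.
    sender = parseaddr(str(msg.get("From", "")))[1].lower()
    if sender:
        features.add("from:" + sender)

    # Subject words are prefixed so they stay distinct from body words.
    for word in re.findall(r"[a-z']+", str(msg.get("Subject", "")).lower()):
        features.add("subject:" + word)

    # Only plain-text parts are read; attachments and HTML are ignored.
    for part in msg.walk():
        if part.get_content_type() != "text/plain":
            continue
        payload = part.get_payload(decode=True) or b""
        charset = part.get_content_charset() or "utf-8"
        try:
            text = payload.decode(charset, errors="replace")
        except LookupError:                # unknown charset name
            text = payload.decode("utf-8", errors="replace")
        for word in re.findall(r"[a-z']+", text.lower()):
            if 2 < len(word) < 20:
                features.add(word)
    return features


if __name__ == "__main__":
    sample = (b"From: Alice <alice@example.com>\r\n"
              b"Subject: Cheap money now\r\n\r\n"
              b"Buy cheap pills today\r\n")
    print(sorted(email_features(sample)))
```

Each message yielded by `fetch_messages` could be passed through `email_features` and the resulting set handed to a classifier. Other properties worth considering as features include the recipient list, the date, and whether the message has attachments.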