
Safari Books Online is a digital library providing on-demand subscription access to thousands of learning resources.



Categorization and Extraction

It is no exaggeration to say that the most widely used application of NLP, and the one attracting the greatest research effort these days, is the fight against spam: deciding, on the basis of its content, whether a particular email is wanted or unwanted. Some anti-spam software, such as SpamAssassin, takes a relatively straightforward approach to the problem: an email gets points if it contains certain words or phrases, and that score is combined with relay blacklists and other non-textual evidence.
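The points-based approach can be illustrated with a minimal sketch. The patterns, point values, blacklist bonus, and threshold below are all hypothetical, chosen only for illustration; they are not SpamAssassin's actual rules or configuration.

```python
import re

# Hypothetical rules: (pattern, points). These are illustrative only and
# are not taken from SpamAssassin's real rule set.
RULES = [
    (re.compile(r"\bfree money\b", re.IGNORECASE), 2.5),
    (re.compile(r"\bclick here\b", re.IGNORECASE), 1.5),
    (re.compile(r"\bact now\b", re.IGNORECASE), 1.0),
]

BLACKLIST_POINTS = 2.0  # assumed bonus when the relay is on a blacklist
THRESHOLD = 5.0         # assumed cutoff above which mail is treated as spam


def spam_score(body, relay_blacklisted=False):
    """Add up the points of every rule that matches the message body."""
    score = sum(points for pattern, points in RULES if pattern.search(body))
    if relay_blacklisted:
        # Non-textual evidence is folded into the same running total.
        score += BLACKLIST_POINTS
    return score


def is_spam(body, relay_blacklisted=False):
    """Classify a message by comparing its score with the threshold."""
    return spam_score(body, relay_blacklisted) >= THRESHOLD
```

In this sketch a message containing both "click here" and "free money" scores 4.0 and passes, but the same message arriving through a blacklisted relay scores 6.0 and is flagged. The weakness of the approach is visible too: someone has to choose the phrases and the points by hand.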

However, anti-spam authors are increasingly taking an altogether different, and generally more successful, approach. They take one corpus of mail that is known to be spam and one that is known not to be spam (ham), and use statistical means to identify attributes that make a particular message more spammy or more hammy. When a new message comes in, the same statistical analysis is performed to determine its likely spamminess.
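One common way to realize this corpus-based idea is a naive Bayes classifier over word counts. The text above does not name a specific method, so treat the following as a sketch under that assumption. The class name `NaiveBayesFilter`, the tokenizer, and the add-one smoothing are illustrative choices, and the filter must see at least one spam and one ham message before it can score anything.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesFilter:
    """Hypothetical two-class (spam/ham) naive Bayes filter."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, label, text):
        """Record the words of one message known to be 'spam' or 'ham'."""
        self.word_counts[label].update(tokenize(text))
        self.message_counts[label] += 1

    def _log_score(self, label, tokens):
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        vocab_size = len(set(self.word_counts["spam"]) | set(self.word_counts["ham"]))
        # Log prior: the fraction of training messages carrying this label.
        score = math.log(self.message_counts[label] / sum(self.message_counts.values()))
        for token in tokens:
            # Add-one smoothing keeps unseen words from zeroing the product.
            score += math.log((counts[token] + 1) / (total_words + vocab_size))
        return score

    def spam_probability(self, text):
        """Estimate P(spam | text) from the two per-class log scores."""
        tokens = tokenize(text)
        diff = self._log_score("ham", tokens) - self._log_score("spam", tokens)
        if diff > 700:  # avoid overflow in exp(); the message is clearly ham
            return 0.0
        return 1.0 / (1.0 + math.exp(diff))
```

A short usage example, with a toy corpus invented for illustration:

```python
spam_filter = NaiveBayesFilter()
spam_filter.train("spam", "win free money now")
spam_filter.train("spam", "free prize click now")
spam_filter.train("ham", "meeting agenda for monday")
spam_filter.train("ham", "lunch on monday")

print(spam_filter.spam_probability("free money"))      # above 0.5: more spammy
print(spam_filter.spam_probability("monday meeting"))  # below 0.5: more hammy
```

Unlike hand-written point rules, the evidence attached to each word here comes entirely from the two corpora, so retraining on new mail updates the filter without anyone editing a rule.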


  
