

NLP: A Pareto-Like Introduction

The opening section of this chapter is mostly an expository discussion that attempts to illustrate the difficulty of NLP and give you a good understanding of how it differs from the techniques introduced in previous chapters. The section after it, however, gets right to business with some sample code to get you on your way.

Syntax and Semantics

You may recall from Chapter 7 that perhaps the most fundamental weakness of TF-IDF and cosine similarity is that these models inherently don’t require a deep semantic understanding of the data. Quite the contrary: the examples in that chapter relied on very basic syntax, splitting on whitespace to break an otherwise opaque document into a bag of tokens, and then used frequency and simple statistical similarity metrics to determine which tokens were likely to be important in the data. Although you can do some really amazing things with these techniques, they give you no notion of what any given token means in the context in which it appears in the document. Look no further than a sentence containing a homograph[51] such as “fish” or “bear” as a case in point; either one could be a noun or a verb.
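
To make the homograph problem concrete, here is a minimal sketch of a whitespace-based bag of tokens. It is not code from this book: the helper name bag_of_tokens and both sample sentences are hypothetical, chosen only for illustration, and the sketch assumes nothing beyond the Python standard library.

```python
from collections import Counter


def bag_of_tokens(text):
    """Lowercase the text, split it on whitespace, and count each token."""
    return Counter(text.lower().split())


# Hypothetical sample sentences (not from the book):
# "fish" is a verb in the first sentence and a noun in the second.
verb_use = "we fish in the river"
noun_use = "the fish swam in the river"

verb_bag = bag_of_tokens(verb_use)
noun_bag = bag_of_tokens(noun_use)

# Both bags record the same thing for "fish": a bare count.
# Nothing in either bag says whether the token was a verb or a noun.
print(verb_bag["fish"], noun_bag["fish"])
```

Because the bag keeps only token counts, the two uses of “fish” look identical to any frequency-based metric built on top of it. That is the gap a deeper analysis of syntax and semantics would have to close.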
