

Syntax and Semantics

You may recall from Chapter 7 that perhaps the most fundamental weakness of TF-IDF and cosine similarity is that these models inherently don’t require a deep semantic understanding of the data. Quite the contrary, the examples in that chapter relied on very basic syntax, namely whitespace separating tokens, to break an otherwise opaque document into a bag of tokens, and then used frequency and simple statistical similarity metrics to determine which tokens were likely to be important in the data. Although you can do some really amazing things with these techniques, they give you no notion of what any given token means in the context in which it appears in the document. Look no further than a sentence containing a homograph[51] such as “fish” or “bear” as a case in point; either one could be a noun or a verb.
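To make the limitation concrete, here is a minimal sketch of whitespace tokenization into a bag of tokens. It is not the code from Chapter 7; the helper name `bag_of_tokens` and the sample sentence are assumptions made up for illustration, and only the Python standard library is used.

```python
from collections import Counter


def bag_of_tokens(text):
    """Split on whitespace, trim surrounding punctuation, lowercase, and count.

    Hypothetical helper for illustration only.
    """
    tokens = [t.strip('.,;:!?"\'').lower() for t in text.split()]
    return Counter(t for t in tokens if t)


# Illustrative sentence: the first "fish" is a verb, the second is a noun.
sentence = "I fish for bass, and the fish bite at dawn."
bag = bag_of_tokens(sentence)

# Both uses collapse into a single entry; the part of speech is lost.
print(bag["fish"])
```

The count for “fish” says nothing about which occurrence is the verb and which is the noun, which is exactly the information a frequency-based model never sees.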


  
