Free Trial

Safari Books Online is a digital library providing on-demand subscription access to thousands of learning resources.


  • Create BookmarkCreate Bookmark
  • Create Note or TagCreate Note or Tag
  • DownloadDownload
  • PrintPrint
Share this Page URL
Help

Text Mining Fundamentals > A Whiz-Bang Introduction to TF-IDF

A Whiz-Bang Introduction to TF-IDF

Information retrieval is an extensive field with many specialties. This discussion narrows in on TF-IDF, one of the most fundamental techniques for retrieving relevant documents from a corpus. TF-IDF stands for term frequency-inverse document frequency and can be used to query a corpus by calculating normalized scores that express the relative importance of terms in the documents. Mathematically, TF-IDF is expressed as the product of the term frequency and the inverse document frequency, tf_idf = tf*idf, where the term tf represents the importance of a term in a specific document, and idf represents the importance of a term relative to the entire corpus. Multiplying these terms together produces a score that accounts for both factors and has been an integral part of every major search engine at some point in its existence. To get a more intuitive idea of how TF-IDF works, let’s walk through each of the calculations involved in computing the overall score.


  

You are currently reading a PREVIEW of this book.

                                                                                        

Get instant access to over
$1 million worth of books and videos.

  

Start a Free Trial