How the Collocation Sausage Is Made: Contingency Tables and Scoring Functions

Note

This section dives into some of the more technical details of how BigramCollocationFinder and the Jaccard scoring function from Example 7-9 work. If this is your first reading of the chapter, or if you're not interested in these details, feel free to skip this section and come back to it later.

A common data structure used to compute metrics related to bigrams is the contingency table. A contingency table compactly expresses the frequencies associated with each combination of presence and absence of a bigram's terms. Take a look at the bold entries in Table 7-5, where token1 indicates that token1 is present in the bigram, and ~token1 indicates that token1 is absent from the bigram.
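To make the four cells of a bigram contingency table concrete, here is a minimal, standard-library-only sketch. It assumes the text is already a plain list of tokens and that only adjacent token pairs count as bigrams. The function names `contingency_table` and `jaccard` are hypothetical illustrations, not NLTK's API. The cell names (n_ii for "both present", n_io and n_oi for "only one present", n_oo for "neither") loosely follow the naming used in NLTK's association measures. The Jaccard score is computed here as the bigram's count divided by the number of bigrams containing token1 in first position or token2 in second position. Treat it as a sketch of the idea; the library's implementation may differ in detail.

```python
from typing import List, Tuple


def contingency_table(tokens: List[str], w1: str, w2: str) -> Tuple[int, int, int, int]:
    """Return the cells (n_ii, n_io, n_oi, n_oo) for the bigram (w1, w2).

    n_ii: bigrams that are exactly (w1, w2)          -> token1,  token2
    n_io: first token is w1, second token is not w2  -> token1,  ~token2
    n_oi: first token is not w1, second token is w2  -> ~token1, token2
    n_oo: neither w1 first nor w2 second             -> ~token1, ~token2
    """
    bigrams = list(zip(tokens, tokens[1:]))  # adjacent pairs only
    n_xx = len(bigrams)                      # total number of bigrams
    n_ii = sum(1 for pair in bigrams if pair == (w1, w2))
    n_ix = sum(1 for a, _ in bigrams if a == w1)  # w1 in first position (marginal)
    n_xi = sum(1 for _, b in bigrams if b == w2)  # w2 in second position (marginal)
    n_io = n_ix - n_ii
    n_oi = n_xi - n_ii
    n_oo = n_xx - n_ii - n_io - n_oi
    return n_ii, n_io, n_oi, n_oo


def jaccard(n_ii: int, n_io: int, n_oi: int) -> float:
    """Bigram count over bigrams containing w1 first or w2 second (0.0 if none)."""
    denom = n_ii + n_io + n_oi
    return n_ii / denom if denom else 0.0


tokens = "new york is a big city and new jersey is near new york".split()
table = contingency_table(tokens, "new", "york")
print(table)                          # (2, 1, 0, 9)
print(round(jaccard(*table[:3]), 4))  # 0.6667
```

The four cells always sum to the total number of bigrams (12 in this example), which is what lets a scoring function reason about how much more often two terms co-occur than their individual frequencies alone would suggest.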
