The Tanimoto coefficient is a measure of the similarity of two sets. It is used in this book to calculate how similar two items are based on lists of properties. If you have two sets, A and B, where:
A = [shirt, shoes, pants, socks]
B = [shirt, skirt, shoes]
Then the intersection (overlapping) set, which I’ll call C, is [shirt, shoes]. The Tanimoto coefficient is shown in Figure B-4: T = Nc / (Na + Nb - Nc), where Na is the number of items in A, Nb is the number of items in B, and Nc is the number of items in C, the intersection. The denominator is the size of the union of A and B, since adding Na and Nb counts the shared items twice. In this case the Tanimoto coefficient is 2/(4+3-2) = 2/5 = 0.4.
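The arithmetic in this example can be checked directly with Python's built-in sets. This is only a minimal sketch of the worked example, not the function used in the book; the variable names (a, b, c, n_a, n_b, n_c, coefficient) are chosen here for illustration.

```python
# Worked example: Tanimoto coefficient for the two sets in the text.
a = {"shirt", "shoes", "pants", "socks"}
b = {"shirt", "skirt", "shoes"}

# The intersection C holds the items that appear in both sets.
c = a & b

n_a, n_b, n_c = len(a), len(b), len(c)

# float() keeps the division from truncating under Python 2.
coefficient = float(n_c) / (n_a + n_b - n_c)
print(coefficient)
```

Sets are used here because the order of the items and any duplicates do not matter for this measure.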
Here is a simple function that takes two lists and calculates the Tanimoto coefficient: