
132 CHAPTER 4 Algorithms: The Basic Methods

Different attributes are often measured on different scales, so if the Euclidean distance formula were used directly, the effect of some attributes might be completely dwarfed by others that had larger scales of measurement. Consequently, it is usual to normalize all attribute values to lie between 0 and 1 by calculating

$$a_i = \frac{v_i - \min v_i}{\max v_i - \min v_i}$$

where $v_i$ is the actual value of attribute $i$, and the maximum and minimum are taken over all instances in the training set.

These formulae implicitly assume numeric attributes. Here the difference between two values is just the numerical difference between them, and it is this difference that is squared and added to yield the distance function. For nominal attributes that take on values that are symbolic rather than numeric, the difference between two values that are not the same is often taken to be 1, whereas if the values are the same the difference is 0. No scaling is required in this case because only the values 0 and 1 are used.

A common policy for handling missing values is as follows. For nominal attri-