6.4.3 The MVE Method

A natural way of detecting outliers in p-variate data, $p \ge 2$, is to use Mahalanobis distance with the usual means and sample covariance matrix replaced by estimators that have a high breakdown point. One of the earliest such methods is based on the MVE estimators of location and scale (Rousseeuw & van Zomeren, 1990). Relevant theoretical results are reported by Lopuhaä (1999). Let the column vector $C$, having length $p$, be the MVE estimate of location, and let the p-by-p matrix $M$ be the corresponding measure of scatter. The distance of the point $x_i = (x_{i1}, \ldots, x_{ip})$ from $C$ is given by

$$D_i = \sqrt{(x_i - C)' M^{-1} (x_i - C)}. \tag{6.16}$$

If $D_i > \sqrt{\chi^2_{.975,p}}$, the square root of the 0.975 quantile of a chi-square distribution with $p$ degrees of freedom, then $x_i$ is declared an outlier. Rousseeuw and van Zomeren recommend this method when there are at least five observations per dimension, meaning that $n/p > 5$. (Cook & Hawkins, 1990, illustrate that problems can arise when $n/p \le 5$.) A criticism of this method is that it can declare too many points as being extreme (Fung, 1993).
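The decision rule in Equation (6.16) can be sketched in Python as follows. This is a minimal illustration, not the book's own software. It assumes NumPy and SciPy are available and that the MVE estimates C and M have already been obtained by some other routine. The function names `robust_distances` and `flag_outliers`, and the numerical values of `X`, `C`, and `M`, are hypothetical placeholders. The four points only demonstrate the arithmetic; the guidance that n/p should exceed 5 concerns the data used to fit C and M.

```python
import numpy as np
from scipy.stats import chi2


def robust_distances(X, C, M):
    """Return D_i = sqrt((x_i - C)' M^{-1} (x_i - C)) for each row x_i of X."""
    X = np.asarray(X, dtype=float)
    diff = X - np.asarray(C, dtype=float)
    # Solve M z = diff' instead of forming the inverse of M explicitly.
    z = np.linalg.solve(np.asarray(M, dtype=float), diff.T).T
    # Row-wise dot product of diff and z gives the quadratic form per point.
    return np.sqrt(np.einsum("ij,ij->i", diff, z))


def flag_outliers(X, C, M, q=0.975):
    """Flag rows whose distance exceeds the square root of the
    q quantile of a chi-square distribution with p degrees of freedom."""
    p = np.asarray(X).shape[1]
    cutoff = np.sqrt(chi2.ppf(q, df=p))
    d = robust_distances(X, C, M)
    return d, cutoff, d > cutoff


# Hypothetical inputs: C and M stand in for an MVE fit computed elsewhere.
X = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 0.5], [6.0, -6.0]])
C = np.array([0.0, 0.0])
M = np.array([[4.0, 0.0], [0.0, 1.0]])

d, cutoff, is_outlier = flag_outliers(X, C, M)
print(d, cutoff, is_outlier)
```

With these placeholder values only the last point lies beyond the cutoff. Passing the ordinary sample mean and covariance matrix as C and M would give the classical Mahalanobis rule, which lacks the high breakdown point that motivates the MVE estimates.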