Chapter 11. Introduction to Clustering P... > Clustering Variables - Pg. 211

DISTANCE computes various measures of distance, dissimilarity, or similarity between the observations (rows) of a SAS data set. PROC DISTANCE also provides various nonparametric and parametric methods for standardizing variables. Different variables can be standardized with different methods.

PRINCOMP performs a principal component analysis and outputs principal component scores.

STDIZE standardizes variables by using any of a variety of location and scale measures, including mean and standard deviation, minimum and range, median and absolute deviation from the median, various M-estimators and A-estimators, and some scale estimators designed specifically for cluster analysis.

Massart and Kaufman (1983) is the best elementary introduction to cluster analysis. Other important texts are Anderberg (1973), Sneath and Sokal (1973), Duran and Odell (1974), Hartigan (1975), Titterington, Smith, and Makov (1985), McLachlan and Basford (1988), and Kaufman and Rousseeuw (1990). Hartigan (1975) and Spath (1980) give numerous FORTRAN programs for clustering. Any prospective user of cluster analysis should study the Monte Carlo results of Milligan (1980), Milligan and Cooper (1985), and Cooper and Milligan (1988). Important references on the statistical aspects of clustering include MacQueen (1967), Wolfe (1970), Scott and Symons (1971), Hartigan (1977, 1978, 1981, 1985), Symons (1981), Everitt (1981), Sarle (1983), Bock (1985), and Thode, Mendell, and Finch (1988). Bayesian methods have important advantages over maximum likelihood; see Binder (1978, 1981), Banfield and Raftery (1993), and Bensmail et al. (1997). For