Chapter 29. The CLUSTER Procedure > Getting Started: CLUSTER Procedure
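
The point this section makes about variable scale can be illustrated with a small numeric sketch. The Python below is not SAS code and does not reproduce how PROC CLUSTER is implemented. It only shows, on made-up data, how a large-variance variable dominates Euclidean distance and how rescaling each variable to mean 0 and standard deviation 1 evens out the contributions. The data values, the function names, and the use of the sample (n - 1) standard deviation are all assumptions of this sketch.

```python
import math
import statistics


def standardize(column):
    """Rescale a column to mean 0 and sample standard deviation 1."""
    mean = statistics.mean(column)
    sd = statistics.stdev(column)  # sample SD (n - 1 divisor); an assumption
    return [(x - mean) / sd for x in column]


def euclidean(a, b):
    """Euclidean distance between two observations."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical data: income in dollars (large variance), age in years (small variance).
income = [30000.0, 30500.0, 60000.0, 60500.0]
age = [25.0, 60.0, 25.0, 60.0]

raw = list(zip(income, age))
scaled = list(zip(standardize(income), standardize(age)))

# Observations 0 and 1 have similar income but very different ages.
# Observations 0 and 2 have the same age but very different incomes.
raw_d01 = euclidean(raw[0], raw[1])
raw_d02 = euclidean(raw[0], raw[2])
std_d01 = euclidean(scaled[0], scaled[1])
std_d02 = euclidean(scaled[0], scaled[2])

print(f"raw:          d(0,1) = {raw_d01:.1f}, d(0,2) = {raw_d02:.1f}")
print(f"standardized: d(0,1) = {std_d01:.3f}, d(0,2) = {std_d02:.3f}")
```

On the raw data the income difference swamps the age difference, so any distance-based method would group observations almost entirely by income. After standardization the two differences contribute about equally. Whether that is desirable depends on whether the variables really are equally important.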

Variables with large variances tend to have more effect on the resulting clusters than variables with small variances. If you consider all variables to be equally important, you can use the STD option in PROC CLUSTER to standardize the variables to mean 0 and standard deviation 1. However, standardization is not always appropriate. See Milligan and Cooper (1987) for a Monte Carlo study on various methods of variable standardization. You should remove outliers before using PROC CLUSTER with the STD option unless you specify the TRIM= option. The STDIZE procedure (see Chapter 81) provides additional methods for standardizing variables and imputing missing values.

The ACECLUS procedure (see Chapter 22) is useful for linear transformations of the variables if any of the following conditions hold:

- You have no idea how the variables should be scaled.
- You want to detect natural clusters regardless of whether some variables have more influence than others.
- You want to use a clustering method designed for finding compact clusters, but you want to be able to detect elongated clusters.

Agglomerative hierarchical clustering is discussed in all standard references on cluster analysis,