Chapter 34: The FASTCLUS Procedure

seeds that are closest to each other. The seed that is replaced is the one of these two with the shortest distance to the closest of the remaining seeds when the other seed is replaced by the current observation.

If the observation fails the first test for seed replacement, a second test is made. The observation replaces the nearest seed if the smallest distance from the observation to all seeds other than the nearest one is greater than the shortest distance from the nearest seed to all other seeds. If the observation fails this test, PROC FASTCLUS goes on to the next observation.

You can specify the REPLACE= option to limit seed replacement. You can omit the second test for seed replacement (REPLACE=PART), causing PROC FASTCLUS to run faster, but the seeds selected might not be as widely separated as those obtained by the default method. You can also suppress seed replacement entirely by specifying REPLACE=NONE. In this case, PROC FASTCLUS runs much faster, but you must choose a good value for the RADIUS= option in order to get good clusters. This method is similar to Hartigan's (1975, pp. 74–78) leader algorithm and the simple cluster seeking algorithm described by Tou and Gonzalez (1974, pp. 90–92).

Getting Started: FASTCLUS Procedure

The following example demonstrates how to use the FASTCLUS procedure to compute disjoint clusters of observations in a SAS data set.

The data in this example are measurements taken on 159 freshwater fish caught from the same lake (Laengelmavesi) near Tampere in Finland. This data set is available from the Data Archive of the Journal of Statistics Education. The complete data set is displayed in Chapter 82, "The STEPDISC Procedure."
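As an aside on the REPLACE= option described above, the following sketch shows how the two restricted settings might be requested. The data set name MyData, the MAXCLUSTERS= value, and the RADIUS= value are placeholders chosen for illustration and do not come from this chapter's example. No VAR statement is given, so all numeric variables in the data set would be used.

```sas
/* MyData, MAXCLUSTERS=5, and RADIUS=2 are hypothetical placeholders. */

/* REPLACE=PART: the second seed-replacement test is omitted. */
proc fastclus data=MyData maxclusters=5 replace=part;
run;

/* REPLACE=NONE: no seed replacement at all; the result depends */
/* heavily on choosing a sensible RADIUS= value for the data.   */
proc fastclus data=MyData maxclusters=5 replace=none radius=2;
run;
```

Comparing the initial seeds reported by each run with those from the default method is one way to judge whether the faster settings still give adequately separated seeds.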
The species (bream, parkki, pike, perch, roach, smelt, and whitefish), weight, three different length measurements (measured from the nose of the fish to the beginning of its tail, the notch of its tail, and the end of its tail), height, and width of each fish are tallied. The height and width are recorded as percentages of the third length variable.

Suppose that you want to group empirically the fish measurements into clusters and that you want to associate the clusters with the species. You can use the FASTCLUS procedure to perform a cluster analysis.

The following DATA step creates the SAS data set Fish:

   proc format;
      value specfmt
         1='Bream'
         2='Roach'
         3='Whitefish'
         4='Parkki'
         5='Perch'
         6='Pike'
         7='Smelt';
   run;