
Solution. Generate the data set.

    randn('seed',0);
    m=1; var=0.16;
    stdevi=sqrt(var);
    norm_dat=m+stdevi*randn(1,100);

Generate the outliers.

    outl=[6.2 -6.4 4.2 15.0 6.8];

Add outliers at the end of the data.

    dat=[norm_dat';outl'];

Scramble the data.

    rand('seed',0); % randperm() below calls rand()
    y=randperm(length(dat));
    x=dat(y);

Identify outliers and their corresponding indices.

    times=1; % controls the tolerance threshold
    [outliers,Index,new_dat]=simpleOutlierRemoval(x,times);
    [outliers Index]

The array new_dat contains the data after the outliers have been rejected. The program output should look like this:

    outliers    index
       4.2         3
       6.8        49
      15          58
       6.2        60
      -6.4        84

where index indicates the position of each outlier in x. Changing the variable times (i.e., the tolerance threshold) may change the number of outliers detected. Try running the program with different values of times.

4.3 DATA NORMALIZATION

Data normalization is a useful step, often adopted prior to designing a classifier, as a precaution when the feature values vary in different dynamic ranges. In the absence of normalization, features with large values have a stronger influence on the cost function used in designing the classifier. Data normalization restricts the values of all features to predetermined ranges.
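As a minimal sketch of what restricting features to predetermined ranges can look like, the following applies two common choices to a small data matrix: zero-mean, unit-variance scaling and min-max scaling to [0, 1]. Both techniques, the toy matrix X, and the variable names are assumptions made for illustration; the passage above does not specify which normalization is intended.

```matlab
% Illustrative data (made up): 4 samples in rows, 2 features in columns,
% with very different dynamic ranges.
X = [1 1000; 2 2000; 3 3000; 4 4000];
N = size(X, 1);

% Zero-mean, unit-variance normalization, applied to each feature (column).
mu = mean(X, 1);
sigma = std(X, 0, 1);   % sample standard deviation (N-1 in the denominator)
Xz = (X - repmat(mu, N, 1)) ./ repmat(sigma, N, 1);

% Min-max scaling of each feature to the interval [0, 1].
xmin = min(X, [], 1);
xmax = max(X, [], 1);
Xmm = (X - repmat(xmin, N, 1)) ./ repmat(xmax - xmin, N, 1);
```

After either transformation the two columns are on a comparable scale, so neither feature dominates a distance-based or sum-of-squares cost merely because of its units. A constant feature (zero standard deviation or zero range) would cause a division by zero and needs separate handling.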