574 CHAPTER 17 Tutorial Exercises for the Weka Explorer

Messing with the Data

With the Boundary Visualizer you can modify the data by adding or removing points.

Exercise 17.3.12. Introduce some noise into the data and study the effect on the learning algorithms we looked at above. What kind of behavior do you observe for each algorithm as you introduce more noise?

17.4 PREPROCESSING AND PARAMETER TUNING

Now we look at some useful preprocessing techniques, which are implemented as filters, as well as a method for automatic parameter tuning.

Discretization

As we know, there are two types of discretization techniques: unsupervised ones, which are "class blind," and supervised ones, which take the class value of the instances into account when creating intervals. Weka's main unsupervised method for discretizing numeric attributes is weka.filters.unsupervised.attribute.Discretize. It implements these two methods: equal-width (the default) and equal-frequency discretization.

Find the glass dataset glass.arff and load it into the Explorer interface. Apply the unsupervised discretization filter in the two different modes explained previously.

Exercise 17.4.1. What do you observe when you compare the histograms obtained? The one for equal-frequency discretization is quite skewed for some attributes. Why?

The main supervised technique for discretizing numeric attributes is weka.filters.supervised.attribute.Discretize. Locate the iris data, load it, apply the supervised discretization scheme, and look at the histograms obtained. Supervised discretization strives to create intervals within which the class distribution is consistent, although the distributions vary from one interval to the next.

Exercise 17.4.2. Based on the histograms obtained, which of the discretized attributes would you consider to be most predictive?

Reload the glass data and apply supervised discretization to it.

Exercise 17.4.3. For some attributes there is only a single bar in the histogram. What does that mean?

Discretized attributes are normally coded as nominal attributes, with one value per range. However, because the ranges are ordered, a discretized attribute is actually on an ordinal scale. Both filters have the ability to create binary attributes rather than multivalued ones, by setting the option makeBinary to true.
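
To make the two unsupervised modes and the binary coding more concrete, the following is a minimal sketch in plain Python. It is not Weka's implementation: the function names (equal_width_cuts, equal_frequency_cuts, bin_counts, threshold_code) and the toy values are hypothetical and chosen only for illustration. The thresholded coding at the end is one common way to preserve the ordering of the ranges with binary attributes; whether it matches the exact output of makeBinary should be checked against the filter itself.

```python
from bisect import bisect_left


def equal_width_cuts(values, bins):
    """Cut points that split [min, max] into `bins` intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins
    return [lo + i * width for i in range(1, bins)]


def equal_frequency_cuts(values, bins):
    """Cut points that put roughly len(values) / bins instances in each interval."""
    ordered = sorted(values)
    n = len(ordered)
    cuts = []
    for i in range(1, bins):
        k = i * n // bins  # index of the first value of the next interval
        left, right = ordered[k - 1], ordered[k]
        # Identical neighbouring values cannot be separated, so no cut is placed there.
        if left != right:
            cut = (left + right) / 2
            if not cuts or cut > cuts[-1]:
                cuts.append(cut)
    return cuts


def bin_counts(values, cuts):
    """Histogram over the intervals (-inf, c1], (c1, c2], ..., (ck, +inf)."""
    counts = [0] * (len(cuts) + 1)
    for v in values:
        counts[bisect_left(cuts, v)] += 1
    return counts


def threshold_code(value, cuts):
    """Assumed ordinal coding: one 0/1 attribute per cut point (1 if value > cut)."""
    return [int(value > c) for c in cuts]


if __name__ == "__main__":
    values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 40]  # invented toy attribute with one outlier

    width_cuts = equal_width_cuts(values, 3)
    freq_cuts = equal_frequency_cuts(values, 3)

    print(width_cuts, bin_counts(values, width_cuts))  # [14.0, 27.0] [9, 0, 1]
    print(freq_cuts, bin_counts(values, freq_cuts))    # [3.5, 6.5] [3, 3, 4]
    print(threshold_code(5, freq_cuts))                # [1, 0]
```

On this toy attribute, equal-width binning puts nine of the ten values in the first interval because a single outlier stretches the range, whereas equal-frequency binning spreads the values almost evenly. The thresholded coding uses one binary attribute per cut point, so a value in a higher range switches on every attribute that a value in a lower range does.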