16.1 An Example Classifier

the new dataset that corresponds to the attribute's value. It then reduces memory requirements by compacting the Instances objects.

Returning to makeTree(), the resulting array of datasets is used for building subtrees. The method creates an array of Id3 objects, one for each attribute value, and calls makeTree() on each one by passing it the corresponding dataset.

computeInfoGain()

Returning to computeInfoGain(), the information gain associated with an attribute and a dataset is calculated using a straightforward implementation of the formula in Section 4.3 (page 104). First, the entropy of the dataset is computed. Then, splitData() is used to divide it into subsets, and computeEntropy() is called on each one. Finally, the difference between the former entropy and the weighted sum of the latter ones (the information gain) is returned. The method computeEntropy() uses the log2() method from weka.core.Utils to obtain the logarithm (to base 2) of a number.

classifyInstance()
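
Before moving on, the information gain calculation described for computeInfoGain() above can be sketched in a few lines. This is a minimal, standalone illustration and not the Id3 source code: it works on plain arrays of class counts instead of Weka Instances objects, and the class name InfoGainSketch, the methods entropy() and infoGain(), and the counts in main() are all hypothetical names and values chosen for the example. The local log2() helper stands in for the one in weka.core.Utils.

```java
class InfoGainSketch {

    // Base-2 logarithm; a local stand-in for the log2() helper mentioned in the text.
    static double log2(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    // Sum of an array of counts.
    static double sum(double[] counts) {
        double total = 0.0;
        for (double c : counts) {
            total += c;
        }
        return total;
    }

    // Entropy, in bits, of a class distribution given as raw counts.
    static double entropy(double[] classCounts) {
        double total = sum(classCounts);
        if (total == 0.0) {
            return 0.0; // an empty subset contributes no entropy
        }
        double h = 0.0;
        for (double c : classCounts) {
            if (c > 0.0) { // skip empty classes: 0 * log2(0) is taken as 0
                double p = c / total;
                h -= p * log2(p);
            }
        }
        return h;
    }

    // Information gain: entropy of the whole dataset minus the
    // size-weighted sum of the entropies of the subsets after the split.
    static double infoGain(double[] parentCounts, double[][] subsetCounts) {
        double parentTotal = sum(parentCounts);
        double gain = entropy(parentCounts);
        for (double[] subset : subsetCounts) {
            double n = sum(subset);
            if (n > 0.0) {
                gain -= (n / parentTotal) * entropy(subset);
            }
        }
        return gain;
    }

    public static void main(String[] args) {
        // Illustrative counts (assumed for this sketch): 14 instances with
        // two classes, split into three subsets by a three-valued attribute.
        double[] parent = {9, 5};
        double[][] subsets = {{2, 3}, {4, 0}, {3, 2}};
        System.out.println("Information gain: " + infoGain(parent, subsets));
    }
}
```

In the sketch, each inner array of subsets plays the role of one dataset returned by splitData(), reduced to its class counts. A subset that contains a single class has zero entropy, so it leaves the gain unreduced.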