Mammogram Mining Using Genetic Ant-Miner

Categories of Discretization Methods

Discretization methods are generally categorised as supervised or unsupervised, depending on whether the method takes class information into account to find proper intervals. Several discretization methods, such as equal width interval binning and equal frequency binning, do not use class membership information during the discretization process. These are referred to as unsupervised methods. In contrast, discretization methods that use class labels to carry out discretization are referred to as supervised methods. Previous research indicated that supervised methods are better than unsupervised methods (Dougherty, Kohavi & Sahami, 1995).

Entropy Based Methods

Entropy-based methods use entropy-based measures to evaluate candidate cut-points. That is, an entropy-based method uses the class information entropy of candidate partitions to select boundaries for discretization. Class information entropy is a measure of purity: it measures the amount of information needed to specify the class to which an instance belongs.

CLASSIFICATION OF MAMMOGRAMS

The major steps involved in mammogram classification using data mining techniques are:

· Initially, the regions of interest of 300 mammograms (normal and abnormal) are taken by referring to the coordinates given in the MIAS database, after applying preprocessing techniques.
· A gray-level co-occurrence matrix (GLCM) is constructed over the region.
· The fourteen Haralick features are extracted to form a database for image mining.
· Rules are generated using classification algorithms.
· Ten-fold cross-validation is done to test the efficiency of the classifier.

C4.5 Algorithm

C4.5 is an algorithm used for inducing classification models, also called decision trees, from data. It is an extension of the ID3 algorithm.

Decision Tree

In a decision tree, each node corresponds to a non-categorical attribute and each arc to a possible value of that attribute. A leaf of the tree specifies the expected value of the categorical attribute for the records described by the path from the root to that leaf. Each node should be associated with the non-categorical attribute that is most informative among the attributes not yet considered on the path from the root. This establishes what a "good" decision tree is. Entropy is used to measure how informative a node is, which defines what we mean by "good". This notion was introduced by Claude Shannon in information theory (Shannon, 1948, pp. 379-423 and 623-656).

ID3 Algorithm

The ID3 algorithm (Quinlan 1987) is used to build a decision tree, given a set of non-categorical attributes C1, C2, ..., Cn, the categorical attribute W, and a training set T of records. C4.5 is an extension of ID3 that accounts for unavailable values, continuous attribute value ranges, pruning of decision trees, rule derivation, and so on.

Input: Let F be the set of features, W be the class attribute, and S be the training set.
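
As a rough illustration of the entropy-based ideas in this section, the following Python sketch computes class information entropy, picks the most informative attribute by information gain (the selection criterion used by ID3), and chooses the cut-point of a continuous feature that minimises the class information entropy of the resulting two intervals. The function names, the feature names "contrast" and "energy", and the toy records are assumptions made purely for illustration; they are not taken from the chapter or from the MIAS database, and the sketch omits the extensions C4.5 adds (missing values, pruning, rule derivation).

```python
from collections import Counter
from math import log2


def class_entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * log2(n / total) for n in counts.values())


def information_gain(records, labels, attribute):
    """Drop in class entropy obtained by splitting on a discrete attribute."""
    total = len(labels)
    partitions = {}
    for record, label in zip(records, labels):
        partitions.setdefault(record[attribute], []).append(label)
    # Weighted entropy of the partitions induced by the attribute values.
    remainder = sum(
        len(part) / total * class_entropy(part) for part in partitions.values()
    )
    return class_entropy(labels) - remainder


def most_informative(records, labels, attributes):
    """Attribute with the highest information gain."""
    return max(attributes, key=lambda a: information_gain(records, labels, a))


def best_cut_point(values, labels):
    """Midpoint boundary whose two intervals have the lowest weighted class
    entropy, or None if all values are equal."""
    pairs = sorted(zip(values, labels))
    total = len(pairs)
    best = None  # (weighted entropy, cut-point)
    for i in range(1, total):
        low, high = pairs[i - 1][0], pairs[i][0]
        if low == high:
            continue  # no boundary between identical values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        weighted = (
            len(left) / total * class_entropy(left)
            + len(right) / total * class_entropy(right)
        )
        if best is None or weighted < best[0]:
            best = (weighted, (low + high) / 2)
    return None if best is None else best[1]


# Hypothetical, already-discretized records (illustrative values only).
toy_records = [
    {"contrast": "high", "energy": "low"},
    {"contrast": "high", "energy": "high"},
    {"contrast": "low", "energy": "low"},
    {"contrast": "low", "energy": "high"},
]
toy_labels = ["abnormal", "abnormal", "normal", "normal"]

if __name__ == "__main__":
    print(most_informative(toy_records, toy_labels, ["contrast", "energy"]))
    print(best_cut_point([0.1, 0.2, 0.8, 0.9], toy_labels[::-1]))
```

In this toy data the class is fully determined by "contrast", so splitting on it removes all class uncertainty, while "energy" carries no information about the class. A supervised discretizer built on best_cut_point would apply it recursively to each interval with some stopping rule, which is left out here.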