CHAPTER 14 The Command-Line Interface

Suppose you want to check which Weka classifiers and filters are capable of operating incrementally. Searching for the word incremental in the index would soon lead you to the keyword UpdateableClassifier. In fact, this is a Java interface; interfaces are listed after the classes in the overview tree. You are looking for all classes that implement this interface. Clicking any occurrence of it in the documentation brings up a page that describes the interface and lists the classifiers that implement it. Finding the filters is a little trickier unless you know the keyword StreamableFilter, which is the name of the interface that streams data through a filter; again, its page lists the filters that implement it. You would come across that keyword if you already knew of any filter that can operate incrementally.

14.3 COMMAND-LINE OPTIONS

In the preceding example, the -t option was used on the command line to communicate the name of the training file to the learning algorithm. There are many other options that can be used with any learning scheme, and also scheme-specific ones that apply only to particular schemes. If you invoke a scheme with the -h or -help option, or without any command-line options at all, it displays the applicable options: first the general options, then the scheme-specific ones. In the command-line interface, type

java weka.classifiers.trees.J48 -h

You'll see a list of the options common to all learning schemes, shown in Table 14.1, followed by those that apply only to J48, shown in Table 14.2. A notable one is -info, which outputs a very brief description of the scheme. We will explain the generic options and then briefly review the scheme-specific ones.

Generic Options

The options in Table 14.1 determine which data is used for training and testing, how the classifier is evaluated, and what kind of statistics are displayed.
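To make the split between generic and scheme-specific options more concrete, here is a minimal Java sketch of how a driver program might read the generic evaluation options covered in this section: -t (training file), -T (test file), -c (class attribute position), -s (random seed), -x (number of folds), -no-cv, and -h/-help. The class EvalOptions and its methods are hypothetical illustrations, not Weka's own option-handling code. The defaults follow this section: the class is the last attribute, the seed is 1, there are 10 folds, and cross-validation is used unless a test file is given. In this sketch, anything unrecognized is simply collected and left for the learning scheme, and assignFolds shows a seeded shuffle in its simplest unstratified form, which does not reproduce Weka's exact randomization.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical sketch of generic evaluation options; this is NOT Weka's own parser.
public class EvalOptions {
    String trainFile;                 // -t: training file
    String testFile;                  // -T: independent test file (optional)
    int classPosition = 0;            // -c: 1-based attribute position; 0 = use the last attribute
    long seed = 1;                    // -s: random seed for shuffling (default 1)
    int folds = 10;                   // -x: number of cross-validation folds (default 10)
    boolean noCv = false;             // -no-cv: suppress cross-validation
    boolean help = false;             // -h / -help, or no options at all
    List<String> schemeOptions = new ArrayList<>(); // unrecognized options, left for the scheme

    static EvalOptions parse(String[] args) {
        EvalOptions o = new EvalOptions();
        o.help = args.length == 0;    // no options at all: show the option list
        for (int i = 0; i < args.length; i++) {
            switch (args[i]) {
                case "-h":
                case "-help":  o.help = true; break;
                case "-t":     o.trainFile = value(args, ++i); break;
                case "-T":     o.testFile = value(args, ++i); break;
                case "-c":     o.classPosition = Integer.parseInt(value(args, ++i)); break;
                case "-s":     o.seed = Long.parseLong(value(args, ++i)); break;
                case "-x":     o.folds = Integer.parseInt(value(args, ++i)); break;
                case "-no-cv": o.noCv = true; break;
                default:       o.schemeOptions.add(args[i]); // e.g. J48's -C and its value
            }
        }
        return o;
    }

    // Returns the value that follows a flag, failing clearly if it is missing.
    private static String value(String[] args, int i) {
        if (i >= args.length) {
            throw new IllegalArgumentException("Missing value for " + args[i - 1]);
        }
        return args[i];
    }

    // Converts the 1-based -c position to a 0-based index; defaults to the last attribute.
    int classIndex(int numAttributes) {
        return classPosition > 0 ? classPosition - 1 : numAttributes - 1;
    }

    // Cross-validation is the default unless a test file is given or -no-cv is set.
    boolean crossValidate() {
        return testFile == null && !noCv;
    }

    // Shuffles instance indices with the seed, then deals them round-robin into folds.
    // Unstratified and simplified: it does not reproduce Weka's exact randomization.
    int[] assignFolds(int numInstances) {
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < numInstances; i++) {
            order.add(i);
        }
        Collections.shuffle(order, new Random(seed));
        int[] fold = new int[numInstances];
        for (int pos = 0; pos < numInstances; pos++) {
            fold[order.get(pos)] = pos % folds;
        }
        return fold;
    }

    public static void main(String[] args) {
        // "train.arff" is a placeholder name; this sketch never opens the file.
        EvalOptions o = parse(new String[] {"-t", "train.arff", "-x", "5", "-s", "42", "-C", "0.25"});
        System.out.println("folds=" + o.folds + " seed=" + o.seed + " crossValidate=" + o.crossValidate());
        System.out.println("class index with 5 attributes: " + o.classIndex(5));
        System.out.println("scheme options: " + o.schemeOptions);
        System.out.println("fold per instance: " + Arrays.toString(o.assignFolds(14)));
    }
}
```

In the usage shown in main, J48's pruning-confidence option -C 0.25 passes through untouched while the generic options are consumed. Running it again with a different value for -s will generally place the instances in different folds, which is the reshuffling that repeated cross-validation relies on.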
For example, the -T option is used to provide the name of the test file when evaluating a learning scheme on an independent test set. By default, the class is the last attribute in an ARFF file, but you can declare another one to be the class using -c followed by the position of the desired attribute (1 for the first, 2 for the second, and so on). When cross-validation is performed (the default if a test file is not provided), the data is randomly shuffled first. To repeat the cross-validation several times, each time reshuffling the data in a different way, set the random number seed with -s (default value 1). With a large dataset you may want to reduce the number of folds for the cross-validation from the default value of 10 using -x. If performance on the training data alone is required, -no-cv can be used to suppress cross-validation; -v suppresses output of performance on the training data. As an alternative to cross-validation, a train-test split of the data specified with the -t option can be performed