Free Trial

Safari Books Online is a digital library providing on-demand subscription access to thousands of learning resources.


Share this Page URL
Help

Chapter 2: Feature Extraction Methods fo... > THEORETICAL BACKGROUND - Pg. 25

Feature Extraction Methods for Intrusion Detection Systems bedded methods from machine learning are fre- quently applied. This chapter provides an overview of various existing feature construction and feature selection methods for intrusion detection systems. A comparison between those feature selection methods is provided in the experimental part. As this chapter aims to serve a wide audience from researchers to practitioners, we first introduce the basic concepts and describe the main feature extraction methods for intrusion detection; then present the practical applications of these methods to extract features from public benchmarking data sets for intrusion detection systems. THEORETICAL BACKGROUND Feature, which is a synonym for input variable or attribute, is any representative information that is extracted from the raw data set. A special the original data. One can manually construct the features by looking at direct patterns in the data, for example, as we carry through when building signatures or rules for misuse intrusion detec- tion systems. For automatic feature construction, several approaches, such as n-grams, association rule learning and frequency episode extraction, are usually applied. These methods will be introduced in more detail in the next sections. At the feature construction stage, one should beware that any information could not be lost from the original data set. A common idea is to take into account all possible informative features. However, adding more features seems to come at a price: it increases the dimensionality of the data that is considered, thus increases the complexity of intrusion detection systems. Moreover, the irrelevant and redundant features are possibly contained in the set of features. How do we know when a feature is relevant or important? That is