Emerging Missing Data Estimation Problems

INTRODUCTION: DYNAMIC PROGRAMMING FOR MISSING DATA ESTIMATION

Missing data causes problems in a variety of fields, from sensor readings in machine operation to risk analysis. Many models built to run on a specific number of inputs break down when one or more inputs are unavailable. In many such applications, simply ignoring or deleting the incomplete record, a practice known as case deletion, is not an option, as it may do more harm than good (Allison, 2002). In a statistical model, case deletion can lead to biased results, and in practical applications of such statistical models the consequences may be severe (Roth & Switzer, 1995). Techniques for imputing missing data that are intended to minimize the bias or output error of a model have been researched extensively. Many of these methods are statistically based, and one of the most successful is Bayesian multiple imputation. Methods that use computational intelligence techniques such as neural networks, like the one proposed by Abdella and Marwala (2006), have also given excellent results. Nevertheless, most of these techniques do not run in an optimal manner, so a lot of processing power and time is wasted on repeated calculations.

In this chapter, dynamic programming is viewed as a stage-wise search technique whose main feature is a sequence of decisions that exploits the duplications as well as the arrangement of the data for missing data imputation. During the search for the optimal solution, early decisions that cannot possibly give optimal results are discarded. The fundamental idea behind this procedure is to avoid performing the same calculation more than once, which is achieved by storing the results obtained for each sub-problem.
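The idea of storing sub-problem results so that duplicated records are not re-estimated can be sketched in a few lines of Python. This is only a minimal illustration, not the chapter's method: the reference data `COMPLETE`, the nearest-record estimator inside `estimate`, and the helper `impute` are all hypothetical stand-ins for a more expensive estimator such as a neural network.

```python
from functools import lru_cache

# Hypothetical complete records used as reference data (illustrative values only).
COMPLETE = (
    (1.0, 2.0, 3.0),
    (1.1, 2.1, 3.2),
    (4.0, 5.0, 6.0),
)


@lru_cache(maxsize=None)
def estimate(record):
    """Fill the None entries of `record` from the nearest complete record.

    The nearest-record rule is an assumed stand-in estimator. Because the
    function is cached, an identical incomplete record is estimated only once.
    """
    known = [i for i, value in enumerate(record) if value is not None]
    nearest = min(
        COMPLETE,
        key=lambda row: sum((row[i] - record[i]) ** 2 for i in known),
    )
    return tuple(
        nearest[i] if value is None else value
        for i, value in enumerate(record)
    )


def impute(records):
    """Impute every record, reusing stored results for duplicated records."""
    return [estimate(tuple(record)) for record in records]
```

Calling `impute` on a list that contains the same incomplete record several times runs the estimator once per distinct record; `estimate.cache_info()` reports how many calls were answered from the stored results.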
Dynamic programming uses the principle of optimality, which can be translated into optimization in stages: for an optimal sequence of decisions, each sub-sequence must itself be optimal (Bellman, 1957). The estimation of missing data requires a system that possesses knowledge of certain characteristics, such as the correlations between variables, that are inherent in the input space. Computational intelligence techniques and maximum likelihood techniques do possess such characteristics and as a result are important for the imputation of missing data (Nelwamondo & Marwala, 2007). Dynamic programming can be a useful tool for the missing data problem because it optimizes all the sub-steps in the solution: by the principle of optimality, obtaining the best estimate of the missing data requires every step leading to the solution to be optimized. This concept has several advantages that can improve the method proposed by Abdella and Marwala (2006), which is used as the baseline method in this chapter.

Therefore, missing data is estimated, in this section, by solving the following missing data estimation equation (Abdella & Marwala, 2006):

$$E = \begin{Bmatrix} X_k \\ X_u \end{Bmatrix} - f\left(\begin{Bmatrix} X_k \\ X_u \end{Bmatrix}, \{w\}\right) \tag{13.5}$$

In equation 13.5, the vectors $\{X_k\}$ and $\{X_u\}$ are the known and unknown measurements, respectively, $E$ is the resulting estimation error, and the vector $\{w\}$ is the weight vector that maps the input to the output. In this chapter the mapping is implemented using a two-layered autoassociative multi-layer perceptron neural network, which is defined by the function