variable in the DAG has no nondescendants. (Recall that in this book we do not consider parents to be nondescendants.) So each variable is trivially independent of its nondescendants given its parents, and the Markov condition is satisfied. Another way to see this is to note that the chain rule (see Chapter 2, Section 2.2.1) says that for all values x, y, z, and w of X, Y, Z, and W,

$$P(x, y, z, w) = P(w \mid z, y, x)\,P(z \mid y, x)\,P(y \mid x)\,P(x).$$

So P is equal to the product of its conditional distributions in the DAG in Figure 4.7, which means, by Theorem 3.1 in Chapter 3, that P satisfies the Markov condition with that DAG.

Recall that our goal with a Bayesian network is to represent a probability distribution succinctly. A complete DAG does not accomplish this because, if there are n binomial variables, the last variable in a complete DAG has n − 1 parents and would therefore require $2^{n-1}$ conditional distributions, one for each combination of its parents' values. To represent a distribution P succinctly, we need to find a sparse DAG (one containing few edges) that satisfies the Markov condition with P. The next two sections present two methods for doing this.

4.3 Score-Based Structure Learning*
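As a rough numerical illustration of why structure learning searches for a sparse DAG, the following Python sketch (an illustration, not code from this book) does two things for binary variables. First, it checks that a randomly generated joint distribution of four variables, standing in for X, Y, Z, and W, equals the chain-rule product of its conditional distributions, so it satisfies the Markov condition with a complete DAG. Second, it counts the conditional distributions each variable needs, namely 2^k for a variable with k parents. The helper names, the random joint distribution, and the chain-shaped sparse DAG used for comparison are all hypothetical choices made for this example; the sketch does not check whether any particular P satisfies the Markov condition with the chain.

```python
import itertools
import random

# Hypothetical helpers for illustration only. A joint distribution is a dict
# mapping tuples of 0/1 values (x1, ..., xn) to probabilities.

def marginal(joint, k):
    """Marginal distribution of the first k variables (k = 0 gives {(): 1.0})."""
    out = {}
    for x, p in joint.items():
        out[x[:k]] = out.get(x[:k], 0.0) + p
    return out

def chain_rule_product(joint, x):
    """P(x1) P(x2 | x1) ... P(xn | x1, ..., x_{n-1}): the product of the
    conditional distributions of a complete DAG with ordering x1, ..., xn."""
    prod = 1.0
    for k in range(1, len(x) + 1):
        # P(x_k | x_1..x_{k-1}) = P(x_1..x_k) / P(x_1..x_{k-1})
        prod *= marginal(joint, k)[x[:k]] / marginal(joint, k - 1)[x[:k - 1]]
    return prod

def complete_dag_counts(n):
    """Conditional distributions per node in a complete DAG over n binary
    variables: the i-th node (0-based) has i parents, hence 2**i of them."""
    return [2 ** i for i in range(n)]

def chain_dag_counts(n):
    """Same count for a hypothetical sparse chain X1 -> X2 -> ... -> Xn,
    where every node except the first has exactly one parent."""
    return [1] + [2] * (n - 1)

# A random, strictly positive joint distribution over four binary variables,
# standing in for X, Y, Z, W (made up for this example, not data from the text).
random.seed(0)
weights = {x: random.random() + 0.01 for x in itertools.product((0, 1), repeat=4)}
total = sum(weights.values())
joint = {x: w / total for x, w in weights.items()}

# The chain rule holds for every joint distribution, so the complete DAG always fits.
assert all(abs(chain_rule_product(joint, x) - p) < 1e-12 for x, p in joint.items())

print(complete_dag_counts(4))        # [1, 2, 4, 8]
print(complete_dag_counts(10)[-1])   # 512, i.e. 2**(10 - 1)
print(sum(complete_dag_counts(10)), sum(chain_dag_counts(10)))  # 1023 19
```

Summed over all nodes, a complete DAG over n binary variables needs 2^n − 1 conditional distributions (1023 for n = 10), whereas a DAG in which every variable has at most one parent needs only 2n − 1 (19 for n = 10).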