11/24/2003
CSE 571 Notes
Ryan Weddle

[1] Last Time...

Learning Structural Causal Models...

The same techniques used for learning structural causal models can also
be used to learn the structure of Bayes Networks.

[2] Structure

In causal models, people initially believed you needed time series data
to learn causality, but Pearl contends that certain information about
causal structure can be learned from steady state data. As a guideline,
we are interested in learning structures based upon some notion of
minimality...

[2.1] Causal Structure

We define a causal structure as a directed acyclic graph. A causal
model, then, is a causal structure paired with a set of parameters. The
parameters are functions defining the values of nodes in the graph in
terms of the values of their parents.

Causal Model = < D, ThetaD >, where
   D      = directed acyclic graph
   ThetaD = parameters over graph

[2.2] Inferred Causation

A variable X is said to have a causal influence on a variable Y if a
directed path from X to Y exists in every MINIMAL structure consistent
with the data.

[2.3] Observational Equivalence

Two DAGs are observationally equivalent if and only if they have the
same skeleton and the same set of v-structures. A v-structure is a pair
of converging arrows whose tails are not connected by an arrow.

Examples:
---------
   O     O             O     O
    \   /    -----      ^   ^      These two structures are congruent
     v v                 \ /       in the sense that they have the
      O      -----        O        same skeleton.
      |                   ^
      v      -----        |
      O                   O
---------
   v-structure         not a v-structure
   a     b             a---->b     (tails connected by
    \   /               \   /       an arrow)
     v v                 v v
      c                   c
---------
   a           a
   |    ---    ^      These two structures are observationally
   v           |      equivalent.
   b    ---    b
   |           ^
   v    ---    |
   c           c

Due to observational equivalence, we are able to infer some, but not
all, kinds of causality from steady state data.

[2.4] Latent Structures & Consistency:

A latent structure is a causal structure D together with the subset of
its vertices that are observable.
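
Looking back at the equivalence test of [2.3]: it can be sketched in a few lines of code. This is only an illustration, not something from the lecture. It assumes a DAG is given as a set of (parent, child) pairs over the same node names, and the helper names (skeleton, v_structures, observationally_equivalent) are hypothetical.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of a DAG given as a set of (parent, child) pairs."""
    return {frozenset(e) for e in edges}


def v_structures(edges):
    """All triples (a, c, b) with a -> c <- b where a and b are non-adjacent."""
    skel = skeleton(edges)
    parents = {}
    for p, c in edges:
        parents.setdefault(c, set()).add(p)
    found = set()
    for c, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in skel:  # tails not connected
                found.add((a, c, b))
    return found


def observationally_equivalent(d1, d2):
    """Same skeleton and same v-structures (assumes both use the same nodes)."""
    return skeleton(d1) == skeleton(d2) and v_structures(d1) == v_structures(d2)


# The examples from [2.3]
chain_down = {("a", "b"), ("b", "c")}            # a -> b -> c
chain_up = {("c", "b"), ("b", "a")}              # a <- b <- c
collider = {("a", "c"), ("b", "c")}              # a -> c <- b
shielded = {("a", "c"), ("b", "c"), ("a", "b")}  # tails connected by a -> b
```

Under these assumptions the two chains come out equivalent, the collider has the single v-structure (a, c, b), and the shielded version has none.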
A structure, D, is consistent with a distribution if there exist
parameters, ThetaD, such that the probabilities calculated from the
model < D, ThetaD > are the same as those calculated from the
distribution.

Examples: See Judea Pearl Chapter/Notes, Figure 2.1
  -only figures a and b are good (consistent and minimal).

[2.5] Stability

Another notion important in finding good causal models is that of
stability. Intuitively, this means that the structure should be
consistent not only with the exact distribution observed, but also with
the other distributions that result if we perturb the parameters.

More formally, we say:
A causal model M = < D, ThetaD > generates a stable distribution if and
only if P(< D, ThetaD >) contains no extraneous independencies, or

   I(P(< D, ThetaD >)) <= I(P(< D, Theta'D >))

for any set of parameters Theta'D, where I(P) is the set of
independencies that hold in P and <= denotes set inclusion.

This is illustrated by the coin example:
Imagine a pair of fair coins A and B, and an observed variable C with
the following distribution.

   A B C
   0 0 1      If the coins A and B are fair, that is, if
   0 1 0      P(A)=P(B)=0.5, then we expect each case to
   1 0 0      occur with probability 0.25.
   1 1 1

The following three structures, then, are all consistent with this
distribution:

   a -> c <- b
   b -> a <- c
   a -> b <- c

Imagine that the coins are not fair, with slightly differing
probabilities of coming up "heads", for example, consider P(A)=0.6,
P(B)=0.3. The distribution is no longer uniform, but has the following
probabilities:

   A B C  P
   0 0 1  0.28
   0 1 0  0.12
   1 0 0  0.42
   1 1 1  0.18

In this case, we observe that only one of the three structures is
consistent with the distribution:

   a -> c <- b

It is then an example of a stable structure over the initial
distribution, while the remaining two structures are not stable, since
they are inconsistent with the perturbed distribution (the unfair
coins):

   b -> a <- c
   a -> b <- c

[3] An Algorithm For Building Causal Models

[3.1] Constructing Graphs

Consider a graph with n nodes...
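
To get a feel for the size of this search space, the number of DAGs on n labelled nodes can be computed with Robinson's recurrence. This is a standard counting result and was not part of the lecture; the function name count_dags is hypothetical.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of DAGs on n labelled nodes, via Robinson's recurrence:
    a(n) = sum_{k=1..n} (-1)^(k+1) * C(n, k) * 2^(k(n-k)) * a(n-k), a(0) = 1.
    """
    if n == 0:
        return 1
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
        for k in range(1, n + 1)
    )


for n in range(1, 6):
    print(n, count_dags(n))  # 1, 3, 25, 543, 29281
```

Five nodes already admit 29281 DAGs.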
As n increases, the number of possible directed acyclic graphs grows
faster than exponentially. It is therefore unreasonable to find
structures satisfying these constraints by means of enumeration, so we
will need an algorithm.

The algorithm we will consider will generate a partially directed graph
where:

   a->b   means a causal connection between a and b, and
   c--d   has a special meaning (explained next)

This partially directed graph is a summary of all MINIMAL STABLE
structures that are consistent with the data. If we arbitrarily add
arrows to any of the undirected links (e.g. c--d) without introducing
any new v-structures or cycles, then a valid MINIMAL STABLE structure
will result.

The algorithm used to do this is called the inductive causation
algorithm.

[3.2] The IC (Inductive Causation) Algorithm

Input:  P^, a stable distribution on a set V of variables.
Output: a pattern H(P^) compatible with P^.

1. For each pair of variables a,b in V, search for a set Sab such that
   "a is independent of b | Sab" holds in P^. Construct a graph G such
   that a, b are connected with an edge if and only if no such set Sab
   can be found.

2. For each pair of non-adjacent variables a and b with a common
   neighbor c, check if c belongs to Sab.
   If it does, then continue.
   If it does not, then add arrows pointing to c: a -> c <- b

3. In the partially directed graph that results, orient as many edges
   as possible subject to the conditions that you:
   (I)  do not introduce any cycles
   (II) do not introduce any new v-structures

[3.3] Rules for Orienting Edges:

R1: orient b-c into b->c whenever there is an arrow a->b such that
    a and c are non-adjacent.

R2: orient a-b into a->b whenever there is a chain a->c->b.

R3: orient a-b into a->b whenever there are two chains a-c->b and
    a-d->b, such that c and d are non-adjacent.

R4: orient a-b into a->b whenever there are two chains a-c->d and
    c->d->b, such that c and b are non-adjacent and a and d are
    adjacent.
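
Steps 1 and 2 of the IC algorithm can be sketched on the coin example from [2.5]. This is a minimal sketch under stated assumptions, not the lecture's implementation. It assumes the exact joint distribution is known, so independence is checked by comparing products of probabilities up to a small tolerance instead of by a statistical test. It does not implement step 3 or the rules R1-R4. All function names are hypothetical.

```python
from itertools import combinations, product


def coin_joint(pa, pb):
    """Joint over (A, B, C): independent coins A, B and C = 1 iff A == B."""
    joint = {}
    for a, b in product((0, 1), repeat=2):
        p = (pa if a else 1 - pa) * (pb if b else 1 - pb)
        joint[(a, b, int(a == b))] = p
    return joint


def marginal(joint, idx):
    """Marginal distribution over the variable positions listed in idx."""
    out = {}
    for x, p in joint.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + p
    return out


def independent(joint, i, j, cond, tol=1e-9):
    """X_i indep. X_j given X_cond iff P(i,j,s) P(s) == P(i,s) P(j,s) for all values."""
    cond = tuple(cond)
    p_ijs = marginal(joint, (i, j) + cond)
    p_is = marginal(joint, (i,) + cond)
    p_js = marginal(joint, (j,) + cond)
    p_s = marginal(joint, cond)
    for xi, xj in product((0, 1), repeat=2):  # binary variables assumed
        for s in p_s:
            lhs = p_ijs.get((xi, xj) + s, 0.0) * p_s[s]
            rhs = p_is.get((xi,) + s, 0.0) * p_js.get((xj,) + s, 0.0)
            if abs(lhs - rhs) > tol:
                return False
    return True


def ic_skeleton_and_colliders(joint, names):
    """IC steps 1 and 2: build the skeleton, then orient v-structures."""
    n = len(names)
    sepset, edges = {}, set()
    # Step 1: connect i and j unless some separating set Sij exists.
    for i, j in combinations(range(n), 2):
        others = [k for k in range(n) if k not in (i, j)]
        for size in range(len(others) + 1):
            hit = next((s for s in combinations(others, size)
                        if independent(joint, i, j, s)), None)
            if hit is not None:
                sepset[(i, j)] = set(hit)
                break
        else:
            edges.add(frozenset((i, j)))
    # Step 2: for non-adjacent i, j with common neighbor c not in Sij,
    # add the arrows i -> c <- j.
    arrows = set()
    for (i, j), s in sepset.items():
        for c in range(n):
            if (frozenset((i, c)) in edges and frozenset((j, c)) in edges
                    and c not in s):
                arrows.add((names[i], names[c]))
                arrows.add((names[j], names[c]))
    skel = {frozenset(names[k] for k in e) for e in edges}
    return skel, arrows


skel, arrows = ic_skeleton_and_colliders(coin_joint(0.6, 0.3), ("a", "b", "c"))
print(sorted(arrows))  # [('a', 'c'), ('b', 'c')], i.e. a -> c <- b
```

With the unfair coins the sketch recovers a -> c <- b. With fair coins, coin_joint(0.5, 0.5), every pair looks independent and the skeleton comes out empty, which is why the algorithm requires a stable distribution as input.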