11/24/2003 CSE 571 Notes
Ryan Weddle

[1] Last Time...

  Learning Structural Causal Models...

  The same techniques used for learning structural causal models can also
  be used to learn the structure of Bayes Networks.

[2] Structure

  For causal models, it was initially believed that time series data
  were needed to learn causality, but Pearl contends that certain
  information about causal structure can be learned from steady state
  data.

  As a guideline, we are interested in learning structures based upon
  some notion of minimality...

  [2.1] Causal Structure

    We define a causal structure as a directed acyclic graph.

    A causal model, then, is a causal structure paired with a set of
    parameters.  The parameters are functions defining the value of each
    node in the graph in terms of the values of its parents.

      Causal Model = < D, ThetaD>,  where
        D = directed acyclic graph
        ThetaD = parameters over graph
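    One way to make the pair <D, ThetaD> concrete is the Python sketch
    below. All names are hypothetical, and it assumes Python 3.9+ for
    graphlib. D is stored as a map from each node to its list of parents,
    and ThetaD as one function per node. As an added assumption not in
    the notes, each function also receives a random number generator,
    standing in for the unobserved noise that makes root nodes random.
    The example encodes the coin model a -> c <- b from section [2.5].

```python
import random
from graphlib import TopologicalSorter
from typing import Callable, Dict, List

# D: each node mapped to the list of its parents (must be acyclic).
Structure = Dict[str, List[str]]
# ThetaD: each node mapped to a function (parent values, rng) -> value.
# The rng argument is an assumption standing in for unobserved noise.
Parameters = Dict[str, Callable[[Dict[str, int], random.Random], int]]


def sample(D: Structure, theta: Parameters, rng: random.Random) -> Dict[str, int]:
    """Draw one joint assignment by evaluating nodes parents-first."""
    values: Dict[str, int] = {}
    # static_order() yields every node after all of its parents
    # (and raises graphlib.CycleError if D is not a DAG).
    for node in TopologicalSorter(D).static_order():
        parent_values = {p: values[p] for p in D.get(node, [])}
        values[node] = theta[node](parent_values, rng)
    return values


# Hypothetical example: the coin model a -> c <- b.
D = {"a": [], "b": [], "c": ["a", "b"]}
theta = {
    "a": lambda pv, rng: int(rng.random() < 0.6),  # P(A=1) = 0.6
    "b": lambda pv, rng: int(rng.random() < 0.3),  # P(B=1) = 0.3
    "c": lambda pv, rng: int(pv["a"] == pv["b"]),  # C = 1 iff A == B
}
rng = random.Random(0)
samples = [sample(D, theta, rng) for _ in range(1000)]
```

    Keeping D and ThetaD as separate maps mirrors the notation: the same
    D can be paired with a different theta (a different Theta'D) without
    touching the graph.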

  [2.2] Inferred Causation

    A variable X is said to have a causal influence on a variable Y if a
    directed path from X to Y exists in every MINIMAL structure consistent
    with the data.
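    The definition can be read as a check over a given list of minimal
    structures. The sketch below assumes that list is already available
    (finding the minimal structures is the hard part and is not shown);
    has_directed_path and causal_influence are hypothetical helper names.

```python
from typing import Dict, Iterable, List

Graph = Dict[str, List[str]]  # node -> list of children


def has_directed_path(g: Graph, x: str, y: str) -> bool:
    """Depth-first search for a directed path x -> ... -> y."""
    stack, seen = [x], {x}
    while stack:
        node = stack.pop()
        for child in g.get(node, []):
            if child == y:
                return True
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return False


def causal_influence(x: str, y: str, minimal_structures: Iterable[Graph]) -> bool:
    """True if every given minimal structure has a directed path from x to y."""
    return all(has_directed_path(g, x, y) for g in minimal_structures)


# Hypothetical inputs: two structures that differ only in edge direction.
forward = {"a": ["b"], "b": ["c"]}
backward = {"c": ["b"], "b": ["a"]}
```

    For example, causal_influence("a", "c", [forward]) is True, but
    causal_influence("a", "c", [forward, backward]) is False, because the
    backward structure has no directed path from a to c.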

  [2.3] Observational Equivalence

    Two DAGs are observationally equivalent if and only if they have the
    same skeletons and the same sets of v-structures, that is, two
    converging arrows whose tails are not connected by arrows.

    Examples:
    ---------

       O     O               O     O
        \   /       -----     ^   ^      These two structures have the same
         v v                   \ /       skeleton, but they are not
          O         -----       O        observationally equivalent: only
          |                     ^        the left one contains a
          v         -----       |        v-structure.
          O                     O


    ---------
 
     v-structure                 not a v-structure

       a     b                         a---->b         (tails connected by
        \   /                           \   /           an arrow)
         v v                             v v
          c                               c


    ---------

       a           a 
       |    ---    ^         These two structures are observationally
       v           |         equivalent.
       b    ---    b
       |           ^
       v    ---    |
       c           c


    Due to observational equivalence, we are able to observe some, but
    not all, kinds of causality from steady state data.
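    The criterion translates directly into a comparison of two sets. This
    Python sketch (hypothetical helper names; both DAGs are assumed to be
    over the same variables) computes the skeleton and the v-structures of
    each DAG and compares them, using the examples above with the
    unlabeled nodes named x, y (top), z (middle) and w (bottom).

```python
from itertools import combinations
from typing import FrozenSet, Set, Tuple

Edge = Tuple[str, str]  # (tail, head), i.e. tail -> head


def skeleton(edges: Set[Edge]) -> Set[FrozenSet[str]]:
    """The DAG with arrow directions dropped."""
    return {frozenset(e) for e in edges}


def v_structures(edges: Set[Edge]) -> Set[Tuple[FrozenSet[str], str]]:
    """All a -> c <- b where a and b are not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for tail, head in edges:
        parents.setdefault(head, set()).add(tail)
    found = set()
    for c, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in skel:
                found.add((frozenset((a, b)), c))
    return found


def observationally_equivalent(d1: Set[Edge], d2: Set[Edge]) -> bool:
    return skeleton(d1) == skeleton(d2) and v_structures(d1) == v_structures(d2)


# The examples above.
left = {("x", "z"), ("y", "z"), ("z", "w")}
right = {("z", "x"), ("z", "y"), ("w", "z")}
chain_down = {("a", "b"), ("b", "c")}
chain_up = {("b", "a"), ("c", "b")}
```

    Here observationally_equivalent(left, right) is False even though
    skeleton(left) == skeleton(right), while the two chains are
    equivalent.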


  [2.4] Latent Structures & Consistency:

    A latent structure is a causal structure D together with a subset O
    of its vertices which are observable; the remaining vertices are
    latent (hidden).

    A structure, D, is consistent with a distribution if there exist
    parameters, ThetaD, such that the model <D, ThetaD> generates that
    distribution, i.e. the probabilities it assigns to the observed
    variables are the same as those of the distribution.

    Examples:

      See Judea Pearl Chapter/Notes, Figure 2.1
        -only figures a and b are good (consistent and minimal).


  [2.5] Stability

    Another notion important in finding good causal models is that of
    stability.  Intuitively, this means that the structure should be
    consistent not only with the exact distribution observed, but also
    with other distributions that result if we perturb the distribution.

    More formally, we say:

      A causal model M = <D, ThetaD> generates a stable distribution if
      and only if P(<D, ThetaD>) contains no extraneous independencies,
      i.e.

        I(P(<D,ThetaD>)) <= I(P(<D,Theta'D>))
        for any set of parameters Theta'D

      where I(P) is the set of independencies that hold in P and <=
      denotes set inclusion.

    This is illustrated by the coin example:

      Imagine a pair of fair coins, A and B, and an observed variable C
      with the following distribution (C is 1 exactly when A and B agree).

        A  B  C
        0  0  1         If the coins A and B are fair, that is, if
        0  1  0         P(A)=P(B)=0.5, then we expect each case to
        1  0  0         occur with probability 0.25.
        1  1  1

      Since each pair of variables is independent and any one of them is
      determined by the other two, the following three structures are
      all consistent with this distribution:

             a -> c <- b
             b -> a <- c
             a -> b <- c
  
      Now imagine that the coins are not fair and have different
      probabilities of coming up "heads" (1), for example P(A)=0.6 and
      P(B)=0.3.  The distribution is no longer uniform; it has the
      following probabilities:

        A  B  C   P
        0  0  1   0.28
        0  1  0   0.12
        1  0  0   0.42
        1  1  1   0.18
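      The table can be reproduced by multiplying the coin probabilities,
      since A and B are independent and C is fixed by them. The sketch
      below uses exact fractions so the products match the table without
      rounding; coin_joint is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product


def coin_joint(p_a, p_b):
    """Joint P(A, B, C) for independent coins with P(A=1)=p_a, P(B=1)=p_b,
    where C = 1 exactly when A == B (as in the table)."""
    joint = {}
    for a, b in product((0, 1), repeat=2):
        pa = p_a if a else 1 - p_a
        pb = p_b if b else 1 - p_b
        joint[(a, b, int(a == b))] = pa * pb
    return joint


fair = coin_joint(Fraction(1, 2), Fraction(1, 2))
unfair = coin_joint(Fraction(3, 5), Fraction(3, 10))
for row, p in sorted(unfair.items()):
    print(*row, float(p))  # prints the rows above: 0.28, 0.12, 0.42, 0.18
```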

      In this case, A and B are still independent, but A and C (and B and
      C) no longer are, so only one of the three structures is consistent
      with the distribution:

             a -> c <- b

      It is thus an example of a stable structure for the initial
      distribution, while the remaining two structures are not stable,
      since they are inconsistent with the perturbed (unfair-coin)
      distribution:

             b -> a <- c
             a -> b <- c
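      The consistency claims can be checked mechanically. Over three
      variables, a structure x -> z <- y is consistent with a distribution
      exactly when x and y are marginally independent, since any such
      distribution factors as P(x)P(y)P(z|x,y). The sketch below
      (hypothetical names, probabilities copied from the two tables above)
      tests that condition for each of the three structures.

```python
from fractions import Fraction
from typing import Dict, Tuple

Joint = Dict[Tuple[int, int, int], Fraction]  # (A, B, C) -> probability

fair: Joint = {(0, 0, 1): Fraction(1, 4), (0, 1, 0): Fraction(1, 4),
               (1, 0, 0): Fraction(1, 4), (1, 1, 1): Fraction(1, 4)}
unfair: Joint = {(0, 0, 1): Fraction(7, 25), (0, 1, 0): Fraction(3, 25),
                 (1, 0, 0): Fraction(21, 50), (1, 1, 1): Fraction(9, 50)}

# Each structure is identified by the indices of its two parents
# (A = 0, B = 1, C = 2), which must be marginally independent.
STRUCTURES = {"a -> c <- b": (0, 1), "b -> a <- c": (1, 2), "a -> b <- c": (0, 2)}


def marginal(joint: Joint, idx: Tuple[int, ...]) -> Dict[Tuple[int, ...], Fraction]:
    out: Dict[Tuple[int, ...], Fraction] = {}
    for row, p in joint.items():
        key = tuple(row[i] for i in idx)
        out[key] = out.get(key, Fraction(0)) + p
    return out


def independent(joint: Joint, i: int, j: int) -> bool:
    """Exact test of P(i, j) == P(i) P(j) for every pair of values."""
    pij, pi, pj = marginal(joint, (i, j)), marginal(joint, (i,)), marginal(joint, (j,))
    return all(pij.get((x, y), Fraction(0)) == pi[(x,)] * pj[(y,)]
               for (x,) in pi for (y,) in pj)


def consistent_structures(joint: Joint):
    return [name for name, (i, j) in STRUCTURES.items() if independent(joint, i, j)]
```

      consistent_structures(fair) returns all three structures, while
      consistent_structures(unfair) returns only ["a -> c <- b"].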


[3] An Algorithm For Building Causal Models

  [3.1] Constructing Graphs

    Consider a graph with n nodes.  As n increases, the number of possible
    directed acyclic graphs grows super-exponentially.

    It is therefore impractical to find structures satisfying these
    constraints by enumeration, so we will need a more efficient algorithm.
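    To put numbers on that growth, the sketch below counts labeled DAGs on
    n nodes with Robinson's recurrence, a standard counting formula that is
    not part of these notes: a(0) = 1 and a(n) = sum over k = 1..n of
    (-1)^(k+1) C(n,k) 2^(k(n-k)) a(n-k). num_dags is a hypothetical name.

```python
from functools import lru_cache
from math import comb


@lru_cache(maxsize=None)
def num_dags(n: int) -> int:
    """Number of labeled DAGs on n nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes that have no incoming edges.
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
               for k in range(1, n + 1))
```

    It gives 3, 25, 543 and 29281 for n = 2 through 5, and already
    3,781,503 for n = 6.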

    The algorithm we will consider generates a partially directed graph
    in which:

      a->b   means a causal connection from a to b
        and
      c--d   means the data do not determine the direction of the link

    This partially directed graph is a summary of all MINIMAL STABLE
    structures that are consistent with the data.  If we orient any of the
    undirected links (e.g. c--d) arbitrarily, without introducing any new
    v-structures or directed cycles, then a valid MINIMAL STABLE structure
    will result.

    The algorithm used to do this is called the inductive causation
    algorithm.
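    The summary property suggests a brute-force way to list the structures
    a pattern stands for: try every orientation of the undirected links and
    keep those that are acyclic and add no new v-structures. This Python
    sketch (hypothetical names; exponential in the number of undirected
    links, so only for small examples; Python 3.9+ for graphlib) does
    exactly that.

```python
from graphlib import CycleError, TopologicalSorter
from itertools import combinations, product


def v_structures(edges, skel):
    """All (frozenset({a, b}), c) with a -> c <- b and a, b non-adjacent."""
    parents = {}
    for tail, head in edges:
        parents.setdefault(head, set()).add(tail)
    return {(frozenset((a, b)), c)
            for c, ps in parents.items()
            for a, b in combinations(sorted(ps), 2)
            if frozenset((a, b)) not in skel}


def is_acyclic(edges):
    preds = {}
    for tail, head in edges:
        preds.setdefault(head, set()).add(tail)
    try:
        tuple(TopologicalSorter(preds).static_order())
        return True
    except CycleError:
        return False


def extensions(directed, undirected):
    """Yield every DAG obtained by orienting the undirected links
    without creating a directed cycle or a new v-structure."""
    undirected = [tuple(e) for e in undirected]
    skel = {frozenset(e) for e in directed} | {frozenset(e) for e in undirected}
    target = v_structures(directed, skel)
    for flips in product((False, True), repeat=len(undirected)):
        edges = set(directed) | {(b, a) if flip else (a, b)
                                 for (a, b), flip in zip(undirected, flips)}
        if is_acyclic(edges) and v_structures(edges, skel) == target:
            yield edges
```

    For the pattern a--b--c, list(extensions(set(), [("a", "b"), ("b", "c")]))
    yields three DAGs; the fourth orientation, a->b<-c, is rejected
    because it creates a new v-structure.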


  [3.2] The IC (Inductive Causation) Algorithm

    Input:   P^, a stable distribution on a set V of variables.
    Output:  a pattern H(P^) compatible with P^.

    1.  For each pair of variables a, b in V, search for a set Sab such
        that "a is independent of b | Sab" holds in P^.

        Construct an undirected graph G such that a and b are connected
        by an edge if and only if no such set Sab can be found.

    2.  For each pair of non-adjacent variables a and b with a common
        neighbor c, check whether c belongs to Sab.

        If it does, then continue.
        If it does not, then orient both edges into c:

               a -> c <- b

    3.  In the partially directed graph that results, orient as many of
        the remaining undirected edges as possible, subject to the
        conditions that you:

           (I)  do not introduce any directed cycles
           (II) do not introduce any new v-structures
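    Steps 1 and 2 can be sketched in Python as follows. The independence
    oracle indep(a, b, S) is hypothetical: in practice it would be a
    statistical test on data, which these notes do not specify. The search
    for Sab tries every subset of the remaining variables, smallest first,
    so it is exponential in |V|. Step 3 is left to the orientation rules
    of section [3.3].

```python
from itertools import combinations


def ic_steps_1_and_2(variables, indep):
    """Return (directed, undirected) after IC steps 1 and 2.

    indep(a, b, S) is a hypothetical oracle: True when
    "a is independent of b | S" holds in the input distribution.
    """
    skeleton, sepset = set(), {}
    # Step 1: connect a and b iff no separating set Sab exists.
    for a, b in combinations(variables, 2):
        others = [v for v in variables if v not in (a, b)]
        candidates = (frozenset(s) for k in range(len(others) + 1)
                      for s in combinations(others, k))
        s_ab = next((s for s in candidates if indep(a, b, s)), None)
        if s_ab is None:
            skeleton.add(frozenset((a, b)))
        else:
            sepset[frozenset((a, b))] = s_ab
    # Step 2: for non-adjacent a, b with a common neighbor c not in Sab,
    # orient a -> c <- b.
    directed = set()
    for pair, s_ab in sepset.items():
        a, b = tuple(pair)
        for c in variables:
            if (c not in pair and frozenset((a, c)) in skeleton
                    and frozenset((b, c)) in skeleton and c not in s_ab):
                directed |= {(a, c), (b, c)}
    undirected = {e for e in skeleton
                  if not any((x, y) in directed for x in e for y in e)}
    return directed, undirected
```

    With the unfair-coin independencies from section [2.5] (A and B
    independent given the empty set, and no other independencies), the
    sketch returns the arrows A -> C <- B and no undirected edges.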


  [3.3] Rules for Orienting Edges:

    R1:  orient b-c into b->c whenever there is an arrow a->b such
         that a and c are non-adjacent.

    R2:  orient a-b into a->b whenever there is a chain a->c->b.

    R3:  orient a-b into a->b whenever there are two chains
           a-c->b and a-d->b, such that c and d are non-adjacent.

    R4:  orient a-b into a->b whenever there are two chains
           a-c->d and c->d->b, such that c and b are non-adjacent
           and a and d are adjacent.
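    A minimal sketch of step 3 applies R1 through R4 repeatedly until no
    rule fires. The graph encoding (a set of (x, y) tuples for x->y and a
    set of two-element frozensets for x-y) and the function name
    apply_rules are assumptions of this example.

```python
from itertools import combinations


def apply_rules(directed, undirected):
    """Orient undirected edges with R1-R4 until no rule applies.

    directed:   set of (x, y) tuples meaning x->y
    undirected: iterable of pairs meaning x-y
    """
    directed = set(directed)
    undirected = {frozenset(e) for e in undirected}
    nodes = {v for e in directed for v in e} | {v for e in undirected for v in e}

    def und(x, y):
        return frozenset((x, y)) in undirected

    def adj(x, y):
        return (x, y) in directed or (y, x) in directed or und(x, y)

    def should_orient(a, b):
        """True if some rule orients the undirected edge a-b into a->b."""
        others = nodes - {a, b}
        for c in others:
            # R1: an arrow c->a with c and b non-adjacent.
            if (c, a) in directed and not adj(c, b):
                return True
            # R2: a chain a->c->b.
            if (a, c) in directed and (c, b) in directed:
                return True
        # R3: chains a-c->b and a-d->b with c and d non-adjacent.
        mids = [c for c in others if und(a, c) and (c, b) in directed]
        if any(not adj(c, d) for c, d in combinations(mids, 2)):
            return True
        # R4: chains a-c->d and c->d->b, c and b non-adjacent, a and d adjacent.
        for c in others:
            for d in others - {c}:
                if (und(a, c) and (c, d) in directed and (d, b) in directed
                        and not adj(c, b) and adj(a, d)):
                    return True
        return False

    changed = True
    while changed:
        changed = False
        for edge in list(undirected):
            x, y = tuple(edge)
            for a, b in ((x, y), (y, x)):
                if should_orient(a, b):
                    undirected.discard(edge)
                    directed.add((a, b))
                    changed = True
                    break
    return directed, undirected
```

    For instance, apply_rules({("a", "b")}, [("b", "c")]) orients b-c into
    b->c by R1, since a and c are non-adjacent.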