Exploratory studies: you have empirical data and you want to know what sorts of causal models are...


Transcript of Exploratory studies: you have empirical data and you want to know what sorts of causal models are...

Page 1:

Exploratory studies: you have empirical data and you want to know what sorts of causal models are consistent with it.

Confirmatory tests: you have a causal hypothesis and you want to see if the empirical data agree with it.

Do the data agree with my hypothesis?

I think that: A B C (hypothesis)

My data has this pattern of correlation within it.

What causal processes could have generated this pattern?

Page 2:

[Figure: a “3-D” causal process among A, B, C, D and E, and its “2-D” correlational shadow]

Hypothesis generation

B & C independent given A

A & D independent given B & C

B & E independent given D

and so on...
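Independence statements of this kind can be read off a causal graph with d-separation. The sketch below assumes a hypothetical graph A → B, A → C, B → D, C → D, D → E; the slide's figure is not reproduced in this transcript, so this structure is only an assumption chosen to agree with the statements listed. The function name and the child-to-parents layout are illustrative.

```python
from itertools import combinations

# Hypothetical DAG (an assumption), written as child -> set of parents.
DAG = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}, "E": {"D"}}

def d_separated(parents, x, y, given):
    """Return True if x and y are d-separated by the set `given`."""
    # 1. Keep only x, y, the conditioning set and all of their ancestors.
    keep, stack = set(), [x, y, *given]
    while stack:
        v = stack.pop()
        if v not in keep:
            keep.add(v)
            stack.extend(parents[v])
    # 2. Moralise: link each child to its parents and the parents to each other.
    adj = {v: set() for v in keep}
    for child in keep:
        for p in parents[child]:
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(parents[child], 2):
            adj[p].add(q)
            adj[q].add(p)
    # 3. d-separated if no path from x to y avoids the conditioning set.
    seen, stack = {x}, [x]
    while stack:
        for w in adj[stack.pop()]:
            if w == y:
                return False
            if w not in seen and w not in given:
                seen.add(w)
                stack.append(w)
    return True

print(d_separated(DAG, "B", "C", {"A"}))       # True
print(d_separated(DAG, "A", "D", {"B", "C"}))  # True
print(d_separated(DAG, "B", "C", {"A", "D"}))  # False: D is a common effect of B and C
```

The last call shows why the conditioning set matters: conditioning on a common effect opens a path that was otherwise closed.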

Page 3:

Besides the notion of d-separation, we need one other notion: faithfulness of data to a causal graph.

Is there one Bighorn Sheep in this picture, or are there two, except that the second is hidden behind the first?

Both cases are possible, but the second case requires a very special combination of factors, i.e. that the second animal is positioned so that it gives the illusion of being absent.

If the second case happens, then we can say that this is unfaithful to our normal experience.

Page 4:

[Figure: causal graph with a direct path from A to B (coefficient +10) and an indirect path from A to B through C (coefficients -2 and +5)]

Overall effect of A on B: +10 + (-2*5) = 0

Because the two paths exactly cancel out, the overall correlation between A and B is zero; i.e. uncorrelated!

The joint probability distribution over A, B & C is unfaithful to the graph because it gives the illusion of independence between A and B, contrary to d-separation.

This will only occur when positive and negative values exactly cancel out (very special conditions), like seeing one sheep because the other one is hiding behind the first!

Unfaithfulness
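The cancellation can be checked with a small simulation. This is only a sketch under stated assumptions: linear relationships with standard normal noise, and the coefficient -2 placed on A → C and +5 on C → B (the transcript does not say which edge carries which value). The helper `corr` is illustrative.

```python
import math
import random

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)
n = 20000
A = [rng.gauss(0, 1) for _ in range(n)]
# A -> C with coefficient -2 (assumed placement)
C = [-2 * a + rng.gauss(0, 1) for a in A]
# A -> B with coefficient +10 and C -> B with coefficient +5
B = [10 * a + 5 * c + rng.gauss(0, 1) for a, c in zip(A, C)]

r_ab, r_ac, r_bc = corr(A, B), corr(A, C), corr(B, C)
# First-order partial correlation of A and B given C.
r_ab_given_c = (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))

print(round(r_ab, 3))          # close to zero: the two paths cancel
print(round(r_ab_given_c, 3))  # far from zero: the direct effect reappears given C
```

A and B look unrelated overall even though A directly causes B; holding C fixed removes the cancelling path and exposes the dependence.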

Page 5:

Obtaining the undirected dependency graph

[Figure: the true causal process among A, B, C, D and E, which we can’t see!]

Step 1: create a saturated undirected dependency graph.

[Figure: the saturated undirected graph, with a line between every pair of A, B, C, D and E]
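As a minimal sketch of Step 1, a saturated undirected graph over the five variables can be stored as a set of unordered pairs. The representation (a set of `frozenset` pairs) is just one convenient choice, not something the slides prescribe.

```python
from itertools import combinations

variables = ["A", "B", "C", "D", "E"]

# One undirected line per unordered pair of variables.
edges = {frozenset(pair) for pair in combinations(variables, 2)}

print(len(edges))  # 10, i.e. 5 * 4 / 2
```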

Page 6:

Obtaining the undirected dependency graph

Step 2: let the order n (i.e. the number) of conditioning variables be zero (i.e. no conditioning variables).

- For each unique pair of variables (X, Y) that are still adjacent in the graph:

- For each unique set Q of n other variables in the graph (in this case none):

Test the data to see if variables X and Y are independent given the conditioning set Q.

If X and Y are independent in the data, remove the line between them in the graph.
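The slides do not say which statistical test to use. As one hedged possibility, for continuous variables with roughly linear relationships and normal errors (an assumption), independence given Q can be tested with a partial correlation and Fisher's z-transform. All function names below are illustrative.

```python
import math
import random

def corr(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(data, x, y, given):
    """Partial correlation of x and y given the list `given` (recursive formula)."""
    if not given:
        return corr(data[x], data[y])
    z, rest = given[0], given[1:]
    r_xy = partial_corr(data, x, y, rest)
    r_xz = partial_corr(data, x, z, rest)
    r_yz = partial_corr(data, y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

def independent(data, x, y, given=(), alpha=0.05):
    """True if the test does not reject independence of x and y given `given`."""
    n = len(data[x])
    r = partial_corr(data, x, y, list(given))
    z = 0.5 * math.log((1 + r) / (1 - r)) * math.sqrt(n - len(given) - 3)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided, standard normal
    return p_value > alpha

# Simulated chain A -> B -> C, used here only to exercise the test.
rng = random.Random(0)
a = [rng.gauss(0, 1) for _ in range(2000)]
b = [v + rng.gauss(0, 1) for v in a]
c = [v + rng.gauss(0, 1) for v in b]
data = {"A": a, "B": b, "C": c}

print(independent(data, "A", "C"))         # order 0: A and C are dependent by construction
print(independent(data, "A", "C", ["B"]))  # order 1: A and C are independent given B by construction
```

Other kinds of variables would call for other tests; the only requirement of the procedure is some test of X and Y given Q.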

Page 7:

Obtaining the undirected dependency graph

Are A & B independent given no others? No; don’t remove the line.

Are A & C independent given no others? No; don’t remove the line.

And so on...

Result: we don’t remove any lines at this stage.

Page 8:

Obtaining the undirected dependency graph

Step 3: let the order n (i.e. the number) of conditioning variables be one (i.e. one conditioning variable).

Are A & B independent given C? No.

Are A & B independent given D? Given E? No.

Are A & C independent given B? Yes.

Therefore, remove the line between A and C and go to the next pair (A, D).

Page 9:

Obtaining the undirected dependency graph

Are A & D independent given B? Yes.

Therefore, remove the line between A and D and go to the next pair (A, E).

Page 10:

Obtaining the undirected dependency graph

And so on for each unique pair of variables and each unique conditioning set.

Page 11:

Obtaining the undirected dependency graph

Step 4: let the order n (i.e. the number) of conditioning variables be two (i.e. two conditioning variables).

Are A & B independent given any two others? No.

Are B & C independent given any two others? No.

Are B & D independent given any two others? No.

Are B & E independent given any two others? Yes (C & D).

Therefore, remove the line between B and E and go to the next pair.

Page 12:

Obtaining the undirected dependency graph

Are A & E independent given any two others? No.
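Steps 1 to 4 can be sketched as a single loop over the order n. To keep the example free of sampling error, the statistical test is replaced here by a d-separation oracle on a hypothetical true graph A → B, B → C, B → D, C → E, D → E. That graph is an assumption: it agrees with the removals reported in the steps above, but the original figure is not reproduced in this transcript. The names `skeleton` and `d_separated` are illustrative.

```python
from itertools import combinations

# Hypothetical "true" DAG (an assumption), written as child -> set of parents.
PARENTS = {"A": set(), "B": {"A"}, "C": {"B"}, "D": {"B"}, "E": {"C", "D"}}

def d_separated(parents, x, y, given):
    """Oracle independence test: True if x and y are d-separated by `given`."""
    keep, stack = set(), [x, y, *given]
    while stack:                      # x, y, `given` and all their ancestors
        v = stack.pop()
        if v not in keep:
            keep.add(v)
            stack.extend(parents[v])
    adj = {v: set() for v in keep}
    for child in keep:                # moralise the ancestral graph
        for p in parents[child]:
            adj[child].add(p)
            adj[p].add(child)
        for p, q in combinations(parents[child], 2):
            adj[p].add(q)
            adj[q].add(p)
    seen, stack = {x}, [x]
    while stack:                      # search for a path that avoids `given`
        for w in adj[stack.pop()]:
            if w == y:
                return False
            if w not in seen and w not in given:
                seen.add(w)
                stack.append(w)
    return True

def skeleton(variables, indep):
    """Undirected dependency graph: start saturated, remove lines order by order."""
    adj = {v: set(variables) - {v} for v in variables}       # Step 1
    sepset = {}
    for n in range(len(variables) - 1):                      # order 0, 1, 2, ...
        for x, y in combinations(variables, 2):
            if y not in adj[x]:
                continue                                     # line already removed
            others = [v for v in variables if v not in (x, y)]
            for q in combinations(others, n):
                if indep(x, y, set(q)):
                    adj[x].discard(y)
                    adj[y].discard(x)
                    sepset[frozenset((x, y))] = set(q)       # remember what separated them
                    break
    edges = {frozenset((x, y)) for x in adj for y in adj[x]}
    return edges, sepset

edges, sepset = skeleton(list(PARENTS), lambda x, y, q: d_separated(PARENTS, x, y, q))
print(sorted("-".join(sorted(e)) for e in edges))  # ['A-B', 'B-C', 'B-D', 'C-E', 'D-E']
```

With real data, `indep` would be a statistical test, and wrong test decisions would then carry through to the resulting graph.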

Page 13:

Obtaining the undirected dependency graph

This algorithm is provably correct for any probability distribution, for any functional relationship between variables, and for both cyclic and acyclic causal structures, assuming:

1. Faithfulness;

2. All data are generated by the same causal process;

3. No incorrect statistical decisions have been made when deciding upon statistical independence between variables in the data (i.e. lots of data and tests appropriate to the variables in question).

The fewer data you have, the greater the chance of missing small, but real, statistical dependencies (statistical power).

Page 14:

Interpreting the undirected dependency graph

[Figure: the undirected dependency graph over A, B, C, D and E]

If there is a line between two variables in this undirected dependency graph then:

1. There is a direct causal relationship between the two and/or ...

2. There is a latent variable that is a common cause of the two and/or...

3. There is a more complicated type of undirected path between the two (an inducing path)

[Figure: two example graphs over A, B, C and D, one of them including a latent variable]

Page 15:

Orienting the undirected dependency graph

[Figure: patterns among C, D and E, grouped as shielded colliders, an unshielded collider, and unshielded non-colliders]

[Figure: an unshielded pattern over X, Y and Z]

[Figure: the true process over A, B, C, D and E, which we can’t see, beside the undirected dependency graph that we’ve learned]

Page 16:

Orienting the undirected dependency graph

In an unshielded collider (C → E ← D), C & D will never be independent conditional on E plus any possible combination of the remaining variables:

C & D dependent given all of Q = {E, E+A, E+B, E+A+B}

In an unshielded non-collider, C & D must be independent conditional on E plus (possibly) some other combination of the remaining variables; this is why the line between C & D was removed in the undirected dependency graph!

C & D independent given one of Q = {E, E+A, E+B, E+A+B}
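In practice this rule is often applied through the conditioning set that made the two outer variables independent when their line was removed: if the middle variable is not in that set, the pattern is a collider. The sketch below assumes such separating sets were recorded during edge removal; the function name and the example sets are hypothetical.

```python
def is_collider(middle, separating_set):
    """Unshielded pattern X - middle - Y: collider if and only if the set that
    made X and Y independent does not contain `middle`."""
    return middle not in separating_set

# Hypothetical separating sets for the pair (C, D):
print(is_collider("E", {"B"}))       # True:  C & D were separated without E
print(is_collider("E", {"E", "A"}))  # False: E was needed to separate C & D
```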

Page 17:

Orienting the undirected dependency graph

Unshielded patterns in the undirected dependency graph:

A-B-C, A-B-D, C-B-D, B-C-E, B-D-E, C-E-D

Unshielded collider: C-E-D

[Figure: the dependency graph with C-E-D oriented as a collider]
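The list of unshielded patterns can be generated from the undirected dependency graph, and the colliders picked out with the separating sets. This is a sketch under assumptions: the lines A-B, B-C, B-D, C-E and D-E are inferred from the patterns listed on this slide, and the separating set {B} for C & D is assumed rather than stated in the slides (the others come from the edge-removal steps).

```python
from itertools import combinations

# Lines of the undirected dependency graph (inferred from the slide's patterns).
edges = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "E"), ("D", "E")]
adj = {}
for x, y in edges:
    adj.setdefault(x, set()).add(y)
    adj.setdefault(y, set()).add(x)

# Unshielded pattern: X - M - Y with no line between X and Y.
triples = [(x, m, y)
           for m in sorted(adj)
           for x, y in combinations(sorted(adj[m]), 2)
           if y not in adj[x]]

# Sets that made each removed pair independent; {B} for (C, D) is an assumption.
sepset = {
    frozenset(("A", "C")): {"B"},
    frozenset(("A", "D")): {"B"},
    frozenset(("C", "D")): {"B"},
    frozenset(("B", "E")): {"C", "D"},
}

# A pattern is a collider when its middle variable is not in the separating set.
colliders = [t for t in triples if t[1] not in sepset[frozenset((t[0], t[2]))]]

print(triples)    # six patterns, matching the list above
print(colliders)  # [('C', 'E', 'D')]
```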

Page 18:

Orienting the undirected dependency graph

[Figure: the true process over A, B, C, D and E, which we can’t see, beside the partially-oriented acyclic graph that we’ve learned]

We can’t learn any more by just looking at the data.

We can orient the rest of the edges any way we want, so long as we don’t:

- create any new unshielded colliders, or destroy any that are found in the partially-oriented graph;

- create any cycles in the graph.

All such graphs are statistically equivalent and we can’t test between them.
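The two rules can be applied mechanically to list every equivalent graph. The sketch below assumes the partially-oriented graph has C → E ← D oriented and the lines A-B, B-C and B-D still unoriented; this is inferred from the preceding slides, since the figure itself is not reproduced. Function names are illustrative.

```python
from itertools import combinations, product

directed = [("C", "E"), ("D", "E")]                 # learned collider C -> E <- D
undirected = [("A", "B"), ("B", "C"), ("B", "D")]   # lines still to orient
target = {(frozenset(("C", "D")), "E")}             # colliders that must be preserved

def unshielded_colliders(arrows):
    """Set of (pair of parents, child) where the two parents are not adjacent."""
    parents = {}
    for tail, head in arrows:
        parents.setdefault(head, set()).add(tail)
    linked = {frozenset(a) for a in arrows}
    return {(frozenset((p, q)), child)
            for child, ps in parents.items()
            for p, q in combinations(sorted(ps), 2)
            if frozenset((p, q)) not in linked}

def is_acyclic(arrows):
    """Repeatedly strip variables with no incoming arrow; a cycle blocks this."""
    nodes = {v for a in arrows for v in a}
    while nodes:
        sources = {v for v in nodes
                   if not any(h == v and t in nodes for t, h in arrows)}
        if not sources:
            return False
        nodes -= sources
    return True

equivalent = []
for flips in product([False, True], repeat=len(undirected)):
    arrows = directed + [(y, x) if flip else (x, y)
                         for (x, y), flip in zip(undirected, flips)]
    if is_acyclic(arrows) and unshielded_colliders(arrows) == target:
        equivalent.append(arrows)

print(len(equivalent))  # 4
for graph in equivalent:
    print(graph)
```

Under these assumptions, four of the eight possible orientations survive: any orientation with two arrows pointing into B would create a new unshielded collider at B and is rejected.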

Page 19:

Page 20:

There are some further algorithms that can sometimes allow us to orient more lines, but they are more complicated and require more specialized patterns.

There are also algorithms for orienting cyclic causal processes, but these are even more complicated and require stronger assumptions (linearity of relationships and continuous variables).

There are also algorithms for detecting latent variables, but these assume both linearity and normality.

The TETRAD Project:

Causal Models and Statistical Data

http://www.phil.cmu.edu/projects/tetrad/

Causal toolbox: http://callisto.si.usherb.ca:8080/bshipley/