
Graphical Models for Causal Inference

Karthika Mohan and Judea Pearl

University of California, Los Angeles

August 14, 2012


Introduction

Why do we need graphs?

Figure: Motivating Example



Introduction

Variables in the study:

Season
Sprinkler
Rain
Wetness of pavement (Wet)
Slipperiness of pavement (Slippery)



Introduction

# Variables    Table size
5              32
6              64
7              128
8              256
9              512
10             1,024
20             1,048,576
30             1,073,741,824



Introduction

Figure: DAG Representation. Nodes: X1 (SEASON), X2 (RAIN), X3 (SPRINKLER), X4 (WET), X5 (SLIPPERY).

Conditional Probability Distributions

P(X1): 2
P(X2|X1): 4
P(X3|X1): 4
P(X4|X2, X3): 8
P(X5|X4): 4

Total # of Table Entries = 22
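The two counts can be reproduced with a short sketch. It assumes every variable is binary (an assumption that matches the numbers on the slides) and uses the parent sets of the sprinkler DAG; the function names are illustrative only.

```python
# Parent sets of the sprinkler DAG. All variables are assumed binary.
parents = {
    "X1": [],            # SEASON
    "X2": ["X1"],        # RAIN
    "X3": ["X1"],        # SPRINKLER
    "X4": ["X2", "X3"],  # WET
    "X5": ["X4"],        # SLIPPERY
}

def joint_table_size(n_vars, states=2):
    """Entries in a full joint table over n_vars variables."""
    return states ** n_vars

def factored_table_size(parent_sets, states=2):
    """Entries summed over one conditional table per node."""
    return sum(states ** (len(pa) + 1) for pa in parent_sets.values())

full = joint_table_size(len(parents))
factored = factored_table_size(parents)
print(full, factored)
```

The gap widens quickly: the full table doubles with every added variable, while each conditional table grows only with the number of parents of its node.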



Graphs: Notations

Figure: Bayesian Network representing dependencies among X1 (SEASON), X2 (RAIN), X3 (SPRINKLER), X4 (WET), X5 (SLIPPERY)

Adjacent Nodes
Root and Leaf Nodes
Skeleton
Path
Kinship Terminology



Graphs: Notations

Figure: DAG over X1 (SEASON), X2 (RAIN), X3 (SPRINKLER), X4 (WET), X5 (SLIPPERY).

Chain: X1 → X3 → X4 → X5
Fork: X3 ← X1 → X2
Collider: X3 → X4 ← X2



Background Factors & Bi-directed Edges

Figure: (a) Causal model with background factors UX and UY; (b) and (c) causal model with correlated background factors.

Note: Figure (a) expresses the assumption $U_X \perp\!\!\!\perp U_Y$; Figures (b) and (c) express the assumption $U_X \not\perp\!\!\!\perp U_Y$.



Decomposing joint distribution P(V)

How would you decompose the joint distribution P(V) into smaller distributions?

By applying the chain rule.

Let X1, X2, ..., Xn be any arbitrary ordering of the nodes in a DAG. Then

$$P(x_1, x_2, \dots, x_n) = \prod_j P(x_j \mid x_1, \dots, x_{j-1})$$

Is it possible that the conditional probability of some variable Xj is not sensitive to all its predecessors?

Yes!



Markovian Parents

Figure: DAG over X1 (SEASON), X2 (RAIN), X3 (SPRINKLER), X4 (WET), X5 (SLIPPERY).

Markovian Parents

X1: ∅
X2: {X1}
X3: {X1}
X4: {X2, X3}
X5: {X4}

$$P(x_1, x_2, x_3, x_4, x_5) = P(x_1)\,P(x_2 \mid x_1)\,P(x_3 \mid x_1)\,P(x_4 \mid x_2, x_3)\,P(x_5 \mid x_4)$$
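As a rough illustration of this factorization, the sketch below builds the joint distribution from five conditional tables and checks one independence it implies (X5 does not depend on X1 once X4 is known). All numbers are hypothetical and are not taken from the slides; the helper names `bern`, `joint` and `prob` are likewise made up for the example.

```python
from itertools import product

# Hypothetical CPTs (not from the slides). 1 = true/on, 0 = false/off.
p_x1 = {1: 0.5, 0: 0.5}        # P(X1 = v): season is dry
p_x2 = {1: 0.1, 0: 0.6}        # P(X2 = 1 | X1): rain
p_x3 = {1: 0.7, 0: 0.2}        # P(X3 = 1 | X1): sprinkler on
p_x4 = {(0, 0): 0.01, (0, 1): 0.9, (1, 0): 0.9, (1, 1): 0.99}  # P(X4 = 1 | X2, X3)
p_x5 = {1: 0.8, 0: 0.05}       # P(X5 = 1 | X4): slippery

NAMES = ("x1", "x2", "x3", "x4", "x5")

def bern(p_true, value):
    """Probability of `value` for a binary variable with P(1) = p_true."""
    return p_true if value == 1 else 1.0 - p_true

def joint(x1, x2, x3, x4, x5):
    """Product of the five conditional distributions."""
    return (p_x1[x1] * bern(p_x2[x1], x2) * bern(p_x3[x1], x3)
            * bern(p_x4[(x2, x3)], x4) * bern(p_x5[x4], x5))

def prob(**fixed):
    """Marginal probability of the fixed values; other variables are summed out."""
    return sum(joint(*v) for v in product((0, 1), repeat=5)
               if all(v[NAMES.index(k)] == val for k, val in fixed.items()))

total = prob()                                           # should be 1
lhs = prob(x5=1, x4=1, x1=1) / prob(x4=1, x1=1)          # P(x5 | x4, x1)
rhs = prob(x5=1, x4=1) / prob(x4=1)                      # P(x5 | x4)
print(total, lhs, rhs)
```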



Markov Compatibility

Let V = {x1, x2, ..., xn} be the set of observed nodes and pa_i be the Markovian parents of x_i. Then

$$P(v) = P(x_1, x_2, \dots, x_n) = \prod_i P(x_i \mid pa_i).$$

Definition (Markov Compatibility)

If a probability distribution P admits the Markovian factorization of observed nodes relative to DAG G, we say that G and P are Markov compatible.

Example

X    Y    P(X, Y)
1    1    0.225
1    0    0.375
0    1    0.125
0    0    0.275

Markov Compatible DAGs: X → Y and X ← Y
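One way to see why the edgeless graph is excluded here is to check whether the table factorizes as P(X)P(Y). The sketch below does that with the four entries of the example; the variable names are illustrative.

```python
# Joint distribution from the example table, keyed by (x, y).
p = {(1, 1): 0.225, (1, 0): 0.375, (0, 1): 0.125, (0, 0): 0.275}

# Marginals of X and Y.
p_x = {x: sum(v for (a, _), v in p.items() if a == x) for x in (0, 1)}
p_y = {y: sum(v for (_, b), v in p.items() if b == y) for y in (0, 1)}

# The edgeless graph would require P(x, y) = P(x) P(y) for every cell.
independent = all(abs(p[(x, y)] - p_x[x] * p_y[y]) < 1e-9
                  for x in (0, 1) for y in (0, 1))

# X -> Y only requires P(x, y) = P(x) P(y | x), which always holds.
p_y_given_x = {(y, x): p[(x, y)] / p_x[x] for x in (0, 1) for y in (0, 1)}
chain_rule_ok = all(abs(p[(x, y)] - p_x[x] * p_y_given_x[(y, x)]) < 1e-12
                    for x in (0, 1) for y in (0, 1))

print(p_x[1], p_y[1], independent, chain_rule_ok)
```

Since X and Y are dependent in this table, only DAGs that keep the edge between them are compatible with it.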



Testing Markov Compatibility

Given a DAG G and distribution P, how can you conclude that P and G are compatible?

Parents shielding tests: non-descendants, predecessors
d-separation



d-Separation

Definition
Let X, Y and Z be disjoint sets in DAG G. X and Y are d-separated by Z (written $(X \perp\!\!\!\perp Y \mid Z)_G$) if and only if Z blocks every path from a node in X to a node in Y.

d-Separation

A path p is said to be d-separated (or blocked) by a set of nodes Z if and only if:
(1) p contains a chain i → m → j or a fork i ← m → j such that the middle node m is in Z, or
(2) ...



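Example: d-Separation

Figure: DAG over X1 (SEASON), X2 (RAIN), X3 (SPRINKLER), X4 (WET), X5 (SLIPPERY).

$X_1 \perp\!\!\!\perp X_4 \mid \{X_2, X_3\}$
$X_3 \perp\!\!\!\perp X_5 \mid X_4$
$X_1 \perp\!\!\!\perp X_5 \mid X_4$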



d-Separation

A path p is said to be d-separated (or blocked) by a set of nodes Z if and only if:
(1) p contains a chain i → m → j or a fork i ← m → j such that the middle node m is in Z, or
(2) p contains an inverted fork (or collider) i → m ← j such that the middle node m is not in Z and such that no descendant of m is in Z.

Figure: DAG over X1 (SEASON), X2 (RAIN), X3 (SPRINKLER), X4 (WET), X5 (SLIPPERY).

$X_3 \perp\!\!\!\perp X_2 \mid X_1$
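The statements above can be checked mechanically. The sketch below does not enumerate paths; it uses an equivalent test that is a swapped-in technique, not the slides' method: restrict the DAG to the ancestors of X, Y and Z, connect parents that share a child, drop edge directions, delete Z, and test whether X can still reach Y. The function name `d_separated` and the dictionary layout are assumptions made for this example.

```python
from itertools import combinations

def d_separated(parents, xs, ys, zs):
    """True if every path between xs and ys is blocked by zs.

    parents maps each node to the set of its parents in the DAG.
    """
    # 1. Keep only xs, ys, zs and their ancestors.
    relevant, stack = set(), list(set(xs) | set(ys) | set(zs))
    while stack:
        node = stack.pop()
        if node not in relevant:
            relevant.add(node)
            stack.extend(parents.get(node, ()))
    # 2. Moralize: link each node to its parents and marry co-parents.
    adj = {n: set() for n in relevant}
    for child in relevant:
        pas = sorted(parents.get(child, ()))
        for p in pas:
            adj[child].add(p)
            adj[p].add(child)
        for a, b in combinations(pas, 2):
            adj[a].add(b)
            adj[b].add(a)
    # 3. Remove zs and test reachability from xs to ys.
    blocked = set(zs)
    frontier = [x for x in xs if x not in blocked]
    seen = set(frontier)
    while frontier:
        node = frontier.pop()
        if node in ys:
            return False
        for nxt in adj[node]:
            if nxt not in blocked and nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return True

# Sprinkler DAG from the slides.
sprinkler = {"X1": set(), "X2": {"X1"}, "X3": {"X1"},
             "X4": {"X2", "X3"}, "X5": {"X4"}}

print(d_separated(sprinkler, {"X1"}, {"X4"}, {"X2", "X3"}))
print(d_separated(sprinkler, {"X3"}, {"X2"}, {"X1"}))
print(d_separated(sprinkler, {"X3"}, {"X2"}, {"X1", "X4"}))
```

The last call conditions on the collider X4 as well, which opens the path X3 → X4 ← X2 and so illustrates rule (2).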



Figure: Example graphs (a) and (b) over X, Y, Z1, Z2 and Z3.

Case (a): X Y|
Case (b): XY


d-Separation

When is it impossible to d-separate two non-adjacent nodes X and Y?
Do we need to test all sets for possible separation?

Inducing path

Definition
A path between two nodes X and Y is termed inducing if every non-terminal node on the path:
(i) is a collider, and
(ii) is an ancestor of either X or Y (or both).

Figure: Example graphs (a) and (b) over X, Y, Z1, Z2 and Z3.

Note: There are no separators for X and Y.

The Five Necessary Steps of Causal Analysis

Define: Express the target quantity Q as a property of the model M.
Assume: Express causal assumptions in structural or graphical form.
Identify: Determine if Q is identifiable.
Estimate: Estimate Q if it is identifiable; approximate it if it is not.
Test: If M has testable implications

A Mini Turing Test in Causal Conversation

Figure: Turing Test

Input: Story

Question: What if? What is? Why?

Answers: I believe that...

The Story

Figure: DAG over X1 (SEASON), X2 (RAIN), X3 (SPRINKLER), X4 (WET), X5 (SLIPPERY).

Q1: If the season is dry and the pavement is slippery, did it rain?
A1: Unlikely; it is more likely that the sprinkler was ON, with a very slight possibility that the pavement is not even wet.

Q2: But what if we see that the sprinkler is OFF?
A2: Then it is more likely that it rained.
Without graphs, # of Table Entries = 32

Q3: Do you mean that if we actually turn the sprinkler ON, the rain will be less likely?
A3: No, the likelihood of rain would remain the same, but the pavement would surely get wet.
Without graphs, # of Table Entries = 32 * 32

Q4: Suppose we see that the sprinkler is ON and the pavement is wet. What if the sprinkler were OFF?
A4: The pavement would be dry, because the season is likely to be dry.
Without graphs, what would be the # of table entries?


Interventions

Query: Would the pavement be slippery if we make sure that the sprinkler is on?

Compute: $P(x_5 \mid do(x_3))$
May be equivalently represented as:
(a) $P(x_5 \mid \hat{x}_3)$
(b) $P_{x_3}(x_5)$

Figure: DAG before intervention, over X1 (SEASON), X2 (RAIN), X3 (SPRINKLER), X4 (WET), X5 (SLIPPERY).


Interventions


Figure: Causal diagrams over X, Y and U.

Question : Can you estimate P(y|do(x)), given P(x, y)?


Examples

Figure: Causal diagrams over X, Y and U.

NO!

$$P(x, y) = \sum_u P(x, y, u) = \sum_u P(y \mid x, u)\,P(x \mid u)\,P(u)$$

$$P(y \mid do(x)) = \sum_u P(y \mid x, u)\,P(u)$$
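The two expressions differ only in whether u is weighted by P(x|u). A small numeric sketch, with hypothetical probabilities for a binary confounder U (none of these numbers come from the slides), shows that observing X = 1 and setting X = 1 can give different answers:

```python
# Hypothetical parameters for binary U, X, Y (assumptions for illustration).
p_u = {0: 0.5, 1: 0.5}                     # P(u)
p_x1_given_u = {0: 0.1, 1: 0.9}            # P(X = 1 | u)
p_y1_given_xu = {(0, 0): 0.2, (1, 0): 0.3,  # P(Y = 1 | x, u), keyed by (x, u)
                 (0, 1): 0.8, (1, 1): 0.9}

# Interventional: sum_u P(y | x, u) P(u)
p_y1_do_x1 = sum(p_y1_given_xu[(1, u)] * p_u[u] for u in (0, 1))

# Observational: P(y | x) = sum_u P(y | x, u) P(x | u) P(u) / P(x)
p_x1 = sum(p_x1_given_u[u] * p_u[u] for u in (0, 1))
p_y1_and_x1 = sum(p_y1_given_xu[(1, u)] * p_x1_given_u[u] * p_u[u] for u in (0, 1))
p_y1_given_x1 = p_y1_and_x1 / p_x1

print(p_y1_do_x1, p_y1_given_x1)
```

Because U is unobserved, neither sum can be evaluated from P(x, y) alone, which is why the answer on the slide is NO.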


Identifiability

Definition
Let Q(M) be any computable quantity of a model M. We say that Q is identifiable in a class M of models if, for any pair of models M1 and M2 from M, Q(M1) = Q(M2) whenever $P_{M_1}(v) = P_{M_2}(v)$.


Estimating causal effect

Theorem (Adjustment for direct causes)
Let $PA_i$ denote the set of direct causes of $X_i$ and let Y be any set of variables disjoint of $\{X_i \cup PA_i\}$. The causal effect of $X_i$ on Y is given by:

$$P(y \mid do(x_i)) = \sum_{pa_i} P(y \mid x_i, pa_i)\,P(pa_i)$$

where $P(y \mid x_i, pa_i)$ and $P(pa_i)$ represent pre-interventional probabilities.


Example: Adjustment for direct causes

Query: Would the pavement be slippery if we make sure that the sprinkler is on?

$$P(x_5 \mid do(x_3)) = \sum_{x_1} P(x_5 \mid x_3, x_1)\,P(x_1)$$

Figure: DAG before intervention, over X1 (SEASON), X2 (RAIN), X3 (SPRINKLER), X4 (WET), X5 (SLIPPERY).

Figure: DAG after intervention (SPRINKLER = ON)
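The adjustment formula can be checked against the post-intervention DAG directly. In the sketch below, `truncated` drops the factor P(x3|x1) from the factorization (the mutilated graph), while `adjusted` uses only pre-interventional quantities. The CPT values and helper names are hypothetical, chosen for illustration.

```python
from itertools import product

# Hypothetical CPTs (not from the slides). 1 = true/on, 0 = false/off.
p_x1 = {1: 0.5, 0: 0.5}        # P(X1 = v)
p_x2 = {1: 0.1, 0: 0.6}        # P(X2 = 1 | X1)
p_x3 = {1: 0.7, 0: 0.2}        # P(X3 = 1 | X1)
p_x4 = {(0, 0): 0.01, (0, 1): 0.9, (1, 0): 0.9, (1, 1): 0.99}  # P(X4 = 1 | X2, X3)
p_x5 = {1: 0.8, 0: 0.05}       # P(X5 = 1 | X4)

def bern(p_true, value):
    return p_true if value == 1 else 1.0 - p_true

def joint(x1, x2, x3, x4, x5):
    """Pre-interventional joint distribution."""
    return (p_x1[x1] * bern(p_x2[x1], x2) * bern(p_x3[x1], x3)
            * bern(p_x4[(x2, x3)], x4) * bern(p_x5[x4], x5))

def truncated(x3, x5):
    """P(x5 | do(x3)) from the DAG after intervention: P(x3 | x1) is removed."""
    return sum(p_x1[x1] * bern(p_x2[x1], x2)
               * bern(p_x4[(x2, x3)], x4) * bern(p_x5[x4], x5)
               for x1, x2, x4 in product((0, 1), repeat=3))

def adjusted(x3, x5):
    """sum_{x1} P(x5 | x3, x1) P(x1), using only the pre-interventional joint."""
    total = 0.0
    for x1 in (0, 1):
        num = sum(joint(x1, x2, x3, x4, x5)
                  for x2, x4 in product((0, 1), repeat=2))
        den = sum(joint(x1, x2, x3, x4, v5)
                  for x2, x4, v5 in product((0, 1), repeat=3))
        total += (num / den) * p_x1[x1]
    return total

print(truncated(1, 1), adjusted(1, 1))
```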


Estimating Causal Effect

Compute: P(Xj | do(Xi))
How can we find a set Z of concomitants that are sufficient for identifying the causal effect?

Figure: DAG over X1, X2, X3, X4, X5, X6, Xi and Xj.


Back-door Criterion for Identifiability

Definition (Pearl-1993)

A set of variables Z satisfies the back-door criterion relative to an ordered pair of variables (Xi, Xj) in a DAG G if:

(i) no node in Z is a descendant of Xi; and
(ii) Z blocks every path between Xi and Xj that contains an arrow into Xi.

$$P(x_j \mid do(x_i)) = \sum_z P(x_j \mid x_i, z)\,P(z)$$

Figure: DAG over X1, X2, X3, X4, X5, X6, Xi and Xj.


Estimating causal effect: P(y|do(x))

Figure: Causal diagram over X, Z and Y, with U unobserved.

Can you adjust for direct cause? NO!

Can you apply backdoor criterion? NO!

Is P(y|do(x)) identifiable? YES!

Given: $P(y \mid do(x)) = \sum_z P(y \mid do(z))\,P(z \mid do(x))$

Front-door Adjustment

If Z satisfies the front-door criterion relative to (X, Y) and if P(x, z) > 0, then the causal effect of X on Y is identifiable and is given by:

$$P(y \mid do(x)) = \sum_z P(z \mid x) \sum_{x'} P(y \mid x', z)\,P(x')$$
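A numeric sanity check of the front-door formula is sketched below. It assumes the usual front-door structure (U affects X and Y, X affects Z, Z affects Y) and hypothetical probabilities; `front_door` uses only the observational distribution over (X, Z, Y), while `truth` uses the unobserved U and serves only as a reference value.

```python
from itertools import product

# Hypothetical parameters, binary variables (assumptions for illustration).
P_U1 = 0.5                                   # P(U = 1)
P_X1_GIVEN_U = {0: 0.2, 1: 0.8}              # P(X = 1 | u)
P_Z1_GIVEN_X = {0: 0.1, 1: 0.9}              # P(Z = 1 | x)
P_Y1_GIVEN_ZU = {(0, 0): 0.1, (0, 1): 0.5,   # P(Y = 1 | z, u), keyed by (z, u)
                 (1, 0): 0.4, (1, 1): 0.9}

def bern(p_true, value):
    return p_true if value == 1 else 1.0 - p_true

def full_joint(u, x, z, y):
    return (bern(P_U1, u) * bern(P_X1_GIVEN_U[u], x)
            * bern(P_Z1_GIVEN_X[x], z) * bern(P_Y1_GIVEN_ZU[(z, u)], y))

def obs(x=None, z=None, y=None):
    """Observational probability with U summed out; None means 'summed over'."""
    total = 0.0
    for u, xv, zv, yv in product((0, 1), repeat=4):
        if ((x is None or xv == x) and (z is None or zv == z)
                and (y is None or yv == y)):
            total += full_joint(u, xv, zv, yv)
    return total

def front_door(y, x):
    """sum_z P(z | x) sum_x' P(y | x', z) P(x'), from observational data only."""
    result = 0.0
    for z in (0, 1):
        p_z_given_x = obs(x=x, z=z) / obs(x=x)
        inner = sum(obs(x=xp, z=z, y=y) / obs(x=xp, z=z) * obs(x=xp)
                    for xp in (0, 1))
        result += p_z_given_x * inner
    return result

def truth(y, x):
    """Reference value computed with access to the unobserved U."""
    return sum(bern(P_Z1_GIVEN_X[x], z)
               * sum(bern(P_Y1_GIVEN_ZU[(z, u)], y) * bern(P_U1, u) for u in (0, 1))
               for z in (0, 1))

print(front_door(1, 1), truth(1, 1))
```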


Estimating causal effect P(y|do(x))

How can you syntactically derive claims about interventions?

do-calculus

Figure: Subgraphs of G used in the derivation of causal effects.


do-Calculus [Pearl, 1995]

Rule 1: Insertion or deletion of observations

$$P(y \mid do(x), z, w) = P(y \mid do(x), w) \quad \text{if } (Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}}}$$


do-Calculus [Pearl, 1995]

Rule 2: Action/Observation exchange

$$P(y \mid do(x), do(z), w) = P(y \mid do(x), z, w) \quad \text{if } (Y \perp\!\!\!\perp Z \mid X, W)_{G_{\overline{X}\underline{Z}}}$$

Deriving causal effect using do-calculus

Figure: Subgraphs of G used in the derivation of causal effects.

Prove: $P(y \mid do(x)) = \sum_z P(y \mid do(z))\,P(z \mid do(x))$


Figure: Panels (a) to (g), causal diagrams over X, Y, Z, Z1, Z2 and Z3.


Graphical Models in which P(y|do(x)) is not Identifiable

Figure: Panels (a) to (h), causal diagrams over X, Y, Z, Z1, Z2 and W.


C-components and C-factor

Two variables are said to be in the same C-component if they are connected by a path comprising only bi-directional edges [Tian & Pearl, 2002].

Figure: Graph over Z1, Z2, W1, W2, X, Y and T.

S1 = {X, Y, W1, W2}
S2 = {Z1}
S3 = {Z2}
S4 = {T}

C-factor: $Q[S_i](v) = P_{v \setminus s_i}(s_i)$
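Finding C-components amounts to finding connected components in the graph that keeps only the bi-directed edges. The sketch below shows one way to do this. The edge list is hypothetical: the figure's actual bi-directed edges are not recoverable from the transcript, so the list was chosen only so that the result matches S1 to S4 above.

```python
def c_components(nodes, bidirected):
    """Group nodes connected by paths made only of bi-directed edges."""
    adj = {n: set() for n in nodes}
    for a, b in bidirected:
        adj[a].add(b)
        adj[b].add(a)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(adj[node] - component)
        seen |= component
        components.append(component)
    return components

nodes = ["X", "Y", "W1", "W2", "Z1", "Z2", "T"]
# Hypothetical bi-directed edges (an assumption, not read off the figure).
bidirected = [("X", "W1"), ("W1", "W2"), ("W2", "Y")]

components = c_components(nodes, bidirected)
print([sorted(c) for c in components])
```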


Identifiability of C-factor

Lemma (Tian & Pearl, 2002)

Let a topological order over V be $V_1 < V_2 < \dots < V_n$, and let $V^{(i)} = \{V_1, V_2, \dots, V_i\}$, $i = 1, \dots, n$, and $V^{(0)} = \emptyset$. For any set C, let $G_C$ denote the subgraph of G composed only of variables in C. Then:
(i) Each C-factor $Q_j$, $j = 1, \dots, k$, is identifiable and is given by:

$$Q_j = \prod_{\{i \,:\, V_i \in S_j\}} P(v_i \mid v^{(i-1)})$$


Example: Identifiability of C-factor

Figure: Graph over X1, X2, X3, X4 and Y.

Admissible order: $X_1 < X_2 < X_3 < X_4 < Y$

$$Q_1 = P(x_4 \mid x_1, x_2, x_3)\,P(x_2 \mid x_1)$$

$$Q_2 = P(y \mid x_1, x_2, x_3, x_4)\,P(x_3 \mid x_1, x_2)\,P(x_1)$$
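The lemma is mechanical enough to apply with a few lines of code. The sketch below lists, for each member of a C-component, the conditional on all its predecessors in the order. The components {x2, x4} and {x1, x3, y} are inferred from the factors Q1 and Q2 shown on the slide; the function names are illustrative.

```python
def c_factor_terms(order, component):
    """Return (variable, predecessors) pairs for every variable in the component."""
    return [(v, tuple(order[:i])) for i, v in enumerate(order) if v in component]

def render(terms):
    """Format the product as text, in topological order."""
    return " ".join(f"P({v}|{','.join(pre)})" if pre else f"P({v})"
                    for v, pre in terms)

order = ["x1", "x2", "x3", "x4", "y"]
q1 = render(c_factor_terms(order, {"x2", "x4"}))
q2 = render(c_factor_terms(order, {"x1", "x3", "y"}))
print(q1)
print(q2)
```

The factors come out in topological order, so they appear reversed relative to the slide, but the products are the same.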


Necessary and Sufficient condition for identifiability of $P_x(v)$

Theorem (Tian & Pearl, 2002)

Let X be a singleton. $P_x(v)$ is identifiable if and only if there is no bi-directed path connecting X to any of its children. When $P_x(v)$ is identifiable, it is given by:

$$P_x(v) = \frac{P(v)}{Q^X} \sum_x Q^X,$$

where $Q^X$ is the c-factor corresponding to the c-component $S^X$ that contains X.


Example: Necessary and Sufficient condition for identifiability of $P_x(v)$

Identification of $P_x(y \mid z)$ where $X \cap Y \cap Z = \emptyset$ and X is not necessarily a singleton [Shpitser & Pearl, 2006]

Hedge Criterion
IDC: Sound and Complete Algorithm


Counterfactuals

Query: Would the prisoner be dead had rifleman A not shot him, given that the prisoner is dead and rifleman A shot him?

Figure: Diagram over U (Court order), C (Captain), A and B (Riflemen), D (Death).

Abduction
Intervention
Prediction

Model M
C = U
A = C
B = C
D = A ∨ B
Facts: D
Conclusions: U, A, B, C, D

Model M_{¬A}
C = U
¬A
B = C
D = A ∨ B
Facts: U
Conclusions: U, ¬A, B, C, D
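The three steps can be traced in a few lines for this deterministic model. The sketch below reads the equations as C = U, A = C, B = C, D = A or B (a reconstruction of the garbled slide text) and the function name `solve` is made up for the example.

```python
def solve(u, do_a=None):
    """Evaluate the model for background value u; do_a overrides A's equation."""
    c = u
    a = c if do_a is None else do_a
    b = c
    d = a or b
    return {"U": u, "C": c, "A": a, "B": b, "D": d}

# Step 1, abduction: keep the values of U consistent with the evidence.
evidence = {"D": True, "A": True}
consistent_u = [u for u in (False, True)
                if all(solve(u)[k] == v for k, v in evidence.items())]

# Step 2, intervention: replace A's equation by A = False.
# Step 3, prediction: recompute D in the modified model.
counterfactual = [solve(u, do_a=False) for u in consistent_u]
print(consistent_u, counterfactual)
```

Under these equations the evidence pins down U, so rifleman B still shoots in the modified model and the prisoner is dead either way.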


Markov Equivalence

Given 2 models, is there a test that would tell them apart?

Definition
Two graphs G1 and G2 are said to be Markov equivalent if every d-separation condition in one also holds in the other.

Figure: Graphs (G1) and (G2) over A, B and Z.


Markov Equivalence

Are these DAGs Markov Equivalent?

Figure: Two graphs over Z1, Z2, W1, W2, X, Y and T.

Hard to enumerate all separation conditions.


Observational Equivalence

Theorem (Verma & Pearl 1990)

Two DAGs are observationally equivalent iff they have the same sets of edges and the same sets of v-structures, that is, two converging arrows whose tails are not connected by an arrow.

Figure: Observationally Equivalent DAGs (two DAGs over X1, X2, X3, X4 and X5)
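The theorem gives a test that avoids enumerating separation conditions. The sketch below reads "same sets of edges" as "same edges ignoring direction" (the skeleton), which is an interpretation, and compares v-structures. The example graphs, including the variant of the sprinkler DAG with X1 → X2 reversed, are chosen here for illustration and are not necessarily the pair shown in the figure.

```python
from itertools import combinations

def skeleton(edges):
    """Edges with direction dropped."""
    return {frozenset(e) for e in edges}

def v_structures(edges):
    """Triples (a, child, b) with a -> child <- b and a, b not adjacent."""
    skel = skeleton(edges)
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    found = set()
    for child, pas in parents.items():
        for a, b in combinations(sorted(pas), 2):
            if frozenset((a, b)) not in skel:
                found.add((a, child, b))
    return found

def observationally_equivalent(edges1, edges2):
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))

chain = [("X", "Y"), ("Y", "Z")]
fork = [("Y", "X"), ("Y", "Z")]
collider = [("X", "Y"), ("Z", "Y")]

sprinkler = [("X1", "X2"), ("X1", "X3"), ("X2", "X4"), ("X3", "X4"), ("X4", "X5")]
reversed_x1_x2 = [("X2", "X1"), ("X1", "X3"), ("X2", "X4"), ("X3", "X4"), ("X4", "X5")]

print(observationally_equivalent(chain, fork))
print(observationally_equivalent(chain, collider))
print(observationally_equivalent(sprinkler, reversed_x1_x2))
```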

Markov Equivalence and Observational Equivalence

If two DAGs are Markov Equivalent, then they are Observationally Equivalent as well. True/False?

True if all variables are observed (i.e. no bi-directed edges) and False otherwise.

Figure: DAGs over X, T, Z and Y that are Markov Equivalent but not Observationally Equivalent

How would you distinguish between the two?

Verma Constraints (refer to slide 115)

Ancestral Graphs

Definition (Ancestral Graphs)

A graph which may contain directed or bi-directed edges is ancestral if:
(i) there are no directed cycles, and
(ii) whenever there is an edge X ↔ Y, there is no directed path from X to Y or from Y to X.

Figure: Ancestral graph (over Z1, Z2, W1, W2, X, Y and T)

Figure: Not an Ancestral graph (over Z1, Z2, W1, W2, X, Y and T)


Maximal Ancestral Graphs (MAGs)

Definition (Spirtes & Richardson, 2002)

An ancestral graph is said to be maximal if, for every pair of non-adjacent nodes X, Y there exists a set Z such that X and Y are d-separated conditional on Z.

Figure: DAG and its corresponding MAG (over Z1, Z2, W1, W2, X, Y and T)


Construction of a MAG

Given: DAG G

Step-1: Construct a graph M comprising:
(i) all nodes in G
(ii) all uni-directional edges in G

Figure: DAG G and Graph M (over Z1, Z2, W1, W2, X, Y and T)

Step-3: For every pair of non-adjacent nodes A and B in G connected by an inducing path,
(i) add A → B to M if A is an ancestor of B in G
(ii) add A ← B to M if B is an ancestor of A in G
(iii) add A ↔ B to M if (i) and (ii) do not hold true

Figure: DAG G and MAG M (over Z1, Z2, W1, W2, X, Y and T)


Markov Equivalence

Theorem

Two graphs G1 and G2 are said to be Markov equivalent if their MAGs are Markov Equivalent.

Are these MAGs Markov Equivalent?

Figure: Two MAGs over Z1, Z2, W1, W2, X, Y and T.

Note: Markov Equivalence in MAGs is easier to check.

Complete criterion for determining Markov Equivalence of 2 MAGs: [Ali, Richardson and Spirtes, 2009]


Reversing an edge in a MAG

Definition (Screened Edge)

[Tian, 2005] An edge X → Y is a screened edge in a MAG if $Pa(Y) = Pa(X) \cup \{X\}$ and $Sp(Y) = Sp(X)$ [1].

Figure: MAG over X, Y, Z and W with Screened Edge X → Y

[1] Nodes X and Y are spouses if they are connected by a bi-directed edge.


Reversing an edge in a MAG

Theorem (Tian, 2005)

Let M be a MAG with edge X → Y and M' be a graph with edge X ← Y, otherwise identical to M. Then M' is a MAG that is Markov Equivalent to M if and only if X → Y is a screened edge in M.

Figure: Markov Equivalent MAGs (over X, Y, Z and W)


Confounding Equivalence

Definition (Pearl and Paz, 2009)

Define two sets, T and Z, as c-equivalent (relative to X and Y), written $T \approx Z$, if the following equality holds for every x and y:

$$\sum_t P(y \mid x, t)\,P(t) = \sum_z P(y \mid x, z)\,P(z) \quad \forall x, y$$

Figure: Graph over W1, W2, V1, V2, X and Y.

Examples:

T = {W1, V2}    Z = {W2, V1}
T = {W1, V1}    Z = {W2, V2}
T = {W1, W2}    Z = {W1}
T = {W1, W2}    Z = {W2}

Note: C-equivalence is testable


Necessary and Sufficient Condition for C-Equivalence

Theorem (Pearl and Paz, 2009)

Let Z and T be two sets of variables containing no descendant of X. A necessary and sufficient condition for Z and T to be c-equivalent is that at least one of the following conditions holds:

$X \perp\!\!\!\perp (Z \cup T) \mid (Z \cap T)$, or
Z and T are G-admissible [2]

Figure: Graph over W1, W2, V1, V2, X and Y.

Examples:

T = {W1, V2}    Z = {W2, V1}
T = {W1, V1}    Z = {W2, V2}
T = {W1, W2}    Z = {W1}
T = {W1, W2}    Z = {W2}

[2] satisfies back-door criterion


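Linear Models and Causal Diagrams

Assume all variables are normalized to have zero mean and unit variance.

Figure: Path diagram over Z, W, X and Y with path coefficients a, b and c.

$$Z = e_1, \quad W = e_2, \quad X = aZ + e_3, \quad Y = bW + cX + e_4$$

$$Cov(e_1, e_2) = \alpha \neq 0, \quad Cov(e_2, e_3) = \beta \neq 0, \quad Cov(e_3, e_4) = \gamma \neq 0$$

Which parameters can be identified?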


Vanishing Regression Coefficient

Definition
For any linear model for a causal diagram D that may include cycles and bi-directed arcs, the partial correlation $\rho_{XY \cdot Z}$ must vanish if and only if node X is d-separated from node Y by the variables of Z in D [Spirtes et al., 1997b].

Figure: Graph over Z1, Z2, W1, W2, X, Y and T.

$r_{TX \cdot W_1 Z_1} = 0$

Find more


Single Door Criterion for Direct Effects

Theorem
Let G be any path diagram in which $\alpha$ is the path coefficient associated with link X → Y, and let $G_\alpha$ denote the diagram that results when X → Y is deleted from G. The coefficient $\alpha$ is identifiable if there exists a set of variables Z such that:
(i) Z contains no descendant of Y, and
(ii) Z d-separates X from Y in $G_\alpha$.

Moreover, if Z satisfies these two conditions, then $\alpha$ is equal to the regression coefficient $r_{YX \cdot Z}$.

Figure: Diagrams $G$ and $G_\alpha$ over Z, X and Y.


Instrumental Variables (IV)

Definition
A variable Z is an instrument relative to a cause X and an effect Y if:

Z is independent of all error terms that have an influence on Y when X is held constant, and
Z is not independent of X.

In linear systems, Causal effect of X on Y = $r_{ZY} / r_{ZX}$

Figure: Z is an instrument in (a), (b) and (c) but not in (d)
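A small simulation can make the ratio formula concrete. The sketch below generates data from a hypothetical linear model with an unobserved confounder U and an instrument Z; all coefficients are assumptions. Because the simulated variables are not normalized, it uses the ratio of covariances, which coincides with the ratio of regression coefficients on Z.

```python
import random

random.seed(0)
TRUE_EFFECT = 1.5      # hypothetical causal effect of X on Y
N = 20000

zs, xs, ys = [], [], []
for _ in range(N):
    u = random.gauss(0, 1)                 # unobserved confounder of X and Y
    z = random.gauss(0, 1)                 # instrument: affects Y only through X
    x = 0.8 * z + u + random.gauss(0, 1)
    y = TRUE_EFFECT * x + 2.0 * u + random.gauss(0, 1)
    zs.append(z)
    xs.append(x)
    ys.append(y)

def cov(a, b):
    """Sample covariance of two equally long sequences."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b)) / (len(a) - 1)

iv_estimate = cov(zs, ys) / cov(zs, xs)    # instrumental-variable ratio
ols_estimate = cov(xs, ys) / cov(xs, xs)   # naive regression of Y on X

print(round(iv_estimate, 2), round(ols_estimate, 2))
```

In this setup the naive regression absorbs the confounding through U and overshoots, while the instrument-based ratio stays close to the coefficient used to generate the data.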


Conditional Instrumental Variable

Definition (Brito & Pearl, 2002)

Z is an instrumental variable if there exists a set W such that:

W contains only non-descendants of Y
W d-separates Z from Y in the sub-graph obtained by removing the edge X → Y from G
W does not d-separate Z from X in G

Figure: Graph G and the corresponding subgraph with the edge X → Y removed (over Z, X, Y and W)


Conditional Instrumental Variable

Z is a conditional instrumental variable. Hence,

Causal effect of X on Y = $r_{ZY \cdot W} / r_{ZX \cdot W}$

W does not satisfy the single-door criterion, so the causal effect cannot be identified using single-door.

Figure: Graph G and the corresponding subgraph with the edge X → Y removed (over Z, X, Y and W)


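Verma Constraints [Tian and Pearl, 2002]

115/117

Figure: Graphs (a) and (b) over A, B, C and D.

Graph (a):

$$Q[\{B, D\}] = \sum_u P(b \mid a, u)\,P(d \mid c, u)\,P(u)$$

$$P_{v \setminus d}(d) = \sum_u P(d \mid c, u)\,P(u)$$

Also, $Q[\{B, D\}] = P(d \mid a, b, c)\,P(b \mid a)$ and

$$P_{v \setminus d}(d) = \sum_b P(d \mid a, b, c)\,P(b \mid a)$$

$\sum_b P(d \mid a, b, c)\,P(b \mid a)$ is independent of a.

Graph (b):

$$Q[\{B, D\}] = \sum_u P(b \mid a, u)\,P(d \mid a, c, u)\,P(u)$$

$$P_{v \setminus d}(d) = \sum_u P(d \mid a, u, c)\,P(u)$$

Also, $Q[\{B, D\}] = P(d \mid a, b, c)\,P(b \mid a)$ and

$$P_{v \setminus d}(d) = \sum_b P(d \mid a, b, c)\,P(b \mid a)$$

$\sum_b P(d \mid a, b, c)\,P(b \mid a)$ is not independent of a.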


Conclusions



Graphs are indispensable for:

encoding causal knowledge

identifying parameters and causal effects

identifying testable implications

Go ahead and Exploit the Power of Graphs!




Thank You!

