
COROLLARY 4: D is an I-map of P iff each variable X is conditionally independent in P of all its non-descendants, given its parents.

Proof (⇐): Suppose each variable X is conditionally independent of all its non-descendants, given its parents. By decomposition, X is then also independent of its predecessors in a particular order d, so D is a boundary DAG of P and hence an I-map.

Proof (⇒): In D, each X is d-separated from all its non-descendants, given its parents. Since D is an I-map, the claim follows by the soundness theorem.


COROLLARY 5: If D=(U,E) is a boundary DAG of P constructed in some order d, then any topological order d' of U will yield the same boundary DAG of P. (Hence the construction order can be forgotten.)

Proof: Each variable X is d-separated from all its non-descendants, given its parents, in the boundary DAG of P. In particular, by decomposition, X is independent, given its parents, of all variables that precede it in any topological order d'.


Extension of the Markov Chain Property

I(Xk, Xk-1, {X1, …, Xk-2})  ⟹  I(Xk, {Xk-1, Xk+1}, {X1, …, Xk-2, Xk+2, …, Xn})

The forward direction holds due to the soundness theorem. The converse holds when Intersection is assumed.

Markov Blankets in DAGs


Consequence: There is no improvement to d-separation, and no independence statement escapes the graphical representation.

Reasoning: (1) If there were an independence statement σ not shown by d-separation, then σ would have to be true in all distributions that satisfy the basis. But Theorem 10 states that there exists a distribution that satisfies the basis and its consequences but violates σ. (2) Same argument. [Note that (2) is a stronger claim.]


Bayesian Networks

Some slides have been edited from Nir Friedman’s lectures, which are available at www.cs.huji.ac.il/~nir. Changes made by Dan Geiger.

Background Readings: An Introduction to Bayesian Networks, Finn Jensen, UCL Press, 1997.


The “Visit-to-Asia” Example

[Figure: a DAG with nodes Visit to Asia, Smoking, Tuberculosis, Lung Cancer, Abnormality in Chest, Bronchitis, X-Ray, Dyspnea]

What are the relevant variables and their dependencies ?


Verifying the (in)Dependencies

We can now ask the expert: do the following assertions hold?

I(S; V),  I(T; S | V),  I(L; {T, V} | S),  …,  I(X; {V, S, T, L, B, D} | A)

[Figure: the Visit-to-Asia DAG: V→T, S→L, S→B, T→A, L→A, A→X, A→D, B→D]

Alternative verification: does each variable become independent of the rest, given its Markov boundary?

Take-Home Question: Are other variable construction orders as good ?
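Assertions like these can be checked mechanically. Below is a sketch (not from the slides) of the moralized-ancestral-graph criterion for d-separation, assuming the standard Visit-to-Asia edges V→T, S→L, S→B, T→A, L→A, A→X, A→D, B→D:

```python
from itertools import combinations

# Visit-to-Asia DAG, encoded as child -> set of parents.
parents = {
    'V': set(), 'S': set(),
    'T': {'V'}, 'L': {'S'}, 'B': {'S'},
    'A': {'T', 'L'}, 'X': {'A'}, 'D': {'A', 'B'},
}

def ancestors(nodes):
    """All ancestors of `nodes`, including the nodes themselves."""
    seen, stack = set(), list(nodes)
    while stack:
        n = stack.pop()
        if n not in seen:
            seen.add(n)
            stack.extend(parents[n])
    return seen

def d_separated(xs, ys, zs):
    """True iff xs is d-separated from ys given zs (all arguments are sets)."""
    keep = ancestors(xs | ys | zs)
    adj = {n: set() for n in keep}
    for child in keep:
        ps = parents[child] & keep
        for p in ps:                       # child-parent edges
            adj[child].add(p); adj[p].add(child)
        for p, q in combinations(ps, 2):   # "marry" co-parents (moralize)
            adj[p].add(q); adj[q].add(p)
    # Delete the conditioning set and check whether xs can still reach ys.
    reached, stack = set(), [x for x in xs if x not in zs]
    while stack:
        n = stack.pop()
        if n not in reached:
            reached.add(n)
            stack.extend(m for m in adj[n] if m not in zs)
    return not (reached & ys)

print(d_separated({'S'}, {'V'}, set()))            # I(S; V) -> True
print(d_separated({'T'}, {'S'}, {'V'}))            # I(T; S | V) -> True
print(d_separated({'X'}, set('VSTLBD'), {'A'}))    # I(X; {V,S,T,L,B,D} | A) -> True
```

Conditioning on the collider A correctly opens the T–L path: `d_separated({'T'}, {'L'}, {'A'})` returns False.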


Quantifying the Bayesian Network

Bayesian network = Directed Acyclic Graph (DAG), annotated with conditional probability distributions.

[Figure: the Visit-to-Asia DAG, each node annotated with its local distribution: p(v), p(s), p(t|v), p(l|s), p(b|s), p(a|t,l), p(x|a), p(d|a,b)]

P(v,s,t,l,b,a,x,d) = P(v) P(s) P(t|v) P(l|s) P(b|s) P(a|t,l) P(x|a) P(d|a,b)
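The chain-rule factorization can be checked numerically: a product of locally normalized conditionals must sum to 1 over all joint states. Only p(a|t,l) below uses CPT numbers that appear later in these slides; every other CPT value is invented for illustration.

```python
import itertools

# Binary (True/False) variables; each local term is a normalized conditional.
def pV(v): return 0.01 if v else 0.99
def pS(s): return 0.50
def pT(t, v): p = 0.05 if v else 0.01; return p if t else 1 - p
def pL(l, s): p = 0.10 if s else 0.01; return p if l else 1 - p
def pB(b, s): p = 0.60 if s else 0.30; return p if b else 1 - p
def pA(a, t, l):
    # These four numbers are the slides' CPT for p(A=y | L, T), keyed (l, t).
    p = {(False, False): 0.02, (False, True): 0.60,
         (True, False): 0.99, (True, True): 0.99}[(l, t)]
    return p if a else 1 - p
def pX(x, a): p = 0.98 if a else 0.05; return p if x else 1 - p
def pD(d, a, b): p = 0.90 if (a or b) else 0.10; return p if d else 1 - p

def joint(v, s, t, l, b, a, x, d):
    return (pV(v) * pS(s) * pT(t, v) * pL(l, s) * pB(b, s)
            * pA(a, t, l) * pX(x, a) * pD(d, a, b))

# A valid product of local conditionals must sum to 1 over all 2**8 states.
total = sum(joint(*vals) for vals in itertools.product([False, True], repeat=8))
print(round(total, 10))  # → 1.0
```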


Local distributions

Conditional Probability Table:
p(A=y | L=n, T=n) = 0.02
p(A=y | L=n, T=y) = 0.60
p(A=y | L=y, T=n) = 0.99
p(A=y | L=y, T=y) = 0.99

[Figure: nodes L (Yes/No) and T (Yes/No) pointing to A (Yes/No), annotated p(A|T,L)]

Asymmetric independence in the CPT
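The asymmetry can be read off the table directly: the two L=y rows coincide, so in that context A no longer depends on T, while the L=n rows differ. A tiny check using the slide's numbers:

```python
# The slide's CPT for p(A=yes | L, T), keyed by (L, T) with 'y'/'n' values.
p_A_yes = {('n', 'n'): 0.02, ('n', 'y'): 0.60,
           ('y', 'n'): 0.99, ('y', 'y'): 0.99}

# Context-specific ("asymmetric") independence: given L=y, A ignores T...
print(p_A_yes[('y', 'n')] == p_A_yes[('y', 'y')])  # → True
# ...but given L=n, A still depends on T.
print(p_A_yes[('n', 'n')] == p_A_yes[('n', 'y')])  # → False
```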


Queries

There are several types of queries.

Most queries involve evidence

Evidence e is an assignment of values to a set E of variables in the domain.

Example, a posteriori belief: P(D=yes | V=yes), or in general P(H=h | E=e), where H and E are subsets of variables.

This is equivalent to computing P(H=h, E=e) and then dividing by P(E=e).
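The "compute the joint, then divide" recipe can be sketched by brute-force enumeration. The three-node chain V → T → D and all CPT numbers below are hypothetical stand-ins, not the slides' full network:

```python
# Tiny hypothetical chain V -> T -> D with invented binary CPTs.
pV = {True: 0.01, False: 0.99}
pT = {(True, True): 0.05, (False, True): 0.95,   # (t, v) -> p(t|v)
      (True, False): 0.01, (False, False): 0.99}
pD = {(True, True): 0.80, (False, True): 0.20,   # (d, t) -> p(d|t)
      (True, False): 0.10, (False, False): 0.90}

def joint(v, t, d):
    return pV[v] * pT[(t, v)] * pD[(d, t)]

def posterior(d_val, v_val):
    """P(D=d_val | V=v_val) = P(D=d_val, V=v_val) / P(V=v_val)."""
    num = sum(joint(v_val, t, d_val) for t in (True, False))
    den = sum(joint(v_val, t, d) for t in (True, False) for d in (True, False))
    return num / den

print(round(posterior(True, True), 3))  # → 0.135
```

Summing out the hidden variable T gives the numerator P(D, V); summing out T and D gives the normalizer P(V).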


A posteriori belief

This query is useful in many cases:

Prediction: what is the probability of an outcome, given the starting condition?

Diagnosis: what is the probability of a disease or fault, given the symptoms?

[Figure: the Visit-to-Asia DAG]


Example: Predictive+Diagnostic

P(T = Yes | Visit_to_Asia = Yes, Dyspnea = Yes )

[Figure: the Visit-to-Asia DAG]

Probabilistic inference can combine evidence from all parts of the network, diagnostic and predictive, regardless of the directions of edges in the model.


Queries: MAP

Find the maximum a posteriori assignment for some variables of interest (say H1,…,Hl).

That is, h1,…,hl maximize the conditional probability

P(h1,…,hl | e)

Equivalent to maximizing the joint P(h1,…,hl, e)
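The equivalence holds because P(h | e) = P(h, e) / P(e) and P(e) does not depend on h, so the same assignment maximizes both. A minimal brute-force sketch on an invented two-node network A → B (names and CPTs are hypothetical):

```python
# Invented CPTs for a hypothetical network A -> B.
pA = {True: 0.3, False: 0.7}
pB = {(True, True): 0.9, (False, True): 0.1,     # (b, a) -> p(b|a)
      (True, False): 0.2, (False, False): 0.8}

def map_assignment(b_evidence):
    """argmax_a P(a, B=b_evidence); maximizing this joint also maximizes P(a | b)."""
    return max((True, False), key=lambda a: pA[a] * pB[(b_evidence, a)])

print(map_assignment(True))   # P(A=T,B=T)=0.27 beats P(A=F,B=T)=0.14
print(map_assignment(False))
```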


Queries: MAP

We can use MAP for explanation:

What is the most likely joint event, given the evidence (e.g., a set of likely diseases given the symptoms)

What is the most likely scenario, given the evidence (e.g., a series of likely malfunctions that trigger a fault).

[Figure: a two-layer diagnosis network with diseases D1,…,D4 and symptoms S1,…,S4, and a car-fault example with nodes Dead battery, Not charging, Bad battery, Bad magneto, Bad alternator]


How Expressive are Bayesian Networks

1. Check the diamond example via all boundary bases.

2. The following property (for a single variable γ) holds for d-separation but does not hold for conditional independence in general:

ID(X, {}, Y) and ID(X, {γ}, Y)  ⟹  ID(X, {}, γ) or ID(γ, {}, Y)


Drawback: Interpreting the Links is not simple

Another drawback is the difficulty with extreme probabilities. There is no local test for I-mapness.

Both drawbacks disappear in the class of decomposable models, which are a special case of Bayesian networks.


Decomposable Models

Example: Markov Chains and Markov Trees

Assume the following chain is an I-map of some P(x1,x2,x3,x4) and was constructed using the methods we just described.

The “compatibility functions” on all links can be easily interpreted in the case of chains.

Same also for trees. This idea actually works for all chordal graphs.
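For a four-variable chain as above, the interpretation can be written explicitly (standard chain algebra, not text copied from the slide): each link's compatibility function is a pairwise marginal divided by a separator marginal.

```latex
P(x_1,x_2,x_3,x_4)
  = P(x_1)\,P(x_2 \mid x_1)\,P(x_3 \mid x_2)\,P(x_4 \mid x_3)
  = \frac{P(x_1,x_2)\,P(x_2,x_3)\,P(x_3,x_4)}{P(x_2)\,P(x_3)}
```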


Chordal Graphs


Interpretation of the links

[Figure: three pairwise-overlapping cliques, Clique 1, Clique 2, Clique 3]

A probability distribution that can be written as a product of low-order marginals divided by a product of low-order marginals is said to be decomposable.


Importance of Decomposability

When assigning compatibility functions it suffices to use marginal probabilities on cliques and just make sure to be locally consistent. Marginals can be assessed from experts or estimated directly from data.


The Diamond Example – the smallest non-chordal graph

Adding one more link (a chord) makes the graph chordal.

Turning a general undirected graph into a chordal graph in some optimal way is the key for all exact computations done on Markov and Bayesian networks.


Complexity of Inference

Theorem:

Computing P(X = x) in a Bayesian network is NP-hard.

Main idea: conditional probability tables with zeros and ones are equivalent to logical gates.

Hence a reduction from 3-SAT is the easiest to pursue.

Page 24: 1 COROLLARY 4: D is an I-map of P iff each variable X is conditionally independent in P of all its non-descendants, given its parents. Proof  : Each variable.

24

Proof

We reduce 3-SAT to Bayesian network computation

Assume we are given a 3-SAT problem: let Q1,…,Qn be propositions and φ1,…,φk be clauses, such that φi = li1 ∨ li2 ∨ li3, where each lij is a literal over Q1,…,Qn (e.g., Q1 = true).

φ = φ1 ∧ … ∧ φk

We will construct a Bayesian network s.t. P(X=t) > 0 iff φ is satisfiable.

Page 25: 1 COROLLARY 4: D is an I-map of P iff each variable X is conditionally independent in P of all its non-descendants, given its parents. Proof  : Each variable.

25

P(Qi = true) = 0.5

P(φI = true | Qi, Qj, Ql) = 1 iff the values of Qi, Qj, Ql satisfy the clause φI

A1, A2, …, are simple binary AND gates.

[Figure: Q1,…,Qn feed the clause nodes φ1,…,φk; AND gates A1, A2, …, Ak-2 accumulate their conjunction, ending in X]

Page 26: 1 COROLLARY 4: D is an I-map of P iff each variable X is conditionally independent in P of all its non-descendants, given its parents. Proof  : Each variable.

26

It is easy to check:
Polynomial number of variables.
Each conditional probability table can be described by a small table (8 parameters at most).
P(X = true) > 0 if and only if there exists a satisfying assignment to Q1,…,Qn.

Conclusion: a polynomial reduction of 3-SAT.
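The reduction can be checked by brute force on small instances: under the construction, P(X=true) equals the fraction of satisfying assignments (each Qi is a fair coin, and the clause and AND-gate nodes are deterministic), so it is positive iff the formula is satisfiable. The example formula is hypothetical:

```python
import itertools

def p_x_true(clauses, n):
    """P(X = true) by enumeration: literal k stands for Q_k, -k for its negation.
    X is the AND of the deterministic clause nodes, and each Q_i is a fair coin,
    so P(X = true) = (#satisfying assignments) / 2**n."""
    sat = sum(
        all(any(assignment[abs(l) - 1] == (l > 0) for l in clause)
            for clause in clauses)
        for assignment in itertools.product([False, True], repeat=n)
    )
    return sat / 2 ** n

phi = [(1, -2, 3), (-1, 2, 3), (1, 2, -3)]   # a hypothetical 3-CNF over Q1..Q3
print(p_x_true(phi, 3))        # → 0.625 (5 of the 8 assignments satisfy phi)
print(p_x_true(phi, 3) > 0)    # → True: phi is satisfiable
```

The same quantity demonstrates the #P-hardness argument of the next slides: multiplying P(X=true) by 2^n recovers the exact count of satisfying assignments.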


Page 27: 1 COROLLARY 4: D is an I-map of P iff each variable X is conditionally independent in P of all its non-descendants, given its parents. Proof  : Each variable.

27

Inference is even #P-hard

P(X = t) is the fraction of assignments that satisfy φ.

Hence 2^n · P(X = t) is the number of satisfying assignments of φ.

Thus, if we can compute P(X = t), we can count the number of satisfying assignments of φ.

Consequently, computing P(X = t) is #P-hard.

Page 28: 1 COROLLARY 4: D is an I-map of P iff each variable X is conditionally independent in P of all its non-descendants, given its parents. Proof  : Each variable.

28

Hardness - Notes

We need not use deterministic relations in our construction. The construction shows that hardness follows even for graphs of small degree.

Hardness does not mean we cannot do inference. It implies that we cannot find a general procedure that works efficiently for all networks. For particular families of networks we can have provably efficient procedures, e.g., variable elimination algorithms on trees and HMMs.

Page 29: 1 COROLLARY 4: D is an I-map of P iff each variable X is conditionally independent in P of all its non-descendants, given its parents. Proof  : Each variable.

29

Extra Slides with more details (if time allows)

Page 30: 1 COROLLARY 4: D is an I-map of P iff each variable X is conditionally independent in P of all its non-descendants, given its parents. Proof  : Each variable.

30

Chordal Graphs

Page 31: 1 COROLLARY 4: D is an I-map of P iff each variable X is conditionally independent in P of all its non-descendants, given its parents. Proof  : Each variable.

31

Example of the Theorem

1. Each cycle has a chord.
2. There is a way to direct the edges legally, namely, AB, AC, BC, BD, CD, CE.
3. Legal removal order (e.g.): start with E, then D, then the rest.
4. The maximal cliques form a join (clique) tree.
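Point 3 can be verified programmatically: a removal order is legal when every removed node's remaining neighbors form a clique, and a graph is chordal iff such a (perfect elimination) order exists. A sketch over the edge list above:

```python
from itertools import combinations

# Edges of the slide's example graph, each a set of two endpoints.
edges = {frozenset(e) for e in ('AB', 'AC', 'BC', 'BD', 'CD', 'CE')}

def is_legal_order(order, edges):
    """True iff removing nodes in `order` is legal: each removed node's
    remaining neighbors must be pairwise adjacent (form a clique)."""
    edges = set(edges)
    for v in order:
        nbrs = {next(iter(e - {v})) for e in edges if v in e}
        if any(frozenset(pair) not in edges for pair in combinations(nbrs, 2)):
            return False                    # neighbors not a clique: illegal
        edges = {e for e in edges if v not in e}
    return True

print(is_legal_order('EDCBA', edges))  # → True: E first, then D, then the rest
print(is_legal_order('BACDE', edges))  # → False: B's neighbors A and D are not adjacent
```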

Page 32: 1 COROLLARY 4: D is an I-map of P iff each variable X is conditionally independent in P of all its non-descendants, given its parents. Proof  : Each variable.

32

Theorem X: Every undirected graph G has a distribution P such that G is a perfect map of P.

Page 33: 1 COROLLARY 4: D is an I-map of P iff each variable X is conditionally independent in P of all its non-descendants, given its parents. Proof  : Each variable.

33

Proof of Theorem X

Given a graph G, it is sufficient to show that for each independence statement σ = I(α, Z, β) that does NOT hold in G, there exists a probability distribution Pσ that satisfies all independence statements that hold in the graph but does not satisfy σ.

Simply pick a path in G between α and β that does not contain a node from Z. Define a probability distribution that is a perfect map of this chain, and multiply it by arbitrary marginal probabilities on all other nodes, forming Pσ. Now “multiply” all the Pσ (Armstrong relation) to obtain P.

Interesting task (replacing HMW #4): given an undirected graph over binary variables, construct a perfect-map probability distribution. (Note: most Markov random fields are perfect maps!)

Page 34: 1 COROLLARY 4: D is an I-map of P iff each variable X is conditionally independent in P of all its non-descendants, given its parents. Proof  : Each variable.

34

Interesting conclusion of Theorem X:

All independence statements that follow for strictly-positive probability from the neighborhood basis are derivable via symmetry, decomposition, intersection, and weak union.

These axioms are (sound and) complete for neighborhood bases. These axioms are (sound and) complete also for pairwise bases.

In fact, for saturated statements, conditional independence and separation have the same characterization. See paper P2.