Approximate Inference
Slides by Nir Friedman
When can we hope to approximate?
Two situations:
Highly stochastic distributions: “far” evidence is discarded.
“Peaked” distributions: improbable values are ignored.
Stochasticity & Approximations
Consider a chain X1 → X2 → X3 → … → Xn+1 with

$P(X_{i+1} = t \mid X_i = t) = 1 - \epsilon$,  $P(X_{i+1} = f \mid X_i = f) = 1 - \epsilon$

Computing the probability of $X_{n+1}$ given $X_1$, we get:

Even # of flips:
$P(X_{n+1} = t \mid X_1 = t) = \sum_{k=0}^{\lfloor n/2 \rfloor} \binom{n}{2k}\, \epsilon^{2k} (1 - \epsilon)^{n - 2k}$

Odd # of flips:
$P(X_{n+1} = f \mid X_1 = t) = \sum_{k=0}^{\lfloor (n-1)/2 \rfloor} \binom{n}{2k+1}\, \epsilon^{2k+1} (1 - \epsilon)^{n - 2k - 1}$
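As a sanity check, the even-flips sum can be evaluated directly. A minimal Python sketch (the helper name p_stay is ours) that reproduces the convergence toward 0.5 shown on the next slide:

```python
from math import comb

def p_stay(eps, n):
    """P(X_{n+1} = t | X_1 = t): an even number of flips among the
    n transitions, each flip occurring with probability eps."""
    return sum(comb(n, 2 * k) * eps ** (2 * k) * (1 - eps) ** (n - 2 * k)
               for k in range(n // 2 + 1))

for n in (5, 10, 20):
    # equals (1 + (1 - 2*eps)**n) / 2, which tends to 0.5 as n grows
    print(n, round(p_stay(0.3, n), 4))
```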
Plot of P(X_{n+1} = t | X_1 = t)

[Plot: the probability as a function of ε, for ε from 0 to 0.5 and n = 5, 10, 20; the y-axis runs from 0.5 to 1, and the curves approach 0.5 as n grows.]
Stochastic Processes
This behavior of a chain (a Markov process) is called mixing.
In general Bayes nets there is similar behavior: if probabilities are far from 0 and 1, then the effect of “far” evidence vanishes (and so it can be discarded in approximations).
“Peaked” distributions

If the distribution is “peaked”, then most of the mass is on a few instances.
If we can focus on these instances, we can ignore the rest.

[Bar chart: probability (0 to 0.16) over instances; a few instances carry most of the mass.]
Global conditioning
[Network diagram over the variables A, B, C, D, E, I, J, K, L, M.]

Fixing the values of A and B:

$P(m) = \sum_a \sum_b \sum_c \sum_d \sum_e \sum_l \cdots P(a, b, c, d, e, l, \ldots, m)$

Fixing values at the beginning of the summation can decrease the size of the tables formed by variable elimination; space is traded for time. Special case: choose to fix a set of nodes that “breaks all loops”. This method is called cutset conditioning.

[Diagram: with A and B fixed, the remaining network over C, D, E, I, J, K, L, M decomposes into simpler pieces.]
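A minimal sketch of the idea, assuming some exact routine is available (the eliminate callback below is hypothetical; it stands for variable elimination run on the network with the cutset clamped):

```python
import itertools

def cutset_conditioning(query_value, cutset, domains, eliminate):
    """Sum an exact computation over all joint assignments to the
    cutset: P(m) = sum over a, b of P(m, a, b).  Clamping the cutset
    breaks the loops, so each eliminate() call runs on a simpler net."""
    total = 0.0
    for values in itertools.product(*(domains[v] for v in cutset)):
        clamped = dict(zip(cutset, values))
        total += eliminate(query_value, clamped)  # P(query, clamped)
    return total
```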
Bounded conditioning
[Diagram: the same network, with cutset nodes A and B highlighted.]

Fixing the values of A and B: by examining only the probable assignments of A and B, we perform several simple computations instead of one complex one.
Bounded conditioning
Choose A and B so that P(Y, e | a, b) can be computed easily, e.g., a cycle cutset.
Search for highly probable assignments to A, B:
Option 1: select a, b with high P(a, b).
Option 2: select a, b with high P(a, b | e).
We need to search for such high-mass values, and that can be hard.

$P(Y = y, \mathbf{e}) \approx \sum_{(a,b)\ \text{probable}} P(Y = y, \mathbf{e} \mid a, b)\, P(a, b)$
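A one-line sketch of this estimator, with hypothetical callbacks (joint(a, b) returning P(Y=y, e | a, b) and prior(a, b) returning P(a, b)):

```python
def bounded_conditioning(probable_pairs, joint, prior):
    """P(Y=y, e) ~ sum over probable (a, b) of P(Y=y, e | a, b) P(a, b);
    the skipped mass 1 - sum of P(a, b) bounds the error (next slide)."""
    return sum(joint(a, b) * prior(a, b) for (a, b) in probable_pairs)
```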
Bounded Conditioning
Advantages:
Combines exact inference within approximation.
Continuous: more time can be used to examine more cases.
Bounds: the unexamined mass

$1 - \sum_{(a,b)\ \text{probable}} P(a, b)$

is used to compute error bars.

Possible problems: P(a, b) is prior mass, not posterior. If the posterior P(a, b | e) is significantly different, computation can be wasted on irrelevant assignments.
Network Simplifications
In these approaches, we try to replace the original network with a simpler one; the resulting network allows fast exact methods.
Network Simplifications
Typical simplifications:
Remove parts of the network.
Remove edges.
Reduce the number of values (value abstraction).
Replace a sub-network with a simpler one (model abstraction).
These simplifications are often made w.r.t. the particular evidence and query.
Stochastic Simulation
Suppose our goal is to compute the likelihood of evidence P(e), where e is an assignment to some variables in {X1,…,Xn}.
Assume that we can sample instances ⟨x1,…,xn⟩ according to the distribution P(x1,…,xn).
What, then, is the probability that a random sample ⟨x1,…,xn⟩ satisfies e?
Answer: simply P(e), which is what we wish to compute.
Each sample thus simulates the toss of a biased coin with probability P(e) of “heads”.
Stochastic Sampling
Intuition: given a sufficient number of samples x[1],…,x[N], we can estimate

$P(\mathbf{e}) \approx \frac{\#\text{Heads}}{N} = \frac{1}{N} \sum_i P(\mathbf{e} \mid x[i])$

where each term P(e | x[i]) is either zero or one.

The law of large numbers implies that as N grows, our estimate converges to P(e) with high probability.

How many samples do we need to get a reliable estimate? We will not discuss this issue here.
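A minimal sketch of this coin-toss view, using a toy distribution where P(e) is known (the names sample and satisfies_e are ours):

```python
import random

def estimate_prob(sample, satisfies_e, n_samples=100_000):
    """Average the 0/1 indicator over N samples: each sample is one
    toss of a biased coin with P(heads) = P(e)."""
    heads = sum(satisfies_e(sample()) for _ in range(n_samples))
    return heads / n_samples

# Toy check: X is true with probability 0.2 and e is "X is true".
print(estimate_prob(lambda: random.random() < 0.2, lambda x: x))  # ~0.2
```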
Sampling a Bayesian Network
If P(X1,…,Xn) is represented by a Bayesian network, can we efficiently sample from it?
Idea: sample according to the structure of the network. Write the distribution using the chain rule, and then sample each variable given its parents.
Logic sampling

Example: the Burglary network (Burglary → Alarm ← Earthquake, Alarm → Call, Earthquake → Radio), with CPTs:

P(b) = 0.03   P(e) = 0.001
P(a | B, E): 0.98, 0.4, 0.7, 0.01 for the four combinations of B and E
P(c | a) = 0.8   P(c | ¬a) = 0.05
P(r | e) = 0.3   P(r | ¬e) = 0.001

We build one sample ⟨B, E, A, C, R⟩ variable by variable, each drawn from its CPD given the already-sampled parents:
1. Sample B from P(B) (entry 0.03): b
2. Sample E from P(E) (entry 0.001): e
3. Sample A from P(A | B, E), using the entry for the sampled parent values (here 0.4): a
4. Sample C from P(C | A) (entry 0.8): c
5. Sample R from P(R | E) (entry 0.3): r

The result is one complete sample ⟨b, e, a, c, r⟩.
Logic Sampling
Let X1, …, Xn be an order of the variables consistent with arc direction.
for i = 1, …, n do
    sample xi from P(Xi | pa_i)
    (Note: since Pa_i ⊆ {X1,…,X_{i−1}}, we have already assigned values to the parents)
return x1, …, xn
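A minimal Python sketch of this loop for binary variables (the parents and cpd interfaces are assumptions for illustration, not a fixed API):

```python
import random

def logic_sample(order, parents, cpd):
    """Forward-sample one complete instance.  `order` is a topological
    order of the variables, `parents[v]` lists v's parents, and
    cpd(v, parent_values) returns P(v = True | parents)."""
    x = {}
    for v in order:
        p_true = cpd(v, tuple(x[u] for u in parents[v]))
        x[v] = random.random() < p_true
    return x
```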
Logic Sampling
Sampling a complete instance is linear in the number of variables, regardless of the structure of the network.
However, if P(e) is small, we need many samples to get a decent estimate.
Can we sample from P(Xi|e) ?
If the evidence e is at the roots of the Bayes network: easily.
If the evidence is at the leaves of the network, we have a problem: our sampling method proceeds according to the order of the nodes in the network.

[Diagram: a network over Z, R, B, X with evidence A = a at a leaf.]
Likelihood Weighting
Can we ensure that all of our samples satisfy e?
One simple (but wrong) solution: when we need to sample a variable Y that is assigned a value by e, use its specified value.
For example, in the two-node network X → Y, suppose we know Y = 1:
Sample X from P(X).
Then set Y = 1.
Is this a sample from P(X, Y | Y = 1)? NO.
Likelihood Weighting
Problem: these samples of X are from P(X).
Solution: penalize samples in which P(Y = 1 | X) is small.
We now sample as follows:
Let xi be a sample from P(X).
Let wi = P(Y = 1 | X = xi).

$\hat{P}(X = x \mid Y = 1) = \frac{\sum_i w_i\, \mathbf{1}\{x_i = x\}}{\sum_i w_i}$
Likelihood Weighting
Let X1, …, Xn be an order of the variables consistent with arc direction.
w ← 1
for i = 1, …, n do
    if Xi = xi has been observed:
        w ← w · P(Xi = xi | pa_i)
    else:
        sample xi from P(Xi | pa_i)
return x1, …, xn and w
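The logic-sampling sketch extends directly to this pseudocode: observed variables are clamped and their CPT entries accumulate into the weight (same assumed interfaces as above; evidence maps variables to observed boolean values):

```python
import random

def weighted_sample(order, parents, cpd, evidence):
    """One likelihood-weighted sample: returns (instance, weight)."""
    x, w = {}, 1.0
    for v in order:
        p_true = cpd(v, tuple(x[u] for u in parents[v]))
        if v in evidence:                  # observed: clamp, don't sample
            x[v] = evidence[v]
            w *= p_true if evidence[v] else 1.0 - p_true
        else:                              # hidden: sample as before
            x[v] = random.random() < p_true
    return x, w
```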
Likelihood Weighting

Example: the same Burglary network and CPTs, now with evidence on A and R. Observed variables are clamped to their evidence values, and their CPT entries multiply into the weight:
1. Sample B from P(B) (entry 0.03): b
2. Sample E from P(E) (entry 0.001): e
3. A is observed: clamp it and multiply the weight by the entry for its observed value given the sampled parents (here 0.6)
4. Sample C from P(C | A) (entry 0.05): c
5. R is observed: multiply the weight by the entry for its observed value (here 0.3)

The result is the sample ⟨b, e, a, c, r⟩ with weight w = 0.6 × 0.3 = 0.18.
Likelihood Weighting
Why does this make sense? When N is large, we expect about N · P(X = x) samples with x[i] = x. Thus

$\frac{\sum_i w_i\, \mathbf{1}\{x[i] = x\}}{\sum_i w_i} \approx \frac{N\, P(X = x)\, P(Y = 1 \mid X = x)}{N\, P(Y = 1)} = \frac{P(X = x,\, Y = 1)}{P(Y = 1)} = P(X = x \mid Y = 1)$
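A quick numeric check of this argument on the two-node network X → Y from the earlier slides; the CPT numbers (0.3, 0.9, 0.2) are made up for illustration:

```python
import random

p_x = 0.3                   # P(X = 1), assumed for the example
p_y = {1: 0.9, 0: 0.2}      # P(Y = 1 | X), assumed for the example
num = den = 0.0
for _ in range(200_000):
    x = int(random.random() < p_x)   # sample X from the prior
    w = p_y[x]                       # w_i = P(Y = 1 | X = x_i)
    num += w * x                     # accumulate sum of w_i * 1{x_i = 1}
    den += w                         # accumulate sum of w_i
exact = p_x * p_y[1] / (p_x * p_y[1] + (1 - p_x) * p_y[0])  # Bayes' rule
print(num / den, exact)              # both close to 0.659
```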
Summary
Approximate inference is needed for large pedigrees. We have seen a few methods today; some fit genetic linkage analysis and some do not. There are many other approximation algorithms: variational methods, MCMC, and others.

In next semester's Bioinformatics project course (236524), we will offer projects that implement some of these approximation methods and embed them in the superlink software.