Computer Vision: Models, Learning and Inference –
Markov Random Fields, Part 3
Oren Freifeld and Ron Shapira-Weber
Computer Science, Ben-Gurion University
Mar 11, 2019
www.cs.bgu.ac.il/~cv192/ MRFs, Part 3 (ver. 1.02) Mar 11, 2019 1 / 41
Image Restoration as a Generic Example
Image Restoration
Consider an n × n image
Pixels define a regular 2D lattice
x = (x_s)_{s∈S}, a latent (“clean”/“uncorrupted”) image
y = (y_s)_{s∈S}, an observed (“corrupted-by-noise”) image
Problem: recover x from y.
Image Restoration as a Generic Example
Bayesian Image Restoration
The Bayesian approach is based on
p(x|y) = p(x, y)/p(y) = p(y|x) p(x)/p(y) ∝ p(y|x) p(x)
p(x|y): posterior
p(y|x): likelihood
p(y): evidence
p(x): prior
Image Restoration as a Generic Example
Consider Binary Latent Images with an MRF prior
x_s ∈ {−1, 1} (binary, with −1 instead of 0); i.e., x ∈ {−1, 1}^{|S|}
Prior model:
p(x) ∝ ∏_{c∈C} H_c(x_c)   (1)
Observation/likelihood model 1 (ℝ-valued, additive):
ℝ ∋ y_s = x_s + n_s,  s ∈ S,  n_s iid∼ N(0, σ²)
Observation/likelihood model 2 (binary, multiplicative):
{−1, 1} ∋ y_s = x_s n_s,  s ∈ S,  n_s ∈ {−1, 1},  (n_s + 1)/2 iid∼ Bernoulli(θ)
In both cases, the overall likelihood is: p(y|x) = ∏_{s∈S} p(y_s|x) = ∏_{s∈S} p(y_s|x_s)
The first “=”: the y_s’s are conditionally independent given x. The second “=” is stronger: y_s ⊥⊥ x_{S∖s} | x_s, ∀s ∈ S.
iid = independent and identically distributed
Image Restoration as a Generic Example
Posterior
Case 1:

p(x|y) = { [∏_{s∈S} (1/√(2πσ²)) exp(−(y_s − x_s)²/(2σ²))] ∏_{c∈C} H_c(x_c) }
         / { Σ_{(x′_s)_{s∈S} ∈ {−1,1}^{|S|}} [∏_{s∈S} (1/√(2πσ²)) exp(−(y_s − x′_s)²/(2σ²))] ∏_{c∈C} H_c(x′_c) }
       ∝ [∏_{s∈S} (1/√(2πσ²)) exp(−(y_s − x_s)²/(2σ²))] ∏_{c∈C} H_c(x_c)   (the bracketed product is the likelihood)
       ∝ ∏_{c∈C} F_c(x_c)

(singletons – w.r.t. GA – absorbed into the H_c’s; the resulting clique functions renamed F_c, where F_c(x_c) = F_c(x_c, y) also depends on y)
Image Restoration as a Generic Example
Posterior
Case 2:

p(x|y) ∝ (1 − θ)^{|{s∈S : y_s = −x_s}|} θ^{|{s∈S : y_s = x_s}|} ∏_{c∈C} H_c(x_c)   (the first two factors are the likelihood)
       = ∏_{c∈C} F_c(x_c)

(singletons – w.r.t. GA – absorbed into the H_c’s; the resulting clique functions renamed F_c, where F_c(x_c) = F_c(x_c, y) also depends on y)
Image Restoration as a Generic Example
Digression: What if ys Depends on more than xs?
E.g., y is obtained from x by convolving it with a 3 × 3 filter. We get products of p(y_s | 3 × 3 neighborhood of x_s) ⇒ in p(x|y), each 3 × 3 block is a clique, with (partial) overlaps between nearby cliques.
Image Restoration as a Generic Example
What We Want
Sample from p(x|y); compute posterior expectations (e.g., E(x|y) or E(g(x)|y) for some function g); compute the posterior mode, argmax_x p(x|y).
In fact, if we can sample somehow, we can use the samples for the other tasks as well; for example (and regardless of MRFs):

x^i = (x^i_s)_{s∈S} iid∼ p(x|y),  i = 1, 2, …, N
⇒ (1/N) Σ_{i=1}^N g(x^i) → E(g(X)|Y = y) as N → ∞, by the LLN

(x^i is a sample of an entire image)
Remark: usually E(X|Y = y) is in [−1, 1]^{|S|} (as opposed to {−1, 1}^{|S|}).
LLN: Law of Large Numbers
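The LLN estimate above can be sanity-checked with a minimal Python sketch; the 2-pixel “posterior” pmf and the choice of g below are made-up stand-ins for illustration, not part of the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for p(x|y): a pmf over the 4 binary "images" in {-1, +1}^2.
states = np.array([[-1, -1], [-1, 1], [1, -1], [1, 1]])
probs = np.array([0.1, 0.2, 0.3, 0.4])            # assumed known, for illustration

# g(x): any function of the whole image; here, the sum of the pixels.
N = 200_000
idx = rng.choice(len(states), size=N, p=probs)    # x^1, ..., x^N iid ~ p(x|y)
mc_estimate = states[idx].sum(axis=1).mean()      # (1/N) sum_i g(x^i)

exact = (states.sum(axis=1) * probs).sum()        # E(g(X) | Y = y)
print(mc_estimate, exact)
```

By the LLN the two printed values agree up to a Monte Carlo error of order 1/√N.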
Ising Model
The Ising Model Prior
p(x) = (1/Z) exp(β Σ_{s∼t} x_s x_t)
(s ∼ t means s is a neighbor of t)
Samples from p(x) with three different values of β where, from left to right, β_1 < β_2 < β_3
Figure from Winkler’s monograph, “Image Analysis: Random Fields and MarkovChain Monte Carlo Methods”
Ising Model
Example
If β = 1/1.8 and y_s = x_s + n_s where n_s iid∼ N(0, 4), s ∈ S, then

p(x|y) ∝ exp( (1/1.8) Σ_{s∼t} x_s x_t − (1/8) Σ_s (y_s − x_s)² )
Ising Model
Left: the true x, where x ∼ p(x) (using the Ising model).
Right: y – the noisy image.
Ising Model
Left: x ∼ p(x) (using the Ising model).
Right: a posterior sample, x ∼ p(x|y).
Ising Model
Left: x ∼ p(x) (using the Ising model).
Right: the maximum-a-posteriori (MAP) solution, argmax_x p(x|y).
Ising Model
Interpretations of the Ising Model
WLOG, let β = 1. For any c ∈ ℝ:

p(x) = exp(Σ_{s∼t} x_s x_t) / Σ_{x′} exp(Σ_{s∼t} x′_s x′_t)
     = exp(Σ_{s∼t} x_s x_t + c) / Σ_{x′} exp(Σ_{s∼t} x′_s x′_t + c)   (2)

⇒ p(x) ∝ exp(Σ_{s∼t} x_s x_t + c)   (3)

Take c = # neighbor pairs ⇒ Σ_{s∼t} x_s x_t + c = (# like-neighbor pairs) − (# unlike-neighbor pairs) + (# pairs) = 2 × (# like-neighbor pairs)
⇒ p(x) ∝ exp(2 × # like-neighbor pairs)
Take c = −(# neighbor pairs) ⇒ Σ_{s∼t} x_s x_t + c = (# like pairs) − (# unlike pairs) − (# pairs) = −2 × (# unlike-neighbor pairs) = −2 × boundary length
⇒ p(x) ∝ exp(−2 × boundary length)
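The pair-counting identities above are easy to verify numerically. In this sketch (the 8×8 random image and 4-connectivity with free boundaries are assumptions for illustration), c is the number of neighboring pairs:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.choice([-1, 1], size=(8, 8))   # a random binary image

# Sum over neighboring pairs (each unordered horizontal/vertical pair once).
pair_sum = (x[:, :-1] * x[:, 1:]).sum() + (x[:-1, :] * x[1:, :]).sum()

n, m = x.shape
n_pairs = n * (m - 1) + (n - 1) * m
n_like = (x[:, :-1] == x[:, 1:]).sum() + (x[:-1, :] == x[1:, :]).sum()
n_unlike = n_pairs - n_like

assert pair_sum == n_like - n_unlike          # the sum counts like minus unlike
assert pair_sum + n_pairs == 2 * n_like       # c = +(# pairs)
assert pair_sum - n_pairs == -2 * n_unlike    # c = -(# pairs): boundary length
```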
Introduction to MCMC and Gibbs Sampling Gibbs Sampling
When Sampling from the Conditionals is Easy – Regardless of MRFs
Assume the following setting:
X, Y and Z: 3 random vectors (of possibly-different dimensions) with a joint pdf/pmf p(x, y, z)
We want to sample (x, y, z) ∼ p(x, y, z) – but it’s hard or we don’t know how.
We can relatively easily sample from the conditionals:
p(x|y, z),  p(y|x, z),  p(z|x, y)
Introduction to MCMC and Gibbs Sampling Gibbs Sampling
Gibbs Sampling [Geman and Geman, 1984]
The algorithm:
1. x[0], y[0], z[0] ∼ g(x, y, z), using some easy-to-sample-from g(x, y, z) of the same support as p(x, y, z) (e.g., take g(x, y, z) = g(x)g(y)g(z)).
2. Iterate:
   x[i] ∼ p(x | y[i−1], z[i−1])
   y[i] ∼ p(y | x[i], z[i−1])
   z[i] ∼ p(z | x[i], y[i])

p(x[n], y[n], z[n]) → p(x, y, z) as n → ∞ (in some sense)
The resulting sequence is clearly a Markov chain:
p(x[i], y[i], z[i] | x[0:(i−1)], y[0:(i−1)], z[0:(i−1)]) = p(x[i], y[i], z[i] | x[i−1], y[i−1], z[i−1])
Gibbs sampling is a particular case of MCMC.
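A minimal sketch of the scheme above on a made-up joint pmf over {0,1}³ (the pmf values, initialization, and number of sweeps are arbitrary choices): each conditional is obtained by slicing the joint table and renormalizing, and the empirical distribution of the Gibbs iterates approaches p.

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary joint pmf p(x, y, z) on {0,1}^3, stored as a 2x2x2 table.
p = rng.random((2, 2, 2))
p /= p.sum()

def sample_conditional(axis, fixed, rng):
    """Sample one coordinate from its conditional given the other two."""
    sl = [slice(None)] * 3
    for ax, val in fixed.items():
        sl[ax] = val
    cond = p[tuple(sl)]
    return rng.choice(2, p=cond / cond.sum())

x, y, z = 0, 0, 0                       # step 1: any initialization works here
counts = np.zeros((2, 2, 2))
for _ in range(40_000):                 # step 2: Gibbs sweeps
    x = sample_conditional(0, {1: y, 2: z}, rng)
    y = sample_conditional(1, {0: x, 2: z}, rng)
    z = sample_conditional(2, {0: x, 1: y}, rng)
    counts[x, y, z] += 1

empirical = counts / counts.sum()
print(np.abs(empirical - p).max())      # small: the chain has converged
```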
Introduction to MCMC and Gibbs Sampling Gibbs Sampling
Gibbs Sampling [Geman and Geman, 1984]
It is OK even if we know p only up to a multiplicative constant:
We know g(x, y, z), a non-normalized nonnegative function, where

p(x, y, z) = (1/Z) g(x, y, z),  Z = ∫ g(x, y, z) dx dy dz

but we can’t (or don’t want to) compute the integral.

p(x|y, z) = p(x, y, z)/p(y, z) = p(x, y, z) / ∫ p(x′, y, z) dx′ = [(1/Z) g(x, y, z)] / [∫ (1/Z) g(x′, y, z) dx′] = g(x, y, z) / ∫ g(x′, y, z) dx′

or, in the discrete case:

p(x|y, z) = g(x, y, z) / Σ_{x′} g(x′, y, z)
Introduction to MCMC and Gibbs Sampling Gibbs Sampling
Gibbs Sampling [Geman and Geman, 1984]
More generally: we want (x_1, …, x_n) ∼ p(x_1, …, x_n). Gibbs sampling with n RVs:
1. x_1[0], …, x_n[0] ∼ g(x_1, …, x_n), using some easy-to-sample-from g(x_1, …, x_n) of the same support as p(x_1, …, x_n)
2. Iterate:
   Update x_1 by sampling x_1 given the rest
   Update x_2 by sampling x_2 given the rest
   ...
   Update x_n by sampling x_n given the rest
One such iteration over all n variables is called a sweep.
Introduction to MCMC and Gibbs Sampling Gibbs Sampling
Motivation
Assume
p(x) is “local” (an MRF with modest n’hood size)
p(y|x) is “local”
⇒ p(x|y) is an MRF with a “local” n’hood structure.
But: in an n × n lattice structure (e.g., images), the max boundary is of order n – we can’t use Dynamic Programming. So we try MCMC.
Gibbs sampling [Geman and Geman, 1984] is a particular case of MCMCmethods.
In principle, Gibbs sampling is applicable to general distributions, not just Gibbs distributions (i.e., MRFs), but it is particularly easy to perform in Gibbs distributions.
Introduction to MCMC and Gibbs Sampling Gibbs Sampling
Gibbs Sampling in MRFs [Geman and Geman, 1984]
Pick a (possibly-random) site-visitation scheme, and then, when visiting s, sample

x_s ∼ p(x_s | x_{S∖s}) = p(x_s | x_{η_s})   (the “=” is by the MRF property)

p(x_s | x_{η_s}), which involves only the local neighborhood, is usually easy to sample from.
It is OK if we know p only up to a multiplicative constant (which is often the case with MRFs):

p(x) ∝ ∏_{c∈C} F_c(x_c)

Asymptotically great, but takes a long time if the graph is too large/complicated.
Often, particularly for images, it can be massively parallelized (but even then it can take a long time).
Introduction to MCMC and Gibbs Sampling Gibbs Sampling
Gibbs Sampling in the Ising Model: Sampling from p(x)
p(x) ∝ exp(β Σ_{s∼t} x_s x_t)

p(x_s | x_{S∖s}) = p(x_s, x_{S∖s}) / p(x_{S∖s}) = p(x) / p(x_{S∖s})
= [(1/Z) exp(β Σ_{s∼t} x_s x_t)] / [Σ_{x′_s} (1/Z) exp(β Σ_{s∼t} x′_s x_t)]
= exp(β Σ_{t∈η_s} x_s x_t) / Σ_{x′_s} exp(β Σ_{t∈η_s} x′_s x_t)
= exp(β Σ_{t∈η_s} x_s x_t) / [exp(−β Σ_{t∈η_s} x_t) + exp(β Σ_{t∈η_s} x_t)]

⇒
p(x_s = 1 | x_{S∖s}) = exp(β Σ_{t∈η_s} x_t) / [exp(−β Σ_{t∈η_s} x_t) + exp(β Σ_{t∈η_s} x_t)] ∝ exp(β Σ_{t∈η_s} x_t)
p(x_s = −1 | x_{S∖s}) = exp(−β Σ_{t∈η_s} x_t) / [exp(−β Σ_{t∈η_s} x_t) + exp(β Σ_{t∈η_s} x_t)] ∝ exp(−β Σ_{t∈η_s} x_t)
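The two conditionals above normalize to p(x_s = 1 | x_{η_s}) = 1/(1 + exp(−2β Σ_{t∈η_s} x_t)), i.e., a biased coin. A minimal single-site Gibbs sampler for the Ising prior (free boundary conditions, raster-scan visitation, and the particular β and lattice size are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def ising_gibbs_sweep(x, beta, rng):
    """One raster-scan sweep of single-site updates for p(x) ∝ exp(β Σ_{s~t} x_s x_t)."""
    n, m = x.shape
    for i in range(n):
        for j in range(m):
            nb = (x[i-1, j] if i > 0 else 0) + (x[i+1, j] if i < n-1 else 0) \
               + (x[i, j-1] if j > 0 else 0) + (x[i, j+1] if j < m-1 else 0)
            # p(x_s = 1 | neighbors) ∝ exp(beta*nb); normalizing gives a sigmoid.
            p1 = 1.0 / (1.0 + np.exp(-2.0 * beta * nb))
            x[i, j] = 1 if rng.random() < p1 else -1
    return x

x = rng.choice([-1, 1], size=(16, 16))          # arbitrary initialization
for _ in range(50):
    x = ising_gibbs_sweep(x, beta=0.7, rng=rng)

# At this fairly large beta, like-neighbor pairs dominate after burn-in.
pair_sum = (x[:, :-1] * x[:, 1:]).sum() + (x[:-1, :] * x[1:, :]).sum()
print(pair_sum)
```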
Introduction to MCMC and Gibbs Sampling Gibbs Sampling
Gibbs Sampling in the Ising Model: Sampling from p(x|y) – iid Additive Gaussian Noise

p(x|y) ∝ exp(β (Σ_{s∼t} x_s x_t) − (1/(2σ²)) Σ_s (y_s − x_s)²)

p(x_s | x_{S∖s}, y) = p(x_s, x_{S∖s} | y) / p(x_{S∖s} | y) = p(x|y) / p(x_{S∖s} | y)
= exp(β (Σ_{s∼t} x_s x_t) − (1/(2σ²)) Σ_s (y_s − x_s)²) / Σ_{x′_s} exp(β (Σ_{s∼t} x′_s x_t) − (1/(2σ²)) Σ_s (y_s − x′_s)²)
∝ exp(β (Σ_{t∈η_s} x_s x_t) − (1/(2σ²)) (y_s − x_s)²)

p(x_s = 1 | x_{S∖s}, y) ∝ exp(β (Σ_{t∈η_s} x_t) − (1/(2σ²)) (y_s − 1)²)
p(x_s = −1 | x_{S∖s}, y) ∝ exp(−β (Σ_{t∈η_s} x_t) − (1/(2σ²)) (y_s + 1)²)

Nothing more than flipping a biased coin.
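Normalizing the two weights above gives the coin’s bias in closed form: p(x_s = 1 | x_{η_s}, y) is a logistic function of 2β Σ_{t∈η_s} x_t + 2y_s/σ². A sketch checking this algebra (the function names and test values are made up):

```python
import numpy as np

def p_plus_direct(nb_sum, y_s, beta, sigma2):
    """Normalize the two unnormalized weights from the slide."""
    w_pos = np.exp(beta * nb_sum - (y_s - 1)**2 / (2 * sigma2))
    w_neg = np.exp(-beta * nb_sum - (y_s + 1)**2 / (2 * sigma2))
    return w_pos / (w_pos + w_neg)

def p_plus_sigmoid(nb_sum, y_s, beta, sigma2):
    """The same probability as a logistic function of the local field."""
    return 1.0 / (1.0 + np.exp(-(2 * beta * nb_sum + 2 * y_s / sigma2)))

# The two forms agree for any neighbor sum and observation.
for nb in (-4, -2, 0, 2, 4):
    for y in (-2.0, 0.3, 1.7):
        assert np.isclose(p_plus_direct(nb, y, 1/1.8, 4.0),
                          p_plus_sigmoid(nb, y, 1/1.8, 4.0))
```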
Introduction to MCMC and Gibbs Sampling Gibbs Sampling
Gibbs Sampling in the Ising Model: Sampling from p(x|y) – Multiplicative Flips (AKA binary symmetric channel)

p(x|y) ∝ exp(β Σ_{s∼t} x_s x_t) θ^{|{s∈S : y_s = x_s}|} (1 − θ)^{|{s∈S : y_s = −x_s}|}

p(x_s | x_{S∖s}, y) = p(x_s, x_{S∖s} | y) / p(x_{S∖s} | y) = p(x|y) / p(x_{S∖s} | y)
∝ exp(β Σ_{t∈η_s} x_s x_t) θ^{1[y_s = x_s]} (1 − θ)^{1[y_s = −x_s]}
⇒
p(x_s = 1 | x_{S∖s}, y) ∝ exp(β Σ_{t∈η_s} x_t) θ^{1[y_s = 1]} (1 − θ)^{1[y_s = −1]}
p(x_s = −1 | x_{S∖s}, y) ∝ exp(−β Σ_{t∈η_s} x_t) θ^{1[y_s = −1]} (1 − θ)^{1[y_s = 1]}

Again, this is just flipping a biased coin.
Introduction to MCMC and Gibbs Sampling MCMC
Markov Chain Monte Carlo (MCMC) – Regardless of MRFs
Let p be the pdf/pmf of interest, known as the target distribution.
Find a Markov chain X(0), X(1), … with X(t) ∈ R^{|S|} such that:

Pr(X(t) = x) → p(x) as t → ∞   (“sampling”)
or
Pr(X(t) ∈ M) → 1 as t → ∞, where M = {x : p(x) = max_{x′} p(x′)}   (“annealing”)
or
(1/T) Σ_{t=1}^T H(X(t)) → E_p H(X) as T → ∞, for some function of interest H   (“ergodicity”)
Introduction to MCMC and Gibbs Sampling MCMC
Stochastic matrix
Definition
A square matrix P of nonnegative values is called a stochastic matrix if each of its rows sums to 1.
Remark: It is called doubly stochastic if both P and P T are stochasticmatrices.
Remark: a stochastic vector is a vector of nonnegative values whose sumis 1.
This terminology can be confusing – we often use “random” and “stochastic” interchangeably, but here neither a stochastic matrix nor a stochastic vector is “random”.
Introduction to MCMC and Gibbs Sampling MCMC
Markov Chain Monte Carlo (MCMC) – Regardless of MRFs
For simplicity, assume a finite state space (more generally, MCMC is also applicable in countable or continuous settings).
The MC is said to be stationary if the transition probability

P_{x,y} ≜ Pr(X(t) = y | X(t−1) = x)

is independent of t, ∀x, y ∈ R^{|S|}.
The transition probability matrix is P = {P_{x,y}} ∈ ℝ^{|R^{|S|}| × |R^{|S|}|} (this might be huge); P is a stochastic matrix.
Consider pmf’s over R^{|S|} as row vectors of length |R^{|S|}|; such a vector has nonnegative entries summing to 1.
Let π(t) be the pmf of X(t) ⇒ π(t+1) = π(t)P is the pmf of X(t+1):

π_i(t+1) = Σ_{j=1}^{|R^{|S|}|} π_j(t) P_{j,i} = Σ_{j=1}^{|R^{|S|}|} Pr(X(t+1) = i, X(t) = j)
Introduction to MCMC and Gibbs Sampling MCMC
Markov Chain Monte Carlo (MCMC)Regardless of MRFs
π(t+ 2) = π(t+ 1)P = π(t)PP = π(t)P 2
more generally π(t+ n) = π(t)Pn
If πP = π, i.e., π is a left eigenvector of P with eigenvalue 1, then π iscalled an equilibrium distribution of the MC
If π(t)t→∞−−−→ π then π is an equilibrium distribution, called the
stationary distribution of the MC.
If, in addition, ∃ limt→∞ Pt, then P t
t→∞−−−→
[ ππ...π
](i.e., identical rows)
This is consistent with the fact that regardless what π(0) is,
π(t) = π(0)P tt→∞−−−→ π.
Not every MC has an equilibrium distribution.
MCMC methods are based on finding an MC whose equilibriumdistribution is the target distribution.
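A small numeric illustration of π(t+n) = π(t)Pⁿ and of P^t acquiring identical rows (the 3-state matrix below is an arbitrary example; all its entries are positive, so the chain is connected and aperiodic):

```python
import numpy as np

# An arbitrary stochastic matrix: nonnegative entries, each row sums to 1.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])

pi = np.array([1.0, 0.0, 0.0])      # any initial pmf pi(0)
for _ in range(200):
    pi = pi @ P                     # pi(t+1) = pi(t) P

assert np.allclose(pi @ P, pi)      # pi is (numerically) an equilibrium pmf

Pt = np.linalg.matrix_power(P, 200)
assert np.allclose(Pt, np.tile(pi, (3, 1)))   # P^t -> matrix with identical rows
print(pi)
```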
Introduction to MCMC and Gibbs Sampling MCMC
Definition
Let P = {P_{x,y}} be a transition probability matrix. Then R^{|S|} is called connected under {P_{x,y}} if, ∀x, y ∈ R^{|S|},
∃ (x(0), x(1), …, x(t))
such that
1. x(0) = x, x(t) = y;
2. P_{x(n), x(n+1)} > 0 for n = 0, 1, …, t − 1.
In MC terminology, this means there is exactly one communication class.
Introduction to MCMC and Gibbs Sampling MCMC
Theorem
If (X(t))_{t≥0} is an MC on R^{|S|} and if
1. R^{|S|} is connected under {P_{x,y}}
2. P_{x,x} > 0 for some x ∈ R^{|S|} (or, more generally, in MC terminology, the chain is aperiodic)
then:
(1) ∃ a unique p̃ on R^{|S|} such that Σ_x p̃(x) P_{x,y} = p̃(y) ∀y ∈ R^{|S|} (p̃ = “equilibrium distribution”);
(2) Pr(X(t) = x | X(0) = x₀) → p̃(x) as t → ∞, independent of x₀ (the initial state);
(3) (1/T) Σ_{t=1}^T H(X(t)) → Σ_x p̃(x) H(x) = E_{p̃} H(X) as T → ∞, with H : R^{|S|} → ℝ.

But what we want is to sample from p, not p̃.
The trick: find P_{x,y} such that p = p̃.
Introduction to MCMC and Gibbs Sampling MCMC
Fact (Detailed Balance)
If for some π, a probability on R^{|S|},

π(x) P_{x,y} = π(y) P_{y,x}, ∀x, y ∈ R^{|S|},

then Σ_x π(x) P_{x,y} = π(y), i.e., π is an equilibrium distribution.

Proof.
Σ_x π(x) P_{x,y} = Σ_x π(y) P_{y,x} = π(y) Σ_x Pr(X(t+1) = x | X(t) = y) = π(y)

Remark
Note that π(x) P_{x,y} = π(y) P_{y,x} is just a short way of writing that
Pr(X(t) = x, X(t+1) = y) = Pr(X(t) = y, X(t+1) = x)
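A quick numeric check of the fact: for a random walk on a graph with symmetric nonnegative weights w (an assumed example, not from the slides), taking P_{x,y} = w_{x,y}/d(x) with d(x) = Σ_y w_{x,y} and π(x) ∝ d(x) gives detailed balance, hence stationarity:

```python
import numpy as np

rng = np.random.default_rng(0)

# Symmetric nonnegative weights on a 4-state space (arbitrary example).
W = rng.random((4, 4))
W = (W + W.T) / 2

d = W.sum(axis=1)                  # "degree" of each state
P = W / d[:, None]                 # stochastic: P_{x,y} = w_{x,y} / d(x)
pi = d / d.sum()                   # candidate equilibrium: pi(x) ∝ d(x)

flow = pi[:, None] * P             # flow[x, y] = pi(x) P_{x,y} = w_{x,y} / W.sum()
assert np.allclose(flow, flow.T)   # detailed balance: pi(x)P_{x,y} = pi(y)P_{y,x}
assert np.allclose(pi @ P, pi)     # hence sum_x pi(x) P_{x,y} = pi(y)
```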
Introduction to MCMC and Gibbs Sampling MCMC
Putting the theorem and the fact together: if P_{x,y} satisfies the conditions of the theorem and π = p satisfies detailed balance, then p is the unique equilibrium probability of the chain, and conclusions (2) and (3) of the theorem hold with p̃ = p.
Introduction to MCMC and Gibbs Sampling MCMC
Detailed balance is also called reversibility, since it implies

Pr(X(t) = y | X(t+1) = x) = Pr(X(t+1) = y | X(t) = x) = P_{x,y} (by definition);

i.e., the backward dynamics is the same as the forward dynamics.

Proof.
Pr(X(t) = y | X(t+1) = x) = Pr(X(t) = y, X(t+1) = x) / Pr(X(t+1) = x)
= Pr(X(t+1) = x | X(t) = y) · Pr(X(t) = y) / Pr(X(t+1) = x)
= P_{y,x} π(y) / π(x)
= P_{x,y}   (by detailed balance)

(here Pr(X(t) = y) = π(y) and Pr(X(t+1) = x) = π(x), with π the equilibrium pmf)
Introduction to MCMC and Gibbs Sampling MCMC
Gibbs-Sampling Example
Suppose the site visitation is governed by q_v, a probability on S. Given X(t) = x ∈ R^{|S|}:
Choose a site v using v ∼ q_v.
Change x_v to a sample from p(x_v | x_{η_v}).
Checking the conditions of the theorem:
We can get from any state x to any state y by changing one site at a time, so the state space is connected.
P_{x,x} > 0 for some x ∈ R^{|S|}? Yes; in fact, it holds for every x.
Gibbs sampling also satisfies detailed balance (see next slide).
Introduction to MCMC and Gibbs Sampling MCMC
Gibbs Sampling satisfies Detailed Balance.

P_{x,y} = Pr(X(t+1) = y | X(t) = x) = Σ_{v∈S} q_v Pr(X(t+1) = y | X(t) = x, V = v), where Pr(X(t+1) = y | X(t) = x, V = v) is p(y_v | x_{S∖v}) if y_{S∖v} = x_{S∖v} and 0 otherwise. Thus:

P_{x,y} = Σ_{v∈S} 1[y_{S∖v} = x_{S∖v}] q_v p(y_v | x_{S∖v})
⇒ p(x) P_{x,y} = Σ_{v∈S} 1[y_{S∖v} = x_{S∖v}] q_v p(y_v | x_{S∖v}) p(x_v | x_{S∖v}) p(x_{S∖v})   [using p(x) = p(x_v | x_{S∖v}) p(x_{S∖v})]

Similarly:
p(y) P_{y,x} = Σ_{v∈S} 1[x_{S∖v} = y_{S∖v}] q_v p(x_v | y_{S∖v}) p(y_v | y_{S∖v}) p(y_{S∖v})
= Σ_{v∈S} 1[x_{S∖v} = y_{S∖v}] q_v p(x_v | x_{S∖v}) p(y_v | x_{S∖v}) p(x_{S∖v})   [wherever the indicator is 1, y_{S∖v} = x_{S∖v}; for scalars, ab = ba]

⇒ p(y) P_{y,x} = p(x) P_{x,y}, i.e., detailed balance is achieved.
Introduction to MCMC and Gibbs Sampling MCMC
Remarks on Gibbs Sampling
If the visitation schedule is deterministic, we still have Pr(X(t) = x | X(0) = x(0)) → p(x) as t → ∞, provided that every site is visited infinitely often.
Important to observe that we don’t need Z.
Recall: in MRFs, p(x) = (1/Z) exp(−Σ_{c∈C} E_c(x_c)) (similarly if working with p(x|y)). Write

p(x) = p_T(x)|_{T=1} ≜ (1/Z_T) exp(−(1/T) Σ_{c∈C} E_c(x_c)) |_{T=1}

Fact (Simulated Annealing)
p_T(x) → (as T ↓ 0) a probability concentrated on {x : x = argmax_{x′} p(x′)} … as long as T doesn’t decay “too fast”.
Similar results hold if working with p(x|y) instead of p(x).
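A minimal annealed-Gibbs sketch for the Ising prior, targeting p_T(x) ∝ exp((β/T) Σ_{s∼t} x_s x_t) while lowering T. The geometric cooling schedule is a common practical shortcut; the Fact’s guarantee requires a much slower (logarithmic) schedule. The schedule, β, and lattice size here are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def annealed_sweep(x, beta, T, rng):
    """One Gibbs sweep targeting p_T(x) ∝ exp((beta/T) Σ_{s~t} x_s x_t)."""
    n, m = x.shape
    for i in range(n):
        for j in range(m):
            nb = (x[i-1, j] if i > 0 else 0) + (x[i+1, j] if i < n-1 else 0) \
               + (x[i, j-1] if j > 0 else 0) + (x[i, j+1] if j < m-1 else 0)
            p1 = 1.0 / (1.0 + np.exp(-2.0 * (beta / T) * nb))
            x[i, j] = 1 if rng.random() < p1 else -1
    return x

x = rng.choice([-1, 1], size=(10, 10))
for T in np.geomspace(2.0, 0.05, 60):     # cool from T=2 down to T=0.05
    x = annealed_sweep(x, beta=1.0, T=T, rng=rng)

# As T drops, the samples concentrate near the prior's modes (constant images).
pair_sum = (x[:, :-1] * x[:, 1:]).sum() + (x[:-1, :] * x[1:, :]).sum()
print(pair_sum)
```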
Generalizations
Modeling: Beyond the Ising Model
x_s ∈ {0, 1, …, m}. The Potts model is

p(x) ∝ exp(−γ Σ_{s∼t} 1[x_s ≠ x_t]),  γ > 0

A modification in case the ordering of the labels matters:

p(x) ∝ exp(−γ Σ_{s∼t} (x_s − x_t)²),  γ > 0

(can also replace the ℓ₂ loss with a robust error function)
Gaussian MRF: x ∼ N (µ,Q−1) where Q is SPD and sparse.
More complicated models and/or higher-order cliques(see examples in the presentation “MRFs, part 4”)
Generalizations
Inference: Beyond Dynamic Programming and Gibbs Sampling
There are many other approaches for searching for the argmax. Forexample:
Graph-cut methods
Belief propagation and loopy belief propagation
Mean-field approximations
. . .
Generalizations
Applications: Beyond Image Restoration
Inpainting (y has missing pixels)
Image segmentation
Spatial coherence in various problems such as optical-flow estimation,depth-estimation, stereo, etc.
Pose estimation
Texture synthesis
Interactions between superpixels (irregular grid)
HMM-like models are useful in CV applications such as tracking
Many other applications
Generalizations
Learning MRFs
MRFs often have parameters (e.g., β in the Ising model, or Q in aGMRF) that can/should be learned; we won’t get into this too much,but we will touch upon some examples in the next presentation.
We can also learn the structure of the graph (i.e., the existence, or lack thereof, of edges in the graph); this is called structural inference.
If you want to learn more about MRFs and their CV Applications
Books (see course website for details)
Prince
Szeliski
Markov random fields for vision and image processing; editors: Blakeand Rother (nice mix of tutorials and applications)
Winkler’s Image Analysis, Random Fields and Markov Chain MonteCarlo (mathematically advanced, focused on MCMC)
If you want to learn more about MRFs and their CV Applications
Version Log
25/3/2019, ver 1.02. S28: Fixed R^S to R^{|S|}. Split S28 into two slides. S35: Clarified when we move back from the general case to MRFs; moved the caveat into the Fact.
18/3/2019, ver 1.01. S35 (now S36): γ → −γ, stated γ > 0
11/03/2019, ver 1.00.