
Computer Vision: Models, Learning and Inference – Markov Random Fields, Part 3

Oren Freifeld and Ron Shapira-Weber

Computer Science, Ben-Gurion University

Mar 11, 2019

www.cs.bgu.ac.il/~cv192/


Image Restoration as a Generic Example

Image Restoration

Consider an $n \times n$ image

Pixels define a regular 2D lattice

$x = (x_s)_{s \in S}$, a latent ("clean"/"uncorrupted") image

$y = (y_s)_{s \in S}$, an observed ("corrupted-by-noise") image

Problem: Recover x from y.


Bayesian Image Restoration

The Bayesian approach is based on

$$p(x|y) = \frac{p(x,y)}{p(y)} = \frac{p(y|x)\,p(x)}{p(y)} \propto p(y|x)\,p(x)$$

p(x|y): posterior

p(y|x): likelihood

p(y): evidence

p(x): prior


Consider Binary Latent Images with an MRF prior

$x_s \in \{-1, 1\}$ (binary, with $-1$ instead of $0$); i.e., $x \in \{-1, 1\}^{|S|}$

Prior model:

$$p(x) \propto \prod_{c \in C} H_c(x_c) \tag{1}$$

Observation/likelihood model 1 ($\mathbb{R}$-valued, additive):

$$\mathbb{R} \ni y_s = x_s + n_s, \quad s \in S, \qquad n_s \overset{\text{iid}}{\sim} \mathcal{N}(0, \sigma^2)$$

Observation/likelihood model 2 (binary, multiplicative):

$$\{-1,1\} \ni y_s = x_s n_s, \quad s \in S, \qquad n_s \in \{-1,1\}, \quad \frac{n_s+1}{2} \overset{\text{iid}}{\sim} \mathrm{Bernoulli}(\theta)$$

In both cases, the overall likelihood is

$$p(y|x) = \prod_{s \in S} p(y_s|x) = \prod_{s \in S} p(y_s|x_s)$$

The first "=": the $y_s$'s are conditionally independent given $x$. The second "=" is stronger: $y_s \perp\!\!\!\perp (x_{S \setminus s}, y_{S \setminus s}) \mid x_s$ for every $s \in S$.
iid = independent and identically distributed
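
To make the two observation models concrete, here is a minimal NumPy sketch (illustrative only; the toy image and the `sigma` and `theta` values are assumptions, not values from the slides) that corrupts a $\pm 1$ image under each model:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64
# A toy binary latent image x in {-1, 1}^(n x n): a centered square.
x = -np.ones((n, n), dtype=int)
x[16:48, 16:48] = 1

# Model 1 (real-valued, additive): y_s = x_s + n_s, with n_s ~ N(0, sigma^2) iid.
sigma = 1.0
y_additive = x + sigma * rng.standard_normal((n, n))

# Model 2 (binary, multiplicative): y_s = x_s * n_s with n_s in {-1, 1},
# where (n_s + 1)/2 ~ Bernoulli(theta) iid, i.e., n_s = +1 with probability theta.
theta = 0.8
flips = np.where(rng.random((n, n)) < theta, 1, -1)
y_flip = x * flips
```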


Posterior

Case 1:

$$p(x|y) = \frac{\overbrace{\left[\prod_{s \in S} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_s - x_s)^2}{2\sigma^2}\right)\right]}^{\text{likelihood}} \prod_{c \in C} H_c(x_c)}{\sum_{x' \in \{-1,1\}^{|S|}} \left[\prod_{s \in S} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_s - x'_s)^2}{2\sigma^2}\right)\right] \prod_{c \in C} H_c(x'_c)}$$

$$\propto \left[\prod_{s \in S} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(y_s - x_s)^2}{2\sigma^2}\right)\right] \prod_{c \in C} H_c(x_c) \propto \prod_{c \in C} F_c(x_c)$$

(singletons – w.r.t. GA – absorbed into the $H_c$'s; the resulting clique functions are renamed $F_c$, where $F_c(x_c) = F_c(x_c, y)$ also depends on $y$)


Posterior

Case 2:

$$p(x|y) \propto \overbrace{(1-\theta)^{|\{s \in S : y_s = -x_s\}|}\, \theta^{|\{s \in S : y_s = x_s\}|}}^{\text{likelihood}} \prod_{c \in C} H_c(x_c) = \prod_{c \in C} F_c(x_c)$$

(singletons – w.r.t. GA – absorbed into the $H_c$'s; the resulting clique functions are renamed $F_c$, where $F_c(x_c) = F_c(x_c, y)$ also depends on $y$)


Digression: What if $y_s$ Depends on More than $x_s$?

E.g., $y$ is obtained from $x$ by convolving it with a $3 \times 3$ filter. We then get products of $p(y_s \mid 3\times 3 \text{ neighborhood of } s \text{ in } x)$.
$\Rightarrow$ In $p(x|y)$, each $3 \times 3$ block is a clique, with (partial) overlaps between nearby cliques.


What We Want

Sample from $p(x|y)$, compute posterior expectations (e.g., $E(x|y)$ or $E(g(x)|y)$ for some function $g$), and compute the posterior mode, $\arg\max_x p(x|y)$.
In fact, if we can sample somehow, we can use this for the other tasks as well; for example (and regardless of MRFs):

$$\underbrace{(x^i_s)_{s \in S}}_{x^i} \overset{\text{iid}}{\sim} \underbrace{p\big((x_s)_{s \in S} \mid y\big)}_{p(x|y)}, \quad i = 1, 2, \ldots, N
\;\Rightarrow\;
\underbrace{\frac{1}{N}\sum_{i=1}^{N} g\big((x^i_s)_{s \in S}\big)}_{\frac{1}{N}\sum_{i=1}^{N} g(x^i)}
\xrightarrow[N \to \infty]{\text{LLN}}
\underbrace{E\big(g((X_s)_{s \in S}) \mid Y = y\big)}_{E(g(X) \mid Y = y)}$$

($x^i$ is a sample of an entire image)

Remark: usually $E(X \mid Y = y)$ is in $[-1,1]^{|S|}$ (as opposed to $\{-1,1\}^{|S|}$).
LLN: Law of Large Numbers
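
As a sketch of this Monte Carlo idea: assuming we already have some routine that returns one draw from $p(x|y)$ (here called `sample_posterior`, a hypothetical placeholder), the posterior expectation of any $g$ is just an empirical average:

```python
import numpy as np

def posterior_expectation(sample_posterior, g, N=1000):
    """Estimate E(g(X) | Y=y) by averaging g over N posterior samples.

    sample_posterior: callable returning one sample x^i (an entire image).
    g: callable mapping an image to a scalar or an array.
    """
    total = None
    for _ in range(N):
        gx = np.asarray(g(sample_posterior()), dtype=float)
        total = gx if total is None else total + gx
    return total / N

# Example: g = identity gives the posterior mean image E(X | Y=y),
# whose entries typically lie in [-1, 1] rather than {-1, 1}.
# posterior_mean = posterior_expectation(sample_posterior, lambda x: x)
```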


Ising Model

The Ising Model Prior

$$p(x) = \frac{1}{Z} \exp\!\left(\beta \sum_{s \sim t} x_s x_t\right)$$

($s \sim t$ means $s$ is a neighbor of $t$)

Samples from $p(x)$ with three different values of $\beta$, where, from left to right, $\beta_1 < \beta_2 < \beta_3$.

Figure from Winkler's monograph, "Image Analysis: Random Fields and Markov Chain Monte Carlo Methods"


Example

If $\beta = \frac{1}{1.8}$ and $y_s = x_s + n_s$ where $n_s \overset{\text{iid}}{\sim} \mathcal{N}(0, 4)$, $s \in S$, then

$$p(x|y) \propto \exp\!\left(\frac{1}{1.8}\left(\sum_{s \sim t} x_s x_t\right) - \frac{1}{8}\sum_{s}(y_s - x_s)^2\right)$$
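
For this specific example, the unnormalized posterior log-probability is easy to evaluate directly; a small sketch (assuming a 4-connected lattice, with each neighbor pair counted once):

```python
import numpy as np

def log_posterior_unnorm(x, y, beta=1 / 1.8, sigma2=4.0):
    """log p(x|y) up to an additive constant, for x in {-1, 1}^(n x n).

    Pairwise term: beta * sum over 4-connected neighbor pairs of x_s * x_t.
    Data term: -(1 / (2 * sigma2)) * sum_s (y_s - x_s)^2.
    """
    pairwise = np.sum(x[:, :-1] * x[:, 1:]) + np.sum(x[:-1, :] * x[1:, :])
    data = np.sum((y - x) ** 2)
    return beta * pairwise - data / (2.0 * sigma2)
```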


Left: the true $x$, where $x \sim p(x)$ (using the Ising model). Right: $y$, the noisy image.


Left: $x \sim p(x)$ (using the Ising model). Right: a posterior sample, $x \sim p(x|y)$.


Left: $x \sim p(x)$ (using the Ising model). Right: the maximum-a-posteriori (MAP) solution, $\arg\max_x p(x|y)$.


Interpretations of the Ising Model

WLOG, let β = 1.

$$\frac{\exp\!\left(\sum_{s \sim t} x_s x_t\right)}{\sum_{x'} \exp\!\left(\sum_{s \sim t} x'_s x'_t\right)} = \frac{\exp\!\left(\sum_{s \sim t} x_s x_t + c\right)}{\sum_{x'} \exp\!\left(\sum_{s \sim t} x'_s x'_t + c\right)} \quad \forall c \in \mathbb{R} \tag{2}$$

$$\Rightarrow\; p(x) \propto \exp\!\left(\sum_{s \sim t} x_s x_t + c\right) \quad \forall c \in \mathbb{R} \tag{3}$$

Take $c$ = #{neighboring pairs} $\Rightarrow \sum_{s \sim t} x_s x_t + c$ = #like-neighbor pairs $-$ #unlike-neighbor pairs $+$ #{neighboring pairs} $= 2 \times$ #like-neighbor pairs
$\Rightarrow p(x) \propto \exp(2 \times \text{\#like-neighbor pairs})$

Take $c = -$#{neighboring pairs} $\Rightarrow$ #like-neighbor pairs $-$ #unlike-neighbor pairs $-$ #{neighboring pairs} $= -2 \times$ #unlike-neighbor pairs $= -2 \times$ boundary length
$\Rightarrow p(x) \propto \exp(-2 \times \text{boundary length})$


Introduction to MCMC and Gibbs Sampling: Gibbs Sampling

When Sampling from the Conditionals is Easy (Regardless of MRFs)

Assume the following setting:

$X$, $Y$ and $Z$: 3 random vectors (of possibly different dimensions) with a joint pdf/pmf $p(x, y, z)$

We want to sample $(x, y, z) \sim p(x, y, z)$ – but it's hard or we don't know how.

We can relatively easily sample from the conditionals:

$$p(x \mid y, z), \qquad p(y \mid x, z), \qquad p(z \mid x, y)$$


Gibbs Sampling [Geman and Geman, 1984]

The algorithm:
1. $(x^{[0]}, y^{[0]}, z^{[0]}) \sim g(x, y, z)$, using some easy-to-sample-from $g(x, y, z)$ of the same support as $p(x, y, z)$ (e.g., take $g(x, y, z) = g(x)g(y)g(z)$).
2. Iterate:
$$x^{[i]} \sim p(x \mid y^{[i-1]}, z^{[i-1]}), \qquad y^{[i]} \sim p(y \mid x^{[i]}, z^{[i-1]}), \qquad z^{[i]} \sim p(z \mid x^{[i]}, y^{[i]})$$

$p(x^{[n]}, y^{[n]}, z^{[n]}) \xrightarrow{n \to \infty} p(x, y, z)$ (in some sense)

The resulting sequence is clearly a Markov Chain:
$$p(x^{[i]}, y^{[i]}, z^{[i]} \mid x^{[0:(i-1)]}, y^{[0:(i-1)]}, z^{[0:(i-1)]}) = p(x^{[i]}, y^{[i]}, z^{[i]} \mid x^{[i-1]}, y^{[i-1]}, z^{[i-1]})$$

Gibbs sampling is a particular case of MCMC.
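
A minimal sketch of the scheme above on a toy example (not tied to any model from the slides): the joint pmf of three binary variables is stored as a 2×2×2 table, so each conditional is obtained by normalizing a slice of the table, and the empirical distribution of the Gibbs iterates approaches the joint:

```python
import numpy as np

rng = np.random.default_rng(1)
p = rng.random((2, 2, 2))
p /= p.sum()  # a toy joint pmf p(x, y, z) over {0, 1}^3

def gibbs_sweep(state):
    x, y, z = state
    cond = p[:, y, z] / p[:, y, z].sum()   # p(x | y, z)
    x = rng.choice(2, p=cond)
    cond = p[x, :, z] / p[x, :, z].sum()   # p(y | x, z)
    y = rng.choice(2, p=cond)
    cond = p[x, y, :] / p[x, y, :].sum()   # p(z | x, y)
    z = rng.choice(2, p=cond)
    return (x, y, z)

state = (0, 0, 0)            # an arbitrary initialization
counts = np.zeros((2, 2, 2))
for t in range(20000):
    state = gibbs_sweep(state)
    if t >= 1000:            # discard a burn-in period
        counts[state] += 1
print(np.round(counts / counts.sum(), 3))  # should be close to p
print(np.round(p, 3))
```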


Gibbs Sampling [Geman and Geman, 1984]

It is OK even if we know $p$ only up to a multiplicative constant:

We know $g(x, y, z)$, a non-normalized nonnegative function, where
$$p(x, y, z) = \frac{1}{Z}\, g(x, y, z), \qquad Z = \int g(x, y, z)\, dx\, dy\, dz,$$
but we can't (or don't want to) compute the integral.

$$p(x \mid y, z) = \frac{p(x, y, z)}{p(y, z)} = \frac{p(x, y, z)}{\int p(x', y, z)\, dx'} = \frac{\frac{1}{Z}\, g(x, y, z)}{\int \frac{1}{Z}\, g(x', y, z)\, dx'} = \frac{g(x, y, z)}{\int g(x', y, z)\, dx'}$$

or, in the discrete case:

$$p(x \mid y, z) = \frac{g(x, y, z)}{\sum_{x'} g(x', y, z)}$$


Gibbs Sampling [Geman and Geman, 1984]

More generally: we want $(x_1, \ldots, x_n) \sim p(x_1, \ldots, x_n)$. Gibbs sampling with $n$ RVs:

1. $(x^{[0]}_1, x^{[0]}_2, \ldots, x^{[0]}_n) \sim g(x_1, \ldots, x_n)$, using some easy-to-sample-from $g(x_1, \ldots, x_n)$ of the same support as $p(x_1, \ldots, x_n)$.
2. Iterate:
   Update $x_1$ by sampling $x_1$ given the rest
   Update $x_2$ by sampling $x_2$ given the rest
   ...
   Update $x_n$ by sampling $x_n$ given the rest

One such iteration over all $n$ variables is called a sweep.


Motivation

Assume:

$p$ is "local" (an MRF with a modest neighborhood size)

$p(y|x)$ is "local"

$\Rightarrow p(x|y)$ is an MRF with a "local" neighborhood structure.
But: in an $n \times n$ lattice structure (e.g., images), the maximal boundary is of order $n$ – so we can't use Dynamic Programming. So we try MCMC.

Gibbs sampling [Geman and Geman, 1984] is a particular case of MCMC methods.

In principle, Gibbs sampling is applicable to general distributions, not just to Gibbs distributions (i.e., MRFs), but it is particularly easy to do Gibbs sampling in Gibbs distributions.


Gibbs Sampling in MRFs [Geman and Geman, 1984]

Pick a (possibly random) site-visitation scheme, and then, when visiting $s$, sample
$$x_s \sim p(x_s \mid x_{S \setminus s}) \overset{\text{MRF}}{=} p(x_s \mid x_{\eta_s})$$

$p(x_s \mid x_{\eta_s})$, which involves only the local neighborhood, is usually easy to sample from.

It is OK if we know $p$ only up to a multiplicative constant (which is often the case with MRFs):
$$p(x) \propto \prod_{c \in C} F_c(x_c)$$

Asymptotically great, but it takes a long time if the graph is too large/complicated.

Often, particularly for images, it can be massively parallelized (but even then this can take a long time).


Gibbs Sampling in the Ising Model: Sampling from p(x)

$$p(x) \propto \exp\!\left(\beta \sum_{s \sim t} x_s x_t\right)$$

$$p(x_s \mid x_{S \setminus s}) = \frac{p(x_s, x_{S \setminus s})}{p(x_{S \setminus s})} = \frac{p(x)}{p(x_{S \setminus s})} = \frac{\frac{1}{Z}\exp\!\left(\beta \sum_{s \sim t} x_s x_t\right)}{\sum_{x'_s} \frac{1}{Z}\exp\!\left(\beta \sum_{s \sim t} x'_s x_t\right)}$$

$$= \frac{\exp\!\left(\beta \sum_{t \in \eta_s} x_s x_t\right)}{\sum_{x'_s} \exp\!\left(\beta \sum_{t \in \eta_s} x'_s x_t\right)} = \frac{\exp\!\left(\beta \sum_{t \in \eta_s} x_s x_t\right)}{\exp\!\left(-\beta \sum_{t \in \eta_s} x_t\right) + \exp\!\left(\beta \sum_{t \in \eta_s} x_t\right)}$$

$\Rightarrow$

$$p(x_s = 1 \mid x_{S \setminus s}) = \frac{\exp\!\left(\beta \sum_{t \in \eta_s} x_t\right)}{\exp\!\left(-\beta \sum_{t \in \eta_s} x_t\right) + \exp\!\left(\beta \sum_{t \in \eta_s} x_t\right)} \propto \exp\!\left(\beta \sum_{t \in \eta_s} x_t\right)$$

$$p(x_s = -1 \mid x_{S \setminus s}) = \frac{\exp\!\left(-\beta \sum_{t \in \eta_s} x_t\right)}{\exp\!\left(-\beta \sum_{t \in \eta_s} x_t\right) + \exp\!\left(\beta \sum_{t \in \eta_s} x_t\right)} \propto \exp\!\left(-\beta \sum_{t \in \eta_s} x_t\right)$$
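
Below is a minimal sketch of one Gibbs sweep over the Ising prior, assuming a 4-connected lattice and a raster-scan visitation scheme (one possible implementation, not code from the slides); since $p(x_s = 1 \mid x_{\eta_s}) \propto \exp(\beta \sum_{t \in \eta_s} x_t)$, each site update is a biased coin flip:

```python
import numpy as np

def gibbs_sweep_ising_prior(x, beta, rng):
    """One raster-scan Gibbs sweep for p(x) proportional to exp(beta * sum_{s~t} x_s x_t).

    x: array in {-1, 1}^(n x m), updated in place; 4-connected neighborhood.
    """
    n, m = x.shape
    for i in range(n):
        for j in range(m):
            nbr_sum = 0
            if i > 0:
                nbr_sum += x[i - 1, j]
            if i < n - 1:
                nbr_sum += x[i + 1, j]
            if j > 0:
                nbr_sum += x[i, j - 1]
            if j < m - 1:
                nbr_sum += x[i, j + 1]
            # p(x_s = +1 | neighbors) = exp(b*S) / (exp(b*S) + exp(-b*S)) = 1 / (1 + exp(-2*b*S))
            p_plus = 1.0 / (1.0 + np.exp(-2.0 * beta * nbr_sum))
            x[i, j] = 1 if rng.random() < p_plus else -1
    return x

rng = np.random.default_rng(0)
x = rng.choice([-1, 1], size=(64, 64))
for _ in range(100):   # many sweeps -> approximately a sample from p(x)
    gibbs_sweep_ising_prior(x, beta=0.45, rng=rng)
```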


Gibbs Sampling in the Ising Model: Sampling from p(x|y) (iid Additive Gaussian Noise)

$$p(x|y) \propto \exp\!\left(\beta\left(\sum_{s \sim t} x_s x_t\right) - \frac{1}{2\sigma^2}\sum_{s}(y_s - x_s)^2\right)$$

$$p(x_s \mid x_{S \setminus s}, y) = \frac{p(x_s, x_{S \setminus s} \mid y)}{p(x_{S \setminus s} \mid y)} = \frac{p(x|y)}{p(x_{S \setminus s} \mid y)} = \frac{\exp\!\left(\beta\left(\sum_{s \sim t} x_s x_t\right) - \frac{1}{2\sigma^2}\sum_{s}(y_s - x_s)^2\right)}{\sum_{x'_s}\exp\!\left(\beta\left(\sum_{s \sim t} x'_s x_t\right) - \frac{1}{2\sigma^2}\sum_{s}(y_s - x'_s)^2\right)}$$

$$\propto \exp\!\left(\beta\left(\sum_{t \in \eta_s} x_s x_t\right) - \frac{1}{2\sigma^2}(y_s - x_s)^2\right)$$

$$p(x_s = 1 \mid x_{S \setminus s}, y) \propto \exp\!\left(\beta\left(\sum_{t \in \eta_s} x_t\right) - \frac{1}{2\sigma^2}(y_s - 1)^2\right)$$

$$p(x_s = -1 \mid x_{S \setminus s}, y) \propto \exp\!\left(-\beta\left(\sum_{t \in \eta_s} x_t\right) - \frac{1}{2\sigma^2}(y_s + 1)^2\right)$$

Nothing more than flipping a biased coin.
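
A sketch of the corresponding single-site update for this posterior (same 4-connected-lattice and raster-scan assumptions as before); it just normalizes the two unnormalized probabilities above and flips the biased coin:

```python
import numpy as np

def gibbs_sweep_ising_posterior_gaussian(x, y, beta, sigma2, rng):
    """One Gibbs sweep for p(x|y) propto exp(beta*sum_{s~t} x_s x_t - sum_s (y_s - x_s)^2 / (2*sigma2))."""
    n, m = x.shape
    for i in range(n):
        for j in range(m):
            nbr_sum = 0
            if i > 0:
                nbr_sum += x[i - 1, j]
            if i < n - 1:
                nbr_sum += x[i + 1, j]
            if j > 0:
                nbr_sum += x[i, j - 1]
            if j < m - 1:
                nbr_sum += x[i, j + 1]
            # Unnormalized log-probabilities of the two states x_s = +1 and x_s = -1.
            log_p_plus = beta * nbr_sum - (y[i, j] - 1.0) ** 2 / (2.0 * sigma2)
            log_p_minus = -beta * nbr_sum - (y[i, j] + 1.0) ** 2 / (2.0 * sigma2)
            p_plus = 1.0 / (1.0 + np.exp(log_p_minus - log_p_plus))
            x[i, j] = 1 if rng.random() < p_plus else -1   # flip a biased coin
    return x
```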


Gibbs Sampling in the Ising Model: Sampling from p(x|y) (Multiplicative Flips, AKA Binary Symmetric Channel)

$$p(x|y) \propto \exp\!\left(\beta \sum_{s \sim t} x_s x_t\right) \theta^{|\{s \in S : y_s = x_s\}|} (1 - \theta)^{|\{s \in S : y_s = -x_s\}|}$$

$$p(x_s \mid x_{S \setminus s}, y) = \frac{p(x_s, x_{S \setminus s} \mid y)}{p(x_{S \setminus s} \mid y)} = \frac{p(x|y)}{p(x_{S \setminus s} \mid y)} \propto \exp\!\left(\beta \sum_{t \in \eta_s} x_s x_t\right) \theta^{\mathbf{1}_{y_s = x_s}} (1 - \theta)^{\mathbf{1}_{y_s = -x_s}}$$

$$p(x_s = 1 \mid x_{S \setminus s}, y) \propto \exp\!\left(\beta \sum_{t \in \eta_s} x_t\right) \theta^{\mathbf{1}_{y_s = 1}} (1 - \theta)^{\mathbf{1}_{y_s = -1}}$$

$$p(x_s = -1 \mid x_{S \setminus s}, y) \propto \exp\!\left(-\beta \sum_{t \in \eta_s} x_t\right) \theta^{\mathbf{1}_{y_s = -1}} (1 - \theta)^{\mathbf{1}_{y_s = 1}}$$

Again, this is just flipping a biased coin.
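
Relative to the Gaussian-noise sweep sketched earlier, the only change is the local data term: the quadratic penalty is replaced by $\log\theta$ or $\log(1-\theta)$ depending on whether the proposed $x_s$ agrees with $y_s$. A sketch of just that piece (hypothetical helper; same lattice assumptions):

```python
import numpy as np

def local_log_likelihood_bsc(xs, ys, theta):
    """log p(y_s | x_s) for the binary symmetric channel, with x_s, y_s in {-1, 1}."""
    return np.log(theta) if ys == xs else np.log(1.0 - theta)

# Inside the sweep, replace the Gaussian data terms with:
#   log_p_plus  =  beta * nbr_sum + local_log_likelihood_bsc(+1, y[i, j], theta)
#   log_p_minus = -beta * nbr_sum + local_log_likelihood_bsc(-1, y[i, j], theta)
```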


Introduction to MCMC and Gibbs Sampling: MCMC

Markov Chain Monte Carlo (MCMC) – Regardless of MRFs

Let $p$ be the pdf/pmf of interest, known as the target distribution.

Find a Markov Chain $X(0), X(1), \ldots$ with $X(t) \in R^{|S|}$ (where $R$ denotes the finite set of possible values at each site) such that:

$\Pr(X(t) = x) \to p(x)$ as $t \to \infty$ ("sampling")

or

$\Pr(X(t) \in M) \to 1$ as $t \to \infty$, where $M = \{x : p(x) = \max_{x'} p(x')\}$ ("annealing")

or

$$\frac{1}{T}\sum_{t=1}^{T} H(X(t)) \xrightarrow{T \to \infty} E_p H(X)$$

for some function of interest $H$ ("ergodicity")


Stochastic matrix

Definition

A square matrix $P$ of nonnegative values is called a stochastic matrix if each of its rows sums to 1.

Remark: It is called doubly stochastic if both $P$ and $P^T$ are stochastic matrices.

Remark: a stochastic vector is a vector of nonnegative values whose sum is 1.

This terminology can be confusing – we often use "random" and "stochastic" interchangeably, but here neither a stochastic matrix nor a stochastic vector is "random".


Markov Chain Monte Carlo (MCMC) – Regardless of MRFs

For simplicity, assume a finite state space (but more generally, MCMC is also applicable in the countable or continuous settings).
The MC is said to be stationary if the transition probability
$$P_{x,y} \triangleq \Pr(X(t) = y \mid X(t-1) = x)$$
is independent of $t$, $\forall x, y \in R^{|S|}$.
The transition probability matrix is
$$P = \{P_{x,y}\} \in \mathbb{R}^{|R^{|S|}| \times |R^{|S|}|} \quad \text{(this might be huge)}$$
$P$ is a stochastic matrix.
Consider pmf's over $R^{|S|}$ as row vectors of length $|R^{|S|}|$. Such a vector has nonnegative entries summing to 1.
Let $\pi(t)$ be the pmf of $X(t)$ $\Rightarrow$ $\pi(t+1) = \pi(t)P$ is the pmf of $X(t+1)$:
$$\pi_i(t+1) = \sum_{j=1}^{|R^{|S|}|} \pi_j(t)\, P_{j,i} = \sum_{j=1}^{|R^{|S|}|} \Pr(X(t+1) = i, X(t) = j)$$
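
For intuition (nothing to do with images), here is a tiny illustrative example of propagating a pmf through a stochastic matrix, $\pi(t+1) = \pi(t)P$, until it stops changing; the matrix values are arbitrary toy numbers:

```python
import numpy as np

# A small 3-state transition matrix; each row sums to 1.
P = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.2, 0.2, 0.6]])
assert np.allclose(P.sum(axis=1), 1.0)

pi = np.array([1.0, 0.0, 0.0])   # pmf of X(0), as a row vector
for t in range(200):
    pi = pi @ P                  # pi(t+1) = pi(t) P
print(pi)                                   # approximately the equilibrium distribution
print(np.linalg.matrix_power(P, 200)[0])    # the rows of P^t approach the same pi
```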


Markov Chain Monte Carlo (MCMC) – Regardless of MRFs

$$\pi(t+2) = \pi(t+1)P = \pi(t)PP = \pi(t)P^2, \quad \text{and more generally} \quad \pi(t+n) = \pi(t)P^n$$

If $\pi P = \pi$, i.e., $\pi$ is a left eigenvector of $P$ with eigenvalue 1, then $\pi$ is called an equilibrium distribution of the MC.

If $\pi(t) \xrightarrow{t \to \infty} \pi$, then $\pi$ is an equilibrium distribution, called the stationary distribution of the MC.

If, in addition, $\lim_{t \to \infty} P^t$ exists, then $P^t \xrightarrow{t \to \infty} \begin{bmatrix}\pi \\ \pi \\ \vdots \\ \pi\end{bmatrix}$ (i.e., identical rows).

This is consistent with the fact that, regardless of what $\pi(0)$ is, $\pi(t) = \pi(0)P^t \xrightarrow{t \to \infty} \pi$.

Not every MC has an equilibrium distribution.

MCMC methods are based on finding an MC whose equilibrium distribution is the target distribution.


Definition

Let $P = \{P_{x,y}\}$ be a transition probability matrix. Then $R^{|S|}$ is called connected under $\{P_{x,y}\}$ if, $\forall x, y \in R^{|S|}$, there exists a sequence $(x(0), x(1), \ldots, x(t))$ such that

1. $x(0) = x$, $x(t) = y$;

2. $P_{x(n+1), x(n)} > 0$ for $n = 0, 1, \ldots, t-1$.

In MC terminology, this means there is exactly one communication class.


Theorem

If $(X(t))_{t \geq 0}$ is an MC on $R^{|S|}$ and if

1. $R^{|S|}$ is connected under $\{P_{x,y}\}$
2. $P_{x,x} > 0$ for some $x \in R^{|S|}$ (or, more generally, in terms of MC terminology, $P_{x,y}$ is aperiodic)

Then:

1. $\exists$ a unique $\tilde{p}$ on $R^{|S|}$ such that $\sum_{x} \tilde{p}(x) P_{x,y} = \tilde{p}(y)$ $\forall y \in R^{|S|}$ ($\tilde{p}$ = "equilibrium distribution")

2. $\Pr(X(t) = x \mid X(0) = x_0) \xrightarrow{t \to \infty} \tilde{p}(x)$, independently of $x_0$ (the initial state)

3. $\frac{1}{T}\sum_{t=1}^{T} H(X(t)) \xrightarrow{T \to \infty} \sum_{x} \tilde{p}(x) H(x) = E_{\tilde{p}} H(X)$, for $H : R^{|S|} \to \mathbb{R}$

But what we want is to sample from $p$, not $\tilde{p}$.
The trick: find $P_{x,y}$ such that $p = \tilde{p}$.


Fact (Detailed Balance)

If for some $\pi$, a probability on $R^{|S|}$,
$$\pi(x) P_{x,y} = \pi(y) P_{y,x} \quad \forall x, y \in R^{|S|},$$
then $\sum_{x} \pi(x) P_{x,y} = \pi(y)$, i.e., $\pi$ is an equilibrium distribution.

Proof.

$$\sum_{x} \pi(x) P_{x,y} = \sum_{x} \pi(y) P_{y,x} = \pi(y) \sum_{x} \Pr(X(t+1) = x \mid X(t) = y) = \pi(y)$$

Remark

Note that $\pi(x) P_{x,y} = \pi(y) P_{y,x}$ is just a short way of writing that
$$\Pr(X(t) = x, X(t+1) = y) = \Pr(X(t) = y, X(t+1) = x)$$
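
A numerical sanity check of the fact, under assumed toy values: the chain below is built with a Metropolis-style acceptance rule (not covered in these slides, but a standard way to obtain a kernel satisfying detailed balance for a given $\pi$), and the two assertions verify detailed balance and $\pi P = \pi$:

```python
import numpy as np

pi = np.array([0.5, 0.3, 0.2])                 # target distribution (toy values)
K = len(pi)
Q = (np.ones((K, K)) - np.eye(K)) / (K - 1)    # symmetric proposal

# Accept a proposed move x -> y with probability min(1, pi(y) / pi(x)).
P = np.zeros((K, K))
for x in range(K):
    for y in range(K):
        if x != y:
            P[x, y] = Q[x, y] * min(1.0, pi[y] / pi[x])
    P[x, x] = 1.0 - P[x].sum()                 # otherwise, stay put

# Detailed balance: pi(x) P[x, y] == pi(y) P[y, x] for all x, y ...
assert np.allclose(pi[:, None] * P, (pi[:, None] * P).T)
# ... which implies pi is an equilibrium distribution: pi P == pi.
assert np.allclose(pi @ P, pi)
```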


Putting the theorem and the fact together: if $P_{x,y}$ satisfies the conditions of the theorem and $\pi = p$ satisfies detailed balance, then $p$ is the unique equilibrium probability of the chain, and conclusions (2) and (3) of the theorem hold with $\tilde{p} = p$.


Detailed balance is also called reversibility since it implies
$$\Pr(X(t) = y \mid X(t+1) = x) = \underbrace{\Pr(X(t+1) = y \mid X(t) = x)}_{P_{x,y} \text{ (by definition)}};$$
i.e., the backward dynamics are the same as the forward dynamics.

Proof.

$$\Pr(X(t) = y \mid X(t+1) = x) = \frac{\Pr(X(t) = y, X(t+1) = x)}{\underbrace{\Pr(X(t+1) = x)}_{\pi(x)}} = \frac{\Pr(X(t+1) = x \mid X(t) = y)\overbrace{\Pr(X(t) = y)}^{\pi(y)}}{\pi(x)} = \frac{P_{y,x}\,\pi(y)}{\pi(x)} \overset{\text{DB}}{=} P_{x,y}$$


Gibbs-Sampling Example

Suppose the visitation schedule is random, governed by $q_v$, a probability on $S$. Given $X(t) = x \in R^{|S|}$:

Choose a site $v$ by sampling $v \sim q_v$.

Change $x_v$ to a sample from $p(x_v \mid x_{\eta_v})$.

Checking the conditions of the theorem:

We can get from any state $x$ to any state $y$ by changing one site at a time, so the state space is connected.

$P_{x,x} > 0$ for some $x \in R^{|S|}$? Yes. In fact, it holds for every $x$.

Gibbs sampling also satisfies detailed balance (see next slide).


Gibbs Sampling satisfies Detailed Balance.

$P_{x,y} = \Pr(X(t+1) = y \mid X(t) = x) = \sum_{v \in S} \Pr(X(t+1) = y \mid X(t) = x, V = v)\, q_v$, where $\Pr(X(t+1) = y \mid X(t) = x, V = v)$ is $p(y_v \mid x_{S \setminus v})$ if $y_{S \setminus v} = x_{S \setminus v}$ and 0 otherwise. Thus:

$$P_{x,y} = \sum_{v \in S} \mathbf{1}_{y_{S \setminus v} = x_{S \setminus v}}\, q_v\, p(y_v \mid x_{S \setminus v})$$

$$\Rightarrow\; p(x) P_{x,y} = \sum_{v \in S} \mathbf{1}_{y_{S \setminus v} = x_{S \setminus v}}\, q_v\, p(y_v \mid x_{S \setminus v}) \underbrace{p(x_v \mid x_{S \setminus v})\, p(x_{S \setminus v})}_{p(x)}$$

Similarly:

$$p(y) P_{y,x} = \sum_{v \in S} \mathbf{1}_{x_{S \setminus v} = y_{S \setminus v}}\, q_v\, p(x_v \mid y_{S \setminus v}) \underbrace{p(y_v \mid y_{S \setminus v})\, p(y_{S \setminus v})}_{p(y)}$$

$$= \sum_{v \in S} \mathbf{1}_{x_{S \setminus v} = y_{S \setminus v}}\, q_v\, p(x_v \mid x_{S \setminus v})\, p(y_v \mid x_{S \setminus v})\, p(x_{S \setminus v})$$

$$= \sum_{v \in S} \mathbf{1}_{x_{S \setminus v} = y_{S \setminus v}}\, q_v \underbrace{p(y_v \mid x_{S \setminus v})\, p(x_v \mid x_{S \setminus v})}_{\text{for scalars: } ab = ba}\, p(x_{S \setminus v})$$

$$\Rightarrow\; p(y) P_{y,x} = p(x) P_{x,y},$$

i.e., detailed balance is achieved.


Remarks on Gibbs Sampling

If the visitation schedule is deterministic, we still have $\Pr(X(t) = x \mid X(0) = x(0)) \xrightarrow{t \to \infty} p(x)$, provided that every site is visited infinitely often.

It is important to observe that we don't need $Z$.

Recall: in MRFs, $p(x) = \frac{1}{Z}\exp\!\left(-\sum_{c \in C} E_c(x_c)\right)$ (similarly if working with $p(x|y)$). Write

$$p(x) = p_T(x)\big|_{T=1} \triangleq \frac{1}{Z_T}\exp\!\left(-\frac{1}{T}\sum_{c \in C} E_c(x_c)\right)\Bigg|_{T=1}$$

Fact (Simulated Annealing)

$p_T(x) \xrightarrow{T \downarrow 0}$ a probability concentrated on $\{x : x = \arg\max_{x'} p(x')\}$ ... as long as $T$ doesn't decay "too fast".

Similar results hold if working with $p(x|y)$ instead of $p(x)$.
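
As a sketch of how the temperature enters in practice: run the same single-site updates with the site-wise log-probabilities divided by $T$ while $T$ is slowly lowered. The geometric cooling schedule below is a common practical shortcut chosen for brevity; the classical guarantee requires a much slower decay. All numeric values here are illustrative assumptions.

```python
import numpy as np

def annealed_sweep(x, y, beta, sigma2, T, rng):
    """One Gibbs sweep at temperature T for the Ising posterior with Gaussian noise;
    site-wise probabilities are proportional to exp((energy terms) / T)."""
    n, m = x.shape
    for i in range(n):
        for j in range(m):
            nbr_sum = 0
            if i > 0:
                nbr_sum += x[i - 1, j]
            if i < n - 1:
                nbr_sum += x[i + 1, j]
            if j > 0:
                nbr_sum += x[i, j - 1]
            if j < m - 1:
                nbr_sum += x[i, j + 1]
            lp_plus = (beta * nbr_sum - (y[i, j] - 1.0) ** 2 / (2.0 * sigma2)) / T
            lp_minus = (-beta * nbr_sum - (y[i, j] + 1.0) ** 2 / (2.0 * sigma2)) / T
            p_plus = 1.0 / (1.0 + np.exp(lp_minus - lp_plus))
            x[i, j] = 1 if rng.random() < p_plus else -1
    return x

def anneal(y, beta=1 / 1.8, sigma2=4.0, n_sweeps=300, rng=None):
    """Approximate argmax_x p(x|y) by slowly lowering T."""
    rng = rng or np.random.default_rng(0)
    x = np.where(y > 0, 1, -1)               # a reasonable initialization
    T = 4.0
    for _ in range(n_sweeps):
        annealed_sweep(x, y, beta, sigma2, T, rng)
        T = max(0.98 * T, 0.05)              # cool down, but keep T bounded away from 0
    return x
```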


Generalizations

Modeling: Beyond the Ising Model

$x_s \in \{0, 1, \ldots, m\}$. The Potts model is

$$p(x) \propto \exp\!\left(-\gamma \sum_{s \sim t} \mathbf{1}_{x_s \neq x_t}\right), \quad \gamma > 0$$

A modification in case the ordering of the labels matters:

$$p(x) \propto \exp\!\left(-\gamma \sum_{s \sim t} (x_s - x_t)^2\right), \quad \gamma > 0$$

(can also replace the $\ell_2$ loss with a robust error function)

Gaussian MRF: $x \sim \mathcal{N}(\mu, Q^{-1})$ where $Q$ is SPD and sparse.

More complicated models and/or higher-order cliques (see examples in the presentation "MRFs, part 4")
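
The single-site Gibbs machinery from the earlier slides extends directly to the Potts prior; a sketch of the conditional update for one site (labels $0, \ldots, m$; a 4-connected lattice is assumed here):

```python
import numpy as np

def sample_potts_site(x, i, j, gamma, m, rng):
    """Sample x[i, j] from p(x_s | x_{eta_s}) propto exp(-gamma * #{neighbors with a different label})."""
    n_rows, n_cols = x.shape
    neighbors = []
    if i > 0:
        neighbors.append(x[i - 1, j])
    if i < n_rows - 1:
        neighbors.append(x[i + 1, j])
    if j > 0:
        neighbors.append(x[i, j - 1])
    if j < n_cols - 1:
        neighbors.append(x[i, j + 1])
    labels = np.arange(m + 1)
    # Unnormalized log-probabilities: -gamma times the number of disagreeing neighbors.
    log_p = np.array([-gamma * sum(k != t for t in neighbors) for k in labels], dtype=float)
    p = np.exp(log_p - log_p.max())
    p /= p.sum()
    x[i, j] = rng.choice(labels, p=p)
    return x[i, j]
```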


Inference: Beyond Dynamic Programming and Gibbs Sampling

There are many other approaches for searching for the argmax. For example:

Graph-cut methods

Belief propagation and loopy belief propagation

Mean-field approximations

. . .


Applications: Beyond Image Restoration

Inpainting (y has missing pixels)

Image segmentation

Spatial coherence in various problems such as optical-flow estimation, depth estimation, stereo, etc.

Pose estimation

Texture synthesis

Interactions between superpixels (irregular grid)

HMM-like models are useful in CV applications such as tracking

Many other applications


Learning MRFs

MRFs often have parameters (e.g., $\beta$ in the Ising model, or $Q$ in a GMRF) that can/should be learned; we won't get into this too much, but we will touch upon some examples in the next presentation.

We can also learn the structure of the graph (i.e., the existence, or lack thereof, of edges in the graph); this is called structural inference.


If you want to learn more about MRFs and their CV Applications

Books (see course website for details)

Prince

Szeliski

Markov Random Fields for Vision and Image Processing; editors: Blake and Rother (nice mix of tutorials and applications)

Winkler's Image Analysis, Random Fields and Markov Chain Monte Carlo Methods (mathematically advanced, focused on MCMC)


Version Log

25/3/2019, ver 1.02. S28: Fixed $R^S$ to $R^{|S|}$. Split S28 into two slides. S35: Clarified when we move back from the general case to MRFs; moved the caveat into the Fact.

18/3/2019, ver 1.01. S35 (now S36): γ → −γ, stated γ > 0

11/03/2019, ver 1.00.
