Tutorial on Learning and Inference in Discrete Graphical Models

CVPR 2014

Karteek Alahari, Dhruv Batra, Matthew Blaschko, Stephen Gould, Pushmeet Kohli, Nikos Komodakis, Nikos Paragios
28 June 2014
Stephen Gould | CVPR 2014 1/76
Graphical Models are Pervasive in Computer Vision

pixel labeling; object detection and pose estimation; scene understanding
Tutorial Overview

Part 1. Inference
(S. Gould, 45 minutes)
Exact inference in graphical models
Graph-cut based methods
Relaxations and dual-decomposition
(P. Kohli, 45 minutes)
Strategies for higher-order models
(D. Batra, 15 minutes)
M-Best MAP, Diverse M-Best

Part 2. Learning
(M. Blaschko, 45 minutes)
Introduction to learning of graphical models
Maximum-likelihood learning, max-margin learning
Max-margin training via subgradient method
(K. Alahari, 45 minutes)
Constraint generation approaches for structured learning
Efficient training of graphical models via dual-decomposition
Conditional Markov Random Fields

Also known as: Markov Networks, Undirected Graphical Models, MRFs. I make no distinction between CRFs and MRFs.

X ∈ 𝒳 are the observed random variables (always)
Y = (Y1, ..., Yn) ∈ 𝒴 are the output random variables
Yc is the subset of variables for clique c ⊆ {1, ..., n}

Define a factored probability distribution

P(Y | X) = (1/Z(X)) ∏_c Ψ_c(Y_c; X)

where Z(X) = ∑_{Y ∈ 𝒴} ∏_c Ψ_c(Y_c; X) is the partition function.

The main difficulty is the exponential number of configurations.
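For a small model the factored distribution and its partition function can be written out directly. A minimal sketch, with hypothetical potential tables over three binary variables and two cliques; it makes the difficulty concrete, since Z sums over every configuration:

```python
import itertools

# Toy MRF over three binary variables with cliques {1,2} and {2,3}.
# The potential tables Psi below are hypothetical, for illustration only.
psi_12 = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}
psi_23 = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 2.0}

def unnormalized(y):
    """Product of clique potentials for assignment y = (y1, y2, y3)."""
    return psi_12[y[0], y[1]] * psi_23[y[1], y[2]]

# Partition function: a sum over all 2^3 configurations.  For n
# variables with L labels this enumeration grows as L^n.
Z = sum(unnormalized(y) for y in itertools.product([0, 1], repeat=3))

def prob(y):
    return unnormalized(y) / Z
```

With these tables Z = 12.5, and the eight probabilities sum to one.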
MAP Inference

We will mainly be interested in maximum a posteriori (MAP) inference:

y⋆ = argmax_{y ∈ 𝒴} P(y | x)
   = argmax_{y ∈ 𝒴} (1/Z(x)) ∏_c Ψ_c(y_c; x)
   = argmax_{y ∈ 𝒴} log [ (1/Z(x)) ∏_c Ψ_c(y_c; x) ]
   = argmax_{y ∈ 𝒴} ∑_c log Ψ_c(y_c; x) − log Z(x)
   = argmax_{y ∈ 𝒴} ∑_c log Ψ_c(y_c; x)

(log is monotonic, and Z(x) does not depend on y, so both steps preserve the argmax)
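Taking logs is not just algebraic convenience: multiplying many potentials underflows in floating point, while summing their logs stays well-behaved. A small sketch with made-up factor values:

```python
import math

# 200 factors of 1e-3 each: the product is 1e-600, far below the
# smallest positive double, so naive multiplication underflows to 0.0.
factors = [1e-3] * 200

direct = 1.0
for p in factors:
    direct *= p                                  # collapses to 0.0

log_score = sum(math.log(p) for p in factors)    # fine in log space
```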
Energy Functions

Define an energy function

E(Y; X) = ∑_c ψ_c(Y_c; X)

where ψ_c(·) = − log Ψ_c(·).

Then

P(Y | X) = (1/Z(X)) exp{−E(Y; X)}

and

argmax_{y ∈ 𝒴} P(y | x) = argmin_{y ∈ 𝒴} E(y; x)

energy minimization ‘equals’ MAP inference
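The equivalence is easy to check numerically on a toy chain; the potential tables below are hypothetical:

```python
import itertools
import math

# Toy chain of three binary variables with two pairwise potentials.
Psi_12 = {(0, 0): 3.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
Psi_23 = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 5.0, (1, 1): 1.0}

def p_unnorm(y):
    """Unnormalized P: product of potentials (Z is constant in y)."""
    return Psi_12[y[0], y[1]] * Psi_23[y[1], y[2]]

def energy(y):
    """E(y) = sum of -log potentials."""
    return -math.log(Psi_12[y[0], y[1]]) - math.log(Psi_23[y[1], y[2]])

configs = list(itertools.product([0, 1], repeat=3))
y_map = max(configs, key=p_unnorm)   # MAP assignment
y_min = min(configs, key=energy)     # energy minimizer
assert y_map == y_min                # energy minimization 'equals' MAP
```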
Clique Potentials

A clique potential ψ_c(y_c; x) defines a mapping from an assignment of the random variables to a real number

ψ_c : 𝒴_c × 𝒳 → ℝ

The clique potential encodes a preference for assignments to the random variables (a lower value is more preferred).

Often parameterized as

ψ_c(y_c; x) = w_c^T φ_c(y_c; x)

But in this part of the tutorial it suffices to think of the clique potentials as big lookup tables.

We will also ignore the conditioning on X (in this part).
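The two views connect directly: with one-hot indicator features φ_c over joint assignments, the weight vector w_c simply stores the table entries, so w_c^T φ_c is a table lookup. A minimal sketch with a hypothetical pairwise binary clique:

```python
# Potts-style penalty table for one pairwise clique, flattened
# row-major into the weight vector (hypothetical values).
#   table = [[0.0, 2.5],
#            [2.5, 0.0]]
w = [0.0, 2.5, 2.5, 0.0]

def phi(yi, yj):
    """One-hot indicator feature for the joint assignment (yi, yj)."""
    f = [0.0] * 4
    f[2 * yi + yj] = 1.0
    return f

def psi(yi, yj):
    # w^T phi recovers exactly the table entry for (yi, yj)
    return sum(wk * fk for wk, fk in zip(w, phi(yi, yj)))
```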
![Page 26: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/26.jpg)
Clique Potential Arity
E (y; x) =∑
c
ψc(yc ; x)
=∑
i∈V
ψUi (yi ; x)
︸ ︷︷ ︸
unary
+∑
ij∈E
ψPij (yi , yj ; x)
︸ ︷︷ ︸
pairwise
+∑
c∈C
ψHc (yc ; x).
︸ ︷︷ ︸
higher-order
x1 x2 x3
y1 y2 y3
x4 x5 x6
y4 y5 y6
x7 x8 x9
y7 y8 y9
Stephen Gould | CVPR 2014 9/76
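A concrete instance of the unary + pairwise decomposition, on a tiny 2×2 binary grid with hypothetical unary costs and a Potts pairwise term:

```python
import itertools

# 2x2 grid, pixels indexed 0..3; unary[i][l] is the cost of giving
# pixel i label l (hypothetical values).
unary = [[0.0, 1.0], [1.0, 0.0], [0.4, 0.6], [1.0, 0.0]]
edges = [(0, 1), (2, 3), (0, 2), (1, 3)]   # 4-connected neighbours
LAMBDA = 0.3                               # pairwise smoothness weight

def energy(y):
    e = sum(unary[i][y[i]] for i in range(4))            # unary terms
    e += sum(LAMBDA for i, j in edges if y[i] != y[j])   # Potts pairwise
    return e

# Brute-force minimization is fine at this size (2^4 labelings).
y_star = min(itertools.product([0, 1], repeat=4), key=energy)
```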
Graphical Representation

E(y) = ψ(y1, y2) + ψ(y2, y3) + ψ(y3, y4) + ψ(y4, y1)

[figure: cycle over Y1, Y2, Y3, Y4 drawn as a graphical model and as a factor graph with four pairwise factors]
Graphical Representation

E(y) = ∑_{i,j} ψ(y_i, y_j)

[figure: fully connected graphical model over Y1, ..., Y4 and its factor graph with six pairwise factors]
Graphical Representation

E(y) = ψ(y1, y2, y3, y4)

[figure: graphical model over Y1, ..., Y4 and its factor graph with a single fourth-order factor]

don’t worry too much about the graphical representation, look at the form of the energy function
MAP Inference / Energy Minimization

Computing the energy minimizing assignment is NP-hard in general

argmin_{y ∈ 𝒴} E(y; x) = argmax_{y ∈ 𝒴} P(y | x)

Some structures admit tractable exact inference algorithms:
low treewidth graphs → message passing
submodular potentials → graph-cuts

Moreover, efficient approximate inference algorithms exist:
message passing on general graphs
move making inference (submodular moves)
linear programming relaxations
exact inference
An Example: Chain Graph

E(y) = ψA(y1, y2) + ψB(y2, y3) + ψC(y3, y4)

[figure: factor graph Y1 − A − Y2 − B − Y3 − C − Y4, equivalently a clique tree with cliques {Y1,Y2}, {Y2,Y3}, {Y3,Y4}]

min_y E(y) = min_{y1,y2,y3,y4} [ψA(y1, y2) + ψB(y2, y3) + ψC(y3, y4)]
           = min_{y1,y2,y3} [ψA(y1, y2) + ψB(y2, y3) + m_{C→B}(y3)]   where m_{C→B}(y3) = min_{y4} ψC(y3, y4)
           = min_{y1,y2} [ψA(y1, y2) + m_{B→A}(y2)]                   where m_{B→A}(y2) = min_{y3} [ψB(y2, y3) + m_{C→B}(y3)]
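The elimination steps above can be run directly on lookup tables. A short sketch with hypothetical potential values; the point is that the message-passing answer matches brute-force enumeration:

```python
import itertools

# Pairwise potentials for E(y) = psiA(y1,y2) + psiB(y2,y3) + psiC(y3,y4),
# binary labels (hypothetical table values).
psiA = {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 0.5, (1, 1): 2.0}
psiB = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 1.0, (1, 1): 1.5}
psiC = {(0, 0): 1.5, (0, 1): 2.0, (1, 0): 0.5, (1, 1): 1.0}
labels = [0, 1]

# Messages, exactly as in the derivation: eliminate y4, then y3.
m_CB = {y3: min(psiC[y3, y4] for y4 in labels) for y3 in labels}
m_BA = {y2: min(psiB[y2, y3] + m_CB[y3] for y3 in labels) for y2 in labels}
E_min = min(psiA[y1, y2] + m_BA[y2] for y1 in labels for y2 in labels)

# Same answer as enumerating all 2^4 assignments.
brute = min(psiA[y[0], y[1]] + psiB[y[1], y[2]] + psiC[y[2], y[3]]
            for y in itertools.product(labels, repeat=4))
assert E_min == brute
```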
Viterbi Decoding

E(y) = ψA(y1, y2) + ψB(y2, y3) + ψC(y3, y4)

[figure: clique tree with cliques {Y1,Y2}, {Y2,Y3}, {Y3,Y4}]

The energy minimizing assignment can be decoded as

y1⋆ = argmin_{y1} min_{y2} [ψA(y1, y2) + m_{B→A}(y2)]
y2⋆ = argmin_{y2} [ψA(y1⋆, y2) + m_{B→A}(y2)]
y3⋆ = argmin_{y3} [ψB(y2⋆, y3) + m_{C→B}(y3)]
y4⋆ = argmin_{y4} ψC(y3⋆, y4)
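The full procedure is then a backward pass that computes the messages followed by a forward pass that decodes one variable at a time. A sketch with hypothetical potential tables:

```python
# Hypothetical potentials for E(y) = psiA(y1,y2) + psiB(y2,y3) + psiC(y3,y4).
psiA = {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 0.5, (1, 1): 2.0}
psiB = {(0, 0): 2.0, (0, 1): 0.5, (1, 0): 1.0, (1, 1): 1.5}
psiC = {(0, 0): 1.5, (0, 1): 2.0, (1, 0): 0.5, (1, 1): 1.0}
labels = [0, 1]

# Backward pass: messages toward the first clique.
m_CB = {c: min(psiC[c, d] for d in labels) for c in labels}
m_BA = {b: min(psiB[b, c] + m_CB[c] for c in labels) for b in labels}

# Forward pass: decode one variable at a time, as on the slide.
y1 = min(labels, key=lambda a: min(psiA[a, b] + m_BA[b] for b in labels))
y2 = min(labels, key=lambda b: psiA[y1, b] + m_BA[b])
y3 = min(labels, key=lambda c: psiB[y2, c] + m_CB[c])
y4 = min(labels, key=lambda d: psiC[y3, d])
y_star = (y1, y2, y3, y4)
```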
What did this cost us?

[figure: chain Y1 − Y2 − ... − Yn with pairwise factors]

For a chain of length n with L labels per variable:
Brute force enumeration would cost |𝒴| = L^n evaluations
Viterbi decoding (message passing) costs O(nL^2)
The operation min ψ(·, ·) + m(·) can be sped up for potentials with certain structure (e.g., so-called convex priors)
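To put numbers on the gap, for a modest chain (sizes chosen arbitrarily for illustration):

```python
# Chain with n = 100 variables and L = 10 labels per variable.
n, L = 100, 10

brute_force = L ** n      # 10^100 configurations to enumerate
viterbi = n * L * L       # message passing: ~n L^2 table operations
# Viterbi touches on the order of 10,000 table entries; brute force
# is astronomically worse.
```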
![Page 48: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/48.jpg)
Min-Sum Message Passing on Clique Trees
messages sent in reverse then forward topological orderingmessage from clique i to clique j calculated as
mi→j(Yj ∩Yi) = minYi\Yj
(
ψi (Yi) +∑
k∈N (i)\{j}
mk→i (Yi ∩Yk))
energy minimizing assignment decoded as
y⋆i = argminYi
( min marginal︷ ︸︸ ︷
ψi (Yi) +∑
k∈N (i)
mk→i (Yi ∩ Yk)
)
ties must be decoded consistently
Min-Sum Message Passing on Factor Graphs (Trees)

Messages from variables to factors:

m_{i→F}(y_i) = Σ_{G ∈ N(i)\{F}} m_{G→i}(y_i)

Messages from factors to variables:

m_{F→i}(y_i) = min_{y′_F : y′_i = y_i} [ ψ_F(y′_F) + Σ_{j ∈ N(F)\{i}} m_{j→F}(y′_j) ]

The energy minimizing assignment is decoded as

y⋆_i = argmin_{y_i} Σ_{F ∈ N(i)} m_{F→i}(y_i)
Stephen Gould | CVPR 2014 19/76
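A minimal min-sum sweep on a tree-structured factor graph with two binary variables and a single pairwise factor might look as follows; the potential values are chosen arbitrarily for this sketch:

```python
import itertools

# Hypothetical potentials: unary factors psi1, psi2 and one pairwise factor psi12.
psi1 = {0: 5.0, 1: 2.0}
psi2 = {0: 1.0, 1: 3.0}
psi12 = {(0, 0): 0.0, (0, 1): 3.0, (1, 0): 4.0, (1, 1): 0.0}

# Variable-to-factor messages: each variable's only other neighbour is its
# unary factor, so the outgoing message equals the unary term.
m1_to_F = {y1: psi1[y1] for y1 in (0, 1)}
m2_to_F = {y2: psi2[y2] for y2 in (0, 1)}

# Factor-to-variable messages: minimise over the other variable.
mF_to_1 = {y1: min(psi12[(y1, y2)] + m2_to_F[y2] for y2 in (0, 1)) for y1 in (0, 1)}
mF_to_2 = {y2: min(psi12[(y1, y2)] + m1_to_F[y1] for y1 in (0, 1)) for y2 in (0, 1)}

# Decode each variable from the sum of its incoming factor messages.
y1_star = min((0, 1), key=lambda y1: psi1[y1] + mF_to_1[y1])
y2_star = min((0, 1), key=lambda y2: psi2[y2] + mF_to_2[y2])

# Sanity check against brute-force enumeration of the energy.
brute = min(itertools.product((0, 1), repeat=2),
            key=lambda y: psi1[y[0]] + psi2[y[1]] + psi12[y])
print((y1_star, y2_star), brute)
```

Both decodings agree, and the per-variable minima of the incoming message sums equal the optimal energy, as the min-marginal property promises.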
Message Passing on General Graphs

Message passing can be generalized to graphs with loops.

If the treewidth is small we can still perform exact inference:
junction tree algorithm: triangulate the graph and run message passing on the resulting tree.

Otherwise run message passing anyway:
loopy belief propagation
different message schedules (synchronous/asynchronous, static/dynamic)
no convergence or approximation guarantees
graph-cut based methods
Binary MRF Example

Consider the following energy function for two binary random variables, y1 and y2, with potentials

ψ1:  ψ1(0) = 5, ψ1(1) = 2
ψ2:  ψ2(0) = 1, ψ2(1) = 3
ψ12: ψ12(0, 0) = 0, ψ12(0, 1) = 3, ψ12(1, 0) = 4, ψ12(1, 1) = 0

E(y1, y2) = ψ1(y1) + ψ2(y2) + ψ12(y1, y2)
          = 5ȳ1 + 2y1 + ȳ2 + 3y2 + 3ȳ1y2 + 4y1ȳ2

where ȳ1 = 1 − y1 and ȳ2 = 1 − y2.
Graphical Model: y1 — y2

Probability Table (P ∝ exp(−E))

y1  y2   E    P
0   0    6    0.244
0   1    11   0.002
1   0    7    0.090
1   1    5    0.664
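The energy and probability columns above can be reproduced in a few lines, taking P from the Gibbs distribution P(y) ∝ exp(−E(y)):

```python
import itertools
import math

psi1 = {0: 5, 1: 2}
psi2 = {0: 1, 1: 3}
psi12 = {(0, 0): 0, (0, 1): 3, (1, 0): 4, (1, 1): 0}

def energy(y1, y2):
    return psi1[y1] + psi2[y2] + psi12[(y1, y2)]

assignments = list(itertools.product((0, 1), repeat=2))
Z = sum(math.exp(-energy(*y)) for y in assignments)   # partition function
table = {y: (energy(*y), math.exp(-energy(*y)) / Z) for y in assignments}
for (y1, y2), (E, P) in table.items():
    print(y1, y2, E, round(P, 3))
```

The printed rows match the table: energies 6, 11, 7, 5 and the mode (1, 1) with probability 0.664.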
Pseudo-boolean Functions [Boros and Hammer, 2001]

Pseudo-boolean Function

A mapping f : {0, 1}ⁿ → ℝ is called a pseudo-Boolean function.

Pseudo-Boolean functions can be uniquely represented as multilinear polynomials, e.g., f(y1, y2) = 6 + y1 + 5y2 − 7y1y2.

Pseudo-Boolean functions can also be represented in posiform, e.g., f(y1, y2) = 2y1 + 5ȳ1 + 3y2 + ȳ2 + 3ȳ1y2 + 4y1ȳ2. This representation is not unique.

A binary pairwise Markov random field (MRF) is just a quadratic pseudo-Boolean function.
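That the multilinear polynomial and the posiform above describe the same function can be checked by evaluating both on all four assignments:

```python
import itertools

def multilinear(y1, y2):
    # unique multilinear polynomial representation
    return 6 + y1 + 5 * y2 - 7 * y1 * y2

def posiform(y1, y2):
    n1, n2 = 1 - y1, 1 - y2   # complemented literals ȳ1, ȳ2
    return 2 * y1 + 5 * n1 + 3 * y2 + n2 + 3 * n1 * y2 + 4 * y1 * n2

for y in itertools.product((0, 1), repeat=2):
    assert multilinear(*y) == posiform(*y)
```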
Submodular Functions

Submodularity

Let V be a set. A set function f : 2^V → ℝ is called submodular if f(X) + f(Y) ≥ f(X ∪ Y) + f(X ∩ Y) for all subsets X, Y ⊆ V.
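On a small ground set the definition can be verified directly by enumerating all pairs of subsets; the two example functions below (one submodular, one not) are illustrative choices, not from the tutorial:

```python
import itertools
import math

def is_submodular(f, V):
    """Brute-force check of f(X) + f(Y) >= f(X | Y) + f(X & Y)."""
    subsets = [frozenset(s) for r in range(len(V) + 1)
               for s in itertools.combinations(V, r)]
    return all(f(X) + f(Y) >= f(X | Y) + f(X & Y)
               for X in subsets for Y in subsets)

V = {1, 2, 3, 4}
# A concave function of |X| is submodular (diminishing returns).
assert is_submodular(lambda X: math.sqrt(len(X)), V)
# |X|^2 is supermodular, so the check fails.
assert not is_submodular(lambda X: len(X) ** 2, V)
```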
Submodular Binary Pairwise MRFs

Submodularity

A pseudo-Boolean function f : {0, 1}ⁿ → ℝ is called submodular if f(x) + f(y) ≥ f(x ∨ y) + f(x ∧ y) for all vectors x, y ∈ {0, 1}ⁿ.

Submodularity checks for pairwise binary MRFs:

the polynomial form (of the pseudo-Boolean function) has nonpositive coefficients on all bilinear terms;

the posiform has pairwise terms of the form uv̄;

all pairwise potentials satisfy

ψij(0, 1) + ψij(1, 0) ≥ ψij(1, 1) + ψij(0, 0)
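The last check is a one-line test per pairwise potential. Applied to the pairwise table from the earlier binary MRF example:

```python
def pairwise_is_submodular(psi):
    """psi maps (y_i, y_j) in {0,1}^2 to a real cost."""
    return psi[(0, 1)] + psi[(1, 0)] >= psi[(0, 0)] + psi[(1, 1)]

# Pairwise potential from the earlier binary MRF example: 3 + 4 >= 0 + 0.
psi12 = {(0, 0): 0, (0, 1): 3, (1, 0): 4, (1, 1): 0}
print(pairwise_is_submodular(psi12))
```

An Ising-style attraction that rewards agreement, e.g. ψ(0,0) = ψ(1,1) = 0 with positive off-diagonal costs, passes the test; a potential rewarding disagreement fails it.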
Submodularity of Binary Pairwise Terms

To see the equivalence of the last two conditions, consider a pairwise potential with ψ(0,0) = α, ψ(0,1) = β, ψ(1,0) = γ, ψ(1,1) = δ. It splits into a constant, two unary parts, and one pairwise part:

[ α  β ]       [ 0     0   ]   [ 0  δ−γ ]   [ 0  β+γ−α−δ ]
[ γ  δ ] = α + [ γ−α  γ−α  ] + [ 0  δ−γ ] + [ 0     0     ]

equivalently

E(y1, y2) = α + (γ − α) y1 + (δ − γ) y2 + (β + γ − α − δ) ȳ1 y2

so the potential is submodular exactly when the coefficient of the pairwise term ȳ1y2 is nonnegative, i.e., β + γ ≥ α + δ.

[Kolmogorov and Zabih, 2004]
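The decomposition can be verified numerically for any α, β, γ, δ; the particular values below are arbitrary test values:

```python
import itertools

alpha, beta, gamma, delta = 1.0, 6.0, 3.0, 2.0   # arbitrary test values

def psi(y1, y2):
    # table form: psi(0,0)=alpha, psi(0,1)=beta, psi(1,0)=gamma, psi(1,1)=delta
    return [[alpha, beta], [gamma, delta]][y1][y2]

def decomposed(y1, y2):
    # constant + unary terms + pairwise term on ȳ1 y2
    return (alpha + (gamma - alpha) * y1 + (delta - gamma) * y2
            + (beta + gamma - alpha - delta) * (1 - y1) * y2)

for y in itertools.product((0, 1), repeat=2):
    assert psi(*y) == decomposed(*y)

# Submodularity holds iff beta + gamma >= alpha + delta.
print(beta + gamma >= alpha + delta)
```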
Minimum-cut Problem

Graph Cut

Let G = ⟨V, E⟩ be a capacitated digraph with two distinguished vertices s and t. An st-cut is a partitioning of V into two disjoint sets S and T such that s ∈ S and t ∈ T. The cost of the cut is the sum of edge capacities for all edges going from S to T.

[figure: example st-graph over vertices s, u, v, t]
Quadratic Pseudo-boolean Optimization

Main idea:

construct a graph such that every st-cut corresponds to a joint assignment to the variables y

the cost of the cut should be equal to the energy of the assignment, E(y; x)∗

the minimum-cut then corresponds to the minimum energy assignment, y⋆ = argmin_y E(y; x)

∗Requires non-negative edge weights.
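For the two-variable example used throughout, this idea can be sanity-checked by enumerating all st-cuts. The edge layout below follows one common convention (a variable node on the source side of the cut means y_i = 1); it is a sketch of one possible construction, not necessarily the one drawn on the slides:

```python
import itertools

# Posiform of the example energy:
#   E = 2*y1 + 5*ȳ1 + 3*y2 + ȳ2 + 3*ȳ1*y2 + 4*y1*ȳ2
# Convention: node on the source side  <=>  y_i = 1.
#   c * y_i       -> edge (i, t) with capacity c  (cut when y_i = 1)
#   c * ȳ_i       -> edge (s, i) with capacity c  (cut when y_i = 0)
#   c * y_i * ȳ_j -> edge (i, j) with capacity c  (cut when y_i = 1, y_j = 0)
edges = {
    ('y1', 't'): 2, ('s', 'y1'): 5,
    ('y2', 't'): 3, ('s', 'y2'): 1,
    ('y2', 'y1'): 3,   # from 3 * ȳ1 * y2
    ('y1', 'y2'): 4,   # from 4 * y1 * ȳ2
}

def energy(y1, y2):
    n1, n2 = 1 - y1, 1 - y2
    return 2*y1 + 5*n1 + 3*y2 + n2 + 3*n1*y2 + 4*y1*n2

def cut_cost(y1, y2):
    side = {'s': 1, 't': 0, 'y1': y1, 'y2': y2}   # 1 = source side
    return sum(c for (u, v), c in edges.items()
               if side[u] == 1 and side[v] == 0)

# Every cut cost equals the corresponding energy.
for y in itertools.product((0, 1), repeat=2):
    assert cut_cost(*y) == energy(*y)

best = min(itertools.product((0, 1), repeat=2), key=lambda y: cut_cost(*y))
print(best, cut_cost(*best))
```

The minimum cut corresponds to (y1, y2) = (1, 1) with cost 5, matching the minimum of the probability table.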
Example st-Graph Construction for Binary MRF

E(y1, y2) = ψ1(y1) + ψ2(y2) + ψij(y1, y2)
          = 2y1 + 5ȳ1 + 3y2 + ȳ2 + 3ȳ1y2 + 4y1ȳ2

[figure: st-graph over nodes s, y1, y2, t, built up term by term; unary terms give terminal edges with capacities 5, 2, 1, 3 and pairwise terms give edges between y1 and y2 with capacities 3 and 4; terminal labels 1 and 0]
An Example st-Cut

E(0, 1) = ψ1(0) + ψ2(1) + ψij(0, 1)
        = 2y1 + 5ȳ1 + 3y2 + ȳ2 + 3ȳ1y2 + 4y1ȳ2 evaluated at (y1, y2) = (0, 1), i.e., 5 + 3 + 3 = 11

[figure: st-graph with the cut corresponding to (y1, y2) = (0, 1); the edges crossing the cut have total capacity 11]
Another st-Cut

E(1, 1) = ψ1(1) + ψ2(1) + ψij(1, 1)
        = 2y1 + 5ȳ1 + 3y2 + ȳ2 + 3ȳ1y2 + 4y1ȳ2 evaluated at (y1, y2) = (1, 1), i.e., 2 + 3 + 0 = 5

[figure: st-graph with the cut corresponding to (y1, y2) = (1, 1); the edges crossing the cut have total capacity 5, the minimum]
Invalid st-Cut

This is not a valid cut, since it does not correspond to a partitioning of the nodes into two sets, one containing s and one containing t.

[figure: a set of edges in the st-graph whose removal does not separate the nodes into an s-side and a t-side]
Alternative st-Graph Construction

Sometimes you will see the roles of s and t switched.

[figure: the same energy encoded by two st-graphs; in the second, each terminal edge is attached to the opposite terminal and the terminal labels 0 and 1 are swapped]

These graphs represent the same energy function.
Big Picture: Where are we?

We can now formulate inference in a submodular binary pairwise MRF, i.e., minimisation of a submodular quadratic function f : {0, 1}ⁿ → ℝ, as a minimum-cut problem.

How do we solve the minimum-cut problem?
Max-flow/Min-cut Theorem

Max-flow/Min-cut Theorem [Ford and Fulkerson, 1956]

The maximum flow f from vertex s to vertex t is equal to the cost of the minimum st-cut.

[figure: example flow network over vertices s, u, v, t]
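A compact max-flow routine (Edmonds-Karp, i.e., BFS augmenting paths) is sketched below and run on a small st-graph encoding the earlier two-variable energy; the edge layout is one possible construction, assumed for illustration:

```python
from collections import deque

def max_flow(capacity, s, t):
    """Edmonds-Karp: repeatedly push flow along shortest augmenting paths."""
    # Residual capacities, including reverse edges with zero capacity.
    residual = {u: dict(nbrs) for u, nbrs in capacity.items()}
    for u, nbrs in capacity.items():
        for v in nbrs:
            residual.setdefault(v, {}).setdefault(u, 0)
    flow = 0
    while True:
        # BFS for an s-t path in the residual graph.
        parent = {s: None}
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in residual[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return flow          # no augmenting path left: flow is maximum
        # Push the bottleneck capacity along the path.
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck
            residual[v][u] += bottleneck
        flow += bottleneck

# One possible st-graph for the example energy.
graph = {
    's': {'y1': 5, 'y2': 1},
    'y1': {'y2': 4, 't': 2},
    'y2': {'y1': 3, 't': 3},
    't': {},
}
print(max_flow(graph, 's', 't'))   # equals the minimum energy, 5
```

By the theorem, the returned flow value equals the minimum cut cost, which in turn equals the minimum energy of the MRF.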
Maximum Flow Example

[figure: flow network over vertices s, a, b, c, d, t with edge capacities 5, 3, 3, 5, 2, 1, 3, 5]
Maximum Flow Example (Augmenting Path)

Notation: an edge u → v labeled f/c has capacity c and current flow f.

Starting from zero flow, each iteration finds an augmenting path from s to t in the residual graph and pushes the bottleneck capacity along it. In this example the total flow grows 0 → 3 → 5 → 6, at which point no augmenting path remains and the flow of 6 is maximum.
Maximum Flow Example (Push-Relabel)
s
a b
c d
t
0/5 0/30/3
0/5 0/2
0/1
0/3 0/5
state
h(·) e(·)s 6 ∞a 0 0b 0 0c 0 0d 0 0t 0 0
notation
u vf /c
edge with capacity c ,current flow f .
Stephen Gould | CVPR 2014 39/76
![Page 102: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/102.jpg)
Maximum Flow Example (Push-Relabel)
[figure: flow network on nodes s, a, b, c, d, t; edge labels f/c: 5/5, 3/3, 0/3, 0/5, 0/2, 0/1, 0/3, 0/5]

state: s: h=6, e=∞; a: h=0, e=5; b: h=0, e=3; c: h=0, e=0; d: h=0, e=0; t: h=0, e=0
Stephen Gould | CVPR 2014 39/76
![Page 103: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/103.jpg)
Maximum Flow Example (Push-Relabel)
[figure: flow network on nodes s, a, b, c, d, t; edge labels f/c: 5/5, 3/3, 0/3, 0/5, 0/2, 0/1, 0/3, 0/5]

state: s: h=6, e=∞; a: h=1, e=5; b: h=0, e=3; c: h=0, e=0; d: h=0, e=0; t: h=0, e=0
Stephen Gould | CVPR 2014 39/76
![Page 105: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/105.jpg)
Maximum Flow Example (Push-Relabel)
[figure: flow network on nodes s, a, b, c, d, t; edge labels f/c: 5/5, 3/3, 3/3, 2/5, 0/2, 0/1, 0/3, 0/5]

state: s: h=6, e=∞; a: h=1, e=0; b: h=0, e=6; c: h=0, e=2; d: h=0, e=0; t: h=0, e=0
Stephen Gould | CVPR 2014 39/76
![Page 106: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/106.jpg)
Maximum Flow Example (Push-Relabel)
[figure: flow network on nodes s, a, b, c, d, t; edge labels f/c: 5/5, 3/3, 3/3, 2/5, 0/2, 0/1, 0/3, 0/5]

state: s: h=6, e=∞; a: h=1, e=0; b: h=1, e=6; c: h=0, e=2; d: h=0, e=0; t: h=0, e=0
Stephen Gould | CVPR 2014 39/76
![Page 108: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/108.jpg)
Maximum Flow Example (Push-Relabel)
[figure: flow network on nodes s, a, b, c, d, t; edge labels f/c: 5/5, 3/3, 3/3, 2/5, 2/2, 0/1, 0/3, 0/5]

state: s: h=6, e=∞; a: h=1, e=0; b: h=1, e=4; c: h=0, e=2; d: h=0, e=2; t: h=0, e=0
Stephen Gould | CVPR 2014 39/76
![Page 109: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/109.jpg)
Maximum Flow Example (Push-Relabel)
[figure: flow network on nodes s, a, b, c, d, t; edge labels f/c: 5/5, 3/3, 3/3, 2/5, 2/2, 0/1, 0/3, 0/5]

state: s: h=6, e=∞; a: h=1, e=0; b: h=1, e=4; c: h=1, e=2; d: h=0, e=2; t: h=0, e=0
Stephen Gould | CVPR 2014 39/76
![Page 111: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/111.jpg)
Maximum Flow Example (Push-Relabel)
[figure: flow network on nodes s, a, b, c, d, t; edge labels f/c: 5/5, 3/3, 3/3, 2/5, 2/2, 1/1, 1/3, 0/5]

state: s: h=6, e=∞; a: h=1, e=0; b: h=1, e=4; c: h=1, e=0; d: h=0, e=3; t: h=0, e=1
Stephen Gould | CVPR 2014 39/76
![Page 112: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/112.jpg)
Maximum Flow Example (Push-Relabel)
[figure: flow network on nodes s, a, b, c, d, t; edge labels f/c: 5/5, 3/3, 3/3, 2/5, 2/2, 1/1, 1/3, 0/5]

state: s: h=6, e=∞; a: h=1, e=0; b: h=1, e=4; c: h=1, e=0; d: h=1, e=3; t: h=0, e=1
Stephen Gould | CVPR 2014 39/76
![Page 114: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/114.jpg)
Maximum Flow Example (Push-Relabel)
[figure: flow network on nodes s, a, b, c, d, t; edge labels f/c: 5/5, 3/3, 3/3, 2/5, 2/2, 1/1, 1/3, 3/5]

state: s: h=6, e=∞; a: h=1, e=0; b: h=1, e=4; c: h=1, e=0; d: h=1, e=0; t: h=0, e=4
Stephen Gould | CVPR 2014 39/76
![Page 115: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/115.jpg)
Maximum Flow Example (Push-Relabel)
[figure: flow network on nodes s, a, b, c, d, t; edge labels f/c: 5/5, 3/3, 3/3, 2/5, 2/2, 1/1, 1/3, 3/5]

state: s: h=6, e=∞; a: h=1, e=0; b: h=2, e=4; c: h=1, e=0; d: h=1, e=0; t: h=0, e=4
Stephen Gould | CVPR 2014 39/76
![Page 117: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/117.jpg)
Maximum Flow Example (Push-Relabel)
[figure: flow network on nodes s, a, b, c, d, t; edge labels f/c: 5/5, 3/3, 0/3, 2/5, 2/2, 1/1, 1/3, 3/5]

state: s: h=6, e=∞; a: h=1, e=3; b: h=2, e=1; c: h=1, e=0; d: h=1, e=0; t: h=0, e=4
Stephen Gould | CVPR 2014 39/76
![Page 118: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/118.jpg)
Maximum Flow Example (Push-Relabel)
[figure: flow network on nodes s, a, b, c, d, t; edge labels f/c: 5/5, 3/3, 0/3, 2/5, 2/2, 1/1, 1/3, 3/5]

state: s: h=6, e=∞; a: h=2, e=3; b: h=2, e=1; c: h=1, e=0; d: h=1, e=0; t: h=0, e=4
Stephen Gould | CVPR 2014 39/76
![Page 120: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/120.jpg)
Maximum Flow Example (Push-Relabel)
[figure: flow network on nodes s, a, b, c, d, t; edge labels f/c: 5/5, 3/3, 0/3, 5/5, 2/2, 1/1, 1/3, 3/5]

state: s: h=6, e=∞; a: h=2, e=0; b: h=2, e=1; c: h=1, e=3; d: h=1, e=0; t: h=0, e=4
Stephen Gould | CVPR 2014 39/76
![Page 122: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/122.jpg)
Maximum Flow Example (Push-Relabel)
[figure: flow network on nodes s, a, b, c, d, t; edge labels f/c: 5/5, 3/3, 0/3, 5/5, 2/2, 1/1, 3/3, 3/5]

state: s: h=6, e=∞; a: h=2, e=0; b: h=2, e=1; c: h=1, e=1; d: h=1, e=0; t: h=0, e=6
Stephen Gould | CVPR 2014 39/76
![Page 123: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/123.jpg)
Maximum Flow Example (Push-Relabel)
[figure: flow network on nodes s, a, b, c, d, t; edge labels f/c: 5/5, 3/3, 0/3, 5/5, 2/2, 1/1, 3/3, 3/5]

state: s: h=6, e=∞; a: h=2, e=0; b: h=7, e=1; c: h=1, e=1; d: h=1, e=0; t: h=0, e=6
Stephen Gould | CVPR 2014 39/76
![Page 125: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/125.jpg)
Maximum Flow Example (Push-Relabel)
[figure: flow network on nodes s, a, b, c, d, t; edge labels f/c: 5/5, 2/3, 0/3, 5/5, 2/2, 1/1, 3/3, 3/5]

state: s: h=6, e=∞; a: h=2, e=0; b: h=7, e=0; c: h=1, e=1; d: h=1, e=0; t: h=0, e=6
Stephen Gould | CVPR 2014 39/76
![Page 126: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/126.jpg)
Maximum Flow Example (Push-Relabel)
[figure: flow network on nodes s, a, b, c, d, t; edge labels f/c: 5/5, 2/3, 0/3, 5/5, 2/2, 1/1, 3/3, 3/5]

state: s: h=6, e=∞; a: h=2, e=0; b: h=7, e=0; c: h=3, e=1; d: h=1, e=0; t: h=0, e=6
Stephen Gould | CVPR 2014 39/76
![Page 128: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/128.jpg)
Maximum Flow Example (Push-Relabel)
[figure: flow network on nodes s, a, b, c, d, t; edge labels f/c: 5/5, 2/3, 0/3, 4/5, 2/2, 1/1, 3/3, 3/5]

state: s: h=6, e=∞; a: h=2, e=1; b: h=7, e=0; c: h=3, e=0; d: h=1, e=0; t: h=0, e=6
Stephen Gould | CVPR 2014 39/76
![Page 129: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/129.jpg)
Maximum Flow Example (Push-Relabel)
[figure: flow network on nodes s, a, b, c, d, t; edge labels f/c: 5/5, 2/3, 0/3, 4/5, 2/2, 1/1, 3/3, 3/5]

state: s: h=6, e=∞; a: h=4, e=1; b: h=7, e=0; c: h=3, e=0; d: h=1, e=0; t: h=0, e=6
Stephen Gould | CVPR 2014 39/76
![Page 131: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/131.jpg)
Maximum Flow Example (Push-Relabel)
[figure: flow network on nodes s, a, b, c, d, t; edge labels f/c: 5/5, 2/3, 0/3, 5/5, 2/2, 1/1, 3/3, 3/5]

state: s: h=6, e=∞; a: h=4, e=0; b: h=7, e=0; c: h=3, e=1; d: h=1, e=0; t: h=0, e=6
Stephen Gould | CVPR 2014 39/76
![Page 132: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/132.jpg)
Maximum Flow Example (Push-Relabel)
[figure: flow network on nodes s, a, b, c, d, t; edge labels f/c: 5/5, 2/3, 0/3, 5/5, 2/2, 1/1, 3/3, 3/5]

state: s: h=6, e=∞; a: h=4, e=0; b: h=7, e=0; c: h=5, e=1; d: h=1, e=0; t: h=0, e=6
Stephen Gould | CVPR 2014 39/76
![Page 134: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/134.jpg)
Maximum Flow Example (Push-Relabel)
[figure: flow network on nodes s, a, b, c, d, t; edge labels f/c: 5/5, 2/3, 0/3, 4/5, 2/2, 1/1, 3/3, 3/5]

state: s: h=6, e=∞; a: h=4, e=1; b: h=7, e=0; c: h=5, e=0; d: h=1, e=0; t: h=0, e=6
Stephen Gould | CVPR 2014 39/76
![Page 135: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/135.jpg)
Maximum Flow Example (Push-Relabel)
[figure: flow network on nodes s, a, b, c, d, t; edge labels f/c: 5/5, 2/3, 0/3, 4/5, 2/2, 1/1, 3/3, 3/5]

state: s: h=6, e=∞; a: h=6, e=1; b: h=7, e=0; c: h=5, e=0; d: h=1, e=0; t: h=0, e=6
Stephen Gould | CVPR 2014 39/76
![Page 137: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/137.jpg)
Maximum Flow Example (Push-Relabel)
[figure: flow network on nodes s, a, b, c, d, t; edge labels f/c: 5/5, 2/3, 0/3, 5/5, 2/2, 1/1, 3/3, 3/5]

state: s: h=6, e=∞; a: h=6, e=0; b: h=7, e=0; c: h=5, e=1; d: h=1, e=0; t: h=0, e=6
Stephen Gould | CVPR 2014 39/76
![Page 138: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/138.jpg)
Maximum Flow Example (Push-Relabel)
[figure: flow network on nodes s, a, b, c, d, t; edge labels f/c: 5/5, 2/3, 0/3, 5/5, 2/2, 1/1, 3/3, 3/5]

state: s: h=6, e=∞; a: h=6, e=0; b: h=7, e=0; c: h=7, e=1; d: h=1, e=0; t: h=0, e=6
Stephen Gould | CVPR 2014 39/76
![Page 140: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/140.jpg)
Maximum Flow Example (Push-Relabel)
[figure: flow network on nodes s, a, b, c, d, t; edge labels f/c: 5/5, 2/3, 0/3, 4/5, 2/2, 1/1, 3/3, 3/5]

state: s: h=6, e=∞; a: h=6, e=1; b: h=7, e=0; c: h=7, e=0; d: h=1, e=0; t: h=0, e=6
Stephen Gould | CVPR 2014 39/76
![Page 141: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/141.jpg)
Maximum Flow Example (Push-Relabel)
[figure: flow network on nodes s, a, b, c, d, t; edge labels f/c: 5/5, 2/3, 0/3, 4/5, 2/2, 1/1, 3/3, 3/5]

state: s: h=6, e=∞; a: h=7, e=1; b: h=7, e=0; c: h=7, e=0; d: h=1, e=0; t: h=0, e=6
Stephen Gould | CVPR 2014 39/76
![Page 143: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/143.jpg)
Maximum Flow Example (Push-Relabel)
[figure: flow network on nodes s, a, b, c, d, t; edge labels f/c: 4/5, 2/3, 0/3, 4/5, 2/2, 1/1, 3/3, 3/5]

state: s: h=6, e=∞; a: h=7, e=0; b: h=7, e=0; c: h=7, e=0; d: h=1, e=0; t: h=0, e=6
Stephen Gould | CVPR 2014 39/76
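The push-relabel trace above can be reproduced with a generic push-relabel sketch: saturate the source's edges, then repeatedly push excess downhill (from h(u) to h(u)−1) or relabel an active node. The edge list is the same assumed reconstruction of the slides' network used earlier, so treat it as illustrative.

```python
def push_relabel(n, edges, s, t):
    """Generic push-relabel: maintains a height h and excess e per node."""
    cap = [[0] * n for _ in range(n)]
    for u, v, c in edges:
        cap[u][v] += c
    flow = [[0] * n for _ in range(n)]
    h = [0] * n
    e = [0] * n
    h[s] = n  # the slides use h(s) = 6 = number of nodes
    for v in range(n):  # saturating pushes out of the source
        if cap[s][v]:
            flow[s][v] = cap[s][v]
            e[v] += cap[s][v]
            e[s] -= cap[s][v]

    def residual(u, v):
        return cap[u][v] - flow[u][v] + flow[v][u]

    while True:
        active = [u for u in range(n) if u not in (s, t) and e[u] > 0]
        if not active:
            break
        u = active[0]
        for v in range(n):
            if residual(u, v) > 0 and h[u] == h[v] + 1:
                # push: cancel reverse flow first, then add forward flow
                d = min(e[u], residual(u, v))
                back = min(d, flow[v][u])
                flow[v][u] -= back
                flow[u][v] += d - back
                e[u] -= d
                e[v] += d
                break
        else:
            # relabel: one above the lowest residual neighbour
            h[u] = 1 + min(h[v] for v in range(n) if residual(u, v) > 0)
    return e[t]

edges = [(0, 1, 5), (0, 2, 3), (1, 2, 3), (1, 3, 5),
         (2, 4, 2), (3, 5, 1), (3, 4, 3), (4, 5, 5)]
print(push_relabel(6, edges, 0, 5))  # 6, as in the final frame of the trace
```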
![Page 146: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/146.jpg)
Comparison of Maximum Flow Algorithms
Current state-of-the-art algorithm for exact minimization of general submodular pseudo-Boolean functions is O(n^5 T + n^6), where T is the time taken to evaluate the function [Orlin, 2009].

Algorithm             Complexity
Ford-Fulkerson        O(E max f)†
Edmonds-Karp (BFS)    O(V E^2)
Push-relabel          O(V^3)
Boykov-Kolmogorov     O(V^2 E max f)  (~O(V) in practice)
†assumes integer capacitiesStephen Gould | CVPR 2014 41/76
![Page 147: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/147.jpg)
Maximum Flow (Boykov-Kolmogorov, PAMI 2004)
Stephen Gould | CVPR 2014 42/76
![Page 151: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/151.jpg)
Maximum Flow (Boykov-Kolmogorov, PAMI 2004)
growth stage: search trees from s and t grow until they touch.

augmentation stage: the path found is augmented; trees break into forests.

adoption stage: trees are restored.
Stephen Gould | CVPR 2014 46/76
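The three stages can be summarised as a schematic loop. This is pure pseudocode: `grow`, `augment`, and `adopt` are hypothetical helpers standing in for the actual tree bookkeeping, not a real API.

```python
def bk_maxflow_outline(graph):
    """Schematic outline of the Boykov-Kolmogorov algorithm (not runnable)."""
    S, T = {graph.s}, {graph.t}          # search trees rooted at s and t
    while True:
        path = grow(S, T, graph)          # growth: trees touch -> s-t path
        if path is None:
            return graph.flow             # no path left: flow is maximum
        orphans = augment(path, graph)    # augmentation: saturate the path
        adopt(orphans, S, T, graph)       # adoption: restore the trees
```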
![Page 152: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/152.jpg)
Reparameterization of Energy Functions
E(y1, y2) = 2ȳ1 + 5y1 + 3y2 + ȳ2 + 3ȳ1y2 + 4y1ȳ2

[figure: equivalent s–t graph over nodes s, y1, y2, t with edge weights 5, 2, 1, 3, 3, 4; terminals labelled 1 and 0]

E(y1, y2) = 6y1 + 5y2 + 7y1y2

[figure: reparameterized s–t graph with edge weights 6, 5, 7; terminals labelled 1 and 0]
Stephen Gould | CVPR 2014 47/76
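A reparameterization can be sanity-checked by brute force: two parameterizations describe the same energy exactly when their value tables differ by one constant on all four assignments. The coefficients below are hypothetical stand-ins for illustration (the slide's exact terms did not survive transcription cleanly).

```python
from itertools import product

def energy_table(E):
    """Evaluate a binary pairwise energy on all four assignments."""
    return {(y1, y2): E(y1, y2) for y1, y2 in product((0, 1), repeat=2)}

# illustrative pair: E2 is E with the complement terms expanded and
# the constant dropped, i.e. a reparameterization of E
E  = lambda y1, y2: (2*(1 - y1) + 5*y1 + 3*y2 + (1 - y2)
                     + 3*(1 - y1)*y2 + 4*y1*(1 - y2))
E2 = lambda y1, y2: 7*y1 + 5*y2 - 7*y1*y2

t1, t2 = energy_table(E), energy_table(E2)
diffs = {y: t1[y] - t2[y] for y in t1}
print(diffs)  # every assignment differs by the same constant, 3
```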
![Page 153: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/153.jpg)
Big Picture: Where are we now?
We can perform inference in submodular binary pairwise Markov random fields exactly.
{0, 1}^n → R
What about...
non-submodular binary pairwise Markov random fields?
multi-label Markov random fields?
higher-order Markov random fields? (later)
Stephen Gould | CVPR 2014 48/76
![Page 155: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/155.jpg)
Non-submodular Binary Pairwise MRFs
Non-submodular binary pairwise MRFs have potentials that do not satisfy ψᴾᵢⱼ(0, 1) + ψᴾᵢⱼ(1, 0) ≥ ψᴾᵢⱼ(1, 1) + ψᴾᵢⱼ(0, 0).
They are often handled in one of the following ways:
approximate the energy function by one that is submodular (i.e., project onto the space of submodular functions);
solve a relaxation of the problem using QPBO (Rother et al.,2007) or dual-decomposition (Komodakis et al., 2007).
Stephen Gould | CVPR 2014 49/76
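The submodularity condition above is a one-line check over the four entries of a pairwise potential; the example potentials below are illustrative.

```python
def is_submodular(psi):
    """A binary pairwise potential is submodular iff
    psi(0,1) + psi(1,0) >= psi(1,1) + psi(0,0)."""
    return psi(0, 1) + psi(1, 0) >= psi(1, 1) + psi(0, 0)

# a Potts-style penalty on disagreement is submodular ...
print(is_submodular(lambda yi, yj: int(yi != yj)))   # True
# ... while a reward for disagreement is not
print(is_submodular(lambda yi, yj: -int(yi != yj)))  # False
```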
![Page 157: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/157.jpg)
Approximating Non-submodular Binary Pairwise MRFs
Consider the non-submodular potential

    A  B
    C  D

with A + D > B + C.

We can project onto a submodular potential by modifying the coefficients as follows:

∆ = A + D − C − B
A ← A − ∆/3
C ← C + ∆/3
B ← B + ∆/3
Stephen Gould | CVPR 2014 50/76
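A minimal sketch of this projection, spreading the violation ∆ over three entries so the submodularity constraint holds with equality afterwards; the numeric potential is an illustrative example.

```python
def project_to_submodular(A, B, C, D):
    """Project a non-submodular 2x2 potential [[A, B], [C, D]]
    (A + D > B + C) onto a submodular one, as on the slide."""
    delta = A + D - C - B
    if delta <= 0:
        return A, B, C, D  # already submodular, nothing to do
    A -= delta / 3
    C += delta / 3
    B += delta / 3
    return A, B, C, D

A, B, C, D = project_to_submodular(4.0, 1.0, 1.0, 2.0)  # delta = 4
print(B + C - A - D)  # ~0: the constraint now holds with equality
```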
![Page 158: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/158.jpg)
QPBO (Roof Duality) [Rother et al., 2007]
Consider the energy function

E(y) = Σ_{i∈V} ψᵁᵢ(yᵢ) + Σ_{ij∈E} ψᴾᵢⱼ(yᵢ, yⱼ) + Σ_{ij∈E} ψᴾᵢⱼ(yᵢ, yⱼ)

where the first pairwise sum is submodular and the second is non-submodular.

We can introduce duplicate variables ȳᵢ into the energy function, and write

E′(y, ȳ) = Σ_{i∈V} [ψᵁᵢ(yᵢ) + ψᵁᵢ(1 − ȳᵢ)] / 2
         + Σ_{ij∈E} [ψᴾᵢⱼ(yᵢ, yⱼ) + ψᴾᵢⱼ(1 − ȳᵢ, 1 − ȳⱼ)] / 2
         + Σ_{ij∈E} [ψᴾᵢⱼ(yᵢ, 1 − ȳⱼ) + ψᴾᵢⱼ(1 − ȳᵢ, yⱼ)] / 2
Stephen Gould | CVPR 2014 51/76
![Page 160: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/160.jpg)
QPBO (Roof Duality)
E′(y, ȳ) = Σ_{i∈V} ½ ψ^U_i(y_i) + ½ ψ^U_i(1 − ȳ_i)
         + Σ_{ij∈E} ½ ψ^P_ij(y_i, y_j) + ½ ψ^P_ij(1 − ȳ_i, 1 − ȳ_j)
         + Σ_{ij∈E} ½ ψ^P_ij(y_i, 1 − ȳ_j) + ½ ψ^P_ij(1 − ȳ_i, y_j)

Observations

if ȳ_i = 1 − y_i for all i, then E(y) = E′(y, ȳ)
E′(y, ȳ) is submodular
Ignore the constraint on ȳ_i and solve anyway. The result satisfies partial optimality: if ȳ_i = 1 − y_i then y_i is the optimal label.
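The first observation can be checked by brute force on a toy instance (all numbers are assumed, and only a single non-submodular pairwise term is used so the split is easy to see):

```python
import itertools

# Toy QPBO check: with ybar_i = 1 - y_i, the symmetrised energy E'
# agrees with the original energy E for every assignment.
psiU = [[0.0, 1.0], [0.5, 0.0]]      # unary potentials psiU_i(y_i)
psiP = [[3.0, 1.0], [0.0, 2.5]]      # non-submodular: 3.0 + 2.5 > 1.0 + 0.0

def E(y):
    return psiU[0][y[0]] + psiU[1][y[1]] + psiP[y[0]][y[1]]

def Eprime(y, ybar):
    # symmetrised energy over originals y and duplicates ybar; the
    # non-submodular term is split across the two "crossed" copies
    val = 0.5 * (psiU[0][y[0]] + psiU[0][1 - ybar[0]])
    val += 0.5 * (psiU[1][y[1]] + psiU[1][1 - ybar[1]])
    val += 0.5 * (psiP[y[0]][1 - ybar[1]] + psiP[1 - ybar[0]][y[1]])
    return val

# whenever ybar_i = 1 - y_i the two energies agree
agrees = all(
    abs(E(y) - Eprime(y, (1 - y[0], 1 - y[1]))) < 1e-9
    for y in itertools.product([0, 1], repeat=2)
)
```

The crossed copies ψ^P(y_i, 1 − ȳ_j) are submodular in (y_i, ȳ_j) exactly because ψ^P itself is non-submodular, which is why E′ as a whole is submodular.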
Multi-label Markov Random Fields
The quadratic pseudo-Boolean optimization techniques described above cannot be applied directly to multi-label MRFs.
However...
...for certain MRFs we can transform the multi-label problem into a binary one exactly.
...we can project the multi-label problem onto a series of binary problems in a so-called move-making algorithm.
The “Battleship” Transform [Ishikawa, 2003]
If the multi-label MRF has pairwise potentials that are convex functions of the label differences, i.e., ψ^P_ij(y_i, y_j) = g(|y_i − y_j|) where g(·) is convex, then we can transform the energy function into an equivalent binary one.
y = 1 ⇔ z = (0, 0, 0)
y = 2 ⇔ z = (1, 0, 0)
y = 3 ⇔ z = (1, 1, 0)
y = 4 ⇔ z = (1, 1, 1)
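The unary ("battleship") encoding above can be sketched directly (function names are hypothetical):

```python
# Label y in {1, ..., L} maps to a binary vector z with y - 1 leading
# ones, matching the table above for L = 4.
def encode(y, L=4):
    return tuple(1 if j < y - 1 else 0 for j in range(L - 1))

def decode(z):
    return 1 + sum(z)

table = {y: encode(y) for y in range(1, 5)}
```

The ∞-capacity edges in the graph construction ensure that only monotone vectors of this form correspond to finite-cost cuts, so every minimum cut decodes to a valid label.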
[Figure: the Ishikawa graph construction, with source s, a chain of nodes 1, 2, 3 for each variable, sink t, and ∞-capacity edges.]
Move-making Inference
Idea:

initialize y^prev to any valid assignment
restrict the label-space of each variable y_i from L to Y_i ⊆ L (with y^prev_i ∈ Y_i)
transform E : L^n → R to E : Y_1 × · · · × Y_n → R
find the optimal assignment y for the restricted E and repeat
each move results in an assignment with energy no higher than the previous one
Iterated Conditional Modes [Besag, 1986]
Reduce multi-variate inference to solving a series of univariate inference problems.

ICM move
For one of the variables y_i, set Y_i = L. Set Y_j = {y^prev_j} for all j ≠ i (i.e., hold all other variables fixed).

can be used for arbitrary energy functions
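A minimal ICM sketch (the toy energy and its numbers are assumptions for illustration):

```python
# Sweep the variables, greedily re-labelling each one with all of the
# others held fixed, until no single-variable move lowers the energy.
def icm(energy, labels, y_init, max_sweeps=20):
    y = list(y_init)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(y)):
            best = min(labels, key=lambda l: energy(y[:i] + [l] + y[i + 1:]))
            if best != y[i]:
                y[i], changed = best, True
        if not changed:          # local minimum w.r.t. univariate moves
            break
    return y

# toy 3-variable chain energy: unary preferences plus disagreement cost
def E(y):
    return sum(abs(y[i] - [2, 0, 1][i]) for i in range(3)) \
         + sum(y[i] != y[i + 1] for i in range(2))

y_hat = icm(E, labels=[0, 1, 2], y_init=[0, 0, 0])
```

ICM only guarantees a local minimum with respect to single-variable moves; the larger moves on the following slides escape many of these local minima.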
Alpha Expansion and Alpha-Beta Swap [Boykov et al., 2001]
Reduce multi-label inference to solving a series of binary (submodular) inference problems.

α-expansion move
Choose some α ∈ L. Then for all variables, set Y_i = {α, y^prev_i}.

ψ^P_ij(·, ·) must be metric for the resulting move to be submodular

αβ-swap move
Choose two labels α, β ∈ L. Then for each variable y_i such that y^prev_i ∈ {α, β}, set Y_i = {α, β}. Otherwise set Y_i = {y^prev_i}.

ψ^P_ij(·, ·) must be semi-metric
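An α-expansion sketch at toy scale (numbers assumed): in practice each move is a graph cut, but here the binary subproblem is small enough to solve by brute force, which keeps the move structure visible:

```python
import itertools

# One expansion move: every variable either keeps its previous label
# (t_i = 1) or switches to alpha (t_i = 0); brute force over t stands
# in for the graph-cut solver used at real scale.
def expansion_move(energy, y, alpha):
    best_y = list(y)
    for t in itertools.product([0, 1], repeat=len(y)):
        cand = [yi if ti else alpha for yi, ti in zip(y, t)]
        if energy(cand) < energy(best_y):
            best_y = cand
    return best_y

def alpha_expansion(energy, labels, y):
    improved = True
    while improved:              # cycle over alpha until no move helps
        improved = False
        for a in labels:
            y_new = expansion_move(energy, y, a)
            if energy(y_new) < energy(y):
                y, improved = y_new, True
    return y

# toy 3-variable chain energy: unary preferences plus disagreement cost
def E(y):
    return sum(abs(y[i] - [2, 0, 1][i]) for i in range(3)) \
         + sum(y[i] != y[i + 1] for i in range(2))

y_hat = alpha_expansion(E, labels=[0, 1, 2], y=[0, 0, 0])
```

The pairwise term here (a Potts disagreement cost) is a metric, so each move would indeed be submodular under the construction on the next slide.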
Alpha Expansion Potential Construction
y^next_i = y^prev_i if t_i = 1,  and  y^next_i = α if t_i = 0

E(t) = Σ_i [ ψ_i(α) (1 − t_i) + ψ_i(y^prev_i) t_i ]
     + Σ_ij [ ψ_ij(α, α) (1 − t_i)(1 − t_j) + ψ_ij(α, y^prev_j) (1 − t_i) t_j
            + ψ_ij(y^prev_i, α) t_i (1 − t_j) + ψ_ij(y^prev_i, y^prev_j) t_i t_j ]
relaxations and dual decomposition
Mathematical Programming Formulation
Let θ_{c,y_c} ≜ ψ_c(y_c) and let μ_{c,y_c} ≜ { 1 if Y_c = y_c; 0 otherwise }.

argmin_{y∈Y} Σ_c ψ_c(y_c)

⇕

minimize (over μ)  θ^T μ
subject to         μ_{c,y_c} ∈ {0, 1}, ∀c, y_c ∈ Y_c
                   Σ_{y_c} μ_{c,y_c} = 1, ∀c
                   Σ_{y_c \ y_i} μ_{c,y_c} = μ_{i,y_i}, ∀i ∈ c, y_i ∈ Y_i
Binary Integer Program: Example
Consider the energy function E(y_1, y_2) = ψ_1(y_1) + ψ_12(y_1, y_2) + ψ_2(y_2) for binary variables y_1 and y_2.

θ = [ ψ_1(0), ψ_1(1), ψ_2(0), ψ_2(1), ψ_12(0,0), ψ_12(1,0), ψ_12(0,1), ψ_12(1,1) ]^T
μ = [ μ_{1,0}, μ_{1,1}, μ_{2,0}, μ_{2,1}, μ_{12,00}, μ_{12,10}, μ_{12,01}, μ_{12,11} ]^T

s.t.  μ_{1,0} + μ_{1,1} = 1
      μ_{2,0} + μ_{2,1} = 1
      μ_{12,00} + μ_{12,10} + μ_{12,01} + μ_{12,11} = 1
      μ_{12,00} + μ_{12,01} = μ_{1,0}
      μ_{12,10} + μ_{12,11} = μ_{1,1}
      μ_{12,00} + μ_{12,10} = μ_{2,0}
      μ_{12,01} + μ_{12,11} = μ_{2,1}
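A quick brute-force sanity check of this example (the potential values are assumed toy numbers): every integer assignment's indicator vector μ satisfies θ^T μ = E(y_1, y_2) and the marginalisation constraints.

```python
import itertools

psi1 = [0.2, 1.0]
psi2 = [0.7, 0.1]
psi12 = [[0.0, 0.8], [0.5, 0.3]]        # psi12[y1][y2]

# same ordering as the theta vector above
theta = psi1 + psi2 + [psi12[0][0], psi12[1][0], psi12[0][1], psi12[1][1]]

def E(y1, y2):
    return psi1[y1] + psi12[y1][y2] + psi2[y2]

def indicator(y1, y2):
    # mu in the same ordering as theta (booleans act as 0/1)
    return [y1 == 0, y1 == 1, y2 == 0, y2 == 1,
            (y1, y2) == (0, 0), (y1, y2) == (1, 0),
            (y1, y2) == (0, 1), (y1, y2) == (1, 1)]

ok = all(
    abs(sum(t * m for t, m in zip(theta, indicator(y1, y2))) - E(y1, y2)) < 1e-9
    for y1, y2 in itertools.product([0, 1], repeat=2)
)
```

This is exactly why minimising θ^T μ over integer μ in the constraint set recovers the MAP assignment.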
Local Marginal Polytope
M = { μ ≥ 0 |  Σ_{y_i} μ_{i,y_i} = 1, ∀i
               Σ_{y_c \ y_i} μ_{c,y_c} = μ_{i,y_i}, ∀i ∈ c, y_i ∈ Y_i }
M is tight if the factor graph is a tree
for cyclic graphs M may contain fractional vertices
for submodular energies, fractional solutions are never optimal
Linear Programming (LP) Relaxation
Binary integer program

minimize (over μ)  θ^T μ
subject to         μ_{c,y_c} ∈ {0, 1}
                   μ ∈ M

Linear program

minimize (over μ)  θ^T μ
subject to         μ_{c,y_c} ∈ [0, 1]
                   μ ∈ M
Solving with standard LP solvers is typically impractical due to the large number of variables and constraints
More easily solved via coordinate ascent of the dual
Solutions need to be rounded or decoded
Dual Decomposition: Rewriting the Primal
minimize (over μ)  Σ_c θ_c^T μ_c
subject to         μ ∈ M

⇕ (pad θ_c with zeros to θ̃_c)

minimize (over μ)  Σ_c θ̃_c^T μ
subject to         μ ∈ M

⇕ (introduce copies of μ)

minimize (over μ, {μ_c})  Σ_c θ̃_c^T μ_c
subject to                μ_c = μ
                          μ ∈ M
Dual Decomposition: Forming the Dual
Primal problem

minimize (over μ, {μ_c})  Σ_c θ̃_c^T μ_c
subject to                μ_c = μ
                          μ ∈ M

Introducing dual variables λ_c we have the Lagrangian

L(μ, {μ_c}, {λ_c}) = Σ_c θ̃_c^T μ_c + Σ_c λ_c^T (μ_c − μ)
                   = Σ_c (θ̃_c + λ_c)^T μ_c − Σ_c λ_c^T μ
Dual Decomposition
maximize (over {λ_c})  min_{{μ_c}} Σ_c (θ̃_c + λ_c)^T μ_c
subject to             Σ_c λ_c = 0

⇕

maximize (over {λ_c})  Σ_c min_{μ_c} (θ̃_c + λ_c)^T μ_c
subject to             Σ_c λ_c = 0

⇕

maximize (over {λ_c})  Σ_c min_{y_c} [ ψ_c(y_c) + λ_c(y_c) ]
subject to             Σ_c λ_c = 0
Dual Lower Bound
E(y) = Σ_c ψ_c(y_c)
     = Σ_c [ ψ_c(y_c) + λ_c(y_c) ]      (iff Σ_c λ_c(y_c) = 0)

min_y E(y) ≥ Σ_c min_{y_c} [ ψ_c(y_c) + λ_c(y_c) ]

min_y E(y) ≥ max_{{λ_c}: Σ_c λ_c = 0} Σ_c min_{y_c} [ ψ_c(y_c) + λ_c(y_c) ]
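The bound can be exercised on a toy two-factor chain (all numbers assumed): two slaves share y_2, each minimises its own reparameterised factor, and a subgradient step on λ pushes them to agree.

```python
import itertools

# E(y1,y2,y3) = psi12(y1,y2) + psi23(y2,y3), split at the shared y2.
psi12 = {(0, 0): 0.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 1.0}
psi23 = {(0, 0): 1.5, (0, 1): 0.0, (1, 0): 0.5, (1, 1): 2.0}
states = list(itertools.product([0, 1], repeat=2))

def true_min():
    return min(psi12[y1, y2] + psi23[y2, y3]
               for y1, y2, y3 in itertools.product([0, 1], repeat=3))

lam = {0: 0.0, 1: 0.0}                   # dual variable on the shared y2
bound = float("-inf")
for k in range(1, 100):
    # each slave minimises its own reparameterised factor independently
    m1 = min(states, key=lambda q: psi12[q] + lam[q[1]])
    m2 = min(states, key=lambda q: psi23[q] - lam[q[0]])
    bound = max(bound, psi12[m1] + lam[m1[1]] + psi23[m2] - lam[m2[0]])
    for v in (0, 1):                     # subgradient: disagreement on y2
        g = float(m1[1] == v) - float(m2[0] == v)
        lam[v] += (1.0 / k) * g
```

By weak duality the bound never exceeds min_y E(y) for any λ; on this tiny tree-structured example it also becomes tight.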
Subgradients
Subgradient
A subgradient of a function f at x is any vector g satisfying

f(y) ≥ f(x) + g^T (y − x)   for all y
Subgradient Method
The basic subgradient method is an algorithm for minimizing a nondifferentiable convex function f : R^n → R.

x^(k+1) = x^(k) − α_k g^(k)

x^(k) is the k-th iterate
g^(k) is any subgradient of f at x^(k)
α_k > 0 is the k-th step size

It is possible that −g^(k) is not a descent direction for f at x^(k), so we keep track of the best point found so far:

f_best^(k) = min { f_best^(k−1), f(x^(k)) }
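A minimal sketch of the method on an assumed toy objective, using a diminishing step size α_k = 1/k and best-point tracking:

```python
# f(x) = |x - 3| + |x + 1| is nondifferentiable and convex; it is
# minimised on the whole interval [-1, 3], with optimal value f* = 4.
def f(x):
    return abs(x - 3) + abs(x + 1)

def subgrad(x):
    sign = lambda t: 1.0 if t > 0 else (-1.0 if t < 0 else 0.0)
    return sign(x - 3) + sign(x + 1)    # any subgradient element works

x, f_best = 10.0, f(10.0)
for k in range(1, 200):
    x = x - (1.0 / k) * subgrad(x)      # diminishing step: alpha_k = 1/k
    f_best = min(f_best, f(x))          # -g need not be a descent direction
```

Tracking f_best matters: individual iterates can temporarily increase f even though the sequence of best values converges.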
Step Size Rules
Step sizes are chosen ahead of time (unlike line search in ordinary gradient methods). A few common step size schedules are:

constant step size: α_k = α
constant step length: α_k = γ / ‖g^(k)‖_2
square summable but not summable: Σ_{k=1}^∞ α_k^2 < ∞, Σ_{k=1}^∞ α_k = ∞
nonsummable diminishing: lim_{k→∞} α_k = 0, Σ_{k=1}^∞ α_k = ∞
nonsummable diminishing step lengths: α_k = γ_k / ‖g^(k)‖_2 with lim_{k→∞} γ_k = 0, Σ_{k=1}^∞ γ_k = ∞
Convergence Results
For constant step size and constant step length, the subgradient algorithm will converge to within some range of the optimal value,

lim_{k→∞} f_best^(k) < f⋆ + ε

For the diminishing step size and step length rules the algorithm converges to the optimal value,

lim_{k→∞} f_best^(k) = f⋆

but may take a very long time to converge.
Optimal Step Size for Known f⋆
Assume we know f⋆ (we just don't know x⋆). Then

α_k = ( f(x^(k)) − f⋆ ) / ‖g^(k)‖_2^2

is an optimal step size in some sense, called the Polyak step size.

A good approximation when f⋆ is not known (but non-negative) is

α_k = ( f(x^(k)) − γ · f_best^(k−1) ) / ‖g^(k)‖_2^2

where 0 < γ < 1.
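A sketch of the Polyak rule on an assumed toy objective with known f⋆; the step size shrinks to zero as f(x^(k)) approaches f⋆, so the iterates settle at an optimum:

```python
def f(x):
    return abs(x - 3) + abs(x + 1)       # minimised on [-1, 3] with f* = 4

def subgrad(x):
    sign = lambda t: 1.0 if t > 0 else (-1.0 if t < 0 else 0.0)
    return sign(x - 3) + sign(x + 1)

x, f_star = 10.0, 4.0
for _ in range(100):
    g = subgrad(x)
    if g == 0.0:
        break                            # flat subgradient: cannot move
    alpha = (f(x) - f_star) / g ** 2     # Polyak: vanishes once f(x) = f*
    x = x - alpha * g
```

On this one-dimensional example the very first Polyak step already lands on an optimal point, in contrast to the slow 1/k schedule above.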
Projected Subgradient Method
One extension of the subgradient method is the projected subgradient method, which solves problems of the form

minimize f(x)
subject to x ∈ C

Here the updates are

x^(k+1) = P_C ( x^(k) − α_k g^(k) )

The projected subgradient method has similar convergence guarantees to the subgradient method.
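A minimal sketch on an assumed toy problem, using clamping as the projection P_C onto an interval:

```python
# Minimise f(x) = |x - 5| over C = [0, 2]; the constrained optimum is
# x = 2 with f = 3. Each step follows a subgradient, then projects.
def f(x):
    return abs(x - 5)

def subgrad(x):
    return 1.0 if x > 5 else -1.0       # valid subgradient for x != 5

def project(x, lo=0.0, hi=2.0):         # P_C for an interval is a clamp
    return min(max(x, lo), hi)

x, f_best = 0.0, f(0.0)
for k in range(1, 300):
    x = project(x - (1.0 / k) * subgrad(x))
    f_best = min(f_best, f(x))
```

For simple sets (boxes, the simplex, norm balls) the projection is cheap, which is what makes this method practical for the dual problems above.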
Supergradient of minᵢ{aᵢᵀx + bᵢ}

Consider f(x) = minᵢ{aᵢᵀx + bᵢ} and let I(x) = argminᵢ{aᵢᵀx + bᵢ}.

Then for any i ∈ I(x), g = aᵢ is a supergradient of f at x:

f(x) + gᵀ(z − x) = f(x) + aᵢᵀ(z − x),   i ∈ I(x)
                 = f(x) − aᵢᵀx − bᵢ + aᵢᵀz + bᵢ
                 = aᵢᵀz + bᵢ
                 ≥ f(z)

where the second-to-last equality uses f(x) = aᵢᵀx + bᵢ for i ∈ I(x), and the final inequality holds because f(z) is the minimum over all affine pieces.
Stephen Gould | CVPR 2014 74/76
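The supergradient inequality f(z) ≤ f(x) + gᵀ(z − x) can be checked numerically; this sketch (not from the slides, with illustrative random data) builds a pointwise minimum of affine functions and verifies the inequality at random points.

```python
import numpy as np

rng = np.random.default_rng(0)

# f(x) = min_i {a_i^T x + b_i}: a concave, piecewise-linear function.
A = rng.normal(size=(5, 3))        # rows are the a_i
b = rng.normal(size=5)

f = lambda x: np.min(A @ x + b)

def supergradient(x):
    """Return g = a_i for an active (minimizing) index i in I(x)."""
    i = int(np.argmin(A @ x + b))
    return A[i]

# Check f(z) <= f(x) + g^T (z - x) at 100 random points z.
x = rng.normal(size=3)
g = supergradient(x)
ok = all(f(z) <= f(x) + g @ (z - x) + 1e-9
         for z in rng.normal(size=(100, 3)))
# ok → True: the affine upper bound majorizes f everywhere
```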
![Page 212: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/212.jpg)
Dual Decomposition Inference

initialize λ_c = 0
loop
    slaves solve min_{y_c} ψ_c(y_c) + λ_c(y_c)   (to get µ⋆_c)
    master updates λ_c as
        λ_c ← λ_c + α ( µ⋆_c − (1/C) Σ_{c′} µ⋆_{c′} )
until convergence
Stephen Gould | CVPR 2014 75/76
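As a sketch under simplifying assumptions (not from the slides), the loop above can be run on the smallest possible instance: two slaves sharing one binary variable, each holding its own unary costs. The costs and step schedule below are illustrative; agreement between slaves certifies optimality in this separable toy case because the λ_c always sum to zero.

```python
import numpy as np

# Toy dual decomposition: two slaves share one binary variable y in {0, 1}.
# Slave c holds unary costs psi[c]; the joint objective is psi[0][y] + psi[1][y].
psi = np.array([[0.0, 1.0],      # slave 0 prefers y = 0
                [2.0, 0.0]])     # slave 1 prefers y = 1
C, L = psi.shape

lam = np.zeros((C, L))                         # initialize lambda_c = 0
for k in range(1, 100):
    y = np.argmin(psi + lam, axis=1)           # slaves solve min_y psi_c(y) + lam_c(y)
    if np.all(y == y[0]):                      # slaves agree -> converged
        break
    mu = np.eye(L)[y]                          # mu*_c as one-hot indicators
    lam += (1.0 / k) * (mu - mu.mean(axis=0))  # master update (sum_c lam_c stays 0)

y_dd = int(y[0])
# y_dd → 1, matching the brute-force optimum argmin of psi.sum(axis=0)
```

Here slave 0 initially disagrees with slave 1; the master raises λ₀ on label 0 and lowers it on label 1 until slave 0 switches, reaching agreement on y = 1 within a few iterations.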
![Page 214: Tutorial on Learning and Inference in Discrete Graphical ...sgould/papers/tutorial-CVPR-2014.pdf · Exact inference in graphical models Graph-cut based methods Relaxations and dual-decomposition](https://reader036.fdocuments.net/reader036/viewer/2022062602/5f0220df7e708231d402b644/html5/thumbnails/214.jpg)
Tutorial Overview
Part 1. Inference
(S. Gould, 45 minutes)
Exact inference in graphical models
Graph-cut based methods
Relaxations and dual-decomposition
(P. Kohli, 45 minutes)
Strategies for higher-order models
(D. Batra, 15 minutes)
M-Best MAP, Diverse M-Best
Part 2. Learning
(M. Blaschko, 45 minutes)
Introduction to learning of graphical models
Maximum-likelihood learning, max-margin learning
Max-margin training via subgradient method
(K. Alahari, 45 minutes)
Constraint generation approaches for structured learning
Efficient training of graphical models via dual-decomposition
Stephen Gould | CVPR 2014 76/76