Lecture 12: Graphical Models - Imperial College...

61
Intelligent Data Analysis and Probabilistic Inference Lecture 12: Graphical Models Recommended reading: Bishop, Chapter 8 Duncan Gillies and Marc Deisenroth Department of Computing Imperial College London February 15, 2016

Transcript of Lecture 12: Graphical Models - Imperial College...

Page 1: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Intelligent Data Analysis and Probabilistic Inference

Lecture 12:Graphical ModelsRecommended reading:Bishop, Chapter 8

Duncan Gillies and Marc DeisenrothDepartment of ComputingImperial College London

February 15, 2016

Page 2: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Probabilistic Graphical Models

a b

c

a b

c

a b

c

Three types of probabilistic graphical models§ Bayesian networks (directed graphical models)§ Markov random fields (undirected graphical models)§ Factor graphs

§ Nodes: Random variables§ Edges: Probabilistic relations between variablesGraph captures the way in which the joint distribution over all

random variables can be decomposed into a product of factorsdepending only on a subset of these variables

Graphical Models IDAPI, Lecture 12 February 15, 2016 2

Page 3: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Probabilistic Graphical Models

a b

c

a b

c

a b

c

Three types of probabilistic graphical models§ Bayesian networks (directed graphical models)§ Markov random fields (undirected graphical models)§ Factor graphs

§ Nodes: Random variables§ Edges: Probabilistic relations between variables

Graph captures the way in which the joint distribution over allrandom variables can be decomposed into a product of factorsdepending only on a subset of these variables

Graphical Models IDAPI, Lecture 12 February 15, 2016 2

Page 4: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Probabilistic Graphical Models

a b

c

a b

c

a b

c

Three types of probabilistic graphical models§ Bayesian networks (directed graphical models)§ Markov random fields (undirected graphical models)§ Factor graphs

§ Nodes: Random variables§ Edges: Probabilistic relations between variablesGraph captures the way in which the joint distribution over all

random variables can be decomposed into a product of factorsdepending only on a subset of these variables

Graphical Models IDAPI, Lecture 12 February 15, 2016 2

Page 5: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Why are they useful?

§ Simple way to visualize the structure of a probabilistic model

§ Can be used to design/motivate new models

§ Insights into properties of the model (e.g., conditionalindependence) by inspection of the graph

§ Complex computations for inference and learning can beexpressed in terms of graphical manipulations

Graphical Models IDAPI, Lecture 12 February 15, 2016 3

Page 6: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Importance of Visualization

From Kim et al. (NIPS, 2015)

From Kim et al. (NIPS, 2015)

Graphical Models IDAPI, Lecture 12 February 15, 2016 4

Page 7: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Importance of Visualization

From Kim et al. (NIPS, 2015)

From Kim et al. (NIPS, 2015)

Graphical Models IDAPI, Lecture 12 February 15, 2016 4

Page 8: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Bayesian Networks (Directed Graphical Models)

Graphical Models IDAPI, Lecture 12 February 15, 2016 5

Page 9: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

From Joints to Graphs

Consider the joint distribution

ppa, b, cq “ ppc|a, bqppb|aqppaq

Building the corresponding graphical model:

1. Create a node for all random variables

2. For each conditional distribution, we add a directed link (arrow)to the graph from the nodes corresponding to the variables onwhich the distribution is conditioned on

a b

c

Graph layout depends on the choice of factorizationGraphical Models IDAPI, Lecture 12 February 15, 2016 6

Page 10: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

From Joints to Graphs

Consider the joint distribution

ppa, b, cq “ ppc|a, bqppb|aqppaq

Building the corresponding graphical model:

1. Create a node for all random variables

2. For each conditional distribution, we add a directed link (arrow)to the graph from the nodes corresponding to the variables onwhich the distribution is conditioned on

a b

c

Graph layout depends on the choice of factorizationGraphical Models IDAPI, Lecture 12 February 15, 2016 6

Page 11: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

From Graphs to Joints

x1 x2

x3 x4

x5

§ Joint distribution is the product of a set of conditionals, one foreach node in the graph

§ Each conditional is conditioned only on the parents of thecorresponding node in the graph

ppx1, x2, x3, x4, x5q “ ppx1qppx5qppx2|x5qppx3|x1, x2qppx4|x2q

In general: ppxq “śK

k“1 ppxk|pakq

Graphical Models IDAPI, Lecture 12 February 15, 2016 7

Page 12: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Example: Bayesian Regression

x

t

0 1

−1

0

1

From PRML (Bishop, 2006)

We are given a data setpx1, y1q, . . . , pxN , yNqwhere

yi “ f pxiq ` ε, ε „ N`

0, σ2˘

with f unknown.Find a (regression) model that

explains the data

§ Consider polynomials f pxq “řM

j“0 wjxj with parametersw “ rw0, . . . , wMs

J.§ Bayesian regression: Place a conjugate Gaussian prior on the

parameters: ppwq “ N`

0, α2I˘

Graphical Models IDAPI, Lecture 12 February 15, 2016 8

Page 13: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Example: Bayesian Regression

x

t

0 1

−1

0

1

From PRML (Bishop, 2006)

We are given a data setpx1, y1q, . . . , pxN , yNqwhere

yi “ f pxiq ` ε, ε „ N`

0, σ2˘

with f unknown.Find a (regression) model that

explains the data

§ Consider polynomials f pxq “řM

j“0 wjxj with parametersw “ rw0, . . . , wMs

J.§ Bayesian regression: Place a conjugate Gaussian prior on the

parameters: ppwq “ N`

0, α2I˘

Graphical Models IDAPI, Lecture 12 February 15, 2016 8

Page 14: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Graphical Models for Bayesian Regression

x

t

0 1

−1

0

1

From PRML (Bishop, 2006)

y “ f pxq ` ε

ppεq “ N`

0, σ2˘

f pxq “Mÿ

j“0

wjxj

ppwq “ N`

0, α2I˘

w

y1 yN

w

ynN

w

ynN

xn

α

σ

Graphical Models IDAPI, Lecture 12 February 15, 2016 9

Page 15: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Graphical Models for Bayesian Regression

x

t

0 1

−1

0

1

From PRML (Bishop, 2006)

y “ f pxq ` ε

ppεq “ N`

0, σ2˘

f pxq “Mÿ

j“0

wjxj

ppwq “ N`

0, α2I˘

w

y1 yN

w

ynN

w

ynN

xn

α

σ

Graphical Models IDAPI, Lecture 12 February 15, 2016 9

Page 16: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Graphical Models for Bayesian Regression

x

t

0 1

−1

0

1

From PRML (Bishop, 2006)

y “ f pxq ` ε

ppεq “ N`

0, σ2˘

f pxq “Mÿ

j“0

wjxj

ppwq “ N`

0, α2I˘

w

y1 yN

w

ynN

w

ynN

xn

α

σ

Graphical Models IDAPI, Lecture 12 February 15, 2016 9

Page 17: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Conditional Independence

a KK b|c ô ppa|b, cq “ ppa|cqô ppa, b|cq “ ppa|cqppb|cq

§ Conditional independence properties of the joint distribution canbe read directly from the graph

§ No analytical manipulations required.d-separation (Pearl, 1988)

Graphical Models IDAPI, Lecture 12 February 15, 2016 10

Page 18: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

D-Separation (Directed Graphs)

AC

B

Directed, acyclic graph in whichA, B, C are arbitrary,non-intersecting sets of nodes.Does A KK B|C hold?

Consider all possible paths from any node in A to any node in B.Any such path is blocked if it includes a node such that either

§ Arrows on the path meet either head-to-tail or tail-to-tail at thenode, and the node is in the set C or

§ Arrows meet head-to-head at the node and neither the node norany of its descendants is in the set C

If all paths are blocked, then A is d-separated from B by C, and thejoint distribution satisfies A KK B|C.

Graphical Models IDAPI, Lecture 12 February 15, 2016 11

Page 19: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

D-Separation (Directed Graphs)

AC

B

Directed, acyclic graph in whichA, B, C are arbitrary,non-intersecting sets of nodes.Does A KK B|C hold?

Consider all possible paths from any node in A to any node in B.Any such path is blocked if it includes a node such that either

§ Arrows on the path meet either head-to-tail or tail-to-tail at thenode, and the node is in the set C or

§ Arrows meet head-to-head at the node and neither the node norany of its descendants is in the set C

If all paths are blocked, then A is d-separated from B by C, and thejoint distribution satisfies A KK B|C.

Graphical Models IDAPI, Lecture 12 February 15, 2016 11

Page 20: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

D-Separation (Directed Graphs)

AC

B

Directed, acyclic graph in whichA, B, C are arbitrary,non-intersecting sets of nodes.Does A KK B|C hold?

Consider all possible paths from any node in A to any node in B.Any such path is blocked if it includes a node such that either

§ Arrows on the path meet either head-to-tail or tail-to-tail at thenode, and the node is in the set C or

§ Arrows meet head-to-head at the node and neither the node norany of its descendants is in the set C

If all paths are blocked, then A is d-separated from B by C, and thejoint distribution satisfies A KK B|C.

Graphical Models IDAPI, Lecture 12 February 15, 2016 11

Page 21: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Example

a

e

d

b

c

(a) a KK b|c?

a

e

d

b

c

(b) a KK b|d?

Remember: A path is blocked if it includes a node such that either

§ The arrows on the path meet either head-to-tail or tail-to-tail atthe node, and the node is in the set C or

§ The arrows meet head-to-head at the node, and neither the nodenor any of its descendants is in the set C

Graphical Models IDAPI, Lecture 12 February 15, 2016 12

Page 22: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Markov Random Fields (Undirected Graphical Models)

Graphical Models IDAPI, Lecture 12 February 15, 2016 13

Page 23: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Markov Random Fields

a b

c

§ Nodes are sets of random variables

§ Links connect these nodes

Graphical Models IDAPI, Lecture 12 February 15, 2016 14

Page 24: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Joint Distribution

§ Express joint distribution ppxq as a product of functions definedon subsets of variables that are local to the graph

§ If xi, xj are not connected directly by a link then xi KK xj|xztxi, xju

(conditionally independent given everything else)

§ Then: In the factorization xi, xj never appear in a joint factorCliques (fully connected subgraphs)

§ Define factors in the decomposition of the joint to be functions ofthe variables in (maximum) cliques:

ppxq9ź

C

ψCpxCq

Graphical Models IDAPI, Lecture 12 February 15, 2016 15

Page 25: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Joint Distribution

§ Express joint distribution ppxq as a product of functions definedon subsets of variables that are local to the graph

§ If xi, xj are not connected directly by a link then xi KK xj|xztxi, xju

(conditionally independent given everything else)

§ Then: In the factorization xi, xj never appear in a joint factorCliques (fully connected subgraphs)

§ Define factors in the decomposition of the joint to be functions ofthe variables in (maximum) cliques:

ppxq9ź

C

ψCpxCq

Graphical Models IDAPI, Lecture 12 February 15, 2016 15

Page 26: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Joint Distribution

§ Express joint distribution ppxq as a product of functions definedon subsets of variables that are local to the graph

§ If xi, xj are not connected directly by a link then xi KK xj|xztxi, xju

(conditionally independent given everything else)

§ Then: In the factorization xi, xj never appear in a joint factorCliques (fully connected subgraphs)

§ Define factors in the decomposition of the joint to be functions ofthe variables in (maximum) cliques:

ppxq9ź

C

ψCpxCq

Graphical Models IDAPI, Lecture 12 February 15, 2016 15

Page 27: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Joint Distribution

§ Express joint distribution ppxq as a product of functions definedon subsets of variables that are local to the graph

§ If xi, xj are not connected directly by a link then xi KK xj|xztxi, xju

(conditionally independent given everything else)

§ Then: In the factorization xi, xj never appear in a joint factorCliques (fully connected subgraphs)

§ Define factors in the decomposition of the joint to be functions ofthe variables in (maximum) cliques:

ppxq9ź

C

ψCpxCq

Graphical Models IDAPI, Lecture 12 February 15, 2016 15

Page 28: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Factorization Properties

ppxq “1Z

ź

C

ψCpxCq

§ C: maximal clique

§ xC: all variables in this clique

§ ψCpxCq: clique potential

§ Z “ř

C ψCpxCq: normalization constant

Graphical Models IDAPI, Lecture 12 February 15, 2016 16

Page 29: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Clique Potentials

ppxq “1Z

ź

C

ψCpxCq

Clique potentials ψCpxCq:

§ ψCpxCq ě 0

§ Unlike directed graphs, no probabilistic interpretation necessary(e.g., marginal or conditional).

§ If we convert a directed graph into an MRF, the clique potentialsmay have a probabilistic interpretation

Graphical Models IDAPI, Lecture 12 February 15, 2016 17

Page 30: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Normalization Constant

ppxq “1Z

ź

C

ψCpxCq

§ Gives us flexibility in the definition the factorization in an MRF

§ Partition function Z is required for parameter learning (notcovered in this course)

§ In a discrete model with M discrete nodes each having K states,the evaluation Z requires summing over KM states

Exponential in the size of the model

§ In a continuous model, we need to solve integralsIntractable in many cases

Major limitation of MRFsGraphical Models IDAPI, Lecture 12 February 15, 2016 18

Page 31: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Conditional Independence

AC

B

Two easy checks for conditional independence:§ A KK B|C if and only if all paths from A to B pass through C.

(Then, all paths are blocked)§ Alternative: Remove all nodes in C from the graph. If there is a

path from A to B then A KK B|C does not holdGraphical Models IDAPI, Lecture 12 February 15, 2016 19

Page 32: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Potentials as Energy Functions

§ Look only at potential functions with ψCpxCq ą 0ψCpxCq “ expp´EpxCqq for some energy function E

§ Joint distribution is the product of clique potentialsTotal energy is the sum of the energies of the clique potentials

Graphical Models IDAPI, Lecture 12 February 15, 2016 20

Page 33: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Potentials as Energy Functions

§ Look only at potential functions with ψCpxCq ą 0ψCpxCq “ expp´EpxCqq for some energy function E

§ Joint distribution is the product of clique potentialsTotal energy is the sum of the energies of the clique potentials

Graphical Models IDAPI, Lecture 12 February 15, 2016 20

Page 34: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Example: Image Restauration

From PRML (Bishop, 2006)

§ Binary image, corrupted by 10% binary noise (pixel values flipwith probability 0.1).

§ Objective: Restore noise-free image

Pairwise Markov random field that has all its variables joined incliques of size 2

Graphical Models IDAPI, Lecture 12 February 15, 2016 21

Page 35: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Image Restauration (2)

xi

yi

§ MRF-based approach

§ Latent variables xi P t´1,`1u are the binary noise-free pixelvalues

§ Observed variables yi P t´1,`1u are the noise-corrupted pixelvalues

Graphical Models IDAPI, Lecture 12 February 15, 2016 22

Page 36: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Image Restauration (2)

xi

yi

§ MRF-based approach

§ Latent variables xi P t´1,`1u are the binary noise-free pixelvalues

§ Observed variables yi P t´1,`1u are the noise-corrupted pixelvalues

Graphical Models IDAPI, Lecture 12 February 15, 2016 22

Page 37: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Clique Potentials

xi

yi

Two types of clique potentials:

§ ψxypxi, yiq “ ´ηxiyi , η ą 0Strong correlation between observed and latent variables

§ ψxxpxi, xjq “ ´βxixj , β ą 0 for neighboring pixels xi, xj

Favor similar labels for neighboring pixels (smoothness prior)

Graphical Models IDAPI, Lecture 12 February 15, 2016 23

Page 38: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Clique Potentials

xi

yi

Two types of clique potentials:

§ ψxypxi, yiq “ ´ηxiyi , η ą 0Strong correlation between observed and latent variables

§ ψxxpxi, xjq “ ´βxixj , β ą 0 for neighboring pixels xi, xj

Favor similar labels for neighboring pixels (smoothness prior)

Graphical Models IDAPI, Lecture 12 February 15, 2016 23

Page 39: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Energy Function

Total energy:

Epx, yq “ ´ηÿ

i

xiyi

loooomoooon

latent-observed

´βÿ

ti,ju

xixj

looooomooooon

latent-latent

` hÿ

i

xi

loomoon

bias

§ Bias term places a prior on the latent pixel values, e.g., `1.

§ Joint distribution ppx, yq “ 1Z expp´Epx, yqq

§ Fix y-values to the observed ones Implicitly define ppx|yq

§ Example of an Ising model Statistical physics

Graphical Models IDAPI, Lecture 12 February 15, 2016 24

Page 40: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

ICM Algorithm for Image Restauration

Noise-corrupted image, ICM, Graph-cut (From PRML (Bishop, 2006))

Iterated Conditional Modes (ICM, Kittler & Foglein, 1984)

1. Initialize all xi “ yi

2. Pick any xj: Evaluate total energy Epxzj Y t`1u, yq,Epxzj Y t´1u, yq

3. Set xj to whichever state has the lower energy

4. RepeatLocal optimum

Graphical Models IDAPI, Lecture 12 February 15, 2016 25

Page 41: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Directed Graph Ñ MRF

§ Moralization:§ Add additional undirected links between all pairs of parents for

each node in the graph§ Drop arrows on original links

§ Identify (maximum) cliques

§ Initialize all clique potentials to 1

§ Take each conditional distribution factor in the directed graph,multiply it into one of the clique potentials

Graphical Models IDAPI, Lecture 12 February 15, 2016 26

Page 42: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Relation to Directed Graphs

a b

c d

c

a b

PUD

§ Directed and undirected graphs express different conditionalindependence properties

§ Left: a KK b|H, a zKK b|c has no MRF equivalent

§ Center: a zKK b|H, c KK d|aY b, a KK b|cY d has no Bayesnetequivalent

Graphical Models IDAPI, Lecture 12 February 15, 2016 27

Page 43: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Factor Graphs

Graphical Models IDAPI, Lecture 12 February 15, 2016 28

Page 44: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Factor Graphs

a b

c

§ (Un)directed graphical models express a global function ofseveral variables as a product of factors over subsets of thosevariables

§ Factor graphs make this decomposition explicit by introducingadditional nodes for the factors themselves.

Graphical Models IDAPI, Lecture 12 February 15, 2016 29

Page 45: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Factorizing the Joint

The joint distribution is a product of factors:

ppxq “ź

sfspxsq

§ x “ px1, . . . , xnq

§ xs: Subset of variables§ fs: Factor; non-negative function of the variables xs

§ Building a factor graph as a bipartite graph:§ Nodes for all random variables (same as in (un)directed graphical

models)§ Additional nodes for factors (black squares) in the joint

distribution

§ Undirected links connecting each factor node to all of the variablenodes the factor depends on

Graphical Models IDAPI, Lecture 12 February 15, 2016 30

Page 46: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Factorizing the Joint

The joint distribution is a product of factors:

ppxq “ź

sfspxsq

§ x “ px1, . . . , xnq

§ xs: Subset of variables§ fs: Factor; non-negative function of the variables xs

§ Building a factor graph as a bipartite graph:§ Nodes for all random variables (same as in (un)directed graphical

models)§ Additional nodes for factors (black squares) in the joint

distribution

§ Undirected links connecting each factor node to all of the variablenodes the factor depends on

Graphical Models IDAPI, Lecture 12 February 15, 2016 30

Page 47: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Example

x1 x2 x3

fa fb fc fd

ppxq “ fapx1, x2q fbpx1, x2q fcpx2, x3q fdpx3q

Graphical Models IDAPI, Lecture 12 February 15, 2016 31

Page 48: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

MRF Ñ Factor Graph

1. Take variable nodes from MRF

2. Create additional factor nodes corresponding to the maximalcliques xs

3. The factors fspxsq equal the clique potentials

4. Add appropriate links

Not unique

Graphical Models IDAPI, Lecture 12 February 15, 2016 32

Page 49: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Example: MRF Ñ Factor Graph

x1 x2

x3

x1 x2

x3

x1 x2

x3

f fafb

§ MRF with clique potential ψpx1, x2, x3q

§ Factor graph with factor f px1, x2, x3q “ ψpx1, x2, x3q

§ Factor graph with factors, such thatfapx1, x2, x3q fbpx2, x3q “ ψpx1, x2, x3q

Graphical Models IDAPI, Lecture 12 February 15, 2016 33

Page 50: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Directed Graphical Model Ñ Factor Graph

1. Take variable nodes from Bayesian network

2. Create additional factor nodes corresponding to the conditionaldistributions

3. Add appropriate links

Not unique

Graphical Models IDAPI, Lecture 12 February 15, 2016 34

Page 51: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Example: Directed Graph Ñ Factor Graph

x1 x2

x3

x1 x2

x3

x1 x2

x3

f fcfbfa

§ Directed graph with factorization ppx1qppx2qppx3|x1, x2q

§ Factor graph with factor f px1, x2, x3q “ ppx1qppx2qppx3|x1, x2q

§ Factor graph with factors fa “ ppx1q, fb “ ppx2q, fc “ ppx3|x1, x2q

Graphical Models IDAPI, Lecture 12 February 15, 2016 35

Page 52: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Removing Cycles

x1 x2

x3 x1 x2

x3

f (x1, x2, x3)x1 x2

x3

§ Local cycles in an (un)directed graph (due to links connectingparents of a node) can be removed on conversion to a factorgraph

Graphical Models IDAPI, Lecture 12 February 15, 2016 36

Page 53: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Sum-Product Algorithm for Factor Graphs

§ Factor graphs give a uniform treatment to message passing§ Two different types of messages:

§ Messages µxÑ f pxq from variable nodes to factors§ Messages µ fÑxpxq from factors to variable nodes

§ Factors transform messages into evidence for the receiving node.

Graphical Models IDAPI, Lecture 12 February 15, 2016 37

Page 54: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Variable-to-Factor Message

xm

f1

fK

µxm→fs(x)

fs

µ fK→x

m(xm)

µf1→xm (x

m )

µxmÑ fspxmq “ź

lPnepxmqz fs

µ flÑxmpxmq

§ Take the product of all incoming messages along all other links§ A variable node can send a message to a factor node once it has

received messages from all other neighboring factors§ The message that a node sends to a factor is made up of the

messages that it receives from all other factors.Graphical Models IDAPI, Lecture 12 February 15, 2016 38

Page 55: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Factor-to-Variable Message

x1

xM

xµfs→x(x)

fsfsµ xM→f

s(x

M)

µx1→fs (x

1 )

µ fsÑxpxq “ÿ

x1

¨ ¨ ¨ÿ

xM

fspx, x1, . . . , xMqź

mPnep fsqzx

µxmÑ fspxmq

§ Take the product of the incoming messages along all other linkscoming into the factor node

§ Multiply by the factor associated with that node§ Marginalize over all of the variables associated with the

incoming messagesGraphical Models IDAPI, Lecture 12 February 15, 2016 39

Page 56: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Initialization

§ If the leaf node is a variable nodes, initialize the correspondingmessages to 1:

µxÑ f pxq “ 1

§ If the leaf node is a factor node, the message should be

µ fÑxpxq “ f pxq

Graphical Models IDAPI, Lecture 12 February 15, 2016 40

Page 57: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Example (1)

x1 x2 x3

x4

fa fb

fc

From PRML (Bishop, 2006)

µx1Ñ fapx1q “ 1

µ faÑx2px2q “ÿ

x1

fapx1, x2q ¨ 1

µx4Ñ fcpx4q “ 1

µ fcÑx2px2q “ÿ

x4

fcpx2, x4q ¨ 1

µx2Ñ fbpx2q “ µ faÑx2px2qµ fcÑx2px2q

µ fbÑx3px3q “ÿ

x2

fbpx2, x3qµx2Ñ fbpx2q

Graphical Models IDAPI, Lecture 12 February 15, 2016 41

Page 58: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Example (2)

x1 x2 x3

x4

fa fb

fc

From PRML (Bishop, 2006)

µx3Ñ fbpx3q “ 1

µ fbÑx2px2q “ÿ

x3

fbpx2, x3q ¨ 1

µx2Ñ fapx2q “ µ fbÑx2px2qµ fcÑx2px2q

µ faÑx1px1q “ÿ

x2

fapx1, x2qµx2Ñ fapx2q

µx2Ñ fcpx2q “ µ faÑx2px2qµ fbÑx2px2q

µ fcÑx4px4q “ÿ

x2

fcpx2, x4qµx2Ñ fcpx2q

Tutorial

Graphical Models IDAPI, Lecture 12 February 15, 2016 42

Page 59: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Marginals

x

f1

fK

µx←fs(x)

fs

µ fK→x(x)

µf1→x (x)

For a single variable node the marginal is given as the product of allincoming messages:

ppxq “ź

fiPnepxq

µ fiÑxpxq

Graphical Models IDAPI, Lecture 12 February 15, 2016 43

Page 60: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

Applications: Message Passing in Graphical Models

§ Ranking: TrueSkill (Herbrich et al., 2007)§ Computer vision: de-noising, segmentation, semantic labeling, ...

(e.g., Sucar & Gillies, 1994; Shotton et al., 2006; Szeliski et al., 2008)§ Coding theory: low-density parity-check codes, turbo codes, ...

(e.g., McEliece et al., 1998)§ Linear algebra: Solve linear equation systems (Shental et al., 2008)§ Signal processing: Iterative state estimation (e.g., Bickson et al.,

2007; Deisenroth & Mohamed, 2012)Graphical Models IDAPI, Lecture 12 February 15, 2016 44

Page 61: Lecture 12: Graphical Models - Imperial College Londondfg/ProbabilisticInference/IDAPISlides12.pdf · Graphical Models IDAPI, Lecture 12 February 15, 2016 12. Markov Random Fields

References I

[1] D. Bickson, D. Dolev, O. Shental, P. H. Siegel, and J. K. Wolf. Linear Detection via Belief Propagation. In Proceedings of theAnnual Allerton Conference on Communication, Control, and Computing, 2007.

[2] C. M. Bishop. Pattern Recognition and Machine Learning. Information Science and Statistics. Springer-Verlag, 2006.

[3] M. P. Deisenroth and S. Mohamed. Expectation Propagation in Gaussian Process Dynamical Systems. In Advances inNeural Information Processing Systems, pages 2618–2626, 2012.

[4] R. Herbrich, T. Minka, and T. Graepel. TrueSkill(TM): A Bayesian Skill Rating System. In Advances in Neural InformationProcessing Systems, pages 569–576. MIT Press, 2007.

[5] B. Kim, J. A. Shah, and F. Doshi-Velez. Mind the Gap: A Generative Approach to Interpretable Feature Selection andExtraction. In C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, and R. Garnett, editors, Advances in Neural InformationProcessing Systems, pages 2251–2259. Curran Associates, Inc., 2015.

[6] J. Kittler and J. Foglein. Contextual Classification of Multispectral Pixel Data. IMage and Vision Computing, 2(1):13–29, 1984.

[7] R. J. McEliece, D. J. C. MacKay, and J.-F. Cheng. Turbo Decoding as an Instance of Pearl’s “Belief Propagation” Algorithm.IEEE Journal on Selected Areas in Communications, 16(2):140–152, 1998.

[8] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.

[9] O. Shental, D. Bickson, J. K. W. P. H. Siegel and, and D. Dolev. Gaussian Belief Propagatio Solver for Systems of LinearEquations. In IEEE International Symposium on Information Theory, 2008.

[10] J. Shotton, J. Winn, C. Rother, and A. Criminisi. TextonBoost: Joint Appearance, Shape and Context Modeling forMulit-Class Object Recognition and Segmentation. In Proceedings of the European Conference on Computer Vision, 2006.

[11] L. E. Sucar and D. F. Gillies. Probabilistic Reasoning in High-Level Vision. Image and Vision Computing, 12(1):42–60, 1994.

[12] R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, A. A. Vladimir Kolmogorov, M. Tappen, and C. Rother. A ComparativeStudy of Energy Minimization Methods for Markov Random Fields with Smoothness-based Priors. IEEE Transactions onPattern Analysis and Machine Intelligence, 30(6):1068–1080, 2008.

Graphical Models IDAPI, Lecture 12 February 15, 2016 45