Probabilistic Inference
Lecture 1
M. Pawan Kumar
Slides available online http://cvc.centrale-ponts.fr/personnel/pawan/
About the Course
• 7 lectures + 1 exam
• Probabilistic Models – 1 lecture
• Energy Minimization – 4 lectures
• Computing Marginals – 2 lectures
• Related Courses
  • Probabilistic Graphical Models (MVA)
  • Structured Prediction
Instructor
• Assistant Professor (2012 – Present)
• Center for Visual Computing
  • 12 Full-time Faculty Members
  • 2 Associate Faculty Members
• Research Interests
  • Probabilistic Models
  • Machine Learning
  • Computer Vision
  • Medical Image Analysis
Students
• Third year at ECP
• Specializing in Machine Learning and Vision
• Prerequisites
  • Probability Theory
  • Continuous Optimization
  • Discrete Optimization
Outline
• Probabilistic Models
• Conversions
• Exponential Family
• Inference
Example (on board) !!
Outline
• Probabilistic Models
  • Markov Random Fields (MRF)
  • Bayesian Networks
  • Factor Graphs
• Conversions
• Exponential Family
• Inference
MRF
Unobserved Random Variables
Edges define a neighborhood over random variables (neighbors)
MRF
V1 V2 V3
V4 V5 V6
V7 V8 V9
Variable Va takes a value or a label va from a discrete, finite set L = {l1, l2, …, lh}
V = v is called a labeling
MRF
V1 V2 V3
V4 V5 V6
V7 V8 V9
MRF assumes the Markovian property for P(v)
MRF
V1 V2 V3
V4 V5 V6
V7 V8 V9
Va is conditionally independent of any non-neighboring Vb given Va’s neighbors
Hammersley-Clifford Theorem
MRF
V1 V2 V3
V4 V5 V6
V7 V8 V9
Probability P(v) can be decomposed into clique potentials
Potential ψ12(v1,v2)
Potential ψ56(v5,v6)
MRF
[Figure: 3×3 grid MRF; each unobserved variable Va is connected to its observed data da]
Probability P(v) proportional to Π(a,b) ψab(va,vb)
Probability P(d|v) proportional to Πa ψa(va,da), e.g. potential ψ1(v1,d1)
MRF
[Figure: same 3×3 grid MRF with observed data]
Probability P(v,d) = (1/Z) Πa ψa(va,da) Π(a,b) ψab(va,vb)
Z is known as the partition function
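As a toy illustration of these definitions (the potential tables and chain structure below are invented for the example, not taken from the lecture), Z and P(v,d) can be computed by brute-force enumeration when the model is small:

```python
import itertools
import numpy as np

# Toy pairwise MRF: three variables in a chain with binary labels {0, 1}.
# psi_unary[a] plays the role of psi_a(va, da) for fixed observed data d;
# psi_pair[(a, b)] plays the role of psi_ab(va, vb). All values invented.
psi_unary = np.array([[0.9, 0.1],
                      [0.5, 0.5],
                      [0.2, 0.8]])
psi_pair = {(0, 1): np.array([[1.0, 0.2], [0.2, 1.0]]),
            (1, 2): np.array([[1.0, 0.2], [0.2, 1.0]])}

def unnormalized(v):
    """Product of all unary and pairwise potentials for a labeling v."""
    p = 1.0
    for a in range(3):
        p *= psi_unary[a, v[a]]
    for (a, b), psi in psi_pair.items():
        p *= psi[v[a], v[b]]
    return p

# The partition function Z sums over all h^n labelings: feasible only at toy sizes.
Z = sum(unnormalized(v) for v in itertools.product((0, 1), repeat=3))
print(Z, unnormalized((0, 0, 1)) / Z)   # Z, then P(v, d) for one labeling
```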
MRF
[Figure: same 3×3 grid MRF, with a high-order potential over the clique {V4, V5, V7, V8}]
High-order potential ψ4578(v4,v5,v7,v8)
Pairwise MRF
[Figure: same 3×3 grid MRF with observed data]
Unary potential ψ1(v1,d1); pairwise potential ψ56(v5,v6)
Probability P(v,d) = (1/Z) Πa ψa(va,da) Π(a,b) ψab(va,vb)
Z is known as the partition function
MRF
[Figure: same 3×3 grid MRF with observed data]
A is conditionally independent of B given C if there is no path from A to B when C is removed
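A minimal sketch of this separation test (the helper name is illustrative; the edges encode the 3×3 grid from the figure):

```python
from collections import deque

# Sketch of the separation test above: A is conditionally independent of B
# given C in an MRF if removing the nodes in C disconnects A from B.
def separated(edges, A, B, C):
    """BFS from A that never enters C; independent iff it never reaches B."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen = set(A) - set(C)
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen and v not in C:
                seen.add(v)
                queue.append(v)
    return not (seen & set(B))

# The 3x3 grid V1..V9 from the figure: the middle column separates the sides.
grid = [(1, 2), (2, 3), (4, 5), (5, 6), (7, 8), (8, 9),
        (1, 4), (4, 7), (2, 5), (5, 8), (3, 6), (6, 9)]
print(separated(grid, A={1, 4, 7}, B={3, 6, 9}, C={2, 5, 8}))   # True
```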
Conditional Random Fields (CRF)
[Figure: same 3×3 grid with observed data]
CRF assumes the Markovian property for P(v|d)
Hammersley-Clifford Theorem
CRF
[Figure: same 3×3 grid with observed data]
Probability P(v|d) proportional to Πa ψa(va;d) Π(a,b) ψab(va,vb;d)
Clique potentials that depend on the data
CRF
[Figure: same 3×3 grid with observed data]
Probability P(v|d) = (1/Z) Πa ψa(va;d) Π(a,b) ψab(va,vb;d)
Z is known as the partition function
MRF and CRF
V1 V2 V3
V4 V5 V6
V7 V8 V9
Probability P(v) = (1/Z) Πa ψa(va) Π(a,b) ψab(va,vb)
Outline
• Probabilistic Models
  • Markov Random Fields (MRF)
  • Bayesian Networks
  • Factor Graphs
• Conversions
• Exponential Family
• Inference
Bayesian Networks
[Figure: DAG over V1–V8, arranged in layers: V1; V2, V3; V4, V5, V6; V7, V8]
Directed Acyclic Graph (DAG) – no directed loops
Ignoring directionality of edges, a DAG can have loops
Bayesian Networks
[Figure: same DAG over V1–V8]
A Bayesian network concisely represents the probability P(v)
Bayesian Networks
[Figure: same DAG over V1–V8]
Probability P(v) = Πa P(va|Parents(va))
= P(v1)P(v2|v1)P(v3|v1)P(v4|v2)P(v5|v2,v3)P(v6|v3)P(v7|v4,v5)P(v8|v5,v6)
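A small sketch of this factorization (a three-variable fragment V1 → V2, V1 → V3 with invented tables; a directed model needs no partition function because each conditional is already normalized):

```python
import numpy as np

# Toy fragment of the DAG above: V1 -> V2 and V1 -> V3, binary variables.
# All conditional probability tables are invented for the example.
p_v1 = np.array([0.6, 0.4])                # P(v1)
p_v2_given_v1 = np.array([[0.7, 0.3],      # rows indexed by v1: P(v2 | v1)
                          [0.2, 0.8]])
p_v3_given_v1 = np.array([[0.9, 0.1],      # P(v3 | v1)
                          [0.5, 0.5]])

def joint(v1, v2, v3):
    """P(v) = product over a of P(va | Parents(va))."""
    return p_v1[v1] * p_v2_given_v1[v1, v2] * p_v3_given_v1[v1, v3]

# Because every conditional is normalized, the joint sums to 1 on its own,
# unlike the MRF case where Z must be computed explicitly.
print(sum(joint(a, b, c) for a in (0, 1) for b in (0, 1) for c in (0, 1)))
```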
Bayesian Networks
[Figure: example Bayesian networks, courtesy Kevin Murphy]
Bayesian Networks
[Figure: same DAG over V1–V8]
Va is conditionally independent of its ancestors given its parents
Bayesian Networks
[Figure: conditional independence of A and B given C, courtesy Kevin Murphy]
Outline
• Probabilistic Models
  • Markov Random Fields (MRF)
  • Bayesian Networks
  • Factor Graphs
• Conversions
• Exponential Family
• Inference
Factor Graphs
[Figure: factor graph with variable nodes V1–V6 and factor nodes a–g]
Two types of nodes: variable nodes and factor nodes
Bipartite graph between the two types of nodes
Factor Graphs
[Figure: factor graph with variable nodes V1–V6 and factor nodes a–g]
A factor graph concisely represents the probability P(v)
Each factor a defines a potential ψa({v}a) over the subset of variables {v}a it is connected to, e.g. ψa(v1,v2) and ψb(v2,v3)
Factor Graphs
[Figure: same factor graph]
Probability P(v) = (1/Z) Πa ψa({v}a)
Z is known as the partition function
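A minimal sketch of evaluating such a product of factors (the scopes and tables below are illustrative, not the graph in the figure):

```python
import itertools

# Minimal sketch of a factor graph as (scope, table) pairs: scope lists the
# variable indices {v}_a that factor a touches. Scopes and tables invented.
factors = [
    ((0, 1), {(0, 0): 1.0, (0, 1): 0.2, (1, 0): 0.2, (1, 1): 1.0}),  # psi_a(v1, v2)
    ((1, 2), {(0, 0): 1.0, (0, 1): 0.5, (1, 0): 0.5, (1, 1): 1.0}),  # psi_b(v2, v3)
    ((2,),   {(0,): 0.3, (1,): 0.7}),                                # psi_c(v3)
]

def score(v):
    """Unnormalized P(v): the product of psi_a({v}_a) over all factors a."""
    s = 1.0
    for scope, table in factors:
        s *= table[tuple(v[i] for i in scope)]
    return s

Z = sum(score(v) for v in itertools.product((0, 1), repeat=3))
print(score((0, 0, 1)) / Z)   # P(v) for one labeling
```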
Outline
• Probabilistic Models
• Conversions
• Exponential Family
• Inference
MRF to Factor Graphs
Bayesian Networks to Factor Graphs
Factor Graphs to MRF
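These conversions are worked out on the board in the lecture; as a rough sketch of the first one (the standard construction, with illustrative names), every potential of the MRF becomes a factor node connected to exactly the variables in its scope:

```python
# Rough sketch of MRF -> factor graph: each clique potential psi becomes a
# factor node whose scope is the set of variables the potential touches.
def mrf_to_factor_graph(unary, pairwise):
    """unary: dict a -> psi_a table; pairwise: dict (a, b) -> psi_ab table.
    Returns the factors as (scope, table) pairs, i.e. the bipartite structure."""
    factors = [((a,), psi) for a, psi in unary.items()]
    factors += [(e, psi) for e, psi in pairwise.items()]
    return factors

print(mrf_to_factor_graph({0: [0.4, 0.6]}, {(0, 1): [[1.0, 0.2], [0.2, 1.0]]}))
```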
Outline
• Probabilistic Models
• Conversions
• Exponential Family
• Inference
Motivation
Random Variable V, Label set L = {l1, l2, …, lh}
Samples V1, V2, …, Vm that are i.i.d.
Functions ϕα: L → Reals, where α indexes a set of functions
Empirical expectations: μα = (Σi ϕα(Vi))/m
Expectation wrt distribution P: EP[ϕα(V)] = Σi ϕα(li)P(li)
Given the empirical expectations, find a compatible distribution: an underdetermined problem
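For concreteness, a tiny sketch of the empirical expectation (the m = 6 samples and the single feature function ϕ below are invented):

```python
import numpy as np

# Empirical expectation as defined above, for one invented feature function.
samples = np.array([0, 1, 1, 2, 1, 0])     # i.i.d. draws of V, labels as indices
phi = lambda v: float(v == 1)              # phi_alpha : L -> Reals

mu = np.mean([phi(v) for v in samples])    # mu_alpha = (1/m) * sum_i phi_alpha(V_i)
print(mu)                                  # 0.5
```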
Maximum Entropy Principle
max Entropy of the distribution
s.t. Distribution is compatible
Maximum Entropy Principle
max -Σi P(li)log(P(li))
s.t. Distribution is compatible
Maximum Entropy Principle
max -Σi P(li)log(P(li))
s.t. Σi ϕα(li)P(li) = μα for all α
Σi P(li) = 1
P(v) proportional to exp(-Σα θαϕα(v))
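To see where this exponential form comes from, here is the standard Lagrangian sketch (θα are the multipliers for the expectation constraints, λ for normalization):

```latex
% Lagrangian of the maximum entropy problem:
\mathcal{L} = -\sum_i P(l_i)\log P(l_i)
  - \sum_\alpha \theta_\alpha \Big( \sum_i \phi_\alpha(l_i) P(l_i) - \mu_\alpha \Big)
  - \lambda \Big( \sum_i P(l_i) - 1 \Big)
% Setting the derivative with respect to P(l_i) to zero:
-\log P(l_i) - 1 - \sum_\alpha \theta_\alpha \phi_\alpha(l_i) - \lambda = 0
% which yields the exponential form:
P(l_i) \propto \exp\Big( -\sum_\alpha \theta_\alpha \phi_\alpha(l_i) \Big)
```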
Exponential Family
Random Variable V = {V1, V2, …,Vn} Label set L = {l1, l2,…, lh}
Labeling V = v, va ∈ L for all a ∈ {1, 2, …, n}
Functions Φα: Lⁿ → Reals, where α indexes a set of functions
P(v) = exp{-Σα θαΦα(v) - A(θ)}
Φα(v): sufficient statistics; θα: parameters; A(θ): normalization constant
Minimal Representation
P(v) = exp{-Σα θαΦα(v) - A(θ)}
Φα(v): sufficient statistics; θα: parameters; A(θ): normalization constant
There is no non-zero c such that Σα cαΦα(v) = constant for all v
Ising Model
P(v) = exp{-Σα θαΦα(v) - A(θ)}
Random Variable V = {V1, V2, …,Vn} Label set L = {l1, l2}
Ising Model
P(v) = exp{-Σα θαΦα(v) - A(θ)}
Random Variable V = {V1, V2, …,Vn} Label set L = {-1, +1}
Neighborhood over variables specified by edges E
Sufficient statistics and parameters:
• va with parameter θa, for all Va ∈ V
• vavb with parameter θab, for all (Va,Vb) ∈ E
Ising Model
P(v) = exp{-Σa θa va - Σa,b θab va vb - A(θ)}
Random Variable V = {V1, V2, …,Vn} Label set L = {-1, +1}
Neighborhood over variables specified by edges E
Sufficient statistics and parameters:
• va with parameter θa, for all Va ∈ V
• vavb with parameter θab, for all (Va,Vb) ∈ E
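A small sketch of the Ising model in this minimal representation (chain neighborhood and θ values invented; the brute-force log-partition function is feasible only at toy sizes):

```python
import itertools
import numpy as np

# Ising model in the minimal representation: labels in {-1, +1}, sufficient
# statistics v_a and v_a * v_b. Structure and parameter values are invented.
n = 4
edges = [(0, 1), (1, 2), (2, 3)]            # neighborhood E
theta_a = np.array([0.5, -0.2, 0.0, 0.3])   # unary parameters theta_a
theta_ab = {e: -0.8 for e in edges}         # pairwise parameters theta_ab

def exponent(v):
    """-sum_a theta_a v_a - sum_(a,b) theta_ab v_a v_b (before subtracting A)."""
    s = -sum(theta_a[a] * v[a] for a in range(n))
    s -= sum(theta_ab[(a, b)] * v[a] * v[b] for (a, b) in edges)
    return s

labelings = list(itertools.product((-1, +1), repeat=n))
A = np.log(sum(np.exp(exponent(v)) for v in labelings))   # log-partition A(theta)
prob = lambda v: np.exp(exponent(v) - A)                  # normalized P(v)
print(sum(prob(v) for v in labelings))                    # ~1.0
```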
Interactive Binary Segmentation
Interactive Binary Segmentation
Foreground histogram of RGB values FG
Background histogram of RGB values BG
‘+1’ indicates foreground and ‘-1’ indicates background
Interactive Binary Segmentation
More likely to be foreground than background
Interactive Binary Segmentation
More likely to be background than foreground
θa proportional to -log(FG(da)) + log(BG(da))
Interactive Binary Segmentation
More likely to belong to same label
Interactive Binary Segmentation
Less likely to belong to same label
θab proportional to -exp(-(da-db)²)
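A sketch of these segmentation parameters (FG and BG are assumed to be normalized histograms indexed by a quantized pixel value da; all numbers invented):

```python
import numpy as np

# Segmentation parameters in the minimal {-1, +1} representation.
FG = np.array([0.1, 0.3, 0.6])   # illustrative foreground histogram
BG = np.array([0.7, 0.2, 0.1])   # illustrative background histogram
eps = 1e-9                       # guards against log(0)

def theta_unary(d_a):
    """theta_a = -log FG(d_a) + log BG(d_a); negative values favor va = +1
    (foreground), since P(v) contains the factor exp(-theta_a * v_a)."""
    return -np.log(FG[d_a] + eps) + np.log(BG[d_a] + eps)

def theta_pairwise(d_a, d_b):
    """theta_ab proportional to -exp(-(d_a - d_b)^2); the more similar the
    pixel values, the stronger the preference for equal labels."""
    return -np.exp(-float(d_a - d_b) ** 2)

print(theta_unary(2), theta_pairwise(1, 2))
```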
Rest of lecture 1 ….
Exponential Family
P(v) = exp{-Σα θαΦα(v) - A(θ)}
Φα(v): sufficient statistics; θα: parameters; A(θ): log-partition function
Random Variables V = {V1,V2,…,Vn}
Labeling V = v, va ∈ L = {l1, l2, …, lh}
Random Variable Va takes a value or label va
Overcomplete Representation
P(v) = exp{-Σα θαΦα(v) - A(θ)}
Φα(v): sufficient statistics; θα: parameters; A(θ): log-partition function
There exists a non-zero c such that Σα cαΦα(v) = constant for all v
Ising Model
P(v) = exp{-Σα θαΦα(v) - A(θ)}
Random Variable V = {V1, V2, …,Vn} Label set L = {l1, l2}
Ising Model
P(v) = exp{-Σα θαΦα(v) - A(θ)}
Random Variable V = {V1, V2, …,Vn} Label set L = {0, 1}
Neighborhood over variables specified by edges E
Sufficient statistics and parameters:
• Ia;i(va) with parameter θa;i, for all Va ∈ V, li ∈ L
• Iab;ik(va,vb) with parameter θab;ik, for all (Va,Vb) ∈ E, li, lk ∈ L
Ia;i(va) is the indicator for va = li; Iab;ik(va,vb) is the indicator for va = li, vb = lk
Ising Model
P(v) = exp{-Σa Σi θa;i Ia;i(va) - Σa,b Σi,k θab;ik Iab;ik(va,vb) - A(θ)}
Random Variable V = {V1, V2, …,Vn} Label set L = {0, 1}
Neighborhood over variables specified by edges E
Sufficient statistics and parameters:
• Ia;i(va) with parameter θa;i, for all Va ∈ V, li ∈ L
• Iab;ik(va,vb) with parameter θab;ik, for all (Va,Vb) ∈ E, li, lk ∈ L
Ia;i(va) is the indicator for va = li; Iab;ik(va,vb) is the indicator for va = li, vb = lk
Interactive Binary Segmentation
Foreground histogram of RGB values FG
Background histogram of RGB values BG
‘1’ indicates foreground and ‘0’ indicates background
Interactive Binary Segmentation
More likely to be foreground than background
Interactive Binary Segmentation
More likely to be background than foreground
θa;0 proportional to -log(BG(da))
θa;1 proportional to -log(FG(da))
Interactive Binary Segmentation
More likely to belong to same label
Interactive Binary Segmentation
Less likely to belong to same label
θab;ik proportional to exp(-(da-db)²) if i ≠ k
θab;ik = 0 if i = k
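The same parameters in the overcomplete representation, as a sketch (this is a Potts-style contrast-sensitive term: the pairwise cost is paid only when the two labels differ; all numbers invented):

```python
import numpy as np

# Segmentation parameters in the overcomplete {0, 1} representation.
FG = np.array([0.1, 0.3, 0.6])   # illustrative foreground histogram
BG = np.array([0.7, 0.2, 0.1])   # illustrative background histogram
eps = 1e-9

def theta_unary(d_a, i):
    """theta_a;1 = -log FG(d_a) and theta_a;0 = -log BG(d_a)."""
    hist = FG if i == 1 else BG
    return -np.log(hist[d_a] + eps)

def theta_pairwise(d_a, d_b, i, k):
    """theta_ab;ik = exp(-(d_a - d_b)^2) if i != k, and 0 if i = k:
    a contrast-sensitive penalty paid only when the labels differ."""
    return 0.0 if i == k else np.exp(-float(d_a - d_b) ** 2)

print(theta_unary(2, 1), theta_pairwise(1, 2, 0, 1))
```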
Metric Labeling
P(v) = exp{-Σα θαΦα(v) - A(θ)}
Random Variable V = {V1, V2, …,Vn} Label set L = {l1, l2, …, lh}
Metric Labeling
P(v) = exp{-Σα θαΦα(v) - A(θ)}
Random Variable V = {V1, V2, …,Vn}
Neighborhood over variables specified by edges E
Sufficient statistics and parameters:
• Ia;i(va) with parameter θa;i, for all Va ∈ V, li ∈ L
• Iab;ik(va,vb) with parameter θab;ik, for all (Va,Vb) ∈ E, li, lk ∈ L
θab;ik is a metric distance function over labels
Label set L = {0, …, h-1}
Metric Labeling
P(v) = exp{-Σa Σi θa;i Ia;i(va) - Σa,b Σi,k θab;ik Iab;ik(va,vb) - A(θ)}
Random Variable V = {V1, V2, …,Vn}
Neighborhood over variables specified by edges E
Sufficient statistics and parameters:
• Ia;i(va) with parameter θa;i, for all Va ∈ V, li ∈ L
• Iab;ik(va,vb) with parameter θab;ik, for all (Va,Vb) ∈ E, li, lk ∈ L
θab;ik is a metric distance function over labels
Label set L = {0, …, h-1}
Stereo Correspondence
Disparity Map
Stereo Correspondence
L = {disparities}
Pixel (xa, ya) in the left image corresponds to pixel (xa+va, ya) in the right image
Stereo Correspondence
L = {disparities}
θa;i is proportional to the difference in RGB values
Stereo Correspondence
L = {disparities}
θab;ik = wab d(i,k)
wab proportional to exp(-(da-db)²)
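A sketch of these stereo parameters (d(i,k) = |i - k| is one common choice of metric, an assumption here; the slides only require that d be a metric):

```python
import numpy as np

# Stereo correspondence parameters as sketched above; values invented.
def theta_unary(left_pixel_rgb, right_row_rgb, x_a, i):
    """theta_a;i: proportional to the RGB difference between pixel (x_a, y_a)
    in the left image and pixel (x_a + i, y_a) in the right image."""
    return float(np.abs(left_pixel_rgb - right_row_rgb[x_a + i]).sum())

def theta_pairwise(d_a, d_b, i, k):
    """theta_ab;ik = w_ab * d(i, k), with w_ab = exp(-(d_a - d_b)^2) and the
    assumed metric d(i, k) = |i - k|."""
    return np.exp(-float(d_a - d_b) ** 2) * abs(i - k)

left_pix = np.array([120, 90, 60], dtype=float)            # toy RGB values
right_row = np.array([[130, 95, 55], [121, 91, 59]], dtype=float)
print(theta_unary(left_pix, right_row, 0, 1), theta_pairwise(100, 110, 2, 5))
```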
Pairwise MRF
Random Variable V = {V1, V2, …,Vn}
Neighborhood over variables specified by edges E
Label set L = {l1, l2, …, lh}
P(v) = exp{-Σα θαΦα(v) - A(θ)}
Sufficient statistics and parameters:
• Ia;i(va) with parameter θa;i, for all Va ∈ V, li ∈ L
• Iab;ik(va,vb) with parameter θab;ik, for all (Va,Vb) ∈ E, li, lk ∈ L
Pairwise MRF
Random Variable V = {V1, V2, …,Vn}
Neighborhood over variables specified by edges E
Label set L = {l1, l2, …, lh}
P(v) = exp{-Σa Σi θa;i Ia;i(va) - Σa,b Σi,k θab;ik Iab;ik(va,vb) - A(θ)}
A(θ) = log Z
Probability P(v) = (1/Z) Πa ψa(va) Π(a,b) ψab(va,vb)
ψa(li) = exp(-θa;i), ψab(li,lk) = exp(-θab;ik)
Parameters θ are sometimes also referred to as potentials
Pairwise MRF
Random Variable V = {V1, V2, …,Vn}
Neighborhood over variables specified by edges E
Label set L = {l1, l2, …, lh}
P(v) = exp{-Σa Σi θa;i Ia;i(va) - Σa,b Σi,k θab;ik Iab;ik(va,vb) - A(θ)}
Labeling as a function f : {1, 2, …, n} → {1, 2, …, h}
Variable Va takes a label lf(a)
Pairwise MRF
Random Variable V = {V1, V2, …,Vn}
Neighborhood over variables specified by edges E
Label set L = {l1, l2, …, lh}
P(f) = exp{-Σa θa;f(a) - Σa,b θab;f(a)f(b) - A(θ)}
Labeling as a function f : {1, 2, …, n} → {1, 2, …, h}
Variable Va takes a label lf(a)
Energy Q(f) = Σa θa;f(a) + Σa,b θab;f(a)f(b)
Pairwise MRF
Random Variable V = {V1, V2, …,Vn}
Neighborhood over variables specified by edges E
Label set L = {l1, l2, …, lh}
P(f) = exp{-Q(f) - A(θ)}
Labeling as a function f : {1, 2, …, n} → {1, 2, …, h}
Variable Va takes a label lf(a)
Energy Q(f) = Σa θa;f(a) + Σa,b θab;f(a)f(b)
Outline
• Probabilistic Models
• Conversions
• Exponential Family
• Inference
Inference
maxv P(v) = maxv exp{-Σa Σi θa;i Ia;i(va) - Σa,b Σi,k θab;ik Iab;ik(va,vb) - A(θ)}
Maximum a Posteriori (MAP) Estimation
minf Q(f) = minf ( Σa θa;f(a) + Σa,b θab;f(a)f(b) )
Energy Minimization
P(va = li) = Σv P(v)δ(va = li)
Computing Marginals
P(va = li, vb = lk) = Σv P(v)δ(va = li)δ(vb = lk)
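Both problems can be solved by brute-force enumeration on toy models, which is useful as a reference for the faster algorithms in the coming lectures (everything below is invented for illustration; the cost is exponential in n):

```python
import itertools
import numpy as np

# Brute-force MAP estimation and marginals for a toy pairwise MRF.
n, h = 3, 2
edges = [(0, 1), (1, 2)]
rng = np.random.default_rng(0)
theta_unary = rng.normal(size=(n, h))                      # theta_a;i
theta_pair = {e: rng.normal(size=(h, h)) for e in edges}   # theta_ab;ik

def Q(f):
    """Energy Q(f) = sum_a theta_a;f(a) + sum_(a,b) theta_ab;f(a)f(b)."""
    return (sum(theta_unary[a, f[a]] for a in range(n))
            + sum(theta_pair[(a, b)][f[a], f[b]] for (a, b) in edges))

labelings = list(itertools.product(range(h), repeat=n))

# MAP estimation is exactly energy minimization: argmax_f P(f) = argmin_f Q(f).
f_map = min(labelings, key=Q)

# Marginals P(va = li) = sum_v P(v) delta(va = li); normalizing the weights
# divides out the unknown exp(-A(theta)).
w = np.array([np.exp(-Q(f)) for f in labelings])
w /= w.sum()
marginal_v1 = [sum(w[j] for j, f in enumerate(labelings) if f[0] == i)
               for i in range(h)]
print(f_map, marginal_v1)
```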
Next Lecture …
Energy minimization for tree-structured pairwise MRF