Message Passing Algorithms for Optimization
![Page 1: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/1.jpg)
1
Message Passing Algorithms for Optimization
Nicholas Ruozzi
Advisor: Sekhar Tatikonda
Yale University
![Page 2: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/2.jpg)
2
The Problem
Minimize a real-valued objective function that factorizes as a sum of potentials
(a multiset whose elements are subsets of the indices 1,…,n)
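The formula on this slide did not survive extraction. A form consistent with the description above would be the following, where the symbol names \phi_i (single-variable potentials), \psi_\alpha (multi-variable potentials), and \mathcal{A} (the multiset) are assumptions made here for illustration:

```latex
f(x_1,\dots,x_n) \;=\; \sum_{i=1}^{n} \phi_i(x_i) \;+\; \sum_{\alpha \in \mathcal{A}} \psi_\alpha(x_\alpha),
\qquad \alpha \subseteq \{1,\dots,n\} \ \text{for each } \alpha \in \mathcal{A}
```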
![Page 3: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/3.jpg)
3
Corresponding Graph
[Figure: factor graph on variable nodes 1, 2, 3]
![Page 4: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/4.jpg)
4
Local Message Passing Algorithms
Pass messages on this graph to minimize f
Distributed message passing algorithm
Ideal for large scientific problems, sensor networks, etc.
[Figure: factor graph on variable nodes 1, 2, 3]
![Page 5: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/5.jpg)
5
The Min-Sum Algorithm
Messages at time t:
[Figure: factor graph on variable nodes 1, 2, 3, 4]
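The message equations are missing from the transcript. As a sketch, the standard synchronous min-sum updates on a factor graph can be written as follows. The notation is assumed rather than taken from the slide: \partial i is the set of factors containing variable i, \phi_i and \psi_\alpha are the single-variable and multi-variable potentials, and \kappa is an arbitrary normalizing constant.

```latex
\begin{align*}
m^{t}_{i \to \alpha}(x_i) &= \kappa + \phi_i(x_i)
    + \sum_{\beta \in \partial i \setminus \alpha} m^{t-1}_{\beta \to i}(x_i) \\
m^{t}_{\alpha \to i}(x_i) &= \kappa + \min_{x_\alpha \setminus x_i}
    \Big[ \psi_\alpha(x_\alpha) + \sum_{k \in \alpha \setminus i} m^{t-1}_{k \to \alpha}(x_k) \Big]
\end{align*}
```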
![Page 6: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/6.jpg)
6
Computing Beliefs
The min-marginal corresponding to the ith variable is given by
Beliefs approximate the min-marginals:
Estimate the optimal assignment as
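The formulas for this slide are missing from the transcript. The following is a minimal sketch, not the author's implementation, of how beliefs and the estimate are typically computed: each belief is the variable's own potential plus all incoming factor messages, and the estimate picks the argmin of each belief. It assumes a pairwise model, synchronous updates, and zero initial messages; the function names and the small chain example are hypothetical. Because the example graph is a tree, the estimate should agree with brute-force minimization.

```python
import itertools


def min_sum(phi, psi, iters):
    """Synchronous min-sum on a pairwise model (illustrative sketch).

    phi[i][a]         : cost of x_i = a
    psi[(i, j)][a][b] : cost of (x_i, x_j) = (a, b)
    Returns the beliefs b_i(x_i) after `iters` rounds.
    """
    k = len(phi[0])
    # Factor-to-variable messages, keyed by (factor, variable), start at zero.
    msg = {(e, v): [0.0] * k for e in psi for v in e}
    for _ in range(iters):
        new = {}
        for e, table in psi.items():
            i, j = e
            for tgt, src in ((i, j), (j, i)):
                # Variable-to-factor message from src: its own potential plus
                # the messages it received from every other factor.
                out = [phi[src][b] + sum(m[b] for (f, v), m in msg.items()
                                         if v == src and f != e)
                       for b in range(k)]
                # Minimize over the value b of src for each value a of tgt.
                vals = [min((table[a][b] if tgt == i else table[b][a]) + out[b]
                            for b in range(k))
                        for a in range(k)]
                shift = min(vals)  # normalizing constant
                new[(e, tgt)] = [val - shift for val in vals]
        msg = new
    # Belief: own potential plus all incoming factor messages.
    return [[phi[i][a] + sum(m[a] for (f, v), m in msg.items() if v == i)
             for a in range(k)]
            for i in range(len(phi))]


def decode(beliefs):
    """Estimate x* by taking the argmin of each belief."""
    return [min(range(len(b)), key=b.__getitem__) for b in beliefs]


def brute_force(phi, psi):
    """Exhaustive minimization, for comparison on tiny problems only."""
    n, k = len(phi), len(phi[0])

    def f(x):
        return (sum(phi[i][x[i]] for i in range(n))
                + sum(t[x[i]][x[j]] for (i, j), t in psi.items()))

    return list(min(itertools.product(range(k), repeat=n), key=f))


# Hypothetical chain 0 - 1 - 2 with binary variables.
PHI = [[0.0, 1.0], [0.5, 0.0], [1.0, 0.0]]
PSI = {(0, 1): [[0.0, 2.0], [2.0, 0.0]],
       (1, 2): [[0.0, 2.0], [2.0, 0.0]]}

if __name__ == "__main__":
    print(decode(min_sum(PHI, PSI, iters=5)), brute_force(PHI, PSI))
```

On graphs with cycles the same code runs unchanged, but the beliefs are then only approximations of the min-marginals and the decoded estimate need not be optimal.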
![Page 7: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/7.jpg)
7
Min-Sum: Convergence Properties
Iterations do not necessarily converge
Always converges when the factor graph is a tree
Converged estimates need not correspond to the optimal solution
Performs well empirically
![Page 8: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/8.jpg)
8
Previous Work
Prior work focused on two aspects of message passing algorithms:
Convergence
Coordinate ascent schemes
Not necessarily local message passing algorithms
Correctness
No combinatorial characterization of failure modes
Concerned only with global optimality
![Page 9: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/9.jpg)
9
Contributions
A new local message passing algorithm
Parameterized family of message passing algorithms
Conditions under which the estimate produced by the splitting algorithm is guaranteed to be a global optimum
Conditions under which the estimate produced by the splitting algorithm is guaranteed to be a local optimum
![Page 10: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/10.jpg)
10
Contributions
What makes a graphical model “good”?
Combinatorial understanding of the failure modes of the splitting algorithm via graph covers
Can be extended to other iterative algorithms
Techniques for handling objective functions for which the known convergent algorithms fail
Reparameterization centric approach
![Page 11: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/11.jpg)
11
Publications
Convergent and correct message passing schemes for optimization problems over graphical models. Proceedings of the 26th Conference on Uncertainty in Artificial Intelligence (UAI), July 2010
Fixing Max-Product: A Unified Look at Message Passing Algorithms (invited talk). Proceedings of the Forty-Eighth Annual Allerton Conference on Communication, Control, and Computing, September 2010
Unconstrained minimization of quadratic functions via min-sum. Proceedings of the Conference on Information Sciences and Systems (CISS), Princeton, NJ, USA, March 2010
Graph covers and quadratic minimization. Proceedings of the Forty-Seventh Annual Allerton Conference on Communication, Control, and Computing, September 2009
s-t paths using the min-sum algorithm. Proceedings of the Forty-Sixth Annual Allerton Conference on Communication, Control, and Computing, September 2008
![Page 12: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/12.jpg)
12
Outline
Reparameterizations
Lower Bounds
Convergent Message Passing
Finding a Minimizing Assignment
Graph covers
Quadratic Minimization
![Page 14: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/14.jpg)
14
Factorizations
Some factorizations are better than others
If x_i takes one of k values, this requires at most 2k^2 + k operations
![Page 15: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/15.jpg)
15
Factorizations
Some factorizations are better than others
Suppose
Only need k operations to compute the minimum value!
![Page 16: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/16.jpg)
16
Reparameterizations
We can rewrite the objective function as
This does not change the objective function as long as the messages are real-valued at each x
The objective function is reparameterized in terms of the messages
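The rewritten objective is missing from the transcript. A common form, assumed here, adds each factor-to-variable message to the variable's potential and subtracts it from the factor's potential, so the messages cancel in the sum. The sketch below checks this numerically on a small pairwise model with random messages; all names and the random triangle example are hypothetical.

```python
import itertools
import random


def objective(phi, psi, x):
    """f(x) = sum_i phi_i(x_i) + sum_(i,j) psi_ij(x_i, x_j)."""
    return (sum(phi[i][x[i]] for i in range(len(phi)))
            + sum(t[x[i]][x[j]] for (i, j), t in psi.items()))


def reparameterized(phi, psi, msg, x):
    """Same objective, written in terms of messages msg[(factor, variable)]."""
    total = 0.0
    for i in range(len(phi)):
        # Variable term: potential plus all messages arriving at variable i.
        total += phi[i][x[i]] + sum(m[x[i]] for (e, v), m in msg.items() if v == i)
    for e, t in psi.items():
        i, j = e
        # Factor term: potential minus the messages this factor sent out.
        total += t[x[i]][x[j]] - msg[(e, i)][x[i]] - msg[(e, j)][x[j]]
    return total


def max_gap(phi, psi, msg):
    """Largest difference between the two forms over all assignments."""
    n, k = len(phi), len(phi[0])
    return max(abs(objective(phi, psi, x) - reparameterized(phi, psi, msg, x))
               for x in itertools.product(range(k), repeat=n))


# Hypothetical example: a triangle on three ternary variables.
random.seed(0)
K = 3
PHI = [[random.uniform(-1, 1) for _ in range(K)] for _ in range(3)]
PSI = {e: [[random.uniform(-1, 1) for _ in range(K)] for _ in range(K)]
       for e in [(0, 1), (1, 2), (0, 2)]}
MSG = {(e, v): [random.uniform(-5, 5) for _ in range(K)] for e in PSI for v in e}

if __name__ == "__main__":
    print(max_gap(PHI, PSI, MSG))  # zero up to floating-point rounding
```

The check holds for any finite message values, which is the point of the slide: the messages change how the objective is split into pieces, not the objective itself.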
![Page 17: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/17.jpg)
17
Reparameterizations
We can rewrite the objective function as
The reparameterization has the same factor graph as the original factorization
Many message passing algorithms produce a reparameterization upon convergence
![Page 18: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/18.jpg)
18
The Splitting Reparameterization
Let c be a vector of non-zero reals
If c is a vector of positive integers, then we could view this as a factorization in two ways:
Over the same factor graph as the original potentials
Over a factor graph where each potential has been “split” into several pieces
![Page 19: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/19.jpg)
19
The Splitting Reparameterization
[Figure: left, a factor graph on variables 1, 2, 3; right, the factor graph resulting from “splitting” each of the pairwise potentials 3 times]
![Page 20: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/20.jpg)
20
The Splitting Reparameterization
Beliefs:
Reparameterization:
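The belief and reparameterization formulas are missing from the transcript. One form consistent with the splitting idea, offered here as an assumption rather than as the slide's exact content, uses a weight c_i per variable and c_\alpha per factor, with \partial i denoting the factors that contain variable i:

```latex
\begin{align*}
b_i(x_i) &= \frac{\phi_i(x_i)}{c_i} + \sum_{\alpha \in \partial i} c_\alpha\, m_{\alpha \to i}(x_i) \\
b_\alpha(x_\alpha) &= \frac{\psi_\alpha(x_\alpha)}{c_\alpha}
    + \sum_{k \in \alpha} c_k \big[ b_k(x_k) - m_{\alpha \to k}(x_k) \big] \\
f(x) &= \sum_{i} c_i\, b_i(x_i)
    + \sum_{\alpha} c_\alpha \Big[ b_\alpha(x_\alpha) - \sum_{k \in \alpha} c_k\, b_k(x_k) \Big]
\end{align*}
```

Substituting the first two lines into the third, every message term c_i c_\alpha m_{\alpha \to i}(x_i) appears once with each sign and cancels, leaving \sum_i \phi_i(x_i) + \sum_\alpha \psi_\alpha(x_\alpha).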
![Page 22: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/22.jpg)
22
Lower Bounds
Can lower bound the objective function with these reparameterizations:
Find the collection of messages that maximizes this lower bound
The lower bound is a concave function of the messages
Use coordinate ascent or subgradient methods
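The bound itself is missing from the transcript. For the unweighted reparameterization (all splitting weights equal to 1, an assumption made here to keep the sketch short), it follows from the fact that the minimum of a sum is at least the sum of the minima:

```latex
\min_x f(x) \;\ge\;
\sum_{i} \min_{x_i} \Big[ \phi_i(x_i) + \sum_{\alpha \in \partial i} m_{\alpha \to i}(x_i) \Big]
\;+\; \sum_{\alpha} \min_{x_\alpha} \Big[ \psi_\alpha(x_\alpha) - \sum_{k \in \alpha} m_{\alpha \to k}(x_k) \Big]
```

Each bracketed term is affine in the messages for a fixed assignment, and a pointwise minimum of affine functions is concave, which is why the right-hand side is a concave function of the messages.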
![Page 23: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/23.jpg)
23
Lower Bounds and the MAP LP
Equivalent to minimizing f
Dual provides a lower bound on f
Messages are a side-effect of certain dual formulations
![Page 25: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/25.jpg)
25
The Splitting Algorithm
A local message passing algorithm for the splitting reparameterization
Contains the min-sum algorithm as a special case
For the integer case, can be derived from the min-sum update equations
![Page 26: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/26.jpg)
26
The Splitting Algorithm
For certain choices of c, an asynchronous version of the splitting algorithm can be shown to be a block coordinate ascent scheme for the lower bound:
For example:
![Page 27: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/27.jpg)
27
Asynchronous Splitting Algorithm
[Figure: factor graph on variable nodes 1, 2, 3]
![Page 30: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/30.jpg)
30
Coordinate Ascent
Guaranteed to converge
Does not necessarily maximize the lower bound
Can get stuck in a suboptimal configuration
Can be shown to converge to the maximum in restricted cases
Pairwise-binary objective functions
![Page 31: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/31.jpg)
31
Other Ascent Schemes
Many other ascent algorithms are possible over different lower bounds:
TRW-S [Kolmogorov 2007]
MPLP [Globerson and Jaakkola 2007]
Max-Sum Diffusion [Werner 2007]
Norm-product [Hazan 2010]
Not all coordinate ascent schemes are local
![Page 33: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/33.jpg)
33
Constructing the Solution
Construct an estimate, x*, of the optimal assignment from the beliefs by choosing
For certain choices of the vector c, if each argmin is unique, then x* minimizes f
A simple choice of c guarantees both convergence and correctness (if the argmins are unique)
![Page 34: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/34.jpg)
34
Correctness
If the argmins are not unique, then we may not be able to construct a solution
When does the algorithm converge to the correct minimizing assignment?
![Page 36: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/36.jpg)
36
Graph Covers
A graph H covers a graph G if there is a homomorphism from H to G that is a bijection on neighborhoods
[Figure: graph G on nodes 1, 2, 3 and a 2-cover of G on nodes 1, 2, 3, 1’, 2’, 3’]
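The covering condition can be checked mechanically. The sketch below is an illustration under assumptions, not code from the talk: it takes a projection from the nodes of H to the nodes of G and verifies that the neighbors of every node of H map one-to-one onto the neighbors of its image. The function name and the hexagon example (one connected 2-cover of a triangle) are hypothetical, and graphs with isolated nodes are not handled.

```python
def is_cover(h_edges, g_edges, proj):
    """Return True if proj (dict: H node -> G node) is a covering map."""

    def adjacency(edges):
        adj = {}
        for u, v in edges:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
        return adj

    adj_h, adj_g = adjacency(h_edges), adjacency(g_edges)
    # Every node of G must be covered by at least one node of H.
    if set(proj[u] for u in adj_h) != set(adj_g):
        return False
    for u, neighbors in adj_h.items():
        images = [proj[v] for v in neighbors]
        # Bijection on neighborhoods: the images are pairwise distinct and
        # are exactly the neighbors of proj[u] in G.
        if len(set(images)) != len(images):
            return False
        if set(images) != adj_g.get(proj[u], set()):
            return False
    return True


# Triangle G on nodes 1, 2, 3.
G_EDGES = [(1, 2), (2, 3), (1, 3)]

# Hexagon 1 - 2 - 3 - 1' - 2' - 3' - 1, with primes written as (node, 1).
H_EDGES = [((1, 0), (2, 0)), ((2, 0), (3, 0)), ((3, 0), (1, 1)),
           ((1, 1), (2, 1)), ((2, 1), (3, 1)), ((3, 1), (1, 0))]
PROJ = {(v, c): v for v in (1, 2, 3) for c in (0, 1)}

if __name__ == "__main__":
    print(is_cover(H_EDGES, G_EDGES, PROJ))
```

A single edge mapped into the triangle fails the check, because its endpoint has one neighbor while the corresponding triangle node has two.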
![Page 37: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/37.jpg)
37
Graph Covers
Potential functions are “lifts” of the nodes they cover
[Figure: graph G on nodes 1, 2, 3 and a 2-cover of G on nodes 1, 2, 3, 1’, 2’, 3’]
![Page 38: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/38.jpg)
38
Graph Covers
The lifted potentials define a new objective function
Objective function:
2-cover objective function
![Page 39: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/39.jpg)
39
Graph Covers
Indistinguishability: for any cover and any choice of initial messages on the original graph, there exists a choice of initial messages on the cover such that the messages passed by the splitting algorithm are identical on both graphs
For choices of c that guarantee correctness, any assignment that uniquely minimizes each belief must also minimize the objective function corresponding to any finite cover
![Page 40: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/40.jpg)
40
Maximum Weight Independent Set
[Figure: graph G on nodes 1, 2, 3 and a 2-cover of G on nodes 1, 2, 3, 1’, 2’, 3’]
![Page 41: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/41.jpg)
41
Maximum Weight Independent Set
[Figure: graph G with node weights 5, 2, 2 and a 2-cover of G with node weights 5, 2, 2, 5, 2, 2]
![Page 43: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/43.jpg)
43
Maximum Weight Independent Set
[Figure: graph G with node weights 3, 2, 2 and a 2-cover of G with node weights 3, 2, 2, 3, 2, 2]
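The two weightings in these figures can be compared by brute force. The sketch below is an illustration, assuming that G is a triangle and that the 2-cover is the hexagon 1 - 2 - 3 - 1’ - 2’ - 3’ - 1 with each copy inheriting the weight of the node it covers; the function names are hypothetical. With weights 5, 2, 2 the best independent set on the cover is the lift of the best set on G, while with weights 3, 2, 2 the cover has a strictly better solution that is not a lift.

```python
import itertools


def max_weight_independent_sets(nodes, edges, weight):
    """Return (best weight, list of all optimal independent sets)."""
    best, best_sets = None, []
    for r in range(len(nodes) + 1):
        for subset in itertools.combinations(nodes, r):
            chosen = set(subset)
            # Skip subsets that contain both endpoints of some edge.
            if any(u in chosen and v in chosen for u, v in edges):
                continue
            total = sum(weight[v] for v in subset)
            if best is None or total > best:
                best, best_sets = total, [chosen]
            elif total == best:
                best_sets.append(chosen)
    return best, best_sets


G_NODES = ["1", "2", "3"]
G_EDGES = [("1", "2"), ("2", "3"), ("1", "3")]
H_NODES = ["1", "2", "3", "1'", "2'", "3'"]
H_EDGES = [("1", "2"), ("2", "3"), ("3", "1'"),
           ("1'", "2'"), ("2'", "3'"), ("3'", "1")]


def compare(w1):
    """Solve both problems when node 1 (and its copy) has weight w1."""
    weight = {"1": w1, "2": 2, "3": 2, "1'": w1, "2'": 2, "3'": 2}
    on_g = max_weight_independent_sets(G_NODES, G_EDGES, weight)
    on_h = max_weight_independent_sets(H_NODES, H_EDGES, weight)
    return on_g, on_h


if __name__ == "__main__":
    for w1 in (5, 3):
        print(w1, compare(w1))
```

In the second case the optimum on G is still the single node 1, but the alternating sets on the hexagon beat its lift, which is the kind of disagreement between a graph and its covers that the following slides discuss.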
![Page 45: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/45.jpg)
45
More Graph Covers
If covers of the factor graph have different solutions:
The splitting algorithm cannot converge to the correct answer for choices of c that guarantee correctness
The min-sum algorithm may converge to an assignment that is optimal on a cover
There are applications for which the splitting algorithm always works
Minimum cuts, shortest paths, and more…
![Page 46: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/46.jpg)
46
Graph Covers
Suppose f factorizes over a set with corresponding factor graph G and the choice of c guarantees correctness
Theorem: the splitting algorithm can only converge to beliefs that have unique argmins if both of the following hold:
f is uniquely minimized at the assignment x*
The objective function corresponding to every finite cover H of G has a unique minimum that is a lift of x*
![Page 47: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/47.jpg)
47
Graph Covers
This result suggests that
There is a close link between “good” factorizations and the difficulty of a problem
Convergent and correct algorithms are not ideal for all applications
Convex functions can be covered by functions that are not convex
![Page 49: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/49.jpg)
49
Quadratic Minimization
Γ symmetric positive definite implies a unique minimum
Minimized at
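The quadratic objective and its minimizer are missing from the transcript. Assuming the usual form with a symmetric matrix \Gamma and a vector h (the name h is an assumption made here), the minimizer follows from setting the gradient to zero:

```latex
\begin{align*}
f(x) &= \tfrac{1}{2}\, x^{T} \Gamma x - h^{T} x \\
\nabla f(x) &= \Gamma x - h = 0 \quad\Longrightarrow\quad x^{*} = \Gamma^{-1} h
\end{align*}
```

Positive definiteness of \Gamma makes f strictly convex and \Gamma invertible, so this stationary point is the unique minimum.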
![Page 50: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/50.jpg)
50
Quadratic Minimization
For a positive definite matrix, min-sum convergence implies a correct solution:
Min-sum is not guaranteed to converge for all symmetric positive definite matrices
![Page 51: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/51.jpg)
51
Quadratic Minimization
A symmetric matrix is scaled diagonally dominant if there exists w > 0 such that for each row i:
Theorem: Γ is scaled diagonally dominant iff every finite cover of Γ is positive definite
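The row condition is missing from the transcript; the usual definition, assumed here, is |Γ_ii| w_i > Σ_{j≠i} |Γ_ij| w_j for every row i. The sketch below checks that inequality for a given candidate w (it does not search for one) and tests positive definiteness with a plain Cholesky factorization. The 3x3 matrix with off-diagonal entries 0.6 and its hexagonal 2-cover are illustrative choices, and all function names are hypothetical.

```python
import math


def is_scaled_diagonally_dominant(M, w):
    """Check |M_ii| w_i > sum_{j != i} |M_ij| w_j for the given w > 0."""
    n = len(M)
    if any(w[i] <= 0 for i in range(n)):
        return False
    return all(
        abs(M[i][i]) * w[i] > sum(abs(M[i][j]) * w[j] for j in range(n) if j != i)
        for i in range(n))


def is_positive_definite(M, tol=1e-12):
    """Attempt a Cholesky factorization; fail on a non-positive pivot."""
    n = len(M)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = M[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:
                    return False
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True


def two_cover_of_triangle(M):
    """6x6 matrix for the hexagonal 2-cover 1-2-3-1'-2'-3'-1 of a 3x3 M.

    Index u in the cover projects to index u % 3 in M.
    """
    C = [[0.0] * 6 for _ in range(6)]
    for u in range(6):
        v = (u + 1) % 6
        C[u][u] = M[u % 3][u % 3]
        C[u][v] = C[v][u] = M[u % 3][v % 3]
    return C


GAMMA = [[1.0, 0.6, 0.6],
         [0.6, 1.0, 0.6],
         [0.6, 0.6, 1.0]]
COVER = two_cover_of_triangle(GAMMA)

if __name__ == "__main__":
    print(is_positive_definite(GAMMA), is_positive_definite(COVER))
```

Here the base matrix is positive definite but its 2-cover is not, which matches the theorem's direction: summing the three row inequalities shows that no positive w can satisfy them for this matrix, so it is not scaled diagonally dominant.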
![Page 52: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/52.jpg)
52
Quadratic Minimization
Scaled diagonal dominance is a sufficient condition for the convergence of other iterative methods: Gauss-Seidel, Jacobi, and min-sum
Suggests a generalization of scaled diagonal dominance for arbitrary convex functions
Purely combinatorial!
Empirically, the splitting algorithm can always be made to converge for this problem
![Page 53: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/53.jpg)
53
Conclusion
General strategy for minimization
Reparameterization
Lower bounds
Convergent and correct message passing algorithms
Correctness is too strong
Algorithms cannot distinguish graph covers
Can fail to hold even for convex problems
![Page 54: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/54.jpg)
54
Conclusion
Open questions
Deep relationship between “hardness” of a problem and its factorizations
Convergence and correctness criteria for the min-sum algorithm
Rates of convergence
![Page 55: Message Passing Algorithms for Optimization](https://reader035.fdocuments.net/reader035/viewer/2022062222/56815d6e550346895dcb7695/html5/thumbnails/55.jpg)
55
Questions?
A draft of the thesis is available online at: http://cs-www.cs.yale.edu/homes/nruozzi/Papers/ths2.pdf