Free Energy Rates for Parameter Rich Optimization Problems...

40
Free Energy Rates for Parameter Rich Optimization Problems Joachim M. Buhmann 1 , Julien Dumazert 1 , Alexey Gronskiy 1 , Wojciech Szpankowski 2 1 ETH Zurich, {jbuhmann, alexeygr}@inf.ethz.ch, [email protected], 2 Purdue University, [email protected] AofA Workshop Princeton, 20th June 2017

Transcript of Free Energy Rates for Parameter Rich Optimization Problems...

Page 1: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Free Energy Rates for ParameterRich Optimization Problems

Joachim M. Buhmann1, Julien Dumazert1,Alexey Gronskiy1, Wojciech Szpankowski2

1ETH Zurich, {jbuhmann, alexeygr}@inf.ethz.ch,[email protected],

2Purdue University, [email protected]

AofA WorkshopPrinceton, 20th June 2017

Page 2: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Roadmap

Application

Genericproblem

Case study:“sparse” minimum bisection

Corollary

Main result formulation

Proof outline

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 1/24

Page 3: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Generic Problem

Genericproblem

Case study:“sparse” minimum bisection

Main result formulation

Proof outline

Application

Corollary

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 2/24

Page 4: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Generic Problem: Gibbs Sampling

• We search for solutions:X → c

• Input X (e. g. a graph) israndom!

• Want solution c still to bestable

⇓define a Maximum EntropyGibbs measure over c:

p(c|X ) ∝ exp(−β · cost(c,X)) solutions

Gib

bs

me

asu

res

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 3/24

Page 5: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Generic Problem: Gibbs Sampling

• We search for solutions:X → c

• Input X (e. g. a graph) israndom!

• Want solution c still to bestable

⇓define a Maximum EntropyGibbs measure over c:

p(c|X ) ∝ exp(−β · cost(c,X)) solutions

Gib

bs

me

asu

res

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 3/24

Page 6: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Generic Problem: Gibbs Sampling

• We search for solutions:X → c

• Input X (e. g. a graph) israndom!

• Want solution c still to bestable

⇓define a Maximum EntropyGibbs measure over c:

p(c|X ) ∝ exp(−β · cost(c,X)) solutionsG

ibb

sm

ea

sure

s

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 3/24

Page 7: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Generic Problem: “free energy”

Nowp(c|X ) = exp

(−β · cost−F(X )

),

where the following is free energy:

F(X ) = log Z (X ) (here Z (X ) is partition function).

Goal: compute expected “free energy” — hard:

EXF(X ) = EX log Z (X ).

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 4/24

Page 8: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Case Study: “Sparse” Minimum Bisection

Application

Genericproblem

Case study:“sparse” minimum bisection

Corollary

Main result formulation

Proof outline

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 5/24

Page 9: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Case Study: “Sparse” Minimum Bisection

Find two subgraphs of some small size d withminimum edge cost between them.

2

3

4

1

5

10

11

12

9

8

6

7

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 6/24

Page 10: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Case Study: “Sparse” Minimum Bisection

Find two subgraphs of some small size d withminimum edge cost between them.

2

3

4

1

5

10

11

12

9

8

6

7

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 6/24

Page 11: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Case Study: “Sparse” Minimum Bisection

Find two subgraphs of some small size d withminimum edge cost between them.

2

3

4

1

5

10

11

12

9

8

6

7

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 6/24

Page 12: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Case Study: “Sparse” Minimum Bisection

• flashback to “Generic Problem”:in high dimensions may requiresolving via sampling!

• solutions are dependent oneach other — not RandomEnergy Model (next slide)!

• contribution: free energyasymptotics solved!

• another case study LawlerQuadratic Assignment Problemsolved as well: (see the paper).

2

3

4

1

5

10

11

12

9

8

6

7

2

3

4

1

5

10

11

12

9

8

6

7

2

3

4

1

5

10

11

12

9

8

6

7

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 6/24

Page 13: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Case Study: “Sparse” Minimum Bisection

• flashback to “Generic Problem”:in high dimensions may requiresolving via sampling!

• solutions are dependent oneach other — not RandomEnergy Model (next slide)!

• contribution: free energyasymptotics solved!

• another case study LawlerQuadratic Assignment Problemsolved as well: (see the paper).

2

3

4

1

5

10

11

12

9

8

6

7

2

3

4

1

5

10

11

12

9

8

6

7

2

3

4

1

5

10

11

12

9

8

6

7

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 6/24

Page 14: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Case Study: “Sparse” Minimum Bisection

• flashback to “Generic Problem”:in high dimensions may requiresolving via sampling!

• solutions are dependent oneach other — not RandomEnergy Model (next slide)!

• contribution: free energyasymptotics solved!

• another case study LawlerQuadratic Assignment Problemsolved as well: (see the paper).

2

3

4

1

5

10

11

12

9

8

6

7

2

3

4

1

5

10

11

12

9

8

6

7

2

3

4

1

5

10

11

12

9

8

6

7

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 6/24

Page 15: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Case Study: “Sparse” Minimum Bisection

• flashback to “Generic Problem”:in high dimensions may requiresolving via sampling!

• solutions are dependent oneach other — not RandomEnergy Model (next slide)!

• contribution: free energyasymptotics solved!

• another case study LawlerQuadratic Assignment Problemsolved as well: (see the paper).

2

3

4

1

5

10

11

12

9

8

6

7

2

3

4

1

5

10

11

12

9

8

6

7

2

3

4

1

5

10

11

12

9

8

6

7

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 6/24

Page 16: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Free Energy: History and Advances

• [Derrida’80] established REM — nodependencies;

• [Talagrand’02]: systematization; simple proof ofREM free energy phase transition and beyond:

limn→∞

E[log Z (β,X )]

n=

{β2

4 + log 2 β < 2√

log 2,β√

log 2 β ≥ 2√

log 2.

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 7/24

Page 17: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Free Energy: History and Advances

• [Derrida’80] established REM — nodependencies;

• [Talagrand’02]: systematization; simple proof ofREM free energy phase transition and beyond:

limn→∞

E[log Z (β,X )]

n=

{β2

4 + log 2 β < 2√

log 2,β√

log 2 β ≥ 2√

log 2.

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 7/24

Page 18: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Main Result

Genericproblem

Case study:“sparse” minimum bisection

Main result formulation

Proof outline

Application

Corollary

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 8/24

Page 19: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Main Result: Setting• m is number of solutions

• N is size of one solution(number of parameters)

• require to be “parameterrich”

log m = o(N)

23

4

1

5

10

11

129

86

7

N

log m = o(N)

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 9/24

Page 20: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Main Result: Setting• m is number of solutions

• N is size of one solution(number of parameters)

• require to be “parameterrich”

log m = o(N)

23

4

1

5

10

11

129

86

7

N

log m = o(N)

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 9/24

Page 21: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Main Result: Setting• m is number of solutions

• N is size of one solution(number of parameters)

• require to be “parameterrich”

log m = o(N)

23

4

1

5

10

11

129

86

7

N

log m = o(N)

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 9/24

Page 22: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Main Result: 2nd Order Phase Transition

Main theoremConsider sparse MBP on a complete graph; edgeweights mutually independent within any givensolution; have mean µ and variance σ2. Then, thefollowing holds:

limn→∞

E[log Z ] + βµ√

N log mlog m

=

{1 + β2σ2

2 , β <√

2σ,

βσ√

2, β ≥√

provided log n� d � n2/7√log n

.

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 10/24

Page 23: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Proof Outline

Genericproblem

Case study:“sparse” minimum bisection

Main result formulation

Proof outline

Application

Corollary

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 11/24

Page 24: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Proof Outline – I

• Introduce “solution overlap” D:average intersection of two bisections

• D is key to understand dependencies(remember we are no REM!)

Lemma 1The following holds

Erand choiceD = O(d4/n).

Computedependencies

P(``Z close to EZ’’)viaVar Z

Breaking E log Z into two cond’s

Final boundingvia adjustingparameter

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 12/24

Page 25: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Proof Outline – II• Introduce event A: happens when Z is

close to EZ , i. e. A := {Z ≥ εEZ}• Goal is to compute P(A)

Fact 1P(A) can be bounded by VarZ viaChebychev.

Lemma 2 [Buhmann et al., 2014]VarZ can be asymptotically approximatedvia Erand choiceD:

VarZ ∼ (EZ )2(σ2β2Erand choiceD).

Computedependencies

P(``Z close to EZ’’)viaVar Z

Breaking E log Z into two cond’s

Final boundingvia adjustingparameter

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 13/24

Page 26: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Proof Outline – III

• Break E log Z into

E log Z = E[log Z | A] · P(A) + E[log Z1(A)]

≥ (logEZ + log ε)P(A) + E[log Z1(A)]

Fact 3Can expand logEZ via Taylor expansion(used assumptions of Theorem) andbound P(A) from previous.

Fact 4E[log Z1(A)]: enough to bound loosely

Computedependencies

P(``Z close to EZ’’)viaVar Z

Breaking E log Z into two cond’s

Final boundingvia adjustingparameter

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 14/24

Page 27: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Proof Outline – IV

• Finally, the right choice of ε for tworegimes of β gives the phase transitionin lower bound

limn→∞

E[log Z ] + · · ·· · ·

>

{1 + β2σ2

2 , β <√

2σ,

βσ√

2, β ≥√

2σ.

• The same phase transition happensfor upper bound, — easier to prove(no computing dependencies)

Computedependencies

P(``Z close to EZ’’)viaVar Z

Breaking E log Z into two cond’s

Final boundingvia adjustingparameter

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 15/24

Page 28: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Possible Application

Genericproblem

Case study:“sparse” minimum bisection

Main result formulation

Proof outline

Application

Corollary

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 16/24

Page 29: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Application Setting:Gibbs Sampling Regularizer

• Remember our approach:

• Q: what is regularizing here?A: choice of β — essentially controls “width”

• Q: how?A: maximize expected log-convolution:

β∗ = arg maxβ

EX ′,X ′′

[log

∑c

pβ(c|X ′)pβ(c|X ′′)]

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 17/24

Page 30: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Application Setting:Gibbs Sampling Regularizer

• Remember our approach:

• Q: what is regularizing here?A: choice of β — essentially controls “width”

• Q: how?A: maximize expected log-convolution:

β∗ = arg maxβ

EX ′,X ′′

[log

∑c

pβ(c|X ′)pβ(c|X ′′)]

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 17/24

Page 31: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Application Setting:Gibbs Sampling Regularizer

• Remember our approach:

• Q: what is regularizing here?A: choice of β — essentially controls “width”

• Q: how?A: maximize expected log-convolution:

β∗ = arg maxβ

EX ′,X ′′

[log

∑c

pβ(c|X ′)pβ(c|X ′′)]

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 17/24

Page 32: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Application Setting: Intuition• Log-convolution is “almost” cross entropy;

• It stabilizes solution output:β = 0.25: underfitting

c opt(X

′ )

c opt(X

′′)

pβ(·|X ′) pβ(·|X ′′)

β = 2.8: near-optimal β = 19.5: overfitting

2 6 10 14 18

β = 0.25

β = 2.8

β = 19.5

β

solutions solutions solutions

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 18/24

Page 33: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Application Setting: Intuition• Log-convolution is “almost” cross entropy;• It stabilizes solution output:

β = 0.25: underfitting

c opt(X

′ )

c opt(X

′′)

pβ(·|X ′) pβ(·|X ′′)

β = 2.8: near-optimal β = 19.5: overfitting

2 6 10 14 18

β = 0.25

β = 2.8

β = 19.5

β

solutions solutions solutions

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 18/24

Page 34: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Corollary and Application

Genericproblem

Case study:“sparse” minimum bisection

Main result formulation

Proof outline

Application

Corollary

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 19/24

Page 35: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Corollary: Log-Convolution• The log-convolution score can be rewritten:

E log∑

c

pβ(c|X ′)pβ(c|X ′′)

= E log Z (β,X ′ + X ′′)︸ ︷︷ ︸Thm

−2E log Z (β,X )︸ ︷︷ ︸Thm

• Thus possible to analytically compute score:

noise-to-signal

ratio

less noise

greater beta

more noise

less beta

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 20/24

Page 36: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Corollary: Log-Convolution• The log-convolution score can be rewritten:

E log∑

c

pβ(c|X ′)pβ(c|X ′′)

= E log Z (β,X ′ + X ′′)︸ ︷︷ ︸Thm

−2E log Z (β,X )︸ ︷︷ ︸Thm

• Thus possible to analytically compute score:

noise-to-signal

ratio

less noise

greater beta

more noise

less beta

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 20/24

Page 37: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Conclusion

• We computed asymptotically precisely the valueof E log Z with small dependencies

• It has applications to model validation(submitted to J. of Theor. CS) in machinelearning: method involves comparing twofluctuating Gibbs distributions

• Still not found a general way (and probablythere is no such way)

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 21/24

Page 38: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

Thanks for your attention!

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 22/24

Page 39: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

References I

E. T. Jaynes,Information Theory and Statistical Mechanics.Phys. Rev., 106, 1957, p. 620.

W. Szpankowski,Combinatorial optimization problems for whichalmost every algorithm is asymptotically optimal.Optimization, 33, 1995, pp. 359–368.

M. Talagrand.Spin Glasses: A Challenge for Mathematicians.Springer, NY, 2003.

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 23/24

Page 40: Free Energy Rates for Parameter Rich Optimization Problems ...aofa2017.cs.princeton.edu/talks/contributed/aofa... · Free Energy Rates for Parameter Rich Optimization Problems Joachim

References II

J. Vannimenus and M. Mezard,On the Statistical Mechanics of OptimizationProblems of the traveling Salesman Type.J. de Physique Lettres, 45, 1984, pp. 1145–1153.

J. M. Buhmann, A. Gronskiy and W. SzpankowskiFree Energy Rates for a Class of CombinatorialOptimization ProblemsAofA’14, Paris

J. M. Buhmann, A. Gronskiy, J. Dumazert andW. SzpankowskiPhase Transitions in Parameter Rich OptimizationProblemsSODA/ANALCO’17, Barcelona

AofA, Princeton, 20 Jun 2017 Alexey Gronskiy 24/24