Convex Optimization CMU-10725

Quasi Newton Methods

Barnabás Póczos & Ryan Tibshirani

(Transcript of lecture slides: ryantibs/convexopt-F13/lectures/11-QuasiNewton)



Outline

- Modified Newton Method

- Rank one correction of the inverse

- Rank two correction of the inverse

- Davidon–Fletcher–Powell Method (DFP)

- Broyden–Fletcher–Goldfarb–Shanno Method (BFGS)


Books to Read

David G. Luenberger, Yinyu Ye: Linear and Nonlinear Programming

Nesterov: Introductory lectures on convex optimization

Bazaraa, Sherali, Shetty: Nonlinear Programming

Dimitri P. Bertsekas: Nonlinear Programming


Motivation

Quasi Newton methods lie somewhere between steepest descent and Newton's method.

Motivation: evaluation and use of the Hessian matrix is often impractical or costly.

Idea: use an approximation to the inverse Hessian.


Modified Newton Method

Goal:

Gradient descent:

Newton method:

Modified Newton method: [Method of Deflected Gradients]

Special cases:
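The update rules on this slide did not survive transcription; in the standard presentation (cf. Luenberger & Ye, Ch. 10) they read:

```latex
\text{Goal:}\quad \min_{x \in \mathbb{R}^n} f(x)

\text{Gradient descent:}\quad x_{k+1} = x_k - \alpha_k \nabla f(x_k)

\text{Newton's method:}\quad x_{k+1} = x_k - \alpha_k \left[\nabla^2 f(x_k)\right]^{-1} \nabla f(x_k)

\text{Modified Newton:}\quad x_{k+1} = x_k - \alpha_k S_k \nabla f(x_k), \qquad S_k \succ 0 \text{ symmetric}
```

Special cases: S_k = I recovers gradient descent, and S_k = [∇²f(x_k)]⁻¹ recovers Newton's method.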


Modified Newton Method

Lemma [Descent direction]

Proof:

We know that if a direction has negative inner product with the gradient, then it is a descent direction.
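The lemma's statement and algebra were lost in transcription; the standard one-line argument, assuming S_k is symmetric positive definite and g_k = ∇f(x_k) ≠ 0, is:

```latex
d_k = -S_k g_k \;\Rightarrow\; g_k^T d_k = -g_k^T S_k g_k < 0,
```

so d_k is a descent direction.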


Quadratic problem

Modified Newton Method update rule:

Lemma [αk in quadratic problems]


Quadratic problem

Lemma [αk in quadratic problems]

Proof [αk]
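The formula and proof were lost in transcription; for the quadratic f(x) = ½xᵀQx − bᵀx with direction d_k = −S_k g_k, the standard exact line-search derivation is:

```latex
\varphi(\alpha) = f(x_k + \alpha d_k), \qquad
\varphi'(\alpha) = d_k^T g_k + \alpha\, d_k^T Q d_k = 0
\;\Rightarrow\;
\alpha_k = -\frac{g_k^T d_k}{d_k^T Q d_k} = \frac{g_k^T S_k g_k}{g_k^T S_k Q S_k g_k}.
```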


Convergence rate (Quadratic case)

Theorem [Convergence rate of the modified Newton method]

For the modified Newton method it holds at every step k:

Corollary

If S_k^{-1} is close to Q, then b_k is close to B_k, and convergence is fast.

Proof: No time for it…
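The theorem's inequality itself was lost in transcription; the standard statement (Luenberger & Ye), with E(x) = ½(x − x*)ᵀQ(x − x*) and b_k, B_k the smallest and largest eigenvalues of S_k Q, is:

```latex
E(x_{k+1}) \le \left(\frac{B_k - b_k}{B_k + b_k}\right)^{2} E(x_k).
```

When S_k ≈ Q⁻¹, the eigenvalues of S_k Q cluster near 1, the ratio approaches 0, and the method converges rapidly.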


Classical modified Newton’s method

The Hessian at the initial point x0 is used throughout the process.

The effectiveness depends on how fast the Hessian is changing.

Classical modified Newton:
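The update formula was lost in transcription; the classical modified Newton method freezes the Hessian at the initial point:

```latex
x_{k+1} = x_k - \alpha_k \left[\nabla^2 f(x_0)\right]^{-1} \nabla f(x_k).
```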


Construction of the inverse of the Hessian


Construction of the inverse

Idea behind quasi-Newton methods: construct the approximation of the inverse Hessian using information gathered during the process

We show how the inverse Hessian can be built up from gradient information obtained at various points.

In the quadratic case

Notation:


Construction of the inverse

Quadratic case:

Goal:
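The notation, quadratic-case identity, and goal on these slides were lost in transcription; the standard setup is:

```latex
p_k = x_{k+1} - x_k, \qquad q_k = g_{k+1} - g_k.
```

In the quadratic case g(x) = Qx − b, so q_k = Q p_k. The goal is to build H_{k+1} satisfying H_{k+1} q_i = p_i for all i ≤ k; after n linearly independent steps this forces H_n = Q⁻¹.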


Symmetric rank one correction (SR1)

We want an update on Hk such that:

Let us find the update in this form [Rank one correction]

Theorem [Rank one update of Hk]
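The update formula was lost in transcription; seeking a correction of the form H_{k+1} = H_k + a_k z_k z_kᵀ and imposing H_{k+1} q_k = p_k yields the standard SR1 update:

```latex
H_{k+1} = H_k + \frac{(p_k - H_k q_k)(p_k - H_k q_k)^T}{q_k^T (p_k - H_k q_k)}.
```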


Symmetric rank one correction (SR1)

Proof:

We already know that

Therefore,

Q.E.D.


Symmetric rank one correction (SR1)

We still have to prove that this update will be good for us:

Theorem [Hk update works]

Corollary


Symmetric rank one correction (SR1)

Algorithm: [Modified Newton method with rank 1 correction]

Issues:
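As a concrete illustration of the algorithm (my own sketch, not from the slides; all names are mine), a minimal modified Newton method with the SR1 correction on a quadratic, using the exact line search derived earlier:

```python
import numpy as np

def sr1_quadratic(Q, b, x0, n_steps=None):
    """Modified Newton with the symmetric rank-one (SR1) correction,
    specialized to f(x) = 0.5 x^T Q x - b^T x (so g(x) = Qx - b),
    with exact line search. Illustration only."""
    n = len(b)
    n_steps = n + 1 if n_steps is None else n_steps
    x = np.asarray(x0, dtype=float)
    H = np.eye(n)                        # initial estimate of Q^{-1}
    g = Q @ x - b
    for _ in range(n_steps):
        if np.linalg.norm(g) < 1e-12:    # already at the minimizer
            break
        d = -H @ g                       # search direction
        alpha = -(g @ d) / (d @ Q @ d)   # exact line search on the quadratic
        x_new = x + alpha * d
        g_new = Q @ x_new - b
        p, q = x_new - x, g_new - g
        u = p - H @ q                    # SR1 correction vector
        denom = u @ q
        if abs(denom) > 1e-12:           # skip the update when the denominator vanishes
            H = H + np.outer(u, u) / denom
        x, g = x_new, g_new
    return x, H

# On an n-dimensional quadratic, H approaches Q^{-1} and x the minimizer Q^{-1} b.
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x, H = sr1_quadratic(Q, b, np.zeros(2))
```

One known SR1 issue is visible in the guard above: the denominator qᵀ(p − Hq) can vanish or change sign, so implementations skip the update when it is near zero (and H need not stay positive definite).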


Davidon–Fletcher–Powell Method [Rank two correction]


Davidon–Fletcher–Powell Method

- For a quadratic objective, it simultaneously generates the directions of the conjugate gradient method while constructing the inverse Hessian.

- At each step the inverse Hessian is updated by the sum of two symmetric rank one matrices. [rank two correction procedure]

- The method is also often referred to as the variable metric method.
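The DFP update equations did not survive transcription; the standard rank-two correction (sum of two symmetric rank-one terms) is:

```latex
H_{k+1} = H_k + \frac{p_k p_k^T}{p_k^T q_k} - \frac{H_k q_k q_k^T H_k}{q_k^T H_k q_k}.
```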



Davidon–Fletcher–Powell Method

Theorem [Hk is positive definite]

Theorem [DFP is a conjugate direction method]

Corollary [finite step convergence for quadratic functions]
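A minimal numerical check of these claims (my own sketch, not from the slides): DFP with exact line search on a quadratic should produce Q-conjugate directions, recover Q⁻¹, and land on the minimizer in n steps.

```python
import numpy as np

def dfp_quadratic(Q, b, x0):
    """DFP (rank-two correction) on f(x) = 0.5 x^T Q x - b^T x with exact
    line search. Returns the final iterate, the inverse-Hessian estimate,
    and the search directions so conjugacy can be checked."""
    n = len(b)
    x = np.asarray(x0, dtype=float)
    H = np.eye(n)
    g = Q @ x - b
    dirs = []
    for _ in range(n):
        d = -H @ g
        dirs.append(d)
        alpha = -(g @ d) / (d @ Q @ d)   # exact line search
        x_new = x + alpha * d
        g_new = Q @ x_new - b
        p, q = x_new - x, g_new - g
        # DFP update: add p p^T / (p^T q), subtract H q q^T H / (q^T H q)
        H = H + np.outer(p, p) / (p @ q) - np.outer(H @ q, H @ q) / (q @ H @ q)
        x, g = x_new, g_new
    return x, H, dirs

Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x, H, dirs = dfp_quadratic(Q, b, np.zeros(2))
```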


Broyden–Fletcher–Goldfarb–Shanno

In DFP, at each step the inverse Hessian is updated by the sum of two symmetric rank one matrices.

In BFGS we estimate the Hessian Q itself, instead of its inverse.

In the quadratic case we already proved:

To estimate H, we used the update:

Therefore, if we switch q and p, then Q can be estimated as well, with estimate Qk.
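The formulas on this slide were lost in transcription; swapping p_k and q_k in the SR1 inverse update gives the corresponding rank-one estimate of Q:

```latex
Q_{k+1} = Q_k + \frac{(q_k - Q_k p_k)(q_k - Q_k p_k)^T}{p_k^T (q_k - Q_k p_k)}.
```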


BFGS

Similarly, we already know that the DFP update rule for H is

Switching q and p, this can also be used to estimate Q:
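The swapped update was also lost in transcription; exchanging p_k and q_k in the DFP formula gives the BFGS update for the Hessian estimate Q_k:

```latex
Q_{k+1} = Q_k + \frac{q_k q_k^T}{q_k^T p_k} - \frac{Q_k p_k p_k^T Q_k}{p_k^T Q_k p_k}.
```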

In the minimization algorithm, however, we will need an estimator of Q-1

To get an update for Hk+1, let us use the Sherman-Morrison formula twice


Sherman-Morrison matrix inversion formula
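The formula itself was lost in transcription; for invertible A and 1 + vᵀA⁻¹u ≠ 0:

```latex
\left(A + u v^T\right)^{-1} = A^{-1} - \frac{A^{-1} u\, v^T A^{-1}}{1 + v^T A^{-1} u}.
```

Applying this twice to the rank-two update of Q_k yields the standard BFGS inverse-Hessian update:

```latex
H_{k+1} = H_k + \left(1 + \frac{q_k^T H_k q_k}{q_k^T p_k}\right)\frac{p_k p_k^T}{p_k^T q_k}
- \frac{p_k q_k^T H_k + H_k q_k p_k^T}{q_k^T p_k}.
```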


BFGS Algorithm

BFGS is almost the same as DFP, only the H update is different.

In practice BFGS seems to work better than DFP.
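A minimal BFGS sketch (my own illustration; the names and the simple Armijo backtracking are assumptions — production implementations use Wolfe line searches and further safeguards):

```python
import numpy as np

def bfgs_minimize(f, grad, x0, n_steps=1000, tol=1e-8):
    """Minimal BFGS: inverse-Hessian update obtained by applying the
    Sherman-Morrison formula twice to the rank-two Hessian update."""
    x = np.asarray(x0, dtype=float)
    H = np.eye(len(x))
    g = grad(x)
    for _ in range(n_steps):
        if np.linalg.norm(g) < tol:
            break
        d = -H @ g
        # Backtracking (Armijo) line search; capped to avoid an endless loop
        alpha, fx, gd = 1.0, f(x), g @ d
        for _ in range(60):
            if f(x + alpha * d) <= fx + 1e-4 * alpha * gd:
                break
            alpha *= 0.5
        x_new = x + alpha * d
        g_new = grad(x_new)
        p, q = x_new - x, g_new - g
        pq = p @ q
        if pq > 1e-12:  # curvature condition: keeps H positive definite
            Hq = H @ q
            H = (H + (1.0 + (q @ Hq) / pq) * np.outer(p, p) / pq
                   - (np.outer(Hq, p) + np.outer(p, Hq)) / pq)
        x, g = x_new, g_new
    return x

# Example: the Rosenbrock function, minimized at (1, 1).
rosen = lambda z: (1 - z[0])**2 + 100 * (z[1] - z[0]**2)**2
rosen_grad = lambda z: np.array([
    -2 * (1 - z[0]) - 400 * z[0] * (z[1] - z[0]**2),
    200 * (z[1] - z[0]**2),
])
x_star = bfgs_minimize(rosen, rosen_grad, np.zeros(2))
```

Note the only difference from the DFP loop is the H update line, matching the slide's remark that BFGS is otherwise the same algorithm.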


Summary

- Modified Newton Method

- Rank one correction of the inverse

- Rank two correction of the inverse

- Davidon–Fletcher–Powell Method (DFP)

- Broyden–Fletcher–Goldfarb–Shanno Method (BFGS)