
Convex Optimization

CMU 10-725

Quasi Newton Methods

Barnabás Póczos & Ryan Tibshirani


Quasi Newton Methods


Outline

- Modified Newton Method

- Rank one correction of the inverse

- Rank two correction of the inverse

- Davidon–Fletcher–Powell Method (DFP)

- Broyden–Fletcher–Goldfarb–Shanno Method (BFGS)


Books to Read

David G. Luenberger, Yinyu Ye: Linear and Nonlinear Programming

Nesterov: Introductory Lectures on Convex Optimization

Bazaraa, Sherali, Shetty: Nonlinear Programming

Dimitri P. Bertsekas: Nonlinear Programming


Motivation

Quasi Newton:
somewhere between steepest descent and Newton's method

Motivation:
Evaluation and use of the Hessian matrix is impractical or costly

Idea:
use an approximation to the inverse Hessian.


Modified Newton Method

Goal:

Gradient descent:

Newton method:

Modified Newton method: [Method of Deflected Gradients]

Special cases:
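In standard notation (a sketch following the Luenberger–Ye presentation, with g_k = \nabla f(x_k) and S_k a symmetric positive definite scaling matrix), these items can be written as:

Goal: \min_{x} f(x)

Gradient descent: x_{k+1} = x_k - \alpha_k g_k

Newton method: x_{k+1} = x_k - \alpha_k [\nabla^2 f(x_k)]^{-1} g_k

Modified Newton method: x_{k+1} = x_k - \alpha_k S_k g_k

Special cases: S_k = I recovers gradient descent, and S_k = [\nabla^2 f(x_k)]^{-1} recovers Newton's method.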


Modified Newton Method

Lemma [Descent direction]

Proof:

We know that if a vector has negative inner product with the gradient vector, then that direction is a descent direction
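In symbols (assuming, as is standard, that S_k is symmetric positive definite and g_k \neq 0):

d_k = -S_k g_k \quad\Longrightarrow\quad g_k^T d_k = -g_k^T S_k g_k < 0,

so d_k is a descent direction.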


Quadratic problem

Modified Newton Method update rule:

Lemma [αk in quadratic problems]


Quadratic problem

Lemma [αk in quadratic problems]

Proof [αk]
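For the quadratic objective f(x) = \tfrac{1}{2} x^T Q x - b^T x with exact line search, a sketch of the standard computation (not a verbatim reproduction of the slide):

\alpha_k = \arg\min_{\alpha \ge 0} f(x_k - \alpha S_k g_k)

0 = \frac{d}{d\alpha} f(x_k - \alpha S_k g_k) = -(g_k - \alpha Q S_k g_k)^T S_k g_k \quad\Longrightarrow\quad \alpha_k = \frac{g_k^T S_k g_k}{g_k^T S_k Q S_k g_k}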


Convergence rate (Quadratic case)

Theorem [Convergence rate of the modified Newton method]

Then for the modified Newton method it holds at every step k:
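In its standard form (as in Luenberger–Ye; a sketch rather than the slide's exact statement), with E(x) = \tfrac{1}{2}(x - x^*)^T Q (x - x^*) and with b_k and B_k the smallest and largest eigenvalues of S_k Q:

E(x_{k+1}) \le \left(\frac{B_k - b_k}{B_k + b_k}\right)^2 E(x_k)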

Corollary

If S_k^{-1} is close to Q, then b_k is close to B_k, and convergence is fast

Proof: No time for it…


Classical modified Newton’s method

The Hessian at the initial point x0 is used throughout the process.

The effectiveness depends on how fast the Hessian is changing.

Classical modified Newton:
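A sketch of this update in standard notation (with the Hessian fixed at the initial point, as described above):

x_{k+1} = x_k - \alpha_k [\nabla^2 f(x_0)]^{-1} \nabla f(x_k)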


Construction of the inverse of the Hessian


Construction of the inverse

Idea behind quasi-Newton methods: construct the approximation of the inverse Hessian using information gathered during the process

We show how the inverse Hessian can be built up from gradient information obtained at various points.

In the quadratic case

Notation:
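In standard quasi-Newton notation (a sketch of the definitions, following the Luenberger–Ye convention):

g_k = \nabla f(x_k), \qquad p_k = x_{k+1} - x_k, \qquad q_k = g_{k+1} - g_k

In the quadratic case f(x) = \tfrac{1}{2} x^T Q x - b^T x the gradient is affine in x, so q_k = Q p_k.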


Construction of the inverse

Quadratic case:

Goal:
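Concretely, the standard requirement (a sketch, not a verbatim slide equation) is to build H_{k+1} from the observed pairs (p_i, q_i) so that

H_{k+1} q_i = p_i, \qquad 0 \le i \le k

Since q_i = Q p_i in the quadratic case, if p_0, \ldots, p_{n-1} are linearly independent this forces H_n = Q^{-1}.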


Symmetric rank one correction (SR1)

We want an update on Hk such that:

Let us find the update in this form [Rank one correction]

Theorem [Rank one update of Hk]
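The resulting formula (the standard symmetric rank one update, stated here as a sketch): writing the correction as H_{k+1} = H_k + a_k z_k z_k^T and imposing H_{k+1} q_k = p_k gives

H_{k+1} = H_k + \frac{(p_k - H_k q_k)(p_k - H_k q_k)^T}{q_k^T (p_k - H_k q_k)}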


Symmetric rank one correction (SR1)

Proof:

We already know that

Therefore,

Q.E.D.


Symmetric rank one correction (SR1)

We still have to prove that this update will be good for us:

Theorem [Hk update works]

Corollary
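The standard content of this theorem and corollary (a sketch, following the Luenberger–Ye presentation): for a quadratic objective, the SR1 update satisfies H_{k+1} q_i = p_i for all i \le k, not only for i = k; consequently, after n steps with linearly independent p_0, \ldots, p_{n-1}, we get H_n = Q^{-1}.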


Symmetric rank one correction (SR1)

Issues:

Algorithm: [Modified Newton method with rank 1 correction]
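A minimal Python sketch of such an algorithm (illustrative only; the function names, the Armijo backtracking line search, and the safeguards are assumptions, not taken from the slides). The usual issues with SR1, reflected in the safeguards below, are that the denominator q_k^T(p_k - H_k q_k) can be near zero and that H_k need not stay positive definite:

import numpy as np

def sr1_modified_newton(f, grad, x0, max_iter=200, tol=1e-8):
    """Modified Newton iteration x_{k+1} = x_k - alpha_k H_k g_k with the
    SR1 rank-one update of the inverse-Hessian approximation H_k."""
    x = np.asarray(x0, dtype=float)
    H = np.eye(x.size)                 # H_0: initial inverse-Hessian approximation
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = -H @ g                     # search direction d_k = -H_k g_k
        if g @ d >= 0:                 # H may lose positive definiteness under SR1;
            d = -g                     # fall back to steepest descent in that case
        alpha = 1.0                    # simple Armijo backtracking line search
        while f(x + alpha * d) > f(x) + 1e-4 * alpha * (g @ d) and alpha > 1e-12:
            alpha *= 0.5
        x_new = x + alpha * d
        g_new = grad(x_new)
        p, q = x_new - x, g_new - g    # p_k = x_{k+1} - x_k,  q_k = g_{k+1} - g_k
        r = p - H @ q
        denom = q @ r
        # Skip the rank-one update when the denominator is too small (standard safeguard).
        if abs(denom) > 1e-8 * np.linalg.norm(q) * np.linalg.norm(r):
            H = H + np.outer(r, r) / denom
        x, g = x_new, g_new
    return x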


Davidon–Fletcher–Powell Method[Rank two correction]


Davidon–Fletcher–Powell Method

- For a quadratic objective, it simultaneously generates the directions of the conjugate gradient method while constructing the inverse Hessian.

- At each step the inverse Hessian is updated by the sum of two symmetric rank one matrices. [rank two correction procedure]

- The method is also often referred to as the variable metric method.


Davidon–Fletcher–Powell Method
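The DFP iteration and update in their standard form (a sketch; notation as above, with H_0 an arbitrary symmetric positive definite matrix such as the identity):

d_k = -H_k g_k, \qquad x_{k+1} = x_k + \alpha_k d_k \quad (\alpha_k \text{ from a line search on } f(x_k + \alpha d_k))

p_k = x_{k+1} - x_k, \qquad q_k = g_{k+1} - g_k

H_{k+1} = H_k + \frac{p_k p_k^T}{p_k^T q_k} - \frac{H_k q_k q_k^T H_k}{q_k^T H_k q_k}

The last line is the sum of two symmetric rank one corrections, hence "rank two correction".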


Davidon–Fletcher–Powell Method

Theorem [Hk is positive definite]

Theorem [DFP is a conjugate direction method]

Corollary [finite step convergence for quadratic functions]
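A small numerical illustration of this corollary (a hypothetical example, not from the slides): on a quadratic f(x) = 1/2 x^T Q x - b^T x with exact line search, the DFP directions are Q-conjugate, so the method should reach the minimizer and recover Q^{-1} after n steps.

import numpy as np

rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)                  # random symmetric positive definite Q
b = rng.standard_normal(n)

x, H = np.zeros(n), np.eye(n)                # start at x_0 = 0 with H_0 = I
g = Q @ x - b                                # gradient of 1/2 x^T Q x - b^T x
for k in range(n):
    d = -H @ g
    alpha = -(g @ d) / (d @ Q @ d)           # exact line search on the quadratic
    x_new = x + alpha * d
    g_new = Q @ x_new - b
    p, q = x_new - x, g_new - g
    # DFP rank-two update of the inverse-Hessian approximation
    H += np.outer(p, p) / (p @ q) - np.outer(H @ q, H @ q) / (q @ H @ q)
    x, g = x_new, g_new

print(np.allclose(x, np.linalg.solve(Q, b)))  # expected: True (minimizer in n steps)
print(np.allclose(H, np.linalg.inv(Q)))       # expected: True (H_n = Q^{-1})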


Broyden–Fletcher–Goldfarb–Shanno

In DFP, at each step the inverse Hessian is updated by the sum of two symmetric rank one matrices.

In BFGS we estimate the Hessian Q instead of its inverse

In the quadratic case we already proved:

To estimate H, we used the update:

Therefore, if we switch q and p, then Q can be estimated as well, by a matrix Qk
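In symbols (a sketch of the derivation being described): the quadratic-case identity q_k = Q p_k is equivalent to p_k = Q^{-1} q_k, so p_k and q_k play symmetric roles. The H update was constructed so that H_{k+1} q_i = p_i; exchanging the roles of q and p therefore gives an update for an approximation Q_k of Q satisfying Q_{k+1} p_i = q_i.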


BFGS

Similarly, we already know that the DFP update rule for H is

Switching q and p, this can also be used to estimate Q:

In the minimization algorithm, however, we will need an estimator of Q^{-1}

To get an update for H_{k+1}, let us use the Sherman-Morrison formula twice
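Carried out in standard notation (a sketch matching the usual textbook derivation rather than the slide verbatim): swapping q and p in the DFP formula gives the update for the Hessian estimate Q_k, and applying the Sherman-Morrison formula twice to invert it gives the BFGS update for H_k = Q_k^{-1}:

Q_{k+1} = Q_k + \frac{q_k q_k^T}{q_k^T p_k} - \frac{Q_k p_k p_k^T Q_k}{p_k^T Q_k p_k}

H_{k+1} = H_k + \left(1 + \frac{q_k^T H_k q_k}{q_k^T p_k}\right)\frac{p_k p_k^T}{p_k^T q_k} - \frac{p_k q_k^T H_k + H_k q_k p_k^T}{q_k^T p_k}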


Sherman-Morrison matrix inversion formula
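For reference, the standard statement: if A is invertible and 1 + v^T A^{-1} u \neq 0, then

(A + u v^T)^{-1} = A^{-1} - \frac{A^{-1} u v^T A^{-1}}{1 + v^T A^{-1} u}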


BFGS Algorithm

BFGS is almost the same as DFP, only the H update is different.

In practice BFGS seems to work better than DFP.
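A minimal Python sketch of BFGS (illustrative; the function names, the Armijo backtracking line search, and the Rosenbrock test problem are assumptions, not taken from the slides). The loop is the same as for DFP; only the H update changes:

import numpy as np

def bfgs(f, grad, x0, max_iter=500, tol=1e-8):
    """Same loop as DFP; only the update of the inverse-Hessian estimate H differs."""
    x = np.asarray(x0, dtype=float)
    H = np.eye(x.size)
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = -H @ g
        alpha = 1.0                                    # Armijo backtracking line search
        while f(x + alpha * d) > f(x) + 1e-4 * alpha * (g @ d) and alpha > 1e-12:
            alpha *= 0.5
        x_new = x + alpha * d
        g_new = grad(x_new)
        p, q = x_new - x, g_new - g
        pq = p @ q
        if pq > 1e-12:                                 # curvature condition keeps H positive definite
            Hq = H @ q
            H = (H
                 + (1.0 + q @ Hq / pq) * np.outer(p, p) / pq
                 - (np.outer(p, Hq) + np.outer(Hq, p)) / pq)
        x, g = x_new, g_new
    return x

# Example: the Rosenbrock function, minimized at (1, 1)
rosen = lambda z: (1 - z[0])**2 + 100 * (z[1] - z[0]**2)**2
rosen_grad = lambda z: np.array([-2 * (1 - z[0]) - 400 * z[0] * (z[1] - z[0]**2),
                                 200 * (z[1] - z[0]**2)])
print(bfgs(rosen, rosen_grad, np.array([-1.2, 1.0])))  # approaches [1., 1.]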


Summary

- Modified Newton Method

- Rank one correction of the inverse

- Rank two correction of the inverse

- Davidon–Fletcher–Powell Method (DFP)

- Broyden–Fletcher–Goldfarb–Shanno Method (BFGS)