Newton Method - Carnegie Mellon School of Computer...


Page 1: Newton Method - Carnegie Mellon School of Computer Sciencepradeepr/convexopt/Lecture_Slides/Newton... · 2017-10-02 · 21 Pre-Conditioning for Gradient descent Can we convert it

Newton Method

Materials courtesy: B. Poczos, R. Tibshirani, C. Carmanis & S. Sanghavi


Outline  


Newton Method for Finding a Root

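Linear Approximation (1st order Taylor approx): $f(x + \Delta x) \approx f(x) + f'(x)\,\Delta x$

Goal: find $\Delta x$ such that $f(x) + f'(x)\,\Delta x = 0$

Therefore, $\Delta x = -\frac{f(x)}{f'(x)}$, which gives the update $x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}$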
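
A minimal Python sketch of this root-finding iteration, assuming the standard update $x_{k+1} = x_k - f(x_k)/f'(x_k)$. The helper name `newton_root`, the tolerance, and the test function $f(x) = x^2 - 2$ are illustrative choices, not taken from the slides.

```python
def newton_root(f, fprime, x0, tol=1e-12, max_iter=50):
    """Approximate a root of f, starting from x0 (hypothetical helper)."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            break
        slope = fprime(x)
        if slope == 0.0:
            raise ZeroDivisionError("f'(x) = 0: the tangent line has no root")
        # Root of the linearization f(x) + f'(x) * dx = 0
        x = x - fx / slope
    return x


# Illustrative choice (not from the slides): f(x) = x^2 - 2, root sqrt(2).
root = newton_root(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
print(root)
```

Each pass replaces `f` by its tangent line at the current point and jumps to where that line crosses zero.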


Illustration of Newton’s method

Goal: finding a root

In the next step we will linearize here in x


Example: Finding a Root

http://en.wikipedia.org/wiki/Newton%27s_method


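Newton Method for Finding a Root

This can be generalized to multivariate functions $F$, linearizing with the Jacobian $J_F(x)$: $0 = F(x + \Delta x) \approx F(x) + J_F(x)\,\Delta x$

Therefore, $\Delta x = -[J_F(x)]^{-1} F(x)$

[Use the pseudo-inverse if there is no inverse]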
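
The multivariate iteration can be sketched for a 2x2 system in plain Python, assuming the update $x \leftarrow x - J^{-1} F(x)$ with $J$ the Jacobian of $F$. The system below (a circle intersected with a line), the helper names, and the starting point are assumptions made for illustration. The sketch raises an error on a singular Jacobian rather than falling back to a pseudo-inverse.

```python
def solve_2x2(J, rhs):
    """Solve J d = rhs for a 2x2 matrix J (nested lists) via Cramer's rule."""
    (a, b), (c, d) = J
    det = a * d - b * c
    if abs(det) < 1e-14:
        # The slides suggest a pseudo-inverse here; this sketch just stops.
        raise ValueError("Jacobian is numerically singular")
    return [(d * rhs[0] - b * rhs[1]) / det, (a * rhs[1] - c * rhs[0]) / det]


def newton_system(F, jacobian, x0, tol=1e-12, max_iter=50):
    """Newton iteration for F(x) = 0 with x in R^2 (hypothetical helper)."""
    x = list(x0)
    for _ in range(max_iter):
        Fx = F(x)
        if max(abs(v) for v in Fx) < tol:
            break
        # Solve the linearized system J(x) * step = -F(x)
        step = solve_2x2(jacobian(x), [-Fx[0], -Fx[1]])
        x = [x[0] + step[0], x[1] + step[1]]
    return x


def F(p):
    # Illustrative system: x^2 + y^2 = 4 and x = y
    x, y = p
    return [x * x + y * y - 4.0, x - y]


def jacobian(p):
    x, y = p
    return [[2.0 * x, 2.0 * y], [1.0, -1.0]]


solution = newton_system(F, jacobian, [1.0, 2.0])
print(solution)
```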


Newton method for minimization

Setting: unconstrained minimization, $\min_x f(x)$, where $f$ is twice differentiable.
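
For minimization, Newton's method can be read as root finding applied to the derivative. Below is a one-dimensional Python sketch, assuming the update $x_{k+1} = x_k - f'(x_k)/f''(x_k)$ (in several variables the division becomes a solve with the Hessian). The objective $f(x) = e^x - 2x$ and the helper name are illustrative assumptions.

```python
import math


def newton_minimize_1d(grad, hess, x0, tol=1e-12, max_iter=50):
    """Minimize a smooth 1-D function from its first two derivatives."""
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            break
        # Newton step: root-finding applied to the derivative of f
        x = x - g / hess(x)
    return x


# Illustrative convex objective (an assumption): f(x) = exp(x) - 2x,
# f'(x) = exp(x) - 2, f''(x) = exp(x) > 0; the minimizer is ln 2.
x_min = newton_minimize_1d(lambda x: math.exp(x) - 2.0, math.exp, x0=0.0)
print(x_min, math.log(2.0))
```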


Motivation with Quadratic Approximation

Setting as before: unconstrained problem, $f$ twice differentiable.
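
One way to write out the quadratic-approximation motivation, in assumed notation ($x$ is the current iterate, $y$ the next one) and assuming the Hessian $\nabla^2 f(x)$ is positive definite:

```latex
% Second-order Taylor approximation of f around x
f(y) \approx f(x) + \nabla f(x)^T (y - x)
      + \tfrac{1}{2} (y - x)^T \nabla^2 f(x) (y - x)

% Setting the gradient of the right-hand side with respect to y to zero
\nabla f(x) + \nabla^2 f(x) (y - x) = 0
\quad\Longrightarrow\quad
y = x - [\nabla^2 f(x)]^{-1} \nabla f(x)
```

The minimizer of the quadratic model is exactly the Newton update.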


Comparison with Gradient Descent

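Gradient descent uses a different quadratic approximation, replacing the Hessian with $\frac{1}{t} I$:

$f(y) \approx f(x) + \nabla f(x)^T (y - x) + \frac{1}{2t} \|y - x\|_2^2$

Newton's method is obtained by minimizing the second-order Taylor approximation:

$f(y) \approx f(x) + \nabla f(x)^T (y - x) + \frac{1}{2} (y - x)^T \nabla^2 f(x) (y - x)$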
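
A small Python sketch contrasting the two updates on an ill-conditioned quadratic. The objective $f(x) = \frac{1}{2}(x_1^2 + 100 x_2^2)$, the step size $t = 1/100$, and the helper names are assumptions chosen for illustration; on a quadratic, the Newton model is exact, so one step suffices.

```python
def grad(x):
    # Gradient of the illustrative f(x) = 0.5 * (x1^2 + 100 * x2^2)
    return [x[0], 100.0 * x[1]]


def gradient_step(x, t=0.01):
    # Minimizes the model with Hessian replaced by (1/t) * I
    g = grad(x)
    return [x[0] - t * g[0], x[1] - t * g[1]]


def newton_step(x):
    # Minimizes the model with the true Hessian diag(1, 100)
    g = grad(x)
    return [x[0] - g[0] / 1.0, x[1] - g[1] / 100.0]


def iterations_to_converge(step, x0, tol=1e-8, max_iter=100000):
    """Count steps until every gradient entry is below tol."""
    x, count = list(x0), 0
    while max(abs(v) for v in grad(x)) > tol and count < max_iter:
        x = step(x)
        count += 1
    return count


gd_iters = iterations_to_converge(gradient_step, [1.0, 1.0])
newton_iters = iterations_to_converge(newton_step, [1.0, 1.0])
print(gd_iters, newton_iters)
```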


Pre-Conditioning for Gradient descent

Recall convergence rate for gradient descent:

Can we convert it into a well-conditioned problem by changing coordinates, $x = Ay$, $g(y) = f(Ay)$?

We can, if $A = [\nabla^2 f(x)]^{-1/2}$


Pre-Conditioning for Gradient descent

Can we convert it into a well-conditioned problem by changing coordinates, $x = Ay$, $g(y) = f(Ay)$?

We can, if $A = [\nabla^2 f(x)]^{-1/2}$

Running gradient descent on $g(y)$ gives the best descent direction and convergence rate.

This is equivalent to a Newton step on $f(x)$.
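
A hedged numerical check of this equivalence, assuming the change of coordinates $x = Ay$, $g(y) = f(Ay)$ with $A = [\nabla^2 f(x)]^{-1/2}$ and a unit step size on $g$. The quadratic $f(x) = 2x_1^2 + 12.5x_2^2$ with diagonal Hessian is an illustrative assumption that keeps the matrix square root trivial.

```python
import math

H_DIAG = [4.0, 25.0]  # Hessian of the illustrative f(x) = 2*x1^2 + 12.5*x2^2


def grad_f(x):
    return [H_DIAG[0] * x[0], H_DIAG[1] * x[1]]


# A = H^{-1/2}; diagonal here, so only the diagonal entries are stored.
A_DIAG = [1.0 / math.sqrt(h) for h in H_DIAG]


def grad_g(y):
    """Gradient of g(y) = f(A y), by the chain rule: A^T grad_f(A y)."""
    x = [A_DIAG[0] * y[0], A_DIAG[1] * y[1]]
    gx = grad_f(x)
    return [A_DIAG[0] * gx[0], A_DIAG[1] * gx[1]]


x = [3.0, -2.0]
y = [x[0] / A_DIAG[0], x[1] / A_DIAG[1]]  # y = A^{-1} x

# One gradient descent step on g with unit step size, mapped back to x.
gy = grad_g(y)
y_next = [y[0] - gy[0], y[1] - gy[1]]
x_from_gd_on_g = [A_DIAG[0] * y_next[0], A_DIAG[1] * y_next[1]]

# One Newton step on f: x - H^{-1} grad_f(x).
gx = grad_f(x)
x_from_newton = [x[0] - gx[0] / H_DIAG[0], x[1] - gx[1] / H_DIAG[1]]

print(x_from_gd_on_g, x_from_newton)
```

Mapping the step back gives $x - A A^T \nabla f(x) = x - [\nabla^2 f(x)]^{-1} \nabla f(x)$, which is why the two results coincide.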


Affine Invariance

[Proof: HW3]


Affine Invariant stopping criterion

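Stopping criterion for gradient descent: $\|\nabla f(x)\|_2 \le \epsilon$ (not affine-invariant)

Stopping criterion for Newton method: $\lambda^2(x)/2 \le \epsilon$,

where $\lambda(x) = \left(\nabla f(x)^T [\nabla^2 f(x)]^{-1} \nabla f(x)\right)^{1/2}$ is the Newton decrement.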
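
A short Python sketch of why the Newton decrement suits an affine-invariant stopping rule, assuming the standard definition $\lambda(x) = (\nabla f(x)^T [\nabla^2 f(x)]^{-1} \nabla f(x))^{1/2}$. The quadratic, the diagonal scaling $A$, and the helper names are illustrative assumptions; gradients and Hessians are entered by hand for the chosen point.

```python
import math


def newton_decrement(grad, hess_diag):
    """lambda = sqrt(grad^T H^{-1} grad) for a diagonal Hessian."""
    return math.sqrt(sum(g * g / h for g, h in zip(grad, hess_diag)))


def norm(v):
    return math.sqrt(sum(c * c for c in v))


# f(x) = 0.5 * (x1^2 + 100 * x2^2) evaluated at x = (1, 1)
grad_f, hess_f = [1.0, 100.0], [1.0, 100.0]

# g(y) = f(A y) with A = diag(1, 0.1), so g(y) = 0.5 * (y1^2 + y2^2);
# the same point in the new coordinates is y = A^{-1} x = (1, 10).
grad_g, hess_g = [1.0, 10.0], [1.0, 1.0]

# The gradient norm changes with the coordinates ...
print(norm(grad_f), norm(grad_g))
# ... while the Newton decrement does not.
print(newton_decrement(grad_f, hess_f), newton_decrement(grad_g, hess_g))
```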


Damped Newton’s Method


Backtracking line search
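
A sketch of damped Newton with backtracking line search, assuming the usual sufficient-decrease test $f(x + t\,\Delta x) \le f(x) + \alpha t\, \nabla f(x)^T \Delta x$ with shrink factor $\beta$. The objective $f(x) = \sqrt{1 + x^2}$, the parameter values, and the function names are assumptions chosen for illustration. For this $f$, the full Newton step maps $x$ to $-x^3$, so undamped Newton diverges from any $|x| > 1$.

```python
import math


def f(x):
    return math.sqrt(1.0 + x * x)


def grad(x):
    return x / math.sqrt(1.0 + x * x)


def hess(x):
    return (1.0 + x * x) ** -1.5


def damped_newton(x0, alpha=0.25, beta=0.5, tol=1e-12, max_iter=100):
    x = x0
    for _ in range(max_iter):
        g, h = grad(x), hess(x)
        if g * g / h / 2.0 <= tol:  # half the squared Newton decrement
            break
        dx = -g / h  # Newton direction
        t = 1.0
        # Backtracking: shrink t until the sufficient-decrease test holds.
        while f(x + t * dx) > f(x) + alpha * t * g * dx and t > 1e-12:
            t *= beta
        x = x + t * dx
    return x


x_star = damped_newton(2.0)
print(x_star)
```

Starting from $x_0 = 2$, the first step is cut back to $t = 1/4$; later steps accept $t = 1$ and the iteration behaves like pure Newton.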


Convergence Rate


Local convergence for finding root

Quadratic convergence
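
Quadratic convergence can be observed numerically: near the root, the error is roughly squared at every step, so $e_{k+1} / e_k^2$ stays bounded. The sketch below assumes the illustrative problem $f(x) = x^2 - 2$ with start $x_0 = 1$; neither is taken from the slides.

```python
import math


def newton_sqrt2_errors(x0=1.0, steps=4):
    """Errors |x_k - sqrt(2)| for Newton on f(x) = x^2 - 2."""
    target = math.sqrt(2.0)
    x, errors = x0, [abs(x0 - target)]
    for _ in range(steps):
        x = x - (x * x - 2.0) / (2.0 * x)
        errors.append(abs(x - target))
    return errors


errors = newton_sqrt2_errors()
# Quadratic convergence: e_{k+1} / e_k^2 stays bounded.
ratios = [errors[k + 1] / errors[k] ** 2 for k in range(len(errors) - 1)]
for e in errors:
    print(f"{e:.3e}")
print(ratios)
```

The number of steps is kept small on purpose: after a few iterations the error reaches floating-point precision and the ratio is no longer meaningful.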


Convergence analysis


The analysis can be improved, e.g. for self-concordant functions.


Comparison to first-order methods



Example: Logistic regression
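
A minimal sketch of Newton's method for logistic regression with an intercept and one feature, assuming the negative log-likelihood objective with gradient $\sum_i (p_i - y_i)\, z_i$ and Hessian $\sum_i p_i (1 - p_i)\, z_i z_i^T$, where $z_i = (1, x_i)$ and $p_i$ is the predicted probability. The tiny data set and all function names are hypothetical. The data are deliberately not linearly separable, so a finite minimizer exists.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gradient_and_hessian(w, xs, ys):
    """Gradient and 2x2 Hessian of the negative log-likelihood."""
    g = [0.0, 0.0]
    H = [[0.0, 0.0], [0.0, 0.0]]
    for x, y in zip(xs, ys):
        p = sigmoid(w[0] + w[1] * x)
        feats = (1.0, x)  # intercept and one feature
        for i in range(2):
            g[i] += (p - y) * feats[i]
            for j in range(2):
                H[i][j] += p * (1.0 - p) * feats[i] * feats[j]
    return g, H


def fit_logistic_newton(xs, ys, tol=1e-10, max_iter=25):
    w = [0.0, 0.0]
    for _ in range(max_iter):
        g, H = gradient_and_hessian(w, xs, ys)
        if max(abs(g[0]), abs(g[1])) < tol:
            break
        det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
        # Newton step: w <- w - H^{-1} g, using the explicit 2x2 inverse
        w = [w[0] - (H[1][1] * g[0] - H[0][1] * g[1]) / det,
             w[1] - (H[0][0] * g[1] - H[1][0] * g[0]) / det]
    return w


# Hypothetical, non-separable toy data
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 1, 0, 1, 1]
w = fit_logistic_newton(xs, ys)
print(w)
```

This sketch takes full Newton steps; on harder data a damped step with line search would be the safer choice.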
