Newton Method - Carnegie Mellon School of Computer...

Post on 26-Mar-2020

3 views 0 download

Transcript of Newton Method - Carnegie Mellon School of Computer...

Newton Method

Materials courtesy: B. Poczos, R. Tibshirani, C. Carmanis & S. Sanghavi

Outline  

5

Newton Method for Finding a Root

Linear Approximation (1st order Taylor approx):

Goal:

Therefore,

6

Illustration of Newton’s method Goal: finding a root

In the next step we will linearize here in x

7

Example: Finding a Root

http://en.wikipedia.org/wiki/Newton%27s_method

8

Newton Method for Finding a Root This can be generalized to multivariate functions

Therefore,

[Pseudo inverse if there is no inverse]

9

10

Newton method for minimization

11

Newton method for minimization

unconstrained

twice differentiable

12

13

Motivation with Quadratic Approximation

unconstrained

twice differentiable

14

Motivation with Quadratic Approximation

15

Comparison with Gradient Descent

16

Comparison with Gradient Descent

Gradient descent uses a different quadratic approximation:

Newton method is obtained by minimizing over quadratic approximation:

17

Comparison with Gradient Descent

18

19

20

Pre-Conditioning for Gradient descent

Recall convergence rate for gradient descent:

Can we convert it into well-conditioned problem by changing coordinates?

Can get if A = [r2

f(x)]�1

21

Pre-Conditioning for Gradient descent

Can we convert it into well-conditioned problem by changing coordinates?

Can get if

Running gradient descent for g(y), gives best descent direction and convergence rate.

Equivalent to Newton step on f(x).

A = [r2f(x)]�1/2

22

Affine Invariance

[Proof: HW3]

23

Affine Invariant stopping criterion

Stopping criterion for gradient descent:

Stopping criterion for Newton method:

where is the

Newton decrement.

Not affine-invariant

24

Affine Invariant stopping criterion

25

26

Damped Newton’s Method

27

Backtracking line search

28

Convergence Rate

29

Local convergence for finding root

Quadratic convergence

30

Convergence analysis

31

Convergence analysis

32

Convergence analysis

Analysis can be improved e.g. for self-concordant functions

33

Comparison to first-order methods

34

Comparison to first-order methods

35

Example: Logistic regression

36

Example: Logistic regression