Gradient-based Methods for Optimization. Part I.

Page 1:

Gradient-based Methods for Optimization. Part I.

Prof. Nathan L. Gibson

Department of Mathematics

Applied Math and Computation Seminar

October 21, 2011

Page 2:

Outline

Unconstrained Optimization

Newton’s Method

Inexact Newton

Quasi-Newton

Nonlinear Least Squares

Gauss-Newton Method

Steepest Descent Method

Levenberg-Marquardt Method

Page 3:

Unconstrained Optimization

Unconstrained Optimization

Minimize function f of N variables

I.e., find local minimizer x∗ such that

f (x∗) ≤ f (x) for all x near x∗

Different from constrained optimization

f (x∗) ≤ f (x) for all x ∈ U near x∗

Different from global minimizer

f (x∗) ≤ f (x) for all x (possibly in U)

Page 4:

Unconstrained Optimization

Sample Problem

Parameter Identification

Consider

u′′ + cu′ + ku = 0; u(0) = u0; u′(0) = 0 (1)

where u represents the motion of an unforced harmonic oscillator (e.g., a spring). We may assume u0 is known, and data {uj}_{j=1}^{M} are given for some times tj on the interval [0, T].

Now we can state a parameter identification problem: find x = [c, k]^T such that the solution u(t) to (1) using parameters x is (as close as possible to) uj when evaluated at times tj.
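For concreteness, a minimal MATLAB sketch of how such data might be generated synthetically (not from the slides; the value of u0 and the variable names are illustrative):

% Solve u'' + c*u' + k*u = 0 as a first-order system y = [u; u'] and
% sample the solution at the measurement times tj.
u0    = 10;                                % assumed initial displacement (illustrative value)
xtrue = [1; 1];                            % "true" parameters [c; k]
tj    = linspace(0, 10, 100);              % M = 100 sample times on [0, T]
rhs   = @(t, y) [y(2); -xtrue(1)*y(2) - xtrue(2)*y(1)];
[~, Y] = ode15s(rhs, tj, [u0; 0]);         % initial conditions u(0) = u0, u'(0) = 0
uj    = Y(:, 1);                           % the "data" {uj} at the times tj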

Page 5:

Unconstrained Optimization Definitions

Objective Function

Consider the following formulation of the Parameter Identification problem: find x = [c, k]^T such that the following objective function is minimized:

f(x) = (1/2) ∑_{j=1}^{M} |u(tj; x) − uj|².

This is an example of a nonlinear least squares problem.
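A minimal MATLAB sketch of this objective, under the assumptions of the previous snippet (oscillator_solution is an illustrative helper; tj, uj, u0 are the data defined there):

% Residual R(x) and objective f(x) = (1/2) R(x)'R(x) for x = [c; k].
R = @(x) oscillator_solution(x, tj, u0) - uj;
f = @(x) 0.5 * (R(x)' * R(x));

function u = oscillator_solution(x, tj, u0)
% u(tj; x): solution of u'' + c u' + k u = 0, u(0) = u0, u'(0) = 0, at times tj.
% (In a script, local functions like this go at the end of the file.)
[~, Y] = ode15s(@(t, y) [y(2); -x(1)*y(2) - x(2)*y(1)], tj, [u0; 0]);
u = Y(:, 1);
end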

Page 6:

Unconstrained Optimization Definitions

Iterative Methods

An iterative method for minimizing a function f(x) usually has the following parts:

Choose an initial iterate x0

For k = 0, 1, . . .

If xk is optimal, stop.
Determine a search direction d and a step size λ.
Set xk+1 = xk + λd.
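A generic sketch of this loop in MATLAB (illustrative only; grad_f, choose_direction, choose_step, x0, tol, and maxit are placeholders to be filled in by the specific method):

x = x0;
for k = 0:maxit
    g = grad_f(x);                   % gradient at the current iterate
    if norm(g) <= tol, break; end    % "if xk optimal, stop"
    d = choose_direction(x, g);      % e.g. Newton, Gauss-Newton, or -g
    lambda = choose_step(x, d);      % step size (line search, fixed, ...)
    x = x + lambda * d;              % xk+1 = xk + lambda*d
end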

Page 7:

Unconstrained Optimization Definitions

Convergence Rates

The sequence {xk}_{k=1}^{∞} is said to converge to x∗ with rate p and rate constant C if

lim_{k→∞} ‖xk+1 − x∗‖ / ‖xk − x∗‖^p = C.

Linear: p = 1 and 0 < C < 1, so the error decreases by roughly the factor C each iteration.

Quadratic: p = 2; the number of correct digits roughly doubles per iteration.

Superlinear: p = 1 with C = 0. Faster than linear; includes quadratic convergence, but also intermediate rates.
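A rough empirical check of the rate (not from the slides): with errors ek = ‖xk − x∗‖, the order can be estimated as p ≈ log(e(k+1)/e(k)) / log(e(k)/e(k−1)).

% Estimate the convergence order p from consecutive errors (illustrative data).
e = [1e-1, 1e-2, 1e-4, 1e-8];                                  % errors of a quadratically convergent run
p = log(e(3:end)./e(2:end-1)) ./ log(e(2:end-1)./e(1:end-2))   % returns approximately [2 2]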

Page 8:

Unconstrained Optimization Necessary and Sufficient Conditions

Necessary Conditions

Theorem

Let f be twice continuously differentiable, and let x∗ be a local minimizer of f. Then

∇f (x∗) = 0 (2)

and the Hessian of f, ∇²f(x∗), is positive semidefinite.

Recall: A positive semidefinite means

x^T A x ≥ 0 for all x ∈ R^N.

Equation (2) is called the first-order necessary condition.

Page 9:

Unconstrained Optimization Necessary and Sufficient Conditions

Hessian

Let f : R^N → R be twice continuously differentiable (C²). Then:

The gradient of f is

∇f = [ ∂f/∂x1, · · · , ∂f/∂xN ]^T

The Hessian of f is

∇²f = [ ∂²f/∂x1²      · · ·   ∂²f/∂x1∂xN ]
      [     ⋮            ⋱        ⋮      ]
      [ ∂²f/∂xN∂x1    · · ·   ∂²f/∂xN²   ]
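When analytic derivatives are unavailable, both can be approximated by finite differences. A minimal MATLAB sketch (an illustrative helper with no error control; h is a user-chosen increment):

function [g, H] = fd_grad_hess(f, x, h)
% Forward-difference approximations of the gradient and Hessian of f at x.
n  = numel(x);  fx = f(x);  I = eye(n);
g  = zeros(n, 1);  H = zeros(n, n);  fi = zeros(n, 1);
for i = 1:n
    fi(i) = f(x + h*I(:, i));
    g(i)  = (fi(i) - fx) / h;                        % approximates df/dxi
end
for i = 1:n
    for j = 1:n
        fij     = f(x + h*I(:, i) + h*I(:, j));
        H(i, j) = (fij - fi(i) - fi(j) + fx) / h^2;  % approximates d2f/dxi dxj
    end
end
end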

Page 10:

Unconstrained Optimization Necessary and Sufficient Conditions

Sufficient Conditions

Theorem

Let f be twice continuously differentiable in a neighborhood of x∗, and let

∇f (x∗) = 0

and the Hessian of f, ∇²f(x∗), be positive definite. Then x∗ is a local minimizer of f.

Note: second derivative information is required to be certain; for instance, f(x) = x³ has ∇f(0) = 0, yet x = 0 is not a local minimizer.

Page 11:

Newton’s Method

Quadratic Objective Functions

Suppose

f(x) = (1/2) x^T H x − x^T b

then we have that ∇²f(x) = H

and, if H is symmetric (assume it is),

∇f(x) = Hx − b.

Therefore, if H is positive definite, then the unique minimizer x∗ is the solution to

Hx∗ = b.
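A tiny MATLAB illustration (an arbitrary symmetric positive definite H and b, not from the slides):

H = [3 1; 1 2];  b = [1; 2];      % an SPD example
xstar = H \ b;                    % minimizer of (1/2) x'Hx - x'b
grad  = H*xstar - b;              % gradient at xstar: numerically zero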

Page 12:

Newton’s Method

Newton’s Method

Newton’s Method solves for the minimizer of the local quadratic model of f about the current iterate xk, given by

mk(x) = f(xk) + ∇f(xk)^T (x − xk) + (1/2)(x − xk)^T ∇²f(xk) (x − xk).

If ∇²f(xk) is positive definite, then the minimizer xk+1 of mk is the unique solution to

0 = ∇mk(x) = ∇f(xk) + ∇²f(xk)(x − xk). (3)

Page 13:

Newton’s Method

Newton Step

The solution to (3) is computed by solving

∇²f(xk) sk = −∇f(xk)

for the Newton step s^N_k. Then the Newton update is defined by

xk+1 = xk + s^N_k.

Note: the step s^N_k has both a direction and a length. Variants of Newton’s Method modify one or both of these.
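A bare MATLAB sketch of the resulting iteration, using the finite-difference helper sketched earlier (fd_grad_hess); f, x0, h, tol, and maxit are assumed inputs, and no globalization or safeguards are included:

x = x0;
for k = 1:maxit
    [g, H] = fd_grad_hess(f, x, h);
    if norm(g) <= tol, break; end
    s = -H \ g;                      % Newton step: solve H*s = -g
    x = x + s;                       % full Newton update (no line search)
end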

Page 14:

Newton’s Method

Standard Assumptions

Assume that f and x∗ satisfy the following

1. f is twice continuously differentiable and its Hessian is Lipschitz continuous with constant γ:

‖∇²f(x) − ∇²f(y)‖ ≤ γ‖x − y‖.

2. ∇f(x∗) = 0.

3. ∇²f(x∗) is positive definite.

Page 15:

Newton’s Method

Convergence Rate

Theorem

Let the Standard Assumptions hold. Then there exists a δ > 0 such that if x0 ∈ Bδ(x∗), the Newton iteration converges quadratically to x∗.

I.e., ‖ek+1‖ ≤ K‖ek‖², where ek = xk − x∗.

If x0 is not close enough, Hessian may not be positive definite.

If you start close enough, you stay close enough.

Page 16:

Newton’s Method

Problems (and solutions)

Need derivatives

Use finite difference approximations

Needs the solution of a linear system at each iteration

Use an iterative linear solver like CG (Inexact Newton)

Hessians are expensive to find (and to solve/factor)

Use the chord method (factor once; see the sketch below) or Shamanskii

Use Quasi-Newton (update Hk to get Hk+1)

Use Gauss-Newton (first-order approximate Hessian)
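For example, a chord-style sketch in MATLAB (illustrative only; fd_grad_hess is the earlier finite-difference helper, and fd_grad stands for a hypothetical gradient-only variant of it):

[g, H0] = fd_grad_hess(f, x0, h);    % Hessian evaluated once, at x0
[L, U, P] = lu(H0);                  % ...and factored once
x = x0;
for k = 1:maxit
    if norm(g) <= tol, break; end
    s = U \ (L \ (P*(-g)));          % reuse the factorization: H0*s = -g
    x = x + s;
    g = fd_grad(f, x, h);            % only the gradient is re-evaluated
end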

Page 17:

Nonlinear Least Squares

Nonlinear Least Squares

Recall,

f(x) = (1/2) ∑_{j=1}^{M} |u(tj; x) − uj|².

Then, for x = [c, k]^T,

∇f(x) = [ ∑_{j=1}^{M} (∂u(tj; x)/∂c) (u(tj; x) − uj)
          ∑_{j=1}^{M} (∂u(tj; x)/∂k) (u(tj; x) − uj) ] = R′(x)^T R(x)

where R(x) = [u(t1; x) − u1, . . . , u(tM; x) − uM]^T is called the residual and R′_ij(x) = ∂Ri(x)/∂xj.
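In practice R′(x) can also be approximated column by column with forward differences. A minimal MATLAB sketch (an illustrative helper; R returns an M-by-1 residual and h is an increment):

function [J, g] = fd_jacobian(R, x, h)
% Forward-difference Jacobian J approximating R'(x), and gradient g = J' * R(x).
Rx = R(x);  n = numel(x);  I = eye(n);
J  = zeros(numel(Rx), n);
for j = 1:n
    J(:, j) = (R(x + h*I(:, j)) - Rx) / h;   % column j: dR/dxj
end
g = J' * Rx;                                 % gradient of f(x) = (1/2) R'R
end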

Page 18:

Nonlinear Least Squares

Approximate Hessian

In terms of the residual R, the Hessian of f becomes

∇²f(x) = R′(x)^T R′(x) + R′′(x)R(x)

where R′′(x)R(x) = ∑_{j=1}^{M} rj(x) ∇²rj(x) and rj(x) is the jth element of the vector R(x).

The second-order term requires the computation of M Hessians, each of size N × N. However, if we happen to be solving a zero-residual problem, this second-order term goes to zero. One can argue that for small-residual problems (and good initial iterates) the second-order term is negligible.

Page 19:

Gauss-Newton Method

Gauss-Newton Method

The equation defining the Newton step

∇²f(xk) sk = −∇f(xk)

becomes

R′(xk)^T R′(xk) sk = −∇f(xk) = −R′(xk)^T R(xk).

We define the Gauss-Newton step as the solution s^GN_k to this equation.

You can expect close to quadratic convergence for small-residual problems. Otherwise, not even linear convergence is guaranteed.
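A bare Gauss-Newton loop in MATLAB, using the finite-difference Jacobian sketched above (no line search or safeguards; R, x0, h, tol, and maxit are assumed):

x = x0;
for k = 1:maxit
    [J, g] = fd_jacobian(R, x, h);
    if norm(g) <= tol, break; end
    s = -(J'*J) \ g;                 % Gauss-Newton step: solve (J'J) s = -J'R
    x = x + s;
end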

Page 20:

Gauss-Newton Method

Numerical Example

Recall

u′′ + cu′ + ku = 0; u(0) = u0; u′(0) = 0.

Let the true parameters be x∗ = [c, k]^T = [1, 1]^T. Assume we have M = 100 data uj from equally spaced time points on [0, 10].

We will use the initial iterate x0 = [1.1, 1.05]^T with Newton’s Method and Gauss-Newton.

We compute gradients with forward differences, use the analytical 2 × 2 matrix inverse, and use ode15s for time stepping the ODE.
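A sketch of this setup, reusing the illustrative helpers above (oscillator_solution, fd_jacobian, and the Newton/Gauss-Newton loops); u0 is assumed known:

tj = linspace(0, 10, 100);                   % M = 100 equally spaced times on [0, 10]
uj = oscillator_solution([1; 1], tj, u0);    % synthetic data at xstar = [1; 1]
R  = @(x) oscillator_solution(x, tj, u0) - uj;
x0 = [1.1; 1.05];                            % initial iterate
% ...then run the Newton and Gauss-Newton loops sketched earlier from x0.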

Page 21:

Gauss-Newton Method

[Figure: "Comparison of initial iterate"; u versus t for the data and for the model output at the initial iterate.]

Page 22:

Gauss-Newton Method

[Figure: gradient norm and function value versus iteration (log scale), shown for Newton's Method and for Gauss-Newton.]

Page 23:

Gauss-Newton Method

k    Newton ‖∇f(xk)‖    Newton f(xk)    Gauss-Newton ‖∇f(xk)‖    Gauss-Newton f(xk)
0    2.330e+01          7.881e-01       2.330e+01                7.881e-01
1    6.852e+00          9.817e-02       1.767e+00                6.748e-03
2    4.577e-01          6.573e-04       1.016e-02                4.656e-07
3    3.242e-03          3.852e-08       1.844e-06                2.626e-13
4    4.213e-07          2.471e-13

Table: Parameter identification problem, locally convergent iterations. CPU time: Newton 3.4 s, Gauss-Newton 1 s.

Page 24:

Gauss-Newton Method

[Figure: "Iteration history" in the (c, k) plane for Newton's Method and Gauss-Newton.]

Page 25:

Gauss-Newton Method

[Figure: "Search Direction" in the (c, k) plane for Newton's Method and Gauss-Newton.]

Page 26:

Gauss-Newton Method

[Figure: "Search Direction" in the (c, k) plane for Newton's Method and Gauss-Newton, over a wider parameter range.]

Page 27:

Gauss-Newton Method

Global Convergence

Newton direction may not be a descent direction (if the Hessian is not positive definite).

Thus Newton (or any Newton-based method) may increase f if x0 is not close enough; it is not globally convergent.

Globally convergent methods ensure (sufficient) decrease in f .

The steepest descent direction is always a descent direction.

Page 28:

Steepest Descent Method

Steepest Descent Method

We define the steepest descent direction to be dk = −∇f(xk). This defines a direction but not a step size.

We define the Steepest Descent update step to be s^SD_k = λk dk for some λk > 0.

We will talk later about ways of choosing λ.
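A minimal MATLAB sketch with a fixed step size (illustrative; grad_f could be the finite-difference gradient sketched earlier, and lambda, x0, tol, maxit are assumed):

x = x0;
for k = 1:maxit
    g = grad_f(x);
    if norm(g) <= tol, break; end
    x = x - lambda * g;              % xk+1 = xk + lambda*dk with dk = -grad f(xk)
end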

Page 29:

Steepest Descent Method

[Figure: "Iteration history" in the (c, k) plane for Newton's Method, Gauss-Newton, and Steepest Descent.]

Page 30:

Steepest Descent Method

[Figure: "Search Direction" in the (c, k) plane for Newton's Method, Gauss-Newton, and Steepest Descent.]

Page 31:

Steepest Descent Method

[Figure: "Search Direction" in the (c, k) plane for Newton's Method, Gauss-Newton, and Steepest Descent, over a wider parameter range.]

Page 32:

Steepest Descent Method

Steepest Descent Comments

Steepest Descent direction is best direction locally.

The negative gradient is perpendicular to level curves.

Solving for s^SD_k is equivalent to assuming ∇²f(xk) = I/λk.

In general you can only expect linear convergence.

It would be good to combine the global convergence property of Steepest Descent with the superlinear convergence rate of Gauss-Newton.

Page 33:

Levenberg-Marquardt Method

Levenberg-Marquardt Method

Recall the objective function

f(x) = (1/2) R(x)^T R(x)

where R is the residual. We define the Levenberg-Marquardt update step s^LM_k to be the solution of

(R′(xk)^T R′(xk) + νk I) sk = −R′(xk)^T R(xk)

where the regularization parameter νk is called the Levenberg-Marquardt parameter, and it is chosen such that the approximate Hessian R′(xk)^T R′(xk) + νk I is positive definite.
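A sketch in MATLAB with one simple (illustrative, not the slides' specific) policy for νk: accept the step if f decreases and shrink ν, otherwise grow ν and retry. R, x0, h, tol, and maxit are assumed as before:

x = x0;  nu = 1e-2;
fobj = @(z) 0.5 * (R(z)' * R(z));
for k = 1:maxit
    [J, g] = fd_jacobian(R, x, h);               % finite-difference Jacobian sketched earlier
    if norm(g) <= tol, break; end
    s = -(J'*J + nu*eye(numel(x))) \ g;          % Levenberg-Marquardt step
    if fobj(x + s) < fobj(x)
        x = x + s;  nu = nu/10;                  % success: move toward Gauss-Newton
    else
        nu = nu*10;                              % failure: move toward steepest descent
    end
end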

Page 34:

Levenberg-Marquardt Method

[Figure: "Search Direction" in the (c, k) plane for Newton's Method, Gauss-Newton, Steepest Descent, and Levenberg-Marquardt.]

Page 35:

Levenberg-Marquardt Method

[Figure: "Search Direction" in the (c, k) plane for Newton's Method, Gauss-Newton, Steepest Descent, and Levenberg-Marquardt, over a wider parameter range.]

Page 36:

Levenberg-Marquardt Method

Levenberg-Marquardt Notes

Robust with respect to poor initial conditions and larger-residual problems.

Varying ν involves interpolation between the GN direction (ν = 0) and the SD direction (large ν).

See

doc lsqnonlin

for MATLAB instructions for LM and GN.
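For instance, a minimal usage sketch with the residual R and initial iterate x0 from the earlier snippets (lsqnonlin minimizes the sum of squares of R):

opts = optimoptions('lsqnonlin', 'Algorithm', 'levenberg-marquardt');
xhat = lsqnonlin(R, x0, [], [], opts);    % fitted parameters [c; k]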

Page 37:

Levenberg-Marquardt Method

Levenberg-Marquardt Idea

If the iterate is not close enough to the minimizer, so that GN does not give a descent direction, increase ν to take more of an SD direction.

As you get closer to the minimizer, decrease ν to take more of a GN step.

For zero-residual problems, GN converges quadratically (if it converges at all).
SD converges linearly (guaranteed).

Page 38:

Levenberg-Marquardt Method

LM Alternative Perspective

The approximate Hessian may not be positive definite (or well-conditioned); increase ν to add regularity.

As you get closer to the minimizer, the Hessian will become positive definite (by the Standard Assumptions). Decrease ν, as less regularization is necessary.

The regularized problem is a “nearby problem”; we want to solve the actual problem as soon as is feasible.

Page 39:

Summary

Summary

Taylor series with remainder:

f(x) = f(xk) + ∇f(xk)^T (x − xk) + (1/2)(x − xk)^T ∇²f(ξ) (x − xk)

Newton:

m^N_k(x) = f(xk) + ∇f(xk)^T (x − xk) + (1/2)(x − xk)^T ∇²f(xk) (x − xk)

Gauss-Newton:

m^GN_k(x) = f(xk) + ∇f(xk)^T (x − xk) + (1/2)(x − xk)^T R′(xk)^T R′(xk) (x − xk)

Steepest Descent:

m^SD_k(x) = f(xk) + ∇f(xk)^T (x − xk) + (1/2)(x − xk)^T (1/λk) I (x − xk)

Levenberg-Marquardt:

m^LM_k(x) = f(xk) + ∇f(xk)^T (x − xk) + (1/2)(x − xk)^T (R′(xk)^T R′(xk) + νk I) (x − xk)

Page 40:

References

References

1. Levenberg, K., “A Method for the Solution of Certain Problems in Least-Squares”, Quarterly Applied Math. 2, pp. 164-168, 1944.

2. Marquardt, D., “An Algorithm for Least-Squares Estimation of Nonlinear Parameters”, SIAM Journal Applied Math., Vol. 11, pp. 431-441, 1963.

3. Moré, J. J., “The Levenberg-Marquardt Algorithm: Implementation and Theory”, in Numerical Analysis, ed. G. A. Watson, Lecture Notes in Mathematics 630, Springer-Verlag, 1977.

4. Kelley, C. T., “Iterative Methods for Optimization”, Frontiers in Applied Mathematics 18, SIAM, 1999. http://www4.ncsu.edu/~ctk/matlab darts.html.

5. Wadbro, E., “Additional Lecture Material”, Optimization 1 / MN1, Uppsala Universitet, http://www.it.uu.se/edu/course/homepage/opt1/ht07/.
