
Nonlinear Programming

Ekaterina A. Kostina

IWR School 2018 “Advances in Mathematical Optimization”, October 8-12, 2018, Heidelberg


Outline

1 Basic definitions, optimality conditions

2 Algorithms for unconstrained problems

3 Algorithms for constrained problems


Nonlinear Programming Problem

General problem formulation:

min f(x)        f : D ⊆ Rn → R
s.t. g(x) = 0    g : D ⊆ Rn → Rm
     h(x) ≥ 0    h : D ⊆ Rn → Rk

x    variables
f    objective function / cost function (maximization via min −f(x) ≡ −max f(x))
g    equality constraints
h    inequality constraints
f, g, h shall be sufficiently smooth (e.g. twice differentiable) functions


Derivatives

First and second order derivatives of the objective function and the constraints play an important role in optimization.

The first order derivatives are called the gradient (of the respective function)

∇f(x) = ( ∂f/∂x_1, ∂f/∂x_2, ..., ∂f/∂x_n )^T

and the second order derivatives are called the Hessian matrix

∇2f(x) =
[ ∂²f/∂x_1²       ∂²f/∂x_1∂x_2   ...   ∂²f/∂x_1∂x_n ]
[ ∂²f/∂x_2∂x_1    ∂²f/∂x_2²      ...   ∂²f/∂x_2∂x_n ]
[ ...             ...            ...   ...          ]
[ ∂²f/∂x_n∂x_1    ∂²f/∂x_n∂x_2   ...   ∂²f/∂x_n²    ]
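
As a small numerical illustration (my addition, not on the slides): gradients and Hessians can be approximated and checked by finite differences. The sketch assumes NumPy; num_grad, num_hess and the test function are made-up names for this example.

import numpy as np

def num_grad(f, x, eps=1e-6):
    # central finite differences for ∇f(x)
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

def num_hess(f, x, eps=1e-4):
    # finite differences of the gradient give the Hessian ∇2f(x)
    n = len(x)
    H = np.zeros((n, n))
    for j in range(n):
        e = np.zeros_like(x)
        e[j] = eps
        H[:, j] = (num_grad(f, x + e) - num_grad(f, x - e)) / (2 * eps)
    return 0.5 * (H + H.T)  # symmetrize

# test: f(x) = x1^2 + 3*x1*x2 has ∇f = (2*x1 + 3*x2, 3*x1) and Hessian [[2, 3], [3, 0]]
f = lambda x: x[0]**2 + 3 * x[0] * x[1]
print(num_grad(f, np.array([1.0, 2.0])))  # approx. [8, 3]
print(num_hess(f, np.array([1.0, 2.0])))  # approx. [[2, 3], [3, 0]]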


Local and Global Solutions

Feasible set: S = {x ∈ Rn : g(x) = 0, h(x) ≥ 0}

x∗ is a global minimizer of f over S ⇔ x∗ ∈ S and f(x) ≥ f(x∗) for all x ∈ S

x∗ is a local minimizer of f over S ⇔ x∗ ∈ S and there exists N(x∗, δ) such that f(x) ≥ f(x∗) for all x ∈ S ∩ N(x∗, δ), where N(x∗, δ) := {x ∈ Rn : ||x − x∗||2 ≤ δ}


Rosenbrock's test function and Ackley's test function (see Wikipedia)
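
Both are standard benchmark functions; as an illustration (my addition), here they are in Python together with their known global minimizers, assuming NumPy:

import numpy as np

def rosenbrock(x, y, a=1.0, b=100.0):
    # long curved valley; global minimum at (a, a^2) = (1, 1) for the standard parameters
    return (a - x)**2 + b * (y - x**2)**2

def ackley(x, y):
    # many local minima; global minimum at (0, 0) with value 0
    return (-20.0 * np.exp(-0.2 * np.sqrt(0.5 * (x**2 + y**2)))
            - np.exp(0.5 * (np.cos(2 * np.pi * x) + np.cos(2 * np.pi * y)))
            + np.e + 20.0)

print(rosenbrock(1.0, 1.0), ackley(0.0, 0.0))  # both (approximately) 0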


Main Classes of Continuous Optimization Problems

Linear / quadratic programming: linear / quadratic objective, linear constraints in the variables

min_{x ∈ Rn}  c^T x + (1/2) x^T H x   subject to   a_i^T x = b_i, i ∈ E,   a_i^T x ≥ b_i, i ∈ I,

where c, a_i ∈ Rn for all i, E and I are finite index sets, and H ∈ Rn×n is symmetric (H = 0 for linear programming).

Unconstrained nonlinear programming

min_{x ∈ Rn} f(x)

Constrained nonlinear programming

min_{x ∈ Rn} f(x)   subject to   g(x) = 0, h(x) ≥ 0.
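
A tiny worked sketch (my addition, not from the slides): for the special case of a QP with only equality constraints Ax = b and H positive definite, the minimizer can be read off from the KKT linear system. The data below are made up, and the multiplier sign convention follows this linear system.

import numpy as np

# minimize (1/2) x^T H x + c^T x  subject to  A x = b
H = np.array([[2.0, 0.0], [0.0, 2.0]])   # symmetric positive definite
c = np.array([-2.0, -5.0])
A = np.array([[1.0, 1.0]])               # one equality constraint: x1 + x2 = 1
b = np.array([1.0])

# KKT system:  [ H  A^T ] [x]   [-c]
#              [ A   0  ] [y] = [ b]
n, m = H.shape[0], A.shape[0]
K = np.block([[H, A.T], [A, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([-c, b]))
x_star, y_star = sol[:n], sol[n:]
print(x_star, y_star)   # x* = [-0.25, 1.25], multiplier 2.5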


Optimality Conditions for Unconstrained Optimization

min f (x), x ∈ Rn

Optimality conditions:
give algebraic characterizations of solutions, suitable for computations
provide a way to guarantee that a candidate point is optimal (sufficient conditions)
indicate when a point is not optimal (necessary conditions)


min f (x), x ∈ Rn, f ∈ C1

Necessary conditions: x∗ is a local minimizer of f ⇒ ∇f(x∗) = 0 (stationarity)


[Figure: a stationary point x∗ that is not a minimum]


Stationary Points

(a)-(c): x∗ is stationary: ∇f(x∗) = 0
(a) ∇2f(x∗) positive definite: local minimum
(b) ∇2f(x∗) negative definite: local maximum
(c) ∇2f(x∗) indefinite: saddle point


Optimality Conditions for Unconstrained Convex Optimization

min f (x), x ∈ Rn

f convex:
x∗ is a local minimizer of f ⇒ x∗ is a global minimizer of f
x∗ is stationary ⇒ x∗ is a global minimizer of f

f nonconvex:
→ look at higher order derivatives


Second-Order Optimality Conditions for Unconstrained Optimization

min f (x), x ∈ Rn, f ∈ C2

Necessary second-order conditions: x∗ is a local minimizer of f ⇒ ∇2f(x∗) positive semidefinite (f locally convex)

Sufficient conditions: x∗ stationary and ∇2f(x∗) positive definite ⇒ x∗ is a (strict) local minimizer of f
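
A small numerical sketch of the second-order test (my addition): classify a stationary point by the eigenvalues of the Hessian. The function f(x, y) = x² − y² is a made-up example; its stationary point (0, 0) is a saddle.

import numpy as np

def classify_stationary_point(H, tol=1e-10):
    # eigenvalue test for a symmetric Hessian H at a stationary point
    eig = np.linalg.eigvalsh(H)
    if np.all(eig > tol):
        return "local minimum (H positive definite)"
    if np.all(eig < -tol):
        return "local maximum (H negative definite)"
    if np.any(eig > tol) and np.any(eig < -tol):
        return "saddle point (H indefinite)"
    return "inconclusive (H only semidefinite, higher-order information needed)"

# f(x, y) = x^2 - y^2:  ∇f = (2x, -2y) vanishes at (0, 0), Hessian = diag(2, -2)
print(classify_stationary_point(np.diag([2.0, -2.0])))  # saddle point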


Stability

Let ε be a perturbation of the problem; then the solution x(ε) should be a small perturbation of the exact solution x∗:

||x(ε) − x∗|| ≤ c ||ε||

f(x) = x⁴: minimum at x∗ = 0
f(x) = x⁴ − εx²: maximum at x = 0, minima at x = ±√(ε/2)
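
Filling in the computation behind this example (my addition): f′(x) = 4x³ − 2εx = 2x(2x² − ε), so the stationary points are x = 0 with f″(0) = −2ε < 0 (a local maximum) and x = ±√(ε/2) with f″(±√(ε/2)) = 4ε > 0 (local minima). Hence ||x(ε) − x∗|| = √(ε/2), which for small ε is far larger than c·ε, so the stability estimate fails; this is consistent with ∇2f(0) = 0 being only positive semidefinite.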


In the example problem the sufficient optimality conditions were not satisfied (∇2f(x∗) is not positive definite).

One can show: optima that satisfy the sufficient optimality conditions are stable against perturbations.


Ball on a spring without constraints

min_{x ∈ R2}  x1² + x2² + m x2

contour lines of f(x)
gradient vector ∇f(x) = (2x1, 2x2 + m)
unconstrained minimum: 0 = ∇f(x∗) ⇔ (x1∗, x2∗) = (0, −m/2)


Ball on a spring with constraints

min_{x ∈ R2}  x1² + x2² + m x2
h1(x) = 1 + x1 + x2 ≥ 0
h2(x) = 3 − x1 + x2 ≥ 0

gradient ∇h1 of the active constraint, constraint h2 inactive

constrained minimum: ∇f(x∗) = µ1 ∇h1(x∗)
µ1 is the Lagrange multiplier
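
A worked computation for this picture (my addition), assuming m > 2 so that the unconstrained minimum (0, −m/2) violates h1 and the constraint h1 is active: on h1(x) = 0 we have x1 + x2 = −1, and ∇f(x∗) = µ1 ∇h1(x∗) gives (2x1, 2x2 + m) = µ1 (1, 1), i.e. x1 = x2 + m/2. Solving both equations yields x∗ = (m/4 − 1/2, −m/4 − 1/2) and µ1 = 2x1∗ = m/2 − 1 ≥ 0. Moreover h2(x∗) = 3 − m/2 > 0 for m < 6, so for 2 < m < 6 the constraint h2 is indeed inactive, as drawn.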


Ball on a spring with active constraints

min_{x ∈ R2}  x1² + x2² + m x2
h1(x) = 1 + x1 + x2 ≥ 0
h2(x) = 3 − x1 + x2 ≥ 0

“equilibrium of forces”

∇f(x∗) = µ1 ∇h1(x∗) + µ2 ∇h2(x∗),  µ1 ≥ 0, µ2 ≥ 0
µ1, µ2 are Lagrange multipliers


Multipliers as “shadow prices”

old constraint: h2(x) ≥ 0
new constraint: h2(x) + ε ≥ 0

What happens if we relax a constraint? The feasible set becomes larger, so the new minimum f(x∗ε) becomes smaller. How much would we gain?

f(x∗ε) ≈ f(x∗) − ε µ2

Multipliers show the hidden cost of constraints.


KKT Conditions for Constrained Optimization

min_{x ∈ Rn} f(x),  s.t. g(x) = 0, h(x) ≥ 0.

Lagrangian function L : Rn × Rm × Rk → R

L(x, λ, µ) := f(x) − Σi λi gi(x) − Σi µi hi(x)

Karush-Kuhn-Tucker (KKT) point:
x is a KKT point if there exist λ ∈ Rm and µ ∈ Rk such that (x, λ, µ) satisfies

g(x) = 0, h(x) ≥ 0

∇xL(x, λ, µ) = 0 ⇔ ∇f(x) = Σi λi ∇gi(x) + Σi µi ∇hi(x)

µ ≥ 0, µi hi(x) = 0, i = 1, ..., k ⇔ µi = 0 or hi(x) = 0
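
As an illustrative sketch (my addition, not the lecture's code): the KKT conditions can be checked numerically for a candidate triple. The numbers reuse the ball-on-a-spring example with m = 4 (only inequality constraints, no g); kkt_residuals is a made-up helper name.

import numpy as np

m = 4.0
grad_f  = lambda x: np.array([2 * x[0], 2 * x[1] + m])
h       = lambda x: np.array([1 + x[0] + x[1], 3 - x[0] + x[1]])
grads_h = lambda x: np.array([[1.0, 1.0], [-1.0, 1.0]])  # rows: ∇h1, ∇h2

def kkt_residuals(x, mu):
    stat  = grad_f(x) - grads_h(x).T @ mu  # ∇f(x) − Σ µi ∇hi(x) = 0
    feas  = np.minimum(h(x), 0.0)          # h(x) ≥ 0
    sign  = np.minimum(mu, 0.0)            # µ ≥ 0
    compl = mu * h(x)                      # µi hi(x) = 0
    return stat, feas, sign, compl

x_star  = np.array([0.5, -1.5])            # candidate point (h1 active)
mu_star = np.array([1.0, 0.0])
print(kkt_residuals(x_star, mu_star))      # all residuals are (approximately) zero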


KKT Conditions: Illustration

[Figure: geometric illustration of the KKT conditions, showing the contours of f(x), the constraints c1(x) and c2(x), and the solution x]


Optimality Conditions for Constrained Optimization

In general, in order to derive optimality conditions we need the constraints/feasible set to satisfy regularity assumptions called constraint qualifications (CQ).

CQ: the linearized approximation of the constraint functions covers the essential geometry of the feasible set.

Examples:
The Linear Independence Constraint Qualification (LICQ) at x: ∇gi(x), i = 1, ..., m, and ∇hi(x), i ∈ I(x), are linearly independent, where I(x) := {i : 1 ≤ i ≤ k, hi(x) = 0} is the “active set”
All active constraints (equalities and active inequalities) are linear
Mangasarian-Fromovitz CQ at x: ∇gi(x), i = 1, ..., m, linearly independent (or linear) and ∃ p ∈ Rn such that ∇g(x)^T p = 0, ∇hi(x)^T p > 0, i ∈ I(x)


No CQ: Example from Fiacco, McCormick

h1(x) := (1 − x1)³ − x2 ≥ 0   (i.e. x2 ≤ (1 − x1)³)
h2(x) := x1 ≥ 0
h3(x) := x2 ≥ 0

∇h1(x) = (−3(1 − x1)², −1)^T,  ∇h2(x) = (1, 0)^T,  ∇h3(x) = (0, 1)^T

Active inequalities at (1, 0)^T: h1(x) = 0 and h3(x) = 0

There ∇h1 = (0, −1)^T and ∇h3 = (0, 1)^T are linearly dependent, and no p can satisfy ∇h1^T p > 0 and ∇h3^T p > 0 simultaneously, so neither LICQ nor the Mangasarian-Fromovitz CQ holds at (1, 0)^T.


Optimality Conditions for Constrained Optimization

First-order necessary optimality conditions:
Let x∗ be optimal and the CQ be satisfied at x∗; then x∗ is a KKT point.

Second-order necessary optimality conditions:
Let x∗ be optimal and the CQ be satisfied at x∗; then x∗ is a KKT point and the Hessian ∇2L(x∗, λ∗, µ∗) of the Lagrange function is positive semidefinite on the tangent set T(x∗):

p^T ∇2L(x∗, λ∗, µ∗) p ≥ 0,  ∀ p ∈ T(x∗)

T(x∗) := {s : s^T ∇gi(x∗) = 0, i = 1, ..., m,  s^T ∇hi(x∗) ≥ 0, i ∈ I(x∗)}

Sufficient optimality conditions: if the KKT conditions hold and ∇2L(x∗, λ∗, µ∗) is positive definite on T̃(x∗, λ∗), then x∗ is optimal, where
T̃(x∗, λ∗) := {s ∈ T(x∗) : s^T ∇hi(x∗) = 0, i ∈ I(x∗) with µi > 0}


Algorithms for Unconstrained Optimization

min f (x) x ∈ Rn

Find a local minimizer x∗ of f (x), i.e. a point satisfying

∇f(x∗) = 0 (stationarity) and ∇2f(x∗) positive definite


Basic structure of most algorithms:
choose a start value x0
for k = 0, 1, ...
  determine a search (descent) direction pk
  determine a steplength αk
  new iterate xk+1 = xk + αk pk
  check for convergence

Optimization algorithms differ in the choice of pk and αk
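
A minimal Python sketch of this generic loop (my illustration, not the lecture's code); descent_method, search_direction and step_length are made-up names, and the concrete method fills in the last two:

import numpy as np

def descent_method(f, grad, x0, search_direction, step_length,
                   tol=1e-8, max_iter=1000):
    # generic line-search framework: x_{k+1} = x_k + alpha_k * p_k
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= tol:       # convergence check
            break
        p = search_direction(x, g)         # must satisfy ∇f(x)^T p < 0
        alpha = step_length(f, grad, x, p)
        x = x + alpha * p
    return x

# example instantiation: steepest descent with a fixed steplength on f(x) = ||x||^2
x_min = descent_method(lambda x: x @ x, lambda x: 2 * x, [3.0, -4.0],
                       search_direction=lambda x, g: -g,
                       step_length=lambda f, grad, x, p: 0.25)
print(x_min)  # approximately [0, 0]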


Properties of Optimization Algorithms

Optimization algorithms are iterative, i.e. they create an infinite (in practice finite) sequence of points {xk} converging to the optimum x∗.

Two types of convergence:
Local convergence: convergence of the full-step (αk ≡ 1) algorithm near the solution
Global convergence: convergence of an algorithm starting from any arbitrary point x0


Rates of Convergence

{xk} ⊂ Rn, x∗ ∈ Rn, {xk} → x∗ as k → ∞

{xk} → x∗ with rate r if

||xk+1 − x∗|| / ||xk − x∗||^r = c < ∞   for sufficiently large k

r = 1: linear convergence (c < 1)
r = 2: quadratic convergence
superlinear convergence: ||xk+1 − x∗|| / ||xk − x∗|| → 0 as k → ∞
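
A small numerical illustration of these rates (my addition): error sequences obeying linear, superlinear and quadratic recursions, started from the same error 0.5:

# e_{k+1} = 0.5 e_k (linear), e_{k+1} = e_k^1.5 (superlinear), e_{k+1} = e_k^2 (quadratic)
e_lin = e_sup = e_quad = 0.5
for k in range(6):
    print(f"k={k}:  linear {e_lin:.2e}   superlinear {e_sup:.2e}   quadratic {e_quad:.2e}")
    e_lin, e_sup, e_quad = 0.5 * e_lin, e_sup**1.5, e_quad**2
# quadratic convergence roughly doubles the number of correct digits per iteration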


[Figure: error versus iteration count for a linearly (lk), superlinearly (sk) and quadratically (qk) convergent sequence]


Generic Linesearch Algorithm

Search direction pk: f must decrease along the direction pk,

∇f(xk)^T pk < 0

Steplength αk: to guarantee global convergence, solve a 1D minimization problem (exact or inexact):

αk = argmin_α f(xk + α pk)


Computation of Steplength

Ideal: move to the (global) minimum on the selected line (univariate optimization, exact line search)

αk = argmin_α f(xk + α pk)

In practice: an approximate solution may already guarantee global convergence, so perform only an inexact line search

αk ≈ argmin_α f(xk + α pk)

Problem: how to guarantee sufficient decrease?
Answer: check e.g. whether αk satisfies the Armijo-Wolfe conditions


Armijo-Wolfe Conditions for Inexact Line Search

Armijo condition (sufficient decrease condition):

f(xk + α pk) ≤ f(xk) + c1 α ∇f(xk)^T pk,   c1 ∈ (0, 1)

Curvature condition:

∇f(xk + α pk)^T pk ≥ c2 ∇f(xk)^T pk,   c2 ∈ (c1, 1)
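
A common practical variant (my sketch, not the lecture's code) is backtracking: start with a full step and shrink it until the Armijo condition holds; the curvature condition is omitted in this simplified version.

import numpy as np

def backtracking_armijo(f, grad, x, p, alpha0=1.0, c1=1e-4, rho=0.5, max_iter=50):
    # shrink alpha until f(x + alpha p) <= f(x) + c1 * alpha * ∇f(x)^T p
    alpha = alpha0
    fx = f(x)
    slope = grad(x) @ p        # ∇f(x)^T p, negative for a descent direction
    for _ in range(max_iter):
        if f(x + alpha * p) <= fx + c1 * alpha * slope:
            return alpha
        alpha *= rho           # backtrack
    return alpha

# example: steepest descent direction for f(x) = 0.5 ||x||^2
f = lambda x: 0.5 * (x @ x)
g = lambda x: x
x = np.array([2.0, -1.0])
print(backtracking_armijo(f, g, x, -g(x)))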


Global Convergence of Generic Line Search Method

Assume f ∈ C1 is bounded from below, ∇f is Lipschitz continuous, and the Armijo-Wolfe inexact line search is applied. Then

either there exists l ≥ 0 such that ∇f(xl) = 0
or min{ |∇f(xk)^T pk| / ||pk||,  |∇f(xk)^T pk| } → 0 as k → ∞


Global convergence theorem:
if ∇f(xk) ≠ 0 for all k, then

lim_{k→∞} ||∇f(xk)|| cos θk min{1, ||pk||} = 0,

where θk is the angle between pk and −∇f(xk)

For global convergence (i.e. ||∇f(xk)|| → 0 as k → ∞) we need
not only pk to be a descent direction
but also cos θk ≥ δ > 0 for all k
(i.e. pk and ∇f(xk) should not become nearly orthogonal!)


Computation of the Search Direction

For the determination of pk, frequently first and second order derivatives of f(xk) are used.

We discuss:

Steepest descent method

Newton’s method

Quasi-Newton methods

Left out: conjugate gradients


Algorithm 1: Steepest Descent Method

Based on the first order Taylor series approximation of the objective function

f(xk + pk) = f(xk) + ∇f(xk)^T pk + ...

maximum descent if

∇f(xk)^T pk / ||pk|| → min!

⇒ pk = −∇f(xk)


Choose steepest descent direction, perform (exact) line search:

pk = −∇f(xk),   xk+1 = xk − αk ∇f(xk)

The search direction is perpendicular to the level sets of f(x).


Convergence of Steepest Descent Method

Excellent global convergence properties under weak assumptions

Asymptotically, the convergence rate is linear, i.e.

|f(xk+1) − f(x∗)| ≤ C |f(xk) − f(x∗)|   with C < 1

Convergence can be very slow if C is close to 1

If f(x) = x^T A x with A positive definite (quadratic convex) and λi the eigenvalues of A, one can show that

C ≈ (λmax − λmin) / (λmax + λmin)
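
A quick numerical check (my sketch, not from the slides): exact-line-search steepest descent on f(x) = x^T A x with an ill-conditioned diagonal A. The per-step reduction of f settles near ((λmax − λmin)/(λmax + λmin))², i.e. convergence is linear with a constant close to 1.

import numpy as np

A = np.diag([1.0, 100.0])                 # eigenvalues 1 and 100
f = lambda x: x @ A @ x
grad = lambda x: 2 * A @ x

x = np.array([100.0, 1.0])                # a worst-case starting point
for k in range(20):
    g = grad(x)
    alpha = (g @ g) / (2 * g @ A @ g)     # exact line search for this quadratic
    x_new = x - alpha * g
    if k % 5 == 0:
        print(k, f(x_new) / f(x))         # reduction factor per step ≈ (99/101)^2 ≈ 0.96
    x = x_new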


Example - Steepest Descent Method

f(x, y) = ((x − y²)² + 1/100)^(1/4) + (1/100) y²

banana valley function

global minimum at x = y = 0


Convergence of the steepest descent method:
needs almost 35,000 iterations to come closer than 0.1 to the solution
mean value of the convergence rate C ≈ 0.99995
at (x = 4, y = 2) it holds that

λmin = 0.1,  λmax = 268,  C ≈ (268 − 0.1) / (268 + 0.1) ≈ 0.9993

[Figure: ||xk − x∗|| over the iterations]


Summary: Steepest Descent Methods

first-order method (inexpensive)

global convergence under weak assumptions, but no second-order optimality guarantees for the generated solution

scale-dependent: when the objective is poorly scaled, very slow convergence, accumulation of round-off errors and break-down

useful for some special applications (e.g. in data analysis)


Algorithm 2: Newton’s Method

Based on the second-order Taylor series approximation of the objective function

f(x^k + p^k) = f(x^k) + ∇f(x^k)^T p^k + (1/2)(p^k)^T ∇²f(x^k) p^k + ...

maximum descent of this quadratic model if

∇f(x^k)^T p^k + (1/2)(p^k)^T ∇²f(x^k) p^k → min!

→ p^k = −(∇²f(x^k))^{-1} ∇f(x^k)

p^k is the "Newton direction"

p^k is a descent direction if the Hessian ∇²f(x^k) is positive definite!

Visualization of Newton’s method

p^k minimizes the quadratic approximation of the objective

Q(p^k) = f(x^k) + ∇f(x^k)^T p^k + (1/2)(p^k)^T ∇²f(x^k) p^k

[Figure: contour plot comparing the gradient direction and the Newton direction]

Why is it called Newton’s method?

Newton's method finds zeros of nonlinear equations. Here: find a solution of the equation

∇f (x) = 0

Newton’s idea: use Taylor series of ∇f at xk:

∇f(x^k + p^k) ≈ ∇f(x^k) + ∇²f(x^k) p^k = 0!

and to make this zero, set:

p^k = −(∇²f(x^k))^{-1} ∇f(x^k)   (the Newton direction)

(Full step) Newton's method: iterate

x^{k+1} = x^k + p^k
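A minimal full-step Newton sketch (added here for illustration, not part of the original slides), applied to a smooth strongly convex test function f(x_1, x_2) = x_1² + x_2² + exp(x_1 + x_2) whose gradient and Hessian are easy to write down; for the banana valley example a line search would additionally be needed, as discussed on the following slides.

    import numpy as np

    def grad(x):
        e = np.exp(x[0] + x[1])
        return np.array([2.0 * x[0] + e, 2.0 * x[1] + e])

    def hess(x):
        e = np.exp(x[0] + x[1])
        return np.array([[2.0 + e, e],
                         [e, 2.0 + e]])

    x = np.array([1.0, 1.0])
    for k in range(30):
        p = np.linalg.solve(hess(x), -grad(x))    # Newton direction: solve hess(x) p = -grad(x)
        x = x + p                                 # full step x^{k+1} = x^k + p^k
        if np.linalg.norm(grad(x)) < 1e-12:
            break
    print(k, x)    # a handful of iterations; x_1 = x_2 ≈ -0.2836, where 2t + exp(2t) = 0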

Convergence of Newton’s method

Newton’s method has quadratic rate of local convergence

i.e. ||x^{k+1} − x^*|| ≤ C ||x^k − x^*||²,   C < ∞

This is very fast if we are close to a solution: it doubles the number of correct digits in each iteration!

Problem:
If the start value x^0 of the iteration is near a saddle point or a maximum, the full-step method converges to this saddle point or maximum.

Line search helps, but is only possible if p is a descent direction, i.e. if ∇²f is positive definite.
Ensure this by: Levenberg-Marquardt or trust-region methods

Example - Newton’s method

f(x, y) = ( (x − y²)² + 1/100 )^{1/4} + (1/100) y²

banana valley function

global minimum at x = y = 0

Convergence of Newton's method:
less than 25 iterations for an accuracy of better than 10^{-7}!
convergence roughly linear for the first 15-20 iterations since the step length α_k ≠ 1
convergence roughly quadratic for the last iterations with step length α_k = 1

[Figure: ||x^k − x^*|| versus iteration number]

Comparison of the Steepest Descent and Newton's Methods

For the banana valley example:
Newton's method is much faster than the steepest descent method (factor 1000)
Newton's method is superior due to its higher order of convergence
the steepest descent method converges too slowly for practical applications

[Figure: ||x^k − x^*|| versus iteration number for both methods]

Summary: Newton’s Methods

Fast (i.e. quadratic) local rate of convergence

Scale-invariant w.r.t. linear transformations of the variables

p^k is not well-defined if ∇²f(x^k) is singular; p^k is not a descent direction if ∇²f(x^k) is not positive definite

x^k can be attracted to local maxima or saddle points of f

Very small neighbourhood of local convergence; Newton's method is not globally convergent
Remedy: line search, trust region

Algorithm 3: Quasi-Newton Methods

In practice, evaluation of the second derivatives for the Hessian is expensive!
Idea: approximate the Hessian matrix ∇²f(x^k), and also ensure that the approximation B^k is positive definite

x^{k+1} = x^k − (B^k)^{-1} ∇f(x^k),   B^k ≈ ∇²f(x^k)

such methods are known as Quasi-Newton methods
special case: steepest descent method: B = I

Quasi-Newton Methods

different Quasi-Newton methods:

simplified Newton method: keep the Hessian approximation B constant, e.g.

B^k ≡ ∇²f(x^0)

or: use the same matrix B for several iterations:

if ||Δx^{k+1}|| / ||Δx^k|| ≥ δ_max then update B^k ≡ ∇²f(x^k)

or, even cheaper: use update formulas for the Hessian...

Quasi-Newton Update Formulas

Idea: Given a Hessian approximation B^k,

find a new approximation B^{k+1} that is "close" to B^k and satisfies

∇f(x^k) + B^{k+1}(x^{k+1} − x^k) = ∇f(x^{k+1})

Advantages:
needs only evaluation of the gradient ∇f(x^k) (same cost as steepest descent), but incorporates second-order information
additional advantage: can update the inverse (B^k)^{-1} directly

Examples (a BFGS sketch follows below):
symmetric Broyden update
DFP update (Davidon, Fletcher, Powell)
BFGS update (Broyden, Fletcher, Goldfarb, Shanno) (most widely used)
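A minimal sketch of the BFGS update (added here for illustration, not part of the original slides): it builds B^{k+1} from B^k, the step s = x^{k+1} − x^k and the gradient difference y = ∇f(x^{k+1}) − ∇f(x^k), and by construction satisfies the secant equation above.

    import numpy as np

    def bfgs_update(B, s, y):
        """BFGS update of a symmetric positive definite approximation B.
        Stays positive definite as long as the curvature condition y @ s > 0 holds."""
        Bs = B @ s
        return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

    # Quick check of the secant equation B_new @ s == y on illustrative data:
    B = np.eye(3)
    s = np.array([1.0, 0.0, -1.0])
    y = np.array([2.0, 0.5, -1.0])               # here y @ s = 3 > 0
    B_new = bfgs_update(B, s, y)
    print(np.allclose(B_new @ s, y))             # True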

Convergence Properties

Quasi-Newton update methods converge locally superlinearly

i.e. ||x^{k+1} − x^*|| ≤ C_k ||x^k − x^*||,   C_k → 0

Quasi-Newton methods converge globally (i.e. from an arbitrary initial point) if the B^k remain positive definite and line search is used
Quasi-Newton methods are the most popular methods for medium-scale problems

Constrained Optimization: SQP-method

Constrained problem:

min f(x)          f : D ∈ R^n → R
s.t. g(x) = 0     g : D ∈ R^n → R^l
     h(x) ≥ 0     h : D ∈ R^n → R^k

Idea: Consider successively quadratic approximations of the problem:

min_p  f(x^k) + ∇f(x^k)^T p + (1/2) p^T H^k p
s.t.   g(x^k) + ∇g(x^k) p = 0
       h(x^k) + ∇h(x^k) p ≥ 0

with H^k ≈ ∇²L(x, λ, µ)

Constrained Optimization: SQP-method

if we use the exact Hessian of the Lagrangian

H = ∇²L(x, λ, µ)

this leads to a Newton method for the optimality and feasibility conditions (KKT conditions)

with update formulas for H^k, we obtain quasi-Newton SQP methods

if we use appropriate update formulas, we can have superlinear convergence

global convergence can be achieved by using a stepsize strategy based on (inexact) 1D minimization of an appropriate merit function, e.g. the exact merit function (see the sketch after this list)

T(x) = f(x) + Σ_eq γ_i |g_i(x)| + Σ_ineq β_i |min{0, h_i(x)}|

with sufficiently large γ_i, β_i

alternatively, global convergence by trust region
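A minimal sketch of evaluating this exact merit function (added here for illustration, not part of the original slides); the small test problem and the penalty weights γ, β are illustrative assumptions.

    import numpy as np

    def exact_merit(f, g, h, gamma, beta):
        """Return T(x) = f(x) + sum_i gamma_i |g_i(x)| + sum_j beta_j |min(0, h_j(x))|."""
        def T(x):
            return (f(x)
                    + np.sum(gamma * np.abs(g(x)))
                    + np.sum(beta * np.abs(np.minimum(0.0, h(x)))))
        return T

    # Example: min x1^2 + x2^2  s.t.  x1 + x2 - 1 = 0,  x1 >= 0
    f = lambda x: x[0]**2 + x[1]**2
    g = lambda x: np.array([x[0] + x[1] - 1.0])
    h = lambda x: np.array([x[0]])
    T = exact_merit(f, g, h, gamma=np.array([10.0]), beta=np.array([10.0]))
    print(T(np.array([0.5, 0.5])))    # feasible point: merit value equals f = 0.5
    print(T(np.array([-0.2, 0.5])))   # infeasible point: constraint violations are penalized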

Constrained Optimization: SQP-method

1. Start with k = 0, start value x^0 and H^0 = I
2. Compute f(x^k), g(x^k), h(x^k), ∇f(x^k), ∇g(x^k), ∇h(x^k)
3. If x^k is feasible and ||∇L(x, λ, µ)|| < ε then stop → convergence achieved
4. Solve the quadratic problem (QP) and get p^k
5. Perform line search and get step size α_k
6. Iterate x^{k+1} = x^k + α_k p^k
7. Update the Hessian of the Lagrange function
8. k = k + 1, go to step 2
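A minimal local SQP sketch (added here for illustration, not part of the original slides), simplified to an equality-constrained problem with the exact Hessian of the Lagrangian and full steps (no inequalities, no line search, and unlike step 1 above not starting from H^0 = I): each iteration solves the KKT system of the QP subproblem. The test problem min x_1 + x_2 s.t. x_1² + x_2² = 2, the sign convention L = f + λ^T g, and the starting point are illustrative assumptions.

    import numpy as np

    # Test problem: min x1 + x2  s.t.  x1^2 + x2^2 - 2 = 0  (minimizer (-1, -1), lambda = 0.5).
    f_grad   = lambda x: np.array([1.0, 1.0])
    g        = lambda x: np.array([x[0]**2 + x[1]**2 - 2.0])
    g_jac    = lambda x: np.array([[2.0 * x[0], 2.0 * x[1]]])
    lag_hess = lambda x, lam: lam[0] * 2.0 * np.eye(2)       # exact Hessian of L for this problem

    x, lam = np.array([-1.5, -0.5]), np.array([1.0])
    for k in range(30):
        H, A = lag_hess(x, lam), g_jac(x)
        # KKT system of the QP subproblem: [H A^T; A 0] [p; lam_new] = [-grad f; -g]
        K = np.block([[H, A.T], [A, np.zeros((1, 1))]])
        sol = np.linalg.solve(K, -np.concatenate([f_grad(x), g(x)]))
        p, lam = sol[:2], sol[2:]
        x = x + p                                            # full step, no globalization
        if np.linalg.norm(p) < 1e-10:
            break
    print(x, lam)                                            # ≈ [-1, -1], [0.5]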

Solution of the Quadratic Program

Unconstrained case:

min_p  g^T p + (1/2) p^T H p

H must be positive definite, otherwise the optimization problem has no solution

necessary optimality condition:

H p + g = 0

⇒ use the Cholesky method or the CG method to solve
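A minimal sketch of the unconstrained case (added here for illustration, not part of the original slides): solve Hp + g = 0 via a Cholesky factorization, which at the same time detects a non-positive-definite H; the data below are illustrative.

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    H = np.array([[4.0, 1.0],
                  [1.0, 3.0]])                 # symmetric positive definite
    g = np.array([1.0, -2.0])
    factor = cho_factor(H)                     # raises LinAlgError if H is not positive definite
    p = cho_solve(factor, -g)
    print(p, np.allclose(H @ p + g, 0.0))      # the step satisfies H p + g = 0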

Solution of the Quadratic Program

equality constrained case:

min_p  g^T p + (1/2) p^T H p
s.t.   A p + a = 0

necessary optimality condition (KKT system): ∃ λ such that

[ H  A^T ] [ p ]      [ g ]
[ A   0  ] [ λ ]  = − [ a ]

the KKT matrix is indefinite; use the range-space or null-space method to solve
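A minimal sketch for the equality-constrained case (added here for illustration, not part of the original slides): assemble the KKT matrix, solve it with a dense LU factorization, and check the optimality conditions; the data are illustrative, and for large problems the range-space or null-space methods mentioned above would be preferred.

    import numpy as np

    H = np.array([[2.0, 0.0],
                  [0.0, 2.0]])
    g = np.array([-2.0, -5.0])
    A = np.array([[1.0, 1.0]])                 # one equality constraint  A p + a = 0
    a = np.array([-1.0])

    m = A.shape[0]
    K = np.block([[H, A.T],
                  [A, np.zeros((m, m))]])      # indefinite KKT matrix
    sol = np.linalg.solve(K, -np.concatenate([g, a]))
    p, lam = sol[:2], sol[2:]
    print(p, lam)                              # p = [-0.25, 1.25], lam = [2.5]
    print(np.allclose(H @ p + g + A.T @ lam, 0.0), np.allclose(A @ p + a, 0.0))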

Solution of the Quadratic Program

equality and inequality constrained case:

min_p  g^T p + (1/2) p^T H p
s.t.   A p + a = 0
       B p + b ≥ 0

use an active-set strategy

aim: find out which inequalities are active at the solution and which are not

idea: solve a sequence of equality-constrained QPs

min_p  g^T p + (1/2) p^T H p
s.t.   A p + a = 0
       B_i p + b_i = 0,  i ∈ W_k

where W_k is a "guess" for the optimal active set

Active-Set Strategy: Example

min (p_1 − 1)² + (p_2 − 2.5)²

s.t.   p_1 − 2p_2 + 2 ≥ 0    (1)
      −p_1 − 2p_2 + 6 ≥ 0    (2)
      −p_1 + 2p_2 + 2 ≥ 0    (3)
       p_1 ≥ 0               (4)
       p_2 ≥ 0               (5)

p^0 = (2, 0)^T, W_0 = {3, 5}: negative multiplier with respect to constraint 3, remove constraint 3

p^1 = (2, 0), W_1 = {5}: no negative multipliers, solve QP, step length θ = 1

p^2 = (1, 0), W_2 = {5}: negative multiplier with respect to constraint 5, remove constraint 5

p^3 = (1, 0), W_3 = {}: no negative multipliers, solve QP, step length θ < 1, constraint 1 gets active

p^4 = (1, 1.5), W_4 = {1}: no negative multipliers, solve QP, step length θ = 1

p^5 = (1.4, 1.7), W_5 = {1}: all multipliers positive → solution
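A quick numerical cross-check of this example (added here for illustration, not part of the original slides) with SciPy's general-purpose SLSQP solver, which should return the same minimizer (1.4, 1.7).

    import numpy as np
    from scipy.optimize import minimize

    obj = lambda p: (p[0] - 1.0)**2 + (p[1] - 2.5)**2
    cons = [
        {'type': 'ineq', 'fun': lambda p:  p[0] - 2.0 * p[1] + 2.0},   # (1)
        {'type': 'ineq', 'fun': lambda p: -p[0] - 2.0 * p[1] + 6.0},   # (2)
        {'type': 'ineq', 'fun': lambda p: -p[0] + 2.0 * p[1] + 2.0},   # (3)
        {'type': 'ineq', 'fun': lambda p:  p[0]},                      # (4)
        {'type': 'ineq', 'fun': lambda p:  p[1]},                      # (5)
    ]
    res = minimize(obj, x0=np.array([2.0, 0.0]), method='SLSQP', constraints=cons)
    print(res.x)                               # approximately [1.4, 1.7]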

Alternative Methods for Constrained Optimization

Penalty and barrier methods
Augmented Lagrangian methods
Interior-point methods for inequality constrained problems

Summary: Optimization Methods Overview

Optimization problems can be (un)constrained, (non)convex, (non)linear, (non)smooth, continuous/integer, (in)finite dimensional, ...

Here: try to find local minima of smooth nonlinear problems: ∇f(x) = 0 (resp. ∇L(x, λ, µ) = 0, g(x) = 0, h_active = 0)

Starting at an initial guess x^0, most methods iterate x^{k+1} = x^k + α_k p^k with search direction p^k and step length α_k

The search direction can be chosen differently:
steepest descent (simple, but slow and rarely used in practice)
Newton's method (very fast if the Hessian is cheaply available)
Quasi-Newton methods (cheap, fast, and popular, e.g. BFGS)
SQP methods for constrained optimization
CG method (good for very large scale problems)

Other methods: direct search, simulated annealing, genetic algorithms, ... useful for special optimization problems

References

J. Nocedal, S. Wright: Numerical Optimization, Springer, 1999
P. E. Gill, W. Murray, M. H. Wright: Practical Optimization, Academic Press, 1981
R. Fletcher: Practical Methods of Optimization, Wiley, 1987
D. E. Luenberger: Linear and Nonlinear Programming, Addison Wesley, 1984

Thank you for your attention!
