
Nonlinear Programming Models
Fabio Schoen

2008

http://gol.dsi.unifi.it/users/schoen

Nonlinear Programming Models – p. 1

Introduction

Nonlinear Programming Models – p. 2

NLP problems

min f(x)

x ∈ S ⊆ Rn

Standard form:

min f(x)

h_i(x) = 0, i = 1, …, m

g_j(x) ≤ 0, j = 1, …, k

Here S = {x ∈ R^n : h_i(x) = 0 ∀ i, g_j(x) ≤ 0 ∀ j}

Nonlinear Programming Models – p. 3

Local and global optima

A global minimum or global optimum is any x⋆ ∈ S such that

x ∈ S ⇒ f(x) ≥ f(x⋆)

A point x̄ is a local optimum if ∃ ε > 0 such that

x ∈ S ∩ B(x̄, ε) ⇒ f(x) ≥ f(x̄)

where B(x̄, ε) = {x ∈ R^n : ‖x − x̄‖ ≤ ε} is a ball in R^n.
Any global optimum is also a local optimum, but the opposite is generally false.

Nonlinear Programming Models – p. 4


Convex Functions

A set S ⊆ Rn is convex if

x, y ∈ S ⇒ λx + (1 − λ)y ∈ S

for all choices of λ ∈ [0, 1]. Let Ω ⊆ R^n be a nonempty convex set. A function f : Ω → R is convex iff

f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y)

for all x, y ∈ Ω, λ ∈ [0, 1]

Nonlinear Programming Models – p. 5

Convex Functions

[figure: graph of a convex function; the chord joining (x, f(x)) and (y, f(y)) lies above the graph]

Nonlinear Programming Models – p. 6

Properties of convex functions

Every convex function is continuous in the interior of Ω. It might be discontinuous, but only on the boundary.
If f is continuously differentiable, then it is convex iff

f(y) ≥ f(x) + (y − x)^T ∇f(x)

for all x, y ∈ Ω

Nonlinear Programming Models – p. 7

Convex functions

[figure: the tangent line at x lies below the graph of a convex function]

Nonlinear Programming Models – p. 8

If f is twice continuously differentiable, then f is convex iff its Hessian matrix is positive semi-definite:

∇²f(x) := [ ∂²f / ∂x_i ∂x_j ]

Here ∇²f(x) ⪰ 0 means

v^T ∇²f(x) v ≥ 0  ∀ v ∈ R^n

or, equivalently, all eigenvalues of ∇²f(x) are non negative.

Nonlinear Programming Models – p. 9

Example: an affine function is convex (and concave). For a quadratic function (Q: symmetric matrix)

f(x) = (1/2) x^T Q x + b^T x + c

we have

∇f(x) = Qx + b,   ∇²f(x) = Q

⇒ f is convex iff Q ⪰ 0
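As a quick numerical illustration (not part of the original slides; the matrix Q below is an arbitrary example), the semidefiniteness test for convexity of a quadratic can be run with NumPy:

```python
import numpy as np

# Hypothetical symmetric matrix of a quadratic f(x) = (1/2) x^T Q x + b^T x + c
Q = np.array([[2.0, -1.0],
              [-1.0, 2.0]])

eigvals = np.linalg.eigvalsh(Q)              # eigenvalues of the symmetric matrix Q
print("eigenvalues:", eigvals)
print("f convex:", bool(np.all(eigvals >= -1e-12)))   # small tolerance for round-off
```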

Nonlinear Programming Models – p. 10

Convex Optimization Problems

min f(x)

x ∈ S

is a convex optimization problem iff S is a convex set and f is convex on S. For a problem in standard form

min f(x)

h_i(x) = 0, i = 1, …, m

g_j(x) ≤ 0, j = 1, …, k

if f is convex, the h_i(x) are affine functions and the g_j(x) are convex functions, then the problem is convex.

Nonlinear Programming Models – p. 11

Maximization

Slight abuse of notation: a problem

max f(x)

x ∈ S

is called convex iff S is a convex set and f is a concave function (not to be confused with minimization of a concave function, or maximization of a convex function, which are NOT convex optimization problems)

Nonlinear Programming Models – p. 12

Convex and non convex optimization

Convex optimization “is easy”, non convex optimization is usually very hard.
Fundamental property of convex optimization problems: every local optimum is also a global optimum (a proof will be given later).
Minimizing a positive semidefinite quadratic function on a polyhedron is easy (polynomially solvable); if even a single eigenvalue of the Hessian is negative ⇒ the problem becomes NP–hard.

Nonlinear Programming Models – p. 13

Convex functions: examples

Many (of course not all . . . ) functions are convex!

affine functions aT x + b

quadratic functions (1/2) x^T Q x + b^T x + c with Q = Q^T, Q ⪰ 0

any norm is a convex function

x log x (however log x is concave)

f is convex if and only if ∀x0, d ∈ Rn, its restriction to any

line: φ(α) = f(x0 + αd), is a convex function

a linear non negative combination of convex functions is convex

g(x, y) convex in x for all y ⇒ ∫ g(x, y) dy is convex in x

Nonlinear Programming Models – p. 14

more examples . . .

max_i (a_i^T x + b_i) is convex

f, g convex ⇒ max{f(x), g(x)} is convex

f_a convex for every a ∈ A (a possibly uncountable set) ⇒ sup_{a ∈ A} f_a(x) is convex

f convex ⇒ f(Ax + b) is convex

let S ⊆ R^n be any set ⇒ f(x) = sup_{s ∈ S} ‖x − s‖ is convex

Trace(A^T X) = ∑_{i,j} A_ij X_ij is convex (it is linear!)

log det X^{−1} is convex over the set of matrices {X ∈ R^{n×n} : X ≻ 0}

λ_max(X) (the largest eigenvalue of a symmetric matrix X) is convex

Nonlinear Programming Models – p. 15

Data Approximation

Nonlinear Programming Models – p. 16


Table of contents

norm approximation

maximum likelihood

robust estimation

Nonlinear Programming Models – p. 17

Norm approximation

Problem:

min_x ‖Ax − b‖

where A, b are parameters. Usually the system is over-determined, i.e. b ∉ Range(A).
For example, this happens when A ∈ R^{m×n} with m > n and A has full rank.
r := Ax − b: “residual”.

Nonlinear Programming Models – p. 18

Examples

‖r‖ = √(r^T r): least squares (or “regression”)

‖r‖ = √(r^T P r) with P ≻ 0: weighted least squares

‖r‖ = max_i |r_i|: minimax, or ℓ∞, or Tchebichev approximation

‖r‖ = ∑_i |r_i|: absolute or ℓ1 approximation

Possible (convex) additional constraints:

maximum deviation from an initial estimate: ‖x − x_est‖ ≤ ε

simple bounds ℓ_i ≤ x_i ≤ u_i

ordering: x_1 ≤ x_2 ≤ · · · ≤ x_n

(a numerical sketch of the ℓ2 and ℓ∞ fits follows below)
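A minimal numerical sketch (not from the slides; data are random and the 100×30 size mirrors the example that follows) comparing the ℓ2 fit with the ℓ∞ fit, the latter written as a linear program:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
m, n = 100, 30                                # over-determined system
A = rng.normal(size=(m, n))
b = rng.normal(size=m)

# l2 (least squares) fit
x_l2, *_ = np.linalg.lstsq(A, b, rcond=None)

# l_inf fit: min t  s.t.  -t <= (Ax - b)_i <= t ; variables z = (x, t)
c = np.zeros(n + 1); c[-1] = 1.0
A_ub = np.block([[ A, -np.ones((m, 1))],
                 [-A, -np.ones((m, 1))]])
b_ub = np.concatenate([b, -b])
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(None, None)] * (n + 1))
x_inf = res.x[:n]

print("max |residual|, l2   fit:", np.max(np.abs(A @ x_l2 - b)))
print("max |residual|, linf fit:", np.max(np.abs(A @ x_inf - b)))
```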

Nonlinear Programming Models – p. 19

Example: ℓ1 norm

Matrix A ∈ R^{100×30}

[figure: histogram of the ℓ1-fit residuals over the range −5 … 5]

Nonlinear Programming Models – p. 20

Page 6: Nonlinear Programming Models Fabio Schoen Introductionfor all x,y ∈ Ω,λ ∈ [0,1] Nonlinear Programming Models – p. 5 Convex Functions x y Nonlinear Programming Models – p.

ℓ∞ norm

[figure: histogram of the ℓ∞-fit residuals over the range −5 … 5]

Nonlinear Programming Models – p. 21

ℓ2 norm

[figure: histogram of the ℓ2-fit residuals over the range −5 … 5]

Nonlinear Programming Models – p. 22

Variants

min ∑_i h(y_i − a_i^T x) where h is a convex function:

linear–quadratic (Huber-type): h(z) = z² if |z| ≤ 1, 2|z| − 1 if |z| > 1

“dead zone”: h(z) = 0 if |z| ≤ 1, |z| − 1 if |z| > 1

logarithmic barrier: h(z) = − log(1 − z²) if |z| < 1, ∞ if |z| ≥ 1
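A small sketch (not in the original slides) of the three penalties above as NumPy functions, matching the piecewise definitions:

```python
import numpy as np

def linquad(z):
    """Linear-quadratic (Huber-type) penalty."""
    z = np.asarray(z, dtype=float)
    return np.where(np.abs(z) <= 1, z**2, 2*np.abs(z) - 1)

def deadzone(z):
    """Dead-zone penalty: no cost for |z| <= 1."""
    z = np.asarray(z, dtype=float)
    return np.maximum(np.abs(z) - 1, 0.0)

def logbarrier(z):
    """Logarithmic barrier: finite only for |z| < 1."""
    z = np.asarray(z, dtype=float)
    inside = np.minimum(z**2, 1 - 1e-16)      # avoid log of a non-positive number
    return np.where(np.abs(z) < 1, -np.log(1 - inside), np.inf)

z = np.linspace(-2, 2, 9)
print(linquad(z), deadzone(z), logbarrier(z), sep="\n")
```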

Nonlinear Programming Models – p. 23

comparison

[figure: comparison plot of norm 1(x), norm 2(x), linquad(x), deadzone(x), logbarrier(x) for x ∈ [−2, 2]]

Nonlinear Programming Models – p. 24


Maximum likelihood

Given a sample X_1, X_2, . . . , X_k and a parametric family of probability density functions L(·; θ), the maximum likelihood estimate of θ given the sample is

θ̂ = arg max_θ L(X_1, . . . , X_k; θ)

Example: linear measurements with additive i.i.d. (independent identically distributed) noise:

X_i = a_i^T θ + ε_i   (1)

where the ε_i are i.i.d. random variables with density p(·):

L(X_1, . . . , X_k; θ) = ∏_{i=1}^{k} p(X_i − a_i^T θ)

Nonlinear Programming Models – p. 25

Max likelihood estimate - MLE

(taking the logarithm, which does not change optimum points):

θ̂ = arg max_θ ∑_i log p(X_i − a_i^T θ)

If p is log–concave ⇒ this problem is convex. Examples:

ε ∼ N(0, σ²), i.e. p(z) = (2πσ²)^{−1/2} exp(−z²/(2σ²)) ⇒ the MLE is the ℓ2 estimate: θ̂ = arg min_θ ‖Aθ − X‖_2;

p(z) = (1/(2a)) exp(−|z|/a) ⇒ ℓ1 estimate: θ̂ = arg min_θ ‖Aθ − X‖_1

Nonlinear Programming Models – p. 26

p(z) = (1/a) exp(−z/a)·1_{z≥0} (negative exponential) ⇒ the estimate can be found by solving the LP problem:

min 1^T (X − Aθ),   Aθ ≤ X

p uniform on [−a, a] ⇒ the MLE is any θ such that ‖Aθ − X‖_∞ ≤ a

Nonlinear Programming Models – p. 27

Ellipsoids

An ellipsoid is a subset of Rn of the form

E = {x ∈ R^n : (x − x_0)^T P^{−1} (x − x_0) ≤ 1}

where x_0 ∈ R^n is the center of the ellipsoid and P is a symmetric positive-definite matrix.
Alternative representations:

E = {x ∈ R^n : ‖Ax − b‖_2 ≤ 1}

where A ≻ 0, or

E = {x = x_0 + Au : ‖u‖_2 ≤ 1}

where A is square and non singular (affine transformation of the unit ball)

Nonlinear Programming Models – p. 28

Robust Least Squares

Least Squares: x̂ = arg min √(∑_i (a_i^T x − b_i)²). Assumption: the a_i are not known exactly, but it is known that

a_i ∈ E_i = {ā_i + P_i u : ‖u‖ ≤ 1}

where P_i = P_i^T ⪰ 0. Definition: worst case residuals:

max_{a_i ∈ E_i} ∑_i (a_i^T x − b_i)²

A robust estimate of x is the solution of

x_r = arg min_x max_{a_i ∈ E_i} ∑_i (a_i^T x − b_i)²

Nonlinear Programming Models – p. 29

RLS

It holds: |α + β^T y| ≤ |α| + ‖β‖ ‖y‖

then, choosing y⋆ = β/‖β‖ if α ≥ 0 and y⋆ = −β/‖β‖ if α < 0, we have ‖y⋆‖ = 1 and

|α + β^T y⋆| = |α + sign(α) β^T β/‖β‖| = |α| + ‖β‖

then:

max_{a_i ∈ E_i} |a_i^T x − b_i| = max_{‖u‖≤1} |ā_i^T x − b_i + u^T P_i x| = |ā_i^T x − b_i| + ‖P_i x‖

Nonlinear Programming Models – p. 30

. . .

Thus the Robust Least Squares problem reduces to

min_x ( ∑_i (|ā_i^T x − b_i| + ‖P_i x‖)² )^{1/2}

(a convex optimization problem). Transformation:

min_{x,t} ‖t‖_2

|ā_i^T x − b_i| + ‖P_i x‖ ≤ t_i  ∀ i,   i.e.

Nonlinear Programming Models – p. 31

. . .

min_{x,t} ‖t‖_2

ā_i^T x − b_i + ‖P_i x‖ ≤ t_i

−ā_i^T x + b_i + ‖P_i x‖ ≤ t_i

(a Second Order Cone Problem). A norm cone is a convex set

C = {(x, t) ∈ R^{n+1} : ‖x‖ ≤ t}

Nonlinear Programming Models – p. 32


Geometrical Problems

Nonlinear Programming Models – p. 33

Geometrical Problems

projections and distances

polyhedral intersection

extremal volume ellipsoids

classification problems

Nonlinear Programming Models – p. 34

Projection on a set

Given a set C the projection of x on C is defined as:

P_C(x) = arg min_{z ∈ C} ‖z − x‖

[figure: a point and its projection onto a set]

Nonlinear Programming Models – p. 35

Projection on a convex set

If C = {x : Ax = b, f_i(x) ≤ 0, i = 1, …, m}

where the f_i are convex ⇒ C is a convex set and the problem

P_C(x) = arg min_z ‖x − z‖,   Az = b,   f_i(z) ≤ 0, i = 1, …, m

is convex.
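A tiny sketch (not from the slides): when C is a simple convex set such as a box, the projection has a closed form; for a general C of the form above one would call a convex solver instead.

```python
import numpy as np

def project_onto_box(x, lower, upper):
    """Euclidean projection of x onto the convex set {z : lower <= z <= upper}."""
    return np.clip(x, lower, upper)

x = np.array([2.5, -1.0, 0.3])
print(project_onto_box(x, lower=0.0, upper=1.0))   # -> [1.  0.  0.3]
```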

Nonlinear Programming Models – p. 36


Distance between convex sets

dist(C^(1), C^(2)) = min_{x ∈ C^(1), y ∈ C^(2)} ‖x − y‖

Nonlinear Programming Models – p. 37

Distance between convex sets

If C^(j) = {x : A^(j) x = b^(j), f_i^(j)(x) ≤ 0} then the minimum distance can be found through a convex model:

min ‖x^(1) − x^(2)‖

A^(1) x^(1) = b^(1)

A^(2) x^(2) = b^(2)

f_i^(1)(x^(1)) ≤ 0

f_i^(2)(x^(2)) ≤ 0

Nonlinear Programming Models – p. 38

Polyhedral intersection

1: polyhedra described by means of linear inequalities:

P_1 = {x : Ax ≤ b},   P_2 = {x : Cx ≤ d}

Nonlinear Programming Models – p. 39

Polyhedral intersection

P_1 ∩ P_2 = ∅? It is a linear feasibility problem: Ax ≤ b, Cx ≤ d

P_1 ⊆ P_2? Just check

sup{c_k^T x : Ax ≤ b} ≤ d_k  ∀ k

(solution of a finite number of LP’s)

Nonlinear Programming Models – p. 40


Polyhedral intersection (2)

2: polyhedra (polytopes) described through vertices:

P_1 = conv{v_1, . . . , v_k},   P_2 = conv{w_1, . . . , w_h}

P_1 ∩ P_2 = ∅? Need to find λ_1, . . . , λ_k, µ_1, . . . , µ_h ≥ 0:

∑_i λ_i = 1,   ∑_j µ_j = 1,   ∑_i λ_i v_i = ∑_j µ_j w_j

P_1 ⊆ P_2? ∀ i = 1, . . . , k check whether ∃ µ_j ≥ 0:

∑_j µ_j = 1,   ∑_j µ_j w_j = v_i

Nonlinear Programming Models – p. 41

Minimal ellipsoid containing k points

Given v1, . . . , vk ∈ Rn find an ellipsoid

E = {x : ‖Ax − b‖ ≤ 1}

with minimal volume containing the k given points.

[figure: a cloud of points enclosed by the minimum-volume ellipsoid]

Nonlinear Programming Models – p. 42

A = A^T ≻ 0. The volume of E is proportional to det A^{−1} ⇒ convex optimization problem (in the unknowns A, b):

min log det A^{−1}

A = A^T,   A ≻ 0

‖A v_i − b‖ ≤ 1,   i = 1, . . . , k
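A compact sketch of this log det model (not part of the slides; it assumes the cvxpy package with a conic solver such as SCS is available, and uses arbitrary random points):

```python
import numpy as np
import cvxpy as cp

rng = np.random.default_rng(1)
pts = rng.normal(size=(20, 2))          # the points v_1, ..., v_k in R^2
n = pts.shape[1]

A = cp.Variable((n, n), PSD=True)       # A = A^T, A >= 0 (positive definite at the optimum)
b = cp.Variable(n)

# minimize log det A^{-1}  <=>  maximize log det A
constraints = [cp.norm(A @ v - b) <= 1 for v in pts]
prob = cp.Problem(cp.Maximize(cp.log_det(A)), constraints)
prob.solve()
print("optimal log det A =", prob.value)
```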

Nonlinear Programming Models – p. 43

Max. ellipsoid contained in a polyhedron

Given P = {x : Ax ≤ b} find an ellipsoid:

E = {By + d : ‖y‖ ≤ 1}

contained in P with maximum volume.

Nonlinear Programming Models – p. 44

Max. ellipsoid contained in a polyhedron

E ⊆ P ⇔ a_i^T (By + d) ≤ b_i  ∀ y : ‖y‖ ≤ 1

⇔ sup_{‖y‖≤1} a_i^T B y + a_i^T d ≤ b_i  ∀ i

⇔ ‖B a_i‖ + a_i^T d ≤ b_i

max_{B,d} log det B

B = B^T ≻ 0

‖B a_i‖ + a_i^T d ≤ b_i,   i = 1, . . .

Nonlinear Programming Models – p. 45

Difficult variants

These problems are hard:

find a maximal volume ellipsoid contained in a polyhedron given by its vertices

[figure: a set of vertices with an inscribed ellipsoid]

Nonlinear Programming Models – p. 46

find a minimal volume ellipsoid containing a polyhedron described as a system of linear inequalities.

Nonlinear Programming Models – p. 47

It is already a difficult problem to decide whether a given ellipsoid E contains a polyhedron P = {x : Ax ≤ b}.
This problem is still difficult even when the ellipsoid is a sphere: it is equivalent to norm maximization over a polyhedron – an NP–hard concave optimization problem.

Nonlinear Programming Models – p. 48

Linear classification (separation)

[figure: two clouds of points separated by a hyperplane]

Nonlinear Programming Models – p. 49

Given two point sets X_1, . . . , X_k and Y_1, . . . , Y_h, find a hyperplane a^T x = t such that:

a^T X_i ≥ t + 1,   i = 1, . . . , k

a^T Y_j ≤ t − 1,   j = 1, . . . , h

(an LP feasibility problem).

Nonlinear Programming Models – p. 50

Robust separation

[figure: two separable point clouds and the maximum-margin separating hyperplane]

Nonlinear Programming Models – p. 51

Robust separation

Find a “maximal” separation:

max_{a : ‖a‖≤1} ( min_i a^T X_i − max_j a^T Y_j )

equivalent to the convex problem:

max t_1 − t_2

a^T X_i ≥ t_1  ∀ i

a^T Y_j ≤ t_2  ∀ j

‖a‖ ≤ 1

Nonlinear Programming Models – p. 52


Optimality Conditions
Fabio Schoen

2008

http://gol.dsi.unifi.it/users/schoen

Optimality Conditions – p. 1

Optimality Conditions: descent directions

Let S ⊆ Rn be a convex set and consider the problem

min_{x∈S} f(x)

where f : S → R. Let x_1, x_2 ∈ S and d = x_2 − x_1: d is a feasible direction.
If there exists ε̄ > 0 such that f(x_1 + εd) < f(x_1) ∀ ε ∈ (0, ε̄), then d is called a descent direction at x_1.
Elementary necessary optimality condition: if x⋆ is a local optimum, no descent direction may exist at x⋆.

Optimality Conditions – p. 2

Optimality Conditions for Convex Sets

If x⋆ ∈ S is a local optimum for f(·) and there exists a neighborhood U(x⋆) such that f ∈ C¹(U(x⋆)), then

d^T ∇f(x⋆) ≥ 0  ∀ d : feasible direction

Optimality Conditions – p. 3


proof

Taylor expansion:

f(x⋆ + εd) = f(x⋆) + ε d^T ∇f(x⋆) + o(ε)

d cannot be a descent direction, so, if ε is sufficiently small, then f(x⋆ + εd) ≥ f(x⋆). Thus

ε d^T ∇f(x⋆) + o(ε) ≥ 0

and dividing by ε,

d^T ∇f(x⋆) + o(ε)/ε ≥ 0

Letting ε ↓ 0 the proof is complete.

Optimality Conditions – p. 5

Optimality Conditions: tangent cone

General case:

min f(x)

gi(x) ≤ 0 i = 1, . . . ,m

x ∈ X (X : open set)

Let S = {x ∈ X : g_i(x) ≤ 0, i = 1, . . . , m}.
Tangent cone to S at x:

T(x) = { d ∈ R^n : d/‖d‖ = lim_{x_k → x} (x_k − x)/‖x_k − x‖, with x_k ∈ S }

Optimality Conditions – p. 6

[figure: a feasible set and its tangent cone at a boundary point]

Optimality Conditions – p. 7

Some examples

S = R^n ⇒ T(x) = R^n ∀ x

S = {x : Ax = b} ⇒ T(x) = {d : Ad = 0}

S = {x : Ax ≤ b}; let I be the set of active constraints at x:

a_i^T x = b_i, i ∈ I;   a_i^T x < b_i, i ∉ I.

Optimality Conditions – p. 8


Optimality Conditions – p. 9

Let d = lim_k (x_k − x)/‖x_k − x‖. Then, for i ∈ I,

a_i^T d = a_i^T lim_k (x_k − x)/‖x_k − x‖
        = lim_k a_i^T (x_k − x)/‖x_k − x‖
        = lim_k (a_i^T x_k − b_i)/‖x_k − x‖
        ≤ 0

Thus if d ∈ T(x) ⇒ a_i^T d ≤ 0 for i ∈ I.

Optimality Conditions – p. 10

Conversely, let x_k = x + α_k d. If a_i^T d ≤ 0 for i ∈ I, then

a_i^T x_k = a_i^T (x + α_k d) = b_i + α_k a_i^T d ≤ b_i,   i ∈ I

a_i^T x_k = a_i^T x + α_k a_i^T d < b_i,   i ∉ I, if α_k is small enough

Thus

T(x) = {d : a_i^T d ≤ 0 ∀ i ∈ I}

Optimality Conditions – p. 11

Example

Let S = {(x, y) ∈ R² : x² − y = 0} (a parabola).

Tangent cone at (0, 0)? Let (x_k, y_k) → (0, 0), i.e. x_k → 0, y_k = x_k²:

‖(x_k, y_k) − (0, 0)‖ = √(x_k² + x_k⁴) = |x_k| √(1 + x_k²)

and

lim_{x_k→0⁺} x_k / (|x_k| √(1 + x_k²)) = 1,    lim_{x_k→0⁺} y_k / (|x_k| √(1 + x_k²)) = 0

lim_{x_k→0⁻} x_k / (|x_k| √(1 + x_k²)) = −1,   lim_{x_k→0⁻} y_k / (|x_k| √(1 + x_k²)) = 0

thus T(0, 0) = {(−1, 0), (1, 0)}

Optimality Conditions – p. 12


Descent direction

d ∈ R^n is a feasible direction at x ∈ S if ∃ ᾱ > 0 :

x + αd ∈ S  ∀ α ∈ [0, ᾱ).

d feasible ⇒ d ∈ T(x), but in general the converse is false.
If

f(x + αd) ≤ f(x)  ∀ α ∈ (0, ᾱ)

then d is a descent direction

Optimality Conditions – p. 13

First order necessary optimality condition

Let x ∈ S ⊆ R^n be a local optimum for min_{x∈S} f(x); let f ∈ C¹(U(x)). Then

d^T ∇f(x) ≥ 0  ∀ d ∈ T(x)

Proof. Let d = lim_k (x_k − x)/‖x_k − x‖. Taylor expansion:

f(x_k) = f(x) + ∇^T f(x)(x_k − x) + o(‖x_k − x‖)
       = f(x) + ∇^T f(x)(x_k − x) + ‖x_k − x‖ o(1).

x local optimum ⇒ ∃ U(x) : f(y) ≥ f(x) ∀ y ∈ U(x) ∩ S.

Optimality Conditions – p. 14

. . .

If k is large enough, x_k ∈ U(x):

f(x_k) − f(x) ≥ 0

thus

∇^T f(x)(x_k − x) + ‖x_k − x‖ o(1) ≥ 0

Dividing by ‖x_k − x‖:

∇^T f(x)(x_k − x)/‖x_k − x‖ + o(1) ≥ 0

and in the limit ∇^T f(x) d ≥ 0.

Optimality Conditions – p. 15

Examples

Unconstrained problems
Every d ∈ R^n belongs to the tangent cone ⇒ at a local optimum

∇^T f(x) d ≥ 0  ∀ d ∈ R^n

Choosing d = e_i and d = −e_i we get

∇f(x) = 0

NB: the same is true if x is a local minimum in the relative interior of the feasible region.

Optimality Conditions – p. 16


Linear equality constraints

min f(x)

Ax = b

Tangent cone: {d : Ad = 0}. Necessary condition:

∇^T f(x) d ≥ 0  ∀ d : Ad = 0

Equivalent statement:

min_d { ∇^T f(x) d : Ad = 0 } = 0

(a linear program).

Optimality Conditions – p. 17

Linear equality constraints

From LP duality ⇒ the dual

max 0^T λ,   A^T λ = ∇f(x)

has value 0, hence it is feasible. Thus at a local minimum point there exist Lagrange multipliers:

∃ λ : A^T λ = ∇f(x)

Optimality Conditions – p. 18

Linear inequalities

min f(x)

Ax ≤ b

Tangent cone at a local minimum x: {d ∈ R^n : a_i^T d ≤ 0 ∀ i ∈ I(x)}. Let A_I be the rows of A associated with the active constraints at x. Then

min_d { ∇^T f(x) d : A_I d ≤ 0 } = 0

Optimality Conditions – p. 19

Linear inequalities

From LP duality:

max 0^T λ,   A_I^T λ = ∇f(x),   λ ≤ 0

Thus, at a local optimum, the gradient is a non positive linear combination of the coefficients of the active constraints.

Optimality Conditions – p. 20


Farkas’ Lemma

Let A be a matrix in R^{m×n} and b ∈ R^m. One and only one of the following two sets:

{y : A^T y ≤ 0, b^T y > 0}

and

{x : Ax = b, x ≥ 0}

is non empty

Optimality Conditions – p. 21

Geometrical interpretation

[figure: the cone {z = Ax : x ≥ 0} generated by the columns a_1, a_2 of A, the vector b, and the cone {y : A^T y ≤ 0}]

Optimality Conditions – p. 22

Proof

1) If ∃ x ≥ 0 : Ax = b, then b^T y = x^T A^T y. Thus if A^T y ≤ 0 ⇒ b^T y ≤ 0.
2) Premise: separating hyperplane theorem. Let C and D be two nonempty convex sets with C ∩ D = ∅. Then there exist a ≠ 0 and b such that

a^T x ≤ b,  x ∈ C

a^T x ≥ b,  x ∈ D

If C is a point and D is a closed convex set, the separation is strict, i.e.

a^T C < b

a^T x > b,  x ∈ D

Optimality Conditions – p. 23

Farkas’ Lemma (proof)

2) Assume {x : Ax = b, x ≥ 0} = ∅. Let

S = {y ∈ R^m : ∃ x ≥ 0, Ax = y}

S is closed, convex and b ∉ S. From the separating hyperplane theorem: ∃ α ∈ R^m, α ≠ 0, and β ∈ R:

α^T y ≤ β  ∀ y ∈ S

α^T b > β

0 ∈ S ⇒ β ≥ 0 ⇒ α^T b > 0; α^T A x ≤ β for all x ≥ 0. This is possible iff α^T A ≤ 0.
Letting y = α we obtain a solution of

A^T y ≤ 0,   b^T y > 0

Optimality Conditions – p. 24


First order feasible variations cone

G(x) = {d ∈ R^n : ∇^T g_i(x) d ≤ 0, i ∈ I}

Optimality Conditions – p. 25

First order variations

G(x) ⊇ T(x). In fact if x_k is feasible and

d = lim_k (x_k − x)/‖x_k − x‖

then g_i(x_k) ≤ 0 and

g(x + lim_k (x_k − x)) ≤ 0

Optimality Conditions – p. 26

. . .

g(x + lim_k ‖x_k − x‖ (x_k − x)/‖x_k − x‖) ≤ 0

g(x + lim_k ‖x_k − x‖ · lim_k (x_k − x)/‖x_k − x‖) ≤ 0

g(x + lim_k ‖x_k − x‖ d) ≤ 0

Let α_k = ‖x_k − x‖; for α_k ≈ 0:

g(x + α_k d) ≤ 0

Optimality Conditions – p. 27

g_i(x + α_k d) = g_i(x) + α_k ∇^T g_i(x) d + o(α_k)

where α_k > 0 and d belongs to the tangent cone T(x). If the i–th constraint is active (g_i(x) = 0), then

g_i(x + α_k d) = α_k ∇^T g_i(x) d + o(α_k) ≤ 0

g_i(x + α_k d)/α_k = ∇^T g_i(x) d + o(α_k)/α_k ≤ 0

Letting α_k → 0 the result is obtained.

Optimality Conditions – p. 28


example

An example where G(x) ≠ T(x):

−x³ + y ≤ 0

−y ≤ 0

Optimality Conditions – p. 29

KKT necessary conditions

(Karush–Kuhn–Tucker) Let x ∈ X ⊆ R^n, X ≠ ∅, be a local optimum for

min f(x)

g_i(x) ≤ 0, i = 1, . . . , m

x ∈ X

I: indices of the active constraints at x. If:

1. f(x), g_i(x) ∈ C¹ at x for i ∈ I
2. a “constraint qualification” condition T(x) = G(x) holds at x;

then there exist Lagrange multipliers λ_i ≥ 0, i ∈ I:

∇f(x) + ∑_{i∈I} λ_i ∇g_i(x) = 0.

Optimality Conditions – p. 30

Proof

x local optimum ⇒ if d ∈ T(x) then d^T ∇f(x) ≥ 0. But d ∈ T(x) = G(x) ⇒

d^T ∇g_i(x) ≤ 0, i ∈ I.

Thus the system

−∇^T f(x) d > 0,   ∇^T g_i(x) d ≤ 0, i ∈ I

has no solution. From Farkas’ Lemma ⇒ there exists a solution of:

∑_{i∈I} λ_i ∇g_i(x) = −∇f(x),   λ_i ≥ 0, i ∈ I

Optimality Conditions – p. 31

Constraint qualifications: examples

polyhedra: X = R^n and the g_i(x) are affine functions: Ax ≤ b

linear independence: X open set, g_i(x), i ∉ I, continuous at x and ∇g_i(x), i ∈ I, linearly independent.

Slater condition: X open set, g_i(x), i ∈ I, convex differentiable functions at x, g_i(x), i ∉ I, continuous at x, and ∃ x̄ ∈ X strictly feasible:

g_i(x̄) < 0, i ∈ I.

Optimality Conditions – p. 32


Convex problems

An optimization problem

minx∈S

f(x)

is a convex problem if

S is a convex set, i.e.

x, y ∈ S ⇒ λx + (1 − λ)y ∈ S

∀λ ∈ [0, 1]

f is a convex function on S, i.e.

f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y)

∀λ ∈ [0, 1] and x, y ∈ S

Optimality Conditions – p. 33

Standard convex problem

min f(x)

gi(x) ≤ 0 i = 1,m

hj(x) = 0 j = 1, k

if

f is convex

gi are convex

hj are affine (i.e. of the form αT x + β)

then the problem is convex.

Optimality Conditions – p. 34

Convex problems

Every local optimum is a global one.
Proof: let x be a local optimum for min_S f(x) and x⋆ a global optimum.
S convex ⇒ λx⋆ + (1 − λ)x ∈ S. Thus, for λ ≈ 0,

f(x) ≤ f(λx⋆ + (1 − λ)x)
     ≤ λf(x⋆) + (1 − λ)f(x)

⇒ f(x) ≤ f(x⋆)

and x is also a global optimum.

Optimality Conditions – p. 35

Sufficiency of 1st order conditions

For a convex differentiable problem: if d^T ∇f(x) ≥ 0 ∀ d ∈ T(x), then x is a (global) optimum.
Proof:

f(y) ≥ f(x) + (y − x)^T ∇f(x)  ∀ y ∈ S

But d = y − x ∈ T(x) ⇒

f(y) ≥ f(x) + d^T ∇f(x) ≥ f(x)  ∀ y ∈ S

thus x is a global minimum.

Optimality Conditions – p. 36


Convexity of the set of global optima

(for convex problems) The set of global minima of a convex problem is a convex set. In fact, let x and y be global minima for the convex problem

min_{x∈S} f(x)

Then, choosing λ ∈ [0, 1], we have λx + (1 − λ)y ∈ S, as S is convex. Moreover

f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y) = λf⋆ + (1 − λ)f⋆ = f⋆

where f⋆ is the global minimum value. Thus equality holds and the proof is complete.

Optimality Conditions – p. 37

KKT for equality constraints

x: local optimum for

min f(x)

g_i(x) ≤ 0, i = 1, . . . , m

h_j(x) = 0, j = 1, . . . , k

x ∈ X ⊆ R^n

Let I be the set of active inequalities at x. If f(x), g_i(x), i ∈ I, h_j(x) ∈ C¹ and “constraint qualifications” hold at x, then ∃ λ_i ≥ 0 ∀ i ∈ I and µ_j ∈ R ∀ j = 1, . . . , k:

∇f(x) + ∑_{i∈I} λ_i ∇g_i(x) + ∑_{j=1}^{k} µ_j ∇h_j(x) = 0

Optimality Conditions – p. 38

Complementarity

KKT equivalent formulation:

∇f(x) + ∑_{i=1}^{m} λ_i ∇g_i(x) + ∑_{j=1}^{k} µ_j ∇h_j(x) = 0

λ_i g_i(x) = 0, i = 1, . . . , m

The condition λ_i g_i(x) = 0 is called the complementarity condition

Optimality Conditions – p. 39

II order necessary conditions

If f, g_i, h_j ∈ C² at x and the gradients of the active constraints at x are linearly independent, then there exist multipliers λ_i ≥ 0, i ∈ I, and µ_j, j = 1, . . . , k, such that

∇f(x) + ∑_{i∈I} λ_i ∇g_i(x) + ∑_{j=1}^{k} µ_j ∇h_j(x) = 0

and

d^T ∇²L(x) d ≥ 0

for every direction d: d^T ∇g_i(x) ≤ 0 (i ∈ I), d^T ∇h_j(x) = 0, where

∇²L(x) := ∇²f(x) + ∑_{i∈I} λ_i ∇²g_i(x) + ∑_{j=1}^{k} µ_j ∇²h_j(x)

Optimality Conditions – p. 40


Sufficient conditions

Let f, g_i, h_j be twice continuously differentiable. Let x⋆, λ⋆, µ⋆ satisfy:

∇f(x⋆) + ∑_{i∈I} λ⋆_i ∇g_i(x⋆) + ∑_{j=1}^{k} µ⋆_j ∇h_j(x⋆) = 0

λ⋆_i g_i(x⋆) = 0

λ⋆_i ≥ 0

d^T ∇²L(x⋆) d > 0  ∀ d ≠ 0 : d^T ∇h_j(x⋆) = 0, d^T ∇g_i(x⋆) = 0, i ∈ I

then x⋆ is a local minimum.

Optimality Conditions – p. 41

Lagrange Duality

Problem:

f⋆ = min f(x),   g_i(x) ≤ 0,   x ∈ X

Definition: Lagrange function:

L(x; λ) = f(x) + ∑_i λ_i g_i(x),   λ ≥ 0, x ∈ X

Optimality Conditions – p. 42

Relaxation

Given an optimization problem

minx∈S

f(x)

a relaxation is a problem

minx∈Q

g(x)

where

S ⊆ Q

g(x) ≤ f(x) ∀x ∈ S.

Weak duality: the optimal value of a relaxation is a lower bound on the optimum value of the problem.

Optimality Conditions – p. 43

Lagrange minimization is a relaxation

Proof:

Feasible set of the Lagrange problem: X (contains the original one)

If g(x) ≤ 0 and λ ≥ 0 ⇒

L(x, λ) = f(x) + λ^T g(x) ≤ f(x)

Optimality Conditions – p. 44


Dual Lagrange function

with respect to the constraints g(x) ≤ 0:

θ(λ) = inf_{x∈X} L(x, λ) = inf_{x∈X} (f(x) + λ^T g(x))

For every choice of λ ≥ 0, θ(λ) is a lower bound on the value of every feasible solution and, in particular, a lower bound on the global minimum value of the problem.

Optimality Conditions – p. 45

Example (circle packing)

min −r

4r² − (x_i − x_j)² − (y_i − y_j)² ≤ 0,   1 ≤ i < j ≤ N

x_i, y_i ≤ 1,   i = 1, . . . , N

−x_i, −y_i ≤ 0,   i = 1, . . . , N

Optimality Conditions – p. 46

When N = 2, relaxing the first constraint:

θ(λ) = min_{x,y,r} −r + λ(4r² − (x_1 − x_2)² − (y_1 − y_2)²)

x_1, x_2, y_1, y_2 ≥ 0

x_1, x_2, y_1, y_2 ≤ 1

Optimality Conditions – p. 47

solution

Minimizing with respect to x, y ⇒ |x_1 − x_2| = |y_1 − y_2| = 1, from which

θ(λ) = min_r (−r + 4λr² − 2λ)

r = 1/(8λ),   θ(λ) = −2λ − 1/(16λ)

This is a lower bound on the optimum value. Best possible lower bound:

θ⋆ = max_λ θ(λ)

λ⋆ = 1/(4√2),   θ⋆ = −√2/2

Optimality Conditions – p. 48


Choosing (x_1, y_1) = (0, 0) and (x_2, y_2) = (1, 1), a feasible solution with r = √2/2 is obtained.
The Lagrange dual gives a lower bound equal to −√2/2: the same as the objective function (−r) at this feasible solution ⇒ optimal solution! (an exception, not the rule!)

Optimality Conditions – p. 49

Lagrange Dual

θ⋆ = max θ(λ),   λ ≥ 0

This problem might:

1. be unbounded

2. have a finite sup but no max

3. have a unique maximum, attained in correspondence with a single solution x

4. have many different maxima, each connected with a different solution x

Optimality Conditions – p. 50

Equality constraints

f ⋆ = min f(x)

gi(x) ≤ 0 i = 1, . . . ,m

hj(x) = 0 j = 1, . . . , k

x ∈ X

Lagrange function:

L(x; λ, µ) = f(x) + λT g(x) + µT h(x)

where λ ≥ 0, but µ is free.

Optimality Conditions – p. 51

Linear Programming

min c^T x,   Ax ≤ b

Dual Lagrange function:

θ(λ) = min_x c^T x + λ^T (Ax − b)
     = −λ^T b + min_x (c^T + λ^T A) x.

but:

min_x (c^T + λ^T A) x = 0 if c^T + λ^T A = 0,   −∞ otherwise.

Optimality Conditions – p. 52


. . .

Lagrange dual function:

θ(λ) = −λ^T b if c^T + λ^T A = 0,   −∞ otherwise.

Lagrange dual:

max −λ^T b

λ^T A + c^T = 0

λ ≥ 0

which is equivalent to:

max λ^T b,   λ^T A = c^T,   λ ≤ 0

Optimality Conditions – p. 53

Quadratic Programming (QP)

min (1/2) x^T Q x + c^T x,   Ax = b

(Q: symmetric). Lagrange dual function:

θ(λ) = min_x (1/2) x^T Q x + c^T x + λ^T (Ax − b)
     = −λ^T b + min_x (1/2) x^T Q x + (c^T + λ^T A) x

Optimality Conditions – p. 54

QP – Case 1

Q has at least one negative eigenvalue ⇒

min_x (1/2) x^T Q x + (c^T + λ^T A) x = −∞

In fact ∃ d : d^T Q d < 0. Choosing x = αd with α > 0 ⇒

(1/2) x^T Q x + (c^T + λ^T A) x = (1/2) α² d^T Q d + α (c^T + λ^T A) d

and for large values of α this can be made as small as desired.

Optimality Conditions – p. 55

QP – Case 2

Q positive definite ⇒ the minimum point of the inner problem of the dual Lagrange function satisfies:

Qx + (c + A^T λ) = 0

i.e.

x = −Q^{−1}(c + A^T λ)

Optimality Conditions – p. 56


. . .

Lagrange dual function value:

θ(λ) = −λ^T b + (1/2) x^T Q x + (c^T + λ^T A) x
     = −λ^T b + (1/2) (c + A^T λ)^T Q^{−1} Q Q^{−1} (c + A^T λ) − (c^T + λ^T A) Q^{−1} (c + A^T λ)
     = −λ^T b + (1/2) (c + A^T λ)^T Q^{−1} (c + A^T λ) − (c^T + λ^T A) Q^{−1} (c + A^T λ)
     = −λ^T b − (1/2) (c + A^T λ)^T Q^{−1} (c + A^T λ)

Optimality Conditions – p. 57

. . .

Lagrange dual (seen as a min problem):

min_λ λ^T b + (1/2) (c + A^T λ)^T Q^{−1} (c + A^T λ)

Optimality conditions:

b + A Q^{−1} (c + A^T λ) = 0

But recalling that x = −Q^{−1}(c + A^T λ) ⇒

b − Ax = 0   (feasibility of x)

⇒ if we find the optimal multipliers λ (a linear system), we get the optimal solution x (thanks to feasibility and weak duality)!
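A small numerical sketch of this observation (not from the slides; the data below are arbitrary): for Q ≻ 0 the optimal multipliers and the optimal x come from a single linear (KKT) system.

```python
import numpy as np

# min (1/2) x^T Q x + c^T x  s.t.  A x = b, with Q positive definite (hypothetical data)
Q = np.array([[3.0, 0.5], [0.5, 2.0]])
c = np.array([-1.0, 1.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

n, m = Q.shape[0], A.shape[0]
K = np.block([[Q, A.T], [A, np.zeros((m, m))]])   # KKT matrix
rhs = np.concatenate([-c, b])
sol = np.linalg.solve(K, rhs)
x, lam = sol[:n], sol[n:]

print("x* =", x, "  A x* - b =", A @ x - b)               # feasibility
print("stationarity residual:", Q @ x + c + A.T @ lam)    # should be ~0
```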

Optimality Conditions – p. 58

Properties of the Lagrange dual

For any problem

f ⋆ = min f(x)

gi(x) ≤ 0 i = 1, . . . ,m

x ∈ X

where X is non empty and compact, if f and the g_i are continuous then the Lagrange dual function is concave

Optimality Conditions – p. 59

Proof.

From Weierstrass’ theorem

θ(λ) = min_{x∈X} f(x) + λ^T g(x)

exists and is finite.

θ(ηa + (1 − η)b) = min_{x∈X} (f(x) + (ηa + (1 − η)b)^T g(x))
                 = min_{x∈X} (η(f(x) + a^T g(x)) + (1 − η)(f(x) + b^T g(x)))
                 ≥ η min_{x∈X} (f(x) + a^T g(x)) + (1 − η) min_{x∈X} (f(x) + b^T g(x))
                 = η θ(a) + (1 − η) θ(b).

Optimality Conditions – p. 60


Solution of the Lagrange dual

max_λ θ(λ) = max_λ min_{x∈X} (f(x) + λ^T g(x))

is equivalent to

max z

z ≤ f(x) + λ^T g(x)  ∀ x ∈ X

λ ≥ 0

After having computed f and g at x_1, x_2, . . . , x_k, a restricted dual can be defined:

max z

z ≤ f(x_j) + λ^T g(x_j)  ∀ j = 1, . . . , k

λ ≥ 0

Optimality Conditions – p. 61

. . .

Let λ̄ be the optimal solution of the restricted dual, with value z̄. Is it an optimal dual solution? Is it true that z̄ ≤ f(x) + λ̄^T g(x) for all x ∈ X? Check: we look for x̄, an optimal solution of

min_{x∈X} f(x) + λ̄^T g(x)

if f(x̄) + λ̄^T g(x̄) ≥ z̄, then we have found the optimal solution of the dual;

otherwise x̄ is added to the restricted dual (a new cut z ≤ f(x̄) + λ^T g(x̄)) and a new solution is computed.

Optimality Conditions – p. 62

Geometric programming

Unconstrained geometric program:

min_{x>0} ∑_{k=1}^{m} c_k ∏_{j=1}^{n} x_j^{α_kj},   α_kj ∈ R, c_k > 0

(non convex). Variable substitution:

x_j = exp(y_j),   y_j ∈ R

Optimality Conditions – p. 63

Transformed problem:

min_y ∑_{k=1}^{m} ( c_k ∏_{j=1}^{n} e^{α_kj y_j} ) = min_y ∑_{k=1}^{m} e^{α_k^T y + β_k},   β_k = log c_k

still non convex, but its logarithm is convex.

Optimality Conditions – p. 64


Duality example

Dual of

min f(x) := min log ∑_{k=1}^{m} exp(α_k^T x + β_k)

No constraints ⇒ the dual Lagrange function is identical to f(x)! Strong duality holds, but is useless.
Simple transformation:

min log ∑_{k=1}^{m} exp y_k

y_k = α_k^T x + β_k

Optimality Conditions – p. 65

solving the dual

Dual function

L(λ) = min_{x,y} log ∑_{k=1}^{m} exp y_k + λ^T (Ax + β − y)

Minimization in x is unconstrained: min λ^T A x ⇒ if λ^T A ≠ 0, L(λ) is unbounded;

if λ^T A = 0 then

L(λ) = min_y log ∑_{k=1}^{m} exp y_k + λ^T (β − y)

Optimality Conditions – p. 66

First order (unconstrained) optimality conditions w.r.t. y_i:

exp y_i / ∑_k exp y_k − λ_i = 0

⇒ Lagrange multipliers exist provided that

∑_i λ_i = 1,   λ_i > 0 ∀ i

Optimality Conditions – p. 67

Substituting λ_j = exp y_j / ∑_k exp y_k (and leaving aside the constant term λ^T β),

L(λ) − λ^T β = log ∑_j exp y_j − ∑_j λ_j y_j
             = log ∑_j exp y_j − ∑_j y_j exp y_j / ∑_k exp y_k
             = (1/∑_k exp y_k) ∑_k exp y_k ( log ∑_j exp y_j − y_k )
             = ∑_k ( (exp y_k / ∑_j exp y_j) ( log ∑_j exp y_j − y_k ) )
             = −∑_k λ_k log λ_k

Optimality Conditions – p. 68


Lagrange Dual

The Lagrange dual becomes:

max_λ β^T λ − ∑_k λ_k log λ_k

∑_k λ_k = 1

A^T λ = 0

λ ≥ 0

Optimality Conditions – p. 69

Special cases: linear constraints

min f(x)

Ax ≥ b

Lagrange function:

L(x, λ) = f(x) + λT (b − Ax)

Constraint qualifications always hold (polyhedron). If x⋆ is a local optimum there exists λ⋆ ≥ 0:

Ax⋆ ≥ b

∇f(x⋆) = A^T λ⋆

(λ⋆)^T (b − Ax⋆) = 0

Optimality Conditions – p. 70

Non negativity constraints

min f(x)

x ≥ 0

Lagrange function: L(x, λ) = f(x) − λT x. KKT conditions:

∇f(x⋆) = λ⋆

x⋆ ≥ 0

λ⋆ ≥ 0

(λ⋆)T x⋆ = 0

Optimality Conditions – p. 71

λ⋆_j = ∂f(x⋆)/∂x_j,   j = 1, . . . , n

from which

∂f(x⋆)/∂x_j = 0  ∀ j : x⋆_j > 0

∂f(x⋆)/∂x_j ≥ 0  otherwise

Optimality Conditions – p. 72


Box constraints

min f(x)

ℓ ≤ x ≤ u,   ℓ_i < u_i ∀ i

Lagrange function: L(x, λ, µ) = f(x) + λ^T (ℓ − x) + µ^T (x − u). KKT conditions:

∇f(x⋆) = λ⋆ − µ⋆

(ℓ − x⋆)^T λ⋆ = 0

(x⋆ − u)^T µ⋆ = 0

(λ⋆, µ⋆) ≥ 0

Given x⋆ let J_ℓ = {j : x⋆_j = ℓ_j}, J_u = {j : x⋆_j = u_j}, J_0 = {j : ℓ_j < x⋆_j < u_j}

Optimality Conditions – p. 73

Box constr. (cont)

Then, from complementarity,

∂f(x⋆)/∂x_j = λ⋆_j,    j ∈ J_ℓ

∂f(x⋆)/∂x_j = −µ⋆_j,   j ∈ J_u

∂f(x⋆)/∂x_j = 0,       j ∈ J_0

Optimality Conditions – p. 74

Thus

∂f(x⋆)/∂x_j ≥ 0,   j ∈ J_ℓ

∂f(x⋆)/∂x_j ≤ 0,   j ∈ J_u

∂f(x⋆)/∂x_j = 0,   j ∈ J_0

with feasibility ℓ ≤ x⋆ ≤ u

Optimality Conditions – p. 75

Optimization over the simplex

min f(x)

1T x = 1

x ≥ 0

Lagrange function: L(x, λ, µ) = f(x) − λ^T x + µ(1^T x − 1). KKT:

∇f(x⋆) = λ⋆ − µ⋆ 1

1^T x⋆ = 1

(x⋆, λ⋆) ≥ 0

(λ⋆)^T x⋆ = 0

Optimality Conditions – p. 76


simplex. . .

∂f(x⋆)/∂x_j − λ⋆_j = −µ⋆

(all equal). Thus, from complementarity, if x⋆_j > 0 then λ⋆_j = 0 and ∂f(x⋆)/∂x_j = −µ⋆; otherwise ∂f(x⋆)/∂x_j ≥ −µ⋆. Thus, if j : x⋆_j > 0,

∂f(x⋆)/∂x_j ≤ ∂f(x⋆)/∂x_k  ∀ k

Optimality Conditions – p. 77

Application: Min var portfolio

Given n assets with random returns R_1, . . . , R_n, how to invest 1 € in such a way that the resulting portfolio has minimum variance? If x_j denotes the percentage of the investment in asset j, how to compute the variance of this portfolio P(x)?

Var = E(P(x) − E(P(x)))²
    = E( ∑_{j=1}^{n} (R_j − E(R_j)) x_j )²
    = ∑_{i,j} E[(R_i − E(R_i))(R_j − E(R_j))] x_i x_j
    = x^T Q x

where Q is the variance-covariance matrix of the n assets.

Optimality Conditions – p. 78

Min var portfolio

Problem (objective multiplied by 1/2 for simpler computations):

min (1/2) x^T Q x

1^T x = 1

x ≥ 0
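A minimal sketch of this model (not from the slides; the covariance matrix below is an arbitrary example and SciPy's SLSQP solver is used as a stand-in for a QP solver):

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical variance-covariance matrix of 3 assets
Q = np.array([[0.10, 0.02, 0.00],
              [0.02, 0.08, 0.01],
              [0.00, 0.01, 0.05]])
n = Q.shape[0]

res = minimize(lambda x: 0.5 * x @ Q @ x, x0=np.ones(n) / n,
               jac=lambda x: Q @ x,
               bounds=[(0.0, None)] * n,
               constraints=[{"type": "eq", "fun": lambda x: x.sum() - 1.0}],
               method="SLSQP")
x = res.x
print("weights      :", np.round(x, 4))
# KKT property from the next slide: marginal risks (Qx)_i are (nearly) equal
# for all assets held with positive weight
print("marginal risk:", np.round(Q @ x, 4))
```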

Optimality Conditions – p. 79

Optimal portfolio

KKT: for all i such that x⋆_i > 0:

∑_j Q_ij x⋆_j ≤ ∑_j Q_kj x⋆_j  ∀ k

The vector Qx might be thought of as the vector of marginal contributions to the total risk (which is a weighted sum of the elements of Qx). Thus in the optimal portfolio, all assets with positive level give an equal (and minimal) contribution to the total risk.

Optimality Conditions – p. 80


Algorithms for unconstrained local optimization

Fabio Schoen

2008

http://gol.dsi.unifi.it/users/schoen

Algorithms for unconstrained local optimization – p. 1

Optimization Algorithms

Most common form for optimization algorithms: line search–based methods.
Given a starting point x_0, a sequence is generated:

x_{k+1} = x_k + α_k d_k

where d_k ∈ R^n is the search direction and α_k > 0 the step

Usually d_k is chosen first and then the step is obtained, often from a 1–dimensional optimization

Algorithms for unconstrained local optimization – p. 2

Trust-region algorithms

A model m(x) and a confidence region U(x_k) containing x_k are defined. The new iterate is chosen as the solution of the constrained optimization problem

min_{x ∈ U(x_k)} m(x)

The model and the confidence region are possibly updated at each iteration.

Algorithms for unconstrained local optimization – p. 3

Speed measures

Let x⋆ be a local optimum. The error at x_k might be measured e.g. as

e(x_k) = ‖x_k − x⋆‖   or   e(x_k) = |f(x_k) − f(x⋆)|.

Given x_k → x⋆, if ∃ q > 0, β ∈ (0, 1) such that (for k large enough)

e(x_k) ≤ q β^k

⇒ x_k is linearly convergent, or converges with order 1; β: convergence rate.
A sufficient condition for linear convergence:

lim sup e(x_{k+1}) / e(x_k) ≤ β

Algorithms for unconstrained local optimization – p. 4


super–linear convergence

If for every β ∈ (0, 1) there exists q:

e(x_k) ≤ q β^k

then convergence is super–linear.
Sufficient condition:

lim sup e(x_{k+1}) / e(x_k) = 0

Algorithms for unconstrained local optimization – p. 5

Higher order convergence

If, given p > 1, ∃ q > 0, β ∈ (0, 1):

e(x_k) ≤ q β^(p^k)

then x_k is said to converge with order at least p.
If p = 2 ⇒ quadratic convergence. Sufficient condition:

lim sup e(x_{k+1}) / e(x_k)^p < ∞

Algorithms for unconstrained local optimization – p. 6

Examples

1/k converges to 0 with order one (linear convergence)

1/k² converges to 0 with order 1

2^{−k} converges to 0 with order 1

k^{−k} converges to 0 with order 1; convergence is super–linear

1/2^{2^k} converges to 0 with order 2: quadratic convergence

Algorithms for unconstrained local optimization – p. 7

Descent directions and the gradient

Let f ∈ C¹(R^n), x_k ∈ R^n : ∇f(x_k) ≠ 0.
Let d ∈ R^n. If

d^T ∇f(x_k) < 0

then d is a descent direction.
Taylor expansion:

f(x_k + αd) − f(x_k) = α d^T ∇f(x_k) + o(α)

(f(x_k + αd) − f(x_k))/α = d^T ∇f(x_k) + o(1)

Thus if α is small enough, f(x_k + αd) − f(x_k) < 0.

NB: d might be a descent direction even if d^T ∇f(x_k) = 0

Algorithms for unconstrained local optimization – p. 8


Convergence of line search methods

If a sequence x_{k+1} = x_k + α_k d_k is generated in such a way that:

L_0 = {x : f(x) ≤ f(x_0)} is compact

d_k ≠ 0 whenever ∇f(x_k) ≠ 0

f(x_{k+1}) ≤ f(x_k)

if ∇f(x_k) ≠ 0 ∀ k then

lim_{k→∞} (d_k^T/‖d_k‖) ∇f(x_k) = 0

Algorithms for unconstrained local optimization – p. 9

if d_k ≠ 0 then

|d_k^T ∇f(x_k)| / ‖d_k‖ ≥ σ(‖∇f(x_k)‖)

where σ is such that lim_{k→∞} σ(t_k) = 0 ⇒ lim_{k→∞} t_k = 0

(σ is called a forcing function)

Algorithms for unconstrained local optimization – p. 10

Then either there exists a finite index k such that ∇f(x_k) = 0, or otherwise

x_k ∈ L_0 and all of its limit points are in L_0

f(x_k) admits a limit

lim_{k→∞} ∇f(x_k) = 0

for every limit point x̄ of x_k we have ∇f(x̄) = 0

Algorithms for unconstrained local optimization – p. 11

Comments on the assumptions

f(x_{k+1}) ≤ f(x_k): most optimization methods choose d_k as a descent direction. If d_k is a descent direction, choosing α_k “sufficiently small” ensures the validity of the assumption.

lim_{k→∞} (d_k^T/‖d_k‖) ∇f(x_k) = 0: given a normalized direction d_k, the scalar product d_k^T ∇f(x_k) is the directional derivative of f along d_k: it is required that this goes to zero. This can be achieved through precise line searches (choosing the step so that f is minimized along d_k).

|d_k^T ∇f(x_k)| / ‖d_k‖ ≥ σ(‖∇f(x_k)‖): letting, e.g., σ(t) = c t, c > 0, if d_k : d_k^T ∇f(x_k) < 0 then the condition becomes

d_k^T ∇f(x_k) / (‖d_k‖ ‖∇f(x_k)‖) ≤ −c

Algorithms for unconstrained local optimization – p. 12


Recalling that

cos θ_k = d_k^T ∇f(x_k) / (‖d_k‖ ‖∇f(x_k)‖)

the condition becomes

cos θ_k ≤ −c

that is, the angle between d_k and ∇f(x_k) is bounded away from orthogonality.

[figure: the angle θ_k between d_k and ∇f(x_k)]

Algorithms for unconstrained local optimization – p. 13

Gradient Algorithms

General scheme:

x_{k+1} = x_k − α_k D_k ∇f(x_k)

with D_k ≻ 0 and α_k > 0.
If ∇f(x_k) ≠ 0 then

d_k = −D_k ∇f(x_k)

is a descent direction. In fact

d_k^T ∇f(x_k) = −∇^T f(x_k) D_k ∇f(x_k) < 0

Algorithms for unconstrained local optimization – p. 14

Steepest Descent

or “gradient” method:

D_k := I

i.e. x_{k+1} = x_k − α_k ∇f(x_k). If ∇f(x_k) ≠ 0 then d_k = −∇f(x_k) is a descent direction. Moreover, it is the steepest (w.r.t. the Euclidean norm):

min_{d∈R^n} ∇^T f(x_k) d,   ‖d‖ ≤ 1

[figure: level sets of f and the steepest descent direction −∇f(x_k)]

Algorithms for unconstrained local optimization – p. 16


. . .

min_{d∈R^n} ∇^T f(x_k) d,   √(d^T d) ≤ 1

KKT conditions: in the interior ⇒ ∇^T f(x_k) = 0; if the constraint is active ⇒

∇f(x_k) + λ d/‖d‖ = 0

√(d^T d) = 1

λ ≥ 0

⇒ d = −∇f(x_k)/‖∇f(x_k)‖.

Algorithms for unconstrained local optimization – p. 17

Newton’s method

D_k := (∇²f(x_k))^{−1}

Motivation: Taylor expansion of f:

f(x) ≈ f(x_k) + ∇^T f(x_k)(x − x_k) + (1/2)(x − x_k)^T ∇²f(x_k)(x − x_k)

Minimizing the approximation:

∇f(x_k) + ∇²f(x_k)(x − x_k) = 0

If the Hessian is non singular ⇒

x = x_k − (∇²f(x_k))^{−1} ∇f(x_k)
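A minimal sketch of the pure Newton iteration (not from the slides; the test function, gradient and Hessian below are arbitrary):

```python
import numpy as np

f    = lambda x: (x[0] - 1)**4 + 0.5 * (x[1] + 2)**2 + x[0] * x[1]
grad = lambda x: np.array([4 * (x[0] - 1)**3 + x[1], (x[1] + 2) + x[0]])
hess = lambda x: np.array([[12 * (x[0] - 1)**2, 1.0], [1.0, 1.0]])

x = np.array([3.0, 0.0])                      # starting point
for k in range(50):
    g = grad(x)
    if np.linalg.norm(g) < 1e-10:
        break
    x = x - np.linalg.solve(hess(x), g)       # Newton step: solve, do not invert
print(k, x, f(x))
```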

Algorithms for unconstrained local optimization – p. 18

Step choice

Given d_k, how to choose α_k so that x_{k+1} = x_k + α_k d_k?

“optimal” choice (one-dimensional optimization):

α_k = arg min_{α≥0} f(x_k + α d_k).

An analytical expression of the optimal step is available only in a few cases. E.g. if f(x) = (1/2) x^T Q x + c^T x with Q ≻ 0. Then

f(x_k + α d_k) = (1/2)(x_k + α d_k)^T Q (x_k + α d_k) + c^T (x_k + α d_k)
              = (1/2) α² d_k^T Q d_k + α (Q x_k + c)^T d_k + β

where β does not depend on α.

Algorithms for unconstrained local optimization – p. 19

Minimizing w.r.t. α:

α d_k^T Q d_k + (Q x_k + c)^T d_k = 0 ⇒

α = −(Q x_k + c)^T d_k / (d_k^T Q d_k) = −d_k^T ∇f(x_k) / (d_k^T ∇²f(x_k) d_k)

E.g., in steepest descent:

α_k = ‖∇f(x_k)‖² / (∇^T f(x_k) ∇²f(x_k) ∇f(x_k))
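A sketch of steepest descent with this exact step on a small ill-conditioned quadratic (not from the slides; the data are arbitrary):

```python
import numpy as np

# f(x) = (1/2) x^T Q x with Q > 0; exact step size along -grad f
Q = np.array([[10.0, 0.0],
              [0.0,  1.0]])                   # condition number 10: slow, zig-zagging
x = np.array([1.0, 10.0])
for k in range(500):
    g = Q @ x                                 # gradient
    if np.linalg.norm(g) < 1e-8:
        break
    alpha = (g @ g) / (g @ Q @ g)             # exact minimizer along -g
    x = x - alpha * g
print("iterations:", k, "  x =", x)
```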

Algorithms for unconstrained local optimization – p. 20


Approximate step size

Rules for choosing a step-size (from the sufficient condition for convergence):

f(x_{k+1}) < f(x_k)

lim_{k→∞} (d_k^T/‖d_k‖) ∇f(x_k) = 0

Often it is also required that

‖x_{k+1} − x_k‖ → 0

d_k^T ∇f(x_k + α_k d_k) → 0

In general it is important to ensure a sufficient reduction of f and a sufficiently large step x_{k+1} − x_k

Algorithms for unconstrained local optimization – p. 21

Avoid too large steps

[figure: iterates with too large steps oscillating around the minimum]

Algorithms for unconstrained local optimization – p. 22

Avoid too small steps

[figure: iterates with too small steps stalling before reaching the minimum]

Algorithms for unconstrained local optimization – p. 23

Armijo’s rule

Input: δ ∈ (0, 1), γ ∈ (0, 1/2), ∆_k > 0

α := ∆_k;
while (f(x_k + α d_k) > f(x_k) + γ α d_k^T ∇f(x_k)) do
    α := δ α;
end
return α

Typical values: δ ∈ [0.1, 0.5], γ ∈ [10⁻⁴, 10⁻³].
On exit the returned step is such that

f(x_k + α d_k) ≤ f(x_k) + γ α d_k^T ∇f(x_k)
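A direct sketch of the rule above in Python (not from the slides; parameter defaults are one choice within the typical ranges quoted):

```python
import numpy as np

def armijo_step(f, grad_fk, x_k, d_k, delta=0.5, gamma=1e-4, Delta_k=1.0):
    """Backtracking (Armijo) line search along the descent direction d_k;
    grad_fk is the gradient already evaluated at x_k."""
    alpha = Delta_k
    fk = f(x_k)
    slope = gamma * (d_k @ grad_fk)           # gamma * d_k^T grad f(x_k) < 0 for a descent direction
    while f(x_k + alpha * d_k) > fk + alpha * slope:
        alpha *= delta
    return alpha

# usage on a simple quadratic
f = lambda x: 0.5 * x @ x
x_k = np.array([3.0, -4.0])
g = x_k                                       # gradient of 0.5 ||x||^2 at x_k
print("accepted step:", armijo_step(f, g, x_k, d_k=-g))
```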

Algorithms for unconstrained local optimization – p. 24


[figure: φ(α) = f(x_k + α d_k), the lines α d_k^T ∇f(x_k) and γ α d_k^T ∇f(x_k), and the range of acceptable steps]

Algorithms for unconstrained local optimization – p. 25

Line search in practice

How to choose the initial step size ∆_k?
Let φ(α) = f(x_k + α d_k). A possibility is to choose ∆_k = α⋆, the minimizer of a quadratic approximation to φ(·). Example:

q(α) = c_0 + c_1 α + (1/2) c_2 α²

q(0) = c_0 := f(x_k)

q′(0) = c_1 := d_k^T ∇f(x_k)

Then α⋆ = −c_1/c_2.

Algorithms for unconstrained local optimization – p. 26

Third condition? If an estimate f̄ of the minimum of f(x_k + α d_k) is available ⇒ choose c_2 such that min q(α) = f̄:

min q(α) = q(−c_1/c_2) = c_0 − c_1²/(2c_2) := f̄

c_2 = c_1²/(2(c_0 − f̄))

α⋆ = −c_1/c_2 = 2(f̄ − c_0)/c_1

Algorithms for unconstrained local optimization – p. 27

Thus it is reasonable to start with

∆_k = 2(f̄ − f(x_k)) / (d_k^T ∇f(x_k))

A reasonable estimate of the decrease is f(x_{k−1}) − f(x_k), leading to ∆_k = 2(f(x_k) − f(x_{k−1})) / (d_k^T ∇f(x_k))

Algorithms for unconstrained local optimization – p. 28


Convergence of steepest descent

xk+1 = xk − αk∇f(xk)

If a sufficiently accurate step size is used ⇒ the conditions of the theorem on global convergence are satisfied ⇒ the steepest descent algorithm globally converges to a stationary point.
“Sufficiently accurate” means exact line search or, e.g., Armijo’s rule.

Algorithms for unconstrained local optimization – p. 29

Local analysis of steepest descent

Behaviour of the algorithm when minimizing

f(x) = (1/2) x^T Q x

where Q ≻ 0. (Local and global) optimum: x⋆ = 0. Steepest descent method:

x_{k+1} = x_k − α_k ∇f(x_k) = x_k − α_k Q x_k = (I − α_k Q) x_k

Error (in x) at step k + 1:

‖x_{k+1} − 0‖ = ‖(I − α_k Q) x_k‖ = √(x_k^T (I − α_k Q)² x_k)

Algorithms for unconstrained local optimization – p. 30

Analysis

Let A be symmetric with eigenvalues λ_1 ≤ · · · ≤ λ_n. Then

λ_1 ‖v‖² ≤ v^T A v ≤ λ_n ‖v‖²  ∀ v ∈ R^n

x_k^T (I − α_k Q)² x_k ≤ λ⋆ x_k^T x_k

where λ⋆ is the largest eigenvalue of (I − α_k Q)².

Algorithms for unconstrained local optimization – p. 31

. . .

λ is an eigenvalue of A iff αλ is an eigenvalue of αA

λ is an eigenvalue of A iff 1 + λ is an eigenvalue of I + A

Thus the eigenvalues of (I − α_k Q) are

1 − α_k λ_i

where the λ_i are the eigenvalues of Q. The maximum eigenvalue of (I − α_k Q)² will be:

max{(1 − α_k λ_1)², (1 − α_k λ_n)²}

thus

‖x_{k+1}‖ ≤ √(max{(1 − α_k λ_1)², (1 − α_k λ_n)²}) ‖x_k‖ = max{|1 − α_k λ_1|, |1 − α_k λ_n|} ‖x_k‖

Algorithms for unconstrained local optimization – p. 32


. . .

Eliminating the dependency on α_k:

max{|1 − αλ_1|, |1 − αλ_n|} = max{1 − αλ_1, −1 + αλ_1, 1 − αλ_n, −1 + αλ_n}

[figure: the piecewise-linear functions |1 − αλ_1| and |1 − αλ_n| as functions of α]

Algorithms for unconstrained local optimization – p. 33

. . .

α ≥ 0 and λ_1 ≤ λ_n ⇒

1 − αλ_1 ≥ 1 − αλ_n

−1 + αλ_1 ≤ −1 + αλ_n

and thus

max{|1 − αλ_1|, |1 − αλ_n|} = max{1 − αλ_1, −1 + αλ_n}

Minimum point:

1 − αλ_1 = −1 + αλ_n

i.e.

α⋆ = 2/(λ_1 + λ_n)

Algorithms for unconstrained local optimization – p. 34

Analysis

In the best possible case

‖x_{k+1}‖/‖x_k‖ ≤ |1 − α⋆ λ_1|
              = |1 − 2λ_1/(λ_1 + λ_n)|
              = (λ_n − λ_1)/(λ_n + λ_1)
              = (ρ − 1)/(ρ + 1)

where ρ = λ_n/λ_1 is the condition number of Q.
ρ ≫ 1 (ill–conditioned problem) ⇒ very slow convergence
ρ ≈ 1 ⇒ very fast convergence

Algorithms for unconstrained local optimization – p. 35

Zig–zagging

min (1/2)(x² + M y²)

where M > 0. Optimum: x⋆ = y⋆ = 0. Starting point: (M, 1). Iterates:

[x_{k+1}; y_{k+1}] = [x_k; y_k] − α [x_k; M y_k]

With the optimal step size ⇒

[x_{k+1}; y_{k+1}] = [ M ((M−1)/(M+1))^k ; (−(M−1)/(M+1))^k ]

Algorithms for unconstrained local optimization – p. 36


Convergence is

rapid if M ≈ 1

very slow and “zig–zagging” if M ≫ 1 or M ≪ 1

Slow convergence and zig–zagging are general phenomena (especially when the starting point is near the longest axis of the ellipsoidal level sets)

Algorithms for unconstrained local optimization – p. 37

Zig–zagging

[figure: zig–zagging behaviour of the steepest descent iterates over 100 iterations]

Algorithms for unconstrained local optimization – p. 38

Analysis of Newton’s method

Newton–Raphson method: x_{k+1} = x_k − (∇²f(x_k))^{−1} ∇f(x_k). Let x⋆ be a local optimum. Taylor expansion of ∇f:

∇f(x⋆) = 0 = ∇f(x_k) + ∇²f(x_k)(x⋆ − x_k) + o(‖x⋆ − x_k‖)

If ∇²f(x_k) is non singular and ‖(∇²f(x_k))^{−1}‖ is bounded ⇒

0 = (∇²f(x_k))^{−1} ∇f(x_k) + (x⋆ − x_k) + (∇²f(x_k))^{−1} o(‖x⋆ − x_k‖)
  = x⋆ − x_{k+1} + o(‖x⋆ − x_k‖)

Algorithms for unconstrained local optimization – p. 39

Thus

‖x⋆ − xk+1‖ = o(‖x⋆ − xk‖)

i.e. ‖x⋆−xk+1‖

‖x⋆−xk‖= o(‖x⋆−xk‖)

‖x⋆−xk‖⇒convergence is at least super–linear

Algorithms for unconstrained local optimization – p. 40


Local Convergence of Newton’s Method

Let f ∈ C²(U(x⋆, δ_1)), where U is the ball with radius δ_1 and center x⋆; let ∇²f(x⋆) be non–singular. Then:

1. ∃ δ > 0 : if x_0 ∈ U(x⋆, δ) ⇒ x_k is well defined and converges to x⋆ at least superlinearly.

2. If ∃ δ > 0, L > 0, M > 0 such that

‖∇²f(x) − ∇²f(y)‖ ≤ L ‖x − y‖

and

‖(∇²f(x))^{−1}‖ ≤ M

then, if x_0 ∈ U(x⋆, δ), Newton’s method converges with order at least 2 and

‖x_{k+1} − x⋆‖ ≤ (LM/2) ‖x_k − x⋆‖²

Algorithms for unconstrained local optimization – p. 41

Difficulties

Many things might go wrong:

at some iteration, ∇²f(x_k) might be singular. For example: if x_k belongs to a flat region where f(x) = constant.

even if it is non singular, inverting ∇²f(x_k) or, in any case, solving a linear system with coefficient matrix ∇²f(x_k) may be numerically unstable and computationally demanding

there is no guarantee that ∇²f(x_k) ≻ 0 ⇒ the Newton direction might not be a descent direction

Algorithms for unconstrained local optimization – p. 42

Difficulties

Newton’s method just tries to solve the system

∇f(xk) = 0

and thus might very well be attracted towards a maximum

the method lacks global convergence: it converges only if started “near” a local optimum

Algorithms for unconstrained local optimization – p. 43

Newton–type methods

line search variant: xk+1 = xk − αk (∇2f(xk))−1 ∇f(xk)

Modified Newton method: replace ∇²f(x_k) by (∇²f(x_k) + D_k), where D_k is chosen so that ∇²f(x_k) + D_k is positive definite

Algorithms for unconstrained local optimization – p. 44


Quasi-Newton methods

Consider solving the nonlinear system ∇f(x) = 0. Taylor expansion of the gradient:

∇f(x_k) ≈ ∇f(x_{k+1}) + ∇²f(x_{k+1})(x_k − x_{k+1})

Let B_{k+1} be an approximation of the Hessian at x_{k+1}. Quasi–Newton equation:

B_{k+1}(x_{k+1} − x_k) = ∇f(x_{k+1}) − ∇f(x_k)

Algorithms for unconstrained local optimization – p. 45

Quasi–Newton equation

Let:

s_k := x_{k+1} − x_k,   y_k := ∇f(x_{k+1}) − ∇f(x_k)

Quasi–Newton equation: B_{k+1} s_k = y_k. If B_k was the previous approximate Hessian, we ask that

1. the variation between B_k and B_{k+1} is “small”

2. nothing changes along directions which are orthogonal to the step s_k:

B_k z = B_{k+1} z  ∀ z : z^T s_k = 0

Choosing n − 1 linearly independent vectors z orthogonal to s_k ⇒ n² linearly independent equations in n² unknowns ⇒ ∃ a unique solution.

Algorithms for unconstrained local optimization – p. 46

Broyden updating

It can be shown that the unique solution is given by:

B_{k+1} = B_k + (y_k − B_k s_k) s_k^T / (s_k^T s_k)

Theorem: let B_k ∈ R^{n×n} and s_k ≠ 0. The unique solution of:

min_B ‖B_k − B‖_F,   B s_k = y_k

is Broyden’s update B_{k+1}; here ‖X‖_F = √(Tr X^T X) denotes the Frobenius norm.

Algorithms for unconstrained local optimization – p. 47

proof

For any feasible B (i.e. B s_k = y_k):

‖B_{k+1} − B_k‖_F = ‖(y_k − B_k s_k) s_k^T / (s_k^T s_k)‖_F
                  = ‖(B s_k − B_k s_k) s_k^T / (s_k^T s_k)‖_F
                  = ‖(B − B_k) s_k s_k^T / (s_k^T s_k)‖_F
                  ≤ ‖B − B_k‖_F ‖s_k s_k^T‖_F / (s_k^T s_k)
                  = ‖B − B_k‖_F √(Tr(s_k s_k^T s_k s_k^T)) / (s_k^T s_k)
                  = ‖B − B_k‖_F (s_k^T s_k) / (s_k^T s_k)
                  = ‖B − B_k‖_F

Uniqueness is a consequence of the strict convexity of the norm and the convexity of the feasible region.

Algorithms for unconstrained local optimization – p. 48


Quasi-Newton and optimization

Special situation:

1. the Hessian matrix in optimization problems is symmetric;

2. in gradient methods, when we let x_{k+1} = x_k − (B_{k+1})^{−1} ∇f(x_k), it is desirable that B_{k+1} be positive definite.

Broyden’s update:

B_{k+1} = B_k + (y_k − B_k s_k) s_k^T / (s_k^T s_k)

is generally not symmetric even if B_k is.

Algorithms for unconstrained local optimization – p. 49

Symmetry

Remedy: let C_1 = B_k + (y_k − B_k s_k) s_k^T / (s_k^T s_k); symmetrization:

C_2 = (1/2)(C_1 + C_1^T)

However, C_2 does not satisfy the Quasi–Newton equation. Broyden update of C_2:

C_3 = C_2 + (y_k − C_2 s_k) s_k^T / (s_k^T s_k)

which is not symmetric, . . .

Algorithms for unconstrained local optimization – p. 50

PBS update

In the limit

B_{k+1} = B_k + ((y_k − B_k s_k) s_k^T + s_k (y_k − B_k s_k)^T) / (s_k^T s_k)
              − (s_k^T (y_k − B_k s_k)) s_k s_k^T / (s_k^T s_k)²

(PBS – Powell-Broyden-Symmetric update).
Imposing also hereditary positive definiteness, DFP (Davidon-Fletcher-Powell) is obtained:

B_{k+1} = B_k + ((y_k − B_k s_k) y_k^T + y_k (y_k − B_k s_k)^T) / (y_k^T s_k)
              − (s_k^T (y_k − B_k s_k)) y_k y_k^T / (y_k^T s_k)²
        = (I − y_k s_k^T / (y_k^T s_k)) B_k (I − s_k y_k^T / (y_k^T s_k)) + y_k y_k^T / (y_k^T s_k)

Algorithms for unconstrained local optimization – p. 51

BFGS

Same ideas, but applied to the approximate inverse Hessian. The inverse Quasi–Newton equation

sk = Hk+1 yk

leads to the most common Quasi–Newton update, BFGS (Broyden–Fletcher–Goldfarb–Shanno):

Hk+1 = (I − sk yk^T / (yk^T sk)) Hk (I − yk sk^T / (yk^T sk)) + sk sk^T / (yk^T sk)

Algorithms for unconstrained local optimization – p. 52


BFGS method

xk+1 = xk − αk Hk ∇f(xk)

Hk+1 = (I − sk yk^T / (yk^T sk)) Hk (I − yk sk^T / (yk^T sk)) + sk sk^T / (yk^T sk)

yk = ∇f(xk+1) − ∇f(xk)

sk = xk+1 − xk

Algorithms for unconstrained local optimization – p. 53
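As a concrete illustration, here is a minimal Python/NumPy sketch of a BFGS iteration on the inverse Hessian approximation Hk; the Armijo backtracking constants and the curvature safeguard are illustrative assumptions, not part of the slides.

import numpy as np

def bfgs(f, grad, x0, tol=1e-8, max_iter=200):
    """Minimal BFGS on the inverse-Hessian approximation H_k (a sketch)."""
    n = len(x0)
    x, H = np.asarray(x0, dtype=float), np.eye(n)
    g = grad(x)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        d = -H.dot(g)                               # quasi-Newton direction
        alpha = 1.0
        while f(x + alpha * d) > f(x) + 1e-4 * alpha * g.dot(d):
            alpha *= 0.5                            # Armijo backtracking
        x_new = x + alpha * d
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        if y.dot(s) > 1e-12:                        # curvature condition keeps H positive definite
            rho = 1.0 / y.dot(s)
            V = np.eye(n) - rho * np.outer(s, y)
            H = V.dot(H).dot(V.T) + rho * np.outer(s, s)   # the BFGS update shown above
        x, g = x_new, g_new
    return x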

Trust Region methods

Possible defect of the standard Newton method: the approximation becomes less and less precise as we move away from the current point. Long step ⇒ bad approximation. Idea: constrained minimization of the quadratic approximation:

xk+1 = arg min_{‖x − xk‖ ≤ ∆k} mk(x), where

mk(x) = f(xk) + ∇f(xk)^T (x − xk) + (1/2) (x − xk)^T ∇2f(xk) (x − xk)

∆k > 0 is a parameter. First advantage (over pure Newton): the step is always well defined (thanks to Weierstrass’s theorem).

Algorithms for unconstrained local optimization – p. 54

Outline of Trust Region

Let mk(·) be a local model function. E.g., in Newton Trust Region methods

mk(s) = f(xk) + s^T ∇f(xk) + (1/2) s^T ∇2f(xk) s

or, in a Quasi-Newton Trust Region method,

mk(s) = f(xk) + s^T ∇f(xk) + (1/2) s^T Bk s

Algorithms for unconstrained local optimization – p. 55

How to choose and update the trust region radius ∆k? Given a step sk, let

ρk = (f(xk) − f(xk + sk)) / (mk(0) − mk(sk))

be the ratio between the actual reduction and the predicted reduction.

Algorithms for unconstrained local optimization – p. 56

Page 49: Nonlinear Programming Models Fabio Schoen Introductionfor all x,y ∈ Ω,λ ∈ [0,1] Nonlinear Programming Models – p. 5 Convex Functions x y Nonlinear Programming Models – p.

Model updating

ρk = (f(xk) − f(xk + sk)) / (mk(0) − mk(sk))

The predicted reduction is always non negative;

if ρk is small (surely if it is negative) the model and the function strongly disagree ⇒ the step must be rejected and the trust region reduced

if ρk ≥ 1 it is safe to expand the trust region

intermediate ρk values lead us to keep the region unchanged

Algorithms for unconstrained local optimization – p. 57

Algorithm

Data: ∆max > 0, ∆0 ∈ (0, ∆max), η ∈ [0, 1/4]
for k = 0, 1, . . . do
    Find the step sk by minimizing the model in the trust region and compute ρk;
    if ρk < 1/4 then
        ∆k+1 = ∆k/4;
    else
        if ρk > 3/4 and ‖sk‖ = ∆k then
            ∆k+1 = min{2∆k, ∆max};
        else
            ∆k+1 = ∆k;
        end
    end
    if ρk > η then
        xk+1 = xk + sk;
    else
        xk+1 = xk;
    end
end

Algorithms for unconstrained local optimization – p. 58

Solving the model

How to find

min_s ∇f(xk)^T s + (1/2) s^T Bk s
s.t. ‖s‖ ≤ ∆

If Bk ≻ 0, KKT conditions are necessary and sufficient; rewriting the constraint as s^T s ≤ ∆² ⇒

∇f(xk) + Bk s + 2λ s = 0
λ(∆ − ‖s‖) = 0

Algorithms for unconstrained local optimization – p. 59

Thus either s is in the interior of the ball with radius ∆, in which case λ = 0 and we have the (quasi-)Newton step

p = −Bk^{-1} ∇f(xk)

or ‖s‖ = ∆ and, if λ > 0, then 2λ s = −∇f(xk) − Bk s = −∇mk(s) ⇒ s is parallel to the negative gradient of the model and normal to its contour lines.

Algorithms for unconstrained local optimization – p. 60


The Cauchy Point

Strategy to approximately solve the trust region sub-problem. Find the “Cauchy point”: the minimizer of mk along the direction −∇f(xk) within the trust region. First find the direction:

pk^s = arg min_p f(xk) + ∇f(xk)^T p
       s.t. ‖p‖ ≤ ∆k

Then, along this direction, find a step length:

τk = arg min_{τ ≥ 0} mk(τ pk^s)
     s.t. ‖τ pk^s‖ ≤ ∆k

The Cauchy point is xk + τk pk^s.

Algorithms for unconstrained local optimization – p. 61

Finding the Cauchy point

Finding pk^s is easy; the analytic solution is

pk^s = −(∆k / ‖∇f(xk)‖) ∇f(xk)

For the step size τk:

if ∇f(xk)^T Bk ∇f(xk) ≤ 0 ⇒ negative curvature direction ⇒ take the largest possible step ⇒ τk = 1

otherwise the model along the line is strictly convex, so

τk = min{1, ‖∇f(xk)‖³ / (∆k ∇f(xk)^T Bk ∇f(xk))}

Choosing the Cauchy point gives global but extremely slow convergence (similar to steepest descent). Usually an improved point is searched for, starting from the Cauchy one.

Algorithms for unconstrained local optimization – p. 62
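A minimal NumPy sketch of the Cauchy point computation above; the function name is an assumption of this note (g stands for ∇f(xk), B for Bk).

import numpy as np

def cauchy_step(g, B, delta):
    """Cauchy step for the model m(s) = g^T s + 0.5 s^T B s with ||s|| <= delta."""
    gnorm = np.linalg.norm(g)
    ps = -(delta / gnorm) * g          # steepest-descent direction scaled to the radius
    curv = g.dot(B.dot(g))             # g^T B g
    if curv <= 0:
        tau = 1.0                      # negative curvature: go all the way to the boundary
    else:
        tau = min(1.0, gnorm**3 / (delta * curv))
    return tau * ps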

Derivative Free Optimization

Algorithms for unconstrained local optimization – p. 63

Pattern Search

For smooth optimization, but without knowledge of derivatives. Elementary idea: if x ∈ R^2 is not a local minimum for f, then at least one of the directions e1, e2, −e1, −e2 (moving towards E, N, W, S) forms an acute angle with −∇f(x) ⇒ it is a descent direction. Direct search: explore all the directions in search of one which gives a descent.

Algorithms for unconstrained local optimization – p. 64


Coordinate search

Let D⊕ = {±ei} be the set of coordinate directions and their opposites.

Data: k = 0, ∆0 an initial step length, x0 a starting point
while ∆k is large enough do
    if f(xk + ∆k d) < f(xk) for some d ∈ D⊕ then
        xk+1 = xk + ∆k d (step accepted);
    else
        ∆k+1 = 0.5 ∆k;
    end
    k = k + 1;
end

Algorithms for unconstrained local optimization – p. 65
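A direct Python transcription of the coordinate search scheme above; the stopping tolerance and iteration cap are illustrative assumptions.

import numpy as np

def coordinate_search(f, x0, delta0=1.0, delta_min=1e-6, max_iter=10000):
    """Coordinate search over the directions ±e_i, halving the step on failure."""
    x, delta = np.asarray(x0, dtype=float), delta0
    n = len(x)
    directions = [s * e for e in np.eye(n) for s in (+1.0, -1.0)]
    for _ in range(max_iter):
        if delta < delta_min:
            break
        for d in directions:
            if f(x + delta * d) < f(x):
                x = x + delta * d          # step accepted
                break
        else:
            delta *= 0.5                   # no improving direction: shrink the step
    return x

# Example (illustrative): minimize a simple quadratic without derivatives
print(coordinate_search(lambda z: (z[0] - 1)**2 + 3*(z[1] + 2)**2, [0.0, 0.0]))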

Pattern search

It is not necessary to explore 2n directions. It is sufficient that the set of directions forms a positive span, i.e. every v ∈ R^n should be expressible as a non negative linear combination of the vectors in the set. Formally, G is a generating set iff

∀ v ≠ 0 ∈ R^n ∃ g ∈ G : v^T g > 0

A “good” generating set should be characterized by a sufficiently high cosine measure:

κ(G) := min_{v ≠ 0} max_{d ∈ G} v^T d / (‖v‖ ‖d‖)

Algorithms for unconstrained local optimization – p. 66

Examples

[Figure: three examples of generating sets in the plane]

In the first case κ ≈ 0.19612, in the second κ = 0.5, in the third κ = √0.5 ≈ 0.7071.

Algorithms for unconstrained local optimization – p. 67

Step Choice

xk+1 = xk + ∆k dk   if f(xk + ∆k dk) < f(xk) − ρ(∆k)   (success)
xk+1 = xk           otherwise   (failure)

where ρ(t) = o(t). We let

∆k+1 = φk ∆k

where φk ≥ 1 for successful iterations, φk < 1 otherwise. Direct methods possess good convergence properties.

Algorithms for unconstrained local optimization – p. 68


[Figures: four snapshots of a direct search run (slides 69–72)]


Nelder-Mead Simplex

Given a simplex S = {v1, . . . , vn+1} in R^n, let vr be the worst point: r = arg max_i f(vi). Let C be the centroid of S \ {vr}:

C = (Σ_{i≠r} vi) / n

The algorithm performs a sort of line search along the direction C − vr. Let

R = C + (C − vr)

be the reflection of the worst point along this direction, and let fbest be the best function value in the current simplex. Three cases might occur:

Algorithms for unconstrained local optimization – p. 73

1: Reflection

Check f(R): if it is intermediate, i.e. better than the worst and worse than the best, then accept the reflection, i.e. discard the worst point of the simplex and replace it with R.

Algorithms for unconstrained local optimization – p. 74

Reflection step

[Figure: reflection step, with the worst vertex reflected through the centroid of the remaining vertices]

Algorithms for unconstrained local optimization – p. 75

2: improvement

If the trial step is an improvement, i.e.

f(R) < fbest

then attempt an expansion: try the expanded point E = R + (R − C). If successful (f(E) < f(R)), accept the expansion and discard the worst point. If unsuccessful, accept R as the new point and discard the worst one.

Algorithms for unconstrained local optimization – p. 76


Expansion

[Figure: expansion step, with the reflected point pushed further along the same direction]

Algorithms for unconstrained local optimization – p. 77

3: contraction

If however the reflected point R is worse than all points in the simplex (possibly except the worst vr), then a contraction step is performed:

if f(R) ≥ f(vr) (R is worse than every point in the simplex), add 0.5(vr + C) to the simplex and discard vr;

otherwise, if R is better than vr, add 0.5(R + C) to the simplex and discard vr.

Algorithms for unconstrained local optimization – p. 78

Contraction

[Figure: contraction step, with the new point placed between the centroid and the worst (or reflected) vertex]

Algorithms for unconstrained local optimization – p. 79

Nelder-Mead is not a direct search method (only a single direction at a time is explored). It is widely used by practitioners. However, it may fail to converge to a local minimum. There are examples of strictly convex functions in R^2 on which the method converges to a non-stationary point. The bad convergence properties are connected to the event that the n-dimensional simplex degenerates into a lower dimensional subspace. Moreover, the method has a strong tendency to generate directions which are almost normal to that of the gradient! Convergent variants of the Nelder-Mead method do exist.

Algorithms for unconstrained local optimization – p. 80


Implicit filtering

Let

f(x) = h(x) + w(x)

where h(x) is a smooth function, while w(x) can be considered as additive, typically random, noise. The method computes a rough estimate of the gradient (finite differences with a “large” step) and proceeds with an Armijo line search. If unsuccessful, the step used for the finite differences is reduced.

Algorithms for unconstrained local optimization – p. 81

Implicit filtering

Data: εk ↓ 0, parameters δ, γ, ∆ of Armijo’s rule
repeat
    OuterIteration = false;
    repeat
        compute f(xk) and a central finite difference estimate of ∇f(xk):
        ∇εk f(xk) = [(f(xk + εk ei) − f(xk − εk ei)) / (2εk)]_i
        if ‖∇εk f(xk)‖ ≤ εk then
            OuterIteration = true
        else
            Armijo line search along −∇εk f(xk): if successful accept the Armijo step;
            otherwise let OuterIteration = true
        end
    until OuterIteration;
    k = k + 1;
until convergence criterion;
Algorithms for unconstrained local optimization – p. 82
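A small Python helper for the finite difference estimate used in the loop above; the function name is an assumption of this note.

import numpy as np

def fd_gradient(f, x, eps):
    """Central finite-difference gradient estimate with a 'large' step eps,
    as used by implicit filtering; eps is reduced when the line search fails."""
    x = np.asarray(x, dtype=float)
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x)
        e[i] = 1.0
        g[i] = (f(x + eps * e) - f(x - eps * e)) / (2.0 * eps)
    return g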

Convergence properties

If

∇2h(x) is Lipschitz continuous,

the sequence {xk} generated by the method is infinite,

lim_{k→∞} ( εk² + η(xk; εk)/εk ) = 0, where η(x; ε) = sup_{z: ‖z−x‖∞ ≤ ε} |w(z)|,

unsuccessful Armijo steps occur at most a finite number of times,

then all limit points of {xk} are stationary.
Algorithms for unconstrained local optimization – p. 83


Algorithms for constrained local optimization

Fabio Schoen

2008

http://gol.dsi.unifi.it/users/schoen

Algorithms for constrained local optimization – p. 1

Feasible direction methods

Algorithms for constrained local optimization – p. 2

Frank–Wolfe method

Let X be a convex set. Consider the problem

min_{x∈X} f(x)

Let xk ∈ X ⇒ choosing a feasible direction dk corresponds to choosing a point x ∈ X: dk = x − xk. The “steepest descent” choice solves

min_{x∈X} ∇f(xk)^T (x − xk)

(a linear objective with convex constraints, usually easy to solve). Let x̄k be an optimal solution of this problem.

Algorithms for constrained local optimization – p. 3

Frank–Wolfe

If ∇f(xk)^T (x̄k − xk) = 0 then

∇f(xk)^T d ≥ 0

for every feasible direction d ⇒ first order necessary conditions hold. Otherwise, letting dk = x̄k − xk, this is a descent direction along which a step αk ∈ (0, 1] might be chosen according to Armijo’s rule.

Algorithms for constrained local optimization – p. 4
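A minimal NumPy sketch of the Frank–Wolfe iteration above for the special case in which X is a box, so the linear subproblem has a closed-form solution; the diminishing step 2/(k+2) is used in place of Armijo's rule for brevity, an assumption of this sketch.

import numpy as np

def frank_wolfe_box(grad, x0, lower, upper, steps=200):
    """Frank-Wolfe sketch on X = [lower, upper]: the linearized subproblem
    min_{x in X} grad^T (x - x_k) is solved by picking, componentwise, the
    bound indicated by the sign of the gradient."""
    x = np.asarray(x0, dtype=float)
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    for k in range(steps):
        g = grad(x)
        x_bar = np.where(g > 0, lower, upper)   # vertex minimizing the linear model
        d = x_bar - x
        if g.dot(d) >= -1e-10:                  # first-order conditions (approximately) hold
            break
        x = x + 2.0 / (k + 2) * d               # diminishing step (illustrative choice)
    return x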


Convergence of Frank-Wolfe method

Under mild conditions the method converges to a point satisfying first order necessary conditions. However, it is usually extremely slow (convergence may be sub-linear). It might find applications in very large scale problems in which solving the sub-problem for direction determination is very easy (e.g. when X is a polytope).

Algorithms for constrained local optimization – p. 5

Gradient Projection methods

Generic iteration:

xk+1 = xk + αk (x̄k − xk)

where the direction dk = x̄k − xk is obtained by finding

x̄k = [xk − sk ∇f(xk)]+

where sk ∈ R+ and [·]+ denotes the projection onto the feasible set.

Algorithms for constrained local optimization – p. 6

The method is slightly faster than Frank-Wolfe, with a linear convergence rate similar to that of (unconstrained) steepest descent. It might be applied when the projection is relatively cheap, e.g. when the feasible set is a box. A point xk satisfies the first order necessary conditions d^T ∇f(xk) ≥ 0 for all feasible directions d iff

xk = [xk − sk ∇f(xk)]+

Algorithms for constrained local optimization – p. 7
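A NumPy sketch of gradient projection for a box feasible set, where the projection is a componentwise clipping; the fixed s and alpha are illustrative assumptions (in practice alpha_k would come from a line search).

import numpy as np

def projected_gradient_box(grad, x0, lower, upper, s=0.1, alpha=1.0, steps=500):
    """Gradient projection on a box: [.]_+ is implemented with np.clip."""
    lower, upper = np.asarray(lower, float), np.asarray(upper, float)
    x = np.clip(np.asarray(x0, dtype=float), lower, upper)
    for _ in range(steps):
        x_bar = np.clip(x - s * grad(x), lower, upper)   # [x_k - s_k grad f(x_k)]_+
        if np.linalg.norm(x_bar - x) < 1e-10:            # fixed point: first-order conditions hold
            break
        x = x + alpha * (x_bar - x)
    return x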

Lagrange Multiplier Algorithms

Algorithms for constrained local optimization – p. 8


Barrier Methods

min f(x)

gj(x) ≤ 0 j = 1, . . . , r

A barrier is a continuous function which tends to +∞ whenever x approaches the boundary of the feasible region. Examples of barrier functions:

B(x) = −Σ_j log(−gj(x))   (logarithmic barrier)

B(x) = −Σ_j 1/gj(x)   (inverse barrier)

Algorithms for constrained local optimization – p. 9

Barrier Method

Let εk ↓ 0 and x0 be strictly feasible, i.e. gj(x0) < 0 ∀ j. Then let

xk = arg min_{x∈R^n} (f(x) + εk B(x))

Proposition: every limit point of {xk} is a global minimum of the constrained optimization problem.

Algorithms for constrained local optimization – p. 10

Analysis of Barrier methods

Special case: a single constraint (the analysis can be generalized). Let x̄ be a limit point of {xk} (a global minimum). If KKT conditions hold, then there exists a unique λ̄ ≥ 0 such that

∇f(x̄) + λ̄ ∇g(x̄) = 0

(with λ̄ g(x̄) = 0). The point xk, solution of the barrier problem

min f(x) + εkB(x)

g(x) < 0

satisfies

∇f(xk) + εk∇B(xk) = 0

Algorithms for constrained local optimization – p. 11

. . .

If B(x) = φ(g(x)), then

∇f(xk) + εk φ′(g(xk)) ∇g(xk) = 0

In the limit, for k → ∞:

lim εk φ′(g(xk)) ∇g(xk) = λ̄ ∇g(x̄)

if lim_k g(xk) < 0 ⇒ φ′(g(xk)) ∇g(xk) → K (finite) and K εk → 0

if lim_k g(xk) = 0 ⇒ (thanks to the uniqueness of the Lagrange multiplier)

λ̄ = lim_k εk φ′(g(xk))

Algorithms for constrained local optimization – p. 12


Difficulties in Barrier Methods

strong numerical instability: the condition number of the Hessian matrix grows as εk → 0

need for an initial strictly feasible point x0

(partial) remedy: εk is decreased very slowly, and the solution of the (k + 1)-th problem is obtained by starting an unconstrained optimization from xk

Algorithms for constrained local optimization – p. 13

Example

min(x − 1)2 + (y − 1)2

x + y ≤ 1

Logarithmic Barrier problem:

min(x − 1)2 + (y − 1)2 − εk log(1 − x − y)

x + y − 1 < 0

Gradient:

[ 2(x − 1) + εk/(1 − x − y) ,  2(y − 1) + εk/(1 − x − y) ]

Stationary points: x = y = (3 ± √(1 + 4εk))/4 (only the “−” solution is acceptable).

Algorithms for constrained local optimization – p. 14
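A Python sketch of the barrier method applied to the example above; the inner solver (plain gradient descent with backtracking) and its tolerances are illustrative assumptions, not the method of the slides.

import numpy as np

def barrier_example(eps_values=(1.0, 0.1, 0.01, 0.001)):
    """Logarithmic barrier for: min (x-1)^2 + (y-1)^2  s.t.  x + y <= 1."""
    def fval(z, eps):
        s = 1.0 - z[0] - z[1]
        return np.inf if s <= 0 else (z[0]-1)**2 + (z[1]-1)**2 - eps*np.log(s)
    def grad(z, eps):
        t = eps / (1.0 - z[0] - z[1])
        return np.array([2*(z[0]-1) + t, 2*(z[1]-1) + t])
    z = np.array([0.0, 0.0])                        # strictly feasible start
    for eps in eps_values:                          # warm start each barrier subproblem
        for _ in range(500):
            g = grad(z, eps)
            alpha = 1.0
            while fval(z - alpha*g, eps) > fval(z, eps) - 1e-4*alpha*g.dot(g):
                alpha *= 0.5
            z = z - alpha*g
        print(eps, z, (3 - np.sqrt(1 + 4*eps))/4)   # compare with the closed-form x = y
    return z

barrier_example()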

Barrier methods and L.P.

min cT x

Ax = b

x ≥ 0

Logarithmic barrier on x ≥ 0:

min c^T x − ε Σ_j log xj
s.t. Ax = b, x > 0

Algorithms for constrained local optimization – p. 15

The central path

The starting point is usually associated with ε = ∞ and is the unique solution of

min −Σ_j log xj
s.t. Ax = b, x > 0

The trajectory x(ε) of solutions to the barrier problem is called the central path and leads to an optimal solution of the LP.

Algorithms for constrained local optimization – p. 16


Penalty Methods

Penalized problem:

min f(x) + ρP (x)

where ρ > 0 and P(x) ≥ 0 with P(x) = 0 if x is feasible. Example:

min f(x)

hi(x) = 0 i = 1, . . . ,m

A penalized problem might be:

min f(x) + ρ Σ_i hi(x)²

Algorithms for constrained local optimization – p. 17

Convergence of the quadratic penalty method

(for equality constrained problems): let

P(x; ρ) = f(x) + ρ Σ_i hi(x)²

Given ρ0 > 0, x0 ∈ R^n, k = 0, let

xk+1 = arg min_x P(x; ρk)

(found with an iterative method initialized at xk); let ρk+1 > ρk, k := k + 1. If each xk+1 is a global minimizer of P and ρk → ∞, then every limit point of {xk} is a global optimum of the constrained problem.

Algorithms for constrained local optimization – p. 18
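A NumPy sketch of the quadratic penalty scheme above for a single equality constraint; the inner gradient-descent solver, its tolerances and the growth factor for rho are illustrative assumptions.

import numpy as np

def quadratic_penalty(f, grad_f, h, grad_h, x0, rho0=1.0, growth=10.0, outer=6):
    """min f(x) s.t. h(x) = 0 via P(x; rho) = f(x) + rho*h(x)^2 with increasing rho."""
    def P(x, rho):  return f(x) + rho * h(x)**2
    def gP(x, rho): return grad_f(x) + 2.0 * rho * h(x) * grad_h(x)
    x, rho = np.asarray(x0, dtype=float), rho0
    for _ in range(outer):
        for _ in range(500):                         # warm-started inner minimization
            g = gP(x, rho)
            if np.linalg.norm(g) < 1e-8:
                break
            alpha = 1.0
            while P(x - alpha*g, rho) > P(x, rho) - 1e-4*alpha*g.dot(g):
                alpha *= 0.5
            x = x - alpha*g
        rho *= growth
    return x

# Example (illustrative): min x0^2 + x1^2  s.t.  x0 + x1 - 1 = 0  (optimum (0.5, 0.5))
print(quadratic_penalty(lambda z: z.dot(z), lambda z: 2*z,
                        lambda z: z[0] + z[1] - 1, lambda z: np.array([1.0, 1.0]),
                        [0.0, 0.0]))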

Exact penalties

Exact penalties: there exists a finite value of the penalty parameter such that the optimal solution of the penalized problem is the optimal solution of the original one. ℓ1 penalty function:

P1(x; ρ) = f(x) + ρ Σ_i |hi(x)|

Algorithms for constrained local optimization – p. 19

Exact penalties

for inequality constrained problems:

min f(x)

hi(x) = 0

gj(x) ≤ 0

the penalized problem is

P1(x; ρ) = f(x) + ρ Σ_i |hi(x)| + ρ Σ_j max{0, gj(x)}

Algorithms for constrained local optimization – p. 20


Augmented Lagrangian method

Given an equality constrained problem, reformulate it as:

min f(x) + (ρ/2) ‖h(x)‖²
s.t. h(x) = 0

The Lagrange function of this problem is called the Augmented Lagrangian:

Lρ(x; λ) = f(x) + (ρ/2) ‖h(x)‖² + λ^T h(x)

Algorithms for constrained local optimization – p. 21

Motivation

min_x f(x) + (ρ/2) ‖h(x)‖² + λ^T h(x)

∇xLρ(x, λ) = ∇f(x) + Σ_i λi ∇hi(x) + ρ Σ_i hi(x) ∇hi(x)
           = ∇xL(x, λ) + ρ Σ_i hi(x) ∇hi(x)

∇2xxLρ(x, λ) = ∇2f(x) + Σ_i λi ∇2hi(x) + ρ Σ_i hi(x) ∇2hi(x) + ρ ∇h(x) ∇^T h(x)
             = ∇2xxL(x, λ) + ρ Σ_i hi(x) ∇2hi(x) + ρ ∇h(x) ∇^T h(x)

Algorithms for constrained local optimization – p. 22

motivation . . .

Let (x⋆, λ⋆) be an optimal (primal and dual) solution. Necessarily ∇xL(x⋆, λ⋆) = 0; moreover h(x⋆) = 0, thus

∇xLρ(x⋆, λ⋆) = ∇xL(x⋆, λ⋆) + ρ Σ_i hi(x⋆) ∇hi(x⋆) = 0

⇒ (x⋆, λ⋆) is a stationary point of the augmented Lagrangian.

Algorithms for constrained local optimization – p. 23

motivation . . .

Observe that, at (x⋆, λ⋆), where h(x⋆) = 0:

∇2xxLρ(x⋆, λ⋆) = ∇2xxL(x⋆, λ⋆) + ρ Σ_i hi(x⋆) ∇2hi(x⋆) + ρ ∇h(x⋆) ∇^T h(x⋆)
               = ∇2xxL(x⋆, λ⋆) + ρ ∇h(x⋆) ∇^T h(x⋆)

Assume that the sufficient optimality conditions hold:

v^T ∇2xxL(x⋆, λ⋆) v > 0   ∀ v : v^T ∇h(x⋆) = 0

Algorithms for constrained local optimization – p. 24


. . .

Let v ≠ 0 with v^T ∇h(x⋆) = 0. Then

v^T ∇2xxLρ(x⋆, λ⋆) v = v^T ∇2xxL(x⋆, λ⋆) v + ρ v^T ∇h(x⋆) ∇^T h(x⋆) v
                     = v^T ∇2xxL(x⋆, λ⋆) v > 0

Algorithms for constrained local optimization – p. 25

. . .

Let v ≠ 0 with v^T ∇h(x⋆) ≠ 0. Then

v^T ∇2xxLρ(x⋆, λ⋆) v = v^T ∇2xxL(x⋆, λ⋆) v + ρ v^T ∇h(x⋆) ∇^T h(x⋆) v
                     = v^T ∇2xxL(x⋆, λ⋆) v + ρ ‖∇^T h(x⋆) v‖²

which might be negative. However, there exists ρ̄ > 0 such that for ρ ≥ ρ̄ we have v^T ∇2xxLρ(x⋆, λ⋆) v > 0. Thus, if ρ is large enough, the Hessian of the augmented Lagrangian is positive definite and x⋆ is a (strict) local minimum of Lρ(·, λ⋆).

Algorithms for constrained local optimization – p. 26

Inequality constraints

min f(x)

g(x) ≤ 0

Nonlinear transformation of inequalities into equalities:

min_{x,s} f(x)
s.t. gj(x) + sj² = 0,  j = 1, . . . , p

Algorithms for constrained local optimization – p. 27

Given the problem

min f(x)

hi(x) = 0 i = 1,m

gj(x) ≤ 0 j = 1, p

an Augmented Lagrangian problem might be defined as

min_{x,z} Lρ(x, z; λ, µ) = min_{x,z} f(x) + λ^T h(x) + (ρ/2) ‖h(x)‖²
                           + Σ_j µj (gj(x) + zj²) + (ρ/2) Σ_j (gj(x) + zj²)²

Algorithms for constrained local optimization – p. 28


. . .

Consider minimization with respect to the z variables:

min_z Σ_j µj (gj(x) + zj²) + (ρ/2) Σ_j (gj(x) + zj²)²
= min_{u≥0} Σ_j µj (gj(x) + uj) + (ρ/2) Σ_j (gj(x) + uj)²

(a quadratic minimization over the non-negative orthant). Solution:

uj⋆ = max{0, ūj}

where ūj is the unconstrained optimum:

ūj : µj + ρ (gj(x) + ūj) = 0

Algorithms for constrained local optimization – p. 29

. . .

Thus:

uj⋆ = max{0, −µj/ρ − gj(x)}

Substituting:

Lρ(x; λ, µ) = f(x) + λ^T h(x) + (ρ/2) ‖h(x)‖²
            + (1/(2ρ)) Σ_j ( max{0, µj + ρ gj(x)}² − µj² )

This is an Augmented Lagrangian for inequality constrained problems.

Algorithms for constrained local optimization – p. 30
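A Python sketch of a method of multipliers built on the augmented Lagrangian above. The multiplier updates lam += rho*h(x) and mu = max(0, mu + rho*g(x)) are the standard ones and are not derived in the slides; the inner subproblems are handed to an off-the-shelf derivative-free solver (an illustrative choice).

import numpy as np
from scipy.optimize import minimize

def augmented_lagrangian(f, h, g, x0, rho=10.0, outer=15):
    """min f(x) s.t. h(x) = 0 (vector), g(x) <= 0 (vector): a sketch."""
    x = np.asarray(x0, dtype=float)
    lam, mu = np.zeros(len(h(x))), np.zeros(len(g(x)))
    def L(z):
        hv, gv = h(z), g(z)
        return (f(z) + lam.dot(hv) + 0.5*rho*hv.dot(hv)
                + (1.0/(2*rho)) * np.sum(np.maximum(0.0, mu + rho*gv)**2 - mu**2))
    for _ in range(outer):
        x = minimize(L, x, method="Nelder-Mead").x   # inner minimization, warm started
        lam = lam + rho * h(x)                       # standard multiplier updates
        mu = np.maximum(0.0, mu + rho * g(x))
    return x

# Example (illustrative): min (x0-2)^2 + (x1-2)^2  s.t.  x0 + x1 = 2,  x0 <= 0.5
print(augmented_lagrangian(lambda z: (z[0]-2)**2 + (z[1]-2)**2,
                           lambda z: np.array([z[0] + z[1] - 2.0]),
                           lambda z: np.array([z[0] - 0.5]),
                           [0.0, 0.0]))   # expect approximately (0.5, 1.5)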

Sequential Quadratic Programming

min f(x)

hi(x) = 0

Idea: apply Newton’s method to solve the KKT equations. Lagrangian function:

L(x; λ) = f(x) + Σ_i λi hi(x)

Let H(x) = [hi(x)] and ∇H(x) = [∇^T hi(x)] (the constraint Jacobian). KKT conditions:

F[x; λ] = [ ∇f(x) + ∇H^T(x) λ
            H(x)              ] = 0

Algorithms for constrained local optimization – p. 31

Newton step for SQP

Jacobian of the KKT system:

F′(x, λ) = [ ∇2xxL(x; λ)   ∇H^T(x)
             ∇H(x)          0       ]

Newton step:

[ xk+1 ]   [ xk ]   [ dk ]
[ λk+1 ] = [ λk ] + [ ∆k ]

where

[ ∇2xxL(xk; λk)   ∇H^T(xk) ] [ dk ]   [ −∇f(xk) − ∇H^T(xk) λk ]
[ ∇H(xk)          0        ] [ ∆k ] = [ −H(xk)                 ]

Algorithms for constrained local optimization – p. 32
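A NumPy sketch of one SQP/Newton-KKT step, solving the linear system displayed above; the function names are assumptions of this note (hess_L must return the Hessian of the Lagrangian at (x, lam)).

import numpy as np

def sqp_step(grad_f, hess_L, h, jac_h, x, lam):
    """One Newton step on the KKT system of min f(x) s.t. h(x) = 0."""
    g, H = grad_f(x), hess_L(x, lam)
    hv, A = h(x), jac_h(x)                 # A: one row per constraint gradient
    n, m = len(x), len(hv)
    KKT = np.block([[H, A.T],
                    [A, np.zeros((m, m))]])
    rhs = -np.concatenate([g + A.T.dot(lam), hv])
    sol = np.linalg.solve(KKT, rhs)
    return x + sol[:n], lam + sol[n:]

# Example (illustrative): min x0^2 + x1^2 s.t. x0 + x1 - 1 = 0; one step from (0,0)
x, lam = sqp_step(lambda z: 2*z, lambda z, l: 2*np.eye(2),
                  lambda z: np.array([z[0] + z[1] - 1.0]),
                  lambda z: np.array([[1.0, 1.0]]),
                  np.array([0.0, 0.0]), np.array([0.0]))
print(x, lam)   # a quadratic objective with linear constraints is solved in one step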


existence

The Newton step exists if

the Jacobian of the constraint set ∇H(xk) has full row rank

the Hessian ∇2xxL(xk; λk) is positive definite

In this case the Newton step is the unique solution of

∇2xxL(xk; λk) dk + ∇H^T(xk) ∆k + ∇f(xk) + ∇H^T(xk) λk = 0
∇H(xk) dk + H(xk) = 0

Algorithms for constrained local optimization – p. 33

Alternative view: SQP

min_d f(xk) + ∇f(xk)^T d + (1/2) d^T ∇2xxL(xk; λk) d
s.t. ∇H(xk) d + H(xk) = 0

KKT conditions:

∇2xxL(xk; λk) d + ∇f(xk) + ∇H^T(xk) Λk = 0

Under the same conditions as before this QP has a unique solution dk, with Lagrange multipliers Λk = λk+1.

Algorithms for constrained local optimization – p. 34

Alternative view: SQP

min_d L(xk, λk) + ∇xL(xk, λk)^T d + (1/2) d^T ∇2xxL(xk; λk) d
s.t. ∇H(xk) d + H(xk) = 0

KKT conditions:

∇2xxL(xk; λk) d + ∇f(xk) + ∇H^T(xk) λk + ∇H^T(xk) Λk = 0

Under the same conditions as before this QP has a unique solution dk, with Lagrange multipliers Λk = ∆k.

Algorithms for constrained local optimization – p. 35

Thus SQP can be seen as a method which

minimizes a quadratic approximation to the Lagrangian

subject to a first order approximation of the constraints.

Algorithms for constrained local optimization – p. 36


Inequalities

If the original problem is

min f(x)

hi(x) = 0

gj(x) ≤ 0

then the SQP iteration solves

mind

fk + ∇f(xk)T d +

1

2dT∇2

xxL(xk, λk)d

∇Ti hi(xk)p + hi(xk) = 0

∇Tj gj(xk)p + gj(xk) ≤ 0

Algorithms for constrained local optimization – p. 37

Filter Methods

Basic idea:

min f(x)

g(x) ≤ 0

can be considered as a problem with two objectives:

minimize f(x)

minimize g(x)

(the second objective has priority over the first)

Algorithms for constrained local optimization – p. 38

Filter

Given the problem

min f(x)

gj(x) ≤ 0 j = 1, . . . , k

let us consider the bi-criteria optimization problem

min f(x)

min h(x)

where

h(x) = Σ_j max{gj(x), 0}

Algorithms for constrained local optimization – p. 39

Let (fk, hk), k = 1, 2, . . . be the observed values of f and h at the points x1, x2, . . . . A pair (fk, hk) dominates a pair (fℓ, hℓ) iff

fk ≤ fℓ and hk ≤ hℓ

A filter is a list of pairs none of which is dominated by another.

Algorithms for constrained local optimization – p. 40


[Figure: filter entries plotted in the (h(x), f(x)) plane; the non-dominated pairs form the filter]

Algorithms for constrained local optimization – p. 41

Trust region SQP

Consider a Trust-region SQP method:

min_d f(xk) + ∇L(xk; λk)^T d + (1/2) d^T ∇2xxL(xk; λk) d
s.t. ∇^T gj(xk) d + gj(xk) ≤ 0
     ‖d‖∞ ≤ ρ

(the ∞ norm is used here in order to keep the problem a QP). Traditional (unconstrained) trust region methods: if the current step is a failure ⇒ reduce the trust region ⇒ eventually the step will become a pure gradient step ⇒ convergence!

Algorithms for constrained local optimization – p. 42

Trust region SQP

Here, shrinking the trust region radius might lead to infeasible QPs: the linearized constraints

∇^T gj(xk) d + gj(xk) ≤ 0

may have no solution inside the trust region even when the original constraints gj(x) ≤ 0 admit feasible points.

[Figure: a point xk for which the linearized feasible region does not intersect a small trust region]

Algorithms for constrained local optimization – p. 43

Filter methods

Data: x0: starting point, ρ, k = 0
while convergence criterion not satisfied do
    if the QP is infeasible then
        Find xk+1 minimizing the constraint violation;
    else
        Solve the QP and get a step dk; try setting xk+1 = xk + dk;
        if (fk+1, hk+1) is acceptable to the filter then
            Accept xk+1 and add (fk+1, hk+1) to the filter;
            Remove dominated points from the filter;
            Possibly increase ρ;
        else
            Reject the step;
            Reduce ρ;
        end
    end
    set k = k + 1;
end
Algorithms for constrained local optimization – p. 44
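A tiny Python sketch of the filter bookkeeping used in the loop above; real filter methods also add a small margin (envelope) around each stored pair, which is omitted here.

def acceptable_to_filter(f_new, h_new, filter_pairs):
    """A point is acceptable iff no pair already in the filter dominates it."""
    return not any(f_old <= f_new and h_old <= h_new
                   for (f_old, h_old) in filter_pairs)

def add_to_filter(f_new, h_new, filter_pairs):
    """Insert the new pair and drop the pairs it dominates."""
    kept = [(f, h) for (f, h) in filter_pairs
            if not (f_new <= f and h_new <= h)]
    kept.append((f_new, h_new))
    return kept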


Comparison with other methods

[Figure: comparison in the (h(x), f(x)) plane between the steps acceptable to a “classical” merit-function method and the steps rejected by the filter]

Algorithms for constrained local optimization – p. 45


Introduction to Global Optimization
Fabio Schoen

2008

http://gol.dsi.unifi.it/users/schoen

Introduction to Global Optimization – p. 1

Global Optimization Problems

min_{x∈S⊆R^n} f(x)

What is meant by global optimization? Of course we should like to find

f∗ = min_{x∈S⊆R^n} f(x)

and x∗ = arg min f(x), i.e. a point such that f(x∗) ≤ f(x) ∀ x ∈ S

Introduction to Global Optimization – p. 2

This definition is unsatisfactory:

the problem is “ill posed” in x (two objective functions which differ only slightly might have global optima which are arbitrarily far apart)

it is however well posed in the optimal values: ‖f − g‖ ≤ δ ⇒ |f∗ − g∗| ≤ ε

Introduction to Global Optimization – p. 3

Quite often we are satisfied with looking for f∗ and searching for one or more feasible solutions such that

f(x) ≤ f(x∗) + ε

Frequently, however, this is too ambitious a task!

Introduction to Global Optimization – p. 4


Research in Global Optimization

the problem is highly relevant, especially in applications

the problem is very hard (perhaps too much) to solve

there are plenty of publications on global optimization algorithms for specific problem classes

there are only relatively few papers with relevant theoretical content

often, from elegant theories, weak algorithms have been produced and, vice versa, the best computational methods often lack a sound theoretical support

Introduction to Global Optimization – p. 5

many global optimization papers get published in applied research journals

Bazaraa, Sherali, Shetty, “Nonlinear Programming: Theory and Algorithms”, 1993: the words “global optimum” appear for the first time on page 99, the second time at page 132, then at page 247: “A desirable property of an algorithm for solving [an optimization] problem is that it generates a sequence of points converging to a global optimal solution. In many cases however we may have to be satisfied with less favorable outcomes.” After this (in 638 pages) it never appears again. “Global optimization” is never cited.

Introduction to Global Optimization – p. 6

Similar situation in Bertsekas, Nonlinear Programming (1999): 777 pages, but only the definition of global minima and maxima is given! Nocedal & Wright, “Numerical Optimization”, 2nd edition, 2006: “Global solutions are needed in some applications, but for many problems they are difficult to recognize and even more difficult to locate . . . many successful global optimization algorithms require the solution of many local optimization problems, to which the algorithms described in this book can be applied.”

Introduction to Global Optimization – p. 7

Complexity

Global optimization is “hopeless”: without “global” information no algorithm will find a certifiable global optimum unless it generates a dense sample. There exists a rigorous definition of “global” information; some examples:

number of local optima

global optimum value

for global optimization problems over a box, (an upper bound on) the Lipschitz constant

|f(y) − f(x)| ≤ L ‖x − y‖   ∀ x, y

concavity of the objective function + convexity of the feasible region

an explicit representation of the objective function as the difference between two convex functions (+ convexity of the feasible region)

Introduction to Global Optimization – p. 8


Complexity

Global optimization is computationally intractable also according to classical complexity theory. Special case: Quadratic Programming,

min_{l ≤ Ax ≤ u} (1/2) x^T Q x + c^T x

is NP-hard [Sahni, 1974] and, when considered as a decision problem, NP-complete [Vavasis, 1990].

Introduction to Global Optimization – p. 9

Many special cases are still NP–hard:

norm maximization on a parallelotope:

max ‖x‖   s.t.  b ≤ Ax ≤ c

quadratic optimization on a hyper-rectangle (A = I), even when only one eigenvalue of Q is negative

quadratic minimization over a simplex:

min_{x ≥ 0} (1/2) x^T Q x + c^T x   s.t.  Σ_j xj = 1

Even checking that a point is a local optimum is NP-hard.
Introduction to Global Optimization – p. 10

Applications of global optimization

concave minimization – quantity discounts, scale economies

fixed charge

combinatorial optimization – binary linear programming:

min c^T x + K x^T (1 − x)
s.t. Ax = b, x ∈ [0, 1]^n

or:

min c^T x
s.t. Ax = b, x ∈ [0, 1]^n, x^T (1 − x) = 0
Introduction to Global Optimization – p. 11

Minimization of cost functions which are neither convex nor concave. E.g.: finding the minimum energy conformation of complex molecules – Lennard-Jones micro-clusters, protein folding, protein-ligand docking. Example: Lennard-Jones pair potential due to two atoms at X1, X2 ∈ R³:

v(r) = 1/r^12 − 2/r^6

where r = ‖X1 − X2‖. The total energy of a cluster of N atoms located at X1, . . . , XN ∈ R³ is defined as

Σ_{i=1,...,N} Σ_{j<i} v(‖Xi − Xj‖)

This function has a number of local (non global) minima which grows like exp(N).

Introduction to Global Optimization – p. 12
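A direct Python transcription of the cluster energy above; the function name is an assumption of this note.

import numpy as np

def lj_energy(X):
    """Total Lennard-Jones energy of a cluster; X is an (N, 3) array of positions."""
    E = 0.0
    N = X.shape[0]
    for i in range(N):
        for j in range(i):
            r = np.linalg.norm(X[i] - X[j])
            E += 1.0 / r**12 - 2.0 / r**6
    return E

# Example (illustrative): two atoms at the equilibrium distance r = 1 give E = -1
print(lj_energy(np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])))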


Lennard-Jones potential

[Figure: the Lennard-Jones pair potential v(r) together with its attractive and repulsive components]

Introduction to Global Optimization – p. 13

Protein folding and docking

Potential energy model: E = El + Ea + Ed + Ev + Ee where:

El = Σ_{i∈L} (1/2) Kb_i (ri − r0_i)²

(contribution of pairs of bonded atoms)

Ea = Σ_{i∈A} (1/2) Kθ_i (θi − θ0_i)²

(angles between 3 bonded atoms)

Ed = Σ_{i∈T} (1/2) Kφ_i [1 + cos(n φi − γ)]

(dihedrals)
Introduction to Global Optimization – p. 14

Ev = Σ_{(i,j)∈C} ( Aij/Rij^12 − Bij/Rij^6 )

(van der Waals)

Ee = (1/2) Σ_{(i,j)∈C} qi qj / (ε Rij)

(Coulomb interaction)

Introduction to Global Optimization – p. 15

Docking

Given two macro-molecules M1, M2, find their minimal energy coupling. If no bonds are changed ⇒ to find the optimal docking it is sufficient to minimize:

Ev + Ee = Σ_{i∈M1, j∈M2} ( Aij/Rij^12 − Bij/Rij^6 ) + (1/2) Σ_{i∈M1, j∈M2} qi qj / (ε Rij)

Introduction to Global Optimization – p. 16


Main algorithmic strategies

Two main families:

1. with global information (“structured problems”)

2. without global information (“unstructured problems”)

Structured problems ⇒ stochastic and deterministic methods. Unstructured problems ⇒ typically stochastic algorithms. Every global optimization method should try to find a balance between

exploration of the feasible region

approximations of the optimum

Introduction to Global Optimization – p. 17

Example: Lennard Jones

LJ_N = min LJ(X) = min Σ_{i=1}^{N−1} Σ_{j=i+1}^{N} ( 1/‖Xi − Xj‖^12 − 2/‖Xi − Xj‖^6 )

This is a highly structured problem. But is it easy/convenient to use its structure? And how?

Introduction to Global Optimization – p. 18

LJ

The map

F1 : R^{3N} → R^{N(N−1)/2}_+
F1(X1, . . . , XN) = ( ‖X1 − X2‖², . . . , ‖XN−1 − XN‖² )

is convex (componentwise) and the function

F2 : R^{N(N−1)/2}_+ → R
F2(r12, . . . , rN−1,N) = Σ 1/rij^6 − 2 Σ 1/rij^3

is the difference between two convex functions. Thus LJ(X) can be seen as the difference between two convex functions (a d.c. programming problem).

Introduction to Global Optimization – p. 19

NB: every C2 function is d.c., but often its d.c. decomposition is not known. D.C. optimization is very elegant and there exists a nice duality theory, but algorithms are typically very inefficient.

Introduction to Global Optimization – p. 20


A primal method for d.c. optimization

The “cutting plane” method (just an example, not particularly efficient, useless for high dimensional problems). Any unconstrained d.c. problem can be represented as an equivalent problem with a linear objective, a convex constraint and a reverse convex constraint. If g, h are convex, then min g(x) − h(x) is equivalent to:

min z
s.t. g(x) − h(x) ≤ z

which is equivalent to

min z
s.t. g(x) ≤ w
     h(x) + z ≥ w

Introduction to Global Optimization – p. 21

D.C. canonical form

min cTx

g(x) ≤ 0

h(x) ≥ 0

where h, g are convex. Let

Ω = {x : g(x) ≤ 0},   C = {x : h(x) ≤ 0}

Assumptions: 0 ∈ int Ω ∩ int C, c^T x > 0 ∀ x ∈ Ω \ int C.
Fundamental property: if a D.C. problem admits an optimum, at least one optimum belongs to

∂Ω ∩ ∂C
Introduction to Global Optimization – p. 22

Discussion of the assumptions

g(0) < 0, h(0) < 0, c^T x > 0 ∀ feasible x. Let x̄ be a solution to the convex problem

min c^T x   s.t. g(x) ≤ 0

If h(x̄) ≥ 0 then x̄ solves the d.c. problem. Otherwise c^T x > c^T x̄ for all feasible x. Coordinate transformation: y = x − x̄:

min c^T y
s.t. ḡ(y) ≤ 0
     h̄(y) ≥ 0

where ḡ(y) = g(y + x̄) and h̄(y) = h(y + x̄). Then c^T y > 0 for all feasible solutions and h̄(0) < 0; by continuity it is possible to choose x̄ so that ḡ(0) < 0.

Introduction to Global Optimization – p. 23

[Figure: the sets Ω and C in the plane, the origin, and the level line c^T x = 0]

Introduction to Global Optimization – p. 24


Let x̄ be the best known solution. Let

D(x̄) = {x ∈ Ω : c^T x ≤ c^T x̄}

If D(x̄) ⊆ C then x̄ is optimal. Check: a polytope P (with known vertices) is built which contains D(x̄). If all vertices of P are in C ⇒ x̄ is an optimal solution. Otherwise let V⋆ be the most promising vertex (the one with largest h value, see below); the intersection of the segment [0, V⋆] with ∂C, if feasible, is an improving point x̄. Otherwise a cut tangent to Ω at that point is introduced in P.

Introduction to Global Optimization – p. 25

[Figure: the region D(x̄) = {x ∈ Ω : c^T x ≤ c^T x̄} drawn inside Ω, together with C and the level line c^T x = 0]

Introduction to Global Optimization – p. 26

Initialization

Given a feasible solution x̄, take a polytope P such that

P ⊇ D(x̄)

i.e.

c^T y ≤ c^T x̄ and y feasible ⇒ y ∈ P

If P ⊂ C, i.e. if y ∈ P ⇒ h(y) ≤ 0, then x̄ is optimal. Checking this is easy if we know the vertices of P.

Introduction to Global Optimization – p. 27

[Figure: a polytope P ⊇ D(x̄) with vertices V1, . . . , Vk; V⋆ := arg max_j h(Vj)]

Introduction to Global Optimization – p. 28


Step 1

Let V⋆ be the vertex with the largest h() value. Surely h(V⋆) > 0 (otherwise we stop with an optimal solution). Moreover h(0) < 0 (0 is in the interior of C). Thus the segment from V⋆ to 0 must intersect the boundary of C. Let xk be the intersection point. It might be feasible (⇒ improving) or not.

Introduction to Global Optimization – p. 29

[Figure: the intersection point xk = ∂C ∩ [V⋆, 0]]

Introduction to Global Optimization – p. 30

[Figure: if xk ∈ Ω, set x̄ := xk]

Introduction to Global Optimization – p. 31

[Figure: otherwise, if xk ∉ Ω, the polytope is divided by a cut]

Introduction to Global Optimization – p. 32



Duality for d.c. problems

min_{x∈S} g(x) − h(x)

where g, h are convex. Let

h⋆(u) := sup{u^T x − h(x) : x ∈ R^n}
g⋆(u) := sup{u^T x − g(x) : x ∈ R^n}

be the conjugate functions of h and g. The problem

inf{h⋆(u) − g⋆(u) : u : h⋆(u) < +∞}

is the Fenchel-Rockafellar dual. If min g(x) − h(x) admits an optimum, then the Fenchel dual is a strong dual.

Introduction to Global Optimization – p. 33

If x⋆ ∈ arg min g(x) − h(x) then

u⋆ ∈ ∂h(x⋆)

(∂ denotes the subdifferential) is dual optimal, and if u⋆ ∈ arg min h⋆(u) − g⋆(u) then

x⋆ ∈ ∂g⋆(u⋆)

is an optimal primal solution.

Introduction to Global Optimization – p. 34

A primal/dual algorithm

Pk : min_x g(x) − (h(xk) + (x − xk)^T yk)

and

Dk : min_y h⋆(y) − (g⋆(yk−1) + xk^T (y − yk−1))

Introduction to Global Optimization – p. 35


Exact Global Optimization

Introduction to Global Optimization – p. 36

GlobOpt - relaxations

Consider the global optimization problem (P):

min f(x)

x ∈ X

and assume the min exists and is finite, and that we can use a relaxation (R):

min g(y)

y ∈ Y

Usually both X and Y are subsets of the same space Rn.

Recall: (R) is a relaxation of (P) iff:

X ⊆ Y

g(x) ≤ f(x) for all x ∈ X
Introduction to Global Optimization – p. 37

Branch and Bound

1. Solve the relaxation (R) and let L be its (global) optimum value (assume it is attained)

2. (Heuristically) solve the original problem (P) (or, more generally, find a “good” feasible solution to (P) in X). Let U be the best feasible function value known

3. if U − L ≤ ε then stop: U is a certified ε-optimum for (P)

4. otherwise split X and Y into two parts and apply the same method to each of them

Introduction to Global Optimization – p. 38

Tools

“good relaxations”: easy yet accurate

good upper bounding, i.e., good heuristics for (P)

Good relaxations can be obtained, e.g., through:

convex relaxations

domain reduction

Introduction to Global Optimization – p. 39


Convex relaxations

Assume X is convex and Y = X. If g is the convex envelope of f on X, then solving the convex relaxation (R) in one step gives the certified global optimum for (P). g(x) is a convex under-estimator of f on X if:

g(x) is convex

g(x) ≤ f(x) ∀ x ∈ X

g is the convex envelope of f on X if:

g is a convex under-estimator of f

g(x) ≥ h(x) ∀ x ∈ X, for every convex under-estimator h of f

Introduction to Global Optimization – p. 40

A 1-D example

Introduction to Global Optimization – p. 41

Convex under-estimator

Introduction to Global Optimization – p. 42

Branching

Introduction to Global Optimization – p. 43


Bounding

[Figure: branch and bound tree, showing the upper bound, the lower bounds of the subproblems, and the fathomed nodes]
Introduction to Global Optimization – p. 44

Relaxation of the feasible domain

Let

min_{x∈S} f(x)

be a GlobOpt problem where f is convex, while S is non convex. A relaxation (outer approximation) is obtained by replacing S with a larger set Q. If Q is convex ⇒ a convex optimization problem. If the optimal solution to

min_{x∈Q} f(x)

belongs to S ⇒ it is an optimal solution to the original problem.

Introduction to Global Optimization – p. 45

Example

min_{x∈[0,5], y∈[0,3]} −x − 2y
s.t. xy ≤ 3

[Figure: the box [0, 5] × [0, 3] and the nonconvex feasible region defined by xy ≤ 3]

Introduction to Global Optimization – p. 46

Relaxation

min_{x∈[0,5], y∈[0,3]} −x − 2y
s.t. xy ≤ 3

We know that

(x + y)² = x² + y² + 2xy

thus

xy = ((x + y)² − x² − y²)/2

and, as x and y are non-negative, x² ≤ 5x, y² ≤ 3y; thus a (convex) relaxation of xy ≤ 3 is

(x + y)² − 5x − 3y ≤ 6

(a convex constraint)

Introduction to Global Optimization – p. 47


Relaxation

[Figure: the convex relaxation (x + y)² − 5x − 3y ≤ 6 of the feasible region]

Optimal solution of the relaxed convex problem: (2, 3) (value: −8)

Introduction to Global Optimization – p. 48

Stronger Relaxation

min_{x∈[0,5], y∈[0,3]} −x − 2y
s.t. xy ≤ 3

Since x ≤ 5 and y ≤ 3:

(5 − x)(3 − y) ≥ 0 ⇒ 15 − 3x − 5y + xy ≥ 0 ⇒ xy ≥ 3x + 5y − 15

Thus a (convex, in fact linear) relaxation of xy ≤ 3 is

3x + 5y − 15 ≤ 3, i.e. 3x + 5y ≤ 18
Introduction to Global Optimization – p. 49

Relaxation

[Figure: the linear relaxation 3x + 5y ≤ 18 of the feasible region]

The optimal solution of the convex (linear) relaxation is (1, 3), which is feasible ⇒ optimal for the original problem.

Introduction to Global Optimization – p. 50

Convex (concave) envelopes

How to build convex envelopes of a function, and how to relax a non convex constraint? Convex envelopes ⇒ lower bounds. Convex envelopes of −f(x) ⇒ upper bounds. Constraint g(x) ≤ 0 ⇒ if h(x) is a convex underestimator of g, then h(x) ≤ 0 is a convex relaxation. Constraint g(x) ≥ 0 ⇒ if h(x) is concave and h(x) ≥ g(x), then h(x) ≥ 0 is a “convex” constraint.

Introduction to Global Optimization – p. 51


Convex envelopes

Definition: a function is polyhedral if it is the pointwise maximum of a finite number of affine functions. (NB: in general, the convex envelope is the pointwise supremum of affine minorants.) The generating set X of a function f over a convex set P is the set

X = {x ∈ R^n : (x, f(x)) is a vertex of epi(conv_P(f))}

I.e., given f we first build its convex envelope on P and then take the epigraph {(x, y) : x ∈ P, y ≥ conv_P(f)(x)}. This is a convex set whose extreme points are denoted by V; X are the x coordinates of V.

Introduction to Global Optimization – p. 52

Generating sets

[Figures: examples of generating sets]
Introduction to Global Optimization – p. 53
Introduction to Global Optimization – p. 54

Characterization

Let f(x) be continuously differentiable on a polytope P. The convex envelope of f on P is polyhedral if and only if

X(f) = Vert(P)

(the generating set is the vertex set of P).
Corollary: let f1, . . . , fm ∈ C1(P) and let each fi, together with Σ_i fi, possess a polyhedral convex envelope on P. Then

Conv(Σ_i fi(x)) = Σ_i Conv fi(x)

iff the generating set of Σ_i Conv(fi(x)) is Vert(P).

Introduction to Global Optimization – p. 55


Characterization

If f(x) is such that Conv f(x) is polyhedral, then an affine function h(x) such that

1. h(x) ≤ f(x) for all x ∈ Vert(P)

2. there exist n + 1 affinely independent vertices of P, V1, . . . , Vn+1, such that

f(Vi) = h(Vi), i = 1, . . . , n + 1

belongs to the polyhedral description of Conv f(x), and

h(x) = Conv f(x)

for any x ∈ Conv(V1, . . . , Vn+1).

Introduction to Global Optimization – p. 56

Characterization

The condition may be reversed: given m affine functions h1, . . . , hm such that, for each of them,

1. hj(x) ≤ f(x) for all x ∈ Vert(P)

2. there exist n + 1 affinely independent vertices of P, V1, . . . , Vn+1, such that

f(Vi) = hj(Vi), i = 1, . . . , n + 1

then the function ψ(x) = max_j hj(x) is the (polyhedral) convex envelope of f iff

the generating set of ψ is Vert(P)

for every vertex Vi we have ψ(Vi) = f(Vi)

Introduction to Global Optimization – p. 57

Sufficient condition

If f(x) is lower semi-continuous on P and for all x ∉ Vert(P) there exists a line ℓx such that x belongs to the interior of P ∩ ℓx and f is concave in a neighborhood of x on ℓx, then Conv f(x) is polyhedral.
Application: let

f(x) = Σ_{i,j} αij xi xj

The sufficient condition holds for f on [0, 1]^n ⇒ bilinear forms are polyhedral on a hypercube.

Introduction to Global Optimization – p. 58

Application: a bilinear term

(Al-Khayyal, Falk (1983)): let x ∈ [ℓx, ux], y ∈ [ℓy, uy]. Then the convex envelope of xy on [ℓx, ux] × [ℓy, uy] is

φ(x, y) = max{ℓy x + ℓx y − ℓx ℓy; uy x + ux y − ux uy}

In fact φ(x, y) is an under-estimate of xy:

(x − ℓx)(y − ℓy) ≥ 0 ⇒ xy ≥ ℓy x + ℓx y − ℓx ℓy

and analogously for xy ≥ uy x + ux y − ux uy.

Introduction to Global Optimization – p. 59


Bilinear terms

xy ≥ φ(x, y) = maxℓyx+ ℓxy − ℓxℓy;uyx+ uxy − uxuyNo other (polyhedral) function underestimating xy is tighter.In fact ℓyx+ ℓxy − ℓxℓy belongs to the convex envelope: itunderestimates xy and coincides with xy at 3 vertices((ℓx, ℓy), (ℓx, uy), (ux, ℓy)).Analogously for the other affine function.All vertices are interpolated by these 2 underestimatinghyperplanes ⇒they form the convex envelop of xy

Introduction to Global Optimization – p. 60
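A small Python sketch of the bilinear envelope above. The under-estimator is the Al-Khayyal-Falk envelope from the slides; the over-estimator, obtained from the analogous products (x − ℓx)(uy − y) ≥ 0 and (ux − x)(y − ℓy) ≥ 0, is an addition of this note.

def bilinear_envelope(x, y, lx, ux, ly, uy):
    """Convex under- and concave over-estimators of x*y on [lx, ux] x [ly, uy]."""
    under = max(ly*x + lx*y - lx*ly, uy*x + ux*y - ux*uy)   # convex envelope of xy
    over  = min(uy*x + lx*y - lx*uy, ly*x + ux*y - ux*ly)   # concave envelope (assumed here)
    return under, over

# Example (illustrative): x*y on [0, 5] x [0, 3]
lo, hi = bilinear_envelope(2.0, 3.0, 0.0, 5.0, 0.0, 3.0)
print(lo, hi)   # the true value 6 lies between the two bounds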

All easy then?

Of course not! Many things can go wrong . . .

It is true that, on the hypercube, a bilinear form

Σ_{i<j} αij xi xj

is polyhedral (easy to see), but we cannot guarantee in general that the generating set of the envelope is the vertex set of the hypercube! (in particular, when the α’s have opposite signs)

If the set is not a hypercube, even a bilinear term might be non polyhedral: e.g. xy on the triangle 0 ≤ x ≤ y ≤ 1.

Finding the (polyhedral) convex envelope of a bilinear form on a generic polytope P is NP-hard!

Introduction to Global Optimization – p. 61

Fractional terms

A convex underestimate of a fractional term x/y over a box can be obtained through

w ≥ ℓx/y + x/uy − ℓx/uy   if ℓx ≥ 0
w ≥ x/uy − ℓx y/(ℓy uy) + ℓx/ℓy   if ℓx < 0
w ≥ ux/y + x/ℓy − ux/ℓy   if ℓx ≥ 0
w ≥ x/ℓy − ux y/(ℓy uy) + ux/uy   if ℓx < 0

(a better underestimate exists)

Introduction to Global Optimization – p. 62

Univariate concave terms

If f(x), x ∈ [ℓx, ux], is concave, then the convex envelope is simply its linear interpolation at the extremes of the interval:

f(ℓx) + ((f(ux) − f(ℓx)) / (ux − ℓx)) (x − ℓx)

Introduction to Global Optimization – p. 63


Underestimating a general nonconvex function

Let f(x) ∈ C2 be a general non convex function. Then a convex underestimate on a box can be defined as

φ(x) = f(x) − Σ_{i=1}^n αi (xi − ℓi)(ui − xi)

where αi > 0 are parameters. The Hessian of φ is

∇2φ(x) = ∇2f(x) + 2 diag(α)

φ is convex iff ∇2φ(x) is positive semi-definite on the box.

Introduction to Global Optimization – p. 64

How to choose the αi’s? One possibility: the uniform choice αi = α. In this case convexity of φ is obtained iff

α ≥ max{0, −(1/2) min_{x∈[ℓ,u]} λmin(x)}

where λmin(x) is the minimum eigenvalue of ∇2f(x).

Introduction to Global Optimization – p. 65

Key properties

φ(x) ≤ f(x)

φ interpolates f at all vertices of [ℓ, u]

φ is convex

Maximum separation:

max_x (f(x) − φ(x)) = (α/4) Σ_i (ui − ℓi)²

Thus the error in underestimation decreases when the box is split.

Introduction to Global Optimization – p. 66

Estimation of α

Compute an interval Hessian [H] on [ℓ, u]: [H(x)]ij = [hL_ij, hU_ij]. Find α such that [H] + 2 diag(α) is positive semi-definite for every Hessian in the interval.
Gerschgorin theorem for real matrices:

λmin ≥ min_i ( hii − Σ_{j≠i} |hij| )

Extension to interval matrices:

λmin ≥ min_i ( hL_ii − Σ_{j≠i} max{|hL_ij|, |hU_ij|} (uj − ℓj)/(ui − ℓi) )

Introduction to Global Optimization – p. 67
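A NumPy sketch computing a uniform alpha from the interval Gerschgorin bound above; the function name and the example interval Hessian are illustrative assumptions.

import numpy as np

def alpha_from_interval_hessian(HL, HU, l, u):
    """Uniform alpha for the quadratic underestimator: alpha >= max(0, -0.5*lambda_min_bound)."""
    HL, HU, l, u = map(np.asarray, (HL, HU, l, u))
    n = HL.shape[0]
    bounds = []
    for i in range(n):
        radius = sum(max(abs(HL[i, j]), abs(HU[i, j])) * (u[j] - l[j]) / (u[i] - l[i])
                     for j in range(n) if j != i)
        bounds.append(HL[i, i] - radius)          # interval Gerschgorin row bound
    return max(0.0, -0.5 * min(bounds))

# Example (illustrative): an interval Hessian on the unit box
HL = np.array([[2.0, -3.0], [-3.0, -1.0]])
HU = np.array([[4.0, -1.0], [-1.0,  1.0]])
print(alpha_from_interval_hessian(HL, HU, [0, 0], [1, 1]))   # alpha = 2 here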


Improvements

new relaxation functions (other than quadratic). Example:

Φ(x; γ) = −Σ_{i=1}^n (1 − e^{γi(xi−ℓi)})(1 − e^{γi(ui−xi)})

gives a tighter underestimate than the quadratic term

partitioning: partition the domain into a small number of regions (hyper-rectangles); evaluate a convex underestimator in each region; join the underestimators to form a single convex function on the whole domain

Introduction to Global Optimization – p. 68

Domain (range) reduction

Techniques for cutting the feasible region without cutting the global optimum solution. Simplest approaches: feasibility-based and optimality-based range reduction (RR). Let the problem be:

min_{x∈S} f(x)

Feasibility-based RR asks for solving

ℓi = min{xi : x ∈ S},   ui = max{xi : x ∈ S}

for all i ∈ {1, . . . , n} and then adding the constraints x ∈ [ℓ, u] to the problem (or to the sub-problems generated during Branch & Bound).

Introduction to Global Optimization – p. 69

Feasibility Based RR

If S is a polyhedron, RR requires the solution of LPs:

ℓi = min xi,   ui = max xi
s.t. Ax ≤ b, x ∈ [L, U]

“Poor man’s” LP-based RR: from every constraint Σ_j aij xj ≤ bi in which aik > 0 we obtain

xk ≤ (1/aik) ( bi − Σ_{j≠k} aij xj )

xk ≤ (1/aik) ( bi − Σ_{j≠k} min{aij Lj, aij Uj} )

Introduction to Global Optimization – p. 70
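A Python sketch of the poor man's range reduction above. The slides only state the bound for positive coefficients; the symmetric treatment of negative coefficients, and the idea of sweeping the constraints a few times, are assumptions of this sketch.

import numpy as np

def poor_mans_rr(A, b, L, U, sweeps=3):
    """Feasibility-based range reduction from A x <= b and the box [L, U]."""
    L, U = np.asarray(L, float).copy(), np.asarray(U, float).copy()
    m, n = A.shape
    for _ in range(sweeps):                      # tighter bounds help the next pass
        for i in range(m):
            for k in range(n):
                if A[i, k] == 0.0:
                    continue
                rest = sum(min(A[i, j]*L[j], A[i, j]*U[j]) for j in range(n) if j != k)
                if A[i, k] > 0:
                    U[k] = min(U[k], (b[i] - rest) / A[i, k])
                else:                            # dividing by a negative coefficient flips the bound
                    L[k] = max(L[k], (b[i] - rest) / A[i, k])
    return L, U

# Example (illustrative): x0 + x1 <= 1 on [0, 10]^2 tightens both upper bounds to 1
print(poor_mans_rr(np.array([[1.0, 1.0]]), np.array([1.0]), [0, 0], [10, 10]))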

Optimality Based RR

Given an incumbent solution x̄ ∈ S, ranges are updated by solving the sequence:

ℓi = min xi,   ui = max xi
s.t. f̌(x) ≤ f(x̄), x ∈ S

where f̌(x) is a convex underestimate of f on the current domain. RR can be applied iteratively (i.e., at the end of a complete RR sequence, we might start a new one using the new bounds).

Introduction to Global Optimization – p. 71


generalization

min_{x∈X} f(x)   (P)
s.t. g(x) ≤ 0

a (non convex) problem; let

min_{x∈X} f̃(x)   (R)
s.t. g̃(x) ≤ 0

be a convex relaxation of (P):

{x ∈ X : g(x) ≤ 0} ⊆ {x ∈ X : g̃(x) ≤ 0} and

x ∈ X, g(x) ≤ 0 ⇒ f̃(x) ≤ f(x)

Introduction to Global Optimization – p. 72

R.H.S. perturbation

Let

φ(y) = min_{x∈X} f̃(x)   (Ry)
       s.t. g̃(x) ≤ y

be a perturbation of (R). (R) convex ⇒ (Ry) convex for any y. Let x̄ be an optimal solution of (R) and assume that the i-th constraint is active:

g̃i(x̄) = 0

Then, if x̄y is an optimal solution of (Ry), the constraint g̃i(x) ≤ yi is active at x̄y whenever yi ≤ 0.

Introduction to Global Optimization – p. 73

Duality

Assume (R) has a finite optimum at x̄ with value φ(0) and Lagrange multipliers µ. Then the hyperplane

H(y) = φ(0) − µ^T y

is a supporting hyperplane of the graph of φ(y) at y = 0, i.e.

φ(y) ≥ φ(0) − µ^T y   ∀ y ∈ R^m

Introduction to Global Optimization – p. 74

Main result

If (R) is convex with optimum value φ(0), constraint i is active at the optimum and its Lagrange multiplier is µi > 0, then, if U is an upper bound for the original problem (P), the constraint

g̃i(x) ≥ −(U − L)/µi

(where L = φ(0)) is valid for the original problem (P), i.e. it does not exclude any feasible solution with value better than U.

Introduction to Global Optimization – p. 75


proof

Problem (Ry) can be seen as a convex relaxation of the perturbed non convex problem

Φ(y) = min_{x∈X} f(x)
       s.t. g(x) ≤ y

and thus φ(y) ≤ Φ(y): underestimating (Ry) produces an underestimate of Φ(y). Let y := ei yi; from duality

L − µi yi = L − µ^T ei yi ≤ φ(ei yi) ≤ Φ(ei yi)

Take any x feasible for (P) with value not worse than U and let yi = g̃i(x) ≤ 0: then x is feasible for the perturbed problem, so Φ(ei yi) ≤ U, hence L − µi g̃i(x) ≤ U, i.e. g̃i(x) ≥ −(U − L)/µi.

Introduction to Global Optimization – p. 76

Applications

Range reduction: let x ∈ [ℓ, u] in the convex relaxed problem. If variable xi is at its upper bound in the optimal solution, then we can deduce

xi ≥ max{ℓi, ui − (U − L)/λi}

where λi is the optimal multiplier associated with the i-th upper bound. Analogously for active lower bounds:

xi ≤ min{ui, ℓi + (U − L)/λi}

Introduction to Global Optimization – p. 77

Let the constraint

ai^T x ≤ bi

be active in an optimal solution of the convex relaxation (R). Then we can deduce the valid inequality

ai^T x ≥ bi − (U − L)/µi

Introduction to Global Optimization – p. 78

Methods based on “merit functions”

Bayesian algorithms: the objective function is considered as a realization of a stochastic process

f(x) = F(x; ω)

A loss function is defined, e.g.

L(x1, ..., xn; ω) = min_{i=1,...,n} F(xi; ω) − min_x F(x; ω)

and the next point to sample is placed in order to minimize the expected loss (or risk):

xn+1 = arg min E( L(x1, ..., xn, xn+1; ω) | x1, ..., xn )
     = arg min E( min_{i=1,...,n+1} F(xi; ω) − min_x F(x; ω) | x1, ..., xn )

Introduction to Global Optimization – p. 79


Radial basis method

Given k observations (x1, f1), . . . , (xk, fk), an interpolant is built:

s(x) = Σ_{i=1}^k λi Φ(‖x − xi‖) + p(x)

p: polynomial of a (prefixed) small degree m. Φ: a radial function like, e.g.,

Φ(r) = r          (linear)
Φ(r) = r³         (cubic)
Φ(r) = r² log r   (thin plate spline)
Φ(r) = e^{−γr²}   (Gaussian)

The polynomial p is necessary to guarantee the existence of a unique interpolant (i.e. when the matrix Φij = Φ(‖xi − xj‖) is singular).

Introduction to Global Optimization – p. 80

“Bumpiness”

Let f⋆_k be an estimate of the value of the global optimum after k observations. Let s^y_k be the (unique) interpolant of the data points

(xi, fi), i = 1, . . . , k, together with (y, f⋆_k)

Idea: the most likely location of y is the one for which the resulting interpolant has minimum “bumpiness”. Bumpiness measure:

σ(s^y_k) = (−1)^{m+1} Σ_i λi s^y_k(xi)

Introduction to Global Optimization – p. 81

TO BE DONE

Introduction to Global Optimization – p. 82

Stochastic methods

Pure Random Search: random uniform sampling over the feasible region

Best start: like Pure Random Search, but a local search is started from the best observation

Multistart: local searches started from randomly generated starting points

Introduction to Global Optimization – p. 83


[Figures: illustration of uniform random sampling and of Multistart on a one-dimensional objective]
Introduction to Global Optimization – p. 84
Introduction to Global Optimization – p. 85

Clustering methods

Given a uniform sample, evaluate the objective function

Sample Transformation (or concentration): either a fraction of the "worst" points is discarded, or a few steps of a gradient method are performed

Remaining points are clustered

From the best point in each cluster a single local search is started
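A compact sketch of the whole scheme; concentration is done here by simply discarding the worst points, scipy's single-linkage clustering stands in for the clustering step, and the parameters keep and dist are illustrative choices not prescribed by the slides:

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.optimize import minimize

def clustering_method(f, bounds, n_sample=200, keep=0.2, dist=0.5, seed=0):
    # Uniform sampling, concentration by discarding the worst points,
    # single-linkage clustering, one local search per cluster.
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    X = rng.uniform(lo, hi, size=(n_sample, len(lo)))
    fx = np.apply_along_axis(f, 1, X)                    # evaluate the objective
    idx = np.argsort(fx)[: int(keep * n_sample)]         # concentration: keep the best fraction
    Xc, fc = X[idx], fx[idx]
    labels = fcluster(linkage(Xc, method="single"), t=dist, criterion="distance")
    best_x, best_f = None, np.inf
    for c in np.unique(labels):
        members = np.where(labels == c)[0]
        x0 = Xc[members[np.argmin(fc[members])]]         # best point of the cluster
        res = minimize(f, x0, bounds=list(zip(lo, hi)))
        if res.fun < best_f:
            best_x, best_f = res.x, res.fun
    return best_x, best_f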

Introduction to Global Optimization – p. 86

Uniform sample

[Figure: uniform sample of points over the box [0, 5] × [0, 5]]

Introduction to Global Optimization – p. 87


Sample concentration

[Figure: the sample after concentration, over the box [0, 5] × [0, 5]]

Introduction to Global Optimization – p. 88

Clustering

[Figure: clusters identified in the concentrated sample]

Introduction to Global Optimization – p. 89

Local optimization

[Figure: local searches started from the best point of each cluster]

Introduction to Global Optimization – p. 90

Clustering: MLSL

Sampling proceeds in batches of N points. Given sample points X1, . . . , Xk ∈ [0, 1]^n, label Xj as "clustered" iff ∃ Y ∈ {X1, . . . , Xk} such that

‖Xj − Y‖ ≤ ∆k := (1/√(2π)) ( σ Γ(1 + n/2) (log k)/k )^{1/n}   and   f(Y) ≤ f(Xj)
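A sketch of the labelling rule, using the critical distance as written above (the sample is assumed to lie in the unit hypercube; sigma = 4 is just a customary choice, not prescribed by the slide):

import numpy as np
from math import gamma, log, pi, sqrt

def mlsl_clustered(X, fX, sigma=4.0):
    # Label sample points as "clustered" when a better point lies within the
    # critical distance Delta_k above.
    k, n = X.shape
    delta = (sigma * gamma(1 + n / 2) * log(k) / k) ** (1.0 / n) / sqrt(2 * pi)
    clustered = np.zeros(k, dtype=bool)
    for j in range(k):
        d = np.linalg.norm(X - X[j], axis=1)
        # strict inequality on f excludes the point itself
        clustered[j] = bool(np.any((d <= delta) & (fX < fX[j])))
    return clustered, delta

# in MLSL, local searches are then started only from the non-clustered points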

Introduction to Global Optimization – p. 91


Simple Linkage

A sequential sample is generated (batches consist of a single observation). A local search is started only from the last sampled point (i.e. there is no "recall"), unless there exists a sufficiently near sampled point with a better function value.
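A sketch under simplifying assumptions: a fixed critical radius replaces the theoretically shrinking one, and scipy's local optimizer plays the role of the local search:

import numpy as np
from scipy.optimize import minimize

def simple_linkage(f, bounds, n_iter=200, radius=0.3, seed=0):
    # Sample one point per iteration; start a local search from it unless a
    # previously sampled point with a better value lies within `radius`.
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    X, fX = [], []
    best_x, best_f = None, np.inf
    for _ in range(n_iter):
        x = rng.uniform(lo, hi)
        fx = f(x)
        near_better = any(np.linalg.norm(x - xi) <= radius and fi <= fx
                          for xi, fi in zip(X, fX))
        if not near_better:
            res = minimize(f, x, bounds=list(zip(lo, hi)))
            if res.fun < best_f:
                best_x, best_f = res.x, res.fun
        X.append(x)
        fX.append(fx)
    return best_x, best_f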

Introduction to Global Optimization – p. 92

Smoothing methods

Given f : Rn → R, the Gaussian transform is defined as:

⟨f⟩λ(x) = 1/(π^(n/2) λ^n) ∫_{Rn} f(y) exp(−‖y − x‖²/λ²) dy

When λ is sufficiently large, ⟨f⟩λ is convex. Idea: starting with a large enough λ, minimize the smoothed function and slowly decrease λ towards 0.
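The transform can be estimated by Monte Carlo: with the change of variable y = x + λu it becomes an expectation over u ∼ N(0, I/2). A small sketch (sample size and test function are illustrative):

import numpy as np

def gaussian_transform(f, x, lam, n_samples=2000, seed=0):
    # Monte Carlo estimate of <f>_lambda(x): with y = x + lam*u the transform
    # equals E[f(x + lam*U)] where U ~ N(0, I/2).
    rng = np.random.default_rng(seed)
    x = np.atleast_1d(np.asarray(x, dtype=float))
    U = rng.standard_normal((n_samples, x.size)) / np.sqrt(2.0)
    return float(np.mean([f(x + lam * u) for u in U]))

# a wiggly one-dimensional function flattens out as lambda grows
f = lambda x: float(np.sin(5 * x[0]) + 0.05 * x[0] ** 2)
for lam in (0.1, 1.0, 5.0):
    print(lam, gaussian_transform(f, [2.0], lam))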

Introduction to Global Optimization – p. 93

Smoothing methods

[Figures: surface plots of the smoothed objective ⟨f⟩λ on [−10, 10] × [−10, 10] for different values of λ]

Introduction to Global Optimization – p. 94


Transformed function landscape

Elementary idea: local optimization smooths out many "high frequency" oscillations

Introduction to Global Optimization – p. 99


[Figures: a one-dimensional multimodal test function on [0, 10] and its landscape transformed by local optimization]

Introduction to Global Optimization – p. 100


Monotonic Basin-Hopping

k := 0; f⋆ := +∞
while k < MaxIter do
    Xk := random initial solution
    X⋆k := arg min f(x; Xk)        (local minimization started at Xk)
    fk := f(X⋆k)
    if fk < f⋆ then f⋆ := fk
    NoImprove := 0
    while NoImprove < MaxImprove do
        X := random perturbation of Xk
        Y := arg min f(x; X)       (local minimization started at X)
        if f(Y) < f⋆ then Xk := Y; NoImprove := 0; f⋆ := f(Y)
        else NoImprove := NoImprove + 1
    end while
    k := k + 1
end while
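A direct Python rendering of the pseudocode above; the Gaussian perturbation of size step is an illustrative choice, and scipy's local optimizer is used for the local searches:

import numpy as np
from scipy.optimize import minimize

def monotonic_basin_hopping(f, bounds, max_iter=5, max_no_improve=20, step=0.5, seed=0):
    # Local searches started from random perturbations of the current point;
    # only improving local minima are accepted (monotonic acceptance).
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    bnds = list(zip(lo, hi))
    x_star, f_star = None, np.inf
    for _ in range(max_iter):
        xk = minimize(f, rng.uniform(lo, hi), bounds=bnds).x      # X*_k from a random start
        if f(xk) < f_star:
            x_star, f_star = xk, f(xk)
        no_improve = 0
        while no_improve < max_no_improve:
            x0 = np.clip(xk + step * rng.standard_normal(len(lo)), lo, hi)
            y = minimize(f, x0, bounds=bnds).x                    # perturb, then local search
            if f(y) < f_star:
                xk, x_star, f_star, no_improve = y, y, f(y), 0
            else:
                no_improve += 1
    return x_star, f_star

# usage on a rugged one-dimensional function
g = lambda x: float(np.sin(5 * x[0]) + 0.1 * x[0] ** 2)
print(monotonic_basin_hopping(g, bounds=[(-10, 10)]))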

Introduction to Global Optimization – p. 103


[Figures: successive iterations of Monotonic Basin-Hopping on a one-dimensional test function on [0, 10]]

Introduction to Global Optimization – p. 104



References

In this year's course the global optimization part has been expanded, so it is possible that some parts on nonlinear optimization will be skipped. Here is an essential reference list for the material covered during the course:

Mokhtar S. Bazaraa, John J. Jarvis and Hanif D. Sherali, Linear Programming and Network Flows, John Wiley & Sons, 1990.

Dimitri P. Bertsekas, Nonlinear Programming, Athena Scientific, 1999.

Jorge Nocedal and Stephen J. Wright, Numerical Optimization, Springer, 2006.

Mohit Tawarmalani and Nikolaos V. Sahinidis, A Polyhedral Branch-and-Cut Approach to Global Optimization, Mathematical Programming, volume 103, pages 225–249, 2005.

I.P. Androulakis, C.D. Maranas and C.A. Floudas, αBB: A Global Optimization Method for General Constrained Nonconvex Problems, Journal of Global Optimization, volume 7, number 4, pages 337–363, 1995.

A. Rikun, A Convex Envelope Formula for Multilinear Functions, Journal of Global Optimization, volume 10, pages 425–437, 1997.

Andrea Grosso, Marco Locatelli and Fabio Schoen, A Population Based Approach for Hard Global Optimization Problems Based on Dissimilarity Measures, Mathematical Programming, volume 110, number 2, pages 373–404, 2007.
