IE 5531: Engineering Optimization I, Lecture 14: Unconstrained optimization (lecture slide transcript)

Page 1

IE 5531: Engineering Optimization I
Lecture 14: Unconstrained optimization

Prof. John Gunnar Carlsson

October 27, 2010


Page 2

Administrivia

Midterms returned 11/01

11/01 office hours moved

PS5 posted this evening


Page 3

Recap: Applications of KKT conditions

Applications of KKT conditions:

Portfolio optimization

Public good allocation

Communication channel power allocation (water-filling)

Fisher's exchange market


Page 4

Today

Algorithms for unconstrained minimization:

Introduction

Bisection search

Golden section search

Line search

Wolfe, Goldstein conditions

Gradient method (steepest descent)


Page 5

Introduction

Today's lecture is focused on solving the unconstrained problem

minimize f (x)

for x ∈ Rn

Ideally, we would like to find a global minimizer, i.e. a point x∗ such that f(x∗) ≤ f(x) for all x ∈ Rn

In general, as we have seen with the KKT conditions, we have to settle for a local minimizer, i.e. a point x∗ such that f(x∗) ≤ f(x) for all x in a local neighborhood N(x∗)

If f (x) is convex, these two notions are the same


Page 6

Necessary and sufficient conditions

If x∗ is a local minimizer, then there must be no descent direction, i.e. a direction d such that ∇f(x∗)^T d < 0

This immediately implies that ∇f (x∗) = 0

We also need to distinguish between local maximizers and local minimizers, so we also require that H ⪰ 0 (positive semidefinite), where hij = ∂²f(x∗)/∂xi ∂xj

The stronger condition H ≻ 0 (positive definite), together with ∇f(x∗) = 0, is a sufficient condition for x∗ to be a local minimizer

Again, if f(x) is convex (and continuously differentiable), then ∇f(x∗) = 0 is a necessary and sufficient condition for x∗ to be a global minimizer
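As a quick illustration of these conditions (not from the slides; the function f and the candidate point below are assumptions chosen only for the example), one can check the first- and second-order conditions numerically:

    import numpy as np

    def f(x):
        # assumed example: f(x) = (x1 - 1)^2 + 2*x2^2
        return (x[0] - 1.0)**2 + 2.0 * x[1]**2

    def grad(x):
        return np.array([2.0 * (x[0] - 1.0), 4.0 * x[1]])

    def hessian(x):
        return np.array([[2.0, 0.0], [0.0, 4.0]])

    x_star = np.array([1.0, 0.0])
    print(np.allclose(grad(x_star), 0.0))                    # gradient vanishes
    print(np.all(np.linalg.eigvalsh(hessian(x_star)) > 0))   # Hessian positive definite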


Page 7

Overview

Optimization algorithms tend to be iterative procedures:

Starting at a given point x0, they generate a sequence {xk} of iterates

This sequence terminates either when no more progress can be made (out of memory, etc.) or when a solution point has been approximated satisfactorily

At any given iterate xk, we generally want xk+1 to satisfy f(xk+1) < f(xk)

Furthermore, we want our sequence to converge to a local minimizer x∗

The general approach is a line search:

At any given iterate xk, choose a direction dk, and then set

xk+1 = xk + αk dk for some scalar αk > 0
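A minimal sketch of this iterative template (added for illustration; the stopping rule, the direction routine, and the step-size routine are placeholders, not part of the slides):

    import numpy as np

    def line_search_descent(f, grad, x0, choose_direction, choose_step,
                            tol=1e-8, max_iter=1000):
        # Generic template: x_{k+1} = x_k + alpha_k * d_k
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) < tol:         # approximately stationary
                break
            d = choose_direction(x, g)          # e.g. -g for steepest descent
            alpha = choose_step(f, grad, x, d)  # e.g. backtracking (see below)
            x = x + alpha * d
        return x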


Page 8

Convergent sequences

Definition

Let {xk} be a sequence of real numbers. Then {xk} converges to x∗ if and only if for all real numbers ε > 0, there exists a positive integer K such that ‖xk − x∗‖ < ε for all k ≥ K.

Examples of convergence:

xk = 1/k

xk = (1/2)k

xk = [1 / log(k+1)]^k
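A quick numerical check of these three examples (added for illustration) shows each sequence approaching 0 as k grows:

    import math

    for k in [1, 10, 100, 1000]:
        a = 1.0 / k                           # xk = 1/k
        b = 0.5 ** k                          # xk = (1/2)^k
        c = (1.0 / math.log(k + 1)) ** k      # xk = [1/log(k+1)]^k
        print(k, a, b, c)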


Page 9

Searching in one variable: root-finding

Intermediate value theorem: given a continuous single-variable function f(x) and a pair of points xℓ and xr such that f(xℓ) < 0 and f(xr) > 0, there exists a point x∗ ∈ [xℓ, xr] such that f(x∗) = 0

A simpler question to motivate what follows: how can we find x∗ (or a point within ε of x∗)?


Page 10

Bisection

1. Choose xmid = (xℓ + xr)/2 and evaluate f(xmid)

2. If f(xmid) = 0, then x∗ = xmid and we're done

3. Otherwise,
   1. If f(xmid) < 0, then set xℓ = xmid
   2. If f(xmid) > 0, then set xr = xmid

4. If xr − xℓ < ε, we're done; otherwise, go to step 1

The algorithm above divides the search interval in half at every iteration; thus, to approximate x∗ to within ε we require at most log2((xr − xℓ)/ε) iterations
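A minimal Python sketch of this bisection procedure (added for illustration; the sample function at the end is an assumption):

    def bisection(f, xl, xr, eps=1e-8):
        # Find a root of f in [xl, xr], assuming f(xl) < 0 < f(xr)
        while xr - xl >= eps:
            xmid = (xl + xr) / 2.0
            fmid = f(xmid)
            if fmid == 0.0:
                return xmid        # exact root found
            elif fmid < 0.0:
                xl = xmid          # root lies in [xmid, xr]
            else:
                xr = xmid          # root lies in [xl, xmid]
        return (xl + xr) / 2.0

    # Example: the root of x^2 - 2 on [0, 2] is sqrt(2) ≈ 1.41421356
    print(bisection(lambda x: x * x - 2.0, 0.0, 2.0))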


Page 11

Golden section search

Consider a unimodal function f(x) defined on an interval [xℓ, xr]

Unimodal: f(x) has only one local minimizer x∗ in [xℓ, xr]

How can we find x∗ (or a point within ε of x∗)?

Hint: we can do this without derivatives

Hint: we need to sample two points x′ℓ, x′r in [xℓ, xr]


Page 12

Golden section search

Assume without loss of generality that xℓ = 0 and xr = 1; set ψ = (3 − √5)/2

1. Set x′ℓ = ψ and x′r = 1 − ψ

2. If f(x′ℓ) < f(x′r), then the minimizer must lie in the interval [xℓ, x′r], so set xr = x′r

3. Otherwise, the minimizer must lie in the interval [x′ℓ, xr], so set xℓ = x′ℓ

4. If xr − xℓ < ε, we're done; otherwise, go to step 1

By setting ψ = (3 − √5)/2 we decrease the search interval by a constant factor 1 − ψ ≈ 0.618 at every iteration
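A short Python sketch of this loop (added for illustration; the test function is an assumption, and both interior points are re-sampled on every pass exactly as in the steps above rather than reused):

    import math

    def golden_section(f, xl, xr, eps=1e-8):
        # Minimize a unimodal f on [xl, xr] to within eps
        psi = (3.0 - math.sqrt(5.0)) / 2.0          # psi ≈ 0.382
        while xr - xl >= eps:
            xpl = xl + psi * (xr - xl)              # x'_l
            xpr = xl + (1.0 - psi) * (xr - xl)      # x'_r
            if f(xpl) < f(xpr):
                xr = xpr                            # minimizer in [xl, x'_r]
            else:
                xl = xpl                            # minimizer in [x'_l, xr]
        return (xl + xr) / 2.0

    # Example: the minimizer of (x - 0.3)^2 on [0, 1] is 0.3
    print(golden_section(lambda x: (x - 0.3) ** 2, 0.0, 1.0))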


Page 13

Line search: step length

Consider the multi-dimensional problem

minimize f (x)

for x ∈ Rn

At each iteration xk we set dk = −∇f(xk) and set xk+1 = xk + αk dk, for appropriately chosen αk

Ideally, we would like for αk to be the minimizer of the univariate function

φ (α) := f (xk + αdk)

but this is time-consuming

In the big picture, we want αk to give us a sufficient reduction in f(x), without spending too much time on it

Two conditions we can impose are the Wolfe and Goldstein conditions


Page 14

Armijo condition

Clearly the step length αk should guarantee a sufficient decrease in f(x), so we require

φ(α) = f(xk + αdk) ≤ f(xk) + c1 α ∇f(xk)^T dk

with c1 ∈ (0, 1)

The right-hand side is linear in α

Note that this is satisfied for all α that are sufficiently small

In practice, we often set c1 ≈ 10^−4
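One common way to enforce this condition is a backtracking search; a minimal sketch (added for illustration, with the halving factor as an assumption):

    import numpy as np

    def backtracking_armijo(f, grad, x, d, alpha0=1.0, c1=1e-4, shrink=0.5):
        # Shrink alpha until the Armijo (sufficient decrease) condition holds
        alpha = alpha0
        fx = f(x)
        slope = grad(x) @ d          # directional derivative phi'(0)
        while f(x + alpha * d) > fx + c1 * alpha * slope:
            alpha *= shrink
        return alpha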


Page 15

Curvature condition

The preceding condition is not sufficient because an arbitrarily small α satisfies it, which means that {xk} may not converge to a minimizer

One way to get around this is to impose the additional condition

φ′(α) = ∇f(xk + αdk)^T dk ≥ c2 ∇f(xk)^T dk

where c2 ∈ (c1, 1)

This condition just says that the slope of φ at α has to be at least c2 times the slope of φ at 0

Typically we choose c2 ≈ 0.9

If the slope of φ at α were still very negative, it would mean that our step size wasn't chosen very well (we could continue in that direction and decrease the function further)

The Armijo condition and the curvature condition, when combined,are called the Wolfe conditions
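A small helper that tests both Wolfe conditions for a candidate step (added for illustration; the constants are the typical values quoted above):

    import numpy as np

    def satisfies_wolfe(f, grad, x, d, alpha, c1=1e-4, c2=0.9):
        # True if alpha satisfies both the Armijo and curvature conditions
        slope0 = grad(x) @ d
        armijo = f(x + alpha * d) <= f(x) + c1 * alpha * slope0
        curvature = grad(x + alpha * d) @ d >= c2 * slope0
        return armijo and curvature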


Page 16

Goldstein conditions

An alternative to the Wolfe conditions is the Goldstein conditions:

f(xk) + (1 − c) α ∇f(xk)^T dk ≤ f(xk + αdk) ≤ f(xk) + c α ∇f(xk)^T dk

with c ∈ (0, 1/2)

The second inequality is just the sufficient decrease condition

The first inequality bounds the step length from below

One disadvantage is that the local minimizers of φ(α) may be excluded in this search
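For symmetry with the Wolfe check above, a Goldstein test could look like the following sketch (added for illustration; the value of c is an assumption within the allowed range):

    import numpy as np

    def satisfies_goldstein(f, grad, x, d, alpha, c=0.25):
        # True if alpha satisfies both Goldstein inequalities
        fx = f(x)
        slope0 = grad(x) @ d
        lower = fx + (1.0 - c) * alpha * slope0   # bounds the step away from zero
        upper = fx + c * alpha * slope0           # sufficient decrease
        return lower <= f(x + alpha * d) <= upper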


Page 17

Steepest (gradient) descent example

Recall that in the method of steepest descent, we set dk = −∇f (xk)

Consider the case where we want to minimize

f(x) = c^T x + (1/2) x^T Q x

where Q is a symmetric positive definite matrix

Clearly, the unique minimizer lies where ∇f(x∗) = 0, which occurs precisely when

Qx = −c

The descent direction will be d = −∇f (x) = − (c + Qx)


Page 18

Steepest descent example

The iteration scheme

xk+1 = xk + αk dk

is given by

xk+1 = xk − αk (c + Qxk)

We need to choose a step size αk , so we consider

φ (α) = f (xk − α (c + Qxk))


Page 19

Steepest descent example

Note that we don't even need the Wolfe or Goldstein conditions, as we can find the optimal α analytically!

φ(α) = f(xk − α(c + Qxk))
     = c^T (xk − α(c + Qxk)) + (1/2) (xk − α(c + Qxk))^T Q (xk − α(c + Qxk))

Since φ(α) is a strictly convex quadratic function in α, it is not hard to see that its minimizer occurs where

c^T dk + xk^T Q dk + α dk^T Q dk = 0

and thus we set

αk = (dk^T dk) / (dk^T Q dk)

with dk = −(c + Qxk)
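Putting the pieces together, a short sketch of steepest descent with this exact step size on a small quadratic (added for illustration; the particular Q, c, and starting point are assumptions):

    import numpy as np

    def steepest_descent_quadratic(Q, c, x0, tol=1e-10, max_iter=10000):
        # Minimize f(x) = c^T x + 0.5 x^T Q x with the exact step size
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            d = -(c + Q @ x)                  # d_k = -grad f(x_k)
            if np.linalg.norm(d) < tol:
                break
            alpha = (d @ d) / (d @ (Q @ d))   # exact minimizer of phi(alpha)
            x = x + alpha * d
        return x

    # Example: with Q = diag(2, 10) and c = (-2, -10), the minimizer solves Qx = -c, i.e. x = (1, 1)
    Q = np.array([[2.0, 0.0], [0.0, 10.0]])
    c = np.array([-2.0, -10.0])
    print(steepest_descent_quadratic(Q, c, np.zeros(2)))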


Page 20

Steepest descent example

The recursion for the steepest descent method is therefore

xk+1 = xk + (dk^T dk / dk^T Q dk) dk = xk − (dk^T dk / dk^T Q dk) (c + Qxk)


Page 21

Convergence of steepest descent

Theorem

Let f(x) be a given continuously differentiable function. Let x0 ∈ Rn be a point for which the sub-level set

X0 = {x ∈ Rn : f(x) ≤ f(x0)}

is bounded. Let {xk} be a sequence of points generated by the steepest descent method initiated at x0, using either the Wolfe or Goldstein line search conditions. Then {xk} converges to a stationary point of f(x).

The above theorem gives what is called the global convergence property of the steepest-descent method

No matter how far away x0 is, the steepest descent method must converge to a stationary point

The steepest descent method may, however, be very slow to reach that point
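As a concrete illustration of this slowness (added; the diagonal Q, the starting point, and the tolerance are assumptions), counting exact-step steepest descent iterations on a quadratic shows the count growing with the condition number of Q:

    import numpy as np

    def iterations_to_converge(kappa, tol=1e-8, max_iter=100000):
        # Exact-step steepest descent on f(x) = 0.5 x^T diag(1, kappa) x from x0 = (kappa, 1)
        Q = np.diag([1.0, kappa])
        x = np.array([kappa, 1.0])
        for k in range(max_iter):
            d = -(Q @ x)                      # gradient of 0.5 x^T Q x is Qx
            if np.linalg.norm(d) < tol:
                return k
            alpha = (d @ d) / (d @ (Q @ d))
            x = x + alpha * d
        return max_iter

    for kappa in [2.0, 10.0, 100.0]:
        print(kappa, iterations_to_converge(kappa))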
