IE 5531: Engineering Optimization I, Lecture 14: Unconstrained optimization (lecture slide transcript)

Page 1

IE 5531: Engineering Optimization I
Lecture 14: Unconstrained optimization

Prof. John Gunnar Carlsson

October 27, 2010


Page 2

Administrivia

Midterms returned 11/01

11/01 office hours moved

PS5 posted this evening


Page 3

Recap: Applications of KKT conditions

Applications of KKT conditions:

Portfolio optimization

Public good allocation

Communication channel power allocation (water-filling)

Fisher's exchange market


Page 4

Today

Algorithms for unconstrained minimization:

Introduction

Bisection search

Golden section search

Line search

Wolfe, Goldstein conditions

Gradient method (steepest descent)


Page 5

Introduction

Today's lecture is focused on solving the unconstrained problem

minimize f (x)

for x ∈ Rn

Ideally, we would like to find a global minimizer, i.e. a point x∗ such that f(x∗) ≤ f(x) for all x ∈ Rn

In general, as we have seen with the KKT conditions, we have to settle for a local minimizer, i.e. a point x∗ such that f(x∗) ≤ f(x) for all x in a local neighborhood N(x∗)

If f (x) is convex, these two notions are the same


Page 6

Necessary and sufficient conditions

If x∗ is a local minimizer, then there must be no descent direction, i.e. a direction d such that ∇f(x∗)^T d < 0

This immediately implies that ∇f (x∗) = 0

We also need to distinguish between local maximizers and local minimizers, so we also require that H ⪰ 0 (positive semidefinite), where hij = ∂²f(x∗)/∂xi ∂xj

The stronger condition H ≻ 0 (positive definite), together with ∇f(x∗) = 0, is a sufficient condition for x∗ to be a local minimizer

Again, if f(x) is convex (and continuously differentiable), then ∇f(x∗) = 0 is a necessary and sufficient condition for x∗ to be a global minimizer
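As a quick illustration of these conditions (not from the slides; the function f and the candidate point below are assumptions chosen only for the example), one can check the first- and second-order conditions numerically:

    import numpy as np

    def f(x):
        # assumed example: f(x) = (x1 - 1)^2 + 2*x2^2
        return (x[0] - 1.0)**2 + 2.0 * x[1]**2

    def grad(x):
        return np.array([2.0 * (x[0] - 1.0), 4.0 * x[1]])

    def hessian(x):
        return np.array([[2.0, 0.0], [0.0, 4.0]])

    x_star = np.array([1.0, 0.0])
    print(np.allclose(grad(x_star), 0.0))                    # gradient vanishes
    print(np.all(np.linalg.eigvalsh(hessian(x_star)) > 0))   # Hessian positive definite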


Page 7

Overview

Optimization algorithms tend to be iterative procedures:

Starting at a given point x0, they generate a sequence {xk} of iterates

This sequence terminates either when no more progress can be made (out of memory, etc.) or when a solution point has been approximated satisfactorily

At any given iterate xk, we generally want xk+1 to satisfy f(xk+1) < f(xk)

Furthermore, we want our sequence to converge to a local minimizer x∗

The general approach is a line search:

At any given iterate xk, choose a direction dk, and then set

xk+1 = xk + αk dk for some scalar αk > 0
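A minimal sketch of this iterative template (added for illustration; the stopping rule, the direction routine, and the step-size routine are placeholders, not part of the slides):

    import numpy as np

    def line_search_descent(f, grad, x0, choose_direction, choose_step,
                            tol=1e-8, max_iter=1000):
        # Generic template: x_{k+1} = x_k + alpha_k * d_k
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = grad(x)
            if np.linalg.norm(g) < tol:         # approximately stationary
                break
            d = choose_direction(x, g)          # e.g. -g for steepest descent
            alpha = choose_step(f, grad, x, d)  # e.g. backtracking (see below)
            x = x + alpha * d
        return x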


Page 8

Convergent sequences

Definition

Let {xk} be a sequence of real numbers. Then {xk} converges to x∗ if and only if for all real numbers ε > 0, there exists a positive integer K such that ‖xk − x∗‖ < ε for all k ≥ K.

Examples of convergence:

xk = 1/k

xk = (1/2)k

xk = [1 / log(k+1)]^k
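A quick numerical check of these three examples (added for illustration) shows each sequence approaching 0 as k grows:

    import math

    for k in [1, 10, 100, 1000]:
        a = 1.0 / k                           # xk = 1/k
        b = 0.5 ** k                          # xk = (1/2)^k
        c = (1.0 / math.log(k + 1)) ** k      # xk = [1/log(k+1)]^k
        print(k, a, b, c)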


Page 9

Searching in one variable: root-finding

Intermediate value theorem: given a continuous single-variable function f(x) and a pair of points xℓ and xr such that f(xℓ) < 0 and f(xr) > 0, there exists a point x∗ ∈ [xℓ, xr] such that f(x∗) = 0

A simpler question to motivate what follows: how can we find x∗ (or a point within ε of x∗)?


Page 10

Bisection

1. Choose xmid = (xℓ + xr)/2 and evaluate f(xmid)

2. If f(xmid) = 0, then x∗ = xmid and we're done

3. Otherwise,
   1. If f(xmid) < 0, then set xℓ = xmid
   2. If f(xmid) > 0, then set xr = xmid

4. If xr − xℓ < ε, we're done; otherwise, go to step 1

The algorithm above divides the search interval in half at every iteration; thus, to approximate x∗ to within ε we require at most log2((xr − xℓ)/ε) iterations
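A minimal Python sketch of this bisection procedure (added for illustration; the sample function at the end is an assumption):

    def bisection(f, xl, xr, eps=1e-8):
        # Find a root of f in [xl, xr], assuming f(xl) < 0 < f(xr)
        while xr - xl >= eps:
            xmid = (xl + xr) / 2.0
            fmid = f(xmid)
            if fmid == 0.0:
                return xmid        # exact root found
            elif fmid < 0.0:
                xl = xmid          # root lies in [xmid, xr]
            else:
                xr = xmid          # root lies in [xl, xmid]
        return (xl + xr) / 2.0

    # Example: the root of x^2 - 2 on [0, 2] is sqrt(2) ≈ 1.41421356
    print(bisection(lambda x: x * x - 2.0, 0.0, 2.0))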


Page 11

Golden section search

Consider a unimodal function f(x) defined on an interval [xℓ, xr]

Unimodal: f(x) has only one local minimizer x∗ in [xℓ, xr]

How can we find x∗ (or a point within ε of x∗)?

Hint: we can do this without derivatives

Hint: we need to sample two points x′ℓ, x′r in [xℓ, xr]


Page 12

Golden section search

Assume without loss of generality that xℓ = 0 and xr = 1; set ψ = (3 − √5)/2

1. Set x′ℓ = ψ and x′r = 1 − ψ

2. If f(x′ℓ) < f(x′r), then the minimizer must lie in the interval [xℓ, x′r], so set xr = x′r

3. Otherwise, the minimizer must lie in the interval [x′ℓ, xr], so set xℓ = x′ℓ

4. If xr − xℓ < ε, we're done; otherwise, go to step 1

By setting ψ = (3 − √5)/2 we decrease the search interval by a constant factor 1 − ψ ≈ 0.618 at every iteration
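A short Python sketch of this loop (added for illustration; the test function is an assumption, and both interior points are re-sampled on every pass exactly as in the steps above rather than reused):

    import math

    def golden_section(f, xl, xr, eps=1e-8):
        # Minimize a unimodal f on [xl, xr] to within eps
        psi = (3.0 - math.sqrt(5.0)) / 2.0          # psi ≈ 0.382
        while xr - xl >= eps:
            xpl = xl + psi * (xr - xl)              # x'_l
            xpr = xl + (1.0 - psi) * (xr - xl)      # x'_r
            if f(xpl) < f(xpr):
                xr = xpr                            # minimizer in [xl, x'_r]
            else:
                xl = xpl                            # minimizer in [x'_l, xr]
        return (xl + xr) / 2.0

    # Example: the minimizer of (x - 0.3)^2 on [0, 1] is 0.3
    print(golden_section(lambda x: (x - 0.3) ** 2, 0.0, 1.0))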


Page 13

Line search: step length

Consider the multi-dimensional problem

minimize f (x)

for x ∈ Rn

At each iteration xk we set dk = −∇f(xk) and set xk+1 = xk + αk dk, for appropriately chosen αk

Ideally, we would like for αk to be the minimizer of the univariate function

φ (α) := f (xk + αdk)

but this is time-consuming

In the big picture, we want αk to give us a sufficient reduction in f(x), without spending too much time on it

Two conditions we can impose are the Wolfe and Goldstein conditions


Page 14

Armijo condition

Clearly the step length αk should guarantee a sufficient decrease in f(x), so we require

φ(α) = f(xk + αdk) ≤ f(xk) + c1 α ∇f(xk)^T dk

with c1 ∈ (0, 1)

The right-hand side is linear in α

Note that this is satisfied for all α that are sufficiently small

In practice, we often set c1 ≈ 10^−4
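One common way to enforce this condition is a backtracking search; a minimal sketch (added for illustration, with the halving factor as an assumption):

    import numpy as np

    def backtracking_armijo(f, grad, x, d, alpha0=1.0, c1=1e-4, shrink=0.5):
        # Shrink alpha until the Armijo (sufficient decrease) condition holds
        alpha = alpha0
        fx = f(x)
        slope = grad(x) @ d          # directional derivative phi'(0)
        while f(x + alpha * d) > fx + c1 * alpha * slope:
            alpha *= shrink
        return alpha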


Page 15

Curvature condition

The preceding condition is not sufficient because an arbitrarily small α satisfies it, which means that {xk} may not converge to a minimizer

One way to get around this is to impose the additional condition

φ′(α) = ∇f(xk + αdk)^T dk ≥ c2 ∇f(xk)^T dk

where c2 ∈ (c1, 1)

This condition just says that the slope of φ at α has to be at least c2 times the slope of φ at 0

Typically we choose c2 ≈ 0.9

If the slope of φ at α were still very negative, it would mean that our step size wasn't chosen very well (we could continue in that direction and decrease the function further)

The Armijo condition and the curvature condition, when combined,are called the Wolfe conditions
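A small helper that tests both Wolfe conditions for a candidate step (added for illustration; the constants are the typical values quoted above):

    import numpy as np

    def satisfies_wolfe(f, grad, x, d, alpha, c1=1e-4, c2=0.9):
        # True if alpha satisfies both the Armijo and curvature conditions
        slope0 = grad(x) @ d
        armijo = f(x + alpha * d) <= f(x) + c1 * alpha * slope0
        curvature = grad(x + alpha * d) @ d >= c2 * slope0
        return armijo and curvature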


Page 16

Goldstein conditions

An alternative to the Wolfe conditions is the Goldstein conditions:

f(xk) + (1 − c) α ∇f(xk)^T dk ≤ f(xk + αdk) ≤ f(xk) + c α ∇f(xk)^T dk

with c ∈ (0, 1/2)

The second inequality is just the sufficient decrease condition

The first inequality bounds the step length from below

One disadvantage is that the local minimizers of φ(α) may be excluded in this search
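For symmetry with the Wolfe check above, a Goldstein test could look like the following sketch (added for illustration; the value of c is an assumption within the allowed range):

    import numpy as np

    def satisfies_goldstein(f, grad, x, d, alpha, c=0.25):
        # True if alpha satisfies both Goldstein inequalities
        fx = f(x)
        slope0 = grad(x) @ d
        lower = fx + (1.0 - c) * alpha * slope0   # bounds the step away from zero
        upper = fx + c * alpha * slope0           # sufficient decrease
        return lower <= f(x + alpha * d) <= upper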


Page 17

Steepest (gradient) descent example

Recall that in the method of steepest descent, we set dk = −∇f (xk)

Consider the case where we want to minimize

f(x) = c^T x + (1/2) x^T Q x

where Q is a symmetric positive definite matrix

Clearly, the unique minimizer lies where ∇f(x∗) = 0, which occurs precisely when

Qx = −c

The descent direction will be d = −∇f (x) = − (c + Qx)


Page 18

Steepest descent example

The iteration scheme

xk+1 = xk + αk dk

is given by

xk+1 = xk − αk (c + Qxk)

We need to choose a step size αk , so we consider

φ (α) = f (xk − α (c + Qxk))


Page 19

Steepest descent example

Note that we don't even need the Wolfe or Goldstein conditions, as we can find the optimal α analytically!

φ(α) = f(xk − α(c + Qxk))
     = c^T (xk − α(c + Qxk)) + (1/2) (xk − α(c + Qxk))^T Q (xk − α(c + Qxk))

Since φ(α) is a strictly convex quadratic function in α, it is not hard to see that its minimizer occurs where

c^T dk + xk^T Q dk + α dk^T Q dk = 0

and thus we set

αk = (dk^T dk) / (dk^T Q dk)

with dk = −(c + Qxk)
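Putting the pieces together, a short sketch of steepest descent with this exact step size on a small quadratic (added for illustration; the particular Q, c, and starting point are assumptions):

    import numpy as np

    def steepest_descent_quadratic(Q, c, x0, tol=1e-10, max_iter=10000):
        # Minimize f(x) = c^T x + 0.5 x^T Q x with the exact step size
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            d = -(c + Q @ x)                  # d_k = -grad f(x_k)
            if np.linalg.norm(d) < tol:
                break
            alpha = (d @ d) / (d @ (Q @ d))   # exact minimizer of phi(alpha)
            x = x + alpha * d
        return x

    # Example: with Q = diag(2, 10) and c = (-2, -10), the minimizer solves Qx = -c, i.e. x = (1, 1)
    Q = np.array([[2.0, 0.0], [0.0, 10.0]])
    c = np.array([-2.0, -10.0])
    print(steepest_descent_quadratic(Q, c, np.zeros(2)))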


Page 20

Steepest descent example

The recursion for the steepest descent method is therefore

xk+1 = xk + (dk^T dk / dk^T Q dk) dk = xk − (dk^T dk / dk^T Q dk) (c + Qxk)


Page 21

Convergence of steepest descent

Theorem

Let f(x) be a given continuously differentiable function. Let x0 ∈ Rn be a point for which the sub-level set

X0 = {x ∈ Rn : f(x) ≤ f(x0)}

is bounded. Let {xk} be a sequence of points generated by the steepest descent method initiated at x0, using either the Wolfe or Goldstein line search conditions. Then {xk} converges to a stationary point of f(x).

The above theorem gives what is called the global convergence property of the steepest-descent method

No matter how far away x0 is, the steepest descent method must converge to a stationary point

The steepest descent method may, however, be very slow to reach that point
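As a concrete illustration of this slowness (added; the diagonal Q, the starting point, and the tolerance are assumptions), counting exact-step steepest descent iterations on a quadratic shows the count growing with the condition number of Q:

    import numpy as np

    def iterations_to_converge(kappa, tol=1e-8, max_iter=100000):
        # Exact-step steepest descent on f(x) = 0.5 x^T diag(1, kappa) x from x0 = (kappa, 1)
        Q = np.diag([1.0, kappa])
        x = np.array([kappa, 1.0])
        for k in range(max_iter):
            d = -(Q @ x)                      # gradient of 0.5 x^T Q x is Qx
            if np.linalg.norm(d) < tol:
                return k
            alpha = (d @ d) / (d @ (Q @ d))
            x = x + alpha * d
        return max_iter

    for kappa in [2.0, 10.0, 100.0]:
        print(kappa, iterations_to_converge(kappa))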
