Transcript of Modeling and Simulation: Exploring Dynamic System Behaviour, Chapter 9: Optimization

Page 1:

Modeling and Simulation: Exploring Dynamic System Behaviour

Chapter 9: Optimization

Page 2:

A Restricted Perspective

• Considerations that follow are restricted to the domain of CTDS

• Consequently we are able to circumvent the additional complexity that arises in the DEDS domain because of its inherent stochastic nature

Page 3:

The Problem Statement

An Extended CTDS Model

• An essential constituent of any optimization problem is a collection of real-valued parameters (denoted by the m-vector, p).

• These parameters are embedded within the CTDS model and hence its generic representation changes from:

x′(t) = f(x(t), t)

to:

x′(t) = f(x(t), t; p)

Page 4:

The Problem Statement (Cont.)

• Implicit in the notion of optimization is the goal of finding a “best” value for the parameter vector p (denoted by p*)

• This search is associated with a scalar real-valued criterion function, J=J(p) and the best value for p (i.e., p*) is one which yields an extreme value for J; i.e., either a maximum or a minimum (note that p* may not be unique)

• For definiteness we assume the latter (note however that the maximization of J(p) is equivalent to the minimization of −J(p))

• Thus our goal is to find p* such that: J(p*) ≤ J(p) for all p ∈ Φ

where Φ is a constraint set

Page 5:

The Constraint Set

• The constraint set, Φ, reflects the fact that not all real-valued m-vectors are necessarily permissible candidates for p.

• Φ represents a set of functional constraints. These may either implicitly or explicitly restrict the values of some of the components of p. A simple explicit example is:

p1 > 0

Page 6:

The Constraint Set (cont.)

• In general, Φ is a set of c3 functional constraints of the form:

Φj(x(t; p)) > 0 for j = 1, 2, . . . , c1

Φj(x(t; p)) ≥ 0 for j = c1 + 1, c1 + 2, . . . , c2

Φj(x(t; p)) = 0 for j = c2 + 1, c2 + 2, . . . , c3

Page 7:

The Constraint Set (cont.)

• Solving the constrained problem is necessarily a more challenging task

• A variety of techniques exist for dealing with the added complexity; one such technique is called the Penalty Function Method

• With this approach the solution to the constrained problem is obtained by solving a sequence of specially formulated unconstrained problems, but no “special” minimization procedure is introduced

• The discussion that follows focuses on the unconstrained problem (the set Φ is empty)
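The Penalty Function Method mentioned above can be sketched in a few lines of Python. The constrained problem, the quadratic penalty term, and the crude interval-refinement minimizer below are all illustrative assumptions, not from the text:

```python
# Penalty Function Method sketch: the constrained problem is replaced by a
# sequence of unconstrained problems J(p) + mu * penalty(p) with growing mu.
# Illustrative problem (an assumption): minimize J(p) = p^2 subject to p >= 1,
# whose constrained minimizer is p* = 1.

def J(p):
    return p * p

def penalty(p):
    # Quadratic penalty: zero wherever the constraint p - 1 >= 0 holds
    v = min(0.0, p - 1.0)
    return v * v

def minimize_1d(f, lo=-10.0, hi=10.0, rounds=40):
    # Crude stand-in for any unconstrained minimizer: repeatedly sample the
    # interval and zoom in around the best sample (f is assumed unimodal)
    best = lo
    for _ in range(rounds):
        pts = [lo + k * (hi - lo) / 10.0 for k in range(11)]
        best = min(pts, key=f)
        width = (hi - lo) / 10.0
        lo, hi = best - width, best + width
    return best

p = 0.0
for mu in [1.0, 10.0, 100.0, 1000.0]:
    p = minimize_1d(lambda q: J(q) + mu * penalty(q))
print(round(p, 2))  # 1.0 (the unconstrained minimizers approach p* = 1)
```

For this toy problem the minimizer of each subproblem is mu/(1 + mu), which approaches the constrained solution as mu grows; note that no "special" minimization procedure was introduced, matching the remark above.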

Page 8:

Some Typical Formats for J(p)

J(p) = ∫[t0, tf] g(x(t; p)) dt

J(p) = Σ(j = 1..s) g(x(tj; p))

J(p) = g(x(tf; p))

Page 9:

Evaluating the Criterion Function

• The most distinctive feature of the optimization problem within the CTDS context is the need to solve a set of differential equations each time a value of J is required.

• This has two immediate consequences:

a) computational overhead can be high

b) the precision with which the value of J is generated is subject to the quality of the differential eq’n solving process.
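Both consequences can be made concrete with a minimal Python sketch of evaluating J(p). The one-state model x′(t) = −p·x(t) with x(0) = 1, the criterion J(p) = ∫[0, 1] x(t)² dt, and the RK4/trapezoid choices are all illustrative assumptions:

```python
import math

# Sketch of evaluating J(p) by solving the model's differential equation.
# The model is an illustrative assumption: x'(t) = -p * x(t), x(0) = 1, with
# J(p) defined as the integral of g(x) = x^2 over [t0, tf] = [0, 1].

def f(x, t, p):
    return -p * x

def g(x):
    return x * x

def evaluate_J(p, t0=0.0, tf=1.0, n=1000):
    """One J-evaluation = one ODE solve: RK4 steps for x, trapezoid rule for J."""
    h = (tf - t0) / n
    x, t, J = 1.0, t0, 0.0
    for _ in range(n):
        k1 = f(x, t, p)
        k2 = f(x + 0.5 * h * k1, t + 0.5 * h, p)
        k3 = f(x + 0.5 * h * k2, t + 0.5 * h, p)
        k4 = f(x + h * k3, t + h, p)
        x_next = x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)
        J += 0.5 * h * (g(x) + g(x_next))   # accumulate the integral
        x, t = x_next, t + h
    return J

# For this toy model J(p) has the closed form (1 - exp(-2p)) / (2p)
print(abs(evaluate_J(1.0) - (1.0 - math.exp(-2.0)) / 2.0) < 1e-6)  # True
```

Both consequences are visible here: every call performs n integration steps (overhead), and the accuracy of the returned J is limited by the step size h (precision).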

Page 10:

Unconstrained Minimization Methods

• A wide range of methods is available for the unconstrained function minimization problem

• These methods can be categorized in a variety of ways

• The most fundamental is in terms of their need for gradient information. The need for gradient information introduces an additional level of complexity in the CTDS context

Page 11:

The Nelder-Mead Simplex Method

• The simplex method is a member of the class of methods that do not require gradient information.

• These methods are based on heuristic arguments; i.e., they lack a formal (theoretical) foundation.

Page 12:

General Concept

• The process begins with a collection of (m+1) points (i.e. a simplex) within the m-dimensional parameter space

• Through a set of operations, the simplex is “moved” around in the parameter space and is ultimately contracted in size as it surrounds the minimizing argument, p*

• When m=2, the simplex is a triangle.

Page 13:

The Simplex Procedure

• Choose p0, an initial “guess” at the minimizing argument

• Determine m additional points based on p0 to create an initial simplex {p0, p1, p2, . . . , pm}

• begin step: evaluate J at each of the points in the simplex

• Let pL be the vertex that yields the largest value for J, pG be the vertex that yields the next largest value for J and pS be the vertex that yields the smallest value for J. Correspondingly, let JL = J(pL), JG = J(pG) and JS = J(pS).

Page 14:

The Simplex Procedure (cont)

• The centroid of the simplex with pL excluded is given by:

pC = (1/m) [(Σ(k = 0..m) pk) − pL]

• A reflection step is carried out by “reflecting” the worst point pL about the centroid, to produce a new point pR where:

pR = pC + α (pC − pL)

where α > 0 and is typically α = 1
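The centroid and reflection computations translate directly into code; the m = 2 simplex below and the choice of worst vertex are illustrative values:

```python
# Sketch of the centroid (pL excluded) and reflection computations for an
# m = 2 simplex; the vertex values and choice of worst vertex are illustrative.

def centroid_excluding(simplex, worst):
    m = len(simplex) - 1                 # the simplex has m + 1 vertices
    dims = range(len(simplex[0]))
    return [sum(v[d] for i, v in enumerate(simplex) if i != worst) / m
            for d in dims]

def reflect(p_c, p_l, alpha=1.0):
    # pR = pC + alpha * (pC - pL)
    return [c + alpha * (c - l) for c, l in zip(p_c, p_l)]

simplex = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
worst = 0                                # suppose vertex 0 yields JL
p_c = centroid_excluding(simplex, worst)
print(p_c)                               # [0.5, 0.5]
print(reflect(p_c, simplex[worst]))      # [1.0, 1.0]
```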

Page 15:

The Simplex Procedure (cont): a reflection step

[Figure: the simplex {pS, pG, pL} with its centroid pC and the reflected point pR]

Page 16:

The Simplex Procedure (cont)

• One of three possible actions now takes place depending on the value of JR = J(pR):

• (i) If JG > JR > JS, then pR replaces pL and the step is completed.

• Return to begin step (in slide 13).

Page 17:

The Simplex Procedure (cont)

• (ii) If JR < JS then a new “least point” has been uncovered and it is possible that further movement in the same direction could be advantageous. Consequently an expansion step is carried out to produce pE where:

pE = pC + γ (pR – pC)

where γ > 1 and is typically 2

Page 18:

The Simplex Procedure (cont): an expansion step

[Figure: the simplex {pS, pG, pL} with centroid pC, reflected point pR and expansion point pE]

Page 19:

The Simplex Procedure (cont): an expansion step

• If JE = J(pE) < JS then pL is replaced with pE; otherwise pL is replaced with pR. In either case, the step is completed.

• Return to begin step (in slide 13)

Page 20:

The Simplex Procedure (cont): a contraction step

• (iii) If JR > JG then a contraction step is made to produce the point pD where:

pD = pC + β (p̂ − pC)

where 0 < β < 1 and is typically 0.5. Here p̂ is either pR or pL depending on whether JR is smaller or larger than JL. If JD = J(pD) < JG then the step ends (return to begin step – slide 13). Otherwise the simplex is shrunk about pS by halving the distances of all vertices from this point and then the step ends (return to begin step – slide 13).

Page 21:

The Simplex Procedure (cont)

• TERMINATION: There are two possible ways for terminating the process:

(1) when the separation among the points in the current simplex has become sufficiently small

(2) when the variation among the criterion function values at the vertices of the current simplex is within a prescribed tolerance

In either case, the vertex pS is taken to be p*
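Putting the steps of slides 13 through 21 together, a minimal Python sketch of the procedure might look as follows. The quadratic criterion function, the initial-simplex construction, and the use of criterion-value variation as the sole termination test are illustrative assumptions:

```python
# Compact sketch of the full procedure (alpha = 1, gamma = 2, beta = 0.5),
# applied to an illustrative quadratic criterion function. Termination here
# uses only the criterion-value variation test, option (2) above.

def nelder_mead(J, p0, scale=1.0, tol=1e-8, max_steps=500):
    m = len(p0)
    # Initial simplex: p0 plus m points offset along the coordinate axes
    simplex = [list(p0)] + [
        [p0[i] + (scale if i == k else 0.0) for i in range(m)]
        for k in range(m)
    ]
    for _ in range(max_steps):
        simplex.sort(key=J)                  # pS first, pL last
        p_s, p_l = simplex[0], simplex[-1]
        J_s, J_g, J_l = J(p_s), J(simplex[-2]), J(p_l)
        if J_l - J_s < tol:                  # termination test (2)
            return p_s
        # Centroid with pL excluded
        p_c = [sum(v[i] for v in simplex[:-1]) / m for i in range(m)]
        p_r = [c + (c - w) for c, w in zip(p_c, p_l)]      # reflection
        J_r = J(p_r)
        if J_s <= J_r < J_g:                 # case (i): accept pR
            simplex[-1] = p_r
        elif J_r < J_s:                      # case (ii): try an expansion
            p_e = [c + 2.0 * (r - c) for c, r in zip(p_c, p_r)]
            simplex[-1] = p_e if J(p_e) < J_s else p_r
        else:                                # case (iii): contraction
            q = p_r if J_r < J_l else p_l
            p_d = [c + 0.5 * (qi - c) for c, qi in zip(p_c, q)]
            if J(p_d) < J_g:
                simplex[-1] = p_d
            else:                            # shrink about pS
                simplex = [[s + 0.5 * (v - s) for s, v in zip(p_s, vtx)]
                           for vtx in simplex]
    return min(simplex, key=J)

J = lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2
p_star = nelder_mead(J, [0.0, 0.0])
print(all(abs(a - b) < 1e-3 for a, b in zip(p_star, [1.0, -2.0])))  # True
```

Note that each call to J here would, in the CTDS setting, involve solving the model's differential equations; no gradient information is required anywhere.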

Page 22:

The Conjugate Gradient Method

• The conjugate gradient (CG) method is a member of the class of gradient dependent methods

• The CG method also has the property of quadratic convergence. This means that the minimizing argument of a quadratic function of dimension m will be located in no more than m iterations of the method

• This property is significant because an arbitrary function, J(p), generally has a quadratic shape around its minimizing argument

• Recall that the gradient of J(p) at the argument p̂ is the m-vector of partial derivatives of J taken with respect to p1, p2, . . . , pm and evaluated at p̂. It is denoted by Jp(p̂). Recall also that Jp(p*) = 0.

Page 23:

The Line Search Problem

• A line search problem needs to be solved at each step of the CG process

• Suppose p̂ is a point in m-space and u is a given m-vector of unit length which we interpret as a direction

• The m-vector (p̂ + α u) can be regarded as a point in m-space reached by moving a distance α away from p̂ in the direction u

• The line search problem is the problem of finding a value α* for the scalar α which yields a minimum value for J(p̂ + α u).

• This is a one-dimensional minimization problem. The challenge of obtaining its precise solution in an efficient manner should not be underestimated!!
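One common way to attack this one-dimensional problem is golden-section search over a bracketing interval. The quadratic J, the point, the direction, and the bracket [0, 10] in the sketch below are illustrative assumptions:

```python
import math

# Golden-section search for alpha* minimizing J(p_hat + alpha * u) over a
# bracketing interval [a, b] (a minimum is assumed to lie inside the bracket).

def line_search(J, p_hat, u, a=0.0, b=10.0, iters=80):
    phi_inv = (math.sqrt(5.0) - 1.0) / 2.0
    f = lambda alpha: J([pi + alpha * ui for pi, ui in zip(p_hat, u)])
    c = b - phi_inv * (b - a)
    d = a + phi_inv * (b - a)
    for _ in range(iters):
        if f(c) < f(d):                  # minimum lies in [a, d]
            b, d = d, c
            c = b - phi_inv * (b - a)
        else:                            # minimum lies in [c, b]
            a, c = c, d
            d = a + phi_inv * (b - a)
    return 0.5 * (a + b)

J = lambda p: (p[0] - 3.0) ** 2 + p[1] ** 2
alpha_star = line_search(J, [0.0, 0.0], [1.0, 0.0])  # search along the p1 axis
print(round(alpha_star, 4))              # 3.0
```

Every evaluation of f here is a full evaluation of J, which in the CTDS setting means another solution of the model's differential equations; this is why an efficient line search matters.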


Page 24:

The CG Method (cont)

• The CG method carries out a sequence of steps which generate a sequence of estimates for p*.

• Two tasks are carried out at each step:

(1) Generate a new estimate for p* via:

pk = pk-1 + α* rk-1

where α* is such that:

J(pk) = min over α of J(pk-1 + α rk-1)

(In other words, pk is the result of a line search from pk-1 in the direction rk-1)

(2) Generate a new search direction via:

rk = −Jp(pk) + βk-1 rk-1
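The two tasks can be sketched for a small quadratic example. The criterion function, its analytic gradient and Hessian (which make the line search solvable in closed form), and the Fletcher-Reeves choice of βk-1 are the assumptions here:

```python
# Sketch of CG tasks (1) and (2) on an illustrative quadratic
# J(p) = (p1 - 1)^2 + 2 * p2^2, whose analytic gradient and Hessian are
# assumed known so that the line search has a closed-form solution.

def grad_J(p):
    return [2.0 * (p[0] - 1.0), 4.0 * p[1]]

H = [[2.0, 0.0], [0.0, 4.0]]             # Hessian of the quadratic

def exact_alpha(p, r):
    # For a quadratic, the minimizing alpha along r is -(g . r) / (r . H r)
    g = grad_J(p)
    num = -sum(gi * ri for gi, ri in zip(g, r))
    Hr = [sum(H[i][j] * r[j] for j in range(2)) for i in range(2)]
    den = sum(ri * hri for ri, hri in zip(r, Hr))
    return num / den

def cg(p0, steps=2):
    p = list(p0)
    g = grad_J(p)
    r = [-gi for gi in g]                # r0 = -Jp(p0)
    for _ in range(steps):
        alpha = exact_alpha(p, r)        # task (1): line search along r
        p = [pi + alpha * ri for pi, ri in zip(p, r)]
        g_new = grad_J(p)
        beta = (sum(x * x for x in g_new) /
                sum(x * x for x in g))   # Fletcher-Reeves beta
        r = [-gn + beta * ri for gn, ri in zip(g_new, r)]  # task (2)
        g = g_new
    return p

p_star = cg([0.0, 1.0])
print(all(abs(a - b) < 1e-9 for a, b in zip(p_star, [1.0, 0.0])))  # True
```

Quadratic convergence is visible: with m = 2 parameters, the minimizer (1, 0) is reached after exactly two steps.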

Page 25:

The CG Method (cont)

Initial step: to initiate the process there are two requirements:

(a) p0; this is a “best guess” at p*

(b) r0; this is an initial search direction and is set equal to −Jp(p0)

Page 26:

The CG Method (cont)

• A variety of possible assignments are available for βk-1. These have been proposed by different authors.

• The original was proposed by Fletcher and Reeves (1964):

βk-1 = ‖Jp(pk)‖² / ‖Jp(pk-1)‖²

Page 27:

Acquiring the Gradient Vector

• Within the CTDS context under consideration, the explicit determination of the gradient of the criterion function; i.e., Jp(ω), is not feasible.

• A finite difference approximation of the components of Jp(ω) can nevertheless be adequate for the implementation of the CG method

Page 28:

Acquiring the Gradient Vector (cont)

• Recall that:

∂J/∂pk (ω) = lim as Δ → 0 of [J(ω + Δ ek) − J(ω)] / Δ

• A finite difference approximation is:

∂J/∂pk (ω) ≈ [J(ω + Δ ek) − J(ω)] / Δ

• Here Δ is a suitably small positive scalar and ek is the kth column of the m x m identity matrix
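The forward-difference approximation above translates directly into code; the two-parameter criterion function below is an illustrative assumption:

```python
# Sketch of the forward-difference gradient approximation. In the CTDS
# setting each call to J would involve a full solution of the model's
# differential equations, so the cost grows with the number of parameters.

def fd_gradient(J, omega, delta=1e-6):
    J0 = J(omega)                        # one baseline evaluation of J
    grad = []
    for k in range(len(omega)):
        shifted = list(omega)
        shifted[k] += delta              # omega + delta * e_k
        grad.append((J(shifted) - J0) / delta)
    return grad                          # m extra evaluations in total

J = lambda p: p[0] ** 2 + 3.0 * p[1]     # illustrative criterion function
g = fd_gradient(J, [1.0, 0.0])
print([round(x, 3) for x in g])          # [2.0, 3.0]
```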

Page 29:

Acquiring the Gradient Vector (cont)

• The construction of the m-dimensional gradient vector requires a value for J(ω + Δek) for k = 1, 2, --- m. Consequently the underlying differential equations of the model must be solved m times.

• Computational overhead can become an issue!!

• Note also that considerable care must be taken to ensure that the value of Δ is not unreasonably small since there is a possibility of results becoming hopelessly corrupted by numerical noise.