Multi-Variable Optimization Methods


Higher derivatives of multi-variable functions are defined as in the single-variable case, but note that the number of components increases by a factor of n with each differentiation.

While the gradient of a function of $n$ variables is an $n$-vector, the second derivative of an $n$-variable function is defined by $n^2$ partial derivatives of the $n$ first partial derivatives with respect to the $n$ variables,

$$\frac{\partial^2 f}{\partial x_i \partial x_j}, \; i \ne j; \qquad \frac{\partial^2 f}{\partial x_i^2}, \; i = j. \qquad (2)$$

If the partial derivatives $\partial f/\partial x_i$, $\partial f/\partial x_j$ and $\partial^2 f/\partial x_i \partial x_j$ are continuous, then $\partial^2 f/\partial x_j \partial x_i$ exists and $\partial^2 f/\partial x_i \partial x_j = \partial^2 f/\partial x_j \partial x_i$. These second order partial derivatives can be represented by a square symmetric matrix called the Hessian matrix,

$$\nabla^2 f(x) \equiv H(x) \equiv \begin{bmatrix}
\dfrac{\partial^2 f}{\partial x_1^2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\
\vdots & \ddots & \vdots \\
\dfrac{\partial^2 f}{\partial x_n \partial x_1} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2}
\end{bmatrix} \qquad (3)$$

    If f is a quadratic function, the Hessian of f is constant and f can be expressed as

$$f(x) = \tfrac{1}{2}\, x^T H x + g^T x + \alpha, \qquad (4)$$

where $\alpha$ is a scalar constant.
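For example, differentiating this quadratic term by term (using the symmetry of $H$) gives

$$\nabla f(x) = H x + g, \qquad \nabla^2 f(x) = H,$$

which confirms that the gradient is linear in $x$ and the Hessian is constant.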


As in the single-variable case, the optimality conditions can be derived from the Taylor-series expansion of $f$ about $x^*$,

$$f(x^* + \varepsilon p) = f(x^*) + \varepsilon\, p^T g(x^*) + \tfrac{1}{2}\, \varepsilon^2\, p^T H(x^* + \varepsilon \theta p)\, p, \qquad (5)$$

where $0 \le \theta \le 1$, $\varepsilon$ is a scalar, and $p$ is an $n$-vector.

For $x^*$ to be a local minimum, for any vector $p$ there must be a finite $\varepsilon$ such that $f(x^* + \varepsilon p) \ge f(x^*)$, i.e., there is a neighborhood in which this condition holds. If this condition is satisfied, then $f(x^* + \varepsilon p) - f(x^*) \ge 0$ and the first and second order terms in the Taylor-series expansion must be greater than or equal to zero.

As in the single-variable case, and for the same reason, we start by considering the first order terms. Since $p$ is an arbitrary vector whose components can be either positive or negative, every component of the gradient vector $g(x^*)$ must be zero.

Now we have to consider the second order term, $\tfrac{1}{2}\varepsilon^2 p^T H(x^* + \varepsilon\theta p)\, p$. For this term to be non-negative, $H(x^* + \varepsilon\theta p)$ has to be positive semi-definite, and by continuity, the Hessian at the optimum, $H(x^*)$, must also be positive semi-definite.
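As a quick numerical illustration of these conditions, the sketch below checks that the gradient vanishes and that the Hessian is positive semi-definite at the minimum of a small quadratic; the particular matrix, vector, tolerances, and the use of NumPy are illustrative choices, not part of the notes.

```python
import numpy as np

# Example quadratic: f(x) = 1/2 x^T H x + g^T x, with H symmetric positive definite.
H = np.array([[4.0, 1.0],
              [1.0, 3.0]])
g = np.array([-1.0, -2.0])

def grad_f(x):
    """Gradient of the quadratic: H x + g."""
    return H @ x + g

# Candidate minimum: the stationary point solves H x = -g.
x_star = np.linalg.solve(H, -g)

# First-order condition: every component of the gradient must be (numerically) zero.
assert np.allclose(grad_f(x_star), 0.0, atol=1e-12)

# Second-order condition: the Hessian at the optimum must be positive semi-definite,
# i.e. all of its eigenvalues are >= 0 (here they are strictly positive).
eigvals = np.linalg.eigvalsh(H)
assert np.all(eigvals >= 0.0)
print("x* =", x_star, " eigenvalues of H:", eigvals)
```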


    3.2 General Algorithm for Smooth Functions

All algorithms for unconstrained gradient-based optimization can be described as follows. We start with $k = 0$ and an estimate of $x^*$, $x_k$.

1. Test for convergence. If the conditions for convergence are satisfied, then we can stop and $x_k$ is the solution. Else, go to step 2.

2. Compute a search direction. Compute the vector $p_k$ that defines the direction in $n$-space along which we will search.

3. Compute the step length. Find a positive scalar, $\alpha_k$, such that $f(x_k + \alpha_k p_k) < f(x_k)$.

4. Update the design variables. Set $x_{k+1} = x_k + \alpha_k p_k$, $k = k + 1$ and go back to step 1.

$$x_{k+1} = x_k + \underbrace{\alpha_k p_k}_{\Delta x_k} \qquad (8)$$
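These four steps translate almost directly into code. The sketch below is a generic driver; the helper names `search_direction` and `step_length` are placeholders supplied by the particular method and are assumptions of this example, not part of the notes.

```python
import numpy as np

def minimize_smooth(f, grad, x0, search_direction, step_length,
                    eps_g=1e-6, max_iter=200):
    """Generic gradient-based descent loop following steps 1-4.

    search_direction(x, g) -> p_k : method-specific (steepest descent, Newton, ...)
    step_length(f, x, p)   -> alpha_k > 0 with f(x + alpha_k p) < f(x)
    """
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        # 1. Test for convergence.
        if np.linalg.norm(g) <= eps_g:
            break
        # 2. Compute a search direction.
        p = search_direction(x, g)
        # 3. Compute a step length that decreases f.
        alpha = step_length(f, x, p)
        # 4. Update the design variables.
        x = x + alpha * p
    return x
```

Passing, for instance, `lambda x, g: -g` as `search_direction` recovers steepest descent, while a Newton or quasi-Newton direction slots into the same loop.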


    3.3 Steepest Descent Method

The steepest descent method uses the negative of the gradient vector at each point as the search direction for each iteration. As mentioned previously, the gradient vector is orthogonal to the plane tangent to the isosurfaces of the function.

The gradient vector at a point, $g(x_k)$, is also the direction of maximum rate of change (maximum increase) of the function at that point. This rate of change is given by the norm, $\|g(x_k)\|$.

    Steepest descent algorithm:

1. Select starting point $x_0$, and convergence parameters $\varepsilon_g$, $\varepsilon_a$ and $\varepsilon_r$.
2. Compute $g(x_k) \equiv \nabla f(x_k)$. If $\|g(x_k)\| \le \varepsilon_g$ then stop. Otherwise, compute the normalized search direction $p_k = -g(x_k)/\|g(x_k)\|$.
3. Find the positive step length $\alpha_k$ such that $f(x_k + \alpha_k p_k)$ is minimized.
4. Update the current point, $x_{k+1} = x_k + \alpha_k p_k$.
5. Evaluate $f(x_{k+1})$. If the condition $|f(x_{k+1}) - f(x_k)| \le \varepsilon_a + \varepsilon_r |f(x_k)|$ is satisfied for two successive iterations then stop. Otherwise, set $k = k + 1$ and return to step 2.
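A minimal Python sketch of this algorithm follows. The quadratic test function and the simple backtracking line search (in place of the exact minimization in step 3) are illustrative assumptions of this example.

```python
import numpy as np

def backtracking(f, x, p, g, alpha=1.0, rho=0.5, c=1e-4):
    """Backtracking line search enforcing a sufficient-decrease (Armijo) condition."""
    while f(x + alpha * p) > f(x) + c * alpha * (g @ p):
        alpha *= rho
    return alpha

def steepest_descent(f, grad, x0, eps_g=1e-6, eps_a=1e-12, eps_r=1e-8, max_iter=5000):
    x = np.asarray(x0, dtype=float)
    hits = 0  # counts successive iterations satisfying the function-value test
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps_g:          # gradient-based stopping test
            break
        p = -g / np.linalg.norm(g)              # normalized steepest descent direction
        alpha = backtracking(f, x, p, g)
        x_new = x + alpha * p
        if abs(f(x_new) - f(x)) <= eps_a + eps_r * abs(f(x)):
            hits += 1
            if hits >= 2:                       # satisfied for two successive iterations
                x = x_new
                break
        else:
            hits = 0
        x = x_new
    return x

# Example: minimize f(x) = x1^2 + 10 x2^2 starting from (5, 1).
f = lambda x: x[0]**2 + 10.0 * x[1]**2
grad = lambda x: np.array([2.0 * x[0], 20.0 * x[1]])
print(steepest_descent(f, grad, [5.0, 1.0]))
```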


Note that the steepest descent direction at each iteration is orthogonal to the previous one, i.e., $g^T(x_{k+1})\, g(x_k) = 0$. Therefore the method zigzags in the design space and is rather inefficient.

The algorithm is guaranteed to converge, but it may take an infinite number of iterations. The rate of convergence is linear.

Usually, a substantial decrease is observed in the first few iterations, but the method is very slow after that.


    3.4 Conjugate Gradient Method

A small modification to the steepest descent method takes into account the history of the gradients to move more directly towards the optimum.

1. Select starting point $x_0$, and convergence parameters $\varepsilon_g$, $\varepsilon_a$ and $\varepsilon_r$.
2. Compute $g(x_0) \equiv \nabla f(x_0)$. If $\|g(x_0)\| \le \varepsilon_g$ then stop. Otherwise, set the initial direction to the steepest descent direction, $p_0 = -g(x_0)$, and go to step 5.
3. Compute $g(x_k) \equiv \nabla f(x_k)$. If $\|g(x_k)\| \le \varepsilon_g$ then stop. Otherwise continue.
4. Compute the new conjugate gradient direction $p_k = -g(x_k) + \beta_k p_{k-1}$, where

$$\beta_k = \left( \frac{\|g_k\|}{\|g_{k-1}\|} \right)^2 = \frac{g_k^T g_k}{g_{k-1}^T g_{k-1}}.$$

5. Find the positive step length $\alpha_k$ such that $f(x_k + \alpha_k p_k)$ is minimized.
6. Update the current point, $x_{k+1} = x_k + \alpha_k p_k$.
7. Evaluate $f(x_{k+1})$. If the condition $|f(x_{k+1}) - f(x_k)| \le \varepsilon_a + \varepsilon_r |f(x_k)|$ is satisfied for two successive iterations then stop. Otherwise, set $k = k + 1$ and return to step 3.

Usually, a restart is performed every $n$ iterations for computational stability, i.e., we start again with a steepest descent direction.
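A compact Python sketch of this scheme follows, using the Fletcher-Reeves form of $\beta_k$ and the periodic restart. The backtracking line search standing in for the exact minimization of step 5, the descent-direction safeguard, and the test function are assumptions of this example.

```python
import numpy as np

def conjugate_gradient(f, grad, x0, eps_g=1e-6, max_iter=1000):
    """Nonlinear conjugate gradient (Fletcher-Reeves beta) with periodic restart."""
    x = np.asarray(x0, dtype=float)
    n = x.size
    g = grad(x)
    p = -g                                    # first direction: steepest descent
    for k in range(max_iter):
        if np.linalg.norm(g) <= eps_g:
            break
        if g @ p >= 0:                        # safeguard: fall back to a descent direction
            p = -g
        # Backtracking line search (stand-in for the exact minimization in the notes).
        alpha, f_x = 1.0, f(x)
        while f(x + alpha * p) > f_x + 1e-4 * alpha * (g @ p):
            alpha *= 0.5
        x = x + alpha * p
        g_new = grad(x)
        if (k + 1) % n == 0:                  # restart with steepest descent every n iterations
            p = -g_new
        else:
            beta = (g_new @ g_new) / (g @ g)  # Fletcher-Reeves: ||g_k||^2 / ||g_{k-1}||^2
            p = -g_new + beta * p
        g = g_new
    return x

# Example: a mildly ill-conditioned quadratic.
f = lambda x: 0.5 * (x[0]**2 + 25.0 * x[1]**2)
grad = lambda x: np.array([x[0], 25.0 * x[1]])
print(conjugate_gradient(f, grad, [3.0, 2.0]))
```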


3.5.1 Modified Newton's Method

A small modification to Newton's method is to perform a line search along the Newton direction, rather than accepting the step size that would minimize the quadratic model.

1. Select starting point $x_0$, and convergence parameter $\varepsilon_g$.
2. Compute $g(x_k) \equiv \nabla f(x_k)$. If $\|g(x_k)\| \le \varepsilon_g$ then stop. Otherwise, continue.
3. Compute $H(x_k) \equiv \nabla^2 f(x_k)$ and the search direction, $p_k = -H^{-1} g_k$.
4. Find the positive step length $\alpha_k$ such that $f(x_k + \alpha_k p_k)$ is minimized. (Start with $\alpha_k = 1$.)
5. Update the current point, $x_{k+1} = x_k + \alpha_k p_k$, and return to step 2.

Although this modification increases the probability that $f(x_k + \alpha_k p_k) < f(x_k)$, it is still vulnerable to the problem of a Hessian that is not positive definite, and it retains all the other disadvantages of the pure Newton's method.
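A sketch of the modified Newton iteration in Python follows. The backtracking search starting from $\alpha_k = 1$ replaces an exact line search, `np.linalg.solve` is used rather than forming $H^{-1}$ explicitly, and the steepest-descent fallback is a common safeguard; all three are implementation choices of this example, not part of the notes.

```python
import numpy as np

def modified_newton(f, grad, hess, x0, eps_g=1e-6, max_iter=100):
    """Newton's method with a line search along the Newton direction."""
    x = np.asarray(x0, dtype=float)
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps_g:
            break
        H = hess(x)
        p = np.linalg.solve(H, -g)            # Newton direction p_k = -H^{-1} g_k
        if g @ p >= 0:                        # H not positive definite: step may not descend,
            p = -g                            # so fall back to steepest descent (safeguard)
        alpha = 1.0                           # start with the full Newton step
        while f(x + alpha * p) > f(x) + 1e-4 * alpha * (g @ p):
            alpha *= 0.5
        x = x + alpha * p
    return x

# Example on a simple non-quadratic function f(x) = x1^4 + x2^2.
f = lambda x: x[0]**4 + x[1]**2
grad = lambda x: np.array([4.0 * x[0]**3, 2.0 * x[1]])
hess = lambda x: np.array([[12.0 * x[0]**2, 0.0], [0.0, 2.0]])
print(modified_newton(f, grad, hess, [2.0, 1.0]))
```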


    3.6 Quasi-Newton Methods

This class of methods uses first order information only, but builds second order information, an approximate Hessian, based on the sequence of function values and gradients from previous iterations. The method also forces $H$ to be symmetric and positive definite, greatly improving its convergence properties.

When using quasi-Newton methods, we usually start with the Hessian initialized to the identity matrix and then update it at each iteration. Since what we actually need is the inverse of the Hessian, we will work with $V_k \equiv H_k^{-1}$. The update at each iteration is written as $\Delta V_k$ and is added to the current one,

$$V_{k+1} = V_k + \Delta V_k. \qquad (11)$$

Let $s_k$ be the step taken from $x_k$, and consider the Taylor-series expansion of the gradient function about $x_k$,

$$g(x_k + s_k) = g_k + H_k s_k + \cdots \qquad (12)$$

Truncating this series and setting the variation of the gradient to $y_k = g(x_k + s_k) - g_k$ yields

$$H_k s_k = y_k. \qquad (13)$$

Then, the new approximate inverse of the Hessian, $V_{k+1}$, must satisfy the quasi-Newton condition,

$$V_{k+1} y_k = s_k. \qquad (14)$$


3.6.1 Davidon-Fletcher-Powell (DFP) Method

One of the first quasi-Newton methods was devised by Davidon (in 1959) and modified by Fletcher and Powell (1963).

1. Select starting point $x_0$, and convergence parameter $\varepsilon_g$. Set $k = 0$ and $H_0 = I$ (so $V_0 = I$).
2. Compute $g(x_k) \equiv \nabla f(x_k)$. If $\|g(x_k)\| \le \varepsilon_g$ then stop. Otherwise, continue.
3. Compute the search direction, $p_k = -V_k g_k$.
4. Find the positive step length $\alpha_k$ such that $f(x_k + \alpha_k p_k)$ is minimized. (Start with $\alpha_k = 1$.)
5. Update the current point, $x_{k+1} = x_k + \alpha_k p_k$, set $s_k = \alpha_k p_k$, and compute the change in the gradient, $y_k = g_{k+1} - g_k$.
6. Update $V_{k+1}$ by computing

$$A_k = \frac{V_k y_k y_k^T V_k}{y_k^T V_k y_k}, \qquad B_k = \frac{s_k s_k^T}{s_k^T y_k}, \qquad V_{k+1} = V_k - A_k + B_k.$$

7. Set $k = k + 1$ and return to step 2.
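Written out in code, step 6 is a rank-two update. The short check below verifies that the updated $V_{k+1}$ satisfies the quasi-Newton condition (14); the vectors used are arbitrary illustrative values with $s_k^T y_k > 0$.

```python
import numpy as np

def dfp_update(V, s, y):
    """DFP update of the approximate inverse Hessian: V_{k+1} = V_k - A_k + B_k."""
    Vy = V @ y
    A = np.outer(Vy, Vy) / (y @ Vy)           # A_k = V y y^T V / (y^T V y)
    B = np.outer(s, s) / (s @ y)              # B_k = s s^T / (s^T y)
    return V - A + B

# Arbitrary illustrative data (chosen so that s^T y > 0, as required for the update).
V = np.eye(3)
s = np.array([0.3, -0.1, 0.2])
y = np.array([1.0, 0.5, -0.4])
V_next = dfp_update(V, s, y)

# Quasi-Newton (secant) condition, eq. (14): V_{k+1} y_k = s_k.
assert np.allclose(V_next @ y, s)
```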


3.6.2 Broyden-Fletcher-Goldfarb-Shanno (BFGS) Method

The BFGS method is also a quasi-Newton method with a different updating formula,

$$V_{k+1} = \left( I - \frac{s_k y_k^T}{s_k^T y_k} \right) V_k \left( I - \frac{y_k s_k^T}{s_k^T y_k} \right) + \frac{s_k s_k^T}{s_k^T y_k}. \qquad (15)$$

    The relative performance between the DFP and BFGS methods is problem dependent.
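The corresponding inverse-Hessian update of eq. (15) as a small function; it satisfies the same quasi-Newton condition, which the check below confirms on the same kind of arbitrary illustrative data. The full method plugs this update into the same loop as DFP.

```python
import numpy as np

def bfgs_update(V, s, y):
    """BFGS update of the approximate inverse Hessian, eq. (15)."""
    rho = 1.0 / (s @ y)
    I = np.eye(len(s))
    return (I - rho * np.outer(s, y)) @ V @ (I - rho * np.outer(y, s)) + rho * np.outer(s, s)

# Arbitrary illustrative data with s^T y > 0; the update obeys V_{k+1} y_k = s_k.
V = np.eye(3)
s = np.array([0.3, -0.1, 0.2])
y = np.array([1.0, 0.5, -0.4])
assert np.allclose(bfgs_update(V, s, y) @ y, s)
```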


    3.7 Trust Region Methods

Trust region, or restricted-step, methods are a different approach to resolving the weaknesses of the pure form of Newton's method, which arise from a Hessian that is not positive definite or from a highly nonlinear function.

One way to interpret these problems is to say that they arise because we are stepping outside the region for which the quadratic approximation is reasonable. We can overcome these difficulties by minimizing the quadratic model within a region around $x_k$ within which we trust it.


1. Select starting point $x_0$, a convergence parameter $\varepsilon_g$ and the initial size of the trust region, $h_0$.

2. Compute $g(x_k) \equiv \nabla f(x_k)$. If $\|g(x_k)\| \le \varepsilon_g$ then stop. Otherwise, continue.

3. Compute $H(x_k) \equiv \nabla^2 f(x_k)$ and solve the quadratic subproblem

$$\begin{aligned}
\text{minimize} \quad & q(s_k) = f(x_k) + g(x_k)^T s_k + \tfrac{1}{2}\, s_k^T H(x_k)\, s_k & (16) \\
\text{w.r.t.} \quad & s_k & (17) \\
\text{s.t.} \quad & -h_k \le (s_k)_i \le h_k, \quad i = 1, \ldots, n & (18)
\end{aligned}$$

4. Evaluate $f(x_k + s_k)$ and compute the ratio that measures the accuracy of the quadratic model,

$$r_k = \frac{\Delta f}{\Delta q} = \frac{f(x_k) - f(x_k + s_k)}{f(x_k) - q(s_k)}.$$

5. Compute the size of the new trust region as follows:

$$h_{k+1} = \frac{\|s_k\|}{4} \quad \text{if } r_k < 0.25, \qquad (19)$$

$$h_{k+1} = 2 h_k \quad \text{if } r_k > 0.75 \text{ and } h_k = \|s_k\|, \qquad (20)$$

$$h_{k+1} = h_k \quad \text{otherwise.} \qquad (21)$$
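A rough Python sketch of one way to implement this loop follows. The subproblem "solver" simply clips the Newton step to the box, which is a crude stand-in for solving (16)-(18) exactly, it uses the infinity norm to match the box-shaped region, and the step-acceptance rule (accept only if $f$ actually decreased) is the usual one even though it is not shown in the excerpt above; all of these are assumptions of this sketch.

```python
import numpy as np

def clipped_newton_step(g, H, h):
    """Crude approximation to the box-constrained subproblem (16)-(18):
    take the unconstrained minimizer of q and clip it to [-h, h] componentwise.
    A real implementation would solve the bound-constrained QP properly."""
    try:
        s = np.linalg.solve(H, -g)
    except np.linalg.LinAlgError:
        s = -g
    return np.clip(s, -h, h)

def trust_region(f, grad, hess, x0, h0=1.0, eps_g=1e-6, max_iter=200):
    x = np.asarray(x0, dtype=float)
    h = h0
    for k in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) <= eps_g:
            break
        H = hess(x)
        s = clipped_newton_step(g, H, h)
        q = f(x) + g @ s + 0.5 * s @ H @ s           # quadratic model value, eq. (16)
        r = (f(x) - f(x + s)) / (f(x) - q)           # accuracy ratio r_k
        step_norm = np.linalg.norm(s, np.inf)        # infinity norm matches the box region
        if r < 0.25:                                 # poor model: shrink the trust region
            h = step_norm / 4.0
        elif r > 0.75 and np.isclose(step_norm, h):  # good model, step at the boundary: expand
            h = 2.0 * h
        if r > 0:                                    # accept the step only if f decreased
            x = x + s
    return x
```

Passing a proper bound-constrained QP solver in place of `clipped_newton_step` would recover the subproblem (16)-(18) as stated.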


    3.8 Convergence Characteristics for a Quadratic Function


    3.9 Convergence Characteristics for the Rosenbrock Function
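The convergence plots from the original slides are not reproduced in this transcript. As a rough stand-in, library implementations of the methods discussed above can be compared on the Rosenbrock function; the sketch below uses SciPy, an external library not referenced in the notes, together with its built-in `rosen` helpers.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der, rosen_hess

x0 = np.array([-1.2, 1.0])   # classical starting point for the Rosenbrock function
for method in ("CG", "BFGS", "Newton-CG", "trust-ncg"):
    needs_hess = method in ("Newton-CG", "trust-ncg")
    res = minimize(rosen, x0, method=method, jac=rosen_der,
                   hess=rosen_hess if needs_hess else None)
    print(f"{method:10s} iterations = {res.nit:4d}  f(x*) = {res.fun:.3e}")
```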

