Steepest Gradient Method and Conjugate Gradient Method

Hoonyoung Jeong
Department of Energy Resources Engineering, Seoul National University

Review

• What is the formula of ∇f?
• What is the physical meaning of ∇f?
• What is the formula of a Jacobian matrix?
• How is a Jacobian matrix utilized?
• What is the formula of a Hessian matrix?
• How is a Hessian matrix utilized?

Key Questions

• Explain the steepest gradient method intuitively
• Explain a step size
• Present stop criteria
• What is the disadvantage of the steepest gradient method?
• Explain the conjugate gradient method by comparing it with the steepest gradient method

Intuitive Solution to the Steepest Gradient Method

The example is expressed mathematically as: f(x) is the elevation at location x = [x, y]^T. Find x that maximizes f(x).

1) Find the steepest ascent direction by taking one step in each direction at the current location.
   Steepest ascent direction = tangential direction at the current location = ∇f(x_k) = [∂f/∂x, ∂f/∂y]^T
   = the direction that locally maximizes f(x), i.e., the locally steepest ascent direction

2) Take steps along the steepest ascent direction: x_{k+1} = x_k + λ_k ∇f(x_k)
   k: iteration index; λ_k: how far to step in the steepest ascent direction

3) Stop taking steps when you start going downhill

4) Repeat 1)–3) until you reach a peak (a sketch follows the figure note below)


x๐‘˜๐‘˜

x๐‘˜๐‘˜+1

Direciton:Length: ๐œ†๐œ†๐‘˜๐‘˜ ๐›ป๐›ป๐‘“๐‘“ x๐‘˜๐‘˜

x๐‘˜๐‘˜+1 โˆ’ x๐‘˜๐‘˜ = ๐œ†๐œ†๐‘˜๐‘˜๐›ป๐›ป๐‘“๐‘“ x๐‘˜๐‘˜

Example of Gradient Descent Method (1)

โ€ข ๐‘ฅ๐‘ฅ๐‘˜๐‘˜+1 = ๐‘ฅ๐‘ฅ๐‘˜๐‘˜ โˆ’ ๐œ†๐œ†๐‘˜๐‘˜๐›ป๐›ป๐‘“๐‘“(๐‘ฅ๐‘ฅ๐‘˜๐‘˜)โ€ข ๐‘“๐‘“ = ๐‘ฅ๐‘ฅ4

โ€ข ๐›ป๐›ป๐‘“๐‘“โ€ข ๐‘ฅ๐‘ฅ0 = 2โ€ข ๐œ†๐œ†๐‘˜๐‘˜ =0.02 (constant)โ€ข ๐‘ฅ๐‘ฅ1 = ๐‘ฅ๐‘ฅ0 โˆ’ ๐œ†๐œ†0๐›ป๐›ป๐‘“๐‘“(๐‘ฅ๐‘ฅ0)=?โ€ข ๐‘ฅ๐‘ฅ2 = ๐‘ฅ๐‘ฅ1 โˆ’ ๐œ†๐œ†1๐›ป๐›ป๐‘“๐‘“(๐‘ฅ๐‘ฅ1)=?โ€ข ๐‘ฅ๐‘ฅ3 = ๐‘ฅ๐‘ฅ2 โˆ’ ๐œ†๐œ†2๐›ป๐›ป๐‘“๐‘“(๐‘ฅ๐‘ฅ2)=?โ€ข Repeat for ๐œ†๐œ†๐‘˜๐‘˜ =0.1 (constant)

Example of Gradient Descent Method (2)

• See gradient_descent_method.mlx

Disadvantage of Gradient Descent Method

• Slow near local minima or maxima, because the 2-norm of the gradient vector becomes small near local minima or maxima
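For the example f = x^4 above, the step taken at x_k has length λ_k |∇f(x_k)| = 4 λ_k |x_k|^3, which shrinks much faster than the remaining distance |x_k| to the minimum at x = 0, so the later iterations make very little progress.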

Step Size

• Also called the step length, or the learning rate in machine learning
• What are the advantages of large and small step sizes? Large: fast; small: stable
• What are the disadvantages of large and small step sizes? Large: unstable; small: slow
• The step size should be appropriate, but the appropriate value differs from problem to problem

How to Determine a Step Size?

• Line search methods: backtracking line search, golden section search, secant method
• No implementation needed, and no need to know the details. Why? MATLAB already has good line search methods.

[Figure: backtracking line search: try a large step size λ_0 from x_k along ∇f(x_k); if the convergence criterion is not satisfied, halve it (λ_1 = λ_0/2, λ_2 = λ_1/2, ...) until it is, then move to x_{k+1}]

Stop (Convergence) Criteria

• 2-norm of ∇f: ‖∇f(x_k)‖_2 < ε_∇f, e.g., 1e-5
• Relative change of f: |f(x_{k+1}) - f(x_k)| / |f(x_{k+1})| < ε_f, e.g., 1e-5
• Relative change of x: max_i |x_{k+1,i} - x_{k,i}| / |x_{k+1,i}| < ε_x for i = 1, …, N_x, e.g., 1e-10
• Maximum number of iterations
• Maximum number of f evaluations
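A minimal MATLAB sketch that combines these criteria into a single test (the function name, argument list, and tolerance values are illustrative):

% Convergence test combining the criteria above
function tf = is_converged(f, grad_f, x_new, x_old, k, max_iter)
    eps_grad = 1e-5;  eps_f = 1e-5;  eps_x = 1e-10;
    tf = norm(grad_f(x_new)) < eps_grad ...                      % 2-norm of the gradient
      || abs(f(x_new) - f(x_old)) / abs(f(x_new)) < eps_f ...    % relative change of f
      || max(abs(x_new - x_old) ./ abs(x_new)) < eps_x ...       % relative change of x
      || k >= max_iter;                                          % iteration limit
end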

Conjugate Gradient Method

• A modification of the gradient descent method
• Adds a scaled direction to the search direction
• The scaled direction is computed from the previous search direction and the current and previous gradients
• More stable convergence (fewer oscillations) because the previous search direction is taken into account
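The lecture does not fix a particular scale factor; one common choice is Fletcher-Reeves, β_k = ‖∇f(x_k)‖^2 / ‖∇f(x_{k-1})‖^2, which uses exactly the current and previous gradients. A minimal MATLAB sketch on an assumed quadratic, with an exact line search that is valid only for quadratics:

% Conjugate gradient (Fletcher-Reeves scaling) on an assumed quadratic
A = diag([2, 20]);                    % f(x) = x(1)^2 + 10*x(2)^2, so grad f(x) = A*x
x = [10; 1];
g = A*x;
d = -g;                               % first search direction = steepest descent
for k = 1:10
    lambda = -(g.'*d) / (d.'*A*d);    % exact line search (only valid for a quadratic f)
    x = x + lambda*d;                 % step along the search direction
    g_new = A*x;
    beta = (g_new.'*g_new) / (g.'*g); % scale factor from current and previous gradients
    d = -g_new + beta*d;              % new direction = steepest descent + scaled previous direction
    g = g_new;
    if norm(g) < 1e-8, break; end
end
disp(x)    % reaches the minimum at the origin in two iterations

Because the previous direction is folded in, the zig-zagging that plain gradient descent shows on this elongated quadratic disappears.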

SGM vs. CGM

Example of SGM and CGM

• Manual calculation: Example_of_SGM_and_CGM.pdf
• MATLAB: Example_of_SGM_and_CGM.mlx