Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be...

41
Conjugate Gradient Descent

Transcript of Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be...

Page 1: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

Conjugate Gradient Descent

Page 2: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

2

Conjugate Direction Methods

Page 3: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

3

Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses gradient) and Newton’s method (second-order method that uses Hessian as well).

Motivation:

!  steepest descent is slow. Goal: Accelerate it!

! Newton method is fast… BUT: we need to calculate the inverse of the Hessian

matrix…

Something between steepest descent and Newton method?

Page 4: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

4

Conjugate Direction Methods Goal: •  Accelerate the convergence rate of steepest descent •  while avoiding the high computational cost of Newton’s

method

Originally developed for solving the quadratic problem:

Equivalently, our goal is to solve:

Conjugate direction methods can solve this problem at most n iterations (usually for large n less is enough)

Page 5: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

5

Conjugate Direction Methods

!  algorithm for the numerical solution of linear equations, whose matrix Q is symmetric and positive-definite.

!  An iterative method, so it can be applied to systems that are too large to be handled by direct methods (such as the Cholesky decomposition.)

!  Algorithm for seeking minima of nonlinear equations.

Page 6: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

6

Numerical Experiments

A comparison of

* gradient descent with optimal step size (in green) and

* conjugate vector (in red)

for minimizing a quadratic function.

Conjugate gradient converges in at most n steps (here n=2).

Page 7: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

7

Conjugate directions

!  In the applications that we consider, the matrix Q will be positive definite but this is not inherent in the basic definition.

!  If Q = 0, any two vectors are conjugate.

!  if Q = I, conjugacy is equivalent to the usual notion of orthogonality.

Definition [Q-conjugate directions]

Page 8: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

8

Linear independence lemma

Lemma [Linear Independence]

Proof: [Proof by contradiction]

Page 9: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

9

Why is Q-conjugacy useful?

Page 10: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

10

Quadratic problem

Goal:

the unique solution to this problem is also the unique solution to the linear equation:

Page 11: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

11

Importance of Q-conjugancy

Therefore,

Standard orthogonality is not enough anymore,

We need to use Q-conjugacy.

Page 12: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

12

Importance of Q-conjugancy

No need to do matrix inversion! We only need to calculate inner products.

This can be generalized further such a way that the starting point of the iteration can be arbitrary x0

n - 1 n - 1

Page 13: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

13

Conjugate Direction Theorem

Theorem [Conjugate Direction Theorem]

In the previous slide we had

No need to do matrix inversion! We only need to calculate inner products.

[update rule]

[gradient of f]

Page 14: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

14

Proof

Therefore, it is enough to prove that with these αk values we have

Page 15: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

15

Proof

Therefore,

We already know

Q.E.D.

Page 16: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

16

Another motivation for Q-conjugacy

Goal:

Therefore,

n separate 1-dimensional optimization problems!

Page 17: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

17

Expanding Subspace Theorem

Page 18: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

18

Expanding Subspace Theorem

Page 19: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

19

Expanding Subspace Theorem Theorem [Expanding Subspace Theorem]

Page 20: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

20

Expanding Subspace Theorem Proof

Page 21: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

21

Proof

By definition,

Therefore,

Page 22: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

22

Proof

We have proved

By definition,

Therefore,

Page 23: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

23

Proof

Since

[By induction assumption],

Therefore,

Q.E.D.

[We have proved this]

Page 24: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

24

Corollary of Exp. Subs. Theorem Corollary

Page 25: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

25

THE CONJUGATE GRADIENT METHOD

Page 26: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

26

THE CONJUGATE GRADIENT METHOD

!  The conjugate gradient method is a conjugate direction method

!  Selects the successive direction vectors as a conjugate version of the successive gradients obtained as the method progresses.

!  The conjugate directions are not specified beforehand, but rather are determined sequentially at each step of the iteration.

Given d0,…, dn-1, we already have an update rule for αk

How should we choose vectors d0,…, dn-1?

The conjugate gradient method

Page 27: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

27

THE CONJUGATE GRADIENT METHOD

Advantages

!  Simple update rule

!  the directions are based on the gradients, therefore the process makes good uniform progress toward the solution at every step.

For arbitrary sequences of conjugate directions the progress may be slight until the final few steps

Page 28: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

28

Conjugate Gradient Algorithm

!  The CGA is only slightly more complicated to implement than the method of steepest descent but converges in a finite number of steps on quadratic problems.

!  In contrast to Newton method, there is no need for matrix inversion.

Conjugate Gradient Algorithm

Page 29: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

29

Conjugate Gradient Theorem

To verify that the algorithm is a conjugate direction algorithm, all we need is to verify that the vectors d0,…,dk are Q-orthogonal.

Theorem [Conjugate Gradient Theorem]

The conjugate gradient algorithm is a conjugate direction method.

a)

b)

c)

d)

e)

Only αk needs matrix Q in the algorithm!

Page 30: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

30

Proofs

Page 31: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

31

EXTENSION TO NONQUADRATIC PROBLEMS

Page 32: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

32

EXTENSION TO NONQUADRATIC PROBLEMS

Do quadratic approximation

This is similar to Newton’s method. [f is approximated by a quadratic function]

!  When applied to nonquadratic problems, conjugate gradient methods will not usually terminate within n steps.

!  After n steps, we can restart the process from this point and run the algorithm for another n steps…

Goal:

Page 33: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

33

Conjugate Gradient Algorithm for nonquadratic functions

Step 1

Step 2

Step 3

Page 34: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

34

Properties of CGA

!  An attractive feature of the algorithm is that, just as in the pure form of Newton’s method, no line searching is required at any stage.

!  The algorithm converges in a finite number of steps for a quadratic problem.

!  The undesirable features are that Hessian matrix must be evaluated at each point.

Page 35: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

35

LINE SEARCH METHODS

Page 36: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

36

Step 1

Step 2

Step 3

Fletcher–Reeves method

Page 37: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

37

Line search method

Hessian is not used in the algorithm

In the quadratic case it is identical to the original conjugate direction algorithm

Fletcher–Reeves method

Page 38: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

38

Polak–Ribiere method

Again this leads to a value identical to the standard formula in the quadratic case.

Experimental evidence seems to favor the Polak–Ribiere method over other methods of this general type.

Same as Fletcher–Reeves method, BUT:

Page 39: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

39

Convergence rate Under some conditions the line search method is globally convergent.

Under some conditions, the rate is

[since one complete n step cycle solves a quadratic problem similarly

To the Newton method]

Page 40: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

40

Acceleration

Conjugate gradient method attempts to accelerate gradient descent by building in momentum.

Recall:

First one implies:

Substituting last two into first one:

dk�1 =xk � xk�1

↵k�1

= xk � ↵kgk +↵k�k�1

↵k�1(xk � xk�1)

xk+1 = xk � ↵kgk + ↵k�k�1dk�1

Momentum term

Page 41: Conjugate Gradient Descent · 2017-09-25 · 3 Motivation Conjugate direction methods can be regarded as being between the method of steepest descent (first-order method that uses

41

Summary

!  Conjugate Direction Methods

- conjugate directions

!  Minimizing quadratic functions

!  Conjugate Gradient Methods for nonquadratic functions

- Line search methods

* Fletcher–Reeves

* Polak–Ribiere