Conjugate Gradient


Transcript of Conjugate Gradient

Page 1: Conjugate Gradient

Conjugate Gradient

Page 2: Conjugate Gradient

0. History

• Why iterate?

• Direct algorithms require O(n³) work.

• 1950: n=20

• 1965: n=200

• 1980: n=2000

• 1995: n=20000

dimensional increase: 10³

computer hardware: 10⁹

Page 3: Conjugate Gradient

0. History

• If matrix problems could be solved in O(n²) time, matrices could be about 30 times bigger.

• There are direct algorithms that run in about O(n^2.4) time, but their constant factors are too big for practical use.

• For certain matrices, iterative methods have the potential to reduce computation time to O(m²).

Page 4: Conjugate Gradient

1. Introduction

• CG is the most popular method for solving large systems of linear equations

Ax = b.

• CG is an iterative method, suited for use with sparse matrices with certain properties.

• In practice, we rarely encounter huge dense matrices, since huge matrices usually arise from the discretisation of differential or integral equations, and such matrices are sparse.

Page 5: Conjugate Gradient

2. Notation

• Matrix: A, with components Aij

• Vector, n x 1 matrix: x, with components xi

• Linear equation: Ax = b, with components Σj Aij xj = bi

Page 6: Conjugate Gradient

2. Notation

• Transpose of a matrix: (AT)ij = Aji

• Inner product of two vectors: xTy = Σ xiyi

• If xTy = 0, then x and y are orthogonal

Page 7: Conjugate Gradient

3. Properties of A

• A has to be an n x n matrix.

• A has to be positive definite: xTAx > 0 for all x ≠ 0

• A has to be symmetric, AT = A
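A quick check of these properties (added here as an illustration; the 3 x 3 matrix below is made up for the example) takes only a few lines of Python/NumPy:

import numpy as np

# A hypothetical symmetric positive definite matrix
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# Symmetric: A^T = A
assert np.allclose(A, A.T)

# Positive definite: for a symmetric matrix this is equivalent
# to all eigenvalues being positive (see section 6 below).
assert np.all(np.linalg.eigvalsh(A) > 0)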

Page 8: Conjugate Gradient

4. Quadratic Forms

• A QF is a scalar quadratic function of a vector:

f(x) = ½ xTAx − bTx + c

• Example: for symmetric A, setting each partial derivative ∂f/∂xi = Σj Aij xj − bi to zero recovers Ax = b, so the minimizer of f is the solution of the linear system.

Page 9: Conjugate Gradient

4. Quadratic Forms

• Gradient: points in the direction of the greatest increase of f(x).

f′(x) = Ax − b   (for symmetric A)
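As a sanity check (an illustration added to this transcript, not part of the slides), the gradient formula can be compared against finite differences; the matrix and vectors below are made-up example values:

import numpy as np

def f(x, A, b, c=0.0):
    # quadratic form f(x) = 1/2 x^T A x - b^T x + c
    return 0.5 * x @ A @ x - b @ x + c

def grad_f(x, A, b):
    # gradient for symmetric A: f'(x) = A x - b
    return A @ x - b

A = np.array([[3.0, 2.0], [2.0, 6.0]])   # symmetric positive definite example
b = np.array([2.0, -8.0])
x = np.array([1.0, 1.0])

# central finite differences along each coordinate direction
eps = 1e-6
fd = np.array([(f(x + eps * e, A, b) - f(x - eps * e, A, b)) / (2 * eps)
               for e in np.eye(2)])
assert np.allclose(fd, grad_f(x, A, b), atol=1e-5)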

Page 10: Conjugate Gradient

4. Quadratic Forms

• positive definite: xTAx > 0

• negative definite: xTAx < 0

• positive semidefinite: xTAx ≥ 0

• indefinite: xTAx takes both signs

Page 11: Conjugate Gradient

5. Steepest Descent

• Start at an arbitrary point and slide down to the bottom of the paraboloid.

• Steps x(1), x(2), … in the direction −f′(x(i))

• Error e(i) = x(i) – x

• Residual r(i) = b – Ax(i)

r(i) = −Ae(i)

r(i) = −f′(x(i))

Page 12: Conjugate Gradient

5. Steepest Descent

• x(i+1) = x(i) + α r(i), but how big is α?

• f′(x(i+1)) is orthogonal to r(i); an exact line search along the search line x(i) + α r(i) gives

α(i) = (r(i)T r(i)) / (r(i)T A r(i))

Page 13: Conjugate Gradient

5. Steepest Descent

r(i) = b − Ax(i)

α(i) = (r(i)T r(i)) / (r(i)T A r(i))

x(i+1) = x(i) + α(i) r(i)

The algorithm above requires two matrix-vector multiplications per iteration. One can be eliminated by premultiplying the last equation by −A and adding b, which yields a recurrence for the residual.

Page 14: Conjugate Gradient

5. Steepest Descent

r(i+1) = r(i) − α(i) A r(i)

α(i) = (r(i)T r(i)) / (r(i)T A r(i))

This residual sequence is generated without any feedback from x(i). Floating-point roundoff errors may therefore accumulate, and the iterates could converge to a point merely near x. This effect can be avoided by periodically recomputing the exact residual r(i) = b − Ax(i).
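Putting the last two slides together, a minimal Steepest Descent sketch might look as follows (function and parameter names, and the restart interval of 50, are choices made for this illustration, not from the slides):

import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10000, recompute_every=50):
    # Solve Ax = b for symmetric positive definite A by Steepest Descent.
    x = x0.copy()
    r = b - A @ x                      # r(0) = b - A x(0)
    for i in range(max_iter):
        Ar = A @ r                     # the one matrix-vector product per step
        alpha = (r @ r) / (r @ Ar)     # alpha(i) = r^T r / (r^T A r)
        x = x + alpha * r              # x(i+1) = x(i) + alpha(i) r(i)
        if (i + 1) % recompute_every == 0:
            r = b - A @ x              # periodically recompute the exact residual
        else:
            r = r - alpha * Ar         # cheap recurrence r(i+1) = r(i) - alpha(i) A r(i)
        if np.linalg.norm(r) < tol:
            break
    return x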

Page 15: Conjugate Gradient

6. Eigenvectors

• v is an eigenvector of A if there exists a scalar λ such that Av = λv.

• λ is then called an eigenvalue.

• A symmetric n x n matrix always has n independent eigenvectors which are orthogonal.

• A positive definite matrix has positive eigenvalues.
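These facts are easy to verify numerically; the snippet below is an added illustration using the same kind of symmetric positive definite example matrix as before:

import numpy as np

A = np.array([[3.0, 2.0], [2.0, 6.0]])       # symmetric positive definite example
lam, V = np.linalg.eigh(A)                   # eigenvalues and orthonormal eigenvectors

assert np.all(lam > 0)                       # positive definite => positive eigenvalues
assert np.allclose(V.T @ V, np.eye(2))       # the eigenvectors are mutually orthogonal
assert np.allclose(A @ V, V @ np.diag(lam))  # A v = lambda v, column by column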

Page 16: Conjugate Gradient

7. Convergence of SD

• Convergence of SD requires the error e(i) to vanish. To measure e(i), we use the A-norm:

||e||_A = (eTAe)^(1/2)

• Some math now yields

||e(i+1)||_A² = ω² ||e(i)||_A²,   ω² = 1 − (κ² + μ²)² / ((κ + μ²)(κ³ + μ²))

where κ = λ₁/λ₂ and μ = ξ₂/ξ₁ are the ratios of the eigenvalues and of the error components along the corresponding eigenvectors.

Page 17: Conjugate Gradient

7. Convergence of SD

κ = λmax / λmin ≥ 1

Spectral condition number.

An upper bound for ω is found by setting μ² = κ²:

ω ≤ (κ − 1) / (κ + 1)

We therefore have instant convergence (ω = 0) if all the eigenvalues of A are the same, since then κ = 1, A is just a scaled identity A = λ·Id, and Ax = b is solved in a single step.
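The bound can be observed numerically. The self-contained sketch below (an added illustration using made-up example data) runs a few Steepest Descent steps and checks that each step reduces the A-norm error at least by the factor (κ − 1)/(κ + 1):

import numpy as np

A = np.array([[3.0, 2.0], [2.0, 6.0]])    # example SPD matrix
b = np.array([2.0, -8.0])
x_exact = np.linalg.solve(A, b)

lam = np.linalg.eigvalsh(A)
kappa = lam[-1] / lam[0]                  # spectral condition number lambda_max / lambda_min
bound = (kappa - 1) / (kappa + 1)         # worst-case per-step factor for ||e||_A

def a_norm(e):
    return np.sqrt(e @ A @ e)             # ||e||_A = (e^T A e)^(1/2)

x = np.zeros(2)
for _ in range(20):                       # a few Steepest Descent steps
    r = b - A @ x
    alpha = (r @ r) / (r @ (A @ r))
    err_before = a_norm(x - x_exact)
    x = x + alpha * r
    assert a_norm(x - x_exact) <= bound * err_before + 1e-12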

Page 18: Conjugate Gradient

7. Convergence of SD

(figure: behaviour of Steepest Descent in four cases: large κ / small μ, large κ / large μ, small κ / small μ, small κ / large μ)

Page 19: Conjugate Gradient

8. Conjugate Directions

• Steepest Descent often takes steps in the same direction as earlier steps.

• The solution is to take a set of A-orthogonal search directions d(0), d(1), … , d(n-1) and take exactly one step of the right length in each direction.

d(i)T A d(j) = 0   for i ≠ j

Page 20: Conjugate Gradient

8. Conjugate Directions

(figure: A-orthogonal pairs of vectors vs. orthogonal pairs of vectors)

Page 21: Conjugate Gradient

8. Conjugate Directions

• Demanding d(i) to be A-orthogonal to the next error e(i+1), we get

x(i+1) = x(i) + α(i) d(i)

α(i) = (d(i)T r(i)) / (d(i)T A d(i))

• Search directions are generated by Gram-Schmidt conjugation. Problem: O(n³) work (see the sketch below).
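For reference, a naive Gram-Schmidt conjugation (an added sketch, not the slides' code) makes the O(n³) cost visible: every new direction has to be corrected against all previous ones.

import numpy as np

def gram_schmidt_conjugate(U, A):
    # A-orthogonalize the columns of U so that d(i)^T A d(j) = 0 for i != j.
    # Keeping and re-combining all earlier directions is what makes this O(n^3).
    n = U.shape[1]
    D = np.zeros_like(U, dtype=float)
    for i in range(n):
        d = U[:, i].astype(float)
        for j in range(i):
            # subtract the A-projection onto every earlier direction
            beta = (U[:, i] @ A @ D[:, j]) / (D[:, j] @ A @ D[:, j])
            d -= beta * D[:, j]
        D[:, i] = d
    return D

# tiny check with a made-up SPD matrix
A = np.array([[4.0, 1.0], [1.0, 3.0]])
D = gram_schmidt_conjugate(np.eye(2), A)
assert abs(D[:, 0] @ A @ D[:, 1]) < 1e-12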

Page 22: Conjugate Gradient

8. Conjugate Directions

• CD chooses e(i) from e(0) + D(i) so that ||e(i)||_A is minimized, where

D(i) = span{ d(0), d(1), ..., d(i−1) }

• The error term is therefore A-orthogonal to all the old search directions:

0 = d(i)T A e(j) = −d(i)T r(j)   for i < j

Page 23: Conjugate Gradient

9. Conjugate Gradient

• The residual is orthogonal to the previous search directions.

• Krylov subspace

D(i) = span{ r(0), r(1), ..., r(i−1) }

r(i+1) = −A e(i+1)

= −A (e(i) + α(i) d(i))

= r(i) − α(i) A d(i)

D(i) = span{ r(0), A r(0), A² r(0), ..., A^(i−1) r(0) }

Page 24: Conjugate Gradient

9. Conjugate Gradient

• Gram-Schmidt conjugation becomes easy, because r(i+1) is already A-orthogonal to all the previous search directions except d(i).

A D(i) ⊂ D(i+1)   and   r(i+1) ⊥ D(i+1),

so r(i+1) is A-orthogonal to D(i).

d(i+1) = r(i+1) + β(i+1) d(i)

β(i+1) = (r(i+1)T r(i+1)) / (r(i)T r(i))

Page 25: Conjugate Gradient

9. Conjugate Gradient

d(0) = r(0) = b − A x(0)

α(i) = (r(i)T r(i)) / (d(i)T A d(i))

x(i+1) = x(i) + α(i) d(i)

r(i+1) = r(i) − α(i) A d(i)

β(i+1) = (r(i+1)T r(i+1)) / (r(i)T r(i))

d(i+1) = r(i+1) + β(i+1) d(i)
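The slide above translates almost line for line into code. A minimal sketch (identifier names are this illustration's, not the slides'):

import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=None):
    # Solve Ax = b for symmetric positive definite A.
    x = x0.copy()
    r = b - A @ x                      # r(0) = b - A x(0)
    d = r.copy()                       # d(0) = r(0)
    rr = r @ r
    if max_iter is None:
        max_iter = len(b)              # in exact arithmetic CG finishes in n steps
    for _ in range(max_iter):
        Ad = A @ d
        alpha = rr / (d @ Ad)          # alpha(i) = r^T r / (d^T A d)
        x = x + alpha * d              # x(i+1) = x(i) + alpha(i) d(i)
        r = r - alpha * Ad             # r(i+1) = r(i) - alpha(i) A d(i)
        rr_new = r @ r
        if np.sqrt(rr_new) < tol:
            break
        beta = rr_new / rr             # beta(i+1) = r(i+1)^T r(i+1) / (r(i)^T r(i))
        d = r + beta * d               # d(i+1) = r(i+1) + beta(i+1) d(i)
        rr = rr_new
    return x

# quick check on a small example system
A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
assert np.allclose(conjugate_gradient(A, b, np.zeros(2)), np.linalg.solve(A, b))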

Page 26: Conjugate Gradient

11. Preconditioning

• Improving the condition number of the matrix before the calculation. Example:

• Attempt to stretch the quadratic form to make it more spherical.

• Many more sophisticated preconditioners have been developed and are nearly always used.

M⁻¹Ax = M⁻¹b
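As one concrete, deliberately simple choice of M (an added illustration, not the slides' specific method), a Jacobi preconditioner M = diag(A) can be plugged into the standard preconditioned CG recurrence:

import numpy as np

def preconditioned_cg(A, b, x0, M_inv, tol=1e-10, max_iter=None):
    # CG on the preconditioned system; M_inv applies M^-1 to a vector.
    x = x0.copy()
    r = b - A @ x
    z = M_inv(r)                       # z(0) = M^-1 r(0)
    d = z.copy()
    rz = r @ z
    if max_iter is None:
        max_iter = len(b)
    for _ in range(max_iter):
        Ad = A @ d
        alpha = rz / (d @ Ad)
        x = x + alpha * d
        r = r - alpha * Ad
        if np.linalg.norm(r) < tol:
            break
        z = M_inv(r)
        rz_new = r @ z
        beta = rz_new / rz
        d = z + beta * d
        rz = rz_new
    return x

A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
jacobi = lambda v: v / np.diag(A)      # M = diag(A), so M^-1 divides by the diagonal
assert np.allclose(preconditioned_cg(A, b, np.zeros(2), jacobi), np.linalg.solve(A, b))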

Page 27: Conjugate Gradient

12. Outlook

• CG can also be used to solve the least-squares problem

min_x ||Ax − b||²

• To solve non-linear problems with CG, one has to make changes to the algorithm. There are several possibilities, and the best choice is still a subject of research.
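One standard route (often called CGNR; added here as an illustration, not necessarily the approach the slides have in mind) is to apply plain CG to the normal equations ATAx = ATb, which are symmetric positive definite whenever A has full column rank:

import numpy as np

# Overdetermined made-up example: more equations than unknowns.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

AtA = A.T @ A                          # normal equations: (A^T A) x = A^T b
Atb = A.T @ b

x = np.zeros(2)
r = Atb - AtA @ x
d = r.copy()
for _ in range(len(x)):                # CG loop on the SPD matrix A^T A
    Ad = AtA @ d
    alpha = (r @ r) / (d @ Ad)
    x = x + alpha * d
    r_new = r - alpha * Ad
    if np.linalg.norm(r_new) < 1e-12:
        break
    d = r_new + ((r_new @ r_new) / (r @ r)) * d
    r = r_new

assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])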

Page 28: Conjugate Gradient

12. Outlook

In non-linear problems, there may be several local minima to which CG might converge.

It is therefore hard to determine the right step size.

Page 29: Conjugate Gradient

12. Outlook

                         Ax = b    Ax = λx
A = A* (pos. definite)   CG        Lanczos
A ≠ A*                   GMRES     Arnoldi

- There are other algorithms in numerical linear algebra closely related to CG.

- They all use Krylov subspaces.