Conjugate Gradient


Transcript of Conjugate Gradient

Page 1: Conjugate Gradient

Conjugate Gradient

Page 2: Conjugate Gradient

0. History

• Why iterate?

• Direct algorithms require O(n³) work.

• 1950: n=20

• 1965: n=200

• 1980: n=2000

• 1995: n=20000

dimensional increase: 10³

computer hardware: 10⁹

Page 3: Conjugate Gradient

0. History

• If matrix problems could be solved in O(n²) time, matrices could be about 30 times bigger.

• There are direct algorithms that run in about O(n^2.4) time, but their constant factors are too big for practical use.

• For certain matrices, iterative methods have the potential to reduce computation time to O(m²).

Page 4: Conjugate Gradient

1. Introduction

• CG is the most popular method for solving large systems of linear equations

Ax = b.

• CG is an iterative method, suited for use with sparse matrices with certain properties.

• In practice, we rarely encounter huge dense matrices, since huge matrices usually arise from the discretisation of differential or integral equations, and such matrices are sparse.

Page 5: Conjugate Gradient

2. Notation

• Matrix: A, with components Aij

• Vector, n x 1 matrix: x, with components xi

• Linear equation: Ax = b, with components Σj Aij xj = bi

Page 6: Conjugate Gradient

2. Notation

• Transpose of a matrix: (AT)ij = Aji

• Inner product of two vectors: xTy = Σ xiyi

• If xTy = 0, then x and y are orthogonal

Page 7: Conjugate Gradient

3. Properties of A

• A has to be an n x n matrix.

• A has to be positive definite: xTAx > 0 for all x ≠ 0

• A has to be symmetric, AT = A
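A quick check of these properties (added here as an illustration; the 3 x 3 matrix below is made up for the example) takes only a few lines of Python/NumPy:

import numpy as np

# A hypothetical symmetric positive definite matrix
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# Symmetric: A^T = A
assert np.allclose(A, A.T)

# Positive definite: for a symmetric matrix this is equivalent
# to all eigenvalues being positive (see section 6 below).
assert np.all(np.linalg.eigvalsh(A) > 0)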

Page 8: Conjugate Gradient

4. Quadratic Forms

• A QF is a scalar quadratic function of a vector:

f(x) = ½ xTAx − bTx + c

• Example: for symmetric A, setting each partial derivative ∂f/∂xi = Σj Aij xj − bi to zero recovers Ax = b, so the minimizer of f is the solution of the linear system.

Page 9: Conjugate Gradient

4. Quadratic Forms

• Gradient: points in the direction of the greatest increase of f(x).

f′(x) = Ax − b   (for symmetric A)
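As a sanity check (an illustration added to this transcript, not part of the slides), the gradient formula can be compared against finite differences; the matrix and vectors below are made-up example values:

import numpy as np

def f(x, A, b, c=0.0):
    # quadratic form f(x) = 1/2 x^T A x - b^T x + c
    return 0.5 * x @ A @ x - b @ x + c

def grad_f(x, A, b):
    # gradient for symmetric A: f'(x) = A x - b
    return A @ x - b

A = np.array([[3.0, 2.0], [2.0, 6.0]])   # symmetric positive definite example
b = np.array([2.0, -8.0])
x = np.array([1.0, 1.0])

# central finite differences along each coordinate direction
eps = 1e-6
fd = np.array([(f(x + eps * e, A, b) - f(x - eps * e, A, b)) / (2 * eps)
               for e in np.eye(2)])
assert np.allclose(fd, grad_f(x, A, b), atol=1e-5)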

Page 10: Conjugate Gradient

4. Quadratic Forms

• positive definite: xTAx > 0

• negative definite: xTAx < 0

• positive semidefinite: xTAx ≥ 0

• indefinite: xTAx takes both signs

Page 11: Conjugate Gradient

5. Steepest Descent

• Start at an arbitrary point and slide down to the bottom of the paraboloid.

• Steps x(1), x(2), … in the direction −f′(x(i))

• Error e(i) = x(i) – x

• Residual r(i) = b – Ax(i)

r(i) = −Ae(i)

r(i) = −f′(x(i))

Page 12: Conjugate Gradient

5. Steepest Descent

• x(i+1) = x(i) + α r(i), but how big is α?

• f′(x(i+1)) is orthogonal to r(i); an exact line search along the search line x(i) + α r(i) gives

α(i) = (r(i)T r(i)) / (r(i)T A r(i))

Page 13: Conjugate Gradient

5. Steepest Descent

r(i) = b − Ax(i)

α(i) = (r(i)T r(i)) / (r(i)T A r(i))

x(i+1) = x(i) + α(i) r(i)

The algorithm above requires two matrix-vector multiplications per iteration. One can be eliminated by premultiplying the last equation by −A and adding b, which yields a recurrence for the residual.

Page 14: Conjugate Gradient

5. Steepest Descent

r(i+1) = r(i) − α(i) A r(i)

α(i) = (r(i)T r(i)) / (r(i)T A r(i))

This residual sequence is generated without any feedback from x(i). Floating-point roundoff errors may therefore accumulate, and the iterates could converge to a point merely near x. This effect can be avoided by periodically recomputing the exact residual r(i) = b − Ax(i).
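Putting the last two slides together, a minimal Steepest Descent sketch might look as follows (function and parameter names, and the restart interval of 50, are choices made for this illustration, not from the slides):

import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10000, recompute_every=50):
    # Solve Ax = b for symmetric positive definite A by Steepest Descent.
    x = x0.copy()
    r = b - A @ x                      # r(0) = b - A x(0)
    for i in range(max_iter):
        Ar = A @ r                     # the one matrix-vector product per step
        alpha = (r @ r) / (r @ Ar)     # alpha(i) = r^T r / (r^T A r)
        x = x + alpha * r              # x(i+1) = x(i) + alpha(i) r(i)
        if (i + 1) % recompute_every == 0:
            r = b - A @ x              # periodically recompute the exact residual
        else:
            r = r - alpha * Ar         # cheap recurrence r(i+1) = r(i) - alpha(i) A r(i)
        if np.linalg.norm(r) < tol:
            break
    return x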

Page 15: Conjugate Gradient

6. Eigenvectors

• v is an eigenvector of A if there exists a scalar λ such that Av = λv.

• λ is then called an eigenvalue.

• A symmetric n x n matrix always has n independent eigenvectors which are orthogonal.

• A positive definite matrix has positive eigenvalues.
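These facts are easy to verify numerically; the snippet below is an added illustration using the same kind of symmetric positive definite example matrix as before:

import numpy as np

A = np.array([[3.0, 2.0], [2.0, 6.0]])       # symmetric positive definite example
lam, V = np.linalg.eigh(A)                   # eigenvalues and orthonormal eigenvectors

assert np.all(lam > 0)                       # positive definite => positive eigenvalues
assert np.allclose(V.T @ V, np.eye(2))       # the eigenvectors are mutually orthogonal
assert np.allclose(A @ V, V @ np.diag(lam))  # A v = lambda v, column by column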

Page 16: Conjugate Gradient

7. Convergence of SD

• Convergence of SD requires the error e(i) to vanish. To measure e(i), we use the A-norm:

||e||_A = (eTAe)^(1/2)

• Some math now yields

||e(i+1)||_A² = ω² ||e(i)||_A²,   ω² = 1 − (κ² + μ²)² / ((κ + μ²)(κ³ + μ²))

where κ = λ₁/λ₂ and μ = ξ₂/ξ₁ are the ratios of the eigenvalues and of the error components along the corresponding eigenvectors.

Page 17: Conjugate Gradient

7. Convergence of SD

κ = λmax / λmin ≥ 1

Spectral condition number.

An upper bound for ω is found by setting μ² = κ²:

ω ≤ (κ − 1) / (κ + 1)

We therefore have instant convergence (ω = 0) if all the eigenvalues of A are the same, since then κ = 1, A is just a scaled identity A = λ·Id, and Ax = b is solved in a single step.
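The bound can be observed numerically. The self-contained sketch below (an added illustration using made-up example data) runs a few Steepest Descent steps and checks that each step reduces the A-norm error at least by the factor (κ − 1)/(κ + 1):

import numpy as np

A = np.array([[3.0, 2.0], [2.0, 6.0]])    # example SPD matrix
b = np.array([2.0, -8.0])
x_exact = np.linalg.solve(A, b)

lam = np.linalg.eigvalsh(A)
kappa = lam[-1] / lam[0]                  # spectral condition number lambda_max / lambda_min
bound = (kappa - 1) / (kappa + 1)         # worst-case per-step factor for ||e||_A

def a_norm(e):
    return np.sqrt(e @ A @ e)             # ||e||_A = (e^T A e)^(1/2)

x = np.zeros(2)
for _ in range(20):                       # a few Steepest Descent steps
    r = b - A @ x
    alpha = (r @ r) / (r @ (A @ r))
    err_before = a_norm(x - x_exact)
    x = x + alpha * r
    assert a_norm(x - x_exact) <= bound * err_before + 1e-12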

Page 18: Conjugate Gradient

7. Convergence of SD

(figure: behaviour of Steepest Descent in four cases: large κ / small μ, large κ / large μ, small κ / small μ, small κ / large μ)

Page 19: Conjugate Gradient

8. Conjugate Directions

• Steepest Descent often takes steps in the same direction as earlier steps.

• The solution is to take a set of A-orthogonal search directions d(0), d(1), … , d(n-1) and take exactly one step of the right length in each direction.

d(i)T A d(j) = 0   for i ≠ j

Page 20: Conjugate Gradient

8. Conjugate Directions

(figure: A-orthogonal pairs of vectors vs. orthogonal pairs of vectors)

Page 21: Conjugate Gradient

8. Conjugate Directions

• Demanding d(i) to be A-orthogonal to the next error e(i+1), we get

x(i+1) = x(i) + α(i) d(i)

α(i) = (d(i)T r(i)) / (d(i)T A d(i))

• Search directions are generated by Gram-Schmidt conjugation. Problem: O(n³) work (see the sketch below).
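For reference, a naive Gram-Schmidt conjugation (an added sketch, not the slides' code) makes the O(n³) cost visible: every new direction has to be corrected against all previous ones.

import numpy as np

def gram_schmidt_conjugate(U, A):
    # A-orthogonalize the columns of U so that d(i)^T A d(j) = 0 for i != j.
    # Keeping and re-combining all earlier directions is what makes this O(n^3).
    n = U.shape[1]
    D = np.zeros_like(U, dtype=float)
    for i in range(n):
        d = U[:, i].astype(float)
        for j in range(i):
            # subtract the A-projection onto every earlier direction
            beta = (U[:, i] @ A @ D[:, j]) / (D[:, j] @ A @ D[:, j])
            d -= beta * D[:, j]
        D[:, i] = d
    return D

# tiny check with a made-up SPD matrix
A = np.array([[4.0, 1.0], [1.0, 3.0]])
D = gram_schmidt_conjugate(np.eye(2), A)
assert abs(D[:, 0] @ A @ D[:, 1]) < 1e-12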

Page 22: Conjugate Gradient

8. Conjugate Directions

• CD chooses e(i) from e(0) + D(i) so that ||e(i)||_A is minimized, where

D(i) = span{ d(0), d(1), ..., d(i−1) }

• The error term is therefore A-orthogonal to all the old search directions:

0 = d(i)T A e(j) = −d(i)T r(j)   for i < j

Page 23: Conjugate Gradient

9. Conjugate Gradient

• The residual is orthogonal to the previous search directions.

• Krylov subspace

D(i) = span{ r(0), r(1), ..., r(i−1) }

r(i+1) = −A e(i+1)

= −A (e(i) + α(i) d(i))

= r(i) − α(i) A d(i)

D(i) = span{ r(0), A r(0), A² r(0), ..., A^(i−1) r(0) }

Page 24: Conjugate Gradient

9. Conjugate Gradient

• Gram-Schmidt conjugation becomes easy, because r(i+1) is already A-orthogonal to all the previous search directions except d(i).

A D(i) ⊂ D(i+1)   and   r(i+1) ⊥ D(i+1),

so r(i+1) is A-orthogonal to D(i).

d(i+1) = r(i+1) + β(i+1) d(i)

β(i+1) = (r(i+1)T r(i+1)) / (r(i)T r(i))

Page 25: Conjugate Gradient

9. Conjugate Gradient

d(0) = r(0) = b − A x(0)

α(i) = (r(i)T r(i)) / (d(i)T A d(i))

x(i+1) = x(i) + α(i) d(i)

r(i+1) = r(i) − α(i) A d(i)

β(i+1) = (r(i+1)T r(i+1)) / (r(i)T r(i))

d(i+1) = r(i+1) + β(i+1) d(i)
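The slide above translates almost line for line into code. A minimal sketch (identifier names are this illustration's, not the slides'):

import numpy as np

def conjugate_gradient(A, b, x0, tol=1e-10, max_iter=None):
    # Solve Ax = b for symmetric positive definite A.
    x = x0.copy()
    r = b - A @ x                      # r(0) = b - A x(0)
    d = r.copy()                       # d(0) = r(0)
    rr = r @ r
    if max_iter is None:
        max_iter = len(b)              # in exact arithmetic CG finishes in n steps
    for _ in range(max_iter):
        Ad = A @ d
        alpha = rr / (d @ Ad)          # alpha(i) = r^T r / (d^T A d)
        x = x + alpha * d              # x(i+1) = x(i) + alpha(i) d(i)
        r = r - alpha * Ad             # r(i+1) = r(i) - alpha(i) A d(i)
        rr_new = r @ r
        if np.sqrt(rr_new) < tol:
            break
        beta = rr_new / rr             # beta(i+1) = r(i+1)^T r(i+1) / (r(i)^T r(i))
        d = r + beta * d               # d(i+1) = r(i+1) + beta(i+1) d(i)
        rr = rr_new
    return x

# quick check on a small example system
A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
assert np.allclose(conjugate_gradient(A, b, np.zeros(2)), np.linalg.solve(A, b))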

Page 26: Conjugate Gradient

11. Preconditioning

• Improving the condition number of the matrix before the calculation. Example:

• Attempt to stretch the quadratic form to make it more spherical.

• Many more sophisticated preconditioners have been developed and are nearly always used.

M⁻¹Ax = M⁻¹b
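As one concrete, deliberately simple choice of M (an added illustration, not the slides' specific method), a Jacobi preconditioner M = diag(A) can be plugged into the standard preconditioned CG recurrence:

import numpy as np

def preconditioned_cg(A, b, x0, M_inv, tol=1e-10, max_iter=None):
    # CG on the preconditioned system; M_inv applies M^-1 to a vector.
    x = x0.copy()
    r = b - A @ x
    z = M_inv(r)                       # z(0) = M^-1 r(0)
    d = z.copy()
    rz = r @ z
    if max_iter is None:
        max_iter = len(b)
    for _ in range(max_iter):
        Ad = A @ d
        alpha = rz / (d @ Ad)
        x = x + alpha * d
        r = r - alpha * Ad
        if np.linalg.norm(r) < tol:
            break
        z = M_inv(r)
        rz_new = r @ z
        beta = rz_new / rz
        d = z + beta * d
        rz = rz_new
    return x

A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
jacobi = lambda v: v / np.diag(A)      # M = diag(A), so M^-1 divides by the diagonal
assert np.allclose(preconditioned_cg(A, b, np.zeros(2), jacobi), np.linalg.solve(A, b))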

Page 27: Conjugate Gradient

12. Outlook

• CG can also be used to solve the least-squares problem

min_x ||Ax − b||²

• To solve non-linear problems with CG, one has to make changes to the algorithm. There are several possibilities, and the best choice is still a subject of research.
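One standard route (often called CGNR; added here as an illustration, not necessarily the approach the slides have in mind) is to apply plain CG to the normal equations ATAx = ATb, which are symmetric positive definite whenever A has full column rank:

import numpy as np

# Overdetermined made-up example: more equations than unknowns.
A = np.array([[1.0, 0.0],
              [1.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, 2.0, 2.0])

AtA = A.T @ A                          # normal equations: (A^T A) x = A^T b
Atb = A.T @ b

x = np.zeros(2)
r = Atb - AtA @ x
d = r.copy()
for _ in range(len(x)):                # CG loop on the SPD matrix A^T A
    Ad = AtA @ d
    alpha = (r @ r) / (d @ Ad)
    x = x + alpha * d
    r_new = r - alpha * Ad
    if np.linalg.norm(r_new) < 1e-12:
        break
    d = r_new + ((r_new @ r_new) / (r @ r)) * d
    r = r_new

assert np.allclose(x, np.linalg.lstsq(A, b, rcond=None)[0])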

Page 28: Conjugate Gradient

12. Outlook

In non-linear problems, there may be several local minima to which CG might converge.

It is therefore hard to determine the right step size.

Page 29: Conjugate Gradient

12. Outlook

                         Ax = b    Ax = λx
A = A* (pos. definite)   CG        Lanczos
A ≠ A*                   GMRES     Arnoldi

- There are other algorithms in numerical linear algebra closely related to CG.

- They all use Krylov subspaces.