Conjugate Gradient


0. History

• Why iterate?

• Direct algorithms require O(n³) work.

• 1950: n=20

• 1965: n=200

• 1980: n=2000

• 1995: n=20000

dimensional increase: 10³

computer hardware: 10⁹

0. History

• If matrix problems could be solved in O(n²) time, matrices could be about 30 times bigger.

• There are direct algorithms that run in about O(n^2.4) time, but their constant factors are too big for practical use.

• For certain matrices, iterative methods have the potential to reduce computation time to O(m²).

1. Introduction

• CG is the most popular iterative method for solving large systems of linear equations

Ax = b.

• CG is an iterative method, suited for use with sparse matrices with certain properties.

• In practice, we rarely have to deal with huge dense matrices, since huge matrices typically arise from the discretisation of differential or integral equations and are therefore sparse.

2. Notation

• Matrix: A, with components Aij

• Vector, n x 1 matrix: x, with components xi

• Linear equation: Ax = b, with components Σj Aij xj = bi

2. Notation

• Transpose of a matrix: (AT)ij = Aji

• Inner product of two vectors: xTy = Σ xiyi

• If xTy = 0, then x and y are orthogonal

3. Properties of A

• A has to be an n x n matrix.

• A has to be positive definite: xT A x > 0 for every nonzero vector x.

• A has to be symmetric, AT = A
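For readers who want to verify these two requirements numerically, here is a minimal NumPy sketch (the 2 x 2 matrix is a made-up example, not one from the slides); a Cholesky factorisation succeeds exactly when a symmetric matrix is positive definite:

```python
import numpy as np

# Hypothetical 2x2 example matrix (not from the slides).
A = np.array([[3.0, 2.0],
              [2.0, 6.0]])

# Symmetry: A^T = A.
print(np.allclose(A, A.T))

# Positive definiteness: x^T A x > 0 for all x != 0.
# A practical test is to attempt a Cholesky factorisation,
# which exists exactly for symmetric positive definite matrices.
try:
    np.linalg.cholesky(A)
    print("positive definite")
except np.linalg.LinAlgError:
    print("not positive definite")
```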

4. Quadratic Forms

• A quadratic form is a scalar, quadratic function of a vector:

f(x) = 1/2 xT A x - bT x + c

4. Quadratic Forms

• Gradient: points in the direction of greatest increase of f(x).

f'(x) = A x - b   (for symmetric A)

• If A is symmetric and positive definite, f(x) is minimised where f'(x) = 0, i.e. by the solution of Ax = b.
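To make this concrete, here is a small NumPy sketch (with made-up A, b and c) that evaluates the quadratic form and checks numerically that its gradient is Ax - b, and that the minimiser of f solves Ax = b:

```python
import numpy as np

# Hypothetical example data (A symmetric positive definite, b and c arbitrary).
A = np.array([[3.0, 2.0],
              [2.0, 6.0]])
b = np.array([2.0, -8.0])
c = 0.0

def f(x):
    # Quadratic form f(x) = 1/2 x^T A x - b^T x + c.
    return 0.5 * x @ A @ x - b @ x + c

def grad_f(x):
    # For symmetric A the gradient is f'(x) = A x - b.
    return A @ x - b

x = np.array([1.0, 1.0])

# Check the gradient formula against central finite differences.
h = 1e-6
fd = np.array([(f(x + h * e) - f(x - h * e)) / (2 * h) for e in np.eye(2)])
print(np.allclose(grad_f(x), fd))                        # True

# The minimiser of f is the solution of A x = b.
print(np.allclose(grad_f(np.linalg.solve(A, b)), 0.0))   # True
```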

4. Quadratic Forms

• positive definite: xT A x > 0

• negative definite: xT A x < 0

• positive semidefinite: xT A x ≥ 0

• indefinite: xT A x takes both positive and negative values

5. Steepest Descent

• Start at an arbitrary point and slide down to the bottom of the paraboloid.

• Take steps x(1), x(2), … in the direction -f'(x(i))

• Error e(i) = x(i) – x

• Residual r(i) = b – Ax(i)

r(i) = - Ae(i)

r(i) = - f`(x(i))

5. Steepest Descent

• x(i+1) = x(i) + α r(i) , but how big is α?

• Choose α so that f'(x(i+1)) is orthogonal to r(i), i.e. perform an exact line search along x(i) + α r(i). This gives

α(i) = r(i)T r(i) / (r(i)T A r(i))

5. Steepest Descent

r(i) = b - A x(i)

α(i) = r(i)T r(i) / (r(i)T A r(i))

x(i+1) = x(i) + α(i) r(i)

The algorithm above requires two matrix-vector products per iteration. One can be eliminated by premultiplying the last equation by -A and adding b, which yields a recurrence for the residual.

5. Steepest Descent

r(i+1) = r(i) - α(i) A r(i)

α(i) = r(i)T r(i) / (r(i)T A r(i))

The product A r(i) now appears in both formulas and needs to be computed only once per iteration.

This sequence is generated without any feedback from x(i). Therefore, floating-point round-off errors may accumulate and the sequence could converge to a point merely near x. This effect can be avoided by periodically recomputing the correct residual r(i) = b - A x(i).
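The update rules above translate almost line by line into code. Below is a minimal NumPy sketch of Steepest Descent with the residual recurrence and periodic recomputation of the exact residual; the function name, tolerance and test matrix are illustrative choices, not part of the original slides:

```python
import numpy as np

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10000, recompute_every=50):
    """Steepest Descent for a symmetric positive definite matrix A.

    Illustrative sketch: uses the residual recurrence r <- r - alpha * A r
    and periodically recomputes r = b - A x to limit round-off build-up.
    """
    x = x0.astype(float).copy()
    r = b - A @ x
    for i in range(max_iter):
        if np.linalg.norm(r) < tol:
            break
        Ar = A @ r                     # the single matrix-vector product per step
        alpha = (r @ r) / (r @ Ar)     # exact line search along r
        x = x + alpha * r
        if (i + 1) % recompute_every == 0:
            r = b - A @ x              # exact residual (one extra product)
        else:
            r = r - alpha * Ar         # cheap recurrence
    return x

# Hypothetical test problem.
A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
x = steepest_descent(A, b, np.zeros(2))
print(np.allclose(A @ x, b))           # True
```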

6. Eigenvectors

• v is an eigenvector of A if there is a scalar λ such that A v = λ v.

• λ is then called an eigenvalue.

• A symmetric n x n matrix always has n linearly independent eigenvectors, which can be chosen to be orthogonal.

• A positive definite matrix has positive eigenvalues.
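A quick NumPy check of these facts, using a made-up symmetric positive definite matrix:

```python
import numpy as np

# Hypothetical symmetric positive definite matrix.
A = np.array([[3.0, 2.0],
              [2.0, 6.0]])

# eigh is NumPy's eigensolver for symmetric matrices; it returns real
# eigenvalues (ascending) and orthonormal eigenvectors (columns of V).
lam, V = np.linalg.eigh(A)

print(lam)                                   # all positive, since A is SPD
print(np.allclose(V.T @ V, np.eye(2)))       # eigenvectors are orthonormal
print(np.allclose(A @ V, V * lam))           # A v = lambda v, column by column
```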

7. Convergence of SD

• Convergence of SD requires the error e(i) to vanish. To measure e(i), we use the A- norm:

• Some math now yields

||e||_A = (eT A e)^(1/2)

||e(i+1)||_A = ω ||e(i)||_A ,   ω² = 1 - (κ² + μ²)² / ( (κ + μ²)(κ³ + μ²) )

where κ is the spectral condition number (next slide) and μ depends on the direction of the error e(i) relative to the eigenvectors of A.

7. Convergence of SD

• Spectral condition number: κ = λmax / λmin ≥ 1.

• An upper bound for ω is found by setting μ² = κ²:

ω ≤ (κ - 1) / (κ + 1)

• We therefore have instant convergence if all the eigenvalues of A are the same, since then κ = 1 and Ax = b is just (λ I) x = b.
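The bound is easy to evaluate numerically. A tiny sketch (again with a made-up SPD matrix) that computes κ and the worst-case per-step reduction factor (κ - 1)/(κ + 1), which ω never exceeds:

```python
import numpy as np

# Hypothetical SPD matrix to illustrate the bound.
A = np.array([[3.0, 2.0],
              [2.0, 6.0]])

lam = np.linalg.eigvalsh(A)
kappa = lam.max() / lam.min()            # spectral condition number
omega_bound = (kappa - 1) / (kappa + 1)  # worst-case per-step error reduction

print(kappa, omega_bound)                # kappa = 1 would give omega = 0
```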

7. Convergence of SD

[Figure: convergence of SD for the four cases large κ / small μ, large κ / large μ, small κ / small μ, small κ / large μ]

8. Conjugate Directions

• Steepest Descent often takes steps in the same direction as earlier steps.

• The solution is to take a set of A-orthogonal search directions d(0), d(1), … , d(n-1) and take exactly one step of the right length in each direction.

d(i)T A d(j) = 0   for i ≠ j

8. Conjugate Directions

[Figure: pairs of A-orthogonal vectors vs. pairs of orthogonal vectors]

8. Conjugate Directions

• Demanding d(i) to be A-orthogonal to the next error e(i+1), we get

α(i) = d(i)T r(i) / (d(i)T A d(i))

x(i+1) = x(i) + α(i) d(i)

• The search directions can be generated by Gram-Schmidt conjugation. Problem: O(n³) work.

8. Conjugate Directions

• CD chooses the point in e(0) + D(i) for which ||e(i)||_A is minimized, where

D(i) = span{ d(0), d(1), … , d(i-1) }

• The error term is therefore A-orthogonal to all the old search directions:

d(i)T A e(j) = 0 = d(i)T r(j)   for i < j
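The O(n³) cost of Gram-Schmidt conjugation can be seen in a direct implementation. The sketch below (the helper name and the 3 x 3 matrix are my own illustrative choices) conjugates the coordinate axes and verifies that the resulting directions are A-orthogonal:

```python
import numpy as np

def gram_schmidt_conjugate(U, A):
    """Turn linearly independent vectors u_0..u_{n-1} (columns of U) into
    A-orthogonal directions d_0..d_{n-1} by Gram-Schmidt conjugation.

    Illustrative sketch: re-orthogonalising against all previous directions
    is exactly the O(n^3) cost mentioned above.
    """
    n = U.shape[1]
    D = np.zeros_like(U, dtype=float)
    for i in range(n):
        d = U[:, i].astype(float)
        for j in range(i):
            # beta_ij = -(u_i^T A d_j) / (d_j^T A d_j)
            beta = -(U[:, i] @ A @ D[:, j]) / (D[:, j] @ A @ D[:, j])
            d = d + beta * D[:, j]
        D[:, i] = d
    return D

# Hypothetical 3x3 SPD matrix; conjugate the coordinate axes.
A = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
D = gram_schmidt_conjugate(np.eye(3), A)
print(np.round(D.T @ A @ D, 10))   # diagonal: d(i)^T A d(j) = 0 for i != j
```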

9. Conjugate Gradient

• The residual is orthogonal to the previous search directions.

• Krylov subspace: in CG the search directions are built from the residuals,

D(i) = span{ r(0), r(1), … , r(i-1) }

• The residual obeys the recurrence

r(i+1) = -A e(i+1) = -A ( e(i) + α(i) d(i) ) = r(i) - α(i) A d(i)

• D(i) is therefore the Krylov subspace

D(i) = span{ r(0), A r(0), A² r(0), … , A^(i-1) r(0) }

9. Conjugate Gradient

• Gram-Schmidt conjugation becomes easy, because r(i+1) is already A-orthogonal to all the previous search directions except d(i).

A D(i) ⊂ D(i+1)   and   r(i+1) ⊥ D(i+1)   ⇒   r(i+1) is A-orthogonal to D(i)

d(i+1) = r(i+1) + β(i+1) d(i) ,   β(i+1) = r(i+1)T r(i+1) / (r(i)T r(i))

9. Conjugate Gradient

• The complete Conjugate Gradient algorithm:

d(0) = r(0) = b - A x(0)

α(i) = r(i)T r(i) / (d(i)T A d(i))

x(i+1) = x(i) + α(i) d(i)

r(i+1) = r(i) - α(i) A d(i)

β(i+1) = r(i+1)T r(i+1) / (r(i)T r(i))

d(i+1) = r(i+1) + β(i+1) d(i)
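Collecting the formulas above gives the standard CG loop. A minimal NumPy sketch (function name, stopping test and the small test system are illustrative choices, not from the slides):

```python
import numpy as np

def conjugate_gradient(A, b, x0=None, tol=1e-10, max_iter=None):
    """Conjugate Gradient for symmetric positive definite A,
    following the update formulas above (illustrative sketch)."""
    n = len(b)
    x = np.zeros(n) if x0 is None else x0.astype(float).copy()
    r = b - A @ x                    # d(0) = r(0) = b - A x(0)
    d = r.copy()
    rs_old = r @ r
    if np.sqrt(rs_old) < tol:
        return x
    for _ in range(max_iter or n):
        Ad = A @ d
        alpha = rs_old / (d @ Ad)    # alpha(i) = r^T r / (d^T A d)
        x = x + alpha * d            # x(i+1) = x(i) + alpha d(i)
        r = r - alpha * Ad           # r(i+1) = r(i) - alpha A d(i)
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        beta = rs_new / rs_old       # beta(i+1) = r(i+1)^T r(i+1) / r(i)^T r(i)
        d = r + beta * d             # d(i+1) = r(i+1) + beta d(i)
        rs_old = rs_new
    return x

# Hypothetical test problem.
A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
print(conjugate_gradient(A, b))      # solves A x = b in at most n steps
```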

11. Preconditioning

• Improving the condition number of the matrix before the calculation. Example:

• Attempt to stretch the quadratic form to make it more spherical.

• Many more sophisticated preconditioners have been developed and are nearly always used.

M⁻¹ A x = M⁻¹ b
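As a tiny illustration of the idea, the sketch below uses the simple Jacobi preconditioner M = diag(A) on a made-up, badly scaled SPD matrix and compares condition numbers before and after preconditioning (M⁻¹A itself is not symmetric, so the comparison uses the similar matrix M^(-1/2) A M^(-1/2), which has the same eigenvalues):

```python
import numpy as np

# Hypothetical badly scaled SPD matrix.
A = np.array([[1000.0, 1.0],
              [   1.0, 0.01]])

def spd_condition_number(B):
    lam = np.linalg.eigvalsh(B)
    return lam.max() / lam.min()

# Jacobi (diagonal) preconditioner M = diag(A); use its inverse square root
# to form the symmetrically preconditioned matrix.
M_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(A)))

print(spd_condition_number(A))                          # large kappa
print(spd_condition_number(M_inv_sqrt @ A @ M_inv_sqrt))  # much smaller kappa
```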

12. Outlook

• CG can also be used to solve the least-squares problem min_x ||Ax - b||².

• To solve non-linear problems with CG, one has to make changes in the algorithm. There are several possibilities, and the best choice is still a subject of research.
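One way to do this is to apply ordinary CG to the normal equations AᵀA x = Aᵀb, since AᵀA is symmetric positive definite whenever A has full column rank. The sketch below reuses the conjugate_gradient function from section 9 on a made-up least-squares problem:

```python
import numpy as np

# Hypothetical overdetermined system: min_x ||A x - b||^2 with a tall A.
A = np.array([[1.0, 1.0],
              [1.0, 2.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0, 2.0])

# A^T A is symmetric positive definite for full-column-rank A, so ordinary CG
# applies to the normal equations A^T A x = A^T b.
# conjugate_gradient is the sketch from section 9.
x = conjugate_gradient(A.T @ A, A.T @ b)

# Reference solution from NumPy's least-squares solver.
print(x, np.linalg.lstsq(A, b, rcond=None)[0])
```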

12. Outlook

In non-linear problems, there may be several local minima to which CG might converge.

It is therefore hard to determine the right step size.

12. Outlook

Ax = b:   CG (if A = A*, A positive definite),  GMRES (if A ≠ A*)

Ax = λ x:  Lanczos (if A = A*),  Arnoldi (if A ≠ A*)

• There are other algorithms in numerical linear algebra closely related to CG.

• They all use Krylov subspaces.