The Conjugate Gradient Method

Tom Lyche

University of Oslo

Norway


Plan for the day

The method

Algorithm

Implementation of test problems

Complexity

Derivation of the method

Convergence


The Conjugate gradient method

Restricted to positive definite systems: Ax = b, A ∈ R^{n,n} positive definite.

Generate {x_k} by x_{k+1} = x_k + α_k p_k, where

p_k is a vector, the search direction,

α_k is a scalar determining the step length.

In general we find the exact solution in at most n iterations.

For many problems the error becomes small after a few iterations.

The method is thus both a direct method and an iterative method.

The rate of convergence depends on the square root of the condition number.


The name of the game

Conjugate means orthogonal; orthogonal gradients.

But why gradients?

Consider minimizing the quadratic function Q : R^n → R given by Q(x) := (1/2) x^T A x − x^T b.

The minimum is obtained by setting the gradient equal to zero:

∇Q(x) = Ax − b = 0, i.e. the linear system Ax = b (a short check of this gradient computation is given below).

Find the solution by solving r = b − Ax = 0.

The sequence {x_k} is such that {r_k} := {b − Ax_k} is orthogonal with respect to the usual inner product in R^n.

The search directions are also orthogonal, but with respect to a different inner product (the A-inner product defined in the derivation below).
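For completeness, a short check of the gradient computation (a sketch; it uses only the symmetry of A, which holds since A is positive definite):

∇Q(x) = (1/2)(A + A^T)x − b = Ax − b,   ∇²Q(x) = A.

Since A is positive definite, Q is strictly convex, so the unique stationary point, the solution of Ax = b, is the unique minimizer.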


The algorithm

Start with some x_0. Set p_0 = r_0 = b − Ax_0.

For k = 0, 1, 2, . . .

x_{k+1} = x_k + α_k p_k,   α_k = (r_k^T r_k) / (p_k^T A p_k)

r_{k+1} = b − Ax_{k+1} = r_k − α_k A p_k

p_{k+1} = r_{k+1} + β_k p_k,   β_k = (r_{k+1}^T r_{k+1}) / (r_k^T r_k)


Example

[ 2 −1; −1 2 ] [ x_1; x_2 ] = [ 1; 0 ]

Start with x_0 = 0.

p_0 = r_0 = b = [1, 0]^T

α_0 = (r_0^T r_0) / (p_0^T A p_0) = 1/2,   x_1 = x_0 + α_0 p_0 = [0; 0] + (1/2)[1; 0] = [1/2; 0]

r_1 = r_0 − α_0 A p_0 = [1; 0] − (1/2)[2; −1] = [0; 1/2],   r_1^T r_0 = 0

β_0 = (r_1^T r_1) / (r_0^T r_0) = 1/4,   p_1 = r_1 + β_0 p_0 = [0; 1/2] + (1/4)[1; 0] = [1/4; 1/2],

α_1 = (r_1^T r_1) / (p_1^T A p_1) = 2/3,

x_2 = x_1 + α_1 p_1 = [1/2; 0] + (2/3)[1/4; 1/2] = [2/3; 1/3]

r_2 = 0, so x_2 is the exact solution.


Exact method and iterative method

Orthogonality of the residuals implies that x_m is equal to the solution x of Ax = b for some m ≤ n.

For if x_k ≠ x for all k = 0, 1, . . . , n − 1, then r_k ≠ 0 for k = 0, 1, . . . , n − 1 and these residuals form an orthogonal basis for R^n. But then r_n ∈ R^n is orthogonal to all vectors in R^n, so r_n = 0 and hence x_n = x.

So the conjugate gradient method finds the exact solution in at most n iterations.

The convergence analysis shows that ‖x − x_k‖_A typically becomes small quite rapidly and we can stop the iteration with k much smaller than n.

It is this rapid convergence which makes the method interesting and in practice an iterative method.


Conjugate Gradient Algorithm

[Conjugate Gradient Iteration] The positive definite linear system Ax = b is solved by the conjugate gradient method. x is a starting vector for the iteration. The iteration is stopped when ||r_k||_2/||r_0||_2 ≤ tol or k > itmax. itm is the number of iterations used.

function [x, itm] = cg(A, b, x, tol, itmax)
% Conjugate gradient iteration for the positive definite system Ax = b.
r = b - A*x; p = r; rho = r'*r;
rho0 = rho;
for k = 0:itmax
    if sqrt(rho/rho0) <= tol      % stop when ||r_k||_2/||r_0||_2 <= tol
        itm = k; return
    end
    t = A*p; a = rho/(p'*t);      % a = alpha_k
    x = x + a*p; r = r - a*t;
    rhos = rho; rho = r'*r;
    p = r + (rho/rhos)*p;         % rho/rhos = beta_k
end
itm = itmax + 1;
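As a minimal sanity check (my own call, not from the slides; the tolerance and itmax are chosen for illustration), applying cg to the 2-by-2 system from the Example slide reproduces the hand computation:

A = [2 -1; -1 2]; b = [1; 0];
[x, itm] = cg(A, b, [0; 0], 1e-12, 20)
% returns x = [2/3; 1/3] and itm = 2, matching the worked example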


A family of test problems

We can test the methods on the Kronecker sum matrix

A = C1 ⊗ I + I ⊗ C2 =

  [ C1              ]   [ cI  bI              ]
  [    C1           ]   [ bI  cI  bI          ]
  [       ...       ] + [     ...  ...  ...   ]
  [           C1    ]   [         bI  cI  bI  ]
  [              C1 ]   [             bI  cI  ]

where C1 = tridiag_m(a, c, a) and C2 = tridiag_m(b, c, b).

A is positive definite if c > 0 and c ≥ |a| + |b|.
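A sketch of how this matrix can be assembled in MATLAB (my own construction, not from the slides; Kronecker-product ordering conventions vary, but with a = b, as in both test problems below, the two orderings give the same matrix):

m = 3; a = -1; b = -1; c = 2;                  % Poisson values, n = m^2 = 9
e = ones(m,1);
C1 = spdiags([a*e, c*e, a*e], -1:1, m, m);     % tridiag(a,c,a)
C2 = spdiags([b*e, c*e, b*e], -1:1, m, m);     % tridiag(b,c,b)
A  = kron(speye(m), C1) + kron(C2, speye(m));  % Kronecker sum
full(A)                  % reproduces the 9-by-9 matrix on the next slide (with a = b = -1, c = 2)
all(eig(full(A)) > 0)    % checks positive definiteness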


m = 3, n = 9

A =

  [ 2c  a   0   b   0   0   0   0   0
    a   2c  a   0   b   0   0   0   0
    0   a   2c  0   0   b   0   0   0
    b   0   0   2c  a   0   b   0   0
    0   b   0   a   2c  a   0   b   0
    0   0   b   0   a   2c  0   0   b
    0   0   0   b   0   0   2c  a   0
    0   0   0   0   b   0   a   2c  a
    0   0   0   0   0   b   0   a   2c ]

b = a = −1, c = 2: Poisson matrix

b = a = 1/9, c = 5/18: Averaging matrix


Averaging problem

The eigenvalues of A are λ_{j,k} = 2c + 2a cos(jπh) + 2b cos(kπh), j, k = 1, 2, . . . , m, where h = 1/(m + 1).

a = b = 1/9, c = 5/18

λ_max = 5/9 + (4/9) cos(πh),   λ_min = 5/9 − (4/9) cos(πh)

cond_2(A) = λ_max/λ_min = (5 + 4 cos(πh)) / (5 − 4 cos(πh)) ≤ 9, since cos(πh) < 1.


2D formulation for test problems

V = vec(x), R = vec(r), P = vec(p), the vectors reshaped into m-by-m matrices.

Ax = b ⟺ DV + VE = h²F,

D = tridiag(a, c, a) ∈ R^{m,m},   E = tridiag(b, c, b) ∈ R^{m,m}

vec(Ap) = DP + PE
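A small check (my own sketch, with the column-stacking vec and the Kronecker ordering chosen to match it) that the 2D form gives the matrix-vector product without forming A:

m = 3; a = -1; b = -1; c = 2; e = ones(m,1);
D = spdiags([a*e, c*e, a*e], -1:1, m, m);     % tridiag(a,c,a)
E = spdiags([b*e, c*e, b*e], -1:1, m, m);     % tridiag(b,c,b)
A = kron(speye(m), D) + kron(E, speye(m));    % the 9-by-9 test matrix
P = rand(m);                                  % an arbitrary m-by-m "grid" vector
norm(A*P(:) - reshape(D*P + P*E, [], 1))      % essentially zero (rounding error only)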


Testing

[Testing Conjugate Gradient] A = trid(a, c, a, m) ⊗ I_m + I_m ⊗ trid(b, c, b, m) ∈ R^{m²,m²}

function [V, it] = cgtest(m, a, b, c, tol, itmax)
% Conjugate gradient on the Kronecker-sum test matrix, using the 2D form
% DP + PE instead of forming A explicitly.
h = 1/(m+1); R = h*h*ones(m);
D = sparse(tridiagonal(a, c, a, m)); E = sparse(tridiagonal(b, c, b, m));
V = zeros(m, m); P = R; rho = sum(sum(R.*R)); rho0 = rho;
for k = 1:itmax
    if sqrt(rho/rho0) <= tol
        it = k; return
    end
    % note: once D and E are built, a is reused as the step length alpha_k
    T = D*P + P*E; a = rho/sum(sum(P.*T)); V = V + a*P; R = R - a*T;
    rhos = rho; rho = sum(sum(R.*R)); P = R + (rho/rhos)*P;
end
it = itmax + 1;
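The helper tridiagonal(a, c, a, m) used above is not listed in the slides. One possible stand-in (my sketch), saved as tridiagonal.m:

function T = tridiagonal(l, d, u, m)
% Hypothetical replacement for the helper called by cgtest: m-by-m matrix
% with l on the subdiagonal, d on the diagonal and u on the superdiagonal.
e = ones(m, 1);
T = spdiags([l*e, d*e, u*e], -1:1, m, m);

With that in place, a sample call for the averaging problem on a 50-by-50 grid (n = 2 500; tol and itmax chosen for illustration):

[V, it] = cgtest(50, 1/9, 1/9, 5/18, 1e-8, 1000);
% it should come out at about 22, in line with Table 1 on the next slide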


The Averaging Problem

n   2 500   10 000   40 000   1 000 000   4 000 000
K      22       22       21          21          20

Table 1: The number of iterations K for the averaging problem on a √n × √n grid, with x_0 = 0 and tol = 10⁻⁸.

Both the condition number and the required number of iterations are independent of the size of the problem.

The convergence is quite rapid.


Poisson Problem

λ_{j,k} = 2c + 2a cos(jπh) + 2b cos(kπh), j, k = 1, 2, . . . , m.

a = b = −1, c = 2

λ_max = 4 + 4 cos(πh),   λ_min = 4 − 4 cos(πh)

cond_2(A) = λ_max/λ_min = (1 + cos(πh)) / (1 − cos(πh)) = cond(T)².

cond_2(A) = O(n).


The Poisson problem

n      2 500   10 000   40 000   160 000
K        140      294      587      1168
K/√n    1.86     1.87     1.86      1.85

Using CG in the form of Algorithm 8 with ε = 10⁻⁸ and x_0 = 0 we list K, the required number of iterations, and K/√n.

The results show that K is much smaller than n and appears to be proportional to √n.

This is the same speed as for SOR, and we don't have to estimate any acceleration parameter!

√n is essentially the square root of the condition number of A.


Complexity

The work involved in each iteration is

1. one matrix times vector product (t = Ap),

2. two inner products (p^T t and r^T r),

3. three vector-plus-scalar-times-vector operations (x = x + ap, r = r − at and p = r + (rho/rhos)p).

The dominating part of the computation is item 1. Note that for our test problems A has only O(5n) nonzero elements. Therefore, taking advantage of the sparseness of A, we can compute t in O(n) flops. With such an implementation the total number of flops in one iteration is O(n).
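A quick illustration of the O(5n) sparsity (my own sketch, reusing the kron construction from the test-problem slide; the value of m is illustrative):

m = 200; n = m^2; e = ones(m,1); a = -1; b = -1; c = 2;
C1 = spdiags([a*e, c*e, a*e], -1:1, m, m);
C2 = spdiags([b*e, c*e, b*e], -1:1, m, m);
A = kron(speye(m), C1) + kron(C2, speye(m));
nnz(A)/n        % about 5 (exactly 5 - 4/m), so a sparse t = A*p costs O(n) flops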


More Complexity

How many flops do we need to solve the test problems by the conjugate gradient method to within a given tolerance?

Averaging problem: O(n) flops. Optimal for a problem with n unknowns.

Same as SOR and better than the fast method based on FFT.

Discrete Poisson problem: O(n^{3/2}) flops.

Same as SOR and the fast method.

Cholesky Algorithm: O(n²) flops both for the averaging and the Poisson problem.


Analysis and Derivation of the Method

Theorem 3 (Orthogonal Projection). Let S be a subspace of a finite dimensional real or complex inner product space (V, F, ⟨·,·⟩). To each x ∈ V there is a unique vector p ∈ S such that

⟨x − p, s⟩ = 0, for all s ∈ S. (1)

[Figure: the orthogonal projection p = P_S x of x onto S; the error x − p is orthogonal to S.]


Best Approximation

Theorem 4 (Best Approximation). Let S be a subspace of a finite dimensional real or complex inner product space (V, F, ⟨·,·⟩). Let x ∈ V and p ∈ S. The following statements are equivalent:

1. ⟨x − p, s⟩ = 0, for all s ∈ S.

2. ‖x − s‖ > ‖x − p‖ for all s ∈ S with s ≠ p.

If (v_1, . . . , v_k) is an orthogonal basis for S, then

p = ∑_{i=1}^{k} (⟨x, v_i⟩ / ⟨v_i, v_i⟩) v_i. (2)


Derivation of CG

Ax = b, A ∈ R^{n,n} is positive definite, x, b ∈ R^n.

(x, y) := x^T y, x, y ∈ R^n (the usual inner product)

⟨x, y⟩ := x^T A y = (x, Ay) = (Ax, y) (the A-inner product)

‖x‖_A = √(x^T A x)

W_0 = {0}, W_1 = span{b}, W_2 = span{b, Ab}, W_k = span{b, Ab, A²b, . . . , A^{k−1}b}

W_0 ⊂ W_1 ⊂ W_2 ⊂ · · · ⊂ W_k ⊂ · · ·,   dim(W_k) ≤ k,   w ∈ W_k ⇒ Aw ∈ W_{k+1}

x_k ∈ W_k,   ⟨x_k − x, w⟩ = 0 for all w ∈ W_k

p_0 = r_0 := b,   p_j = r_j − ∑_{i=0}^{j−1} (⟨r_j, p_i⟩ / ⟨p_i, p_i⟩) p_i,   j = 1, . . . , k.
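One way to connect this projection characterization of x_k to the step lengths α_k of the algorithm (a sketch, assuming x_0 = 0 as in the definition of W_1, and that the residuals are nonzero so that p_0, . . . , p_{k−1} form a ⟨·,·⟩-orthogonal basis for W_k): Theorem 4 gives

x_k = ∑_{i=0}^{k−1} (⟨x, p_i⟩ / ⟨p_i, p_i⟩) p_i,   so   x_{k+1} = x_k + (⟨x, p_k⟩ / ⟨p_k, p_k⟩) p_k.

Moreover

⟨x, p_k⟩ = p_k^T A x = p_k^T b = p_k^T (r_k + A x_k) = p_k^T r_k = r_k^T r_k,

using ⟨p_k, x_k⟩ = 0 (since x_k ∈ W_k) and (p_k − r_k)^T r_k = 0 (since p_k − r_k ∈ W_k while r_k ⊥ W_k), so the coefficient is exactly α_k = r_k^T r_k / (p_k^T A p_k) from the algorithm.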


Convergence

Theorem 5. Suppose we apply the conjugate gradient method to a positive definite system Ax = b. Then the A-norms of the errors satisfy

‖x − x_k‖_A / ‖x − x_0‖_A ≤ 2 ((√κ − 1)/(√κ + 1))^k, for k ≥ 0,

where κ = cond_2(A) = λ_max/λ_min is the 2-norm condition number of A.

This theorem explains what we observed in the previous section, namely that the number of iterations is linked to √κ, the square root of the condition number of A. Indeed, the following corollary gives an upper bound for the number of iterations in terms of √κ.


Corollary 6. If for some ε > 0 we have k ≥ (1/2) ln(2/ε) √κ, then

‖x − x_k‖_A / ‖x − x_0‖_A ≤ ε.
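As a rough sanity check (my own calculation, using the eigenvalue formulas from the Poisson slide and the iteration counts from the Poisson table): for n = 40 000, i.e. m = 200 and h = 1/201,

κ = (1 + cos(πh)) / (1 − cos(πh)) ≈ 4/(π²h²) ≈ 1.6 × 10⁴,   √κ ≈ 128,

so with ε = 10⁻⁸ the corollary guarantees the tolerance after at most k ≈ (1/2) ln(2 × 10⁸) · 128 ≈ 1 220 iterations. The observed count in the table is 587, so the bound is pessimistic by roughly a factor of two, but it captures the O(√κ) = O(√n) growth seen in the experiments.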
