
  • The Conjugate Gradient Method

    Tom Lyche

    University of Oslo, Norway

    http://www.ifi.uio.no/

  • Plan for the day

    The method

    Algorithm

    Implementation of test problems

    Complexity

    Derivation of the method

    Convergence

  • The conjugate gradient method

    Restricted to positive definite systems: Ax = b, A ∈ R^{n,n} positive definite.

    Generate {x_k} by x_{k+1} = x_k + α_k p_k, where p_k is a vector, the search direction, and α_k is a scalar determining the step length.

    In general we find the exact solution in at most n iterations.

    For many problems the error becomes small after a few iterations.

    Both a direct method and an iterative method.

    The rate of convergence depends on the square root of the condition number.

  • The name of the game

    Conjugate means orthogonal; orthogonal gradients. But why gradients?

    Consider minimizing the quadratic function Q : R^n → R given by Q(x) := (1/2) x^T A x − x^T b. The minimum is obtained by setting the gradient equal to zero:

    ∇Q(x) = Ax − b = 0, the linear system Ax = b.

    Find the solution by solving r = b − Ax = 0. The sequence {x_k} is such that {r_k} := {b − A x_k} is orthogonal with respect to the usual inner product in R^n.

    The search directions are also orthogonal, but with respect to a different inner product.
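    The link between Q and the linear system can be checked numerically. A minimal MATLAB sketch (the test point is an arbitrary illustrative choice; the 2×2 matrix is the one used in the example below):

    % Finite-difference check that the gradient of Q(x) = 0.5*x'*A*x - x'*b is A*x - b.
    A = [2 -1; -1 2]; b = [1; 0];
    Q = @(x) 0.5*(x'*A*x) - x'*b;
    x = [0.3; -0.7];                        % arbitrary test point
    g = A*x - b;                            % gradient according to the formula above
    e = 1e-6; gfd = zeros(2,1);
    for i = 1:2
        d = zeros(2,1); d(i) = e;
        gfd(i) = (Q(x+d) - Q(x-d))/(2*e);   % central difference approximation
    end
    disp(norm(g - gfd))                     % close to zero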

  • The algorithm

    Start with some x_0. Set p_0 = r_0 = b − A x_0. For k = 0, 1, 2, . . .

    x_{k+1} = x_k + α_k p_k,   α_k = (r_k^T r_k)/(p_k^T A p_k)

    r_{k+1} = b − A x_{k+1} = r_k − α_k A p_k

    p_{k+1} = r_{k+1} + β_k p_k,   β_k = (r_{k+1}^T r_{k+1})/(r_k^T r_k)

  • Example

    A = [2, −1; −1, 2],   b = [1, 0]^T.   Start with x_0 = 0.

    p_0 = r_0 = b = [1, 0]^T

    α_0 = (r_0^T r_0)/(p_0^T A p_0) = 1/2,   x_1 = x_0 + α_0 p_0 = [0, 0]^T + (1/2)[1, 0]^T = [1/2, 0]^T

    r_1 = r_0 − α_0 A p_0 = [1, 0]^T − (1/2)[2, −1]^T = [0, 1/2]^T,   r_1^T r_0 = 0

    β_0 = (r_1^T r_1)/(r_0^T r_0) = 1/4,   p_1 = r_1 + β_0 p_0 = [0, 1/2]^T + (1/4)[1, 0]^T = [1/4, 1/2]^T

    α_1 = (r_1^T r_1)/(p_1^T A p_1) = 2/3,   x_2 = x_1 + α_1 p_1 = [1/2, 0]^T + (2/3)[1/4, 1/2]^T = [2/3, 1/3]^T

    r_2 = 0, exact solution.
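    The hand computation above is easy to reproduce with a few lines of MATLAB that follow the recurrences from the algorithm slide (a sketch, not the lecture's code):

    % Two CG steps on the 2x2 example; reaches the exact solution.
    A = [2 -1; -1 2]; b = [1; 0];
    x = [0; 0]; r = b - A*x; p = r;
    for k = 1:2
        t = A*p;
        alpha = (r'*r)/(p'*t);
        x = x + alpha*p;
        rnew = r - alpha*t;
        beta = (rnew'*rnew)/(r'*r);
        p = rnew + beta*p; r = rnew;
    end
    disp(x')                                % [2/3  1/3]
    disp(norm(r))                           % 0 up to rounding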

  • Exact method and iterative method

    Orthogonality of the residuals implies that x_m is equal to the solution x of Ax = b for some m ≤ n. For if x_k ≠ x for all k = 0, 1, . . . , n − 1, then r_k ≠ 0 for k = 0, 1, . . . , n − 1, and these residuals form an orthogonal basis for R^n. But then r_n ∈ R^n is orthogonal to all vectors in R^n, so r_n = 0 and hence x_n = x.

    So the conjugate gradient method finds the exact solution in at most n iterations.

    The convergence analysis shows that ‖x − x_k‖_A typically becomes small quite rapidly and we can stop the iteration with k much smaller than n.

    It is this rapid convergence which makes the method interesting and, in practice, an iterative method.
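    The residual orthogonality used in this argument can also be observed numerically. A small sketch on an arbitrary positive definite system (size and seed chosen for illustration):

    % Run n CG steps and check that the stored residuals are mutually orthogonal.
    n = 6; rng(1);
    B = randn(n); A = B'*B + n*eye(n);      % a positive definite matrix
    b = randn(n,1);
    x = zeros(n,1); r = b - A*x; p = r; Res = r;
    for k = 1:n
        t = A*p; alpha = (r'*r)/(p'*t);
        x = x + alpha*p; rnew = r - alpha*t;
        beta = (rnew'*rnew)/(r'*r);
        p = rnew + beta*p; r = rnew; Res = [Res r];
    end
    disp(norm(triu(Res'*Res, 1)))           % off-diagonal inner products are ~ 0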

  • Conjugate Gradient Algorithm

    [Conjugate Gradient Iteration] The positive definite linear system Ax = b is solved by the conjugate gradient method. x is a starting vector for the iteration. The iteration is stopped when ||r_k||_2/||r_0||_2 ≤ tol or k > itmax. itm is the number of iterations used.

    function [x,itm] = cg(A,b,x,tol,itmax)
    % (loop body reconstructed from the recurrences on the algorithm slide)
    r = b - A*x; p = r; rho = r'*r; rho0 = rho;
    for k = 0:itmax
        if sqrt(rho/rho0) <= tol, itm = k; return; end
        t = A*p; a = rho/(p'*t);
        x = x + a*p; r = r - a*t;
        rhos = rho; rho = r'*r;
        p = r + (rho/rhos)*p;
    end
    itm = itmax + 1;                         % not converged within itmax iterations
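    A possible call, reusing the 2×2 example from earlier (tolerance and iteration cap are illustrative values):

    A = [2 -1; -1 2]; b = [1; 0];
    [x, itm] = cg(A, b, zeros(2,1), 1e-8, 100);
    % x is approximately [2/3; 1/3]; for this system itm = 2.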

  • A family of test problems

    We can test the methods on the Kronecker sum matrix

    A = C1 ⊗ I + I ⊗ C2 = diag(C1, C1, . . . , C1) + blocktridiag(bI, cI, bI),

    i.e., a block diagonal matrix with C1 blocks plus a block tridiagonal matrix with cI on the diagonal and bI on the off-diagonals,

    where C1 = tridiag_m(a, c, a) and C2 = tridiag_m(b, c, b). Positive definite if c > 0 and c ≥ |a| + |b|.
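    One way to build this matrix in MATLAB is via kron (a sketch; trid below is a small anonymous helper, not the lecture's tridiagonal function):

    % Kronecker-sum test matrix; the kron argument order is chosen so the block
    % structure (C1 blocks on the diagonal, bI blocks off the diagonal) matches the slide.
    m = 3; a = -1; b = -1; c = 2;                % Poisson parameters, see the next slide
    trid = @(lo,di,up,m) spdiags([lo*ones(m,1) di*ones(m,1) up*ones(m,1)], -1:1, m, m);
    C1 = trid(a,c,a,m); C2 = trid(b,c,b,m);
    A = kron(speye(m), C1) + kron(C2, speye(m));
    full(A)                                      % compare with the 9x9 matrix on the next slide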

  • m = 3, n = 9

    A = [ 2c  a   0   b   0   0   0   0   0
          a   2c  a   0   b   0   0   0   0
          0   a   2c  0   0   b   0   0   0
          b   0   0   2c  a   0   b   0   0
          0   b   0   a   2c  a   0   b   0
          0   0   b   0   a   2c  0   0   b
          0   0   0   b   0   0   2c  a   0
          0   0   0   0   b   0   a   2c  a
          0   0   0   0   0   b   0   a   2c ]

    b = a = −1, c = 2: Poisson matrix
    b = a = 1/9, c = 5/18: Averaging matrix

  • Averaging problem

    λ_{j,k} = 2c + 2a cos(jπh) + 2b cos(kπh),   j, k = 1, 2, . . . , m.

    a = b = 1/9, c = 5/18

    λ_max = 5/9 + (4/9) cos(πh),   λ_min = 5/9 − (4/9) cos(πh)

    cond_2(A) = λ_max/λ_min = (5 + 4 cos(πh))/(5 − 4 cos(πh)) ≤ 9.
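    The eigenvalue formulas can be confirmed on a small grid (a sketch; the value of m is arbitrary):

    % Compare the closed-form extreme eigenvalues of the averaging matrix with eig.
    m = 10; h = 1/(m+1); a = 1/9; b = 1/9; c = 5/18;
    trid = @(lo,di,up,m) spdiags([lo*ones(m,1) di*ones(m,1) up*ones(m,1)], -1:1, m, m);
    A = kron(speye(m), trid(a,c,a,m)) + kron(trid(b,c,b,m), speye(m));
    ev = eig(full(A));
    lmax = 5/9 + (4/9)*cos(pi*h); lmin = 5/9 - (4/9)*cos(pi*h);
    disp([max(ev) lmax; min(ev) lmin])           % the two columns should agree
    disp(lmax/lmin)                              % condition number, below 9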

  • 2D formulation for test problems

    Let V, R, P ∈ R^{m,m} be the grid (matrix) forms of the vectors x, r, p, i.e., x = vec(V), r = vec(R), p = vec(P).

    Ax = b ⇐⇒ DV + VE = h²F, where D = tridiag(a, c, a) ∈ R^{m,m} and E = tridiag(b, c, b) ∈ R^{m,m}.

    The matrix-vector product can be formed on the grid: A vec(P) = vec(DP + PE).
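    The grid form of the matrix-vector product is easy to verify (a sketch with an arbitrary grid function P):

    % Check that A*vec(P) equals vec(D*P + P*E) for the Kronecker-sum matrix.
    m = 4; a = -1; b = -1; c = 2;
    trid = @(lo,di,up,m) spdiags([lo*ones(m,1) di*ones(m,1) up*ones(m,1)], -1:1, m, m);
    D = trid(a,c,a,m); E = trid(b,c,b,m);
    A = kron(speye(m), D) + kron(E, speye(m));
    P = rand(m);                                    % arbitrary m x m grid function
    disp(norm(A*P(:) - reshape(D*P + P*E, [], 1)))  % ~ 0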

  • Testing

    [Testing Conjugate Gradient] A = trid(a, c, a, m) ⊗ I_m + I_m ⊗ trid(b, c, b, m) ∈ R^{m²,m²}

    function [V,it] = cgtest(m,a,b,c,tol,itmax)
    % (loop body reconstructed using the CG recurrences with the grid form of A*p)
    h = 1/(m+1); R = h*h*ones(m);
    D = sparse(tridiagonal(a,c,a,m)); E = sparse(tridiagonal(b,c,b,m));
    V = zeros(m,m); P = R; rho = sum(sum(R.*R)); rho0 = rho;
    for k = 1:itmax
        if sqrt(rho/rho0) <= tol, it = k; return; end
        T = D*P + P*E; alpha = rho/sum(sum(P.*T));
        V = V + alpha*P; R = R - alpha*T;
        rhos = rho; rho = sum(sum(R.*R));
        P = R + (rho/rhos)*P;
    end
    it = itmax + 1;                          % not converged within itmax iterations
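    Possible calls for the two test problems (grid size and tolerance are illustrative; the tridiagonal helper is assumed to be available):

    [V, it]  = cgtest(50, 1/9, 1/9, 5/18, 1e-8, 1000);   % averaging problem on a 50 x 50 grid
    [U, itp] = cgtest(50,  -1,  -1,    2, 1e-8, 1000);   % Poisson problem on the same grid
    % it stays near 20 for the averaging problem (see Table 1), while itp grows with the grid size.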

  • The Averaging Problem

    n   2 500   10 000   40 000   1 000 000   4 000 000
    K      22       22       21          21          20

    Table 1: The number of iterations K for the averaging problem on a √n × √n grid, with x_0 = 0 and tol = 10⁻⁸.

    Both the condition number and the required number of iterations are independent of the size of the problem.

    The convergence is quite rapid.

  • Poisson Problem

    λ_{j,k} = 2c + 2a cos(jπh) + 2b cos(kπh),   j, k = 1, 2, . . . , m.

    a = b = −1, c = 2

    λ_max = 4 + 4 cos(πh),   λ_min = 4 − 4 cos(πh)

    cond_2(A) = λ_max/λ_min = (1 + cos(πh))/(1 − cos(πh)) = cond_2(T).

    cond_2(A) = O(n).
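    A short computation illustrating that cond_2(A) grows linearly with n = m² (the comparison value 4(m+1)²/π² is a small-h estimate of the formula above, not from the slides):

    % Condition number of the Poisson matrix from the eigenvalue formula.
    for m = [10 20 40 80]
        h = 1/(m+1);
        kappa = (1 + cos(pi*h))/(1 - cos(pi*h));
        fprintf('m = %3d   cond = %9.1f   4(m+1)^2/pi^2 = %9.1f\n', m, kappa, 4*(m+1)^2/pi^2);
    end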

  • The Poisson problem

    n       2 500   10 000   40 000   160 000
    K         140      294      587      1168
    K/√n     1.86     1.87     1.86      1.85

    Using CG in the form of Algorithm 8 with ε = 10⁻⁸ and x_0 = 0, we list K, the required number of iterations, and K/√n.

    The results show that K is much smaller than n and appears to be proportional to √n.

    This is the same speed as for SOR, and we don't have to estimate any acceleration parameter!

    √n is essentially the square root of the condition number of A.

  • Complexity

    The work involved in each iteration is

    1. one matrix times vector (t = Ap),

    2. two inner products (p^T t and r^T r),

    3. three vector-plus-scalar-times-vector operations (x = x + ap, r = r − at and p = r + (rho/rhos)p).

    The dominating part of the computation is statement 1. Note that for our test problems A has only about 5n nonzero elements. Therefore, taking advantage of the sparseness of A, we can compute t in O(n) flops. With such an implementation the total number of flops in one iteration is O(n).

  • More Complexity

    How many flops do we need to solve the test problems by the conjugate gradient method to within a given tolerance?

    Averaging problem: O(n) flops. Optimal for a problem with n unknowns. Same as SOR and better than the fast method based on FFT.

    Discrete Poisson problem: O(n^{3/2}) flops. Same as SOR and the fast method.

    Cholesky Algorithm: O(n²) flops, both for the averaging and the Poisson problem.

  • Analysis and Derivation of the Method

    Theorem 3 (Orthogonal Projection). Let S be a subspace of a finite dimensional real or complex inner product space (V, F, 〈·,·〉). To each x ∈ V there is a unique vector p ∈ S such that

    〈x − p, s〉 = 0, for all s ∈ S. (1)

    (Figure: the orthogonal projection p = P_S x of x onto the subspace S; the error x − p is orthogonal to S.)
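    A small numerical illustration of the projection property (the subspace and the vector are arbitrary choices, not from the lecture):

    % Orthogonal projection of x onto S = span of the columns of B, standard inner product.
    rng(2); B = randn(5,2); x = randn(5,1);
    p = B*((B'*B)\(B'*x));                   % the projection of x onto span(B)
    disp(norm(B'*(x - p)))                   % ~ 0: x - p is orthogonal to every s in S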

  • Best Approximation Theorem 4 (Best Approximation). Let S be a subspace o