
SIAM J. NUMER. ANAL. Vol. 30, No. 1, pp. 40-56, February 1993

© 1993 Society for Industrial and Applied Mathematics

CHANGING THE NORM IN CONJUGATE GRADIENT TYPE ALGORITHMS*

MARTIN H. GUTKNECHT

Abstract. A conjugate gradient type method for solving $Ax = b$ or, more precisely, an orthogonal error method OE(B, C, A), is determined by a "formal inner product" matrix $B$ and a (left) preconditioning matrix $C$. Relations are pointed out between quantities such as iterates, residuals, direction vectors, and recurrence coefficients of orthogonal error methods with different but, in a particular way, related matrices $B$, namely, for OE($B(CA)^k$, C, A), $k = 0, 1, \dots$, and for OE($(A^HC^H)^kB$, C, A), $k = 0, 1, \dots$. The relations for the first sequence of methods have to do with Rutishauser's LR algorithm; those for the second one are based on a generalization of the Schönauer-Weiss smoothing algorithm. Relevant for practice are the cases $k = 0, 1$ for $B = CA$ and $B = (CA)^H$.

Key words. conjugate gradient method, conjugate residual method, orthogonal error method, LR algorithm

AMS(MOS) subject classification. 65F10

1. Introduction. In the last 15 years a confusing variety of generalizations of the classical conjugate gradient (CG) method [10] has been proposed. Several authors [2], [4], [6], [11], [17], [26] have since pointed out common properties of these methods and have developed schemes for classifying them. Here we want to describe the relationships among some of these methods, namely, those that belong to the class of orthogonal error (OE) methods [6]. Under the assumption that the inner products on which two such methods are based are related in a particular way, we specify relations between iterates, residuals, direction vectors, and recurrence coefficients. We will see that often these quantities for one method can be computed from those produced by another method.

The classical CG method of Hestenes and Stiefel [10] for solving a linear system $Ax = b$ with a Hermitian positive definite (hpd) matrix $A$ has the characteristic property that the error $f_n := A^{-1}b - x_n$ of the $n$th iterate $x_n$ is minimized in the $A$-norm:

$\|f_n\|_A = \min_{x - x_0 \in \mathcal{K}_n} \|A^{-1}b - x\|_A,$

where $\|y\|_A := \sqrt{y^H A y}$ is the norm induced by $A$ and $\mathcal{K}_n := \operatorname{span}\{r_0, Ar_0, \dots, A^{n-1}r_0\}$ is the $n$th Krylov subspace generated by $A$ from the initial residual $r_0 := b - Ax_0$. (As usual, $y^H$ denotes the conjugate transpose of $y$. Capital Latin letters are used for matrices, lowercase Latin letters for vectors; consequently, $o$ is the zero vector. Greek letters denote scalars.)
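To fix ideas, here is a minimal NumPy sketch of this classical iteration (an illustration under the stated hpd assumption; the function name and interface are ours, not from the paper):

```python
import numpy as np

def classical_cg(A, b, x0, maxit=None, tol=1e-12):
    """Hestenes-Stiefel CG for hpd A: the error of x_n is A-norm minimal
    over the affine space x_0 + K_n."""
    x = x0.astype(float).copy()
    r = b - A @ x                    # initial residual r_0
    p = r.copy()                     # first direction vector
    rho = r @ r
    for _ in range(maxit or len(b)):
        if np.sqrt(rho) <= tol:
            break
        Ap = A @ p
        omega = rho / (p @ Ap)       # steplength
        x += omega * p
        r -= omega * Ap
        rho, rho_old = r @ r, rho
        p = r + (rho / rho_old) * p  # next direction, conjugate to the previous ones
    return x
```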

More generally, one can define a CG method CG(B, C, A) for a nonsingular matrix $A$ based on an hpd inner product matrix $B$ and a nonsingular left preconditioner $C$ [2]. A right preconditioner can be seen to fit into this class after a variable transform [2]. If we define the errors $f_n$, the residuals $r_n$, and the preconditioned residuals $s_n$ by

(1.1) $f_n := A^{-1}b - x_n$, $\quad r_n := b - Ax_n = Af_n$, $\quad s_n := Cr_n = CAf_n$,

* Received by the editors August 29, 1991; accepted for publication (in revised form) March 30, 1992.

† Interdisciplinary Project Center for Supercomputing (IPS), ETH Zürich, ETH-Zentrum, CH-8092 Zürich, Switzerland.


respectively, and consider now the Krylov subspaces

(1.2) $\mathcal{K}_n := \operatorname{span}\{s_0, CAs_0, \dots, (CA)^{n-1}s_0\},$

then CG(B, C, A) is characterized by

(1.3) $x_n - x_0 \in \mathcal{K}_n$

and

(1.4) $\|f_n\|_B = \min_{x - x_0 \in \mathcal{K}_n} \|A^{-1}b - x\|_B,$

where now $\|y\|_B := \sqrt{y^H B y}$.

CG(A, I, A) and CG($A^2$, I, A) are the classical CG and conjugate residual (CR) method, respectively (which assume $A$ to be hpd, although the iterates of the second method are still well defined and can be found with the ORTHODIR algorithm if $A$ is Hermitian indefinite; cf. [2]). For non-Hermitian $A$, the (preconditioned) generalized conjugate residual (GCR) method CG($(CA)^HCA$, C, A) is of the above type and minimizes

$\|f_n\|_{(CA)^HCA} = \|s_n\|.$

The GMRES algorithm [18] is a particular realization of this method.

An easy exercise in differentiation shows that (1.4) is equivalent to the Galerkin condition

(1.5) $\langle y, Bf_n \rangle = 0 \quad (\forall y \in \mathcal{K}_n)$, or, briefly, $\mathcal{K}_n \perp_B f_n$,

where $\langle y, x \rangle := y^H x$ is the Euclidean inner product¹ and thus $\langle y, Bx \rangle$ is the inner product induced by $B$. (Note that $\mathcal{K}_n \perp_B f_n$ does not imply $f_n \perp_B \mathcal{K}_n$ unless $B$ is Hermitian.) The Galerkin condition (1.5) is the key to a further generalization of the conjugate gradient method, since it does not require that $B$ is Hermitian (cf. [26], [4], [11], [17], [6], [12], [1]). In fact, one could even consider the more general situation where in (1.5) $y$ varies in yet another $n$-dimensional linear space $\mathcal{L}_n$ which is unrelated to the affine space $x_0 + \mathcal{K}_n$ containing $x_n$ (cf. [20], [17]). As long as $B$ is definite, that is, $\langle y, By \rangle \neq 0$ (for all $y \neq o$), the iterates are still uniquely defined and the method has the finite termination property, that is, $f_\nu = o$ if $\nu$ denotes the dimension (and index) of the largest Krylov subspace (1.2). Following Faber and Manteuffel [6], we will call any such method generating iterates $x_n$ characterized by (1.3) and (1.5) an OE method OE(B, C, A).
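For small dense problems one can realize the defining conditions (1.3) and (1.5) directly, without any recurrences. The following sketch (our illustration, with a hypothetical helper name; it is not one of the paper's algorithms) builds a basis of $\mathcal{K}_n$ and solves the small Galerkin system:

```python
import numpy as np

def oe_iterate(B, C, A, b, x0, n):
    """n-th iterate of OE(B, C, A) obtained directly from (1.3) and (1.5):
    x_n = x0 + K z, where K spans K_n and K^H B f_n = 0."""
    CA = C @ A
    s0 = C @ (b - A @ x0)
    # Krylov basis s0, (CA)s0, ..., (CA)^{n-1}s0 (no reorthogonalization; sketch only)
    cols, v = [], s0
    for _ in range(n):
        cols.append(v)
        v = CA @ v
    K = np.column_stack(cols)
    f0 = np.linalg.solve(A, b) - x0   # error of x0 (explicit solve: illustration only)
    z = np.linalg.solve(K.conj().T @ B @ K, K.conj().T @ B @ f0)
    return x0 + K @ z
```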

Actually, even if $B$ is indefinite, a method based on (1.3) and (1.5) may work well for many problems, but breakdowns are then possible. In particular, the Lanczos biconjugate gradient (BICG) method may be understood in this way [11].

The classification of particular methods according to the triple $A$, $B$, $C$ has been discussed extensively in the literature [26], [4], [11], [5], [17], [6], [12], [2]. Moreover, the cases with "short" recurrence formulas have been analyzed and, finally, characterized completely [5], [6], [12], [23]. The case $B = I$ seems the most desirable one, but unless the right-hand side space $\mathcal{K}_n$ in (1.5) is modified, the "iterates" $x_n$ corresponding to $B = C = I$ are not computable.

¹ Following tradition, many authors define $\langle y, x \rangle := x^H y$, but we prefer the definition where the translation into matrix products does not require exchanging the arguments. For the same reason, we write $x\alpha$ if $\alpha$ is a scalar.


In this paper we want to survey and discuss connections between OE methods with "formal inner product" matrices $B$ which are in certain ways related to each other. A classical result, which is implicitly contained in [14] and [22], says that in the situation where the matrix $A$ is hpd and no preconditioner is applied (i.e., $C = I$), the CG method CG(A, I, A) and the CR method CG($A^2$, I, A) are related by the fact that the measure $d\mu(\lambda)$ with respect to which the residual polynomials of CG(A, I, A) are orthogonal is multiplied by $\lambda$ to get the measure $\lambda\,d\mu(\lambda)$ of CG($A^2$, I, A). As a consequence, the two bidiagonal matrices of recurrence coefficients for the latter method are obtained by applying one step of the qd algorithm (or, an LR transform) to the corresponding pair of matrices of the former method (cf. [14], [22]). This connection extends readily to methods with $B = A^k$ [16] and to the Lanczos BICG method for non-Hermitian matrices and its variants and generalizations [9].

In [1] we show that this relation can be extended further to the series of methods OE($B(CA)^k$, C, A), $k = 0, 1, \dots$. Besides presenting a particularly simple derivation of this connection, we give in §3 the relations among the iterates and residuals of the methods in this series. In §4 we point out that there is another way to generalize the classical connection, namely, to the series of methods OE($(A^HC^H)^kB$, C, A), $k = 0, 1, \dots$. The relevant result needed to make this connection has been found by Weiss in his thesis [25, p. 78], in which he analyzed Schönauer's smoothing method [19, p. 261]. We extend his results from real CG to complex OE methods (i.e., from real $A$, $C$ and symmetric positive definite $B$ to complex $A$, $C$ and non-Hermitian definite $B$), give a new and transparent proof, and then put the result in the right perspective. We also present an algorithm which is the reverse of Schönauer's smoothing. Additionally, in §5, we derive relations among the recurrence coefficients of the so related methods and thus give another generalization of the classical relationship. We concentrate on the recurrence matrices that are used in the ORTHOMIN, ORTHORES, and ORTHODIR implementations of these OE methods. An equally complete treatment of relationships among GMRES type implementations for various "formal inner product" matrices $B$ is left for future work. Some results of this type were given by Paige and Saunders [13], Freund [7], Freund and Nachtigal [8], Brown [3], and Vuik [24]. Here we only point out that another of Weiss's results generalizes a well-known relation between the residual norms of classical CG and CR [13, p. 626]. This relation has recently been shown by Brown [3] and Vuik [24] to hold likewise between the Arnoldi or full orthogonalization method (FOM) and the GCR or GMRES method. Vuik's result is also a special case of the one obtained here. Moreover, the relation between the residual norm in BICG and the residual norm of the least squares parameter problem solved in the quasi-minimal residual (QMR) method of Freund and Nachtigal is of the same form [8].

As a prerequisite, we summarize in §2 the relevant properties of OE methods and provide for most of them simple proofs that are, in contrast to those in [1], unrelated to a particular algorithm for computing the iterates.

2. Basic properties of OE methods. Given the nonsingular, in general complex, linear system $Ax = b$, a definite "formal inner product matrix" $B$, and a nonsingular preconditioner $C$, the $n$th iterate $x_n$ of the OE method OE(B, C, A) is implicitly defined by (1.3) and (1.5). (At this point we are not yet concerned with the computation of the iterates.) The iterate $x_n$ is uniquely determined, and the method terminates after $\nu$ steps, where $\nu$ is the maximum dimension of the Krylov subspaces $\mathcal{K}_n$ from (1.2); that is, $x_\nu = A^{-1}b$ (cf. [26], [12], [6], [1]).


The increments $x_{n+1} - x_n$ can be written as

(2.1) $x_{n+1} = x_n + p_n\omega_n$,

where $p_n$ are direction vectors and $\omega_n \in \mathbb{C}$ is the steplength. (It may occur that $x_{n+1} = x_n$; then $\omega_n = 0$, but $p_n \neq o$.) Consequently,

(2.2) $s_{n+1} = s_n - CAp_n\omega_n$.

It can be shown easily (see, e.g., [1]) that

(2.3) $p_n \in \mathcal{K}_{n+1}$, $\quad s_n - s_0 \in CA\mathcal{K}_n$ $\quad (0 \le n < \nu)$,

and that, as a consequence of (1.5),

(2.4) $\langle p_m, Bp_n \rangle = 0$ $\quad (0 \le m < n \le \nu - 1)$,

(2.5) $\langle s_m, \check{B}s_n \rangle = 0$ $\quad (0 \le m < n \le \nu - 1)$,

where $\mathcal{K}_0 := \{o\}$ and

(2.6) $\check{B} := B(CA)^{-1}$.

In other words, the direction vectors $p_n$ are $B$-semiorthogonal and the preconditioned residuals $s_n$ are $\check{B}$-semiorthogonal. These properties are used in the algorithms that allow one to actually compute $x_n$ ($n = 1, \dots, \nu$).

Since $B$ is definite, (2.4) implies that

(2.7) $\operatorname{span}\{p_0, \dots, p_n\} = \mathcal{K}_{n+1}$, $\quad n = 0, \dots, \nu - 1$,

and if we assume that also $\langle s_n, \check{B}s_n\rangle \neq 0$ ($n < \nu$), then likewise

(2.8) $\operatorname{span}\{s_0, \dots, s_n\} = \mathcal{K}_{n+1}$, $\quad n = 0, \dots, \nu - 1$.

If the last property does not hold, both the ORTHOMIN and the ORTHORES algorithm break down; in contrast, the ORTHODIR algorithm makes use only of (2.7). In the following, we assume that both (2.7) and (2.8) hold, which implies that $\omega_n \neq 0$, $n = 0, \dots, \nu - 1$ [1].

To discuss further properties of OE(B, C, A) we make use of the matrix formulations promoted in [1]. Let

$X := [x_0 \cdots x_{\nu-1}]$, $\quad S := [s_0 \cdots s_{\nu-1}]$, $\quad P := [p_0 \cdots p_{\nu-1}]$

be matrices whose columns are the iterates, preconditioned residuals, and direction vectors, respectively, and define the matrices

$\Delta_s := \operatorname{diag}[1, \omega_0, \omega_1, \dots, \omega_{\nu-2}]$, $\quad \Delta := \operatorname{diag}[\omega_0, \omega_1, \dots, \omega_{\nu-1}]$,

$$D := \begin{bmatrix} \langle p_0, Bp_0\rangle & & & \\ \langle p_1, Bp_0\rangle & \langle p_1, Bp_1\rangle & & \\ \vdots & \vdots & \ddots & \\ \langle p_{\nu-1}, Bp_0\rangle & \langle p_{\nu-1}, Bp_1\rangle & \cdots & \langle p_{\nu-1}, Bp_{\nu-1}\rangle \end{bmatrix},$$

$$\check{D} := \begin{bmatrix} \langle s_0, \check{B}s_0\rangle & & & \\ \langle s_1, \check{B}s_0\rangle & \langle s_1, \check{B}s_1\rangle & & \\ \vdots & \vdots & \ddots & \\ \langle s_{\nu-1}, \check{B}s_0\rangle & \langle s_{\nu-1}, \check{B}s_1\rangle & \cdots & \langle s_{\nu-1}, \check{B}s_{\nu-1}\rangle \end{bmatrix},$$

and the lower bidiagonal matrix

$$L := \begin{bmatrix} \omega_0^{-1} & & & \\ -\omega_0^{-1} & \omega_1^{-1} & & \\ & \ddots & \ddots & \\ & & -\omega_{\nu-2}^{-1} & \omega_{\nu-1}^{-1} \end{bmatrix}.$$


By (2.2), that is, essentially by the definition (2.1) of $p_n$, one has then

(2.9) $CAP = SL$.

In view of (2.7), (2.8), and the definitions of $\mathcal{K}_n$ and $\nu$, there clearly exist irreducible upper Hessenberg matrices $G$ and $H$ and a nonsingular upper triangular matrix $R$ such that

(2.10) $CAP = PH$,

(2.11) $CAS = SG$,

(2.12) $S = PR$.

$H$ is chosen to be unit Hessenberg, which means that the direction vectors $p_n$, which have so far been defined only up to scalar factors, now become uniquely defined. In contrast, $G$ is already fixed since the preconditioned residuals are uniquely defined. Equations (2.10) and (2.11) show that $H$ and $G$ may be thought of as representations of the restriction of the operator $CA$ to $\mathcal{K}_\nu$. Note that by looking at the $n$th column of the matrix equations (2.9)-(2.12) we can read off recurrence relations for $p_{n+1}$ and $s_{n+1}$.

Finally, we need further the vectors

$e^T := [1 \; \cdots \; 1]$, $\quad l^T := [0 \; \cdots \; 0 \;\; 1]$,

and we let $-\gamma_{\nu,\nu-1}$ denote the sum of the elements of the last column of $G$:

$\gamma_{\nu,\nu-1} := -e^TGl.$

The diagonal matrix obtained by extracting from any matrix $M$ the diagonal is denoted by $\operatorname{diag} M$.

In this notation, a number of basic facts which have been proven in the literature on orthogonal error methods [26], [12], [6], [1] can be formulated compactly and proven easily.

THEOREM 2.1. For OE(B, C, A) with definite $B$ and $\check{B}$, the following relations hold:

(a) Semiorthogonality conditions:
(a1) $P^HBP = D$,
(a2) $S^H\check{B}S = \check{D}$.

(b) Semiconjugacy conditions:
(b1) $P^HB\,CA\,P = DH$,
(b2) $S^H\check{B}\,CA\,S = \check{D}G$.

(c) Interrelations:
(c1) $H = RL$,
(c2) $G = LR$,
(c3) $P^HBS = DR$.

(d) Normalization conditions:


(d1) $\operatorname{diag} R = -\Delta_s$,
(d2) $e^TG = -\gamma_{\nu,\nu-1}l^T$,
(d3) $\operatorname{diag} L = \Delta^{-1}$,
(d4) $e^TL = \omega_{\nu-1}^{-1}l^T$.

(e) Recurrences for the iterates:
(e1) $P = -XL + x\,\omega_{\nu-1}^{-1}l^T$,
(e2) $S = -XG - x\,\gamma_{\nu,\nu-1}l^T$.

(Recall that $x = A^{-1}b$.)

Proof. Conditions (a1) and (a2) follow from the definitions of $D$ and $\check{D}$ and from the fact that these two matrices are lower triangular due to (2.4) and (2.5).

To obtain (b1) and (b2), multiply (2.10) and (2.11) by $P^HB$ and $S^H\check{B}$, respectively, and make use of (a1) and (a2).

Inserting (2.12) into (2.9) and using (2.10), we get $PRL = CAP = PH$, which implies (c1) since the columns of $P$ are linearly independent. Likewise, eliminating $P$ from (2.12) and (2.9) by multiplying (2.9) from the right-hand side by $R$ and applying (2.12) yields $CAS = SLR$. Hence, by comparison with (2.11), one gets $SG = SLR$, which implies (c2) since the columns of $S$ are also linearly independent. Finally, multiplying (a1) from the right-hand side by $R$ and applying (2.12) gives (c3).

If $p_n$ is expressed as a linear combination of the Krylov vectors $(CA)^ks_0$, $k = 0, 1, \dots, n$, then the "leading" coefficient of $(CA)^ns_0$ is 1 since $H$ in (2.10) is unit Hessenberg. From (2.2) it follows then that the analogous coefficient in the representation of $s_n$ is $-\omega_{n-1}$. Hence, if $s_n$ is expressed in terms of $p_0, \dots, p_n$ as in (2.12), the coefficient of $p_n$ is also $-\omega_{n-1}$, as claimed in (d1).

To obtain (d2) note that from $s_n - s_0 \in CA\mathcal{K}_n$ it follows that $G$ has column sums zero, except for the last column, where the sum is given by $-\gamma_{\nu,\nu-1}$. [The zero column sum condition translates into the prescribed value of 1 at 0 for the (preconditioned) residual polynomial. We refer to it also as consistency condition.] The equations (d3) and (d4) are trivial.

The errors $f_n$ are the columns of the matrix $A^{-1}be^T - X = xe^T - X$; hence

(2.13) $S = CA(xe^T - X) = Cbe^T - CAX$

(cf. (1.1)). Note also that $e^TL = \omega_{\nu-1}^{-1}l^T$. Using these relations we obtain (e1) by applying $(CA)^{-1}$ to both sides of (2.9). Finally, (e2) follows in an analogous way from (2.11). □

If $H$ and $G$ were known, a column-by-column interpretation of the relations (2.10) and (2.11), respectively, would yield recurrence relations for the vectors $p_n$ and $s_n$. However, although $H$ and $G$ are not known in advance, they can be determined column by column in parallel with the generation of the vectors $p_n$ and $s_n$ and of the matrices $D$ and $\check{D}$ by making use of Theorem 2.1(a1) and (b1), or (a2) and (b2), respectively. To find the iterates $x_n$, one applies additionally (d1) and (d2), respectively. Note that $L$ is determined by $\Delta$. These observations are the basis of the ORTHODIR and the ORTHORES algorithm for solving $Ax = b$, or rather $CAx = Cb$.

Similarly, (2.9) and (2.12) can be thought of as defining mixed recurrences for the simultaneous generation of the vectors $p_n$ and $s_n$; the coefficients in $D$ and $R$ can be found using (a1) and (c3) of Theorem 2.1, and (d1) is again applied to generate the iterates. This leads to the ORTHOMIN algorithm. See [26], [11], [12], [1] for further details.
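These structural facts are easy to check numerically. The following sketch (ours, not from the paper) runs classical CG, i.e., OE(A, I, A), with the usual CG scaling of the direction vectors rather than the unit-Hessenberg normalization used above (the Hessenberg, triangular, and bidiagonal patterns and the identities (c1), (c2) are unaffected by column scaling), and inspects the matrices of Theorem 2.1:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 6
M = rng.standard_normal((N, N))
A = M @ M.T + N * np.eye(N)         # hpd test matrix; C = I, B = A
b = rng.standard_normal(N)

x = np.zeros(N)
s = b - A @ x                        # s_0 (= r_0, since C = I)
p = s.copy()
S, P, rho = [s.copy()], [p.copy()], s @ s
for _ in range(N - 1):               # run to full termination, nu = N
    Ap = A @ p
    w = rho / (p @ Ap)
    s = s - w * Ap
    rho, rho_old = s @ s, rho
    p = s + (rho / rho_old) * p
    S.append(s.copy()); P.append(p.copy())
S, P = np.column_stack(S), np.column_stack(P)

G = np.linalg.solve(S, A @ S)        # CAS = SG
H = np.linalg.solve(P, A @ P)        # CAP = PH
R = np.linalg.solve(P, S)            # S  = PR
L = np.linalg.solve(S, A @ P)        # CAP = SL

is_hess = lambda T: np.allclose(T, np.triu(T, -1), atol=1e-8)
print(is_hess(G), is_hess(H))                                # G, H upper Hessenberg
print(np.allclose(R, np.triu(R), atol=1e-8))                 # R upper triangular
print(np.allclose(L, np.triu(np.tril(L), -1), atol=1e-8))    # L lower bidiagonal
print(np.allclose(G, L @ R), np.allclose(H, R @ L))          # Theorem 2.1 (c1), (c2)
```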

In view of $s_n = CAf_n$ (cf. (1.1)), relations involving the matrix of the error vectors, $E := [f_0 \cdots f_{\nu-1}]$, would readily follow from $S = CAE$.


The basic results compiled in Theorem 2.1 remain essentially valid even when $B$ or $\check{B}$ or both are indefinite (i.e., arbitrary nonsingular). However, OE(B, C, A) may then break down, in the sense that not all iterates are well defined, or at least some of the standard algorithms implementing it may break down. This then means that only certain principal submatrices of some of the matrices we have defined exist; see [1] for further details. If no such breakdown occurs, then the matrices exist in full and the relations stated in the theorem remain valid. In particular, the Lanczos-based methods like BICG belong to this class.

3. The first norm transformation, related to Rutishauser's LR algorithm. Here we investigate the relation between OE(B, C, A) and OE(BCA, C, A) or, more generally, among the elements of the sequence OE($B(CA)^k$, C, A), $k = 1, 2, \dots$, of OE methods. As we have mentioned in (2.4)-(2.6), the orthogonal error method OE(B, C, A) generates $B$-semiorthogonal direction vectors $p_n$ and $\check{B}$-semiorthogonal preconditioned residuals $s_n$, where $\check{B} := B(CA)^{-1}$. We still assume here that both $p_0, \dots, p_{\nu-1}$ and $s_0, \dots, s_{\nu-1}$ form a basis of the largest Krylov space $\mathcal{K}_\nu$, which is in particular true if both $B$ and $\check{B}$ are definite. The recurrence coefficients for computing these bases are stored in the Hessenberg matrices $H$ and $G$, and from Theorem 2.1(c1) and (c2) we know that these are related by

(3.1) $G = LR$, $\quad H = RL$,

so that

(3.2) $G = LHL^{-1} = R^{-1}HR$, $\quad H = RGR^{-1} = L^{-1}GL$.

In other words, to get $H$ from $G$, one has to compute an LU decomposition of $G$ and then to multiply together the factors in reverse order. This is exactly one step of Rutishauser's LR algorithm [15] applied to the Hessenberg matrix $G$. Note that in (3.2) we give two similarity transforms relating $G$ and $H$, one induced by the lower triangular $L$, the other by the upper triangular $R$.
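In code, one such LR step can be sketched as follows (our illustration; we use the standard unit-lower-triangular LU normalization rather than the paper's scaling of $L$ and $R$, which changes the factors but not the resulting similarity):

```python
import numpy as np

def lr_step(G):
    """One step of Rutishauser's LR algorithm: factor G = L R without
    pivoting (breaks down if a leading principal minor vanishes) and
    return H = R L, which is similar to G."""
    G = np.array(G, dtype=float)
    n = G.shape[0]
    L, R = np.eye(n), G.copy()
    for k in range(n - 1):
        for i in range(k + 1, n):
            if R[i, k] != 0.0:
                m = R[i, k] / R[k, k]
                L[i, k] = m
                R[i, k:] -= m * R[k, k:]
    return R @ L
```

For an upper Hessenberg $G$ only the subdiagonal entries need to be eliminated, so the inner loop does essentially one update per column.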

While $G$ is normalized to have column sums zero (except in its last column), $H$ is normalized as unit Hessenberg matrix. Therefore, if we are looking for another OE method whose preconditioned residual vectors have the same direction (in $\mathbb{C}^N$) as the direction vectors $p_n$ of OE(B, C, A) considered above, we need to renormalize these direction vectors. We call the new preconditioned residuals $\acute{s}_n$. Since they are $B$-semiorthogonal, they belong to the new orthogonal error method OE($\acute{B}$, C, A), where

(3.3) $\acute{B} := BCA$.

For the renormalization we need a diagonal matrix $\acute{\Lambda}$ such that

(3.4) $\acute{S}\acute{\Lambda} = P$

in order to have $\acute{G} = \acute{\Lambda}H\acute{\Lambda}^{-1}$ fulfill the analogue of Theorem 2.1(d2). The elements $\acute{\lambda}_k$ ($k = 0, \dots, \nu - 1$) of $\acute{\Lambda}$ can be found recursively:

(3.5) $\acute{\lambda}_k = -\sum_{j=0}^{k-1} \acute{\lambda}_j\,\eta_{j,k-1}$ $\quad (k \ge 1)$,


where $H =: [\eta_{j,k}]$. The renormalization induces a similarity transformation of $H$, i.e., $H$ is replaced by

(3.6) $\acute{G} := \acute{\Lambda}H\acute{\Lambda}^{-1}$,

which yields the matrix of recurrence coefficients for the generation of the new $B$-semiorthogonal preconditioned residuals $\acute{s}_n$ according to $CA\acute{S} = \acute{S}\acute{G}$. The direction vectors $\acute{p}_k$ of OE($\acute{B}$, C, A) are then $\acute{B}$-semiorthogonal, and the Hessenberg matrix $\acute{H}$ containing the recurrence coefficients for their generation according to $CA\acute{P} = \acute{P}\acute{H}$ can be obtained by another step of the LR algorithm applied to $\acute{G}$, if this matrix happens to have an LU decomposition (without pivoting):

(3.7) $\acute{G} = \acute{L}\acute{R}$, $\quad \acute{H} := \acute{R}\acute{L}$.

To obtain the same normalization for $\acute{L}$ and $\acute{R}$ as for $L$ and $R$, we need to choose the (0,0)-element² of $\acute{R}$ equal to $-1$ and then, for $n = 1, \dots, \nu - 1$, the $(n,n)$-element of $\acute{R}$ equal to minus the reciprocal of the $(n-1,n-1)$-element of $\acute{L}$. Then, as in Theorem 2.1(d), $\operatorname{diag}\acute{R} = -\acute{\Delta}_s$ and $\operatorname{diag}\acute{L} = \acute{\Delta}^{-1}$, where $\acute{\Delta}_s$ and $\acute{\Delta}$ are related in the same way as $\Delta_s$ and $\Delta$.

From (3.2), (3.6), and (3.7), it follows in particular that

(3.8) $\acute{G} = \acute{\Lambda}RGR^{-1}\acute{\Lambda}^{-1} = \acute{\Lambda}L^{-1}GL\acute{\Lambda}^{-1}$,

(3.9) $\acute{H} = \acute{R}\acute{\Lambda}H\acute{\Lambda}^{-1}\acute{R}^{-1} = \acute{L}^{-1}\acute{\Lambda}H\acute{\Lambda}^{-1}\acute{L}$,

and

(3.10) $\acute{L}\acute{R}\acute{\Lambda} = \acute{\Lambda}RL$.

Some of these results are described in more detail in [1]. They are summarized as the following theorem.

THEOREM 3.1. Let $\acute{B} = BCA$ and assume that the principal minors of the Hessenberg matrices $G$ and $\acute{G}$ of OE(B, C, A) and OE($\acute{B}$, C, A), respectively, are all nonzero. Then the matrices $G, H, R, L$ and $\acute{G}, \acute{H}, \acute{R}, \acute{L}$ of the two methods are related by (3.1), (3.2), and (3.5)-(3.10).

By repeating this process, one could obviously generate the recurrence matrices for the ORTHODIR, ORTHOMIN, and ORTHORES algorithms for a whole series of OE methods, namely for OE($B(CA)^k$, C, A), $k = 0, 1, \dots$, at least as long as all the required LU decompositions exist.

For the classical CG method of Hestenes and Stiefel [10], where $B = A$ is hpd, this change of the norm leads to the classical CR method [21], where $\acute{B} = A^2$, and a further LR step leads to methods minimizing the $A^m$-norm of the error, which are mentioned by Rutishauser [16]. In the classical case, the Hessenberg matrices reduce to tridiagonal matrices; the LR algorithm is then identical with Rutishauser's qd algorithm [14].

In the more general situation where $B = (CA)^H$ and where $CA$ is non-Hermitian, OE(B, C, A) is an OE method without an error minimization property; OE($\acute{B}$, C, A) = OE($(CA)^HCA$, C, A) is the GCR or generalized minimal residual (GMRES) method.

² Since the columns of $P$, $S$, and $X$ are indexed from 0 to $\nu - 1$, it is natural to call the element in the upper left corner of matrices like $G$, $H$, and $R$ the (0,0)-element of these matrices. More generally, the $(m,n)$-element is then the one in the $(m+1)$th row and the $(n+1)$th column.


We have seen above how the recurrence relations for direction vectors, residuals, and iterates of OE(B, C, A) are related to those of OE(BCA, C, A). However, it would be as interesting to have direct relations between residuals, errors, and iterates of the two methods.

First, recall from (2.12) and (3.4) that $S = PR$ and $\acute{S}\acute{\Lambda} = P$. Therefore,

(3.11) $S = \acute{S}\acute{\Lambda}R$.

In view of Theorem 2.1(e) we see further that the matrix of iterates of the second method, $\acute{X} := [\acute{x}_0 \cdots \acute{x}_{\nu-1}]$, satisfies

$\acute{S}\acute{\Lambda} = (-\acute{X}\acute{G} - A^{-1}b\,\acute{\gamma}_{\nu,\nu-1}l^T)\acute{\Lambda} = P = -XL + A^{-1}b\,\omega_{\nu-1}^{-1}l^T.$

Consequently,

$XL = \acute{X}\acute{G}\acute{\Lambda} + A^{-1}b\,(\acute{\gamma}_{\nu,\nu-1}\acute{\lambda}_{\nu-1} + \omega_{\nu-1}^{-1})\,l^T.$

Now, $\acute{\gamma}_{\nu,\nu-1}\acute{\lambda}_{\nu-1} = -e^T\acute{G}l\,\acute{\lambda}_{\nu-1} = -e^T\acute{\Lambda}H\acute{\Lambda}^{-1}l\,\acute{\lambda}_{\nu-1} = -e^T\acute{\Lambda}Hl$, and $\omega_{\nu-1}^{-1} = l^T\Delta^{-1}l = l^TLl$. Inserting $\acute{G} = \acute{\Lambda}H\acute{\Lambda}^{-1}$ and $H = RL$ yields

$XL = \acute{X}\acute{\Lambda}RL - A^{-1}b\,(e^T\acute{\Lambda}RL - l^TL)\,l\,l^T.$

From (3.11) and the normalizations $s_n - s_0 \in CA\mathcal{K}_n$ and $\acute{s}_n - s_0 \in CA\mathcal{K}_n$, it is seen that $\acute{\Lambda}R$ has column sums 1, so that $e^T\acute{\Lambda}R = e^T$. Therefore, $e^T\acute{\Lambda}RLl = e^TLl = l^TLl$, and thus the whole rank-one modification in the above formula vanishes, leaving us simply with

(3.12) $X = \acute{X}\acute{\Lambda}R$.

Of course, conversely, (3.11) follows readily from (3.12) and the unit column sums of $\acute{\Lambda}R$.

THEOREM 3.2. The iterates and the residuals of OE(B, C, A) and OE($\acute{B}$, C, A), where $\acute{B} := BCA$, are related by (3.12) and (3.11). The diagonal matrix $\acute{\Lambda}$ defined in (3.5) is characterized by the property of normalizing the column sums of the triangular matrix $\acute{\Lambda}R$ to 1.

The result of Theorem 3.2 would be of practical interest if the computation of the iterates $x_n$ were cheaper or more stable than that of $\acute{x}_n$, or vice versa. However, regarding costs there is no hope for big savings: results of Joubert and Young [12], enhanced in [1], say that the lengths of the recurrence relations in question are the same. Moreover, in general, $\acute{\Lambda}R$ is a full triangular matrix; hence relation (3.12) requires that all iterates of OE(B, C, A) are stored to compute those of OE($\acute{B}$, C, A), and vice versa. In other words, it is in general not cheap to compute one type of iterates from the other. Of course, the situation is different if $R$ is upper bidiagonal and thus $G$ and $H$ are tridiagonal. But then $CA$ is B-normal(1) [5], [6], i.e., $(CA)^HB = cBCA + dB$ for some constants $c$ and $d$. This means that we are close to the situation described next in §4, where $B$ is replaced by $(CA)^HB$ instead of $\acute{B} = BCA$. Regarding stability, there are certainly cases where one recurrence is more stable than the other because the underlying formal inner product is a true inner product, i.e., the relevant inner product matrix is hpd. However, if $B$ is hpd, but we aim at $\acute{x}_n$, then the ORTHORES algorithm allows us also to make use of $B$ instead of $\acute{B}$; see [1].


4. The second norm transformation, related to Schönauer-Weiss smoothing. Let us now turn to the connection between OE(B, C, A) and OE($\grave{B}$, C, A), where³ $\grave{B} := (CA)^HB$. More generally, we consider the sequence OE($(A^HC^H)^kB$, C, A), $k = 0, 1, 2, \dots$, of OE methods. The following extrapolation process, which was introduced by Schönauer [19] with a different objective, allows us to compute easily the iterates $\grave{x}_n$ of OE($\grave{B}$, C, A) from those of OE(B, C, A). Note that $\grave{B}$ is definite if and only if $\check{B}$ is definite, since $\langle y, \grave{B}y \rangle = \langle CAy, B(CA)^{-1}CAy \rangle = \langle z, \check{B}z \rangle$, where $z := CAy$.

THEOREM 4.1. Assume that $B$ and $\grave{B} := (CA)^HB$ are definite, and let $\check{B} := B(CA)^{-1}$ as before. Then the iterates $\grave{x}_n$ and the preconditioned residuals $\grave{s}_n$ of OE($\grave{B}$, C, A) (started with $\grave{x}_0 := x_0$) can be computed from the iterates $x_n$ and the preconditioned residuals $s_n$ of OE(B, C, A) by the recurrences

(4.1) $\grave{x}_n = \grave{x}_{n-1}(1 - \omega_n) + x_n\omega_n$, $\quad \grave{s}_n = \grave{s}_{n-1}(1 - \omega_n) + s_n\omega_n$,

$n = 1, \dots, \nu$, in which $\omega_n$ has to be chosen such that

(4.2) $\grave{s}_{n-1} - s_n \;\perp_{\check{B}}\; \grave{s}_{n-1}(1 - \omega_n) + s_n\omega_n$,

namely, according to

(4.3) $\omega_n := -\dfrac{\langle s_n - \grave{s}_{n-1}, \check{B}\grave{s}_{n-1}\rangle}{\|s_n - \grave{s}_{n-1}\|_{\check{B}}^2}$.

In (4.3), $\|y\|_{\check{B}}^2$ is just a mnemonic abbreviation for $\langle y, \check{B}y\rangle$. Its value may be complex if $\check{B}$ is not hpd, and thus there may be no preference for any of the two values of the square root $\|y\|_{\check{B}}$.

Note that, in contrast to (3.12), where $\acute{\Lambda}R$ is in general a full matrix, (4.1) is a short recurrence; only one previous iterate $\grave{x}_{n-1}$ needs to be stored.
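In code, the recurrences (4.1)-(4.3) take the following form (a sketch under our naming; `Bc` stands for the matrix $\check{B}$, and the inner product follows this paper's convention $\langle y, x\rangle = y^Hx$):

```python
import numpy as np

def schoenauer_weiss_smoothing(xs, ss, Bc):
    """Given the iterates x_n and preconditioned residuals s_n of OE(B, C, A),
    return those of OE((CA)^H B, C, A) via the recurrences (4.1)-(4.3).
    Bc is the matrix B-check = B (CA)^{-1}."""
    ip = lambda y, x: y.conj() @ (Bc @ x)       # <y, Bc x>
    xg, sg = xs[0].copy(), ss[0].copy()         # grave quantities, started at x_0
    out_x, out_s = [xg.copy()], [sg.copy()]
    for x, s in zip(xs[1:], ss[1:]):
        d = s - sg                              # s_n - s-grave_{n-1}
        w = -ip(d, sg) / ip(d, d)               # (4.3); may be complex if Bc is not hpd
        xg = xg * (1 - w) + x * w               # (4.1)
        sg = sg * (1 - w) + s * w
        out_x.append(xg.copy()); out_s.append(sg.copy())
    return out_x, out_s
```

With $B = CA$ (so that `Bc` is the identity) this turns the iterates of OE(CA, C, A) into GCR/GMRES-type ones, the most common use of the smoothing.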

Proof. $x_n$ is characterized by $x_n - x_0 \in \mathcal{K}_n$ and $\mathcal{K}_n \perp_B f_n$, i.e., $\mathcal{K}_n \perp_{\check{B}} s_n$. Likewise, $\grave{x}_n$ is characterized by $\grave{x}_n - x_0 \in \mathcal{K}_n$ and $\mathcal{K}_n \perp_{\grave{B}} \grave{f}_n$, i.e., $\mathcal{K}_n \perp_{(CA)^H\check{B}} \grave{s}_n$. We will show that the properties of $x_n$ and the relations (4.1) and (4.2), if considered as definition of $\grave{x}_n$, imply these characteristic properties for $\grave{x}_n$. Making the induction assumptions $\grave{s}_{n-1} - s_0 \in CA\mathcal{K}_{n-1}$ and $\mathcal{K}_{n-1} \perp_{(CA)^H\check{B}} \grave{s}_{n-1}$, which are obviously satisfied for $n = 1$ since $\mathcal{K}_0 = \{o\}$, we conclude from (4.1), $s_n - s_0 \in CA\mathcal{K}_n$, and $\mathcal{K}_n \perp_{\check{B}} s_n$ that

(4.4) $\grave{s}_n - s_0 \in CA\mathcal{K}_n$,

(4.5) $CA\mathcal{K}_{n-1} \subseteq CA\mathcal{K}_n \cap \mathcal{K}_n \perp_{\check{B}} \grave{s}_n$.

Moreover, from (4.1) and (4.2), one gets

(4.6) $\grave{s}_{n-1} - s_n \perp_{\check{B}} \grave{s}_n$.

Here,

(4.7) $\grave{s}_{n-1} - s_n = (\grave{s}_{n-1} - s_0) - (s_n - s_0) \in CA\mathcal{K}_n$,

³ The notation $\grave{B}$ should not be confused with its ancient meaning of denoting a vector. It will become clear in (6.2) below why we prefer here $\grave{B}$, $\grave{s}_n$, etc., to, say, $\hat{B}$, $\hat{s}_n$.


and due to $\grave{s}_{n-1} - s_0 \in CA\mathcal{K}_{n-1}$ and $s_n - s_0 \notin CA\mathcal{K}_{n-1}$, which follows from (2.8), one has actually $s_n \neq \grave{s}_{n-1}$, and hence $\grave{s}_{n-1} - s_n \notin CA\mathcal{K}_{n-1}$. Therefore, adding $\grave{s}_{n-1} - s_n$ to a basis of $CA\mathcal{K}_{n-1}$ yields a basis for $CA\mathcal{K}_n$, and thus (4.5) and (4.6) imply

(4.8) $CA\mathcal{K}_n \perp_{\check{B}} \grave{s}_n$,

which is equivalent to

$\mathcal{K}_n \perp_{(CA)^H\check{B}} \grave{s}_n$, or $\mathcal{K}_n \perp_{\grave{B}} \grave{f}_n$;

while (4.4) implies $\grave{x}_n - x_0 \in \mathcal{K}_n$. □

There is also a matrix representation for the relationship between OE(B, C, A) and OE($\grave{B}$, C, A). Let us introduce the coefficients

(4.9) $\varphi_n := \dfrac{\omega_n - 1}{\omega_n}$, $\quad n = 1, \dots, \nu - 1$,

and the matrix

(4.10) $\Phi := \begin{bmatrix} 1 & \varphi_1 & & \\ & 1 - \varphi_1 & \varphi_2 & \\ & & 1 - \varphi_2 & \ddots \\ & & & \ddots \end{bmatrix}.$

Since $1 - \varphi_n = 1/\omega_n$, (4.1) translates into

(4.11) $x_n = \grave{x}_n(1 - \varphi_n) + \grave{x}_{n-1}\varphi_n$, $\quad s_n = \grave{s}_n(1 - \varphi_n) + \grave{s}_{n-1}\varphi_n$,

or

(4.12) $X = \grave{X}\Phi$, $\quad S = \grave{S}\Phi$,

which will be used below and in the next section.

If considered as definition for $\grave{x}_n$, (4.1) and (4.2) generalize a procedure which was introduced by Schönauer [19, p. 261] for the case where $B = CA$ is real and where $\grave{B} = (CA)^H(CA)$. He considered the procedure as a smoothing algorithm which allowed him to generate from the iterates $x_n$ of OE(CA, C, A) new approximations $\grave{x}_n$ whose residuals converged smoother. His student Weiss [25] investigated this smoothing algorithm both experimentally and theoretically. In his analysis he assumes that $(CA)^{-H}\grave{B}(CA)^{-1} = \check{B}$ is real symmetric positive definite and that $s_n \perp_{\check{B}} s_j$ ($j < n$) (i.e., (2.8) holds). Under these assumptions, his main result [25, p. 78] is equivalent to Theorem 4.1. In his experimental work, he obtained good results even in cases where $\check{B}$ is indefinite by using in (4.2) a matrix different from $\check{B}$ (so that Theorem 4.1 no longer holds). Our Theorem 4.1 is readily extended even to situations where $B$ and/or $\check{B}$ is indefinite. All we really need in our proof is that (2.7) and (2.8) hold for OE(B, C, A) and that the denominators in (4.3) do not vanish.

What one should conclude from Theorem 4.1 is that Schönauer's smoothing algorithm yields results that one could as well obtain by directly applying OE($\grave{B}$, C, A). Hence, it could be of practical interest in situations where the iterates of OE(B, C, A) can be computed directly more easily than those of OE($\grave{B}$, C, A), but the latter ones


are those desired. However, we have no indication that examples with much shorter recurrences for $x_n$ than for $\grave{x}_n$ exist at all. In fact, our formulas (5.1) and (5.4) below indicate that such a situation is unlikely. Increased stability could also be an aim, but in the most common application, where $\grave{B} = (CA)^HCA$ belongs to the GCR or GMRES method, it is the latter that is more stable than OE(B, C, A) = OE(CA, C, A). Hence, the main advantage of Schönauer's smoothing process is that it can provide us with two related sequences of iterates, defined by different but related Galerkin conditions, roughly at the cost of one.

The residuals $s_n$ of GCR (or GMRES) converge optimally fast (according to the definition of this method), but this by no means implies that the errors also converge optimally fast to zero. For example, in the case of an ill-conditioned real symmetric matrix $A$, Paige and Saunders [13] claim their SYMMLQ implementation of OE(A, I, A) to give more accurate results than their MINRES implementation of OE($A^2$, I, A). Hence, there may well be an interest in the reverse of Schönauer's transformation (4.1)-(4.3). Clearly, it must have the form (4.11), but the question is how to determine $\varphi_n$. Obviously, (4.3) cannot be used to find $\omega_n$, since $s_n$ is unknown.

THEOREM 4.2. Under the assumptions of Theorem 4.1, the iterates $x_n$ and the preconditioned residuals $s_n$ of OE(B, C, A) can be computed from the iterates $\grave{x}_n$ and the preconditioned residuals $\grave{s}_n$ of OE($\grave{B}$, C, A) (started with $\grave{x}_0 = x_0$) by the formulas (4.11), where $\varphi_n$ is chosen such that

(4.13) $s_0 \;\perp_{\check{B}}\; \grave{s}_n(1 - \varphi_n) + \grave{s}_{n-1}\varphi_n$,

namely, according to

(4.14) $\varphi_n := \dfrac{\langle s_0, \check{B}\grave{s}_n\rangle}{\langle s_0, \check{B}(\grave{s}_n - \grave{s}_{n-1})\rangle}$.

Proof. Since $\mathcal{K}_n \perp_{(CA)^H\check{B}} \grave{s}_n$ and $\grave{s}_n - s_0 \in CA\mathcal{K}_n$, we have for arbitrary $\varphi_n$

$CA\mathcal{K}_{n-1} \;\perp_{\check{B}}\; \grave{s}_n(1 - \varphi_n) + \grave{s}_{n-1}\varphi_n \;\in\; s_0 + CA\mathcal{K}_n.$

By choosing $\varphi_n$ to satisfy (4.13) one attains that $\mathcal{K}_n \perp_{\check{B}} \grave{s}_n(1 - \varphi_n) + \grave{s}_{n-1}\varphi_n$, which means that this linear combination equals $s_n$. Finally, the formula for $x_n$ is clearly consistent with the one for $s_n$. □
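A sketch of this reverse transformation, with $\varphi_n$ computed as in (4.14) (same conventions and caveats as the smoothing sketch after Theorem 4.1; the function name is ours):

```python
import numpy as np

def unsmooth(xgs, sgs, Bc):
    """Recover the iterates and preconditioned residuals of OE(B, C, A) from
    those of OE((CA)^H B, C, A) via (4.11) and (4.14)."""
    ip = lambda y, x: y.conj() @ (Bc @ x)
    s0 = sgs[0]
    out_x, out_s = [xgs[0].copy()], [sgs[0].copy()]
    for n in range(1, len(sgs)):
        phi = ip(s0, sgs[n]) / ip(s0, sgs[n] - sgs[n - 1])   # (4.14)
        out_x.append(xgs[n] * (1 - phi) + xgs[n - 1] * phi)  # (4.11)
        out_s.append(sgs[n] * (1 - phi) + sgs[n - 1] * phi)
    return out_x, out_s
```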

Let us next generalize two further results of Weiss [25, Lemmas 4.4 and 4.5] to the case of a non-Hermitian $\check{B}$. As pointed out by Weiss, they are based on (4.1)-(4.3) only and make no use of $s_n$ being generated by OE(B, C, A).

LEMMA 4.3. Under the assumptions of Theorem 4.1 the following relations hold:

(4.15) $\|\grave{s}_n\|_{\check{B}}^2 = \|\grave{s}_{n-1}\|_{\check{B}}^2 - \dfrac{\langle s_n - \grave{s}_{n-1}, \check{B}\grave{s}_{n-1}\rangle \, \langle \grave{s}_{n-1}, \check{B}(s_n - \grave{s}_{n-1})\rangle}{\|s_n - \grave{s}_{n-1}\|_{\check{B}}^2}$

and

(4.16) $\grave{s}_n = \left[I - \dfrac{(s_n - \grave{s}_{n-1})(s_n - \grave{s}_{n-1})^H\check{B}}{\|s_n - \grave{s}_{n-1}\|_{\check{B}}^2}\right]\grave{s}_{n-1}$.

Proof. By (4.1) and (4.2),

(4.17) $\|\grave{s}_n\|_{\check{B}}^2 = \langle \grave{s}_{n-1}, \check{B}\grave{s}_n\rangle = \|\grave{s}_{n-1}\|_{\check{B}}^2(1 - \omega_n) + \langle \grave{s}_{n-1}, \check{B}s_n\rangle\,\omega_n,$


which by inserting (4.3) leads to (4.15). Similarly, by inserting (4.3) into (4.1), written again as $\grave{s}_n = \grave{s}_{n-1} + (s_n - \grave{s}_{n-1})\omega_n$, one gets (4.16). □

Although theoretically (4.15) allows us to "update" $\|\grave{s}_n\|_{\check{B}}^2$, the formula is too complicated to be useful in practice. However, if $\check{B}$ is Hermitian, it can be simplified, as we will show next. In contrast, (4.16) is a direct translation of the corresponding Lemma 4.5 of Weiss [25] for Hermitian $\check{B}$.

In the Hermitian case, (4.15) first reduces to

(4.18) $\dfrac{1}{\|\grave{s}_n\|_{\check{B}}^2} = \dfrac{1}{\|s_n\|_{\check{B}}^2} + \dfrac{1}{\|\grave{s}_{n-1}\|_{\check{B}}^2}$ $\quad$ (if $\check{B} = \check{B}^H$).

Second, we then find a simple formula for $\omega_n$. Note that

$\|s_n - \grave{s}_{n-1}\|_{\check{B}}^2 = \|s_n\|_{\check{B}}^2 + \|\grave{s}_{n-1}\|_{\check{B}}^2 - \langle s_n, \check{B}\grave{s}_{n-1}\rangle - \langle \grave{s}_{n-1}, \check{B}s_n\rangle.$

Here, the last term vanishes since $\grave{s}_{n-1} \in \mathcal{K}_n \perp_{\check{B}} s_n$, and since $\check{B}$ is Hermitian, the second-to-last term vanishes for the same reason:

(4.19) $\|s_n - \grave{s}_{n-1}\|_{\check{B}}^2 = \|s_n\|_{\check{B}}^2 + \|\grave{s}_{n-1}\|_{\check{B}}^2$ $\quad$ (if $\check{B} = \check{B}^H$).

Moreover, by the same argument, $\langle s_n - \grave{s}_{n-1}, \check{B}\grave{s}_{n-1}\rangle = -\|\grave{s}_{n-1}\|_{\check{B}}^2 \in \mathbb{R}$ if $\check{B}$ is Hermitian, and thus

(4.20) $\omega_n = \dfrac{\|\grave{s}_{n-1}\|_{\check{B}}^2}{\|s_n\|_{\check{B}}^2 + \|\grave{s}_{n-1}\|_{\check{B}}^2}$.

Consequently, by (4.18),

(4.21) $\omega_n = \dfrac{\|\grave{s}_n\|_{\check{B}}^2}{\|s_n\|_{\check{B}}^2} \in (0, 1]$ $\quad$ (if $\check{B} = \check{B}^H$).

THEOREM 4.4. In addition to the assumptions of Theorem 4.1, suppose that $\check{B}$ is Hermitian. Then (4.18)-(4.21) hold. In particular, if we let

(4.22) $\tau_n := \arctan\dfrac{\|s_n\|_{\check{B}}}{\|\grave{s}_{n-1}\|_{\check{B}}}$,

then, for $n = 1, \dots, \nu$,

(4.23) $\|\grave{f}_n\|_{\grave{B}} = \|\grave{s}_n\|_{\check{B}} = \|\grave{s}_{n-1}\|_{\check{B}}\sin\tau_n = \|\grave{f}_{n-1}\|_{\grave{B}}\sin\tau_n$.

Consequently,

$\|\grave{f}_n\|_{\grave{B}} = \|\grave{s}_n\|_{\check{B}} = \|s_0\|_{\check{B}}\prod_{k=1}^{n}\sin\tau_k = \|f_0\|_{\grave{B}}\prod_{k=1}^{n}\sin\tau_k$,

(4.24) $\|f_n\|_{\grave{B}} = \|s_n\|_{\check{B}} = \|s_0\|_{\check{B}}\tan\tau_n\prod_{k=1}^{n-1}\sin\tau_k = \|f_0\|_{\grave{B}}\tan\tau_n\prod_{k=1}^{n-1}\sin\tau_k$.


Proof. The definition of $\tau_n$ implies that

$\sin^2\tau_n = \dfrac{\|s_n\|_{\check{B}}^2}{\|s_n\|_{\check{B}}^2 + \|\grave{s}_{n-1}\|_{\check{B}}^2}, \qquad \cos^2\tau_n = \dfrac{\|\grave{s}_{n-1}\|_{\check{B}}^2}{\|s_n\|_{\check{B}}^2 + \|\grave{s}_{n-1}\|_{\check{B}}^2}.$

The equality signs in the middle of (4.23) and (4.24) follow then from (4.21) by inserting $\tau_n$. The other equalities follow from $\|\grave{f}_n\|_{\grave{B}}^2 = \|\grave{s}_n\|_{\check{B}}^2$ and the analogous relation $\|f_n\|_{\grave{B}}^2 = \|s_n\|_{\check{B}}^2$. Note that $\|\grave{s}_n\|_{\check{B}}^2 > 0$ and $\|s_n\|_{\check{B}}^2 > 0$ since $\check{B}$ is hpd; hence the nonnegative square roots $\|\grave{s}_n\|_{\check{B}}$ and $\|s_n\|_{\check{B}}$ are here well defined. □

Formula (4.21) is also due to Weiss [25, p. 77]. However, for the case where $B = A$, $C = I$, and $A$ is real symmetric, the residual norm relations (4.23) and (4.24) have been given before by Paige and Saunders [13] when introducing their SYMMLQ algorithm for OE(A, I, A) and their MINRES algorithm for OE($A^2$, I, A). Recently, Vuik [24] has established the same relations for the residuals of the Arnoldi method (or, FOM) and of GMRES, which generalize the Paige-Saunders approach to nonsymmetric systems. Brown [3] has found equivalent, but slightly more complicated formulas. Both in the MINRES and the GMRES algorithm, $\tau_n$ is actually the angle of a Jacobi rotation used to update a QR decomposition. Note that once $\|\grave{s}_{n-1}\|_{\check{B}}$ and $\|s_n\|_{\check{B}}$ are known, (4.22) and (4.23) allow us to compute $\|\grave{s}_n\|_{\check{B}}$.
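The Hermitian-case relations are easy to verify numerically. The following sketch (ours) takes $\check{B} = I$, enforces the orthogonality $\grave{s}_{n-1} \perp_{\check{B}} s_n$ that holds in the smoothing context, and checks (4.18) and (4.23) for one step:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 8
sg_prev = rng.standard_normal(N)                     # plays s-grave_{n-1}
s = rng.standard_normal(N)
s -= (sg_prev @ s) / (sg_prev @ sg_prev) * sg_prev   # enforce s_n orthogonal to s-grave_{n-1}

a2, b2 = sg_prev @ sg_prev, s @ s                    # squared norms (B-check = I)
w = a2 / (a2 + b2)                                   # (4.20): real and in (0, 1]
sg = sg_prev * (1 - w) + s * w                       # (4.1)

tau = np.arctan(np.sqrt(b2 / a2))                    # (4.22)
print(np.isclose(1 / (sg @ sg), 1 / a2 + 1 / b2))                  # (4.18): True
print(np.isclose(np.linalg.norm(sg), np.sqrt(a2) * np.sin(tau)))   # (4.23): True
```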

5. Relating the recurrence matrices in the second norm transformation. Under the assumptions of Theorem 4.1, the relations (2.9)-(2.12) and the further properties summarized in Theorem 2.1 hold not only for OE(B, C, A) but also for OE($\grave{B}$, C, A). In particular, there are matrices $\grave{P} = [\grave{p}_0 \cdots \grave{p}_{\nu-1}]$, $\grave{G}$, $\grave{H}$, $\grave{R}$, and $\grave{L}$ such that the recurrences (2.9)-(2.12) and those of Theorem 2.1(e) hold. Naturally, one must raise the question whether, in analogy to Theorem 3.1, there are simple relations between $G$ and $\grave{G}$, as well as between $H$ and $\grave{H}$.

Using (4.12), the answer to this question is not so difficult. First, (2.11) yields $CA\grave{S}\Phi = \grave{S}\Phi G$; hence

(5.1) $\grave{G} = \Phi G\Phi^{-1}$.

To get $\grave{L}$ and $\grave{R}$ satisfying $\grave{G} = \grave{L}\grave{R}$, we could LU decompose $\grave{G}$. In order to normalize $\grave{L}$ and $\grave{R}$ appropriately, we would need to choose, as in the LU decomposition of $\acute{G}$, the (0,0)-element of $\grave{R}$ equal to $-1$ and, for $n = 1, \dots, \nu - 1$, the $(n,n)$-element of $\grave{R}$ equal to minus the reciprocal of the $(n-1,n-1)$-element of $\grave{L}$.

However, we would like to find the direct connections between $L$ and $\grave{L}$, and between $R$ and $\grave{R}$. Therefore, let us first consider $CAP = SL = \grave{S}\Phi L$, which must translate into $CA\grave{P} = \grave{S}\grave{L}$ with a lower bidiagonal $\grave{L}$. Since $\Phi$ and $L$ are upper and lower bidiagonal, respectively, $\Phi L$ is tridiagonal. In view of (2.7), which holds for both $P$ and $\grave{P}$, these two matrices are necessarily related by a nonsingular upper triangular matrix $\Psi$:

(5.2) $P = \grave{P}\Psi$.

Consequently,

$\grave{S}\grave{L}\Psi = CA\grave{P}\Psi = CAP = SL = \grave{S}\Phi L.$


Under our assumption of a definite $\grave{B}$, the columns of $\grave{S}$ are linearly independent, and thus

(5.3) $\grave{L}\Psi = \Phi L$.

This means that $\grave{L}\Psi$ is an LU decomposition of the tridiagonal matrix $\Phi L$ and, conversely, $\Phi L$ is a UL decomposition of the same matrix. In particular, $\Psi$ is upper bidiagonal, i.e., $p_n$ only depends on $\grave{p}_n$ and $\grave{p}_{n-1}$, which is no big surprise since the same holds for $s_n$ and $\grave{s}_n$, $\grave{s}_{n-1}$ (cf. (4.12)).

In view of (2.12) and Theorem 2.1(d1), which shows that the (0,0)-element of $R$ is $-1$, we have $s_0 = -p_0$ in our normalization. Accepting the same normalization for the grave-accented quantities and making use of the assumption $\grave{x}_0 = x_0$, we conclude that $\grave{p}_0 = -\grave{s}_0 = -s_0 = p_0$. In other words, the (0,0)-element of $\Psi$ is 1. This fixes the (0,0)-element of $\grave{L}$ in the LU decomposition $\grave{L}\Psi$ of $\Phi L$, and the requirement of zero column sums in $\grave{L}$ determines the decomposition then uniquely.

The relationships sought now follow readily: $CAP = PH$ turns into $CA\grave{P}\Psi = \grave{P}\Psi H$; hence

(5.4) $\grave{H} = \Psi H\Psi^{-1}$.

Finally, from $S = PR$ we get $\grave{S}\Phi = \grave{P}\Psi R$, which by the definition of $\grave{R}$ must translate into $\grave{S} = \grave{P}\grave{R}$, that is,

(5.5) $\grave{R}\Phi = \Psi R$.

At first sight, this is a nice analogue of (5.3). However, here all matrices are upper triangular, and $\Phi$ and $\Psi$ are additionally bidiagonal.

THEOREM 5.1. Under the assumptions of Theorem 4.1, let $H$, $G$, $R$, $L$ be the matrices defined by (2.9)-(2.12) for OE(B, C, A) and let $\grave{H}$, $\grave{G}$, $\grave{R}$, $\grave{L}$ be the corresponding matrices of OE($\grave{B}$, C, A). Then these matrices are related by (5.1) and (5.3)-(5.5). The upper bidiagonal matrix $\Phi$ is given by (4.10), and, according to (4.12), it transforms the iterates and the residuals of the two methods; by (5.2) the upper bidiagonal matrix $\Psi$ transforms the direction vectors. $\Phi$ and $\Psi$ are related by (5.3), which means that $\grave{L}\Psi$ and $\Phi L$ are an LU and a UL decomposition, respectively, of the same tridiagonal matrix.

6. Conclusions. Starting from the known relation between the recurrence coefficients of the classical CG method OE(A, I, A) and its CR variant OE($A^2$, I, A), we have discussed two ways of generalizing this relation to OE methods OE(B, C, A), which, in particular, are also applicable to non-Hermitian matrices $A$. On the one hand, we have replaced $B$ by $\acute{B} = BCA$ or, generally, by $B(CA)^k$, $k = 0, 1, \dots$. The same transformation maps the "formal inner product" matrix $\check{B} = B(CA)^{-1}$, with respect to which the (preconditioned) residual vectors $s_n$ are formally orthogonal, into the "formal inner product" matrix $B = \check{B}CA$, with respect to which the direction vectors $p_n$ are formally orthogonal. The change of norm from $B$ to $\acute{B} = BCA$ thus means that the old direction vectors become the new residual vectors (after a simple renormalization). The new matrices of recurrence coefficients emerge from the old ones by applying one step of the LR algorithm.

On the other hand, we have replaced $B$ by $\grave{B} = (CA)^HB$ or, generally, by $[(CA)^H]^kB$, $k = 0, 1, \dots$. This second transformation generalizes a "smoothing process" introduced by Schönauer [19, p. 261] and analyzed by Weiss [25]. For both


transformations we have specified the relationships among the recurrence matrices, the iterates, and the residuals. (From those for the residuals, the ones for the errors follow trivially.)

Combining the two transformations, we can theoretically generate an infinite doubly indexed array of methods,

(6.1) OE($[(CA)^H]^kB(CA)^l$, C, A) $\quad (k, l = 0, 1, \dots)$,

whose "formal inner product" matrices we can display as

(6.2)

$\check{B} = B(CA)^{-1} \;\rightarrow\; (CA)^HB(CA)^{-1} \;\rightarrow\; [(CA)^H]^2B(CA)^{-1} \;\rightarrow\; \cdots$

$B \;\rightarrow\; (CA)^HB \;\rightarrow\; [(CA)^H]^2B \;\rightarrow\; \cdots$

$BCA \;\rightarrow\; (CA)^HBCA \;\rightarrow\; [(CA)^H]^2BCA \;\rightarrow\; \cdots$

$B(CA)^2 \;\rightarrow\; (CA)^HB(CA)^2 \;\rightarrow\; [(CA)^H]^2B(CA)^2 \;\rightarrow\; \cdots$

(each matrix in a row is linked by a downward arrow to the matrix below it).

At the top of the array we have added the row of "formal inner product" matrices that define the formal orthogonality of the (preconditioned) residuals $s_n$ of the methods shown on the next line. (Recall that by definition of OE(B, C, A), the matrix $B$ is the one in the Galerkin condition (1.5) and the one for the formal orthogonality of the direction vectors $p_n$.) Generally, one always has to move up one row in the above array for the residual orthogonality. Note also that if one thinks of the accents ´, `, and ˇ as arrows, then, in (6.2), the arrows of $\acute{B}$, $\grave{B}$, and $\check{B}$ all point away from $B$.

Only very few of these infinitely many methods, essentially those with $\check{B} = I$, $B = CA$, and $k, l \in \{0, 1\}$, are of practical relevance. But these include the most important Krylov space methods like GCR (= GMRES), GCG, and, except in the case of breakdown, BICG.

Under the assumption that all matrices in (6.2) are definite (or, more generally, that the methods of interest do not break down), the relationships compiled in this paper allow one to compute the iterates and residuals of one method from those of another one. Additionally, one can obtain the recurrence coefficients of the ORTHORES, ORTHOMIN, and ORTHODIR algorithms for one method from those of another one. Moving in the horizontal direction of the array is relatively cheap (short recurrences, few inner products), while moving vertically requires LR transformations and, in general, long recurrences.

Clearly, in the classical situation, where $B = A$ is hpd and $C = I$, the two transformations are identical since $\acute{B} = A^2 = \grave{B}$. More generally, they are the same whenever $\acute{B} = \grave{B}$, i.e., when

(6.3) $(CA)^HB = BCA$,

which implies that $CA$ is B-normal(1) [5], [6]. In this case, the recurrences of ORTHORES, ORTHOMIN, and ORTHODIR are known to be short [5], [6], [12].

Acknowledgments. The author would like to thank Noel Nachtigal for his careful reading of the manuscript and the referees for their constructive comments.


REFERENCES

[1] S. F. ASHBY AND M. H. GUTKNECHT, A matrix analysis of orthogonal error algorithms, in preparation.

[2] S. F. ASHBY, T. A. MANTEUFFEL, AND P. E. SAYLOR, A taxonomy for conjugate gradient methods, SIAM J. Numer. Anal., 27 (1990), pp. 1542-1568.

[3] P. N. BROWN, A theoretical comparison of the Arnoldi and GMRES algorithms, SIAM J. Sci. Statist. Comput., 12 (1991), pp. 58-78.

[4] H. C. ELMAN, Iterative methods for large, sparse, nonsymmetric systems of linear equations, Ph.D. thesis, Computer Science Department, Yale University, New Haven, CT, 1982.

[5] V. FABER AND T. MANTEUFFEL, Necessary and sufficient conditions for the existence of a conjugate gradient method, SIAM J. Numer. Anal., 21 (1984), pp. 352-362.

[6] V. FABER AND T. MANTEUFFEL, Orthogonal error methods, SIAM J. Numer. Anal., 24 (1987), pp. 170-187.

[7] R. FREUND, On conjugate gradient type methods and polynomial preconditioners for a class of complex non-Hermitian matrices, Numer. Math., 57 (1990), pp. 285-312.

[8] R. FREUND AND N. NACHTIGAL, QMR: a quasi-minimal residual method for non-Hermitian linear systems, Numer. Math., 60 (1991), pp. 315-339.

[9] M. H. GUTKNECHT, The unsymmetric Lanczos algorithms and their relations to Padé approximation, continued fractions, and the qd algorithm, in Preliminary Proceedings of the Copper Mountain Conference on Iterative Methods, 1990, preliminary version.

[10] M. R. HESTENES AND E. STIEFEL, Methods of conjugate gradients for solving linear systems, J. Res. Nat. Bur. Standards, 49 (1952), pp. 409-435.

[11] K. C. JEA AND D. M. YOUNG, On the simplification of generalized conjugate-gradient methods for nonsymmetrizable linear systems, Linear Algebra Appl., 52 (1983), pp. 399-417.

[12] W. D. JOUBERT AND D. M. YOUNG, Necessary and sufficient conditions for the simplification of generalized conjugate gradient algorithms, Linear Algebra Appl., 88/89 (1987), pp. 449-485.

[13] C. C. PAIGE AND M. A. SAUNDERS, Solution of sparse indefinite systems of linear equations, SIAM J. Numer. Anal., 12 (1975), pp. 617-629.

[14] H. RUTISHAUSER, Der Quotienten-Differenzen-Algorithmus, Mitt. Inst. Angew. Math. ETH Zürich, 7, Birkhäuser, Basel, 1957.

[15] H. RUTISHAUSER, Solution of eigenvalue problems with the LR transformation, in Further Contributions to the Solution of Simultaneous Linear Equations and the Determination of Eigenvalues, Applied Mathematics, Vol. 49, National Bureau of Standards, Washington, DC, 1958, pp. 1-22.

[16] H. RUTISHAUSER, Theory of gradient methods, in Refined Iterative Methods for Computation of the Solution and the Eigenvalues of Self-Adjoint Boundary Value Problems, Mitt. Inst. Angew. Math. ETH Zürich, 8, Birkhäuser, Basel, 1959, pp. 24-49.

[17] Y. SAAD AND M. H. SCHULTZ, Conjugate gradient-like algorithms for solving nonsymmetric linear systems, Math. Comp., 44 (1985), pp. 417-424.

[18] Y. SAAD AND M. H. SCHULTZ, GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems, SIAM J. Sci. Statist. Comput., 7 (1986), pp. 856-869.

[19] W. SCHÖNAUER, Scientific Computing on Vector Computers, Elsevier, Amsterdam, 1987.

[20] G. W. STEWART, Conjugate direction methods for solving systems of linear equations, Numer. Math., 21 (1973), pp. 285-297.

[21] E. STIEFEL, Relaxationsmethoden bester Strategie zur Lösung linearer Gleichungssysteme, Comment. Math. Helv., 29 (1955), pp. 157-179.

[22] E. L. STIEFEL, Kernel polynomials in linear algebra and their numerical applications, in Further Contributions to the Solution of Simultaneous Linear Equations and the Determination of Eigenvalues, Applied Mathematics, Vol. 49, National Bureau of Standards, Washington, DC, 1958, pp. 1-22.

[23] V. V. VOEVODIN AND E. E. TYRTYSHNIKOV, On generalization of conjugate direction methods, in Numerical Methods of Algebra (Chislennye Metody Algebry), Moscow State University Press, Moscow, 1981, pp. 3-9.

[24] C. VUIK, The rate of convergence of the GMRES method, Tech. Rep. 90-77, Technical Mathematics and Informatics, Delft University of Technology, Delft, The Netherlands, 1990.

[25] R. WEISS, Convergence behavior of generalized conjugate gradient methods, Ph.D. thesis, University of Karlsruhe, Karlsruhe, Germany, 1990.

[26] D. M. YOUNG AND K. C. JEA, Generalized conjugate-gradient acceleration of nonsymmetrizable iterative methods, Linear Algebra Appl., 34 (1980), pp. 159-194.
