Analysis of matrix-dependent multigrid algorithms

Numerical Linear Algebra with Applications. Numer. Linear Algebra Appl., 5, 165–201 (1998)


Yair Shapira∗†

Computer Science Department, Technion, Haifa 32000, Israel

Convergence theory for a multigrid method with matrix-dependent restriction, prolongation and coarse-grid operators is developed for a class of SPD problems. It motivates the construction of improved multigrid versions for diffusion problems with discontinuous coefficients. A computational two-level analysis method for a class of separable problems is also presented. It motivates the design of matrix-dependent multigrid algorithms and, in particular, of multiple coarse-grid correction algorithms for highly indefinite equations. Numerical experiments show the advantage of the present methods for several examples. © 1998 John Wiley & Sons, Ltd.

KEY WORDS diffusion problem; discontinuous coefficients; indefinite Helmholtz equation; multi-grid method

1. Introduction

Some of the most robust and efficient multigrid methods for the numerical solution of elliptic boundary value problems (possibly non-symmetric and with variable coefficients) are based on matrix-dependent operators; that is, the transfer operators between grids and the coarse-grid operators are generated as functions of the coefficient matrix. Once these operators are constructed in a set-up phase, they are used in all subsequent multigrid cycles. Among these methods are the methods of [2], the method of [23,24], black-box multigrid [14,15,16], the method of [18], AutoMUG of [30,33,35,36] and the semi-coarsening method of [17,37]. For problems with constant coefficients, the success of these methods may be explained in part by smoothing analysis [44]. This analysis is available for several kinds of Gauss–Seidel smoothers [8,9,45] and for the incomplete LU (ILU) smoother [23]. For parallel implementations, multi-color Gauss–Seidel smoothing (e.g., red–black point relaxation (RB) for matrices with property A) is efficient. For isotropic equations the RB smoother appears to be efficient [9]. For moderately anisotropic problems, it is shown in [46] that red–black successive overrelaxation is preferable. For highly anisotropic problems, semi-coarsening or line relaxation should be considered [9]. The coarse-grid matrices in the multigrid methods of [14,17,18,24] lack property A; RB is thus not well parallelizable for them, and multi-color relaxation should be considered instead.

∗ Correspondence to Yair Shapira, Los Alamos National Laboratory, Mail Stop B-256, Los Alamos, NM 87545, USA.

† Present address: Los Alamos National Laboratory, Mail Stop B-256, Los Alamos, NM 87545, USA.

CCC 1070–5325/98/030165–37 $17.50. Received 10 April 1997. Revised 25 January 1998. © 1998 John Wiley & Sons, Ltd.

One of the classes of problems for which standard multigrid is inefficient [10,35] is the class of indefinite problems. Special treatments for such problems (involving projection) are proposed in [6,10,41]. An automatic approach for slightly indefinite problems is introduced in [5]. The AutoMUG method [35,36] is also suitable for such problems. Furthermore, it is shown in [30,33] that AutoMUG, when supplemented with an outer acceleration technique, can also handle highly indefinite equations. Both AutoMUG and the method of [5], however, apply to $(2d+1)$-coefficient stencils only (where $d$ is the dimension of the problem). The method proposed in this paper applies to the more general case of $3^d$-coefficient stencils and outperforms AutoMUG for several highly indefinite examples.

For standard multigrid algorithms, two-level analysis has been developed for some model problems. A two-level analysis method for red–black point and line Gauss–Seidel smoothers for the Poisson equation and anisotropic diffusion equations with constant coefficients is presented in [40]. Another (equivalent) method is presented in [25]. These methods are based on Fourier analysis and are hence restricted to normal problems with constant coefficients. A more general approach, using a splitting of the iteration matrix into smoothing and approximating parts, is introduced in [22]. For certain finite element schemes, analysis is developed in [3,4,7].

These analysis methods, however, do not apply to matrix-dependent multigrid algorithms. The aim of this work is to fill this gap, that is, to design matrix-dependent multigrid algorithms for which two-level analysis is available. The analysis further motivates the construction of robust multigrid versions for diffusion problems with discontinuous coefficients and of multiple coarse-grid correction algorithms for highly indefinite Helmholtz equations. The efficiency of these methods is illustrated numerically, which indicates that the two-level analysis is a valid tool for predicting the performance of multilevel V-cycles.

Some of the present algorithms and numerical results have already appeared in concise form in [31]. Here, however, we give a more complete approach to the development of the methods, including the analysis in Theorem 3.1 and the introduction and analysis of multiple coarse-grid correction algorithms.

The contents of the paper are as follows. In Section 2 the multigrid method is defined. In Section 3 the symmetric positive definite (SPD) case is analyzed. In Section 4 multiple coarse-grid correction algorithms are introduced. In Section 5 the computational two-level analysis method is introduced and applied to several model problems. In Section 6 black-box multigrid versions are developed in the spirit of the present method. In Section 7 numerical experiments are presented. In Section 8 concluding remarks are made.


2. The multigrid method

2.1. Preliminaries

Consider the non-singular linear system of equations

$$Ax = b \tag{2.1}$$

arising, for example, from a discretization of the elliptic PDE

$$-\nabla\cdot(D\nabla u) + \vec{\kappa}\cdot\nabla u + \beta u = f \tag{2.2}$$

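in $\Omega\subset\mathbb{R}^d$ with suitable boundary conditions, where $d$ is the dimension of the problem and $D$, $\vec{\kappa}$ and $\beta$ are given bounded functions. ($D$ is a $d\times d$ symmetric and uniformly positive definite matrix and $\vec{\kappa}$ is a $d$-dimensional vector.) Assume that $A$ is an operator in $l_2(\boldsymbol{\Omega})$, where $\mathbb{Z}$ is the set of integer numbers and $\boldsymbol{\Omega}\subset\{\vec{j} = (j_1,\dots,j_d)\}\subset\mathbb{Z}^d$ is a grid (namely, a set of points associated with points in $\Omega$). With this assumption, $A$ may be indexed as $A = (a_{\vec{i},\vec{j}})_{\vec{i},\vec{j}\in\boldsymbol{\Omega}}$. In the following, we treat $A$ both as a matrix and as an operator, assuming that the $\vec{i}$th equation in the matrix $A$ (with some ordering) corresponds to grid point $\vec{i}$. For any set $s$, let $|s|$ denote its cardinality (the number of elements in $s$). For any positive integer $k$, let $I_k$ denote the identity matrix of order $k$, and let $I\equiv I_{|\boldsymbol{\Omega}|}$. For $1\le j\le d$, denote by $1^{(j)}$ the $j$th column vector of $I_d$. Define the discrete boundary of $\boldsymbol{\Omega}$ by

$$\partial\boldsymbol{\Omega} = \big\{\vec{i}\in\boldsymbol{\Omega} \mid \exists j,\ 1\le j\le d, \text{ such that } \vec{i}+1^{(j)}\notin\boldsymbol{\Omega} \text{ or } \vec{i}-1^{(j)}\notin\boldsymbol{\Omega}\big\}$$

When periodic boundary conditions are imposed, $\Omega$ and $\boldsymbol{\Omega}$ are a torus and a discrete torus, respectively. Hence in this case $\partial\boldsymbol{\Omega} = \emptyset$.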
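To make the definition of $\partial\boldsymbol{\Omega}$ concrete, here is a minimal Python sketch (an illustration, not code from the paper) that computes the discrete boundary of a grid given as a set of integer tuples. The function name `discrete_boundary` and the optional `periods` argument, which wraps indices to model a discrete torus, are hypothetical.

```python
import itertools


def discrete_boundary(grid, periods=None):
    """Points i of `grid` for which i + 1^(j) or i - 1^(j) is missing for some j.

    grid    : set of integer d-tuples (the grid Omega).
    periods : optional tuple of periods; indices are then taken modulo the
              period in each direction, which models a discrete torus.
    """
    grid = set(grid)
    boundary = set()
    for i in grid:
        for j in range(len(i)):
            for step in (1, -1):
                k = list(i)
                k[j] += step
                if periods is not None:
                    k[j] %= periods[j]
                if tuple(k) not in grid:
                    boundary.add(i)
    return boundary


# A 5 x 5 grid with indices 1..5 (as in Figure 1): its outer ring has 16 points.
square = set(itertools.product(range(1, 6), repeat=2))
print(len(discrete_boundary(square)))                # 16

# A periodic 4 x 4 grid (indices 0..3) has an empty discrete boundary.
torus = set(itertools.product(range(4), repeat=2))
print(discrete_boundary(torus, periods=(4, 4)))      # set()
```

With the periodic wrap every neighbour exists, so the result is empty, in agreement with $\partial\boldsymbol{\Omega} = \emptyset$ for periodic problems.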

For any matrixM, M = (mi,j )1≤i≤K, 1≤j≤L, let the absolute value ofM be defined by|M| = (|mi,j |)1≤i≤K, 1≤j≤L and the diagonal matrix of row-sums ofM by

rs(M) = diag(L∑

j=1

mi,j )1≤i≤K.

For any operatorM = (mEi, Ej ), define the off-diagonal sum operatorr(M) by

rEi (M) =∑Ej∈�, Ej 6=Ei

mEi, Ej = rs(M − diag(M))Ei,Ei .
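The two row-sum operators can be sketched in NumPy as follows, assuming dense matrices; the function names `rs` and `r` simply mirror the notation above. The example uses the undivided 1D Dirichlet Laplacian.

```python
import numpy as np


def rs(M):
    """rs(M): diagonal matrix of the row sums of M."""
    return np.diag(M.sum(axis=1))


def r(M):
    """Vector of off-diagonal row sums r_i(M) = rs(M - diag(M))_{i,i}."""
    off_diagonal = M - np.diag(np.diag(M))
    return off_diagonal.sum(axis=1)


# 1D Dirichlet Laplacian in undivided form, four unknowns.
A = 2.0 * np.eye(4) - np.eye(4, k=1) - np.eye(4, k=-1)
print(np.diag(rs(A)))   # [1. 0. 0. 1.]: nonzero only next to the boundary
print(r(A))             # [-1. -2. -2. -1.]
print(r(np.abs(A)))     # [1. 2. 2. 1.]: the values r_i(|A|)
```

Note that $a_{\vec{i},\vec{i}} + r_{\vec{i}}(A) = \mathrm{rs}(A)_{\vec{i},\vec{i}}$, which is the identity used later in (2.13).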

2.2. The abstract two-level method

Let $\mathcal{S}: x\mapsto\mathcal{S}x$, $\mathcal{S}\equiv\mathcal{S}(A,b)$, be a relaxation (smoothing) procedure for (2.1) with the corresponding iteration matrix $S$. Let $\nu_1$ and $\nu_2$ be positive integers denoting, respectively, the number of pre-smoothings and the number of post-smoothings used. The operators $R$ (restriction), $P$ (prolongation) and $Q$ (coarse-grid coefficient matrix) will be defined later. The abstract two-level procedure TL is defined by

$$\mathrm{TL}(x_{\mathrm{in}}, A, b, x_{\mathrm{out}}):\quad x_{\mathrm{out}} = \mathcal{S}^{\nu_2}\Big(\mathcal{S}^{\nu_1}x_{\mathrm{in}} + PQ^{-1}R\big(b - A\,\mathcal{S}^{\nu_1}x_{\mathrm{in}}\big)\Big) \tag{2.3}$$



◇ · ◇ · ◇
· ∗ · ∗ ·
◇ · ◇ · ◇
· ∗ · ∗ ·
◇ · ◇ · ◇

Figure 1. Grid partitioning for the multigrid method. The grid points which also serve as coarse-grid points are denoted by ‘∗’. Grid points for which values are assigned at the first prolongation step are denoted by ‘·’. Grid points for which values are assigned at the second prolongation step are denoted by ‘◇’.

An iterative application of TL is given by

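$$\begin{aligned}
&x_0 = 0,\quad k = 0\\
&\textbf{while } \|Ax_k - b\|_2 \ge \text{threshold}\cdot\|Ax_0 - b\|_2\\
&\qquad \mathrm{TL}(x_k, A, b, x_{k+1})\\
&\qquad k\leftarrow k + 1\\
&\textbf{endwhile}
\end{aligned}\tag{2.4}$$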

(Here ‘←’ stands for substitution.) Note that the iteration matrix for this method is

$$S^{\nu_2}\big(I - PQ^{-1}RA\big)S^{\nu_1} \tag{2.5}$$

This representation is the basis for the computational two-level analysis in Section 5 below. In practice, a multilevel implementation is often used; that is, the application of $Q^{-1}$ in (2.3) is replaced by a TL iteration for the solution of the coarse-grid problem $Qe = r$ with a zero initial guess.
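The procedure TL, the loop (2.4) and the iteration matrix (2.5) can be checked against each other with a small NumPy sketch. It assumes, for illustration only, a forward Gauss–Seidel smoother, a 1D Dirichlet Laplacian and standard linear interpolation with Galerkin operators $R = P^t$, $Q = RAP$ of coarse order; it is not the CBOX construction of Section 2.4, and the function names are hypothetical.

```python
import numpy as np


def gauss_seidel(A, b, x):
    """One forward Gauss-Seidel sweep: x <- x + tril(A)^{-1} (b - A x)."""
    return x + np.linalg.solve(np.tril(A), b - A @ x)


def two_level(x_in, A, b, P, R, Q, nu1=1, nu2=1):
    """The procedure TL of (2.3) with a Gauss-Seidel smoother."""
    x = x_in
    for _ in range(nu1):
        x = gauss_seidel(A, b, x)
    x = x + P @ np.linalg.solve(Q, R @ (b - A @ x))     # coarse-grid correction
    for _ in range(nu2):
        x = gauss_seidel(A, b, x)
    return x


def iteration_matrix(A, P, R, Q, nu1=1, nu2=1):
    """S^{nu2} (I - P Q^{-1} R A) S^{nu1} of (2.5) for the same smoother."""
    n = A.shape[0]
    S = np.eye(n) - np.linalg.solve(np.tril(A), A)
    C = np.eye(n) - P @ np.linalg.solve(Q, R @ A)
    return np.linalg.matrix_power(S, nu2) @ C @ np.linalg.matrix_power(S, nu1)


# 1D Dirichlet Laplacian (undivided form) on n = 2m + 1 points; the even-numbered
# grid points (0-based indices 1, 3, ...) form the coarse grid.
m = 7
n = 2 * m + 1
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
P = np.zeros((n, m))
for col in range(m):
    i = 2 * col + 1
    P[i - 1: i + 2, col] = [0.5, 1.0, 0.5]          # linear interpolation
R = P.T                                            # Galerkin choice R = P^t
Q = R @ A @ P                                      # Q = RAP (coarse order)

b = np.ones(n)
x = np.zeros(n)
r0 = np.linalg.norm(A @ x - b)
k = 0
while np.linalg.norm(A @ x - b) >= 1e-10 * r0:     # the loop (2.4)
    x = two_level(x, A, b, P, R, Q)
    k += 1
rho = max(abs(np.linalg.eigvals(iteration_matrix(A, P, R, Q))))
print(k, rho)
```

Because the smoother and the coarse-grid correction are affine, one TL step maps the error $x - A^{-1}b$ exactly by the matrix (2.5); this is what makes the spectral radius of (2.5) the asymptotic convergence factor.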

2.3. The multigrid method in two dimensions

Here we describe a prototype multigrid method which gives intuition about the nature of the present method.

For symmetric problems, it is common to use in (2.3) the Galerkin operators $R = P^t$ and $Q = RAP$. The construction of the prolongation operator $P$ from the coarse to the fine grid thus completes the definition of the multigrid method. Let us illustrate the construction in the two-dimensional case.

The points denoted by ‘∗’ in Figure 1 are the coarse-grid points. The first prolongation step, determining the values at the points denoted by ‘·’ in Figure 1, is done as follows. Let $f_0$ be a fine-grid point which lies between two coarse-grid points $c_{-1}$ and $c_1$:

$$\underset{c_{-1}}{\ast}\qquad\underset{f_0}{\cdot}\qquad\underset{c_1}{\ast} \tag{2.6}$$

Assume that the stencil of $A$ at $f_0$ is of the form

$$\begin{bmatrix} NW & N & NE\\ W & C & E\\ SW & S & SE\end{bmatrix} \tag{2.7}$$



Let $\beta$ be the sum of this stencil:

$$\beta = C + N + S + E + W + NE + SE + NW + SW \tag{2.8}$$

This stencil sum is a discrete approximation of the Helmholtz parameter $\beta$ in equation (4.1) below. The motivation for this definition in terms of Fourier mode analysis is given in Section 6.2 below; here we concentrate on showing that it is suitable for two-level analysis. Define the scalar $K$ by

$$K = C + N + S + NW + SW + NE + SE - \beta/2 \tag{2.9}$$

The scalar $K$ is used in the prolongation for determining the value at $f_0$:

$$f_0 = -K^{-1}(W c_{-1} + E c_1) \tag{2.10}$$

(Here $f_0$, $c_1$ and $c_{-1}$ also denote the values of a grid function at the corresponding grid points.) A similar procedure is used for prolongation in the $y$-direction. The second step of the prolongation, determining the values at grid points which do not lie between two coarse-grid points (denoted by ‘◇’ in Figure 1), is done in such a way that the homogeneous equation (with stencil (2.7) and zero right-hand side) is satisfied there. For problems with zero row sums ($\beta\equiv 0$) this procedure is equivalent to the method of [23,24].
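The first prolongation step (2.8)–(2.10) reduces to a few lines of arithmetic. The sketch below assumes the stencil is passed as a Python dictionary keyed by the labels of (2.7); this layout and the function name are illustrative choices. The $y$-direction is analogous with the roles of $N, S$ and $E, W$ exchanged.

```python
def x_prolongation_weights(stencil):
    """Weights (w_minus, w_plus) such that f_0 = w_minus * c_{-1} + w_plus * c_1.

    `stencil` maps the labels of (2.7) to the coefficients of A at f_0.
    """
    beta = sum(stencil.values())                                   # (2.8)
    K = (stencil["C"] + stencil["N"] + stencil["S"]
         + stencil["NW"] + stencil["SW"] + stencil["NE"] + stencil["SE"]
         - beta / 2)                                               # (2.9)
    return -stencil["W"] / K, -stencil["E"] / K                    # (2.10)


laplace = dict(C=4.0, N=-1.0, S=-1.0, E=-1.0, W=-1.0,
               NE=0.0, NW=0.0, SE=0.0, SW=0.0)
print(x_prolongation_weights(laplace))      # (0.5, 0.5): linear interpolation

helmholtz = dict(laplace, C=3.5)            # undivided Helmholtz term h^2*beta = -0.5
print(x_prolongation_weights(helmholtz))    # both weights equal 1/1.75
```

For the zero-row-sum Laplacian the weights reduce to linear interpolation; with a negative Helmholtz term, $K$ shrinks and the interpolation weights grow beyond $1/2$.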

2.4. General definition of the multigrid method

For the present analysis we need some more notation and a more complete definition of the multigrid method. The $d$-dimensional framework will be useful in the sequel and, in particular, in the proof of Theorem 5.1, where induction on $d$ is used.

For any integer $k$, denote ‘$k$ is even’ by $2\mid k$ and ‘$k$ is odd’ by $2\nmid k$. For any index set $s\subset\{1,\dots,d\}$, let $g(s)$ be the set of grid points for which the $m$th component is odd if and only if $m\in s$:

$$g(s) = \{\vec{j}\in\boldsymbol{\Omega} \mid m\in s\Leftrightarrow 2\nmid j_m\}$$

For example, in Figure 1 diamonds belong to $g(\{1,2\})$, dots to $g(\{1\})$ or $g(\{2\})$ and stars to $g(\emptyset)$. The family of disjoint sets $\{g(s)\}_{s\subset\{1,\dots,d\}}$ may be thought of as a coloring of $\boldsymbol{\Omega}$ in which each set $g(s)$ corresponds to one color (see [32], Method B). In our implementation $g(\emptyset)$ (namely, the set of grid points all of whose components are even) serves as the coarse grid. In the sequel, we also use the notation $c = g(\emptyset)$ and $f = \boldsymbol{\Omega}\setminus c$. This notation induces a block form for an operator $M: l_2(\boldsymbol{\Omega})\to l_2(\boldsymbol{\Omega})$ as follows:

$$M = \begin{pmatrix} M_{ff} & M_{fc}\\ M_{cf} & M_{cc}\end{pmatrix} \tag{2.11}$$

where $M_{ff}$ and $M_{cc}$ are square matrices of order $|f|$ and $|c|$, respectively. Let $q(\vec{i})$ be the set of coordinates for which the corresponding components of $\vec{i}$ are odd, namely, the set of coordinates for which $\vec{i}\in g(q(\vec{i}))$. (For example, $q((3,5)) = \{1,2\}$, $q((4,5)) = \{2\}$, etc.) Let $t_{\vec{i}}$ denote the set of grid points contributing to $\vec{i}$ in the prolongation, that is,

$$t_{\vec{i}} = \{\vec{m}\in\boldsymbol{\Omega} \mid |\vec{m}-\vec{i}|_\infty\le 1,\ q(\vec{m})\subset q(\vec{i})\}$$
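The coloring $g(s)$, the map $q$ and the sets $t_{\vec{i}}$ translate directly into set comprehensions. The following sketch, with hypothetical function names and the $5\times 5$ grid of Figure 1 (indices $1,\dots,5$), reproduces the examples given in the text.

```python
import itertools


def q(i):
    """q(i): the (1-based) coordinates in which the components of i are odd."""
    return frozenset(m + 1 for m, im in enumerate(i) if im % 2 != 0)


def g(s, grid):
    """g(s): grid points whose m-th component is odd exactly when m is in s."""
    return {j for j in grid if q(j) == frozenset(s)}


def t(i, grid):
    """t_i: grid points m with |m - i|_inf <= 1 and q(m) a subset of q(i)."""
    result = set()
    for offset in itertools.product((-1, 0, 1), repeat=len(i)):
        m = tuple(a + b for a, b in zip(i, offset))
        if m in grid and q(m) <= q(i):
            result.add(m)
    return result


grid = set(itertools.product(range(1, 6), repeat=2))   # the 5 x 5 grid of Figure 1
print(sorted(g(set(), grid)))     # [(2, 2), (2, 4), (4, 2), (4, 4)]: the stars
print(sorted(t((3, 2), grid)))    # [(2, 2), (3, 2), (4, 2)]: a dot between two stars
print(len(t((3, 3), grid)))       # 9: every neighbour contributes to a diamond
```

A coarse point is its own only contributor ($t_{\vec{i}} = \{\vec{i}\}$ for $\vec{i}\in c$), which is why the coarse-grid rows of $U$ in (2.16) reduce to the identity.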



Next, we introduce the operator $\tilde{A} = (\tilde{a}_{\vec{i},\vec{j}})$ from which the prolongation stencil is generated. This operator is a modification of $A$ in which $a_{\vec{i},\vec{i}}$ is replaced by the off-diagonal row sum $r_{\vec{i}}(|A|)$ at certain points $\vec{i}$. The reason for this becomes clear in Section 5.1. The prolongation and restriction are then defined from $\tilde{A}$. The definition of $\tilde{A}$ involves a distinction between different kinds of boundary points, which enables the computational two-level analysis in Section 5. In particular, the values $r_{\vec{i}}(|A|)$ are used only when the prolongation is done along boundaries:

$$\tilde{a}_{\vec{i},\vec{j}} = \begin{cases} r_{\vec{i}}(|A|) & \vec{j} = \vec{i}\in f \text{ and } t_{\vec{i}}\subset\partial\boldsymbol{\Omega}\\ a_{\vec{i},\vec{j}} & \text{otherwise}\end{cases} \tag{2.12}$$

For the example in Section 7.1.2, the criterion used in (2.12) is superior to the other criteria tested there, which suggests that methods which are suitable for analysis are also more efficient numerically than standard ones.

We want the method to be applicable to indefinite problems as well. This property is achieved by constructing the operators in such a way that the computational two-level analysis of Section 5 is available. For this, we need to subtract a ‘discrete Helmholtz term’ from the $\tilde{A}$ defined above (see (2.9)). Let $\alpha(i)$ $(1\le i\le d)$ be some non-negative parameters satisfying $\sum_{i=1}^d\alpha(i) = d$. The default in this paper is $\alpha(i)\equiv 1$ $(1\le i\le d)$. We first filter out the effect of boundary conditions on the computation of the discrete Helmholtz term. For any $\vec{i}\in\boldsymbol{\Omega}$, let $\vec{z}(\vec{i})\in\mathbb{Z}^d$ be the minimal integer vector (in, say, the $l_1$ norm) for which $\vec{i}+\vec{z}(\vec{i})$ and $\vec{i}+2\vec{z}(\vec{i})$ are in $\boldsymbol{\Omega}\setminus\partial\boldsymbol{\Omega}$, and let $\vec{e}(\vec{i}) = 2\vec{z}(\vec{i})$. Clearly, $\vec{z}(\vec{i}) = \vec{e}(\vec{i}) = 0$ for all $\vec{i}\in\boldsymbol{\Omega}\setminus\partial\boldsymbol{\Omega}$. Since $\vec{e}(\vec{i})$ has even components, $\vec{i}+\vec{e}(\vec{i})$ has the same color as $\vec{i}$. In the sequel, the effect of boundary conditions is avoided by calculating certain quantities at the interior points $\vec{i}+\vec{e}(\vec{i})$ rather than at the boundary points $\vec{i}$. For the analysis in Section 5, it is thus necessary to assume that the Helmholtz parameter is the same at $\vec{i}+\vec{e}(\vec{i})$ and $\vec{i}$ (see the second assumption in Section 5.1). However, this assumption is not needed for the actual application of the method. Alternatively, one can define $\vec{e}(\vec{i})\equiv 0$ and assume that the matrix argument of $r(\cdot)$ is slightly extended, so that it acts on a slight extension of $\boldsymbol{\Omega}$ and, therefore, its stencil at points in $\partial\boldsymbol{\Omega}$ has the same number of coefficients as that at points in the interior of $\boldsymbol{\Omega}$.
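A small search suffices to compute $\vec{z}(\vec{i})$ and $\vec{e}(\vec{i})$ on a rectangular grid. In this sketch the function names are hypothetical, and ties between vectors of equal $l_1$ norm are broken by enumeration order, which is an assumption since the text leaves the norm and tie-breaking open.

```python
import itertools


def interior(grid):
    """Omega minus its discrete boundary (non-periodic case)."""
    grid = set(grid)

    def inside(i):
        return all(
            tuple(i[m] + (step if m == j else 0) for m in range(len(i))) in grid
            for j in range(len(i)) for step in (1, -1))

    return {i for i in grid if inside(i)}


def shift_to_interior(i, inner, max_norm=8):
    """(z(i), e(i)): z of smallest l1 norm with i + z and i + 2z in `inner`, e = 2z.

    Ties between vectors of equal l1 norm are broken by enumeration order (an assumption).
    """
    for norm in range(max_norm + 1):
        for z in itertools.product(range(-norm, norm + 1), repeat=len(i)):
            if sum(abs(c) for c in z) != norm:
                continue
            if (tuple(a + b for a, b in zip(i, z)) in inner
                    and tuple(a + 2 * b for a, b in zip(i, z)) in inner):
                return z, tuple(2 * c for c in z)
    raise ValueError("no interior shift found; increase max_norm")


grid = set(itertools.product(range(1, 8), repeat=2))   # 7 x 7 grid, indices 1..7
inner = interior(grid)
print(shift_to_interior((4, 4), inner))   # ((0, 0), (0, 0)): interior point
print(shift_to_interior((1, 4), inner))   # ((1, 0), (2, 0)): edge point
print(shift_to_interior((1, 1), inner))   # ((1, 1), (2, 2)): corner point
```

In each case $\vec{i}+\vec{e}(\vec{i})$ keeps the parity pattern of $\vec{i}$, so quantities such as $d_{\vec{i}+\vec{e}(\vec{i})}$ are evaluated at a point of the same color.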

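Define the discrete Helmholtz term at point $\vec{i}$ to be the row sum of $A$ at the point that belongs to the interior of $\boldsymbol{\Omega}$ and is closest to $\vec{i}$ in the above sense (see also (2.8)):

$$\beta_{\vec{i}} = a_{\vec{i}+\vec{e}(\vec{i}),\,\vec{i}+\vec{e}(\vec{i})} + r_{\vec{i}+\vec{e}(\vec{i})}(A) = \mathrm{rs}(A)_{\vec{i}+\vec{e}(\vec{i}),\,\vec{i}+\vec{e}(\vec{i})} \tag{2.13}$$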

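Define the relative weighted diffusion in the directions of prolongation towards $\vec{i}\in f$ by

$$d_{\vec{i}} = \frac{\sum_{k\in q(\vec{i})}\alpha(k)\sum_{\vec{j}\in t_{\vec{i}},\ q(\vec{i})\setminus q(\vec{j}) = \{k\}}\tilde{a}_{\vec{i},\vec{j}}}{\sum_{\vec{j}\in\boldsymbol{\Omega},\ |\vec{i}-\vec{j}|_1 = 1}\tilde{a}_{\vec{i},\vec{j}}} \tag{2.14}$$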

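Substitute

$$\tilde{a}_{\vec{i},\vec{j}}\leftarrow\begin{cases}\tilde{a}_{\vec{i},\vec{i}} - \beta_{\vec{i}}\big(1_{\vec{i}} - d_{\vec{i}+\vec{e}(\vec{i})}\big) & \vec{i} = \vec{j}\in f\\ \tilde{a}_{\vec{i},\vec{j}} & \text{otherwise}\end{cases} \tag{2.15}$$

where

$$1_{\vec{i}} = \begin{cases} 0 & \vec{i}\in f \text{ and } t_{\vec{i}}\subset\partial\boldsymbol{\Omega}\\ 1 & \text{otherwise}\end{cases}$$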

Thus, the contribution of the discrete Helmholtz term $\beta_{\vec{i}}$ to the central element of the prolongation stencil is multiplied by the amount of diffusion in the corresponding directions of prolongation divided by the total diffusion (compare with (2.9), where $\alpha(1) = \alpha(2) = 1$ are used and isotropy is assumed). With this procedure, the computational two-level analysis of Section 5 is available. Clearly, the first line of (2.15) has no effect for problems with zero row sums ($\beta_{\vec{i}}\equiv 0$).
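The statement that the update (2.15), followed by the lumping (2.16), reproduces the scalar $K$ of (2.9) in the isotropic case can be checked numerically. The sketch below is restricted, as an assumption, to interior points of a 2D nine-point stencil (so $\vec{e}(\vec{i}) = 0$ and $1_{\vec{i}} = 1$), with the stencil given as a dictionary keyed by the labels of (2.7); all names are hypothetical.

```python
AXIS = {1: ("W", "E"), 2: ("S", "N")}   # axis neighbours in directions 1 (x) and 2 (y)
ALL = ("C", "N", "S", "E", "W", "NE", "NW", "SE", "SW")


def relative_diffusion(stencil, q_i, alpha=(1.0, 1.0)):
    """d_i of (2.14) at an interior point with q(i) = q_i (a subset of {1, 2})."""
    num = sum(alpha[k - 1] * (stencil[AXIS[k][0]] + stencil[AXIS[k][1]]) for k in q_i)
    den = sum(stencil[key] for key in ("N", "S", "E", "W"))
    return num / den


def central_element(stencil, q_i, alpha=(1.0, 1.0)):
    """u_ii of (2.16) at an interior fine point after the update (2.15)."""
    beta = sum(stencil[key] for key in ALL)            # (2.13) with e(i) = 0
    a_ii = stencil["C"] - beta * (1.0 - relative_diffusion(stencil, q_i, alpha))
    t_keys = set(ALL) if set(q_i) == {1, 2} else {"C", *AXIS[next(iter(q_i))]}
    lumped = sum(stencil[key] for key in ALL if key not in t_keys)   # m not in t_i
    return a_ii + lumped


helmholtz = dict(C=3.5, N=-1.0, S=-1.0, E=-1.0, W=-1.0, NE=0.0, NW=0.0, SE=0.0, SW=0.0)
print(central_element(helmholtz, {1}))      # 1.75, the value of K in (2.9)
print(central_element(helmholtz, {1, 2}))   # 3.5: d_i = 1 at a diamond, so C is kept
```

For anisotropic stencils ($W + E \ne N + S$) the retained share of $\beta_{\vec{i}}$ follows the ratio $d_{\vec{i}}$ rather than the fixed factor $1/2$ of (2.9).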

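Define the operators $U = (u_{\vec{i},\vec{j}})$ and $L = (l_{\vec{i},\vec{j}})$ using lumping on $\tilde{A}$ over the sets $\boldsymbol{\Omega}\setminus t_{\vec{i}}$:

$$u_{\vec{i},\vec{j}} = \begin{cases}\tilde{a}_{\vec{i},\vec{i}} + \sum_{\vec{m}\notin t_{\vec{i}}}\tilde{a}_{\vec{i},\vec{m}} & \vec{j} = \vec{i}\in f\\ 1 & \vec{j} = \vec{i}\in c\\ \tilde{a}_{\vec{i},\vec{j}} & \vec{j}\in t_{\vec{i}},\ \vec{j}\ne\vec{i}\\ 0 & \text{otherwise}\end{cases} \tag{2.16}$$

and

$$l_{\vec{i},\vec{j}} = \begin{cases}\tilde{a}_{\vec{i},\vec{j}} & \vec{i}\in t_{\vec{j}},\ \vec{j}\ne\vec{i}\\ u_{\vec{i},\vec{i}} & \vec{j} = \vec{i}\\ 0 & \text{otherwise}\end{cases} \tag{2.17}$$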

Assume that the variables are ordered in blocks corresponding to the colors $g(s)$, $s\subset\{1,\dots,d\}$, in decreasing order of $|s|$. Since $t_{\vec{i}}\subset\{\vec{i}\}\cup\bigcup_{|s|<|q(\vec{i})|}g(s)$, $U$ and $L$ are upper and lower triangular matrices, respectively. It is assumed hereafter that $U$ has non-vanishing main diagonal elements. For any square matrix $M$, let $\mathrm{diag}(M)$ denote the diagonal matrix whose main diagonal elements coincide with those of $M$. Define

$$E = \mathrm{diag}(U) = \mathrm{diag}(L),\qquad P = U^{-1}E\qquad\text{and}\qquad R = EL^{-1} \tag{2.18}$$
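The construction (2.16)–(2.18) can be sketched with dense NumPy matrices. For simplicity this example assumes a 1D Dirichlet Laplacian, for which neither (2.12) nor (2.15) changes anything, so $\tilde{A} = A$; the function name `transfer_operators` is hypothetical. It also forms the coarse-grid matrix $J_cRAPJ_c^t$ that appears in (2.19) and (2.20).

```python
import numpy as np


def transfer_operators(At, fine, t):
    """P = U^{-1} E and R = E L^{-1} from (2.16)-(2.18).

    At   : dense array holding the modified operator A-tilde,
    fine : boolean array, True for points of f and False for points of c,
    t    : list of sets, t[i] = t_i (indices of points contributing to i).
    """
    n = At.shape[0]
    U = np.zeros((n, n))
    for i in range(n):
        if not fine[i]:
            U[i, i] = 1.0                                   # coarse rows: identity
            continue
        outside = [m for m in range(n) if m not in t[i]]
        U[i, i] = At[i, i] + At[i, outside].sum()           # lumping over Omega \ t_i
        for j in t[i] - {i}:
            U[i, j] = At[i, j]
    L = np.diag(np.diag(U))
    for j in range(n):
        for i in t[j] - {j}:
            L[i, j] = At[i, j]                              # l_ij = a_ij when i in t_j
    E = np.diag(np.diag(U))
    return np.linalg.solve(U, E), E @ np.linalg.inv(L)


# 1D example: grid points 1..7 stored at indices 0..6; even grid points are coarse.
n = 7
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
fine = np.array([(k + 1) % 2 == 1 for k in range(n)])
t = [({k - 1, k, k + 1} & set(range(n))) if fine[k] else {k} for k in range(n)]
P, R = transfer_operators(A, fine, t)       # A-tilde = A: (2.12) and (2.15) change nothing

Jc = np.eye(n)[~fine]                       # the injection J_c onto the coarse grid
Qc = Jc @ R @ A @ P @ Jc.T                  # the coarse-grid matrix J_c R A P J_c^t
print(P @ Jc.T)                             # columns: linear interpolation "hats"
print(Qc)                                   # tridiag(-1/2, 1, -1/2)
```

For a symmetric $\tilde{A}$, (2.16) and (2.17) give $L = U^t$, hence $R = EU^{-t} = P^t$, which the check below confirms.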

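For any subgrid $g\subset\boldsymbol{\Omega}$, let $J_g: l_2(\boldsymbol{\Omega})\to l_2(g)$ be the injection

$$(J_gv)_{\vec{j}} = v_{\vec{j}},\qquad v\in l_2(\boldsymbol{\Omega}),\ \vec{j}\in g$$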

We consider two possible definitions of the coarse-grid operator $Q$. In the first approach, $Q$ is of the same order as $A$; however, the complexity of applying $Q^{-1}$ is much smaller than that of applying $A^{-1}$. This approach, although unusual, is most useful in the analysis in Theorem 3.1 below. In the second approach, the order of $Q$ is smaller than that of $A$, as in most multigrid algorithms. The two approaches are summarized uniformly as follows. Define

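$$\text{either}\quad Q^{-1} = \begin{pmatrix}(\mathrm{diag}(A_{ff}))^{-1}(R_{ff})^{-1} & 0\\ 0 & (J_cRAPJ_c^t)^{-1}\end{pmatrix} \tag{2.19}$$

$$\text{or}\quad Q^{-1} = \begin{pmatrix}0 & 0\\ 0 & (J_cRAPJ_c^t)^{-1}\end{pmatrix} \tag{2.20}$$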

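Note that the implementation (2.19) is equivalent to using a prolongation operator which also uses the residuals at grid points in $f$, in the spirit of the implementation proposed in [14]. Indeed, since the coarse-grid rows of $U$ and the coarse-grid columns of $L$ coincide with those of the identity, we have $J_fR = R_{ff}J_f$ and $PJ_f^t = J_f^tP_{ff}$; hence, with (2.19), for a residual vector $r\in l_2(c\cup f)$

$$\begin{aligned} PQ^{-1}Rr &= PQ^{-1}\big(J_f^tJ_fRr + J_c^tJ_cRr\big)\\ &= J_f^tP_{ff}(\mathrm{diag}(A_{ff}))^{-1}J_fr + PJ_c^t(J_cRAPJ_c^t)^{-1}J_cRr\end{aligned}$$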



On the other hand, the implementation (2.20) is in the spirit of standard multigrid, where the prolongation operator does not use the residuals at $f$:

$$PQ^{-1}Rr = PQ^{-1}\big(J_f^tJ_fRr + J_c^tJ_cRr\big) = PJ_c^t(J_cRAPJ_c^t)^{-1}J_cRr$$

(See also (5.7) below.)

We denote the above multigrid method by Cartesian black-box multigrid (CBOX). This name refers to both implementations (2.19) and (2.20); which of them is used is stated at the appropriate places in the paper. In general, (2.19) is used for SPD problems, where it is found to be more efficient than (2.20), and (2.20) is used for indefinite problems, where (2.19) seems to have no practical advantage.

3. Convergence theory for SPD problems

The purpose of the following theory is to show the robustness of CBOX (implemented with (2.19)) for a class of SPD problems. This property is also verified numerically in Section 7.1 below. The phrase ‘independent of the size of the problem’ used below means that the order of $A$ can be arbitrarily large, provided that $\|\mathrm{diag}(A)\|$ and $\|(\mathrm{diag}(A))^{-1}\|$ are bounded uniformly for all problem sizes. (See Appendix A for the definition of the matrix norm; the term ‘condition number’ is interpreted with respect to this norm.) For example, when $A$ represents a discretization of (2.2), it is assumed that the mesh is regular (the aspect ratio is bounded) and that $A$ is given in the undivided form as in [2], so that its main diagonal elements are of $O(1)$. This property is required for the analysis only, not for the application of the algorithm. Note that the fourth assumption in the theorem (aimed at ensuring diagonal dominance of $\tilde{A}$ and $A - \tilde{A}$) is not needed if the update (2.15) is not used.

By ‘$A$ being diagonally dominant’ we mean $r_{\vec{i}}(|A|)\le a_{\vec{i},\vec{i}}$ for all $\vec{i}$.

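Theorem 3.1. Assume that

1. $A$ is symmetric and diagonally dominant.
2. $\tilde{a}_{\vec{i},\vec{i}} + \sum_{\vec{m}\notin t_{\vec{i}}}\tilde{a}_{\vec{i},\vec{m}}$ is bounded away from zero uniformly for all $\vec{i}\in f$ and all problem sizes.
3. $A$ is an L-matrix (i.e., $\vec{i}\ne\vec{j}\Rightarrow a_{\vec{i},\vec{j}}\le 0$).
4. For all $\vec{i}\in f\cap\partial\boldsymbol{\Omega}$, $\beta_{\vec{i}}\le a_{\vec{i},\vec{i}} + r_{\vec{i}}(A) = \mathrm{rs}(A)_{\vec{i},\vec{i}}$.

Then, for CBOX implemented with (2.19), the condition number of the preconditioned system $PQ^{-1}RA$ is bounded independently of the size of the problem.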

For the proof, see Appendix A.
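Hypotheses 1 and 3 of Theorem 3.1 depend on $A$ alone and are easy to test before relying on the theorem; hypotheses 2 and 4 also involve $t_{\vec{i}}$ and $\beta_{\vec{i}}$ and are not checked here. The helper below is a hypothetical sketch for dense matrices with a small tolerance.

```python
import numpy as np


def check_hypotheses_1_and_3(A, tol=1e-12):
    """Return (hypothesis 1 holds, hypothesis 3 holds) for a dense matrix A."""
    off = A - np.diag(np.diag(A))
    symmetric = np.allclose(A, A.T, rtol=0.0, atol=tol)
    diagonally_dominant = np.all(np.abs(off).sum(axis=1) <= np.diag(A) + tol)  # r_i(|A|) <= a_ii
    l_matrix = np.all(off <= tol)                                              # a_ij <= 0, i != j
    return bool(symmetric and diagonally_dominant), bool(l_matrix)


n = 6
laplace = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
print(check_hypotheses_1_and_3(laplace))                    # (True, True)
print(check_hypotheses_1_and_3(laplace - 0.5 * np.eye(n)))  # (False, True)
```

As expected, a negative Helmholtz shift destroys diagonal dominance, so indefinite problems such as (4.1) fall outside the SPD setting of Theorem 3.1.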

4. Multiple coarse-grid correction

[Figure 2: plot in the $(\omega_1,\omega_2)$ frequency plane showing marked points on the circle of radius $\sqrt{-\beta}/\pi$.]

Figure 2. The nearly singular Fourier modes which are handled by the multiple coarse-grid correction (for $k = 5$).

The parameters $\alpha(i)$ used in (2.14) have not yet been specified. The choice $\alpha(i)\equiv 1$ $(1\le i\le d)$ seems to be optimal and serves as the default for CBOX in this paper. However, multiple coarse-grid corrections using restriction, prolongation and coarse-grid operators resulting from multiple choices of the $\alpha(i)$'s are also possible, particularly for indefinite Helmholtz equations of the form

$$-\Delta u + \beta u = f \tag{4.1}$$

in the unit square with $\beta < 0$. It is desired that each coarse-grid correction handle one of the nearly singular error modes of the equation (see [41] and Section 6.2 below). In the two-dimensional case, the choices (depending on $j$)

$$\alpha(i) = 2\sin^2\Big(\frac{\pi}{4} + \frac{(2i-3)j\pi}{2k}\Big),\qquad i = 1, 2,\quad |j|\le\lfloor k/2\rfloor \tag{4.2}$$

give a uniform coverage of the circle $\omega_1^2 + \omega_2^2 = -\beta/\pi^2$ in the frequency space with resolution $2k/\pi$, where $k$ is a positive integer (Figure 2). The $j$th coarse-grid correction in a multiple coarse-grid correction algorithm uses the $j$th choice in (4.2) for the construction of its transfer and coarse-grid operators and, for multilevel implementations, also for the construction of the operators on subsequent coarser grids. The coarse-grid corrections are applied in increasing order of $j$ in (4.2). In order to achieve high resolution, a large $k$ should be used, with $2\lfloor k/2\rfloor + 1$ different constructions of $R$, $P$ and $Q$. A similar idea (but in the context of standard multigrid) was used in [41] for one-dimensional indefinite problems and was also introduced by Achi Brandt in an informal talk at the 7th Copper Mountain Conference on Multigrid Methods, 1995.
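The parameter choices (4.2) are easy to tabulate. The sketch below (function name hypothetical) lists them for a given $k$ and checks two elementary properties: every pair satisfies $\alpha(1) + \alpha(2) = 2 = d$, and the points $(\sqrt{\alpha(1)/2}, \sqrt{\alpha(2)/2})$ lie on the unit circle at angles $\pi/4 + j\pi/(2k)$. Reading these angles as directions in Figure 2 is our interpretation for illustration, not a statement from the text.

```python
import math


def alpha_choices(k):
    """The 2*floor(k/2) + 1 pairs (alpha(1), alpha(2)) of (4.2), ordered by increasing j."""
    pairs = []
    for j in range(-(k // 2), k // 2 + 1):
        pairs.append(tuple(
            2.0 * math.sin(math.pi / 4 + (2 * i - 3) * j * math.pi / (2 * k)) ** 2
            for i in (1, 2)))
    return pairs


for a1, a2 in alpha_choices(5):
    # (sqrt(a1/2), sqrt(a2/2)) lies on the unit circle at angle pi/4 + j*pi/(2k),
    # so consecutive choices are pi/(2k) apart.
    angle = math.atan2(math.sqrt(a2 / 2), math.sqrt(a1 / 2))
    print(f"alpha = ({a1:.4f}, {a2:.4f}), sum = {a1 + a2:.4f}, angle/pi = {angle / math.pi:.3f}")
```

For $k = 1$ the only choice is $j = 0$, i.e. $\alpha(1) = \alpha(2) = 1$, which is the CBOX default.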

This approach, however, might be risky in the sense that a coarse-grid correction for a certain nearly singular error mode might enhance other nearly singular error modes. This phenomenon does not occur for the skew-Laplacian equation handled in [41] since, for this equation, for each singular error mode all other singular error modes lie in the null space of the restriction corresponding to it and thus do not appear in its coarse-grid correction term. Also, for the one-dimensional indefinite problem tested in [41] this phenomenon is not significant, since only two nearly singular modes exist. For multi-dimensional indefinite problems, however, this difficulty cannot be ignored. Indeed, applications of the two-level analysis method of Section 5 to two-dimensional indefinite Helmholtz equations show that the spectrum of the iteration matrix for the multiple coarse-grid correction algorithm deteriorates as $k$ increases, regardless of whether or not residuals are recalculated after each correction. The only cure is to use a sequence of $2\lfloor k/2\rfloor + 1$ V-cycles, each of which uses another choice of the parameters $\alpha(i)$ in (4.2) for its operators. (For multilevel implementations, each V-cycle uses the same choice of $j$ in (4.2) at all levels.) We denote this sequence of V-cycles by CBOX($k$) (where $k$ is the same as in (4.2), so that CBOX(1) is equivalent to CBOX). Clearly, the time consumption of CBOX($k$) is $2\lfloor k/2\rfloor + 1$ times as large as that of CBOX, since the same amount of smoothing is used in each V-cycle. The same is true for the storage requirements. Another possible multilevel implementation is a W-cycle of cycle index $2\lfloor k/2\rfloor + 1$, so that all the choices in (4.2) are used for every fine- or coarse-level problem in the multigrid cycle. This algorithm, however, is not considered here because of its large time and storage consumption when a large number of levels is used; it is left for future research.

5. Computational two-level analysis

5.1. Motivation and assumptions

The motivation for the definition of CBOX lies in the opportunity to further simplify the representation of $P$ for $(2d+1)$-coefficient stencils arising, for example, from finite difference or finite volume discretizations of (2.2). In this case, the coefficient matrix $A$ takes the form

$$A = \sum_{i=1}^d X_i \tag{5.1}$$

where $X_i$ represents a three-point discretization of the derivatives in the $i$th spatial direction (including possible substitution for these derivatives at the boundaries using boundary conditions). The present methods are designed so that computational two-level analysis for problems of the form (5.1) is available. This construction is done in the spirit of the AutoMUG method, which is specifically designed for such problems and is hence suitable for two-level analysis. For (5.1), AutoMUG is defined as follows. Let

$$P_i = 2I - \mathrm{diag}(X_i)^{-1}X_i\qquad\text{and}\qquad R_i = 2I - X_i\,\mathrm{diag}(X_i)^{-1} \tag{5.2}$$

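Define $T_i = \mathrm{rs}(P_i)$ and $\bar{T}_i = \prod_{j\ne i}T_j$. Define

$$R = J_c\prod_{1\le j\le d}R_j,\qquad P = \prod_{0\le j\le d-1}P_{d-j}\,J_c^t\qquad\text{and}\qquad Q = J_c\sum_{j=1}^d\bar{T}_jR_jX_jP_jJ_c^t \tag{5.3}$$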

Here the factors under the product symbol $\prod$ are taken in decreasing index order. (Note that the matrices $J_c\bar{T}_jJ_c^t$ used in the definition of $Q$ in (5.3) are usually constant multiples of the identity matrix.) Thus, for AutoMUG the transfer and coarse-grid operators are given explicitly in terms of the $X_i$'s, which enables computational two-level analysis. This is not always the case with CBOX. However, for certain geometries the lumping used in CBOX results in operators with such simple representations. To guarantee this property, some assumptions are needed.
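The AutoMUG formulas (5.2)–(5.3) can be evaluated explicitly for a simple separable case. This sketch assumes, for illustration, the 2D Dirichlet Laplacian in undivided form on a $7\times 7$ grid, split as $X_1 = T\otimes I$ and $X_2 = I\otimes T$; here the factors commute, so the ordering convention of the products does not affect the result.

```python
import numpy as np

N = 7                                         # grid points 1..N in each direction
T = 2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)   # 1D three-point operator
I = np.eye(N)
X = [np.kron(T, I), np.kron(I, T)]            # X_1 (first index), X_2 (second index)
A = X[0] + X[1]                               # (5.1) with d = 2


def inv_diag(M):
    """diag(M)^{-1} as a dense matrix."""
    return np.diag(1.0 / np.diag(M))


Id = np.eye(N * N)
Pi = [2 * Id - inv_diag(Xi) @ Xi for Xi in X]          # P_i of (5.2)
Ri = [2 * Id - Xi @ inv_diag(Xi) for Xi in X]          # R_i of (5.2)
Ti = [np.diag(Pm.sum(axis=1)) for Pm in Pi]            # T_i = rs(P_i)
Tbar = [Ti[1], Ti[0]]                                   # product over j != i, for d = 2

coarse_1d = [k for k in range(N) if (k + 1) % 2 == 0]  # even grid points 2, 4, 6
Jc = Id[[a * N + b for a in coarse_1d for b in coarse_1d]]

# (5.3); in this separable constant-coefficient example the factors commute.
P = Pi[0] @ Pi[1] @ Jc.T
R = Jc @ Ri[1] @ Ri[0]
Q = Jc @ sum(Tbar[j] @ Ri[j] @ X[j] @ Pi[j] for j in range(2)) @ Jc.T
```

For this example $P$ is bilinear interpolation (weight $1/4$ from each of the four neighbouring coarse points at a ‘◇’ point), $R = P^t$, and $J_c\bar{T}_jJ_c^t = 2I$, illustrating the remark that these matrices are usually constant multiples of the identity.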

Assume that the discretization of the term $\beta u$ in (2.2) contributes to $\mathrm{diag}(A)$ only and that the amount of this contribution at point $\vec{i}\in\boldsymbol{\Omega}$ is $\beta_{\vec{i}}$ (defined in (2.13)). Assume also that this contribution is distributed among the matrices $X_k$ decomposing $A$ in (5.1) such that the amount

$$\alpha(k)\,\beta_{\vec{i}}\cdot r_{\vec{i}+\vec{e}(\vec{i})}(X_k)/r_{\vec{i}+\vec{e}(\vec{i})}(A) \tag{5.4}$$

goes to $(X_k)_{\vec{i},\vec{i}}$ $(1\le k\le d)$. In order to guarantee consistency, it is required that

$$\sum_{k=1}^d\alpha(k)\cdot r_{\vec{i}+\vec{e}(\vec{i})}(X_k)/r_{\vec{i}+\vec{e}(\vec{i})}(A) = 1 \tag{5.5}$$

for every $\vec{i}\in\boldsymbol{\Omega}$. This may be considered a modification of the previous rule $\sum_{k=1}^d\alpha(k) = d$; the two agree on isotropic problems. Assume also that for each $\vec{i}\in\partial\boldsymbol{\Omega}$ and $1\le j\le d$

$$\big\{\vec{i}+1^{(j)}\notin\boldsymbol{\Omega}\ \text{or}\ \vec{i}-1^{(j)}\notin\boldsymbol{\Omega}\big\}\ \Rightarrow\ i_j\bmod 2\in\Sigma_{\vec{i}} \tag{5.6}$$

where $1^{(j)}$ denotes the $j$th column vector of $I_d$ and $\Sigma_{\vec{i}}$ is either $\{0\}$ or $\{1\}$. This assumption means that corner points of $\partial\boldsymbol{\Omega}$ belong to $g(\{1,\dots,d\})\cup c$. It is needed in CBOX to guarantee that (2.12) implies that $P$ is representable in terms of the $X_i$'s alone. (As a matter of fact, this assumption is a little too strong; it is sufficient to assume that (5.6) holds for directions $j$ in which boundary conditions other than Neumann are imposed.)

In the remainder of this section we consider the implementation (2.20) only. With this approach, one may substitute

$$P\leftarrow PJ_c^t,\qquad R\leftarrow J_cR\qquad\text{and}\qquad Q\leftarrow RAP \tag{5.7}$$

(in this order).

Theorem 5.1. Assume that (5.1) holds with the $\mathrm{diag}(X_i)^{-1}X_i$'s commuting with each other and the $X_i\,\mathrm{diag}(X_i)^{-1}$'s commuting with each other. Assume also that the above assumptions (namely, those made in (5.4)–(5.7)) hold. Then $R$ and $P$ for CBOX are the same as those of AutoMUG (defined in (5.3)).

For the proof, see Appendix B.

Note that multiple coarse-grid corrections are also available for AutoMUG, since multiple choices of the $\alpha(i)$'s in (5.4) induce multiple choices of the $X_i$'s decomposing $A$ in (5.1) and, therefore, multiple possible choices of the operators used in AutoMUG. We denote by AutoMUG($k$) the multiple coarse-grid correction algorithm which uses $2\lfloor k/2\rfloor + 1$ V-cycles with the same parameter choices as CBOX($k$).

5.2. The computational two-level analysis method

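Assume that $A$ is diagonalizable and that, for every eigenvector of $A$ of the form $v = \{v_{\vec{j}}\}_{\vec{j}\in\boldsymbol{\Omega}}$, the vectors

$$v^{(\sigma)}\equiv\Big\{(-1)^{\sum_{i=1}^d\sigma_ij_i}\,v_{\vec{j}}\Big\}_{\vec{j}\in\boldsymbol{\Omega}},\qquad\sigma\in\{0,1\}^d$$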

are also eigenvectors of $A$, with the corresponding eigenvalues $\lambda(\sigma)$. This assumption holds, for example, for (a) periodic problems with constant coefficients and an even number of grid points in each spatial direction; (b) problems with constant coefficients and $3^d$-coefficient completely symmetric stencils; and (c) problems of the form (5.1) (see [48], Section 7.1). Let us compute the symbol $\hat{A}$ of $A$, namely, the representation of $A$ in the subspace spanned by the $v^{(\sigma)}$'s for some $v$. The basis used for this is $\{J_{g(s)}^tJ_{g(s)}v\}_{s\subset\{1,\dots,d\}}$. For any set $s$, let $2^s$ denote the family of subsets of $s$. Define the isomorphisms

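$$s\in 2^{\{1,\dots,d\}}\ \mapsto\ \sigma(s)\in\{0,1\}^d\quad\text{by}\quad\sigma(s)_i = 1\Leftrightarrow i\in s$$

and

$$\mathrm{span}\{J_{g(s)}^tJ_{g(s)}v\}_{s\subset\{1,\dots,d\}}\ \to\ l_2(\{0,1\}^d)\quad\text{by}\quad J_{g(s)}^tJ_{g(s)}v\ \mapsto\ \sigma(s)$$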

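Then the symbol of $J_{g(s)}$ is row number $1 + \sum_{i=1}^d\sigma(s)_i2^{i-1}$ of $I_{2^d}$. Define the symmetric orthogonal discrete Haar transform by

$$H = (h_{\gamma,\delta})_{\gamma,\delta\in\{0,1\}^d},\qquad h_{\gamma,\delta} = 2^{-d/2}(-1)^{\sum_{i=1}^d\gamma_i\delta_i}$$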

Using the latter isomorphism, we have

$$\hat{A} = H\,\mathrm{diag}(\lambda(\sigma))_{\sigma\in\{0,1\}^d}\,H$$
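The identity $\hat{A} = H\,\mathrm{diag}(\lambda(\sigma))\,H$ can be confirmed numerically in the simplest instance of case (a): a 1D periodic Laplacian with an even number of points, where the two colors are the even and odd grid points. The sketch computes the representation of $A$ in the basis $\{J_{g(s)}^tJ_{g(s)}v\}$ by least squares; the function name `haar` and the lexicographic ordering of $\{0,1\}^d$ are assumptions and may differ from the indexing used in the paper.

```python
import itertools
import numpy as np


def haar(d):
    """H = (2^{-d/2} (-1)^{gamma . delta}) for gamma, delta in {0,1}^d (lexicographic order)."""
    sigmas = list(itertools.product((0, 1), repeat=d))
    return np.array([[2.0 ** (-d / 2) * (-1) ** sum(a * b for a, b in zip(gamma, delta))
                      for delta in sigmas] for gamma in sigmas])


# 1D periodic Laplacian with an even number of points (case (a)); colours: even / odd j.
n, theta = 8, 2 * np.pi / 8
A = 2 * np.eye(n) - np.roll(np.eye(n), 1, axis=1) - np.roll(np.eye(n), -1, axis=1)
j = np.arange(n)
v = np.cos(theta * j)                         # eigenvector with eigenvalue 2 - 2 cos(theta)
W = np.column_stack([np.where(j % 2 == 0, v, 0.0),    # J_{g(empty)}^t J_{g(empty)} v
                     np.where(j % 2 == 1, v, 0.0)])   # J_{g({1})}^t J_{g({1})} v
A_hat = np.linalg.lstsq(W, A @ W, rcond=None)[0]       # representation of A in this basis
lam = np.array([2 - 2 * np.cos(theta), 2 - 2 * np.cos(theta + np.pi)])  # lambda(0), lambda(1)
H = haar(1)
print(np.allclose(A_hat, H @ np.diag(lam) @ H))         # True
```

Here $v^{(1)}_j = (-1)^j\cos(\theta j) = \cos((\theta+\pi)j)$ is again an eigenvector, which is exactly the assumption made above.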

For cases (a) and (b), (2.12), (2.15) and (5.6) guarantee that $\hat{R}$ and $\hat{P}$ are also available. For case (c), assume that the $X_i$'s in (5.1) have constant main diagonals and commute with each other. (These assumptions hold, for example, for normalized finite volume discretizations of Dirichlet–Neumann convection–diffusion problems with constant diffusion and separable convection.) Then the $\hat{X}_i$'s can be computed in the same way as $\hat{A}$, and $\hat{R}$ and $\hat{P}$ result from the corresponding symbol products in (5.3) (using Theorem 5.1). The symbol of $Q$ is then obtained from the symbol product $\hat{Q} = \hat{R}\hat{A}\hat{P}$. For AutoMUG, the symbols of the restriction, prolongation and coarse-grid coefficient matrix are computed similarly, using the definitions (5.2)–(5.3) and assuming that the matrices $J_c\bar{T}_iJ_c^t$ used there are constant multiples of the identity matrix (see [29] for details).

Let $0\le k\le d-1$ be a fixed integer and consider a $2^{d-k}$-color hyperplane relaxation with $k$-dimensional hyperplanes (for example, when $k = 0$ this is a multi-color point relaxation with the colors $g(s)$, $s\subset\{1,\dots,d\}$). For the symbol $\hat{S}$ of this relaxation, write $\hat{A} = (\hat{A}_{i,j})_{1\le i,j\le 2^{d-k}}$, where the $\hat{A}_{i,j}$'s are blocks of order $2^k$ corresponding to the various colors. The symbol of the first color relaxation (with an overrelaxation parameter $\omega$) is given by

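$$\hat{S}_1 = \begin{pmatrix}(1-\omega)I_{2^k} & -\omega\hat{A}_{1,1}^{-1}\hat{A}_{1,2}\ \cdots\ -\omega\hat{A}_{1,1}^{-1}\hat{A}_{1,2^{d-k}}\\ 0 & I_{2^d-2^k}\end{pmatrix}$$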

and $\hat{S}$ is just the product of such symbols. The symbol of the iteration matrix (2.5) is then obtained by replacing the individual matrices there by their corresponding symbols. For multiple coarse-grid correction algorithms two implementations are possible: the additive approach, for which the iteration matrix symbol is

$$\hat{S}^{\nu_2}\Big(I_{2^d} - \Big(\sum_j\hat{P}_j\hat{Q}_j^{-1}\hat{R}_j\Big)\hat{A}\Big)\hat{S}^{\nu_1} \tag{5.8}$$

(where $j$ runs over all coarse-grid corrections), and the multiplicative approach, for which the residual is recomputed after every coarse-grid correction and the iteration matrix symbol is

$$\hat{S}^{\nu_2}\prod_j\big(I_{2^d} - \hat{P}_j\hat{Q}_j^{-1}\hat{R}_j\hat{A}\big)\hat{S}^{\nu_1} \tag{5.9}$$



The spectrum of the iteration matrix may be computed by scanning over the eigenvalues corresponding to the elements of a set $\Psi$ of eigenvectors of $A$ satisfying $|\Psi|\ge\lceil|\boldsymbol{\Omega}|/2^d\rceil$ and $u, v\in\Psi,\ u\ne v\Rightarrow u\notin\mathrm{span}\{v^{(\sigma)}\}_{\sigma\in\{0,1\}^d}$, and by computing numerically the spectra of the corresponding symbols of the iteration matrix.
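The color-relaxation symbol $\hat{S}_1$ and its products can be checked against an actual relaxation in the simplest setting: $d = 1$, $k = 0$, a 1D periodic Laplacian, and red–black point SOR with the even color $g(\emptyset)$ relaxed first. The sketch compares the product of symbols with the representation of the full sweep on the two-dimensional invariant subspace; the function names `color_relaxation_symbol` and `point_sweep` are hypothetical.

```python
import numpy as np


def color_relaxation_symbol(A_hat, rows, omega):
    """Symbol of one colour sweep: rows `rows` become [(1-omega) I, -omega A_11^{-1} A_1j ...]."""
    S = np.eye(A_hat.shape[0])
    block = np.ix_(rows, rows)
    S[rows, :] = -omega * np.linalg.solve(A_hat[block], A_hat[rows, :])
    S[block] += np.eye(len(rows))
    return S


def point_sweep(A, mask, omega):
    """Error-propagation matrix of one SOR sweep over the mutually uncoupled points in mask."""
    S = np.eye(len(A))
    for i in np.flatnonzero(mask):
        S[i] = -omega * A[i] / A[i, i]
        S[i, i] += 1.0
    return S


n, theta, omega = 8, 2 * np.pi / 8, 1.0
A = 2 * np.eye(n) - np.roll(np.eye(n), 1, axis=1) - np.roll(np.eye(n), -1, axis=1)
j = np.arange(n)
v = np.cos(theta * j)
W = np.column_stack([np.where(j % 2 == 0, v, 0.0), np.where(j % 2 == 1, v, 0.0)])
A_hat = np.linalg.lstsq(W, A @ W, rcond=None)[0]

# Symbol of the RB sweep: relax the even colour first, then the odd colour.
S_hat = color_relaxation_symbol(A_hat, [1], omega) @ color_relaxation_symbol(A_hat, [0], omega)
S_full = point_sweep(A, j % 2 == 1, omega) @ point_sweep(A, j % 2 == 0, omega)
S_rep = np.linalg.lstsq(W, S_full @ W, rcond=None)[0]
print(np.allclose(S_rep, S_hat))   # True
```

Since points of one color are not coupled to each other here, relaxing them one by one or simultaneously gives the same sweep, which is what allows the block formula for $\hat{S}_1$.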

5.3. Applications of the computational two-level analysis method

For separable problems of the form (5.1) the spectrum of $A$ is the sum of those of the matrices $X_i$. In some cases it is known that these spectra lie in the interior of certain ellipses (or circles, using Gershgorin's theorem) in the complex plane. Since the eigenvalues of the iteration matrix are given as a meromorphic function of those of the $X_i$'s, one may scan over the corresponding ellipses (and possible poles of this function) and obtain an upper bound for the asymptotic convergence factor. Alternatively, the spectra of the $X_i$'s can be computed numerically, either by an LU method or by a Lanczos-type method.

5.3.1. Finding optimal implementation parameters

The computational two-level analysis method may be used for finding optimal implementation parameters such as the overrelaxation parameter $\omega$ and a residual over-weighting parameter [13]. (Different overrelaxation parameters may be used in the various smoothings.) In [46], it is shown that the smoothing factor for (possibly anisotropic) diffusion equations has a unique local minimum as a function of the overrelaxation parameter $\omega$ in the permissible $\omega$-interval $[1, 2]$. Here we consider a more general question, that is, finding $n$ optimal implementation parameters (e.g., overrelaxation and residual over-weighting parameters) in corresponding permissible intervals $F_1, F_2,\dots,F_n$. Since we are not able to find a global optimum of these parameters for the multilevel method, we show how to find a local minimum in $F_1\times F_2\times\cdots\times F_n$ of the spectral radius of the two-level iteration matrix. Let $g_m = (\sqrt{5}-1)/2$ denote the so-called ‘golden mean’. For any interval $[a, b]$, let the golden mean partitioning be defined by the set of four points

$$f_1([a,b]) = a,\qquad f_2([a,b]) = a + (1-g_m)(b-a),\qquad f_3([a,b]) = a + g_m(b-a),\qquad f_4([a,b]) = b.$$

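We are interested in finding small intervals $G_k \subset F_k$ ($1 \le k \le n$) such that a local minimum of the spectral radius of the two-level iteration matrix is contained in $G_1 \times G_2 \times \cdots \times G_n$. The initial guess is, of course, $G_k = F_k$ ($1 \le k \le n$). In order to improve this guess, we consider the set of $4^n$ $n$-dimensional vectors

$$\{\vec{x} = (x_1, \ldots, x_n) \mid x_k \in \{f_l(G_k)\}_{l=1}^{4},\ 1 \le k \le n\}.$$

We find the vector $\vec{y} = (y_1, \ldots, y_n)$ of parameter values for which the spectral radius of the two-level iteration matrix is minimized over this set. For $1 \le k \le n$, we update $G_k$ by

$$G_k \leftarrow \begin{cases} [f_1(G_k), f_3(G_k)] & \text{if } y_k \in \{f_1(G_k), f_2(G_k)\},\\ [f_2(G_k), f_4(G_k)] & \text{if } y_k \in \{f_3(G_k), f_4(G_k)\}.\end{cases}$$

In each update, the length of $G_k$ is multiplied by $g_m$. Moreover, since $g_m^2 = 1 - g_m$, three of the four partition points of the updated $G_k$ coincide with partition points of the previous one, so only $4^n - 3^n$ of the $4^n$ vectors are new in each subsequent step. If accuracy $\varepsilon$ is desired for all the implementation parameters, then a total of $(4^n - 3^n)\lceil \ln(\varepsilon)/\ln(g_m) \rceil + 3^n$ applications of the computational two-level analysis method is required.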
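The golden mean search above can be sketched as follows. This is a minimal sketch under our own assumptions: the callable `rho` stands in for one application of the computational two-level analysis (any function of the parameter vector will do for testing), and `golden_grid_search` and its arguments are hypothetical names. Evaluations are cached, so the three reused partition points of each interval are not recomputed; this reuse is what produces the count $(4^n-3^n)\lceil\ln\varepsilon/\ln g_m\rceil + 3^n$ for intervals of unit length.

```python
import itertools
import math

GM = (math.sqrt(5.0) - 1.0) / 2.0  # golden mean g_m, about 0.618


def golden_grid_search(rho, intervals, eps):
    """Shrink boxes G_1 x ... x G_n that contain a local minimiser of rho.

    rho       -- function of an n-tuple of parameter values; a stand-in for
                 the two-level spectral radius (hypothetical, user supplied)
    intervals -- permissible intervals F_k as (a, b) pairs
    eps       -- target length of every G_k
    Returns the final intervals and the number of distinct rho evaluations.
    """
    # Four golden-mean partition points f_1..f_4 per interval.
    pts = [(a, a + (1 - GM) * (b - a), a + GM * (b - a), b) for a, b in intervals]
    cache = {}

    def value(x):
        if x not in cache:          # points shared with the previous step are reused
            cache[x] = rho(x)
        return cache[x]

    while max(p[3] - p[0] for p in pts) > eps:
        y = min(itertools.product(*pts), key=value)   # best of the 4^n vectors
        new_pts = []
        for (f1, f2, f3, f4), yk in zip(pts, y):
            if yk in (f1, f2):
                # G_k <- [f1, f3]: old f2, f3 become the new f3, f4
                new_pts.append((f1, f1 + (1 - GM) * (f3 - f1), f2, f3))
            else:
                # G_k <- [f2, f4]: old f2, f3 become the new f1, f2
                new_pts.append((f2, f3, f2 + GM * (f4 - f2), f4))
        pts = new_pts
    return [(p[0], p[3]) for p in pts], len(cache)
```

For example, with `rho(x) = (x[0] - 1.3)**2 + (x[1] - 1.7)**2`, $F_1 = F_2 = [1, 2]$ and $\varepsilon = 10^{-3}$, both final intervals contain the minimizer and the function is evaluated $3^2 + 7\cdot 15 = 114$ times, in agreement with the count above.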



5.3.2. The model problem

We apply the computational two-level analysis method to model problems for which the spectra of the $X_i$'s are known in advance, namely, Dirichlet problems of the form

$$-u_{xx} - u_{yy} - c u_x - e u_y + \beta u = f$$

in the unit square, where $c$, $e$ and $\beta$ are parameters. The second-order five-coefficient difference scheme is used on an $N \times N$ uniform grid with cell size $h = 1/(N+1)$. The analysis can also be applied when $c$ is a function of $x$ and $e$ is a function of $y$, since the problem is still separable; however, in order to avoid computing spectra of tridiagonal matrices, we consider constant $c$ and $e$ only. We use the point red–black Gauss–Seidel (RB) smoother, for which $\omega = 1$ and $k = 0$ in Section 5.2, and the colors $g(s)$ with even $|s|$ are relaxed before those with odd $|s|$. The coarsening (2.20), (5.7) is used.

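The eigenvalues of $A$ are calculated as follows. Define the diagonal two-dimensional tensors

$$E = \mathrm{diag}\left(\left(\frac{1 - ch/2}{1 + ch/2}\right)^{i_1/2}\left(\frac{1 - eh/2}{1 + eh/2}\right)^{i_2/2}\right)_{(i_1,i_2)\in\{1,\ldots,N\}\times\{1,\ldots,N\}}$$

and

$$\Lambda = \mathrm{diag}\left(x_{i_1}(c) + x_{i_2}(e)\right)_{(i_1,i_2)\in\{1,\ldots,N\}\times\{1,\ldots,N\}},$$

where, for any $t$ and $1 \le k \le N$,

$$x_k(t) = 2 + \beta h^2/2 - 2\sqrt{(1 - th/2)(1 + th/2)}\,\cos(\pi k h).$$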

Let $F$ denote the sine-transform tensor whose column vectors are given in (6.2). Then we have

$$\Lambda = E F^{-1} E^{-1} A E F E^{-1},$$

and the eigenvalues of $A$ are given by the diagonal elements of $\Lambda$. Here the diagonalizing matrix $E F E^{-1}$ is preferable because of its small condition number (in comparison, e.g., with that of $EF$), which indicates that the norm of the iteration matrix is not much larger than its spectral radius.
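The eigenvalue formula can be checked numerically on a small grid. The sketch below uses NumPy; the helper names and the values $N = 7$, $c = 3$, $e = -2$, $\beta = -20$ are our own choices. It assembles the scaled five-point matrix $h^2 A$ as a Kronecker sum of two tridiagonal matrices (with the $x$-index $i_1$ varying fastest) and compares its eigenvalues with the diagonal of $\Lambda$.

```python
import numpy as np


def tridiag_1d(N, t, beta, h):
    """Scaled 1-D operator h^2(-u'' - t u') + (beta h^2 / 2) u, central differences."""
    lower = -(1.0 - t * h / 2.0)   # coefficient of u_{i-1}
    upper = -(1.0 + t * h / 2.0)   # coefficient of u_{i+1}
    main = 2.0 + beta * h**2 / 2.0
    return (np.diag(np.full(N, main)) + np.diag(np.full(N - 1, lower), -1)
            + np.diag(np.full(N - 1, upper), 1))


def model_matrix(N, c, e, beta):
    """Scaled matrix h^2 A for -u_xx - u_yy - c u_x - e u_y + beta u (i1 fastest)."""
    h = 1.0 / (N + 1)
    I = np.eye(N)
    return np.kron(I, tridiag_1d(N, c, beta, h)) + np.kron(tridiag_1d(N, e, beta, h), I)


def predicted_eigenvalues(N, c, e, beta):
    """Diagonal of Lambda: x_{i1}(c) + x_{i2}(e) for all (i1, i2)."""
    h = 1.0 / (N + 1)
    k = np.arange(1, N + 1)

    def x(t):
        return (2.0 + beta * h**2 / 2.0
                - 2.0 * np.sqrt((1 - t * h / 2) * (1 + t * h / 2)) * np.cos(np.pi * k * h))

    return np.add.outer(x(c), x(e)).ravel()


N, c, e, beta = 7, 3.0, -2.0, -20.0
computed = np.linalg.eigvals(model_matrix(N, c, e, beta))
predicted = predicted_eigenvalues(N, c, e, beta)
print(np.max(np.abs(np.sort(computed.real) - np.sort(predicted))))
```

The printed maximum deviation should be at rounding-error level, since the matrix is only mildly non-normal for these parameters.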

5.3.3. The Poisson equation

The first example is the Poisson equation. In Table 1 we display the spectral radius of the iteration matrix of CBOX and AutoMUG for various $\nu = \nu_1 + \nu_2$ (only the sum matters; see [25]). Since for odd $N$ AutoMUG is equivalent to standard multigrid, the results for AutoMUG are expected to be close to the bound presented in [25]. Furthermore, since CBOX employs a Galerkin approach, its spectral radius is expected to be smaller than that of AutoMUG. Table 1 confirms both expectations.

5.3.4. A convection-diffusion equation

Next, we examine the (non-normal) convection-diffusion equation

$$-u_{xx} - u_{yy} + \frac{15}{8h}(u_x + \gamma u_y) = f \qquad (5.10)$$

with some parameter $\gamma$ and Dirichlet boundary conditions. The results for the cases $\gamma = 1$ and $\gamma = 0.8$ are summarized in Table 2. The parameter $\gamma$ is introduced to allow problems whose characteristics are not obliquely aligned with the grid.

Table 1. Spectral radii of two-level iteration matrices for the Poisson equation. The bound for the spectral radius presented in [25] is displayed in the last column.

| ν | CBOX (N = 31) | AutoMUG (N = 31) | CBOX (N = 63) | AutoMUG (N = 63) | Bound [25] |
|---|---|---|---|---|---|
| 1 | 0.2452 | 0.2452 | 0.2488 | 0.2488 | 0.2500 |
| 2 | 0.0609 | 0.0738 | 0.0621 | 0.0739 | 0.0741 |
| 3 | 0.0267 | 0.0523 | 0.0280 | 0.0526 | 0.0527 |
| 4 | 0.0200 | 0.0407 | 0.0212 | 0.0408 | 0.0410 |

Table 2. Spectral radii of two-level iteration matrices for the non-normal convection-diffusion equation with Dirichlet boundary conditions. $\nu_1 = \nu_2 = 1$ are used.

| N | CBOX (γ = 1) | AutoMUG (γ = 1) | CBOX (γ = 0.8) | AutoMUG (γ = 0.8) |
|---|---|---|---|---|
| 15 | 0.0008 | 0.0060 | 0.011 | 0.023 |
| 31 | 0.0369 | 0.0741 | 0.012 | 0.024 |
| 63 | 0.0554 | 0.0740 | 0.012 | 0.024 |

The convection coefficient $15/(8h)$ is sufficiently small to guarantee diagonal dominance of $A$. If it were $2/h$, then $A$ would correspond to the upwind scheme and would not be diagonalizable; thus it would not be suitable for our analysis. (If periodic boundary conditions had been imposed, though, the present approach would be applicable and equivalent to that of [40].) It was mentioned in [15] and proved in [34] that using half the artificial viscosity of the upwind scheme yields a stable discretization and a diagonalizable coefficient matrix. Using half artificial viscosity or periodic boundary conditions, however, would require complex arithmetic in the computational two-level analysis. Our aim is to show the applicability of the analysis method; hence, we avoid these difficulties by using the slightly perturbed upwind scheme (5.10). For highly non-symmetric problems of this type, the black box multigrid version of [15] is considered optimal in [47]. It is not claimed here that CBOX is superior to it for such problems, but merely that it is suitable for computational two-level analysis. Although ILU [39] and lexicographically ordered Gauss–Seidel relaxation [12] are more suitable smoothers for this problem than RB, their parallel implementation is much less efficient.
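The diagonalizability remark can be seen on the one-dimensional factor of (5.10); the two-dimensional matrix is a Kronecker sum of such factors. The sketch below is ours: `convection_1d` is a hypothetical helper giving the scaled central scheme for $-u'' + bu'$ with $bh$ as a parameter. For $bh = 15/8$ both off-diagonals are negative ($-31/16$ and $-1/16$), the rows are weakly diagonally dominant, and the eigenvalues are real and given by the tridiagonal Toeplitz formula. For $bh = 2$ the coupling to $u_{i+1}$ vanishes, and the eigenvalue 2 has a single eigenvector, so the matrix is not diagonalizable.

```python
import numpy as np


def convection_1d(N, bh):
    """Scaled central scheme h^2(-u'' + b u') on N interior points, bh = b*h."""
    lower = -(1.0 + bh / 2.0)   # coefficient of u_{i-1}
    upper = -(1.0 - bh / 2.0)   # coefficient of u_{i+1}
    return 2.0 * np.eye(N) + lower * np.eye(N, k=-1) + upper * np.eye(N, k=1)


N = 8
k = np.arange(1, N + 1)

# Coefficient 15/(8h): off-diagonals -31/16 and -1/16, weak diagonal dominance.
T = convection_1d(N, 15.0 / 8.0)
predicted = np.sort(2.0 - 2.0 * (np.sqrt(31.0) / 16.0) * np.cos(np.pi * k / (N + 1)))
computed = np.sort(np.linalg.eigvals(T).real)

# Coefficient 2/h: the u_{i+1} coupling vanishes and T - 2I is strictly lower bidiagonal.
U = convection_1d(N, 2.0)
geometric_multiplicity = N - np.linalg.matrix_rank(U - 2.0 * np.eye(N))
print(np.max(np.abs(computed - predicted)), geometric_multiplicity)
```

In the upwind case the eigenvalue 2 has algebraic multiplicity $N$ but geometric multiplicity 1, which is exactly the failure of diagonalizability referred to in the text.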

5.3.5. Indefinite equations

Finally, we consider indefinite Helmholtz equations in the unit square with Dirichlet boundary conditions. For the slightly indefinite equation

$$-u_{xx} - u_{yy} - 20u = f \qquad (5.11)$$

we obtained spectral radii similar to those for the Poisson equation, namely, 0.0625 for CBOX and 0.0741 for AutoMUG ($N = 63$, $\nu = 2$). These results agree with results in [30,33] (see Theorem 1 and Corollary 1 there). We then turn to the more difficult, highly indefinite equation

$$-u_{xx} - u_{yy} - 790u = f \qquad (5.12)$$



Table 3. The 10 largest negative eigenvalues of $A$ (scaled) and the 10 largest (in magnitude) eigenvalues of two-level iteration matrices for the highly indefinite Helmholtz equation. $\nu_1 = \nu_2 = 1$ are used.

| A (N = 15) | CBOX (N = 15) | AMUG (N = 15) | A (N = 31) | CBOX (N = 31) | AMUG (N = 31) | A (N = 63) | CBOX (N = 63) | AMUG (N = 63) |
|---|---|---|---|---|---|---|---|---|
| −0.241 | 174.13 | −886.52 | $-1.14\times10^{-3}$ | −6.710 | −17.56 | $-1.16\times10^{-3}$ | −3.409 | 1.478 |
| −0.368 | 119.77 | −727.10 | $-1.90\times10^{-2}$ | 1.680 | 2.91 | $-8.42\times10^{-3}$ | 0.751 | −0.928 |
| −0.431 | 61.56 | −594.85 | $-2.07\times10^{-2}$ | 1.028 | −1.75 | $-9.97\times10^{-3}$ | 0.329 | 0.451 |
| −0.528 | −42.44 | −511.42 | $-5.04\times10^{-2}$ | −0.700 | 0.981 | $-1.08\times10^{-2}$ | 0.152 | −0.183 |
| −0.617 | −33.37 | −416.42 | $-6.03\times10^{-2}$ | 0.554 | 0.786 | $-1.63\times10^{-2}$ | −0.136 | −0.136 |
| −0.785 | −21.84 | −294.54 | $-6.17\times10^{-2}$ | −0.533 | −0.673 | $-1.97\times10^{-2}$ | 0.130 | 0.126 |
| −1.094 | 21.70 | −288.96 | $-9.12\times10^{-2}$ | −0.493 | 0.542 | $-2.01\times10^{-2}$ | 0.108 | 0.098 |
| −1.189 | −20.85 | −237.74 | −0.102 | 0.420 | −0.530 | $-2.46\times10^{-2}$ | −0.097 | −0.096 |
| −1.285 | −15.76 | −162.36 | −0.109 | 0.351 | −0.508 | $-2.85\times10^{-2}$ | 0.092 | 0.089 |
| −1.349 | −11.81 | −122.89 | −0.123 | 0.279 | −0.423 | $-3.48\times10^{-2}$ | −0.087 | 0.084 |

The (scaled) coefficient matrix for this problem with $N = 15$, 31 and 63 has 38, 32 and 30 distinct negative eigenvalues, respectively, the 10 smallest (in magnitude) of which are displayed in the first, fourth and seventh columns of Table 3. This shows that the problem is nearly singular in the sense of [10]. In the other columns of Table 3, the 10 largest (in magnitude) eigenvalues of the iteration matrices of CBOX and AutoMUG are displayed (AutoMUG is denoted by AMUG for short).

Although the basic iteration (2.4) diverges, for $N = 31$ and $N = 63$ there exist only a few isolated eigenvalues of magnitude larger than or close to one. Consequently, a Lanczos-type acceleration method applied to (2.4) is expected to yield optimal linear combinations of iterates, in the sense that error components corresponding to large eigenvalues are annihilated. This implementation would yield convergence controlled by the small eigenvalues alone, as confirmed numerically in Section 7.2.2. Moreover, the analysis in [6] and the computational multilevel analysis in [30,33] show that the number of levels may be arbitrarily enlarged so long as the appropriate mesh size for the coarsest grid is preserved. A $31 \times 31$ or a $15 \times 15$ grid is thus suitable as the coarsest grid in a multilevel implementation for this problem (see also Table 11).
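The effect of a Krylov acceleration on a few isolated large eigenvalues can be illustrated on a synthetic example. The sketch below is ours, not the paper's: it builds a diagonal stand-in $M$ for the iteration matrix with three outlying eigenvalues of magnitude larger than one and all others in $[-0.1, 0.1]$, runs the basic iteration $x \leftarrow Mx + c$ (which diverges), and then applies a Krylov method to the equivalent system $(I - M)x = c$. For simplicity it uses plain GMRES rather than the CGS acceleration of Section 7.2.2; the helper `gmres_residual_history` and all numbers are hypothetical.

```python
import numpy as np


def gmres_residual_history(B, b, m):
    """Relative true residuals ||b - B x_j|| / ||b|| of plain GMRES (x_0 = 0), j = 1..m."""
    n = b.size
    beta = np.linalg.norm(b)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = b / beta
    history = []
    for j in range(m):
        w = B @ V[:, j]
        for i in range(j + 1):                 # Arnoldi with modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] > 1e-14:
            V[:, j + 1] = w / H[j + 1, j]
        rhs = np.zeros(j + 2)
        rhs[0] = beta
        y = np.linalg.lstsq(H[: j + 2, : j + 1], rhs, rcond=None)[0]
        x = V[:, : j + 1] @ y
        history.append(np.linalg.norm(b - B @ x) / beta)
    return history


# Synthetic iteration matrix (assumption): three outliers, the rest small.
n = 50
mu = np.concatenate(([5.0, -3.0, 2.0], np.linspace(-0.1, 0.1, n - 3)))
M = np.diag(mu)
c = np.ones(n)

x = np.zeros(n)
for _ in range(12):
    x = M @ x + c                              # basic iteration: diverges like 5^k
basic_residual = np.linalg.norm(c - (np.eye(n) - M) @ x) / np.linalg.norm(c)

krylov = gmres_residual_history(np.eye(n) - M, c, 12)   # accelerated iteration
print(basic_residual, krylov[-1])
```

After a few steps spent on the three outliers, the GMRES residual decreases at a rate governed by the small eigenvalues alone; with this data it is below $10^{-8}$ after twelve steps, while the basic iteration has grown by more than six orders of magnitude.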

5.3.6. Multiple coarse-grid correction for highly indefinite equations

For example (5.12) we have also examined the multiple coarse-grid correction variants CBOX(5) and AutoMUG(5), defined in Section 4 and at the end of Section 5.1, respectively. As mentioned there, neither (5.8) nor (5.9) was found to improve the above spectra; the only cure is to consider five consecutive V-cycles (using different choices of the parameters $\alpha^{(i)}$ and, hence, different operators in each V-cycle) as a single iteration. The seven largest (in magnitude) eigenvalues of the iteration matrix for this iteration are displayed in Table 4. Table 4 also shows that $\nu_1 = 0$ achieves almost the same results as $\nu_1 = 1$. The more efficient V(0,1)-cycle is thus preferable for multilevel implementations, assuming that most of the computational work lies in the smoothing procedure (see Table 12 below).

For $N = 31$, the spectrum of AutoMUG($k$) deteriorates with increasing $k$. The spectrum of CBOX($k$), however, keeps improving, as is illustrated in Table 5. Although large values of $k$ might be impractical due to storage constraints, the results in this table indicate that the algorithms are developed in the right direction.



Table 4. Seven largest (in magnitude) eigenvalues of two-level iteration matrices for the highly indefinite Helmholtz equation

| N | Method | ν1 | ν2 | Eigenvalues |
|---|---|---|---|---|
| 31 | AutoMUG(5) | 0 | 1 | 155.9, −6.937, −6.506, 4.558, 4.533, 1.069, −0.550 |
| 31 | CBOX(5) | 0 | 1 | 0.481, 0.284, −0.250, −0.247, −0.210, 0.165, 0.122 |
| 31 | AutoMUG(5) | 1 | 1 | 209.9, 4.069, 4.069, −3.933, −3.908, 1.073, 1.073 |
| 31 | CBOX(5) | 1 | 1 | 0.493, 0.492, −0.237, −0.237, 0.038, −0.032, −0.028 |
| 63 | AutoMUG(5) | 0 | 1 | −0.136, −0.135, −0.074, −0.073, −0.019, −0.019, 0.002 |
| 63 | CBOX(5) | 0 | 1 | 0.0024, 0.0016, 0.0016, 0.0016, 0.0015, 0.0015, 0.0015 |
| 63 | AutoMUG(5) | 1 | 1 | −0.134, −0.134, −0.075, −0.075, −0.018, −0.018, −0.0001 |
| 63 | CBOX(5) | 1 | 1 | 0.0007, 0.0007, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000 |

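Table 5. Spectral radii of two-level iteration matrices for the highly indefinite Helmholtz equation with $N = 31$

| k | ν1 | ν2 | CBOX(k) |
|---|---|---|---|
| 9 | 1 | 1 | 0.105 |
| 9 | 0 | 1 | 0.277 |
| 11 | 1 | 1 | 0.230 |
| 11 | 0 | 1 | 0.131 |
| 13 | 1 | 1 | 0.061 |
| 13 | 0 | 1 | 0.059 |
| 15 | 1 | 1 | 0.085 |
| 15 | 0 | 1 | 0.092 |
| 17 | 1 | 1 | 0.0005 |
| 17 | 0 | 1 | 0.007 |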

6. Related multigrid algorithms

6.1. Black-box multigrid versions

Here we describe a prototype version of the black box multigrid method of [14] in the two-dimensional case and modify it in the spirit of CBOX. This discussion clarifies the relation between CBOX and black box multigrid, as well as the advantages and disadvantages of both methods. For brevity, we denote black box multigrid by BBOX.

The only difference between the definition of CBOX in Section 2.3 and that of BBOX is that, for BBOX, (2.9)–(2.10) are replaced by the following procedure. Do a column sum (lumping) on the stencil (2.7):

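$$\begin{array}{rccc} & NW & N & NE\\ + & W & C & E\\ + & SW & S & SE\\ \hline & L & K & U \end{array} \qquad (6.1)$$

that is, $L = NW + W + SW$, $K = N + C + S$ and $U = NE + E + SE$.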

Then, replace (2.10) by $f_0 = -K^{-1}(L c_{-1} + U c_1)$.

Note that if $NE$ dominates $E$ and $SE$ (namely, $|NE| \gg |E|$ and $|NE| \gg |SE|$), then $U$ dominates $E$, resulting in a strong coupling on the right that does not exist in the stencil of $A$. This phenomenon might lead to the stagnation described in Section 7.1 below. The fix suggested here, in the spirit of CBOX, is to 'throw' large stencil elements (such as $NE$ in the above case) onto the main diagonal. (This is done, of course, only for the construction of the stencils of $R$ and $P$; the stencil of $A$ remains unchanged.) More precisely, if $|NE| \ge \tau|E|$ (where $\tau \ge 1$ is a predetermined parameter), then the column sum (6.1) uses not the stencil of $A$ (2.7) but rather the modified stencil

$$\begin{array}{ccc} NW & N & 0\\ W & C + NE & E\\ SW & S & SE \end{array}$$

(and similarly when $|NW| \ge \tau|W|$, etc.). A similar procedure is used for prolongation in the $y$-direction. The parameter $\tau$ should lie between the small and large values of the scaled diffusion coefficient ($\tau$ might also vary with the spatial location, but this is not considered here). Reasonable choices for $\tau$ might be $\tau = 1$ or

$$1 \le \tau \le \min_{\vec i, \vec j \in \Omega,\ |a_{\vec i,\vec i}| \ge 4|a_{\vec j,\vec j}|} \frac{|a_{\vec i,\vec i}|}{|a_{\vec j,\vec j}|}.$$

This approach is denoted hereafter by BBOX($\tau$). (Note that the continuous parameter $\tau$ is not the same as the integer parameter $k$ in CBOX($k$), which represents the number of V-cycles used in a single CBOX($k$) iteration. In BBOX($\tau$) only one V-cycle is used.) It may be considered a compromise between CBOX (which enjoys the theoretical background of Theorem 3.1 but is inefficient for some examples) and BBOX.
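The lumping (6.1) and the BBOX($\tau$) modification can be sketched in a few lines. This is a hypothetical helper, not code from [14]: the stencil is a Python dict keyed by compass names, the stencil values are made up to reproduce the situation $|NE| \gg |E|, |SE|$, and we read "and similarly when $|NW| \ge \tau|W|$ etc." as applying the same test to every corner element against the horizontal neighbour on its side.

```python
def bbox_weights(stencil, tau=None):
    """Prolongation weights (left, right) at a fine point f_0 between two coarse points,
    obtained from the column sum (6.1): f_0 = -K^{-1}(L c_{-1} + U c_1).

    stencil -- dict with keys 'NW', 'N', 'NE', 'W', 'C', 'E', 'SW', 'S', 'SE'
    tau     -- if given, apply the BBOX(tau) modification before lumping
    """
    st = dict(stencil)  # copy: only the stencils of R and P are affected, not A
    if tau is not None:
        # A corner entry at least tau times the horizontal neighbour on its side is
        # "thrown" onto the main diagonal (our reading of "etc." in the text).
        for corner, side in (("NE", "E"), ("SE", "E"), ("NW", "W"), ("SW", "W")):
            if abs(st[corner]) >= tau * abs(st[side]):
                st["C"] += st[corner]
                st[corner] = 0.0
    L = st["NW"] + st["W"] + st["SW"]
    K = st["N"] + st["C"] + st["S"]
    U = st["NE"] + st["E"] + st["SE"]
    return -L / K, -U / K


# Made-up stencil with |NE| >> |E|, |SE| and zero row sum.
stencil = {"NW": 0.0, "N": -1.0, "NE": -1000.0,
           "W": -1.0, "C": 1004.0, "E": -1.0,
           "SW": 0.0, "S": -1.0, "SE": 0.0}
print(bbox_weights(stencil))            # BBOX: nearly all weight on the right
print(bbox_weights(stencil, tau=10.0))  # BBOX(10): NE moved to the centre
```

Without the modification the weights are $1/1002$ and $1001/1002$, so almost the whole value is taken from the right neighbour; with $\tau = 10$ the $NE$ entry is moved to the centre and both weights become $1/2$.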

6.2. Multigrid for indefinite problems

An important feature of well-constructed multigrid algorithms is that the prolongation is almost accurate for the nearly singular Fourier modes of the equation. Here a mode denotes a vector of the form

$$\left(e^{2\pi h\sqrt{-1}\sum_{i=1}^{d} j_i\omega_i}\right)_{1\le j_k\le N,\ 1\le k\le d} \quad \left(h = \tfrac{1}{N}\right) \text{ for periodic boundary conditions,}$$

$$\left(\prod_{i=1}^{d}\sin(\pi j_i\omega_i h)\right)_{1\le j_k\le N,\ 1\le k\le d} \quad \left(h = \tfrac{1}{N+1}\right) \text{ for Dirichlet boundary conditions,} \qquad (6.2)$$

with the corresponding frequency vector $\vec\omega = (\omega_1, \ldots, \omega_d) \in \{1, \ldots, N\}^d$. For the Poisson equation, for example, the prolongation is accurate for the constant vector, corresponding to $\vec\omega = \vec 0$ [9,17,37]. For diffusion problems with discontinuous coefficients, it is accurate for certain grid functions with constant flux [2,14,36]. For highly non-symmetric problems, it should also be accurate for modes that are smooth in the characteristic direction [47]. The problem of constructing a suitable prolongation for indefinite problems is considered here.

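Consider first the periodic two-dimensional case (2.6)–(2.7). A nearly singular mode $(\omega_1, \omega_2)$ should satisfy

$$\begin{aligned} &C + N e^{2\pi h\sqrt{-1}\,\omega_2} + S e^{-2\pi h\sqrt{-1}\,\omega_2} + e^{2\pi h\sqrt{-1}\,\omega_1}\left(E + NE\,e^{2\pi h\sqrt{-1}\,\omega_2} + SE\,e^{-2\pi h\sqrt{-1}\,\omega_2}\right)\\ &\quad + e^{-2\pi h\sqrt{-1}\,\omega_1}\left(W + NW\,e^{2\pi h\sqrt{-1}\,\omega_2} + SW\,e^{-2\pi h\sqrt{-1}\,\omega_2}\right) = 0. \end{aligned} \qquad (6.3)$$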

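A proper prolongation at the point $f_0$ in (2.6) for such a mode should thus use a column sum of the form

$$\begin{array}{rccc} & NW\,e^{2\pi h\sqrt{-1}\,\omega_2} & N\,e^{2\pi h\sqrt{-1}\,\omega_2} & NE\,e^{2\pi h\sqrt{-1}\,\omega_2}\\ + & W & C & E\\ + & SW\,e^{-2\pi h\sqrt{-1}\,\omega_2} & S\,e^{-2\pi h\sqrt{-1}\,\omega_2} & SE\,e^{-2\pi h\sqrt{-1}\,\omega_2} \end{array} \qquad (6.4)$$

rather than (6.1), where (6.3) can be solved locally for $\omega_1$ and $\omega_2$. Note that the $x$-direction frequency $\omega_1$ does not appear in (6.4), since it is incorporated in the resulting one-dimensional stencil, which represents a one-dimensional indefinite problem in the $x$-direction.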

The situation simplifies considerably when the nearly singular modes are derived from the PDE rather than the discrete stencil. Consider, for example, the periodic two-dimensional indefinite Helmholtz equation

$$-u_{xx} - u_{yy} + \beta u = f \qquad (6.5)$$

in the unit square with periodic boundary conditions, where $\beta < 0$ is a parameter. Here the continuous nearly singular modes are functions of the form $e^{2\pi\sqrt{-1}(\omega_1 x + \omega_2 y)}$ with $\omega_1^2 + \omega_2^2 \approx -\beta/(4\pi^2)$. We seek a prolongation that is almost accurate for nearly singular modes with $|\omega_1| = |\omega_2|$. In this case, the one-dimensional continuous mode $e^{2\pi\sqrt{-1}\,\omega_1 x}$ is a nearly singular mode for the one-dimensional indefinite Helmholtz equation $-u'' + (\beta/2)u = f$ in the unit interval. Define the discrete $\beta$ by the stencil sum

$$\beta = C + N + S + W + NW + SW + E + NE + SE.$$

Do the column sum (6.1) using not the stencil of $A$ in (2.7) but rather the modified stencil

$$\begin{array}{ccc} NW & N & NE\\ W & C - \beta/2 & E\\ SW & S & SE \end{array}$$

The resulting one-dimensional prolongation is almost accurate for $e^{2\pi\sqrt{-1}\,\omega_1 x}$. For the usual central second-order five-coefficient scheme for (6.5), this procedure is just the $O(h^4\beta^2)$ approximation of (6.3)–(6.4).

Consider the multi-dimensional Helmholtz equation

$$-\Delta u + \beta u = f$$

in $\Omega \subset \mathbb{R}^d$ with $\Re(\beta) < 0$ and suitable boundary conditions. Here the nearly singular modes are those with frequency $\vec\omega$ satisfying

$$4\sum_{i=1}^{d}\sin^2\left(\frac{\pi h\omega_i}{2}\right) \approx -\beta h^2 \qquad (6.6)$$

(for Dirichlet boundary conditions, which are used in most of the present experiments). This equation defines $\vec\omega$ implicitly. However, since it is hard to solve, it is approximated by the sphere equation

$$|\vec\omega|^2 = \sum_{i=1}^{d}\omega_i^2 = -\beta/\pi^2.$$

This sphere is the limit as $h \to 0$ of the $\vec\omega$-curve (6.6). Although it is impossible for a prolongation to be accurate for all of these modes, it is possible for it to be rather accurate for some of them, e.g., those for which

$$\omega_i^2 = -\alpha^{(i)}\beta/(\pi^2 d),\qquad 1 \le i \le d,$$

where the $\alpha^{(i)}$'s are non-negative weights satisfying $\sum_{i=1}^{d}\alpha^{(i)} = d$ (the above two-dimensional example uses $\alpha^{(1)} = \alpha^{(2)} = 1$). The modified BBOX algorithm resulting from the above guidelines uses the stencil of the matrix defined in (2.12)–(2.15) (rather than that of $A$ used in (6.1)) in the BBOX lumping used for constructing $P$ (see [31] for more details). For symmetric problems, we take $R = P^t$ and define $Q$ according to (2.19) or (2.20). (In the non-symmetric case, $R$ is the transpose of a prolongation resulting from $A^t$ in which the central elements of the prolongation stencil are replaced by those of $P$; see [31] and Section 7.3 below.) This approach is denoted hereafter by averaged black box multigrid (ABOX). Note that its two-level implementation for problems with $(2d+1)$-coefficient stencils is equivalent to that of CBOX; hence, it enjoys the computational two-level analysis of Section 5 for problems of the form (5.1). Furthermore, by using multiple coarse-grid correction with various choices of the parameters $\alpha^{(i)}$, one may obtain algorithms ABOX($k$) with the same computational two-level analysis as the corresponding algorithms CBOX($k$).
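For the scaled five-point stencil ($C = 4 + \beta h^2$, $N = S = E = W = -1$) and the two-dimensional case $\alpha^{(1)} = \alpha^{(2)} = 1$, the effect of the modified centre $C - \beta/2$ can be checked directly. In the sketch below (ours; `prolongation_value` is a hypothetical helper), $\beta$ is chosen so that the periodic mode with $\omega_1 = \omega_2 = 4$ and $h = 1/64$ is an exact null vector of the discrete operator, and the value prolonged to $f_0$ from the coarse neighbours at $\pm h$ is compared with the exact value 1.

```python
import numpy as np


def prolongation_value(theta1, beta_h2, modified):
    """Value prolonged to the fine point f_0 from the coarse neighbours at -h and +h,
    for the mode exp(i*theta1*j) along a grid row, using the scaled five-point stencil
    C = 4 + beta h^2, N = S = E = W = -1 (all corner entries are zero)."""
    C, N, S, E, W = 4.0 + beta_h2, -1.0, -1.0, -1.0, -1.0
    if modified:
        beta_disc = C + N + S + E + W        # discrete beta: the stencil sum
        C = C - beta_disc / 2.0              # modified centre C - beta/2
    L, K, U = W, N + C + S, E                # column sums of (6.1)
    c_left, c_right = np.exp(-1j * theta1), np.exp(1j * theta1)
    return -(L * c_left + U * c_right) / K   # f_0 = -K^{-1}(L c_{-1} + U c_1)


h, omega = 1.0 / 64, 4
theta = 2.0 * np.pi * omega * h                 # periodic mode exp(2 pi i omega x)
beta_h2 = -4.0 * (1.0 - np.cos(theta))          # makes the (omega, omega) mode a null vector
print(abs(prolongation_value(theta, beta_h2, modified=True) - 1.0))
print(abs(prolongation_value(theta, beta_h2, modified=False) - 1.0))
```

With these numbers the modified lumping reproduces the mode up to rounding, whereas the unmodified lumping (6.1) is off by about 9%.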

It was also observed in applications of the computational two-level analysis that the lower bound for the coarsest-grid resolution introduced in [30,33] also holds for multiple coarse-grid algorithms (since the nearly singular modes cannot be approximated on still coarser grids; see [30,33]). Indeed, the spectrum of the iteration matrix of CBOX($k$) for equation (5.12) with $N = 15$ (computed as in Section 5.3) deteriorates as $k$ increases. This difficulty may be partially relaxed by using semi-coarsening, implemented as follows. If semi-coarsening black box multigrid is used (namely, the second coarsening method in [16] applied to one of the spatial directions only), then the averaging parameters $\alpha^{(i)}$ should be used as above in the coarsened spatial direction. If the method of [17,37] is used, then in the coarsened spatial direction $\alpha^{(i)}$ should be used as above, and in the other spatial directions the current nearly singular eigenvector of $A$ should be used instead of the constant vector used there. The semi-coarsening is done in the spatial direction in which this nearly singular mode is relatively smooth. In two dimensions, for example, this would mean that semi-coarsening in direction $i$ is performed whenever $\alpha^{(i)} < \alpha^{(3-i)}$, $i = 1, 2$. With this approach, the coarsest grid used might be coarser than that for the full-coarsening approach used here. However, this approach is not implemented here but left for future research.
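The frequencies singled out by (6.6) can be listed numerically and compared with the sphere $|\vec\omega|^2 = -\beta/\pi^2$. The sketch below is ours; the tolerance 0.02 and the helper name are assumptions. It uses the Dirichlet setting of (5.12), $\beta = -790$ and $N = 63$, and reports the relative deviation of $|\vec\omega|^2$ from the squared sphere radius, which is about 80.

```python
import numpy as np


def nearly_singular_frequencies(N, beta, tol):
    """Frequencies (w1, w2) in {1..N}^2 for which (6.6) nearly holds, i.e.
    |(4/h^2) * (sin^2(pi h w1 / 2) + sin^2(pi h w2 / 2)) + beta| <= tol * |beta|,
    with Dirichlet mesh size h = 1/(N+1)."""
    h = 1.0 / (N + 1)
    w = np.arange(1, N + 1)
    s = 4.0 / h**2 * np.sin(np.pi * h * w / 2.0) ** 2   # 1-D discrete symbols
    total = s[:, None] + s[None, :]                      # sum over both directions
    i1, i2 = np.nonzero(np.abs(total + beta) <= tol * abs(beta))
    return w[i1], w[i2]


N, beta = 63, -790.0                                     # equation (5.12)
w1, w2 = nearly_singular_frequencies(N, beta, tol=0.02)
radius2 = -beta / np.pi**2                               # sphere |w|^2 = -beta / pi^2
rel_dev = np.abs(w1**2 + w2**2 - radius2) / radius2
for a, b, d in zip(w1, w2, rel_dev):
    print(f"omega = ({a}, {b}), relative deviation from the sphere = {d:.3f}")
```

All frequencies found lie within a few percent of the sphere, in line with the statement that the sphere is the $h \to 0$ limit of (6.6).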



Table 6. Convergence factors (cf) for black box multigrid for diffusion problems with discontinuous coefficients

| ξ | Levels | cf with (2.19) | cf with (2.20) |
|---|---|---|---|
| 30 | 2 | .050 | .075 |
| 30 | 4 | .083 | .120 |
| 31 | 2 | .067 | .096 |
| 31 | 3 | .980 | .980 |

7. Numerical experiments

7.1. SPD problems

7.1.1. Isotropic examples

Here we demonstrate the efficiency of CBOX and of the BBOX($\tau$) variant introduced above for the numerical solution of diffusion problems with discontinuous coefficients. The numerical results in this subsection are also presented in [31]. Here, however, we give them with a full explanation of the stagnation of standard black box multigrid for certain examples.

Consider the finite volume discretization with mesh size $h = 1$ of the equation

$$-(D u_x)_x - (D u_y)_y = f,\qquad (x,y) \in (0,62)\times(0,62),$$

where $D$ is defined by (see Figure 3)

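$$D(x,y) = \begin{cases} \dfrac{1000+1}{2}\cdot\dfrac{1}{1000} & (x,y) \in \theta \equiv \{(x,y) \mid |x-\xi| + |y-\xi| \le 1\},\\ 1000 & (x,y) \in \big((0,\xi)\times(0,\xi)\big) \cup \big((\xi,62)\times(\xi,62)\big) \setminus \theta,\\ 1 & (x,y) \in \big((0,\xi)\times(\xi,62)\big) \cup \big((\xi,62)\times(0,\xi)\big) \setminus \theta,\\ 0 & (x,y) \notin (0,62)\times(0,62), \end{cases} \qquad (7.1)$$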

and $0 < \xi < 62$ is the breaking point. This example is equivalent to Example II in [2] with the discretization (5.8) there. As discussed below, the difficulty with this example is that the areas of strong diffusion are separated by a thin strip. Indeed, when this strip is not present, both BBOX and CBOX converge with a convergence factor of at most 0.46 (see Table 8 below). (Although this convergence is slower than that for the Poisson equation, the number of iterations is still affordable.) Although cases involving thin strips are rare, the present example reflects situations that can occur in practice. Indeed, the stagnation observed in BBOX is due to improper interaction between strong-diffusion regions on the coarse grids; a similar difficulty may thus arise for diffusion problems with several disjoint strong-diffusion areas, since at some level these areas may interact.

The boundary conditions are

$$u_n = 0 \quad \text{for } x = 0 \text{ or } y = 0,\qquad D u_n + 0.5u = 0 \quad \text{for } x = 62 \text{ or } y = 62.$$

[Figure 3: the square (0, 62) × (0, 62) divided at ξ in both directions; D = 1000 in the lower-left and upper-right parts, D = 1 in the other two.]

Figure 3. The diffusion coefficient $D$ in the present example. The distance between the regions of strong diffusion is $2^{1/2}$.

Table 7. Convergence factors for BBOX(10) and CBOX for diffusion problems with discontinuous coefficients

| ξ | Levels | BBOX(10), (2.19) | BBOX(10), (2.20) | CBOX, (2.19) |
|---|---|---|---|---|
| 30 | 4 | 0.084 | 0.114 | 0.320 |
| 31 | 4 | 0.082 | 0.129 | 0.111 |
| Staircase | 4 | 0.077 | 0.096 | 0.121 |

The finite volume scheme of [2] is used on an $N \times N$ uniform grid with $N = 63$. (The origin lies on the grid point numbered (1,1).) The coarse grids consist of the even-numbered points of the next finer grid (as defined in Section 2.4). (This implementation is used throughout Section 7 unless stated otherwise.) Both implementations (2.19) and (2.20) of Section 2.4 are tested. A four-color Gauss–Seidel smoother ([1] and Method A in [32]) is used in a V(1,1) multigrid cycle. The coarsest-level equation is solved to six orders of magnitude accuracy. The convergence factor (cf) is defined by

$$\mathrm{cf} = \frac{\|A x_{\mathrm{last}} - b\|_2}{\|A x_{\mathrm{last}-1} - b\|_2},$$

where 'last' is large enough to reveal the asymptotic behavior of the process (the $l_2$ norm of the residual is reduced by at least six orders of magnitude). Table 6 shows that stagnation occurs when the breaking point $\xi$ lies on the coarse grids. For this case, the strong coupling pattern near the junction point $(\xi,\xi) = (31,31)$ is displayed in Figure 4. This figure shows that the third-level system involves strong coupling between the upper-right and lower-left subdomains. Since such coupling does not exist in the original system, the third level cannot possibly supply a suitable correction term for the first level. For a two-level implementation, however, BBOX is equivalent to CBOX, since five-coefficient stencils are used; hence, the convergence is rapid, as expected from Theorem 3.1.
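As a small illustration of the definition of cf, the sketch below measures it for a stand-in iteration: damped Jacobi with weight 2/3 on the one-dimensional Poisson matrix, which is our choice and not one of the paper's methods. The iteration is stopped once the residual has been reduced by six orders of magnitude, as in the text. For this iteration the measured cf should approach the spectral radius $1 - \tfrac{2}{3}(1 - \cos \pi h)$. The helper `convergence_factor` is hypothetical.

```python
import numpy as np


def convergence_factor(A, b, x0, step, reduction=1e-6, max_iter=100_000):
    """cf = ||A x_last - b||_2 / ||A x_{last-1} - b||_2, with 'last' the first iterate
    whose residual norm is at most `reduction` times the initial one."""
    x = x0.copy()
    r0 = r_prev = np.linalg.norm(A @ x - b)
    for _ in range(max_iter):
        x = step(x)
        r = np.linalg.norm(A @ x - b)
        if r <= reduction * r0:
            return r / r_prev
        r_prev = r
    raise RuntimeError("residual was not reduced enough")


# Stand-in iteration (our assumption): damped Jacobi with weight 2/3 on the
# 1-D Poisson matrix tridiag(-1, 2, -1) with N = 31 interior points.
N = 31
h = 1.0 / (N + 1)
A = 2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)
b = np.zeros(N)
weight = 2.0 / 3.0


def step(x):
    return x - (weight / 2.0) * (A @ x - b)   # D = 2I, so D^{-1} = I / 2


x0 = np.random.default_rng(0).random(N)
cf = convergence_factor(A, b, x0, step)
rho = 1.0 - weight * (1.0 - np.cos(np.pi * h))  # spectral radius of the iteration
print(cf, rho)
```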

The improved results obtained with CBOX and BBOX(10) are displayed in Table 7. We have also tested the 'staircase' problem (Example IV in [2]) with a $65 \times 65$ grid. For this grid, criterion (2.12) suggests that the off-diagonal row sums $r_{\vec i}(|A|)$ should not be used on the first level (since the first coarse grid is interior to the finest one) but should be used only at further coarser levels near the upper and right edges of the domain, where coarse-grid boundary points coincide with boundary points of the second-level grid. The results in Table 7 also show the advantage of BBOX(10) over CBOX (implemented with (2.19)).

Finally, we report results for a diffusion problem as above but with no thin strip separating the strong-diffusion areas in Figure 3. (This is equivalent to using $\theta = \emptyset$ in (7.1).) The implementation details are as above. The results are displayed in Table 8.

7.1.2. Prolongation strategies at boundaries

We have also used the above 'staircase' example (again with $N = 65$) to show the advantage of criterion (2.12). The coarse grids consist of either the even-numbered points or the odd-numbered points of the next finer grid. Note that the $A$ used in [16] corresponds either to the first and third rows of Table 9 or to the second and fourth ones, depending on the interpretation of the notation '$\Omega$' used there. Criterion (2.12), on the other hand, corresponds to the second and third rows of Table 9 (at least on the finest level; the results in the third row can be further improved by using criterion (2.12) on coarse levels as well; see Table 7).


[Figure 4: panels (a) first level, (b) second level, (c) third level; each shows the square with axes from 0 to 62 and ξ marked.]

Figure 4. Strong coupling pattern near the junction point $(\xi,\xi) = (31,31)$ for black box multigrid for (a) the first level, (b) the second level and (c) the third level. The third-level equation is obviously not adequate.

Table 8. Convergence factors for BBOX and CBOX for the diffusion problem with no thin strip ($\theta = \emptyset$ in (7.1))

| ξ | Levels | BBOX, (2.19) | CBOX, (2.19) |
|---|---|---|---|
| 30 | 4 | 0.173 | 0.192 |
| 31 | 4 | 0.460 | 0.460 |



Table 9. Various four-level implementations of BBOX(10) for the staircase problem ($N = 65$)

| Coarse grids consist of | Quantity used | Convergence factor |
|---|---|---|
| Odd-numbered points | $a_{\vec i,\vec i}$ | 0.161 |
| Odd-numbered points | $r_{\vec i}(\lvert A\rvert)$ | 0.090 |
| Even-numbered points | $a_{\vec i,\vec i}$ | 0.089 |
| Even-numbered points | $r_{\vec i}(\lvert A\rvert)$ | 0.209 |

Thus, criterion (2.12) gives optimal results regardless of whether even-numbered or odd-numbered points are used for the next coarser grid.

7.1.3. Locally anisotropic example

Consider the equation

$$-\left(\frac{u_x}{\sqrt{1 + \bar u_x^2 + \bar u_y^2}}\right)_x - \left(\frac{u_y}{\sqrt{1 + \bar u_x^2 + \bar u_y^2}}\right)_y + u = f,\qquad (x,y) \in (0,1)\times(0,1)$$

with Neumann boundary conditions, where $u$ is the unknown function and $\bar u$ is a given function. This problem arises from linearization of the 'de-noising' problem of [27]; $\bar u$ is usually an approximation of $u$, namely, a piecewise constant function. In our application, $\bar u$ is equal to 1000 in a subdomain and to 1 elsewhere. A symmetric second-order finite difference scheme (as in [43]) is used on a uniform $63 \times 63$ grid. The derivatives of $\bar u$ are approximated at edge midpoints as follows:

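$$(\bar u_x)_{i+1/2,j} \doteq (\bar u_{i+1,j} - \bar u_{i,j})/h,\qquad (\bar u_y)_{i+1/2,j} \doteq (\bar u_{i+1,j+1} + \bar u_{i,j+1} - \bar u_{i+1,j-1} - \bar u_{i,j-1})/(4h),$$

$$(\bar u_y)_{i,j+1/2} \doteq (\bar u_{i,j+1} - \bar u_{i,j})/h,\qquad (\bar u_x)_{i,j+1/2} \doteq (\bar u_{i+1,j+1} + \bar u_{i+1,j} - \bar u_{i-1,j+1} - \bar u_{i-1,j})/(4h).$$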

(It is assumed that $\bar u$ satisfies homogeneous Neumann boundary conditions.) In our tests, $\bar u_{i,j}$ is equal to 1000 when $15 \le i, j \le 49$ and to 1 elsewhere. This implies that the problem is locally anisotropic at the discontinuities of $\bar u$. Indeed, the equations at the points $(14, j)$ and $(15, j)$ ($16 \le j \le 48$), for example, have large coefficients for the discrete second $x$-derivative and very small ones for the discrete second $y$-derivative. Therefore, the second condition in Theorem 3.1 does not hold. Although the conditions in this theorem are sufficient and not necessary, they seem to give a reliable indication of the efficiency of matrix-dependent multigrid algorithms. Indeed, whether four-color point relaxation or alternating 'zebra' line relaxation is used in the V(1,1)-cycle, BBOX and AutoMUG diverge for the above example, and CBOX converges with convergence factor 0.98. (These results are for two-level as well as five-level implementations.) The reason for these poor results is that, for anisotropic problems, the nearly singular modes are those that are smooth in the direction of strong diffusion and may oscillate in the direction of weak diffusion. Since the prolongation cannot possibly be sufficiently accurate for all of these modes (unless local semi-coarsening is used, as in algebraic multigrid), there must be some modes for which the coarse-grid approximation is poor. In the constant coefficient case, line relaxation may enable proper treatment of these modes [9]. However, this is not necessarily the case for problems with strongly varying coefficients. The only cure we were able to find is to use BBOX or CBOX in conjunction with the outer acceleration described in Section 7.2.2 below. With this acceleration, for the V(1,1)-cycle with five levels, the averaged convergence factor (defined in (7.4) below) is 0.58 for BBOX and 0.55 for CBOX when the four-color smoother is used, and 0.41 for BBOX and 0.29 for CBOX when the alternating 'zebra' smoother is used. (The initial error in these experiments is equal to $u$.)

Table 10. Convergence factors (cf) for four-level methods for the slightly indefinite equation. No acceleration is used.

| N | ν1 | ν2 | BBOX, (2.19) | ABOX, (2.19) | BBOX, (2.20) | ABOX, (2.20) | AutoMUG |
|---|---|---|---|---|---|---|---|
| 31 | 1 | 1 | 7.196 | 0.077 | 7.195 | 0.063 | 0.131 |
| 63 | 1 | 1 | 0.431 | 0.064 | 0.431 | 0.064 | 0.096 |
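The edge-midpoint approximations of the locally anisotropic example of Section 7.1.3 translate directly into array operations. The sketch below (NumPy; `edge_coefficients` is our own helper) evaluates the coefficient $1/\sqrt{1 + \bar u_x^2 + \bar u_y^2}$ on all interior edges. It closes the boundary by copying boundary values into one layer of ghost points (our choice of Neumann closure) and assumes $h = 1/64$ with indices counted from 0. Next to the jump of $\bar u$, e.g. around the point $(14, j)$, the $x$-coefficient on the outer edge stays 1, while the coefficients across the jump and the $y$-coefficients drop to about $3\times10^{-5}$; this is the local anisotropy described in the text.

```python
import numpy as np


def edge_coefficients(ubar, h):
    """Coefficients 1/sqrt(1 + ubar_x^2 + ubar_y^2) at edge midpoints.

    Returns Dx with Dx[i, j] at (i+1/2, j), shape (n-1, n), and
            Dy with Dy[i, j] at (i, j+1/2), shape (n, n-1).
    """
    # One layer of ghost values copied from the boundary: a simple closure for the
    # homogeneous Neumann condition on ubar (our assumption).
    p = np.pad(ubar, 1, mode="edge")            # p[i+1, j+1] == ubar[i, j]
    # Edges (i+1/2, j)
    ux = (p[2:-1, 1:-1] - p[1:-2, 1:-1]) / h
    uy = (p[2:-1, 2:] + p[1:-2, 2:] - p[2:-1, :-2] - p[1:-2, :-2]) / (4 * h)
    Dx = 1.0 / np.sqrt(1.0 + ux**2 + uy**2)
    # Edges (i, j+1/2)
    uy = (p[1:-1, 2:-1] - p[1:-1, 1:-2]) / h
    ux = (p[2:, 2:-1] + p[2:, 1:-2] - p[:-2, 2:-1] - p[:-2, 1:-2]) / (4 * h)
    Dy = 1.0 / np.sqrt(1.0 + ux**2 + uy**2)
    return Dx, Dy


n, h = 63, 1.0 / 64
ubar = np.ones((n, n))
ubar[15:50, 15:50] = 1000.0        # 1000 for 15 <= i, j <= 49, 1 elsewhere
Dx, Dy = edge_coefficients(ubar, h)
print(Dx[13, 30], Dx[14, 30], Dy[14, 30])
```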

7.2. Indefinite problems

Here we consider five-coefficient stencils for the indefinite Helmholtz equation (6.5) with Dirichlet (or mixed complex) boundary conditions. In particular, we test the methods BBOX, AutoMUG, ABOX and ABOX($k$) (the integer parameter $k$ is not to be confused with the continuous parameter $\tau$ in BBOX($\tau$), which is not used here since no discontinuous coefficients are present). Since for two-level implementations ABOX($k$) is equivalent to CBOX($k$), the present results confirm those of Section 5.3 obtained from the computational analysis (the results in Table 11 below coincide with those of Tables 3 and 4). For more realistic implementations, the advantage of ABOX over BBOX is demonstrated. It was also observed that CBOX($k$) is only slightly inferior to ABOX($k$). Owing to its inexpensive multilevel implementation, AutoMUG appears to be competitive in some cases; however, it should be kept in mind that AutoMUG is less robust, in the sense that it cannot handle stencils with more than $2d + 1$ coefficients.

The present examples use $(2^n - 1)\times(2^n - 1)$ grids for some positive integer $n$; therefore, criterion (2.12) implies that the off-diagonal row sums $r_{\vec i}$ in (2.12) are never used. This is also the implementation used for BBOX. It was observed that if the notation '$\Omega$' in [16] is interpreted such that the $r_{\vec i}$'s are used everywhere, then the convergence results for BBOX are essentially unchanged.

7.2.1. Slightly indefinite example

We start with four-level implementations for the slightly indefinite equation

$$-u_{xx} - u_{yy} - 20u = 0,\qquad (x,y) \in \Omega = (0,1)\times(0,1)$$

with homogeneous Dirichlet boundary conditions and a random initial guess. The usual five-coefficient finite difference scheme is used on the finest grid, which is a uniform $N \times N$ grid. A V(1,1)-cycle is used with the RB smoother for AutoMUG and the four-color smoother ([1] and Method A in [32]) for the other methods. The advantage of ABOX and AutoMUG is evident from Table 10.



Table 11. Convergence factors (cf) for two-level implementations for the highly indefinite equation. No acceleration is used. These results are presented to confirm the computational analysis in Tables 3–4.

| N | ∂Ω case | ν1 | ν2 | BBOX | AutoMUG | ABOX | ABOX(5) |
|---|---|---|---|---|---|---|---|
| 31 | (a) | 1 | 1 | 10.613 | 17.563 | 6.710 | 0.493 |
| 63 | (a) | 1 | 1 | 2.007 | 1.478 | 3.409 | 0.0006 |

Table 12. Preconditioned convergence factors (pcf) for two-level implementations for the highly indefinite equation. The multigrid iteration is accelerated by CGS.

| N | ∂Ω case | ν1 | ν2 | BBOX | AutoMUG | ABOX | ABOX(5) |
|---|---|---|---|---|---|---|---|
| 31 | (a) | 1 | 1 | 0.976 | 0.798 | 0.784 | 0.743 |
| 31 | (a) | 0 | 1 | 0.915 | 0.726 | 0.767 | 0.741 |

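7.2.2. Highly indefinite examples

We consider the problem

$$-u_{xx} - u_{yy} - 790u = 0,\qquad (x,y) \in \Omega = (0,1)\times(0,1) \qquad (7.2)$$

with homogeneous Dirichlet boundary conditions on $\Gamma_D \subset \partial\Omega$, homogeneous Neumann boundary conditions on $\Gamma_N \subset \partial\Omega$ and complex boundary conditions of the third kind

$$\frac{\partial u}{\partial n} + 10\sqrt{-1}\,u = 0,\qquad (x,y) \in \partial\Omega \setminus \Gamma_D \setminus \Gamma_N \qquad (7.3)$$

(where $\vec n$ is the outer normal vector). We consider the following cases:

(a) $\Gamma_D = \partial\Omega$, $\Gamma_N = \emptyset$;
(b) $\Gamma_D = \emptyset$, $\Gamma_N = \{(x,y) \in \partial\Omega \mid x = 0 \text{ or } x = 1 \text{ or } y = 0\}$.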

Case (a) yields an ill-posed problem, which serves merely as a test problem. Case (b) gives a more realistic situation, in the spirit of an example tested in [20]. A uniform $N \times N$ grid is used. The finite volume scheme of [2] is used (in most of the domain it is equivalent to the usual finite difference scheme). The initial guess is random with values in $(0,1)$. The coarsest-level equation is solved to six orders of magnitude accuracy.

In the examples that follow, we found essentially no difference between (2.19) and (2.20). Since the latter is less expensive in terms of time and storage (see (5.7)), it was used here. The RB smoother is used for the two-level implementations in Tables 11 and 12, in V(0,1)- and V(1,1)-cycles.

The results in Table 11 are presented only to confirm the computational analysis; indeed, they agree with those of Tables 3 and 4. In Table 12 we turn to more realistic implementations that use an outer acceleration scheme. To handle eigenvalues of the iteration matrix with magnitude larger than one, we apply the conjugate gradients squared (CGS) acceleration method of [38] to the basic BBOX, AutoMUG, ABOX or ABOX(k) iteration. This technique generalizes the conjugate gradients method to non-symmetric and indefinite problems; it has the advantage of avoiding the computation of the transposes of the coefficient matrix and of the preconditioner (the latter is defined only implicitly in (2.3), so its transpose is not available). The cost of CGS is comparable with that of the conjugate gradients method, namely, about one work unit per iteration. No essential change in the convergence rates is observed when CGS is replaced by the transpose-free quasi-minimal residual (TFQMR) acceleration method of [21].
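The outer acceleration described above can be sketched as follows. This is a minimal NumPy version of right-preconditioned CGS in its standard textbook formulation, not the implementation used for the tables; `apply_prec` is a hypothetical callback standing for one BBOX, AutoMUG, ABOX or ABOX(k) cycle. For simplicity the sketch stops on the unpreconditioned residual, whereas the experiments monitor the preconditioned one. Each iteration applies A twice and the preconditioner twice, and never their transposes.

```python
import numpy as np


def preconditioned_cgs(A, b, apply_prec, x0=None, tol=1e-6, maxiter=500):
    """Right-preconditioned conjugate gradients squared (CGS).

    apply_prec(v) must return an approximation to M^{-1} v, e.g. one
    multigrid cycle applied to v with a zero initial guess.
    Returns (x, number_of_iterations)."""
    b = np.asarray(b)
    x = np.zeros_like(b) if x0 is None else np.array(x0, dtype=b.dtype)
    r = b - A @ x
    r_shadow = r.copy()                   # fixed "shadow" residual
    r0_norm = np.linalg.norm(r)
    if r0_norm == 0:
        return x, 0
    rho_prev = None
    p = q = None
    for it in range(1, maxiter + 1):
        rho = np.vdot(r_shadow, r)
        if rho == 0:
            raise RuntimeError("CGS breakdown (rho = 0)")
        if rho_prev is None:
            u = r.copy()
            p = u.copy()
        else:
            beta = rho / rho_prev
            u = r + beta * q
            p = u + beta * (q + beta * p)
        p_hat = apply_prec(p)             # first preconditioner application
        v_hat = A @ p_hat
        alpha = rho / np.vdot(r_shadow, v_hat)
        q = u - alpha * v_hat
        u_hat = apply_prec(u + q)         # second preconditioner application
        x = x + alpha * u_hat
        r = r - alpha * (A @ u_hat)       # recursively updated residual
        rho_prev = rho
        if np.linalg.norm(r) <= tol * r0_norm:
            return x, it
    return x, maxiter


# Usage: a small non-symmetric system; Jacobi scaling stands in for a
# multigrid cycle (hypothetical stand-in).
n = 50
A = 4.0 * np.eye(n) - 1.5 * np.eye(n, k=-1) - 0.5 * np.eye(n, k=1)
b = np.ones(n)
x, its = preconditioned_cgs(A, b, lambda v: v / 4.0, tol=1e-10)
```

Any callable returning an approximation to M⁻¹v can be passed as `apply_prec`, which is what makes CGS convenient when the preconditioner is known only implicitly.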

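We define the preconditioned convergence factor (pcf) by

$$\mathrm{pcf} = \left( \frac{\|\Lambda^{-1}(Ax_{\mathrm{last}} - b)\|_2}{\|\Lambda^{-1}(Ax_0 - b)\|_2} \right)^{2\left(\mathrm{last}\cdot(\nu_1+\nu_2+1)\,(2\lfloor k/2\rfloor+1)\right)^{-1}} \tag{7.4}$$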

where $\Lambda$ is the multigrid preconditioner used, $k$ is the parameter in (4.2) and ‘last’ is the number of AutoMUG, BBOX, ABOX or ABOX(k) iterations used within the CGS iteration to reduce the $l_2$ norm of the preconditioned residual by six orders of magnitude. The preconditioned residual norm is available in the CGS process. Furthermore, since the preconditioned system is better conditioned than the original one, the norm of the preconditioned residual is a better convergence measure than that of the residual itself. It was also checked that the $l_2$ and $l_\infty$ norms of the error decrease by at least four orders of magnitude during the convergence process.

The above definition of pcf takes into account the additional work required for smoothing, residual computation and multiple coarse-grid correction. (Its basic unit is the averaged convergence factor for a V(0,1)-cycle with a single coarse-grid correction.) This is why, for some examples, pcf for ABOX(k) is larger than the corresponding cf in Table 11; this does not mean that the acceleration is poor for these examples but merely that, unlike pcf, cf does not account for the extra work involved. The results are summarized in Table 12.
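Definition (7.4) amounts to the following arithmetic. The helper below is an illustrative sketch (its name and argument names are ours): `res_ratio` is the ratio of preconditioned residual norms in (7.4), and the exponent spreads that reduction over the work of ‘last’ iterations, weighted by ν1 + ν2 + 1 and by 2⌊k/2⌋ + 1.

```python
def preconditioned_convergence_factor(res_ratio, last, nu1, nu2, k=1):
    """pcf of (7.4).

    res_ratio: ||Lambda^{-1}(A x_last - b)||_2 / ||Lambda^{-1}(A x_0 - b)||_2
    last:      number of multigrid iterations used within CGS
    nu1, nu2:  numbers of pre- and post-smoothing sweeps
    k:         parameter of (4.2); k = 1 means a single coarse-grid correction
    """
    work = last * (nu1 + nu2 + 1) * (2 * (k // 2) + 1)
    return res_ratio ** (2.0 / work)


print(preconditioned_convergence_factor(1e-6, last=10, nu1=0, nu2=1, k=1))
```

For instance, a reduction by 10⁻⁶ in last = 10 V(0,1)-cycles with k = 1 gives pcf = 10^(−0.6) ≈ 0.251. For k = 5 the factor 2⌊k/2⌋ + 1 equals 5, so the same reduction achieved in only 2 iterations yields the same pcf.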

When comparing ABOX(5) with other methods, one should take into account that it requires more storage and set-up time. The storage of coarse-grid operators is five times as large, which amounts to about 100% overhead in operator storage for two-dimensional problems. Still, it is attractive for problems with multiple right-hand sides. Furthermore, the work it spends on acceleration is only one fifth as large. This property is especially important in the context of parallel computing, where the inner product calculations required in the acceleration are expensive. Consider, for example, a message-passing architecture with a mesh-connected array of processors. For V(0,1)-cycles, the implementation of [19] requires only one communication step per V-cycle. To this one should add the communication required to broadcast the coarsest-grid problem, which is then solved on each processor independently. The acceleration procedure, on the other hand, requires two additional such broadcasts per iteration, that is, about twice as much communication overhead. This overhead becomes even larger when variants of the GMRES method of [28] are used for acceleration. GMRES seems suitable for accelerating ABOX(5), since an optimal acceleration method may take advantage of the improved spectrum. Here, however, only CGS is used.

We conclude by showing that, for the highly indefinite problem (7.2)–(7.3), once the coarsest grid is properly chosen (i.e., a 31 × 31 grid for case (b)), high accuracy may be achieved by increasing the number of levels. A V(0,1)-cycle is used with the RB smoother for AutoMUG and the four-color smoother for the other methods. The results in Table 13 are as expected from the analysis in [6] and [30,33]; namely, the convergence rates are essentially unchanged when the number of levels increases.

The computational analysis results in Tables 3, 4 and 5 apply to case (a) and two-level implementations only. For multilevel implementations or for case (b), acceleration is also needed for ABOX(k). In these cases, ABOX(k) may yield larger pcf's than ABOX (see Table 13). ABOX(k) should thus be considered mainly for parallel implementations. Alternatively, one might choose parameters α(i) other than those suggested in (4.2). For example, one might use the curve (6.6) and seek parameters α(i) for which it is covered uniformly. In addition, one might seek discrete Fourier modes that describe the eigenvectors in case (b) better than those used in (6.2). We also expect that using (6.4) instead of (6.1) might improve the results. We believe that the present computational two-level analysis method may serve as a tool for finding optimal parameters for single and multiple coarse-grid correction algorithms with various kinds of boundary conditions. We leave this for future research.

Table 13. Preconditioned convergence factors (pcf) for multilevel solution of the highly indefinite Helmholtz equation. The multigrid iteration is accelerated by CGS

N    Levels  ∂Ω   ν1  ν2  AutoMUG  BBOX   ABOX   ABOX(5)
63   2       (b)  0   1   0.610    0.614  0.509  0.636
127  3       (b)  0   1   0.592    0.588  0.482  0.447
255  4       (b)  0   1   0.648    0.594  0.488  0.503

Table 14. Preconditioned convergence factors (pcf) for multilevel solution of the highly indefinite Helmholtz equation. The multigrid iteration is accelerated by CGS. The coarsest-grid equation is solved approximately by 10 point Kaczmarz sweeps

N    Levels  ∂Ω   ν1  ν2  AutoMUG  BBOX   ABOX   ABOX(5)
31   2       (b)  0   1   0.901    0.900  0.862  0.940
63   3       (b)  0   1   0.949    0.947  0.865  0.947
127  4       (b)  0   1   0.963    >0.99  0.863  >0.98

When a 15 × 15 coarsest grid is used for case (b), all methods fail; the only remedy is to solve the coarsest-grid equation approximately by the Kaczmarz iteration [42]. We used 10 sweeps of point Kaczmarz relaxation, implemented as follows. For AutoMUG, the lexicographic ordering of grid points was used; for the other methods, a nine-color ordering was used, which is suitable for parallel implementations. (Using W-cycles, or Kaczmarz relaxation on all the coarse grids, did not yield an essential improvement.) In practice, the ‘slow’ multigrid algorithm of Table 14 may be used as a coarsest-grid solver in a more efficient multigrid implementation, such as that of Table 13.
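Point Kaczmarz relaxation, as used above for the coarsest-grid equation, can be sketched as below. Each step projects the iterate onto the hyperplane of one equation; started from x = A^H y (e.g. x = 0), a sweep is equivalent to Gauss–Seidel applied to A A^H y = b with x = A^H y, which is why the method converges for any nonsingular, in particular indefinite, matrix. The sketch uses a dense matrix and the lexicographic row order (as for AutoMUG); the nine-color ordering used for the other methods is not shown, and the function name is hypothetical.

```python
import numpy as np


def kaczmarz_sweeps(A, b, x, sweeps=10):
    """Point Kaczmarz relaxation in lexicographic (row) order.

    Each step projects x orthogonally onto the hyperplane a_i . x = b_i,
    so equation i holds exactly right after its update."""
    A, b, x = np.asarray(A), np.asarray(b), np.asarray(x)
    dtype = np.result_type(A.dtype, b.dtype, x.dtype, np.float64)
    x = x.astype(dtype, copy=True)
    row_norms2 = np.sum(np.abs(A) ** 2, axis=1)   # ||a_i||^2, assumed nonzero
    for _ in range(sweeps):
        for i in range(A.shape[0]):
            a = A[i]
            x += (b[i] - a @ x) / row_norms2[i] * a.conj()
    return x


# Usage: a small complex, non-Hermitian system.
A = np.array([[1.0 + 1.0j, 2.0], [3.0, -1.0]])
x_true = np.array([1.0, 2.0])
x = kaczmarz_sweeps(A, A @ x_true, np.zeros(2), sweeps=50)
```

Kaczmarz converges slowly on large grids, which is consistent with using it only on the coarsest grid, where a few sweeps are cheap.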

7.3. Non-normal example

Here we test the two-dimensional circulating convection example of [13] (Example 11 in [36])

$$\sin(\pi(y - 0.5))\cos(\pi(x - 0.5))\,u_x - \sin(\pi(x - 0.5))\cos(\pi(y - 0.5))\,u_y = f$$

with Dirichlet boundary conditions. The region is the unit square (0, 1) × (0, 1) with a tiny hole in the middle (discretized on an N × N uniform grid with a 1 × 1 ‘hole’ in the middle, where a trivial equation is used). For this region, a first-order upwind scheme is inadequate [11]; following [13], we add to the unstable second-order difference scheme an O(h) isotropic artificial viscosity, the amount of which is chosen locally as the minimal amount required for diagonal dominance. The initial error is a (non-zero) constant except at one grid point.
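The ‘minimal artificial viscosity for diagonal dominance’ rule can be made concrete under an assumption: that the unstable second-order scheme is the central five-point discretization of a u_x + b u_y and that the viscosity term is −ε(u_xx + u_yy). With d = ε/h², the two x-neighbors contribute |−d + a/(2h)| + |−d − a/(2h)| = max(2d, |a|/h), and similarly in y, so the row is (weakly) diagonally dominant exactly when ε ≥ h·max(|a|, |b|)/2, an O(h) viscosity. The text does not spell out these formulas; the sketch and its function names are illustrative.

```python
import math


def velocity(x, y):
    """Convection field (a, b) of the circulating-flow example."""
    a = math.sin(math.pi * (y - 0.5)) * math.cos(math.pi * (x - 0.5))
    b = -math.sin(math.pi * (x - 0.5)) * math.cos(math.pi * (y - 0.5))
    return a, b


def viscous_stencil(x, y, h):
    """Five-point stencil of -eps*(u_xx + u_yy) + a*u_x + b*u_y with central
    differences, using the smallest eps >= 0 that makes the row (weakly)
    diagonally dominant: eps = h * max(|a|, |b|) / 2."""
    a, b = velocity(x, y)
    eps = 0.5 * h * max(abs(a), abs(b))   # O(h) isotropic artificial viscosity
    d = eps / h ** 2
    stencil = {
        "C": 4.0 * d,                     # center
        "E": -d + a / (2.0 * h),          # east neighbor  u(x + h, y)
        "W": -d - a / (2.0 * h),          # west neighbor  u(x - h, y)
        "N": -d + b / (2.0 * h),          # north neighbor u(x, y + h)
        "S": -d - b / (2.0 * h),          # south neighbor u(x, y - h)
    }
    return stencil, eps


st, eps = viscous_stencil(0.3, 0.6, 1.0 / 64)
```

At the center (0.5, 0.5) the velocity vanishes, so ε = 0 and the stencil degenerates to a zero row; this may be one reason a trivial equation is used at the ‘hole’.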

We found that a naive implementation of (2.16)–(2.17) diverges; it is thus necessary to use the idea of [15] and derive the prolongation P from the symmetric part of A. In this subsection, BBOX refers to black box multigrid as implemented in [15], namely, with the prolongation P derived from the symmetric part of A (with no use of the right-hand side, as in (5.7)) and the restriction R being the transpose of a prolongation derived from $A^t$. We can, however, modify BBOX in the spirit of (2.17) and obtain slightly improved results. This modification is done by taking R to be the transpose of a prolongation derived from $A^t$ in which the central elements in the prolongation stencil are replaced by those of P. We refer to this approach as BBOX∗. (We have also tried to use for R the central elements of a prolongation stencil derived from A, but this results in a divergent method.) As recommended in [13], we use a W(1,1)-cycle (implemented as in equation (1) in [30] and with the four-color smoothing method B of [32]). Indeed, Table 15 shows that the V-cycle is not robust with respect to the number of levels used. Still, the V-cycle converges well when the number of levels is limited, which implies that the analysis of [47] applies to the coarse-grid problems as well. Since we are interested in automatic methods, we do not use the other recommendations of [13], such as residual over-weighting and defect correction.
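To illustrate what ‘deriving P from the symmetric part of A’ means, here is a one-dimensional analogue, assumed for illustration only; the two-dimensional black box prolongation of [15] and the formulas (2.16)–(2.17) are not reproduced. At each fine-only point, the interpolation weights are −s_{i,i±1}/s_{ii}, taken from S = (A + A^t)/2 rather than from A. The function name and the choice of odd indices as coarse points are hypothetical.

```python
import numpy as np


def prolongation_from_symmetric_part(A):
    """1-D operator-dependent prolongation built from S = (A + A^t)/2.

    Grid points 0, ..., n-1 (n odd); coarse point j sits at fine index 2j+1."""
    A = np.asarray(A, dtype=float)
    S = 0.5 * (A + A.T)                  # symmetric part of A
    n = S.shape[0]
    nc = (n - 1) // 2
    P = np.zeros((n, nc))
    for j in range(nc):
        P[2 * j + 1, j] = 1.0            # coarse values are injected
    for i in range(0, n, 2):             # fine-only points
        for nb in (i - 1, i + 1):
            if 0 <= nb < n:              # each neighbor is a coarse point
                P[i, (nb - 1) // 2] = -S[i, nb] / S[i, i]
    return P


# Usage: diffusion T versus convection-diffusion T + C with skew-symmetric C.
n = 7
T = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
C = 0.4 * (np.eye(n, k=1) - np.eye(n, k=-1))
P_sym = prolongation_from_symmetric_part(T)
P_con = prolongation_from_symmetric_part(T + C)
```

Since C is skew-symmetric, both matrices have the same symmetric part, so both yield linear interpolation (weights 1/2); the convection enters only through R and the coarse-grid operator.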

Since the coarsest-grid problem is not diagonally dominant, its exact solution exhibits non-physical oscillations and, hence, yields a poor correction term. Therefore, the coarsest-grid problem is handled as follows. When the coarsest grid consists of one point only, it is solved exactly by one relaxation. When it is of size 3 × 3, it is solved approximately by 25 relaxations for the V-cycles and by one relaxation for the W-cycles. (This choice seems to be optimal, especially for BBOX∗, which may diverge if more relaxations are used; this implies that its coarse-grid scheme has little artificial viscosity and, hence, approximates the fine-grid scheme well.) When it is of size 7 × 7, it is solved approximately by 25 relaxations. CBOX is found to be considerably inferior to BBOX for all variants. The results are summarized in Table 15.

Another way to eliminate the non-physical oscillations generated by the coarse grids and to achieve good convergence is to apply CGS acceleration to the basic multigrid iteration. With this approach, the coarsest-grid problem may be solved exactly (to six orders of magnitude), which provides a more uniform approach. The rate of convergence is reported by the preconditioned convergence factor pcf defined in (7.4) (with k = 1 there, since no multiple coarse-grid correction is used). Note that, as mentioned above, pcf incorporates the additional work required for smoothing and residual computation; therefore, although the numbers in Table 16 are sometimes larger than the corresponding ones for the V-cycle in Table 15, they are better in terms of the work required for convergence.

Table 15. Convergence factors (cf) for the circulating convection example. No acceleration is used

                 V(1,1)           W(1,1)
N     Levels     BBOX    BBOX∗    BBOX    BBOX∗
63    4          0.320   0.295    0.328   0.303
127   5          0.42    0.50     0.339   0.287
63    5          0.353   0.365    0.229   0.210
127   6          0.621   0.68     0.266   0.229
63    6          0.346   0.355    0.315   0.283
127   7          0.620   0.700    0.317   0.250

Table 16. Preconditioned convergence factors (pcf) for the circulating convection example. A V(1,1)-cycle is used. The basic multigrid iteration is accelerated by CGS

N     Levels     BBOX    BBOX∗
63    4          0.338   0.339
127   5          0.469   0.430
63    5          0.423   0.385
127   6          0.599   0.544
63    6          0.405   0.394
127   7          0.519   0.574

8. Conclusions

In this work we introduce a matrix-dependent multigrid method that is suitable for analysis. For a class of SPD problems, this analysis implies robustness, as is also confirmed numerically for diffusion problems with discontinuous coefficients. Furthermore, computational two-level analysis is available for a class of separable problems, including certain non-normal cases. Black box multigrid is then modified in the spirit of this method to yield improved versions. It appears that the careful design of the methods in light of the analysis yields better convergence rates for the examples tested.

For indefinite Helmholtz equations, the analysis provides a way to choose in advance a suitable resolution for the coarsest grid. It also motivates the definition of a multiple coarse-grid correction multigrid algorithm and supplies a suitable implementation for it. The numerical results agree with the computational two-level analysis. The advantage of the present algorithms over standard ones is illustrated numerically for a slightly indefinite example and, with an outer acceleration, also for highly indefinite ones.

Appendix A

Let (·, ·) denote the usual inner product in $l_2(\Omega)$:

$$(u, v) = \sum_{\vec i \in \Omega} u_{\vec i}\, v_{\vec i}, \qquad u, v \in l_2(\Omega)$$

The terms ‘symmetric’ and ‘orthogonal’ and the transpose operator ‘t’ used below are with respect to this inner product. The following lemma is used in the proof of Theorem 3.1 below.

Lemma A.1. Let $M$ be a symmetric positive semi-definite matrix of order $|\Omega|$. Then, for any vector $x \in l_2(\Omega)$,

$$(x, Mx) \le 2\bigl(x, \bigl(J_f^t J_f M J_f^t J_f + J_c^t J_c M J_c^t J_c\bigr)x\bigr)$$



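Proof. Let $\tilde x = J_f^t J_f x - J_c^t J_c x$. Then we have

$$0 \le (\tilde x, M\tilde x) = \bigl(x, \bigl(J_f^t J_f M J_f^t J_f + J_c^t J_c M J_c^t J_c\bigr)x\bigr) - \bigl(x, \bigl(J_f^t J_f M J_c^t J_c + J_c^t J_c M J_f^t J_f\bigr)x\bigr).$$

The lemma follows from

$$(x, Mx) = \bigl(x, \bigl(J_f^t J_f M J_f^t J_f + J_c^t J_c M J_c^t J_c\bigr)x\bigr) + \bigl(x, \bigl(J_f^t J_f M J_c^t J_c + J_c^t J_c M J_f^t J_f\bigr)x\bigr) \le 2\bigl(x, \bigl(J_f^t J_f M J_f^t J_f + J_c^t J_c M J_c^t J_c\bigr)x\bigr),$$

where the equality uses $x = J_f^t J_f x + J_c^t J_c x$ and the inequality follows from the previous display.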

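Let ‖ · ‖ denote the usual vector and matrix norms in $l_2(\Omega)$:

$$\|v\| = \sqrt{(v, v)}, \quad v \in l_2(\Omega), \qquad \text{and} \qquad \|M\| = \max_{v \in l_2(\Omega),\ \|v\| = 1} \|Mv\|, \quad M : l_2(\Omega) \to l_2(\Omega)$$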

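Proof of Theorem 3.1: Define

$$\tilde P = U^{-1} = P E^{-1}, \qquad \tilde R = L^{-1} = E^{-1} R \qquad \text{and} \qquad \tilde Q = \mathrm{blockdiag}\bigl(I_{|f|},\ J_c Q J_c^t\bigr)$$

Let us first show that the condition number of $\tilde P \tilde Q^{-1} \tilde R A$ is bounded independently of the size of the problem. Since $A$ is symmetric, so is $\tilde A$ defined in (2.12)–(2.15), and $\tilde R = \tilde P^t$; moreover, the norm of $A$ is equal to its spectral radius. It follows from the boundedness of $\|\mathrm{diag}(A)\|$, the diagonal dominance of $A$ and Gershgorin's theorem that $\|A\|$ is bounded independently of the size of the problem. By the fourth assumption, $\tilde A$ and $A - \tilde A$ are also diagonally dominant; hence, by Gershgorin's theorem, $A$, $\tilde A$ and $A - \tilde A$ are positive semi-definite. By the second assumption of the theorem, the main diagonal elements of $U_{ff}$ are bounded away from zero, independently of the size of the problem. From this and Gershgorin's theorem applied to $\tilde P^t \tilde P$ and $\tilde R^t \tilde R$, it follows that $\|\tilde P\|$ and $\|\tilde R\|$ are also bounded independently of the size of the problem.

Let $x \in l_2(\Omega)$ be a non-zero vector and denote $\varepsilon = (x, Ax)$. From the above properties, we have

$$\|Ax\|^2 \le \|A^{1/2}\|^2 \|A^{1/2}x\|^2 = \|A\|\varepsilon \qquad \text{and} \qquad \|\tilde A x\|^2 \le \|\tilde A^{1/2}\|^2 \|\tilde A^{1/2}x\|^2 = \|\tilde A\|(x, \tilde A x) \le \|\tilde A\|\varepsilon \le \|A\|\varepsilon$$

Let $1 \le k \le d$ be fixed. Define the matrices

$$E_k = J_{\cup_{|s|=k} g(s)} \bigl(\tilde A - \tilde P^{-1}\bigr) J^t_{\cup_{|s|>k} g(s)}, \qquad F_k = J_{\cup_{|s|=k} g(s)} \bigl(\tilde A - \tilde P^{-1}\bigr) J^t_{\cup_{|s|=k} g(s)}$$

and

$$G_k = J_{\cup_{|s|=k} g(s)} \bigl(\tilde A - \tilde P^{-1}\bigr) J^t_{\cup_{|s|<k} g(s)}$$

Note that these matrices are just block submatrices of $\tilde A - \tilde P^{-1}$. In fact, $E_k$ is the block submatrix corresponding to the elements $(\tilde A - \tilde P^{-1})_{\vec i,\vec j} = \tilde A_{\vec i,\vec j}$ with $\vec i \in \cup_{|s|=k} g(s)$ and $\vec j \in \cup_{|s|>k} g(s)$; $F_k$ is the block submatrix corresponding to the elements $(\tilde A - \tilde P^{-1})_{\vec i,\vec j}$ with $\vec i, \vec j \in \cup_{|s|=k} g(s)$; and $G_k$ is the block submatrix corresponding to the elements $(\tilde A - \tilde P^{-1})_{\vec i,\vec j}$ with $\vec i \in \cup_{|s|=k} g(s)$ and $\vec j \in \cup_{|s|<k} g(s)$. Define the matrix $A_k$ (of order $|c \cup f|$) by

$$A_k = \begin{pmatrix} \mathrm{rs}(|E_k^t|) & E_k^t & 0 \\ E_k & F_k & G_k \\ 0 & G_k^t & \mathrm{rs}(|G_k^t|) \end{pmatrix}$$

Clearly,

$$J_{\cup_{|s|=k} g(s)} \bigl(\tilde A - \tilde P^{-1}\bigr) = J_{\cup_{|s|=k} g(s)} A_k$$

or, in other words, for every $\vec i \in \cup_{|s|=k} g(s)$ and $\vec j \in \Omega$,

$$\bigl(\tilde A - \tilde P^{-1}\bigr)_{\vec i,\vec j} = \bigl(A_k\bigr)_{\vec i,\vec j}$$

Since $J_{\cup_{|s|=k} g(s)} \tilde P^{-1} J^t_{\cup_{|s|=k} g(s)}$ is diagonal, $F_k$ is symmetric and, therefore, $A_k$ and $\tilde A - A_k$ are symmetric. From the third assumption (the L-property of $A$) it follows that $A_k$ and $\tilde A - A_k$ are diagonally dominant. From these properties and Gershgorin's theorem it follows that $A_k$ and $\tilde A - A_k$ are positive semi-definite, which implies

$$\|A_k x\|^2 \le \|A_k^{1/2}\|^2 \|A_k^{1/2} x\|^2 = \|A_k\|(x, A_k x) \le \|\tilde A\|(x, \tilde A x) \le \|A\|\varepsilon$$

Hence,

$$\bigl|\, \|J_f \tilde A x\| - \|J_f \tilde P^{-1} x\| \,\bigr|^2 \le \|J_f(\tilde A - \tilde P^{-1})x\|^2 = \sum_{k=1}^{d} \|J_{\cup_{|s|=k} g(s)} A_k x\|^2 \le \sum_{k=1}^{d} \|A_k x\|^2 \le d\|A\|\varepsilon$$

Since $\|J_f \tilde A x\| \le \|\tilde A x\| \le \sqrt{\|A\|\varepsilon}$, this implies that

$$\|J_f \tilde P^{-1} x\| \le \sqrt{\eta\varepsilon}, \qquad \text{where} \qquad \eta = \bigl(\sqrt{d} + 1\bigr)^2 \|A\|$$

Since $E_{cc} = I_{|c|}$, $J_c \tilde R A \tilde P J_c^t = Q_{cc}$. It follows that

$$\bigl(x, \tilde R^{-1}\tilde Q\tilde P^{-1}x\bigr) = \bigl(\tilde P^{-1}x, \tilde Q\tilde P^{-1}x\bigr) = \bigl(J_c^t J_c x + J_f^t J_f \tilde P^{-1}x,\ \tilde Q\bigl(J_c^t J_c x + J_f^t J_f \tilde P^{-1}x\bigr)\bigr)$$

$$\le \bigl(J_c^t J_c x,\ \tilde R A \tilde P\,(J_c^t J_c x)\bigr) + \eta\varepsilon$$

$$= \bigl(\tilde P\bigl(\tilde P^{-1}x - J_f^t J_f \tilde P^{-1}x\bigr),\ A\tilde P\bigl(\tilde P^{-1}x - J_f^t J_f \tilde P^{-1}x\bigr)\bigr) + \eta\varepsilon$$

$$\le \Bigl(1 + 2\|\tilde P_{ff}\|\sqrt{\eta\|A\|} + \eta\|\tilde R_{ff} A_{ff} \tilde P_{ff}\| + \eta\Bigr)\varepsilon$$

which implies that the function $(x, \tilde R^{-1}\tilde Q\tilde P^{-1}x)/(x, Ax)$ is bounded independently of the size of the problem. On the other hand, Lemma A.1 gives, for any non-zero vector $y \in l_2(\Omega)$,

$$\bigl(y, \tilde R A \tilde P y\bigr) \le 2\bigl(y, \bigl(J_f^t J_f \tilde R A \tilde P J_f^t J_f + J_c^t J_c \tilde R A \tilde P J_c^t J_c\bigr)y\bigr) \le 2\max\bigl\{\|\tilde R_{ff} A_{ff} \tilde P_{ff}\|,\ 1\bigr\}\,(y, \tilde Q y)$$

Taking $y = \tilde P^{-1}x$, this implies that the function $(x, Ax)/(x, \tilde R^{-1}\tilde Q\tilde P^{-1}x)$ is bounded independently of the size of the problem.

To complete the proof, let us show that the condition number of $(\tilde P\tilde Q^{-1}\tilde R)^{-1} P Q^{-1} R$ is bounded independently of the size of the problem. Indeed, from the boundedness of $\|\mathrm{diag}(A)\|$ it follows that $\|\tilde R^{-1}\|$ and $\|\tilde P^{-1}\|$ are bounded (independently of the size of the problem). As discussed at the beginning of the proof, $\|\tilde R\|$ and $\|\tilde P\|$ are also bounded independently of the size of the problem. Consequently, the condition number of

$$\bigl(\tilde P\tilde Q^{-1}\tilde R\bigr)^{-1} P Q^{-1} R = \tilde R^{-1}\tilde Q E Q^{-1} E \tilde R = \tilde R^{-1}\,\mathrm{blockdiag}\bigl(\mathrm{diag}(U_{ff})\,\mathrm{diag}(A_{ff})^{-1} L_{ff},\ I_{|c|}\bigr)\,\tilde R$$

is bounded independently of the size of the problem.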
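The proof uses Gershgorin's theorem repeatedly in the following form: a symmetric, diagonally dominant matrix with non-negative diagonal is positive semi-definite, and its norm is at most twice its largest diagonal entry. A small NumPy check on the five-point Laplacian (our choice of example; the function name is hypothetical):

```python
import numpy as np


def gershgorin_interval(A):
    """Interval [lo, hi] containing every eigenvalue of a real symmetric A
    (the union of its Gershgorin discs, which lie on the real axis)."""
    A = np.asarray(A, dtype=float)
    d = np.diag(A)
    radii = np.abs(A).sum(axis=1) - np.abs(d)
    return float(np.min(d - radii)), float(np.max(d + radii))


# Five-point Laplacian on a 6 x 6 grid: symmetric, diagonally dominant,
# with a non-negative diagonal.
n = 6
T = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
A = np.kron(np.eye(n), T) + np.kron(T, np.eye(n))
lo, hi = gershgorin_interval(A)
eigs = np.linalg.eigvalsh(A)
print(lo, hi, eigs.min(), eigs.max())
```

Diagonal dominance makes every disc start at d_i − r_i ≥ 0, which gives positive semi-definiteness, and end at d_i + r_i ≤ 2 d_i, which bounds the norm.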

Appendix B

The following analysis is also presented in [31] (pp. 242 and 244 there). We give it here for the sake of completeness and in a clearer format. The results apply to CBOX in the general case, unless certain assumptions are mentioned.

For any two sets $t_2 \subset t_1 \subset \{1, \ldots, d\}$, define

$$P_{t_1,t_2} = \begin{cases} -J^t_{g(t_1)} J_{g(t_1)} P^{-1} J^t_{g(t_2)} J_{g(t_2)}, & t_1 \ne t_2, \\ 0, & t_1 = t_2. \end{cases}$$

For any ordered set $s$, let $\prod_s$ denote a product of elements indexed in $s$ with a decreasing index order. (For example, $\prod_{1\le i\le d} e_i = e_d e_{d-1} \cdots e_1$.) From simple properties of triangular matrices (see, e.g., [26] with the vector-columns used there replaced by block-columns) we have (note that the order in the two latter $\prod$-products is immaterial)

$$P^{-1} = \prod_{1\le j\le d}\ \prod_{|t_2| = d-j}\ \prod_{t_2\subset t_1} \bigl(I - P_{t_1,t_2}\bigr)$$



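and, hence,

$$P = \prod_{0\le j\le d-1}\ \prod_{|t_2| = j}\ \prod_{t_2\subset t_1} \bigl(I + P_{t_1,t_2}\bigr) = I + \sum_{k\ge 2,\ t_1\subset t_2\subset\cdots\subset t_k\subset\{1,\ldots,d\}}\ \prod_{1\le l\le k-1} P_{t_{l+1},t_l} \tag{B.1}$$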

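For $1 \le i \le d$, define

$$p_i = \{\vec j \in \Omega \mid 2 \mid j_i\}$$

$$P_i = \bigl(I - \mathrm{diag}(X_i)^{-1} X_i\bigr) J^t_{p_i} J_{p_i} \qquad \text{and} \qquad R_i = J^t_{p_i} J_{p_i} \bigl(I - X_i\,\mathrm{diag}(X_i)^{-1}\bigr)$$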

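Assume that (5.1) holds and, hence, for any two coordinate sets $t_1$ and $t_2$, $P_{t_1,t_2} = 0$ if $|t_1 \setminus t_2| \ge 2$. Consequently, (B.1) takes the form

$$P = I + \sum_{t\subset\{1,\ldots,d\},\ k\ge 1,\ 1\le r_1\ne\cdots\ne r_k\le d,\ r_1,\ldots,r_k\notin t}\ \prod_{0\le l\le k-1} P_{t\cup\{r_1,r_2,\ldots,r_{l+1}\},\ t\cup\{r_1,r_2,\ldots,r_l\}} \tag{B.2}$$

Assume also that the assumptions made in (5.4)–(5.6) hold. Then (2.12) and (2.15) imply that (B.2) takes the form

$$P = I + \sum_{t\subset\{1,\ldots,d\},\ k\ge 1,\ 1\le r_1\ne\cdots\ne r_k\le d,\ r_1,\ldots,r_k\notin t}\ \prod_{0\le l\le k-1} \Bigl(\mathrm{diag}\Bigl(\sum_{i\in t} X_i + \sum_{i=1}^{l+1} X_{r_i}\Bigr)\Bigr)^{-1} \mathrm{diag}(X_{r_{l+1}})\,P_{r_{l+1}}\; J^t_{g(t)} J_{g(t)} \tag{B.3}$$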

Assume also that the implementation (2.20) is used, so that the substitution (5.7) may take place. Note that, with (5.7), only $t = \emptyset$ remains in the first sum in (B.3) and, therefore, it reduces to

$$P = J_c^t + \sum_{k\ge 1,\ 1\le r_1\ne\cdots\ne r_k\le d}\ \prod_{0\le l\le k-1} \Bigl(\mathrm{diag}\Bigl(\sum_{i=1}^{l+1} X_{r_i}\Bigr)\Bigr)^{-1} \mathrm{diag}(X_{r_{l+1}})\,P_{r_{l+1}}\; J_c^t$$

If, in addition, the $\mathrm{diag}(X_i)^{-1}X_i$'s commute with each other, then this simplifies to read (the proof is by induction on $d$; see [29])

$$P = J_c^t + \sum_{k\ge 1,\ 1\le r_1<\cdots<r_k\le d} \Bigl(\prod_{0\le l\le k-1} P_{r_{l+1}}\Bigr) J_c^t = \Bigl(\prod_{1\le i\le d} (I + P_i)\Bigr) J_c^t \tag{B.4}$$

Note that if the $X_i\,\mathrm{diag}(X_i)^{-1}$'s commute with each other, then $R$ is also equal to the right-hand side of (B.4), provided that the multiplication by $J_c^t$ on the right is replaced by a multiplication by $J_c$ on the left and the $P_i$'s are replaced by the corresponding $R_i$'s.
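Formula (B.4) can be checked in a simple case. Assume, for illustration only, that d = 2, that X_1 and X_2 are the x- and y-second-difference parts of the five-point Laplacian with Dirichlet conditions (so the diag(X_i)^{-1}X_i = X_i/2 commute), and that the coarse points are those with both indices even, consistent with p_i = {j | 2 | j_i}. Then (I + P_2)(I + P_1)J_c^t should be bilinear interpolation, i.e. the Kronecker product of 1-D linear interpolation with itself. The function names are hypothetical.

```python
import numpy as np


def second_difference(n):
    """1-D Dirichlet second-difference matrix tridiag(-1, 2, -1)."""
    return 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)


def linear_interpolation(n):
    """1-D linear interpolation from the even-indexed points of an
    n-point grid (n odd) to all n points."""
    nc = (n + 1) // 2
    L = np.zeros((n, nc))
    for k in range(nc):
        L[2 * k, k] = 1.0
        if 2 * k + 1 < n:
            L[2 * k + 1, k] = 0.5
        if 2 * k - 1 >= 0:
            L[2 * k - 1, k] = 0.5
    return L


def b4_prolongation(n):
    """(I + P_2)(I + P_1) J_c^t on an n x n grid (n odd); point (x, y) has
    index y*n + x, and X_1, X_2 are the x- and y-parts of the Laplacian."""
    N = n * n
    T, I = second_difference(n), np.eye(n)
    X = [np.kron(I, T), np.kron(T, I)]             # X_1 (x-part), X_2 (y-part)
    idx = np.arange(N)
    coord = [idx % n, idx // n]                    # x and y index of each point
    factors = []
    for Xi, ci in zip(X, coord):
        mask = np.diag((ci % 2 == 0).astype(float))          # J_{p_i}^t J_{p_i}
        Pi = (np.eye(N) - np.diag(1.0 / np.diag(Xi)) @ Xi) @ mask
        factors.append(np.eye(N) + Pi)
    coarse = np.flatnonzero((coord[0] % 2 == 0) & (coord[1] % 2 == 0))
    Jc_t = np.eye(N)[:, coarse]                    # embedding of the coarse grid
    P = factors[1] @ factors[0] @ Jc_t             # decreasing index order
    return P, X


n = 5
P, (X1, X2) = b4_prolongation(n)
L = linear_interpolation(n)
print(np.allclose(P, np.kron(L, L)))               # expected: True
```

In this model case, each factor I + P_i performs linear interpolation along coordinate i, and their product in decreasing index order reproduces the tensor-product (bilinear) prolongation.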



Acknowledgements

The author wishes to thank Joel Dendy and Irad Yavneh for valuable discussions.

REFERENCES

1. L. M. Adams. Is SOR color-blind? SIAM J. Sci. Statist. Comput., 7, 490–506, 1986.

2. R. Alcouffe, A. Brandt, J. E. Dendy and J. Painter. The multigrid method for the diffusion equation with strongly discontinuous coefficients. SIAM J. Sci. Statist. Comput., 2, 430–454, 1981.

3. R. E. Bank. A comparison of two multilevel iterative methods for nonsymmetric and indefinite finite element equations. SIAM J. Numer. Anal., 18, 724–743, 1981.

4. R. E. Bank and T. Dupont. An optimal order process for solving finite element equations. Math. Comp., 36, 35–51, 1981.

5. K. Brackenridge. Multigrid and cyclic reduction applied to the Helmholtz equation. In N. D. Melson, T. A. Manteuffel and S. F. McCormick, editors, Sixth Copper Mountain Conference on Multigrid Methods, pages 31–42, Hampton, VA, 1993.

6. J. H. Bramble, Z. Leyk and J. E. Pasciak. Iterative schemes for non-symmetric and indefinite elliptic boundary value problems. Math. Comp., 60, 1–22, 1993.

7. D. Braess. The contraction number of a multigrid method for solving the Poisson equation. Numer. Math., 37, 387–404, 1981.

8. A. Brandt. Multi-level adaptive solutions to boundary-value problems. Math. Comp., 31, 333–390, 1977.

9. A. Brandt. Guide to multigrid development. In W. Hackbusch and U. Trottenberg, editors, Multigrid Methods, volume 960 of Lecture Notes in Mathematics, Springer-Verlag, Berlin, Heidelberg, 1982.

10. A. Brandt and S. Ta’asan. Multigrid methods for nearly singular and slightly indefinite problems. In W. Hackbusch and U. Trottenberg, editors, Multigrid Methods II, volume 1228 of Lecture Notes in Mathematics, pages 100–122, Springer-Verlag, Berlin, Heidelberg, 1985.

11. A. Brandt and I. Yavneh. Inadequacy of first-order upwind difference schemes for some recirculating flows. J. Comput. Phys., 93, 128–143, 1991.

12. A. Brandt and I. Yavneh. On multigrid solution of high-Reynolds incompressible entering flows. J. Comput. Phys., 101, 151–164, 1992.

13. A. Brandt and I. Yavneh. Accelerated multigrid convergence and high-Reynolds recirculating flows. SIAM J. Sci. Statist. Comput., 14, 607–626, 1993.

14. J. E. Dendy. Black box multigrid. J. Comput. Phys., 48, 366–386, 1982.

15. J. E. Dendy. Black box multigrid for nonsymmetric problems. Appl. Math. Comp., 13, 261–283, 1983.

16. J. E. Dendy. Two multigrid methods for three-dimensional problems with discontinuous and anisotropic coefficients. SIAM J. Sci. Statist. Comput., 8, 673–685, 1987.

17. J. E. Dendy, M. P. Ida and J. M. Rutledge. A semicoarsening multigrid algorithm for SIMD machines. SIAM J. Sci. Statist. Comput., 13, 1460–1469, 1992.

18. P. M. de Zeeuw. Matrix-dependent prolongations and restrictions in a blackbox multigrid solver. J. Comput. Appl. Math., 33, 1–27, 1990.

19. B. Diskin. M.Sc. thesis, Weizmann Institute of Science, Rehovot, Israel, 1993.

20. R. W. Freund. Conjugate gradient-type methods for linear systems with complex symmetric coefficient matrices. SIAM J. Sci. Statist. Comput., 13, 425–448, 1992.

21. R. W. Freund. Transpose-free quasi-minimal residual algorithm for non-Hermitian linear systems. SIAM J. Sci. Statist. Comput., 14, 470–482, 1993.

22. W. Hackbusch. Multigrid Methods and Applications. Springer-Verlag, Berlin, Heidelberg, 1985.

23. R. Kettler. Analysis and comparison of relaxation schemes in robust multigrid and preconditioned conjugate gradient methods. In W. Hackbusch and U. Trottenberg, editors, Multigrid Methods, volume 960 of Lecture Notes in Mathematics, Springer-Verlag, Berlin, Heidelberg, 1982.

24. R. Kettler and J. A. Meijerink. A multigrid method and a combined multigrid–conjugate gradient method for elliptic problems with strongly discontinuous coefficients in general domains. Technical report 604, Shell Publications, Kselp, Rijswijk, The Netherlands, 1981.

25. C. C. J. Kuo and B. C. Levy. Two-color Fourier analysis of the multigrid method with red–black Gauss–Seidel smoothing. Appl. Math. Comp., 29, 69–87, 1989.

26. J. A. Meijerink and H. A. van der Vorst. An iterative solution method for linear systems of which the coefficient matrix is a symmetric M-matrix. Math. Comp., 31, 148–162, 1977.

27. H. E. Oman. Fast multigrid techniques in total variation-based image reconstruction. In N. D. Melson, T. A. Manteuffel, S. F. McCormick and C. C. Douglas, editors, Seventh Copper Mountain Conference on Multigrid Methods, volume CP 3339, pages 649–660, Hampton, VA, 1996.

28. Y. Saad and M. H. Schultz. A generalized minimal residual algorithm for solving nonsymmetric linear systems. SIAM J. Sci. Statist. Comput., 7, 856–869, 1986.

29. Y. Shapira. Two-level analysis of automatic multigrid for non-normal and indefinite problems. Technical report 824 (revised version), Computer Science Dept., Technion, Haifa, Israel, 1995; also on mgnet/papers/Shapira.

30. Y. Shapira. Multigrid techniques for highly indefinite equations. In N. D. Melson, T. A. Manteuffel, S. F. McCormick and C. C. Douglas, editors, Seventh Copper Mountain Conference on Multigrid Methods, volume CP 3339, pages 689–705, Hampton, VA, 1996.

31. Y. Shapira. Black-box multigrid solver for definite and indefinite problems. In O. Axelsson and B. Polman, editors, Algebraic Multi-Level Iteration Methods with Applications, pages 235–250, Nijmegen, The Netherlands, 1996.

32. Y. Shapira. Coloring update methods. BIT, 38, 180–188, 1998.

33. Y. Shapira. Multigrid methods for 3-D definite and indefinite problems. Appl. Numer. Math., 26, 377–398, 1998.

34. Y. Shapira. Convergence properties of certain families of continued fractions and applications to tridiagonal systems. J. Math. Anal. Appl. (submitted).

35. Y. Shapira, M. Israeli and A. Sidi. An automatic multigrid method for the solution of sparse linear systems. In N. D. Melson, S. F. McCormick and T. A. Manteuffel, editors, Sixth Copper Mountain Conference on Multigrid Methods, pages 567–582, Hampton, VA, 1993.

36. Y. Shapira, M. Israeli and A. Sidi. Towards automatic multigrid algorithms for SPD, nonsymmetric and indefinite problems. SIAM J. Sci. Comput., 17, 439–453, 1996.

37. R. A. Smith and A. Weiser. Semicoarsening multigrid on a hypercube. SIAM J. Sci. Comput., 13, 1314–1329, 1992.

38. P. Sonneveld. CGS, a fast Lanczos-type solver for nonsymmetric linear systems. SIAM J. Sci. Statist. Comput., 10, 36–52, 1989.

39. P. Sonneveld, P. Wesseling and P. M. de Zeeuw. Multigrid and conjugate gradient methods as convergence acceleration techniques. In D. J. Paddon and H. Holstein, editors, Multigrid Methods for Integral and Differential Equations, pages 117–168, Oxford University Press, NY, 1985.

40. K. Stüben and U. Trottenberg. Multigrid methods: Fundamental algorithms, model problem analysis and applications. In W. Hackbusch and U. Trottenberg, editors, Multigrid Methods, volume 960 of Lecture Notes in Mathematics, pages 1–176, Springer-Verlag, Berlin, Heidelberg, 1982.

41. S. Ta’asan. Multigrid methods for highly oscillatory problems. Ph.D. dissertation, Weizmann Institute of Science, Rehovot, Israel, 1984.

42. K. Tanabe. Projection methods for solving a singular system of linear equations and its applications. Numer. Math., 17, 203–214, 1971.

43. R. Varga. Matrix Iterative Analysis. Prentice-Hall, NJ, 1962.

44. P. Wesseling. An Introduction to Multigrid Methods. Wiley, Chichester, UK, 1992.

45. I. Yavneh. Multigrid smoothing factors for red–black Gauss–Seidel applied to a class of elliptic operators. SIAM J. Numer. Anal., 32, 1126–1138, 1995.

46. I. Yavneh. On red–black SOR smoothing in multigrid. SIAM J. Sci. Comput., 17, 1996.

47. I. Yavneh. Coarse grid correction for nonelliptic and singular perturbation problems. SIAM J. Sci. Comput. (to appear).

48. D. Young. Iterative Solution of Large Linear Systems. Academic Press, NY, 1971.

