The Block Grade of a Block Krylov Space
Martin H. Gutknecht1 Thomas Schmelzer2
1 Seminar for Applied Mathematics, ETH Zurich
2 Computing Laboratory, Oxford University
9th Copper Mountain Conference on Iterative Methods
6 April 2006
Martin H. Gutknecht, Thomas Schmelzer Block Grade
Linear systems of equations with multiple RHSs

Given is a nonsingular linear system with s RHSs,

    Ax = b,                                        (1)

where

    A ∈ C^(N×N),   b ∈ C^(N×s),   x ∈ C^(N×s).     (2)

We may try to solve it for all RHSs at once by using a block Krylov space solver such as Block CG or Block GMRES.

Advantages:
- The search space for the solutions is much bigger.
- Several matrix-vector multiplications (MVs) can be done at once.
Block Krylov space methods

We seek approximate solutions of the form

    x_n ∈ x_0 + B□_n(A, r_0) ⊆ C^(N×s),            (3)

where the block Krylov (sub)space B□_n :≡ B□_n(A, r_0) is

    B□_n(A, r_0) :≡ block span(r_0, A r_0, ..., A^(n−1) r_0) ⊂ C^(N×s)    (4)
                 :≡ { Σ_{k=0}^{n−1} A^k r_0 γ_k ;  γ_k ∈ C^(s×s) (k = 0, ..., n−1) }.

DEFINITION. A (complex) block vector is a matrix y ∈ C^(N×s).

Hence, the elements of B□_n are block vectors.
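The block span in (4) is easy to realize numerically. A minimal sketch, assuming NumPy; the helper name block_krylov_matrix and the random test data are illustrative, not from the talk:

```python
import numpy as np

def block_krylov_matrix(A, R0, n):
    """Columns span B_n(A, R0): the blocks R0, A R0, ..., A^(n-1) R0."""
    blocks, B = [], R0
    for _ in range(n):
        blocks.append(B)
        B = A @ B
    return np.hstack(blocks)           # an N x (n*s) matrix

# Tiny illustrative example: N = 4, s = 2 RHSs.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
R0 = rng.standard_normal((4, 2))

Bn = block_krylov_matrix(A, R0, 2)
print(Bn.shape)                        # (4, 4)
print(np.linalg.matrix_rank(Bn))       # dim B_2(A, R0) <= min(N, n*s)
```

The column space of Bn is exactly the set of sums Σ A^k r_0 γ_k from (4), with the γ_k collected into the coefficient vector.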
Block Krylov space methods (cont’d)

This means that for an individual approximation x^(j) it holds that

    x_n^(j) ∈ x_0^(j) + B_n(A, r_0) ⊆ C^N,         (5)

where

    B_n :≡ B_n(A, r_0) :≡ K_n^(1) + · · · + K_n^(s),   (6)

with the s “usual” Krylov (sub)spaces for the s systems,

    K_n^(j) :≡ K_n(A, r_0^(j)) :≡ { Σ_{k=0}^{n−1} A^k r_0^(j) β_{k,j} ;  β_{k,j} ∈ C (∀k) }.   (7)

In other words, each approximation x^(j) is from a space that is as large as all s “usual” Krylov spaces together: dim B_n ≤ ns.

B□_n is a Cartesian product of s copies of B_n:

    B□_n = B_n × · · · × B_n   (s times).
Linear dependence of residuals, deflation

    B_n :≡ B_n(A, r_0) :≡ K_n^(1) + · · · + K_n^(s)

is, in general, not a direct sum.

Already the initial residuals could be linearly dependent.

But also some of the directions generated later may fail to increase the dimension of the block Krylov subspace.

The treatment of these cases requires deflation: the explicit determination of linear dependencies during the generation of the block Krylov subspaces (→ application of RRQR).

Deflation leads to a “reduction of the number of RHSs”.

It is possible not only when “one of the systems converges”, but also when “a linear combination of the s systems converges”.
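Deflation can be sketched as follows. This is a minimal stand-in, assuming NumPy: an SVD-based rank decision with a drop tolerance replaces the RRQR factorization mentioned above, and all names and data are illustrative:

```python
import numpy as np

def deflate_block(Q, W, tol=1e-10):
    """Orthogonalize the new block W against the current basis Q and
    drop (deflate) directions below tol; an SVD rank decision is used
    here as a simple stand-in for RRQR."""
    if Q.shape[1] > 0:
        W = W - Q @ (Q.T @ W)          # project out the current space
    U, sv, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, sv > tol]              # surviving orthonormal directions

# Two RHSs whose residuals are linearly dependent from the start:
r = np.arange(1.0, 5.0)
R0 = np.column_stack([r, 2.0 * r])     # rank 1
Q = deflate_block(np.zeros((4, 0)), R0)
print(Q.shape[1])                      # 1: one column was deflated away
```

In a real solver the same decision is applied at every block step, which is what realizes the “reduction of the number of RHSs”.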
The grade

Recall from the single RHS case (s = 1):

Characteristic properties of the grade ν(A, y) of y with respect to A:

    dim K_n(A, y) = { n if n ≤ ν,
                      ν if n ≥ ν };

    ν = min{ n | dim K_n(A, y) = dim K_{n+1}(A, y) }
      = min{ n | K_n(A, y) = K_{n+1}(A, y) };

    ν = min{ n | A^(−1) y ∈ K_n(A, y) };

    ν = min{ n | x⋆ ∈ x_0 + K_n(A, r_0) },

where A x⋆ = b, r_0 :≡ b − A x_0.
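The first characterization (rank stagnation of the Krylov matrix) gives a direct, if naive, way to compute ν. A sketch, assuming NumPy; the helper name grade and the 3×3 example are illustrative:

```python
import numpy as np

def grade(A, y, tol=1e-10):
    """nu = min{ n : dim K_n(A, y) = dim K_{n+1}(A, y) }, found by
    watching when the rank of the Krylov matrix stops growing."""
    K = y.reshape(-1, 1)
    for n in range(1, A.shape[0] + 1):
        K_next = np.hstack([K, A @ K[:, -1:]])   # append A^n y
        if np.linalg.matrix_rank(K_next, tol) == np.linalg.matrix_rank(K, tol):
            return n
        K = K_next
    return A.shape[0]

# y mixes eigenvectors of only two distinct eigenvalues, so nu = 2:
A = np.diag([1.0, 2.0, 2.0])
y = np.array([1.0, 1.0, 0.0])
print(grade(A, y))                     # 2
```

An eigenvector of A has grade 1, the smallest possible value.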
The grade (cont’d)

All this is based on the simple fact that

    A^ν y = −y γ_0 − A y γ_1 − · · · − A^(ν−1) y γ_{ν−1},   (8)

where γ_0 ≠ 0 and ν is minimal. This can be written as

    ψ(A) y = o,   where   ψ(t) :≡ ψ_{A,y}(t) :≡ t^ν + γ_{ν−1} t^(ν−1) + · · · + γ_1 t + γ_0

is the minimum polynomial of y with respect to A.

LEMMA
K_ν(A, y) is the smallest A-invariant subspace that contains y. The polynomial ψ = ψ_{A,y} is the smallest divisor of the minimal polynomial χ_A of A with ψ(A) y = o. In particular, ν ≤ ∂χ_A.
The grade (cont’d)

In practice, in most problems the grade ν is irrelevant, because ν is large and we need convergence for n ≪ ν.

There are exceptions, where ν is small. For such problems projection methods (CG, BiCG, GMRES, ...) are very effective.

In any case, considerations about the grade can help us understand the effectiveness of Krylov space methods and block Krylov space methods.

To justify this, we must replace the grade by a more subtle measure that takes approximate solutions into account. For one proposal see Ilic/Turner [’03 ANZIAM J.], [’05 NLAA].
The block grade

In the multiple RHS case (s > 1):

Introduce the block grade ν(A, y) of y with respect to A with the characteristic properties:

    ν = min{ n | dim B_n(A, y) = dim B_{n+1}(A, y) }
      = min{ n | B_n(A, y) = B_{n+1}(A, y) }
      = min{ n | B_n(A, y) = B_{n+ℓ}(A, y) (∀ℓ ∈ N) };

    ν = min{ n | A^(−1) y ∈ B□_n(A, y) };

    ν = min{ n | x⋆ ∈ x_0 + B□_n(A, r_0) },

where A x⋆ = b, r_0 :≡ b − A x_0.
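The rank-stagnation characterization carries over verbatim to the block case; a sketch, assuming NumPy, with an illustrative 3×3 example (helper name block_grade is not from the talk):

```python
import numpy as np

def block_grade(A, R0, tol=1e-10):
    """nu = min{ n : dim B_n(A, R0) = dim B_{n+1}(A, R0) }."""
    blocks = [R0]
    for n in range(1, A.shape[0] + 1):
        blocks.append(A @ blocks[-1])
        if (np.linalg.matrix_rank(np.hstack(blocks[:-1]), tol)
                == np.linalg.matrix_rank(np.hstack(blocks), tol)):
            return n
    return A.shape[0]

# Two RHSs covering all three eigendirections quickly:
A = np.diag([1.0, 2.0, 3.0])
R0 = np.column_stack([[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
print(block_grade(A, R0))              # 2: B_2 already fills C^3
```

Here dim B_1 = 2 and dim B_2 = 3 = dim B_3, so the block grade is 2, while the single-vector grades would be 2 and 1.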
The block grade (cont’d)

Relations between the ordinary grade and the block grade:

LEMMA
The block grade of the block Krylov space and the grades of the corresponding individual Krylov spaces are related by

    ν(A, y) ≤ max_{i=1,...,s} ν(A, y^(i)).         (9)

LEMMA
A block Krylov space and the corresponding individual Krylov spaces are related by

    B_{ν(A,y)}(A, y) = K_{ν(A,y^(1))}(A, y^(1)) + · · · + K_{ν(A,y^(s))}(A, y^(s)),

and ν(A, y) is the smallest index for which this holds.
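Inequality (9) can be checked numerically. A sketch, assuming NumPy; the random data and helper names (krylov, grade_of) are illustrative:

```python
import numpy as np

def krylov(A, Y, n):
    """Matrix whose columns span the (block) Krylov space with n powers."""
    cols, B = [], Y
    for _ in range(n):
        cols.append(B)
        B = A @ B
    return np.hstack(cols)

def grade_of(A, Y, tol=1e-10):
    """(Block) grade via rank stagnation of the (block) Krylov matrix."""
    for n in range(1, A.shape[0] + 1):
        if (np.linalg.matrix_rank(krylov(A, Y, n), tol)
                == np.linalg.matrix_rank(krylov(A, Y, n + 1), tol)):
            return n
    return A.shape[0]

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
Y = rng.standard_normal((5, 2))

nu_block = grade_of(A, Y)                        # block grade nu(A, Y)
nu_max = max(grade_of(A, Y[:, [j]]) for j in range(2))
print(nu_block, nu_max)                          # inequality (9) holds
```

For generic data the block grade is roughly ⌈N/s⌉ while each single grade is near N, so the inequality is typically strict.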
The block grade (cont’d)

In the single RHS case, in exact arithmetic, computing x⋆ requires

    dim K_ν = ν MVs.

In the multiple RHS case, in exact arithmetic, computing x⋆ requires

    dim B_ν ∈ [ν, s·ν] MVs.

This is a big interval!

Block methods are most effective (compared to single RHS methods) if

    dim B_ν ≪ s·ν.

More exactly: block methods are most effective if

    dim B_ν(A, r_0) ≪ Σ_{k=1}^{s} dim K_ν(A, r_0^(k)).
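A case with dim B_ν ≪ s·ν arises when the RHSs generate overlapping Krylov spaces, e.g. when the second residual is r_0^(2) = A r_0^(1). A sketch, assuming NumPy, with illustrative random data:

```python
import numpy as np

def krylov(A, Y, n):
    """Matrix whose columns span the block Krylov space with n powers."""
    cols, B = [], Y
    for _ in range(n):
        cols.append(B)
        B = A @ B
    return np.hstack(cols)

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 8))
r1 = rng.standard_normal((8, 1))
R0 = np.hstack([r1, A @ r1])     # the second RHS depends on the first

# Generic growth would be s = 2 new directions per step; the overlap
# K_n(A, A r1) within K_{n+1}(A, r1) cuts it to about 1 per step.
dims = [np.linalg.matrix_rank(krylov(A, R0, n)) for n in (1, 2, 3)]
print(dims)                      # [2, 3, 4] generically
```

Every step past the first adds only one fresh direction here, which is exactly the situation deflation is designed to exploit.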
The block grade (cont’d)

In other words: block methods are most effective (compared to single RHS methods) if deflation is possible and used!

However, exact deflation is rare, and we need approximate deflation depending on a deflation tolerance in RRQR.

Approximate deflation introduces a deflation error.

The deflation error may deteriorate the convergence speed and/or the accuracy of the computed solution.

Restarting the iteration can be useful from this point of view.
Block Krylov bases

In the single right-hand side case, the columns of the N × n Krylov matrix

    K_n :≡ ( r_0  A r_0  ...  A^(n−1) r_0 )

form the Krylov basis of K_n (for n = 1, ..., ν).

In the multiple right-hand side case, the columns of the N × ns matrix

    B_n :≡ ( r_0  A r_0  ...  A^(n−1) r_0 )

are still a spanning set of B_n.

But they are in general not linearly independent. We need to delete columns, starting with those in A^(n−1) r_0.

Nonunique! Deleting the leftmost ones would be arbitrary.

We obtain a tree of block Krylov bases for B_1, ..., B_ν.

We also obtain a set of minimum polynomials of y with respect to A.
Thanks for listening and come to ...
The ε–grade

Since x⋆ − x_0 ∈ K_ν(A, r_0) we could write

    x⋆ − x_0 = r_0 ξ_0 + A r_0 ξ_1 + · · · + A^(n−1) r_0 ξ_{n−1}    [∈ K_n]
             + A^n r_0 ξ_n + · · · + A^(ν−1) r_0 ξ_{ν−1}            [“remainder”]

but we prefer an orthogonal decomposition.

We could use the Arnoldi algorithm to construct a B–orthonormal basis {v_0, ..., v_{ν−1}} of K_ν(A, r_0):

    x⋆ − x_0 = v_0 ω_0 + v_1 ω_1 + · · · + v_{n−1} ω_{n−1}          [∈ K_n]
             + v_n ω_n + · · · + v_{ν−1} ω_{ν−1}                    [“remainder” ⊥_B K_n]

Let

    x_n :≡ x_0 + v_0 ω_0 + v_1 ω_1 + · · · + v_{n−1} ω_{n−1} ∈ x_0 + K_n ;

then “remainder” = x⋆ − x_n ⊥_B K_n, i.e., x_n is optimal in the B–norm.

DEFINITION. ν_ε(A, r_0, B) :≡ n once ‖“remainder”‖_B ≤ ε.
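For B = A*A the B–norm of the remainder x⋆ − x_n equals the 2-norm of the residual b − A x_n, so ν_ε can be computed by minimizing the residual over x_0 + K_n. A naive sketch, assuming NumPy and real data (so A*A = AᵀA); this is a least-squares formulation for illustration, not a stable Arnoldi implementation, and the clustered test matrix is made up:

```python
import numpy as np

def eps_grade(A, b, x0, eps):
    """First n with ||x* - x_n||_B <= eps for B = A^T A, i.e. the first
    n where the minimal residual over x0 + K_n drops below eps."""
    r0 = b - A @ x0
    K = r0.reshape(-1, 1)
    for n in range(1, A.shape[0] + 1):
        # best correction K @ c minimizes ||r0 - A K c||_2
        c, *_ = np.linalg.lstsq(A @ K, r0, rcond=None)
        if np.linalg.norm(r0 - A @ (K @ c)) <= eps:
            return n
        K = np.hstack([K, A @ K[:, -1:]])        # append A^n r0
    return A.shape[0]

A = np.diag([1.0, 1.1, 10.0, 10.5])   # two eigenvalue clusters
b = np.ones(4)
print(eps_grade(A, b, np.zeros(4), 1e-12))   # 4: tiny eps recovers nu
```

With a loose tolerance ν_ε can be much smaller than ν (the two clusters act almost like two eigenvalues), which is the point of the definition.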
The ε–grade (cont’d)

When B = A*A, the partial sums of the B–orthogonal series

    x⋆ = x_0 + v_0 ω_0 + v_1 ω_1 + · · · + v_{n−1} ω_{n−1}          [∈ K_n]
              + v_n ω_n + · · · + v_{ν−1} ω_{ν−1}                   [“remainder” ⊥_B K_n]

just represent the iterates of the GCR method (and of GMRES).

Likewise, when A is Hpd and B = A, they represent the CG iterates.

So, we have not really gained much!

In the block case analogous statements can be made.