The Block Grade of a Block Krylov Space
Martin H. Gutknecht1 Thomas Schmelzer2
1 Seminar for Applied Mathematics, ETH Zurich
2 Computing Laboratory, Oxford University
9th Copper Mountain Conference on Iterative Methods
6 April 2006
Martin H. Gutknecht, Thomas Schmelzer Block Grade
Linear systems of equations with multiple RHSs

Given is a nonsingular linear system with s RHSs,

    Ax = b,                                        (1)

where

    A ∈ C^(N×N),   b ∈ C^(N×s),   x ∈ C^(N×s).     (2)

We may try to solve it for all RHSs at once by using a block Krylov space solver such as Block CG or Block GMRES.

Advantages:
- The search space for the solutions is much bigger.
- Several matrix-vector multiplications (MVs) can be done at once.
Block Krylov space methods

We seek approximate solutions of the form

    x_n ∈ x_0 + B□_n(A, r_0) ⊆ C^(N×s),            (3)

where the block Krylov (sub)space B□_n :≡ B□_n(A, r_0) is

    B□_n(A, r_0) :≡ block span(r_0, A r_0, ..., A^(n−1) r_0) ⊂ C^(N×s)    (4)
                 :≡ { Σ_{k=0}^{n−1} A^k r_0 γ_k ;  γ_k ∈ C^(s×s) (k = 0, ..., n−1) }.

DEFINITION. A (complex) block vector is a matrix y ∈ C^(N×s).

Hence, the elements of B□_n are block vectors.
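The block span in (4) is easy to realize numerically. A minimal sketch, assuming NumPy; the helper name block_krylov_matrix and the random test data are illustrative, not from the talk:

```python
import numpy as np

def block_krylov_matrix(A, R0, n):
    """Columns span B_n(A, R0): the blocks R0, A R0, ..., A^(n-1) R0."""
    blocks, B = [], R0
    for _ in range(n):
        blocks.append(B)
        B = A @ B
    return np.hstack(blocks)           # an N x (n*s) matrix

# Tiny illustrative example: N = 4, s = 2 RHSs.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 4))
R0 = rng.standard_normal((4, 2))

Bn = block_krylov_matrix(A, R0, 2)
print(Bn.shape)                        # (4, 4)
print(np.linalg.matrix_rank(Bn))       # dim B_2(A, R0) <= min(N, n*s)
```

The column space of Bn is exactly the set of sums Σ A^k r_0 γ_k from (4), with the γ_k collected into the coefficient vector.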
Block Krylov space methods (cont’d)

This means that for an individual approximation x^(j) it holds that

    x_n^(j) ∈ x_0^(j) + B_n(A, r_0) ⊆ C^N,         (5)

where

    B_n :≡ B_n(A, r_0) :≡ K_n^(1) + · · · + K_n^(s),   (6)

with the s “usual” Krylov (sub)spaces for the s systems,

    K_n^(j) :≡ K_n(A, r_0^(j)) :≡ { Σ_{k=0}^{n−1} A^k r_0^(j) β_{k,j} ;  β_{k,j} ∈ C (∀k) }.   (7)

In other words, each approximation x^(j) is from a space that is as large as all s “usual” Krylov spaces together: dim B_n ≤ ns.

B□_n is a Cartesian product of s copies of B_n:

    B□_n = B_n × · · · × B_n   (s times).
Linear dependence of residuals, deflation

    B_n :≡ B_n(A, r_0) :≡ K_n^(1) + · · · + K_n^(s)

is, in general, not a direct sum.

Already the initial residuals could be linearly dependent.

But also some of the directions generated later may fail to increase the dimension of the block Krylov subspace.

The treatment of these cases requires deflation: the explicit determination of linear dependencies during the generation of the block Krylov subspaces (→ application of RRQR).

Deflation leads to a “reduction of the number of RHSs”.

It is possible not only when “one of the systems converges”, but also when “a linear combination of the s systems converges”.
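Deflation can be sketched as follows. This is a minimal stand-in, assuming NumPy: an SVD-based rank decision with a drop tolerance replaces the RRQR factorization mentioned above, and all names and data are illustrative:

```python
import numpy as np

def deflate_block(Q, W, tol=1e-10):
    """Orthogonalize the new block W against the current basis Q and
    drop (deflate) directions below tol; an SVD rank decision is used
    here as a simple stand-in for RRQR."""
    if Q.shape[1] > 0:
        W = W - Q @ (Q.T @ W)          # project out the current space
    U, sv, _ = np.linalg.svd(W, full_matrices=False)
    return U[:, sv > tol]              # surviving orthonormal directions

# Two RHSs whose residuals are linearly dependent from the start:
r = np.arange(1.0, 5.0)
R0 = np.column_stack([r, 2.0 * r])     # rank 1
Q = deflate_block(np.zeros((4, 0)), R0)
print(Q.shape[1])                      # 1: one column was deflated away
```

In a real solver the same decision is applied at every block step, which is what realizes the “reduction of the number of RHSs”.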
The grade

Recall from the single RHS case (s = 1):

Characteristic properties of the grade ν(A, y) of y with respect to A:

    dim K_n(A, y) = { n if n ≤ ν,
                      ν if n ≥ ν };

    ν = min{ n | dim K_n(A, y) = dim K_{n+1}(A, y) }
      = min{ n | K_n(A, y) = K_{n+1}(A, y) };

    ν = min{ n | A^(−1) y ∈ K_n(A, y) };

    ν = min{ n | x⋆ ∈ x_0 + K_n(A, r_0) },

where A x⋆ = b, r_0 :≡ b − A x_0.
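The first characterization (rank stagnation of the Krylov matrix) gives a direct, if naive, way to compute ν. A sketch, assuming NumPy; the helper name grade and the 3×3 example are illustrative:

```python
import numpy as np

def grade(A, y, tol=1e-10):
    """nu = min{ n : dim K_n(A, y) = dim K_{n+1}(A, y) }, found by
    watching when the rank of the Krylov matrix stops growing."""
    K = y.reshape(-1, 1)
    for n in range(1, A.shape[0] + 1):
        K_next = np.hstack([K, A @ K[:, -1:]])   # append A^n y
        if np.linalg.matrix_rank(K_next, tol) == np.linalg.matrix_rank(K, tol):
            return n
        K = K_next
    return A.shape[0]

# y mixes eigenvectors of only two distinct eigenvalues, so nu = 2:
A = np.diag([1.0, 2.0, 2.0])
y = np.array([1.0, 1.0, 0.0])
print(grade(A, y))                     # 2
```

An eigenvector of A has grade 1, the smallest possible value.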
The grade (cont’d)

All this is based on the simple fact that

    A^ν y = −y γ_0 − A y γ_1 − · · · − A^(ν−1) y γ_{ν−1},   (8)

where γ_0 ≠ 0 and ν is minimal. This can be written as

    ψ(A) y = o,   where   ψ(t) :≡ ψ_{A,y}(t) :≡ t^ν + γ_{ν−1} t^(ν−1) + · · · + γ_1 t + γ_0

is the minimum polynomial of y with respect to A.

LEMMA
K_ν(A, y) is the smallest A-invariant subspace that contains y. The polynomial ψ = ψ_{A,y} is the smallest divisor of the minimal polynomial χ_A of A with ψ(A) y = o. In particular, ν ≤ ∂χ_A.
The grade (cont’d)

In practice, in most problems the grade ν is irrelevant, because ν is large and we need convergence for n ≪ ν.

There are exceptions, where ν is small. For such problems projection methods (CG, BiCG, GMRES, ...) are very effective.

In any case, considerations about the grade can help us understand the effectiveness of Krylov space methods and block Krylov space methods.

To justify this, we must replace the grade by a more subtle measure that takes approximate solutions into account. For one proposal see Ilic/Turner [’03 ANZIAM J.], [’05 NLAA].
The block grade

In the multiple RHS case (s > 1):

Introduce the block grade ν(A, y) of y with respect to A with the characteristic properties:

    ν = min{ n | dim B_n(A, y) = dim B_{n+1}(A, y) }
      = min{ n | B_n(A, y) = B_{n+1}(A, y) }
      = min{ n | B_n(A, y) = B_{n+ℓ}(A, y) (∀ℓ ∈ N) };

    ν = min{ n | A^(−1) y ∈ B□_n(A, y) };

    ν = min{ n | x⋆ ∈ x_0 + B□_n(A, r_0) },

where A x⋆ = b, r_0 :≡ b − A x_0.
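The rank-stagnation characterization carries over verbatim to the block case; a sketch, assuming NumPy, with an illustrative 3×3 example (helper name block_grade is not from the talk):

```python
import numpy as np

def block_grade(A, R0, tol=1e-10):
    """nu = min{ n : dim B_n(A, R0) = dim B_{n+1}(A, R0) }."""
    blocks = [R0]
    for n in range(1, A.shape[0] + 1):
        blocks.append(A @ blocks[-1])
        if (np.linalg.matrix_rank(np.hstack(blocks[:-1]), tol)
                == np.linalg.matrix_rank(np.hstack(blocks), tol)):
            return n
    return A.shape[0]

# Two RHSs covering all three eigendirections quickly:
A = np.diag([1.0, 2.0, 3.0])
R0 = np.column_stack([[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
print(block_grade(A, R0))              # 2: B_2 already fills C^3
```

Here dim B_1 = 2 and dim B_2 = 3 = dim B_3, so the block grade is 2, while the single-vector grades would be 2 and 1.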
The block grade (cont’d)

Relations between the ordinary grade and the block grade:

LEMMA
The block grade of the block Krylov space and the grades of the corresponding individual Krylov spaces are related by

    ν(A, y) ≤ max_{i=1,...,s} ν(A, y^(i)).         (9)

LEMMA
A block Krylov space and the corresponding individual Krylov spaces are related by

    B_{ν(A,y)}(A, y) = K_{ν(A,y^(1))}(A, y^(1)) + · · · + K_{ν(A,y^(s))}(A, y^(s)),

and ν(A, y) is the smallest index for which this holds.
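Inequality (9) can be checked numerically. A sketch, assuming NumPy; the random data and helper names (krylov, grade_of) are illustrative:

```python
import numpy as np

def krylov(A, Y, n):
    """Matrix whose columns span the (block) Krylov space with n powers."""
    cols, B = [], Y
    for _ in range(n):
        cols.append(B)
        B = A @ B
    return np.hstack(cols)

def grade_of(A, Y, tol=1e-10):
    """(Block) grade via rank stagnation of the (block) Krylov matrix."""
    for n in range(1, A.shape[0] + 1):
        if (np.linalg.matrix_rank(krylov(A, Y, n), tol)
                == np.linalg.matrix_rank(krylov(A, Y, n + 1), tol)):
            return n
    return A.shape[0]

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
Y = rng.standard_normal((5, 2))

nu_block = grade_of(A, Y)                        # block grade nu(A, Y)
nu_max = max(grade_of(A, Y[:, [j]]) for j in range(2))
print(nu_block, nu_max)                          # inequality (9) holds
```

For generic data the block grade is roughly ⌈N/s⌉ while each single grade is near N, so the inequality is typically strict.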
The block grade (cont’d)

In the single RHS case, in exact arithmetic, computing x⋆ requires

    dim K_ν = ν MVs.

In the multiple RHS case, in exact arithmetic, computing x⋆ requires

    dim B_ν ∈ [ν, s·ν] MVs.

This is a big interval!

Block methods are most effective (compared to single RHS methods) if

    dim B_ν ≪ s·ν.

More exactly: block methods are most effective if

    dim B_ν(A, r_0) ≪ Σ_{k=1}^{s} dim K_ν(A, r_0^(k)).
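A case with dim B_ν ≪ s·ν arises when the RHSs generate overlapping Krylov spaces, e.g. when the second residual is r_0^(2) = A r_0^(1). A sketch, assuming NumPy, with illustrative random data:

```python
import numpy as np

def krylov(A, Y, n):
    """Matrix whose columns span the block Krylov space with n powers."""
    cols, B = [], Y
    for _ in range(n):
        cols.append(B)
        B = A @ B
    return np.hstack(cols)

rng = np.random.default_rng(2)
A = rng.standard_normal((8, 8))
r1 = rng.standard_normal((8, 1))
R0 = np.hstack([r1, A @ r1])     # the second RHS depends on the first

# Generic growth would be s = 2 new directions per step; the overlap
# K_n(A, A r1) within K_{n+1}(A, r1) cuts it to about 1 per step.
dims = [np.linalg.matrix_rank(krylov(A, R0, n)) for n in (1, 2, 3)]
print(dims)                      # [2, 3, 4] generically
```

Every step past the first adds only one fresh direction here, which is exactly the situation deflation is designed to exploit.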
The block grade (cont’d)

In other words: block methods are most effective (compared to single RHS methods) if deflation is possible and used!

However, exact deflation is rare, and we need approximate deflation depending on a deflation tolerance in RRQR.

Approximate deflation introduces a deflation error.

The deflation error may deteriorate the convergence speed and/or the accuracy of the computed solution.

Restarting the iteration can be useful from this point of view.
Block Krylov bases

In the single right-hand side case, the columns of the N × n Krylov matrix

    K_n :≡ ( r_0  A r_0  ...  A^(n−1) r_0 )

form the Krylov basis of K_n (for n = 1, ..., ν).

In the multiple right-hand side case, the columns of the N × ns matrix

    B_n :≡ ( r_0  A r_0  ...  A^(n−1) r_0 )

are still a spanning set of B_n.

But they are in general not linearly independent. We need to delete columns, starting with those in A^(n−1) r_0.

Nonunique! Deleting the leftmost ones would be arbitrary.

We obtain a tree of block Krylov bases for B_1, ..., B_ν.

We also obtain a set of minimum polynomials of y with respect to A.
Thanks for listening and come to ...
The ε–grade

Since x⋆ − x_0 ∈ K_ν(A, r_0) we could write

    x⋆ − x_0 = r_0 ξ_0 + A r_0 ξ_1 + · · · + A^(n−1) r_0 ξ_{n−1}    [∈ K_n]
             + A^n r_0 ξ_n + · · · + A^(ν−1) r_0 ξ_{ν−1}            [“remainder”]

but we prefer an orthogonal decomposition.

We could use the Arnoldi algorithm to construct a B–orthonormal basis {v_0, ..., v_{ν−1}} of K_ν(A, r_0):

    x⋆ − x_0 = v_0 ω_0 + v_1 ω_1 + · · · + v_{n−1} ω_{n−1}          [∈ K_n]
             + v_n ω_n + · · · + v_{ν−1} ω_{ν−1}                    [“remainder” ⊥_B K_n]

Let

    x_n :≡ x_0 + v_0 ω_0 + v_1 ω_1 + · · · + v_{n−1} ω_{n−1} ∈ x_0 + K_n ;

then “remainder” = x⋆ − x_n ⊥_B K_n, i.e., x_n is optimal in the B–norm.

DEFINITION. ν_ε(A, r_0, B) :≡ n once ‖“remainder”‖_B ≤ ε.
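For B = A*A the B–norm of the remainder x⋆ − x_n equals the 2-norm of the residual b − A x_n, so ν_ε can be computed by minimizing the residual over x_0 + K_n. A naive sketch, assuming NumPy and real data (so A*A = AᵀA); this is a least-squares formulation for illustration, not a stable Arnoldi implementation, and the clustered test matrix is made up:

```python
import numpy as np

def eps_grade(A, b, x0, eps):
    """First n with ||x* - x_n||_B <= eps for B = A^T A, i.e. the first
    n where the minimal residual over x0 + K_n drops below eps."""
    r0 = b - A @ x0
    K = r0.reshape(-1, 1)
    for n in range(1, A.shape[0] + 1):
        # best correction K @ c minimizes ||r0 - A K c||_2
        c, *_ = np.linalg.lstsq(A @ K, r0, rcond=None)
        if np.linalg.norm(r0 - A @ (K @ c)) <= eps:
            return n
        K = np.hstack([K, A @ K[:, -1:]])        # append A^n r0
    return A.shape[0]

A = np.diag([1.0, 1.1, 10.0, 10.5])   # two eigenvalue clusters
b = np.ones(4)
print(eps_grade(A, b, np.zeros(4), 1e-12))   # 4: tiny eps recovers nu
```

With a loose tolerance ν_ε can be much smaller than ν (the two clusters act almost like two eigenvalues), which is the point of the definition.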
The ε–grade (cont’d)

When B = A*A, the partial sums of the B–orthogonal series

    x⋆ = x_0 + v_0 ω_0 + v_1 ω_1 + · · · + v_{n−1} ω_{n−1}          [∈ K_n]
              + v_n ω_n + · · · + v_{ν−1} ω_{ν−1}                   [“remainder” ⊥_B K_n]

just represent the iterates of the GCR method (and of GMRES).

Likewise, when A is Hpd and B = A, they represent the CG iterates.

So, we have not really gained much!

In the block case analogous statements can be made.