
CHAPTER 2: KRYLOV SUBSPACE METHODS

Heinrich Voss
voss@tu-harburg.de

Hamburg University of Technology

Lanczos Method

Krylov Subspace Methods

Suppose that A ∈ R^{n×n} is large, sparse, and symmetric, and assume that some of its extremal eigenvalues are wanted. This problem can be solved by the method of Lanczos.

The Lanczos method generates a sequence of tridiagonal matrices T_k ∈ R^{k×k} with the property that the extremal eigenvalues of T_k are progressively better approximations to the extremal eigenvalues of A.

We assume that the eigenvalues of A are ordered increasingly,

λ_1 ≤ λ_2 ≤ · · · ≤ λ_n,

and we denote by

R(x) = (x^T A x)/(x^T x), x ≠ 0,

the Rayleigh quotient of A. Then by Rayleigh's principle it holds that

λ_1 = min_{x≠0} R(x) and λ_n = max_{x≠0} R(x).

Krylov Subspaces

Let 𝒱_k be a subspace of R^n, let v^1, . . . , v^k be an orthonormal basis of 𝒱_k, and V_k = [v^1, . . . , v^k] ∈ R^{n×k}.

Let μ_k be the smallest eigenvalue of the projection V_k^T A V_k of A onto 𝒱_k, and let ν_k be its largest eigenvalue. Then it follows from the minmax and maxmin characterizations of eigenvalues that

μ_k = min_{y≠0} (y^T V_k^T A V_k y)/(y^T y) = min_{y≠0} (y^T V_k^T A V_k y)/(y^T V_k^T V_k y) = min_{y≠0} R(V_k y) ≥ λ_1,

and

ν_k = max_{y≠0} (y^T V_k^T A V_k y)/(y^T y) = max_{y≠0} (y^T V_k^T A V_k y)/(y^T V_k^T V_k y) = max_{y≠0} R(V_k y) ≤ λ_n.

The Lanczos algorithm can be derived by considering how to generate the v^k so that μ_k and ν_k become increasingly better estimates of λ_1 and λ_n.

Krylov Subspaces ct.

Suppose that u^k ∈ 𝒱_k such that R(u^k) = μ_k. R(x) decreases most rapidly in the direction of the negative gradient

−∇R(x) = −(2/(x^T x)) (Ax − R(x)x),

and therefore μ_{k+1} < μ_k if v^{k+1} is determined such that

∇R(u^k) ∈ span{v^1, . . . , v^k, v^{k+1}}.

Likewise, if w^k ∈ 𝒱_k satisfies ν_k = R(w^k), then 𝒱_k should be expanded such that

∇R(w^k) ∈ span{v^1, . . . , v^k, v^{k+1}},

since R(x) increases most rapidly in the direction of ∇R(x). Since ∇R(x) ∈ span{x, Ax}, both requirements can be satisfied simultaneously if

𝒱_{k+1} = span{v^1, . . . , v^k, v^{k+1}} = span{v^1, Av^1, . . . , A^k v^1}.

K_k(v^1, A) = span{v^1, Av^1, . . . , A^{k−1} v^1} is called a Krylov space.


Lanczos method

The Lanczos algorithm determines an orthonormal basis v^1, . . . , v^k of the Krylov space

K_k(r^0, A) := span{r^0, Ar^0, . . . , A^{k−1} r^0}, k = 1, . . . , n,

such that

T_k := V_k^T A V_k, V_k := [v^1, . . . , v^k],

is tridiagonal.

The vectors v^k can be obtained by a three-term recurrence.

Lanczos method ct.

Assume that we have already computed the orthonormal basis v^1, . . . , v^k of the Krylov space K_k(v, A).

Then A^{k−1} v ∈ span{v^1, . . . , v^k}, and therefore there exist γ_1, . . . , γ_k ∈ R such that

A^k v = A(A^{k−1} v) = A(∑_{j=1}^k γ_j v^j) = γ_k A v^k + A(∑_{j=1}^{k−1} γ_j v^j).

The second term on the right-hand side is contained in K_k(v, A). Hence, to obtain an orthonormal basis of K_{k+1}(v, A) it suffices to orthogonalize u^k := A v^k against the vectors v^1, . . . , v^k and to normalize the result.

Lanczos method ct.

Since A v^j ∈ K_{j+1}(v, A) ⊂ K_{k−1}(v, A) for every j < k − 1, we have

(v^j)^T u^k = (v^j)^T A v^k = (A v^j)^T v^k = 0.

Hence, (u^k)^T v^j = 0 for j = 1, . . . , k − 2, and therefore

A v^k = γ_k v^{k+1} + α_k v^k + β_{k−1} v^{k−1}.

Lanczos method ct.

The coefficients are obtained from

β_{k−1} = (v^{k−1})^T A v^k = (A v^{k−1})^T v^k = (γ_{k−1} v^k + α_{k−1} v^{k−1} + β_{k−2} v^{k−2})^T v^k = γ_{k−1},

i.e.

A v^k = β_k v^{k+1} + α_k v^k + β_{k−1} v^{k−1}. (1)

Thus,

α_k = (v^k)^T A v^k,

and the condition ‖v^{k+1}‖_2 = 1 yields

v^{k+1} = (A v^k − α_k v^k − β_{k−1} v^{k−1}) / β_k, β_k = ‖A v^k − α_k v^k − β_{k−1} v^{k−1}‖_2. (2)

If the denominator in (2) vanishes, then A v^k ∈ K_k(v, A), and therefore K_k(v, A) is a k-dimensional invariant subspace of A.

Lanczos method ct.

1: v^0 = 0; k = 1
2: β_0 = ‖r^0‖
3: while r^{k−1} ≠ 0 do
4:   v^k = r^{k−1}/β_{k−1}
5:   r^k = A v^k
6:   r^k = r^k − β_{k−1} v^{k−1}
7:   α_k = (v^k)^T r^k
8:   r^k = r^k − α_k v^k
9:   β_k = ‖r^k‖
10:  k = k + 1
11: end while

Then with T_k = tridiag{β_{j−1}, α_j, β_j} it holds that

A V_k = V_k T_k + r^k (e^k)^T, r^k = A v^k − α_k v^k − β_{k−1} v^{k−1},

and r^k = ‖r^k‖_2 v^{k+1} = β_k v^{k+1}.
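A minimal NumPy sketch of this recurrence (an illustration, not part of the original notes; A may be a dense symmetric array or any object providing a matrix-vector product):

import numpy as np

def lanczos(A, r0, m):
    # Basic Lanczos: m steps of the three-term recurrence, no reorthogonalization.
    # Returns V (orthonormal columns), alpha (diagonal of T) and beta, where
    # beta[:-1] is the subdiagonal of T and beta[-1] = beta_m scales the residual.
    n = r0.shape[0]
    V = np.zeros((n, m))
    alpha = np.zeros(m)
    beta = np.zeros(m)
    r = r0.copy()
    b = np.linalg.norm(r)            # beta_0
    v_old = np.zeros(n)
    for k in range(m):
        if b == 0:                   # lucky termination: invariant subspace found
            return V[:, :k], alpha[:k], beta[:k]
        v = r / b
        V[:, k] = v
        r = A @ v - b * v_old        # r = A v^k - beta_{k-1} v^{k-1}
        alpha[k] = v @ r
        r = r - alpha[k] * v         # r^k = A v^k - alpha_k v^k - beta_{k-1} v^{k-1}
        b = np.linalg.norm(r)
        beta[k] = b
        v_old = v
    return V, alpha, beta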


Lucky termination

The Lanczos method may terminate with β_j = 0 for some j.

Then v^{j+1} = 0, and therefore

A v^j = α_j v^j + β_{j−1} v^{j−1} ∈ K_j(A, v^1).

For i < j it holds by construction that

A v^i ∈ K_j(A, v^1).

Hence, K_j(A, v^1) is an invariant subspace of A, and therefore every eigenvalue θ_i^{(j)} of T_j is an eigenvalue of A, and the corresponding Ritz vectors are eigenvectors of A.

Error bound

If θ_i^{(m)} are the Ritz values (eigenvalues of T_m), s_i^{(m)} the corresponding eigenvectors, and x_i^{(m)} = V_m s_i^{(m)} the Ritz vectors, it holds that

(A − θ_i^{(m)} I) x_i^{(m)} = (A − θ_i^{(m)} I) V_m s_i^{(m)} = A V_m s_i^{(m)} − V_m (θ_i^{(m)} s_i^{(m)})
= V_m T_m s_i^{(m)} + β_m v^{m+1} (e^m)^T s_i^{(m)} − V_m T_m s_i^{(m)} = β_m v^{m+1} (e^m)^T s_i^{(m)},

which implies

‖(A − θ_i^{(m)} I) x_i^{(m)}‖_2 = β_m |s_{m,i}^{(m)}|.

Then by the theorem of Krylov & Bogoliubov there exists an eigenvalue λ of A such that

|λ − θ_i^{(m)}| ≤ β_m |s_{m,i}^{(m)}|.

Notice that this error bound can be computed without determining the Ritz vector x_i^{(m)} = V_m s_i^{(m)}.
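In code the bound is available directly from the tridiagonal matrix; a sketch (assuming alpha and beta come from a Lanczos run such as the one above):

import numpy as np
from scipy.linalg import eigh_tridiagonal

def ritz_bounds(alpha, beta):
    # Ritz values of T_m and the bounds beta_m |s_{m,i}|, computed from T_m alone.
    theta, S = eigh_tridiagonal(alpha, beta[:-1])
    return theta, beta[-1] * np.abs(S[-1, :])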


Convergence

The Lanczos method is a generalization of the power method, in which approximations to eigenvectors are obtained from the last iterate only.

As for the power method, we can therefore expect fast convergence to the eigenvalue which is maximal in modulus and to the corresponding eigenvector.

One can influence the convergence of the power method by shifts, either to separate the wanted eigenvalue more from the remaining spectrum or to enforce convergence to a different eigenvalue.

The Lanczos method is independent of shifts since

K_m(v^1, A) = K_m(v^1, A + αI) for all α ∈ R.

Hence, we can expect convergence of the Lanczos method to extreme eigenvalues first.

Monotonicity

Assume that the eigenvalues of A and the Ritz values with respect to K_m are ordered increasingly:

λ_1 ≤ λ_2 ≤ · · · ≤ λ_n, θ_1^{(m)} ≤ θ_2^{(m)} ≤ · · · ≤ θ_m^{(m)}.

Then the minmax principle yields (here S denotes a subspace of C^m and Ŝ a subspace of C^n)

θ_j^{(m)} = min_{dim S=j} max_{y∈S} (y^H T_m y)/(y^H y)
= min_{dim S=j} max_{y∈S} (y^H V_m^H A V_m y)/(y^H V_m^H V_m y)
= min_{dim Ŝ=j, Ŝ⊂K_m} max_{x∈Ŝ} (x^H A x)/(x^H x)
≥ min_{dim Ŝ=j, Ŝ⊂K_{m+1}} max_{x∈Ŝ} (x^H A x)/(x^H x) = θ_j^{(m+1)}
≥ min_{dim Ŝ=j} max_{x∈Ŝ} (x^H A x)/(x^H x) = λ_j.

Each (finite) sequence {θ_j^{(m)}}_{m=j,j+1,j+2,...} therefore is monotonically decreasing and bounded below by λ_j.
Likewise, the sequences of the j-th largest eigenvalues of T_m are monotonically increasing and bounded above by the j-th largest eigenvalue of A.

Example

[Figure omitted in this transcript.]

Convergence

Before deriving results about the speed of convergence of the Lanczos method, we first prove a bound for the angle between an eigenvector of A and a Krylov space K_m(v^1, A). Denote by u^i a system of orthonormal eigenvectors corresponding to the eigenvalues λ_i.

LEMMA 10.1
Let P_i be the orthogonal projector onto the eigenspace corresponding to λ_i. If P_i v^1 ≠ 0, then

tan δ(u^i, K_m) = min_{p∈Π_{m−1}, p(λ_i)=1} ‖p(A) y^i‖_2 · tan δ(u^i, v^1),

where

y^i = (I − P_i) v^1 / ‖(I − P_i) v^1‖_2 if (I − P_i) v^1 ≠ 0, and y^i = 0 otherwise.

Proof

The Krylov space K_m(v^1, A) consists of all vectors which can be written as x = q(A) v^1, where q ∈ Π_{m−1} is any polynomial of degree at most m − 1. With the orthogonal decomposition

x = q(A) v^1 = q(A) P_i v^1 + q(A)(I − P_i) v^1

it holds for the angle δ(x, u^i) between x and u^i that

tan δ(x, u^i) = ‖q(A)(I − P_i) v^1‖_2 / ‖q(A) P_i v^1‖_2 = (‖q(A) y^i‖_2 / |q(λ_i)|) · (‖(I − P_i) v^1‖_2 / ‖P_i v^1‖_2),

and the scaling p(λ) := q(λ)/q(λ_i) yields

tan δ(x, u^i) = ‖p(A) y^i‖_2 · tan δ(v^1, u^i),

from which we get the statement by minimizing over all x ∈ K_m(v^1, A).

Convergence ct.

Inserting any polynomial of degree m − 1 which satisfies p(λ_i) = 1, one obtains from the last lemma an upper bound for tan δ(u^i, K_m(v^1, A)).

Definition
The Chebyshev polynomials are defined as

c_k(t) := cos(k · arccos t), |t| ≤ 1, k ∈ N ∪ {0}.

In the following lemma we collect some properties of the Chebyshev polynomials that are needed in the sequel.

Lemma 3.1

(i) The functions c_k(t) satisfy the recurrence formula

c_0(t) ≡ 1, c_1(t) = t, c_{k+1}(t) = 2t c_k(t) − c_{k−1}(t), k ≥ 1.

In particular, c_k is a polynomial of degree k.

(ii) For |t| ≥ 1 the Chebyshev polynomials have the representation

c_k(t) = cosh(k · arcosh t), t ∈ R, |t| ≥ 1.

Proof: (i) follows from the properties of the cosine function, in particular from

cos((k + 1)t) + cos((k − 1)t) = 2 cos(kt) · cos t.

(ii) follows from the fact that the functions cosh(k · arcosh t) satisfy the same recurrence formula.
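A small numerical check of the lemma (illustrative only):

import numpy as np

def cheb(k, t):
    # Chebyshev polynomial c_k(t) via the three-term recurrence of part (i).
    c_prev, c = np.ones_like(t), t
    if k == 0:
        return c_prev
    for _ in range(k - 1):
        c_prev, c = c, 2 * t * c - c_prev
    return c

t = np.linspace(1.0, 3.0, 5)
assert np.allclose(cheb(7, t), np.cosh(7 * np.arccosh(t)))   # part (ii)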


Theorem 10.2

Let α, β, γ ∈ R with α < β and γ ∉ (α, β). Then the minimization problem

min_{p∈Π_m, p(γ)=1} max_{t∈[α,β]} |p(t)|

has a unique solution, namely the scaled Chebyshev polynomial

ĉ_m(t) := c_m(1 + 2(t − β)/(β − α)) / c_m(1 + 2(γ − β)/(β − α)) for γ > β,
ĉ_m(t) := c_m(1 + 2(α − t)/(β − α)) / c_m(1 + 2(α − γ)/(β − α)) for γ < α.

Proof

We restrict ourselves to the case γ > β.

From t̂ := 1 + 2(γ − β)/(β − α) ∉ [−1, 1] it follows that ĉ_m is defined; obviously, ĉ_m is a polynomial of degree m, and ĉ_m(γ) = 1 holds.

It remains to show that ĉ_m is the unique solution of the minimization problem.

Assume that q_m is a polynomial of degree m such that q_m(γ) = 1 and

max_{α≤t≤β} |q_m(t)| ≤ 1/c_m(1 + 2(γ − β)/(β − α)) =: M.

Obviously, the Chebyshev polynomial c_m(τ) attains the values +1 and −1 alternately at the arguments τ_j := cos(jπ/m), j = 0, . . . , m. Hence,

ĉ_m(t_j) = (−1)^j M for t_j defined by (2t_j − α − β)/(β − α) = τ_j, j = 0, . . . , m.

Proof ct.

Let r_m := ĉ_m − q_m. Then from

|q_m(t_j)| ≤ M = |ĉ_m(t_j)|, j = 0, . . . , m,

one obtains that

r_m(t_j) ≥ 0 if j is even, r_m(t_j) ≤ 0 if j is odd.

From the continuity of r_m we get the existence of a root in every interval [t_j, t_{j−1}], j = 1, . . . , m. Moreover, r_m(γ) = ĉ_m(γ) − q_m(γ) = 0.

Therefore, the polynomial r_m of degree m has at least m + 1 roots, and it follows that r_m(t) ≡ 0, i.e. q_m = ĉ_m.

THEOREM 10.3

The angle δ(u^i, K_m(v^1, A)) between the exact eigenvector u^i and the m-th Krylov space satisfies the inequality

tan δ(u^i, K_m) ≤ (κ_i / c_{m−i}(1 + 2ρ_i)) · tan δ(u^i, v^1), (1)

where

κ_1 = 1, κ_i = ∏_{j=1}^{i−1} (λ_n − λ_j)/(λ_i − λ_j) for i > 1, (2)

and

ρ_i = (λ_{i+1} − λ_i)/(λ_n − λ_{i+1}). (3)

In particular, for i = 1 one gets the estimate

tan δ(u^1, K_m(v^1, A)) ≤ (1/c_{m−1}(1 + 2ρ_1)) · tan δ(v^1, u^1), where ρ_1 = (λ_2 − λ_1)/(λ_n − λ_2).

Crucial for the convergence is the distance of the two smallest eigenvalues relative to the width of the entire spectrum.

Proof

We consider the case i = 1 first.

Expanding the vector y^1 in the eigenvector basis u^j yields

y^1 = ∑_{j=1}^n α_j u^j, where ∑_{j=1}^n |α_j|² = 1

(and α_1 = 0, since y^1 is orthogonal to the eigenspace of λ_1), from which we get

‖p(A) y^1‖_2² = ∑_{j=2}^n |p(λ_j) α_j|² ≤ max_{j=2,...,n} |p(λ_j)|² ≤ max_{λ∈[λ_2,λ_n]} |p(λ)|²,

and the statement follows from Theorem 10.2.

Proof ct.

For i > 1 we consider in Lemma 10.1 polynomials of the form

p(λ) := ((λ − λ_1) · · · (λ − λ_{i−1})) / ((λ_i − λ_1) · · · (λ_i − λ_{i−1})) · q(λ)

with q ∈ Π_{m−i} and q(λ_i) = 1.

Then one gets as before

‖p(A) y^i‖_2 ≤ max_{λ∈[λ_{i+1},λ_n]} |∏_{j=1}^{i−1} ((λ − λ_j)/(λ_i − λ_j)) · q(λ)| ≤ ∏_{j=1}^{i−1} ((λ_n − λ_j)/(λ_i − λ_j)) · max_{λ∈[λ_{i+1},λ_n]} |q(λ)|.

The result follows by minimizing this expression over all polynomials q satisfying the constraint q(λ_i) = 1.

THEOREM 10.4 (Kaniel & Paige; first eigenvalue)

Let A ∈ C^{n×n} be Hermitian with eigenvalues λ_1 ≤ λ_2 ≤ · · · ≤ λ_n and corresponding orthonormal eigenvectors u^1, . . . , u^n.

If θ_1^{(m)} ≤ · · · ≤ θ_m^{(m)} denote the eigenvalues of the matrix T_m obtained after m steps of Lanczos' method, then

0 ≤ θ_1^{(m)} − λ_1 ≤ (λ_n − λ_1) (tan δ(u^1, v^1) / c_{m−1}(1 + 2ρ_1))²,

where ρ_1 = (λ_2 − λ_1)/(λ_n − λ_2).

Crucial for the speed of convergence is the growth of c_{j−1}(1 + 2ρ_1), i.e. the separation of the first two eigenvalues relative to the width of the entire spectrum of A.
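The bound is cheap to evaluate with the cosh representation of c_{m−1}; a sketch (the spectrum and the value of tan δ(u^1, v^1) below are made up for illustration):

import numpy as np

def kaniel_paige_bound(lams, m, tan_delta):
    # Right-hand side of Theorem 10.4; lams are the sorted eigenvalues of A.
    l1, l2, ln = lams[0], lams[1], lams[-1]
    rho1 = (l2 - l1) / (ln - l2)
    c = np.cosh((m - 1) * np.arccosh(1.0 + 2.0 * rho1))   # c_{m-1}(1 + 2 rho_1)
    return (ln - l1) * (tan_delta / c) ** 2

# lambda_1 = 1 plus 99 eigenvalues spread over [2, 100], cf. the example below
lams = np.concatenate(([1.0], np.linspace(2.0, 100.0, 99)))
print(kaniel_paige_bound(lams, m=20, tan_delta=10.0))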


Proof

The left inequality follows from Rayleigh's principle.

We have

θ_1^{(m)} = min_{x∈K_m(v^1,A), x≠0} (x^H A x)/(x^H x),

and, since each x ∈ K_m(v^1, A) can be represented as x = q(A) v^1 for some q ∈ Π_{m−1}, it follows that

θ_1^{(m)} − λ_1 = min_{x∈K_m(v^1,A), x≠0} (x^H (A − λ_1 I) x)/(x^H x) = min_{q∈Π_{m−1}, q≠0} ((v^1)^H q(A)^H (A − λ_1 I) q(A) v^1) / ((v^1)^H q(A)² v^1).

Proof ct.

With v^1 = ∑_{j=1}^n α_j u^j it holds that

θ_1^{(m)} − λ_1 = min_{q∈Π_{m−1}, q≠0} (∑_{j=2}^n (λ_j − λ_1) |α_j q(λ_j)|²) / (∑_{j=1}^n |α_j q(λ_j)|²)
≤ (λ_n − λ_1) min_{q∈Π_{m−1}, q≠0} (∑_{j=2}^n |α_j q(λ_j)|²) / |α_1 q(λ_1)|²
≤ (λ_n − λ_1) min_{q∈Π_{m−1}, q≠0} max_{j=2,...,n} (|q(λ_j)|² / |q(λ_1)|²) · (∑_{j=2}^n |α_j|² / |α_1|²).

Defining p(λ) = q(λ)/q(λ_1), and observing that the set of all p's obtained as q ranges over Π_{m−1} is the set of all polynomials of degree not exceeding m − 1 satisfying the constraint p(λ_1) = 1, we get

θ_1^{(m)} − λ_1 ≤ (λ_n − λ_1) tan² δ(u^1, v^1) min_{p∈Π_{m−1}, p(λ_1)=1} max_{λ∈[λ_2,λ_n]} |p(λ)|²,

and the statement follows from Theorem 10.2.

Example

Matrix A has the eigenvalue λ_1 = 1 and 99 eigenvalues uniformly distributed in [α, β].

 it.  [α,β] = [20,100]           [α,β] = [2,100]
      error        bound         error        bound
  2   3.3890e+001  3.4832e+003   1.9123e+001  1.1385e+005
  3   1.9708e+001  1.0090e+002   1.0884e+001  7.5862e+004
  4   6.6363e+000  1.5083e+001   5.8039e+000  5.5241e+004
  5   1.5352e+000  2.2440e+000   4.1850e+000  3.8224e+004
 10   7.9258e-005  1.6293e-004   1.8948e+000  4.2934e+003
 15   4.3730e-009  1.1827e-008   1.0945e+000  4.1605e+002
 20   2.9843e-013  8.5861e-013   1.6148e-001  3.9725e+001
 25                              1.4843e-002  3.7876e+000
 30                              1.6509e-003  3.6109e-001
 35                              2.0658e-004  3.4423e-002
 40                              5.8481e-006  3.2816e-003
 45                              9.5770e-007  3.1285e-004
 50                              3.0155e-009  2.9824e-005
 55                              8.0487e-012  2.8432e-006
 60                              3.1530e-014  2.7105e-007

THEOREM 10.5 (Kaniel & Paige; higher eigenvalues)

Under the conditions of Theorem 10.4 it holds that

0 ≤ θ_j^{(m)} − λ_j ≤ (λ_n − λ_1) (κ_j^{(m)} tan δ(v^1, u^j) / c_{m−j}(1 + 2ρ_j))²

with

ρ_j = (λ_{j+1} − λ_j)/(λ_n − λ_{j+1}),

and

κ_1^{(m)} ≡ 1, κ_j^{(m)} = ∏_{i=1}^{j−1} (λ_n − θ_i^{(m)})/(λ_j − θ_i^{(m)}).

The general case j > 1 can be proved by using the maxmin characterization of θ_j^{(m)} due to Courant and Fischer.

Analogous results hold for the largest eigenvalues and Ritz values.

Orthogonality of basis vectors

In exact arithmetic the Lanczos method generates an orthonormal basis of the Krylov space K_m(v^1, A). In the algorithm only the orthogonality with respect to the two basis vectors v^j and v^{j−1} obtained in the last two steps is enforced; with respect to the previous v^i it follows from the symmetry of A.

It can be shown that in floating-point arithmetic the orthogonality is destroyed when a sequence θ_i^{(j)}, j = 1, 2, . . . , of Ritz values has converged to an eigenvalue λ of A, i.e. when the residual bound β_j |s_{j,i}^{(j)}| has become small.

Thereafter all v^j obtain a component in the direction of the eigenspace of the converged eigenvalue, and a duplicate copy of that eigenvalue will show up in the spectrum of the tridiagonal matrix T_m.

This effect was first observed and studied by Paige (1971). A detailed discussion is contained in the monograph of Parlett (1998).

Orthogonality of basis vectors ct.

Note that these multiple Ritz values have nothing to do with possible multiple eigenvalues of the given matrix; they occur simply as a result of a converged eigenvalue.

For multiple eigenvalues the Lanczos (and Arnoldi) method in exact arithmetic can only detect one eigenvector, namely the projection of the initial vector v^1 onto the corresponding eigenspace. Further eigenvectors can only be obtained by restarting with a different initial vector or by a block Lanczos (Arnoldi) method.

A simple trick to detect duplicate copies of an eigenvalue, advocated by Cullum & Willoughby (1986), is the following.

Compute the eigenvalues of the reduced matrix T̂_m ∈ R^{(m−1)×(m−1)} obtained from T_m by deleting the first row and column. Those eigenvalues that differ by less than a small multiple of the machine precision from eigenvalues of T_m are the unwanted eigenvalues, i.e. the ones due to loss of orthogonality.

Example

Convergence of the Lanczos process for A = diag(rand(100,1)). [Figure omitted in this transcript.]

Complete reorthogonalization

In each step the new vector v^{j+1} is reorthogonalized against all previous vectors v^i.

With classical Gram–Schmidt this means: v^{j+1} is replaced by

v^{j+1} ← v^{j+1} − V_j V_j^H v^{j+1}.

If the norm is thereby decreased by a nontrivial amount, say by more than the factor 1/√2, the reorthogonalization has to be repeated.

Complete reorthogonalization is very reliable, but very expensive.

Lanczos with complete reorthogonalization

1: Choose initial vector v^1 with ‖v^1‖ = 1
2: Set v^0 = 0; β_0 = 0; V = [v^1]
3: for j = 1, 2, . . . do
4:   v^{j+1} = A v^j − β_{j−1} v^{j−1}
5:   α_j = (v^j)^H v^{j+1}
6:   v^{j+1} = v^{j+1} − α_j v^j
7:   v^{j+1} = v^{j+1} − V V^H v^{j+1}
8:   β_j = ‖v^{j+1}‖
9:   v^{j+1} = v^{j+1}/β_j
10:  V = [V, v^{j+1}]
11:  Solve projected eigenproblem T_j s = θ s
12:  Test for convergence
13: end for
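A NumPy sketch of this variant (illustrative; the repeat test follows the 1/√2 criterion above, and the projected eigenproblem is omitted):

import numpy as np

def lanczos_full_reorth(A, v1, m):
    n = v1.shape[0]
    V = np.zeros((n, m + 1))
    V[:, 0] = v1 / np.linalg.norm(v1)
    alpha, beta = np.zeros(m), np.zeros(m)
    for j in range(m):
        w = A @ V[:, j]
        if j > 0:
            w = w - beta[j - 1] * V[:, j - 1]
        alpha[j] = V[:, j] @ w
        w = w - alpha[j] * V[:, j]
        for _ in range(2):                      # classical Gram-Schmidt, repeated
            norm_before = np.linalg.norm(w)     # once if the norm drops appreciably
            w = w - V[:, :j + 1] @ (V[:, :j + 1].T @ w)
            if np.linalg.norm(w) >= norm_before / np.sqrt(2):
                break
        beta[j] = np.linalg.norm(w)
        if beta[j] == 0:                        # lucky termination
            return V[:, :j + 1], alpha[:j + 1], beta[:j + 1]
        V[:, j + 1] = w / beta[j]
    return V, alpha, beta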


Example

Convergence of the Lanczos process with complete reorthogonalization for A = diag(rand(100,1)). [Figure omitted in this transcript.]

THEOREM 10.8 (Paige)

Let V_k = [v^1, . . . , v^k] be the matrix of the vectors actually obtained in the Lanczos algorithm, Θ_k = diag{θ_1, . . . , θ_k} and S_k = [s^1, . . . , s^k] such that T_k S_k = S_k Θ_k and S_k^H S_k = I_k.

Let y^{k,i} = V_k s^i be the corresponding Ritz vectors. Then it holds that

(y^{k,i})^H v^{k+1} = O(ε ‖A‖_2) / (β_k |s_{k,i}|).

Hence, the component (y^{k,i})^H v^{k+1} of the computed Lanczos vector v^{k+1} in the direction of the Ritz vector y^{k,i} is proportional to the reciprocal of the error bound β_k |s_{k,i}| for the Ritz value θ_i.


Selective reorthogonalization

By Paige's theorem the v^j lose orthogonality because the vector v^{j+1} obtained in the last step has a large component in the direction of a Ritz vector y = [v^1, . . . , v^j] s corresponding to a converged Ritz value θ (convergence being measured by the error bound β_j |s_j|).

This suggests monitoring the error bounds β_j |s_j| for all eigenvectors s of T_j in every iteration step, and reorthogonalizing v^{j+1} against the Ritz vector y:

v^{j+1} = v^{j+1} − (y^H v^{j+1}) y.

This so-called selective reorthogonalization is applied if

β_j |s_j| < √ε ‖T_j‖

(actually ‖A‖ would be needed on the right-hand side, but ‖A‖ is not available).

Selective reorthogonalization

1: Choose initial vector v^1 with ‖v^1‖ = 1
2: Set v^0 = 0; β_0 = 0; V = [v^1]
3: for j = 1, 2, . . . do
4:   v^{j+1} = A v^j − β_{j−1} v^{j−1}
5:   α_j = (v^j)^H v^{j+1}
6:   v^{j+1} = v^{j+1} − α_j v^j
7:   β_j = ‖v^{j+1}‖
8:   Solve tridiag{β_{i−1}, α_i, β_i} S = S Θ
9:   for i = 1, . . . , j do
10:    if β_j |s_j^{(i)}| < √ε max(diag Θ) then
11:      y = [v^1, . . . , v^j] s^i
12:      v^{j+1} = v^{j+1} − (y^H v^{j+1}) y
13:    end if
14:  end for
15:  β_j = ‖v^{j+1}‖
16:  v^{j+1} = v^{j+1}/β_j
17:  V = [V, v^{j+1}]
18: end for

Partial reorthogonalization

It can be shown (Simon (1984)) that the properties of the Lanczos method are largely retained as long as the basis is semiorthogonal, i.e.

V_j^H V_j = I_j + E with ‖E‖_2 ≤ √ε,

where ε denotes the rounding unit.

If the tridiagonal matrix T_j is determined using a semiorthogonal basis V_j, then there exists an orthonormal basis N_j of span V_j such that

T_j = V_j^H A V_j = N_j^H A N_j + G with ‖G‖_2 = O(ε ‖A‖_2).

Hence, the eigenvalues of the problem projected onto span V_j are obtained with full precision.

Partial reorthogonalization

1: Choose initial vector v^1 with ‖v^1‖ = 1
2: Set v^0 = 0; β_0 = 0; V = [v^1]
3: for j = 1, 2, . . . do
4:   v^{j+1} = A v^j − β_{j−1} v^{j−1}
5:   α_j = (v^j)^H v^{j+1}
6:   v^{j+1} = v^{j+1} − α_j v^j
7:   β_j = ‖v^{j+1}‖
8:   v^{j+1} = v^{j+1}/β_j
9:   if ‖[V, v^{j+1}]^H [V, v^{j+1}] − I_{j+1}‖ > √ε then
10:    v^{j+1} = v^{j+1} − V V^H v^{j+1}
11:    β_j = ‖v^{j+1}‖
12:    v^{j+1} = v^{j+1}/β_j
13:  end if
14:  V = [V, v^{j+1}]
15: end for


Arnoldi Method

One way to extend the Lanczos process to non-symmetric matrices is due to Arnoldi (1951). Consider the Hessenberg reduction V^T A V = H with V^T V = I.

If V = [v^1, . . . , v^n], then the k-th column of AV = VH reads

A v^k = ∑_{j=1}^{k+1} h_{jk} v^j, 1 ≤ k ≤ n − 1.

Isolating the last term in the summation gives

h_{k+1,k} v^{k+1} = A v^k − ∑_{j=1}^k h_{jk} v^j =: r^k,

from which we obtain h_{jk} = (v^j)^T A v^k for j = 1, . . . , k, and if r^k ≠ 0, then v^{k+1} is defined as

v^{k+1} = r^k / h_{k+1,k}.

Arnoldi method

1: choose initial vector v^1 with ‖v^1‖ = 1, V_1 = [v^1]
2: compute w = A v^1, h = (v^1)^T w, r = w − v^1 h, H_1 = [h], β = ‖r‖_2
3: for j = 1, 2, . . . do
4:   v^{j+1} = r/β
5:   V_{j+1} = [V_j, v^{j+1}], H̄_j = [H_j; β e_j^T]
6:   w = A v^{j+1}
7:   h = V_{j+1}^T w, r = w − V_{j+1} h
8:   if ‖r‖_2 < η ‖h‖_2 then
9:     s = V_{j+1}^T r, r = r − V_{j+1} s
10:    h = h + s
11:  end if
12:  H_{j+1} = [H̄_j, h], β = ‖r‖_2
13:  compute approximate eigenvalues of H_{j+1}
14:  test for convergence
15: end for

The v^k are called Arnoldi vectors. They obviously define an orthonormal basis of the Krylov space K_k(v^1, A) = span{v^1, Av^1, . . . , A^{k−1} v^1}.
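A compact sketch of the iteration (a modified Gram–Schmidt variant without the reorthogonalization test of the listing above; illustrative only):

import numpy as np

def arnoldi(A, v1, m):
    # m Arnoldi steps: A V_m = V_m H_m + h_{m+1,m} v^{m+1} e_m^T.
    n = v1.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for k in range(m):
        w = A @ V[:, k]
        for j in range(k + 1):                  # orthogonalize against v^1..v^{k+1}
            H[j, k] = V[:, j] @ w
            w = w - H[j, k] * V[:, j]
        H[k + 1, k] = np.linalg.norm(w)
        if H[k + 1, k] == 0:                    # invariant subspace found
            return V[:, :k + 1], H[:k + 1, :k + 1]
        V[:, k + 1] = w / H[k + 1, k]
    return V, H          # V: n x (m+1), orthonormal columns; H: (m+1) x m

The Ritz values are then the eigenvalues of the square part H[:m, :m], e.g. via numpy.linalg.eig.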


Arnoldi method; compact form

In compact form the Arnoldi method can be written as

A V_m = V_m H_m + h_{m+1,m} v^{m+1} e_m^T, V_m^T V_m = I_m, V_m^T v^{m+1} = 0,

where

V_m^T A V_m = H_m = [ h_11  h_12  h_13  · · ·  h_1m
                      h_21  h_22  h_23  · · ·  h_2m
                            h_32  h_33  · · ·  h_3m
                                  ⋱     ⋱      ⋮
                                  h_m,m−1     h_mm ]

is an upper Hessenberg matrix. The eigenvalue problem

H_m s = θ s

can be solved inexpensively by the QR algorithm.

Cost

For large matrices the Arnoldi method becomes costly, both in terms of computation and storage.

We need to keep m vectors of length n plus an m × m Hessenberg matrix.

For the arithmetic costs, we need to multiply v^{j+1} by A, at the cost of 2N_z, where N_z is the number of nonzero elements of A, and then orthogonalize the result against j basis vectors, at the cost of 4(j + 1)n.

Thus, an m-dimensional Arnoldi process costs ≈ nm + 0.5m² in storage and ≈ 2mN_z + 2nm² in arithmetic operations.

In the symmetric case only the last two basis vectors v^j are needed when determining the projection H_m. The previous vectors are not even needed to determine an error bound, and can be kept in secondary storage until the Ritz vectors are computed.


Backward Error

For symmetric A the theorem of Krylov and Bogoliubov immediately yielded an error bound. Similarly, for general A we obtain the backward error from the compact form of the Arnoldi recurrence:

h_{m+1,m} |s^T e_m| = min ‖E‖_2 such that (A + E − θI)x = 0, x := V_m s, ‖s‖_2 = 1.

For a Ritz pair (θ, s) of H_m we have

−Ex = (A − θI) V_m s = V_m H_m s + h_{m+1,m} v^{m+1} (e^m)^T s − V_m (θ s) = h_{m+1,m} v^{m+1} (e^m)^T s,

from which we obtain

‖E‖_2 = max_{y≠0} ‖Ey‖_2 / ‖y‖_2 ≥ ‖Ex‖_2 / ‖x‖_2 = h_{m+1,m} |s^T e_m|.

Conversely, for E = −h_{m+1,m} v^{m+1} x^T · (s^T e_m) one gets

(A + E − θI) V_m s = 0 and ‖E‖_2 = h_{m+1,m} ‖v^{m+1}‖_2 ‖x‖_2 |s^T e_m| = h_{m+1,m} |s^T e_m|.

Convergence

The Arnoldi method is a generalization of the power method, in which approximations to eigenvectors are obtained from the last iterate only (or the last two iterates in the case of a complex eigenvalue).

As for the power method, we can therefore expect fast convergence to the eigenvalue which is maximal in modulus and to the corresponding eigenvector.

One can influence the convergence of the power method by shifts, either to separate the wanted eigenvalue more from the remaining spectrum or to enforce convergence to a different eigenvalue.

The Arnoldi method is independent of shifts since

K_m(v^1, A) = K_m(v^1, A + αI) for all α ∈ C.

Hence, we can expect convergence of the Arnoldi method to extreme eigenvalues first.

Example

Eigenvalues (blue plus) of a random tridiagonal 100×100 matrix and approximations (red circle) after 10 steps of Arnoldi. [Figure omitted in this transcript.]

Convergence of Arnoldi Method

For nonsymmetric matrices and the Arnoldi process the speed of convergence was analyzed by Saad (1983).

This was done by considering the distance of a particular eigenvector u^1 of A from the subspace K_m(v^1, A).

Let

ε^{(m)} := min_{p∈Π*_{m−1}} max_{λ∈σ(A)\{λ_1}} |p(λ)|,

where σ(A) is the spectrum of A, and Π*_{m−1} denotes the set of polynomials of degree at most m − 1 such that p(λ_1) = 1.

The following lemma relates this quantity to ‖(I − P_m) u^1‖, where P_m denotes the projector onto K_m(v^1, A).


Convergence of Arnoldi Method ct.

LEMMA 10.6
Assume that A is diagonalizable and that the initial vector v^1 of Arnoldi's method has the expansion v^1 = ∑_{j=1}^n α_j u^j with respect to an eigenbasis u^1, . . . , u^n of A, where ‖u^j‖ = 1 and α_1 ≠ 0. Then the following inequality holds:

‖(I − P_m) u^1‖ ≤ ξ ε^{(m)}, where ξ = ∑_{j=2}^n |α_j| / |α_1|.

Hence, upper bounds of ε^{(m)} estimate the speed of convergence of the Arnoldi method.

THEOREM 10.7
Assume that all eigenvalues of A but λ_1 lie in an ellipse with center c, focal points c − e and c + e, and major semi-axis a. Then it holds that

ε^{(m)} ≤ c_{m−1}(a/e) / |c_{m−1}((λ_1 − c)/e)|,

where c_{m−1} denotes the Chebyshev polynomial of degree m − 1. The relative difference between the right- and left-hand sides converges to 0.

Refined Ritz vector

After a Ritz pair (θ, y) has been determined, the approximation y to the eigenvector can be improved by solving the optimization problem

‖Az − θz‖_2 = min!, z ∈ K_m(v^1, A), ‖z‖_2 = 1.

This improvement was introduced by Jia (1997), and the solution was called a refined Ritz vector, although in general it is not a Ritz vector corresponding to θ.

Given a Ritz pair, the refined Ritz vector can be obtained from the augmented Hessenberg matrix

H̄_m = [ h_11  h_12  h_13  · · ·  h_1m
         h_21  h_22  h_23  · · ·  h_2m
               h_32  h_33  · · ·  h_3m
                     ⋱     ⋱      ⋮
                     h_m,m−1     h_mm
                              h_m+1,m ] ∈ R^{(m+1)×m}.

Refined Ritz vector ct.

z ∈ K_m(v^1, A) can be written as z = V_m t, and ‖z‖_2 = 1 holds if and only if ‖t‖_2 = 1.

Hence,

‖(A V_m − θ V_m) t‖_2 = ‖(V_{m+1} H̄_m − θ V_m) t‖_2 = ‖V_{m+1} (H̄_m − θ I_{m+1,m}) t‖_2 = ‖(H̄_m − θ I_{m+1,m}) t‖_2,

and this expression attains its minimum under the constraint ‖t‖_2 = 1 at the right singular vector of H̄_m − θ I_{m+1,m} corresponding to the smallest singular value.
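A sketch of this computation (assuming V and the augmented matrix Hbar come from an Arnoldi run such as the sketch above):

import numpy as np

def refined_ritz(V, Hbar, theta):
    # Refined Ritz vector: V_m t with t the right singular vector of
    # Hbar - theta * I_{m+1,m} belonging to the smallest singular value.
    m = Hbar.shape[1]
    _, _, Vh = np.linalg.svd(Hbar - theta * np.eye(m + 1, m))
    t = Vh[-1]                       # singular values are sorted descending
    return V[:, :m] @ t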


Explicit restarts

The growing storage and arithmetic cost may make restarts of the Arnoldi algorithm necessary.

Since the Arnoldi method naturally starts with one vector, one of the most straightforward restarting schemes is to reduce the whole basis to one vector and to start the new Arnoldi iteration with it.

If only one eigenvalue is required (for instance the one with the largest real part), we can choose to restart with the corresponding Ritz vector.

If more than one eigenvalue is wanted, we may add all Ritz vectors together to form one starting vector, or use a block version of the Lanczos algorithm with block size equal to the number of wanted eigenvalues.

These options are simple to implement but not nearly as effective as more sophisticated ones such as the implicit restarting scheme and the thick restart scheme.


Implicit restarts (Sorensen 1992)

With m = k + p steps of the Arnoldi method one determines a factorization

A V_m = V_m H_m + r^m (e^m)^H.

With p steps of the QR algorithm with implicit shifts applied to H_m one gets

A V_m^+ = V_m^+ H_m^+ + r^m (e^m)^H Q, (∗)

where Q = Q_1 Q_2 · · · Q_p, the Q_j are the orthogonal matrices of the p QR steps, and V_m^+ = V_m Q, H_m^+ = Q^H H_m Q.

The leading k − 1 components of (e^m)^H Q are 0. Hence, the leading k columns of (∗) have the form

A V_k^+ = V_k^+ H_k^+ + (r^k)^+ (e^k)^H

with the updated residual (r^k)^+ = V_m^+ e^{k+1} h^+_{k+1,k} + r^m Q(m, k).

Implicitly restarted Arnoldi method (IRA)

1: Choose initial vector v^1 with ‖v^1‖ = 1
2: Determine A V_m = V_m H_m + r^m (e^m)^T for m = k + p
3: while max_{j=1,...,k} |t_{j+1,j}| > tol do
4:   Determine the eigenvalues of H_m and choose shifts μ_1, . . . , μ_p
5:   Q = I_m
6:   for j = 1, . . . , p do
7:     Compute the QR factorization Q_j R_j = H_m − μ_j I
8:     H_m = Q_j^H H_m Q_j
9:     Q = Q Q_j
10:  end for
11:  V_k^+ = V_m Q(:, 1:k)
12:  H_k^+ = H_m(1:k, 1:k)
13:  (r^k)^+ = V_m^+ e^{k+1} h^+_{k+1,k} + r^m Q(m, k)
14:  Determine A V_m = V_m H_m + r^m (e^m)^H by p Arnoldi steps starting with A V_k^+ = V_k^+ H_k^+ + (r^k)^+ (e^k)^H
15: end while
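A dense sketch of the contraction in lines 4-13 (illustrative; the factorization A V = V H + r e_m^T and the shifts are assumed given, and exceptional cases are ignored):

import numpy as np

def implicit_restart(V, H, r, shifts):
    # p shifted QR steps on H, then truncation to the leading k = m - p columns.
    m = H.shape[0]
    Q = np.eye(m)
    for mu in shifts:
        Qj, _ = np.linalg.qr(H - mu * np.eye(m))
        H = Qj.conj().T @ H @ Qj
        Q = Q @ Qj
    k = m - len(shifts)
    r_new = V @ Q[:, k] * H[k, k - 1] + r * Q[m - 1, k - 1]   # updated residual
    return V @ Q[:, :k], H[:k, :k], r_new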


Advantages

From the standpoint of numerical stability the updating scheme has several advantages:

(i) Orthogonality can be maintained since the value k is modest.
(ii) There is no question of spurious solutions.
(iii) There is a fixed storage requirement.
(iv) Deflation techniques similar to those associated with the QR iteration for dealing with numerically small subdiagonal elements of H_k (or T_k in the symmetric case) may be taken advantage of directly.

Choice of shifts

Applying one QR step with shift μ is equivalent to multiplying v^1 by A − μI (actually to multiplying e^1 ∈ R^m by H_m − μI_m); p QR steps with shifts μ_1, . . . , μ_p therefore correspond to the multiplication

v^1 ← ψ(A) v^1 with ψ(λ) = ∏_{j=1}^p (λ − μ_j).

If, for instance, λ(A) is known to be contained in D ⊂ C, and if the eigenvalues in D̃ ⊂ D are wanted, then it is reasonable to choose the shifts as the roots of a polynomial ψ whose modulus is as large as possible on D̃ and as small as possible on D \ D̃.

This suggests for instance the roots of a

• Chebyshev polynomial with respect to D \ D̃ (Saad (1984))
• least squares polynomial (Saad (1987))
• Leja polynomial for D \ D̃ (Baglama, Calvetti & Reichel (1998))


Leja polynomial

Definition: Let K ⊂ C be a compact set, and let w : K → R_+ be a continuous weight function.

Let a sequence of points z_k be defined recursively by

(i) z_1 ∈ K: w(z_1)|z_1| = max_{z∈K} w(z)|z|,
(ii) z_k ∈ K: w(z_k) ∏_{j=1}^{k−1} |z_k − z_j| = max_{z∈K} w(z) ∏_{j=1}^{k−1} |z − z_j|, k = 2, 3, . . .

Then the z_k are called Leja points, and the polynomial

ψ(λ) = ∏_{k=1}^p (λ − z_k)

is called the Leja polynomial of degree p with respect to w.

There is no easy way to determine Leja points. However, Baglama, Calvetti & Reichel (1998) contains a method to determine approximations (called fast Leja points) in an efficient way.
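A brute-force sketch on a discretized set (illustrative; fast Leja points are an efficient refinement of this greedy search):

import numpy as np

def leja_points(candidates, w, p):
    # Greedy Leja points: candidates discretize K, w holds the weights w(z).
    pts = [candidates[np.argmax(w * np.abs(candidates))]]      # condition (i)
    for _ in range(p - 1):
        prod = w.astype(float).copy()
        for z in pts:
            prod = prod * np.abs(candidates - z)               # w(z) prod |z - z_j|
        pts.append(candidates[np.argmax(prod)])                # condition (ii)
    return np.array(pts)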


Exact shifts

Lehoucq & Sorensen (1996) suggested exact shifts, i.e. λ(T_m) is decomposed into k wanted and p unwanted eigenvalues, and the unwanted eigenvalues are chosen as shifts.

Possible wanted eigenvalues are

• the k largest / smallest eigenvalues
• the k largest / smallest eigenvalues in modulus
• the k rightmost eigenvalues
• the k eigenvalues with largest / smallest imaginary part
• the k eigenvalues which are closest to an excitation frequency

Other strategies include

• refined shifts (Jia 1998)
• harmonic Ritz values (Morgan 1991)

Locking and Purging

It may happen that a Ritz pair (θ_i^{(m)}, V_m s_i^{(m)}) has converged without h_{j+1,j} or β_j having become small.

If θ_i^{(m)} is a wanted eigenvalue, then in the next steps of the Arnoldi (or Lanczos) method the factorization can be curtailed to

A v^1 = θ_1 v^1 + "small perturbation"  (v^1 := V_m s_i^{(m)}),
A V_2 = V_2 T_2 + h_{k+1,k} r (e^{k−1})^H,

with V_2^H v^1 = 0. The pair (θ_1, v^1) then is "locked" and will not be changed in the subsequent steps.

If θ_i^{(m)} is unwanted, it may happen that the influence of θ_i^{(m)} cannot be removed by the QR iteration. This situation can be handled by a special deflation technique called "purging" (cf. Lehoucq & Sorensen 1996).

Software

Implementations of the Arnoldi method with implicit shifts by Lehoucq, Sorensen & Yang (1998) are freely available:

• ARPACK (Fortran 77)
• P_ARPACK (Fortran 77, parallel version)
• eigs (MATLAB) (which calls ARPACK)

For symmetric eigenvalue problems Wu and Simon (2000) proposed an alternative restarted version of Lanczos (called thick restart) which is mathematically equivalent to the implicitly restarted Lanczos method with exact shifts. An implementation

• TRLAN (Fortran 90)

is also freely available. The code can run on a single address-space machine or in a distributed parallel environment, which requires MPI.


Thick restart

IRA projects Ax = λx onto the Krylov space K_m(A, V_m Q(:,1)), and with exact shifts μ_j this subspace is

span{y^1, . . . , y^k, v^{k+1}, A v^{k+1}, . . . , A^{p−1} v^{k+1}}, (∗)

where y^j denotes the Ritz vector corresponding to the j-th kept Ritz value.

It was shown by Morgan (1996) that the subspace (∗) is equal to

span{y^1, . . . , y^k, A y^i, A² y^i, . . . , A^p y^i}

for every i ∈ {1, . . . , k}. This helps to explain the efficiency of IRA, since for each Ritz vector y^j the IRA subspace contains a Krylov subspace K_{p+1}(y^j, A) with starting vector y^j.

Wu and Simon (2000) developed an alternative restarted version of the Lanczos method (called thick restart) which is equivalent to IRA with exact shifts. Instead of using the QR algorithm, they orthonormalize the vectors y^1, . . . , y^k, v^{k+1}, A v^{k+1}, . . . , A^{p−1} v^{k+1} in order to generate an orthonormal basis of the subspace (∗).

Generalized Eigenvalue Problems

Generalized Hermitian eigenproblem

There are several variants of the Lanczos algorithm for the generalized Hermitian eigenvalue problem

A x = λ B x, A = A^H, B = B^H, B positive definite. (∗)

They all correspond to a reformulation as a standard eigenproblem C y = θ y.

Problem (∗) can be transformed to a Hermitian eigenproblem

C y := R^{−H} A R^{−1} y = λ y, x = R^{−1} y,

where R denotes the Cholesky factor of B = R^H R.

Alternatively, the problem C x := B^{−1} A x = λ x is symmetric with respect to the scalar product ⟨x, y⟩_B := y^H B x. Hence, one can construct by the Lanczos process a B-orthogonal basis of the Krylov space K_m(C, v) such that the projected problem is tridiagonal.

Obviously, in each step we have to multiply a vector by C, i.e. we have to solve one linear system with B.

Lanczos method for C := B^{−1}A

The Lanczos method for C := B^{−1}A computes a basis V_j of K_j(v^1, C) and a real symmetric tridiagonal matrix T_j such that

A V_j = B V_j T_j + r (e^j)^H

with V_j^H B V_j = I_j, V_j^H A V_j = T_j, V_j^H r = 0.

Ritz pairs (θ_i^{(j)}, x^{i,(j)}) are obtained from the tridiagonal eigenproblem

T_j s^{i,(j)} = θ_i^{(j)} s^{i,(j)}, x^{i,(j)} = V_j s^{i,(j)}.

For the residual it holds that

r^{i,(j)} = A x^{i,(j)} − B x^{i,(j)} θ_i^{(j)} = A V_j s^{i,(j)} − B V_j s^{i,(j)} θ_i^{(j)} = (A V_j − B V_j T_j) s^{i,(j)} = r (e^j)^H s^{i,(j)} = B v^{j+1} β_j s_j^{i,(j)},

from which we obtain

‖r^{i,(j)}‖²_{B^{−1}} = (r^{i,(j)})^H B^{−1} r^{i,(j)} = |β_j s_j^{i,(j)}|².

Theorem 10.9

Let A, B ∈ C^{n×n} be Hermitian, B positive definite, and denote by λ_j, j = 1, . . . , n, the eigenvalues of Ax = λBx. Then it holds that

min_{j=1,...,n} |λ_j − θ| ≤ ‖Ax − θBx‖_{B^{−1}} / ‖x‖_B.

Proof: Let u^j be a set of B-orthonormal eigenvectors corresponding to λ_j and x = ∑_{j=1}^n α_j u^j. Then it holds that ‖x‖²_B = ∑_{j=1}^n |α_j|², and

‖Ax − θBx‖²_{B^{−1}} = ‖∑_{j=1}^n α_j (A u^j − θ B u^j)‖²_{B^{−1}} = ‖∑_{j=1}^n α_j (λ_j − θ) B u^j‖²_{B^{−1}}
= ∑_{j,k=1}^n ᾱ_k (λ_k − θ) α_j (λ_j − θ) (B u^k)^H B^{−1} B u^j = ∑_{j=1}^n |α_j (λ_j − θ)|²
≥ min_{j=1,...,n} |λ_j − θ|² ‖x‖²_B.

As in the standard case we only have to monitor the subdiagonal elements β_j of T_j and the last components s_j^{i,(j)} of its eigenvectors to control the errors of the Ritz values.


Lanczos method for C := B^{−1}A

1: Start with q = x, determine r = Bq, β_0 = √(q^H r)
2: for j = 1, 2, . . . until convergence do
3:   v^j = q/β_{j−1}
4:   w^j = r/β_{j−1}
5:   r = A v^j
6:   r = r − β_{j−1} w^{j−1}
7:   α_j = (v^j)^H r
8:   r = r − α_j w^j
9:   reorthogonalize if necessary
10:  solve B q = r for q
11:  β_j = √(q^H r)
12:  solve the projected eigenproblem T_j = S Θ_j S^H
13:  test for convergence
14: end for
15: compute approximate eigenvectors X = V_j S
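A dense sketch of this loop (illustrative; B is factored once by Cholesky and reused in step 10, without reorthogonalization or convergence tests):

import numpy as np
from scipy.linalg import cho_factor, cho_solve, eigh_tridiagonal

def lanczos_gep(A, B, x, m):
    n = x.shape[0]
    Bfac = cho_factor(B)                 # factor B once
    q = x.copy()
    r = B @ q
    beta = np.sqrt(q @ r)                # beta_0
    alpha, betas = np.zeros(m), np.zeros(m)
    V = np.zeros((n, m))
    w_old = np.zeros(n)
    for j in range(m):
        v, w = q / beta, r / beta        # v^j (B-orthonormal), w^j = B v^j
        V[:, j] = v
        r = A @ v - beta * w_old
        alpha[j] = v @ r
        r = r - alpha[j] * w
        w_old = w
        q = cho_solve(Bfac, r)           # solve B q = r
        beta = np.sqrt(q @ r)
        betas[j] = beta
    theta, S = eigh_tridiagonal(alpha, betas[:-1])
    return theta, V @ S                  # Ritz values and Ritz vectors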


Comments

To simplify the description of the B-orthogonalization we introduce the auxiliary basis W_j := B V_j, which is B^{−1}-orthogonal, i.e. W_j^H B^{−1} W_j = I_j, and for which W_j^H V_j = I_j.

In step 9 we only have to reorthogonalize one of the bases V_j and W_j, where we can use (as for the standard eigenproblem) complete, selective, or partial reorthogonalization.

Complete reorthogonalization takes the form

r = r − B (V_j (V_j^H r)),

and this step is repeated if the new residual r and V_j are not yet orthogonal.

The additional multiplication by B in the reorthogonalization step can be avoided if both bases, V_j and W_j, are stored. Then the reorthogonalization can be performed as

r = r − W_j (V_j^H r).

Comments ct.

The algorithm is stopped when the Ritz values θ_i^{(j)} are sufficiently good approximations of the wanted eigenvalues of the pencil Ax = λBx.

The estimate |β_j s_j^{i,(j)}| may be too optimistic if the basis V_j is not fully B-orthogonal. Then the Ritz vector x^{i,(j)} may have norm smaller than 1, and we have to replace the estimate by

‖r^{i,(j)}‖_{B^{−1}} ≈ |β_j s_j^{i,(j)}| / ‖V_j s^{i,(j)}‖_B.

The Ritz vectors of the original matrix pencil are computed only when the test in step 13 has indicated that the wanted eigenvalues have converged.

Shift-and-invert

Under general conditions an eigenvalue problem

L u(x) = λ M u(x), x ∈ Ω, B u(x) = 0, x ∈ ∂Ω,

with elliptic operators L and M has a countable set of eigenvalues λ_n, which cluster only at ∞.

For instance, for an ordinary differential operator L of second order and M = I it holds that λ_n = O(n²). Hence the small eigenvalues are relatively close to each other, whereas for large eigenvalues the distances grow.

Therefore, for a discretization by the Rayleigh–Ritz method the spectrum will be very widely extended; in the lower part the eigenvalues will be clustered (relative to the width of the spectrum) and the higher eigenvalues will be well separated.

Usually one is interested in the small eigenvalues. Therefore, the Lanczos method is applied to A^{−1}B, or to (A − σB)^{−1}B if eigenvalues in the vicinity of a fixed parameter σ are of interest.


Shift-and-invert

[Figure: eigenvalues λ_1, λ_2, . . . on the real axis and their images 1/λ_1, 1/λ_2, . . . under the spectral transformation; clustered small eigenvalues are mapped to well-separated extremal ones. Figure omitted in this transcript.]

Shift-and-invert Lanczos method

The shift-and-invert variant corresponds to the application of the Lanczos method to C := B(A − σB)^{−1} for some shift σ.

It gives eigenvalues close to σ, and usually one gets convergence after a small number of steps. Even if systems with the shifted matrix A − σB are more laborious to solve than those with B needed in the direct variant, the smaller number of required steps will very often compensate for this.

The basic recursion of the shift-and-invert method is

B(A − σB)^{−1} V_j = V_j T_j + r (e^j)^H. (∗)

If V_j is chosen to be B^{−1}-orthogonal, i.e. V_j^H B^{−1} V_j = I_j, then multiplying (∗) by V_j^H B^{−1} one gets

V_j^H (A − σB)^{−1} V_j = T_j,

and the Lanczos process yields V_j^H B^{−1} r = 0.

Shift-and-invert Lanczos method ct.

An eigenvalue θ_i^{(j)} of the tridiagonal matrix T_j is an approximate eigenvalue of

C := B(A − σB)^{−1},

and therefore

λ_i^{(j)} = σ + 1/θ_i^{(j)}

is an approximate eigenvalue of the original pencil Ax = λBx.

If s^{i,(j)} is a corresponding eigenvector, then the Ritz vector

x^{i,(j)} := B^{−1} V_j s^{i,(j)}

is an approximation to the corresponding eigenvector of Ax = λBx.

Let W_j be the auxiliary basis for which V_j = B W_j. Then W_j is B-orthogonal, V_j and W_j are biorthogonal, i.e. W_j^H V_j = I_j, and the Ritz vectors are

x^{i,(j)} := W_j s^{i,(j)}.

Shift-and-invert Lanczos method ct.

Multiplying the basic recursion by (A − σB) B^{−1} from the left and by s^{i,(j)} from the right, one gets

V_j s^{i,(j)} = (A − σB) B^{−1} V_j T_j s^{i,(j)} + β_j (A − σB) B^{−1} v^{j+1} s_j^{i,(j)},

and it follows for the residual of a Ritz pair that

r^{i,(j)} = A x^{i,(j)} − λ_i^{(j)} B x^{i,(j)} = (A − σB) x^{i,(j)} − (1/θ_i^{(j)}) B x^{i,(j)}
= (1/θ_i^{(j)}) ((A − σB) B^{−1} V_j s^{i,(j)} θ_i^{(j)} − V_j s^{i,(j)})
= −(1/θ_i^{(j)}) (A − σB) B^{−1} v^{j+1} β_j s_j^{i,(j)}.

In this case we do not obtain an error bound for the Ritz values from the residual (but only for the harmonic Ritz values). Nevertheless, |β_j s_j^{i,(j)}| is an error indicator and is used in termination conditions.


Shift-and-invert Lanczos algorithm

1: Start with r = x, compute q = Br, β_0 = √(q^H r)
2: for j = 1, 2, . . . until convergence do
3:   v^j = q/β_{j−1}
4:   w^j = r/β_{j−1}
5:   Solve (A − σB) r = v^j for r
6:   r = r − β_{j−1} w^{j−1}
7:   α_j = (v^j)^H r
8:   r = r − α_j w^j
9:   reorthogonalize if necessary
10:  q = Br
11:  β_j = √(q^H r)
12:  solve the projected eigenproblem T_j = S Θ_j S^H
13:  test for convergence
14: end for
15: compute approximate eigenvectors X = W_j S

Shift-and-invert Lanczos algorithm ct.

Since for the shift-and-invert method we can expect rapid convergence to eigenvalues close to the shift, one usually applies complete reorthogonalization

r = r − W_j (W_j^H (Br)) or r = r − W_j (V_j^H r)

until r and the basis W_j are B-orthogonal.

For the linear system in step 5 one uses a factorization

L D L^H = P^T (A − σB) P

with an appropriate sparsity-preserving permutation P, which is determined in the beginning using sparse Gaussian elimination. Then r in step 5 is obtained as

r = P(L^{−H}(D^{−1}(L^{−1}(P^T v^j)))).

If all eigenvalues are on one side of the shift, then A − σB is definite and one can use a (sparse) Cholesky factorization; otherwise the matrix A − σB is indefinite, and one has to use a symmetric indefinite factorization.
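In SciPy the factor-once / solve-many pattern might look as follows (a sketch; splu's sparse LU with fill-reducing permutations stands in here for the symmetric LDL^H factorization described above, and A, B, sigma are assumed to be given sparse matrices and a shift):

from scipy.sparse.linalg import splu

lu = splu((A - sigma * B).tocsc())   # factor A - sigma*B once, before the loop

def solve_step5(v):
    return lu.solve(v)               # r = (A - sigma*B)^{-1} v in every Lanczos step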


Shift-and-invert Arnoldi

The shift-and-invert idea can be used for non-Hermitian eigenvalue problems and regular pencils Ax = λBx (i.e. det(A − λB) ≢ 0) to determine eigenvalues in the vicinity of a given shift σ.

Since B is no longer definite, the Arnoldi method constructs an orthogonal basis (with respect to the Euclidean inner product) of K_m(v^1, C), where C := (A − σB)^{−1}B, such that

(A − σB)^{−1} B V_m = V_m H_m + h_{m+1,m} v^{m+1} e_m^T.

The method converges to eigenvalues close to the shift first, and the convergence is faster the better these eigenvalues are separated from the rest of the spectrum.

ARPACK has driver routines for the generalized eigenvalue problem and uses shift-and-invert.

Rational Krylov Method

Rational Krylov subspace method

If one is interested in eigenvalues in a large interval [α, β] ⊂ σ(A), or in an extended region of the complex plane, then one can apply the shift-and-invert method in several runs with several parameters σ_j ∈ [α, β].

The cost can be reduced considerably if the eigenproblem is projected onto a rational Krylov space

span{v, (A − σ_1 B)^{−1}Bv, . . . , (A − σ_1 B)^{−i_1}Bv, (A − σ_2 B)^{−1}Bv, . . . , (A − σ_2 B)^{−i_2}Bv, . . . , (A − σ_k B)^{−1}Bv, . . . , (A − σ_k B)^{−i_k}Bv}

and the projected eigenproblem is solved.

In the Arnoldi method the eigenproblem is projected onto a subspace of the form

V_m = {ψ((A − σB)^{−1}B) v : ψ ∈ Π_{m−1}, ψ(0) = 1}.

For the rational Krylov method the subspace can be written as

V = {ρ(A) v : ρ a suitable rational function, ρ(0) = 1}.

It can be shown (Ruhe 1998) that the rational Krylov method can be interpreted as a shift-and-invert Lanczos method with shift σ_k and a modified initial vector v^1.


Rational Krylov Method

Rational Krylov subspace method ct.Rational Krylov starts as shift-and-invert Arnoldi method with shift σ1 andinitial vector v1, and determines an Arnoldi recursion

(A− σ1B)−1BVm = Vm+1Hm+1,m. (1)

If m is big enough then accurate approximations to eigenvalues in the vicinityof σ1 are obtained from extreme eigenvalues of Hm = Hm+1,m(1 : m,1 : m).

To obtain further eigenvalue approximations we choose a new shift σ2 andcontinue the Arnoldi process without throwing away the information gatheredin the basis of the Krylov space K := Km((A− σ1B)−1B, v1).

This is indeed possible if we are able to determine an Arnoldi recursion

(A − σ_2 B)^{−1} B W_m = W_{m+1} H̃_{m+1,m} (2)

corresponding to the shift σ_2 and the initial vector w_1 such that H̃_{m+1,m} has the same trapezoidal form as H_{m+1,m} (i.e. h̃_{ij} = 0 for i > j + 1), and span(V_{m+1}) = span(W_{m+1}).


Rewrite the recursion (1) as

B V_m = (A − σ_1 B) V_{m+1} H_{m+1,m},

which is equivalent to

(σ_1 − σ_2) B V_{m+1} H_{m+1,m} + B V_m = (A − σ_2 B) V_{m+1} H_{m+1,m},

and to

B V_{m+1} (I_{m+1,m} + (σ_1 − σ_2) H_{m+1,m}) = (A − σ_2 B) V_{m+1} H_{m+1,m},

where the matrix

K_{m+1,m} := I_{m+1,m} + (σ_1 − σ_2) H_{m+1,m}

is trapezoidal of the same form as H_{m+1,m}.


From

(A − σ_2 B)^{−1} B V_{m+1} K_{m+1,m} = V_{m+1} H_{m+1,m}

we obtain the desired Arnoldi recursion if we get rid of the factor K_{m+1,m} on the left.

If

K_{m+1,m} = Q_{m+1} \begin{bmatrix} R_m \\ 0 \end{bmatrix}

denotes the QR factorization of K_{m+1,m}, then R_m is regular (otherwise a subdiagonal element of H_{m+1,m} would have been 0, and the Arnoldi process would have stopped with an invariant subspace), and it follows

(A − σ_2 B)^{−1} B V_{m+1} Q_{m+1} \begin{bmatrix} R_m \\ 0 \end{bmatrix} = V_{m+1} H_{m+1,m},

and multiplication by R_m^{−1} from the right yields


(A − σ_2 B)^{−1} B V_{m+1} Q_{m+1,m} = V_{m+1} Q_{m+1} Q_{m+1}^H H_{m+1,m} R_m^{−1}. (3)

Hence, with the orthogonal basis V_{m+1} Q_{m+1} of the Krylov space K, the projection of (A − σ_2 B)^{−1}B is represented by the full matrix

L_{m+1,m} = Q_{m+1}^H H_{m+1,m} R_m^{−1},

which can be transformed to trapezoidal form by applying Householder matrices from the bottom upwards:

L_{m+1,m} = \begin{bmatrix} P_m & 0 \\ 0 & 1 \end{bmatrix} H̃_{m+1,m} P_m^H.

Multiplying equation (3) by P_m from the right one gets


(A − σ_2 B)^{−1} B V_{m+1} Q_{m+1,m} P_m = V_{m+1} Q_{m+1} \begin{bmatrix} P_m & 0 \\ 0 & 1 \end{bmatrix} H̃_{m+1,m},

i.e. an Arnoldi recursion

(A − σ_2 B)^{−1} B W_m = W_{m+1} H̃_{m+1,m}

with the new shift σ_2, the new orthogonal basis

W_{m+1} = V_{m+1} Q_{m+1} \begin{bmatrix} P_m & 0 \\ 0 & 1 \end{bmatrix},

and the upper Hessenberg matrix H̃_{m+1,m}.


Notice that all transformations are performed without operations with the large matrices A and B, and that even forming the matrix W explicitly can be avoided, thus avoiding all work on large vectors.
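The translation from σ_1 to σ_2 can be checked numerically. The following numpy sketch is illustrative only: it works with dense matrices and, contrary to the remark above, forms the new basis explicitly in order to verify identity (3); the final Householder reduction to trapezoidal form is omitted, since it only changes the basis within the same space.

```python
import numpy as np

def arnoldi(C, v1, m):
    """m Arnoldi steps for the operator C (given as a function);
    returns V_{m+1} (n x (m+1)) and H_{m+1,m}."""
    n = v1.size
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v1 / np.linalg.norm(v1)
    for j in range(m):
        w = C(V[:, j])
        for i in range(j + 1):              # modified Gram-Schmidt
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        V[:, j + 1] = w / H[j + 1, j]
    return V, H

rng = np.random.default_rng(0)
n, m = 60, 10
A = rng.standard_normal((n, n)); A = A + A.T    # symmetric test matrix
B = np.eye(n)
s1, s2 = 0.3, 1.1

C1 = lambda v: np.linalg.solve(A - s1 * B, B @ v)
C2 = lambda v: np.linalg.solve(A - s2 * B, B @ v)
V, H = arnoldi(C1, rng.standard_normal(n), m)   # recursion (1)

K = np.eye(m + 1, m) + (s1 - s2) * H            # K_{m+1,m}
Q, R = np.linalg.qr(K, mode='complete')         # K = Q_{m+1} [R_m; 0]
L = Q.T @ H @ np.linalg.inv(R[:m, :m])          # L_{m+1,m}, projection of C2
W = V @ Q                                       # new orthonormal basis of the same space
lhs = np.column_stack([C2(W[:, j]) for j in range(m)])
print(np.linalg.norm(lhs - W @ L))              # ~ machine precision: identity (3) holds
```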

In practical implementations the rational Krylov method is combined with locking, purging, and implicit restarts.

Implicitly restarted shift-and-invert Arnoldi with shift σ_1 is run until an appropriate number of eigenvalues around σ_1 have converged. Then these eigenvalues are locked, and eigenvalues outside the interesting region are purged, leaving an Arnoldi recursion of dimension m. Then a new shift σ_2 is introduced, W_{m+1} and H̃_{m+1,m} are determined, and the implicitly restarted shift-and-invert Arnoldi process with shift σ_2 is continued without touching the locked Ritz pairs. The same procedure is repeated until all interesting eigenvalues have converged.


Miscellaneous Remarks

Block Arnoldi method

Block methods are used mainly for reliably determining multiple and/or clustered eigenvalues.

Let A be a matrix of order n, and let b be the block size.

A V_{[m]} = V_{[m]} H_{[m]} + F_m E_m (1)

is a block Arnoldi reduction of order m if V_{[m]}^H A V_{[m]} = H_{[m]} is a band upper Hessenberg matrix of order mb, V_{[m]}^H V_{[m]} = I_{mb}, V_{[m]}^H F_m = 0, and E_m = [O, …, O, I_b] ∈ R^{b×mb}.

Here, a band upper Hessenberg matrix is an upper triangular matrix with b subdiagonals. The columns of V_{[m]} form an orthonormal basis of the block Krylov space

K_m(A, V_1) := span{V_1, A V_1, …, A^{m−1} V_1}.


Extending a block Arnoldi reduction

1: compute the QR factorization V_{m+1} H_{m+1,m} = F_m using the iterated classical Gram–Schmidt process
2: V_{[m+1]} = [V_{[m]}, V_{m+1}]
3: W = A V_{m+1}
4: H_{m+1,m+1} = V_{m+1}^H W
5: H_{[m+1]} = \begin{bmatrix} H_{[m]} & V_{[m]}^H W \\ H_{m+1,m} E_m & H_{m+1,m+1} \end{bmatrix}
6: F_{m+1} = W − V_{[m+1]} \begin{bmatrix} V_{[m]}^H W \\ H_{m+1,m+1} \end{bmatrix}
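A numpy sketch of this extension step under the definitions above, with a single reorthogonalization pass standing in for the iterated classical Gram–Schmidt process; all names and the toy matrix are illustrative:

```python
import numpy as np

def extend_block_arnoldi(A, V, H, F):
    """One extension step: given A V = V H + F E_m (V: n x mb, H: mb x mb,
    F: n x b), return V_{[m+1]}, H_{[m+1]}, F_{m+1}."""
    mb = V.shape[1]
    b = F.shape[1]
    Vnext, Hsub = np.linalg.qr(F)                 # step 1: F_m = V_{m+1} H_{m+1,m}
    Vnew = np.hstack([V, Vnext])                  # step 2
    W = A @ Vnext                                 # step 3
    Hdiag = Vnext.T @ W                           # step 4: H_{m+1,m+1}
    Em = np.zeros((b, mb)); Em[:, mb - b:] = np.eye(b)
    Hnew = np.block([[H, V.T @ W],
                     [Hsub @ Em, Hdiag]])         # step 5
    Fnew = W - Vnew @ np.vstack([V.T @ W, Hdiag]) # step 6
    Fnew -= Vnew @ (Vnew.T @ Fnew)                # one reorthogonalization pass
    return Vnew, Hnew, Fnew

rng = np.random.default_rng(1)
n, b = 40, 2
A = rng.standard_normal((n, n))
V, _ = np.linalg.qr(rng.standard_normal((n, b))) # orthonormal start block V_1
H = V.T @ A @ V
F = A @ V - V @ H
for _ in range(4):
    V, H, F = extend_block_arnoldi(A, V, H, F)
Em = np.zeros((b, V.shape[1])); Em[:, -b:] = np.eye(b)
print(np.linalg.norm(A @ V - V @ H - F @ Em))    # ~ machine precision
```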



Comments

1. The classical Gram–Schmidt method can be performed using the BLAS2 matrix-vector multiplication subroutine.

Moreover, this scheme gives a simple way to fill out a rank-deficient F_m. For instance, if a third step of orthogonalization is needed when computing column j of V_{m+1}, then the corresponding column of F_m depends linearly on the previous columns of V_{m+1}. The j-th diagonal element of H_{m+1,m} is then set to zero, and a random vector is orthogonalized against V_{[m]} and the first j − 1 columns of V_{m+1}.

2. Application of A to a group of vectors might prove essential if accessing A is expensive.

3. Block methods allow the use of BLAS3 matrix-matrix multiplication subroutines.


Block implicitly restarted Arnoldi method

1: for i = 1, 2, … until convergence do
2:    extend the length-r block Arnoldi reduction by p blocks:
      A V_{[r+p]} = V_{[r+p]} H_{[r+p]} + F_{r+p} E_{r+p}
3:    check whether the k wanted Ritz values are sufficiently accurate
4:    lock Ritz values that satisfy the convergence tolerance
5:    implicitly restart with p shifts and retain a length-r block Arnoldi reduction
6: end for


Band Lanczos method (Ruhe (1979), Freund (1997))

The Lanczos algorithm is not able to detect the multiplicity of a multiple eigenvalue or to compute a basis of the corresponding eigenspace, and it has problems with clusters of eigenvalues.

Way out: use several initial vectors w_1, …, w_p and consider the block Krylov sequence

w_1, …, w_p, A w_1, …, A w_p, …, A^{m−1} w_1, …, A^{m−1} w_p, …, (1)

construct an orthonormal basis v_1, …, v_j of the space spanned by the first j linearly independent vectors of the sequence (1), and solve the projection of the eigenproblem to this subspace.


For the Lanczos method, v_{m+1} = 0, v_j ≠ 0, j = 1, …, m, means that K_m(A, v_1) is the first invariant subspace of A within the sequence K_j(A, v_1), j = 1, 2, …. Then all eigenvalues of T_m are eigenvalues of A as well, and the Ritz vectors are eigenvectors of A.

For p > 1 the occurrence of the first linearly dependent vector does not mean that an invariant subspace has been detected, but only that some newly obtained vector A^k v_i can be combined linearly from the previous vectors and hence does not contain new information. The same holds for the subsequent vectors A^{k+1} v_i, A^{k+2} v_i, …. Therefore, the vector A^k v_i is removed from the further construction. Particularly inconvenient is the fact that the linear dependence has to be detected in finite precision.

The projected problem is a banded matrix with bandwidth 2p + 1, where the effective bandwidth diminishes by 2 for every deflation.


Lanczos method for non-Hermitian problems

Disadvantages of the Arnoldi method are its high arithmetic cost (orthonormalization with respect to all previous basis elements and reorthogonalization) and its high (unpredictable) storage requirements.

The arithmetic cost can be reduced substantially by using oblique projections and a two-sided Gram–Schmidt procedure. To this end, two Krylov sequences p_1, p_2, … and q_1, q_2, … are constructed according to

β_{j+1} q_{j+1} = A q_j − α_j q_j − γ_j q_{j−1},
γ_{j+1} p_{j+1} = A^H p_j − α_j p_j − β_j p_{j−1}.

q_1, …, q_m is a (non-orthogonal) basis of K_m(A, q_1) and p_1, …, p_m is a (non-orthogonal) basis of K_m(A^H, p_1). The bases are biorthogonal, i.e. p_j^H q_k = δ_{jk}.


With Q_m = [q_1, …, q_m], P_m = [p_1, …, p_m], T_m = tridiag(β_j, α_j, γ_{j+1}), a compact form of the two-sided Lanczos method reads

A Q_m = Q_m T_m + β_{m+1} q_{m+1} e_m^T
A^H P_m = P_m T_m^H + γ_{m+1} p_{m+1} e_m^T
P_m^H Q_m = I_m,  P_m^H q_{m+1} = 0,  p_{m+1}^H Q_m = 0
T_m = P_m^H A Q_m.

The Ritz values θ_i^{(m)} of

T_m z_i^{(m)} = θ_i^{(m)} z_i^{(m)}   and   T_m^H w_i^{(m)} = θ̄_i^{(m)} w_i^{(m)}

are approximations to eigenvalues of A, and the right Ritz vectors x_i^{(m)} = Q_m z_i^{(m)} and left Ritz vectors y_i^{(m)} = P_m w_i^{(m)} are approximations to corresponding right and left eigenvectors.


Two-sided Lanczos method

1: Choose q_1, p_1 with q_1^T p_1 ≠ 0; q_0 = 0; p_0 = 0; β_0 = 0; γ_0 = 0
2: for m = 1, 2, … do
3:    q = A q_m; p = A^T p_m
4:    α_m = q^T p_m
5:    q = q − α_m q_m − β_{m−1} q_{m−1}
6:    p = p − α_m p_m − γ_{m−1} p_{m−1}
7:    γ_m = ‖q‖_2
8:    if γ_m == 0 then
9:       STOP
10:   end if
11:   q_{m+1} = q/γ_m
12:   β_m = p^T q_{m+1}
13:   if β_m == 0 then
14:      STOP
15:   end if
16:   p_{m+1} = p/β_m
17: end for
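A direct numpy transcription of this algorithm (real arithmetic). The normalization p_1^T q_1 = 1 and the returned quantities are choices of this sketch, the breakdown tests are reduced to comments, and biorthogonality degrades in finite precision as m grows:

```python
import numpy as np

def two_sided_lanczos(A, q1, p1, m):
    """Two-sided Lanczos as in the algorithm above (real arithmetic).
    Returns Q_m, P_m, the tridiagonal T_m, gamma_m and q_{m+1}."""
    n = A.shape[0]
    Q = np.zeros((n, m + 1)); P = np.zeros((n, m + 1))
    Q[:, 0] = q1 / np.linalg.norm(q1)
    P[:, 0] = p1 / (p1 @ Q[:, 0])                # normalize so p_1^T q_1 = 1
    alpha, beta, gamma = np.zeros(m), np.zeros(m), np.zeros(m)
    for k in range(m):
        q = A @ Q[:, k]; p = A.T @ P[:, k]
        alpha[k] = q @ P[:, k]
        q -= alpha[k] * Q[:, k]; p -= alpha[k] * P[:, k]
        if k > 0:
            q -= beta[k - 1] * Q[:, k - 1]
            p -= gamma[k - 1] * P[:, k - 1]
        gamma[k] = np.linalg.norm(q)             # gamma_k == 0: lucky breakdown
        Q[:, k + 1] = q / gamma[k]
        beta[k] = p @ Q[:, k + 1]                # beta_k == 0: essential breakdown
        P[:, k + 1] = p / beta[k]
    T = np.diag(alpha) + np.diag(gamma[:m - 1], -1) + np.diag(beta[:m - 1], 1)
    return Q[:, :m], P[:, :m], T, gamma[m - 1], Q[:, m]

rng = np.random.default_rng(2)
n, m = 80, 12
A = rng.standard_normal((n, n))
Q, P, T, g, qnext = two_sided_lanczos(A, rng.standard_normal(n),
                                      rng.standard_normal(n), m)
print(np.linalg.norm(P.T @ Q - np.eye(m)))       # biorthogonality P_m^T Q_m = I_m
print(np.linalg.norm(P.T @ A @ Q - T))           # T_m = P_m^T A Q_m
```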


The residuals

r_i^{(m)} = A x_i^{(m)} − θ_i^{(m)} x_i^{(m)} = β_{m+1} q_{m+1} e_m^T z_i^{(m)},
s_i^{(m)} = A^H y_i^{(m)} − θ̄_i^{(m)} y_i^{(m)} = γ_{m+1} p_{m+1} e_m^T w_i^{(m)},

and their norms can again be determined without computing the Ritz vectors.

If γ_{m+1} = 0 or ‖p‖ = 0, then the algorithm terminates, which is called a lucky breakdown. In this case K_m(A, q_1) is an invariant subspace of A or K_m(A^H, p_1) is an invariant subspace of A^H, the Ritz values θ_i^{(m)} are eigenvalues of A, and the right and left Ritz vectors are right and left eigenvectors of A, respectively.
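Continuing the two-sided Lanczos sketch above (whose normalization puts the coefficient γ_m, rather than β_{m+1}, in front of q_{m+1}), the right residual norms follow from the last components of the eigenvectors of T_m alone:

```python
# eigen-decomposition of the small tridiagonal T from the sketch above
theta, Z = np.linalg.eig(T)
# ||r_i|| = |gamma_m| * |e_m^T z_i| * ||q_{m+1}||, with ||q_{m+1}|| = 1 here
res_norms = np.abs(g) * np.abs(Z[-1, :])
```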


The method also terminates if β_m = p_{m+1}^H q_{m+1} = 0 for some m, which is called an essential breakdown. In this case one can very often construct vectors q_{m+1}, …, q_{m+k} ∈ K_{m+k}(A, q_1) and p_{m+1}, …, p_{m+k} ∈ K_{m+k}(A^H, p_1) for some k > 1 such that p_j^H q_i = 0 and p_i^H q_j = 0 for j = 1, …, m and i = m + 1, …, m + k, and it is possible to construct biorthogonal bases of the subsequent Krylov spaces by the Lanczos algorithm. The obliquely projected eigenvalue problem then has tridiagonal form which is perturbed by a full block in columns and rows m, …, m + k.

This construction is called the look-ahead Lanczos method. It is implemented in the (public domain) software QMRPACK by Freund and Nachtigal (1996). If the construction is not possible (which happens in very rare cases), the breakdown is called incurable.


Inexact Arnoldi method (Simoncini 2005)

Assume that A cannot be applied exactly, but that at each iteration the operation y = Av is replaced by the inexact evaluation

y = Av + f

where f can change in each iteration and ‖f‖ can be monitored.

For example, let A = (B − σI)^{−1} be a shift-and-invert operator, and in the k-th step of the Arnoldi method solve

(B − σI)y = v

by a (preconditioned) Krylov method iterating until

‖f‖ = ‖(B − σI)y − v‖ ≤ ε_k

for a given ε_k.


Aim: Given ε > 0, determine ε_k > 0, k = 1, 2, …, m, such that for an eigenpair (θ, u) of H_m it holds

‖(A V_m u − θ V_m u) − r_m‖ ≤ ε (at least asymptotically),

where

r_m = h_{m+1,m} v_{m+1} e_m^T u

is the computed residual of the Ritz pair (θ, V_m u).

Simoncini's suggestion, justified by a THEOREM which contains unavailable terms: assume that a maximum of m inexact Arnoldi steps are to be carried out, and require that

‖f_k‖ ≤ \begin{cases} \dfrac{1}{m}\, ε & \text{for } k = 1 \\ \dfrac{\min\{α, δ_{k−1}\}}{2m\, ‖r_{k−1}‖}\, ε & \text{for } k > 1 \end{cases}

where

δ_{k−1} := min_{θ ∈ Λ(H_{k−1}) \ {θ^{(k−1)}}} |θ^{(k−1)} − θ|,

α is an estimate of ‖A‖, and r_{k−1} is the computed residual in step k − 1.
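A small sketch of this relaxation rule; the function and argument names are illustrative:

```python
import numpy as np

def inner_tolerance(k, m, eps, alpha, ritz_prev, theta_prev, rnorm_prev):
    """Simoncini-type inner tolerance eps_k for the k-th inexact Arnoldi step.
    alpha ~ ||A||, ritz_prev = Lambda(H_{k-1}), theta_prev = wanted Ritz value
    theta^{(k-1)}, rnorm_prev = ||r_{k-1}||; all names are illustrative."""
    if k == 1:
        return eps / m
    others = ritz_prev[ritz_prev != theta_prev]   # Lambda(H_{k-1}) \ {theta^{(k-1)}}
    delta = np.min(np.abs(theta_prev - others))   # gap estimate delta_{k-1}
    return min(alpha, delta) / (2 * m * rnorm_prev) * eps
```

The essential point is that ε_k may grow as ‖r_{k−1}‖ decreases, so the inner linear solves are allowed to become less accurate as the outer iteration converges.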