Krylov space methods

Name / Surname: Dionysios Zelios
Email: [email protected]
Course: Computational Physics (FK8002)



CONTENTS

Description of the problem

Introduction

i. Arnoldi algorithm

ii. Lanczos algorithm

iii. Time evolution of our system

Results

i. Arnoldi

ii. Lanczos

iii. Time comparison between the algorithms

iv. Time evolution of our system

References


Description of the problem

To begin with, we will investigate Krylov space methods for the diagonalization of a matrix. Krylov space methods transform the original matrix to one of much lower order, which can then easily be diagonalized. We will write the Arnoldi and Lanczos algorithms in Matlab and then apply these methods to the harmonic oscillator Hamiltonian with an extra potential term added (this problem has already been solved in assignment 4 with the shifted inverse power method).

Moreover, we will use the Krylov space methods to calculate the exponential of the Hamiltonian matrix and thereby construct the time evolution of the aforementioned quantum mechanical system.

Introduction

An intuitive method for finding an eigenvalue (specifically the largest eigenvalue) of a given $m \times m$ matrix is power iteration. Starting with an initial random vector x, this method calculates $Ax, A^2x, A^3x, \ldots$ iteratively, storing the normalized result back into x at every step. This sequence converges to the eigenvector corresponding to the largest eigenvalue, $\lambda_1$.

However, much potentially useful computation is wasted by using only the final result, $A^{n-1}x$. This suggests that we instead form the so-called Krylov matrix:

$K_n = (x,\ Ax,\ A^2x,\ A^3x,\ \ldots,\ A^{n-1}x)$

The columns of this matrix are not orthogonal, but in principle we can extract an orthogonal basis via a method such as Gram–Schmidt orthogonalization. The resulting vectors are a basis of the Krylov subspace, $\mathcal{K}_n$.

We may expect the vectors of this basis to give good approximations of the eigenvectors corresponding to the largest eigenvalues, for the same reason that $A^{n-1}x$ approximates the dominant eigenvector.
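As an illustration only, a minimal Matlab sketch of this naive construction (assuming a matrix A, a starting vector x and a chosen Krylov dimension are already defined; the use of the built-in qr for the orthogonalization is our own choice) could look as follows:

    % Naive Krylov matrix K = [x, Ax, A^2 x, ...] followed by orthonormalization.
    % Illustration only; this construction is numerically unstable in practice.
    n = 10;                              % chosen Krylov dimension (our assumption)
    K = zeros(length(x), n);
    K(:,1) = x / norm(x);
    for j = 2:n
        K(:,j) = A * K(:,j-1);           % next power of A applied to x
        K(:,j) = K(:,j) / norm(K(:,j));  % normalize to avoid overflow; span unchanged
    end
    [Q, ~] = qr(K, 0);                   % columns of Q: orthonormal basis of the Krylov subspace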

The process described above is intuitive. Unfortunately, it is also unstable. This is where the Arnoldi iteration enters.


Arnoldi algorithm

Iterative algorithms compute a sequence of vectors that hopefully converges to an eigenvector. The most basic iteration is the power method, where $x_0$ is a starting guess and a sequence $x_k$ is computed by:

$x_{k+1} = A x_k$   (1)

After many iterations $x_k$ will tend to the eigenvector corresponding to the eigenvalue $\lambda_1$ that is largest in absolute value, provided there is only one such eigenvalue.
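A minimal Matlab sketch of this power method (assuming only that a square matrix A is defined; the iteration count and variable names are our own choices) is:

    % Power method: repeatedly apply A and normalize.
    x = rand(size(A,1), 1);          % random starting vector x_0
    for k = 1:200                    % fixed number of iterations, for illustration
        x = A * x;                   % x_{k+1} = A x_k
        x = x / norm(x);             % normalize at every step
    end
    lambda1 = x' * A * x;            % Rayleigh-quotient estimate of the dominant eigenvalue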

We obtain interesting algorithms if we save all vectors in the sequence (1) and work with the Krylov subspace:

$K_k(A, x_0) = \mathrm{span}\{x_0,\ Ax_0,\ A^2x_0,\ \ldots,\ A^{k-1}x_0\}$   (2)

(Here the brackets denote the linear space spanned by the given columns.)

Then we can write $K_n^{-1} A K_n = C_n$, where $C_n$ is an upper Hessenberg matrix. (An upper Hessenberg matrix is a matrix whose elements obey the rule $a_{ij} = 0$ for $i > j+1$.)

In order to obtain a better conditioned basis for $\mathrm{span}(K_n)$, we compute the QR factorization

$Q_n R_n = K_n$,

so that

$Q_n^H A Q_n = R_n C_n R_n^{-1} \equiv H_n$,

where $H_n$ is an upper Hessenberg matrix.

Equating the $k$th columns on each side of the equation $A Q_n = Q_n H_n$, we obtain the recurrence relation

$A q_k = h_{1k} q_1 + \cdots + h_{kk} q_k + h_{k+1,k} q_{k+1}$,

relating $q_{k+1}$ to the preceding vectors $q_1, \ldots, q_k$. Premultiplying by $q_j^H$ and using orthonormality gives

$h_{jk} = q_j^H A q_k$,   $j = 1, \ldots, k$.

These relationships yield the Arnoldi iteration, which produces a unitary matrix $Q_n$ and an upper Hessenberg matrix $H_n$ using only matrix-vector multiplications with A and inner products of vectors. Below we present a flow chart of the steps that we have followed in order to create the Arnoldi algorithm.


If $Q_k = [q_1\ \cdots\ q_k]$, then $H_k = Q_k^H A Q_k$ is an upper Hessenberg matrix. The eigenvalues of $H_k$ are called Ritz values and they are approximate eigenvalues of the matrix A. Ritz vectors are given by $Q_k y$, where y is an eigenvector of the matrix $H_k$; they are the approximate eigenvectors of A. The eigenvectors of $H_k$ must be computed by another method such as QR iteration (in our project we have used the built-in Matlab command 'eig', which obtains the eigenvalues from a Schur decomposition).

It is often observed in practice that some of the Ritz eigenvalues converge to eigenvalues of A. Since $H_n$ is n-by-n, it has at most n eigenvalues, and not all eigenvalues of A can be approximated. Typically, the Ritz eigenvalues converge to the extreme eigenvalues of A. This can be related to the characterization of $H_n$ as the matrix whose characteristic polynomial minimizes $\|p(A)q_1\|$ in the following way: a good way to get p(A) small is to choose the polynomial p such that p(x) is small whenever x is an eigenvalue of A. Hence, the zeros of p (and thus the Ritz eigenvalues) will be close to the eigenvalues of A. However, the details are not fully understood yet. This is in contrast to the case where A is symmetric; in that situation the Arnoldi iteration becomes the Lanczos iteration, for which the theory is more complete.

1. Start with q_1 = x/||x||_2, where x is an arbitrary non-zero starting vector.
   For k = 1, 2, ...
2.   u_k = A*q_k
3.   For j = 1, ..., k
       h_{j,k} = q_j^H * u_k
       u_k = u_k - q_j * h_{j,k}
4.   h_{k+1,k} = ||u_k||_2; if h_{k+1,k} = 0, stop.
5.   q_{k+1} = u_k / h_{k+1,k}
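The flow chart translates almost line by line into Matlab. The following function is an illustrative sketch of those steps, not necessarily the exact code used in the project (A is the matrix, q1 the starting vector, m the number of iterations):

    function [Q, H] = arnoldi(A, q1, m)
    % Arnoldi iteration: builds an orthonormal basis Q of the Krylov space and
    % the (m+1)-by-m upper Hessenberg matrix H with H(1:m,1:m) = Q(:,1:m)'*A*Q(:,1:m).
        n = length(q1);
        Q = zeros(n, m+1);  H = zeros(m+1, m);
        Q(:,1) = q1 / norm(q1);
        for k = 1:m
            u = A * Q(:,k);
            for j = 1:k                         % orthogonalize against previous vectors
                H(j,k) = Q(:,j)' * u;
                u = u - Q(:,j) * H(j,k);
            end
            H(k+1,k) = norm(u);
            if H(k+1,k) == 0, break; end        % invariant subspace found
            Q(:,k+1) = u / H(k+1,k);
        end
    end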


Arnoldi iteration is fairly expensive in work and storage because each new vector $q_k$ must be orthogonalized against all previous columns of $Q_k$, and all of them must be stored for that purpose. Nevertheless, Ritz values and vectors are often good approximations to the eigenvalues and eigenvectors of A after relatively few iterations (20–50).

Lanczos algorithm

In order to decrease the work and storage dramatically, we use the Lanczos algorithm. If the matrix is symmetric or Hermitian, the recurrence has only three terms and $H_k$ is tridiagonal (so it is usually denoted $T_k$). Below we present a flow chart of the steps that we have followed in order to create the Lanczos algorithm.

1. q_0 = 0
   b_0 = 0
   x_0 = arbitrary non-zero starting vector
   q_1 = x_0/||x_0||_2
2. For k = 1, 2, ...
     u_k = A*q_k
     a_k = q_k^H * u_k
     u_k = u_k - b_{k-1}*q_{k-1} - a_k*q_k
3.   b_k = ||u_k||_2
4.   If b_k = 0, stop.
5.   q_{k+1} = u_k / b_k
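As for Arnoldi, the flow chart can be translated directly into Matlab. The function below is an illustrative sketch (A is the Hermitian matrix, x0 the starting vector, m the number of iterations) and not necessarily the exact code used in the project:

    function [Q, T] = lanczos(A, x0, m)
    % Lanczos iteration for a Hermitian matrix A (illustrative sketch).
    % Returns the basis vectors Q and the m-by-m symmetric tridiagonal matrix T.
        n = length(x0);
        Q = zeros(n, m+1);
        a = zeros(m, 1);  b = zeros(m, 1);
        Q(:,1) = x0 / norm(x0);
        qprev = zeros(n, 1);  bprev = 0;
        for k = 1:m
            u = A * Q(:,k);
            a(k) = real(Q(:,k)' * u);               % diagonal entry a_k
            u = u - bprev * qprev - a(k) * Q(:,k);  % three-term recurrence
            b(k) = norm(u);                         % subdiagonal entry b_k
            if b(k) == 0, break; end                % invariant subspace found
            Q(:,k+1) = u / b(k);
            qprev = Q(:,k);  bprev = b(k);
        end
        T = diag(a);                                % assemble the tridiagonal matrix T_m
        for k = 1:m-1
            T(k,k+1) = b(k);  T(k+1,k) = b(k);
        end
    end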


$a_k$ and $b_k$ are the diagonal and subdiagonal entries of the symmetric tridiagonal matrix $T_k$. As with Arnoldi, the Lanczos iteration does not produce eigenvalues and eigenvectors directly, but only the tridiagonal matrix $T_k$, whose eigenvalues and eigenvectors must be computed by another method to obtain the Ritz values and vectors. If $b_k = 0$, the algorithm appears to break down, but in that case an invariant subspace has already been identified (i.e. the eigenvalues and eigenvectors are already exact at that point).

In principle, if the Lanczos algorithm were run until k = n, the resulting tridiagonal matrix would be orthogonally similar to the matrix A. In practice, it was proved by Christopher Paige in his thesis (1970) that loss of orthogonality occurs precisely when the first eigenvalue converges. Since the calculations are performed in floating point arithmetic, where inaccuracy is inevitable, orthogonality is quickly lost, and in some cases the new vector can even be linearly dependent on the set that has already been constructed. As a result, some of the eigenvalues of the resulting tridiagonal matrix may not be approximations to those of the original matrix, so the Lanczos algorithm is not very stable. This problem can be overcome by reorthogonalizing the vectors as needed, but the expense can be substantial. Alternatively, we can ignore the problem, in which case the algorithm still produces good eigenvalue approximations, but multiple copies of some eigenvalues may be generated.
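For completeness, a full reorthogonalization can be added inside the Lanczos loop with one extra line (a sketch only, using the variable names of the function above), at the cost of keeping and multiplying with all previous vectors:

    % Full reorthogonalization against all previously computed Lanczos vectors;
    % insert after the three-term recurrence step in the k-th iteration.
    u = u - Q(:,1:k) * (Q(:,1:k)' * u);   % project out components along q_1 ... q_k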

Time evolution of our system

If we know the wave function of a system at a certain time t, we can find it at a later time with the help of the time-dependent Schrödinger equation:

$i\hbar\,\frac{\partial \Psi}{\partial t} = H(t)\,\Psi$   (1)

as

$\Psi(t+\Delta t) = e^{-\frac{i}{\hbar}\int_t^{t+\Delta t} H(t')\,dt'}\;\Psi(t)$   (2)

The so-called time-propagation operator can, to first order in $\Delta t$, be approximated as:

$e^{-\frac{i}{\hbar}\int_t^{t+\Delta t} H(t')\,dt'} \approx e^{-\frac{i}{\hbar} H(t)\,\Delta t}$   (3)

However, we still have the operator H(t) in the exponent. If we have a complete, but still finite, set of solutions to H(t) (for a specific time t),

$H(t)\,|i\rangle = \varepsilon_i\,|i\rangle$   (4)

we can use this to effectively take the exponential of H(t) as:

$e^{-\frac{i}{\hbar} H(t)\,\Delta t} = \sum_i e^{-\frac{i}{\hbar}\varepsilon_i\,\Delta t}\,|i\rangle\langle i|$   (5)

Starting for example in the ground state of the harmonic oscillator (at $t = 0$), $\Psi_0(t_0)$, we can consider an additional potential V(t) (such that V(t) = 0 for $t \le t_0$). The time-dependent Hamiltonian is thus:

$H(t) = H(x) + V(x,t)$   (6)

where H is the time-independent harmonic oscillator Hamiltonian. For V(x,t) one can take:

$V(x,t) = \sin(\omega t)\,V(x)$,   $0 \le t \le \tau$   (7)

i.e. V(x,t) is non-zero only between $t = 0$ and $t = \tau$. The term V(x) can be the bump from Assignment 4. We then use a time grid and get:

$\Psi_{n+1}(t_{n+1}) = \sum_i e^{-\frac{i}{\hbar} E_i(t_{n+1})\,(t_{n+1} - t_n)}\,|i_{n+1}\rangle\langle i_{n+1}|\Psi_n(t_n)\rangle$   (8)

The time grid has to be chosen with small enough steps to capture the dynamics. To get the set $\{|i_{n+1}\rangle\}$ and its eigenvalues, one may of course diagonalize the full $H(t_{n+1})$ matrix; this has to be done in every step. A more efficient way is to use the Krylov space obtained by letting $H(t_{n+1})$ act on $\Psi_n$, i.e.

$\Psi_n,\ H(t_{n+1})\Psi_n,\ H(t_{n+1})^2\Psi_n,\ H(t_{n+1})^3\Psi_n,\ \ldots$

and to use the Lanczos algorithm to get the set $\{|i_{n+1}\rangle\}$ and its eigenvalues. We still have to do this in every time step, but now the matrix is just of the size of the Krylov space. Since we use the solution at the previous time step to construct the space, we can hope that the (small) set we obtain is adequate: that we emphasize the part of the full space spanned by $H(t_{n+1})$ that is important for the time evolution of $\Psi_n$ and neglect the less important parts.
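To make this concrete, the following Matlab lines sketch one such reduced propagation step, reusing the lanczos sketch from the Lanczos section (this is our own illustration, not the project's code; H denotes $H(t_{n+1})$, psi is $\Psi_n(t_n)$, dt is the time step, the Krylov dimension m = 15 is an arbitrary choice, and $\hbar = 1$ is assumed):

    % One time step Psi_n -> Psi_{n+1} in a small Krylov space built from Psi_n.
    m = 15;                                   % Krylov-space dimension (illustrative choice)
    [Q, T] = lanczos(H, psi, m);              % Lanczos sketch from the previous section
    [Y, D] = eig(T);                          % eigenpairs of the small tridiagonal matrix
    ritz = Q(:,1:m) * Y;                      % approximate eigenvectors |i_{n+1}>
    c = ritz' * psi;                          % <i_{n+1} | Psi_n(t_n)>
    psi_new = ritz * (exp(-1i * diag(D) * dt) .* c);   % equation (8) in the reduced space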


Results

To begin with, we apply the Arnoldi and Lanczos algorithms to the harmonic oscillator Hamiltonian with an extra potential term added:

$H = -\frac{\hbar^2}{2m}\frac{d^2}{dx^2} + \frac{1}{2} m\omega^2 x^2 + V(x) = H_0 + V(x)$   (1)

where $H_0$ is the harmonic oscillator Hamiltonian. We will start with a simple form of the extra potential:

$V(x) = C_1 e^{-C_2 x^2}$   (2)

where $C_1 = 20$ and $C_2 = 0.5$ are numerical constants.

The extra potential is thus a ‘bump’ in the middle of the harmonic oscillator potential. Below, we present the graph of the potential:

Above, we have used a grid in the interval [-7, 7] and a step size h = 0.1.
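For reference, such a discretized Hamiltonian can be set up in Matlab roughly as follows (a sketch assuming a three-point finite-difference second derivative and $m = \omega = \hbar = 1$, which the report does not state explicitly):

    % Discretized Hamiltonian on x in [-7,7], h = 0.1, with the extra 'bump' of eq. (2).
    h = 0.1;
    x = (-7:h:7)';                              % 141 grid points
    N = length(x);
    V = 0.5 * x.^2 + 20 * exp(-0.5 * x.^2);     % harmonic potential plus the bump
    lap = (diag(ones(N-1,1), 1) - 2*eye(N) + diag(ones(N-1,1), -1)) / h^2;
    H = -0.5 * lap + diag(V);                   % H = -1/2 d^2/dx^2 + V(x)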


In order to apply the Arnoldi and Lanczos algorithms to this problem, we take an initial vector q whose entries are random integers from [1, 10] and whose size is determined by the size of our initial Hamiltonian matrix; in this case q has 141 rows and 1 column. Using the flow chart of the Arnoldi algorithm described above, we obtain two matrices, $Q_k = [q_1\ \cdots\ q_k]$ and $H_k = Q_k^H A Q_k$, which is an upper Hessenberg matrix.

The eigenvalues of $H_k$ are computed with the built-in Matlab command 'eig'. These are the Ritz values, which are approximations of the eigenvalues of our initial Hamiltonian matrix. We note that when we calculate the matrix $H_k$, it has 142 rows and 141 columns; in order to use the 'eig' command we need a square matrix, hence we remove the last row. Moreover, in order to calculate the corresponding eigenvectors, we multiply the matrix $Q_k$ by the corresponding eigenvector of the matrix $H_k$.
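In Matlab, this post-processing can be sketched as follows (illustrative only; Q and H are taken to be the outputs of the Arnoldi sketch given earlier):

    % Ritz values and vectors from the Arnoldi output (Q, H).
    k  = size(H, 2);                  % number of Arnoldi steps performed
    Hk = H(1:k, 1:k);                 % remove the extra last row -> square Hessenberg matrix
    [Y, D] = eig(Hk);                 % built-in 'eig' on the small matrix
    ritz_values  = diag(D);           % approximate eigenvalues of the Hamiltonian
    ritz_vectors = Q(:,1:k) * Y;      % approximate eigenvectors of the Hamiltonian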

Arnoldi

For 40 iterations, we get a good approximation of the first eigenvalue, $\lambda_1 = 5.1822$. Hence, below we present the lowest eigenstate for our potential and compare it with the one given by the built-in Matlab command 'eig':


We notice that even though we get a good approximation for the first eigenvalue, in the middle of the spectrum the eigenvalue approximations are not as good. This can partly be explained by the fact that we get only 41 eigenvalues instead of 141. In order to have a better view of our results, the following table lists some of the eigenvalues we obtained and, for each one, the eigenvalue of the built-in Matlab command 'eig' to which it corresponds.

# state (Arnoldi)   Eigenvalue (Arnoldi)   # state ('eig')   Eigenvalue ('eig')
 1                    5.1822                  1                 5.1822
 2                    7.4789                  3                 7.4303
 3                    9.5476                  5                 9.5412
 4                   13.3676                  9                13.4966
 5                   16.0584                 12                15.3817
 6                   18.5041                 15                18.7751
15                   85.7669                 55                84.5857
19                  120.8626                 68               119.9923
28                  212.4741                 99               211.6348
41                  224.4503                104               224.8900

Hence, we can conclude that we obtain very good approximations at the low and high ends of the spectrum. This becomes clearer in the following graph, which shows the eigenvalues calculated with the Arnoldi algorithm together with those generated by the built-in Matlab command 'eig'.


Lanczos

In our next step, we follow the same procedure as described above for the Lanczos method. We start with 40 iterations. For the first eigenvalue we obtain 5.1822 (as expected), and the corresponding eigenvector is presented below:


In the following table, we compare the values obtained from our algorithm with those of the built-in Matlab command 'eig', as before:

# state (Lanczos)   Eigenvalue (Lanczos)   # state ('eig')   Eigenvalue ('eig')
 1                    5.1822                  1                 5.1822
 2                    7.4350                  3                 7.4303
 3                    9.5445                  6                 9.5417
 4                   13.4689                  9                13.4966
10                   39.0985                 33                38.9009
15                   89.9717                 57                89.7022
20                  139.6471                 75               140.6537
30                  238.1887                109               237.0860
40                  284.1597                141               284.1615


We notice that we get very good approximations for the first 3–4 eigenvalues and also for the last ones. Hence, we can conclude that when we apply this algorithm to a symmetric matrix, we obtain very good approximations of a few of the lowest eigenvalues and of many of the highest ones. Moreover, the more iterations we let the algorithm run, the better the results for the lowest and highest eigenvalues. We chose to run our algorithm for only 40 iterations since we noticed that this already gives satisfactory results. Below we present a table comparing the eigenvalues for different numbers of iterations:

# eigenvalue   Expected value   40 iter.    50 iter.    60 iter.    70 iter.
  1              5.1822          5.1822      5.1822      5.1822      5.1822
  9             13.4966         13.4689     13.5078     13.4971     13.4967
 32             37.3456         39.0985     33.7400     37.0993     36.2141
 73            134.6757        139.6471    131.8173    132.9627    135.4785
107            232.3495        229.2186    233.5147    231.1329    235.1561
141            284.1615        284.1597    284.1615    284.1615    284.1615

We notice that when we let the algorithm run for more than 50 iterations, we lose some accuracy in individual eigenvalues, but on the other hand we obtain more of them. Furthermore, we see that for the lowest and highest eigenvalues our algorithm converges very fast to the desired values; the main convergence problem lies in the middle of the spectrum. If eigenvalues are needed in the middle of the spectrum, say near a value s, the algorithm can be applied to the matrix $|A - sI|$; the eigenvalues of A closest to s then become the smallest (extreme) eigenvalues of the transformed matrix, and in this way we obtain the eigenvalues near s.

Furthermore, we present some graphs in which we vary the number of iterations and check whether the eigenvalues found by the Lanczos algorithm coincide with the ones expected from the built-in Matlab function 'eig'.

[Graphs: Lanczos eigenvalues compared with the 'eig' eigenvalues for different numbers of iterations]

From the graphs above, we verify that our (Lanczos) algorithm converges very fast to the first and last few eigenvalues. The more iterations we let the program run, the more eigenvalues we get, and as a result the eigenvalues in the middle of the spectrum also start to coincide with the expected values, albeit at a slower pace.


Time comparison between the algorithms

In assignment 3, we used the 'inverse power with shift' routine in order to calculate an eigenvalue and the corresponding eigenvector of a Hermitian matrix. Now, we compare the 'inverse power iteration with shift', Arnoldi, and Lanczos algorithms with respect to the time each one needs in order to calculate eigenvalues and eigenvectors.

Matrix size     Shifted inverse power iteration, time (s)   Arnoldi, time (s)   Lanczos, time (s)
141 x 141         0.0137                                       0.0633              0.0158
467 x 467         0.1210                                       0.0972              0.0769
701 x 701         0.1411                                       0.0951              0.0651
1401 x 1401       0.8758                                       0.3097              0.2138
2801 x 2801       6.7129                                       1.1123              1.0213
4667 x 4776      32.4801                                      14.0167              2.6940

As we can see, the Lanczos algorithm is the fastest method for calculating eigenvalues and their corresponding eigenvectors; this is clearest for large matrices. We note that in the above table we used 40 iterations for the Arnoldi algorithm, 40 iterations for the Lanczos algorithm, and 4 iterations for the inverse power iteration with shift. Moreover, with the shifted inverse power iteration we calculated only the first eigenvalue and its corresponding eigenvector in the time shown, whereas with the Arnoldi and Lanczos methods we calculated 40 eigenvalues and their corresponding eigenvectors. This is why, for a small matrix (141x141), the 'inverse power iteration with shift' appears faster than the other two methods. Hence, we conclude that the Arnoldi and Lanczos methods are even more efficient than the above table suggests.


Time evolution of our system

Our first step is to create the new potential. Since we already have the potential from the time-independent harmonic oscillator Hamiltonian and we also know the potential with the bump from assignment 4, we conclude that our potential will be given by the formula:

$V(x,t) = 0.5\,x^2 + \sin(\omega t)\,c_1 e^{-c_2 x^2}$,   $0 \le t \le \pi/2$

where $c_1 = 20$, $c_2 = 1/2$ and $\omega = 2$.

The time grid has to be chosen with small enough steps to capture the dynamics. Hence, we initially split the time interval [0, π/2] into 141 points (time step ≈ 0.01). For the x-axis grid we have used the interval [-7, 7] and a step size h = 0.1. As a result, we obtain a matrix in which each row shows how the potential at one grid point changes with time.

In our next step, we calculate the eigenvalues and eigenvectors of our Hamiltonian matrix, so we get two matrices with dimensions (141, 141). We store the matrix of the eigenvectors, and we denote the wave function at time $t_n$ by $\Psi_n(t_n)$. In order to get the next set of eigenvectors and eigenvalues, we diagonalize the full Hermitian matrix again for the next time value. Doing this iteration over the whole time grid and using the equation

$\Psi_{n+1}(t_{n+1}) = \sum_i e^{-\frac{i}{\hbar} E_i(t_{n+1})\,(t_{n+1} - t_n)}\,|i_{n+1}\rangle\langle i_{n+1}|\Psi_n(t_n)\rangle$

where $\langle i_{n+1}|\Psi_n(t_n)\rangle$ is the inner product of the currently calculated eigenvectors with the wave function of the previous step, we calculate the time evolution of our problem.
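Putting the pieces together, the whole propagation described above can be sketched in Matlab as below (our own illustration, not the project's code; H0 is the time-independent harmonic oscillator Hamiltonian matrix, Vbump the bump $c_1 e^{-c_2 x^2}$ evaluated on the x-grid, omega = 2, t the time grid, psi0 the ground state of H0, and $\hbar = 1$ is assumed):

    % Time propagation over the whole grid by full diagonalization at every step.
    psi = psi0;                                       % ground state of H0 at t = 0
    for n = 1:length(t)-1
        dt = t(n+1) - t(n);
        Ht = H0 + diag(sin(omega * t(n+1)) .* Vbump); % H(t_{n+1}) with the time-dependent bump
        [V, D] = eig(Ht);                             % set {|i_{n+1}>, E_i(t_{n+1})}
        c = V' * psi;                                 % <i_{n+1} | Psi_n(t_n)>
        psi = V * (exp(-1i * diag(D) * dt) .* c);     % equation (8)
    end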


References

1) Axel Ruhe, Topics in numerical linear algebra.

2) Eva Lindroth, Lecture notes, Computational Physics course (FK8002).

3) http://en.wikipedia.org/wiki/Lanczos_algorithm

4) http://en.wikipedia.org/wiki/Arnoldi_iteration