Page 1

Discussion on Direct Solvers
TMA4280—Introduction to Supercomputing

Based on 2016v slides by Eivind Fonn

NTNU, IMF

February 27, 2017

Page 2

The problem

— We want to solve

Ax = b,   b, x ∈ R^N, A ∈ R^{N×N},

where A is the system resulting from discretizing a Poisson problem using finite differences (see the previous slides).

— We use standard notation for matrices and vectors, i.e.

⎡ a1,1  a1,2  ···  a1,N ⎤ ⎡ x1 ⎤   ⎡ b1 ⎤
⎢ a2,1  a2,2  ···  a2,N ⎥ ⎢ x2 ⎥ = ⎢ b2 ⎥
⎢  ⋮     ⋮    ⋱    ⋮   ⎥ ⎢ ⋮  ⎥   ⎢ ⋮  ⎥
⎣ aN,1  aN,2  ···  aN,N ⎦ ⎣ xN ⎦   ⎣ bN ⎦

Page 3

Computer algorithms

— We know that the solution is given by

x = A⁻¹b.

— Explicitly forming the inverse A⁻¹ is expensive, prone to round-off errors, and something we seldom do on computers.

— Exception: very small and frequently re-used sub-problems that are known to be well conditioned.

— Matrix inversion has the same time complexity as matrix multiplication (typically O(n³)).

— Instead, we implement algorithms that solve a linear system given a specific right-hand-side vector.

— The structure and properties of the matrix A determine which algorithms we can use.

Page 4

Computer algorithms

— Trivial example: if A is known to be orthogonal, then

x = Aᵀb.

— Orthogonal matrices are the exception, and not the rule.

— We are more likely to find and exploit properties such as

• Symmetry
• Definiteness
• Sparsity
• Bandedness

— We will now consider a number of different algorithms, their implementation and usability in a parallel context.

Page 5

General matrices (the hammer)

— In introductory linear algebra we learned two ways to invert general matrices.

— First is Cramer’s rule. The solution to a linear system

Ax = b

can be found by repeated application of

xi = det Ai / det A,

where Ai is A with the i-th column replaced with b.

— Naive implementations scale as N!, which makes Cramer’s rule useless.
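A quick 2×2 instance (our example, not from the slides) makes the rule concrete:

A = ⎡ 2  1 ⎤ ,  b = ⎡  5 ⎤ ,  det A = 2·3 − 1·1 = 5,
    ⎣ 1  3 ⎦       ⎣ 10 ⎦

x1 = det ⎡  5  1 ⎤ / det A = (15 − 10)/5 = 1,
         ⎣ 10  3 ⎦

x2 = det ⎡ 2   5 ⎤ / det A = (20 − 5)/5 = 3,
         ⎣ 1  10 ⎦

and indeed 2·1 + 1·3 = 5 and 1·1 + 3·3 = 10.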

Page 6

General matrices (the hammer)

— Instead we tend to resort to Gaussian elimination.

— A systematic procedure that allows us to transform the matrix A into triangular form, which allows for easy inversion using backward substitution.

⎡ × × × × × ⎤
⎢   × × × × ⎥
⎢     × × × ⎥
⎢       × × ⎥
⎣         × ⎦

— The last equation is trivial. Solve that, plug the value into the next-to-last equation, which makes that trivial, etc.
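A minimal backward-substitution sketch in plain C (our code, not from the slides; U is upper triangular, stored row-major):

// Solve U x = b where U is upper triangular (n x n, row-major).
// Assumes nonzero diagonal entries; x may alias b.
void backward_substitution(int n, const double *U, const double *b, double *x)
{
    for (int i = n - 1; i >= 0; i--) {
        double s = b[i];
        for (int j = i + 1; j < n; j++)
            s -= U[i*n + j] * x[j];   // subtract the already-known terms
        x[i] = s / U[i*n + i];        // divide by the diagonal entry
    }
}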

Page 7

General matrices (the hammer)

— The equations are of the form

a1,1 x1 + a1,2 x2 + … + a1,N xN = b1
a2,1 x1 + a2,2 x2 + … + a2,N xN = b2
  ⋮

— We want to get rid of the term a2,1 x1. We do this by applying the row operation (row 2) − (a2,1/a1,1) · (row 1).

— This yields in the second row

0 + (a2,2 − (a2,1/a1,1) a1,2) x2 + … + (a2,N − (a2,1/a1,1) a1,N) xN = b2 − (a2,1/a1,1) b1

Page 8

General matrices (the hammer)

— We repeat this for all rows and all columns beneath the diagonal.

— Note: we need ak,k ≠ 0 when eliminating column k. If that’s not the case, we have to exchange two rows. This is called pivoting.

— To limit cancellation errors due to limited precision, we also want ak,k to be far from 0. Therefore, choose the row with the largest element. This is called partial pivoting. (Full pivoting also interchanges columns.)

— This is a procedure with relatively simple rules, which makes it suitable for implementation, as the sketch below shows.
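A compact sketch of the elimination loop with partial pivoting (plain C, row-major storage; our code, not the course’s). It reduces A in place and applies the same row operations to b; backward substitution then yields x:

#include <math.h>

// Gaussian elimination with partial pivoting: transforms A (n x n,
// row-major) to upper-triangular form in place, updating b alongside.
void gauss_eliminate(int n, double *A, double *b)
{
    for (int k = 0; k < n; k++) {
        // Partial pivoting: pick the row with the largest entry in column k.
        int p = k;
        for (int i = k + 1; i < n; i++)
            if (fabs(A[i*n + k]) > fabs(A[p*n + k]))
                p = i;
        if (p != k) {                              // exchange rows k and p
            for (int j = k; j < n; j++) {
                double t = A[k*n + j];
                A[k*n + j] = A[p*n + j];
                A[p*n + j] = t;
            }
            double t = b[k]; b[k] = b[p]; b[p] = t;
        }
        // Eliminate column k below the diagonal.
        for (int i = k + 1; i < n; i++) {
            double m = A[i*n + k] / A[k*n + k];
            A[i*n + k] = 0.0;
            for (int j = k + 1; j < n; j++)
                A[i*n + j] -= m * A[k*n + j];
            b[i] -= m * b[k];
        }
    }
}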

Page 9

General matrices (the hammer)

There are two problems with Gaussian elimination.

— It modifies the right-hand-side vector b. If we want to solve the same linear system with a different b we have to redo the whole procedure.

— It is still prone to round-off errors, even with partial pivoting.

— Therefore, the typical implementation of Gaussian elimination is in terms of the LU decomposition: we seek two matrices L and U, lower and upper triangular, such that A = LU.

— This decomposition is independent of b.

Page 10

General matrices (the hammer)

— We can then find the solution to a system

Ax = LUx = b

in this way.

— First, solve Lv = b for v. (Forward substitution.)

— Then, solve Ux = v for x. (Backward substitution.)

— Forward substitution works the same way as backward substitution. Just backward… in other words, forward.
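For completeness, a forward-substitution sketch mirroring the earlier one (our code; it assumes the Doolittle convention of a unit diagonal on L, introduced on the next slide):

// Solve L v = b where L is lower triangular with unit diagonal
// (n x n, row-major). v may alias b.
void forward_substitution(int n, const double *L, const double *b, double *v)
{
    for (int i = 0; i < n; i++) {
        double s = b[i];
        for (int j = 0; j < i; j++)
            s -= L[i*n + j] * v[j];   // subtract already-computed entries
        v[i] = s;                     // unit diagonal: no division needed
    }
}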

Page 11

General matrices (the hammer)

— The LU decomposition has redundancy: there are two sets of diagonal elements. There are two popular implementations: Doolittle’s method (unit diagonal on L) and Crout’s method (unit diagonal on U).

— Doolittle’s algorithm can be stated briefly as

u1,k = a1,k,                                      k = 1, …, N
ℓj,1 = aj,1 / u1,1,                               j = 2, …, N
uj,k = aj,k − Σ_{s=1}^{j−1} ℓj,s us,k,            k = j, …, N
ℓj,k = (1/uk,k) (aj,k − Σ_{s=1}^{k−1} ℓj,s us,k), j = k + 1, …, N
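A direct in-place transcription of these formulas (our sketch; 0-based indexing, no pivoting). U ends up in the upper triangle of A and the strictly lower part of L below it; L’s unit diagonal is implicit:

// Doolittle LU decomposition, in place, without pivoting.
void doolittle_lu(int n, double *A)
{
    for (int k = 0; k < n; k++) {
        // Row k of U: u[k][j] = a[k][j] - sum_{s<k} l[k][s] u[s][j]
        for (int j = k; j < n; j++) {
            double s = A[k*n + j];
            for (int t = 0; t < k; t++)
                s -= A[k*n + t] * A[t*n + j];
            A[k*n + j] = s;
        }
        // Column k of L: l[i][k] = (a[i][k] - sum_{s<k} l[i][s] u[s][k]) / u[k][k]
        for (int i = k + 1; i < n; i++) {
            double s = A[i*n + k];
            for (int t = 0; t < k; t++)
                s -= A[i*n + t] * A[t*n + k];
            A[i*n + k] = s / A[k*n + k];
        }
    }
}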

Page 12

LAPACK

— LU decomposition is somewhat tedious to implement, particularly in an efficient way.

— Smart people have done it for you.

— LAPACK: the Linear Algebra PACKage, which builds on BLAS.

— Same naming conventions and matrix formats.

— Just like with BLAS and CBLAS, there is a LAPACK and a CLAPACK (and LAPACKE).

— Here we will stick to LAPACK (Fortran numbering).

Page 13

LAPACK

Function prototype:

void dgesv(const int *n, const int *nrhs,
           double *A, const int *lda, int *ipiv,
           double *B, const int *ldb, int *info);

Usage:

void lusolve(Matrix A, Vector x)
{
    int *ipiv = malloc(x->len * sizeof(int));
    int one = 1, info;
    dgesv(&x->len, &one, A->data[0], &x->len,
          ipiv, x->data, &x->len, &info);
    free(ipiv);
}

Page 14

LAPACK

— dgesv overwrites the matrix A with the factorization L, U.

— dgesv actually calls two functions:

• dgetrf to compute the decomposition,
• dgetrs to solve the system.

— Therefore we can solve for different right-hand sides after the first call, by calling dgetrs ourselves.

— It would be a mistake to call dgesv more than once!
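A sketch of that reuse pattern in the style of the previous slide (dgetrf and dgetrs are the actual LAPACK routines; the wrapper functions and types are our assumptions):

void dgetrf(const int *m, const int *n, double *A,
            const int *lda, int *ipiv, int *info);
void dgetrs(const char *trans, const int *n, const int *nrhs,
            double *A, const int *lda, const int *ipiv,
            double *B, const int *ldb, int *info);

// Factorize once...
void lufactor(int n, Matrix A, int *ipiv)
{
    int info;
    dgetrf(&n, &n, A->data[0], &n, ipiv, &info);
}

// ...then solve as many right-hand sides as we like.
void lusolve_again(Matrix A, const int *ipiv, Vector b)
{
    char trans = 'N';   // solve Ax = b with the stored L and U
    int one = 1, info;
    dgetrs(&trans, &b->len, &one, A->data[0], &b->len,
           ipiv, b->data, &b->len, &info);
}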

Page 15

Improving performance

— While LAPACK is efficient, it cannot do wonders.

— LU decomposition has the same time complexity as matrix multiplication: O(N³). It is asymptotically as bad as matrix inversion (although in realistic terms much better).

— Even if A is sparse, in general L and U aren’t. (Although for banded matrices they actually are.)

Page 16

Symmetry

— If A is symmetric and positive definite, we find that U = Lᵀ.— Thus we can save a factor of two in memory and in floating point

operations.— No pivoting is required for such systems.— This is termed Cholesky factorization.— Note that all conditions are satisfied by the Poisson matrices.

Page 17

LAPACK

Function prototype:

void dposv(char *uplo, const int *n,
           const int *nrhs, double *A,
           const int *lda, double *B,
           const int *ldb, int *info);

Usage:

void llsolve(Matrix A, Vector x)
{
    int one = 1, info;
    char uplo = 'L';
    dposv(&uplo, &x->len, &one, A->data[0],
          &x->len, x->data, &x->len, &info);
}

Page 18

Improving performance

— LU decomposition is unfit for parallel implementation:

• It is sequential in nature (you must eliminate row 2 before row 3).
• Pivoting requires synchronization for every row.
• The substitution phase is completely sequential (but this is not so important, since it is O(N²)).
• Smart people have tried to make it work by identifying independent blocks in the matrix.

— For most systems the results are mediocre, with limited scalability.

— ⇒ LU decomposition is unusable in a parallel context.

— However, it is useful as a component in other algorithms.

Page 19

Better approaches

— Solution methods can be categorized in two classes.

— Direct methods yield the solution in a predictable number of operations. Cramer’s rule, Gaussian elimination and LU decomposition are good examples of direct methods.

— Iterative methods converge to the solution through some iterative procedure with an unpredictable number of operations.

— Let us now consider another example of a direct method.

Page 20

Diagonalization

— This is not a hammer. We must exploit known properties of the matrix A.

— We recall the five-point stencil:

         −1
    −1   +4   −1
         −1

— This results from a double application of the three-point stencil in two directions:

    −1   +2   −1

Page 21

Diagonalization

— In the following, the matrix A results from the five-point stencil (2D problem), while the matrix T results from the three-point stencil (1D problem).

— For a symmetric positive definite matrix C, we know that we can perform an eigendecomposition

CQ = QΛ

where Q is the matrix of eigenvectors (as columns) and Λ is the diagonal matrix of eigenvalues.

— Since C is SPD, Q is orthogonal, so

C = QΛQᵀ.

Page 22

Diagonalization

— Using this in the linear system, we get

Cx = b  ⟹  QΛQᵀx = b.

— Multiply from the left by Qᵀ, recalling that Qᵀ = Q⁻¹:

Λ(Qᵀx) = Qᵀb,   i.e.   Λx̃ = b̃,   with x̃ = Qᵀx and b̃ = Qᵀb.

— This reduces the problem to three relatively easy steps.

Page 23

Diagonalization

1. Calculate b̃ with a matrix-vector product

b̃ = Qᵀb

in O(N²) operations.

2. Solve the system

Λx̃ = b̃

in O(N) operations (Λ is diagonal).

3. Calculate x with another matrix-vector product

x = Qx̃

in O(N²) operations.
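The three steps in plain C (our sketch; Q is n×n, row-major, with the eigenvectors as columns, and lambda holds the eigenvalues):

// Solve C x = b given the eigendecomposition C = Q Lambda Q^T.
// work is a scratch buffer of length n.
void eigensolve(int n, const double *Q, const double *lambda,
                const double *b, double *x, double *work)
{
    for (int i = 0; i < n; i++) {      // step 1: work = Q^T b, O(n^2)
        double s = 0.0;
        for (int j = 0; j < n; j++)
            s += Q[j*n + i] * b[j];    // column i of Q dotted with b
        work[i] = s;
    }
    for (int i = 0; i < n; i++)        // step 2: divide by eigenvalues, O(n)
        work[i] /= lambda[i];
    for (int i = 0; i < n; i++) {      // step 3: x = Q work, O(n^2)
        double s = 0.0;
        for (int j = 0; j < n; j++)
            s += Q[i*n + j] * work[j];
        x[i] = s;
    }
}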

Page 24

Diagonalization

— This looks promising: it seems we have found a way to solve the system in O(N²) operations instead of O(N³)!

— Unfortunately this is not true, as the computation of the eigendecomposition itself (Q and Λ) requires O(N³) operations.

— However, in certain cases we can still exploit this method.

Page 25

Diagonalization

— We constructed the matrix A by applying the three-point stencil in two directions.

— In the language of linear algebra, this translates to a tensor product:

A = I ⊗ T + T ⊗ I

— The linear system

(I ⊗ T + T ⊗ I) x = b

in a global numbering scheme can equivalently be stated as

TX + XTᵀ = B

in a local numbering scheme. Here, the unknown X and the right-hand side B are matrices.

Page 26

Diagonalization

— A global numbering scheme is a scheme where we number the unknowns using a single index. This naturally maps to a vector.

— A local numbering scheme is a scheme where we number the unknowns using one index for each spatial direction. This naturally maps to a matrix (in 2D) or, more generally, a tensor.

— Thus, we consider in this case a system of matrix equations.

— Alternative way of thinking about it: we operate with T along the columns of X, and then along the rows:

TX (along columns) + (TXᵀ)ᵀ (along rows) = TX + XTᵀ

Page 27

Diagonalization

— Now let us diagonalize T, recalling that it is SPD:

TX + XTᵀ = B  ⇒  QΛQᵀX + XQΛQᵀ = B.

— Multiply with Q from the right,

QΛQᵀXQ + XQΛ = BQ,

— and by Qᵀ from the left,

Λ(QᵀXQ) + (QᵀXQ)Λ = QᵀBQ,

or, writing X̃ = QᵀXQ and B̃ = QᵀBQ,

ΛX̃ + X̃Λ = B̃.

Page 28

Diagonalization

We find the solution in three steps.

1. Calculate B̃ with two matrix-matrix products

B̃ = QᵀBQ

in O(n³) operations.

2. Solve the system

x̃i,j = b̃i,j / (λi + λj)

in O(n²) operations.

3. Recover the solution with two matrix-matrix products

X = QX̃Qᵀ

in O(n³) operations.
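In code, the three steps are two dgemm calls, a pointwise division, and two more dgemm calls. A sketch using CBLAS (our code; all matrices n×n, row-major, W is a scratch buffer, and B is overwritten with the solution X):

#include <cblas.h>

void tensor_solve(int n, const double *Q, const double *lambda,
                  double *B, double *W)
{
    // Step 1: B~ = Q^T B Q, O(n^3)
    cblas_dgemm(CblasRowMajor, CblasTrans, CblasNoTrans,
                n, n, n, 1.0, Q, n, B, n, 0.0, W, n);   // W = Q^T B
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0, W, n, Q, n, 0.0, B, n);   // B~ = W Q
    // Step 2: x~_{i,j} = b~_{i,j} / (lambda_i + lambda_j), O(n^2)
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            B[i*n + j] /= lambda[i] + lambda[j];
    // Step 3: X = Q X~ Q^T, O(n^3)
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n, 1.0, Q, n, B, n, 0.0, W, n);   // W = Q X~
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasTrans,
                n, n, n, 1.0, W, n, Q, n, 0.0, B, n);   // X = W Q^T
}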

Page 29

Diagonalization

— Note that n = √N in two dimensions.

— That gives O(N^{3/2}) operations in total, and a space complexity of O(N). A very fast algorithm!

— Computing the eigendecomposition is also O(N^{3/2}). Since the matrix is smaller, this no longer dominates the running time.

— It is parallelizable: a few large, global data exchanges are needed (transposing matrices). Multiplying matrices can be done with relatively little communication. (More on this in the big project.)

— LU decomposition would require O(N³) operations and O(N²) storage. Banded LU decomposition would require O(N²) operations and O(N^{3/2}) storage.

Page 30

Diagonalization

— The diagonalization method is quite general, applicable to any SPD system with a tensor-product operator.

— We used Poisson to make it more easily understandable (I hope).

— It turns out that we can do even better by exploiting even more structure in the Poisson problem.

Page 31

Diagonalization

— The continuous eigenvalue problem

−u_xx = λu  in Ω = (0, 1),   u(0) = u(1) = 0

has solutions for j = 1, 2, …:

uj(x) = sin(jπx),   λj = j²π²

— We now consider operating with T on vectors consisting of uj sampled at the grid points, i.e.

q^(j)_i = uj(xi) = sin(ijπ/n)

Page 32

Diagonalization

— This yields

(T q^(j))_i = 2(1 − cos(jπ/n)) · sin(ijπ/n) = λj · q^(j)_i,

so q^(j) is an eigenvector of T with eigenvalue λj = 2(1 − cos(jπ/n)).

— Thus, the eigenvectors of T are the eigenfunctions of −∂²/∂x² sampled at the grid points. (Lucky!)

— Let us normalize, qjᵀqj = 1, to obtain the matrix Q:

Qi,j = √(2/n) sin(ijπ/n),   Λj,j = 2(1 − cos(jπ/n))
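Since Q and Λ are known in closed form, no numerical eigensolver is needed. A sketch that tabulates them (our code; the 1-based formulas are stored 0-based, so Q is (n−1)×(n−1)):

#include <math.h>

// Fill Q (eigenvectors of T as columns) and lambda (eigenvalues).
void build_eigen(int n, double *Q, double *lambda)
{
    for (int j = 1; j < n; j++) {
        lambda[j-1] = 2.0 * (1.0 - cos(j * M_PI / n));
        for (int i = 1; i < n; i++)
            Q[(i-1)*(n-1) + (j-1)] = sqrt(2.0 / n) * sin(i * j * M_PI / n);
    }
}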

Page 33

Discrete Fourier Transform

— Now for something completely different. (But not really. It’s a magic trick.)

— Consider a periodic function v(x) with period 2π.

— Sample this function at equidistant points xj, j = 0, 1, …, n.

— As with the finite difference grid, xj = jh where h = 2π/n.

— We name the samples vj = v(xj).

(Figure: a sampled periodic signal on the grid x0, x1, …, xn.)

Page 34

Discrete Fourier Transform

— Consider the vectors ϕk, where

(ϕk)j = e^{ikxj},

where

e^{ikxj} = cos(kxj) + i sin(kxj).

— These vectors form a basis for the complex n-dimensional space C^n. In particular, they are orthogonal:

ϕkᴴ ϕℓ = n if k = ℓ, and 0 if k ≠ ℓ,   k, ℓ = 0, 1, …, n − 1.

Page 35

Discrete Fourier Transform

— Since they form a basis, any vector in the space can be expressed as a linear combination of these vectors.

— The vector

v = (v0 · · · v_{n−1})ᵀ ∈ R^n

can be expressed as

v = Σ_{k=0}^{n−1} v̂k ϕk,

or element by element,

vj = Σ_{k=0}^{n−1} v̂k (ϕk)j = Σ_{k=0}^{n−1} v̂k e^{ikxj}.

Page 36

Discrete Fourier Transform

— Here the v̂k are the discrete Fourier coefficients, which are given by the inverse relation. (Since the ϕk are orthogonal, this is easy to compute.)

v̂k = (1/n) Σ_{j=0}^{n−1} vj e^{−ikxj}.

— The Fourier transform is extremely useful and has been extensively studied.

— It is useful in, among other things, signal analysis, and audio and video compression.

— Important property: the DFT coefficients of an odd, real signal are purely imaginary.

Page 37

Discrete Sine Transform

— A transform closely related to the DFT is the discrete sine transform, the DST.

— It applies to a function v(x), periodic with period 2π, and odd; that is, v(x) = −v(−x).

— We discretize this function on an equidistant mesh between 0 and π. (The values between π and 2π aren’t needed because it is odd.)

— Since v is odd, we also know that v(x0) = v(xn) = 0. (These are our Poisson boundary conditions.)

— Thus the discrete function is represented by the vector

v = (v1 · · · v_{n−1})ᵀ ∈ R^{n−1}.

Page 38

Discrete Sine Transform

— As a basis for this space, let us use the sines

(ψk)j = sin(kjπ/n),   j = 1, …, n − 1,

and note that

ψkᵀ ψℓ = n/2 if k = ℓ, and 0 if k ≠ ℓ.

— Thus

vj = Σ_{k=1}^{n−1} ṽk sin(kjπ/n) = (S⁻¹ṽ)j,

and

ṽk = (2/n) Σ_{j=1}^{n−1} vj sin(jkπ/n) = (Sv)k.
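Stated directly as code, the transform pair looks like this (a naive O(n²) sketch of S and S⁻¹; the names are ours):

#include <math.h>

// vt[k-1] = (2/n) sum_j v[j-1] sin(jk pi/n), k = 1..n-1    (S v)
void dst(int n, const double *v, double *vt)
{
    for (int k = 1; k < n; k++) {
        double s = 0.0;
        for (int j = 1; j < n; j++)
            s += v[j-1] * sin(j * k * M_PI / n);
        vt[k-1] = 2.0 * s / n;
    }
}

// v[j-1] = sum_k vt[k-1] sin(kj pi/n), j = 1..n-1          (S^{-1} vt)
void dst_inv(int n, const double *vt, double *v)
{
    for (int j = 1; j < n; j++) {
        double s = 0.0;
        for (int k = 1; k < n; k++)
            s += vt[k-1] * sin(k * j * M_PI / n);
        v[j-1] = s;
    }
}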

Page 39

Tying it all together

— In particular, we have

Q = √(n/2) S,   Qᵀ = √(2/n) S⁻¹.

Therefore we can compute a matrix-vector product involving Q or Qᵀ by using the DST.

— How to compute the DST quickly? Consider a vector

v = (v1 · · · v_{n−1})ᵀ ∈ R^{n−1}.

Construct the odd extended vector

w = (0, v1, …, v_{n−1}, 0, −v_{n−1}, …, −v1)ᵀ ∈ R^{2n}.

Page 40

Tying it all together

— It turns out that the DST coefficients of v and the DFT coefficients of w are related by

ṽk = 2i ŵk,   k = 1, …, n − 1.

— Thus, we can find the DST by computing the DFT of the extended vector, multiplying the first n − 1 coefficients (after the constant mode) by 2i, and throwing away the rest of them.

— This is good because there are very fast algorithms for computing the DFT: the infamous Fast Fourier Transform (FFT).

— One FFT is O(n log n), so a matrix-matrix product using the FFT is O(n² log n).

Page 41

Tying it all together

We find the solution in three steps.

1. Calculate B̃ with two matrix-matrix products,

B̃ᵀ = S⁻¹(SB)ᵀ,

in O(n² log n) operations.

2. Solve the system

x̃i,j = b̃i,j / (λi + λj)

in O(n²) operations.

3. Recover the solution with two matrix-matrix products,

X = S⁻¹(SX̃ᵀ)ᵀ,

in O(n² log n) operations.
