
These notes are based on a lecture course given by the author in the summer semester of 2005 for postgraduate students at the University of Leipzig/Max-Planck-Institute for Mathematics in the Sciences. The purpose of this course was to provide an introduction to modern methods of data-sparse representation of integral and more general nonlocal operators, based on the use of the Kronecker tensor-product decomposition.

In recent years multifactor analysis has been recognised as a powerful (and really indispensable) tool to represent multi-dimensional data arising in various applications. Well known for three decades in chemometrics, psychometrics, statistics, signal processing, data mining and complexity theory, this tool has now also become attractive in numerical PDEs, many-particle calculations, and in solving integral equations.

Our goal is to introduce the main mathematical ideas and principles which allow effective representation of some classes of high-dimensional operators in the Kronecker tensor-product form, as well as rigorous analysis of the arising approximations. Low Kronecker-rank representation of operators not only relaxes the “curse of dimensionality”, but also provides efficient numerical methods of sub-linear complexity to approximate 2D- and 3D-problems.

Leipzig, July 2005.


Everything should be made as simple as possible, but not simpler.

A. Einstein (1879-1955)

An Introduction to Structured Tensor-Product Representation of Discrete Nonlocal Operators

Part I: Approximation Tools

Boris N. Khoromskij

University of Leipzig/MPI MIS, summer 2005

http://personal-homepages.mis.mpg.de/bokh

http://www.mis.mpg.de/scicomp/Fulltext/Khoromskij

Outline of the Lecture Course B. Khoromskij, Leipzig 2005(L1) 2

1. Ubiquitous data-sparse matrix arithmetics; look on Fourier kingdom.

2. Celebrated sampling theorem; Sinc interpolation and quadratures.

3. Introduction to wavelet techniques.

4. Separable approximation to multi-variate functions in $\mathbb{R}^d$.

5. Kronecker-product decomposition of high-dimensional tensors. Combination with H-matrix, FFT- and FWT-based formats.

6. Hierarchical Kronecker-product (HKT) representation to multi-dimensional integral operators $Au = \int_{\mathbb{R}^d} g(\cdot, y)\,u(y)\,dy$.

7. Structured representation to matrix-valued functions with application to $A^{-1}$, $\sqrt{A}$, $\mathrm{sign}(A)$.

8. Truncated iteration: convergence and truncation error analysis.

9. HKT approximation to matrix-valued functions $A^{-1}$, $\sqrt{A}$, $\mathrm{sign}(A)$.

10. Application to the Hartree-Fock and Boltzmann equations.

Lect. 1. Ubiquitous data-sparse matrix arithm.; Fourier kingdom.

Basic physical models are described by nonlocal data transfer.

In large scale applications the algebraic operations on high-dimensional,

densely populated matrices/tensors require huge computational resources.

Standard methods suffer from the “curse of dimensionality” (R. Bellman).

Examples of (discrete) nonlocal operators:

1. Multi-dimensional integral operators in $\mathbb{R}^d$

2. Elliptic/parabolic solution operators (e.g., financial PDEs)

3. Lyapunov/Riccati matrix equations in control theory

4. Density matrix calculation for many-particle systems

5. Deterministic Boltzmann equation in $\mathbb{R}^3$ (dilute gas)

6. Ornstein-Zernike integral equation in $\mathbb{R}^3$ (theory of disordered matter)

7. Chemometric, psychometric, stochastic models ...

Huge problems: special methods vs. super-computers

Complexity of standard matrix operations:

$N_{Stor} \approx N_{A\cdot v} = O(N^2)$ for the storage/MVM of a fully populated matrix $A \in \mathbb{R}^{N\times N}$; besides, $N_{A^{-1}} \approx N_{A\cdot B} \approx N_{L\cdot U} = O(N^3)$.

A paradigm of up-to-date numerical simulations: the faster the computer, the better the asymptotic complexity required of the algorithm (speed increases in proportion to memory).

In low dimensions ($d \le 3$) the goal is $O(N)$-methods.

Basic principles: making use of hierarchical structures, low-rank patterns and recursive algorithms.

In the multi-dimensional perspective $O(N)$ is not enough because of the “curse of dimensionality”: $N = n^d$ ($3 \cdot 10^{22}$ molecules in 1 cm$^3$ of water).

The challenge is to develop $O(n)$-algorithms!

Main ideas: tensor-product data structures + H-matrix formats.
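The growth $N = n^d$ can be made concrete with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the storage count $r\,d\,n^2$ for a Kronecker representation with $r$ terms of $d$ factors of size $n \times n$ is an assumption introduced here for comparison, not a figure taken from the slides.

```python
def dense_vector_size(n: int, d: int) -> int:
    """Number of entries of a vector in R^N with N = n^d."""
    return n ** d


def dense_matrix_size(n: int, d: int) -> int:
    """Number of entries of a fully populated N x N matrix, N = n^d."""
    return dense_vector_size(n, d) ** 2


def kronecker_matrix_size(n: int, d: int, r: int) -> int:
    """Assumed storage of r Kronecker terms, each with d factors of size n x n."""
    return r * d * n * n


if __name__ == "__main__":
    n, r = 100, 10
    for d in (1, 2, 3):
        print(d, dense_vector_size(n, d), dense_matrix_size(n, d),
              kronecker_matrix_size(n, d, r))
```

For fixed $n$ and $r$ the dense counts grow exponentially in $d$, while the assumed Kronecker count grows only linearly in $d$.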

Old and new ideas or what we are going to discuss

Based on recursions via hierarchical structures:

Classical Fourier (1768-1830) methods, FFT in $O(N \log N)$ op.

Circulant convolution, Toeplitz, Hankel matrices.

Multiresolution representation via wavelets, FWT in $O(N)$ op.

Data and matrix compression in $O(N)$ op.

Multigrid methods: $O(N)$ elliptic problem solvers.

Domain decomposition: $O(N/p)$ parallel algorithms.

Panel clustering, fast multipole, H-matrix in $O(q^d N \log^\beta N)$ op. Well suited for integral (nonlocal) operators in FEM/BEM.

Based on tensor-product data organization:

Kronecker tensor-product (KT) representation in $\mathbb{R}^N$, $N = n^d$ (multiway decomposition): $O(n^q \log^\beta n)$, $q = q(d)$ fixed.

Combination of KT formats with H-matrix, wavelet or FFT-based structures: $O(n \log^\beta n)$ op.

Alternative directions: Compress the input data

• High order methods: hp-FEM/BEM, spectral methods, bcFEM (Khoromskij, Melenk), Richardson extrapolation.

• Adaptive mesh refinement: a priori/a posteriori strategies.

• Best N-term nonlinear approximation (wavelet/FEM).

• Dimension reduction: boundary/interface equations, Schur complement methods.

• Combination of tensor-product basis with anisotropic adaptivity: hyperbolic cross approximation by FEM/wavelets, sparse grids.

• Model reduction: multi-scale, homogenization, genetic algorithms, neural networks.

• Monte-Carlo methods (e.g., random walk dynamics).

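Fourier kingdom. Fourier transform in $L^1(\mathbb{R})$

Continuous Fourier transform (S.G. Mallat):

$$\hat f(\omega) := \int_{\mathbb{R}} f(t)\, e^{-i\omega t}\, dt.$$

If $f \in L^1(\mathbb{R})$ then $\hat f \in C^0(\mathbb{R})$ and $|\hat f(\omega)| \le \int_{\mathbb{R}} |f(t)|\,dt < +\infty$.

If $f, \hat f \in L^1(\mathbb{R})$ then the inverse Fourier transform is given by

$$f(t) := \frac{1}{2\pi} \int_{\mathbb{R}} \hat f(\omega)\, e^{i\omega t}\, d\omega.$$

Let $f, h \in L^1(\mathbb{R})$. The convolution

$$g(t) = (f * h)(t) := \int_{\mathbb{R}} f(t-u)\,h(u)\,du$$

then satisfies

$$g(t) = \frac{1}{2\pi} \int_{\mathbb{R}} \hat g(\omega)\, e^{i\omega t}\, d\omega \in L^1(\mathbb{R}) \quad \text{with} \quad \hat g(\omega) = \hat h(\omega)\,\hat f(\omega).$$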

Important features of the Fourier transform

Each frequency $e^{i\omega t}$ is amplified by a factor $\hat h(\omega)$. Hence a convolution is called a frequency filtering with the transfer function $\hat h$ of the filter $h$.

Important relations between $f(t)$ and its FT $\hat f(\omega)$:

Inverse: $\hat f(t) \iff 2\pi f(-\omega)$

Convolution: $(h * f)(t) \iff \hat h(\omega)\,\hat f(\omega)$

Multiplication: $h(t)\,f(t) \iff \frac{1}{2\pi}\,(\hat h * \hat f)(\omega)$

Translation: $f(t-u) \iff e^{-iu\omega}\,\hat f(\omega)$

Modulation: $e^{i\nu t} f(t) \iff \hat f(\omega - \nu)$

Scaling: $f(t/s) \iff |s|\,\hat f(s\omega)$

Time derivatives: $f^{(p)}(t) \iff (i\omega)^p\,\hat f(\omega)$

Frequency derivatives: $(-it)^p f(t) \iff \hat f^{(p)}(\omega)$

Complex conjugate: $f^*(t) \iff \hat f^*(-\omega)$

Hermitian symmetry: $f(t) \in \mathbb{R} \iff \hat f(-\omega) = \hat f^*(\omega)$.

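Fourier transform in $L^2(\mathbb{R})$

The inner product of $f, h \in L^2(\mathbb{R})$ and the $L^2(\mathbb{R})$-norm:

$$\langle f, h\rangle = \int_{\mathbb{R}} f(t)\,h^*(t)\,dt, \qquad \|f\|^2 = \langle f, f\rangle = \int_{\mathbb{R}} |f(t)|^2\,dt.$$

Let $f, h \in L^1(\mathbb{R}) \cap L^2(\mathbb{R})$. The Parseval and Plancherel formulas read, respectively, as

$$\langle f, h\rangle = \frac{1}{2\pi}\int_{\mathbb{R}} \hat f(\omega)\,\hat h^*(\omega)\,d\omega, \qquad \|f\|^2 = \frac{1}{2\pi}\int_{\mathbb{R}} |\hat f(\omega)|^2\,d\omega.$$

The global regularity of $f(t)$ can be controlled by the decay rate of $|\hat f(\omega)|$, i.e.,

$$|f^{(k)}(t)| \le \frac{1}{2\pi}\int_{\mathbb{R}} |\hat f(\omega)|\,|\omega|^k\,d\omega, \qquad k = 0, 1, ...$$

and $f^{(k)}$ is continuous, if the corresponding integrals converge.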

Examples of FT (I)

Example 1.1. For a Dirac $\delta$ (tempered distribution) concentrated at the origin $t = 0$, i.e., $\int_{\mathbb{R}} \delta(t)\,f(t)\,dt = f(0)$,

$$\hat\delta(\omega) = \int_{\mathbb{R}} \delta(t)\,e^{-i\omega t}\,dt = 1 \quad \text{(formal representation)}.$$

Example 1.2. The FT of the characteristic (indicator, step) function

$$f(t) = \chi_{[-T,T]}(t) = \begin{cases} 1 & \text{if } t \in [-T, T], \\ 0 & \text{otherwise} \end{cases}$$

is

$$\hat f(\omega) = \int_{-T}^{T} e^{-i\omega t}\,dt = \frac{2\sin(T\omega)}{\omega} \notin L^1(\mathbb{R}) \quad \text{(not integrable)}.$$

Example 1.3. An ideal low-pass filter has a transfer function $\hat h = \chi_{[-\xi,\xi]}(\omega)$, thus its inverse FT (impulse response) is

$$h(t) = \frac{1}{2\pi}\int_{-\xi}^{\xi} e^{i\omega t}\,d\omega = \frac{\sin(\xi t)}{\pi t}.$$

With $\xi = \pi$, we obtain the classical sinc-function.

Examples of FT (I)

[Figure 1: Haar (indicator) and Sinc scaling functions. Left panel: Haar scaling function; right panel: Sinc function.]

The functions $\chi_{[-\pi,\pi]}(t)$ (cf. Haar scaling function) and $\mathrm{sinc}(t)$ have complementary (in fact, opposite) features in the time and frequency (Fourier) domains.

Numerous wavelet families realize certain compromise

between these two “extreme cases”.

Examples of FT (II)

Example 1.4. The FT of a translated Dirac $\delta_\tau(t) = \delta(t - \tau)$ is calculated by evaluating $e^{-i\omega t}$ at $t = \tau$:

$$\hat\delta_\tau(\omega) = \int_{\mathbb{R}} \delta(t-\tau)\,e^{-i\omega t}\,dt = e^{-i\omega\tau}.$$

For the Dirac comb $c(t) = \sum_{n=-\infty}^{\infty} \delta(t - nT)$ we have

$$\hat c(\omega) = \sum_{n=-\infty}^{\infty} e^{-inT\omega}.$$

Example 1.5. The FT of a Gaussian $f(t) = \exp(-t^2) \in C^\infty$ is also a Gaussian:

$$\hat f(\omega) = \sqrt{\pi}\,\exp(-\omega^2/4).$$

We readily get $2\hat f'(\omega) + \omega \hat f(\omega) = 0$, which proves the statement.
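The Gaussian transform pair of Example 1.5 can be checked numerically. The following is a minimal sketch, assuming a plain trapezoidal rule on a truncated interval; the function names, the interval half-width and the step size are choices made here for illustration only.

```python
import cmath
import math


def fourier_transform(f, omega, half_width=10.0, step=0.01):
    """Approximate int_R f(t) exp(-i*omega*t) dt by the trapezoidal rule
    on [-half_width, half_width] (the tails are assumed negligible)."""
    n = int(round(half_width / step))
    total = 0j
    for k in range(-n, n + 1):
        t = k * step
        total += f(t) * cmath.exp(-1j * omega * t)
    return total * step


def gaussian(t):
    return math.exp(-t * t)


def gaussian_hat(omega):
    """Closed form from Example 1.5: sqrt(pi) * exp(-omega^2 / 4)."""
    return math.sqrt(math.pi) * math.exp(-omega * omega / 4.0)


if __name__ == "__main__":
    for omega in (0.0, 1.0, 2.5):
        print(omega, fourier_transform(gaussian, omega).real, gaussian_hat(omega))
```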

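Fourier series of 2π-periodic functions

Denote by $L^2[-\pi, \pi]$ the Hilbert space of $2\pi$-periodic functions with the inner product and norm

$$\langle f, h\rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\omega)\,h^*(\omega)\,d\omega, \qquad \|f\|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} |f(\omega)|^2\,d\omega.$$

Thm. 1.1. The family of functions $\{e^{-ik\omega}\}_{k\in\mathbb{Z}}$ is an orthonormal basis of $L^2[-\pi, \pi]$.

Let $l^p(\mathbb{Z})$ be the space of complex-valued sequences $\{f[k]\}_{k\in\mathbb{Z}}$ such that $\sum_{k=-\infty}^{\infty} |f[k]|^p < +\infty$. Thm. 1.1 proves that if $f \in l^2(\mathbb{Z})$, the Fourier series

$$\hat f(\omega) = \sum_{k=-\infty}^{\infty} f[k]\,e^{-i\omega k}, \quad \text{with} \quad f[k] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \hat f(\omega)\,e^{i\omega k}\,d\omega,$$

is the decomposition of $\hat f \in L^2[-\pi, \pi]$ in the orthogonal Fourier basis.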

Discrete Fourier transform

Let $S_N$ be the space of finite sequences $\{f[n]\}_{0\le n<N}$ of period $N$. $S_N$ is a Euclidean space with the inner product

$$\langle f, g\rangle = \sum_{n=0}^{N-1} f[n]\,g^*[n].$$

Thm. 1.2. The family $\left\{ e_k[n] = \exp\left(\frac{2i\pi kn}{N}\right) \right\}_{0\le k<N}$ is an orthogonal basis of $S_N$ with $\|e_k\|^2 = N$. Any $f \in S_N$ can be represented by

$$f = \sum_{k=0}^{N-1} \frac{\langle f, e_k\rangle}{\|e_k\|^2}\,e_k. \qquad (1)$$

Def. 1.1. The discrete Fourier transform (DFT) of $f$ is

$$\hat f[k] := \langle f, e_k\rangle = \sum_{n=0}^{N-1} f[n]\,\exp\left(\frac{-2i\pi kn}{N}\right) \qquad (N^2 \text{ complex multiplications}).$$

Due to (1) an inverse DFT is given by

$$f[n] := \frac{1}{N}\sum_{k=0}^{N-1} \hat f[k]\,\exp\left(\frac{2i\pi kn}{N}\right).$$
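Def. 1.1 and the inverse formula translate directly into code. The sketch below is a naive $O(N^2)$ implementation written for illustration (the names `dft` and `idft` and the sample signal are assumptions made here); it is useful mainly as a reference against which fast algorithms can be compared.

```python
import cmath


def dft(f):
    """Naive DFT: hat_f[k] = sum_n f[n] * exp(-2*pi*i*k*n/N)."""
    N = len(f)
    return [sum(f[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(f_hat):
    """Inverse DFT: f[n] = (1/N) * sum_k hat_f[k] * exp(+2*pi*i*k*n/N)."""
    N = len(f_hat)
    return [sum(f_hat[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


signal = [1.0, 2.0, 0.0, -1.0]
spectrum = dft(signal)
recovered = idft(spectrum)

if __name__ == "__main__":
    print(spectrum)
    print(recovered)
```

Since $\|e_k\|^2 = N$, the discrete analogue of the Plancherel formula reads $\sum_n |f[n]|^2 = \frac{1}{N}\sum_k |\hat f[k]|^2$, which the sample signal also satisfies.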

Fast Fourier transform: Outlook

FFT: hierarchical recursive algorithm

The fast Fourier transform (FFT) can be traced back (1805) to Gauss (1777-1855). First computer program: Cooley/Tukey (1965).

The idea of the FFT is to split the unknown Fourier coefficients $\hat f[k]$, $k = 0, ..., N-1$, into the odd and even parts.

Let $N = 2^q$. This allows the use of recursion: a problem of dimension $N = 2^q$ (level $q$) is transformed into two problems of dimension $N/2 = 2^{q-1}$ (level $q-1$) plus $O(N)$ operations, etc., until it is reduced to $N = 2^q$ problems of dimension 1 (level 0).

Since the cost per level is $O(N)$ and the number of levels is $q = \log_2 N$, this results in the linear-logarithmic complexity $O(N \log_2 N) \sim C_F N \log_2 N$ with a small constant $C_F \sim 4$.

FFT: sketch of the algorithm

When the frequency index is even, we group the terms $n$ and $n + N/2$:

$$\hat f[2k] = \sum_{n=0}^{N/2-1} \left(f[n] + f[n + N/2]\right) \exp\left(\frac{-2i\pi kn}{N/2}\right).$$

When the frequency index is odd, we have

$$\hat f[2k+1] = \sum_{n=0}^{N/2-1} \exp\left(\frac{-2i\pi n}{N}\right)\left(f[n] - f[n + N/2]\right) \exp\left(\frac{-2i\pi kn}{N/2}\right).$$

The first equation shows that the even frequencies are obtained by calculating the DFT of the $N/2$-periodic signal

$$f_e[n] = f[n] + f[n + N/2],$$

while the second equation implies that the odd frequencies can be computed by the DFT of the diagonally scaled $N/2$-periodic signal

$$f_o[n] = \exp\left(\frac{-2i\pi n}{N}\right)\left(f[n] - f[n + N/2]\right).$$
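The even/odd splitting above can be sketched as a short recursive routine. This is a minimal illustration, assuming the length is a power of two; the names `fft_dif` and `dft_naive` are chosen here and the naive transform is included only as a reference for comparison.

```python
import cmath


def dft_naive(f):
    """Reference O(N^2) DFT, used only to check the fast version."""
    N = len(f)
    return [sum(f[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def fft_dif(f):
    """Recursive FFT following the splitting into f_e and f_o.

    Even frequencies are the DFT of f_e[n] = f[n] + f[n + N/2];
    odd frequencies are the DFT of f_o[n] = exp(-2*pi*i*n/N) * (f[n] - f[n + N/2]).
    """
    N = len(f)
    if N & (N - 1):
        raise ValueError("length must be a power of two")
    if N == 1:
        return [complex(f[0])]
    half = N // 2
    f_e = [f[n] + f[n + half] for n in range(half)]
    f_o = [cmath.exp(-2j * cmath.pi * n / N) * (f[n] - f[n + half])
           for n in range(half)]
    out = [0j] * N
    out[0::2] = fft_dif(f_e)   # hat_f[2k]
    out[1::2] = fft_dif(f_o)   # hat_f[2k + 1]
    return out


sample = [1.0, -2.0, 3.5, 0.0, 4.0, 1.0, -1.0, 2.0]
fast = fft_dif(sample)
slow = dft_naive(sample)

if __name__ == "__main__":
    print(max(abs(a - b) for a, b in zip(fast, slow)))
```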

FFT: Matrix representation

The FT matrix $F_N = \{f_{k,n}\}_{k,n=0}^{N-1}$ is given by

$$f_{k,n} := \exp\left(\frac{-2i\pi kn}{N}\right) = W^{-nk}, \qquad W = e^{2i\pi/N}.$$

The FFT recursion connects the $N$-point transform to two copies of the $N/2$-point transform:

$$F_N = \begin{pmatrix} I_{N/2} & D_{N/2} \\ I_{N/2} & -D_{N/2} \end{pmatrix} \begin{pmatrix} F_{N/2} & 0 \\ 0 & F_{N/2} \end{pmatrix} \begin{pmatrix} \text{even} \\ \text{odd} \end{pmatrix}.$$

$I_{N/2}$ is the identity matrix, $D_{N/2}$ is the diagonal matrix with diagonal entries $1, W^{-1}, ..., W^{-(N/2-1)}$. The permutation matrix at the end splits the input vector into its “even” and its “odd” part.

Finally, the FFT algorithm keeps going, recursively:

$$F_N \to F_{N/2} \to ... \to F_1.$$

FFT: complexity, inverse transform

The DFT(N) may be calculated with two DFT(N/2) plus $C_F N$ operations to compute $f_e[n]$ and $f_o[n]$, $n = 0, ..., N/2-1$. We obtain the recursion

$$N_{FFT}(N) = 2N_{FFT}(N/2) + C_F N \quad \text{with} \quad N_{FFT}(1) = 0.$$

Setting $N = 2^q$, $q \in \mathbb{N}$, and introducing $Q(q) = N_{FFT}(N)/N$, we get

$$Q(q) = Q(q-1) + C_F \quad \text{with} \quad Q(0) = 0,$$

which implies $Q(q) = C_F\,q$. Hence $N_{FFT}(N) = C_F N \log_2 N$.

The inverse FFT of $\hat f$ can be derived from the forward FFT of its complex conjugate $\hat f^*$ due to

$$f^*[n] := \frac{1}{N}\sum_{k=0}^{N-1} \hat f^*[k]\,\exp\left(\frac{-2i\pi kn}{N}\right).$$
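Both statements on this slide are easy to check in a few lines. The sketch below evaluates the cost recursion with the assumed constant $C_F = 4$ and computes an inverse transform by conjugating before and after a forward transform; a naive DFT stands in for the forward FFT to keep the example short, and all names are illustrative.

```python
import cmath


def fft_cost(n, c_f=4):
    """Operation count from N_FFT(N) = 2*N_FFT(N/2) + C_F*N, N_FFT(1) = 0."""
    if n == 1:
        return 0
    return 2 * fft_cost(n // 2, c_f) + c_f * n


def forward(f):
    """Forward transform (naive DFT used here in place of an FFT)."""
    N = len(f)
    return [sum(f[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def inverse_via_conjugate(f_hat):
    """f = conj( forward(conj(f_hat)) ) / N, as in the formula for f^*[n]."""
    N = len(f_hat)
    conj_result = forward([v.conjugate() for v in f_hat])
    return [v.conjugate() / N for v in conj_result]


data = [1.0 + 2.0j, -1.0j, 3.0, 0.5 - 0.5j]
round_trip = inverse_via_conjugate(forward(data))

if __name__ == "__main__":
    for q in range(6):
        print(2 ** q, fft_cost(2 ** q), 4 * 2 ** q * q)
    print(round_trip)
```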

FFT: fast discrete convolution

Let $g$ be the discrete convolution of two signals $f, h$ supported only by the indices $0 \le n \le M-1$,

$$g[n] = (f * h)[n] = \sum_{k=-\infty}^{\infty} f[k]\,h[n-k].$$

The naive implementation requires $M(M+1)$ operations.

It can be represented as a matrix-by-vector product (MVP) with the Toeplitz matrix

$$T = \{h[n-k]\}_{0\le n,k<M} \in \mathbb{R}^{M\times M}, \qquad g = Tf.$$

Extending $f$ and $h$ by $M$ additional samples,

$$h[M] = 0, \quad h[2M-i] = h[i], \quad i = 1, ..., M-1,$$

$$f[n] = 0, \quad n = M, ..., 2M-1,$$

we reduce the problem to the MVP with a circulant matrix $C \in \mathbb{R}^{2M\times 2M}$ specified by the first row $h \in \mathbb{R}^{2M}$.
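The relation between the convolution sum and the Toeplitz product $g = Tf$ can be illustrated directly. The sketch below assumes both signals vanish outside $0 \le n \le M-1$ and compares the first $M$ output samples of the full convolution with the Toeplitz matrix-vector product; it does not implement the circulant embedding itself, and the helper names are hypothetical.

```python
def convolve_direct(f, h):
    """Full convolution g[n] = sum_k f[k] h[n-k] for n = 0, ..., 2M-2."""
    M = len(f)
    g = [0.0] * (2 * M - 1)
    for k in range(M):
        for m in range(M):
            g[k + m] += f[k] * h[m]
    return g


def toeplitz_matvec(f, h):
    """g = T f with T = {h[n-k]}, 0 <= n, k < M, and h[i] = 0 for i < 0."""
    M = len(f)

    def h_at(i):
        return h[i] if 0 <= i < M else 0.0

    return [sum(h_at(n - k) * f[k] for k in range(M)) for n in range(M)]


f_signal = [1.0, 2.0, 3.0]
h_signal = [1.0, -1.0, 2.0]
full = convolve_direct(f_signal, h_signal)
first_block = toeplitz_matvec(f_signal, h_signal)

if __name__ == "__main__":
    print(full)
    print(first_block)
```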

FFT: circulant convolution

An $n \times n$ matrix $C$ is called circulant if it has the form

$$C = \mathrm{circ}\{c_1, \ldots, c_n\} := \begin{pmatrix} c_1 & c_2 & \ldots & c_n \\ c_n & c_1 & \ldots & c_{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ c_2 & \ldots & c_n & c_1 \end{pmatrix}, \qquad c_i \in \mathbb{C}.$$

The set of all $n \times n$ circulant matrices is closed with respect to addition and multiplication by a constant.

Any circulant matrix $C$ is associated with the polynomial

$$p_c(z) := c_1 + c_2 z + \ldots + c_n z^{n-1}, \qquad z \in \mathbb{C}.$$

FFT: circulant convolution

The matrix $C$ has a diagonal representation in the Fourier basis,

$$C = F_n^{T} \Lambda_c F_n \quad \text{with} \quad \Lambda_c = \mathrm{diag}\{p_c(1), \ldots, p_c(\omega^{n-1})\}, \qquad \omega = e^{2i\pi/n}.$$

The eigenvector corresponding to the eigenvalue $p_c(\omega^{j-1})$ is given by the $j$th column of $F_n$, i.e.,

$$\omega_j = \frac{1}{\sqrt{n}}\left(\omega^{(k-1)(j-1)}\right)_{k=1,\ldots,n}.$$

The matrix-vector product with $C$ costs $2C_F\,n \log_2 n + O(n)$ op.

The multi-dimensional FFT can be performed by a tensorization process with the linear-logarithmic cost $O(N \log_2 N)$, $N = n^d$.
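The diagonalization can be used to multiply by a circulant matrix without forming it. The sketch below is a minimal illustration, assuming $n$ is a power of two and the first-row convention $C_{ij} = c_{(j-i) \bmod n}$ (0-based) of the definition above; the eigenvalues $p_c(\omega^k)$ with $\omega = e^{2i\pi/n}$ are obtained from one FFT of the first row. The function names are chosen here for illustration.

```python
import cmath


def fft(x):
    """Recursive radix-2 FFT (even/odd input splitting); len(x) a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(x_hat):
    """Inverse FFT via conjugation of the forward transform."""
    n = len(x_hat)
    return [v.conjugate() / n for v in fft([u.conjugate() for u in x_hat])]


def circulant_matvec(c, x):
    """y = C x for C = circ{c}, using lambda_k = p_c(omega^k) = sum_l c[l] omega^(l k)."""
    n = len(c)
    # sum_l c[l] * exp(+2*pi*i*l*k/n) equals conj(FFT(conj(c)))[k]
    lam = [v.conjugate() for v in fft([complex(u).conjugate() for u in c])]
    x_hat = fft(x)
    return ifft([lam[k] * x_hat[k] for k in range(n)])


def circulant_dense_matvec(c, x):
    """Reference product with the explicitly formed circulant matrix."""
    n = len(c)
    return [sum(c[(j - i) % n] * x[j] for j in range(n)) for i in range(n)]


first_row = [1.0, 2.0, 3.0, 4.0, 0.0, -1.0, 0.5, 2.5]
vector = [1.0, 0.0, -1.0, 2.0, 3.0, 1.0, 0.0, -2.0]
fast_product = circulant_matvec(first_row, vector)
dense_product = circulant_dense_matvec(first_row, vector)

if __name__ == "__main__":
    print([round(v.real, 10) for v in fast_product])
    print(dense_product)
```

The fast product uses three transforms of length $n$; if the eigenvalues are precomputed, two transforms remain, in line with the cost $2C_F\,n\log_2 n + O(n)$ stated above.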

Literature to Lecture 1

1. S.G. Mallat: A Wavelet Tour of Signal Processing. Academic Press, San Diego, 1999.

2. W. Hackbusch: Hierarchische Matrizen - Algorithmen und Analysis. Vorlesungsmanuskript, Leipzig 2004.

3. W. Hackbusch and B.N. Khoromskij: Hierarchical Kronecker Tensor-Product Approximation to a Class of Nonlocal Operators in High Dimensions. Preprint 16, MPI MIS, Leipzig 2004.

4. B.N. Khoromskij: Data-sparse approximation of nonlocal operators. Lecture notes 17, MPI MIS, Leipzig 2003.

URL: http://personal-homepages.mis.mpg.de/bokh

http://www.mis.mpg.de/scicomp/Fulltext/Khoromskij/khor1.ps

Lecture 2. Sampling Theorem, Sinc Approximation

How to discretise analog signals?

The class of functions $f(t)$, $t \in \mathbb{R}$ (analog signals) can be discretized by recording their sample values $\{f(nh)\}_{n\in\mathbb{Z}}$ at intervals $h > 0$.

V.A. Kotelnikov (1933) and J. Whittaker (1935) proved a celebrated theorem: band-limited signals can be exactly reconstructed from their sample values.

The sinc function (also called the Cardinal function) is given by

$$\mathrm{sinc}(x) := \frac{\sin(\pi x)}{\pi x} \quad \text{with the convention } \mathrm{sinc}(0) = 1.$$

Thm. 2.1. (Kotelnikov, Shannon, Whittaker) If the support of $\hat f$ is included in $[-\pi/h, \pi/h]$ then

$$f(t) = \sum_{n=-\infty}^{\infty} f(nh)\,S_{n,h}(t), \quad t \in \mathbb{R}, \qquad (2)$$

where $S_{n,h}(t) = \mathrm{sinc}(t/h - n)$.
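Formula (2) can be tried out on a concrete band-limited signal. The sketch below is an illustration under stated assumptions: the test function $\mathrm{sinc}^2(t)$ (whose Fourier transform is supported in $[-2\pi, 2\pi]$), the step $h = 0.4$ (so that $\pi/h > 2\pi$) and the truncation of the series to $|n| \le 200$ are all choices made here, and the truncation introduces a small error that the exact theorem does not have.

```python
import math


def sinc(x):
    """sinc(x) = sin(pi x) / (pi x) with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def band_limited(t):
    """Test signal sinc(t)^2; its Fourier transform vanishes outside [-2*pi, 2*pi]."""
    return sinc(t) ** 2


def sinc_reconstruct(f, h, t, n_max):
    """Truncated version of (2): sum over |n| <= n_max of f(n h) * sinc(t/h - n)."""
    return sum(f(n * h) * sinc(t / h - n) for n in range(-n_max, n_max + 1))


STEP = 0.4
POINTS = (0.3, 1.7, -2.25)
errors = [abs(sinc_reconstruct(band_limited, STEP, t, 200) - band_limited(t))
          for t in POINTS]

if __name__ == "__main__":
    print(errors)
```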

Sampling Theorem

[Figure 2: Haar (cf. $\hat f$ for $f = \mathrm{sinc}$) and Sinc scaling functions. Right panel: Sinc function.]

The sampling theorem plays an important role in tele/radio communications, signal processing, stochastic models, etc.

The class of band-limited functions has a direct characterisation, namely, it is the Paley-Wiener space $W(\pi/h)$ of entire functions of exponential type (see later on).

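Proof of Sampling Theorem (I)

Preliminaries to the proof.

(a) The Poisson formula is (in the sense of distributions)

$$\hat c(\omega) = \sum_{n=-\infty}^{\infty} e^{-inh\omega} = \frac{2\pi}{h}\sum_{k=-\infty}^{\infty} \delta\left(\omega - \frac{2k\pi}{h}\right). \qquad (3)$$

Recall that $\hat c(\omega) = \sum_{n=-\infty}^{\infty} e^{-inh\omega}$ is the FT of the Dirac comb $c(t) = \sum_{n=-\infty}^{\infty} \delta(t - nh)$ (cf. Ex. 1.4).

Since $\hat c$ is $\frac{2\pi}{h}$-periodic, it suffices to prove that $\hat c|_{[-\pi/h,\pi/h]} = \frac{2\pi}{h}\,\delta$.

(b) To any sample $f(nh)$ we associate a Dirac and introduce the weighted Dirac sum $f_d(t) := \sum_{n=-\infty}^{\infty} f(nh)\,\delta(t - nh)$. Since the FT of $\delta(t - nh)$ is $e^{-inh\omega}$, we obtain $\hat f_d(\omega) = \sum_{n=-\infty}^{\infty} f(nh)\,e^{-inh\omega}$.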

Proof of Sampling Theorem (II)

(c) Now $f(t)$ can be computed from the sample values $f(nh)$ due to the simple relation between the FTs $\hat f_d$ and $\hat f$ as follows.

Lem. 2.2. The FT of $f_d$ is given by

$$\hat f_d(\omega) = \frac{1}{h}\sum_{k=-\infty}^{\infty} \hat f\left(\omega - \frac{2k\pi}{h}\right).$$

Proof. $f(nh)\,\delta(t - nh) = f(t)\,\delta(t - nh)$ implies

$$f_d(t) := f(t)\sum_{n=-\infty}^{\infty} \delta(t - nh) \equiv f(t)\,c(t).$$

Computing the FTs

$$\hat f_d(\omega) = \frac{1}{2\pi}\,(\hat f * \hat c)(\omega) \qquad (4)$$

we apply the Poisson formula (3) to represent $\hat c(\omega)$.

Since $\hat f * \delta(\omega - \xi) = \hat f(\omega - \xi)$, inserting the above formula into (4) proves Lem. 2.2.

Proof of Sampling Theorem (III)

Proof of Sampling Theorem.

If $n \ne 0$, the support of $\hat f(\omega - 2n\pi/h)$ does not intersect the support of $\hat f(\omega)$ since $\hat f(\omega) = 0$ for $|\omega| > \pi/h$. Thus Lem. 2.2 implies

$$\hat f_d(\omega) = \frac{\hat f(\omega)}{h} \quad \text{if } |\omega| \le \frac{\pi}{h}.$$

Recall that the FT of $S_{0,h}(t) = \mathrm{sinc}(t/h)$ is $\hat S_{0,h} = h\,\chi_{[-\pi/h,\pi/h]}$.

Since $\mathrm{supp}(\hat f) \subset [-\pi/h, \pi/h]$, the previous relation results in

$$\hat f(\omega) = \hat S_{0,h}(\omega)\,\hat f_d(\omega).$$

The inverse FT of this equation, that is $f(t) = (S_{0,h} * f_d)(t)$, leads to the required result (since $S_{n,h}(t) = S_{0,h}(t - nh)$):

$$f(t) = S_{0,h} * \sum_{n=-\infty}^{\infty} f(nh)\,\delta(t - nh) = \sum_{n=-\infty}^{\infty} f(nh)\,S_{0,h}(t - nh).$$

Generalised sampling theorem

Sampling Thm. as a decomposition in an orthogonal basis.

Define the space $U_h$ as the set of functions whose FTs have a support included in $[-\pi/h, \pi/h]$.

Lem. 2.2. The set of functions $\{S_{n,h}(t)\}_{n\in\mathbb{Z}}$ is an orthogonal basis of the space $U_h$. If $f \in U_h$ then

$$f(nh) = \frac{1}{h}\,\langle f(t), S_{n,h}(t)\rangle.$$

Cor. 2.3. The sinc-interpolation formula of Thm. 2.1 can be interpreted as a decomposition of $f \in U_h$ in an orthogonal basis of $U_h$:

$$f(t) = \frac{1}{h}\sum_{n=-\infty}^{\infty} \langle f(\cdot), S_{n,h}(\cdot)\rangle\,S_{n,h}(t).$$

If $f \notin U_h$, one finds the orthogonal projection of $f$ onto $U_h$.

Proof of Lemma 2.2

Use the Sampling Theorem and the Parseval formula.

Recall that $\hat S_{0,h} = h\,\chi_{[-\pi/h,\pi/h]}$ and apply the Parseval formula:

$$\langle S_{n,h}(t), S_{m,h}(t)\rangle = \frac{1}{2\pi}\int_{\mathbb{R}} h^2\,\chi_{[-\pi/h,\pi/h]}(\omega)\,e^{-inh\omega}\,e^{imh\omega}\,d\omega = \frac{h^2}{2\pi}\int_{-\pi/h}^{\pi/h} e^{-i(n-m)h\omega}\,d\omega = h\,\delta[n-m].$$

Hence, $\{S_{n,h}(t)\}_{n\in\mathbb{Z}}$ is an orthogonal family. Since $S_{n,h}(t) \in U_h$, Thm. 2.1 implies that any $f \in U_h$ can be represented as a linear combination of $\{S_{n,h}(t)\}_{n\in\mathbb{Z}}$, i.e., the latter is an orthogonal basis of $U_h$.

To verify the second assertion, we again apply the Parseval formula to obtain

$$\langle f(t), S_{n,h}(t)\rangle = \frac{h}{2\pi}\int_{-\pi/h}^{\pi/h} \hat f(\omega)\,e^{inh\omega}\,d\omega = h\,f(nh).$$

Sinc-interpolation of entire functions

When does the Sinc interpolant represent a function exactly?

$$C(f, h)(x) = \sum_{k=-\infty}^{\infty} f(kh)\,S_{k,h}(x).$$

Def. 2.5. Let $h > 0$, and let $W(\pi/h)$ denote the family of entire functions such that $\int_{\mathbb{R}} |f(t)|^2\,dt < \infty$, and such that for all $z \in \mathbb{C}$

$$|f(z)| \le C e^{\pi|z|/h} \quad \text{with a constant } C > 0.$$

Thm. 2.4. (Stenger) $\{h^{-1/2} S_{k,h}(x)\}_{k\in\mathbb{Z}}$ is a complete $L^2(\mathbb{R})$-orthonormal sequence in $W(\pi/h)$. Every $f \in W(\pi/h)$ has the cardinal series representation

$$f(x) = C(f, h)(x), \quad x \in \mathbb{R}.$$

Proof: Consequence of the classical Paley-Wiener Theorem.

Sinc-approximation of analytic functions

The interpolant $C(f, h)$ provides an incredibly accurate approximation on $\mathbb{R}$ for functions that are analytic and uniformly bounded on the strip

$$D_\delta := \{z \in \mathbb{C} : |\Im m\, z| \le \delta\}, \quad 0 < \delta < \frac{\pi}{2},$$

such that

$$N(f, D_\delta) := \int_{\mathbb{R}} \left(|f(x + i\delta)| + |f(x - i\delta)|\right) dx < \infty.$$

This defines the Hardy space $H^1(D_\delta)$.

For functions $f \in H^1(D_\delta)$

$$\sup_{x\in\mathbb{R}} |f(x) - C(f, h)(x)| = O(e^{-\pi\delta/h}), \quad h \to 0. \qquad (5)$$

Sinc-approximation of analytic functions

Likewise, if $f \in H^1(D_\delta)$, the integral

$$I(f) = \int_{\Omega} f(x)\,dx \quad (\Omega = \mathbb{R} \text{ or } \Omega = \mathbb{R}_+) \qquad (6)$$

can be approximated with exponential convergence by the Sinc-quadrature

$$T(f, h) := h\sum_{k=-\infty}^{\infty} f(kh) \quad \left(= \int_{\mathbb{R}} C(f, h)(x)\,dx \approx I(f)\right),$$

$$|I(f) - T(f, h)| = O(e^{-\pi\delta/h}), \quad h \to 0. \qquad (7)$$

Analogous estimates hold for the (computable) truncated sums

$$C_M(f, h) = \sum_{k=-M}^{M} f(kh)\,S_{k,h}(x), \qquad T_M(f, h) = h\sum_{k=-M}^{M} f(kh).$$

Standard error estimates

Thm. 2.5. (Stenger) If $f \in H^1(D_\delta)$ and $|f(x)| \le C\exp(-b|x|)$ for all $x \in \mathbb{R}$ with $b, C > 0$, then

$$\|f - C_M(f, h)\|_\infty \le C\left[\frac{e^{-\pi\delta/h}}{2\pi\delta}\,N(f, D_\delta) + \frac{1}{bh}\,e^{-bhM}\right], \qquad (8)$$

$$|I(f) - T_M(f, h)| \le C\left[\frac{e^{-2\pi\delta/h}}{1 - e^{-2\pi\delta/h}}\,N(f, D_\delta) + \frac{1}{b}\,e^{-bhM}\right]. \qquad (9)$$

Proof: The first term on the rhs of (8) represents the approximation error (5),

$$\|f(x) - C(f, h)(x)\|_\infty \le \frac{N(f, D_\delta)}{2\pi\delta\,\sinh(\pi\delta/h)},$$

while the second one gives the truncation error

$$\|C(f, h)(x) - C_M(f, h)(x)\|_\infty \le \sum_{|k|\ge M+1} |f(kh)| \le 2C\sum_{k=M+1}^{\infty} e^{-bkh} \le \frac{2C}{bh}\,e^{-bhM}.$$

Exponential convergence rate

Similar arguments apply to (9).

For the interpolation error (8), the choice

$$h = \sqrt{\pi\delta/(bM)}$$

implies the exponential convergence rate

$$\|f - C_M(f, h)\|_\infty \le C M^{1/2}\,e^{-\sqrt{\pi\delta bM}}. \qquad (10)$$

In fact, for the chosen $h$, the first term in the right-hand side in (8) dominates, hence (10) follows. Usually we set $\delta = \pi/2$.

For the quadrature error (9), the choice

$$h = \sqrt{2\pi\delta/(bM)}$$

yields

$$|I(f) - T_M(f, h)| \le C\,e^{-\sqrt{2\pi\delta bM}}. \qquad (11)$$
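The quadrature estimate (11) can be observed on a simple example. The sketch below is an illustration under assumptions made here: the integrand $f(x) = 1/\cosh(x)$ with $\int_{\mathbb{R}} f = \pi$, the decay constant $b = 1$, and the strip parameter $\delta = 1.5$ (slightly below $\pi/2$, where $f$ has poles).

```python
import math


def sinc_quadrature(f, h, M):
    """Truncated Sinc quadrature T_M(f, h) = h * sum_{k=-M}^{M} f(k h)."""
    return h * sum(f(k * h) for k in range(-M, M + 1))


def quadrature_step(M, delta, b):
    """Step size h = sqrt(2*pi*delta / (b*M)) used for the estimate (11)."""
    return math.sqrt(2.0 * math.pi * delta / (b * M))


def integrand(x):
    return 1.0 / math.cosh(x)


EXACT = math.pi      # integral of 1/cosh over the real line
DELTA, B = 1.5, 1.0  # assumed strip half-width and decay rate


def quadrature_error(M):
    h = quadrature_step(M, DELTA, B)
    return abs(sinc_quadrature(integrand, h, M) - EXACT)


if __name__ == "__main__":
    for M in (4, 16, 64):
        print(M, quadrature_error(M), math.exp(-math.sqrt(2 * math.pi * DELTA * B * M)))
```

The printed second column can be compared with the rate $e^{-\sqrt{2\pi\delta bM}}$ in the third; the two agree up to a moderate constant.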

Error bounds in the case of double-exponential decay

If $f$ has a double-exponential decay as $|x| \to \infty$, i.e.,

$$|f(x)| \le C\exp(-b e^{a|x|}) \quad \text{for all } x \in \mathbb{R} \text{ with } a, b, C > 0, \qquad (12)$$

the convergence rate of Sinc-interpolation and quadrature can be improved up to $O(e^{-cM/\log M})$ (cf. Thm. 2.5).

Thm. 2.6. (Gavrilyuk, Hackbusch, Khoromskij) Let $f \in H^1(D_\delta)$ with some $\delta < \frac{\pi}{2}$, and let (12) hold. Then the choice $h = \log\left(\frac{2\pi aM}{b}\right)/(aM)$ leads to the quadrature error

$$|I - T_M(f, h)| \le C\,N(f, D_\delta)\,e^{-2\pi\delta aM/\log(2\pi aM/b)}. \qquad (13)$$

The choice $h = \log\left(\frac{\pi aM}{b}\right)/(aM)$ implies for the interpolation error

$$\|f - C_M(f, h)\|_\infty \le C\,\frac{N(f, D_\delta)}{2\pi\delta}\,e^{-\pi\delta aM/\log(\pi aM/b)}. \qquad (14)$$


Proof. The quadrature error has the bound

$$|I - T_M(f, h)| \le C\left[\frac{e^{-2\pi\delta/h}}{1 - e^{-2\pi\delta/h}}\,N(f, D_\delta) + \frac{e^{-ahM}}{ab}\,\exp(-b e^{ahM})\right].$$

In fact, the bound for $|I - T(f, h)|$ is the same as in Thm. 2.5. For the remainder sum we use the simple estimate

$$\sum_{k:\,|k|>M} \exp(-b e^{a|kh|}) = 2\sum_{k=M+1}^{\infty} \exp(-b e^{akh}) \le 2\int_{M}^{\infty} \exp(-b e^{axh})\,dx \le \frac{2e^{-ahM}}{abh}\,\exp(-b e^{ahM}).$$

Now (13) follows.


The interpolation error of $C_M(f, h)$ satisfies

$$\|f - C_M(f, h)\|_\infty \le C\left[\frac{e^{-\pi\delta/h}}{2\pi\delta}\,N(f, D_\delta) + \frac{e^{-ahM}}{abh}\,\exp(-b e^{ahM})\right].$$

Again, the approximation error allows the same estimate as in the standard case. The truncation error bound is determined by the decay rate of $f$ as $|x| \to \infty$,

$$\|C(f, h)(x) - C_M(f, h)(x)\|_\infty \le \sum_{|k|\ge M+1} |f(kh)| \le 2C\sum_{k=M+1}^{\infty} e^{-b e^{akh}} \le \frac{2C}{abh\,e^{ahM}}\,e^{-b e^{ahM}},$$

which proves (14).
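The effect of double-exponential decay on the quadrature can be seen numerically. The sketch below is an illustration under assumptions made here: the integrand $f(x) = \exp(-\cosh x)$, which satisfies (12) with $a = 1$, $b = 1/2$, $C = 1$, and the reference value $2K_0(1) \approx 0.8420488765$ of its integral over $\mathbb{R}$ ($K_0$ denotes the modified Bessel function; the numerical constant is quoted here, not derived).

```python
import math


def sinc_quadrature(f, h, M):
    """Truncated Sinc quadrature T_M(f, h) = h * sum_{k=-M}^{M} f(k h)."""
    return h * sum(f(k * h) for k in range(-M, M + 1))


def de_step(M, a, b):
    """Step size h = log(2*pi*a*M/b) / (a*M) from Thm. 2.6 (quadrature case)."""
    return math.log(2.0 * math.pi * a * M / b) / (a * M)


def integrand(x):
    """exp(-cosh x) <= exp(-0.5 * exp(|x|)), i.e. (12) with a = 1, b = 1/2."""
    return math.exp(-math.cosh(x))


A, B = 1.0, 0.5
REFERENCE = 0.8420488765  # assumed value of 2*K_0(1), rounded to 10 digits


def de_quadrature(M):
    return sinc_quadrature(integrand, de_step(M, A, B), M)


if __name__ == "__main__":
    for M in (4, 8, 16, 32):
        print(M, de_quadrature(M), abs(de_quadrature(M) - REFERENCE))
```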

Sinc-interpolation on (a, b) via Thm. 2.5

To apply Thm. 2.5 in the case $\Omega = (a, b)$ (say, $\Omega = \mathbb{R}_+$) one has to substitute the variable $x \in \Omega$ by $x = \varphi(\zeta)$ such that $\varphi: \mathbb{R} \to (a, b)$ is a bijection. This changes $f: (a, b) \to \mathbb{R}$ into

$f_1 := \varphi' \cdot (f \circ \varphi): \mathbb{R} \to \mathbb{R}$ (quadrature),

$f_1 := f \circ \varphi$ (interpolation).

Assuming $f_1 \in H^1(D_\delta)$, one can apply (10)-(11) to the transformed function.

Ex. 2.1. In the case of an interval $(a, b)$:

$$\varphi^{-1}(z) = \log[(z - a)/(b - z)], \qquad \Re e\, z = x.$$

Ex. 2.2. In the case of the semi-axis $\mathbb{R}_+ := (0, \infty)$:

$$\varphi^{-1}(z) = \log[\sinh(z)] \quad \text{or} \quad \varphi^{-1}(z) = \log(z).$$

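Sinc quadratures on $\mathbb{R}_+$

Polynomial decay. Let us set $\Omega = \mathbb{R}_+$ and assume:

(i) $f$ can be analytically extended from $\mathbb{R}_+$ into the sector

$$D^{(1)}_\delta = \{z \in \mathbb{C} : |\arg(z)| < \delta\} \quad \text{for some } 0 < \delta < \pi/2, \qquad (15)$$

(actually, $\varphi^{-1}: D^{(1)}_\delta \to D_\delta$ is the conformal map),

(ii) $f$ satisfies the inequality

$$|f(z)| \le c\,|z|^{\alpha-1}(1 + |z|)^{-\alpha-\beta} \quad \text{for some } 0 < \alpha, \beta \le 1 \text{ and } \forall z \in D^{(1)}_\delta.$$

Let $\alpha = 1$. Choosing any $M \in \mathbb{N}$ and taking

$$h^{(1)} = \sqrt{2\pi\delta/(\beta M)}, \qquad (16)$$

we define the corresponding quadrature rule (with $\varphi(\zeta) = e^\zeta$)

$$I^{(1)}_M = h^{(1)}\sum_{k=-\beta M}^{M} \kappa^{(1)}_k f(z^{(1)}_k), \qquad z^{(1)}_k = e^{kh^{(1)}}, \quad \kappa^{(1)}_k = e^{kh^{(1)}},$$

possessing the exponential convergence rate

$$\left|I - I^{(1)}_M\right| \le C\,e^{-\sqrt{2\pi\delta\beta M}} \qquad (17)$$

with a positive constant $C$ independent of $M$.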
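The rule $I^{(1)}_M$ is short enough to try directly. The sketch below is an illustration under assumptions made here: the integrand $f(x) = (1+x)^{-2}$ on $\mathbb{R}_+$ (so $\alpha = \beta = 1$ and $I = 1$), the sector angle $\delta = 1.5$, and rounding $\beta M$ to the nearest integer for the lower summation limit.

```python
import math


def quadrature_rplus(f, M, delta, beta=1.0):
    """I_M^(1) = h * sum_{k=-beta*M}^{M} exp(k h) * f(exp(k h)),
    with h = sqrt(2*pi*delta / (beta*M)) and the substitution x = exp(zeta)."""
    h = math.sqrt(2.0 * math.pi * delta / (beta * M))
    lower = -int(round(beta * M))
    total = 0.0
    for k in range(lower, M + 1):
        z_k = math.exp(k * h)   # quadrature point z_k
        total += z_k * f(z_k)   # weight kappa_k = exp(k h) equals z_k
    return h * total


def integrand(x):
    """f(x) = 1 / (1 + x)^2, whose integral over (0, infinity) equals 1."""
    return 1.0 / (1.0 + x) ** 2


DELTA = 1.5  # assumed sector opening, below pi/2

if __name__ == "__main__":
    for M in (4, 16, 64):
        value = quadrature_rplus(integrand, M, DELTA)
        print(M, value, abs(value - 1.0))
```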

[Figure 3: The analyticity sector $D^{(1)}_\delta$ (left) and the “bullet-shaped” domain $D^{(2)}_\delta$.]


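Exponential decay. Assume that the integrand $f$ can be analytically extended into the “bullet-shaped” domain

$$D^{(2)}_\delta = \{z \in \mathbb{C} : |\arg(\sinh z)| < \delta\}, \quad 0 < \delta < \pi/2,$$

and that $f$ satisfies

$$|f(z)| \le C\left(\frac{|z|}{1 + |z|}\right)^{\alpha-1} e^{-\beta\,\Re e\, z} \quad \text{in } D^{(2)}_\delta, \quad \alpha, \beta \in (0, 1]. \qquad (18)$$

Setting $\alpha = 1$ and choosing $h^{(2)} = h^{(1)}$, $\kappa^{(2)}_k = \left(1 + e^{-2kh^{(2)}}\right)^{-1/2}$ (the derivative of the substitution $\zeta \mapsto \log[e^{\zeta} + \sqrt{1 + e^{2\zeta}}]$ at $\zeta = kh^{(2)}$) and $M \in \mathbb{N}$, we obtain the quadrature

$$I^{(2)}_M = h^{(2)}\sum_{k=-\beta M}^{M} \kappa^{(2)}_k f(z^{(2)}_k), \qquad z^{(2)}_k = \log\left[e^{kh^{(2)}} + \sqrt{1 + e^{2kh^{(2)}}}\right],$$

possessing again the exponential convergence rate.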

Improved Sinc-interpolation on (a, b) via Thm. 2.6

For applications in FEM/BEM, we reformulate the result of Thm. 2.6 for parameter dependent functions $g(x, y)$, $y \in Y \subset \mathbb{R}^m$, defined on the reference interval $x \in (0, 1]$. Introduce the mapping

$$\zeta \in D_\delta \to \phi(\zeta) = \frac{1}{\cosh(\sinh(\zeta))}, \quad \delta < \frac{\pi}{2}. \qquad (19)$$

Clearly, $(0, 1] = \phi(\mathbb{R})$ and, also, $\phi(\zeta)$ decays twice exponentially,

$$|\phi(\zeta)| \le 2\exp\left(-\frac{\cos\delta}{2}\,e^{|\Re e\,\zeta|}\right), \quad \zeta \in D_\delta.$$

In particular, we have $|\phi(\zeta)| \le 2\exp(-\frac{1}{2}e^{|\zeta|})$, $\zeta \in \mathbb{R}$. Let $D_\phi(\delta) := \{\phi(\zeta) : \zeta \in D_\delta\} \supset (0, 1]$ be the image of $D_\delta$. One checks easily that $D_\phi(\delta) \subset S_r(0)\setminus\{0\}$, where $S_r(0)$ is the disc around zero with a radius $r > 1$.


Hence, if a function $g$ is holomorphic on $D_\phi(\delta)$, then

$$f(\zeta) := \phi^\alpha(\zeta)\,g(\phi(\zeta)) \quad \text{for any } \alpha > 0$$

is also holomorphic on $D_\delta$. Now the Sinc interpolation

$$C_M(f(\cdot, y), h)(\zeta) = \sum_{k=-M}^{M} f(kh, y)\,S_{k,h}(\zeta)$$

with the back-transformation $\zeta = \phi^{-1}(x) = \mathrm{arsinh}(\mathrm{arcosh}(\frac{1}{x}))$ and multiplication by $x^{-\alpha}$ yields the separable approximation

$$g_M(x, y) := \sum_{k=-M}^{M} \frac{\phi(kh)^\alpha}{x^\alpha}\,g(\phi(kh), y)\,S_{k,h}(\phi^{-1}(x)) \approx g(x, y) \qquad (20)$$

of $g(x, y)$ for $x \in (0, 1] = \phi(\mathbb{R})$ and $y \in Y$. Since $\phi(\zeta)$ is an even function, the separation rank in (20) is reduced to $r = M + 1$.
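Formula (20) can be evaluated directly for a concrete kernel. The sketch below is an illustration under assumptions made here: the test function $g(x, y) = \exp(-xy)$, $\alpha = 1$, the step $h = \log M / M$, and the full sum over $-M \le k \le M$ (the symmetry reduction to $M + 1$ terms is not exploited). Each summand is a product of a function of $x$ and a function of $y$, which is what makes the approximation separable.

```python
import math


def sinc(x):
    """sinc(x) = sin(pi x) / (pi x) with sinc(0) = 1."""
    if x == 0.0:
        return 1.0
    return math.sin(math.pi * x) / (math.pi * x)


def phi(zeta):
    """Mapping (19): phi(zeta) = 1 / cosh(sinh(zeta)), real zeta only."""
    return 1.0 / math.cosh(math.sinh(zeta))


def phi_inv(x):
    """Back-transformation arsinh(arcosh(1/x)) for x in (0, 1]."""
    return math.asinh(math.acosh(1.0 / x))


def separable_approx(g, x, y, M, alpha=1.0):
    """Evaluate g_M(x, y) from (20) with step h = log(M) / M."""
    h = math.log(M) / M
    zeta = phi_inv(x)
    total = 0.0
    for k in range(-M, M + 1):
        x_k = phi(k * h)                                 # interpolation point in (0, 1]
        x_factor = (x_k / x) ** alpha * sinc(zeta / h - k)  # depends on x only
        y_factor = g(x_k, y)                             # depends on y only
        total += x_factor * y_factor
    return total


def kernel(x, y):
    return math.exp(-x * y)


if __name__ == "__main__":
    for (x, y) in ((0.5, 2.0), (0.9, 5.0), (0.1, 1.0)):
        print(x, y, abs(separable_approx(kernel, x, y, 32) - kernel(x, y)))
```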


Cor. 2.7. Assume that for all $y \in Y$ the functions $g(\cdot, y)$ and $f(\zeta, y) := \phi^\alpha(\zeta)\,g(\phi(\zeta), y)$ satisfy:

(a) $g(\cdot, y)$ is holomorphic on $D_\phi(\delta)$, and $\sup_{y\in Y} N(f, D_\delta) < \infty$;

(b) $f(\cdot, y)$ satisfies (12) with $a = 1$ and with certain $C, b$ for all $y \in Y$.

Then, for all $y \in Y$, the optimal choice $h := \frac{\log M}{M}$ yields

$$E_M(\zeta) := |f(\zeta, y) - C_M(f(\cdot, y), h)(\zeta)| \le C\,\frac{N(f, D_\delta)}{2\pi\delta}\,e^{-\frac{\pi\delta M}{\log M}}, \qquad (21)$$

$$|g(x, y) - g_M(x, y)| \le |x|^{-\alpha}\,\left|E_M(\phi^{-1}(x))\right|. \qquad (22)$$

Proof: Due to the properties of $\phi: D_\delta \to D_\phi(\delta)$, condition (a) implies $f \in H^1(D_\delta)$, hence, in view of (b), we can apply Thm. 2.6. Now $\frac{N(f, D_\delta)}{2\pi\delta}\,e^{-\pi\delta M/\log M}$ corresponds to the approximation error, while the evaluation of the truncation error yields the bound $\frac{2C}{b\log M}\,e^{-bM}$, which decays asymptotically faster as $M \to \infty$. Now (21) follows.

Transforming to the approximand (20) implies the bound (22) for $g - g_M$.

Numerics for the Sinc interpolation

Ex. 2.3. Separable approximation to the function

$$g(x, y) = |x|^\lambda\,\mathrm{sinc}(|x|\,|y|), \quad \lambda \in (-3, 1],$$

arising from the Boltzmann equation.

[Figure 4: $L^\infty$-error of the sinc-interpolation to $|x|^\lambda\,\mathrm{sinc}(|x|y)$, $x \in [-1, 1]$, $y \in [1, 36]$, $\lambda = 1$. Three panels for $|x|^s\,\mathrm{sinc}(y|x|)$, $x \in [-1, 1]$, $s = 1$, with $y = 16$, $y = 25$ and $y = 36$; horizontal axis: $M$, the number of quadrature points; vertical axis: error (logarithmic scale).]


Ex. 2.4. Sinc-interpolation for $g(x, y) = \exp(-xy)$, $x, y \ge 0$.

Consider the auxiliary function $f(x, y) = \frac{x}{1+x}\exp(-xy)$, $x \in \mathbb{R}_+$, $y \in [1, R]$, which satisfies all the conditions above with $\alpha = \beta = 1$ (exponential decay). With the choice of interpolation points $x_k := \log[e^{kh} + \sqrt{1 + e^{2kh}}] \in \mathbb{R}_+$, it can be approximated with exponential convergence.

[Figure 5: $L^\infty$-error of the sinc-interpolation of $\exp(-|x|y)$, $x \in [-1, 1]$, $y \in [1, 100]$. Three panels for $|x|^s\exp(-y|x|)$, $x \in [-1, 1]$, $s = 1$, with $y = 1$, $y = 10$ and $y = 100$; vertical axis: error (logarithmic scale).]


Ex. 2.5. Mexican hat scaling function

[Figure 6: Mexican hat $f(x) = (1 - x^2)\exp(-\alpha x^2)$, $\alpha > 0$.]

Sinc interpolation to the Mexican hat, $r = M + 1$.

α \ M   4      9         16        25         36         49          64         81          100
1       0.05   6·10^-4   7·10^-7   1·10^-10   2·10^-15   1·10^-15    -          -           -
10      0.17   0.13      0.12      0.04       0.01       0.004       0.0009     1.7·10^-4   2.6·10^-5
0.1     3.8    2.6       0.6       0.08       0.006      1.6·10^-5   2·10^-7    2.5·10^-9   2·10^-11


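Ex. 2.6. (Helmholtz kernel in $\mathbb{R}^d$). Define

$$f(\zeta, \eta, \nu) := \frac{e^{i\kappa|x-y|}}{|x - y|}, \qquad \zeta = |x_1 - y_1|, \quad \eta = |x_2 - y_2|, \quad \nu = |x_3 - y_3|.$$

For $(\zeta, \eta) \in [0, 1] \times [0, b]$, consider

$$F(\zeta, \eta) := f(\zeta, \eta, 0) = \frac{e^{i\kappa\sqrt{\zeta^2 + \eta^2}}}{\sqrt{\zeta^2 + \eta^2}}.$$

We approximate the modified function

$$F_0(\zeta, \eta) := \zeta^{\alpha_0}\left(F(\zeta, \eta) - F(0, \eta)\right), \quad 0 < \alpha_0 < 1, \qquad (23)$$

on the domain $\Omega_1 := [\delta, 1] \times [0, b]$, where $\delta > 0$ is a small parameter. The considerations for the remaining domain $\Omega_2 := [0, \delta] \times [\delta, b]$ are completely similar.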


[Figure 7: Error (depending on $\kappa$!) for the Sinc-interpolation to $F_0$ with $\kappa = 0.01, 1.0, 10$, respectively, from left to right. Panels show $|x|^\beta\cos(\kappa|x|)/|x|$, $x \in [-1, 1]$, $\beta = 0.95$; vertical axis: error (logarithmic scale).]


[Figure 8: Pointwise error for the Sinc-interpolation to $F_0$ with $\kappa = 0.01$ for $r = 25$ (left), $r = 37$ (middle) and $r = 49$ (right). The vertical scales of the three panels are of order $10^{-6}$, $10^{-8}$ and $10^{-9}$, respectively.]



2. W. Hackbusch: Hierarchische Matrizen - Algorithmen und Analysis. Vorlesungsmanuskript, Leipzig 2004.

3. I.P. Gavrilyuk, W. Hackbusch, and B.N. Khoromskij: Tensor-product approximation to elliptic and parabolic solution operators in higher dimensions. Preprint 83, MPI MIS, Leipzig 2003; Computing (to appear).

4. W. Hackbusch and B.N. Khoromskij: Hierarchical Kronecker Tensor-Product Approximation to a Class of Nonlocal Operators in High Dimensions. Preprint 16, MPI MIS, Leipzig 2004.

5. F. Stenger: Numerical methods based on Sinc and analytic functions. Springer-Verlag, 1993.



Lecture 3. Introduction to wavelet methods

The wavelet is the mathematical microscope (B. Hubbard)

Purposes:

• Audio/video compression, radar processing

• Surface identification/analysis

• Image analysis (e.g., “finger prints”, medical imaging)

• Communications (radio, TV)

• Numerical PDEs and IEs, many-particle systems, ...

The fundamental theory behind wavelets is known as the multi-resolution analysis (MRA).

The MRA provides a great deal of possibilities for multi-level data and signal processing and is gaining widespread popularity.

Basic ideas and history

The multiresolution approach is based on the idea that the wavelet functions generate a hierarchical sequence of subspaces in $L^2(\mathbb{R})$, which forms the MRA,

$$V_{j+1} \subset V_j \subset ... \subset V_0 \subset V_{-1} \subset ....$$

A signal $f_0 \in V_0$ (at scale $2^0$) is split into a “blurred” version $f_1 \in V_1$ at the coarser scale $2^1$ and a “detail” $d_1 \in W_1$ at scale $2^0$.

Repeating this process gives a sequence $f_0, f_1, f_2, ...$ of more and more blurred versions and the details $d_1, d_2, d_3, ...$.

Each $d_j$ can be represented in the wavelet basis using the “filter coefficients” (high-pass filters), while the $f_j$ are given in the scaling function basis via the low-pass filters.

After $J$ iterations the original signal can be exactly reconstructed: $f_0 = f_J + d_1 + ... + d_J$.
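The splitting into blurred versions and details, and the exact reconstruction, can be sketched for the simplest case. The code below assumes the orthonormal Haar filters (pairwise sums and differences scaled by $1/\sqrt{2}$) and a signal length that is a power of two; the function names are illustrative and not taken from the lecture.

```python
import math

SCALE = 1.0 / math.sqrt(2.0)


def haar_step(a):
    """One level: low-pass (blurred) and high-pass (detail) coefficients."""
    approx = [SCALE * (a[2 * i] + a[2 * i + 1]) for i in range(len(a) // 2)]
    detail = [SCALE * (a[2 * i] - a[2 * i + 1]) for i in range(len(a) // 2)]
    return approx, detail


def haar_fwt(signal):
    """Full decomposition: returns the coarsest approximation and all details.

    The work is N/2 + N/4 + ... + 1 pair operations, i.e. O(N) in total.
    """
    approx, details = list(signal), []
    while len(approx) > 1:
        approx, d = haar_step(approx)
        details.append(d)
    return approx, details


def haar_ifwt(approx, details):
    """Exact reconstruction from the coarsest approximation and the details."""
    a = list(approx)
    for d in reversed(details):
        finer = []
        for a_i, d_i in zip(a, d):
            finer.extend([SCALE * (a_i + d_i), SCALE * (a_i - d_i)])
        a = finer
    return a


samples = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
coarse, detail_levels = haar_fwt(samples)
rebuilt = haar_ifwt(coarse, detail_levels)

if __name__ == "__main__":
    print(coarse, detail_levels)
    print(rebuilt)
```

Because the filters are orthonormal, the sum of squares of all coefficients equals the sum of squares of the samples.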


MRA is completely recursive and hence ideal for computation.

An important ingredient is the discrete wavelet transform (DWT).

The DWT allows a fast implementation (FWT) with the linear cost $O(N)$, $N = 2^J$.

Orthogonal wavelets are generated by

– the scaling function (SF) $\varphi(x)$ (father wavelet) and

– the wavelet $\psi(x)$ (mother wavelet).

The Sinc approximation method (cf. Lect. 2) can be viewed within the wavelet concept: Sinc MRA, Sinc wavelet.

It is instructive to compare the Sinc and Haar MRA.


A wavelet $\psi(x)$ is a function of zero average,

$$\int_{\mathbb{R}} \psi(x)\,dx = 0.$$

Using dilated and translated versions of $\psi$ defined by

$$\psi_{u,s}(x) = \frac{1}{\sqrt{s}}\,\psi\Big(\frac{x-u}{s}\Big),$$

one can apply the continuous wavelet transform (cf. the continuous FT)

$$Wf(u,s) := \int_{\mathbb{R}} f(x)\,\psi^*_{u,s}(x)\,dx.$$

This provides a two-dimensional representation of a one-dimensional signal, which indicates some redundancy.

This redundancy can be eliminated by constructing a basis of the signal space. Hence the next step is the discrete wavelet transform. The first example is given by the classical Haar wavelet (Haar 1910).

Multi–Resolution Analysis

The SF ϕ(x) generates an orthogonal MRA if it satisfies the

following conditions (i)-(iv):

(i) The integer translates of this function,

$$\varphi_k(x) = \varphi(x-k), \quad k \in \mathbb{Z},$$

are linearly independent and form a Riesz basis of the subspace $V_0 \subset L^2(\mathbb{R})$: there exist $A, B > 0$ s.t. for all $f = \sum_{k=-\infty}^{\infty} a[k]\varphi_k(x) \in V_0$, we have

$$A\|f\|^2 \le \sum_{k=-\infty}^{\infty} |a[k]|^2 \le B\|f\|^2.$$

In the case of an orthogonal basis, $A = B = 1$.


(ii) Dyadic dilates of these functions, $\varphi_{j,k} = \varphi(2^{-j}x - k)$, $j \in \mathbb{Z}$, generate hierarchical sets of subspaces $V_j$. Specifically, $V_j$ contains all scaling functions on level $j$. This means that if a function $f(x) \in V_j$, its translates by integer multiples of the scale $2^j$ have to be contained in the same space,

$$f(x) \in V_j \Leftrightarrow f(x - 2^j k) \in V_j, \quad k \in \mathbb{Z}.$$

(iii) The scaling function spaces satisfy $V_{j+1} \subset V_j$, i.e., an approximation at a resolution $2^{-j}$ contains all the information needed to compute an approximation at the coarser resolution $2^{-j-1}$.

Moreover, if $f(x) \in V_j$, the dilated function $f(x/2)$ has to be contained in the coarser resolution space $V_{j+1}$:

$$f(x) \in V_j \Leftrightarrow f(x/2) \in V_{j+1}, \quad j \in \mathbb{Z}.$$


(iv) The scaling function spaces also satisfy

(a) $\lim_{j\to\infty} V_j = \bigcap_{j=-\infty}^{\infty} V_j = \{0\}$,

(b) $\bigcup_{j=-\infty}^{\infty} V_j$ is dense in $L^2(\mathbb{R})$.

Specifically, (b) means

$$\lim_{j\to-\infty} V_j = \mathrm{Closure}\Big(\bigcup_{j=-\infty}^{\infty} V_j\Big) = L^2(\mathbb{R}).$$

Recall that $2^{-j}$ is the resolution and $2^j$ is a scale parameter.

Scaling (dilation) equation

The set of functions $\varphi_{j,k}(x)$ is supposed to be orthogonal. It means that for any $k, k' \in \mathbb{Z}$:

$$\int_{\mathbb{R}} \varphi_{j,k}(x)\,\varphi_{j,k'}(x)\,dx = \delta_{kk'}, \quad j \in \mathbb{Z}.$$

Let $\{\varphi_n\}_{n\in\mathbb{Z}}$ be an orthogonal basis of $V_0$. Then the family $\{\varphi_{j,n}\}_{n\in\mathbb{Z}}$ is an orthogonal basis of $V_j$, $j \in \mathbb{Z}$, where

$$\varphi_{j,n}(x) := \frac{1}{2^{j/2}}\,\varphi\Big(\frac{x - 2^j n}{2^j}\Big).$$

The orthogonal projection of $f$ onto $V_j$ is given by

$$P_{V_j} f = \sum_{n=-\infty}^{\infty} a_j[n]\,\varphi_{j,n}, \quad a_j[n] = \langle f, \varphi_{j,n}\rangle,$$

where the $a_j[n]$ provide a discrete approximation at the scale $2^j$.

Scaling (dilation) equation.

Since $2^{-1/2}\varphi(x/2) \in V_1 \subset V_0$, we can decompose

$$\frac{1}{\sqrt{2}}\,\varphi(x/2) = \sum_{n=-\infty}^{\infty} h[n]\,\varphi(x-n) \quad \text{with} \quad h[n] = \frac{1}{\sqrt{2}}\,\langle \varphi(x/2), \varphi(x-n)\rangle. \qquad (24)$$

In signal processing, the sequence $h[n]$ is interpreted as a discrete filter, usually called a conjugate mirror filter (Mallat, Meyer) or low-pass filter.

For scaling functions with compact support, $h[n]$ is a finite sequence (cf. the Haar SF).

If $\varphi(x)$ has infinite support, $h[n]$ might be an infinite sequence (cf. the Sinc SF).

Scaling equation

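The FT of (24) implies

$$\hat\varphi(2\omega) = \frac{1}{\sqrt{2}}\,\hat h(\omega)\,\hat\varphi(\omega) \quad \text{for} \quad \hat h(\omega) = \sum_{n=-\infty}^{\infty} h[n]\,e^{-in\omega}.$$

For any $p \ge 0$, the previous implies

$$\hat\varphi(2^{-p+1}\omega) = \frac{1}{\sqrt{2}}\,\hat h(2^{-p}\omega)\,\hat\varphi(2^{-p}\omega).$$

Thus, by substitution, we obtain (with arbitrary $P \in \mathbb{N}$)

$$\hat\varphi(\omega) = \Big(\prod_{p=1}^{P} \frac{\hat h(2^{-p}\omega)}{\sqrt{2}}\Big)\hat\varphi(2^{-P}\omega) = \Big(\prod_{p=1}^{\infty} \frac{\hat h(2^{-p}\omega)}{\sqrt{2}}\Big)\hat\varphi(0) \qquad (25)$$

(the latter, if $\hat\varphi(\omega)$ is continuous at $\omega = 0$).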
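The product formula (25) can be checked numerically. The sketch below is an illustration only: it assumes the FT convention $\hat f(\omega) = \int f(x)e^{-i\omega x}dx$, uses the Haar filter $h[0] = h[1] = 2^{-1/2}$, and compares the truncated product with the transform $(1 - e^{-i\omega})/(i\omega)$ of the indicator function of $[0,1)$. The function names and the truncation level P = 40 are hypothetical choices.

```python
import numpy as np

def haar_filter_symbol(omega):
    """h_hat(omega) = sum_n h[n] exp(-i n omega) for h[0] = h[1] = 1/sqrt(2)."""
    return (1.0 + np.exp(-1j * omega)) / np.sqrt(2.0)

def phi_hat_product(omega, P=40):
    """Truncated product from (25), using phi_hat(0) = 1 for the Haar SF."""
    omega = np.asarray(omega, dtype=float)
    result = np.ones_like(omega, dtype=complex)
    for p in range(1, P + 1):
        result *= haar_filter_symbol(omega / 2.0**p) / np.sqrt(2.0)
    return result

def phi_hat_exact(omega):
    """FT of the indicator function of [0, 1); valid for omega != 0."""
    omega = np.asarray(omega, dtype=float)
    return (1.0 - np.exp(-1j * omega)) / (1j * omega)

omega = np.array([0.5, 1.0, 3.0, 7.0])
max_dev = np.max(np.abs(phi_hat_product(omega) - phi_hat_exact(omega)))
print("max deviation:", max_dev)
```

The deviation is governed by the truncation level: the omitted factor is $\hat\varphi(2^{-P}\omega)$, which tends to $\hat\varphi(0) = 1$ as $P$ grows.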

Haar and Sinc MRA: check cond. (i)-(iv)

Ex. 3.1. Define the Haar scaling function

$$\varphi(x) = \chi_{[0,1)}(x).$$

The Haar MRA corresponds to the approximation by piecewise constant functions; cond. (i)-(iv) can be easily checked.

Clearly, $\{\varphi_k\}$ is an orthogonal basis (i.e., $A = B = 1$).

$V_j \subset L^2(\mathbb{R})$ consists of functions which are constant for $x \in [n2^j, (n+1)2^j)$ and $n \in \mathbb{Z}$, so that $V_j \subset V_{j-1}$.

The approximation at a resolution $2^{-j}$ is a projection onto the set of piecewise constant functions on intervals of size $2^j$.

The filter coefficients $h[n] = \frac{1}{\sqrt{2}}\langle \varphi(x/2), \varphi(x-n)\rangle$ are given by $h[n] = 2^{-1/2}$ if $n = 0, 1$ and $h[n] = 0$ otherwise.

Ex. 3.2. To approximate smooth (analytic) data one makes use of the Sinc (Shannon) scaling function

$$\varphi(x) = \mathrm{sinc}(x) := \frac{\sin(\pi x)}{\pi x}.$$

$V_j \subset L^2(\mathbb{R})$ is defined as the set of functions whose FT has a support included in $[-2^{-j}\pi, 2^{-j}\pi]$.

Lem. 2.2 proves that $\{\varphi(x-n)\}_{n\in\mathbb{Z}}$ is an orthogonal basis of $V_0$ (band-limited functions). Moreover, it is an interpolating basis.

The FT of $f = \mathrm{sinc}(x)$ is the (shifted/dilated) Haar SF,

$$\hat f(\omega) = \chi_{[-\pi,\pi]}(\omega).$$

We derive from (25) for the filter coefficients

$$\hat h(\omega) = \sqrt{2}\,\chi_{[-\pi/2,\pi/2]}(\omega), \quad \omega \in [-\pi, \pi].$$

Haar and Sinc orthogonal wavelets

[Figure 9: four panels; recoverable panel titles: Haar wavelet, Sinc function, Sinc wavelet.]

Figure 9: Haar and Sinc scaling functions/wavelets.

Wavelet spaces

The wavelet spaces have the properties:

(v) There is a wavelet function $\psi(x)$ s.t. its integer translates $\psi_k(x) = \psi(x-k)$ and dyadic dilates $\psi_{j,k} = \psi(2^{-j}x - k)$ form subspaces $W_j$ which are complementary to $V_j$ in $V_{j-1}$:

$$V_{j-1} = V_j \oplus W_j, \quad W_j \perp V_j. \qquad (26)$$

(vi) From the above relations it follows that $L^2(\mathbb{R})$ can be decomposed into the approximation space $V_{j_0}$ and the sum of the detail spaces $W_j$ of higher resolutions $j \le j_0$:

$$L^2(\mathbb{R}) = V_{j_0} \oplus \bigoplus_{j=-\infty}^{j_0} W_j = \bigoplus_{j=-\infty}^{\infty} W_j, \qquad (27)$$

where $j_0 \in \mathbb{Z}$ is a chosen level of resolution.

Orthogonal wavelets

(26) means that the orthogonal projection of $f$ on $V_{j-1}$ is a sum of orthogonal projections on $V_j$ and $W_j$, hence the “detail” space $W_j$ is the orthogonal complement of $V_j$ in $V_{j-1}$:

$$P_{V_{j-1}} f = P_{V_j} f + P_{W_j} f.$$

$P_{W_j} f$ gives the “details” of $f$ that appear at the scale $2^{j-1}$ but which disappear at the coarser scale $2^j$.

Thm. 3.1. (Mallat, Meyer) Let $\psi$ be the function whose FT is

$$\hat\psi(2\omega) = \frac{1}{\sqrt{2}}\,e^{-i\omega}\,\hat h^*(\omega + \pi)\,\hat\varphi(\omega),$$

where $\varphi$ is the SF and $h$ is the corresponding conjugate mirror filter. Let us denote $\psi_{j,k}(x) := \frac{1}{\sqrt{2^j}}\,\psi\big(\frac{x - 2^j k}{2^j}\big)$. For any scale $2^j$, $\{\psi_{j,k}\}_{k\in\mathbb{Z}}$ is an orthogonal basis of $W_j$. For all scales, $\{\psi_{j,k}\}_{(j,k)\in\mathbb{Z}^2}$ is an orthogonal basis of $L^2(\mathbb{R})$.

High-pass filters

Since $\psi(x/2) \in W_1 \subset V_0$, it can be decomposed in an orthogonal basis of $V_0$:

$$\frac{1}{\sqrt{2}}\,\psi(x/2) = \sum_{n=-\infty}^{\infty} g[n]\,\varphi(x-n) \qquad (28)$$

with $g[n] = \frac{1}{\sqrt{2}}\langle \psi(x/2), \varphi(x-n)\rangle$. In (28) $\varphi$ serves as a kind of “potential” for generating $\psi$.

The FT of (28) with Thm. 3.1 yields

$$\hat\psi(2\omega) = \frac{1}{\sqrt{2}}\,\hat g(\omega)\,\hat\varphi(\omega), \quad \text{i.e.,} \quad \hat g(\omega) = e^{-i\omega}\,\hat h^*(\omega + \pi).$$

Calculating the inverse FT of the above relation leads to

$$g[n] = (-1)^{1-n}\,h[1-n].$$

This is the so-called mirror filter (or high-pass filter), which is important for the FWT algorithm.
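The relation $g[n] = (-1)^{1-n}h[1-n]$ can be tried out on concrete filters. The following sketch is an illustration under assumptions: filters are stored as Python dictionaries (index to value), and besides the Haar filter it uses the standard 4-tap Daubechies coefficients, which are not listed in these notes. It checks the usual orthogonality relations of a conjugate mirror filter pair at even shifts and the zero mean of $g$.

```python
import numpy as np

def mirror_filter(h):
    """g[n] = (-1)**(1-n) * h[1-n] for h given on n = 0..m-1, returned as dict n -> g[n]."""
    # With n = 1 - k the sign (-1)**(1-n) equals (-1)**k.
    return {1 - k: (-1) ** k * hk for k, hk in enumerate(h)}

def shifted_dot(p, q, shift):
    """sum_n p[n] * q[n + shift] for filters stored as dicts index -> value."""
    return sum(v * q.get(n + shift, 0.0) for n, v in p.items())

def is_conjugate_mirror_pair(h, tol=1e-12):
    """Check orthonormality of h and g at even shifts and the zero mean of g."""
    hd = dict(enumerate(h))
    g = mirror_filter(h)
    checks = [abs(sum(g.values())) < tol]
    for l in range(-len(h), len(h) + 1):
        delta = 1.0 if l == 0 else 0.0
        checks.append(abs(shifted_dot(hd, hd, 2 * l) - delta) < tol)
        checks.append(abs(shifted_dot(g, g, 2 * l) - delta) < tol)
        checks.append(abs(shifted_dot(hd, g, 2 * l)) < tol)
    return all(checks)

s3 = np.sqrt(3.0)
haar = [2**-0.5, 2**-0.5]
# Assumed standard Daubechies filter with four taps (two vanishing moments).
daub4 = [c / (4.0 * np.sqrt(2.0)) for c in (1 + s3, 3 + s3, 3 - s3, 1 - s3)]

print(mirror_filter(haar))
print(is_conjugate_mirror_pair(haar), is_conjugate_mirror_pair(daub4))
```

For a filter supported on $0, \ldots, m-1$ the mirror filter is supported on $2-m, \ldots, 1$, which is why a dictionary with possibly negative indices is used here instead of a plain list.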

Discrete wavelet transform

All in all, the properties (i)-(vi) with Thm. 3.1 mean that any function $f \in L^2(\mathbb{R})$ can be represented as a sum of linear combinations of the scaling functions $\varphi_{j_0,k}$ at a chosen resolution $j = j_0$ and the wavelet functions $\psi_{j,k}$ at all finer resolutions $j \le j_0$:

$$f(x) = \sum_{k=-\infty}^{\infty} a_{j_0}[k]\,\varphi_{j_0,k}(x) + \sum_{j=-\infty}^{j_0}\sum_{k=-\infty}^{\infty} d_j[k]\,\psi_{j,k}(x). \qquad (29)$$

Here the coefficients $a_{j_0}[k]$ and $d_j[k]$ are obtained as the scalar products with the appropriate basis functions,

$$a_j[k] = \int_{\mathbb{R}} f(x)\,\varphi_{j,k}(x)\,dx, \quad d_j[k] = \int_{\mathbb{R}} f(x)\,\psi_{j,k}(x)\,dx. \qquad (30)$$

Eq. (29), (30) define the Discrete Wavelet Transform (DWT).

Vanishing moments

The wavelet $\psi$ has $p$ vanishing moments if

$$\int_{\mathbb{R}} x^k\,\psi(x)\,dx = 0 \quad \text{for } 0 \le k < p.$$

Then $\psi$ is orthogonal to any polynomial of degree $p-1$. If $f$ is locally $C^k$, then for $k < p$ the wavelets are orthogonal to the local polynomial approximant (say, Taylor), yielding small-amplitude coefficients at fine scales.

$\psi$ has $p$ vanishing moments iff both $\hat\psi$ and $\hat h$ have vanishing derivatives up to order $p-1$ at $\omega = 0$ and at $\omega = \pi$, respectively.

If $\psi$ has $p$ vanishing moments then its support is at least of size $2p-1$ (Daubechies).

Haar wavelet: check cond. (v)-(vi)

Ex. 3.1′. Recall the filter coefficients for the Haar scaling function: $h[n] = 2^{-1/2}$ if $n = 0, 1$ and $h[n] = 0$ otherwise. The Haar wavelet is thus given by

$$\frac{1}{\sqrt{2}}\,\psi\Big(\frac{x}{2}\Big) = \sum_{n=-\infty}^{\infty} (-1)^{1-n}h[1-n]\,\varphi(x-n) = \frac{1}{\sqrt{2}}\big(\varphi(x-1) - \varphi(x)\big).$$

Specifically, $\psi(x) = -1$ if $0 \le x < 1/2$, $\psi(x) = 1$ if $1/2 \le x < 1$, and $\psi(x) = 0$ otherwise.

Clearly, this is an orthogonal wavelet providing (v)-(vi).

The Haar wavelet has the shortest support among all orthogonal wavelets ($p = 1$). It is well suited only for approximating non-smooth functions (signals).

However, it is a good example for educational purposes.

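Sinc wavelet: check cond. (v)-(vi)

Ex. 3.2′. The Sinc wavelet is constructed from the Sinc MRA with $\varphi(x) = \mathrm{sinc}(x)$, which approximates functions by their restriction to low frequency intervals. Thm. 3.1 yields

$$\hat\psi(\omega) = \frac{1}{\sqrt{2}}\,e^{-i\omega/2}\,\hat h^*(\omega/2 + \pi)\,\hat\varphi(\omega/2)$$

with $\hat\varphi(\omega) = \chi_{[-\pi,\pi]}(\omega)$ and the $2\pi$-periodic filter $\hat h(\omega) = \sqrt{2}\,\chi_{[-\pi/2,\pi/2]}(\omega)$ for $\omega \in [-\pi, \pi]$. This implies

$$\hat\psi(\omega) = e^{-i\omega/2} \quad \text{if } \omega \in [-2\pi, -\pi] \cup [\pi, 2\pi]$$

and $\hat\psi(\omega) = 0$ otherwise. Hence

$$\psi(x) = 2\varphi(2x - 1) - \varphi(x - 1/2).$$

This is an analytic ($C^\infty$) wavelet with the decay $O(|x|^{-1})$ as $|x| \to \infty$. Since $\hat\psi$ vanishes in a neighbourhood of $\omega = 0$, all derivatives of $\hat\psi$ vanish at $\omega = 0$; in this sense $\psi$ has an infinite number of vanishing moments.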

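Fast orthogonal wavelet transform

Because of $V_j = V_{j+1} \oplus W_{j+1}$, a function $f \in V_j$ may be represented either in the scaling function basis

$$f = \sum_{k=-\infty}^{\infty} \langle f, \varphi_{j,k}\rangle\,\varphi_{j,k} = \sum_{k=-\infty}^{\infty} a_j[k]\,\varphi_{j,k}$$

or with respect to orthogonal bases of $V_{j+1}$ and $W_{j+1}$,

$$f = \sum_{k=-\infty}^{\infty} a_{j+1}[k]\,\varphi_{j+1,k} + \sum_{k=-\infty}^{\infty} d_{j+1}[k]\,\psi_{j+1,k}, \quad d_{j+1}[k] = \langle f, \psi_{j+1,k}\rangle.$$

Thm. 3.2. (Mallat) At the decomposition,

$$a_{j+1}[n] = \sum_{k=-\infty}^{\infty} h[k-2n]\,a_j[k]; \quad d_{j+1}[n] = \sum_{k=-\infty}^{\infty} g[k-2n]\,a_j[k].$$

At the reconstruction,

$$a_j[n] = \sum_{k=-\infty}^{\infty} h[n-2k]\,a_{j+1}[k] + \sum_{k=-\infty}^{\infty} g[n-2k]\,d_{j+1}[k].$$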
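The two formulas of Thm. 3.2 translate almost literally into code. The sketch below is a minimal illustration, not an optimised implementation: it assumes a periodic signal of even length (indices are taken modulo N), stores filters as dictionaries, and uses the Haar filter from Ex. 3.1 together with its mirror filter. All function names are hypothetical.

```python
import numpy as np

def mirror_filter(h):
    """g[n] = (-1)**(1-n) * h[1-n] for h given as dict on n = 0..m-1."""
    return {1 - k: (-1) ** k * hk for k, hk in h.items()}

def decompose(a, h, g):
    """One decomposition step: a_{j+1}[n] = sum_k h[k-2n] a_j[k], same with g for d_{j+1}."""
    N = len(a)
    a_next, d_next = np.zeros(N // 2), np.zeros(N // 2)
    for n in range(N // 2):
        for i, hi in h.items():
            a_next[n] += hi * a[(2 * n + i) % N]   # k = 2n + i, periodic index
        for i, gi in g.items():
            d_next[n] += gi * a[(2 * n + i) % N]
    return a_next, d_next

def reconstruct(a_next, d_next, h, g):
    """One reconstruction step: a_j[n] = sum_k h[n-2k] a_{j+1}[k] + g[n-2k] d_{j+1}[k]."""
    N = 2 * len(a_next)
    a = np.zeros(N)
    for k in range(len(a_next)):
        for i, hi in h.items():
            a[(2 * k + i) % N] += hi * a_next[k]   # n = 2k + i, periodic index
        for i, gi in g.items():
            a[(2 * k + i) % N] += gi * d_next[k]
    return a

def fwt(a0, h, g, J):
    """J decomposition steps; returns a_J and the list [d_1, ..., d_J]."""
    a, details = np.asarray(a0, dtype=float), []
    for _ in range(J):
        a, d = decompose(a, h, g)
        details.append(d)
    return a, details

def ifwt(aJ, details, h, g):
    """Invert fwt by applying the reconstruction step from level J down to 1."""
    a = aJ
    for d in reversed(details):
        a = reconstruct(a, d, h, g)
    return a

h = {0: 2**-0.5, 1: 2**-0.5}          # Haar low-pass filter
g = mirror_filter(h)
a0 = np.random.default_rng(0).standard_normal(64)
aJ, details = fwt(a0, h, g, J=3)
recon_error = np.max(np.abs(ifwt(aJ, details, h, g) - a0))
print("reconstruction error:", recon_error)
```

Each step touches every coefficient once per filter tap, so the total work over all levels is proportional to $m(N + N/2 + N/4 + \ldots) \le 2mN$ for a filter of length $m$.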


$f_0 \in V_0$ is split into $f_1 \in V_1$ at the coarser scale $2^1$ and the “detail” $d_1 \in W_1$ at scale $2^0$. Iterating this process gives a sequence $f_0, f_1, f_2, \ldots$ of more and more blurred versions and the details $d_1, d_2, d_3, \ldots$ After $J$ iterations the original signal can be exactly (orthogonality) reconstructed: $f_0 = f_J + d_1 + \ldots + d_J$.

The decomposition scheme

a0 → a1 → a2 → · · · → aJ
   ↘ d1   ↘ d2   · · ·   ↘ dJ

The reconstruction scheme

a0 ← a1 ← a2 ← · · · ← aJ
   ↖ d1   ↖ d2   · · ·   ↖ dJ


Given $f = f_0 \in V_0$, both the decomposition and the reconstruction are nothing but changes of basis,

$$V_0 \to V_J \oplus W_1 \oplus \cdots \oplus W_J.$$

Iterating the decomposition yields, for given coefficients $a_0[k]$, the coefficients $D[l,k] := (a_J[k], d_J[k], d_{J-1}[k], \ldots, d_1[k])$.

The transformation $a_0[k] \to D[l,k]$ is called the discrete wavelet transform (DWT). The backward transform is provided by the reconstruction $D[l,k] \to a_0[k]$.

In practice the signal $a_0$ is $2^J$-periodic, hence we have $N = 2^J$ coefficients. Then the DWT requires only $O(mN)$ operations, where $m$ is the filter length.

Numerics I: Denoising

We perform denoising of a randomly perturbed Mexican hat function. It can be rather accurately reconstructed with only a few wavelet coefficients (say, with ∼ 10 among N = 2048) up to a threshold proportional to the random amplitude (about 10% of the signal amplitude).

[Figure 10: two panels; recoverable panel title: Mexican hat scaling function.]

Figure 10: Denoising by Daubechies (4) wavelets for “Mexican hat”.

Numerics II: Approximating smooth signals

We approximate the Mexican hat with α = 0.5 by Daubechies (m) wavelets with the filter length m (next table).

Recall m = 2p − 1.

$k_W(\varepsilon)$ is the number of (nonzero) wavelet coefficients which exceed the given threshold ε.

$k_W(\varepsilon)$ for Daubechies (m) wavelets approximating Mexican hat

m \ ε   10^-1   10^-2   10^-3   10^-4   10^-5   10^-6   10^-7   10^-8
10      19      31      47      75      105     175     273     388
20      17      24      29      43      53      60      93      121
40      24      24      29      31      31      46      57      105

Numerics II: Approximating smooth signals

Next table gives the Sinc-interpolation error vs. the Sinc-wavelet compressed representation, where the total number of wavelet coefficients $k_W(\varepsilon)$ corresponds to the threshold ε.

The compression is not efficient since there are no “details”! In fact, all the important coefficients are observed at high-level resolution.

Sinc interpolation.

M        4       9       16      25        36        49        64        81        100
ε        0.005   0.003   0.001   2·10^-4   4·10^-5   4·10^-7   4·10^-8   6·10^-9   9·10^-10

Sinc-wavelets for Mexican hat.

mF|N     16|32   36|64   36|64   50|128    70|128    100|256   128|256   160|256   –
ε        0.01    0.005   0.002   4·10^-4   6·10^-5   6·10^-6   6·10^-7   6·10^-6   –
kW(ε)    20      29      42      54        85        131       179       116       –



2. I. Daubechies: Ten Lectures on Wavelets. SIAM, Philadelphia, 1992.

3. G. Strang, T. Nguyen: Wavelets and Filter Banks. Wellesley-Cambridge Press, 1997.

4. R. Schneider: Wavelets and Signal Processing. Lecture Notes. Chemnitz, Sommersemester 2000.



Lect. 4. Separable approximation to multi-variate functions

Analytic methods of Kronecker-product representation to

non-local operators and related tensors are mainly based on

separable approximation to multi-variate functions in Rd.

I. Separation methods by tensor-product interpolation

• Polynomial interpolation

• Sinc interpolation

• Hyperbolic-cross approximation (Wavelet/FEM).

II. Approximating by exponential/trigonometric sums

• Sinc quadratures

• Exponential sums $\sum_k a_k e^{-b_k x}$

• Trigonometric sums $\sum_k [a_k \sin(b_k x) + a'_k \cos(b'_k x)]$.

Item (II) applies to translation invariant functions or to

functions depending on the sum of spatial variables.

Tensor-product interpolation

Approximation problem: Given a multi-variate function $F: \Omega^d \to \mathbb{R}$ ($d \ge 2$), approximate it by a separable expansion

$$F_r(\zeta_1, \ldots, \zeta_d) := \sum_{k=1}^{r} c_k\,\Phi_k^1(\zeta_1)\cdots\Phi_k^d(\zeta_d) \approx F, \quad \Omega \in \{\mathbb{R}, \mathbb{R}_+, (a,b)\},$$

where the set of univariate functions $\Phi_k^\ell(\cdot): \Omega \to \mathbb{R}$, $1 \le \ell \le d$, $1 \le k \le r$, may be fixed or chosen adaptively, $c_k \in \mathbb{R}$.

For numerical efficiency the so-called separation rank $r \in \mathbb{N}$ should be reasonably small.

Introduce the tensor-product interpolant $\mathbf{I}_M$ with respect to the first $d-1$ variables (e.g., polynomial or Sinc interpolant)

$$\mathbf{I}_M F := I_M^1 \times \cdots \times I_M^{d-1} F,$$

where $I_M^\ell F$, $1 \le \ell \le d-1$, denotes the univariate interpolation applied to the variable $\zeta_\ell \in I_\ell = \Omega$, where $I_\ell$ is the $\ell$-th factor in $\Omega^d = I_1 \times \ldots \times I_d$.

Best polynomial approximation

In the complex plane $\mathbb{C}$, we introduce the circular ring

$$R_\rho := \{z \in \mathbb{C}: 1/\rho < |z| < \rho\} \quad \text{with } \rho > 1.$$

Thm. 4.1. (Laurent’s Theorem). Let $f: \mathbb{C} \to \mathbb{C}$ be analytic and bounded by $M > 0$ in $R_\rho$ with $\rho > 1$ (in the following we say $f \in A_\rho$), and set

$$C_n := \frac{1}{2\pi}\int_0^{2\pi} f(e^{i\theta})\,e^{-in\theta}\,d\theta, \quad n = 0, \pm1, \pm2, \ldots \qquad (31)$$

Then $f(z) = \sum_{n=-\infty}^{\infty} C_n z^n$, where the series converges to $f(z)$ for all $z \in R_\rho$. Moreover $|C_n| \le M/\rho^{|n|}$, and for all $\theta \in [0, 2\pi]$ and arbitrary integer $m$,

$$\Big|f(e^{i\theta}) - \sum_{n=-m}^{m} C_n e^{in\theta}\Big| \le \frac{2M}{\rho - 1}\,\rho^{-m}. \qquad (32)$$

Chebyshev polynomials

By $E_\rho = E_\rho(B)$ with the reference interval $B := [-1, 1]$, we denote Bernstein’s regularity ellipse (with foci at $w = \pm1$ and the sum of semi-axes equal to $\rho > 1$),

$$E_\rho := \{w \in \mathbb{C}: |w - 1| + |w + 1| \le \rho + \rho^{-1}\}.$$

Let $T_n(w)$, $n = 0, 1, 2, \ldots$, be the Chebyshev polynomials, which may be defined recursively by

$$T_0(w) = 1, \quad T_1(w) = w, \quad T_{n+1}(w) = 2wT_n(w) - T_{n-1}(w), \quad n = 1, 2, \ldots$$

Note that $T_n(x) = \cos(n \arccos x)$, $x \in [-1, 1]$, which implies $T_n(1) = 1$, $T_n(-1) = (-1)^n$.

It can be seen that with $w = \frac{1}{2}(z + \frac{1}{z})$, there holds

$$T_n(w) = \frac{1}{2}(z^n + z^{-n}). \qquad (33)$$

Best polynomial approximation by Chebyshev series

Thm. 4.2. Let $F$ be analytic and bounded by $M$ in $E_\rho$ (with $\rho > 1$). Then the expansion

$$F(w) = C_0 + 2\sum_{n=1}^{\infty} C_n T_n(w) \qquad (34)$$

holds for all $w \in E_\rho$ (Chebyshev series), with

$$C_n = \frac{1}{\pi}\int_{-1}^{1} \frac{F(w)\,T_n(w)}{\sqrt{1 - w^2}}\,dw.$$

Moreover, $|C_n| \le M/\rho^n$ and for $m = 1, 2, 3, \ldots$,

$$\Big|F(w) - C_0 - 2\sum_{n=1}^{m} C_n T_n(w)\Big| \le \frac{2M}{\rho - 1}\,\rho^{-m}, \quad w \in B. \qquad (35)$$

Proof of the main theorem

Let $A_{\rho,s} := \{f \in A_\rho: C_{-n} = C_n\}$; then each $f \in A_{\rho,s}$ has a representation (cf. Thm. 4.1)

$$f(z) = C_0 + \sum_{n=1}^{\infty} C_n (z^n + z^{-n}), \quad z \in R_\rho. \qquad (36)$$

Furthermore, from (36) it follows that $f(1/z) = f(z)$, $z \in R_\rho$.

Let us apply the mapping $w = \frac{1}{2}(z + \frac{1}{z})$, which satisfies $w(1/z) = w(z)$. It is a conformal transform of $\{\xi \in R_\rho: |\xi| > 1\}$ onto $E_\rho$ as well as of $\{\xi \in R_\rho: |\xi| < 1\}$ onto $E_\rho$ (but not $R_\rho$ onto $E_\rho$!). It provides a one-to-one correspondence of functions $F$ that are analytic and bounded by $M$ in $E_\rho$ with functions $f$ in $A_{\rho,s}$.

Since under this mapping we have (33), it follows that if $f$ defined by (36) is in $A_{\rho,s}$, then the corresponding transformed function $F(w) = f(z(w))$, which is analytic and bounded by $M$ in $E_\rho$, is given by (34).

Now the result follows directly from Thm. 4.1.

Lagrangian polynomial interpolation

Let $P_N(B)$ be the set of polynomials of degree $\le N$ on $B$. Define by $[I_N F](x) \in P_N(B)$ the interpolation polynomial of $F$ with respect to the Chebyshev-Gauss-Lobatto (CGL) nodes

$$\xi_j = \cos\frac{\pi j}{N} \in B, \quad j = 0, 1, \ldots, N, \quad \text{with } \xi_0 = 1,\ \xi_N = -1,$$

where the $\xi_j$ are the zeroes of the polynomial $(1 - x^2)\,T'_N(x)$, $x \in B$.

In turn, the Lagrangian interpolant $I_N$ of $F$ has the form

$$I_N F := \sum_{j=0}^{N} F(\xi_j)\,l_j(x) \in P_N(B), \qquad (37)$$

i.e. $(I_N F)(\xi_j) = F(\xi_j)$, $j = 0, \ldots, N$, where $\{l_j(x)\}$ is the set of interpolation polynomials

$$l_j := \prod_{k=0,\,k \ne j}^{N} \frac{x - \xi_k}{\xi_j - \xi_k} \in P_N(B).$$

Clearly, $l_j(\xi_j) = 1$ and $l_j(\xi_k) = 0$ for all $k \ne j$.

Lebesgue constant for Chebyshev interpolation

Let the set $\{\xi_j\}_{j=0}^{N}$ of interpolation points on $[-1, 1]$ and the associated Lagrangian interpolation operator $I_N$ be given. The standard approximation theory for polynomial interpolation includes the so-called Lebesgue constant $\Lambda_N \ge 1$ defined by

$$\|I_N u\|_{\infty,B} \le \Lambda_N \|u\|_{\infty,B} \quad \forall u \in C(B). \qquad (38)$$

In the case of Chebyshev interpolation it can be shown that $\Lambda_N$ grows at most logarithmically in $N$, more precisely,

$$\Lambda_N \le \frac{2}{\pi}\log N + 1.$$

The interpolation points which produce the smallest value $\Lambda^*_N$ of all $\Lambda_N$ are not known, but Bernstein ’54 proves that

$$\Lambda^*_N = \frac{2}{\pi}\log N + O(1).$$
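The logarithmic growth of the Lebesgue constant can be observed numerically. The sketch below is an illustration under assumptions: it estimates $\Lambda_N = \max_x \sum_j |l_j(x)|$ for the CGL nodes $\xi_j = \cos(\pi j/N)$ by sampling on a uniform grid (so the estimate is a lower bound for the true maximum), and compares it with $\frac{2}{\pi}\log N + 1$. The function names and the grid size are hypothetical choices.

```python
import numpy as np

def cgl_nodes(N):
    """Chebyshev-Gauss-Lobatto nodes xi_j = cos(pi j / N), j = 0..N."""
    return np.cos(np.pi * np.arange(N + 1) / N)

def lebesgue_constant(N, n_grid=4001):
    """Grid estimate of max_x sum_j |l_j(x)| on [-1, 1] for the CGL nodes."""
    xi = cgl_nodes(N)
    x = np.linspace(-1.0, 1.0, n_grid)
    total = np.zeros_like(x)
    for j in range(N + 1):
        others = np.delete(xi, j)
        # Lagrange polynomial l_j evaluated via its product formula.
        lj = np.prod((x[:, None] - others[None, :]) / (xi[j] - others)[None, :], axis=1)
        total += np.abs(lj)
    return total.max()

for N in (2, 4, 8, 16, 32):
    print(N, lebesgue_constant(N), 2.0 / np.pi * np.log(N) + 1.0)
```

The product formula is used here only because it is short; for large $N$ a barycentric evaluation would be the more robust choice.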

Error bound for polynomial interpolation

Thm. 4.3. Let $u \in C^\infty[-1, 1]$ have an analytic extension to $E_\rho$ bounded by $M > 0$ in $E_\rho$ (with $\rho > 1$). Then we have

$$\|u - I_N u\|_{\infty,B} \le (1 + \Lambda_N)\,\frac{2M}{\rho - 1}\,\rho^{-N}, \quad N \in \mathbb{N}_{\ge 1}. \qquad (39)$$

Proof. Due to (35) one obtains for the best polynomial approximation to $u$ on $[-1, 1]$,

$$\min_{v \in P_N} \|u - v\|_{\infty,B} \le \frac{2M}{\rho - 1}\,\rho^{-N}. \qquad (40)$$

Note that the interpolation operator $I_N$ is a projection, that is, for all $v \in P_N$ we have $I_N v = v$. Then applying the triangle inequality with $v \in P_N$,

$$\|u - I_N u\|_{\infty,B} = \|u - v - I_N(u - v)\|_{\infty,B} \le (1 + \Lambda_N)\,\|u - v\|_{\infty,B}$$

completes the proof.
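The geometric decay in (39) can be observed on a simple test case. The sketch below is an illustration, not part of the lecture: it interpolates the function $1/(1 + 25x^2)$ (an assumed example with poles at $\pm i/5$, hence analytic inside every ellipse $E_\rho$ with $\rho < (1 + \sqrt{26})/5$) at the CGL nodes, using the barycentric form of the Lagrange interpolant, and prints the maximum error next to $\rho^{-N}$ for the limiting value of $\rho$.

```python
import numpy as np

def cgl_interpolate(f, N, x):
    """Evaluate the degree-N interpolant of f at the CGL nodes (barycentric form)."""
    j = np.arange(N + 1)
    xi = np.cos(np.pi * j / N)
    w = (-1.0) ** j            # barycentric weights for the CGL nodes,
    w[0] *= 0.5                # halved at the two endpoints
    w[-1] *= 0.5
    diff = x[:, None] - xi[None, :]
    hit = diff == 0.0          # evaluation points that coincide with a node
    diff[hit] = 1.0            # placeholder to avoid division by zero
    terms = w / diff
    p = (terms @ f(xi)) / terms.sum(axis=1)
    rows, cols = np.nonzero(hit)
    p[rows] = f(xi[cols])      # interpolant equals f at the nodes
    return p

def runge(x):
    return 1.0 / (1.0 + 25.0 * x**2)

x_test = np.linspace(-1.0, 1.0, 2001)
degrees = (10, 20, 40, 80)
errors = [np.max(np.abs(runge(x_test) - cgl_interpolate(runge, N, x_test)))
          for N in degrees]
rho = (1.0 + np.sqrt(26.0)) / 5.0
for N, err in zip(degrees, errors):
    print(N, err, rho ** (-N))
```

Doubling $N$ roughly squares the factor $\rho^{-N}$, which is the behaviour predicted by (39) up to the slowly growing factor $1 + \Lambda_N$.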

Tensor-product polynomial interpolation

Consider a multi-variate function $f = f(x_1, \ldots, x_d): \mathbb{R}^d \to \mathbb{R}$, $d \ge 2$, defined on a box $B_1 \times B_2 \times \ldots \times B_d$ with $B_k = [a_k, b_k]$. We set $B := B_k = [-1, 1]$, $k = 1, \ldots, d$, thus $f: B^d \to \mathbb{R}$.

The corresponding N-th order tensor-product interpolation operator is defined by

$$\mathbf{I}_N f = I_N^1 \times I_N^2 \times \ldots \times I_N^d f \in P_N[B^d],$$

where $I_N^k f$ denotes the interpolation polynomial with respect to $x_k$, $k = 1, \ldots, d$, at nodes $\xi_k \in B_k$.

We choose the CGL nodes, hence the interpolation points $\xi_\alpha \in B^d$, $\alpha = (i_1, \ldots, i_d) \in \mathbb{N}_0^d$, are obtained by the Cartesian product of 1D-nodes,

$$\xi_\alpha := \Big(\cos\frac{\pi i_1}{N}, \ldots, \cos\frac{\pi i_d}{N}\Big).$$


Again, $\mathbf{I}_N$ is the projection map,

$$\mathbf{I}_N: C(B^d) \to \mathbf{P}_N := \{p_1 \times \ldots \times p_d: p_i \in P_N,\ i = 1, \ldots, d\},$$

which implies the following estimate for the multivariate counterpart of the Lebesgue constant (stability of $\mathbf{I}_N$ in the multidimensional case; cf. (38))

$$\|\mathbf{I}_N f\|_{\infty,B^d} \le \Lambda_N^d\,\|f\|_{\infty,B^d} \quad \forall f \in C(B^d). \qquad (41)$$

To derive an analogue of Thm. 4.3, we introduce the product domain

$$E_\rho^{(j)} := B_1 \times \ldots \times B_{j-1} \times E_\rho(B_j) \times B_{j+1} \times \ldots \times B_d,$$

and denote by $X_{-j}$ the $(d-1)$-dimensional subset of variables $\{x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_d\}$ with $x_j \in B_j$, $j = 1, \ldots, d$.


Assump. 4.1. Given $f \in C^\infty(B^d)$, assume there is $\rho > 1$ such that for all $j = 1, \ldots, d$, and each fixed $\xi \in X_{-j}$, there exists an analytic extension $f_j(x_j, \xi)$ of $f(x_j, \xi)$ to $E_\rho(B_j) \subset \mathbb{C}$ with respect to $x_j$, bounded in $E_\rho(B_j)$ by a certain $M_j > 0$ independent of $\xi$.

Thm. 4.4. For $f \in C^\infty(B^d)$, let Assump. 4.1 be satisfied. Then the interpolation error can be estimated by

$$\|f - \mathbf{I}_N f\|_{\infty,B^d} \le \Lambda_N^d\,\frac{2M_\rho(f)}{\rho - 1}\,\rho^{-N}, \qquad (42)$$

where $\Lambda_N$ is the Lebesgue constant for the one-dimensional interpolant $I_N^k$, and

$$M_\rho(f) := \max_{1 \le j \le d}\ \max_{x \in E_\rho^{(j)}} |f_j(x, \xi)|.$$


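Proof. The multiple use of (38), (39) and the triangle inequality lead to

$$|f - \mathbf{I}_N f| \le |f - I_N^1 f| + |I_N^1(f - I_N^2 \times \ldots \times I_N^d f)|$$

$$\le |f - I_N^1 f| + |I_N^1(f - I_N^2 f)| + |I_N^1 I_N^2(f - I_N^3 f)| + \ldots + |I_N^1 \times \ldots \times I_N^{d-1}(f - I_N^d f)|$$

$$\le \Big[(1 + \Lambda_N)\max_{x \in E_\rho^{(1)}}|f_1(x, \xi)| + \Lambda_N(1 + \Lambda_N)\max_{x \in E_\rho^{(2)}}|f_2(x, \xi)| + \ldots + \Lambda_N^{d-1}(1 + \Lambda_N)\max_{x \in E_\rho^{(d)}}|f_d(x, \xi)|\Big]\frac{2}{\rho - 1}\,\rho^{-N}$$

$$\le \frac{(1 + \Lambda_N)(\Lambda_N^d - 1)}{\Lambda_N - 1}\,\frac{2M_\rho}{\rho - 1}\,\rho^{-N}.$$

Hence (42) follows, up to a constant factor, since $\frac{(1 + x)(x^n - 1)}{x - 1} \approx x^n$ as $x \to \infty$.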

Sinc-approximation of multi-variate functions

Now consider the separable approximation in the case $\Omega = \mathbb{R}$. The extension to the case $\Omega = \mathbb{R}_+$ or $\Omega = (a, b)$ is similar to that for the univariate Sinc approximation.

Introduce the tensor-product Sinc interpolant $\mathbf{C}_M$ with respect to the first $d-1$ variables,

$$\mathbf{C}_M f := C_M^1 \times \ldots \times C_M^{d-1} f,$$

where $C_M^\ell f = C_M^\ell(f, h)$, $1 \le \ell \le d-1$, denotes the univariate Sinc interpolation applied to the variable $\zeta_\ell \in I_\ell = \mathbb{R}$, where $I_\ell$ is the $\ell$-th factor in $\mathbb{R}^d = I_1 \times \ldots \times I_d$.

Ex. 4.1. Examples of approximated functions:

$$f(x) = |x|^\alpha, \quad f(x) = \frac{\exp(\kappa|x|)}{|x|}, \quad f(x, y) = \mathrm{sinc}(|x||y|)$$

with $x, y \in \mathbb{R}^d$.


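The estimation of the error $f - \mathbf{C}_M f$ requires the Lebesgue constant $\Lambda_M \ge 1$ defined by

$$\|C_M(f, h)\|_\infty \le \Lambda_M \|f\|_\infty \quad \text{for all } f \in C(\mathbb{R}). \qquad (43)$$

Stenger ’93 proves the inequality

$$\Lambda_M = \max_{x \in \mathbb{R}} \sum_{k=-M}^{M} |S_{k,h}(x)| \le \frac{2}{\pi}\,(3 + \log M). \qquad (44)$$

Note that we also have (orthogonality)

$$\sum_{k=-\infty}^{\infty} |S_{k,h}(x)|^2 = 1 \quad (x \in \mathbb{R}),$$

which indicates $\Lambda_M = 1$ with respect to the $L^2$-norm.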


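For each fixed $\ell \in \{1, \ldots, d-1\}$, choose $\zeta_\ell \in I_\ell$ and define the remaining parameter set by $Y_\ell := I_1 \times \ldots \times I_{\ell-1} \times I_{\ell+1} \times \ldots \times I_d \subset \mathbb{R}^{d-1}$. This introduces the univariate (parameter dependent) function $F_\ell(\cdot, y): I_\ell \to \mathbb{R}$, which is the restriction of $F$ onto the interval $I_\ell$ with $y \in Y_\ell$.

Thm. 4.5. (Hackbusch, Khoromskij) For each $\ell = 1, \ldots, d-1$ we assume that for any fixed $y \in Y_\ell$, $F_\ell(\cdot, y)$ satisfies

(a) $F_\ell(\cdot, y) \in H^1(D_\delta)$ with $N(F_\ell, D_\delta) \le N_\ell < \infty$ uniformly in $y$;

(b) $F_\ell(\cdot, y)$ has hyper-exponential decay with $a = 1$, $C, b > 0$ for all $y \in Y_\ell$.

Then, for all $y \in Y_\ell$, the optimal choice $h := \frac{\log M}{M}$ yields

$$|F(\zeta, y) - \mathbf{C}_M(F, h)(\zeta)| \le \frac{C}{2\pi\delta}\,\Lambda_M^{d-2} \max_{\ell=1,\ldots,d-1} N_\ell\; e^{-\frac{\pi\delta M}{\log M}} \qquad (45)$$

with $\Lambda_M$ defined by (44).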

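Proof of the Sinc-interpolation error

The multiple use of (43) and the triangle inequality lead to

$$|f - \mathbf{C}_M f| \le |f - C_M^1 f| + |C_M^1(f - C_M^2 \ldots C_M^{d-1} f)|$$

$$\le |f - C_M^1 f| + |C_M^1(f - C_M^2 f)| + |C_M^1 C_M^2(f - C_M^3 f)| + \ldots + |C_M^1 \ldots C_M^{d-2}(f - C_M^{d-1} f)|$$

$$\le \big[N_1 + \Lambda_M N_2 + \ldots + \Lambda_M^{d-2} N_{d-1}\big]\frac{1}{2\pi\delta}\,e^{-\frac{\pi\delta M}{\log M}}$$

$$\le \frac{1 + \Lambda_M + \ldots + \Lambda_M^{d-2}}{2\pi\delta}\max_{\ell=1,\ldots,d-1} N_\ell\; e^{-\frac{\pi\delta M}{\log M}}.$$

Note that

$$\frac{\Lambda_M^{d-1} - 1}{\Lambda_M - 1} \approx \Lambda_M^{d-2}, \quad \Lambda_M \to \infty,$$

hence (45) follows.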

Separation by integration

If a function of $\rho = \sum_{i=1}^{d} x_i$ can be written as the integral

$$\varphi(\rho) = \int_\Omega e^{\rho F(t)}\,G(t)\,dt$$

over some $\Omega \subset \mathbb{R}$ (say, $\Omega = \mathbb{R}$) and if a quadrature with nodes $t_\nu$ and weights $\omega_\nu$ can be applied, one obtains the separable approximation

$$\varphi(x_1 + \ldots + x_d) \approx \sum_\nu \omega_\nu\,e^{\rho F(t_\nu)}\,G(t_\nu) = \sum_\nu c_\nu \prod_{i=1}^{d} e^{x_i F(t_\nu)}$$

with $c_\nu = \omega_\nu G(t_\nu)$. For this purpose we apply the Sinc quadratures (cf. Lect. 2, 6).

Typical examples of such a function $\varphi(\rho)$ are the following:

$$f(x) = 1/|x - y|, \quad f(x) = \frac{1}{x_1 + \ldots + x_d},\ x_i \ge 0,$$

with $x, y \in \mathbb{R}^d$.
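For the second example, $\varphi(\rho) = 1/\rho$, the construction can be sketched in a few lines. This is an illustration under assumptions rather than the quadrature analysed in the lectures: it starts from $1/\rho = \int_0^\infty e^{-\rho t}dt$, substitutes $t = e^u$ to obtain an integral over $\mathbb{R}$ (so $F(u) = -e^u$, $G(u) = e^u$), and applies the equidistant trapezoidal (Sinc-type) rule with step $h$. The step size, the truncation indices and the function names are hypothetical choices tuned for $\rho \in [1, 100]$.

```python
import numpy as np

def inverse_sum_quadrature(h=0.4, k_min=-70, k_max=15):
    """Exponents t_nu = exp(k h) and weights c_nu = h exp(k h) for 1/rho."""
    k = np.arange(k_min, k_max + 1)
    t = np.exp(k * h)
    c = h * t
    return c, t

def separable_inverse(x, c, t):
    """sum_nu c_nu prod_i exp(-t_nu x_i), approximating 1 / (x_1 + ... + x_d)."""
    factors = np.exp(-t[:, None] * x[None, :])   # shape (r, d): one factor per variable
    return np.sum(c * np.prod(factors, axis=1))

c, t = inverse_sum_quadrature()
rng = np.random.default_rng(0)
points = rng.uniform(1.0 / 3.0, 100.0 / 3.0, size=(50, 3))   # d = 3, rho in [1, 100]
rel_errors = [abs(separable_inverse(x, c, t) * x.sum() - 1.0) for x in points]
print("separation rank:", len(c), "max relative error:", max(rel_errors))
```

Each term of the sum is a product of $d$ univariate exponentials, so the separation rank equals the number of quadrature points; the rank grows only with the required accuracy and the range of $\rho$, not with $d$.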

Separation by exponential/trigonometric approximation

Besides, the best approximation of $\varphi(\rho)$ by exponential sums,

$$\varphi(\rho) \approx \sum_{\nu=1}^{r} \omega_\nu\,e^{-t_\nu\rho} \qquad (46)$$

(e.g., with respect to the $L^\infty$- or $L^2$-norm), leads to an approximation whose separation rank $r$ is expected to be close to optimal.

For non-monotone functions $\varphi(\rho)$, approximations by trigonometric sums may be more suitable,

$$\varphi(\rho) \approx \sum_{\nu=1}^{r} c_\nu\,e^{-i\omega_\nu\rho}. \qquad (47)$$

Rem. 4.1. The approximation by exponential/trigonometric sums applies to the matrix-valued function $\varphi(A)$ as well, with $A = \sum_{i=1}^{d} A_i$ and pairwise commuting matrices $A_i$.


For $n \ge 1$, consider the set $E_n^0$ of exponential sums:

$$E_n^0 := \Big\{u = \sum_{\nu=1}^{n} \omega_\nu\,e^{-t_\nu x}: \omega_\nu, t_\nu \in \mathbb{R}\Big\}.$$

Now one can address the problem of finding the best approximation to $f$ over the set $E_n^0$, characterised by the best approximation error $d(f, E_n^0) := \inf_{v \in E_n^0} \|f - v\|_\infty$.

The existence of an approximation by exponentials is based on the fundamental Big Bernstein Theorem: If $f$ is completely monotone for $x \ge 0$, i.e.,

$$(-1)^n f^{(n)}(x) \ge 0 \quad \text{for all } n \ge 0,\ x \ge 0,$$

then it is the restriction of the Laplace transform of a measure to $\mathbb{R}_+$:

$$f(z) = \int_{\mathbb{R}_+} e^{-tz}\,d\mu(t).$$


We recall the complete elliptic integral of the first kind with modulus $\kappa$,

$$K(\kappa) = \int_0^1 \frac{dt}{\sqrt{(1 - t^2)(1 - \kappa^2 t^2)}} \quad (0 < \kappa < 1),$$

and define $K'(\kappa) := K(\kappa')$ by $\kappa^2 + (\kappa')^2 = 1$.

Thm. 4.6. (Braess). Assume that $f$ is completely monotone and analytic for $\Re e\, z > 0$, and let $0 < a < b$. Then for the uniform approximation on the interval $[a, b]$,

$$\lim_{n\to\infty} d(f, E_n^0)^{1/n} \le \frac{1}{\omega^2}, \quad \text{where } \omega = \exp\frac{\pi K(\kappa)}{K'(\kappa)} \text{ with } \kappa = \frac{a}{b}.$$

In the cases $f = \varphi(\rho)$ below, we have $\kappa = 1/R$ for $R \gg 1$.


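Now applying the asymptotics

$$K(\kappa') = \ln\frac{4}{\kappa} + C_1\kappa + \ldots \quad \text{for } \kappa' \to 1,$$

$$K(\kappa) = \frac{\pi}{2}\Big\{1 + \frac{1}{4}\kappa^2 + C_1\kappa^4 + \ldots\Big\} \quad \text{for } \kappa \to 0,$$

of the complete elliptic integrals, we obtain

$$\frac{1}{\omega^2} = \exp\Big(-\frac{2\pi K(\kappa)}{K(\kappa')}\Big) \approx \exp\Big(-\frac{\pi^2}{\ln(4R)}\Big) \approx 1 - \frac{\pi^2}{\ln(4R)}.$$

The latter expression indicates that the number $n$ of different terms to achieve a tolerance $\varepsilon$ is asymptotically

$$n \approx \frac{|\log\varepsilon|}{|\log\omega^{-2}|} \approx \frac{|\log\varepsilon|\,\ln(4R)}{\pi^2}.$$

This result shows the same asymptotic convergence in $n$ as that for the Sinc approximation (cf. Lect. 2).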
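To get a feeling for the size of $n$, the estimate can be evaluated for sample values. The numbers $R = 100$ and $\varepsilon = 10^{-6}$ below are illustrative choices, $\log$ is taken as the natural logarithm, and the result is only the leading-order asymptotic prediction, not a guaranteed bound.

```latex
\[
n \approx \frac{|\log \varepsilon|\,\ln(4R)}{\pi^2}
  \approx \frac{13.82 \times 5.99}{9.87} \approx 8.4
  \qquad (R = 100,\ \varepsilon = 10^{-6}).
\]
```

Increasing the interval to $R = 10^4$ with the same tolerance changes only the factor $\ln(4R) \approx 10.6$, which gives $n \approx 14.8$: the number of terms grows logarithmically in $R$ and linearly in the number of requested digits.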

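Exponential approximations in L2-norm

The best approximation to $f(\rho)$, $\rho \in [1, R]$, with respect to a weighted $L^2$-norm is reduced to the minimisation of an explicitly given differentiable functional.

Given $R > 1$, $N \ge 1$, find the $2N$ parameters $\alpha_1, \omega_1, \ldots, \alpha_N, \omega_N \in \mathbb{R}$, such that

$$F_W(R; \alpha_1, \omega_1, \ldots, \alpha_N, \omega_N) := \int_1^R W(x)\Big(f(x) - \sum_{i=1}^{N} \omega_i\,e^{-\alpha_i x}\Big)^2 dx = \min.$$

In the important particular case of $f(x) = 1/x$ and $W(x) = 1$, the integral can be calculated in a closed form

$$F_1(R; \alpha_1, \omega_1, \ldots, \alpha_N, \omega_N) = 1 - \frac{1}{R} - 2\sum_{i=1}^{N} \omega_i\big[\mathrm{Ei}(-\alpha_i) - \mathrm{Ei}(-\alpha_i R)\big] + \frac{1}{2}\sum_{i=1}^{N} \frac{\omega_i^2}{\alpha_i}\big[e^{-2\alpha_i} - e^{-2\alpha_i R}\big] + 2\sum_{1 \le i < j \le N} \frac{\omega_i\omega_j}{\alpha_i + \alpha_j}\big[e^{-(\alpha_i + \alpha_j)} - e^{-(\alpha_i + \alpha_j)R}\big]$$

with the integral exponential function $\mathrm{Ei}(x) = -\int_{-\infty}^{x} \frac{e^t}{t}\,dt$.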


In the special case $R = \infty$, the expression for $F_1(\infty; \ldots)$ simplifies even further.

Gradient or Newton type methods with a proper choice of the initial guess can be used to obtain the minimiser of $F_1$.

In general, the integral may be approximated by a suitable quadrature.

Optimisation with respect to the maximum norm leads to the nonlinear minimisation problem

$$\inf_{v \in E_n^0} \|f - v\|_{L^\infty[1,R]}$$

involving the $2n$ parameters $\{\omega_\nu, t_\nu\}_{\nu=1}^{n}$. The numerical scheme follows the Remez algorithm.


Best approximation to $1/\sqrt{\rho}$ in the $L^\infty$-norm is discussed in D. Braess and W. Hackbusch; a complete list of numerical data can be found in www.mis.mpg.de/scicomp/EXP SUM/1 x/tabelle.

All calculations using the weighted $L^2([1, R])$-norm have been performed by the MATLAB subroutine FMINS based on the global minimisation by direct search.

Best approximation to $1/\sqrt{\rho}$ in the weighted $L^2([1, R])$-norm.

R        10          50          100         200         ‖·‖_L∞      W(ρ) = 1/√ρ
r = 4    3.7·10^-4   9.6·10^-4   1.5·10^-3   2.2·10^-3   1.9·10^-3   4.8·10^-3
r = 5    2.8·10^-4   2.8·10^-4   3.7·10^-4   5.8·10^-4   4.2·10^-4   1.2·10^-3
r = 6    8.0·10^-5   9.8·10^-5   1.1·10^-4   1.6·10^-4   9.5·10^-5   3.3·10^-4
r = 7    3.5·10^-5   3.8·10^-5   3.9·10^-5   4.7·10^-5   2.2·10^-5   8.1·10^-5

Why use trigonometric sums

Prop. 4.7. (Beylkin, Mohlenkamp). Let $d \ge 2$. The trigonometric identity

$$\sin\Big(\sum_{j=1}^{d} x_j\Big) = \sum_{j=1}^{d} \sin(x_j) \prod_{k \in \{1,\ldots,d\}\setminus\{j\}} \frac{\sin(x_k + \alpha_k - \alpha_j)}{\sin(\alpha_k - \alpha_j)} \qquad (48)$$

holds for all choices of $\alpha_k \in \mathbb{R}$ s.t. $\sin(\alpha_k - \alpha_j) \ne 0$ for all $j \ne k$.

In the case $d = 2$, the assertion (48) is easy to check. For $d > 2$ it can be proven by induction (nontrivial exercise!).

Expansion (48) shows the lack of uniqueness (ambiguity) of the best rank-$d$ Kronecker representation. Hence, the convergence of algebraic separable approximations might be non-robust.

Approximation by trigonometric sums can be designed either using the quadrature method (cf. Lect. 7) and the direct trigonometric interpolation or by nonlinear optimisation.
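The identity (48) is easy to test numerically before attempting the induction proof. The sketch below is an illustration only; the function name, the random test points and the particular shifts $\alpha_k$ are hypothetical choices, subject to the requirement $\sin(\alpha_k - \alpha_j) \ne 0$ for $j \ne k$.

```python
import numpy as np

def sine_of_sum_separable(x, alpha):
    """Right-hand side of (48): a sum of d separable terms representing sin(x_1 + ... + x_d)."""
    d = len(x)
    total = 0.0
    for j in range(d):
        term = np.sin(x[j])
        for k in range(d):
            if k != j:
                term *= np.sin(x[k] + alpha[k] - alpha[j]) / np.sin(alpha[k] - alpha[j])
        total += term
    return total

rng = np.random.default_rng(1)
deviations = {}
for d in (2, 3, 5):
    x = rng.uniform(-3.0, 3.0, size=d)
    for scale in (0.7, 0.4):               # two different admissible shift sets
        alpha = scale * np.arange(d)
        deviations[(d, scale)] = abs(np.sin(x.sum()) - sine_of_sum_separable(x, alpha))
print(deviations)
```

Both choices of $\alpha$ give a valid rank-$d$ separable representation of the same function, which is exactly the non-uniqueness mentioned above.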

Literature to Lect. 4


of Nonlocal Operators in High Dimensions. Parts I/II. Preprints 29/30, MPI MIS, Leipzig 2005.

2. D. Braess and W. Hackbusch: Approximation of 1/x by exponential sums in [1, ∞). To appear in IMA JNA.

3. G. Beylkin and M.J. Mohlenkamp: Numerical operator calculus in higher dimension.

Proc. Natl. Acad. Sci. USA, 99 (2002), 10246-10251.

4. B.N. Khoromskij: Data-sparse approximation of nonlocal operators. Lecture notes 17, MPI MIS,

Leipzig 2003.



Lect. 5. Data-Sparse Matrix/Tensor Formats.

We focus on combination of hierarchical and tensor-product

formats:

(i) H-matrix format with standard admissibility (applies on

graded meshes)

(ii) coarsening of the hierarchical format using weaker

admissibility criteria;

(iii) blended H-matrix approximation (combine with Toeplitz,

circulant, Hankel);

(iv) wire-basket approximation for L-harmonic kernels;

(v) fully separated block representation (O(N) complexity);

(vi) uniform (U-) and H2-matrices;

(vii) Kronecker tensor-product format;

(viii) hierarchical Kronecker tensor-product representation.

Hierarchical matrices

Hierarchical (H-) matrices

$\mathcal{M}_{\mathcal{H},k}(T_{I\times I}, \mathcal{P})$, the class of data-sparse H-matrices - Hackbusch ’99

Further developments - Hackbusch, BNK, Grasedyck, Bebendorf, Börm.

The H-matrix technique is a direct descendant of panel clustering, fast multipole and mosaic-skeleton approximation.

In addition, it allows data-sparse matrix-matrix operations.

Main features:

• Matrix arithmetic of $O(N\log^q N)$ complexity, $N := |I|$ being the cardinality of the index set $I$.

• Accurate approximation to a general class of nonlocal (integral) operators and operator-valued functions $\mathcal{F}(L)$, including the elliptic operator inverse $L^{-1}$, $e^{-tL}$, $\mathrm{sign}(L)$.

• Rigorous theoretical analysis.

H-Matrix Format

H-matrix arithmetic is completely recursive and it is based on the hierarchical data organisation → efficient implementation.

The H-matrix format is well suited for representation of integral (nonlocal) operators in FEM/BEM applications.

Thm. 5.1. (complexity of the H-matrix arithmetic) Let $k \in \mathbb{N}$ denote the block-wise rank and $T_{I\times I}$ be an H-tree with depth $L > 1$. Then the arithmetic of $N \times N$-matrices belonging to $\mathcal{M}_{\mathcal{H},k}(T_{I\times I}, \mathcal{P})$ has the complexity

$$N_{\mathcal{H},store} \le 2C_{sp}kLN, \quad N_{\mathcal{H}\cdot v} \le 4C_{sp}kLN,$$

$$N_{\mathcal{H}\oplus\mathcal{H}} \le C_{sp}k^2N(C_1L + C_2k),$$

$$N_{\mathcal{H}\cdot\mathcal{H}} \le C_0C_{sp}^2k^2LN\max\{k, L\}, \quad N_{\widetilde{\mathrm{Inv}}(\mathcal{H})} \le N_{\mathcal{H}\cdot\mathcal{H}},$$

where $C_{sp}$ is the sparsity constant.

H-Matrix Format

Hierarchical Partitionings $\mathcal{P}_{1/2}(I \times I)$ and $\mathcal{P}_W(I \times I)$

Figure 11: Standard- (left) and Weak-admissible H-partitionings for d = 1.

General Kronecker-product format B. Khoromskij, Leipzig 2005(L5) 110

Def. 5.1. A q-th order tensor is given by

A := [ai1...iq ] ∈ RId

, d = pq, p, q, n ∈ N,

where Id = I1 ⊗ ...⊗ Iq, I = I1 ⊗ ...⊗ I

p with multi-indices

i = (i,1, ..., i,p) ∈ I, = 1, ..., q, where i,m ∈ 1, ..., n, for

m = 1, ..., p (p is supposed to be small).

The inner product of two tensors A and B is defined as

$$(A, B) := \sum_{(i_1 \dots i_q) \in I^d} a_{i_1 \dots i_q}\, b_{i_1 \dots i_q},$$

while the norm of A is given by $\|A\|_F := \sqrt{(A, A)}$.

Ex. 5.1. Let $A = a_1 \otimes a_2$, $B = b_1 \otimes b_2$, $a_i, b_i \in \mathbb{R}^n$ ($q = 2$, $p = 1$). Then

$$(A, B) = (a_1, b_1)(a_2, b_2), \qquad \|A\|_F = \sqrt{(a_1, a_1)(a_2, a_2)},$$

where the latter corresponds to the Frobenius norm.
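The factorised inner product of Ex. 5.1 can be checked numerically. The following is a minimal NumPy sketch; the vector length `n` and the random data are illustrative assumptions, not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5  # illustrative size
a1, a2, b1, b2 = (rng.standard_normal(n) for _ in range(4))

# Rank-one second-order tensors A = a1 ⊗ a2 and B = b1 ⊗ b2, stored as n x n arrays.
A = np.outer(a1, a2)
B = np.outer(b1, b2)

# Entry-wise inner product versus the factorised formula (a1, b1)(a2, b2).
inner_full = np.sum(A * B)
inner_factored = (a1 @ b1) * (a2 @ b2)

# Frobenius norm versus sqrt((a1, a1)(a2, a2)).
norm_full = np.linalg.norm(A)
norm_factored = np.sqrt((a1 @ a1) * (a2 @ a2))
```

The factorised versions need only O(n) work, whereas the entry-wise sums touch all n^2 entries.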


Tensor A of the form

$$A = V^1 \otimes \cdots \otimes V^q, \quad V^\ell \in \mathbb{R}^{n^p},$$

is called the Kronecker product or decomposed tensor.

Probl. 1. Approximate A by a q-th order tensor $A_r$, a sum of Kronecker products (with possibly small Kronecker rank r),

$$A_r = \sum_{k=1}^{r} c_k V_k^1 \otimes \cdots \otimes V_k^q \approx A, \quad c_k \in \mathbb{R}, \tag{49}$$

where the low dimensional components $V_k^\ell \in \mathbb{R}^{n^p}$ can be further represented in a structured data-sparse form (say, in the wavelet based, circulant or Toeplitz format).

Hence, $A_r$ can be represented with the low cost $q r n^p$ (at most) compared with $n^{pq}$.

Tensor-product format (49) has plenty of other merits.

Excursus to the HKT-approximation of matrices B. Khoromskij, Leipzig 2005(L5) 112

Probl. 2. Given $A \in \mathbb{C}^{N\times N}$ with $N = n^d$ (here $q = d$, $p = 2$), we approximate A by a matrix $A_r$ of the so-called HKT-format

$$A_r = \sum_{k=1}^{r} s_k V_k^1 \otimes \cdots \otimes V_k^d \approx A, \quad s_k \in \mathbb{R}, \quad V_k^\ell \in \mathbb{R}^{n\times n}, \tag{50}$$

where $V_k^\ell \in \mathcal{M}_{\mathcal{H},k}$ (alternative: wavelet representation of $V_k^\ell$).

Given a tolerance $\varepsilon > 0$, the Kronecker rank $r = r(\varepsilon)$ can be estimated as

$$r = \begin{cases} O(|\log h|^{d-1} |\log \varepsilon|^{d-1}), & \text{(Case a)}, \\ O(|\log h| \cdot |\log \varepsilon|), & \text{(Case b)}. \end{cases}$$

Case a. IOs with asymptotically smooth/analytic kernels g(x, y).

Case b. A class of analytic matrix-valued functions F(A); IOs with “off diagonal analytic” translation-invariant kernels.

HKT-approximation of matrices B. Khoromskij, Leipzig 2005(L5) 113

Case (a): Analytic approximation methods are based on a separable representation of a certain multi-variate function

$$F : \mathbb{R}^d \to \mathbb{R}, \quad d \ge 2$$

(say, a holomorphic function with isolated singularities):

$$F_r(\zeta_1, \dots, \zeta_d) := \sum_{i=1}^{r} s_i \Phi_i^1(\zeta_1) \cdots \Phi_i^d(\zeta_d) \approx F, \tag{51}$$

where the set of functions $\{\Phi_i^\ell(\zeta_\ell)\}$ is fixed or chosen adaptively.

Case (b): Making use of the r-term Sinc-quadratures for the Laplace integral representation of $F(A)$ or $F(\rho)$:

$$F(\rho) = \int_{\mathbb{R}} f(t) e^{-t\rho}\,dt, \qquad F(\rho) = \int_{\mathbb{R}} f(t) e^{-t\rho^2}\,dt,$$

with possible substitution $A \to \rho$.

Related references B. Khoromskij, Leipzig 2005(L5) 114

H-, KT-, HKT- constructive approximations:

H-Matrix techniques - Group by Hackbusch at MPI MIS, Leipzig

Sinc interpolation/quadratures for analytic funct. with point singularities -

(Kotelnikov ’33; Whittaker ’35; Shannon ’49) Stenger; M. Sugihara; Hackbusch, BNK ’02-’05

Appr. by exponential sums (classical rational approximations, Remes

algorithm, minimization) - Braess, Hackbusch, BNK ’04-’05

IOs in the HKT format - Hackbusch, BNK, Tyrtyshnikov ’03; BNK ’05

HKT approx. to matrix-valued functions - Gavrilyuk, Hackbusch, BNK ’03;

Hackbusch, BNK ’04-’05

Kronecker tensor-product representation - Van Loan, Pitsianis ’93; Golub ’98;

Beylkin, Mohlenkamp ’02; Hackbusch, BNK, Tyrtyshnikov ’03; Grasedyck ’03; ...

Tensor-product + wavelets + sparse grids:

H-matrices/wavelets in density matrix calculation - Flad, Hackbusch, Kolb, Luo,

Schneider ’03-’05; Hutter, Sauter, ...

Applications in FEM/BEM, quantum chemistry, finance, data mining -

Groups by W. Dahmen, M. Griebel, R. Schneider, C. Schwab, H. Yserentant

Properties of the Kronecker product B. Khoromskij, Leipzig 2005(L5) 115

The Kronecker product (KP) operation $A \otimes B$ of two matrices $A = [a_{ij}] \in \mathbb{R}^{m\times n}$, $B \in \mathbb{R}^{h\times g}$ is an $mh \times ng$ matrix that has the block-representation $[a_{ij}B]$ (corresponds to $p = 2$).

1. Let $C \in \mathbb{R}^{s\times t}$, then the KP satisfies the associative law,

$$(A \otimes B) \otimes C = A \otimes (B \otimes C),$$

and therefore we do not use brackets in (50). The matrix $A \otimes B \otimes C := (A \otimes B) \otimes C$ has $mhs$ rows and $ngt$ columns.

2. Let $C \in \mathbb{R}^{n\times r}$ and $D \in \mathbb{R}^{g\times s}$, then the standard matrix-matrix product in the Kronecker format takes the form

$$(A \otimes B)(C \otimes D) = (AC) \otimes (BD).$$

The corresponding extension to q-th order tensors is

$$(A_1 \otimes \dots \otimes A_q)(B_1 \otimes \dots \otimes B_q) = (A_1 B_1) \otimes \dots \otimes (A_q B_q).$$

In the case p > 2 we have similar KP operations.
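The associative law and the mixed-product rule are easy to confirm with `numpy.kron`. This is a small sketch with arbitrarily chosen (assumed) matrix sizes that satisfy the shape requirements stated above.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n, h, g, r, s = 2, 3, 4, 2, 3, 2  # illustrative sizes only
A = rng.standard_normal((m, n))
B = rng.standard_normal((h, g))
C = rng.standard_normal((n, r))
D = rng.standard_normal((g, s))

# Mixed-product rule: (A ⊗ B)(C ⊗ D) = (AC) ⊗ (BD).
lhs = np.kron(A, B) @ np.kron(C, D)
rhs = np.kron(A @ C, B @ D)
mixed_product_ok = np.allclose(lhs, rhs)

# Associativity: (A ⊗ B) ⊗ C = A ⊗ (B ⊗ C).
assoc_ok = np.allclose(np.kron(np.kron(A, B), C), np.kron(A, np.kron(B, C)))
```

The right-hand side of the mixed-product rule multiplies only the small factors, which is the reason Kronecker-format arithmetic stays cheap.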


3. We have the distributive law

$$(A + B) \otimes (C + D) = A \otimes C + A \otimes D + B \otimes C + B \otimes D.$$

4. Rank relation: $\mathrm{rank}(A \otimes B) = \mathrm{rank}(A)\,\mathrm{rank}(B)$.

Ex. 5.1. In general $A \otimes B \ne B \otimes A$. What is the condition on A and B that provides $A \otimes B = B \otimes A$?

Invariance of some matrix properties:

(1) If A and B are diagonal then $A \otimes B$ is also diagonal, and conversely (if $A \otimes B \ne 0$).

(2) The upper/lower triangular matrices are preserved.

(3) Let A and B be Hermitian/normal matrices ($A^* = A$ resp. $A^*A = AA^*$). Then $A \otimes B$ is of the corresponding type.

(4) Let $A \in \mathbb{R}^{n\times n}$ and $B \in \mathbb{R}^{m\times m}$. Then

$$\det(A \otimes B) = (\det A)^m (\det B)^n.$$

Kronecker product: matrix operations B. Khoromskij, Leipzig 2005(L5) 117

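Thm. 5.2. Let $A \in \mathbb{R}^{n\times n}$ and $B \in \mathbb{R}^{m\times m}$ be invertible matrices. Then

$$(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}.$$

Proof. Since $\det(A) \ne 0$ and $\det(B) \ne 0$, the above property (4) gives $\det(A \otimes B) \ne 0$. Thus $(A \otimes B)^{-1}$ exists and

$$(A^{-1} \otimes B^{-1})(A \otimes B) = (A^{-1}A) \otimes (B^{-1}B) = I_{nm}.$$

Lem. 5.2. Let $A \in \mathbb{R}^{n\times n}$ and $B \in \mathbb{R}^{m\times m}$ be unitary matrices. Then $A \otimes B$ is a unitary matrix.

Proof. Since $A^* = A^{-1}$, $B^* = B^{-1}$ we have

$$(A \otimes B)^* = A^* \otimes B^* = A^{-1} \otimes B^{-1} = (A \otimes B)^{-1}.$$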
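A quick numerical illustration of Thm. 5.2 and Lem. 5.2 follows. The helper `spd` and the matrix sizes are assumptions made for this sketch; symmetric positive definite factors are used only to guarantee invertibility.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 2  # illustrative sizes

def spd(k):
    """Random symmetric positive definite k x k matrix (hence invertible)."""
    M = rng.standard_normal((k, k))
    return M @ M.T + np.eye(k)

A, B = spd(n), spd(m)

# Inverse of a Kronecker product equals the Kronecker product of the inverses.
inverse_ok = np.allclose(np.linalg.inv(np.kron(A, B)),
                         np.kron(np.linalg.inv(A), np.linalg.inv(B)))

# Orthogonal (real unitary) factors obtained from QR factorisations.
Qa, _ = np.linalg.qr(rng.standard_normal((n, n)))
Qb, _ = np.linalg.qr(rng.standard_normal((m, m)))
Q = np.kron(Qa, Qb)
unitary_ok = np.allclose(Q.T @ Q, np.eye(n * m))
```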


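Define the commutator $[A, B] := AB - BA$.

Lem. 5.3. Let $A \in \mathbb{R}^{n\times n}$ and $B \in \mathbb{R}^{m\times m}$. Then

$$[A \otimes I_m, I_n \otimes B] = 0 \in \mathbb{R}^{nm\times nm}.$$

Proof.

$$[A \otimes I_m, I_n \otimes B] = (A \otimes I_m)(I_n \otimes B) - (I_n \otimes B)(A \otimes I_m) = A \otimes B - A \otimes B = 0.$$

Rem. 5.1. Let $A, B \in \mathbb{R}^{n\times n}$, $C, D \in \mathbb{R}^{m\times m}$ and $[A, B] = 0$, $[C, D] = 0$. Then

$$[A \otimes C, B \otimes D] = 0.$$

Proof. Apply the identity $(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$.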


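Lem. 5.4. Let $A \in \mathbb{R}^{n\times n}$ and $B \in \mathbb{R}^{m\times m}$. Then

$$\mathrm{tr}(A \otimes B) = \mathrm{tr}(A)\,\mathrm{tr}(B).$$

Proof. Since $\mathrm{diag}(a_{ii}B) = a_{ii}\,\mathrm{diag}(B)$, we have

$$\mathrm{tr}(A \otimes B) = \sum_{i=1}^{n} \sum_{j=1}^{m} a_{ii} b_{jj} = \sum_{i=1}^{n} a_{ii} \sum_{j=1}^{m} b_{jj}.$$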

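Thm. 5.3. Let $A, B, I \in \mathbb{R}^{n\times n}$. Then

$$\exp(A \otimes I + I \otimes B) = (\exp A) \otimes (\exp B).$$

Proof. Since $[A \otimes I, I \otimes B] = 0$, we have

$$\exp(A \otimes I + I \otimes B) = \exp(A \otimes I)\exp(I \otimes B).$$

Furthermore, since

$$\exp(A \otimes I) = \sum_{k=0}^{\infty} \frac{(A \otimes I)^k}{k!}, \qquad \exp(I \otimes B) = \sum_{m=0}^{\infty} \frac{(I \otimes B)^m}{m!},$$

an arbitrary term in $\exp(A \otimes I)\exp(I \otimes B)$ is given by

$$\frac{1}{k!}\frac{1}{m!}(A \otimes I)^k (I \otimes B)^m.$$

Using

$$(A \otimes I)^k (I \otimes B)^m = (A^k \otimes I^k)(I^m \otimes B^m) = (A^k \otimes I)(I \otimes B^m) \equiv A^k \otimes B^m,$$

we finally arrive at

$$\frac{1}{k!}\frac{1}{m!}(A \otimes I)^k (I \otimes B)^m = \Big(\frac{1}{k!}A^k\Big) \otimes \Big(\frac{1}{m!}B^m\Big),$$

and summation over k and m gives the assertion.

Thm. 5.3 can be extended to the case of a many-term sum

$$\exp(A_1 \otimes I \otimes \dots \otimes I + I \otimes A_2 \otimes \dots \otimes I + \dots + I \otimes \dots \otimes I \otimes A_q) = (e^{A_1}) \otimes \dots \otimes (e^{A_q}).$$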
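Thm. 5.3 can be illustrated numerically. The sketch below restricts itself to symmetric matrices so that the exponential can be formed from an eigendecomposition with NumPy alone; the helper `expm_sym` and this restriction are assumptions of the example, not of the theorem.

```python
import numpy as np

def expm_sym(S):
    """Matrix exponential of a real symmetric matrix via its eigendecomposition."""
    w, Q = np.linalg.eigh(S)
    return (Q * np.exp(w)) @ Q.T  # Q diag(exp(w)) Q^T

rng = np.random.default_rng(3)
n = 3
A = rng.standard_normal((n, n)); A = 0.5 * (A + A.T)
B = rng.standard_normal((n, n)); B = 0.5 * (B + B.T)
I = np.eye(n)

# exp(A ⊗ I + I ⊗ B) on the n^2 x n^2 level versus (exp A) ⊗ (exp B).
lhs = expm_sym(np.kron(A, I) + np.kron(I, B))
rhs = np.kron(expm_sym(A), expm_sym(B))
exp_ok = np.allclose(lhs, rhs)
```

The right-hand side needs two n x n exponentials instead of one n^2 x n^2 exponential.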

Rem. 5.2. Similar properties can be shown for other analytic functions, e.g.,

$$\sin(I_n \otimes A) = I_n \otimes \sin(A),$$

$$\sin(A \otimes I_m + I_n \otimes B) = \sin(A) \otimes \cos(B) + \cos(A) \otimes \sin(B),$$

$$\sin(A \otimes I_m + I_n \otimes B) = \frac{\sin(A) \otimes \sin(B + (b - a)I)}{\sin(b - a)} + \frac{\sin(A + (a - b)I) \otimes \sin(B)}{\sin(a - b)}$$

for all values a, b such that $\sin(a - b) \ne 0$. Analogously for the function cos.

Other simple properties:

$$(A \otimes B)^T = A^T \otimes B^T, \qquad (A \otimes B)^* = A^* \otimes B^*.$$

Eigenvalue problem B. Khoromskij, Leipzig 2005(L5) 122

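Thm. 5.4. Let $A \in \mathbb{R}^{m\times m}$ and $B \in \mathbb{R}^{n\times n}$ have the eigen-data $\lambda_1, \dots, \lambda_m$, $u_1, \dots, u_m$, and $\mu_1, \dots, \mu_n$, $v_1, \dots, v_n$, respectively. Then $A \otimes B$ has the eigenvalues $\lambda_j \mu_k$ with the corresponding eigenvectors $u_j \otimes v_k$, $1 \le j \le m$, $1 \le k \le n$.

Thm. 5.5. Under the conditions of Thm. 5.4 the eigenvalues/eigenvectors of $A \otimes I_n + I_m \otimes B$ are given by $\lambda_j + \mu_k$ and $u_j \otimes v_k$, respectively.

Proof. Due to Thm. 5.4 we have

$$\begin{aligned}
(A \otimes I_n + I_m \otimes B)(u_j \otimes v_k) &= (A \otimes I_n)(u_j \otimes v_k) + (I_m \otimes B)(u_j \otimes v_k) \\
&= (A u_j) \otimes (I_n v_k) + (I_m u_j) \otimes (B v_k) \\
&= (\lambda_j u_j) \otimes v_k + u_j \otimes (\mu_k v_k) \\
&= (\lambda_j + \mu_k)(u_j \otimes v_k).
\end{aligned}$$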
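The two spectral statements can be checked directly. In this sketch the matrices are taken symmetric (an assumption made so that `numpy.linalg.eigvalsh` returns sorted real eigenvalues); the sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 3, 4
A = rng.standard_normal((m, m)); A = 0.5 * (A + A.T)
B = rng.standard_normal((n, n)); B = 0.5 * (B + B.T)

lam = np.linalg.eigvalsh(A)   # eigenvalues of A
mu = np.linalg.eigvalsh(B)    # eigenvalues of B

# All products lambda_j * mu_k and all sums lambda_j + mu_k, sorted.
prod_expected = np.sort(np.outer(lam, mu).ravel())
sum_expected = np.sort(np.add.outer(lam, mu).ravel())

# Spectra of A ⊗ B and of the Kronecker sum A ⊗ I_n + I_m ⊗ B.
prod_actual = np.linalg.eigvalsh(np.kron(A, B))
sum_actual = np.linalg.eigvalsh(np.kron(A, np.eye(n)) + np.kron(np.eye(m), B))
```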

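Lyapunov/Sylvester equations B. Khoromskij, Leipzig 2005(L5) 123

For a matrix $A \in \mathbb{R}^{m\times n}$ we use the vector representation $A \to \mathrm{vec}(A) \in \mathbb{R}^{mn}$, where $\mathrm{vec}(A)$ is an $mn \times 1$ vector obtained by “stacking” A’s columns (the FORTRAN-style ordering)

$$\mathrm{vec}(A) := [a_{11}, \dots, a_{m1}, a_{12}, \dots, a_{mn}]^T.$$

In this way, $\mathrm{vec}(A)$ is a rearranged version of A. For example, we have the relation

$$\mathrm{vec}(AYB) = (B^T \otimes A)\,\mathrm{vec}(Y).$$

The matrix Sylvester equation for $X \in \mathbb{R}^{m\times n}$,

$$AX + XB^T = G \in \mathbb{R}^{m\times n} \tag{52}$$

with $A \in \mathbb{R}^{m\times m}$, $B \in \mathbb{R}^{n\times n}$, can be written in vector form

$$(I_n \otimes A + B \otimes I_m)\,\mathrm{vec}(X) = \mathrm{vec}(G).$$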
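The vectorised form can be used to solve a small Sylvester equation directly. The sketch below assumes dense NumPy arrays and symmetric positive definite A and B (so that all sums of eigenvalues are positive); it forms the full Kronecker matrix, which is only sensible for tiny problems.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 4, 3

def spd(k):
    """Random symmetric positive definite k x k matrix."""
    M = rng.standard_normal((k, k))
    return M @ M.T + np.eye(k)

A, B = spd(m), spd(n)
G = rng.standard_normal((m, n))  # right-hand side with the same shape as X

# vec() stacks columns, i.e. Fortran ("F") ordering in NumPy.
K = np.kron(np.eye(n), A) + np.kron(B, np.eye(m))
x = np.linalg.solve(K, G.flatten(order="F"))
X = x.reshape((m, n), order="F")

residual = np.linalg.norm(A @ X + X @ B.T - G)

# The underlying identity vec(A Y B) = (B^T ⊗ A) vec(Y), with a separate m x m test matrix.
Y = rng.standard_normal((m, n))
A2 = rng.standard_normal((m, m))
vec_ok = np.allclose((A2 @ Y @ B).flatten(order="F"),
                     np.kron(B.T, A2) @ Y.flatten(order="F"))
```

Using `order="F"` consistently is the essential design point: the default row-major flattening would correspond to a different Kronecker ordering.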


Now the solvability conditions and certain solution methods

can be derived (cf. the results for eigenvalue problems).

Equation (52) is uniquely solvable if

$$\lambda_j(A) + \mu_k(B) \ne 0 \quad \text{for all } j, k.$$

Moreover, since In ⊗A and B ⊗ Im commute, we can apply all

methods proposed below to represent the inverse

(In ⊗A + B ⊗ Im)−1.

In particular, if A and B correspond to the discrete elliptic

operators in Rd with separable coefficients, we obtain the

low-rank tensor-product decomposition to the Sylvester

solution operator (cf. Lect. 7).

In the case A = B we arrive at the Lyapunov equation.

Kronecker Hadamard product B. Khoromskij, Leipzig 2005(L5) 125

Def. 5.2. Define the Hadamard product

$$C = A \odot B = \{c_{i_1 \dots i_q}\}_{(i_1 \dots i_q) \in I^d}$$

of two tensors $A, B \in \mathbb{R}^{I^d}$ by the entry-wise multiplication

$$c_{i_1 \dots i_q} = a_{i_1 \dots i_q} \cdot b_{i_1 \dots i_q}.$$

The following lemma indicates a simple (but important) property of the Hadamard product.

Lem. 5.5. Let both A and B be represented in the form (49) with the Kronecker rank $r_A$, $r_B$ and with $V_k^\ell$ substituted by $A_k^\ell \in \mathbb{R}^{I_\ell}$ and $B_k^\ell \in \mathbb{R}^{I_\ell}$, respectively. Then $A \odot B$ is a tensor with the Kronecker rank $r = r_A r_B$ given by

$$A \odot B = \sum_{k=1}^{r_A} \sum_{m=1}^{r_B} c_k c_m (A_k^1 \odot B_m^1) \otimes \dots \otimes (A_k^q \odot B_m^q).$$


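Proof. It is easy to check that

$$(A_1 \otimes B_1) \odot (A_2 \otimes B_2) = (A_1 \odot A_2) \otimes (B_1 \odot B_2),$$

and similarly for q-term products. Applying the above relations, we obtain

$$A \odot B = \Big(\sum_{k=1}^{r_A} c_k \bigotimes_{\ell=1}^{q} A_k^\ell\Big) \odot \Big(\sum_{m=1}^{r_B} c_m \bigotimes_{\ell=1}^{q} B_m^\ell\Big) = \sum_{k=1}^{r_A} \sum_{m=1}^{r_B} c_k c_m \Big(\bigotimes_{\ell=1}^{q} A_k^\ell\Big) \odot \Big(\bigotimes_{\ell=1}^{q} B_m^\ell\Big)$$

and the assertion follows.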
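The two-factor identity used in the proof, namely that the entry-wise (Hadamard) product of two Kronecker products is the Kronecker product of the entry-wise products, can be verified as follows. Sizes and data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 3, 4
A1, A2 = rng.standard_normal((n, n)), rng.standard_normal((n, n))
B1, B2 = rng.standard_normal((m, m)), rng.standard_normal((m, m))

# Entry-wise product of two Kronecker products (formed on the large nm x nm level).
lhs = np.kron(A1, B1) * np.kron(A2, B2)

# Kronecker product of entry-wise products (formed on the small factors only).
rhs = np.kron(A1 * A2, B1 * B2)

hadamard_ok = np.allclose(lhs, rhs)
```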


Given tensors $U \otimes Y \in \mathbb{R}^{I\times J}$ with $U \in \mathbb{R}^{I}$, $Y \in \mathbb{R}^{J}$, and $B \in \mathbb{R}^{I\times L}$. Let $T : \mathbb{R}^{L} \to \mathbb{R}^{J}$ be the linear operator (tensor) that maps tensors defined on the index set L into those defined on J.

Def. 5.3. The Hadamard “scalar” product $[D, C]_I \in \mathbb{R}^{K}$ of two tensors $D := [D_{i,k}] \in \mathbb{R}^{I\times K}$ and $C := [C_{i,k}] \in \mathbb{R}^{I\times K}$ with $K \in \{I, J, L\}$ is defined by

$$[D, C]_I := \sum_{i \in I} [D_{i,K}] \odot [C_{i,K}],$$

where $\odot$ denotes the Hadamard product on the index set K and $[D_{i,K}] := [D_{i,k}]_{k \in K}$.

Lem. 5.6. Let U, Y, B and T be given as above. Then, with K = J, the following identity is valid

$$[U \otimes Y, T \cdot B]_I = Y \odot (T \cdot [U, B]_I) \in \mathbb{R}^{J}. \tag{53}$$

Proof. By definition of the Hadamard scalar product we have

$$\begin{aligned}
[U \otimes Y, T \cdot B]_I &= \sum_{i \in I} [U \otimes Y]_{i,J} \odot [T \cdot B]_{i,J} \\
&= \sum_{i \in I} [[U]_i \cdot Y]_{i,J} \odot [T \cdot B]_{i,J} \\
&= Y \odot \Big(\sum_{i \in I} [U]_i [T \cdot B]_{i,J}\Big) \\
&= Y \odot \Big(T \cdot \sum_{i \in I} [U]_i [B]_{i,L}\Big),
\end{aligned}$$

then the assertion follows.

Identity (53) is of great importance in the forthcoming applications, since on the right-hand side the operator T is removed from the scalar product and is therefore applied only once.

Complexity of the HKT -matrix arithmetics B. Khoromskij, Leipzig 2005(L5) 129

Complexity issues

Let $V_k^\ell \in \mathcal{M}_{\mathcal{H},k}(T_{I\times I},\mathcal{P})$ in (50) and let $N = n^d$.

• Data compression.

The storage for A is only $O(rdn) = O(rdN^{1/d})$ with $r = O(\log^{\alpha} N)$, $\alpha > 0$. Hence, we enjoy the sub-linear complexity.

• Matrix-by-vector complexity of $Ax$, $x \in \mathbb{C}^N$.

For general x one has the linear cost $O(rdkN \log n)$.

If $x = x_1 \otimes \dots \otimes x_d$, $x_i \in \mathbb{C}^n$, we again arrive at sub-linear complexity $O(rdkn \log n) = O(rdkN^{1/d} \log n)$.

• Matrix-by-matrix complexity of $AB$ and $A \odot B$.

The H-matrix structure of the Kronecker factors leads to $O(r^2 dn \log^q n) = O(r^2 dN^{1/d} \log^q n)$ operations instead of $O(N^3)$.
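The matrix-by-vector savings can be seen in a small two-factor (d = 2) sketch. Here the Kronecker factors are plain dense arrays rather than H-matrices, which is an assumption of the example; the function name `kron_sum_matvec` is hypothetical.

```python
import numpy as np

def kron_sum_matvec(terms, x):
    """Apply sum_k V1_k ⊗ V2_k to x without forming the n^2 x n^2 matrix."""
    n = terms[0][0].shape[0]
    X = x.reshape(n, n)                  # row-major: x[i*n + j] = X[i, j]
    Y = np.zeros_like(X)
    for V1, V2 in terms:
        Y += V1 @ X @ V2.T               # (V1 ⊗ V2) vec(X) = vec(V1 X V2^T)
    return Y.reshape(-1)

rng = np.random.default_rng(7)
n, r = 4, 3
terms = [(rng.standard_normal((n, n)), rng.standard_normal((n, n))) for _ in range(r)]
x = rng.standard_normal(n * n)

# Reference: the explicitly assembled N x N matrix, N = n^2.
dense = sum(np.kron(V1, V2) for V1, V2 in terms)
matvec_ok = np.allclose(dense @ x, kron_sum_matvec(terms, x))

# Rank-one input x = x1 ⊗ x2: only matrix-vector products of size n are needed.
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
y_struct = sum(np.kron(V1 @ x1, V2 @ x2) for V1, V2 in terms)
rank_one_ok = np.allclose(dense @ np.kron(x1, x2), y_struct)
```

For a general vector the structured product costs O(r n^3) here instead of O(n^4) for the dense matrix; with data-sparse factors the per-factor cost drops further, as stated in the bullet list above.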

How to construct a Kronecker product ? B. Khoromskij, Leipzig 2005(L5) 130

1. Singular-value decomposition (SVD) and ACA methods in

the case of two-fold decompositions (q = 2).

2. Analytic approximation to the function-generated q-th

order tensors (q ≥ 2), (see Lect. 6).

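Def. 5.4. Given the multi-variate function

$$g : \mathbb{R}^d \to \mathbb{R} \quad \text{with } d = qp, \ p, q \in \mathbb{N}, \ q \ge 2,$$

defined in a hypercube

$$\Omega = \{(\zeta_1, \dots, \zeta_q) \in \mathbb{R}^d : \|\zeta_\ell\|_\infty \le L, \ \ell = 1, \dots, q\} \subset \mathbb{R}^d, \quad L > 0,$$

where $\|\cdot\|_\infty$ means the ∞-norm of $\zeta_\ell \in \mathbb{R}^p$. On the index set $I^d$, we introduce the function-generated q-th order tensor

$$A \equiv A(g) := [a_{i_1 \dots i_q}] \in \mathbb{R}^{I^d} \quad \text{with } a_{i_1 \dots i_q} := g(\zeta^1_{i_1}, \dots, \zeta^q_{i_q}). \tag{54}$$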

3. Algebraic recompression methods: iterated SVD/ACA, iterated rank-r approximation to high order tensors (in general, the convergence theory is still an open question).


The incremental rank-one approximation algorithm:

(a) Fit the original tensor A by a rank-one tensor A1;

(b) Subtract A1 from the original tensor A;

(c) Approximate the residue $A - A_1$ with another rank-one tensor.

On each step of the algorithm one solves the minimisation problem: Find $V^\ell \in \mathbb{R}^{n^p}$, $\ell = 1, \dots, q$, such that

$$\tfrac{1}{2}\|A - V^1 \otimes \cdots \otimes V^q\|_F^2 \to \min.$$

It can be solved by the generalised Rayleigh-Newton iteration.

Def. 5.5. We say that a tensor A is orthogonally decomposable if it can be written as the sum (49) of r rank-one tensors s.t. for $\ell = 1, \dots, q$,

$$(V_k^\ell, V_{k'}^\ell) = 0 \quad \text{for } k \ne k' \quad (k, k' = 1, \dots, r).$$

Thm. 5.6. (Zhang, Golub) If a tensor of order q ≥ 3 is

orthogonally decomposable, then this decomposition is

unique, and the incremental rank-one approximation

algorithm correctly computes it.
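The incremental scheme (a)-(c) can be sketched for a third-order tensor as follows. Note the swap: instead of the generalised Rayleigh-Newton iteration mentioned above, this sketch uses a plain alternating (higher-order power) iteration for each rank-one fit, because it is short and self-contained. The function names and the test tensor are assumptions of the example.

```python
import numpy as np

def rank_one_fit(T, iters=100, seed=0):
    """Alternating rank-one fit of a 3rd-order tensor; returns the rank-one tensor."""
    rng = np.random.default_rng(seed)
    u = [rng.standard_normal(s) for s in T.shape]
    u = [v / np.linalg.norm(v) for v in u]
    for _ in range(iters):
        # Update each factor with the other two fixed, then normalise.
        u[0] = np.einsum("ijk,j,k->i", T, u[1], u[2]); u[0] /= np.linalg.norm(u[0])
        u[1] = np.einsum("ijk,i,k->j", T, u[0], u[2]); u[1] /= np.linalg.norm(u[1])
        u[2] = np.einsum("ijk,i,j->k", T, u[0], u[1]); u[2] /= np.linalg.norm(u[2])
    sigma = np.einsum("ijk,i,j,k->", T, u[0], u[1], u[2])
    return sigma * np.einsum("i,j,k->ijk", u[0], u[1], u[2])

def incremental_rank_one(T, r):
    """Steps (a)-(c): fit a rank-one term, subtract it, repeat r times."""
    residual = T.copy()
    terms = []
    for _ in range(r):
        term = rank_one_fit(residual)
        terms.append(term)
        residual = residual - term
    return terms, residual

# Orthogonally decomposable test tensor with two terms (weights 3 and 1).
rng = np.random.default_rng(4)
Qa, Qb, Qc = (np.linalg.qr(rng.standard_normal((4, 4)))[0] for _ in range(3))
T = sum(s * np.einsum("i,j,k->ijk", Qa[:, k], Qb[:, k], Qc[:, k])
        for k, s in enumerate([3.0, 1.0]))

terms, residual = incremental_rank_one(T, r=2)
relative_residual = np.linalg.norm(residual) / np.linalg.norm(T)
```

For this orthogonally decomposable input the residual after two steps should be at rounding level, in line with Thm. 5.6; for general tensors no such guarantee is implied by the sketch.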

Some heuristic algorithms B. Khoromskij, Leipzig 2005(L5) 132

Given a q-th order tensor A having the Kronecker rank m, one

can try to find the best approximation of A by a tensor of

rank r < m. This can be reduced to solving the minimisation problem: Find $V_k^\ell \in \mathbb{R}^{n^p}$ s.t.

$$\tfrac{1}{2}\Big\|A - \sum_{k=1}^{r} V_k^1 \otimes \cdots \otimes V_k^q\Big\|_F^2 \to \min.$$

It can be realized by using, say, the Newton iteration applied

to the corresponding Lagrange equation. Under certain

simplifications, the constraint minimisation algorithm can be

implemented in O(m2np + (rmq)3) operations.

There is little convergence theory behind this algorithm; moreover, the solution is not unique (cf. Prop. 4.7).

However, in most practically interesting cases this algorithm does the job.

Some conclusions B. Khoromskij, Leipzig 2005(L5) 133

Summarise:

Basic linear algebra can be performed using one-dimensional

operations, thus avoiding the exponential scaling in the

dimension d.

Bottleneck:

Lack of tractable algebraic methods for the robust multi-fold

Kronecker decomposition of high order tensors (for d ≥ 3) as

well as for the HKT-recompression in matrix operations.

However, there are quite satisfactory heuristic algorithms.

Observation:

Analytic approximation methods are of principal importance.

Classical example: an approximation by Gaussian sums.

Recent proposals: Sinc meth., approximation by exponential

sums, wavelet recompression, “approximate approximation”.




2. W. Hackbusch, B.N. Khoromskij and E. Tyrtyshnikov: Hierarchical Kronecker tensor-product approximation.

Preprint 35, MPI MIS, Leipzig 2003 (JNA, to appear).

3. B.N. Khoromskij: Structured data-sparse approximation to high order tensors arising from the deterministic

Boltzmann equation. Preprint 4, MPI MIS, Leipzig 2005.

4. C. Van Loan: The ubiquitous Kronecker product. J. of Comp. and Applied Math. 123 (2000) 85-100.

5. T. Zhang and G.H. Golub: Rank-one approximation to high order tensors. SIAM J. Matrix Anal. Appl.

23 (2001), 534-550.




Everything is more simple than one thinks but at the same time more complex than one can understand.

J. W. von Goethe (1749-1832)

An Introduction to Structured Tensor-Product

Representation of Discrete Nonlocal Operators

Part II: Approximation of Operators and Related Matrices

Boris N. Khoromskij

University of Leipzig/MPI MIS, summer 2005


http://www.mis.mpg.de/scicomp/Fulltext/Khoromskij

Lect. 6. HKT representation to integral operators B. Khoromskij, Leipzig 2005 136

In this lecture, we collect some known algebraic properties of

q-th order (q > 2) decomposed tensors, especially, in

comparison with the case q = 2, and then discuss the analytic

approximation methods:

(i) properties of multi-way decompositions,

(ii) separation methods for function-generated tensors,

(iii) approximation to the Galerkin matrices,

(iv) examples of integral operators (IOs) and numerics.

Analytic approximation methods may provide the decomposed

tensors with relatively high Kronecker rank, which can be

then reduced by algebraic “recompression algorithms”.

We stress that in spite of existing implementations (which are usually not in the public domain), the robust algebraic methods of low-rank tensor decomposition still require further development.

Why the multi-factor analysis is difficult ? B. Khoromskij, Leipzig 2005(L6) 137

Def. 6.1. The minimal number r in the representation

$$A = \sum_{k=1}^{r} V_k^1 \otimes \cdots \otimes V_k^q, \quad V_k^\ell \in \mathbb{R}^{n^p}, \tag{55}$$

is called the tensor rank of the q-th order tensor A. We suppose that $V_k^\ell \in \mathbb{R}^n$ (i.e., p = 1).

Finding the tensor rank r and the corresponding decomposition(s) for a high dimensional q-th order tensor is the main issue of multi-factor analysis!

For q = 2, Def. 6.1 coincides with the standard definition of rank(A), which can be calculated by a finite algorithm. The corresponding tensor decomposition can be computed by the SVD in $O(n^3)$ operations. Under the orthogonality requirement this decomposition is unique.
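For q = 2 the decomposition (55) with orthogonal factors is the singular value decomposition, and truncating it gives a low-rank approximation whose Frobenius error is determined by the discarded singular values. A minimal sketch with an arbitrary (assumed) test matrix:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((6, 5))

# A = sum_k s_k u_k ⊗ v_k with orthonormal u_k and v_k.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
rank_A = np.linalg.matrix_rank(A)

# Keep the r leading terms of the two-fold decomposition.
r = 2
A_r = sum(s[k] * np.outer(U[:, k], Vt[k]) for k in range(r))

# Frobenius error of the truncation versus the discarded singular values.
err = np.linalg.norm(A - A_r)
err_expected = np.sqrt(np.sum(s[r:] ** 2))
```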

Little analogy between the cases q ≥ 3 and q = 2 B. Khoromskij, Leipzig 2005(L6) 138

If q > 2, the situation changes dramatically.

I. rank(A) depends on the number field (say, $\mathbb{R}$ or $\mathbb{C}$).

II. We do not know any finite algorithm to compute r = rank(A), except simple bounds:

$$0 \le \mathrm{rank}(A) \le n^{q-1}.$$

III. For fixed q and n we do not know the exact value of max rank(A). J. Kruskal ’75 proved that:

- for any 2 × 2 × 2 tensor we have max rank(A) = 3 < 4;

- for 3 × 3 × 3 tensors there holds max rank(A) = 5 < 9.

IV. “Probabilistic” properties of rank(A): in the set of 2 × 2 × 2 tensors there is about 79% of rank-2 tensors and 21% of rank-3 tensors, while rank-1 tensors appear with probability 0.

Clearly, for n × n matrices we have $P\{\mathrm{rank}(A) = n\} = 1$.


V. However, it is possible to prove a very important uniqueness property within the equivalence classes.

Two representations like (55) are considered as equivalent (essential equivalence) if either

(a) they differ in the order of terms or

(b) for some set of parameters $a_k^\ell \in \mathbb{R}$ such that $\prod_{\ell=1}^{q} a_k^\ell = 1$ ($k = 1, \dots, r$), there is a transform $V_k^\ell \to a_k^\ell V_k^\ell$.

A simplified version of the general uniqueness result is the following (all factors have the same full rank r).

Prop. 1. (J. Kruskal, 1977) Let for each $\ell = 1, \dots, q$ the vectors $V_k^\ell$ ($k = 1, \dots, r$) with r = rank(A) be linearly independent. If

$$(q - 2)r \ge q - 1,$$

then the decomposition (55) is uniquely determined up to the equivalence (a) - (b) above.

Function-generated tensors B. Khoromskij, Leipzig 2005(L6) 140

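Def. 5.4. (cf. Lect. 5) Given the multi-variate function

$$g : \Omega \to \mathbb{R} \quad \text{with } d = qp, \ p, q \in \mathbb{N}, \ q \ge 2,$$

$\Omega = \{(\zeta_1, \dots, \zeta_q) \in \mathbb{R}^d : \|\zeta_\ell\|_\infty \le L, \ \ell = 1, \dots, q\}$ with $L > 0$, $\zeta_\ell \in \mathbb{R}^p$. Let $\{(\zeta^1_{i_1}, \dots, \zeta^q_{i_q})\}$ be the set of collocation points lying on the tensor-product lattice in Ω and indexed by $I^d$. We recall the definition of the function-generated q-th order tensor:

$$A \equiv A(g) := [a_{i_1 \dots i_q}] \in \mathbb{R}^{I^d} \quad \text{with } a_{i_1 \dots i_q} := g(\zeta^1_{i_1}, \dots, \zeta^q_{i_q}). \tag{56}$$

First, we introduce a low Kronecker rank approximation to the q-th order tensor $A = A(g) \in \mathbb{R}^{I^d}$ with $|I^d| = n^{qp}$,

$$A_r := A(g_r), \qquad g_r := \sum_{k=1}^{r} \Phi_k^1(\zeta_1) \cdots \Phi_k^q(\zeta_q) \approx g,$$

where $g_r$ is a separable approximation to g.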


We assume that the error $g - g_r$ can be estimated in the $L^\infty(\Omega)$- or in the $L^2(\Omega)$-norm, $\|u\|_{L^2} := \sqrt{\int_\Omega u^2(\zeta)\,d\zeta}$.

In particular, this might correspond to the Nyström discretisation of IOs in $\mathbb{R}^d$ (with q = d, p = 2),

$$(Au)(x) := \int_\Omega g(x, y) u(y)\,dy, \quad x, y \in \Omega \subset \mathbb{R}^d.$$

In the latter case we have

$$g_r := \sum_{k=1}^{r} \Phi_k^1(x_1, y_1) \cdots \Phi_k^d(x_d, y_d).$$

Furthermore, we denote $I^d = I \times J$ where I, J are associated with $x \in \mathbb{R}^{I}$ and $y \in \mathbb{R}^{J}$.


For the error analysis of the Kronecker approximand we make use of the Euclidean (Frobenius) and $\|\cdot\|_\infty$ tensor norms

$$\|x\|_2 := \sqrt{\sum_{i \in I} x_i^2}, \qquad \|x\|_\infty := \max_{i \in I} |x_i|, \qquad x \in \mathbb{R}^{I}.$$

Let $g - g_r$ be smooth enough. Then for a quasi-uniform distribution of collocation points we have

$$\|A(g) - A(g_r)\|_2 \le C \frac{N_I^{1/2} N_J^{1/2}}{L^{q/2}} \|g - g_r\|_{L^2}. \tag{57}$$

The next lemma describes relations between the approximation error $\|g - g_r\|$ evaluated in different norms and the corresponding error $\|A(g) - A(g_r)\|$ of the Kronecker product representation.


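Lem. 6.1. We have $\|A - A_r\|_\infty \le \|g - g_r\|_{L^\infty(\Omega)}$.

For any $x \in \mathbb{R}^{I}$, $y \in \mathbb{R}^{J}$, the consistency error $A - A_r$ can be bounded by

$$|\langle (A - A_r)x, y \rangle| \le \|g - g_r\|_{L^\infty(\Omega)} \|x\|_1 \|y\|_1 \le N_I^{1/2} N_J^{1/2} \|g - g_r\|_{L^\infty(\Omega)} \|x\|_2 \|y\|_2, \tag{58}$$

$$|\langle (A - A_r)x, y \rangle| \le C \frac{N_I^{1/2} N_J^{1/2}}{L^{q/2}} \|g - g_r\|_{L^2(\Omega)} \|x\|_2 \|y\|_2. \tag{59}$$

Proof. The first assertion follows by the construction of $A_r$,

$$\|A - A_r\|_\infty = \max_{(i_1, \dots, i_q) \in I^d} \Big| g(\zeta^1_{i_1}, \dots, \zeta^q_{i_q}) - \sum_{k=1}^{r} \Phi_k^1(\zeta^1_{i_1}) \cdots \Phi_k^q(\zeta^q_{i_q}) \Big| \le \|g - g_r\|_{L^\infty(\Omega)}.$$

Now we readily obtain

$$|\langle (A - A_r)x, y \rangle| \le \|g - g_r\|_{L^\infty(\Omega)} \sum_{i \in I,\, j \in J} |x_i y_j| \le \|g - g_r\|_{L^\infty(\Omega)} \|x\|_1 \|y\|_1,$$

which proves (58) since $\|x\|_1 \le N_I^{1/2} \|x\|_2$ and $\|y\|_1 \le N_J^{1/2} \|y\|_2$.

Now, applying the Cauchy-Schwarz inequality we have

$$|\langle (A - A_r)x, y \rangle| \le \sum_{i \in I,\, j \in J} |(a_{ij} - a_{r,ij}) x_i y_j| \le \|A - A_r\|_2 \sqrt{\sum_{i \in I,\, j \in J} x_i^2 y_j^2} \le \|A - A_r\|_2 \|x\|_2 \|y\|_2.$$

Then (59) follows from the first norm equivalence in (57).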

In many applications the generating function g(ζ) depends

only on a few scalar variables which are functionals of ζ.

Ex. 6.1. Consider a function depending only on one scalar parameter,

$$g(\zeta) = G(\rho(\zeta)) \quad \text{where } G : [0, a] \to \mathbb{R}$$

with $\rho : [-L, L]^p \to [0, a]$, $a > 0$.

In the case $\rho(\zeta) = \|\zeta\|_2$, the separable approximation $g_r(\zeta)$ can be derived from an approximation $G_r$ to the uni-variate function $G(\rho)$ by exponential sums.

It is easy to see that the approximation error $g - g_r$ arising in Lem. 6.1 can be estimated via the corresponding error $G - G_r$.

Lem. 6.2. The following estimates are valid

$$\|g - g_r\|_{L^\infty} = \|G - G_r\|_{L^\infty},$$

$$\|g - g_r\|_{L^2(\Omega)} \le C L^{\frac{q-1}{2}} \|G - G_r\|_{L^2[0,a]}.$$

Proof. The first statement is trivial. The second bound is

obtained by passing to integration in the q-dimensional

spherical coordinates.

Galerkin discretisation B. Khoromskij, Leipzig 2005(L6) 146

Given the integral operator $A : L^2(\Omega) \to L^2(\Omega)$ in $\mathbb{R}^d$, $d \ge 2$,

$$(Au)(x) := \int_\Omega g(x, y) u(y)\,dy, \quad x, y \in \Omega := [0, 1]^d,$$

with the shift-invariant kernel function $g(x, y) = g(|x - y|)$.

A principal ingredient in the HKT representation of the Galerkin discretisations in $\mathbb{R}^d$ is a separable approximation of the multi-variate function representing the kernel of an IO.

Clearly, g(x, y) can be represented in the form

$$g(x, y) = G(\zeta_1, \dots, \zeta_d) \equiv g\Big(\sqrt{\zeta_1^2 + \dots + \zeta_d^2}\Big),$$

where $\zeta_\ell = |x_\ell - y_\ell| \in [0, 1]$, $\ell = 1, \dots, d$.

With fixed $0 \le \alpha_0 < 1$, we introduce the auxiliary function

$$F(\zeta_1, \dots, \zeta_d) := (\zeta_1 \cdots \zeta_{d-1})^{\alpha_0} G(\zeta_1, \dots, \zeta_d). \tag{60}$$


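We suppose that a multi-variate function $F : \mathbb{R}^d \to \mathbb{R}$ can be well approximated by a separable expansion

$$F_r(\zeta_1, \dots, \zeta_d) := \sum_{k=1}^{r} \Phi_k^1(\zeta_1) \cdots \Phi_k^d(\zeta_d) \approx F, \tag{61}$$

where the set of functions $\{\Phi_k^\ell : \ell = 1, \dots, d, \ k = 1, \dots, r\}$ with $\Phi_k^\ell : [0, 1] \to \mathbb{R}$ is fixed or can be chosen adaptively.

We apply a Galerkin scheme by tensor-product test functions

$$\phi^{i}(x_1, \dots, x_d) = \phi_1^{i_1}(x_1) \cdots \phi_d^{i_d}(x_d), \quad i = (i_1, \dots, i_d), \quad i_\ell \in I_n := \{1, \dots, n\}.$$

Now we approximate the Galerkin stiffness matrix

$$A = \{(A\phi^{i}, \phi^{j})_{L^2}\}_{i, j \in I_n^d} \in \mathbb{R}^{N\times N}, \quad N = n^d,$$

by a matrix $A^{(r)}$ of the form $A^{(r)} = \sum_{k=1}^{r} V_k^1 \otimes \cdots \otimes V_k^d \approx A$.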


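Here the $V_k^\ell$, $\ell = 1, \dots, d$, are n × n matrices given by

$$V_k^\ell = \Big\{ \int_0^1\!\!\int_0^1 |x_\ell - y_\ell|^{-\alpha_\ell}\, \Phi_k^\ell(|x_\ell - y_\ell|)\, \phi_\ell^{i_\ell}(x_\ell)\, \phi_\ell^{j_\ell}(y_\ell)\,dx_\ell\,dy_\ell \Big\}_{i_\ell, j_\ell = 1}^{n} \tag{62}$$

with $\alpha_\ell = \alpha_0 \ge 0$, $\ell = 1, \dots, d - 1$, and $\alpha_d = 0$ (see (60)).

Def. 6.2. A function g(x, y), $x, y \in \mathbb{R}^d$, is called asymptotically smooth if there exist $\gamma \ge 1$ and $p \in \mathbb{R}$ such that for all $x, y \in \mathbb{R}^d$, $x \ne y$, and all multi-indices α, β such that $|\alpha| + |\beta| > 0$ with $|\alpha| = \alpha_1 + \dots + \alpha_d$, we have

$$|\partial_x^\alpha \partial_y^\beta g(x, y)| \le C \alpha!\,\beta!\,\gamma^{|\alpha| + |\beta|} |x - y|^{-p - |\alpha| - |\beta|}.$$

The next lemma shows that the error $\|A - A^{(r)}\|$ with respect to usual norms is directly related to the accuracy $\|F - F_r\|_\infty$ of the separable approximation (61) of F.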


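Lem. 6.3. Let (61) be valid, then for any $i, j \in I_n^d$, we have

$$|a_{i,j} - a^{r}_{i,j}| \le \|F - F_r\|_\infty \prod_{\ell=1}^{d} \Big\| |x_\ell - y_\ell|^{-\alpha_\ell} \phi_\ell^{i_\ell}(x_\ell) \phi_\ell^{j_\ell}(y_\ell) \Big\|_{L^1([0,1]\times[0,1])}$$

for the components of $A - A^{(r)}$.

Let us further assume that the function

$$g_{\ell,k}(u, v) := |u - v|^{-\alpha_\ell} \Phi_k^\ell(|u - v|), \quad (u, v) \in [0, 1]^2,$$

is asymptotically smooth for $\ell = 1, \dots, d$, $k = 1, \dots, r$. Then, for low-order piecewise polynomial Galerkin basis functions, $V_k^\ell$ can be approximated by a rank-m H-matrix $\widetilde{V}_k^\ell$ with an error

$$\|V_k^\ell - \widetilde{V}_k^\ell\| \le C \eta^m \quad \text{for some } \eta < 1.$$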


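Proof. By construction we obtain

$$\begin{aligned}
|a_{i,j} - a^{r}_{i,j}| &= \Big| \int_{\Omega\times\Omega} (F - F_r) \Big( \prod_{\ell=1}^{d} |x_\ell - y_\ell|^{-\alpha_\ell} \Big) \phi^{i}(x) \phi^{j}(y)\,dx\,dy \Big| \\
&\le \|F - F_r\|_\infty \Big\| \Big( \prod_{\ell=1}^{d} |x_\ell - y_\ell|^{-\alpha_\ell} \Big) \phi^{i}(x) \phi^{j}(y) \Big\|_{L^1(\Omega\times\Omega)} \\
&= \|F - F_r\|_\infty \prod_{\ell=1}^{d} \Big\| |x_\ell - y_\ell|^{-\alpha_\ell} \phi_\ell^{i_\ell}(x_\ell) \phi_\ell^{j_\ell}(y_\ell) \Big\|_{L^1([0,1]\times[0,1])},
\end{aligned}$$

where the last equality follows by inserting the tensor-product basis and by separating the 2d-dimensional integral.

Second, $V_k^\ell$ given by (62) appears to be the exact Galerkin stiffness matrix for an IO with the kernel function $g_{\ell,k}(u, v)$, $(u, v) \in [0, 1] \times [0, 1]$. Since $g_{\ell,k}(u, v)$ is supposed to be asymptotically smooth, the result follows by the conventional theory of the H-matrix approximation.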


Note that due to Lem. 6.3, $\|A - A^{(r)}\|$ can be easily estimated in the Frobenius, $l_2$ or $l_\infty$ matrix norms. In particular,

$$\|A - A^{(r)}\|_\infty \le n^d \|F - F_r\|_\infty \prod_{\ell=1}^{d} \Big\| |x_\ell - y_\ell|^{-\alpha_\ell} \phi_\ell^{i_\ell}(x_\ell) \phi_\ell^{j_\ell}(y_\ell) \Big\|_{L^1([0,1]\times[0,1])}.$$

Several methods of separable approximations to multi-variate functions are presented in Part I.

In general, approximability property (61) can be validated by using the tensor-product Sinc interpolation, where the factor $\Phi_k^\ell(|u - v|)$ can be proved to be asymptotically smooth.

For the class of kernel functions approximated by the quadrature-type methods, the factor $\Phi_k^\ell(|u - v|)$ even appears to be globally smooth (indeed, it is an entire function).


Lem. 6.4. For both the tensor-product Sinc-interpolation and for the quadrature methods the function $g_{\ell,k}(u, v)$ (cf. Lem. 6.3) is asymptotically smooth (AS).

Proof. In the first case we have

$$g_{\ell,k}(u, v) = |u - v|^{-\alpha_\ell} S_{k,h}(\phi^{-1}(|u - v|)), \quad u, v \in [0, 1],$$

where $S_{k,h}$ refers to the k-th Sinc function with step-size h, and $\phi^{-1}(x) = \mathrm{arsinh}(\mathrm{arcosh}(\tfrac{1}{x}))$. Since $S_{k,h}(x)$, $x \in \mathbb{R}$, is holomorphic in x, and since the factor $|u - v|^{-\alpha_\ell}$ is AS, we conclude that $g_{\ell,k}(u, v)$ has the same property.

Applying the quadrature method, we obtain the entire function

$$\Phi_k^\ell(|u - v|) = \exp(-t_k |u - v|^2), \quad t_k > 0.$$

Then the previous argument completes the proof.


Lem. 6.3 and 6.4 prove the existence of a low Kronecker

rank HKT approximation to the class of multi-dimensional

integral operators.

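Given a tolerance $\varepsilon > 0$, in general, we have the bound

$$r = O\Big( \Big[ \log\Big(\frac{1}{h}\Big) \log\Big(\frac{1}{\varepsilon}\Big) \log\Big(\log\frac{1}{\varepsilon}\Big) \Big]^{d-1} \Big),$$

where $h = O(n^{-1})$ is the mesh-size of the FE discretisation.

In the case of translation-invariant kernels, we obtain a dimensionally independent bound

$$r = O\Big( \log n \, \log\Big(\frac{1}{\varepsilon}\Big) \log\Big(\log\frac{1}{\varepsilon}\Big) \Big),$$

see examples below.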

Main examples B. Khoromskij, Leipzig 2005(L6) 154

Toward a separable approximation to the multi-variate functions

$$\frac{1}{x_1 + \dots + x_d} \quad \text{and} \quad \frac{1}{\sqrt{x_1^2 + \dots + x_d^2}} \qquad (x_i > 0, \ i = 1, \dots, d).$$

Ex. 6.1. In the first case, to apply the Sinc method, we make use of the Laplace integral transform

$$\frac{1}{\rho} = \int_{\mathbb{R}_+} e^{-\rho t}\,dt \quad (\rho > 0) \tag{63}$$

with the integrand $f(t) = e^{-\rho t}$, assuming that $\rho \in [1, R]$, $R > 1$.

In order to apply the improved error estimate, we make use of the substitutions $t = \log(1 + e^u)$ and $u = \sinh(w)$ to obtain

$$\frac{1}{\rho} = \int_{\mathbb{R}} f_2(w)\,dw \quad \text{with} \quad f_2(w) = \frac{\cosh(w)}{1 + e^{-\sinh(w)}}\, e^{-\rho \log(1 + e^{\sinh(w)})}. \tag{64}$$


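The decay of $f_2$ on the real axis is

$$f_2(w) \approx \tfrac{1}{2} e^{w - \frac{\rho}{2} e^{w}} \ \text{as } w \to \infty; \qquad f_2(w) \approx \tfrac{1}{2} e^{|w| - \frac{1}{2} e^{|w|}} \ \text{as } w \to -\infty,$$

corresponding to $C = \tfrac{1}{2}$, $b = \min\{1, \rho/2\}$, $a = 1$ in Thm. 2.6.

Lem. 6.5. (Hackbusch, BNK) If $\rho \in [1, R]$, the choice $\delta = \delta(R) = O(1/\log(R))$, $a = 1$, $b = 1/2$ in Thm. 2.6 (with the corresponding value of h) implies the uniform quadrature error estimate

$$\Big| \frac{1}{\rho} - I_M(f_2, h) \Big| \lesssim C e^{-\frac{\pi^2 M}{\log(3R) \log(\pi^2 M)}}. \tag{65}$$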


In the case of $1/\rho = \frac{1}{x_1 + \dots + x_d}$, the estimate (65) implies that an approximation of accuracy ε is obtainable with

$$M \le O\Big(\log\Big(\frac{1}{\varepsilon}\Big) \cdot \log R\Big), \tag{66}$$

provided that $1 \le x_1 + \dots + x_d \le R$, which can be achieved by a proper scaling. The numerical results even support the better estimate $M \le O\big(\log(\frac{1}{\varepsilon}) + \log R\big)$ (see Fig. 12, 13).

[Plots omitted: three error curves over the range 0 to 1000, with vertical scales of order 10^{-6}, 10^{-8} and 10^{-13}.]

Figure 12: The absolute quadrature error for (64) with $1 \le \rho \le 10^3$, and with M = 16 (left), M = 32 (middle), M = 64 (right).


[Plots omitted: three error curves over the range 0 to 2 × 10^4, with vertical scales of order 10^{-6}, 10^{-7} and 10^{-10}.]

Figure 13: The absolute quadrature error for (64) with $1 \le \rho \le 18000$, and with M = 16 (left), M = 32 (middle), M = 64 (right).

Lem. 6.5 also shows that the separation rank $r = 2M + 1$ depends only linear-logarithmically on both the tolerance $\varepsilon > 0$ and the upper bound R of $\rho = x_1 + \dots + x_d$. Hence, there is no dependence on the dimension d.
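The construction of this example can be sketched in a few lines: applying the (2M + 1)-point trapezoidal (Sinc) rule to the transformed integrand gives 1/ρ ≈ Σ_k c_k exp(-t_k ρ), and with ρ = x_1 + x_2 + x_3 each term factorises. The step size h = log(4πM)/M used below is an assumption of this sketch (Thm. 2.6 with its prescribed h is not reproduced in this part), and the grid and the function name `exp_sum_for_inverse` are illustrative.

```python
import numpy as np

def exp_sum_for_inverse(M):
    """Nodes t_k and weights c_k with 1/rho ~ sum_k c_k * exp(-t_k * rho)."""
    h = np.log(4.0 * np.pi * M) / M               # assumed step size (see lead-in)
    w = h * np.arange(-M, M + 1)                  # 2M + 1 equispaced points
    u = np.sinh(w)
    t = np.logaddexp(0.0, u)                      # t_k = log(1 + exp(u_k))
    c = h * np.cosh(w) * np.exp(-np.logaddexp(0.0, -u))  # h*cosh(w_k)/(1 + exp(-u_k))
    return t, c

M = 64
t, c = exp_sum_for_inverse(M)

# Function-generated tensor for g = 1/(x1 + x2 + x3) on a small grid, 1 <= sum <= 30.
x = np.linspace(1.0 / 3.0, 10.0, 8)
exact = 1.0 / (x[:, None, None] + x[None, :, None] + x[None, None, :])

# Separable approximation: sum_k c_k * e^{-t_k x1} * e^{-t_k x2} * e^{-t_k x3}.
E = np.exp(-np.outer(t, x))                       # E[k, i] = exp(-t_k * x_i)
approx = np.einsum("k,ki,kj,kl->ijl", c, E, E, E)

max_err = np.max(np.abs(exact - approx))          # entry-wise (max-norm) error
```

The same nodes and weights serve any dimension d; only the number of factors in each term changes, which mirrors the dimension independence noted above.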


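Ex. 6.2. In the case of the Newton potential $1/\sqrt{x_1^2 + \dots + x_d^2}$, we make use of the Gauss integral

$$\frac{1}{\rho} = \frac{2}{\sqrt{\pi}} \int_{\mathbb{R}_+} e^{-\rho^2 t^2}\,dt \quad (\rho \in [1, R]). \tag{67}$$

To obtain robustness in ρ, we rewrite the Gauss integral (67) using the substitutions $t = \log(1 + e^u)$ and $u = \sinh(w)$,

$$\frac{1}{\rho} = \int_{\mathbb{R}} f(w)\,dw \quad \text{with} \quad f(w) := \cosh(w) F(\sinh(w)) \tag{68}$$

with

$$F(u) := \frac{2}{\sqrt{\pi}}\, \frac{e^{-\rho^2 \log^2(1 + e^u)}}{1 + e^{-u}}.$$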


Lem. 6.6. Let $\delta < \pi/2$, $\rho \ge 1$. Then for the function f from (68) we have $f \in H^1(D_\delta)$.

In addition, Thm. 2.6 is satisfied with a = 1.

The improved (2M + 1)-point quadrature with the choice $\delta(\rho) = \frac{\pi}{C + \log(\rho)}$ allows the error bound

$$\Big| \frac{1}{\rho} - I_M(f, h) \Big| \le C_1 \exp\Big( -\frac{\pi^2 M}{(C + \log(\rho)) \log M} \Big). \tag{69}$$

Proof. It is easy to check that f is holomorphic in $D_\delta$ and $N(f, D_\delta) < \infty$ uniformly in ρ (with the choice $\delta = \delta(\rho)$). Now we check the double-exponential decay of the integrand as $|w| \to \infty$ and then apply Thm. 2.6, where

$$\delta = \delta(\rho) = \frac{\pi}{C + \log(\rho)}.$$


We apply (69) and obtain the bound

$$M \le O\Big(\log\Big(\frac{1}{\varepsilon}\Big) \cdot \log R\Big). \tag{70}$$

Hence again there is no dependence on the dimension d.

Numerical examples for this quadrature with values $\rho \in [1, R]$, $R \le 5000$, are presented in Fig. 14.

[Plots omitted: three error curves over the ranges 0 to 200, 0 to 1000 and 0 to 5000, with vertical scales of order 10^{-8}, 10^{-7} and 10^{-7}.]

Figure 14: The absolute quadrature error for M = 64 with R = 200 (left), R = 1000 (middle), R = 5000 (right).

Further examples B. Khoromskij, Leipzig 2005(L6) 161

Again, we observe almost linear error growth in ρ. Similar

results were obtained in the case R > 5000 manifesting a

rather stable behaviour of the quadrature error with respect

to R.

Ex. 6.3. log(x + y)

In boundary element methods (BEM), one is interested in a

low separation rank representation of the kernel function

s(x, y) = log(x + y), x ∈ [0, 1], y ∈ [h, 1] with some small

mesh-size parameter h > 0. A representation like

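$$\frac{1}{x + y} = \sum_{m=1}^{k} \Phi_m(x) \Psi_m(y) + \delta_k \quad \text{with } |\delta_k| \le \varepsilon \tag{71}$$

can be constructed by means of the quadrature applied to the integral (64) with $\rho = x + y$ and $k = 2M + 1$. Let $\psi_m$ be the anti-derivative of the function $\Psi_m$. Integration of (71) yields

$$\begin{aligned}
\log(x + y) &= \int_{1-x}^{y} \frac{dt}{x + t} = \int_{1-x}^{y} \Big( \sum_{m=1}^{k} \Phi_m(x) \Psi_m(t) + \delta_k \Big) dt \\
&= \sum_{m=1}^{k} \Phi_m(x) [\psi_m(y) - \psi_m(1 - x)] + S_k \\
&= \Phi_0(x) + \sum_{m=1}^{k} \Phi_m(x) \psi_m(y) + S_k
\end{aligned}$$

with $\Phi_0(x) = -\sum_{m=1}^{k} \Phi_m(x) \psi_m(1 - x)$ and $|S_k| = \Big| \int_{1-x}^{y} \delta_k\,dt \Big| \le \varepsilon$.

This resulting representation of $\log(x + y)$ has the separation rank k + 1 and the same accuracy ε as (71).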


Ex. 6.4. Helmholtz kernel in $\mathbb{R}^d$

Given $\kappa \in \mathbb{R}$, define the Helmholtz kernel function

$$g(x, y) := \frac{\cos(\kappa |x - y|)}{|x - y|} = \Re e\, \frac{e^{i\kappa |x - y|}}{|x - y|} \quad \text{for } (x, y) \in [0, 1]^d \times [0, 1]^d$$

in Cartesian coordinates $x = (x_1, \dots, x_d)$, $y = (y_1, \dots, y_d) \in \mathbb{R}^d$. The Sinc approximation can be applied in the case of a weakly admissible block (in the H-matrix techniques) w.r.t. the transformed variables $\zeta_1, \dots, \zeta_d$. For $(\zeta_1, \dots, \zeta_d) \in [0, 1]^d$, define

$$G(\zeta_1, \dots, \zeta_d) := g(x, y), \quad \zeta_\ell = |x_\ell - y_\ell|, \quad \ell = 1, \dots, d,$$

which implies

$$G(\zeta_1, \dots, \zeta_d) := \cos\Big( \kappa \sqrt{\zeta_1^2 + \dots + \zeta_d^2} \Big) \Big/ \sqrt{\zeta_1^2 + \dots + \zeta_d^2}.$$

We approximate the modified function

$$F(\zeta_1, \dots, \zeta_d) := (\zeta_1 \cdot \ldots \cdot \zeta_{d-1})^{\alpha_0} G(\zeta_1, \dots, \zeta_d), \quad 0 < \alpha_0 < 1,$$

on the domain $\Omega_1 := [0, 1]^{d-1} \times [h, 1]$, where h > 0 is a small (mesh) parameter.

Now we apply Thm. 4.5 with $\delta = 1/|\log h|$ to construct the approximation $G_M(x)$ via the interpolation of F and obtain

$$|G(x) - G_M(x)| \le \prod_{\ell=1}^{d-1} x_\ell^{-\alpha_0} \big| E_M(F, h)(\phi^{-1}(x)) \big| \le C h^{\alpha_0(1-d)} |\log h|\, \Lambda_M^{d-1} N_0(F, D_\delta)\, e^{-\pi M/(|\log h| \log M)} \tag{72}$$

with $\zeta \in (0, 1]^d$.

For this example $N_0(F, D_\delta) = O(e^{\kappa})$, while the Kronecker rank is given by $r = (2M + 1)^{d-1}$. Clearly, for a large κ, the bound (72) does not provide a satisfactory complexity.




2. D. Braess and W. Hackbusch: Approximation of 1/x by exponential sums in [1, ∞). To appear in IMA JNA.

3. W. Hackbusch, B.N. Khoromskij and E. Tyrtyshnikov: Hierarchical Kronecker tensor-product approximation.

Preprint 35, MPI MIS, Leipzig 2003 (JNA, to appear).

4. J. B. Kruskal: Three-way arrays: Rank and uniqueness of trilinear decompositions. Linear Algebra

Appl., 18 (1977), 95-138.



Lect. 7. Structured Representation to Matrix-Valued Functions

The matrix-valued functions (MVF) of the discrete (elliptic) operator $L$ arise in a wide range of applications. Structured tensor-product representations are developed for several classes of MVFs:

$$\begin{aligned}
F_1(L) &:= L^{-\alpha}, \quad \alpha > 0, \\
F_2(L) &:= e^{-tL}, \\
F_{3,k}(L) &:= \cos(t\sqrt{L})\, L^{-k}, \quad k \in \mathbb{N}, \\
F_4(L) &:= \int_0^\infty e^{-tL^*} G\, e^{-tL}\,dt, \\
F_5(L) &:= \operatorname{sign}(L).
\end{aligned}$$

Both the discrete elliptic inverse $L^{-1}$ and the matrix exponential $e^{-tL}$ play an important role in numerical PDEs.

Usually MVFs appear to be fully populated, hence data-sparse

formats are needed for their efficient representation.

Representation of Operators

There are different methods to represent MVFs (set $L = A$):

• In the case of diagonalisable matrices, i.e., $A = T^{-1}DT$ with diagonal $D = \operatorname{diag}\{d_1, ..., d_n\}$, one defines

$$F(A) = T^{-1}F(D)T, \quad F(D) = \operatorname{diag}\{F(d_1), ..., F(d_n)\}.$$

• Dunford-Cauchy integral for analytic functions

$$F(A) = \frac{1}{2\pi i} \int_\Gamma F(z)(zI - A)^{-1}\,dz, \quad \Gamma \subset \mathbb{C}.$$

• Laplace type transform

$$F(A) = \int_{\mathbb{R}} f(t)\, e^{-tA}\,dt.$$

• Transforms via trigonometric kernels

$$F(A) = \int_{\mathbb{R}} [a(t)\cos(tA) + b(t)\sin(tA)]\,dt.$$

• Polynomial expansions and/or nonlinear iterations.

Examples of matrix-valued functions

Ex. 7.1. The solution operator of the parabolic initial value problem

$$\frac{\partial u}{\partial t} + Lu(t) = 0, \quad u(0) = u_0 \in X, \qquad (73)$$

is given by

$$T(t;L) = e^{-tL} = \frac{1}{2\pi i} \int_\Gamma e^{-zt}(zI - L)^{-1}\,dz,$$

where $L$ is an elliptic operator (say, $L = -\Delta$) in a Hilbert space $X$ and $u(t)$ is a vector-valued function $u : \mathbb{R}_+ \to X$. Given the initial vector $u_0$, the solution of the initial value problem can be represented by $u(t) = T(t;L)u_0$.

A simple example of a parabolic PDE is the 1D heat equation

$$\frac{\partial u}{\partial t} - \frac{\partial^2 u}{\partial x^2} = 0, \quad u : \mathbb{R}_+ \times [0,1] \to \mathbb{R},$$

with the corresponding boundary and initial conditions.


Ex. 7.2. The initial value problem for the second order differential equation with an operator coefficient

$$u''(t) + Lu(t) = 0, \quad u(0) = u_0, \quad u'(0) = 0,$$

has the solution operator

$$C(t;L) := \cos(t\sqrt{L}) = \frac{1}{2\pi i} \int_\Gamma \cos(t\sqrt{z})(zI - L)^{-1}\,dz$$

(the hyperbolic operator cosine family), so that

$$u(t) = C(t;L)u_0.$$

It represents the function-to-operator map $\cos(t\sqrt{\cdot}) \to C(t;L)$.

An example of a hyperbolic PDE is the classical wave equation

$$\frac{\partial^2 u}{\partial t^2} - \frac{\partial^2 u}{\partial x^2} = 0$$

subject to the corresponding boundary and initial conditions.


Ex. 7.3. For the boundary value problem

$$\frac{d^2 u}{dx^2} - Lu = 0, \quad u(0) = 0, \quad u(1) = u_1, \qquad (74)$$

in a Hilbert space $X$, the solution operator is the normalised hyperbolic operator sine family

$$E(x;L) := \big(\sinh(\sqrt{L})\big)^{-1} \sinh(x\sqrt{L}) = \frac{1}{2\pi i} \int_\Gamma \frac{\sinh(x\sqrt{z})}{\sinh(\sqrt{z})}\,(zI - L)^{-1}\,dz,$$

so that $u(x) = E(x;L)u_1$.

The simplest PDE of the type (74) is the Laplace equation in a cylindrical domain:

$$\frac{\partial^2 u}{\partial x^2} + \frac{\partial^2 u}{\partial y^2} = 0, \quad x \in [0,1], \ y \in [c,d],$$

$$u(0,y) = 0, \quad u(1,y) = u_1(y).$$

Rem. 7.1 Constructions 7.1-7.3 are useful to avoid time

stepping and hence allow parallel (in time) computations.


Ex. 7.4. For the Sylvester matrix equation

$$AX + XB = G \quad (A, B, G \in \mathbb{R}^{n \times n} \text{ given})$$

the solution $X \in \mathbb{R}^{n \times n}$ is given by the integral

$$X = \mathcal{F}(A,B)\,G := \int_0^\infty e^{-tA}\, G\, e^{-tB}\,dt,$$

supposing that $A$, $B$ guarantee the existence of this integral (cf. Lect. 5). The (nonlinear) Riccati matrix equation

$$AX + XA + XFX = G, \qquad (75)$$

where $A, F, G \in \mathbb{R}^{n \times n}$ are given and $X \in \mathbb{R}^{n \times n}$ is the unknown matrix, can be solved by Newton's iteration. At each iteration step a linear equation of Lyapunov type has to be solved ($X_k \to X$):

$$(A + X_k F)X_{k+1} + X_{k+1}(A + F X_k) = X_k F X_k + G.$$


Ex. 7.5. Let $A \in \mathbb{R}^{n \times n}$ be a matrix whose spectrum $\sigma(A)$ does not intersect the imaginary axis. The matrix function $F(A) = \operatorname{sign}(A)$ is defined by

$$\operatorname{sign}(A) := \frac{1}{\pi i} \int_{\Gamma_+} (zI - A)^{-1}\,dz - I \qquad (76)$$

with $\Gamma_+$ being any simple closed curve in $\mathbb{C}$ whose interior contains all eigenvalues of $A$ with positive real part and no others.

The HKT representation to the MVF $\operatorname{sign}(A)$ is based on an efficient quadrature for the integral

$$\operatorname{sign}(A) = \frac{1}{c_f} \int_{\mathbb{R}_+} \frac{f(tA)}{t}\,dt.$$

Efficient numerical implementation is possible for certain functions $f$ having trigonometric structure (cf. Lect. 8).


Ex. 7.6. A negative fractional power of $A$ is represented by

$$A^{-\sigma} = \frac{1}{\Gamma(\sigma)} \int_0^\infty t^{\sigma-1} e^{-tA}\,dt, \quad \sigma > 0, \qquad (77)$$

provided that the integral exists.

With the choice $A = -\Delta$, the representation (77) is of particular interest in the cases:

(a) $\sigma = 1$ (inverse Laplacian),

(b) $\sigma = 1/2$ (preconditioning for the Laplace-Beltrami operator $(-\Delta)^{1/2}$ and for the hypersingular integral operator, e.g., in BEM applications),

(c) $\sigma = 2$ (inverse biharmonic operator).

A positive fractional power of $A$, say $A^{1/2}$, can be represented by a simple factorisation

$$A^{1/2} = A\,A^{-1/2}.$$
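The representation (77) can be checked numerically on a small matrix. The following is a minimal sketch under stated assumptions, not the quadrature analysed in these lectures: $A$ is assumed symmetric positive definite (so an eigendecomposition is available), the substitution $t = e^s$ is applied, and the resulting integral over $\mathbb{R}$ is discretised by the plain trapezoidal rule. The name `inv_frac_power` and the parameters `h`, `s_min`, `s_max` are illustrative choices.

```python
import math
import numpy as np

def inv_frac_power(A, sigma, h=0.25, s_min=-60.0, s_max=6.0):
    """Approximate A^{-sigma} for symmetric positive definite A via (77).

    With t = exp(s), (77) becomes an integral over the real line of
    exp(sigma*s) * exp(-exp(s)*A) / Gamma(sigma), discretised here by the
    trapezoidal rule with step h on [s_min, s_max].
    """
    lam, Q = np.linalg.eigh(A)            # A = Q diag(lam) Q^T
    s = np.arange(s_min, s_max + h, h)    # quadrature nodes in s
    t = np.exp(s)                         # nodes in the original variable t
    w = h * t**sigma / math.gamma(sigma)  # weights: t^(sigma-1) * dt/ds * h
    # Apply the exponential sum sum_k w_k exp(-t_k * lambda) to each eigenvalue
    vals = np.exp(-np.outer(lam, t)) @ w
    return (Q * vals) @ Q.T               # Q diag(vals) Q^T
```

For instance, with the tridiagonal matrix `A = 2*np.eye(8) - np.eye(8, k=1) - np.eye(8, k=-1)`, the call `inv_frac_power(A, 0.5)` returns a matrix `X` for which `X @ X @ A` is close to the identity. Each quadrature node contributes one term $e^{-t_k A}$, which is what makes the representation attractive for Kronecker-structured $A$.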


Ex. 7.7. In some cases iterative schemes (with possible recompression at each iteration) can be applied.

(a) An approximation to $A^{-1}$: given $X_0 \in \mathbb{R}^{n \times n}$, the Newton-Schulz iteration

$$X_{k+1} = X_k(2I - AX_k), \quad k = 0, 1, 2, ... \qquad (78)$$

converges to $A^{-1}$ locally quadratically (cf. analysis below).

Iteration (78) is nothing but the Newton method

$$\Psi'(X_k)(X_{k+1} - X_k) = -\Psi(X_k)$$

for solving the nonlinear matrix equation

$$\Psi(X) := A - X^{-1} = 0.$$

In fact, $\Psi(X + \delta) - \Psi(X) = X^{-1}\delta(X + \delta)^{-1}$, providing $\Psi'(X_k)(\delta) = X_k^{-1}\delta X_k^{-1}$. Now (78) follows from

$$X_{k+1} - X_k = -X_k(A - X_k^{-1})X_k.$$


(b) Newton-Schulz iteration scheme to approximate $\operatorname{sign}(A)$:

$$X_{k+1} = X_k + \tfrac{1}{2}\big[I - (X_k)^2\big]X_k, \quad X_0 = A/\|A\|_2. \qquad (79)$$

For diagonalisable matrices we have locally quadratic convergence $X_k \to \operatorname{sign}(A)$ (see the analysis below).

This scheme has already been successfully applied in many-particle calculations.

The above mentioned schemes (a) and (b) are especially efficient in the case $q = 2$, since the optimal SVD or ACA recompression in the H- and HKT-formats can be applied.

(c) Newton's method to calculate $\operatorname{sign}(A)$. The iteration

$$X_0 = A, \quad X_{k+1} = \tfrac{1}{2}\big(X_k + X_k^{-1}\big) \qquad (80)$$

converges (locally quadratically) to $\operatorname{sign}(A)$. This method has proved to be efficient in the H-matrix arithmetics.
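The two sign iterations (79) and (80) are easy to try in dense arithmetic. The sketch below is only an illustration: it uses full NumPy matrices and a fixed number of steps, without the H-matrix or HKT recompression discussed in these lectures. The function names and the iteration counts are assumptions made for the example.

```python
import numpy as np

def sign_newton_schulz(A, iters=40):
    """Inverse-free iteration (79): X <- X + 0.5 (I - X^2) X, X0 = A / ||A||_2."""
    I = np.eye(A.shape[0])
    X = A / np.linalg.norm(A, 2)       # scaling puts the spectrum into [-1, 1]
    for _ in range(iters):
        X = X + 0.5 * (I - X @ X) @ X
    return X

def sign_newton(A, iters=20):
    """Newton iteration (80): X <- 0.5 (X + X^{-1}), X0 = A."""
    X = A.copy()
    for _ in range(iters):
        X = 0.5 * (X + np.linalg.inv(X))
    return X
```

For a symmetric matrix with eigenvalues $-3, -1, 0.5, 2$ both functions return, up to rounding, the matrix with the same eigenvectors and eigenvalues $-1, -1, 1, 1$. Scheme (79) needs only matrix products, whereas (80) requires one inversion per step.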


Ex. 7.8. The matrix exponential can be defined and then calculated by

$$\exp(A) := \sum_{k=0}^{\infty} \frac{1}{k!} A^k \approx E_N := \sum_{k=0}^{N-1} \frac{1}{k!} A^k. \qquad (81)$$

This approximation converges exponentially (if $N$ is large enough, say, $N \ge e\|A\|$),

$$\|E_N - \exp(A)\| \le \sum_{k=N}^{\infty} \frac{1}{k!} \|A\|^k \le C\,\frac{\|A\|^N}{N!} \approx \Big( \frac{e\|A\|}{N} \Big)^N.$$

The Horner scheme to calculate (81) requires only $N - 1$ matrix multiplications:

$$A_N := I; \quad \text{for } k = N-1 \text{ downto } 1 \text{ do } A_k := \tfrac{1}{k} A_{k+1} A + I,$$

such that $E_N := A_1$.


If $\|A\| > 1$, the algorithm (81) may produce very large terms for intermediate values of $N$.

Recall that for commuting matrices $A$, $B$ we have $\exp(A + B) = \exp(A)\exp(B)$, in particular $\exp(A) = [\exp(A/2)]^2$. Now, the algorithm (81) can be modified as follows:

(a) Choose $n$ such that $2^{-n}\|A\| \le 1$.

(b) Compute $B = \exp(A/2^n)$ by algorithm (81).

(c) Compute $\exp(A) = B^{2^n}$ by $n \approx \log_2(\|A\|)$ matrix squarings.

If $B = \exp(A/2^n)$ can be represented in a certain data-sparse format (e.g., H-matrix or Kronecker product form), then truncating all the intermediate products $B^{2^m}$, $m = 1, ..., n$, into the fixed format leads to the desired representation of $\exp(A)$. In this case, the truncation error analysis is still an open question.
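Steps (a)-(c) together with the Horner evaluation of (81) can be sketched in a few lines. This is a dense-matrix illustration only (no truncation into a data-sparse format is performed); the function names and the default `N = 18` are assumptions chosen so that the Taylor remainder is near machine precision when the scaled norm is at most 1.

```python
import math
import numpy as np

def taylor_horner(A, N):
    """E_N = sum_{k=0}^{N-1} A^k / k!, evaluated by the Horner scheme."""
    I = np.eye(A.shape[0])
    H = I.copy()                       # plays the role of A_N := I
    for k in range(N - 1, 0, -1):      # k = N-1, ..., 1
        H = (H @ A) / k + I            # A_k := (1/k) A_{k+1} A + I
    return H                           # this is A_1 = E_N

def expm_scaling_squaring(A, N=18):
    """exp(A) via (a) scaling, (b) truncated Taylor series, (c) n squarings."""
    norm = np.linalg.norm(A, 2)
    n = math.ceil(math.log2(norm)) if norm > 1 else 0   # (a): 2^{-n} ||A|| <= 1
    B = taylor_horner(A / 2**n, N)                      # (b): B ~ exp(A / 2^n)
    for _ in range(n):                                  # (c): B^(2^n)
        B = B @ B
    return B
```

As a quick check, for `J = 3.0 * np.array([[0.0, 1.0], [-1.0, 0.0]])` the result agrees with the rotation matrix with entries $\cos 3$ and $\pm\sin 3$. In a structured format, the line `B = B @ B` is where a recompression step would be inserted.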

Analysis of iterative schemes

Newton-Schulz iteration (78) to compute $A^{-1}$.

Denote the residual error by $E_k = I - AX_k$, $k = 0, 1, 2, ...$. It is easy to see that

$$X_{k+1} = X_k(I + E_k), \quad k = 0, 1, 2, ...,$$

which implies (for $k = 1, 2, ...$)

$$E_k = I - AX_{k-1}(I + E_{k-1}) = I - (I - E_{k-1})(I + E_{k-1}) = E_{k-1}^2. \qquad (82)$$

Applying (82) recursively, we find that

$$E_k = E_0^{2^k}, \quad k = 1, 2, ... . \qquad (83)$$

It is also clear that

$$A^{-1} - X_k = A^{-1}E_k = A^{-1}E_0^{2^k} = X_0(I - E_0)^{-1}E_0^{2^k}.$$

Under the assumption on the spectral radius of $E_0$,

$$\rho \equiv \rho[E_0] = \max_j |\lambda_j| < 1,$$

where $\lambda_j = \lambda_j(E_0)$ are the eigenvalues of $E_0$, we obtain that the error $E_k$ in (83) vanishes like $\rho^{2^k}$.

Rem. 7.2. The iteration (78) can be applied to any preconditioned matrix $B = R_0 A$, where $R_0$ is a spectrally equivalent preconditioner to $A$ so that $\sigma(B)$ is uniformly bounded in $n$. Assuming that both $R_0$ and $R_0 A$ already have the H-matrix representation, we then obtain the approximate inverse of interest from

$$A^{-1} = (R_0 A)^{-1} R_0.$$

In some cases this approach provides the constructive proof

for the existence of the H-matrix inverse.


Let $E_0 = I - BX_0$. The requirement $\rho[E_0] < 1$ can be achieved under the following conditions.

Lem. 7.1. Let $B$ have real eigenvalues in the interval $0 < m \le \lambda_j \le M$, $j = 1, 2, ..., n$. Let $X_0(w) = wI$; then $\rho[E_0] < 1$ for all $w \in (0, \frac{2}{M})$. Moreover, if $\rho(w) = \rho[E_0(w)]$, then there holds

$$\rho(w^*) = \min_{w \in (0, \frac{2}{M})} \rho(w) = \frac{M - m}{M + m} < 1, \quad w^* = \frac{2}{M + m}. \qquad (84)$$

Proof. This lemma is a reformulation of a standard convergence result for the Richardson iteration.

Implementing (78) in the formatted H-matrix arithmetics one can compute the H-matrix approximation $X_k$ to $A^{-1}$ with $O(\log\log \varepsilon^{-1})$ iterations, where $\|I - AX_k\| \le \varepsilon$.
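The residual recursion (82)-(83) and the starting guess from Lem. 7.1 can be observed directly. The sketch below is a dense-matrix illustration in which the matrix called $B$ in Lem. 7.1 is passed as `A`; the spectral bounds `m`, `M` are assumed to be supplied by the caller, and the function name, tolerance and iteration cap are illustrative.

```python
import numpy as np

def newton_schulz_inverse(A, m, M, tol=1e-12, max_iter=100):
    """Iteration (78) started from X0 = w* I with w* = 2/(M+m), cf. (84).

    m, M are assumed bounds 0 < m <= lambda_j(A) <= M on the real spectrum.
    Returns the approximate inverse and the residual norms ||I - A X_k||_2.
    """
    I = np.eye(A.shape[0])
    X = (2.0 / (M + m)) * I
    residuals = [np.linalg.norm(I - A @ X, 2)]
    for _ in range(max_iter):
        if residuals[-1] <= tol:
            break
        X = X @ (2.0 * I - A @ X)                       # Newton-Schulz step
        residuals.append(np.linalg.norm(I - A @ X, 2))
    return X, residuals
```

For a symmetric positive definite `A` with exact extreme eigenvalues `m`, `M`, the first residual equals $(M-m)/(M+m)$ and each later residual is the square of its predecessor, until rounding errors dominate.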


Newton-Schulz iteration (79) to compute $\operatorname{sign}(A)$.

Diagonalisable case. Let $T$ be the unitary transform that diagonalises $A$, i.e., $A = T^* D T$ with $d_i \in [-1, 1]$; then it also diagonalises all $X_k$, $k = 1, 2, ...$. Hence we have to show that the scalar iteration

$$x_{k+1} = f(x_k), \quad \text{with } x_0 \in [-1, 0) \cup (0, 1]$$

and with $f(x) := x + \frac{1}{2}x(1 - x^2) \equiv x\,g(x)$, converges to $\operatorname{sign}(x_0)$ quadratically.

Clearly, $f(x)$, $x \in [-1, 1]$, is increasing and has the fixed points $x = -1, 0, 1$. Since on the interval $(-1, 1)$ we have $g(x) > 1$, it implies $0 < x_k < x_{k+1} < 1$ if $x_0 \in (0, 1)$ and $-1 < x_{k+1} < x_k < 0$ if $x_0 \in (-1, 0)$.

Hence, both $x = -1$ and $x = 1$ are stable fixed points.


For example, consider the case with a small initial guess $x_0 > 0$. For $x \in [-1/2, 1/2]$, we have $g(x) \ge q > 1$ with $q = 1 + 3/8$, thus the number of iterations $x_{k+1} = x_k g(x_k)$ needed to reach the value, say, $x_k = 0.5$ starting from $x_0 > 0$ is about $O(\log_q x_0^{-1})$.

For $x_k \ge 1/2$, we enter the regime with quadratic convergence. In fact, we just have

$$1 - x_{k+1} = \tfrac{1}{2}(1 - x_k)^2 (x_k + 2),$$

which implies $|1 - x_{k+1}| \le \frac{3}{2}(1 - x_k)^2$. In this stage, to achieve precision $\varepsilon > 0$ one requires $O(\log_2 \log_2 \varepsilon^{-1})$ iterations.

For the initial guess we actually have $x_0 = \operatorname{cond}(A)^{-1}$, which implies that the total number of iterations is bounded by

$$O(\log_2 \log_2 \varepsilon^{-1}) + O(\log_q \operatorname{cond}(A)).$$


Note that iteration (79) can be written as $X_k = \Phi(X_{k-1})$ with $\Phi(X) := X + \frac{1}{2}(I - X^2)X$ (see Lect. 8). Clearly, (79) ensures that all $X_k$ ($k = 1, 2, ...$) are simultaneously diagonalised by the same matrix $T$, hence we have (with $B = \operatorname{sign}(A)$, so that $B^2 = I$):

$$\begin{aligned}
\Phi(X) - B &= X - B + \tfrac{1}{2}(B^2 - X^2)X \\
&= \tfrac{1}{2}(X - B)\big(B(B - X) + (B - X)(B + X)\big) \\
&= -(X - B)^2\big(B + \tfrac{1}{2}X\big). \qquad (85)
\end{aligned}$$

The analysis of algorithm (80) in the diagonalisable case reduces to that of the Newton method applied to the equation

$$\Psi(x) := x^2 - 1 = 0,$$

that is, $x_{k+1} = \frac{1}{2}\big(x_k + \frac{1}{x_k}\big)$.


The iterative calculation may not be very simple!

Newton iteration to compute the square root $A^{1/2}$ of the symmetric positive definite matrix $A$: Given $X_0$, the iteration

$$X_k \Delta_k + \Delta_k X_k = A - X_k^2, \qquad (86)$$

where $\Delta_k = X_{k+1} - X_k$, converges to $A^{1/2}$ quadratically (locally). It requires solving a matrix Lyapunov equation at each step.

This scheme can be considered as the Newton iteration to solve the nonlinear matrix equation

$$\Psi(X) := A - X^2 = 0.$$

Clearly,

$$\Psi(X + \Delta) - \Psi(X) = -X\Delta - \Delta X - \Delta^2,$$

so that $\Psi'(X)(\Delta) = -X\Delta - \Delta X$ and our iteration can be interpreted as the Newton method for solving $\Psi(X) = 0$ (see Lect. 8 for the analysis of truncated iterations).


Iteration (86) can be written as $X_k = \Phi_k(X_{k-1})$ corresponding to the choice

$$\Phi_k(X) := \Phi(X),$$

where $\Phi(X)$ solves the matrix equation

$$X(\Phi(X) - X) + (\Phi(X) - X)X = A - X^2.$$

A simple calculation shows that the latter equation implies (with the substitution $A = B^2$)

$$X(\Phi(X) - B) + XB - X^2 + (\Phi(X) - B)X + BX - X^2 = B^2 - X^2,$$

which leads to the matrix Lyapunov equation with respect to $Y = \Phi(X) - B$,

$$XY + YX = (B - X)^2.$$


Making use of the solution operator for the Lyapunov equation (assume that $X = X^T > 0$), we arrive at the norm estimate

$$\|\Phi(X) - B\| \le \Big\| \int_0^\infty e^{-tX}(B - X)^2 e^{-tX}\,dt \Big\| \le C\|B - X\|^2.$$

This proves relation (93) in Lem. 8.1 (convergence order $\alpha = 2$). Hence, Thm. 8.1 ensures the convergence of the truncated version of the nonlinear iteration (86).

Note that the simpler iteration

$$X_0 = a_0 A, \quad X_k := X_{k-1} - \tfrac{1}{2}\big(X_{k-1} - X_{k-1}^{-1}A\big) \quad (k = 1, 2, ...), \qquad (87)$$

where $a_0 > 0$ is a given constant, does not guarantee, in general, the convergence of truncated iterations.

Truncated Newton iteration to compute $A^{-1}$

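We analyse the case of second order tensors ($q = 2$)

$$A_r = \sum_{k=1}^{r} U_k \otimes V_k, \quad U_k \in \mathbb{R}^{m \times m}, \ V_k \in \mathbb{R}^{n \times n}.$$

Recall that for a matrix $A \in \mathbb{R}^{m \times n}$ we use the vector representation $A \to \operatorname{vec}(A) \in \mathbb{R}^{mn}$, where $\operatorname{vec}(A)$ is an $mn \times 1$ vector obtained by "stacking" $A$'s columns

$$\operatorname{vec}(A) := [a_{11}, ..., a_{m1}, a_{12}, ..., a_{mn}]^T,$$

so $\operatorname{vec}(A)$ is a rearranged version of $A$. Introduce the linear invertible operator $\mathcal{L} : \mathbb{R}^{mn \times mn} \to \mathbb{R}^{m^2 \times n^2}$ by

$$\mathcal{L}(A_r) \equiv \tilde{A}_r := \sum_{k=1}^{r} \operatorname{vec}(V_k) \otimes \operatorname{vec}(U_k)^T.$$

$\mathcal{L}$ is unitary with respect to the spectral or Frobenius norm, but there is no permutation matrix $P$ with $\tilde{A}_r = P A_r P^T$.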


Making use of the transform $\mathcal{L}$ allows us to reduce the low Kronecker rank approximation of $A$ to the low-rank approximation of $\tilde{A} = \mathcal{L}(A)$. For fixed $r$ one may apply the truncation operator $R$ of the form

$$R(A) := \mathcal{L}^{-1}(\Pi_r(\mathcal{L}(A))),$$

where $\Pi_r(\tilde{A})$ is the best rank-$r$ approximation to $\tilde{A}$ in the given norm (say, spectral or Frobenius norm).
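The rearrangement and the resulting truncation operator can be sketched with NumPy reshapes. This is an illustration under stated assumptions: the function names are hypothetical, the layout follows the formula $\sum_k \operatorname{vec}(V_k) \otimes \operatorname{vec}(U_k)^T$ with column-stacked $\operatorname{vec}$ (so the rearranged matrix produced here has $n^2$ rows and $m^2$ columns), and $\Pi_r$ is realised by a truncated SVD, i.e. the best rank-$r$ approximation in the Frobenius norm.

```python
import numpy as np

def rearrange(M, m, n):
    """Map sum_k U_k (x) V_k (U_k: m x m, V_k: n x n) to sum_k vec(V_k) vec(U_k)^T."""
    T = M.reshape(m, n, m, n)            # T[i, p, j, q] = M[i*n + p, j*n + q]
    # rows <- (p, q) column-stacked, columns <- (i, j) column-stacked
    return T.transpose(3, 1, 2, 0).reshape(n * n, m * m)

def rearrange_inv(Y, m, n):
    """Inverse of rearrange: back to the (m*n) x (m*n) matrix."""
    T = Y.reshape(n, n, m, m)            # axes (q, p, j, i)
    return T.transpose(3, 1, 2, 0).reshape(m * n, m * n)

def kron_truncate(M, m, n, r):
    """R(M): rearrange, keep the r leading singular triplets, map back."""
    P, s, Qt = np.linalg.svd(rearrange(M, m, n), full_matrices=False)
    Yr = (P[:, :r] * s[:r]) @ Qt[:r, :]  # r-term SVD truncation
    return rearrange_inv(Yr, m, n)
```

With this layout `rearrange(np.kron(U, V), m, n)` equals the outer product of the column-stacked `V` and `U`, so a matrix of Kronecker rank $r$ is mapped to a matrix of ordinary rank $r$, and the Frobenius norm is preserved because the map only permutes entries.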

We formulate the general statement. Let $B = \mathcal{F}(A)$ be defined by the given matrix-valued function $\mathcal{F}$ and let $R$ be the truncation operator that satisfies

$$\|X - RX\| \le C_R \|X - B\| \qquad (88)$$

for all $X$ in a "small" neighbourhood $S(B)$ of $B$.

In particular, we consider $\mathcal{F}(A) = A^{-1}$, $\mathcal{F}(A) = \sqrt{A}$ and $\mathcal{F}(A) = \operatorname{sign}(A)$.


Consider the case (78). Introduce the modified (truncated) Newton-Schulz iteration

$$Z_{k+1} = X_k(2I - AX_k), \quad X_{k+1} = R(Z_{k+1}), \quad k = 0, 1, 2, ... \qquad (89)$$

Thm. 7.1. Let (88) be satisfied. Then for any initial guess $X_0 = R(X_0) \in S(B)$, the truncated Newton-Schulz iteration (89) converges to $A^{-1}$ quadratically,

$$\|A^{-1} - X_{k+1}\| \le (1 + C_R)\,\|A\|\,\|A^{-1} - X_k\|^2, \quad k = 0, 1, 2, ...$$

Proof. Note that (88) with $X = B$ leads to

$$B \equiv A^{-1} = R(A^{-1}).$$

Now equation (89) implies

$$A^{-1} - Z_{k+1} = (A^{-1} - X_k)\,A\,(A^{-1} - X_k),$$

which yields

$$\|A^{-1} - Z_{k+1}\| \le \|A\|\,\|A^{-1} - X_k\|^2. \qquad (90)$$

On the other hand, (88) implies

$$\|X_k - Z_k\| = \|R(Z_k) - Z_k\| \le C_R \|A^{-1} - Z_k\|,$$

hence the triangle inequality leads to

$$\|A^{-1} - X_k\| \le \|A^{-1} - Z_k\| + \|Z_k - X_k\| \le (1 + C_R)\|A^{-1} - Z_k\|.$$

Combining this bound with (90) completes the proof.
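The truncated iteration (89) can be tried on a model problem whose inverse has Kronecker rank 1. The sketch below makes several assumptions for the sake of a self-contained example: $A = U \otimes V$ with small tridiagonal factors, truncation by the SVD of the rearranged matrix (Frobenius norm), and a starting value obtained by perturbing the exact inverse so that it lies in a small neighbourhood of $A^{-1}$, as the local theorem requires. All function names are illustrative.

```python
import numpy as np

def rearrange(M, m, n):
    """Rearrangement: sum_k U_k (x) V_k -> sum_k vec(V_k) vec(U_k)^T."""
    return M.reshape(m, n, m, n).transpose(3, 1, 2, 0).reshape(n * n, m * m)

def rearrange_inv(Y, m, n):
    return Y.reshape(n, n, m, m).transpose(3, 1, 2, 0).reshape(m * n, m * n)

def truncate(M, m, n, r):
    """Truncation to Kronecker rank <= r via the r-term SVD of rearrange(M)."""
    P, s, Qt = np.linalg.svd(rearrange(M, m, n), full_matrices=False)
    return rearrange_inv((P[:, :r] * s[:r]) @ Qt[:r, :], m, n)

def truncated_newton_schulz(A, X0, m, n, r, iters=6):
    """Iteration (89); returns the last iterate and the errors ||A^{-1} - X_k||_F."""
    I = np.eye(A.shape[0])
    A_inv = np.linalg.inv(A)               # used only to monitor the error
    X = truncate(X0, m, n, r)
    errors = [np.linalg.norm(A_inv - X)]
    for _ in range(iters):
        Z = X @ (2.0 * I - A @ X)          # Newton-Schulz step
        X = truncate(Z, m, n, r)           # recompression to Kronecker rank <= r
        errors.append(np.linalg.norm(A_inv - X))
    return X, errors

def tridiag(k):
    return 2.0 * np.eye(k) - 0.5 * (np.eye(k, k=1) + np.eye(k, k=-1))

m, n = 3, 4
A = np.kron(tridiag(m), tridiag(n))        # A^{-1} = U^{-1} (x) V^{-1} has Kronecker rank 1
X0 = np.linalg.inv(A) + 1e-4 * np.ones((m * n, m * n))   # perturbed starting value
X, errors = truncated_newton_schulz(A, X0, m, n, r=1)
```

The list `errors` shows the quadratic decay predicted by Thm. 7.1, while every iterate `X` stays of Kronecker rank 1, so only two small factors would need to be stored in a structured implementation.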

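Let us check (88) for the choice $R(A) = \mathcal{L}^{-1}(\Pi_r(\mathcal{L}(A)))$. We denote $Y = \mathcal{L}(X)$ and $Y_B = \mathcal{L}(B)$ and note that $B = R(B)$ yields $\Pi_r Y_B = Y_B$.

In the following proof we make use of the standard stability estimates for the singular values of the perturbed matrix (Wielandt, Hoffman '55).

Now we estimate in the Frobenius norm (using $\sigma_k(Y_B) = 0$ for $k > r$)

$$\begin{aligned}
\|\mathcal{L}^{-1}\|^{-1} \|X - RX\| &\le \|(I - \Pi_r)Y\| = \sqrt{\sum_{k=r+1}^{n} \sigma_k(Y)^2} \\
&= \sqrt{\sum_{k=r+1}^{n} (\sigma_k(Y) - \sigma_k(Y_B))^2} \le \sum_{k=r+1}^{n} |\sigma_k(Y) - \sigma_k(Y_B)| \\
&\le \sum_{k=1}^{n-r} \sigma_k(Y - Y_B) \le \sqrt{n-r}\; \|\mathcal{L}(X - B)\|.
\end{aligned}$$

Estimate (88) now follows with $C_R = \sqrt{n-r}\, \|\mathcal{L}^{-1}\|\,\|\mathcal{L}\|$.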

Few remarks

1. A similar result holds in the spectral norm. The factor $\sqrt{n-r}$ can be omitted due to the Mirsky theorem.

2. The error estimate above allows the straightforward local

analysis for algorithm (86) with the truncation operator R.

3. The truncated Newton-Schulz iterations (89) and (86) can

be analysed in the H-matrix format as well using the similar

techniques (but applied block-wise).

4. In the case of three (or more) factors (q ≥ 3) we can

analyse the sub-optimal truncation operator R via Tucker’s

decomposition.


1. W. Hackbusch and B.N. Khoromskij: Low-Rank Kronecker-Product Approximation to Multi-Dimensional

Nonlocal Operators . Parts I/II. Preprints 29/30, MPI MIS, Leipzig 2005.

2. W. Hackbusch, B.N. Khoromskij and E. Tyrtyshnikov: Approximate Iteration for Structured Matrices.

Preprint MPI MIS 2005.


http://www.mis.mpg.de/scicomp/Fulltext/khor7.ps

Lect. 8. Truncated iterations. Approximating a matrix exp(A)

Let $V$ be a normed space (e.g., $n \times n$ matrices) and consider a function $f : V \to V$. Assume that $A \in V$ and $B := f(A)$ can be obtained by the locally convergent fixed-point iteration

$$\text{Given } X_0 \in V, \quad X_k = \Phi(X_{k-1}), \quad k = 1, 2, ..., \qquad (91)$$

where $\Phi : V \to V$ is a one-step operator,

$$\lim_{k \to \infty} X_k = B = \Phi(B). \qquad (92)$$

Lem. 8.1. Assume that there are constants $c_\Phi, \varepsilon_\Phi > 0$ s.t.

$$\|\Phi(X) - B\| \le c_\Phi \|X - B\|^2 \quad \forall X \text{ with } \|X - B\| \le \varepsilon_\Phi, \qquad (93)$$

and set $\varepsilon := \min(\varepsilon_\Phi, 1/c_\Phi)$. Then (92) holds for any $X_0$ satisfying $\|X_0 - B\| < \varepsilon$, and, moreover,

$$\|X_k - B\| \le c_\Phi^{-1} (c_\Phi \|X_0 - B\|)^{2^k} \quad (k = 0, 1, 2, ...). \qquad (94)$$

Truncated iterations.

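Proof: Let $e_k := \|X_k - B\|$. Then, due to (93),

$$e_k \le c_\Phi e_{k-1}^2, \quad \text{provided that } e_{k-1} \le \varepsilon_\Phi. \qquad (95)$$

By (95), $e_{k-1} \le \varepsilon \le \varepsilon_\Phi$ implies $e_k \le c_\Phi \varepsilon^2 = \varepsilon\,(c_\Phi \varepsilon) \le \varepsilon$. Hence, all iterates stay in the $\varepsilon$-neighbourhood of $B$.

(94) is proved by induction, using (95) in the first step and the induction hypothesis in the second:

$$e_k \le c_\Phi e_{k-1}^2 \le c_\Phi \cdot \Big( c_\Phi^{-1} (c_\Phi e_0)^{2^{k-1}} \Big)^2 = c_\Phi^{-1} (c_\Phi e_0)^{2^k}.$$

Whenever $e_0 < \varepsilon$, (94) shows $e_k \to 0$.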

Rem. 8.1. (94) together with e0 ≤ ε implies monotonicity:

‖Xk −B‖ ≤ ‖Xk−1 −B‖ . (96)

Rem. 8.2. Condition (93) is valid for the Newton iteration.


Let S ⊂ V be a subset (not necessarily a subspace) considered

as a class of certain structured elements (e.g. structured

matrices) and suppose that R : V → S is an operator mapping

elements from V onto suitable structured approximants in S.

We call R a truncation operator.

Define a truncated iterative process as follows:

$$Y_0 := R(X_0), \quad Y_k := R(\Phi(Y_{k-1})) \quad (k = 1, 2, ...). \qquad (97)$$

Thm. 8.1. Under the premises of Lem. 8.1, assume that

$$\|X - R(X)\| \le c_R \|X - B\| \quad \forall X \text{ with } \|X - B\| \le \varepsilon_\Phi. \qquad (98)$$

Then there exists $\delta > 0$ such that the truncated iteration (97) converges to $B$ so that for $k = 1, 2, ...$

$$\|Y_k - B\| \le c_{R\Phi} \|Y_{k-1} - B\|^2 \quad \text{with } c_{R\Phi} := (c_R + 1)c_\Phi \qquad (99)$$

for any starting value $Y_0 = R(Y_0)$ satisfying $\|Y_0 - B\| < \delta$.


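Proof: Let $\varepsilon := \min(\varepsilon_\Phi, 1/c_\Phi)$ and define $Z_k = \Phi(Y_{k-1})$. By (96) we have

$$\|Z_k - B\| \le \|Y_{k-1} - B\|,$$

provided that $\|Y_{k-1} - B\| \le \varepsilon$. Then, by (98),

$$\|Y_k - B\| = \|R(Z_k) - Z_k + Z_k - B\| \le (c_R + 1)\|Z_k - B\|. \qquad (100a)$$

Assuming $\|Y_{k-1} - B\| \le \varepsilon$, the bounds $\varepsilon \le \varepsilon_\Phi$ and (93) ensure

$$\|Z_k - B\| = \|\Phi(Y_{k-1}) - B\| \le c_\Phi \|Y_{k-1} - B\|^2. \qquad (100b)$$

Combining (100a) and (100b), we obtain (99) for any $k$, provided that $\|Y_{k-1} - B\| \le \varepsilon$.

Similar to the proof of Lem. 8.1, the choice $\delta := \min(\varepsilon, 1/c_{R\Phi})$ guarantees that $\|Y_0 - B\| \le \delta$ implies $\|Y_k - B\| \le \delta \le \varepsilon$, $k \in \mathbb{N}$.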


Cor. 8.1. Under the assumptions of Thm. 8.1, any starting value $Y_0$ with $\|Y_0 - B\| \le \delta$ leads to

$$\|Y_k - B\| \le c_{R\Phi}^{-1} (c_{R\Phi} \|Y_0 - B\|)^{2^k} \quad (k = 1, 2, ...), \qquad (101)$$

where $c_{R\Phi}$ and $\delta$ are defined as above.

The condition (98) has a clear geometrical meaning. If

$$R(X) := \operatorname{argmin}\{\|X - Y\| : Y \in S\}$$

is the best approximation to $X$ in the given norm, inequality (98) holds with $c_R = 1$, since $B \in S$. Therefore, (98) with $c_R \ge 1$ can be viewed as a quasi-optimality condition.

If the norm is defined by a scalar product, S is a subspace

and R(X) is the orthogonal projection onto S, then (98) is

obviously fulfilled with cR = 1.

Analysis of truncation operators

The next lemma is easy to prove.

Lem. 8.2. Let B = R(B) be fixed and assume that R is

Lipschitz at B or R is a bounded linear operator. Then the

inequality (98) holds.

Let $V = \mathbb{R}^{I \times I}$ be the space of matrices and $S \subset V$ a subspace with a prescribed sparsity pattern $P \subset I \times I$, i.e., $X \in S$ if and only if $X_{ij} = 0$ for all $(i,j) \notin P$. A familiar example of a truncation in this case is $R(X)$ defined entry-wise by

$$R(X)_{ij} = \begin{cases} X_{ij} & \text{for } (i,j) \in P, \\ 0 & \text{for } (i,j) \notin P. \end{cases} \qquad (102)$$

Since $R$ is linear, it satisfies the hypotheses of Lem. 8.2.


Rem. 8.3. Usually, the subset $S$ as above is not helpful since a sparse argument $A \in S$ yields a fully populated result $f(A)$. However, it is well-known that after a discrete wavelet transform (DWT)

$$X \to \mathcal{L}(X) := W^{-1} X W$$

one can apply a matrix compression.

Figure 15: Wavelet transform of a matrix: "finger"-like structure.


Such a matrix compression is of the form (102) and will be

denoted by Π. Then, the truncation R applied to X is the

composition of the DWT L, the pattern projection Π and the

back-transformation $\mathcal{L}^{-1}$:

$$R := \mathcal{L}^{-1} \circ \Pi \circ \mathcal{L}. \qquad (103)$$

The same form of R is typical as well for many other choices

of L and Π.

Next, we give the characterization of Π that ensures the

property (98) for R.


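Lem. 8.3. Let $V$ and $W$ be normed spaces and $\mathcal{L} : V \to W$ a bounded linear operator with a bounded inverse. Given $B \in V$, assume that $\Pi : W \to W$ satisfies

$$\|Z - \Pi(Z)\| \le c_\Pi \|Z - \mathcal{L}(B)\| \quad \forall Z \in W \text{ with } \|\mathcal{L}^{-1}(Z) - B\| \le \varepsilon_\Phi. \qquad (104)$$

Then the truncation operator $R$ of the form (103) satisfies condition (98) with $c_R := c_\Pi \|\mathcal{L}\|\,\|\mathcal{L}^{-1}\|$.

Proof: Let $Z = \mathcal{L}(X)$. Then, obviously,

$$\|R(X) - X\| = \|\mathcal{L}^{-1}(\Pi(Z) - Z)\| \le c_\Pi \|\mathcal{L}^{-1}\|\,\|Z - \mathcal{L}(B)\|,$$

and it remains to observe that

$$\|Z - \mathcal{L}(B)\| = \|\mathcal{L}(X) - \mathcal{L}(B)\| \le \|\mathcal{L}\|\,\|X - B\|.$$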


Applications of Lem. 8.3 (in the case of H-matrices) are facilitated by the following construction. Define a suitable system of normed spaces $W_1, ..., W_N$ and set

$$W := W_1 \times ... \times W_N = \{H = (H_1, ..., H_N) : H_i \in W_i\} \qquad (105)$$

with $\|H\| = \sqrt{\sum_{i=1}^{N} \|H_i\|^2}$. Let each $W_i$ be associated with a truncation operator $\Pi_i : W_i \to W_i$ satisfying (for some fixed $Z_i \in W_i$)

$$\|H_i - \Pi_i(H_i)\| \le c_i \|H_i - Z_i\| \quad \forall H_i \in W_i \text{ and } 1 \le i \le N. \qquad (106)$$


Lem. 8.4. Let W be the normed space from (105) and let

the truncation operators Πi satisfy (106), where the elements

Zi ∈Wi are defined by

L(B) = (Z1, . . . , ZN ).

Suppose that the product of the truncation operators Πi

defines Π : W → W via

Π(H) := (Π1(H1), . . . , ΠN (HN )) for H = (H1, . . . , HN ), Hi ∈ Wi.

Then R from (103) satisfies (98).

Proof: Let $\mathcal{L}(X) = H = (H_1, ..., H_N)$. Then, according to the definitions of $\mathcal{L}$ and $\Pi$,

$$\|H - \Pi(H)\| \le \sqrt{\sum_{i=1}^{N} c_i^2 \|H_i - Z_i\|^2} \le \max_{1 \le i \le N} c_i \sqrt{\sum_{i=1}^{N} \|H_i - Z_i\|^2},$$

which proves (104) and allows us to use Lem. 8.3.


An important example of Π in the case of a matrix space W

is given by optimal low-rank approximations.

Lem. 8.5. Let $W$ be a normed space of all matrices of a fixed size and let $S \subset W$ consist of all matrices whose rank does not exceed $r$. Then for any $H \in W$ there exists a matrix $T \in S$ such that

$$\|H - T\| = \min_{\operatorname{rank} Z \le r} \|H - Z\|.$$

Proof: Consider a minimising sequence $Z_k \in S$, i.e., $\lim_{k \to \infty} \|H - Z_k\| = \rho := \inf_{\operatorname{rank} Z \le r} \|H - Z\|$. Since the sequence $Z_k$ is bounded, a convergent subsequence $Z_{k_i} \to T$ exists. Its limit satisfies $\|H - T\| = \rho$. The assertion $T \in S$ is due to the fact that a matrix of rank equal to $p > r$ possesses a vicinity wherein any matrix is of rank $\ge p$ (use the continuity of the determinant and the existence of a nonzero minor of order $p$ for a matrix of rank $p$).


Matrix theory provides well-developed tools for the construction of low-rank approximations in the case of any unitarily invariant norm. For the most familiar unitarily invariant norms, such as the spectral and the Frobenius norm, it can be established through simple arguments: It is well-known that

$$\min_{\operatorname{rank} Z \le r} \|H - Z\|_2 = \sigma_{r+1}(H), \quad \min_{\operatorname{rank} Z \le r} \|H - Z\|_F = \sqrt{\sum_{i \ge r+1} \sigma_i^2(H)}.$$

Thus, the truncation property (98) is easy to achieve (with $c_R = 1$) when we are aware of the existence of the best approximation element.

Sometimes (e.g., for three-way approximations of bounded tensor rank) this is not the case. However, all cases are supported by an extension of Thm. 8.1, as we can always capitalise on a quasi-optimal construction: Let $\rho(H) = \inf_{T \in S} \|H - T\|$. For a given fixed $\varepsilon > 0$, we can adopt an $\varepsilon$-optimal approximation $\Pi(H)$ to $H$ in the sense that

$$\rho(H) \le \|H - \Pi(H)\| \le \rho(H) + \varepsilon.$$

Application to hierarchical block matrices

Let $V = \mathbb{R}^{n \times n}$ be the space of $n \times n$ matrices, and consider each matrix as a union of $N$ disjoint blocks of possibly different sizes, where each matrix block belongs to the matrix space $W_i$ ($1 \le i \le N$). Given $X \in V$, let $\mathcal{L}_i(X) \in W_i$ be the $i$th block of $X$ and define the space $W$ according to (105).

Figure 16: Standard- (left) and Weak-admissible H-partitionings.


The above-considered operator $\mathcal{L} : V \to W$ reads

$$\mathcal{L}(X) := (\mathcal{L}_1(X), ..., \mathcal{L}_N(X)) \quad \text{(block-tracing operator)}.$$

If the Frobenius norm is used on the spaces $V$ and $W_1, ..., W_N$, the norm induced on $W$ is again the Frobenius norm. Since the blocks are disjoint, $\mathcal{L}$ is isometric. Hence, the inverse $\mathcal{L}^{-1}$ exists and satisfies

$$\|\mathcal{L}\| = \|\mathcal{L}^{-1}\| = 1.$$

Fix a positive integer r and let Si ⊂ Wi be the subset of

matrices of rank ≤ r. Define S as the Cartesian product

S = S1 × . . .× SN ⊂ W.


Let $H = Q_1 \Sigma(H) Q_2$ be the SVD of $H$ (with unitary $Q_1$ and $Q_2$) and let $\Sigma_r(H)$ be the corresponding $r$-term truncation. Besides, let $\Pi_i : W_i \to S_i$ be of the form

$$\Pi_i(H) := Q_1 \Sigma_r(H) Q_2, \qquad (107)$$

providing the best possible approximant to $H$ in the set $S_i$ of matrices of rank $\le r$, in the Frobenius norm. This involves the SVD of each matrix block $H \in W_i$. Defining $\Pi : W \to S$ as in Lem. 8.4 and using Lem. 8.5, we can apply Thm. 8.1 to

$$R = \mathcal{L}^{-1} \circ \Pi \circ \mathcal{L}.$$

Note that exactly this kind of truncation is used in the theory

of H-matrices. The typical block partitioning in the

construction of hierarchical matrices is presented in Fig. 16.

Application to tensor approximations

Let $V_1 = \mathbb{R}^{p \times q}$ and $V_2 = \mathbb{R}^{r \times s}$, while $V = \mathbb{R}^{pr \times qs}$ for some $p, q, r, s \in \mathbb{N}$. The Kronecker product is a mapping from $V_1 \otimes V_2$ into $V$: for $A \in V_1$ and $B \in V_2$, the Kronecker product $A \otimes B$ is given by the block matrix

$$\begin{bmatrix} a_{11}B & a_{12}B & \dots \\ a_{21}B & a_{22}B & \dots \\ \vdots & \vdots & \ddots \end{bmatrix} \in V.$$

We say that a matrix $M \in V$ has a Kronecker rank $\le k$, if there is a representation

$$M = \sum_{\nu=1}^{\ell} A_\nu \otimes B_\nu \quad \text{with } A_\nu \in V_1, \ B_\nu \in V_2 \text{ and } \ell \le k. \qquad (108)$$

We define the subset of structured matrices $S$ by the set of all matrices of Kronecker rank $\le k$. If $k$ is not too large, this is an interesting representation since matrices of the large size $pr \times qs$ can be described by matrices $A_\nu$, $B_\nu$ of small size.


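As described in Lect. 6 (cf. operation $\operatorname{vec}(A)$), there is a simple isomorphism $\mathcal{L}$ from $V = \mathbb{R}^{pr \times qs}$ to $\mathbb{R}^{pq \times rs}$ such that the representation (108) of $M \in S \subset V = \mathbb{R}^{pr \times qs}$ is equivalent to $\operatorname{rank}(\mathcal{L}(M)) \le k$. Hence, we obtain the situation of Lem. 8.5 with $W := \mathcal{L}(V) = \mathbb{R}^{pq \times rs}$.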

The truncation operator is again of the form R = L−1 Π L,

where Π : W → W is the optimal SVD-based truncation or an

appropriate substitute.

Our framework can be applied also to tensor (multi-linear)

representation (108) where the number of factors is greater

than 2. In this case the truncation procedures are not so well

developed; however, some are available and claimed to be

efficient in particular applications (mostly for data analysis in

chemometrics, physicometrics, etc.)


Summary (analysis of the truncated iterations):

Initially, the main purpose of this truncation was the

reduction of storage and of the matrix-by-vector complexity

for a given matrix in V .

In the sequel, the same truncation is used for computing various matrix functions $f(A)$ of $A \in S \subset V$, where $B := f(A)$ is known to be close to $S$ (e.g., for $f(A) = A^{-1}$, $f(A) = \sqrt{A}$ and for $f(A) = \operatorname{sign}(A)$).

The above results suggest some general framework for a

rigorous analysis of the basic truncated iterative algorithms.

Finally we remark that the optimal truncation is often

replaced by an approximate one which is cheaper to compute

(e.g., by cross approximation techniques (ACA), multi-way

decomposition algorithms, wavelet truncation).

Approximating matrix-valued function exp(A) and $A^{-1}$

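The elliptic operator $\mathcal{A} : V \to V'$ with $V = H_0^1(\Omega)$, $V' = H^{-1}(\Omega)$,

$$\mathcal{A} = \sum_{j=1}^{d} \Big( -\frac{\partial}{\partial x_j} a_j(x_j) \frac{\partial}{\partial x_j} + b_j(x_j) \frac{\partial}{\partial x_j} + c_j(x_j) \Big),$$

is supposed to have "separable" coefficients. The associated bilinear form (with $c(x) = \sum c_j(x_j)$)

$$a(u,v) = \int_\Omega \Big( \sum_{j=1}^{d} a_j(x) \frac{\partial u}{\partial x_j} \frac{\partial v}{\partial x_j} + \sum_{j=1}^{d} b_j(x) \frac{\partial u}{\partial x_j} v + c(x)uv \Big)\,dx$$

with $a : V \times V \to \mathbb{R}$ is assumed to be continuous and $V$-elliptic:

$$|a(u,v)| \le C\|u\|_V \|v\|_V, \quad \operatorname{Re} a(v,v) \ge \delta_0 \|v\|_V^2, \quad \delta_0 > 0.$$

In the tensor-product setting we have $(x_1, ..., x_d) \in \Omega := (0,1)^d \subset \mathbb{R}^d$.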


Let $X = L^2(\Omega)$; then the corresponding elliptic operator $\mathcal{A}$ and its discrete counterpart $A$ (say, $A$ is the FEM/FD stiffness matrix corresponding to $\mathcal{A}$) satisfy

$$\|(zI - A)^{-1}\|_{X \leftarrow X} \le \frac{1}{|z| \sin(\theta_1 - \theta)} \quad \forall z \in \mathbb{C} : \theta_1 \le |\arg z| \le \pi, \qquad (109)$$

for any $\theta_1 \in (\theta, \pi)$, where $\cos\theta = \delta_0/C$.

In the case of discrete elliptic operators $A$, the bound (109) on the matrix resolvent is valid uniformly in the mesh-size $h$ (cf. example below).

The H-matrix and HKT formats are well suited to represent the following MVFs:

$$\exp(-tA), \quad A^{-1}, \quad \sqrt{A}, \quad \operatorname{sign}(A).$$


Ex. 8.1. Consider the elliptic operator of divergence type,

$$\mathcal{A} := -\sum_{j=1}^{d} \partial_j a_j(x_j) \partial_j, \quad x \in \Omega := (0,1)^d,$$

defined on $V$. We assume that $a_j \ge a_0 > 0$ and introduce a uniform grid with step size $h$ and $N = n^d$ interior nodes. Using the $(2d+1)$-point stencil, we obtain the FD discretisation

$$(A_h z)_{i_1 ... i_d} := \sum_{j=1}^{d} \frac{2a^j_{i_j} z_{i_1 ... i_d} - b^j_{i_j} z_{i_1 ... (i_j - 1) ... i_d} - c^j_{i_j} z_{i_1 ... (i_j + 1) ... i_d}}{h^2},$$

$1 \le i_j \le n$, where $z$ denotes the vector corresponding to $[z_{i_1 ... i_d}]_{i_j = 1}^{n} \in \mathbb{R}^N$ given in the tensor-product numbering.

As usual, we can regard $d$-dimensional $n \times ... \times n$ arrays (tensors) also as one-dimensional ones (vectors) with $n^d$ components.


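The matrix $A = A_h$ defined above takes the form $A = \sum_{j=1}^{d} A_j$ with

$$A_1 = V^1 \otimes I \otimes ... \otimes I, \quad A_2 = I \otimes V^2 \otimes ... \otimes I, \quad ..., \quad A_d = I \otimes ... \otimes I \otimes V^d,$$

$$V^j = \frac{1}{h^2} \begin{bmatrix}
2a^j_1 & -c^j_1 & & & \\
-b^j_2 & 2a^j_2 & -c^j_2 & & \\
 & \ddots & \ddots & \ddots & \\
 & & -b^j_{n-1} & 2a^j_{n-1} & -c^j_{n-1} \\
 & & & -b^j_n & 2a^j_n
\end{bmatrix}_{n \times n},$$

and $I$ being the $n \times n$ identity. It is easy to see that $A_j > 0$ for all $j = 1, ..., d$. Moreover, the $A_j$ commute pairwise, i.e., $A_j A_m = A_m A_j$, hence (cf. Thm. 5.3 in Lect. 5)

$$\exp(A) = \prod_{j=1}^{d} \exp(A_j) = \bigotimes_{j=1}^{d} \exp(V^j). \qquad (110)$$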
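Identity (110) can be verified numerically for small factors. The sketch below is an illustration under assumptions that are not part of the text: symmetric constant-coefficient factors $V^j = h^{-2}\operatorname{tridiag}(-a, 2a, -a)$ (so that the exponential can be computed from an eigendecomposition), $d = 3$, different sizes in each direction, and a scaling by $-t$ as in $\exp(-tA)$. The helper names are hypothetical.

```python
from functools import reduce
import numpy as np

def expm_sym(V):
    """exp(V) for a symmetric matrix V via its eigendecomposition."""
    lam, Q = np.linalg.eigh(V)
    return (Q * np.exp(lam)) @ Q.T

def kron_sum(Vs):
    """A = sum_j I (x) ... (x) V^j (x) ... (x) I with V^j in position j."""
    sizes = [V.shape[0] for V in Vs]
    N = int(np.prod(sizes))
    A = np.zeros((N, N))
    for j, V in enumerate(Vs):
        left = np.eye(int(np.prod(sizes[:j])))       # identity on directions before j
        right = np.eye(int(np.prod(sizes[j + 1:])))  # identity on directions after j
        A += np.kron(np.kron(left, V), right)
    return A

def fd_matrix(n, a=1.0):
    """Model factor h^{-2} tridiag(-a, 2a, -a) with h = 1/(n+1) (assumed coefficients)."""
    h = 1.0 / (n + 1)
    return a * (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

t = 0.01
Vs = [-t * fd_matrix(3), -t * fd_matrix(4, a=2.0), -t * fd_matrix(2, a=0.5)]
lhs = expm_sym(kron_sum(Vs))                          # exponential of the full 24 x 24 matrix
rhs = reduce(np.kron, [expm_sym(V) for V in Vs])      # Kronecker product of small exponentials
```

Here `lhs` and `rhs` agree up to rounding, while `rhs` only requires exponentials of the small factors; this is the source of the complexity reduction in the tensor-product format.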


Ex. 8.2. In the situation of Example 8.1, we consider an application to parabolic problems in $\mathbb{R}^d$ posed in the semi-discrete form. Using the semigroup theory, the solution of the first order evolution equation

$$\frac{du}{dt} + Au = f, \quad u(0) = u_0 \in \mathbb{R}^N,$$

with a given initial vector $u_0$ and with a given right-hand side $f \in L^2(Q_T)$, $Q_T := (0,T) \times \mathbb{R}^N$, can be represented as

$$u(t) = \exp(-tA)u_0 + \int_0^t \exp(-(t-s)A) f(s)\,ds, \quad t \in (0,T].$$

Assume that our input data can be represented in the tensor-product form

$$u_0 \approx \sum_{k=1}^{r} u_1^k(x_1) \otimes ... \otimes u_d^k(x_d), \qquad f(s) \approx \sum_{k=1}^{r} f_1^k(s; x_1) \otimes ... \otimes f_d^k(s; x_d)$$

with $u_i^k, f_i^k \in \mathbb{R}^n$, $i = 1, ..., d$, and with $r = O(|\log\varepsilon|^q)$. Then we obtain the tensor-product approximation $u(t) \approx \tilde{u}(t)$ by

$$\tilde{u}(t) = \sum_{k=1}^{r} \Big\{ \bigotimes_{j=1}^{d} \exp(-tV^j) u_j^k(x_j) + \int_0^t \bigotimes_{j=1}^{d} \exp((s-t)V^j) f_j^k(s; x_j)\,ds \Big\},$$

which can be implemented with complexity $O(rdn \log^p n)$.

Probl. 1. Represent A−1 in the HKT -format.

Probl. 2. Approximate sign(A) in the HKT -format.

Approximating matrix-valued functions by exponential sums

Assume that for given $f(\rho)$, $\rho \in [1, R]$, there is an accurate $r$-term approximation $f_r(\rho)$ by exponential sums

$$|f(\rho) - f_r(\rho)| \le \varepsilon_R, \quad \rho \in [1, R], \qquad (111)$$

with $f_r(\rho) := \sum_{k=1}^{r} a_k e^{-b_k \rho}$. The question is how accurately the ansatz $f_r(A)$ represents the matrix-valued function $f(A)$.

We consider two cases:

(A) Real-diagonalisable matrix $A$, i.e., $A = T^{-1}DT$ with a diagonal $D = \operatorname{diag}\{d_1, ..., d_n\}$, where $d_i \in [1, R]$.

(B) There is the Dunford-Cauchy integral representation for the analytic function $f$:

$$f(A) = \frac{1}{2\pi i} \int_\Gamma f(z)(zI - A)^{-1}\,dz.$$


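Lem. 8.6. In Case (A) we have

$$\|f(A) - f_r(A)\| \le \|T\|\,\|T^{-1}\|\,\varepsilon_R.$$

In Case (B) let (111) hold with $\varepsilon_R = g(z)\varepsilon_\Gamma$, at least for $\rho = z$ such that $z \in \Gamma$. Then we have

$$\|f(A) - f_r(A)\| \le \frac{\varepsilon_\Gamma}{2\pi} \max_{z \in \Gamma} |g(z)| \int_\Gamma \|(zI - A)^{-1}\|\,d|z|.$$

In the case of a discrete elliptic operator $A$, we have

$$\int_\Gamma \|(zI - A)^{-1}\|\,d|z| \le C \int_\Gamma \frac{d|z|}{|z|},$$

where the constant depends on the coefficients of the related operator $\mathcal{A}$ and $\Gamma$ encloses $\sigma(A)$.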


Proof: In the first case we readily obtain

$$\|f(A) - f_r(A)\| = \|T^{-1} \operatorname{diag}\{f_1, ..., f_n\}\, T\|$$

with $f_i = f(d_i) - f_r(d_i)$, which proves the statement. If $T$ is a unitary transform then $\|T\| = \|T^{-1}\| = 1$.

In the second case we obtain

$$\|f(A) - f_r(A)\| = \frac{1}{2\pi} \Big\| \int_\Gamma \Big[ f(z) - \sum_{k=1}^{r} a_k e^{-b_k z} \Big](zI - A)^{-1}\,dz \Big\| \le \frac{\varepsilon_\Gamma}{2\pi} \int_\Gamma |g(z)|\, \|(zI - A)^{-1}\|\,d|z|,$$

which proves the main assertion. Finally, in the case of discrete elliptic operators we apply the resolvent estimate $\|(zI - A)^{-1}\| \le C/|z|$.




2. W. Hackbusch, B.N. Khoromskij and E. Tyrtyshnikov: Approximate Iterations for Structured Matrices.

Preprint MPI MIS, Leipzig 2005.



Lect. 9. Kronecker-product representation to $A^{-1}$ and sign(A)

Outlook

1. Solution operator exp(−tA) for the linear parabolic eq.:

– well parallelisable;

– avoids time stepping !

2. Representing $f(A)$ via approximation to $f(z)$, $z\in\mathbb{C}$, by exponential sums $\sum_k a_k e^{-b_k z}$.

3. Robust and asymptotically optimal Sinc-quadrature to represent $1/\rho^{\alpha}$, $\rho\in[1,R]$, $\alpha>0$ (cf. Ex. 7.6).

4. HKT representation to $f(A) = A^{-1}$ and numerics.

5. Robust and asymptotically almost optimal Sinc-quadrature to represent sign($\rho$), $|\rho| > a > 0$.

6. Generalised HKT representation to $f(A)$ = sign(A).

exp(−tA) as the solution operator for parabolic PDEs

Ex. 9.1. In the situation of Example 8.1, we consider an application to parabolic problems in $\mathbb{R}^d$ posed in the semi-discrete form ($A\in\mathbb{R}^{N\times N}$, $f\in\mathbb{R}^N$). The solution of the first order evolution equation

$$\frac{du}{dt} + Au = f, \qquad u(0) = u_0\in\mathbb{R}^N,$$

with a given initial vector $u_0$ and with a given right-hand side $f\in L^2(Q_T)$, $Q_T := (0,T)\times\mathbb{R}^N$, can be represented as

$$u(t) = \exp(-tA)\,u_0 + \int_0^t \exp(-(t-s)A)\,f(s)\,ds, \qquad t\in(0,T].$$

Assume that our input data can be represented in the tensor-product form as follows:

$$u_0 \approx \sum_{k=1}^{r} u_1^k(x_1)\otimes\dots\otimes u_d^k(x_d), \qquad f(s) \approx \sum_{k=1}^{r} f_1^k(s;x_1)\otimes\dots\otimes f_d^k(s;x_d)$$

with $u_i^k, f_i^k\in\mathbb{R}^n$, $i=1,\dots,d$, and with $r = O(|\log\varepsilon|^q)$. Then we obtain the tensor-product approximation $u(t)\approx\tilde u(t)$ by

$$\tilde u(t) := \sum_{k=1}^{r}\left\{\bigotimes_{j=1}^{d}\exp(-tV^j)\,u_j^k(x_j) + \bigotimes_{j=1}^{d}\int_0^t \exp((s-t)V^j)\,f_j^k(s;x_j)\,ds\right\},$$

which can be implemented with complexity $O(r\,d\,n\log^p n)$.
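The factorisation behind this formula is that the exponential of a Kronecker sum splits into a Kronecker product of small exponentials. The following minimal sketch checks this for $d = 2$ in pure Python. The matrices `V1`, `V2`, the time `t`, and the Taylor-series helper `expm` are illustrative assumptions, not taken from the lecture.

```python
def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def kron(A, B):
    # Kronecker product: block (i, j) of the result is A[i][j] * B
    return [[A[i][j] * B[k][l]
             for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

def expm(A, terms=60):
    # Truncated Taylor series; adequate only for small, well-scaled matrices
    n = len(A)
    result, term = identity(n), identity(n)
    for m in range(1, terms):
        term = [[x / m for x in row] for row in matmul(term, A)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

def scaled(c, V):
    return [[c * x for x in row] for row in V]

t = 0.5                                  # hypothetical time
V1 = [[2.0, -1.0], [-1.0, 2.0]]          # hypothetical 1D operators
V2 = [[3.0, -1.0], [-1.0, 3.0]]
I2 = identity(2)

# Kronecker sum A = V1 (x) I + I (x) V2
K1, K2 = kron(V1, I2), kron(I2, V2)
A = [[K1[i][j] + K2[i][j] for j in range(4)] for i in range(4)]

full = expm(scaled(-t, A))                                   # exp(-tA), size 4 x 4
factored = kron(expm(scaled(-t, V1)), expm(scaled(-t, V2)))  # two 2 x 2 exponentials

max_diff = max(abs(full[i][j] - factored[i][j])
               for i in range(4) for j in range(4))
print(max_diff)
```

Only the small factors $\exp(-tV^j)$ are ever needed, which is why the cost scales with $d\,n$ rather than with $N = n^d$.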

Probl. 1. Represent $f(A) = A^{-1}$ in the HKT-format.

Probl. 2. Approximate $f(A)$ = sign(A) in the HKT-format.

Approximating MVFs by exponential sums

Assume that for given $f(\rho)$, $\rho\in[1,R]$, there is an accurate $r$-term approximation $f_r(\rho)$ by exponential sums

$$|f(\rho)-f_r(\rho)| \le \varepsilon_R, \qquad \rho\in[1,R], \qquad (112)$$

with $f_r(\rho) := \sum_{k=1}^{r} a_k e^{-b_k\rho}$. The question is how accurately the ansatz $f_r(A)$ represents the matrix-valued function $f(A)$.

We consider two cases:

(A) Real-diagonalisable matrix $A$, i.e., $A = T^{-1}DT$ with a diagonal $D = \mathrm{diag}\{d_1,\dots,d_n\}$, where $d_i\in[1,R]$.

(B) The analytic function $f$ has the Dunford-Cauchy integral representation

$$f(A) = \frac{1}{2\pi i}\int_\Gamma f(z)\,(zI-A)^{-1}\,dz,$$

where $\Gamma$ "envelopes" $\sigma(A)$.


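Lem. 9.1. In Case (A) we have

$$\|f(A)-f_r(A)\| \le \|T\|\,\|T^{-1}\|\,\varepsilon_R.$$

In Case (B), let (112) hold with $\varepsilon_R = g(\rho)\,\varepsilon_\Gamma$, at least for $\rho = z\in\Gamma$. Then we have

$$\|f(A)-f_r(A)\| \le \frac{\varepsilon_\Gamma}{2\pi}\,\max_{z\in\Gamma}|g(z)|\int_\Gamma \|(zI-A)^{-1}\|\,d|z|.$$

In the case of a discrete elliptic operator $A$, we have

$$\int_\Gamma \|(zI-A)^{-1}\|\,d|z| \le C\log\frac{|\lambda_{\max}|}{|\lambda_{\min}|}, \qquad \lambda_{\max},\lambda_{\min}\in\sigma(A),$$

where $C$ depends on the ellipticity and continuity constants of the related operator $A$.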


Proof: In the first case we readily obtain

$$\|f(A)-f_r(A)\| = \|T^{-1}\,\mathrm{diag}\{f_1,\dots,f_n\}\,T\|$$

with $f_i = f(d_i)-f_r(d_i)$, which proves the statement. If $T$ is a unitary transform then $\|T\| = \|T^{-1}\| = 1$.

In Case (B), we derive

$$\|f(A)-f_r(A)\| = \frac{1}{2\pi}\left\|\int_\Gamma\Big[f(z)-\sum_{k=1}^{r}a_k e^{-b_k z}\Big](zI-A)^{-1}\,dz\right\| \le \frac{\varepsilon_\Gamma}{2\pi}\int_\Gamma |g(z)|\,\|(zI-A)^{-1}\|\,d|z|,$$

which proves the general assertion. Finally, in the case of discrete elliptic operators we choose $\Gamma$ in such a way that $\|(zI-A)^{-1}\| \le C/|z|$ (cf. Lect. 8), to obtain

$$\int_\Gamma \|(zI-A)^{-1}\|\,d|z| \le C\int_\Gamma\frac{d|z|}{|z|}.$$

sinc-quadrature for the Laplace integral transform

The change of variables $\xi = \log(1+e^{\sinh(w)})$ in the Laplace integral transform

$$\frac{1}{\rho} = \int_0^\infty e^{-\rho\xi}\,d\xi \quad (\rho>0), \qquad (113)$$

leads to

$$\frac{1}{\rho} = \int_{\mathbb{R}}\cosh(w)\,F(\sinh(w);\rho)\,dw, \quad\text{with}\quad F(u;\rho) := \frac{e^{-\rho\log(1+e^{u})}}{1+e^{-u}}.$$

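Lem. 9.2. Let $\rho\in[1,R]$ and define the quadrature

$$I_M := h\sum_{k=-M}^{M}\cosh(kh)\,F(\sinh(kh);\rho) \approx \int_{\mathbb{R}} f_2(w;\rho)\,dw = \frac{1}{\rho}.$$

Then choosing $h = \log(4\pi M)/M$ implies

$$\|1/\rho - I_M\|_{L^\infty[1,R]} \le C\exp\!\left(-\frac{\pi^2 M}{\sqrt{2}\,\log(3R)\,\log(4\pi M)}\right). \qquad (114)$$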
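The quadrature $I_M$ of Lem. 9.2 can be tried out directly. The sketch below is one possible pure-Python evaluation with the step $h = \log(4\pi M)/M$ from the lemma. The helper names (`softplus`, `sigmoid`, `sinc_quadrature_inverse`) and the overflow-safe rewriting of $\log(1+e^u)$ and $1/(1+e^{-u})$ are assumptions of this sketch, not part of the lecture.

```python
import math

def softplus(u):
    # log(1 + e^u), evaluated without overflow for large |u|
    return max(u, 0.0) + math.log1p(math.exp(-abs(u)))

def sigmoid(u):
    # 1 / (1 + e^{-u}), evaluated without overflow for large |u|
    if u >= 0.0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def F(u, rho):
    # F(u; rho) = exp(-rho * log(1 + e^u)) / (1 + e^{-u})
    return math.exp(-rho * softplus(u)) * sigmoid(u)

def sinc_quadrature_inverse(rho, M):
    # I_M = h * sum_{k=-M}^{M} cosh(kh) F(sinh(kh); rho), h = log(4 pi M) / M
    h = math.log(4.0 * math.pi * M) / M
    return h * sum(math.cosh(k * h) * F(math.sinh(k * h), rho)
                   for k in range(-M, M + 1))

M = 64
errors = {rho: abs(sinc_quadrature_inverse(rho, M) - 1.0 / rho)
          for rho in (1.0, 2.0, 10.0)}
for rho, err in errors.items():
    print(rho, err)
```

The same nodes and weights serve every $\rho$ in the interval, which is what makes the rule usable for matrix arguments later on.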

Proof. Choose $\delta(\rho) = \dfrac{\pi}{2\sqrt{2}\,\log(3\rho)}$ (this does not affect the quadrature!).

Then for $\rho\in(1,\infty)$, $f_2(w;\rho) = \cosh(w)\,F(\sinh(w);\rho)$, $w\in\mathbb{R}$, can be analytically extended to $D_\delta := \{z\in\mathbb{C} : |\Im m\, z| \le \delta\}$ with $\delta < \pi/2$, s.t.

$$\int_{\partial D_\delta}|f_2(z;\rho)|\,|dz| \le \mathrm{const} < \infty \qquad (115)$$

independent of $\rho$. Hence $f_2\in H^1(D_\delta)$, while $\delta\in(0,\delta(\rho)]$, $\rho\ge 1$, ensures the finite norm $N(f_2,D_\delta) \le \mathrm{const} < \infty$, uniform in $\rho$.

The decay of $f_2$ on the real axis is

$$f_2(w) \approx \tfrac12\,e^{\,w-\frac{\rho}{2}e^{w}} \ \text{ as } w\to\infty, \qquad f_2(w) \approx \tfrac12\,e^{\,|w|-\frac12 e^{|w|}} \ \text{ as } w\to-\infty,$$

corresponding to $C = \tfrac12$, $b = 1/2$, $a = 1$ in Thm. 2.6.

If $\rho\in[1,R]$, the choice $\delta = \delta(R)$ in Thm. 2.6 implies (114):

$$\|1/\rho - I_M\|_{L^\infty[1,R]} \le C\exp\!\left(-\frac{2\pi\delta(R)\,M}{\log(4\pi M)}\right).$$

A HKT-representation to $A^{-1}$

Rem. 9.1. Recall that the matrix exponential of a discrete elliptic operator can be represented in the H-matrix format with linear-logarithmic cost in view of

$$\exp(-tA) = \frac{1}{2\pi i}\int_\Gamma e^{-tz}(zI-A)^{-1}\,dz \approx \sum_k a_k e^{-tz_k}(z_kI-A)^{-1}.$$

Lem. 9.3. Suppose $A = TDT^{-1}$ with $\Re e\,\sigma(D)\subset\mathbb{R}_{>0}$ and let $A = \sum_{j=1}^{d}A_j$ as above. Given $M\in\mathbb{N}$, there is an HKT-approximant $A_M^{-1}$ of the Kronecker rank $r = 2M+1$ that provides exponential convergence

$$\|A^{-1}-A_M^{-1}\| \le C e^{-sM/\log(4\pi M)}, \qquad s = \frac{\pi^2}{\sqrt{2}\,\log[3\,\mathrm{cond}(D)]}.$$

Proof. First, construct the sinc-quadrature $f_r(\rho)\approx f(\rho) = 1/\rho$ (cf. Lem. 9.2) and then apply the corresponding matrix approximant $f_r(A)$ (cf. Lem. 9.1):

Choose $h = C\log M/M$, $z_k = \sinh(kh)$ and define

$$A^{-1} \approx h\sum_{k=-M}^{M}\cosh(kh)\,F(z_k;A) = h\sum_{k=-M}^{M}\frac{\cosh(kh)}{1+e^{-z_k}}\bigotimes_{j=1}^{d}e^{-\log(1+e^{z_k})\,V^j}.$$

Second, apply the H-matrix approximation to each individual exponential $\exp(-\alpha_kV^j)$ to obtain

$$A^{-1} \approx h\sum_{k=-M}^{M}\frac{\cosh(kh)}{1+e^{-z_k}}\bigotimes_{j=1}^{d}\sum_{m=-M_1}^{M_1}\kappa_{m,j}(z_k)\,(\zeta_{m,j}I-V^j)^{-1} =: A_M^{-1}. \qquad (116)$$

Note that each sum in the tensor-product can be converted into an H-matrix of the rank $r_1 \le (2M_1+1)\,\mathrm{rank}(\zeta_{m,j}I-V^j)^{-1}$ with $M_1 = O(|\log\varepsilon|)$. However, since $\zeta I-V^j$ is a tridiagonal matrix, the whole sum can be implemented exactly with $O(2M_1n)$ operations.
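When every $V^j$ is diagonal, the construction above reduces to a separable approximation of the function-generated tensor $1/(x_{i_1}+x_{i_2}+x_{i_3})$. The sketch below builds the $2M+1$ weights $c_k$ and exponents $t_k = \log(1+e^{z_k})$ from the sinc-quadrature and stores only one-dimensional factor vectors. The grid values, $M$, the step $h = \log(4\pi M)/M$ (taken from Lem. 9.2) and all function names are assumptions chosen for illustration.

```python
import math
from itertools import product

def softplus(u):
    # log(1 + e^u) without overflow
    return max(u, 0.0) + math.log1p(math.exp(-abs(u)))

def sigmoid(u):
    # 1 / (1 + e^{-u}) without overflow
    if u >= 0.0:
        return 1.0 / (1.0 + math.exp(-u))
    e = math.exp(u)
    return e / (1.0 + e)

def exp_sum_nodes(M):
    # 1/rho ~ sum_k c_k exp(-t_k rho) with z_k = sinh(kh)
    h = math.log(4.0 * math.pi * M) / M
    nodes = []
    for k in range(-M, M + 1):
        z = math.sinh(k * h)
        nodes.append((h * math.cosh(k * h) * sigmoid(z), softplus(z)))
    return nodes

def separable_factors(grids, M):
    # For each term k and each dimension j store the vector exp(-t_k * x), x in grid j
    nodes = exp_sum_nodes(M)
    weights = [c for c, _ in nodes]
    factors = [[[math.exp(-t * x) for x in grid] for grid in grids]
               for _, t in nodes]
    return weights, factors

def entry(weights, factors, index):
    # Evaluate one tensor entry from the rank-(2M+1) representation
    total = 0.0
    for c, term in zip(weights, factors):
        value = c
        for j, i in enumerate(index):
            value *= term[j][i]
        total += value
    return total

n, d, M = 6, 3, 64
grid = [0.4 + 0.3 * i for i in range(n)]          # hypothetical 1D grid, sums lie in [1.2, 5.7]
grids = [grid] * d
weights, factors = separable_factors(grids, M)

max_err = max(abs(entry(weights, factors, idx) - 1.0 / sum(grid[i] for i in idx))
              for idx in product(range(n), repeat=d))
print(len(weights), max_err)
```

Storage is $(2M+1)\,d\,n$ numbers instead of $n^d$ tensor entries; for a non-diagonal $V^j$ the scalar exponentials become the matrix exponentials $e^{-t_kV^j}$ of the lemma.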

Numerics I: $\frac{1}{x_1+\dots+x_d}$ - function generated tensor

Robust exponentially convergent sinc-quadrature, $\rho = x_1+\dots+x_d\in[1,R]$, $x_i>0$,

$$\frac{1}{\rho} = \int_{\mathbb{R}}\cosh(w)\,F(\sinh(w))\,dw \approx h\sum_{k=-M}^{M}\cosh(kh)\,F(\sinh(kh)),$$

$$F(u) = \frac{e^{-\rho\log(1+e^{u})}}{1+e^{-u}}, \qquad M = O(\log\varepsilon^{-1}\,\log R), \qquad h = \frac{\log M}{M}, \qquad r = 2M+1.$$

[Figure 17: three panels, horizontal axis from 0 to 1000; vertical scales $\times10^{-6}$, $\times10^{-8}$, $\times10^{-13}$.]

Figure 17: The absolute quadrature error for $R = 10^3$ with M = 16 (left), M = 32 (middle), M = 64 (right). Similar results are observed for $R = 32\cdot10^3$.

Numerics II: Elliptic inverse $A^{-1}$

HKT-approximation to $(-\Delta_h)^{-1}$ in $\mathbb{R}^d$

Apply Sinc-quadrature in Lem. 9.3.

Kronecker approximation to $(-\Delta_h)^{-1}$ in $[0,1]^d$ with $N = n^d$, $n = 128$:

M         4        9        16       25       36       49        64
d = 3     2.4e-2   3.8e-2   5.6e-2   9.9e-5   2.6e-6   8.2e-10   7.0e-12
d = 6     1.9e-2   1.5e-3   3.7e-4   7.7e-7   4.5e-9   8.2e-12   1.1e-14
d = 9     3.0e-3   3.0e-3   1.0e-5   1.6e-7   1.0e-9   1.4e-12   1.7e-15
d = 12    3.0e-7   3.9e-5   1.0e-8   7.8e-9   1.8e-10  5.0e-13   5.6e-16

Approximation to $(-\Delta_h)^{-1}$ in $[0,1]^d$ with d = 3, M = 25:

n     4        8        16       32       64       128
ε     2.5e-8   7.7e-8   4.2e-8   5.7e-7   8.5e-6   3.5e-6

Observations.

1. Method applies on non-uniform grids and for variable coefficients (generalisation of FFT).

2. We ensure the complexity $O(d\,n\log^q n)$ with fixed $q\ge1$.

3. Implementation of the matrix-vector multiplication depends on the sparsity of an argument.

HKT-approximation to sign(A): complexity bound

Each term in the Kronecker-product representation

$$A_{(r)} = \sum_{k=1}^{r}c_k\,V_k^1\times\dots\times V_k^d \qquad (117)$$

can be amplified by an extra factor $S_k\in\mathbb{R}^{N\times N}$. Hence, we introduce the generalised tensor-product format (GHKT)

$$A_{(r)} = \sum_{k=1}^{r}S_k\cdot\big(V_k^1\times\dots\times V_k^d\big) \approx A \qquad (118)$$

with a matrix $S_k\in HKT(r_S)$ with $O(d\,r_S\,n\log^q n)$-complexity, where asymptotically $r_S\ll n$. We denote $A_{(r)}\in GHKT(r,r_S)$.

The format (118) will be applied to the MVF $F(A)$ = sign(A).

In the following, we suppose that $A = T\,D\,T^{-1}$, $d_i\in\mathbb{R}$.


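Lem. 9.4. Let $A\in\mathbb{R}^{N\times N}$ be such that $0\notin\Re e\,\sigma(A)$, and let the function $f:\mathbb{R}\to\mathbb{R}$ satisfy the following assumptions:

(A1) $f(t) = -f(-t)$, $t\in\mathbb{R}$,

(A2) $c_f := \int_0^\infty \frac{f(t)}{t}\,dt\in(0,\infty)$ exists as an improper integral.

Then we have

$$\mathrm{sign}(A) = \frac{1}{c_f}\int_{\mathbb{R}_+}\frac{f(tA)}{t}\,dt \equiv I(A). \qquad (119)$$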

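Proof. First we note that for $a\in\mathbb{R}\setminus\{0\}$, the assumptions (A1)-(A2) imply (119) with $A$ substituted by $a$,

$$\mathrm{sign}(a) = \frac{1}{c_f}\int_{\mathbb{R}_+}\frac{f(ta)}{t}\,dt. \qquad (120)$$

Since $A = T\,D\,T^{-1}$, we obtain

$$f(tA) = T\,f(tD)\,T^{-1}. \qquad (121)$$

Moreover, $\mathrm{sign}(A) = T\,\mathrm{sign}(D)\,T^{-1}$ holds and (120) implies the desired relation:

$$\frac{1}{c_f}\int_{\mathbb{R}_+}\frac{f(tA)}{t}\,dt = T\left(\frac{1}{c_f}\int_{\mathbb{R}_+}\frac{f(tD)}{t}\,dt\right)T^{-1} = T\,\mathrm{sign}(D)\,T^{-1} = \mathrm{sign}(A).$$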

Choice of f(t). We consider the following examples of $f$:

$$f_n(t) := \frac{j_n(t)}{t^{\,n-1}}, \qquad n = 1,2,\dots,$$

where $j_n(t)$ are the spherical Bessel functions of the first kind. In particular, we have $j_0(t) = \frac{\sin(t)}{t}$ and

$$j_1(t) = \frac{\sin(t)-t\cos(t)}{t^2}, \qquad j_2(t) = \Big(\frac{3}{t^3}-\frac{1}{t}\Big)\sin(t) - \frac{3}{t^2}\cos(t).$$

The functions $j_n(z)$ have the asymptotic property

$$z^{-n}j_n(z) \to \frac{1}{1\cdot3\cdot5\cdots(2n+1)} \quad\text{as } z\to0 \quad (n = 0,1,2,\dots).$$

We also make use of the integral representation

$$j_n(z) = \frac{z^n}{2^{n+1}n!}\int_0^\pi\cos(z\cos\theta)\,\sin^{2n+1}\theta\,d\theta \quad (n = 0,1,2,\dots). \qquad (122)$$

Since the matrix $A$ is diagonalisable, the error analysis of the quadrature rule is reduced to the scalar case (cf. Lem. 9.1).

An exponentially convergent quadrature for (120) with $f = f_1$ and $a\in\mathbb{R}$. In general, one can expect $a\in[1,\Lambda]$ with $1\ll\Lambda$, so we deal with the integration of a highly oscillatory function

$$\frac{f_1(at)}{t} = \frac{\sin(at)-at\cos(at)}{a^2t^3}$$

with a smooth weight. Hence, we have

$$\frac{f_1(at)}{t} = \frac{j_1(at)}{t} \le \frac{C}{a\,t^2}, \qquad t\to\infty. \qquad (123)$$


The latter implies

$$\left|\int_R^\infty\frac{f_1(at)}{t}\,dt\right| \le \frac{C}{aR}, \qquad R>0.$$

Given a tolerance $\varepsilon>0$, we choose $R>0$ such that $R^{-1} = a\varepsilon$, i.e., $R = (a\varepsilon)^{-1}\le\varepsilon^{-1}$, and then construct a quadrature on the finite interval $[0,R]$ (recall that $a^{-1}\in[\Lambda^{-1},1]$). We can assume without loss of generality that $\varepsilon = 2^{-K_1}$, $\Lambda = 2^{K_0}$ with some $K_0,K_1\in\mathbb{N}$, so that $a^{-1}\in[2^{-K_0},1]$.

We split $[0,R]$ into the two parts $[0,2^{-K_0}]$ and $\omega := [2^{-K_0},R]$, where we set $R = 2^{K_1}$.

We now decompose the integration interval $\omega = \bigcup_{k=-K_0}^{K_1}[b_k,b_{k+1}]$ by the points $b_k = 2^k$, $k = -K_0,\dots,0,\dots,K_1$.

Note that the coefficients $q_1 = z^{-3}$ and $q_2 = z^{-2}$ can be approximated on each interval $\delta_k = [b_k,b_{k+1}]$ by a polynomial $P_{p,k}$ of degree $\le p$ such that, say,

$$\max_{t\in\delta_k}|q_1(t)-P_{p,k}(t)| \le Ce^{-cp} \qquad (k = -K_0,\dots,K_1). \qquad (124)$$

Next we use the integrals

$$\int_0^x t^m\sin(at)\,dt = -\sum_{k=0}^{m}k!\binom{m}{k}\frac{x^{m-k}}{a^{k+1}}\cos\Big(ax+\tfrac12k\pi\Big),$$

$$\int_0^x t^m\cos(at)\,dt = \sum_{k=0}^{m}k!\binom{m}{k}\frac{x^{m-k}}{a^{k+1}}\sin\Big(ax+\tfrac12k\pi\Big)$$

to obtain the following approximation on the interval $\omega$:

$$\frac{1}{c_f}\int_\omega\frac{f_1(at)}{t}\,dt \approx \sum_{k=-K_0}^{K_1}\sum_{\ell=0}^{p}\big[\gamma_k(1/a)\sin(as_k)+\mu_k(1/a)\cos(ac_k)\big],$$

providing an exponential convergence of the order $O(e^{-cp})$.


Due to (122), the integrand $\frac{f_1(az)}{z}$ is an entire function and, in particular, holomorphic in the Bernstein ellipse $E_\rho$ with $\rho > 1/(2a)$, corresponding to the interval $[0,a^{-1}]$ (cf. Lect. 4). Furthermore, $\max_{z\in E_\rho}\left|\frac{f_1(az)}{z}\right|$ can be estimated by a constant not depending on $a$. Therefore, the Gauss quadrature on $[0,\Lambda^{-1}]$ has exponential convergence. This yields the approximation

$$\mathrm{sign}(\lambda) \sim \mathrm{sign}_M(\lambda) := \sum_{k=1}^{M}a_k(1/\lambda)\sin(s_k\lambda)+b_k(1/\lambda)\cos(c_k\lambda) \qquad (125)$$

(with $a_k$, $b_k$ polynomials of degree $\le p$), such that for $\lambda\in[1,\Lambda]$

$$|\mathrm{sign}(\lambda)-\mathrm{sign}_M(\lambda)| \le C(K_0+K_1)\,e^{-cp}$$

with

$$K_1 = |\log\varepsilon|, \qquad K_0 = \log(\mathrm{cond}(D)), \qquad M := (K_0+K_1)\,p. \qquad (126)$$


Rem. 9.2. The matrices $A^{-l}$ ($l = 1,\dots,p$) can be represented by (117) via the fixed set of tensor-skeletons $\Phi_k := V_k^1\otimes\dots\otimes V_k^d$, $k = 1,\dots,k_{A^{-1}}$ (uniform tensor-basis). We make use of $\Phi_k$ in (118).

Lem. 9.5. Let $A$ be symmetric with $\min_{\lambda\in\sigma_+(A)}\lambda = O(1)$. Then, given $\varepsilon>0$, the quadrature points and weights from (125) and (126) fulfil

$$\left\|\mathrm{sign}(A)-\sum_{k=1}^{M}\big[a_k(A^{-1})\sin(s_kA)+b_k(A^{-1})\cos(c_kA)\big]\right\|_2 \le C\,c(T)\,(K_0+K_1)\,e^{-cp}, \qquad (127)$$

where $a_k(A^{-1})$, $b_k(A^{-1})$ are polynomials of degree $p$ as defined in (124), $M, K_0, K_1$ are explained in (126) and $c(T) = \|T\|\,\|T^{-1}\|$.

Proof. Since $A = TDT^{-1}$, we use the representation (121), where $D$ has real entries. The estimate (123) implies that we can restrict integration onto the interval $[0,R]$ and derive

$$\left\|\frac{1}{c_f}\int_0^R\frac{f_1(tA)}{t}\,dt-\sum_{k=1}^{M}\big[a_k(1/A)\sin(s_kA)+b_k(1/A)\cos(c_kA)\big]\right\|_2$$

$$= \left\|T\left(\frac{1}{c_f}\int_0^R\frac{f_1(tD)}{t}\,dt-\sum_{k=1}^{M}\big[a_k\sin(s_kD)+b_k\cos(c_kD)\big]\right)T^{-1}\right\|_2$$

$$\le c(T)\max_{\lambda\in\sigma_+(A)}\left|\frac{1}{c_f}\int_0^R\frac{f_1(t\lambda)}{t}\,dt-\sum_{k=1}^{M}\big[a_k\sin(s_k\lambda)+b_k\cos(c_k\lambda)\big]\right| \le C\,c(T)\,[K_0+K_1]\,e^{-cp}.$$

Choosing $M = p(K_0+K_1)$ (cf. (126)) completes the proof.

Finally, we derive tensor-product representations of the

matrices sin(skA) and cos(ckA) involved in (127). For this

purpose, we apply Prop. 9.1 (cf. Lect.5).


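Prop. 9.1. Let $d\ge2$. The trigonometric identity

$$\sin\Big(\sum_{j=1}^{d}x_j\Big) = \sum_{j=1}^{d}\sin(x_j)\prod_{k\in\{1,\dots,d\}\setminus\{j\}}\frac{\sin(x_k+\alpha_k-\alpha_j)}{\sin(\alpha_k-\alpha_j)} \qquad (128)$$

holds for all real $\alpha_1,\dots,\alpha_d$ s.t. $\sin(\alpha_k-\alpha_j)\ne0$ for all $j\ne k$.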
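The identity (128) is easy to test numerically. The sketch below evaluates its right-hand side for $d = 2$ and $d = 3$; the sample points `x` and shifts `alpha` are arbitrary choices made for this illustration.

```python
import math

def sin_of_sum_separable(x, alpha):
    # Right-hand side of (128): d terms, each a product of one-variable factors
    d = len(x)
    total = 0.0
    for j in range(d):
        term = math.sin(x[j])
        for k in range(d):
            if k != j:
                delta = alpha[k] - alpha[j]      # must not be a multiple of pi
                term *= math.sin(x[k] + delta) / math.sin(delta)
        total += term
    return total

x2, alpha2 = [0.3, -1.1], [0.0, 0.7]
x3, alpha3 = [0.3, -1.1, 2.0], [0.0, 0.7, 1.9]

err2 = abs(math.sin(sum(x2)) - sin_of_sum_separable(x2, alpha2))
err3 = abs(math.sin(sum(x3)) - sin_of_sum_separable(x3, alpha3))
print(err2, err3)
```

Each of the $d$ terms is a product of functions of a single variable, so the sine of a sum has separation rank at most $d$.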

The following statement extends the trigonometric identity

(128) to the case of matrix-valued functions sin(A) and cos(A).

Lem. 9.6. Let $A = \sum_{j=1}^{d}A_j\in\mathbb{R}^{N\times N}$ with matrices $A_j$ of the form as in Lect. 8, where $V^j\in\mathbb{R}^{n\times n}$ ($j = 1,\dots,d$) and $N = n^d$. Suppose that $\{\alpha_1,\dots,\alpha_d\}\subset\mathbb{R}$ are chosen in such a way that the representation (128) is valid. Then the following tensor-product representation with exactly $d$ terms holds:

$$\sin(A) = \sum_{j=1}^{d}\bigotimes_{k=1}^{d}\beta_{kj}\sin(V^k+\delta_{kj}I), \qquad \beta_{kj} = \begin{cases}\dfrac{1}{\sin\delta_{kj}}, & k\ne j,\\[1ex] 1, & k = j,\end{cases} \qquad (129)$$

with $\delta_{kj} = \alpha_k-\alpha_j$. A similar result holds for $\cos(A)$.

To guarantee the stability of representation (129) we have to control the condition $|\alpha_k-\alpha_j-m\pi| > \delta > 0$ for $m\in\mathbb{Z}$, $k\ne j$.

Lem. 9.5 and Lem. 9.6 lead to the GHKT-representation of the matrix sign(A) with $A^{-1}\in HKT(r_{A^{-1}})$, $\sin(s_kA)\in HKT(d)$. Setting $r_S = dM$, $r = r_{A^{-1}}$, we get the complexity $O(d^2M\,r_{A^{-1}}\,n\log^q n)$ provided that each $V^j$ ($j = 1,\dots,d$) can be diagonalised with the cost $O(n\log^q n)$; otherwise the cost is $O(n^2\log^q n)$.


If some of the assumptions above are not satisfied, one can apply the integral representation to the matrix sign-function of $A\in\mathbb{R}^{N\times N}$,

$$\mathrm{sign}(A) := \frac{1}{\pi i}\int_{\Gamma_+}(zI-A)^{-1}\,dz - I. \qquad (130)$$

The exponentially convergent quadrature

$$\mathrm{sign}(A) \approx \sum_{k=1}^{r}c_k(z_kI-A)^{-1} - I, \qquad r = O\big(\log^2\varepsilon+\log^2\mathrm{cond}(A)\big),$$

for the integral (130) provides the direct approximation of $F(A)$ = sign(A) by a sum of matrix resolvents. The quadrature points and weights can be chosen symmetrically w.r.t. the real axis. Using the standard results for the elliptic inverse, we are led to the overall cost $O(r\,d^2n^2\log^q n)$, which is quadratic in both $d$ and $n$.






Lect. 10. HKT repr. to the Hartree-Fock and Boltzmann eq.

Outlook

1. Density Functional Theory (DFT) via the Hartree-Fock eq.

(A) Reduction to the density matrix eq. via sign-matrices.

(B) Representation of the Fock matr. in tensor-product form.

(C) Truncated nonlinear iteration to compute sign($F-\mu I$).

The proper formats:

– diagonally dominant, tensor-product data-sparse.

2. Boltzmann eq.

(A) Boltzmann collision integral in the HKT-representation

(B) Hadamard tensor-product operations

3. Ornstein-Zernike (OZ) integral eq. (brief survey).

4. Other directions.

Schrödinger and Hartree-Fock eq.

The multi-dimensional Schrödinger eq. leads to a challenging numerical problem.

The Schrödinger eq. for a many-particle system reads as

$$H\Psi = \Lambda\Psi$$

with the Hamiltonian $H = H[r_1,\dots,r_{N_e}]$,

$$H := -\frac12\sum_{i=1}^{N_e}\Delta_i - \sum_{a=1}^{K}\sum_{i=1}^{N_e}\frac{Z_a}{|r_i-R_a|} + \sum_{i<j\le N_e}\frac{1}{|r_i-r_j|} + \sum_{a<b\le K}\frac{Z_aZ_b}{|R_a-R_b|},$$

where $Z_a$, $R_a$ are charges and positions of the nuclei, $r_i\in\mathbb{R}^3$. Hence the problem is posed in $\mathbb{R}^d$ with high dimension $d = 3N_e$.

Desired size of the system is $N_e = 10^q$, $q = 1,2,3,4,\dots$?

Focusing on density matrix computation.

Structured tensor representation to the density matrix $D$ in DFT to approximate the ground state in the Schrödinger eq.

Nonlocal operators in many-particle models

In DFT the many-particle problem is mapped onto a system of noninteracting particles, resulting in a significant simplification of the computation process. The so-called density matrices play the key role in order to achieve linear (sub-linear) scaling in Hartree-Fock-DFT methods.

The Hartree-Fock equation (in $\mathbb{R}^3$ !) reads as

$$F\varphi_i = \varepsilon_i\varphi_i, \qquad i = 1,\dots,N_e/2,$$

with the Hartree-Fock operator

$$F\varphi(x) := -\frac12\Delta\varphi(x) + V_c(x)\,\varphi(x) + 2\int d^3y\,\frac{\rho(y,y)}{|x-y|}\,\varphi(x) - \int d^3y\,\frac{\rho(x,y)}{|x-y|}\,\varphi(y),$$

$x,y\in\mathbb{R}^3$. Here the density function $\rho(x,y)$ is defined by

$$\rho(x,y) := \sum_{k\le p}\varphi_k(x)\,\varphi_k(y), \qquad p = N_e/2.$$


Nonlocal operators related to the Hartree-Fock eq.

1. Integral operators with the Newton potential

$$(Nu)(y) = \int_\Omega\frac{1}{|x-y|}\,u(x)\,dx, \qquad y\in\Omega\subset\mathbb{R}^3.$$

2. IOs with product kernels in $\mathbb{R}^3$: J - Hartree potential, K - exchange potential.

3. sign(·) - to represent the spectral projection $D$ (density matrix) formed from the "occupied orbitals"

$$D = \frac12\big[I-\mathrm{sign}(F[D]-\mu I)\big], \qquad D\in\mathbb{R}^{M\times M}, \quad M = O(3N_e).$$

4. $\frac{1}{x+y+z}$ - generated energy tensor in $\mathbb{R}^{n^2\times n^2\times n^2}$, $x,y,z\in\mathbb{R}^2$. Tensor decomposition of "orbital energy denominators".


Suppose that $\phi_i(x)$, $i = 1,\dots,M$, is the set of tensor-product orthogonal basis functions defined in a bounded hypercube in $\mathbb{R}^3$.

Let $D = \{d_{kl}\}\in\mathbb{R}^{M\times M}$ be the corresponding matrix representation to the DF, such that

$$\rho(x,y) \approx \sum_{k,l=1}^{M}d_{kl}\,\phi_k(x)\,\phi_l(y) =: \tilde\rho(x,y).$$

We define the "Galerkin type" approximation to the Fock operator,

$$F = K_0 + 2J - K,$$

where $K_0$ is the Galerkin representation to the "local" component of the Fock operator and $J = \{J_{ij}\}$, $K = \{K_{ij}\}$ with

$$K_{ij} = \int\!\!\int\frac{\rho(x,y)}{|x-y|}\,\phi_i(x)\,\phi_j(y)\,dx\,dy, \qquad J_{ij} = \int\!\!\int\frac{\rho(y,y)}{|x-y|}\,\phi_i(x)\,\phi_j(x)\,dx\,dy,$$

are the discrete exchange and Hartree potentials, respectively.


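Given $D = \{d_{k\ell}\}$, the matrices $K = K[D]$, $J = J[D]$ can be calculated as the tensor-matrix products

$$K = T\cdot D, \qquad J = T^{T(i,\ell)}\cdot D, \qquad (131)$$

where a tensor $T = \{T^{k\ell}_{ij}\}$ is given by

$$T^{k\ell}_{ij} = \int\!\!\int\frac{\phi_k(x)\,\phi_i(x)\,\phi_\ell(y)\,\phi_j(y)}{|x-y|}\,dx\,dy,$$

and $K_{ij} = \sum_{k,\ell}d_{k\ell}\,T^{k\ell}_{ij}$, $J_{ij} = \sum_{k,\ell}d_{k\ell}\,T^{ki}_{\ell j}$.

Now we obtain $F[D] = K_0 - K[D] + 2J[D]$.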

Rem. 10.1. Let $K_N$ be the Nyström discretisation of $N$. Then (131) simplifies by making use of the Hadamard product,

$$K = K_N\odot D, \qquad J = \mathrm{diag}_\wedge\big[K_N\cdot\mathrm{diag}_\vee(D)\big],$$

where $\mathrm{diag}_\wedge$ and $\mathrm{diag}_\vee$ are the operators converting a vector into a diagonal matrix and vice versa.


Given $F$, the spectral projection $D[k,l]$ formed from the occupied orbitals can be computed via the solution of the eigenvalue problem

$$F\Psi_j = \lambda_j\Psi_j, \quad j = 1,\dots,p; \qquad \lambda_1\le\dots\le\lambda_p\le\dots,$$

by

$$D[k,l] = \sum_{j\le p}\Psi_j[k]^T\,\Psi_j[l].$$

The complexity scales cubically in $M$.

Rem. 10.2. The idempotency (projection) property holds: $D^2 = D$.

To avoid the solution of an eigenvalue problem, it is possible to represent $D$ directly using the matrix sign function.

Lem. 10.1. Let us choose $\mu\in(\lambda_p,\lambda_{p+1})$, then

$$D = \frac12\big[I-\mathrm{sign}(F-\mu I)\big].$$


Proof. Since $\{\Psi_j\}$ is orthogonal, $F$ is unitarily diagonalisable,

$$\mathrm{sign}(F-\mu I) = \sum_{j=1}^{M}\Psi_j^T\,\mathrm{sign}(\lambda_j-\mu)\,\Psi_j,$$

hence

$$D = \sum_{\lambda_j<\mu}\Psi_j^T\Psi_j = \frac12\Big[I-\sum_{j=1}^{M}\Psi_j^T\,\mathrm{sign}(\lambda_j-\mu)\,\Psi_j\Big].$$

We can implement the corresponding matrix operations in the tensor-product arithmetics.

Assume that the density matrix $D$ is already represented in the Kronecker product form (with $M = n^3$)

$$D = \sum_{s=1}^{r_D}D_s^1\otimes D_s^2\otimes D_s^3, \qquad D_s^m\in\mathbb{R}^{n\times n},$$

where $D_s^m$ is associated with the couple $(x_m,y_m)$, $m = 1,2,3$.


Let $\phi_i(x) = \phi_{i_1}(x_1)\,\phi_{i_2}(x_2)\,\phi_{i_3}(x_3)$, $i = (i_1,i_2,i_3)$, $i_m = 1,\dots,n$, and suppose that the Newton potential can be represented by

$$\frac{1}{|x-y|} \approx \sum_{s=1}^{r_N}N_s^1(x_1,y_1)\,N_s^2(x_2,y_2)\,N_s^3(x_3,y_3).$$

Due to "separability" results for the Newton potential and the tensor-product structure of the basis, we derive

$$T = \sum_{s=1}^{r_N}T_s^1\otimes T_s^2\otimes T_s^3, \qquad T_s^m\in\mathbb{R}^{n^2\times n^2},$$

where (for $m = 1,2,3$)

$$[T_s^m]^{k_ml_m}_{i_mj_m} = \int N_s^m(x_m,y_m)\,\phi_{k_m}(x_m)\,\phi_{i_m}(x_m)\,\phi_{l_m}(y_m)\,\phi_{j_m}(y_m)\,dx_m\,dy_m.$$

Both $T_s^m$ and $D$ require $O(n^4)$ and $O(n^2)$ memory units, respectively, while the "MVM" $T\cdot D$ now costs $O(n^4)$ (compare with $O(n^{12})$).


Hierarchical/Wavelet formats for low-dim. components

There are two principal cases:

(A) The FEM-Galerkin approximation.

(B) The wavelet basis $\{\phi_i\}$.

Note that the kernel functions $N_s^m(x_m,y_m)$ are proved to be asymptotically smooth. Hence, in case (A), the H-matrix representation of the matrices $T_s^m$ does the job.

In turn, in case (B), the wavelet representation to the kernels $N_s^m(x_m,y_m)$ can be applied.

Thus, the storage and MVM-complexity related to $T_s^m$ is reduced from $O(n^4)$ to linear cost $O(n^3)$.


For the Nyström representation (cf. Rem. 10.1), we enjoy the sublinear cost $O(r_D^2n^2)$ for basic matrix-tensor operations due to (assume that $r_D = r_K$)

$$K_N\odot D = \sum_{s,t=1}^{r_D}(K_t^1\odot D_s^1)\otimes(K_t^2\odot D_s^2)\otimes(K_t^3\odot D_s^3),$$

where each Hadamard product is implemented in $O(n^2)$ operations.
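The rule used here is that the Hadamard product of two Kronecker products is the Kronecker product of the factor-wise Hadamard products, extended by bilinearity to sums of such terms. A minimal check with two factors and rank-2 sums follows; the small integer matrices are hypothetical and serve only as test data.

```python
def kron(A, B):
    # Kronecker product of two matrices given as lists of rows
    return [[A[i][j] * B[k][l]
             for j in range(len(A[0])) for l in range(len(B[0]))]
            for i in range(len(A)) for k in range(len(B))]

def hadamard(A, B):
    # entry-wise product
    return [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def kron_sum(terms):
    # assemble sum_s X_s^1 (x) X_s^2 as a full matrix (for verification only)
    total = kron(*terms[0])
    for X1, X2 in terms[1:]:
        total = add(total, kron(X1, X2))
    return total

# hypothetical rank-2 Kronecker representations of K_N and D
K_terms = [([[1, 2], [3, 4]], [[0, 1], [1, 2]]),
           ([[2, 0], [1, 1]], [[1, 1], [0, 3]])]
D_terms = [([[2, 0], [1, 3]], [[1, -1], [2, 5]]),
           ([[0, 1], [4, 2]], [[3, 0], [1, 1]])]

# full-size Hadamard product
full = hadamard(kron_sum(K_terms), kron_sum(D_terms))

# structured evaluation: only small factor-wise Hadamard products are formed
structured = None
for K1, K2 in K_terms:
    for D1, D2 in D_terms:
        term = kron(hadamard(K1, D1), hadamard(K2, D2))
        structured = term if structured is None else add(structured, term)

print(full == structured)
```

The structured form has $r_K\,r_D$ terms, each built from $n\times n$ Hadamard products, which is where the $O(r_D^2n^2)$ count comes from.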

Concerning the matrix $J$, we arrive at the optimal complexity $O(r_D^2n^2)$ again, due to

$$\mathrm{diag}_\vee(D) = \sum_{s=1}^{r_D}\mathrm{diag}_\vee(D_s^1)\otimes\mathrm{diag}_\vee(D_s^2)\otimes\mathrm{diag}_\vee(D_s^3).$$

Now apply the H-matrix format (rank $r_H$) to represent $D_s^m$ and $K_t^m$. The Hadamard product of two H-matrices, $K_t^m\odot D_s^m$, requires only $O(r_H^2\,n\log n)$ operations, hence we arrive at $O(n\log^q n)$ complexity HKT-arithmetics.

WKT is also applicable.

The deterministic Boltzmann eq. in $\mathbb{R}^3$

The particle density $f(t,x,v)$, $x\in\Omega\subset\mathbb{R}^3$, of a dilute gas satisfies the Boltzmann eq.

$$f_t + (v,\mathrm{grad}_xf) = Q(f,f),$$

which describes the time evolution of $f:\mathbb{R}_+\times\Omega\times\mathbb{R}^3\to\mathbb{R}_+$.

With fixed $t, x$, the Boltzmann collision integral can be split as

$$Q(f,f) = Q_+(f,f)(v) + Q_-(f,f)(v),$$

where the loss part $Q_-$ has a simple form

$$Q_-(f,f)(v) = f(v)\int_{\mathbb{R}^3}B_{tot}(\|u\|)\,f(w)\,dw$$

with $u = v-w$ being the relative velocity.

The integral $Q_-$ can be approximated by a block-Toeplitz matrix at linear-logarithmic cost in $N = n^3$.


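The gain part can be represented by a double integral

$$Q_+(f,f)(v) = \int_{\mathbb{R}^3}\int_{S^2}B(\|u\|,\mu)\,f(v')\,f(w')\,de\,dw, \qquad (132)$$

where $v' = \frac12(v+w+\|u\|e)\in\mathbb{R}^3$, $w' = \frac12(v+w-\|u\|e)\in\mathbb{R}^3$, and $e\in S^2\subset\mathbb{R}^3$ is the unit vector.

In the case of inverse power cut-off potential, we have

$$B(\|u\|,\mu) = \|u\|^{1-4/\nu}g_\nu(\mu), \quad \nu>1, \qquad \mu = \cos(\theta) = \frac{\langle u,e\rangle}{\|u\|},$$

with $g_\nu$ being a given function of the scattering angle only, s.t. $g_\nu\in L^1([-1,1])$.

$\langle\cdot,\cdot\rangle$ denotes the $L^2$-scalar product in $\mathbb{R}^p$, $\|\cdot\| \equiv \|\cdot\|_2 := \sqrt{\langle\cdot,\cdot\rangle}$ (with $p = 3$).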


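Key point: the efficient calculation of the gain part.

Let $\mathcal{F}$ be the $p$-dimensional Fourier transform, then

$$Q_+(f,f)(v) = \mathcal{F}_{y\to v}\left[\int_{\mathbb{R}^3}g(u,y)\,\mathcal{F}^{-1}_{z\to y}\big[f(z-u)f(z+u)\big](u,y)\,du\right](v)$$

with

$$g(u,y) = g(\|u\|,\|y\|,|\langle u,y\rangle|),$$

that depends only on the three scalar variables $\|u\|$, $\|y\|$, $\langle u,y\rangle$.

Indeed, up to a scaling factor,

$$g(u,y) = \int_0^\pi g_\nu(\cos\theta)\,e^{-i\langle u,y\rangle\cos\theta}\,J_0\Big(\sqrt{\|u\|^2\|y\|^2-\langle u,y\rangle^2}\,\sin\theta\Big)\sin\theta\,d\theta,$$

where $J_0(z)$ is the Bessel function $J_0(z) = \frac{1}{2\pi}\int_0^{2\pi}e^{iz\cos\psi}\,d\psi$.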

Choice of the Kernel Function

Ex. 1. The variable hard spheres ($p = 3$)

$$g_{1,\lambda}(u,y) := \|u\|^\lambda\,\mathrm{sinc}\Big(\frac{\|u\|\,\|y\|}{\pi}\Big), \qquad u,y\in\mathbb{R}^p, \quad \lambda\in(-3,1], \qquad (133)$$

where the sinc-function (Cardinal function) is defined by

$$\mathrm{sinc}(z) = \frac{\sin(\pi z)}{\pi z}, \qquad z\in\mathbb{C}.$$

This model corresponds to the case of second order tensors ($q = 2$) with $V^k\in\mathbb{R}^{n\times n\times n}$ (cf. Lect. 5, 6).

Ex. 2. The general kernel function

$$g_{2,\lambda}(u,y) := \frac{\|u-y\|^\lambda}{\sqrt{\|u\|^2+\|y\|^2+2|\langle u,y\rangle|}}, \qquad u,y\in\mathbb{R}^p. \qquad (134)$$

The presence of $|\langle u,y\rangle|$ in the arguments of $g_{2,\lambda}(u,y)$ makes the approximation process much more involved.

Main result: Reduce the complexity from $O(n^6\log n)$ to $O(n^4\log n)$ in the case (133), and to $O(n^5\log n)$ in the case (134).


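Given tensors $U\otimes Y\in\mathbb{R}^{\mathcal{I}\times\mathcal{J}}$ with $U\in\mathbb{R}^{\mathcal{I}}$, $Y\in\mathbb{R}^{\mathcal{J}}$, and $B\in\mathbb{R}^{\mathcal{I}\times\mathcal{L}}$. Let $T:\mathbb{R}^{\mathcal{L}}\to\mathbb{R}^{\mathcal{J}}$ be the linear operator (tensor) that maps tensors defined on the index set $\mathcal{L}$ into those defined on $\mathcal{J}$.

Def. 10.1. (cf. Def. 5.3) The Hadamard "scalar" product $[D,C]_{\mathcal{I}}\in\mathbb{R}^{\mathcal{K}}$ of two tensors $D := [D_{i,k}]\in\mathbb{R}^{\mathcal{I}\times\mathcal{K}}$ and $C := [C_{i,k}]\in\mathbb{R}^{\mathcal{I}\times\mathcal{K}}$ with $\mathcal{K}\in\{\mathcal{I},\mathcal{J},\mathcal{L}\}$ is defined by

$$[D,C]_{\mathcal{I}} := \sum_{i\in\mathcal{I}}[D_{i,\mathcal{K}}]\odot[C_{i,\mathcal{K}}],$$

where $\odot$ denotes the Hadamard product on the index set $\mathcal{K}$ and $[D_{i,\mathcal{K}}] := [D_{i,k}]_{k\in\mathcal{K}}$.

Lem. 10.2. (cf. Lem. 5.2) Let $U$, $Y$, $B$ and $T$ be given as above. Then, with $\mathcal{K} = \mathcal{J}$, the following identity is valid:

$$[U\otimes Y,\,T\cdot B]_{\mathcal{I}} = Y\odot\big(T\cdot[U,B]_{\mathcal{I}}\big)\in\mathbb{R}^{\mathcal{J}}. \qquad (135)$$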


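Proof. By definition of the Hadamard scalar product we have

$$[U\otimes Y,\,T\cdot B]_{\mathcal{I}} = \sum_{i\in\mathcal{I}}[U\otimes Y]_{i,\mathcal{J}}\odot[T\cdot B]_{i,\mathcal{J}} = \sum_{i\in\mathcal{I}}\big[[U]_i\cdot Y\big]_{i,\mathcal{J}}\odot[T\cdot B]_{i,\mathcal{J}}$$

$$= Y\odot\Big(\sum_{i\in\mathcal{I}}[U]_i\,[T\cdot B]_{i,\mathcal{J}}\Big) = Y\odot\Big(T\cdot\sum_{i\in\mathcal{I}}[U]_i\,[B]_{i,\mathcal{L}}\Big),$$

then the assertion follows.

Identity (135) is of great importance in the current applications since on the right-hand side the operator $T$ is removed from the scalar product and, so, it is applied only once.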
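Identity (135) can be verified on small dense arrays. In the sketch below the index sets have sizes 3, 2 and 4, the entries are arbitrary integers, and `apply_T` is a hypothetical helper standing for the action of $T$ on a vector indexed by $\mathcal{L}$.

```python
U = [1, -2, 3]                      # U in R^I, |I| = 3
Y = [2, 5]                          # Y in R^J, |J| = 2
B = [[1, 0, 2, 1],                  # B in R^{I x L}, |L| = 4
     [3, 1, 0, 2],
     [0, 4, 1, 1]]
T = [[1, 2, 0, 1],                  # T : R^L -> R^J
     [0, 1, 3, 2]]

calls = {"count": 0}

def apply_T(vec):
    # (T . vec)_j = sum_l T[j][l] * vec[l]
    calls["count"] += 1
    return [sum(T[j][l] * vec[l] for l in range(len(vec))) for j in range(len(T))]

# Left-hand side of (135): T acts on every row B[i, :]
lhs = [0] * len(Y)
for i in range(len(U)):
    TB_i = apply_T(B[i])
    for j in range(len(Y)):
        lhs[j] += (U[i] * Y[j]) * TB_i[j]
calls_lhs = calls["count"]

# Right-hand side of (135): contract over I first, then apply T once
calls["count"] = 0
UB = [sum(U[i] * B[i][l] for i in range(len(U))) for l in range(len(B[0]))]
rhs = [Y[j] * v for j, v in enumerate(apply_T(UB))]
calls_rhs = calls["count"]

print(lhs, rhs, calls_lhs, calls_rhs)
```

The counter makes the saving visible: the left-hand side applies $T$ once per index $i\in\mathcal{I}$, the right-hand side once in total.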

Ornstein-Zernike eq. in $\mathbb{R}^3$

In numerical modelling of a mono-atomic isotropic liquid with spherically symmetric Lennard-Jones interaction potential $U(r) = 4\varepsilon[(\sigma/r)^{12}-(\sigma/r)^{6}]$ between the particles ($\sigma$ and $\varepsilon$ are the resp. size and energy parameters), the Ornstein-Zernike equation relates the total correlation function $h(r)$ with the direct correlation function $c(r)$ (with density $\rho$) by

$$h(\mathbf{r}) = c(\mathbf{r}) + \rho\int_{\mathbb{R}^3}c(|\mathbf{r}-\mathbf{r}'|)\,h(\mathbf{r}')\,d\mathbf{r}'. \qquad (136)$$

The "closure" relation is

$$h(r) = \exp[-\beta U(r)+h(r)-c(r)+B(r)] - 1. \qquad (137)$$

Key point: FFT vs. structured matrices in wavelet basis.

[Figure 18: two panels, horizontal axis from 0 to 14.]

Figure 18: Radial parts of correlation funct. h(r) (top) and c(r) of simple mono-atomic liquid with Lennard–Jones potential param. ρ = 0.7, ε = 0.7.



Robust exponentially convergent sinc-quadrature, $\rho = x_1+\dots+x_d\in[1,R]$, $x_i>0$,

$$\frac{1}{\rho} = \int_{\mathbb{R}}\cosh(w)\,F(\sinh(w))\,dw \approx h\sum_{k=-M}^{M}\cosh(kh)\,F(\sinh(kh)),$$

$$F(u) = \frac{e^{-\rho\log(1+e^{u})}}{1+e^{-u}}, \qquad M = O(\log\varepsilon^{-1}\,\log R), \qquad r = 2M+1, \qquad h = C_{int}\frac{\log M}{M}.$$

[Figure 19: three panels, horizontal axis from 0 to 1000; vertical scales $\times10^{-6}$, $\times10^{-8}$, $\times10^{-13}$.]

Figure 19: The absolute quadrature error for $R = 10^3$ with M = 16 (left), M = 32 (middle), M = 64 (right). Similar results are observed for $R = 32\cdot10^3$.



[Figure 20: three semi-logarithmic error plots, horizontal axis from 4 to 31, vertical axis "error"; first panel title: F = exp(t − r exp(t)), r = 1.0, Cint = 1.0.]

Figure 20: Quadrature for ρ = 1.0 with different $C_{int}$.

Application in QC. Arithmetics with function-generated energy matrix

$$E_{jk} = \frac{1}{e_{j_1}+e_{j_2}+e_{j_3}+e_{k_1}+e_{k_2}+e_{k_3}} \quad (e_{j_\ell}, e_{k_\ell} > 0), \qquad \{E_{jk}\}\in\mathbb{R}^{\mathcal{J}\times\mathcal{K}},$$

with $j = (j_1,j_2,j_3)\in\mathcal{J}$, $k = (k_1,k_2,k_3)\in\mathcal{K}$, $j_\ell = 1,\dots,N_J$, $k_\ell = 1,\dots,N_K$, for $\ell = 1,2,3$. Construct a low Kronecker rank separable approximation to $\frac{1}{x_1+\dots+x_d}$, $\sum_{i=1}^{d}x_i\in[1,R]$, via the sinc-quadrature/approximation by exponential sums.

For experimental data in quantum chemistry: $N_J, N_K\in[10^2,10^3]$, $R\in[10^3,10^4]$.

Numerics II: Newton potential (symmetric quadrature)

Approximating the Gauss integral $\frac{1}{\rho} = \int_{\mathbb{R}}F(t;\rho)\,dt$ with $\rho = |x-y|$, $x,y\in\mathbb{R}^d$,

$$h\sum_{k=-M}^{M}\cosh(kh)\,F(\sinh(kh);\rho) \approx \int_{\mathbb{R}}F(t;\rho)\,dt, \qquad F(t;\rho) = \frac{1}{\sqrt{\pi}}\,e^{-\rho^2t^2}. \qquad (138)$$
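A direct evaluation of the quadrature (138) may look as follows. The step size $h = C_{int}\log(M)/M$ is an assumption of this sketch (the slide for (138) does not state $h$ explicitly), so the printed errors need not coincide with the tabulated ones; the function name is likewise hypothetical.

```python
import math

def gauss_sinc_quadrature(rho, M, c_int=1.0):
    # h * sum_{k=-M}^{M} cosh(kh) F(sinh(kh); rho), F(t; rho) = exp(-rho^2 t^2) / sqrt(pi)
    h = c_int * math.log(M) / M          # assumed step size
    total = 0.0
    for k in range(-M, M + 1):
        t = math.sinh(k * h)
        total += math.cosh(k * h) * math.exp(-(rho * t) ** 2) / math.sqrt(math.pi)
    return h * total

rho = 1.0
gauss_errors = {M: abs(gauss_sinc_quadrature(rho, M) - 1.0 / rho)
                for M in (4, 9, 16, 25, 36)}
for M, err in gauss_errors.items():
    print(M, err)
```

Since the integrand is even in $t$, the terms for $k$ and $-k$ coincide, which is why only $M+1$ distinct terms are needed in the symmetric variant.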

Rank $r = M+1$ (symmetric) quadrature (138), ρ = 1.0:

M     4        9        16       25        36
ε     1.1e-4   1.5e-6   2.3e-9   2.0e-12   < 1.0e-15

The Gaussian integral with ρ = 0.2, 1, 10; $C_{int}$ = 1.0; applies for ρ ∈ [0.2, 10].

[Figure: three semi-logarithmic error plots, horizontal axis from 4 to 46, vertical axis "error"; panel titles: F = exp(−r²t²) with r = 0.2, r = 1 and r = 10, each with Cint = 1.0.]

Numerics III: Newton potential (robust quadrature)

Robust nonsymmetric quadrature with

$$\frac{1}{\rho} = \int_{\mathbb{R}}F(u;\rho)\,du; \qquad F(u;\rho) := \frac{2}{\sqrt{\pi}}\,\frac{e^{-\rho^2\log^2(1+e^{u})}}{1+e^{-u}}, \qquad \rho\in[1,R].$$

[Figure 21: three panels with horizontal axes 0 to 200, 0 to 1000 and 0 to 5000; vertical scales $\times10^{-8}$, $\times10^{-7}$, $\times10^{-7}$.]

Figure 21: The absolute quadrature error for M = 64 with R = 200 (left), $R = 10^3$ (middle), $R = 5\cdot10^3$ (right). Similar results are observed in the case $R > 5\cdot10^3$.

Numerics IV: Boltzmann equation

[Figure 22: two surface plots over the (u, v)-plane; left panel title: "1D Kernel Function f=1/(|u|+|v|); u=x−z, v=y−z".]

Figure 22: Function $g_{2,\lambda}(u,y)$ for λ = 0 (left) and $g_{1,\lambda}(u,y)$ for λ = 1.

$$g_{1,\lambda}(u,y) := \|u\|^\lambda\,\mathrm{sinc}\Big(\frac{\|u\|\,\|y\|}{\pi}\Big), \qquad u,y\in\mathbb{R}^p, \quad \lambda\in(-3,1],$$

$$g_{2,\lambda}(u,y) := \frac{\|u-y\|^\lambda}{\sqrt{\|u\|^2+\|y\|^2+2|\langle u,y\rangle|}}, \qquad u,y\in\mathbb{R}^p.$$

[Figure 23: three semi-logarithmic error plots, horizontal axis from 4 to 48, vertical axis "error"; panel titles: |x|^s sinc(y|x|), x ∈ [−1,1], s = 1, with y = 16, y = 25 and y = 36.]

Figure 23: $L^\infty$-error of the sinc-interpolation to $|x|^\lambda\,\mathrm{sinc}(|x|y)$, $x\in[-1,1]$, λ = 1.

Best r-term approximation to $1/\sqrt{\rho}$ by $\sum_i a_ie^{-b_i\rho}$ (W. Hackbusch '05), in the $L^\infty$- and weighted $L^2([1,R])$-norm:

R        10       50       100      200      ‖·‖L∞    W(ρ) = 1/√ρ
r = 4    3.7e-4   9.6e-4   1.5e-3   2.2e-3   1.9e-3   4.8e-3
r = 5    2.8e-4   2.8e-4   3.7e-4   5.8e-4   4.2e-4   1.2e-3
r = 6    8.0e-5   9.8e-5   1.1e-4   1.6e-4   9.5e-5   3.3e-4
r = 7    3.5e-5   3.8e-5   3.9e-5   4.7e-5   2.2e-5   8.1e-5

Approximating sign(A)

General definition: Given $A\in\mathbb{R}^{M\times M}$, $M = n^d$,

$$\mathrm{sign}(A) := \frac{1}{\pi i}\int_{\Gamma_+}(zI-A)^{-1}\,dz - I \in\mathbb{R}^{M\times M}$$

with $\Gamma_+\subset\mathbb{C}$ being any simply closed curve that contains $\sigma_+(A) = \{\lambda\in\sigma(A) : \Re e\,\lambda > 0\}$.

Iterative evaluation:

The Newton-Schulz iteration: $X_k\to\mathrm{sign}(A)$,

$$X_k = X_{k-1} + \frac12\big[I-(X_{k-1})^2\big]X_{k-1}, \qquad k = 1,2,\dots$$

with $X_0 = A/\|A\|_2$ has locally quadratic convergence.

NSI - Convergence theory: Lem. 8.1 applies with α = 2.

Thm. 8.1 applies with α = 2, but under a restrictive "nearly commutativity" condition.
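The exact (untruncated) Newton-Schulz iteration is short enough to try on a tiny example. The sketch below applies it to a hypothetical symmetric 2×2 matrix with one positive and one negative eigenvalue and forms the projector $D = \frac12[I-\mathrm{sign}(F)]$ from Lem. 10.1 with $\mu = 0$. Scaling by the Frobenius norm instead of $\|A\|_2$ is a simplification assumed here (it is an upper bound for the 2-norm, so the iteration starts inside the same region).

```python
import math

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def newton_schulz_sign(A, iterations=40):
    n = len(A)
    # Frobenius norm as a cheap upper bound for the spectral norm
    scale = math.sqrt(sum(x * x for row in A for x in row))
    X = [[x / scale for x in row] for row in A]
    for _ in range(iterations):
        X2 = matmul(X, X)
        R = [[(1.0 if i == j else 0.0) - X2[i][j] for j in range(n)] for i in range(n)]
        RX = matmul(R, X)
        # X_k = X_{k-1} + 0.5 * (I - X_{k-1}^2) X_{k-1}
        X = [[X[i][j] + 0.5 * RX[i][j] for j in range(n)] for i in range(n)]
    return X

F = [[2.0, 1.0], [1.0, -1.0]]            # hypothetical symmetric test matrix
S = newton_schulz_sign(F)
D = [[0.5 * ((1.0 if i == j else 0.0) - S[i][j]) for j in range(2)] for i in range(2)]

S2, D2 = matmul(S, S), matmul(D, D)
sign_residual = max(abs(S2[i][j] - (1.0 if i == j else 0.0))
                    for i in range(2) for j in range(2))      # S^2 = I
proj_residual = max(abs(D2[i][j] - D[i][j])
                    for i in range(2) for j in range(2))      # D^2 = D
trace_D = D[0][0] + D[1][1]                                   # number of eigenvalues below mu
print(sign_residual, proj_residual, trace_D)
```

In the truncated variant (T-NSI) each iterate would additionally be recompressed to a fixed Kronecker rank; that step is not shown here.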

Numerics V: T-NSI to compute sign($\Delta_h-\mu I$)

Figure 24: Exact/trunc. NSIs on 16×16- and 32×32-grids (r = 7, r = 10).

Comments on Numerics V.

Numerics demonstrates robust and asymptotically optimal convergence of T-NSI provided that the Kronecker rank is chosen properly; otherwise, $X_k\to I$ (M. Espig, MPI MIS).

Bound on Kronecker-rank: $r = O\big(d\,(|\log\varepsilon|+\log\mathrm{cond}(A))\,|\log\varepsilon|\big)$.

Complexity of T-NSI: $O(dr^4n^2+r^6) + O(r^2r^4d^3) + \dots$

T-NSI to compute sign($A-\mu I$) with $A = \Delta_h$:

Grid       t(SVD)   t(NSI)   t(T-NSI)   r(sign(A − µI))
4 × 4      0.02     0.0      0.02       4
8 × 8      0.03     0.03     0.15       6
16 × 16    0.74     0.85     0.64       7
32 × 32    108.5    56.5     17.4       10
64 × 64    6400.    4000.    210.       13

Here t(SVD), t(NSI), t(T-NSI) denote the CPU time (sec.) required for SVD, exact NSI and truncated NSI, respectively.

Concluding Remarks

1. HKT-approximation (for $d\ge3$) is a subtle concept mostly based on analytic tools with possible algebraic recompression. It offers the low-Kronecker-rank data-sparse representation to

(a) Integral operators in $\mathbb{R}^d$, e.g., with the Newton, Yukawa and Helmholtz kernels

$$\frac{1}{|x-y|}; \qquad \frac{e^{-\mu|x-y|}}{|x-y|}, \ \mu\in\mathbb{R}_+; \qquad \frac{e^{-i\kappa^2|x-y|}}{|x-y|}, \ \kappa^2\in\mathbb{R},$$

(b) $A^{-1}$, $A$ being the discrete elliptic operator in $[a,b]^d$, e.g., $A = -\Delta-\kappa^2$,

(c) Certain class of the matrix-valued functions $F(A)$, e.g.,

$$\mathrm{sign}(A), \qquad \exp(A), \qquad \int_{\mathbb{R}_+}e^{-tA}\,G\,e^{-tB}\,dt.$$


2. We enjoy the sub-linear cost O(dpn logq N), p, = 1, 2 with

N = nd.

3. Applications: FEM/BEM in elliptic and parabolic problems

in Rd, many-particle modelling based on DFT for the

Hartree-Fock eq., Boltzmann eq., Ornstein-Zernike eq., linear

algebra, complexity theory, control theory.

4. By-product: $O(N\log^qN)$ - $O(N^{1/d}\log^qN)$ complexity (approximate) direct elliptic problem solver on non-uniform tensor grids in $\mathbb{R}^d$ and for variable ("separable") coefficients (generalisation of FFT).

Sub-linear cost $O(N^{1/d}\log^qN)$ in the case of tensor rhs.

5. Other directions: chemometrics, statistics, signal

processing (in biology).




2. B.N. Khoromskij: Structured data-sparse approximation to high order tensors arising from the deterministic Boltzmann equation. Preprint 4, MPI MIS, Leipzig 2005.

3. M. Fedorov, H.-J. Flad, L. Grasedyck, and B.N. Khoromskij: Low-rank wavelet solver for the Ornstein-Zernike integral equation. Preprint 59, MPI MIS, Leipzig 2005.

4. W. Hackbusch, B.N. Khoromskij, E. Tyrtyshnikov: Approximate iteration for structured matrices. MPI MIS, Leipzig 2005.

5. H.-J. Flad, W. Hackbusch, B.N. Khoromskij and R. Schneider: Concept of data-sparse tensor-product approximation in many-particle modelling. Leipzig 2005, in progress.


