An Introduction to Structured Tensor-Product …In large scale applications the algebraic operations...
These notes are based on a lecture course given by the author in the summer semester of 2005 for postgraduate students at the University of Leipzig/Max-Planck-Institute for Mathematics in the Sciences. The purpose of this course was to provide an introduction to modern methods of data-sparse representation of integral and more general nonlocal operators, based on the use of Kronecker tensor-product decomposition.
In recent years multifactor analysis has been recognised as a powerful (and really indispensable) tool to represent multi-dimensional data arising in various applications. Well known for three decades in chemometrics, psychometrics, statistics, signal processing, data mining and in complexity theory, this tool has nowadays also become attractive in numerical PDEs, many-particle calculations, and in solving integral equations.
Our goal is to introduce the main mathematical ideas and principles which allow effective representation of some classes of high-dimensional operators in the Kronecker tensor-product form, as well as rigorous analysis of the arising approximations. Low Kronecker-rank representation of operators not only relaxes the “curse of dimensionality”, but also provides efficient numerical methods of sub-linear complexity to approximate 2D- and 3D-problems.
Leipzig, July 2005.
Everything should be made as simple as possible, but not simpler.
A. Einstein (1879-1955)
An Introduction to Structured Tensor-Product
Representation of Discrete Nonlocal Operators
Part I: Approximation Tools
Boris N. Khoromskij
University of Leipzig/MPI MIS, summer 2005
http://personal-homepages.mis.mpg.de/bokh
http://www.mis.mpg.de/scicomp/Fulltext/Khoromskij
Outline of the Lecture Course B. Khoromskij, Leipzig 2005(L1) 2
1. Ubiquitous data-sparse matrix arithmetics; look on Fourier kingdom.
2. Celebrated sampling theorem; Sinc interpolation and quadratures.
3. Introduction to wavelet techniques.
4. Separable approximation to multi-variate functions in $\mathbb{R}^d$.
5. Kronecker-product decomposition of high-dimensional tensors.
Combination with H-matrix, FFT- and FWT-based formats.
6. Hierarchical Kronecker-product (HKT) representation to
multi-dimensional integral operators $Au = \int_{\mathbb{R}^d} g(\cdot, y)\,u(y)\,dy$.
7. Structured representation to matrix-valued functions with application
to $A^{-1}$, $\sqrt{A}$, $\mathrm{sign}(A)$.
8. Truncated iteration: convergence and truncation error analysis.
9. HKT approximation to matrix-valued functions $A^{-1}$, $\sqrt{A}$, $\mathrm{sign}(A)$.
10. Application to the Hartree-Fock and Boltzmann equations.
Lect. 1. Ubiquitous data-sparse matrix arithm.; Fourier kingdom.
Basic physical models are described by nonlocal data transfer.
In large scale applications the algebraic operations on high-dimensional,
densely populated matrices/tensors require huge computational resources.
Standard methods suffer from the “curse of dimensionality” (R. Bellman).
Examples of (discrete) nonlocal operators:
1. Multi-dimensional integral operators in $\mathbb{R}^d$
2. Elliptic/parabolic solution operators (e.g., financial PDEs)
3. Lyapunov/Riccati matrix equations in control theory
4. Density matrix calculation for many-particle systems
5. Deterministic Boltzmann equation in $\mathbb{R}^3$ (dilute gas)
6. Ornstein-Zernike integral equation in $\mathbb{R}^3$ (theory of disordered matter)
7. Chemometric, psychometric, stochastic models ...
Huge problems: special methods vs. super-computers
Complexity of standard matrix operations:
$N_{\mathrm{Stor}} \approx N_{A\cdot v} = O(N^2)$ for the storage/MVM of a fully populated
matrix $A \in \mathbb{R}^{N\times N}$; besides, $N_{A^{-1}} \approx N_{A\cdot B} \approx N_{L\cdot U} = O(N^3)$.
A paradigm of up-to-date numerical simulations:
the faster the computer, the better the asymptotic complexity
required of the algorithm (speed increases in proportion to memory).
In low dimensions ($d \le 3$) the goal is $O(N)$-methods.
Basic principles: making use of hierarchical structures,
low-rank patterns and recursive algorithms.
In the multi-dimensional perspective $O(N)$ is not enough because of the
“curse of dimensionality”: $N = n^d$ ($3\cdot 10^{22}$ mol. in 1 cm$^3$ of water).
The challenge is to develop $O(n)$-algorithms!
Main ideas: tensor-product data structures + H-matrix formats.
Old and new ideas or what we are going to discuss
Based on recursions via hierarchical structures:
Classical Fourier (1768-1830) methods, FFT in $O(N \log N)$ op.
Circulant convolution, Toeplitz, Hankel matrices.
Multiresolution representation via wavelets, FWT in $O(N)$ op.
Data and matrix compression in $O(N)$ op.
Multigrid methods: $O(N)$ - elliptic problem solvers.
Domain decomposition: $O(N/p)$ - parallel algorithms.
Panel clustering, fast multipole, H-matrix in $O(q^d N \log^\beta N)$ op.
Well suited for integral (nonlocal) operators in FEM/BEM.
Based on tensor-product data organization:
Kronecker tensor-product (KT) representation in $\mathbb{R}^N$, $N = n^d$
(multiway decomposition): $O(n^q \log^\beta n)$, $q = q(d)$ - fixed.
Combination of KT formats with H-matrix, wavelet or
FFT-based structures: $O(n \log^\beta n)$ op.
Alternative directions: Compress the input data
• High order methods: hp-FEM/BEM, spectral methods,
bcFEM (Khoromskij, Melenk), Richardson extrapolation.
• Adaptive mesh refinement: a priori/a posteriori strateg.
• Best N-term nonlinear approximation (wavelet/FEM)
• Dimension reduction: boundary/interface equations,
Schur complement methods.
• Combination of tensor-product basis with anisotropic
adaptivity: hyperbolic cross approximation by
FEM/wavelets, sparse grids.
• Model reduction: multi-scale, homogenization, genetic
algorithms, neural networks.
• Monte-Carlo methods (e.g., random walk dynamics).
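Fourier kingdom. Fourier transform in $L^1(\mathbb{R})$
Continuous Fourier transform (S.G. Mallat)
$$\hat f(\omega) := \int_{\mathbb{R}} f(t)\,e^{-i\omega t}\,dt.$$
If $f \in L^1(\mathbb{R})$ then $\hat f \in C^0(\mathbb{R})$ and $|\hat f(\omega)| \le \int_{\mathbb{R}} |f(t)|\,dt < +\infty$.
If $f, \hat f \in L^1(\mathbb{R})$ then the inverse Fourier transform is given by
$$f(t) = \frac{1}{2\pi}\int_{\mathbb{R}} \hat f(\omega)\,e^{i\omega t}\,d\omega.$$
Let $f, h \in L^1(\mathbb{R})$. The convolution
$$g(t) = (f * h)(t) := \int_{\mathbb{R}} f(t-u)\,h(u)\,du$$
then satisfies
$$g(t) = \frac{1}{2\pi}\int_{\mathbb{R}} \hat g(\omega)\,e^{i\omega t}\,d\omega \in L^1(\mathbb{R}) \quad\text{with}\quad \hat g(\omega) = \hat h(\omega)\,\hat f(\omega).$$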
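Important features of the Fourier transform
Each frequency $e^{i\omega t}$ is amplified by a factor $\hat h(\omega)$.
Hence a convolution is called a frequency filtering, and $\hat h$ is the
transfer function of the filter $h$.
Important relations between $f(t)$ and its FT $\hat f(\omega)$:
Inverse: $\hat f(t) \Longleftrightarrow 2\pi f(-\omega)$
Convolution: $(h * f)(t) \Longleftrightarrow \hat h(\omega)\hat f(\omega)$
Multiplication: $h(t)f(t) \Longleftrightarrow \frac{1}{2\pi}(\hat h * \hat f)(\omega)$
Translation: $f(t-u) \Longleftrightarrow e^{-iu\omega}\hat f(\omega)$
Modulation: $e^{i\nu t}f(t) \Longleftrightarrow \hat f(\omega-\nu)$
Scaling: $f(t/s) \Longleftrightarrow |s|\,\hat f(s\omega)$
Time derivatives: $f^{(p)}(t) \Longleftrightarrow (i\omega)^p\hat f(\omega)$
Frequency derivatives: $(-it)^p f(t) \Longleftrightarrow \hat f^{(p)}(\omega)$
Complex conjugate: $f^*(t) \Longleftrightarrow \hat f^*(-\omega)$
Hermitian symmetry: $f(t) \in \mathbb{R} \Longleftrightarrow \hat f(-\omega) = \hat f^*(\omega)$.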
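Fourier transform in $L^2(\mathbb{R})$
The inner product of $f, h \in L^2(\mathbb{R})$ and the $L^2(\mathbb{R})$-norm:
$$\langle f, h\rangle = \int_{\mathbb{R}} f(t)\,h^*(t)\,dt, \qquad \|f\|^2 = \langle f, f\rangle = \int_{\mathbb{R}} |f(t)|^2\,dt.$$
Let $f, h \in L^1(\mathbb{R}) \cap L^2(\mathbb{R})$. The Parseval and Plancherel
formulas read, respectively, as
$$\langle f, h\rangle = \frac{1}{2\pi}\int_{\mathbb{R}} \hat f(\omega)\,\hat h^*(\omega)\,d\omega, \qquad \|f\|^2 = \frac{1}{2\pi}\int_{\mathbb{R}} |\hat f(\omega)|^2\,d\omega.$$
The global regularity of $f(t)$ can be controlled by the decay
rate of $|\hat f(\omega)|$, i.e.,
$$|f^{(k)}(t)| \le \frac{1}{2\pi}\int_{\mathbb{R}} |\hat f(\omega)|\,|\omega|^k\,d\omega, \qquad k = 0, 1, \dots$$
and $f^{(k)}$ is continuous if the corresponding integrals converge.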
Examples of FT (I)
Example 1.1. For a Dirac $\delta$ (tempered distribution)
concentrated at the origin $t = 0$, i.e., $\int_{\mathbb{R}} \delta(t) f(t)\,dt = f(0)$, we have
$$\hat\delta(\omega) = \int_{\mathbb{R}} \delta(t)\,e^{-i\omega t}\,dt = 1 \quad\text{(formal representation)}.$$
Example 1.2. The FT of the characteristic (indicator, step)
function
$$f(t) = \chi_{[-T,T]}(t) = \begin{cases} 1 & \text{if } t \in [-T, T],\\ 0 & \text{otherwise} \end{cases}$$
is
$$\hat f(\omega) = \int_{-T}^{T} e^{-i\omega t}\,dt = \frac{2\sin(T\omega)}{\omega} \notin L^1(\mathbb{R}) \quad\text{(not integrable)}.$$
Example 1.3. An ideal low-pass filter has a transfer function
$\hat h = \chi_{[-\xi,\xi]}(\omega)$, thus its inverse FT (impulse response) is
$$h(t) = \frac{1}{2\pi}\int_{-\xi}^{\xi} e^{i\omega t}\,d\omega = \frac{\sin(\xi t)}{\pi t}.$$
With $\xi = \pi$, we obtain the classical sinc-function.
Examples of FT (I)
Figure 1: Haar (indicator) and Sinc scaling functions.
(Panels: Haar scaling function; Sinc function.)
Functions $\chi_{[-\pi,\pi]}(t)$ (cf. Haar scaling function) and $\mathrm{sinc}(t)$ have complementary (in fact, opposite) features in
the time and frequency (Fourier) domains.
Numerous wavelet families realize certain compromise
between these two “extreme cases”.
Examples of FT (II)
Example 1.4. The FT of a translated Dirac $\delta_\tau(t) = \delta(t-\tau)$ is
calculated by evaluating $e^{-i\omega t}$ at $t = \tau$:
$$\hat\delta_\tau(\omega) = \int_{\mathbb{R}} \delta(t-\tau)\,e^{-i\omega t}\,dt = e^{-i\omega\tau}.$$
For the Dirac comb $c(t) = \sum_{n=-\infty}^{\infty} \delta(t - nT)$ we have
$$\hat c(\omega) = \sum_{n=-\infty}^{\infty} e^{-inT\omega}.$$
Example 1.5. The FT of a Gaussian $f(t) = \exp(-t^2) \in C^\infty$ is
also a Gaussian:
$$\hat f(\omega) = \sqrt{\pi}\,\exp(-\omega^2/4).$$
We readily get $2\hat f'(\omega) + \omega\hat f(\omega) = 0$, which proves the statement.
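The step from the Gaussian to the differential equation in Example 1.5 can be written out as follows. This is a sketch of the standard argument (differentiation under the integral and one integration by parts), not a quotation from the slides.

```latex
\begin{align*}
\hat f'(\omega)
  &= \int_{\mathbb{R}} (-it)\,e^{-t^2} e^{-i\omega t}\,dt
   = \frac{i}{2}\int_{\mathbb{R}} \bigl(e^{-t^2}\bigr)'\, e^{-i\omega t}\,dt \\
  &= -\frac{i}{2}\int_{\mathbb{R}} e^{-t^2}\,(-i\omega)\,e^{-i\omega t}\,dt
   = -\frac{\omega}{2}\,\hat f(\omega), \\
\text{hence}\quad \hat f(\omega)
  &= \hat f(0)\,e^{-\omega^2/4},
  \qquad \hat f(0) = \int_{\mathbb{R}} e^{-t^2}\,dt = \sqrt{\pi}.
\end{align*}
```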
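Fourier series of $2\pi$-periodic functions
Denote by $L^2[-\pi,\pi]$ the Hilbert space of $2\pi$-periodic functions
with the inner product and norm
$$\langle f, h\rangle = \frac{1}{2\pi}\int_{-\pi}^{\pi} f(\omega)\,h^*(\omega)\,d\omega, \qquad \|f\|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} |f(\omega)|^2\,d\omega.$$
Thm. 1.1. The family of functions $\{e^{-ik\omega}\}_{k\in\mathbb{Z}}$ is an
orthonormal basis of $L^2[-\pi,\pi]$.
Let $\ell^p(\mathbb{Z})$ be the space of complex-valued sequences $\{f[k]\}_{k\in\mathbb{Z}}$
such that $\sum_{k=-\infty}^{\infty} |f[k]|^p < +\infty$. Thm. 1.1 proves that if
$f \in \ell^2(\mathbb{Z})$, the Fourier series
$$\hat f(\omega) = \sum_{k=-\infty}^{\infty} f[k]\,e^{-i\omega k}, \quad\text{with}\quad f[k] = \frac{1}{2\pi}\int_{-\pi}^{\pi} \hat f(\omega)\,e^{i\omega k}\,d\omega,$$
is the decomposition of $\hat f \in L^2[-\pi,\pi]$ in the orthogonal Fourier
basis.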
Discrete Fourier transform
Let $S_N$ be the space of finite sequences $\{f[n]\}_{0\le n<N}$ of period
$N$. $S_N$ is a Euclidean space with the inner product
$$\langle f, g\rangle = \sum_{n=0}^{N-1} f[n]\,g^*[n].$$
Thm. 1.2. The family $\left\{e_k[n] = \exp\left(\frac{2i\pi kn}{N}\right)\right\}_{0\le k<N}$ is an
orthogonal basis of $S_N$ with $\|e_k\|^2 = N$. Any $f \in S_N$ can be represented by
$$f = \sum_{k=0}^{N-1} \frac{\langle f, e_k\rangle}{\|e_k\|^2}\,e_k. \qquad (1)$$
Def. 1.1. The discrete Fourier transform (DFT) of $f$ is
$$\hat f[k] := \langle f, e_k\rangle = \sum_{n=0}^{N-1} f[n]\exp\left(\frac{-2i\pi kn}{N}\right) \qquad (N^2 \text{ complex multiplications}).$$
Due to (1) the inverse DFT is given by
$$f[n] = \frac{1}{N}\sum_{k=0}^{N-1} \hat f[k]\exp\left(\frac{2i\pi kn}{N}\right).$$
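Def. 1.1 and the inverse formula translate directly into two double loops. The following is a minimal sketch using only the Python standard library; the function names `dft` and `idft` and the sample signal are illustrative choices, not part of the lecture.

```python
import cmath

def dft(f):
    """Direct DFT of Def. 1.1: N^2 complex multiplications."""
    N = len(f)
    return [sum(f[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def idft(f_hat):
    """Inverse DFT: same sum with the opposite sign in the exponent, scaled by 1/N."""
    N = len(f_hat)
    return [sum(f_hat[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]

signal = [1.0, 2.0, 0.0, -1.0, 0.5, 3.0]   # arbitrary test data
spectrum = dft(signal)
recovered = idft(spectrum)
```

The zero-frequency coefficient `spectrum[0]` is the plain sum of the samples, and `recovered` agrees with `signal` up to rounding.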
Fast Fourier transform: Outlook
FFT: hierarchical recursive algorithm
The fast Fourier transform (FFT) can be traced back (1805)
to Gauss (1777 - 1855). First computer program: Cooley/Tukey (1965).
The idea of the FFT is to split the unknown Fourier coefficients $\hat f[k]$, $k = 0, \dots, N-1$, into the odd and even parts.
Let $N = 2^q$. This allows the use of recursion:
a problem of dimension $N = 2^q$ (level $q$) is transformed into two
problems of dimension $N/2 = 2^{q-1}$ (level $q-1$) plus $O(N)$ operations, etc., until it is reduced to $N = 2^q$ problems of
dimension 1 (level 0).
Since the cost per step is $O(N)$ and the number of levels is
$q = \log_2 N$, this results in the linear-logarithmic complexity
$O(N \log_2 N) \sim C_F N \log_2 N$ with a small constant $C_F \sim 4$.
FFT: sketch of the algorithm
When the frequency index is even, we group the terms $n$ and $n + N/2$:
$$\hat f[2k] = \sum_{n=0}^{N/2-1} \bigl(f[n] + f[n+N/2]\bigr)\exp\left(\frac{-2i\pi kn}{N/2}\right).$$
When the frequency index is odd, we have
$$\hat f[2k+1] = \sum_{n=0}^{N/2-1} \exp\left(\frac{-2i\pi n}{N}\right)\bigl(f[n] - f[n+N/2]\bigr)\exp\left(\frac{-2i\pi kn}{N/2}\right).$$
The first equation shows that the even frequencies are obtained by
calculating the DFT of the $N/2$-periodic signal
$$f_e[n] = f[n] + f[n+N/2],$$
the second equation implies that the odd frequencies can be computed by
the DFT of the diagonally scaled $N/2$-periodic signal
$$f_o[n] = \exp\left(\frac{-2i\pi n}{N}\right)\bigl(f[n] - f[n+N/2]\bigr).$$
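The even/odd frequency splitting above can be sketched as a short recursive routine. This is an illustrative implementation assuming the length is a power of two; the names `fft` and `dft_direct` are ours, and the direct transform is included only for comparison.

```python
import cmath

def fft(f):
    """Recursive FFT built from the even/odd frequency splitting; len(f) must be 2^q."""
    N = len(f)
    if N == 1:
        return [complex(f[0])]
    half = N // 2
    # f_e[n] = f[n] + f[n + N/2] yields the even frequencies
    f_even = [f[n] + f[n + half] for n in range(half)]
    # f_o[n] = exp(-2 i pi n / N) (f[n] - f[n + N/2]) yields the odd frequencies
    f_odd = [cmath.exp(-2j * cmath.pi * n / N) * (f[n] - f[n + half])
             for n in range(half)]
    out = [0j] * N
    out[0::2] = fft(f_even)
    out[1::2] = fft(f_odd)
    return out

def dft_direct(f):
    """O(N^2) reference transform used only to check the recursion."""
    N = len(f)
    return [sum(f[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

x = [float((n * n) % 7) for n in range(16)]   # arbitrary test data, N = 2^4
fast = fft(x)
slow = dft_direct(x)
```

Each level does $O(N)$ work to form `f_even` and `f_odd`, and there are $\log_2 N$ levels, which is the source of the $N \log_2 N$ count.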
FFT: Matrix representation
The FT matrix $F_N = \{f_{k,n}\}_{k,n=0}^{N-1}$ is given by
$$f_{k,n} := \exp\left(\frac{-2i\pi kn}{N}\right) = W^{-nk}, \qquad W = e^{2i\pi/N}.$$
The FFT recursion connects the $N$-point transform to two
copies of the $N/2$-point transform
$$F_N = \begin{pmatrix} I_{N/2} & D_{N/2} \\ I_{N/2} & -D_{N/2} \end{pmatrix} \begin{pmatrix} F_{N/2} & 0 \\ 0 & F_{N/2} \end{pmatrix} \begin{pmatrix} \text{even} \\ \text{odd} \end{pmatrix}.$$
$I_{N/2}$ is the identity matrix, $D_{N/2}$ is the diagonal matrix with
diagonal entries $1, W^{-1}, \dots, W^{-(N/2-1)}$. The permutation matrix
at the end splits the input vector into its “even” and its
“odd” part.
Finally, the FFT algorithm keeps going, recursively:
$F_N \to F_{N/2} \to \dots \to F_1$.
FFT: complexity, inverse transform
The DFT($N$) may be calculated with two DFT($N/2$) plus
$C_F N$ operations to compute $f_e[n]$ and $f_o[n]$, $n = 0, \dots, N/2-1$. We obtain the recursion
$$N_{FFT}(N) = 2N_{FFT}(N/2) + C_F N \quad\text{with}\quad N_{FFT}(1) = 0.$$
Setting $N = 2^q$, $q \in \mathbb{N}$, and introducing $Q(q) = N_{FFT}(N)/N$, we
get
$$Q(q) = Q(q-1) + C_F \quad\text{with}\quad Q(0) = 0,$$
which implies $Q(q) = C_F q$. Hence $N_{FFT}(N) = C_F N \log_2 N$.
The inverse FFT of $\hat f$ can be derived from the forward FFT
of its complex conjugate $\hat f^*$ due to
$$f^*[n] = \frac{1}{N}\sum_{k=0}^{N-1} \hat f^*[k]\exp\left(\frac{-2i\pi kn}{N}\right).$$
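The conjugation identity means that no separate inverse routine is needed: conjugate the spectrum, apply the forward transform, conjugate again and divide by $N$. A minimal sketch follows; for brevity the forward transform here is the direct $O(N^2)$ sum (any FFT routine could be substituted), and the names are illustrative.

```python
import cmath

def forward_dft(f):
    """Forward transform; a direct sum stands in for an FFT to keep the sketch short."""
    N = len(f)
    return [sum(f[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def inverse_via_forward(f_hat):
    """Inverse transform computed with the forward one: f = conj(F(conj(f_hat))) / N."""
    N = len(f_hat)
    conjugated = [complex(z).conjugate() for z in f_hat]
    return [z.conjugate() / N for z in forward_dft(conjugated)]

signal = [0.5, -1.0, 2.0 + 1.0j, 0.0, 3.0, 1.0j, -2.0, 4.0]   # arbitrary complex data
restored = inverse_via_forward(forward_dft(signal))
```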
FFT: fast discrete convolution
Let $g$ be the discrete convolution of two signals $f, h$ supported
only by the indices $0 \le n \le M-1$,
$$g[n] = (f * h)[n] = \sum_{k=-\infty}^{\infty} f[k]\,h[n-k].$$
The naive implementation requires $M(M+1)$ operations.
It can be represented as a matrix-by-vector product (MVP)
with the Toeplitz matrix
$$T = \{h[n-k]\}_{0\le n,k<M} \in \mathbb{R}^{M\times M}, \qquad g = Tf.$$
Extending $f$ and $h$ by $M$ additional samples,
$$h[M] = 0, \quad h[2M-i] = h[i], \quad i = 1, \dots, M-1,$$
$$f[n] = 0, \quad n = M, \dots, 2M-1,$$
we reduce the problem to the MVP with a circulant matrix
$C \in \mathbb{R}^{2M\times 2M}$ specified by the first row $h \in \mathbb{R}^{2M}$.
FFT: circulant convolution
An $n \times n$ matrix $C$ is called circulant if it has the form
$$C = \mathrm{circ}\{c_1, \dots, c_n\} := \begin{pmatrix} c_1 & c_2 & \dots & c_n \\ c_n & c_1 & \dots & c_{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ c_2 & \dots & c_n & c_1 \end{pmatrix}, \qquad c_i \in \mathbb{C}.$$
The set of all $n \times n$ circulant matrices is closed with respect
to addition and multiplication by a constant.
Any circulant matrix $C$ is associated with the polynomial
$$p_c(z) := c_1 + c_2 z + \dots + c_n z^{n-1}, \qquad z \in \mathbb{C}.$$
FFT: circulant convolution
Matrix $C$ has a diagonal representation in the Fourier basis,
$$C = F_n \Lambda_c F_n^{*}$$
with
$$\Lambda_c = \mathrm{diag}\{p_c(1), \dots, p_c(\omega^{n-1})\}, \qquad \omega = e^{2\pi i/n},$$
where $F_n$ is the unitary Fourier matrix.
The eigenvector corresponding to the eigenvalue $p_c(\omega^{j-1})$ is
given by the $j$th column of $F_n$, i.e.,
$$\omega_j = \frac{1}{\sqrt{n}}\left(\omega^{(k-1)(j-1)}\right)_{k=1,\dots,n}.$$
The matrix-vector product with $C$ costs $2C_F n \log_2 n + O(n)$ op.
Multi-dimensional FFT can be performed by a tensorization
process with the linear-logarithmic cost $O(N \log_2 N)$, $N = n^d$.
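The diagonalisation of a circulant matrix gives a fast matrix-vector product: transform the vector, scale by the eigenvalues $p_c(\omega^{j})$, transform back. The sketch below assumes the first-row convention $C_{jk} = c_{(k-j) \bmod n}$ (zero-based) matching $\mathrm{circ}\{c_1,\dots,c_n\}$ above and $n$ a power of two; the function names are illustrative.

```python
import cmath

def fft(f):
    """Recursive radix-2 FFT (length must be a power of two)."""
    N = len(f)
    if N == 1:
        return [complex(f[0])]
    half = N // 2
    even = [f[n] + f[n + half] for n in range(half)]
    odd = [cmath.exp(-2j * cmath.pi * n / N) * (f[n] - f[n + half]) for n in range(half)]
    out = [0j] * N
    out[0::2] = fft(even)
    out[1::2] = fft(odd)
    return out

def ifft(f_hat):
    """Inverse FFT through conjugation of the forward FFT."""
    N = len(f_hat)
    return [z.conjugate() / N for z in fft([complex(z).conjugate() for z in f_hat])]

def circulant_matvec(c, x):
    """y = C x for C = circ{c}, using the eigenvalues p_c(omega^j), omega = exp(2 pi i / n)."""
    n = len(c)
    eigenvalues = [n * z for z in ifft(c)]          # sum_m c[m] omega^(m j)
    x_hat = fft(x)
    return ifft([lam * xh for lam, xh in zip(eigenvalues, x_hat)])

def circulant_matvec_direct(c, x):
    """O(n^2) reference product with C[j][k] = c[(k - j) mod n]."""
    n = len(c)
    return [sum(c[(k - j) % n] * x[k] for k in range(n)) for j in range(n)]

c = [4.0, 1.0, 0.0, 0.5, 0.0, 0.0, 0.25, 2.0]      # first row of C (arbitrary)
x = [1.0, -2.0, 3.0, 0.0, 1.5, -1.0, 2.0, 0.5]
y_fast = circulant_matvec(c, x)
y_slow = circulant_matvec_direct(c, x)
```

If the eigenvalues are precomputed once, each product needs two transforms plus $n$ multiplications, in line with the cost quoted above.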
Literature to Lecture 1
1. S.G. Mallat: A Wavelet Tour of Signal Processing. Academic Press, San Diego, 1999.
2. W. Hackbusch: Hierarchische Matrizen - Algorithmen und Analysis. Vorlesungsmanuskript, Leipzig 2004.
3. W. Hackbusch and B.N. Khoromskij: Hierarchical Kronecker Tensor-Product Approximation to a Class
of Nonlocal Operators in High Dimensions. Preprint 16, MPI MIS, Leipzig 2004.
4. B.N. Khoromskij: Data-sparse approximation of nonlocal operators. Lecture notes 17, MPI MIS,
Leipzig 2003.
URL: http://personal-homepages.mis.mpg.de/bokh
http://www.mis.mpg.de/scicomp/Fulltext/Khoromskij/khor1.ps
Lecture 2. Sampling Theorem, Sinc Approximation
How to discretise analog signals?
The class of functions $f(t)$, $t \in \mathbb{R}$ (analog signals) can be
discretized by recording their sample values $\{f(nh)\}_{n\in\mathbb{Z}}$ at
intervals $h > 0$.
V.A. Kotelnikov (1933) and J. Whittaker (1935) proved a celebrated
theorem: band-limited signals can be exactly reconstructed
via their sampling values.
The sinc function (also called Cardinal function) is given as
$$\mathrm{sinc}(x) := \frac{\sin(\pi x)}{\pi x} \quad\text{with the convention } \mathrm{sinc}(0) = 1.$$
Thm. 2.1. (Kotelnikov, Shannon, Whittaker) If the support of $\hat f$ is
included in $[-\pi/h, \pi/h]$ then
$$f(t) = \sum_{n=-\infty}^{\infty} f(nh)\,S_{n,h}(t), \qquad t \in \mathbb{R}, \qquad (2)$$
where $S_{n,h}(t) = \mathrm{sinc}(t/h - n)$.
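Formula (2) can be tried numerically on a simple band-limited function. The sketch below uses $f(t) = \mathrm{sinc}(t)^2$, whose Fourier transform is supported in $[-2\pi, 2\pi]$, so $h = 1/2$ satisfies the assumption of Thm. 2.1. The infinite sum is truncated at $|n| \le 2000$, which is an assumption of this sketch and introduces a small error because this particular $f$ decays only like $t^{-2}$.

```python
import math

def sinc(x):
    """sinc(x) = sin(pi x) / (pi x) with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def band_limited(t):
    """Test signal: sinc(t)^2 has Fourier support in [-2 pi, 2 pi]."""
    return sinc(t) ** 2

def sinc_reconstruct(f, h, t, n_max):
    """Truncated cardinal series: sum over |n| <= n_max of f(n h) sinc(t/h - n)."""
    return sum(f(n * h) * sinc(t / h - n) for n in range(-n_max, n_max + 1))

h = 0.5                 # pi / h = 2 pi matches the band limit of the test signal
t0 = 0.3                # a point between the sampling nodes
approx = sinc_reconstruct(band_limited, h, t0, 2000)
exact = band_limited(t0)
```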
Sampling Theorem
Figure 2: Haar (cf. $\hat f$ for $f = \mathrm{sinc}$) and Sinc scaling functions.
(Panels: Haar scaling function; Sinc function.)
The sampling theorem plays an important role in tele/radio
communications, signal processing, stochastic models etc.
The class of band-limited functions has a direct
characterisation, namely, it is the Paley-Wiener space $W(\pi/h)$ of entire functions of exponential type (see later on).
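Proof of Sampling Theorem (I)
Preliminaries to the proof.
(a) The Poisson formula is (in the sense of distributions)
$$\hat c(\omega) = \sum_{n=-\infty}^{\infty} e^{-inh\omega} = \frac{2\pi}{h}\sum_{k=-\infty}^{\infty} \delta\left(\omega - \frac{2k\pi}{h}\right). \qquad (3)$$
Recall that $\hat c(\omega) = \sum_{n=-\infty}^{\infty} e^{-inh\omega}$ is the FT of the Dirac comb
$c(t) = \sum_{n=-\infty}^{\infty} \delta(t - nh)$ (cf. Ex. 1.4).
Since $\hat c$ is $\frac{2\pi}{h}$-periodic, it suffices to prove that $\hat c|_{[-\pi/h,\pi/h]} = \frac{2\pi}{h}\delta$.
(b) To any sample $f(nh)$ we associate a Dirac and introduce
the weighted Dirac sum $f_d(t) := \sum_{n=-\infty}^{\infty} f(nh)\,\delta(t - nh)$. Since the
FT of $\delta(t - nh)$ is $e^{-inh\omega}$, we obtain $\hat f_d(\omega) = \sum_{n=-\infty}^{\infty} f(nh)\,e^{-inh\omega}$.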
Proof of Sampling Theorem (II)
(c) Now $f(t)$ can be computed from the sample values $f(nh)$ due to the simple relation between the FTs $\hat f_d$ and $\hat f$ as follows.
Lem. 2.2. The FT of $f_d$ is given by
$$\hat f_d(\omega) = \frac{1}{h}\sum_{k=-\infty}^{\infty} \hat f\left(\omega - \frac{2k\pi}{h}\right).$$
Proof. $f(nh)\,\delta(t - nh) = f(t)\,\delta(t - nh)$ implies
$$f_d(t) := f(t)\sum_{n=-\infty}^{\infty} \delta(t - nh) \equiv f(t)\,c(t).$$
Computing the FTs
$$\hat f_d(\omega) = \frac{1}{2\pi}\,(\hat f * \hat c)(\omega) \qquad (4)$$
we apply the Poisson formula (3) to represent $\hat c(\omega)$.
Since $\hat f * \delta(\omega - \xi) = \hat f(\omega - \xi)$, inserting the above formula into (4)
proves Lem. 2.2.
Proof of Sampling Theorem (III)
Proof of Sampling Theorem.
If $n \neq 0$, the support of $\hat f(\omega - 2n\pi/h)$ does not intersect the
support of $\hat f(\omega)$ since $\hat f(\omega) = 0$ for $|\omega| > \pi/h$. Thus Lem. 2.2
implies
$$\hat f_d(\omega) = \frac{\hat f(\omega)}{h} \quad\text{if } |\omega| \le \frac{\pi}{h}.$$
Recall that the FT of $S_{0,h}(t) = \mathrm{sinc}(t/h)$ is $\hat S_{0,h} = h\,\chi_{[-\pi/h,\pi/h]}$.
Since $\mathrm{supp}(\hat f) \subset [-\pi/h, \pi/h]$, the previous relation results in
$$\hat f(\omega) = \hat S_{0,h}(\omega)\,\hat f_d(\omega).$$
The inverse FT of this equation, that is $f(t) = (S_{0,h} * f_d)(t)$, leads to the required result (since $S_{n,h}(t) = S_{0,h}(t - nh)$)
$$f(t) = S_{0,h} * \sum_{n=-\infty}^{\infty} f(nh)\,\delta(t - nh) = \sum_{n=-\infty}^{\infty} f(nh)\,S_{0,h}(t - nh).$$
Generalised sampling theorem
Sampling Thm. as a decomposition in an orthogonal basis.
Define the space $U_h$ as the set of functions whose FTs have a
support included in $[-\pi/h, \pi/h]$.
Lem. 2.2. The set of functions $\{S_{n,h}(t)\}_{n\in\mathbb{Z}}$ is an orthogonal
basis of the space $U_h$. If $f \in U_h$ then
$$f(nh) = \frac{1}{h}\,\langle f(t), S_{n,h}(t)\rangle.$$
Cor. 2.3. The sinc-interpolation formula of Thm. 2.1 can be
interpreted as a decomposition of $f \in U_h$ in an orthogonal
basis of $U_h$:
$$f(t) = \frac{1}{h}\sum_{n=-\infty}^{\infty} \langle f(\cdot), S_{n,h}(\cdot)\rangle\,S_{n,h}(t).$$
If $f \notin U_h$, one finds the orthogonal projection of $f$ onto $U_h$.
Proof of Lemma 2.2
Use the Sampling Theorem and the Parseval formula.
Recall that $\hat S_{0,h} = h\,\chi_{[-\pi/h,\pi/h]}$ and apply the Parseval formula
$$\langle S_{n,h}, S_{m,h}\rangle = \frac{1}{2\pi}\int_{\mathbb{R}} h^2\chi_{[-\pi/h,\pi/h]}(\omega)\,e^{-inh\omega}e^{imh\omega}\,d\omega = \frac{h^2}{2\pi}\int_{-\pi/h}^{\pi/h} e^{-i(n-m)h\omega}\,d\omega = h\,\delta[n-m].$$
Hence, $\{S_{n,h}(t)\}_{n\in\mathbb{Z}}$ is an orthogonal family. Since
$S_{n,h}(t) \in U_h$, Thm. 2.1 implies that any $f \in U_h$ can be
represented as a linear combination of $\{S_{n,h}(t)\}_{n\in\mathbb{Z}}$, i.e., the
latter is an orthogonal basis of $U_h$.
To verify the second assertion, we again apply the Parseval
formula to obtain
$$\langle f(t), S_{n,h}(t)\rangle = \frac{h}{2\pi}\int_{-\pi/h}^{\pi/h} \hat f(\omega)\,e^{inh\omega}\,d\omega = h f(nh).$$
Sinc-interpolation of entire functions
When does the Sinc-interpolant
$$C(f, h)(x) = \sum_{k=-\infty}^{\infty} f(kh)\,S_{k,h}(x)$$
represent a function exactly?
Def. 2.5. Let $h > 0$, and let $W(\pi/h)$ denote the family of
entire functions such that $\int_{\mathbb{R}} |f(t)|^2\,dt < \infty$ and such that for all $z \in \mathbb{C}$
$$|f(z)| \le C e^{\pi|z|/h} \quad\text{with a constant } C > 0.$$
Thm. 2.4 (Stenger). $\{h^{-1/2}S_{k,h}(x)\}_{k\in\mathbb{Z}}$ is a complete
$L^2(\mathbb{R})$-orthonormal sequence in $W(\pi/h)$. Every $f \in W(\pi/h)$ has
the cardinal series representation
$$f(x) = C(f, h)(x), \qquad x \in \mathbb{R}.$$
Proof: Consequence of the classical Paley-Wiener Theorem.
Sinc-approximation of analytic functions
The interpolant $C(f, h)$ provides an incredibly accurate approximation
on $\mathbb{R}$ for functions that are analytic and uniformly bounded on
the strip
$$D_\delta := \{z \in \mathbb{C} : |\Im z| \le \delta\}, \qquad 0 < \delta < \frac{\pi}{2},$$
such that
$$N(f, D_\delta) := \int_{\mathbb{R}} \bigl(|f(x + i\delta)| + |f(x - i\delta)|\bigr)\,dx < \infty.$$
This defines the Hardy space $H^1(D_\delta)$.
For functions $f \in H^1(D_\delta)$
$$\sup_{x\in\mathbb{R}} |f(x) - C(f, h)(x)| = O(e^{-\pi\delta/h}), \qquad h \to 0. \qquad (5)$$
Sinc-approximation of analytic functions
Likewise, if $f \in H^1(D_\delta)$, the integral
$$I(f) = \int_{\Omega} f(x)\,dx \qquad (\Omega = \mathbb{R} \text{ or } \Omega = \mathbb{R}_+) \qquad (6)$$
can be approximated with exponential convergence by the
Sinc-quadrature
$$T(f, h) := h\sum_{k=-\infty}^{\infty} f(kh) \quad\left(= \int_{\mathbb{R}} C(f, h)(x)\,dx \approx I(f)\right),$$
$$|I(f) - T(f, h)| = O(e^{-\pi\delta/h}), \qquad h \to 0. \qquad (7)$$
Analogous estimates hold for the (computable) truncated sums
$$C_M(f, h) = \sum_{k=-M}^{M} f(kh)\,S_{k,h}(x), \qquad T_M(f, h) = h\sum_{k=-M}^{M} f(kh).$$
Standard error estimates
Thm. 2.5. (Stenger) If $f \in H^1(D_\delta)$ and $|f(x)| \le C\exp(-b|x|)$ for
all $x \in \mathbb{R}$ with $b, C > 0$, then
$$\|f - C_M(f, h)\|_\infty \le C\left[\frac{e^{-\pi\delta/h}}{2\pi\delta}\,N(f, D_\delta) + \frac{1}{bh}\,e^{-bhM}\right], \qquad (8)$$
$$|I(f) - T_M(f, h)| \le C\left[\frac{e^{-2\pi\delta/h}}{1 - e^{-2\pi\delta/h}}\,N(f, D_\delta) + \frac{1}{b}\,e^{-bhM}\right]. \qquad (9)$$
Proof: The first term on the rhs of (8) represents the
approximation error (5),
$$\|f(x) - C(f, h)(x)\|_\infty \le \frac{N(f, D_\delta)}{2\pi\delta\,\sinh(\pi\delta/h)},$$
while the second one gives the truncation error
$$\|C(f, h)(x) - C_M(f, h)(x)\|_\infty \le \sum_{|k|\ge M+1} |f(kh)| \le 2C\sum_{k=M+1}^{\infty} e^{-bkh} \le \frac{2C}{bh}\,e^{-bhM}.$$
Exponential convergence rate
Similar arguments apply to (9).
For the interpolation error (8), the choice
$$h = \sqrt{\pi\delta/(bM)}$$
implies the exponential convergence rate
$$\|f - C_M(f, h)\|_\infty \le C M^{1/2} e^{-\sqrt{\pi\delta bM}}. \qquad (10)$$
In fact, for the chosen $h$, the first term in the right-hand side
in (8) dominates, hence (10) follows. Usually we set $\delta = \pi/2$.
For the quadrature error (9), the choice
$$h = \sqrt{2\pi\delta/(bM)}$$
yields
$$|I(f) - T_M(f, h)| \le C e^{-\sqrt{2\pi\delta bM}}. \qquad (11)$$
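The quadrature $T_M(f,h)$ with the step size from (11) is easy to try on a function with a known integral. The sketch below uses $f(x) = 1/\cosh(x)$, which is analytic in the strip $|\Im z| < \pi/2$, decays like $2e^{-|x|}$ (so $b = 1$) and has $\int_{\mathbb{R}} f = \pi$. The test function and the value $\delta = 1.5$ are choices made for this illustration.

```python
import math

def sinc_quadrature(f, h, M):
    """Truncated Sinc quadrature T_M(f, h) = h * sum_{k=-M}^{M} f(k h)."""
    return h * sum(f(k * h) for k in range(-M, M + 1))

def sech(x):
    return 1.0 / math.cosh(x)

delta, b = 1.5, 1.0           # strip half-width (< pi/2) and decay rate, assumed for sech
errors = {}
for M in (4, 16, 64):
    h = math.sqrt(2.0 * math.pi * delta / (b * M))   # step size used for (11)
    errors[M] = abs(sinc_quadrature(sech, h, M) - math.pi)
```

Going from $M = 16$ to $M = 64$ only quadruples the number of nodes, yet the error drops by several orders of magnitude, as the $e^{-\sqrt{2\pi\delta b M}}$ rate suggests.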
Error bounds in the case of double-exponential decay
If $f$ has a double-exponential decay as $|x| \to \infty$, i.e.,
$$|f(x)| \le C\exp(-b e^{a|x|}) \quad\text{for all } x \in \mathbb{R} \text{ with } a, b, C > 0, \qquad (12)$$
the convergence rate of Sinc-interpolation and quadrature
can be improved up to $O(e^{-cM/\log M})$ (cf. Thm. 2.5).
Thm. 2.6. (Gavrilyuk, Hackbusch, Khoromskij) Let $f \in H^1(D_\delta)$ with
some $\delta < \frac{\pi}{2}$, and let (12) hold. Then the choice
$h = \log\left(\frac{2\pi aM}{b}\right)/(aM)$ leads to the quadrature error
$$|I - T_M(f, h)| \le C\,N(f, D_\delta)\,e^{-2\pi\delta aM/\log(2\pi aM/b)}. \qquad (13)$$
The choice $h = \log\left(\frac{\pi aM}{b}\right)/(aM)$ implies for the interpolation
error
$$\|f - C_M(f, h)\|_\infty \le C\,\frac{N(f, D_\delta)}{2\pi\delta}\,e^{-\pi\delta aM/\log(\pi aM/b)}. \qquad (14)$$
Error bounds in the case of double-exponential decay
Proof. The quadrature error has the bound
$$|I - T_M(f, h)| \le C\left[\frac{e^{-2\pi\delta/h}}{1 - e^{-2\pi\delta/h}}\,N(f, D_\delta) + \frac{e^{-ahM}}{ab}\exp(-b e^{ahM})\right].$$
In fact the bound for $|I - T(f, h)|$ is the same as in Thm. 2.5.
For the remainder sum we use the simple estimate
$$\sum_{k:\,|k|>M} \exp(-b e^{a|kh|}) = 2\sum_{k=M+1}^{\infty} \exp(-b e^{a|kh|}) \le 2\int_{M}^{\infty} \exp(-b e^{a|xh|})\,dx \le \frac{2e^{-ahM}}{abh}\exp(-b e^{ahM}).$$
Now (13) follows.
Error bounds in the case of double-exponential decay
The interpolation error of $C_M(f, h)$ satisfies
$$\|f - C_M(f, h)\|_\infty \le C\left[\frac{e^{-\pi\delta/h}}{2\pi\delta}\,N(f, D_\delta) + \frac{e^{-ahM}}{abh}\exp(-b e^{ahM})\right].$$
Again, the approximation error allows the same estimate as in
the standard case. The truncation error bound is determined
by the decay rate of $f$ as $|x| \to \infty$,
$$\|C(f, h)(x) - C_M(f, h)(x)\|_\infty \le \sum_{|k|\ge M+1} |f(kh)| \le 2C\sum_{k=M+1}^{\infty} e^{-b e^{akh}} \le \frac{2C}{bah\,e^{ahM}}\,e^{-b e^{ahM}},$$
which proves (14).
Sinc-interpolation on (a, b) via Thm. 2.5
To apply Thm. 2.5 in the case $\Omega = (a, b)$ (say, $\Omega = \mathbb{R}_+$) one
has to substitute the variable $x \in \Omega$ by $x = \varphi(\zeta)$ such that
$\varphi : \mathbb{R} \to (a, b)$ is a bijection. This changes $f : (a, b) \to \mathbb{R}$ into
$f_1 := \varphi' \cdot (f \circ \varphi) : \mathbb{R} \to \mathbb{R}$ (quadrature),
$f_1 := f \circ \varphi$ (interpolation).
Assuming $f_1 \in H^1(D_\delta)$, one can apply (10)-(11) to the
transformed function.
Ex. 2.1. In the case of an interval $(a, b)$:
$\varphi^{-1}(z) = \log[(z - a)/(b - z)]$, $\Re z = x$.
Ex. 2.2. In the case of the semi-axis $\mathbb{R}_+ := (0, \infty)$:
$\varphi^{-1}(z) = \log[\sinh(z)]$ or $\varphi^{-1}(z) = \log(z)$.
Sinc quadratures on $\mathbb{R}_+$
Polynomial decay. Let us set $\Omega = \mathbb{R}_+$ and assume:
(i) $f$ can be analytically extended from $\mathbb{R}_+$ into the sector
$$D^{(1)}_\delta = \{z \in \mathbb{C} : |\arg(z)| < \delta\} \quad\text{for some } 0 < \delta < \pi/2, \qquad (15)$$
(actually, $\varphi^{-1} : D^{(1)}_\delta \to D_\delta$ is the conformal map),
(ii) $f$ satisfies the inequality
$$|f(z)| \le c\,|z|^{\alpha-1}(1 + |z|)^{-\alpha-\beta} \quad\text{for some } 0 < \alpha, \beta \le 1 \text{ and } \forall z \in D^{(1)}_\delta.$$
Let $\alpha = 1$. Choosing any $M \in \mathbb{N}$ and taking
$$h^{(1)} = \sqrt{2\pi\delta/(\beta M)}, \qquad (16)$$
we define the corresponding quadrature rule (with $\varphi(\zeta) = e^{\zeta}$)
$$I^{(1)}_M = h^{(1)}\sum_{k=-\beta M}^{M} \kappa^{(1)}_k f(z^{(1)}_k), \qquad z^{(1)}_k = e^{kh^{(1)}}, \qquad \kappa^{(1)}_k = e^{kh^{(1)}},$$
possessing the exponential convergence rate
$$\left|I - I^{(1)}_M\right| \le C e^{-\sqrt{2\pi\delta\beta M}} \qquad (17)$$
with a positive constant $C$ independent of $M$.
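The rule $I^{(1)}_M$ amounts to the trapezoidal rule after the substitution $x = e^{\zeta}$. A minimal sketch follows, with the illustrative integrand $f(x) = (1+x)^{-2}$, for which $\int_0^\infty f = 1$ and condition (ii) holds with $\alpha = \beta = 1$, so that the sum runs over $k = -M, \dots, M$. The value $\delta = 1.5$ is again an assumption of the example.

```python
import math

def sinc_quadrature_half_line(f, M, delta=1.5, beta=1.0):
    """I_M = h * sum_k kappa_k f(z_k) with z_k = kappa_k = exp(k h), h = sqrt(2 pi delta / (beta M))."""
    h = math.sqrt(2.0 * math.pi * delta / (beta * M))
    lower = -int(round(beta * M))            # equals -M for beta = 1
    total = 0.0
    for k in range(lower, M + 1):
        z = math.exp(k * h)                  # node z_k; the weight kappa_k has the same value
        total += z * f(z)
    return h * total

def integrand(x):
    return 1.0 / (1.0 + x) ** 2              # exact integral over (0, inf) is 1

approx_16 = sinc_quadrature_half_line(integrand, 16)
approx_64 = sinc_quadrature_half_line(integrand, 64)
```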
Figure 3: The analyticity sector $D^{(1)}_\delta$ (left) and the “bullet-shaped” domain $D^{(2)}_\delta$.
Sinc quadratures on $\mathbb{R}_+$
Exponential decay. Assume that the integrand $f$ can be
analytically extended into the “bullet-shaped” domain
$$D^{(2)}_\delta = \{z \in \mathbb{C} : |\arg(\sinh z)| < \delta\}, \qquad 0 < \delta < \pi/2,$$
and that $f$ satisfies
$$|f(z)| \le C\left(\frac{|z|}{1 + |z|}\right)^{\alpha-1} e^{-\beta\,\Re z} \quad\text{in } D^{(2)}_\delta, \qquad \alpha, \beta \in (0, 1]. \qquad (18)$$
Setting $\alpha = 1$ and choosing $h^{(2)} = h^{(1)}$, $\kappa^{(2)}_k = 1 + e^{-2kh^{(2)}}$ and
$M \in \mathbb{N}$, we obtain the quadrature
$$I^{(2)}_M = h^{(2)}\sum_{k=-\beta M}^{M} \kappa^{(2)}_k f(z^{(2)}_k), \qquad z^{(2)}_k = \log\left[e^{kh^{(2)}} + \sqrt{1 + e^{2kh^{(2)}}}\right],$$
possessing again the exponential convergence rate.
Improved Sinc-interpolation on (a, b) via Thm. 2.6
For applications in FEM/BEM, we reformulate the result of
Thm. 2.6 for parameter dependent functions $g(x, y)$, $y \in Y \subset \mathbb{R}^m$, defined on the reference interval $x \in (0, 1]$. Introduce the mapping
$$\zeta \in D_\delta \to \phi(\zeta) = \frac{1}{\cosh(\sinh(\zeta))}, \qquad \delta < \frac{\pi}{2}. \qquad (19)$$
Clearly, $(0, 1] = \phi(\mathbb{R})$ and, also, $\phi(\zeta)$ decays twice exponentially,
$$|\phi(\zeta)| \le 2\exp\left(-\frac{\cos\delta}{2}\,e^{|\Re\zeta|}\right), \qquad \zeta \in D_\delta.$$
In particular, we have $|\phi(\zeta)| \le 2\exp(-\frac{1}{2}e^{|\zeta|})$, $\zeta \in \mathbb{R}$. Let
$D_\phi(\delta) := \{\phi(\zeta) : \zeta \in D_\delta\} \supset (0, 1]$ be the image of $D_\delta$. One
checks easily that $D_\phi(\delta) \subset S_r(0)\setminus\{0\}$, where $S_r(0)$ is the disc
around zero with a radius $r > 1$.
Improved Sinc-interpolation on (a, b) via Thm. 2.6
Hence, if a function $g$ is holomorphic on $D_\phi(\delta)$, then
$$f(\zeta) := \phi^\alpha(\zeta)\,g(\phi(\zeta)) \quad\text{for any } \alpha > 0$$
is also holomorphic on $D_\delta$. Now the Sinc interpolation
$$C_M(f(\cdot, y), h)(\zeta) = \sum_{k=-M}^{M} f(kh, y)\,S_{k,h}(\zeta)$$
with the back-transformation $\zeta = \phi^{-1}(x) = \mathrm{arsinh}(\mathrm{arcosh}(\frac{1}{x}))$
and multiplication by $x^{-\alpha}$ yields the separable approximation
$$g_M(x, y) := \sum_{k=-M}^{M} \frac{\phi(kh)^\alpha}{x^\alpha}\,g(\phi(kh), y)\,S_{k,h}(\phi^{-1}(x)) \approx g(x, y) \qquad (20)$$
of $g(x, y)$ for $x \in (0, 1] = \phi(\mathbb{R})$ and $y \in Y$. Since $\phi(\zeta)$ is an even
function, the separation rank in (20) is reduced to $r = M + 1$.
Improved Sinc-interpolation on (a, b) via Thm. 2.6
Cor. 2.7. Assume that for all $y \in Y$ the functions $g(\cdot, y)$ and
$f(\zeta, y) := \phi^\alpha(\zeta)\,g(\phi(\zeta), y)$ satisfy:
(a) $g(\cdot, y)$ is holomorphic on $D_\phi(\delta)$, and $\sup_{y\in Y} N(f, D_\delta) < \infty$;
(b) $f(\cdot, y)$ satisfies (12) with $a = 1$ and with certain $C, b$ for all $y \in Y$.
Then, for all $y \in Y$, the optimal choice $h := \frac{\log M}{M}$ yields
$$E_M(\zeta) := |f(\zeta, y) - C_M(f(\cdot, y), h)(\zeta)| \le C\,\frac{N(f, D_\delta)}{2\pi\delta}\,e^{-\frac{\pi\delta M}{\log M}}, \qquad (21)$$
$$|g(x, y) - g_M(x, y)| \le |x|^{-\alpha}\,\bigl|E_M(f(\cdot, y), h)(\phi^{-1}(x))\bigr|. \qquad (22)$$
Proof: Due to the properties of $\phi : D_\delta \to D_\phi(\delta)$, condition (a) implies
$f \in H^1(D_\delta)$, hence, in view of (b), we can apply Thm. 2.6. Now $\frac{N(f, D_\delta)}{2\pi\delta}\,e^{-\pi\delta M/\log M}$ corresponds to the approximation error, while the evaluation of the
truncation error yields the bound $\frac{2C}{b\log M}\,e^{-bM}$, which decays asymptotically
faster as $M \to \infty$. Now (21) follows.
Transforming to the approximand (20) implies the bound (22) for $g - g_M$.
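The separable approximation (20) with the step size $h = \log M / M$ of Cor. 2.7 can be sketched in a few lines. The kernel $g(x, y) = e^{-xy}$, the exponent $\alpha = 1$ and the sample points are illustrative assumptions; the sketch evaluates all $2M+1$ terms and does not exploit the symmetry that reduces the rank to $M + 1$.

```python
import math

def sinc(x):
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def phi(zeta):
    """Mapping (19): phi(zeta) = 1 / cosh(sinh(zeta)), maps R onto (0, 1]."""
    return 1.0 / math.cosh(math.sinh(zeta))

def phi_inverse(x):
    """Back-transformation zeta = arsinh(arcosh(1/x)) for x in (0, 1]."""
    return math.asinh(math.acosh(1.0 / x))

def separable_approx(g, x, y, M, alpha=1.0):
    """g_M(x, y) from (20): each term is (function of x) * (function of y)."""
    h = math.log(M) / M
    zeta = phi_inverse(x)
    total = 0.0
    for k in range(-M, M + 1):
        node = phi(k * h)                                  # interpolation point phi(kh)
        total += node ** alpha * g(node, y) * sinc(zeta / h - k)
    return total / x ** alpha

def kernel(x, y):
    return math.exp(-x * y)                                # illustrative smooth kernel

max_error = max(abs(separable_approx(kernel, x, y, 64) - kernel(x, y))
                for x in (0.1, 0.5, 1.0) for y in (0.5, 2.0))
```

In each term the $x$-dependence sits in $x^{-\alpha}S_{k,h}(\phi^{-1}(x))$ and the $y$-dependence in $g(\phi(kh), y)$, which is what makes the representation separable.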
Numerics for the Sinc interpolation
Ex. 2.3. Separable approximation to the function
$$g(x, y) = |x|^\lambda\,\mathrm{sinc}(|x|\,|y|), \qquad \lambda \in (-3, 1],$$
arising from the Boltzmann equation.
Figure 4: $L^\infty$-error of the sinc-interpolation to $|x|^\lambda\,\mathrm{sinc}(|x|y)$, $x \in [-1, 1]$, $y \in [1, 36]$, $\lambda = 1$.
(Three panels for $y = 16$, $25$, $36$; horizontal axis: M - number of quadrature points; vertical axis: error, logarithmic scale.)
Numerics for the Sinc interpolation
Ex. 2.4. Sinc-interpolation for $g(x, y) = \exp(-xy)$, $x, y \ge 0$.
Consider the auxiliary function $f(x, y) = \frac{x}{1+x}\exp(-xy)$, $x \in \mathbb{R}_+$,
$y \in [1, R]$, which satisfies all the conditions above with
$\alpha = \beta = 1$ (exponential decay). With the choice of
interpolation points $x_k := \log[e^{kh} + \sqrt{1 + e^{2kh}}] \in \mathbb{R}_+$, it can be
approximated with exponential convergence.
Figure 5: $L^\infty$-error of the sinc-interpolation of $\exp(-|x|y)$, $x \in [-1, 1]$, $y \in [1, 100]$.
(Three panels for $y = 1$, $10$, $100$; horizontal axis: M - number of quadrature points; vertical axis: error, logarithmic scale.)
Numerics for the Sinc interpolation
Ex. 2.5. Mexican hat scaling function
Figure 6: Mexican hat $f(x) = (1 - x^2)\exp(-\alpha x^2)$, $\alpha > 0$.
Sinc interpolation to the Mexican hat, $r = M + 1$.
α \ M    4      9        16       25        36        49         64       81         100
1        0.05   6·10^-4  7·10^-7  1·10^-10  2·10^-15  1·10^-15   -        -          -
10       0.17   0.13     0.12     0.04      0.01      0.004      0.0009   1.7·10^-4  2.6·10^-5
0.1      3.8    2.6      0.6      0.08      0.006     1.6·10^-5  2·10^-7  2.5·10^-9  2·10^-11
Numerics for the Sinc interpolation
Ex. 2.6. (Helmholtz kernel in $\mathbb{R}^d$). Define
$$f(\zeta, \eta, \nu) := \frac{e^{i\kappa|x-y|}}{|x-y|}, \qquad \zeta = |x_1 - y_1|, \quad \eta = |x_2 - y_2|, \quad \nu = |x_3 - y_3|.$$
For $(\zeta, \eta) \in [0, 1] \times [0, b]$, consider
$$F(\zeta, \eta) := f(\zeta, \eta, 0) = \frac{e^{i\kappa\sqrt{\zeta^2+\eta^2}}}{\sqrt{\zeta^2+\eta^2}}.$$
We approximate the modified function
$$F_0(\zeta, \eta) := \zeta^{\alpha_0}\bigl(F(\zeta, \eta) - F(0, \eta)\bigr), \qquad 0 < \alpha_0 < 1, \qquad (23)$$
on the domain $\Omega_1 := [\delta, 1] \times [0, b]$, where $\delta > 0$ is a small
parameter. The considerations for the remaining domain
$\Omega_2 := [0, \delta] \times [\delta, b]$ are completely similar.
Numerics for the Sinc interpolation
Figure 7: Error (depending on $\kappa$!) for the Sinc-interpolation to $F_0$ with
$\kappa = 0.01, 1.0, 10$, respectively, from left to right.
(Three panels for $|x|^\beta\cos(\kappa|x|)/|x|$, $x \in [-1, 1]$, $\beta = 0.95$; horizontal axis: M - number of quadrature points; vertical axis: error, logarithmic scale.)
Numerics for the Sinc interpolation
Figure 8: Pointwise error for the Sinc-interpolation to $F_0$ with $\kappa = 0.01$
for $r = 25$ (left), $r = 37$ (middle) and $r = 49$.
(Three panels over $x \in [-1, 1]$; vertical scales of order $10^{-6}$, $10^{-8}$ and $10^{-9}$, respectively.)
Literature to Lecture 2
1. S.G. Mallat: A Wavelet Tour of Signal Processing. Academic Press, San Diego, 1999.
2. W. Hackbusch: Hierarchische Matrizen - Algorithmen und Analysis. Vorlesungsmanuskript, Leipzig 2004.
3. I.P. Gavrilyuk, W. Hackbusch, and B.N. Khoromskij: Tensor-product approximation to elliptic and parabolic
solution operators in higher dimensions. Preprint 83, MPI MIS, Leipzig 2003; Computing (to appear).
4. W. Hackbusch and B.N. Khoromskij: Hierarchical Kronecker Tensor-Product Approximation to a Class
of Nonlocal Operators in High Dimensions. Preprint 16, MPI MIS, Leipzig 2004.
5. F. Stenger: Numerical methods based on Sinc and analytic functions. Springer-Verlag, 1993.
http://personal-homepages.mis.mpg.de/bokh
http://www.mis.mpg.de/scicomp/Fulltext/Khoromskij/khor2.ps
Lecture 3. Introduction to wavelet methods
The wavelet is the mathematical microscope (B. Hubbard)
Purposes:
• Audio/video compression, radar processing
• Surface identification/analysis
• Image analysis (e.g., “finger prints”, medical imaging)
• Communications (radio, TV)
• Numerical PDEs and IEs, many-particle systems, ...
The fundamental theory behind wavelets is known as the
multi–resolution analysis (MRA).
The MRA provides a great deal of possibilities for multi-level
data and signal processing getting widespread popularity.
Basic ideas and history
The multiresolution approach is based on the idea that the
wavelet functions generate a hierarchical sequence of
subspaces in $L^2(\mathbb{R})$, which forms the MRA,
$$V_{j+1} \subset V_j \subset \dots \subset V_0 \subset V_{-1} \subset \dots$$
A signal $f_0 \in V_0$ (at scale $2^0$) is split into a “blurred” version
$f_1 \in V_1$ at the coarser scale $2^1$ and a “detail” $d_1 \in W_1$ at scale $2^0$.
Repeating this process gives a sequence $f_0, f_1, f_2, \dots$ of more
and more blurred versions and the details $d_1, d_2, d_3, \dots$.
Each $d_j$ can be represented in the wavelet basis using the
“filter coefficients” (high-pass filters), while the $f_j$ are given in
the scaling function basis via the low-pass filters.
After $J$ iterations the original signal can be exactly
reconstructed: $f_0 = f_J + d_1 + \dots + d_J$.
Basic ideas and history B. Khoromskij, Leipzig 2005(L3) 54
MRA is completely recursive and hence ideal for computation.
An important ingredient is the discrete wavelet transform
(DWT).
The DWT allows a fast implementation (FWT) with the linear cost
$O(N)$, $N = 2^J$.
Orthogonal wavelets are generated by
– the scaling function (SF) ϕ(x) (father wavelet) and
– the wavelet ψ(x) (mother wavelet).
The Sinc approximation method (cf. Lect. 2) can be viewed
within the wavelet concept: Sinc MRA, Sinc wavelet.
It is instructive to compare the Sinc and Haar MRA.
Basic ideas and history B. Khoromskij, Leipzig 2005(L3) 55
A wavelet $\psi(x)$ is a function of zero average,
\[ \int_{\mathbb{R}} \psi(x)\,dx = 0. \]
Using dilated and translated versions of $\psi$ defined by
\[ \psi_{u,s}(x) = \frac{1}{\sqrt{s}}\,\psi\!\left(\frac{x-u}{s}\right), \]
one can apply the continuous wavelet transform (cf. the continuous FT)
\[ Wf(u,s) := \int_{\mathbb{R}} f(x)\,\psi^*_{u,s}(x)\,dx. \]
This provides a two-dimensional representation of a
one-dimensional signal, which indicates some redundancy.
This redundancy can be eliminated by constructing a
basis of the signal space. Hence the next step is the
discrete wavelet transform. The first example is given by the
classical Haar wavelet (Haar 1910).
Multi–Resolution Analysis B. Khoromskij, Leipzig 2005(L3) 56
The SF $\varphi(x)$ generates an orthogonal MRA if it satisfies the
following conditions (i)-(iv):
(i) The integer translates of this function,
\[ \varphi_k(x) = \varphi(x-k), \quad k \in \mathbb{Z}, \]
are linearly independent and form a Riesz basis of the
subspace $V_0 \subset L^2(\mathbb{R})$: there exist $A, B > 0$ s.t. for all
$f = \sum_{k=-\infty}^{\infty} a[k]\,\varphi_k(x) \in V_0$ we have
\[ A\|f\|^2 \le \sum_{k=-\infty}^{\infty} |a[k]|^2 \le B\|f\|^2. \]
In the case of an orthogonal basis, $A = B = 1$.
Multi–Resolution Analysis B. Khoromskij, Leipzig 2005(L3) 57
(ii) Dyadic dilates of these functions, $\varphi_{j,k} = \varphi(2^{-j}x - k)$, $j \in \mathbb{Z}$,
generate a hierarchical set of subspaces $V_j$. Specifically, $V_j$
contains all scaling functions on level $j$. This means that if a
function $f(x) \in V_j$, then its translates by integer multiples of the
scale $2^j$ have to be contained in the same space,
\[ f(x) \in V_j \;\Leftrightarrow\; f(x - 2^j k) \in V_j, \quad k \in \mathbb{Z}. \]
(iii) The scaling function spaces satisfy $V_{j+1} \subset V_j$, i.e., an
approximation at resolution $2^{-j}$ contains all the information
needed to compute an approximation at the coarser resolution $2^{-j-1}$.
Moreover, if $f(x) \in V_j$, the dilated function $f(x/2)$ has to be
contained in the coarser resolution space $V_{j+1}$:
\[ f(x) \in V_j \;\Leftrightarrow\; f(x/2) \in V_{j+1}, \quad j \in \mathbb{Z}. \]
Multi–Resolution Analysis B. Khoromskij, Leipzig 2005(L3) 58
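(iv) The scaling function spaces also satisfy
(a) $\lim_{j\to\infty} V_j = \bigcap_{j=-\infty}^{\infty} V_j = \{0\}$,
(b) $\bigcup_{j=-\infty}^{\infty} V_j$ is dense in $L^2(\mathbb{R})$.
Specifically, (b) means
\[ \lim_{j\to-\infty} V_j = \mathrm{Closure}\Big(\bigcup_{j=-\infty}^{\infty} V_j\Big) = L^2(\mathbb{R}). \]
Recall that $2^{-j}$ is the resolution and $2^{j}$ is a scale parameter.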
Scaling (dilation) equation B. Khoromskij, Leipzig 2005(L3) 59
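The set of functions $\varphi_{j,k}(x)$ is supposed to be orthogonal. It
means that for any $k, k' \in \mathbb{Z}$:
\[ \int_{\mathbb{R}} \varphi_{j,k}(x)\,\varphi_{j,k'}(x)\,dx = \delta_{kk'}, \quad j \in \mathbb{Z}. \]
Let $\{\varphi_n\}_{n\in\mathbb{Z}}$ be an orthogonal basis of $V_0$. Then the family
$\{\varphi_{j,n}\}_{n\in\mathbb{Z}}$ is an orthogonal basis of $V_j$, $j \in \mathbb{Z}$, where
\[ \varphi_{j,n}(x) := \frac{1}{2^{j/2}}\,\varphi\!\left(\frac{x - 2^j n}{2^j}\right) = 2^{-j/2}\,\varphi(2^{-j}x - n). \]
The orthogonal projection of $f$ onto $V_j$ is given by
\[ P_{V_j} f = \sum_{n=-\infty}^{\infty} a_j[n]\,\varphi_{j,n}, \qquad a_j[n] = \langle f, \varphi_{j,n}\rangle, \]
where the $a_j[n]$ provide a discrete approximation at the scale $2^j$.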
Scaling (dilation) equation B. Khoromskij, Leipzig 2005(L3) 60
Scaling (dilation) equation.
Since $2^{-1/2}\varphi(x/2) \in V_1 \subset V_0$, we can decompose
\[ \frac{1}{\sqrt{2}}\,\varphi(x/2) = \sum_{n=-\infty}^{\infty} h[n]\,\varphi(x-n) \tag{24} \]
with
\[ h[n] = \frac{1}{\sqrt{2}}\,\langle \varphi(x/2), \varphi(x-n)\rangle. \]
In signal processing, the sequence $h[n]$ is interpreted as a
discrete filter, usually called a conjugate mirror filter
(Mallat, Meyer) or low-pass filter.
For scaling functions with compact support, $h[n]$ is a finite
sequence (cf. the Haar SF).
If $\varphi(x)$ has infinite support, $h[n]$ may be an infinite sequence
(cf. the Sinc SF).
Scaling equation B. Khoromskij, Leipzig 2005(L3) 61
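The FT of (24) implies
\[ \hat\varphi(2\omega) = \frac{1}{\sqrt{2}}\,\hat h(\omega)\,\hat\varphi(\omega) \quad\text{for}\quad \hat h(\omega) = \sum_{n=-\infty}^{\infty} h[n]\,e^{-in\omega}. \]
For any $p \ge 0$, this implies
\[ \hat\varphi(2^{-p+1}\omega) = \frac{1}{\sqrt{2}}\,\hat h(2^{-p}\omega)\,\hat\varphi(2^{-p}\omega). \]
Thus, by substitution, we obtain (with arbitrary $P \in \mathbb{N}$)
\[ \hat\varphi(\omega) = \Big(\prod_{p=1}^{P} \frac{\hat h(2^{-p}\omega)}{\sqrt{2}}\Big)\hat\varphi(2^{-P}\omega) = \Big(\prod_{p=1}^{\infty} \frac{\hat h(2^{-p}\omega)}{\sqrt{2}}\Big)\hat\varphi(0) \tag{25} \]
(the latter, if $\hat\varphi(\omega)$ is continuous at $\omega = 0$).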
Haar and Sinc MRA: check cond. (i)-(iv) B. Khoromskij, Leipzig 2005(L3) 62
Ex. 3.1. Define the Haar scaling function
\[ \varphi(x) = \chi_{[0,1]}(x). \]
The Haar MRA corresponds to the approximation by
piecewise constant functions; conditions (i)-(iv) can be easily checked.
Clearly, $\{\varphi_k\}$ is an orthogonal basis (i.e., $A = B = 1$).
$V_j \subset L^2(\mathbb{R})$ consists of functions which are constant on
$x \in [n2^j, (n+1)2^j)$ for $n \in \mathbb{Z}$, so that $V_j \subset V_{j-1}$.
The approximation at resolution $2^{-j}$ is a projection onto the set
of piecewise constant functions on intervals of size $2^j$.
The filter coefficients $h[n] = \frac{1}{\sqrt{2}}\langle \varphi(x/2), \varphi(x-n)\rangle$ are given by
$h[n] = 2^{-1/2}$ if $n = 0, 1$ and $h[n] = 0$ otherwise.
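The product formula (25) can be checked numerically for the Haar filter. The sketch below is only an illustration: the function names are hypothetical, and it assumes the FT convention $\hat f(\omega) = \int f(x)e^{-i\omega x}dx$, under which the FT of $\chi_{[0,1]}$ is $(1 - e^{-i\omega})/(i\omega)$ and $\hat\varphi(0) = 1$.

```python
import cmath

SQRT2 = 2 ** 0.5

def haar_h_hat(omega):
    """Fourier series of the Haar filter: h[0] = h[1] = 1/sqrt(2), zero otherwise."""
    return (1 + cmath.exp(-1j * omega)) / SQRT2

def phi_hat_product(omega, P=40):
    """Truncated product from (25), replacing phi_hat(2^-P * omega) by phi_hat(0) = 1."""
    value = 1 + 0j
    for p in range(1, P + 1):
        value *= haar_h_hat(omega / 2 ** p) / SQRT2
    return value

def haar_phi_hat(omega):
    """Closed form of int_0^1 exp(-i*omega*x) dx, the FT of chi_[0,1] (assumed convention)."""
    if omega == 0:
        return 1 + 0j
    return (1 - cmath.exp(-1j * omega)) / (1j * omega)
```

With P = 40 factors the truncated product and the closed form agree to roughly machine precision for moderate ω, since the neglected factor $\hat\varphi(2^{-P}\omega)$ differs from 1 by about $|\omega|2^{-P-1}$.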
Haar and Sinc MRA: check cond. (i)-(iv) B. Khoromskij, Leipzig 2005(L3) 63
Ex. 3.2. To approximate smooth (analytic) data one makes
use of the Sinc (Shannon) scaling function
\[ \varphi(x) = \mathrm{sinc}(x) := \frac{\sin(\pi x)}{\pi x}. \]
$V_j \subset L^2(\mathbb{R})$ is defined as the set of functions whose FT has
support included in $[-2^{-j}\pi, 2^{-j}\pi]$.
Lem. 2.2 proves that $\{\varphi(x-n)\}_{n\in\mathbb{Z}}$ is an orthogonal basis of
$V_0$ (band limited functions). Moreover, it is an interpolating
basis.
The FT of $f = \mathrm{sinc}(x)$ is the (shifted/dilated) Haar SF
\[ \hat f(\omega) = \chi_{[-\pi,\pi]}(\omega). \]
We derive from (25) for the filter coefficients
\[ \hat h(\omega) = \sqrt{2}\,\chi_{[-\pi/2,\pi/2]}(\omega), \quad \omega \in [-\pi,\pi]. \]
Haar and Sinc orthogonal wavelets B. Khoromskij, Leipzig 2005(L3) 64
[Figure 9 panels: Haar scaling function, Haar wavelet, Sinc function, Sinc wavelet.]
Figure 9: Haar and Sinc scaling functions/wavelets.
Wavelet spaces B. Khoromskij, Leipzig 2005(L3) 65
The wavelet spaces have the properties:
(v) There is a wavelet function $\psi(x)$ s.t. its integer translates
$\psi_k(x) = \psi(x-k)$ and dyadic dilates $\psi_{j,k} = \psi(2^{-j}x - k)$ span
subspaces $W_j$ which are complementary to $V_j$ in $V_{j-1}$:
\[ V_{j-1} = V_j \oplus W_j, \qquad W_j \perp V_j. \tag{26} \]
(vi) From the above relations it follows that $L^2(\mathbb{R})$ can be
decomposed into the approximation space $V_{j_0}$ and the sum of
the detail spaces $W_j$ of higher resolutions $j \le j_0$:
\[ L^2(\mathbb{R}) = V_{j_0} \oplus \bigoplus_{j=-\infty}^{j_0} W_j = \bigoplus_{j=-\infty}^{\infty} W_j, \tag{27} \]
where $j_0 \in \mathbb{Z}$ is a chosen level of resolution.
Orthogonal wavelets B. Khoromskij, Leipzig 2005(L3) 66
(26) means that the orthogonal projection of $f$ on $V_{j-1}$ is the
sum of the orthogonal projections on $V_j$ and $W_j$; hence the “detail”
space $W_j$ is the orthogonal complement of $V_j$ in $V_{j-1}$:
\[ P_{V_{j-1}} f = P_{V_j} f + P_{W_j} f. \]
$P_{W_j} f$ gives the “details” of $f$ that appear at the scale $2^{j-1}$
but disappear at the coarser scale $2^j$.
Thm. 3.1. (Mallat, Meyer) Let $\psi$ be the function whose FT is
\[ \hat\psi(2\omega) = \frac{1}{\sqrt{2}}\,e^{-i\omega}\,\hat h^*(\omega+\pi)\,\hat\varphi(\omega), \]
where $\varphi$ is the SF and $h$ is the corresponding conjugate
mirror filter. Let us denote
\[ \psi_{j,k}(x) := \frac{1}{\sqrt{2^j}}\,\psi\!\left(\frac{x - 2^j k}{2^j}\right). \]
For any scale $2^j$, $\{\psi_{j,k}\}_{k\in\mathbb{Z}}$ is an orthogonal basis of $W_j$. For all scales,
$\{\psi_{j,k}\}_{(j,k)\in\mathbb{Z}^2}$ is an orthogonal basis of $L^2(\mathbb{R})$.
High-pass filters B. Khoromskij, Leipzig 2005(L3) 67
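Since $\psi(x/2) \in W_1 \subset V_0$, it can be decomposed in an
orthogonal basis of $V_0$:
\[ \frac{1}{\sqrt{2}}\,\psi(x/2) = \sum_{n=-\infty}^{\infty} g[n]\,\varphi(x-n) \tag{28} \]
with $g[n] = \frac{1}{\sqrt{2}}\langle \psi(x/2), \varphi(x-n)\rangle$. In (28) $\varphi$ serves as a kind of
“potential” for generating $\psi$.
The FT of (28) with Thm. 3.1 yields
\[ \hat\psi(2\omega) = \frac{1}{\sqrt{2}}\,\hat g(\omega)\,\hat\varphi(\omega), \quad\text{i.e.,}\quad \hat g(\omega) = e^{-i\omega}\,\hat h^*(\omega+\pi). \]
Calculating the inverse FT of the above relation leads to
\[ g[n] = (-1)^{1-n}\,h[1-n]. \]
This is the so-called mirror filter (or high-pass filter), which is
important for the FWT algorithm.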
Discrete wavelet transform B. Khoromskij, Leipzig 2005(L3) 68
All in all, the properties (i)-(vi) with Thm. 3.1 mean that any
function $f \in L^2(\mathbb{R})$ can be represented as a sum of linear
combinations of the scaling functions $\varphi_{j_0,k}$ at a chosen
resolution $j = j_0$ and the wavelet functions $\psi_{j,k}$ at all finer
resolutions $j \le j_0$:
\[ f(x) = \sum_{k=-\infty}^{\infty} a_{j_0}[k]\,\varphi_{j_0,k}(x) + \sum_{j=-\infty}^{j_0}\sum_{k=-\infty}^{\infty} d_j[k]\,\psi_{j,k}(x). \tag{29} \]
Here the coefficients $a_{j_0}[k]$ and $d_j[k]$ are obtained as the
scalar products with the appropriate basis functions,
\[ a_j[k] = \int_{\mathbb{R}} f(x)\,\varphi_{j,k}(x)\,dx, \qquad d_j[k] = \int_{\mathbb{R}} f(x)\,\psi_{j,k}(x)\,dx. \tag{30} \]
Eqs. (29), (30) define the Discrete Wavelet Transform
(DWT).
Vanishing moments B. Khoromskij, Leipzig 2005(L3) 69
The wavelet $\psi$ has $p$ vanishing moments if
\[ \int_{\mathbb{R}} x^k\,\psi(x)\,dx = 0 \quad\text{for } 0 \le k < p. \]
Then $\psi$ is orthogonal to any polynomial of degree $p-1$. If $f$ is
locally $C^k$, then for $k < p$ the wavelets are orthogonal to the local
polynomial approximant (say, Taylor), yielding small-amplitude
coefficients at fine scales.
$\psi$ has $p$ vanishing moments iff $\hat\psi$ and $\hat h$ have vanishing
derivatives up to order $p-1$ at $\omega = 0$ and at $\omega = \pi$,
respectively.
If $\psi$ has $p$ vanishing moments then its support is at least of
size $2p-1$ (Daubechies).
Haar wavelet: check cond. (v)-(vi) B. Khoromskij, Leipzig 2005(L3) 70
Ex. 3.1′. Recall the filter coefficients for the Haar scaling
function: $h[n] = 2^{-1/2}$ if $n = 0, 1$ and $h[n] = 0$ otherwise. The
Haar wavelet is thus given by
\[ \frac{1}{\sqrt{2}}\,\psi\!\left(\frac{x}{2}\right) = \sum_{n=-\infty}^{\infty} (-1)^{1-n}\,h[1-n]\,\varphi(x-n) = \frac{1}{\sqrt{2}}\big(\varphi(x-1) - \varphi(x)\big). \]
Specifically, $\psi(x) = -1$ if $0 \le x < 1/2$, $\psi(x) = 1$ if $1/2 \le x < 1$, and $\psi(x) = 0$ otherwise.
Clearly, this is an orthogonal wavelet providing (v)-(vi).
The Haar wavelet has the shortest support among all
orthogonal wavelets ($p = 1$). It is suited only to
approximating non-smooth functions (signals).
However, it is a good example for educational purposes.
Sinc wavelet: check cond. (v)-(vi) B. Khoromskij, Leipzig 2005(L3) 71
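Ex. 3.2′. The Sinc wavelet is constructed from the Sinc
MRA with $\varphi(x) = \mathrm{sinc}(x)$, which approximates functions by
their restriction to low frequency intervals. Thm. 3.1 yields
\[ \hat\psi(\omega) = \frac{1}{\sqrt{2}}\,e^{-i\omega/2}\,\hat h^*(\omega/2+\pi)\,\hat\varphi(\omega/2) \]
with $\hat\varphi(\omega) = \chi_{[-\pi,\pi]}(\omega)$ and $\hat h(\omega) = \sqrt{2}\,\chi_{[-\pi/2,\pi/2]}(\omega)$ for $\omega \in [-\pi,\pi]$
(extended $2\pi$-periodically). This implies
\[ \hat\psi(\omega) = e^{-i\omega/2} \quad\text{if } \omega \in [-2\pi,-\pi] \cup [\pi, 2\pi], \]
and $\hat\psi(\omega) = 0$ otherwise. Hence, by the inverse FT,
\[ \psi(x) = 2\varphi(2x-1) - \varphi(x-1/2). \]
This is an analytic ($C^\infty$) wavelet with the decay $O(|x|^{-1})$ as
$|x| \to \infty$. Since $\hat\psi$ vanishes in a neighbourhood of $\omega = 0$,
all its derivatives vanish there, i.e., $\psi$ has an infinite number of
vanishing moments.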
Fast orthogonal wavelet transform B. Khoromskij, Leipzig 2005(L3) 72
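Because of $V_j = V_{j+1} \oplus W_{j+1}$, a function $f \in V_j$ may be
represented either in the scaling function basis
\[ f = \sum_{k=-\infty}^{\infty} \langle f, \varphi_{j,k}\rangle\,\varphi_{j,k} = \sum_{k=-\infty}^{\infty} a_j[k]\,\varphi_{j,k} \]
or with respect to the orthogonal bases of $V_{j+1}$ and $W_{j+1}$,
\[ f = \sum_{k=-\infty}^{\infty} a_{j+1}[k]\,\varphi_{j+1,k} + \sum_{k=-\infty}^{\infty} d_{j+1}[k]\,\psi_{j+1,k}, \qquad d_{j+1}[k] = \langle f, \psi_{j+1,k}\rangle. \]
Thm. 3.2. (Mallat) At the decomposition,
\[ a_{j+1}[n] = \sum_{k=-\infty}^{\infty} h[k-2n]\,a_j[k]; \qquad d_{j+1}[n] = \sum_{k=-\infty}^{\infty} g[k-2n]\,a_j[k]. \]
At the reconstruction,
\[ a_j[n] = \sum_{k=-\infty}^{\infty} h[n-2k]\,a_{j+1}[k] + \sum_{k=-\infty}^{\infty} g[n-2k]\,d_{j+1}[k]. \]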
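The two formulas of Thm. 3.2 translate almost literally into code. The following is a minimal sketch, not a reference implementation: it assumes a periodic signal of even length, stores a filter as a dictionary `{index: value}`, and uses the hypothetical helper names `mirror_filter`, `decompose` and `reconstruct`.

```python
def mirror_filter(h):
    """High-pass filter g[n] = (-1)^(1-n) h[1-n] from a low-pass filter {index: value}."""
    # With n = 1 - m the sign (-1)^(1-n) equals (-1)^m.
    return {1 - m: (v if m % 2 == 0 else -v) for m, v in h.items()}

def decompose(a, h, g):
    """One step a_j -> (a_{j+1}, d_{j+1}); indices are taken modulo len(a) (periodic signal)."""
    N = len(a)
    half = N // 2
    coarse = [sum(c * a[(m + 2 * n) % N] for m, c in h.items()) for n in range(half)]
    detail = [sum(c * a[(m + 2 * n) % N] for m, c in g.items()) for n in range(half)]
    return coarse, detail

def reconstruct(coarse, detail, h, g):
    """One step (a_{j+1}, d_{j+1}) -> a_j: a_j[n] = sum_k h[n-2k] a[k] + g[n-2k] d[k]."""
    N = 2 * len(coarse)
    a = [0.0] * N
    for k in range(len(coarse)):
        for m, c in h.items():
            a[(m + 2 * k) % N] += c * coarse[k]
        for m, c in g.items():
            a[(m + 2 * k) % N] += c * detail[k]
    return a
```

For the Haar filter `{0: 2**-0.5, 1: 2**-0.5}`, `decompose` followed by `reconstruct` returns the input signal up to rounding, and the sum of squares of the coefficients is preserved, as expected for an orthogonal change of basis.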
Fast orthogonal wavelet transform B. Khoromskij, Leipzig 2005(L3) 73
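$f_0 \in V_0$ is split into $f_1 \in V_1$ at the coarser scale $2^1$ and the “detail”
$d_1 \in W_1$ at scale $2^0$. Iterating this process gives a sequence
$f_0, f_1, f_2, \dots$ of more and more blurred versions and the details
$d_1, d_2, d_3, \dots$. After $J$ iterations the original signal can be
exactly (by orthogonality) reconstructed: $f_0 = f_J + d_1 + \dots + d_J$.
The decomposition scheme:
$a_0 \to a_1 \to a_2 \to \dots \to a_J$,
where each step $a_{j-1} \to a_j$ also outputs the detail $d_j$, giving $d_1, d_2, \dots, d_J$.
The reconstruction scheme:
$a_0 \leftarrow a_1 \leftarrow a_2 \leftarrow \dots \leftarrow a_J$,
where each step $a_j \to a_{j-1}$ takes the detail $d_j$ as additional input.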
Fast orthogonal wavelet transform B. Khoromskij, Leipzig 2005(L3) 74
Given $f = f_0 \in V_0$, both the decomposition and the reconstruction
are nothing but a change of basis
\[ V_0 \to V_J \oplus W_1 \oplus \dots \oplus W_J. \]
Iterating the decomposition yields, for given coefficients
$a_0[k]$, the coefficients $D[l,k] := (a_J[k], d_J[k], d_{J-1}[k], \dots, d_1[k])$.
The transformation $a_0[k] \to D[l,k]$ is called the discrete
wavelet transform (DWT). The backward transform is
provided by the reconstruction $D[l,k] \to a_0[k]$.
In practice the signal $a_0$ is $2^J$-periodic, hence we have $N = 2^J$
coefficients. Then the DWT requires only $O(mN)$ operations,
where $m$ is the filter length.
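The full transform $a_0 \to (a_J, d_J, \dots, d_1)$ can be sketched for the Haar filter ($m = 2$) as follows. The function names `haar_dwt` and `haar_idwt` are hypothetical, and the input length is assumed to be a power of two. Each level halves the number of approximation coefficients, so the total work is proportional to $N + N/2 + \dots + 1 < 2N$.

```python
def haar_dwt(a0):
    """Full Haar DWT of a signal of length N = 2^J: returns (a_J, [d_J, ..., d_1])."""
    s = 2 ** -0.5
    approx = list(a0)
    details = []
    while len(approx) > 1:
        pairs = [(approx[2 * n], approx[2 * n + 1]) for n in range(len(approx) // 2)]
        # Detail d_{j+1}[n] = g[0] a[2n] + g[1] a[2n+1] with g[0] = -s, g[1] = s.
        details.insert(0, [s * (y - x) for x, y in pairs])
        # Approximation a_{j+1}[n] = h[0] a[2n] + h[1] a[2n+1] with h[0] = h[1] = s.
        approx = [s * (x + y) for x, y in pairs]
    return approx, details

def haar_idwt(approx, details):
    """Inverse transform: consumes the details from the coarsest (d_J) to the finest (d_1)."""
    s = 2 ** -0.5
    a = list(approx)
    for d in details:
        finer = []
        for c, w in zip(a, d):
            finer += [s * (c - w), s * (c + w)]
        a = finer
    return a
```

The number of output coefficients equals the number of input samples: one approximation coefficient plus $N/2 + N/4 + \dots + 1 = N - 1$ details.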
Numerics I: Denoising B. Khoromskij, Leipzig 2005(L3) 75
We perform denoising of a randomly perturbed Mexican hat
function. It can be reconstructed rather accurately with only
a few wavelet coefficients (say, with ∼ 10 among N = 2048) up
to a threshold proportional to the random amplitude (about
10% of the signal amplitude).
[Figure 10 plots; the first panel is titled “Mexican hat scaling function”.]
Figure 10: Denoising by Daubechies (4) wavelets for “Mexican hat”.
Numerics II: Approximating smooth signals B. Khoromskij, Leipzig 2005(L3) 76
We approximate the Mexican hat with α = 0.5 by Daubechies
(m) wavelets with the filter length m (next table).
Recall m = 2p− 1.
kW (ε) is the number of (nonzero) wavelet coefficients which
exceed the given threshold ε.
kW(ε) for Daubechies (m) wavelets approximating the Mexican hat
m \ ε    10^-1   10^-2   10^-3   10^-4   10^-5   10^-6   10^-7   10^-8
10         19      31      47      75     105     175     273     388
20         17      24      29      43      53      60      93     121
40         24      24      29      31      31      46      57     105
Numerics II: Approximating smooth signals B. Khoromskij, Leipzig 2005(L3) 77
The next table gives the Sinc-interpolation error vs. the Sinc-wavelet
compressed representation, where the total number of
wavelet coefficients kW(ε) corresponds to the threshold ε.
The compression is not efficient since there are no “details”!
In fact, all the important coefficients are observed at
high-level resolution.
Sinc interpolation.
M        4         9         16       25        36        49        64        81        100
ε        5·10^-3   3·10^-3   10^-3    2·10^-4   4·10^-5   4·10^-7   4·10^-8   6·10^-9   9·10^-10
Sinc-wavelets for Mexican hat.
mF|N     16|32     36|64     36|64     50|128    70|128    100|256   128|256   160|256   –
ε        10^-2     5·10^-3   2·10^-3   4·10^-4   6·10^-5   6·10^-6   6·10^-7   6·10^-6   –
kW(ε)    20        29        42        54        85        131       179       116       –
Literature to Lecture 3 B. Khoromskij, Leipzig 2005(L3) 78
1. S.G. Mallat: A Wavelet Tour of Signal Processing. Academic Press, San Diego, 1999.
2. I. Daubechies: Ten Lectures on Wavelets. SIAM, Philadelphia, 1992.
3. G. Strang, T. Nguyen: Wavelets and Filter Banks. Wellesley-Cambridge Press, 1997.
4. R. Schneider: Wavelets and Signal Processing. Lecture Notes. Chemnitz, Sommersemester 2000.
URL: http://personal-homepages.mis.mpg.de/bokh
http://www.mis.mpg.de/scicomp/Fulltext/Khoromskij/khor3.ps
Lect. 4. Separable approximation to multi-variate functions B. Khoromskij, Leipzig 2005 79
Analytic methods of Kronecker-product representation to
non-local operators and related tensors are mainly based on
separable approximation to multi-variate functions in Rd.
I. Separation methods by tensor-product interpolation
• Polynomial interpolation
• Sinc interpolation
• Hyperbolic-cross approximation (Wavelet/FEM).
II. Approximating by exponential/trigonometric sums
• Sinc quadratures
• Exponential sums $\sum_k a_k e^{-b_k x}$
• Trigonometric sums $\sum_k [a_k \sin(b_k x) + a'_k \cos(b'_k x)]$.
Item (II) applies to translation invariant functions or to
functions depending on the sum of spatial variables.
Tensor-product interpolation B. Khoromskij, Leipzig 2005(L4) 80
Approximation problem: Given a multi-variate function
$F : \Omega^d \to \mathbb{R}$ ($d \ge 2$), approximate it by a separable expansion
\[ F_r(\zeta_1,\dots,\zeta_d) := \sum_{k=1}^{r} c_k\,\Phi_k^1(\zeta_1)\cdots\Phi_k^d(\zeta_d) \approx F, \qquad \Omega \in \{\mathbb{R}, \mathbb{R}_+, (a,b)\}, \]
where the set of univariate functions $\Phi_k^\ell(\cdot) : \Omega \to \mathbb{R}$, $1 \le \ell \le d$,
$1 \le k \le r$, may be fixed or chosen adaptively, $c_k \in \mathbb{R}$.
For numerical efficiency the so-called separation rank $r \in \mathbb{N}$
should be reasonably small.
Introduce the tensor-product interpolant $\mathbf{I}_M$ with respect to
the first $d-1$ variables (e.g., polynomial or Sinc interpolant),
\[ \mathbf{I}_M F := I_M^1 \times \dots \times I_M^{d-1} F, \]
where $I_M^\ell F$, $1 \le \ell \le d-1$, denotes the univariate interpolation
applied to the variable $\zeta_\ell \in I_\ell = \Omega$, where $I_\ell$ is the $\ell$-th factor
in $\Omega^d = I_1 \times \dots \times I_d$.
Best polynomial approximation B. Khoromskij, Leipzig 2005(L4) 81
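In the complex plane $\mathbb{C}$, we introduce the circular ring
$R_\rho := \{z \in \mathbb{C} : 1/\rho < |z| < \rho\}$ with $\rho > 1$.
Thm. 4.1. (Laurent’s Theorem). Let $f : \mathbb{C} \to \mathbb{C}$ be analytic
and bounded by $M > 0$ in $R_\rho$ with $\rho > 1$ (in the following we
say $f \in A_\rho$), and set
\[ C_n := \frac{1}{2\pi}\int_0^{2\pi} f(e^{i\theta})\,e^{-in\theta}\,d\theta, \quad n = 0, \pm 1, \pm 2, \dots \tag{31} \]
Then for all $z \in R_\rho$, $f(z) = \sum_{n=-\infty}^{\infty} C_n z^n$, where the series
converges to $f(z)$ for all $z \in R_\rho$. Moreover, $|C_n| \le M/\rho^{|n|}$, and
for all $\theta \in [0, 2\pi]$ and arbitrary integer $m$,
\[ \Big| f(e^{i\theta}) - \sum_{n=-m}^{m} C_n e^{in\theta} \Big| \le \frac{2M}{\rho-1}\,\rho^{-m}. \tag{32} \]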
Chebyshev polynomials B. Khoromskij, Leipzig 2005(L4) 82
By $E_\rho = E_\rho(B)$ with the reference interval $B := [-1, 1]$, we
denote Bernstein’s regularity ellipse (with foci at $w = \pm 1$
and the sum of semi-axes equal to $\rho > 1$),
\[ E_\rho := \{w \in \mathbb{C} : |w-1| + |w+1| \le \rho + \rho^{-1}\}. \]
Let $T_n(w)$, $n = 0, 1, 2, \dots$, be the Chebyshev polynomials, which
may be defined recursively by
\[ T_0(w) = 1, \quad T_1(w) = w, \quad T_{n+1}(w) = 2wT_n(w) - T_{n-1}(w), \quad n = 1, 2, \dots \]
Note that $T_n(x) = \cos(n \arccos x)$, $x \in [-1, 1]$, which implies
\[ T_n(1) = 1, \qquad T_n(-1) = (-1)^n. \]
It can be seen that with $w = \frac12\big(z + \frac1z\big)$, there holds
\[ T_n(w) = \frac12\,(z^n + z^{-n}). \tag{33} \]
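The recurrence and the identity (33) are easy to check numerically. A minimal sketch follows; the names `chebyshev_T` and `joukowski` are hypothetical and chosen only for this illustration.

```python
import cmath
import math

def chebyshev_T(n, w):
    """T_n(w) by the three-term recurrence; w may be real or complex."""
    t_prev, t_curr = 1.0, w
    if n == 0:
        return t_prev
    for _ in range(1, n):
        t_prev, t_curr = t_curr, 2 * w * t_curr - t_prev
    return t_curr

def joukowski(z):
    """The mapping w = (z + 1/z) / 2 used in (33)."""
    return (z + 1 / z) / 2
```

For real arguments in [−1, 1] the recurrence reproduces cos(n arccos x), and for complex z in the ring it reproduces $(z^n + z^{-n})/2$ at $w = $ `joukowski(z)`.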
Best polynomial approximation by Chebyshev series B. Khoromskij, Leipzig 2005(L4) 83
Thm. 4.2. Let $F$ be analytic and bounded by $M$ in $E_\rho$ (with
$\rho > 1$). Then the expansion
\[ F(w) = C_0 + 2\sum_{n=1}^{\infty} C_n T_n(w) \tag{34} \]
holds for all $w \in E_\rho$ (Chebyshev series), with
\[ C_n = \frac{1}{\pi}\int_{-1}^{1} \frac{F(w)\,T_n(w)}{\sqrt{1-w^2}}\,dw. \]
Moreover, $|C_n| \le M/\rho^n$, and for $m = 1, 2, 3, \dots$,
\[ \Big| F(w) - C_0 - 2\sum_{n=1}^{m} C_n T_n(w) \Big| \le \frac{2M}{\rho-1}\,\rho^{-m}, \quad w \in B. \tag{35} \]
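The decay $|C_n| \le M/\rho^n$ and the bound (35) can be observed on a simple example. The sketch below is an illustration under stated assumptions: it takes $F(w) = e^w$, for which $M = e^{(\rho + 1/\rho)/2}$ bounds $|F|$ on $E_\rho$ (the semi-major axis of $E_\rho$ is $(\rho + 1/\rho)/2$), and it approximates the coefficient integral by a Gauss-Chebyshev quadrature rule. The function names and the number of quadrature points `K` are hypothetical choices.

```python
import math

def cheb_coeffs(F, n_max, K=200):
    """C_n = (1/pi) * int F(w) T_n(w) / sqrt(1 - w^2) dw via K-point Gauss-Chebyshev quadrature."""
    thetas = [(2 * k - 1) * math.pi / (2 * K) for k in range(1, K + 1)]
    values = [F(math.cos(t)) for t in thetas]
    # At w = cos(theta) we have T_n(w) = cos(n * theta); all quadrature weights equal pi / K.
    return [sum(v * math.cos(n * t) for v, t in zip(values, thetas)) / K
            for n in range(n_max + 1)]

def cheb_partial_sum(C, m, w):
    """C_0 + 2 * sum_{n=1..m} C_n T_n(w) for real w in [-1, 1]."""
    theta = math.acos(w)
    return C[0] + 2 * sum(C[n] * math.cos(n * theta) for n in range(1, m + 1))
```

For the entire function $e^w$ the coefficients decay much faster than the geometric rate guaranteed by the theorem, so the bound (35) is far from sharp in this case.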
Proof of the main theorem B. Khoromskij, Leipzig 2005(L4) 84
Let $A_{\rho,s} := \{f \in A_\rho : C_{-n} = C_n\}$; then each $f \in A_{\rho,s}$ has a representation
(cf. Thm. 4.1)
\[ f(z) = C_0 + \sum_{n=1}^{\infty} C_n (z^n + z^{-n}), \quad z \in R_\rho. \tag{36} \]
Furthermore, from (36) it follows that $f(1/z) = f(z)$, $z \in R_\rho$.
Let us apply the mapping $w = \frac12\big(z + \frac1z\big)$, which satisfies $w(1/z) = w(z)$. It is
a conformal transform of $\{\xi \in R_\rho : |\xi| > 1\}$ onto $E_\rho$ as well as of
$\{\xi \in R_\rho : |\xi| < 1\}$ onto $E_\rho$ (but not of $R_\rho$ onto $E_\rho$!). It provides a one-to-one
correspondence of functions $F$ that are analytic and bounded by $M$ in $E_\rho$
with functions $f$ in $A_{\rho,s}$.
Since under this mapping we have (33), it follows that if $f$ defined by
(36) is in $A_{\rho,s}$, then the corresponding transformed function
$F(w) = f(z(w))$, which is analytic and bounded by $M$ in $E_\rho$, is given by (34).
Now the result follows directly from Thm. 4.1.
Lagrangian polynomial interpolation B. Khoromskij, Leipzig 2005(L4) 85
Let $P_N(B)$ be the set of polynomials of degree $\le N$ on $B$.
Denote by $[I_N F](x) \in P_N(B)$ the interpolation polynomial of $F$
with respect to the Chebyshev-Gauss-Lobatto (CGL) nodes
\[ \xi_j = \cos\frac{\pi j}{N} \in B, \quad j = 0, 1, \dots, N, \quad\text{with } \xi_0 = 1,\ \xi_N = -1, \]
where the $\xi_j$ are the zeroes of the polynomial $(1-x^2)\,T'_N(x)$, $x \in B$.
In turn, the Lagrangian interpolant $I_N$ of $F$ has the form
\[ I_N F := \sum_{j=0}^{N} F(\xi_j)\,l_j(x) \in P_N(B), \tag{37} \]
i.e. $[I_N F](\xi_j) = F(\xi_j)$, $j = 0, \dots, N$, where $\{l_j(x)\}$ is the set of
interpolation polynomials
\[ l_j := \prod_{k=0,\,k\ne j}^{N} \frac{x - \xi_k}{\xi_j - \xi_k} \in P_N(B). \]
Clearly, $l_j(\xi_j) = 1$ and $l_j(\xi_k) = 0$ for all $k \ne j$.
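The interpolant (37) can be evaluated directly from the product formula for $l_j$. The following sketch uses hypothetical function names and the plain $O(N^2)$ product form rather than a numerically optimised (e.g. barycentric) variant; it is meant only to make the definitions concrete.

```python
import math

def cgl_nodes(N):
    """Chebyshev-Gauss-Lobatto nodes xi_j = cos(pi * j / N), j = 0..N."""
    return [math.cos(math.pi * j / N) for j in range(N + 1)]

def lagrange_basis(nodes, j, x):
    """l_j(x) = prod_{k != j} (x - xi_k) / (xi_j - xi_k)."""
    value = 1.0
    for k, xk in enumerate(nodes):
        if k != j:
            value *= (x - xk) / (nodes[j] - xk)
    return value

def interpolate(F, N, x):
    """[I_N F](x) = sum_j F(xi_j) l_j(x)."""
    nodes = cgl_nodes(N)
    return sum(F(xj) * lagrange_basis(nodes, j, x) for j, xj in enumerate(nodes))

def max_error(F, N, samples=201):
    """Maximum of |F - I_N F| over a uniform grid on [-1, 1]."""
    grid = [-1 + 2 * i / (samples - 1) for i in range(samples)]
    return max(abs(F(x) - interpolate(F, N, x)) for x in grid)
```

For an analytic function such as `math.exp`, `max_error` drops rapidly as N grows, which is the behaviour quantified by the error bound for polynomial interpolation.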
Lebesgue constant for Chebyshev interpolation B. Khoromskij, Leipzig 2005(L4) 86
Let the set $\{\xi_j\}_{j=0}^{N}$ of interpolation points on $[-1, 1]$ and the
associated Lagrangian interpolation operator $I_N$ be given. The
standard approximation theory for polynomial interpolation
involves the so-called Lebesgue constant $\Lambda_N \in \mathbb{R}_{>1}$, the smallest constant such that
\[ \|I_N u\|_{\infty,B} \le \Lambda_N \|u\|_{\infty,B} \quad \forall u \in C(B). \tag{38} \]
In the case of Chebyshev interpolation it can be shown that
$\Lambda_N$ grows at most logarithmically in $N$; more precisely,
\[ \Lambda_N \le \frac{2}{\pi}\log N + 1. \]
The interpolation points which produce the smallest value $\Lambda^*_N$
of all $\Lambda_N$ are not known, but Bernstein ’54 proves that
\[ \Lambda^*_N = \frac{2}{\pi}\log N + O(1). \]
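Since $\Lambda_N = \max_{x\in B}\sum_j |l_j(x)|$, the logarithmic growth can be observed numerically. The sketch below estimates the maximum on a uniform grid, so it gives a lower estimate of $\Lambda_N$; the function names and the grid size are assumptions of this illustration.

```python
import math

def cgl_nodes(N):
    """Chebyshev-Gauss-Lobatto nodes xi_j = cos(pi * j / N), j = 0..N."""
    return [math.cos(math.pi * j / N) for j in range(N + 1)]

def lebesgue_function(nodes, x):
    """sum_j |l_j(x)| for the Lagrange basis on the given nodes."""
    total = 0.0
    for j, xj in enumerate(nodes):
        lj = 1.0
        for k, xk in enumerate(nodes):
            if k != j:
                lj *= (x - xk) / (xj - xk)
        total += abs(lj)
    return total

def lebesgue_constant(N, samples=2001):
    """Grid estimate of Lambda_N = max over [-1, 1] of the Lebesgue function."""
    nodes = cgl_nodes(N)
    grid = [-1 + 2 * i / (samples - 1) for i in range(samples)]
    return max(lebesgue_function(nodes, x) for x in grid)
```

Comparing `lebesgue_constant(N)` with `2 / math.pi * math.log(N) + 1` for N = 2, 4, 8, ... shows the estimate staying below the stated bound while growing slowly with N.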
Error bound for polynomial interpolation B. Khoromskij, Leipzig 2005(L4) 87
Thm. 4.3. Let $u \in C^\infty[-1, 1]$ have an analytic extension to $E_\rho$
bounded by $M > 0$ in $E_\rho$ (with $\rho > 1$). Then we have
\[ \|u - I_N u\|_{\infty,B} \le (1 + \Lambda_N)\,\frac{2M}{\rho-1}\,\rho^{-N}, \quad N \in \mathbb{N}_{\ge 1}. \tag{39} \]
Proof. Due to (35) one obtains for the best polynomial
approximation to $u$ on $[-1, 1]$,
\[ \min_{v\in P_N} \|u - v\|_{\infty,B} \le \frac{2M}{\rho-1}\,\rho^{-N}. \tag{40} \]
Note that the interpolation operator $I_N$ is a projection, that
is, for all $v \in P_N$ we have $I_N v = v$. Then applying the triangle
inequality with $v \in P_N$,
\[ \|u - I_N u\|_{\infty,B} = \|u - v - I_N(u - v)\|_{\infty,B} \le (1 + \Lambda_N)\,\|u - v\|_{\infty,B} \]
completes the proof.
Tensor-product polynomial interpolation B. Khoromskij, Leipzig 2005(L4) 88
Consider a multi-variate function $f = f(x_1, \dots, x_d) : \mathbb{R}^d \to \mathbb{R}$,
$d \ge 2$, defined on a box $B_1 \times B_2 \times \dots \times B_d$ with $B_k = [a_k, b_k]$.
We set $B := B_k = [-1, 1]$, $k = 1, \dots, d$, thus $f : B^d \to \mathbb{R}$.
The corresponding $N$-th order tensor product interpolation
operator is defined by
\[ \mathbf{I}_N f = I_N^1 \times I_N^2 \times \dots \times I_N^d f \in \mathbf{P}_N[B^d], \]
where $I_N^k f$ denotes the interpolation polynomial with respect
to $x_k$, $k = 1, \dots, d$, at nodes $\xi_k \in B_k$.
We choose the CGL nodes, hence the interpolation points
$\xi_\alpha \in B^d$, $\alpha = (i_1, \dots, i_d) \in \mathbb{N}_0^d$, are obtained by the Cartesian
product of 1D-nodes,
\[ \xi_\alpha := \Big(\cos\frac{\pi i_1}{N}, \dots, \cos\frac{\pi i_d}{N}\Big). \]
Tensor-product polynomial interpolation B. Khoromskij, Leipzig 2005(L4) 89
Again, $\mathbf{I}_N$ is the projection map,
\[ \mathbf{I}_N : C(B^d) \to \mathbf{P}_N := \{p_1 \times \dots \times p_d : p_i \in P_N,\ i = 1, \dots, d\}, \]
which implies the following estimate for the multivariate
counterpart of the Lebesgue constant (stability of $\mathbf{I}_N$ in the
multidimensional case; cf. (38)):
\[ \|\mathbf{I}_N f\|_{\infty,B^d} \le \Lambda_N^d\,\|f\|_{\infty,B^d} \quad \forall f \in C(B^d). \tag{41} \]
To derive an analogue of Thm. 4.3, we introduce the product
domain
\[ E_\rho^{(j)} := B_1 \times \dots \times B_{j-1} \times E_\rho(B_j) \times B_{j+1} \times \dots \times B_d, \]
and denote by $X_{-j}$ the $(d-1)$-dimensional subset of variables
$\{x_1, \dots, x_{j-1}, x_{j+1}, \dots, x_d\}$ with $x_j \in B_j$, $j = 1, \dots, d$.
Tensor-product polynomial interpolation B. Khoromskij, Leipzig 2005(L4) 90
Assump. 4.1. Given $f \in C^\infty(B^d)$, assume there is $\rho > 1$ such
that for all $j = 1, \dots, d$, and each fixed $\xi \in X_{-j}$, there exists an
analytic extension $f_j(x_j, \xi)$ of $f(x_j, \xi)$ to $E_\rho(B_j) \subset \mathbb{C}$ with
respect to $x_j$, bounded in $E_\rho(B_j)$ by a certain $M_j > 0$
independent of $\xi$.
Thm. 4.4. For $f \in C^\infty(B^d)$, let Assump. 4.1 be satisfied.
Then the interpolation error can be estimated by
\[ \|f - \mathbf{I}_N f\|_{\infty,B^d} \le \Lambda_N^d\,\frac{2M_\rho(f)}{\rho-1}\,\rho^{-N}, \tag{42} \]
where $\Lambda_N$ is the Lebesgue constant for the one-dimensional
interpolant $I_N^k$, and
\[ M_\rho(f) := \max_{1\le j\le d}\ \max_{x\in E_\rho^{(j)}} |f_j(x, \xi)|. \]
Tensor-product polynomial interpolation B. Khoromskij, Leipzig 2005(L4) 91
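Proof. The multiple use of (38), (39) and the triangle
inequality lead to
\[
\begin{aligned}
|f - \mathbf{I}_N f| &\le |f - I_N^1 f| + |I_N^1 (f - I_N^2 \times \dots \times I_N^d f)| \\
&\le |f - I_N^1 f| + |I_N^1 (f - I_N^2 f)| + |I_N^1 I_N^2 (f - I_N^3 f)| + \dots + |I_N^1 \times \dots \times I_N^{d-1} (f - I_N^d f)| \\
&\le \Big[(1 + \Lambda_N) \max_{x\in E_\rho^{(1)}} |f_1(x, \xi)| + \Lambda_N (1 + \Lambda_N) \max_{x\in E_\rho^{(2)}} |f_2(x, \xi)| \\
&\qquad + \dots + \Lambda_N^{d-1} (1 + \Lambda_N) \max_{x\in E_\rho^{(d)}} |f_d(x, \xi)|\Big]\,\frac{2}{\rho-1}\,\rho^{-N} \\
&\le \frac{(1 + \Lambda_N)(\Lambda_N^d - 1)}{\Lambda_N - 1}\,\frac{2M_\rho}{\rho-1}\,\rho^{-N}.
\end{aligned}
\]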
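Since for $x > 1$ and $n \ge 1$ we have $\frac{(1+x)(x^n-1)}{x-1} = (1+x)\sum_{k=0}^{n-1} x^k \le 2n\,x^n$, the bound (42) follows up to the dimension-dependent factor $2d$.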
Sinc-approximation of multi-variate functions B. Khoromskij, Leipzig 2005(L4) 92
Now consider the separable approximation in the case $\Omega = \mathbb{R}$.
The extension to the case $\Omega = \mathbb{R}_+$ or $\Omega = (a, b)$ is similar to that
for the univariate Sinc approximation.
Introduce the tensor-product Sinc interpolant $\mathbf{C}_M$ with
respect to the first $d-1$ variables,
\[ \mathbf{C}_M f := C_M^1 \times \dots \times C_M^{d-1} f, \]
where $C_M^\ell f = C_M^\ell(f, h)$, $1 \le \ell \le d-1$, denotes the univariate Sinc
interpolation applied to the variable $\zeta_\ell \in I_\ell = \mathbb{R}$, where $I_\ell$ is
the $\ell$-th factor in $\mathbb{R}^d = I_1 \times \dots \times I_d$.
Ex. 4.1. Examples of approximated functions:
\[ f(x) = |x|^\alpha, \qquad f(x) = \frac{\exp(\kappa|x|)}{|x|}, \qquad f(x, y) = \mathrm{sinc}(|x||y|) \]
with $x, y \in \mathbb{R}^d$.
Sinc-approximation of multi-variate functions B. Khoromskij, Leipzig 2005(L4) 93
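The estimation of the error $f - \mathbf{C}_M f$ requires the Lebesgue
constant $\Lambda_M \ge 1$ defined by
\[ \|C_M(f, h)\|_\infty \le \Lambda_M \|f\|_\infty \quad\text{for all } f \in C(\mathbb{R}). \tag{43} \]
Stenger ’93 proves the inequality
\[ \Lambda_M = \max_{x\in\mathbb{R}} \sum_{k=-M}^{M} |S_{k,h}(x)| \le \frac{2}{\pi}\,(3 + \log M). \tag{44} \]
Note that we also have (orthogonality)
\[ \sum_{k=-\infty}^{\infty} |S_{k,h}(x)|^2 = 1 \quad (x \in \mathbb{R}), \]
which indicates $\Lambda_M = 1$ with respect to the $L^2$-norm.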
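Both statements can be checked numerically. The sketch below assumes the usual definition $S_{k,h}(x) = \mathrm{sinc}(x/h - k)$ with $\mathrm{sinc}(x) = \sin(\pi x)/(\pi x)$; the function names, the scanning grid and the truncation index `K` are hypothetical choices made for the illustration.

```python
import math

def sinc(x):
    """sinc(x) = sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)

def sinc_basis(k, h, x):
    """S_{k,h}(x) = sinc(x / h - k) (assumed definition)."""
    return sinc(x / h - k)

def sinc_lebesgue(M, h=1.0, per_cell=40):
    """Grid estimate of max_x sum_{k=-M..M} |S_{k,h}(x)| over [-(M+1)h, (M+1)h]."""
    best = 0.0
    for i in range(-(M + 1) * per_cell, (M + 1) * per_cell + 1):
        x = h * i / per_cell
        best = max(best, sum(abs(sinc_basis(k, h, x)) for k in range(-M, M + 1)))
    return best

def sum_of_squares(x, h=1.0, K=5000):
    """Truncated sum_{k=-K..K} |S_{k,h}(x)|^2, which tends to 1 as K grows."""
    return sum(sinc_basis(k, h, x) ** 2 for k in range(-K, K + 1))
```

The grid estimate stays below $\frac{2}{\pi}(3 + \log M)$ for the tested values of M, and the truncated sum of squares differs from 1 only by the slowly decaying tail of order 1/K.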
Sinc-approximation of multi-variate functions B. Khoromskij, Leipzig 2005(L4) 94
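For each fixed $\ell \in \{1, \dots, d-1\}$, choose $\zeta_\ell \in I_\ell$ and define the remaining
parameter set by $Y_\ell := I_1 \times \dots \times I_{\ell-1} \times I_{\ell+1} \times \dots \times I_d \subset \mathbb{R}^{d-1}$. This
introduces the univariate (parameter dependent) function $F_\ell(\cdot, y) : I_\ell \to \mathbb{R}$,
which is the restriction of $F$ onto the interval $I_\ell$ with $y \in Y_\ell$.
Thm. 4.5. (Hackbusch, Khoromskij) For each $\ell = 1, \dots, d-1$ we
assume that for any fixed $y \in Y_\ell$, $F_\ell(\cdot, y)$ satisfies
(a) $F_\ell(\cdot, y) \in H^1(D_\delta)$ with $N(F_\ell, D_\delta) \le N_\ell < \infty$ uniformly in $y$;
(b) $F_\ell(\cdot, y)$ has hyper-exponential decay with $a = 1$, $C, b > 0$ for
all $y \in Y_\ell$.
Then, for all $y \in Y_\ell$, the optimal choice $h := \frac{\log M}{M}$ yields
\[ |F(\zeta, y) - \mathbf{C}_M(F, h)(\zeta)| \le \frac{C}{2\pi\delta}\,\Lambda_M^{d-2} \max_{\ell=1,\dots,d-1} N_\ell\ e^{-\frac{\pi\delta M}{\log M}} \tag{45} \]
with $\Lambda_M$ defined by (44).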
Proof of the Sinc-interpolation error B. Khoromskij, Leipzig 2005(L4) 95
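The multiple use of (43) and the triangle inequality lead to
\[
\begin{aligned}
|f - \mathbf{C}_M f| &\le |f - C_M^1 f| + |C_M^1 (f - C_M^2 \cdots C_M^{d-1} f)| \\
&\le |f - C_M^1 f| + |C_M^1 (f - C_M^2 f)| + |C_M^1 C_M^2 (f - C_M^3 f)| + \dots + |C_M^1 \cdots C_M^{d-2} (f - C_M^{d-1} f)| \\
&\le \big[N_1 + \Lambda_M N_2 + \dots + \Lambda_M^{d-2} N_{d-1}\big]\,\frac{1}{2\pi\delta}\,e^{-\frac{\pi\delta M}{\log M}} \\
&\le \frac{1 + \Lambda_M + \dots + \Lambda_M^{d-2}}{2\pi\delta} \max_{\ell=1,\dots,d-1} N_\ell\ e^{-\frac{\pi\delta M}{\log M}}.
\end{aligned}
\]
Note that
\[ 1 + \Lambda_M + \dots + \Lambda_M^{d-2} = \frac{\Lambda_M^{d-1} - 1}{\Lambda_M - 1} \approx \Lambda_M^{d-2}, \quad \Lambda_M \to \infty, \]
hence (45) follows.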
Separation by integration B. Khoromskij, Leipzig 2005(L4) 96
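If a function of $\rho = \sum_{i=1}^{d} x_i$ can be written as the integral
\[ \varphi(\rho) = \int_\Omega e^{\rho F(t)}\,G(t)\,dt \]
over some $\Omega \subset \mathbb{R}$ (say, $\Omega = \mathbb{R}$) and if a quadrature with
nodes $x_\nu$ and weights $\omega_\nu$ can be applied, one obtains the separable approximation
\[ \varphi(x_1 + \dots + x_d) \approx \sum_\nu \omega_\nu\,e^{\rho F(x_\nu)}\,G(x_\nu) = \sum_\nu c_\nu \prod_{i=1}^{d} e^{x_i F(x_\nu)} \]
with $c_\nu = \omega_\nu G(x_\nu)$. For this purpose we apply the Sinc
quadratures (cf. Lect. 2, 6).
Typical examples of such a function $\varphi(\rho)$ are the following:
\[ f(x) = 1/|x - y|, \qquad f(x) = \frac{1}{x_1 + \dots + x_d}, \quad x_i \ge 0, \]
with $x, y \in \mathbb{R}^d$.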
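As an illustration of separation by integration, consider $\varphi(\rho) = 1/\rho$. Substituting $t = e^s$ in $1/\rho = \int_0^\infty e^{-\rho t}dt$ gives $1/\rho = \int_{\mathbb{R}} e^{\rho F(s)}G(s)\,ds$ with $F(s) = -e^s$, $G(s) = e^s$, and the trapezoidal (Sinc) rule with step $h$ yields a separable sum. The sketch below is not the quadrature of the cited papers; the step size, the truncation indices and the function names are assumptions chosen so that the rule is accurate for roughly $1 \le \rho \le 100$.

```python
import math

def inverse_quadrature(h=0.4, k_min=-60, k_max=12):
    """Pairs (c_nu, t_nu) with 1/rho ~ sum_nu c_nu * exp(-t_nu * rho).

    Nodes s_k = k*h of the trapezoidal rule give t_nu = exp(s_k) and c_nu = h * exp(s_k).
    """
    return [(h * math.exp(k * h), math.exp(k * h)) for k in range(k_min, k_max + 1)]

def separable_inverse(x, terms):
    """Evaluate sum_nu c_nu * prod_i exp(-t_nu * x_i) ~ 1 / (x_1 + ... + x_d)."""
    total = 0.0
    for c, t in terms:
        product = 1.0
        for xi in x:
            product *= math.exp(-t * xi)   # one univariate factor per variable
        total += c * product
    return total
```

Each term is a product of univariate exponentials, so the separation rank equals the number of quadrature nodes (73 with the defaults above).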
Separation by exponential/trigonometric approximation B. Khoromskij, Leipzig 2005(L4) 97
Besides, the best approximation of $\varphi(\rho)$ by exponential sums,
\[ \varphi(\rho) \approx \sum_{\nu=1}^{r} \omega_\nu\,e^{-t_\nu\rho} \tag{46} \]
(e.g., with respect to the $L^\infty$- or $L^2$-norm), leads to an
approximation whose separation rank $r$ is expected to be
close to optimal.
For non-monotone functions $\varphi(\rho)$, approximations by
trigonometric sums may be appropriate,
\[ \varphi(\rho) \approx \sum_{\nu=1}^{r} c_\nu\,e^{-i\omega_\nu\rho}. \tag{47} \]
Rem. 4.1. The approximation by exponential/trigonometric
sums applies as well to the matrix-valued function $\varphi(A)$ with
$A = \sum_{i=1}^{d} A_i$ and pairwise commuting matrices $A_i$.
Separation by exponential/trigonometric approximation B. Khoromskij, Leipzig 2005(L4) 98
For $n \ge 1$, consider the set $E_n^0$ of exponential sums:
\[ E_n^0 := \Big\{ u = \sum_{\nu=1}^{n} \omega_\nu\,e^{-t_\nu x} : \omega_\nu, t_\nu \in \mathbb{R} \Big\}. \]
Now one can address the problem of finding the best
approximation to $f$ over the set $E_n^0$, characterised by the best
approximation error $d(f, E_n^0) := \inf_{v\in E_n^0} \|f - v\|_\infty$.
The existence of an approximation by exponentials is based
on the fundamental Big Bernstein Theorem: If $f$ is
completely monotone for $x \ge 0$, i.e.,
\[ (-1)^n f^{(n)}(x) \ge 0 \quad\text{for all } n \ge 0,\ x \ge 0, \]
then it is the restriction of the Laplace transform of a
measure to $\mathbb{R}_+$:
\[ f(z) = \int_{\mathbb{R}_+} e^{-tz}\,d\mu(t). \]
Separation by exponential/trigonometric approximation B. Khoromskij, Leipzig 2005(L4) 99
We recall the complete elliptic integral of the first kind with
modulus $\kappa$,
\[ K(\kappa) = \int_0^1 \frac{dt}{\sqrt{(1-t^2)(1-\kappa^2t^2)}} \quad (0 < \kappa < 1), \]
and define $K'(\kappa) := K(\kappa')$ by $\kappa^2 + (\kappa')^2 = 1$.
Thm. 4.6. (Braess). Assume that $f$ is completely monotone
and analytic for $\mathrm{Re}\,z > 0$, and let $0 < a < b$. Then for the
uniform approximation on the interval $[a, b]$,
\[ \lim_{n\to\infty} d(f, E_n^0)^{1/n} \le \frac{1}{\omega^2}, \quad\text{where } \omega = \exp\frac{\pi K(\kappa)}{K'(\kappa)} \text{ with } \kappa = \frac{a}{b}. \]
In the cases $f = \varphi(\rho)$ below, we have $\kappa = 1/R$ for $R \gg 1$.
Separation by exponential/trigonometric approximation B. Khoromskij, Leipzig 2005(L4) 100
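Now applying the asymptotics
\[ K(\kappa') = \ln\frac{4}{\kappa} + C_1\kappa + \dots \quad\text{for } \kappa' \to 1, \]
\[ K(\kappa) = \frac{\pi}{2}\Big\{1 + \frac14\kappa^2 + C_1\kappa^4 + \dots\Big\} \quad\text{for } \kappa \to 0, \]
of the complete elliptic integrals, we obtain
\[ \frac{1}{\omega^2} = \exp\Big(-\frac{2\pi K(\kappa)}{K(\kappa')}\Big) \approx \exp\Big(-\frac{\pi^2}{\ln(4R)}\Big) \approx 1 - \frac{\pi^2}{\ln(4R)}. \]
The latter expression indicates that the number $n$ of different
terms needed to achieve a tolerance $\varepsilon$ is asymptotically
\[ n \approx \frac{|\log\varepsilon|}{|\log\omega^{-2}|} \approx \frac{|\log\varepsilon|\,\ln(4R)}{\pi^2}. \]
This result shows the same asymptotic convergence in $n$ as
that for the Sinc approximation (cf. Lect. 2).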
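The rate $\omega^{-2}$ and the resulting estimate for $n$ can be evaluated without special-function libraries, using the classical identity $K(\kappa) = \pi / (2\,\mathrm{AGM}(1, \kappa'))$ with the arithmetic-geometric mean. The sketch below uses hypothetical function names and a fixed number of AGM iterations, which is an assumption that is ample for double precision.

```python
import math

def agm(a, b, iterations=40):
    """Arithmetic-geometric mean of a and b (converges quadratically)."""
    for _ in range(iterations):
        a, b = (a + b) / 2, math.sqrt(a * b)
    return a

def K(kappa):
    """Complete elliptic integral of the first kind: K(kappa) = pi / (2 AGM(1, kappa'))."""
    return math.pi / (2 * agm(1.0, math.sqrt(1 - kappa * kappa)))

def K_prime(kappa):
    """K'(kappa) = K(kappa') = pi / (2 AGM(1, kappa))."""
    return math.pi / (2 * agm(1.0, kappa))

def rate(R):
    """omega^{-2} = exp(-2 pi K(kappa) / K'(kappa)) for kappa = 1/R."""
    kappa = 1.0 / R
    return math.exp(-2 * math.pi * K(kappa) / K_prime(kappa))

def terms_needed(eps, R):
    """Asymptotic estimate n ~ |log eps| / |log omega^{-2}|."""
    return abs(math.log(eps)) / abs(math.log(rate(R)))
```

For R = 100 the exact rate and the approximation $\exp(-\pi^2/\ln(4R))$ agree to several digits, and a tolerance of $10^{-8}$ corresponds to an estimate of roughly eleven terms.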
Exponential approximations in L2-norm B. Khoromskij, Leipzig 2005(L4) 101
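The best approximation to $f(\rho)$, $\rho \in [1, R]$, with respect to a
weighted $L^2$-norm is reduced to the minimisation of an
explicitly given differentiable functional.
Given $R > 1$, $N \ge 1$, find the $2N$ parameters
$\alpha_1, \omega_1, \dots, \alpha_N, \omega_N \in \mathbb{R}$ such that
\[ F_W(R; \alpha_1, \omega_1, \dots, \alpha_N, \omega_N) := \int_1^R W(x)\Big(f(x) - \sum_{i=1}^{N} \omega_i\,e^{-\alpha_i x}\Big)^2 dx = \min. \]
In the important particular case of $f(x) = 1/x$ and $W(x) = 1$,
the integral can be calculated in closed form:
\[
\begin{aligned}
F_1(R; \alpha_1, \omega_1, \dots, \alpha_N, \omega_N) ={}& 1 - \frac{1}{R} - 2\sum_{i=1}^{N} \omega_i\,\big[\mathrm{Ei}(-\alpha_i) - \mathrm{Ei}(-\alpha_i R)\big] \\
&+ \frac12 \sum_{i=1}^{N} \frac{\omega_i^2}{\alpha_i}\,\big[e^{-2\alpha_i} - e^{-2\alpha_i R}\big]
+ 2\sum_{1\le i<j\le N} \frac{\omega_i\omega_j}{\alpha_i + \alpha_j}\,\big[e^{-(\alpha_i+\alpha_j)} - e^{-(\alpha_i+\alpha_j)R}\big]
\end{aligned}
\]
with the integral exponential function $\mathrm{Ei}(x) = -\int_{-\infty}^{x} \frac{e^t}{t}\,dt$.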
Exponential approximations in L2-norm B. Khoromskij, Leipzig 2005(L4) 102
In the special case $R = \infty$, the expression for $F_1(\infty; \dots)$
simplifies further.
Gradient or Newton type methods with a proper choice of the
initial guess can be used to obtain the minimiser of $F_1$.
In general, the integral may be approximated by a suitable
quadrature.
Optimisation with respect to the maximum norm leads to the
nonlinear minimisation problem
\[ \inf_{v\in E_n^0} \|f - v\|_{L^\infty[1,R]} \]
involving the $2n$ parameters $\{\omega_\nu, t_\nu\}_{\nu=1}^{n}$. The numerical scheme
follows the Remez algorithm.
Exponential approximations in L2-norm B. Khoromskij, Leipzig 2005(L4) 103
The best approximation to $1/\sqrt{\rho}$ in the $L^\infty$-norm is discussed in D.
Braess and W. Hackbusch; a complete list of numerical data
can be found in www.mis.mpg.de/scicomp/EXP SUM/1 x/tabelle.
All calculations using the weighted $L^2([1, R])$-norm have been
performed by the MATLAB subroutine FMINS based on the
global minimisation by direct search.
Best approximation to $1/\sqrt{\rho}$ in weighted $L^2([1, R])$-norm.
R        10          50          100         200         ‖·‖_{L∞}    W(ρ) = 1/√ρ
r = 4    3.7·10^-4   9.6·10^-4   1.5·10^-3   2.2·10^-3   1.9·10^-3   4.8·10^-3
r = 5    2.8·10^-4   2.8·10^-4   3.7·10^-4   5.8·10^-4   4.2·10^-4   1.2·10^-3
r = 6    8.0·10^-5   9.8·10^-5   1.1·10^-4   1.6·10^-4   9.5·10^-5   3.3·10^-4
r = 7    3.5·10^-5   3.8·10^-5   3.9·10^-5   4.7·10^-5   2.2·10^-5   8.1·10^-5
Why using trigonometric sums B. Khoromskij, Leipzig 2005(L4) 104
Prop. 4.7. (Beylkin, Mohlenkamp). Let $d \ge 2$. The trigonometric
identity
\[ \sin\Big(\sum_{j=1}^{d} x_j\Big) = \sum_{j=1}^{d} \sin(x_j) \prod_{k\in\{1,\dots,d\}\setminus\{j\}} \frac{\sin(x_k + \alpha_k - \alpha_j)}{\sin(\alpha_k - \alpha_j)} \tag{48} \]
holds for all choices of $\alpha_k \in \mathbb{R}$ s.t. $\sin(\alpha_k - \alpha_j) \ne 0$ for all $j \ne k$.
In the case $d = 2$, the assertion (48) is easy to check. For
$d > 2$ it can be proven by induction (nontrivial exercise!).
Expansion (48) shows the lack of uniqueness (ambiguity) of
the best rank-$d$ Kronecker representation. Hence, the
convergence of algebraic separable approximations might be
non-robust.
Approximation by trigonometric sums can be designed either
using the quadrature method (cf. Lect. 7) and the direct
trigonometric interpolation or by nonlinear optimisation.
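The identity (48) and its non-uniqueness are easy to observe numerically: different admissible shifts $\alpha_k$ give different rank-$d$ separable sums with the same value. A minimal sketch, with a hypothetical function name:

```python
import math

def sin_of_sum_separated(x, alpha):
    """Right-hand side of (48): a sum of d products of univariate functions."""
    d = len(x)
    total = 0.0
    for j in range(d):
        term = math.sin(x[j])
        for k in range(d):
            if k != j:
                term *= math.sin(x[k] + alpha[k] - alpha[j]) / math.sin(alpha[k] - alpha[j])
        total += term
    return total
```

Evaluating the function with two different shift vectors `alpha` for the same `x` returns the same number, `math.sin(sum(x))`, up to rounding, although the individual terms of the two sums differ.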
Literature to Lect. 4 B. Khoromskij, Leipzig 2005(L4) 105
1. W. Hackbusch and B.N. Khoromskij: Hierarchical Kronecker Tensor-Product Approximation to a Class
of Nonlocal Operators in High Dimensions. Parts I/II. Preprints 29/30, MPI MIS, Leipzig 2005.
2. D. Braess and W. Hackbusch: Approximation of 1/x
by exponential sums in [1, ∞). To appear in IMA JNA.
3. G. Beylkin and M.J. Mohlenkamp: Numerical operator calculus in higher dimension.
Proc. Natl. Acad. Sci. USA, 99 (2002), 10246-10251.
4. B.N. Khoromskij: Data-sparse approximation of nonlocal operators. Lecture notes 17, MPI MIS,
Leipzig 2003.
http://personal-homepages.mis.mpg.de/bokh
http://www.mis.mpg.de/scicomp/Fulltext/Khoromskij/khor4.ps
Lect. 5. Data-Sparse Matrix/Tensor Formats. B. Khoromskij, Leipzig 2005 106
We focus on combination of hierarchical and tensor-product
formats:
(i) H-matrix format with standard admissibility (applies on
graded meshes)
(ii) coarsening of the hierarchical format using weaker
admissibility criteria;
(iii) blended H-matrix approximation (combine with Toeplitz,
circulant, Hankel);
(iv) wire-basket approximation for L-harmonic kernels;
(v) fully separated block representation (O(N) complexity);
(vi) uniform (U-) and H2-matrices;
(vii) Kronecker tensor-product format;
(viii) hierarchical Kronecker tensor-product representation.
Hierarchical matrices B. Khoromskij, Leipzig 2005(L5) 107
Hierarchical (H-) matrices
$\mathcal{M}_{\mathcal{H},k}(T_{I\times I}, \mathcal{P})$, the class of data-sparse H-matrices - Hackbusch ’99
Further developments - Hackbusch, BNK, Grasedyck, Bebendorf, Börm.
H-matrix technique is a direct descendant of panel clustering,
fast multipole and mosaic-skeleton approximation.
In addition, it allows data-sparse matrix-matrix operations.
Main features:
• Matrix arithmetic of $O(N \log^q N)$ complexity,
$N := |I|$ is the cardinality of the index set $I$.
• Accurate approximation to a general class of nonlocal
(integral) operators and operator-valued functions $F(L)$,
including the elliptic operator inverse $L^{-1}$, $e^{-tL}$, $\mathrm{sign}(L)$.
• Rigorous theoretical analysis.
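H-Matrix Format
H-matrix arithmetic is completely recursive and is based on
the hierarchical data organisation → efficient implementation.
The H-matrix format is well suited for representation of
integral (nonlocal) operators in FEM/BEM applications.
Thm. 5.1. (complexity of the H-matrix arithmetic)
Let $k \in \mathbb{N}$ denote the block-wise rank and $T_{I\times I}$ be an H-tree
with depth $L > 1$. Then the arithmetic of $N \times N$ matrices belonging to
$\mathcal{M}_{\mathcal{H},k}(T_{I\times I},\mathcal{P})$ has the complexity
\[ \mathcal{N}_{\mathcal{H},store} \le 2 C_{sp} k L N, \qquad \mathcal{N}_{\mathcal{H}\cdot v} \le 4 C_{sp} k L N, \]
\[ \mathcal{N}_{\mathcal{H}\oplus\mathcal{H}} \le C_{sp} k^2 N (C_1 L + C_2 k), \]
\[ \mathcal{N}_{\mathcal{H}\cdot\mathcal{H}} \le C_0 C_{sp}^2 k^2 L N \max\{k, L\}, \qquad \mathcal{N}_{\widetilde{\mathrm{Inv}}(\mathcal{H})} \le \mathcal{N}_{\mathcal{H}\cdot\mathcal{H}}, \]
where $C_{sp}$ is the sparsity constant.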
Hierarchical partitionings $\mathcal{P}_{1/2}(I \times I)$ and $\mathcal{P}_W(I \times I)$
Figure 11: Standard- (left) and weak-admissible (right) H-partitionings for d = 1.
General Kronecker-product format
Def. 5.1. A q-th order tensor is given by
\[ A := [a_{i_1 \dots i_q}] \in \mathbb{R}^{I^d}, \quad d = pq, \quad p, q, n \in \mathbb{N}, \]
where $I^d = I_1 \otimes \dots \otimes I_q$, $I_\ell = I_{\ell,1} \otimes \dots \otimes I_{\ell,p}$, with multi-indices
$i_\ell = (i_{\ell,1}, \dots, i_{\ell,p}) \in I_\ell$, $\ell = 1, \dots, q$, where $i_{\ell,m} \in \{1, \dots, n\}$ for
$m = 1, \dots, p$ ($p$ is supposed to be small).
The inner product of two tensors $A$ and $B$ is defined as
\[ (A, B) := \sum_{(i_1 \dots i_q) \in I^d} a_{i_1 \dots i_q}\, b_{i_1 \dots i_q}, \]
while the norm of $A$ is given by $\|A\|_F := \sqrt{(A, A)}$.
Ex. 5.1. Let $A = a_1 \otimes a_2$, $B = b_1 \otimes b_2$, $a_i, b_i \in \mathbb{R}^n$ ($q = 2$, $p = 1$). Then
\[ (A, B) = (a_1, b_1)(a_2, b_2), \qquad \|A\|_F = \sqrt{(a_1, a_1)(a_2, a_2)}, \]
where the latter corresponds to the Frobenius norm.
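The factorisation of the inner product for rank-one tensors can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the random vectors and all variable names are made up for the illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
a1, a2, b1, b2 = (rng.standard_normal(n) for _ in range(4))

# Rank-one (decomposed) second-order tensors A = a1 (x) a2, B = b1 (x) b2.
A = np.multiply.outer(a1, a2)
B = np.multiply.outer(b1, b2)

# Entry-wise inner product versus the factorised form (a1,b1)(a2,b2).
inner_full = np.sum(A * B)
inner_fact = np.dot(a1, b1) * np.dot(a2, b2)

# Frobenius norm versus sqrt((a1,a1)(a2,a2)).
norm_full = np.linalg.norm(A)  # Frobenius norm for 2-D arrays
norm_fact = np.sqrt(np.dot(a1, a1) * np.dot(a2, a2))
```

The factorised expressions only touch vectors of length n, while the full expressions touch all n^2 entries.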
Tensor $A$ of the form
\[ A = V^1 \otimes \dots \otimes V^q, \quad V^\ell \in \mathbb{R}^{n^p}, \]
is called the Kronecker product or decomposed tensor.
Probl. 1. Approximate $A$ by a q-th order tensor $A_r$, a sum
of Kronecker products (with possibly small Kronecker rank $r$),
\[ A_r = \sum_{k=1}^{r} c_k V_k^1 \otimes \dots \otimes V_k^q \approx A, \quad c_k \in \mathbb{R}, \tag{49} \]
where the low dimensional components $V_k^\ell \in \mathbb{R}^{n^p}$ can be
further represented in a structured data-sparse form (say, in
the wavelet based, circulant or Toeplitz format).
Hence, $A_r$ can be represented with the low cost $q r n^p$ (at
most) compared with $n^{pq}$.
Tensor-product format (49) has plenty of other merits.
Excursus to the HKT-approximation of matrices
Probl. 2. Given $A \in \mathbb{C}^{N\times N}$ with $N = n^d$ (here $q = d$, $p = 2$), we
approximate $A$ by a matrix $A_r$ of the so-called HKT-format
\[ A_r = \sum_{k=1}^{r} s_k V_k^1 \otimes \dots \otimes V_k^d \approx A, \quad s_k \in \mathbb{R}, \quad V_k^\ell \in \mathbb{R}^{n\times n}, \tag{50} \]
where $V_k^\ell \in \mathcal{M}_{\mathcal{H},k}$ (alternative: wavelet representation of $V_k^\ell$).
Given a tolerance $\varepsilon > 0$, the Kronecker rank $r = r(\varepsilon)$ can be
estimated by
\[ r = \begin{cases} O(|\log h|^{d-1} |\log \varepsilon|^{d-1}) & \text{(Case a)}, \\ O(|\log h| \cdot |\log \varepsilon|) & \text{(Case b)}. \end{cases} \]
Case a. IOs with asymptotically smooth/analytic kernels $g(x, y)$.
Case b. A class of analytic matrix-valued functions $\mathcal{F}(A)$; IOs with “off diagonal analytic” translation-invariant kernels.
HKT-approximation of matrices
Case (a): Analytic approximation methods are based on a
separable representation of a certain multi-variate function
\[ F : \mathbb{R}^d \to \mathbb{R}, \quad d \ge 2 \]
(say, a holomorphic function with isolated singularities):
\[ F_r(\zeta_1, \dots, \zeta_d) := \sum_{i=1}^{r} s_i \Phi_i^1(\zeta_1) \cdots \Phi_i^d(\zeta_d) \approx F, \tag{51} \]
where $\Phi_i^\ell(\zeta)$ is fixed or chosen adaptively.
Case (b): Making use of the r-term Sinc-quadratures for the
Laplace integral representation of $\mathcal{F}(A)$ or $F(r)$:
\[ F(r) = \int_{\mathbb{R}} f(t) e^{-tr}\,dt, \qquad F(r) = \int_{\mathbb{R}} f(t) e^{-tr^2}\,dt, \]
with possible substitution $A \to r$.
Related references
H-, KT-, HKT- constructive approximations:
H-Matrix techniques - Group by Hackbusch at MPI MIS, Leipzig
Sinc interpolation/quadratures for analytic funct. with point singularities -
(Kotelnikov ’33; Whittaker ’35; Shannon ’49) Stenger; M. Sugihara; Hackbusch, BNK ’02-’05
Appr. by exponential sums (classical rational approximations, Remes
algorithm, minimization) - Braess, Hackbusch, BNK ’04-’05
IOs in the HKT format - Hackbusch, BNK, Tyrtyshnikov ’03; BNK ’05
HKT approx. to matrix-valued functions - Gavrilyuk, Hackbusch, BNK ’03;
Hackbusch, BNK ’04-’05
Kronecker tensor-product representation - Van Loan, Pitsianis ’93; Golub ’98;
Beylkin, Mohlenkamp ’02; Hackbusch, BNK, Tyrtyshnikov ’03; Grasedyck ’03; ...
Tensor-product + wavelets + sparse grids:
H-matrices/wavelets in density matrix calculation - Flad, Hackbusch, Kolb, Luo,
Schneider ’03-’05; Hutter, Sauter, ...
Applications in FEM/BEM, quantum chemistry, finance, data mining -
Groups by W. Dahmen, M. Griebel, R. Schneider, C. Schwab, H. Yserentant
Properties of the Kronecker product
The Kronecker product (KP) operation $A \otimes B$ of two matrices
$A = [a_{ij}] \in \mathbb{R}^{m\times n}$, $B \in \mathbb{R}^{h\times g}$ is an $mh \times ng$ matrix that has the
block-representation $[a_{ij} B]$ (corresponds to $p = 2$).
1. Let $C \in \mathbb{R}^{s\times t}$, then the KP satisfies the associative law,
\[ (A \otimes B) \otimes C = A \otimes (B \otimes C), \]
and therefore we do not use brackets in (50). The matrix
$A \otimes B \otimes C := (A \otimes B) \otimes C$ has $mhs$ rows and $ngt$ columns.
2. Let $C \in \mathbb{R}^{n\times r}$ and $D \in \mathbb{R}^{g\times s}$, then the standard
matrix-matrix product in the Kronecker format takes the form
\[ (A \otimes B)(C \otimes D) = (AC) \otimes (BD). \]
The corresponding extension to q-th order tensors is
\[ (A_1 \otimes \dots \otimes A_q)(B_1 \otimes \dots \otimes B_q) = (A_1 B_1) \otimes \dots \otimes (A_q B_q). \]
In the case p > 2 we have similar KP operations.
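The associative law and the product rule are easy to confirm on small random matrices. This is a sketch only, assuming NumPy and its `np.kron`; the matrix sizes are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((2, 3))  # m x n
B = rng.standard_normal((4, 2))  # h x g
C = rng.standard_normal((3, 5))  # n x r
D = rng.standard_normal((2, 3))  # g x s

# Product rule: (A (x) B)(C (x) D) = (AC) (x) (BD).
lhs = np.kron(A, B) @ np.kron(C, D)
rhs = np.kron(A @ C, B @ D)
mixed_product_ok = np.allclose(lhs, rhs)

# Associativity with a third factor E.
E = rng.standard_normal((2, 2))
assoc_ok = np.allclose(np.kron(np.kron(A, B), E), np.kron(A, np.kron(B, E)))

# A (x) B has m*h rows and n*g columns.
shape = np.kron(A, B).shape
```

The right-hand side of the product rule multiplies only the small factors, which is the reason the Kronecker format is cheap to work with.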
3. We have the distributive law
\[ (A + B) \otimes (C + D) = A \otimes C + A \otimes D + B \otimes C + B \otimes D. \]
4. Rank relation: $\operatorname{rank}(A \otimes B) = \operatorname{rank}(A)\operatorname{rank}(B)$.
Ex. 5.1. In general $A \otimes B \neq B \otimes A$. What is the condition on
$A$ and $B$ that provides $A \otimes B = B \otimes A$?
Invariance of some matrix properties:
(1) If $A$ and $B$ are diagonal then $A \otimes B$ is also diagonal, and
conversely (if $A \otimes B \neq 0$).
(2) The upper/lower triangular matrices are preserved.
(3) Let $A$ and $B$ be Hermitian/normal matrices ($A^* = A$ resp.
$A^* A = A A^*$). Then $A \otimes B$ is of the corresponding type.
(4) Let $A \in \mathbb{R}^{n\times n}$ and $B \in \mathbb{R}^{m\times m}$. Then
\[ \det(A \otimes B) = (\det A)^m (\det B)^n. \]
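The rank relation and the determinant formula can be verified on a small example. For $A \in \mathbb{R}^{n\times n}$ and $B \in \mathbb{R}^{m\times m}$ the exponents are swapped relative to the matrix sizes, $\det(A \otimes B) = (\det A)^m (\det B)^n$. A minimal sketch, assuming NumPy; the test matrices are random and purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 2
A = rng.standard_normal((n, n))
B = rng.standard_normal((m, m))

# det(A (x) B) = det(A)^m * det(B)^n for A of size n x n and B of size m x m.
det_kron = np.linalg.det(np.kron(A, B))
det_formula = np.linalg.det(A) ** m * np.linalg.det(B) ** n

# rank(R1 (x) R2) = rank(R1) * rank(R2), here 1 * 2.
R1 = np.outer(rng.standard_normal(4), rng.standard_normal(4))    # rank 1
R2 = rng.standard_normal((3, 2)) @ rng.standard_normal((2, 3))   # rank 2
rank_kron = np.linalg.matrix_rank(np.kron(R1, R2))
```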
Kronecker product: matrix operations
Thm. 5.2. Let $A \in \mathbb{R}^{n\times n}$ and $B \in \mathbb{R}^{m\times m}$ be invertible
matrices. Then
\[ (A \otimes B)^{-1} = A^{-1} \otimes B^{-1}. \]
Proof. Since $\det(A) \neq 0$ and $\det(B) \neq 0$, the above property
(4) gives $\det(A \otimes B) \neq 0$. Thus $(A \otimes B)^{-1}$ exists and
\[ (A^{-1} \otimes B^{-1})(A \otimes B) = (A^{-1}A) \otimes (B^{-1}B) = I_{nm}. \]
Lem. 5.2. Let $A \in \mathbb{R}^{n\times n}$ and $B \in \mathbb{R}^{m\times m}$ be unitary matrices.
Then $A \otimes B$ is a unitary matrix.
Proof. Since $A^* = A^{-1}$, $B^* = B^{-1}$ we have
\[ (A \otimes B)^* = A^* \otimes B^* = A^{-1} \otimes B^{-1} = (A \otimes B)^{-1}. \]
Define the commutator $[A, B] := AB - BA$.
Lem. 5.3. Let $A \in \mathbb{R}^{n\times n}$ and $B \in \mathbb{R}^{m\times m}$. Then
\[ [A \otimes I_m, I_n \otimes B] = 0 \in \mathbb{R}^{nm\times nm}. \]
Proof.
\[ [A \otimes I_m, I_n \otimes B] = (A \otimes I_m)(I_n \otimes B) - (I_n \otimes B)(A \otimes I_m) = A \otimes B - A \otimes B = 0. \]
Rem. 5.1. Let $A, B \in \mathbb{R}^{n\times n}$, $C, D \in \mathbb{R}^{m\times m}$ and $[A, B] = 0$, $[C, D] = 0$. Then
\[ [A \otimes C, B \otimes D] = 0. \]
Proof. Apply the identity $(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$.
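Lem. 5.4. Let $A \in \mathbb{R}^{n\times n}$ and $B \in \mathbb{R}^{m\times m}$. Then
\[ \operatorname{tr}(A \otimes B) = \operatorname{tr}(A)\operatorname{tr}(B). \]
Proof. Since $\operatorname{diag}(a_{ii}B) = a_{ii}\operatorname{diag}(B)$, we have
\[ \operatorname{tr}(A \otimes B) = \sum_{i=1}^{n} \sum_{j=1}^{m} a_{ii} b_{jj} = \sum_{i=1}^{n} a_{ii} \sum_{j=1}^{m} b_{jj}. \]
Thm. 5.3. Let $A, B, I \in \mathbb{R}^{n\times n}$. Then
\[ \exp(A \otimes I + I \otimes B) = (\exp A) \otimes (\exp B). \]
Proof. Since $[A \otimes I, I \otimes B] = 0$, we have
\[ \exp(A \otimes I + I \otimes B) = \exp(A \otimes I) \exp(I \otimes B). \]
Furthermore, since
\[ \exp(A \otimes I) = \sum_{k=0}^{\infty} \frac{(A \otimes I)^k}{k!}, \qquad \exp(I \otimes B) = \sum_{m=0}^{\infty} \frac{(I \otimes B)^m}{m!}, \]
an arbitrary term in $\exp(A \otimes I) \exp(I \otimes B)$ is given by
\[ \frac{1}{k!} \frac{1}{m!} (A \otimes I)^k (I \otimes B)^m. \]
Using
\[ (A \otimes I)^k (I \otimes B)^m = (A^k \otimes I^k)(I^m \otimes B^m) = (A^k \otimes I)(I \otimes B^m) \equiv A^k \otimes B^m, \]
we finally arrive at
\[ \frac{1}{k!} \frac{1}{m!} (A \otimes I)^k (I \otimes B)^m = \Big(\frac{1}{k!} A^k\Big) \otimes \Big(\frac{1}{m!} B^m\Big), \]
and summation over $k$ and $m$ gives $(\exp A) \otimes (\exp B)$.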
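The identity $\exp(A \otimes I + I \otimes B) = (\exp A) \otimes (\exp B)$ can be checked numerically. The sketch below assumes NumPy and, to stay within NumPy, restricts itself to symmetric matrices so that the exponential can be formed from an eigendecomposition; the helper `expm_sym` is a made-up name for this illustration, not a library routine.

```python
import numpy as np

def expm_sym(S):
    """Matrix exponential of a symmetric matrix via its eigendecomposition."""
    lam, Q = np.linalg.eigh(S)
    return (Q * np.exp(lam)) @ Q.T  # Q diag(exp(lam)) Q^T

rng = np.random.default_rng(3)
n = 4
A = rng.standard_normal((n, n)); A = (A + A.T) / 2  # symmetrise
B = rng.standard_normal((n, n)); B = (B + B.T) / 2
I = np.eye(n)

kron_sum = np.kron(A, I) + np.kron(I, B)  # symmetric, size n^2 x n^2
lhs = expm_sym(kron_sum)                  # exponential of the large matrix
rhs = np.kron(expm_sym(A), expm_sym(B))   # Kronecker product of small exponentials
exp_identity_ok = np.allclose(lhs, rhs)
```

The right-hand side needs two exponentials of n x n matrices instead of one exponential of an n^2 x n^2 matrix.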
Thm. 5.3 can be extended to the case of a many-term sum
\[ \exp(A_1 \otimes I \otimes \dots \otimes I + I \otimes A_2 \otimes \dots \otimes I + \dots + I \otimes \dots \otimes I \otimes A_q) = e^{A_1} \otimes \dots \otimes e^{A_q}. \]
Rem. 5.2. Similar properties can be shown for other analytic
functions, e.g.,
\[ \sin(I_n \otimes A) = I_n \otimes \sin(A), \]
\[ \sin(A \otimes I_m + I_n \otimes B) = \sin(A) \otimes \cos(B) + \cos(A) \otimes \sin(B), \]
\[ \sin(A \otimes I_m + I_n \otimes B) = \frac{\sin(A) \otimes \sin(B + (b - a)I)}{\sin(b - a)} + \frac{\sin(A + (a - b)I) \otimes \sin(B)}{\sin(a - b)} \]
for all values $a, b$ such that $\sin(a - b) \neq 0$. Analogously for the
function cos.
Other simple properties:
\[ (A \otimes B)^T = A^T \otimes B^T, \qquad (A \otimes B)^* = A^* \otimes B^*. \]
Eigenvalue problem
Thm. 5.4. Let $A \in \mathbb{R}^{m\times m}$ and $B \in \mathbb{R}^{n\times n}$ have the eigen-data
$\lambda_1, \dots, \lambda_m$, $u_1, \dots, u_m$, and $\mu_1, \dots, \mu_n$, $v_1, \dots, v_n$, respectively. Then
$A \otimes B$ has the eigenvalues $\lambda_j \mu_k$ with the corresponding
eigenvectors $u_j \otimes v_k$, $1 \le j \le m$, $1 \le k \le n$.
Thm. 5.5. Under the conditions of Thm. 5.4 the
eigenvalues/eigenvectors of $A \otimes I_n + I_m \otimes B$ are given by
$\lambda_j + \mu_k$ and $u_j \otimes v_k$, respectively.
Proof. Due to Thm. 5.4 we have
\[ (A \otimes I_n + I_m \otimes B)(u_j \otimes v_k) = (A \otimes I_n)(u_j \otimes v_k) + (I_m \otimes B)(u_j \otimes v_k) \]
\[ = (A u_j) \otimes (I_n v_k) + (I_m u_j) \otimes (B v_k) \]
\[ = (\lambda_j u_j) \otimes v_k + u_j \otimes (\mu_k v_k) \]
\[ = (\lambda_j + \mu_k)(u_j \otimes v_k). \]
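A short numerical check of Thm. 5.5, given as a sketch under the assumption that NumPy is available. Symmetric matrices are used here only so that the eigen-data are real; that restriction is a choice made for the example and is not required by the theorem.

```python
import numpy as np

rng = np.random.default_rng(4)
m, n = 3, 4
A = rng.standard_normal((m, m)); A = (A + A.T) / 2  # symmetric: real eigen-data
B = rng.standard_normal((n, n)); B = (B + B.T) / 2

lam, U = np.linalg.eigh(A)
mu, V = np.linalg.eigh(B)

K = np.kron(A, np.eye(n)) + np.kron(np.eye(m), B)

# Check the eigenpair (lam_j + mu_k, u_j (x) v_k) for one choice of j, k.
j, k = 1, 2
w = np.kron(U[:, j], V[:, k])
residual = np.linalg.norm(K @ w - (lam[j] + mu[k]) * w)

# The whole spectrum of K consists of the pairwise sums lam_j + mu_k.
sums = np.sort(np.add.outer(lam, mu).ravel())
spectrum_ok = np.allclose(np.sort(np.linalg.eigvalsh(K)), sums)
```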
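Lyapunov/Sylvester equations
For a matrix $A \in \mathbb{R}^{m\times n}$ we use the vector representation
$A \to \operatorname{vec}(A) \in \mathbb{R}^{mn}$, where $\operatorname{vec}(A)$ is an $mn \times 1$ vector obtained
by “stacking” $A$’s columns (the FORTRAN-style ordering)
\[ \operatorname{vec}(A) := [a_{11}, \dots, a_{m1}, a_{12}, \dots, a_{mn}]^T. \]
In this way, $\operatorname{vec}(A)$ is a rearranged version of $A$. For example,
we have the relation
\[ \operatorname{vec}(AYB) = (B^T \otimes A)\operatorname{vec}(Y). \]
The matrix Sylvester equation for $X \in \mathbb{R}^{m\times n}$
\[ AX + XB^T = G \in \mathbb{R}^{m\times n} \tag{52} \]
with $A \in \mathbb{R}^{m\times m}$, $B \in \mathbb{R}^{n\times n}$, can be written in vector form
\[ (I_n \otimes A + B \otimes I_m)\operatorname{vec}(X) = \operatorname{vec}(G). \]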
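The vector form of the Sylvester equation $AX + XB^T = G$ can be tried out directly on a small problem. This is a sketch assuming NumPy; the dense solve of the $mn \times mn$ system is for illustration only and is not the structured solution method discussed in these notes. The symmetric positive definite test matrices are an assumption that guarantees $\lambda_j(A) + \mu_k(B) > 0$.

```python
import numpy as np

rng = np.random.default_rng(5)
m, n = 4, 3
MA = rng.standard_normal((m, m)); A = MA @ MA.T + np.eye(m)  # SPD, eigenvalues >= 1
MB = rng.standard_normal((n, n)); B = MB @ MB.T + np.eye(n)  # SPD, eigenvalues >= 1
G = rng.standard_normal((m, n))

def vec(M):
    """Stack the columns of M (Fortran-style ordering)."""
    return M.reshape(-1, order="F")

# (I_n (x) A + B (x) I_m) vec(X) = vec(G)
K = np.kron(np.eye(n), A) + np.kron(B, np.eye(m))
x = np.linalg.solve(K, vec(G))
X = x.reshape((m, n), order="F")

residual = np.linalg.norm(A @ X + X @ B.T - G)

# The underlying relation vec(A Y B) = (B^T (x) A) vec(Y), checked with Y = X.
vec_ok = np.allclose(vec(A @ X @ B), np.kron(B.T, A) @ vec(X))
```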
Now the solvability conditions and certain solution methods
can be derived (cf. the results for eigenvalue problems).
Equation (52) is uniquely solvable if
\[ \lambda_j(A) + \mu_k(B) \neq 0 \quad \text{for all } j, k. \]
Moreover, since $I_n \otimes A$ and $B \otimes I_m$ commute, we can apply all
methods proposed below to represent the inverse
\[ (I_n \otimes A + B \otimes I_m)^{-1}. \]
In particular, if $A$ and $B$ correspond to the discrete elliptic
operators in $\mathbb{R}^d$ with separable coefficients, we obtain the
low-rank tensor-product decomposition to the Sylvester
solution operator (cf. Lect. 7).
In the case $A = B$ we arrive at the Lyapunov equation.
Kronecker Hadamard product
Def. 5.2. Define the Hadamard product
\[ C = A \odot B = \{c_{i_1 \dots i_q}\}_{(i_1 \dots i_q) \in I^d} \]
of two tensors $A, B \in \mathbb{R}^{I^d}$ by the entry-wise multiplication
\[ c_{i_1 \dots i_q} = a_{i_1 \dots i_q} \cdot b_{i_1 \dots i_q}. \]
The following lemma indicates the simple (but important)
property of the Hadamard product.
Lem. 5.5. Let both $A$ and $B$ be represented in the form
(49) with the Kronecker rank $r_A$, $r_B$ and with $V_k^\ell$ substituted
by $A_k^\ell \in \mathbb{R}^{I_\ell}$ and $B_k^\ell \in \mathbb{R}^{I_\ell}$, respectively. Then $A \odot B$ is a tensor
with the Kronecker rank $r = r_A r_B$ given by
\[ A \odot B = \sum_{k=1}^{r_A} \sum_{m=1}^{r_B} c_k c_m (A_k^1 \odot B_m^1) \otimes \dots \otimes (A_k^q \odot B_m^q). \]
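Proof. It is easy to check that
\[ (A_1 \otimes B_1) \odot (A_2 \otimes B_2) = (A_1 \odot A_2) \otimes (B_1 \odot B_2), \]
and similarly for q-term products. Applying the above relations,
we obtain
\[ A \odot B = \Big( \sum_{k=1}^{r_A} c_k \bigotimes_{\ell=1}^{q} A_k^\ell \Big) \odot \Big( \sum_{m=1}^{r_B} c_m \bigotimes_{\ell=1}^{q} B_m^\ell \Big) = \sum_{k=1}^{r_A} \sum_{m=1}^{r_B} c_k c_m \Big( \bigotimes_{\ell=1}^{q} A_k^\ell \Big) \odot \Big( \bigotimes_{\ell=1}^{q} B_m^\ell \Big), \]
and the assertion follows.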
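The statement of Lem. 5.5 can be illustrated for q = 2 with matrix factors. The sketch below assumes NumPy; the ranks, sizes and weights are arbitrary example values. The Hadamard product of two Kronecker sums is assembled from entry-wise products of the small factors and compared with the entry-wise product of the full matrices.

```python
import numpy as np

rng = np.random.default_rng(6)
n, rA, rB = 4, 2, 3

# Weights and Kronecker factors (q = 2, each factor an n x n matrix).
cA = rng.standard_normal(rA)
cB = rng.standard_normal(rB)
A1 = rng.standard_normal((rA, n, n)); A2 = rng.standard_normal((rA, n, n))
B1 = rng.standard_normal((rB, n, n)); B2 = rng.standard_normal((rB, n, n))

# Full matrices, formed here only for comparison.
A = sum(cA[k] * np.kron(A1[k], A2[k]) for k in range(rA))
B = sum(cB[m] * np.kron(B1[m], B2[m]) for m in range(rB))

# Hadamard product assembled factor by factor: rA * rB Kronecker terms.
H = sum(cA[k] * cB[m] * np.kron(A1[k] * B1[m], A2[k] * B2[m])
        for k in range(rA) for m in range(rB))

hadamard_ok = np.allclose(A * B, H)
```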
Given tensors $U \otimes Y \in \mathbb{R}^{I\times J}$ with $U \in \mathbb{R}^{I}$, $Y \in \mathbb{R}^{J}$, and
$B \in \mathbb{R}^{I\times L}$. Let $T : \mathbb{R}^{L} \to \mathbb{R}^{J}$ be the linear operator (tensor)
that maps tensors defined on the index set $L$ into those
defined on $J$.
Def. 5.3. The Hadamard “scalar” product $[D, C]_I \in \mathbb{R}^{K}$ of
two tensors $D := [D_{i,k}] \in \mathbb{R}^{I\times K}$ and $C := [C_{i,k}] \in \mathbb{R}^{I\times K}$ with
$K \in \{I, J, L\}$ is defined by
\[ [D, C]_I := \sum_{i \in I} [D_{i,K}] \odot [C_{i,K}], \]
where $\odot$ denotes the Hadamard product on the index set $K$ and $[D_{i,K}] := [D_{i,k}]_{k \in K}$.
Lem. 5.6. Let $U$, $Y$, $B$ and $T$ be given as above. Then, with
$K = J$, the following identity is valid
\[ [U \otimes Y, T \cdot B]_I = Y \odot (T \cdot [U, B]_I) \in \mathbb{R}^{J}. \tag{53} \]
Proof. By definition of the Hadamard scalar product we have
\[ [U \otimes Y, T \cdot B]_I = \sum_{i \in I} [U \otimes Y]_{i,J} \odot [T \cdot B]_{i,J} \]
\[ = \sum_{i \in I} ([U]_i \, Y) \odot [T \cdot B]_{i,J} \]
\[ = Y \odot \Big( \sum_{i \in I} [U]_i [T \cdot B]_{i,J} \Big) \]
\[ = Y \odot \Big( T \cdot \sum_{i \in I} [U]_i [B]_{i,L} \Big), \]
then the assertion follows.
Identity (53) is of great importance in the forthcoming
applications since on the right-hand side the operator $T$ is
removed from the scalar product and so it is applied only once.
Complexity of the HKT-matrix arithmetics
Complexity issues
Let $V_k^\ell \in \mathcal{M}_{\mathcal{H},k}(T_{I\times I},\mathcal{P})$ in (50) and let $N = n^d$.
• Data compression.
The storage for $A$ is only $O(rdn) = O(rdN^{1/d})$ with
$r = O(\log^\alpha N)$, $\alpha > 0$. Hence, we enjoy the sub-linear complexity.
• Matrix-by-vector complexity of $Ax$, $x \in \mathbb{C}^N$.
For general $x$ one has the linear cost $O(rdkN \log n)$.
If $x = x_1 \otimes \dots \otimes x_d$, $x_i \in \mathbb{C}^n$, we again arrive at sub-linear
complexity $O(rdkn \log n) = O(rdkN^{1/d} \log n)$.
• Matrix-by-matrix complexity of $AB$ and $A \odot B$.
The H-matrix structure of the Kronecker factors leads to
$O(r^2 dn \log^q n) = O(r^2 dN^{1/d} \log^q n)$ operations instead of $O(N^3)$.
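The matrix-by-vector bullet can be made concrete for d = 2. The sketch below assumes NumPy and uses dense Kronecker factors, so it shows only the tensor-product part of the saving, not the additional H-matrix compression of each factor; the function name `kron_matvec` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(7)
n, r = 6, 3
V1 = rng.standard_normal((r, n, n))  # first Kronecker factors V_k^1
V2 = rng.standard_normal((r, n, n))  # second Kronecker factors V_k^2

def kron_matvec(V1, V2, x):
    """y = (sum_k V1[k] (x) V2[k]) x without forming the n^2 x n^2 matrix."""
    size = V1.shape[1]
    X = x.reshape(size, size)  # row-major layout matches np.kron
    Y = sum(V1[k] @ X @ V2[k].T for k in range(len(V1)))
    return Y.reshape(-1)

# General vector x: the cost is linear in N = n^2 up to the factor sizes.
x = rng.standard_normal(n * n)
A_full = sum(np.kron(V1[k], V2[k]) for k in range(r))  # only for checking
general_ok = np.allclose(A_full @ x, kron_matvec(V1, V2, x))

# Rank-one input x = x1 (x) x2: every term stays in Kronecker form,
# so only vectors of length n are ever touched.
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
terms = [(V1[k] @ x1, V2[k] @ x2) for k in range(r)]
y_struct = sum(np.kron(u, v) for u, v in terms)  # expanded only for checking
structured_ok = np.allclose(A_full @ np.kron(x1, x2), y_struct)
```

In the structured case the result is kept as the list `terms` of r factor pairs, which is the source of the sub-linear cost.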
How to construct a Kronecker product?
1. Singular-value decomposition (SVD) and ACA methods in
the case of two-fold decompositions (q = 2).
2. Analytic approximation to the function-generated q-th
order tensors (q ≥ 2), (see Lect. 6).
Def. 5.4. Given the multi-variate function
\[ g : \mathbb{R}^d \to \mathbb{R} \quad \text{with } d = qp, \; p, q \in \mathbb{N}, \; q \ge 2, \]
defined in a hypercube
\[ \Omega = \{ (\zeta^1, \dots, \zeta^q) \in \mathbb{R}^d : \|\zeta^\ell\|_\infty \le L, \; \ell = 1, \dots, q \} \subset \mathbb{R}^d, \quad L > 0, \]
where $\| \cdot \|_\infty$ means the ∞-norm of $\zeta^\ell \in \mathbb{R}^p$. On the index set $I^d$, we
introduce the function-generated q-th order tensor
\[ A \equiv A(g) := [a_{i_1 \dots i_q}] \in \mathbb{R}^{I^d} \quad \text{with } a_{i_1 \dots i_q} := g(\zeta^1_{i_1}, \dots, \zeta^q_{i_q}). \tag{54} \]
3. Algebraic recompression methods: iterated SVD/ACA,
iterated rank-r approximation to high order tensors (in
general, convergence theory is still an open question).
The incremental rank-one approximation algorithm:
(a) Fit the original tensor $A$ by a rank-one tensor $A_1$;
(b) Subtract $A_1$ from the original tensor $A$;
(c) Approximate the residue $A - A_1$ with another rank-one tensor.
On each step of the algorithm one solves the minimisation
problem: Find $V^\ell \in \mathbb{R}^{n^p}$, $\ell = 1, \dots, q$, such that
\[ \tfrac{1}{2} \| A - V^1 \otimes \dots \otimes V^q \|_F^2 \to \min. \]
It can be solved by the generalised Rayleigh-Newton iteration.
Def. 5.5. We say that a tensor $A$ is orthogonally decomposable if it can
be written as the sum (49) of $r$ rank-one tensors s.t. for $\ell = 1, \dots, q$,
\[ (V_k^\ell, V_{k'}^\ell) = 0 \quad \text{for } k \neq k', \; (k, k' = 1, \dots, r). \]
Thm. 5.6. (Zhang, Golub) If a tensor of order q ≥ 3 is
orthogonally decomposable, then this decomposition is
unique, and the incremental rank-one approximation
algorithm correctly computes it.
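Steps (a)-(c) can be sketched for a third-order, orthogonally decomposable tensor. The code assumes NumPy and replaces the generalised Rayleigh-Newton iteration named above by a simpler alternating (power-type) update started from singular vectors of the unfoldings; this substitution, the function name `rank_one_fit` and the test tensor are assumptions made for the illustration.

```python
import numpy as np

def rank_one_fit(T, sweeps=10):
    """Rank-one fit s * u (x) v (x) w of a third-order tensor T (alternating updates)."""
    n1, n2, n3 = T.shape
    # Start from leading left singular vectors of the mode-2 and mode-3 unfoldings.
    v = np.linalg.svd(np.moveaxis(T, 1, 0).reshape(n2, -1))[0][:, 0]
    w = np.linalg.svd(np.moveaxis(T, 2, 0).reshape(n3, -1))[0][:, 0]
    for _ in range(sweeps):
        u = np.einsum('ijk,j,k->i', T, v, w); u /= np.linalg.norm(u)
        v = np.einsum('ijk,i,k->j', T, u, w); v /= np.linalg.norm(v)
        w = np.einsum('ijk,i,j->k', T, u, v); w /= np.linalg.norm(w)
    s = np.einsum('ijk,i,j,k->', T, u, v, w)
    return s, u, v, w

rng = np.random.default_rng(8)
n, r = 5, 3
weights = np.array([3.0, 2.0, 1.0])
# Orthonormal factor vectors in each direction (columns of QR factors).
U, V, W = (np.linalg.qr(rng.standard_normal((n, r)))[0] for _ in range(3))
A = np.einsum('k,ik,jk,lk->ijl', weights, U, V, W)

residual, found = A.copy(), []
for _ in range(r):
    s, u, v, w = rank_one_fit(residual)                         # steps (a), (c)
    residual = residual - s * np.einsum('i,j,k->ijk', u, v, w)  # step (b)
    found.append(s)
```

For this orthogonally decomposable example the loop peels off the terms in order of decreasing weight; for general tensors no such behaviour is guaranteed.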
Some heuristic algorithms
Given a q-th order tensor $A$ having the Kronecker rank $m$, one
can try to find the best approximation of $A$ by a tensor of
rank $r < m$. This can be reduced to solving the minimisation
problem: Find $V_k^\ell \in \mathbb{R}^{n^p}$ s.t.
\[ \tfrac{1}{2} \Big\| A - \sum_{k=1}^{r} V_k^1 \otimes \dots \otimes V_k^q \Big\|_F^2 \to \min. \]
It can be realized by using, say, the Newton iteration applied
to the corresponding Lagrange equation. Under certain
simplifications, the constrained minimisation algorithm can be
implemented in $O(m^2 n^p + (rmq)^3)$ operations.
There is little convergence theory behind this algorithm;
moreover, the solution is not unique (cf. Prop. 4.7).
However, in most practically interesting cases this algorithm
does the job.
Some conclusions
Summarise:
Basic linear algebra can be performed using one-dimensional
operations, thus avoiding the exponential scaling in the
dimension d.
Bottleneck:
Lack of tractable algebraic methods for the robust multi-fold
Kronecker decomposition of high order tensors (for d ≥ 3) as
well as for the HKT-recompression in matrix operations.
However, there are quite satisfactory heuristic algorithms.
Observation:
Analytic approximation methods are of principal importance.
Classical example: an approximation by Gaussian sums.
Recent proposals: Sinc meth., approximation by exponential
sums, wavelet recompression, “approximate approximation”.
Literature to Lecture 5
1. W. Hackbusch and B.N. Khoromskij: Hierarchical Kronecker Tensor-Product Approximation to a Class
of Nonlocal Operators in High Dimensions. Parts I/II. Preprints 29/30, MPI MIS, Leipzig 2005.
2. W. Hackbusch, B.N. Khoromskij and E. Tyrtyshnikov: Hierarchical Kronecker tensor-product approximation.
Preprint 35, MPI MIS, Leipzig 2003 (JNA, to appear).
3. B.N. Khoromskij: Structured data-sparse approximation to high order tensors arising from the deterministic
Boltzmann equation. Preprint 4, MPI MIS, Leipzig 2005.
4. C. Van Loan: The ubiquitous Kronecker product. J. of Comp. and Applied Math. 123 (2000) 85-100.
5. T. Zhang and G.H. Golub: Rank-one approximation to high order tensors. SIAM J. Matrix Anal. Appl.
23 (2001), 534-550.
URL: http://personal-homepages.mis.mpg.de/bokh
http://www.mis.mpg.de/scicomp/Fulltext/Khoromskij/khor5.ps
Everything is more simple than one thinks
but at the same time more complex
than one can understand.
J. W. von Goethe (1749-1832)
An Introduction to Structured Tensor-Product
Representation of Discrete Nonlocal Operators
Part II: Approximation of Operators and Related Matrices
Boris N. Khoromskij
University of Leipzig/MPI MIS, summer 2005
http://personal-homepages.mis.mpg.de/bokh
http://www.mis.mpg.de/scicomp/Fulltext/Khoromskij
Lect. 6. HKT representation to integral operators
In this lecture, we collect some known algebraic properties of
q-th order (q > 2) decomposed tensors, especially, in
comparison with the case q = 2, and then discuss the analytic
approximation methods:
(i) properties of multi-way decompositions,
(ii) separation methods for function-generated tensors,
(iii) approximation to the Galerkin matrices,
(iv) examples of integral operators (IOs) and numerics.
Analytic approximation methods may provide the decomposed
tensors with relatively high Kronecker rank, which can be
then reduced by algebraic “recompression algorithms”.
We stress that in spite of existing implementations (which are
usually not in the public domain), the robust algebraic methods of
low-rank tensor decomposition still require further developments.
Why is the multi-factor analysis difficult?
Def. 6.1. The minimal number $r$ in the representation
\[ A = \sum_{k=1}^{r} V_k^1 \otimes \dots \otimes V_k^q, \quad V_k^\ell \in \mathbb{R}^{n^p}, \tag{55} \]
is called the tensor rank of the q-th order tensor $A$. We suppose
that $V_k^\ell \in \mathbb{R}^n$ (i.e., $p = 1$).
Finding the tensor rank $r$ and the corresponding
decomposition(s) for a high dimensional q-th order tensor is
the main issue of the multi-factor analysis!
For q = 2, Def. 6.1 coincides with the standard definition of
rank(A), which can be calculated by a finite algorithm. The
corresponding tensor decomposition can be computed by the
SVD in $O(n^3)$ operations. Under the orthogonality
requirement this decomposition is unique.
Little analogy between the cases q ≥ 3 and q = 2
If q > 2, the situation changes dramatically.
I. rank(A) depends on the number field (say, $\mathbb{R}$ or $\mathbb{C}$).
II. We do not know any finite algorithm to compute
$r = \operatorname{rank}(A)$, except simple bounds:
\[ 0 \le \operatorname{rank}(A) \le n^{q-1}. \]
III. For fixed $q$ and $n$ we do not know the exact value of
$\max\{\operatorname{rank}(A)\}$. J. Kruskal ’75 proved that:
– for any 2 × 2 × 2 tensor we have $\max\{\operatorname{rank}(A)\} = 3 < 4$;
– for 3 × 3 × 3 tensors there holds $\max\{\operatorname{rank}(A)\} = 5 < 9$.
IV. “Probabilistic” properties of rank(A): in the set of 2 × 2 × 2
tensors there is about 79% of rank-2 tensors and 21% of
rank-3 tensors, while rank-1 tensors appear with probability 0.
Clearly, for n × n matrices we have $P\{\operatorname{rank}(A) = n\} = 1$.
V. However, it is possible to prove a very important uniqueness
property within the equivalence classes.
Two representations like (55) are considered as equivalent (essential
equivalence) if either
(a) they differ in the order of terms or
(b) for some set of parameters $a_k^\ell \in \mathbb{R}$ such that $\prod_{\ell=1}^{q} a_k^\ell = 1$ ($k = 1, \dots, r$),
there is a transform $V_k^\ell \to a_k^\ell V_k^\ell$.
A simplified version of the general uniqueness result is the
following (all factors have the same full rank $r$).
Prop. 1. (J. Kruskal, 1977) Let for each $\ell = 1, \dots, q$ the vectors $V_k^\ell$
($k = 1, \dots, r$) with $r = \operatorname{rank}(A)$ be linearly independent. If
\[ (q - 2) r \ge q - 1, \]
then the decomposition (55) is uniquely determined up to the
equivalence (a) - (b) above.
Function-generated tensors
Def. 5.4. (cf. Lect. 5) Given the multi-variate function
\[ g : \Omega \to \mathbb{R} \quad \text{with } d = qp, \; p, q \in \mathbb{N}, \; q \ge 2, \]
$\Omega = \{ (\zeta^1, \dots, \zeta^q) \in \mathbb{R}^d : \|\zeta^\ell\|_\infty \le L, \; \ell = 1, \dots, q \}$ with $L > 0$, $\zeta^\ell \in \mathbb{R}^p$. Let
$\{\zeta^1_{i_1}, \dots, \zeta^q_{i_q}\}$ be the set of collocation points lying
on the tensor-product lattice in $\Omega$ and indexed by $I^d$. We
recall the definition of the function-generated q-th order tensor:
\[ A \equiv A(g) := [a_{i_1 \dots i_q}] \in \mathbb{R}^{I^d} \quad \text{with } a_{i_1 \dots i_q} := g(\zeta^1_{i_1}, \dots, \zeta^q_{i_q}). \tag{56} \]
First, we introduce a low Kronecker rank approximation to
the q-th order tensor $A = A(g) \in \mathbb{R}^{I^d}$ with $|I^d| = n^{qp}$,
\[ A_r := A(g_r), \quad g_r := \sum_{k=1}^{r} \Phi_k^1(\zeta^1) \cdots \Phi_k^q(\zeta^q) \approx g, \]
where $g_r$ is a separable approximation to $g$.
We assume that the error $g - g_r$ can be estimated in the
$L^\infty(\Omega)$- or in the $L^2(\Omega)$-norm, $\|u\|_{L^2} := \sqrt{\int_\Omega u^2(\zeta)\,d\zeta}$.
In particular, this might correspond to the Nyström
discretisation of IOs in $\mathbb{R}^d$ (with $q = d$, $p = 2$),
\[ (\mathcal{A}u)(x) := \int_\Omega g(x, y) u(y)\,dy, \quad x, y \in \Omega \subset \mathbb{R}^d. \]
In the latter case we have
\[ g_r := \sum_{k=1}^{r} \Phi_k^1(x_1, y_1) \cdots \Phi_k^d(x_d, y_d). \]
Furthermore, we denote $I^d = I \times J$ where $I$, $J$ are associated
with $x \in \mathbb{R}^{I}$ and $y \in \mathbb{R}^{J}$.
For the error analysis of the Kronecker approximand we make
use of the Euclidean (Frobenius) and $\| \cdot \|_\infty$ tensor norms
\[ \|x\|_2 := \sqrt{\sum_{i \in I} x_i^2}, \qquad \|x\|_\infty := \max_{i \in I} |x_i|, \qquad x \in \mathbb{R}^{I}. \]
Let $g - g_r$ be smooth enough. Then for a quasi-uniform
distribution of collocation points we have
\[ \|A(g) - A(g_r)\|_2 \le C \frac{N_I^{1/2} N_J^{1/2}}{L^{q/2}} \|g - g_r\|_{L^2}. \tag{57} \]
The next lemma describes relations between the
approximation error $\|g - g_r\|$ evaluated in different norms and
the corresponding error $\|A(g) - A(g_r)\|$ of the Kronecker
product representation.
Lem. 6.1. We have $\|A - A_r\|_\infty \le \|g - g_r\|_{L^\infty(\Omega)}$.
For any $x \in \mathbb{R}^{I}$, $y \in \mathbb{R}^{J}$, the consistency error of $A - A_r$ can be
bounded by
\[ |\langle (A - A_r)x, y \rangle| \le \|g - g_r\|_{L^\infty(\Omega)} \|x\|_1 \|y\|_1 \le N_I^{1/2} N_J^{1/2} \|g - g_r\|_{L^\infty(\Omega)} \|x\|_2 \|y\|_2, \tag{58} \]
\[ |\langle (A - A_r)x, y \rangle| \le C \frac{N_I^{1/2} N_J^{1/2}}{L^{q/2}} \|g - g_r\|_{L^2(\Omega)} \|x\|_2 \|y\|_2. \tag{59} \]
Proof. The first assertion follows by the construction of $A_r$,
\[ \|A - A_r\|_\infty = \max_{(i_1, \dots, i_q) \in I^d} \Big| g(\zeta^1_{i_1}, \dots, \zeta^q_{i_q}) - \sum_{k=1}^{r} \Phi_k^1(\zeta^1_{i_1}) \cdots \Phi_k^q(\zeta^q_{i_q}) \Big| \le \|g - g_r\|_{L^\infty(\Omega)}. \]
Now we readily obtain
\[ |\langle (A - A_r)x, y \rangle| \le \|g - g_r\|_{L^\infty(\Omega)} \sum_{i \in I,\, j \in J} |x_i y_j| \le \|g - g_r\|_{L^\infty(\Omega)} \|x\|_1 \|y\|_1, \]
which proves (58) since $\|x\|_1 \le N_I^{1/2} \|x\|_2$ and $\|y\|_1 \le N_J^{1/2} \|y\|_2$.
Now, applying the Cauchy-Schwarz inequality we have
\[ |\langle (A - A_r)x, y \rangle| \le \sum_{i \in I,\, j \in J} |(a_{ij} - a_{r,ij}) x_i y_j| \le \|A - A_r\|_2 \sqrt{\sum_{i \in I,\, j \in J} x_i^2 y_j^2} \le \|A - A_r\|_2 \|x\|_2 \|y\|_2. \]
Then (59) follows from the bound (57).
In many applications the generating function $g(\zeta)$ depends
only on a few scalar variables which are functionals of $\zeta$.
Ex. 6.1. Consider a function depending only on one scalar
parameter,
\[ g(\zeta) = G(\rho(\zeta)) \quad \text{where } G : [0, a] \to \mathbb{R} \]
with $\rho : [-L, L]^p \to [0, a]$, $a > 0$.
In the case $\rho(\zeta) = \|\zeta\|_2$, the separable approximation $g_r(\zeta)$ can
be derived from an approximation $G_r$ to the uni-variate
function $G(\rho)$ by exponential sums.
It is easy to see that the approximation error $g - g_r$ arising in
Lem. 6.1 can be estimated via the corresponding error $G - G_r$.
Lem. 6.2. The following estimates are valid
\[ \|g - g_r\|_{L^\infty} = \|G - G_r\|_{L^\infty}, \]
\[ \|g - g_r\|_{L^2(\Omega)} \le C L^{\frac{q-1}{2}} \|G - G_r\|_{L^2[0,a]}. \]
Proof. The first statement is trivial. The second bound is
obtained by passing to integration in the q-dimensional
spherical coordinates.
Galerkin discretisation
Given the integral operator $\mathcal{A} : L^2(\Omega) \to L^2(\Omega)$ in $\mathbb{R}^d$, $d \ge 2$,
\[ (\mathcal{A}u)(x) := \int_\Omega g(x, y) u(y)\,dy, \quad x, y \in \Omega := [0, 1]^d, \]
with the shift-invariant kernel function $g(x, y) = g(|x - y|)$.
A principal ingredient in the HKT representation of the
Galerkin discretisations in $\mathbb{R}^d$ is a separable approximation of
the multi-variate function representing the kernel of an IO.
Clearly, $g(x, y)$ can be represented in the form
\[ g(x, y) = G(\zeta_1, \dots, \zeta_d) \equiv g\Big( \sqrt{\zeta_1^2 + \dots + \zeta_d^2} \Big), \]
where $\zeta_\ell = |x_\ell - y_\ell| \in [0, 1]$, $\ell = 1, \dots, d$.
With fixed $0 \le \alpha_0 < 1$, we introduce the auxiliary function
\[ F(\zeta_1, \dots, \zeta_d) := (\zeta_1 \cdots \zeta_{d-1})^{\alpha_0} G(\zeta_1, \dots, \zeta_d). \tag{60} \]
We suppose that a multi-variate function $F : \mathbb{R}^d \to \mathbb{R}$ can be
well approximated by a separable expansion
\[ F_r(\zeta_1, \dots, \zeta_d) := \sum_{k=1}^{r} \Phi_k^1(\zeta_1) \cdots \Phi_k^d(\zeta_d) \approx F, \tag{61} \]
where the set of functions $\{\Phi_k^\ell : \ell = 1, \dots, d, \; k = 1, \dots, r\}$ with
$\Phi_k^\ell : [0, 1] \to \mathbb{R}$ is fixed or can be chosen adaptively.
We apply a Galerkin scheme with tensor-product test functions
\[ \phi^{i}(x_1, \dots, x_d) = \phi_1^{i_1}(x_1) \cdots \phi_d^{i_d}(x_d), \quad i = (i_1, \dots, i_d), \quad i_\ell \in I_n := \{1, \dots, n\}. \]
Now we approximate the Galerkin stiffness matrix
\[ A = \{ (\mathcal{A}\phi^{i}, \phi^{j})_{L^2} \}_{i, j \in I_n^d} \in \mathbb{R}^{N\times N}, \quad N = n^d, \]
by a matrix $A^{(r)}$ of the form
\[ A^{(r)} = \sum_{k=1}^{r} V_k^1 \otimes \dots \otimes V_k^d \approx A. \]
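Here the $V_k^\ell$, $\ell = 1, \dots, d$, are $n \times n$ matrices given by
\[ V_k^\ell = \Big\{ \int_0^1 \!\! \int_0^1 |x_\ell - y_\ell|^{-\alpha_\ell} \Phi_k^\ell(|x_\ell - y_\ell|)\, \phi_\ell^{i}(x_\ell) \phi_\ell^{j}(y_\ell)\, dx_\ell\, dy_\ell \Big\}_{i,j=1}^{n} \tag{62} \]
with $\alpha_\ell = \alpha_0 \ge 0$, $\ell = 1, \dots, d - 1$, and $\alpha_d = 0$ (see (60)).
Def. 6.2. A function $g(x, y)$, $x, y \in \mathbb{R}^d$, is called
asymptotically smooth if there exist $\gamma \ge 1$ and $p \in \mathbb{R}$ such
that for all $x, y \in \mathbb{R}^d$, $x \neq y$, and all multi-indices $\alpha$, $\beta$ such that
$|\alpha| + |\beta| > 0$ with $|\alpha| = \alpha_1 + \dots + \alpha_d$, we have
\[ |\partial_x^\alpha \partial_y^\beta g(x, y)| \le C \alpha! \beta! \gamma^{|\alpha| + |\beta|} |x - y|^{-p - |\alpha| - |\beta|}. \]
The next lemma shows that the error $\|A - A^{(r)}\|$ with respect
to usual norms is directly related to the accuracy $\|F - F_r\|_\infty$ of the separable approximation (61) of $F$.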
Lem. 6.3. Let (61) be valid, then for any $i, j \in I_n^d$, we have
\[ |a_{i,j} - a^{r}_{i,j}| \le \|F - F_r\|_\infty \prod_{\ell=1}^{d} \Big\| |x_\ell - y_\ell|^{-\alpha_\ell} \phi_\ell^{i_\ell}(x_\ell) \phi_\ell^{j_\ell}(y_\ell) \Big\|_{L^1([0,1]\times[0,1])} \]
for the components of $A - A^{(r)}$.
Let us further assume that the function
\[ g_{\ell,k}(u, v) := |u - v|^{-\alpha_\ell} \Phi_k^\ell(|u - v|), \quad (u, v) \in [0, 1]^2, \]
is asymptotically smooth for $\ell = 1, \dots, d$, $k = 1, \dots, r$. Then, for
low-order piecewise polynomial Galerkin basis functions, $V_k^\ell$
can be approximated by a rank-m H-matrix $\widetilde{V}_k^\ell$ with an error
$\|V_k^\ell - \widetilde{V}_k^\ell\| \le C \eta^m$ for some $\eta < 1$.
Proof. By construction we obtain
\[ |a_{i,j} - a^{r}_{i,j}| = \Big| \int_{\Omega\times\Omega} (F - F_r) \Big( \prod_{\ell=1}^{d} |x_\ell - y_\ell|^{-\alpha_\ell} \Big) \phi^{i}(x) \phi^{j}(y)\, dx\, dy \Big| \]
\[ \le \|F - F_r\|_\infty \Big\| \Big( \prod_{\ell=1}^{d} |x_\ell - y_\ell|^{-\alpha_\ell} \Big) \phi^{i}(x) \phi^{j}(y) \Big\|_{L^1(\Omega\times\Omega)} \]
\[ = \|F - F_r\|_\infty \prod_{\ell=1}^{d} \Big\| |x_\ell - y_\ell|^{-\alpha_\ell} \phi_\ell^{i_\ell}(x_\ell) \phi_\ell^{j_\ell}(y_\ell) \Big\|_{L^1([0,1]\times[0,1])}, \]
where the last equality follows by inserting the tensor-product
basis and by separating the 2d-dimensional integral.
Second, $V_k^\ell$ given by (62) appears to be the exact Galerkin
stiffness matrix for an IO with the kernel function $g_{\ell,k}(u, v)$, $(u, v) \in [0, 1] \times [0, 1]$. Since $g_{\ell,k}(u, v)$ is supposed to be
asymptotically smooth, the result follows by the conventional
theory of the H-matrix approximation.
Note that due to Lem. 6.3, $\|A - A^{(r)}\|$ can be easily estimated
in the Frobenius, $l^2$ or $l^\infty$ matrix norms. In particular,
\[ \|A - A^{(r)}\|_\infty \le n^d \|F - F_r\|_\infty \max_{i,j} \prod_{\ell=1}^{d} \Big\| |x_\ell - y_\ell|^{-\alpha_\ell} \phi_\ell^{i_\ell}(x_\ell) \phi_\ell^{j_\ell}(y_\ell) \Big\|_{L^1([0,1]\times[0,1])}. \]
Several methods of separable approximations to multi-variate
functions are presented in Part I.
In general, approximability property (61) can be validated by
using the tensor-product Sinc interpolation, where the factor
$\Phi_k^\ell(|u - v|)$ can be proved to be asymptotically smooth.
For the class of kernel functions approximated by the
quadrature-type methods, the factor $\Phi_k^\ell(|u - v|)$ even appears
to be globally smooth (indeed, it is an entire function).
Lem. 6.4. For both the tensor-product Sinc-interpolation
and for the quadrature methods the function $g_{\ell,k}(u, v)$ (cf.
Lem. 6.3) is asymptotically smooth (AS).
Proof. In the first case we have
\[ g_{\ell,k}(u, v) = |u - v|^{-\alpha_\ell} S_{k,h}(\phi^{-1}(|u - v|)), \quad u, v \in [0, 1], \]
where $S_{k,h}$ refers to the k-th Sinc function with step-size $h$,
and $\phi^{-1}(x) = \operatorname{arsinh}(\operatorname{arcosh}(\tfrac{1}{x}))$. Since $S_{k,h}(x)$, $x \in \mathbb{R}$, is
holomorphic in $x$, and since the factor $|u - v|^{-\alpha_\ell}$ is AS, we
conclude that $g_{\ell,k}(u, v)$ has the same property.
Applying the quadrature method, we obtain the entire function
\[ \Phi_k^\ell(|u - v|) = \exp(-t_k |u - v|^2), \quad t_k > 0. \]
Then the previous argument completes the proof.
Lem. 6.3 and 6.4 prove the existence of a low Kronecker
rank HKT approximation to the class of multi-dimensional
integral operators.
Given a tolerance $\varepsilon > 0$, in general, we have the bound
\[ r = O\Big( \Big[ \log\Big(\frac{1}{h}\Big) \log\Big(\frac{1}{\varepsilon}\Big) \log\Big(\log \frac{1}{\varepsilon}\Big) \Big]^{d-1} \Big), \]
where $h = O(n^{-1})$ is the mesh-size of the FE discretisation.
In the case of translation-invariant kernels, we obtain a
dimensionally independent bound
\[ r = O\Big( \log n \, \log\Big(\frac{1}{\varepsilon}\Big) \log\Big(\log \frac{1}{\varepsilon}\Big) \Big), \]
see examples below.
Main examples
Toward a separable approximation to the multi-variate
functions
\[ \frac{1}{x_1 + \dots + x_d} \quad \text{and} \quad \frac{1}{\sqrt{x_1^2 + \dots + x_d^2}} \qquad (x_i > 0, \; i = 1, \dots, d). \]
Ex. 6.1. In the first case, to apply the Sinc method, we
make use of the Laplace integral transform
\[ \frac{1}{\rho} = \int_{\mathbb{R}_+} e^{-\rho t}\,dt \qquad (\rho > 0) \tag{63} \]
with the integrand $f(t) = e^{-\rho t}$, assuming that $\rho \in [1, R]$, $R > 1$.
In order to apply the improved error estimate, we make use
of the substitutions $t = \log(1 + e^u)$ and $u = \sinh(w)$ to obtain
\[ \frac{1}{\rho} = \int_{\mathbb{R}} f_2(w)\,dw \quad \text{with } f_2(w) = \frac{\cosh(w)}{1 + e^{-\sinh(w)}}\, e^{-\rho \log(1 + e^{\sinh(w)})}. \tag{64} \]
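The transformed integral (64) turns directly into a separable approximation of $1/(x_1 + \dots + x_d)$: a $(2M+1)$-point trapezoidal (Sinc) rule $h \sum_{k=-M}^{M} f_2(kh)$ gives $1/\rho \approx \sum_k c_k e^{-t_k \rho}$, and each exponential factorises over the coordinates. The sketch below uses only the Python standard library; the step size $h = \log(M)/M$ is an assumption made for this illustration, not the value prescribed by Thm. 2.6, and the function names are hypothetical.

```python
import math

def sinc_nodes_weights(M, h):
    """Nodes t_k and weights c_k of the (2M+1)-point rule applied to (64)."""
    nodes, weights = [], []
    for k in range(-M, M + 1):
        u = math.sinh(k * h)
        # t = log(1 + e^u) and dt/du = 1 / (1 + e^{-u}), evaluated stably.
        if u > 0:
            t = u + math.log1p(math.exp(-u))
            dt_du = 1.0 / (1.0 + math.exp(-u))
        else:
            t = math.log1p(math.exp(u))
            dt_du = math.exp(u) / (1.0 + math.exp(u))
        nodes.append(t)
        weights.append(h * math.cosh(k * h) * dt_du)
    return nodes, weights

def inv_sum(xs, nodes, weights):
    """Separable approximation of 1/(x_1 + ... + x_d): one product per node."""
    total = 0.0
    for t, c in zip(nodes, weights):
        term = c
        for x in xs:
            term *= math.exp(-t * x)  # factor depending on one coordinate only
        total += term
    return total

M = 64
h = math.log(M) / M  # assumed step size for this sketch
nodes, weights = sinc_nodes_weights(M, h)
approx = inv_sum([0.5, 1.5, 2.0], nodes, weights)  # x_1 + x_2 + x_3 = 4
```

The number of terms is 2M + 1 regardless of the number of coordinates passed in `xs`.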
The decay of $f_2$ on the real axis is
\[ f_2(w) \approx \tfrac{1}{2} e^{w - \frac{\rho}{2} e^{w}} \text{ as } w \to \infty; \qquad f_2(w) \approx \tfrac{1}{2} e^{|w| - \frac{1}{2} e^{|w|}} \text{ as } w \to -\infty, \]
corresponding to $C = \tfrac{1}{2}$, $b = \min\{1, \rho/2\}$, $a = 1$ in Thm. 2.6.
Lem. 6.5. (Hackbusch, BNK) If $\rho \in [1, R]$, the choice
$\delta = \delta(R) = O(1/\log(R))$, $a = 1$, $b = 1/2$ in Thm. 2.6 (with the
corresponding value of $h$) implies the uniform quadrature
error estimate
\[ \Big| \frac{1}{\rho} - I_M(f_2, h) \Big| \lesssim C \exp\Big( -\frac{\pi^2 M}{\log(3R) \log(\pi^2 M)} \Big). \tag{65} \]
In the case of $1/\rho = \frac{1}{x_1 + \dots + x_d}$, the estimate (65) implies that
an approximation of accuracy $\varepsilon$ is obtainable with
\[ M \le O\big( \log(\tfrac{1}{\varepsilon}) \cdot \log R \big), \tag{66} \]
provided that $1 \le x_1 + \dots + x_d \le R$, which can be achieved by a
proper scaling. The numerical results even support the better
estimate $M \le O(\log(\tfrac{1}{\varepsilon}) + \log R)$
(see Fig. 12, 13).
[Plots not reproduced: three panels, horizontal axis from 0 to 1000, vertical scales of order 10^{-6}, 10^{-8} and 10^{-13}.]
Figure 12: The absolute quadrature error for (64) with 1 ≤ ρ ≤ 10^3, and
with M = 16 (left), M = 32 (middle), M = 64 (right).
[Plots not reproduced: three panels, horizontal axis from 0 to 2·10^4, vertical scales of order 10^{-6}, 10^{-7} and 10^{-10}.]
Figure 13: The absolute quadrature error for (64) with 1 ≤ ρ ≤ 18000,
and with M = 16 (left), M = 32 (middle), M = 64 (right).
Lem. 6.5 also shows that the separation rank $r = 2M + 1$ depends only linear-logarithmically on both the tolerance
$\varepsilon > 0$ and the upper bound $R$ of $\rho = x_1 + \dots + x_d$. Hence, there
is no dependence on the dimension $d$.
Ex. 6.2. In the case of the Newton potential $1/\sqrt{x_1^2 + \dots + x_d^2}$, we
make use of the Gauss integral
\[ \frac{1}{\rho} = \frac{2}{\sqrt{\pi}} \int_{\mathbb{R}_+} e^{-\rho^2 t^2}\,dt \qquad (\rho \in [1, R]). \tag{67} \]
To obtain robustness in $\rho$, we rewrite the Gauss integral (67)
using the substitutions $t = \log(1 + e^u)$ and $u = \sinh(w)$,
\[ \frac{1}{\rho} = \int_{\mathbb{R}} f(w)\,dw \quad \text{with } f(w) := \cosh(w) F(\sinh(w)) \tag{68} \]
with
\[ F(u) := \frac{2}{\sqrt{\pi}} \frac{e^{-\rho^2 \log^2(1 + e^u)}}{1 + e^{-u}}. \]
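Applied to (68), the same $(2M+1)$-point rule yields an approximation of the Newton potential by a sum of products of one-dimensional Gaussians, $1/\|x\|_2 \approx \sum_k c_k \prod_\ell e^{-t_k^2 x_\ell^2}$. The following sketch uses only the Python standard library; the step size $h = \log(M)/M$ and the function names are assumptions for the illustration, and the accuracy check is limited to moderate values of $\rho$.

```python
import math

def gauss_nodes_weights(M, h):
    """Nodes t_k and weights c_k of the (2M+1)-point rule applied to (68)."""
    nodes, weights = [], []
    for k in range(-M, M + 1):
        u = math.sinh(k * h)
        # t = log(1 + e^u) and dt/du = 1 / (1 + e^{-u}), evaluated stably.
        if u > 0:
            t = u + math.log1p(math.exp(-u))
            dt_du = 1.0 / (1.0 + math.exp(-u))
        else:
            t = math.log1p(math.exp(u))
            dt_du = math.exp(u) / (1.0 + math.exp(u))
        nodes.append(t)
        weights.append(2.0 / math.sqrt(math.pi) * h * math.cosh(k * h) * dt_du)
    return nodes, weights

def newton_potential(xs, nodes, weights):
    """Sum of products of one-dimensional Gaussians approximating 1/||x||_2."""
    total = 0.0
    for t, c in zip(nodes, weights):
        term = c
        for x in xs:
            term *= math.exp(-(t * x) ** 2)  # Gaussian in one coordinate
        total += term
    return total

M = 64
h = math.log(M) / M  # assumed step size for this sketch
nodes, weights = gauss_nodes_weights(M, h)
approx = newton_potential([1.0, 2.0, 2.0], nodes, weights)  # ||x||_2 = 3
```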
Lem. 6.6. Let $\delta < \pi/2$, $\rho \ge 1$. Then for the function $f$ from
(68) we have $f \in H^1(D_\delta)$.
In addition, Thm. 2.6 is satisfied with $a = 1$.
The improved (2M + 1)-point quadrature with the choice
$\delta(\rho) = \frac{\pi}{C + \log(\rho)}$ allows the error bound
\[ \Big| \frac{1}{\rho} - I_M(f, h) \Big| \le C_1 \exp\Big( -\frac{\pi^2 M}{(C + \log(\rho)) \log M} \Big). \tag{69} \]
Proof. It is easy to check that $f$ is holomorphic in $D_\delta$ and
$N(f, D_\delta) < \infty$ uniformly in $\rho$ (with the choice $\delta = \delta(\rho)$). Now
we check the double-exponential decay of the integrand as
$|w| \to \infty$ and then apply Thm. 2.6, where
\[ \delta = \delta(\rho) = \frac{\pi}{C + \log(\rho)}. \]
Main examples B. Khoromskij, Leipzig 2005(L6) 160
We apply (69) and obtain the bound (70),
M ≤ O (log( 1
ε ) · log R). (70)
Hence again there is no dependence on the dimension d.
Numerical examples for this quadrature with values ρ ∈ [1, R], R ≤ 5000, are presented in Fig. 14.
Figure 14: The absolute quadrature error for M = 64 with R = 200 (left),
R = 1000 (middle), R = 5000 (right).
Further examples B. Khoromskij, Leipzig 2005(L6) 161
Again, we observe almost linear error growth in ρ. Similar
results were obtained in the case R > 5000 manifesting a
rather stable behaviour of the quadrature error with respect
to R.
Ex. 6.3. log(x + y)
In boundary element methods (BEM), one is interested in a
low separation rank representation of the kernel function
$s(x, y) = \log(x + y)$, $x \in [0, 1]$, $y \in [h, 1]$ with some small
mesh-size parameter $h > 0$. A representation like
\[ \frac{1}{x + y} = \sum_{m=1}^{k} \Phi_m(x)\Psi_m(y) + \delta_k \quad\text{with } |\delta_k| \le \varepsilon \tag{71} \]
can be constructed by means of the quadrature applied to the
integral (64) with $\rho = x + y$ and $k = 2M + 1$. Let $\psi_m$ be an
antiderivative of $\Psi_m$. Integration of (71) yields
\[ \begin{aligned}
\log(x + y) &= \int_{1-x}^{y} \frac{dt}{x + t} = \int_{1-x}^{y} \left( \sum_{m=1}^{k} \Phi_m(x)\Psi_m(t) + \delta_k \right) dt \\
&= \sum_{m=1}^{k} \Phi_m(x)\,[\psi_m(y) - \psi_m(1 - x)] + S_k \\
&= \Phi_0(x) + \sum_{m=1}^{k} \Phi_m(x)\psi_m(y) + S_k
\end{aligned} \]
with $\Phi_0(x) = -\sum_{m=1}^{k} \Phi_m(x)\psi_m(1 - x)$ and $|S_k| = \left| \int_{1-x}^{y} \delta_k\,dt \right| \le \varepsilon$.
The resulting representation of $\log(x + y)$ has the separation
rank $k + 1$ and the same accuracy $\varepsilon$ as (71).
Ex. 6.4. Helmholtz kernel in $\mathbb{R}^d$
Given $\kappa \in \mathbb{R}$, define the Helmholtz kernel function
\[ g(x, y) := \frac{\cos(\kappa|x - y|)}{|x - y|} = \Re e\,\frac{e^{i\kappa|x - y|}}{|x - y|} \quad\text{for } (x, y) \in [0, 1]^d \times [0, 1]^d \]
in Cartesian coordinates $x = (x_1, \dots, x_d)$, $y = (y_1, \dots, y_d) \in \mathbb{R}^d$. The
Sinc approximation can be applied in the case of a weakly
admissible block (in the H-matrix techniques) w.r.t. the
transformed variables $\zeta_1, \dots, \zeta_d$. For $(\zeta_1, \dots, \zeta_d) \in [0, 1]^d$, define
\[ G(\zeta_1, \dots, \zeta_d) := g(x, y), \quad \zeta_\ell = |x_\ell - y_\ell|, \quad \ell = 1, \dots, d, \]
which implies
\[ G(\zeta_1, \dots, \zeta_d) := \cos\left( \kappa\sqrt{\zeta_1^2 + \dots + \zeta_d^2} \right) \Big/ \sqrt{\zeta_1^2 + \dots + \zeta_d^2}. \]
We approximate the modified function
\[ F(\zeta_1, \dots, \zeta_d) := (\zeta_1 \cdots \zeta_{d-1})^{\alpha_0}\, G(\zeta_1, \dots, \zeta_d), \quad 0 < \alpha_0 < 1, \]
on the domain $\Omega_1 := [0, 1]^{d-1} \times [h, 1]$, where $h > 0$ is a small
(mesh) parameter.
Now we apply Thm. 4.5 with $\delta = 1/|\log h|$ to construct the
approximation $G_M(x)$ via the interpolation of $F$ and obtain
\[ \begin{aligned}
|G(x) - G_M(x)| &\le \prod_{\ell=1}^{d-1} x_\ell^{-\alpha_0}\, \left| E_M(F, h)(\phi^{-1}(x)) \right| \\
&\le C h^{\alpha_0(1-d)}\, |\log h|\, \Lambda_M^{d-1}\, N_0(F, D_\delta)\, e^{-\pi M/(|\log h| \log M)}
\end{aligned} \tag{72} \]
with $\zeta \in (0, 1]^d$.
For this example $N_0(F, D_\delta) = O(e^\kappa)$, while the Kronecker rank
is given by $r = (2M + 1)^{d-1}$. Clearly, for a large $\kappa$, the bound
(72) does not provide a satisfactory complexity.
Literature to Lecture 6 B. Khoromskij, Leipzig 2005(L6) 165
1. W. Hackbusch and B.N. Khoromskij: Hierarchical Kronecker Tensor-Product Approximation to a Class
of Nonlocal Operators in High Dimensions. Parts I/II. Preprints 29/30, MPI MIS, Leipzig 2005.
2. D. Braess and W. Hackbusch: Approximation of 1/x by exponential sums in [1, ∞). To appear in IMA JNA.
3. W. Hackbusch, B.N. Khoromskij and E. Tyrtyshnikov: Hierarchical Kronecker tensor-product approximation.
Preprint 35, MPI MIS, Leipzig 2003 (JNA, to appear).
4. J. B. Kruskal: Three-way arrays: Rank and uniqueness of trilinear decompositions. Linear Algebra
Appl., 18 (1977), 95-138.
http://personal-homepages.mis.mpg.de/bokh
http://www.mis.mpg.de/scicomp/Fulltext/Khoromskij/khor6.ps
Lect. 7. Structured Representation to Matrix-Valued Functions B. Khoromskij, Leipzig 2005 166
The matrix-valued functions (MVF) of the discrete (elliptic)
operator $L$ arise in a wide range of applications. Structured
tensor-product representations are developed for several
classes of MVFs:
\[ \begin{aligned}
F_1(L) &:= L^{-\alpha}, \quad \alpha > 0, \\
F_2(L) &:= e^{-tL}, \\
F_{3,k}(L) &:= \cos(t\sqrt{L})\,L^{-k}, \quad k \in \mathbb{N}, \\
F_4(L) &:= \int_0^\infty e^{-tL^*} G\, e^{-tL}\,dt, \\
F_5(L) &:= \mathrm{sign}(L).
\end{aligned} \]
Both the discrete elliptic inverse $L^{-1}$ and the matrix
exponential $e^{-tL}$ play an important role in numerical PDEs.
Usually MVFs appear to be fully populated, hence data-sparse
formats are needed for their efficient representation.
Representation of Operators B. Khoromskij, Leipzig 2005(L7) 167
There are different methods to represent MVFs (set $L = A$):
• In the case of diagonalisable matrices, i.e., $A = T^{-1}DT$
with $D = \mathrm{diag}\{d_1, \dots, d_n\}$ diagonal, one defines
\[ F(A) = T^{-1}F(D)T, \quad F(D) = \mathrm{diag}\{F(d_1), \dots, F(d_n)\}. \]
• Dunford-Cauchy integral for analytic functions
\[ F(A) = \frac{1}{2\pi i} \int_\Gamma F(z)(zI - A)^{-1}\,dz, \quad \Gamma \subset \mathbb{C}. \]
• Laplace type transform
\[ F(A) = \int_{\mathbb{R}} f(t)\,e^{-tA}\,dt. \]
• Transforms via trigonometric kernels
\[ F(A) = \int_{\mathbb{R}} [a(t)\cos(tA) + b(t)\sin(tA)]\,dt. \]
• Polynomial expansions or/and nonlinear iterations.
Examples of matrix-valued functions B. Khoromskij, Leipzig 2005(L7) 168
Ex. 7.1. The solution operator to the initial value parabolic
problem
\[ \frac{\partial u}{\partial t} + Lu(t) = 0, \quad u(0) = u_0 \in X, \tag{73} \]
is given by
\[ T(t; L) = e^{-tL} = \int_\Gamma e^{-zt}(zI - L)^{-1}\,dz, \]
where $L$ is an elliptic operator (say, $L = -\Delta$) in a Hilbert
space $X$ and $u(t)$ is a vector-valued function $u : \mathbb{R}_+ \to X$.
Given the initial vector $u_0$, the solution of the initial value
problem can be represented by $u(t) = T(t; L)u_0$.
A simple example of a parabolic PDE is the 1D heat equation
\[ \frac{\partial u}{\partial t} - \frac{\partial^2 u}{\partial x^2} = 0, \quad u : \mathbb{R}_+ \times [0, 1] \to \mathbb{R} \]
with the corresponding boundary and initial conditions.
Ex. 7.2. The initial-value problem for the second order
differential equation with an operator coefficient
\[ u''(t) + Lu(t) = 0, \quad u(0) = u_0, \quad u'(0) = 0, \]
has the solution operator
\[ C(t; L) := \cos(t\sqrt{L}) = \int_\Gamma \cos(t\sqrt{z})(zI - L)^{-1}\,dz \]
(the hyperbolic operator cosine family), so that
$u(t) = C(t; L)u_0$.
It represents the function-to-operator map $\cos(t\sqrt{\cdot}) \to C(t; L)$.
An example of a hyperbolic PDE is the classical wave equation
\[ \frac{\partial^2 u}{\partial t^2} - \frac{\partial^2 u}{\partial x^2} = 0 \]
subject to the corresponding boundary and initial conditions.
Ex. 7.3. For the boundary value problem
\[ \frac{d^2 u}{dx^2} - Lu = 0, \quad u(0) = 0, \quad u(1) = u_1, \tag{74} \]
in a Hilbert space $X$, the solution operator is the normalised
hyperbolic operator sine family
\[ E(x; L) := \left(\sinh(\sqrt{L})\right)^{-1} \sinh(x\sqrt{L}) = \int_\Gamma \frac{\sinh(x\sqrt{z})}{\sinh(\sqrt{z})}\,(zI - L)^{-1}\,dz, \]
so that $u(x) = E(x; L)u_1$.
The simplest PDE of the type (74) is the Laplace equation in
a cylindrical domain:
\[ \frac{d^2 u}{dx^2} + \frac{d^2 u}{dy^2} = 0, \quad x \in [0, 1],\ y \in [c, d], \]
\[ u(0, y) = 0, \quad u(1, y) = u_1(y). \]
Rem. 7.1 Constructions 7.1-7.3 are useful to avoid time
stepping and hence allow parallel (in time) computations.
Ex. 7.4. For the Sylvester matrix equation
\[ AX + XB = G \quad (A, B, G \in \mathbb{R}^{n\times n} \text{ given}) \]
the solution $X \in \mathbb{R}^{n\times n}$ is given by the integral
\[ X = F(A, B)G := \int_0^\infty e^{-tA}\,G\,e^{-tB}\,dt, \]
supposing that $A$, $B$ provide existence of this integral (cf.
Lect. 5). The (nonlinear) Riccati matrix equation
\[ AX + XA + XFX = G, \tag{75} \]
where $A, F, G \in \mathbb{R}^{n\times n}$ are given and $X \in \mathbb{R}^{n\times n}$ is the unknown
matrix, can be solved by Newton's iteration. At each iteration
step the Lyapunov equation has to be solved ($X_k \to X$)
\[ (A - FX_k)X_{k+1} + X_{k+1}(A - FX_k) = -X_k F X_k + G. \]
Ex. 7.5. Let $A \in \mathbb{R}^{n\times n}$ be a matrix whose spectrum $\sigma(A)$ does not intersect the imaginary axis. The matrix function
$F(A) = \mathrm{sign}(A)$ is defined by
\[ \mathrm{sign}(A) := \frac{1}{\pi i} \int_{\Gamma_+} (zI - A)^{-1}\,dz - I \tag{76} \]
with $\Gamma_+$ being any simply connected closed curve in $\mathbb{C}$ whose
interior contains all eigenvalues of $A$ with positive real part.
The HKT representation to the MVF sign(A) is based on an
efficient quadrature for the integral
\[ \mathrm{sign}(A) = \frac{1}{c_f} \int_{\mathbb{R}_+} \frac{f(tA)}{t}\,dt. \]
Efficient numerical implementation is possible for certain
functions $f$ having trigonometric structure (cf. Lect. 8).
Ex. 7.6. A negative fractional power of $A$ is represented by
\[ A^{-\sigma} = \frac{1}{\Gamma(\sigma)} \int_0^\infty t^{\sigma - 1} e^{-tA}\,dt, \quad \sigma > 0, \tag{77} \]
provided that the integral exists.
With the choice $A = -\Delta$, the representation (77) is of
particular interest in the cases:
(a) $\sigma = 1$ (inverse Laplacian),
(b) $\sigma = 1/2$ (preconditioning for the Laplace-Beltrami
operator $(-\Delta)^{1/2}$, and for the hypersingular integral operator,
e.g., in BEM applications),
(c) $\sigma = 2$ (inverse biharmonic operator).
A positive fractional power of $A$, say $A^{1/2}$, can be represented
by a simple factorisation
\[ A^{1/2} = A\,A^{-1/2}. \]
Ex. 7.7. In some cases iterative schemes (with possible
recompression at each iteration) can be applied.
(a) An approximation to $A^{-1}$: given $X_0 \in \mathbb{R}^{n\times n}$, the
Newton-Schulz iteration
\[ X_{k+1} = X_k(2I - AX_k), \quad k = 0, 1, 2, \dots \tag{78} \]
converges to $A^{-1}$ locally quadratically (cf. the analysis below).
Iteration (78) is nothing but the Newton method
\[ \Psi'(X_k)(X_{k+1} - X_k) = -\Psi(X_k) \]
for solving the nonlinear matrix equation
\[ \Psi(X) := A - X^{-1} = 0. \]
In fact, $\Psi(X + \delta) - \Psi(X) = X^{-1}\delta(X + \delta)^{-1}$, providing
$\Psi'(X_k)(\delta) = X_k^{-1}\delta X_k^{-1}$. Now (78) follows from
\[ X_{k+1} - X_k = -X_k(A - X_k^{-1})X_k. \]
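As an illustration of (78), here is a minimal pure-Python sketch on a 2×2 example. The test matrix, the starting value X0 = 0.2·I and the helper names are assumptions chosen for the demonstration; the spectral radius of I − A·X0 is 0.6 for this choice, so the iteration converges.

```python
def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def newton_schulz_inverse(A, X0, steps):
    """Iterate X_{k+1} = X_k (2I - A X_k), cf. (78)."""
    n = len(A)
    X = X0
    for _ in range(steps):
        AX = matmul(A, X)
        T = [[(2.0 if i == j else 0.0) - AX[i][j] for j in range(n)] for i in range(n)]
        X = matmul(X, T)
    return X

A = [[4.0, 1.0], [2.0, 3.0]]          # eigenvalues 2 and 5 (assumed example)
X0 = [[0.2, 0.0], [0.0, 0.2]]         # assumed starting value
X = newton_schulz_inverse(A, X0, 10)
print(X)                              # close to the exact inverse of A
```

Only matrix products and additions are needed, which is why the scheme combines well with formats that support fast approximate multiplication.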
(b) Newton-Schulz iteration scheme to approximate sign(A):
\[ X_{k+1} = X_k + \frac{1}{2}\left[I - (X_k)^2\right]X_k, \quad X_0 = A/\|A\|_2. \tag{79} \]
For diagonalisable matrices we have locally quadratic
convergence $X_k \to \mathrm{sign}(A)$ (see the analysis below).
This scheme was already successfully applied in many-particle
calculations.
The above mentioned schemes (a) and (b) are especially
efficient in the case $q = 2$, since the optimal SVD or ACA
recompression in the H- and HKT-formats can be applied.
(c) Newton's method to calculate sign(A). The iteration
\[ X_0 = A, \quad X_{k+1} = \frac{1}{2}(X_k + X_k^{-1}) \tag{80} \]
converges (locally quadratically) to sign(A). This method has
proved to be efficient in H-matrix arithmetic.
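Both sign iterations can be compared on a small example. The sketch below is illustrative only: the 2×2 test matrix is an assumption, the Frobenius norm replaces the spectral norm in the scaling of X0 (it is an upper bound, so the eigenvalues of X0 still lie in [−1, 1]), and the 2×2 inverse is computed by the explicit formula.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def newton_schulz_sign(A, steps):
    """Iteration (79), started from A scaled by its Frobenius norm (assumed scaling)."""
    n = len(A)
    norm = math.sqrt(sum(a * a for row in A for a in row))
    X = [[a / norm for a in row] for row in A]
    for _ in range(steps):
        X2 = matmul(X, X)
        R = [[(1.0 if i == j else 0.0) - X2[i][j] for j in range(n)] for i in range(n)]
        RX = matmul(R, X)
        X = [[X[i][j] + 0.5 * RX[i][j] for j in range(n)] for i in range(n)]
    return X

def inverse_2x2(X):
    """Explicit inverse of a 2x2 matrix."""
    (a, b), (c, d) = X
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def newton_sign_2x2(A, steps):
    """Iteration (80) for 2x2 matrices."""
    X = A
    for _ in range(steps):
        Xi = inverse_2x2(X)
        X = [[0.5 * (X[i][j] + Xi[i][j]) for j in range(2)] for i in range(2)]
    return X

A = [[2.0, 1.0], [0.0, -3.0]]         # eigenvalues 2 and -3 (assumed example)
S_schulz = newton_schulz_sign(A, 30)
S_newton = newton_sign_2x2(A, 20)
print(S_schulz, S_newton)
```

For this matrix the exact result is [[1, 0.4], [0, −1]], which follows from the requirements that sign(A) commutes with A and squares to the identity.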
Ex. 7.8. The matrix exponential can be defined and then
calculated by
\[ \exp(A) := \sum_{k=0}^{\infty} \frac{1}{k!}A^k \approx E_N := \sum_{k=0}^{N-1} \frac{1}{k!}A^k. \tag{81} \]
This approximation converges exponentially (if $N$ is large
enough, say, $N \ge e\|A\|$),
\[ \|E_N - \exp(A)\| \le \sum_{k=N}^{\infty} \frac{1}{k!}\|A\|^k \le \frac{C(\|A\|)}{N!} \approx \left( \frac{e\|A\|}{N} \right)^N. \]
The Horner scheme to calculate (81) requires only $N - 1$ matrix multiplications:
\[ A_N := I; \quad \text{for } k = N - 1 \text{ downto } 1 \text{ do } A_k := \frac{1}{k}A_{k+1}A + I, \]
such that $E_N := A_1$.
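A short sketch of the Horner recursion for the truncated series (81), written in plain Python. The function names and the test matrix (a rotation generator, whose exponential is known in closed form) are assumptions for illustration.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def exp_taylor_horner(A, N):
    """E_N = sum_{k=0}^{N-1} A^k / k! via A_k = A_{k+1} A / k + I, k = N-1, ..., 1."""
    n = len(A)
    I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    P = I                                   # A_N := I
    for k in range(N - 1, 0, -1):
        PA = matmul(P, A)
        P = [[PA[i][j] / k + I[i][j] for j in range(n)] for i in range(n)]
    return P                                # A_1 = E_N

J = [[0.0, 1.0], [-1.0, 0.0]]               # exp(J) is the rotation by 1 radian
E = exp_taylor_horner(J, 20)
print(E, math.cos(1.0), math.sin(1.0))
```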
If $\|A\| > 1$ the algorithm (81) may produce very large
intermediate terms!
Recall that for commuting matrices $A$, $B$ we have
$\exp(A + B) = \exp(A)\exp(B)$, in particular $\exp(A) = [\exp(A/2)]^2$.
Now, the algorithm (81) can be modified as follows:
(a) Choose $n$ such that $2^{-n}\|A\| \le 1$.
(b) Compute $B = \exp(A/2^n)$ by algorithm (81).
(c) Compute $\exp(A) = B^{2^n}$ in $n \approx \log_2(\|A\|)$ matrix squarings.
If $B = \exp(A/2^n)$ can be represented in a certain data-sparse
format (e.g., H-matrix or Kronecker product form) then
truncating all the intermediate products $B^{2^m}$, $m = 1, \dots, n$, into
the fixed format leads to the desired representation of $\exp(A)$. In this case, the truncation error analysis is still an open
question.
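Steps (a)-(c) can be sketched as follows, without any truncation of the intermediate squares. The Frobenius norm is used in step (a) as an easily computable upper bound of the spectral norm, and the number of Taylor terms N = 20 is an assumed default; neither choice is prescribed by the text.

```python
import math

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def exp_taylor(A, N):
    """Truncated series (81) evaluated by the Horner recursion."""
    n = len(A)
    I = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    P = I
    for k in range(N - 1, 0, -1):
        PA = matmul(P, A)
        P = [[PA[i][j] / k + I[i][j] for j in range(n)] for i in range(n)]
    return P

def exp_scaling_squaring(A, N=20):
    norm = math.sqrt(sum(a * a for row in A for a in row))
    n = 0
    while norm / 2 ** n > 1.0:              # (a) choose n with ||A|| / 2^n <= 1
        n += 1
    B = exp_taylor([[a / 2 ** n for a in row] for row in A], N)   # (b)
    for _ in range(n):                      # (c) n matrix squarings
        B = matmul(B, B)
    return B

A = [[0.0, 10.0], [-10.0, 0.0]]             # exp(A) is the rotation by 10 radians
E = exp_scaling_squaring(A)
print(E)
```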
Analysis of iterative schemes B. Khoromskij, Leipzig 2005(L7) 178
Newton-Schulz iteration (78) to compute $A^{-1}$.
Denote the residual error by $E_k = I - AX_k$, $k = 0, 1, 2, \dots$. It is
easy to see that
\[ X_{k+1} = X_k(I + E_k), \quad k = 0, 1, 2, \dots, \]
which implies (for $k = 1, 2, \dots$)
\[ E_k = I - AX_{k-1}(I + E_{k-1}) = I - (I - E_{k-1})(I + E_{k-1}) = E_{k-1}^2. \tag{82} \]
Applying (82) recursively, we find that
\[ E_k = E_0^{2^k}, \quad k = 1, 2, \dots. \tag{83} \]
It is also clear that
\[ A^{-1} - X_k = A^{-1}E_k = A^{-1}E_0^{2^k} = X_0(I - E_0)^{-1}E_0^{2^k}. \]
Under the assumption on the spectral radius of $E_0$,
\[ \rho \equiv \rho[E_0] = \max_j |\lambda_j| < 1, \]
where $\lambda_j = \lambda_j(E_0)$ are the eigenvalues of $E_0$, we obtain that
the error $E_k$ in (83) vanishes like $\rho^{2^k}$.
Rem. 7.1. The iteration (78) can be applied to any
preconditioned matrix $B = R_0 A$, where $R_0$ is a spectrally
equivalent preconditioner to $A$ so that $\sigma(B)$ is uniformly
bounded in $n$. Assuming that both $R_0$ and $R_0 A$ already have
the H-matrix representation, we then obtain the approximate
inverse of interest from
\[ A^{-1} = (R_0 A)^{-1} R_0. \]
In some cases this approach provides the constructive proof
for the existence of the H-matrix inverse.
Let $E_0 = I - BX_0$. The requirement $\rho[E_0] < 1$ can be achieved
under the following conditions.
Lem. 7.1. Let $B$ have real eigenvalues in the interval
$0 < m \le \lambda_j \le M$, $j = 1, 2, \dots, n$. Let $X_0(w) = wI$, then $\rho[E_0] < 1$ for all $w \in (0, 2/M)$. Moreover, if $\rho(w) = \rho[E_0(w)]$, then there
holds
\[ \rho(w_*) = \min_{w \in (0, 2/M)} \rho(w) = \frac{M - m}{M + m} < 1, \quad w_* = \frac{2}{M + m}. \tag{84} \]
Proof. This lemma is a reformulation of a standard
convergence result for the Richardson iteration.
Implementing (78) in the formatted H-matrix arithmetic one
can compute the H-matrix approximation $X_k$ to $A^{-1}$ with
$O(\log\log\varepsilon^{-1})$ iterations, where $\|I - AX_k\| \le \varepsilon$.
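The statement of Lem. 7.1 and the resulting iteration count can be checked on a set of sample eigenvalues. This is a small sketch under assumptions: the eigenvalue list, the grid used to scan w and the function names are illustrative, and the iteration count is derived from the decay rate ρ^(2^k) of the residual.

```python
import math

def rho_E0(w, eigenvalues):
    """Spectral radius of E0 = I - w*B for a matrix B with the given real eigenvalues."""
    return max(abs(1.0 - w * lam) for lam in eigenvalues)

def iterations_needed(rho, eps):
    """Smallest k with rho^(2^k) <= eps."""
    return math.ceil(math.log2(math.log(eps) / math.log(rho)))

eigs = [1.0, 2.5, 4.0, 9.0]                 # assumed spectrum, m = 1, M = 9
m, M = min(eigs), max(eigs)
w_star = 2.0 / (M + m)
rho_star = rho_E0(w_star, eigs)             # equals (M - m)/(M + m) = 0.8
grid = [2.0 / M * j / 1000 for j in range(1, 1000)]
rho_grid = [rho_E0(w, eigs) for w in grid]
print(rho_star, min(rho_grid), iterations_needed(rho_star, 1e-10))
```

The double logarithm is visible in the count: squaring the residual at each step means that ten digits of accuracy are reached after only a handful of iterations.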
Newton-Schulz iteration (79) to compute sign(A).
Diagonalisable case. Let $T$ be the unitary transform that
diagonalises $A$, i.e., $A = T^* D T$ with $d_i \in [-1, 1]$, then it also
diagonalises all $X_k$, $k = 1, 2, \dots$. Hence we have to show that
the scalar iteration
\[ x_{k+1} = f(x_k), \quad\text{with } x_0 \in [-1, 0) \cup (0, 1] \]
and with $f(x) := x + \frac{1}{2}x(1 - x^2) \equiv x\,g(x)$, converges to $\mathrm{sign}(x_0)$
quadratically.
Clearly, $f(x)$, $x \in [-1, 1]$, is increasing and has the fixed points
$x = -1, 0, 1$. Since on the interval $(-1, 1)$ we have $g(x) > 1$, it
implies $0 < x_k < x_{k+1} \le 1$ if $x_0 \in (0, 1]$ and $-1 \le x_{k+1} < x_k < 0$ if
$x_0 \in [-1, 0)$.
Hence, both $x = -1$ and $x = 1$ are stable fixed points.
For example, consider the case with a small initial guess $x_0 > 0$. For $x \in [-1/2, 1/2]$, we have $g(x) \ge q > 1$ with $q = 1 + 3/8$, thus
the number of iterations $x_{k+1} = x_k g(x_k)$ to achieve the value,
say, $x_k = 0.5$ starting from $x_0 > 0$ is about $O(\log_q x_0^{-1})$.
For $x_k \ge 1/2$, we enter the regime with quadratic
convergence. In fact, we just have
\[ 1 - x_{k+1} = \frac{1}{2}(1 - x_k)^2(x_k + 2), \]
which implies $|1 - x_{k+1}| \le \frac{3}{2}(1 - x_k)^2$. In this stage, to achieve
precision $\varepsilon > 0$ one requires $O(\log_2\log_2\varepsilon^{-1})$ iterations.
For the initial guess we actually have $x_0 = \mathrm{cond}(A)^{-1}$, which
implies that the total number of iterations is bounded by
\[ O(\log_2\log_2\varepsilon^{-1}) + O(\log_q \mathrm{cond}(A)). \]
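The two phases of the scalar iteration are easy to observe numerically. The sketch below counts the iterations of x ← x + x(1 − x²)/2 until the fixed point is reached; the starting value 10⁻³ and the tolerance are assumptions for illustration.

```python
def sign_iteration(x0, tol=1e-12, max_iter=200):
    """Scalar Newton-Schulz iteration; returns the final iterate and the number of steps."""
    target = 1.0 if x0 > 0 else -1.0
    x, k = x0, 0
    while abs(target - x) > tol and k < max_iter:
        x = x + 0.5 * x * (1.0 - x * x)
        k += 1
    return x, k

for x0 in (1e-3, -1e-3, 0.5):
    print(x0, sign_iteration(x0))
```

Starting from 10⁻³, most of the steps are spent in the initial linear-growth phase, while the final quadratic phase needs only a few steps, in line with the two terms of the bound above.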
Note that iteration (79) can be written as $X_k = \Phi(X_{k-1})$ with $\Phi(X) := X + \frac{1}{2}(I - X^2)X$ (see Lect. 8). Clearly, (79)
ensures that all $X_k$ ($k = 1, 2, \dots$) are simultaneously diagonalised
by the same matrix $T$, hence we have (with $B = \mathrm{sign}(A)$):
\[ \begin{aligned}
\Phi(X) - B &= X - B + \frac{1}{2}(B^2 - X^2)X \\
&= \frac{1}{2}(X - B)\big(B(B - X) + (B - X)(B + X)\big) \\
&= -(X - B)^2\left(B + \frac{1}{2}X\right).
\end{aligned} \tag{85} \]
The analysis for algorithm (80) in the diagonalisable case is
reduced to that for the Newton method applied to the equation
\[ \Psi(x) := x^2 - 1 = 0, \]
that is $x_{k+1} = \frac{1}{2}\left(x_k + \frac{1}{x_k}\right)$.
The iterative calculation may not be very simple!
Newton iteration to compute the square root $A^{1/2}$ of the
symmetric positive definite matrix $A$: Given $X_0$, the iteration
\[ X_k\Delta_k + \Delta_k X_k = A - X_k^2, \tag{86} \]
where $\Delta_k = X_{k+1} - X_k$, converges to $A^{1/2}$ quadratically
(locally). It requires solving a matrix Lyapunov equation.
This scheme can be considered as the Newton iteration to solve
the nonlinear matrix equation
\[ \Psi(X) := A - X^2 = 0. \]
Clearly,
\[ \Psi(X + \Delta) - \Psi(X) = -X\Delta - \Delta X + O(\|\Delta\|^2), \]
so our iteration can be interpreted as the Newton method for
solving $\Psi(X) = 0$ (see Lect. 8 for the analysis of truncated
iterations).
Iteration (86) can be written as Xk = Φk(Xk−1) corresponding
to the choice
Φk(X) := Φ(X),
where Φ(X) solves the matrix equation
X(Φ(X)−X) + (Φ(X)−X)X = A−X2.
Simple calculation shows that the latter equation implies
(with the substitution A = B2)
X(Φ(X)−B) + XB −X2 + (Φ(X)−B)X + BX −X2 = B2 −X2,
which leads to the matrix Lyapunov equation with respect to
Y = Φ(X)−B,
XY + Y X = (B −X)2.
Making use of the solution operator for the Lyapunov
equation (assume that $X = X^T > 0$), we arrive at the norm
estimate
\[ \|\Phi(X) - B\| \le \left\| \int_0^\infty e^{-tX}(B - X)^2 e^{-tX}\,dt \right\| \le C\|B - X\|^2. \]
This proves relation (93) in Lem. 8.1 (with exponent $\alpha = 2$). Hence, Thm.
8.1 ensures the convergence of the truncated version of the
nonlinear iteration (86).
Note that the simpler iteration
\[ X_0 = a_0 A, \quad X_k := X_{k-1} - \frac{1}{2}\left(X_{k-1} - X_{k-1}^{-1}A\right) \quad (k = 1, 2, \dots), \tag{87} \]
where $a_0 > 0$ is the given constant, does not guarantee, in
general, the convergence of truncated iterations.
Truncated Newton iteration to compute A−1 B. Khoromskij, Leipzig 2005(L7) 187
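We analyse the case of second order tensors ($q = 2$)
\[ A_r = \sum_{k=1}^{r} U_k \otimes V_k, \quad U_k \in \mathbb{R}^{m\times m},\ V_k \in \mathbb{R}^{n\times n}. \]
Recall that for a matrix $A \in \mathbb{R}^{m\times n}$ we use the vector
representation $A \to \mathrm{vec}(A) \in \mathbb{R}^{mn}$, where $\mathrm{vec}(A)$ is an $mn \times 1$ vector obtained by "stacking" $A$'s columns
\[ \mathrm{vec}(A) := [a_{11}, \dots, a_{m1}, a_{12}, \dots, a_{mn}]^T, \]
so $\mathrm{vec}(A)$ is a rearranged version of $A$. Introduce the linear
invertible operator $L : \mathbb{R}^{mn\times mn} \to \mathbb{R}^{m^2\times n^2}$ by
\[ L(A_r) \equiv \tilde{A}_r := \sum_{k=1}^{r} \mathrm{vec}(V_k) \otimes \mathrm{vec}(U_k)^T. \]
$L$ is unitary with respect to the spectral or Frobenius norm,
but there is no permutation matrix $P$ with $\tilde{A}_r = P A_r P^T$.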
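The effect of such a rearrangement can be illustrated with NumPy. The sketch below uses a row-major variant of the map (each Kronecker term U ⊗ V is sent to the outer product of the flattened U and the flattened V), which differs from the column-stacking convention above only by a fixed reordering of entries; the function names and the random test data are assumptions.

```python
import numpy as np

def rearrange(A, m, n):
    """Map an (mn x mn) matrix to (m^2 x n^2) so that kron(U, V) -> outer(U.ravel(), V.ravel())."""
    return A.reshape(m, n, m, n).transpose(0, 2, 1, 3).reshape(m * m, n * n)

def rearrange_inverse(At, m, n):
    """Inverse of `rearrange`."""
    return At.reshape(m, m, n, n).transpose(0, 2, 1, 3).reshape(m * n, m * n)

rng = np.random.default_rng(0)
m, n = 2, 3
U1, U2 = rng.standard_normal((2, m, m))
V1, V2 = rng.standard_normal((2, n, n))
A = np.kron(U1, V1) + np.kron(U2, V2)       # Kronecker rank 2
At = rearrange(A, m, n)
print(At.shape, np.linalg.matrix_rank(At))  # the ordinary rank equals the Kronecker rank
```

Since the map only reorders entries, it preserves the Frobenius norm, which is what makes an SVD of the rearranged matrix usable for Kronecker-rank truncation.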
Making use of the transform $L$ allows us to reduce the low
Kronecker rank approximation of $A$ to the low-rank
approximation of $\tilde{A} = L(A)$. For fixed $r$ one may apply the truncation
operator $R$ of the form
\[ R(A) := L^{-1}(\Pi_r(L(A))), \]
where $\Pi_r(\tilde{A})$ is the best rank-$r$ approximation to $\tilde{A}$ in the
given norm (say, spectral or Frobenius norm).
We formulate the general statement. Let $B = F(A)$ be
defined by the given matrix-valued function $F$ and let $R$ be
the truncation operator that satisfies
\[ \|X - RX\| \le C_R\|X - B\| \tag{88} \]
for all $X$ in a "small" neighbourhood $S(B)$ of $B$.
In particular, we consider $F(A) = A^{-1}$, $F(A) = \sqrt{A}$ and
$F(A) = \mathrm{sign}(A)$.
Consider the case (78). Introduce the modified (truncated)
Newton-Schulz iteration
\[ Z_{k+1} = X_k(2I - AX_k), \quad X_{k+1} = R(Z_{k+1}), \quad k = 0, 1, 2, \dots \tag{89} \]
Thm. 7.1. Let (88) be satisfied. Then for any initial guess
$X_0 = R(X_0) \in S(B)$, the truncated Newton-Schulz iteration
(89) converges to $A^{-1}$ quadratically,
\[ \|A^{-1} - X_k\| \le (1 + C_R)\,\|A\|\,\|A^{-1} - X_{k-1}\|^2, \quad k = 1, 2, \dots \]
Proof. Note that (88) leads to
\[ B \equiv A^{-1} = R(A^{-1}). \]
Now equation (89) implies
$A^{-1} - Z_{k+1} = (A^{-1} - X_k)A(A^{-1} - X_k)$, which yields
\[ \|A^{-1} - Z_{k+1}\| \le \|A\|\,\|A^{-1} - X_k\|^2. \tag{90} \]
On the other hand, (88) implies
\[ \|X_k - Z_k\| = \|R(Z_k) - Z_k\| \le C_R\|A^{-1} - Z_k\|, \]
hence the triangle inequality leads to
\[ \|A^{-1} - X_k\| \le \|A^{-1} - Z_k\| + \|Z_k - X_k\| \le (1 + C_R)\|A^{-1} - Z_k\|. \]
Combining this bound with (90) completes the proof.
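The truncated iteration (89) can be tried on a matrix whose inverse has Kronecker rank 1, so that the truncation to rank 1 satisfies (88) with a best-approximation argument. The following NumPy sketch is illustrative only: the test matrix A = U ⊗ V, the start X0 = I and the row-major rearrangement used inside the truncation are assumptions.

```python
import numpy as np

def rearrange(A, m, n):
    """kron(U, V) -> outer(U.ravel(), V.ravel()); a row-major variant of the map L."""
    return A.reshape(m, n, m, n).transpose(0, 2, 1, 3).reshape(m * m, n * n)

def rearrange_inverse(At, m, n):
    return At.reshape(m, m, n, n).transpose(0, 2, 1, 3).reshape(m * n, m * n)

def truncate(X, m, n, r):
    """R = L^{-1} Pi_r L: best Kronecker-rank-r approximation in the Frobenius norm."""
    U, s, Vt = np.linalg.svd(rearrange(X, m, n), full_matrices=False)
    return rearrange_inverse((U[:, :r] * s[:r]) @ Vt[:r], m, n)

def truncated_newton_schulz(A, X0, m, n, r, steps):
    """Z_{k+1} = X_k (2I - A X_k), X_{k+1} = R(Z_{k+1}), cf. (89)."""
    I = np.eye(m * n)
    X = truncate(X0, m, n, r)
    for _ in range(steps):
        X = truncate(X @ (2 * I - A @ X), m, n, r)
    return X

U = np.array([[1.0, 0.1], [0.1, 1.0]])
V = np.array([[1.0, 0.05], [0.05, 1.0]])
A = np.kron(U, V)                           # the inverse is kron(inv(U), inv(V))
X = truncated_newton_schulz(A, np.eye(4), 2, 2, r=1, steps=8)
err = np.linalg.norm(X - np.linalg.inv(A))
print(err)
```

Every iterate stays in the rank-1 Kronecker format, whereas the untruncated iterates would have Kronecker rank growing with k.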
Let us check (88) for the choice $R(A) = L^{-1}(\Pi_r(L(A)))$. We
denote $Y = L(X)$ and $Y_B = L(B)$ and note that $B = R(B)$ yields $\Pi_r Y_B = Y_B$.
In the following proof we make use of the standard stability
estimates for the singular values of the perturbed matrix
(Wielandt, Hoffman '55).
Now we estimate in the Frobenius norm
\[ \begin{aligned}
\|L^{-1}\|^{-1}\,\|X - RX\| &\le \|(I - \Pi_r)Y\| = \sqrt{\sum_{k=r+1}^{n} \sigma_k(Y)^2} \\
&= \sqrt{\sum_{k=r+1}^{n} (\sigma_k(Y) - \sigma_k(Y_B))^2} \le \sum_{k=r+1}^{n} |\sigma_k(Y) - \sigma_k(Y_B)| \\
&\le \sum_{k=1}^{n-r+1} \sigma_k(Y - Y_B) \le \sqrt{n - r}\,\|L(X - B)\|.
\end{aligned} \]
Estimate (88) now follows with $C_R = \sqrt{n - r}\,\|L^{-1}\|\,\|L\|$.
Few remarks B. Khoromskij, Leipzig 2005(L7) 192
1. A similar result holds in the spectral norm. The factor $\sqrt{n - r}$ can be omitted due to the Mirsky theorem.
2. The error estimate above allows the straightforward local
analysis for algorithm (86) with the truncation operator R.
3. The truncated Newton-Schulz iterations (89) and (86) can
be analysed in the H-matrix format as well using the similar
techniques (but applied block-wise).
4. In the case of three (or more) factors (q ≥ 3) we can
analyse the sub-optimal truncation operator R via Tucker’s
decomposition.
Literature to Lecture 7 B. Khoromskij, Leipzig 2005(L7) 193
1. W. Hackbusch and B.N. Khoromskij: Low-Rank Kronecker-Product Approximation to Multi-Dimensional
Nonlocal Operators . Parts I/II. Preprints 29/30, MPI MIS, Leipzig 2005.
2. W. Hackbusch, B.N. Khoromskij and E. Tyrtyshnikov: Approximate Iteration for Structured Matrices.
Preprint MPI MIS 2005.
URL: http://personal-homepages.mis.mpg.de/bokh
http://www.mis.mpg.de/scicomp/Fulltext/khor7.ps
Lect. 8 Truncated iterations. Approximating a matrix exp(A) B. Khoromskij, Leipzig 2005(L8) 194
Let $V$ be a normed space (e.g., $n \times n$ matrices) and consider
a function $f : V \to V$. Assume that $A \in V$ and $B := f(A)$ can be
obtained by the locally convergent fixed-point iteration
\[ \text{Given } X_0 \in V, \quad X_k = \Phi(X_{k-1}), \quad k = 1, 2, \dots, \tag{91} \]
where $\Phi : V \to V$ is a one-step operator,
\[ \lim_{k\to\infty} X_k = B = \Phi(B). \tag{92} \]
Lem. 8.1. Assume that there are constants $c_\Phi, \varepsilon_\Phi > 0$ s.t.
\[ \|\Phi(X) - B\| \le c_\Phi\|X - B\|^2 \quad \forall X \text{ with } \|X - B\| \le \varepsilon_\Phi, \tag{93} \]
and set $\varepsilon := \min(\varepsilon_\Phi, 1/c_\Phi)$. Then (92) holds for any $X_0$
satisfying $\|X_0 - B\| < \varepsilon$, and, moreover,
\[ \|X_k - B\| \le c_\Phi^{-1}\left(c_\Phi\|X_0 - B\|\right)^{2^k} \quad (k = 0, 1, 2, \dots). \tag{94} \]
Proof: Let $e_k := \|X_k - B\|$. Then, due to (93),
\[ e_k \le c_\Phi e_{k-1}^2, \quad\text{provided that } e_{k-1} \le \varepsilon_\Phi. \tag{95} \]
By (95), $e_{k-1} \le \varepsilon \le \varepsilon_\Phi$ implies $e_k \le c_\Phi\varepsilon^2 = \varepsilon(c_\Phi\varepsilon) \le \varepsilon$. Hence,
all iterates stay in the $\varepsilon$-neighbourhood of $B$.
(94) is proved by induction:
\[ e_k \overset{(95)}{\le} c_\Phi e_{k-1}^2 \overset{\text{induct. hypoth.}}{\le} c_\Phi\left(c_\Phi^{-1}(c_\Phi e_0)^{2^{k-1}}\right)^2 = c_\Phi^{-1}(c_\Phi e_0)^{2^k}. \]
Whenever $e_0 < \varepsilon$, (94) shows $e_k \to 0$.
Rem. 8.1. (94) together with $e_0 \le \varepsilon$ implies monotonicity:
\[ \|X_k - B\| \le \|X_{k-1} - B\|. \tag{96} \]
Rem. 8.2. Condition (93) is valid for the Newton iteration.
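A scalar instance makes the bound (94) concrete. For the Newton iteration Φ(x) = (x + a/x)/2 with B = √a one has Φ(x) − B = (x − B)²/(2x), so (93) holds with the constants chosen below; the values a = 2, c_Φ = 0.5, ε_Φ = 0.4 and x₀ = 1.1 are assumptions made for this illustration.

```python
import math

a = 2.0
B = math.sqrt(a)                  # fixed point of phi
c_phi, eps_phi = 0.5, 0.4         # |phi(x) - B| <= c_phi |x - B|^2 whenever |x - B| <= eps_phi

def phi(x):
    """One Newton step for x^2 = a."""
    return 0.5 * (x + a / x)

def errors_and_bounds(x0, steps):
    """Pairs (e_k, right-hand side of (94)) for k = 0, ..., steps."""
    e0 = abs(x0 - B)
    x, rows = x0, []
    for k in range(steps + 1):
        rows.append((abs(x - B), (c_phi * e0) ** (2 ** k) / c_phi))
        x = phi(x)
    return rows

rows = errors_and_bounds(1.1, 5)
for k, (err, bound) in enumerate(rows):
    print(k, err, bound)
```

In floating-point arithmetic the error stagnates at rounding level once the bound drops below machine precision, so any comparison should allow a small additive slack.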
Let S ⊂ V be a subset (not necessarily a subspace) considered
as a class of certain structured elements (e.g. structured
matrices) and suppose that R : V → S is an operator mapping
elements from V onto suitable structured approximants in S.
We call R a truncation operator.
Define a truncated iterative process as follows:
Y0 := R(X0), Yk := R(Φ(Yk−1)) (k = 1, 2 . . . .) . (97)
Thm. 8.1. Under the premises of Lem. 8.1, assume that
‖X −R(X)‖ ≤ cR ‖X −B‖ ∀ X with ‖X −B‖ ≤ εΦ. (98)
Then there exists δ > 0 such that the truncated iteration
(97) converges to B so that for k = 1, 2, . . .
‖Yk −B‖ ≤ cRΦ ‖Yk−1 −B‖2 with cRΦ := (cR + 1)cΦ (99)
for any starting value Y0 = R(Y0) satisfying ‖Y0 −B‖ < δ.
Proof: Let ε := min (εΦ, 1/cΦ) and define Zk = Φ(Yk−1). By
(96) we have
‖Zk −B‖ ≤ ‖Yk−1 −B‖ ,
provided that ‖Yk−1 −B‖ ≤ ε. Then
‖Yk −B‖ = ‖R(Zk)− Zk + Zk −B‖ ≤ (cR + 1) ‖Zk −B‖ . (100a)
Assuming ‖Yk−1 −B‖ ≤ ε, the bounds ε ≤ εΦ and (93) ensure
‖Zk −B‖ = ‖Φk(Yk−1)−B‖ ≤ cΦ ‖Yk−1 −B‖2 . (100b)
Combining (100a) and (100b), we obtain (99) for any k,
provided that ‖Yk−1 −B‖ ≤ ε.
Similar to the proof of Lem. 8.1, the choice $\delta := \min(\varepsilon, 1/c_{R\Phi})$ guarantees that $\|Y_0 - B\| \le \delta$ implies $\|Y_k - B\| \le \delta \le \varepsilon$, $k \in \mathbb{N}$.
Cor. 8.1. Under the assumptions of Thm. 8.1, any starting
value $Y_0$ with $\|Y_0 - B\| \le \delta$ leads to
\[ \|Y_k - B\| \le c_{R\Phi}^{-1}\left(c_{R\Phi}\|Y_0 - B\|\right)^{2^k} \quad (k = 1, 2, \dots), \tag{101} \]
where $c_{R\Phi}$ and $\delta$ are defined as above.
The condition (98) has a clear geometrical meaning. If
\[ R(X) := \mathrm{argmin}\{\|X - Y\| : Y \in S\} \]
is the best approximation to X in the given norm, inequality
(98) holds with cR = 1, since B ∈ S. Therefore, (98) with
cR ≥ 1 can be viewed as a quasi-optimality condition.
If the norm is defined by a scalar product, S is a subspace
and R(X) is the orthogonal projection onto S, then (98) is
obviously fulfilled with cR = 1.
Analysis of truncation operators B. Khoromskij, Leipzig 2005(L8) 199
The next lemma is easy to prove.
Lem. 8.2. Let B = R(B) be fixed and assume that R is
Lipschitz at B or R is a bounded linear operator. Then the
inequality (98) holds.
Let $V = \mathbb{R}^{I\times I}$ be the space of matrices and $S \subset V$ a subspace
with a prescribed sparsity pattern $P \subset I \times I$, i.e., $X \in S$ if and
only if $X_{ij} = 0$ for all $(i, j) \notin P$. A familiar example of a
truncation in this case is $R(X)$ defined entry-wise by
\[ R(X)_{ij} = \begin{cases} X_{ij} & \text{for } (i, j) \in P, \\ 0 & \text{for } (i, j) \notin P. \end{cases} \tag{102} \]
Since R is linear, it satisfies the hypotheses of Lem. 8.2.
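A pattern truncation of this kind takes only a few lines. The sketch below keeps the entries on a tridiagonal pattern and checks the quasi-optimality inequality (98) with constant 1 for an element B of S; the pattern, the matrices and the helper names are assumptions chosen for the demonstration.

```python
import math

def truncate_to_pattern(X, pattern):
    """Entry-wise truncation: keep X[i][j] on the pattern, set it to zero elsewhere."""
    return [[x if (i, j) in pattern else 0.0 for j, x in enumerate(row)]
            for i, row in enumerate(X)]

def frobenius(X):
    return math.sqrt(sum(x * x for row in X for x in row))

def diff(X, Y):
    return [[x - y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

n = 4
pattern = {(i, j) for i in range(n) for j in range(n) if abs(i - j) <= 1}   # tridiagonal
X = [[1.0 / (1 + i + j) for j in range(n)] for i in range(n)]               # fully populated
B = [[2.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)]
     for i in range(n)]                                                     # element of S
RX = truncate_to_pattern(X, pattern)
print(frobenius(diff(X, RX)), frobenius(diff(X, B)))
```

In the Frobenius norm this truncation is the orthogonal projection onto S, which is why the first printed number cannot exceed the second.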
Rem. 8.3. Usually, the subset $S$ as above is not helpful since
a sparse argument $A \in S$ yields a fully populated result $f(A)$.
However, it is well-known that after a discrete wavelet transform (DWT)
\[ X \to L(X) := W^{-1}XW \]
one can apply a matrix compression.
Figure 15: Wavelet transform of a matrix: "finger"-like structure.
Such a matrix compression is of the form (102) and will be
denoted by $\Pi$. Then, the truncation $R$ applied to $X$ is the
composition of the DWT $L$, the pattern projection $\Pi$ and the
back-transformation $L^{-1}$:
\[ R := L^{-1} \circ \Pi \circ L. \tag{103} \]
The same form of R is typical as well for many other choices
of L and Π.
Next, we give the characterization of Π that ensures the
property (98) for R.
Lem. 8.3. Let $V$ and $W$ be normed spaces and $L : V \to W$ a
bounded linear operator with a bounded inverse. Given
$B \in V$, assume that $\Pi : W \to W$ satisfies
\[ \|Z - \Pi(Z)\| \le c_\Pi\|Z - L(B)\| \quad \forall Z \in W \text{ with } \|L^{-1}(Z) - B\| \le \varepsilon_\Phi. \tag{104} \]
Then the truncation operator $R$ of
the form (103) satisfies condition (98) with $c_R := c_\Pi\|L\|\,\|L^{-1}\|$.
Proof: Let $Z = L(X)$. Then, obviously,
\[ \|R(X) - X\| = \|L^{-1}(\Pi(Z) - Z)\| \le c_\Pi\|L^{-1}\|\,\|Z - L(B)\|, \]
and it remains to observe that
\[ \|Z - L(B)\| = \|L(X) - L(B)\| \le \|L\|\,\|X - B\|. \]
Applications of Lem. 8.3 (in the case of H-matrices) are
facilitated by the following construction. Define a suitable
system of normed spaces $W_1, \dots, W_N$ and set
\[ W := W_1 \times \dots \times W_N = \{H = (H_1, \dots, H_N) : H_i \in W_i\} \tag{105} \]
with $\|H\| = \sqrt{\sum_{i=1}^{N}\|H_i\|^2}$.
Let each $W_i$ be associated with a truncation operator
$\Pi_i : W_i \to W_i$ satisfying (for some fixed $Z_i \in W_i$)
\[ \|H_i - \Pi_i(H_i)\| \le c_i\|H_i - Z_i\| \quad \forall H_i \in W_i \text{ and } 1 \le i \le N. \tag{106} \]
Lem. 8.4. Let W be the normed space from (105) and let
the truncation operators Πi satisfy (106), where the elements
Zi ∈Wi are defined by
L(B) = (Z1, . . . , ZN ).
Suppose that the product of the truncation operators Πi
defines Π : W → W via
Π(H) := (Π1(H1), . . . , ΠN (HN )) for H = (H1, . . . , HN ), Hi ∈ Wi.
Then R from (103) satisfies (98).
Proof: Let L(X) = H = (H1, . . . , HN ). Then, according to the
definitions of L and Π,
\[ \|H - \Pi(H)\| \le \sqrt{\sum_{i=1}^{N} c_i^2\|H_i - Z_i\|^2} \le \max_{1\le i\le N} c_i \sqrt{\sum_{i=1}^{N}\|H_i - Z_i\|^2}, \]
which proves (104) and allows us to use Lem. 8.3.
An important example of Π in the case of a matrix space W
is given by optimal low-rank approximations.
Lem. 8.5. Let W be a normed space of all matrices of a
fixed size and let S ⊂ W consist of all matrices whose rank
does not exceed r. Then for any H ∈W there exists a matrix
T ∈ S such that
\[ \|H - T\| = \min_{\mathrm{rank}\,Z \le r}\|H - Z\|. \]
Proof: Consider a minimising sequence $Z_k \in S$, i.e.,
$\lim_{k\to\infty}\|H - Z_k\| = \rho := \inf_{\mathrm{rank}\,Z \le r}\|H - Z\|$. Since the sequence $Z_k$ is
bounded, a convergent subsequence $Z_{k_i} \to T$ exists. Its limit
satisfies ‖H − T‖ = ρ. The assertion T ∈ S is due to the fact that a
matrix of rank equal to p > r possesses a vicinity wherein any matrix is of
rank ≥ p (use the continuity of the determinant and the existence of a
nonzero minor of order p for a matrix of rank p).
Matrix theory provides well-developed tools for the construction of
low-rank approximations in the case of any unitarily invariant norm.
For most familiar unitarily invariant norms such as the spectral and the Frobenius norm, it can be established through simple arguments: It is well-known that
\[ \min_{\mathrm{rank}\,Z \le r}\|H - Z\|_2 = \sigma_{r+1}(H), \quad \min_{\mathrm{rank}\,Z \le r}\|H - Z\|_F = \sqrt{\sum_{i \ge r+1}\sigma_i^2(H)}. \]
Thus, the truncation property (98) is easy to achieve (with
cR = 1) when we are aware of the existence of the best
approximation element.
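The two best-approximation identities can be verified directly with a truncated SVD. This is a brief NumPy sketch; the random test matrix and the rank r = 2 are assumptions.

```python
import numpy as np

def best_rank_r(H, r):
    """Truncated SVD of H; returns the rank-r approximant and all singular values."""
    U, s, Vt = np.linalg.svd(H, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r], s

rng = np.random.default_rng(0)
H = rng.standard_normal((6, 5))
r = 2
T, s = best_rank_r(H, r)
spec_err = np.linalg.norm(H - T, 2)         # spectral norm of the remainder
frob_err = np.linalg.norm(H - T, "fro")     # Frobenius norm of the remainder
print(spec_err, s[r], frob_err, np.sqrt(np.sum(s[r:] ** 2)))
```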
Sometimes (e.g., for three-way approximations of bounded tensor rank)
this is not the case. However, all cases are supported by extension of
Thm. 8.1 as we can always capitalise on a quasi-optimal construction:
Let $\rho(H) = \inf_{T \in S}\|H - T\|$. For a given fixed $\varepsilon > 0$, we can adapt an
ε-optimal approximation Π(H) to H in the sense that
ρ(H) ≤ ‖H − Π(H)‖ ≤ ρ(H) + ε.
Application to hierarchical block matrices B. Khoromskij, Leipzig 2005(L8) 207
Let V = Rn×n be the space of n× n matrices, and consider
each matrix as a union of N disjoint blocks of possibly
different sizes, where each matrix block belongs to the matrix
space Wi (1 ≤ i ≤ N). Given X ∈ V , let Li(X) ∈Wi be the ith
block of X and define the space W according to (105).
Figure 16: Standard- (left) and Weak-admissible H-partitionings.
The above-considered operator L : V → W reads
L(X) := (L1(X), . . . , LN (X)) (block-tracing operator).
If the Frobenius norm is used on the spaces V and
W1, . . . , WN , the norm induced on W is again the Frobenius
norm. Since the blocks are disjoint, L is isometrical. Hence,
the inverse L−1 exists and satisfies
‖L‖ = ‖L−1‖ = 1.
Fix a positive integer r and let Si ⊂ Wi be the subset of
matrices of rank ≤ r. Define S as the Cartesian product
S = S1 × . . .× SN ⊂ W.
Let $H = Q_1\Sigma(H)Q_2$ be the SVD of $H$ (with unitary $Q_1$ and
$Q_2$) and let $\Sigma_r(H)$ be the corresponding $r$-term truncation.
Besides, let $\Pi_i : W_i \to S_i$ be of the form
\[ \Pi_i(H) := Q_1\Sigma_r(H)Q_2, \tag{107} \]
providing the best possible approximant to $H$ in the set $S_i$ of
matrices of rank $\le r$, in the Frobenius norm. This involves
the SVD of each matrix block $H \in W_i$. Defining $\Pi : W \to S$ as in
Lem. 8.4 and using Lem. 8.5, we can apply Thm. 8.1 to
\[ R = L^{-1} \circ \Pi \circ L. \]
Note that exactly this kind of truncation is used in the theory
of H-matrices. The typical block partitioning in the
construction of hierarchical matrices is presented in Fig. 16.
Application to tensor approximations B. Khoromskij, Leipzig 2005(L8) 210
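Let $V_1 = \mathbb{R}^{p\times q}$ and $V_2 = \mathbb{R}^{r\times s}$, while $V = \mathbb{R}^{pr\times qs}$ for some
$p, q, r, s \in \mathbb{N}$. The Kronecker product is a mapping from $V_1 \otimes V_2$
into $V$: for $A \in V_1$ and $B \in V_2$, the Kronecker product $A \otimes B$ is
given by the block matrix
\[ A \otimes B = \begin{bmatrix} a_{11}B & a_{12}B & \dots \\ a_{21}B & a_{22}B & \dots \\ \vdots & \vdots & \ddots \end{bmatrix} \in V. \]
We say that a matrix $M \in V$ has a Kronecker rank $\le k$, if
there is a representation
\[ M = \sum_{\nu=1}^{\ell} A_\nu \otimes B_\nu \quad\text{with } A_\nu \in V_1,\ B_\nu \in V_2 \text{ and } \ell \le k. \tag{108} \]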
We define the subset of structured matrices S by the set of
all matrices of Kronecker rank ≤ k. If k is not too large, this
is an interesting representation since matrices of the large
size pr × qs can be described by matrices Aν , Bν of small size.
As described in Lect. 6 (cf. operation vec(A)), there is a
simple isomorphism L from V = Rpr×qs to Rpq×rs such that the
representation (108) of M ∈ S ⊂ V = Rpr×qs is equivalent to
rank(L(M)) ≤ k. Hence, we obtain the situation of Lem. 8.5
with W := L(V ) = Rpq×rs.
The truncation operator is again of the form $R = L^{-1} \circ \Pi \circ L$,
where Π : W → W is the optimal SVD-based truncation or an
appropriate substitute.
Our framework can be applied also to tensor (multi-linear)
representation (108) where the number of factors is greater
than 2. In this case the truncation procedures are not so well
developed; however, some are available and claimed to be
efficient in particular applications (mostly for data analysis in
chemometrics, physicometrics, etc.)
Summary (analysis of the truncated iterations):
Initially, the main purpose of this truncation was the
reduction of storage and of the matrix-by-vector complexity
for a given matrix in $V$.
In the sequel, the same truncation is used for computing
various matrix functions $f(A)$ of $A \in S \subset V$, where $B := f(A)$ is
known to be close to $S$ (e.g., for $f(A) = A^{-1}$, $f(A) = \sqrt{A}$ and
for $f(A) = \mathrm{sign}(A)$).
The above results suggest some general framework for a
rigorous analysis of the basic truncated iterative algorithms.
Finally we remark that the optimal truncation is often
replaced by an approximate one which is cheaper to compute
(e.g., by cross approximation techniques (ACA), multi-way
decomposition algorithms, wavelet truncation).
Approximating matrix-valued function exp(A) and A−1 B. Khoromskij, Leipzig 2005(L8) 213
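The elliptic operator $\mathcal{A} : V \to V'$ with $V = H_0^1(\Omega)$, $V' = H^{-1}(\Omega)$,
$$ \mathcal{A} = \sum_{j=1}^{d} \left( -\frac{\partial}{\partial x_j} a_j(x_j) \frac{\partial}{\partial x_j} + b_j(x_j) \frac{\partial}{\partial x_j} + c_j(x_j) \right), $$
is supposed to have “separable” coefficients. The associated
bilinear form (with $c(x) = \sum_{j=1}^{d} c_j(x_j)$)
$$ a(u, v) = \int_\Omega \left( \sum_{j=1}^{d} a_j(x_j) \frac{\partial u}{\partial x_j} \frac{\partial v}{\partial x_j} + \sum_{j=1}^{d} b_j(x_j) \frac{\partial u}{\partial x_j} v + c(x)\, u v \right) dx $$
with $a : V \times V \to \mathbb{R}$ is assumed to be continuous and $V$-elliptic:
$$ |a(u, v)| \le C \|u\|_V \|v\|_V, \qquad \Re\, a(v, v) \ge \delta_0 \|v\|_V^2, \quad \delta_0 > 0. $$
In the tensor-product setting we have $(x_1, \dots, x_d) \in \Omega := (0, 1)^d \subset \mathbb{R}^d$.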
Approximating matrix-valued function exp(A) and A−1 B. Khoromskij, Leipzig 2005(L8) 214
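Let $X = L^2(\Omega)$; then the corresponding elliptic operator $\mathcal{A}$ and
its discrete counterpart $A$ (say, $A$ is the FEM/FD stiffness
matrix corresponding to $\mathcal{A}$) satisfy
$$ \|(zI - \mathcal{A})^{-1}\|_{X \leftarrow X} \le \frac{1}{|z| \sin(\theta_1 - \theta)} \quad \forall\, z \in \mathbb{C} : \theta_1 \le |\arg z| \le \pi, \tag{109} $$
for any $\theta_1 \in (\theta, \pi)$, where $\cos\theta = \delta_0 / C$.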
In the case of discrete elliptic operators A, the bound (109)
on the matrix resolvent is valid uniformly in the mesh-size h
(cf. example below).
The H-matrix and HKT formats are well suited to represent
the following MVFs:
$$ \exp(-tA), \quad A^{-1}, \quad \sqrt{A}, \quad \operatorname{sign}(A). $$
Approximating matrix-valued function exp(A) and A−1 B. Khoromskij, Leipzig 2005(L8) 215
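Ex. 8.1. Consider the elliptic operator of divergence type,
$$ \mathcal{A} := -\sum_{j=1}^{d} \partial_j a_j(x_j) \partial_j, \quad x \in \Omega := (0, 1)^d, $$
defined on $V$. We assume that $a_j \ge a_0 > 0$ and introduce a
uniform grid with step size $h$ and $N = n^d$ interior nodes. Using
the $(2d + 1)$-point stencil, we obtain the FD discretisation
$$ (A_h z)_{i_1 \dots i_d} := \sum_{j=1}^{d} \frac{2 a^j_{i_j} z_{i_1 \dots i_d} - b^j_{i_j - 1} z_{i_1 \dots (i_j - 1) \dots i_d} - c^j_{i_j + 1} z_{i_1 \dots (i_j + 1) \dots i_d}}{h^2}, $$
$1 \le i_j \le n$, where $z$ denotes the vector corresponding to
$[z_{i_1 \dots i_d}]_{i_j = 1}^{n} \in \mathbb{R}^N$ given in the tensor-product numbering.
(The sign is chosen so that $A_h$ is positive, in agreement with the matrices $V^j$ and with $A_j > 0$.)
As usual, we can regard d-dimensional n × . . . × n arrays
(tensors) also as one-dimensional ones (vectors) with $n^d$
components.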
Approximating matrix-valued function exp(A) and A−1 B. Khoromskij, Leipzig 2005(L8) 216
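The matrix $A = A_h$ in (213) takes the form $A = \sum_{j=1}^{d} A_j$ with
$$ A_1 = V^1 \otimes I \otimes \dots \otimes I, \quad A_2 = I \otimes V^2 \otimes \dots \otimes I, \quad \dots, \quad A_d = I \otimes \dots \otimes I \otimes V^d, $$
$$ V^j = \frac{1}{h^2} \begin{bmatrix} 2a^j_1 & -c^j_1 & & & \\ -b^j_2 & 2a^j_2 & -c^j_2 & & \\ & \ddots & \ddots & \ddots & \\ & & -b^j_{n-1} & 2a^j_{n-1} & -c^j_{n-1} \\ & & & -b^j_n & 2a^j_n \end{bmatrix}_{n \times n}, $$
and $I$ being the $n \times n$ identity. It is easy to see that $A_j > 0$ for all
$j = 1, \dots, d$. Moreover, the $A_j$ commute pairwise, i.e., $A_j A_m = A_m A_j$, hence
(cf. Thm. 5.3 in Lect. 5)
$$ \exp(A) = \prod_{j=1}^{d} \exp(A_j) = \bigotimes_{j=1}^{d} \exp(V^j). \tag{110} $$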
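Identity (110) is easy to check numerically on a small grid. The sketch below is an illustration under assumptions: it uses constant coefficients (so that each $V^j$ is symmetric and its exponential can be taken from an eigendecomposition), it tests the scaled variant $\exp(-tA)$, which factorises in the same way, and the helper names `fd_matrix`, `kron_sum` and `exp_sym` are hypothetical.

```python
import numpy as np
from functools import reduce

def fd_matrix(n, a=1.0):
    """V^j for a constant coefficient a_j = a: (a / h^2) * tridiag(-1, 2, -1)."""
    h = 1.0 / (n + 1)
    return a * (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def kron_sum(Vs):
    """A = sum_j I x ... x V^j x ... x I (V^j in position j)."""
    eyes = [np.eye(V.shape[0]) for V in Vs]
    return sum(reduce(np.kron, eyes[:j] + [V] + eyes[j + 1:])
               for j, V in enumerate(Vs))

def exp_sym(S, t):
    """exp(-t S) for a symmetric matrix S via its eigendecomposition."""
    w, Q = np.linalg.eigh(S)
    return (Q * np.exp(-t * w)) @ Q.T

Vs = [fd_matrix(4, a) for a in (1.0, 2.0, 0.5)]      # d = 3, n = 4, N = 64
t = 0.01
lhs = exp_sym(kron_sum(Vs), t)                        # exponential of the full N x N matrix
rhs = reduce(np.kron, [exp_sym(V, t) for V in Vs])    # Kronecker product of n x n exponentials
```

The right-hand side only ever forms $n \times n$ exponentials, which is the point of (110).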
Approximating matrix-valued function exp(A) and A−1 B. Khoromskij, Leipzig 2005(L8) 217
Ex. 8.2. In the situation of Example 8.1, we consider an
application to parabolic problems in $\mathbb{R}^d$ posed in the
semi-discrete form. Using the semigroup theory, the solution
of the first order evolution equation
$$ \frac{du}{dt} + Au = f, \quad u(0) = u_0 \in \mathbb{R}^N, $$
with a given initial vector $u_0$ and with a given right-hand side
$f \in L^2(Q_T)$, $Q_T := (0, T) \times \mathbb{R}^N$, can be represented as
$$ u(t) = \exp(-tA) u_0 + \int_0^t \exp(-(t - s)A) f(s)\, ds, \quad t \in (0, T]. $$
Assume that our input data can be represented in the
tensor-product form
Approximating matrix-valued function exp(A) and A−1 B. Khoromskij, Leipzig 2005(L8) 218
$$ u_0 \approx \sum_{k=1}^{r} u_1^k(x_1) \otimes \dots \otimes u_d^k(x_d), \qquad f(s) \approx \sum_{k=1}^{r} f_1^k(s; x_1) \otimes \dots \otimes f_d^k(s; x_d) $$
with $u_i^k, f_i^k \in \mathbb{R}^n$, $i = 1, \dots, d$, and with $r = O(|\log \varepsilon|^q)$. Then we
obtain the tensor-product approximation $u(t) \approx \tilde{u}(t)$ by
$$ \tilde{u}(t) = \sum_{k=1}^{r} \left\{ \bigotimes_{j=1}^{d} \exp(-tV^j) u_j^k(x_j) + \bigotimes_{j=1}^{d} \int_0^t \exp((s - t)V^j) f_j^k(s; x_j)\, ds \right\}, $$
which can be implemented with complexity $O(r d n \log^p n)$.
Probl. 1. Represent $A^{-1}$ in the HKT-format.
Probl. 2. Approximate sign(A) in the HKT-format.
Approximating matrix-valued functions by exponential sums B. Khoromskij, Leipzig 2005(L8) 219
Assume that for given $f(\rho)$, $\rho \in [1, R]$, there is an accurate
$r$-term approximation $f_r(\rho)$ by exponential sums
$$ |f(\rho) - f_r(\rho)| \le \varepsilon_R, \quad \rho \in [1, R], \tag{111} $$
with $f_r(\rho) := \sum_{k=1}^{r} a_k e^{-b_k \rho}$. The question is how accurately
the ansatz $f_r(A)$ represents the matrix-valued function $f(A)$.
We consider two cases:
(A) Real-diagonalisable matrix $A$, i.e., $A = T^{-1} D T$ with a
diagonal $D = \operatorname{diag}\{d_1, \dots, d_n\}$, where $d_i \in [1, R]$.
(B) There is the Dunford-Cauchy integral representation for
the analytic function $f$:
$$ f(A) = \frac{1}{2\pi i} \int_\Gamma f(z) (zI - A)^{-1}\, dz. $$
Approximating matrix-valued functions by exponential sums B. Khoromskij, Leipzig 2005(L8) 220
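Lem. 8.6. In Case (A) we have
$$ \|f(A) - f_r(A)\| \le \|T\|\, \|T^{-1}\|\, \varepsilon_R. $$
In Case (B) let (111) hold with $\varepsilon_R = g(z) \varepsilon_\Gamma$, at least for $\rho = z$
such that $z \in \Gamma$. Then we have
$$ \|f(A) - f_r(A)\| \le \frac{\varepsilon_\Gamma}{2\pi} \max_{z \in \Gamma} |g(z)| \int_\Gamma \|(zI - A)^{-1}\|\, d|z|. $$
In the case of a discrete elliptic operator $A$, we have
$$ \int_\Gamma \|(zI - A)^{-1}\|\, d|z| \le C \int_\Gamma \frac{d|z|}{|z|}, $$
where the constant depends on the coefficients of the related
operator $\mathcal{A}$ and $\Gamma$ contains $\sigma(A)$.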
Approximating matrix-valued functions by exponential sums B. Khoromskij, Leipzig 2005(L8) 221
Proof: In the first case we readily obtain
$$ \|f(A) - f_r(A)\| = \|T^{-1} \operatorname{diag}\{f_1, \dots, f_n\}\, T\| $$
with $f_i = f(d_i) - f_r(d_i)$, which proves the statement. If $T$ is
a unitary transform then $\|T\| = \|T^{-1}\| = 1$.
In the second case we obtain
$$ \|f(A) - f_r(A)\| = \frac{1}{2\pi} \left\| \int_\Gamma \Big[ f(z) - \sum_{k=1}^{r} a_k e^{-b_k z} \Big] (zI - A)^{-1}\, dz \right\| \le \frac{\varepsilon_\Gamma}{2\pi} \int_\Gamma |g(z)|\, \|(zI - A)^{-1}\|\, d|z|, $$
which proves the main assertion. Finally, in the case of
discrete elliptic operators we apply the resolvent estimate $\|(zI - A)^{-1}\| \le C / |z|$.
Literature to Lecture 8 B. Khoromskij, Leipzig 2005(L8) 222
1. W. Hackbusch and B.N. Khoromskij: Low-Rank Kronecker-Product Approximation to Multi-Dimensional
Nonlocal Operators . Parts I/II. Preprints 29/30, MPI MIS, Leipzig 2005.
2. W. Hackbusch, B.N. Khoromskij and E. Tyrtyshnikov: Approximate Iterations for Structured Matrices.
Preprint MPI MIS, Leipzig 2005.
URL: http://personal-homepages.mis.mpg.de/bokh
http://www.mis.mpg.de/scicomp/Fulltext/Khoromskij/khor8.ps
Lect. 9. Kronecker-prod. representation to A−1 and sign(A) B. Khoromskij, Leipzig 2005 223
Outlook
1. Solution operator exp(−tA) for the linear parabolic eq.:
– well parallelisable;
– avoids time stepping !
2. Representing $f(A)$ via approximation to $f(z)$, $z \in \mathbb{C}$, by
exponential sums $\sum_k a_k e^{-b_k z}$.
3. Robust and asymptotically optimal Sinc-quadrature to
represent $1/\rho^\alpha$, $\rho \in [1, R]$, $\alpha > 0$ (cf. Ex. 7.6).
4. HKT representation to f(A) = A−1 and numerics.
5. Robust and asymptotically almost optimal Sinc-quadrature
to represent sign(ρ), |ρ| > a > 0.
6. Generalised HKT representation to f(A) = sign(A).
exp(−tA) as the solution operator for parabolic PDEs B. Khoromskij, Leipzig 2005(L9) 224
Ex. 9.1. In the situation of Example 8.1, we consider an
application to parabolic problems in $\mathbb{R}^d$ posed in the
semi-discrete form ($A \in \mathbb{R}^{N \times N}$, $f \in \mathbb{R}^N$). The solution of the
first order evolution equation
$$ \frac{du}{dt} + Au = f, \quad u(0) = u_0 \in \mathbb{R}^N, $$
with a given initial vector $u_0$ and with a given right-hand side
$f \in L^2(Q_T)$, $Q_T := (0, T) \times \mathbb{R}^N$, can be represented as
$$ u(t) = \exp(-tA) u_0 + \int_0^t \exp(-(t - s)A) f(s)\, ds, \quad t \in (0, T]. $$
Assume that our input data can be represented in the
tensor-product form as follows
exp(−tA) as the solution operator for parabolic PDEs B. Khoromskij, Leipzig 2005(L9) 225
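$$ u_0 \approx \sum_{k=1}^{r} u_1^k(x_1) \otimes \dots \otimes u_d^k(x_d), \qquad f(s) \approx \sum_{k=1}^{r} f_1^k(s; x_1) \otimes \dots \otimes f_d^k(s; x_d) $$
with $u_i^k, f_i^k \in \mathbb{R}^n$, $i = 1, \dots, d$, and with $r = O(|\log \varepsilon|^q)$. Then we
obtain the tensor-product approximation $u(t) \approx \tilde{u}(t)$ by
$$ \tilde{u}(t) := \sum_{k=1}^{r} \left\{ \bigotimes_{j=1}^{d} \exp(-tV^j) u_j^k(x_j) + \bigotimes_{j=1}^{d} \int_0^t \exp((s - t)V^j) f_j^k(s; x_j)\, ds \right\}, $$
which can be implemented with complexity $O(r d n \log^p n)$.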
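The homogeneous part of this formula (the case $f = 0$) can be sketched as follows. The code is an illustration under assumptions: the $V^j$ are taken as the symmetric 1D FD Laplacian, the exponentials are computed densely by eigendecomposition rather than in a data-sparse format, and the names `propagate` and `to_full` are hypothetical.

```python
import numpy as np
from functools import reduce

def exp_sym(S, t):
    """exp(-t S) for a symmetric matrix S via its eigendecomposition."""
    w, Q = np.linalg.eigh(S)
    return (Q * np.exp(-t * w)) @ Q.T

def propagate(Vs, terms, t):
    """Apply exp(-tA) to sum_k u_1^k x ... x u_d^k, one n-vector factor at a time."""
    Es = [exp_sym(V, t) for V in Vs]
    return [[E @ u for E, u in zip(Es, term)] for term in terms]

def to_full(terms):
    """Assemble the full vector in R^N (used here only to check the result)."""
    return sum(reduce(np.kron, term) for term in terms)

n, d, r, t = 5, 3, 2, 0.02
h = 1.0 / (n + 1)
V = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
Vs = [V] * d
rng = np.random.default_rng(0)
u0_terms = [[rng.standard_normal(n) for _ in range(d)] for _ in range(r)]

u_t = to_full(propagate(Vs, u0_terms, t))   # tensor-product evaluation

# Reference: form A in R^{N x N} explicitly and apply exp(-tA) to the full vector
I = np.eye(n)
A = (np.kron(np.kron(V, I), I) + np.kron(np.kron(I, V), I)
     + np.kron(np.kron(I, I), V))
u_ref = exp_sym(A, t) @ to_full(u0_terms)
```

`propagate` touches only $r\,d$ vectors of length $n$; the full vector is assembled here purely for comparison.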
Probl. 1. Represent $f(A) = A^{-1}$ in the HKT-format.
Probl. 2. Approximate $f(A) = \operatorname{sign}(A)$ in the HKT-format.
Approximating MVFs by exponential sums B. Khoromskij, Leipzig 2005(L9) 226
Assume that for given $f(\rho)$, $\rho \in [1, R]$, there is an accurate
$r$-term approximation $f_r(\rho)$ by exponential sums
$$ |f(\rho) - f_r(\rho)| \le \varepsilon_R, \quad \rho \in [1, R], \tag{112} $$
with $f_r(\rho) := \sum_{k=1}^{r} a_k e^{-b_k \rho}$. The question is how accurately
the ansatz $f_r(A)$ represents the matrix-valued function $f(A)$.
We consider two cases:
(A) Real-diagonalisable matrix $A$, i.e., $A = T^{-1} D T$ with a
diagonal $D = \operatorname{diag}\{d_1, \dots, d_n\}$, where $d_i \in [1, R]$.
(B) The analytic function $f$ has the Dunford-Cauchy integral
representation:
$$ f(A) = \frac{1}{2\pi i} \int_\Gamma f(z) (zI - A)^{-1}\, dz, $$
where $\Gamma$ “envelopes” $\sigma(A)$.
Approximating MVFs by exponential sums B. Khoromskij, Leipzig 2005(L9) 227
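Lem. 9.1. In Case (A) we have
$$ \|f(A) - f_r(A)\| \le \|T\|\, \|T^{-1}\|\, \varepsilon_R. $$
In Case (B), let (112) hold with $\varepsilon_R = g(\rho) \varepsilon_\Gamma$, at least for
$\rho = z \in \Gamma$. Then we have
$$ \|f(A) - f_r(A)\| \le \frac{\varepsilon_\Gamma}{2\pi} \max_{z \in \Gamma} |g(z)| \int_\Gamma \|(zI - A)^{-1}\|\, d|z|. $$
In the case of a discrete elliptic operator $A$, we have
$$ \int_\Gamma \|(zI - A)^{-1}\|\, d|z| \le C \log \frac{|\lambda_{\max}|}{|\lambda_{\min}|}, \quad \lambda_{\max}, \lambda_{\min} \in \sigma(A), $$
where $C$ depends on the ellipticity and continuity constants of
the related operator $\mathcal{A}$.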
Approximating MVFs by exponential sums B. Khoromskij, Leipzig 2005(L9) 228
Proof: In the first case we readily obtain
$$ \|f(A) - f_r(A)\| = \|T^{-1} \operatorname{diag}\{f_1, \dots, f_n\}\, T\| $$
with $f_i = f(d_i) - f_r(d_i)$, which proves the statement. If $T$ is
a unitary transform then $\|T\| = \|T^{-1}\| = 1$.
In Case (B), we derive
$$ \|f(A) - f_r(A)\| = \frac{1}{2\pi} \left\| \int_\Gamma \Big[ f(z) - \sum_{k=1}^{r} a_k e^{-b_k z} \Big] (zI - A)^{-1}\, dz \right\| \le \frac{\varepsilon_\Gamma}{2\pi} \int_\Gamma |g(z)|\, \|(zI - A)^{-1}\|\, d|z|, $$
which proves the general assertion. Finally, in the case of
discrete elliptic operators we choose $\Gamma$ in such a way that
$\|(zI - A)^{-1}\| \le C / |z|$ (cf. Lect. 8), to obtain
$$ \int_\Gamma \|(zI - A)^{-1}\|\, d|z| \le C \int_\Gamma \frac{d|z|}{|z|}. $$
sinc-quadrature for the Laplace integral transform B. Khoromskij, Leipzig 2005(L9) 229
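The change of variables $\xi = \log(1 + e^{\sinh(w)})$ in the Laplace
integral transform
$$ \frac{1}{\rho} = \int_0^\infty e^{-\rho \xi}\, d\xi \quad (\rho > 0), \tag{113} $$
leads to
$$ \frac{1}{\rho} = \int_{\mathbb{R}} \cosh(w) F(\sinh(w); \rho)\, dw, \quad \text{with } F(u; \rho) := \frac{e^{-\rho \log(1 + e^u)}}{1 + e^{-u}}. $$
Lem. 9.2. Let $\rho \in [1, R]$ and define the quadrature
$$ I_M := h \sum_{k=-M}^{M} \cosh(kh) F(\sinh(kh); \rho) \approx \int_{\mathbb{R}} f_2(w; \rho)\, dw = \frac{1}{\rho}. $$
Then choosing $h = \log(4\pi M)/M$ implies
$$ \|1/\rho - I_M\|_{L^\infty[1,R]} \le C \exp\left( -\frac{\pi^2 M}{\sqrt{2}\, \log(3R) \log(4\pi M)} \right). \tag{114} $$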
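The quadrature $I_M$ is itself an exponential sum $\sum_k a_k e^{-b_k \rho}$ with $b_k = \log(1 + e^{\sinh(kh)})$ and $a_k = h \cosh(kh) / (1 + e^{-\sinh(kh)})$. A minimal sketch, assuming the step size $h = \log(4\pi M)/M$ from the lemma; the function names `sinc_exp_sum` and `inv_approx` are hypothetical.

```python
import numpy as np

def sinc_exp_sum(M):
    """Weights a_k and exponents b_k of the (2M+1)-term sum approximating 1/rho."""
    h = np.log(4 * np.pi * M) / M
    k = np.arange(-M, M + 1)
    z = np.sinh(k * h)
    # b_k = log(1 + e^{z_k}), evaluated without overflow for large z_k
    b = np.logaddexp(0.0, z)
    # a_k = h cosh(kh) / (1 + e^{-z_k}), with 1/(1+e^{-z}) written as exp(-log(1+e^{-z}))
    a = h * np.cosh(k * h) * np.exp(-np.logaddexp(0.0, -z))
    return a, b

def inv_approx(rho, M):
    """Evaluate I_M = sum_k a_k exp(-b_k rho) for an array of rho values."""
    a, b = sinc_exp_sum(M)
    return np.exp(-np.outer(rho, b)) @ a

rho = np.linspace(1.0, 100.0, 400)
a, b = sinc_exp_sum(64)
err = np.max(np.abs(1.0 / rho - inv_approx(rho, 64)))   # sampled L-infinity error on [1, 100]
```

The pair `(a, b)` depends only on $M$, not on $\rho$, which is what allows the same coefficients to be reused for a matrix argument.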
sinc-quadrature for the Laplace integral transform B. Khoromskij, Leipzig 2005(L9) 230
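Proof. Choose $\delta(\rho) = \frac{\pi}{2\sqrt{2}\, \log 3\rho}$ (this does not affect the quadrature!).
Then for $\rho \in (1, \infty)$, $f_2(w; \rho) = \cosh(w) F(\sinh(w); \rho)$, $w \in \mathbb{R}$, can
be analytically extended to $D_\delta := \{z \in \mathbb{C} : |\Im z| \le \delta\}$ with
$\delta < \pi/2$, s.t.
$$ \int_{\partial D_\delta} |f_2(z; \rho)|\, |dz| \le \text{const} < \infty \tag{115} $$
independent of $\rho$. Hence $f_2 \in H^1(D_\delta)$, while $\delta \in (0, \delta(\rho)]$, $\rho \ge 1$
ensures the finite norm $N(f_2, D_\delta) \le \text{const} < \infty$, uniform in $\rho$.
The decay of $f_2$ on the real axis is
$$ f_2(w) \approx \tfrac{1}{2} e^{w - \frac{\rho}{2} e^{w}} \ \text{as } w \to \infty, \qquad f_2(w) \approx \tfrac{1}{2} e^{|w| - \frac{1}{2} e^{|w|}} \ \text{as } w \to -\infty, $$
corresponding to $C = \frac{1}{2}$, $b = 1/2$, $a = 1$ in Thm. 2.6.
If $\rho \in [1, R]$, the choice $\delta = \delta(R)$ in Thm. 2.6 implies (114):
$$ \|1/\rho - I_M\|_{L^\infty[1,R]} \le C \exp\left( -\frac{2\pi \delta(R) M}{\log(4\pi M)} \right). $$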
A HKT-representation to A−1 B. Khoromskij, Leipzig 2005(L9) 231
Rem. 9.1. Recall that the matrix exponential of a discrete
elliptic operator can be represented in the H-matrix format
with linear-logarithmic cost in view of
$$ \exp(-tA) = \frac{1}{2\pi i} \int_\Gamma e^{-tz} (zI - A)^{-1}\, dz \approx \sum_k a_k e^{-t z_k} (z_k I - A)^{-1}. $$
Lem. 9.3. Suppose $A = T D T^{-1}$ with $\Re\, \sigma(D) \subset \mathbb{R}_{>0}$ and let
$A = \sum_{j=1}^{d} A_j$ as above. Given $M \in \mathbb{N}$, there is the
HKT-approximand $A_M^{-1}$ of the Kronecker rank $r = 2M + 1$
that provides exponential convergence
$$ \|A^{-1} - A_M^{-1}\| \le C e^{-sM / \log(4\pi M)}, \quad s = \frac{\pi^2}{\sqrt{2}\, \log[3 \operatorname{cond}(D)]}. $$
Proof. First, construct the sinc-quadrature $f_r(\rho) \approx f(\rho) = 1/\rho$
(cf. Lem. 9.2) and then apply the corresponding matrix
approximant $f_r(A)$ (cf. Lem. 9.1):
A HKT-representation to A−1 B. Khoromskij, Leipzig 2005(L9) 232
Choose $h = C \log M / M$, $z_k = \sinh(kh)$ and define
$$ A^{-1} \approx h \sum_{k=-M}^{M} \cosh(kh) F(z_k; A) = h \sum_{k=-M}^{M} \frac{\cosh(kh)}{1 + e^{-z_k}} \bigotimes_{j=1}^{d} e^{-\log(1 + e^{z_k}) V^j}. $$
Second, apply the H-matrix approx. to each individual
exponential $\exp(-\alpha_k V^j)$, $\alpha_k := \log(1 + e^{z_k})$, to obtain
$$ A^{-1} \approx h \sum_{k=-M}^{M} \frac{\cosh(kh)}{1 + e^{-z_k}} \bigotimes_{j=1}^{d} \sum_{m=-M_1}^{M_1} \kappa_{m,j}(z_k) (\zeta_{m,j} I - V^j)^{-1} =: A_M^{-1}. \tag{116} $$
Note that each sum in the tensor-product can be converted
into an H-matrix of the rank $r_1 \le (2M_1 + 1) \operatorname{rank}(\zeta_{m,j} I - V^j)^{-1}$
with $M_1 = O(|\log \varepsilon|)$. However, since $\zeta I - V^j$ is a
tridiagonal matrix, the whole sum can be implemented
exactly with $O(2M_1 n)$ operations.
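The first step of this construction (the sum of Kronecker products of exponentials, before the second-level resolvent approximation) can be tried out on a small example. The sketch below rests on assumptions that are not part of the lecture: symmetric $V^j$ given by the 1D FD Laplacian, dense exponentials from eigendecompositions instead of H-matrices, the step size $h = \log(4\pi M)/M$ of Lem. 9.2, and a rescaling by the smallest eigenvalue of $A$ so that the spectrum lies in $[1, \operatorname{cond}(A)]$. The name `hkt_inverse` is hypothetical.

```python
import numpy as np
from functools import reduce

def fd_matrix(n):
    """1D FD Laplacian (1/h^2) tridiag(-1, 2, -1) on n interior nodes."""
    h = 1.0 / (n + 1)
    return (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2

def kron_sum(Vs):
    """A = sum_j I x ... x V^j x ... x I."""
    eyes = [np.eye(V.shape[0]) for V in Vs]
    return sum(reduce(np.kron, eyes[:j] + [V] + eyes[j + 1:])
               for j, V in enumerate(Vs))

def hkt_inverse(Vs, M):
    """Kronecker-rank (2M+1) approximation of A^{-1} for symmetric V^j > 0."""
    eig = [np.linalg.eigh(V) for V in Vs]
    lam_min = sum(w[0] for w, _ in eig)   # smallest eigenvalue of the Kronecker sum
    h = np.log(4 * np.pi * M) / M
    k = np.arange(-M, M + 1)
    z = np.sinh(k * h)
    # 1/lam = (1/lam_min) * sum_k a_k exp(-b_k lam/lam_min); fold lam_min into a_k, b_k
    b = np.logaddexp(0.0, z) / lam_min
    a = h * np.cosh(k * h) * np.exp(-np.logaddexp(0.0, -z)) / lam_min
    N = int(np.prod([V.shape[0] for V in Vs]))
    A_inv = np.zeros((N, N))
    for ak, bk in zip(a, b):
        # exp(-b_k A) = kron_j exp(-b_k V^j), cf. (110)
        A_inv += ak * reduce(np.kron, [(Q * np.exp(-bk * w)) @ Q.T for w, Q in eig])
    return A_inv

Vs = [fd_matrix(6)] * 3                   # d = 3, n = 6, N = 216
A = kron_sum(Vs)
A_inv_ref = np.linalg.inv(A)
rel_err = (np.linalg.norm(A_inv_ref - hkt_inverse(Vs, 64), 2)
           / np.linalg.norm(A_inv_ref, 2))
```

The full $N \times N$ matrix is assembled here only to measure the error; in the HKT format one would store the $2M + 1$ triples of $n \times n$ factors instead.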
Numerics I: $1/(x_1 + \dots + x_d)$ function-generated tensor B. Khoromskij, Leipzig 2005(L9) 233
Robust exponentially convergent sinc-quadrature, $\rho = x_1 + \dots + x_d \in [1, R]$,
$x_i > 0$,
$$ \frac{1}{\rho} = \int_{\mathbb{R}} \cosh(w) F(\sinh(w))\, dw \approx h \sum_{k=-M}^{M} \cosh(kh) F(\sinh(kh)), $$
$$ F(u) = \frac{e^{-\rho \log(1 + e^u)}}{1 + e^{-u}}, \quad M = O(\log \varepsilon^{-1} \log R), \quad h = \frac{\log M}{M}, \quad r = 2M + 1. $$
[Figure 17: three panels, horizontal axis from 0 to 1000; the vertical error scales are of order 10^-6, 10^-8 and 10^-13, respectively.]
Figure 17: The absolute quadrature error for $R = 10^3$ with M = 16
(left), M = 32 (middle), M = 64 (right). Similar results are observed for
$R = 32 \cdot 10^3$.
Numerics II: Elliptic inverse A−1 B. Khoromskij, Leipzig 2005(L9) 234
HKT-approximation to $(-\Delta_h)^{-1}$ in $\mathbb{R}^d$
Apply Sinc-quadrature in Lem. 9.3.
Kronecker approximation to $(-\Delta_h)^{-1}$ in $[0, 1]^d$ with $N = n^d$, $n = 128$:

M         4          9          16         25         36          49          64
d = 3     2.4·10^-2  3.8·10^-2  5.6·10^-2  9.9·10^-5  2.6·10^-6   8.2·10^-10  7.0·10^-12
d = 6     1.9·10^-2  1.5·10^-3  3.7·10^-4  7.7·10^-7  4.5·10^-9   8.2·10^-12  1.1·10^-14
d = 9     3.0·10^-3  3.0·10^-3  1.0·10^-5  1.6·10^-7  1.0·10^-9   1.4·10^-12  1.7·10^-15
d = 12    3.0·10^-7  3.9·10^-5  1.0·10^-8  7.8·10^-9  1.8·10^-10  5.0·10^-13  5.6·10^-16

Approximation to $(-\Delta_h)^{-1}$ in $[0, 1]^d$ with d = 3, M = 25:

n     4          8          16         32         64         128
ε     2.5·10^-8  7.7·10^-8  4.2·10^-8  5.7·10^-7  8.5·10^-6  3.5·10^-6

Observations.
1. Method applies on non-uniform grids and for variable
coefficients (generalisation of FFT).
2. We ensure the complexity $O(d n \log^q n)$ with fixed $q \ge 1$.
3. Implementation of the matrix-vector multiplication
depends on the sparsity of an argument.
HKT-approximation to sign(A): complexity bound B. Khoromskij, Leipzig 2005(L9) 235
Each term in the Kronecker-product representation
$$ A_{(r)} = \sum_{k=1}^{r} c_k V_k^1 \otimes \cdots \otimes V_k^d \tag{117} $$
can be amplified by an extra factor $S_k \in \mathbb{R}^{N \times N}$. Hence, we
introduce the generalised tensor-product format (GHKT)
$$ A_{(r)} = \sum_{k=1}^{r} S_k \cdot \left( V_k^1 \otimes \cdots \otimes V_k^d \right) \approx A \tag{118} $$
with a matrix $S_k \in HKT(r_S)$ with $O(d r_S n \log^q n)$-complexity,
where asymptotically $r_S \ll n$. We denote $A_{(r)} \in GHKT(r, r_S)$.
The format (118) will be applied to the MVF $F(A) = \operatorname{sign}(A)$.
In the following, we suppose that $A = T D T^{-1}$, $d_i \in \mathbb{R}$.
HKT-approximation to sign(A): complexity bound B. Khoromskij, Leipzig 2005(L9) 236
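Lem. 9.4. Let $A \in \mathbb{R}^{N \times N}$ be such that $0 \notin \Re\, \sigma(A)$, and let
the function $f : \mathbb{R} \to \mathbb{R}$ satisfy the following assumptions:
(A1) $f(t) = -f(-t)$, $t \in \mathbb{R}$,
(A2) $c_f := \int_0^\infty \frac{f(t)}{t}\, dt \in (0, \infty)$ exists as an improper integral.
Then we have
$$ \operatorname{sign}(A) = \frac{1}{c_f} \int_{\mathbb{R}_+} \frac{f(tA)}{t}\, dt \equiv I(A). \tag{119} $$
Proof. First we note that for $a \in \mathbb{R} \setminus \{0\}$, the assumptions
(A1)-(A2) imply (119) with $A$ substituted by $a$,
$$ \operatorname{sign}(a) = \frac{1}{c_f} \int_{\mathbb{R}_+} \frac{f(ta)}{t}\, dt. \tag{120} $$
Since $A = T D T^{-1}$, we obtain
$$ f(tA) = T f(tD) T^{-1}. \tag{121} $$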
HKT-approximation to sign(A): complexity bound B. Khoromskij, Leipzig 2005(L9) 237
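Moreover, $\operatorname{sign}(A) = T \operatorname{sign}(D) T^{-1}$ holds and (120) implies the
desired relation:
$$ \frac{1}{c_f} \int_{\mathbb{R}_+} \frac{f(tA)}{t}\, dt = T \left( \frac{1}{c_f} \int_{\mathbb{R}_+} \frac{f(tD)}{t}\, dt \right) T^{-1} = T \operatorname{sign}(D) T^{-1} = \operatorname{sign}(A). $$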
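As a quick sanity check of assumptions (A1)-(A2), consider the simple admissible choice $f(t) = \sin t$. This choice is made here only for illustration and is not the one used in these notes. The function is odd, and the Dirichlet integral gives $c_f = \pi/2$, so the scalar relation (120) becomes:

```latex
\operatorname{sign}(a)
  = \frac{2}{\pi} \int_0^\infty \frac{\sin(ta)}{t}\, dt
  = \frac{2}{\pi}\, \operatorname{sign}(a) \int_0^\infty \frac{\sin s}{s}\, ds,
  \qquad s = t\,|a|, \quad a \in \mathbb{R} \setminus \{0\}.
```

The substitution $s = t|a|$ leaves $dt/t$ unchanged and only pulls out the sign of $a$, which is exactly the mechanism the proof relies on for a general $f$.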
Choice of $f(t)$. We consider the following examples of $f$:
$$ f_n(t) := \frac{j_n(t)}{t^{n-1}}, \quad n = 1, 2, \dots, $$
where $j_n(t)$ are the spherical Bessel functions of the first kind.
In particular, we have $j_0(t) = \frac{\sin(t)}{t}$ and
$$ j_1(t) = \frac{\sin(t) - t \cos(t)}{t^2}, \qquad j_2(t) = \left( \frac{3}{t^3} - \frac{1}{t} \right) \sin(t) - \frac{3}{t^2} \cos(t). $$
The functions $j_n(z)$ have the asymptotic property
$$ z^{-n} j_n(z) \to \frac{1}{1 \cdot 3 \cdot 5 \cdots (2n + 1)} \quad \text{as } z \to 0 \quad (n = 0, 1, 2, \dots). $$
HKT-approximation to sign(A): complexity bound B. Khoromskij, Leipzig 2005(L9) 238
We also make use of the integral representation
$$ j_n(z) = \frac{z^n}{2^{n+1} n!} \int_0^\pi \cos(z \cos\theta) \sin^{2n+1}\theta\, d\theta \quad (n = 0, 1, 2, \dots). \tag{122} $$
Since the matrix A is diagonalisable, the error analysis of the
quadrature rule is reduced to the scalar case (cf. Lem. 9.1).
An exponentially convergent quadrature for (120) with
$f = f_1$ and $a \in \mathbb{R}$. In general, one can expect $a \in [1, \Lambda]$ with
$1 \ll \Lambda$, so we deal with the integration of a highly oscillatory
function
$$ \frac{f_1(at)}{t} = \frac{\sin(at) - at \cos(at)}{a^2 t^3} $$
with a smooth weight. Hence, we have
$$ \frac{f_1(at)}{t} = \frac{j_1(at)}{t} \le \frac{C}{a t^2}, \quad t \to \infty. \tag{123} $$
HKT-approximation to sign(A): complexity bound B. Khoromskij, Leipzig 2005(L9) 239
The latter implies
$$ \left| \int_R^\infty \frac{f_1(at)}{t}\, dt \right| \le \frac{C}{aR}, \quad R > 0. $$
Given a tolerance $\varepsilon > 0$, we choose $R > 0$ such that $R^{-1} = a\varepsilon$,
i.e., $R = (a\varepsilon)^{-1} \le \varepsilon^{-1}$, and then construct a quadrature on the
finite interval $[0, R]$ (recall that $a^{-1} \in [\Lambda^{-1}, 1]$). We can assume
without loss of generality that $\varepsilon = 2^{-K_1}$, $\Lambda = 2^{K_0}$ with some
$K_0, K_1 \in \mathbb{N}$, so that $a^{-1} \in [2^{-K_0}, 1]$.
We split $[0, R]$ into the two parts $[0, 2^{-K_0}]$ and $\omega := [2^{-K_0}, R]$,
where we set $R = 2^{K_1}$.
We now decompose the integration interval $\omega = \bigcup_{k=-K_0}^{K_1} [b_k, b_{k+1}]$
by the points $b_k = 2^k$, $k = -K_0, \dots, 0, \dots, K_1$.
Note that the coefficients $q_1 = z^{-3}$ and $q_2 = z^{-2}$ can be
approximated on each interval $\delta_k = [b_k, b_{k+1}]$ by a polynomial
HKT-approximation to sign(A): complexity bound B. Khoromskij, Leipzig 2005(L9) 240
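$P_{p,k}$ of degree $\le p$ such that, say,
$$ \max_{t \in \delta_k} |q_1(t) - P_{p,k}(t)| \le C e^{-cp} \quad (k = -K_0, \dots, K_1). \tag{124} $$
Next we use the integrals (valid up to an additive constant independent of $x$, i.e. as antiderivatives)
$$ \int_0^x t^m \sin(at)\, dt = -\sum_{k=0}^{m} k! \binom{m}{k} \frac{x^{m-k}}{a^{k+1}} \cos\left( ax + \tfrac{1}{2} k\pi \right), $$
$$ \int_0^x t^m \cos(at)\, dt = \sum_{k=0}^{m} k! \binom{m}{k} \frac{x^{m-k}}{a^{k+1}} \sin\left( ax + \tfrac{1}{2} k\pi \right) $$
to obtain the following approximation on the interval $\omega$:
$$ \frac{1}{c_f} \int_\omega \frac{f_1(at)}{t}\, dt \approx \sum_{k=-K_0}^{K_1} \sum_{\ell=0}^{p} \left[ \gamma_{k\ell}(1/a) \sin(a s_{k\ell}) + \mu_{k\ell}(1/a) \cos(a c_{k\ell}) \right], $$
providing an exponential convergence of the order $O(e^{-cp})$.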
HKT-approximation to sign(A): complexity bound B. Khoromskij, Leipzig 2005(L9) 241
Due to (122), the integrand $f_1(az)/z$ is an entire function and,
in particular, holomorphic in the Bernstein ellipse $E_\rho$ with
$\rho > 1/(2a)$, corresponding to the interval $[0, a^{-1}]$ (cf. Lect. 4).
Furthermore, $\max_{z \in E_\rho} \left| \frac{f_1(az)}{z} \right|$ can be estimated by a constant not
depending on $a$. Therefore, the Gauss quadrature on $[0, \Lambda^{-1}]$
has exponential convergence. This yields the approximation
$$ \operatorname{sign}(\lambda) \sim \operatorname{sign}_M(\lambda) := \sum_{k=1}^{M} a_k(1/\lambda) \sin(s_k \lambda) + b_k(1/\lambda) \cos(c_k \lambda) \tag{125} $$
(with $a_k$, $b_k$ polynomials of degree $\le p$), such that for $\lambda \in [1, \Lambda]$
$$ |\operatorname{sign}(\lambda) - \operatorname{sign}_M(\lambda)| \le C (K_0 + K_1)\, e^{-cp} $$
with
$$ K_1 = |\log \varepsilon|, \quad K_0 = \log(\operatorname{cond}(D)), \quad M := (K_0 + K_1)\, p. \tag{126} $$
HKT-approximation to sign(A): complexity bound B. Khoromskij, Leipzig 2005(L9) 242
Rem. 9.2. Matrices $A^{-l}$ $(l = 1, \dots, p)$ can be represented by
(117) via the fixed set of tensor-skeletons
$\Phi_k := V_k^1 \otimes \dots \otimes V_k^d$, $k = 1, \dots, k_{A^{-1}}$ (uniform tensor-basis). We
make use of $\Phi_k$ in (118).
Lem. 9.5. Let $A$ be symmetric with $\min_{\lambda \in \sigma_+(A)} \lambda = O(1)$.
Then, given $\varepsilon > 0$, the quadrature points and weights from
(125) and (126) fulfil
$$ \left\| \operatorname{sign}(A) - \sum_{k=1}^{M} \left[ a_k(A^{-1}) \sin(s_k A) + b_k(A^{-1}) \cos(c_k A) \right] \right\|_2 \le C\, c(T) (K_0 + K_1) e^{-cp}, \tag{127} $$
where $a_k(A^{-1})$, $b_k(A^{-1})$ are polynomials of degree $p$ as defined
in (124), $M$, $K_0$, $K_1$ are explained in (126) and
$c(T) = \|T\| \|T^{-1}\|$.
Proof. Since $A = T D T^{-1}$, we use the representation (121),
HKT-approximation to sign(A): complexity bound B. Khoromskij, Leipzig 2005(L9) 243
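where $D$ has real entries. The estimate (123) implies that we
can restrict integration onto the interval $[0, R]$ and derive
$$ \begin{aligned} & \left\| \frac{1}{c_f} \int_0^R \frac{f_1(tA)}{t}\, dt - \sum_{k=1}^{M} \left[ a_k(1/A) \sin(s_k A) + b_k(1/A) \cos(c_k A) \right] \right\|_2 \\ &\quad = \left\| T \left( \frac{1}{c_f} \int_0^R \frac{f_1(tD)}{t}\, dt - \sum_{k=1}^{M} \left[ a_k \sin(s_k D) + b_k \cos(c_k D) \right] \right) T^{-1} \right\|_2 \\ &\quad \le c(T) \max_{\lambda \in \sigma_+(A)} \left| \frac{1}{c_f} \int_0^R \frac{f_1(t\lambda)}{t}\, dt - \sum_{k=1}^{M} \left[ a_k \sin(s_k \lambda) + b_k \cos(c_k \lambda) \right] \right| \\ &\quad \le C\, c(T) [K_0 + K_1]\, e^{-cp}. \end{aligned} $$
Choosing $M = p(K_0 + K_1)$ (cf. (126)) completes the proof.
Finally, we derive tensor-product representations of the
matrices $\sin(s_k A)$ and $\cos(c_k A)$ involved in (127). For this
purpose, we apply Prop. 9.1 (cf. Lect. 5).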
HKT-approximation to sign(A): complexity bound B. Khoromskij, Leipzig 2005(L9) 244
Prop. 9.1. Let $d \ge 2$. The trigonometric identity
$$ \sin\Big( \sum_{j=1}^{d} x_j \Big) = \sum_{j=1}^{d} \sin(x_j) \prod_{k \in \{1, \dots, d\} \setminus \{j\}} \frac{\sin(x_k + \alpha_k - \alpha_j)}{\sin(\alpha_k - \alpha_j)} \tag{128} $$
holds for all real $\alpha_1, \dots, \alpha_d$ s.t. $\sin(\alpha_k - \alpha_j) \ne 0$ for all $j \ne k$.
The following statement extends the trigonometric identity
(128) to the case of matrix-valued functions sin(A) and cos(A).
Lem. 9.6. Let $A = \sum_{j=1}^{d} A_j \in \mathbb{R}^{N \times N}$ with matrices $A_j$ of the
form as in Lect. 8, where $V^j \in \mathbb{R}^{n \times n}$ $(j = 1, \dots, d)$ and $N = n^d$.
Suppose that $\{\alpha_1, \dots, \alpha_d\} \subset \mathbb{R}$ are chosen in such a way that
the representation (128) is valid. Then the following
HKT-approximation to sign(A): complexity bound B. Khoromskij, Leipzig 2005(L9) 245
tensor-product representation with exactly $d$ terms
$$ \sin(A) = \sum_{j=1}^{d} \bigotimes_{k=1}^{d} \beta_{kj} \sin(V^k + \delta_{kj} I), \qquad \beta_{kj} = \begin{cases} \dfrac{1}{\sin \delta_{kj}}, & k \ne j, \\ 1, & k = j, \end{cases} \tag{129} $$
and with $\delta_{kj} = \alpha_k - \alpha_j$, holds. A similar result holds for cos(A).
To guarantee the stability of representation (129) we have to
control the condition $|\alpha_k - \alpha_j - m\pi| > \delta > 0$ for $m \in \mathbb{Z}$, $k \ne j$.
Lem. 9.5 and Lem. 9.6 lead to the GHKT-representation of
the matrix sign(A) with $A^{-1} \in HKT(r_{A^{-1}})$, $\sin(s_k A) \in HKT(d)$.
Setting $r_S = dM$, $r = r_{A^{-1}}$, we get the complexity
$O(d^2 M r_{A^{-1}} n \log^q n)$ provided that each $V^j$ $(j = 1, \dots, d)$ can be
diagonalised with the cost $O(n \log^q n)$, otherwise the cost is
$O(n^2 \log^q n)$.
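Representation (129) can be verified numerically for small symmetric factors. The sketch below is an illustration under assumptions: the $V^k$ are random symmetric $3 \times 3$ matrices, matrix sines are computed by eigendecomposition, the shifts $\alpha = (0, 1, 2)$ are an arbitrary admissible choice, and the names `fun_sym` and `sin_kron_sum` are hypothetical.

```python
import numpy as np
from functools import reduce

def fun_sym(S, fun, shift=0.0):
    """fun(S + shift * I) for a symmetric matrix S via its eigendecomposition."""
    w, Q = np.linalg.eigh(S)
    return (Q * fun(w + shift)) @ Q.T

def kron_sum(Vs):
    """A = sum_j I x ... x V^j x ... x I."""
    eyes = [np.eye(V.shape[0]) for V in Vs]
    return sum(reduce(np.kron, eyes[:j] + [V] + eyes[j + 1:])
               for j, V in enumerate(Vs))

def sin_kron_sum(Vs, alphas):
    """d-term Kronecker representation of sin(A), following (128)/(129)."""
    d = len(Vs)
    terms = []
    for j in range(d):
        factors = []
        for k in range(d):
            delta = alphas[k] - alphas[j]
            beta = 1.0 if k == j else 1.0 / np.sin(delta)
            # factor k acts on direction k: beta_kj * sin(V^k + delta_kj I)
            factors.append(beta * fun_sym(Vs[k], np.sin, delta))
        terms.append(reduce(np.kron, factors))
    return sum(terms)

rng = np.random.default_rng(1)
Vs = []
for _ in range(3):
    X = rng.standard_normal((3, 3))
    Vs.append((X + X.T) / 2)
alphas = [0.0, 1.0, 2.0]          # sin(alpha_k - alpha_j) stays away from zero

lhs = fun_sym(kron_sum(Vs), np.sin)   # sin(A) computed on the full 27 x 27 matrix
rhs = sin_kron_sum(Vs, alphas)        # sum of d = 3 Kronecker products
```

Choosing the $\alpha_k$ so that the differences are well separated from multiples of $\pi$ keeps the factors $1/\sin\delta_{kj}$ moderate, which is the stability condition quoted above.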
HKT-approximation to sign(A): complexity bound B. Khoromskij, Leipzig 2005(L9) 246
If some of the assumptions above are not satisfied, one can
apply the integral representation to the matrix sign-function
of $A \in \mathbb{R}^{N \times N}$,
$$ \operatorname{sign}(A) := \frac{1}{\pi i} \int_{\Gamma_+} (zI - A)^{-1}\, dz - I. \tag{130} $$
The exponentially convergent quadrature
$$ \operatorname{sign}(A) \approx \sum_{k=1}^{r} c_k (z_k I - A)^{-1} - I, \quad r = O\left( \log^2 \varepsilon + \log^2 \operatorname{cond}(A) \right), $$
for the integral (130) provides the direct approximation of
$F(A) = \operatorname{sign}(A)$ by a sum of matrix resolvents. The quadrature
points and weights can be chosen symmetrically w.r.t. the
real axis. Using the standard results for the elliptic inverse,
we are led to the overall cost $O(r d^2 n^2 \log^q n)$, which is
quadratic in both $d$ and $n$.
Literature to Lect. 9 B. Khoromskij, Leipzig 2005(L9) 247
1. W. Hackbusch and B.N. Khoromskij: Low-Rank Kronecker-Product Approximation to Multi-Dimensional
Nonlocal Operators . Parts I/II. Preprints 29/30, MPI MIS, Leipzig 2005.
URL: http://personal-homepages.mis.mpg.de/bokh
http://www.mis.mpg.de/scicomp/Fulltext/Khoromskij/khor9.ps
Lect. 10. HKT repr. to the Hartree-Fock and Boltzmann eq. B. Khoromskij, Leipzig 2005 248
Outlook
1. Density Functional Theory (DFT) via the Hartree-Fock eq.
(A) Reduction to the density matrix eq. via sign-matrices
(B) Representation of the Fock matr. in tensor-product form.
(C) Truncated nonlinear iteration to compute $\operatorname{sign}(F - \mu I)$.
The proper formats:
– diagonally dominant, tensor-product data-sparse.
2. Boltzmann eq.
(A) Boltzmann collision integral in the HKT-representation
(B) Hadamard tensor-product operations
3. Ornstein-Zernike (OZ) integral eq. (brief survey).
4. Other directions.
Schrödinger and Hartree-Fock eq. B. Khoromskij, Leipzig 2005(L10) 249
The multi-dimensional Schrödinger eq. leads to a
challenging numerical problem.
The Schrödinger eq. for a many-particle system reads as
$$ H\Psi = \Lambda\Psi $$
with the Hamiltonian $H = H[r_1, \dots, r_{N_e}]$,
$$ H := -\frac{1}{2} \sum_{i=1}^{N_e} \Delta_i - \sum_{a=1}^{K} \sum_{i=1}^{N_e} \frac{Z_a}{|r_i - R_a|} + \sum_{i<j\le N_e} \frac{1}{|r_i - r_j|} + \sum_{a<b\le K} \frac{Z_a Z_b}{|R_a - R_b|}, $$
where $Z_a$, $R_a$ are charges and positions of the nuclei, $r_i \in \mathbb{R}^3$. Hence
the problem is posed in $\mathbb{R}^d$ with high dimension $d = 3N_e$.
Desired size of the system is $N_e = 10^q$, $q = 1, 2, 3, 4, \dots$?
Focusing on density matrix computation.
Structured tensor representation to density matrix D in DFT
to approximate the ground state in the Schrödinger eq.
Nonlocal operators in many-particle models B. Khoromskij, Leipzig 2005(L10) 250
In DFT the many-particle problem is mapped onto a system
of noninteracting particles, resulting in a significant
simplification of a computation process. The so-called density
matrices play the key role in order to achieve linear
(sub-linear) scaling in Hartree-Fock-DFT methods.
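The Hartree-Fock equation (in $\mathbb{R}^3$!) reads as
$$ F\phi_i = \varepsilon_i \phi_i, \quad i = 1, \dots, N_e/2 $$
with the Hartree-Fock operator
$$ F\phi(x) := -\frac{1}{2} \Delta\phi(x) + V_c(x)\, \phi(x) + 2 \int d^3y\, \frac{\rho(y, y)}{|x - y|}\, \phi(x) - \int d^3y\, \frac{\rho(x, y)}{|x - y|}\, \phi(y), $$
$x, y \in \mathbb{R}^3$. Here the density function $\rho(x, y)$ is defined by
$$ \rho(x, y) := \sum_{k \le p} \phi_k(x) \phi_k(y), \quad p = N_e/2. $$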
Nonlocal operators in many-particle models B. Khoromskij, Leipzig 2005(L10) 251
Nonlocal operators related to the Hartree-Fock eq.
1. Integral operators with the Newton potential
$$ (Nu)(y) = \int_\Omega \frac{1}{|x - y|}\, u(x)\, dx, \quad y \in \Omega \subset \mathbb{R}^3. $$
2. IOs with product kernels in $\mathbb{R}^3$: J - Hartree potential,
K - exchange potential.
3. sign(·) - to represent the spectral projection D (density
matrix) formed from the “occupied orbitals”
$$ D = \frac{1}{2} [I - \operatorname{sign}(F[D] - \mu I)], \quad D \in \mathbb{R}^{M \times M}, \quad M = O(3N_e). $$
4. $\frac{1}{x + y + z}$ - generated energy tensor in $\mathbb{R}^{n^2 \times n^2 \times n^2}$, $x, y, z \in \mathbb{R}^2$.
Tensor decomposition of “orbital energy denominators”.
Nonlocal operators in many-particle models B. Khoromskij, Leipzig 2005(L10) 252
Suppose that $\varphi_i(x)$, $i = 1, \dots, M$, is the set of tensor-product
orthogonal basis funct. defined in a bounded hypercube in $\mathbb{R}^3$.
Let $D = \{d_{kl}\} \in \mathbb{R}^{M \times M}$ be the corresponding matrix
representation to the DF, such that
$$ \rho(x, y) \approx \sum_{k,l=1}^{M} d_{kl}\, \varphi_k(x) \varphi_l(y) =: \tilde{\rho}(x, y). $$
We define “Galerkin type” approximation to the Fock oper.
$$ F = K_0 + 2J - K, $$
where $K_0$ is the Galerkin representation to the “local”
component of the Fock operator and $J = \{J_{ij}\}$, $K = \{K_{ij}\}$ with
$$ K_{ij} = \iint \frac{\tilde{\rho}(x, y)}{|x - y|}\, \varphi_i(x) \varphi_j(y)\, dx\, dy, \qquad J_{ij} = \iint \frac{\tilde{\rho}(y, y)}{|x - y|}\, \varphi_i(x) \varphi_j(x)\, dx\, dy, $$
are the discrete exchange and Hartree potentials, respectively.
Nonlocal operators in many-particle models B. Khoromskij, Leipzig 2005(L10) 253
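Given $D = \{d_{k\ell}\}$, the matrices $K = K[D]$, $J = J[D]$ can be
calculated as the tensor-matrix products
$$ K = T \cdot D, \qquad J = T^{T(i,\ell)} \cdot D, \tag{131} $$
where a tensor $T = \{T^{k\ell}_{ij}\}$ is given by
$$ T^{k\ell}_{ij} = \iint \frac{\varphi_k(x) \varphi_i(x) \varphi_\ell(y) \varphi_j(y)}{|x - y|}\, dx\, dy, $$
and $K_{ij} = \sum_{k,\ell} d_{k\ell}\, T^{k\ell}_{ij}$, $J_{ij} = \sum_{k,\ell} d_{k\ell}\, T^{ki}_{\ell j}$
(here $T^{T(i,\ell)}$ denotes $T$ with the indices $i$ and $\ell$ interchanged).
Now we obtain $F[D] = K_0 - K[D] + 2J[D]$.
Rem. 10.1. Let $K_N$ be the Nyström discr. of $N$. Then
(131) simplifies by making use of the Hadamard prod.,
$$ K = K_N \odot D, \qquad J = \operatorname{diag}_\wedge [K_N \cdot \operatorname{diag}_\vee(D)], $$
where $\operatorname{diag}_\wedge$ and $\operatorname{diag}_\vee$ are the operators converting a vector
into a diagonal matrix and vice versa.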
Nonlocal operators in many-particle models B. Khoromskij, Leipzig 2005(L10) 254
Given F, the spectral projection $D[k, l]$ formed from the
occupied orbitals can be computed via the solution of the
eigenvalue problem
$$ F\Psi_j = \lambda_j \Psi_j, \quad j = 1, \dots, p; \quad \lambda_1 \le \dots \le \lambda_p \le \dots, $$
by
$$ D[k, l] = \sum_{j \le p} \Psi_j[k]^T \Psi_j[l]. $$
The complexity scales cubically in M.
Rem. 10.2. The idempotency (proj.) prop. holds: $D^2 = D$.
To avoid the solution of an eigenvalue problem, it is possible
to represent D directly using the matrix sign function.
Lem. 10.1. Let us choose $\mu \in (\lambda_p, \lambda_{p+1})$, then
$$ D = \frac{1}{2} [I - \operatorname{sign}(F - \mu I)]. $$
Nonlocal operators in many-particle models B. Khoromskij, Leipzig 2005(L10) 255
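Proof. Since $\{\Psi_j\}$ is orthogonal, F is unitarily diagonalisable,
$$ \operatorname{sign}(F - \mu I) = \sum_{j=1}^{M} \Psi_j^T \operatorname{sign}(\lambda_j - \mu) \Psi_j, $$
hence
$$ D = \sum_{\lambda_j < \mu} \Psi_j^T \Psi_j = \frac{1}{2} \Big[ I - \sum_{j=1}^{M} \Psi_j^T \operatorname{sign}(\lambda_j - \mu) \Psi_j \Big]. $$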
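Lem. 10.1 can be checked on a small dense example. The sketch assumes a random symmetric matrix as a stand-in for the Fock matrix and evaluates the sign function through the eigendecomposition; it only illustrates the identity and says nothing about how sign(F − µI) would be computed in a data-sparse format.

```python
import numpy as np

rng = np.random.default_rng(0)
M, p = 8, 3                                   # matrix size and number of occupied orbitals
X = rng.standard_normal((M, M))
F = (X + X.T) / 2                             # symmetric stand-in for the Fock matrix

lam, Psi = np.linalg.eigh(F)                  # ascending eigenvalues, orthonormal columns
D_eig = Psi[:, :p] @ Psi[:, :p].T             # projector onto the p lowest eigenvectors

mu = 0.5 * (lam[p - 1] + lam[p])              # any mu in (lambda_p, lambda_{p+1})
sign_F = (Psi * np.sign(lam - mu)) @ Psi.T    # sign(F - mu I) via the eigenbasis
D_sign = 0.5 * (np.eye(M) - sign_F)           # D = (I - sign(F - mu I)) / 2
```

Both constructions give the same projector, so `D_sign` is idempotent and has trace `p`.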
We can implement the corresponding matrix operations in the
tensor-product arithmetics.
Assume that the density matrix D is already represented in
the Kronecker product form (with $M = n^3$)
$$ D = \sum_{s=1}^{r_D} D_s^1 \otimes D_s^2 \otimes D_s^3, \quad D_s^m \in \mathbb{R}^{n \times n}, $$
where $D_s^m$ is associated with the couple $(x_m, y_m)$, $m = 1, 2, 3$.
Nonlocal operators in many-particle models B. Khoromskij, Leipzig 2005(L10) 256
Let $\varphi_i(x) = \varphi_{i_1}(x_1) \varphi_{i_2}(x_2) \varphi_{i_3}(x_3)$, $i = (i_1, i_2, i_3)$, $i_m = 1, \dots, n$, and
suppose that the Newton potential can be represented by
$$ \frac{1}{|x - y|} \approx \sum_{s=1}^{r_N} N_s^1(x_1, y_1)\, N_s^2(x_2, y_2)\, N_s^3(x_3, y_3). $$
Due to “separability” results for the Newton potential and
using the tensor-product structure of the basis, we derive
$$ T = \sum_{s=1}^{r_N} T_s^1 \otimes T_s^2 \otimes T_s^3, \quad T_s^m \in \mathbb{R}^{n^2 \times n^2}, $$
where (for $m = 1, 2, 3$)
$$ [T_s^m]^{k_m l_m}_{i_m j_m} = \int N_s^m(x_m, y_m)\, \varphi_{k_m}(x_m) \varphi_{i_m}(x_m) \varphi_{l_m}(y_m) \varphi_{j_m}(y_m)\, dx_m\, dy_m. $$
$T_s^m$ and $D$ require $O(n^4)$ and $O(n^2)$ memory units,
respectively, while the “MVM” $T \cdot D$ now costs $O(n^4)$
(compare with $O(n^{12})$).
Nonlocal operators in many-particle models B. Khoromskij, Leipzig 2005(L10) 257
Hierarchical/Wavelet formats for low-dim. components
There are two principal cases:
(A) The FEM-Galerkin approximation.
(B) The wavelet basis $\{\varphi_i\}$.
Note that the kernel-functions $N_s^m(x_m, y_m)$ are proved to be
asymptotically smooth. Hence, in case (A), the H-matrix
representation to the matrices $T_s^m$ does the job.
In turn, in case (B), the wavelet representation to the kernels
$N_s^m(x_m, y_m)$ can be applied.
Thus, the storage and MVM-complexity related to $T_s^m$ is
reduced from $O(n^4)$ to linear cost $O(n^3)$.
Nonlocal operators in many-particle models B. Khoromskij, Leipzig 2005(L10) 258
For the Nyström representation (cf. Rem. 10.1), we enjoy
the sublinear cost $O(r_D^2 n^2)$ for basic matrix-tensor operations
due to (assume that $r_D = r_K$)
$$ K_N \odot D = \sum_{s,t=1}^{r_D} (K_t^1 \odot D_s^1) \otimes (K_t^2 \odot D_s^2) \otimes (K_t^3 \odot D_s^3), $$
where each Hadamard product is implemented in $O(n^2)$ oper.
Concerning the matrix J, we arrive at the optimal complexity
$O(r_D^2 n^2)$ again, due to
$$ \operatorname{diag}_\vee(D) = \sum_{s=1}^{r_D} \operatorname{diag}_\vee(D_s^1) \otimes \operatorname{diag}_\vee(D_s^2) \otimes \operatorname{diag}_\vee(D_s^3). $$
Now apply the H-matrix format (rank $r_H$) to represent $D_s^m$
and $K_t^m$. The Hadamard product of two H-matrices,
$K_t^m \odot D_s^m$, requires only $O(r_H^2 n \log n)$ op., hence we arrive at
$O(n \log^q n)$ complexity HKT-arithmetics.
WKT is also applicable.
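The factor-wise Hadamard product used above rests on the rule $(A \otimes B) \odot (C \otimes E) = (A \odot C) \otimes (B \odot E)$. A minimal check with random dense factors, where the sizes and the different ranks `rD`, `rK` are arbitrary choices made for illustration:

```python
import numpy as np

def kron3(a, b, c):
    """Kronecker product of three factors."""
    return np.kron(np.kron(a, b), c)

rng = np.random.default_rng(0)
n, rD, rK = 4, 2, 3
Ds = rng.standard_normal((rD, 3, n, n))   # Ds[s, m] plays the role of D_s^m
Ks = rng.standard_normal((rK, 3, n, n))   # Ks[t, m] plays the role of K_t^m

D = sum(kron3(*Ds[s]) for s in range(rD))
K = sum(kron3(*Ks[t]) for t in range(rK))

lhs = K * D                                # Hadamard product of the full n^3 x n^3 matrices
# Same result assembled from n x n Hadamard products only (rD * rK terms)
rhs = sum(kron3(*(Ks[t] * Ds[s])) for s in range(rD) for t in range(rK))
```

The right-hand side never forms an $n^3 \times n^3$ Hadamard product; the full matrices appear here only for the comparison.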
The deterministic Boltzmann eq. in R3 B. Khoromskij, Leipzig 2005(L10) 259
The particle density $f(t, x, v)$, $x \in \Omega \subset \mathbb{R}^3$, of a dilute gas satisfies
the Boltzmann eq.
$$ f_t + (v, \operatorname{grad}_x f) = Q(f, f), $$
which describes the time evolution of $f : \mathbb{R}_+ \times \Omega \times \mathbb{R}^3 \to \mathbb{R}_+$.
With fixed $t$, $x$, the Boltzmann collision integral can be split as
$$ Q(f, f) = Q_+(f, f)(v) + Q_-(f, f)(v), $$
where the loss part $Q_-$ has a simple form
$$ Q_-(f, f)(v) = f(v) \int_{\mathbb{R}^3} B_{tot}(\|u\|)\, f(w)\, dw $$
with $u = v - w$ being the relative velocity.
The integral $Q_-$ can be approximated by a block-Toeplitz matrix at
linear-logarithmic cost in $N = n^3$.
The deterministic Boltzmann eq. in R3 B. Khoromskij, Leipzig 2005(L10) 260
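The gain part can be represented by a double integral
$$ Q_+(f, f)(v) = \int_{\mathbb{R}^3} \int_{S^2} B(\|u\|, \mu)\, f(v') f(w')\, de\, dw, \tag{132} $$
where $v' = \frac{1}{2}(v + w + \|u\| e) \in \mathbb{R}^3$, $w' = \frac{1}{2}(v + w - \|u\| e) \in \mathbb{R}^3$, and $e \in S^2 \subset \mathbb{R}^3$
is the unit vector.
In the case of inverse power cut-off potential, we have
$$ B(\|u\|, \mu) = \|u\|^{1 - 4/\nu} g_\nu(\mu), \quad \nu > 1, \quad \mu = \cos(\theta) = \frac{\langle u, e \rangle}{\|u\|} $$
with $g_\nu$ being a given function of the scattering angle only,
s.t. $g_\nu \in L^1([-1, 1])$.
$\langle \cdot, \cdot \rangle$ denotes the $L^2$-scalar product in $\mathbb{R}^p$, $\|\cdot\| \equiv \|\cdot\|_2 := \sqrt{\langle \cdot, \cdot \rangle}$
(with $p = 3$).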
The deterministic Boltzmann eq. in R3 B. Khoromskij, Leipzig 2005(L10) 261
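Key point: the efficient calculation of the gain part.
Let $\mathcal{F}$ be the p-dimensional Fourier transform, then
$$ Q_+(f, f)(v) = \mathcal{F}_{y \to v} \left[ \int_{\mathbb{R}^3} g(u, y)\, \mathcal{F}^{-1}_{z \to y} [f(z - u) f(z + u)](u, y)\, du \right](v) $$
with
$$ g(u, y) = g(\|u\|, \|y\|, |\langle u, y \rangle|), $$
that depends only on the three scalar var., $\|u\|$, $\|y\|$, $\langle u, y \rangle$.
Indeed, up to a scaling factor
$$ g(u, y) = \int_0^\pi g_\nu(\cos\theta)\, e^{-i \langle u, y \rangle \cos\theta} J_0\left( \sqrt{\|u\|^2 \|y\|^2 - \langle u, y \rangle^2}\, \sin\theta \right) \sin\theta\, d\theta, $$
where $J_0(z)$ is the Bessel function $J_0(z) = \frac{1}{2\pi} \int_0^{2\pi} e^{iz \cos\psi}\, d\psi$.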
Choice of the Kernel Function B. Khoromskij, Leipzig 2005(L10) 262
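Ex. 1. The variable hard spheres ($p = 3$)
$$ g_{1,\lambda}(u, y) := \|u\|^\lambda \operatorname{sinc}\left( \frac{\|u\| \|y\|}{\pi} \right), \quad u, y \in \mathbb{R}^p, \quad \lambda \in (-3, 1], \tag{133} $$
where the sinc-function (Cardinal function) is defined by
$$ \operatorname{sinc}(z) = \frac{\sin(\pi z)}{\pi z}, \quad z \in \mathbb{C}. $$
This model corresponds to the case of second order tensors
($q = 2$) with $V^k \in \mathbb{R}^{n \times n \times n}$ (cf. Lect. 5,6).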
Ex. 2. The general kernel function
g2,λ(u, y) :=‖u− y‖λ√‖u‖2 + ‖y‖2 + 2| 〈u, y〉 | , u, y ∈ R
p. (134)
The presence of | 〈u, y〉 | in the arguments of g2,λ(u, y) makes
the approximation process much more involved.
Choice of the Kernel Function B. Khoromskij, Leipzig 2005(L10) 263
Main result: Reduce the complexity from $O(n^6 \log n)$ to
$O(n^4 \log n)$ in the case (133), and to $O(n^5 \log n)$ in the case
(134).
Kronecker Hadamard product B. Khoromskij, Leipzig 2005(L10) 264
Given tensors $U \otimes Y \in \mathbb{R}^{I \times J}$ with $U \in \mathbb{R}^{I}$, $Y \in \mathbb{R}^{J}$, and
$B \in \mathbb{R}^{I \times L}$. Let $T : \mathbb{R}^{L} \to \mathbb{R}^{J}$ be the linear operator (tensor)
that maps tensors defined on the index set $L$ into those
defined on $J$.
Def. 10.1. (cf. Def. 5.3) The Hadamard “scalar” product
$[D, C]_I \in \mathbb{R}^{K}$ of two tensors $D := [D_{i,k}] \in \mathbb{R}^{I \times K}$ and
$C := [C_{i,k}] \in \mathbb{R}^{I \times K}$ with $K \in \{I, J, L\}$ is defined by
$$ [D, C]_I := \sum_{i \in I} [D_{i,K}] \odot [C_{i,K}], $$
where $\odot$ denotes the Hadamard product on the index set $K$
and $[D_{i,K}] := [D_{i,k}]_{k \in K}$.
Lem. 10.2. (cf. Lem. 5.2) Let U, Y, B and T be given as
above. Then, with $K = J$, the following identity is valid
$$ [U \otimes Y, T \cdot B]_I = Y \odot (T \cdot [U, B]_I) \in \mathbb{R}^{J}. \tag{135} $$
Kronecker Hadamard product B. Khoromskij, Leipzig 2005(L10) 265
Proof. By definition of the Hadamard scalar product we have
$$ \begin{aligned} [U \otimes Y, T \cdot B]_I &= \sum_{i \in I} [U \otimes Y]_{i,J} \odot [T \cdot B]_{i,J} \\ &= \sum_{i \in I} [[U]_i \cdot Y]_{i,J} \odot [T \cdot B]_{i,J} \\ &= Y \odot \Big( \sum_{i \in I} [U]_i [T \cdot B]_{i,J} \Big) \\ &= Y \odot \Big( T \cdot \sum_{i \in I} [U]_i [B]_{i,L} \Big), \end{aligned} $$
then the assertion follows.
Identity (135) is of great importance in the current
applications since in the right-hand side the operator T is
removed from the scalar product and so is applied only once.
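Identity (135) can be illustrated with plain arrays. In the sketch below the index sets $I$, $J$, $L$ are taken as simple ranges, $T$ is an ordinary $|J| \times |L|$ matrix acting on each row of $B$, and `hadamard_scalar` is a hypothetical helper implementing Def. 10.1 (a vector $U \in \mathbb{R}^I$ is treated as an $|I| \times 1$ array so that it broadcasts over the second index).

```python
import numpy as np

def hadamard_scalar(D, C):
    """[D, C]_I = sum_i D[i, :] * C[i, :] (entrywise product, summed over i in I)."""
    return (D * C).sum(axis=0)

rng = np.random.default_rng(0)
nI, nJ, nL = 5, 4, 3
U = rng.standard_normal(nI)
Y = rng.standard_normal(nJ)
B = rng.standard_normal((nI, nL))
T = rng.standard_normal((nJ, nL))          # linear map R^L -> R^J

UY = np.outer(U, Y)                         # (U x Y)[i, j] = U_i Y_j
TB = B @ T.T                                # row i is T applied to B[i, :]
lhs = hadamard_scalar(UY, TB)               # T is applied |I| times

UB = hadamard_scalar(U[:, None], B)         # [U, B]_I = sum_i U_i B[i, :] in R^L
rhs = Y * (T @ UB)                          # T is applied once
```

On the left-hand side $T$ acts on every row of $B$; on the right-hand side it acts on a single vector, which is the saving the remark above refers to.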
Ornstein-Zernike eq. in R3 B. Khoromskij, Leipzig 2005(L10) 266
In numerical modelling of a mono-atomic isotropic liquid with spherically symmetric Lennard-Jones interaction potential $U(r) = 4\varepsilon\,[(\sigma/r)^{12} - (\sigma/r)^{6}]$ between the particles ($\sigma$ and $\varepsilon$ are the size and energy parameters, respectively), the Ornstein-Zernike equation relates the total correlation function $h(r)$ to the direct correlation function $c(r)$ (with density $\rho$) by
$$h(r) = c(r) + \rho \int_{\mathbb{R}^3} c(|r - r'|)\, h(r')\, dr'. \tag{136}$$
The “closure” relation is
$$h(r) = \exp[-\beta U(r) + h(r) - c(r) + B(r)] - 1. \tag{137}$$
Key point: FFT vs. structured matrices in wavelet basis.
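To illustrate the FFT side of this comparison: the integral in (136) is a convolution, so for a given $c$ the Fourier transform turns the equation into the algebraic relation $\hat h = \hat c/(1 - \rho\hat c)$. The sketch below works on a 1D periodic grid as a stand-in for the 3D problem; the Gaussian $c$ and all grid parameters are hypothetical, and the closure (137) is not used.

```python
import numpy as np

n, length, rho = 256, 20.0, 0.7       # illustrative grid size, period, density
dx = length / n
x = (np.arange(n) - n // 2) * dx
c = -np.exp(-x**2)                    # hypothetical direct correlation function

c0 = np.fft.ifftshift(c)              # move x = 0 to index 0 for the FFT
c_hat = dx * np.fft.fft(c0)           # discrete transform including the quadrature weight
h_hat = c_hat / (1.0 - rho * c_hat)   # OZ relation in Fourier space
h0 = np.fft.ifft(h_hat).real / dx     # back to real space, O(n log n) overall

# Check h = c + rho * (c * h) with a dense periodic convolution, O(n^2).
idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n
conv = dx * c0[idx] @ h0
residual = np.max(np.abs(h0 - c0 - rho * conv))
print(residual)
```

The dense circulant matrix `c0[idx]` is used only as a check; it is the object that structured-matrix techniques aim to avoid forming.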
Figure 18: Radial parts of the correlation functions h(r) (top) and c(r) (bottom) of a simple mono-atomic liquid with Lennard-Jones potential, parameters ρ = 0.7, ε = 0.7.
Numerics I: 1/(x_1+...+x_d) - function generated tensor B. Khoromskij, Leipzig 2005(L10) 268
Robust exponentially convergent sinc-quadrature, $\rho = x_1 + \dots + x_d \in [1, R]$, $x_i > 0$,
$$\frac{1}{\rho} = \int_{\mathbb{R}} \cosh(w)\,F(\sinh(w))\,dw \approx h\sum_{k=-M}^{M} \cosh(kh)\,F(\sinh(kh)),$$
$$F(u) = \frac{e^{-\rho\log(1+e^{u})}}{1+e^{-u}}, \qquad M = O(\log\varepsilon^{-1}\,\log R), \qquad r = 2M+1, \qquad h = C_{\mathrm{int}}\,\frac{\log M}{M}.$$
Figure 19: The absolute quadrature error for $R = 10^3$ with $M = 16$ (left), $M = 32$ (middle), $M = 64$ (right); the vertical axes are scaled by $10^{-6}$, $10^{-8}$ and $10^{-13}$, respectively. Similar results are observed for $R = 32 \cdot 10^3$.
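The quadrature above can be sketched in a few lines of NumPy. This is an illustration under assumptions, not the code used for Figure 19: the function name `sinc_quadrature_inverse` is hypothetical, $C_{\mathrm{int}} = 1$ is assumed, and `np.logaddexp(0, u)` is used as an overflow-safe way to evaluate $\log(1+e^{u})$.

```python
import numpy as np


def sinc_quadrature_inverse(rho, M, c_int=1.0):
    """Approximate 1/rho by the (2M+1)-point sinc quadrature with u = sinh(w)."""
    h = c_int * np.log(M) / M
    w = h * np.arange(-M, M + 1)
    u = np.sinh(w)
    # F(u) = exp(-rho * log(1 + e^u)) / (1 + e^(-u))
    F = np.exp(-rho * np.logaddexp(0.0, u)) / (1.0 + np.exp(-u))
    return h * np.sum(np.cosh(w) * F)


rhos = np.linspace(1.0, 1000.0, 200)
for M in (16, 32, 64):
    err = max(abs(sinc_quadrature_inverse(r, M) - 1.0 / r) for r in rhos)
    print(M, err)   # maximal absolute error over the sampled rho in [1, 1000]
```

The observed errors depend on the choice of $C_{\mathrm{int}}$, so they need not coincide with the values plotted in Figure 19.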
[Three panels: error versus M (number of quadrature points) for F = exp(t − r exp(t)), r = 1.0, with C_int = 1.0, 1.1 and 1.2.]
Figure 20: Quadrature for ρ = 1.0 with different C_int.
Application in QC. Arithmetics with function-generated energy matrix
$$E_{jk} = \frac{1}{e_{j_1} + e_{j_2} + e_{j_3} + e_{k_1} + e_{k_2} + e_{k_3}} \quad (e_{j_\ell}, e_{k_\ell} > 0), \qquad E_{jk} \in \mathbb{R}^{\mathcal{J}\times\mathcal{K}}$$
with $j = (j_1, j_2, j_3) \in \mathcal{J}$, $k = (k_1, k_2, k_3) \in \mathcal{K}$, $j_\ell = 1, ..., N_J$, $k_\ell = 1, ..., N_K$, for $\ell = 1, 2, 3$. Construct a low Kronecker rank separable approximation to $\frac{1}{x_1+...+x_d}$, $\sum_{i=1}^{d} x_i \in [1, R]$, via the sinc-quadrature/approximation by exponential sums.
For experimental data in quantum chemistry: $N_J, N_K \in [10^2, 10^3]$, $R \in [10^3, 10^4]$.
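The separable approximation follows from the sinc-quadrature because its nodes and weights do not depend on $\rho$: with $t_k = \log(1+e^{\sinh(kh)})$ and $c_k = h\cosh(kh)/(1+e^{-\sinh(kh)})$ one has $1/\rho \approx \sum_k c_k e^{-t_k\rho}$, and $e^{-t_k(x_1+\dots+x_d)}$ factorises over the variables. A small sketch for $d = 3$ follows; the grids `x1`, `x2`, `x3` are hypothetical stand-ins for the energies, and $C_{\mathrm{int}} = 1$ is assumed.

```python
import numpy as np

M, R, n = 64, 1.0e3, 12               # illustrative quadrature size, range, grid size
h = np.log(M) / M                     # step size with C_int = 1 (assumed)
w = h * np.arange(-M, M + 1)
u = np.sinh(w)
t = np.logaddexp(0.0, u)              # exponents t_k = log(1 + e^{u_k})
c = h * np.cosh(w) / (1.0 + np.exp(-u))   # weights c_k

# Hypothetical positive "energies", chosen so that x1 + x2 + x3 lies in [1, R].
x1 = np.linspace(1.0 / 3.0, R / 3.0, n)
x2 = np.geomspace(1.0 / 3.0, R / 3.0, n)
x3 = np.linspace(1.0 / 3.0, R / 3.0, n)

exact = 1.0 / (x1[:, None, None] + x2[None, :, None] + x3[None, None, :])

# Rank r = 2M + 1 separable approximation: sum_k c_k e^{-t_k x1} (x) e^{-t_k x2} (x) e^{-t_k x3}
E1, E2, E3 = (np.exp(-np.outer(t, x)) for x in (x1, x2, x3))
approx = np.einsum("k,ka,kb,kc->abc", c, E1, E2, E3)

print(len(c), np.max(np.abs(approx - exact)))
```

Only the $d$ factor matrices of size $r \times n$ need to be stored; the full tensor `approx` is formed here only to measure the error.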
Numerics II: Newton potential (symmetric quadrature) B. Khoromskij, Leipzig 2005(L10) 270
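Approximating the Gauss integral $\frac{1}{\rho} = \int_{\mathbb{R}} F(t;\rho)\,dt$ with $\rho = |x - y|$, $x, y \in \mathbb{R}^d$,
$$h\sum_{k=-M}^{M}\cosh(kh)\,F(\sinh(kh);\rho) \approx \int_{\mathbb{R}} F(t;\rho)\,dt, \qquad F(t;\rho) = \frac{1}{\sqrt{\pi}}\,e^{-\rho^2 t^2}. \tag{138}$$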
Rank $r = M + 1$ (symmetric) quadrature (138), $\rho = 1.0$:

M     4           9           16          25           36
ε     1.1·10^-4   1.5·10^-6   2.3·10^-9   2.0·10^-12   < 1.0·10^-15

The Gaussian integral with ρ = 0.2, 1, 10; C_int = 1.0; applies for ρ ∈ [0.2, 10].
[Three panels: error versus M (number of quadrature points) for F = exp(−r²t²) with r = 0.2, r = 1 and r = 10, C_int = 1.0.]
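The rank $r = M + 1$ in (138) comes from the symmetry of the integrand: the terms with indices $k$ and $-k$ coincide and can be folded together. A hedged sketch follows, assuming the step size $h = C_{\mathrm{int}}\log M/M$ from Numerics I with $C_{\mathrm{int}} = 1$ (the slide does not state $h$ for (138), so the errors need not match the table); `gauss_nodes_weights` is a hypothetical helper name.

```python
import numpy as np


def gauss_nodes_weights(M, c_int=1.0):
    """Nodes t_k >= 0 and weights c_k with 1/rho ~ sum_k c_k * exp(-rho^2 * t_k^2)."""
    h = c_int * np.log(M) / M
    w = h * np.arange(0, M + 1)
    t = np.sinh(w)
    c = h * np.cosh(w) / np.sqrt(np.pi)
    c[1:] *= 2.0                      # terms k and -k coincide, so fold them together
    return t, c


t, c = gauss_nodes_weights(36)
rho = 1.0
approx = np.sum(c * np.exp(-(rho * t) ** 2))
print(len(t), abs(approx - 1.0 / rho))

# Separability: for z = x - y in R^3 the Gaussian factorises over the coordinates.
z = np.array([0.6, 0.0, 0.8])         # |z| = 1
terms = np.prod(np.exp(-np.outer(t**2, z**2)), axis=1)
approx_3d = np.sum(c * terms)
print(abs(approx_3d - 1.0 / np.linalg.norm(z)))
```

Since $\rho^2 = \sum_i (x_i - y_i)^2$, each quadrature term is a product of one-dimensional Gaussians, which is what yields a low Kronecker rank representation of the Newton potential.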
Numerics III: Newton potential (robust quadrature) B. Khoromskij, Leipzig 2005(L10) 271
Robust nonsymmetric quadrature with
$$\frac{1}{\rho} = \int_{\mathbb{R}} F(u;\rho)\,du; \qquad F(u;\rho) := \frac{2}{\sqrt{\pi}}\,\frac{e^{-\rho^2\log^2(1+e^{u})}}{1+e^{-u}}, \qquad \rho \in [1, R].$$
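The integral representation can be verified by a one-line substitution (a worked check added here, not taken from the slides): the map $t = \log(1+e^{u})$ sends $\mathbb{R}$ onto $(0,\infty)$ and absorbs the factor $1/(1+e^{-u})$.

```latex
t = \log(1+e^{u}), \qquad dt = \frac{e^{u}}{1+e^{u}}\,du = \frac{du}{1+e^{-u}},
\qquad
\int_{\mathbb{R}} F(u;\rho)\,du
  = \frac{2}{\sqrt{\pi}} \int_{0}^{\infty} e^{-\rho^{2} t^{2}}\,dt
  = \frac{2}{\sqrt{\pi}} \cdot \frac{\sqrt{\pi}}{2\rho}
  = \frac{1}{\rho}.
```

The same substitution applied to $e^{-\rho t}$ instead of $e^{-\rho^2 t^2}$ gives the function $F$ used in Numerics I.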
Figure 21: The absolute quadrature error for $M = 64$ with $R = 200$ (left), $R = 10^3$ (middle), $R = 5 \cdot 10^3$ (right); the vertical axes are scaled by $10^{-8}$, $10^{-7}$ and $10^{-7}$, respectively. Similar results are observed in the case $R > 5 \cdot 10^3$.
Numerics IV: Boltzmann equation B. Khoromskij, Leipzig 2005(L10) 272
[Two surface plots over (u, v) with u = x − z, v = y − z: the 1D kernel function f = 1/(|u| + |v|) (left) and the sinc-type kernel (right).]
Figure 22: Function $g_{2,\lambda}(u, y)$ for $\lambda = 0$ (left) and $g_{1,\lambda}(u, y)$ for $\lambda = 1$.
$$g_{1,\lambda}(u,y) := \|u\|^{\lambda}\,\operatorname{sinc}\!\Big(\frac{\|u\|\,\|y\|}{\pi}\Big), \qquad u, y \in \mathbb{R}^p,\ \lambda \in (-3, 1],$$
$$g_{2,\lambda}(u,y) := \frac{\|u-y\|^{\lambda}}{\sqrt{\|u\|^2 + \|y\|^2 + 2\,|\langle u,y\rangle|}}, \qquad u, y \in \mathbb{R}^p.$$
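The two kernels are easy to evaluate directly. The sketch below (function names `g1`, `g2` are illustrative) also checks two consequences of the definitions: in one dimension with $\lambda = 0$, $g_{2,\lambda}$ reduces to $1/(|u|+|y|)$, which matches the left panel of Figure 22, and $g_{1,\lambda}$ equals $\|u\|^{\lambda}\sin(\|u\|\|y\|)/(\|u\|\|y\|)$. Note that `np.sinc` uses the same normalisation $\sin(\pi z)/(\pi z)$ as the lecture.

```python
import numpy as np


def g1(u, y, lam):
    """g_{1,lambda}(u, y) = |u|^lambda * sinc(|u||y|/pi), sinc(z) = sin(pi z)/(pi z)."""
    nu, ny = np.linalg.norm(u), np.linalg.norm(y)
    return nu**lam * np.sinc(nu * ny / np.pi)


def g2(u, y, lam):
    """g_{2,lambda}(u, y) = |u - y|^lambda / sqrt(|u|^2 + |y|^2 + 2|<u, y>|)."""
    u, y = np.asarray(u, dtype=float), np.asarray(y, dtype=float)
    denom = np.sqrt(u @ u + y @ y + 2.0 * abs(u @ y))
    return np.linalg.norm(u - y) ** lam / denom


# 1D check for lambda = 0: sqrt(u^2 + y^2 + 2|uy|) = |u| + |y|
print(g2([1.5], [-2.0], 0.0), 1.0 / (1.5 + 2.0))

# 3D check for lambda = 1: |u| = 3, |y| = 1, so g1 = 3 * sin(3) / 3 = sin(3)
print(g1([1.0, 2.0, 2.0], [0.0, 0.0, 1.0], 1.0), np.sin(3.0))
```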
[Three panels: error versus M (number of quadrature points) for |x|^s sinc(y|x|), x ∈ [−1, 1], s = 1, with y = 16, y = 25 and y = 36.]
Figure 23: $L^\infty$-error of the sinc-interpolation to $|x|^{\lambda}\operatorname{sinc}(|x|y)$, $x \in [-1, 1]$, $\lambda = 1$.
Best $r$-term approximation to $1/\sqrt{\rho}$ by $\sum_i a_i e^{-b_i\rho}$ (W. Hackbusch ’05), $L^\infty$- and weighted $L^2([1, R])$-norm.

R        10          50          100         200         ‖·‖_L∞      W(ρ) = 1/√ρ
r = 4    3.7·10^-4   9.6·10^-4   1.5·10^-3   2.2·10^-3   1.9·10^-3   4.8·10^-3
r = 5    2.8·10^-4   2.8·10^-4   3.7·10^-4   5.8·10^-4   4.2·10^-4   1.2·10^-3
r = 6    8.0·10^-5   9.8·10^-5   1.1·10^-4   1.6·10^-4   9.5·10^-5   3.3·10^-4
r = 7    3.5·10^-5   3.8·10^-5   3.9·10^-5   4.7·10^-5   2.2·10^-5   8.1·10^-5
Approximating sign(A) B. Khoromskij, Leipzig 2005(L10) 274
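General definition: Given $A \in \mathbb{R}^{M\times M}$, $M = n^d$,
$$\operatorname{sign}(A) := \frac{1}{\pi i}\int_{\Gamma_+} (zI - A)^{-1}\,dz - I \in \mathbb{R}^{M\times M}$$
with $\Gamma_+ \subset \mathbb{C}$ being any simple closed curve that encloses $\sigma_+(A) = \{\lambda \in \sigma(A) : \Re e\,\lambda > 0\}$ and no other part of the spectrum.
Iterative evaluation:
The Newton-Schulz iteration (NSI): $X_k \to \operatorname{sign}(A)$,
$$X_k = X_{k-1} + \tfrac{1}{2}\big[I - (X_{k-1})^2\big]\,X_{k-1}, \qquad k = 1, 2, ...$$
with $X_0 = A/\|A\|_2$ has locally quadratic convergence.
NSI - Convergence theory: Lem. 8.1 applies with $\alpha = 2$. Thm. 8.1 applies with $\alpha = 2$, but under the restrictive “nearly commutativity” condition.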
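The exact (untruncated) Newton-Schulz iteration can be sketched as follows. The test matrix, a shifted 1D second-difference matrix, and the shift `mu` are illustrative choices and not the 2D setting of the experiments; the reference value is obtained from the eigendecomposition, which for a symmetric matrix gives $\operatorname{sign}(A) = Q\,\operatorname{sign}(\Lambda)\,Q^{T}$.

```python
import numpy as np

n, mu = 16, 1.0                       # illustrative size and shift (mu is not an eigenvalue)
lap = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)   # 1D second-difference matrix
A = lap - mu * np.eye(n)

I = np.eye(n)
X = A / np.linalg.norm(A, 2)          # X_0 = A / ||A||_2
for _ in range(40):                   # fixed number of steps, ample for this example
    X = X + 0.5 * (I - X @ X) @ X

# Reference via the eigendecomposition of the symmetric matrix A.
lam, Q = np.linalg.eigh(A)
sign_ref = (Q * np.sign(lam)) @ Q.T

print(np.linalg.norm(X - sign_ref), np.linalg.norm(X @ X - I))
```

Each step costs two dense matrix products here; the truncated variant (T-NSI) keeps the iterates in low Kronecker rank format instead.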
Numerics V: T-NSI to compute sign(∆h − µI) B. Khoromskij, Leipzig 2005(L10) 275
Figure 24: Exact/trunc. NSIs on 16×16- and 32×32-grids (r = 7, r = 10).
Comments on Numerics V. B. Khoromskij, Leipzig 2005(L10) 276
The numerics demonstrate robust and asymptotically optimal convergence of T-NSI provided that the Kronecker rank is chosen properly; otherwise, $X_k \to I$ (M. Espig, MPI MIS).
Bound on Kronecker-rank: $r = O\big(d\,(|\log\varepsilon| + \log\operatorname{cond}(A))\,|\log\varepsilon|\big)$.
Complexity of T-NSI: $O(d r^4 n^2 + r^6) + O(r^2 r^4 d^3) + ...$
T-NSI to compute sign(A − µI) with A = ∆_h:

Grid       t(SVD)   t(NSI)   t(T-NSI)   r(sign(A − µI))
4 × 4      0.02     0.0      0.02       4
8 × 8      0.03     0.03     0.15       6
16 × 16    0.74     0.85     0.64       7
32 × 32    108.5    56.5     17.4       10
64 × 64    6400.    4000.    210.       13

Here t(SVD), t(NSI), t(T-NSI) denote the CPU time (sec.) required for SVD, exact NSI and truncated NSI, respectively.
Concluding Remarks B. Khoromskij, Leipzig 2005(L10) 277
1. HKT-approximation (for $d \ge 3$) is a subtle concept mostly based on analytic tools with possible algebraic recompression. It offers a low-Kronecker-rank data-sparse representation of
(a) Integral operators in $\mathbb{R}^d$, e.g., with the Newton, Yukawa and Helmholtz kernels
$$\frac{1}{|x-y|}; \qquad \frac{e^{-\mu|x-y|}}{|x-y|},\ \mu \in \mathbb{R}_+; \qquad \frac{e^{-i\kappa^2|x-y|}}{|x-y|},\ \kappa^2 \in \mathbb{R},$$
(b) $A^{-1}$, $A$ being the discrete elliptic operator in $[a, b]^d$, e.g., $A = -\Delta - \kappa^2$,
(c) A certain class of matrix-valued functions $\mathcal{F}(A)$, e.g.,
$$\operatorname{sign}(A), \qquad \exp(A), \qquad \int_{\mathbb{R}_+} e^{-tA}\,G\,e^{-tB}\,dt.$$
2. We enjoy the sub-linear cost $O(d^p n \log^q N)$, $p = 1, 2$, with $N = n^d$.
3. Applications: FEM/BEM in elliptic and parabolic problems in $\mathbb{R}^d$, many-particle modelling based on DFT for the Hartree-Fock eq., Boltzmann eq., Ornstein-Zernike eq., linear algebra, complexity theory, control theory.
4. By-product: $O(N \log^q N)$ - $O(N^{1/d} \log^q N)$ complexity (approximate) direct elliptic problem solver on non-uniform tensor grids in $\mathbb{R}^d$ and for variable (“separable”) coefficients (generalisation of FFT).
Sub-linear cost $O(N^{1/d} \log^q N)$ in the case of tensor rhs.
5. Other directions: chemometrics, statistics, signal
processing (in biology).
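The source of the sub-linear cost for a tensor right-hand side can be illustrated with the mixed-product rule $(A_1 \otimes A_2 \otimes A_3)(x_1 \otimes x_2 \otimes x_3) = A_1x_1 \otimes A_2x_2 \otimes A_3x_3$. The sketch below uses random dense factors as a stand-in for an operator of Kronecker rank $r$ (all sizes are illustrative); with dense $n \times n$ factors the factored product costs $O(r\,d\,n^2)$ operations, compared with $N^2 = n^{2d}$ for the assembled matrix.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d, r = 6, 3, 2                     # illustrative mode size, dimension, Kronecker rank

# Operator A = sum_k V_k^(1) (x) V_k^(2) (x) V_k^(3), stored by its n x n factors only.
V = rng.standard_normal((r, d, n, n))
# Tensor-product right-hand side x = x^(1) (x) x^(2) (x) x^(3), stored by its factors.
x = rng.standard_normal((d, n))

# Factored product: r rank-1 terms, each needing d matrix-vector products of size n.
y_factors = np.einsum("kdij,dj->kdi", V, x)       # shape (r, d, n)

# Dense reference with N = n^d unknowns (only feasible for tiny n and d).
A_full = sum(np.kron(np.kron(V[k, 0], V[k, 1]), V[k, 2]) for k in range(r))
x_full = np.kron(np.kron(x[0], x[1]), x[2])
y_full = sum(
    np.kron(np.kron(y_factors[k, 0], y_factors[k, 1]), y_factors[k, 2])
    for k in range(r)
)

print(np.max(np.abs(A_full @ x_full - y_full)))
```

The result stays in factored form with Kronecker rank at most $r$, so the $N$-vector never has to be formed.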
Literature to Lecture 10 B. Khoromskij, Leipzig 2005(L10) 279
1. W. Hackbusch and B.N. Khoromskij: Hierarchical Kronecker Tensor-Product Approximation to a Class
of Nonlocal Operators in High Dimensions. Parts I/II. Preprints 29/30, MPI MIS, Leipzig 2005.
2. B.N. Khoromskij: Structured data-sparse approximation to high order tensors arising from the deterministic
Boltzmann equation. Preprint 4, MPI MIS, Leipzig 2005.
3. M. Fedorov, H.-J. Flad, L. Grasedyck, and B.N. Khoromskij: Low-rank wavelet solver for the
Ornstein-Zernike integral equation. Preprint 59, MPI MIS, Leipzig 2005.
4. W. Hackbusch, B.N. Khoromskij, E. Tyrtyshnikov: Approximate iteration for structured matrices.
MPI MIS, Leipzig 2005.
5. H.-J. Flad, W. Hackbusch, B.N. Khoromskij and R. Schneider: Concept of data-sparse tensor-product
approximation in many-particle modelling. Leipzig 2005, in progress.
http://personal-homepages.mis.mpg.de/bokh
http://www.mis.mpg.de/scicomp/Fulltext/Khoromskij/khor10.ps