TENSOR APPROXIMATION TOOLS
FREE OF THE CURSE OF DIMENSIONALITY
Eugene Tyrtyshnikov
Institute of Numerical Mathematics
Russian Academy of Sciences
(joint work with Ivan Oseledets)
WHAT ARE TENSORS?
Tensors = d-dimensional arrays:
A = [aij...k]
i ∈ I, j ∈ J, ... , k ∈ K
Tensor A has:
• dimensionality (order) d = number of indices (modes, axes, directions, ways)
• size n1 × ... × nd
(number of nodes along each axis)
WHAT IS THE PROBLEM?
THE NUMBER OF TENSOR ELEMENTS = n^d
GROWS EXPONENTIALLY IN d
WATER AND UNIVERSE
H2O molecule has 18 electrons. Each electron has 3 coordinates.
Thus we have 18 × 3 = 54 axes.
If we take 32 nodes on each axis, we obtain 32^54 ≈ 10^81 points,
which is close to the number of atoms in the universe.
CURSE OF DIMENSIONALITY
WE SURVIVE WITH
• COMPACT (LOW-PARAMETRIC) REPRESENTATIONS FOR TENSORS
• METHODS FOR COMPUTATIONS IN COMPACT REPRESENTATIONS
TUCKER DECOMPOSITION
a(i1, ..., id) = ∑_{α1=1}^{r1} ⋯ ∑_{αd=1}^{rd} g(α1, ..., αd) q1(i1, α1) ⋯ qd(id, αd)
L. R. Tucker, Some mathematical notes on three-mode factor analysis,
Psychometrika, V. 31, P. 279–311 (1966).
COMPONENTS:
• 2D arrays q1, ..., qd with dnr entries
• d-dimensional array g(α1, ..., αd) with r^d entries
CURSE OF DIMENSIONALITY REMAINS
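Tucker factors are commonly computed by the higher-order SVD: truncated SVDs of the mode unfoldings give the factors qk, and the core g is obtained by contracting them out. A minimal numpy sketch (not the talk's own code; the function name and tolerance are illustrative):

```python
import numpy as np

def hosvd(a, eps=1e-10):
    """Tucker decomposition via higher-order SVD (illustrative sketch).

    Returns factors q[k] of shape (n_k, r_k) and the core g of shape
    (r_1, ..., r_d) such that a is recovered by mode products with the q[k]."""
    d = a.ndim
    q, g = [], a.copy()
    for k in range(d):
        # Mode-k unfolding: rows indexed by i_k, columns by all other indices.
        ak = np.moveaxis(a, k, 0).reshape(a.shape[k], -1)
        u, s, _ = np.linalg.svd(ak, full_matrices=False)
        r = max(1, int(np.sum(s > eps * s[0])))      # truncation rank
        q.append(u[:, :r])
        # Contract q[k] out of the core along mode k.
        g = np.moveaxis(np.tensordot(q[k].T, g, axes=(1, k)), 0, k)
    return q, g

# Example: a 10x10x10 tensor of multilinear rank (2,2,2) is recovered exactly.
rng = np.random.default_rng(0)
a = np.einsum('ia,ja,ka->ijk', rng.standard_normal((10, 2)),
              rng.standard_normal((10, 2)), rng.standard_normal((10, 2)))
q, g = hosvd(a)
print([f.shape for f in q], g.shape)   # [(10, 2), (10, 2), (10, 2)] (2, 2, 2)
```

Note that the core g still has r^d entries, which is exactly the curse stated above.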
CANONICAL DECOMPOSITION (PARAFAC, CANDECOMP)
a(i1, ..., id) = ∑_{α=1}^{R} u1(i1, α) ⋯ ud(id, α)
Number of defining parameters is dRn.
DRAWBACKS:
• INSTABILITY (cf. Lim, de Silva)
Take x1, ..., xd, y1, ..., yd linearly independent and set

a = ∑_{t=1}^{d} z1^t ⊗ ⋯ ⊗ zd^t,   zk^t = xk for k ≠ t,  zk^t = yk for k = t.

Then

a = (1/ε)(x1 + εy1) ⊗ ⋯ ⊗ (xd + εyd) − (1/ε) x1 ⊗ ⋯ ⊗ xd + O(ε),

so this rank-d tensor is approximated with O(ε) error by tensors of rank 2, and a best canonical approximation of fixed rank may not exist.
• EVENTUAL LACK OF ROBUST ALGORITHMS
a(i1, ..., id) = ∑_{α1=1}^{r1} ⋯ ∑_{αd=1}^{rd} g(α1, ..., αd) q1(i1, α1) ⋯ qd(id, αd)
TUCKER DECOMPOSITION
a(i1, ..., id) = ∑_{α1,...,αd−1} g1(i1, α1) g2(α1, i2, α2) ⋯ gd−1(αd−2, id−1, αd−1) gd(αd−1, id)
TENSOR-TRAIN DECOMPOSITION
TENSORS AND MATRICES
Let A = [aijklm].
Take a pair of mutually complementary long indices:
(ij) and (klm)
(kl) and (ijm)
.........
Tensor A gives rise to unfolding matrices:
B1 = [b(ij),(klm)]
B2 = [b(kl),(ijm)]
.........
By definition,
b(ij),(klm) = b(kl),(ijm) = ... = aijklm
DIMENSIONALITY CAN BE DECREASED
a(i1, ..., id) = a(i1, ..., ik; ik+1, ..., id) = ∑_{s=1}^{r} u(i1, ..., ik; s) v(ik+1, ..., id; s)
Dimension d reduces to dimensions k + 1 and d − k + 1.
Proceed by recursion.
Binary tree arises.
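Each recursion step is an ordinary low-rank factorization of one unfolding matrix. A minimal numpy sketch of a single split (the naming and tolerance are mine):

```python
import numpy as np

def split(a, k, eps=1e-10):
    """Split a d-way tensor along (i_1..i_k | i_{k+1}..i_d) by a truncated SVD.

    Returns u of shape (n_1, ..., n_k, r) and v of shape (n_{k+1}, ..., n_d, r)
    with a(i_1..i_d) ~= sum_s u(i_1..i_k, s) * v(i_{k+1}..i_d, s)."""
    left, right = a.shape[:k], a.shape[k:]
    m = a.reshape(int(np.prod(left)), int(np.prod(right)))
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    r = max(1, int(np.sum(s > eps * s[0])))
    u = (u[:, :r] * s[:r]).reshape(*left, r)     # (k+1)-way tensor
    v = vt[:r].T.reshape(*right, r)              # (d-k+1)-way tensor
    return u, v

rng = np.random.default_rng(1)
a = rng.standard_normal((4, 5, 6, 7))
u, v = split(a, 2)
print(u.shape, v.shape)                          # (4, 5, 20) and (6, 7, 20)
check = np.einsum('ijs,kls->ijkl', u, v)
print(np.linalg.norm(check - a) / np.linalg.norm(a))   # ~1e-15
```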
TUCKER VIA RECURSION
[Diagram: binary tree recursively splitting the index set {i1, i2, i3, i4, i5}; each split introduces an auxiliary index αk.]
a(i1, i2, i3, i4, i5) = ∑_{α1,...,α5} g(α1, α2, α3, α4, α5) q1(i1, α1) q2(i2, α2) q3(i3, α3) q4(i4, α4) q5(i5, α5)
BINARY TREE IMPLIES
• Any auxiliary index belongs to exactly two leaf tensors.
• The tensor is the sum over all auxiliary indices of the product of elements of the leaf tensors.
HOW TO AVOID r^d PARAMETERS
• Let any leaf tensor have at most one spatial index.
• Let any leaf tensor have at most two (three) auxiliary indices.
TREE WITHOUT TUCKER
[Diagram: binary tree over {i1, ..., i5} in which every leaf tensor carries at most one spatial index and at most two auxiliary indices.]
TENSOR-TRAIN DECOMPOSITION
a(i1, i2, i3, i4, i5) = ∑_{α1,α2,α3,α4} g1(i1, α1) g2(α1, i3, α3) g3(α3, i5, α4) g4(α4, i4, α2) g5(α2, i2)
HOW MANY PARAMETERS
NUMBER OF TT PARAMETERS = 2nr + (d − 2)nr²
EXTENDED TT DECOMPOSITION
[Diagram: extended binary tree over {i1, ..., i5} with internal (transfer) tensors carrying three auxiliary indices.]
NUMBER OF EXTENDED TT PARAMETERS = dnr + (d − 2)r³
TREE IS NOT NEEDED!
ALL IS DEFINED BY A PERMUTATION OF SPATIAL INDICES
TENSOR-TRAIN DECOMPOSITION
a(i1, i2, i3, i4, i5) = ∑_{β1,β2,β3,β4} g1(iσ(1), β1) g2(β1, iσ(2), β2) g3(β2, iσ(3), β3) g4(β3, iσ(4), β4) g5(β4, iσ(5))
TT = Tree–Tucker ⇒ neither Tree, nor Tucker ⇒ TENSOR TRAIN
MINIMAL TT DECOMPOSITION
Let 1 ≤ βk ≤ rk.
What are minimal values for compression ranks rk?
rk ≥ rank Aσk,   Aσk = [aσ(iσ(1), ..., iσ(k); iσ(k+1), ..., iσ(d))],

where aσ(iσ(1), ..., iσ(d)) = a(i1, ..., id).
GENERAL PROPERTIES
THEOREM 1.
Assume that a tensor a(i1, ..., id) possesses a canonical decomposition with R terms. Then a(i1, ..., id) admits a TT decomposition of rank R or less.
THEOREM 2.
Assume that a tensor a(i1, ..., id), when ε-perturbed with arbitrarily small ε, possesses a canonical decomposition with R terms. Then a(i1, ..., id) admits a TT decomposition of rank R or less.
FROM CANONICAL TO TENSOR TRAIN
a(i1, ..., id) = ∑_{s=1}^{R} u1(i1, s) ⋯ ud(id, s)
= ∑_{α1,...,αd−1} u1(i1, α1) δ(α1, α2) u2(i2, α2) ⋯ δ(αd−2, αd−1) ud−1(id−1, αd−1) ud(id, αd−1)
FREE!
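The conversion really is free of arithmetic: the TT cores are just the canonical factors padded with Kronecker deltas. A small numpy sketch under my own naming, with cores stored as r_{k−1} × n_k × r_k arrays:

```python
import numpy as np

def canonical_to_tt(us):
    """Convert canonical factors us[k] of shape (n_k, R) into TT cores.

    All interior TT-ranks equal R:
    g_k(alpha, i, beta) = u_k(i, alpha) * delta(alpha, beta)."""
    d, R = len(us), us[0].shape[1]
    cores = [us[0].reshape(1, -1, R)]                      # first core: 1 x n x R
    eye = np.eye(R)
    for k in range(1, d - 1):
        cores.append(np.einsum('ib,ab->aib', us[k], eye))  # R x n x R, diagonal
    cores.append(us[-1].T.reshape(R, -1, 1))               # last core: R x n x 1
    return cores

# Example: d = 4, n = 6, R = 3; compare against the canonical sum.
rng = np.random.default_rng(2)
us = [rng.standard_normal((6, 3)) for _ in range(4)]
cores = canonical_to_tt(us)
full_tt = cores[0]
for g in cores[1:]:                                        # contract the train
    full_tt = np.tensordot(full_tt, g, axes=(-1, 0))
full_tt = full_tt.squeeze(axis=(0, -1))
full_cp = np.einsum('ia,ja,ka,la->ijkl', *us)
print(np.linalg.norm(full_tt - full_cp))                   # ~1e-14
```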
EFFECTIVE RANK OF A TENSOR
ERank(a) = lim sup_{ε→+0}  min { rank(b) : |b − a| ≤ ε,  b ∈ C(n1, ..., nd) }
F(n1, ..., nd): all tensors of size n1 × ... × nd with entries from F.
Let a ∈ F(n1, ..., nd) ⊂ C(n1, ..., nd). Then the canonical rank over F depends on F, while the effective rank does not.
This is close to the border-rank concept (Bini, Capovani), which still depends on F.
THEOREM 2 (reformulated)
Let a ∈ F(n1, ..., nd). Then for this tensor there exists a TT decomposition of rank r ≤ ERank(a) with the entries of all tensors belonging to F.
EXAMPLE 1
d-dimensional tensor in the matrix form
A = Λ ⊗ I ⊗ ... ⊗ I + I ⊗ Λ ⊗ ... ⊗ I + ... + I ⊗ ... ⊗ I ⊗ Λ
⇒   P(h) ≡ ⊗_{s=1}^{d} (I + hΛ) = I + hA + O(h²)
⇒   A = (1/h) P(h) − (1/h) P(0) + O(h)
⇒   ERank(A) = 2
EXAMPLE 2
Real-valued tensor F given by the function
f(x1, ..., xd) = sin(x1 + ... + xd)
on some 1D grids for x1, ..., xd.
Beylkin et al.: the canonical rank of F over R does not exceed d (and it is likely to be exactly d). However,
sin x = (exp(ix) − exp(−ix)) / (2i)
⇒
ERank(F) = 2
EXAMPLE 3
d-dimensional tensor A from discretization of operator
A = ∑_{1≤i≤j≤d} a_ij ∂²/(∂x_i ∂x_j)

on a tensor grid for variables x1, ..., xd.

Canonical rank ∼ d²/2.

However,

ERank(A) ≤ (3/2) d + 1
(N. Zamarashkin, I. Oseledets, E. Tyrtyshnikov)
TENSOR TRAIN DECOMPOSITION
a(i1, ..., id) = ∑_{α0,...,αd} g1(α0, i1, α1) g2(α1, i2, α2) ⋯ gd(αd−1, id, αd)
MATRIX FORM
a(i1, ..., id) = G1^{i1} G2^{i2} ⋯ Gd^{id}

MINIMAL TT COMPRESSION RANKS:

rk = rank Ak,   Ak = [a(i1...ik; ik+1...id)],   0 ≤ k ≤ d

size(Gk^{ik}) = rk−1 × rk
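In this matrix form, evaluating a single entry is just a chain of small matrix products, O(dr²) work per entry. A minimal sketch, assuming cores stored as r_{k−1} × n_k × r_k numpy arrays with r0 = rd = 1:

```python
import numpy as np

def tt_entry(cores, idx):
    """Evaluate a(i1, ..., id) = G1^{i1} G2^{i2} ... Gd^{id}.

    cores[k] has shape (r_{k-1}, n_k, r_k) with r_0 = r_d = 1, so the
    running product v stays a 1 x r_k row vector."""
    v = cores[0][:, idx[0], :]                # 1 x r_1
    for g, i in zip(cores[1:], idx[1:]):
        v = v @ g[:, i, :]                    # (1 x r_{k-1}) @ (r_{k-1} x r_k)
    return float(v[0, 0])

# Usage, e.g. with cores produced by the canonical_to_tt sketch above:
# print(tt_entry(cores, (0, 1, 2, 3)))
```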
THE KEY TO EVERYTHING
PROBLEM OF RECOMPRESSION:
Given a tensor train with unnecessarily large ranks,
find in its ε-vicinity a tensor train
with smaller compression ranks.
METHOD OF TT RECOMPRESSION (I. V. Oseledets):
• NUMBER OF OPERATIONS IS LINEAR IN THE DIMENSIONALITY d AND THE MODE SIZE n
• THE RESULT HAS GUARANTEED APPROXIMATION ACCURACY
METHOD OF TENSOR TRAIN RECOMPRESSION
Minimal TT compression ranks = ranks of the unfolding matrices Ak.
The matrices Ak are of size n^k × n^{d−k}, but they never appear as full arrays of n^d elements.
Nevertheless, the SVDs of the Ak are constructed, with the orthogonal (unitary) matrices kept in a compact factorized form.
When neglecting the smallest singular values, we provide GUARANTEED ACCURACY.
To show the idea, consider a TT decomposition
a(i1, i2, i3) = ∑_{α1,α2} g1(i1, α1) g2(α1, i2, α2) g3(α2, i3)
TENSOR TRAIN RECOMPRESSION
RIGHT TO LEFT by QR
a(i1, i2, i3) = ∑_{α1,α2} g1(i1, α1) g2(α1, i2, α2) g3(α2; i3)
= ∑_{α1,α′2} g1(i1, α1) g2(α1, i2; α′2) q3(α′2; i3)
= ∑_{α′1,α′2} g1(i1; α′1) q2(α′1, i2; α′2) q3(α′2; i3)

The matrices q2(α′1; i2, α′2) and q3(α′2; i3) obtain orthonormal rows:

g3(α2; i3) = ∑_{α′2} r3(α2; α′2) q3(α′2; i3)   (QR)
g2(α1, i2; α′2) = ∑_{α2} g2(α1, i2; α2) r3(α2, α′2)
g2(α1; i2, α′2) = ∑_{α′1} r2(α1; α′1) q2(α′1; i2, α′2)   (QR)
g1(i1; α′1) = ∑_{α1} g1(i1; α1) r2(α1; α′1)
TENSOR TRAIN RECOMPRESSION
LEFT TO RIGHT by SVD
a(i1, i2, i3) = ∑_{α′1,α′2} g1(i1; α′1) q2(α′1, i2, α′2) q3(α′2, i3)
= ∑_{α″1,α′2} z1(i1; α″1) g2(α″1; i2, α′2) q3(α′2, i3)
= ∑_{α″1,α″2} z1(i1; α″1) z2(α″1; i2, α″2) g3(α″2, i3)

The matrices z1(i1; α″1) and z2(α″1, i2; α″2) obtain orthonormal columns.
LEMMA ON ORTHONORMALITY
Let k ≤ l and let the matrices
qk(αk−1; ik, αk), ..., ql(αl−1; il, αl)
have orthonormal rows. Then the matrix
Qk(αk−1; ik, ..., il, αl) ≡ ∑_{αk,...,αl−1} qk(αk−1; ik, αk) ⋯ ql(αl−1; il, αl)
has orthonormal rows as well.

PROOF BY INDUCTION. Write i = (ik+1, ..., il, αl), so that
Qk(αk−1; ik, i) = ∑_{αk} qk(αk−1; ik, αk) Qk+1(αk; i)  ⇒
∑_{ik,i} Qk(α; ik, i) Qk(β; ik, i) = ∑_{ik,i} ∑_{μ,ν} qk(α; ik, μ) Qk+1(μ; i) qk(β; ik, ν) Qk+1(ν; i)
= ∑_{ik} ∑_{μ,ν} qk(α; ik, μ) qk(β; ik, ν) δ(μ, ν) = ∑_{ik,αk} qk(α; ik, αk) qk(β; ik, αk) = δ(α, β).
TENSOR TRAIN RECOMPRESSION
a(i1, i2, i3) = ∑_{α′1,α′2} g1(i1, α′1) q2(α′1, i2, α′2) q3(α′2, i3)
= ∑_{α″1,α′2} z1(i1, α″1) g2(α″1, i2, α′2) q3(α′2, i3)
= ∑_{α″1,α″2} z1(i1, α″1) z2(α″1, i2, α″2) g3(α″2, i3)

rank A1 = rank [g1(α″0, i1; α′1)]
rank A2 = rank [g2(α″1, i2; α′2)]
rank A3 = rank [g3(α″2, i3; α′3)]
• The complexity of computing the compression ranks is linear in d.
• "Truncation" is performed in SVDs of small matrices.
• NUMBER OF OPERATIONS = O(dnr³)
• GUARANTEED ACCURACY = √d · ε (in the Frobenius norm)
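Putting the two sweeps together gives the recompression procedure. The following numpy sketch is my illustration of this scheme, not the authors' implementation; it uses a per-core relative truncation threshold as a simplification of the √d · ε error control:

```python
import numpy as np

def tt_round(cores, eps=1e-10):
    """TT recompression: orthogonalize right-to-left by QR, then truncate
    left-to-right by SVD. cores[k] has shape (r_{k-1}, n_k, r_k)."""
    d = len(cores)
    cores = [g.copy() for g in cores]
    # Right-to-left sweep: make every core except the first row-orthonormal.
    for k in range(d - 1, 0, -1):
        r0, n, r1 = cores[k].shape
        # QR of the transposed unfolding gives a factor with orthonormal rows.
        q, rmat = np.linalg.qr(cores[k].reshape(r0, n * r1).T)
        cores[k] = q.T.reshape(-1, n, r1)
        cores[k - 1] = np.tensordot(cores[k - 1], rmat.T, axes=(2, 0))
    # Left-to-right sweep: truncated SVD of each small unfolding.
    for k in range(d - 1):
        r0, n, r1 = cores[k].shape
        u, s, vt = np.linalg.svd(cores[k].reshape(r0 * n, r1),
                                 full_matrices=False)
        r = max(1, int(np.sum(s > eps * s[0])))      # neglect small sing. values
        cores[k] = u[:, :r].reshape(r0, n, r)
        cores[k + 1] = np.tensordot(s[:r, None] * vt[:r], cores[k + 1],
                                    axes=(1, 0))
    return cores

# Usage: rounded = tt_round(cores)   # e.g. cores from canonical_to_tt above
```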
TT APPROXIMATION FOR LAPLACIAN
d      TT recompression time   Canonical rank   Compression rank
10     0.01 sec                10               2
20     0.09 sec                20               2
40     0.78 sec                40               2
80     13 sec                  80               2
160    152 sec                 160              2
200    248 sec                 200              2
1D grids are of size 32.
Tensor has modes of size n = 1024.
WHAT CAN WE DO WITH TENSOR TRAINS?
a(i1, ..., id) = ∑_{α1,...,αd−1} g1(i1, α1) g2(α1, i2, α2) ⋯ gd(αd−1, id)
• RECOMPRESSION: given a tensor train with TT-ranks r, we can approximate it by another tensor train with guaranteed accuracy using O(dnr³) operations.
• QUASI-OPTIMALITY OF RECOMPRESSION:
ERROR ≤ √(d − 1) · (BEST APPROXIMATION ERROR WITH THE SAME TT-RANKS)
• EFFICIENT APPROXIMATE MATRIX OPERATIONS
CANONICAL VERSUS TENSOR-TRAIN
                           Canonical          Tensor-Train
Number of parameters       O(dnR)             O(dnr + (d − 2)r³)
Matrix-by-vector           O(dn²R²)           O(dn²r² + dr⁶)
Addition                   O(dnR)             O(dnr)
Recompression              O(dnR² + d³R³)     O(dnr² + dr⁴)
Tensor-vector contraction  O(dnR)             O(dnr + dr³)
TENSOR-VECTOR CONTRACTION
γ = ∑_{i1,...,id} a(i1, ..., id) x1(i1) ⋯ xd(id)

ALGORITHM:
• Compute the matrices Zk = ∑_{ik} gk(αk−1, ik, αk) xk(ik).
• Multiply the matrices: γ = Z1 Z2 ⋯ Zd.

NUMBER OF OPERATIONS = O(dnr²)
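A direct numpy transcription of this algorithm (cores again assumed stored as r_{k−1} × n_k × r_k arrays with r0 = rd = 1):

```python
import numpy as np

def tt_contract(cores, xs):
    """gamma = sum over all indices of a(i1..id) * x1(i1) ... xd(id).

    Each Z_k = sum_i g_k(:, i, :) x_k(i) is an r_{k-1} x r_k matrix;
    gamma = Z_1 Z_2 ... Z_d."""
    z = np.tensordot(cores[0], xs[0], axes=(1, 0))   # 1 x r_1
    for g, x in zip(cores[1:], xs[1:]):
        z = z @ np.tensordot(g, x, axes=(1, 0))      # multiply the small matrices
    return float(z[0, 0])
```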
RECOVER A d-DIMENSIONAL TENSOR
FROM A “SMALL” PORTION OF ITS ELEMENTS
Given a procedure for computing any element a(i1, ..., id),
we need to choose the "true" elements and use them to construct a TT approximation for this tensor.
A TT decomposition with maximal compression rank r can be constructed from some O(dnr²) elements.
HOW THIS PROBLEM IS SOLVED FOR MATRICES
Let A be close to a matrix of rank r:
σ_{r+1}(A) ≤ ε.
Then there exists a cross of r columns C and r rows R such that
|(A − C G⁻¹ R)_{ij}| ≤ (r + 1) ε,
where G is the r × r matrix on the intersection of C and R.
Take G of maximal volume among all r × r submatrices of A.
S.A.Goreinov, E.E.Tyrtyshnikov:
The maximal-volume concept in approximation by low-rank matrices,
Contemporary Mathematics, Vol. 208 (2001), 47–51.
S.A.Goreinov, E.E.Tyrtyshnikov, N.L.Zamarashkin:
A theory of pseudo-skeleton approximations, Linear Algebra Appl.
261: 1–21 (1997). Doklady RAS (1995).
GOOD INSTEAD OF BEST: PSEUDO-MAX-VOLUME
Given A of size n × r, find a row permutation that moves a good submatrix into the upper r × r block. Since the volume does not change under right-side multiplications, assume that the first r rows of A form the identity, with the remaining rows
(a_{r+1,1} ... a_{r+1,r}), ..., (a_{n,1} ... a_{n,r})
below it.

NECESSARY FOR MAX-VOL: |a_{ij}| ≤ 1,  r + 1 ≤ i ≤ n,  1 ≤ j ≤ r
Let this define a good submatrix. Then here is an algorithm:
• If |aij| ≥ 1 + δ, then swap rows i and j.
• Make I in the first r rows by right-side multiplication.
• Check new |aij|. Quit if all are less than 1 + δ.
• Otherwise repeat.
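A hedged numpy sketch of this pseudo-max-volume iteration (initialization by a simple pivoting pass; the parameter names, iteration cap, and initialization are mine and assume a generic input matrix):

```python
import numpy as np

def maxvol(a, delta=1e-2, max_iter=200):
    """Pseudo-max-volume row selection in an n x r matrix a (n >= r).

    Greedy sketch: initial rows from Gaussian elimination with pivoting,
    then row swaps while some entry of a @ inv(a[rows]) exceeds 1 + delta
    in modulus (the stopping test from the slide)."""
    n, r = a.shape
    m = a.astype(float).copy()
    rows = np.empty(r, dtype=int)
    for j in range(r):                       # partial-pivoting initialization
        i = int(np.argmax(np.abs(m[:, j])))
        rows[j] = i
        m -= np.outer(m[:, j] / m[i, j], m[i, :])   # zero out column j and row i
    for _ in range(max_iter):
        b = a @ np.linalg.inv(a[rows])       # equals the identity on chosen rows
        i, j = np.unravel_index(np.argmax(np.abs(b)), b.shape)
        if abs(b[i, j]) <= 1 + delta:        # all entries small: submatrix is good
            break
        rows[j] = i                          # swap: volume grows by |b[i, j]| > 1
    return rows

# Usage: rows = maxvol(np.random.default_rng(3).standard_normal((1000, 10)))
```

Each accepted swap multiplies the volume of the selected submatrix by |b[i, j]| > 1, which is why the iteration terminates at a locally maximal-volume block.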
MATRIX CROSS ALGORITHM
• Assume we are given some initial column indices j1, ..., jr.
• Find maximal-volume row indices i1, ..., ir in these columns.
• Find maximal-volume column indices in the rows i1, ..., ir.
• Proceed choosing columns and rows until the skeleton cross approximations stabilize.

E.E.Tyrtyshnikov, Incomplete cross approximation in the mosaic-skeleton method, Computing 64, no. 4 (2000), 367–380.
TENSOR-TRAIN CROSS INTERPOLATION
Given a(i1, i2, i3, i4), consider the unfoldings and r-column sets:
A1 = [a(i1; i2, i3, i4)],   J1 = { (i2, i3, i4)^(β1) }
A2 = [a(i1, i2; i3, i4)],   J2 = { (i3, i4)^(β2) }
A3 = [a(i1, i2, i3; i4)],   J3 = { (i4)^(β3) }

Successively choose good rows:

I1 = { i1^(α1) } in a(i1; i2, i3, i4):   a = ∑_{α1} g1(i1; α1) a2(α1; i2, i3, i4)
I2 = { (i1, i2)^(α2) } in a2(α1, i2; i3, i4):   a2 = ∑_{α2} g2(α1, i2; α2) a3(α2; i3, i4)
I3 = { (i1, i2, i3)^(α3) } in a3(α2, i3; i4):   a3 = ∑_{α3} g3(α2, i3; α3) g4(α3; i4)

Finally,

a = ∑_{α1,α2,α3} g1(i1, α1) g2(α1, i2, α2) g3(α2, i3, α3) g4(α3, i4)
TT-CROSS INTERPOLATION OF A TENSOR
A tensor A of size n1 × n2 × ⋯ × nd with compression ranks
rk = rank Ak,   Ak = A(i1 i2 ... ik; ik+1 ... id),
is recovered from the elements of the TT-cross
Ck(αk−1, ik, βk) = A(i1^(αk−1), i2^(αk−1), ..., ik−1^(αk−1), ik, jk+1^(βk), ..., jd^(βk)).

The TT-cross is defined by the index sets
Ik = { (i1, ..., ik)^(αk) },  1 ≤ αk ≤ rk,
Jk = { (jk+1, ..., jd)^(βk) },  1 ≤ βk ≤ rk,
with a nested property for the α sets.

Require nonsingularity of the rk × rk matrices
Âk(αk, βk) = A(i1^(αk), ..., ik^(αk); jk+1^(βk), ..., jd^(βk)),   αk, βk = 1, ..., rk.
FORMULA FOR TT-INTERPOLATION
A(i1, i2, ..., id) = ∑_{α1,...,αd−1} Ĉ1(α0, i1, α1) Ĉ2(α1, i2, α2) ⋯ Ĉd(αd−1, id, αd)

Ĉk(αk−1, ik, αk) = ∑_{α′k} Ck(αk−1, ik, α′k) Âk⁻¹(α′k, αk),   k = 1, ..., d,   Âd = I
TENSOR-TRAIN CROSS ALGORITHM
• Assume we are given rk initial column indices (jk+1, ..., jd)^(βk) in the unfolding matrices Ak.
• Find rk maximal-volume rows in the submatrices of Ak of the form a(i1^(αk−1), ..., ik−1^(αk−1), ik; jk+1^(βk), ..., jd^(βk)).
• Use the row indices obtained and do the same from right to left to find new column indices.
• Proceed with these sweeps from left to right and from right to left.
• Stop when the tensor trains stabilize.
EXAMPLE OF TT-CROSS APPROXIMATION
HILBERT TENSOR
a(i1, i2, ..., id) = 1 / (i1 + i2 + ... + id)

d = 60, n = 32

rmax   Time     Iterations   Relative accuracy
2      1.37     5            1.897278e+00
3      4.22     7            5.949094e-02
4      7.19     7            2.226874e-02
5      15.42    9            2.706828e-03
6      21.82    9            1.782433e-04
7      29.62    9            2.151107e-05
8      38.12    9            4.650634e-06
9      48.97    9            5.233465e-07
10     59.14    9            6.552869e-08
11     72.14    9            7.915633e-09
12     75.27    8            2.814507e-09
COMPUTATION OF d-DIMENSIONAL INTEGRALS: example 1
I(d) = ∫_{[0,1]^d} sin(x1 + x2 + ... + xd) dx1 dx2 ... dxd
= Im ∫_{[0,1]^d} e^{i(x1 + x2 + ... + xd)} dx1 dx2 ... dxd = Im( ((e^i − 1)/i)^d )

Use the Chebyshev (Clenshaw–Curtis) quadrature with n = 11 nodes. All n^d values are NEVER COMPUTED!
Instead, we find a TT-cross and construct a TT approximation for this tensor.

d      I               Relative accuracy   Time
10     -6.299353e-01   1.409952e-15        0.14
100    -3.926795e-03   2.915654e-13        0.77
500    -7.287664e-10   2.370536e-12        4.64
1000   -2.637513e-19   3.482065e-11        11.60
2000   2.628834e-37    8.905594e-12        33.05
4000   9.400335e-74    2.284085e-10        105.49
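Since the integrand separates into a product over the coordinates, the closed form above is easy to sanity-check directly in plain Python, with no TT machinery:

```python
import cmath

def sin_integral(d):
    """Exact value I(d) = Im(((e^i - 1)/i)^d) of the integral above."""
    z = (cmath.exp(1j) - 1) / 1j     # the 1D factor: integral of e^{ix} over [0,1]
    return (z ** d).imag

print(sin_integral(10))    # -0.6299353... (the d = 10 row of the table)
print(sin_integral(100))   # -0.003926795...
```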
COMPUTATION OF d-DIMENSIONAL INTEGRALS: example 2
I(d) = ∫_{[0,1]^d} √(x1² + x2² + ... + xd²) dx1 dx2 ... dxd,   d = 100

Chebyshev quadrature with n = 41 nodes plus a TT-cross of size rmax = 32 gives a "reference solution". For comparison, take n = 11 nodes:

rmax   Relative accuracy   Time
2      1.747414e-01        1.76
4      2.823821e-03        11.52
8      4.178328e-05        42.76
10     3.875489e-07        66.28
12     2.560370e-07        94.39
14     4.922604e-08        127.60
16     9.789895e-10        167.02
18     1.166096e-10        211.09
20     2.706435e-11        260.13
INCREASE DIMENSIONALITY
(TENSORS INSTEAD OF MATRICES)

A matrix is a 2-way array.
A d-level matrix is naturally viewed as a 2d-way array:
A(i, j) = A(i1, i2, ..., id; j1, j2, ..., jd),   i ↔ (i1...id),   j ↔ (j1...jd)
It is important to consider the related reshaped array
B(i1j1, ..., idjd) = A(i1, i2, ..., id; j1, j2, ..., jd).
The matrix A is represented by the tensor B.
MINIMAL TENSOR TRAINS
a(i1...id; j1...jd) = ∑_{1≤αk≤rk} g1(i1j1, α1) g2(α1, i2j2, α2) ⋯ gd−1(αd−2, id−1jd−1, αd−1) gd(αd−1, idjd)

The minimal possible values of the compression ranks rk are equal to the ranks of specific unfolding matrices:
rk = rank Ak,   Ak = [A(i1j1, ..., ikjk; ik+1jk+1, ..., idjd)]
If all rk = 1 then
A = G1 ⊗ . . . ⊗ Gd
In general,
A = ∑_{α1,α2,α3,...} G1^{α1} ⊗ G2^{α1α2} ⊗ G3^{α2α3} ⊗ ⋯
NO CURSE OF DIMENSIONALITY
Let 1 ≤ ik, jk ≤ n and rk = r.
Then the number of representation parameters is dn²r².
Dependence on d is linear!
SO LET US MAKE d AS LARGE AS POSSIBLE
BY ADDING FICTITIOUS AXES
Assume we had d0 levels. If n = 2^{d1}, then set d = d0·d1. Then
memory = 4dr²,   d = log2(size(A)):
LOGARITHMIC IN THE SIZE OF THE MATRIX
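Such logarithmic compressibility can be verified numerically for modest n. A sketch that reshapes A into the 2d-way tensor B above and computes the unfolding ranks (it builds the full matrix, so it is only a verification tool; the tolerance is illustrative):

```python
import numpy as np

def qtt_ranks(A, eps=1e-9):
    """Ranks of the unfoldings B(i1 j1, ..., id jd) for a 2^d x 2^d matrix A."""
    d = int(np.log2(A.shape[0]))
    # Reshape to (i1..id, j1..jd), then interleave to (i1, j1, i2, j2, ...).
    b = A.reshape([2] * (2 * d))
    order = [ax for k in range(d) for ax in (k, d + k)]
    b = b.transpose(order).reshape([4] * d)       # fuse each pair (ik, jk)
    ranks = []
    for k in range(1, d):
        m = b.reshape(4 ** k, 4 ** (d - k))
        s = np.linalg.svd(m, compute_uv=False)
        ranks.append(int(np.sum(s > eps * s[0])))
    return ranks

# Cauchy-Toeplitz matrix from the next slide, n = 256:
n = 256
i, j = np.meshgrid(np.arange(n), np.arange(n), indexing='ij')
A = 1.0 / (i - j + 0.5)
print(qtt_ranks(A, eps=1e-5))   # small, nearly constant ranks
```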
CAUCHY–TOEPLITZ EXAMPLE
A = [ 1 / (i − j + 1/2) ]

Relative accuracy   Compression ranks for A and A⁻¹
1.e-5               3 7 8 8 8 7 7 7 3
1.e-7               3 7 9 10 10 9 9 7 3
1.e-9               3 7 11 11 11 11 11 7 3
1.e-11              3 7 12 13 13 13 12 7 3
1.e-13              3 7 14 14 15 14 14 7 3

n = 1024, d0 = 1, d1 = 10
INVERSES TO BANDED TOEPLITZ MATRICES
Let A be a banded Toeplitz matrix: A = [a(i − j)],
a_k = 0 for |k| > s, where s is the half-bandwidth.

THEOREM
Let size(A) = 2^d × 2^d and det A ≠ 0. Then
rk(A⁻¹) ≤ 4s² + 1,   k = 1, ..., d − 1,
the estimate being sharp.

COROLLARY
The inverse of a banded Toeplitz matrix A of size 2^d × 2^d with half-bandwidth s has a TT representation with O(s⁴ log2 n) parameters.
Using the Newton method with approximations, we obtain an inversion algorithm of complexity O(log2 n).
AVERAGE COMPRESSION RANK
r = √memory / (2√d)   ⇒   memory = 4dr²
INVERSION OF d0-DIMENSIONAL LAPLACIAN
BY MODIFIED NEWTON
d1 = 10
Physical dimensionality (= d0)                    1      3      5      10     30     50
Average compression rank of A                     2.8    3.5    3.6    3.7    3.8    3.8
Average compression rank of the approx. to A⁻¹    7.3    18.6   19.2   17.4   16.1   16.5
Time (sec)                                        2.     10.    17.    23.    27.    33.
‖AX − I‖ / ‖I‖                                    1.e-2  6.e-3  2.e-3  5.e-5  4.e-5  4.e-5

The last matrix size is 2^100.
INVERSION OF 10-DIMENSIONAL LAPLACIAN VIA INTEGRAL REPRESENTATION BY STENGER FORMULA

∫_0^∞ exp(−At) dt ≈ (h/τ) ∑_{k=−M}^{M} wk exp(−(tk/τ) A)

h = π/√M,   wk = tk = exp(hk),   λmin(A/τ) ≥ 1
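In the scalar case this is just the substitution t = e^s in ∫_0^∞ e^{−λt} dt = 1/λ, discretized by the trapezoidal rule; in the TT format the same sum is applied to the matrix exponentials exp(−(tk/τ)A). A quick plain-numpy check of the quoted nodes and weights (the value of M is illustrative):

```python
import numpy as np

def inv_by_exp_sums(lam, M=30):
    """Approximate 1/lambda by h * sum_k w_k exp(-t_k * lambda),
    with h = pi/sqrt(M), w_k = t_k = exp(h k), k = -M..M (lambda >= 1)."""
    h = np.pi / np.sqrt(M)
    k = np.arange(-M, M + 1)
    t = np.exp(h * k)
    return h * np.sum(t * np.exp(-t * lam))

for lam in (1.0, 5.0, 50.0):
    print(lam, inv_by_exp_sums(lam), abs(inv_by_exp_sums(lam) - 1 / lam))
```

The error decays like exp(−c√M), so moderate M already gives many accurate digits.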
CONCLUSIONS AND PERSPECTIVES
• Tensor-train decompositions and the corresponding algorithms (see http://pub.inm.ras.ru) provide us with excellent approximation tools for vectors and matrices. A TT-toolbox for Matlab is available: http://spring.inm.ras.ru/osel.
• The memory needed depends on the matrix size logarithmically. This is a terrific advantage when the compression ranks are small, which is exactly the case in many applications.
• Approximate inverses can be computed in the tensor-train format, generally with complexity logarithmic in the size of the matrix.
• Applications include huge-scale matrices (with sizes up to 2^100) as well as typical large-scale and even modest-scale matrices (like images).
• The key to efficient tensor-train operations is the recompression algorithm, with complexity O(dnr³), and the reliability of the SVD.
• The modified Newton method with truncations and integral representations of matrix functions are viable in the tensor-train format.
GOOD PERSPECTIVES
• Multi-variate interpolation (construction of tensor trains from a small portion of all elements; tensor cross methods using the maximal-volume concept).
• Fast computation of integrals in d dimensions (no Monte Carlo).
• Approximate matrix operations (e.g. inversion) with complexity O(log2 n):
  linear in d = linear in log2 n.
• A new direction in data compression and image processing (movies).
• Statistical interpretation of tensor trains.
• Applications to quantum chemistry, multi-parametric optimization, stochastic PDEs, data mining, etc.
MORE DETAILS and WORK IN PROGRESS
• I. V. Oseledets and E. E. Tyrtyshnikov, "Breaking the curse of dimensionality, or how to use SVD in many dimensions", Research Report 09-03, Hong Kong: ICM HKBU, 2009 (www.math.hkbu.edu.hk/ICM/pdf/09-03.pdf); SIAM J. Sci. Comput., 2009.
• I. Oseledets, "Compact matrix form of the d-dimensional tensor decomposition", SIAM J. Sci. Comput., 2009.
• I. V. Oseledets, "Tensors inside matrices give logarithmic complexity", SIAM J. Matrix Anal. Appl., 2009.
• I. V. Oseledets, "TT-Cross Approximation for Multidimensional Arrays", Research Report 09-11, Hong Kong: ICM HKBU, 2009 (www.math.hkbu.edu.hk/ICM/pdf/09-11.pdf); Linear Algebra Appl., 2009.
• I. Oseledets, E. E. Tyrtyshnikov, "On a recursive decomposition of multi-dimensional tensors", Doklady RAS, vol. 427, no. 2 (2009).
• I. Oseledets, "On a new tensor decomposition", Doklady RAS, vol. 427, no. 3 (2009).
• I. Oseledets, "On approximation of matrices with logarithmic number of parameters", Doklady RAS, vol. 427, no. 4 (2009).
• N. Zamarashkin, I. Oseledets, E. Tyrtyshnikov, "Tensor structure of the inverse of a banded Toeplitz matrix", Doklady RAS, vol. 427, no. 5 (2009).
• Effective ranks of tensors and stability of TT approximations; TTM for image processing; TT approximations in electronic structure calculations. In preparation.