TENSOR APPROXIMATION TOOLS
FREE OF THE CURSE OF DIMENSIONALITY
Eugene Tyrtyshnikov
Institute of Numerical Mathematics
Russian Academy of Sciences
(joint work with Ivan Oseledets)
WHAT ARE TENSORS?
Tensors = d-dimensional arrays:
A = [aij...k]
i ∈ I, j ∈ J, ... , k ∈ K
Tensor A has:
• dimensionality (order) d = number of indices (modes, axes, directions, ways)
• size n1 × ... × nd
(number of nodes along each axis)
WHAT IS THE PROBLEM?
THE NUMBER OF TENSOR ELEMENTS = n^d
GROWS EXPONENTIALLY IN d
WATER AND UNIVERSE
H2O molecule has 18 electrons. Each electron has 3 coordinates.
Thus we have 18 × 3 = 54 axes.
If we take 32 nodes on each axis, we obtain 32^54 ≈ 10^81 points,
which is close to the number of atoms in the universe.
CURSE OF DIMENSIONALITY
WE SURVIVE WITH
• COMPACT (LOW-PARAMETRIC) REPRESENTATIONS FOR TENSORS
• METHODS FOR COMPUTATIONS IN COMPACT REPRESENTATIONS
TUCKER DECOMPOSITION
a(i1, ..., id) = ∑_{α1=1}^{r1} ⋯ ∑_{αd=1}^{rd} g(α1, ..., αd) q1(i1, α1) ⋯ qd(id, αd)
L. R. Tucker, Some mathematical notes on three-mode factor analysis,
Psychometrika, V. 31, P. 279–311 (1966).
COMPONENTS:
• 2D arrays q1, ..., qd with dnr entries
• d-dimensional array g(α1, ..., αd) with r^d entries
CURSE OF DIMENSIONALITY REMAINS
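Tucker factors are commonly computed by the higher-order SVD: truncated SVDs of the mode unfoldings give the factors qk, and the core g is obtained by contracting them out. A minimal numpy sketch (not the talk's own code; the function name and tolerance are illustrative):

```python
import numpy as np

def hosvd(a, eps=1e-10):
    """Tucker decomposition via higher-order SVD (illustrative sketch).

    Returns factors q[k] of shape (n_k, r_k) and the core g of shape
    (r_1, ..., r_d) such that a is recovered by mode products with the q[k]."""
    d = a.ndim
    q, g = [], a.copy()
    for k in range(d):
        # Mode-k unfolding: rows indexed by i_k, columns by all other indices.
        ak = np.moveaxis(a, k, 0).reshape(a.shape[k], -1)
        u, s, _ = np.linalg.svd(ak, full_matrices=False)
        r = max(1, int(np.sum(s > eps * s[0])))      # truncation rank
        q.append(u[:, :r])
        # Contract q[k] out of the core along mode k.
        g = np.moveaxis(np.tensordot(q[k].T, g, axes=(1, k)), 0, k)
    return q, g

# Example: a 10x10x10 tensor of multilinear rank (2,2,2) is recovered exactly.
rng = np.random.default_rng(0)
a = np.einsum('ia,ja,ka->ijk', rng.standard_normal((10, 2)),
              rng.standard_normal((10, 2)), rng.standard_normal((10, 2)))
q, g = hosvd(a)
print([f.shape for f in q], g.shape)   # [(10, 2), (10, 2), (10, 2)] (2, 2, 2)
```

Note that the core g still has r^d entries, which is exactly the curse stated above.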
CANONICAL DECOMPOSITION (PARAFAC, CANDECOMP)
a(i1, ..., id) = ∑_{α=1}^{R} u1(i1, α) ⋯ ud(id, α)
Number of defining parameters is dRn.
DRAWBACKS:
• INSTABILITY (cf. Lim, de Silva)
Take x1, ..., xd, y1, ..., yd linearly independent and set

a = ∑_{t=1}^{d} z1^t ⊗ ⋯ ⊗ zd^t,   zk^t = xk for k ≠ t,  zk^t = yk for k = t.

Then

a = (1/ε)(x1 + εy1) ⊗ ⋯ ⊗ (xd + εyd) − (1/ε) x1 ⊗ ⋯ ⊗ xd + O(ε),

so this rank-d tensor is approximated with O(ε) error by tensors of rank 2, and a best canonical approximation of fixed rank may not exist.
• EVENTUAL LACK OF ROBUST ALGORITHMS
a(i1, ..., id) = ∑_{α1=1}^{r1} ⋯ ∑_{αd=1}^{rd} g(α1, ..., αd) q1(i1, α1) ⋯ qd(id, αd)
TUCKER DECOMPOSITION
a(i1, ..., id) = ∑_{α1,...,αd−1} g1(i1, α1) g2(α1, i2, α2) ⋯ gd−1(αd−2, id−1, αd−1) gd(αd−1, id)
TENSOR-TRAIN DECOMPOSITION
TENSORS AND MATRICES
Let A = [aijklm].
Take a pair of mutually complementary long indices:
(ij) and (klm)
(kl) and (ijm)
.........
Tensor A gives rise to unfolding matrices:
B1 = [b(ij),(klm)]
B2 = [b(kl),(ijm)]
.........
By definition,
b(ij),(klm) = b(kl),(ijm) = ... = aijklm
DIMENSIONALITY CAN BE DECREASED
a(i1, ..., id) = a(i1, ..., ik; ik+1, ..., id) = ∑_{s=1}^{r} u(i1, ..., ik; s) v(ik+1, ..., id; s)
Dimension d reduces to dimensions k + 1 and d − k + 1.
Proceed by recursion.
Binary tree arises.
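Each recursion step is an ordinary low-rank factorization of one unfolding matrix. A minimal numpy sketch of a single split (the naming and tolerance are mine):

```python
import numpy as np

def split(a, k, eps=1e-10):
    """Split a d-way tensor along (i_1..i_k | i_{k+1}..i_d) by a truncated SVD.

    Returns u of shape (n_1, ..., n_k, r) and v of shape (n_{k+1}, ..., n_d, r)
    with a(i_1..i_d) ~= sum_s u(i_1..i_k, s) * v(i_{k+1}..i_d, s)."""
    left, right = a.shape[:k], a.shape[k:]
    m = a.reshape(int(np.prod(left)), int(np.prod(right)))
    u, s, vt = np.linalg.svd(m, full_matrices=False)
    r = max(1, int(np.sum(s > eps * s[0])))
    u = (u[:, :r] * s[:r]).reshape(*left, r)     # (k+1)-way tensor
    v = vt[:r].T.reshape(*right, r)              # (d-k+1)-way tensor
    return u, v

rng = np.random.default_rng(1)
a = rng.standard_normal((4, 5, 6, 7))
u, v = split(a, 2)
print(u.shape, v.shape)                          # (4, 5, 20) and (6, 7, 20)
check = np.einsum('ijs,kls->ijkl', u, v)
print(np.linalg.norm(check - a) / np.linalg.norm(a))   # ~1e-15
```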
TUCKER VIA RECURSION
[Diagram: binary tree recursively splitting the index set {i1, i2, i3, i4, i5}; each split introduces an auxiliary index αk.]
a(i1, i2, i3, i4, i5) = ∑_{α1,...,α5} g(α1, α2, α3, α4, α5) q1(i1, α1) q2(i2, α2) q3(i3, α3) q4(i4, α4) q5(i5, α5)
BINARY TREE IMPLIES
• Any auxiliary index belongs to exactly two leaf tensors.
• The tensor is the sum over all auxiliary indices of the product of elements of the leaf tensors.
HOW TO AVOID r^d PARAMETERS
• Let any leaf tensor have at most one spatial index.
• Let any leaf tensor have at most two (three) auxiliary indices.
TREE WITHOUT TUCKER
[Diagram: binary tree over {i1, ..., i5} in which every leaf tensor carries at most one spatial index and at most two auxiliary indices.]
TENSOR-TRAIN DECOMPOSITION
a(i1, i2, i3, i4, i5) = ∑_{α1,α2,α3,α4} g1(i1, α1) g2(α1, i3, α3) g3(α3, i5, α4) g4(α4, i4, α2) g5(α2, i2)
HOW MANY PARAMETERS
NUMBER OF TT PARAMETERS = 2nr + (d − 2)nr²
EXTENDED TT DECOMPOSITION
[Diagram: extended binary tree over {i1, ..., i5} with internal (transfer) tensors carrying three auxiliary indices.]
NUMBER OF EXTENDED TT PARAMETERS = dnr + (d − 2)r³
TREE IS NOT NEEDED!
ALL IS DEFINED BY A PERMUTATION OF SPATIAL INDICES
TENSOR-TRAIN DECOMPOSITION
a(i1, i2, i3, i4, i5) = ∑_{β1,β2,β3,β4} g1(iσ(1), β1) g2(β1, iσ(2), β2) g3(β2, iσ(3), β3) g4(β3, iσ(4), β4) g5(β4, iσ(5))
TT = Tree–Tucker ⇒ neither Tree, nor Tucker ⇒ TENSOR TRAIN
MINIMAL TT DECOMPOSITION
Let 1 ≤ βk ≤ rk.
What are minimal values for compression ranks rk?
rk ≥ rank Aσk,   Aσk = [aσ(iσ(1), ..., iσ(k); iσ(k+1), ..., iσ(d))],

where aσ(iσ(1), ..., iσ(d)) = a(i1, ..., id).
GENERAL PROPERTIES
THEOREM 1.
Assume that a tensor a(i1, ..., id) possesses a canonical decomposition with R terms. Then a(i1, ..., id) admits a TT decomposition of rank R or less.
THEOREM 2.
Assume that a tensor a(i1, ..., id), when ε-perturbed with arbitrarily small ε, possesses a canonical decomposition with R terms. Then a(i1, ..., id) admits a TT decomposition of rank R or less.
FROM CANONICAL TO TENSOR TRAIN
a(i1, ..., id) = ∑_{s=1}^{R} u1(i1, s) ⋯ ud(id, s)
= ∑_{α1,...,αd−1} u1(i1, α1) δ(α1, α2) u2(i2, α2) ⋯ δ(αd−2, αd−1) ud−1(id−1, αd−1) ud(id, αd−1)
FREE!
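The conversion really is free of arithmetic: the TT cores are just the canonical factors padded with Kronecker deltas. A small numpy sketch under my own naming, with cores stored as r_{k−1} × n_k × r_k arrays:

```python
import numpy as np

def canonical_to_tt(us):
    """Convert canonical factors us[k] of shape (n_k, R) into TT cores.

    All interior TT-ranks equal R:
    g_k(alpha, i, beta) = u_k(i, alpha) * delta(alpha, beta)."""
    d, R = len(us), us[0].shape[1]
    cores = [us[0].reshape(1, -1, R)]                      # first core: 1 x n x R
    eye = np.eye(R)
    for k in range(1, d - 1):
        cores.append(np.einsum('ib,ab->aib', us[k], eye))  # R x n x R, diagonal
    cores.append(us[-1].T.reshape(R, -1, 1))               # last core: R x n x 1
    return cores

# Example: d = 4, n = 6, R = 3; compare against the canonical sum.
rng = np.random.default_rng(2)
us = [rng.standard_normal((6, 3)) for _ in range(4)]
cores = canonical_to_tt(us)
full_tt = cores[0]
for g in cores[1:]:                                        # contract the train
    full_tt = np.tensordot(full_tt, g, axes=(-1, 0))
full_tt = full_tt.squeeze(axis=(0, -1))
full_cp = np.einsum('ia,ja,ka,la->ijkl', *us)
print(np.linalg.norm(full_tt - full_cp))                   # ~1e-14
```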
EFFECTIVE RANK OF A TENSOR
ERank(a) = lim sup_{ε→+0}  min { rank(b) : |b − a| ≤ ε,  b ∈ C(n1, ..., nd) }
F(n1, ..., nd): all tensors of size n1 × ... × nd with entries from F.
Let a ∈ F(n1, ..., nd) ⊂ C(n1, ..., nd). Then the canonical rank over F depends on F, while the effective rank does not.
This is close to the border-rank concept (Bini, Capovani), which still depends on F.
THEOREM 2 (reformulated)
Let a ∈ F(n1, ..., nd). Then for this tensor there exists a TT decomposition of rank r ≤ ERank(a) with the entries of all tensors belonging to F.
EXAMPLE 1
d-dimensional tensor in the matrix form
A = Λ ⊗ I ⊗ ... ⊗ I + I ⊗ Λ ⊗ ... ⊗ I + ... + I ⊗ ... ⊗ I ⊗ Λ
⇒   P(h) ≡ ⊗_{s=1}^{d} (I + hΛ) = I + hA + O(h²)
⇒   A = (1/h) P(h) − (1/h) P(0) + O(h)
⇒   ERank(A) = 2
EXAMPLE 2
Real-valued tensor F given by the function
f(x1, ..., xd) = sin(x1 + ... + xd)
on some 1D grids for x1, ..., xd.
Beylkin et al.: the canonical rank of F over R does not exceed d (and it is likely to be exactly d). However,
sin x = (exp(ix) − exp(−ix)) / (2i)
⇒
ERank(F) = 2
EXAMPLE 3
d-dimensional tensor A from discretization of operator
A = ∑_{1≤i≤j≤d} a_ij ∂²/(∂x_i ∂x_j)

on a tensor grid for variables x1, ..., xd.

Canonical rank ∼ d²/2.

However,

ERank(A) ≤ (3/2) d + 1
(N. Zamarashkin, I. Oseledets, E. Tyrtyshnikov)
TENSOR TRAIN DECOMPOSITION
a(i1, ..., id) = ∑_{α0,...,αd} g1(α0, i1, α1) g2(α1, i2, α2) ⋯ gd(αd−1, id, αd)
MATRIX FORM
a(i1, ..., id) = G1^{i1} G2^{i2} ⋯ Gd^{id}

MINIMAL TT COMPRESSION RANKS:

rk = rank Ak,   Ak = [a(i1...ik; ik+1...id)],   0 ≤ k ≤ d

size(Gk^{ik}) = rk−1 × rk
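In this matrix form, evaluating a single entry is just a chain of small matrix products, O(dr²) work per entry. A minimal sketch, assuming cores stored as r_{k−1} × n_k × r_k numpy arrays with r0 = rd = 1:

```python
import numpy as np

def tt_entry(cores, idx):
    """Evaluate a(i1, ..., id) = G1^{i1} G2^{i2} ... Gd^{id}.

    cores[k] has shape (r_{k-1}, n_k, r_k) with r_0 = r_d = 1, so the
    running product v stays a 1 x r_k row vector."""
    v = cores[0][:, idx[0], :]                # 1 x r_1
    for g, i in zip(cores[1:], idx[1:]):
        v = v @ g[:, i, :]                    # (1 x r_{k-1}) @ (r_{k-1} x r_k)
    return float(v[0, 0])

# Usage, e.g. with cores produced by the canonical_to_tt sketch above:
# print(tt_entry(cores, (0, 1, 2, 3)))
```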
THE KEY TO EVERYTHING
PROBLEM OF RECOMPRESSION:
Given a tensor train with unnecessarily large ranks,
find in its ε-vicinity a tensor train
with smaller compression ranks.
METHOD OF TT RECOMPRESSION (I. V. Oseledets):
• NUMBER OF OPERATIONS IS LINEAR IN THE DIMENSIONALITY d AND THE MODE SIZE n
• THE RESULT HAS GUARANTEED APPROXIMATION ACCURACY
METHOD OF TENSOR TRAIN RECOMPRESSION
Minimal TT compression ranks = ranks of the unfolding matrices Ak.
The matrices Ak are of size n^k × n^{d−k}, but they never appear as full arrays of n^d elements.
Nevertheless, the SVDs of the Ak are constructed, with the orthogonal (unitary) matrices kept in a compact factorized form.
When neglecting the smallest singular values, we provide GUARANTEED ACCURACY.
To show the idea, consider a TT decomposition
a(i1, i2, i3) = ∑_{α1,α2} g1(i1, α1) g2(α1, i2, α2) g3(α2, i3)
TENSOR TRAIN RECOMPRESSION
RIGHT TO LEFT by QR
a(i1, i2, i3) = ∑_{α1,α2} g1(i1, α1) g2(α1, i2, α2) g3(α2; i3)
= ∑_{α1,α′2} g1(i1, α1) g2(α1, i2; α′2) q3(α′2; i3)
= ∑_{α′1,α′2} g1(i1; α′1) q2(α′1, i2; α′2) q3(α′2; i3)

The matrices q2(α′1; i2, α′2) and q3(α′2; i3) obtain orthonormal rows:

g3(α2; i3) = ∑_{α′2} r3(α2; α′2) q3(α′2; i3)   (QR)
g2(α1, i2; α′2) = ∑_{α2} g2(α1, i2; α2) r3(α2, α′2)
g2(α1; i2, α′2) = ∑_{α′1} r2(α1; α′1) q2(α′1; i2, α′2)   (QR)
g1(i1; α′1) = ∑_{α1} g1(i1; α1) r2(α1; α′1)
TENSOR TRAIN RECOMPRESSION
LEFT TO RIGHT by SVD
a(i1, i2, i3) = ∑_{α′1,α′2} g1(i1; α′1) q2(α′1, i2, α′2) q3(α′2, i3)
= ∑_{α″1,α′2} z1(i1; α″1) g2(α″1; i2, α′2) q3(α′2, i3)
= ∑_{α″1,α″2} z1(i1; α″1) z2(α″1; i2, α″2) g3(α″2, i3)

The matrices z1(i1; α″1) and z2(α″1, i2; α″2) obtain orthonormal columns.
LEMMA ON ORTHONORMALITY
Let k ≤ l and let the matrices
qk(αk−1; ik, αk), ..., ql(αl−1; il, αl)
have orthonormal rows. Then the matrix
Qk(αk−1; ik, ..., il, αl) ≡ ∑_{αk,...,αl−1} qk(αk−1; ik, αk) ⋯ ql(αl−1; il, αl)
has orthonormal rows as well.

PROOF BY INDUCTION. Write i = (ik+1, ..., il, αl), so that
Qk(αk−1; ik, i) = ∑_{αk} qk(αk−1; ik, αk) Qk+1(αk; i)  ⇒
∑_{ik,i} Qk(α; ik, i) Qk(β; ik, i) = ∑_{ik,i} ∑_{μ,ν} qk(α; ik, μ) Qk+1(μ; i) qk(β; ik, ν) Qk+1(ν; i)
= ∑_{ik} ∑_{μ,ν} qk(α; ik, μ) qk(β; ik, ν) δ(μ, ν) = ∑_{ik,αk} qk(α; ik, αk) qk(β; ik, αk) = δ(α, β).
TENSOR TRAIN RECOMPRESSION
a(i1, i2, i3) = ∑_{α′1,α′2} g1(i1, α′1) q2(α′1, i2, α′2) q3(α′2, i3)
= ∑_{α″1,α′2} z1(i1, α″1) g2(α″1, i2, α′2) q3(α′2, i3)
= ∑_{α″1,α″2} z1(i1, α″1) z2(α″1, i2, α″2) g3(α″2, i3)

rank A1 = rank [g1(α″0, i1; α′1)]
rank A2 = rank [g2(α″1, i2; α′2)]
rank A3 = rank [g3(α″2, i3; α′3)]
• The complexity of computing the compression ranks is linear in d.
• "Truncation" is performed in SVDs of small matrices.
• NUMBER OF OPERATIONS = O(dnr³)
• GUARANTEED ACCURACY = √d · ε (in the Frobenius norm)
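Putting the two sweeps together gives the recompression procedure. The following numpy sketch is my illustration of this scheme, not the authors' implementation; it uses a per-core relative truncation threshold as a simplification of the √d · ε error control:

```python
import numpy as np

def tt_round(cores, eps=1e-10):
    """TT recompression: orthogonalize right-to-left by QR, then truncate
    left-to-right by SVD. cores[k] has shape (r_{k-1}, n_k, r_k)."""
    d = len(cores)
    cores = [g.copy() for g in cores]
    # Right-to-left sweep: make every core except the first row-orthonormal.
    for k in range(d - 1, 0, -1):
        r0, n, r1 = cores[k].shape
        # QR of the transposed unfolding gives a factor with orthonormal rows.
        q, rmat = np.linalg.qr(cores[k].reshape(r0, n * r1).T)
        cores[k] = q.T.reshape(-1, n, r1)
        cores[k - 1] = np.tensordot(cores[k - 1], rmat.T, axes=(2, 0))
    # Left-to-right sweep: truncated SVD of each small unfolding.
    for k in range(d - 1):
        r0, n, r1 = cores[k].shape
        u, s, vt = np.linalg.svd(cores[k].reshape(r0 * n, r1),
                                 full_matrices=False)
        r = max(1, int(np.sum(s > eps * s[0])))      # neglect small sing. values
        cores[k] = u[:, :r].reshape(r0, n, r)
        cores[k + 1] = np.tensordot(s[:r, None] * vt[:r], cores[k + 1],
                                    axes=(1, 0))
    return cores

# Usage: rounded = tt_round(cores)   # e.g. cores from canonical_to_tt above
```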
TT APPROXIMATION FOR LAPLACIAN
d      TT recompression time   Canonical rank   Compression rank
10     0.01 sec                10               2
20     0.09 sec                20               2
40     0.78 sec                40               2
80     13 sec                  80               2
160    152 sec                 160              2
200    248 sec                 200              2
1D grids are of size 32.
Tensor has modes of size n = 1024.
WHAT CAN WE DO WITH TENSOR TRAINS?
a(i1, ..., id) = ∑_{α1,...,αd−1} g1(i1, α1) g2(α1, i2, α2) ⋯ gd(αd−1, id)
• RECOMPRESSION: given a tensor train with TT-ranks r, we can approximate it by another tensor train with guaranteed accuracy using O(dnr³) operations.
• QUASI-OPTIMALITY OF RECOMPRESSION:
ERROR ≤ √(d − 1) · (BEST APPROXIMATION ERROR WITH THE SAME TT-RANKS)
• EFFICIENT APPROXIMATE MATRIX OPERATIONS
CANONICAL VERSUS TENSOR-TRAIN
                           Canonical          Tensor-Train
Number of parameters       O(dnR)             O(dnr + (d − 2)r³)
Matrix-by-vector           O(dn²R²)           O(dn²r² + dr⁶)
Addition                   O(dnR)             O(dnr)
Recompression              O(dnR² + d³R³)     O(dnr² + dr⁴)
Tensor-vector contraction  O(dnR)             O(dnr + dr³)
TENSOR-VECTOR CONTRACTION
γ = ∑_{i1,...,id} a(i1, ..., id) x1(i1) ⋯ xd(id)

ALGORITHM:
• Compute the matrices Zk = ∑_{ik} gk(αk−1, ik, αk) xk(ik).
• Multiply the matrices: γ = Z1 Z2 ⋯ Zd.

NUMBER OF OPERATIONS = O(dnr²)
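A direct numpy transcription of this algorithm (cores again assumed stored as r_{k−1} × n_k × r_k arrays with r0 = rd = 1):

```python
import numpy as np

def tt_contract(cores, xs):
    """gamma = sum over all indices of a(i1..id) * x1(i1) ... xd(id).

    Each Z_k = sum_i g_k(:, i, :) x_k(i) is an r_{k-1} x r_k matrix;
    gamma = Z_1 Z_2 ... Z_d."""
    z = np.tensordot(cores[0], xs[0], axes=(1, 0))   # 1 x r_1
    for g, x in zip(cores[1:], xs[1:]):
        z = z @ np.tensordot(g, x, axes=(1, 0))      # multiply the small matrices
    return float(z[0, 0])
```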
RECOVER A d-DIMENSIONAL TENSOR
FROM A “SMALL” PORTION OF ITS ELEMENTS
Given a procedure for computing any element a(i1, ..., id),
we need to choose the "true" elements and use them to construct a TT approximation for this tensor.
A TT decomposition with maximal compression rank r can be constructed from some O(dnr²) elements.
HOW THIS PROBLEM IS SOLVED FOR MATRICES
Let A be close to a matrix of rank r:
σ_{r+1}(A) ≤ ε.
Then there exists a cross of r columns C and r rows R such that
|(A − C G⁻¹ R)_{ij}| ≤ (r + 1) ε,
where G is the r × r matrix on the intersection of C and R.
Take G of maximal volume among all r × r submatrices of A.
S.A.Goreinov, E.E.Tyrtyshnikov:
The maximal-volume concept in approximation by low-rank matrices,
Contemporary Mathematics, Vol. 208 (2001), 47–51.
S.A.Goreinov, E.E.Tyrtyshnikov, N.L.Zamarashkin:
A theory of pseudo-skeleton approximations, Linear Algebra Appl.
261: 1–21 (1997). Doklady RAS (1995).
GOOD INSTEAD OF BEST: PSEUDO-MAX-VOLUME
Given A of size n × r, find a row permutation that moves a good submatrix into the upper r × r block. Since the volume does not change under right-side multiplications, assume that the first r rows of A form the identity, with the remaining rows
(a_{r+1,1} ... a_{r+1,r}), ..., (a_{n,1} ... a_{n,r})
below it.

NECESSARY FOR MAX-VOL: |a_{ij}| ≤ 1,  r + 1 ≤ i ≤ n,  1 ≤ j ≤ r
Let this define a good submatrix. Then here is an algorithm:
• If |aij| ≥ 1 + δ, then swap rows i and j.
• Make I in the first r rows by right-side multiplication.
• Check new |aij|. Quit if all are less than 1 + δ.
• Otherwise repeat.
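A hedged numpy sketch of this pseudo-max-volume iteration (initialization by a simple pivoting pass; the parameter names, iteration cap, and initialization are mine and assume a generic input matrix):

```python
import numpy as np

def maxvol(a, delta=1e-2, max_iter=200):
    """Pseudo-max-volume row selection in an n x r matrix a (n >= r).

    Greedy sketch: initial rows from Gaussian elimination with pivoting,
    then row swaps while some entry of a @ inv(a[rows]) exceeds 1 + delta
    in modulus (the stopping test from the slide)."""
    n, r = a.shape
    m = a.astype(float).copy()
    rows = np.empty(r, dtype=int)
    for j in range(r):                       # partial-pivoting initialization
        i = int(np.argmax(np.abs(m[:, j])))
        rows[j] = i
        m -= np.outer(m[:, j] / m[i, j], m[i, :])   # zero out column j and row i
    for _ in range(max_iter):
        b = a @ np.linalg.inv(a[rows])       # equals the identity on chosen rows
        i, j = np.unravel_index(np.argmax(np.abs(b)), b.shape)
        if abs(b[i, j]) <= 1 + delta:        # all entries small: submatrix is good
            break
        rows[j] = i                          # swap: volume grows by |b[i, j]| > 1
    return rows

# Usage: rows = maxvol(np.random.default_rng(3).standard_normal((1000, 10)))
```

Each accepted swap multiplies the volume of the selected submatrix by |b[i, j]| > 1, which is why the iteration terminates at a locally maximal-volume block.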
MATRIX CROSS ALGORITHM
• Assume we are given some initial column indices j1, ..., jr.
• Find maximal-volume row indices i1, ..., ir in these columns.
• Find maximal-volume column indices in the rows i1, ..., ir.
• Proceed choosing columns and rows until the skeleton cross approximations stabilize.

E.E.Tyrtyshnikov, Incomplete cross approximation in the mosaic-skeleton method, Computing 64, no. 4 (2000), 367–380.
TENSOR-TRAIN CROSS INTERPOLATION
Given a(i1, i2, i3, i4), consider the unfoldings and r-column sets:
A1 = [a(i1; i2, i3, i4)],   J1 = { (i2, i3, i4)^(β1) }
A2 = [a(i1, i2; i3, i4)],   J2 = { (i3, i4)^(β2) }
A3 = [a(i1, i2, i3; i4)],   J3 = { (i4)^(β3) }

Successively choose good rows:

I1 = { i1^(α1) } in a(i1; i2, i3, i4):   a = ∑_{α1} g1(i1; α1) a2(α1; i2, i3, i4)
I2 = { (i1, i2)^(α2) } in a2(α1, i2; i3, i4):   a2 = ∑_{α2} g2(α1, i2; α2) a3(α2; i3, i4)
I3 = { (i1, i2, i3)^(α3) } in a3(α2, i3; i4):   a3 = ∑_{α3} g3(α2, i3; α3) g4(α3; i4)

Finally,

a = ∑_{α1,α2,α3} g1(i1, α1) g2(α1, i2, α2) g3(α2, i3, α3) g4(α3, i4)
TT-CROSS INTERPOLATION OF A TENSOR
A tensor A of size n1 × n2 × ⋯ × nd with compression ranks
rk = rank Ak,   Ak = A(i1 i2 ... ik; ik+1 ... id),
is recovered from the elements of the TT-cross
Ck(αk−1, ik, βk) = A(i1^(αk−1), i2^(αk−1), ..., ik−1^(αk−1), ik, jk+1^(βk), ..., jd^(βk)).

The TT-cross is defined by the index sets
Ik = { (i1, ..., ik)^(αk) },  1 ≤ αk ≤ rk,
Jk = { (jk+1, ..., jd)^(βk) },  1 ≤ βk ≤ rk,
with a nested property for the α sets.

Require nonsingularity of the rk × rk matrices
Âk(αk, βk) = A(i1^(αk), ..., ik^(αk); jk+1^(βk), ..., jd^(βk)),   αk, βk = 1, ..., rk.
FORMULA FOR TT-INTERPOLATION
A(i1, i2, ..., id) = ∑_{α1,...,αd−1} Ĉ1(α0, i1, α1) Ĉ2(α1, i2, α2) ⋯ Ĉd(αd−1, id, αd)

Ĉk(αk−1, ik, αk) = ∑_{α′k} Ck(αk−1, ik, α′k) Âk⁻¹(α′k, αk),   k = 1, ..., d,   Âd = I
TENSOR-TRAIN CROSS ALGORITHM
• Assume we are given rk initial column indices (jk+1, ..., jd)^(βk) in the unfolding matrices Ak.
• Find rk maximal-volume rows in the submatrices of Ak of the form a(i1^(αk−1), ..., ik−1^(αk−1), ik; jk+1^(βk), ..., jd^(βk)).
• Use the row indices obtained and do the same from right to left to find new column indices.
• Proceed with these sweeps from left to right and from right to left.
• Stop when the tensor trains stabilize.
EXAMPLE OF TT-CROSS APPROXIMATION
HILBERT TENSOR
a(i1, i2, ..., id) = 1 / (i1 + i2 + ... + id)

d = 60, n = 32

rmax   Time     Iterations   Relative accuracy
2      1.37     5            1.897278e+00
3      4.22     7            5.949094e-02
4      7.19     7            2.226874e-02
5      15.42    9            2.706828e-03
6      21.82    9            1.782433e-04
7      29.62    9            2.151107e-05
8      38.12    9            4.650634e-06
9      48.97    9            5.233465e-07
10     59.14    9            6.552869e-08
11     72.14    9            7.915633e-09
12     75.27    8            2.814507e-09
COMPUTATION OF d-DIMENSIONAL INTEGRALS: example 1
I(d) = ∫_{[0,1]^d} sin(x1 + x2 + ... + xd) dx1 dx2 ... dxd
= Im ∫_{[0,1]^d} e^{i(x1 + x2 + ... + xd)} dx1 dx2 ... dxd = Im( ((e^i − 1)/i)^d )

Use the Chebyshev (Clenshaw–Curtis) quadrature with n = 11 nodes. All n^d values are NEVER COMPUTED!
Instead, we find a TT-cross and construct a TT approximation for this tensor.

d      I               Relative accuracy   Time
10     -6.299353e-01   1.409952e-15        0.14
100    -3.926795e-03   2.915654e-13        0.77
500    -7.287664e-10   2.370536e-12        4.64
1000   -2.637513e-19   3.482065e-11        11.60
2000   2.628834e-37    8.905594e-12        33.05
4000   9.400335e-74    2.284085e-10        105.49
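Since the integrand separates into a product over the coordinates, the closed form above is easy to sanity-check directly in plain Python, with no TT machinery:

```python
import cmath

def sin_integral(d):
    """Exact value I(d) = Im(((e^i - 1)/i)^d) of the integral above."""
    z = (cmath.exp(1j) - 1) / 1j     # the 1D factor: integral of e^{ix} over [0,1]
    return (z ** d).imag

print(sin_integral(10))    # -0.6299353... (the d = 10 row of the table)
print(sin_integral(100))   # -0.003926795...
```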
COMPUTATION OF d-DIMENSIONAL INTEGRALS: example 2
I(d) = ∫_{[0,1]^d} √(x1² + x2² + ... + xd²) dx1 dx2 ... dxd,   d = 100

Chebyshev quadrature with n = 41 nodes plus a TT-cross of size rmax = 32 gives a "reference solution". For comparison, take n = 11 nodes:

rmax   Relative accuracy   Time
2      1.747414e-01        1.76
4      2.823821e-03        11.52
8      4.178328e-05        42.76
10     3.875489e-07        66.28
12     2.560370e-07        94.39
14     4.922604e-08        127.60
16     9.789895e-10        167.02
18     1.166096e-10        211.09
20     2.706435e-11        260.13
INCREASE DIMENSIONALITY
(TENSORS INSTEAD OF MATRICES)

A matrix is a 2-way array.
A d-level matrix is naturally viewed as a 2d-way array:
A(i, j) = A(i1, i2, ..., id; j1, j2, ..., jd),   i ↔ (i1...id),   j ↔ (j1...jd)
It is important to consider the related reshaped array
B(i1j1, ..., idjd) = A(i1, i2, ..., id; j1, j2, ..., jd).
The matrix A is represented by the tensor B.
MINIMAL TENSOR TRAINS
a(i1...id; j1...jd) = ∑_{1≤αk≤rk} g1(i1j1, α1) g2(α1, i2j2, α2) ⋯ gd−1(αd−2, id−1jd−1, αd−1) gd(αd−1, idjd)

The minimal possible values of the compression ranks rk are equal to the ranks of specific unfolding matrices:
rk = rank Ak,   Ak = [A(i1j1, ..., ikjk; ik+1jk+1, ..., idjd)]
If all rk = 1 then
A = G1 ⊗ . . . ⊗ Gd
In general,
A = ∑_{α1,α2,α3,...} G1^{α1} ⊗ G2^{α1α2} ⊗ G3^{α2α3} ⊗ ⋯
NO CURSE OF DIMENSIONALITY
Let 1 ≤ ik, jk ≤ n and rk = r.
Then the number of representation parameters is dn²r².
Dependence on d is linear!
SO LET US MAKE d AS LARGE AS POSSIBLE
BY ADDING FICTITIOUS AXES
Assume we had d0 levels. If n = 2^{d1}, then set d = d0·d1. Then
memory = 4dr²,   d = log2(size(A)):
LOGARITHMIC IN THE SIZE OF THE MATRIX
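Such logarithmic compressibility can be verified numerically for modest n. A sketch that reshapes A into the 2d-way tensor B above and computes the unfolding ranks (it builds the full matrix, so it is only a verification tool; the tolerance is illustrative):

```python
import numpy as np

def qtt_ranks(A, eps=1e-9):
    """Ranks of the unfoldings B(i1 j1, ..., id jd) for a 2^d x 2^d matrix A."""
    d = int(np.log2(A.shape[0]))
    # Reshape to (i1..id, j1..jd), then interleave to (i1, j1, i2, j2, ...).
    b = A.reshape([2] * (2 * d))
    order = [ax for k in range(d) for ax in (k, d + k)]
    b = b.transpose(order).reshape([4] * d)       # fuse each pair (ik, jk)
    ranks = []
    for k in range(1, d):
        m = b.reshape(4 ** k, 4 ** (d - k))
        s = np.linalg.svd(m, compute_uv=False)
        ranks.append(int(np.sum(s > eps * s[0])))
    return ranks

# Cauchy-Toeplitz matrix from the next slide, n = 256:
n = 256
i, j = np.meshgrid(np.arange(n), np.arange(n), indexing='ij')
A = 1.0 / (i - j + 0.5)
print(qtt_ranks(A, eps=1e-5))   # small, nearly constant ranks
```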
CAUCHY–TOEPLITZ EXAMPLE
A = [ 1 / (i − j + 1/2) ]

Relative accuracy   Compression ranks for A and A⁻¹
1.e-5               3 7 8 8 8 7 7 7 3
1.e-7               3 7 9 10 10 9 9 7 3
1.e-9               3 7 11 11 11 11 11 7 3
1.e-11              3 7 12 13 13 13 12 7 3
1.e-13              3 7 14 14 15 14 14 7 3

n = 1024, d0 = 1, d1 = 10
INVERSES TO BANDED TOEPLITZ MATRICES
Let A be a banded Toeplitz matrix: A = [a(i − j)],
a_k = 0 for |k| > s, where s is the half-bandwidth.

THEOREM
Let size(A) = 2^d × 2^d and det A ≠ 0. Then
rk(A⁻¹) ≤ 4s² + 1,   k = 1, ..., d − 1,
the estimate being sharp.

COROLLARY
The inverse of a banded Toeplitz matrix A of size 2^d × 2^d with half-bandwidth s has a TT representation with O(s⁴ log2 n) parameters.
Using the Newton method with approximations, we obtain an inversion algorithm of complexity O(log2 n).
AVERAGE COMPRESSION RANK
r = √memory / (2√d)   ⇒   memory = 4dr²
INVERSION OF d0-DIMENSIONAL LAPLACIAN
BY MODIFIED NEWTON
d1 = 10
Physical dimensionality (= d0)                    1      3      5      10     30     50
Average compression rank of A                     2.8    3.5    3.6    3.7    3.8    3.8
Average compression rank of the approx. to A⁻¹    7.3    18.6   19.2   17.4   16.1   16.5
Time (sec)                                        2.     10.    17.    23.    27.    33.
‖AX − I‖ / ‖I‖                                    1.e-2  6.e-3  2.e-3  5.e-5  4.e-5  4.e-5

The last matrix size is 2^100.
INVERSION OF 10-DIMENSIONAL LAPLACIAN VIA INTEGRAL REPRESENTATION BY STENGER FORMULA

∫_0^∞ exp(−At) dt ≈ (h/τ) ∑_{k=−M}^{M} wk exp(−(tk/τ) A)

h = π/√M,   wk = tk = exp(hk),   λmin(A/τ) ≥ 1
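In the scalar case this is just the substitution t = e^s in ∫_0^∞ e^{−λt} dt = 1/λ, discretized by the trapezoidal rule; in the TT format the same sum is applied to the matrix exponentials exp(−(tk/τ)A). A quick plain-numpy check of the quoted nodes and weights (the value of M is illustrative):

```python
import numpy as np

def inv_by_exp_sums(lam, M=30):
    """Approximate 1/lambda by h * sum_k w_k exp(-t_k * lambda),
    with h = pi/sqrt(M), w_k = t_k = exp(h k), k = -M..M (lambda >= 1)."""
    h = np.pi / np.sqrt(M)
    k = np.arange(-M, M + 1)
    t = np.exp(h * k)
    return h * np.sum(t * np.exp(-t * lam))

for lam in (1.0, 5.0, 50.0):
    print(lam, inv_by_exp_sums(lam), abs(inv_by_exp_sums(lam) - 1 / lam))
```

The error decays like exp(−c√M), so moderate M already gives many accurate digits.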
CONCLUSIONS AND PERSPECTIVES
• Tensor-train decompositions and the corresponding algorithms (see http://pub.inm.ras.ru) provide us with excellent approximation tools for vectors and matrices. A TT-toolbox for Matlab is available: http://spring.inm.ras.ru/osel.
• The memory needed depends on the matrix size logarithmically. This is a terrific advantage when the compression ranks are small, which is exactly the case in many applications.
• Approximate inverses can be computed in the tensor-train format, generally with complexity logarithmic in the size of the matrix.
• Applications include huge-scale matrices (with sizes up to 2^100) as well as typical large-scale and even modest-scale matrices (like images).
• The key to efficient tensor-train operations is the recompression algorithm, with complexity O(dnr³), and the reliability of the SVD.
• The modified Newton method with truncations and integral representations of matrix functions are viable in the tensor-train format.
GOOD PERSPECTIVES
• Multi-variate interpolation (construction of tensor trains from a small portion of all elements; tensor cross methods using the maximal-volume concept).
• Fast computation of integrals in d dimensions (no Monte Carlo).
• Approximate matrix operations (e.g. inversion) with complexity O(log2 n):
  linear in d = linear in log2 n.
• A new direction in data compression and image processing (movies).
• Statistical interpretation of tensor trains.
• Applications to quantum chemistry, multi-parametric optimization, stochastic PDEs, data mining, etc.
MORE DETAILS and WORK IN PROGRESS
• I. V. Oseledets and E. E. Tyrtyshnikov, "Breaking the curse of dimensionality, or how to use SVD in many dimensions", Research Report 09-03, Hong Kong: ICM HKBU, 2009 (www.math.hkbu.edu.hk/ICM/pdf/09-03.pdf); SIAM J. Sci. Comput., 2009.
• I. Oseledets, "Compact matrix form of the d-dimensional tensor decomposition", SIAM J. Sci. Comput., 2009.
• I. V. Oseledets, "Tensors inside matrices give logarithmic complexity", SIAM J. Matrix Anal. Appl., 2009.
• I. V. Oseledets, "TT-Cross Approximation for Multidimensional Arrays", Research Report 09-11, Hong Kong: ICM HKBU, 2009 (www.math.hkbu.edu.hk/ICM/pdf/09-11.pdf); Linear Algebra Appl., 2009.
• I. Oseledets, E. E. Tyrtyshnikov, "On a recursive decomposition of multi-dimensional tensors", Doklady RAS, vol. 427, no. 2 (2009).
• I. Oseledets, "On a new tensor decomposition", Doklady RAS, vol. 427, no. 3 (2009).
• I. Oseledets, "On approximation of matrices with logarithmic number of parameters", Doklady RAS, vol. 427, no. 4 (2009).
• N. Zamarashkin, I. Oseledets, E. Tyrtyshnikov, "Tensor structure of the inverse of a banded Toeplitz matrix", Doklady RAS, vol. 427, no. 5 (2009).
• Effective ranks of tensors and stability of TT approximations; TTM for image processing; TT approximations in electronic structure calculations. In preparation.