The pivoted Cholesky decomposition and its application to … · 2010. 9. 28. · Karhunen-Loeve...

The pivoted Cholesky decomposition and its application to stochastic PDEs

Helmut Harbrecht, Michael Peters, and Reinhold Schneider

H. Harbrecht

Institute of Applied Analysis and Numerical Simulation

University of Stuttgart (Germany)


Overview

• Motivation

• Pivoted Cholesky decomposition

• Karhunen-Loeve expansion

• Second moment analysis

• Concluding remarks


Motivation

• Elliptic boundary value problems can be solved with high accuracy, provided that the input data are known exactly.

• The practical significance of highly accurate numerical solutions is limited due to inexact input data.

Model equation:

\[ -\operatorname{div}\bigl[\alpha(\omega)\nabla u(\omega)\bigr] = f(\omega) \ \text{in } D(\omega), \qquad u(\omega) = 0 \ \text{on } \partial D(\omega) \]

Quantities of interest:

• expectation: $E_u(x) = \int_\Omega u(x,\omega)\,dP(\omega)$

• two-point correlation: $\operatorname{Cor}_u(x,y) = \int_\Omega u(x,\omega)\,u(y,\omega)\,dP(\omega)$

• variance: $V_u(x) = E_{u^2}(x) - E_u^2(x) = \operatorname{Cor}_u(x,x) - E_u^2(x)$

Goal: For given mean and two-point correlation of the stochastic input data, compute the mean and the two-point correlation of the random solution of the boundary value problem.


Karhunen-Loeve expansion

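Approximation of stochastic fields $\alpha \in L^2(D)\otimes L^2_P(\Omega)$ by the truncated Karhunen-Loeve expansion

\[ \alpha(x,\omega) \approx E_\alpha(x) + \sum_{i=1}^{m} \sqrt{\lambda_i}\,\varphi_i(x)\,\psi_i(\omega) \]

with orthogonal collections $\{\varphi_i\} \subset L^2(D)$ and $\{\psi_i\} \subset L^2_P(\Omega)$.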

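The Karhunen-Loeve expansion involves the computation of the dominant eigenpairs $(\lambda_i,\varphi_i)$ of the integral operator

\[ (\mathcal{K}\varphi_i)(x) = \int_D \operatorname{Covar}_\alpha(x,y)\,\varphi_i(y)\,dy = \lambda_i\varphi_i(x), \quad x \in D, \]

with covariance kernel

\[ \operatorname{Covar}_\alpha(x,y) = \int_\Omega \bigl(\alpha(x,\omega)-E_\alpha(x)\bigr)\bigl(\alpha(y,\omega)-E_\alpha(y)\bigr)\,dP(\omega) \in L^2(D\times D). \]

−→ the eigenvalue problem for a nonlocal operator requires fast methods

Theorem. (Schwab/Todor [2006])

If $\operatorname{Covar}_\alpha \in H^p(D\times D)$, then the eigenvalues $\{\lambda_m\}_{m\in\mathbb{N}}$ of $\mathcal{K}$ decay like

\[ \lambda_m \lesssim m^{-p/d} \quad \text{as } m \to \infty. \]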
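As a minimal sketch of what the truncated expansion needs in practice, the following Python snippet discretizes a covariance kernel on a grid and extracts its dominant eigenpairs with a dense solver. The grid, the kernel $\exp(-|x-y|^2)$, the rectangle-rule weight `h`, and the helper name `truncated_kl` are assumptions chosen for illustration; they are not taken from the talk.

```python
import numpy as np

def truncated_kl(cov, m):
    """Return the m dominant eigenpairs of a symmetric covariance matrix."""
    lam, phi = np.linalg.eigh(cov)        # eigenvalues in ascending order
    idx = np.argsort(lam)[::-1][:m]       # indices of the m largest eigenvalues
    return lam[idx], phi[:, idx]

# Hypothetical example: Gaussian covariance sampled on a uniform grid of [0, 1].
n = 200
x = np.linspace(0.0, 1.0, n)
h = 1.0 / n                               # assumed rectangle-rule quadrature weight
cov = np.exp(-(x[:, None] - x[None, :]) ** 2)

lam, phi = truncated_kl(h * cov, m=10)    # eigenvalues of the discretized operator
cov_m = (phi * lam) @ phi.T / h           # rank-10 reconstruction of the kernel matrix
rel_err = np.linalg.norm(cov - cov_m, 2) / np.linalg.norm(cov, 2)
```

The dense eigensolver costs O(n^3) operations, which is exactly the expense that a low-rank factorization of the covariance matrix is meant to avoid.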


Pivoted Cholesky decomposition

Lemma. Let the matrix

\[ A = \begin{bmatrix} a & b^T \\ b & C \end{bmatrix} \in \mathbb{R}^{n\times n} \]

be symmetric and positive semi-definite with $a > 0$. Then, the Schur complement

\[ S := C - \frac{1}{a}\,bb^T \in \mathbb{R}^{(n-1)\times(n-1)} \]

is well-defined and also symmetric and positive semi-definite.
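The lemma can be checked numerically on a small example. The sketch below builds a random positive semi-definite matrix of rank 3 (a hypothetical test matrix, not one from the talk), forms the Schur complement, and inspects its smallest eigenvalue and its rank.

```python
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((6, 3))
A = G @ G.T                          # symmetric, positive semi-definite, rank 3

a, b, C = A[0, 0], A[1:, 0], A[1:, 1:]
S = C - np.outer(b, b) / a           # Schur complement S = C - (1/a) b b^T

min_eig = np.linalg.eigvalsh(S).min()          # nonnegative up to rounding
rank_S = np.linalg.matrix_rank(S, tol=1e-10)   # one less than the rank of A
```

Each elimination step lowers the rank by one, which is why the decomposition stops after m steps when A has rank m.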

• Observation. Pivoting makes it possible to apply the Cholesky decomposition to positive semi-definite matrices. Hence, if A has finite rank m, the pivoted Cholesky decomposition terminates with a rank-m decomposition

\[ A = L_mL_m^T. \]

• Question. What happens if A is only nearly of rank m, i.e.,

\[ \|A-A_m\|_2 \le \varepsilon \]

with $A_m$ being a positive semi-definite rank-m matrix?


Pivoted Cholesky decomposition

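• Trace norm. The best possible reduction of the trace error in one Cholesky step is achieved if the trace of the Schur complement becomes as small as possible. With $a_i^{(m-1)}$ denoting the i-th column of the remainder $A - A_{m-1}$, this amounts to the problem

\[ \operatorname{trace}(A-A_m) = \operatorname{trace} S = \operatorname{trace}(A-A_{m-1}) - \frac{1}{a^{(m-1)}_{i,i}}\bigl\|a^{(m-1)}_i\bigr\|_2^2 \to \min_{i=1}^{n}. \]

−→ too expensive!

• Strategy. Remove the largest diagonal coefficient of the remainder matrix. Since $\|a^{(m-1)}_i\|_2^2 \ge (a^{(m-1)}_{i,i})^2$, the trace then drops by at least this coefficient:

\[ \operatorname{trace}(A-A_m) = \operatorname{trace} S \le \operatorname{trace}(A-A_{m-1}) - \max_i a^{(m-1)}_{i,i}. \]

−→ total pivoting!

Algorithm (total pivoting): Permute the matrix such that the largest diagonal element is at the (1,1)-position and then compute the Cholesky step:

\[ A = A_m + E_m = L_mL_m^T + E_m \quad \text{with} \quad E_m := P_1P_2\cdots P_m \begin{bmatrix} 0 & 0 \\ 0 & S_m \end{bmatrix} P_m\cdots P_2P_1. \]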


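Algorithm: cost $O(nm^2)$

Algorithm 1: Pivoted Cholesky decomposition

Data: matrix $A = [a_{i,j}] \in \mathbb{R}^{n\times n}$ and error tolerance $\varepsilon > 0$
Result: low-rank approximation $A_m = \sum_{i=1}^{m} \ell_i\ell_i^T$ such that $\operatorname{trace}(A - A_m) \le \varepsilon$

begin
    set $m := 1$;
    set $d := \operatorname{diag}(A)$ and $error := \|d\|_1$;
    initialize $\pi := (1, 2, \ldots, n)$;
    while $error > \varepsilon$ do
        set $i := \arg\max\{d_{\pi_j} : j = m, m+1, \ldots, n\}$;
        swap $\pi_m$ and $\pi_i$;
        set $\ell_{m,\pi_m} := \sqrt{d_{\pi_m}}$;
        for $m+1 \le i \le n$ do
            compute $\ell_{m,\pi_i} := \Bigl(a_{\pi_m,\pi_i} - \sum_{j=1}^{m-1} \ell_{j,\pi_m}\ell_{j,\pi_i}\Bigr) \Big/ \ell_{m,\pi_m}$;
            update $d_{\pi_i} := d_{\pi_i} - \ell_{m,\pi_i}\ell_{m,\pi_i}$;
        compute $error := \sum_{i=m+1}^{n} d_{\pi_i}$;
        increase $m := m + 1$;
end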

Notice that only the diagonal entries of the matrix A and the m rows associated with the pivot elements need to be evaluated to compute the rank-m approximation. All other matrix coefficients do not enter the computation. This makes the method highly attractive for the sparse approximation of smooth nonlocal operators (see Thm. 3.2). For operators with kernel functions that exhibit a singularity on the diagonal x = y, it might be better to introduce a suitable partitioning of the matrix, which leads to the original adaptive cross approximation as introduced in [1, 2].

Theorem 3.1. Let $A \in \mathbb{R}^{n\times n}$ be symmetric and positive semi-definite. Then, performing m steps of the pivoted Cholesky decomposition is of complexity $O(m^2n)$.

Proof. The most expensive part in Algorithm 1 is the computation of the Cholesky vectors $\ell_k$, $k = 1, 2, \ldots, m$. This requires

\[ \sum_{k=1}^{m} \sum_{i=k+1}^{n} \sum_{j=1}^{k-1} 1 \le \sum_{k=1}^{m} (k-1)\,n \le \frac{m^2}{2}\,n \]

additions and multiplications each, which proves the assertion.
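Algorithm 1 can be sketched in Python as follows. This is an illustrative NumPy version under stated assumptions, not the authors' code: the callbacks `get_diag` and `get_row`, the `max_rank` safeguard, and the Gaussian test kernel are hypothetical names and choices introduced here. The callbacks make explicit that only the diagonal and the pivot rows of A are evaluated.

```python
import numpy as np

def pivoted_cholesky(get_diag, get_row, tol, max_rank=None):
    """Return (L, pivots) with A ~= L @ L.T and trace(A - L @ L.T) <= tol.

    get_diag() returns diag(A) and get_row(i) returns row i of A, so the
    full matrix is never needed inside this function.
    """
    d = np.array(get_diag(), dtype=float)      # diagonal of the remainder
    n = d.size
    max_rank = n if max_rank is None else max_rank
    perm = np.arange(n)                        # permutation pi
    L = np.zeros((max_rank, n))                # row k stores the vector l_k
    m, error = 0, d.sum()
    while error > tol and m < max_rank:
        # move the largest remaining diagonal entry to position m
        i = m + int(np.argmax(d[perm[m:]]))
        perm[[m, i]] = perm[[i, m]]
        p, rest = perm[m], perm[m + 1:]
        L[m, p] = np.sqrt(d[p])
        row = np.asarray(get_row(p), dtype=float)
        # Cholesky step restricted to the not yet pivoted indices
        L[m, rest] = (row[rest] - L[:m, p] @ L[:m, rest]) / L[m, p]
        d[rest] -= L[m, rest] ** 2             # diagonal of the new remainder
        error = d[rest].sum()                  # trace of the new remainder
        m += 1
    return L[:m].T, perm[:m]

# Hypothetical usage: Gaussian kernel matrix on a uniform grid.
x = np.linspace(0.0, 1.0, 300)
A = np.exp(-(x[:, None] - x[None, :]) ** 2 / 0.1**2)
Lm, pivots = pivoted_cholesky(lambda: A.diagonal(), lambda i: A[i], tol=1e-6)
trace_error = np.trace(A - Lm @ Lm.T)
```

The inner product `L[:m, p] @ L[:m, rest]` is the sum over j in the algorithm, so step m costs O(mn) operations and m steps cost O(m^2 n) in total, in line with Theorem 3.1.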


Features

• symmetric low-rank approximation: $A \approx A_m = L_mL_m^T$

• approximation error is rigorously controlled in terms of the trace norm

• stable variant of the Cholesky decomposition, especially if the eigenvalues decay rapidly

• only the diagonal coefficients and the m columns of A associated with the pivot elements need to be computed

• extremely simple to implement

• coincides with the adaptive cross approximation for symmetric matrices

• a purely algebraic convergence proof is available


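Convergence

Theorem. (H/Peters/Schneider) Assume that the eigenvalues of $A \in \mathbb{R}^{n\times n}$ satisfy

\[ 4^m\lambda_m \lesssim \exp(-bm) \]

for some $b > 0$ uniformly in n. Then, the pivoted Cholesky approximation $A_m$ with rank $m \sim |\log(\varepsilon/n)|$ satisfies $\operatorname{trace}(A - A_m) \lesssim \varepsilon$ uniformly as ε tends to zero.

Proof. Assume that A is permuted such that the k-th pivot is found at the (k,k)-position for all $k = 1, 2, \ldots, n$. Then, $L_m \in \mathbb{R}^{n\times m}$ is always a lower triangular matrix. It follows from

\[ A_m = L_mL_m^T = \begin{bmatrix} L_{1,1} & 0 \\ L_{2,1} & 0 \end{bmatrix} \begin{bmatrix} L_{1,1}^T & L_{2,1}^T \\ 0 & 0 \end{bmatrix} = \begin{bmatrix} L_{1,1}L_{1,1}^T & L_{1,1}L_{2,1}^T \\ L_{2,1}L_{1,1}^T & L_{2,1}L_{2,1}^T \end{bmatrix} = \begin{bmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & L_{2,1}L_{2,1}^T \end{bmatrix} \]

that $L_{1,1}L_{1,1}^T$ is the (pivoted) Cholesky decomposition of $A_{1,1}$. Consequently, we have

\[ \frac{1}{\lambda_m(A_{1,1})} = \bigl\|A_{1,1}^{-1}\bigr\|_2 = \bigl\|L_{1,1}^{-1}\bigr\|_2^2 \overset{\text{sharp!}}{\le} \frac{4^m+6m-1}{9\,\ell_{m,m}^2} \le \frac{4^m}{\ell_{m,m}^2}. \]

The trace norm of $A - A_m$ is bounded by $(n-m)$ times the pivot element $\ell_{m,m}^2$:

\[ \operatorname{trace}(A-A_m) \le (n-m)\,\ell_{m,m}^2 \le 4^m n\,\lambda_m(A_{1,1}) \overset{\text{Courant-Fischer}}{\le} 4^m n\,\lambda_m(A). \]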
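The rank stated in the theorem follows from this last estimate once the decay assumption is inserted. In the short derivation below, the constant $c > 0$ stands for the unnamed constant hidden in $\lesssim$; it is introduced here only for illustration.

```latex
\[
  \operatorname{trace}(A - A_m) \le 4^m n\,\lambda_m(A) \le c\,n\exp(-bm),
  \qquad
  c\,n\exp(-bm) \le \varepsilon
  \iff
  m \ge \frac{1}{b}\,\log\frac{c\,n}{\varepsilon}.
\]
```

Hence a rank of order $|\log(\varepsilon/n)|$ is enough to push the trace error below ε.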


Numerical results I

Gauss kernel: $(2\pi\sigma^2)^{-1/2}\exp(-|x-y|^2/\sigma^2)$

Table (rows: tolerance ε; columns: value of σ):

ε \ σ      1     0.5    0.1    0.05   0.01
10^-1      2     3      10     19     89
10^-2      3     5      15     28     137
10^-3      4     5      19     36     173
10^-4      5     6      21     39     187
10^-5      5     7      24     46     214
10^-6      5     8      27     50     238

[Figure: surface plot of the kernel over the unit square, and a log-log plot of value versus rank showing the error in the trace norm, the eigenvalues of the matrix A, and the eigenvalues of the Hilbert-Schmidt operator.]

Jumping Gauss kernel:

Table (rows: tolerance ε; columns: value of σ):

ε \ σ      1     0.5    0.1    0.05   0.01
10^-1      2     3      10     17     81
10^-2      3     4      15     28     131
10^-3      4     5      18     34     168
10^-4      4     6      21     39     186
10^-5      5     7      24     45     211
10^-6      5     8      26     50     234

[Figure: surface plot of the kernel over the unit square, and a log-log plot of value versus rank.]



Numerical results II

Random kernel: $A = \sum_{k=1}^{m} \lambda_k v_kv_k^T$, $\lambda_k = \exp(-\sigma k)$, $v_k^Tv_\ell = \delta_{k,\ell}$

Table (rows: tolerance ε; columns: value of σ):

ε \ σ      1     0.5    0.1    0.05   0.01
10^-1      3     6      29     61     333
10^-2      6     11     56     115    610
10^-3      8     15     81     167    873
10^-4      10    21     106    216    1126
10^-5      13    25     130    266    1375
10^-6      15    30     154    315    1618

[Figure: surface plot of the kernel over the unit square, and a log-log plot of value versus rank.]


Poisson kernel: $\exp(-\sigma|x-y|)/\sqrt{\sigma}$

Table (rows: tolerance ε; columns: value of σ):

ε \ σ      1      10^-1   10^-2   10^-3   10^-4
10^-1      5      1       1       1       1
10^-2      36     5       1       1       1
10^-3      376    36      5       1       1
10^-4      3616   376     36      5       1

[Figure: surface plot of the kernel over the unit square, and a log-log plot of value versus rank.]



Fast eigenpair computation

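Generalized eigenvalue problem:

\[ Ax = \lambda Bx, \qquad A = \bigl[(\mathcal{K}\psi_i,\psi_j)\bigr]_{i,j}, \quad B = \bigl[(\psi_i,\psi_j)\bigr]_{i,j} \]

Inserting the low-rank approximation

\[ A \approx A_m := L_mL_m^T, \qquad L_m \in \mathbb{R}^{n\times m} \]

leads to

\[ L_mL_m^Tx = \lambda Bx \iff B^{-1/2}L_mL_m^TB^{-1/2}\tilde{x} = \lambda\tilde{x}, \qquad x = B^{-1/2}\tilde{x}. \]

Since the nonzero eigenvalues of $MM^T$ and $M^TM$ coincide, we can replace the large eigenvalue problem by a small one:

\[ \underbrace{L_m^TB^{-1}L_m}_{\in\,\mathbb{R}^{m\times m}}\,\hat{x} = \lambda\hat{x}, \qquad x = B^{-1}L_m\hat{x}. \]

Error estimate (Bauer/Fike):

\[ |\lambda_k - \tilde{\lambda}_k| \le \bigl\|B^{-1/2}(A-A_m)B^{-1/2}\bigr\|_2 \lesssim \|A-A_m\|_2, \qquad k = 1, 2, \ldots, m \]

−→ speed-up by a factor of more than 10 compared to ARPACK with low-rank approximation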
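The reduction to an m x m problem can be illustrated with a few lines of NumPy. The sketch below uses a random stand-in for the factor $L_m$ and a simple tridiagonal, symmetric positive definite stand-in for the mass matrix B; both are assumptions made for the example. It then verifies that the recovered vectors satisfy the large generalized eigenvalue problem.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 50, 5
Lm = rng.standard_normal((n, m))     # stand-in for the pivoted Cholesky factor
# assumed mass-like matrix: tridiagonal and diagonally dominant, hence SPD
B = np.eye(n) + 0.1 * (np.eye(n, k=1) + np.eye(n, k=-1))

# small m x m problem: Lm^T B^{-1} Lm xhat = lambda xhat
small = Lm.T @ np.linalg.solve(B, Lm)
lam, xhat = np.linalg.eigh(small)

# eigenvectors of the large problem: x = B^{-1} Lm xhat
X = np.linalg.solve(B, Lm @ xhat)

# residual of Lm Lm^T x = lambda B x, column by column
residual = np.linalg.norm(Lm @ (Lm.T @ X) - (B @ X) * lam)
```

Only solves with B and an m x m dense eigenproblem are needed, so the n x n matrix $L_mL_m^T$ is never formed.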


Eigenvalue computation

Gauss kernel $\exp(-100\|x-y\|^2)$: approximate spectrum for ε = 0.1 / 0.01 / 0.001 / 0.0001

[Figure: approximate eigenvalues on a logarithmic axis from 10^-3 to 10^3, plotted against the index from 0 to 400.]

Poisson kernel $\exp(-\|x-y\|)$: approximate spectrum for ε = 0.1 / 0.05 / 0.025 / 0.01

[Figure: approximate eigenvalues on a logarithmic axis from 10^-2 to 10^4, plotted against the index from 0 to 1800.]


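Stochastic loadings

Stochastic boundary value problem:

\[ -\operatorname{div}\bigl[\alpha\nabla u(\omega)\bigr] = f(\omega) \ \text{in } D, \qquad u(\omega) = 0 \ \text{on } \partial D \]

−→ the random solution depends linearly on the stochastic input data

Theorem: (Schwab/Todor) For the PDE with stochastic loading

\[ -\operatorname{div}\bigl[\alpha\nabla u(\omega)\bigr] = f(\omega) \ \text{in } D, \qquad u(\omega) = g(\omega) \ \text{on } \partial D \]

one has

\[ -\operatorname{div}\bigl[\alpha\nabla E_u\bigr] = E_f \ \text{in } D, \qquad E_u = E_g \ \text{on } \partial D \]

and

\[ \begin{aligned}
(\operatorname{div}_x\otimes\operatorname{div}_y)\bigl[(\alpha(x)\otimes\alpha(y))(\nabla_x\otimes\nabla_y)\operatorname{Cor}_u(x,y)\bigr] &= \operatorname{Cor}_f(x,y), & x,y &\in D, \\
-\operatorname{div}_x\bigl[\alpha(x)\nabla_x\operatorname{Cor}_u(x,y)\bigr] &= 0, & x \in D,\ y &\in \partial D, \\
-\operatorname{div}_y\bigl[\alpha(y)\nabla_y\operatorname{Cor}_u(x,y)\bigr] &= 0, & x \in \partial D,\ y &\in D, \\
\operatorname{Cor}_u(x,y) &= 0, & x,y &\in \partial D.
\end{aligned} \]

−→ by perturbation theory, similar equations are derived in the case of stochastic diffusion coefficients or stochastic domains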


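Two-point correlation functions

Second order statistics:

\[ A E_u = E_f, \qquad (A\otimes A)\operatorname{Cor}_u = \operatorname{Cor}_f \]

Approximate $\operatorname{Cor}_f(x,y) \approx \sum_{i=1}^{m} \psi_i(x)\psi_i(y)$ by the pivoted Cholesky decomposition and solve $A\varphi_i = \psi_i$ for all $i = 1, 2, \ldots, m$. Then it holds that

\[ \operatorname{Cor}_u(x,y) \approx \sum_{i=1}^{m} \varphi_i(x)\varphi_i(y), \qquad V_u(x) \approx \sum_{i=1}^{m} \varphi_i^2(x) - E_u^2(x). \]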
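The procedure above can be sketched for a one-dimensional model problem. Everything specific in the snippet is an assumption made for illustration: the finite-difference matrix for $-u''$ with homogeneous Dirichlet data plays the role of A, the Gaussian matrix `cor_f` plays the role of $\operatorname{Cor}_f$, and `pivoted_cholesky` is a compact dense variant of the algorithm rather than the authors' implementation.

```python
import numpy as np

def pivoted_cholesky(C, tol):
    """Compact dense variant: L with C ~= L @ L.T and trace(C - L @ L.T) <= tol."""
    d = C.diagonal().copy()              # diagonal of the remainder
    cols = []
    while d.sum() > tol:
        p = int(np.argmax(d))            # pivot: largest remaining diagonal entry
        col = C[:, p].copy()
        for l in cols:                   # column p of the remainder
            col -= l[p] * l
        col /= np.sqrt(col[p])
        cols.append(col)
        d -= col ** 2
    return np.column_stack(cols)

n = 99
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)
# assumed model operator: finite differences for -u'' with zero boundary values
A = (2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
cor_f = np.exp(-(x[:, None] - x[None, :]) ** 2)    # hypothetical Cor_f

Psi = pivoted_cholesky(cor_f, tol=1e-8)            # Cor_f ~= Psi Psi^T
Phi = np.linalg.solve(A, Psi)                      # A phi_i = psi_i for i = 1..m
cor_u = Phi @ Phi.T                                # Cor_u ~= sum_i phi_i phi_i^T
second_moment = np.sum(Phi ** 2, axis=1)           # Cor_u(x, x) on the grid

# reference: A^{-1} Cor_f A^{-T}, computed with the full matrix
reference = np.linalg.solve(A, np.linalg.solve(A, cor_f).T)
err = np.abs(cor_u - reference).max()
```

Only m solves with A are needed instead of a solve with the tensor-product operator; subtracting the squared mean from `second_moment` would give the variance.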

Smooth kernel: $1/(\sigma+\|x-y\|^2)$

Table (rows: tolerance ε; columns: value of σ):

ε \ σ      0.1    0.2    0.4    0.8    1.6
10^-1      85     46     27     14     9
10^-2      234    122    66     37     21
10^-3      442    236    123    68     38
10^-4      710    371    198    108    61
10^-5      1038   539    290    157    87
10^-6      1426   748    395    214    118

[Figure: two surface plots over the unit square.]


Concluding remarks

• the pivoted Cholesky decomposition is a simple algorithm to compute low-rank approximations in case of symmetric and positive definite matrices

• an algebraic convergence proof is available

• no need to compute the complete matrix, only (m+1)n matrix coefficients have to be computed

• the pivoted Cholesky decomposition leads to an extremely efficient eigenvalue solver

Preprint.

H. Harbrecht, M. Peters and R. Schneider. On the low-rank approximation by the pivoted Cholesky decomposition. Preprint 2010-32, SimTech Cluster of Excellence, Universität Stuttgart, Germany, 2010.
