
Matrix Functions: Theory and Algorithms

Nick Higham

Department of Mathematics

University of Manchester

[email protected]

http://www.ma.man.ac.uk/~higham/

Includes joint work with Philip Davies

Function of Matrix – p.1/42


OUTLINE

▶ Definitions of f(A)

Applications

Algorithms for particular f

Schur–Parlett algorithm for general f

Computing f(A)b


Defining by Substitution

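Want to define $f : \mathbb{C}^{n\times n} \to \mathbb{C}^{n\times n}$, but not elementwise.

Given $f(t)$, can define $f(A)$ by substituting $A$ for $t$:

$$f(t) = \frac{1 + t^2}{1 - t} \;\Rightarrow\; f(A) = (I - A)^{-1}(I + A^2).$$

$$\log(1 + x) = x - \frac{x^2}{2} + \frac{x^3}{3} - \frac{x^4}{4} + \cdots, \quad |x| < 1$$

$$\Rightarrow\; \log(I + A) = A - \frac{A^2}{2} + \frac{A^3}{3} - \frac{A^4}{4} + \cdots, \quad \rho(A) < 1.$$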

Works for f a polynomial, a rational, or with a convergent power series.

Multiplicity of Definitions

There have been proposed in the literature since 1880 eight distinct definitions of a matric function, by Weyr, Sylvester and Buchheim, Giorgi, Cartan, Fantappiè, Cipolla, Schwerdtfeger and Richter.

— R. F. Rinehart, The Equivalence of Definitions of a Matric Function, Amer. Math. Monthly (1955)


Cauchy Integral Theorem

Definition 1

$$f(A) = \frac{1}{2\pi i} \int_\Gamma f(z)\,(zI - A)^{-1}\,dz,$$

where $f$ is analytic on and inside a closed contour $\Gamma$ which encloses $\lambda(A)$.


Jordan Canonical Form

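$$Z^{-1}AZ = J = \mathrm{diag}(J_1, J_2, \dots, J_p), \qquad J_k = \begin{bmatrix} \lambda_k & 1 & & \\ & \lambda_k & \ddots & \\ & & \ddots & 1 \\ & & & \lambda_k \end{bmatrix} \in \mathbb{C}^{m_k \times m_k}.$$

Definition 2

$$f(A) = Z f(J) Z^{-1} = Z\,\mathrm{diag}(f(J_k))\,Z^{-1},$$

$$f(J_k) = \begin{bmatrix} f(\lambda_k) & f'(\lambda_k) & \cdots & \dfrac{f^{(m_k-1)}(\lambda_k)}{(m_k-1)!} \\ & f(\lambda_k) & \ddots & \vdots \\ & & \ddots & f'(\lambda_k) \\ & & & f(\lambda_k) \end{bmatrix}.$$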
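As a small worked instance of the Jordan form definition (an illustration added here, not taken from the slides), take $f = \exp$ and a single 2×2 Jordan block. Only $f$ and $f'$ at the eigenvalue are needed, and the result agrees with the power series because the nilpotent part squares to zero.

```latex
J = \begin{bmatrix} \lambda & 1 \\ 0 & \lambda \end{bmatrix}
  = \lambda I + N, \qquad N^2 = 0,
\qquad\Longrightarrow\qquad
e^{J} = \begin{bmatrix} f(\lambda) & f'(\lambda) \\ 0 & f(\lambda) \end{bmatrix}
      = \begin{bmatrix} e^{\lambda} & e^{\lambda} \\ 0 & e^{\lambda} \end{bmatrix}
      = e^{\lambda}(I + N).
```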


Interpolation

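Definition 3 (Sylvester, 1883; Buchheim, 1886) Let A have distinct e'vals $\lambda_1, \dots, \lambda_s$, with $n_i$ = index of $\lambda_i$ (the size of the largest Jordan block in which $\lambda_i$ appears). Then $f(A) = r(A)$, where $r$ is the unique Hermite interpolating poly of degree less than $\sum_{i=1}^{s} n_i$ satisfying the interpolation conditions

$$r^{(j)}(\lambda_i) = f^{(j)}(\lambda_i), \quad j = 0: n_i - 1, \quad i = 1: s.$$

Poly r depends on A.

This def. preserves functional relations $G(f_1, \dots, f_p) = 0$, where G is a polynomial. E.g. $\sin^2(A) + \cos^2(A) = I$.

But of course $e^{A+B} \ne e^A e^B$ in general (equality holds when AB = BA).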


Non-Primary Functions

Horn & Johnson call these defs primary matrix functions. But not all possible functions are captured when there are multiple eigenvalues. E.g.,

$$A = \begin{bmatrix} -1 & 0 \\ 0 & -1 \end{bmatrix}, \quad X = \begin{bmatrix} i & 0 \\ 0 & -i \end{bmatrix}, \quad Y = \begin{bmatrix} 0 & -1 \\ 1 & 0 \end{bmatrix}.$$

X and Y are square roots of A but are not polynomials in A. However, A = givens(π) and Y = givens(π/2) is a natural square root.
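The claim about X and Y can be checked directly. The sketch below (plain Python lists, with hypothetical helper names `matmul2` and `is_scalar_multiple_of_identity` introduced only for this illustration) verifies that both matrices square to A. Since A = −I, every polynomial in A is a scalar multiple of I, so it is enough to confirm that neither X nor Y has that form.

```python
def matmul2(P, Q):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def is_scalar_multiple_of_identity(M):
    """True if M = c*I for some scalar c."""
    return M[0][1] == 0 and M[1][0] == 0 and M[0][0] == M[1][1]

A = [[-1, 0], [0, -1]]
X = [[1j, 0], [0, -1j]]
Y = [[0, -1], [1, 0]]

x_squared = matmul2(X, X)   # equals A
y_squared = matmul2(Y, Y)   # equals A

# Any polynomial p(A) with A = -I equals p(-1) * I, so X and Y,
# which are not multiples of I, cannot be polynomials in A.
x_is_primary = is_scalar_multiple_of_identity(X)
y_is_primary = is_scalar_multiple_of_identity(Y)
```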

Virtually all existing theory and methods are for primary functions.

Non-primary functions sometimes needed when tracking f(A(t)) when eigenvalues of A(t) coalesce.


Textbook References

[1] F. R. Gantmacher. The Theory of Matrices, volume one. Chelsea, New York, 1959.

[2] Gene H. Golub and Charles F. Van Loan. Matrix Computations. Johns Hopkins University Press, Baltimore, MD, USA, third edition, 1996.

[3] Roger A. Horn and Charles R. Johnson. Topics in Matrix Analysis. Cambridge University Press, 1991.

[4] Peter Lancaster and Miron Tismenetsky. The Theory of Matrices. Academic Press, London, second edition, 1985.


OUTLINE

Definitions of f(A)

▶ Applications

Algorithms for particular f

Schur–Parlett algorithm for general f

Computing f(A)b


Application: Differential equations

Nuclear magnetic resonance: Solomon equations

dM/dt = −RM, M(0) = I,

where M(t) = matrix of intensities and R = symmetric relaxation matrix. NMR workers need to solve both forward and inverse problems.

Exponential time differencing for stiff systems (Cox & Matthews, J. Comp. Phys., 2002)

y′ = Ay + F(y, t).

Methods based on exact integration of linear part; these require one accurate evaluation of exp(hA) per integration.


Application: Control theory

Convert continuous-time system

$$\frac{dx}{dt} = Ax(t) + Bu(t)$$

to discrete-time state-space system

$$x_{k+1} = F x_k + G u_k,$$

where $F = e^{A\tau}$ and $\tau$ is sampling period. (E.g., MATLAB Control System Toolbox, c2d, d2c.)
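The slide gives only F. As a hedged supplement, assuming the input is held constant over each sampling interval (zero-order hold, an assumption not stated on the slide), the variation-of-constants formula also yields G, which shows that a second matrix function of A is involved:

```latex
x(t_k + \tau) = e^{A\tau} x(t_k) + \int_0^{\tau} e^{A(\tau - s)} B\,u(t_k + s)\,ds
\quad\Longrightarrow\quad
G = \left( \int_0^{\tau} e^{As}\,ds \right) B
  = A^{-1}\left(e^{A\tau} - I\right) B \quad \text{(last form if $A$ is nonsingular)}.
```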


OUTLINE

Definitions of f(A)

Applications

▶ Algorithms for particular f

Schur–Parlett algorithm for general f

Computing f(A)b


Classic MATLAB

< M A T L A B >

Version of 01/10/84

HELP is available

<>help

Type HELP followed by
INTRO (To get started)
NEWS (recent revisions)
ABS ANS ATAN BASE CHAR CHOL CHOP CLEA COND CONJ COS
DET DIAG DIAR DISP EDIT EIG ELSE END EPS EXEC EXIT
EXP EYE FILE FLOP FLPS FOR FUN HESS HILB IF IMAG
INV KRON LINE LOAD LOG LONG LU MACR MAGI NORM ONES
ORTH PINV PLOT POLY PRIN PROD QR RAND RANK RCON RAT
REAL RETU RREF ROOT ROUN SAVE SCHU SHOR SEMI SIN SIZE
SQRT STOP SUM SVD TRIL TRIU USER WHAT WHIL WHO WHY
< > ( ) = . , ; \ / ' + - * :


Classic MATLAB

<>help fun

FUN For matrix arguments X , the functions SIN, COS, ATAN, SQRT, LOG, EXP and X**p are computed using eigenvalues D and eigenvectors V . If <V,D> = EIG(X) then f(X) = V*f(D)/V . This method may give inaccurate results if V is badly conditioned. Some idea of the accuracy can be obtained by comparing X**1 with X . For vector arguments, the function is applied to each component.

The availability of [FUN] in early versions of MATLAB quite possibly contributed to the system's technical and commercial success.

— Cleve Moler (2003)


Setup

• General nonsymmetric A

• Factorization of A feasible

• May not want full accuracy

• Many applications.

• Methods for very large, sparse A often require solution of smaller, dense subproblems.


Matrix Exponential

Cleve Moler and Charles Van Loan. Nineteen dubious ways to compute the exponential of a matrix, twenty-five years later, SIAM Rev., 45 (2003).

• 355 citations on Science Citation Index.

Scaling and squaring (SS) method for $X \approx e^A$ (Ward, 1977; Moler & Van Loan, 1978).

1. $A \leftarrow A/2^k$ so $\|A\|_\infty \le 1/2$

2. $r(A)$ = [6/6] Padé approximant to $e^A$

3. $X = r(A)^{2^k}$

Used by MATLAB’s expm.
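The three steps can be sketched in plain Python as follows. This is a minimal illustration, not the code of any MATLAB release: the names `expm_ss`, `mat_mul`, `identity` and `solve` are hypothetical, matrices are nested lists, and the Padé coefficients are generated by the standard recurrence $c_j = c_{j-1}(q-j+1)/(j(2q-j+1))$ for the diagonal [q/q] approximant.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def solve(D, N):
    """Solve D X = N by Gauss-Jordan elimination with partial pivoting."""
    n = len(D)
    M = [list(D[i]) + list(N[i]) for i in range(n)]   # augmented [D | N]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        piv = M[c][c]
        M[c] = [v / piv for v in M[c]]
        for r in range(n):
            if r != c and M[r][c] != 0.0:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def expm_ss(A, q=6):
    """Scaling and squaring with a [q/q] Pade approximant (sketch)."""
    n = len(A)
    norm = max(sum(abs(v) for v in row) for row in A)   # infinity norm
    k = 0
    while norm / 2**k > 0.5:          # step 1: scale so the norm is <= 1/2
        k += 1
    B = [[v / 2**k for v in row] for row in A]
    # step 2: r(B) = D^{-1} N with N = sum c_j B^j, D = sum c_j (-B)^j
    N, D, P = identity(n), identity(n), identity(n)
    c = 1.0
    for j in range(1, q + 1):
        c *= (q - j + 1) / (j * (2 * q - j + 1))
        P = mat_mul(P, B)             # P = B^j
        sign = -1.0 if j % 2 else 1.0
        for r in range(n):
            for s in range(n):
                N[r][s] += c * P[r][s]
                D[r][s] += sign * c * P[r][s]
    X = solve(D, N)
    for _ in range(k):                # step 3: square k times
        X = mat_mul(X, X)
    return X
```

For example, `expm_ss([[0.0, -2.0], [2.0, 0.0]])` approximates the rotation matrix with entries cos 2 and ±sin 2.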


Alternative SS Algorithm for eA

Suggested by Najfeld & Havel (1995): exploit

$$\tau(A) = A\coth(A) = A(e^{2A} + I)(e^{2A} - I)^{-1} = I + \cfrac{A^2}{3I + \cfrac{A^2}{5I + \cfrac{A^2}{7I + \cdots}}}.$$

1. $B = A/2^{k+1}$ so $\|A^2\|_\infty / 2^{2k+2} \le 1.152$

2. $r(B)$ = [8/8] Padé approximant to $\tau(B)$.

3. $X = \left[(r(B) + B)(r(B) - B)^{-1}\right]^{2^k}$

• Claimed to require fewer flops than original SS alg.


Principal Log and pth Root

Let $A \in \mathbb{C}^{n\times n}$ have no eigenvalues on $\mathbb{R}^-$.

Log

$X = \log A$ denotes unique X such that

1. $e^X = A$.

2. $-\pi < \mathrm{Im}(\lambda(X)) < \pi$.

pth root

For integer p > 0, $X = A^{1/p}$ is unique X such that

1. $X^p = A$.

2. $-\pi/p < \arg(\lambda(X)) < \pi/p$.


Briggs’ Log Method (1617)

$$\log(ab) = \log a + \log b \;\Rightarrow\; \log a = 2\log a^{1/2}.$$

Use repeatedly:

$$\log a = 2^k \log a^{1/2^k}.$$

Write $a^{1/2^k} = 1 + x$ and note $\log(1 + x) \approx x$. Briggs worked to base 10 and used

$$\log_{10} a \approx 2^k \cdot \log_{10} e \cdot (a^{1/2^k} - 1).$$
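Briggs' formula is easy to try in floating point. The sketch below (the function name `briggs_log10` and the default k = 20 are choices made for this illustration) takes k successive square roots and applies the approximation. The truncation error shrinks as k grows, while the subtraction of 1 loses more digits to cancellation, so k cannot be taken arbitrarily large.

```python
import math

def briggs_log10(a, k=20):
    """Approximate log10(a) for a > 0 using k repeated square roots."""
    root = a
    for _ in range(k):
        root = math.sqrt(root)        # after the loop: root = a**(1 / 2**k)
    # log10(a) = 2**k * log10(root) and log10(1 + x) ~ x * log10(e)
    return 2**k * math.log10(math.e) * (root - 1.0)

approx = briggs_log10(2.0)
exact = math.log10(2.0)
```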





Briggs must be viewed as one of the great figures in numerical analysis.

— Herman H. Goldstine, A History of Numerical Analysis (1977)





Can we generalize to matrices: $\log A = 2^k \log A^{1/2^k}$?


Splitting Lemma

Lemma 0 (Cheng, H, Kenney & Laub, 2001) Suppose $A = BC$ has no eigenvalues on $\mathbb{R}^-$ and

1. $BC = CB$.

2. Every eigenvalue of B (or C) lies in the open half-plane of the corresponding eigenvalue of $A^{1/2}$.

Then $\log A = \log B + \log C$.

[Figure: complex plane with axes Re λ and Im λ, showing $\lambda_B$ and $\lambda_{A^{1/2}}$.]


Matrix Logarithm

Use the Briggs idea:

$$\log A = 2^k \log A^{1/2^k}.$$

Kenney & Laub's (1989) inverse scaling and squaring method:

Bring A close to I by repeated square roots.

Approximate $\log A^{1/2^k}$ using an [m/m] Padé approximant $r_m(x) \approx \log(1 - x)$.

Rescale to find log A.


Alg of Cheng, H, Kenney & Laub (2001)

• Transformation-free: uses only matrix mult, LU, inv.




• Sq. roots by product form of Denman–Beavers iteration:

$$M_{k+1} = \frac{1}{2}\left[I + \frac{1}{2}\left(M_k + M_k^{-1}\right)\right], \quad M_0 = A,$$

$$Y_{k+1} = Y_k\left(I + M_k^{-1}\right)/2, \quad Y_0 = A,$$

where $M_k \to I$ and $Y_k \to A^{1/2}$.







• Aims for a specified accuracy.








• Padé degree m chosen using K & L's (1989) bound:

$$\|r_m(X) - \log(I - X)\| \le |r_m(\|X\|) - \log(1 - \|X\|)|.$$






• $r_m$ evaluated using partial fraction expansion

$$r_m(x) = \sum_{j=1}^{m} \frac{\alpha_j^{(m)} x}{1 + \beta_j^{(m)} x}:$$

fast and accurate (H, 2001).
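The product form of the Denman–Beavers iteration listed above can be tried on a small case. The following sketch is restricted to 2×2 real matrices so that the inverse has a closed form; `mul2`, `inv2` and `sqrtm_db_product`, the tolerance and the iteration cap are all assumptions of this illustration rather than details of the Cheng, Higham, Kenney & Laub algorithm.

```python
def mul2(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(P):
    det = P[0][0] * P[1][1] - P[0][1] * P[1][0]
    return [[P[1][1] / det, -P[0][1] / det],
            [-P[1][0] / det, P[0][0] / det]]

def sqrtm_db_product(A, tol=1e-14, max_iter=50):
    """Product-form Denman-Beavers iteration for a 2x2 matrix (sketch)."""
    I = [[1.0, 0.0], [0.0, 1.0]]
    M = [row[:] for row in A]         # M_0 = A
    Y = [row[:] for row in A]         # Y_0 = A
    for _ in range(max_iter):
        Minv = inv2(M)
        # Y_{k+1} = Y_k (I + M_k^{-1}) / 2
        Y = mul2(Y, [[(I[r][c] + Minv[r][c]) / 2 for c in range(2)]
                     for r in range(2)])
        # M_{k+1} = (I + (M_k + M_k^{-1}) / 2) / 2
        M = [[(I[r][c] + (M[r][c] + Minv[r][c]) / 2) / 2 for c in range(2)]
             for r in range(2)]
        if max(abs(M[r][c] - I[r][c]) for r in range(2) for c in range(2)) < tol:
            break                     # M_k has converged to I
    return Y

Y = sqrtm_db_product([[4.0, 1.0], [0.0, 9.0]])
```

Only one inverse per step is needed, which is the point of the product form; for this upper triangular test matrix Y approaches the matrix with diagonal 2, 3 and off-diagonal entry 1/(2 + 3).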

Matrix pth Root

Square root: Björck & Hammarling (1983). Compute Schur decomp. $A = QTQ^*$ and then solve $R^2 = T$ by

$$r_{ii} = \sqrt{t_{ii}}, \qquad r_{ij} = \frac{t_{ij} - \sum_{k=i+1}^{j-1} r_{ik} r_{kj}}{r_{ii} + r_{jj}}.$$

Extended to pth roots by Smith (2003), with a much more complicated recurrence.

These algs

• Have essentially optimal numerical stability.

• Generalize to real Schur decomp.
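The triangular phase of the Björck & Hammarling method follows from equating entries of $R^2 = T$, which gives $r_{ij}(r_{ii} + r_{jj}) = t_{ij} - \sum_{k=i+1}^{j-1} r_{ik} r_{kj}$. A minimal sketch of that recurrence is below; it assumes T is already upper triangular with positive real diagonal (so no Schur decomposition is computed), and the name `sqrtm_triu` is hypothetical.

```python
import math

def sqrtm_triu(T):
    """Square root R of an upper triangular T with positive diagonal."""
    n = len(T)
    R = [[0.0] * n for _ in range(n)]
    for j in range(n):                      # one column at a time
        R[j][j] = math.sqrt(T[j][j])
        for i in range(j - 1, -1, -1):      # move up the column
            s = sum(R[i][k] * R[k][j] for k in range(i + 1, j))
            R[i][j] = (T[i][j] - s) / (R[i][i] + R[j][j])
    return R

T = [[4.0, 1.0, 2.0],
     [0.0, 9.0, 3.0],
     [0.0, 0.0, 16.0]]
R = sqrtm_triu(T)
```

The denominator $r_{ii} + r_{jj}$ cannot vanish for the principal square root, which is one reason this recurrence behaves better than a difference-based one.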


Matrix Cosine

Algorithm 0 (Serbin & Blalock, 1980) Given $A \in \mathbb{R}^{n\times n}$ and parameter α > 0 this alg approximates cos(A).

Choose m such that $2^{-m}\|A\| \approx \alpha$.
$C_0$ = Taylor or Padé approximation to $\cos(A/2^m)$.
for i = 0: m − 1
    $C_{i+1} = 2C_i^2 - I$
end

Choice of m (i.e., α)?

Which approximation?

Effect of rounding errors?
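One way to experiment with these questions is a direct transcription of the double-angle algorithm. The sketch below makes specific choices that the slide leaves open: it uses a truncated Taylor series for $C_0$, picks the smallest m with $2^{-m}\|A\|_\infty \le \alpha$, and defaults to α = 1 and ten Taylor terms. The name `cosm_double_angle` is hypothetical.

```python
def mat_mul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def cosm_double_angle(A, alpha=1.0, terms=10):
    """cos(A) via Taylor series of cos(A / 2^m) plus m double-angle steps."""
    n = len(A)
    norm = max(sum(abs(v) for v in row) for row in A)   # infinity norm
    m = 0
    while norm / 2**m > alpha:
        m += 1
    B = [[v / 2**m for v in row] for row in A]
    B2 = mat_mul(B, B)
    C = [[1.0 if r == s else 0.0 for s in range(n)] for r in range(n)]
    term = [row[:] for row in C]
    for j in range(1, terms + 1):
        # term becomes (-1)^j B^(2j) / (2j)!
        term = mat_mul(term, B2)
        term = [[-v / ((2 * j - 1) * (2 * j)) for v in row] for row in term]
        C = [[C[r][s] + term[r][s] for s in range(n)] for r in range(n)]
    for _ in range(m):
        # C_{i+1} = 2 C_i^2 - I
        C2 = mat_mul(C, C)
        C = [[2.0 * C2[r][s] - (1.0 if r == s else 0.0) for s in range(n)]
             for r in range(n)]
    return C
```

Each doubling step can amplify errors already present in $C_i$, which is why the choice of m matters and why the slide asks about rounding errors.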


Alg of H & Smith (2002)

• Initial argument reduction and balancing to reduce norm.

• [8/8] Padé approximation proved fully accurate in IEEE double if $\|A\|_\infty \le 1$. More economical than Taylor series.

• "Schoolboy" evaluation of $r_8(A)$.

• Total cost: $(4 + \lceil \log_2(\|A\|_\infty) \rceil)M + D$.

• Error analysis gives bound containing terms $(4.1)^m$ and norms of intermediate $C_i$.


Numerical Stability

Is ‖f̂ − f‖ consistent with condition of problem?

Is $\hat{f} = f(A + E)$ with E "small", i.e., is residual $f^{-1}(\hat{f}) - A$ "small"?

Unclear for all algs discussed except "yes" for $A^{1/p}$.

• Currently lack characterizations of when an f(A) problem is ill conditioned for nonnormal A.


OUTLINE

Definitions of f(A)

Applications

Algorithms for particular f

▶ Schur–Parlett algorithm for general f

Computing f(A)b


Similarity Transformations

Can use the formula

$$A = XBX^{-1} \;\Rightarrow\; f(A) = Xf(B)X^{-1},$$

provided f(B) is easily computable. E.g. $B = \mathrm{diag}(\lambda_i)$ if A diagonalizable.

Problem: any error ΔB in f(B) magnified by up to $\kappa(X) = \|X\|\,\|X^{-1}\| \ge 1$.

Prefer to work with unitary X: thus can use

eigendecomposition (diagonal B) when A is normal ($AA^* = A^*A$),

Schur decomposition (triangular B) in general.


Example: Eigendecomposition

function F = funm_ev(A,fun)
% Evaluate f(A) from the eigendecomposition A = V*D/V.
[V,D] = eig(A);
F = V * diag(feval(fun,diag(D))) / V;

>> A = [3 -1; 1 1]; X = funm_ev(A,@sqrt)
X =
   1.7678e+000  -3.5355e-001
   3.5355e-001   1.0607e+000

>> norm(A-X^2) % cond(V) = 9.4e7
ans =
   9.9519e-009

>> Y = sqrtm(A); norm(A-Y^2)
ans =
   6.4855e-016


Parlett’s Recurrence

Schur decomposition $A = QTQ^*$ reduces problem to $F = f(T)$, T upper triangular.

$f_{ii} = f(t_{ii})$ is immediate.

Parlett (1976): from FT = TF obtain recurrence

$$f_{ij} = t_{ij}\,\frac{f_{ii} - f_{jj}}{t_{ii} - t_{jj}} + \sum_{k=i+1}^{j-1} \frac{f_{ik}t_{kj} - t_{ik}f_{kj}}{t_{ii} - t_{jj}}.$$

Used in MATLAB's funm.



Fails when T has repeated eigenvalues.
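A direct transcription of Parlett's recurrence makes the failure mode visible: every off-diagonal entry is divided by $t_{ii} - t_{jj}$. The sketch below works one superdiagonal at a time on an upper triangular T stored as nested lists; the name `funm_parlett` is hypothetical, and this is an illustration of the recurrence, not the code of MATLAB's funm.

```python
import math

def funm_parlett(T, f):
    """Evaluate f(T) for upper triangular T with distinct diagonal entries."""
    n = len(T)
    F = [[0.0] * n for _ in range(n)]
    for i in range(n):
        F[i][i] = f(T[i][i])
    for d in range(1, n):                   # d-th superdiagonal
        for i in range(n - d):
            j = i + d
            denom = T[i][i] - T[j][j]
            if denom == 0:
                raise ZeroDivisionError("repeated eigenvalue: recurrence breaks down")
            s = T[i][j] * (F[i][i] - F[j][j])
            s += sum(F[i][k] * T[k][j] - T[i][k] * F[k][j]
                     for k in range(i + 1, j))
            F[i][j] = s / denom
    return F

F_exp = funm_parlett([[1.0, 1.0], [0.0, 2.0]], math.exp)
F_sqrt = funm_parlett([[4.0, 1.0, 2.0],
                       [0.0, 9.0, 3.0],
                       [0.0, 0.0, 16.0]], math.sqrt)
```

Exact equality of diagonal entries raises an error here, but nearly equal entries are just as troublesome in practice because the division then amplifies rounding errors.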


Parlett vs. Björck & Hammarling

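Parlett recurrence is not "optimal", as clear from sq. root case: $x_{12}$ obtained from

$$\text{Parlett:}\quad \frac{a_{12}\left(\sqrt{a_{11}} - \sqrt{a_{22}}\right)}{a_{11} - a_{22}} \;=\; \frac{a_{12}}{\sqrt{a_{11}} + \sqrt{a_{22}}} \quad\text{: B \& H.}$$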


Schur–Parlett Algorithm

H & Davies (2002):

Compute Schur decomposition $A = QTQ^*$.

Re-order T to block triangular form in which eigenvalues within a block are "close" and those of separate blocks are "well separated".

Evaluate $F_{ii} = f(T_{ii})$.

Solve the Sylvester equations

$$T_{ii}F_{ij} - F_{ij}T_{jj} = F_{ii}T_{ij} - T_{ij}F_{jj} + \sum_{k=i+1}^{j-1}\left(F_{ik}T_{kj} - T_{ik}F_{kj}\right).$$

Undo the unitary transformations.


Function of Atomic Block

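Assume f has Taylor series with ∞ radius of cgce and derivatives available.

For diagonal blocks T use

$$T = \sigma I + M, \quad \sigma = \mathrm{trace}(T)/n: \qquad f(T) = \sum_{k=0}^{\infty} \frac{f^{(k)}(\sigma)}{k!}\,M^k.$$

Truncate series based on strict error bound, not using size of terms. NB: for n = 2,

$$M = \begin{bmatrix} \epsilon & \alpha \\ 0 & -\epsilon \end{bmatrix} \;\Rightarrow\; M^{2k} = \begin{bmatrix} \epsilon^{2k} & 0 \\ 0 & \epsilon^{2k} \end{bmatrix}, \quad M^{2k+1} = \begin{bmatrix} \epsilon^{2k+1} & \alpha\epsilon^{2k} \\ 0 & -\epsilon^{2k+1} \end{bmatrix}.$$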


Features of Algorithm

Costs $O(n^3)$ flops, or up to $n^4/3$ flops if large blocks needed (close, repeated eigenvalues).

Needs derivatives if block size > 1: price to pay for treating general f and nonnormal A.

Best general f(A) alg. Benchmark for comparing other f(A) algs, general and specific.

The basis of a new funm for next MATLAB release.


OUTLINE

Definitions of f(A)

Applications

Algorithms for particular f

Schur–Parlett algorithm for general f

▶ Computing f(A)b


log(A) b

Apply quadrature rule $\int_0^1 f(t)\,dt \approx \sum_{k=1}^{m} c_k f(t_k)$ to (Wouk, 1965)

$$\log A = \int_0^1 (A - I)\left[t(A - I) + I\right]^{-1} dt.$$

Combine with Hessenberg reduction $A = QHQ^T$ to get

$$(\log A)\,b \approx Q \sum_{k=1}^{m} c_k \left[t_k(H - I) + I\right]^{-1} d, \qquad d = Q^T(A - I)b.$$

Costs $(10/3)n^3 + 2mn^2$ flops.

When $\|I - A\| < 1$ can use m-point Gauss-Legendre ≡ Padé approximation! Choose m using (Kenney & Laub, 2001)

$$\|r_{mm}(X) - \log(I + X)\| \le |r_{mm}(-\|X\|) - \log(1 - \|X\|)|.$$

When $\|I - A\| > 1$ use adaptive quadrature.
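The integral representation of log A used above can be sanity-checked in the scalar case (a check added here for illustration), where the integrand is an exact derivative. It assumes a is not on the closed negative real axis, so that $t(a-1)+1$ stays away from zero and from $\mathbb{R}^-$ for $0 \le t \le 1$.

```latex
\int_0^1 \frac{a - 1}{t(a - 1) + 1}\,dt
  = \Big[\log\big(t(a - 1) + 1\big)\Big]_{t=0}^{t=1}
  = \log a - \log 1
  = \log a .
```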


$A^\alpha b$

$$\frac{dy}{dt} = \alpha(A - I)\left[t(A - I) + I\right]^{-1} y, \quad y(0) = b$$

has unique solution $y(t) = \left[t(A - I) + I\right]^{\alpha} b \;\Rightarrow\; y(1) = A^{\alpha} b$. Used by Allen, Baglama & Boyd (2000) for α = 1/2, spd A.

Example using MATLAB's ode45. A = gallery('parter',64), b = randn(64,1).

f(A)       tol    Succ. steps   Fail. atts   f evals   Rel. err
A^{-1/2}   1e-3   12            0            73        3.5e-8
           1e-6   14            0            85        6.0e-9
           1e-9   40            0            241       7.7e-12
A^{2/5}    1e-3   15            0            79        2.8e-8
           1e-6   16            0            91        2.4e-9
           1e-9   54            0            325       1.8e-12
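That the stated y(t) solves the initial value problem for $A^\alpha b$ can be verified by differentiation. The short derivation below is added for illustration; it writes $N = A - I$, uses the fact that all functions of N commute, and assumes A has no eigenvalues on the closed negative real axis so that the fractional power is defined along the whole path.

```latex
y(t) = (I + tN)^{\alpha} b
\;\Longrightarrow\;
\frac{dy}{dt} = \alpha N (I + tN)^{\alpha - 1} b
             = \alpha N (I + tN)^{-1} (I + tN)^{\alpha} b
             = \alpha (A - I)\left[t(A - I) + I\right]^{-1} y(t),
\qquad y(0) = I^{\alpha} b = b .
```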


Interpolation

If A has distinct eigenvalues $\lambda_j$, Lagrange interp poly:

$$f(A)b = \sum_{j=0}^{n} f_j\,\ell_j(A)\,b, \qquad \ell_j(x) = \frac{\prod_{k=0,\,k\ne j}^{n} (x - \lambda_k)}{\prod_{k=0,\,k\ne j}^{n} (\lambda_j - \lambda_k)}.$$

Cost: $O(n^4)$ flops.

For any A, Newton divided difference form:

$$f(A)b = \sum_{i=0}^{n} c_i \prod_{j=0}^{i-1} (A - \lambda_j I)\,b, \qquad c_i = \text{(confluent) div. diffs.}$$

Requires derivatives of f. Cost: $O(n^3)$ flops.

Cauchy Integral Theorem

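$$y = \frac{1}{2\pi i}\int_\Gamma f(z)(zI - A)^{-1} b\,dz =: \int_\Gamma g(z)\,dz.$$

Take circle $\Gamma: z - \alpha = \beta e^{i\theta}$, $0 \le \theta \le 2\pi$, so that $dz = i(z - \alpha)\,d\theta$.

Apply repeated trapezium rule:

$$\int_\Gamma g(z)\,dz = i\int_0^{2\pi} (z(\theta) - \alpha)\,g(z(\theta))\,d\theta \approx \frac{2\pi i}{n}\sum_{k=0}^{n-1} (z_k - \alpha)\,g(z_k),$$

where $z_k - \alpha = \beta e^{2\pi k i/n}$.

Use Hessenberg reduction, as before.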
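Substituting the definition of g into the trapezium sum cancels the factor 2πi and leaves $y \approx \frac{1}{n}\sum_k (z_k - \alpha) f(z_k)(z_k I - A)^{-1} b$. The sketch below evaluates that sum for a 2×2 matrix, using the closed-form 2×2 inverse in place of the Hessenberg reduction mentioned on the slide. The function name `cauchy_fAb`, its parameters and the choice n = 32 are assumptions made for this illustration.

```python
import cmath
import math

def cauchy_fAb(A, b, f, center, radius, n=32):
    """Approximate f(A) b for a 2x2 A by the trapezium rule on a circle."""
    (a11, a12), (a21, a22) = A
    y = [0j, 0j]
    for k in range(n):
        w = radius * cmath.exp(2j * math.pi * k / n)    # w = z_k - center
        z = center + w
        # x = (z I - A)^{-1} b via the explicit 2x2 inverse
        m11, m12, m21, m22 = z - a11, -a12, -a21, z - a22
        det = m11 * m22 - m12 * m21
        x0 = (m22 * b[0] - m12 * b[1]) / det
        x1 = (-m21 * b[0] + m11 * b[1]) / det
        fz = f(z)
        y[0] += w * fz * x0
        y[1] += w * fz * x1
    return [v / n for v in y]

# exp(A) b for A = [[1, 1], [0, 2]], b = [0, 1]; the circle encloses both eigenvalues
y = cauchy_fAb([[1.0, 1.0], [0.0, 2.0]], [0.0, 1.0], cmath.exp,
               center=1.5, radius=2.0)
```

For this well-behaved example the eigenvalues sit well inside the circle and the rule converges geometrically in n; a contour passing close to an eigenvalue would need many more points.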


Euler-Maclaurin Error Bound

h(x) period 2π, in $C^{2k+1}(-\infty,\infty)$, $|h^{(2k+1)}(x)| \le M$:

$$\left|\int_0^{2\pi} h(x)\,dx - T_n(h)\right| \le \frac{4\pi M\,\zeta(2k+1)}{n^{2k+1}}.$$

• $h^{(2k+1)}(x)$ proportional to $\beta^{2k+2}$, where β = radius of circle.

• $h^{(2k+1)}(x)$ contains powers of resolvent $(z(\theta)I - A)^{-1}$. Bad if contour close to some $\lambda_i$ or A highly nonnormal.

• $h^{(2k+1)}(x)$ contains derivatives of f on contour.

Conclude: restricted to matrices

not too nonnormal,

whose $\lambda_i$ can be enclosed in circle of small radius not close to singularity of derivs of f.


Future Work

• Theory and algorithms for non-primary functions, perhaps linked to an f(A(t)) application.

• Better understanding of conditioning of f(A).

• Exploiting structure, e.g. A ∈ matrix automorphism group (H, Mackey, Mackey & Tisseur, 2003).



