G. Fasano and M. Roma, A class of preconditioners for large indefinite linear systems, as by-product of Krylov subspace methods: Part I. Working Paper n. 4/2011, June 2011. ISSN: 2239-2734

G. Fasano and M. Roma

A class of preconditioners for large indefinite linear systems, as by-product of Krylov subspace methods: Part I

Working Paper n. 4/2011June 2011

ISSN: 2239-2734


This Working Paper is published under the auspices of the Department of Management at Università Ca' Foscari Venezia. Opinions expressed herein are those of the authors and not those of the Department or the University. The Working Paper series is designed to divulge preliminary or incomplete work, circulated to favour discussion and comments. Citation of this paper should consider its provisional nature.


A Class of Preconditioners for Large Indefinite Linear Systems, as By-product of Krylov Subspace Methods: Part I∗

Giovanni Fasano Massimo Roma<[email protected]> <[email protected]>Dept. of Management Dip. di Inform. e Sistem. “A. Ruberti”

Universita Ca’Foscari Venezia SAPIENZA, Universita di Roma

The Italian Ship Model Basin - INSEAN, CNR

(June 2011)

Abstract. We propose a class of preconditioners tailored for symmetric linear systems arising both in numerical linear algebra and in nonconvex optimization. Our preconditioners are specifically suited for large linear systems and may be obtained as by-product of Krylov subspace solvers. Each preconditioner in our class is identified by setting the values of a pair of parameters and a scaling matrix, which are user-dependent and may be chosen according to the structure of the problem in hand. We provide theoretical properties for our preconditioners. In particular, we show that our preconditioners both shift some eigenvalues of the system matrix to controlled values, and tend to reduce the modulus of most of the other eigenvalues. In a companion paper we study some structural properties of our class of preconditioners, and report the results of a significant numerical experience.

Keywords: Preconditioners, large indefinite linear systems, large scale nonconvex optimization, Krylov subspace methods.

JEL Classification Numbers: C44, C61.

Correspondence to:

Giovanni Fasano, Dept. of Management, Università Ca' Foscari Venezia, San Giobbe, Cannaregio 873, 30121 Venezia, Italy

Phone: [+39] 041-234-6922, Fax: [+39] 041-234-7444, E-mail: [email protected]

∗ G. Fasano wishes to thank the Italian Ship Model Basin, CNR - INSEAN institute, for the indirect support.


1 Introduction

We study a class of preconditioners for the solution of large indefinite linear systems, without assuming any sparsity pattern for the system matrix. In many contexts of numerical analysis and nonlinear optimization the efficient iterative solution of sequences of linear systems is sought. Truncated Newton methods in unconstrained optimization, KKT systems, interior point methods, and PDE-constrained optimization are just some examples (see e.g. [4]).

In this work we consider the solution of symmetric indefinite linear systems by using preconditioning techniques; in particular, the class of preconditioners we propose uses information collected by Krylov subspace methods, in order to capture the structural properties of the system matrix. We iteratively construct our preconditioners either by using (but not performing) a factorization of the system matrix (see, e.g., [7, 11, 18]), obtained as by-product of Krylov subspace methods, or by performing a Jordan canonical form on a matrix of very small size. We describe our preconditioners using a general Krylov subspace method; then, we prove theoretical properties for such preconditioners, and we describe results which indicate how to select the parameters involved in their definition. The basic idea of our approach is to apply a Krylov-based method to generate a positive definite approximation of the inverse of the system matrix. The latter is then used to build our preconditioners, needing to store just a few vectors, without requiring any product of matrices. Since we collect information from Krylov-based methods, we assume that the entries of the system matrix are not known, and the necessary information is gained by using a routine which computes the product of the system matrix times a vector.

In the companion paper [9] we test our preconditioners, both within linear algebra and nonconvex optimization frameworks. In particular, we test our proposal on significant linear systems from the literature. Then, we focus on the so called Newton-Krylov methods, also known as Truncated Newton methods (see [15] for a survey). In these contexts, both positive definite and indefinite linear systems have been considered.

We recall that in case the optimization problem in hand is nonconvex, i.e. the Hessian matrix of the objective function is possibly indefinite and at least one eigenvalue is negative, the solution of Newton's equation within Truncated Newton schemes may require some care. Indeed, the Krylov-based method used to solve Newton's equation should be suitably applied, considering that, unlike in linear algebra, optimization frameworks require the definition of descent directions, which have to satisfy additional properties [5, 16]. In this regard, our proposal provides a tool to preserve the latter properties.

The paper is organized as follows: in the next section we describe our class of preconditioners for indefinite linear systems, by using a general Krylov subspace method. Finally, a section of conclusions and future work completes the paper.

As regards the notation, for an $n \times n$ real matrix $M$ we denote by $\Lambda[M]$ the spectrum of $M$; $I_k$ is the identity matrix of order $k$. With $C \succ 0$ we indicate that the matrix $C$ is positive definite; $\mathrm{tr}[C]$ and $\det[C]$ are the trace and the determinant of $C$, respectively, while $\|\cdot\|$ denotes the Euclidean norm.



2 Our class of preconditioners

In this section we first introduce some preliminaries, then we propose our class of preconditioners. Consider the indefinite linear system

Ax = b, (2.1)

where $A \in \mathbb{R}^{n\times n}$ is symmetric, $n$ is large and $b \in \mathbb{R}^n$. Some real contexts where the latter system requires efficient solvers are detailed in Section 1. Suppose any Krylov subspace method is used for the solution of (2.1), e.g. the Lanczos process or the CG method [11] (but MINRES [17] or Planar-CG methods [12, 6] may also be an alternative choice). They are equivalent as long as $A \succ 0$, whereas the CG, though cheaper, in principle may not cope with the indefinite case. In Assumption 2.1 below we consider that a finite number of steps, say $h \ll n$, of the Krylov subspace method adopted have been performed.

Assumption 2.1 Let us consider any Krylov subspace method to solve the symmetric linear system (2.1). Suppose at step $h$ of the Krylov method, with $h \le n-1$, the matrices $R_h \in \mathbb{R}^{n\times h}$, $T_h \in \mathbb{R}^{h\times h}$ and the vector $u_{h+1} \in \mathbb{R}^n$ are generated, such that

$$ A R_h = R_h T_h + \rho_{h+1}\, u_{h+1} e_h^T, \qquad \rho_{h+1} \in \mathbb{R}, \qquad (2.2) $$

$$ T_h = \begin{cases} V_h B_h V_h^T, & \text{if } T_h \text{ is indefinite,} \\[2pt] L_h D_h L_h^T, & \text{if } T_h \text{ is positive definite,} \end{cases} \qquad (2.3) $$

where

$$ R_h = (u_1 \cdots u_h), \qquad u_i^T u_j = 0, \quad \|u_i\| = 1, \quad 1 \le i \ne j \le h, $$

$$ u_{h+1}^T u_i = 0, \qquad \|u_{h+1}\| = 1, \quad 1 \le i \le h, $$

$T_h$ is irreducible and nonsingular, with eigenvalues $\mu_1, \dots, \mu_h$ not all coincident,

$$ B_h = \mathrm{diag}_{1\le i\le h}\{\mu_i\}, \qquad V_h = (v_1 \cdots v_h) \in \mathbb{R}^{h\times h} \text{ orthogonal}, \qquad (\mu_i, v_i) \text{ eigenpair of } T_h, $$

$D_h \succ 0$ is diagonal, and $L_h$ is unit lower bidiagonal.
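For concreteness, the quantities in Assumption 2.1 can be produced by a plain Lanczos process. The following sketch (our own illustrative NumPy code, not part of the paper; the helper name `lanczos` is hypothetical) builds $R_h$, $T_h$, $u_{h+1}$ and $\rho_{h+1}$, so that relation (2.2) can be verified numerically:

```python
import numpy as np

def lanczos(A, b, h):
    """h steps of the Lanczos process on the symmetric matrix A.

    Returns R (n x h, orthonormal columns), the tridiagonal T (h x h),
    the next Lanczos vector u_next and rho = rho_{h+1}, so that
    A R = R T + rho * u_next e_h^T, as in relation (2.2).
    """
    n = b.size
    R = np.zeros((n, h))
    alpha = np.zeros(h)                 # diagonal of T
    beta = np.zeros(max(h - 1, 0))      # sub/super-diagonal of T
    u = b / np.linalg.norm(b)
    u_prev = np.zeros(n)
    rho = 0.0
    for k in range(h):
        R[:, k] = u
        w = A @ u - rho * u_prev
        alpha[k] = u @ w
        w -= alpha[k] * u
        w -= R[:, :k + 1] @ (R[:, :k + 1].T @ w)   # reorthogonalize (floating-point safety)
        rho = np.linalg.norm(w)
        u_prev, u = u, w / rho
        if k < h - 1:
            beta[k] = rho
    T = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)
    return R, T, u, rho
```

Any Krylov routine exposing these quantities (CG, the Lanczos process, etc.) would serve equally well, as the assumption states; the explicit reorthogonalization is only a numerical safeguard, not part of the scheme.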

Remark 2.1 Note that most of the common Krylov subspace methods for the solution of symmetric linear systems (e.g. the CG, the Lanczos process, etc.) at iteration $h$ may easily satisfy Assumption 2.1. In particular, observe that from (2.2) we have $T_h = R_h^T A R_h$, so that whenever $A \succ 0$ then $T_h \succ 0$. Since the Jordan canonical form of $T_h$ in (2.3) is required only when $T_h$ is indefinite, it is important to check whether $T_h \succ 0$ without computing the eigenpairs of $T_h$ if unnecessary. To this purpose, note that the Krylov subspace method adopted always provides the relation $T_h = L_h D_h L_h^T$, with $L_h$ nonsingular and $D_h$ block diagonal (blocks of size at most $2\times 2$), even when $T_h$ is indefinite [17, 18, 7]. Thus, checking the eigenvalues of $D_h$ will reveal whether the Jordan canonical form $T_h = V_h B_h V_h^T$ is really needed, i.e. whether $T_h$ is indefinite.


Observe also that from Assumption 2.1 the parameter $\rho_{h+1}$ may be nonzero, i.e. the subspace $\mathrm{span}\{u_1, \dots, u_h\}$ is possibly not an invariant subspace under the transformation by the matrix $A$ (thus, in this paper we consider a more general case with respect to [3]).

Remark 2.2 The Krylov subspace method adopted may, in general, perform $m \ge h$ iterations, generating the orthonormal vectors $u_1, \dots, u_m$. Then, we can set $R_h = (u_{\ell_1}, \dots, u_{\ell_h})$, where $\{\ell_1, \dots, \ell_h\} \subseteq \{1, \dots, m\}$, and change relations (2.2)-(2.3) accordingly; i.e., Assumption 2.1 may hold selecting any $h$ out of the $m$ vectors (among $u_1, \dots, u_m$) computed by the Krylov subspace method.

Remark 2.3 For relatively small values of the parameter $h$ in Assumption 2.1 (say $h \le 20$, as often suffices in most applications), the computation of the eigenpairs $(\mu_i, v_i)$, $i = 1, \dots, h$, of $T_h$ when $T_h$ is indefinite may be extremely fast with standard codes. E.g., if the CG is the Krylov subspace method used in Assumption 2.1 to solve (2.1), then the Matlab [1] (general) function eigs() requires as little as $\approx 10^{-4}$ seconds to fully compute all the eigenpairs of $T_h$, for $h = 20$, on a commercial laptop; indeed, in the latter case the matrix $T_h$ is tridiagonal. Nonetheless, in the separate paper [8] we consider a special case where the request (2.3) on $T_h$ may be considerably weakened under mild assumptions. Moreover, in the companion paper [9] we also prove that for a special choice of the parameter $a$ used in our class of preconditioners (see below), strong theoretical properties may be stated.

On the basis of the latter assumption, we can now define our preconditioners and show their properties. To this aim, considering for the matrix $T_h$ the expression (2.3), we define (see also [10])

$$ |T_h| \;\stackrel{\mathrm{def}}{=}\; \begin{cases} V_h |B_h| V_h^T, \quad |B_h| = \mathrm{diag}_{1\le i\le h}\{|\mu_i|\}, & \text{if } T_h \text{ is indefinite,} \\[2pt] T_h, & \text{if } T_h \text{ is positive definite.} \end{cases} $$

As a consequence, when $T_h$ is indefinite we have $T_h |T_h|^{-1} = |T_h|^{-1} T_h = V_h \widehat{I}_h V_h^T$, where the $h$ nonzero diagonal entries of the matrix $\widehat{I}_h$ are in the set $\{-1, +1\}$. Furthermore, it is easily seen that $|T_h|$ is positive definite, for any $h$, and the matrix $|T_h|^{-1} T_h^2 |T_h|^{-1} = I_h$ is the identity matrix.
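A minimal sketch of this definition (our own illustration; NumPy's `eigh` plays the role of the Jordan canonical form of the symmetric matrix $T_h$ in (2.3)):

```python
import numpy as np

def abs_matrix(T):
    """|T_h| = V_h |B_h| V_h^T, from the spectral decomposition T = V diag(mu) V^T.

    When T is already positive definite this returns (numerically) T itself.
    """
    mu, V = np.linalg.eigh(T)
    return (V * np.abs(mu)) @ V.T      # V diag(|mu|) V^T
```

With this helper, $|T_h|$ is positive definite, $T_h |T_h|^{-1}$ has eigenvalues in $\{-1,+1\}$, and $|T_h|^{-1} T_h^2 |T_h|^{-1} = I_h$, as claimed above.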

Now let us introduce the following $n \times n$ matrix, which depends on the real parameter $a$:

$$ \begin{aligned} M_h \;\stackrel{\mathrm{def}}{=}\;& (I - R_h R_h^T) + R_h |T_h| R_h^T + a \left( u_{h+1} u_h^T + u_h u_{h+1}^T \right), \qquad h \le n-1, \\ =\;& [R_h \mid u_{h+1} \mid R_{n,h+1}] \begin{pmatrix} \begin{pmatrix} |T_h| & a e_h \\ a e_h^T & 1 \end{pmatrix} & 0 \\ 0 & I_{n-(h+1)} \end{pmatrix} \begin{pmatrix} R_h^T \\ u_{h+1}^T \\ R_{n,h+1}^T \end{pmatrix} \end{aligned} \qquad (2.4) $$

$$ M_n \;\stackrel{\mathrm{def}}{=}\; (I - R_n R_n^T) + R_n |T_n| R_n^T = R_n |T_n| R_n^T, \qquad (2.5) $$

where $R_h$ and $T_h$ satisfy relations (2.2)-(2.3), $a \in \mathbb{R}$, and the matrix $R_{n,h+1} \in \mathbb{R}^{n\times[n-(h+1)]}$ is such that $R_{n,h+1}^T R_{n,h+1} = I_{n-(h+1)}$ and $[R_h \mid u_{h+1} \mid R_{n,h+1}]$ is orthogonal. By (2.4), when $h \le n-1$, the matrix $M_h$ is the sum of three terms. It is easily seen that $I - R_h R_h^T$ represents a projector onto the subspace $S$ orthogonal to the range of the matrix $R_h$, so that $M_h v = v + a (u_{h+1}^T v) u_h$ for any $v \in S$. Thus, for any $v \in S$, when either $u_{h+1}^T v = 0$ or $a = 0$, then $M_h v = v$ (or, equivalently, if $M_h$ is nonsingular, $M_h^{-1} v = v$), i.e. the vector $v$ is unaltered by applying $M_h$ (or $M_h^{-1}$). As a result, if either $a = 0$ or $u_{h+1}^T v = 0$, then $M_h$ behaves as the identity matrix on any vector $v \in S$. Using the parameter dependent matrix $M_h$ in (2.4)-(2.5), we are now ready to introduce the following class of preconditioners:

$$ M_h^\sharp(a,\delta,D) = D\left[I_n - (R_h \mid u_{h+1})(R_h \mid u_{h+1})^T\right]D^T + (R_h \mid D u_{h+1}) \begin{pmatrix} \delta^2 |T_h| & a e_h \\ a e_h^T & 1 \end{pmatrix}^{-1} (R_h \mid D u_{h+1})^T, \qquad h \le n-1, \qquad (2.6) $$

$$ M_n^\sharp(a,\delta,D) = R_n |T_n|^{-1} R_n^T. \qquad (2.7) $$
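The following sketch (our own numerical illustration, not from the paper; dense matrices are formed only for the sake of the experiment, whereas in practice only a few vectors are stored) assembles $M_h$ of (2.4) and $M_h^\sharp(a,\delta,I_n)$ of (2.6) with $D = I_n$, from any orthonormal $(R_h \mid u_{h+1})$ and any symmetric $T_h$:

```python
import numpy as np

def build_Mh(R, absT, u_next, a):
    """M_h of (2.4): (I - R R^T) + R |T_h| R^T + a (u_{h+1} u_h^T + u_h u_{h+1}^T)."""
    n = R.shape[0]
    uh = R[:, -1]                                    # u_h, last column of R_h
    return (np.eye(n) - R @ R.T + R @ absT @ R.T
            + a * (np.outer(u_next, uh) + np.outer(uh, u_next)))

def build_precond(R, absT, u_next, a, delta):
    """M^sharp_h(a, delta, I_n) of (2.6) with D = I_n (dense, for illustration only)."""
    n, h = R.shape
    Q = np.hstack([R, u_next[:, None]])              # (R_h | u_{h+1})
    K = np.zeros((h + 1, h + 1))                     # the (h+1) x (h+1) block in (2.6)
    K[:h, :h] = delta**2 * absT
    K[h - 1, h] = K[h, h - 1] = a                    # the a e_h couplings
    K[h, h] = 1.0
    return np.eye(n) - Q @ Q.T + Q @ np.linalg.inv(K) @ Q.T
```

With $\delta = 1$ the two matrices are mutually inverse, and for $|a|$ below the threshold $(e_h^T|T_h|^{-1}e_h)^{-1/2}$ the preconditioner is positive definite, anticipating items b) and c) of Theorem 2.2 below.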

Lemma 2.1 Given the symmetric matrices $H \in \mathbb{R}^{h\times h}$, $P \in \mathbb{R}^{(n-h)\times(n-h)}$ and the matrix $\Phi \in \mathbb{R}^{h\times(n-h)}$, suppose

$$ \Phi^T H = \begin{pmatrix} z_1^T \\ \vdots \\ z_m^T \\ 0_{[n-(h+m)] \times h} \end{pmatrix}, \qquad z_1, \dots, z_m \in \mathbb{R}^h, \qquad (2.8) $$

with $H = \alpha\left[I_h + u_1 w_1^T + \cdots + u_p w_p^T\right]$, $p \le h$, $0 \le m \le h-p$, $\alpha \in \mathbb{R}$, $u_i, w_i \in \mathbb{R}^h$, $i = 1, \dots, p$, and $\{w_i\}$ linearly independent. Then, the symmetric matrix

$$ \begin{pmatrix} H & H\Phi \\ \Phi^T H & P \end{pmatrix} \qquad (2.9) $$

has the eigenvalue $\alpha$ with multiplicity at least equal to $h - \mathrm{rk}[w_1\; w_2 \cdots w_p\; z_1\; z_2 \cdots z_m]$.

Proof: Observe that $H$ has the eigenvalue $\alpha$ with a multiplicity at least $h-p$, since $Hs = \alpha s$ for any $s \perp \mathrm{span}\{w_1,\dots,w_p\}$. Moreover, imposing the condition (with $x_1$, $x_2$ not simultaneously zero vectors)

$$ \begin{pmatrix} H & H\Phi \\ \Phi^T H & P \end{pmatrix}\begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \alpha \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} $$

is equivalent to imposing the conditions

$$ \begin{cases} H(x_1 + \Phi x_2) = \alpha x_1 \\ \Phi^T H x_1 + P x_2 = \alpha x_2. \end{cases} $$

By (2.8), choosing $x_2 = 0$ and $x_1$ any real $h$-vector such that $x_1 \perp \mathrm{span}\{w_1,\dots,w_p,z_1,\dots,z_m\}$, then $\alpha$ is an eigenvalue of (2.9) with multiplicity given by $h$ minus the largest number of linearly independent vectors in the set $\{w_1,\dots,w_p,z_1,\dots,z_m\}$.

Theorem 2.2 Consider any Krylov-subspace method to solve the symmetric linear system (2.1), where $A$ is indefinite. Suppose that Assumption 2.1 holds and the Krylov-subspace method performs $h \le n$ iterations. Let $a \in \mathbb{R}$, $\delta \ne 0$, and let the matrix $D \in \mathbb{R}^{n\times n}$ be such that $[R_h \mid D u_{h+1} \mid D R_{n,h+1}]$ is nonsingular, where $R_{n,h+1} R_{n,h+1}^T = I_n - (R_h \mid u_{h+1})(R_h \mid u_{h+1})^T$. Then, we have the following properties:

a) the matrix $M_h^\sharp(a,\delta,D)$ is symmetric. Furthermore,

- when $h \le n-1$, for any $a \in \mathbb{R} \setminus \{\pm\delta\,(e_h^T |T_h|^{-1} e_h)^{-1/2}\}$, $M_h^\sharp(a,\delta,D)$ is nonsingular. In addition, if $D = I_n$ then

$$ \det\left(M_h^\sharp(a,\delta,I_n)\right) = \delta^{-2h} \det\left(|T_h|^{-1}\right)\left(1 - \frac{a^2}{\delta^2}\, e_h^T |T_h|^{-1} e_h\right)^{-1}; $$

- when $h = n$ the matrix $M_n^\sharp(a,\delta,D)$ is nonsingular. In addition, if $D = I_n$ then

$$ \det\left(M_n^\sharp(a,\delta,I_n)\right) = \det\left(|T_n|^{-1}\right); $$

b) setting $D = I_n$ and $\delta = 1$, the matrix $M_h^\sharp(a,1,I_n)$ coincides with $M_h^{-1}$;

c) for $|a| < |\delta|\,(e_h^T |T_h|^{-1} e_h)^{-1/2}$ the matrix $M_h^\sharp(a,\delta,D)$ is positive definite. Moreover, if $D = I_n$ the spectrum $\Lambda[M_h^\sharp(a,\delta,I_n)]$ is given by

$$ \Lambda[M_h^\sharp(a,\delta,I_n)] = \Lambda\left[\begin{pmatrix} \delta^2|T_h| & a e_h \\ a e_h^T & 1 \end{pmatrix}^{-1}\right] \cup \Lambda\left[I_{n-(h+1)}\right]; $$

d) when $h \le n-1$, $D = I_n$ and either $T_h \succ 0$ or $T_h$ is indefinite,

- the matrix $M_h^\sharp(a,\delta,I_n)A$ has at least $(h-3)$ singular values equal to $+1/\delta^2$;

- if $a = 0$ then the matrix $M_h^\sharp(0,\delta,I_n)A$ has at least $(h-2)$ singular values equal to $+1/\delta^2$;

e) when $h = n$, then $M_n^\sharp(a,\delta,D) = M_n^{-1}$, $\Lambda[M_n] = \Lambda[|T_n|]$ and $\Lambda[M_n^{-1}A] = \Lambda[A M_n^{-1}] \subseteq \{-1,+1\}$, i.e. the $n$ eigenvalues of the preconditioned matrix $M_n^\sharp(a,\delta,D)A$ are either $+1$ or $-1$.

Proof: Let $N = [R_h \mid D u_{h+1} \mid D R_{n,h+1}]$, where $N$ is nonsingular by hypothesis. Observe that for $h \le n-1$ the preconditioner $M_h^\sharp(a,\delta,D)$ may be rewritten as

$$ M_h^\sharp(a,\delta,D) = N \begin{pmatrix} \begin{pmatrix} \delta^2|T_h| & a e_h \\ a e_h^T & 1 \end{pmatrix}^{-1} & 0 \\ 0 & I_{n-(h+1)} \end{pmatrix} N^T, \qquad h \le n-1. \qquad (2.10) $$


Property a) follows from the symmetry of $T_h$. In addition, observe that $R_{n,h+1}^T R_{n,h+1} = I_{n-(h+1)}$. Thus, from (2.10) the matrix $M_h^\sharp(a,\delta,D)$ is nonsingular if and only if the matrix

$$ \begin{pmatrix} \delta^2|T_h| & a e_h \\ a e_h^T & 1 \end{pmatrix} \qquad (2.11) $$

is invertible. Furthermore, by a direct computation we observe that for $h \le n-1$ the following identity holds:

$$ \begin{pmatrix} \delta^2|T_h| & a e_h \\ a e_h^T & 1 \end{pmatrix} = \begin{pmatrix} I_h & 0 \\ \frac{a}{\delta^2}\, e_h^T |T_h|^{-1} & 1 \end{pmatrix} \begin{pmatrix} \delta^2|T_h| & 0 \\ 0 & 1 - \frac{a^2}{\delta^2}\, e_h^T |T_h|^{-1} e_h \end{pmatrix} \begin{pmatrix} I_h & \frac{a}{\delta^2}\, |T_h|^{-1} e_h \\ 0 & 1 \end{pmatrix}. \qquad (2.12) $$

Thus, since $T_h$ is nonsingular and $\delta \ne 0$, for $h \le n-1$ the determinant of the matrix (2.11) is nonzero if and only if $a \ne \pm\delta\,(e_h^T |T_h|^{-1} e_h)^{-1/2}$. Finally, for $h = n$ the matrix $M_n^\sharp(a,\delta,D)$ is nonsingular, since $R_n$ and $T_n$ are nonsingular in (2.7).

As regards b), recalling that $R_h^T R_h = I_h$ and $|T_h|$ is nonsingular from Assumption 2.1, when $h \le n-1$ relations (2.4) and (2.10) trivially yield the result, as well as (2.5) and (2.7) for the case $h = n$.

As regards c), observe that from (2.10) the matrix $M_h^\sharp(a,\delta,D)$ is positive definite as long as the matrix (2.11) is positive definite. Thus, from (2.12) and the relation $|T_h| \succ 0$ we immediately infer that $M_h^\sharp(a,\delta,D)$ is positive definite as long as $|a| < |\delta|\,(e_h^T |T_h|^{-1} e_h)^{-1/2}$.

Moreover, we recall that when $D = I_n$ then $N$ is orthogonal.

Item d) may be proved considering $D = I_n$ and computing the eigenvalues of the matrix

$$ \left[M_h^\sharp(a,\delta,I_n)A\right]\left[M_h^\sharp(a,\delta,I_n)A\right]^T = M_h^\sharp(a,\delta,I_n)\, A^2\, M_h^\sharp(a,\delta,I_n). $$

To this purpose, for $h \le n-1$ we have for $M_h^\sharp(a,\delta,I_n)\, A^2\, M_h^\sharp(a,\delta,I_n)$ the expression (see (2.10))

$$ M_h^\sharp(a,\delta,I_n)\, A^2\, M_h^\sharp(a,\delta,I_n) = N \begin{pmatrix} \begin{pmatrix} \delta^2|T_h| & a e_h \\ a e_h^T & 1 \end{pmatrix}^{-1} & 0 \\ 0 & I_{n-(h+1)} \end{pmatrix} C \begin{pmatrix} \begin{pmatrix} \delta^2|T_h| & a e_h \\ a e_h^T & 1 \end{pmatrix}^{-1} & 0 \\ 0 & I_{n-(h+1)} \end{pmatrix} N^T, \qquad (2.13) $$

where $C \in \mathbb{R}^{n\times n}$, with

$$ C = N^T A^2 N = \begin{pmatrix} R_h^T A^2 R_h & R_h^T A^2 u_{h+1} & R_h^T A^2 R_{n,h+1} \\ u_{h+1}^T A^2 R_h & u_{h+1}^T A^2 u_{h+1} & u_{h+1}^T A^2 R_{n,h+1} \\ R_{n,h+1}^T A^2 R_h & R_{n,h+1}^T A^2 u_{h+1} & R_{n,h+1}^T A^2 R_{n,h+1} \end{pmatrix}. $$

From (2.2) and the symmetry of $T_h$ we obtain

$$ R_h^T A^2 R_h = (A R_h)^T (A R_h) = (R_h T_h + \rho_{h+1} u_{h+1} e_h^T)^T (R_h T_h + \rho_{h+1} u_{h+1} e_h^T) = T_h^2 + \rho_{h+1}^2\, e_h e_h^T, \qquad (2.14) $$

$$ R_h^T A^2 u_{h+1} = (A R_h)^T A u_{h+1} = v_1 \in \mathbb{R}^h, \qquad (2.15) $$


and considering relation (2.2) we obtain

$$ A R_{h+1} = A (R_h \mid u_{h+1}) = R_{h+1} T_{h+1} + \rho_{h+2}\, u_{h+2} e_{h+1}^T = (R_h \mid u_{h+1}) \begin{pmatrix} T_h & \rho_{h+1} e_h \\ \rho_{h+1} e_h^T & t_{h+1,h+1} \end{pmatrix} + \rho_{h+2}\, u_{h+2} e_{h+1}^T, $$

i.e.

$$ \begin{aligned} A R_h &= R_h T_h + \rho_{h+1}\, u_{h+1} e_h^T \\ A u_{h+1} &= \rho_{h+1} u_h + t_{h+1,h+1} u_{h+1} + \rho_{h+2} u_{h+2}, \end{aligned} \qquad (2.16) $$

so that

$$ A^2 R_h = (A R_h) T_h + \rho_{h+1}\, A u_{h+1} e_h^T = (R_h T_h + \rho_{h+1} u_{h+1} e_h^T) T_h + \rho_{h+1}(\rho_{h+1} u_h + t_{h+1,h+1} u_{h+1} + \rho_{h+2} u_{h+2}) e_h^T. $$

As a consequence, from (2.16) we also have $A u_{h+2} \in \mathrm{span}\{u_{h+1}, u_{h+2}, u_{h+3}\}$ and

$$ R_h^T A^2 R_{n,h+1} = (A^2 R_h)^T R_{n,h+1} = (\rho_{h+2}\, u_{h+2} e_h^T)^T R_{n,h+1} = \rho_{h+2}\, E_{h,1} \in \mathbb{R}^{h\times[n-(h+1)]}, $$

$$ u_{h+1}^T A^2 u_{h+1} = c > 0, $$

$$ u_{h+1}^T A^2 R_{n,h+1} = \left[A(\rho_{h+1} u_h + t_{h+1,h+1} u_{h+1} + \rho_{h+2} u_{h+2})\right]^T R_{n,h+1} = \left[A(t_{h+1,h+1} u_{h+1} + \rho_{h+2} u_{h+2})\right]^T R_{n,h+1} = (\alpha \;\; \beta \;\; 0 \cdots 0) \in \mathbb{R}^{n-(h+1)}, $$

for suitable scalars $\alpha, \beta \in \mathbb{R}$, with

$$ R_{n,h+1}^T A^2 R_{n,h+1} = V_2 \in \mathbb{R}^{[n-(h+1)]\times[n-(h+1)]}, $$

where $E_{i,j}$ has all zero entries but $+1$ at position $(i,j)$. Thus,

$$ C = \begin{pmatrix} T_h^2 + \rho_{h+1}^2\, e_h e_h^T & v_1 & \rho_{h+2}\, E_{h,1} \\ v_1^T & c & (\alpha \;\; \beta \;\; 0 \cdots 0) \\ \rho_{h+2}\, E_{1,h} & (\alpha \;\; \beta \;\; 0 \cdots 0)^T & V_2 \end{pmatrix}. $$

Moreover, from (2.12) we can readily infer that

$$ \begin{pmatrix} \delta^2|T_h| & a e_h \\ a e_h^T & 1 \end{pmatrix}^{-1} = \begin{pmatrix} I_h & -\frac{a}{\delta^2}|T_h|^{-1} e_h \\ 0 & 1 \end{pmatrix} \begin{pmatrix} \frac{1}{\delta^2}|T_h|^{-1} & 0 \\ 0 & \dfrac{1}{1 - \frac{a^2}{\delta^2}\, e_h^T |T_h|^{-1} e_h} \end{pmatrix} \begin{pmatrix} I_h & 0 \\ -\frac{a}{\delta^2}\, e_h^T |T_h|^{-1} & 1 \end{pmatrix} $$

$$ = \begin{pmatrix} \frac{1}{\delta^2}|T_h|^{-1} - \frac{a\omega}{\delta^4}|T_h|^{-1} e_h e_h^T |T_h|^{-1} & \frac{\omega}{\delta^2}|T_h|^{-1} e_h \\ \frac{\omega}{\delta^2}\, e_h^T |T_h|^{-1} & -\dfrac{\omega}{a} \end{pmatrix}, \qquad (2.17) $$

with

$$ \omega = -\frac{a}{1 - \frac{a^2}{\delta^2}\, e_h^T |T_h|^{-1} e_h}. \qquad (2.18) $$

Now, recalling that since $D = I_n$ then $N = [R_h \mid u_{h+1} \mid R_{n,h+1}]$, for any $h \le n-1$ we obtain from (2.13)

$$ M_h^\sharp(a,\delta,I_n)\, A^2\, M_h^\sharp(a,\delta,I_n) = N \begin{pmatrix} \begin{pmatrix} \delta^2|T_h| & a e_h \\ a e_h^T & 1 \end{pmatrix}^{-1} \begin{pmatrix} T_h^2 + \rho_{h+1}^2 e_h e_h^T & v_1 \\ v_1^T & c \end{pmatrix} \begin{pmatrix} \delta^2|T_h| & a e_h \\ a e_h^T & 1 \end{pmatrix}^{-1} & \begin{pmatrix} \delta^2|T_h| & a e_h \\ a e_h^T & 1 \end{pmatrix}^{-1} \begin{pmatrix} \rho_{h+2} E_{h,1} \\ \alpha \;\; \beta \;\; 0 \cdots 0 \end{pmatrix} \\ \begin{pmatrix} \rho_{h+2} E_{h,1} \\ \alpha \;\; \beta \;\; 0 \cdots 0 \end{pmatrix}^{T} \begin{pmatrix} \delta^2|T_h| & a e_h \\ a e_h^T & 1 \end{pmatrix}^{-1} & V_2 \end{pmatrix} N^T, $$

with

$$ \begin{pmatrix} \delta^2|T_h| & a e_h \\ a e_h^T & 1 \end{pmatrix}^{-1} \begin{pmatrix} \rho_{h+2} E_{h,1} \\ \alpha \;\; \beta \;\; 0 \cdots 0 \end{pmatrix} = \begin{pmatrix} * & * & \\ \vdots & \vdots & 0_{(h+1)\times[n-(h+3)]} \\ * & * & \end{pmatrix} \in \mathbb{R}^{(h+1)\times[n-(h+1)]}, $$

where the '∗' indicates entries whose computation is not relevant to our purposes. Now, considering the second last relation, we focus on computing the submatrix $H_{h\times h}$ corresponding to the first $h$ rows and $h$ columns of the matrix

$$ \begin{pmatrix} \delta^2|T_h| & a e_h \\ a e_h^T & 1 \end{pmatrix}^{-1} \begin{pmatrix} T_h^2 + \rho_{h+1}^2 e_h e_h^T & v_1 \\ v_1^T & c \end{pmatrix} \begin{pmatrix} \delta^2|T_h| & a e_h \\ a e_h^T & 1 \end{pmatrix}^{-1}. \qquad (2.19) $$

After a brief computation, from (2.17) and (2.19) we obtain for the submatrix $H_{h\times h}$

$$ \begin{aligned} H_{h\times h} =\; & \left[\left(\frac{1}{\delta^2}|T_h|^{-1} - \frac{a\omega}{\delta^4}|T_h|^{-1} e_h e_h^T |T_h|^{-1}\right)\left(T_h^2 + \rho_{h+1}^2 e_h e_h^T\right) + \frac{\omega}{\delta^2}|T_h|^{-1} e_h v_1^T\right] \left[\frac{1}{\delta^2}|T_h|^{-1} - \frac{a\omega}{\delta^4}|T_h|^{-1} e_h e_h^T |T_h|^{-1}\right] \\ & + \left[\left(\frac{1}{\delta^2}|T_h|^{-1} - \frac{a\omega}{\delta^4}|T_h|^{-1} e_h e_h^T |T_h|^{-1}\right) v_1 + \frac{\omega}{\delta^2}\, c\, |T_h|^{-1} e_h\right] \cdot \frac{\omega}{\delta^2}\, e_h^T |T_h|^{-1}, \end{aligned} $$

and for the case of $T_h$ indefinite, from (2.3) we obtain (a similar analysis holds for the case of $T_h$ positive definite, too)

$$ \begin{aligned} H_{h\times h} =\; & \left[\frac{1}{\delta^2} V_h \widehat{I}_h V_h^T T_h + \frac{\rho_{h+1}^2}{\delta^2}|T_h|^{-1} e_h e_h^T - \frac{a\omega}{\delta^4}|T_h|^{-1} e_h e_h^T V_h \widehat{I}_h V_h^T T_h - \frac{a\omega}{\delta^4}\rho_{h+1}^2 \left(e_h^T |T_h|^{-1} e_h\right) |T_h|^{-1} e_h e_h^T + \frac{\omega}{\delta^2}|T_h|^{-1} e_h v_1^T \right] \\ & \cdot \left[\frac{1}{\delta^2}|T_h|^{-1} - \frac{a\omega}{\delta^4}|T_h|^{-1} e_h e_h^T |T_h|^{-1}\right] + \frac{\omega}{\delta^2}\left[\frac{1}{\delta^2}|T_h|^{-1} v_1 - \frac{a\omega}{\delta^4}|T_h|^{-1} e_h e_h^T |T_h|^{-1} v_1 + \frac{\omega}{\delta^2}\, c\, |T_h|^{-1} e_h\right] e_h^T |T_h|^{-1}. \end{aligned} $$

Recalling that $(V_h \widehat{I}_h V_h^T)(V_h \widehat{I}_h V_h^T) = I_h$ (so that $e_h^T (V_h \widehat{I}_h V_h^T)(V_h \widehat{I}_h V_h^T) e_h = 1$), from the last relation we finally have for $H_{h\times h}$ the expression

$$ H_{h\times h} = \frac{1}{\delta^4}\left\{ I_h + \left[\sigma |T_h|^{-1} e_h - \frac{a\omega}{\delta^2}\, e_h + \omega |T_h|^{-1} v_1\right] e_h^T |T_h|^{-1} + \omega |T_h|^{-1} e_h \left[v_1^T |T_h|^{-1} - \frac{a}{\delta^2}\, e_h^T\right] \right\}, \qquad (2.20) $$

where

$$ \sigma = \rho_{h+1}^2 - \frac{2a\omega}{\delta^2}\,\rho_{h+1}^2 \left(e_h^T |T_h|^{-1} e_h\right) + \frac{a^2\omega^2}{\delta^4} + \frac{a^2\omega^2}{\delta^4}\,\rho_{h+1}^2 \left(e_h^T |T_h|^{-1} e_h\right)^2 - \frac{2a\omega^2}{\delta^2}\left(e_h^T |T_h|^{-1} v_1\right) + \omega^2 c; \qquad (2.21) $$

moreover, since $M_h^\sharp(a,\delta,I_n)\, A^2\, M_h^\sharp(a,\delta,I_n) \succ 0$, then also $H_{h\times h}$ is positive definite.

Let us now define the subspace (see the vectors which define the dyads in relation (2.20))

$$ \mathcal{T}_2 = \mathrm{span}\left\{ |T_h|^{-1} e_h \,,\;\; \omega\left[|T_h|^{-1} v_1 - \frac{a}{\delta^2}\, e_h\right] \right\}. \qquad (2.22) $$

Observe that since $D = I_n$, after some computation $v_1 = \rho_{h+1}\left[T_h + t_{h+1,h+1} I_h\right] e_h$. Thus, from (2.22) the subspace $\mathcal{T}_2$ has dimension 2, unless

(i) $T_h$ is proportional to $I_h$,

(ii) $a = 0$ (which from (2.18) also implies $\omega = 0$).

We analyze the two cases separately. Condition (i) cannot hold, since (2.2) would imply that the vector $A u_i$ is proportional to $u_i$, $i = 1, \dots, h-1$, i.e. the Krylov-subspace method had to stop at the very first iteration, the Krylov subspace generated at the first iteration being unchanged. As a consequence, considering any subspace $\mathcal{S}_{h-2} \subseteq \mathbb{R}^h$ such that $\mathcal{S}_{h-2} \oplus \mathcal{T}_2 = \mathbb{R}^h$, we can select any orthonormal basis $\{s_1, \dots, s_{h-2}\}$ of the subspace $\mathcal{S}_{h-2}$, so that (see (2.20)) the $h-2$ vectors $\{s_1, \dots, s_{h-2}\}$ can be regarded as (the first) $h-2$ eigenvectors of the matrix $H_{h\times h}$, corresponding to the eigenvalue $+1/\delta^4$. Thus, from the formula after (2.18) the eigenvalues of $M_h^\sharp(a,\delta,I_n)\, A^2\, M_h^\sharp(a,\delta,I_n)$ coincide with the eigenvalues of (we recall that since $M_h^\sharp(a,\delta,I_n)\, A^2\, M_h^\sharp(a,\delta,I_n) \succ 0$, then $H_{h\times h} \succ 0$)

$$ \begin{pmatrix} H_{h\times h} & H_{h\times h}\Phi \\[4pt] \Phi^T H_{h\times h} & \begin{pmatrix} * & * \;\cdots\; * \\ \vdots & V_2 \\ * & \end{pmatrix} \end{pmatrix}, \qquad \Phi = H_{h\times h}^{-1}\left( z_1 \;\; z_2 \;\; z_3 \;\; 0_{h\times[n-(h+3)]} \right), \quad z_i \in \mathbb{R}^h, \qquad (2.23) $$

which becomes, after setting

$$ P = \begin{pmatrix} * & * \;\cdots\; * \\ \vdots & V_2 \\ * & \end{pmatrix}, $$

of the form

$$ \begin{pmatrix} H_{h\times h} & H_{h\times h}\Phi \\ \Phi^T H_{h\times h} & P \end{pmatrix}. $$

Thus, using Lemma 2.1 with $w_1 = |T_h|^{-1} e_h$, $w_2 = \omega\left[|T_h|^{-1} v_1 - (a/\delta^2)\, e_h\right]$ and $m = 3$, and observing that we have by (2.14)

$$ \begin{pmatrix} z_1 \\ * \end{pmatrix} = \begin{pmatrix} \delta^2|T_h| & a e_h \\ a e_h^T & 1 \end{pmatrix}^{-1} \begin{pmatrix} T_h^2 + \rho_{h+1}^2 e_h e_h^T & v_1 \\ v_1^T & c \end{pmatrix} \begin{pmatrix} \delta^2|T_h| & a e_h \\ a e_h^T & 1 \end{pmatrix}^{-1} e_{h+1} $$

$$ = \begin{pmatrix} \frac{1}{\delta^2}|T_h|^{-1} - \frac{a\omega}{\delta^4}|T_h|^{-1} e_h e_h^T |T_h|^{-1} & \frac{\omega}{\delta^2}|T_h|^{-1} e_h \\ \frac{\omega}{\delta^2}\, e_h^T |T_h|^{-1} & -\frac{\omega}{a} \end{pmatrix} \begin{pmatrix} \frac{\omega}{\delta^2}\, T_h^2 |T_h|^{-1} e_h + \rho_{h+1}^2 \frac{\omega}{\delta^2}\left(e_h^T |T_h|^{-1} e_h\right) e_h - \frac{\omega}{a}\, v_1 \\ \frac{\omega}{\delta^2}\, v_1^T |T_h|^{-1} e_h - \frac{c\,\omega}{a} \end{pmatrix}, $$

so that $z_1 \in \mathrm{span}\left\{ \omega e_h \,,\; \omega |T_h|^{-1} e_h \,,\; |T_h|^{-1} v_1 \right\}$,

$$ \begin{pmatrix} z_2 \\ * \end{pmatrix} = \begin{pmatrix} \delta^2|T_h| & a e_h \\ a e_h^T & 1 \end{pmatrix}^{-1} \begin{pmatrix} 0 \\ \vdots \\ 0 \\ \rho_{h+2} \\ \alpha \end{pmatrix} = \begin{pmatrix} \frac{\rho_{h+2}}{\delta^2}|T_h|^{-1} e_h - \rho_{h+2}\frac{a\omega}{\delta^4}\left(e_h^T |T_h|^{-1} e_h\right)|T_h|^{-1} e_h + \alpha\frac{\omega}{\delta^2}|T_h|^{-1} e_h \\ * \end{pmatrix}, $$

so that $z_2 \in \mathrm{span}\{|T_h|^{-1} e_h\}$, and

$$ \begin{pmatrix} z_3 \\ * \end{pmatrix} = \begin{pmatrix} \delta^2|T_h| & a e_h \\ a e_h^T & 1 \end{pmatrix}^{-1} \begin{pmatrix} 0 \\ \vdots \\ 0 \\ \beta \end{pmatrix} = \begin{pmatrix} \beta\,\frac{\omega}{\delta^2}|T_h|^{-1} e_h \\ * \end{pmatrix}, $$

so that $z_3 \in \mathrm{span}\{\omega |T_h|^{-1} e_h\}$, we conclude that, considering the expression of $H_{h\times h}$, at least $h-3$ eigenvalues of $M_h^\sharp(a,\delta,I_n)\, A^2\, M_h^\sharp(a,\delta,I_n)$ coincide with $+1/\delta^4$. As a consequence, the matrix $M_h^\sharp(a,\delta,I_n) A$ has at least $h-3$ singular values equal to $+1/\delta^2$, which proves the first statement of d).

As regards the case (ii) with $a = 0$, observe that by the definition (2.18) of $\omega$, $a = 0$ implies $\omega = 0$, and from relations (2.20)-(2.21) we have $H_{h\times h} = \frac{1}{\delta^4}\left[I_h + \rho_{h+1}^2\, |T_h|^{-1} e_h e_h^T |T_h|^{-1}\right]$. Thus, the subspace $\mathcal{T}_2$ in (2.22) reduces to $\mathcal{T}_1 = \mathrm{span}\{|T_h|^{-1} e_h\}$. Now, reasoning as in case (i), we conclude that the matrix $M_h^\sharp(0,\delta,I_n) A$ has at least $(h-2)$ singular values equal to $+1/\delta^2$.

As regards item e), observe that for $h = n$ the matrix $R_n$ is orthogonal, so that by (2.5) and (2.7) $\Lambda[M_n^\sharp(a,\delta,D)] = \Lambda[M_n^{-1}] = \Lambda[|T_n|^{-1}]$. Furthermore, by (2.2) and (2.7) we have for the case of $T_n$ indefinite (a similar analysis holds for the case of $T_n$ positive definite, too)

$$ M_n^\sharp(a,\delta,D)\, A = M_n^{-1} A = R_n |T_n|^{-1} R_n^T R_n T_n R_n^T = R_n V_n \widehat{I}_n V_n^T R_n^T = (R_n V_n)\, \widehat{I}_n\, (R_n V_n)^T. \qquad (2.24) $$

Since both $R_n$ and $V_n$ are orthogonal, so is the matrix $R_n V_n$; thus, relation (2.24) proves that $M_n^\sharp(a,\delta,D) A$ has all its $n$ eigenvalues in the set $\{-1, +1\}$.
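Item d) can be checked numerically on a small random instance. The sketch below is our own code (not from the paper): `lanczos` and `precond` are illustrative re-implementations of Assumption 2.1 and of (2.6) with $D = I_n$, with full reorthogonalization added only for floating-point safety; it counts the singular values of $M_h^\sharp(a,\delta,I_n)A$ equal to $1/\delta^2$, both for $a \ne 0$ and for $a = 0$.

```python
import numpy as np

def lanczos(A, b, h):
    # h Lanczos steps: A R = R T + rho * u_next e_h^T (relation (2.2))
    n = b.size
    R = np.zeros((n, h)); alpha = np.zeros(h); beta = np.zeros(max(h - 1, 0))
    u = b / np.linalg.norm(b); u_prev = np.zeros(n); rho = 0.0
    for k in range(h):
        R[:, k] = u
        w = A @ u - rho * u_prev
        alpha[k] = u @ w
        w -= alpha[k] * u
        w -= R[:, :k + 1] @ (R[:, :k + 1].T @ w)     # reorthogonalize
        rho = np.linalg.norm(w)
        u_prev, u = u, w / rho
        if k < h - 1:
            beta[k] = rho
    return R, np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1), u, rho

def precond(R, T, u_next, a, delta):
    # dense M^sharp_h(a, delta, I_n) of (2.6); |T_h| from the spectral decomposition
    n, h = R.shape
    mu, V = np.linalg.eigh(T)
    absT = (V * np.abs(mu)) @ V.T
    Q = np.hstack([R, u_next[:, None]])
    K = np.zeros((h + 1, h + 1))
    K[:h, :h] = delta**2 * absT
    K[h - 1, h] = K[h, h - 1] = a
    K[h, h] = 1.0
    return np.eye(n) - Q @ Q.T + Q @ np.linalg.inv(K) @ Q.T

rng = np.random.default_rng(3)
n, h, delta = 40, 10, 1.5
A = rng.standard_normal((n, n)); A = (A + A.T) / 2            # symmetric indefinite
R, T, u_next, rho = lanczos(A, rng.standard_normal(n), h)

sv_a0 = np.linalg.svd(precond(R, T, u_next, 0.0, delta) @ A, compute_uv=False)
sv_a  = np.linalg.svd(precond(R, T, u_next, 0.1, delta) @ A, compute_uv=False)
count_a0 = int(np.sum(np.abs(sv_a0 - delta**-2) < 1e-6))      # second bullet: >= h - 2
count_a  = int(np.sum(np.abs(sv_a  - delta**-2) < 1e-6))      # first bullet:  >= h - 3
```

The values 0.1 and 1.5 for $a$ and $\delta$ are arbitrary illustrative choices; any $a$ away from the singular values $\pm\delta(e_h^T|T_h|^{-1}e_h)^{-1/2}$ behaves analogously.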

Remark 2.4 Note that the matrix $R_{n,h+1}$ in the statement of Theorem 2.2 always exists, such that $[R_h \mid u_{h+1} \mid R_{n,h+1}]$ is orthogonal. However, $R_{n,h+1}$ is neither built nor used in (2.6)-(2.7); it is introduced only for theoretical purposes. Furthermore, it is easy to see that, since $[R_h \mid u_{h+1} \mid R_{n,h+1}]$ is orthogonal, any nonsingular diagonal matrix $D$ may be used in order to satisfy the hypotheses of Theorem 2.2.

Remark 2.5 Observe that the introduction of the nonsingular matrix $D$ in (2.6) allows a very general structure for the preconditioner $M_h^\sharp(a,\delta,D)$. As an example, setting $h = 0$ we have $M_h^\sharp(a,\delta,D) = DD^T \succ 0$, so that the preconditioner $M_h^\sharp(a,\delta,D)$ encompasses several classes of preconditioners from the literature (e.g. diagonal, banded and block diagonal preconditioners [17]), even though no information is provided by the Krylov subspace method. On the other hand, with the choice $D = I_n$ and $\delta = 1$ the preconditioner $M_h^\sharp(a,1,I_n)$ can be regarded as an approximate inverse preconditioner [17], without any scaling. Finally, though the choice $\delta = 1$ in (2.6) seems the most obvious, numerical reasons related to formula (2.17) and to the condition number of $M_h^\sharp(a,\delta,D)A$ may suggest other values for the parameter $\delta$. In the companion paper [9] we give motivations for the latter conclusion.

It is possible to show that, trying to introduce a slightly more general structure of $M_h^\sharp(a,\delta,D)$ where the parameter $\delta$ is replaced by a scaling (diagonal) matrix $\Delta \in \mathbb{R}^{h\times h}$ (used to balance the matrix $|T_h|$), item d) of Theorem 2.2 may not be fulfilled. The next result summarizes the properties of our class of preconditioners, for a very simple and opportunistic choice of the parameters $a$, $\delta$ and of the matrix $D$.

Corollary 2.3 Consider any Krylov method to solve the symmetric linear system (2.1). Suppose that Assumption 2.1 holds and the Krylov method performs $h \le n$ iterations. Then, setting $a = 0$, $\delta = 1$ and $D = I_n$ in Theorem 2.2, the preconditioner

$$ M_h^\sharp(0,1,I_n) = \left[I_n - (R_h \mid u_{h+1})(R_h \mid u_{h+1})^T\right] + (R_h \mid u_{h+1}) \begin{pmatrix} |T_h| & 0 \\ 0 & 1 \end{pmatrix}^{-1} (R_h \mid u_{h+1})^T, \qquad h \le n-1, \qquad (2.25) $$

$$ M_n^\sharp(0,1,I_n) = R_n |T_n|^{-1} R_n^T, \qquad (2.26) $$

is such that

a) the matrix $M_h^\sharp(0,1,I_n)$ is symmetric and nonsingular, for any $h \le n$;

b) the matrix $M_h^\sharp(0,1,I_n)$ coincides with $M_h^{-1}$, for any $h \le n$;

c) the matrix $M_h^\sharp(0,1,I_n)$ is positive definite. Moreover, its spectrum $\Lambda[M_h^\sharp(0,1,I_n)]$ is given by

$$ \Lambda[M_h^\sharp(0,1,I_n)] = \Lambda\left[|T_h|^{-1}\right] \cup \Lambda\left[I_{n-h}\right]; $$

d) when $h \le n-1$, the matrix $M_h^\sharp(0,1,I_n)\, A$ has at least $(h-2)$ singular values equal to $+1$;

e) when $h = n$, then $\Lambda[M_n] = \Lambda[|T_n|]$ and $\Lambda[M_n^\sharp(0,1,I_n)\, A] = \Lambda[M_n^{-1} A] = \Lambda[A M_n^{-1}] \subseteq \{-1,+1\}$, i.e. the $n$ eigenvalues of $M_n^\sharp(0,1,I_n)\, A$ are either $+1$ or $-1$.

Proof: The result is directly obtained from (2.4)-(2.5) and Theorem 2.2, with $a = 0$, $\delta = 1$ and $D = I_n$.

Remark 2.6 Observe that the case $h \approx n$ in Theorem 2.2 and Corollary 2.3 is of scarce interest for large scale problems. Indeed, in the literature on preconditioners the values of $h$ typically do not exceed 10-20 [13, 14]. Moreover, for small values of $h$ the computation of the inverse matrix

$$ \begin{pmatrix} \delta^2|T_h| & a e_h \\ a e_h^T & 1 \end{pmatrix}^{-1}, \qquad (2.27) $$

in order to provide $M_h^\sharp(a,\delta,I_n)$ or $M_h^\sharp(a,\delta,D)$, may be cheaply performed when $T_h$ is either indefinite or positive definite: indeed, Remark 2.3 and relation (2.17) provide the result. Thus, the overall cost (number of flops) of computing (2.27) is mostly due to the computational burden of $|T_h|^{-1}$. However, with a better insight, and considering that our preconditioners are suited for large scale problems, observe that the application of our proposal only requires the computation of the inverse matrix (2.27) times a real $(h+1)$-dimensional vector. Indeed, Krylov subspace methods never directly use matrices during their recursion. Thus, the computational core of applying the matrix (2.27) to a vector is the product $|T_h|^{-1} u$, where $u \in \mathbb{R}^h$. In this regard, we have the following characterization:

∙ if $T_h$ is indefinite then $|T_h|^{-1} u = (V_h |B_h| V_h^T)^{-1} u = V_h |B_h|^{-1} V_h^T u$, and recalling that $B_h$ is at most $2\times 2$ block diagonal, the cost $C(|T_h|^{-1} u)$ of calculating the product $|T_h|^{-1} u$ (not including the cost of computing the Jordan canonical form of $T_h$) is given by $C(|T_h|^{-1} u) = O(h^2)$;

∙ if $T_h$ is positive definite then $|T_h|^{-1} u = (L_h D_h L_h^T)^{-1} u = L_h^{-T} D_h^{-1} L_h^{-1} u$. Considering the results in Section 4 of [8], we have that again the cost of computing the product $|T_h|^{-1} u$ is given by $C(|T_h|^{-1} u) = O(h^2)$.
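The $O(h^2)$ application in the indefinite case can be sketched as follows (our own illustration, assuming the eigenpairs of $T_h$ have already been computed once; the `make_absT_apply` name is hypothetical, and the $L_h D_h L_h^T$ branch for $T_h \succ 0$ is analogous):

```python
import numpy as np

def make_absT_apply(T):
    """Factor T_h = V diag(mu) V^T once; each later call computes |T_h|^{-1} u
    with two h x h matrix-vector products and one scaling, i.e. O(h^2) flops."""
    mu, V = np.linalg.eigh(T)
    inv_abs_mu = 1.0 / np.abs(mu)
    def apply(u):
        # V |B_h|^{-1} V^T u, without ever forming |T_h|^{-1} explicitly
        return V @ (inv_abs_mu * (V.T @ u))
    return apply
```

Note that $|T_h|^{-1}$ is never formed as a matrix: only the product with a vector is computed, consistently with the matrix-free use of the preconditioner described above.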

Remark 2.7 The choice of the parameters $\delta$ and $a$, and of the matrix $D$, is problem dependent. In particular, $\delta$ and $a$ may be set in order to impose conditions like the following (which tend to force the clustering of the eigenvalues of the matrix $H_{(h+1)\times(h+1)}$ or $H_{h\times h}$, see (2.19), near $+1$ or near $-1$):

$$ \det\left[H_{(h+1)\times(h+1)}\right] = 1, \qquad \mathrm{tr}\left[H_{(h+1)\times(h+1)}\right] = h+1, $$

$$ \det\left[H_{h\times h}\right] = 1, \qquad \mathrm{tr}\left[H_{h\times h}\right] = h. $$


Nonetheless, also the choice $a = 0$ seems appealing, as described in the companion paper [9]. Finally, observe that, depending on the quantities in the expressions (2.20)-(2.21), there may be real values of the parameters $\delta$ and $a$ such that $\sigma = 0$. Choosing the latter values for $\delta$ and $a$ may reinforce the conclusions of item d) in Theorem 2.2.

3 Conclusions

We have given theoretical results for a class of preconditioners, which are parameter-dependent. The preconditioners can be built by using any Krylov subspace method for the symmetric linear system (2.1), provided that the general conditions (2.2)-(2.3) in Assumption 2.1 are satisfied. We will give evidence in the companion paper [9] that in several real problems, a few iterations of the Krylov subspace method adopted may suffice to compute effective preconditioners. In particular, in many problems, using a relatively small value of the index ℎ in Assumption 2.1, we can capture significant information on the system matrix A. In order to clarify the latter statement, consider the eigenvectors {ν₁, . . . , νₙ} of the matrix A in (2.1), and suppose the eigenvectors {νℓ₁, . . . , νℓₘ} ⊆ {ν₁, . . . , νₙ} correspond to large eigenvalues of A (as often happens). In case the Krylov subspace method adopted to solve (2.1) generates directions which span the subspace generated by {νℓ₁, . . . , νℓₘ}, then M♯ℎ(a, δ, D) will likely be effective as a class of preconditioners.
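The claim that a few Krylov iterations may capture the dominant eigenvectors can be illustrated with a small synthetic experiment. The sketch below runs a plain Lanczos iteration (with full reorthogonalization, added here for numerical safety) on a matrix with three artificially dominant eigenvalues; the matrix, spectrum, and number of steps are illustrative assumptions, and the code does not build the preconditioner M♯ℎ(a, δ, D) itself:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Synthetic symmetric matrix A = Q diag(eigs) Q^T with three dominant eigenvalues.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
eigs = np.concatenate([[100.0, 80.0, 60.0], rng.uniform(0.1, 1.0, n - 3)])
A = (Q * eigs) @ Q.T

# h steps of the Lanczos iteration, building an orthonormal Krylov basis V.
h = 15
V = np.zeros((n, h + 1))
alpha, beta = np.zeros(h), np.zeros(h)
V[:, 0] = rng.standard_normal(n)
V[:, 0] /= np.linalg.norm(V[:, 0])
for j in range(h):
    w = A @ V[:, j]
    alpha[j] = V[:, j] @ w
    # full reorthogonalization against all previous basis vectors
    w -= V[:, : j + 1] @ (V[:, : j + 1].T @ w)
    beta[j] = np.linalg.norm(w)
    V[:, j + 1] = w / beta[j]

# Ritz values = eigenvalues of the tridiagonal projection T_h.
T = np.diag(alpha) + np.diag(beta[:-1], 1) + np.diag(beta[:-1], -1)
ritz = np.sort(np.linalg.eigvalsh(T))[::-1]
# the largest Ritz values already approximate the dominant eigenvalues 100, 80, 60
```

With ℎ = 15 ≪ n = 200 steps, the leading Ritz values match the dominant eigenvalues to high accuracy, which is the situation where a Krylov-based preconditioner built from the same iterates can be expected to pay off.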

Along this guideline, our proposal also seems tailored for those cases where a sequence of linear systems of the form

Akx = bk, k = 1, 2, . . .

requires a solution (e.g., see [13, 4] for details), where Ak slightly changes with the index k.

In the latter case, the preconditioner M♯ℎ(a, δ, D) in (2.6)-(2.7) can be computed by applying the Krylov subspace method to the first linear system A1x = b1. Then, M♯ℎ(a, δ, D) can be used to efficiently solve Akx = bk, for k = 2, 3, . . ..

Finally, the class of preconditioners in this paper also seems a promising tool for the solution of linear systems arising in financial frameworks. In particular, we want to focus on the symmetric linear systems arising when imposing the KKT conditions in portfolio selection problems with a large number of securities in the portfolio, along with linear equality constraints (see also [2]).


References

[1] MATLAB Release 2011a, The MathWorks Inc., 2011.

[2] G. Al-Jeiroudi, J. Gondzio, and J. Hall, Preconditioning indefinite systems in interior point methods for large scale linear optimisation, Optimization Methods and Software, 23 (2008), pp. 345–363.

[3] J. Baglama, D. Calvetti, G. Golub, and L. Reichel, Adaptively preconditioned GMRES algorithms, SIAM Journal on Scientific Computing, 20 (1998), pp. 243–269.

[4] A. R. Conn, N. I. M. Gould, and P. L. Toint, Trust–region methods, MPS–SIAM Series on Optimization, Philadelphia, PA, 2000.

[5] J. Dennis and R. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall, Englewood Cliffs, 1983.

[6] G. Fasano, Planar–conjugate gradient algorithm for large–scale unconstrained optimization, Part 1: Theory, Journal of Optimization Theory and Applications, 125 (2005), pp. 523–541.

[7] G. Fasano and M. Roma, Iterative computation of negative curvature directions in large scale optimization, Computational Optimization and Applications, 38 (2007), pp. 81–104.

[8] ——, Preconditioning Newton–Krylov methods in nonconvex large scale optimization, submitted to Computational Optimization and Applications, (2009).

[9] ——, A Class of Preconditioners for Large Indefinite Linear Systems, as by-product of Krylov Subspace Methods: Part 2, Technical Report n. 5, Department of Management, Università Ca' Foscari, Venice, Italy, 2011.

[10] P. E. Gill, W. Murray, D. B. Ponceleon, and M. A. Saunders, Preconditioners for indefinite systems arising in optimization, SIAM J. Matrix Anal. Appl., 13 (1992), pp. 292–311.

[11] G. Golub and C. Van Loan, Matrix Computations, The Johns Hopkins University Press, Baltimore, 1996. Third edition.

[12] M. Hestenes, Conjugate Direction Methods in Optimization, Springer-Verlag, New York, 1980.

[13] J. Morales and J. Nocedal, Automatic preconditioning by limited memory quasi–Newton updating, SIAM Journal on Optimization, 10 (2000), pp. 1079–1096.

[14] J. L. Morales and J. Nocedal, Algorithm PREQN: Fortran 77 subroutines for preconditioning the conjugate gradient method, ACM Transactions on Mathematical Software, 27 (2001), pp. 83–91.


[15] S. Nash, A survey of truncated-Newton methods, Journal of Computational and Applied Mathematics, 124 (2000), pp. 45–59.

[16] J. Nocedal and S. Wright, Numerical Optimization, Springer Series in Operations Research and Financial Engineering, Springer, New York, 2006. Second edition.

[17] Y. Saad, Iterative Methods for Sparse Linear Systems, Second Edition, SIAM,Philadelphia, 2003.

[18] J. Stoer, Solution of large linear systems of equations by conjugate gradient type methods, in Mathematical Programming. The State of the Art, A. Bachem, M. Grötschel, and B. Korte, eds., Berlin Heidelberg, 1983, Springer-Verlag, pp. 540–565.
