
A globally convergent Inexact-Newton method for

solving reducible nonlinear systems of equations

Nataša Krejić∗    José Mario Martínez†

December 30, 1998

Abstract

A nonlinear system of equations is said to be reducible if it can be divided into m blocks (m > 1) in such a way that the i-th block depends only on the first i blocks of unknowns. Different ways of handling the different blocks with the aim of solving the system have been proposed in the literature. When the dimension of the blocks is very large, it can be difficult to solve the linear Newtonian equations associated to them using direct solvers based on factorizations. In this case, the idea of using iterative linear solvers to deal with the blocks of the system separately is appealing. In this paper, a local convergence theory that justifies this procedure is presented. The theory also explains the behavior of a Block-Newton method under different types of perturbations. Moreover, a globally convergent modification of the basic Block Inexact-Newton algorithm is introduced so that, under suitable assumptions, convergence can be ensured, independently of the initial point considered.

Keywords. Nonlinear systems, Inexact-Newton methods, reducible systems, decomposition.

AMS: 65H10

∗ Institute of Mathematics, University of Novi Sad, Trg Dositeja Obradovića 4, 21000 Novi Sad, Yugoslavia. E-mail: [email protected]. This author was supported by FAPESP (Grant 96/8163-9).

† Department of Mathematics, IMECC-UNICAMP, University of Campinas, CP 6065, 13081-970 Campinas SP, Brazil. This author was supported by FAPESP (Grant 90-3724-6), CNPq and FAEP-UNICAMP. E-mail: [email protected]


1 Introduction

We consider the problem of solving the nonlinear system of equations

F(x) = 0,    (1)

where F : IR^n → IR^n has lower block triangular structure. In other words, we suppose that the components of F can be partitioned into blocks of components such that (1) becomes

F_1(x_1) = 0,
F_2(x_1, x_2) = 0,
...
F_m(x_1, x_2, ..., x_m) = 0,    (2)

with x = (x_1, x_2, ..., x_m), x_i ∈ IR^{n_i}, F_i : IR^{n_1} × IR^{n_2} × ... × IR^{n_i} → IR^{n_i}, i = 1, 2, ..., m, n_1 + n_2 + ... + n_m = n. Such systems were considered in [4], where a large class of locally convergent p-step (quasi) Newton methods was proposed. Of course, in practice one is interested in the case m >> 1.
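To fix ideas, the short Python snippet below (an invented toy example, not taken from the paper) writes down a system with the structure (2) for m = 3 blocks of size two: the i-th block function receives only the first i blocks of unknowns.

```python
import numpy as np

# A toy reducible system with m = 3 blocks (block sizes n1 = n2 = n3 = 2):
# F1 depends only on x1, F2 on (x1, x2), F3 on (x1, x2, x3),
# which is exactly the lower block triangular structure (2).

def F1(x1):
    return np.array([x1[0]**2 - 1.0, x1[0] * x1[1] - 2.0])

def F2(x1, x2):
    return np.array([x2[0] + np.sin(x1[0]) - 1.0,
                     x2[1]**3 + x1[1] - 3.0])

def F3(x1, x2, x3):
    return np.array([x3[0] * x2[0] + x1[0] - 2.0,
                     np.exp(x3[1]) - x2[1] - 1.0])

def F(x):
    """Full residual F(x) for x = (x1, x2, x3) stacked in IR^6."""
    x1, x2, x3 = x[0:2], x[2:4], x[4:6]
    return np.concatenate([F1(x1), F2(x1, x2), F3(x1, x2, x3)])

print(F(np.ones(6)))  # residual of the toy system at an arbitrary point
```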

Systems with the structure (2) appear naturally in many practical applications. For example, the index i ∈ {1, ..., m} could represent time and x_i could be the vector of state variables of a physical, social or economic system. In this case, (2) says that the state of the system at a given time is the solution of a nonlinear system defined by the previous states. When one solves hyperbolic or parabolic partial differential equations using implicit or semi-implicit schemes, a system with the structure (2) is generated, with x_i representing the solution for the i-th time level. See [1, 12]. In many of these applications, the system that must be solved at the i-th level is large, so that Newton-like methods, based on the use of direct methods for solving linear systems, cannot be applied. The use of iterative linear solvers at each level i can thus be appealing. This situation gives rise to the development of Block Inexact-Newton methods for solving (2), which is the subject of the present research.

The usefulness of research on Inexact-Newton methods does not rely only on its applicability to large-scale systems. On one hand, practical Newtonian iterations can be regarded as Inexact-Newton ones in the presence of rounding errors. On the other hand, the Inexact-Newton approach provides a useful representation of Newtonian iterations when the domain space is high- (perhaps infinite-) dimensional and the Jacobian employed comes from the consideration of coarser grids. See, for example, [2]. In this sense, it can be claimed that Inexact-Newton theories complete the theory of the exact Newton method. Practical implementations of Newton's method would not be efficient if the classical Inexact-Newton convergence (and order-of-convergence) theorems were not true. Finally, all popular globalizations of Newton's method require a reduction in the norm of the local linear model of F and, so, can be considered Inexact-Newton methods.

The same observations apply to globalization schemes. While global modifications of Newton's method (based on line searches or trust regions) have been known for many years, globalization schemes for Inexact-Newton methods appeared only recently in the literature (see [6] and references therein). However, as we claimed in the case of local convergence properties, the reason why Newton globalization schemes are efficient in real life is that theoretically justified Inexact-Newton counterparts exist.

The relation between the results presented in this paper and the Block-Newton method introduced in [4] is the same as the one that exists between the classical Inexact-Newton method and the exact Newton method. We are going to prove that the Inexact-Newton version of the Block-Newton method of [4] is locally convergent, which means that, very likely, local convergence of the Block-Newton method will be reflected in practical computations. Even more important is the fact that the Inexact-Newton method can be globalized, since a global modification of the Block-Newton method was not known. So, the same globalization scheme can be applied to the Block-Newton method and, in practice, this procedure will be robust in the presence of rounding errors or other perturbations.

In Section 2 of this paper we introduce the "local version" of the Block Inexact-Newton method, which is based on the Dembo-Eisenstat-Steihaug [3] criterion for accepting an approximate solution of the linear Newtonian system at each block. In Section 3 we prove local linear and superlinear convergence. In Section 4 we introduce a modification of the local algorithm by means of which it turns out to be globally convergent. We also prove that, under suitable assumptions, the local and the global algorithm coincide near a solution and, so, superlinear convergence holds for the global method too. Possible applications and conclusions are drawn in Section 5.

2 Local Block Inexact-Newton method

The Block Inexact-Newton (BIN) method introduced in this paper is, as most methods for solving systems of equations, iterative. Iterates will be denoted x^k. Each x^k ∈ IR^n can be partitioned into m components belonging to IR^{n_1}, ..., IR^{n_m}, which will be denoted x^k_i. So, we write

x^k = (x^k_1, ..., x^k_m)

for all k = 0, 1, 2, ..., where x^k_i ∈ IR^{n_i}, i = 1, ..., m.

Each iteration of BIN consists of a sequence of m steps. Roughly speaking, at the i-th step, x^{k+1}_i (the i-th component of x^{k+1}) is computed, using x^k_i (the i-th component of x^k) and the already obtained first i−1 components of x^{k+1}. Therefore, the vector whose components are the first i−1 components of x^{k+1} followed by the i-th component of x^k plays a very special role in the algorithm and deserves a special notation. In fact, we define x^{k,1} = x^k_1 and

x^{k,i} = (x^{k+1}_1, ..., x^{k+1}_{i-1}, x^k_i) for i = 2, 3, ..., m.

Many times, given x ∈ IR^n, we will need to refer to the vector whose components are the first i components of x. This vector, which belongs to IR^{n_1} × ... × IR^{n_i}, will be denoted x_i. So, if x = (x_1, ..., x_m), we have

x_i = (x_1, ..., x_i), i = 1, 2, ..., m.

With this notation, we can write

x^{k,i} = (x^{k+1}_{i-1}, x^k_i) for i = 2, 3, ..., m.

As we saw in the Introduction, the arguments of the function F_i are x_1, ..., x_i and its range space is IR^{n_i}. We assume that the derivatives of the components of F_i with respect to the components of x_i are well defined for all x_i. Therefore, the corresponding Jacobian matrix of F_i with respect to x_i is a well defined square matrix J_i(x_1, ..., x_i) ∈ IR^{n_i × n_i}. Derivatives of F_i with respect to x_1, ..., x_{i-1} are not assumed to exist at all.

Let us denote now |·| an arbitrary norm on IR^{n_i} as well as its subordinate matrix norm. Assume that x^0 ∈ IR^n is an arbitrary initial approximation to the solution of the nonlinear system (1). Given the k-th approximation x^k = (x^k_1, ..., x^k_m) and the forcing parameter η_k ∈ [0, 1), the (local) Block Inexact-Newton algorithm obtains x^{k+1} = (x^{k+1}_1, ..., x^{k+1}_m) by means of

J_i(x^{k,i})(x^{k+1}_i − x^k_i) = −F_i(x^{k,i}) + r^k_i,    (3)

where

|r^k_i| ≤ η_k |F_i(x^{k,i})|    (4)


for i = 1, 2, . . . ,m.

As we mentioned in the Introduction, the computation of an increment x^{k+1}_i − x^k_i that satisfies (3)-(4) involves the approximate solution of a linear system of equations, and the parameter η_k represents the relative precision that is required for that solution. In most practical situations, an iterative linear solver will be used to compute the increment (3).
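To illustrate how (3)-(4) translate into computation, the following Python sketch performs BIN iterations on the toy three-block system introduced after (2) (repeated here so the snippet runs on its own). The diagonal blocks J_i are approximated by finite differences, and the inexact step is produced by a deliberately crude inner iteration (steepest descent on the normal equations) that stops as soon as the residual test (4) holds; in practice a Krylov method such as GMRES would be used instead. Everything here (block functions, sizes, starting point, η_k = 0.1) is an illustrative assumption, not data from the paper.

```python
import numpy as np

# Toy reducible system (same as before): block i maps (x_1, ..., x_i) into IR^{n_i}.
blocks = [
    lambda x1: np.array([x1[0]**2 - 1.0, x1[0] * x1[1] - 2.0]),
    lambda x1, x2: np.array([x2[0] + np.sin(x1[0]) - 1.0, x2[1]**3 + x1[1] - 3.0]),
    lambda x1, x2, x3: np.array([x3[0] * x2[0] + x1[0] - 2.0, np.exp(x3[1]) - x2[1] - 1.0]),
]

def diag_jacobian(Fi, args, i, h=1e-7):
    """Finite-difference approximation of J_i, the Jacobian of F_i w.r.t. its own block."""
    base = Fi(*args)
    J = np.zeros((len(base), len(args[i])))
    for j in range(len(args[i])):
        pert = [a.copy() for a in args]
        pert[i][j] += h
        J[:, j] = (Fi(*pert) - base) / h
    return J

def inexact_solve(J, rhs, eta, max_iter=10000):
    """Return d with |J d - rhs| <= eta |rhs| (condition (4)), using steepest descent
    on the normal equations -- a crude stand-in for GMRES or another iterative solver."""
    d = np.zeros_like(rhs)
    for _ in range(max_iter):
        r = rhs - J @ d
        if np.linalg.norm(r) <= eta * np.linalg.norm(rhs):
            break
        g = J.T @ r                       # descent direction for |J d - rhs|^2
        Jg = J @ g
        d = d + (r @ Jg) / (Jg @ Jg) * g  # exact line search along g
    return d

def bin_iteration(x_blocks, eta):
    """One local BIN iteration: sweep the blocks, solving (3) inexactly at x^{k,i}."""
    new_blocks = []
    for i, Fi in enumerate(blocks):
        args = [b.copy() for b in new_blocks] + [x_blocks[i].copy()]   # the point x^{k,i}
        Fki = Fi(*args)
        Ji = diag_jacobian(Fi, args, i)
        new_blocks.append(x_blocks[i] + inexact_solve(Ji, -Fki, eta))
    return new_blocks

x = [np.array([1.2, 1.8]), np.array([0.2, 1.3]), np.array([5.0, 0.5])]  # starting guess
for k in range(8):
    res = np.linalg.norm(np.concatenate([Fi(*x[:i + 1]) for i, Fi in enumerate(blocks)]))
    print(k, res)
    x = bin_iteration(x, eta=0.1)
```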

Convergence proofs make it necessary to define appropriate norms on the spaces IR^{n_1} × ... × IR^{n_i}. Here we define a norm ‖·‖ on IR^{n_1} × ... × IR^{n_i} by

‖x_i‖ = ∑_{j=1}^{i} |x_j|.

Let us now state the assumptions that will be used for proving local convergence of the method defined by (3) and (4). As is usual in local convergence theories, we assume that there exists x^* ∈ IR^n that solves the problem, that is:

F(x^*) = 0.

Assumptions A1 and A2 stated below say that the diagonal Jacobian blocks J_i are well behaved at x^*.

(A1) J_i(x_i) is continuous at x^*_i, i = 1, 2, ..., m;

(A2) J_i(x^*_i) is nonsingular for i = 1, 2, ..., m.

Since derivatives of F_i with respect to variables other than x_i are not assumed to exist, only assumptions relative to diagonal Jacobians are present in our theory. However, as in [4], an assumption is needed relative to the variation of F_i with respect to the remaining variables.

(A3) There exist ε_1 > 0, β ≥ 0 such that, whenever ‖x_i − x^*_i‖ ≤ ε_1,

|F_i(x_{i-1}, x_i) − F_i(x^*_{i-1}, x_i)| ≤ β‖x_{i-1} − x^*_{i-1}‖, i = 2, 3, ..., m.

One should notice that these assumptions, together with (A4), which will be stated in Section 3, are the same as the ones used in [4] for proving local convergence of Newton-like methods.

The next two lemmas, the proofs of which can be found in [13], are going to be used for proving local results. Essentially, Lemma 1 says that the inverse of J_i is continuous at x^* and Lemma 2 states the differentiability of F_i with respect to x_i.

Lemma 1 For all γ > 0 there exists ε > 0 such that J_i(x_i) is nonsingular and

|J_i(x_i)^{-1} − J_i(x^*_i)^{-1}| < γ, whenever ‖x_i − x^*_i‖ < ε.

Lemma 2 For all γ > 0 there exists ε > 0 such that

|F_i(x^*_{i-1}, x_i) − F_i(x^*_{i-1}, x^*_i) − J_i(x^*_i)(x_i − x^*_i)| ≤ γ|x_i − x^*_i|,

whenever |x_i − x^*_i| < ε.

We finish this section by proving a technical lemma which is crucial for the local convergence proofs. Roughly speaking, the quantities e^k_i mentioned in this technical result will represent componentwise errors in the local convergence theorem which will be proved in the next section.

Lemma 3 Let e^k_i, i = 1, 2, ..., m, k = 0, 1, 2, ..., ρ ∈ [0, 1) and C > 0 be real numbers such that e^k_i ≥ 0 for all i = 1, 2, ..., m, k = 0, 1, 2, ... and

e^{k+1}_i ≤ ρ e^k_i + C ∑_{j=1}^{i-1} e^{k+1}_j    (5)

for all i, k. Then the sequence {e^k}, with e^k = (e^k_1, ..., e^k_m), converges to 0 and the convergence is q-linear in a suitable norm.

Proof. Define the matrix A ∈ IR^{m×m}, A = [a_{ij}], by

a_{ij} = 1 if i = j,  a_{ij} = −C if i > j,  a_{ij} = 0 if i < j.

Clearly, (5) is equivalent to

A e^{k+1} ≤ ρ e^k.


It is easy to see that the entries of A^{-1} are nonnegative. Therefore, we can multiply both sides of the previous inequality by A^{-1}, obtaining

e^{k+1} ≤ ρ A^{-1} e^k    (6)

for all k. Since the spectral radius of A^{-1} is equal to 1, there exists a norm ‖·‖_C on IR^m such that, given ρ_0 ∈ (ρ, 1), we have

‖ρ A^{-1}‖_C ≤ ρ_0.

The construction of this norm in the case that the matrix is lower triangular imposes that ‖x‖_C = ‖Dx‖_2, where D is a positive diagonal matrix. Therefore, by (6),

D e^{k+1} ≤ ρ D A^{-1} e^k.

Thus, by the monotonicity of ‖·‖_2,

‖e^{k+1}‖_C = ‖D e^{k+1}‖_2 ≤ ρ‖D A^{-1} e^k‖_2 = ρ‖A^{-1} e^k‖_C ≤ ρ‖A^{-1}‖_C ‖e^k‖_C ≤ ρ_0 ‖e^k‖_C.

So, the desired result is proved. □
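As a quick numerical sanity check of Lemma 3 (not part of the paper), the sketch below builds the matrix A from the proof for arbitrarily chosen values of m, ρ and C, confirms that A^{-1} has nonnegative entries, constructs a diagonal scaling D for which the induced norm of ρA^{-1} is below 1, and iterates the worst case of recursion (5) to display the q-linear decay of ‖e^k‖_C.

```python
import numpy as np

m, rho, C = 4, 0.5, 0.5            # illustrative values, not from the paper

# A is unit lower triangular with -C below the diagonal, as in the proof of Lemma 3.
A = np.eye(m) - C * np.tril(np.ones((m, m)), k=-1)
Ainv = np.linalg.inv(A)
print("min entry of A^{-1}:", Ainv.min())          # entries of A^{-1} are nonnegative

# Norm ||x||_C = ||D x||_2 with D = diag(1, t, t^2, ...) for a small t,
# chosen so that the induced norm of rho * A^{-1} is below 1.
t = 1e-2
D = np.diag(t ** np.arange(m))
B = rho * Ainv
print("||rho A^{-1}||_C =", np.linalg.norm(D @ B @ np.linalg.inv(D), 2))

# Worst case of recursion (5): e^{k+1} = rho * A^{-1} e^k; the ratios show q-linear decay.
e = np.ones(m)
for k in range(10):
    e_next = B @ e
    print(k, np.linalg.norm(D @ e_next, 2) / np.linalg.norm(D @ e, 2))
    e = e_next
```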

3 Local convergence

In this section we prove local convergence results for the algorithm BIN, defined by (3)-(4). We are going to prove that, if the forcing parameters η_k are bounded away from 1 and the initial estimate is close enough to the solution, then {x^k} converges to x^* with a q-linear rate determined by an upper bound of η_k in a specific norm. This implies r-linear convergence in any other norm.

As happens with the classical Inexact-Newton method introduced in [3], the linear convergence rate is associated to a special norm defined by the Jacobian at x^*. Similarly, here we will use some auxiliary norms. Let |·| be the norm used in Lemma 3. The induced norms are

|x_i|_* = |J_i(x^*_i) x_i| and ‖x_i‖_* = ∑_{j=1}^{i} |x_j|_*

for each i = 1, ..., m and every x_i ∈ IR^{n_1} × ... × IR^{n_i}.


Theorem 1 Let 0 ≤ η_k ≤ η < 1. There exists ε > 0 such that if ‖x^0 − x^*‖ ≤ ε, then the sequence {x^k} generated by BIN is well defined and converges q-linearly (in a suitable norm) to x^*.

Proof. Let us denote

e^k_i = |x^k_i − x^*_i|_*, i = 1, 2, ..., m, k = 0, 1, 2, ....

Since J_i(x^*_i) is nonsingular for all i = 1, 2, ..., m, there exists µ > 0 such that

µ = max_{1≤i≤m} {|J_i(x^*_i)|, |J_i(x^*_i)^{-1}|}    (7)

and

(1/µ)|x_i| ≤ |x_i|_* ≤ µ|x_i|, for x_i ∈ IR^{n_i}, i = 1, 2, ..., m.    (8)

Choosing γ > 0 small enough, we have that

(1 + γµ)[η(1 + γµ) + 2γµ] ≤ ρ < 1.

By Lemma 1, Lemma 2 and Assumption A2 there exists ε ∈ (0, ε_1) such that

|J_i(x_i) − J_i(x^*_i)| ≤ γ,    (9)

|J_i(x_i)^{-1} − J_i(x^*_i)^{-1}| ≤ γ,    (10)

|F_i(x^*_{i-1}, x_i) − F_i(x^*_{i-1}, x^*_i) − J_i(x^*_i)(x_i − x^*_i)| ≤ γ|x_i − x^*_i|    (11)

for ‖x_i − x^*_i‖ ≤ µ²ε and all i = 1, 2, ..., m.

We are going to prove, by induction on i, that the sequence {e^k_i} satisfies (5). Suppose that ‖x^k − x^*‖ ≤ µ²ε. Then

J_1(x^*_1)(x^{k+1}_1 − x^*_1) = J_1(x^*_1)(x^k_1 − x^*_1 − J_1(x^k_1)^{-1}F_1(x^k_1) + J_1(x^k_1)^{-1}r^k_1)

= (I + J_1(x^*_1)(J_1(x^k_1)^{-1} − J_1(x^*_1)^{-1}))
× [r^k_1 − (F_1(x^k_1) − F_1(x^*_1) − J_1(x^*_1)(x^k_1 − x^*_1)) − (J_1(x^*_1) − J_1(x^k_1))(x^k_1 − x^*_1)],

where I is the identity matrix. So by (7), (9)-(11) and condition (4) we have:

e^{k+1}_1 ≤ (1 + µγ)(η_k|F_1(x^k_1)| + 2γ|x^k_1 − x^*_1|).

Since

F_1(x^k_1) = J_1(x^*_1)(x^k_1 − x^*_1) + F_1(x^k_1) − F_1(x^*_1) − J_1(x^*_1)(x^k_1 − x^*_1),


taking norms and using (11) we obtain

|F_1(x^k_1)| ≤ e^k_1 + γ|x^k_1 − x^*_1|.

Therefore

e^{k+1}_1 ≤ (1 + µγ)[(1 + µγ)η_k + 2γµ]e^k_1 ≤ ρe^k_1

and |x^{k+1}_1 − x^*_1| ≤ µ²ε. Assume as inductive hypothesis that

e^{k+1}_j ≤ ρe^k_j + βµ(2 + µγ) ∑_{l=1}^{j-1} e^{k+1}_l and ‖(x^{k+1}_{i-1}, x^k_i) − x^*_i‖ ≤ µ²ε,

for j = 1, 2, ..., i−1, i ≥ 2. Then

J_i(x^*_i)(x^{k+1}_i − x^*_i) = (I + J_i(x^*_i)(J_i(x^{k,i})^{-1} − J_i(x^*_i)^{-1}))
× [r^k_i − (F_i(x^{k,i}) − F_i(x^*_{i-1}, x^k_i)) − (F_i(x^*_{i-1}, x^k_i) − F_i(x^*_i) − J_i(x^*_i)(x^k_i − x^*_i)) − (J_i(x^*_i) − J_i(x^{k,i}))(x^k_i − x^*_i)]

and

F_i(x^{k,i}) = J_i(x^*_i)(x^k_i − x^*_i) + (F_i(x^{k,i}) − F_i(x^*_{i-1}, x^k_i)) + (F_i(x^*_{i-1}, x^k_i) − F_i(x^*_i) − J_i(x^*_i)(x^k_i − x^*_i)).

Taking norms, using (11) and Assumption A3 we have that

e^{k+1}_i ≤ (1 + µγ)[η_k(e^k_i + γ|x^k_i − x^*_i| + β‖x^{k+1}_{i-1} − x^*_{i-1}‖) + β‖x^{k+1}_{i-1} − x^*_{i-1}‖ + 2µγe^k_i],

so

e^{k+1}_i ≤ ρe^k_i + βµ(2 + µγ) ∑_{j=1}^{i-1} e^{k+1}_j.

Since e^{k+1}_i satisfies inequality (5) with C = βµ(2 + µγ), it follows from Lemma 3 that

lim_{k→∞} e^k = 0

and the convergence is q-linear. □

In the next theorem we will prove superlinear convergence of the sequence {x^k} generated by BIN when lim η_k = 0. For the rest of this section we suppose that F satisfies Assumption A4 below, in addition to Assumptions A1-A3.


(A4) There exists L > 0 such that

|J_i(x_i) − J_i(x^*_i)| ≤ L‖x_i − x^*_i‖

for i = 1, 2, ..., m and ‖x_i − x^*_i‖ < ε_1.

As a consequence of this assumption we have that

|F_i(x^*_{i-1}, x_i) − F_i(x^*_{i-1}, x^*_i) − J_i(x^*_i)(x_i − x^*_i)| ≤ (L/2)|x_i − x^*_i|²    (12)

for i = 1, 2, ..., m and ‖x_i − x^*_i‖ < ε_1.

Theorem 2 Assume that the BIN iterates {x^k} converge to x^* and lim_{k→∞} η_k = 0. Then the convergence is superlinear.

Proof. We will prove by induction on i that

|x^{k+1}_i − x^*_i| = o(‖x^k_i − x^*_i‖).

By the convergence of the sequence and Lemma 1, there exists C > 0 such that

C ≥ |J_i(x^*_i)^{-1}| + |J_i(x^{k,i})^{-1} − J_i(x^*_i)^{-1}|,

i = 1, 2, ..., m. For i = 1 we have, as in the proof of Theorem 1,

|x^{k+1}_1 − x^*_1| ≤ (|J_1(x^*_1)^{-1}| + |J_1(x^k_1)^{-1} − J_1(x^*_1)^{-1}|) · (|r^k_1| + |F_1(x^k_1) − F_1(x^*_1) − J_1(x^*_1)(x^k_1 − x^*_1)| + |J_1(x^*_1) − J_1(x^k_1)||x^k_1 − x^*_1|)

≤ C(η_k|F_1(x^k_1)| + (L/2)|x^k_1 − x^*_1|² + L|x^k_1 − x^*_1|²),

and since η_k → 0 and

|F_1(x^k_1)| ≤ e^k_1 + |F_1(x^k_1) − F_1(x^*_1) − J_1(x^*_1)(x^k_1 − x^*_1)|,

we have

|x^{k+1}_1 − x^*_1| = o(|x^k_1 − x^*_1|).

Assume, as inductive hypothesis,

|x^{k+1}_j − x^*_j| = o(‖x^k_j − x^*_j‖), j = 1, ..., i−1.

As in the proof of Theorem 1, we have that


|x^{k+1}_i − x^*_i| ≤ (|J_i(x^*_i)^{-1}| + |J_i(x^{k,i})^{-1} − J_i(x^*_i)^{-1}|)
× [|r^k_i| + |F_i(x^{k,i}) − F_i(x^*_{i-1}, x^k_i)| + |F_i(x^*_{i-1}, x^k_i) − F_i(x^*_i) − J_i(x^*_i)(x^k_i − x^*_i)| + |(J_i(x^*_i) − J_i(x^{k,i}))(x^k_i − x^*_i)|]

and

F_i(x^{k,i}) = J_i(x^*_i)(x^k_i − x^*_i) + (F_i(x^{k,i}) − F_i(x^*_{i-1}, x^k_i)) + (F_i(x^*_{i-1}, x^k_i) − F_i(x^*_i) − J_i(x^*_i)(x^k_i − x^*_i)).

So, using (12) we have

|x^{k+1}_i − x^*_i| ≤ C[η_k(|x^k_i − x^*_i||J_i(x^*_i)| + β ∑_{j=1}^{i-1} |x^{k+1}_j − x^*_j| + (L/2)|x^k_i − x^*_i|²)
+ β ∑_{j=1}^{i-1} |x^{k+1}_j − x^*_j| + (L/2)|x^k_i − x^*_i|² + L(∑_{j=1}^{i-1} |x^{k+1}_j − x^*_j| + |x^k_i − x^*_i|)|x^k_i − x^*_i|]

≤ C o(|x^k_i − x^*_i|) + M ∑_{j=1}^{i-1} |x^{k+1}_j − x^*_j|,

for some constant M. From this inequality and the inductive hypothesis we obtain that

|x^{k+1}_i − x^*_i| = o(‖x^k_i − x^*_i‖),

so

‖x^{k+1} − x^*‖ = ∑_{i=1}^{m} |x^{k+1}_i − x^*_i| = o(‖x^k − x^*‖)

and the sequence {x^k} converges superlinearly to x^*. □

4 Global convergence

Up to now, we have proved some local convergence properties of the Algorithm BIN, introduced in Section 2. Obviously, since Newton's method is a particular case of BIN (when m = 1 and η_k ≡ 0), this algorithm is not globally convergent. (Examples of nonconvergence of Newton's method in one-dimensional cases are easy to construct.) By this we mean that convergence to solutions or other special points depends on the initial approximation. In this section we introduce a modification of BIN that makes it globally convergent in the sense that will be specified later. Moreover, we will prove that, near a solution and under suitable assumptions, the modification coincides with the original BIN, so that local properties also hold.

Unlike the local convergence results, based on the assumptions A1-A4, our global convergence theorems assume that F(x) has continuous partial derivatives with respect to all the variables. We will denote F′(x) the Jacobian matrix of F(x). Moreover, for all i = 2, ..., m, j = 1, ..., i−1 we denote ∂F_i(x)/∂x_j the matrix of partial derivatives of F_i with respect to x_1, ..., x_j. Therefore, ∂F_i(x)/∂x_j ∈ IR^{n_i×(n_1+...+n_j)}. For proving that the globalized algorithm converges to a solution independently of the initial approximation we will need the assumption stated below.

(A5) For all x ∈ IR^n, F′(x) is continuous and nonsingular.

Assumption A5 implies that all the entries of F′(x) and F′(x)^{-1} are bounded on bounded sets, as well as the norms of all the submatrices of F′(x) and F′(x)^{-1}.

From now on we denote |·| the Euclidean norm on IR^{n_i}, ‖·‖ the Euclidean norm on IR^n (so ‖x‖² = ∑_{i=1}^{m} |x_i|²) and we denote 〈·, ·〉 the standard scalar product.

The global modification of BIN is stated below. Essentially, given an iterate x^k, a "pure" BIN iteration may be tried first, expecting sufficient decrease of ‖F(x)‖. If this decrease is not obtained, points closer to x^k will be tried, following rules that are similar to classical backtracking (see [5]), but introducing a suitable change of direction each time the step is reduced. The idea is that, for small steps, the direction obtained approximates a Newton direction for the whole system.

Algorithm GBIN

Assume that σ ∈ (0, 1), 0 < θ_1 < θ_2 < 1 and η ∈ [0, 1) are constants independent of k. For a given x^k (k ≥ 0) such that F(x^k) ≠ 0 and 0 ≤ η_k ≤ η, x^{k+1} is obtained by means of the following steps.

Step 1. Set α ← 1.

Step 2. (BIN iteration)

Choose δ ∈ (0, α]. For i = 1, 2, ..., m, define

y_i = x^k_i + α(d_{i-1}, 0) = (x^k_1 + αd_1, ..., x^k_{i-1} + αd_{i-1}, x^k_i),

z_i = x^k_i + δ(d_{i-1}, 0) = (x^k_1 + δd_1, ..., x^k_{i-1} + δd_{i-1}, x^k_i),

w_i ∈ {x^k_i, y_i}

(y_1 = z_1 = x^k_1) and find d_i ∈ IR^{n_i} such that

|J_i(w_i)d_i + F_i(x^k_i) + (1/δ)(F_i(z_i) − F_i(x^k_i))| ≤ η_k|F_i(w_i)|.    (13)

Step 3. (Test sufficient decrease) Define γ_k = 1 − η_k² and d = (d_1, ..., d_m). If

‖F(x^k + αd)‖ ≤ (1 − σγ_kα/2)‖F(x^k)‖    (14)

set

α_k = α, d^k = d and x^{k+1} = x^k + α_k d^k.

Otherwise choose

α_new ∈ [θ_1α, θ_2α], α ← α_new

and repeat Step 2.
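To make the control flow of GBIN concrete, here is a minimal Python sketch (an illustration, not the authors' implementation), run on the same toy three-block system used in the Section 2 sketch and repeated here so the snippet is self-contained. It uses the choice δ = α and w_i = y_i discussed in the remarks below, a fixed halving rule for α_new (a valid choice within [θ_1α, θ_2α] for, say, θ_1 = 0.1 and θ_2 = 0.5), and, for brevity, an exact inner solve of the block linear systems, which trivially satisfies (13); an iterative solver stopped at relative residual η_k would fit the same slot.

```python
import numpy as np

# Illustrative three-block reducible system (same toy system as before).
blocks = [
    lambda x1: np.array([x1[0]**2 - 1.0, x1[0] * x1[1] - 2.0]),
    lambda x1, x2: np.array([x2[0] + np.sin(x1[0]) - 1.0, x2[1]**3 + x1[1] - 3.0]),
    lambda x1, x2, x3: np.array([x3[0] * x2[0] + x1[0] - 2.0, np.exp(x3[1]) - x2[1] - 1.0]),
]

def full_residual(xb):
    return np.concatenate([Fi(*xb[:i + 1]) for i, Fi in enumerate(blocks)])

def diag_jacobian(Fi, args, i, h=1e-7):
    """Finite-difference Jacobian of F_i with respect to its own block x_i."""
    base = Fi(*args)
    J = np.zeros((len(base), len(args[i])))
    for j in range(len(args[i])):
        pert = [a.copy() for a in args]
        pert[i][j] += h
        J[:, j] = (Fi(*pert) - base) / h
    return J

def gbin_iteration(xb, eta_k=0.1, sigma=1e-4, theta=0.5, max_backtracks=30):
    """One GBIN iteration with delta = alpha, w_i = y_i and alpha_new = theta * alpha."""
    gamma_k = 1.0 - eta_k**2
    normF = np.linalg.norm(full_residual(xb))
    alpha = 1.0
    for _ in range(max_backtracks):
        delta = alpha                       # the choice delta = alpha, w_i = y_i
        d = []
        for i, Fi in enumerate(blocks):
            y_i = [xb[j] + alpha * d[j] for j in range(i)] + [xb[i]]   # displaced point
            z_i = [xb[j] + delta * d[j] for j in range(i)] + [xb[i]]
            rhs = -(Fi(*xb[:i + 1]) + (Fi(*z_i) - Fi(*xb[:i + 1])) / delta)
            Ji = diag_jacobian(Fi, y_i, i)
            # Exact inner solve for brevity; an iterative solver stopped at
            # residual <= eta_k * |F_i(w_i)| would satisfy condition (13) as well.
            d.append(np.linalg.solve(Ji, rhs))
        trial = [xb[i] + alpha * d[i] for i in range(len(blocks))]
        if np.linalg.norm(full_residual(trial)) <= (1.0 - sigma * gamma_k * alpha / 2.0) * normF:
            return trial                    # sufficient decrease (14) accepted
        alpha *= theta                      # backtrack and recompute the directions
    return trial

x = [np.array([2.0, 1.0]), np.array([0.5, 2.0]), np.array([1.0, 0.0])]  # invented start
for k in range(12):
    print(k, np.linalg.norm(full_residual(x)))
    x = gbin_iteration(x)
```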

Remarks. The step d_i computed in (13) is an approximate solution of the linear system

(1/δ)(F_i(z_i) − F_i(x^k_i)) + J_i(w_i)d_i + F_i(x^k_i) = 0.

When δ is small this system is an approximation of the system considered at the i-th stage of a block lower-triangular solver for a step of Newton's method applied to F(x) = 0. This is obviously true for i = 1. For i > 1, the statement above follows from

F_i(z_i) − F_i(x^k_i) = F_i(x^k_1 + δd_1, ..., x^k_{i-1} + δd_{i-1}, x^k_i) − F_i(x^k_1, ..., x^k_{i-1}, x^k_i)

≈ δ[(∂F_i/∂x_1)(x^k_i)d_1 + ... + (∂F_i/∂x_{i-1})(x^k_i)d_{i-1}].


Since

σγ_kα/2 = σ(1 − η_k²)α/2 = [σ(1 + η_k)/2](1 − η_k)α

and σ(1 + η_k)/2 < 1, the algorithm GBIN can be formulated setting σ′ ∈ (0, 1) and replacing (14) by

‖F(x^k + αd)‖ ≤ (1 − σ′(1 − η_k)α)‖F(x^k)‖.    (15)

This is the condition for acceptability of a step in the backtracking algorithm of Eisenstat and Walker [6]. In our formulation of GBIN we prefer the Euclidean norm to measure progress towards the solution because the differentiability of ‖F(x)‖² allows one to justify the use of safeguarded interpolation schemes for choosing α_new ∈ [θ_1α, θ_2α]. Global convergence proofs probably hold for arbitrary norms.

The choices of δ and w_i at Step 2 of GBIN deserve some discussion. If we fix δ = α and w_i = y_i, the solution of (13) is an approximate solution of the linear system

J_i(y_i)d_i + (1/α)F_i(y_i) + (1 − 1/α)F_i(x^k_i) = 0.    (16)

For α = 1 this corresponds exactly to an iteration of the local algorithm defined in Section 2. For this reason we say that GBIN is a generalization of BIN. When α is reduced, the scheme based on the inexact solution of (16) preserves the spirit of decomposition in the sense that the Jacobian J_i is computed at the displaced point y_i, as an attempt to use fresh information on the local variation of the current block. However, the independent term of the linear system (16) must change in such a way that, in the limit, d_i approximates a Newton direction. This scheme might not be economical because each time α is reduced, J_i(y_i) and the approximate solution of (16) must be computed again. A modified algorithm could consist of choosing δ "very small" and w_i = x^k_i. In this case, the same δ can be used through successive reductions of α and, so, the solution of (13) does not change. The resulting algorithm turns out to be an Inexact-Newton backtracking method in the sense of [6] and the convergence proof can be obtained using the acceptance criterion (15) and the arguments of [6]. In this case, no intermediate iterates are computed, but the decomposition structure of the method disappears and, in fact, GBIN ceases to be a generalization of BIN.

The discussion above reveals that, in some sense, full preservation of the decomposition philosophy conflicts with low computational cost requirements, at least if the first trial point is not accepted. However, the formulation of GBIN given above allows us to combine decomposition and pure backtracking in several ways. For example, one can choose δ = α and w_i = y_i when α = 1 but δ "very small" and w_i = x^k_i when α < 1. Since we expect that, near a solution, α = 1 will be accepted at most iterations (see Theorem 4 below), the resulting method will be, in many cases, very similar to the decomposition algorithm BIN. The first time α is reduced, a new linear system must be solved, but at the remaining reductions the same d_i can be used.

The question of how to choose the parameters η_k in ordinary Inexact-Newton methods has been the object of several studies. See [7], [9] and references therein. Many of the strategies used for the (one-block) Inexact-Newton method can be easily adapted to the Block Inexact-Newton case. As is the case with η_k, the parameters σ, θ_1 and θ_2 are dimensionless and so it is possible to recommend specific choices independently of the problem. See [5, 7].
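For illustration, one widely used strategy from [7] (the so-called second choice of Eisenstat and Walker) ties η_k to the observed decrease of the residual norm; a hedged sketch of how it might be applied to the full block residual is given below. The constants gamma, alpha_ew and the safeguard threshold follow values commonly reported in the literature and are assumptions here, not prescriptions of this paper.

```python
def forcing_term(normF_now, normF_prev, eta_prev,
                 gamma=0.9, alpha_ew=2.0, eta_max=0.9):
    """Eisenstat-Walker-style 'choice 2' forcing term applied to the full residual norm.
    Returns eta_k for the current iteration, capped at eta_max."""
    eta = gamma * (normF_now / normF_prev) ** alpha_ew
    # Commonly used safeguard: keep eta from collapsing too quickly
    # compared with the previous forcing term.
    if gamma * eta_prev ** alpha_ew > 0.1:
        eta = max(eta, gamma * eta_prev ** alpha_ew)
    return min(eta, eta_max)
```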

The global algorithm can be stated using an approximation of the Jacobian J_i in equation (13). Convergence statements stay valid, with minor technical changes in the proofs, if this approximation is nonsingular, its inverse is bounded in norm, and it can be made close to J_i(x^k_i).

In the rest of this section we prove the global convergence results. The first lemma we need to prove says that, independently of the current point x^k, there exists a common bound for the directions d computed at Step 2 of GBIN.

Lemma 4 Let {x^k} lie in a compact set. Then, the directions d_i computed at Step 2 of GBIN are well defined and uniformly bounded independently of x^k.

Proof. By Assumption A5, F′(x) is nonsingular. So, by the structure of this matrix, J_i(x) is nonsingular. This implies that directions d_i satisfying (13) always exist. Now, by (13),

|J_1(x^k_1)d_1 + F_1(x^k_1)| ≤ η|F_1(x^k_1)|.

This implies that

|J_1(x^k_1)d_1| ≤ (1 + η)|F_1(x^k_1)|.

So,

|d_1| ≤ |J_1(x^k_1)^{-1}||J_1(x^k_1)d_1| ≤ 2|J_1(x^k_1)^{-1}||F_1(x^k_1)|.

Therefore, the directions d_1 are bounded.


Suppose that i ≥ 2 and that the directions d_1, ..., d_{i-1} are bounded. Then, by (13),

|d_i| ≤ |J_i(w_i)^{-1}|[|F_i(x^k_i) + (1/δ)(F_i(z_i) − F_i(x^k_i))| + η|F_i(w_i)|].

By the inductive hypothesis and the boundedness of {x^k}, we have that {|F_i(w_i)|} is bounded. Therefore, we only need to prove that (1/δ)(F_i(z_i) − F_i(x^k_i)) is bounded independently of α. But

(1/δ)(F_i(z_i) − F_i(x^k_i)) = (1/δ)(F_i(x^k_1 + δd_1, ..., x^k_{i-1} + δd_{i-1}, x^k_i) − F_i(x^k_1, ..., x^k_{i-1}, x^k_i))

= ∑_{j=1}^{i-1} [∫_0^1 (∂F_i/∂x_j)(x^k_1 + tδd_1, ..., x^k_{i-1} + tδd_{i-1}, x^k_i) dt] d_j.    (17)

So, the desired result follows from the inductive hypothesis, the boundedness of {x^k} and the continuity and boundedness of the partial derivatives. □

In the following lemma, we prove that, independently of the current point x^k, after a finite number of reductions of α a trial point is found that satisfies the sufficient decrease condition (14). Therefore, a single iteration of GBIN finishes in finite time.

Lemma 5 Algorithm GBIN is well defined. That is, α cannot be decreased infinitely many times within a single iteration.

Proof. If F(x^k) = 0 the algorithm terminates, so let us assume F(x^k) ≠ 0. Suppose that α is decreased infinitely many times, i.e., GBIN generates infinite sequences {α_j}, {δ_j} and {d^j} at iteration k such that

‖F(x^k + α_j d^j)‖² > (1 − σγ_kα_j/2)²‖F(x^k)‖².    (18)

Let us denote z^j_1 = y^j_1 = x^k_1,

y^j_i = (x^k_1 + α_j d^j_1, ..., x^k_{i-1} + α_j d^j_{i-1}, x^k_i), i = 2, ..., m,

z^j_i = (x^k_1 + δ_j d^j_1, ..., x^k_{i-1} + δ_j d^j_{i-1}, x^k_i), i = 2, ..., m,


and w^j_i ∈ {x^k_i, y^j_i} as in Step 2 of GBIN.

Since, as in Lemma 4, all the directions d^j generated at Step 2 of GBIN are bounded, there exists an infinite set of indices K_1 such that

lim_{j∈K_1} d^j_i = d^*_i, i = 1, 2, ..., m,    (19)

and, since α_j → 0, we have

lim_{j∈K_1} δ_j d^j_i = lim_{j∈K_1} α_j d^j_i = 0.

Consequently,

lim_{j∈K_1} w^j_i = lim_{j∈K_1} y^j_i = lim_{j∈K_1} z^j_i = x^k_i.    (20)

Define

A_i = J_i(w^j_i)d^j_i + (1/δ_j)(F_i(z^j_i) − F_i(x^k_i)), i = 1, 2, ..., m.

Then, by (13), we have that

〈A_i + F_i(x^k_i), A_i + F_i(x^k_i)〉 ≤ η_k²〈F_i(w^j_i), F_i(w^j_i)〉.

This implies, using 〈A_i, A_i〉 ≥ 0, that

2〈F_i(x^k_i), J_i(w^j_i)d^j_i + (1/δ_j)(F_i(z^j_i) − F_i(x^k_i))〉 ≤ η_k²|F_i(w^j_i)|² − |F_i(x^k_i)|²,    (21)

for i = 1, 2, ..., m.

Define φ(x) = (1/2)‖F(x)‖²; then ∇φ(x) = F′(x)^T F(x). Using (19), (20), the identity

(1/δ_j)(F_i(z^j_i) − F_i(x^k_i)) = (1/δ_j)(F_i(x^k_1 + δ_j d^j_1, ..., x^k_{i-1} + δ_j d^j_{i-1}, x^k_i) − F_i(x^k_1, ..., x^k_{i-1}, x^k_i))

and taking limits on both sides of (21) we obtain:

〈∇φ(x^k), d^*〉 ≤ −(γ_k/2)‖F(x^k)‖².    (22)

On the other hand, our assumption (18) gives

[φ(x^k + α_j d^j) − φ(x^k)]/α_j > −(σγ_k/2)‖F(x^k)‖² + (σ²γ_k²α_j/8)‖F(x^k)‖²


and, by the Mean-Value Theorem, there exists ξ_j ∈ [0, 1] such that

〈∇φ(x^k + ξ_jα_j d^j), d^j〉 > −(σγ_k/2)‖F(x^k)‖² + (σ²γ_k²α_j/8)‖F(x^k)‖².

Taking limits in the last inequality we obtain

〈∇φ(x^k), d^*〉 ≥ −(σγ_k/2)‖F(x^k)‖².

This contradicts (22). Therefore, α cannot be decreased infinitely many times within one iteration. □

The main convergence result for GBIN is given in the following theorem. We are going to prove that if {x^k} is bounded, every cluster point of the sequence is a solution of (1). Boundedness of {x^k} can be guaranteed if ‖F(x)‖ has bounded level sets. Again, Lemma 4 is crucial for proving this result since it guarantees that search directions are bounded.

Theorem 3 Let {x^k} be a bounded infinite sequence generated by GBIN. Then all its limit points are solutions of (1).

Proof. Assume that K_2 is an infinite set of indices such that

lim_{k∈K_2} x^k = x^*.

Let us first suppose that α_k ≥ α > 0 for k ≥ k_1. Then,

1 − σγ_kα_k/2 ≤ 1 − σγα/2 = r < 1, for k ≥ k_1,

so ‖F(x^{k+1})‖ ≤ r‖F(x^k)‖ for all k ≥ k_1. This implies that ‖F(x^k)‖ → 0 and F(x^*) = 0.

Assume now that lim inf_{k→∞} α_k = 0. Let K_3 be an infinite subset of K_2 such that α is decreased at every iteration k ∈ K_3 and lim_{k∈K_3} α_k = 0. So, for k ∈ K_3 we have a steplength α′_k which immediately precedes α_k and for which the sufficient descent condition (14) does not hold. Let δ′_k ∈ (0, α′_k] be as in Step 2 of GBIN (α_k is chosen in [θ_1α′_k, θ_2α′_k]). Let g^k be the direction corresponding to α′_k, generated at Step 2 of GBIN. By Lemma 4 and the hypothesis of this theorem, {|g^k_i|} is a bounded sequence for i = 1, 2, ..., m, k ∈ K_3.


So, there exists an infinite set of indices K_4 ⊂ K_3 such that

lim_{k∈K_4} g^k_i = g^*_i, lim_{k∈K_4} η_k = η, lim_{k∈K_4} γ_k = γ = 1 − η² > 0

and

lim_{k∈K_4} α′_k g^k = lim_{k∈K_4} δ′_k g^k = 0.

Define, for k ∈ K_4,

y^k_1 = z^k_1 = x^k_1,

y^k_i = (x^k_1 + α′_k g^k_1, ..., x^k_{i-1} + α′_k g^k_{i-1}, x^k_i),

z^k_i = (x^k_1 + δ′_k g^k_1, ..., x^k_{i-1} + δ′_k g^k_{i-1}, x^k_i),

and w^k_i ∈ {x^k_i, y^k_i} as in Step 2 of GBIN, i = 2, ..., m. Defining φ as in Lemma 5, using the arguments of that lemma and uniform continuity on compact sets, we deduce that

〈∇φ(x^*), g^*〉 ≤ −(γ/2)‖F(x^*)‖².    (23)

Now, by the definition of α′_k, we have that

‖F(x^k + α′_k g^k)‖² > (1 − σγ_kα′_k/2)²‖F(x^k)‖² for all k ∈ K_4.

So, as in Lemma 5,

[φ(x^k + α′_k g^k) − φ(x^k)]/α′_k > −(σγ_k/2)‖F(x^k)‖² + (σ²γ_k²α′_k/8)‖F(x^k)‖².

The Mean-Value Theorem implies that there exists ξ_k ∈ [0, 1] such that

〈∇φ(x^k + ξ_kα′_k g^k), g^k〉 > −(σγ_k/2)‖F(x^k)‖² + (σ²γ_k²α′_k/8)‖F(x^k)‖².

Taking limits in the last inequality we obtain

〈∇φ(x^*), g^*〉 ≥ −(σγ/2)‖F(x^*)‖².    (24)

By (23) and (24), it follows that F(x^*) = 0. □

To complete the analysis of global convergence we will prove that GBIN matches well with the local method analyzed in the previous section. The following classical lemma will be used.


Lemma 6 Let F′(x^*) be nonsingular and

P = max{‖F′(x^*)‖ + 1/(2Q), 2Q},

where Q = ‖F′(x^*)^{-1}‖. Then

(1/P)‖y − x^*‖ ≤ ‖F(y)‖ ≤ P‖y − x^*‖

for ‖y − x^*‖ sufficiently small.

In the next theorem we assume that algorithm GBIN runs in such a way that, at each iteration, the m Newtonian linear equations are solved with increasing accuracy η_k → 0. In this case, it will be proved that, if a solution x^* is a limit point of {x^k} where the basic assumptions hold, then the first trial direction d^k will be asymptotically accepted. As a consequence, GBIN coincides asymptotically with BIN in this case and η_k → 0 implies that the convergence is superlinear.

Theorem 4 Assume that the sequence {x^k} is generated by GBIN and that for all k = 0, 1, 2, ..., whenever α = 1 we choose δ = α, w_i = y_i and we have that (13) holds with η_k → 0. Suppose that x^* is a limit point of {x^k} and that the basic assumptions (A1)-(A5) are satisfied in some neighborhood of x^*. Then x^* is a solution of (1) and x^k → x^* superlinearly.

Proof. Let ε_1 > 0 be such that the basic assumptions are satisfied for ‖x − x^*‖ < ε_1. Let k be large enough such that ‖x^k − x^*‖ ≤ ε_1. Since lim_{k→∞} η_k = 0, using the same argument as in the proof of Theorem 2 we can prove that

lim_{k→∞} ‖x^k + d^k − x^*‖/‖x^k − x^*‖ = 0,

where d^k is the increment obtained with α = 1. Thus, since σγ_k ∈ (0, 1), from Lemma 6 it follows that, for k large enough,

‖F(x^k + d^k)‖ ≤ (1 − σγ_k/2)‖F(x^k)‖.

So x^k + d^k satisfies the sufficient descent condition for k large enough. This means that, for k large enough, GBIN coincides with BIN. Therefore, Theorem 2 applies and the convergence is superlinear. □


5 Conclusions

In the Introduction we emphasized the relevance of the Block Inexact-Newton theory to complete the convergence theory of the Block-Newton method introduced in [4]. We also mentioned the fact that, when n_i is large and the structure of J_i is not good for matrix factorizations, using Newton's method is not possible and the employment of iterative linear solvers on each block is unavoidable. The practical question is: Is it better to perform just one Inexact-Newton iteration on the i-th block (as suggested by this paper) or to execute several Inexact-Newton iterations on block i before dealing with block i+1? The common practice is, in many cases, to "solve" block i before considering block i+1. ("Solving" is an ambiguous word in this case since we deal with infinite processes.) Perhaps this is the most effective alternative in a lot of practical situations. Numerical experiments in [4] seem to indicate that using "several" iterations on each block is, frequently, more efficient than performing only one. However, some arguments can be invoked in defense of the "one-iteration" strategy:

1. It is easy to construct small-dimensional (even 2 × 2) examples (say f_1(x_1) = 0, f_2(x_1, x_2) = 0) where "solving" f_1(x_1) = 0 before dealing with f_2 leads to disastrous results. This is what happens when the behavior of f_2(x^*_1, x_2) is good only for x_2 close to the solution. In these cases it is better to approach x^*_1 slowly, mixing these attempts with approximations to the solution of the second equation. Of course, these examples could appear arbitrary, but this is also the case of the examples described in [4] which seem to favor p-step methods.

2. If the pure local method does not work, the present research shows how to correct it in order to achieve global convergence. It is not clear what should be done if we used a "several-iterations" strategy. In the best case, we could be wasting a lot of time on unsuccessful global trials.

3. From the considerations above it seems that, in difficult problems, a "several-iterations" strategy should be associated to internal globalization schemes. In other words, when dealing with block i, the "several-iterations" strategy should not consist of performing "several" local Inexact-Newton (or Newton) iterations but of performing the same number of global Inexact-Newton iterations. Of course, such a procedure can work very well but, rigorously speaking, an additional


external globalization procedure would be necessary to guarantee convergence. So, the globalization introduced in this paper, which uses only local internal methods, is appealing.

As a result, we think that both "one-iteration" and "several-iterations" strategies deserve attention and can be effective for practical large-scale block systems. The correct and effective way of implementing "several-iterations" strategies is, however, still unclear and deserves future research.

Let us finish with some remarks regarding potential applications. Many engineering problems are described by systems of differential-algebraic equations (DAEs). Parametric sensitivity of the DAE model may yield useful information. See [8, 10]. The general DAE system with parameters is of the form

F(t, y, y′, p) = 0, y(0) = y_0, y ∈ IR^{n_y}, p ∈ IR^{n_p},

with sensitivity equations

(∂F/∂y)s_i + (∂F/∂y′)s_i′ + ∂F/∂p_i = 0, i = 1, ..., n_p,

where s_i = dy/dp_i. Defining

Y = [y, s_1, ..., s_{n_p}]^T

and

IF = [F(t, y, y′, p), (∂F/∂y)s_1 + (∂F/∂y′)s_1′ + ∂F/∂p_1, ..., (∂F/∂y)s_{n_p} + (∂F/∂y′)s_{n_p}′ + ∂F/∂p_{n_p}],

the combined system can be written as

IF(t, Y, Y′, p) = 0, Y_0 = [y_0, dy_0/dp_1, ..., dy_0/dp_{n_p}]^T.

Approximating the solution to the combined system by an implicit numerical method leads to a block lower-triangular nonlinear algebraic system. When y (or p) is of large dimension, an iterative method for solving the Newtonian equation is necessary. To our present knowledge, several codes for solving this problem exist. Some of them, like DASSLSO, DASPKSO (see [10]) and DSL48S (see [8]), exploit the lower-triangular structure of the Jacobian but not in the way suggested here. The implementation of the algorithm described in this paper for sensitivity analysis problems can be the subject of research in the near future.
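To make the connection with structure (2) explicit, the toy sketch below (invented for illustration, with backward Euler and a scalar ODE y′ = −p y² in place of a general DAE) advances the state and its parametric sensitivity one time level at a time: at each level, the state residual depends only on the new y, while the sensitivity residual depends on the new y and the new s, which is exactly the block lower-triangular pattern exploited by BIN/GBIN.

```python
p, h = 2.0, 0.1          # parameter value and time step (illustrative choices)
y, s = 1.0, 0.0          # y(0) = 1, s(0) = dy(0)/dp = 0

for n in range(1, 6):
    # Block 1 of time level n: backward-Euler state residual, depends on y_new only:
    #   (y_new - y)/h + p*y_new**2 = 0, solved by plain Newton in the scalar y_new.
    y_new = y
    for _ in range(20):
        r = (y_new - y) / h + p * y_new**2
        y_new -= r / (1.0 / h + 2.0 * p * y_new)
    # Block 2 of time level n: backward-Euler sensitivity residual,
    #   (s_new - s)/h + 2*p*y_new*s_new + y_new**2 = 0,
    # depends on (y_new, s_new) and is linear in s_new, so one solve suffices.
    s_new = (s / h - y_new**2) / (1.0 / h + 2.0 * p * y_new)
    y, s = y_new, s_new
    # s approximates dy/dp at t = n*h (exact sensitivity of this ODE: -t/(1+p*t)**2).
    print(n * h, y, s)
```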

Acknowledgements

The authors are indebted to two anonymous referees whose criticism helped a lot to improve this paper. In particular, "Referee 1" suggested the present form of Lemma 3, from which q-linear convergence follows. "Referee 2" made insightful remarks concerning a first version of the global algorithm and drew our attention to possible applications of the BIN algorithm.

References

[1] M. G. Crandall (editor), Nonlinear Evolution Equations, Academic Press, 1978.

[2] E. J. Dean, A model trust region modification of Inexact Newton's method for nonlinear two point boundary value problems, TR No. 85-6, Department of Mathematical Sciences, Rice University, Houston, Texas (1985).

[3] R. S. Dembo, S. C. Eisenstat and T. Steihaug, Inexact Newton methods, SIAM Journal on Numerical Analysis 19 (1982), 400-408.

[4] J. E. Dennis, J. M. Martínez and X. Zhang, Triangular decomposition methods for solving reducible nonlinear systems of equations, SIAM Journal on Optimization 4 (1994), 358-382.

[5] J. E. Dennis and R. B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall, Englewood Cliffs, 1983.

[6] S. C. Eisenstat and H. F. Walker, Globally convergent Inexact Newton methods, SIAM Journal on Optimization 4 (1994), 393-422.

[7] S. C. Eisenstat and H. F. Walker, Choosing the forcing terms in an Inexact Newton method, SIAM Journal on Scientific Computing 17 (1996), 16-32.

[8] W. F. Feehery, J. E. Tolsma and P. I. Barton, Efficient sensitivity analysis of large-scale differential-algebraic systems, Applied Numerical Mathematics 25 (1997), 41-54.


[9] Z. Lužanin, N. Krejić and D. Herceg, Parameter selection for Inexact Newton method, Nonlinear Analysis: Theory, Methods and Applications 30 (1997), 17-24.

[10] T. Maly and L. R. Petzold, Numerical methods and software for sensitivity analysis of differential-algebraic systems, Applied Numerical Mathematics 20 (1996), 57-79.

[11] T. Meis and U. Marcowitz, Numerical Solution of Partial Differential Equations, Springer-Verlag, New York, Heidelberg, Berlin, 1981.

[12] T. U. Myint and L. Debnath, Partial Differential Equations for Scientists and Engineers, Elsevier, 1987.

[13] J. M. Ortega and W. C. Rheinboldt, Iterative Solution of Nonlinear Equations in Several Variables, Academic Press, New York, 1970.
