
On Decomposition Methods for Multidisciplinary Design Optimization

Victor DeMiguel
London Business School, UK

[email protected], http://faculty.london.edu/avmiguel/

Francisco J. Nogales
Department of Statistics, Universidad Carlos III de Madrid, Spain

[email protected], http://www.est.uc3m.es/fjnm

Multidisciplinary design optimization (MDO) problems are engineering design problems that require the consideration of the interaction between several design disciplines. Due to certain organizational aspects of MDO problems, decomposition algorithms are often the only feasible solution approach. Two natural decomposition approaches to the MDO problem are bilevel decomposition algorithms and Schur interior-point methods. Our first contribution is to find a strikingly close relationship between these two approaches. This constitutes one more step towards developing a robust convergence theory for bilevel decomposition algorithms. Our second contribution is to show how Schur interior-point methods can be modified to deal with certain problems for which the Schur complement matrix is not invertible in general. This substantially enlarges the class of problems that can be addressed with this type of method.

Key words: Bilevel programming, Schur complement, decomposition algorithms, multidisciplinary design optimization.

MSC2000 subject classification: 49M27, 90C30, 90C51
History: Submitted 5th December 2005

1. Introduction. The Multidisciplinary Design Optimization (MDO) problem arises in engineering design projects that require the consideration of several disciplinary analyses [1, 12, 23]. For instance, when designing an airplane, one must consider both a structural and an aerodynamic analysis. Often, a different group of engineers is in charge of each of these disciplinary analyses. Moreover, the different groups often rely on sophisticated software codes (known as legacy codes) that have been under development for many years and whose method of use is subject to constant modification. Integrating all of these codes into a single platform is judged to be impractical.

In this context, decomposition algorithms may be the only feasible alternative for finding the overall optimal design. Decomposition algorithms exploit the structure of MDO problems by reformulating them as a set of independent subproblems, one per discipline. Then, a so-called master problem is used to coordinate the subproblem solutions and find an overall problem minimizer.

A popular approach to decomposing the MDO problem is bilevel decomposition algorithms (BDAs) [33, 6, 15]. Once decomposed into a master problem and a set of subproblems, the MDO problem becomes a particular type of bilevel program. For a review of bilevel programming, see the recent paper by Colson, Marcotte, and Savard [10] or the monographs [32, 16]. Just like bilevel programming methods, BDAs apply nonlinear optimization techniques to solve both the master problem and the subproblems. At each iteration of the algorithm solving the master problem, each of the subproblems is solved, and their minimizers are used to compute the master problem derivatives and the associated Newton direction.

The main advantage of BDAs is that they allow for a high degree of flexibility in the solution process. In particular, the different groups working on the disciplinary analyses are free to choose the particular optimization algorithms they use to solve the disciplinary subproblems. This allows engineers to re-use their software codes with only minor modifications. The downside of BDAs,


however, is that there is analytical and numerical evidence that certain commonly used BDAs may fail to converge even when the starting point is very close to the minimizer [2, 13]. Moreover, although there are local convergence proofs for certain BDAs [15], these proofs require the computation of exact subproblem minimizers. Thus the convergence theory of BDAs is not nearly as developed as that of standard interior-point methods.¹

An alternative to BDAs, with well-established local convergence properties, is to use an interior-point method combined with a Schur complement approach. That is, one can use an interior-point method to solve the MDO problem in its original (integrated) form, but then apply a Schur complement approach to solve the Newton system (and thus compute the search direction) in a decentralized manner. These algorithms allow for a parallel solution of the linear algebra involved in interior-point methods, have lower cost per iteration than BDAs, and can be shown to converge superlinearly or quadratically to a solution of the MDO problem.

Despite their advantageous features, Schur interior-point methods have not been used to solve MDO problems in practice. One of the reasons for this is that the degree of flexibility provided by these approaches is not as high as with BDAs. In particular, when using a Schur interior-point method, the same optimization algorithm must be used for all disciplinary analyses (in fact, to solve the whole problem) and there is flexibility only at the linear algebra level. Another reason for the lack of popularity of the Schur complement approach among engineers is that it can only be applied to solve problems that satisfy the so-called Strong Linear Independence Constraint Qualification (SLICQ). Roughly speaking, the SLICQ implies that the constraints of the MDO problem can be eliminated and that, in the vicinity of the minimizer, the problem is an unconstrained one [15]. Unfortunately, the SLICQ does not hold for some MDO problems, where the difficulty is precisely to find a solution that is feasible with respect to the constraints of all disciplines.

Summarizing, BDAs are currently the method of choice of most engineers solving MDO problems because of their flexibility, while Schur interior-point methods are efficient and backed up by robust convergence theory for the case where the SLICQ holds.

We make two contributions to the field of decomposition methods for MDO. Our first contribution is to establish a theoretical relationship between BDAs and Schur interior-point methods. In particular, we find a strikingly close relationship between a particular inexact BDA, which only takes one iteration to solve the subproblems, and a Schur interior-point method. The relevance of this relationship is that it effectively bridges the gap between the incipient local convergence theory of BDAs [2, 15] and the mature local convergence theory of interior-point methods [25, 17, 37, 22]. As a result, we think our analysis constitutes one important step towards the development of a robust convergence theory for BDAs.

Our second contribution is to show how Schur interior-point methods can be applied to solve MDO problems that satisfy only the conventional LICQ but not the Strong LICQ. We accomplish this task in two steps. First, we show that any MDO problem satisfying the conventional LICQ can be regularized by introducing an exact penalty function. Second, we show how degenerate optimization techniques can be used to ensure a Schur interior-point method will achieve fast local convergence when applied to solve the regularized problem. The importance of this contribution is that it substantially enlarges the class of problems that can be addressed with Schur interior-point methods and thus makes them a viable alternative to BDAs for MDO problems.

The paper is organized as follows. In Section 2, we state the MDO problem as well as some of the assumptions we make in our analysis. Section 3 discusses the two benchmark approaches to the MDO problem: bilevel decomposition algorithms and Schur interior-point methods. In Sections 4

¹ Cutting-plane decomposition methods [3, 34, 21, 30, 26, 28] rely heavily on convexity assumptions and thus cannot be applied to solve the MDO problem because of its inherent nonconvexity. Likewise, the theory of augmented Lagrangian decomposition methods [9, 29, 31] depends on convexity assumptions. Moreover, augmented Lagrangian methods may converge slowly in practice (see [24, 8, 15]).


and 5, we introduce an inexact BDA and analyze its relationship to Schur interior-point methods. In Section 6, we show how an MDO problem satisfying the LICQ can be regularized by introducing an exact penalty function. In Section 7, we show how the Schur interior-point method can be modified to ensure it achieves fast local convergence when applied to solve the regularized problem. Section 8 gives a numerical example and, finally, Section 9 states some conclusions.

2. Problem Statement and Assumptions

2.1. Problem statement. MDO problems integrate the objective and constraint functions of K disciplinary analyses. Usually, only a few of the design variables, which we term global variables, are relevant to all disciplines, while the remainder only affect one of the disciplinary analyses. Mathematically, these problems may be stated as follows:

    min_{x, y_1, y_2, ..., y_K}  F_1(x, y_1) + F_2(x, y_2) + ··· + F_K(x, y_K)
    s.t.  c_1(x, y_1) ≥ 0,
          c_2(x, y_2) ≥ 0,
          ...
          c_K(x, y_K) ≥ 0,                                        (1)

where x ∈ Rⁿ are the global variables, y_k ∈ R^{n_k} are the kth discipline local variables, c_k(x, y_k) : R^{n+n_k} → R^{m_k} are the kth discipline constraints, F_k(x, y_k) : R^{n+n_k} → R is the objective function term corresponding to the kth discipline, and there are K disciplines. Note that while global variables appear in all of the objective function terms and constraints, local variables appear only in the objective function term and constraints corresponding to one of the disciplines.

To facilitate the exposition and without loss of generality, herein we consider the following simplified problem composed of only one discipline:

    min_{x, y, r}  F(x, y)
    s.t.  c(x, y) − r = 0,
          r ≥ 0,                                                  (2)

where x ∈ R^{n_x} are the global variables, y ∈ R^{n_y} are the local variables, r ∈ R^m are the slack variables, and c(x, y) : R^{n_x+n_y} → R^m and F(x, y) : R^{n_x+n_y} → R are smooth functions. Note that, in addition to considering only one discipline, we have introduced slack variables so that only equality constraints and nonnegativity bounds are present.

2.2. Assumptions. We assume there exists a minimizer (x∗, y∗, r∗) to problem (2) and a Lagrange multiplier vector (λ∗, σ∗) such that the vector

w∗ = (x∗, y∗, r∗, λ∗, σ∗)

is a KKT point; that is, it satisfies the KKT conditions:

    ∇_{x,y} L(w∗) = 0                                             (3)
    −σ∗ + λ∗ = 0                                                  (4)
    c(x∗, y∗) − r∗ = 0                                            (5)
    R∗ σ∗ = 0                                                     (6)
    r∗, σ∗ ≥ 0,                                                   (7)

where λ∗ ∈ R^m are the multipliers of the equality constraints, σ∗ ∈ R^m are the multipliers of the nonnegativity bounds on r∗, R∗ = diag(r∗), and the Lagrangian function is L(w∗) = F(x∗, y∗) − (λ∗)ᵀ(c(x∗, y∗) − r∗) − (σ∗)ᵀ r∗.


We make the following assumptions on the problem functions and on the KKT point w∗ = (x∗, y∗, r∗, λ∗, σ∗).

A.1 The second derivatives of the functions in problem (2) are Lipschitz continuous in an open convex set containing w∗.

A.2 The linear independence constraint qualification is satisfied at w∗; that is, the matrix

    J = ( ∇x c(x∗, y∗)   ∇y c(x∗, y∗)   −I  )
        (       0               0        I_Z )                    (8)

has full row rank, where Z is the active set {i : r∗_i = 0} and I_Z is the matrix formed by the rows of the identity corresponding to indices in Z.

A.3 The strict complementary slackness condition is satisfied at w∗; that is, σ∗_i > 0 for i ∈ Z.

A.4 The second-order sufficient conditions for optimality are satisfied at w∗; that is, for all d ≠ 0 satisfying Jd = 0 we have

    dᵀ ∇²L(w∗) d > 0,                                             (9)

where ∇²L(w∗) is the Hessian of the Lagrangian function with respect to the primal variables x, y, r.

Finally, the following condition is often assumed to ensure the Schur complement matrix is well-defined near the solution.

C.1 The Strong Linear Independence Constraint Qualification (SLICQ) holds at w∗; that is, the matrix

    ( ∇y c(x∗, y∗)   −I  )
    (       0        I_Z )                                        (10)

has full row rank.

Clearly, the SLICQ (Condition C.1) implies the LICQ (Assumption A.2). Moreover, by the implicit function theorem, it is easy to see that the SLICQ implies that the MDO constraints can be used in the vicinity of the minimizer to eliminate the local and slack variables y and r and transform the MDO problem into an unconstrained problem on the global variables only. To illustrate this point, we consider the following example:

    min_{x, y}  (1/2)(x − a)² + (1/2)(y − b)²
    s.t.  x + y ≤ 2,
          x − y ≤ 0.

Note that the LICQ holds for this problem at all feasible points. In addition, the parameters (a, b) are useful to control whether the SLICQ holds at the minimizer. In particular, for (a, b) = (1, 0) the minimizer is (x∗, y∗) = (0.5, 0.5). At this point, only one of the constraints is active and the SLICQ holds. Moreover, note that for (a, b) = (1, 0), one can perturb the value of the global variable x around the minimizer and there are values of the local variable y for which the problem is feasible. That is, one may use the active constraint to eliminate the local variable from the problem in the vicinity of the minimizer, and thus obtain an equivalent unconstrained problem on the global variables only. For (a, b) = (2, 1), on the other hand, the minimizer is (x∗, y∗) = (1, 1). At this point, there are two active constraints and the SLICQ does not hold. Also, if one increases the value of the global variable x above its optimal value of 1, there is no value of the local variable that makes the problem feasible. That is, for this case we cannot use the constraints to eliminate the local variable from the problem. The example is illustrated in Figure 1.
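This example is small enough to verify numerically. The following sketch (the solver choice and starting point are ours, assuming SciPy is available) recovers both minimizers:

```python
import numpy as np
from scipy.optimize import minimize

def solve_example(a, b):
    """Solve min 0.5(x-a)^2 + 0.5(y-b)^2  s.t.  x + y <= 2,  x - y <= 0."""
    obj = lambda v: 0.5 * (v[0] - a) ** 2 + 0.5 * (v[1] - b) ** 2
    cons = [
        {"type": "ineq", "fun": lambda v: 2.0 - v[0] - v[1]},  # x + y <= 2
        {"type": "ineq", "fun": lambda v: v[1] - v[0]},        # x - y <= 0
    ]
    return minimize(obj, x0=[0.0, 0.0], method="SLSQP", constraints=cons).x

print(solve_example(1, 0))  # ~ [0.5, 0.5]: one active constraint, SLICQ holds
print(solve_example(2, 1))  # ~ [1.0, 1.0]: two active constraints, SLICQ fails
```

In the second case both constraints are active at the solution, which is exactly the situation in which the SLICQ (and hence the variable elimination it licenses) breaks down.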


(a) Case (a, b) = (1, 0): the SLICQ holds at the minimizer (0.5, 0.5).
(b) Case (a, b) = (2, 1): the SLICQ does not hold at the minimizer (1, 1).

Figure 1. Example illustrating the SLICQ.

3. Two Benchmark Approaches. We discuss two benchmark approaches to the MDO problem: bilevel decomposition algorithms (BDAs) and Schur interior-point methods (SIPMs). Because most of our analysis will be based on interior-point techniques, it is convenient to introduce the barrier MDO problem:

    min_{x, y, r}  F(x, y) − µ Σ_{i=1}^m log(r_i)
    s.t.  c(x, y) − r = 0,                                        (11)

where µ is the barrier parameter.

3.1. Bilevel decomposition algorithms. The structure of the barrier MDO problem suggests it can be decomposed into a master problem, which only depends on the global variables, and a subproblem, which depends on the local and slack variables. In particular, problem (11) can be reformulated as the following master problem:

    min_x  F∗(x),                                                 (12)

where F∗(x) ≡ F(x, y∗(x)) − µ Σ_{i=1}^m log(r∗_i(x)) is the subproblem optimal value function

    F∗(x) = min_{y, r}  F(x, y) − µ Σ_{i=1}^m log(r_i)
            s.t.  c(x, y) − r = 0,                                (13)

and we have omitted the dependence of F∗(x) on µ to simplify notation.

Bilevel decomposition algorithms (BDAs) divide the job of finding a minimizer to the MDO problem into two different tasks: (i) finding an optimal value of the global variables x∗ and (ii) finding an optimal value of the local variables y∗(x) for a given value of the global variables x. The first task is accomplished by solving the master problem (12) and the second by solving the subproblem (13).
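As a rough illustration of this split (on a toy one-discipline instance of our own, not one from the paper), F∗(x) can be evaluated by an inner solve over the local variable, and the master problem then becomes a one-dimensional minimization of F∗:

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Toy instance (assumed data): min 0.5(x-1)^2 + 0.5 y^2  s.t.  x + y <= 2,
# i.e. slack r = 2 - x - y > 0 handled with a log-barrier as in (11).
MU = 1e-6  # barrier parameter

def F_star(x):
    """Subproblem optimal value function (13): minimize over the local y."""
    obj = lambda y: 0.5 * (x - 1) ** 2 + 0.5 * y ** 2 - MU * np.log(2 - x - y)
    inner = minimize_scalar(obj, bounds=(-10.0, 2 - x - 1e-9), method="bounded")
    return inner.fun

# Master problem (12): minimize F* over the global variable x alone.
master = minimize_scalar(F_star, bounds=(-5.0, 1.9), method="bounded")
print(master.x)  # ~ 1.0 (the constraint is inactive at the optimum)
```

This nested structure is exactly what lets each discipline solve its own subproblem with its own solver.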

BDAs apply an unconstrained nonlinear optimization method to solve the master problem. At each iteration, a new estimate of the global variables x_k is generated and the subproblem is solved exactly using x_k as a parameter. Then, sensitivity analysis formulae [18] are used to compute the master problem objective and its derivatives from the exact subproblem minimizer. Using this information, a new estimate of the global variables x_{k+1} is computed. This procedure is repeated until a master problem minimizer is found. The general BDA algorithm is stated in Figure 2.

The main advantage of BDAs is that, in the general case where there is more than one discipline, the above formulation allows the different disciplines to be dealt with almost independently.


Initialization: Choose a starting point x_0. Set k ← 0 and choose the parameter µ_0 > 0.
Repeat
  1. Solve subproblem: Solve the subproblem (13) with µ_k and evaluate F∗(x_k), ∇F∗(x_k), and ∇²F∗(x_k).
  2. Master problem search direction: Compute the search direction for the master problem (12) by solving the system ∇²F∗(x_k) ∆x = −∇F∗(x_k).
  3. Line search: Compute a step size α_k.
  4. Update iterate: Set x_{k+1} = x_k + α_k ∆x.
  5. Parameter update: Set µ_{k+1} > 0 and k ← k + 1.
Until convergence

Figure 2. Bilevel Decomposition Algorithm (BDA)

Moreover, within the framework given in Figure 2, the engineer is free to choose the particular optimization algorithm used to solve each of the disciplinary subproblems and the sensitivity formulae used to compute the first and second derivatives of F∗(x). Again, this usually allows engineers to re-use their disciplinary software codes with minimum change. Another advantage of BDAs is that there exist BDAs that can deal with problems whose minimizer does not satisfy the SLICQ [5, 15].

The downside of BDAs is that there is analytical and numerical evidence that certain commonly used BDAs may fail to converge even when the starting point is very close to the minimizer [2, 13]. Moreover, although there are some local convergence proofs for certain BDAs that solve the subproblems exactly [15], it is safe to say that the local convergence theory of BDAs is not nearly as satisfactory as that of standard interior-point methods.

3.2. A Schur interior-point method. In this section, we describe how a primal-dual interior-point method [7, 17, 19, 20, 35] can be combined with a Schur complement approach [11] to solve problem (1) in a decentralized manner. We term the resulting approach the Schur Interior-Point Method (SIPM).

3.2.1. The interior-point method. Primal-dual interior-point methods apply Newton's method to solve a perturbed version of the KKT conditions (3)–(7) for the MDO problem. At each iteration, a search direction is computed by solving a linearization of these perturbed KKT conditions. Then, a step size is chosen such that all nonnegative variables remain strictly positive.

Provided a suitable constraint qualification holds, a minimizer to the barrier MDO problem (11) must satisfy the following KKT conditions:

    g(µ) ≡ ( ∇x F(x, y) − ∇x c(x, y)ᵀ λ )
           ( ∇y F(x, y) − ∇y c(x, y)ᵀ λ )
           ( −σ + λ                     )
           ( −c(x, y) + r               )
           ( −Rσ + µ e_m                ) = 0,                    (14)

where R = diag(r), λ ∈ R^m are the multipliers of the equality constraints, σ ∈ R^m are the multipliers of the nonnegativity bounds on r, e_m ∈ R^m is the vector whose components are all ones, and the variables r, λ, σ are strictly positive. Note that a minimizer to the MDO problem (2) must satisfy g(0) = 0. This is why conditions (14) are referred to as the perturbed KKT conditions.
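As a sketch (the function names and call signatures below are ours), the residual g(µ) of (14) can be assembled componentwise:

```python
import numpy as np

def kkt_residual(x, y, r, lam, sigma, mu, grad_F, c, jac_c):
    """Perturbed KKT residual g(mu) in (14) for problem (2).
    grad_F(x, y) -> (grad_x F, grad_y F); jac_c(x, y) -> (J_x, J_y),
    the Jacobian blocks of c with respect to x and y (assumed callables)."""
    gx, gy = grad_F(x, y)
    Jx, Jy = jac_c(x, y)
    return np.concatenate([
        gx - Jx.T @ lam,   # stationarity in x
        gy - Jy.T @ lam,   # stationarity in y
        -sigma + lam,      # stationarity in r
        -c(x, y) + r,      # primal feasibility
        -r * sigma + mu,   # perturbed complementarity  (-R sigma + mu e_m)
    ])

# Tiny check: F = 0.5 x^2 + 0.5 y^2, c(x, y) = x + y (one constraint).
# At x = y = lam = sigma = 1, r = 2, mu = 2 the residual vanishes.
res = kkt_residual(
    np.array([1.0]), np.array([1.0]), np.array([2.0]),
    np.array([1.0]), np.array([1.0]), 2.0,
    grad_F=lambda x, y: (x, y),
    c=lambda x, y: x + y,
    jac_c=lambda x, y: (np.ones((1, 1)), np.ones((1, 1))),
)
print(np.linalg.norm(res))  # 0.0
```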

Notice that the problem variables can be split into two different components: the global component x and the local component y = (y, r, λ, σ). Likewise, g(µ) can also be split into two different


components. Note that, to simplify notation, we omit the dependence of g on the variables and multipliers:

    g(µ) = ( g_1    )
           ( g_2(µ) ) = 0,                                        (15)

where

    g_1 = ∇x F(x, y) − ∇x c(x, y)ᵀ λ,                             (16)

and

    g_2(µ) = ( ∇y F(x, y) − ∇y c(x, y)ᵀ λ )
             ( −σ + λ                     )
             ( −c(x, y) + r               )
             ( −Rσ + µ e_m                ).                      (17)

Let w_k = (x_k, y_k) be the current estimate of the global and local components. Then, the Newton search direction ∆w_k^N = (∆x_k^N, ∆y_k^N) is the solution to the following system of linear equations:

    ( W_k   −A_kᵀ ) ( ∆x_k^N )      ( g_{1,k}      )
    ( −A_k   M_k  ) ( ∆y_k^N )  = − ( g_{2,k}(µ_k) ),             (18)

where g_{1,k} and g_{2,k} denote the functions g_1 and g_2 evaluated at w_k, W_k = ∇x g_{1,k}, A_k = −∇x g_{2,k}(µ_k) = −(∇y g_{1,k})ᵀ, and

    M_k = ∇y g_{2,k}(µ_k).                                        (19)

For convenience, we rewrite the Newton system (18) as

    K_k^N ∆w_k^N = −g_k(µ_k).                                     (20)

In addition to computing the Newton step, interior-point methods choose a step size such that all nonnegative variables remain strictly positive. In our case, r, λ, and σ must remain positive. To ensure this, we assume that the step sizes are chosen as in [37]. Therefore, at iteration k,

    α_{r,k} = min{ 1, γ_k min{ −r_{k,i} / ∆r_{k,i}^N : ∆r_{k,i}^N < 0 } },       (21)
    α_{λ,k} = min{ 1, γ_k min{ −λ_{k,i} / ∆λ_{k,i}^N : ∆λ_{k,i}^N < 0 } },       (22)
    α_{σ,k} = min{ 1, γ_k min{ −σ_{k,i} / ∆σ_{k,i}^N : ∆σ_{k,i}^N < 0 } },       (23)

where γ_k ∈ (0, 1). As the global and local variables are not required to be nonnegative, we can set

    α_{x,k} = α_{y,k} = 1.                                        (24)

If we define the matrix Λ_k as

    Λ_k = diag( α_{x,k} I, α_{y,k} I, α_{r,k} I, α_{λ,k} I, α_{σ,k} I ),

the kth iteration of a primal-dual algorithm has the following form:

    w_{k+1} = w_k + Λ_k ∆w_k^N.                                   (25)

We also define the matrix Λ_{y,k}, which contains the step lengths on the y variables; that is, Λ_{y,k} = diag(α_{y,k} I, α_{r,k} I, α_{λ,k} I, α_{σ,k} I).
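Rules (21)–(23) are the usual fraction-to-boundary rule; a minimal sketch (the function name is ours):

```python
import numpy as np

def fraction_to_boundary(v, dv, gamma):
    """Step size alpha = min{1, gamma * min_i(-v_i/dv_i : dv_i < 0)} so that
    v + alpha*dv stays strictly positive, as in rules (21)-(23)."""
    neg = dv < 0
    if not np.any(neg):
        return 1.0  # no component moves toward the boundary
    return min(1.0, gamma * np.min(-v[neg] / dv[neg]))

alpha = fraction_to_boundary(np.array([1.0, 2.0]), np.array([-2.0, 1.0]), 0.995)
print(alpha)  # 0.4975: the first component limits the step
```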


3.2.2. The Schur complement approach. Finally, assuming the matrix M_k is invertible, one can use the Schur complement approach to solve the Newton system (18) in a distributed manner. In particular, the Schur complement of the matrix K_k^N is the matrix

    S_k = W_k − A_kᵀ M_k⁻¹ A_k.                                   (26)

If in addition S_k is invertible, the global component of the Newton search direction ∆x_k^N can be computed as:

    S_k ∆x_k^N = −( g_{1,k} + A_kᵀ M_k⁻¹ g_{2,k}(µ_k) ).          (27)

Then the local component ∆y_k^N is

    M_k ∆y_k^N = −( g_{2,k}(µ_k) − A_k ∆x_k^N ).                  (28)
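Steps (26)–(28) amount to a standard block elimination; a small NumPy sketch (dense factorizations, purely illustrative):

```python
import numpy as np

def schur_newton_step(W, A, M, g1, g2):
    """Solve the block Newton system (18),
        [  W  -A^T ] [dx]      [ g1 ]
        [ -A    M  ] [dy]  = - [ g2 ],
    via the Schur complement, assuming M and S are invertible."""
    Minv_A = np.linalg.solve(M, A)
    Minv_g2 = np.linalg.solve(M, g2)
    S = W - A.T @ Minv_A                             # (26)
    dx = np.linalg.solve(S, -(g1 + A.T @ Minv_g2))   # (27)
    dy = np.linalg.solve(M, -(g2 - A @ dx))          # (28)
    return dx, dy

# Consistency check against a direct solve of the full system.
rng = np.random.default_rng(0)
W, M = 2.0 * np.eye(3), 3.0 * np.eye(4)
A = 0.1 * rng.standard_normal((4, 3))
g1, g2 = rng.standard_normal(3), rng.standard_normal(4)
K = np.block([[W, -A.T], [-A, M]])
full = np.linalg.solve(K, -np.concatenate([g1, g2]))
dx, dy = schur_newton_step(W, A, M, g1, g2)
print(np.allclose(np.concatenate([dx, dy]), full))  # True
```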

The following proposition gives conditions under which the matrices M_k and S_k are invertible in the vicinity of the minimizer to problem (2).

Proposition 1. Under assumptions A.1–A.4 and condition C.1, ‖M_k⁻¹‖ and ‖S_k⁻¹‖ are bounded for w_k in a neighborhood of the minimizer w∗.

Proof. Let (K^N)∗, M∗, and S∗ be the matrices K_k^N, M_k, and S_k evaluated at w∗. Assumptions A.1–A.4 imply by [18, Theorem 14] and [17, Proposition 4.1] that (K^N)∗ must be invertible.

In addition, it is easy to see that if (x∗, y∗, r∗, λ∗, σ∗) is a KKT point satisfying assumptions A.1–A.4 and condition C.1, then (y∗, r∗, λ∗, σ∗) is a minimizer satisfying the linear independence constraint qualification, strict complementarity slackness, and second-order sufficient conditions for subproblem (13) with x = x∗ and µ = 0:

    min_{y, r}  F(x∗, y)
    s.t.  c(x∗, y) − r = 0,
          r ≥ 0.                                                  (29)

This implies the matrix M∗ must also be invertible by [18, Theorem 14] and [17, Proposition 4.1]. Moreover, the invertibility of (K^N)∗ and M∗ implies that S∗ must be invertible, for otherwise there would be multiple solutions to the Newton system (18), in contradiction with the invertibility of (K^N)∗. Finally, ‖M_k⁻¹‖ and ‖S_k⁻¹‖ are bounded in the vicinity of the minimizer by the invertibility of M∗ and S∗ and assumption A.1. □

Note that to ensure the Schur complement matrix is well-defined in the vicinity of a minimizer, we need to assume the SLICQ (condition C.1). As we mentioned in Section 2, the SLICQ implies that the MDO constraints can be used in the vicinity of the minimizer to eliminate the local and slack variables y and r, and thus transform the MDO problem into an unconstrained problem on only the global variables. This is a fairly strong assumption for MDO problems, where often the difficulty is precisely finding a design that is feasible with respect to the constraints of all disciplines. This is one of the reasons the Schur complement approach is not popular among engineers solving the MDO problem. In Section 7, we show how the Schur IPM can be modified to deal with problems that do not satisfy the SLICQ.

The resulting Schur interior-point method is stated in Figure 3. Also, note that, for the general problem (1) with K disciplines, M_k is a block diagonal matrix composed of K blocks. Thus, the Schur complement allows one to decompose the linear system (28) into K smaller independent linear systems (see [4, Section 5.6]).
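With a block diagonal M_k, the local solve in (28) separates by discipline; a short sketch (the helper name is ours) in which each block can be handed to a different disciplinary group or processor:

```python
import numpy as np
from scipy.linalg import block_diag

def solve_local_systems(M_blocks, rhs_blocks):
    """Solve M dy = rhs for block diagonal M: one independent solve per
    discipline block, as system (28) separates for K disciplines."""
    return [np.linalg.solve(Mk, bk) for Mk, bk in zip(M_blocks, rhs_blocks)]

# Check against the assembled full system.
rng = np.random.default_rng(1)
M1 = np.eye(2) + 0.1 * rng.standard_normal((2, 2))
M2 = np.eye(3) + 0.1 * rng.standard_normal((3, 3))
b1, b2 = rng.standard_normal(2), rng.standard_normal(3)
dy_blocks = solve_local_systems([M1, M2], [b1, b2])
dy_full = np.linalg.solve(block_diag(M1, M2), np.concatenate([b1, b2]))
print(np.allclose(np.concatenate(dy_blocks), dy_full))  # True
```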


Initialization: Choose a starting point w_0 = (x_0, y_0, r_0, λ_0, σ_0) such that r_0 > 0, λ_0 > 0, σ_0 > 0. Set k ← 0 and choose the parameters µ_0 > 0 and 0 < γ ≤ γ_0 < 1.
Repeat
  1. Master problem iteration: Form the matrix S_k and compute ∆x_k^N from system (27). Set x_{k+1} = x_k + ∆x_k^N.
  2. Subproblem iteration:
     (a) Search direction: Compute ∆y_k^N by solving system (28).
     (b) Line search: With γ_k, compute the diagonal matrix Λ_k from the subproblem step sizes as in (21)–(24).
     (c) Update iterate: Set y_{k+1} = y_k + Λ_{y,k} ∆y_k^N.
  3. Parameter update: Set µ_{k+1} > 0, 0 < γ ≤ γ_{k+1} < 1, and k ← k + 1.
Until convergence

Figure 3. Schur Interior-Point Method (SIPM)

3.2.3. Convergence. The local convergence theory of interior-point methods is developed in the papers [25, 17, 37, 22]. These papers show that, under assumptions A.1–A.4 and certain conditions on the parameters µ_k and γ_k, the Newton matrix K_k^N is invertible in the vicinity of the minimizer and the iteration (25) converges superlinearly or quadratically to a solution of (2). If, in addition, the SLICQ holds, we know by Proposition 1 that equations (27) and (28) have a unique solution in the vicinity of the minimizer, and thus the Schur IPM also converges superlinearly or quadratically. As our analysis in this paper focuses on the local convergence properties of the algorithms, no procedures are given to ensure global convergence, though the techniques in [35, 20, 19, 7] could be adapted.

4. An Inexact BDA. In this section, we describe the inexact BDA. We also analyze its relationship to the Schur interior-point method described in the previous section.

4.1. The algorithm. The inexact BDA takes just one Newton iteration of a primal-dual interior-point method to solve the subproblems (13). Following the notation introduced in Section 3, the subproblem perturbed KKT conditions can be written in compact form as g_2(µ) = 0; see (17). Then, the Newton system for the subproblem perturbed KKT conditions is simply

    M_k ∆y_k^D = −g_{2,k}(µ_k).                                   (30)

The inexact BDA applies Newton's method to solve the master problem (12). The Newton search direction is the solution to

    ∇²_{xx} F∗(x_k) ∆x_k^D = −∇x F∗(x_k),                         (31)

where, to simplify notation, we do not explicitly write the dependence of F∗ on the barrier parameter µ.

Unfortunately, because the algorithm does not solve the subproblems exactly, the master problem Hessian and gradient ∇²_{xx} F∗(x_k) and ∇x F∗(x_k) cannot be computed from standard sensitivity formulae as is customary in bilevel decomposition algorithms. In the remainder of this section, we show how approximations to ∇x F∗(x_k) and ∇²_{xx} F∗(x_k) can be obtained from the estimate of the subproblem minimizer. In particular, the following two propositions show that the right-hand side in equation (27) can be seen as an approximation to the master problem gradient ∇x F∗(x_k), and that the Schur complement matrix S_k can be interpreted as an approximation to the master problem Hessian ∇²_{xx} F∗(x_k).


Proposition 2. Let (x^*, y^*, r^*, λ^*, σ^*) be a KKT point satisfying assumptions A.1–A.4 and condition C.1 for problem (2). Then, for (x_k, y_k, r_k, λ_k, σ_k) close to (x^*, y^*, r^*, λ^*, σ^*), the subproblem optimal value function F^*(x_k) and its gradient ∇_x F^*(x_k) are well defined and

‖∇_x F^*(x_k) − (g_{1,k} + A^T_k M^{-1}_k g_{2,k}(µ_k))‖ = o(‖y(x_k) − y_k‖),

where y(x_k) = (y(x_k), r(x_k), λ(x_k), σ(x_k)) is the locally unique once continuously differentiable trajectory of minimizers to subproblem (13) with y(x^*) = (y^*, r^*, λ^*, σ^*) and y_k = (y_k, r_k, λ_k, σ_k).

Proof. Note that if condition C.1 holds at (x^*, y^*, r^*, λ^*, σ^*), then the linear independence constraint qualification (LICQ) holds at (y^*, r^*, λ^*, σ^*) for subproblem (13) with x = x^* and µ = 0. Moreover, it is easy to see that if (x^*, y^*, r^*, λ^*, σ^*) is a KKT point satisfying assumptions A.1–A.4, then (y^*, r^*, λ^*, σ^*) is a minimizer satisfying the strict complementarity slackness (SCS) and second-order sufficient conditions (SOSC) for subproblem (13) with x = x^* and µ = 0. It follows from [18, Theorem 6] that there exists a locally unique once continuously differentiable trajectory of subproblem minimizers y(x_k) = (y(x_k), r(x_k), λ(x_k), σ(x_k)) satisfying LICQ, SCS, and SOSC for the subproblem with x_k close to x^* and µ_k sufficiently small. As a result, the subproblem optimal value function can be defined as F^*(x_k) = F(x_k, y(x_k)) − µ_k Σ_{i=1}^m log(r_i(x_k)) and it is once continuously differentiable. By the properties of y(x_k), its gradient is simply

∇_x F^*(x_k) = d[F(x_k, y(x_k)) − µ_k Σ_{i=1}^m log(r_i(x_k))]/dx = dL_y(x_k, y(x_k))/dx, (32)

where d/dx denotes the total derivative and L_y is the subproblem Lagrangian function:

L_y(x, y(x)) = F(x, y) − µ Σ_{i=1}^m log(r_i) − λ^T (c(x, y) − r). (33)

Applying the chain rule, we get:

dL_y(x_k, y(x_k))/dx = ∇_x L_y(x_k, y(x_k)) (34)
  + ∇_y L_y(x_k, y(x_k)) y′(x_k) (35)
  + ∇_r L_y(x_k, y(x_k)) r′(x_k) (36)
  + ∇_λ L_y(x_k, y(x_k)) λ′(x_k) (37)
  + ∇_σ L_y(x_k, y(x_k)) σ′(x_k), (38)

where y′(x_k), r′(x_k), λ′(x_k), and σ′(x_k) denote the Jacobian matrices of y, r, λ, and σ evaluated at x_k, respectively. Note that (35) and (36) are zero because of the optimality of y(x_k), (37) is zero by the feasibility of y(x_k), and (38) is zero because the Lagrangian function does not depend on σ. Thus, we can write the master problem objective gradient as

∇_x F^*(x_k) = ∇_x L_y(x_k, y(x_k)). (39)

If we knew the subproblem minimizer y(x_k), we could easily compute the master problem gradient by evaluating the gradient of the Lagrangian function (33) at x_k and y(x_k). After taking only one interior-point iteration on the subproblem, we do not know y(x_k) exactly, but rather the following approximation:

y_k + ∆y^D_k, (40)

where ∆y^D_k is the subproblem search direction computed by solving system (30). However, by Taylor's Theorem we know that the master problem gradient can be approximated as:

∇_x F^*(x_k) = ∇_x L_y(x_k, y_k) + ∇_{x,y}L_y(x_k, y_k)(y(x_k) − y_k) + O(‖y(x_k) − y_k‖²). (41)


Moreover, if y_k is close enough to y(x_k), we know from the local convergence theory of Newton's method that ‖y(x_k) − (y_k + ∆y^D_k)‖ = o(‖y(x_k) − y_k‖) and thus

∇_x F^*(x_k) = ∇_x L_y(x_k, y_k) + ∇_{x,y}L_y(x_k, y_k) ∆y^D_k + o(‖y(x_k) − y_k‖). (42)

Finally, from Proposition 1 we know that the matrix M_k is nonsingular once the iterates are close to the minimizer [18, Theorem 14]. Since ∆y^D_k = −M^{-1}_k g_{2,k}(µ_k) and A^T_k = −∇_y g_{1,k} = −∇_{x,y}L_y(x_k, y_k), the result follows from (42). □

Proposition 3. Let (x^*, y^*, r^*, λ^*, σ^*) be a KKT point satisfying assumptions A.2–A.4 and condition C.1 for problem (2). Moreover, assume all functions in problem (2) are three times continuously differentiable. Then, for (x_k, y_k, r_k, λ_k, σ_k) close to (x^*, y^*, r^*, λ^*, σ^*), the Hessian of the subproblem optimal value function ∇²_{xx}F^*(x_k) is well defined and

‖∇²_{xx}F^*(x_k) − S_k‖ = O(‖y(x_k) − y_k‖),

where y(x_k) = (y(x_k), r(x_k), λ(x_k), σ(x_k)) is the locally unique twice continuously differentiable trajectory of minimizers to subproblem (13) with y(x^*) = (y^*, r^*, λ^*, σ^*), y_k = (y_k, r_k, λ_k, σ_k), and S_k is the Schur complement matrix S_k = W_k − A^T_k M^{-1}_k A_k.

Proof. By the same arguments as in Proposition 2, and the assumption that all problem functions are three times continuously differentiable, we know that the subproblem optimal value function can be defined as

F^*(x_k) = F(x_k, y(x_k)) − µ Σ_{i=1}^m log(r_i(x_k))

and it is twice continuously differentiable. Moreover, differentiating expression (39), we obtain the following expression for the optimal value function Hessian:

∇²_{xx}F^*(x_k) = d(∇_x L_y(x_k, y(x_k)))/dx (43)
  = ∇_{x,x}L_y(x_k, y(x_k)) + ∇_{x,y}L_y(x_k, y(x_k)) y′(x_k),

where y′(x_k) is the Jacobian matrix of the subproblem minimizer with respect to x_k. By A.3, A.4, and C.1, we know that for x_k close enough to x^* and µ_k sufficiently small, y(x_k) is a minimizer satisfying the LICQ, SCS, and SOSC for the subproblem, and thus it follows from [18, Theorem 6] that:

M^*_k y′(x_k) = A^*_k, (44)

where M^*_k and A^*_k are the matrices M_k and A_k evaluated at y(x_k). If we knew the subproblem minimizer y(x_k) exactly, we could use (43) and (44) to compute the master problem Hessian. Unfortunately, after taking only one Newton iteration on the subproblems, we do not know y(x_k) exactly. But we can approximate y′(x_k) as the solution to the following system:

M_k y′(x_k) ≈ A_k. (45)

Assumptions A.3, A.4, and condition C.1 imply by Proposition 1 that ‖M^{-1}_k‖ is uniformly bounded for (x_k, y_k) in the vicinity of w^*. Then,

‖y′(x_k) − M^{-1}_k A_k‖ ≤ ‖((M^*_k)^{-1} − M^{-1}_k) A_k‖ + ‖(M^*_k)^{-1}(A^*_k − A_k)‖. (46)


By the Lipschitz assumption A.1, and the fact that ‖M^{-1}_k‖ is uniformly bounded in the vicinity of w^*, the first term in (46) can be written as

‖((M^*_k)^{-1} − M^{-1}_k) A_k‖ ≤ ‖M^{-1}_k‖ ‖M_k − M^*_k‖ ‖(M^*_k)^{-1}‖ ‖A_k‖ = O(‖y(x_k) − y_k‖).

Likewise, by the Lipschitz assumption A.1 and because ‖M^{-1}_k‖ is uniformly bounded, the second term in (46) is O(‖y(x_k) − y_k‖). Therefore,

‖y′(x_k) − M^{-1}_k A_k‖ = O(‖y(x_k) − y_k‖).

Finally, the result follows because W_k = ∇_x g_{1,k} = ∇_{x,x}L_y(x_k, y_k) and A^T_k = −∇_y g_{1,k} = −∇_{x,y}L_y(x_k, y_k). □

Note that Propositions 2 and 3 show that the Schur IPM iteration

S_k ∆x^D_k = −(g_{1,k} + A^T_k M^{-1}_k g_{2,k}(µ_k)), (47)

described in Section 3.2.2, provides a suitable approximation to the master problem Newton equation (31).

The inexact bilevel decomposition algorithm (IBDA) is stated in Figure 4.

Initialization: Choose a starting point w^T_0 = (x^T_0, y^T_0, r^T_0, λ^T_0, σ^T_0)^T such that r_0 > 0, λ_0 > 0, σ_0 > 0. Set k ← 0 and choose the parameters µ_0 > 0 and 0 < γ ≤ γ_0 < 1.
Repeat
  1. Solve master problem: Form the matrix S_k and compute ∆x^D_k from system (47). Set x_{k+1} = x_k + ∆x^D_k.
  2. Solve subproblem:
    (a) Search direction: Compute ∆y^D_k by solving system (30).
    (b) Line search: With γ_k, compute the diagonal matrix Λ_k from the subproblem step sizes as in (21)–(24).
    (c) Update iterate: Set y_{k+1} = y_k + Λ_{y,k} ∆y^D_k.
  3. Parameter update: Set µ_{k+1} > 0, 0 < γ ≤ γ_{k+1} < 1, and k ← k + 1.
Until convergence

Figure 4. Inexact Bilevel Decomposition Algorithm (IBDA)
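The linear algebra of one IBDA iteration (steps 1 and 2 above) can be sketched in a few lines of numpy. The matrices below are random stand-ins for the blocks W_k, M_k, A_k and the residuals, not data from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 3, 5    # sizes of the global block x and the subproblem block y

# Hypothetical stand-ins for the blocks at the current iterate:
W = np.eye(n) * 2.0                                       # W_k
M = np.eye(m) * 3.0 + 0.1 * rng.standard_normal((m, m))   # M_k
A = rng.standard_normal((m, n))                           # A_k
g1 = rng.standard_normal(n)                               # g_{1,k}
g2 = rng.standard_normal(m)                               # g_{2,k}(mu_k)

# Step 1 (master): Schur complement system (47).
S = W - A.T @ np.linalg.solve(M, A)
dx = np.linalg.solve(S, -(g1 + A.T @ np.linalg.solve(M, g2)))

# Step 2 (subproblem): Newton system (30).
dy = np.linalg.solve(M, -g2)

# Together, the two solves satisfy K^D [dx; dy] = -[g1; g2] with a
# block upper-triangular matrix K^D, as discussed in Section 4.2.
KD = np.block([[S, -A.T], [np.zeros((m, n)), M]])
resid = KD @ np.concatenate([dx, dy]) + np.concatenate([g1, g2])
print(np.linalg.norm(resid))    # ~ 0 up to round-off
```

The vanishing residual shows that the master solve and the single subproblem Newton step are exactly the two triangular pieces of one coupled linear system.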

4.2. Relationship to the Schur IPM. The following proposition establishes the relationship between the search direction of the inexact BDA, (∆x^D_k, ∆y^D_k), and the search direction of the Schur IPM, (∆x^N_k, ∆y^N_k). In particular, we show that the global variable components of both search directions are identical and that the local components are equal up to first-order terms.

Proposition 4. Under assumptions A.1–A.4 and condition C.1,

∆x^D_k = ∆x^N_k and ∆y^D_k = ∆y^N_k + O(‖∆x^N_k‖). (48)

Proof. The first equality follows trivially because we are using a Schur complement iteration to approximate the master problem search direction. The second equality follows from (28), (30), and Proposition 1. □


Note that the difference between the local components of the two search directions is not surprising, because the global variables are just a parameter to the subproblem solved by the decomposition algorithm. As a result, the local component of the search direction computed by the inexact BDA lacks first-order information about the global component of the search direction. In Section 5, we show how a Gauss-Seidel strategy can be used to overcome this limitation, which is inherent to BDAs.

Finally, it is useful to note that the inexact BDA search direction is the solution to the following linear system:

K^D_k ∆w^D_k = −g_k(µ_k), (49)

where

K^D_k = ( S_k   −A^T_k
          0     M_k ). (50)

Note that the fact that the global variables are a parameter to the subproblems is evident in the structure of K^D_k. In particular, notice that the lower left block in matrix K^D_k is zero instead of −A_k, as in the interior-point method matrix K^N_k. In Section 5, we give conditions under which the norm ‖(K^D_k)^{-1}‖ is bounded in a neighborhood of the minimizer and thus the iterates of the proposed decomposition algorithm are well defined.

5. The Gauss-Seidel BDA. The difference in the local components of the search directions computed by the inexact BDA and the Schur IPM precludes any possibility of superlinear convergence for the decomposition algorithm. In this section we show how one can first compute the global variable component of the search direction, and then use it to update the subproblem derivative information before computing the local variable component. We show that the resulting Gauss-Seidel BDA generates a search direction that is equal (up to second-order terms) to the search direction of the Schur IPM. Moreover, we prove that the resulting BDA converges locally at a superlinear rate.

5.1. The algorithm. The inexact BDA defined in Section 4 does not make use of all the information available at each stage. Note that, at each iteration of the decomposition algorithm, we first compute the master problem step as the solution to

S_k ∆x^G_k = −(g_{1,k} + A^T_k M^{-1}_k g_{2,k}(µ_k)), (51)

and update the global variables as x_{k+1} = x_k + ∆x^G_k. At this point, one could use the new value of the global variables x_{k+1} to perform a nonlinear update of the subproblem derivative information and thus generate a better subproblem step. In particular, after solving for the master problem search direction, we could compute

g^+_{2,k}(µ_k) = ( ∇_y F(x_{k+1}, y_k) − ∇_y c(x_{k+1}, y_k)^T λ_k
                   −σ_k + λ_k
                   −c(x_{k+1}, y_k) + r_k
                   −R_k σ_k + µ_k e_m ). (52)

Then, the subproblem search direction would be given as the solution to

M_k ∆y^G_k = −g^+_{2,k}(µ_k). (53)

The resulting algorithm is stated in Figure 5. Note that the only difference between the Gauss-Seidel BDA stated in Figure 5 and the inexact BDA stated in Figure 4 is that, in the Gauss-Seidel version, we introduce a nonlinear update into the subproblem derivative information g_{2,k}(µ_k) using the master problem step ∆x^G_k. As a consequence, the refinement requires one more subproblem derivative evaluation per iteration. The advantage is that, as we show in the next section, the Gauss-Seidel refinement guarantees that the proposed algorithm converges at a superlinear rate.


Initialization: Choose a starting point w^T_0 = (x^T_0, y^T_0, r^T_0, λ^T_0, σ^T_0)^T such that r_0 > 0, λ_0 > 0, σ_0 > 0. Set k ← 0 and choose the parameters µ_0 > 0 and 0 < γ ≤ γ_0 < 1.
Repeat
  1. Master problem iteration: Form the matrix S_k and compute ∆x^G_k from system (51). Set x_{k+1} = x_k + ∆x^G_k.
  2. Subproblem iteration:
    (a) Search direction: Use x_{k+1} to update g^+_{2,k}(µ_k) and compute ∆y^G_k by solving system (53).
    (b) Line search: With γ_k, compute the diagonal matrix Λ_k from the subproblem step sizes as in (21)–(24).
    (c) Update iterate: Set y_{k+1} = y_k + Λ_{y,k} ∆y^G_k.
  3. Parameter update: Set µ_{k+1} > 0, 0 < γ ≤ γ_{k+1} < 1, and k ← k + 1.
Until convergence

Figure 5. Gauss-Seidel Bilevel Decomposition Algorithm (GBDA)
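The only change relative to the inexact BDA, re-evaluating the subproblem residual at x_{k+1}, can be sketched as follows. The data are hypothetical stand-ins, and for simplicity the residual is taken to be linear in x:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 3, 5

# Hypothetical stand-ins for M_k, A_k, the stale residual g_{2,k}, and a
# master step dx assumed to have been computed already.
M = np.eye(m) * 3.0 + 0.1 * rng.standard_normal((m, m))
A = rng.standard_normal((m, n))
g2 = rng.standard_normal(m)
dx = rng.standard_normal(n)

# Inexact BDA subproblem step, system (30): stale residual.
dy_D = np.linalg.solve(M, -g2)

# Gauss-Seidel BDA: re-evaluate the residual at x_{k+1} before solving (53).
# For a residual linear in x (with A_k = -grad_x g_2), the refresh is:
g2_plus = g2 - A @ dx
dy_G = np.linalg.solve(M, -g2_plus)

# The refinement adds exactly the first-order coupling term M^{-1} A dx
# that the inexact step lacks (cf. Propositions 4 and 5):
gap = dy_G - (dy_D + np.linalg.solve(M, A @ dx))
print(np.linalg.norm(gap))    # ~ 0 up to round-off
```

In the nonlinear case the refresh is only first-order accurate, which is exactly why the discrepancy with the Schur IPM step drops from O(‖∆x^N_k‖) to O(‖∆x^N_k‖²).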

5.2. Relationship to the Schur IPM. The following proposition shows that the search directions of the proposed Gauss-Seidel BDA and the Schur IPM are equal up to second-order terms.

Proposition 5. Under assumptions A.1–A.4 and condition C.1,

∆x^G_k = ∆x^N_k and ∆y^G_k = ∆y^N_k + O(‖∆x^N_k‖²). (54)

Proof. The result for the global components is trivial from (51). For the local components, note that the search directions of the resulting Gauss-Seidel decomposition algorithm satisfy

∆x^G_k = ∆x^D_k = ∆x^N_k, (55)

and

∆y^G_k = ∆y^D_k − M^{-1}_k (g^+_{2,k}(µ_k) − g_{2,k}(µ_k)). (56)

Moreover, from (48), we know that

∆y^G_k = ∆y^N_k − M^{-1}_k A_k ∆x^N_k − M^{-1}_k (g^+_{2,k}(µ_k) − g_{2,k}(µ_k))
       = ∆y^N_k − M^{-1}_k (g^+_{2,k}(µ_k) − g_{2,k}(µ_k) + A_k ∆x^N_k).

The result is obtained by Taylor's Theorem and the fact that A_k = −∇_x g_{2,k}(µ_k). □

Proposition 5 intuitively implies that the Gauss-Seidel BDA converges locally at a superlinear rate. In Section 5.3 we formally show this is the case.

5.3. Convergence of the Gauss-Seidel BDA. In this section, we first show that the search direction of the Gauss-Seidel BDA is well defined in the proximity of the minimizer, and then we show that the iterates generated by the Gauss-Seidel approach converge to the minimizer at a superlinear rate.

Note that the search directions of the inexact and Gauss-Seidel BDAs are related as follows:

∆w^G_k = ∆w^D_k − G_k (g^+_{2,k}(µ_k) − g_{2,k}(µ_k)), (57)


where

G_k = ( 0
        M^{-1}_k ),

and ∆w^D_k = −(K^D_k)^{-1} g_k(µ_k). Consequently, to show that the Gauss-Seidel search direction is bounded in a neighborhood of a minimizer w^*, it suffices to show that the norms of the matrices (K^D_k)^{-1} and M^{-1}_k are bounded for w_k in a neighborhood of the minimizer w^*.

Proposition 6. Under assumptions A.1–A.4 and condition C.1, ‖(K^D_k)^{-1}‖ and ‖M^{-1}_k‖ are bounded for w_k in a neighborhood of the minimizer w^*.

Proof. Note that

(K^D_k)^{-1} = ( S^{-1}_k   S^{-1}_k A^T_k M^{-1}_k
                 0          M^{-1}_k ). (58)

Consequently, it is sufficient to prove that ‖S^{-1}_k‖ and ‖M^{-1}_k‖ are bounded in the vicinity of the minimizer. The result follows from Proposition 1. □

We now give a result that provides sufficient conditions on the barrier and step size parameter updates to ensure superlinear convergence of the Gauss-Seidel BDA.

Theorem 1. Suppose that assumptions A.1–A.4 and condition C.1 hold, that the barrier parameter is chosen to satisfy µ_k = o(‖g_k(0)‖), and that the step size parameter is chosen such that 1 − γ_k = o(1). If w_0 is close enough to w^*, then the sequence {w_k} described in (57) is well defined and converges to w^* at a superlinear rate.

Proof. As the matrices (K^D_k)^{-1} and M^{-1}_k are well defined by Proposition 6, the sequence in (57) updates the new point as

w_{k+1} = w_k + Λ_k [∆w^D_k − G_k (g^+_{2,k}(µ_k) − g_{2,k}(µ_k))] (59)
        = w_k − Λ_k (K^D_k)^{-1} g_k(µ_k) − Λ_k G_k (g^+_{2,k}(0) − g_{2,k}(0))
        = w_k − Λ_k (K^D_k)^{-1} (g_k(0) + µ̄_k) − Λ_k G_k (g^+_{2,k}(0) − g_{2,k}(0)), (60)

where µ̄_k = (0, 0, 0, 0, µ_k e_m). Then,

w_{k+1} − w^* = w_k − w^* − Λ_k (K^D_k)^{-1} g_k(µ_k) − Λ_k G_k (g^+_{2,k}(0) − g_{2,k}(0))
             = (I − Λ_k)(w_k − w^*)
               + Λ_k (K^D_k)^{-1} (K^D_k (w_k − w^*) − g_k(0) − µ̄_k)
               − Λ_k G_k (g^+_{2,k}(0) − g_{2,k}(0)), (61)

which may be rewritten as

w_{k+1} − w^* = (I − Λ_k)(w_k − w^*) − Λ_k (K^D_k)^{-1} µ̄_k
               + Λ_k (K^D_k)^{-1} (K^N_k (w_k − w^*) − g_k(0))
               + Λ_k (K^D_k)^{-1} (K^D_k − K^N_k)(w_k − w^*)
               − Λ_k G_k (g^+_{2,k}(0) − g_{2,k}(0)). (62)

By [37, Lemma 4], the first term in (62) satisfies

‖(I − Λ_k)(w_k − w^*)‖ ≤ ((1 − γ_k) + O(‖g_k(0)‖) + O(µ_k)) ‖w_k − w^*‖. (63)

This inequality, together with the conditions 1 − γ_k = o(1) and µ_k = o(‖g_k(0)‖), implies that

‖(I − Λ_k)(w_k − w^*)‖ = o(‖w_k − w^*‖). (64)


The second term in (62) satisfies

‖Λ_k (K^D_k)^{-1} µ̄_k‖ ≤ ‖Λ_k‖ ‖(K^D_k)^{-1}‖ ‖µ̄_k‖ ≤ β ‖µ̄_k‖, (65)

which by the condition µ_k = o(‖g_k(0)‖) implies

‖Λ_k (K^D_k)^{-1} µ̄_k‖ = o(‖w_k − w^*‖). (66)

By Taylor's Theorem, the third term in (62) satisfies

‖Λ_k (K^D_k)^{-1} (K^N_k (w_k − w^*) − g_k(0))‖ ≤ ‖Λ_k‖ ‖(K^D_k)^{-1}‖ ‖K^N_k (w_k − w^*) − g_k(0)‖ = o(‖w_k − w^*‖). (67)

Finally, because

K^D_k = K^N_k − ( W_k − S_k   0
                  −A_k        0 ), (68)

the fourth term in (62) is

Λ_k (K^D_k)^{-1} (K^D_k − K^N_k)(w_k − w^*) = −Λ_k (K^D_k)^{-1} ( W_k − S_k   0
                                                                  −A_k        0 ) (w_k − w^*)
  = Λ_k ( 0              0
          M^{-1}_k A_k   0 ) (w_k − w^*)
  = Λ_k ( 0
          M^{-1}_k A_k (x_k − x^*) ). (69)

Then, adding the fourth and fifth terms in (62) and using (69), we get

Λ_k (K^D_k)^{-1} (K^D_k − K^N_k)(w_k − w^*) − Λ_k G_k (g^+_{2,k}(0) − g_{2,k}(0)) =
  Λ_k ( 0
        M^{-1}_k [A_k (x_k − x^*) − (g^+_{2,k}(0) − g_{2,k}(0))] ). (70)

If only the global variable component, x, of equations (64), (66), (67), and (70) is considered, then the following relationship is attained:

‖x_{k+1} − x^*‖ = o(‖w_k − w^*‖). (71)

Note that this is not a surprising result, because we know that the step taken by the Gauss-Seidel decomposition algorithm on the global variables, x, is the same as that of a Schur interior-point method.

To finish the proof, it only remains to show that the local variable component, y, satisfies a similar relationship. The local component of equation (70) can be written as

Λ_{k,y} M^{-1}_k (A_k (x_k − x^*) − (g^+_{2,k}(0) − g_{2,k}(0))) =
  Λ_{k,y} M^{-1}_k (A_k (x_{k+1} − x^*) − (g^+_{2,k}(0) − g_{2,k}(0)) − A_k (x_{k+1} − x_k)), (72)

which by Taylor's Theorem and the fact that A_k = −∇_x g_{2,k}(µ_k) = −∇_x g_{2,k}(0) is

Λ_{k,y} M^{-1}_k (A_k (x_{k+1} − x^*) − (g^+_{2,k}(0) − g_{2,k}(0)) − A_k (x_{k+1} − x_k)) =
  Λ_{k,y} M^{-1}_k (A_k (x_{k+1} − x^*) + O(‖x_{k+1} − x_k‖²)). (73)

Because

x_{k+1} − x_k = ∆x^G_k = −S^{-1}_k (g_{1,k} + A^T_k M^{-1}_k g_{2,k}(µ_k)), (74)


we conclude that

‖x_{k+1} − x_k‖ = O(‖g_k(µ_k)‖), (75)

and thus the second term on the right-hand side of (73) is of order O(‖w_k − w^*‖²). Moreover, we know by (71) that the first term on the right-hand side of (73) is of order o(‖w_k − w^*‖). This, together with the local variable components of (64), (66), and (67), gives

‖y_{k+1} − y^*‖ = o(‖w_k − w^*‖). (76)

Relationships (71) and (76) prove the result. □

6. An exact penalty formulation of the MDO problem. The Schur IPM described in Section 3 can only be applied to problems satisfying the SLICQ. In particular, Proposition 1 shows that if the SLICQ holds, then the Schur complement matrix is invertible. As we mentioned in Section 2, the SLICQ implies that the MDO constraints can be used in the vicinity of the minimizer to eliminate the local and slack variables y and r, and thus transform the MDO problem into an unconstrained problem in only the global variables. This is a fairly strong assumption for MDO problems, where often the difficulty is precisely finding a design that is feasible with respect to the constraints of all disciplines. This is one of the reasons the Schur complement approach is not popular among engineers solving the MDO problem.

In this section we show how an MDO problem satisfying the LICQ but not the SLICQ may be regularized by introducing an exact penalty function. Specifically, we show how any MDO problem satisfying the LICQ can be reformulated by means of an exact penalty function as an MDO problem satisfying the so-called Strong Mangasarian-Fromovitz Constraint Qualification (SMFCQ). Then, in Section 7, we show how degenerate optimization techniques can be used to ensure the Schur IPM converges locally at a fast rate when applied to solve an MDO problem satisfying the SMFCQ. Essentially, our work allows the application of Schur IPMs to general MDO problems satisfying only the conventional LICQ, and thus makes Schur IPMs a viable alternative to BDAs.

We now define the SMFCQ.

C.2 The strong Mangasarian-Fromovitz constraint qualification (SMFCQ) is satisfied at w^*; that is, the matrix

( ∇_y c(x^*, y^*)   −I ) (77)

has full row rank, and there exist ∆y and ∆r such that ∇_y c(x^*, y^*) ∆y − ∆r = 0 and (∆r)_i > 0 for all i ∈ Z, where Z is the active set {i : r^*_i = 0} and I_Z is the matrix formed by the rows of the identity corresponding to indices in Z.

Note that by assumption A.2 the Lagrange multiplier vector (λ^*, σ^*) associated with (x^*, y^*, r^*) is unique. Also, the matrix (77) always has full row rank if all problem constraints are inequality constraints.

Note that the SMFCQ (C.2) holds if and only if the conventional MFCQ holds for the subproblem (2) with x = x^*; that is, for the following problem:

min_{y,r}   F(x^*, y)
subject to  c(x^*, y) − r = 0,
            r ≥ 0. (78)

Likewise, the SLICQ (C.1) holds if and only if the conventional LICQ holds for the subproblem (2) with x = x^*. Consequently, satisfaction of the SLICQ implies satisfaction of the SMFCQ.

In the remainder of this section we show how any MDO problem that satisfies the LICQ (A.2) can be reformulated, by introducing an exact penalty function, as an equivalent problem for which


the SMFCQ holds. To see this, note that by using an l1 exact penalty function and a sufficiently large but finite penalty parameter γ, the MDO problem (2) can be reformulated as follows:

min_{x,z,y,r}  F(z, y) + γ‖x − z‖_1
subject to     c(z, y) − r = 0,
               r ≥ 0, (79)

where z ∈ R^n is a vector of auxiliary local variables that will be equal to the global variables x for a sufficiently large γ.

Moreover, by introducing slack variables, problem (79) can be transformed into the following smooth optimization problem:

min_{x,z,y,r,s,t}  F(z, y) + γ e^T_n (s + t)
subject to         c(z, y) − r = 0,
                   z + s − t = x,
                   r, s, t ≥ 0. (80)

Note that all the variables introduced (z, s, and t) must be considered as local variables. That is, in the general case where there are K subproblems, we must introduce a different set of variables (z_i, s_i, t_i) for each of the subproblems; that is, for i = 1, . . . , K. As a result, the variables introduced can be considered part of the local variable vector y, and thus the K subproblems are still coupled only through the global variables x.
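The exactness of the l1 penalty in (79) can be illustrated with a hypothetical one-dimensional example (not from the paper): once γ exceeds a finite threshold, the penalized minimizer z coincides with x exactly.

```python
import numpy as np

# Toy 1-D analogue of the l1 penalty in (79), with hypothetical data: fix
# the global variable x = 0 and penalize its local copy z in
#     min_z (z - 2)^2 + gamma * |x - z|.
# Above a finite threshold (here gamma >= 4) the minimizer satisfies z = x
# exactly, which is the exactness property used in the reformulation.

x = 0.0
grid = np.linspace(-3.0, 3.0, 60001)

def z_min(gamma):
    vals = (grid - 2.0) ** 2 + gamma * np.abs(x - grid)
    return grid[np.argmin(vals)]

print(z_min(1.0))   # ~ 1.5: penalty too small, z != x
print(z_min(5.0))   # ~ 0.0: exact penalty, z == x
```

The key point is that γ is finite, unlike a quadratic penalty, which would only drive z toward x asymptotically as the penalty parameter grows without bound.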

The following proposition shows that satisfaction of the LICQ for the original problem implies satisfaction of the SMFCQ for problem (80).

Proposition 7. If the LICQ (Assumption A.2) holds at (x^*, y^*, r^*) for the MDO problem (2), then the SMFCQ (Condition C.2) holds at (x, z, y, r, s, t) = (x^*, x^*, y^*, r^*, 0, 0) for problem (80).

Proof. We only need to show that the standard MFCQ holds at (z, y, r, s, t) = (x^*, y^*, r^*, 0, 0) for the subproblem that results from fixing the global variables x to x^* in problem (80); that is, for the problem

min_{z,y,r,s,t}  F(z, y) + γ e^T_n (s + t)
subject to       c(z, y) − r = 0,
                 z + s − t = x^*,
                 r, s, t ≥ 0. (81)

Note that the gradients with respect to the subproblem variables (z, y, r, s, t) of the equality constraints are the rows of the following matrix:

( ∇_z c(z, y)   ∇_y c(z, y)   −I   0   0
  I             0             0    I   −I ).

Moreover, these gradients are linearly independent at any point of (81) because of the presence of the identity matrices that correspond to the gradients of the equality constraints with respect to the slack and elastic variables r, s, and t. Moreover, because the LICQ holds for (2), we know that the MFCQ also holds for (2), and thus there exists (∆x, ∆y, ∆r) such that

∇_x c(x^*, y^*) ∆x + ∇_y c(x^*, y^*) ∆y − ∆r = 0

and ∆r_Z > 0, where Z is the set of indices i for which the nonnegativity bounds r_i ≥ 0 are active at r^*. But then the vector

(∆z, ∆y, ∆r, ∆s, ∆t) = (∆x, ∆y, ∆r, max(0, −∆x) + e, max(0, ∆x) + e)


satisfies

∇_x c(x^*, y^*) ∆z + ∇_y c(x^*, y^*) ∆y − ∆r = 0,
∆z + ∆s − ∆t = 0,

and ∆r_Z, ∆s, ∆t > 0. Thus, the MFCQ holds for subproblem (81) and therefore the SMFCQ holds for (80). □

7. A degenerate Schur approach. In this section we show how degenerate optimization techniques can be used to ensure the Schur IPM converges locally at a fast rate when applied to solve an MDO problem satisfying the SMFCQ. In the previous section we showed how a general MDO problem satisfying simply the LICQ can be reformulated as an MDO problem satisfying the SMFCQ. As a result, our work extends the applicability of Schur IPMs to cover general MDO problems satisfying the conventional LICQ.

Our proposed Schur IPM borrows from the degenerate interior-point method developed by Vicente and Wright [36]. There are two differences between their method and a standard interior-point method. First, they modify the Newton matrix by perturbing some of the elements in its diagonal. As we show in this section, this perturbation ensures the Schur complement matrix is invertible even when the SLICQ is replaced by the SMFCQ. The second difference is in the step-size rule: it allows some of the nonnegative variables to become negative if sufficient progress is made towards the solution. Raghunathan and Biegler [27] show that the methodology of Vicente and Wright can work with other step-size rules, but, to simplify the exposition, we use the same step-size rule as in [36].

7.1. The method. The modified Newton matrix is obtained by perturbing the current iterate for the variables r_k and σ_k as follows:

r̄_k = max(µ^min_k e_m, r_k),  σ̄_k = max(µ^min_k e_m, σ_k), (82)

where µ^min_k is a positive value and max denotes the componentwise maximum. With this perturbation, the modified matrix M̄_k is

M̄_k = ∇_y g_2(x_k, y_k, r̄_k, λ_k, σ̄_k), (83)

the modified Newton matrix is

K̄^N_k = ( W_k    −A^T_k
           −A_k   M̄_k ), (84)

and the modified Schur complement matrix is

S̄_k = W_k − A^T_k M̄^{-1}_k A_k. (85)

We use the same step size α_k for all components of w_k; that is, w_{k+1} = w_k + α_k ∆w_k. Moreover, the step size is computed as follows. If both of the following conditions hold (where τ ∈ (1, 2) and (a)^− is the negative part of the vector a):

‖(r_k + ∆r_k, λ_k + ∆λ_k, σ_k + ∆σ_k)^−‖ ≤ ‖g_k(0)‖^τ, (86)
‖g(w_k + ∆w_k; µ = 0)‖ ≤ ‖g_k(0)‖^τ, (87)

then α_k = 1. Otherwise,

α_k = min(α_{r,k}, α_{λ,k}, α_{σ,k}), (88)

where α_{r,k}, α_{λ,k}, and α_{σ,k} are given by (21)–(23). The resulting modified Schur IPM is stated in Figure 6.


Initialization: Choose a starting point w^T_0 = (x^T_0, y^T_0, r^T_0, λ^T_0, σ^T_0)^T such that r_0 > 0, λ_0 > 0, σ_0 > 0. Set k ← 0 and choose the parameters µ_0 > 0, µ^min_0 > 0, and τ ∈ (1, 2).
Repeat
  1. Master problem iteration: Form the modified matrix S̄_k and compute ∆x_k by solving system (51) with S̄_k.
  2. Subproblem iteration: Form the modified matrix M̄_k and compute ∆y_k by solving system (28) with M̄_k.
  3. Line search: With τ, compute the step size α_k as in (86)–(88).
  4. Update iterate: Set w_{k+1} = w_k + α_k ∆w_k, where ∆w_k = (∆x_k, ∆y_k).
  5. Parameter update: Set µ_{k+1} > 0, µ^min_{k+1} > 0, and k ← k + 1.
Until convergence

Figure 6. Modified Schur IPM
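The step-size rule (86)–(88) used in step 3 can be sketched as follows. Since formulas (21)–(23) are not restated in this section, a standard fraction-to-boundary rule stands in for them; that substitution is an assumption of this sketch:

```python
import numpy as np

def neg_part(v):
    """Componentwise negative part (a)^- used in condition (86)."""
    return np.minimum(v, 0.0)

def step_size(r, lam, sig, dr, dlam, dsig, res0, res_trial, tau=1.5, gamma=0.995):
    """Step-size rule (86)-(88).  A standard fraction-to-boundary formula
    stands in for (21)-(23), which are not restated here (assumption)."""
    thresh = res0 ** tau
    viol = np.concatenate([neg_part(r + dr), neg_part(lam + dlam), neg_part(sig + dsig)])
    if np.linalg.norm(viol) <= thresh and res_trial <= thresh:
        return 1.0                      # full step: conditions (86)-(87) hold
    def ftb(v, dv):                     # fraction-to-boundary stand-in
        mask = dv < 0
        if not mask.any():
            return 1.0
        return min(1.0, float(np.min(-gamma * v[mask] / dv[mask])))
    return min(ftb(r, dr), ftb(lam, dlam), ftb(sig, dsig))   # rule (88)

r = np.array([1.0, 0.5]); dr = np.array([-2.0, 0.1])
lam = np.ones(2); dlam = np.zeros(2)
sig = np.ones(2); dsig = np.zeros(2)
alpha = step_size(r, lam, sig, dr, dlam, dsig, res0=0.5, res_trial=1.0)
print(alpha)    # 0.4975: limited by the slack hitting its bound
```

Note how the rule departs from a plain interior-point line search: a unit step that drives some variables slightly negative is accepted, provided the residual decrease in (86)–(87) certifies fast progress toward the solution.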

7.2. Nonsingularity of M̄_k and S̄_k. In this section we show that the modified matrices M̄_k and S̄_k are invertible in the vicinity of a minimizer. This, in turn, implies that the iterates generated by the modified Schur IPM are well defined in the vicinity of a minimizer.

First, the following proposition shows that the modified Newton matrix in (84) is invertible in the vicinity of a minimizer.

Proposition 8. If assumptions A.1–A.4 hold and µ^min_k = O(‖g_k(0)‖), then ‖(K̄^N_k)^{-1}‖ is bounded for w_k in a neighborhood of the minimizer w^*.

Proof. By assumptions A.1–A.4, we know by [18, Theorem 14] and [17, Proposition 4.1] that ‖(K^N_k)^{-1}‖ is bounded in the vicinity of the minimizer. Moreover, K̄^N_k = K^N_k + D_k, with ‖D_k‖ = O(µ^min_k) = O(‖g_k(0)‖). Consequently, ‖(K̄^N_k)^{-1}‖ is bounded in a small enough neighborhood around the minimizer. □

The following proposition shows that M̄_k is invertible in the vicinity of the minimizer. The notation a_k = Θ(b_k) means that there exist β_1, β_2 > 0 such that for all k sufficiently large we have 0 < β_1 ≤ a_k/b_k ≤ β_2.

Proposition 9. If assumptions A.1–A.4 and condition C.2 hold and µ^min_k = Θ(‖g_k(0)‖), then M̄_k is nonsingular for w_k in a neighborhood of the minimizer w^*.

Proof. The proof is by contradiction. Assume there exists a nonzero vector ∆w = (∆y, ∆r, ∆λ, ∆σ) such that M̄_k ∆w = 0. Then (omitting the subscript k for convenience),

( H_yy            0     −(∇_y c(x, y))^T   0
  0               0     I                  −I
  −∇_y c(x, y)    I     0                  0
  0               −Σ̄    0                  −R̄ ) ( ∆y
                                                   ∆r
                                                   ∆λ
                                                   ∆σ ) = 0, (89)

where H_yy = ∇²_{yy}F(x, y) − Σ_{i=1}^m λ_i ∇²_{yy}c_i(x, y).


Let B be the inactive set {i : r^*_i > 0} and Z be the active set {i : r^*_i = 0}. Using this partition, and because R̄_B is invertible, the system (89) can be symmetrized and written in the following form:

( H_yy            0      −(∇_y c(x, y))^T   0
  0               E      I                  −I^T_Z
  −∇_y c(x, y)    I      0                  0
  0               −I_Z   0                  −Σ̄^{-1}_Z R̄_Z ) ( ∆y
                                                              ∆r
                                                              ∆λ
                                                              ∆σ_Z ) = 0 (90)

with

∆σ_B = −R̄^{-1}_B Σ̄_B ∆r_B, (91)

and where

E = ( R̄^{-1}_B Σ̄_B   0
      0              0 ),

and I_Z is the matrix formed by the rows of the identity corresponding to indices in Z.²

First, it is convenient to transform the system (90) via the singular value decomposition (SVD) of the Jacobian matrix of the active constraints at (y^*, r^*). The SVD of this Jacobian matrix can be formulated as

J^* = ( −∇_y c(x^*, y^*)   I
        0                  −I_Z ) = ( U  V ) ( C  0
                                               0  0 ) ( Ū^T
                                                        V̄^T ), (92)

where U ∈ R^{(m+|Z|)×p}, C ∈ R^{p×p}, V ∈ R^{(m+|Z|)×(m+|Z|−p)}, Ū ∈ R^{(n_y+m)×p}, V̄ ∈ R^{(n_y+m)×(n_y+m−p)}, and p is the rank of the Jacobian matrix J^*. Moreover, we denote by U_1 and V_1 the first m rows of U and V, respectively; by U_2 and V_2 the last |Z| rows of U and V, respectively; by Ū_1 and V̄_1 the first n_y rows of Ū and V̄, respectively; and, finally, by Ū_2 and V̄_2 the last m rows of Ū and V̄, respectively.

Using the SVD in (92), the following (orthogonal) change of variables can be applied:

( ∆y
  ∆r
  ∆λ
  ∆σ_Z ) = ( Ū_1  V̄_1  0    0
             Ū_2  V̄_2  0    0
             0    0    U_1  V_1
             0    0    U_2  V_2 ) ( c_Ū
                                    c_V̄
                                    c_U
                                    c_V ). (93)

With this change of variables, system (90) is equivalent to

( Ū^T G Ū   Ū^T G V̄   Ū^T J^T U   Ū^T J^T V
  V̄^T G Ū   V̄^T G V̄   V̄^T J^T U   V̄^T J^T V
  U^T J Ū   U^T J V̄   U^T N U     U^T N V
  V^T J Ū   V^T J V̄   V^T N U     V^T N V ) ( c_Ū
                                              c_V̄
                                              c_U
                                              c_V ) = 0, (94)

where

G = ( H_yy  0
      0     E )  and  N = ( 0  0
                            0  −Σ̄^{-1}_Z R̄_Z ).

Under assumptions A.1, A.3, A.4 and condition C.2, system (94) can be written near the minimizer as follows (see [36]):

( D_1 + O(‖g_k(0)‖)   O(‖g_k(0)‖)
  O(‖g_k(0)‖)         −D_2 ) ( c
                               c_V ) = 0, (95)

² Note that the matrix (10) used in the definition of the SLICQ is the 2 × 2 block at the bottom left corner of the matrix in (90).


where c = (c_Ū, c_V̄, c_U),

D_1 = ( Ū^T G^* Ū   Ū^T G^* V̄   C
        V̄^T G^* Ū   V̄^T G^* V̄   0
        C           0           0 ),  with  G^* = ( H^*_yy   0
                                                    0        E^* ), (96)

and

D_2 = V^T_2 Σ̄^{-1}_Z R̄_Z V_2. (97)

Note that, because of the modification introduced in the Newton matrix, ‖D_2‖ = Θ(‖g_k(0)‖), and thus D_2 is invertible in the vicinity of the minimizer. Therefore, we can use the second row of the system in (95) to compute c_{\bar{V}} as

\[
c_{\bar{V}} = D_2^{-1} \, O(\|g_k(0)\|) \, c.
\]

Substituting c_{\bar{V}} in the first row of the system in (95), we obtain

\[
\bigl(D_1 + O(\|g_k(0)\|) + O(\|g_k(0)\|) \, D_2^{-1} \, O(\|g_k(0)\|)\bigr) \, c = 0. \tag{98}
\]

But note that O(‖g_k(0)‖) D_2^{-1} O(‖g_k(0)‖) = O(‖g_k(0)‖), and thus we have

\[
\bigl(D_1 + O(\|g_k(0)\|)\bigr) \, c = 0. \tag{99}
\]

On the other hand, by the results in [36, Lemma 4.1], we know that under assumptions A.1, A.3, A.4 and Condition C.2, near the solution, the matrix D_1 is an O(‖g_k(0)‖) perturbation of a nonsingular matrix and hence is uniformly nonsingular. This contradicts the existence of a nonzero solution to system (99) and shows that the matrix in (90) is nonsingular. Therefore, M_k must be invertible in a neighborhood of the minimizer. □

Finally, the following proposition shows that Sk is invertible in the vicinity of the minimizer.

Proposition 10. If assumptions A.1–A.4 and Condition C.2 hold and µ_k^min = Θ(‖g_k(0)‖), then S_k is nonsingular for w_k in a neighborhood of the minimizer w*.

Proof. By Propositions 8 and 9, respectively, the modified Newton matrix K_k^N and the matrix M_k are nonsingular in a neighborhood of the minimizer. This implies that S_k must be invertible, for otherwise there would be multiple solutions to the modified Newton system

\[
K_k^N \Delta w_k = -g_k(\mu_k), \tag{100}
\]

in contradiction with the invertibility of K_k^N. □
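The role played by S_k here is the usual one of a Schur complement in block elimination: when a block matrix and its leading block are nonsingular, the full system can be solved by forming the Schur complement, solving the reduced system, and back-substituting. A generic numerical sketch with random, well-conditioned blocks (not the actual K_k^N of the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 3
# Random blocks with boosted diagonals so that the leading block A and the
# Schur complement are safely nonsingular.
A = rng.standard_normal((n, n)) + 10 * np.eye(n)
B = rng.standard_normal((n, m))
C = rng.standard_normal((m, n))
D = rng.standard_normal((m, m)) + 10 * np.eye(m)
K = np.block([[A, B], [C, D]])
rhs = rng.standard_normal(n + m)

# Block elimination: form the Schur complement S of A in K, solve the reduced
# system for the second block of unknowns, then back-substitute for the first.
S = D - C @ np.linalg.solve(A, B)
y = np.linalg.solve(S, rhs[n:] - C @ np.linalg.solve(A, rhs[:n]))
x = np.linalg.solve(A, rhs[:n] - B @ y)

assert np.allclose(K @ np.concatenate([x, y]), rhs)
```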

7.3. Superlinear convergence. Finally, we show that the proposed modification of the Schur IPM maintains its good local convergence properties even when the SLICQ is replaced by the SMFCQ.

Theorem 2. Suppose that assumptions A.1–A.4 and Condition C.2 hold, that the barrier parameter is chosen to satisfy µ_k = O(‖g_k(0)‖²), and that µ_k^min = Θ(‖g_k(0)‖). If w_0 is close enough to w*, then the sequence {w_k} generated by the modified Schur IPM method described in Figure 6 is well-defined and converges to w* at a quadratic rate.

Proof. Propositions 9 and 10 imply that the matrices M_k and S_k are nonsingular in a neighborhood of the minimizer, and thus the search direction of the Schur IPM method is well-defined in this neighborhood.

Moreover, assumptions A.1–A.4 ensure that the conditions in [36, Theorem 4.3] hold at the minimizer w*. Therefore, the proposed modification of the Schur IPM converges locally at a quadratic rate. □


8. An example. We apply the decomposition algorithms analyzed in this paper (the Schur IPM, the inexact BDA, and the Gauss-Seidel BDA) to the following quadratic program, taken from the test problem set proposed by DeMiguel and Murray in [14]:

\[
\begin{array}{rl}
\min_{x,y} & \tfrac{1}{2}(x-a)^2 + \tfrac{1}{2}(y-b)^2 \\
\text{s.t.} & x + y \le 2, \\
& x + y \ge 1, \\
& x - y \le 1.
\end{array} \tag{101}
\]

Note that the LICQ holds for this problem at all feasible points. In addition, the parameters (a, b) are useful to control whether the SLICQ holds at the minimizer. In particular, for (a, b) = (0, 0) the minimizer is (x*, y*) = (0.5, 0.5). At this point, only one of the constraints is active and the SLICQ holds. On the other hand, for (a, b) = (3, 0) the minimizer is (x*, y*) = (1.5, 0.5). At this point, there are two active constraints and neither the SLICQ nor the SMFCQ holds. However, this problem can be reformulated by introducing an exact penalty function, as described in Section 7, to ensure the SMFCQ holds at (x*, y*) = (1.5, 0.5). In particular, we can rewrite the above problem as:

\[
\begin{array}{rl}
\min_{x,z,y,s,t} & \tfrac{1}{2}(x-a)^2 + \tfrac{1}{2}(y-b)^2 + 100(s+t) \\
\text{s.t.} & z + y \le 2, \\
& z + y \ge 1, \\
& z - y \le 1, \\
& -x + z + s - t = 0, \\
& s, t \ge 0.
\end{array} \tag{102}
\]

For (a, b) = (0, 0) the minimizer is (x*, z*, y*, s*, t*) = (0.5, 0.5, 0.5, 0, 0). At this point, assumptions A.1–A.4 and the SLICQ hold. For (a, b) = (3, 0) the minimizer is (x*, z*, y*, s*, t*) = (1.5, 1.5, 0.5, 0, 0). At this point, assumptions A.1–A.4 and the SMFCQ hold (but not the SLICQ).
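The minimizers quoted above can be verified numerically with an off-the-shelf solver. The sketch below uses scipy's SLSQP method (a general-purpose solver, not the decomposition methods studied in this paper) to solve (101) and the penalty reformulation (102):

```python
import numpy as np
from scipy.optimize import minimize

def solve_101(a, b):
    """Problem (101); SLSQP expects inequality constraints as g(v) >= 0."""
    cons = [{"type": "ineq", "fun": lambda v: 2 - v[0] - v[1]},   # x + y <= 2
            {"type": "ineq", "fun": lambda v: v[0] + v[1] - 1},   # x + y >= 1
            {"type": "ineq", "fun": lambda v: 1 - v[0] + v[1]}]   # x - y <= 1
    f = lambda v: 0.5 * (v[0] - a) ** 2 + 0.5 * (v[1] - b) ** 2
    return minimize(f, [1.0, 1.0], method="SLSQP", constraints=cons).x

def solve_102(a, b):
    """Penalty reformulation (102); the variables are v = (x, z, y, s, t)."""
    cons = [{"type": "ineq", "fun": lambda v: 2 - v[1] - v[2]},   # z + y <= 2
            {"type": "ineq", "fun": lambda v: v[1] + v[2] - 1},   # z + y >= 1
            {"type": "ineq", "fun": lambda v: 1 - v[1] + v[2]},   # z - y <= 1
            {"type": "eq", "fun": lambda v: -v[0] + v[1] + v[3] - v[4]}]
    f = lambda v: (0.5 * (v[0] - a) ** 2 + 0.5 * (v[2] - b) ** 2
                   + 100.0 * (v[3] + v[4]))
    bounds = [(None, None)] * 3 + [(0.0, None)] * 2               # s, t >= 0
    return minimize(f, [1.0] * 5, method="SLSQP",
                    constraints=cons, bounds=bounds).x

print(solve_101(0.0, 0.0))   # near (0.5, 0.5): SLICQ holds
print(solve_101(3.0, 0.0))   # near (1.5, 0.5): SLICQ fails
print(solve_102(3.0, 0.0))   # near (1.5, 1.5, 0.5, 0, 0): SMFCQ holds
```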

In our numerical experiments we use the starting point (x, z, y, s, t) = (1, 1, 1, 1, 1), the stopping criterion ‖g_k(0)‖ < 10⁻⁶, and a barrier parameter that is reduced at a quadratic rate. As in [14], we apply a linear transformation to the variables so that the test problem objective is not separable in x and y. Table 1 displays the performance of the Schur IPM, the inexact BDA, and the Gauss-Seidel BDA for the first case, where the SLICQ holds; that is, (a, b) = (0, 0). The table gives the norm of the KKT conditions ‖g_k(0)‖ for each of the last five iterations.

Table 1. Convergence of the proposed methods. The SLICQ is satisfied at the minimizer.

    Schur IPM    Inexact BDA    Gauss-Seidel
    9.2e-2       2.0e-3         3.5e-2
    2.4e-2       4.3e-4         3.2e-3
    2.0e-3       6.7e-5         3.0e-4
    1.5e-5       2.4e-6         1.2e-5
    1.7e-9       2.9e-7         1.1e-7

The results confirm our convergence analysis of Sections 4 and 5. In particular, the Schur IPM and the Gauss-Seidel BDA converge superlinearly. On the other hand, the convergence of the inexact BDA appears to be only linear, or perhaps two-step superlinear.
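One way to read Table 1 is through the successive ratios ‖g_{k+1}(0)‖/‖g_k(0)‖: superlinear convergence drives these ratios to zero, while linear convergence keeps them roughly constant. The following snippet computes the ratios from the Table 1 data:

```python
# KKT-condition norms for the last five iterations, copied from Table 1.
schur   = [9.2e-2, 2.4e-2, 2.0e-3, 1.5e-5, 1.7e-9]
inexact = [2.0e-3, 4.3e-4, 6.7e-5, 2.4e-6, 2.9e-7]
gs      = [3.5e-2, 3.2e-3, 3.0e-4, 1.2e-5, 1.1e-7]

def ratios(errors):
    """Successive error ratios e_{k+1} / e_k."""
    return [errors[k + 1] / errors[k] for k in range(len(errors) - 1)]

print(ratios(schur))    # steadily shrinking: consistent with superlinear convergence
print(ratios(inexact))  # roughly constant: consistent with (at best) linear convergence
print(ratios(gs))
```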


Table 2. Convergence of the modified Schur IPM. The SMFCQ is satisfied at the minimizer.

    Modified Schur IPM
    1.2e-1
    4.2e-2
    5.0e-3
    7.9e-5
    4.5e-8

Table 2 displays the performance of the modified Schur IPM for the second case, where the SMFCQ is satisfied but not the SLICQ; that is, (a, b) = (3, 0).

The results confirm the effectiveness of the modification proposed in Section 7 for the case where the SLICQ is not satisfied. In particular, the modified Schur IPM seems to converge at a quadratic rate.

9. Conclusions. We make two contributions to the field of decomposition methods for MDO. Our first contribution is to establish a theoretical relationship between BDAs and Schur interior-point methods. This connection is established through the inexact BDA, which we show is a close relative of both BDAs and Schur IPMs. The relevance of this relationship is that it effectively bridges the gap between the incipient local convergence theory of BDAs [2, 15] and the mature local convergence theory of interior-point methods [25, 17, 37, 22]. As a result, we think our analysis constitutes one important step towards the development of a robust convergence theory for BDAs.

Our second contribution is to show how Schur interior-point methods can be applied to solve MDO problems that satisfy only the conventional LICQ but not the Strong LICQ. We accomplish this task in two steps. First, we show that any MDO problem satisfying the conventional LICQ can be reformulated, by introducing an exact penalty function, as a problem satisfying the SMFCQ. Second, we show how degenerate optimization techniques can be used to ensure a Schur interior-point method will achieve fast local convergence when applied to solve a problem satisfying the SMFCQ. The importance of this contribution is that it enlarges the class of problems that can be addressed with Schur interior-point methods to include general MDO problems satisfying the LICQ, and thus makes them a viable alternative to certain practical BDAs [5, 15], which can deal with problems that satisfy the LICQ but not the SLICQ.

Acknowledgments. We would like to thank Jorge Nocedal, Danny Ralph, and S. Scholtes for their detailed feedback and suggestions. We also acknowledge comments from R.W. Cottle, M.P. Friedlander, F.J. Prieto, L. Vicente, S. Wright, and seminar participants at the SIAM Conference on Optimization (Stockholm, 2005), Argonne National Laboratory, Judge Institute of Cambridge, and Northwestern University. This research was partially supported by the Research Development Fund at London Business School and by the Ministerio de Educación y Ciencia of Spain, through project MTM2004-02334.

References

[1] Alexandrov, N.M., M.Y. Hussaini, eds. 1997. Multidisciplinary Design Optimization: State of the Art. SIAM, Philadelphia.

[2] Alexandrov, N.M., R.M. Lewis. 2000. Analytical and computational aspects of collaborative optimization. Tech. Rep. TM-2000-210104, NASA.

[3] Benders, J.F. 1962. Partitioning procedures for solving mixed variables programming problems. Numerische Mathematik 4 238–252.

[4] Birge, J.R., F. Louveaux. 1997. Introduction to Stochastic Programming. Springer-Verlag, New York.

[5] Braun, R.D. 1996. Collaborative optimization: An architecture for large-scale distributed design. Ph.D. thesis, Stanford University.

[6] Braun, R.D., I.M. Kroo. 1997. Development and application of the collaborative optimization architecture in a multidisciplinary design environment. N.M. Alexandrov, M.Y. Hussaini, eds., Multidisciplinary Design Optimization: State of the Art.

[7] Byrd, R.H., M.E. Hribar, J. Nocedal. 1999. An interior point algorithm for large-scale nonlinear programming. SIAM Journal on Optimization 9 877–900.

[8] Chun, B.J., S.M. Robinson. 1995. Scenario analysis via bundle decomposition. Annals of Operations Research 56 39–63.

[9] Cohen, G., B. Miara. 1990. Optimization with an auxiliary constraint and decomposition. SIAM Journal on Control and Optimization 28(1) 137–157.

[10] Colson, B., P. Marcotte, G. Savard. 2005. Bilevel programming: a survey. 4OR 3 87–107.

[11] Cottle, R.W. 1974. Manifestations of the Schur complement. Linear Algebra Appl. 8 189–211.

[12] Cramer, E.J., J.E. Dennis, P.D. Frank, R.M. Lewis, G.R. Shubin. 1994. Problem formulation for multidisciplinary optimization. SIAM Journal on Optimization 4(4) 754–776.

[13] DeMiguel, V., W. Murray. 2000. An analysis of collaborative optimization methods. Eighth AIAA/USAF/NASA/ISSMO Symposium on Multidisciplinary Analysis and Optimization. AIAA Paper 00-4720.

[14] DeMiguel, V., W. Murray. 2001. A class of quadratic programming problems with global variables. Tech. Rep. SOL 01-2, Dept. of Management Science and Engineering, Stanford University.

[15] DeMiguel, V., W. Murray. 2005. A local convergence analysis of bilevel decomposition algorithms. Optimization and Engineering, forthcoming.

[16] Dempe, S. 2002. Foundations of Bilevel Programming. Kluwer Academic Publishers, Boston.

[17] El-Bakry, A.S., R.A. Tapia, T. Tsuchiya, Y. Zhang. 1996. On the formulation and theory of the Newton interior-point method for nonlinear programming. Journal of Optimization Theory and Applications 89 507–541.

[18] Fiacco, A.V., G.P. McCormick. 1968. Nonlinear Programming: Sequential Unconstrained Minimization Techniques. John Wiley & Sons, New York.

[19] Forsgren, A., P.E. Gill. 1998. Primal-dual interior methods for nonconvex nonlinear programming. SIAM Journal on Optimization 8(4) 1132–1152.

[20] Gay, D.M., M.L. Overton, M.H. Wright. 1997. A primal-dual interior method for nonconvex nonlinear programming. Tech. Rep. 97-4-08, Computing Sciences Research, Bell Laboratories, Murray Hill, NJ.

[21] Geoffrion, A.M. 1972. Generalized Benders decomposition. Journal of Optimization Theory and Applications 10(4) 237–260.

[22] Gould, N.I.M., D. Orban, A. Sartenaer, P.L. Toint. 2001. Superlinear convergence of primal-dual interior point algorithms for nonlinear programming. SIAM Journal on Optimization 11(4) 974–1002.

[23] Haftka, R.T., J. Sobieszczanski-Sobieski. 1997. Multidisciplinary aerospace design optimization: Survey of recent developments. Structural Optimization 14 1–23.

[24] Helgason, T., S.W. Wallace. 1991. Approximate scenario solutions in the progressive hedging algorithm. Annals of Operations Research 31 425–444.

[25] Martinez, H.H., Z. Parada, R.A. Tapia. 1995. On the characterization of q-superlinear convergence of quasi-Newton interior-point methods for nonlinear programming. Boletín de la Sociedad Matemática Mexicana 1 1–12.

[26] Medhi, D. 1990. Parallel bundle-based decomposition for large-scale structured mathematical programming problems. Annals of Operations Research 22 101–127.

[27] Raghunathan, A.U., L.T. Biegler. 2003. Interior point methods for mathematical programs with complementarity constraints. Tech. rep., Department of Chemical Engineering, Carnegie Mellon University.

[28] Robinson, S.M. 1986. Bundle-based decomposition: Description and preliminary results. A. Prekopa, J. Szelezsan, B. Strazicky, eds., System Modelling and Optimization. Lecture Notes in Control and Information Sciences, Springer-Verlag.

[29] Rockafellar, R.T., R.J-B. Wets. 1991. Scenarios and policy aggregation in optimization under uncertainty. Mathematics of Operations Research 16 119–147.

[30] Ruszczynski, A. 1986. A regularized decomposition for minimizing a sum of polyhedral functions. Mathematical Programming 35 309–333.

[31] Ruszczynski, A. 1995. On convergence of an augmented Lagrangian decomposition method for sparse convex optimization. Mathematics of Operations Research 20(3) 634–656.

[32] Shimizu, K., Y. Ishizuka, J.F. Bard. 1997. Nondifferentiable and Two-Level Mathematical Programming. Kluwer Academic Publishers, Boston.

[33] Tammer, K. 1987. The application of parametric optimization and imbedding for the foundation and realization of a generalized primal decomposition approach. J. Guddat, H. Jongen, B. Kummer, F. Nozicka, eds., Parametric Optimization and Related Topics, Mathematical Research, vol. 35. Akademie-Verlag, Berlin.

[34] Van Slyke, R., R.J-B. Wets. 1969. L-shaped linear programs with application to optimal control and stochastic programming. SIAM Journal on Applied Mathematics 17 638–663.

[35] Vanderbei, R.J., D.F. Shanno. 1997. An interior-point algorithm for nonconvex nonlinear programming. Tech. Rep. SOR-97-21, Statistics and Operations Research, Princeton University.

[36] Vicente, L.N., S.J. Wright. 2002. Local convergence of a primal-dual method for degenerate nonlinear programming. Computational Optimization and Applications 22 311–328.

[37] Yamashita, H., H. Yabe. 1996. Superlinear and quadratic convergence of some primal-dual interior point methods for constrained optimization. Mathematical Programming 75 377–397.