
Mathematical Programming manuscript No. (will be inserted by the editor)

Alexandre Belloni · Claudia Sagastizábal

Dynamic Bundle Methods

Application to Combinatorial Optimization

Revised version of: July 12, 2005

Abstract. Lagrangian relaxation is a popular technique to solve difficult optimization problems. However, the applicability of this technique depends on having a relatively low number of hard constraints to dualize. When there are exponentially many hard constraints, it is preferable to relax them dynamically, according to some rule depending on which multipliers are active. For instance, only the most violated constraints at a given iteration could be dualized. From the dual point of view, this approach yields multipliers with varying dimensions and a dual objective function that changes along iterations.

We discuss how to apply a bundle methodology to solve this kind of dual problem. We analyze the resulting dynamic bundle method, giving a positive answer for its convergence properties, including finite termination and a primal result for polyhedral problems. We also report preliminary numerical experience on Linear Ordering and Traveling Salesman Problems.

1. Introduction

Consider the following optimization problem:

maxp C(p)
p ∈ Q ⊂ IRmp
gj(p) ≤ 0 , j ∈ L := {1, . . . , n},   (1)

where C : IRmp → IR is a convex function and, for each j ∈ L, gj : IRmp → IR denotes an affine constraint, usually difficult to deal with. Easy constraints are included in the set Q, which can be discrete. Our interest lies in applications where the number n is huge, possibly exponential in the dimension mp. This is a common setting in Combinatorial Optimization, as shown by the many works in the area devoted to the study of different families of inequalities aimed at strengthening linear relaxations of several prototypical problems.

Alexandre Belloni: Operations Research Center MIT, E40-149, 77 Massachusetts Av., Cambridge, Massachusetts 02139-4307, US. e-mail: [email protected]

Claudia Sagastizábal: IMPA, Estrada Dona Castorina 110, Jardim Botânico, Rio de Janeiro RJ 22460-320, Brazil. On leave from INRIA Rocquencourt, France. Research supported by CNPq Grant No. 303540-03/6, e-mail: [email protected]

Correspondence to: Claudia A. Sagastizábal


In principle, solutions to (1) can be found by using Lagrangian Relaxation techniques. Let xj denote the nonnegative multipliers associated to the hard constraints and denote the nonnegative orthant by

IRn≥0 := {x ∈ IRn : xj ≥ 0 for all 1 ≤ j ≤ n}.

The Lagrangian dual problem of (1) is given by

min { f(x) : x ∈ IRn≥0 } , where f(x) := max { C(p) − ∑j∈L xjgj(p) : p ∈ Q }   (2)

is the dual function, possibly nondifferentiable.

For the Lagrangian relaxation approach to make sense in practice, two points are fundamental. First, the dual function should be much simpler to evaluate (at any given x) than solving the primal problem directly; we assume this is the case for (1). Second, the dual problem should not be too big: in nonsmooth optimization (NSO) this means that the dimension of the dual variable is less than 10000. So the approach is simply not applicable in our setting, because in (1) n is too big. Instead of dualizing all the n hard constraints at once, an alternative approach is to choose at each iteration subsets of constraints to be dualized. In this dynamical relaxation, subsets J have cardinality |J| much smaller than n. As a result, the corresponding dual function, defined on IR|J|, is manageable from the NSO point of view.

The idea of replacing a problem with a difficult feasible set by a sequence of subproblems with simpler constraints can be traced back to the cutting-planes methods [CG59,Kel60]. An important matter for preventing subproblems from becoming too difficult is how and when to drop constraints (i.e., how to choose the subsets J). In Semi-infinite Programming, for example, the identification of sufficiently violated inequalities is crucial for producing implementable outer approximation algorithms [GP79,GPT80]. In Combinatorial Optimization, a method to dynamically insert and remove inequalities was already used for the Traveling Salesman Problem (TSP) in [BC81] and for network design in [Gav85]. Related works are [CB83,Luc92,Bea90,LB96] and, more recently, [HFK01,MLM00,CLdS01,BL02b]. Being a special class of cutting-planes method, the approach was named Relax-and-Cut algorithm in [EGM94].

All the works mentioned above use a subgradient-type method [Erm66,Sho85,HWC74] to update the dual multipliers. More recently, more robust NSO methods have been used for the dual step, such as the proximal analytic center cutting planes method in [BdMV05] and bundle methods in [FG99,FGRS05,RS03,FLR05,Hel01]. Although numerical experience shows the efficiency of a dynamic technique, except for the last work, no convergence proof for the algorithm is given. The algorithm in [Hel01] is tailored to solve semidefinite (SDP) relaxations of combinatorial problems, with a dual step that uses the so-called spectral bundle method. To choose subsets J, a maximum violation oracle searches for the most violated inequality in the primal problem at each iteration.


In this paper we consider Relax-and-Cut methods from a broader point of view, more focused on a dual perspective. Our procedure applies to general problems, not only combinatorial ones. Primal information produced by a Separation Oracle (which identifies constraints in (1) that are not satisfied, i.e., "violated inequalities") is used at each iteration to choose the subset J. Dual variables, or multipliers, are updated using a particular form of bundle method, which we call Dynamic Bundle Method and which is specially adapted to the setting. Thanks to the well-known descent and stability properties of the bundle methodology, we are in a position to prove convergence of our method.

When compared to [Hel01], our method is slightly more general, because our separation procedure depends on the current index set J; see Section 2.1 below. This is an important feature for applications where the family of inequalities defined by L does not have an efficient maximum violation oracle and only heuristics can be used to search for violated inequalities. Naturally, the quality of the separation procedure does limit the quality of the solution obtained by solving the dual problem dynamically. For this reason, to prove global convergence we force the separation procedure to eventually satisfy a complementarity rule (cf. Section 4). Our convergence results show not only asymptotic solution of the dual problem, but also finite termination of the algorithm with a maximizer of a convexified form of (1), provided some reasonable assumptions, stated in Theorem 11 below, hold.

This paper is organized as follows. Sections 2 and 3 discuss the general Relax-and-Cut formulation and the bundle methodology adapted to the new context. In Section 4 we describe the dynamic bundle method, whose formal proof of convergence is given in Section 5. In Section 6 we show finite termination of the method when (1) is a linear program with Q a finite discrete set, under mild assumptions that are satisfied in many applications. Numerical results for Linear Ordering and Traveling Salesman problems are shown in Section 7. Finally, in Section 8 we give some concluding remarks.

Our notation and terminology is standard in bundle methods; see [HUL93] and [BGLS03]. We use the Euclidean inner product in IRn: 〈x, y〉 = ∑nj=1 xjyj, with induced norm denoted by | · |. For an index set J, |J| stands for its cardinality. The indicator function of the nonnegative orthant IRn≥0, denoted by I≥0, is defined to be 0 for all x ∈ IRn≥0 and +∞ otherwise. Finally, given x nonnegative, NIRn≥0(x) is the usual normal cone of Convex Analysis:

NIRn≥0(x) = { ν ∈ IRn : 〈ν, y − x〉 ≤ 0 for all y ∈ IRn≥0 }.

2. Combining primal and dual information

The dual function f from (2) is the maximum of a collection of affine functions of x, so it is convex. Under mild regularity conditions on (1), the dual problem (2) has a solution (see, for instance, Proposition 6 below). In addition, the well-known weak duality property

f(x) ≥ C(p) for all x ≥ 0 and p primal feasible (3)


implies in this case that f is bounded from below. Note that the evaluation of the dual function at a given value of x straightforwardly gives a subgradient. More precisely, letting px ∈ Q be a maximizer for which f(x) = C(px) − ∑j∈L xjgj(px), it holds that

−g(px) := −(gj(px))j∈L ∈ ∂f(x) .   (4)

Based on this somewhat minimal information, black-box methods generate a sequence {xk} converging to a solution x̄ of the dual problem (2); see [BGLS03, Chs. 7-9]. If there is no duality gap, the corresponding px̄ solves a convexified form of (1) (with Q therein replaced by its convex hull conv Q). Otherwise, it is necessary to recover a primal solution with some heuristic technique. Such techniques usually make use of px̄ or even of the primal iterates pxk; see for example [BA00], [BMS02].
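For intuition, the black-box computations behind (2) and (4) can be sketched on a toy instance with a small finite Q; all data below are hypothetical and only illustrate the mechanics:

```python
# Sketch of the dual "black box": evaluating f at x by enumerating a finite Q
# also yields, for free, the subgradient -g(p_x) of (4).

def dual_eval(x, Q, C, g):
    """Return f(x) = max_{p in Q} [C(p) - <x, g(p)>], a maximizer p_x,
    and the subgradient s = -g(p_x)."""
    best_val, best_p = None, None
    for p in Q:
        val = C(p) - sum(xj * gj for xj, gj in zip(x, g(p)))
        if best_val is None or val > best_val:
            best_val, best_p = val, p
    return best_val, best_p, [-gj for gj in g(best_p)]

# Hypothetical data: maximize p1 + p2 over Q = {0,1}^2 with one hard
# constraint g1(p) = p1 + p2 - 1 <= 0 (best feasible value is C = 1).
Q = [(0, 0), (0, 1), (1, 0), (1, 1)]
C = lambda p: p[0] + p[1]
g = lambda p: [p[0] + p[1] - 1]

f0, p0, s0 = dual_eval([0.0], Q, C, g)   # x = 0: unrelaxed maximum
f2, p2, s2 = dual_eval([2.0], Q, C, g)
```

By weak duality (3), every value f(x) computed this way bounds the best feasible primal value from above.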

An important consequence of considering a subset J instead of the full L is that complete knowledge of the dual function is no longer available. Namely, for any given x only

max { C(p) − ∑j∈J xjgj(p) : p ∈ Q }  and not  max { C(p) − ∑j∈L xjgj(p) : p ∈ Q }

is known. In order to obtain convergent algorithms in the absence of complete data, primal and dual information should be combined adequately. We first explain how to select hard constraints using primal points.

2.1. Selecting inequalities

To choose which constraints are to be dualized at each iteration, we assume that a Separation Oracle is available. A Separation Oracle is a procedure SepOr that, given p ∈ IRmp and J ⊆ L, identifies inequalities in L\J violated by p (in other words, hard constraints in (1) that are not satisfied by p). The output of the procedure is an index set I, i.e., I = SepOr(p, J), which can be empty.

We assume that, as long as there remain inequalities in L\J violated by p, the separation procedure is able to identify one such constraint. With this assumption,

SepOr(p, J) = ∅ ⇔ {j ∈ L : gj(p) > 0} ⊆ J ⇔ {j ∈ L : gj(p) ≤ 0} ⊇ L\J.   (5)

In particular, SepOr(p, L) is always the empty set. In Combinatorial Optimization, this assumption corresponds to an exact separation algorithm for the family of inequalities defined by the hard constraints.
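A separation procedure satisfying (5) can be sketched as follows; the exhaustive scan over L\J and the toy family of inequalities are hypothetical illustrations, not an oracle from the paper:

```python
def sep_or(p, J, g_list):
    """Separation oracle: indices j in L\\J whose inequality g_j(p) <= 0
    is violated by p. Property (5): an empty output means every violated
    index already belongs to J."""
    return [j for j, gj in enumerate(g_list) if j not in J and gj(p) > 0]

# Hypothetical affine family over scalars: g_j(p) = p - j - 1, j = 0..3.
# (j=j in the lambda freezes the loop index at definition time.)
g_list = [lambda p, j=j: p - j - 1 for j in range(4)]

violated = sep_or(2.5, {0}, g_list)           # g_0 is excluded by J = {0}
nothing = sep_or(2.5, set(range(4)), g_list)  # SepOr(p, L) is always empty
```

In practice one would replace the scan by a structure-exploiting routine; only property (5) matters for the analysis.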

The separation procedure in [Hel01] does not make use of an index set J, and corresponds to J = ∅ in our setting:

SepOr[Hel01](p) = SepOr(p, ∅).


In [Hel01, Def. 4.1], to guarantee that the separation procedure does not stall, but explores all the inequalities, the following "maximum violation oracle" condition

SepOr[Hel01](p) ⊆ { j ∈ L : gj(p) = maxl∈L {gl(p)} > 0 }

is assumed. In particular, SepOr[Hel01](p) only answers the empty set if p is feasible in (1).

Even though stated for sets J, our assumption can be related to the one in [Hel01], because each set J is a subset of L. More precisely, a Separation Oracle that finds all violated inequalities by exploring the whole set L can as well explore any given subset J to decide whether there remain violated inequalities. Reciprocally, a Separation Oracle for which (5) holds for all J satisfies in particular the maximum violation condition for J = L. However, in the following artificial example the only constraint that is active at the solution will never be the most violated inequality, at any iteration: given a positive parameter K, consider the problem

maxp∈[0,2]2 p1 + p2
g1(p) = p1 − 1 ≤ 0
g2(p) = p2 − 1 ≤ 0
g3(p) = (2/K)(p1 + p2) − 1/K ≤ 0

and suppose the three constraints are relaxed. At the optimal set {(p1, 1/2 − p1) : p1 ∈ [0, 1/2]}, g3 is the only active constraint. Since the relaxed problem is linear, for any multiplier x the evaluation of the dual function will return as px an extreme point of [0, 2]2. So gi(px) ∈ {−1, 1} for i = 1, 2 and |g3(px)| ≤ 7/K. Furthermore, if g1(px) ≤ 0 and g2(px) ≤ 0, then g3(px) ≤ 0. Thus, in order for the third constraint to be violated, at least one of the first two constraints must also be violated. As a result, for any K > 7, the third constraint will never be the most violated one, and the Separation Oracle proposed in [Hel01] may fail to generate the right inequality.
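The claim can be checked numerically: with g3(p) = (2/K)(p1 + p2) − 1/K and K > 7, at every extreme point of [0, 2]2 that a dual evaluation can return, g3 is never the most violated constraint. The following is a direct transcription of the example (K = 8 is one admissible choice):

```python
K = 8.0  # any K > 7 exhibits the behavior

def g(p):
    p1, p2 = p
    return [p1 - 1.0, p2 - 1.0, (2.0 / K) * (p1 + p2) - 1.0 / K]

extreme_points = [(a, b) for a in (0.0, 2.0) for b in (0.0, 2.0)]
for p in extreme_points:
    g1, g2, g3 = g(p)
    assert g1 in (-1.0, 1.0) and g2 in (-1.0, 1.0)  # gi in {-1, 1}
    assert abs(g3) <= 7.0 / K                       # |g3| <= 7/K
    if g3 > 0:                     # whenever g3 is violated,
        assert max(g1, g2) > g3    # g1 or g2 is violated strictly more
```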

The construction of an efficient separation procedure is an important question by itself and has been extensively studied in Combinatorial Optimization. In many cases, the particular structure of Q and g can be exploited so that the Separation Oracle reduces to solving a simple optimization problem. Unfortunately, there are families of inequalities for which no efficient exact separation procedure is known so far (for example, the path inequalities for the Traveling Salesman Problem; see Section 7.2 below).

In this work, we impose only minimal assumptions on the separation procedure, to gain in generality. For instance, assumption (5) allows any violated inequality to be separated, requiring no special property from that inequality (such as being the most violated one). Moreover, when the Separation Oracle returns many inequalities, (5) allows one to choose just any nonempty subset of such inequalities. This is a feature of practical interest when using heuristic approaches, either because no exact separation procedure is available, or because the problem has inequalities that are more "promising" (known to be active at an optimum, for example). Moreover, even if an exact separation oracle is available, the use of fast heuristic methods for preliminary searches might increase the efficiency of the overall procedure.

Another question of interest is the determination of a bound on the number of active constraints at a solution of (1). This knowledge could potentially lead to bounds on the number of dual variables considered at any given iteration. In the absence of degeneracy, one could expect such knowledge to help correctly identify the set of active inequalities and, hence, the minimal dimension for the dual problem. This is not the case for our problems of interest. Due to the combinatorial nature of our applications, any vertex can have an exponential number of binding inequalities. For example, the polytope associated with the TSP is known to be highly degenerate: an instance with only 10 cities has more than 10^7 binding inequalities at each vertex (instead of 35 inequalities under a nondegeneracy assumption); see [CR96] for details.

In order for our analysis to include such highly degenerate problems, we make no assumption of strict complementarity and we do not look for bounds on the number of dual variables. However, to prove convergence of the algorithm we shall require that eventually all the relevant inequalities be correctly identified. Although this is somehow a minimal requirement (without it, the separation procedure may fail to recognize an optimum as a feasible point), it is not constructive and, therefore, not completely satisfactory. Instead, we make use of a (constructive) complementarity rule which, after a certain number kcomp of iterations has passed, allows the algorithm to remove inequalities only if the separation procedure returns no additional inequality; see Section 4 below.

2.2. Dual information

Given an index set J ⊂ L = {1, . . . , n}, write IRn = IR|J| × IRn−|J|. We denote by PJ the linear operator in IRn defined as the orthogonal projection on the corresponding subspace:

PJ : IRn −→ IR|J| × IRn−|J| , x ↦ (xJ , 0L\J) := (xj∈J , 0j∈L\J) .

With this definition,

x = PJ(x) =⇒ x = PJ′(x) for all J′ ⊇ J .   (6)

As we mentioned, at each iteration only partial knowledge of the dual function f is available: the composed function f ◦ PJ, where the index set J varies along iterations. From [HUL93, Thm. VI.4.2.1, volume I], the corresponding subdifferential is given by the formula

∂(f ◦ PJ)(x) = P∗J (∂f(PJ(x))) = PJ (∂f(PJ(x))) ,   (7)

because P∗J = PJ.

NSO methods use dual data to define affine minorants of the dual function. When all the constraints are dualized at once, given (xi, f(xi), si ∈ ∂f(xi)), by the subgradient inequality the relation f(xi) + 〈si, x − xi〉 ≤ f(x) always holds. In our dynamic setting, there is also an index set Ji. If we used (7) to compute subgradients, we would only get minorizations of the form

f(PJi(xi)) + 〈s(PJi(xi)), x − PJi(xi)〉 ≤ f(x) ,

which only hold on the subspace {x ∈ IRn : x = PJi(x)}. For a similar relation to hold at any x, one should rather define subgradients omitting the projection operation, as shown by the next result.

Lemma 1. With the notation and definitions above, given xi ∈ IRn≥0 and Ji ⊂ L, let pi be a maximizer defining fi := f(PJi(xi)):

pi ∈ Arg max { C(p) − ∑j∈Ji xij gj(p) : p ∈ Q }   (8)

and define si := −g(pi) = −(gj(pi))j∈L. Then

fi + 〈si, x − PJi(xi)〉 = C(pi) − 〈g(pi), x〉 ≤ f(x) for all x ∈ IRn .   (9)

Proof. Use the definition of the dual function f in (2) and the definition of the projection PJi to write

fi = C(pi) − ∑j∈Ji xij gj(pi) = C(pi) − ∑j∈L PJi(xi)j gj(pi) .

From the definition of si, the first equality in (9) follows. The inequality in (9) also follows from the definition of f, since it implies that f(x) ≥ C(pi) − ∑j∈L xjgj(pi). ⊓⊔

For convenience, in Algorithm 1 below

(fi, pi) := DualEval(xi, Ji)

denotes the computations described by Lemma 1.
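The computations (fi, pi) := DualEval(xi, Ji) of Lemma 1 can be sketched as follows: only the constraints indexed by Ji enter the maximization, yet the full vector si = −g(pi) is returned, so that the minorization (9) holds on all of IRn. The toy data are hypothetical:

```python
def dual_eval(x, J, Q, C, g):
    """DualEval(x, J) of Lemma 1: maximize the Lagrangian dualizing only
    the constraints indexed by J, but return the FULL subgradient
    s = -g(p), defined without the projection P_J."""
    best_val, best_p = None, None
    for p in Q:
        gp = g(p)
        val = C(p) - sum(x[j] * gp[j] for j in J)
        if best_val is None or val > best_val:
            best_val, best_p = val, p
    return best_val, best_p, [-c for c in g(best_p)]

# Hypothetical data: Q = {0,1}^2, C(p) = p1 + p2, two hard constraints.
Q = [(0, 0), (0, 1), (1, 0), (1, 1)]
C = lambda p: p[0] + p[1]
g = lambda p: [p[0] - 0.5, p[0] + p[1] - 1.0]

# Dualize only constraint 1; component 0 of s is nevertheless reported.
fi, pi, si = dual_eval([0.5, 0.5], [1], Q, C, g)
```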

3. Bundle methods

To help get a better understanding of how the dynamic bundle method works, we emphasize the main modifications that are introduced by the dynamic dualization scheme when compared to a standard bundle algorithm.

Unless required for clarity, we drop iteration indices in our presentation: x and x+ below stand for the current and next stability centers, respectively.

Since (2) is a constrained NSO problem, there is an additional complication in the algebra commonly used in bundle methods. We assume that the primal problem (1) is such that the dual function f is finite everywhere. In this case, bundle methods are essentially the same as for unconstrained NSO, because the dual problem in (2) is equivalent to the unconstrained minimization of (f + I≥0).


3.1. An overview of bundle methods

Let ℓ denote the current iteration of a bundle algorithm. Classical bundle methods keep memory of the past in a bundle of information consisting of

{ fi := f(xi) , si ∈ ∂f(xi) }i∈B  and (x, f(x)), the last "serious" iterate.

Serious iterates, also called stability centers, form a subsequence {xk} ⊂ {xi} such that {f(xk)} is strictly decreasing. Although not always explicitly denoted, the superscript k in xk depends on the current iteration, i.e., k = k(ℓ).

When not all the hard constraints are dualized, there is no longer a fixed dual function f, but dual objectives f ◦ PJ, with J varying along iterations. For example, Jk below denotes the index set used when generating the stability center x and, similarly, Ji corresponds to xi. With the notation from Lemma 1, the bundle data is

{ fi , si := −(gj(pi))j∈L }i∈B  and  (x, f(x), Jk) with x = PJk(x).

We will see that, by construction, it always holds that xi = PJi(xi). Using the (nonnegative) linearization errors

ei := f(x) − fi − 〈si, x − xi〉 = f(x) − C(pi) + 〈g(pi), x〉 ,

all the past information is recovered from the bundle:

{ (ei, pi) }i∈B  and  (x, f(x), Jk) with x = PJk(x).

Bundle information is used to define at each iteration a model of the dual function f, namely the cutting-planes function

fB(x + d) = f(x) + maxi∈B { −〈g(pi), d〉 − ei } ,

where the bundle B varies along iterations.

A well-known property of bundle methods is that it is possible to keep in the bundle B any number of elements without impairing convergence; see for instance [BGLS03, Ch. 9]. More precisely, it is enough for the new bundle B+ to contain the last generated information and the so-called aggregate couple (e, π); see Lemma 2 below.

An iterate xℓ is considered good enough to become the new stability center when f(xℓ) provides a significant decrease, measured in terms of the nominal decrease ∆ℓ:

∆ℓ := f(x) − fB(xℓ).   (10)

When an iterate is declared serious, it becomes the next stability center, x+ = xℓ (otherwise, x+ = x). Linearization errors are then updated using the recursive formula

e+i = ei + f(x+) − f(x) + 〈g(pi), x+ − x〉 .   (11)
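The recursive update (11) is consistent with the explicit expression ei = f(x) − C(pi) + 〈g(pi), x〉 of the linearization errors: updating after a center move reproduces the direct definition at the new center. A numerical sketch (all numbers are hypothetical and need not come from an actual dual evaluation, so the error values shown here can be negative; with true dual data they are nonnegative):

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def lin_error(f_center, x_center, C_p, g_p):
    # direct definition: e = f(x) - C(p) + <g(p), x> at the center x
    return f_center - C_p + dot(g_p, x_center)

# one bundle element (C(p_i), g(p_i)) and a serious step from x to x+
C_p, g_p = 3.0, [1.0, -2.0]
x_old, f_old = [0.5, 0.0], 4.0
x_new, f_new = [0.25, 0.5], 3.5

e_old = lin_error(f_old, x_old, C_p, g_p)
# recursive update (11): e+ = e + f(x+) - f(x) + <g(p), x+ - x>
e_new = e_old + f_new - f_old + dot(g_p, [a - b for a, b in zip(x_new, x_old)])

# the update agrees with the direct definition at the new center
assert abs(e_new - lin_error(f_new, x_new, C_p, g_p)) < 1e-12
```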


3.2. Defining iterates in a dynamic setting

We now consider in more detail the effect of working with a dynamic dualization scheme.

Let Jℓ be an index set such that Jk ⊆ Jℓ (so that we have x = PJℓ(x), by (6)). To define an iterate satisfying xℓ = PJℓ(xℓ), we solve a quadratic programming problem (QP) depending on the varying set Jℓ and on a positive parameter µℓ. More specifically, dℓ = (dℓJℓ, 0L\Jℓ) solves

min d { fBℓ(x + d) + (1/2)µℓ|d|2 : x + d = PJℓ(x + d) } .   (12)

The next iterate is given by xℓ := x + dℓ.

Here arises a major difference with a purely static bundle method. Instead of just solving

min d (fBℓ + I≥0)(x + d) + (1/2)µℓ|d|2 ,

we introduce the additional constraint x + d = PJℓ(x + d), which depends on the index set Jℓ; see (14) below.

Before giving the algorithm, we derive some technical relations that are important for the convergence analysis. For notational convenience, we drop iteration subindices and, instead of fB, we use a more general function ϕ. We also use the notation

IRnJ,≥0 := {x ∈ IRn≥0 : x = PJ(x)}

for the constraint set in (12).
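To see the mechanics of (12) and (15) in the simplest case, suppose the model has a single bundle element, so fB(x + d) = f(x) − 〈g(p), d〉 − e. Then the QP decouples by component: on J the solution projects x + g(p)/µ onto the nonnegative orthant, and off J the step is zero. This is a sketch under that single-element assumption, with hypothetical data; it is not the general QP solver:

```python
def qp_single(x, J, g_p, mu):
    """Solve (12) when the model is a single cutting plane: minimize
    -<g(p), d> + (mu/2)|d|^2 over d supported on J with x + d >= 0."""
    x_star = list(x)  # components off J keep d_j = 0
    for j in J:
        x_star[j] = max(x[j] + g_p[j] / mu, 0.0)
    return x_star

# hypothetical data; x = P_J(x) with J = {0, 1}, so x_2 = 0
x = [1.0, 0.2, 0.0]
g_p = [2.0, -3.0, 0.5]
mu = 4.0
x_star = qp_single(x, [0, 1], g_p, mu)

# optimality condition (15): mu*(x - x*) = s + nu with s = -g(p)
s = [-c for c in g_p]
nu = [mu * (xj - xsj) - sj for xj, xsj, sj in zip(x, x_star, s)]
```

Here ν vanishes on the J-component where x∗j > 0, is nonpositive where x∗j = 0, and on L\J it equals −sL\J, matching Lemma 2(iii) below.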

Lemma 2. Let x ∈ IRn≥0 and J ⊆ L be such that x = PJ(x). Given the affine function g : IRmp → IRn and, for i ∈ B, nonnegative scalars ei and points pi ∈ IRmp, define the function

ϕ(y) := f(x) + maxi∈B { −〈g(pi), y − x〉 − ei } ,   (13)

and consider the problem

min dJ∈IR|J| { (ϕ + I≥0)(x + (dJ, 0L\J)) + (1/2)µ|dJ|2 }   (14)

where µ > 0. The following hold:

(i) Let d∗J denote the unique solution to (14). Then x∗ := x + (d∗J, 0L\J) satisfies x∗ = PJ(x∗) and

x∗ = x − (1/µ)(s + ν) with s ∈ ∂ϕ(x∗) and ν ∈ NIRnJ,≥0(x∗).   (15)

Furthermore, letting e := f(x) − ϕ(x∗) − (1/µ)〈s, s + ν〉,

ϕ(y) ≥ f(x) + 〈s, y − x〉 − e for all y ∈ IRn ,   (16)

where e is nonnegative.

(ii) There exists a simplicial multiplier

α∗ ∈ ∆I := {z ∈ IR|B| : zi ≥ 0 , ∑i∈B zi = 1}

such that, letting π := ∑i∈B α∗i pi,

s = −g(π) and e = f(x) − ∑i∈B α∗i C(pi) + 〈g(π), x〉 .

(iii) The L\J-components of the normal element are νL\J = −sL\J, and

ν ∈ NIRn≥0(x∗) ⇐⇒ gL\J(π) ≤ 0.

Proof. Item (i) is just Lemma 3.1.1 in [HUL93, Ch. XV], written with (C, x, y+, p) therein replaced by (IRnJ,≥0, x, x∗, ν), respectively. Item (ii) follows from the fact that ϕ in (13) is defined as the maximum of finitely many affine functions, yielding the subdifferential

∂ϕ(x∗) = { −∑i∈B αig(pi) : α ∈ ∆I with αi > 0 only for i ∈ B such that ϕ(x∗) = f(x) − 〈g(pi), x∗ − x〉 − ei } .

In particular, s = −∑i∈B α∗i g(pi) = −g(π) and

ϕ(x∗) = f(x) − ∑i∈B α∗i (〈g(pi), x∗ − x〉 + ei) = f(x) − 〈g(π), x∗ − x〉 − ∑i∈B α∗i ei ,

because g is affine. Then, by (15) and the definition of e, e = ∑i∈B α∗i ei, and (9) gives the desired expression for e. Finally, to show item (iii), first note that, by (15), s + ν = µ(x − x∗) has zero components in L\J. The result then follows from item (ii) and the expression for the normal cone

NIRn≥0(x∗) = { ν ∈ IRn : ν ≤ 0, with νj = 0 for j ∈ J such that x∗j > 0 } ;

see for instance [HUL93, Ex. III.5.2.6(b), volume I]. ⊓⊔

The aggregate couple (e, π) from Lemma 2 represents in condensed form the bundle information relevant for subsequent iterations. Along the iterative bundle process, when the bundle size becomes too big, the aggregate couple may be inserted in the index set B+ defining ϕ+ for the next iteration, to replace deleted elements. This is the so-called compression mechanism in bundle methods. As a result, bundle elements have the form (e, p) ∈ IR≥0 × IRmp, corresponding either to past dual evaluations:

p = pj from (8) for some previous xj, and e = ej = f(x) − C(pj) + 〈g(pj), x〉 ,

or to past compressions:

p = πj = ∑i∈Bj α∗,ji pi for some previous iteration j, and e = ej = f(x) − ∑i∈Bj α∗,ji C(pi) + 〈g(πj), x〉 .


By (9), any bundle element corresponding to a dual evaluation, i.e., (ej, pj), defines a plane that always stays below f, tangent to f at xj. By (16), aggregate bundle elements satisfy a similar relation: (ej, πj) defines a plane that always stays below f (because ϕ ≤ f). However, the aggregate plane is no longer tangent to f: at x, its value is not f(x), but f(x) − e. Being the maximum of planes defined via (e, p), functions from (13) satisfy the relation

ϕ(y) ≤ f(y) for all y ∈ IRn   (17)

at all iterations.

Relation (17) is of paramount importance for showing convergence of bundle methods. In a static setting, (17) is automatic from the convexity of f, because ϕ is a model for f. In a dynamic setting, ϕ is no longer a model for f, but for f ◦ PJ, and (17) may not be satisfied for an arbitrary function f. Fortunately, in our problem f is the dual function in (2), and its specific structure ensures (17), via (9) and (16).

Incidentally, note that aggregate errors can be updated with a formula similar to (11), using the aggregate primal point π instead of pi.
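Because g is affine, the aggregate couple of Lemma 2(ii) behaves like an honest bundle element: the convex combination π of the primal points reproduces the aggregate subgradient, s = −g(π). A numerical sketch with hypothetical affine data:

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# hypothetical affine constraints g(p) = A p + b with p in IR^2
A = [[1.0, 0.0], [2.0, -1.0]]
b = [-1.0, 0.5]
g = lambda p: [dot(row, p) + bj for row, bj in zip(A, b)]

# bundle primal points and simplicial multipliers alpha*
P = [(0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]
alpha = [0.5, 0.25, 0.25]

# aggregate primal point: pi_bar = sum_i alpha_i p_i
pi_bar = [sum(a * p[c] for a, p in zip(alpha, P)) for c in range(2)]

# aggregating the subgradients -g(p_i) gives exactly -g(pi_bar)
s = [-sum(a * g(p)[c] for a, p in zip(alpha, P)) for c in range(2)]
assert all(abs(sc + gc) < 1e-12 for sc, gc in zip(s, g(pi_bar)))
```

The same convex combination applied to the errors ei yields the aggregate error, so (e, π) can be stored and updated exactly like an ordinary bundle element.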

We finish this section with some relations derived from Lemma 2. In particular, the equivalence in item (iii) reveals which primal points should be separated by the Separation Oracle in order to eventually solve (a convexified form of) the primal problem (1).

Corollary 3. Consider the notation and assumptions of Lemma 2. The following hold:

(i) Letting ε := f(x) − ϕ(x∗) − (1/µ)|s + ν|2,

f(y) ≥ f(x) + 〈s + ν, y − x〉 − ε for all y ∈ IRnJ,≥0.

(ii) Suppose the separation procedure satisfies (5). If SepOr(π, J) = ∅, then ν ∈ NIRn≥0(x∗) and

f(y) ≥ f(x) + 〈s + ν, y − x〉 − ε for all y ∈ IRn≥0.   (18)

Proof. To show item (i), we use (17) in (16) to write f(y) ≥ f(x) + 〈s, y − x〉 − e for all y ∈ IRn. In particular, since ν ∈ NIRnJ,≥0(x∗), adding the inequality

0 ≥ 〈ν, y − x∗〉 = 〈ν, y − x〉 + 〈ν, x − x∗〉 for all y ∈ IRnJ,≥0   (19)

we obtain, using (15),

f(y) ≥ f(x) + 〈s + ν, y − x〉 − (e − (1/µ)〈ν, s + ν〉) for all y ∈ IRnJ,≥0.

By definition of e, the relation e − (1/µ)〈ν, s + ν〉 = ε holds, so ε is the sum of two nonnegative terms, and item (i) follows. Finally, if the set SepOr(π, J) is empty, by (5), gj(π) ≤ 0 for all j ∈ L\J, and item (ii) follows from the equivalence in Lemma 2(iii), by an argument similar to the one in item (i), where the inequality in (19) now holds for all y ∈ IRn≥0. ⊓⊔


In the algorithm stated in Section 4 below, both the bundle B and the index set J may vary with the iteration index ℓ. Stability centers are indexed by k = k(ℓ). The function ϕ is the cutting-plane model f_B (possibly aggregated), and for the QP information we use the condensed notation

 (x^ℓ, s^ℓ, ν^ℓ, ê_ℓ, π^ℓ) = QPIt(x̂^k, f_B, J_ℓ, µ_ℓ).

For future reference, note that the quantity δ_ℓ defined below is directly computable after solving the QP, via the relations

 δ_ℓ := f(x̂^{k(ℓ)}) − f_B(x^ℓ) − (µ_ℓ/2)|x^ℓ − x̂^{k(ℓ)}|² = ε_ℓ + (µ_ℓ/2)|x^ℓ − x̂^{k(ℓ)}|², (20)

derived from the definition of ∆ in (10), of e in Lemma 2, and of ε in Corollary 3.

4. The algorithm

We now give the dynamic bundle procedure in full detail. The algorithm requires the choice of some parameters, namely m, |B|max, tol, and kcomp. The Armijo-like parameter m is used to measure significant decrease and declare a new stability center, |B|max is the maximum size allowed for the bundle without entering the compression mechanism, the tolerance tol is used in the stopping test, and, finally, kcomp denotes the first serious-step iteration for which the complementarity rule holds; see Step 3 and the last item in Remark 4 below.

Algorithm 1 (Dynamic Bundle Method).

Initialization. Choose m ∈ (0, 1], |B|max ≥ 2, a tolerance tol ≥ 0, and a nonnegative integer kcomp. Choose x^0 ∈ IR^n_{≥0} and define J_0 ⊇ {j ≤ n : x^0_j > 0}, J_0 ≠ ∅.
 Make the dual computations to obtain (f^0, p^0) := DualEval(x^0, J_0).
 Set k = 0, x̂^0 := x^0, f̂^0 := f^0, Ĵ_0 := J_0, ℓ = 1 and J_1 := Ĵ_0. Choose a positive µ_1.
 Define the aggregate bundle B̂ := ∅ and the oracle bundle B := {(e_0 := 0, p^0)}.

Step 1. (Iterate Generation) Solve the QP problem:

 (x^ℓ, s^ℓ, ν^ℓ, ê_ℓ, π^ℓ) = QPIt(x̂^k, f_B, J_ℓ, µ_ℓ).

 Compute the nominal decrease from (10), ∆_ℓ = ê_ℓ + (1/µ_ℓ)⟨s^ℓ, s^ℓ + ν^ℓ⟩.

Step 2. (Dual Evaluation) Make the dual computations (f^ℓ, p^ℓ) := DualEval(x^ℓ, J_ℓ).

Step 3. (Descent test and Separation) Call the separation procedure to compute I_ℓ := SepOr(π^ℓ, J_ℓ).
 If f^ℓ ≤ f̂^k − m∆_ℓ, then declare a serious step:
  Move the stabilization center: (x̂^{k+1}, f̂^{k+1}) := (x^ℓ, f^ℓ).
  Update the linearization and aggregate errors:
   e_i = e_i + f̂^{k+1} − f̂^k + ⟨g(p^i), x̂^{k+1} − x̂^k⟩ for i ∈ B,
   ê_ℓ = ê_ℓ + f̂^{k+1} − f̂^k + ⟨g(π^ℓ), x̂^{k+1} − x̂^k⟩.
  If k > kcomp and I_ℓ ≠ ∅, take O_{k+1} := ∅.
  Otherwise, compute O_{k+1} := {j ∈ J_ℓ : x̂^{k+1}_j = 0}.
  Define Ĵ_{k+1} := J_ℓ \ O_{k+1} and let J_{ℓ+1} := Ĵ_{k+1} ∪ I_ℓ. Set k = k + 1.
 Otherwise, declare a null step: compute e_ℓ := f̂^k − f^ℓ + ⟨g(p^ℓ), x̂^k − x^ℓ⟩. Set J_{ℓ+1} := J_ℓ ∪ I_ℓ.

Step 4. (Stopping test) If ∆_ℓ ≤ tol and I_ℓ = ∅, stop.

Step 5. (Bundle Management and Update) If the bundle has not reached the maximum bundle size, set B_red := B. Otherwise:
 If |B| = |B|max, then delete at least one element from B to obtain B_red.
 If {i ∈ B : f_B(x^ℓ) = f(x̂^k) − C(p^i) − ⟨g(p^i), x^ℓ − x̂^k⟩ − e_i} ⊂ B_red, then B̂ = ∅.
 Otherwise, delete at least one more element from B_red and set B̂ := {(ê_ℓ, π^ℓ)}.
 In all cases define B_+ := B_red ∪ B̂ ∪ {(e_ℓ, p^ℓ)}.

Loop. Choose a positive µ_{ℓ+1}. Set B = B_+, ℓ = ℓ + 1 and go to Step 1. ⊓⊔
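The decision made in Step 3 (serious versus null step, and the corresponding update of the index set J) can be isolated in a short Python sketch. The function name and argument layout are illustrative assumptions of ours; the logic mirrors the descent test and the complementarity rule of the text.

```python
def descent_and_index_update(f_x, f_hat, delta, J, I, x_new, k, k_comp, m=0.1):
    """Sketch of Step 3 of Algorithm 1: the descent test decides between a
    serious and a null step, and the index set J is updated accordingly.
    `x_new` is the candidate iterate as a dict j -> value; `I` is the set of
    indices of newly separated (violated) inequalities."""
    serious = f_x <= f_hat - m * delta
    if serious:
        # drop indices of zero components, unless the complementarity rule
        # applies (k > k_comp and the oracle found violated inequalities)
        if k > k_comp and I:
            removed = set()
        else:
            removed = {j for j in J if x_new.get(j, 0.0) == 0.0}
        J_next = (set(J) - removed) | set(I)
    else:
        removed = set()
        J_next = set(J) | set(I)   # null steps only enlarge J
    return serious, J_next, removed
```

Note how, at null steps, J can only grow; removal of indices is restricted to serious steps, exactly as in the convergence analysis below.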

Before passing to the convergence analysis, we comment on some important features of the algorithm.

Remark 4.
– Note that J_0 is defined in order to satisfy x^0 = P_{J_0}(x^0). Since, by construction, it always holds that J_ℓ ⊃ Ĵ_k, Lemma 2(i) ensures that x^ℓ = P_{J_ℓ}(x^ℓ) for all subsequent iterations.
– Typically, the Armijo-like parameter m used for the descent test takes values in (0, 1). Algorithm 1 allows m = 1 in order to ensure finite termination when the dual function is polyhedral (i.e., when in (1) C is affine and Q is finite); see Section 6 below.
– Since at null steps the cardinality of the sets J_ℓ increases and may become "too large", serious steps (which allow the removal of indices from the set) could be favored by setting a very low value for the parameter m. In addition, note that, if the dual dimension grows beyond control, the method will at worst behave like a static bundle method.
– In Step 5 the bundle is managed using the same mechanism as a static bundle method. Essentially, we first try to keep all active bundle elements; if this is not possible, some (arbitrary) elements are deleted and the aggregate couple is inserted in the bundle.
– Let ℓ_k denote an iteration when a serious step is declared at Step 3 (x̂^{k+1} = x^{ℓ_k}). The new index set Ĵ_{k+1} is defined by removing from J_{ℓ_k} those indices corresponding to null components in x̂^{k+1}:

 J_{ℓ_k+1} = {j ∈ L : x̂^{k+1}_j > 0} ∪ I_{ℓ_k}, where I_{ℓ_k} ⊂ {j ∈ L : x̂^{k+1}_j = 0 and g_j(π^{ℓ_k}) > 0}.

However, after generating kcomp serious steps, indices will be removed only if I_{ℓ_k} = ∅, i.e., only if the Separation Oracle does not return any violated inequality

Page 14: Dynamic Bundle Methods - Fuqua School of Businessabn5/dynbunMP.pdf · 2005-07-12 · Dynamic Bundle Methods Application to Combinatorial Optimization Revised version of: July 12,

14 Belloni and Sagastizabal

(because g_{L\J_{ℓ_k}}(π^{ℓ_k}) ≤ 0). As a result, the relation

 |I_ℓ| |O_{k+1}| = 0 if k > kcomp and x̂^{k+1} = x^ℓ

holds. Due to its similarity with a complementarity condition in nonlinear programming, we call this relation the complementarity rule. In practice, the parameter kcomp plays an important role. It gives the algorithm the flexibility to remove, in the early iterations, inequalities that are easily identified as not being active at the optimal primal point, hence possibly reducing the cardinality of Ĵ_{k+1}.

5. Convergence Analysis. Dual Results

In this section we focus on dual convergence results. When the algorithm loops forever, there are two mutually exclusive situations: either there is a last serious step followed by an infinite number of null steps, or there are infinitely many different serious steps.

For our presentation, we follow [HUL93, Ch. XV.3]. Notwithstanding the formal similarities with the referred material, the dynamic scheme introduces modifications in the QP that need to be addressed carefully when there is an infinite sequence of serious steps; see Lemma 7 below.

We first address the (simpler) case when only a finite number of serious steps is generated by Algorithm 1.

Theorem 5. Consider Algorithm 1, applied to the minimization problem (2). Suppose there is an iteration ℓ̂ where the last stability center x̄ is generated, followed by an infinite number of null steps. There is an iteration ℓast ≥ ℓ̂ such that the following hold:

(i) The sets J_ℓ stabilize: there exists J ⊂ L such that J_ℓ = J for all ℓ ≥ ℓast.
(ii) If µ_ℓ ≤ µ_{ℓ+1} for ℓ ≥ ℓast, then the sequence {δ_ℓ}_{ℓ>ℓast} is strictly decreasing.

Suppose that the Armijo-like parameter satisfies m < 1. The following hold:

(iii) If µ_ℓ ≤ µ_{ℓ+1} for ℓ ≥ ℓast and ∑_{ℓ≥ℓast} µ_ℓ/µ_{ℓ+1}² = +∞, then x̄ is a minimizer of f on IR^n_{J,≥0}.
(iv) If, in addition, the separation procedure satisfies (5), then x̄ solves (2).

Proof. Since we only remove inequalities at serious steps via O_{k+1}, once x̄ has been generated, the index sets J_ℓ can only increase. Since L is finite, there is eventually an iteration, say ℓast, such that J_{ℓast} = J ⊂ L. Subsequently, there are only null steps with J_ℓ = J for all ℓ ≥ ℓast, so (i) holds, and the method becomes a static bundle method for the problem min {f(x) : x = P_J(x)}, whose convergence analysis can be found in [HUL93, Ch. XV.3]. In particular, item (ii) follows from [HUL93, eq. (3.2.9) p. 311]. Moreover, since our assumptions in item (iii) correspond, respectively, to conditions (3.2.7) and (3.2.8) in [HUL93, Theorem XV.3.2.4], x̄ minimizes f on IR^n_{J,≥0}.


Furthermore, as shown in [HUL93, eq. (3.2.11) p. 312], δ_ℓ satisfies the relation C µ_ℓ δ_ℓ² < δ_{ℓ−1} − δ_ℓ. This relation, together with the assumptions on {µ_ℓ}, implies that

 ε_ℓ → 0 and s^ℓ + ν^ℓ → 0.

By Corollary 3(ii), whenever SepOr(π^ℓ, J_ℓ) = ∅, the vector s^ℓ + ν^ℓ belongs to ∂_{ε_ℓ}(f + I_{IR^n_{≥0}})(x̄). Since the sets J_ℓ eventually stabilize at J_ℓ = J, the sets I_ℓ = SepOr(π^ℓ, J_ℓ) are eventually empty. Passing to the limit in (18), we conclude that x̄ minimizes f on IR^n_{≥0}, as desired. ⊓⊔

The proof of Theorem 5 puts in evidence an interesting issue. Namely, as long as J_{ℓ+1} ⊃ J_ℓ at null steps, from the dual point of view it does not matter much how the index set is chosen: there will always be convergence to a solution of

 min f(x) s.t. x = P_J(x),

while convergence for the original dual problem, i.e., (2), is guaranteed if (5) holds. The situation is quite different at serious steps. In this case, to have convergence, the primal problem must satisfy a Slater-like condition, given below, that implies that the dual function is 0-coercive on the nonnegative orthant, i.e.,

 f(x) → +∞ as x ∈ IR^n_{≥0} goes to ∞ in norm.

Proposition 6. Suppose that for the primal problem (1)

 ∃ r ≤ n + 1, points p^1, p^2, ..., p^r ∈ Q, and convex multipliers α_1, α_2, ..., α_r
 such that g_j(∑_{i≤r} α_i p^i) < 0 for all j = 1, ..., n. (21)

Then the optimal value in (2) is finite and the dual function is 0-coercive on IR^n_{≥0}.

Proof. Our assumption (21) gives the existence of ρ > 0 for which the ball B(0; ρ) satisfies the inclusion

 B(0; ρ) ⊂ {∑_{i≤r} α_i g(p^i) : for convex multipliers α} + IR^n_{≥0}.

In particular, for the point −(x/|x|)ρ ∈ B(0; ρ), where 0 ≠ x ∈ IR^n_{≥0},

 −(x/|x|)ρ = ∑_{i≤r} α_i g(p^i) + σ for some convex multiplier α and a vector σ ∈ IR^n_{≥0}.

By definition of the dual function, f(x) ≥ C(p) − ⟨x, g(p)⟩ for any p ∈ Q. By taking p = p^i and making the convex sum with the multipliers α_i, we see that

 f(x) ≥ ∑_{i≤r} α_i C(p^i) − ⟨x, ∑_{i≤r} α_i g(p^i)⟩. (22)


By (21), the rightmost term above is positive and, thus, f(x) ≥ ∑_{i≤r} α_i C(p^i) > −∞ for all x ∈ IR^n_{≥0}. Finally, using in (22) the relation −∑_{i≤r} α_i g(p^i) = (x/|x|)ρ + σ, the inequality

 f(x) ≥ ∑_{i≤r} α_i C(p^i) + ρ|x| + ⟨x, σ⟩

holds for any 0 ≠ x ∈ IR^n_{≥0}, and the 0-coercivity of f follows. ⊓⊔

Before stating convergence for the case of an infinite sequence of serious steps, we give some relations that are useful for showing finite termination of the algorithm in Theorem 11.

Lemma 7. Suppose the primal problem (1) satisfies (21) and consider Algorithm 1 applied to the minimization problem (2). Suppose there are infinitely many serious steps and let ℓ_k denote an iteration index giving a serious step: x̂^{k+1} = x^{ℓ_k}. Then:

(i) {x̂^k} is bounded, and the nonnegative sequences {∆_{ℓ_k}}, {δ_{ℓ_k}}, {ε_{ℓ_k}}, and {ê_{ℓ_k}} tend to 0 when k → ∞.
(ii) If µ_{ℓ_k} ≤ µmax for all k, then s^{ℓ_k} + ν^{ℓ_k} → 0 when k → ∞.
(iii) If, in addition, the separation procedure satisfies (5), then there exist infinitely many serious steps indexed in the set K'' such that I_{ℓ_k} = ∅ for all k ∈ K'' and, for any accumulation point x̄ of {x̂^k}_{k∈K''}, the relation f_{B_{ℓ_k}}(x̂^{k+1}) → f(x̄) holds when k ∈ K'' goes to ∞.

Proof. For the algorithm to generate infinitely many serious steps, the stopping test must never hold: either ∆_{ℓ_k} > tol, or I_{ℓ_k} ≠ ∅. We claim that if there is an infinite sequence of different serious steps, then

 ∀ k' > kcomp ∃ k ≥ k' such that ∆_{ℓ_k} > 0,

even if tol = 0. For contradiction purposes, suppose the claim does not hold: there is k' > kcomp such that ∆_{ℓ_k} = 0 for all k ≥ k'. In particular, since ∆_{ℓ_k} ≤ tol and the algorithm does not stop, I_{ℓ_k} ≠ ∅. Since k > kcomp, the complementarity rule forces O_{k+1} = ∅, by construction. By (20), the nominal decrease satisfies the relation

 ∆_{ℓ_k} = ε_{ℓ_k} + µ_{ℓ_k}|x̂^{k+1} − x̂^k|² = ε_{ℓ_k} + (1/µ_{ℓ_k})|s^{ℓ_k} + ν^{ℓ_k}|². (23)

Thus, if ∆_{ℓ_k} = 0, then x̂^k = x̂^{k+1}, with I_{ℓ_k} ≠ ∅ and O_{k+1} = ∅ for all k ≥ k'. This contradicts the fact that n = |L| is finite, so the claim holds.

As a result, when the descent test holds, f̂^{k+1} ≤ f̂^k − m∆_{ℓ_k} for infinitely many k for which ∆_{ℓ_k} > 0. Boundedness of {x̂^k} then follows, because the sequence {f̂^k} is strictly decreasing and f is 0-coercive, by Proposition 6. Since the optimal value in (2), f̄, is finite, by summation over the infinite set of serious-step indices we obtain that 0 ≤ m ∑_k ∆_{ℓ_k} ≤ f̂^0 − f̄. Thus, the series of nominal decreases is convergent. Recalling that, by (20), 0 ≤ ε_{ℓ_k}, δ_{ℓ_k} ≤ ∆_{ℓ_k} and, by


(15), 0 ≤ ê_{ℓ_k} ≤ ε_{ℓ_k}, item (i) follows. Item (ii) now follows from (23), because µ_{ℓ_k} ≤ µmax.

To see (iii), we start by showing that there is an infinite set of serious-step indices K' such that

 ν^{ℓ_k} ∈ N_{IR^n_{≥0}}(x^{ℓ_k}) for all k ∈ K'. (24)

We base our argument on the set G = {k : I_{ℓ_k} = ∅}. Using Corollary 3(ii), G ⊂ K'. If |G| < ∞, the complementarity rule implies that we remove inequalities only a finite number of times; since we add at least one inequality at each serious step k ∉ G, one would obtain an infinite number of inequalities (a contradiction, since n is finite). So we must have |G| = ∞, which implies |K'| = ∞. Finally, let x̄ be any accumulation point of {x̂^k}_{K'} (a bounded sequence, by item (i)), and let {x̂^k}_{k∈K''⊂K'} be the subsequence converging to x̄. Then, by continuity, using again that ∆_{ℓ_k} → 0, in particular for k ∈ K'', we see that

 f_{B_{ℓ_k}}(x̂^{k+1}) − f(x̄) = f_{B_{ℓ_k}}(x̂^{k+1}) − f(x̂^k) + f(x̂^k) − f(x̄) = −∆_{ℓ_k} + f(x̂^k) − f(x̄) → 0,

and the proof is complete. ⊓⊔

Theorem 8. Suppose the primal problem (1) satisfies (21) and consider Algorithm 1 applied to the minimization problem (2). Suppose there are infinitely many serious steps, let ℓ_k denote an iteration index giving a serious step (x̂^{k+1} = x^{ℓ_k}), and suppose that µ_{ℓ_k} ≤ µmax for all k. If the separation procedure satisfies (5), then the sequence of serious steps is minimizing for (2).

Proof. Consider the convergent subsequence of serious steps K'' defined in Lemma 7(iii). Since I_{ℓ_k} = ∅ for all k ∈ K'', by Corollary 3(ii) we have that

 f(x) ≥ f̂^k + ⟨s^{ℓ_k} + ν^{ℓ_k}, x − x̂^k⟩ − ε_{ℓ_k} for all x ∈ IR^n_{≥0},

and f̂^k → f(x̄), where x̄ is the accumulation point of the sequence defined by K''. Assume that there exist a positive ε and a point x∗ ∈ IR^n_{≥0} such that f(x∗) < f(x̄) − ε. Then, for all k ∈ K'',

 −ε > ⟨s^{ℓ_k} + ν^{ℓ_k}, x∗ − x̂^k⟩ − ε_{ℓ_k} ≥ −|s^{ℓ_k} + ν^{ℓ_k}| |x∗ − x̂^k| − ε_{ℓ_k}.

Lemma 7(ii) ensures that ε_{ℓ_k} → 0 and |s^{ℓ_k} + ν^{ℓ_k}| → 0 and, by Lemma 7(i), the sequence {x̂^k} is bounded. This is a contradiction, so x̄ must be a minimizer for problem (2). ⊓⊔

We end this section with a convergence result gathering the statements of Theorems 5 and 8.


Corollary 9. Suppose the primal problem (1) satisfies (21) and the separation procedure satisfies (5). Consider Algorithm 1 applied to the minimization problem (2). Suppose that the Armijo-like parameter satisfies m < 1 and that the algorithm loops forever. If at serious steps µ_{ℓ_k} ≤ µmax, while at null steps

 µ_ℓ ≤ µ_{ℓ+1}, with ∑_{ℓ : ℓ and ℓ+1 are null steps} µ_ℓ/µ_{ℓ+1}² = +∞,

then either the sequence of serious steps is finite with last element solving (2), or the sequence is infinite and minimizing for (2). ⊓⊔

In the next section we address the question of when our dynamic bundle procedure solves the original primal problem (and not just its dual).

6. Finite termination. Primal Results

We now study when the primal points generated by Algorithm 1 solve (1). In particular, we will show that when Q is finite and the primal problem is a linear program, an adequate choice of parameters yields finite termination at an optimal convex primal point. Indeed, as usual when applying a dual approach, primal points generated by Algorithm 1 can (at best) solve a convexified form of the primal problem (1).

First we show that, even if Q is not finite, when the algorithm stops at some iteration with ∆ = 0, then one such optimal convex primal point is available.

Lemma 10. Consider Algorithm 1 applied to the minimization problem (2) and the primal convex point π^{ℓast} defined in item (ii) of Lemma 2. If the algorithm stops at iteration ℓast with ∆_{ℓast} = 0 and I_{ℓast} = ∅, the following hold:

(i) ⟨g(π^{ℓast}), x̂^{k(ℓast)}⟩ = 0, and g(π^{ℓast}) ≤ 0.
(ii) If the objective function in (1) is affine, then π^{ℓast} solves the problem

 max_p C(p)
  p ∈ conv Q ⊂ IR^{mp}
  g_j(p) ≤ 0, j ∈ L := {1, ..., n},

which is (1) with Q therein replaced by conv Q.

Proof. Since ∆_{ℓast} = f(x̂^{k(ℓast)}) − f_{B_{ℓast}}(x^{ℓast}), from Lemma 2 we see that

 f(x̂^{k(ℓast)}) = ∑_{i∈B_{ℓast}} α∗_i C(p^i) − ⟨g(π^{ℓast}), x^{ℓast}⟩.

In addition, from the relation ∆_{ℓast} = ε_{ℓast} + µ_{ℓast}|x^{ℓast} − x̂^{k(ℓast)}|² (cf. (20)), we see that x^{ℓast} = x̂^{k(ℓast)}, so

 f(x^{ℓast}) = ∑_{i∈B_{ℓast}} α∗_i C(p^i) − ⟨g(π^{ℓast}), x^{ℓast}⟩. (25)


By Lemma 2, item (iii), I_{ℓast} = ∅ is equivalent to having ν^{ℓast} ∈ N_{IR^n_{≥0}}(x^{ℓast}). Using the characterization of normal elements in [HUL93, Ex. III.5.2.6(b)], we see that ν^{ℓast}_j ≤ 0 if x^{ℓast}_j = 0, and ν^{ℓast}_j = 0 if x^{ℓast}_j > 0, so their scalar product is zero: ⟨ν^{ℓast}, x^{ℓast}⟩ = 0. But, since x^{ℓast} = x̂^{k(ℓast)}, s^{ℓast} + ν^{ℓast} = 0, with s^{ℓast} = −g(π^{ℓast}). As a result, ν^{ℓast} = g(π^{ℓast}), and item (i) follows.

If C is affine, using the first assertion in item (i), relation (25) becomes f(x^{ℓast}) = C(π^{ℓast}). The second assertion in item (i) says that π^{ℓast} is feasible for the primal problem in item (ii), so the final result follows from the weak duality relation (3). ⊓⊔

The maximizer p^{ℓast} corresponding to (8) written with x^{ℓast} is also available. However, since there is no control over which maximizer p^{ℓast} is computed to evaluate the dual function, nothing can be said about its feasibility: unlike π^{ℓast}, p^{ℓast} does belong to Q, but g(p^{ℓast}) may have any sign. In fact, by Everett's theorem ([HUL93, p. 163]), all that can be asserted is that p^{ℓast} solves the problem

 max_p C(p)
  p ∈ Q ⊂ IR^{mp}
  g_j(p) ≤ g_j(p^{ℓast}), for j such that x^{ℓast}_j > 0
  g_j(p) unconstrained, otherwise.

We finish our analysis by studying the behavior of Algorithm 1 for a polyhedral dual function (Q finite and C affine), when parameters are set to tol = 0, m = 1, |B|max = ∞. We proceed somewhat similarly to [Kiw85, Ch. 2.6], using the asymptotic dual results to obtain a contradiction and show finite termination.

Suppose that Q in (1) is finite, with q elements p^1, p^2, ..., p^q. The corresponding dual function has the form

 f(x) = max_{l≤q} {C(p^l) − ∑_{j∈L} x_j g_j(p^l)}.

Therefore, at each given x^i there are at most q maximizers p^i as in (8), yielding a finitely generated subdifferential

 ∂f(x^i) = conv{−g(p^i) : p^i from (8), i ≤ q}.

Likewise, for a fixed stability center x̂ := x̂^{k(ℓ)}, linearization errors corresponding to past dual evaluations, i.e.,

 e_i = f(x̂) − C(p^i) + ⟨g(p^i), x̂ − x^i⟩,

can only take a finite number of different values, at most q. This is not the case for aggregate linearization errors corresponding to bundle compressions, because they have the expression

 ê_j = f(x̂) − ∑_{i∈B_j} α^{∗,j}_i C(p^i) + ⟨g(π^j), x̂⟩,


which can take an infinite number of different values in conv Q. This is the reason why we consider a variant of Algorithm 1 that does not use bundle compression (|B|max = ∞).
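For a finite Q, evaluating the polyhedral dual function and extracting a subgradient reduces to scanning the q linearizations. The following Python sketch (data layout and names are our own illustration, not the paper's) makes this explicit: C[l] holds C(p^l) and g[l][j] holds g_j(p^l).

```python
def polyhedral_dual_eval(C, g, x):
    """Sketch: evaluate f(x) = max_{l <= q} { C(p^l) - sum_j x_j g_j(p^l) }
    for a finite Q. Returns the value, the index of a maximizer, and the
    corresponding subgradient -g(p^l) of f at x."""
    best_val, best_l = None, None
    for l in range(len(C)):
        val = C[l] - sum(x[j] * g[l][j] for j in range(len(x)))
        if best_val is None or val > best_val:
            best_val, best_l = val, l
    subgrad = [-gj for gj in g[best_l]]   # one element of the finitely generated subdifferential
    return best_val, best_l, subgrad
```

Since only q distinct linearizations exist, the bundle without compression contains at most q different pairs (e_i, p^i), which is the finiteness used in the proof of Theorem 11 below.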

Theorem 11. Suppose the primal problem (1) is a linear program, with Q finite, and satisfies (21). Suppose, in addition, that the separation procedure satisfies (5). Consider Algorithm 1 applied to the minimization problem (2) with parameters tol = 0, m = 1, |B|max = ∞. If for all iterations µ_ℓ ≤ µmax, while at null steps µ_ℓ ≤ µ_{ℓ+1}, then the algorithm stops after a finite number of iterations, having found a primal convex point π^{ℓast} that solves the problem

 max_p C(p)
  p ∈ conv Q ⊂ IR^{mp}
  g_j(p) ≤ 0, j ∈ L := {1, ..., n},

which is (1) with Q therein replaced by conv Q.

Proof. For contradiction purposes, suppose Algorithm 1 loops forever. The assumptions on µ_ℓ imply that the asymptotic results in Theorem 5(i)-(ii) and Lemma 7 apply.

Suppose first that there is a last serious step x̄, followed by infinitely many null steps. By item (i) in Theorem 5, for all ℓ ≥ ℓast, J_ℓ = J. The Fenchel dual of (14) for ℓ ≥ ℓast has the form

 min_{α,ν} (1/(2µ_ℓ)) ∑_{j∈J} (∑_{i∈B_ℓ} α_i g_j(p^i) − ν_j)² + ∑_{i∈B_ℓ} α_i e_i − ∑_{j∈L} ν_j x̄_j
  α ∈ {z ∈ IR^{|B_ℓ|} : z_i ≥ 0, ∑_{i∈B_ℓ} z_i = 1}
  ν ∈ IR^n, ν_j ≤ 0, for j ∈ J, (26)

whose optimal value equals δ_ℓ, as defined in (20). To derive (26), recall that the Fenchel dual problem of min_y (f_1(y) + f_2(y) + f_3(y)) is min_{s,ν} (f_1∗(−s) + f_2∗(−s − ν) + f_3∗(ν)), where f∗ denotes the conjugate function. Take in (14) f_1(d_J) := ϕ(x̄ + (d_J, 0_{L\J})), f_2(d_J) := (µ/2)|d_J|², and f_3(d_J) := I_{≥0}(x̄ + (d_J, 0_{L\J})). Problem (26) is obtained by computing the respective conjugates, via various calculus rules in [HUL93, Ch. X] and the expression for ϕ∗ given in [HUL93, Ch. XV, p. 299].

The assumption that there is no aggregation implies that B_ℓ contains at most q different pairs (e_i, p^i). As a result, there is a finite number of different feasible sets in (26) for ℓ ≥ ℓast. Consider now a problem similar to (26), but with objective function

 (1/(2µ_{ℓast})) ∑_{j∈J} (∑_{i∈B_ℓ} α_i g_j(p^i) − ν_j)² + ∑_{i∈B_ℓ} α_i e_i − ∑_{j∈L} ν_j x̄_j,

and let β_ℓ denote its optimal value. Clearly, β_ℓ yields an upper bound for δ_ℓ (since µ_{ℓast} ≤ µ_ℓ) and may only take a finite number of different optimal values.


On the other hand, (µ_{ℓast}/µmax) β_ℓ ≤ (µ_{ℓast}/µ_ℓ) β_ℓ ≤ δ_ℓ. The contradiction follows, because {δ_ℓ}_{ℓ≥ℓast} is a strictly decreasing sequence, by item (ii) in Theorem 5.

Suppose now that there is an infinite number of serious steps. Since µ_{ℓ_k} ≤ µmax, by Lemma 7(ii), s^k + ν^k → 0 as k → ∞, where s^k ∈ ∂f_{B_{ℓ_k}}(x̂^{k+1}) and ν^k ∈ N_{IR^n_{J_{ℓ_k},≥0}}(x̂^{k+1}). Since the function f_{B_{ℓ_k}} is convex and polyhedral, there exists ρ_k > 0 such that if |s^k + ν^k| < ρ_k, then 0 ∈ ∂(f_{B_{ℓ_k}} + I_{IR^n_{J_{ℓ_k},≥0}})(x̂^{k+1}) and x̂^{k+1} minimizes f_{B_{ℓ_k}} on IR^n_{J_{ℓ_k},≥0}; see for example [Kiw91, Lemma 4.1]. Furthermore, for k ∈ K'' as defined in Lemma 7(iii), I_{ℓ_k} = ∅ and ν^k ∈ N_{IR^n_{≥0}}(x̂^{k+1}), so x̂^{k+1} minimizes f_{B_{ℓ_k}} on IR^n_{≥0}.

Note that the requirement that m = 1 in the descent test (Algorithm 1, Step 3), i.e.,

 f(x̂^{k+1}) ≤ f(x̂^k) − ∆_{ℓ_k} = f_{B_{ℓ_k}}(x̂^{k+1}),

means that f(x̂^{k+1}) = f_{B_{ℓ_k}}(x̂^{k+1}), because f_{B_{ℓ_k}} ≤ f by (17). It follows that for k ∈ K'' sufficiently large, f(x̂^{k+1}) = f(x̄). Consider the first index k ∈ K'' such that this equality holds and note that, since

 f_{B_{ℓ_k+1}}(y) = max{f_{B_{ℓ_k}}(y), f(x̂^{k+1}) − ⟨g(p^{ℓ_k}), x̂^{k+1} − y⟩}
  = max{f_{B_{ℓ_k}}(y), f(x̄) − ⟨g(p^{ℓ_k}), x̂^{k+1} − y⟩} ≤ f(y),

and x̂^{k+1} ∈ IR^n_{J_{ℓ_k+1},≥0}, x̂^{k+1} is also a minimizer of f_{B_{ℓ_k+1}} on IR^n_{J_{ℓ_k+1},≥0}. In particular,

 f(x̄) = f_{B_{ℓ_k+1}}(x̂^{k+1}) ≤ f_{B_{ℓ_k+1}}(x^{ℓ_k+1}).

But x^{ℓ_k+1} is the unique minimizer of f_{B_{ℓ_k+1}}(·) + (µ_{ℓ_k+1}/2)|· − x̂^{k+1}|² on IR^n_{J_{ℓ_k+1},≥0}. Hence,

 f_{B_{ℓ_k+1}}(x^{ℓ_k+1}) + (µ_{ℓ_k+1}/2)|x^{ℓ_k+1} − x̂^{k+1}|² ≤ f_{B_{ℓ_k+1}}(x̂^{k+1}) = f(x̄)

and, thus, (µ_{ℓ_k+1}/2)|x^{ℓ_k+1} − x̂^{k+1}|² = 0, so x^{ℓ_k+1} = x̂^{k+1}. Note that, because ∆_{ℓ_k+1} = f(x̂^{k+1}) − f_{B_{ℓ_k+1}}(x^{ℓ_k+1}) = 0, iteration ℓ_k + 1 is a serious-step iteration. So the previous reasoning applies at iteration ℓ_k + 2 and, hence, at all subsequent iterations. Therefore, for the next iterate in K'', say for iteration ℓ_k + M, ∆_{ℓ_k+M} = 0, I_{ℓ_k+M} = ∅, and the algorithm stops.

We have shown that Algorithm 1 always satisfies the stopping test with tol = 0 and I_{ℓast} = ∅ at some finite iteration ℓast. The final result follows from Lemma 10. ⊓⊔

For our convergence results to hold, the separation procedure must satisfy (5) and the primal problem must verify condition (21). In order to evaluate the impact of these assumptions, we analyze the behavior of Algorithm 1 on two Combinatorial Optimization problems.


7. Computational Experiments

We begin the validation of our approach by running a simple implementation of the Dynamic Bundle Method (DynBM) on two problems. A more sophisticated implementation of DynBM would certainly require the investigation of various strategies to improve the efficiency of the Linear Algebra subroutines when passing to a dynamic setting. Such enhancements are beyond the scope of this work, which is essentially theoretical.

For this reason, instead of comparing computational times, we focus our study on comparing the quality of the directions generated by DynBM with respect to other methods. In this case, an appropriate measure for comparisons is the number of dual evaluations required to achieve a desired precision. That measure is particularly relevant for problems whose dual evaluation is extremely costly (for example, SDP relaxations or simulations); see [FGRS05]. We refer to [Fra02] for a detailed study on how to exploit the structure of the QP in order to efficiently handle all the dynamic operations required by DynBM (for instance, adding/removing variables). Our implementation uses the code described in [Fra02] as a starting point, without worrying about fully exploiting the QP structure to reduce CPU times.

To test the proposed method, two combinatorial problems were investigated. For each one, a family of inequalities was dynamically dualized, and for both problems the Separation Oracle and the Dual Evaluation procedures were implemented. Before proceeding to the computational results, we briefly describe each problem formulation.

7.1. Linear Ordering Problems

The Linear Ordering Problem (LOP) consists in placing the elements of a finite set N in sequential order. If object i is placed before object j, we incur a cost c_ij. The objective is to find an order of minimum total cost.

The LOP Linear Integer Programming formulation in [GJR84] uses a set of binary variables {p_ij : (i, j) ∈ N × N}. If object i is placed before object j, then p_ij = 1 and p_ji = 0.

 min_p ∑_{(i,j): i≠j} c_ij p_ij
  p_ij + p_ji = 1, for every pair (i, j)
  p_ij ∈ {0, 1}, for every pair (i, j)
  p_ij + p_jk + p_ki ≤ 2, for every triple (i, j, k) (∗)

The 3-cycle inequalities in constraint (∗) above have a huge cardinality. They are the natural candidates for our dynamic scheme, so we associate a multiplier x_ijk with each one of them. After relaxation, the result is a concave dual function, corresponding to −f(x) in (2), whose evaluation at any given x is done by solving


(N² − N)/2 small problems with only two variables each. More precisely, each Dual Evaluation amounts to solving

 ∑_{(i,j): i<j} min_{p_ij, p_ji} { c̄_ij p_ij + c̄_ji p_ji : p_ij + p_ji = 1, p_ij, p_ji ∈ {0, 1} },

where c̄_ij = c_ij − ∑_{k∈N\{i,j}} x_ijk depends on the multiplier x.
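Since each two-variable subproblem just picks the cheaper of p_ij = 1 or p_ji = 1, the dual evaluation can be sketched in a few lines of Python. The function name, the dict-based multiplier storage, and the cost layout are our own illustrative assumptions; the constant term contributed by the relaxed right-hand sides is omitted, so this computes only the variable part of the dual value.

```python
import itertools

def lop_dual_eval(c, x, N):
    """Sketch of the LOP Dual Evaluation: for each unordered pair {i, j},
    choose the orientation with the smaller reduced cost c̄. `c` is an
    N x N cost matrix, `x` maps triples (i, j, k) to multiplier values."""
    def cbar(i, j):  # reduced cost c̄_ij = c_ij - sum_k x_ijk (convention of the text)
        return c[i][j] - sum(x.get((i, j, k), 0.0) for k in range(N) if k not in (i, j))
    value = 0.0
    order = {}  # the maximizer: p_ij values of the chosen orientation
    for i, j in itertools.combinations(range(N), 2):
        if cbar(i, j) <= cbar(j, i):   # set p_ij = 1, p_ji = 0
            value += cbar(i, j)
            order[(i, j)], order[(j, i)] = 1, 0
        else:                          # set p_ji = 1, p_ij = 0
            value += cbar(j, i)
            order[(i, j)], order[(j, i)] = 0, 1
    return value, order
```

Each pair is handled in O(N) time here because of the reduced-cost sum; a practical implementation would maintain the reduced costs incrementally as multipliers change.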

For initialization purposes, we need J_0 ≠ ∅. To obtain this first set of inequalities, we simply separate a solution of the relaxed problem with J = ∅. This procedure generates a nonempty set of inequalities to start with (otherwise we have found the optimal solution and we are done).

For this test problem, there is only a polynomial number of dynamically dualized constraints. The Separation Procedure is therefore easy: it just consists of checking every triple of indices, a task that can be done in constant time per triple. So all assumptions required for dual and finite/primal convergence are met. To illustrate the application of the method we used 49 instances of the library LOLIB [LOLIB]. Finally, we point out that, considering the feasible solutions p^1 and p^2 induced by the orderings {1, 2, ..., N − 1, N} and {N, N − 1, ..., 2, 1}, the point (1/2)p^1 + (1/2)p^2 satisfies the Slater-like condition (21). In our runs we used |B|max = 25 for the LOP instances.
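The triple-checking separation just described can be sketched as follows. The enumeration order, the tolerance, and the dict-based representation of the (possibly fractional, aggregated) point are our own illustrative choices.

```python
import itertools

def separate_3cycles(p, N, tol=1e-9):
    """Sketch of the LOP Separation Oracle: enumerate every triple and both
    cyclic orientations of the 3-cycle inequality p_ij + p_jk + p_ki <= 2,
    returning the violated ones. `p` maps ordered pairs to values in [0, 1]."""
    violated = []
    for i, j, k in itertools.combinations(range(N), 3):
        for a, b, c in ((i, j, k), (i, k, j)):  # the two orientations of the triple
            if p.get((a, b), 0.0) + p.get((b, c), 0.0) + p.get((c, a), 0.0) > 2.0 + tol:
                violated.append((a, b, c))
    return violated
```

The loop visits O(N³) triples, each in constant time, matching the polynomial bound stated above.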

7.2. Traveling Salesman Problem

Let G = G(V_n, E_n) be the undirected complete graph induced by a set of nodes V_n, and let E_n be its set of edges. For each edge e of the graph a cost c_e is given, and we associate with e a binary variable p_e, which equals one if edge e is used in the solution and zero otherwise. For every set S ⊂ V_n, we denote by δ(S) = {(i, j) ∈ E_n : i ∈ S, j ∈ V_n \ S} the set of edges with exactly one endpoint in S, and by γ(S) = {(i, j) ∈ E_n : i ∈ S, j ∈ S} the set of edges with both endpoints in S. Finally, for every set of edges A ⊂ E_n, p(A) = ∑_{e∈A} p_e. The TSP formulation of Dantzig, Fulkerson and Johnson is given by

 min_p ∑_{e∈E_n} c_e p_e
  p(δ({j})) = 2, for every j
  p(δ(S)) ≥ 2, S ⊂ V_n, S ≠ ∅
  p_e ∈ {0, 1}, e ∈ E_n. (27)

As in the well-known Held and Karp (HK) bound, we make use of 1-tree structures. Given a vertex v_1, we call a 1-tree the set of edges formed by the union of a tree on V_n \ {v_1} and any two edges having v_1 as an end node. For

 X := {x ∈ IR^{|E_n|} : x is a 1-tree},


we define a problem equivalent to (27):

 min_p ∑_{e∈E_n} c_e p_e
  p(δ({j})) = 2, for every j (#)
  p ∈ X. (28)

The HK bound is obtained by solving the dual problem resulting from dualizing constraints (28.#). In order to obtain tighter bounds, we introduce a family of facet inequalities for the associated polytope, called the r-Regular t-Path Inequalities, following the approach in [BL02a]. More precisely, we choose vertex sets H_1, H_2, ..., H_{r−1} and T_1, T_2, ..., T_t, called "handles" and "teeth" respectively, which satisfy the following relations:

 H_1 ⊂ H_2 ⊂ ··· ⊂ H_{r−1}
 H_1 ∩ T_j ≠ ∅ for j = 1, ..., t
 T_j \ H_{r−1} ≠ ∅ for j = 1, ..., t
 (H_{i+1} \ H_i) ⊂ ∪_{j=1}^{t} T_j for 1 ≤ i ≤ r − 2.

The corresponding r-Regular t-Path inequality is given by

 ∑_{i=1}^{r−1} p(γ(H_i)) + ∑_{j=1}^{t} p(γ(T_j)) ≤ ∑_{i=1}^{r−1} |H_i| + ∑_{j=1}^{t} |T_j| − (t(r − 1) + r − 1)/2.

When introducing these inequalities as additional constraints in (28), we may improve the HK bound. However, since the number of such inequalities is exponential in the size of the problem, a dynamic dualization scheme is needed. The initial set J_0 is given by the set of inequalities dualized by Held and Karp.

For this test problem, the Separation Oracle implemented had to be a heuristic search, since no efficient exact separation procedure is known. When the Separation Oracle is called to separate inequalities from 1-tree structures, the integrality and the existence of exactly one cycle make the heuristic search for violated inequalities easier to implement. The case of calling the Separation Oracle for an aggregated point (i.e., π instead of p ∈ Q) is slightly more complicated, since the point is no longer a 1-tree, but an arbitrary graph. Essentially, we first search for any node with degree greater than or equal to three, and then try to expand an odd number of paths from such a node. Computational tests were made over TSPLIB instances (see [TSPLIB]) which are symmetric and have dimension smaller than 1000 cities. In our runs we used |B|max = 80 for the TSP instances.

7.3. Comparison with a dynamic subgradient method

For our comparisons, we use a dynamic subgradient method (DynSG), as in [BL02b] and [Luc92]. This is a natural extension of the traditional (exact) subgradient method found in the literature [Sho85]. The Dual Evaluation and Separation Oracle routines for DynSG were the same as those used for DynBM. In order to define the stepsize, of the form ω(f(x^ℓ) − f∗)/|P_{J_ℓ}(s(x^ℓ))|² with ω ∈ (0, 2), we only consider the subgradient components in J_ℓ. Otherwise, the method is bound to produce small stepsizes, since |s(x^ℓ)| would be comparable to the total number of constraints. We also need an estimate f∗ of the optimal dual value, or the value of some feasible solution. We use bounds given by primal feasible points for the LOP problems, and (known) optimal values for the TSP problems. As in [BL02b] and [Luc92], the rule used to manage J_ℓ in this method is to remove an inequality whenever its multiplier is set to zero. We refer to these works for a more complete description of the dynamic subgradient method.

7.4. Numerical results

In order to compare the two methods, we specified a desired precision and computed the number of dual evaluations required to achieve that precision. This was done because DynSG inherits all the drawbacks of the original subgradient method, which lacks a reliable stopping criterion. By contrast, DynBM does have a reliable criterion, which worked for most of the instances tested.

In order to ease the analysis of our results, we report them both in Table 1, showing average values, and in Figures 1 and 2, with performance profiles; see [DM02].

Table 1 reports the number of active dualized constraints at the final point, as well as the maximum and average number of dualized constraints. Averages were taken after normalization, relative to the instance size (the number of objects N for the LOP or the number of cities n for the TSP).

                      LOP (Normalized by N)       TSP (Normalized by n)
Cardinality of J_ℓ    Active  Maximum  Average    Active  Maximum  Average

DynSG                   61      114      61       0.165   0.192    0.157

DynBM                  124      154     100       1.759   2.089    0.437

Table 1. Average statistics for the number of inequalities dualized

Note that the number of dual evaluations required by the dynamic bundle scheme is substantially lower than in the subgradient scheme. This suggests the high quality of the directions generated by the bundle method at each iteration. Furthermore, the average number of inequalities in the bundle method is consistently larger, due to the "memory" imposed by our method, which only removes inequalities at serious steps.

Performance profiles plot, for each solver, the portion of tests successfully completed against the ratio of dual calls used by that solver to the number of dual calls used by the best solver. The solvers which most rapidly reach the value of one are considered the most successful; this is the case for DynBM in Figures 1 and 2.
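The profiles in Figures 1 and 2 follow the construction of [DM02]; a minimal sketch of the computation is given below. The data layout (a mapping from solver names to per-instance call counts) is our own assumption:

```python
def performance_profile(calls, taus):
    """Dolan-More performance profile from oracle-call counts (sketch).

    calls : dict, solver name -> list of dual-call counts, one entry per
            instance (same instance order for every solver)
    taus  : performance ratios at which to evaluate each profile
    """
    solvers = list(calls)
    n_inst = len(next(iter(calls.values())))
    # best (smallest) call count on each instance, over all solvers
    best = [min(calls[s][i] for s in solvers) for i in range(n_inst)]
    profile = {}
    for s in solvers:
        ratios = [calls[s][i] / best[i] for i in range(n_inst)]
        # fraction of instances solved within a factor tau of the best solver
        profile[s] = [sum(r <= t for r in ratios) / n_inst for t in taus]
    return profile
```

A solver whose curve reaches 1 at small values of τ is never much worse than the best solver on any instance.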


[Plot: Performance Profile for the Number of Oracle Calls (LOP). Curves: Dynamic Bundle Method, Subgradient.]

Fig. 1. Performance profile for the number of oracle calls for the Linear Ordering Problem.

[Plot: Performance Profile for the Number of Oracle Calls (TSP). Curves: Dynamic Bundle Method, Subgradient.]

Fig. 2. Performance profile for the number of oracle calls for the Traveling Salesman Problem.

In its current version, the bundle code uses cold starts for the QP subproblems. This feature, together with the fact that there are more dual variables (i.e., more dualized constraints on average), explains the higher running times for DynBM. Warm starts (for example, starting QP matrix factorizations not from scratch, but from previous computations) should lead to a more favorable time distribution, as in other works such as [FG99], [FLR05], and [FGRS05].

8. Concluding Remarks

In this paper, we consider the use of bundle methods in a dynamic setting to solve combinatorial problems. Although the approach has already been considered in other recent works, the theoretical analysis presented in this paper not only provides justification for the dynamic bundle methodology, but also gives a sophisticated rule for removing inequalities that is sound both computationally


and theoretically. More precisely, by means of a constructive complementarity rule we ensure convergence to a dual solution without artificial assumptions on the sequence of iterates. The complementarity rule is general enough to allow nontrivial resettings of inequalities, while at the same time preventing eventual cycling behavior.

Another feature of our method is that, in addition to proving convergence for the dual problem, we show that, for polyhedral problems and a proper choice of parameters, DynBM terminates having found a primal point that solves a convexified form of the primal problem.

With respect to the assumption made on the separation procedure, we point out that any dynamic method will be limited by the quality of the available Separation Oracle. Our results require the separation procedure to be good enough, in the sense of (5), but the method still provides enough flexibility to combine the Separation Oracle with fast heuristics capable of speeding up the process.

After having developed a satisfactory convergence theory, we validate the approach with computational experiments that illustrate its potential. The performance profiles comparing the number of calls to the oracle for achieving a certain duality gap show an impressive superiority of DynBM over subgradient methods. To make our preliminary implementation competitive also in terms of computational times, not only should the QP structure be efficiently exploited to handle dynamic operations, but a suitable choice for the proximal parameters µ_ℓ in the QP subproblems should also be considered. This is an interesting subject that deserves further investigation.

References

[BA00] F. Barahona and R. Anbil. The volume algorithm: producing primal solutions with a subgradient method. Math. Program., 87(3, Ser. A):385–399, 2000.

[BC81] E. Balas and N. Christofides. A restricted Lagrangian approach to the traveling salesman problem. Math. Programming, 21(1):19–46, 1981.

[BdMV05] F. Babonneau, O. du Merle, and J.-P. Vial. Solving large scale linear multicommodity flow problems with an active set strategy and proximal-ACCPM. Accepted for publication in Operations Research, 2005.

[Bea90] J. E. Beasley. A Lagrangian heuristic for set-covering problems. Naval Res. Logist.,37(1):151–164, 1990.

[BGLS03] J.F. Bonnans, J.Ch. Gilbert, C. Lemaréchal, and C. Sagastizábal. Numerical Optimization. Theoretical and Practical Aspects. Universitext. Springer-Verlag, Berlin, 2003. xiv+423 pp.

[BL02a] A. Belloni and A. Lucena. Improving on the Held and Karp Bound for the STSP via Lagrangian Relaxation. Working paper, 2002.

[BL02b] A. Belloni and A. Lucena. Lagrangian Heuristics to Linear Ordering. Working paper, 2002.

[BMS02] L. Bahiense, N. Maculan, and C. Sagastizábal. The Volume Algorithm revisited. Relation with Bundle Methods. Math. Program., Ser. A, 94(1):41–69, 2002.

[CB83] N. Christofides and J. E. Beasley. Extensions to a Lagrangean relaxation approach for the capacitated warehouse location problem. European J. Oper. Res., 12(1):19–28, 1983.

[CG59] E. Cheney and A. Goldstein. Newton's method for convex programming and Tchebycheff approximations. Numerische Mathematik, 1:253–268, 1959.

[CLdS01] F. Calheiros, A. Lucena, and C. de Sousa. Optimal rectangular partitions. Technical report, Laboratorio de Metodos Quantitativos, Departamento de Administracao, Universidade Federal do Rio de Janeiro, Brazil, 2001.


[CR96] T. Christof and G. Reinelt. Combinatorial optimization and small polytopes. Top,4(1):1–64, 1996.

[DM02] E. Dolan and J. Moré. Benchmarking optimization software with performance profiles. Math. Program., 91(2, Ser. A):201–213, 2002.

[EGM94] L. F. Escudero, M. Guignard, and K. Malik. A Lagrangian relax-and-cut approach for the sequential ordering problem with precedence relationships. Ann. Oper. Res., 50:219–237, 1994. Applications of combinatorial optimization.

[Erm66] Ju. M. Ermol′ev. Method for solving nonlinear extremal problems. Kibernetika(Kiev), 2(4):1–17, 1966.

[FG99] A. Frangioni and G. Gallo. A bundle type dual-ascent approach to linear multicommodity min-cost flow problems. INFORMS J. Comput., 11(4):370–393, 1999.

[FGRS05] I. Fischer, G. Gruber, F. Rendl, and R. Sotirov. Computational experience with a bundle approach for semidefinite cutting plane relaxations of max-cut and equipartition. Accepted for publication in Mathematical Programming Series B, 2005.

[FLR05] A. Frangioni, A. Lodi, and G. Rinaldi. New approaches for optimizing over the semimetric polytope. Accepted for publication in Mathematical Programming, 2005.

[Fra02] A. Frangioni. Generalized bundle methods. SIAM Journal on Optimization,13(1):117–156, 2002.

[Gav85] B. Gavish. Augmented Lagrangian Based Algorithms for Centralized Network Design. IEEE Transactions on Communication, 33:1247–1257, 1985.

[GJR84] M. Grötschel, M. Jünger, and G. Reinelt. A cutting plane algorithm for the linear ordering problem. Oper. Res., 32(6):1195–1220, 1984.

[GP79] C. Gonzaga and E. Polak. On constraint dropping schemes and optimality functions for a class of outer approximation algorithms. SIAM Journal on Control and Optimization, 17(4):477–497, 1979.

[GPT80] C. Gonzaga, E. Polak, and R. Trahan. An improved algorithm for optimization problems with functional inequality constraints. IEEE Transactions on Automatic Control, 25:49–54, 1980.

[Hel01] C. Helmberg. A cutting plane algorithm for large scale semidefinite relaxations. Technical Report ZIB-Report ZR-01-26, Konrad-Zuse-Zentrum Berlin, 2001. To appear in the Padberg Festschrift "The Sharpest Cut", SIAM.

[HFK01] M. Hunting, U. Faigle, and W. Kern. A Lagrangian relaxation approach to the edge-weighted clique problem. European J. Oper. Res., 131(1):119–131, 2001.

[HUL93] J.-B. Hiriart-Urruty and C. Lemaréchal. Convex Analysis and Minimization Algorithms. Number 305-306 in Grund. der math. Wiss. Springer-Verlag, 1993. Volume II.

[HWC74] M. Held, Ph. Wolfe, and H.P. Crowder. Validation of subgradient optimization.Math. Programming, 6:62–88, 1974.

[Kel60] J. E. Kelley. The cutting plane method for solving convex programs. J. Soc. Indust. Appl. Math., 8:703–712, 1960.

[Kiw85] K.C. Kiwiel. Methods of descent for nondifferentiable optimization. Springer-Verlag, Berlin, 1985.

[Kiw91] K.C. Kiwiel. Exact penalty functions in proximal bundle methods for constrained convex nondifferentiable minimization. Math. Programming, 52(2, Ser. B):285–302, 1991.

[LB96] A. Lucena and J. E. Beasley. Branch and cut algorithms. In Advances in linear and integer programming, volume 4 of Oxford Lecture Ser. Math. Appl., pages 187–221. Oxford Univ. Press, New York, 1996.

[Luc92] A. Lucena. Steiner problem in graphs: Lagrangean relaxation and cutting-planes.COAL Bulletin, 21(2):2–8, 1992.

[MLM00] C. Martinhon, A. Lucena, and N. Maculan. A relax and cut algorithm for the vehicle routing problem. Technical report, Laboratorio de Metodos Quantitativos, Departamento de Administracao, Universidade Federal do Rio de Janeiro, Brazil, 2000.

[LOLIB] G. Reinelt. http://www.iwr.uni-heidelberg.de/iwr/comopt/soft/LOLIB/

[TSPLIB] G. Reinelt. http://www.iwr.uni-heidelberg.de/groups/comopt/software/TSPLIB95/

[RS03] F. Rendl and R. Sotirov. Bounds for the quadratic assignment problem using the bundle method. In Discrete Optimization: Methods and Applications, pages 287–305. Braunschweig–Kaiserslautern–Köln, 2003.

[Sho85] N. Shor. Minimization methods for non-differentiable functions. Springer-Verlag,Berlin, 1985.