ON DUAL CONVERGENCE OF THE GENERALIZED PROXIMAL POINT METHOD WITH...


MATHEMATICS OF OPERATIONS RESEARCH

Vol. 25, No. 4, November 2000, pp. 606-624

Printed in U.S.A.

ON DUAL CONVERGENCE OF THE GENERALIZED PROXIMAL POINT METHOD WITH BREGMAN DISTANCES

ALFREDO IUSEM AND RENATO D.C. MONTEIRO

The use of generalized distances (e.g., Bregman distances), instead of the Euclidean one, in the proximal point method for convex optimization, allows for elimination of the inequality constraints from the subproblems. In this paper we consider the proximal point method with Bregman distances applied to linearly constrained convex optimization problems, and study the behavior of the dual sequence obtained from the optimal multipliers of the linear constraints of each subproblem. Under rather general assumptions, which cover most Bregman distances of interest, we obtain an ergodic convergence result, namely that a sequence of weighted averages of the dual sequence converges to a specific point of the dual optimal set. As an intermediate result, we prove under the same assumptions that the dual central path generated by a large class of barriers, including the generalized Bregman distances, converges to the same point.

1. Introduction. The main goal of this paper is to analyze the behavior of dual sequences generated by generalized proximal point (GPP) algorithms with separable Bregman distances for solving the linearly constrained problem

(1) $\min\{f(x) : Ax = b,\ x \ge 0\},$

where $f : \mathbb{R}^n \to \mathbb{R}$ is a differentiable convex function, $A$ is an $m \times n$ real matrix, $b$ is a real $m$-vector and the variable $x$ is a real $n$-vector. These algorithms generate a sequence $\{x^k\}$ according to the iteration

(2) $x^{k+1} = \operatorname{argmin}\{f(x) + \lambda_k D_\varphi(x, x^k) : Ax = b\},$

where $x^0 > 0$ is arbitrary, $\{\lambda_k\}$ is a bounded sequence of positive scalars and $D_\varphi$ is the Bregman distance determined by a convex barrier function $\varphi : \mathbb{R}^n_{++} \to \mathbb{R}$ of the form $\varphi(x) = \sum_{j=1}^n \varphi_j(x_j)$ according to (43). This method is a generalization of the classical proximal point method studied in Rockafellar (1976). A complete study of the behavior of the sequence $\{x^k\}$ defined above in a more general setting can be found in Kiwiel (1997).

The optimality condition for (2) naturally determines a sequence of dual variables $\{s^k\}$ defined as $s^k = \lambda_k[\nabla\varphi(x^k) - \nabla\varphi(x^{k+1})]$, which satisfies the dual condition that $s^k \in \nabla f(x^{k+1}) + \operatorname{Im} A^T$, but not necessarily $s^k \ge 0$. A natural question is whether, under appropriate conditions, $\{s^k\}$ converges to the set of dual optimal solutions of (1) or, even stronger, to a specific dual optimal solution of (1). In this paper, we study the related issue of analyzing the behavior of the averaged dual sequence $\{\bar s^k\}$ constructed from $\{s^k\}$ as $\bar s^k = \sum_{i=1}^k \eta_{ki} s^i$, where the weights $\eta_{ki}$ are determined as $\eta_{ki} = \lambda_i^{-1} / \sum_{l=1}^k \lambda_l^{-1}$ for $i = 1, \ldots, k$. The main result we obtain is that $\{\bar s^k\}$, under appropriate conditions,

Received December 11, 1997; revised September 27, 1999, and June 29, 2000.
MSC 2000 subject classification. Primary: 90C25.
OR/MS subject classification. Primary: Nonlinear programming.
Key words. Generalized proximal point methods, barrier function, h-center of the optimal set, Bregman distances, convergence of dual sequence, central path.

0364-765X/00/2504/606/$05.00

1526-5471 electronic ISSN, © 2000, INFORMS


converges to a specific dual optimal solution of (1), namely the $h$-center of the dual optimal set with respect to the barrier $h = D_\varphi(\cdot, x^0)$ (see the paragraph before Proposition 8 for the definition of $h$-center).
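Iteration (2) and the averaged dual sequence can be illustrated on a toy instance. The sketch below is not taken from the paper: it assumes the linear objective $f(x) = c^T x$, the single constraint $\sum_j x_j = 1$, and the entropy kernel $\varphi(x) = \sum_j (x_j \log x_j - x_j)$, for which each subproblem has a closed-form solution. The function names `entropic_gpp` and `averaged_dual` are hypothetical.

```python
import math

def entropic_gpp(c, x0, lams):
    """Run iteration (2) for f(x) = c^T x on {x : sum(x) = 1} (assumed toy case)
    with the entropy kernel phi(x) = sum_j (x_j log x_j - x_j).

    Here x_j^{k+1} = x_j^k exp(-c_j / lam_k) / Z_k, and the dual vector
    s^k = lam_k [grad phi(x^k) - grad phi(x^{k+1})] equals c + lam_k log(Z_k).
    """
    x = list(x0)
    duals = []
    for lam in lams:
        w = [xj * math.exp(-cj / lam) for xj, cj in zip(x, c)]
        z = sum(w)  # normalizing constant Z_k
        duals.append([cj + lam * math.log(z) for cj in c])
        x = [wj / z for wj in w]
    return x, duals

def averaged_dual(duals, lams):
    """Weighted average with weights proportional to 1 / lam_i."""
    inv = [1.0 / lam for lam in lams]
    total = sum(inv)
    n = len(duals[0])
    return [sum(w * s[j] for w, s in zip(inv, duals)) / total for j in range(n)]

c = [1.0, 2.0, 4.0]
lams = [1.0] * 100
x, duals = entropic_gpp(c, [1.0 / 3.0] * 3, lams)
s_bar = averaged_dual(duals, lams)
print(x, s_bar)
```

In this instance the only dual optimal solution is $c - (\min_j c_j)\mathbf{1}$, and the averaged sequence approaches it, while individual vectors $s^k$ have a negative component, in line with the remark that $s^k \ge 0$ need not hold.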

The relevance of this result becomes apparent when $A$ has full row rank: If $x^* = \lim_{k\to\infty} x^k$, $s^* = \lim_{k\to\infty} \bar s^k$, and we define $y^* \in \mathbb{R}^m$ as $y^* = (AA^T)^{-1}A[\nabla f(x^*) - s^*]$, then it is rather immediate that $y^*$ is a vector of optimal multipliers for the linear constraints in (1). In many applications, knowledge of the optimal multipliers is as important as the computation of the primal solution. It is worthwhile to mention that when the proximal point method with Bregman distances, and more generally with nonquadratic regularization terms, is applied to the dual problem of a constrained convex optimization problem, it gives rise to a family of smooth augmented Lagrangian methods, which are successful in efficiently solving large-scale convex optimization problems, as shown in recent numerical experiments (see, e.g., Ben-Tal and Zibulevsky 1997). A survey on these augmented Lagrangian methods and their connection with the proximal point method with Bregman distances can be found in Iusem (1998a).
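The recovery of the multipliers from the formula $y^* = (AA^T)^{-1}A[\nabla f(x^*) - s^*]$ can be sketched in a few lines. The helper names `solve_linear` and `multipliers` and the sample data are illustrative assumptions, not part of the paper; the sketch assumes that $A$ has full row rank, so that $AA^T$ is invertible.

```python
def solve_linear(M, rhs):
    """Solve M y = rhs by Gaussian elimination with partial pivoting."""
    n = len(M)
    aug = [list(map(float, row)) + [float(r)] for row, r in zip(M, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[r][k] -= factor * aug[col][k]
    y = [0.0] * n
    for i in reversed(range(n)):
        acc = aug[i][n] - sum(aug[i][k] * y[k] for k in range(i + 1, n))
        y[i] = acc / aug[i][i]
    return y

def multipliers(A, grad_f, s):
    """Return y = (A A^T)^{-1} A (grad_f - s), assuming A has full row rank."""
    r = [g - sj for g, sj in zip(grad_f, s)]
    AAt = [[sum(a * b for a, b in zip(ri, rj)) for rj in A] for ri in A]
    Ar = [sum(a * v for a, v in zip(ri, r)) for ri in A]
    return solve_linear(AAt, Ar)

# Made-up data with grad_f - s = A^T y for y = (2, -1).
A = [[1, 0, 1], [0, 1, 1]]
grad_f = [3.0, 0.0, 2.0]
s = [1.0, 1.0, 1.0]
y = multipliers(A, grad_f, s)
print(y)
```

Because $\nabla f(x^*) - s^*$ lies in $\operatorname{Im} A^T$, the least-squares formula returns the exact multiplier vector rather than an approximation.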

Partial results regarding the behavior of the dual sequence $\{s^k\}$ have been obtained in a few papers which we now discuss. Most of these results are described in a somewhat different framework, with a $\varphi$-divergence $d_\varphi(x, y)$ instead of a Bregman distance $D_\varphi(x, y)$ in (2) (see (58) and (59) in §5). For the entropic barrier, which can be seen as either the Bregman distance or the $\varphi$-divergence induced by the functions of Examples 1(a) and 2(a) in §3, respectively, it was proved in Tseng and Bertsekas (1993) that all cluster points of the sequence $\{s^k\}$ are dual optimal solutions. The case of the shifted logarithmic barrier (i.e., the $\varphi$-divergence induced by the function of Example 2(b) in §3) was considered in Jensen and Polyak (1994), where it was proved that some cluster points of $\{s^k\}$ are dual optimal solutions. This result was improved upon in Polyak and Teboulle (1997), where it is proved that all cluster points of $\{s^k\}$ are dual optimal solutions for a larger class of $\varphi$-divergences, but with a rather restrictive assumption, namely log-convexity of the conjugate function $\varphi_j^*$ for $j = 1, \ldots, n$ (see the paragraph following (60)). These papers deal with the more general case of convex (rather than linear) constraints, but none of them establishes convergence of the whole sequence $\{s^k\}$. The only result of this type appears in Powell (1995), where convergence of $\{s^k\}$ to the $h$-center of the dual optimal set is proved, but only for linear programming with the shifted logarithmic barrier. We mention that, up to multiplicative and additive constants, the entropic barrier of Examples 1(a) and 2(a) in §3 is the only one which gives rise both to a Bregman distance and to a $\varphi$-divergence, so that all the results just mentioned apply essentially only to one Bregman distance, namely the entropic one.

With the goal of analyzing the behavior of the sequence $\{s^k\}$, we first study the behavior of the path of solutions of the following family of problems parametrized by a parameter $\mu > 0$:

(3) $\min\{f(x) + \mu D_\varphi(x, x^0) : Ax = b\}.$

With this family of problems we associate a path $\{(x(\mu), s(\mu))\}$ where $x(\mu)$ is the solution of (3) and $s(\mu) = -\mu\nabla h(x(\mu))$. The motivation for analyzing this path is that in the case of linear programming, as we discuss in §5, it happens that the averaged sequence $\{\bar s^k\}$ mentioned above belongs to the path; more precisely it holds that $\bar s^k = s(\mu_k)$ for some sequence $\{\mu_k\}$ of positive scalars converging to 0. When the objective is nonlinear the sequence is not contained in the path, but it approaches it asymptotically. Thus, we take a detour in §2 making an analysis of this path, putting together the required tools for establishing the convergence behavior of the sequence $\{\bar s^k\}$.

The analysis of the primal path $\{x(\mu)\}$ has been systematically studied in the paper by Iusem et al. (1999). The results in Iusem et al. (1999) are in turn related to the work of Auslender et al. (1997) where existence, uniqueness, and convergence of this path of solutions and an associated dual path are established, under appropriate conditions, for barriers with different properties than those of $D_\varphi(\cdot, x^0)$. Extending the work of Iusem et al. (1999) from the dual point of view, we study the existence and convergence of the dual path of solutions associated with the above family of problems in §2. Our analysis in this part uses several ideas from Auslender et al. (1997), which in turn generalizes several works dealing with the convergence behavior of the central path and the continuous trajectories of various interior-point algorithms for linear and convex programming (e.g., see McLinden 1980a,b; Megiddo 1989; Kojima et al. 1990; Adler and Monteiro 1991; Monteiro 1991, 1992; Monteiro and Zhou 1998).

Our paper is organized as follows. In §2 we study the behavior of the associated dual path of solutions of the family of problems (3) and develop a convergence result for dual sequences that asymptotically behave like the dual path. Using this asymptotic result and the fact that $\{\bar s^k\}$ asymptotically approaches points of the dual path for (3), we establish in §4 the convergence of $\{\bar s^k\}$ to a unique dual optimal solution of (1). In §3, we give several examples of well-known barriers that satisfy the required assumptions of the convergence results developed in §§2 and 4. We end the paper by giving some remarks and open problems in §5.

1.1. Notation and terminology. The following notation is used throughout the paper. The superscript $T$ denotes transpose. Let $\mathbb{R}^p$ denote the $p$-dimensional Euclidean space and define $\mathbb{R}^p_+ = \{x \in \mathbb{R}^p : x_j \ge 0,\ j = 1, \ldots, p\}$ and $\mathbb{R}^p_{++} = \{x \in \mathbb{R}^p : x_j > 0,\ j = 1, \ldots, p\}$. The set of all $p \times q$ matrices with real entries is denoted by $\mathbb{R}^{p \times q}$. If $J$ is a finite index set, then $|J|$ denotes its cardinality, that is the number of elements of $J$. The Euclidean norm is denoted by $\|\cdot\|$. For a matrix $E$, $\operatorname{Im} E$ denotes the subspace generated by the columns of $E$ and $\operatorname{Null} E$ denotes the subspace orthogonal to the rows of $E$. The $i$th component of a vector $w \in \mathbb{R}^n$ is denoted by $w_i$, for every $i = 1, \ldots, n$. Given an index set $J \subseteq \{1, \ldots, n\}$ and a vector $w \in \mathbb{R}^n$, we denote the subvector $[w_j]_{j \in J}$ by $w_J$; conversely, a vector $x \in \mathbb{R}^{|J|}$ is often denoted by $x_J$ (when we want to index its components by elements of $J$) and the set of these indexed vectors is denoted by $\mathbb{R}^J$. For $Y \subseteq \mathbb{R}^n$, we let $\operatorname{int} Y$, $\operatorname{ri} Y$ and $\operatorname{cl}(Y)$ denote, respectively, the interior, relative interior and closure of $Y$. As in Hiriart-Urruty and Lemaréchal (1993), we let $\operatorname{Conv}(\mathbb{R}^n)$ denote the set of proper convex functions defined on $\mathbb{R}^n$, that is the set of functions $g : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ such that its epigraph $\operatorname{epi} g = \{(x, r) \in \mathbb{R}^n \times \mathbb{R} : g(x) \le r\}$ is nonempty and convex. We let $\overline{\operatorname{Conv}}(\mathbb{R}^n)$ denote the subset of $\operatorname{Conv}(\mathbb{R}^n)$ consisting of the closed functions. Given $g \in \operatorname{Conv}(\mathbb{R}^n)$, we denote its effective domain by $\operatorname{dom} g$ and its subdifferential by $\partial g$; moreover, we define

$\operatorname{dom}(\partial g) = \{x \in \mathbb{R}^n : \partial g(x) \ne \emptyset\}$ and $\operatorname{Im}(\partial g) = \bigcup\{\partial g(x) : x \in \mathbb{R}^n\}.$

2. The dual central path associated with a class of barriers. We consider the linearly constrained convex programming problem

(4) $\min\{f(x) : Ax = b,\ x \ge 0\},$

with $f : \mathbb{R}^n \to \mathbb{R}$ convex and differentiable, $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$. We make two assumptions on Problem (4), whose solution set will be denoted as $X^*$.

(A1) $X^* \ne \emptyset$;
(A2) $\{x \in \mathbb{R}^n : Ax = b,\ x > 0\} \ne \emptyset$.

Associated with Problem (4), we have the Lagrangian dual problem

(5) $\max\{\psi(s) : s \ge 0\},$

where $\psi : \mathbb{R}^n \to \mathbb{R} \cup \{-\infty\}$ is defined as $\psi(s) = \inf\{f(x) - x^T s : Ax = b\}$ for all $s \in \mathbb{R}^n$. Under Condition (A1), it follows from Theorem VII.4.5.1 of Hiriart-Urruty and Lemaréchal (1993) that the set of optimal solutions of (5), which we denote by $S^*$, is a nonempty polyhedral set, namely

(6) $S^* = \{s \in \mathbb{R}^n_+ : s \in \nabla f(\bar x) + \operatorname{Im} A^T,\ \bar x^T s = 0\},$

where $\bar x$ is an arbitrary element of $X^*$. Moreover, it is known that $S^*$ is bounded when, in addition, (A2) holds (see Theorem VII.2.3.2 of Hiriart-Urruty and Lemaréchal 1993).

We consider separable barrier functions $h$ for the nonnegative orthant of the form $h(x) = \sum_{j=1}^n h_j(x_j)$, where:

(A3) $h_j \in \overline{\operatorname{Conv}}(\mathbb{R})$ is such that $\operatorname{int}(\operatorname{dom}(h_j)) = \mathbb{R}_{++}$ and $h_j$ is strictly convex and differentiable on $\mathbb{R}_{++}$, for every $j = 1, \ldots, n$;
(A4) $\lim_{t \downarrow 0} h_j'(t) = -\infty$ for every $j = 1, \ldots, n$;
(A5) there exists $\hat x \in \{x \in \mathbb{R}^n_{++} : Ax = b\}$ such that $\nabla h(\hat x) = 0$.

We also need an assumption involving either Problem (4) or the barrier function $h$:

(A6) either
(i) $\lim_{t \downarrow 0} h_j(t) < \infty$ for $j = 1, \ldots, n$, or
(ii) $X^*$ is bounded, or
(iii) $f$ is linear.

In connection with (A5), we mention that given a barrier $\varphi$ which satisfies the remaining conditions and a point $\hat x \in \{x \in \mathbb{R}^n_{++} : Ax = b\}$, it is easy to construct a barrier $h$ satisfying all the conditions above (see, for example, Relation (39) in §3). Such an $\hat x$ can be found either through a standard modification of the problem as is customary in the initialization of primal interior point methods (see, e.g., Adler et al. 1989), or by performing a first step of the algorithm (2) with a zone coercive barrier $h$ (i.e., such that $\lim_{t \downarrow 0} h_j'(t) = -\infty$, $\lim_{t \to \infty} h_j'(t) = \infty$), which need not satisfy (A5), and then taking $\hat x = x^1$. The solution $x^1$ of such a first subproblem exists by Proposition 2 of Iusem et al. (1999). In connection with (A6), we mention that condition (A6)(i) means that $h$ restricted to $\mathbb{R}^n_+$ is continuous.

The central path $\{x(\mu) : \mu > 0\}$ with respect to the barrier $h$ is defined as follows. For $\mu > 0$, let

(7) $x(\mu) = \operatorname{argmin}\{f(x) + \mu h(x) : Ax = b\} = \operatorname{argmin}\Big\{f(x) + \mu \sum_{j=1}^n h_j(x_j) : Ax = b\Big\}.$

We next present some properties of the central path established in Iusem et al. (1999). In this reference, the central path is defined for a general variational inequality problem with monotone operator $T : \mathbb{R}^n \to \mathcal{P}(\mathbb{R}^n)$ and constraint set $C \subseteq \mathbb{R}^n$ as being the path $\{x(\mu) : \mu > 0\}$ where, for every $\mu > 0$, $x(\mu)$ is the unique solution of

(8) $0 \in T(x) + \mu\nabla h(x),$

with $h$ having the property that $\nabla h$ "diverges" on the boundary of $C$ ($h$ is not required to be separable). Problem (3) is a special case of this framework in which the pair $(T, C)$ is given by $C = \mathbb{R}^n_+$ and $T(x) = \nabla f(x) + \partial I_L(x)$, where $L = \{x \in \mathbb{R}^n : Ax = b\}$ and $I_L$ is the indicator function of $L$, i.e., $I_L(x) = 0$ if $x \in L$, and $I_L(x) = \infty$ otherwise. It can be verified that the hypotheses on $h$, $T$, and $C$ made in the results from Iusem et al. (1999) quoted in the next proposition hold under the corresponding assumptions for each item. We need some notation first. We say that $\bar x$ is a cluster point of the central path $\{x(\mu) : \mu > 0\}$ if $\bar x = \lim_{k \to \infty} x(\mu_k)$ for some sequence $\{\mu_k\} \subset \mathbb{R}_{++}$ such that $\lim_{k \to \infty} \mu_k = 0$. Let $B = \{j : x_j^* > 0 \text{ for some } x^* \in X^*\}$ and $N = \{1, \ldots, n\} \setminus B$.

PROPOSITION 1.

(i) Under Conditions (A1)-(A5), $x(\mu)$ exists, is unique and strictly positive for every $\mu > 0$.

(ii) Under Conditions (A1)-(A6), the central path $\{x(\mu) : \mu > 0\}$ is bounded and all its cluster points belong to $X^*$.

(iii) If (A1)-(A5) and (A6)(i) hold, then $\lim_{\mu \to 0} x(\mu) = x^*$, where $x^* =$ [...]

(iv) If (A1)-(A5) and (A6)(iii) hold, then $\lim_{\mu \to 0} x(\mu) = x^*$, where $x^* =$ [...]

PROOF. See the following results of Iusem et al. (1999): Proposition 2 for (i), Propositions 4 and 5 for (ii), Theorem 1 for (iii), and Theorem 2 for (iv). Similar results under slightly different assumptions on the barrier $h$ can be found in Proposition 2.5 of Auslender et al. (1997). Regarding the connection between our assumptions and those of Iusem et al. (1999), we mention that Assumption H13 of this reference holds because $T$ is the subdifferential of the convex function $f + I_L$, which is bounded below by (A1) (see, e.g., the comment after H13 in Iusem et al. 1999, or Proposition 3.1 of Burachik 1995) and that our Assumption (A6) substitutes for H14 in Proposition 4 of Iusem et al. (1999). □

We remark that we will not use Items (iii) and (iv) of the previous proposition in our analysis. They are mentioned only to illustrate situations in which $\lim_{\mu \to 0} x(\mu)$ can be specifically characterized.

The recession cone of a closed convex set $C \subseteq \mathbb{R}^n$ is the set $C_\infty = \{d \in \mathbb{R}^n : x + td \in C \text{ for all } t \ge 0 \text{ and } x \in C\}$. The following result summarizes some useful facts from convex analysis that will be used in our presentation.

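PROPOSITION 2.

(i) If $C \subseteq \mathbb{R}^n$ is a closed convex set, then $C$ is bounded if and only if $C_\infty = \{0\}$.

(ii) If $C$ and $D$ are two closed convex subsets of $\mathbb{R}^n$ such that $C \cap D \ne \emptyset$, then $(C \cap D)_\infty = C_\infty \cap D_\infty$.

(iii) If $C \subseteq \mathbb{R}^n$ is convex, $x \in \operatorname{cl} C$ and $x' \in \operatorname{ri} C$ then $(1 - t)x + tx' \in \operatorname{ri} C$ for all $t \in (0, 1)$.

(iv) If $g \in \overline{\operatorname{Conv}}(\mathbb{R}^n)$ then $g(x) = \lim_{t \to 0^+} g(x + t(x' - x))$ for all $x \in \mathbb{R}^n$ and $x' \in \operatorname{ri}(\operatorname{dom} g)$.

(v) If $g \in \overline{\operatorname{Conv}}(\mathbb{R}^n)$ then $s \in \partial g(x)$ if and only if $x \in \partial g^*(s)$; in particular, $\operatorname{Im}(\partial g) = \operatorname{dom}(\partial g^*)$ and $\operatorname{Im}(\partial g^*) = \operatorname{dom}(\partial g)$.

(vi) Let $\psi \in \overline{\operatorname{Conv}}(\mathbb{R})$ be strictly convex and differentiable on $I = \operatorname{int}(\operatorname{dom} \psi)$. Then, $\psi^* \in \overline{\operatorname{Conv}}(\mathbb{R})$ is strictly convex and differentiable on $I^* = \operatorname{int}(\operatorname{dom} \psi^*)$. Moreover, $\psi'$ maps $I$ homeomorphically onto $I^*$ and $(\psi')^{-1}(s) = (\psi^*)'(s)$ for all $s \in I^*$.

(vii) If $C \subseteq \mathbb{R}^n$ is a convex set such that $\operatorname{ri} C = C$ and $g_k : C \to \mathbb{R}$, $k = 1, 2, \ldots$, is a sequence of convex functions converging pointwise to $g : C \to \mathbb{R}$, then $g$ is convex and $\{g_k\}$ converges uniformly to $g$ over any compact subset $K$ of $C$.

(viii) If $\bar x \in \mathbb{R}^n$ is a solution of the problem $\min\{g(x) : x \in C\}$ with $g \in \operatorname{Conv}(\mathbb{R}^n)$, $C \subseteq \mathbb{R}^n$ closed and polyhedral, and $\operatorname{ri}(\operatorname{dom} g) \cap C \ne \emptyset$, then there exists $u \in \partial g(\bar x)$ such that $u^T(x - \bar x) \ge 0$ for all $x \in C$.

PROOF. The first six items follow from results in Hiriart-Urruty and Lemaréchal (1993), as follows: (i) from Proposition III.2.2.3, (ii) from Proposition III.2.2.5, (iii) from Lemma III.2.1.6, (iv) from Proposition IV.1.2.5, (v) from Corollary X.1.4.4, and (vi) from Proposition I.6.2.1, Item (v) and some simple arguments. Items (vii) and (viii) follow from Theorem 10.8 and Theorem 27.4 of Rockafellar (1970), respectively. □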

We introduce next the dual central path for (4) associated with the barrier $h$. For every $\mu > 0$ define $s(\mu) \in \mathbb{R}^n$ as

(9) $s(\mu) = -\mu\nabla h(x(\mu)).$

The optimality condition for $x(\mu)$ to be a solution of (7) is that $Ax(\mu) = b$ and

(10) $s(\mu) \in \nabla f(x(\mu)) + \operatorname{Im} A^T.$
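As an illustration of (7), (9) and (10), the following sketch evaluates the primal and dual central paths in closed form for a toy instance that is assumed here and does not come from the paper: $f(x) = c^T x$, the single constraint $\sum_j x_j = n$, and the barrier $h_j(t) = t\log t - t + 1$, for which $\nabla h(\hat x) = 0$ at the feasible point $\hat x = (1, \ldots, 1)$. The function name `central_path_point` is hypothetical.

```python
import math

def central_path_point(c, mu):
    """Return (x(mu), s(mu)) for min c^T x s.t. sum(x) = n, with the assumed
    barrier h_j(t) = t log t - t + 1, so that h_j'(t) = log t.

    Stationarity c_j + mu * log x_j = y gives x_j proportional to
    exp(-c_j / mu); a log-sum-exp shift keeps the computation stable.
    """
    n = len(c)
    shifted = [-cj / mu for cj in c]
    m = max(shifted)
    lse = m + math.log(sum(math.exp(v - m) for v in shifted))
    log_x = [math.log(n) + v - lse for v in shifted]
    x = [math.exp(v) for v in log_x]
    s = [-mu * v for v in log_x]  # s(mu) = -mu * grad h(x(mu)), as in (9)
    return x, s

c = [1.0, 2.0, 4.0]
for mu in (1.0, 0.1, 1e-3):
    x, s = central_path_point(c, mu)
    print(mu, x, s)
```

In this instance $s(\mu) - c$ is a multiple of the all-ones vector, which is condition (10) for $A = (1, \ldots, 1)$, and $s(\mu)$ approaches $c - (\min_j c_j)\mathbf{1}$ as $\mu$ decreases.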

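The path $\{s(\mu) : \mu > 0\}$ will be called the dual central path with respect to the barrier $h$. We are interested in the behavior of $s(\mu)$ as $\mu$ goes to 0. We start by characterizing $s(\mu)$ as the solution of a convex optimization problem. Let $h_\mu : \mathbb{R}^n \to \mathbb{R} \cup \{\infty\}$ be the function defined by

(11) $h_\mu(s) = h^*(-s/\mu) = \sum_{j=1}^n h_j^*(-s_j/\mu),$

where $h^*$ and $h_j^*$ denote the convex conjugate of $h$ and $h_j$, respectively. The second expression in this definition is due to the fact that $h^*(v) = \sum_{j=1}^n h_j^*(v_j)$ for all $v \in \mathbb{R}^n$.

The following result characterizes the domain of $h_\mu$.

PROPOSITION 3. For some scalars $\gamma_1, \ldots, \gamma_n \in (0, \infty]$, we have

$\operatorname{int}(\operatorname{dom} h_\mu) = \operatorname{dom}(\partial h_\mu) = (-\mu\gamma_1, \infty) \times \cdots \times (-\mu\gamma_n, \infty).$

PROOF. By (11), it is sufficient to show that $\operatorname{int}(\operatorname{dom} h_j^*) = \operatorname{dom}(\partial h_j^*) = (-\infty, \gamma_j)$ for all $j = 1, \ldots, n$, where the $\gamma_j$'s are as stated above. Indeed, let $\gamma_j = \lim_{t \to \infty} h_j'(t)$, and note that this limit is well defined (possibly equal to $+\infty$) due to the fact that, by (A3), $h_j'$ is strictly increasing. Using (A3) again and (A4) we conclude that $\operatorname{Im}(\partial h_j) = (h_j')(\mathbb{R}_{++}) = (-\infty, \gamma_j)$. Hence, using Proposition 2(v), we conclude that $\operatorname{dom}(\partial h_j^*) = \operatorname{Im}(\partial h_j) = (-\infty, \gamma_j)$. Also, by Proposition 2(vi), we have $\operatorname{int}(\operatorname{dom} h_j^*) = (h_j')(\operatorname{int}(\operatorname{dom} h_j)) = (h_j')(\mathbb{R}_{++}) = (-\infty, \gamma_j)$. By (A3) and (A5), we have $\gamma_j > h_j'(\hat x_j + 1) > h_j'(\hat x_j) = 0$, where $\hat x$ is as in (A5). Hence, the result follows. □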

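Next we present the optimization problem whose solution is $s(\mu)$.

PROPOSITION 4. Take any $\tilde x \in \{x \in \mathbb{R}^n : Ax = b\}$. Then $s(\mu)$ belongs to $\operatorname{int}(\operatorname{dom} h_\mu)$ and is the unique solution of

(12) $\min\{\tilde x^T s + \mu h_\mu(s) : s \in \nabla f(x(\mu)) + \operatorname{Im} A^T\}.$

PROOF. Observe that by (10), $s(\mu)$ is a feasible solution of (12). Using (9), (11), and Proposition 2(vi) applied to $h_j$, we obtain

(13) $\nabla h_\mu(s(\mu)) = -\frac{1}{\mu}\nabla h^*\!\left(-\frac{s(\mu)}{\mu}\right) = -\frac{1}{\mu}\nabla h^*(\nabla h(x(\mu))) = -\frac{x(\mu)}{\mu}.$

Hence, by Proposition 3, we have $s(\mu) \in \operatorname{dom}(\partial h_\mu) = \operatorname{int}(\operatorname{dom} h_\mu)$. Using (7) and (13), we obtain

(14) $\tilde x + \mu\nabla h_\mu(s(\mu)) = \tilde x - x(\mu) \in \operatorname{Null} A,$

which shows that $s(\mu)$ satisfies the optimality conditions of (12). Assumption (A3) and Proposition 2(vi) imply that $h_j^*$, for all $j = 1, \ldots, n$, and hence $h_\mu$, is strictly convex. As a consequence, it follows that $s(\mu)$ is the unique optimal solution of problem (12). □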
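Proposition 4 can be checked numerically on a small example. The sketch below assumes a toy setting that is not in the paper: $f(x) = c^T x$, $A = (1, \ldots, 1)$, $b = n$, the barrier $h_j(t) = t\log t - t + 1$ (whose conjugate is $h_j^*(v) = e^v - 1$), and $\tilde x = (1, \ldots, 1)$. Feasible points of (12) then have the form $s = c - y\mathbf{1}$, and the objective is minimized over the scalar $y$ by bisection on its derivative. The function names are hypothetical.

```python
import math

def slope(y, c, mu):
    """Derivative in y of x~^T s + mu * h_mu(s) along s = c - y * 1,
    for x~ = (1, ..., 1) and h_mu(s) = sum_j (exp(-s_j / mu) - 1)."""
    return sum(math.exp((y - cj) / mu) for cj in c) - len(c)

def dual_by_bisection(c, mu, tol=1e-12):
    """Minimize the objective of (12) over the line s = c - y * 1."""
    lo = min(c)                          # slope(lo) <= 0
    hi = min(c) + mu * math.log(len(c))  # slope(hi) >= 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if slope(mid, c, mu) > 0:
            hi = mid
        else:
            lo = mid
    y = 0.5 * (lo + hi)
    return [cj - y for cj in c]

def dual_closed_form(c, mu):
    """s(mu) = -mu * log x(mu), with x(mu) from the stationarity conditions."""
    shifted = [-cj / mu for cj in c]
    m = max(shifted)
    lse = m + math.log(sum(math.exp(v - m) for v in shifted))
    return [cj + mu * (lse - math.log(len(c))) for cj in c]

c = [1.0, 2.0, 4.0]
s_min = dual_by_bisection(c, 0.5)
s_path = dual_closed_form(c, 0.5)
print(s_min, s_path)
```

Both routes give the same vector up to the bisection tolerance, which is what the proposition predicts for this instance.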

Next we prove that the dual central path $\{s(\mu) : \mu > 0\}$ is bounded. We need first a preliminary result of some interest on its own, which requires some notation. Let $\mathcal{P}$ denote the set of all triplets $P = (I, J, K)$ where $I, J, K$ are pairwise disjoint subsets of $\{1, \ldots, n\}$ satisfying $I \cup J \cup K = \{1, \ldots, n\}$. For $a \in \mathbb{R}^n$ and $P \in \mathcal{P}$ define $O_a^P \subseteq \mathbb{R}^n$ as

(15) $O_a^P = \{x \in \mathbb{R}^n : x_j > a_j \text{ for } j \in I,\ x_j = a_j \text{ for } j \in J,\ x_j < a_j \text{ for } j \in K\}.$

For $Y \subseteq \mathbb{R}^n$ and $a \in \mathbb{R}^n$, let

(16) $B_a(Y) = \bigcup\{\operatorname{cl}(O_a^P) \cap Y : P \in \mathcal{P} \text{ such that } \operatorname{cl}(O_a^P) \cap Y \text{ is bounded}\}.$

For an arbitrary $Y \subseteq \mathbb{R}^n$, the set $B_a(Y)$ may be empty. As a consequence of Lemmas 1 and 2 stated below, it follows that $B_a(Y)$ is nonempty whenever $Y$ is an affine manifold, or, more generally, a certain set of parallel affine manifolds. Observe that $B_a(Y)$ is always bounded since it is the union of a finite number of bounded sets.
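The regions $O_a^P$ of (15) partition $\mathbb{R}^n$ according to the componentwise comparison of $x$ with $a$. The small sketch below, with hypothetical helper names `sign_pattern` and `in_closure`, computes the triplet $P = (I, J, K)$ whose region contains a given point and tests membership in $\operatorname{cl}(O_a^P)$; indices are 1-based to match the text.

```python
def sign_pattern(x, a):
    """Return the triplet (I, J, K) such that x lies in the region O_a^P of (15)."""
    pairs = list(enumerate(zip(x, a), start=1))
    I = {j for j, (xj, aj) in pairs if xj > aj}
    J = {j for j, (xj, aj) in pairs if xj == aj}
    K = {j for j, (xj, aj) in pairs if xj < aj}
    return I, J, K

def in_closure(x, a, pattern):
    """Test x in cl(O_a^P): the strict inequalities of (15) are relaxed."""
    I, J, K = pattern
    for j, (xj, aj) in enumerate(zip(x, a), start=1):
        if j in I and xj < aj:
            return False
        if j in J and xj != aj:
            return False
        if j in K and xj > aj:
            return False
    return True

a = [0.0, 0.0, 0.0]
P = sign_pattern([2.0, 0.0, -1.0], a)
print(P, in_closure([0.0, 0.0, 0.0], a, P))
```

Every point belongs to exactly one region $O_a^P$, which is the fact used at the start of the second half of the proof of Lemma 1.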

LEMMA 1. Let vectors $a, u \in \mathbb{R}^n$, and a linear subspace $H \subseteq \mathbb{R}^n$ be given. Suppose that $g \in \overline{\operatorname{Conv}}(\mathbb{R}^n)$ is of the form $g(w) = \sum_{j=1}^n g_j(w_j)$. Assume also that $\operatorname{ri}(\operatorname{dom} g) \cap (H + u) \ne \emptyset$ and, for each $j = 1, \ldots, n$, $g_j$ is strictly convex and $0 \in \partial g_j(a_j)$. Then, the problem

(17) $\min\{g(w) : w \in H + u\}$

has a unique solution $\bar w$ which belongs to $B_a(H + u)$.

PROOF. Observe that $a$ is the unique unconstrained minimizer of $g$ since $0 \in \partial g(a) = \partial g_1(a_1) \times \cdots \times \partial g_n(a_n)$ and each $g_j$, and hence $g$, is strictly convex. Since $g \in \overline{\operatorname{Conv}}(\mathbb{R}^n)$, it follows that all nonempty level sets of $g$ are compact. The assumption implies that $\operatorname{dom} g \cap (H + u) \ne \emptyset$, which together with the compactness of the level sets of $g$ and the closedness of $H + u$ guarantees that (17) has a solution $\bar w \in H + u$. Moreover, $\bar w$ is unique due to the strict convexity of $g$. The assumption that $\operatorname{ri}(\operatorname{dom} g) \cap (H + u) \ne \emptyset$ and Proposition 2(viii) imply that $\bar w$ satisfies the optimality condition that for some $q \in \partial g(\bar w)$,

(18) $q^T d = 0$ for all $d \in H$.

We now show that $\bar w \in B_a(H + u)$. Since $\mathbb{R}^n = \bigcup_{P \in \mathcal{P}} O_a^P$, there exists $P \in \mathcal{P}$ such that $\bar w \in O_a^P \cap (H + u)$. We claim that the set $\operatorname{cl}(O_a^P) \cap (H + u)$ is bounded. Indeed, assume that this set is unbounded. Since $\operatorname{cl}(O_a^P) \cap (H + u)$ is closed and convex, we may apply Proposition 2(i) to conclude that there exists $0 \ne d \in \mathbb{R}^n$ such that the halfline $\{\bar w + td : t \ge 0\}$ is contained in $\operatorname{cl}(O_a^P) \cap (H + u)$. In particular, the halfline is contained in the affine manifold $H + u$, and thus $d$ belongs to the subspace $H$. We claim now that

(19) $d_j > 0 \Rightarrow \bar w_j > a_j,$

(20) $d_j < 0 \Rightarrow \bar w_j < a_j.$

Indeed, assume that $j$ satisfies $d_j > 0$. Then, we have $\bar w_j + t d_j > a_j$ for large enough $t$. Since $\bar w + td \in \operatorname{cl}(O_a^P) = \{x \in \mathbb{R}^n : x_j \ge a_j \text{ for } j \in I,\ x_j = a_j \text{ for } j \in J,\ x_j \le a_j \text{ for } j \in K\}$, we conclude that $j \in I$. Because $\bar w \in O_a^P$ and by the definition of $O_a^P$, it follows that $\bar w_j > a_j$. The same argument applies to an index $j$ such that $d_j < 0$, in which case $j$ belongs to $K$ and thus $\bar w_j < a_j$. We have thus established the above claim.

Noting that $\partial g_j$ is strictly monotone due to the strict convexity of $g_j$ and using the fact that $0 \in \partial g_j(a_j)$ and $q_j \in \partial g_j(\bar w_j)$, we conclude that

(21) $\bar w_j > a_j \Rightarrow q_j > 0,$

(22) $\bar w_j < a_j \Rightarrow q_j < 0.$

It follows from the implications (19)-(22) that $d_j \ne 0 \Rightarrow q_j d_j > 0$. Since $d \ne 0$, this implies that $q^T d > 0$. On the other hand, since $d \in H$ it follows from (18) that $q^T d = 0$, thus yielding the desired contradiction. We have thus shown that $\operatorname{cl}(O_a^P) \cap (H + u)$ is bounded, and hence that $\bar w \in B_a(H + u)$. □


We observe that the above result for the case in which $g(w) = \|Dw\|^2$, for some positive diagonal matrix $D$, has already been derived in Todd (1990) (see the paragraph before Proposition 2 of Todd 1990).

The following result allows us to show that the solution w of problem (17) remainsbounded when the (parameter) vector u varies in a bounded set.

LEMMA 2. If $U \subseteq \mathbb{R}^n$ is a bounded set and $H \subseteq \mathbb{R}^n$ is a linear subspace then, for any $a \in \mathbb{R}^n$, we have

$\bigcup_{u \in U} B_a(H + u) = B_a(H + U).$

PROOF. We claim that if $P \in \mathcal{P}$ and $u \in U$ are such that $\operatorname{cl}(O_a^P) \cap (H + u)$ is nonempty and bounded then $\operatorname{cl}(O_a^P) \cap (H + U)$ is bounded (and nonempty). For the proof of this claim, we may assume that $U$ is a closed convex set since otherwise we may consider the closed convex hull of $U$, which is also a bounded set. Using the fact that $U$ is bounded, we easily see that $(H + U)_\infty = H = (H + u)_\infty$. Hence, by Proposition 2(ii), we conclude that the recession cones of $\operatorname{cl}(O_a^P) \cap (H + u)$ and $\operatorname{cl}(O_a^P) \cap (H + U)$ are identical. It follows from Proposition 2(i) that these recession cones are equal to $\{0\}$, and hence that $\operatorname{cl}(O_a^P) \cap (H + U)$ is bounded.

Using the above claim, we now prove the lemma. Assume first that $w \in B_a(H + u)$ for some $u \in U$. Then, there exists $P \in \mathcal{P}$ such that $w \in \operatorname{cl}(O_a^P) \cap (H + u)$ and $\operatorname{cl}(O_a^P) \cap (H + u)$ is bounded. By the claim, $\operatorname{cl}(O_a^P) \cap (H + U)$ is also bounded and since $w$ is obviously in this set, it follows that $w \in B_a(H + U)$. This proves that $\bigcup_{u \in U} B_a(H + u) \subseteq B_a(H + U)$. Assume now that $w \in B_a(H + U)$. Then, there exists $P \in \mathcal{P}$ such that $w \in \operatorname{cl}(O_a^P) \cap (H + U)$ and $\operatorname{cl}(O_a^P) \cap (H + U)$ is bounded. It follows that $w \in \operatorname{cl}(O_a^P) \cap (H + u)$ for some $u \in U$. The set $\operatorname{cl}(O_a^P) \cap (H + u)$ is obviously bounded since it is contained in $\operatorname{cl}(O_a^P) \cap (H + U)$. Thus, $w \in B_a(H + u)$ and the inclusion $\bigcup_{u \in U} B_a(H + u) \supseteq B_a(H + U)$ follows. □

Next we use Lemmas 1 and 2 to establish boundedness of the dual central path $\{s(\mu)\}$.

PROPOSITION 5. Under Conditions (A1)-(A6) the curve $\{s(\mu) : \mu > 0\}$ defined by (9) is bounded.

PROOF. We will cast the optimization problem (12) in the framework of Lemma 1. Take $H = \operatorname{Im} A^T$, $u = \nabla f(x(\mu))$, $a = 0$ and $g(s) = \sum_{j=1}^n g_j(s_j)$, where $g_j(t) = \hat x_j t + \mu h_j^*(-t/\mu)$ for every $t \in \mathbb{R}$ and $\hat x$ is as in (A5). By (11) and Proposition 4 with $\tilde x = \hat x$, $s(\mu)$ is the solution $\bar w$ of problem (17) for this choice of $g$, $H$ and $u$. We now check that the hypotheses of Lemma 1 hold. Assumption (A3) and Proposition 2(vi) imply that $h_j^*$, and hence $g_j$, is strictly convex. The closedness of $g$ follows from the closedness of the conjugate functions $h_j^*$. By Proposition 4, $s(\mu) \in \operatorname{int}(\operatorname{dom} h_\mu) \cap (\operatorname{Im} A^T + \nabla f(x(\mu))) = \operatorname{int}(\operatorname{dom} g) \cap (H + u)$. Finally, we check that $0 \in \partial g_j(a_j)$, or equivalently $g_j'(0) = 0$, for $j = 1, \ldots, n$. Note that $g_j'(t) = \hat x_j - (h_j^*)'(-t/\mu)$, so that $g_j'(0) = \hat x_j - (h_j^*)'(0)$. Using (A5) and Proposition 2(vi), we conclude that $g_j'(0) = 0$. Since all the assumptions of Lemma 1 hold, it follows that $s(\mu) \in B_0(H + u) = B_0(H + \nabla f(x(\mu)))$ for all $\mu > 0$. Now, let $U = \{\nabla f(x(\mu)) : \mu > 0\}$. Note that the set $\{x(\mu) : \mu > 0\}$ is bounded by Proposition 1(ii). Together with the convexity and differentiability of $f$, whose effective domain is $\mathbb{R}^n$, this implies that $U$ is bounded. Hence, by Lemma 2 we conclude that $s(\mu) \in B_0(H + \nabla f(x(\mu))) \subseteq B_0(H + U)$ for all $\mu > 0$. Since $B_0(H + U)$ is bounded, the result follows. □

A point $\bar s \in \mathbb{R}^n$ will be said to be a cluster point of $\{s(\mu)\}$ if $\bar s = \lim_{k \to \infty} s(\mu_k)$ for some $\{\mu_k\} \subset \mathbb{R}_{++}$ such that $\lim_{k \to \infty} \mu_k = 0$. We prove next that the cluster points of $\{s(\mu)\}$, which exist by Proposition 5, are dual solutions of Problem (4), i.e., solutions of Problem (5).

PROPOSITION 6. Under Conditions (A1)-(A6), if $\bar s$ is a cluster point of $\{s(\mu)\}$ then $\bar s$ belongs to $S^*$.

PROOF. In view of (6), it suffices to show that

(23) $\bar s \in \operatorname{Im} A^T + \nabla f(\bar x), \quad \bar x^T \bar s = 0, \quad \bar s \ge 0,$

for some $\bar x \in X^*$. Assume that $\bar s = \lim_{k \to \infty} s(\mu_k)$ with $\lim_{k \to \infty} \mu_k = 0$. In view of Proposition 1(ii), we can assume without loss of generality (by refining the sequence $\{\mu_k\}$ if necessary) that $\bar x = \lim_{k \to \infty} x(\mu_k)$ exists. By Proposition 1(ii), $\bar x \in X^*$. By (10), we have $s(\mu_k) \in \nabla f(x(\mu_k)) + \operatorname{Im} A^T$ for all $k$. Letting $k \to \infty$ in this relation, and using the fact that $\operatorname{Im} A^T$ is a closed set and the gradient of a convex and differentiable function is continuous in the interior of its effective domain, which in the case of $f$ is the whole $\mathbb{R}^n$, we obtain the first relation of (23). To verify the second and third relations of (23), it suffices to show that $\bar s_j = 0$ whenever $\bar x_j > 0$, and $\bar s_j \ge 0$ whenever $\bar x_j = 0$. Indeed, by (9) we have

(24) $s(\mu_k)_j = -\mu_k h_j'(x(\mu_k)_j), \quad j = 1, \ldots, n.$

When $\bar x_j > 0$, (24) implies that $\bar s_j = \lim_{k \to \infty} s(\mu_k)_j = 0$ due to the fact that $x(\mu_k)_j$ converges to $\bar x_j > 0$, $h_j'(t)$ is continuous for all $t > 0$ and $\lim_{k \to \infty} \mu_k = 0$. When $\bar x_j = 0$, we have $\lim_{k \to \infty} x(\mu_k)_j = 0$, and hence $\lim_{k \to \infty} h_j'(x(\mu_k)_j) = -\infty$ by (A4). By (24), this implies that $s(\mu_k)_j > 0$ for large enough $k$, and hence $\bar s_j = \lim_{k \to \infty} s(\mu_k)_j \ge 0$. □

We have established that all cluster points of $\{s(\mu)\}$ are dual optimal solutions. We would like to characterize them in a way similar to the primal result in Proposition 1(iii)-(iv). For this purpose, we need to impose a slightly convoluted assumption on $h$. We will show later on that this assumption holds in most significant cases. Next we give a heuristic and somewhat informal justification for this assumption. We know that $s(\mu)$ is the solution of the optimization problem given by (12). In order to characterize the cluster points of $\{s(\mu)\}$, one would like to say that they are minimizers of the function obtained as the limit, as $\mu$ goes to 0, of the objective function of (12). The problem is that the term $\mu h_\mu(s)$ in the objective of (12) does not have to converge, in general, to a function of $s$ as $\mu$ goes to 0. Nevertheless, in most specific examples it is possible to find a transformation of $h_\mu$, denoted as $\rho$, which preserves the minimization property of $s(\mu)$ and which converges, as $\mu$ goes to 0, to a specific convex function of $s$, say $\sigma$. When such a transformation exists, the cluster points of $\{s(\mu)\}$ can be characterized as minimizers of $\sigma$ and, when $\sigma$ is strictly convex, such a minimizer is unique and the whole path $\{s(\mu)\}$ converges to it. Existence of such a $\rho$ seems an ad hoc assumption, but since it does exist in most specific examples, introduction of assumption (A7) seems justified by reasons of practicality: it allows a single proof of convergence of $\{s(\mu)\}$ for a whole class of barriers $h$ (see examples in §3). The formal introduction of the assumption requires some notation, which we present next. Let $N'\subset\{1,\dots,n\}$ be defined as

(25) $N'=\{j: s_j>0 \text{ for some } s\in S^*\},$

(26) $B'=\{1,\dots,n\}\setminus N'.$

We remark that for linear $f$, i.e., in the linear programming case, it holds that $N'=N$ and $B'=B$, with $B$ and $N$ as defined just before Proposition 1 (see Tucker 1956). This is not true for nonlinear $f$. For instance, the problem $\min f(x_1,x_2)=x_1^2$ s.t. $x_2=1$, $x\ge 0$ has $X^*=\{(0,1)\}$ and $S^*=\{(0,0)\}$, and hence $N=\{1\}$ and $N'=\emptyset$. Clearly, $B'\supset B$. Since, by definition, $s_j=0$ for all $j\in B'$ and $s\in S^*$, it follows from Propositions 5 and 6 that Conditions (A1)-(A6) imply that $\lim_{\mu\to 0}s(\mu)_j=0$ for all $j\in B'$. In particular, if $N'=\emptyset$ then $\lim_{\mu\to 0}s(\mu)=0$ and $S^*=\{0\}$. The interesting case is therefore $N'\ne\emptyset$, which we will analyse in the next proposition. Define $h^J_\mu:\mathbb{R}^{|J|}\to\mathbb{R}\cup\{\infty\}$ by

(27) $h^J_\mu(s_J)=\sum_{j\in J}h_j^*(-s_j/\mu).$

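The required assumption is stated next.

(A7) There exist an interval $\mathcal{T}\subset\mathbb{R}$ and a function $\rho:\mathcal{T}\times\mathbb{R}_{++}\to\mathbb{R}$ such that
(i) $\rho(\cdot,\mu)$ is nondecreasing on $\mathcal{T}$ for all $\mu>0$;
(ii) for every $\mu>0$ and $J\subset N'$, we have $h^J_\mu(\mathbb{R}^{|J|}_{++})\subset\mathcal{T}$ and the function $\rho(h^J_\mu(\cdot),\mu):\mathbb{R}^{|J|}_{++}\to\mathbb{R}$ is convex;
(iii) for every $J\subset N'$, there exists $\sigma_J\in\operatorname{Conv}(\mathbb{R}^{|J|})$ such that $\operatorname{dom}(\sigma_J)\supset\mathbb{R}^{|J|}_{++}$ and $\sigma_J(s_J)=\lim_{\mu\to 0}\rho(h^J_\mu(s_J),\mu)$ for all $s_J>0$.

We mention that, in view of (i), a sufficient condition for (ii) is that $\rho(\cdot,\mu)$ be convex for all $\mu>0$. However, in some relevant examples $\rho(\cdot,\mu)$ is not convex, while (ii) holds. Under (A7), we have the following characterization of the cluster points of $\{s(\mu)\}$.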

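PROPOSITION 7. Assume that Conditions (A1)-(A7) hold and $N'\ne\emptyset$. Then, all cluster points of $\{s(\mu)\}$ are solutions of the problem $\min\{\sigma_{N'}(s_{N'}): s\in S^*\}$, with $\sigma_{N'}$ as in (A7)(iii).

PROOF. Let $\bar s$ be a cluster point of $\{s(\mu)\}$, that is $\bar s=\lim_{k\to\infty}s(\mu_k)$ for some sequence $\{\mu_k\}\subset\mathbb{R}_{++}$ such that $\lim_{k\to\infty}\mu_k=0$. By Proposition 6, we know that $\bar s\in S^*$. Choose any $s\in\operatorname{ri}S^*$ and $\lambda\in(0,1)$, and define $\tilde s^k=s(\mu_k)+\lambda(s-\bar s)$ and $s^k=s(\mu_k)+(s-\bar s)$ for all $k$. Then $\lim_{k\to\infty}\tilde s^k=\bar s+\lambda(s-\bar s)$ and $\lim_{k\to\infty}s^k=s$. Let $\bar x\in X^*$ be given. Then, $s,\bar s\in\nabla f(\bar x)+\operatorname{Im}A^T$ by (6), from which it follows that $s-\bar s\in\operatorname{Im}A^T$. This, together with (10), implies that $s^k,\tilde s^k\in\nabla f(x(\mu_k))+\operatorname{Im}A^T$ for all $k$. Using this, Proposition 4 with $\tilde x=\bar x$, the convexity of the objective function in (12) and the fact that $\tilde s^k=(1-\lambda)s(\mu_k)+\lambda s^k$, we conclude that

(28) $\bar x^T\tilde s^k+\mu_k h_{\mu_k}(\tilde s^k)\le \bar x^Ts^k+\mu_k h_{\mu_k}(s^k).$

Since $\bar x\in X^*$ and $s,\bar s\in S^*$, we have $\bar x^Ts=\bar x^T\bar s=0$ by (6), from which it follows that $\bar x^T\tilde s^k=\bar x^Ts^k$ for all $k$. Using this in (28), we conclude that

(29) $\sum_{j=1}^n h_j^*(-\tilde s^k_j/\mu_k)=h_{\mu_k}(\tilde s^k)\le h_{\mu_k}(s^k)=\sum_{j=1}^n h_j^*(-s^k_j/\mu_k).$

Moreover, since $s_{B'}=\bar s_{B'}=0$, we have $\tilde s^k_{B'}=s^k_{B'}$ for all $k$. This together with the last inequality imply that

(30) $h^{N'}_{\mu_k}(\tilde s^k_{N'})=\sum_{j\in N'}h_j^*(-\tilde s^k_j/\mu_k)\le\sum_{j\in N'}h_j^*(-s^k_j/\mu_k)=h^{N'}_{\mu_k}(s^k_{N'}).$

Using the definition of $N'$ and the fact that $s\in\operatorname{ri}S^*$, we easily see that $s_{N'}>0$. Hence, $\lim_{k\to\infty}\tilde s^k_{N'}>0$ and $\lim_{k\to\infty}s^k_{N'}>0$, from which it follows that, for some $k_0\ge 0$, $\tilde s^k_{N'}>0$ and $s^k_{N'}>0$ for all $k\ge k_0$. By (A7)(i) and (ii) and (30), we obtain

(31) $\rho(h^{N'}_{\mu_k}(\tilde s^k_{N'}),\mu_k)\le\rho(h^{N'}_{\mu_k}(s^k_{N'}),\mu_k),\quad\forall k\ge k_0.$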


By (A7)(ii) and (A7)(iii), the functions $\rho(h^{N'}_{\mu_k}(\cdot),\mu_k)$ restricted to the set $\mathbb{R}^{|N'|}_{++}$ are finite, convex, and converge pointwise to $\sigma_{N'}$. By Proposition 2(vii), such convergence is uniform in any compact set of $\mathbb{R}^{|N'|}_{++}$. This together with the fact that $\lim_{k\to\infty}\tilde s^k_{N'}=\bar s_{N'}+\lambda(s_{N'}-\bar s_{N'})>0$ and $\lim_{k\to\infty}s^k_{N'}=s_{N'}>0$ allow us to conclude, after letting $k$ tend to $\infty$ in (31), that

(32) $\sigma_{N'}(\bar s_{N'}+\lambda(s_{N'}-\bar s_{N'}))\le\sigma_{N'}(s_{N'}).$

Since (32) holds for any $\lambda\in(0,1)$ and $s\in\operatorname{ri}S^*$, and $\sigma_{N'}$ is a closed convex function by (A7)(iii), it follows from Proposition 2(iv) that

(33) $\sigma_{N'}(\bar s_{N'})=\lim_{\lambda\to 0}\sigma_{N'}(\bar s_{N'}+\lambda(s_{N'}-\bar s_{N'}))\le\sigma_{N'}(s_{N'}),\quad\forall s\in\operatorname{ri}S^*.$

Since any boundary point of $S^*$ can be approached by points in $\operatorname{ri}S^*$ lying on a segment by Proposition 2(iii), it follows again from Proposition 2(iv) that (33) holds for any $s\in S^*$. □

The next corollary gives the characterization of the limit of the dual path.

COROLLARY 1. Assume that Conditions (A1)-(A7) hold and $N'\ne\emptyset$. If the function $\sigma_{N'}$ of (A7)(iii) is strictly convex on $S^*$, then $\lim_{\mu\to 0}s(\mu)$ exists and is the unique solution $s^*$ of the problem $\min\{\sigma_{N'}(s_{N'}): s\in S^*\}$.

We will see in the next section that in all but one of the examples, it is possible to find $\rho$ such that $\sigma_{N'}$ is strictly convex on $S^*$. In the other case (perhaps the most relevant one) $\sigma_{N'}$ is convex but not strictly convex. For this case, the convergence of $\{s(\mu)\}$ can be proved by using a refinement of Proposition 7 which we discuss next.

When $\sigma_{N'}$ is not strictly convex on $S^*$, the problem $\min\{\sigma_{N'}(s_{N'}): s\in S^*\}$ may have multiple solutions. Let $S_1$ denote the optimal solution set of this problem and define the index set $N_1=\{j\in N': s_j \text{ is not constant on } S_1\}$. Consider now the problem $\min\{\sigma_{N_1}(s_{N_1}): s\in S_1\}$. Let $S_2$ denote its optimal solution set and define the index set $N_2=\{j\in N_1: s_j \text{ is not constant on } S_2\}$. Continuing in this way, we obtain a sequence of sets $S^*=S_0\supset S_1\supset S_2\supset\cdots$ and a sequence of index sets $N'=N_0\supset N_1\supset N_2\supset\cdots$. The result stated below imposes the following condition on these sequences.

(A8) There exists $r>0$ such that $S_r=\{s^h\}$ for some $s^h\in\mathbb{R}^n$ (and hence $N_r=\emptyset$).

Note that Condition (A8) holds if and only if the sequence $N'=N_0\supset N_1\supset N_2\supset\cdots$ is strictly decreasing, i.e., when at least one variable $s_i$ with $i\in N_{l-1}$ is constant on $S_l$, for $l=1,2,\dots$. The point $s^h$ is referred to as the $h$-center of $S^*$ with respect to the barrier $h$. In principle, it is not clear that $s^h$ depends only on $h$ since its definition is given in terms of the functions $\sigma_J$, which in turn may not be uniquely determined. However, the following proposition shows that $s^h$ depends only on the barrier $h$, and hence justifies the terminology $h$-center.

PROPOSITION 8. Assume that Conditions (A1)-(A8) hold and $N'\ne\emptyset$. Then, the dual path $\{s(\mu)\}$ converges to the $h$-center $s^h$.

PROOF. The proof is similar to the one of Proposition 7 except for a few minor points which we now discuss. Again, let $\bar s$ be a cluster point of $\{s(\mu)\}$, as before. To show that $\bar s=s^h$, it is sufficient to prove that $\bar s\in S_l$ for every $l=1,\dots,r$, due to Condition (A8). Clearly, by Proposition 7, we have $\bar s\in S_1$. Assume that $\bar s\in S_l$ and $l<r$. We will show that $\bar s\in S_{l+1}$. Clearly, this implies that $\bar s\in S_r=\{s^h\}$, and hence that the proposition holds. Indeed, let $s\in\operatorname{ri}S_l$ be given and fix $\lambda\in(0,1)$. Define the sequences $\{\tilde s^k\}$ and $\{s^k\}$ as in the proof of Proposition 7. Arguing as in that proof, we easily see that (29) holds. Now since $s,\bar s\in S_l$, we have $s_i=\bar s_i$ for every $i\notin N_l$, due to the definitions of the sets $N_l$. This implies that $\tilde s^k_i=s^k_i$ for every $k$ and $i\notin N_l$. Hence, from (29) we deduce that (30) holds with $N'$ replaced by $N_l$. The rest of the proof follows exactly as in Proposition 7 with $N'$ and $S^*$ replaced by $N_l$ and $S_l$, respectively, and yields the conclusion that $\bar s$ solves the problem $\min\{\sigma_{N_l}(s_{N_l}): s\in S_l\}$, i.e., $\bar s\in S_{l+1}$. □

We observe that a convergence result similar to Proposition 8 has been derived in Auslender et al. (1997) under different assumptions on the barrier $h$ and for the case of linear objective function $f$.

We will now provide a variation of the results stated above which will be used in §4 to analyze the behavior of dual sequences generated by generalized proximal point methods with Bregman distances.

PROPOSITION 9. Suppose that Conditions (A1)-(A5) hold and let $\{x^k\}\subset\{x\in\mathbb{R}^n: Ax=b,\ x>0\}$, $\{s^k\}\subset\mathbb{R}^n$, $\{g^k\}\subset\mathbb{R}^n$ and $\{\mu_k\}\subset\mathbb{R}_{++}$ be sequences such that

(34) $s^k+\mu_k\nabla h(x^k)=0,\quad s^k\in g^k+\operatorname{Im}A^T,\quad\forall k\ge 0.$

Assume also that $\{x^k\}$ is bounded, $\lim_{k\to\infty}\mu_k=0$ and $\lim_{k\to\infty}[g^k-\nabla f(x^k)]=0$. Then,
(a) for any $\tilde x\in\mathbb{R}^n$ such that $A\tilde x=b$ and $k\ge 0$, $s^k$ is the unique optimal solution of the problem

(35) $\min\ \tilde x^Ts+\mu_k h_{\mu_k}(s)$

(36) s.t. $s\in g^k+\operatorname{Im}A^T$;

(b) any cluster point of $\{x^k\}$ is a solution of (4);
(c) the sequence $\{s^k\}$ is bounded and all its cluster points are contained in $S^*$;
(d) if, in addition, Conditions (A7) and (A8) hold then $\{s^k\}$ converges to the $h$-center $s^h$ of $S^*$ with respect to $h$.

PROOF. The proof of (a) is similar to the proof of Proposition 4. The boundedness of $\{s^k\}$ can be proved with the same arguments as the proof of Proposition 5 using the set $U=\{g^k: k\ge 0\}$, which is clearly bounded due to the assumption that $\{x^k\}$ is bounded and $\lim_{k\to\infty}[g^k-\nabla f(x^k)]=0$. Assume now that $(\bar x,\bar s)$ is a cluster point of the sequence $\{(x^k,s^k)\}$. Using the fact that $\lim_{k\to\infty}[g^k-\nabla f(x^k)]=0$ and arguments similar to those used in the proof of Proposition 6, it can be shown that

(37) $A\bar x=b,\quad \bar x\ge 0,$

(38) $\bar s\in\operatorname{Im}A^T+\nabla f(\bar x),\quad \bar x^T\bar s=0,\quad \bar s\ge 0.$

This clearly implies that $(\bar x,\bar s)\in X^*\times S^*$. It is now easy to see that (b) and (c) follow from the observations above. The proof of (d) is exactly like the one of Proposition 8. □

It is worth emphasizing that Proposition 9 does not assume Condition (A6). Instead, it explicitly assumes that $\{x^k\}$ is bounded and $\lim_{k\to\infty}[g^k-\nabla f(x^k)]=0$.

Observe that for any sequence $\{\mu_k\}\subset\mathbb{R}_{++}$ such that $\lim_{k\to\infty}\mu_k=0$, the hypotheses of Proposition 9 are satisfied when the sequences $\{x^k\}$, $\{s^k\}$ and $\{g^k\}$ are given by $x^k=x(\mu_k)$, $s^k=s(\mu_k)$ and $g^k=\nabla f(x^k)$ for all $k$. Conclusions (a), (c), and (d) of Proposition 9 for this special case are analogous to the ones obtained earlier in Propositions 4, 5, 6, and 8; moreover, Conclusion (b) yields an alternative proof of the second part of Proposition 1(ii) (assuming that its first part is known).

3. Examples of barriers. In this section we give several examples of barriers that satisfy Conditions (A3)-(A5) of §2. We consider two types of barriers, called Bregman type and divergence type, respectively. For both types, we take a fixed $\tilde x>0$ such that $A\tilde x=b$.

We remark that, when considered as barriers, i.e., as functions of a single variable in $\mathbb{R}^n_{++}$, there is no essential difference between both types of barriers. On the other hand, the proximal point method with Bregman distances and $\phi$-divergences (i.e., (2) and (58), respectively) are rather different from the point of view of their analysis, because in these methods the properties of $D_\varphi(\cdot,\cdot)$ and $d_\phi(\cdot,\cdot)$ as functions of two variables in $\mathbb{R}^n_{++}$ are involved and quite different from each other (see Kiwiel 1997, Polyak and Teboulle 1997). For this reason only we have grouped our examples of barriers into these two types.

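Bregman type barriers are of the form

(39) $h(x)=\varphi(x)-\varphi(\tilde x)-\langle\nabla\varphi(\tilde x),x-\tilde x\rangle,$

where $\varphi(x)=\sum_{j=1}^n\varphi_j(x_j)$ and each $\varphi_j\in\operatorname{Conv}(\mathbb{R})$ satisfies Conditions (A3) and (A4). Neglecting constant terms, this is equivalent to $h(x)=\sum_{j=1}^n h_j(x_j)$ with $h_j(t)=\varphi_j(t)-\varphi_j'(\tilde x_j)t$. It follows easily that $h_j'(\tilde x_j)=0$ and that Conditions (A3)-(A5) hold. In this case the function $h^J_\mu$ of (27) takes the form

(40) $h^J_\mu(s_J)=\sum_{j\in J}\varphi_j^*\bigl(\varphi_j'(\tilde x_j)-s_j/\mu\bigr).$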
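The shift by $\varphi_j'(\tilde x_j)$ in (40) comes from a one-line conjugate computation. The following is a sketch, assuming that (27) reads $h^J_\mu(s_J)=\sum_{j\in J}h_j^*(-s_j/\mu)$ and that $h_j(t)=\varphi_j(t)-\varphi_j'(\tilde x_j)t$ as stated after (39):

```latex
\begin{align*}
h_j^*(u) &= \sup_{t}\,\bigl\{ut-\varphi_j(t)+\varphi_j'(\tilde x_j)\,t\bigr\}
          = \varphi_j^*\bigl(u+\varphi_j'(\tilde x_j)\bigr),\\
h^J_\mu(s_J) &= \sum_{j\in J}h_j^*(-s_j/\mu)
          = \sum_{j\in J}\varphi_j^*\bigl(\varphi_j'(\tilde x_j)-s_j/\mu\bigr).
\end{align*}
```

The same argument with $h_j(t)=\tilde x_j\varphi_j(t/\tilde x_j)$ gives $h_j^*(u)=\tilde x_j\varphi_j^*(u)$, since the substitution $t=\tilde x_jv$ pulls the factor $\tilde x_j$ out of the supremum.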

The divergence type barriers are of the form

(41) $h(x)=\sum_{j=1}^n\tilde x_j\,\varphi_j(x_j/\tilde x_j),$

where each $\varphi_j\in\operatorname{Conv}(\mathbb{R})$ satisfies the Conditions (A3), (A4), and $\varphi_j(1)=\varphi_j'(1)=0$. In the notation of the previous section, we have $h_j(t)=\tilde x_j\varphi_j(t/\tilde x_j)$. Again, it is easy to check that Conditions (A3)-(A5) hold for this type of barrier. The function $h^J_\mu$ of (27) takes the form

(42) $h^J_\mu(s_J)=\sum_{j\in J}\tilde x_j\,\varphi_j^*(-s_j/\mu).$

Proposition 4 holds for the dual central path of any barrier of either type. To establish convergence of the whole dual central path, i.e., Proposition 8, we need to check Conditions (A6)-(A8). We will do this for several examples of each type. In each case we give the expressions of $\varphi_j$, $\varphi_j^*$, $h^J_\mu$, $\mathcal{T}$, $\rho$, and $\sigma_J$. For simplicity, we only define the functions over their effective domains.

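(1) Bregman type.
(a) Let $\varphi_j(t)=t\log t$ for all $t>0$. Then, $\varphi_j^*(t)=e^{t-1}$ for all $t\in\mathbb{R}$ and $h^J_\mu(s_J)=\sum_{j\in J}\tilde x_je^{-s_j/\mu}$ for all $s_J\in\mathbb{R}^{|J|}$. Define $\mathcal{T}=\mathbb{R}_{++}$, $\rho(t,\mu)=t^\mu$ and $\sigma_J(s_J)=\max_{j\in J}\{e^{-s_j}\}$ for all $s_J\in\mathbb{R}^{|J|}$.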

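(b) Let $\varphi_j(t)=-\log t$ for all $t>0$. Then, $\varphi_j^*(t)=-\log(-t)-1$ for all $t<0$ and

$h^J_\mu(s_J)=-\sum_{j\in J}\log\bigl(s_j/\mu+1/\tilde x_j\bigr)-|J|$

for all $s_J=(s_j)_{j\in J}\in\mathbb{R}^{|J|}$ such that $s_j>-\mu/\tilde x_j$ for every $j\in J$. Define $\mathcal{T}=\mathbb{R}$, $\rho(t,\mu)=t+|J|(1-\log\mu)$ and $\sigma_J(s_J)=-\sum_{j\in J}\log s_j$ for all $s_J\in\mathbb{R}^{|J|}_{++}$.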


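(c) Let $\varphi_j(t)=t-t^\beta$ for all $t>0$, where $0<\beta<1$. Then, $\varphi_j^*(t)=(1-\beta)\bigl((1-t)/\beta\bigr)^{-\eta}$ for all $t<1$, and

$h^J_\mu(s_J)=(1-\beta)\beta^{\eta}\sum_{j\in J}\bigl(\beta\tilde x_j^{\beta-1}+s_j/\mu\bigr)^{-\eta}$

for all $s_J=(s_j)_{j\in J}\in\mathbb{R}^{|J|}$ such that $s_j>-\beta\mu\tilde x_j^{\beta-1}$ for every $j\in J$. Define $\mathcal{T}=\mathbb{R}_+$, $\rho(t,\mu)=\theta\mu^{-\eta}t$ and $\sigma_J(s_J)=\sum_{j\in J}s_j^{-\eta}$ for all $s_J\in\mathbb{R}^{|J|}_{++}$, where $\eta=\beta/(1-\beta)>0$ and $\theta=(1-\beta)^{-1}\beta^{-\eta}$.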

(d) Let $\varphi_j(t)=t^\alpha-t^\beta$ for all $t>0$, where $\alpha>1$ and $0<\beta<1$. In this case it is not possible to give closed expressions for $\varphi_j^*$ and $h^J_\mu$, but it can be proved that Condition (A7) holds with the same formulae as in Example 1(c).

(2) Divergence type.
(a) Let $\varphi_j(t)=t\log t-t+1$ for all $t>0$. Define $h^J_\mu$, $\mathcal{T}$, $\rho$ and $\sigma_J$ as in Example 1(a).
(b) Let $\varphi_j(t)=t-\log t-1$ for all $t>0$. Then, $\varphi_j^*(t)=-\log(1-t)$ for all $t<1$, and

for all sj^{sj)j^jeU-^, such that Sj>-fi for every jeJ. Define . ^ - IR , p{t,p.)^

(c) Let (pj{t)^^t - fl' + (\ - ^) for all t>0, where 0<^<L Then, (pf(t) =

''"'* for all t<B, and

rall5j^(5'j)^ey el!^,such that 5,>-/i; i for everyyeJ . Defined ^M, p{t,n)^e^ ''t,-V)^ EjeJ^J^r f' r all sjeUi+, where 6 = ^-'^ and // = /?/(! - ^ ) .

(d) Let 9 / ( f ) -^ r+ / -^+(5+l ) for all / > 0 , where S>0. Then, (p*(0 =for all t<5, and

for all 5y (5y)jej€ U-^, such that 5,>-(5 for every j€J. Define , ^ R, pGASJ ) = - i^jej ^r^j for all sj e Mi, where f/ - S/{5^ 1) and 0 = .5''.

Note that the functions $\sigma_J$ of Cases 1(b)-(d) are strictly convex on $S^*$, and hence Conditions (A7) and (A8) hold for these cases. In Case 1(a), $\sigma_J$ is convex, but not strictly convex. Hence, for this case Condition (A7) holds, and Condition (A8) can be easily established using the definition of $\sigma_J$.

Condition (A6)(i) holds for Cases 1(a), 1(c), 1(d), 2(a), and 2(c). In the other cases, the results of Corollary 1 and Proposition 8 (i.e., convergence of the whole dual path $\{s(\mu)\}$ to a unique point in $S^*$) are valid if either (A6)(ii) or (A6)(iii) hold (i.e., linear programming problem or bounded primal optimal set).
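The limits required by (A7)(iii) can be checked numerically. The Python sketch below does this for Examples 1(a) and 1(b), under our reading of the formulas: $\rho(h^J_\mu(s_J),\mu)=(\sum_j\tilde x_je^{-s_j/\mu})^\mu$ with limit $\max_j e^{-s_j}$ for 1(a), and $\rho(h^J_\mu(s_J),\mu)=-\sum_j\log(s_j/\mu+1/\tilde x_j)-|J|+|J|(1-\log\mu)$ with limit $-\sum_j\log s_j$ for 1(b). The function names and the test vectors are hypothetical, chosen only for illustration.

```python
import math

def rho_h_example_1a(s, x_tilde, mu):
    """(sum_j x_tilde_j * exp(-s_j/mu))**mu, as read from Example 1(a)."""
    m = min(s)
    # Factor exp(-m/mu) out of the sum: every remaining exponent is <= 0,
    # so nothing overflows and the term attaining the minimum equals x_tilde_j.
    inner = sum(xt * math.exp(-(sj - m) / mu) for sj, xt in zip(s, x_tilde))
    return math.exp(-m + mu * math.log(inner))

def rho_h_example_1b(s, x_tilde, mu):
    """h + |J|*(1 - log(mu)) with h = -sum_j log(s_j/mu + 1/x_tilde_j) - |J|,
    as read from Example 1(b); requires s_j > -mu/x_tilde_j."""
    n = len(s)
    h = -sum(math.log(sj / mu + 1.0 / xt) for sj, xt in zip(s, x_tilde)) - n
    return h + n * (1.0 - math.log(mu))

def sigma_1a(s):
    """Candidate limit for Example 1(a): max_j exp(-s_j)."""
    return max(math.exp(-sj) for sj in s)

def sigma_1b(s):
    """Candidate limit for Example 1(b): -sum_j log(s_j), for s > 0."""
    return -sum(math.log(sj) for sj in s)

if __name__ == "__main__":
    s, x_tilde = [0.5, 1.0, 2.0], [2.0, 1.0, 3.0]
    for mu in (1.0, 1e-2, 1e-4):
        gap_a = abs(rho_h_example_1a(s, x_tilde, mu) - sigma_1a(s))
        gap_b = abs(rho_h_example_1b(s, x_tilde, mu) - sigma_1b(s))
        print(mu, gap_a, gap_b)
```

Both gaps shrink as $\mu$ decreases. The limit in 1(a) depends on $s_J$ only through its smallest component, which is why $\sigma_J$ is not strictly convex there.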

4. The dual sequence of the proximal point method with separable Bregman distances. A separable Bregman function with zone $\mathbb{R}^n_{++}$ is a function $\varphi$ of the form $\varphi(x)=\sum_{j=1}^n\varphi_j(x_j)$, satisfying (A3), (A6)(i), and the three technical conditions stated below (see, e.g., Iusem 1998b) on the Bregman distance $D_\varphi:\mathbb{R}^n_+\times\mathbb{R}^n_{++}\to\mathbb{R}$ defined as

(43) $D_\varphi(x,y)=\varphi(x)-\varphi(y)-\langle\nabla\varphi(y),x-y\rangle.$

(B1) For all $\delta\in\mathbb{R}$ the partial level sets $\Gamma(x,\delta)=\{y\in\mathbb{R}^n_{++}: D_\varphi(x,y)\le\delta\}$ are bounded for all $x\in\mathbb{R}^n_+$.

(B2) If $\{y^k\}\subset\mathbb{R}^n_{++}$ converges to $y^*$ then $D_\varphi(y^*,y^k)$ converges to 0.

(B3) If $\{x^k\}\subset\mathbb{R}^n_+$ and $\{y^k\}\subset\mathbb{R}^n_{++}$ are sequences such that $\{x^k\}$ is bounded, $\lim_{k\to\infty}y^k=y^*$ and $\lim_{k\to\infty}D_\varphi(x^k,y^k)=0$ then $\lim_{k\to\infty}x^k=y^*$.

Bregman functions which also satisfy (A4) are said to be boundary coercive. The functions $\varphi$ with $\varphi_j$ as in Examples 1(a), 1(c), and 1(d) of §3 are boundary coercive separable Bregman functions with zone $\mathbb{R}^n_{++}$; $\varphi$ with $\varphi_j$ as in Example 1(b) fails to satisfy only Condition (A6)(i).

The proximal point method with Bregman distance $D_\varphi$ for solving Problem (4) generates a sequence $\{x^k\}\subset\mathbb{R}^n_{++}$ defined as

(44) $x^0\in\mathbb{R}^n_{++},$

(45) $x^{k+1}=\operatorname{argmin}\{f(x)+\lambda_kD_\varphi(x,x^k): Ax=b\},$

where $\{\lambda_k\}\subset\mathbb{R}_{++}$ satisfies

(46) $\sum_{k=0}^\infty\lambda_k^{-1}=\infty.$

The following result on the convergence of $\{x^k\}$ given by (44)-(45) is known.

PROPOSITION 10. Assume that Conditions (A1) and (A2) hold and that $\varphi$ is a boundary coercive Bregman function with zone $\mathbb{R}^n_{++}$. Then, the sequence $\{x^k\}$ generated by (44)-(45) converges to a solution of problem (4).

PROOF. See Theorem 4.1 of Iusem (1995). We mention that in Iusem (1995) a condition stronger than (46), namely boundedness of $\{\lambda_k\}$, is assumed, but the proof can be easily modified to hold under our weaker assumption (46), as done in Chen and Teboulle (1993), which on the other hand imposes a condition on $\varphi$ stronger than (A4), namely that $\nabla\varphi$ is onto. □

The optimality condition for $x^{k+1}$ to be a solution of (45) is that

(47) $s^k\in\nabla f(x^{k+1})+\operatorname{Im}A^T,$

where, for every $k\ge 0$,

(48) $s^k=\lambda_k\bigl[\nabla\varphi(x^k)-\nabla\varphi(x^{k+1})\bigr].$

We are interested in the convergence properties of the dual sequence $\{s^k\}$. Using Proposition 2(vi) and the fact that $\varphi(x)=\sum_{j=1}^n\varphi_j(x_j)$, we see that (48) is equivalent to

(49) $x_j^{k+1}=(\varphi_j^*)'\bigl(\varphi_j'(x_j^k)-s_j^k/\lambda_k\bigr),\quad j=1,\dots,n.$

As in the proof of Proposition 4, it is easy to show that $s^k$ is the solution of

(50) $\min\Bigl\{\tilde x^Ts+\lambda_k\sum_{j=1}^n\varphi_j^*\bigl(\varphi_j'(x_j^k)-s_j/\lambda_k\bigr): s\in\nabla f(x^{k+1})+\operatorname{Im}A^T\Bigr\}$

for any $\tilde x$ such that $A\tilde x=b$. Using this observation and Lemma 1, we can now establish the boundedness of $\{s^k\}$ as follows.

PROPOSITION 11. Under the assumptions of Proposition 10, $\{s^k\}$ is bounded.

PROOF. Observe that by (50), $s^k$ is a solution of Problem (17) with $u=\nabla f(x^{k+1})$, $H=\operatorname{Im}A^T$ and $g_j(t)=x_j^kt+\lambda_k\varphi_j^*\bigl(\varphi_j'(x_j^k)-t/\lambda_k\bigr)$. Since, by Proposition 10, the set $U=\{\nabla f(x^k): k\ge 0\}$ is bounded and $g_j'(0)=0$ for all $j$, it follows from Lemma 1 with $a=0$ and Lemma 2 that $s^k\in B_0(H+\nabla f(x^{k+1}))\subset B_0(H+U)$ for every $k$. The result now follows from the fact that $B_0(H+U)$ is a bounded set. □

An interesting question is whether the cluster points of $\{s^k\}$ are dual solutions of (4); or equivalently, whether any cluster point $\bar s$ of $\{s^k\}$ satisfies (23) for some $\bar x\in X^*$. It is easy to check that the first two relations of (23) hold with $\bar x=\lim_{k\to\infty}x^k$. The difficulty lies in establishing the third relation of (23), that is $\bar s\ge 0$. Indeed, writing (48) component-wise, we have

(51) $s_j^k=\lambda_k\bigl[\varphi_j'(x_j^k)-\varphi_j'(x_j^{k+1})\bigr].$

If $\lim_{k\to\infty}x_j^k=\bar x_j=0$, then both terms in the right-hand side of (51) diverge to $-\infty$ and nothing can be said about the sign of $s_j^k$.

Instead of trying to answer the above question, we will consider the related issue of analyzing the behavior of the sequence $\{\bar s^k\}$ of weighted averages defined as

(52) $\bar s^k=\sum_{i=0}^k\pi_{ki}s^i,$

where

(53) $\pi_{ki}=\mu_k\lambda_i^{-1},\quad i=0,\dots,k,$

(54) $\mu_k=\Bigl(\sum_{i=0}^k\lambda_i^{-1}\Bigr)^{-1}.$

Before stating the main result about the behavior of $\{\bar s^k\}$, we need the following elementary result about sequences.

PROPOSITION 12. Consider a sequence $\{\alpha_k\}\subset\mathbb{R}_{++}$ such that $\sum_{k=1}^\infty\alpha_k=\infty$. Define $\pi_{ki}=\alpha_i/\sum_{j=1}^k\alpha_j$. Then, $\sum_{i=1}^k\pi_{ki}=1$ for all $k\ge 1$, and for any sequence $\{r^k\}\subset\mathbb{R}^n$ such that $\lim_{k\to\infty}r^k=\bar r$, it holds that $\lim_{k\to\infty}\sum_{i=1}^k\pi_{ki}r^i=\bar r$.

PROOF. Elementary. □

We are now ready to state the main result of this section.

PROPOSITION 13. Assume that Conditions (A1) and (A2) hold and that $\varphi$ is a boundary coercive separable Bregman function with zone $\mathbb{R}^n_{++}$. Then, the sequence $\{\bar s^k\}$ defined by (48) and (52)-(54) is bounded, and all its cluster points are dual solutions of problem (4). Furthermore, if in addition Conditions (A7) and (A8) hold then the whole sequence $\{\bar s^k\}$ converges to the $h$-center of $S^*$ with respect to the barrier $h:\mathbb{R}^n\to\mathbb{R}\cup\{\infty\}$ defined by $h(x)=D_\varphi(x,x^0)$ for all $x\in\mathbb{R}^n_+$ and $h(x)=\infty$ for $x\notin\mathbb{R}^n_+$ (see the definition (43)).

PROOF. Let $\{\bar x^k\}$ be the sequence defined by $\bar x^k=x^{k+1}$ for all $k\ge 0$. By Proposition 10, we know that $\lim_{k\to\infty}\bar x^k=\bar x$ for some $\bar x\in X^*$. Using (47) and (52), we easily see that

(55) $\bar s^k\in\bar g^k+\operatorname{Im}A^T,$

where $\bar g^k=\sum_{i=0}^k\pi_{ki}\nabla f(\bar x^i)$. Using (46) and Proposition 12 with $\alpha_k=\lambda_k^{-1}$ and $r^k=\nabla f(\bar x^k)$, we obtain that $\lim_{k\to\infty}[\bar g^k-\nabla f(\bar x^k)]=\nabla f(\bar x)-\nabla f(\bar x)=0$. Moreover, using (48), (52) and (53), we obtain

(56) $\bar s^k=\mu_k\sum_{i=0}^k\bigl[\nabla\varphi(x^i)-\nabla\varphi(x^{i+1})\bigr]=\mu_k\bigl[\nabla\varphi(x^0)-\nabla\varphi(x^{k+1})\bigr]$

(57) $\phantom{\bar s^k}=-\mu_k\nabla h(\bar x^k),$

where the last equality follows from the definition of $h$. Using the fact that $\varphi$ satisfies (A3), (A4), and (A6)(i) and $h_j'(t)=\varphi_j'(t)-\varphi_j'(x_j^0)$ for all $t>0$, it is easy to see that $h$ satisfies Conditions (A3), (A4), (A5) with $\tilde x=x^0$, and (A6)(i). By (46) and (54), we see that $\lim_{k\to\infty}\mu_k=0$. We have thus shown that $\{\bar x^k\}$, $\{\bar s^k\}$, $\{\bar g^k\}$ and $\{\mu_k\}$ satisfy the hypotheses of Proposition 9. The conclusion of the proposition now follows from conclusions (a), (c), and (d) of Proposition 9. □

We close this section by making two observations. The first one is that it is not necessary to keep the whole set $\{s^0,\dots,s^k\}$ in order to compute $\bar s^k$. Indeed, it follows from (52), (53), and (54) that $\bar s^{k+1}=\delta_k\bar s^k+(1-\delta_k)s^{k+1}$, where $\delta_k=(1+\mu_k/\lambda_{k+1})^{-1}$. The second observation is that for the four examples of Bregman functions given in §3, the $h$-center of $S^*$ with respect to the barrier $h$ defined in Proposition 13 does not depend on $\tilde x$, and hence on the starting point $x^0$. We conjecture that this invariance of the $h$-center with respect to $\tilde x$ is always true under some mild conditions.
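The recursive update of the averaged dual sequence can be sketched in Python on a toy instance. The setting is an assumption made only for illustration: $f(x)=c^Tx$, a single constraint $\sum_jx_j=1$, and $\varphi_j(t)=t\log t$, so that the proximal step (45) has the closed form $x_j^{k+1}\propto x_j^ke^{-c_j/\lambda_k}$ and (48) becomes $s_j^k=\lambda_k(\log x_j^k-\log x_j^{k+1})$. The function names are hypothetical, and the iterates are assumed to stay strictly positive in floating point.

```python
import math

def entropy_prox_step(x, c, lam):
    """One step of (45) for f(x) = c.x on the unit simplex with phi_j(t) = t*log(t)
    (assumed toy setting): x_j^{k+1} is proportional to x_j^k * exp(-c_j/lam)."""
    w = [xj * math.exp(-cj / lam) for xj, cj in zip(x, c)]
    z = sum(w)
    return [wj / z for wj in w]

def dual_iterate(x, x_next, lam):
    """s^k = lam * (grad phi(x^k) - grad phi(x^{k+1})), cf. (48);
    here grad phi_j(t) = log(t) + 1, so the constant cancels."""
    return [lam * (math.log(a) - math.log(b)) for a, b in zip(x, x_next)]

def run_entropy_prox(c, x0, lams):
    """Return the last primal iterate, the last s^k and the running average s_bar^k."""
    x = list(x0)
    inv_sum = 0.0      # sum of 1/lambda_i over steps done so far; mu_k = 1/inv_sum
    s, s_bar = None, None
    for lam in lams:
        x_next = entropy_prox_step(x, c, lam)
        s = dual_iterate(x, x_next, lam)
        if s_bar is None:
            s_bar = s  # pi_00 = 1, so s_bar^0 = s^0
        else:
            mu_prev = 1.0 / inv_sum
            delta = 1.0 / (1.0 + mu_prev / lam)  # delta_k with lam = lambda_{k+1}
            s_bar = [delta * sb + (1.0 - delta) * sj for sb, sj in zip(s_bar, s)]
        inv_sum += 1.0 / lam
        x = x_next
    return x, s, s_bar

if __name__ == "__main__":
    x, s, s_bar = run_entropy_prox([1.0, 2.0, 3.0], [1 / 3] * 3, [1.0] * 100)
    print([round(v, 4) for v in s_bar])
```

Only the current average and the running sum of $\lambda_i^{-1}$ are stored. In exact arithmetic the result coincides with the telescoped form $\mu_k[\nabla\varphi(x^0)-\nabla\varphi(x^{k+1})]$ of (56), which gives a cheap consistency check for an implementation.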

5. Final remarks. In the case of linear programming where, for some $c\in\mathbb{R}^n$, $\nabla f(x)=c$ for all $x\in\mathbb{R}^n$, we have $\bar g^k=\sum_{i=0}^k\pi_{ki}\nabla f(\bar x^i)=c=\nabla f(\bar x^k)$ for all $k$. Thus, by examining (55) and (57) we see that the pair $(\bar x^k,\bar s^k)$ is a solution of the system $s+\mu_k\nabla h(x)=0$ and $s\in\nabla f(x)+\operatorname{Im}A^T$. But in view of (9) and (10), this system consists of the optimality conditions for Problem (7), which, by Proposition 1(i), has the unique solution $(x,s)=(x(\mu_k),s(\mu_k))$. Hence, it follows that $\bar s^k=s(\mu_k)$ and $x^{k+1}=\bar x^k=x(\mu_k)$ for all $k$. Therefore, the primal (resp. averaged dual) proximal sequence $\{x^k\}$ (resp. $\{\bar s^k\}$) is contained in the primal (resp. dual) central path corresponding to the barrier $h$ defined in Proposition 13. (The result about the primal sequence $\{x^k\}$ first appeared in Theorem 3 of Iusem et al. 1999.)

A proximal method has also been developed for divergence-type barriers (see (41)), with iteration formula given by

(58) $x^{k+1}=\operatorname{argmin}\{f(x)+\lambda_kd_\phi(x,x^k): Ax=b\},$

where $x^0\in\mathbb{R}^n_{++}$ is arbitrary,

(59) $d_\phi(x,y)=\sum_{j=1}^ny_j\,\phi_j(x_j/y_j),$


and each $\phi_j:\mathbb{R}_{++}\to\mathbb{R}$ is strictly convex, differentiable and satisfies $\phi_j(1)=\phi_j'(1)=0$ and $\lim_{t\to 0}\phi_j'(t)=-\infty$. Convergence of $\{x^k\}$ as defined by (58) to a solution of problem (4) has been proved in Iusem et al. (1994) under the additional condition that

(60) $\phi_j'(t)/\phi_j''(1)\le\log t,\quad\forall t>0.$

A result on the corresponding averaged dual sequence $\{\bar s^k\}$ similar to Proposition 13 can be found in Polyak and Teboulle (1997) for the more general case of convex, rather than linear, constraints. Assuming that $\phi_1=\cdots=\phi_n$, that $\{\lambda_k\}$ is constant, that $\phi_j$ satisfies (60) and that $\log((\phi_j^*)'(t))$ is convex, it is proved in Polyak and Teboulle (1997) that $\{\bar s^k\}$ is bounded and that all its cluster points are dual optimal solutions.

Finally, we mention three open problems related to our results. The first one is to prove that the cluster points of $\{s^k\}$ as given by (48) are dual optimal solutions. As mentioned above, the basic difficulty is to prove that they are nonnegative.

The second one is the convergence of the primal sequence $\{x^k\}$ in the absence of (A6)(i) (e.g., Example 1(b)). For linear programming $\{x^k\}$ converges to a primal solution, since $\bar x^k=x(\mu_k)$ as discussed above and $\{x(\mu_k)\}$ converges by Proposition 1(iv). However, the problem is open for other situations (e.g., under (A6)(ii)).

The third problem is to decide whether the limits of $\{x(\mu)\}$ and $\{x^k\}$ coincide, when both exist (e.g., under (A6)(i)). It has been proved in Corollary 1 of Iusem et al. (1999) that they do coincide under the additional assumption that the rank of the Hessian matrix of $f$ is constant over the feasible set of Problem (4), but the problem remains open without this hypothesis.

Acknowledgments. The work of the first author was partially supported by CNPq Grant No. 301280/86. The work of the second author was partially supported by the National Science Foundation under Grants INT-9600343, CCR-9700448 and CCR-9902010.

References

Adler, I., M. Resende, G. Veiga, N. Karmarkar. 1989. An implementation of Karmarkar's method for linear programming. Math. Programming 44 297-335.

Adler, I., R. D. C. Monteiro. 1991. Limiting behavior of the affine scaling continuous trajectories for linear programming problems. Math. Programming 50 29-51.

Auslender, A., R. Cominetti, M. Haddou. 1997. Asymptotic analysis for penalty and barrier methods in convex and linear programming. Math. Oper. Res. 22 43-62.

Ben Tal, A., M. Zibulevsky. 1997. Penalty-barrier methods for convex programming problems. SIAM J. Optim. 7 347-366.

Burachik, R. S. 1995. Generalized proximal point methods for the variational inequality problem. Ph.D. thesis, Instituto de Matematica Pura e Aplicada, Rio de Janeiro, Brasil.

Chen, G., M. Teboulle. 1993. Convergence analysis of a proximal-like optimization algorithm using Bregman functions. SIAM J. Optim. 3 538-543.

Hiriart-Urruty, J.-B., C. Lemarechal. 1993. Convex Analysis and Minimization Algorithms I. Comprehensive Study in Mathematics, Vol. 305. Springer-Verlag, New York.

Iusem, A. 1995. On some properties of generalized proximal point methods for quadratic and linear programming. J. Optim. Theory Appl. 85 593-612.

Iusem, A. 1998a. Augmented Lagrangian methods and proximal point methods for convex optimization. Investigacion Operativa. Forthcoming.

Iusem, A. 1998b. On some properties of generalized proximal point methods for variational inequalities. J. Optim. Theory Appl. 96 337-362.

Iusem, A., B. Svaiter, J. Cruz. 1999. Generalized proximal point methods and Cauchy trajectories in Riemannian manifolds. SIAM J. Control Optim. 37 566-588.

Iusem, A., B. Svaiter, M. Teboulle. 1994. Entropy-like proximal methods in convex programming. Math. Oper. Res. 19 790-814.

Jensen, D. L., R. A. Polyak. 1994. The convergence of a modified barrier method for convex programming. IBM J. Res. Dev. 38 307-321.


Kiwiel, K. C. 1997. Proximal minimization methods with generalized Bregman functions. SIAM J. Control Optim. 35 1142-1168.

Kojima, M., S. Mizuno, T. Noma. 1990. Limiting behavior of trajectories by a continuation method for monotone complementarity problems. Math. Oper. Res. 15 662-675.

McLinden, L. 1980a. An analogue of Moreau's proximation theorem, with application to the nonlinear complementarity problem. Pacific J. Math. 88 101-161.

McLinden, L. 1980b. The complementarity problem for maximal monotone multifunctions. R. Cottle, F. Giannessi, J.-L. Lions, eds. Variational Inequalities and Complementarity Problems. Wiley, New York, 251-270.

Megiddo, N. 1989. Pathways to the optimal set in linear programming. N. Megiddo, ed. Progress in Mathematical Programming: Interior Point and Related Methods. Springer Verlag, New York, 131-158.

Monteiro, R. D. C. 1991. Convergence and boundary behavior of the projective scaling trajectories for linear programming. Math. Oper. Res. 16 842-858.

Monteiro, R. D. C. 1992. On the continuous trajectories for a potential reduction algorithm for linear programming. Math. Oper. Res. 17 225-253.

Monteiro, R. D. C., F. Zhou. 1998. On the existence and convergence of the central path for convex programming and some duality results. Comput. Optim. Appl. 10 51-77.

Polyak, R., M. Teboulle. 1997. Nonlinear rescaling and proximal-like methods in convex optimization. Math. Programming 76 265-284.

Powell, M. J. D. 1995. Some convergence properties of the modified log barrier method for linear programming. SIAM J. Optim. 5 695-739.

Rockafellar, R. T. 1970. Convex Analysis. Princeton University Press, Princeton, NJ.

Rockafellar, R. T. 1976. Monotone operators and the proximal point algorithm. SIAM J. Control Optim. 14 877-898.

Todd, M. J. 1990. A Dantzig-Wolfe-like variant of Karmarkar's interior-point linear programming algorithm. Oper. Res. 38 1006-1018.

Tseng, P., D. Bertsekas. 1993. On the convergence of the exponential multiplier method for convex programming. Math. Programming 60 1-19.

Tucker, A. W. 1956. Dual systems of homogeneous linear relations. Ann. Math. Stud. 38 3-18.

A. Iusem: Instituto de Matematica Pura e Aplicada, Estrada Dona Castorina 110, Rio de Janeiro, RJ, 22460-320, Brazil; email: [email protected]

R. D. C. Monteiro: School of Industrial and Systems Engineering, Georgia Tech, Atlanta, Georgia 30332; email: [email protected]
