
Feasible Direction Methods for

Constrained Nonlinear Optimization

Suggestions for Improvements

Maria Mitradjieva–Daneva

Linköping 2007

Linköping Studies in Science and Technology. Dissertations, No. 1095

Feasible Direction Methods for

Constrained Nonlinear Optimization

Suggestions for Improvements

Maria Mitradjieva-Daneva

Division of Optimization, Department of Mathematics, Linköping University,

SE-581 83 Linköping, Sweden

Copyright © 2007 Maria Mitradjieva-Daneva, unless otherwise noted.

All rights reserved.

ISBN: 978-91-85715-11-4 ISSN 0345-7524

Typeset in LaTeX 2ε

Printed by LiU-Tryck, Linköping University, SE-581 83 Linköping, Sweden, 2007

To the men of my life,

Stefan, Petter, Martin and Danyo.

Acknowledgments

There are a lot of people who made this work possible.

First of all, my sincere thanks go to Professor Maud Göthe-Lundgren, for all her encouragement and advice. My deepest thanks for her support during my time of difficulties.

Many thanks go to Clas Rydergren for all the help and support he has given me over the years. It has been a great pleasure to work with him.

Special thanks to Torbjörn Larsson for his guidance in optimization theory, research and writing methodology. The interesting discussions with him and his profound remarks were very helpful.

I would like to thank Professor Per Olov Lindberg for giving me the opportunity to work within the Optimization group in Linköping. I have to thank him for his importance to my work, for his enthusiasm and endless finickiness.

I also gratefully acknowledge the financial support from KFB (the Swedish Transport and Communications Research Board) and later Vinnova, under the project "Mathematical Models for Complex Problems within the Road and Traffic Area".

Special acknowledgments to Leonid Engelson from Inregia, Stockholm, for introducing me to an interesting research topic.

Many thanks go to all my colleagues at the Division of Optimization. There are many who, from behind the scenes, have encouraged me and made my work pleasant and easier. I am especially grateful to Helene, Andreas and Oleg for all the support and discussions. Sometimes only a few words can mean a lot!


Many heartfelt thanks to the girls in LiTH Doqtor, especially to Linnea.

There is one man in my life who urged me on by way of his unbelievable generosity and love. To Danyo, I send all my love.

Last, but absolutely not least, I would like to express my deepest gratitude to my parents, my sister Rumi and my friends, simply for being there.

Thank you to my lovely sons, Petter, Stefan and Martin, who made hard times seem brighter with their cheering laughter.

To all of you, I send my deepest gratitude!

Linköping, May 2007
Maria Mitradjieva-Daneva


Sammanfattning

The thesis concerns the development of new, efficient optimization methods. Optimization based on mathematical models is used in a wide range of applications, such as traffic planning, telecommunications, scheduling, production planning, finance, and the pulp and paper industry. The thesis studies solution methods for nonlinear optimization problems.

The optimization methods developed in the thesis are applicable to a large number of problem types. Among other things, the thesis studies the traffic equilibrium problem, which is central to the analysis and planning of traffic systems. This type of model can be used to simulate route choice for commuting trips in urban areas. We have studied several types of traffic equilibria, for example ones that take the travelers' valuation of time into account when computing road tolls based on social marginal costs.

The thesis describes new concepts for faster and more accurate solution methods. Speed and accuracy are especially important for optimization problems with a large number of decision variables. The methods developed have in common that they are based on feasible search directions. The methodological development proposed in the thesis builds on improvements in the computation of these feasible directions.

The Swedish title of the thesis is: Tillåtna riktningsmetoder för begränsad olinjär optimering - några förslag till förbättringar.


Abstract

This thesis concerns the development of novel feasible direction type algorithms for constrained nonlinear optimization. The new algorithms are based upon enhancements of the search direction determination and the line search steps.

The Frank–Wolfe method is popular for solving certain structured linearly constrained nonlinear problems, although its rate of convergence is often poor. We develop improved Frank–Wolfe type algorithms based on conjugate directions. In the conjugate direction Frank–Wolfe method a line search is performed along a direction which is conjugate to the previous one with respect to the Hessian matrix of the objective. A further refinement of this method is derived by applying conjugation with respect to the last two directions, instead of only the last one.

The new methods are applied to the single-class user traffic equilibrium problem, the multi-class user traffic equilibrium problem under social marginal cost pricing, and the stochastic transportation problem. In a limited set of computational tests the algorithms turn out to be quite efficient. Additionally, a feasible direction method with multi-dimensional search for the stochastic transportation problem is developed.

We also derive a novel sequential linear programming algorithm for general constrained nonlinear optimization problems, with the intention of being able to attack problems with large numbers of variables and constraints. The algorithm is based on inner approximations of both the primal and the dual spaces, which yields a method combining column and constraint generation in the primal space.


Contents

Sammanfattning vii

Abstract ix

Contents xi

PART I: INTRODUCTION AND OVERVIEW

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3

2 Selected topics in nonlinear optimization . . . . . . . . . . . . 5

2.1 Prerequisites . . . . . . . . . . . . . . . . . . . . . . . 5

2.1.1 Descent directions . . . . . . . . . . . . . . . 5

2.1.2 Line search . . . . . . . . . . . . . . . . . . . 8

2.2 Linearly constrained optimization . . . . . . . . . . . . 9

2.2.1 The Frank–Wolfe method . . . . . . . . . . 9

2.2.2 Simplicial decomposition . . . . . . . . . . . 10

2.3 General constrained optimization . . . . . . . . . . . . 12

2.3.1 Sequential linear programming . . . . . . . . 12

2.3.2 Sequential quadratic programming . . . . . . 13

2.4 The Lagrangian dual problem . . . . . . . . . . . . . . 14


2.5 Convergence . . . . . . . . . . . . . . . . . . . . . . . 15

3 Outline of the thesis and contribution . . . . . . . . . . . . . 16

4 Chronology and publication status . . . . . . . . . . . . . . . 20

Bibliography 23

PART II: APPENDED PAPERS

PAPER I: The Stiff is Moving — Conjugate Direction Frank–Wolfe Methods with Applications to Traffic Assignment

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

2 The Frank–Wolfe method and modifications . . . . . . . . . . 38

2.1 The Frank–Wolfe method . . . . . . . . . . . . . . . . 38

3 Conjugate direction Frank–Wolfe methods . . . . . . . . . . . 40

3.1 The conjugate Frank–Wolfe method, CFW . . . . . . 41

3.2 Outline of the CFW algorithm . . . . . . . . . . . . . 42

3.3 The bi-conjugate Frank–Wolfe method, BFW . . . . 44

3.4 Convergence of CFW method . . . . . . . . . . . . . 45

4 Applications to traffic assignment problems . . . . . . . . . . 49

4.1 The fixed demand traffic assignment problem . . . . . 49

4.2 Computational experiments . . . . . . . . . . . . . . . 50

4.3 A comparison with origin-based and DSD methods . . 53

A Derivation of the coefficients βik in BFW . . . . . . . . . . . . 58

B Closedness of the mapping A(Dk, Nk) . . . . . . . . . . . . . 60

C Closedness of the conjugation map DCFW . . . . . . . . . . 61

Bibliography 63


PAPER II: Multi-Class User Equilibria under Social Marginal Cost Pricing

1 Overview . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

2 Multi-class user equilibria . . . . . . . . . . . . . . . . . . . . 71

3 Equilibria under social marginal cost pricing . . . . . . . . . . 72

4 A two-link example . . . . . . . . . . . . . . . . . . . . . . . . 74

5 A Frank–Wolfe algorithm for the SMC equilibrium . . . . . . 75

6 Some experimental results . . . . . . . . . . . . . . . . . . . . 75

6.1 The two link network . . . . . . . . . . . . . . . . . . 75

6.2 Sioux Falls network . . . . . . . . . . . . . . . . . . . 75

Bibliography 79

PAPER III: A Conjugate Direction Frank–Wolfe Method for Nonconvex Problems

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

2 Conjugate directions . . . . . . . . . . . . . . . . . . . . . . . 86

3 The extended conjugate Frank–Wolfe method . . . . . . . . . 88

3.1 Outline of the ECFW algorithm . . . . . . . . . . . . 90

4 Applications to marginal cost congestion tolls . . . . . . . . . 90

4.1 Multi-class traffic equilibria under SMC pricing . . . . 91

4.2 Computational experiments . . . . . . . . . . . . . . 91

Bibliography 95

PAPER IV: A Comparison of Feasible Direction Methods for the Stochastic Transportation Problem

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101


2 The stochastic transportation problem . . . . . . . . . . . . . 102

3 Feasible direction methods for STP . . . . . . . . . . . . . . . 104

3.1 The Frank–Wolfe, FW . . . . . . . . . . . . . . . . . . 105

3.2 The diagonalized Newton method, DN . . . . . . . . . 106

3.3 The conjugate Frank–Wolfe method, CFW . . . . . . 108

3.4 Frank–Wolfe with multi-dimensional search, MdFW . 109

3.5 The heuristic Frank–Wolfe method, FWh . . . . . . . 110

4 Test problems . . . . . . . . . . . . . . . . . . . . . . . . . . . 111

5 Numerical results . . . . . . . . . . . . . . . . . . . . . . . . . 113

6 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118

Bibliography 119

PAPER V: A Sequential Linear Programming Algorithm with Multi-dimensional Search — Derivation and Convergence

1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125

2 Related methods . . . . . . . . . . . . . . . . . . . . . . . . . 127

2.1 Sequential linear programming algorithms . . . . . . . 127

2.2 Simplicial decomposition . . . . . . . . . . . . . . . . . 129

3 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . 130

4 SLP algorithm with multi-dimensional search . . . . . . . . . 133

4.1 Derivation of the multi-dimensional SLP algorithm . . 133

4.2 Convergence to KKT points . . . . . . . . . . . . . . . 136

5 MdSLP in the convex case . . . . . . . . . . . . . . . . . . . . 140

6 An illustrational example . . . . . . . . . . . . . . . . . . . . 142

7 Numerical experiments . . . . . . . . . . . . . . . . . . . . . . 148


7.1 Termination criteria . . . . . . . . . . . . . . . . . . . 149

7.2 Numerical results . . . . . . . . . . . . . . . . . . . . . 150

8 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155

Bibliography 157

A Computation of the first-order optimality conditions . . . . . 161

B Extension to the case of SQP . . . . . . . . . . . . . . . . . . 163


PART I

Introduction and Overview


1 Introduction

The field of nonlinear programming has a very broad range of applications and it has experienced major developments in the last few decades. Nonlinear models arise in many areas of real life, and there is a wide variety of approaches for solving the resulting nonlinear optimization programs. Nonlinear optimization problems appear, for example, in routing problems in traffic [9, 48, 52, 63, 65] and telecommunications [29], in the oil [35] and chemical industries [47], in design optimization of large-scale structures [43, 44], variational inequalities [62], applications in structural optimization [66], economics [42], marketing [54, 58] and business applications [36], in solving systems of equations [77], and in scientific applications such as biology, chemistry, physics and mechanics, e.g. protein structure prediction [59].

The traffic assignment problem is a nonlinear model which describes how each traveler minimizes his or her own travel cost for reaching the desired destination. Modeling the travel times, the congestion and the differences in the travelers' value of time leads to nonlinearities. In the management of investment portfolios, the goal might be to determine a mix of investments so as to maximize return while minimizing risk; the nonlinearity in the model comes from taking risk into account. Although there is a variety of portfolio selection models, a widely used one is the quadratic optimization problem that minimizes the risk. In physics, for example, minimizing the potential energy function to determine a stable configuration of a system of atoms, or finding the configuration with the largest terminal or kinetic energy, also leads to a nonlinear programming model. A related problem in chemistry is to determine the molecular structure that minimizes Gibbs free energy, known also as chemical equilibrium [17]. In the last few years much research has been devoted to the development of nonlinear optimization for atomic and molecular physics, to solve difficult molecular configuration problems such as cluster problems, protein folding problems, etc. [25, 41, 59].

Some important recent developments in nonlinear optimization are in least squares [60], neural networks [4, 73] and interior point methods for linear and nonlinear programs [4, 26]. Karmarkar [40] introduced a polynomial-time linear programming method, and this work started the revolution of interior-point methods. These algorithms are especially efficient for convex optimization problems [1]. One can show that the number of iterations that an interior point algorithm needs in order to achieve a specified accuracy is bounded by a polynomial function of the size of the problem. For more details on interior-point methods, see [1, 4, 26]. Other important recent developments are the increased emphasis on large-scale problems [6, 13, 33, 48, 67, 79], and algorithms that take advantage of problem structures as well as parallel computation [10, 56, 76].

When modeling real-world problems, different types of optimization problems occur. They can be linear or nonlinear, with or without constraints, continuous, integer or mixed-integer. The functions in an optimization problem can be differentiable or non-differentiable, convex or non-convex. Sometimes we consider optimization under uncertainty, known as stochastic optimization, where the functions are only given probabilistically. Nice references on fundamental theory, methods, algorithm analysis and advice on how to obtain good implementations in nonlinear optimization are, among others, [1, 3, 4, 51, 60, 77].

The focus of this thesis is on algorithms that solve nonlinear constrained optimization problems. Our concern is with algorithms which iteratively generate a sequence of points {x^k}_{k=1}^∞ that either terminates at or converges to a solution of the problem under consideration. Only in very special cases, such as linear programming, LP, and convex quadratic programming, QP [77, Chapter 1.6], does finite termination at an optimal point occur.

We consider a general constrained nonlinear optimization problem

\[ \min_{x \in X} f(x), \qquad (1) \]

where the objective function f : X → R is differentiable and the feasible set X ⊂ R^n is non-empty and compact. A generic form of a primal feasible descent algorithm for (1) can be written as:


Algorithm 1 A Generic Primal Feasible Descent Algorithm

Step 0 (Initialization): Choose an initial point x^0 ∈ X and let k = 0.
Step 1 (Termination check): If a termination criterion is satisfied, then stop, else k = k + 1.
Step 2 (Direction determination): Determine a feasible descent search direction d^k.
Step 3 (Step length determination): Determine a step length t_k > 0 such that f(x^k + t_k d^k) < f(x^k) and x^k + t_k d^k ∈ X.
Step 4 (Update): Update x^{k+1} = x^k + t_k d^k and go to Step 1.

The development of better performing algorithms can be made through modifications in both the direction determination and the step length determination steps of Algorithm 1.
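To fix ideas, the loop structure of Algorithm 1 can be sketched in Python as follows. The callables terminate, find_direction and step_length are placeholders for the problem-specific Steps 1-3, not names used in the thesis.

    import numpy as np

    def feasible_descent(x0, find_direction, step_length, terminate, max_iter=1000):
        # Skeleton of Algorithm 1; the callables are supplied by the user:
        #   terminate(x)       -> True when a termination criterion holds (Step 1)
        #   find_direction(x)  -> a feasible descent direction d^k         (Step 2)
        #   step_length(x, d)  -> t_k > 0 with f(x + t d) < f(x) and
        #                         x + t d in X                             (Step 3)
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            if terminate(x):
                break
            d = find_direction(x)
            t = step_length(x, d)
            x = x + t * d              # Step 4: update and repeat
        return x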

2 Selected topics in nonlinear optimization

We here present some fundamental concepts of nonlinear optimization problems and methods.

2.1 Prerequisites

We consider the nonlinear minimization problem

\[ \min_{x} f(x), \qquad (2) \]

where the objective f : R^n → R is a differentiable function. Below, we describe basic methods for solving unconstrained optimization problems. An interesting aspect of these approaches is whether they converge globally. By a globally convergent algorithm we mean that the method generates a sequence that converges to a stationary point x^*, i.e. ‖∇f(x^*)‖ = 0, for any starting point.

2.1.1 Descent directions

How descent directions are generated depends on the particular optimization problem. A sufficient condition for d^k to be a descent direction with respect to f at x^k is given by ∇f(x^k)^T d^k < 0. In unconstrained optimization a search direction often has the form

\[ d^k = -B_k^{-1} \nabla f(x^k), \qquad (3) \]

where B_k is a symmetric and nonsingular matrix. If B_k additionally is positive definite, d^k becomes a descent direction. In the steepest descent method, B_k is simply the identity matrix, thus d^k = -∇f(x^k), which is a descent direction for f at x^k. Global convergence of the steepest descent method is shown under convexity requirements on the problem [60, Chapter 3]. The steepest descent method is important from a theoretical point of view, but it is quite slow in practice.

The Newton method, which performs a second-order approximation of the objective function and enjoys a better convergence rate [51, Ch. 7], is obtained from (3) when B_k is the Hessian matrix ∇²f(x^k) of the objective function. The Newton method converges rapidly when started close enough to a local optimum. A drawback is that it may not converge when started at an arbitrary point. The Newton method acts as a descent method at the iterate x^k if the Hessian matrix ∇²f(x^k) is positive definite, and as an ascent method if it is negative definite. The lack of positive definiteness of the Hessian may be cured by adding to ∇²f(x^k) a diagonal matrix D_k, such that ∇²f(x^k) + D_k becomes positive definite.
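A minimal sketch of this remedy, assuming the simple choice D_k = τI (the text allows a general diagonal matrix): τ is increased until a Cholesky factorization certifies positive definiteness.

    import numpy as np

    def modified_newton_direction(grad, hess, tau=1e-3, growth=10.0):
        # Newton direction with a diagonal shift: add shift * I until the
        # Cholesky factorization succeeds, i.e. hess + shift * I is
        # positive definite.
        n = len(grad)
        shift = 0.0
        while True:
            try:
                L = np.linalg.cholesky(hess + shift * np.eye(n))
                break
            except np.linalg.LinAlgError:
                shift = tau if shift == 0.0 else growth * shift
        # Solve (H + shift * I) d = -grad using the triangular factors.
        return np.linalg.solve(L.T, np.linalg.solve(L, -grad))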

Accelerating the steepest descent method while avoiding the evaluation, storage and inversion of the Hessian matrix motivates the existence of quasi-Newton methods as well as conjugate direction methods. In conjugate direction methods (see e.g. [51, Ch. 8]) for unconstrained convex quadratic optimization, one performs line searches consecutively in a set of directions d^1, ..., d^n, mutually conjugate with respect to the Hessian ∇²f(x) of the objective (i.e. fulfilling (d^i)^T ∇²f(x) d^j = 0 for i ≠ j). In R^n the optimum is then identified after n line searches [51, p. 241, Expanding Subspace Theorem].

In conjugate gradient methods, one obtains conjugate directions by "conjugating" the gradient direction with respect to the previous search direction, that is d^k = -∇f(x^k) + β_k d^{k-1}, with β_k chosen so that d^k is conjugate to d^{k-1}, which is accomplished by the choice

\[ \beta_k = \frac{\nabla f(x^k)^T \nabla f(x^k)}{\nabla f(x^{k-1})^T \nabla f(x^{k-1})}. \]

In the quadratic case, d^k then in fact becomes conjugate to all previous directions d^1, ..., d^{k-1} (e.g. [51, p. 245, Conjugate Gradient Theorem]).

2 Selected topics in nonlinear optimization 7

In 1964, Fletcher and Reeves [23] introduced the nonlinear conjugate gradient method, known as the Fletcher-Reeves, FR, method. This method is shown to be globally convergent when all the search directions are descent directions. The method can produce a poor search direction in the sense that the search direction d^k is almost orthogonal to -∇f(x^k), which results in a small improvement in the objective value. Therefore, whenever this happens, using a steepest descent direction is advisable. The search direction may fail to be a descent direction unless the step length t_k satisfies the strong Wolfe conditions [60]. The FR method may take small steps and thus have bad numerical performance (see [10]).

The Polak–Ribière method has proved to be more efficient in practice. The two methods differ only in the formula for calculating β_k:

\[ \beta_k = \frac{\nabla f(x^k)^T (\nabla f(x^k) - \nabla f(x^{k-1}))}{\nabla f(x^{k-1})^T \nabla f(x^{k-1})}. \]
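The following sketch implements the nonlinear conjugate gradient iteration with both β formulas above; the backtracking line search is a crude stand-in for the strong Wolfe search discussed in Section 2.1.2.

    import numpy as np

    def nonlinear_cg(f, grad_f, x0, variant='PR', max_iter=500, tol=1e-6):
        # Nonlinear conjugate gradient with the Fletcher-Reeves (FR) and
        # Polak-Ribiere (PR) choices of beta.
        x = np.asarray(x0, dtype=float)
        g = grad_f(x)
        d = -g
        for _ in range(max_iter):
            if np.linalg.norm(g) < tol:
                break
            t = 1.0
            while f(x + t * d) > f(x) + 1e-4 * t * (g @ d) and t > 1e-12:
                t *= 0.5                                # Armijo backtracking
            x = x + t * d
            g_new = grad_f(x)
            if variant == 'FR':
                beta = (g_new @ g_new) / (g @ g)        # Fletcher-Reeves
            else:
                beta = (g_new @ (g_new - g)) / (g @ g)  # Polak-Ribiere
            d = -g_new + beta * d
            g = g_new
        return x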

Quasi-Newton methods are based on approximations of the inverse of the Hessian. In these methods, the search direction is chosen as d^k = -D_k ∇f(x^k), where D_k is an approximation of the inverse Hessian. The quasi-Newton methods use information gathered from the iterates, x^k and x^{k+1}, and the gradients, ∇f(x^k) and ∇f(x^{k+1}). The well-known Davidon-Fletcher-Powell method, see e.g. [1, 51, 60], has the property that in the quadratic case it generates the same directions as the conjugate direction method, while constructing the inverse of the Hessian. The method starts with a symmetric and positive definite matrix D_0 and iteratively updates the approximate inverse Hessian by the formula

\[ D_{k+1} = D_k + \frac{p^k (p^k)^T}{(p^k)^T q^k} - \frac{(D_k q^k)((q^k)^T D_k)}{(q^k)^T D_k q^k}, \]

where

\[ q^k = \nabla f(x^{k+1}) - \nabla f(x^k), \qquad p^k = t_k d^k, \]

with d^k = -D_k ∇f(x^k) and t_k = arg min_t f(x^k + t d^k).
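In code, one DFP update of the approximate inverse Hessian is a direct transcription of the formula (note that (D_k q^k)((q^k)^T D_k) equals the outer product of D_k q^k with itself, since D_k is symmetric):

    import numpy as np

    def dfp_update(D, p, q):
        # One Davidon-Fletcher-Powell update of the inverse Hessian
        # approximation, with p = t_k d^k and
        # q = grad f(x^{k+1}) - grad f(x^k).
        Dq = D @ q
        return D + np.outer(p, p) / (p @ q) - np.outer(Dq, Dq) / (q @ Dq)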


2.1.2 Line search

To ensure global convergence of a descent method, a line search can be performed. A step length that gives a substantial reduction of the objective value is obtained, usually by the minimization of a one-dimensional function. The one-dimensional optimization problem can be formulated as

\[ \min_{t > 0} \; \varphi(t) = f(x^k + t d^k), \]

where t is the step length to be determined. In practice, an exact line search is not recommended since it is often too time consuming. As a matter of fact, an effective step t_k need not be near a minimizer of ϕ(t). A typical line search procedure requires an initial estimate t_k^0 and generates a sequence {t_k^i} that terminates when the step length satisfies certain conditions. An obvious condition on t_k is a reduction of the objective value, i.e. f(x^k + t_k d^k) < f(x^k). To get convergence, the line search needs to obtain a sufficient decrease, i.e. to satisfy a condition like the strong Wolfe conditions, which are

\[ f(x^k + t_k d^k) \le f(x^k) + \eta_1 t_k \nabla f(x^k)^T d^k, \qquad (7a) \]

\[ \eta_2 |\nabla f(x^k)^T d^k| \ge |\nabla f(x^k + t_k d^k)^T d^k|, \qquad (7b) \]

where 0 < η_1 < η_2 < 1. In practice η_1 is chosen to be quite small, usually η_1 = 10^{-4}. Typical values for η_2 are 0.9 if d^k is obtained by a Newton or quasi-Newton method and 0.1 if d^k is obtained by a nonlinear conjugate gradient method (see e.g. [60]). The condition (7a) is also known as the Armijo condition. The strong condition (7b) does not allow the directional derivative ∇f(x^k + t_k d^k)^T d^k to be too positive. A nice discussion on practical line searches can be found in [60].
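A backtracking search enforcing only the Armijo condition (7a) can be sketched as follows; enforcing (7b) as well requires a bracketing procedure, see [60] (scipy.optimize.line_search is one available implementation of a Wolfe search).

    def armijo_step(f, fx, slope, x, d, t0=1.0, eta1=1e-4, shrink=0.5):
        # Backtracking line search enforcing the Armijo condition (7a);
        # fx = f(x) and slope = grad f(x)^T d < 0 are computed by the caller.
        t = t0
        while f(x + t * d) > fx + eta1 * t * slope:
            t *= shrink                # halve the step until (7a) holds
        return t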

Another way to affect the convergence properties of an optimization algorithm is to use a trust region. The trust region methods avoid line searches by bounding the length of the search direction d. In the context of a Newton type method, the second-order approximation

\[ f(x^k) + \nabla f(x^k)^T d + \tfrac{1}{2} d^T \nabla^2 f(x^k) d \]

is trusted only in a neighborhood of x^k, i.e. if ‖d‖_2 ≤ Δ_k, for some positive Δ_k. The need for trust regions is apparent when the Hessian ∇²f(x^k) is not positive semidefinite. The idea is that when ∇²f(x^k) is badly conditioned, Δ_k should be kept small, and thereby the algorithm turns into a steepest descent-like method. Even if ∇f(x^k) = 0, progress is made if the Hessian ∇²f(x^k) is not positive semidefinite, i.e. the trust region algorithms move away from stationary points that are saddle points or local maxima.

Line search methods and trust region methods differ in the order in which they choose the direction and the step length of the move to the next iterate. Trust region methods first choose the maximum distance and then determine the direction. The choice of the trust region size is crucial, and it is based on the ratio between the actual and the predicted reduction of the objective value. Their robustness and strong convergence characteristics have made trust regions popular, especially for non-convex optimization [1, 4, 11, 57, 60].
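One common variant of the ratio-based radius update can be sketched as below; the thresholds 0.25 and 0.75 are conventional choices in the literature, not prescriptions from the thesis.

    def update_radius(delta, rho, step_norm):
        # rho = (actual reduction) / (reduction predicted by the model).
        if rho < 0.25:                     # poor model fit: shrink the region
            return 0.25 * delta
        if rho > 0.75 and step_norm >= 0.99 * delta:
            return 2.0 * delta             # good fit on the boundary: expand
        return delta                       # otherwise keep the radius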

2.2 Linearly constrained optimization

The direction determination step of Algorithm 1 produces a feasible descent direction. A direction d^k is feasible if there is a scalar α > 0 such that x^k + t d^k ∈ X for all nonnegative t ≤ α. A steepest descent direction or a Newton direction does not guarantee that feasibility is maintained.

We briefly discuss methods that solve linearly constrained nonlinear optimization problems, that is,

\[ f^* = \min_{x \in X} f(x), \qquad (LCP) \]

where f : R^n → R is continuously differentiable and the feasible set X = {x : Ax ≤ b} is a nonempty polytope. The Frank-Wolfe algorithm is one of the most popular methods for solving some instances of such problems.

2.2.1 The Frank–Wolfe method

The Frank–Wolfe, FW, method [28] was originally suggested for quadratic programming problems, but in the original paper it was noted that the method could be applied also to linearly constrained convex programs.

The FW method approximates the objective f of (LCP) by a first-order Taylor expansion (linearization) at the current iterate x^k, giving an affine minorant f_k of f, i.e.

\[ f_k(x) = f(x^k) + \nabla f(x^k)^T (x - x^k). \]


Then, the FW method determines a feasible descent direction by minimizing f_k over X:

\[ f_k^* = \min_{x \in X} f_k(x). \qquad (FWSUB) \]

We denote by y^k the solution of this linear program, which is called the FW subproblem. The Frank–Wolfe direction is d^k = y^k - x^k. Note that if f is convex, f_k^* is a lower bound on f^*, a fact that may be used for terminating the method.

The next step of the method is to perform a line search in the FW direction, i.e. a one-dimensional minimization of f along the line segment between the current iterate x^k and the point y^k. The point where this minimum is attained (at least approximately) is chosen as the next iterate, x^{k+1}. Note that f(x^{k+1}) is an upper bound on f^*.

The algorithm generally makes good progress towards an optimum during the first few iterations, but convergence often slows down substantially when close to an optimum. The reason for this is that the search directions of the FW method, in late iterations, tend to become orthogonal to the gradient of the objective function, leading to extreme zigzagging (e.g. [63, p. 102]). For this reason the algorithm is perhaps best used to find an approximate solution. It can be shown that the worst-case convergence rate is sublinear [4]. In order to improve the performance of the algorithm, there are many suggestions for modifications of the direction finding [30, 49, 53] and the line search steps [64, 72]. There are also other, more complex extensions of the FW method, such as simplicial decomposition, introduced by von Hohenbalken [70].
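A minimal sketch of the method, assuming X = {x : Ax ≤ b} is bounded so that the LP subproblem (FWSUB) has a vertex solution; the exact line search is replaced here by simple backtracking.

    import numpy as np
    from scipy.optimize import linprog

    def frank_wolfe(f, grad_f, A, b, x0, max_iter=100, tol=1e-6):
        x = np.asarray(x0, dtype=float)
        for _ in range(max_iter):
            g = grad_f(x)
            # (FWSUB): minimize the linearization g^T y over the polytope X.
            y = linprog(c=g, A_ub=A, b_ub=b,
                        bounds=[(None, None)] * len(x)).x
            d = y - x                          # the Frank-Wolfe direction
            if -(g @ d) < tol:                 # linearized gap small: stop
                break
            t = 1.0                            # search on the segment [x, y]
            while f(x + t * d) >= f(x) and t > 1e-12:
                t *= 0.5
            x = x + t * d
        return x

For convex f, the quantity -(g @ d) = f(x) - f_k^* bounds the optimality gap from above, which is the termination criterion mentioned in the text.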

2.2.2 Simplicial decomposition

The idea of simplicial decomposition is to build up an inner approximation of the feasible set X, founded on Carathéodory's theorem (e.g. [3]), which states that any point in the convex hull of a set X ⊂ R^n can be expressed as a convex combination of at most 1 + dim X points of the set X. Thus, any feasible solution of (LCP) can be represented as a convex combination of the extreme points of the set X. The simplicial decomposition algorithm alternates between a master problem, which minimizes the objective f over the convex hull of a number of extreme points of X, and a subproblem that generates a new extreme point of the feasible set X and, if f is convex on X, also provides a lower bound on the optimal value.


Given the current iterate x^k and the extreme points y^i, i = 1, ..., k+1, generated by the subproblem, the next iterate is obtained from the master problem

\[
\begin{aligned}
\min\ & f\Big(x^k + \sum_{i=0}^{k+1} \lambda_i (y^i - x^k)\Big) & \\
\text{s.t.}\ & \sum_{i=0}^{k+1} \lambda_i \le 1, & (10) \\
& \lambda_i \ge 0, \quad i = 0, \dots, k+1,
\end{aligned}
\]

where y^0 = x^0. This problem is typically of lower dimension than the problem (LCP).

The advantage of using an inner representation of X is that it is much easier to deal with the linear constraints. The disadvantage is that the number of extreme points is very large for a large-scale problem, and the algorithm may need a large number of them in order to span an optimal solution to (LCP). In [70] von Hohenbalken shows finite convergence of the simplicial decomposition algorithm, in the number of master problems, even if extreme points with zero weights are removed from one master problem to the next [71]. This result allows for the use of column dropping, which is essential to gain computational efficiency in large-scale applications.

When the algorithm throws away every point that was previously generated, we are back to the Frank–Wolfe algorithm. The number of stored extreme points is crucial for the convergence properties, since if it is too small the behavior can be as bad as that of the Frank–Wolfe algorithm. We refer to [1] for further information about column dropping and simplicial decomposition.

Hearn et al. [37] extend the simplicial decomposition concept to the restricted simplicial decomposition algorithm [38, 69], in which the number of stored extreme points is bounded by a parameter r. Convergence to an optimal solution is obtained provided that r is greater than the dimension of the optimal face of the feasible set. Another extension of the simplicial decomposition strategy, known as disaggregate simplicial decomposition, is made by Larsson and Patriksson [45], who take advantage of Cartesian product structures. The simplicial decomposition strategy has been applied mainly to certain classes of structured linearly constrained convex programs, where it has been shown to be successful.
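For illustration, the basic simplicial decomposition loop might be sketched as follows. The master problem is written over the convex hull of all retained points (equivalent to (10) with y^0 = x^0) and is solved with a general NLP routine, so this is an illustration of the scheme rather than an efficient implementation.

    import numpy as np
    from scipy.optimize import linprog, minimize

    def simplicial_decomposition(f, grad_f, A, b, x0, max_iter=50, tol=1e-6):
        # Sketch for min f(x) over X = {x : A x <= b}, with X bounded.
        x = np.asarray(x0, dtype=float)
        points = [x]                                       # y^0 = x^0
        for _ in range(max_iter):
            g = grad_f(x)
            y = linprog(c=g, A_ub=A, b_ub=b,               # subproblem: new
                        bounds=[(None, None)] * len(x)).x  # extreme point
            if g @ (y - x) > -tol:                         # no descent: stop
                break
            points.append(y)
            Y = np.array(points)                           # rows: y^0,...,y^m
            m = len(points)
            res = minimize(lambda lam: f(lam @ Y),         # master problem
                           np.full(m, 1.0 / m),
                           bounds=[(0.0, 1.0)] * m,
                           constraints={'type': 'eq',
                                        'fun': lambda lam: lam.sum() - 1.0})
            x = res.x @ Y
        return x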


2.3 General constrained optimization

We here consider the constrained nonlinear optimization problem

\[ \min_x \; \{\, f(x) \mid g(x) \le 0 \,\}, \qquad (NLP) \]

where f : R^n → R and g : R^n → R^m are continuously differentiable functions. There are plenty of methods that attempt to solve optimization programs with general constraints (see e.g. [34, 60]). A frequently employed solution principle is to alternate between the solution of an approximate problem and a line search with respect to a merit function. The merit function measures the degree of non-optimality of any tentative solution. The sequential linear programming (SLP) and the sequential quadratic programming (SQP) approaches are methods that are based on this principle.

2.3.1 Sequential linear programming

The sequential linear programming methods have become popular because of their simplicity and their robustness for large-scale problems. They are based on the application of first-order Taylor series expansions. The idea is to linearize all nonlinear parts (objective and/or constraints) and, thereafter, to solve the resulting linear programming problem. The solution to this LP problem is used as a new iterate. The scheme is continued until some stopping criterion is met.

The SLP approach originates from Griffith and Stewart [35]. Their method is called the Method of Approximation Programming, and it utilizes an LP approximation of the type

\[
\begin{aligned}
\min\ & \nabla f(x^k)^T (x - x^k) & (SLPSUB) \\
\text{s.t.}\ & g(x^k) + \nabla g(x^k)(x - x^k) \le 0, & \\
& \|x - x^k\|_2 \le \Delta_k, &
\end{aligned}
\]

where Δ_k is some positive scalar. The linearity of the subproblem makes the choice of the step size crucial. It is necessary to impose trust regions on the steps taken in order to ensure convergence and numerical efficiency of an SLP algorithm. The trust regions must be neither too large nor too small. If they are too small, the procedure will terminate prematurely or move slowly towards an optimum, and if they are too large, infeasibility or oscillation may occur. The SLP methods are most successful when curvature effects are negligible. For problems which are highly nonlinear, SLP methods may converge slowly and become unreliable. A variety of numerical methods has been proposed [7, 11, 12, 21, 24, 57, 61, 78] to improve the convergence properties of SLP algorithms.
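One (SLPSUB)-type step can be sketched as below. To keep the subproblem an LP, the trust region is imposed here as box bounds on d = x - x^k (an ∞-norm region, a common practical choice), rather than the 2-norm region written above.

    import numpy as np
    from scipy.optimize import linprog

    def slp_step(grad_f, g_val, g_jac, xk, delta):
        # One linearized subproblem: min grad_f(xk)^T d subject to the
        # linearized constraints and the trust region |d_i| <= delta.
        c = grad_f(xk)
        A_ub = np.atleast_2d(g_jac(xk))        # rows: gradients of g_i at x^k
        b_ub = -np.asarray(g_val(xk))          # linearized g_i(x) <= 0
        res = linprog(c=c, A_ub=A_ub, b_ub=b_ub,
                      bounds=[(-delta, delta)] * len(xk))
        return xk + res.x                      # new trial iterate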

One of the milestones in the development of the SLP concept is the work of Fletcher and Sainz de la Maza [24]. They describe an algorithm that solves a linear program to identify an active set of constraints, followed by the solution of an equality constrained quadratic problem (EQP). This sequential linear programming - EQP (SLP-EQP) method is motivated by the fact that solving quadratic subproblems with inequality constraints can be expensive. The cost of solving one linear program followed by an equality constrained quadratic problem would be much lower.

2.3.2 Sequential quadratic programming

The method of sequential quadratic programming, suggested by Wilson [74] in 1963 for the special case of convex optimization, has been of great interest for solving large-scale constrained optimization problems with nonlinear objective and constraints. An SQP method obtains search directions from a sequence of QP subproblems. Each QP subproblem minimizes a quadratic approximation of the Lagrangian function subject to linear constraints. At the primal-dual point (x^k, u^k) the SQP subproblem can be written as

\[
\begin{aligned}
\min\ & \nabla f(x^k)^T (x - x^k) + \tfrac{1}{2} (x - x^k)^T \nabla^2_{xx} L(x^k, u^k)(x - x^k) & \\
\text{s.t.}\ & g(x^k) + \nabla g(x^k)(x - x^k) \le 0, & (SQPSUB)
\end{aligned}
\]

where ∇²_{xx} L(x^k, u^k) denotes the Hessian of the Lagrangian. The SQP algorithm in this form is a local algorithm: if it starts at a point in the vicinity of a local minimum, it has quadratic local convergence. A line search or a trust region method is used to achieve global convergence from a distant starting point. In the line search case the new iterate is obtained by searching along the direction generated by solving (SQPSUB), until a certain merit function is sufficiently decreased. A variety of merit functions is described in e.g. [60, Chapter 15]. Another way to find the next iterate is to use trust regions. SQP methods have proved to be efficient in practice; they typically require fewer function evaluations than some of the other methods. For an overview of SQP methods, see [5].
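As a practical illustration (not an implementation from the thesis), SciPy's SLSQP method, a sequential (least squares) quadratic programming algorithm, solves a small inequality-constrained problem of the form (NLP); the example problem below is ours.

    import numpy as np
    from scipy.optimize import minimize

    # min (x1 - 1)^2 + (x2 - 2.5)^2  s.t.  x1 + x2 <= 3,  x1, x2 >= 0,
    # written as g(x) <= 0.
    f = lambda x: (x[0] - 1.0) ** 2 + (x[1] - 2.5) ** 2
    g = lambda x: np.array([x[0] + x[1] - 3.0, -x[0], -x[1]])

    # SciPy's 'ineq' constraints require fun(x) >= 0, hence the sign flip.
    res = minimize(f, x0=np.zeros(2), method='SLSQP',
                   constraints={'type': 'ineq', 'fun': lambda x: -g(x)})
    print(res.x)   # the KKT point of this convex program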

One of the important recent developments in SLP and SQP methods is the introduction of the filter concept by Fletcher and Leyffer [20]. The main advantage of using the filter concept is to avoid using a merit function. The filter allows a trial step to be accepted if it reduces either the objective function or a constraint violation function. The filter is used in trust region type algorithms as a criterion for accepting or rejecting a trial step. Global convergence of an SLP-filter algorithm is shown in [12, 21], and the global convergence properties of an SQP-filter algorithm are discussed in [19, 22, 68].

2.4 The Lagrangian dual problem

Suppose that (NLP) has a set of optimal solutions which is non-empty and compact. Let u ∈ R^m_+ be a vector of Lagrangian multipliers associated with the constraints g(x) ≤ 0, and consider the Lagrangian function

\[ L(x, u) = f(x) + u^T g(x). \]

Under a suitable constraint qualification the problem (NLP) can be restated as the saddle point problem (e.g. [3])

\[ \max_{u \ge 0} \, \min_x \; L(x, u) = f(x) + u^T g(x). \qquad (SPP) \]

If a point (x^*, u^*) solves (SPP), then, according to the saddle point theorem ([4, p. 427]), x^* is a local minimum of (NLP). Furthermore, if the problem (NLP) is convex, then x^* is a global optimal solution to the problem (NLP) (see [3]).

The saddle point theorem gives sufficient conditions for optimality. By introducing the Lagrangian function for the (NLP) problem with slack variables in the constraints, g_i(x) + s_i^2 = 0, i = 1, ..., m, necessary conditions for a local optimum of a general constrained optimization problem can be established. A point (x^*, u^*, s^*) is a stationary point of (SPP) if it satisfies ∇L(x^*, u^*, s^*) = 0 and the Hessian with respect to x and s is positive semidefinite. These requirements can be written as

\[
\begin{aligned}
\nabla f(x^*) + \nabla g(x^*)^T u^* &= 0, \qquad & (14a) \\
u^{*T} g(x^*) &= 0, & (14b) \\
g(x^*) &\le 0, & (14c) \\
u^* &\ge 0. & (14d)
\end{aligned}
\]


The conditions (14a)-(14d) are known as the Karush-Kuhn-Tucker (KKT) conditions, and a point that satisfies them is known as a KKT point. The condition (14a) means that there is no descent direction with respect to x for L(x, u) at x^*. Additionally, it is required that the complementarity condition u^{*T} g(x^*) = 0 is fulfilled. The equation (14b) says that u_i^* can be strictly positive only when the corresponding constraint g_i is active at x^*, that is, when g_i(x^*) = 0 holds. The KKT conditions are first-order necessary conditions, and they may be satisfied by local maxima, local minima and other vectors. The second-order condition

\[ d^T \nabla^2_{xx} L(x^*, u^*) d > 0 \quad \text{for all } d \ne 0 \text{ with } \nabla g(x^*) d = 0 \]

is used to guarantee that a given point x^* is a local minimum.
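The conditions (14a)-(14d) are easy to verify numerically at a candidate primal-dual pair; a sketch (the residual names are ours):

    import numpy as np

    def kkt_residuals(grad_f, g, jac_g, x, u):
        # Residuals of the KKT conditions (14a)-(14d) at (x, u); all four
        # values should be (near) zero at a KKT point.
        gx = np.asarray(g(x))
        r_stat = grad_f(x) + np.asarray(jac_g(x)).T @ u          # (14a)
        return {
            'stationarity':       float(np.linalg.norm(r_stat)),
            'complementarity':    float(abs(u @ gx)),            # (14b)
            'primal_feasibility': float(np.clip(gx, 0.0, None).max()),  # (14c)
            'dual_feasibility':   float(np.clip(-u, 0.0, None).max()),  # (14d)
        }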

The methods that solve (NLP) problems can be divided into methods that work in the primal, dual and primal-dual spaces. The primal algorithms work with feasible solutions and improve the value of the objective function. Computational difficulties may arise from the necessity to remain within the feasible region, particularly for problems with nonlinear constraints. For problems with linear constraints they enjoy fast convergence.

The dual methods attempt to solve the dual problem. In this case the direction determination step should find an ascent direction for the dual objective function, which is always concave even when the primal problem is non-convex. This means that a local optimum of (SPP) is also a global one. The main difficulty of the dual problem is that it may be non-differentiable and is not explicitly available.

Primal-dual methods [27, 31, 32, 50] are methods that simultaneously work in the primal and dual spaces. This principle is widespread in the field of interior point methods. A nice book that covers the theoretical properties and the practical and computational aspects of primal-dual interior-point methods is written by Stephen J. Wright [75].

2.5 Convergence

An important subject when considering methods in nonlinear optimization is their local and global convergence properties. Local convergence properties measure the ultimate speed of convergence, and can be used to determine the relative advantage of one algorithm over another. If, for arbitrary starting points, an algorithm generates a sequence of points converging to a solution, then the algorithm is said to be globally convergent. Many algorithms for solving nonlinear programming problems are not globally convergent, but it is often possible to modify such algorithms so as to achieve global convergence.

The subject of global convergence is treated by Zangwill [77]. We here think of an algorithm as a mapping; that is, the algorithm is represented as a point-to-set map A that maps the iteration point x^k to a set A(x^k) to which x^{k+1} will belong, i.e. x^{k+1} ∈ A(x^k).

Definition. A point-to-set map A is closed at x if for all sequences {x^k}_{k=1}^∞ → x and {y^k}_{k=1}^∞ → y with y^k ∈ A(x^k), we have y ∈ A(x).

The Convergence Theorem [77, p. 91] establishes global convergence of closedalgorithmic point-to-set maps.

Convergence theorem: Let A be an algorithm on X, and suppose that, given x^1, the sequence {x^k}_{k=1}^∞ is generated satisfying x^{k+1} ∈ A(x^k). Let a solution set Γ ⊂ X be given and suppose that

i) all points x^k are contained in a compact set S ⊂ X;

ii) there is a continuous function Z on X such that

a) if x ∉ Γ, then Z(y) < Z(x) for all points y ∈ A(x);

b) if x ∈ Γ, then Z(y) ≤ Z(x) for all points y ∈ A(x);

iii) the mapping A(x) is closed at points outside Γ.

Then the limit of any convergent subsequence of {x^k}_{k=1}^∞ is a solution.

The requirement ii) amounts to the existence of a merit function, which can be used to measure the progress of an algorithm.

3 Outline of the thesis and contribution

The attention in this thesis is on the development of feasible descent direction algorithms. The thesis consists of five papers.

The first paper, "The Stiff is Moving — Conjugate Direction Frank–Wolfe Methods with Applications to Traffic Assignment", treats the traffic assignment problem [63]. In this problem, travelers between different origin-destination pairs in a congested urban transportation network want to travel along their shortest routes (in time). However, the travel times depend on the congestion levels, which, in turn, depend on the route choices. The problem is to find the equilibrium traffic flows, where each traveler indeed travels along his shortest route. It is well known that this equilibrium problem can be stated as a linearly constrained convex minimization problem of the form (LCP), see e.g. [63, Ch. 2].

The conventional Frank–Wolfe, FW, method is frequently used for solving structured linearly constrained optimization problems. We improve the performance of the Frank–Wolfe method by choosing better search directions, based on conjugate directions. In conjugate gradient methods, one obtains search directions by conjugating the gradient direction with respect to the previous search direction. The same trick can be applied to the FW direction.

In the conjugate direction FW method, CFW, we choose the search direction d^k as

\[ d^k = \bar{d}^k + \beta_k d^{k-1}, \]

where \bar{d}^k is the FW direction found by solving the (FWSUB) problem and β_k is chosen to make d^k conjugate to d^{k-1} with respect to the Hessian ∇²f(x^k).
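The conjugation coefficient follows directly from the stated condition (d^{k-1})^T ∇²f(x^k) d^k = 0. The sketch below computes it with a Hessian-vector product and covers only this step; Paper I also handles feasibility of the resulting direction and degenerate denominators.

    import numpy as np

    def conjugated_direction(hess_vec, d_fw, d_prev, eps=1e-12):
        # Solve (d_prev)^T H (d_fw + beta * d_prev) = 0 for beta, where
        # hess_vec(v) returns the Hessian-vector product H v at x^k.
        Hd = hess_vec(d_prev)
        denom = d_prev @ Hd
        beta = 0.0 if abs(denom) < eps else -(d_fw @ Hd) / denom
        return d_fw + beta * d_prev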

Global convergence of the CFW method using an inexact line search is proved. A further refinement of the conjugate direction Frank–Wolfe method is derived by applying conjugation with respect to the last two directions instead of only the last one. The computations in the Bi-Conjugate Frank–Wolfe method, BFW, are slightly more complicated. This modification outperforms CFW, at least for high iteration counts. The CFW and BFW algorithms were first implemented in the Matlab [55] environment. The promising results spurred us to implement the two algorithms, as well as FW, in the programming language C, to be able to make more detailed investigations on larger networks.

In a limited set of computational tests the new algorithms, applied to the single-class traffic equilibrium problem, turned out to be quite efficient. Our results indicate that the CFW and BFW algorithms outperform, for accuracy requirements suggested by Boyce et al. [8], the pure and "PARTANized" Frank–Wolfe, disaggregate simplicial decomposition [45] and origin-based algorithms [2].


We extend the conjugate Frank–Wolfe method to non-convex optimization problems with linear constraints and apply this extension to the multi-class traffic equilibrium problem under social marginal cost (SMC) pricing. In the second paper, "Multi-Class User Equilibria under Social Marginal Cost Pricing", we study the model in which the cost of a link may differ between the different classes of users in the same transportation network [15]. Under SMC pricing, the users have to pay a toll for the delays they cause other users. We show that, depending on the formulation, the multi-class SMC pricing equilibrium problem (with different time values) can be stated either as an asymmetric or as a symmetric equilibrium problem. In the latter case, the corresponding optimization problem is in general non-convex. For this non-convex problem, we devise descent methods of Frank–Wolfe type. We apply these methods to a synthetic case based on the Sioux Falls network.

The third paper, "A Conjugate Direction Frank–Wolfe Method for Non-convex Problems", generalizes the conjugate Frank–Wolfe method, examines some of its properties for non-convex problems, and shows through limited testing that it seems to be more efficient than Frank–Wolfe, at least for high iteration counts.

Further, we exploit the conjugate Frank–Wolfe algorithm for solving the stochastic transportation problem, for which Frank–Wolfe type methods have been claimed to be efficient [14, 39, 49]. The stochastic transportation problem, first described by Elmaghraby [18] in 1960, can be considered as the problem of determining the shipping volumes from supply points to demand points with uncertain demands that yield the minimal expected total cost. In the fourth paper, "A Comparison of Feasible Direction Methods for the Stochastic Transportation Problem", we compare several feasible direction methods for solving this problem.

Besides the conjugate Frank–Wolfe algorithm, we also apply the diagonalized Newton, DN, approach [46]. In this method the direction generation subproblem of the Frank–Wolfe method is replaced by a diagonalized Newton subproblem, based on a second-order approximation of the objective function. The CFW and DN methods do not introduce any further parameters in the solution algorithm, they have a better practical rate of convergence than the Frank–Wolfe algorithm, and they take full advantage of the structure of the problem.

Additionally, an algorithm of FW type but with multi-dimensional search is described in this paper. In the previously discussed approaches for the stochastic transportation problem the direction finding subproblem is modified in order to improve upon the FW algorithm. Numerical results for the proposed methods, applied to two types of test problems presented in Cooper and LeBlanc [14] and LeBlanc et al. [49], show a performance that is superior to that of the Frank–Wolfe method, and to the heuristic variation of the Frank–Wolfe algorithm used in LeBlanc et al. [49], whenever solutions of moderate or high accuracy are sought.

In paper five, "A Sequential Linear Programming Algorithm with Multi-dimensional Search — Derivation and Convergence", we utilize ideas from simplicial decomposition (see Section 2.2.2), sequential linear programming (see Section 2.3.1) and duality (see Section 2.4). This results in a novel SLP algorithm for solving problems with large numbers of variables and constraints. In particular, the line search step is replaced by a multi-dimensional search. The algorithm is based on inner approximations of both the primal and the dual spaces, yields both column and constraint generation in the primal space, and its linear programming subproblem differs from the one obtained in traditional SLP methods.

A linear approximation of (SPP) (see Section 2.4) at the current primal and dual points gives a column generation problem which reduces and separates into a primal and a dual column generation problem. These are used to find better approximations of the inner primal and dual spaces. The line search problem of a traditional SLP algorithm is replaced by a minimization problem of the same type as the original one, but with typically fewer variables and fewer constraints. Because of the smaller number of variables and constraints, it should be computationally less demanding than the original problem.

The theoretical results presented in this paper show the convergence of the new method to a point that satisfies the KKT conditions, and thus to a global optimal solution for a convex problem. In the presented algorithm it is not necessary to introduce rules to control the move limits Δ_k, and we may abandon the merit function as well, while still guaranteeing convergence. In the paper, the suggested idea of using multi-dimensional search is also outlined for the case of sequential quadratic programming algorithms.

We apply the new method to a selection of the Hock-Schittkowski nonlinear test problems and report preliminary computational results in a Matlab environment.

My contribution to the papers presented in this thesis includes a major involvement in the development of the solution methods, in the writing process and in the analysis of the results. My contributions are in the implementation and testing of the solution algorithms that are described in the papers, as well.

4 Chronology and publication status

The papers that have contributed to the contents of the thesis arose in the following order.

"A Conjugate Direction Frank–Wolfe Method with Applications to the Traffic Assignment Problem", co-authored with Per Olov Lindberg.

Published in Operations Research Proceedings 2002, pp. 133-138, Springer, 2003. The paper is also presented in my licentiate thesis [16].

"Improved Frank–Wolfe Directions through Conjugation with Applications to the Traffic Assignment Problem", co-authored with Per Olov Lindberg.

Published as Technical Report LiTH-MAT-R-2003-6, Department of Mathematics, Linköping University. The paper is part of my licentiate thesis [16].

"Multi-Class User Equilibria under Social Marginal Cost Pricing", co-authored with Leonid Engelson and Per Olov Lindberg.

Published in Operations Research Proceedings 2002, pp. 174-179, Springer, 2003. This paper is presented as paper II in the thesis and is also presented in my licentiate thesis [16].


"A Conjugate Direction Frank–Wolfe Method for Non-convex Problems", co-authored with Per Olov Lindberg.

Published as Technical Report LiTH-MAT-R-2003-09, Department of Mathematics, Linköping University. The paper is presented as paper III in this thesis and is also in my licentiate thesis [16].

"The Stiff is Moving — Conjugate Direction Frank–Wolfe Methods with Applications to Traffic Assignment", co-authored with Per Olov Lindberg.

The paper is under review for publication in the journal Transportation Science. This paper is presented as paper I in this thesis and is an extension of the first two papers above.

"A Sequential Linear Programming Algorithm with Multi-dimensional Search — Derivation and Convergence", co-authored with Maud Göthe-Lundgren, Torbjörn Larsson, Michael Patriksson and Clas Rydergren.

The paper is submitted for publication and is presented as paper V in this thesis.

"A Comparison of Feasible Direction Methods for the Stochastic Transportation Problem", co-authored with Torbjörn Larsson, Michael Patriksson and Clas Rydergren.

The paper is submitted for publication and is presented as paper IV in this thesis.

Bibliography

[1] N. Andreasson, A. Evgrafov, and M. Patriksson. An Introduction to Continuous Optimization: Foundations and Fundamental Algorithms. Studentlitteratur, 2005.

[2] H. Bar-Gera. Origin-based algorithms for the traffic assignment problem. Transportation Sci., 36(4):398–417, 2002.

[3] M. S. Bazaraa, H. D. Sherali, and C. M. Shetty. Nonlinear Programming: Theory and Algorithms. John Wiley & Sons, New York, NY, second edition, 1993.

[4] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, Belmont, MA, second edition, 1999.

[5] P. T. Boggs and J. W. Tolle. Sequential quadratic programming. Acta Numerica, pages 1–51, 1995.

[6] P. T. Boggs, J. W. Tolle, and A. J. Kearsley. A truncated SQP algorithm for large scale nonlinear programming problems. In Advances in optimization and numerical analysis (Oaxaca, 1992), volume 275 of Math. Appl., pages 69–77. Kluwer Acad. Publ., Dordrecht, 1994.

[7] J. F. Bonnans, J. Ch. Gilbert, C. Lemaréchal, and C. Sagastizábal. Numerical Optimization - Theoretical and Practical Aspects. Universitext. Springer Verlag, Berlin, second edition, 2006.

[8] D. Boyce, B. Ralevic-Dekic, and H. Bar-Gera. Convergence of traffic assignments: How much is enough? In 16th Annual International EMME/2 Users' Group Conference, Albuquerque, NM, 2002.

[9] M. Bruynooghe, A. Gibert, and M. Sakarovitch. Une méthode d'affectation du trafic. In Proceedings of the 4th International Symposium on the Theory of Road Traffic Flow, pages 198–204. Bundesministerium für Verkehr, Bonn, Karlsruhe, 1969.

[10] Y. Censor and S. A. Zenios. Parallel optimization. Numerical Mathematics and Scientific Computation. Oxford University Press, New York, 1997.

[11] T. Y. Chen. Calculation of the move limits for the sequential linear programming method. Internat. J. Numer. Methods Engrg., 36(15):2661–2679, 1993.

[12] C. M. Chin and R. Fletcher. On the global convergence of an SLP-filter algorithm that takes EQP steps. Math. Programming, 96(1):161–177, 2003.

[13] A. R. Conn, N. I. M. Gould, and Ph. L. Toint. LANCELOT: a Fortran package for large-scale nonlinear optimization (release A). In Springer Series in Computational Mathematics, volume 17, 1992.

[14] L. Cooper and L. J. LeBlanc. Stochastic transportation problems and other network related convex problems. Naval Res. Logist. Quart., 24(2):327–337, 1977.

[15] S. Dafermos. Toll patterns for multiclass-user transportation networks. Transportation Sci., 7:211–223, 1973.

[16] M. Daneva. Improved Frank-Wolfe directions with applications to the traffic assignment problem. Linköping Studies in Science and Technology. Theses No. 1023. Department of Mathematics, Linköping University, 2003.

[17] G. B. Dantzig. Linear programming and extensions. Princeton University Press, Princeton, N.J., 1963.

[18] S. E. Elmaghraby. Allocation under uncertainty when the demand has continuous d.f. Management Sci., 6:270–294, 1960.

[19] R. Fletcher, N. I. M. Gould, S. Leyffer, Ph. L. Toint, and A. Wächter. Global convergence of a trust-region SQP-filter algorithm for general nonlinear programming. SIAM J. Optim., 13(3):635–659, 2002.

[20] R. Fletcher and S. Leyffer. Nonlinear programming without a penalty function. Technical Report 171, Department of Mathematics, University of Dundee, Scotland, 1996.

[21] R. Fletcher, S. Leyffer, and Ph. L. Toint. On the global convergence of an SLP-filter algorithm. Technical Report 183, Department of Mathematics, University of Dundee, Scotland, 1998.

[22] R. Fletcher, S. Leyffer, and Ph. L. Toint. On the global convergence of a filter-SQP algorithm. SIAM J. Optim., 13(1):44–59, 2002.

[23] R. Fletcher and C. M. Reeves. Function minimization by conjugate gradients. Comput. J., 7:149–154, 1964.

[24] R. Fletcher and E. Sainz de la Maza. Nonlinear programming and nonsmooth optimization by successive linear programming. Math. Programming, 43(3):235–256, 1989.

[25] C. A. Floudas. A global optimization approach for Lennard-Jones microclusters. Journal of Chemical Physics, 97:7667–7678, 1992.

[26] A. Forsgren and Ph. E. Gill. Interior methods for nonlinear optimization. SIAM Rev., 44:525–597, 2002.

[27] A. Forsgren, Ph. E. Gill, and M. H. Wright. Primal-dual interior methods for nonconvex nonlinear programming. SIAM J. Optim., 8:1132–1152, 1998.

[28] M. Frank and Ph. Wolfe. An algorithm for quadratic programming. Naval Res. Logist. Quart., 3:95–110, 1956.

[29] L. Fratta, M. Gerla, and L. Kleinrock. The flow deviation method: An approach to store-and-forward communication network design. Networks, 3:97–133, 1973.

[30] M. Fukushima. A modified Frank-Wolfe algorithm for solving the traffic assignment problem. Transportation Res. Part B, 18(2):169–177, 1984.

[31] E. M. Gertz and Ph. E. Gill. A primal-dual trust region algorithm for nonlinear optimization. Math. Program., 100(1):49–94, 2004.

[32] Ph. E. Gill, W. Murray, D. B. Ponceleon, and M. A. Saunders. Primal-dual methods for linear programming. Math. Programming, 70(3, Ser. A):251–277, 1995.

[33] Ph. E. Gill, W. Murray, and M. A. Saunders. SNOPT: an SQP algorithm for large-scale constrained optimization. SIAM Rev., 47(1):99–131, 2005.

[34] N. Gould, D. Orban, and Ph. Toint. Numerical methods for large-scale nonlinear optimization. Acta Numerica, pages 299–361, 2005.

[35] R. E. Griffith and R. A. Stewart. A nonlinear programming technique for the optimization of continuous processing systems. Management Sci., 7:379–392, 1960/1961.

[36] R. Haugen. In Modern Investment Theory, pages 92–130. 1997.

[37] D. W. Hearn, S. Lawphongpanich, and J. A. Ventura. Finiteness in restricted simplicial decomposition. Oper. Res. Lett., 4(3):125–130, 1985.

[38] D. W. Hearn, S. Lawphongpanich, and J. A. Ventura. Restricted simplicial decomposition: computation and extensions. Math. Programming Study, 31:99–118, 1987.

[39] K. Holmberg. Efficient decomposition and linearization methods for the stochastic transportation problem. Comput. Optim. Appl., 4(4):293–316, 1995.

[40] N. Karmarkar. A new polynomial-time algorithm for linear programming. Combinatorica, 4(4):373–395, 1984.

[41] V. G. Kartavenko, K. A. Gridnev, and W. Greiner. Nonlinear effects in nuclear cluster problem. Int. J. Mod. Phys., E7:287–299, 1998.

[42] D. M. Kreps. A Course in Microeconomic Theory. Princeton University Press, New Jersey, 1990.

[43] L. Lamberti and C. Pappalettere. Move limits definition in structural optimization with sequential linear programming. I. Optimization algorithm. Comput. & Structures, 81(4):197–213, 2003.

[44] L. Lamberti and C. Pappalettere. Move limits definition in structural optimization with sequential linear programming. II. Numerical examples. Comput. & Structures, 81(4):215–238, 2003.

[45] T. Larsson and M. Patriksson. Simplicial decomposition with disaggregated representation for the traffic assignment problem. Transportation Sci., 26:4–17, 1992.

[46] T. Larsson, M. Patriksson, and C. Rydergren. An efficient solution method for the stochastic transportation problem. Linköping Studies in Science and Technology. Theses No. 702. Department of Mathematics, Linköping University, 1998.

[47] L. S. Lasdon and A. D. Waren. Large scale nonlinear programming. Computers and Chemical Engineering, 7(5):595–613, 1983.

[48] L. J. LeBlanc. Mathematical programming algorithms for large scale network equilibrium and network design problems. PhD thesis, IE/MS Dept, Northwestern University, Evanston, IL, 1973.

[49] L. J. LeBlanc, R. V. Helgason, and D. E. Boyce. Improved efficiency of the Frank-Wolfe algorithm for convex network programs. Transportation Sci., 19(4):445–462, 1985.

[50] X. Liu and J. Sun. A robust primal-dual interior-point algorithm for nonlinear programs. SIAM J. Optim., 14(4):1163–1186, 2004.

[51] D. G. Luenberger. Linear and Nonlinear Programming. Addison-Wesley, Reading, MA, 1984.

[52] J. T. Lundgren. Optimization approaches to travel demand modelling. PhD thesis, Department of Mathematics, Linköping University, Linköping, Sweden, 1989.

[53] M. Lupi. Convergence of the Frank-Wolfe algorithm in transportation networks. Civil Engineering Systems, 3:7–15, 1986.

[54] R. Markland and J. Sweigart. Quantitative Methods: Applications to Managerial Decision Making. John Wiley & Sons, New York, 1987.

[55] The MathWorks, Inc., Natick, MA. Matlab User’s Guide, 1996.

[56] A. Migdalas, G. Toraldo, and V. Kumar. Nonlinear optimization and parallel computing. Parallel Comput., 29(4):375–391, 2003.

[57] J. J. Moré and D. C. Sorensen. Computing a trust region step. SIAM J. Sci. Statist. Comput., 4(3):553–572, 1983.

[58] R. M. Nauss and R. E. Markland. Optimization of bank transit check clearing operations. Management Sci., 31(9):1072–1083, 1985.

[59] A. Neumaier. Molecular modeling of proteins and mathematical prediction of protein structure. SIAM Rev., 39(3):407–460, 1997.

[60] J. Nocedal and S. J. Wright. Numerical Optimization. Springer-Verlag, New York, 1999. Springer series in operations research.

[61] J. Nocedal and Y. Yuan. Combining trust region and line search techniques. Advances in Nonlinear Programming, pages 153–175, 1998.


[62] M. Patriksson. A unified framework of descent algorithms for nonlinear programs and variational inequalities. PhD thesis, Department of Mathematics, Linköping University, Linköping, Sweden, 1993.

[63] M. Patriksson. The Traffic Assignment Problem - Models and Methods. VSP, Utrecht, 1994.

[64] W. B. Powell and Y. Sheffi. The convergence of equilibrium algorithms with predetermined step sizes. Transportation Sci., 16(1):45–55, 1982.

[65] C. Rydergren. Decision support for strategic traffic management: an optimization-based methodology. PhD thesis, Department of Mathematics, Linköping University, Linköping, Sweden, 2001.

[66] M. Rönnqvist. Applications of Lagrangean dual schemes to structural optimization. PhD thesis, Department of Mathematics, Linköping University, Linköping, Sweden, 1993.

[67] K. Schittkowski and C. Zillober. Nonlinear programming: algorithms, software, and applications. From small to very large scale optimization. In System modeling and optimization, volume 166 of IFIP Int. Fed. Inf. Process., pages 73–107. Kluwer Acad. Publ., Boston, MA, 2005.

[68] S. Ulbrich. On the superlinear local convergence of a filter-SQP method. Math. Programming, 100(1, Ser. B):217–245, 2004.

[69] J. A. Ventura and D. W. Hearn. Restricted simplicial decomposition for convex constrained problems. Math. Programming, 59(1):71–85, 1993.

[70] B. von Hohenbalken. A finite algorithm to maximize certain pseudo-concave functions on polytopes. Math. Programming, 9:189–206, 1975.

[71] B. von Hohenbalken. Simplicial decomposition in nonlinear programming algorithms. Math. Programming, 13:49–68, 1977.

[72] A. Weintraub, C. Ortiz, and J. Gonzalez. Accelerating convergence of the Frank-Wolfe algorithm. Transportation Res. Part B, 19(2):113–122, 1985.

[73] Y. Wen, M. A. Moreno-Armendariz, and E. Gomez-Ramirez. Modelling of gasoline blending via discrete-time neural networks. In Proceedings, 2004 IEEE International Joint Conference on Neural Networks, volume 2, pages 1291–1296, 2004.

[74] R. B. Wilson. A simplicial method for concave programming. PhD thesis, Harvard University, Cambridge, Mass., 1963.

[75] S. J. Wright. Primal-Dual Interior-Point Methods. SIAM, 1997.

[76] G. L. Xue, R. S. Maier, and J. B. Rosen. Minimizing the Lennard-Jones potential function on a massively parallel computer. In ICS '92: Proceedings of the 6th International Conference on Supercomputing, pages 409–416, New York, NY, USA, 1992. ACM Press.

[77] W. I. Zangwill. Nonlinear programming: a unified approach. Prentice-Hall Inc., Englewood Cliffs, N.J., 1969.

[78] J. Z. Zhang, N-H. Kim, and L. Lasdon. An improved successive linear programming algorithm. Management Sci., 31(10):1312–1331, 1985.

[79] Ch. Zillober, K. Schittkowski, and K. Moritzen. Very large scale optimization by sequential convex programming. Optim. Methods Softw., 19(1):103–120, 2004.