Nonlinear and Mixed Integer Optimization: Fundamentals and Applications



Christodoulos A. Floudas.


To my wife, Fotini


Preface

Nonlinear and Mixed-Integer Optimization addresses the problem of optimizing an objective function subject to equality and inequality constraints in the presence of continuous and integer variables. These optimization models have many applications in engineering and applied science problems, and this is the primary motivation for the plethora of theoretical and algorithmic developments that we have been experiencing during the last two decades.

This book aims at presenting the fundamentals of nonlinear and mixed-integer optimization, and their applications in the important area of process synthesis in chemical engineering. The first chapter introduces the reader to the generic formulations of this class of optimization problems and presents a number of illustrative applications. For the remaining chapters, the book contains the following three main parts:

    Part 1: Fundamentals of Convex Analysis and Nonlinear Optimization

    Part 2: Fundamentals of Mixed-Integer Optimization

    Part 3: Applications in Process Synthesis

Part 1, comprised of three chapters, focuses on the fundamentals of convex analysis and nonlinear optimization. Chapter 2 discusses the key elements of convex analysis (i.e., convex sets, convex and concave functions, and generalizations of convex and concave functions), which are very important in the study of nonlinear optimization problems. Chapter 3 presents the first and second order optimality conditions for unconstrained and constrained nonlinear optimization. Chapter 4 introduces the basics of duality theory (i.e., the primal problem, the perturbation function, and the dual problem) and presents the weak and strong duality theorems along with the duality gap. Part 1 outlines the basic notions of nonlinear optimization and prepares the reader for Part 2.

Part 2, comprised of two chapters, addresses the fundamentals and algorithms for mixed-integer linear and nonlinear optimization models. Chapter 5 provides the basic ideas in mixed-integer linear optimization, outlines the different methods, and discusses the key elements of branch and bound approaches. Chapter 6 introduces the reader to the theoretical and algorithmic developments in mixed-integer nonlinear optimization. After a brief description of the motivation and the formulation of such models, the reader is introduced to (i) decomposition-based approaches (e.g., Generalized Benders Decomposition, Generalized Cross Decomposition), (ii) linearization-based methods (e.g., Outer Approximation and its variants with Equality Relaxation and Augmented Penalty, and Generalized Outer Approximation), and (iii) a comparison between decomposition- and linearization-based methods.


Part 3, consisting of four chapters, deals with important application areas in chemical engineering. Chapter 7 discusses the components of a chemical process system, defines the objectives in the area of process synthesis, and presents the different approaches in process synthesis. Subsequently, the reader is introduced to modeling issues in mixed-integer nonlinear optimization problems of process synthesis. Chapter 8 presents the application area of heat exchanger network synthesis. The reader is introduced to optimization models that correspond to (i) targeting methods for minimum utility cost and minimum number of matches, (ii) decomposition-based methods, and (iii) simultaneous optimization approaches for the synthesis of heat recovery networks. Chapter 9 presents applications of mixed-integer nonlinear optimization in the area of separations. In particular, the synthesis of sharp heat-integrated distillation columns and the synthesis of non-sharp separation columns are addressed. Chapter 10 discusses the application of mixed-integer nonlinear optimization methods in the synthesis of reactor networks with complex reactions and in the synthesis of prototype chemical processes consisting of reactor-separator-recycle systems.

The main objectives in the preparation of this book are (i) to acquaint the reader with the basics of convex analysis and nonlinear optimization without presenting the proofs of the theoretical results and the algorithmic details, which can be found in several other textbooks, (ii) to introduce the reader to the elementary notions of mixed-integer linear optimization first and to the theory and methods of mixed-integer nonlinear optimization next, which are not discussed in other textbooks, and (iii) to consider several key application areas of chemical engineering process synthesis and design in which the mixed-integer nonlinear optimization models and methods apply naturally. Special efforts have been made to make this book self-contained, establishing in Part 1 only the fundamentals needed in Part 2. The modeling issues and application areas in Part 3 have been selected as those most frequently studied in the area of process synthesis in chemical engineering. All chapters have several illustrations and geometrical interpretations of the theoretical results presented; they include a list of recommended books and articles for further reading in each discussed topic, and the majority of the chapters contain suggested problems for the reader. Furthermore, in Part 3 the examples considered in each of the application areas describe the resulting mathematical models fully, with the key objective of familiarizing the reader with the modeling aspects in addition to the algorithmic ones.

This book has been prepared keeping in mind that it can be used as a textbook and as a reference. It can be used as a textbook on the fundamentals of nonlinear and mixed-integer optimization and as a reference for special topics in the mixed-integer nonlinear optimization part and the presented application areas. Material in this book has been used in graduate level courses in Optimization and Process Synthesis at Princeton University, while Parts 1 and 2 were presented in a graduate level course at ETH. Selected material, namely Chapters 3, 5, 7, and 8, has been used in the undergraduate design course at Princeton University as an introduction to optimization and process synthesis.

A number of individuals and institutions deserve acknowledgment for different kinds of help. First, I thank my doctoral students, postdoctoral associates, colleagues, and in particular the chairman, Professor William B. Russel, at Princeton University for their support in this effort. Second, I express my gratitude to my colleagues in the Centre for Process Systems Engineering at Imperial College and in Technical Chemistry at ETH for the stimulating environment and support they provided during my sabbatical leave. Special thanks go to Professors John Perkins, Roger W. H. Sargent, and David W. T. Rippin for their instrumental role in a productive and enjoyable sabbatical. Third, I am indebted to several colleagues and students who have provided inspiration, encouragement, and extensive feedback, and who helped me to complete this book.


The thoughtful comments and constructive criticism of Professors Roger W. H. Sargent, Roy Jackson, Manfred Morari, Panos M. Pardalos, Amy R. Ciric, and Dr. Efstratios N. Pistikopoulos have helped enormously to improve the book. Claire Adjiman, Costas D. Maranas, Conor M. McDonald, and Vishy Visweswaran critically read several manuscript drafts and suggested helpful improvements. The preparation of the camera-ready copy of this book required a significant amount of work. Special thanks are reserved for Costas D. Maranas, Conor M. McDonald, and Vishy Visweswaran for their time, LaTeX expertise, and tremendous help in the preparation of this book. Without their assistance the preparation of this book would have taken much longer. I am also thankful for the excellent professional assistance of the staff at Oxford University Press, especially Karen Boyd, who provided detailed editorial comments, and senior editor Robert L. Rogers. Finally and most importantly, I am very grateful to my wife, Fotini, and daughter, Ismini, for their support, encouragement, and forbearance of this seemingly never ending task.

C.A.F.
Princeton, New Jersey
March 1995


Contents

1. Introduction
1.1 Mathematical and Optimization Models
1.2 Structure of Nonlinear and Mixed-Integer Optimization Models
1.3 Illustrative Applications
1.3.1 Binary Distillation Design
1.3.2 Retrofit Design of Multiproduct Batch Plants
1.3.3 Multicommodity Facility Location-Allocation
1.4 Scope of the Book

PART 1 FUNDAMENTALS OF CONVEX ANALYSIS AND NONLINEAR OPTIMIZATION

2. Convex Analysis
2.1 Convex Sets
2.1.1 Basic Definitions
2.1.2 Convex Combination and Convex Hull
2.1.3 Separation of Convex Sets
2.1.4 Support of Convex Sets
2.2 Convex and Concave Functions
2.2.1 Basic Definitions
2.2.2 Properties of Convex and Concave Functions
2.2.3 Continuity and Semicontinuity
2.2.4 Directional Derivative and Subgradients
2.2.5 Differentiable Convex and Concave Functions
2.2.6 Minimum (Infimum) and Maximum (Supremum)
2.2.7 Feasible Solution, Local and Global Minimum
2.3 Generalizations of Convex and Concave Functions
2.3.1 Quasi-convex and Quasi-concave Functions
2.3.2 Properties of Quasi-convex and Quasi-concave Functions
2.3.3 Differentiable Quasi-convex, Quasi-concave Functions
2.3.4 Pseudo-convex and Pseudo-concave Functions
2.3.5 Properties of Pseudo-convex and Pseudo-concave Functions
2.3.6 Relationships among Convex, Quasi-convex and Pseudo-convex Functions


3. Fundamentals of Nonlinear Optimization
3.1 Unconstrained Nonlinear Optimization
3.1.1 Formulation and Definitions
3.1.2 Necessary Optimality Conditions
3.1.3 Sufficient Optimality Conditions
3.1.4 Necessary and Sufficient Optimality Conditions
3.2 Constrained Nonlinear Optimization
3.2.1 Formulation and Definitions
3.2.2 Lagrange Functions and Multipliers
3.2.3 Interpretation of Lagrange Multipliers
3.2.4 Existence of Lagrange Multipliers
3.2.5 Weak Lagrange Functions
3.2.6 First-Order Necessary Optimality Conditions
3.2.7 First-Order Sufficient Optimality Conditions
3.2.8 Saddle Point and Optimality Conditions
3.2.9 Second-Order Necessary Optimality Conditions
3.2.10 Second-Order Sufficient Optimality Conditions
3.2.11 Outline of Nonlinear Algorithmic Methods

4. Duality Theory
4.1 Primal Problem
4.1.1 Formulation
4.1.2 Perturbation Function and Its Properties
4.1.3 Stability of Primal Problem
4.1.4 Existence of Optimal Multipliers
4.2 Dual Problem
4.2.1 Formulation
4.2.2 Dual Function and Its Properties
4.2.3 Illustration of Primal-Dual Problems
4.2.4 Geometrical Interpretation of Dual Problem
4.3 Weak and Strong Duality
4.3.1 Illustration of Strong Duality
4.3.2 Illustration of Weak and Strong Duality
4.3.3 Illustration of Weak Duality
4.4 Duality Gap and Continuity of Perturbation Function
4.4.1 Illustration of Duality Gap

PART 2 FUNDAMENTALS OF MIXED-INTEGER OPTIMIZATION

5. Mixed-Integer Linear Optimization
5.1 Motivation
5.2 Formulation
5.2.1 Mathematical Description
5.2.2 Complexity Issues in MILP
5.2.3 Outline of MILP Algorithms
5.3 Branch and Bound Method
5.3.1 Basic Notions


5.3.2 General Branch and Bound Framework
5.3.3 Branch and Bound Based on Linear Programming Relaxation

6. Mixed-Integer Nonlinear Optimization
6.1 Motivation
6.2 Formulation
6.2.1 Mathematical Description
6.2.2 Challenges/Difficulties in MINLP
6.2.3 Overview of MINLP Algorithms
6.3 Generalized Benders Decomposition, GBD
6.3.1 Formulation
6.3.2 Basic Idea
6.3.3 Theoretical Development
6.3.4 Algorithmic Development
6.3.5 Variants of GBD
6.3.6 GBD in Continuous and Discrete-Continuous Optimization
6.4 Outer Approximation, OA
6.4.1 Formulation
6.4.2 Basic Idea
6.4.3 Theoretical Development
6.4.4 Algorithmic Development
6.5 Outer Approximation with Equality Relaxation, OA/ER
6.5.1 Formulation
6.5.2 Basic Idea
6.5.3 Theoretical Development
6.5.4 Algorithmic Development
6.5.5 Illustration
6.6 Outer Approximation with Equality Relaxation and Augmented Penalty, OA/ER/AP
6.6.1 Formulation
6.6.2 Basic Idea
6.6.3 Theoretical Development
6.6.4 Algorithmic Development
6.6.5 Illustration
6.7 Generalized Outer Approximation, GOA
6.7.1 Formulation
6.7.2 Basic Idea
6.7.3 Theoretical Development
6.7.4 Algorithmic Development
6.7.5 Worst-Case Analysis of GOA
6.7.6 Generalized Outer Approximation with Exact Penalty, GOA/EP
6.8 Comparison of GBD and OA-based Algorithms
6.8.1 Formulation
6.8.2 Nonlinear Equality Constraints
6.8.3 Nonlinearities in y and Joint x-y
6.8.4 The Primal Problem
6.8.5 The Master Problem
6.8.6 Lower Bounds


6.9 Generalized Cross Decomposition, GCD
6.9.1 Formulation
6.9.2 Basic Idea
6.9.3 Theoretical Development
6.9.4 Algorithmic Development
6.9.5 GCD under Separability
6.9.6 GCD in Continuous and Discrete-Continuous Optimization

PART 3 APPLICATIONS IN PROCESS SYNTHESIS

7. Process Synthesis
7.1 Introduction
7.1.1 The Overall Process System
7.2 Definition
7.2.1 Difficulties/Challenges in Process Synthesis
7.3 Approaches in Process Synthesis
7.4 Optimization Approach in Process Synthesis
7.4.1 Outline
7.4.2 Representation of Alternatives
7.4.3 Mathematical Model of Superstructure
7.4.4 Algorithmic Development
7.5 Application Areas

8. Heat Exchanger Network Synthesis
8.1 Introduction
8.2 Problem Statement
8.2.1 Definition of Temperature Approaches
8.3 Targets for HEN Synthesis
8.3.1 Minimum Utility Cost
8.3.2 Minimum Number of Matches
8.3.3 Minimum Number of Matches for Vertical Heat Transfer
8.4 Decomposition-based HEN Synthesis Approaches
8.4.1 Heat Exchanger Network Derivation
8.4.2 HEN Synthesis Strategy
8.5 Simultaneous HEN Synthesis Approaches
8.5.1 Simultaneous Matches-Network Optimization
8.5.2 Pseudo-Pinch
8.5.3 Synthesis of HENs Without Decomposition
8.5.4 Simultaneous Optimization Models for HEN Synthesis

9. Distillation-based Separation Systems Synthesis
9.1 Introduction
9.2 Synthesis of Heat-integrated Sharp Distillation Sequences
9.2.1 Problem Statement
9.2.2 Basic Idea
9.2.3 Derivation of Superstructure
9.2.4 Mathematical Formulation of Superstructure


9.3 Synthesis of Nonsharp Distillation Sequences
9.3.1 Problem Statement
9.3.2 Basic Idea
9.3.3 Nonsharp Separation Superstructure
9.3.4 Mathematical Formulation of Nonsharp Separation Superstructure

10. Synthesis of Reactor Networks and Reactor-Separator-Recycle Systems
10.1 Introduction
10.2 Synthesis of Isothermal Reactor Networks
10.2.1 Problem Statement
10.2.2 Basic Idea
10.2.3 Reactor Unit Representation
10.2.4 Reactor Network Superstructure
10.2.5 Mathematical Formulation of Reactor Superstructure
10.3 Synthesis of Reactor-Separator-Recycle Systems
10.3.1 Introduction
10.3.2 Problem Statement
10.3.3 Basic Idea
10.3.4 Reactor-Separator-Recycle Superstructure
10.3.5 Mathematical Formulation

Bibliography
Index


Chapter 1 Introduction

This chapter introduces the reader to elementary concepts of modeling and to generic formulations for nonlinear and mixed integer optimization models, and provides some illustrative applications. Section 1.1 presents the definition and key elements of mathematical models and discusses the characteristics of optimization models. Section 1.2 outlines the mathematical structure of nonlinear and mixed integer optimization problems, which represent the primary focus of this book. Section 1.3 illustrates applications of nonlinear and mixed integer optimization that arise in chemical process design of separation systems, batch process operations, and facility location/allocation problems of operations research. Finally, section 1.4 provides an outline of the three main parts of this book.

    1.1 Mathematical and Optimization Models

A plethora of applications in all areas of science and engineering employ mathematical models. A mathematical model of a system is a set of mathematical relationships (e.g., equalities, inequalities, logical conditions) which represent an abstraction of the real world system under consideration. Mathematical models can be developed using (i) fundamental approaches, (ii) empirical methods, and (iii) methods based on analogy. In (i), accepted theories of science are used to derive the equations (e.g., Newton's law). In (ii), input-output data are employed in tandem with statistical analysis principles so as to generate empirical or "black box" models. In (iii), analogy is employed in determining the essential features of the system of interest by studying a similar, well understood system.

    A mathematical model of a system consists of four key elements:

    (i) Variables,

    (ii) Parameters,

(iii) Constants, and


(iv) Mathematical relationships.

The variables can take different values, and their specifications define different states of the system. They can be continuous, integer, or a mixed set of continuous and integer. The parameters are fixed to one or multiple specific values, and each fixation defines a different model. The constants are fixed quantities given by the model statement.

The mathematical model relations can be classified as equalities, inequalities, and logical conditions. The model equalities are usually composed of mass balances, energy balances, equilibrium relations, physical property calculations, and engineering design relations which describe the physical phenomena of the system. The model inequalities often consist of allowable operating regimes, specifications on qualities, feasibility of heat and mass transfer, performance requirements, and bounds on availabilities and demands. The logical conditions provide the connection between the continuous and integer variables.

    The mathematical relationships can be algebraic, differential, integrodifferential, or a mixedset of algebraic and differential constraints, and can be linear or nonlinear.

An optimization problem is a mathematical model which in addition to the aforementioned elements contains one or multiple performance criteria. The performance criterion is denoted as the objective function, and it can be, for instance, the minimization of cost or the maximization of profit or yield of a process. If we have multiple performance criteria, then the problem is classified as a multi-objective optimization problem. A well defined optimization problem features a number of variables greater than the number of equality constraints, which implies that there exist degrees of freedom over which we optimize. If the number of variables equals the number of equality constraints, then the optimization problem reduces to the solution of a nonlinear system of equations with additional inequality constraints.
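As a concrete, if trivial, instance of the degrees-of-freedom idea (this sketch is not from the book; the objective and data are purely illustrative): two variables with one equality constraint leave one degree of freedom over which to optimize, here handled with SciPy's SLSQP solver.

    # Two variables, one equality constraint h(x) = 0: one degree of freedom.
    import numpy as np
    from scipy.optimize import minimize

    objective = lambda v: (v[0] - 1.0) ** 2 + (v[1] - 2.0) ** 2
    constraints = [{"type": "eq", "fun": lambda v: v[0] + v[1] - 2.0}]  # h(x) = x1 + x2 - 2

    res = minimize(objective, x0=np.zeros(2), method="SLSQP", constraints=constraints)
    print(res.x)  # optimum along the feasible line x1 + x2 = 2, approximately [0.5, 1.5]

With two variables and two independent equality constraints there would be nothing left to optimize: the feasible set would reduce to a single point.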

    1.2 Structure of Nonlinear and Mixed-Integer Optimization Models

In this book we will focus our studies on nonlinear and mixed integer optimization models and present the fundamental theoretical aspects, the algorithmic issues, and their applications in the area of Process Synthesis in chemical engineering. Furthermore, we will restrict our attention to algebraic models with a single objective. The structure of such nonlinear and mixed integer optimization models takes the following form:

min f(x, y)
s.t.  h(x, y) = 0
      g(x, y) ≤ 0
      x ∈ X ⊆ R^n
      y ∈ Y, integer                         (1.1)

where x is a vector of n continuous variables, y is a vector of integer variables, h(x, y) = 0 are m equality constraints, g(x, y) ≤ 0 are p inequality constraints, and f(x, y) is the objective function.



Formulation (1.1) contains a number of classes of optimization problems, obtained by appropriate consideration or elimination of its elements. If the set of integer variables is empty, and the objective function and constraints are linear, then (1.1) becomes a linear programming (LP) problem. If the set of integer variables is empty, and there exist nonlinear terms in the objective function and/or constraints, then (1.1) becomes a nonlinear programming (NLP) problem. The fundamentals of nonlinear optimization are discussed in Part 1 of this book. If the set of integer variables is nonempty, the integer variables participate linearly and separably from the continuous variables, and the objective function and constraints are linear, then (1.1) becomes a mixed-integer linear programming (MILP) problem. The basics of mixed-integer linear optimization are discussed in Part 2, Chapter 5, of this book. If the set of integer variables is nonempty, and there exist nonlinear terms in the objective function and constraints, then (1.1) is a mixed-integer nonlinear programming (MINLP) problem. The fundamentals of MINLP optimization are discussed in Chapter 6. This last class of MINLP problems features many applications in engineering and applied science, and a sample of these are discussed in Part 3 of this book. It should also be mentioned that (1.1) includes the pure integer linear and nonlinear optimization problems, which are not the subject of study of this book. The reader interested in pure integer optimization problems is referred to the books by Nemhauser and Wolsey (1988), Parker and Rardin (1988), and Schrijver (1986).

    1.3 Illustrative Applications

Mixed-integer nonlinear optimization problems of the form (1.1) are encountered in a variety of applications in all branches of engineering, applied mathematics, and operations research. These represent currently very important and active research areas, and a partial list includes:

    (i) Process Synthesis

Heat Exchanger Networks
Distillation Sequencing
Mass Exchange Networks
Reactor-based Systems
Utility Systems
Total Process Systems

(ii) Design, Scheduling, and Planning of Batch Processes
Design and Retrofit of Multiproduct Plants
Design and Scheduling of Multipurpose Plants

    (iii) Interaction of Design and Control

(iv) Molecular Product Design
(v) Facility Location and Allocation

(vi) Facility Planning and Scheduling
(vii) Topology of Transportation Networks

Part 3 of this book presents a number of major developments and applications of MINLP approaches in the area of Process Synthesis. The illustrative examples for MINLP applications, presented next in this section, focus on different aspects than those described in Part 3. In particular, we will consider: the binary distillation design of a single column, the retrofit design of multiproduct batch plants, and the multicommodity facility location/allocation problem.

    1.3.1 Binary Distillation Design

This illustrative example is taken from the recent work on interaction of design and control by Luyben and Floudas (1994a) and considers the design of a binary distillation column which separates a saturated liquid feed mixture into distillate and bottoms products of specified purity. The objectives are the determination of the number of trays, reflux ratio, flow rates, and compositions in the distillation column that minimize the total annual cost. Figure 1.1 shows a superstructure for the binary distillation column.

Formulation of the mathematical model here adopts the usual assumptions of equimolar overflow, constant relative volatility, total condenser, and partial reboiler. Binary variables q_i denote the existence of trays in the column, and their sum is the number of trays N. Continuous variables represent the liquid flow rates L_i and compositions x_i, the vapor flow rates V_i and compositions y_i, the reflux R_i and vapor boilup VB_i, and the column diameter D_i. The equations governing the model include material and component balances around each tray, thermodynamic relations between vapor and liquid phase compositions, and the column diameter calculation based on vapor flow rate. Additional logical constraints ensure that reflux and vapor boilup enter only on one tray and that the trays are arranged sequentially (so trays cannot be skipped). Also included are the product specifications. Under the assumptions made in this example, neither the temperature nor the pressure is an explicit variable, although they could easily be included if energy balances are required. A minimum and maximum number of trays can also be imposed on the problem.

For convenient control of equation domains, let TR = {1, ..., N} denote the set of trays from the reboiler to the top tray, and let {N_f} be the feed tray location. Then AF = {N_f + 1, ..., N} is the set of trays in the rectifying section and BF = {2, ..., N_f − 1} is the set of trays in the stripping section. The following equations describe the MINLP model.


    a. Overall material and component balance

    b. Total condenser


    c. Partial reboiler

    d. Phase equilibrium

    e. Component balances

    f. Equimolar overflow

    g. Diameter

    h. Reflux and boilup constraints

    i. Product specifications

    j. Sequential tray constraints
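The displayed equations for items a through j are not reproduced in this transcript. As a minimal sketch of the logical constraints only — assuming, beyond what the text above states, hypothetical binaries r_i and s_i marking the trays on which the reflux and the vapor boilup enter — their structure might read:

Σ_{i ∈ AF} r_i = 1  and  Σ_{i ∈ BF} s_i = 1        (reflux and boilup each enter on exactly one tray)

q_{i+1} ≤ q_i for consecutive trays in each section   (trays are arranged sequentially; none skipped)

N = Σ_{i ∈ TR} q_i                                   (the number of trays is the sum of the tray-existence binaries)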

The economic objective function to be minimized is the cost, which combines the capital costs associated with building the column and the utility costs associated with operating the column. The form of the capital cost of the column depends upon the vapor boilup, the number of trays, and the column diameter,

where the parameters include the tax factor β_tax, the payback period β_pay, the latent heats of vaporization ΔH_vap and condensation ΔH_cond, and the utility cost coefficients for steam and cooling water.

The model includes parameters for the relative volatility α, the vapor velocity v, the tray spacing, the flow constant k_v, the flooding factor f, the vapor ρ_V and liquid ρ_L densities, the molecular weight MW, and a known upper bound F_max on the column flow rates.

The essence of this particular formulation is the control of tray existence (governed by q_i) and the consequences for the continuous variables. In the rectifying section, all trays above the tray on which the reflux enters have no liquid flows, which eliminates any mass transfer on these trays where q_i = 0. The vapor composition does not change above this tray even though the vapor flows remain constant. Similarly, in the stripping section, all trays below the tray on which the vapor boilup enters have no vapor flows, and the liquid composition does not change below this tray even though the liquid flows remain constant. The reflux and boilup constraints ensure that the reflux and boilup enter on only one tray.

It is worth noting that the above formulation of the binary distillation design features the binary variables q_i that denote the existence of the trays.

1.3.2 Retrofit Design of Multiproduct Batch Plants

In a retrofit batch design, we optimize the batch plant profitability, defined as the total production value minus the cost of any new equipment. The objective is to obtain a modified batch plant structure, an operating strategy, the equipment sizes, and the batch processing parameters. Discrete decisions correspond to the selection of new units to add to each stage of the plant and their type of operation. Continuous decisions are represented by the volume of each new unit and the batch processing variables, which are allowed to vary within certain bounds.

New units may be added to any stage j in parallel to the existing units. These new units at stage j are denoted by the index k, and binary variables y_jk are introduced so as to denote whether a new unit k is added at stage j. Upper bounds on the number of units that can be added at stage j and to the plant are indicated by Z_j and Z_U, respectively.

The operating strategy for each new unit involves discrete decisions since it allows for the options of

Option B_m: operate in phase with existing unit m to increase its capacity

    Option C: operate in sequence with the existing units to decrease the stage cycle time

These are denoted by the binary variables (y_ijk^B)_m and y_ijk^C, respectively, which take the value of 1 if product i is produced via operating option B_m or C for the new unit k in stage j. The volume of the kth new unit in stage j is denoted by V_jk, and the processing volume required for product i is indicated by (V_ijk^B)_m or V_ijk^C depending on the operating alternative.

The MINLP model for the retrofit design of a multiproduct batch plant takes the following form:

    Objective function

    a. Production targets

    b. Limiting cycle time constraints

    c. Operating time period constraint

    d. Upper bound on total new units


    e. Lower bound constraints for new units

    f. Operation in phase or in sequence

g. Volume requirement for option B_m

    h. Volume requirement for option C

    i. Processing volume restrictions of new units

    j. Distinct arrangements of new units

The above formulation is a mixed-integer nonlinear programming (MINLP) model and has the following characteristics. The binary variables appear linearly and separably from the continuous variables in both the objective and the constraints, after defining a new set of variables w_ij = t_ij / T_Li and including the bilinear constraints w_ij T_Li = t_ij. The continuous variables n_i, B_i, T_Li, and w_ij appear nonlinearly. In particular, we have bilinear terms n_i B_i in the objective and constraints, and bilinear terms n_i T_Li and w_ij T_Li in the constraints. The rest of the continuous variables V_jk, (V_ijk^B)_m, and V_ijk^C appear linearly in the objective function and constraints.
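As a hedged sketch of this bilinear structure (the displayed equations are not reproduced in the transcript; Q_i and H are assumed names for the production requirement of product i and the available horizon):

n_i B_i ≥ Q_i            (production targets: number of batches times batch size)

Σ_i n_i T_Li ≤ H         (limiting cycle times must fit within the operating time period)

Both constraints are bilinear in the continuous variables, which is the source of the nonconvexity noted above.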


    1.3.3 Multicommodity Facility Location-Allocation

The multicommodity capacitated facility location-allocation problem is of primary importance in the transportation of shipments from the original facilities to intermediate stations and then to the destinations. In this illustrative example we will consider such a problem which involves I plants, J distribution centers, K customers, and P products. The commodity flow of product p which is shipped from plant i, through distribution center j, to customer k will be denoted by the continuous variable x_ijkp. It is assumed that each customer k is served by only one distribution center j. Data are provided for the total demand D_kp of customer k for commodity p, the supply S_ip of commodity p at plant i, as well as the lower and upper bounds on the available throughput of a distribution center j, denoted by V_j^L and V_j^U, respectively.

The objective is to minimize the total cost, which includes shipping costs, setup costs, and throughput costs. The shipping costs are denoted via linear coefficients c_ijkp multiplying the commodity flows x_ijkp. The setup cost for establishing distribution center j is denoted by f_j. The throughput cost for distribution center j consists of a constant v_j multiplying a nonlinear function of the flow through the distribution center.

The set of constraints ensures that supply and demand requirements are met, provides the logical connection between the existence of a distribution center, the assignment of customers to distribution centers, and the demand for commodities, and makes certain that only one distribution center is assigned to each customer.

The binary variables correspond to the existence of a distribution center j and to the assignment of a customer k to a distribution center j. These are denoted by z_j and y_jk, respectively. The continuous variables are the commodity flows x_ijkp. The mathematical formulation of this problem becomes

    Objective function

    a. Supply requirements

    b. Demand constraints

    c. Logical constraints

d. Assignment constraints

e. Nonnegativity and integrality conditions

Note that in the above formulation the binary variables y_jk and z_j appear linearly and separably in the objective function and constraints. Note also that the continuous variables x_ijkp appear linearly in the constraints, while they contribute nonlinear terms to the objective function.
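The displayed equations of this formulation are likewise not reproduced in the transcript. One plausible reading of the constraint headings, offered only as a hedged sketch (g(·) stands for the nonlinear throughput function mentioned earlier), is:

min Σ_{ijkp} c_ijkp x_ijkp + Σ_j f_j z_j + Σ_j v_j g( Σ_{ikp} x_ijkp )

s.t. Σ_{jk} x_ijkp ≤ S_ip for all i, p                 (a. supply requirements)

Σ_i x_ijkp = D_kp y_jk for all j, k, p                 (b. demand constraints)

y_jk ≤ z_j and V_j^L z_j ≤ Σ_{ikp} x_ijkp ≤ V_j^U z_j  (c. logical and throughput constraints)

Σ_j y_jk = 1 for all k                                 (d. assignment constraints)

x_ijkp ≥ 0, y_jk ∈ {0, 1}, z_j ∈ {0, 1}                (e. nonnegativity and integrality conditions)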

1.4 Scope of the Book

The remaining chapters of this book form three parts. Part 1 presents the fundamental notions of convex analysis, the basic theory of nonlinear unconstrained and constrained optimization, and the basics of duality theory. Part 1 acquaints the reader with the important aspects of convex analysis and nonlinear optimization without presenting the proofs of the theoretical results and the algorithmic details, which can be found in several other textbooks. The main objective of Part 1 is to prepare the reader for Part 2. Part 2 introduces first the elementary notions of mixed-integer linear optimization and focuses subsequently on the theoretical and algorithmic developments in mixed-integer nonlinear optimization. Part 3 introduces first the generic problems in the area of Process Synthesis, discusses key ideas in the mathematical modeling of process systems, and concentrates on the important application areas of heat exchanger networks, separation system synthesis, and reactor-based system synthesis.


Figure 1.1: Superstructure for the binary distillation column


Part 1

Fundamentals of Convex Analysis and Nonlinear Optimization


Chapter 2 Convex Analysis

This chapter discusses the elements of convex analysis which are very important in the study of optimization problems. In section 2.1 the fundamentals of convex sets are discussed. In section 2.2 the subject of convex and concave functions is presented, while in section 2.3 generalizations of convex and concave functions are outlined.

    2.1 Convex Sets

This section introduces the fundamental concept of convex sets, describes their basic properties, and presents theoretical results on the separation and support of convex sets.

    2.1.1 Basic Definitions

Definition 2.1.1 (Line) Let x1 and x2 be two distinct points in R^n. The line through x1 and x2 is the set of points

{x : x = (1 − λ) x1 + λ x2, λ ∈ R}.

Definition 2.1.2 (Line segment) The closed line segment joining x1 and x2 is the set of points

{x : x = (1 − λ) x1 + λ x2, 0 ≤ λ ≤ 1}.


Illustration 2.1.1 Consider the points (1, 1) and (2, 3) in R^2. They satisfy 2x − y = 1; that is, any point (x, y) satisfying this equation lies on the line passing through (1, 1) and (2, 3). From Definition 2.1.1, we can express any point on this line as

(x, y) = (1 − λ)(1, 1) + λ(2, 3).

For λ = 0.5, we obtain (x, y) = (1.5, 2), which lies on the line segment between (1, 1) and (2, 3). For λ = 2, we obtain (x, y) = (3, 5), which lies on the line but not on the line segment between (1, 1) and (2, 3).

Definition 2.1.3 (Half-space) Let the vector c ∈ R^n, c ≠ 0, and the scalar z ∈ R. The open half-space in R^n is defined as the set {x : cᵀx < z}. The closed half-space in R^n is defined as the set {x : cᵀx ≤ z}.

Definition 2.1.4 (Hyperplane) The hyperplane in R^n is defined as the set {x : cᵀx = z}.

Illustration 2.1.2 The hyperplane in R^2 shown in Figure 2.1 divides R^2 into the half-spaces H1 and H2.

Definition 2.1.5 (Polytope and polyhedron) The intersection of a finite number of closed half-spaces in R^n is defined as a polytope. A bounded polytope is called a polyhedron.

Definition 2.1.6 (Convex set) A set S ⊆ R^n is said to be convex if the closed line segment joining any two points x1 and x2 of the set S, that is, (1 − λ) x1 + λ x2, belongs to the set S for each λ such that 0 ≤ λ ≤ 1.

    Illustration 2.1.3 (Examples of convex sets) The following are some examples of convex sets:

(i) Line
(ii) Open and closed half-space
(iii) Polytope, polyhedron
(iv) All points inside or on a circle
(v) All points inside or on a polygon

Figure 2.1: Half-spaces

    Figure 2.2 illustrates convex and nonconvex sets.

Lemma 2.1.1 (Properties of convex sets) Let S1 and S2 be convex sets in R^n. Then,

(i) the intersection S1 ∩ S2 is a convex set;

(ii) the sum S1 + S2 of two convex sets is a convex set;

(iii) the product θ S1 of a real number θ and the set S1 is a convex set.

Definition 2.1.7 (Extreme point (vertex)) Let S be a convex set in R^n. A point x ∈ S for which there exist no two distinct points x1, x2 ∈ S different from x such that x ∈ [x1, x2] is called a vertex or extreme point of S.

Remark 1 A convex set may have no vertices (e.g., a line, an open ball), a finite number of vertices (e.g., a polygon), or an infinite number of vertices (e.g., all boundary points of a closed ball).

Theorem 2.1.1 (Characterization of extreme points)
Let the polyhedron S = {x | Ax = b, x ≥ 0}, where A is an m × n matrix of rank m and b is an m-vector. A point x is an extreme point of S if and only if A can be decomposed into A = [B, N] such that

x = (x_B, x_N) with x_B = B⁻¹ b ≥ 0 and x_N = 0,

where B is an m × m invertible matrix satisfying B⁻¹ b ≥ 0, N is an m × (n − m) matrix, and x_B, x_N are the subvectors of x corresponding to B and N.


    Figure 2.2: Convex and nonconvex sets

Remark 2 The number of extreme points of S is less than or equal to the maximum number of possible ways to select m columns of A to form B, which is the binomial coefficient

n! / (m! (n − m)!).

Thus, S has a finite number of extreme points.

    2.1.2 Convex Combination and Convex Hull

Definition 2.1.8 (Convex combination) Let {x1, ..., xr} be any finite set of points in R^n. A convex combination of this set is a point of the form

x = Σ_{j=1}^{r} λ_j x_j, with λ_j ≥ 0, j = 1, ..., r, and Σ_{j=1}^{r} λ_j = 1.

Remark 1 A convex combination of two points lies in the closed interval between these two points.


    Figure 2.3: Convex hull

Definition 2.1.9 (Simplex) Let {x0, x1, ..., xr} be r + 1 distinct points in R^n (r ≤ n), such that the vectors x1 − x0, ..., xr − x0 are linearly independent. An r-simplex in R^n is defined as the set of all convex combinations of {x0, ..., xr}.

Remark 2 A 0-simplex (i.e., r = 0) is a point, a 1-simplex (i.e., r = 1) is a closed line segment, a 2-simplex (i.e., r = 2) is a triangle, and a 3-simplex (i.e., r = 3) is a tetrahedron.

Definition 2.1.10 (Convex hull) Let S be a set (convex or nonconvex) in R^n. The convex hull H(S) of S is defined as the intersection of all convex sets in R^n which contain S as a subset.

Illustration 2.1.4 Figure 2.3 shows a nonconvex set S and its convex hull H(S). The dotted lines in H(S) represent the portion of the boundary of S which is not on the boundary of H(S).

Theorem 2.1.2
The convex hull H(S) of S is the set of all convex combinations of S; that is, x ∈ H(S) if and only if x can be represented as

x = Σ_{j=1}^{r} λ_j x_j, with λ_j ≥ 0, Σ_{j=1}^{r} λ_j = 1, and x_j ∈ S, j = 1, ..., r,

where r is a positive integer.

Remark 3 Any point x in the convex hull of a set S in R^n can be written as a convex combination of at most n + 1 points of S, as demonstrated by the following theorem.

Theorem 2.1.3 (Carathéodory)
Let S be a set (convex or nonconvex) in R^n. If x ∈ H(S), then it can be expressed as

x = Σ_{j=1}^{n+1} λ_j x_j, with λ_j ≥ 0, Σ_{j=1}^{n+1} λ_j = 1, and x_j ∈ S.
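Theorem 2.1.2 suggests a computational test: a point lies in the convex hull of finitely many points exactly when suitable multipliers λ_j exist, which is a linear feasibility problem. A minimal sketch with SciPy (the points and the target are illustrative, not from the book):

    # Certify x in conv{x_1, ..., x_r} by finding lambda_j >= 0, sum lambda_j = 1.
    import numpy as np
    from scipy.optimize import linprog

    pts = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]])  # the x_j in R^2
    target = np.array([1.0, 0.5])

    A_eq = np.vstack([pts.T, np.ones(len(pts))])  # sum lambda_j x_j = target, sum lambda_j = 1
    b_eq = np.append(target, 1.0)
    res = linprog(c=np.zeros(len(pts)), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * len(pts))
    print(res.success, res.x)  # success => target is in the hull; res.x are the lambda_j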


    Figure 2.4: Separating hyperplanes and disjoint sets

    2.1.3 Separation of Convex Sets

Definition 2.1.11 (Separating hyperplane) Let S1 and S2 be nonempty sets in R^n. The hyperplane

H = {x : cᵀx = z}

is said to separate (strictly separate) S1 and S2 if

cᵀx ≤ z (< z) for each x ∈ S1 and cᵀx ≥ z (> z) for each x ∈ S2.

Illustration 2.1.5 Figure 2.4(a) illustrates two sets which are separable, but which are neither disjoint nor convex. It should be noted that separability does not imply that the sets are disjoint. Also, two disjoint sets are not in general separable, as shown in Figure 2.4(b).

Theorem 2.1.4 (Separation of a convex set and a point)
Let S be a nonempty closed convex set in R^n, and let y be a vector which does not belong to the set S. Then there exist a nonzero vector c and a scalar z such that

cᵀy > z and cᵀx ≤ z for each x ∈ S.


Figure 2.5: Illustration of Farkas' theorem

Theorem 2.1.5 (Farkas)
Let A be an m × n matrix and c an n-vector. Exactly one of the following two systems has a solution:

System 1: Ax ≤ 0 and cᵀx > 0 for some x ∈ R^n.
System 2: Aᵀy = c and y ≥ 0 for some y ∈ R^m.

Illustration 2.1.6 Consider the cases shown in Figure 2.5(a,b). Let us denote the columns of Aᵀ as a1, a2, and a3. System 1 has a solution if the closed convex cone defined by Ax ≤ 0 and the open half-space defined by cᵀx > 0 have a nonempty intersection. System 2 has a solution if c lies within the convex cone generated by a1, a2, and a3.

Remark 1 Farkas' theorem has been used extensively in the development of optimality conditions for linear and nonlinear optimization problems.
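Farkas' theorem also lends itself to computation: System 2 is a linear feasibility problem, and by the theorem its infeasibility certifies that System 1 has a solution. A small sketch with illustrative data (not from the book):

    # Test System 2 (A^T y = c, y >= 0); if infeasible, System 1 must be solvable.
    import numpy as np
    from scipy.optimize import linprog

    A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])  # m x n with m = 3, n = 2
    c = np.array([2.0, 3.0])

    res = linprog(np.zeros(A.shape[0]), A_eq=A.T, b_eq=c,
                  bounds=[(0, None)] * A.shape[0])
    print("System 2 feasible:", res.success)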

Theorem 2.1.6 (Separation of two convex sets)
Let S1 and S2 be nonempty disjoint convex sets in R^n. Then there exists a hyperplane

H = {x : cᵀx = z}

which separates S1 and S2; that is,

cᵀx ≤ z for each x ∈ S1 and cᵀx ≥ z for each x ∈ S2.

Theorem 2.1.7 (Gordan)
Let A be an m × n matrix. Exactly one of the following two systems has a solution:


System 1: Ax < 0 for some x ∈ R^n.
System 2: Aᵀy = 0 and y ≥ 0, y ≠ 0, for some y ∈ R^m.

Remark 2 Gordan's theorem has been frequently used in the derivation of optimality conditions of nonlinearly constrained problems.

Figure 2.6: Supporting hyperplanes

    2.1.4 Support of Convex Sets

Definition 2.1.12 (Supporting hyperplane) Let S be a nonempty set in R^n, and let x̄ be a point on the boundary of S. A supporting hyperplane of S at x̄ is a hyperplane

H = {x : cᵀ(x − x̄) = 0}

that passes through x̄ and has the property that all of S is contained in one of the two closed half-spaces

{x : cᵀ(x − x̄) ≤ 0} or {x : cᵀ(x − x̄) ≥ 0}

produced by the hyperplane.

    Illustration 2.1.7 Figure 2.6 provides a few examples of supporting hyperplanes for convex andnonconvex sets.

    2.2 Convex and Concave Functions

This section presents (i) the definitions and properties of convex and concave functions, (ii) the definitions of continuity, semicontinuity, and subgradients, (iii) the definitions and properties of differentiable convex and concave functions, and (iv) the definitions and properties of local and global extremum points.


    2.2.1 Basic Definitions

Definition 2.2.1 (Convex function) Let S be a convex subset of R^n, and let f(x) be a real-valued function defined on S. The function f(x) is said to be convex if for any x1, x2 ∈ S and 0 ≤ λ ≤ 1, we have

f(λ x1 + (1 − λ) x2) ≤ λ f(x1) + (1 − λ) f(x2).

This inequality is called Jensen's inequality, after the Danish mathematician who first introduced it.

Definition 2.2.2 (Strictly convex function) Let S be a convex subset of R^n, and let f(x) be a real-valued function defined on S. The function f(x) is said to be strictly convex if for any x1, x2 ∈ S with x1 ≠ x2, and 0 < λ < 1, we have

f(λ x1 + (1 − λ) x2) < λ f(x1) + (1 − λ) f(x2).

Remark 1 A strictly convex function on a subset S of R^n is convex on S. The converse, however, is not true. For instance, a linear function is convex but not strictly convex.

Definition 2.2.3 (Concave function) Let S be a convex subset of R^n, and let f(x) be a real-valued function defined on S. The function f(x) is said to be concave if for any x1, x2 ∈ S and 0 ≤ λ ≤ 1, we have

f(λ x1 + (1 − λ) x2) ≥ λ f(x1) + (1 − λ) f(x2).

Remark 2 The function f(x) is concave on S if and only if −f(x) is convex on S. Hence, results obtained for convex functions can be converted into results for concave functions by multiplication by −1, and vice versa.

Definition 2.2.4 (Strictly concave function) Let S be a convex subset of R^n, and let f(x) be a real-valued function defined on S. The function f(x) is said to be strictly concave if for any x1, x2 ∈ S with x1 ≠ x2, and 0 < λ < 1, we have

f(λ x1 + (1 − λ) x2) > λ f(x1) + (1 − λ) f(x2).

Illustration 2.2.1 Figure 2.7 provides an illustration of convex, concave, and nonconvex functions in R^1.
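Jensen's inequality can be spot-checked numerically. The sketch below (illustrative only, not from the book) samples random pairs and multipliers for the convex function f(x) = x² and confirms that the inequality of Definition 2.2.1 holds on every sample:

    import numpy as np

    rng = np.random.default_rng(0)
    f = lambda x: x ** 2
    x1, x2 = rng.uniform(-5, 5, 1000), rng.uniform(-5, 5, 1000)
    lam = rng.uniform(0, 1, 1000)
    lhs = f(lam * x1 + (1 - lam) * x2)       # f(lambda x1 + (1 - lambda) x2)
    rhs = lam * f(x1) + (1 - lam) * f(x2)    # lambda f(x1) + (1 - lambda) f(x2)
    print(bool(np.all(lhs <= rhs + 1e-12)))  # True for the convex function x**2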

    2.2.2 Properties of Convex and Concave Functions

Convex functions can be combined in a number of ways to produce new convex functions, as illustrated by the following:


    Figure 2.7: Convex, concave and nonconvex functions

(i) Let f1(x), ..., fk(x) be convex functions on a convex subset S of R^n. Then their summation

f(x) = Σ_{i=1}^{k} f_i(x)

is convex. Furthermore, if at least one f_i(x) is strictly convex on S, then their summation is strictly convex.

(ii) Let f(x) be convex (strictly convex) on a convex subset S of R^n, and let λ be a positive number. Then λ f(x) is convex (strictly convex).

(iii) Let f(x) be convex (strictly convex) on a convex subset S of R^n, and let g(y) be an increasing convex function defined on the range of f(x) in R. Then the composite function g[f(x)] is convex (strictly convex) on S.

(iv) Let f1(x), ..., fk(x) be convex functions, bounded from above, on a convex subset S of R^n. Then the pointwise supremum function

f(x) = sup { f1(x), ..., fk(x) }

is a convex function on S.


    Figure 2.8: Epigraph and hypograph of a function

(v) Let f1(x), ..., fk(x) be concave functions, bounded from below, on a convex subset S of R^n. Then the pointwise infimum function

f(x) = inf { f1(x), ..., fk(x) }

is a concave function on S.

Definition 2.2.5 (Epigraph of a function) Let S be a nonempty set in R^n. The epigraph of a function f(x), denoted by epi(f), is the subset of R^{n+1} defined as the set of (n + 1)-vectors (x, y):

epi(f) = { (x, y) : x ∈ S, y ∈ R, y ≥ f(x) }.

Definition 2.2.6 (Hypograph of a function) The hypograph of f(x), denoted by hyp(f), is the subset of R^{n+1} defined as the set of (n + 1)-vectors (x, y):

hyp(f) = { (x, y) : x ∈ S, y ∈ R, y ≤ f(x) }.

Illustration 2.2.2 Figure 2.8 shows the epigraph of a convex function and the hypograph of a concave function.

Theorem 2.2.1
Let S be a nonempty convex set in R^n. The function f(x) is convex if and only if epi(f) is a convex set.

Remark 1 The epigraph of a convex function and the hypograph of a concave function are convex sets.

    2.2.3 Continuity and Semicontinuity

Definition 2.2.7 (Continuous function) Let S be a subset of R^n, x̄ ∈ S, and let f(x) be a real-valued function defined on S. f(x) is continuous at x̄ if either of the following equivalent conditions holds:

Condition 1: For each ε1 > 0, there exists an ε2 > 0 such that ||x − x̄|| < ε2, x ∈ S implies |f(x) − f(x̄)| < ε1.

Condition 2: For each sequence x1, x2, ..., xn, ... (xn ∈ S) converging to x̄,

lim_{n→∞} f(xn) = f(x̄).

f(x) is continuous on S if it is continuous at each x̄ ∈ S.

Definition 2.2.8 (Lower semicontinuous function) f(x) is lower semicontinuous at x̄ if either of the following equivalent conditions holds:

Condition 1: For each ε1 > 0, there exists an ε2 > 0 such that ||x − x̄|| < ε2, x ∈ S implies f(x) > f(x̄) − ε1.

Condition 2: For each sequence x1, x2, ..., xn, ... (xn ∈ S) converging to x̄,

lim inf_{n→∞} f(xn) ≥ f(x̄),

where lim inf_{n→∞} f(xn) is the infimum of the limit points of the sequence.

f(x) is lower semicontinuous on S if it is lower semicontinuous at each x̄ ∈ S.

Definition 2.2.9 (Upper semicontinuous function) f(x) is upper semicontinuous at x̄ if either of the following equivalent conditions holds:

Condition 1: For each ε1 > 0, there exists an ε2 > 0 such that ||x − x̄|| < ε2, x ∈ S implies f(x) < f(x̄) + ε1.

Condition 2: For each sequence x1, x2, ..., xn, ... (xn ∈ S) converging to x̄,

lim sup_{n→∞} f(xn) ≤ f(x̄),

where lim sup_{n→∞} f(xn) is the supremum of the limit points of the sequence.

f(x) is upper semicontinuous on S if it is upper semicontinuous at each x̄ ∈ S.

Remark 1 f(x) is lower semicontinuous at x̄ ∈ S if and only if −f(x) is upper semicontinuous at x̄ ∈ S.

Remark 2 f(x) is continuous at x̄ ∈ S if and only if it is both lower and upper semicontinuous at x̄ ∈ S.


    Figure 2.9: Lower and upper semicontinuous functions

Illustration 2.2.3 Consider the two functions f1(x) and f2(x) shown in Figure 2.9. f1(x) is lower semicontinuous at x = 2, while f2(x) is upper semicontinuous at x = 2. Hence, f1(x) is lower semicontinuous and f2(x) is upper semicontinuous.

Theorem 2.2.2
Let S be a nonempty convex set in R^n and let f(x) be a convex function. Then f(x) is continuous on the interior of S.

Remark 3 Convex and concave functions may not be continuous everywhere, but their points of discontinuity have to be on the boundary of S.

Theorem 2.2.3
Let {f_i(x)} be a family of lower (upper) semicontinuous functions on S. Then

(i) its least upper bound (greatest lower bound)

f(x) = sup_i f_i(x)    ( f(x) = inf_i f_i(x) )

is lower (upper) semicontinuous on S;

(ii) if the family {f_i(x)} is finite, its greatest lower bound (least upper bound)

f(x) = min_i f_i(x)    ( f(x) = max_i f_i(x) )

is lower (upper) semicontinuous on S.


    2.2.4 Directional Derivative and Subgradients

Definition 2.2.10 (Directional derivative) Let S be a nonempty convex set in R^n, x̄ ∈ S, and let y be a nonzero vector such that x̄ + λ y ∈ S for sufficiently small, strictly positive λ. The directional derivative of f(x) at the point x̄ along the direction y, denoted f′(x̄; y), is defined as the limit (with ±∞ included) of

f′(x̄; y) = lim_{λ→0⁺} [ f(x̄ + λ y) − f(x̄) ] / λ.

Definition 2.2.11 (Subgradient of a convex function) Let S be a nonempty convex set in R^n and let f(x) be a convex function. A subgradient of f(x) at x̄ ∈ S is a vector d such that

f(x) ≥ f(x̄) + dᵀ(x − x̄) for each x ∈ S.    (2.1)

Remark 1 The right-hand side of inequality (2.1) is a linear function of x and represents the first-order Taylor expansion of f(x) around x̄ using the vector d in place of the gradient of f(x) at x̄. Hence, d is a subgradient of f(x) at x̄ if and only if this first-order approximation always underestimates f(x) for all x ∈ S.

Illustration 2.2.4 Consider the convex function f(x) = x², the set S = {x | −2 ≤ x ≤ 2}, and the point x̄ = 1. Let us assume that d = 2. The right-hand side of (2.1) is

f(1) + 2 (x − 1) = 2x − 1,

and f(x) − (2x − 1) = x² − 2x + 1 = (x − 1)² ≥ 0. Hence (2.1) holds for d = 2, and d = 2 is a subgradient of f(x) at x̄ = 1 (see also Figure 2.10).
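Assuming, as reconstructed above, that the illustration uses f(x) = x², the subgradient inequality (2.1) can be verified on a grid over S:

    import numpy as np

    x = np.linspace(-2.0, 2.0, 401)         # the set S = [-2, 2]
    f = x ** 2
    underestimator = 1.0 + 2.0 * (x - 1.0)  # f(xbar) + d (x - xbar) = 2x - 1
    print(bool(np.all(f >= underestimator - 1e-12)))  # True: d = 2 is a subgradient at xbar = 1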

Definition 2.2.12 (Subgradient of a concave function) Let S be a nonempty convex set in R^n and let f(x) be a concave function. A subgradient of f(x) at x̄ ∈ S is a vector d such that

f(x) ≤ f(x̄) + dᵀ(x − x̄) for each x ∈ S.

Definition 2.2.13 (Subdifferential) The set of all subgradients of a function f(x) at x̄ ∈ S, denoted by ∂f(x̄), is the subdifferential of f(x) at x̄.

Theorem 2.2.4
Let S be a nonempty convex set in R^n. If, for all points x̄ ∈ int S, there exists a subgradient vector d such that

f(x) ≥ f(x̄) + dᵀ(x − x̄) for each x ∈ S,

then f(x) is convex on int S.


    Figure 2.10: Subgradient of a function

    2.2.5 Differentiable Convex and Concave Functions

Definition 2.2.14 (Differentiable function) Let S be a nonempty open set in R^n, let f(x) be a function defined on S, and let x̄ ∈ S. f(x) is differentiable at x̄ if there exist a gradient vector ∇f(x̄) and a function α(x̄; x − x̄) such that

f(x) = f(x̄) + ∇f(x̄)ᵀ(x − x̄) + ||x − x̄|| α(x̄; x − x̄),

where α(x̄; x − x̄) → 0 as x → x̄.

f(x) is twice differentiable at x̄ ∈ S if, in addition, there exist a Hessian matrix ∇²f(x̄) and a function β(x̄; Δx) such that

f(x̄ + Δx) = f(x̄) + ∇f(x̄)ᵀ Δx + ½ Δxᵀ ∇²f(x̄) Δx + ||Δx||² β(x̄; Δx),

where β(x̄; Δx) → 0 as Δx → 0, and ∇²f(x̄) is the Hessian of f(x) evaluated at x̄, that is, the n × n matrix whose (i, j)th element is ∂²f(x̄) / ∂x_i ∂x_j.

Remark 3 If ∇f(x) is differentiable at x̄ (i.e., it has continuous partial derivatives at x̄), then f(x) is twice differentiable at x̄.

Remark 4 If ∇²f(x) is continuous at x̄, then ∇²f(x̄) is symmetric.

Theorem 2.2.5
Let S be a nonempty open set in R^n, and let f(x) be differentiable at x̄ ∈ S.

(i) If f(x) is convex at x̄ ∈ S, then

f(x) ≥ f(x̄) + ∇f(x̄)ᵀ(x − x̄) for each x ∈ S.

(ii) If f(x) is concave at x̄ ∈ S, then

f(x) ≤ f(x̄) + ∇f(x̄)ᵀ(x − x̄) for each x ∈ S.

Theorem 2.2.6
Let S be a nonempty open convex set in R^n, and let f(x) be differentiable on S.

(i) f(x) is convex on S if and only if

f(x) ≥ f(x̄) + ∇f(x̄)ᵀ(x − x̄) for all x, x̄ ∈ S.

(ii) f(x) is concave on S if and only if

f(x) ≤ f(x̄) + ∇f(x̄)ᵀ(x − x̄) for all x, x̄ ∈ S.

Remark 5 The above two theorems can be directly extended to strictly convex and strictly concave functions by replacing the inequalities ≥ and ≤ with the strict inequalities > and <.


Figure 2.11: Differentiable functions and linearizations

Theorem 2.2.7
Let S be a nonempty open set in R^n, and let f(x) be twice differentiable at x̄ ∈ S.

(i) If f(x) is convex at x̄, then ∇²f(x̄) is positive semidefinite.

(ii) If f(x) is concave at x̄, then ∇²f(x̄) is negative semidefinite.

Theorem 2.2.8
Let S be a nonempty open convex set in R^n, and let f(x) be twice differentiable on S.

(i) f(x) is convex on S if and only if ∇²f(x) is positive semidefinite for all x ∈ S.

(ii) f(x) is concave on S if and only if ∇²f(x) is negative semidefinite for all x ∈ S.

Remark 6 (i) If f(x) is strictly convex at x̄, then ∇²f(x̄) is positive semidefinite (not necessarily positive definite).

(ii) If ∇²f(x̄) is positive definite, then f(x) is strictly convex at x̄.

(iii) If f(x) is strictly concave at x̄, then ∇²f(x̄) is negative semidefinite (not necessarily negative definite).

(iv) If ∇²f(x̄) is negative definite, then f(x) is strictly concave at x̄.


Remark 7 Theorem 2.2.8 provides the conditions for checking the convexity or concavity of a function f(x). These conditions correspond to a positive semidefinite (PSD) or negative semidefinite (NSD) Hessian of f(x) for all x ∈ S, respectively. One test for a PSD or NSD Hessian of f(x) is based on the signs of the eigenvalues of the Hessian. If all eigenvalues are greater than or equal to zero for all x ∈ S, then the Hessian is PSD and hence the function f(x) is convex. If all eigenvalues are less than or equal to zero for all x ∈ S, then the Hessian is NSD and therefore the function f(x) is concave.

Illustration 2.2.6 Consider a function f(x1, x2, x3). The eigenvalues λ of its Hessian ∇²f(x) are calculated from

det( ∇²f(x) − λ I ) = 0.

After algebraic manipulation, the determinant condition implies that all three eigenvalues are greater than or equal to zero. Therefore, the function f(x1, x2, x3) is convex.
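The specific function of Illustration 2.2.6 is not recoverable from this transcript, so the sketch below applies the eigenvalue test of Remark 7 to a hypothetical quadratic, f(x) = x1² + x1 x2 + 2 x2² + 3 x3², whose Hessian is constant:

    import numpy as np

    # Constant Hessian of the sample quadratic above.
    H = np.array([[2.0, 1.0, 0.0],
                  [1.0, 4.0, 0.0],
                  [0.0, 0.0, 6.0]])
    eigenvalues = np.linalg.eigvalsh(H)  # eigvalsh is intended for symmetric matrices
    print(eigenvalues, bool(np.all(eigenvalues >= 0)))  # all >= 0 => PSD => convex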

2.2.6 Minimum (Infimum) and Maximum (Supremum)

Definition 2.2.16 (Minimum) Let f(x) be a function defined on the set S. If there exists x* ∈ S such that

f(x) ≥ f(x*) for each x ∈ S,

then f(x*) is called the minimum of f(x) on S, denoted by

f(x*) = min_{x ∈ S} f(x).

Definition 2.2.17 (Infimum) Let f(x) be a function defined on the set S. If there exists a number α such that

(i) x ∈ S implies f(x) ≥ α, and

(ii) for sufficiently small ε > 0 there exists x ∈ S such that f(x) < α + ε,

then α is the infimum of f(x) on S, denoted by

α = inf_{x ∈ S} f(x).

Definition 2.2.18 (Maximum) Let f(x) be a function defined on the set S. If there exists x* ∈ S such that

f(x) ≤ f(x*) for each x ∈ S,

then f(x*) is called the maximum of f(x) on S, denoted by

f(x*) = max_{x ∈ S} f(x).

Definition 2.2.19 (Supremum) Let f(x) be a function defined on the set S. If there exists a number β such that

(i) x ∈ S implies f(x) ≤ β, and

(ii) for sufficiently small ε > 0 there exists x ∈ S such that f(x) > β − ε,

then β is the supremum of f(x) on S, denoted by

β = sup_{x ∈ S} f(x).

Remark 1 If we admit the values ±∞, then every function f(x) has a supremum and an infimum on the set S.

Remark 2 The minimum (maximum) of a function f(x), if it exists, must be finite and is an attained infimum (supremum); that is,

min_{x ∈ S} f(x) = inf_{x ∈ S} f(x) and max_{x ∈ S} f(x) = sup_{x ∈ S} f(x).

Remark 3 Not all functions have a minimum (maximum). For instance, e^x has no maximum on R, and e^{−x} has no minimum on R. However, e^x has a supremum of +∞ on R, and e^{−x} has an infimum of 0 on R.


Theorem 2.2.9 (Existence of a minimum (maximum))
A function f(x) defined on a set S in R^n attains its minimum (maximum) at some x* ∈ S if

(i) f(x) is lower (upper) semicontinuous on S, and

(ii) S is closed and bounded.

Illustration 2.2.7 Consider a function f(x) defined on the closed and bounded set −1 ≤ x ≤ 1 that is not lower semicontinuous. The infimum exists, but no minimum exists since f(x) is not lower semicontinuous.

Illustration 2.2.8 Consider a function f(x) on the open and bounded set −1 < x < 1, x ∈ R. The infimum exists, but no minimum exists because the set is not closed.

Illustration 2.2.9 Consider a function f(x) on an unbounded set. The supremum exists, but no maximum exists since the set is unbounded.

    2.2.7 Feasible Solution, Local and Global Minimum

Consider the problem of minimizing f(x) subject to x ∈ S.

Definition 2.2.20 (Feasible solution) A point x ∈ S is a feasible solution of this problem.

Definition 2.2.21 (Local minimum) Suppose that x* ∈ S and that there exists an ε > 0 such that

f(x) ≥ f(x*) for each x ∈ S with ||x − x*|| < ε;

then x* is a local minimum.

Definition 2.2.22 (Global minimum) Suppose that x* ∈ S and

f(x) ≥ f(x*) for each x ∈ S;

then x* is a global minimum.


Definition 2.2.23 (Unique global minimum) Suppose that x* ∈ S and

f(x) > f(x*) for each x ∈ S, x ≠ x*;

then x* is the unique global minimum.

2.3 Generalizations of Convex and Concave Functions

2.3.1 Quasi-convex and Quasi-concave Functions

Definition 2.3.1 (Quasi-convex function) Let S be a convex subset of R^n and let f(x) be defined on S. f(x) is quasi-convex if

f(λ x1 + (1 − λ) x2) ≤ max { f(x1), f(x2) }

for all x1, x2 ∈ S and 0 ≤ λ ≤ 1.

Definition 2.3.2 (Quasi-concave function) f(x) is quasi-concave if −f(x) is quasi-convex; that is, if

f(λ x1 + (1 − λ) x2) ≥ min { f(x1), f(x2) }

for all x1, x2 ∈ S and 0 ≤ λ ≤ 1.

Figure 2.12: Quasi-convex and quasi-concave functions

Theorem 2.3.1
Consider the function f(x) on a convex set S ⊆ R^n, and define the level sets

S_α = { x ∈ S : f(x) ≤ α } and S_β = { x ∈ S : f(x) ≥ β }.

Then,

(i) f(x) is quasi-convex on S if and only if S_α is convex for each α ∈ R;

(ii) f(x) is quasi-concave on S if and only if S_β is convex for each β ∈ R.

Definition 2.3.3 (Strictly quasi-convex function) f(x) is strictly quasi-convex if

f(λ x1 + (1 − λ) x2) < max { f(x1), f(x2) }

for all λ ∈ (0, 1) and all x1, x2 ∈ S with f(x1) ≠ f(x2).

Definition 2.3.4 (Strictly quasi-concave function) f(x) is strictly quasi-concave if

f(λ x1 + (1 − λ) x2) > min { f(x1), f(x2) }

for all λ ∈ (0, 1) and all x1, x2 ∈ S with f(x1) ≠ f(x2). Note that f(x) is strictly quasi-concave if −f(x) is strictly quasi-convex.
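A numeric spot-check of the quasi-convexity definition, using f(x) = sqrt(|x|) — a function that is quasi-convex (its level sets are intervals) but not convex; illustrative only, not from the book:

    import numpy as np

    rng = np.random.default_rng(1)
    f = lambda x: np.sqrt(np.abs(x))
    x1, x2 = rng.uniform(-4, 4, 1000), rng.uniform(-4, 4, 1000)
    lam = rng.uniform(0, 1, 1000)
    lhs = f(lam * x1 + (1 - lam) * x2)
    print(bool(np.all(lhs <= np.maximum(f(x1), f(x2)) + 1e-12)))  # True: quasi-convex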

Illustration 2.3.2 Figure 2.13 shows a strictly quasi-convex and a strictly quasi-concave function.

Theorem 2.3.2
Let f(x) be a lower (upper) semicontinuous function on the convex set S in R^n. If f(x) is strictly quasi-convex (strictly quasi-concave) on S, then f(x) is quasi-convex (quasi-concave) on S; the converse is not true.


    Figure 2.13: Strictly quasi-convex and quasi-concave functions

Theorem 2.3.3
Let f(x) be a function on the convex set S in R^n, and let x* ∈ S be a local minimum (maximum) of f(x). If f(x) is strictly quasi-convex (strictly quasi-concave) on S, then f(x*) is a global minimum (maximum) of f(x) on S.

    2.3.2 Properties of Quasi-convex and Quasi-concave Functions

    Quasi-convex and quasi-concave functions satisfy the following properties:

(i) Let f(x) be a quasi-convex (quasi-concave) function on a subset S of R^n and let g(y) be a nondecreasing function defined on the range of f(x) in R. Then the composite function g(f(x)) is quasi-convex (quasi-concave) on S.

(ii) If f(x) is either a positive or a negative quasi-concave function on a subset S of R^n, then 1/f(x) is quasi-convex on S.

(iii) If f(x) is either a positive or a negative quasi-convex function on a subset S of R^n, then 1/f(x) is quasi-concave on S.

Remark 1 Note that the sum of quasi-convex functions is not necessarily a quasi-convex function, in contrast to the case of convex functions. Also note that the sum of a convex function and a quasi-convex function is not necessarily a convex or a quasi-convex function.

Remark 2 Convex and concave functions do not satisfy properties (ii) and (iii) of the quasi-convex and quasi-concave functions. For instance, it is true that the reciprocal of a positive concave function is convex, but the converse does not hold. As an example, consider the function f(x) = eˣ, which is convex and whose reciprocal is also convex.
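The counterexample in Remark 2 is easy to verify numerically; the following sketch (an addition to the text, with an arbitrary grid and interval) checks discrete second differences of both eˣ and its reciprocal e⁻ˣ and finds them positive, confirming that both are convex:

```python
import numpy as np

# Remark 2, numerically: exp(x) is convex AND its reciprocal exp(-x) is
# convex too, so "reciprocal of a convex function is concave" is not a
# valid rule (unlike property (iii) for quasi-convex functions).
x = np.linspace(-2.0, 2.0, 401)
for g in (np.exp(x), np.exp(-x)):
    # positive discrete second differences indicate convexity on the grid
    second_diff = g[2:] - 2.0 * g[1:-1] + g[:-2]
    assert (second_diff > 0).all()
print("exp(x) and exp(-x) are both convex on [-2, 2]")
```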


2.3.3 Differentiable Quasi-convex, Quasi-concave Functions

Theorem 2.3.4
Let f(x) be differentiable on a nonempty open convex set S in ℝⁿ. Then, f(x) is quasi-convex if and only if for every x₁, x₂ ∈ S,

$$f(x_2) \le f(x_1) \implies \nabla f(x_1)^T (x_2 - x_1) \le 0.$$

Similarly, f(x) is quasi-concave if and only if for every x₁, x₂ ∈ S,

$$f(x_2) \ge f(x_1) \implies \nabla f(x_1)^T (x_2 - x_1) \ge 0.$$

For twice differentiable quasi-concave functions f(x) on the open, nonempty convex set S in ℝⁿ, a direction z orthogonal to ∇f(x) exhibits the following interesting properties:

(i) zᵀH(x)z ≤ 0 for every x ∈ S and every z satisfying zᵀ∇f(x) = 0, and
(ii) the Hessian of f(x) has at most one positive eigenvalue at every x ∈ S.

Remark 1 From property (ii) we observe that the generalization of concavity to quasi-concavity is equivalent to allowing the existence of at most one positive eigenvalue of the Hessian.
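As a concrete check of property (ii) (a sketch added here, using the standard example f(x₁, x₂) = x₁x₂, which is quasi-concave on the positive orthant without being concave):

```python
import numpy as np

# f(x1, x2) = x1*x2 is quasi-concave on the positive orthant but not
# concave: its (constant) Hessian [[0, 1], [1, 0]] has eigenvalues -1 and
# +1, i.e., exactly one positive eigenvalue -- allowed by property (ii).
H = np.array([[0.0, 1.0],
              [1.0, 0.0]])
eigvals = np.linalg.eigvalsh(H)
print("eigenvalues:", eigvals)                                       # [-1.  1.]
print("number of positive eigenvalues:", int((eigvals > 0).sum()))   # 1
```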

    2.3.4 Pseudo-convex and Pseudo-concave Functions

Let S be a nonempty open set in ℝⁿ and let f(x) be a differentiable function on S.

Definition 2.3.5 (Pseudo-convex function) f(x) is pseudo-convex if for every x₁, x₂ ∈ S,

$$\nabla f(x_1)^T (x_2 - x_1) \ge 0$$

implies that

$$f(x_2) \ge f(x_1).$$

Definition 2.3.6 (Pseudo-concave function) f(x) is pseudo-concave if for every x₁, x₂ ∈ S,

$$\nabla f(x_1)^T (x_2 - x_1) \le 0$$

implies that

$$f(x_2) \le f(x_1).$$

Note that f(x) is pseudo-concave if −f(x) is pseudo-convex.

    2.3.5 Properties of Pseudo-convex and Pseudo-concave Functions

    Pseudo-convex and pseudo-concave functions exhibit the following properties:

(i) Let f(x) be a pseudo-convex (pseudo-concave) function on a subset S of ℝⁿ and g(y) be a differentiable function defined on the range of f(x) in ℝ which satisfies g′(y) > 0. Then the composite function g(f(x)) is pseudo-convex (pseudo-concave) on S.

(ii) If f(x) is a positive or negative pseudo-concave function on a subset S of ℝⁿ, then 1/f(x) is pseudo-convex on S.



    2.3.6 Relationships among Convex, Quasi-convex and Pseudo-convex Functions

The relationships among convex, quasi-convex and pseudo-convex functions are summarized in the following:

(i) A convex differentiable function is pseudo-convex,
(ii) a convex function is strictly quasi-convex,
(iii) a convex function is quasi-convex,
(iv) a pseudo-convex function is strictly quasi-convex, and
(v) a strictly quasi-convex function which is lower semicontinuous is quasi-convex.
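Relationship (i) can be probed numerically; in the sketch below (an addition, with an arbitrarily chosen convex function and sampling ranges), random pairs of points never violate the pseudo-convexity implication for the convex differentiable function f(x) = x² + eˣ:

```python
import numpy as np

# Relationship (i): a convex differentiable function is pseudo-convex.
# Sampled check of the implication  f'(x1)*(x2 - x1) >= 0  =>  f(x2) >= f(x1)
# for the convex function f(x) = x**2 + exp(x).
f  = lambda x: x**2 + np.exp(x)
df = lambda x: 2.0 * x + np.exp(x)

rng = np.random.default_rng(1)
for _ in range(10_000):
    x1, x2 = rng.uniform(-3.0, 3.0, size=2)
    if df(x1) * (x2 - x1) >= 0.0:
        assert f(x2) >= f(x1) - 1e-12
print("no violations of the pseudo-convexity implication found")
```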

    Summary and Further Reading

In this chapter, the basic elements of convex analysis are introduced. Section 2.1 presents the definitions and properties of convex sets, the definitions of convex combination and convex hull along with the important theorem of Carathéodory, and key results on the separation and support of convex sets. Further reading on the subject of convex sets can be found in the excellent books of Avriel (1976), Bazaraa et al. (1993), Mangasarian (1969), and Rockafellar (1970).

Section 2.2 discusses the definitions and properties of convex and concave functions, the definitions of continuity and of lower and upper semicontinuity of functions, the definitions of subgradients of convex and concave functions, the definitions and properties of differentiable convex and concave functions, the conditions of convexity and concavity along with their associated tests, and the definitions of extremum points. For further reading, refer to Avriel (1976), Mangasarian (1969), and Rockafellar (1970).

Section 2.3 focuses on the generalizations of convex and concave functions and treats the quasi-convex, quasi-concave, pseudo-convex and pseudo-concave functions and their properties. Further reading on this subject can be found in the excellent book of Avriel et al. (1988).


    Problems on Convex Analysis

    1. Show that the interior of a convex set is convex.

2. Show that an open ball and a closed ball around a point x ∈ ℝⁿ are convex sets.

    3. Show that the function

    is strictly convex.

    4. Show that the function

where the aᵢ represent fixed vectors in ℝⁿ and the cᵢ are positive real numbers, is convex.

5. Show that the function

is strictly convex for x₁, x₂ strictly positive.

    6. Determine whether the function

    is convex, concave, or neither.

7. Prove properties (iii), (iv), and (v) of convex and concave functions.

    8. Show that the function

    9. Show that the function

with fixed values of the parameters wᵢ > 0, αᵢ, βᵢ, is convex.

10. Determine whether the function

with α₁, β₁, α₂, β₂ fixed values of the parameters, is convex, concave, or neither.


    11. Determine whether the function

    is quasi-concave.

    12. Show that the function

    with

    13. Show that the function

with x₁ > 0, x₂ > 0 is quasi-convex. Is it also convex? Why?

    14. Determine whether the function

with x₁ > 0, x₂ > 0 is quasi-convex, quasi-concave, convex, or none of the above.

    15. Show that the function

with x₁ > 0, x₂ > 0 is quasi-concave.

16. Consider the quadratic function

where Q is an n × n matrix and c ∈ ℝⁿ.

Show that f(x) is strictly pseudo-convex on ℝ₊ⁿ, and

show that f(x) is pseudo-concave on ℝ₊ⁿ.

17. Consider the function

with x₁ > 0, x₂ > 0. Show that f(x₁, x₂) is strictly quasi-concave.

18. Let f(x) be a differentiable function on an open convex set S of ℝⁿ. Prove that it is concave if and only if

$$f(x_2) \le f(x_1) + \nabla f(x_1)^T (x_2 - x_1)$$

for every two points x₁, x₂ ∈ S. Show also that the function

is both pseudo-convex and pseudo-concave.


19. Let f₁(x), f₂(x) be functions defined on a convex set S ⊆ ℝⁿ with f₂(x) ≠ 0 on S. Show:

(i) If f₁(x) is convex, f₁(x) ≤ 0, f₂(x) is concave, and f₂(x) > 0, then f₁(x)/f₂(x) is quasi-convex on S.

(ii) If f₁(x) is convex, f₁(x) ≤ 0, f₂(x) is convex, and f₂(x) > 0, then f₂(x)/f₁(x) is quasi-concave on S.

20. What additional conditions are needed in Problem 19 so as to have pseudo-convexity in (i) and pseudo-concavity in (ii)?

21. Consider the function

with x > 0 and y > 0. Find the conditions on a and b for which the function f(x, y) is convex (concave).

• Chapter 3 Fundamentals of Nonlinear Optimization

This chapter discusses the fundamentals of nonlinear optimization. Section 3.1 focuses on optimality conditions for unconstrained nonlinear optimization. Section 3.2 presents the first-order and second-order optimality conditions for constrained nonlinear optimization problems.

    3.1 Unconstrained Nonlinear Optimization

This section presents the formulation and basic definitions of unconstrained nonlinear optimization along with the necessary, sufficient, and necessary and sufficient optimality conditions.

    3.1.1 Formulation and Definitions

An unconstrained nonlinear optimization problem deals with the search for a minimum of a nonlinear function f(x) of n real variables x = (x₁, x₂, ..., xₙ) and is denoted as

$$\min_{x \in \mathbb{R}^n} f(x) \qquad (3.1)$$

Each of the n variables x₁, x₂, ..., xₙ is allowed to take any value from −∞ to +∞. Unconstrained nonlinear optimization problems arise in several science and engineering applications, ranging from the simultaneous solution of nonlinear equations (e.g., chemical phase equilibrium) to parameter estimation and identification problems (e.g., nonlinear least squares).

Definition 3.1.1 (Local Minimum) x* ∈ ℝⁿ is called a local minimum of (3.1) if there exists a ball of radius ε around x*, B_ε(x*), such that

$$f(x^*) \le f(x) \quad \text{for all } x \in B_\varepsilon(x^*).$$

Definition 3.1.2 (Global Minimum) x* ∈ ℝⁿ is called a global minimum of (3.1) if

$$f(x^*) \le f(x) \quad \text{for all } x \in \mathbb{R}^n.$$



    Figure 3.1: Local minimum, global minimum and saddle points

    A global minimum is unique if the strict form of the inequality holds.

Definition 3.1.3 (Saddle Point) Let the vector x be partitioned into two subvectors x_a and x_b. A point (x_a*, x_b*) is called a saddle point of f(x_a, x_b) if there exists a ball of radius ε around (x_a*, x_b*) such that

$$f(x_a^*, x_b) \le f(x_a^*, x_b^*) \le f(x_a, x_b^*)$$

for all (x_a, x_b) in the ball.

Illustration 3.1.1 Figure 3.1 shows a local minimum, a unique global minimum, a nonunique global minimum, and a saddle point.

    3.1.2 Necessary Optimality Conditions

The necessary optimality conditions are the conditions that must hold at any minimum of a problem.


Theorem 3.1.1 (First-order necessary conditions)
Let f(x) be a differentiable function in ℝⁿ at x*. If x* is a local minimum, then

$$\nabla f(x^*) = 0. \qquad (3.2)$$

Note: A point x* satisfying (3.2) is called a stationary point.

Theorem 3.1.2 (Second-order necessary conditions)
Let f(x) be a twice differentiable function in ℝⁿ at x*. If x* is a local minimum, then

(i) ∇f(x*) = 0, and
(ii) the Hessian matrix H(x*), given by H(x*) = ∇²f(x*), is positive semidefinite; that is,

$$y^T H(x^*)\, y \ge 0 \quad \text{for all } y \in \mathbb{R}^n.$$

Illustration 3.1.2 Consider the unconstrained quadratic problem

$$\min_x \; f(x) = c^T x + \tfrac{1}{2}\, x^T Q x.$$

The first-order necessary conditions are

$$\nabla f(x) = c + Qx = 0.$$

The second-order necessary conditions are that the Hessian matrix Q must be positive semidefinite.
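A small numerical companion to Illustration 3.1.2 (a sketch; the particular Q and c below are assumed data, not from the text) solves the stationarity system and checks the second-order condition:

```python
import numpy as np

# Unconstrained quadratic: f(x) = c'x + 0.5 x'Qx with assumed data.
Q = np.array([[4.0, 1.0],
              [1.0, 3.0]])
c = np.array([-1.0, -2.0])

# First-order necessary condition: grad f = c + Qx = 0.
x_star = np.linalg.solve(Q, -c)

# Second-order condition: Q positive semidefinite (here positive definite).
assert (np.linalg.eigvalsh(Q) >= 0.0).all()

print("stationary point:", x_star)
print("gradient at x*:", c + Q @ x_star)   # ~ [0, 0]
```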

    3.1.3 Sufficient Optimality Conditions

    The sufficient optimality conditions are the conditions which, if satisfied at a point, guarantee thatthe point is a minimum.

Theorem 3.1.3
Let f(x) be twice differentiable in ℝⁿ at x*. If

(i) ∇f(x*) = 0, and
(ii) H(x) is positive semidefinite for all x in a ball B_ε(x*) around x*,

then x* is a local minimum.

Remark 1 If condition (ii) is replaced by the condition that H(x*) is positive definite, then x* is a strict local minimum.


Illustration 3.1.3 Consider the function

The stationary points are

The Hessian is

which is positive definite, and hence (x₁*, x₂*) is a strict local minimum. However, at the other stationary point the Hessian is

which is not positive semidefinite.
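As a sketch of this classification procedure (using an assumed stand-in function, f(x₁, x₂) = x₁³ − 3x₁ + x₂², not the book's example), the eigenvalues of the Hessian separate the strict local minimum from the saddle point:

```python
import numpy as np

# Assumed example: f(x1, x2) = x1**3 - 3*x1 + x2**2.
# grad f = (3*x1**2 - 3, 2*x2) = 0  =>  stationary points (1, 0) and (-1, 0).
def hessian(x1, x2):
    # Hessian of the assumed f; it does not depend on x2 here.
    return np.array([[6.0 * x1, 0.0],
                     [0.0,      2.0]])

for point in [(1.0, 0.0), (-1.0, 0.0)]:
    eig = np.linalg.eigvalsh(hessian(*point))
    if (eig > 0).all():
        verdict = "positive definite -> strict local minimum"
    else:
        verdict = "not positive semidefinite -> saddle point (for this example)"
    print(point, "eigenvalues:", eig, "->", verdict)
```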

    3.1.4 Necessary and Sufficient Optimality Conditions

Theorem 3.1.4
Let f(x) be pseudo-convex in ℝⁿ. Then, x* is a global minimum if and only if

$$\nabla f(x^*) = 0.$$

Illustration 3.1.4 Consider the function

The stationary points are

The Hessian is

which is positive definite (the eigenvalues λ₁ = 4 and λ₂ = 12 are positive everywhere). As a result, f(x₁, x₂) is convex and hence pseudo-convex. Thus, the stationary point (1, 0.25) is a global minimum.



Remark 1 Necessary and sufficient optimality conditions can also be expressed in terms of higher-order derivatives, assuming that the function f(x) has such higher-order derivatives. For instance, for a univariate function f(x) possessing derivatives of sufficiently high order, x* ∈ ℝ is a local minimum of f(x) if and only if either f⁽ᵏ⁾(x*) = 0 for all k = 1, 2, ..., or else the first nonvanishing derivative at x* is of even order and is positive, where f⁽ᵏ⁾ denotes the kth-order derivative of f(x).

    3.2 Constrained Nonlinear Optimization

This section first presents the formulation and basic definitions of constrained nonlinear optimization problems and introduces the Lagrange function and the Lagrange multipliers along with their interpretation. Subsequently, the Fritz John first-order necessary optimality conditions are discussed, as well as the need for first-order constraint qualifications. Finally, the necessary and sufficient Karush-Kuhn-Tucker conditions are introduced along with the saddle point necessary and sufficient optimality conditions.

    3.2.1 Formulation and Definitions

A constrained nonlinear programming problem deals with the search for a minimum of a function f(x) of n real variables x = (x₁, x₂, ..., xₙ) ∈ X ⊆ ℝⁿ subject to a set of equality constraints h(x) = 0 (hᵢ(x) = 0, i = 1, 2, ..., m) and a set of inequality constraints g(x) ≤ 0 (gⱼ(x) ≤ 0, j = 1, 2, ..., p), and is denoted as

$$\begin{aligned} \min_{x} \quad & f(x) \\ \text{subject to} \quad & h(x) = 0 \\ & g(x) \le 0 \\ & x \in X \subseteq \mathbb{R}^n \end{aligned} \qquad (3.3)$$

If any of the functions f(x), h(x), g(x) is nonlinear, then formulation (3.3) is called a constrained nonlinear programming problem. The functions f(x), h(x), g(x) can take any form of nonlinearity, and we will assume that they satisfy continuity and differentiability requirements.

Constrained nonlinear programming problems abound in a very large number of science and engineering areas such as chemical process design, synthesis, and control; facility location; network design; electronic circuit design; and the thermodynamics of atomic/molecular clusters.

Definition 3.2.1 (Feasible Point(s)) A point x ∈ X satisfying the equality and inequality constraints is called a feasible point. Thus, the set of all feasible points is defined as

$$F = \{\, x \in X : h(x) = 0,\; g(x) \le 0 \,\}.$$



Definition 3.2.2 (Active, inactive constraints) An inequality constraint gⱼ(x) is called active at a feasible point x̄ ∈ X if gⱼ(x̄) = 0. An inequality constraint gⱼ(x) is called inactive at a feasible point x̄ ∈ X if gⱼ(x̄) < 0.

Remark 1 The constraints that are active at a feasible point x̄ restrict the feasibility domain, while the inactive constraints do not impose any restrictions on feasibility in the neighborhood of x̄, defined as a ball of radius ε around x̄.

Definition 3.2.3 (Local minimum) x* ∈ F is a local minimum of (3.3) if there exists a ball of radius ε around x*, B_ε(x*), such that

$$f(x^*) \le f(x) \quad \text{for all } x \in B_\varepsilon(x^*) \cap F.$$

Definition 3.2.4 (Global minimum) x* ∈ F is a global minimum of (3.3) if

$$f(x^*) \le f(x) \quad \text{for all } x \in F.$$

Definition 3.2.5 (Feasible direction vector) Let x̄ ∈ F be a feasible point. A nonzero vector d is called a feasible direction vector from x̄ if there exists a δ > 0 such that

$$\bar{x} + \lambda d \in F \quad \text{for all } \lambda \in (0, \delta).$$

The set of feasible direction vectors d ≠ 0 from x̄ is called the cone of feasible directions of F at x̄.

Illustration 3.2.1 Figure 3.2 shows the feasible region, a point x̄, and feasible direction vectors from x̄.

Remark 2 If x̄ is a local minimum of (3.3) and d is a feasible direction vector from x̄, then for sufficiently small λ > 0 we must have

$$f(\bar{x} + \lambda d) \ge f(\bar{x}).$$

Lemma 3.2.1 Let d be a nonzero feasible direction vector from x̄. Then, d must satisfy the conditions:

$$\nabla h_i(\bar{x})^T d = 0 \quad \text{for each } i = 1, 2, \ldots, m, \qquad \nabla g_j(\bar{x})^T d \le 0 \quad \text{for each active } g_j \text{ at } \bar{x}.$$

Definition 3.2.6 (Improving feasible direction vector) A feasible direction vector d ≠ 0 at x̄ is called an improving feasible direction vector at x̄ if

$$f(\bar{x} + \lambda d) < f(\bar{x}) \quad \text{for all sufficiently small } \lambda > 0.$$

The set of improving feasible direction vectors d ≠ 0 from x̄ is called the cone of improving feasible directions of F at x̄.

Remark 3 If d ≠ 0 and dᵀ∇f(x̄) < 0, then d is an improving feasible direction vector at x̄.



    Figure 3.2: Feasible region and feasible direction vectors

    3.2.2 Lagrange Functions and Multipliers

A key idea in developing necessary and sufficient optimality conditions for nonlinear constrained optimization problems is to transform them into unconstrained problems and apply the optimality conditions discussed in Section 3.1 for the determination of the stationary points of the unconstrained function. One such transformation involves the introduction of an auxiliary function, called the Lagrange function L(x, λ, μ), defined as

$$L(x, \lambda, \mu) = f(x) + \lambda^T h(x) + \mu^T g(x) \qquad (3.4)$$

where λᵀ = (λ₁, λ₂, ..., λ_m) and μᵀ = (μ₁, μ₂, ..., μ_p) are the Lagrange multipliers associated with the equality and inequality constraints, respectively. The multipliers λ associated with the equalities h(x) = 0 are unrestricted in sign, while the multipliers μ associated with the inequalities g(x) ≤ 0 must be nonnegative.

The transformed unconstrained problem then becomes that of finding the stationary points of the Lagrange function.
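As a minimal sketch of this transformation (an example with assumed data, not one of the book's illustrations), the following symbolic computation forms the Lagrange function of a small equality-constrained problem and solves for its stationary point:

```python
import sympy as sp

# Assumed example: minimize f = x1**2 + x2**2 subject to h = x1 + x2 - 2 = 0.
x1, x2, lam = sp.symbols('x1 x2 lam')
f = x1**2 + x2**2
h = x1 + x2 - 2
L = f + lam * h                      # Lagrange function L(x, lambda)

# Stationarity of L in all variables (x1, x2, lambda) recovers both the
# gradient condition and feasibility of the equality constraint.
stationary = sp.solve([sp.diff(L, v) for v in (x1, x2, lam)], [x1, x2, lam])
print(stationary)                    # {x1: 1, x2: 1, lam: -2}
```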


Remark 1 The implications of transforming the constrained problem (3.3) into finding the stationary points of the Lagrange function are two-fold: (i) the number of variables has increased from n (i.e., the x variables) to n + m + p (i.e., the x, λ, and μ variables); and (ii) we need to establish the relation between problem (3.3) and the minimization of the Lagrange function with respect to x for fixed values of the Lagrange multipliers. This will be discussed in the duality theory chapter. Note also that we need to identify which of the stationary points of the Lagrange function correspond to the minimum of (3.3).

Illustration 3.2.2 Consider the following two-variable constrained nonlinear programming problem in the form (3.3):

The Lagrange function is

and has four variables: x₁, x₂, and the multipliers λ and μ.

Illustration 3.2.3 Consider the following quadratic programming problem:

where A, B are (m × n) and (p × n) matrices, respectively, and Q is the Hessian matrix. The Lagrange function is

    3.2.3 Interpretation of Lagrange Multipliers

The Lagrange multipliers in a constrained nonlinear optimization problem have an interpretation similar to that of the dual variables or shadow prices in linear programming. To provide such an interpretation, we will consider problem (3.3) with only equality constraints; that is,

$$\begin{aligned} \min_x \quad & f(x) \\ \text{subject to} \quad & h(x) = 0 \end{aligned} \qquad (3.6)$$


Let x̄ be a global minimum of (3.6) at which the gradients of the equality constraints are linearly independent (i.e., x̄ is a regular point). Perturbing the right-hand sides of the equality constraints, we have

$$\begin{aligned} \min_x \quad & f(x) \\ \text{subject to} \quad & h(x) = b \end{aligned} \qquad (3.8)$$

where b = (b₁, b₂, ..., b_m) is the perturbation vector. If the perturbation vector changes, then the optimal solution of (3.8) and its multipliers will change, since in general x̄ = x̄(b) and λ̄ = λ̄(b). Then, the Lagrange function takes the form

$$L(x, \lambda; b) = f(x) + \lambda^T \left( h(x) - b \right).$$

Let us assume that the stationary point of L corresponds to the global minimum. Then,

$$f(\bar{x}(b)) = L(\bar{x}(b), \bar{\lambda}(b); b).$$

Taking the gradient of the Lagrange function with respect to the perturbation vector b and rearranging the terms, we obtain

$$\nabla_b L = \nabla_b \bar{x}\,(\nabla_x L) + \nabla_b \bar{\lambda}\,(\nabla_\lambda L) - \bar{\lambda}.$$

Note that the terms within the first and second parentheses correspond to the gradients of the Lagrange function with respect to x and λ, respectively, and hence they are equal to zero due to the necessary conditions ∇ₓL = ∇_λL = 0. Then we have

$$\nabla_b L = -\bar{\lambda}.$$

Since x̄ is a global minimum, we then have

$$\nabla_b f(\bar{x}(b)) = -\bar{\lambda}.$$

Therefore, the Lagrange multipliers λ provide information on the sensitivity of the objective function with respect to the perturbation vector b at the optimum point x̄.

Illustration 3.2.4 Consider the following convex quadratic problem subject to a linear equality constraint:

    where


We consider a perturbation of the right-hand side of the equality constraint:

The Lagrange function takes the form

The gradients of the Lagrange function with respect to x₁, x₂, and λ are

Then, the stationary point of the Lagrange function is

The sensitivity of the objective function with respect to the perturbation b is

which implies that (i) for b > 0, an increase in b would result in an increase of the objective function; (ii) for b < 0, an increase in b would represent a decrease of the objective function; and (iii) for b = 0, we have λ = 0 and hence the solution of the constrained problem is identical to the unconstrained one, (x₁, x₂) = (5, 5).
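The sensitivity result ∇_b f = −λ̄ is easy to reproduce numerically. The sketch below uses assumed data (f = (x₁ − 5)² + (x₂ − 5)² with the perturbed constraint x₁ + x₂ = 10 + b, not the book's lost example), chosen so that the unconstrained and the unperturbed constrained solutions coincide at (5, 5) as in the illustration:

```python
import numpy as np

# Assumed data: f = (x1-5)**2 + (x2-5)**2,  constraint  x1 + x2 = 10 + b.
# Analytically: x*(b) = (5 + b/2, 5 + b/2), lambda(b) = -b, f*(b) = b**2/2,
# so df*/db = b = -lambda, matching the derivation above.
def f_star(b):
    x = 5.0 + b / 2.0                       # optimal x1 = x2 for this problem
    return 2.0 * (x - 5.0) ** 2

b = 1.0
lam = -b                                    # multiplier at the optimum
eps = 1e-6
dfdb = (f_star(b + eps) - f_star(b - eps)) / (2.0 * eps)   # central difference
print("df*/db =", dfdb, "   -lambda =", -lam)              # both ~ 1.0
```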

    3.2.4 Existence of Lagrange Multipliers

The existence of Lagrange multipliers depends on the form of the constraints and is not always guaranteed. To illustrate instances in which the Lagrange multipliers may not have finite values (i.e., may not exist), we will study problem (3.6), and we will assume that we have identified a candidate optimum point x′ which satisfies the equality constraints; that is, h(x′) = 0.

The Lagrange function is

$$L(x, \lambda) = f(x) + \lambda^T h(x)$$


Figure 3.3: Feasible region and objective of Illustration 3.2.5

and the stationary points of the Lagrange function are obtained from

$$\nabla_x L = \nabla f(x) + \nabla h(x)^T \lambda = 0, \qquad \nabla_\lambda L = h(x) = 0.$$

Then, at the candidate point x′, we have

$$\nabla f(x') + \nabla h(x')^T \lambda = 0.$$

To calculate λ, we need the matrix ∇h(x′) to be of full rank (i.e., m), since we have to take its inverse. Hence, if ∇h(x′) is of full rank (i.e., m), then the Lagrange multipliers have finite values.

Illustration 3.2.5 Consider the minimization of a squared distance subject to one equality constraint:

and let x′ = (0, 2) be the candidate optimum point (see Figure 3.3). The Lagrange function is

Then the rank of ∇h(x′) is not full at x′.


Thus, the Lagrange multiplier λ cannot take a finite value (i.e., it does not exist). We can also illustrate this by taking the gradients of the Lagrange function with respect to x₁, x₂, and λ.

At (x₁, x₂) = (0, 2) we can see that we cannot find a finite λ that satisfies ∇ₓL = 0.

    3.2.5 Weak Lagrange Functions

In the definition of the Lagrange function L(x, λ, μ) (see Section 3.2.2), we associated Lagrange multipliers with the equality and inequality constraints only. If, however, a Lagrange multiplier μ₀ is associated with the objective function as well, the definition of the weak Lagrange function L′(x, λ, μ) results; that is,

$$L'(x, \lambda, \mu) = \mu_0 f(x) + \lambda^T h(x) + \mu^T g(x).$$

    3.2.6 First-Order Necessary Optimality Conditions

In this section we present, under the assumption of differentiability, the first-order necessary optimality conditions for the constrained nonlinear programming problem (3.3) as well as the corresponding geometric necessary optimality conditions.

    3.2.6.1 Fritz John Conditions

Let x̄ ∈ X be a feasible solution of (3.3); that is, h(x̄) = 0, g(x̄) ≤ 0. Let also f(x) and g(x) be differentiable at x̄ and h(x) have continuous first partial derivatives at x̄. Then, if x̄ is a local solution of (3.3), there exist Lagrange multipliers μ₀, λ, and μ such that

$$\mu_0 \nabla f(\bar{x}) + \nabla h(\bar{x})^T \lambda + \nabla g(\bar{x})^T \mu = 0,$$
$$\mu_j\, g_j(\bar{x}) = 0, \quad j = 1, 2, \ldots, p,$$
$$\mu_0 \ge 0, \quad \mu_j \ge 0, \quad j = 1, 2, \ldots, p, \qquad (\mu_0, \lambda, \mu) \ne (0, 0, 0),$$

where ∇f(x̄) is an (n × 1) vector, ∇h(x̄) is an (m × n) matrix, ∇g(x̄) is a (p × n) matrix, μ₀ is a scalar, λ is an (m × 1) vector, and μ is a (p × 1) vector.

The constraints {μⱼ gⱼ(x̄) = 0, j = 1, 2, ..., p} are called complementarity constraints.


Remark 1 The corresponding geometric necessary optimality condition is that the set of feasible directions defined by

$$Z = \{\, z : \nabla f(\bar{x})^T z < 0;\ \nabla g_j(\bar{x})^T z < 0,\ j \in J;\ \nabla h_i(\bar{x})^T z = 0,\ i = 1, 2, \ldots, m \,\}$$

is empty (i.e., Z = ∅), assuming that ∇hᵢ(x̄), i = 1, 2, ..., m, are linearly independent, where J is the set of active inequality constraints at x̄.

Illustration 3.2.6 (Verification of Fritz John necessary conditions) Verify the Fritz John conditions at x̄ = (0, 1) for

Notice that at the point x̄ = (0, 1) the inequality constraint is binding. Then, the Fritz John conditions are satisfied with μ₀ = 1, for instance, since

Remark 2 In the Fritz John first-order necessary optimality conditions, the multiplier μ₀ associated with the objective function can become zero at the considered point x̄ without violating the optimality conditions. In such a case, the Lagrange function becomes independent of f(x), and the conditions are satisfied for any differentiable objective function f(x), whether it exhibits a local optimum at x̄ or not. This weakness of the Fritz John conditions is illustrated in the following example.

    Illustration 3.2.7 Consider the problem


  • Figure 3.4: Example of a degenerate feasible region

which has only one feasible point, (2, 0); its feasible region is shown in Figure 3.4 (degenerate feasible region). Note that at (2, 0) both g₁(x) and g₂(x) are active:

Note that ∇g₁(x̄) and ∇g₂(x̄) are linearly dependent.

For μ₁ = μ₂ = 1 and μ₀ = 0, for instance, the Fritz John conditions are satisfied at (2, 0). In this case, however, the objective function disappears from consideration.

To remove this weakness of the Fritz John necessary conditions, we need to determine the required restrictions under which μ₀ is strictly positive (μ₀ > 0). These restrictions are called first-order constraint qualifications and will be discussed in the following section.

    3.2.6.2 First-Order Constraint Qualifications

As we have seen from the previous illustration, when μ₀ equals zero, the Fritz John first-order necessary optimality conditions do not utilize the gradient of the objective function. As a result, the gradient conditions simply represent a linear combination of the gradients of the active inequality and equality constraints that is equal to zero. In such cases, the Fritz John conditions are not useful in identifying a local optimum of the function f(x). A number of additional conditions are needed to guarantee that μ₀ > 0. These are the first-order constraint qualifications, and a selected number of them is presented in the following.

Let x̄ be a local optimum of (3.3), let X be an open set, and let J be defined as the set of active inequality constraints, J = {j : gⱼ(x̄) = 0}. Let also gⱼ(x), j ∈ J, be differentiable at x̄, and let hᵢ(x), i = 1, 2, ..., m, be continuously differentiable at x̄.

Linear Independence Constraint Qualification
The gradients ∇gⱼ(x̄) for j ∈ J and ∇hᵢ(x̄) for i = 1, 2, ..., m are linearly independent.

Slater Constraint Qualification
The constraints gⱼ(x) for j ∈ J are pseudo-convex at x̄. The constraints hᵢ(x) for i = 1, 2, ..., m are quasi-convex and quasi-concave. The gradients ∇hᵢ(x̄) for i = 1, 2, ..., m are linearly independent, and there exists an x̂ ∈ X such that gⱼ(x̂) < 0 for j ∈ J and hᵢ(x̂) = 0 for i = 1, 2, ..., m.

Kuhn-Tucker Constraint Qualification
There exists a nonzero vector z ∈ ℝⁿ for which

$$\nabla g_j(\bar{x})^T z \le 0,\ j \in J, \qquad \nabla h_i(\bar{x})^T z = 0,\ i = 1, 2, \ldots, m,$$

implies that there exists an n-dimensional vector function w(τ) on the interval [0, 1] such that

(i) w(0) = x̄,
(ii) w(τ) is feasible for all τ ∈ [0, 1], and
(iii) w is once-differentiable at τ = 0, with w′(0) = ρz for some ρ > 0.

The Weak Reverse Convex Constraint Qualification
The constraints h(x) and g(x) are continuously differentiable at x̄. Each gⱼ(x), j ∈ J, is pseudo-concave at x̄ or linear. Each hᵢ(x), i = 1, 2, ..., m, is both pseudo-convex and pseudo-concave at x̄.

Remark 1 Note that in the Kuhn-Tucker constraint qualification, w(τ) is a once-differentiable arc which starts at x̄. The Kuhn-Tucker constraint qualification then holds if z is tangent to w(τ) in the constrained region.

Remark 2 The linear independence constraint qualification, as well as Slater's, implies the Kuhn-Tucker constraint qualification.



    3.2.6.3 Karush-Kuhn-Tucker Necessary Conditions

Let x̄ ∈ X be a feasible solution of (3.3). Let also f(x) and g(x) be differentiable at x̄ and h(x) have continuous first partial derivatives at x̄. If x̄ is a local optimum of (3.3) and one of the following constraint qualifications:

(i) linear independence,
(ii) Slater,
(iii) Kuhn-Tucker, or
(iv) weak reverse convex

is satisfied, then there exist Lagrange multipliers λ, μ such that

$$\nabla f(\bar{x}) + \nabla h(\bar{x})^T \lambda + \nabla g(\bar{x})^T \mu = 0,$$
$$\mu_j\, g_j(\bar{x}) = 0, \quad j = 1, 2, \ldots, p,$$
$$\mu_j \ge 0, \quad j = 1, 2, \ldots, p.$$

Geometric Interpretation of Karush-Kuhn-Tucker Necessary Conditions
From the gradient KKT conditions we have that

$$-\nabla f(\bar{x}) = \nabla h(\bar{x})^T \lambda + \nabla g(\bar{x})^T \mu$$

with μⱼ ≥ 0, j = 1, 2, ..., p. The complementarity conditions force the Lagrange multipliers of the inactive constraints to take zero values. As a result, −∇f(x̄) is a vector that belongs to the cone of the gradients of the active constraints (i.e., equalities and active inequalities). The geometric interpretation of the gradient KKT conditions is then that −∇f(x̄) belongs to the cone of the gradients of the active constraints at the feasible solution x̄. If −∇f(x̄) lies outside the cone of the gradients of the active constraints, then x̄ is not a KKT point.

Illustration 3.2.8 Consider the following problem:

and verify the Karush-Kuhn-Tucker conditions for x̄ = (2, 1). The Lagrange function is



The point x̄ = (2, 1) is a feasible point. The gradient conditions of the Lagrange function become

The constraint g(x̄) = 0 is active, and hence μg(x̄) = 0 is satisfied. Also, the linear independence constraint qualification is satisfied. Thus, the KKT conditions are satisfied.
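The cone condition above can be checked mechanically: collect the gradients of the active constraints as columns and ask whether −∇f is a nonnegative combination of them. The sketch below does this with a nonnegative least-squares solve for an assumed problem (not the book's illustration):

```python
import numpy as np
from scipy.optimize import nnls

# Assumed problem: min f = (x1-4)**2 + (x2-2)**2  s.t.  g1 = x1**2 + x2**2 - 5 <= 0.
# At x = (2, 1) the constraint g1 is active (2**2 + 1**2 - 5 = 0).
x = np.array([2.0, 1.0])
grad_f = np.array([2.0 * (x[0] - 4.0), 2.0 * (x[1] - 2.0)])   # = [-4, -2]

# Columns are the gradients of the active constraints (here only grad g1).
active_grads = np.array([[2.0 * x[0]],
                         [2.0 * x[1]]])                        # = [[4], [2]]

# Solve  active_grads @ mu = -grad_f  with mu >= 0 (nonnegative least squares).
mu, residual = nnls(active_grads, -grad_f)
print("mu =", mu, " residual =", residual)   # mu = [1.], residual ~ 0 => KKT point
```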

Illustration 3.2.9 Consider the example that demonstrated the weakness of the Fritz John conditions:

at the point x̄ = (2, 0), which is feasible and at which both g₁(x) and g₂(x) are active. The gradients of the objective function and constraints are:

The gradient KKT conditions are

which are not satisfied. Note that the KKT conditions cannot be satisfied because of the linear dependence of the gradients of the active constraints.

    3.2.7 First-Order Sufficient Optimality Conditions

    In this section, we discuss the first-order sufficient optimality conditions for the constrainednonlinear programming problem (3.3).

    3.2.7.1 Karush-Kuhn-Tucker Sufficient Conditions

Let x̄ ∈ X be a feasible solution of (3.3), and let x̄ be a KKT point (i.e., it satisfies the gradient conditions, complementarity, nonnegativity of the μ's, and a constraint qualification). Let also I⁺ = {i : λᵢ > 0}, I⁻ = {i : λᵢ < 0}, and J⁺ = {j : μⱼ > 0}. If

f(x) is pseudo-convex at x̄ with all other feasible points x,

hᵢ(x), i ∈ I⁺, are quasi-convex at x̄ with all other feasible points x,

hᵢ(x), i ∈ I⁻, are quasi-concave at x̄ with all other feasible points x, and

gⱼ(x), j ∈ J⁺, are quasi-convex at x̄ with all other feasible points x,

then x̄ is a global optimum of (3.3). If the above convexity conditions on f(x), h(x), g(x) are restricted to hold within a ball of radius ε around x̄, then x̄ is a local optimum of (3.3).

Illustration 3.2.10 Consider the problem

and verify the global optimality of the KKT point x̄ = (1, 1). f(x) is convex, continuous, and differentiable, and hence pseudo-convex. g₁(x) is quasi-convex, while g₂(x) and g₃(x) are linear and hence quasi-convex. The linear independence constraint qualification is met since we have only one active constraint (i.e., g₁). Thus, x̄ = (1, 1) is a global optimum.


3.2.8 Saddle Point Optimality Conditions

3.2.8.1 Saddle Point Necessary Optimality Conditions

If

(i) x̄ ∈ X is a local optimum solution of (3.3),

(ii) X is a convex set,

(iii) f(x), g(x) are convex functions and h(x) are affine, and

(iv) g(x) < 0, h(x) = 0 has a solution,

then there exist λ̄, μ̄ with μ̄ ≥ 0 satisfying μ̄ᵀg(x̄) = 0 and

$$L(\bar{x}, \lambda, \mu) \le L(\bar{x}, \bar{\lambda}, \bar{\mu}) \le L(x, \bar{\lambda}, \bar{\mu})$$

for every x ∈ X and all (λ, μ) with μ ≥ 0; that is, (x̄, λ̄, μ̄) is a saddle point.

Remark 1 If x̄ is a KKT point and the additional condition (iii) holds, then (x̄, λ̄, μ̄) is a saddle point. Thus, under certain convexity assumptions on f, g and affinity of h, the Lagrange multipliers in the KKT conditions are also the multipliers in the saddle point criterion.

    3.2.8.2 Saddle Point Sufficient Optimality Conditions

If (x̄, λ̄, μ̄) is a Karush-Kuhn-Tucker saddle point, that is, there exist λ̄ and μ̄ ≥ 0 such that

$$L(\bar{x}, \lambda, \mu) \le L(\bar{x}, \bar{\lambda}, \bar{\mu}) \le L(x, \bar{\lambda}, \bar{\mu})$$

for every x ∈ X and all (λ, μ) with μ ≥ 0, then x̄ is a solution of problem (3.3).

Remark 1 Note that the saddle point sufficiency conditions require neither additional convexity assumptions nor a constraint-qualification-like condition. Note also that the saddle point sufficiency conditions do not require any differentiability of the Lagrange function. If, in addition, the functions f(x), h(x), g(x) are differentiable, so that the Lagrange function is differentiable, and (x̄, λ̄, μ̄) is a Karush-Kuhn-Tucker saddle point, then it is a Karush-Kuhn-Tucker point [i.e., it is a solution of (3.3) and it satisfies the constraint qualification].

Remark 2 A KKT saddle point of the Lagrange function implies that the KKT conditions hold, without the need for a constraint qualification. However, an optimal solution x̄ of (3.3) does not necessarily imply the existence of (λ̄, μ̄) unless a constraint qualification is imposed.
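The two saddle point inequalities are straightforward to verify numerically for a specific convex problem. The sketch below uses an assumed one-variable example (min x² subject to 1 − x ≤ 0, whose KKT point is x̄ = 1 with μ̄ = 2), not one taken from the text:

```python
import numpy as np

# Assumed problem: min x**2  s.t.  g(x) = 1 - x <= 0.
# L(x, mu) = x**2 + mu*(1 - x); the KKT point is xbar = 1 with mubar = 2.
L = lambda x, mu: x**2 + mu * (1.0 - x)
xbar, mubar = 1.0, 2.0

rng = np.random.default_rng(2)
for _ in range(10_000):
    x  = rng.uniform(-5.0, 5.0)
    mu = rng.uniform(0.0, 10.0)          # multipliers must be nonnegative
    assert L(xbar, mu) <= L(xbar, mubar) + 1e-12   # left saddle inequality
    assert L(xbar, mubar) <= L(x, mubar) + 1e-12   # right saddle inequality
print("saddle point inequalities hold at (xbar, mubar) = (1, 2)")
```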


    3.2.9 Second-Order Necessary Optimality Conditions

In this section, we discuss the need for second-order optimality conditions, and present the second-order constraint qualification along with the second-order necessary optimality conditions for problem (3.3).

    3.2.9.1 Motivation

The first-order optimality conditions utilize information only on the gradients of the objective function and constraints. As a result, the curvature of the functions, measured by the second derivatives, is not taken into account. To illustrate the case in which the first-order necessary optimality conditions do not pr