
A Tutorial on Convex Optimization

Haitham Hindi

Palo Alto Research Center (PARC), Palo Alto, California
email: [email protected]

Abstract—In recent years, convex optimization has become a computational tool of central importance in engineering, thanks to its ability to solve very large, practical engineering problems reliably and efficiently. The goal of this tutorial is to give an overview of the basic concepts of convex sets, functions and convex optimization problems, so that the reader can more readily recognize and formulate engineering problems using modern convex optimization. This tutorial coincides with the publication of the new book on convex optimization by Boyd and Vandenberghe [7], who have made available a large amount of free course material and links to freely available code. These can be downloaded and used immediately by the audience both for self-study and to solve real problems.

I. INTRODUCTION

Convex optimization can be described as a fusion of three disciplines: optimization [22], [20], [1], [3], [4], convex analysis [19], [24], [27], [16], [13], and numerical computation [26], [12], [10], [17]. It has recently become a tool of central importance in engineering, enabling the solution of very large, practical engineering problems reliably and efficiently. In some sense, convex optimization is providing new indispensable computational tools today, which naturally extend our ability to solve problems such as least squares and linear programming to a much larger and richer class of problems.

Our ability to solve these new types of problems comes from recent breakthroughs in algorithms for solving convex optimization problems [18], [23], [29], [30], coupled with the dramatic improvements in computing power, both of which have happened only in the past decade or so. Today, new applications of convex optimization are constantly being reported from almost every area of engineering, including: control, signal processing, networks, circuit design, communication, information theory, computer science, operations research, economics, statistics, and structural design. See [7], [2], [5], [6], [9], [11], [15], [8], [21], [14], [28] and the references therein.

The objectives of this tutorial are:

1) to show that there are straightforward, systematic rules and facts which, when mastered, allow one to quickly deduce the convexity (and hence tractability) of many problems, often by inspection;

2) to review and introduce some canonical optimization problems, which can be used to model problems and for which reliable optimization code can be readily obtained;

3) to emphasize the modeling and formulation aspect; we do not discuss the aspects of writing custom codes.

We assume that the reader has a working knowledge of linear algebra and vector calculus, and some (minimal) exposure to optimization.

Our presentation is quite informal. Rather than provide details for all the facts and claims presented, our goal is instead to give the reader a flavor for what is possible with convex optimization. Complete details can be found in [7], from which all the material presented here is taken. Thus we encourage the reader to skip sections that might not seem clear and continue reading; the topics are not all interdependent.

Motivation

A vast number of design problems in engineering can be posed as constrained optimization problems of the form:

minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
           hi(x) = 0, i = 1, . . . , p.     (1)

where x is a vector of decision variables, and the functions f0, fi and hi, respectively, are the cost, inequality constraints, and equality constraints. However, such problems can be very hard to solve in general, especially when the number of decision variables in x is large. There are several reasons for this difficulty: First, the problem "terrain" may be riddled with local optima. Second, it might be very hard to find a feasible point (i.e., an x which satisfies all the equalities and inequalities); in fact, the feasible set, which need not even be connected, could be empty. Third, stopping criteria used in general optimization algorithms are often arbitrary. Fourth, optimization algorithms might have very poor convergence rates. Fifth, numerical problems could cause the minimization algorithm to stop altogether, or wander.

It has been known for a long time [19], [3], [16], [13] that if the fi are all convex and the hi are affine, then the first three problems disappear: any local optimum is, in fact, a global optimum; feasibility of convex optimization problems can be determined unambiguously, at least in principle; and very precise stopping criteria are available using duality. However, convergence rate and numerical sensitivity issues still remain a potential problem.

It was not until the late '80's and '90's that researchers in the former Soviet Union and United States discovered that if, in addition to convexity, the fi satisfied a property known as self-concordance, then issues of convergence and numerical sensitivity could be avoided using interior point methods [18], [23], [29], [30], [25]. The self-concordance property is satisfied by a very large set of important functions used in engineering. Hence, it is now possible to solve a large class of convex optimization problems in engineering with great efficiency.

II. CONVEX SETS

In this section we list some important convex sets and operations. It is important to note that some of these sets have different representations. Picking the right representation can make the difference between a tractable problem and an intractable one.

We will be concerned only with optimization problems whose decision variables are vectors in Rn or matrices in Rm×n. Throughout the paper, we will make frequent use of informal sketches to help the reader develop an intuition for the geometry of convex optimization.

A function f : Rn → Rm is affine if it has the form linear plus constant: f(x) = Ax + b. If F is a matrix valued function, i.e., F : Rn → Rp×q, then F is affine if it has the form

F(x) = A0 + x1 A1 + · · · + xn An

where Ai ∈ Rp×q. Affine functions are sometimes loosely referred to as linear.

Recall that S ⊆ Rn is a subspace if it contains the plane through any two of its points and the origin, i.e.,

x, y ∈ S, λ, µ ∈ R =⇒ λx + µy ∈ S.

Two common representations of a subspace are as the range of a matrix

range(A) = {Aw | w ∈ Rq} = {w1 a1 + · · · + wq aq | wi ∈ R}

where A = [a1 · · · aq]; or as the nullspace of a matrix

nullspace(B) = {x | Bx = 0} = {x | bT1 x = 0, . . . , bTp x = 0}

where B = [b1 · · · bp]T. As an example, let Sn = {X ∈ Rn×n | X = XT} denote the set of symmetric matrices. Then Sn is a subspace, since symmetric matrices are closed under addition. Another way to see this is to note that Sn can be written as {X ∈ Rn×n | Xij = Xji, ∀i, j}, which is the nullspace of the linear function X − XT.

A set S ⊆ Rn is affine if it contains the line through any two points in it, i.e.,

x, y ∈ S, λ, µ ∈ R, λ + µ = 1 =⇒ λx + µy ∈ S.

[Figure: points x and y and points λx + (1 − λ)y on the line through them, marked at λ = 0.6, λ = 1.5, and λ = −0.5.]

Geometrically, an affine set is simply a subspace which is not necessarily centered at the origin. Two common representations for an affine set are: the range of an affine function

S = {Az + b | z ∈ Rq},

or as the solution set of a set of linear equalities:

S = {x | bT1 x = d1, . . . , bTp x = dp} = {x | Bx = d}.

A set S ⊆ Rn is a convex set if it contains the line segment joining any two of its points, i.e.,

x, y ∈ S, λ, µ ≥ 0, λ + µ = 1 =⇒ λx + µy ∈ S

[Figure: a convex set (no segment between any two points leaves the set) and a non-convex set.]

Geometrically, we can think of convex sets as always bulging outward, with no dents or kinks in them. Clearly subspaces and affine sets are convex, since their definitions subsume convexity.

A set S ⊆ Rn is a convex cone if it contains all rays passing through its points which emanate from the origin, as well as all line segments joining any points on those rays, i.e.,

x, y ∈ S, λ, µ ≥ 0 =⇒ λx + µy ∈ S

Geometrically, x, y ∈ S means that S contains the entire 'pie slice' between x and y.

[Figure: the 'pie slice' cone generated by points x and y, with vertex at the origin 0.]

The nonnegative orthant, Rn+, is a convex cone. The set Sn+ = {X ∈ Sn | X ⪰ 0} of symmetric positive semidefinite (PSD) matrices is also a convex cone, since any positive combination of semidefinite matrices is semidefinite. Hence we call Sn+ the positive semidefinite cone.

A convex cone K ⊆ Rn is said to be proper if it is closed, has nonempty interior, and is pointed, i.e., there is no line in K. A proper cone K defines a generalized inequality ⪯K in Rn:

x ⪯K y ⇐⇒ y − x ∈ K

(strict version x ≺K y ⇐⇒ y − x ∈ int K).

[Figure: a proper cone K and points x and y ⪰K x.]

This formalizes our use of the ⪯ symbol:
• K = Rn+: x ⪯K y means xi ≤ yi (componentwise vector inequality)
• K = Sn+: X ⪯K Y means Y − X is PSD
These are so common we drop the K.

Given points xi ∈ Rn and θi ∈ R, then y = θ1 x1 + · · · + θk xk is said to be a
• linear combination for any real θi
• affine combination if ∑i θi = 1
• convex combination if ∑i θi = 1, θi ≥ 0
• conic combination if θi ≥ 0
The linear (resp. affine, convex, conic) hull of a set S is the set of all linear (resp. affine, convex, conic) combinations of points from S, and is denoted by span(S) (resp. Aff(S), Co(S), Cone(S)). It can be shown that this is the smallest such set containing S.

As an example, consider the set S = {(1, 0, 0), (0, 1, 0), (0, 0, 1)}. Then span(S) is R3; Aff(S) is the hyperplane passing through the three points; Co(S) is the unit simplex, which is the triangle joining the vectors along with all the points inside it; Cone(S) is the nonnegative orthant R3+.

Recall that a hyperplane, represented as {x | aT x = b} (a ≠ 0), is in general an affine set, and is a subspace if b = 0. Another useful representation of a hyperplane is given by {x | aT (x − x0) = 0}, where a is the normal vector and x0 lies on the hyperplane. Hyperplanes are convex, since they contain all lines (and hence segments) joining any of their points.

A halfspace, described as {x | aT x ≤ b} (a ≠ 0), is generally convex and is a convex cone if b = 0. Another useful representation is {x | aT (x − x0) ≤ 0}, where a is the (outward) normal vector and x0 lies on the boundary. Halfspaces are convex.

[Figure: the hyperplane aT x = b through x0 with normal a, separating the halfspaces aT x ≤ b and aT x ≥ b.]

We now come to a very important fact about properties which are preserved under intersection: Let A be an arbitrary index set (possibly uncountably infinite) and {Sα | α ∈ A} a collection of sets; then we have the following:

Sα is a subspace (resp. affine, convex, a convex cone) for every α ∈ A =⇒ ⋂α∈A Sα is a subspace (resp. affine, convex, a convex cone).

In fact, every closed convex set S is the (usually infinite) intersection of the halfspaces which contain it, i.e., S = ⋂ {H | H halfspace, S ⊆ H}. For example, another way to see that Sn+ is a convex cone is to recall that a matrix X ∈ Sn is positive semidefinite if zT X z ≥ 0 for all z ∈ Rn. Thus we can write

Sn+ = ⋂z∈Rn { X ∈ Sn | zT X z = ∑i,j zi zj Xij ≥ 0 }.

Now observe that the summation above is actually linear in the components of X, so Sn+ is the infinite intersection of halfspaces containing the origin (which are convex cones) in Sn.

We continue with our listing of useful convex sets. A polyhedron is the intersection of a finite number of halfspaces:

P = {x | aTi x ≤ bi, i = 1, . . . , k} = {x | Ax ⪯ b}

where ⪯ above means componentwise inequality.

[Figure: a polyhedron P formed as the intersection of five halfspaces with outward normals a1, . . . , a5.]

A bounded polyhedron is called a polytope, which also has the alternative representation P = Co{v1, . . . , vN}, where {v1, . . . , vN} are its vertices. For example, the nonnegative orthant Rn+ = {x ∈ Rn | x ⪰ 0} is a polyhedron, while the probability simplex {x ∈ Rn | x ⪰ 0, ∑i xi = 1} is a polytope.

If f is a norm, then the norm ball B = {x | f(x − xc) ≤ 1} is convex, and the norm cone C = {(x, t) | f(x) ≤ t} is a convex cone. Perhaps the most familiar norms are the ℓp norms on Rn:

‖x‖p = (∑i |xi|p)1/p for p ≥ 1;  ‖x‖∞ = maxi |xi|.

The corresponding norm balls (in R2) look like this:

[Figure: unit balls of the ℓp norms in R2 for p = 1, 1.5, 2, 3, and ∞; the ℓ∞ ball is the square with corners (±1, ±1).]

Another workhorse of convex optimization is the ellipsoid. In addition to their computational convenience, ellipsoids can be used as crude models or approximations for other convex sets. In fact, it can be shown that any bounded convex set in Rn can be approximated to within a factor of n by an ellipsoid.

There are many representations for ellipsoids. For example, we can use

E = {x | (x − xc)T A−1 (x − xc) ≤ 1}

where the parameters are A = AT ≻ 0 and the center xc ∈ Rn.

[Figure: an ellipsoid centered at xc with semiaxis lengths √λ1 and √λ2.]

In this case, the semiaxis lengths are √λi, where the λi are the eigenvalues of A; the semiaxis directions are the eigenvectors of A; and the volume is proportional to (det A)1/2. However, depending on the problem, other descriptions might be more appropriate, such as (with ‖u‖ = √(uT u)):

• image of the unit ball under an affine transformation: E = {Bu + xc | ‖u‖ ≤ 1}; vol. ∝ det B
• preimage of the unit ball under an affine transformation: E = {x | ‖Ax − b‖ ≤ 1}; vol. ∝ det A−1
• sublevel set of a nondegenerate quadratic: E = {x | f(x) ≤ 0} where f(x) = xT C x + 2 dT x + e with C = CT ≻ 0, e − dT C−1 d < 0; vol. ∝ (det C)−1/2

It is an instructive exercise to convert among representations.
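As a small illustration of such a conversion, the following sketch (not from the paper; it assumes only numpy) builds the image-of-unit-ball representation E = {Bu + xc | ‖u‖ ≤ 1} from the quadratic representation E = {x | (x − xc)T A−1 (x − xc) ≤ 1} by taking B to be the symmetric square root of A, and checks that a point of one description satisfies the other.

```python
# A minimal sketch (assuming numpy) of converting between two ellipsoid
# representations: {x | (x - xc)^T A^{-1} (x - xc) <= 1} and
# {B u + xc | ||u|| <= 1}, which coincide when B is the square root of A.
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((3, 3))
A = M @ M.T + 3 * np.eye(3)          # A = A^T > 0
xc = rng.standard_normal(3)

# Symmetric square root of A via its eigendecomposition.
lam, Q = np.linalg.eigh(A)
B = Q @ np.diag(np.sqrt(lam)) @ Q.T  # B = B^T > 0 and B @ B = A

# A point B u + xc with ||u|| < 1 should satisfy the quadratic description.
u = rng.standard_normal(3)
u /= np.linalg.norm(u) * 1.5         # scale so that ||u|| < 1
x = B @ u + xc
val = (x - xc) @ np.linalg.solve(A, x - xc)
print(val <= 1.0)                    # True
```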

The second-order cone is the norm cone associated with the Euclidean norm:

S = {(x, t) | √(xT x) ≤ t}
  = {(x, t) | (x, t)T diag(I, −1) (x, t) ≤ 0, t ≥ 0}.


The image (or preimage) under an affine transformation of a convex set is convex, i.e., if S, T are convex, then so are the sets

f−1(S) = {x | Ax + b ∈ S}
f(T) = {Ax + b | x ∈ T}

An example is coordinate projection {x | (x, y) ∈ S for some y}. As another example, a constraint of the form

‖Ax + b‖2 ≤ cT x + d,

where A ∈ Rk×n, is a second-order cone constraint, since it is the same as requiring the affine function (Ax + b, cT x + d) to lie in the second-order cone in Rk+1. Similarly, if A0, A1, . . . , Am ∈ Sn, the solution set of the linear matrix inequality (LMI)

F(x) = A0 + x1 A1 + · · · + xm Am ⪰ 0


is convex (the preimage of the semidefinite cone under an affine function).

A linear-fractional (or projective) function f : Rm → Rn has the form

f(x) = (Ax + b) / (cT x + d)

and domain dom f = H = {x | cT x + d > 0}. If C is a convex set, then its linear-fractional transformation f(C) is also convex. This is because linear-fractional transformations preserve line segments: for x, y ∈ H,

f([x, y]) = [f(x), f(y)]

[Figure: a quadrilateral with vertices x1, . . . , x4 and its image with vertices f(x1), . . . , f(x4) under a linear-fractional map.]

Two further properties are helpful in visualizing the geometry of convex sets. The first is the separating hyperplane theorem, which states that if S, T ⊆ Rn are convex and disjoint (S ∩ T = ∅), then there exists a hyperplane {x | aT x − b = 0} which separates them.

[Figure: disjoint convex sets S and T separated by a hyperplane with normal a.]

The second property is the supporting hyperplane theorem, which states that there exists a supporting hyperplane at every point on the boundary of a convex set, where a supporting hyperplane {x | aT x = aT x0} supports S at x0 ∈ ∂S if

x ∈ S ⇒ aT x ≤ aT x0

[Figure: a supporting hyperplane of S at the boundary point x0, with normal a.]

III. CONVEX FUNCTIONS

In this section, we introduce the reader to some important convex functions and techniques for verifying convexity. The objective is to sharpen the reader's ability to recognize convexity.

A. Convex functions

A function f : Rn → R is convex if its domain dom f is convex and for all x, y ∈ dom f, θ ∈ [0, 1],

f(θx + (1 − θ)y) ≤ θ f(x) + (1 − θ) f(y);

f is concave if −f is convex.

[Figure: graphs of a convex function, a concave function, and a function that is neither.]

Here are some simple examples on R: x2 is convex (dom f = R); log x is concave (dom f = R++); and f(x) = 1/x is convex (dom f = R++).

It is convenient to define the extension of a convex function f:

f(x) = f(x) if x ∈ dom f,  and  f(x) = +∞ if x ∉ dom f.

Note that f still satisfies the basic definition for all x, y ∈ Rn, 0 ≤ θ ≤ 1 (as an inequality in R ∪ {+∞}). We will use the same symbol for f and its extension, i.e., we will implicitly assume convex functions are extended.

The epigraph of a function f is

epi f = {(x, t) | x ∈ dom f, f(x) ≤ t}

[Figure: the epigraph epi f of a function f, the region above its graph.]

The (α-)sublevel set of a function f is

Sα ≜ {x ∈ dom f | f(x) ≤ α}

From the basic definition of convexity, it follows that f is a convex function if and only if its epigraph epi f is a convex set. It also follows that if f is convex, then its sublevel sets Sα are convex (the converse is false; see quasiconvex functions later).

The convexity of a differentiable function f : Rn → R can also be characterized by conditions on its gradient ∇f and Hessian ∇2f. Recall that, in general, the gradient yields a first order Taylor approximation at x0:

f(x) ≈ f(x0) + ∇f(x0)T (x − x0)

We have the following first-order condition: f is convex if and only if for all x, x0 ∈ dom f,

f(x) ≥ f(x0) + ∇f(x0)T (x − x0),


i.e., the first order approximation of f is a global underestimator.

[Figure: a convex function f(x) lying above its tangent line f(x0) + ∇f(x0)T (x − x0) at x0.]

To see why this is so, we can rewrite the condition above in terms of the epigraph of f as: for all (x, t) ∈ epi f,

(∇f(x0), −1)T (x − x0, t − f(x0)) ≤ 0,

i.e., (∇f(x0), −1) defines a supporting hyperplane to epi f at (x0, f(x0)).

[Figure: the vector (∇f(x0), −1) as the normal of a supporting hyperplane to epi f at (x0, f(x0)).]

Recall that the Hessian of f, ∇2f, yields a second order Taylor series expansion around x0:

f(x) ≈ f(x0) + ∇f(x0)T (x − x0) + (1/2)(x − x0)T ∇2f(x0)(x − x0)

We have the following necessary and sufficient second order condition: a twice differentiable function f is convex if and only if for all x ∈ dom f, ∇2f(x) ⪰ 0, i.e., its Hessian is positive semidefinite on its domain.

The convexity of the following functions is easy to verify, using the first and second order characterizations of convexity:
• xα is convex on R++ for α ≥ 1;
• x log x is convex on R+;
• the log-sum-exp function f(x) = log ∑i exp(xi) (tricky!);
• affine functions f(x) = aT x + b, where a ∈ Rn, b ∈ R, are convex and concave since ∇2f ≡ 0;
• quadratic functions f(x) = xT P x + 2 qT x + r (P = PT), whose Hessian is ∇2f(x) = 2P, are convex ⇐⇒ P ⪰ 0 and concave ⇐⇒ P ⪯ 0.
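As a quick numerical sanity check of the second-order condition (this sketch is not from the paper and assumes only numpy), the Hessian of the log-sum-exp function at a point x is diag(p) − p pT, where p is the softmax of x; it should be positive semidefinite at every x.

```python
# Numerically check that the Hessian of f(x) = log sum_i exp(x_i) is PSD
# at a few random points: the Hessian is diag(p) - p p^T with p = softmax(x).
import numpy as np

rng = np.random.default_rng(1)
for _ in range(5):
    x = rng.standard_normal(6)
    z = np.exp(x - x.max())          # subtract max for numerical stability
    p = z / z.sum()                  # softmax(x)
    H = np.diag(p) - np.outer(p, p)  # Hessian of log-sum-exp at x
    eigs = np.linalg.eigvalsh(H)
    print(eigs.min() >= -1e-12)      # all eigenvalues (numerically) nonnegative
```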

Further elementary properties are extremely helpful in verifying convexity. We list several:
• f : Rn → R is convex iff it is convex on all lines: f(t) ≜ f(x0 + th) is convex in t ∈ R, for all x0, h ∈ Rn
• nonnegative sums of convex functions are convex: α1, α2 ≥ 0 and f1, f2 convex =⇒ α1 f1 + α2 f2 convex
• nonnegative infinite sums and integrals: p(y) ≥ 0, g(x, y) convex in x =⇒ ∫ p(y) g(x, y) dy convex in x
• pointwise supremum (maximum): fα convex =⇒ supα∈A fα convex (corresponds to intersection of epigraphs)

[Figure: the epigraph of max{f1, f2} is the intersection of the epigraphs of f1 and f2.]

• affine transformation of domain: f convex =⇒ f(Ax + b) convex

The following are important examples of how these further properties can be applied:
• piecewise-linear functions: f(x) = maxi {aTi x + bi} is convex in x (epi f is a polyhedron)
• maximum distance to any set, sups∈S ‖x − s‖, is convex in x [(x − s) is affine in x, ‖ · ‖ is a norm, so fs(x) = ‖x − s‖ is convex]
• expected value: f(x, u) convex in x =⇒ g(x) = Eu f(x, u) convex
• f(x) = x[1] + x[2] + x[3] is convex on Rn (x[i] is the ith largest xj). To see this, note that f(x) is the sum of the largest triple of components of x. So f can be written as f(x) = maxi cTi x, where the ci are the set of all vectors with components zero except for three 1's.
• f(x) = ∑mi=1 log(bi − aTi x)−1 is convex (dom f = {x | aTi x < bi, i = 1, . . . , m})

Another convexity preserving operation is that of minimizing over some variables. Specifically, if h(x, y) is convex in x and y, then

f(x) = infy h(x, y)

is convex in x. This is because the operation above corresponds to projection of the epigraph, (x, y, t) → (x, t).

[Figure: a convex function h(x, y) and its partial minimization f(x) = infy h(x, y).]

An important example of this is the minimum distance function to a set S ⊆ Rn, defined as:

dist(x, S) = infy∈S ‖x − y‖ = infy ‖x − y‖ + δ(y|S)

where δ(y|S) is the indicator function of the set S, which is infinite everywhere except on S, where it is zero. Note that in contrast to the maximum distance function defined earlier, dist(x, S) is not convex in


general. However, if S is convex, then dist(x, S) is convex, since ‖x − y‖ + δ(y|S) is convex in (x, y).

The technique of composition can also be used to deduce the convexity of differentiable functions, by means of the chain rule. For example, one can show results like: f(x) = log ∑i exp gi(x) is convex if each gi is; see [7].

Jensen's inequality, a classical result which is essentially a restatement of convexity, is used extensively in various forms. If f : Rn → R is convex, then:
• two points: θ1 + θ2 = 1, θi ≥ 0 =⇒ f(θ1 x1 + θ2 x2) ≤ θ1 f(x1) + θ2 f(x2)
• more than two points: ∑i θi = 1, θi ≥ 0 =⇒ f(∑i θi xi) ≤ ∑i θi f(xi)
• continuous version: p(x) ≥ 0, ∫ p(x) dx = 1 =⇒ f(∫ x p(x) dx) ≤ ∫ f(x) p(x) dx,
or more generally, f(E x) ≤ E f(x).
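A small Monte Carlo illustration of the general form f(E x) ≤ E f(x) follows; this sketch is not from the paper and assumes only numpy, using the convex function f(x) = ‖x‖2².

```python
# Monte Carlo check of Jensen's inequality f(E x) <= E f(x)
# for the convex function f(x) = ||x||_2^2.
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((100000, 3)) + np.array([1.0, -2.0, 0.5])
f = lambda x: np.sum(x * x, axis=-1)      # convex in x
lhs = f(X.mean(axis=0))                   # estimate of f(E x)
rhs = f(X).mean()                         # estimate of E f(x)
print(lhs <= rhs)                         # True (up to sampling error)
```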

The simple techniques above can also be used to verify the convexity of functions of matrices, which arise in many important problems.
• Tr AT X = ∑i,j Aij Xij is linear in X on Rn×n
• log det X−1 is convex on {X ∈ Sn | X ≻ 0}.
Proof: Verify convexity along lines. Let λi be the eigenvalues of X0−1/2 H X0−1/2. Then

f(t) ≜ log det(X0 + tH)−1
     = log det X0−1 + log det(I + t X0−1/2 H X0−1/2)−1
     = log det X0−1 − ∑i log(1 + t λi)

which is a convex function of t.
• (det X)1/n is concave on {X ∈ Sn | X ⪰ 0}
• λmax(X) is convex on Sn since, for X symmetric, λmax(X) = sup‖y‖2=1 yT X y, which is a supremum of linear functions of X.
• ‖X‖2 = σ1(X) = (λmax(XT X))1/2 is convex on Rm×n since, by definition, ‖X‖2 = sup‖u‖2=1 ‖Xu‖2.

The Schur complement technique is an indispensable tool for modeling and transforming nonlinear constraints into convex LMI constraints. Suppose a symmetric matrix X can be decomposed as

X = [A B; BT C]

with A invertible. The matrix S ≜ C − BT A−1 B is called the Schur complement of A in X, and has the following important attributes, which are not too hard to prove:
• X ≻ 0 if and only if A ≻ 0 and S ≜ C − BT A−1 B ≻ 0
• if A ≻ 0, then X ⪰ 0 if and only if S ⪰ 0

For example, the following second-order cone constraint on x

‖Ax + b‖ ≤ eT x + d

is equivalent to the LMI

[(eT x + d)I  Ax + b;  (Ax + b)T  eT x + d] ⪰ 0.

This shows that all sets that can be modeled by an SOCP constraint can be modeled as LMI's. Hence LMI's are more "expressive". As another nontrivial example, the quadratic-over-linear function

f(x, y) = x2/y,  dom f = R × R++

is convex because its epigraph {(x, y, t) | y > 0, x2/y ≤ t} is convex: by Schur complements, it is equivalent to the solution set of the LMI constraints

[y x; x t] ⪰ 0,  y > 0.
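The second Schur complement fact above is easy to spot-check numerically. The sketch below (not from the paper; it assumes only numpy) builds a random symmetric block matrix with A ≻ 0 and verifies that the PSD test on X agrees with the PSD test on S = C − BT A−1 B.

```python
# Numerical spot check of the Schur complement fact: with A > 0,
# X = [[A, B], [B^T, C]] is PSD iff S = C - B^T A^{-1} B is PSD.
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((3, 3)); A = A @ A.T + np.eye(3)   # A > 0
B = rng.standard_normal((3, 2))
C = rng.standard_normal((2, 2)); C = C @ C.T               # symmetric C

X = np.block([[A, B], [B.T, C]])
S = C - B.T @ np.linalg.solve(A, B)                         # Schur complement

x_psd = np.linalg.eigvalsh(X).min() >= -1e-9
s_psd = np.linalg.eigvalsh(S).min() >= -1e-9
print(x_psd == s_psd)                                       # the two tests agree
```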

The concept of K-convexity generalizes convexity to vector- or matrix-valued functions. As mentioned earlier, a pointed convex cone K ⊆ Rm induces a generalized inequality ⪯K. A function f : Rn → Rm is K-convex if for all θ ∈ [0, 1]

f(θx + (1 − θ)y) ⪯K θ f(x) + (1 − θ) f(y)

In many problems, especially in statistics, log-concave functions are frequently encountered. A function f : Rn → R+ is log-concave (log-convex) if log f is concave (convex). Hence one can transform problems in log-convex functions into convex optimization problems. As an example, the normal density f(x) = const · e−(1/2)(x − x0)T Σ−1 (x − x0) is log-concave.

B. Quasiconvex functions

The next best thing to a convex function in optimization is a quasiconvex function, since it can be minimized in a few iterations of convex optimization problems. A function f : Rn → R is quasiconvex if every sublevel set Sα = {x ∈ dom f | f(x) ≤ α} is convex. Note that if f is convex, then it is automatically quasiconvex. Quasiconvex functions can have 'locally flat' regions.

[Figure: a quasiconvex function of one variable with a locally flat region; the α-sublevel set is an interval.]


We say f is quasiconcave if −f is quasiconvex, i.e., superlevel sets {x | f(x) ≥ α} are convex. A function which is both quasiconvex and quasiconcave is called quasilinear. We list some examples:
• f(x) = √|x| is quasiconvex on R
• f(x) = log x is quasilinear on R+
• the linear-fractional function f(x) = (aT x + b)/(cT x + d) is quasilinear on the halfspace cT x + d > 0
• f(x) = ‖x − a‖2/‖x − b‖2 is quasiconvex on the halfspace {x | ‖x − a‖2 ≤ ‖x − b‖2}

convex functions, but also some important differences.The following quasiconvex properties are helpful inverifying quasiconvexity:

• f is quasiconvex if and only if it is quasiconvex onlines, i.e., f(x0 + th) quasiconvex in t for all x0, h

• modified Jensen’s inequality: f is quasiconvex ifffor all x, y ∈ dom f , θ ∈ [0, 1],

f(θx + (1 − θ)y) ≤ max{f(x), f(y)}f(x)

x y

• for f differentiable, f quasiconvex ⇐⇒ for allx, y ∈ dom f

f(y) ≤ f(x) ⇒ (y − x)T∇f(x) ≤ 0

Sα1

Sα2

Sα3

x∇f(x)

α1 < α2 < α3

• positive multiplesf quasiconvex, α ≥ 0 =⇒ αf quasiconvex

• pointwise supremum f1, f2 quasiconvex =⇒max{f1, f2} quasiconvex(extends to supremum over arbitrary set)

• affine transformation of domainf quasiconvex =⇒ f(Ax + b) quasiconvex

• linear-fractional transformation of domainf quasiconvex =⇒ f

(Ax+bcT x+d

)quasiconvex

on cT x + d > 0• composition with monotone increasing function:

f quasiconvex, g monotone increasing=⇒ g(f(x)) quasiconvex

• sums of quasiconvex functions are not quasiconvexin general

• f quasiconvex in x, y =⇒ g(x) = inf y f(x, y)quasiconvex in x (projection of epigraph remainsquasiconvex)

IV. CONVEX OPTIMIZATION PROBLEMS

In this section, we will discuss the formulation of optimization problems at a general level. In the next section we will focus on useful optimization problems with specific structure. Consider the following optimization problem in standard form:

minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
           hi(x) = 0, i = 1, . . . , p

where fi, hi : Rn → R; x is the optimization variable; f0 is the objective or cost function; the fi(x) ≤ 0 are the inequality constraints; and the hi(x) = 0 are the equality constraints. Geometrically, this problem corresponds to the minimization of f0 over a set described as the intersection of the 0-sublevel sets of the fi, i = 1, . . . , m, with the surfaces described by the 0-solution sets of the hi.

A point x is feasible if it satisfies the constraints; the feasible set C is the set of all feasible points; and the problem is feasible if there are feasible points. The problem is said to be unconstrained if m = p = 0. The optimal value is denoted by f⋆ = infx∈C f0(x), and we adopt the convention that f⋆ = +∞ if the problem is infeasible. A point x ∈ C is an optimal point if f0(x) = f⋆, and the optimal set is Xopt = {x ∈ C | f0(x) = f⋆}.

As an example, consider the problem

minimize x1 + x2
subject to −x1 ≤ 0
           −x2 ≤ 0
           1 − √(x1 x2) ≤ 0

[Figure: the feasible set C, the region above the branch of the hyperbola x1 x2 = 1 in the nonnegative quadrant.]

The objective function is f0(x) = [1 1]T x; the feasible set C is a half-hyperboloid; the optimal value is f⋆ = 2; and the only optimal point is x⋆ = (1, 1).
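For readers who want to experiment, the sketch below solves this small example with the CVXPY modeling package (an assumption; CVXPY is a modern modeling tool not discussed in this tutorial). The constraint 1 − √(x1 x2) ≤ 0 is expressed through the concave geometric mean, which CVXPY recognizes as convex-compliant.

```python
# A sketch (assuming the CVXPY package) of the small example above;
# cp.geo_mean(x) >= 1 encodes 1 - sqrt(x1*x2) <= 0 in a recognized form.
import cvxpy as cp

x = cp.Variable(2)
prob = cp.Problem(cp.Minimize(cp.sum(x)),
                  [x >= 0, cp.geo_mean(x) >= 1])
prob.solve()
print(prob.value, x.value)   # expect roughly 2.0 and (1, 1)
```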

In the standard problem above, the explicit constraints are given by fi(x) ≤ 0 and hi(x) = 0. However, there are also the implicit constraints x ∈ dom fi, x ∈ dom hi, i.e., x must lie in the set

D = dom f0 ∩ · · · ∩ dom fm ∩ dom h1 ∩ · · · ∩ dom hp

which is called the domain of the problem. For example,

minimize −log x1 − log x2
subject to x1 + x2 − 1 ≤ 0

has the implicit constraint x ∈ D = {x ∈ R2 | x1 > 0, x2 > 0}.

A feasibility problem is a special case of the standard problem, where we are interested merely in finding any feasible point. Thus, the problem is really to

problem, where we are interested merely in finding anyfeasible point. Thus, problem is really to

• either find x ∈ C• or determine that C = ∅ .

Equivalently, the feasibility problem requires that weeither solve the inequality / equality system

fi(x) ≤ 0, i = 1, . . . , mhi(x) = 0, i = 1, . . . , p

or determine that it is inconsistent.

An optimization problem in standard form is a convex optimization problem if f0, f1, . . . , fm are all convex and the hi are all affine:

minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
           aTi x − bi = 0, i = 1, . . . , p.

This is often written as

minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
           Ax = b

where A ∈ Rp×n and b ∈ Rp. As mentioned in the introduction, convex optimization problems have three crucial properties that make them fundamentally more tractable than generic nonconvex optimization problems:

1) no local minima: any local optimum is necessarily a global optimum;

2) exact infeasibility detection: using duality theory (which is not covered here), hence algorithms are easy to initialize;

3) efficient numerical solution methods that can handle very large problems.

Note that often seemingly 'slight' modifications of a convex problem can be very hard. Examples include:

• convex maximization, concave minimization, e.g.,
minimize ‖x‖
subject to Ax ⪯ b

• nonlinear equality constraints, e.g.,
minimize cT x
subject to xT Pi x + qTi x + ri = 0, i = 1, . . . , K

• minimizing over non-convex sets, e.g., Boolean variables:
find x
such that Ax ⪯ b, xi ∈ {0, 1}

To understand global optimality in convex problems, recall that x ∈ C is locally optimal if it satisfies

y ∈ C, ‖y − x‖ ≤ R =⇒ f0(y) ≥ f0(x)

for some R > 0. A point x ∈ C is globally optimal means that

y ∈ C =⇒ f0(y) ≥ f0(x).

For convex optimization problems, any local solution is also global. [Proof sketch: Suppose x is locally optimal, but that there is a y ∈ C with f0(y) < f0(x). Then we may take a small step from x towards y, i.e., z = λy + (1 − λ)x with λ > 0 small. Then z is near x, with f0(z) < f0(x), which contradicts local optimality.]

There is also a first order condition that characterizes optimality in convex optimization problems. Suppose f0 is differentiable; then x ∈ C is optimal iff

y ∈ C =⇒ ∇f0(x)T (y − x) ≥ 0

So −∇f0(x) defines a supporting hyperplane for C at x. This means that if we move from x towards any other feasible y, f0 does not decrease.

[Figure: the optimal point x on the boundary of C, with −∇f0(x) normal to a supporting hyperplane of C, and the contour lines of f0.]

Many standard convex optimization algorithms assume a linear objective function. So it is important to realize that any convex optimization problem can be converted to one with a linear objective function. This is called putting the problem into epigraph form. It involves rewriting the problem as:

minimize t
subject to f0(x) − t ≤ 0,
           fi(x) ≤ 0, i = 1, . . . , m
           hi(x) = 0, i = 1, . . . , p

where the variables are now (x, t) and the objective has essentially been moved into the inequality constraints. Observe that the new objective is linear: t = eTn+1 (x, t) (en+1 is the vector of all zeros except for a 1 in its (n + 1)th component). Minimizing t will result in minimizing f0 with respect to x, so any optimal x⋆ of the new problem will be optimal for the epigraph form and vice versa.

[Figure: the epigraph form problem minimizes in the direction −en+1 over the epigraph of f0 above C.]

So the linear objective is 'universal' for convex optimization.

The above trick of introducing the extra variable t is known as the method of slack variables. It is very helpful in transforming problems into canonical form. We will see another example in the next section.

A convex optimization problem in standard form with generalized inequalities is:

minimize f0(x)
subject to fi(x) ⪯Ki 0, i = 1, . . . , L
           Ax = b

where f0 : Rn → R is convex; the ⪯Ki are generalized inequalities on Rmi; and the fi : Rn → Rmi are Ki-convex.

A quasiconvex optimization problem is exactly the same as a convex optimization problem, except for one difference: the objective function f0 is quasiconvex. We will see an important example of quasiconvex optimization in the next section: linear-fractional programming.

V. CANONICAL OPTIMIZATION PROBLEMS

In this section, we present several canonical optimization problem formulations, which have been found to be extremely useful in practice, and for which extremely efficient solution codes are available (often for free!). Thus if a real problem can be cast into one of these forms, then it can be considered as essentially solved.

A. Conic programming

We will present three canonical problems in this section. They are called conic problems because the inequalities are specified in terms of affine functions and generalized inequalities. Geometrically, the inequalities are feasible if the range of the affine mapping intersects the cone of the inequality.

The problems are of increasing expressivity and modeling power. However, roughly speaking, each added level of modeling capability comes at the cost of longer computation time. Thus one should use the least complex form for the problem at hand.

A general linear program (LP) has the form

minimize cT x + d
subject to Gx ⪯ h
           Ax = b,

where G ∈ Rm×n and A ∈ Rp×n.

A problem that subsumes both linear and quadratic programming is the second-order cone program (SOCP):

minimize fT x
subject to ‖Ai x + bi‖2 ≤ cTi x + di, i = 1, . . . , m
           Fx = g,

where x ∈ Rn is the optimization variable, Ai ∈ Rni×n, and F ∈ Rp×n. When ci = 0, i = 1, . . . , m, the SOCP is equivalent to a quadratically constrained quadratic program (QCQP), which is obtained by squaring each of the constraints. Similarly, if Ai = 0, i = 1, . . . , m, then the SOCP reduces to a (general) LP. Second-order cone programs are, however, more general than QCQPs (and of course, LPs).

A problem which subsumes linear, quadratic and second-order cone programming is called a semidefinite program (SDP), and has the form

minimize cT x
subject to x1 F1 + · · · + xn Fn + G ⪯ 0
           Ax = b,

where G, F1, . . . , Fn ∈ Sk, and A ∈ Rp×n. The inequality here is a linear matrix inequality. As shown earlier, since SOCP constraints can be written as LMI's, SDP's subsume SOCP's, and hence LP's as well. (If there are multiple LMI's, they can be stacked into one large block-diagonal LMI, where the blocks are the individual LMIs.)

We will now consider some examples which are themselves very instructive demonstrations of modeling and the power of convex optimization.

First, we show how slack variables can be used to convert a given problem into canonical form. Consider the constrained ℓ∞ (Chebychev) approximation problem

minimize ‖Ax − b‖∞
subject to Fx ⪯ g.

By introducing the extra variable t, we can rewrite the problem as:

minimize t
subject to Ax − b ⪯ t1
           Ax − b ⪰ −t1
           Fx ⪯ g,


which is an LP in the variables x ∈ Rn, t ∈ R (1 is the vector with all components 1).
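The sketch below (not from the paper) poses the same Chebychev approximation problem in CVXPY with made-up data; the modeling layer performs the slack-variable reformulation internally when given the infinity norm directly.

```python
# A sketch (assuming CVXPY and numpy) of constrained Chebychev approximation:
# minimize ||A x - b||_inf subject to F x <= g, with placeholder data.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(4)
A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
F, g = np.eye(5), np.ones(5)                 # example constraint data F x <= g

x = cp.Variable(5)
prob = cp.Problem(cp.Minimize(cp.norm(A @ x - b, "inf")),
                  [F @ x <= g])
prob.solve()
print(prob.value)
```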

might arise, consider linear programming with randomconstraints

minimize cT xsubject to Prob(aT

i x ≤ bi) ≥ η, i = 1, . . . , m

Here we suppose that the parameters ai are independentGaussian random vectors, with mean ai and covarianceΣi. We require that each constraint aT

i x ≤ bi shouldhold with a probability (or confidence) exceeding η,where η ≥ 0.5, i.e.,

Prob(aTi x ≤ bi) ≥ η.

We will show that this probability constraint can be expressed as a second-order cone constraint. Letting u = aTi x, with σ2 denoting its variance, this constraint can be written as

Prob((u − ū)/σ ≤ (bi − ū)/σ) ≥ η.

Since (u − ū)/σ is a zero mean, unit variance Gaussian variable, the probability above is simply Φ((bi − ū)/σ), where

Φ(z) = (1/√(2π)) ∫_{−∞}^{z} e^{−t²/2} dt

is the cumulative distribution function of a zero mean, unit variance Gaussian random variable. Thus the probability constraint can be expressed as

(bi − ū)/σ ≥ Φ−1(η),

or, equivalently,

ū + Φ−1(η) σ ≤ bi.

From ū = āTi x and σ = (xT Σi x)1/2 we obtain

āTi x + Φ−1(η) ‖Σi1/2 x‖2 ≤ bi.

By our assumption that η ≥ 1/2, we have Φ−1(η) ≥ 0, so this constraint is a second-order cone constraint. In summary, the stochastic LP can be expressed as the SOCP

minimize cT x
subject to āTi x + Φ−1(η) ‖Σi1/2 x‖2 ≤ bi, i = 1, . . . , m.
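The following sketch (not from the paper) writes this SOCP in CVXPY, taking Φ−1 from scipy.stats; all problem data are made-up placeholders chosen so the toy instance is bounded and feasible.

```python
# A sketch (assuming CVXPY, numpy, scipy) of the chance-constrained LP
# above as an SOCP, with placeholder data.
import cvxpy as cp
import numpy as np
from scipy.stats import norm

eta = 0.95
kappa = norm.ppf(eta)                  # Phi^{-1}(eta), >= 0 since eta >= 0.5
c = np.array([-1.0, -1.0])
a_bar = np.array([[1.0, 0.0],          # mean vectors a_bar_i
                  [0.0, 1.0],
                  [1.0, 1.0]])
b = np.array([1.0, 1.0, 1.5])
Sigma_half = 0.1 * np.eye(2)           # Sigma_i^{1/2}, the same for all i here

x = cp.Variable(2)
cons = [a_bar[i] @ x + kappa * cp.norm(Sigma_half @ x, 2) <= b[i]
        for i in range(3)]
prob = cp.Problem(cp.Minimize(c @ x), cons)
prob.solve()
print(prob.value, x.value)
```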

We will see examples of using LMI constraints below.

B. Extensions of conic programming

We now present two more canonical convex optimization formulations, which can be viewed as extensions of the above conic formulations.

A generalized linear fractional program (GLFP) has the form:

minimize f0(x)
subject to Gx ⪯ h
           Ax = b

where the objective function is given by

f0(x) = max_{i=1,...,r} (cTi x + di)/(eTi x + fi),

with dom f0 = {x | eTi x + fi > 0, i = 1, . . . , r}. The objective function is the pointwise maximum of r quasiconvex functions, and therefore quasiconvex, so this problem is quasiconvex.

A determinant maximization program (maxdet) has the form:

minimize cT x − log det G(x)
subject to G(x) ≻ 0
           F(x) ⪰ 0
           Ax = b

where F and G are linear (affine) matrix functions of x, so the inequality constraints are LMI's. By flipping signs, one can pose this as a maximization problem, which is the historic reason for the name. Since G is affine and, as shown earlier, the function −log det X is convex on the symmetric semidefinite cone Sn+, the maxdet program is a convex problem.

We now consider an example of each type.

First, we consider an optimal transmitter power allocation problem, where the variables are the transmitter powers pk, k = 1, . . . , m. We are given m transmitters and mn receivers, all at the same frequency. Transmitter i wants to transmit to its n receivers, labeled (i, j), j = 1, . . . , n (transmitter and receiver locations are fixed).

[Figure: transmitters i and k and receiver (i, j); every transmitter's signal reaches every receiver.]

Let Aijk ∈ R denote the path gain from transmitter k to receiver (i, j), and Nij ∈ R be the (self) noise power of receiver (i, j). So at receiver (i, j) the signal power is Sij = Aiji pi and the noise plus interference


power is Iij = ∑_{k≠i} Aijk pk + Nij. Therefore the signal to interference/noise ratio (SINR) is Sij/Iij.

The optimal transmitter power allocation problem is then to choose the pi to maximize the smallest SINR:

maximize min_{i,j} Aiji pi / (∑_{k≠i} Aijk pk + Nij)
subject to 0 ≤ pi ≤ pmax

This is a GLFP.
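One standard way to attack such a quasiconvex max-min problem (this approach and the code below are not spelled out in the paper) is bisection on the SINR level t: for a fixed t, the requirement SINRij ≥ t is linear in p, so each bisection step is an LP feasibility problem. The sketch assumes CVXPY and uses entirely made-up gain and noise data.

```python
# A sketch (assuming CVXPY and numpy) of solving the max-min SINR problem
# by bisection on t; for fixed t each step is an LP feasibility problem.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(7)
m, n = 3, 2                                   # transmitters, receivers per transmitter
A = rng.uniform(0.05, 0.2, size=(m, n, m))    # A[i, j, k]: gain from transmitter k to receiver (i, j)
for i in range(m):
    A[i, :, i] = 1.0                          # strong gain to a transmitter's own receivers
N = 0.1 * np.ones((m, n))                     # receiver noise powers
p_max = 1.0

def feasible(t):
    """Return a power vector achieving SINR >= t everywhere, or None."""
    p = cp.Variable(m, nonneg=True)
    cons = [p <= p_max]
    for i in range(m):
        for j in range(n):
            interference = sum(A[i, j, k] * p[k] for k in range(m) if k != i) + N[i, j]
            cons.append(A[i, j, i] * p[i] >= t * interference)   # linear in p for fixed t
    cp.Problem(cp.Minimize(0), cons).solve()
    return p.value

lo, hi = 0.0, 50.0
for _ in range(40):                           # bisection on the achievable SINR level
    mid = 0.5 * (lo + hi)
    if feasible(mid) is not None:
        lo = mid
    else:
        hi = mid
print("max-min SINR approx:", lo)
```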

Next we consider two related ellipsoid approximation problems. In addition to the use of LMI constraints, this example illustrates techniques of computing with ellipsoids, and also how the choice of representation of the ellipsoids and polyhedra can make the difference between a tractable convex problem and a hard, intractable one.

First consider computing the minimum volume ellipsoid around points v1, . . . , vK ∈ Rn. This is equivalent to finding the minimum volume ellipsoid around the polytope defined by the convex hull of those points, represented in vertex form.

[Figure: the minimum volume ellipsoid E around a set of points.]

For this problem, we choose the representation for the ellipsoid as E = {x | ‖Ax − b‖ ≤ 1}, which is parametrized by its center A−1 b and the shape matrix A = AT ≻ 0, and whose volume is proportional to det A−1. This problem immediately reduces to:

minimize log det A−1
subject to A = AT ≻ 0
           ‖A vi − b‖ ≤ 1, i = 1, . . . , K

which is a convex optimization problem in A, b (n + n(n + 1)/2 variables), and can be written as a maxdet problem.
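A direct transcription of this problem in CVXPY might look as follows (a sketch, not from the paper; minimizing log det A−1 is written as maximizing log det A, and the point set is random placeholder data).

```python
# A sketch (assuming CVXPY and numpy) of the minimum-volume covering
# ellipsoid problem, E = {x | ||A x - b|| <= 1}, around random points.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(6)
V = rng.standard_normal((30, 2))             # points v_1, ..., v_K in R^2

A = cp.Variable((2, 2), PSD=True)            # shape matrix A = A^T >= 0
b = cp.Variable(2)
cons = [cp.norm(A @ V[i] - b, 2) <= 1 for i in range(V.shape[0])]
prob = cp.Problem(cp.Maximize(cp.log_det(A)), cons)  # = minimize log det A^{-1}
prob.solve()
print(prob.value)                            # the center is A^{-1} b at the optimum
```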

Now, consider finding the maximum volume ellipsoid in a polytope given in inequality form

P = {x | aTi x ≤ bi, i = 1, . . . , L}

[Figure: the maximum volume ellipsoid E, with center d, inscribed in a polytope.]

For this problem, we represent the ellipsoid as E = {By + d | ‖y‖ ≤ 1}, which is parametrized by its center d and shape matrix B = BT ≻ 0, and whose volume is proportional to det B. Note that the containment condition means

E ⊆ P ⇐⇒ aTi (By + d) ≤ bi for all ‖y‖ ≤ 1
      ⇐⇒ sup_{‖y‖≤1} (aTi B y + aTi d) ≤ bi
      ⇐⇒ ‖B ai‖ + aTi d ≤ bi, i = 1, . . . , L

which is a convex constraint in B and d. Hence finding the maximum volume E ⊆ P is a convex problem in the variables B, d:

maximize log det B
subject to B = BT ≻ 0
           ‖B ai‖ + aTi d ≤ bi, i = 1, . . . , L
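The companion sketch below (again not from the paper, assuming CVXPY) solves this maximum-volume inscribed ellipsoid problem for the unit box, where the answer is easy to check by symmetry.

```python
# A sketch (assuming CVXPY and numpy) of the maximum-volume ellipsoid
# E = {B y + d | ||y|| <= 1} inscribed in the polytope a_i^T x <= b_i.
import cvxpy as cp
import numpy as np

# The unit box |x_i| <= 1 written as a_i^T x <= b_i.
a = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.ones(4)

B = cp.Variable((2, 2), PSD=True)            # shape matrix B = B^T >= 0
d = cp.Variable(2)                           # center
cons = [cp.norm(B @ a[i], 2) + a[i] @ d <= b[i] for i in range(4)]
prob = cp.Problem(cp.Maximize(cp.log_det(B)), cons)
prob.solve()
print(d.value)                               # expect the center of the box, about (0, 0)
```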

Note, however, that minor variations on these two problems, which are convex and hence easy, result in problems that are very difficult:
• compute the maximum volume ellipsoid inside a polyhedron given in vertex form Co{v1, . . . , vK}
• compute the minimum volume ellipsoid containing a polyhedron given in inequality form Ax ⪯ b

In fact, just checking whether a given ellipsoid E covers a polytope described in inequality form is extremely difficult (equivalent to maximizing a convex quadratic subject to linear inequalities).

C. Geometric programming

In this section we describe a family of optimization problems that are not convex in their natural form. However, they are an excellent example of how problems can sometimes be transformed to convex optimization problems, by a change of variables and a transformation of the objective and constraint functions. In addition, they have been found tremendously useful for modeling real problems, e.g., circuit design and communication networks.

A function f : Rn → R with dom f = Rn++, defined as

f(x) = c x1^{a1} x2^{a2} · · · xn^{an},


where c > 0 and ai ∈ R, is called a monomial function, or simply, a monomial. The exponents ai of a monomial can be any real numbers, including fractional or negative, but the coefficient c must be nonnegative. A sum of monomials, i.e., a function of the form

f(x) = ∑_{k=1}^{K} ck x1^{a1k} x2^{a2k} · · · xn^{ank},

where ck > 0, is called a posynomial function (with K terms), or simply, a posynomial. Posynomials are closed under addition, multiplication, and nonnegative scaling. Monomials are closed under multiplication and division. If a posynomial is multiplied by a monomial, the result is a posynomial; similarly, a posynomial can be divided by a monomial, with the result a posynomial.

An optimization problem of the form

minimize f0(x)
subject to fi(x) ≤ 1, i = 1, . . . , m
           hi(x) = 1, i = 1, . . . , p

where f0, . . . , fm are posynomials and h1, . . . , hp are monomials, is called a geometric program (GP). The domain of this problem is D = Rn++; the constraint x ≻ 0 is implicit.

Several extensions are readily handled. If f is a posynomial and h is a monomial, then the constraint f(x) ≤ h(x) can be handled by expressing it as f(x)/h(x) ≤ 1 (since f/h is a posynomial). This includes as a special case a constraint of the form f(x) ≤ a, where f is a posynomial and a > 0. In a similar way, if h1 and h2 are both nonzero monomial functions, then we can handle the equality constraint h1(x) = h2(x) by expressing it as h1(x)/h2(x) = 1 (since h1/h2 is a monomial). We can maximize a nonzero monomial objective function by minimizing its inverse (which is also a monomial).

We will now transform geometric programs to convex

problems by a change of variables and a transformation of the objective and constraint functions.

We will use the variables defined as yi = log xi, so xi = e^{yi}. If f is a monomial function of x, i.e.,

f(x) = c x1^{a1} x2^{a2} · · · xn^{an},

then

f(x) = f(e^{y1}, . . . , e^{yn}) = c (e^{y1})^{a1} · · · (e^{yn})^{an} = e^{aT y + b},

where b = log c. The change of variables yi = log xi turns a monomial function into the exponential of an affine function.

Similarly, if f is a posynomial, i.e.,

f(x) = ∑_{k=1}^{K} ck x1^{a1k} x2^{a2k} · · · xn^{ank},

then

f(x) = ∑_{k=1}^{K} e^{akT y + bk},

where ak = (a1k, . . . , ank) and bk = log ck. After the change of variables, a posynomial becomes a sum of exponentials of affine functions.

The geometric program can be expressed in terms of the new variable y as

minimize ∑_{k=1}^{K0} e^{a0kT y + b0k}
subject to ∑_{k=1}^{Ki} e^{aikT y + bik} ≤ 1, i = 1, . . . , m
           e^{giT y + hi} = 1, i = 1, . . . , p,

where aik ∈ Rn, i = 0, . . . , m, contain the exponents of the posynomial inequality constraints, and gi ∈ Rn, i = 1, . . . , p, contain the exponents of the monomial equality constraints of the original geometric program.

Now we transform the objective and constraint functions by taking the logarithm. This results in the problem

minimize f0(y) = log(∑_{k=1}^{K0} e^{a0kT y + b0k})
subject to fi(y) = log(∑_{k=1}^{Ki} e^{aikT y + bik}) ≤ 0, i = 1, . . . , m
           hi(y) = giT y + hi = 0, i = 1, . . . , p.

Since the functions fi are convex and the hi are affine, this problem is a convex optimization problem. We refer to it as a geometric program in convex form. Note that the transformation between the posynomial form geometric program and the convex form geometric program does not involve any computation; the problem data for the two problems are the same.
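As a hands-on illustration (not from the paper), the sketch below poses a tiny GP in CVXPY; its geometric-programming mode (the gp=True flag, an assumption about that package) applies essentially the log change of variables described above automatically.

```python
# A sketch (assuming CVXPY) of a tiny geometric program solved in GP mode.
import cvxpy as cp

x = cp.Variable(pos=True)
y = cp.Variable(pos=True)

objective = x * y                                  # monomial objective
constraints = [x * y + 2 / (x * y**2) <= 10,       # posynomial <= constant
               x / y == 2]                         # monomial equality
prob = cp.Problem(cp.Minimize(objective), constraints)
prob.solve(gp=True)
print(prob.value, x.value, y.value)
```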

VI. NONSTANDARD AND NONCONVEX PROBLEMS

In practice, one might encounter convex problems which do not fall into any of the canonical forms above. In this case, it is necessary to develop custom code for the problem. Developing such codes requires gradient and, for optimal performance, Hessian information. If only gradient information is available, the ellipsoid, subgradient or cutting plane methods can be used. These are reliable methods with exact stopping criteria. The same is true for interior point methods; however, these also require Hessian information. The payoff for having Hessian information is much faster convergence; in practice, they solve most problems at the cost of a dozen least squares problems of about the size of the decision variables. These methods are described in the references.

Nonconvex problems are, of course, more common than convex problems. However, convex optimization can still often be helpful in solving these as well: it can often be used to compute lower bounds, useful initial starting points, etc.

VII. CONCLUSION

In this paper we have reviewed some essentials of convex sets, functions and optimization problems. We hope that the reader is convinced that convex optimization dramatically increases the range of problems in engineering that can be modeled and solved exactly.

ACKNOWLEDGEMENT

I am deeply indebted to Stephen Boyd and Lieven Vandenberghe for generously allowing me to use material from [7] in preparing this tutorial.

REFERENCES

[1] M. S. Bazaraa, H. D. Sherali, and C. M. Shetty. Nonlinear Programming: Theory and Algorithms. John Wiley & Sons, second edition, 1993.

[2] A. Ben-Tal and A. Nemirovski. Lectures on Modern Convex Optimization: Analysis, Algorithms, and Engineering Applications. Society for Industrial and Applied Mathematics, 2001.

[3] D. P. Bertsekas. Nonlinear Programming. Athena Scientific, 1999.

[4] D. Bertsimas and J. N. Tsitsiklis. Introduction to Linear Optimization. Athena Scientific, 1997.

[5] S. Boyd and C. Barratt. Linear Controller Design: Limits of Performance. Prentice-Hall, 1991.

[6] S. Boyd, L. El Ghaoui, E. Feron, and V. Balakrishnan. Linear Matrix Inequalities in System and Control Theory. Society for Industrial and Applied Mathematics, 1994.

[7] S. P. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2003. In press. Material available at www.stanford.edu/~boyd.

[8] D. Colleran, C. Portmann, A. Hassibi, C. Crusius, S. Mohan, T. Lee, S. Boyd, and M. Hershenson. Optimization of phase-locked loop circuits via geometric programming. In Custom Integrated Circuits Conference (CICC), San Jose, California, September 2003.

[9] M. A. Dahleh and I. J. Diaz-Bobillo. Control of Uncertain Systems: A Linear Programming Approach. Prentice-Hall, 1995.

[10] J. W. Demmel. Applied Numerical Linear Algebra. Society for Industrial and Applied Mathematics, 1997.

[11] G. E. Dullerud and F. Paganini. A Course in Robust Control Theory: A Convex Approach. Springer, 2000.

[12] G. Golub and C. F. Van Loan. Matrix Computations. Johns Hopkins University Press, second edition, 1989.

[13] M. Grotschel, L. Lovasz, and A. Schrijver. Geometric Algorithms and Combinatorial Optimization. Springer, 1988.

[14] T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2001.

[15] M. del Mar Hershenson, S. P. Boyd, and T. H. Lee. Optimal design of a CMOS op-amp via geometric programming. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 20(1):1–21, 2001.

[16] J.-B. Hiriart-Urruty and C. Lemarechal. Fundamentals of Convex Analysis. Springer, 2001. Abridged version of Convex Analysis and Minimization Algorithms volumes 1 and 2.

[17] R. A. Horn and C. A. Johnson. Matrix Analysis. Cambridge University Press, 1985.

[18] N. Karmarkar. A new polynomial-time algorithm for linear programming. Combinatorica, 4(4):373–395, 1984.

[19] D. G. Luenberger. Optimization by Vector Space Methods. John Wiley & Sons, 1969.

[20] D. G. Luenberger. Linear and Nonlinear Programming. Addison-Wesley, second edition, 1984.

[21] Z.-Q. Luo. Applications of convex optimization in signal processing and digital communication. Mathematical Programming Series B, 97:177–207, 2003.

[22] O. Mangasarian. Nonlinear Programming. Society for Industrial and Applied Mathematics, 1994. First published in 1969 by McGraw-Hill.

[23] Y. Nesterov and A. Nemirovskii. Interior-Point Polynomial Methods in Convex Programming. Society for Industrial and Applied Mathematics, 1994.

[24] R. T. Rockafellar. Convex Analysis. Princeton University Press, 1970.

[25] C. Roos, T. Terlaky, and J.-Ph. Vial. Theory and Algorithms for Linear Optimization: An Interior Point Approach. John Wiley & Sons, 1997.

[26] G. Strang. Linear Algebra and its Applications. Academic Press, 1980.

[27] J. van Tiel. Convex Analysis: An Introductory Text. John Wiley & Sons, 1984.

[28] H. Wolkowicz, R. Saigal, and L. Vandenberghe, editors. Handbook of Semidefinite Programming. Kluwer Academic Publishers, 2000.

[29] S. J. Wright. Primal-Dual Interior-Point Methods. Society for Industrial and Applied Mathematics, 1997.

[30] Y. Ye. Interior Point Algorithms: Theory and Analysis. John Wiley & Sons, 1997.
