Convex Optimization & Lagrange Duality


Chee Wei Tan

CS6491 Topics in Optimization and its Applications to Computer Science

Outline

• Convex optimization

• Optimality condition

• Lagrange duality

• KKT optimality condition

• Sensitivity analysis


Convex Optimization

A convex optimization problem with variables $x$:

$$
\begin{array}{ll}
\text{minimize} & f_0(x) \\
\text{subject to} & f_i(x) \le 0, \quad i = 1, 2, \dots, m \\
& a_i^T x = b_i, \quad i = 1, 2, \dots, p
\end{array}
$$

where $f_0, f_1, \dots, f_m$ are convex functions.

• Minimize a convex objective function (or maximize a concave objective function)

• Upper-bound inequality constraints on convex functions ($\Rightarrow$ the constraint set is convex)

• Equality constraints must be affine


Convex Optimization

• Epigraph form:

$$
\begin{array}{ll}
\text{minimize} & t \\
\text{subject to} & f_0(x) - t \le 0 \\
& f_i(x) \le 0, \quad i = 1, 2, \dots, m \\
& a_i^T x = b_i, \quad i = 1, 2, \dots, p
\end{array}
$$

• Can you reformulate the Illumination Problem in Lecture 1?


• Can you reformulate the following problem (not in convex optimization form)?

$$
\begin{array}{ll}
\text{minimize} & x_1^2 + x_2^2 \\
\text{subject to} & \dfrac{x_1}{1 + x_2^2} \le 0 \\
& (x_1 + x_2)^2 = 0
\end{array}
$$

Since $x_1/(1 + x_2^2) \le 0$ holds iff $x_1 \le 0$, and $(x_1 + x_2)^2 = 0$ holds iff $x_1 + x_2 = 0$, it can be transformed into a convex optimization problem:

$$
\begin{array}{ll}
\text{minimize} & x_1^2 + x_2^2 \\
\text{subject to} & x_1 \le 0 \\
& x_1 + x_2 = 0
\end{array}
$$


Locally Optimal ⇒ Globally Optimal

Given $x$ is locally optimal for a convex optimization problem, i.e., $x$ is feasible and for some $R > 0$,

$$
f_0(x) = \inf\{\, f_0(z) \mid z \text{ is feasible}, \ \|z - x\|_2 \le R \,\}
$$

Suppose $x$ is not globally optimal, i.e., there is a feasible $y$ such that $f_0(y) < f_0(x)$.

Then $\|y - x\|_2 > R$ (otherwise local optimality would force $f_0(y) \ge f_0(x)$), so we can construct the point $z = (1 - \theta)x + \theta y$ where $\theta = \frac{R}{2\|y - x\|_2}$; note that $\|z - x\|_2 = R/2 \le R$, so $z$ lies in the local ball. By convexity of the feasible set, $z$ is feasible. By convexity of $f_0$, we have

$$
f_0(z) \le (1 - \theta) f_0(x) + \theta f_0(y) < f_0(x)
$$

which contradicts local optimality of $x$.

Therefore, there exists no feasible $y$ such that $f_0(y) < f_0(x)$.


Optimality Condition for Differentiable $f_0$

$x$ is optimal for a convex optimization problem iff $x$ is feasible and, for all feasible $y$:

$$
\nabla f_0(x)^T (y - x) \ge 0
$$

$-\nabla f_0(x)$ defines a supporting hyperplane to the feasible set at $x$

Unconstrained convex optimization: the condition reduces to

$$
\nabla f_0(x) = 0
$$

Proof: take $y = x - t\,\nabla f_0(x)$ where $t \in \mathbb{R}_+$. For small enough $t$, $y$ is feasible, so $\nabla f_0(x)^T (y - x) = -t\,\|\nabla f_0(x)\|_2^2 \ge 0$. Thus $\nabla f_0(x) = 0$.


Unconstrained Quadratic Optimization

Minimize $f_0(x) = \frac{1}{2} x^T P x + q^T x + r$

$P$ is positive semidefinite, so it is a convex optimization problem

$x$ minimizes $f_0$ iff $(P, q)$ satisfy this linear equality:

$$
\nabla f_0(x) = P x + q = 0
$$

• If $q \notin \mathcal{R}(P)$: no solution; $f_0$ is unbounded below

• If $q \in \mathcal{R}(P)$ and $P \succ 0$: there is a unique minimizer $x^* = -P^{-1} q$

• If $q \in \mathcal{R}(P)$ and $P$ is singular: the set of optimal $x$ is $-P^{\dagger} q + \mathcal{N}(P)$
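This case analysis is easy to check numerically; a minimal sketch with hypothetical data (a singular positive semidefinite $P$):

```python
import numpy as np

# minimize (1/2) x^T P x + q^T x + r with P positive semidefinite (hypothetical data)
P = np.array([[2.0, 0.0],
              [0.0, 0.0]])            # singular PSD matrix
q = np.array([-4.0, 0.0])             # here q lies in range(P)

# least-squares solve of Px = -q returns the pseudoinverse solution -P^+ q
x_star = np.linalg.lstsq(P, -q, rcond=None)[0]

if np.allclose(P @ x_star + q, 0):
    # any point in x_star + N(P) is also optimal
    print("a minimizer:", x_star)
else:
    # q is not in range(P): the objective is unbounded below
    print("unbounded below")
```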


Equality Constrained Convex Optimization

$$
\begin{array}{ll}
\text{minimize} & f_0(x) \\
\text{subject to} & A x = b
\end{array}
$$

Assume a nonempty feasible set. Since $\nabla f_0(x)^T (y - x) \ge 0$ for all $y$ with $Ay = b$, and every feasible $y$ can be written as $y = x + v$ for some $v \in \mathcal{N}(A)$, the optimality condition is:

$$
\nabla f_0(x)^T v \ge 0, \quad \forall v \in \mathcal{N}(A)
$$

Since $\mathcal{N}(A)$ is a subspace (if $v$ is allowed, so is $-v$), this implies $\nabla f_0(x)^T v = 0$, i.e., $\nabla f_0(x)$ is orthogonal to $\mathcal{N}(A)$. Thus $\nabla f_0(x) \in \mathcal{R}(A^T)$, i.e., there exists $\nu$ such that

$$
\nabla f_0(x) + A^T \nu = 0
$$

Conclusion: $x$ is optimal iff $Ax = b$ and $\exists \nu$ s.t. $\nabla f_0(x) + A^T \nu = 0$
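For a quadratic objective this optimality condition is a single linear system; a small sketch under that assumption, with hypothetical data:

```python
import numpy as np

# For f0(x) = (1/2) x^T P x + q^T x, the condition
#   grad f0(x) + A^T nu = 0,  Ax = b
# is the linear (KKT) system  [P A^T; A 0] [x; nu] = [-q; b].
P = np.array([[2.0, 0.0],
              [0.0, 2.0]])
q = np.array([-2.0, 0.0])
A = np.array([[1.0, 1.0]])            # one equality constraint: x1 + x2 = 1
b = np.array([1.0])

n, p = P.shape[0], A.shape[0]
kkt = np.block([[P, A.T],
                [A, np.zeros((p, p))]])
sol = np.linalg.solve(kkt, np.concatenate([-q, b]))
x_star, nu_star = sol[:n], sol[n:]

print("x* =", x_star, " nu* =", nu_star)
assert np.allclose(P @ x_star + q + A.T @ nu_star, 0)   # stationarity
assert np.allclose(A @ x_star, b)                       # feasibility
```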


Duality Mentality

Bound or solve an optimization problem via a different optimization problem!

We'll develop the basic Lagrange duality theory for a general optimization problem, then specialize to convex optimization


Lagrange Dual Function

An optimization problem in standard form:

$$
\begin{array}{ll}
\text{minimize} & f_0(x) \\
\text{subject to} & f_i(x) \le 0, \quad i = 1, 2, \dots, m \\
& h_i(x) = 0, \quad i = 1, 2, \dots, p
\end{array}
$$

Variables: $x \in \mathbb{R}^n$. Assume a nonempty feasible set

Optimal value: $p^*$. Optimizer: $x^*$

Idea: augment the objective with a weighted sum of the constraints

Lagrangian: $L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^m \lambda_i f_i(x) + \sum_{i=1}^p \nu_i h_i(x)$

Lagrange multipliers (dual variables): $\lambda \succeq 0$, $\nu$

Lagrange dual function: $g(\lambda, \nu) = \inf_x L(x, \lambda, \nu)$
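As a quick worked illustration (not from the slides), take the scalar problem: minimize $x^2$ subject to $1 - x \le 0$, whose optimal value is $p^* = 1$:

$$
L(x, \lambda) = x^2 + \lambda(1 - x), \qquad
g(\lambda) = \inf_x L(x, \lambda) = \lambda - \frac{\lambda^2}{4}
$$

(the infimum is attained at $x = \lambda/2$). Indeed $g(\lambda) \le 1$ for every $\lambda \ge 0$, with equality at $\lambda^* = 2$.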


Lower Bound on Optimal Value

Claim: $g(\lambda, \nu) \le p^*$ for all $\lambda \succeq 0$, $\nu$

Proof: Consider a feasible $\tilde{x}$:

$$
L(\tilde{x}, \lambda, \nu) = f_0(\tilde{x}) + \sum_{i=1}^m \lambda_i f_i(\tilde{x}) + \sum_{i=1}^p \nu_i h_i(\tilde{x}) \le f_0(\tilde{x})
$$

since $f_i(\tilde{x}) \le 0$, $\lambda_i \ge 0$, and $h_i(\tilde{x}) = 0$

Hence, $g(\lambda, \nu) \le L(\tilde{x}, \lambda, \nu) \le f_0(\tilde{x})$ for all feasible $\tilde{x}$

Therefore, $g(\lambda, \nu) \le p^*$


Lagrange Dual Function and Conjugate Function

• Lagrange dual function: $g(\lambda, \nu)$

• Conjugate function: $f^*(y) = \sup_{x \in \operatorname{dom} f} (y^T x - f(x))$

Consider a linearly constrained optimization problem:

$$
\begin{array}{ll}
\text{minimize} & f_0(x) \\
\text{subject to} & A x \preceq b \\
& C x = d
\end{array}
$$

$$
\begin{aligned}
g(\lambda, \nu) &= \inf_x \left( f_0(x) + \lambda^T (A x - b) + \nu^T (C x - d) \right) \\
&= -b^T \lambda - d^T \nu + \inf_x \left( f_0(x) + (A^T \lambda + C^T \nu)^T x \right) \\
&= -b^T \lambda - d^T \nu - f_0^*(-A^T \lambda - C^T \nu)
\end{aligned}
$$


Example

We'll use the simplest version of entropy maximization as our example for the rest of this lecture on duality. Entropy maximization is an important basic problem in information theory:

$$
\begin{array}{ll}
\text{minimize} & f_0(x) = \sum_{i=1}^n x_i \log x_i \\
\text{subject to} & A x \preceq b, \quad \mathbf{1}^T x = 1
\end{array}
$$

Since the conjugate function of $u \log u$ is $e^{y-1}$, and $f_0$ is a sum of independent per-coordinate terms (so its conjugate is the sum of per-coordinate conjugates), we have

$$
f_0^*(y) = \sum_{i=1}^n e^{y_i - 1}
$$
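A short derivation of that conjugate, added here for completeness (a standard computation):

$$
f^*(y) = \sup_{u > 0} \left( y u - u \log u \right), \qquad
\frac{d}{du}\left( y u - u \log u \right) = y - \log u - 1 = 0
\;\Rightarrow\; u = e^{y-1},
$$

so $f^*(y) = y\,e^{y-1} - e^{y-1}(y - 1) = e^{y-1}$.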


Therefore, the dual function of entropy maximization is

$$
g(\lambda, \nu) = -b^T \lambda - \nu - e^{-\nu - 1} \sum_{i=1}^n e^{-a_i^T \lambda}
$$

where $a_i$ are the columns of $A$


Lagrange Dual Problem

The lower bound from the Lagrange dual function depends on $(\lambda, \nu)$. What is the best lower bound that can be obtained from the Lagrange dual function?

$$
\begin{array}{ll}
\text{maximize} & g(\lambda, \nu) \\
\text{subject to} & \lambda \succeq 0
\end{array}
$$

This is the Lagrange dual problem with dual variables $(\lambda, \nu)$

Always a convex optimization problem! (The dual objective is always a concave function, since it is the infimum of a family of affine functions in $(\lambda, \nu)$)

Denote the optimal value of the Lagrange dual problem by $d^*$


Weak Duality

What is the relationship between $d^*$ and $p^*$?

Weak duality always holds (even if the primal problem is not convex):

$$
d^* \le p^*
$$

Optimal duality gap:

$$
p^* - d^*
$$

Efficient generation of lower bounds through the (convex) dual problem


Strong Duality

Strong duality (zero optimal duality gap):

$$
d^* = p^*
$$

If strong duality holds, solving the dual is 'equivalent' to solving the primal. But strong duality does not always hold

Convexity and constraint qualifications $\Rightarrow$ strong duality

A simple constraint qualification is Slater's condition: there exists a strictly feasible point, i.e., some feasible $x$ with $f_i(x) < 0$ for all non-affine $f_i$

Another reason why convex optimization is 'easy'


Example: Entropy Maximization

Primal optimization problem (variables $x$):

$$
\begin{array}{ll}
\text{minimize} & f_0(x) = \sum_{i=1}^n x_i \log x_i \\
\text{subject to} & A x \preceq b, \quad \mathbf{1}^T x = 1
\end{array}
$$

Dual optimization problem (variables $\lambda, \nu$):

$$
\begin{array}{ll}
\text{maximize} & -b^T \lambda - \nu - e^{-\nu - 1} \sum_{i=1}^n e^{-a_i^T \lambda} \\
\text{subject to} & \lambda \succeq 0
\end{array}
$$

Analytically maximizing over the unconstrained $\nu$ $\Rightarrow$ a simplified dual optimization problem (variables $\lambda$), and strong duality holds:

$$
\begin{array}{ll}
\text{maximize} & -b^T \lambda - \log \sum_{i=1}^n \exp(-a_i^T \lambda) \\
\text{subject to} & \lambda \succeq 0
\end{array}
$$
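The simplified dual is smooth and concave, so a generic solver handles it; a sketch with hypothetical data (the $a_i$ are the columns of $A$):

```python
import numpy as np
from scipy.optimize import minimize
from scipy.special import logsumexp

# maximize  -b^T lam - log sum_i exp(-a_i^T lam)   subject to  lam >= 0
rng = np.random.default_rng(0)
m, n = 3, 5                           # m inequality constraints, n variables
A = rng.standard_normal((m, n))       # hypothetical problem data
b = np.full(m, 0.5)

def neg_dual(lam):
    # negative of the concave dual objective (so we can minimize)
    return b @ lam + logsumexp(-A.T @ lam)

res = minimize(neg_dual, x0=np.zeros(m), method="L-BFGS-B",
               bounds=[(0.0, None)] * m)    # enforce lam >= 0
lam_star, d_star = res.x, -res.fun
print("lam* =", lam_star, " d* =", d_star)
```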


Saddle Point Interpretation

Assume no equality constraints. We can express the primal optimal value as

$$
p^* = \inf_x \sup_{\lambda \succeq 0} L(x, \lambda)
$$

By definition of the dual optimal value:

$$
d^* = \sup_{\lambda \succeq 0} \inf_x L(x, \lambda)
$$

Weak duality (max-min inequality):

$$
\sup_{\lambda \succeq 0} \inf_x L(x, \lambda) \le \inf_x \sup_{\lambda \succeq 0} L(x, \lambda)
$$


Strong duality (saddle point property):

$$
\sup_{\lambda \succeq 0} \inf_x L(x, \lambda) = \inf_x \sup_{\lambda \succeq 0} L(x, \lambda)
$$


Complementary Slackness

Assume strong duality holds:

$$
\begin{aligned}
f_0(x^*) &= g(\lambda^*, \nu^*) \\
&= \inf_x \left( f_0(x) + \sum_{i=1}^m \lambda_i^* f_i(x) + \sum_{i=1}^p \nu_i^* h_i(x) \right) \\
&\le f_0(x^*) + \sum_{i=1}^m \lambda_i^* f_i(x^*) + \sum_{i=1}^p \nu_i^* h_i(x^*) \\
&\le f_0(x^*)
\end{aligned}
$$


Complementary Slackness

So the two inequalities must hold with equality. This implies:

$$
\lambda_i^* f_i(x^*) = 0, \quad i = 1, 2, \dots, m
$$

Complementary slackness property:

$$
\lambda_i^* > 0 \;\Rightarrow\; f_i(x^*) = 0, \qquad
f_i(x^*) < 0 \;\Rightarrow\; \lambda_i^* = 0
$$


KKT Optimality Conditions

Since $x^*$ minimizes $L(x, \lambda^*, \nu^*)$ over $x$, we have

$$
\nabla f_0(x^*) + \sum_{i=1}^m \lambda_i^* \nabla f_i(x^*) + \sum_{i=1}^p \nu_i^* \nabla h_i(x^*) = 0
$$

Karush-Kuhn-Tucker optimality conditions:

$$
\begin{gathered}
f_i(x^*) \le 0, \quad h_i(x^*) = 0, \quad \lambda_i^* \ge 0 \\
\lambda_i^* f_i(x^*) = 0 \\
\nabla f_0(x^*) + \sum_{i=1}^m \lambda_i^* \nabla f_i(x^*) + \sum_{i=1}^p \nu_i^* \nabla h_i(x^*) = 0
\end{gathered}
$$
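These conditions are straightforward to verify numerically at a candidate point; a tiny check on a hypothetical problem (minimize $x_1^2 + x_2^2$ subject to $1 - x_1 - x_2 \le 0$, where symmetry gives $x^* = (1/2, 1/2)$ and stationarity gives $\lambda^* = 1$):

```python
import numpy as np

# minimize x1^2 + x2^2  subject to  f1(x) = 1 - x1 - x2 <= 0  (hypothetical)
x_star = np.array([0.5, 0.5])
lam_star = 1.0

grad_f0 = 2 * x_star                    # gradient of the objective at x*
grad_f1 = np.array([-1.0, -1.0])        # gradient of f1 at x*
f1 = 1 - x_star.sum()

assert f1 <= 1e-12                      # primal feasibility
assert lam_star >= 0                    # dual feasibility
assert abs(lam_star * f1) <= 1e-12      # complementary slackness
assert np.allclose(grad_f0 + lam_star * grad_f1, 0)   # stationarity
print("KKT conditions hold at x* =", x_star)
```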


KKT Optimality Conditions

• For any optimization problem (with differentiable objective and constraint functions) for which strong duality holds, the KKT conditions are a necessary condition for primal-dual optimality

• For a convex optimization problem (with differentiable objective and constraint functions) satisfying Slater's condition, the KKT conditions are also a sufficient condition for primal-dual optimality (useful for theoretical and numerical purposes)


Example: Entropy Maximization

Primal optimization problem (variables $x$):

$$
\begin{array}{ll}
\text{minimize} & f_0(x) = \sum_{i=1}^n x_i \log x_i \\
\text{subject to} & A x \preceq b, \quad \mathbf{1}^T x = 1
\end{array}
$$

Dual optimization problem (variables $\lambda, \nu$):

$$
\begin{array}{ll}
\text{maximize} & -b^T \lambda - \nu - e^{-\nu - 1} \sum_{i=1}^n e^{-a_i^T \lambda} \\
\text{subject to} & \lambda \succeq 0
\end{array}
$$

Having solved the dual problem, recover the optimal primal variables as the minimizer of

$$
L(x, \lambda^*, \nu^*) = \sum_{i=1}^n x_i \log x_i + \lambda^{*T}(A x - b) + \nu^*(\mathbf{1}^T x - 1)
$$


i.e., $x_i^* = 1/\exp(a_i^T \lambda^* + \nu^* + 1)$

If the $x^*$ above is primal feasible, it is the optimal primal point. Otherwise, the primal optimum is not attained
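Continuing the dual-solver sketch above (reusing its hypothetical A, b, and lam_star): eliminating $\nu^*$ analytically turns the recovery formula into a softmax of $-A^T \lambda^*$, so the normalization $\mathbf{1}^T x^* = 1$ is automatic:

```python
import numpy as np
from scipy.special import softmax

# x*_i = exp(-a_i^T lam* - nu* - 1); substituting the optimal
# nu* = log(sum_j exp(-a_j^T lam*)) - 1 yields a softmax of -A^T lam*.
x_star = softmax(-A.T @ lam_star)      # A, lam_star from the dual-solver sketch
print("x* =", x_star, " 1^T x* =", x_star.sum())

# x* is the primal optimum iff it is primal feasible:
print("primal feasible:", bool(np.all(A @ x_star <= b + 1e-9)))
```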


Example: Maximizing Channel Capacity as Water-filling

$$
\begin{array}{ll}
\text{maximize} & \displaystyle\sum_{i=1}^n \log\left( 1 + \frac{x_i}{\alpha_i} \right) \\
\text{subject to} & x \succeq 0, \quad \mathbf{1}^T x = 1
\end{array}
$$

Variables: $x$ (powers). Constants: $\alpha_i > 0$ (channel noise)

KKT conditions:

$$
\begin{gathered}
x^* \succeq 0, \quad \mathbf{1}^T x^* = 1, \quad \lambda^* \succeq 0 \\
\lambda_i^* x_i^* = 0, \quad -1/(\alpha_i + x_i^*) - \lambda_i^* + \nu^* = 0
\end{gathered}
$$


Since $\lambda^*$ acts as a slack variable, the conditions reduce to

$$
\begin{gathered}
x^* \succeq 0, \quad \mathbf{1}^T x^* = 1 \\
x_i^* \left( \nu^* - 1/(\alpha_i + x_i^*) \right) = 0, \quad \nu^* \ge 1/(\alpha_i + x_i^*)
\end{gathered}
$$

If $\nu^* < 1/\alpha_i$, then $x_i^* > 0$, so $x_i^* = 1/\nu^* - \alpha_i$. Otherwise, $x_i^* = 0$

Thus, $x_i^* = [1/\nu^* - \alpha_i]^+$ where $\nu^*$ is such that $\sum_i x_i^* = 1$

• Draw a picture to interpret this optimal solution

• Design an algorithm to compute this optimal solution (one bisection sketch follows below)
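One possible algorithm (a sketch, not necessarily the intended one): since the total allocated power $\sum_i [w - \alpha_i]^+$ is nondecreasing in the water level $w = 1/\nu^*$, bisection on $w$ finds the level that spends exactly the power budget:

```python
import numpy as np

def water_filling(alpha, total_power=1.0, tol=1e-10):
    """Return x*_i = [w - alpha_i]^+ with w chosen so sum_i x*_i = total_power."""
    alpha = np.asarray(alpha, dtype=float)
    lo = alpha.min()                    # this level allocates zero power
    hi = alpha.min() + total_power      # this level allocates >= total_power
    while hi - lo > tol:
        w = 0.5 * (lo + hi)
        if np.maximum(w - alpha, 0.0).sum() < total_power:
            lo = w                      # water level too low
        else:
            hi = w                      # water level too high
    return np.maximum(0.5 * (lo + hi) - alpha, 0.0)

x_star = water_filling([0.1, 0.4, 0.7])     # hypothetical noise levels
print("x* =", x_star, " total power =", x_star.sum())
```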


Global Sensitivity Analysis

Perturbed optimization problem:

$$
\begin{array}{ll}
\text{minimize} & f_0(x) \\
\text{subject to} & f_i(x) \le u_i, \quad i = 1, 2, \dots, m \\
& h_i(x) = v_i, \quad i = 1, 2, \dots, p
\end{array}
$$

Optimal value $p^*(u, v)$ as a function of the parameters $(u, v)$

Assume strong duality and that the dual optimum is attained. For any $x$ feasible for the perturbed problem:

$$
\begin{aligned}
p^*(0, 0) = g(\lambda^*, \nu^*) &\le f_0(x) + \sum_i \lambda_i^* f_i(x) + \sum_i \nu_i^* h_i(x) \\
&\le f_0(x) + \lambda^{*T} u + \nu^{*T} v
\end{aligned}
$$


Global Sensitivity Analysis

Minimizing over all $x$ feasible for the perturbed problem gives

$$
p^*(u, v) \ge p^*(0, 0) - \lambda^{*T} u - \nu^{*T} v
$$

• If $\lambda_i^*$ is large, tightening the $i$th constraint ($u_i < 0$) will increase the optimal value greatly

• If $\lambda_i^*$ is small, loosening the $i$th constraint ($u_i > 0$) will reduce the optimal value only slightly


Local Sensitivity Analysis

Assume that $p^*(u, v)$ is differentiable at $(0, 0)$. Then

$$
\lambda_i^* = -\frac{\partial p^*(0, 0)}{\partial u_i}, \qquad
\nu_i^* = -\frac{\partial p^*(0, 0)}{\partial v_i}
$$

Shadow price interpretation of the Lagrange dual variables

A small $\lambda_i^*$ means tightening or loosening the $i$th constraint will not change the optimal value by much
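The identity can be sanity-checked by finite differences on a toy problem (hypothetical): minimize $(x - 2)^2$ subject to $x \le u$. For $u < 2$ the constraint is active, $p^*(u) = (u - 2)^2$, and stationarity gives $\lambda^* = 2(2 - u)$, so at $u = 0$ we expect $\lambda^* = 4$:

```python
# p*(u) for: minimize (x - 2)^2 subject to x <= u  (closed form)
def p_star(u):
    x = min(u, 2.0)                 # minimizer: constrained whenever u < 2
    return (x - 2.0) ** 2

h = 1e-6
shadow_price = -(p_star(h) - p_star(-h)) / (2 * h)   # -dp*/du at u = 0
print("finite difference:", shadow_price, " expected lam* = 4")
assert abs(shadow_price - 4.0) < 1e-4
```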


Summary

• Convexity mentality. Convex optimization is 'nice' for several reasons: a local optimum is a global optimum, the optimal duality gap is zero (under mild conditions), and the KKT optimality conditions are necessary and sufficient

• Duality mentality. We can always bound the primal through the dual, and sometimes indirectly solve the primal through the dual

• Primal-dual: where is the optimum, and how sensitive is it to perturbations in the problem parameters?

Reading assignment: Sections 4.1-4.2 and 5.1-5.7 of the textbook.
