Page 1

Interior Point Methods

for Linear, Quadratic

and Nonlinear Programming

Turin 2008

Jacek Gondzio

Lecture 1:

Convexity

1

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

What’s to come?

IPMs for Optimization

• Convexity Theory

• Duality Theory

• Newton Method (self-concordant barriers)

• Interior Point Methods for LP, QP and NLP

(motivation, theory, polynomial complexity)

Nonlinear Optimization

• Linesearch methods

• Trust region methods

Optimization Relies on Linear Algebra

• Positive definite, indefinite, quasidefinite sys-

tems, Cholesky factorization

• Sparse Matrix Techniques

– LU decomp. (unsymmetric matrices)

– Cholesky decomp. (symmetric matrices)

• Reordering for sparsity

(minimum degree, nested dissection)

Applications

• Data mining: Support Vector Machines

• Markowitz portfolio optimization

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Convex Optimization

Consider the general optimization problem

min f(x)

s.t. g(x) ≤ 0,

where x ∈ Rn, and f : Rn 7→ R and g : Rn 7→ Rm

are convex, twice differentiable.

Basic Assumptions:

f and g are convex

⇒ If there exists a local minimum then it is a

global one.

f and g are twice differentiable

⇒ We can use the second order Taylor

approximations of them.

3

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Glossary

LP: Linear Programming

both f and g are linear.

QP: Quadratic Programming

f is quadratic and g is linear.

NLP: Nonlinear Programming

f or g is nonlinear.

SDP: Semidefinite Programming

f, g are functions of positive definite matrices.

4

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Convexity

Convexity is a key property in optimization.

Def. A set C ⊂ Rn is convex if

λx + (1 − λ)y ∈ C, ∀x, y ∈ C, ∀λ ∈ [0,1].

(Figure: a convex set and a nonconvex set.)

Def. Let C be a convex subset of Rn.

A function f : C 7→ R is convex if

f(λx+(1−λ)y)≤λf(x)+(1−λ)f(y), ∀x, y∈C, ∀λ∈[0,1].

(Figure: a convex function and a nonconvex function.)

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Convexity (cnt’d)

Def. Let C be a convex subset of Rn.

A function f : C 7→ R is concave if

f(λx+(1−λ)y)≥λf(x)+(1−λ)f(y), ∀x, y∈C, ∀λ∈[0,1].

Remark. A function f : C 7→ R is concave if and

only if function −f is convex.

Def. Let C be a convex subset of Rn.

A function f : C 7→ R is strictly convex if

f(λx+(1−λ)y)<λf(x)+(1−λ)f(y), ∀x,y∈C, ∀λ∈(0,1).

Def. Let C be a convex subset of Rn.

A function f : C 7→ R is strictly concave if

f(λx+(1−λ)y)>λf(x)+(1−λ)f(y), ∀x,y∈C, ∀λ∈(0,1).

6

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Convexity and Optimization

Consider a problem

min f(x) s.t. x ∈ X,

where X is a set of feasible solutions

and f : X → R is an objective function.

Def. A vector x̂ is a local minimum of f if

∃ε > 0 such that f(x̂) ≤ f(x), ∀x such that ‖x − x̂‖ < ε.

Def. A vector x̂ is a global minimum of f if

f(x̂) ≤ f(x), ∀x ∈ X.

Lemma. If X is a convex set and f : X 7→ R

is a convex function, then a local minimum is a

global minimum.

Proof. Suppose that x is a local minimum, but not a global one. Then ∃y ≠ x such that f(y) < f(x). From convexity of f, we have, ∀λ ∈ [0,1],

f((1−λ)x + λy) ≤ (1−λ)f(x) + λf(y) < (1−λ)f(x) + λf(x) = f(x).

In particular, for a sufficiently small λ, the point z = (1−λ)x + λy lies in the ε-neighbourhood of x and f(z) < f(x). This contradicts the assumption that x is a local minimum.

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Properties

1. For any collection {Ci | i ∈ I} of convex sets,

the intersection ⋂i∈I Ci is convex.

2. The vector sum {x1 + x2 | x1 ∈ C1, x2 ∈ C2} of two convex sets C1 and C2 is convex.

3. The image of a convex set under a linear

transformation is convex.

4. If C is a convex set and f : C 7→ R is a convex

function, the level sets {x ∈ C | f(x) ≤ α} and

{x ∈ C | f(x) < α} are convex for all scalars α.

5. For any collection {fi : C 7→ R | i ∈ I} of

convex functions, the weighted sum, with pos-

itive weights wi > 0, i ∈ I, i.e. the function

f = ∑i∈I wi fi : C 7→ R, is convex.

6. If I is an index set, C ∈ Rn is a convex set,

and fi : C 7→ R is convex ∀i ∈ I, then the function

h : C 7→ R defined by

h(x) = supi∈I fi(x)

is also convex.

Page 2

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Differentiable Convex Fnctns

7. Let C ∈ Rn be a convex set and f : C 7→ R be

differentiable over C.

(a) The function f is convex if and only if

f(y) ≥ f(x) + ∇Tf(x)(y − x), ∀x, y ∈ C.

(b) If the inequality is strict for x ≠ y, then f is

strictly convex.

8. Let C ∈ Rn be a convex set and f : C 7→ R be

twice continuously differentiable over C.

(a) If ∇2f(x) is positive semidefinite for all x ∈ C,

then f is convex.

(b) If ∇2f(x) is positive definite for all x ∈ C,

then f is strictly convex.

(c) If f is convex, then ∇2f(x) is positive semi-

definite for all x ∈ C.

9. Let C ∈ Rn be a convex set and Q a square

matrix. Let f(x) = xTQ x be a quadratic function

f : C 7→ R.

(a) f is convex iff Q is positive semidefinite.

(b) f is strictly convex iff Q is positive definite.

9

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Proof of Property 4

Define

Xα = {x ∈ C : f(x) ≤ α}.

We will prove that Xα is convex.

Take any x, y ∈ Xα. From the definition of Xα

we get that f(x) ≤ α and f(y) ≤ α.

Take any λ ∈ [0,1] and define z = (1− λ)x + λy.

From the convexity of f we get

f(z) = f((1−λ)x+λy)

≤ (1−λ)f(x)+λf(y)

≤ (1−λ)α+λα = α.

Hence z ∈ Xα which completes the proof.

The proof for the strict inequality is identical.

10

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Proof of Property 7 (a)

Part 1 ( ⇒ )

Take any x, y ∈ C, and any λ ∈ [0,1].

From convexity of f we get

f(x + λ(y − x)) ≤ (1 − λ)f(x) + λf(y).

Hence

f(x+λ(y−x))−f(x)≤λ(f(y)−f(x))

and

[f(x + λ(y − x)) − f(x)] / λ ≤ f(y) − f(x).

Let λ → 0+. Then the left hand side becomes

∇Tf(x)(y−x) (a derivative of f in direction y−x)

implying

∇Tf(x)(y − x) ≤ f(y) − f(x),

which completes this part of the proof.

11

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Proof of Property 7 (a)

Part 2 ( ⇐ )

Take any x, y ∈ C, and any λ ∈ [0,1].

Let z = λx + (1 − λ)y.

Since x − z = x − y − λ(x − y), we have

f(x) ≥ f(z) + ∇Tf(z)(1 − λ)(x − y).

For y − z = −λ(x − y), we have

f(y) ≥ f(z) + ∇Tf(z)(−λ)(x − y).

Having multiplied the first inequality by λ and

the second by 1 − λ and having added them we

get

λf(x) + (1 − λ)f(y) ≥ f(z),

which proves the convexity of f .

12

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

More on Convexity

Def. Let C be a convex subset of Rn.

A function f : C 7→ R is quasi-convex if

f(λx+(1−λ)y)≤max{f(x),f(y)}, ∀x, y∈C, ∀λ∈[0,1].

(Figure: a quasi-convex function and a quasi-concave function.)

Lemma. Let C be a nonempty convex set. A

function f : C 7→ R is quasi-convex if and only if

the level set Sα = {x ∈ C | f(x) ≤ α} is convex for every real number α.

Def. Let C be a convex subset of Rn. A differentiable function f : C 7→ R is called pseudo-convex if for any x, y ∈ C, the inequality

∇Tf(x)(y − x) ≥ 0

implies that f(y) ≥ f(x).

13

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Linear Programming

Consider a Linear Program (LP)

min cTx

s. t. Ax = b,

x ≥ 0,

where c, x ∈ Rn, b ∈ Rm, A ∈ Rm×n.

Matrix A has a full row rank, m (m ≤ n).

Let P be the (primal) feasible set:

P = {x ∈ Rn | Ax = b, x ≥ 0}, and P0 be the (primal) strictly feasible set:

P0 = {x ∈ Rn |Ax = b, x > 0}.

Lemma. P is a convex set.

Proof. Note that a linear function is convex.

Thus P is an intersection of convex sets and from

Property 1 it is convex.

Corollary. LP is a convex optimization problem.

Proof. The objective function is linear hence

convex. From Lemma, the feasible set of an LP

is also convex.

14

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Convex Quadratic Program

Def. A matrix H ∈ Rn×n is positive definite if

xTH x > 0 for any x ≠ 0.

Example: H1 is positive definite, H2 is not.

H1 = [ 2  3 ]        H2 = [ 2  3 ]
     [ 3  5 ]             [ 3  4 ]

Indeed:

f1(x1, x2) = 2x1² + 6x1x2 + 5x2² = 2(x1 + (3/2)x2)² + (1/2)x2² ≥ 0,

f2(x1, x2) = 2x1² + 6x1x2 + 4x2² = 2(x1 + (3/2)x2)² − (1/2)x2²,

and this does not have to be nonnegative.
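A quick numerical check of these claims (an illustrative Python/NumPy sketch; the eigenvalue test is just one of several equivalent criteria for positive definiteness):

```python
import numpy as np

# H1 and H2 from the example above
H1 = np.array([[2.0, 3.0], [3.0, 5.0]])
H2 = np.array([[2.0, 3.0], [3.0, 4.0]])

for name, H in [("H1", H1), ("H2", H2)]:
    eigs = np.linalg.eigvalsh(H)            # eigenvalues of a symmetric matrix
    print(name, "eigenvalues:", eigs, "positive definite:", bool(np.all(eigs > 0)))

# A vector with x^T H2 x < 0: take x2 = 1, x1 = -3/2 to kill the square term
x = np.array([-1.5, 1.0])
print("x^T H2 x =", x @ H2 @ x)              # prints -0.5
```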

From Property 9, the function f(x) = xTQ x is convex iff Q is positive semidefinite. The QP

min cTx + (1/2)xTQ x
s.t. Ax = b,
     x ≥ 0,

is a convex optimization problem iff Q is positive semidefinite.

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Convex Nonlinear Program

Consider a general optimization problem

min f(x)

s.t. g(x) ≤ 0,

where x∈Rn, and f : Rn 7→R and g : Rn 7→Rm.

Lemma. If f : Rn 7→ R and g : Rn 7→ Rm are

convex, then the above problem is convex.

Proof. Since the objective function f is convex,

we only need to prove that the feasible set of the

above problem

X = {x ∈ Rn : g(x) ≤ 0}

is convex. Define for i = 1,2, ..., m

Xi = {x ∈ Rn : gi(x) ≤ 0}.

From Property 4, Xi is convex for all i.

We observe that

X = {x ∈ Rn : gi(x) ≤ 0, ∀i = 1..m} = ⋂i Xi,

i.e., X is an intersection of convex sets and from

Property 1, X is a convex set.

16

Page 3

Interior Point Methods

for Linear, Quadratic

and Nonlinear Programming

Turin 2008

Jacek Gondzio

Lecture 2:

Duality

1

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Lagrangian

Consider a general optimization problem

min f(x)

s.t. g(x) ≤ 0, (1)

x ∈ X ⊆ Rn,

where f : Rn 7→R and g : Rn 7→Rm.

The set X is arbitrary; it may include, for exam-

ple, an integrality constraint.

The constraint g(x) ≤ 0 is understood as:

gi(x) ≤ 0, ∀i = 1,2, ..., m,

i.e., as m inequalities.

Let x* be an optimal solution of (1) and define

f* = f(x*).

Introduce the Lagrange multiplier yi ≥ 0 for

every inequality constraint gi(x) ≤ 0.

Define y = (y1, . . . , ym)T and the Lagrangian

L(x, y) = f(x) + yTg(x),

y are also called dual variables.

2

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Lagrangian Duality

Consider the problem

LD(y) = min x∈X L(x, y),   x ∈ X ⊆ Rn.

Its optimal solution x depends on y and so does the optimal objective LD(y).

Lemma. For any y ≥ 0, LD(y) is a lower bound on f* (the optimal value of (1)), i.e.,

f* ≥ LD(y), ∀y ≥ 0.

Proof.

f* = min {f(x) | g(x) ≤ 0, x ∈ X}
   ≥ min {f(x) + yTg(x) | g(x) ≤ 0, y ≥ 0, x ∈ X}
   ≥ min {f(x) + yTg(x) | y ≥ 0, x ∈ X}
   = LD(y).

Corollary. f* ≥ max y≥0 LD(y), i.e.,

f* ≥ max y≥0 min x∈X L(x, y).

3

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Lagrangian Duality

Observe that:

If ∃i gi(x) > 0, then

max y≥0 L(x, y) = +∞

(we let the corresponding yi grow to +∞).

If ∀i gi(x) ≤ 0, then

max y≥0 L(x, y) = f(x),

because ∀i yigi(x) ≤ 0 and the maximum is at-

tained when

yigi(x) = 0, ∀i = 1,2, ..., m.

Hence the problem (1) is equivalent to the fol-

lowing MinMax problem

min x∈X max y≥0 L(x, y),

which could also be written as follows:

f* = min x∈X max y≥0 L(x, y).

4

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Weak Duality

Consider the following problem

min {f(x) | g(x) ≤ 0, x ∈ X} ,

where f , g and X are arbitrary.

With this problem we associate the Lagrangian

L(x, y) = f(x) + yTg(x),

y are dual variables (Lagrange multipliers).

The weak duality always holds:

min x∈X max y≥0 L(x, y) ≥ max y≥0 min x∈X L(x, y).

Observe that we have not made any assumption

about functions f and g and set X.

If f and g are convex, X is convex and certain

regularity conditions are satisfied, then

min x∈X max y≥0 L(x, y) = max y≥0 min x∈X L(x, y).

This is called the strong duality.

5

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Notation

Consider again the problem

min f(x)

s.t. g(x) ≤ 0,

x ∈ X ⊆ Rn,

where f : Rn 7→R and g : Rn 7→Rm.

Take x ∈ X ⊆ Rn and y ∈ Y = {y ∈ Rm, y ≥ 0} and write the Lagrangian

L(x, y) = f(x) + yTg(x).

Define the primal function

LP(x) = { f(x)  if ∀i gi(x) ≤ 0
        { +∞    if ∃i gi(x) > 0.

Observe that

LP(x) = max y≥0 L(x, y).   (2)

Define the dual function

LD(y) = min x∈X L(x, y).   (3)

6

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Primal & Dual Problems

The problem (1) can be formulated as looking

for x* ∈ X ⊆ Rn such that

LP(x*) = min x∈X LP(x).

It is called the primal problem.

The problem

LD(y*) = max y≥0 LD(y)

is called the dual problem.

The weak duality can be rewritten as:

LP (x) ≥ LD(y).

Def. Primal feasible set.

XP = {x : x ∈ X, gi(x) ≤ 0, i = 1,2, . . . , m}.

Def. Dual feasible set. A tuple (x, y) ∈ Rn+m

is feasible for the dual problem if

(x,y)∈YD = {(x,y): x∈X, y∈Y, LD(y)=L(x,y)}.

Def. Dual optimal solution.

A tuple (x, y) ∈ Rn+m is called dual optimal if

(x, y) ∈ YD and y maximizes LD(y).

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Primal-Dual Bounds

Lemma. If x1 ∈ XP and (x2, y2) ∈ YD (i.e., x1 is

primal feasible and (x2, y2) is dual feasible), then

LP (x1) ≥ LD(y2).

Proof. Since x1 ∈ XP we get LP (x1) = f(x1).

For any y ∈ Y , from definition (2) we have

LP (x1) ≥ L(x1, y). In particular, for y = y2:

LP (x1) ≥ L(x1, y2). (4)

On the other hand, (x2, y2) ∈ YD hence for any

x ∈ X from (3) we have L(x, y2) ≥ LD(y2) and,

in particular, for x = x1:

L(x1, y2) ≥ LD(y2). (5)

From (4) and (5) we get

f(x1) = LP (x1) ≥ L(x1, y2) ≥ LD(y2),

which completes the proof.

Any primal feasible solution provides an upper

bound for the dual problem, and

any dual feasible solution provides a lower

bound for the primal problem.

8

Page 4

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Duality and Convexity

Recall that the weak duality holds regardless of

the form of functions f , g and set X:

min x∈X max y≥0 L(x, y) ≥ max y≥0 min x∈X L(x, y).

What do we need to assume for the inequality in

the weak duality to become an equation?

If

• X ⊆ Rn is convex;

• f and g are convex;

• optimal solution is finite;

• some mysterious regularity conditions hold,

then strong duality holds.

That is

min x∈X max y≥0 L(x, y) = max y≥0 min x∈X L(x, y).

An example of regularity conditions:

∃x ∈ int(X) such that g(x) < 0.

9

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Geometric View

Consider a mapping which for any x ∈ X defines

a point in Rm+1 of the form (g(x), f(x)).

We write x 7→ (g, f). Let H be the image of X.

In the example below n = 2 and m = 1. Hence:

x ∈ X ⊆ R2 and f : R2 7→ R and g : R2 7→ R.

Lagrange multiplier: y ∈ R (y ≥ 0).

(Figure: the image H = {(g(x), f(x)) : x ∈ X} in the (g, f)-plane; a supporting line f + y·g = const of slope −y touches H from below, and its intercept with the f-axis equals LD(y).)

10

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Figure Interpretation

Primal problem:

We look for a point (g, f) ∈ H such that

g ≤ 0 and f attains its minimum.

This is the point (g, f) in the Figure.

11

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Figure Interpretation

Dual problem:

Take y≥0. To find LD(y), we need to minimize

f(x) + yg(x) with respect to x ∈ X. This cor-

responds to the minimization of the linear form

f + yg in the set H.

For a given y ≥ 0, the linear form f + yg has

a fixed slope (equal to −y) and the minimum is

attained when the line f + yg touches the bot-

tom of H. We say that “the hyperplane f + yg

supports the set H”.

The intersection of the supporting plane and the

f line determines the value of LD(y).

The dual problem consists in finding such a slope

y that LD(y) is maximized, i.e., the intersection

of the supporting plane and the f axis is the

highest possible.

There are two supporting hyperplanes in the Fig-

ure. The one corresponding to y corresponds to

the maximum of LD(y).

12

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Nonzero Duality Gap

When sufficient conditions for strong duality are

not satisfied, we may observe a nonzero duality

gap:

minx∈X

maxy≥0

L(x, y) − maxy≥0

minx∈X

L(x, y) > 0.

In the Figure below:

f* − LD(y*) > 0.

(Figure: a nonconvex image set H in the (g, f)-plane; the best supporting line gives LD(y*) strictly below the primal optimum f*.)

13

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Read more on duality

1. Bertsekas, D., Nonlinear Programming,

Athena Scientific, Massachusetts, 1995.

ISBN 1-886529-14-0, pages 415-486.

2. Hillier, F.S. and Lieberman, G.J.,

Introduction to Operations Research,

7th edition, McGraw Hill, 2001.

ISBN 0-07-232169-5, pages 230-308.

14

Page 5

Interior Point Methods

for Linear, Quadratic

and Nonlinear Programming

Turin 2008

Jacek Gondzio

Lecture 3:

Duality

1

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Equality Constraints

Let h : Rn 7→ Rk define an equality constraint

h(x) = 0 (understood as hj(x) = 0, j = 1, ..., k).

Replace hj(x) = 0 with two inequalities:

hj(x) ≤ 0 and − hj(x) ≤ 0.

Then the optimization problem

min f(x)

s.t. g(x) ≤ 0,

h(x) = 0,

x ∈ X ⊆ Rn,

where f : Rn 7→R, g : Rn 7→Rm and h : Rn 7→Rk,

becomes:

min f(x)

s.t. g(x) ≤ 0,

h(x) ≤ 0,

−h(x) ≤ 0,

x ∈ X ⊆ Rn.

2

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Equality Constraints (cont’d)

Use nonnegative Lagrange multipliers y ∈ Rm for

g constraints.

Use a pair of Lagrange multipliers u+j ≥ 0 and

u−j ≥ 0 for inequalities hj(x) ≤ 0 and −hj(x) ≤

0, respectively. In other words, use two vectors

u+ ≥ 0 and u− ≥ 0, both in Rk and write the

Lagrangian

L(x, y, u+,u−) = f(x)+yTg(x)+(u+)Th(x)−(u−)Th(x)

= f(x)+yTg(x)+(u+− u−)Th(x)

= f(x)+yTg(x)+uTh(x),

where the vector u = u+ − u− ∈ Rk has no sign

restriction.

The Lagrangian becomes:

L(x, y, u) = f(x)+yTg(x)+uTh(x),

and all theoretical results derived earlier can be

replicated for this new problem formulation.

3

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Wolfe Duality

Lagrange duality does not need differentiability.

Suppose f and g are convex and differentiable.

Suppose X is convex.

The dual function

LD(y) = minx∈X

L(x, y).

requires minimization with respect to x.

Instead of minimization with respect to x,

we ask for a stationarity with respect to x:

∇xL(x, y) = 0.

Lagrange dual problem:

max y≥0 LD(y)   (i.e., max y≥0 min x∈X L(x, y)).

Wolfe dual problem:

max L(x, y)

s.t. ∇xL(x, y) = 0

y ≥ 0.

4

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Duality: Example

Consider the nonlinear program:

min f(x) = x1² + x2²   s.t. x1 + x2 ≥ 1.

f(x) = x1² + x2² and g(x) = 1 − x1 − x2 are convex.

Observe that x=0 is an unconstrained minimizer

but this point does not satisfy the constraint.

The solution must therefore lie on the boundary

of the feasible region and satisfy x1 + x2 = 1. It is easy to find that x* = (0.5, 0.5) and f* = 0.5.

Lagrangian:

L(x, y) = x1² + x2² + y(1 − x1 − x2).

The Lagrangian dual:

LD(y) = min x [x1² + x2² + y(1 − x1 − x2)].

For any y the Lagrangian L(x, y) is convex in x.

We can use the stationarity condition to replace

the minimization. We write:

∇xL(x, y) = [ 2x1 − y ]   [ 0 ]
            [ 2x2 − y ] = [ 0 ] ,

which gives x1 = 0.5y and x2 = 0.5y.

5

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Example (continued)

Having substituted x1 = 0.5y and x2 = 0.5y, we

obtain:

LD(y) = y − (1/2)y².

The dual problem

maxy≥0

LD(y),

thus becomes

max y≥0 [y − (1/2)y²].

It has a trivial solution y* = 1.

We observe that LD(y*) = 1/2 = f*. Indeed, in this easy convex program, the duality gap is zero, i.e., strong duality holds.

6
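The example can also be verified numerically; a small illustrative Python sketch (not part of the slides) that maximizes LD(y) = y − y²/2 on a grid and compares it with the primal optimum f* = 0.5:

```python
import numpy as np

# Dual function from the example: L_D(y) = y - y^2/2
y = np.linspace(0.0, 2.0, 2001)
LD = y - 0.5 * y**2
k = np.argmax(LD)
print("y* ~", y[k], " L_D(y*) ~", LD[k])     # y* = 1, L_D(y*) = 0.5

# Primal optimum from the slide: x* = (0.5, 0.5) => f* = 0.5, so the gap is zero
print("f* =", 0.5**2 + 0.5**2)
```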

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Dual Linear Program

Consider a linear program

min cTx

s.t. Ax = b,

x ≥ 0,

where c, x ∈ Rn, b ∈ Rm, A ∈ Rm×n.

We associate Lagrange multipliers y ∈ Rm and

s ∈ Rn (s ≥ 0) with the constraints Ax = b and

x ≥ 0, and write the Lagrangian

L(x, y, s) = cTx − yT (Ax − b) − sTx.

7

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Dual LP (cont’d)

To determine the Lagrangian dual

LD(y, s) = minx∈X

L(x, y, s)

we need stationarity with respect to x:

∇xL(x, y, s) = c − ATy − s = 0.

Hence

LD(y, s) = cTx − yT (Ax − b) − sTx

= bTy + xT(c − ATy − s) = bTy.

and the dual problem has a form:

max bTy

s.t. ATy + s = c,

y free, s ≥ 0,

where y ∈ Rm and s ∈ Rn.

8

Page 6

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Duality in LP

Consider a primal program

min cTx

s.t. Ax = b,

x ≥ 0,

(1)

where c, x ∈ Rn, b ∈ Rm, A ∈ Rm×n.

With the primal we associate a dual program

max bTy

s.t. ATy ≤ c,

y free,

where y∈Rm. We add dual slack variables s∈Rn,

s ≥ 0 to convert inequality constraints ATy ≤ c

into equalities ATy + s = c and get an equivalent

dual program

max bTy

s.t. ATy + s = c,

y free, s ≥ 0,

(2)

where y ∈ Rm and s ∈ Rn.

Let P, D be the feasible sets of the primal and

the dual, respectively:

P = {x ∈ Rn | Ax = b, x ≥ 0},
D = {y ∈ Rm, s ∈ Rn | ATy + s = c, s ≥ 0}.

9

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Weak & Strong Duality in LP

Let us introduce a convention that inf x∈P cTx = +∞ if P = ∅, and sup y∈D bTy = −∞ if D = ∅.

Weak Duality Theorem

inf x∈P cTx ≥ sup y∈D bTy.

Strong Duality Theorem

If either P ≠ ∅ or D ≠ ∅ then

inf x∈P cTx = sup y∈D bTy.

If one of problems (1) and (2) is solvable then

min x∈P cTx = max y∈D bTy.

In IPMs we shall use the term interior-point.

Let P0, D0 be the strictly feasible sets of the

primal and the dual, respectively:

P0 = {x ∈ Rn | Ax = b, x > 0},
D0 = {y ∈ Rm, s ∈ Rn | ATy + s = c, s > 0}.

We shall often refer to the primal-dual pair.

Hence we define primal-dual feasible set F and

primal-dual strictly feasible set F0:

F = {(x, y, s) | Ax = b, ATy + s = c, (x, s) ≥ 0},
F0 = {(x, y, s) | Ax = b, ATy + s = c, (x, s) > 0}.

10

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Weak Duality for LP

Weak Duality Theorem

Let the primal-dual pair of linear programs be

given. If x ∈ P and (y, s) ∈ D, then

bTy ≤ cTx.

Proof.

Since (y, s) ∈ D, we have

ATy ≤ c.

By multiplying each of these n inequalities by an

appropriate xj, j = 1,2, ..., n and adding them up

(note that x ≥ 0 since x ∈ P), we obtain

xTATy ≤ cTx.

x ∈ P implies that Ax = b hence xTATy = bTy.

Thus we finally get

bTy ≤ cTx.

11

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Complementarity, Optimality

Partitioning:
LP constraint matrix A = [B, N],
primal variables x = (xB, xN),
reduced costs d = (dB, dN).

The simplex method maintains complementarity

xB ≠ 0, dB = 0 ⇒ (xB)j · (dB)j = 0, ∀j ∈ B,
xN = 0, dN ≠ 0 ⇒ (xN)j · (dN)j = 0, ∀j ∈ N.

Primal simplex method:
• maintains primal feasibility: Ax = b;
• maintains complementarity: xjdj = 0, ∀j;
• seeks dual feasibility: dN ≥ 0.

Dual simplex method:
• maintains dual feasibility: dN ≥ 0;
• maintains complementarity: xjdj = 0, ∀j;
• seeks primal feasibility: Ax = b.

Note that the reduced costs:

d = [ dB ]   [ cB ]     [ BT ]
    [ dN ] = [ cN ]  −  [ NT ] · y  =  c − ATy,

are dual slack variables.

At optimality: d ≥ 0.

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Dual Quadratic Program

Consider a quadratic program

min cTx + (1/2)xTQ x

s.t. Ax = b,

x ≥ 0,

where c, x ∈ Rn, b ∈ Rm, A ∈ Rm×n, Q ∈ Rn×n.

We associate Lagrange multipliers y ∈ Rm and

s ∈ Rn (s ≥ 0) with the constraints Ax = b and

x ≥ 0, and write the Lagrangian

L(x, y, s) = cTx + (1/2)xTQ x − yT(Ax−b) − sTx.

13

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Dual QP (cont’d)

To determine the Lagrangian dual

LD(y, s) = min x∈X L(x, y, s)

we need stationarity with respect to x:

∇xL(x, y, s) = c + Qx − ATy − s = 0.

Hence

LD(y, s) = cTx + (1/2)xTQ x − yT(Ax−b) − sTx
         = bTy + xT(c + Qx − ATy − s) − (1/2)xTQ x
         = bTy − (1/2)xTQ x,

and the dual problem has the form:

max bTy − (1/2)xTQ x

s.t. ATy + s − Qx = c,

x, s ≥ 0,

where y ∈ Rm and x, s ∈ Rn.

14

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Primal-Dual Pairs

Linear programs:

The primal

min cTx

s.t. Ax = b,

x ≥ 0,

and the dual

max bTy

s.t. ATy + s = c,

y free, s ≥ 0.

Convex quadratic programs:

The primal

min cTx + (1/2)xTQ x

s.t. Ax = b,

x ≥ 0,

and the dual

max bTy − (1/2)xTQ x

s.t. ATy + s − Qx = c,

x, s ≥ 0.

15

Page 7

Interior Point Methods

for Linear, Quadratic

and Nonlinear Programming

Turin 2008

Jacek Gondzio

Lecture 4:

IPM for LP: Motivation

1

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Simplex: What’s wrong?

A vertex is defined by a set of n equations:

[ B  N     ] [ xB ]   [ b ]
[ 0  In−m  ] [ xN ] = [ 0 ] .

The linear program with m constraints and n variables (n ≥ m) has at most

NV = (n choose m) = n! / (m!(n − m)!)

vertices.

The simplex method can make a non-polynomial

number of iterations to reach the optimality:

V. Klee and G. Minty gave an example LP the solution of which needs 2ⁿ iterations:

How good is the simplex algorithm,

in: Inequalities-III, O. Shisha, ed.,

Academic Press, 1972, 159–175.

2

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Simplex: What’s wrong?

Narendra Karmarkar from AT&T Bell Labs:

“the simplex [method] is complex”

N. Karmarkar:

A New Polynomial–time Algorithm for LP,

Combinatorica 4 (1984) 373–395.

3

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

“Elements” of the IPM

What do we need

to derive the Interior Point Method?

• duality theory:

Lagrangian function;

first order optimality conditions.

• logarithmic barriers.

• Newton method.

4

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Optimality Conditions in LP

Consider the primal-dual pair:

Primal Dual

min cTx max bTy

s.t. Ax = b, s.t. ATy + s = c,

x≥0; s≥0.

Lagrangian

L(x, y) = cTx − yT(Ax − b).

Optimality Conditions in LP

Ax = b,

ATy + s = c,

XSe = 0,

x ≥ 0,

s ≥ 0,

where X = diag{x1, · · · , xn}, S = diag{s1, · · · , sn} and e = (1, 1, · · · , 1) ∈ Rn.

5

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Complementarity

Recall that the Simplex Method works with a partitioned formulation:

LP constraint matrix A = [B, N],
primal variables x = (xB, xN),
reduced costs d = (dB, dN).

Dual variables are defined as follows:

BTy = cB.

Hence the reduced costs for basic variables are

dBT = cBT − yTB = cBT − cBT = 0.

Thus, for basic variables, dB = 0 and

(xB)j · (dB)j = 0 ∀j ∈ B.

For non-basic variables, xN = 0 hence

(xN)j · (dN)j = 0 ∀j ∈ N .

The simplex method maintains the complementarity of primal and dual solutions:

xj · dj = 0 ∀j = 1,2, ..., n.

6

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Complementarity, Optimality

Partitioning:
LP constraint matrix A = [B, N],
primal variables x = (xB, xN),
reduced costs d = (dB, dN).

The simplex method maintains complementarity

xB ≠ 0, dB = 0 ⇒ (xB)j · (dB)j = 0, ∀j ∈ B,
xN = 0, dN ≠ 0 ⇒ (xN)j · (dN)j = 0, ∀j ∈ N.

Primal simplex method:
• maintains primal feasibility: Ax = b;
• maintains complementarity: xjdj = 0, ∀j;
• seeks dual feasibility: dN ≥ 0.

Dual simplex method:
• maintains dual feasibility: dN ≥ 0;
• maintains complementarity: xjdj = 0, ∀j;
• seeks primal feasibility: Ax = b.

Note that the reduced costs:

d = [ dB ]   [ cB ]     [ BT ]
    [ dN ] = [ cN ]  −  [ NT ] · y  =  c − ATy,

are dual slack variables.

At optimality: d ≥ 0.

7

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Logarithmic barriers

The following logarithmic barrier −ln xj added to the objective in the optimization problem prevents variable xj from approaching zero.

(Figure: graph of −ln x.)

In other words, the logarithmic barrier can be

used to “replace” the inequality

xj ≥ 0.

Observe that

min e^(−∑j=1..n ln xj)  ⇐⇒  max ∏j=1..n xj

The minimization of −∑j=1..n ln xj is equivalent to the maximization of the product of distances from all hyperplanes defining the positive orthant: it prevents all xj from approaching zero.
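A tiny numerical illustration of the barrier effect (an illustrative sketch assuming a single variable with objective coefficient c = 1, which is not part of the slides): the term −µ ln x blows up as x → 0+, so the minimizer of the barrier function stays strictly positive, at x = µ/c.

```python
import numpy as np

# Barrier function phi(x) = c*x - mu*ln(x) for one variable
c, mu = 1.0, 0.1
x = np.linspace(1e-4, 2.0, 20000)
phi = c * x - mu * np.log(x)
print("numerical minimizer ~", x[np.argmin(phi)], " (analytic: mu/c =", mu / c, ")")
# phi -> +infinity as x -> 0+, so iterates are pushed away from the boundary x = 0
```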

Page 8

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Use Logarithmic Barriers

Replace the primal LP

min cTx

s.t. Ax = b,

x ≥ 0,

with the primal barrier program

min cTx − ∑j=1..n ln xj
s.t. Ax = b.

Replace the dual LP

max bTy

s.t. ATy + s = c,

y free, s ≥ 0,

with the dual barrier program

max bTy + ∑j=1..n ln sj
s.t. ATy + s = c.

9

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

First Order Optimality Conds

Consider the primal barrier program

min cTx − µ ∑j=1..n ln xj
s.t. Ax = b,

where µ ≥ 0 is a barrier parameter.

Write out the Lagrangian

L(x, y, µ) = cTx − yT(Ax − b) − µ ∑j=1..n ln xj,

and the conditions for a stationary point

∇xL(x, y, µ) = c − ATy − µX−1e = 0,
∇yL(x, y, µ) = Ax − b = 0,

where X−1 = diag{x1−1, x2−1, · · · , xn−1}.

Let us denote

s = µX−1e, i.e. XSe = µe.

The First Order Optimality Conditions are:

Ax = b,

ATy + s = c,

XSe = µe.

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Central Trajectory

Note that the first order optimality conditions for

the barrier problem

Ax = b,

ATy + s = c,

XSe = µe,

approximate the first order optimality conditions

for the linear program

Ax = b,

ATy + s = c,

XSe = 0,

more and more closely as µ goes to zero.

Parameter µ controls the distance to optimality.

cTx−bTy = cTx−xTATy = xT(c−ATy) = xTs = nµ.

Analytic center (µ-center): a (unique) point

(x(µ), y(µ), s(µ)), x(µ) > 0, s(µ) > 0

that satisfies FOC.

The path

{(x(µ), y(µ), s(µ)) : µ > 0} is called the primal-dual central trajectory.

11

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Newton Method

We use Newton Method to find a stationary

point of the barrier problem.

Recall how to use Newton Method to find a root

of a nonlinear equation

f(x) = 0.

A tangent line

z − f(xk) = ∇f(xk) · (x − xk)

is a local approximation of the graph of the func-

tion f(x). Substituting z = 0 gives a new point

xk+1 = xk − (∇f(xk))−1f(xk).

(Figure: Newton iterates xk, xk+1, xk+2 for solving f(x) = 0; at each iterate the tangent line z − f(xk) = ∇f(xk)(x − xk) is drawn.)

12

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Apply Newton M. to the FOC

The first order optimality conditions for the bar-

rier problem form a large system of nonlinear

equations F(x, y, s) = 0,

where F : R2n+m 7→ R2n+m is a mapping defined as follows:

F(x, y, s) = [ Ax − b       ]
             [ ATy + s − c  ]
             [ XSe − µe     ] .

Actually, the first two terms of it are linear; only the last one, corresponding to the complementarity condition, is nonlinear.

Note that

∇F(x, y, s) = [ A  0   0 ]
              [ 0  AT  I ]
              [ S  0   X ] .

Thus, for a given point (x, y, s) we find the Newton direction (∆x, ∆y, ∆s) by solving the system of linear equations:

[ A  0   0 ] [ ∆x ]   [ b − Ax       ]
[ 0  AT  I ] [ ∆y ] = [ c − ATy − s  ]
[ S  0   X ] [ ∆s ]   [ µe − XSe     ] .

13
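A minimal dense Python/NumPy sketch of this Newton step (purely illustrative; the function name and the dense solve are my own choices, whereas practical IPM codes factorize the reduced sparse systems discussed in later lectures):

```python
import numpy as np

def newton_direction(A, b, c, x, y, s, sigma, mu):
    """Solve the (m+2n)x(m+2n) Newton system for the LP barrier FOC."""
    m, n = A.shape
    X, S, I = np.diag(x), np.diag(s), np.eye(n)
    K = np.block([
        [A,                np.zeros((m, m)), np.zeros((m, n))],
        [np.zeros((n, n)), A.T,              I               ],
        [S,                np.zeros((n, m)), X               ],
    ])
    rhs = np.concatenate([b - A @ x,
                          c - A.T @ y - s,
                          sigma * mu * np.ones(n) - x * s])
    d = np.linalg.solve(K, rhs)
    return d[:n], d[n:n+m], d[n+m:]          # (dx, dy, ds)
```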

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Interior-Point Framework

We have already gathered all the necessary

elements to derive an interior point method.

The logarithmic barrier

− lnxj

added to the objective in the optimization prob-

lem prevents variable xj from approaching zero

and “replaces” the inequality

xj ≥ 0.

We derive the first order optimality conditions for the primal barrier problem:

Ax = b,

ATy + s = c,

XSe = µe,

and apply Newton method to solve this system

of nonlinear equations.

Actually, we fix the barrier parameter µ and make

only one (damped) Newton step towards the so-

lution of FOC. We do not solve the current FOC

exactly. Instead, we immediately reduce the bar-

rier parameter µ (to ensure progress towards op-

timality) and repeat the process.

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Interior Point Algorithm

Initialize

k = 0

(x0, y0, s0) ∈ F0

µ0 = (1/n) · (x0)T s0

α0 = 0.9995

Repeat until optimality

k = k + 1

µk = σµk−1, where σ ∈ (0,1)

∆ = Newton direction towards µ-center

Ratio test:

αP := max {α > 0 : x + α∆x ≥ 0},
αD := max {α > 0 : s + α∆s ≥ 0}.

Make step:

xk+1 = xk + α0αP∆x,

yk+1 = yk + α0αD∆y,

sk+1 = sk + α0αD∆s.

15
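The algorithm translates almost line by line into code. A compact sketch, assuming the newton_direction helper from the previous snippet and a strictly feasible starting point supplied by the caller (both assumptions are illustrative, not prescribed by the slides):

```python
import numpy as np

def ipm_lp(A, b, c, x, y, s, sigma=0.1, alpha0=0.9995, tol=1e-8, max_iter=100):
    """Primal-dual path-following IPM for min c'x s.t. Ax = b, x >= 0 (feasible variant)."""
    n = len(x)
    for _ in range(max_iter):
        mu = x @ s / n                           # current barrier parameter
        if mu < tol:
            break
        dx, dy, ds = newton_direction(A, b, c, x, y, s, sigma, mu)
        # Ratio test: largest steps (capped at 1) keeping x + a*dx >= 0 and s + a*ds >= 0
        aP = min([1.0] + [-x[j] / dx[j] for j in range(n) if dx[j] < 0])
        aD = min([1.0] + [-s[j] / ds[j] for j in range(n) if ds[j] < 0])
        x, y, s = x + alpha0 * aP * dx, y + alpha0 * aD * dy, s + alpha0 * aD * ds
    return x, y, s
```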

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Interior Point Method

• Lagrange (1788)

handling equality constraints - multipliers

minimization with equality constraints

replaced with unconstrained minimization

• Fiacco & McCormick (1968)

handling inequality constraints - log barrier

minimization with inequality constraints

replaced with a sequence of unconstrained

minimizations

• Newton (1687)

solving unconstrained minimization problems

16

Page 9

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Interior Point Method

(Figure: the logarithmic barrier −ln x, and a shaded polytope with its analytic center marked.)

Analytic Center

min e^(−∑j=1..n ln xj)  ⇐⇒  max ∏j=1..n xj

Advantages of IPMs:

suitable for very large problems;

natural extension from LP via QP to NLP.

Iterations to reach optimum:

Size       Theory       Practice
           O(√n)        O(log10 n)
1000       C × 32       10-20
10000      C × 100      20-40
100000     C × 320      30-50
1000000    C × 1000     40-60

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Approaching Optimality

Simplex Method:

Basic: x > 0, s = 0        Nonbasic: x = 0, s > 0

(Figure: simplex iterates sit on the axes of the (x, s) plane.)

Interior Point Method:

"Basic": x > 0, s = 0        "Nonbasic": x = 0, s > 0

(Figure: interior point iterates stay strictly inside and approach the axes only in the limit.)

18

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Notations

A vector of ones: e = (1,1, · · · ,1) ∈ Rn.

X = diag{x1, x2, · · · , xn} (the n×n diagonal matrix with x1, ..., xn on the diagonal).

X−1 = diag{x1−1, x2−1, · · · , xn−1}.

An equation XSe = µe,

is equivalent to xjsj = µ, ∀j = 1,2, · · · , n.

Primal feasible set
P = {x ∈ Rn | Ax = b, x ≥ 0}.

Primal strictly feasible set
P0 = {x ∈ Rn | Ax = b, x > 0}.

Dual feasible set
D = {y ∈ Rm, s ∈ Rn | ATy + s = c, s ≥ 0}.

Dual strictly feasible set
D0 = {y ∈ Rm, s ∈ Rn | ATy + s = c, s > 0}.

Primal-dual feasible set
F = {(x, y, s) | Ax = b, ATy + s = c, (x, s) ≥ 0}.

Primal-dual strictly feasible set
F0 = {(x, y, s) | Ax = b, ATy + s = c, (x, s) > 0}.

Page 10

Interior Point Methods

for Linear, Quadratic

and Nonlinear Programming

Turin 2008

Jacek Gondzio

Lecture 5:

Path-following Method: Theory

1

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Path-Following Algorithm

The analysis given in this lecture comes from the

book of Steve Wright:

Primal-Dual Interior-Point Methods,

SIAM Philadelphia, 1997.

We analyze a feasible interior-point algorithm

with the following properties:

• all its iterates are feasible and stay in a close

neighbourhood of the central path;

• the iterates follow the central path towards

optimality;

• systematic (though very slow) reduction of

duality gap is ensured.

This algorithm is called

the short-step path-following method.

Indeed, it makes very slow progress (short-steps)

to optimality.

2

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Central Path Neighbourhood

Assume a primal-dual strictly feasible solution

(x, y, s) ∈ F0 lying in a neighbourhood of the

central path is given; namely (x, y, s) satisfies:

Ax = b,

ATy + s = c,
XSe ≈ µe.

We define a θ-neighbourhood of the central

path N2(θ), a set of primal-dual strictly feasible

solutions (x, y, s) ∈ F0 that satisfy:

‖XSe − µe‖ ≤ θµ,

where θ ∈ (0,1) and the barrier µ satisfies:

xTs = nµ.

Hence N2(θ) = {(x, y, s) ∈ F0 | ‖XSe − µe‖ ≤ θµ}.

(Figure: the N2(θ) neighbourhood of the central path.)
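Membership in N2(θ) is straightforward to test once a strictly feasible point is at hand; a short illustrative sketch (the feasibility tolerance is my own choice):

```python
import numpy as np

def in_N2(A, b, c, x, y, s, theta, tol=1e-9):
    """Check (x, y, s) in N2(theta): strictly feasible and ||XSe - mu*e|| <= theta*mu."""
    mu = x @ s / len(x)
    primal_ok = np.allclose(A @ x, b, atol=tol) and np.all(x > 0)
    dual_ok = np.allclose(A.T @ y + s, c, atol=tol) and np.all(s > 0)
    return primal_ok and dual_ok and np.linalg.norm(x * s - mu) <= theta * mu
```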

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Progress towards optimality

Assume a primal-dual strictly feasible solution

(x, y, s) ∈ N2(θ) for some θ ∈ (0,1) is given.

Interior point algorithm tries to move from this

point to another one that also belongs to a θ-neighbourhood of the central path but corre-

sponds to a smaller µ. The required reduction

of µ is small:

µk+1 = σµk,

where σ = 1 − β/√n,

for some β ∈ (0,1).

Given a new µ-center, interior point algorithm

computes Newton direction:

[ A  0   0 ] [ ∆x ]   [ 0          ]
[ 0  AT  I ] [ ∆y ] = [ 0          ]
[ S  0   X ] [ ∆s ]   [ σµe − XSe  ] ,

and makes step in this direction.

Magic numbers (will be explained later):

θ = 0.1 and β = 0.1.

4

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

O(√n) Complexity Result

We will prove the following:

• full step in Newton direction is feasible;

• the new iterate

(xk+1,yk+1,sk+1) = (xk,yk,sk)+(∆xk,∆yk,∆sk)

belongs to a θ-neighbourhood of the new

µ-center (with µk+1 = σµk);

• duality gap is reduced by the factor 1 − β/√n.

Note that since at one iteration the duality gap is reduced by the factor 1 − β/√n, after √n iterations the reduction achieves:

(1 − β/√n)^√n ≈ e−β.

After C · √n iterations, the reduction is e−Cβ.

For sufficiently large constant C the reduction

can thus be arbitrarily large (i.e. the duality gap

can become arbitrarily small).

Hence this algorithm has complexity O(√n).

5

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Technical Results

Lemma 1

Newton direction (∆x,∆y,∆s) defined by the

equation system

[ A  0   0 ] [ ∆x ]   [ 0          ]
[ 0  AT  I ] [ ∆y ] = [ 0          ]
[ S  0   X ] [ ∆s ]   [ σµe − XSe  ] ,      (1)

satisfies:

∆xT∆s = 0.

Proof:

From the first two equations in (1) we get

A∆x = 0 and ∆s = −AT∆y.

Hence

∆xT∆s = ∆xT · (−AT∆y) = −∆yT · (A∆x) = 0.

6

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Technical Results (cont’d)

Lemma 2

Let (∆x,∆y,∆s) be the Newton direction that

solves the system (1). The new iterate

(x̄, ȳ, s̄) = (x, y, s) + (∆x, ∆y, ∆s)

satisfies x̄T s̄ = nµ̄,

where µ̄ = σµ.

Proof: From the third equation in (1) we get

S∆x + X∆s = −XSe + σµe.

By summing the n components of this equation we obtain

eT(S∆x + X∆s) = sT∆x + xT∆s = −eTXSe + σµeTe = −xTs + nσµ = −xTs · (1 − σ).

Thus

x̄T s̄ = (x + ∆x)T (s + ∆s)
      = xTs + (sT∆x + xT∆s) + (∆x)T∆s
      = xTs + (σ − 1)xTs + 0 = σ xTs,

which is equivalent to:

nµ̄ = σnµ.

7

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Reminder: Norms

Norms of the vector x ∈ Rn.

‖x‖ = (∑j=1..n xj²)^(1/2)

‖x‖∞ = max j∈{1..n} |xj|

‖x‖1 = ∑j=1..n |xj|

Note that for any x ∈ Rn:

‖x‖∞ ≤ ‖x‖1,    ‖x‖1 ≤ n · ‖x‖∞,
‖x‖∞ ≤ ‖x‖,     ‖x‖ ≤ √n · ‖x‖∞,
‖x‖ ≤ ‖x‖1,     ‖x‖1 ≤ √n · ‖x‖.

Recall the triangle inequality. For any vectors p, q and r and for any norm ‖.‖:

‖p − q‖ ≤ ‖p − r‖ + ‖r − q‖.

The relation between algebraic and geometric means. For any scalars a and b such that ab ≥ 0:

√|ab| ≤ (1/2) · |a + b|.

8

Page 11

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Technical Result (algebra)

Lemma 3 Let u and v be any two vectors in Rn

such that uTv ≥ 0. Then

‖UVe‖ ≤ 2^(−3/2) ‖u + v‖²,

where U = diag{u1, · · · , un}, V = diag{v1, · · · , vn}.

Proof: Let us partition all products ujvj into positive and negative ones:

P = {j | ujvj ≥ 0} and M = {j | ujvj < 0}:

0 ≤ uTv = ∑j∈P ujvj + ∑j∈M ujvj = ∑j∈P |ujvj| − ∑j∈M |ujvj|.

We can now write

‖UVe‖ = (‖[ujvj]j∈P‖² + ‖[ujvj]j∈M‖²)^(1/2)
      ≤ (‖[ujvj]j∈P‖1² + ‖[ujvj]j∈M‖1²)^(1/2)
      ≤ (2 ‖[ujvj]j∈P‖1²)^(1/2)
      ≤ √2 ‖[(1/4)(uj + vj)²]j∈P‖1
      = 2^(−3/2) ∑j∈P (uj + vj)²
      ≤ 2^(−3/2) ∑j=1..n (uj + vj)²
      = 2^(−3/2) ‖u + v‖², as requested.

9

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

IPM Technical Results (cnt’d)

Lemma 4. If (x, y, s) ∈ N2(θ) for some θ ∈ (0,1), then

(1 − θ)µ ≤ xjsj ≤ (1 + θ)µ ∀j.

In other words,

min j∈{1..n} xjsj ≥ (1 − θ)µ,    max j∈{1..n} xjsj ≤ (1 + θ)µ.

Proof: Since ‖x‖∞ ≤ ‖x‖, from the definition of N2(θ),

N2(θ) = {(x, y, s) ∈ F0 | ‖XSe − µe‖ ≤ θµ},

we conclude

‖XSe − µe‖∞ ≤ ‖XSe − µe‖ ≤ θµ.

Hence

|xjsj − µ| ≤ θµ ∀j,

which is equivalent to

−θµ ≤ xjsj − µ ≤ θµ ∀j.

Thus

(1 − θ)µ ≤ xjsj ≤ (1 + θ)µ ∀j.

10

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

IPM Technical Results (cnt’d)

Lemma 5

If (x, y, s) ∈ N2(θ) for some θ ∈ (0,1), then

‖XSe − σµe‖² ≤ θ²µ² + (1 − σ)²µ²n.

Proof:

Note first that

eT (XSe − µe) = xTs − µeT e = nµ − nµ = 0.

Therefore

‖XSe − σµe‖² = ‖(XSe − µe) + (1−σ)µe‖²
             = ‖XSe − µe‖² + 2(1−σ)µ eT(XSe − µe) + (1−σ)²µ² eTe
             ≤ θ²µ² + (1−σ)²µ²n.

11

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

IPM Technical Results (cnt’d)

Lemma 6. If (x, y, s) ∈ N2(θ) for some θ ∈ (0,1), then

‖∆X∆Se‖ ≤ [θ² + n(1−σ)²] / [2^(3/2)(1 − θ)] · µ.

Proof: The 3rd equation in the Newton system gives

S∆x + X∆s = −XSe + σµe.

Having multiplied it with (XS)−1/2, we obtain

X−1/2 S1/2 ∆x + X1/2 S−1/2 ∆s = (XS)−1/2 (−XSe + σµe).

Now apply Lemma 3 for u = X−1/2 S1/2 ∆x and v = X1/2 S−1/2 ∆s (with uTv = 0 from Lemma 1) to get

‖∆X∆Se‖ = ‖(X−1/2 S1/2 ∆X)(X1/2 S−1/2 ∆S) e‖
         ≤ 2^(−3/2) ‖X−1/2 S1/2 ∆x + X1/2 S−1/2 ∆s‖²
         = 2^(−3/2) ‖(XS)−1/2 (−XSe + σµe)‖²
         = 2^(−3/2) ∑j=1..n (−xjsj + σµ)² / (xjsj)
         ≤ 2^(−3/2) ‖XSe − σµe‖² / (minj xjsj)
         ≤ [θ² + n(1−σ)²] / [2^(3/2)(1−θ)] · µ.

12

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Magic Numbers

We have previously set two parameters for the

short-step path-following method:

θ = 0.1 and β = 0.1.

Now it’s time to justify this particular choice.

Lemma 7

If θ = 0.1 and β = 0.1, then

[θ² + n(1−σ)²] / [2^(3/2)(1 − θ)] ≤ σθ.

Proof:

Recall that

σ = 1 − β/√

n.

Hence

n(1−σ)² = β²

and for β = 0.1 (for any n ≥ 1)

σ ≥ 0.9.

Substituting θ = 0.1 and β = 0.1, we obtain

[θ² + n(1−σ)²] / [2^(3/2)(1 − θ)] = (0.1² + 0.1²) / (2^(3/2) · 0.9) ≤ 0.02 ≤ 0.9 · 0.1 ≤ σθ.

13
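The inequality of Lemma 7 can also be checked numerically for a range of problem sizes; a small illustrative check:

```python
import numpy as np

theta, beta = 0.1, 0.1
for n in [1, 10, 1000, 10**6]:
    sigma = 1.0 - beta / np.sqrt(n)
    lhs = (theta**2 + n * (1.0 - sigma)**2) / (2**1.5 * (1.0 - theta))
    print(n, round(lhs, 5), "<=", round(sigma * theta, 5), lhs <= sigma * theta)
```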

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Full Newton step in N2(θ)

Lemma 8. Suppose (x, y, s) ∈ N2(θ) and (∆x, ∆y, ∆s) is the Newton direction computed from the system (1). Then the new iterate

(x̄, ȳ, s̄) = (x, y, s) + (∆x, ∆y, ∆s)

satisfies (x̄, ȳ, s̄) ∈ N2(θ), i.e. ‖X̄S̄e − µ̄e‖ ≤ θµ̄.

Proof: From Lemma 2, the new iterate (x̄, ȳ, s̄) satisfies

x̄T s̄ = nµ̄ = nσµ,

so we have to prove that ‖X̄S̄e − µ̄e‖ ≤ θµ̄. For a given component j ∈ {1..n}, we have

x̄j s̄j − µ̄ = (xj + ∆xj)(sj + ∆sj) − µ̄
           = xjsj + (sj∆xj + xj∆sj) + ∆xj∆sj − µ̄
           = xjsj + (−xjsj + σµ) + ∆xj∆sj − σµ
           = ∆xj∆sj.

Thus, from Lemmas 6 and 7, we get

‖X̄S̄e − µ̄e‖ = ‖∆X∆Se‖ ≤ [θ² + n(1−σ)²] / [2^(3/2)(1−θ)] · µ ≤ σθµ = θµ̄.

14

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

A property of log function

Lemma 9

For all δ > −1:

ln(1 + δ) ≤ δ.

Proof:

Consider the function

f(δ) = δ − ln(1 + δ).

Its derivative is:

f′(δ) = 1 − 1/(1 + δ) = δ/(1 + δ).

Obviously f′(δ) < 0 for δ ∈ (−1,0) and f′(δ) > 0

for δ ∈ (0,∞). Hence f(.) has a minimum at

δ=0. We find that f(δ = 0) = 0. Consequently,

for any δ ∈ (−1,∞), f(δ) ≥ 0, i.e.

δ − ln(1 + δ) ≥ 0.

15

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

O(√

n) Complexity Result

Theorem 10

Given ε > 0, suppose that a feasible starting point (x0, y0, s0) ∈ N2(0.1) satisfies

(x0)T s0 = nµ0, where µ0 ≤ 1/ε^κ,

for some positive constant κ. Then there exists an index K with K = O(√n ln(1/ε)) such that

µk ≤ ε, ∀k ≥ K.

16

Page 12

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

O(√n) Complexity Result

Proof:

From Lemma 2, µk+1 = σµk. Having taken logarithms of both sides of this equality we obtain

ln µk+1 = ln σ + ln µk.

By repeatedly applying this formula and using µ0 ≤ 1/ε^κ, we get

ln µk = k ln σ + ln µ0 ≤ k ln(1 − β/√n) + κ ln(1/ε).

From Lemma 9 we have ln(1 − β/√n) ≤ −β/√n.

Thus

ln µk ≤ k(−β/√n) + κ ln(1/ε).

To satisfy µk ≤ ε, we need:

k(−β/√n) + κ ln(1/ε) ≤ ln ε.

This inequality holds for any k ≥ K, where

K = [(κ + 1)/β] · √n · ln(1/ε).

17

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Polynomial Complexity Result

Main ingredients of the polynomial complexity

result for the short-step path-following algorithm:

Stay close to the central path:

all iterates stay in the N2(θ) neighbourhood of

the central path.

Make (slow) progress towards optimality:

reduce systematically duality gap.

µk+1 = σµk,

where

σ = 1 − β/√n,

for some β ∈ (0,1).

18

Page 13

Interior Point Methods

for Linear, Quadratic

and Nonlinear Programming

Turin 2008

Jacek Gondzio

Lecture 6:

IPMs: From Theory to Practice

1

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Proximity to the Central Path

The neighbourhood

N2(θ) = {(x, y, s) ∈ F0 | ‖XSe − µe‖ ≤ θµ}

is very small. In other words, the requirement

that (x, y, s) ∈ N2(θ) is extremely restrictive.

Note (Lemma 4) that if (x, y, s) ∈ N2(θ), then

(1 − θ)µ ≤ xjsj ≤ (1 + θ)µ, ∀j.

For small θ ∈ (0,1) this means that (x, y, s) is an

excellent approximation of the µ-center.

Example:

For n = 10⁶ and θ = 0.1 suppose:

xjsj = 0.9999µ for j ≤ 500,000, and
xjsj = 1.0001µ for j ≥ 500,001.

Then ‖XSe − µe‖ = (10⁶ × 0.0001²µ²)^(1/2) = 0.1µ.

2

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Wide Neighbourhood

In practice, (x, y, s) can stay quite far away from

the µ-center. The algorithm behaves well as

long as there are not too small complementarity

products compared with the others, i.e., when

(x, y, s) ∈ N∞(γ), where

N∞(γ) = {(x, y, s) ∈ F0 |xjsj ≥ γµ, ∀j},

for some (possibly small) γ ∈ (0,1).

Observe that we limit the complementarity prod-

ucts only from below but there is also an implicit

upper bound on xjsj. Indeed, since

∑j=1..n xjsj = nµ,

we have xjsj ≤ nµ.

Advice: Use γ = 0.01.

3

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Speed of Convergence

The short-step path following algorithm asks for

a very small reduction of duality gap per itera-

tion. Indeed, the required reduction of µ is:

µk+1 = σµk,

where

σ = 1 − β/√n,

for some β ∈ (0,1).

Example:

For n = 10⁶ and β = 0.1 we have:

σ = 1 − 0.0001 = 0.9999,

hence after 10,000 iterations the duality gap will be reduced by a factor

(1 − 0.0001)^10000 ≈ e−1 ≈ 0.368.

4

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Aggressive targets

In practice, a much larger reduction can be achieved. On average, the duality gap

is usually reduced by a factor of σ ∈ (0.1,0.5).

Certainly, for a practical algorithm it is absolutely

justified to set the target reduction:

σ = 0.1.

The consequence of such optimistic targets is

unfortunately the loss of the property of being

always able to make the full step in the Newton

direction. Instead, a damped Newton step is made, such that it preserves nonnegativity of x and s.

Advice: Do not use short-step method with

σ = 1 − β/√n,

Use long-step method with

σ ≪ 1.

5

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Feasible Method

The short-step (feasible) path-following method

we have analysed requires all its iterates to be

strictly feasible:

x ∈ P0 = {x ∈ Rn | Ax = b, x > 0},
(y, s) ∈ D0 = {y ∈ Rm, s ∈ Rn | ATy + s = c, s > 0}.

In consequence the right hand side of the Newton

equation system has the form:

[ ξp ]   [ b − Ax       ]   [ 0  ]
[ ξd ] = [ c − ATy − s  ] = [ 0  ]
[ ξµ ]   [ σµe − XSe    ]   [ ξµ ] .

6

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Infeasible Method

The feasibility requirement can be relaxed. It is

possible to generalize the notion of µ-center as

well as that of the central path for infeasible

points (x, y, s).

The Newton direction is then computed from the

following equation system

[ A  0   0 ] [ ∆x ]   [ b − Ax       ]
[ 0  AT  I ] [ ∆y ] = [ c − ATy − s  ]
[ S  0   X ] [ ∆s ]   [ σµe − XSe    ] .

7

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Further Practical Issues

Linear Algebra

Predictor-Corrector Technique

Multiple Centrality Correctors

8

Page 14

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Project: IPMs in the Internet

I encourage you to do this one-hour project.

9

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Project: IPMs in the Internet

IPMs in the Internet:

• LP FAQ (Frequently Asked Questions):

http://www-unix.mcs.anl.gov/otc/Guide/faq/

• Interior Point Methods On-Line:

http://www-unix.mcs.anl.gov/otc/InteriorPoint/

Public Domain IPM Solvers:

• HOPDM (FORTRAN 77) by Jacek Gondzio:

http://www.maths.ed.ac.uk/~gondzio/software/hopdm.html

• LIPSOL (MATLAB) by Yin Zhang:

http://www.caam.rice.edu/~zhang/lipsol/

• PCx (ANSI C) by Steve Wright:

http://www-fp.mcs.anl.gov/otc/Tools/PCx/

10

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Your Linear Program

Let m and d denote the month and day of your

birthday. Clearly, 1 ≤ m ≤ 12 and 1 ≤ d ≤ 31.

Define a number: α = 100 · m + d.

Consider an LP

min x1 + x2 + x3 + x4 + x5 − x6

s.t. x1 + 2x2 + x4 ≤ 3

x2 + 2x3 − x6 ≤ 3

x3 − x4 + 2x5 + 3x6 ≥ 2

x1 + 3x3 − x5 + x6 ≤ α
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, x4 ≥ 0, x5 ≥ 0, x6 ≥ 0.

11
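The project asks for an MPS file and a stand-alone IPM solver; purely as a sanity check of the data, the same LP can be typed into scipy (an illustrative sketch, not a substitute for the exercise; the birthday values below are placeholders to be replaced by your own):

```python
from scipy.optimize import linprog

m_birth, d_birth = 3, 14                 # placeholder birthday: use your own m and d
alpha = 100 * m_birth + d_birth

c = [1, 1, 1, 1, 1, -1]
A_ub = [[1, 2,  0, 1,  0,  0],           # x1 + 2x2 + x4          <= 3
        [0, 1,  2, 0,  0, -1],           # x2 + 2x3 - x6          <= 3
        [0, 0, -1, 1, -2, -3],           # -(x3 - x4 + 2x5 + 3x6) <= -2
        [1, 0,  3, 0, -1,  1]]           # x1 + 3x3 - x5 + x6     <= alpha
b_ub = [3, 3, -2, alpha]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 6, method="highs")
print("x =", res.x, " objective =", res.fun)
```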

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Your Task

Grab one of available interior point solvers.

Install it on your computer.

You may find it easier to install it on a Unix

machine than on a PC running MS Windows.

Prepare MPS data file for your linear program

and solve the problem.

Check if the solution satisfies all the constraints.

Print MPS file and the solution file and show

them to me.

12

Page 15

Interior Point Methods

for Linear, Quadratic

and Nonlinear Programming

Turin 2008

Jacek Gondzio

Lecture 7:

IPM for Quadratic Programs

1

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Convex Quadratic Programs

The quadratic function

f(x) = xTQ x

is convex if and only if the matrix Q is positive semidefinite.

In such case the quadratic programming problem

min cTx + (1/2)xTQ x

s.t. Ax = b,

x ≥ 0,

is well defined.

If there exists a feasible solution to it, then there

exists an optimal solution.

2

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Convex Quadratic Programs

Convexity: Property 9:

Let C ∈ Rn be a convex set and Q a square

matrix. Let f(x) = xTQ x be a quadratic function

f : C 7→ R.

(a) f is convex iff Q is positive semidefinite.

(b) f is strictly convex iff Q is positive definite.

Def. A matrix Q ∈ Rn×n is positive definite if

xTQ x > 0 for any x ≠ 0.

Example:

Consider quadratic functions f(x) = xTQ x with

the following matrices:

Q1 = [ 1  0 ] ,   Q2 = [ 1   0 ] ,   Q3 = [ 5  4 ] .
     [ 0  2 ]          [ 0  −1 ]          [ 4  3 ]

Q1 is positive definite (hence f1 is convex).

Q2 and Q3 are indefinite (f2, f3 are not convex).

3

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Dual Quadratic Program

Consider a QP

min cTx + (1/2)xTQ x

s.t. Ax = b,

x ≥ 0,

where c, x ∈ Rn, b ∈ Rm, A ∈ Rm×n, Q ∈ Rn×n.

We associate Lagrange multipliers y ∈ Rm and

s ∈ Rn (s ≥ 0) with the constraints Ax = b and

x ≥ 0, and write the Lagrangian

L(x, y, s) = cTx + (1/2)xTQ x − yT(Ax−b) − sTx.

Stationarity with respect to x:

∇xL(x, y, s) = c + Qx − ATy − s = 0

is used to determine the Lagrangian dual:

LD(y, s) = min x∈X L(x, y, s)
         = cTx + (1/2)xTQ x − yT(Ax−b) − sTx
         = bTy + xT(c + Qx − ATy − s) − (1/2)xTQ x
         = bTy − (1/2)xTQ x,

and the dual problem has the form:

max bTy − (1/2)xTQ x

s.t. ATy + s − Qx = c,

x, s ≥ 0,

where y ∈ Rm and x, s ∈ Rn. 4

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

QP with IPMs

Consider the convex quadratic programming

problem.

The primal

min cTx + (1/2)xTQ x

s.t. Ax = b,

x ≥ 0,

and the dual

max bTy − (1/2)xTQ x

s.t. ATy + s − Qx = c,

x, s ≥ 0.

Apply the usual procedure:

• replace inequalities with log barriers;

• form the Lagrangian;

• write the first order optimality conditions;

• apply Newton method to them.

5

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

QP with IPMs: Log Barriers

Replace the primal QP

    min  c^T x + (1/2) x^T Q x
    s.t. Ax = b,
         x ≥ 0,

with the primal barrier QP

    min  c^T x + (1/2) x^T Q x − µ Σ_{j=1}^n ln x_j
    s.t. Ax = b.

Replace the dual QP

    max  b^T y − (1/2) x^T Q x
    s.t. A^T y + s − Qx = c,
         y free, s ≥ 0,

with the dual barrier QP

    max  b^T y − (1/2) x^T Q x + µ Σ_{j=1}^n ln s_j
    s.t. A^T y + s − Qx = c.

6

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

First Order Optimality Conds

Consider the primal barrier QP

    min  c^T x + (1/2) x^T Q x − µ Σ_{j=1}^n ln x_j
    s.t. Ax = b,

where µ ≥ 0 is a barrier parameter.

Write out the Lagrangian

    L(x, y, µ) = c^T x + (1/2) x^T Q x − y^T(Ax − b) − µ Σ_{j=1}^n ln x_j,

and the conditions for a stationary point

    ∇_x L(x, y, µ) = c − A^T y − µ X^{-1} e + Qx = 0,
    ∇_y L(x, y, µ) = Ax − b = 0,

where X^{-1} = diag{x_1^{-1}, x_2^{-1}, ..., x_n^{-1}}.

Let us denote

    s = µ X^{-1} e,  i.e.  XSe = µe.

The First Order Optimality Conditions are:

    Ax = b,
    A^T y + s − Qx = c,
    XSe = µe.

7

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Newton Method for the FOC

The first order optimality conditions for the barrier problem form a large system of nonlinear equations

    F(x, y, s) = 0,

where F : R^{2n+m} → R^{2n+m} is a mapping defined as follows:

    F(x, y, s) = [ Ax − b              ]
                 [ A^T y + s − Qx − c  ]
                 [ XSe − µe            ].

Actually, the first two terms of it are linear; only the last one, corresponding to the complementarity condition, is nonlinear.

Note that

    ∇F(x, y, s) = [ A   0    0 ]
                  [ −Q  A^T  I ]
                  [ S   0    X ].

Thus, for a given point (x, y, s) we find the Newton direction (∆x, ∆y, ∆s) by solving the system of linear equations:

    [ A   0    0 ] [ ∆x ]   [ b − Ax              ]
    [ −Q  A^T  I ] [ ∆y ] = [ c − A^T y − s + Qx  ]
    [ S   0    X ] [ ∆s ]   [ µe − XSe            ].
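The following numpy sketch assembles and solves this Newton system once, on a tiny made-up QP instance; all data below are assumptions chosen only for illustration.

# One Newton step for the QP barrier FOC on a small random, made-up problem.
import numpy as np

rng = np.random.default_rng(0)
n, m, mu = 4, 2, 1.0
A = rng.standard_normal((m, n))
Q = np.eye(n)                      # simple positive definite Q
c = rng.standard_normal(n)
x = np.ones(n); s = np.ones(n); y = np.zeros(m)
b = A @ x                          # make the current x primal feasible

X, S, I = np.diag(x), np.diag(s), np.eye(n)
e = np.ones(n)

# Assemble the (2n+m) x (2n+m) Newton matrix and right-hand side.
K = np.block([[A,          np.zeros((m, m)), np.zeros((m, n))],
              [-Q,         A.T,              I               ],
              [S,          np.zeros((n, m)), X               ]])
rhs = np.concatenate([b - A @ x,
                      c - A.T @ y - s + Q @ x,
                      mu * e - X @ S @ e])

d = np.linalg.solve(K, rhs)
dx, dy, ds = d[:n], d[n:n+m], d[n+m:]
print(dx, dy, ds)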

8


IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Interior-Point QP Algorithm

Initialize
    k = 0
    (x^0, y^0, s^0) ∈ F^0 (a strictly feasible primal-dual point)
    µ_0 = (1/n) · (x^0)^T s^0
    α_0 = 0.9995
Repeat until optimality
    k = k + 1
    µ_k = σ µ_{k−1}, where σ ∈ (0, 1)
    ∆ = Newton direction towards the µ-center
    Ratio test:
        α_P := max {α > 0 : x + α∆x ≥ 0},
        α_D := max {α > 0 : s + α∆s ≥ 0}.
    Make step:
        x^{k+1} = x^k + α_0 α_P ∆x,
        y^{k+1} = y^k + α_0 α_D ∆y,
        s^{k+1} = s^k + α_0 α_D ∆s.
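A compact Python sketch of the loop above follows; it uses dense linear algebra, a fixed σ, and a made-up two-variable example, so it illustrates the scheme rather than being a practical implementation.

# Illustrative dense interior-point loop for min c^T x + (1/2) x^T Q x, Ax = b, x >= 0.
import numpy as np

def qp_ipm(A, b, c, Q, x, y, s, sigma=0.1, alpha0=0.9995, tol=1e-8, max_iter=100):
    m, n = A.shape
    e = np.ones(n)
    for _ in range(max_iter):
        mu = (x @ s) / n
        if mu < tol:
            break
        mu = sigma * mu                          # target a smaller mu-center
        # Assemble and solve the (unreduced) Newton system from the previous slide.
        K = np.block([[A,          np.zeros((m, m)), np.zeros((m, n))],
                      [-Q,         A.T,              np.eye(n)       ],
                      [np.diag(s), np.zeros((n, m)), np.diag(x)      ]])
        rhs = np.concatenate([b - A @ x,
                              c - A.T @ y - s + Q @ x,
                              mu * e - x * s])
        d = np.linalg.solve(K, rhs)
        dx, dy, ds = d[:n], d[n:n+m], d[n+m:]
        # Ratio tests: largest steps keeping x and s nonnegative.
        aP = min(1.0, np.min(-x[dx < 0] / dx[dx < 0])) if (dx < 0).any() else 1.0
        aD = min(1.0, np.min(-s[ds < 0] / ds[ds < 0])) if (ds < 0).any() else 1.0
        x = x + alpha0 * aP * dx
        y = y + alpha0 * aD * dy
        s = s + alpha0 * aD * ds
    return x, y, s

# Tiny made-up example: Q = I, one equality constraint x1 + x2 = 1; optimum is x = (1, 0).
A = np.array([[1.0, 1.0]]); b = np.array([1.0])
c = np.array([-2.0, 0.0]); Q = np.eye(2)
x0, y0, s0 = np.array([0.5, 0.5]), np.array([0.0]), np.array([1.0, 1.0])
print(qp_ipm(A, b, c, Q, x0, y0, s0)[0])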

9

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

From LP to QP

QP problem

    min  c^T x + (1/2) x^T Q x
    s.t. Ax = b,
         x ≥ 0.

First order conditions (for the barrier problem)

    Ax = b,
    A^T y + s − Qx = c,
    XSe = µe.

Newton direction

    [ A   0    0 ] [ ∆x ]   [ ξ_p ]
    [ −Q  A^T  I ] [ ∆y ] = [ ξ_d ]
    [ S   0    X ] [ ∆s ]   [ ξ_µ ],

where
    ξ_p = b − Ax,
    ξ_d = c − A^T y − s + Qx,
    ξ_µ = µe − XSe.

Augmented system

    [ −Q − Θ^{-1}  A^T ] [ ∆x ]   [ ξ_d − X^{-1} ξ_µ ]
    [ A            0   ] [ ∆y ] = [ ξ_p              ].

Conclusion:
QP is a natural extension of LP.

10

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

IPMs: LP vs QP

Augmented system in LP

    [ −Θ^{-1}  A^T ] [ ∆x ]   [ ξ_d − X^{-1} ξ_µ ]
    [ A        0   ] [ ∆y ] = [ ξ_p              ].

Eliminate ∆x from the first equation and get normal equations

    (A Θ A^T) ∆y = g.

Augmented system in QP

    [ −Q − Θ^{-1}  A^T ] [ ∆x ]   [ ξ_d − X^{-1} ξ_µ ]
    [ A            0   ] [ ∆y ] = [ ξ_p              ].

Eliminate ∆x from the first equation and get normal equations

    (A (Q + Θ^{-1})^{-1} A^T) ∆y = g.

One can use normal equations in LP, but not

in QP. Normal equations in QP may become al-

most completely dense even for sparse matrices

A and Q. Thus, in QP, usually the indefinite

augmented system form is used.

11

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Sparsity Issues in QP

Example

A sparse symmetric matrix and its triangular factorization:

    [ 1  1          ]   [ 1             ]   [ 1  1          ]
    [ 1  2  1       ]   [ 1  1          ]   [    1  1       ]
    [    1  2  1    ] = [    1  1       ] · [       1  1    ]
    [       1  2  1 ]   [       1  1    ]   [          1  1 ]
    [          1  2 ]   [          1  1 ]   [             1 ]

Hence its inverse is the product of the inverses of the triangular factors, which are filled with ±1 entries, and this product is completely dense:

    [ 1 −1  1 −1  1 ]   [  1             ]   [  5 −4  3 −2  1 ]
    [    1 −1  1 −1 ]   [ −1  1          ]   [ −4  4 −3  2 −1 ]
    [       1 −1  1 ] · [  1 −1  1       ] = [  3 −3  3 −2  1 ]
    [          1 −1 ]   [ −1  1 −1  1    ]   [ −2  2 −2  2 −1 ]
    [             1 ]   [  1 −1  1 −1  1 ]   [  1 −1  1 −1  1 ].

Conclusion:

the inverse of the sparse matrix may be dense.
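This effect is easy to reproduce numerically; the following snippet (illustration only) inverts the sparse tridiagonal matrix from the example and counts the nonzeros.

# The sparse tridiagonal matrix from the example has a completely dense inverse.
import numpy as np

M = np.array([[1, 1, 0, 0, 0],
              [1, 2, 1, 0, 0],
              [0, 1, 2, 1, 0],
              [0, 0, 1, 2, 1],
              [0, 0, 0, 1, 2]], dtype=float)

Minv = np.linalg.inv(M)
print(np.round(Minv).astype(int))           # every entry is nonzero
print("nonzeros in M:   ", np.count_nonzero(M))
print("nonzeros in M^-1:", np.count_nonzero(Minv))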

IPMs for QP:

Do not explicitly invert the matrix Q + Θ−1

in the matrix A(Q + Θ−1)−1AT .

Use augmented system instead.

12


Interior Point Methods

for Linear, Quadratic

and Nonlinear Programming

Turin 2008

Jacek Gondzio

Lecture 8:

Separable Quadratic Programs

1

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Separable QPs are Easy

Regarding the computations involved, a quadratic program with a diagonal matrix Q = D:

    min  c^T x + (1/2) x^T D x
    s.t. Ax = b,
         x ≥ 0,

is as easy as a linear program.

Indeed, in this case, the Newton equation system can be reduced to the following normal equation system:

    (A (D + Θ^{-1})^{-1} A^T) ∆y = g.

Since D + Θ^{-1} is a diagonal matrix, this system is no more difficult to solve than the usual system arising in LP:

    (A Θ A^T) ∆y = g.

Conclusion:
If you can formulate the QP as a separable problem, it is usually worth a try.

2

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Separable QP: Example 1

Suppose the symmetric positive semidefinite matrix Q ∈ R^{n×n} in the quadratic program

    min  c^T x + (1/2) x^T Q x
    s.t. Ax = b,                                   (1)
         x ≥ 0,

is a product of the following matrices:

    Q = F^T D F,

where F ∈ R^{k×n} and D ∈ R^{k×k} is diagonal, for some k ≪ n. Introduce new variables u ∈ R^k such that u = Fx. Then

    x^T Q x = x^T F^T D F x = (Fx)^T D (Fx) = u^T D u.

The problem (1) can be replaced by the following equivalent separable one:

    min  c^T x + (1/2) u^T D u
    s.t. Ax = b,                                   (2)
         Fx − u = 0,
         x ≥ 0.

Although this problem has n + k variables (x, u) (while (1) had only n variables), for small k it is usually much easier to solve than (1).

3

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Example 1 (cont’d)

To derive the first order optimality conditions for (2) we first introduce x̄ = (x, u) ∈ R^{n+k}, c̄ = (c, 0) ∈ R^{n+k} and b̄ = (b, 0) ∈ R^{m+k}, define

    Ā = [ A   0 ]        Q̄ = [ 0  0 ]
        [ F  −I ]   and      [ 0  D ]

and rewrite the problem

    min  c̄^T x̄ + (1/2) x̄^T Q̄ x̄
    s.t. Ā x̄ = b̄,
         x ≥ 0.

We associate dual variables y ∈ R^m and z ∈ R^k with the linear constraints Ax = b and Fx − u = 0, respectively, and write the Lagrangian

    L(x, u, y, z, µ) = c̄^T x̄ + (1/2) x̄^T Q̄ x̄ − (y, z)^T (Ā x̄ − b̄) − µ Σ_{j=1}^n ln x_j
                     = c^T x + (1/2) u^T D u − y^T(Ax − b) − z^T(Fx − u) − µ Σ_{j=1}^n ln x_j.

Observe that u is a free variable and there is no

logarithmic barrier introduced for it.

4

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Example 1 (cont’d)

We write the conditions for a stationary point

    ∇_x L(x, u, y, z, µ) = c − A^T y − F^T z − µ X^{-1} e = 0,
    ∇_u L(x, u, y, z, µ) = Du + z = 0,
    ∇_y L(x, u, y, z, µ) = Ax − b = 0,
    ∇_z L(x, u, y, z, µ) = −Fx + u = 0,

and substitute s = µ X^{-1} e to get the first order optimality conditions:

    Ax = b,
    Fx − u = 0,
    A^T y + F^T z + s = c,
    −Du − z = 0,
    XSe = µe.

The Newton equation system for the FOC is:

    [ 0   0   A^T  F^T  I ] [ ∆x ]   [ r_x ]
    [ 0  −D   0   −I    0 ] [ ∆u ]   [ r_u ]
    [ A   0   0    0    0 ] [ ∆y ] = [ r_y ]
    [ F  −I   0    0    0 ] [ ∆z ]   [ r_z ]
    [ S   0   0    0    X ] [ ∆s ]   [ r_µ ],

where r_µ denotes the residual of the (nonlinear) complementarity equation XSe = µe.

5

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Example 1 (cont’d)

For the nonseparable problem (1), we have to solve

    [ −Q − Θ^{-1}  A^T ] [ ∆x ]   [ r_x ]
    [ A            0   ] [ ∆y ] = [ r_y ].

This is a linear system with n + m equations and n + m unknowns.

For the separable problem (2), we have to solve

    [ −Θ_x^{-1}   0   A^T  F^T ] [ ∆x ]   [ r_x ]
    [  0         −D   0   −I   ] [ ∆u ]   [ r_u ]
    [  A          0   0    0   ] [ ∆y ] = [ r_y ]
    [  F         −I   0    0   ] [ ∆z ]   [ r_z ],

where Θ_x = X S^{-1} ∈ R^{n×n}.

This new system has n + m + 2k equations and

n+m+2k unknowns. It is larger but the matrix

involved in it is much sparser.

6

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Example 1 (cont’d)

This larger system is much easier to solve. Indeed, the upper-left 2 × 2 block is a diagonal matrix. It can be eliminated, giving the system of equations

    [ A   0 ] [ Θ_x  0      ] [ A^T  F^T ] [ ∆y ]   [ r_y ]
    [ F  −I ] [ 0    D^{-1} ] [ 0    −I  ] [ ∆z ] = [ r_z ].

Having done the multiplications on the left-hand side, we obtain

    [ A Θ_x A^T    A Θ_x F^T           ] [ ∆y ]   [ r_y ]
    [ F Θ_x A^T    F Θ_x F^T + D^{-1}  ] [ ∆z ] = [ r_z ].

This reduced system has only m + k equations

and m + k unknowns.
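For illustration, the snippet below (with assumed random data) forms and solves this reduced (m + k) × (m + k) system for a separable reformulation with Q = F^T D F; all quantities are made up for the sketch.

# Build and solve the reduced system [A Th A^T, A Th F^T; F Th A^T, F Th F^T + D^-1].
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 50, 10, 3
A = rng.standard_normal((m, n))
F = rng.standard_normal((k, n))
D = np.diag(rng.uniform(1.0, 2.0, k))          # k x k diagonal, positive
theta_x = rng.uniform(0.1, 1.0, n)             # Theta_x = X S^{-1}, assumed values
r_y = rng.standard_normal(m)
r_z = rng.standard_normal(k)

ATh = A * theta_x                              # A Theta_x (scale columns of A)
FTh = F * theta_x                              # F Theta_x
K = np.block([[ATh @ A.T, ATh @ F.T],
              [FTh @ A.T, FTh @ F.T + np.linalg.inv(D)]])
d = np.linalg.solve(K, np.concatenate([r_y, r_z]))
dy, dz = d[:m], d[m:]
print(dy.shape, dz.shape)                      # (10,), (3,)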

7

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Separable QP: Example 2

Suppose the symmetric positive definite matrix

Q ∈ Rn×n in the quadratic program

min cTx + 12xTQ x

s.t. Ax = b, (3)

x ≥ 0,

has the following form

Q = D + ddT ,

where D is a diagonal matrix and d ∈ Rn. Intro-

duce the new variable u ∈ R such that u = dTx.

Then

xTQx=xT(D+ddT )x=xTDx+(dTx)(dTx)=xTDx+u2.

The problem (3) can be replaced by the following

equivalent separable one:

min cTx + 12xTDx + 1

2u2

s.t. Ax = b, (4)

dTx − u = 0,

x ≥ 0.

This problem has n+1 variables (x, u) (while (3)

had only n variables). However, it is much easier

to solve than (3).

8


IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Example 2 (cont’d)

For the nonseparable problem (3), we have to solve

    [ −Q − Θ^{-1}  A^T ] [ ∆x ]   [ r_x ]
    [ A            0   ] [ ∆y ] = [ r_y ].

This is a linear system with n + m equations and n + m unknowns.

For the separable problem (4), we have to solve

    [ −D − Θ_x^{-1}   0   A^T  d ] [ ∆x ]   [ r_x ]
    [  0             −1   0   −1 ] [ ∆u ]   [ r_u ]
    [  A              0   0    0 ] [ ∆y ] = [ r_y ]
    [  d^T           −1   0    0 ] [ ∆z ]   [ r_z ],

where y ∈ R^m and z ∈ R are dual variables associated with the linear constraints Ax = b and d^T x − u = 0, respectively.

This new system has n + m + 2 equations and

n + m + 2 unknowns. It is larger but the matrix

involved in it is much sparser.

9

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Example 2 (cont’d)

This larger system is much easier to solve. Indeed, the upper-left 2 × 2 block is a diagonal matrix. It can be eliminated, giving the system of equations

    [ A    0 ] [ (D + Θ_x^{-1})^{-1}  0 ] [ A^T  d ] [ ∆y ]   [ r_y ]
    [ d^T −1 ] [ 0                    1 ] [ 0   −1 ] [ ∆z ] = [ r_z ].

Having done the multiplications on the left-hand side, we obtain

    [ A Θ̄_x A^T     A Θ̄_x d       ] [ ∆y ]   [ r_y ]
    [ d^T Θ̄_x A^T   d^T Θ̄_x d + 1 ] [ ∆z ] = [ r_z ],

where Θ̄_x = (D + Θ_x^{-1})^{-1}.

This reduced system has only m + 1 equations

and m + 1 unknowns.

10


Interior Point Methods

for Linear, Quadratic

and Nonlinear Programming

Turin 2008

Jacek Gondzio

Lecture 9:

IPMs for Nonlinear Programs

1

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Convex Nonlinear Optimization

Consider the nonlinear optimization problem

min f(x)

s.t. g(x) ≤ 0,

where x ∈ R^n, and f : R^n → R and g : R^n → R^m are convex, twice differentiable.

Assumptions:

f and g are convex

⇒ If there exists a local minimum then it is a

global one.

f and g are twice differentiable

⇒ We can use the second order Taylor

approximations of them.

Some additional (technical) conditions

⇒ We need them to prove that the point which

satisfies the first order optimality conditions is

the optimum. We won’t use them in this course.

2

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Taylor Expansion of f : R → R

Let f : R → R.

If all derivatives of f are continuously differentiable at x_0, then

    f(x) = Σ_{k=0}^∞ (f^{(k)}(x_0) / k!) (x − x_0)^k,

where f^{(k)}(x_0) is the k-th derivative of f at x_0.

The first order approximation of the function:

    f(x) = f(x_0) + f′(x_0)(x − x_0) + r_2(x − x_0),

where the remainder satisfies:

    lim_{x→x_0} r_2(x − x_0) / (x − x_0) = 0.

The second order approximation:

    f(x) = f(x_0) + f′(x_0)(x − x_0) + (1/2) f″(x_0)(x − x_0)² + r_3(x − x_0),

where the remainder satisfies:

    lim_{x→x_0} r_3(x − x_0) / (x − x_0)² = 0.

3

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Derivatives of f : R^n → R

Consider a real-valued function f : R^n → R.

The vector

    ∇f(x) = [ ∂f/∂x_1 (x), ∂f/∂x_2 (x), ..., ∂f/∂x_n (x) ]^T

is called the gradient of f at x.

The matrix

    ∇²f(x) = [ ∂²f/∂x_1²      ∂²f/∂x_1∂x_2   ...  ∂²f/∂x_1∂x_n ]
             [ ∂²f/∂x_2∂x_1   ∂²f/∂x_2²      ...  ∂²f/∂x_2∂x_n ]
             [ ...            ...            ...  ...          ]
             [ ∂²f/∂x_n∂x_1   ∂²f/∂x_n∂x_2   ...  ∂²f/∂x_n²    ]   (all evaluated at x)

is called the Hessian of f at x.
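As a small illustration (not from the lecture), the gradient and Hessian of a concrete function can be approximated by finite differences and compared with the exact formulas; the function below is an arbitrary example.

# Finite-difference gradient and Hessian of f(x) = x1^2 + 3*x1*x2 + 2*x2^2.
import numpy as np

def f(x):
    return x[0]**2 + 3*x[0]*x[1] + 2*x[1]**2

def num_grad(f, x, h=1e-6):
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2*h)      # central difference
    return g

def num_hess(f, x, h=1e-4):
    n = len(x); H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros_like(x); e[i] = h
        H[:, i] = (num_grad(f, x + e) - num_grad(f, x - e)) / (2*h)
    return H

x = np.array([1.0, -2.0])
print(num_grad(f, x))          # exact gradient: [2*x1 + 3*x2, 3*x1 + 4*x2] = [-4, -5]
print(num_hess(f, x))          # exact Hessian:  [[2, 3], [3, 4]]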

4

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Taylor Expansion of f : R^n → R

Let f : R^n → R.

If all derivatives of f are continuously differentiable at x_0, then

    f(x) = Σ_{k=0}^∞ (f^{(k)}(x_0) / k!) (x − x_0)^k,

where f^{(k)}(x_0) is the k-th derivative of f at x_0.

The first order approximation of the function:

    f(x) = f(x_0) + ∇f(x_0)^T (x − x_0) + r_2(x − x_0),

where the remainder satisfies:

    lim_{x→x_0} r_2(x − x_0) / ‖x − x_0‖ = 0.

The second order approximation:

    f(x) = f(x_0) + ∇f(x_0)^T (x − x_0) + (1/2)(x − x_0)^T ∇²f(x_0)(x − x_0) + r_3(x − x_0),

where the remainder satisfies:

    lim_{x→x_0} r_3(x − x_0) / ‖x − x_0‖² = 0.

5

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Convexity: Reminder

Property 1.
For any collection {C_i | i ∈ I} of convex sets, the intersection ⋂_{i∈I} C_i is convex.

Property 4.
If C is a convex set and f : C → R is a convex function, the level sets {x ∈ C | f(x) ≤ α} and {x ∈ C | f(x) < α} are convex for all scalars α.

Lemma 1:
If g : R^n → R^m is a convex function, then the set {x ∈ R^n | g(x) ≤ 0} is convex.

Proof:
Since every function g_i : R^n → R, i = 1, 2, ..., m, is convex, from Property 4 we conclude that every set X_i = {x ∈ R^n | g_i(x) ≤ 0} is convex. From Property 1, we conclude that the intersection X = ⋂_{i=1}^m X_i = {x ∈ R^n | g(x) ≤ 0} is convex, which completes the proof.

6

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Diff’ble Convex Functions

Property 8.
Let C ⊆ R^n be a convex set and f : C → R be twice continuously differentiable over C.
(a) If ∇²f(x) is positive semidefinite for all x ∈ C, then f is convex.
(b) If ∇²f(x) is positive definite for all x ∈ C, then f is strictly convex.
(c) If f is convex, then ∇²f(x) is positive semidefinite for all x ∈ C.

Let the second order approximation of the function be given:

    f(x) ≈ f(x_0) + c^T (x − x_0) + (1/2)(x − x_0)^T Q (x − x_0),

where c = ∇f(x_0) and Q = ∇²f(x_0).

From Property 8, it follows that when f is convex and twice differentiable, then Q exists and is a positive semidefinite matrix.

Conclusion:

If f is convex and twice differentiable, then op-

timization of f(x) can (locally) be replaced with

the minimization of its quadratic model.

7

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Nonlinear Opt. with IPMs

Nonlinear Optimization via QPs:

Sequential Quadratic Programming (SQP).

Repeat until optimality:

• approximate NLP (locally) with a QP;

• solve (approximately) the QP.

Nonlinear Optimization with IPMs:

works similarly to SQP scheme.

However, the (local) QP approximations are not

solved to optimality. Instead, only one step in

the Newton direction corresponding to a given

QP approximation is made and the new QP ap-

proximation is computed.

Derive an IPM for NLP:

• replace inequalities with log barriers;

• form the Lagrangian;

• write the first order optimality conditions;

• apply Newton method to them.

8


IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

NLP Notation

Consider the nonlinear optimization problem

    min f(x)  s.t.  g(x) ≤ 0,

where x ∈ R^n, and f : R^n → R and g : R^n → R^m are convex, twice differentiable.

The vector-valued function g : R^n → R^m has a derivative A(x) ∈ R^{m×n},

    A(x) = ∇g(x) = [ ∂g_i/∂x_j ]_{i=1..m, j=1..n},

which is called the Jacobian of g.

The Lagrangian associated with the NLP is:

    L(x, y) = f(x) + y^T g(x),

where y ∈ R^m, y ≥ 0 are Lagrange multipliers (dual variables).

The first derivatives of the Lagrangian:

    ∇_x L(x, y) = ∇f(x) + ∇g(x)^T y,
    ∇_y L(x, y) = g(x).

The Hessian of the Lagrangian, Q(x, y) ∈ R^{n×n}:

    Q(x, y) = ∇²_{xx} L(x, y) = ∇²f(x) + Σ_{i=1}^m y_i ∇²g_i(x).

9

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Convexity in NLP

Lemma 2:
If f : R^n → R and g : R^n → R^m are convex, twice differentiable, then the Hessian of the Lagrangian

    Q(x, y) = ∇²f(x) + Σ_{i=1}^m y_i ∇²g_i(x)

is positive semidefinite for any x and any y ≥ 0.
If f is strictly convex, then Q(x, y) is positive definite for any x and any y ≥ 0.

Proof:

Using Property 8, the convexity of f implies that

∇2f(x) is positive semidefinite for any x. Simi-

larly, the convexity of g implies that for all i =

1,2, ..., m, ∇2gi(x) is positive semidefinite for any

x.

Since yi ≥ 0 for all i = 1,2, ..., m and Q(x, y)

is the sum of positive semidefinite matrices, we

conclude that Q(x, y) is positive semidefinite.

If f is strictly convex, then ∇2f(x) is positive

definite and so is Q(x, y).

10

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

IPM for NLP

Add slack variables to the nonlinear inequalities:

    min  f(x)
    s.t. g(x) + z = 0,
         z ≥ 0,

where z ∈ R^m. Replace the inequality z ≥ 0 with the logarithmic barrier:

    min  f(x) − µ Σ_{i=1}^m ln z_i
    s.t. g(x) + z = 0.

Write out the Lagrangian

    L(x, y, z, µ) = f(x) + y^T(g(x) + z) − µ Σ_{i=1}^m ln z_i,

and the conditions for a stationary point

    ∇_x L(x, y, z, µ) = ∇f(x) + ∇g(x)^T y = 0,
    ∇_y L(x, y, z, µ) = g(x) + z = 0,
    ∇_z L(x, y, z, µ) = y − µ Z^{-1} e = 0,

where Z^{-1} = diag{z_1^{-1}, z_2^{-1}, ..., z_m^{-1}}.

The First Order Optimality Conditions are:

    ∇f(x) + ∇g(x)^T y = 0,
    g(x) + z = 0,
    YZe = µe.

11

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Newton Method for the FOC

The first order optimality conditions for the barrier problem form a large system of nonlinear equations

    F(x, y, z) = 0,

where F : R^{n+2m} → R^{n+2m} is a mapping defined as follows:

    F(x, y, z) = [ ∇f(x) + ∇g(x)^T y ]
                 [ g(x) + z          ]
                 [ YZe − µe          ].

Note that all three terms of it are nonlinear.
(In LP and QP the first two terms were linear.)

Observe that

    ∇F(x, y, z) = [ Q(x, y)  A(x)^T  0 ]
                  [ A(x)     0       I ]
                  [ 0        Z       Y ],

where A(x) is the Jacobian of g and Q(x, y) is the Hessian of L.
They are defined as follows:

    A(x) = ∇g(x) ∈ R^{m×n},
    Q(x, y) = ∇²f(x) + Σ_{i=1}^m y_i ∇²g_i(x) ∈ R^{n×n}.

12

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Newton Method (cont’d)

For a given point (x, y, z) we find the Newton direction (∆x, ∆y, ∆z) by solving the system of linear equations:

    [ Q(x, y)  A(x)^T  0 ] [ ∆x ]   [ −∇f(x) − A(x)^T y ]
    [ A(x)     0       I ] [ ∆y ] = [ −g(x) − z          ]
    [ 0        Z       Y ] [ ∆z ]   [ µe − YZe           ].

Using the third equation we eliminate

    ∆z = µ Y^{-1} e − Ze − Z Y^{-1} ∆y

from the second equation and get

    [ Q(x, y)  A(x)^T    ] [ ∆x ]   [ −∇f(x) − A(x)^T y ]
    [ A(x)     −Z Y^{-1} ] [ ∆y ] = [ −g(x) − µ Y^{-1} e ].
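The following Python sketch performs one such Newton step for a small made-up convex NLP with a single inequality constraint; the problem data and the starting point are assumptions chosen only for illustration.

# One Newton step for the NLP barrier FOC:
# min (x1-2)^2 + (x2-1)^2  s.t.  x1^2 + x2^2 - 1 <= 0.
import numpy as np

def grad_f(x):  return np.array([2*(x[0]-2), 2*(x[1]-1)])
def hess_f(x):  return 2*np.eye(2)
def g(x):       return np.array([x[0]**2 + x[1]**2 - 1])   # single constraint, m = 1
def jac_g(x):   return np.array([[2*x[0], 2*x[1]]])
def hess_g(x):  return 2*np.eye(2)                          # Hessian of g_1

mu = 0.1
x = np.array([0.5, 0.5]); y = np.array([1.0]); z = -g(x)    # z > 0 since g(x) < 0

A = jac_g(x)                                                 # Jacobian, 1 x 2
Q = hess_f(x) + y[0]*hess_g(x)                               # Hessian of the Lagrangian
Y, Z, I = np.diag(y), np.diag(z), np.eye(1)

K = np.block([[Q,                A.T,              np.zeros((2, 1))],
              [A,                np.zeros((1, 1)), I               ],
              [np.zeros((1, 2)), Z,                Y               ]])
rhs = np.concatenate([-grad_f(x) - A.T @ y,
                      -g(x) - z,
                      mu*np.ones(1) - Y @ Z @ np.ones(1)])
d = np.linalg.solve(K, rhs)
dx, dy, dz = d[:2], d[2:3], d[3:]
print(dx, dy, dz)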

13

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Interior-Point NLP Algorithm

Initialize
    k = 0
    (x^0, y^0, z^0) such that y^0 > 0 and z^0 > 0
    µ_0 = (1/m) · (y^0)^T z^0
Repeat until optimality
    k = k + 1
    µ_k = σ µ_{k−1}, where σ ∈ (0, 1)
    Compute A(x) and Q(x, y)
    ∆ = Newton direction towards the µ-center
    Ratio test:
        α_1 := max {α > 0 : y + α∆y ≥ 0},
        α_2 := max {α > 0 : z + α∆z ≥ 0}.
    Choose the step:
        (use trust region or line search)
        α ≤ min {α_1, α_2}.
    Make step:
        x^{k+1} = x^k + α∆x,
        y^{k+1} = y^k + α∆y,
        z^{k+1} = z^k + α∆z.

14

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

From QP to NLP

Newton direction for QP

    [ −Q  A^T  I ] [ ∆x ]   [ ξ_d ]
    [ A   0    0 ] [ ∆y ] = [ ξ_p ]
    [ S   0    X ] [ ∆s ]   [ ξ_µ ].

Augmented system for QP

    [ −Q − S X^{-1}  A^T ] [ ∆x ]   [ ξ_d − X^{-1} ξ_µ ]
    [ A              0   ] [ ∆y ] = [ ξ_p              ].

Newton direction for NLP

    [ Q(x, y)  A(x)^T  0 ] [ ∆x ]   [ −∇f(x) − A(x)^T y ]
    [ A(x)     0       I ] [ ∆y ] = [ −g(x) − z          ]
    [ 0        Z       Y ] [ ∆z ]   [ µe − YZe           ].

Augmented system for NLP

    [ Q(x, y)  A(x)^T    ] [ ∆x ]   [ −∇f(x) − A(x)^T y ]
    [ A(x)     −Z Y^{-1} ] [ ∆y ] = [ −g(x) − µ Y^{-1} e ].

Conclusion:

NLP is a natural extension of QP.

15

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Lin. Algebra in IPM for NLP

Newton direction for NLP

    [ Q(x, y)  A(x)^T  0 ] [ ∆x ]   [ −∇f(x) − A(x)^T y ]
    [ A(x)     0       I ] [ ∆y ] = [ −g(x) − z          ]
    [ 0        Z       Y ] [ ∆z ]   [ µe − YZe           ].

The corresponding augmented system

    [ Q(x, y)  A(x)^T    ] [ ∆x ]   [ −∇f(x) − A(x)^T y ]
    [ A(x)     −Z Y^{-1} ] [ ∆y ] = [ −g(x) − µ Y^{-1} e ],

where A(x) ∈ R^{m×n} is the Jacobian of g and Q(x, y) ∈ R^{n×n} is the Hessian of L:

    A(x) = ∇g(x),
    Q(x, y) = ∇²f(x) + Σ_{i=1}^m y_i ∇²g_i(x).

Automatic differentiation is very useful ... get Q(x, y) and A(x) from an Algebraic Modeling Language.

[Diagram: the Algebraic Modeling Language (AML) passes the model to the SOLVER, which calls a Numerical Analysis Package; the solution/output is returned.]

16


IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Automatic Differentiation

AD in the Internet:

• ADIFOR (FORTRAN code for AD):

http://www-unix.mcs.anl.gov/autodiff/ADIFOR/

• ADOL-C (C/C++ code for AD):

http://www-unix.mcs.anl.gov/autodiff/

AD Tools/adolc.anl/adolc.html

• AD page at Cornell:

http://www.tc.cornell.edu/~averma/AD/

17

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Interior Point Methods

Conclusions:

• Interior Point Methods provide the unified

framework for convex optimization.

• Interior Point Methods provide polynomial al-

gorithms for LP, QP and NLP.

• The linear algebra in LP, QP and NLP is very

similar.

• Use IPMs to solve very large problems.

Further Extensions:

• Nonconvex optimization.

IPMs in the Internet:

• LP FAQ (Frequently Asked Questions):

http://www-unix.mcs.anl.gov/otc/Guide/faq/

• Interior Point Methods On-Line:

http://www-unix.mcs.anl.gov/otc/InteriorPoint/

• NEOS (Network Enabled Opt. Services):

http://www-neos.mcs.anl.gov/

18

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Project: NEOS

I encourage you to do this one-hour project.

19

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Project: Use NEOS

NEOS stands for

Network Enabled Optimization Services:

http://www-neos.mcs.anl.gov/

You can use optimization facilities remotely:

prepare an MPS file with your problem,

submit it to NEOS.

The NEOS server will execute your job on one of the available machines (you do not know where) and

will send you the solution by e-mail.

Your Task

Solve your LP problem via NEOS.

Use at least 3 different LP solvers.

Compare the solutions obtained.

20


Interior Point Methods

for Linear, Quadratic

and Nonlinear Programming

Turin 2008

Jacek Gondzio

Lecture 10:

More on Newton Method

1

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Newton Method

Let f : R^n → R^n be a twice continuously differentiable function such that ∇f(x) ∈ R^{n×n} is nonsingular at any x. The Newton method finds a root of the nonlinear equation system

    f(x) = 0

by repeating the following step

    x^{k+1} = x^k − (∇f(x^k))^{-1} f(x^k).

Let us rewrite it in a simplified form

    x^{k+1} = φ(x^k),

and observe that at the solution x, φ′(x) = 0.
Indeed (we check it for f : R → R),

    φ′(x) = (x − f(x)/f′(x))′ = f(x) f″(x) / (f′(x))² = 0,

because at the solution x: f(x) = 0.

Near the solution x, the Newton method converges quadratically, i.e., the error of the solution reduces as follows:

    ‖e^{k+1}‖ ≤ C ‖e^k‖²,

where C is a constant.

2

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Equations ( ⇒ ) Optimization

Let f : R^n → R be a twice continuously differentiable function.

Finding an (unconstrained) minimum of f (or

more generally, finding a stationary point of f)

is equivalent to solving equation

∇f(x) = 0.

This is a nonlinear system of equations that can

be solved with the Newton method.

Assume ∇2f(x) ∈ Rn×n is nonsingular at any x.

Newton method for optimization repeats the fol-

lowing step

xk+1 = xk − (∇2f(xk))−1∇f(xk).

3

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Another View

Newton Method for Optimization

Let f : R^n → R be a twice continuously differentiable function. Suppose we build a quadratic model f̃ of f around a given point x^k, i.e., we define ∆x = x − x^k and write:

    f̃(x) = f(x^k) + ∇f(x^k)^T ∆x + (1/2) ∆x^T ∇²f(x^k) ∆x.

Now we optimize the model f̃ instead of optimizing f.

A minimum (or, more generally, a stationary point) of the quadratic model satisfies:

    ∇f̃(x) = ∇f(x^k) + ∇²f(x^k) ∆x = 0,

i.e.

    ∆x = x − x^k = −(∇²f(x^k))^{-1} ∇f(x^k),

which reduces to the usual equation:

    x^{k+1} = x^k − (∇²f(x^k))^{-1} ∇f(x^k).

4

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Quadratic Convergence

Let f : R^n → R be a twice continuously differentiable function. Let us apply the Newton method to optimize it:

    x^{k+1} = x^k − (∇²f(x^k))^{-1} ∇f(x^k).

Lemma (Quadratic Convergence).
If f is strongly convex with some constant m, i.e.,

    h^T ∇²f(x) h ≥ m ‖h‖₂²,   ∀ x, h ∈ R^n,

and ∇²f is Lipschitz continuous with constant L, i.e.,

    ‖(∇²f(x) − ∇²f(y)) h‖₂ ≤ L ‖x − y‖₂ ‖h‖₂,   ∀ x, y, h ∈ R^n,

then

    ‖∇f(x^{k+1})‖₂ ≤ (L / (2m²)) ‖∇f(x^k)‖₂².

In particular, in the region defined by the inequality

    (L / (2m²)) ‖∇f(x^k)‖₂ ≤ 1,

the Newton method converges quadratically.

5

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Global vs Local Behaviour

The Newton method behaves very well near the solution, where it displays quadratic convergence.

But it may behave very badly far away from it.

Why?

Newton method uses quadratic approximation.

Such approximation is valid only locally. Thus

one cannot expect that the Newton direction

∆x = −(∇2f(x))−1∇f(x),

is an improvement direction everywhere.

6

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Newton Method in IPMs

Newton method behaves very well in the case of

Interior Point Methods for Linear Programming.

This is a consequence of a ’weak nonlinearity’

introduced by the logarithmic barrier function.

Newton method applied in IPMs for NLP needs

additional safeguards to ensure global conver-

gence.

There are two possible safeguards:

• use Trust Region,

i.e. believe the quadratic model only in a

neighbourhood of the current point; or

• use Line Search,

i.e. optimize f along Newton direction

∆x = −(∇2f(x))−1∇f(x).

7

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Newton method may diverge

[Figure: plot of f(x) = e − e^{1/x}, which has its root at x = 1.]

Newton method applied from different starting points:

    iter   x_k              x_k          x_k          x_k
     0     −1.0             0.5          1.5          2.0
     1     −.739 · 10^1     .65803014    .60987204    −.595
     2     −.123 · 10^3     .83352370    .78563159    −.541 · 10^1
     3     −.263 · 10^5     .95930672    .93302399    −.718 · 10^2
     4     −.119 · 10^10    .99752767    .99332404    −.913 · 10^4
     5     −.244 · 10^19    .99999083    .99993320    −.143 · 10^9
     6     −.102 · 10^38    1.0000000    .99999999    −.352 · 10^17
     7     −.180 · 10^75    1.0000000    1.0000000    −.213 · 10^34
     8     −.555 · 10^148   1.0000000    1.0000000    −.780 · 10^67

The algorithm converges from x_0 = 0.5 or x_0 = 1.5 but diverges from x_0 = −1.0 and x_0 = 2.0.

8
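The behaviour in the table is easy to reproduce; the short script below (illustration only) runs eight Newton steps for f(x) = e − e^{1/x} from the same four starting points.

# Newton's method for f(x) = e - e^(1/x) from four different starting points.
import numpy as np

def f(x):      return np.e - np.exp(1.0 / x)
def fprime(x): return np.exp(1.0 / x) / x**2     # derivative of e - e^(1/x)

for x in [-1.0, 0.5, 1.5, 2.0]:
    xs = [x]
    for _ in range(8):
        x = x - f(x) / fprime(x)                 # Newton step
        xs.append(x)
    print(["%.4g" % v for v in xs])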


IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

− log x Barrier Function

Consider the primal barrier linear program

    min  c^T x − µ Σ_{j=1}^n ln x_j
    s.t. Ax = b,

where µ ≥ 0 is a barrier parameter.

Write out the Lagrangian

    L(x, y, µ) = c^T x − y^T(Ax − b) − µ Σ_{j=1}^n ln x_j,

and the conditions for a stationary point

    ∇_x L(x, y, µ) = c − A^T y − µ X^{-1} e = 0,
    ∇_y L(x, y, µ) = Ax − b = 0,

where X^{-1} = diag{x_1^{-1}, x_2^{-1}, ..., x_n^{-1}}.

Let us denote

    s = µ X^{-1} e,  i.e.  XSe = µe.

The First Order Optimality Conditions are:

    Ax = b,
    A^T y + s = c,
    XSe = µe.

9

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

− log x Barrier: Newton Method

The first order optimality conditions for the barrier problem form a large system of nonlinear equations

    F(x, y, s) = 0,

where F : R^{2n+m} → R^{2n+m} is a mapping defined as follows:

    F(x, y, s) = [ Ax − b         ]
                 [ A^T y + s − c  ]
                 [ XSe − µe       ].

Actually, the first two terms of it are linear; only the last one, corresponding to the complementarity condition, is nonlinear.

Note that

    ∇F(x, y, s) = [ A  0    0 ]
                  [ 0  A^T  I ]
                  [ S  0    X ].

Thus, for a given point (x, y, s) we find the Newton direction (∆x, ∆y, ∆s) by solving the system of linear equations:

    [ A  0    0 ] [ ∆x ]   [ b − Ax         ]
    [ 0  A^T  I ] [ ∆y ] = [ c − A^T y − s  ]
    [ S  0    X ] [ ∆s ]   [ µe − XSe       ].

10

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

1/x^α, α > 0 Barrier Function

Consider the primal barrier linear program

    min  c^T x + µ Σ_{j=1}^n 1/x_j^α
    s.t. Ax = b,

where µ ≥ 0 is a barrier parameter and α > 0.

Write out the Lagrangian

    L(x, y, µ) = c^T x − y^T(Ax − b) + µ Σ_{j=1}^n 1/x_j^α,

and the conditions for a stationary point

    ∇_x L(x, y, µ) = c − A^T y − µα X^{−α−1} e = 0,
    ∇_y L(x, y, µ) = Ax − b = 0,

where X^{−α−1} = diag{x_1^{−α−1}, x_2^{−α−1}, ..., x_n^{−α−1}}.

Let us denote

    s = µα X^{−α−1} e,  i.e.  X^{α+1} S e = µα e.

The First Order Optimality Conditions are:

    Ax = b,
    A^T y + s = c,
    X^{α+1} S e = µα e.

11

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

1/x^α, α > 0 Barrier: Newton Method

The first order optimality conditions for the barrier problem are

    F(x, y, s) = 0,

where F : R^{2n+m} → R^{2n+m} is a mapping defined as follows:

    F(x, y, s) = [ Ax − b             ]
                 [ A^T y + s − c      ]
                 [ X^{α+1} S e − µα e ].

As before, only the last term, corresponding to the complementarity condition, is nonlinear.

Note that

    ∇F(x, y, s) = [ A            0    0        ]
                  [ 0            A^T  I        ]
                  [ (α+1) X^α S  0    X^{α+1}  ].

Thus, for a given point (x, y, s) we find the Newton direction (∆x, ∆y, ∆s) by solving the system of linear equations:

    [ A            0    0        ] [ ∆x ]   [ b − Ax              ]
    [ 0            A^T  I        ] [ ∆y ] = [ c − A^T y − s       ]
    [ (α+1) X^α S  0    X^{α+1}  ] [ ∆s ]   [ µα e − X^{α+1} S e  ].

12

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

e^{1/x} Barrier Function

Consider the primal barrier linear program

    min  c^T x + µ Σ_{j=1}^n e^{1/x_j}
    s.t. Ax = b,

where µ ≥ 0 is a barrier parameter.

Write out the Lagrangian

    L(x, y, µ) = c^T x − y^T(Ax − b) + µ Σ_{j=1}^n e^{1/x_j},

and the conditions for a stationary point

    ∇_x L(x, y, µ) = c − A^T y − µ X^{-2} exp(X^{-1}) e = 0,
    ∇_y L(x, y, µ) = Ax − b = 0,

where exp(X^{-1}) = diag{e^{1/x_1}, e^{1/x_2}, ..., e^{1/x_n}}.

Let us denote

    s = µ X^{-2} exp(X^{-1}) e,  i.e.  X² exp(−X^{-1}) S e = µe.

The First Order Optimality Conditions are:

    Ax = b,
    A^T y + s = c,
    X² exp(−X^{-1}) S e = µe.

13

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

e^{1/x} Barrier: Newton Method

The first order optimality conditions are

    F(x, y, s) = 0,

where F : R^{2n+m} → R^{2n+m} is defined as follows:

    F(x, y, s) = [ Ax − b                    ]
                 [ A^T y + s − c             ]
                 [ X² exp(−X^{-1}) S e − µe  ].

As before, only the last term, corresponding to the complementarity condition, is nonlinear.

Note that

    ∇F(x, y, s) = [ A                       0    0                 ]
                  [ 0                       A^T  I                 ]
                  [ (2X+I) exp(−X^{-1}) S   0    X² exp(−X^{-1})   ].

The Newton direction (∆x, ∆y, ∆s) solves the system of linear equations:

    [ A                       0    0                ] [ ∆x ]   [ b − Ax                     ]
    [ 0                       A^T  I                ] [ ∆y ] = [ c − A^T y − s              ]
    [ (2X+I) exp(−X^{-1}) S   0    X² exp(−X^{-1})  ] [ ∆s ]   [ µe − X² exp(−X^{-1}) S e   ].

14

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Why Log Barrier is the Best?

The First Order Optimality Conditions:

    − log x :  XSe = µe,
    1/x^α   :  X^{α+1} S e = µα e,
    e^{1/x} :  X² exp(−X^{-1}) S e = µe.

Log Barrier ensures the symmetry between the primal and the dual.

Newton Equation System:

    − log x :  ∇F_3 = [ S, 0, X ],
    1/x^α   :  ∇F_3 = [ (α+1) X^α S, 0, X^{α+1} ],
    e^{1/x} :  ∇F_3 = [ (2X+I) exp(−X^{-1}) S, 0, X² exp(−X^{-1}) ].

Log Barrier produces 'the weakest nonlinearity'.

15

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Self-concordant Functions

There is a nice property of the function that is

responsible for a good behaviour of the Newton

method.

Def.
Let C ⊆ R^n be an open nonempty convex set. Let f : C → R be a three times continuously differentiable convex function.
A function f is called self-concordant if there exists a constant p > 0 such that

    |∇³f(x)[h, h, h]| ≤ 2 p^{−1/2} (∇²f(x)[h, h])^{3/2},   ∀ x ∈ C, ∀ h : x + h ∈ C.

(We then say that f is p-self-concordant.)

Note that a self-concordant function is always well approximated by the quadratic model, because the error of such an approximation can be bounded by the 3/2 power of ∇²f(x)[h, h].

16


IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Self-concordant Barriers

Lemma
The barrier function − log x is self-concordant on R_+.

Proof
Consider f(x) = − log x. We compute

    f′(x) = −x^{-1},  f″(x) = x^{-2}  and  f‴(x) = −2x^{-3},

and check that the self-concordance condition is satisfied for p = 1.
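For readers who want to double-check this computation, here is a small SymPy verification (illustration only) that |f‴(x)| ≤ 2 (f″(x))^{3/2} holds, in fact with equality, for f(x) = −log x on x > 0.

# Symbolic check of the self-concordance inequality for f(x) = -log(x), p = 1.
import sympy as sp

x = sp.symbols('x', positive=True)
f = -sp.log(x)
f2 = sp.diff(f, x, 2)                      # second derivative: x**(-2)
f3 = sp.diff(f, x, 3)                      # third derivative: -2*x**(-3)

# Self-concordance with p = 1 requires |f'''(x)| <= 2 * (f''(x))**(3/2).
lhs = sp.Abs(f3)
rhs = 2 * f2**sp.Rational(3, 2)
print(sp.simplify(lhs - rhs))              # 0  -> the condition holds with equality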

Lemma

The barrier function 1/xα, with α ∈ (0,∞) is not

self-concordant on R+.

Lemma

The barrier function e1/x is not self-concordant

on R+.

Use self-concordant barriers in optimization.

17