Page 1

Interior Point Methods

for Linear, Quadratic

and Nonlinear Programming

Turin 2008

Jacek Gondzio

Lecture 1:

Convexity

1

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

What’s to come?

IPMs for Optimization

• Convexity Theory

• Duality Theory

• Newton Method (self-concordant barriers)

• Interior Point Methods for LP, QP and NLP

(motivation, theory, polynomial complexity)

Nonlinear Optimization

• Linesearch methods

• Trust region methods

Optimization Relies on Linear Algebra

• Positive definite, indefinite, quasidefinite sys-

tems, Cholesky factorization

• Sparse Matrix Techniques

– LU decomp. (unsymmetric matrices)

– Cholesky decomp. (symmetric matrices)

• Reordering for sparsity

(minimum degree, nested dissection)

Applications

• Data mining: Support Vector Machines

• Markowitz portfolio optimization

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Convex Optimization

Consider the general optimization problem

min f(x)

s.t. g(x) ≤ 0,

where x ∈ Rn, and f : Rn 7→ R and g : Rn 7→ Rm

are convex, twice differentiable.

Basic Assumptions:

f and g are convex

⇒ If there exists a local minimum then it is a

global one.

f and g are twice differentiable

⇒ We can use the second order Taylor

approximations of them.

3

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Glossary

LP: Linear Programming

both f and g are linear.

QP: Quadratic Programming

f is quadratic and g is linear.

NLP: Nonlinear Programming

f or g is nonlinear.

SDP: Semidefinite Programming

f, g are functions of positive definite matrices.

4

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Convexity

Convexity is a key property in optimization.

Def. A set C ⊂ Rn is convex if

λx + (1 − λ)y ∈ C, ∀x, y ∈ C, ∀λ ∈ [0,1].

(Figure: a convex set and a nonconvex set.)

Def. Let C be a convex subset of Rn.

A function f : C 7→ R is convex if

f(λx+(1−λ)y)≤λf(x)+(1−λ)f(y), ∀x, y∈C, ∀λ∈[0,1].

(Figure: a convex function and a nonconvex function.)

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Convexity (cnt’d)

Def. Let C be a convex subset of Rn.

A function f : C 7→ R is concave if

f(λx+(1−λ)y)≥λf(x)+(1−λ)f(y), ∀x, y∈C, ∀λ∈[0,1].

Remark. A function f : C 7→ R is concave if and

only if function −f is convex.

Def. Let C be a convex subset of Rn.

A function f : C 7→ R is strictly convex if

f(λx+(1−λ)y)<λf(x)+(1−λ)f(y), ∀x,y∈C, ∀λ∈(0,1).

Def. Let C be a convex subset of Rn.

A function f : C 7→ R is strictly concave if

f(λx+(1−λ)y)>λf(x)+(1−λ)f(y), ∀x,y∈C, ∀λ∈(0,1).

6

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Convexity and Optimization

Consider a problem

min f(x) s.t. x ∈ X,

where X is a set of feasible solutions

and f : X → R is an objective function.

Def. A vector x̂ is a local minimum of f if

∃ε > 0 such that f(x̂) ≤ f(x), ∀x such that ‖x − x̂‖ < ε.

Def. A vector x̂ is a global minimum of f if

f(x̂) ≤ f(x), ∀x ∈ X.

Lemma. If X is a convex set and f : X 7→ R

is a convex function, then a local minimum is a

global minimum.

Proof. Suppose that x is a local minimum, but not a global one. Then ∃y ≠ x such that f(y) < f(x). From convexity of f, we have, ∀λ ∈ [0,1],

f((1−λ)x + λy) ≤ (1−λ)f(x) + λf(y) < (1−λ)f(x) + λf(x) = f(x).

In particular, for a sufficiently small λ, the point z = (1−λ)x + λy lies in the ε-neighbourhood of x and f(z) < f(x). This contradicts the assumption that x is a local minimum.

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Properties

1. For any collection {Ci | i ∈ I} of convex sets,

the intersection ⋂i∈I Ci is convex.

2. The vector sum {x1 + x2 | x1 ∈ C1, x2 ∈ C2} of two convex sets C1 and C2 is convex.

3. The image of a convex set under a linear

transformation is convex.

4. If C is a convex set and f : C 7→ R is a convex

function, the level sets {x ∈ C | f(x) ≤ α} and

{x ∈ C | f(x) < α} are convex for all scalars α.

5. For any collection {fi : C 7→ R | i ∈ I} of

convex functions, the weighted sum, with pos-

itive weights wi > 0, i ∈ I, i.e. the function

f = ∑i∈I wi fi : C 7→ R, is convex.

6. If I is an index set, C ∈ Rn is a convex set,

and fi : C 7→ R is convex ∀i ∈ I, then the function

h : C 7→ R defined by

h(x) = supi∈I fi(x)

is also convex.

Page 2

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Differentiable Convex Fnctns

7. Let C ∈ Rn be a convex set and f : C 7→ R be

differentiable over C.

(a) The function f is convex if and only if

f(y) ≥ f(x) + ∇Tf(x)(y − x), ∀x, y ∈ C.

(b) If the inequality is strict for x ≠ y, then f is

strictly convex.

8. Let C ∈ Rn be a convex set and f : C 7→ R be

twice continuously differentiable over C.

(a) If ∇2f(x) is positive semidefinite for all x ∈ C,

then f is convex.

(b) If ∇2f(x) is positive definite for all x ∈ C,

then f is strictly convex.

(c) If f is convex, then ∇2f(x) is positive semi-

definite for all x ∈ C.

9. Let C ∈ Rn be a convex set and Q a square

matrix. Let f(x) = xTQ x be a quadratic function

f : C 7→ R.

(a) f is convex iff Q is positive semidefinite.

(b) f is strictly convex iff Q is positive definite.

9

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Proof of Property 4

Define

Xα = {x ∈ C : f(x) ≤ α}.

We will prove that Xα is convex.

Take any x, y ∈ Xα. From the definition of Xα

we get that f(x) ≤ α and f(y) ≤ α.

Take any λ ∈ [0,1] and define z = (1− λ)x + λy.

From the convexity of f we get

f(z) = f((1−λ)x+λy)

≤ (1−λ)f(x)+λf(y)

≤ (1−λ)α+λα = α.

Hence z ∈ Xα which completes the proof.

The proof for the strict inequality is identical.

10

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Proof of Property 7 (a)

Part 1 ( ⇒ )

Take any x, y ∈ C, and any λ ∈ [0,1].

From convexity of f we get

f(x + λ(y − x)) ≤ (1 − λ)f(x) + λf(y).

Hence

f(x+λ(y−x))−f(x)≤λ(f(y)−f(x))

and

[f(x + λ(y − x)) − f(x)] / λ ≤ f(y) − f(x).

Let λ → 0+. Then the left hand side becomes

∇Tf(x)(y−x) (a derivative of f in direction y−x)

implying

∇Tf(x)(y − x) ≤ f(y) − f(x),

which completes this part of the proof.

11

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Proof of Property 7 (a)

Part 2 ( ⇐ )

Take any x, y ∈ C, and any λ ∈ [0,1].

Let z = λx + (1 − λ)y.

Since x − z = x − y − λ(x − y), we have

f(x) ≥ f(z) + ∇Tf(z)(1 − λ)(x − y).

For y − z = −λ(x − y), we have

f(y) ≥ f(z) + ∇Tf(z)(−λ)(x − y).

Having multiplied the first inequality by λ and

the second by 1 − λ and having added them we

get

λf(x) + (1 − λ)f(y) ≥ f(z),

which proves the convexity of f .

12

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

More on Convexity

Def. Let C be a convex subset of Rn.

A function f : C 7→ R is quasi-convex if

f(λx+(1−λ)y)≤max{f(x),f(y)}, ∀x, y∈C, ∀λ∈[0,1].

(Figure: a quasi-convex function and a quasi-concave function.)

Lemma. Let C be a nonempty convex set. A

function f : C 7→ R is quasi-convex if and only if

the level set Sα = {x ∈ C | f(x) ≤ α} is convex for every real number α.

Def. Let C be a convex subset of Rn. A differentiable function f : C 7→ R is called pseudo-convex if for any x, y ∈ C, the inequality

∇Tf(x)(y − x) ≥ 0

implies that f(y) ≥ f(x).

13

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Linear Programming

Consider a Linear Program (LP)

min cTx

s. t. Ax = b,

x ≥ 0,

where c, x ∈ Rn, b ∈ Rm, A ∈ Rm×n.

Matrix A has a full row rank, m (m ≤ n).

Let P be the (primal) feasible set:

P = {x ∈ Rn | Ax = b, x ≥ 0}, and P0 be the (primal) strictly feasible set:

P0 = {x ∈ Rn |Ax = b, x > 0}.

Lemma. P is a convex set.

Proof. Note that a linear function is convex.

Thus P is an intersection of convex sets and from

Property 1 it is convex.

Corollary. LP is a convex optimization problem.

Proof. The objective function is linear hence

convex. From Lemma, the feasible set of an LP

is also convex.

14

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Convex Quadratic Program

Def. A matrix H ∈ Rn×n is positive definite if

xTH x > 0 for any x ≠ 0.

Example: H1 is positive definite, H2 is not.

H1 = [ 2  3 ]        H2 = [ 2  3 ]
     [ 3  5 ]             [ 3  4 ]

Indeed:

f1(x1, x2) = 2x1² + 6x1x2 + 5x2² = 2(x1 + (3/2)x2)² + (1/2)x2² ≥ 0,

f2(x1, x2) = 2x1² + 6x1x2 + 4x2² = 2(x1 + (3/2)x2)² − (1/2)x2²,

and this does not have to be nonnegative.
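A quick numerical check of these claims (an illustrative Python/NumPy sketch; the eigenvalue test is just one of several equivalent criteria for positive definiteness):

```python
import numpy as np

# H1 and H2 from the example above
H1 = np.array([[2.0, 3.0], [3.0, 5.0]])
H2 = np.array([[2.0, 3.0], [3.0, 4.0]])

for name, H in [("H1", H1), ("H2", H2)]:
    eigs = np.linalg.eigvalsh(H)            # eigenvalues of a symmetric matrix
    print(name, "eigenvalues:", eigs, "positive definite:", bool(np.all(eigs > 0)))

# A vector with x^T H2 x < 0: take x2 = 1, x1 = -3/2 to kill the square term
x = np.array([-1.5, 1.0])
print("x^T H2 x =", x @ H2 @ x)              # prints -0.5
```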

From Property 9, the function f(x) = xTQ x is convex iff Q is positive semidefinite. The QP

min cTx + (1/2)xTQ x
s.t. Ax = b,
     x ≥ 0,

is a convex optimization problem iff Q is positive semidefinite.

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Convex Nonlinear Program

Consider a general optimization problem

min f(x)

s.t. g(x) ≤ 0,

where x∈Rn, and f : Rn 7→R and g : Rn 7→Rm.

Lemma. If f : Rn 7→ R and g : Rn 7→ Rm are

convex, then the above problem is convex.

Proof. Since the objective function f is convex,

we only need to prove that the feasible set of the

above problem

X = {x ∈ Rn : g(x) ≤ 0}

is convex. Define for i = 1,2, ..., m

Xi = {x ∈ Rn : gi(x) ≤ 0}.

From Property 4, Xi is convex for all i.

We observe that

X = {x ∈ Rn : gi(x) ≤ 0, ∀i = 1..m} = ⋂i Xi,

i.e., X is an intersection of convex sets and from

Property 1, X is a convex set.

16

Page 3

Interior Point Methods

for Linear, Quadratic

and Nonlinear Programming

Turin 2008

Jacek Gondzio

Lecture 2:

Duality

1

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Lagrangian

Consider a general optimization problem

min f(x)

s.t. g(x) ≤ 0, (1)

x ∈ X ⊆ Rn,

where f : Rn 7→R and g : Rn 7→Rm.

The set X is arbitrary; it may include, for exam-

ple, an integrality constraint.

The constraint g(x) ≤ 0 is understood as:

gi(x) ≤ 0, ∀i = 1,2, ..., m,

i.e., as m inequalities.

Let x* be an optimal solution of (1) and define

f* = f(x*).

Introduce the Lagrange multiplier yi ≥ 0 for

every inequality constraint gi(x) ≤ 0.

Define y = (y1, . . . , ym)T and the Lagrangian

L(x, y) = f(x) + yTg(x),

y are also called dual variables.

2

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Lagrangian Duality

Consider the problem

LD(y) = min x∈X L(x, y),   x ∈ X ⊆ Rn.

Its optimal solution x depends on y and so does the optimal objective LD(y).

Lemma. For any y ≥ 0, LD(y) is a lower bound on f* (the optimal value of (1)), i.e.,

f* ≥ LD(y), ∀y ≥ 0.

Proof.

f* = min {f(x) | g(x) ≤ 0, x ∈ X}
   ≥ min {f(x) + yTg(x) | g(x) ≤ 0, y ≥ 0, x ∈ X}
   ≥ min {f(x) + yTg(x) | y ≥ 0, x ∈ X}
   = LD(y).

Corollary. f* ≥ max y≥0 LD(y), i.e.,

f* ≥ max y≥0 min x∈X L(x, y).

3

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Lagrangian Duality

Observe that:

If ∃i gi(x) > 0, then

max y≥0 L(x, y) = +∞

(we let the corresponding yi grow to +∞).

If ∀i gi(x) ≤ 0, then

max y≥0 L(x, y) = f(x),

because ∀i yigi(x) ≤ 0 and the maximum is at-

tained when

yigi(x) = 0, ∀i = 1,2, ..., m.

Hence the problem (1) is equivalent to the fol-

lowing MinMax problem

min x∈X max y≥0 L(x, y),

which could also be written as follows:

f* = min x∈X max y≥0 L(x, y).

4

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Weak Duality

Consider the following problem

min {f(x) | g(x) ≤ 0, x ∈ X} ,

where f , g and X are arbitrary.

With this problem we associate the Lagrangian

L(x, y) = f(x) + yTg(x),

y are dual variables (Lagrange multipliers).

The weak duality always holds:

min x∈X max y≥0 L(x, y) ≥ max y≥0 min x∈X L(x, y).

Observe that we have not made any assumption

about functions f and g and set X.

If f and g are convex, X is convex and certain

regularity conditions are satisfied, then

min x∈X max y≥0 L(x, y) = max y≥0 min x∈X L(x, y).

This is called the strong duality.

5

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Notation

Consider again the problem

min f(x)

s.t. g(x) ≤ 0,

x ∈ X ⊆ Rn,

where f : Rn 7→R and g : Rn 7→Rm.

Take x ∈ X ⊆ Rn and y ∈ Y = {y ∈ Rm, y ≥ 0} and write the Lagrangian

L(x, y) = f(x) + yTg(x).

Define the primal function

LP(x) = { f(x)  if ∀i gi(x) ≤ 0
        { +∞    if ∃i gi(x) > 0.

Observe that

LP(x) = max y≥0 L(x, y).   (2)

Define the dual function

LD(y) = min x∈X L(x, y).   (3)

6

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Primal & Dual Problems

The problem (1) can be formulated as looking

for x* ∈ X ⊆ Rn such that

LP(x*) = min x∈X LP(x).

It is called the primal problem.

The problem

LD(y*) = max y≥0 LD(y)

is called the dual problem.

The weak duality can be rewritten as:

LP (x) ≥ LD(y).

Def. Primal feasible set.

XP = {x : x ∈ X, gi(x) ≤ 0, i = 1,2, . . . , m}.

Def. Dual feasible set. A tuple (x, y) ∈ Rn+m

is feasible for the dual problem if

(x,y)∈YD = {(x,y): x∈X, y∈Y, LD(y)=L(x,y)}.

Def. Dual optimal solution.

A tuple (x, y) ∈ Rn+m is called dual optimal if

(x, y) ∈ YD and y maximizes LD(y).

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Primal-Dual Bounds

Lemma. If x1 ∈ XP and (x2, y2) ∈ YD (i.e., x1 is

primal feasible and (x2, y2) is dual feasible), then

LP (x1) ≥ LD(y2).

Proof. Since x1 ∈ XP we get LP (x1) = f(x1).

For any y ∈ Y , from definition (2) we have

LP (x1) ≥ L(x1, y). In particular, for y = y2:

LP (x1) ≥ L(x1, y2). (4)

On the other hand, (x2, y2) ∈ YD hence for any

x ∈ X from (3) we have L(x, y2) ≥ LD(y2) and,

in particular, for x = x1:

L(x1, y2) ≥ LD(y2). (5)

From (4) and (5) we get

f(x1) = LP (x1) ≥ L(x1, y2) ≥ LD(y2),

which completes the proof.

Any primal feasible solution provides an upper

bound for the dual problem, and

any dual feasible solution provides a lower

bound for the primal problem.

8

Page 4

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Duality and Convexity

Recall that the weak duality holds regardless of

the form of functions f , g and set X:

min x∈X max y≥0 L(x, y) ≥ max y≥0 min x∈X L(x, y).

What do we need to assume for the inequality in

the weak duality to become an equation?

If

• X ⊆ Rn is convex;

• f and g are convex;

• optimal solution is finite;

• some mysterious regularity conditions hold,

then strong duality holds.

That is

min x∈X max y≥0 L(x, y) = max y≥0 min x∈X L(x, y).

An example of regularity conditions:

∃x ∈ int(X) such that g(x) < 0.

9

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Geometric View

Consider a mapping which for any x ∈ X defines

a point in Rm+1 of the form (g(x), f(x)).

We write x 7→ (g, f). Let H be the image of X.

In the example below n = 2 and m = 1. Hence:

x ∈ X ⊆ R2 and f : R2 7→ R and g : R2 7→ R.

Lagrange multiplier: y ∈ R (y ≥ 0).

(Figure: the image H = {(g(x), f(x)) : x ∈ X} in the (g, f)-plane; a supporting line f + y·g = const of slope −y touches H from below, and its intercept with the f-axis equals LD(y).)

10

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Figure Interpretation

Primal problem:

We look for a point (g, f) ∈ H such that

g ≤ 0 and f attains its minimum.

This is the point (g, f) in the Figure.

11

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Figure Interpretation

Dual problem:

Take y≥0. To find LD(y), we need to minimize

f(x) + yg(x) with respect to x ∈ X. This cor-

responds to the minimization of the linear form

f + yg in the set H.

For a given y ≥ 0, the linear form f + yg has

a fixed slope (equal to −y) and the minimum is

attained when the line f + yg touches the bot-

tom of H. We say that “the hyperplane f + yg

supports the set H”.

The intersection of the supporting plane and the

f line determines the value of LD(y).

The dual problem consists in finding such a slope

y that LD(y) is maximized, i.e., the intersection

of the supporting plane and the f axis is the

highest possible.

There are two supporting hyperplanes in the Fig-

ure. The one corresponding to y corresponds to

the maximum of LD(y).

12

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Nonzero Duality Gap

When sufficient conditions for strong duality are

not satisfied, we may observe a nonzero duality

gap:

minx∈X

maxy≥0

L(x, y) − maxy≥0

minx∈X

L(x, y) > 0.

In the Figure below:

f* − LD(y*) > 0.

(Figure: a nonconvex image set H in the (g, f)-plane; the best supporting line gives LD(y*) strictly below the primal optimum f*.)

13

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Read more on duality

1. Bertsekas, D., Nonlinear Programming,

Athena Scientific, Massachusetts, 1995.

ISBN 1-886529-14-0, pages 415-486.

2. Hillier, F.S. and Lieberman, G.J.,

Introduction to Operations Research,

7th edition, McGraw Hill, 2001.

ISBN 0-07-232169-5, pages 230-308.

14

Page 5

Interior Point Methods

for Linear, Quadratic

and Nonlinear Programming

Turin 2008

Jacek Gondzio

Lecture 3:

Duality

1

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Equality Constraints

Let h : Rn 7→ Rk define an equality constraint

h(x) = 0 (understood as hj(x) = 0, j = 1, ..., k).

Replace hj(x) = 0 with two inequalities:

hj(x) ≤ 0 and − hj(x) ≤ 0.

Then the optimization problem

min f(x)

s.t. g(x) ≤ 0,

h(x) = 0,

x ∈ X ⊆ Rn,

where f : Rn 7→R, g : Rn 7→Rm and h : Rn 7→Rk,

becomes:

min f(x)

s.t. g(x) ≤ 0,

h(x) ≤ 0,

−h(x) ≤ 0,

x ∈ X ⊆ Rn.

2

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Equality Constraints (cont’d)

Use nonnegative Lagrange multipliers y ∈ Rm for

g constraints.

Use a pair of Lagrange multipliers u+j ≥ 0 and

u−j ≥ 0 for inequalities hj(x) ≤ 0 and −hj(x) ≤

0, respectively. In other words, use two vectors

u+ ≥ 0 and u− ≥ 0, both in Rk and write the

Lagrangian

L(x, y, u+,u−) = f(x)+yTg(x)+(u+)Th(x)−(u−)Th(x)

= f(x)+yTg(x)+(u+− u−)Th(x)

= f(x)+yTg(x)+uTh(x),

where the vector u = u+ − u− ∈ Rk has no sign

restriction.

The Lagrangian becomes:

L(x, y, u) = f(x)+yTg(x)+uTh(x),

and all theoretical results derived earlier can be

replicated for this new problem formulation.

3

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Wolfe Duality

Lagrange duality does not need differentiability.

Suppose f and g are convex and differentiable.

Suppose X is convex.

The dual function

LD(y) = minx∈X

L(x, y).

requires minimization with respect to x.

Instead of minimization with respect to x,

we ask for a stationarity with respect to x:

∇xL(x, y) = 0.

Lagrange dual problem:

max y≥0 LD(y)   (i.e., max y≥0 min x∈X L(x, y)).

Wolfe dual problem:

max L(x, y)

s.t. ∇xL(x, y) = 0

y ≥ 0.

4

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Duality: Example

Consider the nonlinear program:

min f(x) = x1² + x2²   s.t. x1 + x2 ≥ 1.

f(x) = x1² + x2² and g(x) = 1 − x1 − x2 are convex.

Observe that x=0 is an unconstrained minimizer

but this point does not satisfy the constraint.

The solution must therefore lie on the boundary

of the feasible region and satisfy x1 + x2 = 1. It is easy to find that x* = (0.5, 0.5) and f* = 0.5.

Lagrangian:

L(x, y) = x1² + x2² + y(1 − x1 − x2).

The Lagrangian dual:

LD(y) = min x [x1² + x2² + y(1 − x1 − x2)].

For any y the Lagrangian L(x, y) is convex in x.

We can use the stationarity condition to replace

the minimization. We write:

∇xL(x, y) = [ 2x1 − y ]   [ 0 ]
            [ 2x2 − y ] = [ 0 ] ,

which gives x1 = 0.5y and x2 = 0.5y.

5

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Example (continued)

Having substituted x1 = 0.5y and x2 = 0.5y, we

obtain:

LD(y) = y − (1/2)y².

The dual problem

maxy≥0

LD(y),

thus becomes

max y≥0 [y − (1/2)y²].

It has a trivial solution y* = 1.

We observe that LD(y*) = 1/2 = f*. Indeed, in this easy convex program, the duality gap is zero, i.e., strong duality holds.

6
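The example can also be verified numerically; a small illustrative Python sketch (not part of the slides) that maximizes LD(y) = y − y²/2 on a grid and compares it with the primal optimum f* = 0.5:

```python
import numpy as np

# Dual function from the example: L_D(y) = y - y^2/2
y = np.linspace(0.0, 2.0, 2001)
LD = y - 0.5 * y**2
k = np.argmax(LD)
print("y* ~", y[k], " L_D(y*) ~", LD[k])     # y* = 1, L_D(y*) = 0.5

# Primal optimum from the slide: x* = (0.5, 0.5) => f* = 0.5, so the gap is zero
print("f* =", 0.5**2 + 0.5**2)
```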

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Dual Linear Program

Consider a linear program

min cTx

s.t. Ax = b,

x ≥ 0,

where c, x ∈ Rn, b ∈ Rm, A ∈ Rm×n.

We associate Lagrange multipliers y ∈ Rm and

s ∈ Rn (s ≥ 0) with the constraints Ax = b and

x ≥ 0, and write the Lagrangian

L(x, y, s) = cTx − yT (Ax − b) − sTx.

7

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Dual LP (cont’d)

To determine the Lagrangian dual

LD(y, s) = minx∈X

L(x, y, s)

we need stationarity with respect to x:

∇xL(x, y, s) = c − ATy − s = 0.

Hence

LD(y, s) = cTx − yT (Ax − b) − sTx

= bTy + xT(c − ATy − s) = bTy.

and the dual problem has a form:

max bTy

s.t. ATy + s = c,

y free, s ≥ 0,

where y ∈ Rm and s ∈ Rn.

8

Page 6

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Duality in LP

Consider a primal program

min cTx

s.t. Ax = b,

x ≥ 0,

(1)

where c, x ∈ Rn, b ∈ Rm, A ∈ Rm×n.

With the primal we associate a dual program

max bTy

s.t. ATy ≤ c,

y free,

where y∈Rm. We add dual slack variables s∈Rn,

s ≥ 0 to convert inequality constraints ATy ≤ c

into equalities ATy + s = c and get an equivalent

dual program

max bTy

s.t. ATy + s = c,

y free, s ≥ 0,

(2)

where y ∈ Rm and s ∈ Rn.

Let P, D be the feasible sets of the primal and

the dual, respectively:

P = {x ∈ Rn | Ax = b, x ≥ 0},
D = {y ∈ Rm, s ∈ Rn | ATy + s = c, s ≥ 0}.

9

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Weak & Strong Duality in LP

Let us introduce a convention that inf x∈P cTx = +∞ if P = ∅, and sup y∈D bTy = −∞ if D = ∅.

Weak Duality Theorem

inf x∈P cTx ≥ sup y∈D bTy.

Strong Duality Theorem

If either P ≠ ∅ or D ≠ ∅ then

inf x∈P cTx = sup y∈D bTy.

If one of problems (1) and (2) is solvable then

min x∈P cTx = max y∈D bTy.

In IPMs we shall use the term interior-point.

Let P0, D0 be the strictly feasible sets of the

primal and the dual, respectively:

P0 = {x ∈ Rn | Ax = b, x > 0},
D0 = {y ∈ Rm, s ∈ Rn | ATy + s = c, s > 0}.

We shall often refer to the primal-dual pair.

Hence we define primal-dual feasible set F and

primal-dual strictly feasible set F0:

F = {(x, y, s) | Ax = b, ATy + s = c, (x, s) ≥ 0},
F0 = {(x, y, s) | Ax = b, ATy + s = c, (x, s) > 0}.

10

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Weak Duality for LP

Weak Duality Theorem

Let the primal-dual pair of linear programs be

given. If x ∈ P and (y, s) ∈ D, then

bTy ≤ cTx.

Proof.

Since (y, s) ∈ D, we have

ATy ≤ c.

By multiplying each of these n inequalities by an

appropriate xj, j = 1,2, ..., n and adding them up

(note that x ≥ 0 since x ∈ P), we obtain

xTATy ≤ cTx.

x ∈ P implies that Ax = b hence xTATy = bTy.

Thus we finally get

bTy ≤ cTx.

11

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Complementarity, Optimality

Partitioning:
LP constraint matrix A = [B, N],
primal variables x = (xB, xN),
reduced costs d = (dB, dN).

The simplex method maintains complementarity

xB ≠ 0, dB = 0 ⇒ (xB)j · (dB)j = 0, ∀j ∈ B,
xN = 0, dN ≠ 0 ⇒ (xN)j · (dN)j = 0, ∀j ∈ N.

Primal simplex method:
• maintains primal feasibility: Ax = b;
• maintains complementarity: xjdj = 0, ∀j;
• seeks dual feasibility: dN ≥ 0.

Dual simplex method:
• maintains dual feasibility: dN ≥ 0;
• maintains complementarity: xjdj = 0, ∀j;
• seeks primal feasibility: Ax = b.

Note that the reduced costs:

d = [ dB ]   [ cB ]     [ BT ]
    [ dN ] = [ cN ]  −  [ NT ] · y  =  c − ATy,

are dual slack variables.

At optimality: d ≥ 0.

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Dual Quadratic Program

Consider a quadratic program

min cTx + (1/2)xTQ x

s.t. Ax = b,

x ≥ 0,

where c, x ∈ Rn, b ∈ Rm, A ∈ Rm×n, Q ∈ Rn×n.

We associate Lagrange multipliers y ∈ Rm and

s ∈ Rn (s ≥ 0) with the constraints Ax = b and

x ≥ 0, and write the Lagrangian

L(x, y, s) = cTx + (1/2)xTQ x − yT(Ax−b) − sTx.

13

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Dual QP (cont’d)

To determine the Lagrangian dual

LD(y, s) = min x∈X L(x, y, s)

we need stationarity with respect to x:

∇xL(x, y, s) = c + Qx − ATy − s = 0.

Hence

LD(y, s) = cTx + (1/2)xTQ x − yT(Ax−b) − sTx
         = bTy + xT(c + Qx − ATy − s) − (1/2)xTQ x
         = bTy − (1/2)xTQ x,

and the dual problem has the form:

max bTy − (1/2)xTQ x

s.t. ATy + s − Qx = c,

x, s ≥ 0,

where y ∈ Rm and x, s ∈ Rn.

14

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Primal-Dual Pairs

Linear programs:

The primal

min cTx

s.t. Ax = b,

x ≥ 0,

and the dual

max bTy

s.t. ATy + s = c,

y free, s ≥ 0.

Convex quadratic programs:

The primal

min cTx + (1/2)xTQ x

s.t. Ax = b,

x ≥ 0,

and the dual

max bTy − (1/2)xTQ x

s.t. ATy + s − Qx = c,

x, s ≥ 0.

15

Page 7

Interior Point Methods

for Linear, Quadratic

and Nonlinear Programming

Turin 2008

Jacek Gondzio

Lecture 4:

IPM for LP: Motivation

1

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Simplex: What’s wrong?

A vertex is defined by a set of n equations:

[ B  N     ] [ xB ]   [ b ]
[ 0  In−m  ] [ xN ] = [ 0 ] .

The linear program with m constraints and n variables (n ≥ m) has at most

NV = (n choose m) = n! / (m!(n − m)!)

vertices.

The simplex method can make a non-polynomial

number of iterations to reach the optimality:

V. Klee and G. Minty gave an example LP the solution of which needs 2ⁿ iterations:

How good is the simplex algorithm,

in: Inequalities-III, O. Shisha, ed.,

Academic Press, 1972, 159–175.

2

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Simplex: What’s wrong?

Narendra Karmarkar from AT&T Bell Labs:

“the simplex [method] is complex”

N. Karmarkar:

A New Polynomial–time Algorithm for LP,

Combinatorica 4 (1984) 373–395.

3

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

“Elements” of the IPM

What do we need

to derive the Interior Point Method?

• duality theory:

Lagrangian function;

first order optimality conditions.

• logarithmic barriers.

• Newton method.

4

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Optimality Conditions in LP

Consider the primal-dual pair:

Primal Dual

min cTx max bTy

s.t. Ax = b, s.t. ATy + s = c,

x≥0; s≥0.

Lagrangian

L(x, y) = cTx − yT(Ax − b).

Optimality Conditions in LP

Ax = b,

ATy + s = c,

XSe = 0,

x ≥ 0,

s ≥ 0,

where X = diag{x1, · · · , xn}, S = diag{s1, · · · , sn} and e = (1, 1, · · · , 1) ∈ Rn.

5

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Complementarity

Recall that the Simplex Method works with a partitioned formulation:

LP constraint matrix A = [B, N],
primal variables x = (xB, xN),
reduced costs d = (dB, dN).

Dual variables are defined as follows:

BTy = cB.

Hence the reduced costs for basic variables are

dBT = cBT − yTB = cBT − cBT = 0.

Thus, for basic variables, dB = 0 and

(xB)j · (dB)j = 0 ∀j ∈ B.

For non-basic variables, xN = 0 hence

(xN)j · (dN)j = 0 ∀j ∈ N .

The simplex method maintains the complementarity of primal and dual solutions:

xj · dj = 0 ∀j = 1,2, ..., n.

6

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Complementarity, Optimality

Partitioning:
LP constraint matrix A = [B, N],
primal variables x = (xB, xN),
reduced costs d = (dB, dN).

The simplex method maintains complementarity

xB ≠ 0, dB = 0 ⇒ (xB)j · (dB)j = 0, ∀j ∈ B,
xN = 0, dN ≠ 0 ⇒ (xN)j · (dN)j = 0, ∀j ∈ N.

Primal simplex method:
• maintains primal feasibility: Ax = b;
• maintains complementarity: xjdj = 0, ∀j;
• seeks dual feasibility: dN ≥ 0.

Dual simplex method:
• maintains dual feasibility: dN ≥ 0;
• maintains complementarity: xjdj = 0, ∀j;
• seeks primal feasibility: Ax = b.

Note that the reduced costs:

d = [ dB ]   [ cB ]     [ BT ]
    [ dN ] = [ cN ]  −  [ NT ] · y  =  c − ATy,

are dual slack variables.

At optimality: d ≥ 0.

7

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Logarithmic barriers

The following logarithmic barrier −ln xj added to the objective in the optimization problem prevents variable xj from approaching zero.

(Figure: graph of −ln x.)

In other words, the logarithmic barrier can be

used to “replace” the inequality

xj ≥ 0.

Observe that

min e^(−∑j=1..n ln xj)  ⇐⇒  max ∏j=1..n xj

The minimization of −∑j=1..n ln xj is equivalent to the maximization of the product of distances from all hyperplanes defining the positive orthant: it prevents all xj from approaching zero.
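A tiny numerical illustration of the barrier effect (an illustrative sketch assuming a single variable with objective coefficient c = 1, which is not part of the slides): the term −µ ln x blows up as x → 0+, so the minimizer of the barrier function stays strictly positive, at x = µ/c.

```python
import numpy as np

# Barrier function phi(x) = c*x - mu*ln(x) for one variable
c, mu = 1.0, 0.1
x = np.linspace(1e-4, 2.0, 20000)
phi = c * x - mu * np.log(x)
print("numerical minimizer ~", x[np.argmin(phi)], " (analytic: mu/c =", mu / c, ")")
# phi -> +infinity as x -> 0+, so iterates are pushed away from the boundary x = 0
```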

Page 8

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Use Logarithmic Barriers

Replace the primal LP

min cTx

s.t. Ax = b,

x ≥ 0,

with the primal barrier program

min cTx − ∑j=1..n ln xj
s.t. Ax = b.

Replace the dual LP

max bTy

s.t. ATy + s = c,

y free, s ≥ 0,

with the dual barrier program

max bTy + ∑j=1..n ln sj
s.t. ATy + s = c.

9

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

First Order Optimality Conds

Consider the primal barrier program

min cTx − µ ∑j=1..n ln xj
s.t. Ax = b,

where µ ≥ 0 is a barrier parameter.

Write out the Lagrangian

L(x, y, µ) = cTx − yT(Ax − b) − µ ∑j=1..n ln xj,

and the conditions for a stationary point

∇xL(x, y, µ) = c − ATy − µX−1e = 0,
∇yL(x, y, µ) = Ax − b = 0,

where X−1 = diag{x1−1, x2−1, · · · , xn−1}.

Let us denote

s = µX−1e, i.e. XSe = µe.

The First Order Optimality Conditions are:

Ax = b,

ATy + s = c,

XSe = µe.

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Central Trajectory

Note that the first order optimality conditions for

the barrier problem

Ax = b,

ATy + s = c,

XSe = µe,

approximate the first order optimality conditions

for the linear program

Ax = b,

ATy + s = c,

XSe = 0,

more and more closely as µ goes to zero.

Parameter µ controls the distance to optimality.

cTx−bTy = cTx−xTATy = xT(c−ATy) = xTs = nµ.

Analytic center (µ-center): a (unique) point

(x(µ), y(µ), s(µ)), x(µ) > 0, s(µ) > 0

that satisfies FOC.

The path

{(x(µ), y(µ), s(µ)) : µ > 0} is called the primal-dual central trajectory.

11

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Newton Method

We use Newton Method to find a stationary

point of the barrier problem.

Recall how to use Newton Method to find a root

of a nonlinear equation

f(x) = 0.

A tangent line

z − f(xk) = ∇f(xk) · (x − xk)

is a local approximation of the graph of the func-

tion f(x). Substituting z = 0 gives a new point

xk+1 = xk − (∇f(xk))−1f(xk).

(Figure: Newton iterates xk, xk+1, xk+2 for solving f(x) = 0; at each iterate the tangent line z − f(xk) = ∇f(xk)(x − xk) is drawn.)

12

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Apply Newton M. to the FOC

The first order optimality conditions for the bar-

rier problem form a large system of nonlinear

equations F(x, y, s) = 0,

where F : R2n+m 7→ R2n+m is a mapping defined as follows:

F(x, y, s) = [ Ax − b       ]
             [ ATy + s − c  ]
             [ XSe − µe     ] .

Actually, the first two terms of it are linear; only the last one, corresponding to the complementarity condition, is nonlinear.

Note that

∇F(x, y, s) = [ A  0   0 ]
              [ 0  AT  I ]
              [ S  0   X ] .

Thus, for a given point (x, y, s) we find the Newton direction (∆x, ∆y, ∆s) by solving the system of linear equations:

[ A  0   0 ] [ ∆x ]   [ b − Ax       ]
[ 0  AT  I ] [ ∆y ] = [ c − ATy − s  ]
[ S  0   X ] [ ∆s ]   [ µe − XSe     ] .

13
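A minimal dense Python/NumPy sketch of this Newton step (purely illustrative; the function name and the dense solve are my own choices, whereas practical IPM codes factorize the reduced sparse systems discussed in later lectures):

```python
import numpy as np

def newton_direction(A, b, c, x, y, s, sigma, mu):
    """Solve the (m+2n)x(m+2n) Newton system for the LP barrier FOC."""
    m, n = A.shape
    X, S, I = np.diag(x), np.diag(s), np.eye(n)
    K = np.block([
        [A,                np.zeros((m, m)), np.zeros((m, n))],
        [np.zeros((n, n)), A.T,              I               ],
        [S,                np.zeros((n, m)), X               ],
    ])
    rhs = np.concatenate([b - A @ x,
                          c - A.T @ y - s,
                          sigma * mu * np.ones(n) - x * s])
    d = np.linalg.solve(K, rhs)
    return d[:n], d[n:n+m], d[n+m:]          # (dx, dy, ds)
```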

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Interior-Point Framework

We have already gathered all the necessary

elements to derive an interior point method.

The logarithmic barrier

− lnxj

added to the objective in the optimization prob-

lem prevents variable xj from approaching zero

and “replaces” the inequality

xj ≥ 0.

We derive the first order optimality conditions for the primal barrier problem:

Ax = b,

ATy + s = c,

XSe = µe,

and apply Newton method to solve this system

of nonlinear equations.

Actually, we fix the barrier parameter µ and make

only one (damped) Newton step towards the so-

lution of FOC. We do not solve the current FOC

exactly. Instead, we immediately reduce the bar-

rier parameter µ (to ensure progress towards op-

timality) and repeat the process.

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Interior Point Algorithm

Initialize

k = 0

(x0, y0, s0) ∈ F0

µ0 = (1/n) · (x0)T s0

α0 = 0.9995

Repeat until optimality

k = k + 1

µk = σµk−1, where σ ∈ (0,1)

∆ = Newton direction towards µ-center

Ratio test:

αP := max {α > 0 : x + α∆x ≥ 0},
αD := max {α > 0 : s + α∆s ≥ 0}.

Make step:

xk+1 = xk + α0αP∆x,

yk+1 = yk + α0αD∆y,

sk+1 = sk + α0αD∆s.

15
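The algorithm translates almost line by line into code. A compact sketch, assuming the newton_direction helper from the previous snippet and a strictly feasible starting point supplied by the caller (both assumptions are illustrative, not prescribed by the slides):

```python
import numpy as np

def ipm_lp(A, b, c, x, y, s, sigma=0.1, alpha0=0.9995, tol=1e-8, max_iter=100):
    """Primal-dual path-following IPM for min c'x s.t. Ax = b, x >= 0 (feasible variant)."""
    n = len(x)
    for _ in range(max_iter):
        mu = x @ s / n                           # current barrier parameter
        if mu < tol:
            break
        dx, dy, ds = newton_direction(A, b, c, x, y, s, sigma, mu)
        # Ratio test: largest steps (capped at 1) keeping x + a*dx >= 0 and s + a*ds >= 0
        aP = min([1.0] + [-x[j] / dx[j] for j in range(n) if dx[j] < 0])
        aD = min([1.0] + [-s[j] / ds[j] for j in range(n) if ds[j] < 0])
        x, y, s = x + alpha0 * aP * dx, y + alpha0 * aD * dy, s + alpha0 * aD * ds
    return x, y, s
```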

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Interior Point Method

• Lagrange (1788)

handling equality constraints - multipliers

minimization with equality constraints

replaced with unconstrained minimization

• Fiacco & McCormick (1968)

handling inequality constraints - log barrier

minimization with inequality constraints

replaced with a sequence of unconstrained

minimizations

• Newton (1687)

solving unconstrained minimization problems

16

Page 9

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Interior Point Method

(Figure: the logarithmic barrier −ln x, and a shaded polytope with its analytic center marked.)

Analytic Center

min e^(−∑j=1..n ln xj)  ⇐⇒  max ∏j=1..n xj

Advantages of IPMs:

suitable for very large problems;

natural extension from LP via QP to NLP.

Iterations to reach optimum:

Size       Theory       Practice
           O(√n)        O(log10 n)
1000       C × 32       10-20
10000      C × 100      20-40
100000     C × 320      30-50
1000000    C × 1000     40-60

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Approaching Optimality

Simplex Method:

Basic: x > 0, s = 0        Nonbasic: x = 0, s > 0

(Figure: simplex iterates sit on the axes of the (x, s) plane.)

Interior Point Method:

"Basic": x > 0, s = 0        "Nonbasic": x = 0, s > 0

(Figure: interior point iterates stay strictly inside and approach the axes only in the limit.)

18

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Notations

A vector of ones: e = (1,1, · · · ,1) ∈ Rn.

X = diag{x1, x2, · · · , xn} (the n×n diagonal matrix with x1, ..., xn on the diagonal).

X−1 = diag{x1−1, x2−1, · · · , xn−1}.

An equation XSe = µe,

is equivalent to xjsj = µ, ∀j = 1,2, · · · , n.

Primal feasible set
P = {x ∈ Rn | Ax = b, x ≥ 0}.

Primal strictly feasible set
P0 = {x ∈ Rn | Ax = b, x > 0}.

Dual feasible set
D = {y ∈ Rm, s ∈ Rn | ATy + s = c, s ≥ 0}.

Dual strictly feasible set
D0 = {y ∈ Rm, s ∈ Rn | ATy + s = c, s > 0}.

Primal-dual feasible set
F = {(x, y, s) | Ax = b, ATy + s = c, (x, s) ≥ 0}.

Primal-dual strictly feasible set
F0 = {(x, y, s) | Ax = b, ATy + s = c, (x, s) > 0}.

Page 10

Interior Point Methods

for Linear, Quadratic

and Nonlinear Programming

Turin 2008

Jacek Gondzio

Lecture 5:

Path-following Method: Theory

1

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Path-Following Algorithm

The analysis given in this lecture comes from the

book of Steve Wright:

Primal-Dual Interior-Point Methods,

SIAM Philadelphia, 1997.

We analyze a feasible interior-point algorithm

with the following properties:

• all its iterates are feasible and stay in a close

neighbourhood of the central path;

• the iterates follow the central path towards

optimality;

• systematic (though very slow) reduction of

duality gap is ensured.

This algorithm is called

the short-step path-following method.

Indeed, it makes very slow progress (short-steps)

to optimality.

2

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Central Path Neighbourhood

Assume a primal-dual strictly feasible solution

(x, y, s) ∈ F0 lying in a neighbourhood of the

central path is given; namely (x, y, s) satisfies:

Ax = b,

ATy + s = c,
XSe ≈ µe.

We define a θ-neighbourhood of the central

path N2(θ), a set of primal-dual strictly feasible

solutions (x, y, s) ∈ F0 that satisfy:

‖XSe − µe‖ ≤ θµ,

where θ ∈ (0,1) and the barrier µ satisfies:

xTs = nµ.

Hence N2(θ) = {(x, y, s) ∈ F0 | ‖XSe − µe‖ ≤ θµ}.

(Figure: the N2(θ) neighbourhood of the central path.)
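Membership in N2(θ) is straightforward to test once a strictly feasible point is at hand; a short illustrative sketch (the feasibility tolerance is my own choice):

```python
import numpy as np

def in_N2(A, b, c, x, y, s, theta, tol=1e-9):
    """Check (x, y, s) in N2(theta): strictly feasible and ||XSe - mu*e|| <= theta*mu."""
    mu = x @ s / len(x)
    primal_ok = np.allclose(A @ x, b, atol=tol) and np.all(x > 0)
    dual_ok = np.allclose(A.T @ y + s, c, atol=tol) and np.all(s > 0)
    return primal_ok and dual_ok and np.linalg.norm(x * s - mu) <= theta * mu
```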

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Progress towards optimality

Assume a primal-dual strictly feasible solution

(x, y, s) ∈ N2(θ) for some θ ∈ (0,1) is given.

Interior point algorithm tries to move from this

point to another one that also belongs to a θ-neighbourhood of the central path but corre-

sponds to a smaller µ. The required reduction

of µ is small:

µk+1 = σµk,

where σ = 1 − β/√n,

for some β ∈ (0,1).

Given a new µ-center, interior point algorithm

computes Newton direction:

[ A  0   0 ] [ ∆x ]   [ 0          ]
[ 0  AT  I ] [ ∆y ] = [ 0          ]
[ S  0   X ] [ ∆s ]   [ σµe − XSe  ] ,

and makes step in this direction.

Magic numbers (will be explained later):

θ = 0.1 and β = 0.1.

4

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

O(√n) Complexity Result

We will prove the following:

• full step in Newton direction is feasible;

• the new iterate

(xk+1,yk+1,sk+1) = (xk,yk,sk)+(∆xk,∆yk,∆sk)

belongs to a θ-neighbourhood of the new

µ-center (with µk+1 = σµk);

• duality gap is reduced by the factor 1 − β/√n.

Note that since at one iteration the duality gap is reduced by the factor 1 − β/√n, after √n iterations the reduction achieves:

(1 − β/√n)^√n ≈ e−β.

After C · √n iterations, the reduction is e−Cβ.

For sufficiently large constant C the reduction

can thus be arbitrarily large (i.e. the duality gap

can become arbitrarily small).

Hence this algorithm has complexity O(√n).

5

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Technical Results

Lemma 1

Newton direction (∆x,∆y,∆s) defined by the

equation system

[ A  0   0 ] [ ∆x ]   [ 0          ]
[ 0  AT  I ] [ ∆y ] = [ 0          ]
[ S  0   X ] [ ∆s ]   [ σµe − XSe  ] ,      (1)

satisfies:

∆xT∆s = 0.

Proof:

From the first two equations in (1) we get

A∆x = 0 and ∆s = −AT∆y.

Hence

∆xT∆s = ∆xT · (−AT∆y) = −∆yT · (A∆x) = 0.

6

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Technical Results (cont’d)

Lemma 2

Let (∆x,∆y,∆s) be the Newton direction that

solves the system (1). The new iterate

(x̄, ȳ, s̄) = (x, y, s) + (∆x, ∆y, ∆s)

satisfies x̄T s̄ = nµ̄,

where µ̄ = σµ.

Proof: From the third equation in (1) we get

S∆x + X∆s = −XSe + σµe.

By summing the n components of this equation we obtain

eT(S∆x + X∆s) = sT∆x + xT∆s = −eTXSe + σµeTe = −xTs + nσµ = −xTs · (1 − σ).

Thus

x̄T s̄ = (x + ∆x)T (s + ∆s)
      = xTs + (sT∆x + xT∆s) + (∆x)T∆s
      = xTs + (σ − 1)xTs + 0 = σ xTs,

which is equivalent to:

nµ̄ = σnµ.

7

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Reminder: Norms

Norms of the vector x ∈ Rn.

‖x‖ = (∑j=1..n xj²)^(1/2)

‖x‖∞ = max j∈{1..n} |xj|

‖x‖1 = ∑j=1..n |xj|

Note that for any x ∈ Rn:

‖x‖∞ ≤ ‖x‖1,    ‖x‖1 ≤ n · ‖x‖∞,
‖x‖∞ ≤ ‖x‖,     ‖x‖ ≤ √n · ‖x‖∞,
‖x‖ ≤ ‖x‖1,     ‖x‖1 ≤ √n · ‖x‖.

Recall the triangle inequality. For any vectors p, q and r and for any norm ‖.‖:

‖p − q‖ ≤ ‖p − r‖ + ‖r − q‖.

The relation between algebraic and geometric means. For any scalars a and b such that ab ≥ 0:

√|ab| ≤ (1/2) · |a + b|.

8

Page 11

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Technical Result (algebra)

Lemma 3 Let u and v be any two vectors in Rn

such that uTv ≥ 0. Then

‖UVe‖ ≤ 2^(−3/2) ‖u + v‖²,

where U = diag{u1, · · · , un}, V = diag{v1, · · · , vn}.

Proof: Let us partition all products ujvj into positive and negative ones:

P = {j | ujvj ≥ 0} and M = {j | ujvj < 0}:

0 ≤ uTv = ∑j∈P ujvj + ∑j∈M ujvj = ∑j∈P |ujvj| − ∑j∈M |ujvj|.

We can now write

‖UVe‖ = (‖[ujvj]j∈P‖² + ‖[ujvj]j∈M‖²)^(1/2)
      ≤ (‖[ujvj]j∈P‖1² + ‖[ujvj]j∈M‖1²)^(1/2)
      ≤ (2 ‖[ujvj]j∈P‖1²)^(1/2)
      ≤ √2 ‖[(1/4)(uj + vj)²]j∈P‖1
      = 2^(−3/2) ∑j∈P (uj + vj)²
      ≤ 2^(−3/2) ∑j=1..n (uj + vj)²
      = 2^(−3/2) ‖u + v‖², as requested.

9

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

IPM Technical Results (cnt’d)

Lemma 4. If (x, y, s) ∈ N2(θ) for some θ ∈ (0,1), then

(1 − θ)µ ≤ xjsj ≤ (1 + θ)µ ∀j.

In other words,

min j∈{1..n} xjsj ≥ (1 − θ)µ,    max j∈{1..n} xjsj ≤ (1 + θ)µ.

Proof: Since ‖x‖∞ ≤ ‖x‖, from the definition of N2(θ),

N2(θ) = {(x, y, s) ∈ F0 | ‖XSe − µe‖ ≤ θµ},

we conclude

‖XSe − µe‖∞ ≤ ‖XSe − µe‖ ≤ θµ.

Hence

|xjsj − µ| ≤ θµ ∀j,

which is equivalent to

−θµ ≤ xjsj − µ ≤ θµ ∀j.

Thus

(1 − θ)µ ≤ xjsj ≤ (1 + θ)µ ∀j.

10

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

IPM Technical Results (cnt’d)

Lemma 5

If (x, y, s) ∈ N2(θ) for some θ ∈ (0,1), then

‖XSe − σµe‖² ≤ θ²µ² + (1 − σ)²µ²n.

Proof:

Note first that

eT (XSe − µe) = xTs − µeT e = nµ − nµ = 0.

Therefore

‖XSe − σµe‖² = ‖(XSe − µe) + (1−σ)µe‖²
             = ‖XSe − µe‖² + 2(1−σ)µ eT(XSe − µe) + (1−σ)²µ² eTe
             ≤ θ²µ² + (1−σ)²µ²n.

11

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

IPM Technical Results (cnt’d)

Lemma 6. If (x, y, s) ∈ N2(θ) for some θ ∈ (0,1), then

‖∆X∆Se‖ ≤ [θ² + n(1−σ)²] / [2^(3/2)(1 − θ)] · µ.

Proof: The 3rd equation in the Newton system gives

S∆x + X∆s = −XSe + σµe.

Having multiplied it with (XS)−1/2, we obtain

X−1/2 S1/2 ∆x + X1/2 S−1/2 ∆s = (XS)−1/2 (−XSe + σµe).

Now apply Lemma 3 for u = X−1/2 S1/2 ∆x and v = X1/2 S−1/2 ∆s (with uTv = 0 from Lemma 1) to get

‖∆X∆Se‖ = ‖(X−1/2 S1/2 ∆X)(X1/2 S−1/2 ∆S) e‖
         ≤ 2^(−3/2) ‖X−1/2 S1/2 ∆x + X1/2 S−1/2 ∆s‖²
         = 2^(−3/2) ‖(XS)−1/2 (−XSe + σµe)‖²
         = 2^(−3/2) ∑j=1..n (−xjsj + σµ)² / (xjsj)
         ≤ 2^(−3/2) ‖XSe − σµe‖² / (minj xjsj)
         ≤ [θ² + n(1−σ)²] / [2^(3/2)(1−θ)] · µ.

12

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Magic Numbers

We have previously set two parameters for the

short-step path-following method:

θ = 0.1 and β = 0.1.

Now it’s time to justify this particular choice.

Lemma 7

If θ = 0.1 and β = 0.1, then

[θ² + n(1−σ)²] / [2^(3/2)(1 − θ)] ≤ σθ.

Proof:

Recall that

σ = 1 − β/√

n.

Hence

n(1−σ)² = β²

and for β = 0.1 (for any n ≥ 1)

σ ≥ 0.9.

Substituting θ = 0.1 and β = 0.1, we obtain

[θ² + n(1−σ)²] / [2^(3/2)(1 − θ)] = (0.1² + 0.1²) / (2^(3/2) · 0.9) ≤ 0.02 ≤ 0.9 · 0.1 ≤ σθ.

13
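The inequality of Lemma 7 can also be checked numerically for a range of problem sizes; a small illustrative check:

```python
import numpy as np

theta, beta = 0.1, 0.1
for n in [1, 10, 1000, 10**6]:
    sigma = 1.0 - beta / np.sqrt(n)
    lhs = (theta**2 + n * (1.0 - sigma)**2) / (2**1.5 * (1.0 - theta))
    print(n, round(lhs, 5), "<=", round(sigma * theta, 5), lhs <= sigma * theta)
```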

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Full Newton step in N2(θ)

Lemma 8. Suppose (x, y, s) ∈ N2(θ) and (∆x, ∆y, ∆s) is the Newton direction computed from the system (1). Then the new iterate

(x̄, ȳ, s̄) = (x, y, s) + (∆x, ∆y, ∆s)

satisfies (x̄, ȳ, s̄) ∈ N2(θ), i.e. ‖X̄S̄e − µ̄e‖ ≤ θµ̄.

Proof: From Lemma 2, the new iterate (x̄, ȳ, s̄) satisfies

x̄T s̄ = nµ̄ = nσµ,

so we have to prove that ‖X̄S̄e − µ̄e‖ ≤ θµ̄. For a given component j ∈ {1..n}, we have

x̄j s̄j − µ̄ = (xj + ∆xj)(sj + ∆sj) − µ̄
           = xjsj + (sj∆xj + xj∆sj) + ∆xj∆sj − µ̄
           = xjsj + (−xjsj + σµ) + ∆xj∆sj − σµ
           = ∆xj∆sj.

Thus, from Lemmas 6 and 7, we get

‖X̄S̄e − µ̄e‖ = ‖∆X∆Se‖ ≤ [θ² + n(1−σ)²] / [2^(3/2)(1−θ)] · µ ≤ σθµ = θµ̄.

14

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

A property of log function

Lemma 9

For all δ > −1:

ln(1 + δ) ≤ δ.

Proof:

Consider the function

f(δ) = δ − ln(1 + δ).

Its derivative is:

f′(δ) = 1 − 1/(1 + δ) = δ/(1 + δ).

Obviously f′(δ) < 0 for δ ∈ (−1,0) and f′(δ) > 0

for δ ∈ (0,∞). Hence f(.) has a minimum at

δ=0. We find that f(δ = 0) = 0. Consequently,

for any δ ∈ (−1,∞), f(δ) ≥ 0, i.e.

δ − ln(1 + δ) ≥ 0.

15

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

O(√

n) Complexity Result

Theorem 10

Given ε > 0, suppose that a feasible starting point (x0, y0, s0) ∈ N2(0.1) satisfies

(x0)T s0 = nµ0, where µ0 ≤ 1/ε^κ,

for some positive constant κ. Then there exists an index K with K = O(√n ln(1/ε)) such that

µk ≤ ε, ∀k ≥ K.

16

Page 12

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

O(√n) Complexity Result

Proof:

From Lemma 2, µk+1 = σµk. Having taken logarithms of both sides of this equality we obtain

ln µk+1 = ln σ + ln µk.

By repeatedly applying this formula and using µ0 ≤ 1/ε^κ, we get

ln µk = k ln σ + ln µ0 ≤ k ln(1 − β/√n) + κ ln(1/ε).

From Lemma 9 we have ln(1 − β/√n) ≤ −β/√n.

Thus

ln µk ≤ k(−β/√n) + κ ln(1/ε).

To satisfy µk ≤ ε, we need:

k(−β/√n) + κ ln(1/ε) ≤ ln ε.

This inequality holds for any k ≥ K, where

K = [(κ + 1)/β] · √n · ln(1/ε).

17

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Polynomial Complexity Result

Main ingredients of the polynomial complexity

result for the short-step path-following algorithm:

Stay close to the central path:

all iterates stay in the N2(θ) neighbourhood of

the central path.

Make (slow) progress towards optimality:

reduce systematically duality gap.

µk+1 = σµk,

where

σ = 1 − β/√n,

for some β ∈ (0,1).

18

Page 13

Interior Point Methods

for Linear, Quadratic

and Nonlinear Programming

Turin 2008

Jacek Gondzio

Lecture 6:

IPMs: From Theory to Practice

1

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Proximity to the Central Path

The neighbourhood

N2(θ) = {(x, y, s) ∈ F0 | ‖XSe − µe‖ ≤ θµ}

is very small. In other words, the requirement

that (x, y, s) ∈ N2(θ) is extremely restrictive.

Note (Lemma 4) that if (x, y, s) ∈ N2(θ), then

(1 − θ)µ ≤ xjsj ≤ (1 + θ)µ, ∀j.

For small θ ∈ (0,1) this means that (x, y, s) is an

excellent approximation of the µ-center.

Example:

For n = 10⁶ and θ = 0.1 suppose:

xjsj = 0.9999µ for j ≤ 500,000, and
xjsj = 1.0001µ for j ≥ 500,001.

Then ‖XSe − µe‖ = (10⁶ × 0.0001²µ²)^(1/2) = 0.1µ.

2

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Wide Neighbourhood

In practice, (x, y, s) can stay quite far away from

the µ-center. The algorithm behaves well as

long as there are not too small complementarity

products compared with the others, i.e., when

(x, y, s) ∈ N∞(γ), where

N∞(γ) = {(x, y, s) ∈ F0 |xjsj ≥ γµ, ∀j},

for some (possibly small) γ ∈ (0,1).

Observe that we limit the complementarity prod-

ucts only from below but there is also an implicit

upper bound on xjsj. Indeed, since

∑j=1..n xjsj = nµ,

we have xjsj ≤ nµ.

Advice: Use γ = 0.01.

3

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Speed of Convergence

The short-step path following algorithm asks for

a very small reduction of duality gap per itera-

tion. Indeed, the required reduction of µ is:

µk+1 = σµk,

where

σ = 1 − β/√n,

for some β ∈ (0,1).

Example:

For n = 10⁶ and β = 0.1 we have:

σ = 1 − 0.0001 = 0.9999,

hence after 10,000 iterations the duality gap will be reduced by a factor

(1 − 0.0001)^10000 ≈ e−1 ≈ 0.368.

4

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Aggressive targets

In practice, a much larger reduction can be achieved. On average, the duality gap

is usually reduced by a factor of σ ∈ (0.1,0.5).

Certainly, for a practical algorithm it is absolutely

justified to set the target reduction:

σ = 0.1.

The consequence of such optimistic targets is

unfortunately the loss of the property of being

always able to make the full step in the Newton

direction. Instead, a damped Newton step is made, such that it preserves nonnegativity of x and s.

Advice: Do not use short-step method with

σ = 1 − β/√n,

Use long-step method with

σ ≪ 1.

5

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Feasible Method

The short-step (feasible) path-following method

we have analysed requires all its iterates to be

strictly feasible:

x ∈ P0 = {x ∈ Rn | Ax = b, x > 0},
(y, s) ∈ D0 = {y ∈ Rm, s ∈ Rn | ATy + s = c, s > 0}.

In consequence the right hand side of the Newton

equation system has the form:

[ ξp ]   [ b − Ax       ]   [ 0  ]
[ ξd ] = [ c − ATy − s  ] = [ 0  ]
[ ξµ ]   [ σµe − XSe    ]   [ ξµ ] .

6

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Infeasible Method

The feasibility requirement can be relaxed. It is

possible to generalize the notion of µ-center as

well as that of the central path for infeasible

points (x, y, s).

The Newton direction is then computed from the

following equation system

[ A  0   0 ] [ ∆x ]   [ b − Ax       ]
[ 0  AT  I ] [ ∆y ] = [ c − ATy − s  ]
[ S  0   X ] [ ∆s ]   [ σµe − XSe    ] .

7

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Further Practical Issues

Linear Algebra

Predictor-Corrector Technique

Multiple Centrality Correctors

8

Page 14

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Project: IPMs in the Internet

I encourage you to do this one-hour project.

9

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Project: IPMs in the Internet

IPMs in the Internet:

• LP FAQ (Frequently Asked Questions):

http://www-unix.mcs.anl.gov/otc/Guide/faq/

• Interior Point Methods On-Line:

http://www-unix.mcs.anl.gov/otc/InteriorPoint/

Public Domain IPM Solvers:

• HOPDM (FORTRAN 77) by Jacek Gondzio:

http://www.maths.ed.ac.uk/~gondzio/software/hopdm.html

• LIPSOL (MATLAB) by Yin Zhang:

http://www.caam.rice.edu/~zhang/lipsol/

• PCx (ANSI C) by Steve Wright:

http://www-fp.mcs.anl.gov/otc/Tools/PCx/

10

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Your Linear Program

Let m and d denote the month and day of your

birthday. Clearly, 1 ≤ m ≤ 12 and 1 ≤ d ≤ 31.

Define a number: α = 100 · m + d.

Consider an LP

min x1 + x2 + x3 + x4 + x5 − x6

s.t. x1 + 2x2 + x4 ≤ 3

x2 + 2x3 − x6 ≤ 3

x3 − x4 + 2x5 + 3x6 ≥ 2

x1 + 3x3 − x5 + x6 ≤ α
x1 ≥ 0, x2 ≥ 0, x3 ≥ 0, x4 ≥ 0, x5 ≥ 0, x6 ≥ 0.

11
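The project asks for an MPS file and a stand-alone IPM solver; purely as a sanity check of the data, the same LP can be typed into scipy (an illustrative sketch, not a substitute for the exercise; the birthday values below are placeholders to be replaced by your own):

```python
from scipy.optimize import linprog

m_birth, d_birth = 3, 14                 # placeholder birthday: use your own m and d
alpha = 100 * m_birth + d_birth

c = [1, 1, 1, 1, 1, -1]
A_ub = [[1, 2,  0, 1,  0,  0],           # x1 + 2x2 + x4          <= 3
        [0, 1,  2, 0,  0, -1],           # x2 + 2x3 - x6          <= 3
        [0, 0, -1, 1, -2, -3],           # -(x3 - x4 + 2x5 + 3x6) <= -2
        [1, 0,  3, 0, -1,  1]]           # x1 + 3x3 - x5 + x6     <= alpha
b_ub = [3, 3, -2, alpha]

res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 6, method="highs")
print("x =", res.x, " objective =", res.fun)
```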

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Your Task

Grab one of available interior point solvers.

Install it on your computer.

You may find it easier to install it on a Unix

machine than on a PC running MS Windows.

Prepare MPS data file for your linear program

and solve the problem.

Check if the solution satisfies all the constraints.

Print MPS file and the solution file and show

them to me.

12

Page 15

Interior Point Methods

for Linear, Quadratic

and Nonlinear Programming

Turin 2008

Jacek Gondzio

Lecture 7:

IPM for Quadratic Programs

1

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Convex Quadratic Programs

The quadratic function

f(x) = xTQ x

is convex if and only if the matrix Q is positive semidefinite.

In such case the quadratic programming problem

min cTx + (1/2)xTQ x

s.t. Ax = b,

x ≥ 0,

is well defined.

If there exists a feasible solution to it, then there

exists an optimal solution.

2

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Convex Quadratic Programs

Convexity: Property 9:

Let C ∈ Rn be a convex set and Q a square

matrix. Let f(x) = xTQ x be a quadratic function

f : C 7→ R.

(a) f is convex iff Q is positive semidefinite.

(b) f is strictly convex iff Q is positive definite.

Def. A matrix Q ∈ Rn×n is positive definite if

xTQ x > 0 for any x ≠ 0.

Example:

Consider quadratic functions f(x) = xTQ x with

the following matrices:

Q1 = [ 1  0 ] ,   Q2 = [ 1   0 ] ,   Q3 = [ 5  4 ] .
     [ 0  2 ]          [ 0  −1 ]          [ 4  3 ]

Q1 is positive definite (hence f1 is convex).

Q2 and Q3 are indefinite (f2, f3 are not convex).

3

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Dual Quadratic Program

Consider a QP

min cTx + (1/2)xTQ x

s.t. Ax = b,

x ≥ 0,

where c, x ∈ Rn, b ∈ Rm, A ∈ Rm×n, Q ∈ Rn×n.

We associate Lagrange multipliers y ∈ Rm and

s ∈ Rn (s ≥ 0) with the constraints Ax = b and

x ≥ 0, and write the Lagrangian

L(x, y, s) = cTx + (1/2)xTQ x − yT(Ax−b) − sTx.

Stationarity with respect to x:

∇xL(x, y, s) = c + Qx − ATy − s = 0

is used to determine the Lagrangian dual:

LD(y, s) = min x∈X L(x, y, s)
         = cTx + (1/2)xTQ x − yT(Ax−b) − sTx
         = bTy + xT(c + Qx − ATy − s) − (1/2)xTQ x
         = bTy − (1/2)xTQ x,

and the dual problem has the form:

max bTy − (1/2)xTQ x

s.t. ATy + s − Qx = c,

x, s ≥ 0,

where y ∈ Rm and x, s ∈ Rn. 4

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

QP with IPMs

Consider the convex quadratic programming

problem.

The primal

min cTx + (1/2)xTQ x

s.t. Ax = b,

x ≥ 0,

and the dual

max bTy − (1/2)xTQ x

s.t. ATy + s − Qx = c,

x, s ≥ 0.

Apply the usual procedure:

• replace inequalities with log barriers;

• form the Lagrangian;

• write the first order optimality conditions;

• apply Newton method to them.

5

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

QP with IPMs: Log Barriers

Replace the primal QP

    min  c^T x + (1/2) x^T Q x
    s.t. Ax = b,
         x ≥ 0,

with the primal barrier QP

    min  c^T x + (1/2) x^T Q x − µ Σ_{j=1}^n ln x_j
    s.t. Ax = b.

Replace the dual QP

    max  b^T y − (1/2) x^T Q x
    s.t. A^T y + s − Qx = c,
         y free, s ≥ 0,

with the dual barrier QP

    max  b^T y − (1/2) x^T Q x + µ Σ_{j=1}^n ln s_j
    s.t. A^T y + s − Qx = c.

6

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

First Order Optimality Conds

Consider the primal barrier QP

    min  c^T x + (1/2) x^T Q x − µ Σ_{j=1}^n ln x_j
    s.t. Ax = b,

where µ ≥ 0 is a barrier parameter.

Write out the Lagrangian

    L(x, y, µ) = c^T x + (1/2) x^T Q x − y^T(Ax − b) − µ Σ_{j=1}^n ln x_j,

and the conditions for a stationary point

    ∇_x L(x, y, µ) = c − A^T y − µ X^{-1} e + Qx = 0,
    ∇_y L(x, y, µ) = Ax − b = 0,

where X^{-1} = diag{x_1^{-1}, x_2^{-1}, ..., x_n^{-1}}.

Let us denote

    s = µ X^{-1} e,  i.e.  XSe = µe.

The First Order Optimality Conditions are:

    Ax = b,
    A^T y + s − Qx = c,
    XSe = µe.

7

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Newton Method for the FOC

The first order optimality conditions for the barrier problem form a large system of nonlinear equations

    F(x, y, s) = 0,

where F : R^{2n+m} → R^{2n+m} is a mapping defined as follows:

    F(x, y, s) = [ Ax − b              ]
                 [ A^T y + s − Qx − c  ]
                 [ XSe − µe            ].

Actually, the first two terms of it are linear; only the last one, corresponding to the complementarity condition, is nonlinear.

Note that

    ∇F(x, y, s) = [ A   0    0 ]
                  [ −Q  A^T  I ]
                  [ S   0    X ].

Thus, for a given point (x, y, s) we find the Newton direction (∆x, ∆y, ∆s) by solving the system of linear equations:

    [ A   0    0 ] [ ∆x ]   [ b − Ax              ]
    [ −Q  A^T  I ] [ ∆y ] = [ c − A^T y − s + Qx  ]
    [ S   0    X ] [ ∆s ]   [ µe − XSe            ].
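The following numpy sketch assembles and solves this Newton system once, on a tiny made-up QP instance; all data below are assumptions chosen only for illustration.

# One Newton step for the QP barrier FOC on a small random, made-up problem.
import numpy as np

rng = np.random.default_rng(0)
n, m, mu = 4, 2, 1.0
A = rng.standard_normal((m, n))
Q = np.eye(n)                      # simple positive definite Q
c = rng.standard_normal(n)
x = np.ones(n); s = np.ones(n); y = np.zeros(m)
b = A @ x                          # make the current x primal feasible

X, S, I = np.diag(x), np.diag(s), np.eye(n)
e = np.ones(n)

# Assemble the (2n+m) x (2n+m) Newton matrix and right-hand side.
K = np.block([[A,          np.zeros((m, m)), np.zeros((m, n))],
              [-Q,         A.T,              I               ],
              [S,          np.zeros((n, m)), X               ]])
rhs = np.concatenate([b - A @ x,
                      c - A.T @ y - s + Q @ x,
                      mu * e - X @ S @ e])

d = np.linalg.solve(K, rhs)
dx, dy, ds = d[:n], d[n:n+m], d[n+m:]
print(dx, dy, ds)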

8


IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Interior-Point QP Algorithm

Initialize
    k = 0
    (x^0, y^0, s^0) ∈ F^0 (a strictly feasible primal-dual point)
    µ_0 = (1/n) · (x^0)^T s^0
    α_0 = 0.9995
Repeat until optimality
    k = k + 1
    µ_k = σ µ_{k−1}, where σ ∈ (0, 1)
    ∆ = Newton direction towards the µ-center
    Ratio test:
        α_P := max {α > 0 : x + α∆x ≥ 0},
        α_D := max {α > 0 : s + α∆s ≥ 0}.
    Make step:
        x^{k+1} = x^k + α_0 α_P ∆x,
        y^{k+1} = y^k + α_0 α_D ∆y,
        s^{k+1} = s^k + α_0 α_D ∆s.
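A compact Python sketch of the loop above follows; it uses dense linear algebra, a fixed σ, and a made-up two-variable example, so it illustrates the scheme rather than being a practical implementation.

# Illustrative dense interior-point loop for min c^T x + (1/2) x^T Q x, Ax = b, x >= 0.
import numpy as np

def qp_ipm(A, b, c, Q, x, y, s, sigma=0.1, alpha0=0.9995, tol=1e-8, max_iter=100):
    m, n = A.shape
    e = np.ones(n)
    for _ in range(max_iter):
        mu = (x @ s) / n
        if mu < tol:
            break
        mu = sigma * mu                          # target a smaller mu-center
        # Assemble and solve the (unreduced) Newton system from the previous slide.
        K = np.block([[A,          np.zeros((m, m)), np.zeros((m, n))],
                      [-Q,         A.T,              np.eye(n)       ],
                      [np.diag(s), np.zeros((n, m)), np.diag(x)      ]])
        rhs = np.concatenate([b - A @ x,
                              c - A.T @ y - s + Q @ x,
                              mu * e - x * s])
        d = np.linalg.solve(K, rhs)
        dx, dy, ds = d[:n], d[n:n+m], d[n+m:]
        # Ratio tests: largest steps keeping x and s nonnegative.
        aP = min(1.0, np.min(-x[dx < 0] / dx[dx < 0])) if (dx < 0).any() else 1.0
        aD = min(1.0, np.min(-s[ds < 0] / ds[ds < 0])) if (ds < 0).any() else 1.0
        x = x + alpha0 * aP * dx
        y = y + alpha0 * aD * dy
        s = s + alpha0 * aD * ds
    return x, y, s

# Tiny made-up example: Q = I, one equality constraint x1 + x2 = 1; optimum is x = (1, 0).
A = np.array([[1.0, 1.0]]); b = np.array([1.0])
c = np.array([-2.0, 0.0]); Q = np.eye(2)
x0, y0, s0 = np.array([0.5, 0.5]), np.array([0.0]), np.array([1.0, 1.0])
print(qp_ipm(A, b, c, Q, x0, y0, s0)[0])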

9

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

From LP to QP

QP problem

    min  c^T x + (1/2) x^T Q x
    s.t. Ax = b,
         x ≥ 0.

First order conditions (for the barrier problem)

    Ax = b,
    A^T y + s − Qx = c,
    XSe = µe.

Newton direction

    [ A   0    0 ] [ ∆x ]   [ ξ_p ]
    [ −Q  A^T  I ] [ ∆y ] = [ ξ_d ]
    [ S   0    X ] [ ∆s ]   [ ξ_µ ],

where
    ξ_p = b − Ax,
    ξ_d = c − A^T y − s + Qx,
    ξ_µ = µe − XSe.

Augmented system

    [ −Q − Θ^{-1}  A^T ] [ ∆x ]   [ ξ_d − X^{-1} ξ_µ ]
    [ A            0   ] [ ∆y ] = [ ξ_p              ].

Conclusion:
QP is a natural extension of LP.

10

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

IPMs: LP vs QP

Augmented system in LP

    [ −Θ^{-1}  A^T ] [ ∆x ]   [ ξ_d − X^{-1} ξ_µ ]
    [ A        0   ] [ ∆y ] = [ ξ_p              ].

Eliminate ∆x from the first equation and get normal equations

    (A Θ A^T) ∆y = g.

Augmented system in QP

    [ −Q − Θ^{-1}  A^T ] [ ∆x ]   [ ξ_d − X^{-1} ξ_µ ]
    [ A            0   ] [ ∆y ] = [ ξ_p              ].

Eliminate ∆x from the first equation and get normal equations

    (A (Q + Θ^{-1})^{-1} A^T) ∆y = g.

One can use normal equations in LP, but not

in QP. Normal equations in QP may become al-

most completely dense even for sparse matrices

A and Q. Thus, in QP, usually the indefinite

augmented system form is used.

11

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Sparsity Issues in QP

Example

A sparse symmetric matrix and its triangular factorization:

    [ 1  1          ]   [ 1             ]   [ 1  1          ]
    [ 1  2  1       ]   [ 1  1          ]   [    1  1       ]
    [    1  2  1    ] = [    1  1       ] · [       1  1    ]
    [       1  2  1 ]   [       1  1    ]   [          1  1 ]
    [          1  2 ]   [          1  1 ]   [             1 ]

Hence its inverse is the product of the inverses of the triangular factors, which are filled with ±1 entries, and this product is completely dense:

    [ 1 −1  1 −1  1 ]   [  1             ]   [  5 −4  3 −2  1 ]
    [    1 −1  1 −1 ]   [ −1  1          ]   [ −4  4 −3  2 −1 ]
    [       1 −1  1 ] · [  1 −1  1       ] = [  3 −3  3 −2  1 ]
    [          1 −1 ]   [ −1  1 −1  1    ]   [ −2  2 −2  2 −1 ]
    [             1 ]   [  1 −1  1 −1  1 ]   [  1 −1  1 −1  1 ].

Conclusion:

the inverse of the sparse matrix may be dense.
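This effect is easy to reproduce numerically; the following snippet (illustration only) inverts the sparse tridiagonal matrix from the example and counts the nonzeros.

# The sparse tridiagonal matrix from the example has a completely dense inverse.
import numpy as np

M = np.array([[1, 1, 0, 0, 0],
              [1, 2, 1, 0, 0],
              [0, 1, 2, 1, 0],
              [0, 0, 1, 2, 1],
              [0, 0, 0, 1, 2]], dtype=float)

Minv = np.linalg.inv(M)
print(np.round(Minv).astype(int))           # every entry is nonzero
print("nonzeros in M:   ", np.count_nonzero(M))
print("nonzeros in M^-1:", np.count_nonzero(Minv))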

IPMs for QP:

Do not explicitly invert the matrix Q + Θ−1

in the matrix A(Q + Θ−1)−1AT .

Use augmented system instead.

12


Interior Point Methods

for Linear, Quadratic

and Nonlinear Programming

Turin 2008

Jacek Gondzio

Lecture 8:

Separable Quadratic Programs

1

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Separable QPs are Easy

Regarding the computations involved, a quadratic program with a diagonal matrix Q = D:

    min  c^T x + (1/2) x^T D x
    s.t. Ax = b,
         x ≥ 0,

is as easy as a linear program.

Indeed, in this case, the Newton equation system can be reduced to the following normal equation system:

    (A (D + Θ^{-1})^{-1} A^T) ∆y = g.

Since D + Θ^{-1} is a diagonal matrix, this system is no more difficult to solve than the usual system arising in LP:

    (A Θ A^T) ∆y = g.

Conclusion:
If you can formulate the QP as a separable problem, it is usually worth a try.

2

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Separable QP: Example 1

Suppose the symmetric positive semidefinite matrix Q ∈ R^{n×n} in the quadratic program

    min  c^T x + (1/2) x^T Q x
    s.t. Ax = b,                                   (1)
         x ≥ 0,

is a product of the following matrices:

    Q = F^T D F,

where F ∈ R^{k×n} and D ∈ R^{k×k} is diagonal, for some k ≪ n. Introduce new variables u ∈ R^k such that u = Fx. Then

    x^T Q x = x^T F^T D F x = (Fx)^T D (Fx) = u^T D u.

The problem (1) can be replaced by the following equivalent separable one:

    min  c^T x + (1/2) u^T D u
    s.t. Ax = b,                                   (2)
         Fx − u = 0,
         x ≥ 0.

Although this problem has n + k variables (x, u) (while (1) had only n variables), for small k it is usually much easier to solve than (1).

3

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Example 1 (cont’d)

To derive the first order optimality conditions for (2) we first introduce x̄ = (x, u) ∈ R^{n+k}, c̄ = (c, 0) ∈ R^{n+k} and b̄ = (b, 0) ∈ R^{m+k}, define

    Ā = [ A   0 ]        Q̄ = [ 0  0 ]
        [ F  −I ]   and      [ 0  D ]

and rewrite the problem

    min  c̄^T x̄ + (1/2) x̄^T Q̄ x̄
    s.t. Ā x̄ = b̄,
         x ≥ 0.

We associate dual variables y ∈ R^m and z ∈ R^k with the linear constraints Ax = b and Fx − u = 0, respectively, and write the Lagrangian

    L(x, u, y, z, µ) = c̄^T x̄ + (1/2) x̄^T Q̄ x̄ − (y, z)^T (Ā x̄ − b̄) − µ Σ_{j=1}^n ln x_j
                     = c^T x + (1/2) u^T D u − y^T(Ax − b) − z^T(Fx − u) − µ Σ_{j=1}^n ln x_j.

Observe that u is a free variable and there is no

logarithmic barrier introduced for it.

4

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Example 1 (cont’d)

We write the conditions for a stationary point

    ∇_x L(x, u, y, z, µ) = c − A^T y − F^T z − µ X^{-1} e = 0,
    ∇_u L(x, u, y, z, µ) = Du + z = 0,
    ∇_y L(x, u, y, z, µ) = Ax − b = 0,
    ∇_z L(x, u, y, z, µ) = −Fx + u = 0,

and substitute s = µ X^{-1} e to get the first order optimality conditions:

    Ax = b,
    Fx − u = 0,
    A^T y + F^T z + s = c,
    −Du − z = 0,
    XSe = µe.

The Newton equation system for the FOC is:

    [ 0   0   A^T  F^T  I ] [ ∆x ]   [ r_x ]
    [ 0  −D   0   −I    0 ] [ ∆u ]   [ r_u ]
    [ A   0   0    0    0 ] [ ∆y ] = [ r_y ]
    [ F  −I   0    0    0 ] [ ∆z ]   [ r_z ]
    [ S   0   0    0    X ] [ ∆s ]   [ r_µ ],

where r_µ denotes the residual of the (nonlinear) complementarity equation XSe = µe.

5

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Example 1 (cont’d)

For the nonseparable problem (1), we have to solve

    [ −Q − Θ^{-1}  A^T ] [ ∆x ]   [ r_x ]
    [ A            0   ] [ ∆y ] = [ r_y ].

This is a linear system with n + m equations and n + m unknowns.

For the separable problem (2), we have to solve

    [ −Θ_x^{-1}   0   A^T  F^T ] [ ∆x ]   [ r_x ]
    [  0         −D   0   −I   ] [ ∆u ]   [ r_u ]
    [  A          0   0    0   ] [ ∆y ] = [ r_y ]
    [  F         −I   0    0   ] [ ∆z ]   [ r_z ],

where Θ_x = X S^{-1} ∈ R^{n×n}.

This new system has n + m + 2k equations and

n+m+2k unknowns. It is larger but the matrix

involved in it is much sparser.

6

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Example 1 (cont’d)

This larger system is much easier to solve. Indeed, the upper-left 2 × 2 block is a diagonal matrix. It can be eliminated, giving the system of equations

    [ A   0 ] [ Θ_x  0      ] [ A^T  F^T ] [ ∆y ]   [ r_y ]
    [ F  −I ] [ 0    D^{-1} ] [ 0    −I  ] [ ∆z ] = [ r_z ].

Having done the multiplications on the left-hand side, we obtain

    [ A Θ_x A^T    A Θ_x F^T           ] [ ∆y ]   [ r_y ]
    [ F Θ_x A^T    F Θ_x F^T + D^{-1}  ] [ ∆z ] = [ r_z ].

This reduced system has only m + k equations

and m + k unknowns.
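For illustration, the snippet below (with assumed random data) forms and solves this reduced (m + k) × (m + k) system for a separable reformulation with Q = F^T D F; all quantities are made up for the sketch.

# Build and solve the reduced system [A Th A^T, A Th F^T; F Th A^T, F Th F^T + D^-1].
import numpy as np

rng = np.random.default_rng(1)
n, m, k = 50, 10, 3
A = rng.standard_normal((m, n))
F = rng.standard_normal((k, n))
D = np.diag(rng.uniform(1.0, 2.0, k))          # k x k diagonal, positive
theta_x = rng.uniform(0.1, 1.0, n)             # Theta_x = X S^{-1}, assumed values
r_y = rng.standard_normal(m)
r_z = rng.standard_normal(k)

ATh = A * theta_x                              # A Theta_x (scale columns of A)
FTh = F * theta_x                              # F Theta_x
K = np.block([[ATh @ A.T, ATh @ F.T],
              [FTh @ A.T, FTh @ F.T + np.linalg.inv(D)]])
d = np.linalg.solve(K, np.concatenate([r_y, r_z]))
dy, dz = d[:m], d[m:]
print(dy.shape, dz.shape)                      # (10,), (3,)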

7

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Separable QP: Example 2

Suppose the symmetric positive definite matrix

Q ∈ Rn×n in the quadratic program

min cTx + 12xTQ x

s.t. Ax = b, (3)

x ≥ 0,

has the following form

Q = D + ddT ,

where D is a diagonal matrix and d ∈ Rn. Intro-

duce the new variable u ∈ R such that u = dTx.

Then

xTQx=xT(D+ddT )x=xTDx+(dTx)(dTx)=xTDx+u2.

The problem (3) can be replaced by the following

equivalent separable one:

min cTx + 12xTDx + 1

2u2

s.t. Ax = b, (4)

dTx − u = 0,

x ≥ 0.

This problem has n+1 variables (x, u) (while (3)

had only n variables). However, it is much easier

to solve than (3).

8


IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Example 2 (cont’d)

For the nonseparable problem (3), we have to solve

    [ −Q − Θ^{-1}  A^T ] [ ∆x ]   [ r_x ]
    [ A            0   ] [ ∆y ] = [ r_y ].

This is a linear system with n + m equations and n + m unknowns.

For the separable problem (4), we have to solve

    [ −D − Θ_x^{-1}   0   A^T  d ] [ ∆x ]   [ r_x ]
    [  0             −1   0   −1 ] [ ∆u ]   [ r_u ]
    [  A              0   0    0 ] [ ∆y ] = [ r_y ]
    [  d^T           −1   0    0 ] [ ∆z ]   [ r_z ],

where y ∈ R^m and z ∈ R are dual variables associated with the linear constraints Ax = b and d^T x − u = 0, respectively.

This new system has n + m + 2 equations and

n + m + 2 unknowns. It is larger but the matrix

involved in it is much sparser.

9

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Example 2 (cont’d)

This larger system is much easier to solve. Indeed, the upper-left 2 × 2 block is a diagonal matrix. It can be eliminated, giving the system of equations

    [ A    0 ] [ (D + Θ_x^{-1})^{-1}  0 ] [ A^T  d ] [ ∆y ]   [ r_y ]
    [ d^T −1 ] [ 0                    1 ] [ 0   −1 ] [ ∆z ] = [ r_z ].

Having done the multiplications on the left-hand side, we obtain

    [ A Θ̄_x A^T     A Θ̄_x d       ] [ ∆y ]   [ r_y ]
    [ d^T Θ̄_x A^T   d^T Θ̄_x d + 1 ] [ ∆z ] = [ r_z ],

where Θ̄_x = (D + Θ_x^{-1})^{-1}.

This reduced system has only m + 1 equations

and m + 1 unknowns.

10


Interior Point Methods

for Linear, Quadratic

and Nonlinear Programming

Turin 2008

Jacek Gondzio

Lecture 9:

IPMs for Nonlinear Programs

1

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Convex Nonlinear Optimization

Consider the nonlinear optimization problem

min f(x)

s.t. g(x) ≤ 0,

where x ∈ R^n, and f : R^n → R and g : R^n → R^m are convex, twice differentiable.

Assumptions:

f and g are convex

⇒ If there exists a local minimum then it is a

global one.

f and g are twice differentiable

⇒ We can use the second order Taylor

approximations of them.

Some additional (technical) conditions

⇒ We need them to prove that the point which

satisfies the first order optimality conditions is

the optimum. We won’t use them in this course.

2

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Taylor Expansion of f : R → R

Let f : R → R.

If all derivatives of f are continuously differentiable at x_0, then

    f(x) = Σ_{k=0}^∞ (f^{(k)}(x_0) / k!) (x − x_0)^k,

where f^{(k)}(x_0) is the k-th derivative of f at x_0.

The first order approximation of the function:

    f(x) = f(x_0) + f′(x_0)(x − x_0) + r_2(x − x_0),

where the remainder satisfies:

    lim_{x→x_0} r_2(x − x_0) / (x − x_0) = 0.

The second order approximation:

    f(x) = f(x_0) + f′(x_0)(x − x_0) + (1/2) f″(x_0)(x − x_0)² + r_3(x − x_0),

where the remainder satisfies:

    lim_{x→x_0} r_3(x − x_0) / (x − x_0)² = 0.

3

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Derivatives of f : R^n → R

Consider a real-valued function f : R^n → R.

The vector

    ∇f(x) = [ ∂f/∂x_1 (x), ∂f/∂x_2 (x), ..., ∂f/∂x_n (x) ]^T

is called the gradient of f at x.

The matrix

    ∇²f(x) = [ ∂²f/∂x_1²      ∂²f/∂x_1∂x_2   ...  ∂²f/∂x_1∂x_n ]
             [ ∂²f/∂x_2∂x_1   ∂²f/∂x_2²      ...  ∂²f/∂x_2∂x_n ]
             [ ...            ...            ...  ...          ]
             [ ∂²f/∂x_n∂x_1   ∂²f/∂x_n∂x_2   ...  ∂²f/∂x_n²    ]   (all evaluated at x)

is called the Hessian of f at x.
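As a small illustration (not from the lecture), the gradient and Hessian of a concrete function can be approximated by finite differences and compared with the exact formulas; the function below is an arbitrary example.

# Finite-difference gradient and Hessian of f(x) = x1^2 + 3*x1*x2 + 2*x2^2.
import numpy as np

def f(x):
    return x[0]**2 + 3*x[0]*x[1] + 2*x[1]**2

def num_grad(f, x, h=1e-6):
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = h
        g[i] = (f(x + e) - f(x - e)) / (2*h)      # central difference
    return g

def num_hess(f, x, h=1e-4):
    n = len(x); H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros_like(x); e[i] = h
        H[:, i] = (num_grad(f, x + e) - num_grad(f, x - e)) / (2*h)
    return H

x = np.array([1.0, -2.0])
print(num_grad(f, x))          # exact gradient: [2*x1 + 3*x2, 3*x1 + 4*x2] = [-4, -5]
print(num_hess(f, x))          # exact Hessian:  [[2, 3], [3, 4]]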

4

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Taylor Expansion of f : R^n → R

Let f : R^n → R.

If all derivatives of f are continuously differentiable at x_0, then

    f(x) = Σ_{k=0}^∞ (f^{(k)}(x_0) / k!) (x − x_0)^k,

where f^{(k)}(x_0) is the k-th derivative of f at x_0.

The first order approximation of the function:

    f(x) = f(x_0) + ∇f(x_0)^T (x − x_0) + r_2(x − x_0),

where the remainder satisfies:

    lim_{x→x_0} r_2(x − x_0) / ‖x − x_0‖ = 0.

The second order approximation:

    f(x) = f(x_0) + ∇f(x_0)^T (x − x_0) + (1/2)(x − x_0)^T ∇²f(x_0)(x − x_0) + r_3(x − x_0),

where the remainder satisfies:

    lim_{x→x_0} r_3(x − x_0) / ‖x − x_0‖² = 0.

5

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Convexity: Reminder

Property 1.
For any collection {C_i | i ∈ I} of convex sets, the intersection ⋂_{i∈I} C_i is convex.

Property 4.
If C is a convex set and f : C → R is a convex function, the level sets {x ∈ C | f(x) ≤ α} and {x ∈ C | f(x) < α} are convex for all scalars α.

Lemma 1:
If g : R^n → R^m is a convex function, then the set {x ∈ R^n | g(x) ≤ 0} is convex.

Proof:
Since every function g_i : R^n → R, i = 1, 2, ..., m, is convex, from Property 4 we conclude that every set X_i = {x ∈ R^n | g_i(x) ≤ 0} is convex. From Property 1, we conclude that the intersection X = ⋂_{i=1}^m X_i = {x ∈ R^n | g(x) ≤ 0} is convex, which completes the proof.

6

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Diff’ble Convex Functions

Property 8.
Let C ⊆ R^n be a convex set and f : C → R be twice continuously differentiable over C.
(a) If ∇²f(x) is positive semidefinite for all x ∈ C, then f is convex.
(b) If ∇²f(x) is positive definite for all x ∈ C, then f is strictly convex.
(c) If f is convex, then ∇²f(x) is positive semidefinite for all x ∈ C.

Let the second order approximation of the function be given:

    f(x) ≈ f(x_0) + c^T (x − x_0) + (1/2)(x − x_0)^T Q (x − x_0),

where c = ∇f(x_0) and Q = ∇²f(x_0).

From Property 8, it follows that when f is convex and twice differentiable, then Q exists and is a positive semidefinite matrix.

Conclusion:

If f is convex and twice differentiable, then op-

timization of f(x) can (locally) be replaced with

the minimization of its quadratic model.

7

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Nonlinear Opt. with IPMs

Nonlinear Optimization via QPs:

Sequential Quadratic Programming (SQP).

Repeat until optimality:

• approximate NLP (locally) with a QP;

• solve (approximately) the QP.

Nonlinear Optimization with IPMs:

works similarly to SQP scheme.

However, the (local) QP approximations are not

solved to optimality. Instead, only one step in

the Newton direction corresponding to a given

QP approximation is made and the new QP ap-

proximation is computed.

Derive an IPM for NLP:

• replace inequalities with log barriers;

• form the Lagrangian;

• write the first order optimality conditions;

• apply Newton method to them.

8


IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

NLP Notation

Consider the nonlinear optimization problem

    min f(x)  s.t.  g(x) ≤ 0,

where x ∈ R^n, and f : R^n → R and g : R^n → R^m are convex, twice differentiable.

The vector-valued function g : R^n → R^m has a derivative A(x) ∈ R^{m×n},

    A(x) = ∇g(x) = [ ∂g_i/∂x_j ]_{i=1..m, j=1..n},

which is called the Jacobian of g.

The Lagrangian associated with the NLP is:

    L(x, y) = f(x) + y^T g(x),

where y ∈ R^m, y ≥ 0 are Lagrange multipliers (dual variables).

The first derivatives of the Lagrangian:

    ∇_x L(x, y) = ∇f(x) + ∇g(x)^T y,
    ∇_y L(x, y) = g(x).

The Hessian of the Lagrangian, Q(x, y) ∈ R^{n×n}:

    Q(x, y) = ∇²_{xx} L(x, y) = ∇²f(x) + Σ_{i=1}^m y_i ∇²g_i(x).

9

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Convexity in NLP

Lemma 2:
If f : R^n → R and g : R^n → R^m are convex, twice differentiable, then the Hessian of the Lagrangian

    Q(x, y) = ∇²f(x) + Σ_{i=1}^m y_i ∇²g_i(x)

is positive semidefinite for any x and any y ≥ 0.
If f is strictly convex, then Q(x, y) is positive definite for any x and any y ≥ 0.

Proof:

Using Property 8, the convexity of f implies that

∇2f(x) is positive semidefinite for any x. Simi-

larly, the convexity of g implies that for all i =

1,2, ..., m, ∇2gi(x) is positive semidefinite for any

x.

Since yi ≥ 0 for all i = 1,2, ..., m and Q(x, y)

is the sum of positive semidefinite matrices, we

conclude that Q(x, y) is positive semidefinite.

If f is strictly convex, then ∇2f(x) is positive

definite and so is Q(x, y).

10

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

IPM for NLP

Add slack variables to the nonlinear inequalities:

    min  f(x)
    s.t. g(x) + z = 0,
         z ≥ 0,

where z ∈ R^m. Replace the inequality z ≥ 0 with the logarithmic barrier:

    min  f(x) − µ Σ_{i=1}^m ln z_i
    s.t. g(x) + z = 0.

Write out the Lagrangian

    L(x, y, z, µ) = f(x) + y^T(g(x) + z) − µ Σ_{i=1}^m ln z_i,

and the conditions for a stationary point

    ∇_x L(x, y, z, µ) = ∇f(x) + ∇g(x)^T y = 0,
    ∇_y L(x, y, z, µ) = g(x) + z = 0,
    ∇_z L(x, y, z, µ) = y − µ Z^{-1} e = 0,

where Z^{-1} = diag{z_1^{-1}, z_2^{-1}, ..., z_m^{-1}}.

The First Order Optimality Conditions are:

    ∇f(x) + ∇g(x)^T y = 0,
    g(x) + z = 0,
    YZe = µe.

11

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Newton Method for the FOC

The first order optimality conditions for the barrier problem form a large system of nonlinear equations

    F(x, y, z) = 0,

where F : R^{n+2m} → R^{n+2m} is a mapping defined as follows:

    F(x, y, z) = [ ∇f(x) + ∇g(x)^T y ]
                 [ g(x) + z          ]
                 [ YZe − µe          ].

Note that all three terms of it are nonlinear.
(In LP and QP the first two terms were linear.)

Observe that

    ∇F(x, y, z) = [ Q(x, y)  A(x)^T  0 ]
                  [ A(x)     0       I ]
                  [ 0        Z       Y ],

where A(x) is the Jacobian of g and Q(x, y) is the Hessian of L.
They are defined as follows:

    A(x) = ∇g(x) ∈ R^{m×n},
    Q(x, y) = ∇²f(x) + Σ_{i=1}^m y_i ∇²g_i(x) ∈ R^{n×n}.

12

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Newton Method (cont’d)

For a given point (x, y, z) we find the Newton direction (∆x, ∆y, ∆z) by solving the system of linear equations:

    [ Q(x, y)  A(x)^T  0 ] [ ∆x ]   [ −∇f(x) − A(x)^T y ]
    [ A(x)     0       I ] [ ∆y ] = [ −g(x) − z          ]
    [ 0        Z       Y ] [ ∆z ]   [ µe − YZe           ].

Using the third equation we eliminate

    ∆z = µ Y^{-1} e − Ze − Z Y^{-1} ∆y

from the second equation and get

    [ Q(x, y)  A(x)^T    ] [ ∆x ]   [ −∇f(x) − A(x)^T y ]
    [ A(x)     −Z Y^{-1} ] [ ∆y ] = [ −g(x) − µ Y^{-1} e ].
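The following Python sketch performs one such Newton step for a small made-up convex NLP with a single inequality constraint; the problem data and the starting point are assumptions chosen only for illustration.

# One Newton step for the NLP barrier FOC:
# min (x1-2)^2 + (x2-1)^2  s.t.  x1^2 + x2^2 - 1 <= 0.
import numpy as np

def grad_f(x):  return np.array([2*(x[0]-2), 2*(x[1]-1)])
def hess_f(x):  return 2*np.eye(2)
def g(x):       return np.array([x[0]**2 + x[1]**2 - 1])   # single constraint, m = 1
def jac_g(x):   return np.array([[2*x[0], 2*x[1]]])
def hess_g(x):  return 2*np.eye(2)                          # Hessian of g_1

mu = 0.1
x = np.array([0.5, 0.5]); y = np.array([1.0]); z = -g(x)    # z > 0 since g(x) < 0

A = jac_g(x)                                                 # Jacobian, 1 x 2
Q = hess_f(x) + y[0]*hess_g(x)                               # Hessian of the Lagrangian
Y, Z, I = np.diag(y), np.diag(z), np.eye(1)

K = np.block([[Q,                A.T,              np.zeros((2, 1))],
              [A,                np.zeros((1, 1)), I               ],
              [np.zeros((1, 2)), Z,                Y               ]])
rhs = np.concatenate([-grad_f(x) - A.T @ y,
                      -g(x) - z,
                      mu*np.ones(1) - Y @ Z @ np.ones(1)])
d = np.linalg.solve(K, rhs)
dx, dy, dz = d[:2], d[2:3], d[3:]
print(dx, dy, dz)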

13

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Interior-Point NLP Algorithm

Initialize
    k = 0
    (x^0, y^0, z^0) such that y^0 > 0 and z^0 > 0
    µ_0 = (1/m) · (y^0)^T z^0
Repeat until optimality
    k = k + 1
    µ_k = σ µ_{k−1}, where σ ∈ (0, 1)
    Compute A(x) and Q(x, y)
    ∆ = Newton direction towards the µ-center
    Ratio test:
        α_1 := max {α > 0 : y + α∆y ≥ 0},
        α_2 := max {α > 0 : z + α∆z ≥ 0}.
    Choose the step:
        (use trust region or line search)
        α ≤ min {α_1, α_2}.
    Make step:
        x^{k+1} = x^k + α∆x,
        y^{k+1} = y^k + α∆y,
        z^{k+1} = z^k + α∆z.

14

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

From QP to NLP

Newton direction for QP

    [ −Q  A^T  I ] [ ∆x ]   [ ξ_d ]
    [ A   0    0 ] [ ∆y ] = [ ξ_p ]
    [ S   0    X ] [ ∆s ]   [ ξ_µ ].

Augmented system for QP

    [ −Q − S X^{-1}  A^T ] [ ∆x ]   [ ξ_d − X^{-1} ξ_µ ]
    [ A              0   ] [ ∆y ] = [ ξ_p              ].

Newton direction for NLP

    [ Q(x, y)  A(x)^T  0 ] [ ∆x ]   [ −∇f(x) − A(x)^T y ]
    [ A(x)     0       I ] [ ∆y ] = [ −g(x) − z          ]
    [ 0        Z       Y ] [ ∆z ]   [ µe − YZe           ].

Augmented system for NLP

    [ Q(x, y)  A(x)^T    ] [ ∆x ]   [ −∇f(x) − A(x)^T y ]
    [ A(x)     −Z Y^{-1} ] [ ∆y ] = [ −g(x) − µ Y^{-1} e ].

Conclusion:

NLP is a natural extension of QP.

15

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Lin. Algebra in IPM for NLP

Newton direction for NLP

    [ Q(x, y)  A(x)^T  0 ] [ ∆x ]   [ −∇f(x) − A(x)^T y ]
    [ A(x)     0       I ] [ ∆y ] = [ −g(x) − z          ]
    [ 0        Z       Y ] [ ∆z ]   [ µe − YZe           ].

The corresponding augmented system

    [ Q(x, y)  A(x)^T    ] [ ∆x ]   [ −∇f(x) − A(x)^T y ]
    [ A(x)     −Z Y^{-1} ] [ ∆y ] = [ −g(x) − µ Y^{-1} e ],

where A(x) ∈ R^{m×n} is the Jacobian of g and Q(x, y) ∈ R^{n×n} is the Hessian of L:

    A(x) = ∇g(x),
    Q(x, y) = ∇²f(x) + Σ_{i=1}^m y_i ∇²g_i(x).

Automatic differentiation is very useful ... get Q(x, y) and A(x) from an Algebraic Modeling Language.

[Diagram: the Algebraic Modeling Language (AML) passes the model to the SOLVER, which calls a Numerical Analysis Package; the solution/output is returned.]

16


IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Automatic Differentiation

AD in the Internet:

• ADIFOR (FORTRAN code for AD):

http://www-unix.mcs.anl.gov/autodiff/ADIFOR/

• ADOL-C (C/C++ code for AD):

http://www-unix.mcs.anl.gov/autodiff/

AD Tools/adolc.anl/adolc.html

• AD page at Cornell:

http://www.tc.cornell.edu/~averma/AD/

17

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Interior Point Methods

Conclusions:

• Interior Point Methods provide the unified

framework for convex optimization.

• Interior Point Methods provide polynomial al-

gorithms for LP, QP and NLP.

• The linear algebra in LP, QP and NLP is very

similar.

• Use IPMs to solve very large problems.

Further Extensions:

• Nonconvex optimization.

IPMs in the Internet:

• LP FAQ (Frequently Asked Questions):

http://www-unix.mcs.anl.gov/otc/Guide/faq/

• Interior Point Methods On-Line:

http://www-unix.mcs.anl.gov/otc/InteriorPoint/

• NEOS (Network Enabled Opt. Services):

http://www-neos.mcs.anl.gov/

18

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Project: NEOS

I encourage you to do this one-hour project.

19

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Project: Use NEOS

NEOS stands for

Network Enabled Optimization Services:

http://www-neos.mcs.anl.gov/

You can use optimization facilities remotely:

prepare an MPS file with your problem,

submit it to NEOS.

The NEOS server will execute your job on one of the available machines (you do not know where) and

will send you the solution by e-mail.

Your Task

Solve your LP problem via NEOS.

Use at least 3 different LP solvers.

Compare the solutions obtained.

20


Interior Point Methods

for Linear, Quadratic

and Nonlinear Programming

Turin 2008

Jacek Gondzio

Lecture 10:

More on Newton Method

1

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Newton Method

Let f : R^n → R^n be a twice continuously differentiable function such that ∇f(x) ∈ R^{n×n} is nonsingular at any x. The Newton method finds a root of the nonlinear equation system

    f(x) = 0

by repeating the following step

    x^{k+1} = x^k − (∇f(x^k))^{-1} f(x^k).

Let us rewrite it in a simplified form

    x^{k+1} = φ(x^k),

and observe that at the solution x, φ′(x) = 0.
Indeed (we check it for f : R → R),

    φ′(x) = (x − f(x)/f′(x))′ = f(x) f″(x) / (f′(x))² = 0,

because at the solution x: f(x) = 0.

Near the solution x, the Newton method converges quadratically, i.e., the error of the solution reduces as follows:

    ‖e^{k+1}‖ ≤ C ‖e^k‖²,

where C is a constant.

2

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Equations ( ⇒ ) Optimization

Let f : R^n → R be a twice continuously differentiable function.

Finding an (unconstrained) minimum of f (or

more generally, finding a stationary point of f)

is equivalent to solving equation

∇f(x) = 0.

This is a nonlinear system of equations that can

be solved with the Newton method.

Assume ∇2f(x) ∈ Rn×n is nonsingular at any x.

Newton method for optimization repeats the fol-

lowing step

xk+1 = xk − (∇2f(xk))−1∇f(xk).

3

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Another View

Newton Method for Optimization

Let f : R^n → R be a twice continuously differentiable function. Suppose we build a quadratic model f̃ of f around a given point x^k, i.e., we define ∆x = x − x^k and write:

    f̃(x) = f(x^k) + ∇f(x^k)^T ∆x + (1/2) ∆x^T ∇²f(x^k) ∆x.

Now we optimize the model f̃ instead of optimizing f.

A minimum (or, more generally, a stationary point) of the quadratic model satisfies:

    ∇f̃(x) = ∇f(x^k) + ∇²f(x^k) ∆x = 0,

i.e.

    ∆x = x − x^k = −(∇²f(x^k))^{-1} ∇f(x^k),

which reduces to the usual equation:

    x^{k+1} = x^k − (∇²f(x^k))^{-1} ∇f(x^k).

4

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Quadratic Convergence

Let f : R^n → R be a twice continuously differentiable function. Let us apply the Newton method to optimize it:

    x^{k+1} = x^k − (∇²f(x^k))^{-1} ∇f(x^k).

Lemma (Quadratic Convergence).
If f is strongly convex with some constant m, i.e.,

    h^T ∇²f(x) h ≥ m ‖h‖₂²,   ∀ x, h ∈ R^n,

and ∇²f is Lipschitz continuous with constant L, i.e.,

    ‖(∇²f(x) − ∇²f(y)) h‖₂ ≤ L ‖x − y‖₂ ‖h‖₂,   ∀ x, y, h ∈ R^n,

then

    ‖∇f(x^{k+1})‖₂ ≤ (L / (2m²)) ‖∇f(x^k)‖₂².

In particular, in the region defined by the inequality

    (L / (2m²)) ‖∇f(x^k)‖₂ ≤ 1,

the Newton method converges quadratically.

5

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Global vs Local Behaviour

The Newton method behaves very well near the solution, where it displays quadratic convergence.

But it may behave very badly far away from it.

Why?

Newton method uses quadratic approximation.

Such approximation is valid only locally. Thus

one cannot expect that the Newton direction

∆x = −(∇2f(x))−1∇f(x),

is an improvement direction everywhere.

6

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Newton Method in IPMs

Newton method behaves very well in the case of

Interior Point Methods for Linear Programming.

This is a consequence of a ’weak nonlinearity’

introduced by the logarithmic barrier function.

Newton method applied in IPMs for NLP needs

additional safeguards to ensure global conver-

gence.

There are two possible safeguards:

• use Trust Region,

i.e. believe the quadratic model only in a

neighbourhood of the current point; or

• use Line Search,

i.e. optimize f along Newton direction

∆x = −(∇2f(x))−1∇f(x).

7

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Newton method may diverge

[Figure: plot of f(x) = e − e^{1/x}, which has its root at x = 1.]

Newton method applied from different starting points:

    iter   x_k              x_k          x_k          x_k
     0     −1.0             0.5          1.5          2.0
     1     −.739 · 10^1     .65803014    .60987204    −.595
     2     −.123 · 10^3     .83352370    .78563159    −.541 · 10^1
     3     −.263 · 10^5     .95930672    .93302399    −.718 · 10^2
     4     −.119 · 10^10    .99752767    .99332404    −.913 · 10^4
     5     −.244 · 10^19    .99999083    .99993320    −.143 · 10^9
     6     −.102 · 10^38    1.0000000    .99999999    −.352 · 10^17
     7     −.180 · 10^75    1.0000000    1.0000000    −.213 · 10^34
     8     −.555 · 10^148   1.0000000    1.0000000    −.780 · 10^67

The algorithm converges from x_0 = 0.5 or x_0 = 1.5 but diverges from x_0 = −1.0 and x_0 = 2.0.

8
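The behaviour in the table is easy to reproduce; the short script below (illustration only) runs eight Newton steps for f(x) = e − e^{1/x} from the same four starting points.

# Newton's method for f(x) = e - e^(1/x) from four different starting points.
import numpy as np

def f(x):      return np.e - np.exp(1.0 / x)
def fprime(x): return np.exp(1.0 / x) / x**2     # derivative of e - e^(1/x)

for x in [-1.0, 0.5, 1.5, 2.0]:
    xs = [x]
    for _ in range(8):
        x = x - f(x) / fprime(x)                 # Newton step
        xs.append(x)
    print(["%.4g" % v for v in xs])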


IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

− log x Barrier Function

Consider the primal barrier linear program

    min  c^T x − µ Σ_{j=1}^n ln x_j
    s.t. Ax = b,

where µ ≥ 0 is a barrier parameter.

Write out the Lagrangian

    L(x, y, µ) = c^T x − y^T(Ax − b) − µ Σ_{j=1}^n ln x_j,

and the conditions for a stationary point

    ∇_x L(x, y, µ) = c − A^T y − µ X^{-1} e = 0,
    ∇_y L(x, y, µ) = Ax − b = 0,

where X^{-1} = diag{x_1^{-1}, x_2^{-1}, ..., x_n^{-1}}.

Let us denote

    s = µ X^{-1} e,  i.e.  XSe = µe.

The First Order Optimality Conditions are:

    Ax = b,
    A^T y + s = c,
    XSe = µe.

9

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

− log x Barrier: Newton Method

The first order optimality conditions for the barrier problem form a large system of nonlinear equations

    F(x, y, s) = 0,

where F : R^{2n+m} → R^{2n+m} is a mapping defined as follows:

    F(x, y, s) = [ Ax − b         ]
                 [ A^T y + s − c  ]
                 [ XSe − µe       ].

Actually, the first two terms of it are linear; only the last one, corresponding to the complementarity condition, is nonlinear.

Note that

    ∇F(x, y, s) = [ A  0    0 ]
                  [ 0  A^T  I ]
                  [ S  0    X ].

Thus, for a given point (x, y, s) we find the Newton direction (∆x, ∆y, ∆s) by solving the system of linear equations:

    [ A  0    0 ] [ ∆x ]   [ b − Ax         ]
    [ 0  A^T  I ] [ ∆y ] = [ c − A^T y − s  ]
    [ S  0    X ] [ ∆s ]   [ µe − XSe       ].

10

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

1/x^α, α > 0 Barrier Function

Consider the primal barrier linear program

    min  c^T x + µ Σ_{j=1}^n 1/x_j^α
    s.t. Ax = b,

where µ ≥ 0 is a barrier parameter and α > 0.

Write out the Lagrangian

    L(x, y, µ) = c^T x − y^T(Ax − b) + µ Σ_{j=1}^n 1/x_j^α,

and the conditions for a stationary point

    ∇_x L(x, y, µ) = c − A^T y − µα X^{−α−1} e = 0,
    ∇_y L(x, y, µ) = Ax − b = 0,

where X^{−α−1} = diag{x_1^{−α−1}, x_2^{−α−1}, ..., x_n^{−α−1}}.

Let us denote

    s = µα X^{−α−1} e,  i.e.  X^{α+1} S e = µα e.

The First Order Optimality Conditions are:

    Ax = b,
    A^T y + s = c,
    X^{α+1} S e = µα e.

11

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

1/x^α, α > 0 Barrier: Newton Method

The first order optimality conditions for the barrier problem are

    F(x, y, s) = 0,

where F : R^{2n+m} → R^{2n+m} is a mapping defined as follows:

    F(x, y, s) = [ Ax − b             ]
                 [ A^T y + s − c      ]
                 [ X^{α+1} S e − µα e ].

As before, only the last term, corresponding to the complementarity condition, is nonlinear.

Note that

    ∇F(x, y, s) = [ A            0    0        ]
                  [ 0            A^T  I        ]
                  [ (α+1) X^α S  0    X^{α+1}  ].

Thus, for a given point (x, y, s) we find the Newton direction (∆x, ∆y, ∆s) by solving the system of linear equations:

    [ A            0    0        ] [ ∆x ]   [ b − Ax              ]
    [ 0            A^T  I        ] [ ∆y ] = [ c − A^T y − s       ]
    [ (α+1) X^α S  0    X^{α+1}  ] [ ∆s ]   [ µα e − X^{α+1} S e  ].

12

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

e^{1/x} Barrier Function

Consider the primal barrier linear program

    min  c^T x + µ Σ_{j=1}^n e^{1/x_j}
    s.t. Ax = b,

where µ ≥ 0 is a barrier parameter.

Write out the Lagrangian

    L(x, y, µ) = c^T x − y^T(Ax − b) + µ Σ_{j=1}^n e^{1/x_j},

and the conditions for a stationary point

    ∇_x L(x, y, µ) = c − A^T y − µ X^{-2} exp(X^{-1}) e = 0,
    ∇_y L(x, y, µ) = Ax − b = 0,

where exp(X^{-1}) = diag{e^{1/x_1}, e^{1/x_2}, ..., e^{1/x_n}}.

Let us denote

    s = µ X^{-2} exp(X^{-1}) e,  i.e.  X² exp(−X^{-1}) S e = µe.

The First Order Optimality Conditions are:

    Ax = b,
    A^T y + s = c,
    X² exp(−X^{-1}) S e = µe.

13

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

e^{1/x} Barrier: Newton Method

The first order optimality conditions are

    F(x, y, s) = 0,

where F : R^{2n+m} → R^{2n+m} is defined as follows:

    F(x, y, s) = [ Ax − b                    ]
                 [ A^T y + s − c             ]
                 [ X² exp(−X^{-1}) S e − µe  ].

As before, only the last term, corresponding to the complementarity condition, is nonlinear.

Note that

    ∇F(x, y, s) = [ A                       0    0                 ]
                  [ 0                       A^T  I                 ]
                  [ (2X+I) exp(−X^{-1}) S   0    X² exp(−X^{-1})   ].

The Newton direction (∆x, ∆y, ∆s) solves the system of linear equations:

    [ A                       0    0                ] [ ∆x ]   [ b − Ax                     ]
    [ 0                       A^T  I                ] [ ∆y ] = [ c − A^T y − s              ]
    [ (2X+I) exp(−X^{-1}) S   0    X² exp(−X^{-1})  ] [ ∆s ]   [ µe − X² exp(−X^{-1}) S e   ].

14

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Why Log Barrier is the Best?

The First Order Optimality Conditions:

    − log x :  XSe = µe,
    1/x^α   :  X^{α+1} S e = µα e,
    e^{1/x} :  X² exp(−X^{-1}) S e = µe.

Log Barrier ensures the symmetry between the primal and the dual.

Newton Equation System:

    − log x :  ∇F_3 = [ S, 0, X ],
    1/x^α   :  ∇F_3 = [ (α+1) X^α S, 0, X^{α+1} ],
    e^{1/x} :  ∇F_3 = [ (2X+I) exp(−X^{-1}) S, 0, X² exp(−X^{-1}) ].

Log Barrier produces 'the weakest nonlinearity'.

15

IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Self-concordant Functions

There is a nice property of the function that is

responsible for a good behaviour of the Newton

method.

Def.
Let C ⊆ R^n be an open nonempty convex set. Let f : C → R be a three times continuously differentiable convex function.
A function f is called self-concordant if there exists a constant p > 0 such that

    |∇³f(x)[h, h, h]| ≤ 2 p^{−1/2} (∇²f(x)[h, h])^{3/2},   ∀ x ∈ C, ∀ h : x + h ∈ C.

(We then say that f is p-self-concordant.)

Note that a self-concordant function is always well approximated by the quadratic model, because the error of such an approximation can be bounded by the 3/2 power of ∇²f(x)[h, h].

16


IPMs for LP, QP, NLP, J. Gondzio, Turin 2008

Self-concordant Barriers

Lemma
The barrier function − log x is self-concordant on R_+.

Proof
Consider f(x) = − log x. We compute

    f′(x) = −x^{-1},  f″(x) = x^{-2}  and  f‴(x) = −2x^{-3},

and check that the self-concordance condition is satisfied for p = 1.
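For readers who want to double-check this computation, here is a small SymPy verification (illustration only) that |f‴(x)| ≤ 2 (f″(x))^{3/2} holds, in fact with equality, for f(x) = −log x on x > 0.

# Symbolic check of the self-concordance inequality for f(x) = -log(x), p = 1.
import sympy as sp

x = sp.symbols('x', positive=True)
f = -sp.log(x)
f2 = sp.diff(f, x, 2)                      # second derivative: x**(-2)
f3 = sp.diff(f, x, 3)                      # third derivative: -2*x**(-3)

# Self-concordance with p = 1 requires |f'''(x)| <= 2 * (f''(x))**(3/2).
lhs = sp.Abs(f3)
rhs = 2 * f2**sp.Rational(3, 2)
print(sp.simplify(lhs - rhs))              # 0  -> the condition holds with equality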

Lemma

The barrier function 1/xα, with α ∈ (0,∞) is not

self-concordant on R+.

Lemma

The barrier function e1/x is not self-concordant

on R+.

Use self-concordant barriers in optimization.

17