Interior Point Methods
for Linear, Quadratic
and Nonlinear Programming
Turin 2008
Jacek Gondzio
Lecture 1:
Convexity
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
What’s to come?
IPMs for Optimization
• Convexity Theory
• Duality Theory
• Newton Method (self-concordant barriers)
• Interior Point Methods for LP, QP and NLP
(motivation, theory, polynomial complexity)
Nonlinear Optimization
• Linesearch methods
• Trust region methods
Optimization Relies on Linear Algebra
• Positive definite, indefinite, quasidefinite sys-
tems, Cholesky factorization
• Sparse Matrix Techniques
– LU decomp. (unsymmetric matrices)
– Cholesky decomp. (symmetric matrices)
• Reordering for sparsity
(minimum degree, nested dissection)
Applications
• Data mining: Support Vector Machines
• Markowitz portfolio optimization
Convex Optimization
Consider the general optimization problem
min f(x)
s.t. g(x) ≤ 0,
where x ∈ Rn, and f : Rn → R and g : Rn → Rm
are convex and twice differentiable.
Basic Assumptions:
f and g are convex
⇒ If there exists a local minimum then it is a
global one.
f and g are twice differentiable
⇒ We can use their second order Taylor approximations.
Glossary
LP: Linear Programming
both f and g are linear.
QP: Quadratic Programming
f is quadratic and g is linear.
NLP: Nonlinear Programming
f or g is nonlinear.
SDP: Semidefinite Programming
f, g are functions of matrix variables constrained to be positive semidefinite.
Convexity
Convexity is a key property in optimization.
Def. A set C ⊂ Rn is convex if
λx + (1 − λ)y ∈ C, ∀x, y ∈ C, ∀λ ∈ [0,1].
[Figure: a convex set (left) and a nonconvex set (right)]
Def. Let C be a convex subset of Rn.
A function f : C → R is convex if
f(λx + (1−λ)y) ≤ λf(x) + (1−λ)f(y), ∀x, y ∈ C, ∀λ ∈ [0,1].
[Figure: a convex function (left) and a nonconvex function (right)]
Convexity (cnt’d)
Def. Let C be a convex subset of Rn.
A function f : C → R is concave if
f(λx + (1−λ)y) ≥ λf(x) + (1−λ)f(y), ∀x, y ∈ C, ∀λ ∈ [0,1].
Remark. A function f : C → R is concave if and only if the function −f is convex.
Def. Let C be a convex subset of Rn.
A function f : C → R is strictly convex if
f(λx + (1−λ)y) < λf(x) + (1−λ)f(y), ∀x, y ∈ C, x ≠ y, ∀λ ∈ (0,1).
Def. Let C be a convex subset of Rn.
A function f : C → R is strictly concave if
f(λx + (1−λ)y) > λf(x) + (1−λ)f(y), ∀x, y ∈ C, x ≠ y, ∀λ ∈ (0,1).
Convexity and Optimization
Consider a problem
min f(x) s.t. x ∈ X,
where X is a set of feasible solutions
and f : X → R is an objective function.
Def. A vector x̄ is a local minimum of f if
∃ε > 0 such that f(x̄) ≤ f(x), ∀x such that ‖x − x̄‖ < ε.
Def. A vector x̄ is a global minimum of f if
f(x̄) ≤ f(x), ∀x ∈ X.
Lemma. If X is a convex set and f : X → R is a convex function, then a local minimum is a global minimum.
Proof. Suppose that x̄ is a local minimum, but not a global one. Then ∃y ≠ x̄ such that f(y) < f(x̄). From convexity of f, we have ∀λ ∈ [0,1]
f((1−λ)x̄ + λy) ≤ (1−λ)f(x̄) + λf(y) < (1−λ)f(x̄) + λf(x̄) = f(x̄).
In particular, for a sufficiently small λ, the point z = (1−λ)x̄ + λy lies in the ε-neighbourhood of x̄ and f(z) < f(x̄). This contradicts the assumption that x̄ is a local minimum.
Properties
1. For any collection {Ci | i ∈ I} of convex sets, the intersection ∩_{i∈I} Ci is convex.
2. The vector sum {x1 + x2 | x1 ∈ C1, x2 ∈ C2} of two convex sets C1 and C2 is convex.
3. The image of a convex set under a linear
transformation is convex.
4. If C is a convex set and f : C → R is a convex function, the level sets {x ∈ C | f(x) ≤ α} and {x ∈ C | f(x) < α} are convex for all scalars α.
5. For any collection {fi : C → R | i ∈ I} of convex functions, the weighted sum with positive weights wi > 0, i ∈ I, i.e. the function f = ∑_{i∈I} wifi : C → R, is convex.
6. If I is an index set, C ⊆ Rn is a convex set, and fi : C → R is convex ∀i ∈ I, then the function h : C → R defined by
h(x) = sup_{i∈I} fi(x)
is also convex.
Differentiable Convex Functions
7. Let C ⊆ Rn be a convex set and f : C → R be differentiable over C.
(a) The function f is convex if and only if
f(y) ≥ f(x) + ∇Tf(x)(y − x), ∀x, y ∈ C.
(b) If the inequality is strict for x ≠ y, then f is strictly convex.
8. Let C ⊆ Rn be a convex set and f : C → R be twice continuously differentiable over C.
(a) If ∇²f(x) is positive semidefinite for all x ∈ C, then f is convex.
(b) If ∇²f(x) is positive definite for all x ∈ C, then f is strictly convex.
(c) If f is convex, then ∇²f(x) is positive semidefinite for all x ∈ C.
9. Let C ⊆ Rn be a convex set and Q a square symmetric matrix. Let f(x) = xTQx be a quadratic function f : C → R.
(a) f is convex iff Q is positive semidefinite.
(b) f is strictly convex iff Q is positive definite.
Proof of Property 4
Define
Xα = {x ∈ C : f(x) ≤ α}.
We will prove that Xα is convex.
Take any x, y ∈ Xα. From the definition of Xα
we get that f(x) ≤ α and f(y) ≤ α.
Take any λ ∈ [0,1] and define z = (1− λ)x + λy.
From the convexity of f we get
f(z) = f((1−λ)x+λy)
≤ (1−λ)f(x)+λf(y)
≤ (1−λ)α+λα = α.
Hence z ∈ Xα which completes the proof.
The proof for a strict inequality is identical.
Proof of Property 7 (a)
Part 1 ( ⇒ )
Take any x, y ∈ C, and any λ ∈ [0,1].
From convexity of f we get
f(x + λ(y − x)) ≤ (1 − λ)f(x) + λf(y).
Hence
f(x + λ(y − x)) − f(x) ≤ λ(f(y) − f(x))
and
[f(x + λ(y − x)) − f(x)] / λ ≤ f(y) − f(x).
Let λ → 0+. Then the left hand side becomes
∇Tf(x)(y−x) (a derivative of f in direction y−x)
implying
∇Tf(x)(y − x) ≤ f(y) − f(x),
which completes this part of the proof.
Proof of Property 7 (a)
Part 2 ( ⇐ )
Take any x, y ∈ C, and any λ ∈ [0,1].
Let z = λx + (1 − λ)y.
Since x − z = (1 − λ)(x − y), we have
f(x) ≥ f(z) + ∇Tf(z)(1 − λ)(x − y).
Since y − z = −λ(x − y), we have
f(y) ≥ f(z) + ∇Tf(z)(−λ)(x − y).
Having multiplied the first inequality by λ and
the second by 1 − λ and having added them we
get
λf(x) + (1 − λ)f(y) ≥ f(z),
which proves the convexity of f .
More on Convexity
Def. Let C be a convex subset of Rn.
A function f : C → R is quasi-convex if
f(λx + (1−λ)y) ≤ max{f(x), f(y)}, ∀x, y ∈ C, ∀λ ∈ [0,1].
[Figure: a quasi-convex function (left) and a quasi-concave function (right)]
Lemma. Let C be a nonempty convex set. A function f : C → R is quasi-convex if and only if the level set Sα = {x ∈ C | f(x) ≤ α} is convex for every real number α.
Def. Let C be a convex subset of Rn. A differentiable function f : C → R is called pseudo-convex if for any x, y ∈ C, the inequality
∇Tf(x)(y − x) ≥ 0
implies that f(y) ≥ f(x).
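These definitions can be illustrated numerically. The sketch below (my illustration, not from the slides) uses f(x) = x³, which is monotone on R and therefore quasi-convex, yet fails the convexity inequality:

```python
import numpy as np

f = lambda x: x**3  # monotone on R, hence quasi-convex, but not convex

rng = np.random.default_rng(0)
for _ in range(1000):
    x, y = rng.uniform(-5, 5, size=2)
    lam = rng.uniform(0.0, 1.0)
    # quasi-convexity: f between x and y never exceeds the larger endpoint value
    assert f(lam * x + (1 - lam) * y) <= max(f(x), f(y)) + 1e-12

# convexity fails: with x = -1, y = 0, lam = 0.5,
# f(-0.5) = -0.125 is NOT <= 0.5*f(-1) + 0.5*f(0) = -0.5
x, y, lam = -1.0, 0.0, 0.5
print(f(lam * x + (1 - lam) * y) <= lam * f(x) + (1 - lam) * f(y))  # False
```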
Linear Programming
Consider a Linear Program (LP)
min cTx
s. t. Ax = b,
x ≥ 0,
where c, x ∈ Rn, b ∈ Rm, A ∈ Rm×n.
Matrix A has a full row rank, m (m ≤ n).
Let P be the (primal) feasible set:
P = {x ∈ Rn | Ax = b, x ≥ 0},
and P0 be the (primal) strictly feasible set:
P0 = {x ∈ Rn | Ax = b, x > 0}.
Lemma. P is a convex set.
Proof. Note that a linear function is convex.
Thus P is an intersection of convex sets and from
Property 1 it is convex.
Corollary. LP is a convex optimization problem.
Proof. The objective function is linear hence
convex. From Lemma, the feasible set of an LP
is also convex.
Convex Quadratic Program
Def. A matrix H ∈ Rn×n is positive definite if
xTHx > 0 for any x ≠ 0.
Example: H1 is positive definite, H2 is not.
H1 = [2 3; 3 5]  and  H2 = [2 3; 3 4].
Indeed:
f1(x1, x2) = 2x1² + 6x1x2 + 5x2² = 2(x1 + (3/2)x2)² + (1/2)x2² ≥ 0,
while
f2(x1, x2) = 2x1² + 6x1x2 + 4x2² = 2(x1 + (3/2)x2)² − (1/2)x2²,
and this does not have to be nonnegative.
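This calculation can be automated. A small NumPy sketch (my illustration, not part of the slides) checks both matrices through their eigenvalues:

```python
import numpy as np

H1 = np.array([[2.0, 3.0], [3.0, 5.0]])
H2 = np.array([[2.0, 3.0], [3.0, 4.0]])

def is_positive_definite(H):
    # a symmetric matrix is positive definite iff all its
    # eigenvalues are strictly positive
    return bool(np.all(np.linalg.eigvalsh(H) > 0))

print(is_positive_definite(H1))  # True
print(is_positive_definite(H2))  # False
# a witness for H2: x = (3, -2) gives x^T H2 x = 18 - 36 + 16 = -2 < 0
print(np.array([3.0, -2.0]) @ H2 @ np.array([3.0, -2.0]))
```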
From Property 9, the function f(x) = xTQx is convex iff Q is positive semidefinite. The QP
min cTx + (1/2)xTQx
s.t. Ax = b,
x ≥ 0,
is a convex optimization problem iff Q is positive semidefinite.
Convex Nonlinear Program
Consider a general optimization problem
min f(x)
s.t. g(x) ≤ 0,
where x ∈ Rn, and f : Rn → R and g : Rn → Rm.
Lemma. If f : Rn → R and g : Rn → Rm are convex, then the above problem is convex.
Proof. Since the objective function f is convex,
we only need to prove that the feasible set of the
above problem
X = {x ∈ Rn : g(x) ≤ 0}
is convex. Define for i = 1,2, ..., m
Xi = {x ∈ Rn : gi(x) ≤ 0}.
From Property 4, Xi is convex for all i.
We observe that
X = {x ∈ Rn : gi(x) ≤ 0, ∀i = 1, ..., m} = ∩_i Xi,
i.e., X is an intersection of convex sets and from
Property 1, X is a convex set.
Interior Point Methods
for Linear, Quadratic
and Nonlinear Programming
Turin 2008
Jacek Gondzio
Lecture 2:
Duality
Lagrangian
Consider a general optimization problem
min f(x)
s.t. g(x) ≤ 0, (1)
x ∈ X ⊆ Rn,
where f : Rn → R and g : Rn → Rm.
The set X is arbitrary; it may include, for exam-
ple, an integrality constraint.
The constraint g(x) ≤ 0 is understood as:
gi(x) ≤ 0, ∀i = 1,2, ..., m,
i.e., as m inequalities.
Let x̄ be an optimal solution of (1) and define f̄ = f(x̄).
Introduce the Lagrange multiplier yi ≥ 0 for
every inequality constraint gi(x) ≤ 0.
Define y = (y1, . . . , ym)T and the Lagrangian
L(x, y) = f(x) + yTg(x);
the y are also called dual variables.
Lagrangian Duality
Consider the problem
LD(y) = min_{x∈X} L(x, y), where x ranges over X ⊆ Rn.
Its optimal solution x depends on y and so does the optimal objective LD(y).
Lemma. For any y ≥ 0, LD(y) is a lower bound on f̄ (the optimal objective of (1)), i.e.,
f̄ ≥ LD(y), ∀y ≥ 0.
Proof.
f̄ = min {f(x) | g(x) ≤ 0, x ∈ X}
≥ min {f(x) + yTg(x) | g(x) ≤ 0, x ∈ X}   (since y ≥ 0 and g(x) ≤ 0 imply yTg(x) ≤ 0)
≥ min {f(x) + yTg(x) | x ∈ X}   (minimizing over a larger set)
= LD(y).
Corollary.
f̄ ≥ max_{y≥0} LD(y),
i.e.,
f̄ ≥ max_{y≥0} min_{x∈X} L(x, y).
Lagrangian Duality
Observe that:
If ∃i : gi(x) > 0, then
max_{y≥0} L(x, y) = +∞
(we let the corresponding yi grow to +∞).
If ∀i : gi(x) ≤ 0, then
max_{y≥0} L(x, y) = f(x),
because yigi(x) ≤ 0 for all i, and the maximum is attained when
yigi(x) = 0, ∀i = 1, 2, ..., m.
Hence the problem (1) is equivalent to the following MinMax problem:
min_{x∈X} max_{y≥0} L(x, y),
which could also be written as follows:
f̄ = min_{x∈X} max_{y≥0} L(x, y).
Weak Duality
Consider the following problem
min {f(x) | g(x) ≤ 0, x ∈ X} ,
where f , g and X are arbitrary.
With this problem we associate the Lagrangian
L(x, y) = f(x) + yTg(x),
y are dual variables (Lagrange multipliers).
The weak duality always holds:
min_{x∈X} max_{y≥0} L(x, y) ≥ max_{y≥0} min_{x∈X} L(x, y).
Observe that we have not made any assumption
about functions f and g and set X.
If f and g are convex, X is convex and certain
regularity conditions are satisfied, then
min_{x∈X} max_{y≥0} L(x, y) = max_{y≥0} min_{x∈X} L(x, y).
This is called the strong duality.
Notation
Consider again the problem
min f(x)
s.t. g(x) ≤ 0,
x ∈ X ⊆ Rn,
where f : Rn → R and g : Rn → Rm.
Take x ∈ X ⊆ Rn and y ∈ Y = {y ∈ Rm | y ≥ 0} and write the Lagrangian
L(x, y) = f(x) + yTg(x).
Define the primal function
LP(x) = f(x) if gi(x) ≤ 0 ∀i, and LP(x) = +∞ if gi(x) > 0 for some i.
Observe that
LP(x) = max_{y≥0} L(x, y). (2)
Define the dual function
LD(y) = min_{x∈X} L(x, y). (3)
Primal & Dual Problems
The problem (1) can be formulated as looking for x̄ ∈ X ⊆ Rn such that
LP(x̄) = min_{x∈X} LP(x).
It is called the primal problem.
The problem
LD(ȳ) = max_{y≥0} LD(y)
is called the dual problem.
The weak duality can be rewritten as:
LP(x) ≥ LD(y).
Def. Primal feasible set.
XP = {x : x ∈ X, gi(x) ≤ 0, i = 1,2, . . . , m}.
Def. Dual feasible set. A tuple (x, y) ∈ Rn+m is feasible for the dual problem if
(x, y) ∈ YD = {(x, y) : x ∈ X, y ∈ Y, LD(y) = L(x, y)}.
Def. Dual optimal solution.
A tuple (x̄, ȳ) ∈ Rn+m is called dual optimal if (x̄, ȳ) ∈ YD and ȳ maximizes LD(y).
Primal-Dual Bounds
Lemma. If x1 ∈ XP and (x2, y2) ∈ YD (i.e., x1 is
primal feasible and (x2, y2) is dual feasible), then
LP (x1) ≥ LD(y2).
Proof. Since x1 ∈ XP we get LP (x1) = f(x1).
For any y ∈ Y, from definition (2) we have LP(x1) ≥ L(x1, y). In particular, for y = y2:
LP(x1) ≥ L(x1, y2). (4)
On the other hand, (x2, y2) ∈ YD, hence for any x ∈ X from (3) we have L(x, y2) ≥ LD(y2) and, in particular, for x = x1:
L(x1, y2) ≥ LD(y2). (5)
From (4) and (5) we get
f(x1) = LP(x1) ≥ L(x1, y2) ≥ LD(y2),
which completes the proof.
Any primal feasible solution provides an upper
bound for the dual problem, and
any dual feasible solution provides a lower
bound for the primal problem.
Duality and Convexity
Recall that the weak duality holds regardless of
the form of functions f , g and set X:
min_{x∈X} max_{y≥0} L(x, y) ≥ max_{y≥0} min_{x∈X} L(x, y).
What do we need to assume for the inequality in
the weak duality to become an equation?
If
• X ⊆ Rn is convex;
• f and g are convex;
• optimal solution is finite;
• some mysterious regularity conditions hold,
then strong duality holds.
That is,
min_{x∈X} max_{y≥0} L(x, y) = max_{y≥0} min_{x∈X} L(x, y).
An example of regularity conditions (Slater's condition):
∃x ∈ int(X) such that g(x) < 0.
Geometric View
Consider a mapping which for any x ∈ X defines
a point in Rm+1 of the form (g(x), f(x)).
We write x 7→ (g, f). Let H be the image of X.
In the example below n = 2 and m = 1. Hence:
x ∈ X ⊆ R2 and f : R2 7→ R and g : R2 7→ R.
Lagrange multiplier: y ∈ R (y ≥ 0).
[Figure: the image H of X under the map x ↦ (g(x), f(x)) in the (g, f)-plane; lines f + yg = const of slope −y support H from below, and their intercept with the f-axis gives LD(y); the optimal point is (ḡ, f̄)]
Figure Interpretation
Primal problem:
We look for a point (g, f) ∈ H such that g ≤ 0 and f attains its minimum.
This is the point (ḡ, f̄) in the Figure.
Figure Interpretation
Dual problem:
Take y≥0. To find LD(y), we need to minimize
f(x) + yg(x) with respect to x ∈ X. This cor-
responds to the minimization of the linear form
f + yg in the set H.
For a given y ≥ 0, the linear form f + yg has
a fixed slope (equal to −y) and the minimum is
attained when the line f + yg touches the bot-
tom of H. We say that “the hyperplane f + yg
supports the set H”.
The intersection of the supporting plane and the f-axis determines the value of LD(y).
The dual problem consists in finding such a slope
y that LD(y) is maximized, i.e., the intersection
of the supporting plane and the f axis is the
highest possible.
There are two supporting hyperplanes in the Figure. The one corresponding to ȳ attains the maximum of LD(y).
Nonzero Duality Gap
When sufficient conditions for strong duality are
not satisfied, we may observe a nonzero duality
gap:
min_{x∈X} max_{y≥0} L(x, y) − max_{y≥0} min_{x∈X} L(x, y) > 0.
In the Figure below:
f̄ − LD(ȳ) > 0.
[Figure: a nonconvex image set H; the best supporting line of slope −ȳ meets the f-axis at LD(ȳ), strictly below f̄]
Read more on duality
1. Bertsekas, D., Nonlinear Programming,
Athena Scientific, Massachusetts, 1995.
ISBN 1-886529-14-0, pages 415-486.
2. Hillier, F.S. and Lieberman, G.J.,
Introduction to Operations Research,
7th edition, McGraw Hill, 2001.
ISBN 0-07-232169-5, pages 230-308.
Interior Point Methods
for Linear, Quadratic
and Nonlinear Programming
Turin 2008
Jacek Gondzio
Lecture 3:
Duality
Equality Constraints
Let h : Rn 7→ Rk define an equality constraint
h(x) = 0 (understood as hj(x) = 0, j = 1, ..., k).
Replace hj(x) = 0 with two inequalities:
hj(x) ≤ 0 and − hj(x) ≤ 0.
Then the optimization problem
min f(x)
s.t. g(x) ≤ 0,
h(x) = 0,
x ∈ X ⊆ Rn,
where f : Rn 7→R, g : Rn 7→Rm and h : Rn 7→Rk,
becomes:
min f(x)
s.t. g(x) ≤ 0,
h(x) ≤ 0,
−h(x) ≤ 0,
x ∈ X ⊆ Rn.
Equality Constraints (cont’d)
Use nonnegative Lagrange multipliers y ∈ Rm for
g constraints.
Use a pair of Lagrange multipliers u_j^+ ≥ 0 and u_j^− ≥ 0 for the inequalities hj(x) ≤ 0 and −hj(x) ≤ 0, respectively. In other words, use two vectors u+ ≥ 0 and u− ≥ 0, both in Rk, and write the Lagrangian
L(x, y, u+, u−) = f(x) + yTg(x) + (u+)Th(x) − (u−)Th(x)
= f(x) + yTg(x) + (u+ − u−)Th(x)
= f(x) + yTg(x) + uTh(x),
where the vector u = u+ − u− ∈ Rk has no sign restriction.
The Lagrangian becomes:
L(x, y, u) = f(x)+yTg(x)+uTh(x),
and all theoretical results derived earlier can be
replicated for this new problem formulation.
Wolfe Duality
Lagrange duality does not need differentiability.
Suppose f and g are convex and differentiable.
Suppose X is convex.
The dual function
LD(y) = min_{x∈X} L(x, y)
requires minimization with respect to x.
Instead of minimization with respect to x, we ask only for stationarity with respect to x:
∇xL(x, y) = 0.
Lagrange dual problem:
max_{y≥0} LD(y)   (i.e., max_{y≥0} min_{x∈X} L(x, y)).
Wolfe dual problem:
max L(x, y)
s.t. ∇xL(x, y) = 0
y ≥ 0.
Duality: Example
Consider the nonlinear program:
min f(x) = x1² + x2²  s.t.  x1 + x2 ≥ 1.
f(x) = x1² + x2² and g(x) = 1 − x1 − x2 are convex.
Observe that x = 0 is an unconstrained minimizer but this point does not satisfy the constraint. The solution must therefore lie on the boundary of the feasible region and satisfy x1 + x2 = 1. It is easy to find that x̄ = (0.5, 0.5) and f̄ = 0.5.
Lagrangian:
L(x, y) = x1² + x2² + y(1 − x1 − x2).
The Lagrangian dual:
LD(y) = min_x [x1² + x2² + y(1 − x1 − x2)].
For any y the Lagrangian L(x, y) is convex in x. We can use the stationarity condition to replace the minimization. We write:
∇xL(x, y) = (2x1 − y, 2x2 − y)T = (0, 0)T,
which gives x1 = 0.5y and x2 = 0.5y.
Example (continued)
Having substituted x1 = 0.5y and x2 = 0.5y, we obtain:
LD(y) = y − (1/2)y².
The dual problem
max_{y≥0} LD(y)
thus becomes
max_{y≥0} [y − (1/2)y²].
It has a trivial solution ȳ = 1.
We observe that LD(ȳ) = 1/2 = f̄. Indeed, in this easy convex program, the duality gap is zero, i.e., the strong duality holds.
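The example can be verified numerically. The sketch below (my illustration using SciPy's general-purpose SLSQP solver; not part of the slides) recovers x̄ = (0.5, 0.5) and confirms LD(1) = f̄:

```python
import numpy as np
from scipy.optimize import minimize

# primal: min x1^2 + x2^2  s.t.  x1 + x2 >= 1
res = minimize(lambda x: x[0]**2 + x[1]**2, x0=[0.0, 0.0],
               constraints=[{'type': 'ineq', 'fun': lambda x: x[0] + x[1] - 1}])

# dual function derived on the slide: LD(y) = y - y^2/2, maximized at y = 1
LD = lambda y: y - 0.5 * y**2

print(res.x, res.fun, LD(1.0))  # x ~ (0.5, 0.5); both objective values ~ 0.5
```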
Dual Linear Program
Consider a linear program
min cTx
s.t. Ax = b,
x ≥ 0,
where c, x ∈ Rn, b ∈ Rm, A ∈ Rm×n.
We associate Lagrange multipliers y ∈ Rm and
s ∈ Rn (s ≥ 0) with the constraints Ax = b and
x ≥ 0, and write the Lagrangian
L(x, y, s) = cTx − yT (Ax − b) − sTx.
Dual LP (cont’d)
To determine the Lagrangian dual
LD(y, s) = min_{x∈X} L(x, y, s)
we need stationarity with respect to x:
∇xL(x, y, s) = c − ATy − s = 0.
Hence
LD(y, s) = cTx − yT(Ax − b) − sTx
= bTy + xT(c − ATy − s) = bTy,
and the dual problem has the form:
max bTy
s.t. ATy + s = c,
y free, s ≥ 0,
where y ∈ Rm and s ∈ Rn.
Duality in LP
Consider a primal program
min cTx
s.t. Ax = b,
x ≥ 0,
(1)
where c, x ∈ Rn, b ∈ Rm, A ∈ Rm×n.
With the primal we associate a dual program
max bTy
s.t. ATy ≤ c,
y free,
where y∈Rm. We add dual slack variables s∈Rn,
s ≥ 0 to convert inequality constraints ATy ≤ c
into equalities ATy + s = c and get an equivalent
dual program
max bTy
s.t. ATy + s = c,
y free, s ≥ 0,
(2)
where y ∈ Rm and s ∈ Rn.
Let P, D be the feasible sets of the primal and
the dual, respectively:
P = {x ∈ Rn | Ax = b, x ≥ 0},
D = {y ∈ Rm, s ∈ Rn | ATy + s = c, s ≥ 0}.
Weak & Strong Duality in LP
Let us introduce a convention that
inf_{x∈P} cTx = +∞ if P = ∅;  sup_{(y,s)∈D} bTy = −∞ if D = ∅.
Weak Duality Theorem
inf_{x∈P} cTx ≥ sup_{(y,s)∈D} bTy.
Strong Duality Theorem
If either P ≠ ∅ or D ≠ ∅, then
inf_{x∈P} cTx = sup_{(y,s)∈D} bTy.
If one of the problems (1) and (2) is solvable, then
min_{x∈P} cTx = max_{(y,s)∈D} bTy.
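Strong duality can be observed numerically on a small instance. The sketch below (my illustration with arbitrary data; not from the slides) solves a primal-dual pair with SciPy's `linprog` and compares the optimal objectives:

```python
import numpy as np
from scipy.optimize import linprog

# primal: min c^T x  s.t.  Ax = b, x >= 0  (illustrative data)
A = np.array([[1.0, 1.0]])
b = np.array([1.0])
c = np.array([1.0, 2.0])

primal = linprog(c, A_eq=A, b_eq=b, bounds=[(0, None)] * 2, method='highs')

# dual: max b^T y  s.t.  A^T y <= c, y free; linprog minimizes, so negate b
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(None, None)], method='highs')

print(primal.fun, -dual.fun)  # equal optimal values: strong duality
```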
In IPMs we shall use the term interior point.
Let P0, D0 be the strictly feasible sets of the primal and the dual, respectively:
P0 = {x ∈ Rn | Ax = b, x > 0},
D0 = {y ∈ Rm, s ∈ Rn | ATy + s = c, s > 0}.
We shall often refer to the primal-dual pair. Hence we define the primal-dual feasible set F and the primal-dual strictly feasible set F0:
F = {(x, y, s) | Ax = b, ATy + s = c, (x, s) ≥ 0},
F0 = {(x, y, s) | Ax = b, ATy + s = c, (x, s) > 0}.
Weak Duality for LP
Weak Duality Theorem
Let the primal-dual pair of linear programs be
given. If x ∈ P and (y, s) ∈ D, then
bTy ≤ cTx.
Proof.
Since (y, s) ∈ D, we have
ATy ≤ c.
By multiplying each of these n inequalities by an
appropriate xj, j = 1,2, ..., n and adding them up
(note that x ≥ 0 since x ∈ P), we obtain
xTATy ≤ cTx.
x ∈ P implies that Ax = b hence xTATy = bTy.
Thus we finally get
bTy ≤ cTx.
Complementarity, Optimality
Partitioning:
LP constraint matrix A = [B, N],
primal variables x = (xB, xN),
reduced costs d = (dB, dN).
The simplex method maintains complementarity:
xB ≠ 0, dB = 0 ⇒ (xB)j · (dB)j = 0, ∀j ∈ B,
xN = 0, dN ≠ 0 ⇒ (xN)j · (dN)j = 0, ∀j ∈ N.
Primal simplex method:
• maintains primal feasibility: Ax = b;
• maintains complementarity: xjdj = 0, ∀j;
• seeks dual feasibility: dN ≥ 0.
Dual simplex method:
• maintains dual feasibility: dN ≥ 0;
• maintains complementarity: xjdj = 0, ∀j;
• seeks primal feasibility: Ax = b.
Note that the reduced costs
d = [dB; dN] = [cB; cN] − [BT; NT] · y = c − ATy
are dual slack variables.
At optimality: d ≥ 0.
Dual Quadratic Program
Consider a quadratic program
min cTx + (1/2)xTQx
s.t. Ax = b,
x ≥ 0,
where c, x ∈ Rn, b ∈ Rm, A ∈ Rm×n, Q ∈ Rn×n.
We associate Lagrange multipliers y ∈ Rm and
s ∈ Rn (s ≥ 0) with the constraints Ax = b and
x ≥ 0, and write the Lagrangian
L(x, y, s) = cTx + (1/2)xTQx − yT(Ax − b) − sTx.
Dual QP (cont’d)
To determine the Lagrangian dual
LD(y, s) = min_{x∈X} L(x, y, s)
we need stationarity with respect to x:
∇xL(x, y, s) = c + Qx − ATy − s = 0.
Hence
LD(y, s) = cTx + (1/2)xTQx − yT(Ax − b) − sTx
= bTy + xT(c + Qx − ATy − s) − (1/2)xTQx
= bTy − (1/2)xTQx,
and the dual problem has the form:
max bTy − (1/2)xTQx
s.t. ATy + s − Qx = c,
x, s ≥ 0,
where y ∈ Rm and x, s ∈ Rn.
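The QP primal-dual pair can be checked by hand on a tiny instance. The sketch below (my illustration with Q = I and arbitrary data; not from the slides) verifies dual feasibility and the zero duality gap at the known optimum:

```python
import numpy as np

# primal: min c^T x + 1/2 x^T Q x  s.t.  Ax = b, x >= 0  (illustrative data)
Q = np.eye(2)
c = np.zeros(2)
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

# by symmetry the primal optimum is x = (1/2, 1/2); with y = 1/2 and s = 0
# the dual constraint A^T y + s - Qx = c holds
x = np.array([0.5, 0.5])
y = np.array([0.5])
s = np.zeros(2)
assert np.allclose(A.T @ y + s - Q @ x, c)  # dual feasibility

primal_obj = c @ x + 0.5 * x @ Q @ x
dual_obj = b @ y - 0.5 * x @ Q @ x
print(primal_obj, dual_obj)  # both 0.25: zero duality gap
```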
Primal-Dual Pairs
Linear programs:
The primal
min cTx
s.t. Ax = b,
x ≥ 0,
and the dual
max bTy
s.t. ATy + s = c,
y free, s ≥ 0.
Convex quadratic programs:
The primal
min cTx + (1/2)xTQx
s.t. Ax = b,
x ≥ 0,
and the dual
max bTy − (1/2)xTQx
s.t. ATy + s − Qx = c,
x, s ≥ 0.
Interior Point Methods
for Linear, Quadratic
and Nonlinear Programming
Turin 2008
Jacek Gondzio
Lecture 4:
IPM for LP: Motivation
Simplex: What’s wrong?
A vertex is defined by a set of n equations:
[B N; 0 I_{n−m}] · [xB; xN] = [b; 0].
The linear program with m constraints and n variables (n ≥ m) has at most
NV = (n choose m) = n! / (m!(n − m)!)
vertices.
vertices.
The simplex method can make a non-polynomial number of iterations to reach optimality: V. Klee and G. Minty gave an example LP the solution of which needs 2^n iterations:
How good is the simplex algorithm,
in: Inequalities-III, O. Shisha, ed.,
Academic Press, 1972, 159–175.
Simplex: What’s wrong?
Narendra Karmarkar from AT&T Bell Labs:
“the simplex [method] is complex”
N. Karmarkar:
A New Polynomial–time Algorithm for LP,
Combinatorica 4 (1984) 373–395.
“Elements” of the IPM
What do we need
to derive the Interior Point Method?
• duality theory:
Lagrangian function;
first order optimality conditions.
• logarithmic barriers.
• Newton method.
Optimality Conditions in LP
Consider the primal-dual pair:
Primal: min cTx  s.t. Ax = b, x ≥ 0;
Dual:  max bTy  s.t. ATy + s = c, s ≥ 0.
Lagrangian
L(x, y, s) = cTx − yT(Ax − b) − sTx.
Optimality Conditions in LP
Ax = b,
ATy + s = c,
XSe = 0,
x ≥ 0,
s ≥ 0,
where X = diag{x1, ..., xn}, S = diag{s1, ..., sn}, and e = (1, 1, ..., 1) ∈ Rn.
Complementarity
Recall that the Simplex Method works with a partitioned formulation:
LP constraint matrix A = [B, N],
primal variables x = (xB, xN),
reduced costs d = (dB, dN).
Dual variables are defined as follows:
BTy = cB.
Hence the reduced costs for basic variables are
dTB = cTB − yTB = cTB − cTB = 0.
Thus, for basic variables, dB = 0 and
(xB)j · (dB)j = 0 ∀j ∈ B.
For non-basic variables, xN = 0 hence
(xN)j · (dN)j = 0 ∀j ∈ N .
The simplex method maintains the complementarity of primal and dual solutions:
xj · dj = 0 ∀j = 1,2, ..., n.
Complementarity, Optimality
Partitioning:
LP constraint matrix A = [B, N],
primal variables x = (xB, xN),
reduced costs d = (dB, dN).
The simplex method maintains complementarity:
xB ≠ 0, dB = 0 ⇒ (xB)j · (dB)j = 0, ∀j ∈ B,
xN = 0, dN ≠ 0 ⇒ (xN)j · (dN)j = 0, ∀j ∈ N.
Primal simplex method:
• maintains primal feasibility: Ax = b;
• maintains complementarity: xjdj = 0, ∀j;
• seeks dual feasibility: dN ≥ 0.
Dual simplex method:
• maintains dual feasibility: dN ≥ 0;
• maintains complementarity: xjdj = 0, ∀j;
• seeks primal feasibility: Ax = b.
Note that the reduced costs
d = [dB; dN] = [cB; cN] − [BT; NT] · y = c − ATy
are dual slack variables.
At optimality: d ≥ 0.
Logarithmic barriers
The following logarithmic barrier −ln xj, added to the objective in the optimization problem, prevents variable xj from approaching zero.
[Figure: the graph of −ln x]
In other words, the logarithmic barrier can be used to “replace” the inequality
xj ≥ 0.
Observe that
min −∑_{j=1}^n ln xj  ⇐⇒  max ∏_{j=1}^n xj.
The minimization of −∑_{j=1}^n ln xj is equivalent to the maximization of the product of the distances from all hyperplanes defining the positive orthant: it prevents all xj from approaching zero.
Use Logarithmic Barriers
Replace the primal LP
min cTx
s.t. Ax = b,
x ≥ 0,
with the primal barrier program
min cTx − ∑_{j=1}^n ln xj
s.t. Ax = b.
Replace the dual LP
max bTy
s.t. ATy + s = c,
y free, s ≥ 0,
with the dual barrier program
max bTy + ∑_{j=1}^n ln sj
s.t. ATy + s = c.
First Order Optimality Conditions
Consider the primal barrier program
min cTx − µ ∑_{j=1}^n ln xj
s.t. Ax = b,
where µ > 0 is a barrier parameter.
Write out the Lagrangian
L(x, y, µ) = cTx − yT(Ax − b) − µ ∑_{j=1}^n ln xj,
and the conditions for a stationary point
∇xL(x, y, µ) = c − ATy − µX^−1e = 0,
∇yL(x, y, µ) = Ax − b = 0,
where X^−1 = diag{x1^−1, x2^−1, ..., xn^−1}.
Let us denote
s = µX^−1e, i.e. XSe = µe.
The First Order Optimality Conditions are:
Ax = b,
ATy + s = c,
XSe = µe.
Central Trajectory
Note that the first order optimality conditions for the barrier problem
Ax = b,
ATy + s = c,
XSe = µe,
approximate the first order optimality conditions for the linear program
Ax = b,
ATy + s = c,
XSe = 0,
more and more closely as µ goes to zero.
The parameter µ controls the distance to optimality:
cTx − bTy = cTx − xTATy = xT(c − ATy) = xTs = nµ.
Analytic center (µ-center): a (unique) point
(x(µ), y(µ), s(µ)), x(µ) > 0, s(µ) > 0,
that satisfies the FOC.
The path
{(x(µ), y(µ), s(µ)) : µ > 0}
is called the primal-dual central trajectory.
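The central trajectory can be traced numerically. The sketch below (my illustration on an arbitrary tiny LP; not from the slides) solves the perturbed conditions with SciPy's `fsolve` and checks that the duality gap equals nµ:

```python
import numpy as np
from scipy.optimize import fsolve

# tiny LP: min x1 + 2*x2  s.t.  x1 + x2 = 1, x >= 0  (illustrative data)
A = np.array([[1.0, 1.0]]); b = np.array([1.0]); c = np.array([1.0, 2.0])
n = 2

def foc(v, mu):
    # perturbed first order optimality conditions
    x, y, s = v[:n], v[n:n+1], v[n+1:]
    return np.concatenate([A @ x - b,        # primal feasibility
                           A.T @ y + s - c,  # dual feasibility
                           x * s - mu])      # XSe = mu*e

for mu in [1.0, 0.1, 0.01]:
    v = fsolve(foc, np.array([0.5, 0.5, 0.0, 1.0, 1.0]), args=(mu,))
    x, y = v[:n], v[n]
    gap = c @ x - b[0] * y
    print(mu, gap)  # the duality gap equals n*mu along the central trajectory
```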
Newton Method
We use Newton Method to find a stationary
point of the barrier problem.
Recall how to use Newton Method to find a root
of a nonlinear equation
f(x) = 0.
A tangent line
z − f(xk) = ∇f(xk) · (x − xk)
is a local approximation of the graph of the func-
tion f(x). Substituting z = 0 gives a new point
xk+1 = xk − (∇f(xk))−1f(xk).
[Figure: Newton iterates xk, xk+1, xk+2; each tangent line z − f(xk) = ∇f(xk)(x − xk) cuts the x-axis at the next iterate]
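The iteration xk+1 = xk − (∇f(xk))^−1 f(xk) is easy to sketch for a scalar equation (my illustration, not from the slides), here finding the root of f(x) = x² − 2:

```python
import math

def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    # x_{k+1} = x_k - f(x_k) / f'(x_k)
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            break
    return x

root = newton(lambda x: x**2 - 2.0, lambda x: 2.0 * x, x0=2.0)
print(root)  # ~1.414213562..., i.e. sqrt(2)
```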
Apply Newton M. to the FOC
The first order optimality conditions for the barrier problem form a large system of nonlinear equations
F(x, y, s) = 0,
where F : R2n+m → R2n+m is a mapping defined as follows:
F(x, y, s) = [ Ax − b; ATy + s − c; XSe − µe ].
Actually, the first two terms of it are linear; only
the last one, corresponding to the complemen-
tarity condition, is nonlinear.
Note that
∇F(x, y, s) = [ A 0 0; 0 AT I; S 0 X ].
Thus, for a given point (x, y, s) we find the Newton direction (∆x, ∆y, ∆s) by solving the system of linear equations:
[ A 0 0; 0 AT I; S 0 X ] · [ ∆x; ∆y; ∆s ] = [ b − Ax; c − ATy − s; µe − XSe ].
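Assembling and solving this system takes a few lines of NumPy. The sketch below (my illustration on an arbitrary tiny LP; not from the slides) builds the Jacobian blockwise and checks that the resulting direction satisfies all three linearized equations:

```python
import numpy as np

# tiny LP: min x1 + 2*x2  s.t.  x1 + x2 = 1, x >= 0  (illustrative data)
A = np.array([[1.0, 1.0]]); b = np.array([1.0]); c = np.array([1.0, 2.0])
m, n = A.shape

# a strictly feasible primal-dual point and a target mu
x = np.array([0.5, 0.5]); y = np.array([0.0]); s = np.array([1.0, 2.0])
mu = 0.1
e = np.ones(n)

# Jacobian [[A, 0, 0], [0, A^T, I], [S, 0, X]] of size (2n+m) x (2n+m)
J = np.block([
    [A,                np.zeros((m, m)), np.zeros((m, n))],
    [np.zeros((n, n)), A.T,              np.eye(n)],
    [np.diag(s),       np.zeros((n, m)), np.diag(x)],
])
rhs = np.concatenate([b - A @ x, c - A.T @ y - s, mu * e - x * s])
d = np.linalg.solve(J, rhs)
dx, dy, ds = d[:n], d[n:n+m], d[n+m:]
print(dx, dy, ds)
```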
Interior-Point Framework
We have already gathered all the necessary
elements to derive an interior point method.
The logarithmic barrier −ln xj added to the objective in the optimization problem prevents variable xj from approaching zero and “replaces” the inequality xj ≥ 0.
We derive the first order optimality conditions for the primal barrier problem:
Ax = b,
ATy + s = c,
XSe = µe,
and apply Newton method to solve this system
of nonlinear equations.
Actually, we fix the barrier parameter µ and make only one (damped) Newton step towards the solution of the FOC. We do not solve the current FOC exactly. Instead, we immediately reduce the barrier parameter µ (to ensure progress towards optimality) and repeat the process.
Interior Point Algorithm
Initialize
k = 0
(x0, y0, s0) ∈ F0
µ0 = (1/n) · (x0)Ts0
α0 = 0.9995
Repeat until optimality
k = k + 1
µk = σµk−1, where σ ∈ (0, 1)
∆ = Newton direction towards the µ-center
Ratio test:
αP := max {α > 0 : x + α∆x ≥ 0},
αD := max {α > 0 : s + α∆s ≥ 0}.
Make step:
xk+1 = xk + α0αP∆x,
yk+1 = yk + α0αD∆y,
sk+1 = sk + α0αD∆s.
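A minimal sketch of this loop (my illustration, applied to an arbitrary tiny LP, with the ratio test capped at a full Newton step; not the production algorithm) runs as follows:

```python
import numpy as np

# tiny LP: min x1 + 2*x2  s.t.  x1 + x2 = 1, x >= 0  (illustrative data)
A = np.array([[1.0, 1.0]]); b = np.array([1.0]); c = np.array([1.0, 2.0])
m, n = A.shape

# strictly feasible starting point (x0, y0, s0) in F0
x = np.array([0.5, 0.5]); y = np.array([0.0]); s = np.array([1.0, 2.0])
mu = (x @ s) / n
alpha0, sigma = 0.9995, 0.1

for _ in range(50):
    mu *= sigma
    # Newton direction towards the mu-center
    J = np.block([
        [A,                np.zeros((m, m)), np.zeros((m, n))],
        [np.zeros((n, n)), A.T,              np.eye(n)],
        [np.diag(s),       np.zeros((n, m)), np.diag(x)],
    ])
    rhs = np.concatenate([b - A @ x, c - A.T @ y - s, mu * np.ones(n) - x * s])
    d = np.linalg.solve(J, rhs)
    dx, dy, ds = d[:n], d[n:n+m], d[n+m:]
    # ratio test, capped at a full Newton step
    aP = min([1.0] + [-x[i] / dx[i] for i in range(n) if dx[i] < 0])
    aD = min([1.0] + [-s[i] / ds[i] for i in range(n) if ds[i] < 0])
    x = x + alpha0 * aP * dx
    y = y + alpha0 * aD * dy
    s = s + alpha0 * aD * ds
    if x @ s < 1e-10:  # duality gap small enough: stop
        break

print(x, c @ x)  # approaches the optimum (1, 0) with objective 1
```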
Interior Point Method
• Lagrange (1788)
handling equality constraints - multipliers
minimization with equality constraints
replaced with unconstrained minimization
• Fiacco & McCormick (1968)
handling inequality constraints - log barrier
minimization with inequality constraints
replaced with a sequence of unconstrained
minimizations
• Newton (1687)
solving unconstrained minimization problems
Interior Point Method
[Figure: the graph of −ln x, and the analytic center of a shaded feasible region]
min −∑_{j=1}^n ln xj  ⇐⇒  max ∏_{j=1}^n xj
Advantages of IPMs:
suitable for very large problems;
natural extension from LP via QP to NLP.
Iterations to reach optimum:
Size      Theory O(√n)   Practice O(log10 n)
1000      C × 32         10-20
10000     C × 100        20-40
100000    C × 320        30-50
1000000   C × 1000       40-60
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Approaching Optimality
Simplex Method:
Basic: x > 0, s = 0;  Nonbasic: x = 0, s > 0

[Figure: complementary pairs (x, s) at a simplex vertex]
Interior Point Method:
"Basic": x > 0, s = 0;  "Nonbasic": x = 0, s > 0

[Figure: complementary pairs (x, s) for interior point iterates]
18
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Notations
A vector of ones: e = (1,1, · · · ,1) ∈ Rn.
X = diag{x1, x2, · · · , xn}, the diagonal matrix with the entries of x on the diagonal.
X^{−1} = diag{x1^{−1}, x2^{−1}, · · · , xn^{−1}}.
The equation XSe = µe
is equivalent to xjsj = µ, ∀j = 1,2, · · · , n.
Primal feasible set
P = {x ∈ Rn | Ax = b, x ≥ 0}.
Primal strictly feasible set
P0 = {x ∈ Rn | Ax = b, x > 0}.
Dual feasible set
D = {y ∈ Rm, s ∈ Rn | ATy + s = c, s ≥ 0}.
Dual strictly feasible set
D0 = {y ∈ Rm, s ∈ Rn | ATy + s = c, s > 0}.
Primal-dual feasible set
F = {(x, y, s) | Ax = b, ATy + s = c, (x, s) ≥ 0}.
Primal-dual strictly feasible set
F0 = {(x, y, s) | Ax = b, ATy + s = c, (x, s) > 0}.
19
Interior Point Methods
for Linear, Quadratic
and Nonlinear Programming
Turin 2008
Jacek Gondzio
Lecture 5:
Path-following Method: Theory
1
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Path-Following Algorithm
The analysis given in this lecture comes from the
book of Steve Wright:
Primal-Dual Interior-Point Methods,
SIAM Philadelphia, 1997.
We analyze a feasible interior-point algorithm
with the following properties:
• all its iterates are feasible and stay in a close
neighbourhood of the central path;
• the iterates follow the central path towards
optimality;
• systematic (though very slow) reduction of
duality gap is ensured.
This algorithm is called
the short-step path-following method.
Indeed, it makes very slow progress (short steps) towards optimality.
2
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Central Path Neighbourhood
Assume a primal-dual strictly feasible solution
(x, y, s) ∈ F0 lying in a neighbourhood of the
central path is given; namely (x, y, s) satisfies:
Ax = b,
ATy + s = c,
XSe ≈ µe.
We define a θ-neighbourhood of the central path, N2(θ), as the set of primal-dual strictly feasible solutions (x, y, s) ∈ F0 that satisfy:
‖XSe − µe‖ ≤ θµ,
where θ ∈ (0,1) and the barrier µ satisfies:
xTs = nµ.
Hence N2(θ) = {(x, y, s) ∈ F0 | ‖XSe − µe‖ ≤ θµ}.
[Figure: the N2(θ) neighbourhood of the central path]
3
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Progress towards optimality
Assume a primal-dual strictly feasible solution
(x, y, s) ∈ N2(θ) for some θ ∈ (0,1) is given.
The interior point algorithm tries to move from this point to another one that also belongs to a θ-neighbourhood of the central path but corresponds to a smaller µ. The required reduction of µ is small:

µk+1 = σµk,  where σ = 1 − β/√n,

for some β ∈ (0,1).
Given a new µ-center, the interior point algorithm computes the Newton direction by solving

[ A   0   0 ] [∆x]   [     0     ]
[ 0   AT  I ] [∆y] = [     0     ]
[ S   0   X ] [∆s]   [ σµe − XSe ]

and makes a step in this direction.
Magic numbers (will be explained later):
θ = 0.1 and β = 0.1.
4
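As an illustration (not the author's code), one full Newton step of the short-step method can be sketched with dense linear algebra; the matrix A and the starting point below are assumed toy data:

```python
import numpy as np

def short_step_iteration(A, x, y, s, beta=0.1):
    """One full Newton step of the short-step path-following method."""
    m, n = A.shape
    mu = x @ s / n                      # current barrier parameter
    sigma = 1 - beta / np.sqrt(n)      # centering parameter sigma = 1 - beta/sqrt(n)
    # Newton system: [A 0 0; 0 A^T I; S 0 X] [dx; dy; ds] = [0; 0; sigma*mu*e - XSe]
    K = np.zeros((2 * n + m, 2 * n + m))
    K[:m, :n] = A
    K[m:m + n, n:n + m] = A.T
    K[m:m + n, n + m:] = np.eye(n)
    K[m + n:, :n] = np.diag(s)
    K[m + n:, n + m:] = np.diag(x)
    rhs = np.concatenate([np.zeros(m), np.zeros(n), sigma * mu - x * s])
    d = np.linalg.solve(K, rhs)
    dx, dy, ds = d[:n], d[n:n + m], d[n + m:]
    return x + dx, y + dy, s + ds

# Assumed toy problem: x1 + x2 = 2, starting on the central path with mu = 1.
A = np.array([[1.0, 1.0]])
x = np.array([1.0, 1.0]); y = np.array([0.0]); s = np.array([1.0, 1.0])
x, y, s = short_step_iteration(A, x, y, s)
print(x @ s / 2)   # the new mu equals sigma * (old mu), as Lemma 2 predicts
```

After a full step the duality gap shrinks by exactly the factor σ, while primal feasibility Ax = b is preserved.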
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
O(√n) Complexity Result
We will prove the following:
• full step in Newton direction is feasible;
• the new iterate
(xk+1,yk+1,sk+1) = (xk,yk,sk)+(∆xk,∆yk,∆sk)
belongs to a θ-neighbourhood of the new
µ-center (with µk+1 = σµk);
• the duality gap is reduced by the factor 1 − β/√n.

Note that since at each iteration the duality gap is reduced by the factor 1 − β/√n, after √n iterations the accumulated reduction is:

(1 − β/√n)^{√n} ≈ e^{−β}.

After C·√n iterations, the reduction is e^{−Cβ}.
For a sufficiently large constant C the reduction can thus be arbitrarily large (i.e. the duality gap can become arbitrarily small).

Hence this algorithm has complexity O(√n).
5
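The e^{−β} estimate can be checked numerically; a small sketch (the sample sizes n are illustrative):

```python
import math

# After sqrt(n) iterations the duality gap shrinks by (1 - beta/sqrt(n))^sqrt(n),
# which approaches e^{-beta} for every n.
beta = 0.1
for n in (100, 10_000, 1_000_000):
    k = round(math.sqrt(n))
    factor = (1 - beta / math.sqrt(n)) ** k
    print(n, factor)          # close to exp(-0.1) ~ 0.9048 in each case
```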
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Technical Results
Lemma 1
The Newton direction (∆x, ∆y, ∆s) defined by the equation system

[ A   0   0 ] [∆x]   [     0     ]
[ 0   AT  I ] [∆y] = [     0     ]     (1)
[ S   0   X ] [∆s]   [ σµe − XSe ]

satisfies:
∆xT∆s = 0.
Proof:
From the first two equations in (1) we get
A∆x = 0 and ∆s = −AT∆y.
Hence
∆xT∆s = ∆xT · (−AT∆y) = −∆yT · (A∆x) = 0.
6
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Technical Results (cont’d)
Lemma 2
Let (∆x, ∆y, ∆s) be the Newton direction that solves the system (1). The new iterate

(x̄, ȳ, s̄) = (x, y, s) + (∆x, ∆y, ∆s)

satisfies x̄T s̄ = nµ̄, where µ̄ = σµ.

Proof: From the third equation in (1) we get

S∆x + X∆s = −XSe + σµe.

By summing the n components of this equation we obtain

eT(S∆x + X∆s) = sT∆x + xT∆s = −eTXSe + σµeTe = −xTs + nσµ = −xTs·(1 − σ).

Thus

x̄T s̄ = (x + ∆x)T(s + ∆s)
 = xTs + (sT∆x + xT∆s) + ∆xT∆s
 = xTs + (σ − 1)xTs + 0 = σxTs,

which is equivalent to:
nµ̄ = σnµ.
7
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Reminder: Norms
Norms of the vector x ∈ Rn.
‖x‖ = (∑_{j=1}^n xj²)^{1/2}
‖x‖∞ = max_{j∈{1..n}} |xj|
‖x‖1 = ∑_{j=1}^n |xj|

Note that for any x ∈ Rn:
‖x‖∞ ≤ ‖x‖1,  ‖x‖1 ≤ n·‖x‖∞,
‖x‖∞ ≤ ‖x‖,   ‖x‖ ≤ √n·‖x‖∞,
‖x‖ ≤ ‖x‖1,   ‖x‖1 ≤ √n·‖x‖.

Recall the triangle inequality:
for any vectors p, q and r and for any norm ‖·‖,
‖p − q‖ ≤ ‖p − r‖ + ‖r − q‖.

The relation between arithmetic and geometric means:
for any scalars a and b such that ab ≥ 0,
√|ab| ≤ (1/2)·|a + b|.
8
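These norm relations are easy to spot-check numerically; a minimal sketch on a random vector (the vector and its size are arbitrary assumptions):

```python
import math, random

def norms(x):
    """Return (||x||, ||x||_inf, ||x||_1) for a list of numbers."""
    n2 = math.sqrt(sum(v * v for v in x))     # Euclidean norm
    ninf = max(abs(v) for v in x)             # infinity norm
    n1 = sum(abs(v) for v in x)               # 1-norm
    return n2, ninf, n1

random.seed(0)
n = 10
x = [random.uniform(-1, 1) for _ in range(n)]
n2, ninf, n1 = norms(x)
print(n2, ninf, n1)
```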
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Technical Result (algebra)
Lemma 3 Let u and v be any two vectors in Rn
such that uTv ≥ 0. Then
‖UVe‖ ≤ 2^{−3/2}·‖u + v‖²,

where U = diag{u1, · · · , un}, V = diag{v1, · · · , vn}.
Proof: Let us partition the products ujvj into nonnegative and negative ones:

P = {j | ujvj ≥ 0} and M = {j | ujvj < 0}.

Then

0 ≤ uTv = ∑_{j∈P} ujvj + ∑_{j∈M} ujvj = ∑_{j∈P} |ujvj| − ∑_{j∈M} |ujvj|,

so ∑_{j∈M} |ujvj| ≤ ∑_{j∈P} |ujvj|. We can now write

‖UVe‖ = (‖[ujvj]_{j∈P}‖² + ‖[ujvj]_{j∈M}‖²)^{1/2}
 ≤ (‖[ujvj]_{j∈P}‖1² + ‖[ujvj]_{j∈M}‖1²)^{1/2}
 ≤ (2·‖[ujvj]_{j∈P}‖1²)^{1/2}
 ≤ √2·‖[(1/4)(uj + vj)²]_{j∈P}‖1   (by the AM-GM relation, since ujvj ≥ 0 on P)
 = 2^{−3/2}·∑_{j∈P} (uj + vj)²
 ≤ 2^{−3/2}·∑_{j=1}^n (uj + vj)²
 = 2^{−3/2}·‖u + v‖², as requested.
9
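A quick empirical sanity check of Lemma 3 on random vectors (dimension and sample count are arbitrary assumptions):

```python
import numpy as np

# Empirical check of Lemma 3: if u'v >= 0 then ||UVe|| <= 2^{-3/2} ||u + v||^2.
rng = np.random.default_rng(1)
worst = 0.0
for _ in range(1000):
    u = rng.normal(size=8)
    v = rng.normal(size=8)
    if u @ v < 0:
        v = -v                         # enforce the hypothesis u'v >= 0
    lhs = np.linalg.norm(u * v)        # UVe has components u_j * v_j
    rhs = 2 ** -1.5 * np.linalg.norm(u + v) ** 2
    worst = max(worst, lhs / rhs)
print(worst)                           # the ratio never exceeds 1
```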
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
IPM Technical Results (cnt’d)
Lemma 4
If (x, y, s) ∈ N2(θ) for some θ ∈ (0,1), then
(1 − θ)µ ≤ xjsj ≤ (1 + θ)µ ∀j.
In other words,
min_{j∈{1..n}} xjsj ≥ (1 − θ)µ  and  max_{j∈{1..n}} xjsj ≤ (1 + θ)µ.
Proof: Since ‖x‖∞ ≤ ‖x‖, from the definition of N2(θ),
N2(θ) = {(x, y, s) ∈ F0 | ‖XSe − µe‖ ≤ θµ},
we conclude
‖XSe − µe‖∞ ≤ ‖XSe − µe‖ ≤ θµ.
Hence
|xjsj − µ| ≤ θµ ∀j,
which is equivalent to
−θµ ≤ xjsj − µ ≤ θµ ∀j.
Thus
(1 − θ)µ ≤ xjsj ≤ (1 + θ)µ ∀j.
10
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
IPM Technical Results (cnt’d)
Lemma 5
If (x, y, s) ∈ N2(θ) for some θ ∈ (0,1), then
‖XSe − σµe‖² ≤ θ²µ² + (1 − σ)²µ²n.
Proof:
Note first that
eT (XSe − µe) = xTs − µeT e = nµ − nµ = 0.
Therefore
‖XSe − σµe‖² = ‖(XSe − µe) + (1 − σ)µe‖²
 = ‖XSe − µe‖² + 2(1 − σ)µ·eT(XSe − µe) + (1 − σ)²µ²·eTe
 ≤ θ²µ² + (1 − σ)²µ²n.
11
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
IPM Technical Results (cnt’d)
Lemma 6
If (x, y, s) ∈ N2(θ) for some θ ∈ (0,1), then

‖∆X∆Se‖ ≤ (θ² + n(1 − σ)²)/(2^{3/2}(1 − θ))·µ.

Proof: The 3rd equation in the Newton system gives

S∆x + X∆s = −XSe + σµe.

Having multiplied it with (XS)^{−1/2}, we obtain

X^{−1/2}S^{1/2}∆x + X^{1/2}S^{−1/2}∆s = (XS)^{−1/2}(−XSe + σµe).

Now apply Lemma 3 for u = X^{−1/2}S^{1/2}∆x and v = X^{1/2}S^{−1/2}∆s (with uTv = ∆xT∆s = 0 from Lemma 1) to get

‖∆X∆Se‖ = ‖(X^{−1/2}S^{1/2}∆X)(X^{1/2}S^{−1/2}∆S)e‖
 ≤ 2^{−3/2}·‖X^{−1/2}S^{1/2}∆x + X^{1/2}S^{−1/2}∆s‖²
 = 2^{−3/2}·‖(XS)^{−1/2}(−XSe + σµe)‖²
 = 2^{−3/2}·∑_{j=1}^n (−xjsj + σµ)²/(xjsj)
 ≤ 2^{−3/2}·‖XSe − σµe‖²/min_j xjsj
 ≤ (θ² + n(1 − σ)²)/(2^{3/2}(1 − θ))·µ,

where the last step uses Lemmas 4 and 5.
12
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Magic Numbers
We have previously set two parameters for the
short-step path-following method:
θ = 0.1 and β = 0.1.
Now it’s time to justify this particular choice.
Lemma 7
If θ = 0.1 and β = 0.1, then

(θ² + n(1 − σ)²)/(2^{3/2}(1 − θ)) ≤ σθ.

Proof:
Recall that σ = 1 − β/√n. Hence

n(1 − σ)² = β²,

and for β = 0.1 (for any n ≥ 1)

σ ≥ 0.9.

Substituting θ = 0.1 and β = 0.1, we obtain

(θ² + n(1 − σ)²)/(2^{3/2}(1 − θ)) = (0.1² + 0.1²)/(2^{3/2}·0.9) ≤ 0.02 ≤ 0.9·0.1 ≤ σθ.
13
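Lemma 7's inequality can be verified directly for a few problem sizes; a small sketch (the sample sizes n are arbitrary):

```python
import math

# Verify Lemma 7's bound for theta = beta = 0.1 across several n.
theta, beta = 0.1, 0.1
for n in (1, 10, 1000, 10**6):
    sigma = 1 - beta / math.sqrt(n)
    lhs = (theta**2 + n * (1 - sigma)**2) / (2**1.5 * (1 - theta))
    print(n, lhs, sigma * theta)
    assert lhs <= sigma * theta        # the inequality of Lemma 7
```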
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Full Newton step in N2(θ)
Lemma 8
Suppose (x, y, s) ∈ N2(θ) and (∆x, ∆y, ∆s) is the Newton direction computed from the system (1). Then the new iterate

(x̄, ȳ, s̄) = (x, y, s) + (∆x, ∆y, ∆s)

satisfies (x̄, ȳ, s̄) ∈ N2(θ), i.e. ‖X̄S̄e − µ̄e‖ ≤ θµ̄.

Proof: From Lemma 2, the new iterate (x̄, ȳ, s̄) satisfies

x̄T s̄ = nµ̄ = nσµ,

so we have to prove that ‖X̄S̄e − µ̄e‖ ≤ θµ̄.
For a given component j ∈ {1..n}, we have

x̄j s̄j − µ̄ = (xj + ∆xj)(sj + ∆sj) − µ̄
 = xjsj + (sj∆xj + xj∆sj) + ∆xj∆sj − µ̄
 = xjsj + (−xjsj + σµ) + ∆xj∆sj − σµ
 = ∆xj∆sj.

Thus, from Lemmas 6 and 7, we get

‖X̄S̄e − µ̄e‖ = ‖∆X∆Se‖ ≤ (θ² + n(1 − σ)²)/(2^{3/2}(1 − θ))·µ ≤ σθµ = θµ̄.
14
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
A property of log function
Lemma 9
For all δ > −1:
ln(1 + δ) ≤ δ.
Proof:
Consider the function
f(δ) = δ − ln(1 + δ).
Its derivative is:

f′(δ) = 1 − 1/(1 + δ) = δ/(1 + δ).

Obviously f′(δ) < 0 for δ ∈ (−1,0) and f′(δ) > 0 for δ ∈ (0,∞). Hence f(·) has a minimum at δ = 0, and f(0) = 0. Consequently, for any δ ∈ (−1,∞), f(δ) ≥ 0, i.e.

δ − ln(1 + δ) ≥ 0.
15
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
O(√n) Complexity Result
Theorem 10
Given ε > 0, suppose that a feasible starting point (x0, y0, s0) ∈ N2(0.1) satisfies

(x0)Ts0 = nµ0, where µ0 ≤ 1/ε^κ,

for some positive constant κ. Then there exists an index K with K = O(√n·ln(1/ε)) such that

µk ≤ ε, ∀k ≥ K.
16
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
O(√n) Complexity Result
Proof:
From Lemma 2, µk+1 = σµk. Having taken logarithms of both sides of this equality we obtain

ln µk+1 = ln σ + ln µk.

By repeatedly applying this formula and using µ0 ≤ 1/ε^κ, we get

ln µk = k ln σ + ln µ0 ≤ k ln(1 − β/√n) + κ ln(1/ε).

From Lemma 9 we have ln(1 − β/√n) ≤ −β/√n. Thus

ln µk ≤ k(−β/√n) + κ ln(1/ε).

To satisfy µk ≤ ε, we need:

k(−β/√n) + κ ln(1/ε) ≤ ln ε.

This inequality holds for any k ≥ K, where

K = ((κ + 1)/β)·√n·ln(1/ε).
17
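The bound K from the proof is easy to evaluate; a sketch (the values of n, ε, β and κ below are illustrative assumptions):

```python
import math

def iteration_bound(n, eps, beta=0.1, kappa=1.0):
    """K = ((kappa + 1)/beta) * sqrt(n) * ln(1/eps), from the proof of Theorem 10."""
    return (kappa + 1) / beta * math.sqrt(n) * math.log(1 / eps)

# For n = 10,000 and eps = 1e-8, the bound gives roughly 37,000 iterations,
# and after K iterations mu (starting from mu_0 = 1/eps^kappa = 1e8) is below eps.
K = math.ceil(iteration_bound(10_000, 1e-8))
sigma = 1 - 0.1 / math.sqrt(10_000)
print(K, sigma ** K * 1e8)
```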
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Polynomial Complexity Result
Main ingredients of the polynomial complexity
result for the short-step path-following algorithm:
Stay close to the central path:
all iterates stay in the N2(θ) neighbourhood of
the central path.
Make (slow) progress towards optimality:
systematically reduce the duality gap:

µk+1 = σµk,  where σ = 1 − β/√n,

for some β ∈ (0,1).
18
Interior Point Methods
for Linear, Quadratic
and Nonlinear Programming
Turin 2008
Jacek Gondzio
Lecture 6:
IPMs: From Theory to Practice
1
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Proximity to the Central Path
The neighbourhood
N2(θ) = {(x, y, s) ∈ F0 | ‖XSe − µe‖ ≤ θµ}
is very small. In other words, the requirement
that (x, y, s) ∈ N2(θ) is extremely restrictive.
Note (Lemma 4) that if (x, y, s) ∈ N2(θ), then
(1 − θ)µ ≤ xjsj ≤ (1 + θ)µ, ∀j.
For small θ ∈ (0,1) this means that (x, y, s) is an
excellent approximation of the µ-center.
Example:
For n = 10^6 and θ = 0.1 suppose:
xjsj = 0.9999µ for j ≤ 500,000, and
xjsj = 1.0001µ for j ≥ 500,001.
Then ‖XSe − µe‖ = (10^6 × (0.0001µ)²)^{1/2} = 0.1µ.
2
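The arithmetic of the example above can be checked in one line; a minimal sketch:

```python
import math

# The slide's example: n = 10^6 products x_j s_j deviating from mu by only 0.01%
# already sit on the boundary of the N2(0.1) neighbourhood.
n, theta, mu = 10**6, 0.1, 1.0
norm = math.sqrt(n * (0.0001 * mu) ** 2)   # ||XSe - mu*e||
print(norm)                                 # equals theta * mu = 0.1
```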
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Wide Neighbourhood
In practice, (x, y, s) can stay quite far away from the µ-center. The algorithm behaves well as long as no complementarity product is too small compared with the others, i.e., when (x, y, s) ∈ N∞(γ), where
N∞(γ) = {(x, y, s) ∈ F0 |xjsj ≥ γµ, ∀j},
for some (possibly small) γ ∈ (0,1).
Observe that we limit the complementarity products only from below, but there is also an implicit upper bound on xjsj. Indeed, since

∑_{j=1}^n xjsj = nµ,

we have xjsj ≤ nµ.
Advice: Use γ = 0.01.
3
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Speed of Convergence
The short-step path-following algorithm asks for a very small reduction of the duality gap per iteration. Indeed, the required reduction of µ is:

µk+1 = σµk,  where σ = 1 − β/√n,

for some β ∈ (0,1).

Example:
For n = 10^6 and β = 0.1 we have:
σ = 1 − 0.0001 = 0.9999,
hence after 10,000 iterations the duality gap will be reduced only by the factor
(1 − 0.0001)^{10000} ≈ e^{−1} ≈ 0.368.
4
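A two-line numerical check of this example (values taken from the slide):

```python
import math

# With n = 10^6 and beta = 0.1 the short-step method shrinks the duality gap
# only by a factor of about e^{-1} over 10,000 iterations.
sigma = 1 - 0.1 / math.sqrt(10**6)
reduction = sigma ** 10_000
print(sigma, reduction)     # 0.9999 and roughly 0.368
```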
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Aggressive targets
In practice, a much larger reduction can be achieved. On average, the duality gap is usually reduced by a factor σ ∈ (0.1, 0.5). Certainly, for a practical algorithm it is absolutely justified to set the target reduction:

σ = 0.1.

The consequence of such optimistic targets is unfortunately the loss of the guarantee that the full step in the Newton direction can always be made. Instead, a damped Newton step is made that preserves the nonnegativity of x and s.

Advice: Do not use the short-step method with

σ = 1 − β/√n.

Use the long-step method with

σ ≪ 1.
5
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Feasible Method
The short-step (feasible) path-following method
we have analysed requires all its iterates to be
strictly feasible:
x ∈ P0 = {x ∈ Rn | Ax = b, x > 0},
(y, s) ∈ D0 = {y ∈ Rm, s ∈ Rn | ATy + s = c, s > 0}.
In consequence the right hand side of the Newton
equation system has the form:
[ ξp ]   [ b − Ax      ]   [ 0  ]
[ ξd ] = [ c − ATy − s ] = [ 0  ]
[ ξµ ]   [ σµe − XSe   ]   [ ξµ ]
6
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Infeasible Method
The feasibility requirement can be relaxed. It is possible to generalize the notion of the µ-center, as well as that of the central path, to infeasible points (x, y, s).

The Newton direction is then computed from the following equation system:

[ A   0   0 ] [∆x]   [ b − Ax      ]
[ 0   AT  I ] [∆y] = [ c − ATy − s ]
[ S   0   X ] [∆s]   [ σµe − XSe   ]
7
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Further Practical Issues
Linear Algebra
Predictor-Corrector Technique
Multiple Centrality Correctors
8
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Project: IPMs on the Internet
I encourage you to do this one-hour project.
9
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Project: IPMs on the Internet
IPMs on the Internet:
• LP FAQ (Frequently Asked Questions):
http://www-unix.mcs.anl.gov/otc/Guide/faq/
• Interior Point Methods On-Line:
http://www-unix.mcs.anl.gov/otc/InteriorPoint/
Public Domain IPM Solvers:
• HOPDM (FORTRAN 77) by Jacek Gondzio:
http://www.maths.ed.ac.uk/~gondzio/software/hopdm.html
• LIPSOL (MATLAB) by Yin Zhang:
http://www.caam.rice.edu/~zhang/lipsol/
• PCx (ANSI C) by Steve Wright:
http://www-fp.mcs.anl.gov/otc/Tools/PCx/
10
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Your Linear Program
Let m and d denote the month and day of your
birthday. Clearly, 1 ≤ m ≤ 12 and 1 ≤ d ≤ 31.
Define a number: α = 100 · m + d.
Consider an LP
min x1 + x2 + x3 + x4 + x5 − x6
s.t. x1 + 2x2 + x4 ≤ 3
x2 + 2x3 − x6 ≤ 3
x3 − x4 + 2x5 + 3x6 ≥ 2
x1 + 3x3 − x5 + x6 ≤ α,
x1, x2, x3, x4, x5, x6 ≥ 0.
11
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Your Task
Grab one of the available interior point solvers.
Install it on your computer.
You may find it easier to install it on a Unix
machine than on a PC running MS Windows.
Prepare an MPS data file for your linear program and solve the problem.
Check if the solution satisfies all the constraints.
Print the MPS file and the solution file and show them to me.
12
Interior Point Methods
for Linear, Quadratic
and Nonlinear Programming
Turin 2008
Jacek Gondzio
Lecture 7:
IPM for Quadratic Programs
1
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Convex Quadratic Programs
The quadratic function

f(x) = xTQx

is convex if and only if the matrix Q is positive semidefinite. In that case the quadratic programming problem

min cTx + (1/2)xTQx
s.t. Ax = b,
     x ≥ 0,

is well defined.
If there exists a feasible solution to it, then there exists an optimal solution.
2
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Convex Quadratic Programs
Convexity: Property 9:
Let C ⊂ Rn be a convex set and Q a square matrix. Let f(x) = xTQx be a quadratic function f : C → R.
(a) f is convex iff Q is positive semidefinite.
(b) f is strictly convex iff Q is positive definite.
Def. A matrix Q ∈ Rn×n is positive definite if xTQx > 0 for any x ≠ 0.
Example:
Consider quadratic functions f(x) = xTQ x with
the following matrices:
Q1 = [ 1  0 ]    Q2 = [ 1   0 ]    Q3 = [ 5  4 ]
     [ 0  2 ],        [ 0  −1 ],        [ 4  3 ].
Q1 is positive definite (hence f1 is convex).
Q2 and Q3 are indefinite (f2, f3 are not convex).
3
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Dual Quadratic Program
Consider a QP

min cTx + (1/2)xTQx
s.t. Ax = b,
     x ≥ 0,

where c, x ∈ Rn, b ∈ Rm, A ∈ Rm×n, Q ∈ Rn×n.
We associate Lagrange multipliers y ∈ Rm and s ∈ Rn (s ≥ 0) with the constraints Ax = b and x ≥ 0, and write the Lagrangian

L(x, y, s) = cTx + (1/2)xTQx − yT(Ax − b) − sTx.

Stationarity with respect to x:

∇xL(x, y, s) = c + Qx − ATy − s = 0

is used to determine the Lagrangian dual:

LD(y, s) = min_x L(x, y, s)
 = cTx + (1/2)xTQx − yT(Ax − b) − sTx
 = bTy + xT(c + Qx − ATy − s) − (1/2)xTQx
 = bTy − (1/2)xTQx,

and the dual problem has the form:

max bTy − (1/2)xTQx
s.t. ATy + s − Qx = c,
     x, s ≥ 0,

where y ∈ Rm and x, s ∈ Rn.
4
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
QP with IPMs
Consider the convex quadratic programming
problem.
The primal
min cTx + (1/2)xTQx
s.t. Ax = b,
     x ≥ 0,

and the dual

max bTy − (1/2)xTQx
s.t. ATy + s − Qx = c,
     x, s ≥ 0.
Apply the usual procedure:
• replace inequalities with log barriers;
• form the Lagrangian;
• write the first order optimality conditions;
• apply Newton method to them.5
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
QP with IPMs: Log Barriers
Replace the primal QP

min cTx + (1/2)xTQx
s.t. Ax = b,
     x ≥ 0,

with the primal barrier QP

min cTx + (1/2)xTQx − µ ∑_{j=1}^n ln xj
s.t. Ax = b.

Replace the dual QP

max bTy − (1/2)xTQx
s.t. ATy + s − Qx = c,
     y free, s ≥ 0,

with the dual barrier QP

max bTy − (1/2)xTQx + µ ∑_{j=1}^n ln sj
s.t. ATy + s − Qx = c.
6
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
First Order Optimality Conds
Consider the primal barrier QP

min cTx + (1/2)xTQx − µ ∑_{j=1}^n ln xj
s.t. Ax = b,

where µ ≥ 0 is a barrier parameter.
Write out the Lagrangian

L(x, y, µ) = cTx + (1/2)xTQx − yT(Ax − b) − µ ∑_{j=1}^n ln xj,

and the conditions for a stationary point:

∇xL(x, y, µ) = c − ATy − µX^{−1}e + Qx = 0,
∇yL(x, y, µ) = Ax − b = 0,

where X^{−1} = diag{x1^{−1}, x2^{−1}, · · · , xn^{−1}}.
Let us denote

s = µX^{−1}e, i.e. XSe = µe.

The First Order Optimality Conditions are:

Ax = b,
ATy + s − Qx = c,
XSe = µe.
7
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Newton Method for the FOC
The first order optimality conditions for the barrier problem form a large system of nonlinear equations

F(x, y, s) = 0,

where F : R^{2n+m} → R^{2n+m} is a mapping defined as follows:

           [ Ax − b           ]
F(x,y,s) = [ ATy + s − Qx − c ]
           [ XSe − µe         ]

Actually, the first two terms of it are linear; only the last one, corresponding to the complementarity condition, is nonlinear.
Note that

            [ A   0   0 ]
∇F(x,y,s) = [ −Q  AT  I ]
            [ S   0   X ]

Thus, for a given point (x, y, s) we find the Newton direction (∆x, ∆y, ∆s) by solving the system of linear equations:

[ A   0   0 ] [∆x]   [ b − Ax           ]
[ −Q  AT  I ] [∆y] = [ c − ATy − s + Qx ]
[ S   0   X ] [∆s]   [ µe − XSe         ]
8
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Interior-Point QP Algorithm
Initialize
k = 0
(x0, y0, s0) ∈ F0
µ0 = (1/n)·(x0)Ts0
α0 = 0.9995
Repeat until optimality
k = k + 1
µk = σµk−1, where σ ∈ (0,1)
∆ = Newton direction towards µ-center
Ratio test:
αP := max {α > 0 : x + α∆x ≥ 0},
αD := max {α > 0 : s + α∆s ≥ 0}.
Make step:
xk+1 = xk + α0αP∆x,
yk+1 = yk + α0αD∆y,
sk+1 = sk + α0αD∆s.
9
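The algorithm above can be sketched in a few lines of dense linear algebra. This is an illustrative toy implementation, not a practical solver (which would factorize the sparse augmented system rather than form and solve the full block matrix), and the example QP below is assumed data:

```python
import numpy as np

def ipm_qp(A, b, c, Q, x, y, s, sigma=0.1, alpha0=0.9995, tol=1e-8, max_iter=50):
    """Primal-dual IPM for min c'x + 0.5 x'Qx s.t. Ax = b, x >= 0."""
    m, n = A.shape
    for _ in range(max_iter):
        mu = x @ s / n
        if mu < tol:
            break
        # Newton system for the first order conditions (slide notation):
        # [ A   0   0 ][dx]   [ b - Ax             ]
        # [-Q   A'  I ][dy] = [ c - A'y - s + Qx   ]
        # [ S   0   X ][ds]   [ sigma*mu*e - XSe   ]
        K = np.block([
            [A, np.zeros((m, m)), np.zeros((m, n))],
            [-Q, A.T, np.eye(n)],
            [np.diag(s), np.zeros((n, m)), np.diag(x)],
        ])
        rhs = np.concatenate([b - A @ x,
                              c - A.T @ y - s + Q @ x,
                              sigma * mu - x * s])
        d = np.linalg.solve(K, rhs)
        dx, dy, ds = d[:n], d[n:n + m], d[n + m:]
        # ratio test: keep x and s strictly positive
        aP = min([1.0] + [-x[j] / dx[j] for j in range(n) if dx[j] < 0])
        aD = min([1.0] + [-s[j] / ds[j] for j in range(n) if ds[j] < 0])
        x = x + alpha0 * aP * dx
        y = y + alpha0 * aD * dy
        s = s + alpha0 * aD * ds
    return x, y, s

# Assumed toy QP: min 0.5*(x1^2 + x2^2) s.t. x1 + x2 = 2, x >= 0; optimum (1, 1).
A = np.array([[1.0, 1.0]]); b = np.array([2.0])
c = np.zeros(2); Q = np.eye(2)
x, y, s = ipm_qp(A, b, c, Q,
                 x=np.array([1.0, 1.0]), y=np.array([0.0]), s=np.array([1.0, 1.0]))
print(x)   # close to the optimum (1, 1)
```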
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
From LP to QP
QP problem

min cTx + (1/2)xTQx
s.t. Ax = b,
     x ≥ 0.

First order conditions (for the barrier problem)

Ax = b,
ATy + s − Qx = c,
XSe = µe.

Newton direction

[ A   0   0 ] [∆x]   [ ξp ]
[ −Q  AT  I ] [∆y] = [ ξd ]
[ S   0   X ] [∆s]   [ ξµ ]

where
ξp = b − Ax,
ξd = c − ATy − s + Qx,
ξµ = µe − XSe.

Augmented system

[ −Q − Θ^{−1}  AT ] [∆x]   [ ξd − X^{−1}ξµ ]
[ A            0  ] [∆y] = [ ξp            ]

Conclusion:
QP is a natural extension of LP.

10
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
IPMs: LP vs QP
Augmented system in LP

[ −Θ^{−1}  AT ] [∆x]   [ ξd − X^{−1}ξµ ]
[ A        0  ] [∆y] = [ ξp            ]

Eliminate ∆x from the first equation and get the normal equations

(AΘAT)∆y = g.

Augmented system in QP

[ −Q − Θ^{−1}  AT ] [∆x]   [ ξd − X^{−1}ξµ ]
[ A            0  ] [∆y] = [ ξp            ]

Eliminate ∆x from the first equation and get the normal equations

(A(Q + Θ^{−1})^{−1}AT)∆y = g.

One can use normal equations in LP, but not in QP. Normal equations in QP may become almost completely dense even for sparse matrices A and Q. Thus, in QP, usually the indefinite augmented system form is used.
11
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Sparsity Issues in QP
Example
The tridiagonal matrix below factors as M = LU with bidiagonal factors, so M^{−1} = U^{−1}L^{−1}:

[1 1      ]^{−1}     [1        ]   [1 1      ]  ^{−1}
[1 2 1    ]          [1 1      ]   [  1 1    ]
[  1 2 1  ]     =  ( [  1 1    ] · [    1 1  ] )
[    1 2 1]          [    1 1  ]   [      1 1]
[      1 2]          [      1 1]   [        1]

    [1 −1  1 −1  1]   [ 1            ]   [ 5 −4  3 −2  1]
    [   1 −1  1 −1]   [−1  1         ]   [−4  4 −3  2 −1]
 =  [      1 −1  1] · [ 1 −1  1      ] = [ 3 −3  3 −2  1]
    [         1 −1]   [−1  1 −1  1   ]   [−2  2 −2  2 −1]
    [            1]   [ 1 −1  1 −1  1]   [ 1 −1  1 −1  1]
Conclusion:
the inverse of the sparse matrix may be dense.
IPMs for QP:
Do not explicitly invert the matrix Q + Θ^{−1} in the matrix A(Q + Θ^{−1})^{−1}AT.
Use the augmented system instead.

12
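The slide's example can be reproduced directly; a minimal check that the sparse tridiagonal matrix has a completely dense inverse:

```python
import numpy as np

# The tridiagonal matrix from the slide: sparse, but its inverse is fully dense.
M = np.array([[1, 1, 0, 0, 0],
              [1, 2, 1, 0, 0],
              [0, 1, 2, 1, 0],
              [0, 0, 1, 2, 1],
              [0, 0, 0, 1, 2]], dtype=float)
Minv = np.linalg.inv(M)
print(np.round(Minv).astype(int))   # the dense matrix shown on the slide
```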
Interior Point Methods
for Linear, Quadratic
and Nonlinear Programming
Turin 2008
Jacek Gondzio
Lecture 8:
Separable QuadraticPrograms
1
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Separable QPs are Easy
Regarding the computations involved, a quadratic program with a diagonal matrix Q = D:

min cTx + (1/2)xTDx
s.t. Ax = b,
     x ≥ 0,

is as easy as a linear program. Indeed, in this case the Newton equation system can be reduced to the following normal equation system:

(A(D + Θ^{−1})^{−1}AT)∆y = g.

Since D + Θ^{−1} is a diagonal matrix, this system is no more difficult to solve than the usual system arising in LP:

(AΘAT)∆y = g.
Conclusion:
If you can formulate the QP as a separable prob-
lem, then it’s usually worth a try.
2
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Separable QP: Example 1
Suppose the symmetric positive semidefinite matrix Q ∈ Rn×n in the quadratic program

min cTx + (1/2)xTQx
s.t. Ax = b,        (1)
     x ≥ 0,

is a product of the following matrices:

Q = FTDF,

where F ∈ Rk×n for some k ≪ n. Introduce new variables u ∈ Rk such that u = Fx. Then

xTQx = xTFTDFx = (Fx)TD(Fx) = uTDu.

The problem (1) can be replaced by the following equivalent separable one:

min cTx + (1/2)uTDu
s.t. Ax = b,        (2)
     Fx − u = 0,
     x ≥ 0.

Although this problem has n + k variables (x, u) (while (1) had only n variables), for small k it is usually much easier to solve than (1).
3
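The identity behind the reformulation can be checked numerically; a small sketch with assumed random data:

```python
import numpy as np

# Example 1: with Q = F'DF and u = Fx, the quadratic form x'Qx
# equals the separable form u'Du.
rng = np.random.default_rng(2)
n, k = 8, 2
F = rng.normal(size=(k, n))
D = np.diag(rng.uniform(1.0, 2.0, size=k))
Q = F.T @ D @ F
x = rng.normal(size=n)
u = F @ x
print(x @ Q @ x, u @ D @ u)   # equal up to rounding
```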
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Example 1 (cont’d)
To derive the first order optimality conditions for (2), we first introduce x̄ = (x, u) ∈ Rn+k, c̄ = (c, 0) ∈ Rn+k and b̄ = (b, 0) ∈ Rm+k, define

Ā = [ A  0  ]      Q̄ = [ 0  0 ]
    [ F  −I ]  and      [ 0  D ]

and rewrite the problem:

min c̄T x̄ + (1/2) x̄T Q̄ x̄
s.t. Ā x̄ = b̄,
     x ≥ 0.

We associate dual variables y ∈ Rm and z ∈ Rk with the linear constraints Ax = b and Fx − u = 0, respectively, and write the Lagrangian

L(x, u, y, z, µ) = c̄T x̄ + (1/2) x̄T Q̄ x̄ − (y, z)T(Ā x̄ − b̄) − µ ∑_{j=1}^n ln xj
 = cTx + (1/2) uTDu − yT(Ax − b) − zT(Fx − u) − µ ∑_{j=1}^n ln xj.

Observe that u is a free variable and there is no logarithmic barrier introduced for it.
4
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Example 1 (cont’d)
We write the conditions for a stationary point:

∇xL(x, u, y, z, µ) = c − ATy − FTz − µX^{−1}e = 0,
∇uL(x, u, y, z, µ) = Du + z = 0,
∇yL(x, u, y, z, µ) = Ax − b = 0,
∇zL(x, u, y, z, µ) = −Fx + u = 0,

and substitute s = µX^{−1}e to get the first order optimality conditions:

Ax = b,
Fx − u = 0,
ATy + FTz + s = c,
−Du − z = 0,
XSe = µe.

The Newton equation system for the FOC is:

[ 0   0   AT  FT  I ] [∆x]   [ rx ]
[ 0  −D   0  −I   0 ] [∆u]   [ ru ]
[ A   0   0   0   0 ] [∆y] = [ ry ]
[ F  −I   0   0   0 ] [∆z]   [ rz ]
[ S   0   0   0   X ] [∆s]   [ rµ ]
5
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Example 1 (cont’d)
For the nonseparable problem (1), we have to solve

[ −Q − Θ^{−1}  AT ] [∆x]   [ rx ]
[ A            0  ] [∆y] = [ ry ]

This is a linear system with n + m equations and n + m unknowns.
For the separable problem (2), we have to solve

[ −Θx^{−1}   0   AT  FT ] [∆x]   [ rx ]
[ 0         −D   0  −I  ] [∆u]   [ ru ]
[ A          0   0   0  ] [∆y] = [ ry ]
[ F         −I   0   0  ] [∆z]   [ rz ]

where Θx = XS^{−1} ∈ Rn×n.
This new system has n + m + 2k equations and n + m + 2k unknowns. It is larger, but the matrix involved in it is much sparser.
6
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Example 1 (cont’d)
This larger system is much easier to solve. Indeed, the upper-left 2 × 2 block is a diagonal matrix. It can be eliminated, giving the system of equations

[ A  0  ] [ Θx  0      ] [ AT  FT ] [∆y]   [ ry ]
[ F  −I ] [ 0   D^{−1} ] [ 0   −I ] [∆z] = [ rz ]

Having done the multiplications on the left-hand side, we obtain

[ AΘxAT   AΘxFT          ] [∆y]   [ ry ]
[ FΘxAT   FΘxFT + D^{−1} ] [∆z] = [ rz ]

This reduced system has only m + k equations and m + k unknowns.
7
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Separable QP: Example 2
Suppose the symmetric positive definite matrix Q ∈ Rn×n in the quadratic program

min cTx + (1/2)xTQx
s.t. Ax = b,        (3)
     x ≥ 0,

has the following form:

Q = D + ddT,

where D is a diagonal matrix and d ∈ Rn. Introduce the new variable u ∈ R such that u = dTx. Then

xTQx = xT(D + ddT)x = xTDx + (dTx)(dTx) = xTDx + u².

The problem (3) can be replaced by the following equivalent separable one:

min cTx + (1/2)xTDx + (1/2)u²
s.t. Ax = b,        (4)
     dTx − u = 0,
     x ≥ 0.

This problem has n + 1 variables (x, u) (while (3) had only n variables). However, it is much easier to solve than (3).
8
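As in Example 1, the identity behind this reformulation is easy to verify; a small sketch with assumed random data:

```python
import numpy as np

# Example 2: with Q = D + d d' and u = d'x, the quadratic form x'Qx
# equals x'Dx + u^2.
rng = np.random.default_rng(0)
n = 6
D = np.diag(rng.uniform(1.0, 2.0, size=n))
d = rng.normal(size=n)
Q = D + np.outer(d, d)
x = rng.normal(size=n)
u = d @ x
print(x @ Q @ x, x @ D @ x + u**2)   # equal up to rounding
```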
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Example 2 (cont’d)
For the nonseparable problem (3), we have to solve

[ −Q − Θ^{−1}  AT ] [∆x]   [ rx ]
[ A            0  ] [∆y] = [ ry ]

This is a linear system with n + m equations and n + m unknowns.
For the separable problem (4), we have to solve

[ −D − Θx^{−1}   0   AT  d  ] [∆x]   [ rx ]
[ 0             −1   0  −1  ] [∆u]   [ ru ]
[ A              0   0   0  ] [∆y] = [ ry ]
[ dT            −1   0   0  ] [∆z]   [ rz ]

where y ∈ Rm and z ∈ R are dual variables associated with the linear constraints Ax = b and dTx − u = 0, respectively.
This new system has n + m + 2 equations and n + m + 2 unknowns. It is larger, but the matrix involved in it is much sparser.
9
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Example 2 (cont’d)
This larger system is much easier to solve. Indeed, the upper-left 2 × 2 block is a diagonal matrix. It can be eliminated, giving the system of equations

[ A   0  ] [ (D + Θx^{−1})^{−1}  0 ] [ AT  d  ] [∆y]   [ ry ]
[ dT  −1 ] [ 0                   1 ] [ 0   −1 ] [∆z] = [ rz ]

Having done the multiplications on the left-hand side, we obtain

[ AΘ̄xAT    AΘ̄xd      ] [∆y]   [ ry ]
[ dTΘ̄xAT   dTΘ̄xd + 1 ] [∆z] = [ rz ]

where Θ̄x = (D + Θx^{−1})^{−1}.
This reduced system has only m + 1 equations and m + 1 unknowns.
10
Interior Point Methods
for Linear, Quadratic
and Nonlinear Programming
Turin 2008
Jacek Gondzio
Lecture 9:
IPMs for Nonlinear Programs
1
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Convex Nonlinear Optimization
Consider the nonlinear optimization problem
min f(x)
s.t. g(x) ≤ 0,
where x ∈ Rn, and f : Rn 7→ R and g : Rn 7→ Rm
are convex, twice differentiable.
Assumptions:
f and g are convex
⇒ If there exists a local minimum then it is a
global one.
f and g are twice differentiable
⇒ We can use the second order Taylor
approximations of them.
Some additional (technical) conditions
⇒ We need them to prove that the point which
satisfies the first order optimality conditions is
the optimum. We won’t use them in this course.
2
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Taylor Expansion of f : R 7→ R
Let f : R → R.
If all derivatives of f are continuously differentiable at x0, then

f(x) = ∑_{k=0}^∞ (f^{(k)}(x0)/k!)·(x − x0)^k,

where f^{(k)}(x0) is the k-th derivative of f at x0.
The first order approximation of the function:

f(x) = f(x0) + f′(x0)(x − x0) + r2(x − x0),

where the remainder satisfies:

lim_{x→x0} r2(x − x0)/(x − x0) = 0.

The second order approximation:

f(x) = f(x0) + f′(x0)(x − x0) + (1/2)f′′(x0)(x − x0)² + r3(x − x0),

where the remainder satisfies:

lim_{x→x0} r3(x − x0)/(x − x0)² = 0.
3
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Derivatives of f : Rn7→ R
Consider a real-valued function f : Rn 7→ R.
The vector

∇f(x) = [ ∂f/∂x1 (x), ∂f/∂x2 (x), . . . , ∂f/∂xn (x) ]T

is called the gradient of f at x.
The matrix

         [ ∂²f/∂x1² (x)     ∂²f/∂x1∂x2 (x)  . . .  ∂²f/∂x1∂xn (x) ]
∇²f(x) = [ ∂²f/∂x2∂x1 (x)   ∂²f/∂x2² (x)    . . .  ∂²f/∂x2∂xn (x) ]
         [ . . .            . . .           . . .  . . .          ]
         [ ∂²f/∂xn∂x1 (x)   ∂²f/∂xn∂x2 (x)  . . .  ∂²f/∂xn² (x)   ]

is called the Hessian of f at x.
4
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Taylor Expansion of f : Rn7→R
Let f : Rn → R.
If all derivatives of f are continuously differentiable at x0, then

f(x) = ∑_{k=0}^∞ (f^{(k)}(x0)/k!)·(x − x0)^k,

where f^{(k)}(x0) is the k-th derivative of f at x0.
The first order approximation of the function:

f(x) = f(x0) + ∇f(x0)T(x − x0) + r2(x − x0),

where the remainder satisfies:

lim_{x→x0} r2(x − x0)/‖x − x0‖ = 0.

The second order approximation:

f(x) = f(x0) + ∇f(x0)T(x − x0) + (1/2)(x − x0)T∇²f(x0)(x − x0) + r3(x − x0),

where the remainder satisfies:

lim_{x→x0} r3(x − x0)/‖x − x0‖² = 0.
5
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Convexity: Reminder
Property 1.
For any collection {Ci | i ∈ I} of convex sets, the
intersection⋂
i∈I Ci is convex.
Property 4.
If C is a convex set and f : C 7→ R is convex
function, the level sets {x ∈ C | f(x) ≤ α} and
{x ∈ C | f(x) < α} are convex for all scalars α.
Lemma 1:
If g : Rn 7→ Rm is a convex function, then the set
{x ∈ Rn | g(x) ≤ 0} is convex.
Proof:
Since every function gi : Rn 7→ R, i = 1,2, ..., m is
convex, from Property 4, we conclude that every
set Xi = {x ∈ Rn | gi(x) ≤ 0} is convex. From
Property 1, we conclude that the intersection
X =⋂m
i=1 Xi = {x ∈ Rn | g(x) ≤ 0} is convex,
which completes the proof.
6
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Diff’ble Convex Functions
Property 8.
Let C ⊂ Rn be a convex set and f : C → R be twice continuously differentiable over C.
(a) If ∇²f(x) is positive semidefinite for all x ∈ C, then f is convex.
(b) If ∇²f(x) is positive definite for all x ∈ C, then f is strictly convex.
(c) If f is convex, then ∇²f(x) is positive semidefinite for all x ∈ C.

Let the second order approximation of the function be given:

f(x) ≈ f(x0) + cT(x − x0) + (1/2)(x − x0)TQ(x − x0),

where c = ∇f(x0) and Q = ∇²f(x0).
From Property 8, it follows that when f is convex and twice differentiable, then Q exists and is a positive semidefinite matrix.

Conclusion:
If f is convex and twice differentiable, then optimization of f(x) can (locally) be replaced with the minimization of its quadratic model.
7
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Nonlinear Opt. with IPMs
Nonlinear Optimization via QPs:
Sequential Quadratic Programming (SQP).
Repeat until optimality:
• approximate NLP (locally) with a QP;
• solve (approximately) the QP.
Nonlinear Optimization with IPMs:
works similarly to SQP scheme.
However, the (local) QP approximations are not
solved to optimality. Instead, only one step in
the Newton direction corresponding to a given
QP approximation is made and the new QP ap-
proximation is computed.
Derive an IPM for NLP:
• replace inequalities with log barriers;
• form the Lagrangian;
• write the first order optimality conditions;
• apply Newton method to them.8
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
NLP Notation
Consider the nonlinear optimization problem
min f(x) s.t. g(x) ≤ 0,
where x ∈ Rn, and f : Rn 7→ R and g : Rn 7→ Rm
are convex, twice differentiable.
The vector-valued function g : Rn → Rm has a derivative A(x) ∈ Rm×n:

A(x) = ∇g(x) = [ ∂gi/∂xj ]_{i=1..m, j=1..n},

which is called the Jacobian of g.
The Lagrangian associated with the NLP is:

L(x, y) = f(x) + yTg(x),

where y ∈ Rm, y ≥ 0 are Lagrange multipliers (dual variables).
The first derivatives of the Lagrangian:

∇xL(x, y) = ∇f(x) + ∇g(x)Ty,
∇yL(x, y) = g(x).

The Hessian of the Lagrangian, Q(x, y) ∈ Rn×n:

Q(x, y) = ∇²xxL(x, y) = ∇²f(x) + ∑_{i=1}^m yi∇²gi(x).
9
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Convexity in NLP
Lemma 2:
If f : Rn 7→ R and g : Rn 7→ Rm are convex,
twice differentiable, then the Hessian of the La-
grangian
Q(x, y) = ∇²f(x) + ∑_{i=1}^m yi∇²gi(x)
is positive semidefinite for any x and any y ≥ 0.
If f is strictly convex, then Q(x, y) is positive
definite for any x and any y ≥ 0.
Proof:
Using Property 8, the convexity of f implies that
∇2f(x) is positive semidefinite for any x. Simi-
larly, the convexity of g implies that for all i =
1,2, ..., m, ∇2gi(x) is positive semidefinite for any
x.
Since yi ≥ 0 for all i = 1,2, ..., m and Q(x, y)
is the sum of positive semidefinite matrices, we
conclude that Q(x, y) is positive semidefinite.
If f is strictly convex, then ∇2f(x) is positive
definite and so is Q(x, y).
10
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
IPM for NLP
Add slack variables to nonlinear inequalities:
min f(x)
s.t. g(x) + z = 0
z ≥ 0,
where z ∈ Rm. Replace inequality z ≥ 0 with the
logarithmic barrier:
min f(x) − µ ∑_{i=1}^m ln zi
s.t. g(x) + z = 0.

Write out the Lagrangian

L(x, y, z, µ) = f(x) + yT(g(x) + z) − µ ∑_{i=1}^m ln zi,

and the conditions for a stationary point:

∇xL(x, y, z, µ) = ∇f(x) + ∇g(x)Ty = 0,
∇yL(x, y, z, µ) = g(x) + z = 0,
∇zL(x, y, z, µ) = y − µZ^{−1}e = 0,

where Z^{−1} = diag{z1^{−1}, z2^{−1}, · · · , zm^{−1}}.

The First Order Optimality Conditions are:

∇f(x) + ∇g(x)Ty = 0,
g(x) + z = 0,
YZe = µe.
11
IPMs for LP, QP, NLP, J. Gondzio, Turin 2008
Newton Method for the FOC
The first order optimality conditions for the bar-
rier problem form a large system of nonlinear
equationsF(x, y, z) = 0,
where F : Rn+2m 7→ Rn+2m is an application
defined as follows:
F(x, y, z) =
∇f(x) + ∇g(x)Ty
g(x) + z
Y Ze − µe
.
Note that all three terms of it are nonlinear.
(In LP and QP the first two terms were linear.)
Observe that
∇F(x, y, z)=
Q(x, y) A(x)T 0A(x) 0 I
0 Z Y
,
where A(x) is the Jacobian of g
and Q(x, y) is the Hessian of L.
They are defined as follows:
A(x) = ∇g(x) ∈ Rm×n,
Q(x, y) = ∇2f(x) + Σ_{i=1}^m yi ∇2gi(x) ∈ Rn×n.
12
Newton Method (cont’d)
For a given point (x, y, z) we find the Newton
direction (∆x,∆y,∆z) by solving the system of
linear equations:
[ Q(x, y)  A(x)T  0 ] [ ∆x ]   [ −∇f(x) − A(x)Ty ]
[ A(x)     0      I ] [ ∆y ] = [ −g(x) − z       ]
[ 0        Z      Y ] [ ∆z ]   [ µe − Y Ze       ]
Using the third equation we eliminate
∆z = µY−1e − Ze − ZY−1∆y
from the second equation and get
[ Q(x, y)  A(x)T ] [ ∆x ]   [ −∇f(x) − A(x)Ty ]
[ A(x)     −ZY−1 ] [ ∆y ] = [ −g(x) − µY−1e   ]
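The elimination can be verified numerically. The sketch below (Python with NumPy; all matrices and residuals are random stand-ins for Q(x, y), A(x) and the right-hand side, chosen only to make both systems nonsingular) solves the full (n+2m)×(n+2m) system and the reduced (n+m)×(n+m) augmented system and checks that they agree on (∆x, ∆y):

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 5, 3
Q = rng.normal(size=(n, n))
Q = Q @ Q.T + n * np.eye(n)            # symmetric positive definite stand-in
A = rng.normal(size=(m, n))            # stand-in for the Jacobian A(x)
y = rng.uniform(0.5, 2.0, size=m)      # y > 0
z = rng.uniform(0.5, 2.0, size=m)      # z > 0
Y, Z = np.diag(y), np.diag(z)
mu = 0.1
r1 = rng.normal(size=n)                # stand-in for -grad f(x) - A(x)^T y
r2 = rng.normal(size=m)                # stand-in for -g(x) - z
r3 = mu * np.ones(m) - Y @ Z @ np.ones(m)   # µe - YZe

# Full (n + 2m) x (n + 2m) Newton system
K = np.block([
    [Q,                A.T,              np.zeros((n, m))],
    [A,                np.zeros((m, m)), np.eye(m)],
    [np.zeros((m, n)), Z,                Y],
])
sol = np.linalg.solve(K, np.concatenate([r1, r2, r3]))
dx, dy, dz = sol[:n], sol[n:n + m], sol[n + m:]

# Reduced (n + m) x (n + m) augmented system
Kaug = np.block([[Q, A.T], [A, -Z @ np.linalg.inv(Y)]])
rhs = np.concatenate([r1, r2 - np.linalg.solve(Y, r3)])
sol2 = np.linalg.solve(Kaug, rhs)

assert np.allclose(sol[:n + m], sol2)                    # same (dx, dy)
assert np.allclose(dz, np.linalg.solve(Y, r3 - Z @ dy))  # dz recovered
```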
13
Interior-Point NLP Algorithm
Initialize
  k = 0
  (x0, y0, z0) such that y0 > 0 and z0 > 0
  µ0 = (1/m) · (y0)Tz0
Repeat until optimality
  k = k + 1
  µk = σµk−1, where σ ∈ (0,1)
  Compute A(x) and Q(x, y)
  ∆ = Newton direction towards the µ-center
  Ratio test:
    α1 := max {α > 0 : y + α∆y ≥ 0},
    α2 := max {α > 0 : z + α∆z ≥ 0}.
  Choose the step (use trust region or line search):
    α ≤ min {α1, α2}.
  Make step:
    xk+1 = xk + α∆x,
    yk+1 = yk + α∆y,
    zk+1 = zk + α∆z.
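The ratio test above can be sketched in a few lines (Python with NumPy; the helper name `max_step` is my own, and the 0.995 damping factor mentioned in the comment is a common practical choice, not taken from the slides):

```python
import numpy as np

# The largest alpha with v + alpha * dv >= 0 is limited only by the
# components with dv_i < 0 (my own sketch of the ratio test).
def max_step(v, dv):
    neg = dv < 0
    if not np.any(neg):
        return float("inf")             # nothing decreases: unbounded step
    return float(np.min(-v[neg] / dv[neg]))

y = np.array([1.0, 2.0, 0.5])
dy = np.array([-2.0, 1.0, -0.25])
alpha1 = max_step(y, dy)                # first component binds: alpha1 = 0.5
assert abs(alpha1 - 0.5) < 1e-12
# the algorithm then takes alpha <= min(alpha1, alpha2); in practice a
# damping factor such as 0.995 is common (my own remark, not the slides').
```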
14
From QP to NLP
Newton direction for QP:
[ −Q  AT  I ] [ ∆x ]   [ ξd ]
[ A   0   0 ] [ ∆y ] = [ ξp ]
[ S   0   X ] [ ∆s ]   [ ξµ ]
Augmented system for QP:
[ −Q − SX−1  AT ] [ ∆x ]   [ ξd − X−1ξµ ]
[ A          0  ] [ ∆y ] = [ ξp         ]
Newton direction for NLP:
[ Q(x, y)  A(x)T  0 ] [ ∆x ]   [ −∇f(x) − A(x)Ty ]
[ A(x)     0      I ] [ ∆y ] = [ −g(x) − z       ]
[ 0        Z      Y ] [ ∆z ]   [ µe − Y Ze       ]
Augmented system for NLP:
[ Q(x, y)  A(x)T ] [ ∆x ]   [ −∇f(x) − A(x)Ty ]
[ A(x)     −ZY−1 ] [ ∆y ] = [ −g(x) − µY−1e   ]
Conclusion:
NLP is a natural extension of QP.
15
Lin. Algebra in IPM for NLP
Newton direction for NLP:
[ Q(x, y)  A(x)T  0 ] [ ∆x ]   [ −∇f(x) − A(x)Ty ]
[ A(x)     0      I ] [ ∆y ] = [ −g(x) − z       ]
[ 0        Z      Y ] [ ∆z ]   [ µe − Y Ze       ]
The corresponding augmented system:
[ Q(x, y)  A(x)T ] [ ∆x ]   [ −∇f(x) − A(x)Ty ]
[ A(x)     −ZY−1 ] [ ∆y ] = [ −g(x) − µY−1e   ]
where A(x) ∈ Rm×n is the Jacobian of g
and Q(x, y) ∈ Rn×n is the Hessian of L:
A(x) = ∇g(x),
Q(x, y) = ∇2f(x) + Σ_{i=1}^m yi ∇2gi(x).
Automatic differentiation is very useful ...
we can get Q(x, y) and A(x) from an Algebraic
Modeling Language (AML).
[Diagram: Model → AML → SOLVER (Num. Anal.
Package) → Solution/Output]
16
Automatic Differentiation
AD on the Internet:
• ADIFOR (FORTRAN code for AD):
http://www-unix.mcs.anl.gov/autodiff/ADIFOR/
• ADOL-C (C/C++ code for AD):
http://www-unix.mcs.anl.gov/autodiff/AD_Tools/adolc.anl/adolc.html
• AD page at Cornell:
http://www.tc.cornell.edu/~averma/AD/
17
Interior Point Methods
Conclusions:
• Interior Point Methods provide a unified
framework for convex optimization.
• Interior Point Methods provide polynomial
algorithms for LP, QP and NLP.
• The linear algebra in LP, QP and NLP is very
similar.
• Use IPMs to solve very large problems.
Further Extensions:
• Nonconvex optimization.
IPMs on the Internet:
• LP FAQ (Frequently Asked Questions):
http://www-unix.mcs.anl.gov/otc/Guide/faq/
• Interior Point Methods On-Line:
http://www-unix.mcs.anl.gov/otc/InteriorPoint/
• NEOS (Network Enabled Opt. Services):
http://www-neos.mcs.anl.gov/
18
Project: NEOS
I encourage you to do this one-hour project.
19
Project: Use NEOS
NEOS stands for
Network Enabled Optimization Services:
http://www-neos.mcs.anl.gov/
You can use optimization facilities remotely:
prepare an MPS file with your problem,
submit it to NEOS.
The NEOS server will execute your job on one of
the available machines (you do not know which
one) and will send you the solution by e-mail.
Your Task
Solve your LP problem via NEOS.
Use at least 3 different LP solvers.
Compare the solutions obtained.
20
Interior Point Methods
for Linear, Quadratic
and Nonlinear Programming
Turin 2008
Jacek Gondzio
Lecture 10:
More on Newton Method
1
Newton Method
Let f : Rn 7→ Rn be a twice continuously
differentiable function such that ∇f(x) ∈ Rn×n
is nonsingular at any x. The Newton method finds
a root of the nonlinear equation system
f(x) = 0,
by repeating the following step
xk+1 = xk − (∇f(xk))−1f(xk).
Let us rewrite it in a simplified form
xk+1 = φ(xk),
and observe that at the solution x, φ′(x) = 0.
Indeed (we check it for f : R 7→ R),
φ′(x) = (x − f(x)/f′(x))′ = f(x)f′′(x)/(f′(x))2 = 0,
because at the solution x: f(x) = 0.
Near the solution x, the Newton method converges
quadratically, i.e., the error reduces as follows:
‖ek+1‖ ≤ C‖ek‖2,
where C is a constant.
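The quadratic error decay is easy to observe numerically. A minimal sketch (Python; f(x) = x^2 − 2 with root sqrt(2) is my own example, not from the lecture):

```python
import math

# Newton's method x_{k+1} = x_k - f(x_k)/f'(x_k) on f(x) = x^2 - 2.
def newton(f, fprime, x, iters):
    errs = []
    for _ in range(iters):
        x = x - f(x) / fprime(x)        # one Newton step
        errs.append(abs(x - math.sqrt(2.0)))
    return x, errs

x, errs = newton(lambda t: t * t - 2.0, lambda t: 2.0 * t, 1.0, 6)
assert abs(x - math.sqrt(2.0)) < 1e-12
# ||e_{k+1}|| <= C ||e_k||^2; here C = 1/(2 x_k) < 1 near sqrt(2),
# so each error is bounded by the square of the previous one
for e0, e1 in zip(errs, errs[1:]):
    if e0 > 1e-8:                       # until rounding error dominates
        assert e1 <= e0 ** 2
```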
2
Equations ⇒ Optimization
Let f : Rn 7→ R be a twice continuously
differentiable function.
Finding an (unconstrained) minimum of f (or
more generally, finding a stationary point of f)
is equivalent to solving the equation
∇f(x) = 0.
This is a nonlinear system of equations that can
be solved with the Newton method.
Assume ∇2f(x) ∈ Rn×n is nonsingular at any x.
The Newton method for optimization repeats the
following step
xk+1 = xk − (∇2f(xk))−1∇f(xk).
3
Another View
Newton Method for Optimization
Let f : Rn 7→ R be a twice continuously
differentiable function. Suppose we build a
quadratic model f̃ of f around a given point xk,
i.e., we define ∆x = x − xk and write:
f̃(x) = f(xk) + ∇f(xk)T∆x + (1/2)∆xT∇2f(xk)∆x.
Now we optimize the model f̃
instead of optimizing f.
A minimum (or, more generally, a stationary
point) of the quadratic model satisfies:
∇f̃(x) = ∇f(xk) + ∇2f(xk)∆x = 0,
i.e.
∆x = x − xk = −(∇2f(xk))−1∇f(xk),
which reduces to the usual iteration:
xk+1 = xk − (∇2f(xk))−1∇f(xk).
4
Quadratic Convergence
Let f : Rn 7→ R be a twice continuously
differentiable function. Let us apply the Newton
method to optimize it:
xk+1 = xk − (∇2f(xk))−1∇f(xk).
Lemma (Quadratic Convergence).
If f is strongly convex with some constant m,
i.e.,
hT∇2f(x)h ≥ m‖h‖2^2, ∀x, h ∈ Rn,
and ∇2f is Lipschitz continuous with constant
L, i.e.,
‖(∇2f(x)−∇2f(y))h‖2 ≤ L‖x−y‖2‖h‖2, ∀x, y, h ∈ Rn,
then
‖∇f(xk+1)‖2 ≤ (L/(2m^2)) ‖∇f(xk)‖2^2.
In particular, in the region defined by the
inequality
(L/(2m^2)) ‖∇f(xk)‖2 ≤ 1,
the Newton method converges quadratically.
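The squaring of the gradient norm predicted by the lemma can be observed numerically. A sketch (Python with NumPy; the strongly convex function f(x) = Σ_i (e^{x_i} + e^{−x_i}), minimized at x = 0, is my own choice):

```python
import numpy as np

# f(x) = sum_i (exp(x_i) + exp(-x_i)); its Hessian is
# diag(exp(x_i) + exp(-x_i)) >= 2I, so f is strongly convex with m = 2.
def grad(x):
    return np.exp(x) - np.exp(-x)

def hess(x):
    return np.diag(np.exp(x) + np.exp(-x))

x = np.full(3, 0.5)
norms = []
for _ in range(5):
    x = x - np.linalg.solve(hess(x), grad(x))   # pure Newton step
    norms.append(np.linalg.norm(grad(x)))

assert norms[-1] < 1e-12
# the gradient norm is (at least) squared at every step
for g0, g1 in zip(norms, norms[1:]):
    if g0 > 1e-8:          # until rounding error takes over
        assert g1 < g0 ** 2
```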
5
Global vs Local Behaviour
The Newton method behaves very well near the
solution, where it displays quadratic convergence.
But it may behave very badly far away from it.
Why?
The Newton method uses a quadratic approximation.
Such an approximation is valid only locally. Thus
one cannot expect that the Newton direction
∆x = −(∇2f(x))−1∇f(x)
is an improvement direction everywhere.
6
Newton Method in IPMs
The Newton method behaves very well in the case
of Interior Point Methods for Linear Programming.
This is a consequence of the 'weak nonlinearity'
introduced by the logarithmic barrier function.
The Newton method applied in IPMs for NLP needs
additional safeguards to ensure global
convergence.
There are two possible safeguards:
• use Trust Region,
i.e. believe the quadratic model only in a
neighbourhood of the current point; or
• use Line Search,
i.e. optimize f along Newton direction
∆x = −(∇2f(x))−1∇f(x).
7
Newton method may diverge
[Figure: plot of f(x) = e − e^{1/x}, which has a
root at x = 1.]
Newton method applied from different starting
points:
iter  xk             xk          xk          xk
0     −1.0           0.5         1.5         2.0
1     −.739·10^1     .65803014   .60987204   −.595
2     −.123·10^3     .83352370   .78563159   −.541·10^1
3     −.263·10^5     .95930672   .93302399   −.718·10^2
4     −.119·10^10    .99752767   .99332404   −.913·10^4
5     −.244·10^19    .99999083   .99993320   −.143·10^9
6     −.102·10^38    1.0000000   .99999999   −.352·10^17
7     −.180·10^75    1.0000000   1.0000000   −.213·10^34
8     −.555·10^148   1.0000000   1.0000000   −.780·10^67
The algorithm converges from x0 = 0.5 or x0 = 1.5
but diverges from x0 = −1.0 and x0 = 2.0.
8
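The table can be reproduced with a few lines of Python (a sketch; the 8-iteration cap mirrors the table):

```python
import math

# Newton's method on f(x) = e - e^{1/x}, whose unique root is x = 1.
def newton_step(x):
    f = math.e - math.exp(1.0 / x)
    fp = math.exp(1.0 / x) / (x * x)    # f'(x) = e^{1/x} / x^2
    return x - f / fp

def run(x0, iters=8):
    xs = [x0]
    for _ in range(iters):
        xs.append(newton_step(xs[-1]))
    return xs

good = run(0.5)     # cf. the column x0 = 0.5: converges to 1
bad = run(-1.0)     # cf. the column x0 = -1.0: diverges to -infinity
assert abs(good[-1] - 1.0) < 1e-7
assert bad[-1] < -1e100
```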
− log x Barrier Function
Consider the primal barrier linear program
min cTx − µ Σ_{j=1}^n ln xj
s.t. Ax = b,
where µ ≥ 0 is a barrier parameter.
Write out the Lagrangian
L(x, y, µ) = cTx − yT(Ax − b) − µ Σ_{j=1}^n ln xj,
and the conditions for a stationary point
∇xL(x, y, µ) = c − ATy − µX−1e = 0,
∇yL(x, y, µ) = Ax − b = 0,
where X−1 = diag{1/x1, 1/x2, ..., 1/xn}.
Let us denote
s = µX−1e, i.e. XSe = µe.
The First Order Optimality Conditions are:
Ax = b,
ATy + s = c,
XSe = µe.
9
− log x bf: Newton Method
The first order optimality conditions for the
barrier problem form a large system of nonlinear
equations
F(x, y, s) = 0,
where F : R2n+m 7→ R2n+m is a mapping defined
as follows:
             [ Ax − b      ]
F(x, y, s) = [ ATy + s − c ]
             [ XSe − µe    ]
Actually, the first two terms of it are linear;
only the last one, corresponding to the
complementarity condition, is nonlinear.
Note that
              [ A  0   0 ]
∇F(x, y, s) = [ 0  AT  I ]
              [ S  0   X ]
Thus, for a given point (x, y, s) we find the
Newton direction (∆x,∆y,∆s) by solving the
system of linear equations:
[ A  0   0 ] [ ∆x ]   [ b − Ax      ]
[ 0  AT  I ] [ ∆y ] = [ c − ATy − s ]
[ S  0   X ] [ ∆s ]   [ µe − XSe    ]
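A sketch of assembling and solving this system on a tiny random LP (Python with NumPy; the data A, b, c and the strictly positive point (x, y, s) are artificial, chosen so that the initial primal and dual residuals vanish):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 2, 4
A = rng.normal(size=(m, n))
x = rng.uniform(0.5, 2.0, size=n)       # x > 0
s = rng.uniform(0.5, 2.0, size=n)       # s > 0
y = rng.normal(size=m)
b = A @ x                               # primal residual b - Ax is zero
c = A.T @ y + s                         # dual residual c - A^T y - s is zero
mu = 0.1
X, S = np.diag(x), np.diag(s)
e = np.ones(n)

K = np.block([
    [A,                np.zeros((m, m)), np.zeros((m, n))],
    [np.zeros((n, n)), A.T,              np.eye(n)],
    [S,                np.zeros((n, m)), X],
])
rhs = np.concatenate([b - A @ x, c - A.T @ y - s, mu * e - X @ S @ e])
d = np.linalg.solve(K, rhs)
dx, dy, ds = d[:n], d[n:n + m], d[n + m:]

assert np.allclose(S @ dx + X @ ds, mu * e - x * s)  # linearized XSe = mu e
assert np.allclose(A @ dx, 0.0)                      # step keeps Ax = b
```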
10
1/xα, α > 0 Barrier Function
Consider the primal barrier linear program
min cTx + µ Σ_{j=1}^n 1/xj^α
s.t. Ax = b,
where µ ≥ 0 is a barrier parameter and α > 0.
Write out the Lagrangian
L(x, y, µ) = cTx − yT (Ax − b) + µn
∑
j=1
1
xαj
,
and the conditions for a stationary point
∇xL(x, y, µ) = c − ATy − µαX−α−1e = 0,
∇yL(x, y, µ) = Ax − b = 0,
where X−α−1 = diag{x1^(−α−1), x2^(−α−1), ..., xn^(−α−1)}.
Let us denote
s = µαX−α−1e, i.e. Xα+1Se = µαe.
The First Order Optimality Conditions are:
Ax = b,
ATy + s = c,
Xα+1Se = µαe.
11
1/xα, α>0 bf: Newton Method
The first order optimality conditions for the
barrier problem are
F(x, y, s) = 0,
where F : R2n+m 7→ R2n+m is a mapping
defined as follows:
             [ Ax − b       ]
F(x, y, s) = [ ATy + s − c  ]
             [ Xα+1Se − µαe ]
As before, only the last term, corresponding to
the complementarity condition, is nonlinear.
Note that
              [ A          0   0    ]
∇F(x, y, s) = [ 0          AT  I    ]
              [ (α+1)XαS   0   Xα+1 ]
Thus, for a given point (x, y, s) we find the
Newton direction (∆x,∆y,∆s) by solving the
system of linear equations:
[ A          0   0    ] [ ∆x ]   [ b − Ax       ]
[ 0          AT  I    ] [ ∆y ] = [ c − ATy − s  ]
[ (α+1)XαS   0   Xα+1 ] [ ∆s ]   [ µαe − Xα+1Se ]
12
e1/x Barrier Function
Consider the primal barrier linear program
min cTx + µ Σ_{j=1}^n e^{1/xj}
s.t. Ax = b,
where µ ≥ 0 is a barrier parameter.
Write out the Lagrangian
L(x, y, µ) = cTx − yT(Ax − b) + µ Σ_{j=1}^n e^{1/xj},
and the conditions for a stationary point
∇xL(x, y, µ) = c − ATy − µX−2 exp(X−1)e = 0,
∇yL(x, y, µ) = Ax − b = 0,
where exp(X−1) = diag{e^{1/x1}, e^{1/x2}, ..., e^{1/xn}}.
Let us denote
s = µX−2 exp(X−1)e, i.e. X2 exp(−X−1)Se = µe.
The First Order Optimality Conditions are:
Ax = b,
ATy + s = c,
X2 exp(−X−1)Se = µe.
13
e1/x bf: Newton Method
The first order optimality conditions are
F(x, y, s) = 0,
where F : R2n+m 7→ R2n+m is defined as follows:
             [ Ax − b              ]
F(x, y, s) = [ ATy + s − c         ]
             [ X2 exp(−X−1)Se − µe ]
As before, only the last term, corresponding to
the complementarity condition, is nonlinear.
Note that
              [ A                 0   0            ]
∇F(x, y, s) = [ 0                 AT  I            ]
              [ (2X+I)exp(−X−1)S  0   X2 exp(−X−1) ]
Newton direction (∆x,∆y,∆s) solves the system
of linear equations:
[ A                 0   0            ] [ ∆x ]   [ b − Ax              ]
[ 0                 AT  I            ] [ ∆y ] = [ c − ATy − s         ]
[ (2X+I)exp(−X−1)S  0   X2 exp(−X−1) ] [ ∆s ]   [ µe − X2 exp(−X−1)Se ]
14
Why Is the Log Barrier the Best?
The First Order Optimality Conditions:
− log x :  XSe = µe,
1/xα :     Xα+1Se = µαe,
e1/x :     X2 exp(−X−1)Se = µe.
Log Barrier ensures
the symmetry between the primal and the dual.
Newton Equation System:
− log x :  ∇F3 = [S, 0, X],
1/xα :     ∇F3 = [(α+1)XαS, 0, Xα+1],
e1/x :     ∇F3 = [(2X+I)exp(−X−1)S, 0, X2 exp(−X−1)].
Log Barrier produces
’the weakest nonlinearity’.
15
Self-concordant Functions
There is a nice property of a function that is
responsible for the good behaviour of the Newton
method.
Def
Let C ⊂ Rn be an open nonempty convex set.
Let f : C 7→ R be a three times continuously
differentiable convex function.
A function f is called self-concordant if there
exists a constant p > 0 such that
|∇3f(x)[h, h, h]| ≤ 2p^{−1/2} (∇2f(x)[h, h])^{3/2},
∀x ∈ C, ∀h : x + h ∈ C.
(We then say that f is p-self-concordant).
Note that a self-concordant function is always
well approximated by the quadratic model,
because the error of such an approximation can
be bounded by the 3/2 power of ∇2f(x)[h, h].
16
Self-concordant Barriers
Lemma
The barrier function − log x is self-concordant
on R+.
Proof
Consider f(x) = − log x.
We compute
f′(x) = −x−1,  f′′(x) = x−2,  f′′′(x) = −2x−3,
and check that the self-concordance condition is
satisfied with p = 1.
Lemma
The barrier function 1/xα, with α ∈ (0,∞) is not
self-concordant on R+.
Lemma
The barrier function e1/x is not self-concordant
on R+.
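The three lemmas can be illustrated numerically by looking at the ratio |f′′′(x)| / (f′′(x))^{3/2}, which the definition requires to stay bounded (by 2p^{−1/2}). A sketch of my own (Python; checking − log x and, as one representative of the 1/x^α family, α = 1):

```python
# Self-concordance requires |f'''(x)| / f''(x)^{3/2} <= 2 p^{-1/2}.
def ratio_log(x):
    # f(x) = -log x:  f''(x) = x^-2,  f'''(x) = -2 x^-3
    return (2.0 * x ** -3) / (x ** -2) ** 1.5

def ratio_inv(x):
    # f(x) = 1/x (the case alpha = 1):  f''(x) = 2 x^-3,  f'''(x) = -6 x^-4
    return (6.0 * x ** -4) / (2.0 * x ** -3) ** 1.5

for t in [0.1, 1.0, 10.0, 1000.0]:
    assert abs(ratio_log(t) - 2.0) < 1e-9  # constant 2: self-concordant, p = 1

# for 1/x the ratio grows like sqrt(x), so no constant p can work
assert ratio_inv(1000.0) > 10.0 * ratio_inv(1.0)
```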
Use self-concordant barriers in optimization.
17