Nonsmooth Optimization
Transcript of Nonsmooth Optimization
1
Preliminaries
• R^n: the n-dimensional real Euclidean space, with x, y ∈ R^n
• Usual inner product: (x, y) = x^T y = ∑_{i=1}^n x_i y_i
• Euclidean norm: ‖x‖ = √(x, x) = (x^T x)^{1/2}
• f : O → R is smooth (continuously differentiable) if the gradient ∇f : O → R^n is defined and continuous on an open set O ⊆ R^n:
∇f(x) = (∂f(x)/∂x_1, ∂f(x)/∂x_2, . . . , ∂f(x)/∂x_n)^T
2
Smooth Functions - Directional Derivative
• Directional derivatives f′(x; u), f′(x; −u) of f at x ∈ O, in the direction u ∈ R^n:
f′(x; u) := lim_{α→+0} [f(x + αu) − f(x)]/α = (∇f(x), u)
• For the unit vectors e_i (i = 1, 2, . . . , n), the values f′(x; e_1), f′(x; e_2), . . . , f′(x; e_n) recover the partial derivatives:
(∇f(x), e_1) = f_{x_1}, (∇f(x), e_2) = f_{x_2}, . . . , (∇f(x), e_n) = f_{x_n}
• Note that f′(x; u) = −f′(x; −u).
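A minimal numeric sketch of this identity (the function, point, and direction below are illustrative choices, not from the original slides): a one-sided difference quotient for f′(x; u) should agree with (∇f(x), u) when f is smooth.

```python
import numpy as np

def directional_derivative(f, x, u, alpha=1e-7):
    # One-sided difference quotient approximating f'(x; u).
    return (f(x + alpha * u) - f(x)) / alpha

# Illustrative smooth function: f(x) = ||x||^2, with gradient 2x.
f = lambda x: np.dot(x, x)
x = np.array([1.0, 2.0])
u = np.array([0.0, 1.0])

print(directional_derivative(f, x, u))  # ~ 4.0
print(np.dot(2 * x, u))                 # (grad f(x), u) = 4.0 exactly
```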
3
Smooth Functions - 1st order approximation
• A first-order approximation of f near x ∈ O, by means of the Taylor expansion with remainder term:
f(x + δ) = f(x) + (∇f(x), δ) + o_x(δ)   (x + δ ∈ O),
• where lim_{α→0} o_x(αδ)/α = 0 and δ ∈ R^n is small enough.
• Thus a smooth function can be locally replaced by a “simple” linear approximation of it
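Ex (a standard illustration, not from the original slides): for f(x) = ‖x‖²,
f(x + δ) = ‖x‖² + 2(x, δ) + ‖δ‖²,
so o_x(δ) = ‖δ‖², and indeed o_x(αδ)/α = α‖δ‖² → 0 as α → 0.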
4
Smooth Functions - Optimality Conditions
First-order necessary conditions for an extremum:
• For x∗ ∈ O to be a local minimizer of f on R^n, it is necessary that ∇f(x∗) = 0_n,
• For x∗ ∈ O to be a local maximizer of f on R^n, it is necessary that ∇f(x∗) = 0_n.
5
Smooth Functions - Descent/Ascent Directions
Directions of steepest descent and ascent, if x is not a stationary point:
• the unit steepest descent direction u_d of the function f at a point x: u_d(x) = −∇f(x)/‖∇f(x)‖,
• the unit steepest ascent direction u_a of the function f at a point x: u_a(x) = ∇f(x)/‖∇f(x)‖.
• There is only one steepest descent direction and only one steepest ascent direction, and u_d(x) = −u_a(x)
6
Smooth Functions - Chain Rule
• Chain rule: let g : R^n → R, h : R^n → R^n, and f : R^n → R.
• If g, h ∈ C^1(O) and f(x) = g(h(x)), then ∇^T f(x) = ∇^T g(h(x)) ∇h(x),
• where ∇h(x) = [∂h_i(x)/∂x_j]_{i,j=1,2,...,n} is the n × n Jacobian matrix of h.
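Ex (illustrative, not from the original slides): for the linear inner function h(x) = Ax with a constant n × n matrix A, ∇h(x) = A, so f(x) = g(Ax) has
∇^T f(x) = ∇^T g(Ax) A, i.e., ∇f(x) = A^T ∇g(Ax).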
7
Nonsmooth Optimization
• Deals with nondifferentiable functions
• The problem is to find a proper replacement for the concept
of gradient
• Different research groups work on nonsmooth function classes; hence there are different theories to handle the different nonsmooth problems
• Tools replacing the gradient
8
Keywords of Nonsmooth Optimization
• Convex Functions, Lipschitz Continuous Functions
• Generalized directional derivatives, Generalized Derivatives
• Subgradient method, Bundle method, Discrete Gradient Algorithm
• Asplund Spaces
9
Convex Functions
• O ⊆ R^n is a nonempty convex set if αx + (1 − α)y ∈ O for all x, y ∈ O and α ∈ [0, 1]
• f : O → R̄, R̄ := [−∞, ∞], is convex if
f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y)
for any x, y ∈ O, λ ∈ [0, 1].
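Ex (standard, not from the original slides): f(x) = ‖x‖ is convex on R^n, since the triangle inequality and homogeneity of the norm give
‖λx + (1 − λ)y‖ ≤ λ‖x‖ + (1 − λ)‖y‖.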
10
Convex Functions
• Every local minimum is a global minimum
• ξ is a subgradient of f at a (possibly nondifferentiable) point x ∈ dom f if it satisfies the subgradient inequality, i.e.,
f(y) ≥ f(x) + (ξ, y − x).
• The set of subgradients of f at x is called the subdifferential ∂f(x):
∂f(x) := {ξ ∈ R^n | f(y) ≥ f(x) + (ξ, y − x) ∀y ∈ R^n}.
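Ex (standard, not from the original slides): for f(x) = |x| on R, the subgradient inequality |y| ≥ |x| + ξ(y − x) yields
∂f(x) = {−1} for x < 0, ∂f(x) = {1} for x > 0, and ∂f(0) = [−1, 1].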
11
Convex Functions
• The subgradients at a point can be characterized by the directional derivative:
f′(x; u) = sup_{ξ∈∂f(x)} (ξ, u).
• For x in the interior of dom f, the subdifferential ∂f(x) is compact, and then the directional derivative is finite
• The subdifferential in relation to the directional derivative:
∂f(x) = {ξ ∈ R^n | f′(x; u) ≥ (ξ, u) ∀u ∈ R^n}.
12
Lipschitz Continuous Functions
• f : O → R is Lipschitz continuous with some constant K if for all y, z in the open set O: |f(y) − f(z)| ≤ K‖y − z‖
• Such functions are differentiable almost everywhere (Rademacher’s theorem)
• Clarke subdifferential ∂_C f(x) of a Lipschitz continuous f at x:
∂_C f(x) = co{ξ ∈ R^n | ξ = lim_{k→∞} ∇f(x_k), x_k → x, x_k ∈ D},
where D is the set of points at which f is differentiable.
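Ex (standard, not from the original slides): f(x) = |x| on R is Lipschitz continuous with K = 1; near 0 the gradients are ±1, so the attainable limits of gradients are {−1, 1} and
∂_C f(0) = co{−1, 1} = [−1, 1],
which coincides with the convex subdifferential of |x| at 0.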
13
Lipschitz Continuous Functions
• Mean value theorem for Clarke subdifferentials: for some point c in the open line segment between a and b and some ξ ∈ ∂_C f(c),
f(b) − f(a) = (ξ, b − a)
• Nonsmooth chain rule with respect to the Clarke subdifferential:
∂_C(g ◦ F)(x) ⊆ co{ ∑_{i=1}^m ξ_i µ_i | ξ = (ξ_1, ξ_2, . . . , ξ_m) ∈ ∂_C g(F(x)), µ_i ∈ ∂_C f_i(x) (i = 1, 2, . . . , m) }
• where F(·) = (f_1(·), f_2(·), . . . , f_m(·)) is a vector-valued function and g : R^m → R, g ◦ F : R^n → R are Lipschitz continuous
14
Regular Functions
• A locally Lipschitz f is regular at x if the directional derivative f′(x; u) exists and the Clarke directional derivative coincides with it: f′_C(x; u) = f′(x; u)
• Ex: semismooth functions: a locally Lipschitz f : R^n → R is semismooth at x ∈ R^n if for every u ∈ R^n the limit
lim_{ξ∈∂_C f(x+αv), v→u, α→+0} (ξ, u)
exists.
15
Max- and Min-type Functions
• f(x) = max{f_1(x), f_2(x), . . . , f_m(x)}, f_i : R^n → R (i = 1, 2, . . . , m)
• ∂_C f(x) ⊆ co ⋃_{i∈J(x)} ∂_C f_i(x), where J(x) := {i = 1, 2, . . . , m | f(x) = f_i(x)} is the active index set
• Ex: f(x) = max{f_1(x), f_2(x)}, as illustrated below
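For instance (a standard example, not from the original slides), take f(x) = max{x_1, x_2} on R^2 at a point with x_1 = x_2: both indices are active, ∇f_1(x) = e_1, ∇f_2(x) = e_2, and
∂_C f(x) = co{e_1, e_2} = {(λ, 1 − λ)^T | λ ∈ [0, 1]}.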
16
Quasidifferentiable Functions
• f : R^n → R is quasidifferentiable at x
if f′(x; u) exists finitely for every direction u and there exists a pair of compact convex sets [∂f(x), ∂̄f(x)] such that
f′(x; u) = max_{ξ∈∂f(x)} (ξ, u) + min_{φ∈∂̄f(x)} (φ, u)
• [∂f(x), ∂̄f(x)] is the quasidifferential, ∂f(x) the subdifferential, ∂̄f(x) the superdifferential
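Ex (standard, not from the original slides): f(x) = |x_1| − |x_2| on R^2 is quasidifferentiable at 0, since
f′(0; u) = |u_1| − |u_2| = max_{ξ∈co{−e_1, e_1}} (ξ, u) + min_{φ∈co{−e_2, e_2}} (φ, u),
so one may take ∂f(0) = co{−e_1, e_1} and ∂̄f(0) = co{−e_2, e_2}.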
17
Directional Derivatives
For f : O → R, O ⊂ R^n, x ∈ O, and a direction u ∈ R^n:
• Dini Directional Derivative
• Hadamard Directional Derivative
• Clarke Directional Derivative
• Michel-Penot Directional Derivative
18
Dini Directional Derivative
• upper Dini directional derivative:
f̄′_D(x; u) := lim sup_{α→+0} [f(x + αu) − f(x)]/α
• lower Dini directional derivative:
f̱′_D(x; u) := lim inf_{α→+0} [f(x + αu) − f(x)]/α
• f is Dini directionally differentiable at x in the direction u if f̄′_D(x; u) = f̱′_D(x; u)
19
Hadamard Directional Derivative
• upper Hadamard directional derivative:
f̄′_H(x; u) := lim sup_{v→u, α→+0} [f(x + αv) − f(x)]/α
• lower Hadamard directional derivative:
f̱′_H(x; u) := lim inf_{v→u, α→+0} [f(x + αv) − f(x)]/α
• f is Hadamard directionally differentiable at x in the direction u if f̄′_H(x; u) = f̱′_H(x; u)
20
Clarke Directional Derivative
• upper Clarke directional derivative:
f̄′_C(x; u) := lim sup_{y→x, α→+0} [f(y + αu) − f(y)]/α
• lower Clarke directional derivative:
f̱′_C(x; u) := lim inf_{y→x, α→+0} [f(y + αu) − f(y)]/α
• f is Clarke directionally differentiable at x in the direction u if f̄′_C(x; u) = f̱′_C(x; u)
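A standard contrast (not from the original slides): for f(x) = −|x| on R at x = 0 with u = 1, the ordinary directional derivative is f′(0; 1) = −1, whereas taking y < 0 in the Clarke quotient gives [f(y + α) − f(y)]/α = 1, so f̄′_C(0; 1) = 1; the Clarke derivative can be strictly larger, and this f is not regular at 0.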
21
Michel-Penot Directional Derivative
• upper Michel-Penot directional derivative:
f̄′_MP(x; u) := sup_{v∈R^n} lim sup_{α→+0} [f(x + α(u + v)) − f(x + αv)]/α
• lower Michel-Penot directional derivative:
f̱′_MP(x; u) := inf_{v∈R^n} lim inf_{α→+0} [f(x + α(u + v)) − f(x + αv)]/α
• f is Michel-Penot directionally differentiable at x in the direction u if f̄′_MP(x; u) = f̱′_MP(x; u)
22
Subdifferentials and Optimality Conditions
• f′(x; u) = max_{ξ∈∂f(x)} (ξ, u) ∀u ∈ R^n
• For a point x∗ to be a minimizer, it is necessary that 0_n ∈ ∂f(x∗)
• A point x∗ satisfying 0_n ∈ ∂f(x∗) is called a stationary point
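Ex (continuing the standard example above, not from the original slides): for f(x) = |x|, we have 0 ∈ [−1, 1] = ∂f(0), so x∗ = 0 is a stationary point; by convexity it is in fact the global minimizer.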
23
Nonsmooth Optimization Methods
• Subgradient Algorithm (and ε-Subgradient Methods)
• Bundle Methods
• Discrete Gradients
24
Descent Methods
• min f(x) subject to x ∈ R^n
• The objective is to find a direction d_k with f(x_k + d_k) < f(x_k),
• i.e., to consider min f(x_k + d) − f(x_k) subject to d ∈ R^n.
• For f(x) twice continuously differentiable, expanding f(x_k + d):
f(x_k + d) − f(x_k) = f′(x_k; d) + ‖d‖ε(d), where ε(d) → 0 as ‖d‖ → 0
25
Descent Methods
• We know f′(x_k; d) = ∇f(x_k)^T d
• min_{d∈R^n} ∇f(x_k)^T d subject to ‖d‖ ≤ 1.
• The steepest descent search direction is obtained:
d_k = −∇f(x_k)/‖∇f(x_k)‖
• To find x_{k+1}, a line search is performed along d_k to obtain a step size t, from which the next point x_{k+1} = x_k + t d_k is computed, as in the sketch below
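A compact sketch of this scheme (illustrative only; the test function, backtracking rule, and tolerances are my own choices, not from the original slides):

```python
import numpy as np

def steepest_descent(f, grad, x0, tol=1e-8, max_iter=500):
    """Steepest descent with a simple backtracking (Armijo) line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:          # stationary: grad f(x) ~ 0
            break
        d = -g / np.linalg.norm(g)           # unit steepest descent direction
        t = 1.0
        # Backtrack until a sufficient decrease is achieved.
        while f(x + t * d) > f(x) + 1e-4 * t * g.dot(d):
            t *= 0.5
        x = x + t * d
    return x

# Illustrative smooth problem: f(x) = (x1 - 1)^2 + 2 x2^2, minimizer (1, 0).
f = lambda x: (x[0] - 1.0) ** 2 + 2.0 * x[1] ** 2
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 4.0 * x[1]])
print(steepest_descent(f, grad, [5.0, -3.0]))  # ~ [1, 0]
```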
26
Subgradient Algorithm
• Developed for minimizing convex functions
• min f(x) subject to x ∈ R^n
• Given x_0, it generates a sequence {x_k}_{k=0}^∞ according to
x_{k+1} = x_k − α_k v_k, v_k ∈ ∂f(x_k)
• A simple generalization of a descent method with line search
• However, the direction opposite to a subgradient is not necessarily a descent direction, so a line search cannot be used; see the sketch below
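A minimal sketch of the iteration (the test function |x1| + 2|x2| and the diminishing step rule α_k = 1/(k+1) are illustrative choices, not from the original slides):

```python
import numpy as np

def subgradient_method(f, subgrad, x0, n_iter=2000):
    """Subgradient method with diminishing steps; tracks the best point seen,
    since the iteration is not monotone (no line search is possible)."""
    x = np.asarray(x0, dtype=float)
    best_x, best_f = x.copy(), f(x)
    for k in range(n_iter):
        v = subgrad(x)
        x = x - (1.0 / (k + 1)) * v          # x_{k+1} = x_k - alpha_k v_k
        if f(x) < best_f:
            best_x, best_f = x.copy(), f(x)
    return best_x

# Illustrative convex nonsmooth problem: f(x) = |x1| + 2|x2|, minimizer 0.
f = lambda x: abs(x[0]) + 2.0 * abs(x[1])
# One valid element of the subdifferential (np.sign(0) = 0 is in [-1, 1]).
subgrad = lambda x: np.array([np.sign(x[0]), 2.0 * np.sign(x[1])])
print(subgradient_method(f, subgrad, [3.0, -2.0]))  # ~ [0, 0]
```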
27
Subgradient Algorithm
• Need not converge to a stationary point
• Special rules are used for the computation of the step size
• Theorem (Shor, N.Z.): let S∗ be the set of minimum points of f and let {x_k} be generated using the step size α_k := α/‖v_k‖. Then for any ε > 0 and any x∗ ∈ S∗, one can find an index k = k̄ and a point x̄ with
f(x̄) = f(x_{k̄}) and ‖x̄ − x∗‖ < α(1 + ε)/2
28
Bundle Method
• At the current iterate x_k, we have trial points y_j ∈ R^n (j ∈ J_k ⊂ {1, 2, . . . , k})
• Idea: underestimate f by a piecewise-linear function
• Subdifferential of f at x: ∂f(x) = {v ∈ R^n | (v, z − x) ≤ f(z) − f(x) ∀z ∈ R^n}
• Cutting-plane model with v_j ∈ ∂f(y_j): f̂_k(x) = max_{j∈J_k} {f(y_j) + (v_j, x − y_j)}
• f̂_k(x) ≤ f(x) ∀x ∈ R^n and f̂_k(y_j) = f(y_j), j ∈ J_k
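A small sketch of the cutting-plane model f̂_k (the function and trial points are illustrative, not from the original slides); each stored pair (y_j, v_j) with v_j ∈ ∂f(y_j) contributes one affine minorant:

```python
import numpy as np

f = lambda x: abs(x)                      # illustrative convex f on R
subgrad = lambda x: np.sign(x) if x != 0 else 0.0

ys = [-2.0, 0.5, 1.5]                     # trial points y_j in the bundle
bundle = [(y, f(y), subgrad(y)) for y in ys]

def f_hat(x):
    # Piecewise-linear underestimate: max_j f(y_j) + v_j (x - y_j).
    return max(fy + v * (x - y) for (y, fy, v) in bundle)

for x in [-1.0, 0.0, 0.5, 3.0]:
    assert f_hat(x) <= f(x) + 1e-12       # f_hat underestimates f
print(f_hat(0.5), f(0.5))                 # model is exact at trial points
```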
29
Bundle Method
• Serious step: x_{k+1} := y_{k+1} := x_k + t d_k, t > 0, in case a sufficient decrease is achieved at x_{k+1},
• Null step: x_{k+1} := x_k, in case no sufficient decrease is achieved; the gradient information is then enriched by adding the new subgradient v_{k+1} ∈ ∂f(y_{k+1}) to the bundle.
30
Bundle Method
• Standard concepts: serious step and null step
• The convergence problem is avoided by making sure that they are descent methods.
• A descent direction is found by solving a QP involving the cutting-plane approximation of the function over a bundle of subgradients.
• The information from previous iterations is utilized by storing the subgradient information in a bundle.
31
Asplund Spaces
• “Nonsmooth” usually refers to functions, but spaces can also be so described
• Banach spaces: complete normed vector spaces
• Fréchet derivative, Gâteaux derivative
• f is Fréchet differentiable on an open set U ⊂ V if its Gâteaux derivative is linear and bounded at each point of U and the Gâteaux derivative is a continuous map U → L(V, W).
• Asplund spaces: Banach spaces in which every convex continuous function is generically Fréchet differentiable
32
References
Clarke, F.H., 1983. Optimization and Nonsmooth Analysis, Wiley-Interscience, New York.
Demyanov, V.F., 2002. The Rise of Nonsmooth Analysis: Its Main Tools, Cybernetics and Systems Analysis, 38(4).
Jongen, H.Th., Pallaschke, D., 1988. On linearization and continuous selections of functions, Optimization, 19(3), 343-353.
Rockafellar, R.T., 1972. Convex Analysis, Princeton University Press, New Jersey.
Schittkowski, K., 1992. Solving nonlinear programming problems with very many constraints, Optimization, 25, 179-196.
33
Weber, G.-W., 1993. Minimization of a max-type function: Characterization of structural stability, in: Parametric Optimization and Related Topics III, J. Guddat, H.Th. Jongen, B. Kummer, and F. Nozicka, eds., Peter Lang, Frankfurt a.M., Bern, New York, pp. 519-538.