Nonsmooth Optimization
Transcript of Nonsmooth Optimization
1
Preliminaries
• R^n: the n-dimensional real Euclidean space, with x, y ∈ R^n
• Usual inner product: (x, y) = x^T y = ∑_{i=1}^n x_i y_i
• Euclidean norm: ‖x‖ = √(x, x) = (x^T x)^{1/2}
• f : O → R is smooth (continuously differentiable) if the gradient ∇f : O → R^n is defined and continuous on an open set O ⊆ R^n:
∇f(x) = (∂f(x)/∂x_1, ∂f(x)/∂x_2, . . . , ∂f(x)/∂x_n)^T
2
Smooth Functions - Directional Derivative
• Directional derivatives f′(x; u), f′(x; −u) of f at x ∈ O, in the direction u ∈ R^n:
f′(x; u) := lim_{α→+0} [f(x + αu) − f(x)]/α = (∇f(x), u)
• For the unit vectors e_i (i = 1, 2, . . . , n), the values f′(x; e_1), f′(x; e_2), . . . , f′(x; e_n) recover the partial derivatives:
(∇f(x), e_1) = f_{x_1}, (∇f(x), e_2) = f_{x_2}, . . . , (∇f(x), e_n) = f_{x_n}
• Note that f′(x; u) = −f′(x; −u).
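A minimal numeric sketch of this identity (the function, point, and direction below are illustrative choices, not from the original slides): a one-sided difference quotient for f′(x; u) should agree with (∇f(x), u) when f is smooth.

```python
import numpy as np

def directional_derivative(f, x, u, alpha=1e-7):
    # One-sided difference quotient approximating f'(x; u).
    return (f(x + alpha * u) - f(x)) / alpha

# Illustrative smooth function: f(x) = ||x||^2, with gradient 2x.
f = lambda x: np.dot(x, x)
x = np.array([1.0, 2.0])
u = np.array([0.0, 1.0])

print(directional_derivative(f, x, u))  # ~ 4.0
print(np.dot(2 * x, u))                 # (grad f(x), u) = 4.0 exactly
```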
3
Smooth Functions - 1st order approximation
• A first-order approximation of f near x ∈ O, by means of the Taylor expansion with remainder term:
f(x + δ) = f(x) + (∇f(x), δ) + o_x(δ)   (x + δ ∈ O),
• where lim_{α→0} o_x(αδ)/α = 0 and δ ∈ R^n is small enough.
• Thus a smooth function can be locally replaced by a “simple” linear approximation of it
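Ex (a standard illustration, not from the original slides): for f(x) = ‖x‖²,
f(x + δ) = ‖x‖² + 2(x, δ) + ‖δ‖²,
so o_x(δ) = ‖δ‖², and indeed o_x(αδ)/α = α‖δ‖² → 0 as α → 0.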
4
Smooth Functions - Optimality Conditions
First-order necessary conditions for an extremum:
• For x∗ ∈ O to be a local minimizer of f on R^n, it is necessary that ∇f(x∗) = 0_n,
• For x∗ ∈ O to be a local maximizer of f on R^n, it is necessary that ∇f(x∗) = 0_n.
5
Smooth Functions - Descent/Ascent Directions
Directions of steepest descent and ascent, if x is not a stationary point:
• the unit steepest descent direction u_d of the function f at a point x: u_d(x) = −∇f(x)/‖∇f(x)‖,
• the unit steepest ascent direction u_a of the function f at a point x: u_a(x) = ∇f(x)/‖∇f(x)‖.
• There is only one steepest descent direction and only one steepest ascent direction, and u_d(x) = −u_a(x)
6
Smooth Functions - Chain Rule
• Chain rule: let g : R^n → R, h : R^n → R^n, and f : R^n → R.
• If g, h ∈ C^1(O) and f(x) = g(h(x)), then ∇^T f(x) = ∇^T g(h(x)) ∇h(x),
• where ∇h(x) = [∂h_i(x)/∂x_j]_{i,j=1,2,...,n} is the n × n Jacobian matrix of h.
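Ex (illustrative, not from the original slides): for the linear inner function h(x) = Ax with a constant n × n matrix A, ∇h(x) = A, so f(x) = g(Ax) has
∇^T f(x) = ∇^T g(Ax) A, i.e., ∇f(x) = A^T ∇g(Ax).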
7
Nonsmooth Optimization
• Deals with nondifferentiable functions
• The problem is to find a proper replacement for the concept
of gradient
• Different research groups work on nonsmooth function classes; hence there are different theories to handle the different nonsmooth problems
• Tools replacing the gradient
8
Keywords of Nonsmooth Optimization
• Convex Functions, Lipschitz Continuous Functions
• Generalized directional derivatives, Generalized Derivatives
• Subgradient method, Bundle method, Discrete Gradient Algorithm
• Asplund Spaces
9
Convex Functions
• O ⊆ R^n is a nonempty convex set if αx + (1 − α)y ∈ O for all x, y ∈ O and α ∈ [0, 1]
• f : O → R̄, R̄ := [−∞, ∞], is convex if
f(λx + (1 − λ)y) ≤ λf(x) + (1 − λ)f(y)
for any x, y ∈ O, λ ∈ [0, 1].
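Ex (standard, not from the original slides): f(x) = ‖x‖ is convex on R^n, since the triangle inequality and homogeneity of the norm give
‖λx + (1 − λ)y‖ ≤ λ‖x‖ + (1 − λ)‖y‖.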
10
Convex Functions
• Every local minimum is a global minimum
• ξ is a subgradient of f at a (possibly nondifferentiable) point x ∈ dom f if it satisfies the subgradient inequality, i.e.,
f(y) ≥ f(x) + (ξ, y − x).
• The set of subgradients of f at x is called the subdifferential ∂f(x):
∂f(x) := {ξ ∈ R^n | f(y) ≥ f(x) + (ξ, y − x) ∀y ∈ R^n}.
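Ex (standard, not from the original slides): for f(x) = |x| on R, the subgradient inequality |y| ≥ |x| + ξ(y − x) yields
∂f(x) = {−1} for x < 0, ∂f(x) = {1} for x > 0, and ∂f(0) = [−1, 1].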
11
Convex Functions
• The subgradients at a point can be characterized by the directional derivative:
f′(x; u) = sup_{ξ∈∂f(x)} (ξ, u).
• For x in the interior of dom f, the subdifferential ∂f(x) is compact, and then the directional derivative is finite
• The subdifferential in relation to the directional derivative:
∂f(x) = {ξ ∈ R^n | f′(x; u) ≥ (ξ, u) ∀u ∈ R^n}.
12
Lipschitz Continuous Functions
• f : O → R is Lipschitz continuous with some constant K if for all y, z in the open set O: |f(y) − f(z)| ≤ K‖y − z‖
• Such functions are differentiable almost everywhere (Rademacher’s theorem)
• Clarke subdifferential ∂_C f(x) of a Lipschitz continuous f at x:
∂_C f(x) = co{ξ ∈ R^n | ξ = lim_{k→∞} ∇f(x_k), x_k → x, x_k ∈ D},
where D is the set of points at which f is differentiable.
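Ex (standard, not from the original slides): f(x) = |x| on R is Lipschitz continuous with K = 1; near 0 the gradients are ±1, so the attainable limits of gradients are {−1, 1} and
∂_C f(0) = co{−1, 1} = [−1, 1],
which coincides with the convex subdifferential of |x| at 0.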
13
Lipschitz Continuous Functions
• Mean value theorem for Clarke subdifferentials: for some point c in the open line segment between a and b and some ξ ∈ ∂_C f(c),
f(b) − f(a) = (ξ, b − a)
• Nonsmooth chain rule with respect to the Clarke subdifferential:
∂_C(g ◦ F)(x) ⊆ co{ ∑_{i=1}^m ξ_i µ_i | ξ = (ξ_1, ξ_2, . . . , ξ_m) ∈ ∂_C g(F(x)), µ_i ∈ ∂_C f_i(x) (i = 1, 2, . . . , m) }
• where F(·) = (f_1(·), f_2(·), . . . , f_m(·)) is a vector-valued function and g : R^m → R, g ◦ F : R^n → R are Lipschitz continuous
14
Regular Functions
• A locally Lipschitz f is regular at x if the directional derivative f′(x; u) exists and the Clarke directional derivative coincides with it: f′_C(x; u) = f′(x; u)
• Ex: semismooth functions: a locally Lipschitz f : R^n → R is semismooth at x ∈ R^n if for every u ∈ R^n the limit
lim_{ξ∈∂_C f(x+αv), v→u, α→+0} (ξ, u)
exists.
15
Max- and Min-type Functions
• f(x) = max{f_1(x), f_2(x), . . . , f_m(x)}, f_i : R^n → R (i = 1, 2, . . . , m)
• ∂_C f(x) ⊆ co ⋃_{i∈J(x)} ∂_C f_i(x), where J(x) := {i = 1, 2, . . . , m | f(x) = f_i(x)} is the active index set
• Ex: f(x) = max{f_1(x), f_2(x)}, as illustrated below
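For instance (a standard example, not from the original slides), take f(x) = max{x_1, x_2} on R^2 at a point with x_1 = x_2: both indices are active, ∇f_1(x) = e_1, ∇f_2(x) = e_2, and
∂_C f(x) = co{e_1, e_2} = {(λ, 1 − λ)^T | λ ∈ [0, 1]}.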
16
Quasidifferentiable Functions
• f : R^n → R is quasidifferentiable at x
if f′(x; u) exists finitely for every direction u and there exists a pair of compact convex sets [∂f(x), ∂̄f(x)] such that
f′(x; u) = max_{ξ∈∂f(x)} (ξ, u) + min_{φ∈∂̄f(x)} (φ, u)
• [∂f(x), ∂̄f(x)] is the quasidifferential, ∂f(x) the subdifferential, ∂̄f(x) the superdifferential
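Ex (standard, not from the original slides): f(x) = |x_1| − |x_2| on R^2 is quasidifferentiable at 0, since
f′(0; u) = |u_1| − |u_2| = max_{ξ∈co{−e_1, e_1}} (ξ, u) + min_{φ∈co{−e_2, e_2}} (φ, u),
so one may take ∂f(0) = co{−e_1, e_1} and ∂̄f(0) = co{−e_2, e_2}.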
17
Directional Derivatives
For f : O → R, O ⊂ R^n, x ∈ O, and a direction u ∈ R^n:
• Dini Directional Derivative
• Hadamard Directional Derivative
• Clarke Directional Derivative
• Michel-Penot Directional Derivative
18
Dini Directional Derivative
• upper Dini directional derivative:
f̄′_D(x; u) := lim sup_{α→+0} [f(x + αu) − f(x)]/α
• lower Dini directional derivative:
f̱′_D(x; u) := lim inf_{α→+0} [f(x + αu) − f(x)]/α
• f is Dini directionally differentiable at x in the direction u if f̄′_D(x; u) = f̱′_D(x; u)
19
Hadamard Directional Derivative
• upper Hadamard directional derivative:
f̄′_H(x; u) := lim sup_{v→u, α→+0} [f(x + αv) − f(x)]/α
• lower Hadamard directional derivative:
f̱′_H(x; u) := lim inf_{v→u, α→+0} [f(x + αv) − f(x)]/α
• f is Hadamard directionally differentiable at x in the direction u if f̄′_H(x; u) = f̱′_H(x; u)
20
Clarke Directional Derivative
• upper Clarke directional derivative:
f̄′_C(x; u) := lim sup_{y→x, α→+0} [f(y + αu) − f(y)]/α
• lower Clarke directional derivative:
f̱′_C(x; u) := lim inf_{y→x, α→+0} [f(y + αu) − f(y)]/α
• f is Clarke directionally differentiable at x in the direction u if f̄′_C(x; u) = f̱′_C(x; u)
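A standard contrast (not from the original slides): for f(x) = −|x| on R at x = 0 with u = 1, the ordinary directional derivative is f′(0; 1) = −1, whereas taking y < 0 in the Clarke quotient gives [f(y + α) − f(y)]/α = 1, so f̄′_C(0; 1) = 1; the Clarke derivative can be strictly larger, and this f is not regular at 0.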
21
Michel-Penot Directional Derivative
• upper Michel-Penot directional derivative:
f̄′_MP(x; u) := sup_{v∈R^n} lim sup_{α→+0} [f(x + α(u + v)) − f(x + αv)]/α
• lower Michel-Penot directional derivative:
f̱′_MP(x; u) := inf_{v∈R^n} lim inf_{α→+0} [f(x + α(u + v)) − f(x + αv)]/α
• f is Michel-Penot directionally differentiable at x in the direction u if f̄′_MP(x; u) = f̱′_MP(x; u)
22
Subdifferentials and Optimality Conditions
• f′(x; u) = max_{ξ∈∂f(x)} (ξ, u) ∀u ∈ R^n
• For a point x∗ to be a minimizer, it is necessary that 0_n ∈ ∂f(x∗)
• A point x∗ satisfying 0_n ∈ ∂f(x∗) is called a stationary point
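Ex (continuing the standard example above, not from the original slides): for f(x) = |x|, we have 0 ∈ [−1, 1] = ∂f(0), so x∗ = 0 is a stationary point; by convexity it is in fact the global minimizer.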
23
Nonsmooth Optimization Methods
• Subgradient Algorithm (and ε-Subgradient Methods)
• Bundle Methods
• Discrete Gradients
24
Descent Methods
• min f(x) subject to x ∈ R^n
• The objective is to find a direction d_k with f(x_k + d_k) < f(x_k),
• i.e., to consider min f(x_k + d) − f(x_k) subject to d ∈ R^n.
• For f(x) twice continuously differentiable, expanding f(x_k + d):
f(x_k + d) − f(x_k) = f′(x_k; d) + ‖d‖ε(d), where ε(d) → 0 as ‖d‖ → 0
25
Descent Methods
• We know f′(x_k; d) = ∇f(x_k)^T d
• min_{d∈R^n} ∇f(x_k)^T d subject to ‖d‖ ≤ 1.
• The steepest descent search direction is obtained:
d_k = −∇f(x_k)/‖∇f(x_k)‖
• To find x_{k+1}, a line search is performed along d_k to obtain a step size t, from which the next point x_{k+1} = x_k + t d_k is computed, as in the sketch below
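A compact sketch of this scheme (illustrative only; the test function, backtracking rule, and tolerances are my own choices, not from the original slides):

```python
import numpy as np

def steepest_descent(f, grad, x0, tol=1e-8, max_iter=500):
    """Steepest descent with a simple backtracking (Armijo) line search."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:          # stationary: grad f(x) ~ 0
            break
        d = -g / np.linalg.norm(g)           # unit steepest descent direction
        t = 1.0
        # Backtrack until a sufficient decrease is achieved.
        while f(x + t * d) > f(x) + 1e-4 * t * g.dot(d):
            t *= 0.5
        x = x + t * d
    return x

# Illustrative smooth problem: f(x) = (x1 - 1)^2 + 2 x2^2, minimizer (1, 0).
f = lambda x: (x[0] - 1.0) ** 2 + 2.0 * x[1] ** 2
grad = lambda x: np.array([2.0 * (x[0] - 1.0), 4.0 * x[1]])
print(steepest_descent(f, grad, [5.0, -3.0]))  # ~ [1, 0]
```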
26
Subgradient Algorithm
• Developed for minimizing convex functions
• min f(x) subject to x ∈ R^n
• Given x_0, it generates a sequence {x_k}_{k=0}^∞ according to
x_{k+1} = x_k − α_k v_k, v_k ∈ ∂f(x_k)
• A simple generalization of a descent method with line search
• However, the direction opposite to a subgradient is not necessarily a descent direction, so a line search cannot be used; see the sketch below
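A minimal sketch of the iteration (the test function |x1| + 2|x2| and the diminishing step rule α_k = 1/(k+1) are illustrative choices, not from the original slides):

```python
import numpy as np

def subgradient_method(f, subgrad, x0, n_iter=2000):
    """Subgradient method with diminishing steps; tracks the best point seen,
    since the iteration is not monotone (no line search is possible)."""
    x = np.asarray(x0, dtype=float)
    best_x, best_f = x.copy(), f(x)
    for k in range(n_iter):
        v = subgrad(x)
        x = x - (1.0 / (k + 1)) * v          # x_{k+1} = x_k - alpha_k v_k
        if f(x) < best_f:
            best_x, best_f = x.copy(), f(x)
    return best_x

# Illustrative convex nonsmooth problem: f(x) = |x1| + 2|x2|, minimizer 0.
f = lambda x: abs(x[0]) + 2.0 * abs(x[1])
# One valid element of the subdifferential (np.sign(0) = 0 is in [-1, 1]).
subgrad = lambda x: np.array([np.sign(x[0]), 2.0 * np.sign(x[1])])
print(subgradient_method(f, subgrad, [3.0, -2.0]))  # ~ [0, 0]
```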
27
Subgradient Algorithm
• Need not converge to a stationary point
• Special rules are used for the computation of the step size
• Theorem (Shor, N.Z.): let S∗ be the set of minimum points of f and let {x_k} be generated using the step size α_k := α/‖v_k‖. Then for any ε > 0 and any x∗ ∈ S∗, one can find an index k = k̄ and a point x̄ with
f(x̄) = f(x_{k̄}) and ‖x̄ − x∗‖ < α(1 + ε)/2
28
Bundle Method
• At the current iterate x_k, we have trial points y_j ∈ R^n (j ∈ J_k ⊂ {1, 2, . . . , k})
• Idea: underestimate f by a piecewise-linear function
• Subdifferential of f at x: ∂f(x) = {v ∈ R^n | (v, z − x) ≤ f(z) − f(x) ∀z ∈ R^n}
• Cutting-plane model with v_j ∈ ∂f(y_j): f̂_k(x) = max_{j∈J_k} {f(y_j) + (v_j, x − y_j)}
• f̂_k(x) ≤ f(x) ∀x ∈ R^n and f̂_k(y_j) = f(y_j), j ∈ J_k
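A small sketch of the cutting-plane model f̂_k (the function and trial points are illustrative, not from the original slides); each stored pair (y_j, v_j) with v_j ∈ ∂f(y_j) contributes one affine minorant:

```python
import numpy as np

f = lambda x: abs(x)                      # illustrative convex f on R
subgrad = lambda x: np.sign(x) if x != 0 else 0.0

ys = [-2.0, 0.5, 1.5]                     # trial points y_j in the bundle
bundle = [(y, f(y), subgrad(y)) for y in ys]

def f_hat(x):
    # Piecewise-linear underestimate: max_j f(y_j) + v_j (x - y_j).
    return max(fy + v * (x - y) for (y, fy, v) in bundle)

for x in [-1.0, 0.0, 0.5, 3.0]:
    assert f_hat(x) <= f(x) + 1e-12       # f_hat underestimates f
print(f_hat(0.5), f(0.5))                 # model is exact at trial points
```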
29
Bundle Method
• Serious step: x_{k+1} := y_{k+1} := x_k + t d_k, t > 0, in case a sufficient decrease is achieved at x_{k+1},
• Null step: x_{k+1} := x_k, in case no sufficient decrease is achieved; the gradient information is then enriched by adding the new subgradient v_{k+1} ∈ ∂f(y_{k+1}) to the bundle.
30
Bundle Method
• Standard concepts: serious step and null step
• The convergence problem is avoided by making sure that they are descent methods.
• A descent direction is found by solving a QP involving the cutting-plane approximation of the function over a bundle of subgradients.
• The information from previous iterations is utilized by storing the subgradient information in a bundle.
31
Asplund Spaces
• “Nonsmooth” usually refers to functions, but spaces can also be so described
• Banach spaces: complete normed vector spaces
• Fréchet derivative, Gâteaux derivative
• f is Fréchet differentiable on an open set U ⊂ V if its Gâteaux derivative is linear and bounded at each point of U and the Gâteaux derivative is a continuous map U → L(V, W).
• Asplund spaces: Banach spaces in which every convex continuous function is generically Fréchet differentiable
32
References
Clarke, F.H., 1983. Optimization and Nonsmooth Analysis, Wiley-Interscience, New York.
Demyanov, V.F., 2002. The Rise of Nonsmooth Analysis: Its Main Tools, Cybernetics and Systems Analysis, 38(4).
Jongen, H.Th., Pallaschke, D., 1988. On linearization and continuous selections of functions, Optimization, 19(3), 343-353.
Rockafellar, R.T., 1972. Convex Analysis, Princeton University Press, New Jersey.
Schittkowski, K., 1992. Solving nonlinear programming problems with very many constraints, Optimization, 25, 179-196.
33
Weber, G.-W., 1993. Minimization of a max-type function: Characterization of structural stability, in: Parametric Optimization and Related Topics III, J. Guddat, H.Th. Jongen, B. Kummer, and F. Nozicka, eds., Peter Lang, Frankfurt a.M., Bern, New York, pp. 519-538.