EE 546, Univ of Washington, Spring 2016
4. Subgradients
• definition
• subgradient calculus
• optimality conditions via subgradients
• directional derivative
4–1
Basic inequality
recall the basic inequality for convex differentiable f:
f(y) ≥ f(x) +∇f(x)T (y − x)
• first-order approximation of f at x is global lower bound
• ∇f(x) defines non-vertical supporting hyperplane to epi f at (x, f(x))
[ ∇f(x); −1 ]T ( [ y; t ] − [ x; f(x) ] ) ≤ 0 ∀(y, t) ∈ epi f
(epi f denotes the epigraph of f). what if f is not differentiable?
4–2
Subgradient of a function
definition: g is a subgradient of a convex function f at x ∈ dom f if
f(y) ≥ f(x) + gT (y − x) ∀y ∈ dom f
x1 x2
f(x1) + gT1 (x − x1)
f(x2) + gT2 (x − x2)
f(x2) + gT3 (x − x2)
f(x)
g2, g3 are subgradients at x2; g1 is a subgradient at x1
4–3
properties
• f(x) + gT (y − x) is a global lower bound on f
• g defines non-vertical supporting hyperplane to epi f at (x, f(x))
[ g; −1 ]T ( [ y; t ] − [ x; f(x) ] ) ≤ 0 ∀(y, t) ∈ epi f
• if f is convex and differentiable, then ∇f(x) is a subgradient of f at x
applications
• algorithms for nondifferentiable convex optimization
• optimality conditions, duality for nondifferentiable problems
4–4
Example
f(x) = max{f1(x), f2(x)}, with f1, f2 convex and differentiable; x ∈ R

(figure: graphs of f1 and f2, crossing at the point x0)
• subgradients at x0 form line segment [∇f1(x0),∇f2(x0)]
• if f1(x) > f2(x), subgradient of f at x is ∇f1(x)
• if f1(x) < f2(x), subgradient of f at x is ∇f2(x)
4–5
Subdifferential
subdifferential of f at x ∈ dom f is the set of all subgradients of f at x
notation: ∂f(x)
properties
• ∂f(x) is a closed convex set (possibly empty)
proof: ∂f(x) is an intersection of halfspaces
∂f(x) = { g | f(x) + gT (y − x) ≤ f(y) ∀y ∈ dom f }
• if x ∈ int dom f then ∂f(x) is nonempty and bounded
4–6
Examples
absolute value f(x) = |x|
(figure: graph of f(x) = |x| and of ∂f(x), which is {−1} for x < 0, the interval [−1, 1] at x = 0, and {1} for x > 0)
Euclidean norm f(x) = ‖x‖2
∂f(x) = { x/‖x‖2 } if x ≠ 0, ∂f(x) = { g | ‖g‖2 ≤ 1 } if x = 0
4–7
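a small numeric sketch of the Euclidean-norm formula above (function names are my own; a runnable check, not part of the original slides):

```python
import numpy as np

def subgrad_l2(x):
    # x/||x||_2 if x != 0; at x = 0 any g with ||g||_2 <= 1 works, e.g. g = 0
    n = np.linalg.norm(x)
    return x / n if n > 0 else np.zeros_like(x)

# spot-check the subgradient inequality f(y) >= f(x) + g^T (y - x)
rng = np.random.default_rng(0)
for x in (np.array([3.0, 4.0]), np.zeros(2)):
    g = subgrad_l2(x)
    for _ in range(100):
        y = rng.normal(size=2)
        assert np.linalg.norm(y) >= np.linalg.norm(x) + g @ (y - x) - 1e-12
```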
Monotonicity
subdifferential of a convex function is a monotone operator:
(u− v)T (x− y) ≥ 0 ∀u ∈ ∂f(x), v ∈ ∂f(y)
proof: by definition
f(y) ≥ f(x) + uT (y − x), f(x) ≥ f(y) + vT (x− y)
combining the two inequalities shows monotonicity
4–8
Examples of non-subdifferentiable functions
the following functions are not subdifferentiable at x = 0
• f : R → R, dom f = R+
f(x) = 1 if x = 0, f(x) = 0 if x > 0
• f : R → R, dom f = R+
f(x) = −√x
the only supporting hyperplane to epi f at (0, f(0)) is vertical
4–9
Subgradients and sublevel sets
if g is a subgradient of f at x, then
f(y) ≤ f(x) =⇒ gT (y − x) ≤ 0
(figure: sublevel set {y | f(y) ≤ f(x0)}, with a subgradient g at x0 and the gradient ∇f(x1) at x1 normal to its boundary)
nonzero subgradients at x define supporting hyperplanes to sublevel set
{y | f(y) ≤ f(x)}
4–10
Outline
• definition
• subgradient calculus
• optimality conditions via subgradients
• directional derivative
4–11
Subgradient calculus
weak subgradient calculus: rules for finding one subgradient
• sufficient for many algorithms for nondifferentiable convex optimization
• if you can evaluate f(x), you can usually compute a subgradient
strong subgradient calculus: rules for finding ∂f(x) (all subgradients)
• some algorithms, optimality conditions, etc., need whole subdifferential
• can be quite complicated
we will assume that x ∈ int dom f
4–12
Some basic rules
(suppose all fi are convex unless otherwise stated)
differentiable functions: ∂f(x) = {∇f(x)} if f is differentiable at x
nonnegative combination
if h(x) = α1f1(x) + α2f2(x) with α1, α2 ≥ 0, then
∂h(x) = α1∂f1(x) + α2∂f2(x)
(r.h.s. is addition of sets)
affine transformation of variables: if h(x) = f(Ax+ b), then
∂h(x) = AT∂f(Ax+ b)
4–13
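a sketch of the affine-transformation rule in use (the data A, b and the choice f = ‖·‖1 are my own toy example): a subgradient s of f at Ax + b pulls back to AT s at x.

```python
import numpy as np

# weak rule for h(x) = ||Ax + b||_1:
# s = sign(Ax + b) is a subgradient of ||.||_1 at Ax + b, so A^T s ∈ ∂h(x)
def subgrad_affine_l1(A, b, x):
    return A.T @ np.sign(A @ x + b)

A = np.array([[1.0, 2.0], [0.0, -1.0]])
b = np.array([0.5, -0.5])
x = np.array([1.0, 1.0])
g = subgrad_affine_l1(A, b, x)

h = lambda z: np.abs(A @ z + b).sum()
rng = np.random.default_rng(1)
for _ in range(200):
    y = x + rng.normal(size=2)
    assert h(y) >= h(x) + g @ (y - x) - 1e-12
```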
Pointwise maximum
f(x) = max{f1(x), . . . , fm(x)}
define I(x) = {i | fi(x) = f(x)}, the ‘active’ functions at x
weak result: to compute a subgradient at x,
choose any k ∈ I(x), and any subgradient of fk at x
strong result
∂f(x) = conv ⋃_{i∈I(x)} ∂fi(x)
• convex hull of the union of subdifferentials of ‘active’ functions at x
• if fi’s are differentiable, ∂f(x) = conv{∇fi(x) | i ∈ I(x)}
4–14
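the weak rule above can be sketched in a few lines (the two functions are my own toy example; any active gradient serves as a subgradient of the max):

```python
import numpy as np

# f = max(f1, f2); return the gradient of any active function
fs    = [lambda z: z[0]**2 + z[1]**2,          lambda z: 2*z[0] + 1]
grads = [lambda z: np.array([2*z[0], 2*z[1]]), lambda z: np.array([2.0, 0.0])]

def subgrad_max(x):
    k = int(np.argmax([f(x) for f in fs]))   # an index in I(x)
    return grads[k](x)

fmax = lambda z: max(f(z) for f in fs)
x = np.array([1.0, 0.0])
g = subgrad_max(x)                           # f2 is active here, so g = (2, 0)
rng = np.random.default_rng(2)
for _ in range(200):
    y = x + rng.normal(size=2)
    assert fmax(y) >= fmax(x) + g @ (y - x) - 1e-12
```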
example: f(x) = max_{i=1,...,m} (aiT x + bi)

the subdifferential is a polyhedron: ∂f(x) = conv{ai | i ∈ I(x)}

example: f(x) = ‖x‖1 = max_{s∈{−1,1}^n} sT x

(figure, for n = 2: ∂f(0, 0) is the square [−1, 1] × [−1, 1], ∂f(1, 0) is the segment {1} × [−1, 1], and ∂f(1, 1) is the single point (1, 1))
4–15
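a spot-check of the subdifferential of the ℓ1 norm at (1, 0), which should be the segment {1} × [−1, 1] (a quick numeric sketch, not from the slides):

```python
import numpy as np

f = lambda z: np.abs(z).sum()
x = np.array([1.0, 0.0])
rng = np.random.default_rng(3)
# every g = (1, t) with |t| <= 1 should satisfy the subgradient inequality at (1, 0)
for t in np.linspace(-1.0, 1.0, 9):
    g = np.array([1.0, t])
    for _ in range(50):
        y = rng.normal(size=2)
        assert f(y) >= f(x) + g @ (y - x) - 1e-12
```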
Pointwise supremum
f(x) = sup_{α∈A} fα(x), with fα(x) convex for every α
weak result: to find a subgradient at x,
• find any β for which f(x) = fβ(x) (assuming supremum is achieved)
• choose any g ∈ ∂fβ(x)
(partial) strong result: define I(x) = {α ∈ A | fα(x) = f(x)}
conv ⋃_{α∈I(x)} ∂fα(x) ⊆ ∂f(x)
equality requires some technical conditions
4–16
Example: maximum eigenvalue
f(x) = λmax(A(x)) = sup_{‖y‖2=1} yT A(x) y

where A(x) = A0 + x1A1 + · · · + xnAn, Ai ∈ Sk
how to find one subgradient at x?
• choose any unit eigenvector y associated with λmax(A(x))
• the gradient of yTA(x)y at x is a subgradient of f :
(yTA1y, . . . , yTAny) ∈ ∂f(x)
similarly can find a subgradient of ‖A(x)‖ = σmax(A(x))
4–17
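the recipe above is easy to implement with a symmetric eigendecomposition (a sketch; the matrices and function name are my own):

```python
import numpy as np

# one subgradient of f(x) = λmax(A0 + x1 A1 + ... + xn An), per the recipe above
def lam_max_subgrad(A0, As, x):
    A = A0 + sum(xi * Ai for xi, Ai in zip(x, As))
    w, V = np.linalg.eigh(A)          # eigenvalues in ascending order
    y = V[:, -1]                      # unit eigenvector for λmax
    return w[-1], np.array([y @ Ai @ y for Ai in As])

A0 = np.diag([1.0, 2.0])
As = [np.diag([1.0, 0.0]), np.diag([0.0, 1.0])]
x0 = np.array([0.0, 0.0])
v0, g = lam_max_subgrad(A0, As, x0)

rng = np.random.default_rng(4)
for _ in range(100):                  # f is a pointwise sup of functions affine in x
    x1 = rng.normal(size=2)
    v1, _ = lam_max_subgrad(A0, As, x1)
    assert v1 >= v0 + g @ (x1 - x0) - 1e-9
```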
Expectation
f(x) = Eh(x, u)
with h convex in x for each u, expectation is over random variable u
weak result: to find a subgradient at x
• for each u, choose any gu ∈ ∂xh(x, u) (so u 7→ gu is a function)
• then, g = E gu ∈ ∂f(x)
proof: by convexity of h and definition of gu,
h(y, u) ≥ h(x, u) + gTu (y − x) ∀y
therefore
f(y) = Eh(y, u) ≥ Eh(x, u) +E gTu (y − x) = f(x) + gT (y − x)
(will use this in stochastic gradient methods)
4–18
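a Monte Carlo sketch of the expectation rule, with h(x, u) = |x − u| and u standard normal (my own example; for this h, E sign(x − u) has the closed form erf(x/√2)):

```python
import numpy as np
from math import erf, sqrt

# g_u = sign(x - u) is a subgradient of h(x, u) = |x - u| in x; g = E g_u
rng = np.random.default_rng(5)
u = rng.normal(size=100_000)
x = 0.5
g_hat = np.mean(np.sign(x - u))     # sample average approximates E g_u
expected = erf(x / sqrt(2.0))       # closed form of E sign(x - u) for u ~ N(0, 1)
```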
Optimal value function
define h(u) as the optimal value of

minimize f0(x)
subject to fi(x) ≤ ui, i = 1, . . . , m

(fi convex; variable x)

if strong duality holds and λ is an optimal dual variable for the problem with right-hand side u, then for all ũ

h(ũ) ≥ h(u) − ∑_{i=1}^m λi(ũi − ui)

i.e., −λ is a subgradient of h at u
4–19
Composition
f(x) = h(f1(x), . . . , fk(x))
with h convex nondecreasing, fi convex
weak result: to find a subgradient at x,
• find z ∈ ∂h(f1(x), . . . , fk(x)) and gi ∈ ∂fi(x)
• then g = z1g1 + · · ·+ zkgk ∈ ∂f(x)
reduces to standard formula for differentiable h, fi
proof:
f(y) = h(f1(y), . . . , fk(y))
≥ h(f1(x) + gT1 (y − x), . . . , fk(x) + gTk (y − x))
≥ h(f1(x), . . . , fk(x)) + zT (gT1 (y − x), . . . , gTk (y − x))
= f(x) + gT (y − x)
4–20
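a sketch of the composition rule with h = log-sum-exp, which is convex and nondecreasing in each argument (the inner functions are my own toy choices; ∇h is the softmax):

```python
import numpy as np

f1 = lambda z: np.abs(z[0])                    # convex; subgradient (sign(z1), 0)
f2 = lambda z: z[0]**2 + z[1]**2               # convex; gradient (2 z1, 2 z2)
g1 = lambda z: np.array([np.sign(z[0]), 0.0])
g2 = lambda z: np.array([2*z[0], 2*z[1]])
f  = lambda z: np.log(np.exp(f1(z)) + np.exp(f2(z)))

def subgrad_comp(x):
    v = np.array([f1(x), f2(x)])
    z = np.exp(v - v.max()); z /= z.sum()      # ∇h = softmax (numerically stable)
    return z[0]*g1(x) + z[1]*g2(x)

x = np.array([1.0, -0.5])
g = subgrad_comp(x)
rng = np.random.default_rng(6)
for _ in range(200):
    y = x + rng.normal(size=2)
    assert f(y) >= f(x) + g @ (y - x) - 1e-9
```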
Outline
• definition
• subgradient calculus
• optimality conditions via subgradients
• directional derivative
4–21
Optimality conditions — unconstrained
x⋆ minimizes f(x) if and only if
0 ∈ ∂f(x⋆)
(figure: a nondifferentiable convex f; at the minimizer x⋆ the subdifferential contains 0, i.e. 0 ∈ ∂f(x⋆))
proof: by definition
f(y) ≥ f(x⋆) + 0T (y − x⋆) for all y ⇐⇒ 0 ∈ ∂f(x⋆)
4–22
Example: piecewise linear minimization
f(x) = max_{i=1,...,m} (aiT x + bi)

optimality condition

0 ∈ conv{ai | i ∈ I(x⋆)} (I(x) = {i | aiT x + bi = f(x)})

in other words, there is a λ with

λ ⪰ 0, 1Tλ = 1, ∑_{i=1}^m λiai = 0, λi = 0 for i ∉ I(x⋆)

these are the KKT conditions for the equivalent LP and its dual:

minimize t
subject to Ax + b ⪯ t1

maximize bTλ
subject to ATλ = 0, λ ⪰ 0, 1Tλ = 1
4–23
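a tiny numeric illustration of the optimality condition (the problem data are my own: f(x) = max(x1, −x1, x2, −x2) = ‖x‖∞, minimized at the origin):

```python
import numpy as np

A = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.zeros(4)                        # f(x) = max_i (a_i^T x + b_i) = ||x||_inf
xstar = np.zeros(2)

vals = A @ xstar + b
active = np.isclose(vals, vals.max())  # I(x*): all four rows are active at 0
lam = active / active.sum()            # uniform convex weights on the active a_i

assert np.allclose(A.T @ lam, 0.0)     # 0 ∈ conv{a_i | i ∈ I(x*)}
assert np.isclose(lam.sum(), 1.0) and np.all(lam >= 0)
```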
Optimality conditions — constrained
minimize f0(x)
subject to fi(x) ≤ 0, i = 1, . . . , m
from Lagrange duality
if strong duality holds, then x⋆, λ⋆ are primal, dual optimal if and only if
1. x⋆ is primal feasible
2. λ⋆ � 0
3. λ⋆i fi(x⋆) = 0 for i = 1, . . . , m

4. x⋆ is a minimizer of

L(x, λ⋆) = f0(x) + ∑_{i=1}^m λ⋆i fi(x)
4–24
Karush-Kuhn-Tucker conditions (if dom fi = Rn)
conditions 1, 2, 3 and
0 ∈ ∂xL(x⋆, λ⋆) = ∂f0(x⋆) + ∑_{i=1}^m λ⋆i ∂fi(x⋆)

this generalizes the condition

0 = ∇f0(x⋆) + ∑_{i=1}^m λ⋆i ∇fi(x⋆)
for differentiable fi
4–25
Outline
• definition
• subgradient calculus
• optimality conditions via subgradients
• directional derivative
4–26
Directional derivative
the directional derivative of f at x in the direction y is defined as
f′(x; y) = lim_{α↘0} (f(x + αy) − f(x)) / α
(if the limit exists)
properties (for convex f)
• if x ∈ int dom f , then f ′(x; y) exists for all y
• homogeneous in y: f ′(x;λy) = λf ′(x; y) for λ ≥ 0
y is a descent direction for f at x if f ′(x; y) < 0
4–27
Descent directions and subgradients
• if f is differentiable, then −∇f(x) is a descent direction (if ∇f(x) ≠ 0)
• if f is nondifferentiable, then −g, with g ∈ ∂f(x), is not always a descent direction
example: f(x1, x2) = |x1|+ 2|x2|
g = (1, 2) ∈ ∂f(1, 0), but y = (−1,−2) is not a descent direction at (1, 0)
4–28
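the counterexample above is easy to verify numerically (a sketch):

```python
import numpy as np

f = lambda z: abs(z[0]) + 2*abs(z[1])
x = np.array([1.0, 0.0])
g = np.array([1.0, 2.0])              # g ∈ ∂f(1, 0), per the slide

# moving along -g strictly increases f for every small t > 0:
# f(x - t g) = |1 - t| + 4t = 1 + 3t for 0 < t < 1
for t in (1e-4, 1e-3, 1e-2):
    assert f(x - t*g) > f(x)
```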
Directional derivative for convex f
equivalent definition: can replace lim with inf
f′(x; y) = inf_{α>0} (f(x + αy) − f(x)) / α = inf_{t>0} ( t f(x + (1/t)y) − t f(x) )
proof:
• the function h(y) = f(x + y) − f(x) is convex in y with h(0) = 0
• its perspective th(y/t) is nonincreasing in t (an exercise from EE578), hence

f′(x; y) = lim_{t→∞} th(y/t) = inf_{t>0} th(y/t)
4–29
consequences of expressions
f′(x; y) = inf_{α>0} (f(x + αy) − f(x)) / α = inf_{t>0} ( t f(x + (1/t)y) − t f(x) )
• f ′(x; y) is convex in y (partial minimization of convex fct in y, t)
• f ′(x; y) defines a lower bound on f in the direction y:
f(x+ αy) ≥ f(x) + αf ′(x; y) for α ≥ 0
4–30
Directional derivative and subdifferential
for convex f and x ∈ int dom f
f′(x; y) = sup_{g∈∂f(x)} gTy

(figure: the subdifferential ∂f(x) and a direction y)
• generalizes f ′(x; y) = ∇f(x)Ty for differentiable functions
• the directional derivative is the support function of the subdifferential
4–31
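a numeric sketch of the support-function identity for f = ‖·‖1 at x = 0, where ∂f(0) is the unit ∞-norm ball and the support function of that ball is ‖y‖1:

```python
import numpy as np

# directional derivative of ||.||_1 at 0 equals ||y||_1 = sup_{||g||_inf <= 1} g^T y
rng = np.random.default_rng(7)
for _ in range(20):
    y = rng.normal(size=3)
    alpha = 1e-6
    dd = np.abs(alpha * y).sum() / alpha      # (f(0 + alpha y) - f(0)) / alpha
    assert np.isclose(dd, np.abs(y).sum())
```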
proof
1. suppose g ∈ ∂f(x), i.e., f(x + αy) ≥ f(x) + αgTy for all α, y; then

f′(x; y) = lim_{α↘0} (f(x + αy) − f(x)) / α ≥ gTy

this shows that f′(x; y) ≥ sup_{g∈∂f(x)} gTy

2. suppose g ∈ ∂yf′(x; y); then for all v and all λ ≥ 0,

λf′(x; v) = f′(x; λv) ≥ f′(x; y) + gT(λv − y)

taking λ → ∞ we get f′(x; v) ≥ gTv, and therefore

f(x + v) ≥ f(x) + f′(x; v) ≥ f(x) + gTv

this means that g ∈ ∂f(x), and from part 1, f′(x; y) ≥ gTy

taking λ = 0 gives f′(x; y) ≤ gTy, so f′(x; y) = gTy and the supremum is attained
4–32
Subgradients and distance to sublevel sets
if f is convex, f(y) < f(x), g ∈ ∂f(x), then for small t > 0,
‖x− tg − y‖2 < ‖x− y‖2
• −g is descent direction for ‖x− y‖2, for any y with f(y) < f(x)
• negative subgradient is descent direction for distance to optimal point
proof:
‖x − tg − y‖2² = ‖x − y‖2² − 2t gT(x − y) + t² ‖g‖2²
≤ ‖x − y‖2² − 2t (f(x) − f(y)) + t² ‖g‖2²
< ‖x − y‖2² (for sufficiently small t > 0)
4–33
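the step-size claim can be checked directly (a sketch with my own f, minimized at the origin):

```python
import numpy as np

f = lambda z: abs(z[0]) + 2*abs(z[1])             # minimized at y = 0
subgrad = lambda z: np.array([np.sign(z[0]), 2*np.sign(z[1])])

x = np.array([1.0, 0.5])
y = np.zeros(2)                                   # a point with f(y) < f(x)
g = subgrad(x)
t = (f(x) - f(y)) / (g @ g)                       # inside (0, 2(f(x)-f(y))/||g||^2)
assert np.linalg.norm(x - t*g - y) < np.linalg.norm(x - y)
```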
References
• L. Vandenberghe, Lecture notes for EE236C - Optimization Methods for
Large-Scale Systems (Spring 2011 and Spring 2014), UCLA.
• J.-B. Hiriart-Urruty, C. Lemarechal, Convex Analysis and Minimization
Algorithms (1993), chapter VI.
• Yu. Nesterov, Introductory Lectures on Convex Optimization. A Basic
Course (2004), section 3.1.
4–34