Stochastic Quasi-Gradient Methods
Roger J-B Wets
University of California, Davis
February 15, 2005
Stochastic Optimization: Formulation

min E{f(ξ,x)} so that x ∈ S ⊂ ℝ^n,

E{f(ξ,x)} = ∫_{Ξ⊂ℝ^N} f(ξ,x) P(dξ) = Ef(x)

Properties:
- ξ ↦ f(ξ,x) is measurable (continuous)
- x ↦ f(ξ,x) is lsc and convex ⇒ Ef(·) is convex
- S is a closed convex set
Subgradients of Convex Functions

v(ξ) ∈ ∂_x f(ξ,x̄) ⇔ ∀x: f(ξ,x) ≥ f(ξ,x̄) + ⟨v(ξ), x − x̄⟩
v(ξ) = ∇_x f(ξ,x̄) ⇔ ∂_x f(ξ,x̄) is a singleton
Minimization Algorithms

Step type 1:
x^{ν+1} = x^ν − λ_ν v̂, i.e., a step along x^ν + λ(−v̂), with v̂ ∈ ∂Ef(x^ν), λ ≥ 0.

Step type 2 (projected step, "repeated" projections onto S):
x^{ν+1} = argmin{ dist²(x^ν − λ_ν v^ν(ξ), x) : x ∈ S }
The projection is a convex program with a quadratic objective function (a quadratic program if S is a polyhedral set). In many applications,
S = [l,u] ∩ { x : Σ_{j=1}^n a_j(x_j) ≤ β },
with the a_j non-negative, convex and bounded away from 0; the projection is then simple/efficient.
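As a concrete illustration of such a projection (my sketch, not from the slides), take the linear case a_j(x_j) = a_j·x_j, so S = {x : l ≤ x ≤ u, aᵀx ≤ β}. The KKT conditions reduce the projection to a one-dimensional search for the multiplier of the linear constraint, which bisection handles; all names here are hypothetical.

```python
import numpy as np

def project(y, l, u, a, beta, iters=60):
    """Project y onto S = {x : l <= x <= u, a @ x <= beta}.

    Assumes a > 0 (as on the slide: coefficients bounded away from 0)
    and that S is nonempty. By KKT, the solution is
    x(mu) = clip(y - mu*a, l, u) for some mu >= 0, and
    g(mu) = a @ x(mu) is nonincreasing, so bisection on mu works.
    """
    x = np.clip(y, l, u)
    if a @ x <= beta:                 # linear constraint inactive:
        return x                      # box projection already suffices
    lo, hi = 0.0, 1.0
    while a @ np.clip(y - hi * a, l, u) > beta:   # bracket the multiplier
        hi *= 2.0
    for _ in range(iters):            # bisection to complementary slackness
        mid = 0.5 * (lo + hi)
        if a @ np.clip(y - mid * a, l, u) > beta:
            lo = mid
        else:
            hi = mid
    return np.clip(y - hi * a, l, u)

x = project(np.array([3.0, 3.0]), 0.0, 10.0, np.array([1.0, 1.0]), 4.0)
print(x)  # projects [3, 3] onto {x in [0,10]^2 : x1 + x2 <= 4} -> [2, 2]
```

For a polyhedral S beyond this special structure one would instead call a generic QP solver, which is exactly why the slide emphasizes sets where the projection stays cheap.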
SQG Iterates

Basic strategy:
x^{ν+1} = proj_S(x^ν − λ_ν ς^ν), with −ς^ν an estimated descent direction,
λ_ν ↓ 0 as ν → ∞.

ς^ν is a stochastic quasi-gradient:
Ef(x*) ≥ Ef(x^ν) + ⟨E{ς^ν | x^0,…,x^ν}, x* − x^ν⟩ + γ_ν,
"asymptotically": E{ς^ν | x^0,…,x^ν} ∈ ∂Ef(x^ν)
SQG: Stochastic Optimization — sqg

min E{f(ξ,x)} so that x ∈ S ⊂ ℝ^n

v^ν(ξ^s) ∈ ∂f(ξ^s, x^ν) or = ∇f(ξ^s, x^ν), ξ^s a sample of the random vector ξ, or

v^ν(ξ^s) = (1/κ) Σ_{l=1}^κ v^{νl}(ξ^{s_l}), with v^{νl}(ξ^{s_l}) ∈ ∂f(ξ^{s_l}, x^ν)

Justification: ∂Ef(x) = ∫_Ξ ∂f(ξ,x) P(dξ) (generally)
SQG: Stochastic Optimization — value estimate

η^ν(ξ^s) = f(ξ^s, x^ν), ξ^s a sample, or

η^ν(ξ^s) = (1/κ) Σ_{l=1}^κ f(ξ^{s_l}, x^ν)

Justification: Ef(x^ν) = E{f(ξ, x^ν)} and the Law of Large Numbers.
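A quick Monte-Carlo check of this value estimate (my illustration, not from the slides): for f(ξ,x) = ½(x − ξ)² with ξ ~ N(2, 1), the exact value is Ef(x) = ½((x − 2)² + 1), and the sample average converges to it by the Law of Large Numbers.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(xi, x):
    # illustrative integrand f(xi, x) = 0.5 * (x - xi)^2
    return 0.5 * (x - xi) ** 2

x = 2.0
kappa = 100_000
samples = rng.normal(2.0, 1.0, size=kappa)   # xi^{s_1}, ..., xi^{s_kappa}
eta = f(samples, x).mean()                   # (1/kappa) * sum_l f(xi^{s_l}, x)

exact = 0.5 * ((x - 2.0) ** 2 + 1.0)         # Ef(x) in closed form: 0.5 here
print(eta, exact)
```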
A (Simple) Location Problem

Population size of the 12 districts: 11 to 26. Decision: location (sizes) of facilities (shopping malls). Shortage cost: 4; holding cost: 0.5 (excess).

Probabilistic choice of shopping district:
p_ij = e^{−λ c_ij} / Σ_{k=1}^{12} e^{−λ c_ik} = probability of going from district i to shop in district j,
λ = 0.1, c_ij: distance & other factors.

"Preferences" table c_ij (excerpt):
0 1 3 4 6 7 8 …
2 0 1 1 3 5 5 …
7 1 0 1 2 6 5 4 …
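The choice probabilities p_ij follow directly from a row of the c_ij table; a sketch with λ = 0.1, using only the visible entries of the first row above (the slide truncates the row, so this is purely illustrative):

```python
import numpy as np

lam = 0.1
c_i = np.array([0.0, 1.0, 3.0, 4.0, 6.0, 7.0, 8.0])  # visible entries of row i of c_ij

w = np.exp(-lam * c_i)          # e^{-lam * c_ij}
p_i = w / w.sum()               # p_ij = e^{-lam * c_ij} / sum_k e^{-lam * c_ik}

print(p_i)  # nearer (cheaper) districts get higher probability
```

This is the standard logit choice model: with λ = 0.1 the probabilities decay gently with distance, so demand spreads across several districts rather than concentrating on the nearest one.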
Formulation

The probability of a sample is determined by customer behavior (c_ij → p_ij). Objective:

min E{ Σ_{i=1}^{12} f_i(ξ_i, x_i) }

f_i(ξ_i, x_i) = max[ q_s(ξ_i − x_i), q_e(x_i − ξ_i) ],  ξ = (ξ_1,…,ξ_12)

∂f_i(ξ_i, x_i) ∋ −q_s if ξ_i ≥ x_i; = q_e otherwise
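The per-district cost and its subgradient translate directly into code; a sketch (my own, with the slide's costs q_s = 4 for shortage and q_e = 0.5 for excess):

```python
def f_i(xi_i, x_i, q_s=4.0, q_e=0.5):
    # f_i(xi_i, x_i) = max[q_s*(xi_i - x_i), q_e*(x_i - xi_i)]
    return max(q_s * (xi_i - x_i), q_e * (x_i - xi_i))

def subgrad_i(xi_i, x_i, q_s=4.0, q_e=0.5):
    # an element of the subdifferential w.r.t. x_i:
    # -q_s if demand xi_i >= capacity x_i (shortage), q_e otherwise (excess)
    return -q_s if xi_i >= x_i else q_e

print(f_i(5.0, 3.0), subgrad_i(5.0, 3.0))   # shortage case: cost 8.0, subgradient -4.0
print(f_i(2.0, 3.0), subgrad_i(2.0, 3.0))   # excess case:   cost 0.5, subgradient 0.5
```

Since the objective is separable across districts, the full stochastic quasi-gradient is just the vector of these per-coordinate subgradients evaluated at one sampled demand vector ξ^s.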
Objective Value: iterates

λ_ν = 1/ν, x^{ν+1} = x^ν − v^ν(ξ^s)/ν, with v_i^ν = −q_s or = q_e
[Plot: estimate of the objective per iterate]

Objective Value (2): iterates

Same rule; resulting facility sizes versus district populations:
Facilities: 18.57 15.90 19.13 16.35 27.25 20.75 21.88 17.81 19.11 17.52 18.62 19.60
Distr. Pop: 14    11    14    13    26    23    22    11    14    12    18    10
[Plot: estimate of the objective per iterate]

Objective Value (3): iterates

Facilities: 24 22 23 20 26 22 23 22 22 20 22 25 : 271
Distr. Pop: 19 16 19 16 27 21 22 18 19 18 19 20 : 234
a.s. Convergence

For now, presume an optimal solution x* ∈ S. At iteration ν: iterate x^ν, sample ξ^ν, v^ν ∈ ∂f(ξ^ν, x^ν), λ_ν = 1/ν,
x^{ν+1} = prj_S(x^ν − λ_ν v^ν),
F^ν = σ{x^0, …, x^ν}.

The projection implies (prj_S is nonexpansive and x* ∈ S):
‖x^{ν+1} − x*‖² ≤ ‖x^ν − x*‖² − 2λ_ν ⟨v^ν, x^ν − x*⟩ + λ_ν² ‖v^ν‖²
a.s. Convergence

Taking conditional expectation w.r.t. F^ν, with E^ν{g} = E{g | F^ν}:
E^ν{‖x^{ν+1} − x*‖²} ≤ ‖x^ν − x*‖² + λ_ν² E^ν{‖v^ν(ξ)‖²} − 2λ_ν ⟨E^ν{v^ν(ξ)}, x^ν − x*⟩

Assumption (a.), with γ_ν F^ν-measurable (the quasi-gradient property):
Ef(x*) − Ef(x^ν) ≥ γ_ν + ⟨E^ν{v^ν(ξ)}, x* − x^ν⟩, and Ef(x*) − Ef(x^ν) ≤ 0

⇒ ⟨E^ν{v^ν(ξ)}, x* − x^ν⟩ ≤ −γ_ν ≤ |γ_ν|
a.s. Convergence

Hence
E^ν{‖x^{ν+1} − x*‖²} ≤ ‖x^ν − x*‖² + λ_ν² E^ν{‖v^ν(ξ)‖²} + 2λ_ν|γ_ν| ≤ ‖x^ν − x*‖² + ρ_ν,

where ρ_ν = λ_ν² E^ν{‖v^ν(ξ)‖²} + 2λ_ν|γ_ν|.

Assumption (b.): Σ_{ν=0}^∞ ρ_ν < ∞ (which requires Σ_{ν=0}^∞ λ_ν² < ∞).

With X^ν = ‖x* − x^ν‖² + Σ_{k=ν}^∞ ρ_k (the tail sum makes X^ν a supermartingale):
E^ν{X^{ν+1}} ≤ X^ν, X^ν ≥ 0 ⇒ X^ν → X a.s., EX ≤ EX^0.
a.s. Convergence

Recursively, from (a.):
E{‖x^{ν+1} − x*‖²} ≤ E{‖x^0 − x*‖²} + Σ_{k=0}^ν λ_k² E{‖v^k‖²} − 2 Σ_{k=0}^ν λ_k ⟨E{v^k}, x^k − x*⟩

so that
E{‖x^{ν+1} − x*‖²} − E{‖x^0 − x*‖²} − Σ_{k=0}^ν E{ρ_k} ≤ 2 Σ_{k=0}^ν λ_k [Ef(x*) − Ef(x^k)]
a.s. Convergence

Thus, using (b.),
Σ_{k=0}^∞ λ_k [Ef(x^k) − Ef(x*)] < ∞,

and with assumption (c.): λ_ν ≥ 0, Σ_{ν=0}^∞ λ_ν = ∞, together with Ef(x*) ≤ Ef(x^k), there exists a subsequence {x^{ν_k}} such that
Ef(x^{ν_k}) − Ef(x*) → 0.
Review of Assumptions

(a.) Ef(x*) − Ef(x^ν) ≥ γ_ν + ⟨E^ν{v^ν(ξ)}, x* − x^ν⟩
(b.) Σ_{ν=0}^∞ ( λ_ν² E^ν{‖v^ν(ξ)‖²} + 2λ_ν|γ_ν| ) < ∞
(c.) λ_ν ≥ 0, Σ_{ν=0}^∞ λ_ν = ∞
"Stumbling" Blocks

- Projection
- Step size: adaptive; adjust (increase, decrease) based on the variance of the stochastic quasi-gradient
- Stopping criterion: as for the step size, but more generally a comparison of the values of the objective:
  (1/(M+1)) Σ_{l=ν−M}^ν f(ξ^l, x^l) ≈ Ef(x^ν)  (estimate)
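This moving-average value estimate is easy to maintain during the iteration; a sketch (hypothetical names, not from the slides) that averages the last M+1 sampled objective values:

```python
from collections import deque

def make_value_estimator(M):
    """Running estimate of Ef(x^nu) from the last M+1 sampled values f(xi^l, x^l)."""
    window = deque(maxlen=M + 1)     # keeps only the most recent M+1 samples
    def update(f_sample):
        window.append(f_sample)
        return sum(window) / len(window)   # (1/(M+1)) * sum over l = nu-M..nu
    return update

estimate = make_value_estimator(M=4)
for f_val in [1.0, 0.8, 1.2, 1.0, 1.0, 0.9]:
    est = estimate(f_val)
print(est)  # average of the last 5 sampled values: 0.98
```

In a stopping rule one would compare successive values of this estimate and halt once the change falls below a tolerance, keeping in mind that the estimate itself is noisy.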
A Short History

- Stochastic approximation methods: Robbins & Monro, Kiefer & Wolfowitz ('50s)
- SQG, theory: Shor, Poljak, Ermoliev, Fabian ('60s); Kushner ('70s); Pflug, Ruszczynski ('80s)
- Implementation: Gaivoronski, Gupal, Norkin ('80s … 2005)