Max Weight Learning Algorithms with Application to Scheduling in Unknown Environments
Transcript of Max Weight Learning Algorithms with Application to Scheduling in Unknown Environments
[Page 1]
Max Weight Learning Algorithms with Application to Scheduling in Unknown Environments
Michael J. Neely, University of Southern California
http://www-rcf.usc.edu/~mjneely
Information Theory and Applications Workshop (ITA), UCSD, Feb. 2009
*Sponsored in part by the DARPA IT-MANET Program, NSF OCE-0520324, NSF Career CCF-0747525
Pr(success_1, …, success_n) = ??
[Page 2]
•Slotted system: slots t in {0, 1, 2, …}
•Network queues: Q(t) = (Q_1(t), …, Q_L(t))
•2-stage control decision every slot t:
 1) Stage 1 decision: k(t) in {1, 2, …, K}. Reveals random vector w(t) (i.i.d. given k(t)); w(t) has unknown distribution F_k(w).
 2) Stage 2 decision: I(t) in I (a possibly infinite set). Affects queue rates: A(k(t), w(t), I(t)), m(k(t), w(t), I(t)). Incurs a "penalty vector" x(t) = x(k(t), w(t), I(t)).
[Page 3]
Stage 1: k(t) in {1, …, K}. Reveals random w(t).
Stage 2: I(t) in I. Incurs penalties x(k(t), w(t), I(t)). Also affects queue dynamics A(k(t), w(t), I(t)), m(k(t), w(t), I(t)).
Goal: Choose stage-1 and stage-2 decisions over time so that the time-average penalties x̄ solve:
f(x), h_n(x): general convex functions of several variables.
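The optimization problem itself appears only as an image in the source. Judging from the virtual-queue updates on page 8 (which enforce h_n(ḡ) ≤ b_n and drive ḡ toward x̄), it presumably has the form:

```latex
\begin{aligned}
\text{Minimize:}   \quad & f(\overline{x}) \\
\text{Subject to:} \quad & h_n(\overline{x}) \le b_n \quad \text{for all } n, \\
                         & \text{all queues } Q_l(t) \text{ stable,}
\end{aligned}
\qquad \text{where } \overline{x} = \lim_{t\to\infty}\frac{1}{t}\sum_{\tau=0}^{t-1}\mathbb{E}\{x(\tau)\}.
```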
[Page 4]
Motivating Example 1: Min Power Scheduling with Channel Measurement Costs
[Figure: L queues with arrival processes A_1(t), …, A_L(t) and time-varying channel states S_1(t), …, S_L(t)]
If channel states are known every slot: can schedule without knowing channel statistics or arrival rates! (EECA, Neely 2005, 2006; Georgiadis, Neely, Tassiulas F&T 2006)
Minimize Avg. Power Subject to Stability
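As a toy illustration of this known-channel case (a sketch only, not the EECA algorithm itself; the linear rate model and the power grid below are assumptions for the example), each slot we can pick the power level maximizing the queue-weighted rate minus V times the power cost:

```python
def schedule_slot(Q, S, V, power_options):
    """Greedy drift-plus-penalty choice for one slot.

    Q: queue backlogs; S: known channel states (bits/slot per unit power,
    an assumed linear rate model); power_options: candidate power levels.
    For each queue l, pick the power p maximizing Q[l]*rate - V*p,
    idling (p = 0, value 0) when no option is worthwhile."""
    decisions = []
    for l in range(len(Q)):
        best_p, best_val = 0.0, 0.0  # p = 0 always achieves value 0
        for p in power_options:
            rate = S[l] * p          # assumed linear rate model
            val = Q[l] * rate - V * p
            if val > best_val:
                best_p, best_val = p, val
        decisions.append(best_p)
    return decisions
```

A large backlog makes the link transmit; an empty queue lets the power cost dominate, so the link idles.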
[Page 5]
Motivating Example 1: Min Power Scheduling with Channel Measurement Costs
If there is a "cost" to measuring, we make a 2-stage decision:
Stage 1: Measure or not? (reveals channels w(t))
Stage 2: Transmit over a known channel? A blind channel?
Existing solutions require a-priori knowledge of the full joint channel state distribution! (2^L, 1024^L ?)
-Li and Neely (07)
-Gopalan, Caramanis, Shakkottai (07)
Minimize Avg. Power Subject to Stability
[Page 6]
Motivating Example 2: Diversity Backpressure Routing (DIVBAR)
[Figure: node 1 broadcasts a packet; receptions at neighbors 2 and 3, one in error]
Networking with lossy channels & multi-receiver diversity:
DIVBAR Stage 1: Choose commodity and transmit.
DIVBAR Stage 2: Get success feedback, choose next hop.
If there is a single commodity (no stage 1 decision), we do not need success probabilities! If two or more commodities, we need full joint success probability distribution over all neighbors!
[Neely, Urgaonkar 2006, 2008]
[Page 7]
Stage 1: k(t) in {1, …, K}. Reveals random w(t).
Stage 2: I(t) in I. Incurs penalties x(k(t), w(t), I(t)). Also affects queue dynamics A(k(t), w(t), I(t)), m(k(t), w(t), I(t)).
Goal:
Equivalent to:
Where g(t) is an auxiliary vector that is a proxy for x(t).
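The equivalent problem is also an image in the source; given the two virtual queues defined on the next page (U_n for the constraints h_n(ḡ) ≤ b_n, and Z_m for identifying ḡ with x̄), it presumably reads:

```latex
\begin{aligned}
\text{Minimize:}   \quad & f(\overline{g}) \\
\text{Subject to:} \quad & h_n(\overline{g}) \le b_n \quad \text{for all } n, \\
                         & \overline{g}_m = \overline{x}_m \quad \text{for all } m, \\
                         & \text{all queues stable.}
\end{aligned}
```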
[Page 8]
Stage 1: k(t) in {1, …, K}. Reveals random w(t).
Stage 2: I(t) in I. Incurs penalties x(k(t), w(t), I(t)). Also affects queue dynamics A(k(t), w(t), I(t)), m(k(t), w(t), I(t)).
Equivalent Goal:
Technique: Form virtual queues for each constraint.
Virtual queue U_n(t) (arrivals h_n(g(t)), service b_n):  U_n(t+1) = max[U_n(t) + h_n(g(t)) - b_n, 0]
Virtual queue Z_m(t) (arrivals x_m(t), service g_m(t)):  Z_m(t+1) = Z_m(t) - g_m(t) + x_m(t)  (possibly negative)
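The two updates above can be coded directly (a minimal sketch; the constraint functions h_n and bounds b_n are placeholders supplied by the caller):

```python
def update_virtual_queues(U, Z, g, x, h, b):
    """One-slot virtual queue update, as on this slide.

    U_n(t+1) = max[U_n(t) + h_n(g(t)) - b_n, 0]  (enforces h_n(g-bar) <= b_n)
    Z_m(t+1) = Z_m(t) - g_m(t) + x_m(t)          (enforces g-bar = x-bar; may go negative)

    h: list of constraint functions h_n(g); b: list of bounds b_n."""
    U_next = [max(U[n] + h[n](g) - b[n], 0.0) for n in range(len(U))]
    Z_next = [Z[m] - g[m] + x[m] for m in range(len(Z))]
    return U_next, Z_next
```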
[Page 9]
Use Stochastic Lyapunov Optimization Technique: [Neely 2003], [Georgiadis, Neely, Tassiulas F&T 2006]
Define: Q(t) = all queue states = [Q(t), Z(t), U(t)]
Define: L(Q(t)) = (1/2)[sum of squared queue sizes]
Define: D(Q(t)) = E{L(Q(t+1)) - L(Q(t)) | Q(t)}
Schedule using the modified “Max-Weight” Rule: Every slot t, observe queue states and make a 2-stage decision to minimize the “drift plus penalty”:
Minimize: D(Q(t)) + Vf(g(t))
where V is a constant control parameter that affects proximity to optimality (and a delay tradeoff).
[Page 10]
How to (try to) minimize:
Minimize: D(Q(t)) + V f(g(t))
The proxy variables g(t) appear separably, and their terms can be minimized without knowing system stochastics!
Minimize:
Subject to:
[Z_m(t) and U_n(t) are known queue backlogs for slot t]
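Since the proxy terms involve only the known backlogs Z_m(t), U_n(t), a brute-force sketch can minimize what is presumably the proxy objective, V f(g) + Σ_n U_n(t) h_n(g) - Σ_m Z_m(t) g_m, over a finite candidate set (the candidate grid and the exact objective form are assumptions, since the slide's formula is an image):

```python
def choose_proxy(candidates, V, U, Z, f, h):
    """Brute-force minimizer of the (presumed) proxy-variable term:
        V*f(g) + sum_n U[n]*h[n](g) - sum_m Z[m]*g[m]
    over a finite candidate set of feasible g vectors."""
    def cost(g):
        return (V * f(g)
                + sum(U[n] * h[n](g) for n in range(len(U)))
                - sum(Z[m] * g[m] for m in range(len(Z))))
    return min(candidates, key=cost)
```

In practice f and h_n are convex, so a convex solver would replace the grid search; the grid keeps the sketch self-contained.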
[Page 11]
Minimizing the Remaining Terms:
Minimize: D(Q(t)) + Vf(g(t))
[Page 12]
Solution: Define g^(mw)(t), I^(mw)(t), k^(mw)(t) as the ideal max-weight decisions (minimizing the drift expression).
Define e_k(t):
k^(mw)(t) = argmin_{k in {1, …, K}} e_k(t)   (Stage 1)
I^(mw)(t) = argmin_{I in I} Y_{k(t)}(w(t), I, Q)   (Stage 2)
g^(mw)(t) = solution to the proxy problem
Then:
[Page 13]
Approximation Theorem: (related to Neely 2003, G-N-T F&T 2006)
If actual decisions satisfy:
With:
(related to slackness of constraints)
Then:
-All constraints satisfied.
-Average queue sizes < [B + C + c_0 V] / min[e_max - e_Q, s - e_Z]
-Penalty satisfies: f(x̄) < f*_optimal + O(max[e_Q, e_Z]) + (B+C)/V
[Page 14]
It all hinges on our approximation of ek(t):
Declare a “type k exploration event” independently with probability q>0 (small). We must use k(t) = k here.
{w_1^(k)(t), …, w_W^(k)(t)} = samples over the past W type-k exploration events
Approach 1:
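The Approach 1 estimator is an image in the source; presumably it replaces the unknown expectation defining e_k(t) with an empirical average over the W stored exploration samples, each evaluated at the current backlogs Q(t):

```latex
\hat{e}_k(t) \;=\; \frac{1}{W} \sum_{i=1}^{W} \min_{I \in \mathcal{I}} Y_k\!\left(w_i^{(k)}(t),\, I,\, \boldsymbol{Q}(t)\right)
```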
[Page 15]
It all hinges on our approximation of ek(t):
Declare a “type k exploration event” independently with probability q>0 (small). We must use k(t) = k here.
{w_1^(k)(t), …, w_W^(k)(t)} = samples over the past W type-k exploration events
{Q_1^(k)(t), …, Q_W^(k)(t)} = queue backlogs at these sample times
Approach 2:
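Again the estimator is an image in the source; since Approach 2 also stores the backlogs at the sample times, presumably each sample is evaluated at the backlog recorded when it was taken:

```latex
\hat{e}_k(t) \;=\; \frac{1}{W} \sum_{i=1}^{W} \min_{I \in \mathcal{I}} Y_k\!\left(w_i^{(k)}(t),\, I,\, \boldsymbol{Q}_i^{(k)}(t)\right)
```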
[Page 16]
Analysis (Approach 2):
Subtleties:
1) An "inspection paradox" issue requires using samples taken at exploration events, so that {w_1^(k)(t), …, w_W^(k)(t)} are i.i.d.
2) Even so, {w_1^(k)(t), …, w_W^(k)(t)} are correlated with the queue backlogs at time t, so we cannot directly apply the Law of Large Numbers!
[Page 17]
Analysis (Approach 2):
Use a “Delayed Queue” Analysis:
[Figure: timeline from t_start to t; samples w_1(t), w_2(t), w_3(t), …, w_W(t) fall in a window over which the delayed queue state is held constant, so the LLN can be applied]
[Page 18]
Max-Weight Learning Algorithm (Approach 2):
(No knowledge of probability distributions is required!)
-Have Random Exploration Events (prob. q).
-Choose Stage-1 decision k(t) = argmin_{k in {1, …, K}} [e_k(t)]
-Use I^(mw)(t) for the Stage-2 decision: I^(mw)(t) = argmin_{I in I} Y_{k(t)}(w(t), I, Q(t))
-Use g^(mw)(t) for the proxy variables.
-Update the virtual queues and the moving averages.
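A toy rendering of the exploration/estimation structure above (assumptions: two stage-1 actions with scalar random costs of unknown mean, no stage-2 decision and no queues; this only illustrates the windowed estimation at exploration events, not the full algorithm):

```python
import random
from collections import deque

def max_weight_learning(T, costs, q=0.1, W=20, seed=0):
    """Toy learning loop over T slots.

    costs: list mapping each stage-1 action k to a sampler rng -> random cost.
    With probability ~q per type, the slot is a forced exploration event and
    its sample is stored in a window of size W; otherwise the action with the
    smallest windowed empirical average cost is chosen. Returns the list of
    non-exploration picks."""
    rng = random.Random(seed)
    K = len(costs)
    samples = [deque(maxlen=W) for _ in range(K)]
    picks = []
    for t in range(T):
        if rng.random() < q * K:              # an exploration event occurs
            k = rng.randrange(K)              # forced exploratory action
            samples[k].append(costs[k](rng))  # only explore-samples are stored
        else:
            # estimate e_k by the moving average over past exploration events
            est = [sum(s) / len(s) if s else 0.0 for s in samples]
            k = min(range(K), key=lambda j: est[j])
            picks.append(k)
    return picks
```

Once the windows fill, the action with the lower mean cost dominates the non-exploration picks, mirroring how the real algorithm's argmin over ê_k(t) converges to the ideal max-weight choice.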
[Page 19]
Theorem (Fixed W, V): With window size W we have:
-All constraints satisfied.
-Average queue sizes < [B + C + c_0 V] / min[e_max - e_Q, s - e_Z]
-Penalty satisfies: f(x̄) < f*_q + O(1/sqrt(W)) + (B+C)/V
[Page 20]
Concluding Theorem (Variable W, V): Let 0 < b1 < b2 < 1.
Define V(t) = (t+1)^(b_1), W(t) = (t+1)^(b_2)
Then under the Max-Weight Learning Algorithm:
-All constraints are satisfied.
-All queues are mean rate stable*:
-Average penalty achieves exact optimality (subject to random exploration events): f(x̄) = f*_q
*Mean rate stability does not imply finite average congestion and delay. In fact, average congestion and delay are necessarily infinite when exact optimality is reached.