CPS Scheduling Policy Design with Utility and Stochastic Execution*
Chris Gill
Associate Professor, Department of Computer Science and Engineering
Washington University, St. Louis, MO
Georgia Tech CPS Summer School, Atlanta, GA, June 23-25, 2010
*Research supported in part by NSF grants CNS-0716764 (Cybertrust) and CCF-0448562 (CAREER) and driven by numerous contributions from post-doctoral student Robert Glaubius; doctoral student Terry Tidwell; undergraduate students Braden Sidoti, David Pilla, Justin Meden, Carter Bass, Eli Lasker, Micah Wylde, and Cameron Cross; and Prof. William D. Smart
2 - Gill et al. – 04/21/23
Washington University in St. Louis
Dept. of Computer Science and Engineering
24 faculty members and 70 Ph.D. students working in: real-time and embedded systems, robotics, graphics, computer vision, HCI, AI, bioinformatics, networking, high-performance architectures, chip multi-processors, mobile computing, sensor networks, optimization
Ph.D. students are fully funded, and we emphasize individual mentorship and interdisciplinary work
Recent graduates are on faculty at U. Mass, UT-Austin, Rochester, RIT, CMU, Michigan St., and UNC-Charlotte
Graduate study application deadline for Fall 2011 is January 15: http://www.cse.wustl.edu
Why Pursue CPS Research?
Systems are increasingly being designed to interact with the physical world
This trend offers compelling new research challenges that motivate our work
Consider, for example, the domain of mobile robotics
[Figure: the Lewis robot, Media and Machines Laboratory, Washington University in St. Louis]
Why is This Work CPS Research?
As in many other systems, resources must be shared among competing tasks
Fail-safe modes may reduce the consequences of resource-induced timing failures, but precise scheduling still matters
The physical properties of some resources motivate new models and techniques
Which Problem Features are Interesting?
Sharing, e.g., a camera between navigation and image capture tasks:
(1) in general doesn’t allow efficient preemption
(2) involves stochastically distributed durations
Also important in general: (3) scalability (many tasks sharing such a resource); (4) task utility/availability
System Model Assumptions
We model time as being discrete
» E.g., based on some multiple of the Linux jiffy
» States and scheduling decisions align with those quanta
Separate tasks require a shared resource
» Access is mutually exclusive (a task binds the resource)
» Binding durations are independent and non-preemptive
» Tasks’ duration distributions are known (or learned [1])
» Each task is always available to run (relaxed in Part III)
Goal: precise resource allocation among tasks [5]
» E.g., 2:1 utilization share targets for tasks A vs. B
» Need a deterministic scheduling policy (decides which task gets the resource when) that best fits that goal
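A minimal simulation sketch of this system model: discrete quanta, non-preemptive binding with stochastic durations, and a state made of each task's cumulative usage. The duration histograms, the names `sample_duration`, `dispatch`, `run`, and `two_to_one`, and the simple threshold policy are illustrative assumptions, not the policies designed in this work.

```python
import random

# Hypothetical duration histograms (in quanta) for tasks A (0) and B (1):
# each maps a binding duration to its probability.
DURATIONS = {
    0: {1: 0.5, 2: 0.5},
    1: {2: 0.25, 3: 0.75},
}


def sample_duration(task, rng):
    """Draw a binding duration for `task` from its histogram."""
    durations = list(DURATIONS[task])
    weights = [DURATIONS[task][d] for d in durations]
    return rng.choices(durations, weights=weights)[0]


def dispatch(state, task, rng):
    """Bind the resource to `task` for one whole (non-preemptive) duration.

    The state is the tuple of cumulative quanta used by each task.
    """
    new_state = list(state)
    new_state[task] += sample_duration(task, rng)
    return tuple(new_state)


def run(policy, steps, seed=0):
    """Apply `steps` scheduling decisions starting from the all-zero state."""
    rng = random.Random(seed)
    state = (0,) * len(DURATIONS)
    for _ in range(steps):
        state = dispatch(state, policy(state), rng)
    return state


def two_to_one(state):
    """Hypothetical deterministic policy for a 2:1 share of A vs. B:
    run A while its usage is below twice B's usage, otherwise run B."""
    return 0 if state[0] < 2 * state[1] else 1


final = run(two_to_one, steps=1000)
share_a = final[0] / sum(final)  # stays close to 2/3
```

With these histograms |A - 2B| never exceeds 6 quanta, so A's share approaches 2/3 as total usage grows; the share can only be steered between dispatches, never by preempting a running task.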
Part I
Utilization State Spaces and Markov Decision Processes
Towards Optimal Policies
A Markov decision process (MDP) is a 4-tuple (X,A,C,T) that matches our system model well:
X: a finite set of states (e.g., utilizations of 8 vs. 17 quanta)
A: the set of actions (giving the resource to a particular task)
C: cost function for taking an action in a state
T: transition function (probability of moving from one state to another state based on the action chosen)
Solving the MDP gives a policy that maps each state to an action to minimize long-term expected cost
However, to do that we need a finite set of states
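Once the state set is finite, standard dynamic programming applies. The sketch below uses value iteration with a discount factor `gamma`; both the solver and the discounted-cost criterion are assumptions for illustration (the slides do not say which solver or criterion was used), and the two-state toy MDP is hypothetical.

```python
def value_iteration(states, actions, cost, transition, gamma=0.95, tol=1e-9):
    """Solve a finite, discounted-cost MDP (X, A, C, T) by value iteration.

    cost(x, a)       -> immediate cost of action a in state x
    transition(x, a) -> dict {successor: probability}
    Returns (V, policy); policy maps each state to a cost-minimizing action.
    """
    V = {x: 0.0 for x in states}

    def q_value(x, a):
        # Expected discounted cost of taking a in x, then following V.
        return cost(x, a) + gamma * sum(p * V[y] for y, p in transition(x, a).items())

    while True:
        delta = 0.0
        for x in states:  # in-place (Gauss-Seidel) updates
            best = min(q_value(x, a) for a in actions)
            delta = max(delta, abs(best - V[x]))
            V[x] = best
        if delta < tol:
            break
    policy = {x: min(actions, key=lambda a: q_value(x, a)) for x in states}
    return V, policy


# Toy example: action a moves deterministically to state a; being in state 1 costs 1.
V, policy = value_iteration(
    states=[0, 1],
    actions=[0, 1],
    cost=lambda x, a: float(x),
    transition=lambda x, a: {a: 1.0},
)
```

In the toy example the optimal policy always moves to the zero-cost state 0, giving V = {0: 0.0, 1: 1.0}.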
Share-Aware Scheduling
A system state: cumulative resource usage of each task
Dispatching a task moves the system stochastically through the state space according to that task’s duration
[Figure: example state (8,17) in the two-task utilization state space]
Share-Aware Scheduling
Utilization target induces a ray {λu : λ ≥ 0} through the state space
Encode each state’s “goodness” (relative to the share) as a cost
Require that costs grow with distance from utilization ray
[Figure: utilization ray for the target u = (1/3, 2/3)]
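The slides require only that costs grow with distance from the utilization ray. One function with that property, assumed here for illustration, is the L1 distance between x and τ(x)·u, where τ(x) is the total usage in x. It is also unchanged by moves parallel to the ray.

```python
from fractions import Fraction


def cost(x, u):
    """Assumed cost: L1 distance from state x to tau(x) * u.

    tau(x) = x_1 + ... + x_n is the total usage, so tau(x) * u is the point on
    the utilization ray {λu : λ ≥ 0} with the same total usage as x.
    """
    tau = sum(x)
    return sum(abs(xi - tau * ui) for xi, ui in zip(x, u))


u = (Fraction(1, 3), Fraction(2, 3))  # the slide's target u = (1/3, 2/3)
```

For example, cost((8, 17), u) is 2/3, and shifting the state by (1, 2), a step along the ray, leaves the cost unchanged. Exact fractions avoid rounding noise when comparing costs.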
Transition Structure
Transitions are state-independent
I.e., relative distribution over successor states is the same in each state
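A small sketch of this transition structure, assuming hypothetical duration histograms: dispatching a task adds a sampled duration to that task's coordinate only, so the successor offsets are the same from every state. `successors` and `offsets` are illustrative names.

```python
def successors(x, task, durations):
    """Successor distribution when `task` is dispatched in state x.

    durations[task] maps each binding duration (in quanta) to its probability.
    Only x[task] changes, by the sampled duration.
    """
    result = {}
    for d, p in durations[task].items():
        y = list(x)
        y[task] += d
        result[tuple(y)] = p
    return result


durations = {0: {1: 0.5, 2: 0.5}, 1: {2: 0.25, 3: 0.75}}  # hypothetical histograms


def offsets(x, task):
    """Successors expressed relative to x, exposing state-independence."""
    return {
        tuple(yi - xi for yi, xi in zip(y, x)): p
        for y, p in successors(x, task, durations).items()
    }
```

Here offsets((0, 0), 1) and offsets((8, 17), 1) are both {(0, 2): 0.25, (0, 3): 0.75}.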
Cost Structure
States along same line parallel to the utilization ray have equal cost
Equivalence Classes
Transition and cost structure thus induce equivalence classes
Equivalent states have the same optimal long-term cost and policy!
Periodicity
Periodic structure allows us to represent each equivalence class with a single exemplar [4]
Wrapping the State Model
Remove all but one exemplar from each equivalence class
Actions and costs remain unchanged
Remap any dangling transitions (to removed states) to the corresponding exemplar
[Figure: wrapped state model containing the origin (0,0)]
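A sketch of wrapping for a rational target u. It assumes that equivalent integer states differ by integer multiples of the smallest integer vector on the utilization ray, here called the period; this follows from equal-cost lines being parallel to the ray but is not spelled out in the slides. `ray_period`, `exemplar`, and `wrapped_successors` are hypothetical names.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def ray_period(u):
    """Smallest nonzero integer vector on the ray {λu : λ ≥ 0}, for rational u."""
    lcm = reduce(lambda a, b: a * b // gcd(a, b), (ui.denominator for ui in u))
    ints = [int(ui * lcm) for ui in u]
    g = reduce(gcd, ints)
    return tuple(v // g for v in ints)


def exemplar(x, period):
    """Representative of x's class {x + k * period}: subtract the largest
    multiple of `period` that keeps every coordinate nonnegative."""
    k = min(xi // pi for xi, pi in zip(x, period) if pi > 0)
    return tuple(xi - k * pi for xi, pi in zip(x, period))


def wrapped_successors(x, task, durations, period):
    """Successor distribution in the wrapped model: each successor is remapped
    to its exemplar, and probabilities of merged successors are summed."""
    result = {}
    for d, p in durations[task].items():
        y = list(x)
        y[task] += d
        z = exemplar(tuple(y), period)
        result[z] = result.get(z, 0.0) + p
    return result
```

With u = (1/3, 2/3) the period is (1, 2), and (8, 17) and (9, 19) both map to the exemplar (0, 1).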
Truncating the State Model
Inexpensive states are nearer the utilization target
Good policies should keep costs small
We can truncate the state space by bounding the costs considered
Bounding the State Model
Map any dangling transitions produced by truncation to a high-cost absorbing state
This guarantees that we will be able to find bounded-cost policies if they exist
Bounded costs also guarantee bounded deviation from the resource share (precision)
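A sketch of building the truncated, bounded model for two tasks. It assumes an L1 distance-from-the-ray cost and wraps states by subtracting multiples of the ray's integer period vector; states whose cost exceeds a bound `theta` are never enumerated, and transitions into them are redirected to one absorbing state. `bounded_model`, `ABSORB`, and the duration histograms are hypothetical.

```python
from collections import deque
from fractions import Fraction

ABSORB = "absorb"  # hypothetical label for the high-cost absorbing state


def cost(x, u):
    """Assumed cost: L1 distance from x to tau(x) * u on the utilization ray."""
    tau = sum(x)
    return sum(abs(xi - tau * ui) for xi, ui in zip(x, u))


def exemplar(x, period):
    """Wrap x by subtracting the largest nonnegative-preserving multiple of period."""
    k = min(xi // pi for xi, pi in zip(x, period) if pi > 0)
    return tuple(xi - k * pi for xi, pi in zip(x, period))


def bounded_model(u, period, durations, theta):
    """Enumerate wrapped states with cost <= theta reachable from the origin.

    Returns {state: {task: {successor: prob}}}; any successor whose cost
    exceeds theta is redirected to ABSORB.
    """
    origin = (0,) * len(u)
    model, frontier = {}, deque([origin])
    while frontier:
        x = frontier.popleft()
        if x in model:
            continue
        model[x] = {}
        for task, dist in durations.items():
            succ = {}
            for d, p in dist.items():
                y = list(x)
                y[task] += d
                y = exemplar(tuple(y), period)
                target = y if cost(y, u) <= theta else ABSORB
                succ[target] = succ.get(target, 0.0) + p
                if target != ABSORB and target not in model:
                    frontier.append(target)
            model[x][task] = succ
    return model


u = (Fraction(1, 3), Fraction(2, 3))
durations = {0: {1: 0.5, 2: 0.5}, 1: {1: 0.5, 2: 0.5}}  # hypothetical histograms
model = bounded_model(u, period=(1, 2), durations=durations, theta=2)
```

The result is a finite MDP that any standard solver can handle; re-running with a larger `theta` corresponds to growing the bounds.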
A Scheduling Policy Design Approach
Iteratively increase the bounds and re-solve the resulting MDP
As the bounds increase, the bounded model solution converges towards the optimal wrapped model policy
Automating Model Discovery
ESPI: Expanding State Policy Iteration [3]
1. Start with a policy that only reaches finitely many states from (0,…,0).
E.g., always run the most underutilized task.
2. Enumerate enough states to evaluate and improve that policy
3. If the policy cannot be improved, stop
4. Otherwise, repeat from (2) with the newly improved policy
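The heuristic starting policy from step 1 is short to state in code. In this minimal sketch, measuring underutilization as τ(x)·u_i - x_i and breaking ties toward the lower task index are assumptions; `most_underutilized` is a hypothetical name.

```python
from fractions import Fraction


def most_underutilized(x, u):
    """Initial ESPI policy sketch: run the task furthest below its target share.

    Task i's shortfall is tau(x) * u_i - x_i, where tau(x) is the total usage;
    max() returns the first maximizer, so ties go to the lowest index.
    """
    tau = sum(x)
    return max(range(len(x)), key=lambda i: tau * u[i] - x[i])


u = (Fraction(1, 3), Fraction(2, 3))
```

At (8, 17) task 0 is 1/3 quantum behind its share, so it runs next.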
Policy Evaluation Envelope
Enumerate states reachable from the initial state
Explore the state space breadth-first under the current policy, starting from the initial state (0,0)
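A sketch of the evaluation envelope for a two-task, 1:2 share target: breadth-first search from (0,0) that follows only the current policy's action and wraps successors by subtracting multiples of the period vector (1,2). The duration histograms, the wrapping rule, the underutilization heuristic used as the current policy, and the `max_states` safeguard are assumptions for illustration.

```python
from collections import deque
from fractions import Fraction


def exemplar(x, period):
    """Wrap x by subtracting the largest nonnegative-preserving multiple of period."""
    k = min(xi // pi for xi, pi in zip(x, period) if pi > 0)
    return tuple(xi - k * pi for xi, pi in zip(x, period))


def most_underutilized(x, u):
    """Run the task with the largest shortfall tau(x) * u_i - x_i."""
    tau = sum(x)
    return max(range(len(x)), key=lambda i: tau * u[i] - x[i])


def evaluation_envelope(policy, durations, period, max_states=10_000):
    """Breadth-first enumeration of wrapped states reachable from the origin
    when every decision follows `policy` (a function state -> task)."""
    origin = (0,) * len(period)
    seen, frontier = {origin}, deque([origin])
    while frontier:
        x = frontier.popleft()
        task = policy(x)
        for d in durations[task]:
            y = list(x)
            y[task] += d
            y = exemplar(tuple(y), period)
            if y not in seen:
                if len(seen) >= max_states:
                    raise RuntimeError("policy closure looks unbounded")
                seen.add(y)
                frontier.append(y)
    return seen


u = (Fraction(1, 3), Fraction(2, 3))
durations = {0: {1: 0.5, 2: 0.5}, 1: {1: 0.5, 2: 0.5}}  # hypothetical histograms
envelope = evaluation_envelope(lambda x: most_underutilized(x, u), durations, (1, 2))
```

The heuristic's choice depends only on a state's deviation from the ray, so it is consistent across each equivalence class; that is why evaluating it on exemplars alone is sound in this sketch.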
Policy Improvement Envelope
Consider alternative actions
Close under the current policy using breadth-first expansion
Evaluate and improve the policy within this envelope
ESPI Termination
As long as the initial policy has finite closure, each ESPI iteration terminates (this is satisfied by starting with the heuristic policy that always runs the most underutilized task)
Policy strictly improves at each iteration
Anecdotally, ESPI terminates on all of the task scheduling MDPs to which we have applied it
Comparing Design Methods
Policy performance is shown normalized and centered on the ESPI solution data
Larger bounded state models yield the ESPI solution
Part II
Scalability and Approximation Techniques
What About Scalability?
MDP representation allows consistent approximation of the optimal scheduling policy
Empirically, bounded model and ESPI solutions appear to be near-optimal
However, the approach scales exponentially in the number of tasks, so while it may be good for (e.g.) sharing an actuator, it won’t apply directly to larger task sets
What our Policies Say about Scalability
To overcome limitations of the MDP-based approach, we focus attention on a restricted class of appropriate scheduling policies
Examining the policies produced by the MDP-based approach gives insights into choosing (and into parameterizing) appropriate policies [2]
Two-task MDP Policy
Scheduling policies induce a partition on a 2-D state space with boundary parallel to the share target
Establish a decision offset d to identify the partition boundary
Sufficient in 2-D, but what about in higher dimensions?
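In two dimensions the offset alone fixes the policy. A minimal sketch, assuming the boundary is the line x_0 - τ(x)·u_0 = d (which is parallel to the utilization ray) and that task 0 runs on the side where it lags; apart from tie-breaking, d = 0 reproduces the most-underutilized heuristic. `two_task_policy` is a hypothetical name.

```python
from fractions import Fraction


def two_task_policy(x, u, d):
    """Two-task threshold policy with decision offset d (a sketch).

    Run task 0 while its deviation x_0 - tau(x) * u_0 from the utilization
    ray is below d; otherwise run task 1.
    """
    tau = x[0] + x[1]
    return 0 if x[0] - tau * u[0] < d else 1
```

Shifting d moves the boundary without changing its direction, which is what makes a single scalar sufficient in 2-D.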
Time Horizons Suggest a Generalization
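Ht = {x : x1 + x2 + … + xn = t}
[Figure: time horizons H0 through H4 in a two-task state space starting at (0,0), and horizons H0 through H2 in a three-task state space, with H2 passing through (2,0,0), (0,2,0), and (0,0,2); the utilization ray u is shown in both]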
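Because usage only grows, every dispatch moves the state from Ht to a strictly later horizon. A small sketch that enumerates the integer states on one horizon; `horizon` is a hypothetical helper.

```python
def horizon(t, n):
    """All integer states in N^n on the time horizon H_t = {x : x_1 + ... + x_n = t}."""
    if n == 1:
        return [(t,)]
    return [
        (first,) + rest
        for first in range(t, -1, -1)
        for rest in horizon(t - first, n - 1)
    ]
```

For three tasks, horizon(2, 3) yields the six states from (2, 0, 0) to (0, 0, 2), including the three corners shown in the figure.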
Three-task MDP Policy
Action partitions meet along a decision ray that is parallel to the utilization ray
Action partitions are roughly cone-shaped
[Figure: action partitions at time horizons t = 10, t = 20, and t = 30]
Parameterizing a Partition
Specify a decision offset at the intersection of partitions
Anchor action vectors at the decision offset to approximate partitions
A conic policy selects the action vector best aligned with the displacement between the query state and the decision offset
[Figure: action vectors a1, a2, a3 anchored at the decision offset, with a query state x]
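A sketch of the conic selection rule. The slides do not fix how the decision offset is placed on each horizon or how alignment is measured; here the decision point is assumed to be τ(x)·u + d (the offset applied on x's own horizon), and alignment is cosine similarity. `conic_policy` is a hypothetical name.

```python
import math


def conic_policy(x, u, d, action_vectors):
    """Pick the task whose action vector best aligns with the displacement
    from the decision point tau(x) * u + d to the query state x."""
    tau = sum(x)
    disp = [xi - (tau * ui + di) for xi, ui, di in zip(x, u, d)]

    def alignment(a):
        # Cosine similarity up to the common factor 1/|disp|, which does not
        # change which vector wins.
        norm = math.sqrt(sum(ai * ai for ai in a))
        return sum(ai * vi for ai, vi in zip(a, disp)) / norm

    return max(range(len(action_vectors)), key=lambda i: alignment(action_vectors[i]))
```

For example, with an equal three-way share and action vectors (-2, 1, 1), (1, -2, 1), (1, 1, -2), the state (0, 3, 3) lies in task 0's cone, so task 0 runs.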
Conic Policy Parameters
Decision offset d
Action vectors a1, a2, …, an
Sufficient to partition each time horizon into n regions
Allows good policy parameters to be found through local search
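The slides mention Monte Carlo search over conic parameters but not the procedure. As a stand-in, the sketch below hill-climbs over the decision offset and action vectors, scoring each candidate by the simulated average L1 distance from the utilization ray. The scoring function, Gaussian step size, histograms, and all names are assumptions.

```python
import math
import random


def conic_policy(x, u, d, vectors):
    """Choose the action vector best aligned with x - (tau(x) * u + d)."""
    tau = sum(x)
    disp = [xi - (tau * ui + di) for xi, ui, di in zip(x, u, d)]

    def align(a):
        return sum(ai * vi for ai, vi in zip(a, disp)) / math.sqrt(sum(ai * ai for ai in a))

    return max(range(len(vectors)), key=lambda i: align(vectors[i]))


def average_cost(params, u, durations, steps=200, runs=20, seed=0):
    """Monte Carlo estimate of the mean L1 distance from the utilization ray.

    Re-seeding makes each estimate deterministic and gives every candidate
    the same random stream, which reduces comparison noise.
    """
    d, vectors = params
    rng = random.Random(seed)
    total = 0.0
    for _ in range(runs):
        x = [0] * len(u)
        for _ in range(steps):
            task = conic_policy(x, u, d, vectors)
            ds, ps = zip(*durations[task].items())
            x[task] += rng.choices(ds, weights=ps)[0]
            tau = sum(x)
            total += sum(abs(xi - tau * ui) for xi, ui in zip(x, u))
    return total / (runs * steps)


def local_search(params, u, durations, iters=100, scale=1.0, seed=1):
    """Hill climbing: perturb every parameter and keep the candidate only if
    it lowers the estimated cost."""
    rng = random.Random(seed)
    best, best_cost = params, average_cost(params, u, durations)
    for _ in range(iters):
        d, vectors = best
        cand = (
            [di + rng.gauss(0, scale) for di in d],
            [[ai + rng.gauss(0, scale) for ai in a] for a in vectors],
        )
        c = average_cost(cand, u, durations)
        if c < best_cost:
            best, best_cost = cand, c
    return best, best_cost


u = [2 / 3, 1 / 3]
durations = {0: {1: 0.5, 3: 0.5}, 1: {2: 1.0}}  # hypothetical histograms
start = ([0.0, 0.0], [[-1.0, 1.0], [1.0, -1.0]])
best, best_cost = local_search(start, u, durations, iters=10)
```

Because only improving candidates are accepted, the returned cost never exceeds the starting parameters' cost under the same estimator.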
Comparing Policies
Policy found by ESPI (for small numbers of tasks)
πESPI(x) – chooses action at state x per solved MDP
Simple heuristics (for all numbers of tasks)
πunderused(x) – runs the most underutilized task
πgreedy(x) – minimizes immediate cost from state x
Conic approach (for all numbers of tasks)
πconic(x) – selects action with best aligned action vector
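A sketch of πgreedy, assuming that "immediate cost" means the expected cost of the successor state under the dispatched task's duration distribution, and that cost is the L1 distance from the utilization ray. `greedy_policy` is a hypothetical name.

```python
from fractions import Fraction


def cost(x, u):
    """Assumed cost: L1 distance from x to tau(x) * u on the utilization ray."""
    tau = sum(x)
    return sum(abs(xi - tau * ui) for xi, ui in zip(x, u))


def greedy_policy(x, u, durations):
    """Choose the task whose dispatch minimizes the expected cost of the
    successor state (min() breaks ties toward the lower index)."""

    def expected_next_cost(task):
        total = 0
        for d, p in durations[task].items():
            y = list(x)
            y[task] += d
            total += p * cost(y, u)
        return total

    return min(range(len(x)), key=expected_next_cost)
```

Unlike πunderused, this rule accounts for how far each task's duration would overshoot, but it still looks only one decision ahead.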
Policy Comparison on a 4 Task Problem
Task durations: random histograms over [2,32]
100 iterations of Monte Carlo conic parameter search
ESPI outperforms, conic eventually approximates well
Policy Comparison on a Ten Task Problem
Repeated the same experiment for 10 tasks
ESPI is omitted (intractable here)
Conic outperforms greedy & underutilized heuristics
Comparison with Varying #s of Tasks
100 independent problems for each # (avg, 95% conf)
ESPI is tractable only for the 2- and 3-task cases
Conic approximates ESPI there, then outperforms the other heuristics
Part III
Expanding our Notions of Utility and Availability
Time-Utility Functions
Previously, utility was proximity to a utilization target; now we let tasks’ utility and job availability* vary
[Figure: example time-utility function (TUF) plotted over time, marking period boundaries and termination times]
* Availability variable qi is defined over {0,1}; {0, …, ⌈tmi/pi⌉}; or {0,1}^⌈tmi/pi⌉
Utility × Execution Utility Density
A task’s time-utility function and its execution time distribution (e.g., Di(1) = Di(2) = 50%) give a distribution of utility for scheduling the task
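A sketch of combining a TUF with a duration distribution. It assumes a job started at time `now` earns the TUF's value at its completion time; the downward-step TUF and its parameters are hypothetical, while the 50/50 duration distribution follows the slide's example.

```python
def utility_distribution(now, duration_dist, tuf):
    """Distribution of utility from starting a job at time `now`.

    duration_dist maps each execution time to its probability; the job
    completes at now + duration and earns tuf(completion time).
    Returns {utility: probability}.
    """
    dist = {}
    for d, p in duration_dist.items():
        value = tuf(now + d)
        dist[value] = dist.get(value, 0.0) + p
    return dist


def step_tuf(t, height=10.0, critical=3):
    """Hypothetical downward-step TUF: full utility through time 3, then none."""
    return height if t <= critical else 0.0


D = {1: 0.5, 2: 0.5}  # D_i(1) = D_i(2) = 50%, as in the slide's example
```

Starting at time 2, the job finishes at time 3 or 4 with equal probability, so utility is 10 or 0 with probability 1/2 each.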
Actions and State Space Structure
The state space can be more compact here than in Parts I and II: its dimensions are task availability (e.g., over (q1, q2)) vs. time
Can wrap the state space over the hyper-period of all tasks (e.g., D1(1) = D2(1) = 1; tm1 = p1 = 4; tm2 = p2 = 2)
Scheduling actions induce a transition structure over states (e.g., idle action = do nothing; action i = run task i)
[Figure: transitions over time under the idle action, action 1, and action 2]
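A sketch of the wrapped state space for the slide's example (unit durations, p1 = 4, p2 = 2, so the hyper-period is 4); tasks 1 and 2 are indices 0 and 1 in the code. It assumes binary availability, termination times equal to periods (as in the example), a one-quantum idle action, and a new job released at each period boundary. `hyper_period`, `step`, and `reachable` are hypothetical names.

```python
from collections import deque
from functools import reduce
from math import gcd


def hyper_period(periods):
    """Least common multiple of all task periods."""
    return reduce(lambda a, b: a * b // gcd(a, b), periods)


def step(state, action, duration, periods):
    """Successor of (availability, time) after `action` holds the resource.

    action is None for idle (one quantum) or the index of an available task.
    The finished task's job becomes unavailable; any task whose period
    boundary falls in (t, t + duration] gets a fresh job. Time wraps modulo
    the hyper-period.
    """
    q, t = state
    q = list(q)
    if action is not None:
        q[action] = 0
    end = t + duration
    for j, p in enumerate(periods):
        if end // p > t // p:  # a release of task j falls in (t, end]
            q[j] = 1
    return tuple(q), end % hyper_period(periods)


def reachable(periods, durations):
    """BFS over (availability, wrapped time) from time 0 with all jobs released."""
    start = ((1,) * len(periods), 0)
    seen, frontier = {start}, deque([start])
    while frontier:
        s = frontier.popleft()
        q, _ = s
        moves = [(None, 1)] + [(i, d) for i in range(len(q)) if q[i] for d in durations[i]]
        for action, d in moves:
            nxt = step(s, action, d, periods)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return seen


periods = (4, 2)
durations = {0: {1: 1.0}, 1: {1: 1.0}}  # D1(1) = D2(1) = 1
states = reachable(periods, durations)
```

Wrapping time at the hyper-period keeps the state space finite: at most 2^n availability vectors times 4 positions here.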
Reachable States, Successors, Rewards
States with the same task availability and the same relative position within the hyper-period have the same successor state and reward distributions
[Figure: reachable states]
Evaluation
[Figure: three TUF shapes (target sensitive, linear drop, downward step), annotated with utility bounds, critical points, and termination times]
Different TUF shapes are useful to characterize tasks’ utilities (e.g., deadline-driven, work-ahead, jitter-sensitive cases)
We chose three representative shapes, and randomized their key parameters: utility bounds ui, termination times tmi, critical points cpi (we also randomized 80/20 task load parameters: li, thi, wi)
How Much Better is Optimal Scheduling?
Greedy (Generic Benefit*) vs. Optimal (MDP) Utility Accrual
* P. Li, PhD Dissertation, VA Tech, 2004
[Figure: utility accrual results for 2, 3, 4, and 5 tasks]
TUF nuances matter: e.g., a work-conserving approach degrades the target-sensitive policy
Divergence Increases with # of Tasks
Note that we can solve 5-task MDPs for periodic task sets (smaller state spaces; scalability is an ongoing issue)
Conclusions
We have developed new techniques for designing non-preemptive scheduling policies for tasks with stochastic resource usage durations
MDP-based methods are effective for 2- or 3-task utilization share problems (e.g., for an actuator)
Conic policy performance is competitive with ESPI for smaller problems, and for larger problems improves on the underutilized and greedy policies
Ongoing work is focused on identifying and evaluating important categories of time-utility functions and tailoring our approach to address their nuances
Publications
[1] R. Glaubius, T. Tidwell, C. Gill, and W.D. Smart, “Real-Time Scheduling via Reinforcement Learning”, UAI 2010
[2] R. Glaubius, T. Tidwell, B. Sidoti, D. Pilla, J. Meden, C. Gill, and W.D. Smart, “Scalable Scheduling Policy Design for Open Soft Real-Time Systems”, RTAS 2010 (received Best Student Paper award)
[3] R. Glaubius, T. Tidwell, C. Gill, and W.D. Smart, “Scheduling Policy Design for Autonomic Systems”, International Journal on Autonomous and Adaptive Communications Systems, 2(3):276-296, 2009
[4] R. Glaubius, T. Tidwell, C. Gill, and W.D. Smart, “Scheduling Design and Verification for Open Soft Real-Time Systems”, RTSS 2008
[5] T. Tidwell, R. Glaubius, C. Gill, and W.D. Smart, “Scheduling for Reliable Execution in Autonomic Systems”, ATC 2008
Thanks, and hope to see you at CPSWeek 2011!
Chris Gill, Associate Professor of Computer Science and Engineering