Priority Scheduling: An Application for the Permutahedron
Ethan Bolker, UMass-Boston, BMC Software
AMS Toronto meeting, September 24, 2000
2
Plan
• Brief introduction to queueing theory
• Priority scheduling
• Conservation laws and the permutahedron
• Specifying CPU shares
Interesting pictures and open questions
References: www.cs.umb.edu/~eb/goalmode
Acknowledgements: Jeff Buzen, Yiping Ding, Dan Keefe, Oliver Chen, Aaron Ball, Tom Larard
3
Queueing theory
• Workload: stream of jobs visiting a server (ATM, time shared CPU, printer, …)
• Jobs queue when server is busy
• Input:
– Arrival rate: λ job/sec
– Service demand: s sec/job
• Performance metrics:
– Utilization: u = λs (must be ≤ 1)
– Response time: r = ???
– Degradation: d = r/s
– Queue length: q = λr (Little’s law)
4
Response time computations
• r, d, q measure queueing delay
– r ≥ s (d ≥ 1), unless parallel processing possible
• Randomness really matters
– r = s (d = 1) if arrivals scheduled (best case, no waiting)
– r >> s for bulk arrivals (worst case, maximum delays)
• Theorem. d = 1/(1-u) if arrivals are Poisson and service is exponentially distributed (M/M/1).
– r = s/(1-u) (think virtual server with speed 1-u)
– q = u/(1-u) (convention: job in service is on queue)
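The M/M/1 formulas above are easy to tabulate. The following is a minimal sketch; the function name `mm1_metrics` and its parameter names are illustrative and not from the talk.

```python
def mm1_metrics(arrival_rate, service_demand):
    """Return (u, d, r, q) for an M/M/1 queue.

    arrival_rate   -- jobs per second
    service_demand -- seconds per job (s)
    """
    u = arrival_rate * service_demand   # utilization
    if not 0 <= u < 1:
        raise ValueError("need utilization below 1 for a stable queue")
    d = 1.0 / (1.0 - u)                 # degradation d = 1/(1-u)
    r = service_demand * d              # response time r = s/(1-u)
    q = arrival_rate * r                # Little's law, equals u/(1-u)
    return u, d, r, q


# A server that is 90% busy, with one-second jobs.
u, d, r, q = mm1_metrics(arrival_rate=0.9, service_demand=1.0)
print(f"u={u:.2f}  d={d:.2f}  r={r:.2f}s  q={q:.2f}")
```

At 90% utilization the response time is about ten times the service demand, and about nine jobs are queued on average.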
5
M/M/1
• Essential nonlinearity often counterintuitive
– at u = 90% average queue length is 0.9/(1-0.9) = 9,
– average response time is s/(1-0.9) = 10s,
– but 1 customer in 10 has no wait at all (10% idle time)
• A useful guide even when hypotheses fail
– accurate enough (± 20%) for real computer systems
– d depends only on u: many small jobs have same impact as few large jobs
– faster system ⇒ smaller s ⇒ smaller u ⇒ smaller r = s/(1-u); double win: less service, less wait
– waiting costly, server cheap (telephones): want u ≈ 0
– server costly (doctors): want u ≈ 1 but scheduled
6
Multiple Job Streams
• Multiple workloads, utilizations u1, u2, …
• U = Σ ui < 1
• With no priorities, all degradations are equal: di = 1/(1-U)
• Suppose priority scheduling possible
– Study degradation vector V = (d1, d2, …)
7
Priority Scheduling
• Priority state: order workloads by priority (ties OK)
– two workloads, 3 states: 12, 21, [12]
– three workloads, 13 states:
  • 123 (6 = 3! of these ordered states)
  • [12]3 (3 of these)
  • 1[23] (3 of these)
  • [123] (1 state with no priorities)
– n wkls, f(n) states, n! ordered (simplex lock combos)
• p(s) = prob( state = s ) = fraction of time in state s
• V(s) = degradation vector when state = s (measure this, or compute it using queueing theory)
• V = Σs p(s)V(s) (time avg is convex combination)
• Achievable region is convex hull of vectors V(s)
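The state counts quoted above (3 states for two workloads, 13 for three) can be checked by enumeration. The sketch below lists every priority state as an ordered sequence of tied groups; the function names are illustrative only.

```python
from itertools import combinations


def priority_states(workloads):
    """Yield each priority state as a list of tied groups, highest priority first."""
    workloads = tuple(workloads)
    if not workloads:
        yield []
        return
    # Pick the non-empty top-priority group, then order the rest recursively.
    for k in range(1, len(workloads) + 1):
        for top in combinations(workloads, k):
            rest = tuple(w for w in workloads if w not in top)
            for tail in priority_states(rest):
                yield [top] + tail


def label(state):
    """Format a state in the slides' notation, e.g. [12]3."""
    return "".join(
        str(g[0]) if len(g) == 1 else "[" + "".join(map(str, g)) + "]"
        for g in state
    )


states = [label(s) for s in priority_states((1, 2, 3))]
print(len(states), states)
```

For four workloads the same enumeration gives 75 states, which counts the no-priority state [1234] together with the 74 faces listed on the four-workload slide.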
8
Two workloads
[Figure: axes d1 and d2, the line d1 = d2, the points V(12) (wkl 1 high prio), V(21) and V([12]) (no priorities), and the achievable region.]
9
Two workloads
[Figure: axes d1 and d2, the line d1 = d2, the points V(12) (wkl 1 high prio), V(21) and V([12]) (no priorities), and the labels 0.5 V(12) + 0.5 V(21) and V([12]).]
10
Two workloads
[Figure: axes d1 and d2, the line d1 = d2, and the points V(12) (wkl 1 high prio), V(21) and V([12]) (no priorities).]
note: u1 < u2 ⇒ wkl 2 effect on wkl 1 large
11
Conservation
• No Free Lunch Theorem. Weighted average degradation is constant, independent of priority scheduling scheme:
  Σi (ui /U) di = 1/(1-U)
• Provable from some hypotheses
• Observable in some real systems
• Sometimes false: shortest job first minimizes average response time (printer queues, supermarket express checkout lines)
12
Conservation
• For any proper set A of workloads, with U(A) = Σ(i ∈ A) ui:
– Imagine giving those workloads top priority. Then can pretend other wkls don’t exist. In that case
  Σ(i ∈ A) (ui /U(A)) di = 1/(1-U(A))
– When wkls in A have lower priorities they have higher degradations, so in general
  Σ(i ∈ A) (ui /U(A)) di ≥ 1/(1-U(A))
• These 2^n - 2 linear inequalities determine the convex achievable region R
• R is a permutahedron: only n! vertices
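The vertices of R can be computed from the conservation laws alone. The sketch below assumes, as the argument above suggests, that under a strict priority order the k highest-priority workloads satisfy their conservation law with equality for every k. Subtracting consecutive equalities gives a closed form for each degradation. That closed form is my derivation, not a formula from the slides, and the utilizations are hypothetical.

```python
from itertools import combinations, permutations


def vertex(order, u):
    """Degradation vector V(order) for a strict priority order.

    order -- workload indices, highest priority first
    u     -- utilizations, u[i] for workload i
    Assumption: each prefix of `order` meets its conservation law with equality.
    """
    d = [0.0] * len(u)
    above = 0.0                       # utilization of higher-priority workloads
    for i in order:
        through = above + u[i]
        # Difference of U(A)/(1-U(A)) over consecutive prefixes, divided by u[i].
        d[i] = 1.0 / ((1.0 - above) * (1.0 - through))
        above = through
    return d


def satisfies_conservation(d, u, tol=1e-9):
    """Equality for the full set, inequality for every proper subset A."""
    n = len(u)
    g = lambda x: x / (1.0 - x)       # right-hand side, multiplied through by U(A)
    if abs(sum(u[i] * d[i] for i in range(n)) - g(sum(u))) > tol:
        return False
    for k in range(1, n):
        for A in combinations(range(n), k):
            if sum(u[i] * d[i] for i in A) < g(sum(u[i] for i in A)) - tol:
                return False
    return True


u = [0.2, 0.3, 0.4]                   # hypothetical utilizations, U = 0.9
vertices = {p: vertex(p, u) for p in permutations(range(3))}
for p, v in vertices.items():
    print(p, [round(x, 3) for x in v], satisfies_conservation(v, u))
```

Each of the 3! = 6 orderings yields a point that meets all 2^3 - 2 = 6 inequalities and the overall conservation law.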
13
Two workload permutahedron
[Figure: axes d1 and d2 with the line u1d1 + u2d2 = U/(1-U).]
14
Two workload permutahedron
[Figure: axes d1 and d2 with the line u1d1 + u2d2 = U/(1-U), the constraint d2 ≥ 1/(1-u2), and the vertex V(21).]
15
Two workload permutahedron
[Figure: axes d1 and d2 with the line u1d1 + u2d2 = U/(1-U), the constraints d1 ≥ 1/(1-u1) and d2 ≥ 1/(1-u2), and the achievable region between the vertices V(12) and V(21).]
16
Three workload permutahedron
[Figure: axes d1, d2 and d3 with the plane u1d1 + u2d2 + u3d3 = U/(1-U) and the vertices V(123) and V(213).]
17
Experimental evidence
18
Four workload permutahedron
4! = 24 vertices (ordered states)
2^4 - 2 = 14 facets (proper subsets) (conservation constraints)
74 faces (states)
Simplicial geometry and transportation polytopes, Trans. Amer. Math. Soc. 217 (1976) 138.
19
Scheduling for performance
• Administrator specifies performance goals
– desired degradations (IBM OS/390) (not today)
– CPU shares (UNIX offerings from HP, IBM, Sun)
• Operating system dispatches jobs in an attempt to meet goals
• Model predicts degradations by constructing map
  workload performance goals → permutahedron
20
Specifying CPU shares
• Administrator specifies workload CPU shares
• Share f (0 < f < 1) means workload guaranteed fraction f of CPU when at least one of its jobs is queued for service, can get more if some competition is absent
• share ≠ utilization
• share ≠ cap
• share should be renamed guarantee
21
Map shares to degradations
- two workloads -
• Suppose f1 and f2 > 0, f1 + f2 = 1
• Model: System operates in state
– 12 with probability f1
– 21 with probability f2
(independent of who is on queue)
• Average degradation vector:
  V = f1 V(12) + f2 V(21)
22
Model validation
23
Model validation
24
Map shares to degradations
- three (n) workloads -
prob(123) = f1 f2 f3 / [ (f1 + f2 + f3) (f2 + f3) (f3) ]
• Theorem: These n! probabilities sum to 1
– interesting identity generalizing adding fractions
– prove by induction, or by coupon collecting
• V = Σ (over ordered states s) prob(s) V(s)
• O(n!), Ω(n!), good enough for n ≤ 9 (12)
• Searching for fast (approximate) algorithm ...
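The share model can be evaluated by brute force over all n! ordered states. In the sketch below, `state_probability` implements the product formula for prob(123) generalized to any ordering. `vertex` is an assumed closed form for V(s), derived from the conservation laws by taking each top-priority group to satisfy its law with equality. The function names, shares and utilizations are illustrative.

```python
from itertools import permutations


def state_probability(order, f):
    """prob(order): choose priorities top-down, each in proportion to its share."""
    p = 1.0
    for k, i in enumerate(order):
        p *= f[i] / sum(f[j] for j in order[k:])
    return p


def vertex(order, u):
    """Assumed degradation vector V(order) for a strict priority order."""
    d = [0.0] * len(u)
    above = 0.0                        # utilization of higher-priority workloads
    for i in order:
        through = above + u[i]
        d[i] = 1.0 / ((1.0 - above) * (1.0 - through))
        above = through
    return d


def degradations_from_shares(f, u):
    """V = sum over ordered states s of prob(s) V(s); cost grows like n!."""
    n = len(u)
    V = [0.0] * n
    for order in permutations(range(n)):
        p = state_probability(order, f)
        Vs = vertex(order, u)
        for i in range(n):
            V[i] += p * Vs[i]
    return V


f = [0.5, 0.3, 0.2]                    # hypothetical shares
u = [0.2, 0.3, 0.4]                    # hypothetical utilizations
total = sum(state_probability(p, f) for p in permutations(range(3)))
print(total, degradations_from_shares(f, u))
```

The printed total illustrates the theorem that the n! probabilities sum to 1, and the resulting V is a convex combination of vertices, so it satisfies the conservation law.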
25
Model validation
26
Model validation
27
Map shares to degradations (geometry)
• Interpret shares as barycentric coordinates in the n-1 simplex
• Study the geometry of the map from the simplex to the n-1 dimensional permutahedron
• Easy when n=2: each is a line segment and map is linear
28
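Mapping a triangle to a hexagon
[Figure: map M from the share triangle (labels f1 = 1, f1 = 0, f3 = 0, f3 = 1) to the hexagon with vertices 132, 123, 213, 312, 321, 231, with the labels wkl 1 high priority and wkl 1 low priority.]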
29
Mapping a triangle to a hexagon
[Figure: labels f1 = 1, f1 = 0 and {23}.]
30
Mapping a triangle to a hexagon
31
Implementing fair share scheduling
• Actual Sun/Solaris implementation is subtle
• HP and IBM are black boxes (for me)
• Stochastic solution: randomly choose queued job to dispatch (implement the model rather than model an implementation)
• May require prior computation of priodist(w, p) = prob(wkl w runs at prio p)
• workload priority probabilities, not state probabilities
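One way to obtain a priodist is to marginalize the state probabilities of the share model. This is an assumption for illustration: the talk does not say how priodist is computed, and the sketch covers only strict orderings (no ties). The names `state_probability` and `priodist_from_shares` are hypothetical.

```python
from itertools import permutations


def state_probability(order, f):
    """prob(order): choose priorities top-down, each in proportion to its share."""
    p = 1.0
    for k, i in enumerate(order):
        p *= f[i] / sum(f[j] for j in order[k:])
    return p


def priodist_from_shares(f):
    """P[w][p] = probability that workload w runs at priority p (0 = highest)."""
    n = len(f)
    P = [[0.0] * n for _ in range(n)]
    for order in permutations(range(n)):
        prob = state_probability(order, f)
        for prio, w in enumerate(order):
            P[w][prio] += prob
    return P


P = priodist_from_shares([0.5, 0.3, 0.2])   # hypothetical shares
for row in P:
    print([round(x, 3) for x in row])
```

Each row sums to 1, and the first column reproduces the shares, since workload w gets top priority with probability f(w) in this model.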
32
Priority distributions
• Given degradations, compute a priodist
• A priodist is an n×n matrix with row sums 1
• {priodists} = cartesian product of n (n-1)-simplices
• Map is surjective, not injective
• Look for a well behaved inverse image
[Figure: map from priodist space (dim n(n-1)) to the permutahedron (dim n-1).]
33
Three workload permutahedron
[Figure: hexagon in axes d1 and d2 with vertices 132, 123, 213, 312, 321, 231, the states [12]3, 3[12], 2[13], [23]1, 1[23], [13]2 and [123], and the lines d1 = d2, d2 = d3, d1 = d3.]
34
… dissected into 3! quadrilaterals
[Figure: the hexagon in axes d1 and d2 with the lines d1 = d2 and d2 = d3; the quadrilateral shown has corners 123, [12]3, [123] and 1[23].]
35
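… each mapped to from a skew quadrilateral of priodists
[Figure: unit square with corners P123, P1[23], P[12]3, P[123] and a point (x,y), mapped onto the corner of the permutahedron with vertices 123, [12]3, [123], 1[23].]
P123 =
1 0 0
0 1 0
0 0 1
P1[23] =
1 0 0
0 .5 .5
0 .5 .5
P[12]3 =
.5 .5 0
.5 .5 0
0 0 1
P[123] =
.33 .33 .33
.33 .33 .33
.33 .33 .33
(x,y) → xy P123 + x(1-y) P1[23] + (1-x)y P[12]3 + (1-x)(1-y) P[123] → degradation vector in this corner of permutahedron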
36
Skew quadrilaterals
• Given 4 points P00, P01, P10, P11 ∈ R^m, map unit square:
  (x,y) → xy P00 + x(1-y) P01 + (1-x)y P10 + (1-x)(1-y) P11
• Easy to generalize to 2^k points
• Analogous to convex hull, which maps barycentric coordinates on a simplex
• Reference for this construction?
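The skew quadrilateral map is a bilinear blend of its four corner points. The sketch below applies it to the four 3×3 priodists from the previous slide, flattened to points in R^9. The function name `skew_quad` is illustrative, and 1/3 is used where the slide rounds to .33.

```python
def skew_quad(x, y, P00, P01, P10, P11):
    """Map (x, y) in the unit square to
    x*y*P00 + x*(1-y)*P01 + (1-x)*y*P10 + (1-x)*(1-y)*P11.
    Points are equal-length lists of floats."""
    w = (x * y, x * (1 - y), (1 - x) * y, (1 - x) * (1 - y))
    return [
        w[0] * a + w[1] * b + w[2] * c + w[3] * e
        for a, b, c, e in zip(P00, P01, P10, P11)
    ]


t = 1.0 / 3.0
P123    = [1, 0, 0,    0, 1, 0,     0, 0, 1]      # strict order 123
P1_23   = [1, 0, 0,    0, .5, .5,   0, .5, .5]    # 1[23]
P12_3   = [.5, .5, 0,  .5, .5, 0,   0, 0, 1]      # [12]3
P123tie = [t] * 9                                 # [123], no priorities

mid = skew_quad(0.5, 0.5, P123, P1_23, P12_3, P123tie)
rows = [mid[0:3], mid[3:6], mid[6:9]]
print(rows)
```

The four weights are non-negative and sum to 1, so every image point is again a matrix with row sums 1, and the corners of the square map to the four corner priodists.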
37
Inversion
[Figure: coordinate grid in axes d1 and d2.]
Try to locate * = (d1, d2) on coordinate grid
38
Sequential bisection
[Figures, slides 38 to 42: successive bisection steps in axes d1 and d2.]
43
… may fail to converge
[Figure: axes d1 and d2.]
44
Tempered sequential bisection
[Figures, slides 44 to 46: axes d1 and d2 with one, two, then three marked points (o).]
prove that this converges...