Multi-layered Round Robin Routing for Parallel Serversdownd/ksrpt3.pdf · Department of Computing...

Multi-layered Round Robin Routing

for Parallel Servers

Douglas G. DownDepartment of Computing and Software

McMaster University

1280 Main Street West, Hamilton, ON L8S 4L7, Canada

[email protected]

905-525-9140

Rong WuDepartment of Computing and Software

McMaster University

1280 Main Street West, Hamilton, ON L8S 4L7, Canada

[email protected]

(Work done while both authors were visitors at EURANDOM, P.O. Box 513,5600 MB Eindhoven, The Netherlands.)

Abstract

We study a system of several identical servers in parallel, where a routing decision must be made immedi-ately on a job’s arrival. Jobs arrive according to a Poisson process, with their processing times following adiscrete distribution with finite support. The processing time of a job is known on arrival and may be usedin the routing decision. We propose a policy consisting of multi-layered round robin routing followed byshortest remaining processing time scheduling at the servers. This policy is shown to have a heavy trafficlimit that is identical to one in which there is a single queue (no routing) and optimal heavy traffic schedul-ing. In light traffic, we show that the performance of this policy is worse than round robin routing followedby shortest remaining processing time scheduling. We also quantify the difference between round robin andmulti-layered round robin routing, which in turn yields insights on the relative importance of routing versus(local) scheduling in such systems.

Multi-layered Round Robin Routing – D.G. Down and R. Wu 1

1 Introduction

With the increasing presence of distributed server architectures in today’s telecommunications and computernetworks, the problem of routing and scheduling jobs in such systems has become of increasing interest. Wepropose to look at a particular architecture, one in which arrivals must be immediately routed to one ofseveral identical servers. The application that we have in mind is a cluster of Web servers, however such amodel should also be of more general interest.

There are two components that we will assume are free to design. The first is the routing policy, whichdetermines how jobs are assigned to servers. Examples of this are random routing (a job is assigned to aparticular server with probability 1/c, where c is the number of servers), round robin (RR) routing (the itharriving job is assigned to the (i mod c)th server) and Join the Shortest Queue (JSQ), where each job isassigned to the server with the least number of waiting jobs. Notice that a routing rule requires varyingdegrees of information about the system state: random and round robin require none, while JSQ requiresthe queue length information at each server. The second component that we are free to design is the localscheduling policy used to schedule the order of processing at a server. Examples of this are First Come FirstServed (FCFS), in which jobs are served in the order that they arrive and Shortest Remaining ProcessingTime (SRPT), where priority is given to the job with the least processing time remaining. To implementSRPT, one needs to know the processing time of a job. We will assume that this is the case in our system,which could model, for example, static requests in a Web server system.

The first papers related to our model were for the case that the actual processing time of a job is onlyknown once it has been completed. Winston [23] showed that JSQ routing followed by FCFS schedulingminimizes the mean waiting time for exponentially distributed interarrival and processing times. Weber[20] extended this result by showing that JSQ routing followed by FCFS scheduling maximizes the numberof jobs that have completed processing by a given time, where the arrival process is arbitrary and the pro-cessing times have a non-decreasing hazard rate. Liu and Towsley [10] show that if the queue lengths arenot available on a job’s arrival and if the processing times have a non-decreasing hazard rate, RR routingfollowed by FCFS scheduling minimizes a separable convex ordering of the waiting times. Ephremides,Varaiya, and Walrand in [3] show that RR routing followed by FCFS scheduling minimizes the sum of theexpected completion times for the first n tasks (for all n) if the task sizes follow an exponential distribution.The results in [3] were extended to arbitrary processing time distributions by Liu and Righter [9]. Whitt[22] then provided several examples in which JSQ routing followed by FCFS scheduling does not minimizethe mean waiting time. These examples had processing times with large variance, in particular they did nothave non-decreasing hazard rates. That such examples could be constructed is not surprising in hindsight,given the results in Righter, Shanthikumar and Yamazaki [14], who show that for a system with a singleserver, FCFS actually maximizes the mean waiting time in a system with non-increasing hazard rate. Thisdoes raise the question: what is the relative value of routing versus scheduling?

We will attempt to provide some insight into this question in the context of the case when the processingtimes are known on arrival. We are interested in minimizing the mean waiting time in the system for anarriving job. For a single server system, Schrage [16] showed that for an arbitrary arrival process andarbitrary processing time distribution, SRPT scheduling minimizes the completion time of the nth departing


job. So, for our model, no matter what the routing policy, it is optimal to use SRPT at each server. Inpractice, the belief that long jobs can be starved (see Tanenbaum [17], amongst others) means that SRPTis not often used. In that spirit, several recent works have suggested policies where the scheduling policyis constrained to be FCFS. Harchol-Balter, Crovella and Murta [6] have proposed a routing policy calledSITA-E (Size Interval Task Assignment with Equal Load) in which jobs within a particular size range areassigned to the same server, with the size ranges chosen to equalize the loads across servers. They showthat under highly variable processing time distributions, SITA-E routing may significantly outperform JSQrouting (with FCFS scheduling). Similar policies are suggested in the work of Ciardo, Riska and Smirni[2] and Riska, Sun, Smirni and Ciardo [15]. An important issue with [6] and [2] is that one must know theprocessing time distribution. The algorithm in [15] is an adaptive version of that in [2], so the processingtime distribution is not assumed to be known a priori.

In this paper, we would like to give an idea of the fundamental limits that can be achieved in such models.Identifying policies which are optimal in some sense is useful even if there are difficulties in implementation.The optimal policy provides a benchmark for use in the study of other proposed schemes. In Section 2, welook at the tightly coupled servers case, i.e. one in which there is a single queue for all of the servers. Theperformance of such a system will be a lower bound on that of the system of interest (which we will call theloosely coupled servers case). For the tightly coupled server system we can identify the optimal policy inheavy traffic (all servers are busy at all times) in the strong sense of minimizing the completion time of thenth departing job from the system. In Section 3, we study the loosely coupled servers case for a particularset of assumptions on the underlying interarrival and processing times. In particular, we assume that thearrivals follow a Poisson process and the service times follow a discrete distribution with finite support. Weshow that, in heavy traffic, a policy consisting of a multi-layered round robin routing policy (an independentround robin policy for each point in the processing time distribution) followed by SRPT scheduling leadsto the same heavy traffic limit as the optimal policy for the tightly coupled servers, a property that we willrefer to as asymptotically optimal in heavy traffic. A consequence of this result is that if the heavy trafficlimits were used to approximate the mean waiting time in the system, the resulting approximations wouldbe identical for both systems. The proof of asymptotic optimality is done by showing that the heavy trafficlimits for the two systems are the same as those for particular priority policies and as such exhibit statespace collapse. Priority policies and state space collapse have been studied in Whitt [21] and Reiman [11]and we will see that our analysis yields heavy traffic limits that are similar to those in [21] and [11]. Infact, we would argue that our contribution is in identifying the asymptotically optimal policy in the looselycoupled servers case, rather than any methodological contribution. Under our assumptions on the processingtime distribution, the existing methodologies are almost immediately applicable. In Section 4, we examinethe relative tradeoff between routing and scheduling, by comparing the performance of naive (round robin)routing with that of the asymptotically optimal policy. In Section 5, we discuss how one may use the insightdeveloped in the first part of the paper to design policies that perform well for systems with general servicetime distributions. In particular, we perform simulations for a system with a Bounded Pareto processingtime distribution. To put the results of the first part of the paper in context, Section 6 involves the discussionof policies in light traffic. Multi-layered round robin routing policy followed by SRPT scheduling is notoptimal in this case. Using Reiman-Simon theory (see Reiman and Simon [12, 13]), we show that round


robin routing followed by SRPT scheduling is asymptotically optimal for general distributions in the sensethat it minimizes the first two terms of the Taylor series expansion of the waiting time. Section 7 providessome final thoughts.

2 Tightly coupled servers

Here we consider a system where there is a common queue for all servers. We make no assumptions on thearrival process or processing time distribution. There are c identical servers. A job can be processed by atmost one server at any point in time. In the case c = 1, Schrage [16] has shown that the Shortest RemainingProcessing Time (SRPT) policy is optimal. The proof involves an interchange argument and also shows thatthe number in system is minimized for each sample path, which of course is much stronger than the meannumber in the system being minimized.

Now consider the case where there are c > 1 servers and a single queue. We define the c-SRPT policyas one in which preemptive (resume) priority is given to the first c jobs with the least remaining processingtimes (if the system has less than c jobs, every job receives service). Let Cc-SRPT(n) be the completion timeof the nth departing job under c-SRPT and Cπ(n) be the corresponding quantity under any other policy π.

Theorem 2.1 For a tightly coupled server system, in heavy traffic (all servers are busy at all times), c-SRPTis optimal in the sense that

Cc-SRPT(n) ≤ Cπ(n)

for all n.

Proof. The proof of this is a straightforward extension of that for the single server in [16], so we will usethe same notation, with a slight change to reflect the fact that there is more than one server. If we let δi(n, t)be the indicator that server i is devoted to job n at time t, A(n) be the arrival time of job n, and P (n) be theprocessing time of job n, we have that the remaining processing time for job n at time t, S(n, t), is given by

S(n, t) = P (n)−c∑

i=1

∫ t

A(n)δi(n, x)dx

and the corresponding completion time for job n is

C(n) = min

{x :

c∑i=1

∫ x

A(n)δi(n, y)dy ≥ P (n)

}.

Now, consider two jobs, labelled j and k, where S(j, t) < S(k, t). Suppose that there exists some timeinterval such that for policy π, processing is devoted to k but not j. We will modify π only on this interval.Note that we make no assumption on which server the processing is done and assume without loss ofgenerality that under π, all servers are busy during the time period. So, we have

Cπ(k) + Cπ(j) = min[Cπ(k), Cπ(j)] + max[Cπ(k), Cπ(j)].


The term max[Cπ(k), Cπ(j)] is unchanged, but min[Cπ(k), Cπ(j)] is reduced if we process j instead of k.Call this new policy π′. So, on the interval (min[Cπ′

(k), Cπ′(j)],min[Cπ(k), Cπ(j)]), the number in the

system is one less under the modified policy and thus c-SRPT is optimal.

So, in a tightly coupled system, c-SRPT is optimal in heavy traffic. Although a straightforward extensionof [16], to our knowledge this result has not appeared in the literature. For example, Ungureanu et al. [18]proposed that non-preemptive c-SRPT be used (they called this Deferred Assignment Scheduling (DAS))and showed its effectiveness via simulation. Note that preemption is crucial in the optimality proof, thepoint here is that it is surprising that the optimality of c-SRPT was not used as motivation for DAS. We willuse the heavy traffic optimality of c-SRPT in what follows, however at this point we note that it may be ofindependent interest to use this policy to provide a benchmark for the performance of other policies.

3 Loosely Coupled Servers

We now consider the case in which a dispatcher must make a routing decision at the time of a job’s arrival(this decision is assumed to take zero time). The model that we consider is one in which arrivals occuraccording to a Poisson process of rate λ. As in the tightly coupled servers case, there are c identical servers.A job has processing times that are chosen from a discrete distribution with finite support, i.e. if X is ageneric processing time,

P[X = xk] = αk, k = 1, . . . ,K, (3.1)

where K > 1,∑K

k=1 αk = 1, and we order the possible processing times in the order x1 < x2 < · · · < xK

(note that if K = 1 the problem is trivial). Also, without loss of generality, we assume that αk > 0, forall k = 1, . . . ,K. We will call jobs with processing times xk type k jobs. A job is indivisible, i.e. onlyone server can work on it at any given time. The processing times of jobs are known upon arrival, i.e. theymay be used in designing the policy. We use insight from the previous section to suggest a policy and provethat in heavy traffic the policy is asymptotically optimal (as a reminder, a policy is asymptotically optimalin heavy traffic if the total number in system has the same heavy traffic limit as the optimal policy, in thiscase c-SRPT). The policy we will study is a multi-layered round robin policy for routing followed by SRPTscheduling at each server. By multi-layered round robin, we mean that the dispatcher uses independentround robin policies for each type of job. We will call this overall policy RRK-SRPT.

The intuition for why this should be a reasonable policy is given by examining how c-SRPT operates.If we would like to mimic how that policy works, we see that we must try to spread around jobs of similarsize to the greatest extent possible. For c-SRPT, if there are only type k or higher jobs in the system, eachserver is processing a type k job (if there are at least c type k jobs in the system, if not, each type k job inthe system is processed by a server). A similar routing policy has been suggested by Ungureanu et al. [19]called Class Dependent Assignment (CDA), in which “short” jobs are assigned in a round robin manner and“long” jobs are deferred until servers become idle. However, processing at a server is FCFS.

In what follows below we wish to compare RRK-SRPT and c-SRPT. We will compare them in heavytraffic and use the now familiar tool of heavy traffic limits. To do this we need to introduce some notationand we also find that it is easier to look at the two policies in turn and save the comparison to the end.


3.1 RRK-SRPT

We focus our attention on one server. For that server, suppose that we separate the K different types intodifferent arrival streams. For each type k, denote by {uk

i , i ≥ 1}, {vki , i ≥ 1} as the interarrival and

processing time sequences, respectively. For k = 1, . . . ,K, the multi-layered round robin policy gives thefollowing values for the rates and variances associated with the interarrival and processing time sequences(at a single server):

λk = (E[uk1])

−1 = αkλ/c,

ak = V ar(uk1) = c

α2kλ2 ,

µk = (E[vk1 ])−1 = 1

xk,

sk = V ar(vk1 ) = 0.

We assume that we have a sequence of systems where the quantities above are indexed by (n) and

λk(n) −→ λk,

ak(n) −→ ak,

µk(n) −→ µk,

sk(n) = 0,

as n→∞. For the scaling argument below, we also require

supn≥1

E[(uk1(n))2+ε] < ∞ for some ε > 0 and (3.2)

supn≥1

E[(vk1 (n))2+ε] < ∞ for some ε > 0 (3.3)

but these are both automatically satisfied for our system, as uk1(n) follows an Erlang distribution and vk

1 (n)is deterministic. We define Qk(t) to be the number of type k jobs at a single server at time t. We areinterested in the following scaled process:

Q(n)k (t) = n−1/2Q

(n)k (nt).

We assume thatQ

(n)k (0) = 0, 1 ≤ k ≤ K.

To state our main result, we also need to define

di(n) =√n

i∑j=1

ρj(n)− 1

where ρi(n) is the load due to jobs of type i, i.e. ρi(n) = λi(n)/µi(n), 1 ≤ i ≤ K. We assume∑K

i=1 ρi(n) < 1.We make the following assumptions

di(n) −→ −∞ as n→∞, 1 ≤ i ≤ K − 1,

dK(n) −→ dK as n→∞, −∞ < dK < 0,

ρi(n) −→ ρi as n→∞, 1 ≤ i ≤ K − 1.


This set of conditions we will call the heavy traffic conditions.We then have the following result, where RBM(a, b) denotes a reflected Brownian motion with drift a

and variance b. Also, =⇒ denotes weak convergence in the metric spaceD consisting of all right continuousfunctions with left limits.

Theorem 3.1 For a single queue in a loosely coupled server system operating under RRK-SRPT, underheavy traffic conditions,

Q(n)k =⇒ 0, 1 ≤ k ≤ K − 1, (3.4)

Q(n)K =⇒ 1

xKRBM

(dK ,

K∑k=1

αkλx2k

c2

). (3.5)

Proof. Whitt [21] and Reiman [11] show that (3.4) and (3.5) hold if type i is given preemptive priorityover type j, i < j. It is intuitively clear that this is also true for SRPT, as it almost has the same prioritystructure. The only difference is that when a type i job arrives when no other type i jobs are in the system,it may have an additional wait due to the effect of at most one job of type j > i, if there is one with aremaining processing time of less than or equal to xi. However, the effect of this additional job disappearsin the limit.

We require three workload processes. The first, U = {U(t), t ≥ 0} is the total workload (U(t) is the sumof the remaining processing times for jobs still in the system at time t). The process UK−1 = {UK−1(t), t ≥0} is the total workload due to jobs of types 1 through K−1 and the process Up,K−1 = {Up,K−1(t), t ≥ 0}is the workload due to types 1 through K − 1 when SRPT is replaced by a policy where types 1 throughK − 1 have preemptive priority over type K (the actual service order within types 1 through K − 1 doesnot matter). For SRPT, there can be at most one type K job in system that has received some service at anypoint in time. The work done on types 1 through K−1 in SRPT will fall behind that done under preemptivepriority for types 1 throughK−1 by at most the remaining processing time of a typeK job that is in servicewhen a job of lower type arrives. To be precise,

UK−1(t) ≤ UK−1,p(t) + xK . (3.6)

Define U (n)K−1,p(t) = n−1/2U

(n)K−1,p(nt) and U (n)

K−1(t) = n−1/2U(n)K−1(nt). By Theorem 4.1 of [11], we then

haveU

(n)K−1,p =⇒ 0, (3.7)

as n→∞. Combining (3.7), (3.6), and the fact that UK−1(t) ≥ 0, we immediately get

U(n)K−1 =⇒ 0,

as n→∞. This yields (3.4).The relation (3.4) shows that there is the state space collapse phenomenon that is observed for priority

systems in [11]. Under our scaling, the total workload is thus contained entirely in type K. To make thisprecise, if we define U (n)(t) = n−1/2U (n)(nt), then we have convergence to a limit

U (n) =⇒ RBM

(dK ,

K∑k=1

αkλx2k

c2

)


that is independent of the scheduling policy. (This is shown in, for example, Theorem 3.1 of [11]. Note alsothat this is where (3.2) and (3.3) are used.) Also,

U (n)(t) =K∑

k=1

xkQ(n)k (t)−

K∑k=1

B(n)k (t), (3.8)

where 0 ≤ B(n)k (t) < xk is the amount of service given to the job at the head of the type k queue at time t.

Combining (3.8) with (3.4) yields (3.5).

3.2 c-SRPT

To differentiate between quantities for the RRK-SRPT system, we will use a * superscript to denote that weare using quantities for the optimal policy in heavy traffic for the tightly coupled server system. In this case,we need to look at the system as a whole (rather than just one server). The definitions of the underlyingquantities are very similar to the previous section, however it is useful to include them all explicitly atthis point, even if there is some repetition. Once again we separate the K different types into differentarrival streams. Denote by {uk,∗

i , i ≥ 1}, {vk,∗i , i ≥ 1} the interarrival and processing time sequences,

respectively. Then we have the following values for the rates and variances associated with the interarrivaland processing time sequences:

λ∗k = (E[uk,∗1 ])−1 = αkλ,

a∗k = V ar(uk,∗1 ) = 1

α2kλ2 ,

µ∗k = (E[vk,∗1 ])−1 = 1

xk,

s∗k = V ar(vk,∗1 ) = 0.

We assume that we have a sequence of systems where the quantities above are indexed by (n) and related tothe sequence of networks considered in the previous section in the following manner:

λ∗k(n) −→ cλk,

a∗k(n) −→ ak/c,

µ∗k(n) −→ µk,

s∗k(n) = 0,

as n→∞. As before, we require

supn≥1

E[(uk,∗1 (n))2+ε] < ∞ for some ε > 0 and

supn≥1

E[(vk,∗1 (n))2+ε] < ∞ for some ε > 0

which are again both automatically satisfied for our system. We define Q∗k(t) to be the number of type kjobs in the system at time t. We form the scaled process

Q∗,(n)k (t) = n−1/2Q

∗,(n)k (nt).


As before, we also need to define

d∗i (n) =√n

i∑j=1

ρ∗j (n)− 1

where ρ∗i (n) is the load due to jobs of type i, i.e. ρ∗i (n) = λ∗i (n)/(cµ∗i (n)), 1 ≤ i ≤ K and ρ∗i (n) < 1,1 ≤ i ≤ K.

We make the same assumptions on d∗i (n), ρ∗i (n), andQ∗,(n)i (0) as for di(n), ρi(n), andQ(n)

i (0), respec-tively, which result in the heavy traffic conditions,

d∗i (n) −→ −∞ as n→∞, 1 ≤ i ≤ K − 1,

d∗K(n) −→ dK as n→∞, −∞ < dK < 0,

ρ∗i (n) −→ ρi as n→∞, 0 ≤ ρi < 1, 1 ≤ i ≤ K − 1.

We then have the following result.

Theorem 3.2 For a tightly coupled server system operating under c-SRPT, under heavy traffic conditions

Q∗,(n)k =⇒ 0, 1 ≤ k ≤ K − 1, (3.9)

Q∗,(n)K =⇒ c

xKRBM

(dK ,

K∑k=1

αkλx2k

c2

). (3.10)

Proof. The relation (3.9) follows in a similar manner to (3.4). We need to introduce one more workloadprocess than in Theorem 3.1. In addition to the total workload, U∗ = {U∗(t), t ≥ 0}, the workload due tojobs of types 1 through K − 1, U∗K−1 = {U∗K−1(t), t ≥ 0}, and U∗K−1,p = {U∗K−1,p(t), t ≥ 0}, the totalworkload of types 1 through K − 1 where types 1 through K − 1 have preemptive priority over type K, wedefine U∗K−1,ssp = {U∗K−1,ssp(t), t ≥ 0} as the unfinished workload for a system where the c servers arereplaced by a “super server” that works at c times the rate of a single server and preemptive priority is givento types 1 through K − 1.

By combiningU∗K−1,p(t) ≤ U∗K−1,ssp(t) + cxK (3.11)

andU∗K−1(t) ≤ U∗K−1,p(t) + cxK , (3.12)

we haveU∗K−1(t) ≤ U∗K−1,ssp(t) + 2cxK . (3.13)

The relation (3.12) is a trivial extension of (3.6) for c servers and (3.11) follows from the fact that if thereare less than c jobs of types 1 through K − 1, the “super server” system still decreases work at rate c, withthe greatest differential being when there is one job of types 1 through K − 1 in the system.

Define U∗,(n)K−1,ssp(t) = n−1/2U

∗,(n)K−1,ssp(nt) and U∗,(n)

K−1 (t) = n−1/2U∗,(n)K−1 (nt). By Theorem 4.1 of [11],

we haveU∗,(n)K−1,ssp =⇒ 0, (3.14)


as n→∞. Combining (3.14) with (3.13), we immediately get

U∗,(n)K−1 =⇒ 0,

as n→∞. This yields (3.9).Now, we can follow the methodology of Iglehart and Whitt [7, 8] to show that the total workload process

has a heavy traffic limit

U∗,(n) =⇒ cRBM

(dK ,

K∑k=1

αkλx2k

c2

). (3.15)

Note that [7, 8] shows this for multiple servers operating under FCFS. That the same convergence appliesfor our policy follows from the fact that if we let Uπ(t) represent the remaining workload at time t for apolicy π, then if π1 and π2 are two non-idling policies,

|Uπ1(t)− Uπ2(t)| < cxK . (3.16)

The relation (3.16) in turn follows from the following four trivial observations.

1. If during any interval [t, t+ δt], there are at least c tasks in both systems, Uπ1(t+ δt)−Uπ2(t+ δt) =Uπ1(t)− Uπ2(t).

2. For i, j = 1, 2 and any interval [t, t+ δt], if under πi there are at least c tasks, under πj there are lessthan c tasks, and Uπi(t) ≥ Uπj (t), then Uπi(t+ δt)− Uπj (t+ δt) ≤ Uπi(t)− Uπj (t).

3. For i, j = 1, 2, if at time t under πi there are at least c tasks, under πj there are less than c tasks, andUπi(t) < Uπj (t), then Uπi(t), Uπj (t) < cxK .

4. If at time t, under both π1 and π2 there are less than c tasks, then Uπ1(t), Uπ2(t) < cxK .

Suppose Uπ1(0) = Uπ2(0). These four observations yield that if (3.16) holds for some t1, then it also holdsfor any t2 > t1, and thus holds for all t. Now, if we let π1 be FCFS and π2 be c-SRPT, we have from (3.16),

UFCFS(t)− cxK ≤ U∗(t) ≤ UFCFS(t) + cxK

which in turn implies (3.15).It may be useful to note that this is the workload process for a G/G/1 system with processing times

divided by the number of servers, c. FCFS scheduling is used in [7, 8] to get related limits for the waitingtime process.

The relation (3.8) holds with B(n)k (t) replaced by B∗,(n)

k (t), where 0 ≤ B∗,(n)k (t) < cxk is the total

amount of processing time that has been done on type k jobs still in the system at time t. Combining thisand (3.9) yields (3.10).

Combining Theorems 3.1 and 3.2, we get the main result of the paper. Upon noting that Theorem 3.1 isfor a single server, its proof is a trivial combination of these two results.


Theorem 3.3 RRK-SRPT is asymptotically optimal in heavy traffic.

A consequence of Theorem 3.3 is that if we were to use the results of Theorems 3.1 and 3.2 to approxi-mate the mean waiting time in system (by taking the means of (3.5) and (3.10) and applying Little’s Law),c-SRPT and RRK-SRPT would have identical (approximate) mean waiting times.

A stronger notion of asymptotic optimality would be to show that the (normalized) mean queue lengthswere asymptotically equal in heavy traffic. As the means of the diffusion limits for the total queue lengthunder RRK-SRPT and c-SRPT are equal, this would involve showing that the diffusion limits (3.5) and(3.10) have the same mean as the normalized queue lengths in each setting. Such a result would be true if onecould show that the stationary distribution of the reflected Brownian motions in (3.5) and (3.10) converged tothe stationary distributions of the corresponding scaled queue lengths and justify an appropriate interchangeof limit and expectation. Recent work in related problems is in Gamarnik and Zeevi [5], which looks at thisissue for generalized Jackson networks. In their work there is also a comprehensive overview of existingsolidarity results for different models. We conjecture that this stronger notion of asymptotic optimality holdsfor RRK-SRPT.

So, multi-layered round robin routing followed by SRPT scheduling is asymptotically optimal underheavy traffic conditions. Thus, one would suspect that this policy would work well under high loads. It isvery attractive in the sense that given the processing time distribution is discrete with finite support, noneof the underlying parameters of the system need to be known to implement the policy. The support of thedistribution could be learned in real-time. However, several natural questions do arise. The first is, howmuch worse would the performance be if the routing were to be simplified? (This is not to say the routing isparticularly complicated, only K counters need to be kept and the size of the job determined upon arrival.)This would also give insight into the relative importance of making good routing decisions versus goodscheduling decisions. This question is the topic of the next section. The second question is, what can one doif the assumptions on the processing time distribution are relaxed, in particular if the processing times havea density? A complete answer is not available at this time, but some initial thoughts are given in Section 5.

4 Round robin routing

Suppose we now use naive round robin routing followed by SRPT scheduling (i.e. the task size is not usedin the routing decision). We will call this policy RR-SRPT. This policy has the attractive feature that noknowledge of the processing time distribution is required. We focus our attention on one server, and examinethe single arrival stream consisting of all arriving jobs, so the required parameters for the interarrival timeand processing time sequences are (λ′ the arrival rate, a the interarrival time variance, µ the processing rate,


s the processing time variance):

λ′ = λ/c,

a = c/λ2,

µ = 1/K∑

k=1

αkxk,

s =K∑

k=1

αkx2k −

(K∑

k=1

αkxk

)2

.

We still keep separate queues for each of the job types (we need the state space collapse results as before),but as seen in (4.3) below, we calculate the limit for the normalized workload based on aggregating all ofthe types into a single arrival stream. We then have the following result for the heavy traffic limit.

Theorem 4.1 For a single queue in a loosely coupled server system operating under RR-SRPT, under heavytraffic conditions,

Q(n)k =⇒ 0, 1 ≤ k ≤ K − 1, (4.1)

Q(n)K =⇒ 1

xKRBM

dK ,λ

c

K∑k=1

αkx2k −

λ

c

(K∑

k=1

αkxk

)2

+1λ

. (4.2)

Proof. Using Theorem 3.1 of [11], we have that the normalized workload process converges to

U (n) =⇒ RBM

dK ,λ

c

K∑k=1

αkx2k −

λ

c

(K∑

k=1

αkxk

)2

+1λ

. (4.3)

The result (4.1) follows as the corresponding result (3.4) in Theorem 3.1 and (4.2) follows from relating theworkload process to the queue length process, also as in Theorem 3.1.

Taking the ratio of the variance terms in (4.2) and (3.5), we get

λ“PK

k=1 αkx2k−(

PKk=1 αkxk)2

”c + 1

λ∑Kk=1 αkλx

2k/c

2. (4.4)

Note that if we keep µ fixed and let xK → ∞, we get this ratio goes to c. This suggests that asthe processing times take on more extreme values and there is a large number of servers, there will be asignificant performance gap between RRK-SRPT and RR-SRPT.

To illustrate the performance, we include here a couple of simulation results. We have performed moreextensive simulation studies, but the following should illustrate the main concepts. We consider a systemwith four servers and a two-point processing time distribution. The mean processing time is chosen to bethree and the arrival rate λ chosen to be such that the overall load is 0.95. Table 1 gives 90 percent confidenceintervals for the expected queue length, where for each case we ran 30 replications. When x1 = 1, x2 = 4,each replication consisted of 1× 107 arrivals, while when x1 = 1, x2 = 1000, each replication consisted of


1× 108 arrivals. We do see a degradation in the performance of RR-SRPT for the case when the processingtimes take on more extreme values. The ratio (4.4) in the two cases is 1.63 and 3.98, respectively, whichshows good agreement with the results in Table 1. (The ratio for the more extreme case is smaller than 3.98,but this is most likely due to the number of small jobs in the system.)

Policy x1 x2 CI for mean queue length

RR-SRPT 1 4 (16.017, 16.082)RR2-SRPT 1 4 (11.815, 11.877)

4-SRPT 1 4 (11.597, 11.653)RR-SRPT 1 1000 (28.404, 29.022)

RR2-SRPT 1 1000 (10.346, 10.512)4-SRPT 1 1000 (9.762, 9.984)

Table 1: Comparison of RR-SRPT, RRK-SRPT and c-SRPT

We now move on to some thoughts of what more can be said in the case of general processing timedistributions.

5 General processing time distribution

Clearly, one cannot directly implement RRK-SRPT in the case when the processing time distributions havea density. So, while c-SRPT would still be optimal in heavy traffic in the tightly coupled servers case, itis not clear if we can construct a policy which is optimal in some asymptotic sense for the loosely coupledservers case. However, one possibility is to map a range of job sizes to a “type” and implement RRK-SRPT.To do this, suppose that the processing time distribution has a density g(x). Then we could partition thesupport of g into K intervals such that a type k job is one whose processing time lies in the kth interval. Wecould then implement RRK-SRPT using this partition. How many intervals to choose and how to choosetheir endpoints to achieve good performance is not a trivial task, and we intend this to be a topic for futureresearch. For now, we present the following simulation results. Once again, we have performed moreextensive simulations, but this set nicely summarizes our findings. We assume the processing times followthe Bounded Pareto distribution used in [6], i.e.

g(x) =

{αmα

1−(m/p)αx−α−1 0 < m ≤ x ≤ p

0 otherwise

where we set α = 1.2, m = 512, p = 1010, which gives a mean processing time of 2965. The system haseight servers and the arrival rate is chosen such that the system load is 0.95. We compare seven differentpolicies.

1. 8-SRPT

2. RR-SRPT


3. Choose two intervals such that the probability a job is of type 1 is 0.85. In other words, type 1 jobshave processing times in [512, 2488) and type 2 jobs have processing times in [2488, 1010]. Then useRR2-SRPT.

4. The same as 3, but change the probability to 0.95, so the intervals for the two types are [512, 6220)and [6220, 1010].

5. Four types, where intervals are chosen according to those for SITA-E, i.e. the intervals for the fourtypes are [512, 2036.6), [2036.6, 13806.2), [13806.2, 318975), [318975, 1010). Then use RR4-SRPT.

6. The same as 5, however the intervals are [512, 1958), [1958, 3489), [3489, 13360), [13360, 1010].

7. Eight types, intervals chosen according to SITA-E. (In the interest of space, we will not explicitly givethe intervals.) Use RR8-SRPT.

Table 2 gives the 90 percent confidence intervals for the mean number in system for the seven policies. Foreach we ran 30 replications of 1 × 108 arrivals each. The results lead to a few observations. First, for any

Policy CI for mean queue length

1 (10.047, 10.340)2 (21.220, 21.793)3 (20.980, 21.635)4 (20.701, 21.433)5 (19.631, 20.279)6 (20.925, 21.605)7 (19.235, 19.832)

Table 2: Approximating RRK-SRPT

of the choices, the results are closer to RR-SRPT than the optimal (8-SRPT), so perhaps RR-SRPT is areasonable choice in general. There is no clear rule as to how to divide the intervals. As long as one makesa reasonable division between “small” and “large” jobs, one sees some improvement. The main conclusion,however, appears to be that if the optimal scheduling policy is used locally, it does not matter too much whatis chosen for the routing policy (as long as it balances the load). Further investigation is warranted.

6 Light traffic

We have shown that RRK-SRPT is asymptotically optimal in heavy traffic. If the system load is not high,Reiman-Simon theory ([12], [13]) suggests round robin routing provides best performance. In this section,we show that RR-SRPT is asymptotically optimal in light traffic for general processing time distributions inthe sense that it minimizes the first two terms of the Taylor series expansion for the waiting time. Note thatthe waiting time is the total time that a job spends in the system, from arrival to departure. Our motivationin studying the light traffic case is twofold. First, we wish to analytically demonstrate that RRK-SRPT will


not be optimal for all values of load. Second, we hope that the insight from this light traffic analysis will aidus in refining RRK-SRPT that results in better performance under moderate loads. This is a topic of currentstudy. The more difficult question of identifying the value of load for which RRK-SRPT has performanceinferior to that of RR-SRPT, while of interest, is beyond the scope of this paper.

First, we introduce Reiman-Simon theory in light traffic (i.e. where the system load approaches zero).Let {vn, n = 0,±1,±2 } and {wn, n = 0,±1,±2} represent processing time and waiting time sequences,respectively. The quantity of interest in the light traffic setting is Pλ[w∞ > x] (x ≥ 0), the complementarycumulative distribution function of the stationary waiting time. Our reason for choosing this is to somedegree an artefact of the methodology. While in the heavy traffic setting we examined queue lengths, here,as the light traffic setting considers exactly two arrivals, examining waiting times is perhaps more natural.We proceed in this manner, but note that the results that follow could be adapted to queue lengths. We letthe system operate from time t = −∞ to guarantee the system to be in steady state at time t = 0. Thus,w0 =st w∞, where =st means equality in distribution. The Reiman-Simon method entails conditioning onthe number of arriving jobs and their corresponding processing times. We consider two events called ∅ and{t}, where ∅ denotes the event where no job arrives in the interval (−∞,∞) except for a tagged job whicharrives at time t = 0 with processing time v0, {t} denotes the event where in addition to the tagged job,there is exactly one more arrival with processing time v at time t. Let Eλ[w∞] represent the expected valueof the stationary waiting time. Define

ψ = 1[w0 > x]

φ(λ) =∫ψdPλ

ψ(∅) = Eλ[ψ|∅]

ψ({t}) = Eλ[ψ|{t}], t ∈ R.

Proposition 6.1 is essentially a version of Theorem 2 in [12].

Proposition 6.1 If there exists θ∗ > 0 such that E[eθv] <∞ for θ < θ∗, then

limλ→0+

Pλ[w∞ > x] = ψ(∅)

andd

dλφ(0+) = lim

λ→0+

d

dλφ(λ) =

∫R

(ψ({t})− ψ(∅)

)dt.

We calculate light traffic derivatives of the distribution of the waiting time based on Proposition 6.1. TheTaylor series expansion of the waiting time distribution around λ = 0 is written for the use of these deriva-tives. We use F (x) = 1 − F (x) (x ≥ 0) to represent the complement of the processing time distributionF . Proposition 6.2 provides results for the performance under RR-SRPT and RRK-SRPT in light traffic.Before we do that, note that if we consider a discrete processing time distribution with finite support, itautomatically has the exponential moment required in Proposition 6.1.

Proposition 6.2 For a loosely coupled server system with Poisson arrivals of rate λ and finite exponentialmoments for the processing time distribution as in Proposition 6.1, letwRR

∞ andwRRK∞ represent the waiting

times in steady-state under RR-SRPT and RRK-SRPT, respectively.


(a) RR-SRPT is asymptotically optimal in the sense that the first two terms of the Taylor expansion ofPλ[w∞ > x] (x ≥ 0) are minimized for a general processing time distribution F .

(b) Under RRK-SRPT for a discrete service time distribution described in (3.1),

Pλ[wRRK∞ > x] = F (x) + λ(1−

K∑k=1

α2k)/c

·

∫ 0

−∞

∑(xk,yk)∈A(t)

(P[v = xk] · P[v0 = yk])− F (x)

dt

+∫ ∞

0

∑(xk,yk)∈B(t)

(P[v = xk] · P[v0 = yk])− F (x)

dt

+ o(λ),

whereA(t) = {(xk, yk)|yk > x, yk ≤ xk}

∪{(xk, yk)|yk > x, yk > xk, yk ≤ xk + t}∪{(xk, yk)|yk > x− xk, yk > xk, yk > xk + t}

B(t) = {(xk, yk)|yk > x, yk ≤ xk, yk < xk + t}∪{(xk, yk)|yk > x− [xk + t]+, yk ≤ xk, yk ≥ xk + t}∪{(xk, yk)|yk > x− [xk + t]+, yk > xk},

and [x]+ is the maximum of x and zero.

(c) RRK-SRPT is not asymptotically optimal for the discrete processing time distribution described in (3.1).

Proof. On the event ∅, we have w0 = v0, so that

ψ(∅) = P[v > x] = F (x).

Proof of (a). Under Round-Robin routing, two consecutive jobs cannot be assigned to the same queue.Hence,

w0 = v0,

ψRR({t}) = P[v0 > x] = F (x).d

dλφRR(0+) =

∫ ∞

−∞

(ψRR({t})− ψRR(∅)

)dt = 0.

Note that the superscriptRR of φ andψ means the policy we use is RR-SRPT. Similarly, we use a superscriptRRK to represent the policy RRK-SRPT. Using a Taylor series expansion and Proposition 6.1,

Pλ[wRR∞ > x] = F (x) + o(λ), x ≥ 0. (6.1)

The waiting time of each job cannot be less than its processing time. If we ignore the terms of higher orderthan λ in (6.1), the value of Pλ[w∞ > x] is minimized under Round-Robin routing.


Proof of (b). Under RRK-SRPT, two consecutive jobs are assigned to the same queue with probabilityβ = (1 −

∑Kk=1 α

2k)/c. If job 0 and the other job are not assigned to the same queue, we have w0 = v0.

Consider the event {t}, if job 0 and the other job are assigned to the same queue, we have

w0 =

v0 + v if t > 0, v0 > v, v0 > v + t

v0 + [v + t]+ if t < 0, v0 > v or t < 0, v0 ≤ v, v0 ≥ v + t

v0 otherwise.

Hence, when t > 0,

ψRRK({t}) = P[w0 > x]

= (1− β) · P[v0 > x] + β · (P[v0 > x|v0 ≤ v] · P[v0 ≤ v]

+ P[v0 > x|v0 > v, v0 ≤ v + t] · P[v0 < v, v0 ≤ v + t]

+ P[v0 > x− v|v0 > v, v0 > v + t] · P[v0 > v, v0 > v + t])

= β · (P[v0 > x, v0 ≤ v] + P[v0 > x, v0 > v, v0 ≤ v + t]

+ P[v0 > x− v, v0 > v, v0 > v + t]) + (1− β) · F (x)

= F (x) + β ·

∑(xk,yk)∈A(t)

P[v = xk] · P[v0 = yk]− F (x)

.

Similarly, when t < 0,

ψRRK({t}) = P[w0 > x]

= (1− β) · P[v0 > x]

+β · (P[v0 > x|v0 ≤ v, v0 < v + t] · P[v0 ≤ v, v0 < v + t]

+ P[v0 > x− [v + t]+|v0 ≤ v, v0 ≥ v + t] · P[v0 ≤ v, v0 ≥ v + t]

+ P[v0 > x− [v + t]+|v0 > v] · P[v0 > v])

= (1− β) · F (x) + β · (P[v0 > x, v0 ≤ v, v0 < v + t]

+ P[v0 > x− [v + t]+, v0 ≤ v, v0 ≥ v + t]

+ P[v0 > x− [v + t]+, v0 > v])

= F (x) + β ·

∑(xk,yk)∈B(t)

P[v = xk] · P[v0 = yk]− F (x)

.

The derivative of φRRK(0+) is

d

dλφRRK(0+) =

∫ ∞

−∞

(ψRRK({t})− ψRRK(∅)

)dt

= β ·

∫ 0

−∞

∑(xk,yk)∈A(t)

P[v = xk] · P[v0 = yk]− F (x)

dt

+∫ ∞

0

∑(xk,yk)∈B(t)

P[v = xk] · P[v0 = yk]− F (x)

dt

.


Using a Taylor series expansion,

Pλ[wRRK∞ > x] = F (x) + λ

(1−

K∑k=1

α2k

)/c

·

∫ 0

−∞

∑(xk,yk)∈A(t)

(P[v = xk] · P[v0 = yk])− F (x)

dt

+∫ ∞

0

∑(xk,yk)∈B(t)

(P[v = xk] · P[v0 = yk])− F (x)

dt

+ o(λ).

Proof of (c). The proof of (a) shows that Pλ[wRR∞ > x] = F (x) + o(λ), x ≥ 0. Combining this with (b),

we immediately get thatPλ[wRRK

∞ > x] > Pλ[wRR∞ > x].

Therefore, RRK-SRPT cannot be asymptotically optimal.

7 Discussion

We have presented results that identify a policy that is asymptotically optimal in heavy traffic for a systemof identical servers in parallel, where a routing decision must be made immediately on a job’s arrival. Wehave assumed that the processing time distribution is discrete with finite support. The policy consists ofindependent round robin policies, one for each possible job size. We have suggested RR-SRPT in light trafficand provided the proof of its asymptotic optimality. These policies are scalable and require no knowledgeof underlying distributional parameters. They also do not use any system state information for routing.

The first thing to note is that if we were to focus our attention on another performance measure, theresults will no longer hold. In particular, if we were to look at a measure that involved some notion offairness, our results may change. However, there is some work that suggests that SRPT scheduling may notbe that unfair, see Bansal and Harchol-Balter [1]. Friedman and Henderson [4] have suggested a schedulingpolicy called Fair Scheduling Protocol (FSP) that tries to combine the strengths of SRPT with the fairnessinherent in the Processor Sharing scheduling policy. A study (along the lines of that undertaken in thispaper) of a protocol such as FSP would be of interest.

We have suggested that in general an intelligent choice of scheduling policy is more important than anintelligent choice in routing. One must be careful in interpreting this statement. First, suppose we relaxthe assumption of the system operating in either heavy or light traffic. This means that each server spendssome period of time idle (idle time does not approach zero). Neither RRK-SRPT nor RR-SRPT providesthe best system performance (mean number in system) in this case. One may want to exploit the serveridleness by using more sophisticated routing schemes. Note that this would probably require more stateinformation than is used in the policy we suggest. Second, suppose that the processing times are not knownupon a job’s arrival. Then the question of the relative importance of routing versus scheduling is muchmore complicated. In particular, we cannot expect that state independent routing can approach optimality.


Reiman [11] examines heavy traffic limits for JSQ routing followed by FCFS scheduling versus RR routingfollowed by FCFS scheduling and shows that there can be a significant gap between the two, even in theexponentially distributed case. For the exponentially distributed case, it is known that JSQ routing followedby FCFS scheduling is optimal [23] under knowledge of full state information and RR routing followed byFCFS scheduling is optimal under the assumption that the routing policy uses no state information ([10],generalized in [9]), so there remains a gap between the two optimal policies in heavy traffic.

More should be done on the case when processing times have a density and the system is operating inheavy traffic. In particular, the problem of quantifying performance for the policy suggested in Section 5would be of interest. This is of both theoretical as well as of applied interest, as it would require identifyingheavy traffic limits for servers under SRPT scheduling for processing times that have a density. For discreteprocessing time distributions, we were able to leverage the fact that SRPT closely resembles a static prioritypolicy and use existing results.

8 Acknowledgements

This work is supported by the Natural Sciences and Engineering Research Council of Canada. We wouldlike to thank Ward Whitt for his useful comments on heavy traffic limits for multiple server systems. Wewould also like to thank the anonymous referee for pointing out problems with an earlier version of thiswork. The bulk of this paper was completed at EURANDOM (Eindhoven, The Netherlands), where the firstauthor spent six months and the second author one month, both during 2004. Both authors wish to thankEURANDOM for its hospitality.

References

[1] N. Bansal and M. Harchol-Balter. Analysis of SRPT scheduling: Investigating unfairness. Proceedingsof ACM Sigmetrics ’01, 2001.

[2] G. Ciardo, A. Riska and E. Smirni. EquiLoad: a load balancing policy for clustered Web servers.Performance Evaluation, 46:101-124, 2001.

[3] A. Ephremides, P. Varaiya, and J. Walrand. A simple dynamic routing problem. IEEE Transactions onAutomatic Control, AC-25:690–693, 1980.

[4] E.J. Friedman and S.G. Henderson. Fairness and efficiency in web server protocols. Proceedings ofACM Sigmetrics ’03, 2003.

[5] D. Gamarnik and A. Zeevi. Validity of heavy traffic steady-state approximations in open queueingnetworks. Preprint.

[6] M. Harchol-Balter, M.E. Crovella and C.D. Murta. On choosing a task assignment policy for a dis-tributed server system. Journal of Parallel and Distributed Computing, 59:204-228, 1999.


[7] D.L. Iglehart and W. Whitt. Multiple channel queues in heavy traffic. I. Advances in Applied Proba-bility, 2:150-177, 1970.

[8] D.L. Iglehart and W. Whitt. Multiple channel queues in heavy traffic. II: Sequences, networks, andbatches. Advances in Applied Probability, 2:355-369, 1970.

[9] Z. Liu and R. Righter. Optimal load balancing on distributed homogeneous unreliable processors.Operations Research, 46:563–573, 1998.

[10] Z. Liu and D. Towsley. Optimality of the round robin routing policy. Journal of Applied Probability,31:466-475, 1994.

[11] M.I. Reiman. Some diffusion approximations with state space collapse. In Lecture Notes in Controland Information Sciences, volume 60, pages 209-240, Springer, Berlin-New York, 1984.

[12] M. I. Reiman and B. Simon. Open queueing systems in light traffic. Mathematics of OperationsResearch, 14:26–59, 1989.

[13] M. I. Reiman and B. Simon. Light traffic limits of sojourn time distributions in Markovian queueingnetworks. Stochastic Models, 4:191–233, 1998.

[14] R. Righter, J.G. Shanthikumar, and G. Yamazaki. On extremal service disciplines in single-stagequeueing systems. Journal of Applied Probability, 27:409-416, 1990.

[15] A. Riska, W. Sun, E. Smirni and G. Ciardo. AdaptLoad: effective balancing in clustered Web serversunder transient load conditions. Proceedings of 22nd International Conference on Distributed Com-puting Systems (ICDCS ’02), 2002.

[16] L. Schrage. A proof of the optimality of the shortest remaining processing time discipline. OperationsResearch, 16:687-690, 1968.

[17] A. Tanenbaum. Modern Operating Systems, Prentice Hall, 1992.

[18] V. Ungureanu, P.G. Bradford, M. Katehakis and B. Melamed. Deferred assignment scheduling inclustered Web servers. Technical Report DIMACS TR: 2002-41, Rutgers University, 2002.

[19] V. Ungureanu, B. Melamed, P.G. Bradford and M. Katehakis. Class-dependent assignment in cluster-based servers. Proceedings of the 19th ACM Symposium on Applied Computing, Nicosia, Cyprus,2004.

[20] R.R. Weber. On the optimal assignment of customers to parallel servers. Journal of Applied Probabil-ity, 15:406-413, 1978.

[21] W. Whitt. Weak convergence theorems for priority queues: preemptive-resume discipline. Journal ofApplied Probability, 8:74-94, 1971.


[22] W. Whitt. Deciding which queue to join: some counter examples. Operations Research, 34:226-244,1986.

[23] W. Winston. Optimality of the shortest line discipline. Journal of Applied Probability, 14:181-189,1977.

Multi-layered Round Robin Routing for Parallel Serversdownd/ksrpt3.pdf · Department of Computing...

Documents

Transcript of Multi-layered Round Robin Routing for Parallel Serversdownd/ksrpt3.pdf · Department of Computing...