A Deterministic Truthful PTAS for Scheduling Related Machines


SIAM J. COMPUT. © 2013 Society for Industrial and Applied Mathematics, Vol. 42, No. 4, pp. 1572-1595

A DETERMINISTIC TRUTHFUL PTAS FOR SCHEDULING RELATED MACHINES

    GEORGE CHRISTODOULOU AND ANNAMARIA KOVACS

Abstract. Scheduling on related machines (Q||Cmax) is one of the most important problems in the field of algorithmic mechanism design. Each machine is controlled by a selfish agent, and her valuation function can be expressed via a single parameter, her speed. Archer and Tardos [Proceedings of the 42nd Annual Symposium on Foundations of Computer Science (FOCS), Las Vegas, NV, 2001, pp. 482-491] showed that, in contrast to other similar problems, a (nonpolynomial) allocation that minimizes the makespan can be truthfully implemented. On the other hand, if we leave out the game-theoretic issues, the complexity of the problem has been completely settled: the problem is strongly NP-hard, while there exists a polynomial-time approximation scheme (PTAS) [D. S. Hochbaum and D. B. Shmoys, SIAM J. Comput., 17 (1988), pp. 539-551, and L. Epstein and J. Sgall, Algorithmica, 39(1) (2004), pp. 43-57]. This problem is the most well studied in single-parameter algorithmic mechanism design. It gives an excellent ground to explore the boundary between truthfulness and efficient computation. Since the work of Archer and Tardos, quite a lot of deterministic and randomized mechanisms have been suggested. Recently, a breakthrough result [P. Dhangwatnotai, S. Dobzinski, S. Dughmi, and T. Roughgarden, Proceedings of the 49th IEEE Symposium on Foundations of Computer Science, Philadelphia, 2008, pp. 15-24] showed that a randomized, truthful-in-expectation PTAS exists. On the other hand, for the deterministic case, the best known approximation factor is 2.8 [A. Kovacs, Algorithms-ESA 2005, 13th Annual European Symposium, 2005, pp. 616-627, and A. Kovacs, J. Discrete Algorithms, 7 (2009), pp. 327-340]. It has been a major open question whether there exists a deterministic truthful PTAS, or whether truthfulness has an essential, negative impact on the computational complexity of the problem. In this paper we give a definitive answer to this important question by providing a truthful deterministic PTAS.

Key words. algorithmic game theory, algorithmic mechanism design, truthful scheduling mechanisms, related machine scheduling

    AMS subject classifications. 68W, 90D

    DOI. 10.1137/120866038

1. Introduction. Algorithmic mechanism design (AMD) is an area that originated in the seminal paper by Nisan and Ronen [16, 17], and it has flourished during the last decade. It studies combinatorial optimization problems where part of the input is controlled by selfish agents that are either unmotivated to report it correctly or strongly motivated to report it erroneously if a false report is profitable. In classical mechanism design, more emphasis has been placed on incentive issues and less on the computational aspects of the optimization problem at hand. On the other hand, traditional algorithm design disregards the fact that in some settings the agents might have an incentive to lie. Therefore, we end up with algorithms that are fragile against selfish behavior. AMD carries challenges from both disciplines, aiming at the design of algorithms that optimize some global objective and, at the same time, make selfish users interested in reporting truthfully, and so are also immune to strategic behavior.

Received by the editors February 15, 2012; accepted for publication (in revised form) May 17, 2013; published electronically July 23, 2013.

http://www.siam.org/journals/sicomp/42-4/86603.html

University of Liverpool, Liverpool, U.K. ([email protected]). This author has been partially supported by DFG grant Kr 2332/1-3 within the Emmy Noether program and by EPSRC grant EP/F069502/1.

    Department of Informatics, Goethe University, Frankfurt M., Germany ([email protected]). Research supported by the German Research Foundation (DFG) grant KO 4083/1-1.


A fundamental optimization problem that was suggested in [17] as a ground to explore the design of truthful mechanisms is the scheduling problem, where a set of $n$ tasks needs to be processed by a set of $m$ machines. There are two important variants with respect to the processing capabilities of the machines that have been studied within the AMD framework. The machines can be unrelated, i.e., each machine $i$ needs $t_{ij}$ units of time to process task $j$, or related, where machine $i$ comes with a speed $s_i$, while task $j$ has processing requirement $p_j$, that is, $t_{ij} = p_j/s_i$. (We will use the settled notation $Q||C_{\max}$ to refer to the latter problem.) The objective is to allocate the jobs to the machines so that the maximum finish time of the machines, i.e., the makespan, is minimized.
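To fix notation, the following minimal sketch (ours, not part of the paper) computes the $Q||C_{\max}$ objective: given speeds, job sizes, and an assignment of jobs to machines, it returns the makespan. All variable names are illustrative.

```python
from typing import List

def makespan(speeds: List[float], jobs: List[float], assignment: List[int]) -> float:
    """Finish time of the slowest machine under the given assignment.

    speeds[i]     -- speed s_i of machine i
    jobs[j]       -- processing requirement p_j of job j
    assignment[j] -- index of the machine that receives job j
    """
    workload = [0.0] * len(speeds)          # W_i = sum of p_j assigned to machine i
    for j, machine in enumerate(assignment):
        workload[machine] += jobs[j]
    # on related machines job j takes p_j / s_i time, so machine i finishes at W_i / s_i
    return max(w / s for w, s in zip(workload, speeds))

# two machines with speeds 1 and 2, three unit jobs:
# putting two jobs on the fast machine gives makespan max(1/1, 2/2) = 1.0
print(makespan([1.0, 2.0], [1.0, 1.0, 1.0], [0, 1, 1]))
```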

In the game-theoretic setting, each machine $i$ is owned by a rational agent who alone knows the private values of row $t_i$. In the related machines case agent $i$ knows the speed $s_i$, a single parameter, which determines the running times $t_{ij}$ accordingly (whereas $p_j$ is commonly known). In order to motivate the machines to cooperate, we pay them to execute the tasks. It is assumed that each agent wants to maximize his utility, and without a proper incentive he will lie (report a false $t_i$) if this can trick the algorithm into assigning less work or a higher payment to him. A mechanism consists of two parts: an allocation algorithm that assigns the tasks to the machines and a payment scheme that compensates the machines in monetary terms. We are interested in devising truthful mechanisms in dominant strategies, where each agent maximizes his utility by telling the truth, regardless of the reports of the other agents.

The scheduling problem provides an excellent framework to study the computational aspects of truthfulness. It is a well-studied problem from the algorithmic perspective, with a lot of algorithmic techniques that have been developed for it. Moreover, it is conceptually close to (additive) combinatorial auctions, so that solutions and insights can be transferred from the one problem to the other.¹ Indeed, the scheduling problem comes with a variety of objectives to be optimized that are different from the objectives used in classical mechanism design.

The mechanism design version of scheduling on related machines was first studied by Archer and Tardos [3]. It is the most central and well studied among single-parameter problems, where each player controls a single real value and his objective is proportional to this value (see also Chaps. 9 and 12 of [15]). In particular, in the scheduling setting the cost of player $i$ is $t_i \cdot W_i$, where $t_i = 1/s_i$ is his private value and $W_i$ is the total processing requirement (sum of the $p_j$) allocated to machine $i$. The utility (profit) that the agent tries to maximize is the payment he receives minus his cost.

Myerson [14] gave a characterization of truthful algorithms for one-parameter problems in terms of a monotonicity condition. Archer and Tardos [3] found a similar monotonicity characterization, and they showed that a certain type of optimal allocation is monotone and consequently truthful (albeit exponential time). A monotone allocation rule for related scheduling is defined as follows.

Definition 1.1. An allocation rule for related machine scheduling is monotone if increasing the input speed $s_i$ of any single machine $i$, while leaving the other speeds unchanged, monotonically increases the workload $W_i$ allocated to this machine, i.e., $s_i < s_i'$ implies $W_i \le W_i'$.
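As an illustration only (not part of the paper), the sketch below tests Definition 1.1 empirically for a given allocation rule: it raises one machine's reported speed and checks that the workload assigned to that machine does not decrease. The allocation rule `allocate` is an assumed black box returning an assignment as in the earlier sketch.

```python
from typing import Callable, List

Allocate = Callable[[List[float], List[float]], List[int]]  # (speeds, jobs) -> assignment

def violates_monotonicity(allocate: Allocate, speeds: List[float],
                          jobs: List[float], i: int, factor: float = 1.5) -> bool:
    """Return True if raising s_i by `factor` strictly decreases machine i's workload."""
    def workload(assignment: List[int]) -> float:
        return sum(p for p, machine in zip(jobs, assignment) if machine == i)

    w_before = workload(allocate(speeds, jobs))
    faster = speeds[:]
    faster[i] *= factor                       # only machine i reports a higher speed
    w_after = workload(allocate(faster, jobs))
    return w_after < w_before                 # a monotone rule never lets this happen
```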

¹More precisely, additive combinatorial auctions are equivalent to mechanisms for unrelated scheduling assuming negative running times. (Notice that the agents have valuations, i.e., negative costs, and they pay for the items, instead of getting paid.) This is a technically very similar but not completely analogous problem to scheduling, e.g., the sets of truthful mechanisms are provably different in the two cases [6].


The fact that truthfulness does not exclude optimality, in contrast to the multiparameter variant of scheduling (the unrelated case),² makes the problem an appropriate example to explore the interplay between truthfulness and computational complexity. It has been a major open problem whether a deterministic monotone PTAS exists for $Q||C_{\max}$.³ In this work, we give a definitive positive answer to that central question and settle the problem.

1.1. Related work. Auletta et al. [5] gave the first deterministic polynomial-time monotone algorithm for any fixed number of machines, with approximation ratio 4. This result was improved to a fully polynomial-time approximation scheme (FPTAS) by Andelman, Azar, and Sorani [1]. For an arbitrary number of machines, the authors of [1] gave a 5-approximation deterministic truthful mechanism, and Kovacs improved the approximation ratio to 3 [12] and to 2.8 [13], which was the previous record for the problem.⁴

Randomization has been applied successfully. There are two major concepts of randomization for truthful mechanisms, universal truthfulness and truthfulness-in-expectation. The first notion is the stronger one and consists of randomized mechanisms that are probability distributions over deterministic truthful mechanisms. In the latter notion, by telling the truth a player maximizes his expected utility. Only the second notion of randomized truthfulness has been applied to the problem. Archer and Tardos [3] gave a truthful-in-expectation mechanism with approximation ratio 3, which was later improved to 2 [2]. Recently, Dhangwatnotai et al. [8] settled the status of the randomized version of the problem by giving a randomized PTAS that is truthful-in-expectation. Both mechanisms apply (among other methods) a randomized rounding procedure. Interestingly, randomization is useful only to guarantee truthfulness and has no implication for the approximation ratio. Indeed, both algorithms can easily be derandomized to provide deterministic mechanisms that preserve the approximation ratio but violate the monotonicity condition.

1.2. Our results and techniques. We provide a deterministic monotone PTAS for $Q||C_{\max}$. The corresponding payment scheme [3] is polynomially computable, and with these payments our algorithm induces a $(1+3\epsilon)$-approximate deterministic truthful mechanism, settling the status of the problem. As opposed to [12], where a variant of the Longest Processing Time (LPT) heuristic was shown to be monotone, our algorithm is the first deterministic monotone adaptation of the known PTASs [11, 9]. Next we describe the main ideas that led to this result. We assume that the input speeds are indexed so that $s_1 \le s_2 \le \cdots \le s_m$ holds. For any set of jobs $P = \{p_1, p_2, \ldots, p_j\}$, the weight or workload of the set is $|P| = \sum_{r=1}^{j} p_r$. We will view an allocation of the jobs to the machines as an (ordered) partition $(P_1, P_2, \ldots, P_m)$ of the jobs into $m$ sets. We search for an output where the workloads $|P_i|$ are in increasing order.⁵
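For intuition only, here is a hedged sketch of the standard one-parameter payment rule used with monotone allocation rules, in the spirit of Myerson [14] and Archer and Tardos [3]; it is not the concrete payment construction discussed later in section 6.1. With cost $t_i \cdot W_i$ and $t_i = 1/s_i$, paying $t_i W_i(t_i) + \int_{t_i}^{\infty} W_i(u)\,du$ (up to a normalization constant) makes truth-telling optimal. The truncation at `t_max`, the grid size, and the black-box `work_curve` are simplifications of ours.

```python
from typing import Callable

def truthful_payment(work_curve: Callable[[float], float], t_i: float,
                     t_max: float = 1e3, steps: int = 10_000) -> float:
    """Cost-plus-area payment: t_i * W(t_i) + integral of W(u) du from t_i to t_max.

    work_curve(t) -- workload the monotone allocation gives this agent when it
                     reports per-unit cost t = 1/s (assumed black box, nonincreasing in t).
    The improper integral is truncated at t_max and approximated by the trapezoid rule.
    """
    h = (t_max - t_i) / steps
    area = 0.0
    for k in range(steps):
        a, b = t_i + k * h, t_i + (k + 1) * h
        area += 0.5 * (work_curve(a) + work_curve(b)) * h
    return t_i * work_curve(t_i) + area
```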

²With scheduling on unrelated machines, we are more in the dark. (See [7] for a recent overview of results.) There are impossibility results that show that there does not exist any truthful mechanism with approximation ratio better than a constant, even in exponential time. Therefore, more primitive questions need to be answered before we settle the complexity of the problem. The only known algorithm for the problem is the Vickrey-Clarke-Groves (VCG) mechanism, which has approximation ratio equal to the number of machines. For the special but natural case of anonymous mechanisms, Ashlagi, Dobzinski, and Lavi [4] showed that the best possible mechanism is VCG.

³We say that a mechanism runs in polynomial time when both the allocation algorithm and the payment algorithm run in polynomial time.

⁴Very recently, Epstein, Levin, and van Stee [10] have provided monotone PTASs for a wider class of objective functions.

⁵We use the terms increasing (decreasing) in the sense of nondecreasing (nonincreasing).


The PTAS in [9], which is a simplified and polished version of the very first PTAS [11], defines a directed network on $m+1$ layers depending on the input job set, where each arc leading between the layers $i-1$ and $i$ represents a possible realization of the set $P_i$, and directed paths leading over the $m+1$ layers correspond to the possible job partitions. An optimal solution is then found using a shortest path computation in this network, minimizing the maximum edge weight, i.e., finish time, over the path.
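The following sketch (ours, not taken from [9]) illustrates the kind of computation meant by a shortest path that minimizes the maximum edge weight: a dynamic program over a layered DAG that records, for each node, the smallest achievable bottleneck value of a path reaching it. Layer, node, and weight names are illustrative.

```python
from math import inf
from typing import Dict, List, Tuple

def min_bottleneck_path(layers: List[List[str]],
                        arcs: Dict[Tuple[str, str], float]) -> float:
    """Smallest possible maximum edge weight over all paths from layer 0 to the last layer.

    layers[i] -- node ids of layer i;  arcs[(u, v)] -- weight (a finish time) of arc u -> v.
    """
    best = {v: 0.0 for v in layers[0]}              # bottleneck of the empty prefix
    for prev, cur in zip(layers, layers[1:]):
        nxt = {}
        for v in cur:
            # extend every incoming arc and keep the prefix with the smallest bottleneck
            nxt[v] = min((max(best[u], arcs[(u, v)])
                          for u in prev if (u, v) in arcs and u in best), default=inf)
        best = nxt
    return min(best.values(), default=inf)

# toy example: two layers, the better of the two arcs into "c" has weight 3
print(min_bottleneck_path([["a", "b"], ["c"]], {("a", "c"): 3.0, ("b", "c"): 5.0}))  # 3.0
```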

The difficulty in applying any known PTAS to construct a deterministic monotone algorithm for $Q||C_{\max}$ is twofold. First, in all of the known PTASs, sets of input jobs of approximately the same size form groups, such that in the optimization process a common (rounded or smoothed) size is assumed for all members of the same group. Second, jobs that are tiny compared to the total workload of a machine do not turn up individually in the calculations, but only as part of an arbitrarily divisible total volume (e.g., in the form of small blocks).

Note that it would be relatively easy to find an allocation procedure that is, in a sense, approximately monotone. However, (exact) monotonicity intuitively requires exact determination and knowledge of the allocated workloads. To justify this, we just point out that in every monotone (in expectation) algorithm for $Q||C_{\max}$ provided so far, the (expected) workloads either occur in increasing order w.r.t. increasing machine speeds, or constitute a lexicographically minimal optimal solution w.r.t. a fixed solution set over labeled machines.

Thus, both of the mentioned simplifications of the input set (which, to some extent, seem necessary to admit polynomial-time optimization) appear to be condemned to destroy any attempt to make a deterministic adaptation monotone. (The authors of [8] used randomization at both points to obtain the monotone-in-expectation PTAS.) Our ideas to eliminate the above two sources of inaccuracy of the output are the following, respectively:

1. As for rounding the job sizes, note that grouping is necessary only to narrow the set of schedules considered. We can achieve basically the same restriction if for any group of jobs of similar size we fix the order in which they appear in the allocation (e.g., in increasing order) and calculate with the exact job sizes along the optimization process. Now, if reducing a machine speed increases the makespan of the (previously optimal) solution, that means that this machine became a bottleneck, so a new upper bound on the optimum makespan over the considered set of outputs is induced exactly by the (previous) workload of the changed machine (the same argument as used in [3, 1, 8]).

2. Concerning tiny jobs, we observe that with these we can fill up some of the fastest machines nearly to the makespan level. Further, it is easy to show [2] that prerounding machine speeds to powers of some predefined $(1 + \epsilon)$ does not spoil monotonicity and increases the approximation bound only by a factor of $(1 + \epsilon)$ (see the sketch after this list). Assuming now that the granularity of tiny blocks is much finer than the granularity of machine speeds, we can be sure that (full) machines of higher speed receive more work than slower machines. Moreover, having reduced the speed of such a machine, tiny jobs in its workload flow to other machines, enabling a new optimum makespan much smaller than implied by the previous workload of this machine.
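A minimal sketch (ours) of the speed prerounding mentioned in point 2, assuming speeds are rounded up to the nearest power of $(1+\epsilon)$; whether one rounds up or down is an illustrative choice here.

```python
import math
from typing import List

def preround_speeds(speeds: List[float], eps: float) -> List[float]:
    """Round every speed up to the nearest integer power of (1 + eps).

    Reporting a higher speed can only move a speed to a weakly higher power,
    so composing a monotone rule with this rounding keeps it monotone, while
    each speed grows by a factor of at most (1 + eps).
    """
    def up(s: float) -> float:
        k = math.ceil(math.log(s, 1.0 + eps))      # smallest k with (1+eps)**k >= s
        return (1.0 + eps) ** k

    return [up(s) for s in speeds]

print(preround_speeds([1.0, 1.3, 2.05], 0.1))
```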

It is quite a technical challenge to combine these two ideas so smoothly that in the end they yield a correct monotonicity proof.


    Fig. 1.1. The type of schedule we look for.

We optimize over a special form of schedules (see Figure 1.1): each machine $i$ has a set $L_i$ of large (nontiny) jobs, so that $L_1, L_2, \ldots, L_m$ have increasing and exactly known weights, with a fixed order of jobs within each size category (see point 1 above). Also, each machine has a set $S_i$ of small jobs. (Some of these are uniform tiny blocks, while some are known exactly.) We note that the same job size may be large for a slower machine, while it is small on a faster machine. The total set of small jobs is flexible in the sense that we can always move a small job to a higher index machine and obtain a valid schedule. We set the objectives so that in an optimal solution the small jobs are moved toward the higher index machines as much as possible and fill them up to the makespan.

Fulfilling monotonicity becomes subtle in the case of the first (and so, not necessarily full) machine containing small jobs. It is especially so when only one machine (say, $m$) has small jobs, not leaving space for manipulating the small jobs in the output as needed.⁶ We solved these technical difficulties by treating the last three machines as a single entity.

1.3. Roadmap. Following the preliminaries, in section 2 we introduce the special form of allocations (canonical allocations) that we consider, as delineated in the previous paragraphs. We show (Theorem 2.6) that for arbitrary input, a canonical allocation with near-optimal makespan exists. We proceed in section 3 by describing a succinct (imprecise) representation of the allocation of tasks to a machine, called a configuration. In section 4 we build a digraph of configurations with $m$ layers, where each path leading through all layers corresponds to a feasible allocation. We show (Theorem 4.4) that an optimal path of this graph represents a near-optimal schedule. In section 5 we describe the algorithm Ptas, which essentially consists of two procedures: one finding an optimal path in the digraph, and another distributing the jobs consistently with the path representation. Finally, in section 6 we show that our algorithm is monotone (Theorem 6.1), and therefore it can be made truthful by using the appropriate payment scheme that we discuss in section 6.1.

1.4. Preliminaries. The input is given by a set $P_I$ of $n$ input jobs and a vector $s$ (or $\sigma$) of input speeds $s_1 \le \cdots \le s_m$. For a job $p \in P_I$ we use $p$ both to denote the job itself and the size of this job in a given formula.

⁶In order to circumvent this problem we restrict the search to allocations where at least two machines do have some tiny blocks (unless too few tiny jobs exist). Moreover, it seems crucial that every machine has the possibility to get rid of all its tiny blocks (i.e., those inducing an imprecise workload) if this is provoked by a reduction of its speed. Combining these two requirements, we join the last three machines into a single object. A carefully optimized assignment of an obligatory set of tiny blocks, and later of the actual tiny jobs, to these machines then implies monotonicity.


For a desired approximation bound $1 + 3\epsilon$, we choose a $\delta$ that will be the rounding precision of the job sizes. For ease of exposition, we will assume that $(1+\delta)^t = 2$ for some $t \in \mathbb{N}$.⁷ Furthermore, we define $\gamma$ as the unique integer power of 2 in $[\delta/6, \delta/3]$.

Definition 1.2 (job classes). If $p$ denotes (the size of) a job, then $\bar{p}$ denotes this job rounded up to the nearest integral power of $(1+\delta)$. A job $p$ is in the job class $C_l$ iff $\bar{p} = (1+\delta)^l$.

Let $C_l = \{p_{l1}, p_{l2}, \ldots, p_{l n^{\max}_l}\}$ be the jobs of $C_l$ in some fixed increasing order of size. We use the notation $C_l[a] = \{p_{l1}, \ldots, p_{la}\}$ for $0 \le a \le n^{\max}_l$, and $C_l(a, b] = C_l[b] \setminus C_l[a]$ for $0 \le a \le b \le n^{\max}_l$.

If $P = \{p_1, p_2, \ldots, p_j\}$ is a job set, then $\bar{P} = \{\bar{p}_1, \bar{p}_2, \ldots, \bar{p}_j\}$ denotes the corresponding set of rounded jobs. The weight or workload of $P$ is $|P| = \sum_{r=1}^{j} p_r$; the rounded weight is $|\bar{P}| = \sum_{r=1}^{j} \bar{p}_r$. Assuming that the jobs are in decreasing order of size, we denote the subset of the $r$ largest jobs in $P$ by $P^r = \{p_1, p_2, p_3, \ldots, p_r\}$.

    We use the interval notation (e.g., [1,m]) for a set of consecutive machine indices.All logarithms are base 2 unless otherwise specified.
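A small sketch (ours) of the rounding in Definition 1.2: each job is rounded up to the nearest integral power of $(1+\delta)$, and the exponent serves as its class index. The grouping helper and the example values are illustrative.

```python
import math
from collections import defaultdict
from typing import Dict, List

def job_class(p: float, delta: float) -> int:
    """Index l such that the rounded size of p is (1 + delta) ** l."""
    return math.ceil(math.log(p, 1.0 + delta))

def build_classes(jobs: List[float], delta: float) -> Dict[int, List[float]]:
    """Group the jobs into classes C_l, each kept in increasing order of (exact) size."""
    classes: Dict[int, List[float]] = defaultdict(list)
    for p in jobs:
        classes[job_class(p, delta)].append(p)
    for members in classes.values():
        members.sort()
    return dict(classes)

# jobs 1.0 and 1.05 fall into different classes for delta = 0.04
print(build_classes([1.0, 1.05, 2.3, 2.35], 0.04))
```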

2. Canonical allocations. This section characterizes the type of allocations (we call them canonical) that we will consider. Definitions 2.1 and 2.2 describe the necessary restrictions on the output job partition.

Definition 2.1 ($\delta$-division). We say that a set of jobs $P$ is $\delta$-divided into the pair of sets $(L, S)$ if
(D0) $P = L \cup S$ and $L \cap S = \emptyset$,
(D1) $p > \frac{\delta}{(1+\delta)^2}|L|$ for every $p \in L$, and
(D2) $q \le \delta |L|$ for every $q \in S$.

The subsets $L$ and $S$ will be, respectively, the large and the small jobs on a single machine. Note that we cannot set a sharp border between their sizes. For instance, having uniform jobs, $1/\delta$ of them could form $L$, while the rest (arbitrarily many) could belong to $S$. (In fact, for any fixed job set a $\delta$-division always exists, see Definition 2.3, but it is not necessarily unique, e.g., in the case of uniform jobs.)
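For concreteness, a small checker (ours) for conditions (D0)-(D2) of Definition 2.1; the parameter `delta` and the example job lists are illustrative.

```python
from typing import List

def is_delta_division(P: List[float], L: List[float], S: List[float], delta: float) -> bool:
    """Check (D0)-(D2) for a candidate split of the job set P into (L, S)."""
    if sorted(L + S) != sorted(P):                                     # (D0): L, S partition P
        return False
    weight_L = sum(L)
    if any(p <= delta / (1.0 + delta) ** 2 * weight_L for p in L):     # (D1): large jobs are big enough
        return False
    if any(q > delta * weight_L for q in S):                           # (D2): small jobs are small enough
        return False
    return True

print(is_delta_division([5.0, 5.0, 0.4], [5.0, 5.0], [0.4], 0.1))      # True
```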

Definition 2.2 (canonical allocation). For a given input, an allocation $P_1, P_2, \ldots, P_m$ is called canonical if for every $i \in [1,m]$, the set $P_i$ can be $\delta$-divided into $(L_i, S_i)$, so that the following hold:
(A1) If $i < i'$, then $|L_i| \le |L_{i'}|$.
(A2) For jobs $p$ and $q$ of the same job class, $p < q$ holds if and only if
(a) $p \in L_i$ and $q \in S_{i'}$ for some $i, i' \in [1,m]$, or
(b) $p \in L_i$ and $q \in L_{i'}$ and $i \le i'$, or
(c) $p \in S_i$ and $q \in S_{i'}$ and $i \le i'$.

That is, taking the jobs of any job class in increasing order, they get filled first into the large sets $L_i$ by increasing machine speeds, and then into the small sets $S_i$, again starting from the first machine and by increasing speeds (wherever jobs of this class are intended in the solution).

As the main result of the section, Theorem 2.6 states that for any input and any $\epsilon > 0$, a canonical allocation exists that provides a $(1 + 3\epsilon)$-approximation to the optimum makespan. Before presenting the details, we sketch the proof here. We will modify an optimal partition of the rounded input jobs $\bar{P}_I$ to get the canonical

⁷This simplifying assumption is unrealistic for computations, but it is not necessary for the result to hold. We could equally well use the rounding function of [9] or [8]. Also, since our result is of purely theoretical interest, we do not try to optimize the ratio $\delta/\epsilon$; it will be clear that, e.g., $30\delta < \epsilon$ suffices in the proofs.


allocation. First we take the core set of each set in the partition, as defined by Definition 2.3. In the sense of Proposition 2.4, this yields a preliminary $\delta$-division of the sets. Next, we order the partition sets by increasing order of core size. It is easy to show that small jobs (those outside the cores) can be shifted towards fast machines in order to fit below the original makespan again. Finally, we apply Lemma 2.5 to make the cores (now with original job sizes) fulfill (A2)(b). After all this, the cores and the small jobs induce a $\delta$-division on each machine.

Definition 2.3 (core). Given a set $P$ of jobs, we define the core $cr(P)$ of $P$ as follows. Consider the jobs $P = \{p_1, p_2, \ldots\}$ in a fixed decreasing order of size. Let $j$ be minimum with the property that $p_j \le \frac{\delta}{1+\delta}|P^{j-1}|$; then $cr(P) \stackrel{def}{=} P^{j-1} = \{p_1, \ldots, p_{j-1}\}$. If no such $j$ exists, then $cr(P) \stackrel{def}{=} P$.
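A direct sketch (ours) of Definition 2.3: scan the jobs in decreasing order and cut at the first job that is at most a $\frac{\delta}{1+\delta}$ fraction of the weight of the larger jobs before it. Names and the example are illustrative.

```python
from typing import List, Tuple

def core(P: List[float], delta: float) -> Tuple[List[float], List[float]]:
    """Return (cr(P), P \\ cr(P)) following Definition 2.3."""
    jobs = sorted(P, reverse=True)            # fixed decreasing order of size
    prefix_weight = 0.0                       # |P^{j-1}|: weight of the jobs before position j
    for j, p in enumerate(jobs):
        if p <= delta / (1.0 + delta) * prefix_weight:
            return jobs[:j], jobs[j:]         # core = the largest jobs before the cut
        prefix_weight += p
    return jobs, []                           # no such j exists: the whole set is the core

print(core([10.0, 9.0, 0.5, 0.3], 0.1))       # ([10.0, 9.0], [0.5, 0.3])
```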

Proposition 2.4. Let $P$ be a set of jobs; then
(d1) for all $p \in cr(P)$, $p > \frac{\delta}{(1+\delta)^2}|cr(P)|$;
(d2) for all $q \in P \setminus cr(P)$, $q \le \frac{\delta}{1+\delta}|cr(P)|$.

Proof. Since job sizes are in decreasing order, (d2) holds by definition. Moreover, also by definition, $p_{j-1} > \frac{\delta}{1+\delta}|P^{j-2}|$. Therefore
$$(1+\delta)^2 p_{j-1} > (1+\delta) p_{j-1} + \delta p_{j-1} > \delta(|P^{j-2}| + p_{j-1}) = \delta|P^{j-1}| = \delta|cr(P)|.$$

The same holds for all jobs of size at least $p_{j-1}$, which proves (d1).

Lemma 2.5. Let $(Q_1, \ldots, Q_m)$ be a partition of a subset $Q$ of the input jobs such that $|\bar{Q}_1| \le \cdots \le |\bar{Q}_m|$. There exists a partition $(L_1, \ldots, L_m)$ of $Q$ that satisfies
(i) $|\bar{L}_1| \le \cdots \le |\bar{L}_m|$;
(ii) for any job class $C_l$, if job $p_{lj}$ belongs to $L_i$ and job $p_{lk}$ to $L_{i'}$, where $i < i'$, then $j < k$;
(iii) $\{\bar{Q}_1, \bar{Q}_2, \ldots, \bar{Q}_m\} = \{\bar{L}_1, \bar{L}_2, \ldots, \bar{L}_m\}$ (some permutation of the $\bar{Q}_i$ yields the $\bar{L}_i$);
(iv) for all $i$,
$$\frac{1}{1+\delta}|\bar{Q}_i| < |L_i| \le |\bar{Q}_i|.$$

Proof. Let $Q_i \subseteq R$. We say that we maximize $Q_i$ w.r.t. $R$ if for every class $l$ we replace the jobs in $Q_i \cap C_l$ by the largest possible jobs in $R \cap C_l$ (i.e., if there are $r$ such jobs, then with the $r$ largest jobs of $R \cap C_l$). We will denote the maximized set by $Q_i^R$. Clearly, if $S \subseteq R$, then $|Q_i^S| \le |Q_i^R|$.

Now we construct the new partition recursively. We define $L_m$ as a set of maximum workload among $\{Q_1^Q, Q_2^Q, \ldots, Q_m^Q\}$. (Notice that the latter contains subsets of $Q$ but is not a partition of $Q$.) Assuming that $L_m = Q_i^Q$, now $L_{m-1}$ is defined to be a set of maximum workload among $\{Q_1^{Q \setminus L_m}, \ldots, Q_{i-1}^{Q \setminus L_m}, Q_{i+1}^{Q \setminus L_m}, \ldots, Q_m^{Q \setminus L_m}\}$, etc. In every recursive step we selected a set that has larger weight than any other remaining set, even if those sets got the largest remaining jobs of the respective classes. This proves (i), whereas (ii) and (iii) hold by construction.

Next we argue that (iv) holds as well. By (iii), there exist at least $i$ job sets among the $Q_l$ such that $|\bar{Q}_l| < (1+\delta)|L_i|$ (namely, the sets of rounded jobs $\bar{L}_1, \bar{L}_2, \ldots, \bar{L}_i$). In particular, the inequality holds for the $i$th set $\bar{Q}_i$ in the increasing order:
$$\frac{1}{1+\delta}|\bar{Q}_i| < |L_i|.$$


Similarly, there exist at least $m - i + 1$ sets among the $\bar{Q}_l$, so that $|L_i| \le |\bar{Q}_l|$ (namely, the sets $\bar{L}_i, \bar{L}_{i+1}, \ldots, \bar{L}_m$), which proves
$$|L_i| \le |\bar{Q}_i|.$$

Theorem 2.6. For arbitrary increasing input speeds and input jobs, a canonical allocation inducing a schedule with makespan at most $(1+3\epsilon)\mathrm{OPT}$ exists, where OPT is the optimum makespan.

Proof. Let $P_I$ be the set of all jobs and $s_1 \le \cdots \le s_m$ be the input speeds. We process this set of jobs in three steps to finally obtain the desired canonical allocation. In the first two steps we consider only the set of rounded jobs $\bar{P}_I$.

1. (core division) We start from an optimal schedule of $\bar{P}_I$. Let this be $(P_1, P_2, \ldots, P_m)$ and its makespan be $M \le (1 + \delta)\mathrm{OPT}$. The inequality trivially holds, since any schedule of $P_I$ induces a schedule of $\bar{P}_I$ with makespan increased by a factor of at most $(1 + \delta)$.

Moreover, for every $P_i$ let $S_i = P_i \setminus cr(P_i)$. In the rest of the proof we call the jobs in $\bigcup_{i=1}^{m} cr(P_i)$ large, and the jobs in $\bigcup_{i=1}^{m} S_i$ small.

2. (core sorting) In this step we start from the schedule $(P_1, P_2, \ldots, P_m)$ with $P_i = cr(P_i) \cup S_i$ (disjoint union), and we end up with a schedule $P'_1, \ldots, P'_m$ with $P'_i = L'_i \cup S'_i$, where $L'_1, L'_2, \ldots, L'_m$ is simply the set of cores $cr(P_i)$ sorted by weight.

We sort the complete $P_i$ sets by core weight $|cr(P_i)|$. After that, we do not move large jobs but take the small jobs in the same order as they are allocated now, starting from the (small) jobs on machines $m, m-1, m-2, \ldots$, and fill them onto the machines up to time $M$, in decreasing order of machines. When a job reaches or exceeds the finish time $M$, we continue on the next machine.

For the obtained $(L'_i, S'_i)$ divisions, property (d1) of Proposition 2.4 trivially holds, since we did not change the large sets. Next we show that (d2) holds as well. We claim that every small job that was previously in $S_i$ has been moved to some core set $L'_h$ with $|L'_h| \ge |cr(P_i)|$. This, in turn, implies (d2). Assume the contrary, that during redistribution some small job is moved from a machine $k$ to a lower index machine $h < k$. This, however, would mean that the total set of jobs on machines $[k,m]$ (after the sorting) does not fit on the fastest $m - k + 1$ machines below finish time $M$, contradicting the fact that originally these were job sets on some $m - k + 1$ machines, fitting below $M$. We claim that the resulting schedule has makespan at most $(1 + \delta)M \le (1 + \delta)^2\mathrm{OPT}$. A folklore argument [9] shows that sorting the cores cannot increase the makespan induced by the cores, which is at most $M$. During redistribution, the last small job on each machine increased the makespan by a factor of at most $(1 + \delta)$, due to (d2).

3. (permutation) Finally, we turn to the original job sizes. Working from the fastest machine to the slowest, we replace each small rounded job by the largest unrounded job in its class that has not already been placed on a faster machine. Thus we obtain the sets $S_i$ so that (A2)(c) holds. Then we replace the large rounded jobs in each $\bar{L}'_i$ by the remaining original jobs of the respective classes, and apply Lemma 2.5 on the resulting partition of large jobs, to obtain $L_1, \ldots, L_m$. Now (A1) and (A2) hold for the partition. Moreover, by Lemma 2.5(iv), $\frac{1}{1+\delta}|\bar{L}'_i| < |L_i| \le |\bar{L}'_i|$ holds for every $i$. This implies, on the one hand, that the makespan did not increase. On the other hand, we obtained $\delta$-divisions $(L_i, S_i)$: the sets $\bar{L}_i$ are the permuted core sets $\bar{L}'_i$ by Lemma 2.5(iii), so for any job $p$ in $L_i$ we have by (d1) that
$$p > \frac{\delta}{(1 + \delta)^2}|\bar{L}_i| \ge \frac{\delta}{(1 + \delta)^2}|L_i|,$$


and this proves (D1). Finally, (D2) holds because if $q \in S_i$, then $\bar{q} \in S'_i$, and by (d2)
$$q \le \frac{\delta}{1 + \delta}|\bar{L}_i| < \delta|L_i|.$$

3. Configurations. As in [9, 8], we introduce so-called configurations $(w, \nu, n^o, n^1)$ in order to represent any possible job set $P_i$ of the partition up to a certain accuracy. The $\nu$ and $w$ determine which jobs are considered small and which are considered large (resp., considered at all) in the $\delta$-division of $P_i$. The $n^o$ describes a set of jobs in this scope, while $n^1$ describes a superset of those jobs, $P_i$ being roughly the difference of the two sets.

We use the configurations to define the vertices of a directed graph $H_I$ over $m$ layers. An optimal path over all layers will then determine our output schedule: if a path goes through a configuration at layer $i$, it means that machine $i$ gets the job set of this configuration.⁸

The first component of any configuration is a magnitude $w$, which is an integer power of 2. As we proceed from slow machines to fast machines in a schedule, the monotonically increasing magnitude keeps track of the largest job size allocated so far, which must be some size in the interval $(w/2, w]$. Thus, the current magnitude indicates which (larger) job sizes are not yet relevant and which (tiny) jobs need not be taken into account individually anymore in the configuration. (Note that here we exploit that the $|L_i|$ must be increasing; cf. [9].) This motivates the next definition.

Definition 3.1 (valid magnitude). The value $w = 2^z$ ($z \in \mathbb{Z}$) is a valid magnitude if an input job $p \in P_I$ exists so that $w/2 < p \le w$. Let $w_{\min}$ and $w_{\max}$ denote the smallest and the largest valid magnitudes, respectively. We call a job tiny for $w$ if it has size at most $\gamma w$.
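A small sketch (ours) enumerating the valid magnitudes of Definition 3.1 for a given job set; function names and the example values are illustrative.

```python
import math
from typing import List

def valid_magnitudes(jobs: List[float]) -> List[float]:
    """All w = 2**z such that some input job p satisfies w/2 < p <= w."""
    exponents = {math.ceil(math.log2(p)) for p in jobs}   # smallest z with p <= 2**z
    return sorted(2.0 ** z for z in exponents)

print(valid_magnitudes([0.7, 1.0, 3.0, 5.0]))   # [1.0, 4.0, 8.0]
```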

Recall that $\gamma$ is the integer power of 2 between $\delta/6$ and $\delta/3$. Having a magnitude $w$ fixed, let $\lambda = \log_{(1+\delta)} \gamma w = t \log(\gamma w)$ and $\mu = \log_{(1+\delta)} w = t \log w$, where $(1+\delta)^t = 2$. Notice that both $\lambda$ and $\mu$ are integers, and by Definition 1.2, the jobs of size in $(\gamma w, w]$ belong to the classes $C_{\lambda+1}, \ldots, C_{\mu}$. These will constitute the relevant job classes if the largest job size on the current or slower machines is between $w/2$ and $w$. The integer parameter $\nu \in (\lambda, \mu]$ is the index of the job class on the border between large and small jobs, as we discuss later.

If the configuration represents the set $P_i$ in a job partition, then the so-called size vector $n^o = (n^o_\lambda, n^o_{\lambda+1}, \ldots, n^o_\mu)$ describes the jobs in the cumulative job set $A_{i-1} \stackrel{def}{=} \bigcup_{h=1}^{i-1} P_h$ as follows. For $\lambda < l \le \mu$ ($l \ne \nu, \nu+1$), exactly the first (smallest) $n^o_l$ jobs of the class $C_l$ are in the set $A_{i-1}$. The meaning of $n^o_\lambda$ is that in $A_{i-1}$ the total weight of jobs from $\bigcup_{l \le \lambda} C_l$ is in the interval $((n^o_\lambda - 1)\gamma w, (n^o_\lambda + 1)\gamma w)$. However, the particular subset of these small jobs inside $A_{i-1}$ is not determined by $n^o$. The vector $n^1$ represents the set $A_i \stackrel{def}{=} \bigcup_{h=1}^{i} P_h$, analogously.

A major difference to the configurations of [9] is that our configurations should not only represent a job set $P_i$ but also its $\delta$-division $(L_i, S_i)$. In particular, we will distinguish four types of job sizes in a configuration. Tiny jobs have size at most $\gamma w$ and, as already seen, are represented by the first coordinates $n_\lambda$ of the two size vectors, with their total size rounded to an integer multiple of $\gamma w$. Correspondingly,

⁸Roughly speaking, our graph can be thought of as the line graph of the graph G defined in [9] (with simple modifications). That is, the vertices of $H_I$ correspond to edges of G.


we will talk about blocks of size $\gamma w$, which are simply re-tailored⁹ tiny jobs, so as to make our procedure efficient.

Definition 3.2 (block). Blocks are imaginary tiny jobs, each having size $\gamma w$ for some valid magnitude $w$. We use $T(n, w)$ to denote a set of $n$ blocks of size $\gamma w$.

Small jobs are those that (together with the tiny jobs) can only appear in the set $S_i$ of the $\delta$-division, whereas large jobs can only be in the set $L_i$. However, there exist job classes (we will call them middle size jobs) which might occur in both $L_i$ and $S_i$, since by Definition 2.1 there is a flexible boundary between the job sizes in $L_i$ and in $S_i$. Therefore, exactly two job classes, $\nu$ and $\nu + 1$, will be represented by a triple of (increasing) nonnegative integers instead of scalar values in both of the vectors $n^o$, $n^1$. In the case of $n^o_\nu = (n^o_{\nu\ell}, n^o_{\nu m}, n^o_{\nu s})$, the meaning of the three numbers will be that in the set $A_{i-1}$, from the class $C_\nu$ exactly the jobs in $C_\nu(n^o_{\nu m}, n^o_{\nu s}]$ are allocated as small jobs, that is, to one of the sets $S_1, S_2, \ldots, S_{i-1}$, and exactly the jobs in $C_\nu[n^o_{\nu\ell}]$ as large jobs, i.e., in one of $L_1, \ldots, L_{i-1}$, and similarly in the case of $n^1_\nu$ for the set $A_i$. The meaning of the numbers for $\nu + 1$ is analogous. We summarize the above description in the next, somewhat technical definitions.

Definition 3.3 (size vector). A size vector $n = (n_\lambda, \ldots, n_\mu)$ with middle size $\nu \in [\lambda + 1, \mu]$ is a vector of integers, except for the entries $n_\nu = (n_{\nu\ell}, n_{\nu m}, n_{\nu s})$ and $n_{\nu+1} = (n_{(\nu+1)\ell}, n_{(\nu+1)m}, n_{(\nu+1)s})$, both of which consist of three integers, so that $n_{\nu\ell} \le n_{\nu m} \le n_{\nu s}$ and $n_{(\nu+1)\ell} \le n_{(\nu+1)m} \le n_{(\nu+1)s}$ hold. All integer entries belong to $[0, n]$.

Definition 3.4 (configuration). Let $\lambda = \log_{(1+\delta)} \gamma w$ and $\mu = \log_{(1+\delta)} w$ (i.e., the jobs of size in $(\gamma w, w]$ belong to the classes $C_{\lambda+1}, \ldots, C_{\mu}$).

A configuration $\Lambda = (w, \nu, n^o, n^1)$ consists of four components: a valid magnitude $w$, and two size vectors $n^o = (n^o_\lambda, \ldots, n^o_\mu)$ and $n^1 = (n^1_\lambda, \ldots, n^1_\mu)$ with middle size $\nu \in [\lambda + 1, \mu]$, such that
(C1) $n^o_l \le n^1_l \le n^{\max}_l$ for $\lambda < l \le \mu$, $l \notin \{\nu, \nu + 1\}$.
(C2) If $w \ne w_{\min}$, then $n^1_l > 0$ for at least one $l \in (\mu - t, \mu]$.
(C3) $n^o_\lambda \le n^1_\lambda \le \max\{n^o_\lambda, \lceil \sum_{l \le \lambda} |C_l| / \gamma w \rceil - 1\}$.
(C4) $n^o_{\nu\ell} \le n^1_{\nu\ell} \le n^o_{\nu m} = n^1_{\nu m} \le n^o_{\nu s} \le n^1_{\nu s} \le n^{\max}_\nu$, and analogously for $\nu + 1$;
$$T_\Lambda \stackrel{def}{=} T(n^1_\lambda - n^o_\lambda, w),$$
$$S_\Lambda \stackrel{def}{=} T_\Lambda \cup C_\nu(n^o_{\nu s}, n^1_{\nu s}] \cup C_{\nu+1}(n^o_{(\nu+1)s}, n^1_{(\nu+1)s}] \cup \bigcup_{l=\lambda+1}^{\nu-1} C_l(n^o_l, n^1_l], \text{ and}$$
$$L_\Lambda \stackrel{def}{=} C_\nu(n^o_{\nu\ell}, n^1_{\nu\ell}] \cup C_{\nu+1}(n^o_{(\nu+1)\ell}, n^1_{(\nu+1)\ell}] \cup \bigcup_{l=\nu+2}^{\mu} C_l(n^o_l, n^1_l].$$
$T_\Lambda$, $S_\Lambda$, and $L_\Lambda$ are the tiny, small, and large jobs (respectively) in configuration $\Lambda$.
(C5) Either $n^o \ne n^1$, and $(1 + \delta)^{\nu+1} \le \delta|L_\Lambda| < (1 + \delta)^{\nu+2}$, or $\Lambda$ is the empty configuration $(w_{\min}, \lambda_{\min} + 1, \mathbf{0}, \mathbf{0})$, where $\lambda_{\min} = t \log(\gamma w_{\min})$.

Clearly, (C1), (C3), and (C4) reflect how $n^o$ and $n^1$ represent the cumulative job sets; (C2) implies that $w$ is always the smallest possible magnitude for representing these job sets.

⁹A block can be thought of as a box of size $\gamma w$ filled exactly with actual tiny jobs, so that jobs can be cut into two parts when needed. Blocks are used during the optimization (OptPath) and are replaced by the corresponding actual jobs afterwards (Partition).


(C5) is different in flavor from the previous properties: it establishes the relation between the weight of $L_\Lambda$ and the job sizes at the boundary of $L_\Lambda$ and $S_\Lambda$. In particular, observe that the inequalities of (C5) impose (D1) and (D2) of Definition 2.1. Moreover, since $\nu \ge \lambda + 1$, for the tiny blocks (of size $\gamma w = (1 + \delta)^\lambda$), (C5) implies the strict inequality

(3.1) $\gamma w < \delta|L_\Lambda|$.

Notation 1. We refer to the whole represented job set $L_\Lambda \cup S_\Lambda$ simply by $\Lambda$ (abusing notation), and $|\Lambda|$ stands for the total work of the set $\Lambda$. Note that $\Lambda$ is a virtual job set, since $S_\Lambda$ also includes tiny blocks. By property (C5), for every nonempty configuration $\Lambda$, the sets $(L_\Lambda, S_\Lambda)$ form a $\delta$-division of the set $\Lambda$. We denote the set $\Lambda$ without tiny blocks by $\Lambda^* = \Lambda \setminus T_\Lambda$.

4. The directed graph $H_I$. In this section, for an arbitrary input instance $I$, we define a directed, layered graph $H_I$. All vertices of this graph are configurations, labeled and chained to form the graph in a suitable way.

First, for an arbitrary configuration $\Lambda$, we define a set of configurations called $Scale(\Lambda)$. These are the possible configurations of an end-vertex of any arc starting at a vertex labelled by $\Lambda$. In particular, if $\Lambda = (w, \nu, n^o, n^1)$, $\Lambda' = (w', \nu', n'^o, n'^1)$, and $\Lambda' \in Scale(\Lambda)$, then $n'^o$ must represent the same job set $A_i$ as $n^1$, from the point of view of a (possibly) increased magnitude $w'$ and a (possibly) increased middle size $\nu'$. Whenever $w' > w$, this involves collecting in $A_i$ the jobs of classes that become tiny and the tiny blocks, and forming the proper number of new, bigger tiny blocks out of this job set. Since all magnitudes are powers of 2, the size $\gamma w$ of tiny blocks either remains unchanged or at least doubles as we proceed to faster machines. This prevents rounding errors in the total (tiny) job size from accumulating to an unpredictable error (see also [9]). Furthermore, the middle size is allowed to increase if $A_i$ already contains all large jobs in the class $\nu$, that is, $n^1_{\nu\ell} = n^1_{\nu m}$. The exact definition of $Scale(\Lambda)$ can be found in Appendix A.

    orthogonal to the layers. The configurations on level I must have an empty set ofsmall jobs, i.e., S = , and here the layers {m 2,m 1,m} are empty. Level II hasm 1 real layers, and we add a single dummy vertex vm adjacent to every vertexon layer m 1 that alone forms the last layer m.10 In general, the ith layer standsfor the ith set Pi. Any directed path of m nodes leads to vm over the m layers, andfrom level I (or II) to level II. Such a path we call an m-path. The m-paths representpartitions of the input PI. For a given m-path, the very first vertex on level II is insome layer k m 2; we will call it the switch vertex, and k the switch machine.(Note that k is thus the first machine possibly receiving small jobs.) We denote thevertex sets of the two levels by VI, and VII, respectively.

Notation 2. For any given directed path $(v_1, v_2, \ldots, v_r)$, the corresponding configurations of the nodes will be denoted by $(\Lambda_1, \Lambda_2, \ldots, \Lambda_r)$.

In what follows, we consider the last three sets $P_{m-2}, P_{m-1}, P_m$ of the partition. Note first that for any $m$-path the last configuration $\Lambda_{m-1}$ alone represents $\bigcup_{h=1}^{m-1} P_h$. Thus, we can use $\Lambda_{m-1}$ to uniquely determine the hidden configuration $\Lambda_m$ (not appearing explicitly in the path). We define $\Lambda_m$ to have $w_m = w_{m-1}$, $\nu_m = \nu_{m-1}$, and all jobs not allocated in $\bigcup_{h=1}^{m-1} P_h$. In particular, note that $\Lambda_m$ includes all jobs of

¹⁰More precisely, we will unite layers $m-2$ and $m-1$ and use double vertices in the united layer, but it simplifies the discussion to think of these as pairs of individual vertices.


Fig. 4.1. The graph $H_I$. Level I consists of $m-3$ layers, level II has $m$ layers. Arcs lead between consecutive layers within level I, within level II, and from level I to level II. Layers $m-2$ and $m-1$ are combined and contain the double vertices.

class higher than $\mu_{m-1}$ and that this job set can be handled as a single huge chunk without violating the running time bounds. For technical reasons we overestimate the amount of tiny blocks on $m$ and set $n^1_\lambda = \lceil \sum_{l \le \lambda} |C_l| / \gamma w \rceil + 3$. (We relax (C3) for $m-1$ and $m$ and allow this bound on $n^1_\lambda$.) The definition of the other size vector entries in $\Lambda_m$ is obvious.

Furthermore, our monotonicity argument requires that even the last three workloads $\Lambda_{m-2}$, $\Lambda_{m-1}$, and $\Lambda_m$ be dealt with as a single entity. Among other restrictions, we will demand that either all of them have the same magnitude, and therefore use $w_{m-1} := w_{m-2}$ instead¹¹ of $w_{m-1}$, or that $w_{m-2}$ is much smaller than $w_{m-1}$, so that all jobs on $m-2$ (if they exist) are tiny for machines $m-1$ and $m$. Correspondingly, $\Lambda_{m-2}$ and $\Lambda_{m-1}$ of any path must adhere to either type (X) or (Y), as specified next. Observe that in case (X) on the last three machines, respectively in case (Y) on the last two machines, the size of tiny blocks is the same (i.e., well defined):

(X) $w_{m-2} > 2\gamma w_{m-1}$; then we modify the last magnitude to be $w_{m-1} := w_{m-2}$, and require
(i) either all tiny jobs (at most 18 blocks) are on machines $m-1$ and $m$, or
(ii) machines $\{m-2, m-1, m\}$ have at least 18 tiny blocks, and at least two of them each have at least 6 tiny blocks.
(Y) $w_{m-2} \le 2\gamma w_{m-1}$; then
(i) all machines but $\{m-1, m\}$ are empty, or
(ii) $m-1$ and $m$ together have at least 6 tiny blocks.

The requirements (X) and (Y) can be included in the graph definition, e.g., by using only special double vertices $v^D_{m-2}$ with double configurations $(\Lambda_{m-2}, \Lambda_{m-1})$ in the layers $(m-2, m-1)$ on level II. Applying $w_{m-1} := w_{m-2} > 2\gamma w_{m-1}$ can be done by using size vectors of triple length for the double vertices of type (X). Clearly, all restrictions can be represented by the double configurations, and we need polynomially many of them.

The following definition of the graph $H_I$ is independent of the speed vector $s$ and depends only on the job set $P_I$. We assume without loss of generality that $m \ge 3$; otherwise we add a machine of speed (close to) 0.

Definition 4.1 (graph $H_I$). $H_I(V, E)$ is a directed graph, where every vertex $v \ne v_m$ is a triple $v = (d, i, \Lambda)$, so that $d \in \{I, II\}$ is a level, $i$ is a layer, and

¹¹The exposition may look sloppy here; a rigorous definition of double configurations would have been direct, e.g., using the common magnitude $w_{m-2}$ and a size vector of triple length in case (X). The words "instead" and "modify" clearly reflect the development of our construction. We hope this will prove reader-friendly rather than confusing.


$\Lambda = (w, \nu, n^o, n^1)$ is a configuration. Triples that fulfill $i \in [1, m-3]$ and (V1)-(V2) are vertices, and those fulfilling (V3) are double vertices:
(V1) If $i = 1$, then for $l \ne \nu, \nu+1$, $n^o_l = 0$, while for $l \in \{\nu, \nu+1\}$, $n^o_{l\ell} = 0$ and $n^o_{lm} = n^o_{ls}$.
(V2) If $d = I$, then $S_\Lambda = \emptyset$.
There is an arc from $v = (d, i, \Lambda)$ to $v' = (d', i+1, \Lambda')$ if and only if
(E1) $\Lambda' \in Scale(\Lambda)$, and
(E2) $|L_\Lambda| \le |L_{\Lambda'}|$, and
(E3) $d \le d'$;
(V3) for $d = II$ and the combined layers $i = (m-2, m-1)$, include double vertices $v^D$ with double configurations $(\Lambda_{m-2}, \Lambda_{m-1})$ so that for $(\Lambda_{m-2}, \Lambda_{m-1}, \Lambda_m)$ the requirements (E1), (E2), and either (X) or (Y) hold. Moreover, $|L_{\Lambda_{m-2}}| \le |L_{\Lambda_{m-1}}| \le |L_{\Lambda_m}|$ and $|\Lambda_{m-2}| \le |\Lambda_{m-1}| \le |\Lambda_m|$.

Next, to each vertex we assign a weight called the finish time, and define the makespan of a path accordingly. These values do depend on the machine speeds $s$. Observe that machines having tiny (that is, nonexact) jobs use an upper bound as finish time, except for the switch machine.

Definition 4.2 (finish time of a vertex). Let $v = (d, i, \Lambda)$ be a vertex of $H_I$, where $\Lambda = (w, \nu, n^o, n^1)$. The finish time of $v$ is then $f(v) = \frac{|\Lambda| + \gamma w}{s_i}$ if $\Lambda$ does have tiny blocks ($n^o_\lambda < n^1_\lambda$), and $f(v) = \frac{|\Lambda|}{s_i}$ otherwise.

Definition 4.3 (makespan of a path). Let $Q = (v_1, v_2, \ldots, v_r)$ be a directed path in $H_I$. If $Q \subseteq V_I$ or $Q \subseteq V_{II}$, then the makespan of $Q$ is $M(Q) = \max_{h=1}^{r} f(v_h)$. If $Q$ is an $m$-path with switch vertex $v_k = (II, k, \Lambda_k)$, then $M(Q) = \max\{\frac{|\Lambda_k|}{s_k}, \max_{h \ne k} f(v_h)\}$.

The following theorem, which says that an $m$-path with makespan close to the optimum makespan of the scheduling problem always exists, is a consequence of Theorem 2.6. The proof is straightforward and requires a technical translation of real schedules to $m$-paths in $H_I$. The reader can find the complete proof in Appendix B.

Theorem 4.4. For every input $I = (P_I, s)$ of the scheduling problem, the optimal makespan over all $m$-paths in $H_I$ is at most $\mathrm{OPT} \cdot (1 + O(\epsilon))$, where OPT denotes the optimum makespan of the scheduling problem.

5. The deterministic algorithm. This section describes the deterministic monotone algorithm in the form of two procedures and the main algorithm Ptas. We will make use of an arbitrary fixed total order $\preceq$ over the set of all configurations, such that configurations of smaller total workload are smaller according to $\preceq$.

Among the $m$-paths of $H_I$, the algorithm selects an $m$-path having minimum makespan as the primary objective. Among paths of minimum makespan, we maximize the index of the switch machine $k$. A further order of preference is to be of type (X), then (Y). This selection of the optimal path is done by the procedure OptPath (see Figure 5.1). This procedure is a common dynamic programming algorithm that finds an $m$-path of minimum makespan in $H_I$. However, we do not simply proceed from left to right over the $m$ graph layers, but select an optimal prefix path from the first layer to every node in $V_I$, and similarly, an optimal suffix path from layer $m$ to each node in $V_{II}$. For all $v \in V_I$ and $v \in V_{II}$, the pointers pred() and succ() determine an incoming, respectively an outgoing, path of minimum makespan. When the makespan of two prefix (or suffix) paths is the same, we break ties according to $\preceq$. For every $v \in V_I$ and $v \in V_{II}$, the respective optimum values are recorded by opt(v).

Finally, we test each vertex in $V_{II}$ as a potential switch vertex (i.e., we find optimal paths leading to the switch vertex from both end layers). The optimal path


Procedure 1. OptPath.
Input: The directed graph $H_I$ and speeds $s_1 \le s_2 \le \cdots \le s_m$.
Output: The optimal $m$-path $Q = (v_1, \ldots, v_m)$ of $H_I$.

1. For every double vertex $v^D_{m-2} = (\Lambda_{m-2}, \Lambda_{m-1}) \in V_{II}$ do
   opt($v^D_{m-2}$) := $\max\{f(v_{m-2}), f(v_{m-1}), f(v_m)\}$ (where $\Lambda_{m-1}$ determines $\Lambda_m$; see section 4);
   M($v^D_{m-2}$) := $\max\{\frac{|\Lambda_{m-2}|}{s_{m-2}}, \frac{|\Lambda_{m-1}|}{s_{m-1}}, \frac{|\Lambda_m|}{s_m}\}$;
   for $i = m-3$ down to 1 do
     for every $v = (d, i, \Lambda) \in V_{II}$ do
       (i) succ($v$) := $w$ if opt($w$) = $\min\{\mathrm{opt}(y) \mid (v, y) \in E\}$ and among such vertices of minimum opt() the configuration of vertex $w$ is $\preceq$-minimal;
       (ii) opt($v$) := $\max\{f(v), \mathrm{opt}(\mathrm{succ}(v))\}$;
            M($v$) := $\max\{\frac{|\Lambda|}{s_i}, \mathrm{opt}(\mathrm{succ}(v))\}$.
2. For every $v = (d, 1, \Lambda) \in V_I$ do opt($v$) := $f(v)$;
   for $i = 2$ to $m-2$ do
     for every $v = (d, i, \Lambda) \in V_I \cup V_{II}$ do
       (i) pred($v$) := $w \in V_I$ if opt($w$) = $\min\{\mathrm{opt}(y) \mid y \in V_I, (y, v) \in E\}$ and among such vertices of minimum opt() the configuration of vertex $w$ is $\preceq$-minimal;
       (ii) if $v \in V_I$, then opt($v$) := $\max\{f(v), \mathrm{opt}(\mathrm{pred}(v))\}$;
            if $v \in V_{II}$, then M($v$) := $\max\{M(v), \mathrm{opt}(\mathrm{pred}(v))\}$.
3. Select an optimal switch vertex $v_k = (II, k, \Lambda_k) \in V_{II}$ by the following objectives:
   (i) M($v_k$) = $\min\{M(v) \mid v \in V_{II}\}$;
   (ii) the layer $k$ is maximum over all $v$ of minimum M($v$);
   (iii) if $k < m-2$, then the configuration $\Lambda_k$ is minimal w.r.t. $\preceq$ over all $v$ of minimum M($v$) in layer $k$;
   (iv) if $k = m-2$, then among all double vertices $v^D_{m-2} = (\Lambda_{m-2}, \Lambda_{m-1})$ of minimum M($v^D_{m-2}$) select $v_k = v^D_{m-2}$ by the following objectives:
     (a) keep the order (X), (Y) and then (i), (ii) (cf. section 4);
     (b) in the case of (X), select a $(\Lambda_{m-2}, \Lambda_{m-1}, \Lambda_m, (T_{m-2} \cup T_{m-1} \cup T_m))$ (i.e., with a common pool of tiny blocks) by some predefined ordering, and then, by redistributing the tiny blocks, minimize also the second highest finish time among $\frac{|\Lambda_{m-2}|}{s_{m-2}}$, $\frac{|\Lambda_{m-1}|}{s_{m-1}}$, and $\frac{|\Lambda_m|}{s_m}$ (as M($v^D_{m-2}$) had been minimized before);
     (c) in the case of (Y), select a $(\Lambda_{m-2}, \Lambda_{m-1}, \Lambda_m)$ by some predefined ordering for which $|\Lambda_{m-1}| + |\Lambda_m|$ is maximized;
4. for $i = k-1$ down to 1 do $v_i$ := pred($v_{i+1}$);
   for $i = k+1$ to $m$ do $v_i$ := succ($v_{i-1}$);
   $Q := (v_1, \ldots, v_k, \ldots, v_m)$.

Fig. 5.1. Procedure OptPath finds an $m$-path $Q$ of minimum $M(Q)$ in the directed graph.

makespan, assuming some $v \in V_{II}$ as switch vertex, is recorded in M($v$). We choose a switch vertex $v_k$ providing the optimum makespan and of maximum possible $k$.

The case $k = m-2$ needs careful optimization. Among solutions of minimum makespan, we choose deterministically by some fixed order of the double configurations $(\Lambda_{m-2}, \Lambda_{m-1})$, except for the tiny blocks, which are always distributed so as to minimize the (local) makespan of the last three machines. The flexibility provided


by three machines with many tiny blocks facilitates a monotone allocation in this degenerate case as well.

Proposition 5.1. If $Q$ is the $m$-path output by OptPath, with configurations $(\Lambda_1, \ldots, \Lambda_m)$ in the path vertices and switch machine $k < m-2$, then $|\Lambda_i| \ge (1 - 4\delta) M(Q) s_i$ for all $i > k$.

Proof. Assume the contrary, that $|\Lambda_i| < (1 - 4\delta) M s_i$, where $M = M(Q)$. By the optimality of the switch machine $k$, $S_{\Lambda_k} \ne \emptyset$. We claim that we can put a small job $p \in S_{\Lambda_k}$ from $k$ to $i$ by modifying the path $Q$. Suppose that we put one job from class $l$ from machine $k$ to $k+1$, where $(w, \nu, n^o, n)$ and $(w', \nu', n', n^1)$, with smallest job classes $\lambda$ and $\lambda'$, are the configurations of $v_k$ and $v_{k+1}$, respectively. In $n$ we reduce $n_l$ by 1. As for $n'$, if $l > \lambda'$, then we reduce $n'_l$ by 1 as well, and we are done. If $l \le \lambda'$, then we scale the reduced $n$ size vector according to (S5) in order to obtain the new $n'$. In this way, $n'_{\lambda'}$ either reduces by 1 or keeps its original value. In the latter case we can say that $T_{\Lambda_{k+1}}$ swallowed the job. In order to put a job from $k$ to $i > k$, we can repeatedly apply the above changes to the configurations until the job gets swallowed or we arrive at machine $i$. These changes decrease $|\Lambda_k|$ and increase $|\Lambda_i|$ by (at most) one small job, that is, by at most $\delta|L_{\Lambda_i}| \le \delta|\Lambda_i|$. (If $i = m-2$, and so $|\Lambda_{m-2}|$ increased, then we may have to increase $|\Lambda_{m-1}|$ and $|\Lambda_m|$ so as to keep them nondecreasing; that is why we need $4\delta$ instead of $2\delta$ in the claim.)

The workload of the machines between $k$ and $i$ did not increase, because jobs in class $C_l$ only get shifted to the right. Altogether, we either found a smaller (by $\preceq$) $\Lambda_k$, or a valid path with switch vertex $k+1$ if no small jobs remained on $k$, without having increased the makespan, so $Q$ was not optimal.

Once an optimal $m$-path is found, we have to allocate the jobs of $P_I$ to the machines. This is obvious for jobs that appear individually in some configuration of the path, but we need to define how the tiny jobs are distributed, given the block representation. Procedure Partition is depicted in Figure 5.2. Importantly, depending on whether the switch machine $k$ is filled high (close to the makespan) or low, it gets filled with tiny jobs below, respectively above, $|\Lambda_k|$.

Distributing the tiny jobs for $k = m-2$ (see 3b in Partition) is a slightly more subtle procedure, operating on the same principle. (A low machine is filled over $|\Lambda_i|$.) Note also that here $|Q_{m-2}| \le |Q_{m-1}| \le |Q_m|$ can be achieved, because both the $|\Lambda_i|$ and the $\sigma_i$ are nondecreasing, and the total number of tiny blocks is overestimated by 3 blocks in $\Lambda_m$. Further, if the set $Q_i$ is increased because of 3b(i) or (ii), then $i$ still gets much less work than any higher index machine, so the $|Q_i|$ remain increasing.

The output of Partition is, indeed, a partition of $P_I$, and the induced allocation $(Q_1, \ldots, Q_m)$ is canonical with the choice $L_i := L_{\Lambda_i}$. This is clear, since the configurations represent $\delta$-divisions and (A1) and (A2) are imposed by the construction. Finally, the upper bound in (C3) on the number of tiny blocks in any configuration and the rules of rounding in (S5) imply¹² that there are enough jobs in $T$ from the classes $l \le \lambda_i$ to fulfill 3a(ii). In other words, the tiny jobs allocated to any machine $i$ have size at most $\gamma w_i$, as intended.

    The next technical proposition describes essential properties of the output job-partition.

^{12}Note that Σ denotes the total size of tiny blocks in ∪_{h=1}^{i} λ_h in 3a(i). Let Σ' denote the total size of jobs tiny for w_i in ∪_{h=1}^{i−1} λ_h. By (S5) and (C3), Σ and Σ' differ by at most a constant number of tiny blocks, so T contains enough jobs tiny for w_i to fulfill 3a(ii).

3a. If k < m − 2, then
call k high if |λ_k|/s_k ≥ (1 − δ/2) · M(Q), and low otherwise;
let Σ = 0 and r = 0;
for i = k to m − 1 do
given λ_i = (w, μ, n°, n¹) and φ = log_{1+δ}(δw), let τ_i := (n¹_φ − n°_φ) · δw;
(i) Σ := Σ + τ_i;
(ii) if high k, then let u be the maximum index in T so that Σ_{j=1}^{u} t_j ≤ Σ;
if low k, then let u be the minimum index in T so that Σ_{j=1}^{u} t_j ≥ Σ;
(iii) Q_i := Q_i ∪ {t_{r+1}, t_{r+2}, . . . , t_u};
(iv) r := u.
Q_m := Q_m ∪ {t_{r+1}, t_{r+2}, . . . , t_{|T|}}.

3b. If k = m − 2, then
start with an allocation of tiny jobs to {m − 2, m − 1, m} (in the given order) such that each machine i gets at most |T^λ_i| amount of tiny jobs and the |Q_{m−2}|, |Q_{m−1}|, |Q_m| are nondecreasing.
Let M = max{ |λ_{m−2}|/s_{m−2}, |λ_{m−1}|/s_{m−1}, |λ_m|/s_m };
call i ∈ {m − 2, m − 1, m} low if |λ_i|/s_i ≤ (1 − 2δ/3) · M, and high if |λ_i|/s_i ≥ (1 − δ/2) · M;
correct the partition of tiny jobs (keeping the job order) so that
(i) if there is one low machine i and two high machines, then i receives at least |λ_i| work;
(ii) if there are two nonhigh machines, then both receive at least 6δw work of tiny jobs.

    Fig. 5.2. Procedure Partition allocates the jobs based on path Q output by OptPath.
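To make the prefix rule of step 3a concrete, the following Python sketch assigns the ordered tiny jobs t_1, t_2, . . . by running prefix sums against the accumulated target Σ: the largest prefix not exceeding Σ for a high switch machine, the smallest prefix reaching Σ for a low one. All names are chosen here for exposition only; this is an illustration of the cumulative rule, not the procedure as specified in the figure.

def assign_tiny_jobs(tiny_sizes, targets, high_k):
    """Distribute the ordered tiny jobs of T over machines k, ..., m.

    tiny_sizes: sizes of the jobs in T, in the fixed order used by Partition.
    targets:    the values tau_k, ..., tau_{m-1} (total size of tiny blocks in
                the configuration of each machine); hypothetical inputs.
    high_k:     whether the switch machine is 'high' (filled below the target)
                or 'low' (filled above it).
    """
    allocation = []          # allocation[j] = indices of tiny jobs for machine k + j
    sigma, r = 0.0, 0        # running target and index of last assigned job
    prefix = 0.0             # total size of jobs t_1, ..., t_r
    for tau in targets:
        sigma += tau
        u = r
        if high_k:
            # largest u with t_1 + ... + t_u <= sigma
            while u < len(tiny_sizes) and prefix + tiny_sizes[u] <= sigma:
                prefix += tiny_sizes[u]
                u += 1
        else:
            # smallest u with t_1 + ... + t_u >= sigma
            while u < len(tiny_sizes) and prefix < sigma:
                prefix += tiny_sizes[u]
                u += 1
        allocation.append(list(range(r, u)))
        r = u
    allocation.append(list(range(r, len(tiny_sizes))))  # the rest goes to machine m
    return allocation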

Proposition 5.2. If Q = (v_1, . . . , v_m) is the input path, then for the output Q_1, . . . , Q_m of Partition,

(|λ_i| − 8δw_i)/s_i ≤ |Q_i|/s_i ≤ M(Q)  for every machine i.

Proof. For i < k, λ_i contains no tiny jobs, and Q_i = λ_i. So, in this case |Q_i|/s_i = |λ_i|/s_i = f(v_i) ≤ M(Q).

In Partition, the variable τ_i stands for the total size of tiny blocks in λ_i. If k < m − 2, then the machine i = k receives an amount of tiny jobs of total size between τ_k − δw_k and τ_k + δw_k, and not more than τ_k in the case of high k. The last machine having nonzero tiny blocks (m − 1 or m) receives at most τ_i, and at least τ_i − 6δw_i, work (due to the estimate n¹ = ⌈(Σ_l |C_l|)/(δw_m)⌉ + 3 in the hidden configuration λ_m, which is a bound on the total rounding error). The other machines i > k get tiny jobs of total size in [τ_i − δw_i, τ_i + δw_i]. The +δw_i overhead was calculated in these machines' finish times f(v_i), and therefore in the makespan M(Q).


Algorithm 3. Ptas.

Input: machine speeds σ_1 ≤ σ_2 ≤ · · · ≤ σ_m, and job set P_I = {p_1, p_2, . . . , p_n}, desired precision ε.
Output: A partition P_1, P_2, . . . , P_m of P_I.

1. for each i ∈ [1, m], round the speed σ_i up to the nearest power of (1 + ε);
2. based on the jobs P_I, and an appropriate δ, construct the graph H_I;
3. with H_I and the rounded speeds s_1 ≤ s_2 ≤ · · · ≤ s_m, run Procedure OptPath in order to obtain the optimal m-path Q = (v_1, . . . , v_m);
4. from Q compute the partition Q_1, Q_2, . . . , Q_m by procedure Partition;
5. let P_1, P_2, . . . , P_m be the sets of {Q_1, Q_2, . . . , Q_m}, sorted by increasing order of weight |Q_i|; output P_1, P_2, . . . , P_m.

Fig. 5.3. The deterministic monotone Ptas.
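As a rough illustration of how the five steps fit together, here is a Python sketch of the driver. The callables build_graph, opt_path, and partition are hypothetical placeholders for the constructions of the preceding sections, not actual implementations.

import math

def ptas(speeds, jobs, eps, build_graph, opt_path, partition):
    """Sketch of Algorithm 3 (Ptas) under the stated assumptions."""
    # 1. round each speed up to the nearest power of (1 + eps)
    rounded = [(1 + eps) ** math.ceil(math.log(s, 1 + eps)) for s in speeds]
    # 2. build the layered graph H_I from the jobs (delta is derived from eps)
    graph = build_graph(jobs, eps)
    # 3. find the optimal m-path Q for the rounded, sorted speeds
    q_path = opt_path(graph, sorted(rounded))
    # 4. turn the path into a job partition Q_1, ..., Q_m
    q_sets = partition(q_path, jobs)
    # 5. sort the sets by total weight, so workloads are nondecreasing in speed
    return sorted(q_sets, key=lambda s: sum(s))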

Finally, in the case k = m − 2, only a low machine i ∈ {m − 2, m − 1, m} may receive more work than |λ_i|; and each machine receives at least τ_i − 8δw_i work of tiny jobs, even after the corrections done in 3b(i) or (ii).

The monotone Ptas is presented in Figure 5.3. A substantial property of the output is that on machines i < k, the sets Q_i = P_i are determined exactly and have increasing workloads. On the other hand, the machines i > k have finish time close to the makespan M(Q). The characterization of the final output allocation P_1, . . . , P_m is summarized by the next lemma.

Lemma 5.3. Let P_I denote the set of input jobs and s be the vector of rounded speeds. Let Q = (v_1, . . . , v_m) be the path output by OptPath with configurations (λ_1, . . . , λ_m) in the vertices, and v_k = (II, k, λ_k) be the switch vertex. In Ptas the sets Q_i get permuted only among machines of equal rounded speed. In particular, if P_1, . . . , P_m is the partition output by Ptas, then

(a) for i < k, P_i = Q_i = λ_i;

moreover, if k < m − 2, then
(b) for i > k, (1 − 12δ) · M(Q) ≤ |P_i|/s_i ≤ M(Q);
(c) for k either P_k = Q_k or (b) holds.

Proof. Point (a) is obvious, since S_i = ∅ for every vertex v_i ∈ V_I, and by step 1 of Partition, Q_i = λ_i = L^λ_i. Moreover, by (E2) L^λ_i has the ith smallest weight, so Q_i = P_i.

For i > k we have by Proposition 5.2 that M(Q) · s_i ≥ |Q_i| ≥ |λ_i| − 8δw_i; by inequality (3.1) that w_i < |λ_i|; and by Proposition 5.1 that |λ_i| ≥ (1 − 4δ) · M(Q) · s_i. Therefore, M(Q) · s_i ≥ |Q_i| ≥ |λ_i| − 8δw_i > |λ_i|(1 − 8δ) > (1 − 12δ) · M(Q) · s_i. This proves (b) for the |Q_i|, that is, before ordering the workloads. Furthermore, since δ is small compared to ε, this implies that the Q_i get permuted only among machines of equal s_i which, in turn, proves (b) also for the permuted workloads |P_i|.

If |Q_i| ≥ (1 − 12δ) · M(Q) · s_i holds for i = k, then (b) holds for i = k as well by the previous argument; otherwise Q_k has the kth smallest weight, so P_k = Q_k.

Since the Q_i get permuted only if k < m − 2 (and only among machines of the same speed), Proposition 5.2 and Lemma 5.3 imply |P_i|/s_i ≤ M(Q) in every case, i.e., the makespan M(Q) of the output path is always an upper bound on the makespan of the output schedule.

We conclude the section by proving the approximation and the running-time bounds.


Theorem 5.4. For arbitrary input I and any given 0 < ε ≤ 1, the deterministic algorithm Ptas outputs a (1 + 3ε)-approximate optimal allocation in time Poly(n, m).

Proof. By Theorem 4.4, for the appropriate δ = δ(ε) the optimal path Q in H_I has makespan M(Q) < (1 + ε) · OPT, and by Lemma 5.3 this remains an upper bound for the makespan of the output. Since Ptas rounds the input speeds to integral powers of (1 + ε), we obtain an overall approximation factor of at most (1 + ε)² ≤ (1 + 3ε).
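For completeness, the elementary estimate used in the last step is

(1 + ε)² = 1 + 2ε + ε² ≤ 1 + 3ε  for 0 < ε ≤ 1,

since ε² ≤ ε in this range.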

In order to prove the running-time bound, we show that for constant ε, step 2 of Ptas can be computed efficiently. The number of graph vertices |V| is O(m · |C|), where |C| denotes the number of different configurations. Every configuration (including the double configurations of triple length) is determined by O(log_{1+δ} 1/δ) coordinates (where δ = δ(ε)). Since close to zero log(1 + δ) = Θ(δ) holds, this means O((1/δ) log 1/δ) coordinates, each of which (including the magnitude w) can be assumed to take at most n + 1 different values, so |C| = n^{O((1/δ) log 1/δ)}. Finally, for any v, v' ∈ V, deciding whether (v, v') ∈ E takes time linear in the number of these coordinates. Thus for fixed ε, step 2 is poly(n, m), and all other steps obviously take polynomial time.
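Spelled out, with c standing for the constant hidden in the O-notation (introduced here only for the calculation), the counting reads

|C| ≤ (n + 1)^{c · (1/δ) · log(1/δ)} = n^{O((1/δ) · log(1/δ))},  |V| = O(m · |C|),

so for fixed ε, and hence fixed δ = δ(ε), the graph size and therefore step 2 are polynomial in n and m.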

6. Monotonicity and payments. A high-level formulation of the monotonicity argument is the following. Since the workloads |P_i| of the output partition are increasing, it is easy to see that it suffices to prove monotonicity in the special case when a single rounded speed s_i = (1 + ε) is reduced to s'_i = 1. Let the output path of OptPath be Q in the first and Q' in the second case; moreover, let f and M denote finish times and makespan for input (s_i, s_{−i}) and f' and M' for input (s'_i, s_{−i}), respectively.

The makespan of any (sub)path in H_I cannot decrease by reducing a machine speed. The objectives concerning the optimal path are defined so that Q' ≠ Q may occur only if machine i becomes a bottleneck machine, either concerning the whole path (meaning M(Q) < M'(Q) = f'(v_i)) or some relevant subpath (e.g., a prefix path or the subpath (v_{m−2}, v_{m−1}, v_m)). In any of these cases, Q is a possible solution of (local) makespan f'(v_i), and Q' is preferred only in case the respective (sub)path of Q' has no higher (local) makespan than f'(v_i), implying that i gets not more workload than f'(v_i) = (1 + ε)f(v_i). This proves monotonicity if (1 + ε)f(v_i) is the exact workload (e.g., if i < k), or at least a lower bound on what i received with the original speed.

On the other hand, if (1 + ε)f(v_i) was just an approximate estimate, then the machine was close to full with input s_i, and, by reallocating many or all tiny blocks, the new makespan (no matter what the output path looks like!) becomes much smaller than f(v_i)(1 + ε).

Next we present the theorem with a complete proof.

Theorem 6.1. Algorithm Ptas is monotone.

Proof. Assume that machine i alone decreased its speed σ_i to σ'_i in the input. If the vector of rounded speeds (s_1, s_2, . . . , s_m) remains the same, then the deterministic Ptas outputs the same allocation, and i receives the same or smaller workload, since the output workloads are in increasing order. Assuming that the rounded speed s_i decreased as well, it is enough to consider the special case when i is the first (smallest index) machine of rounded speed s_i = (1 + ε) in input I(P, s), and after reducing its speed, it becomes the last (highest index) machine of rounded speed 1 in input I(P, s'). Since the workloads in the final allocation P_1, . . . , P_m are ordered, this implies monotonicity for every one-step speed change (like (1 + ε) → 1). Monotonicity in general can then be obtained by applying such steps repeatedly.


Note that for both inputs the algorithm constructs the same graph, independently of the speed vector. We assume that with inputs I and I', OptPath outputs Q = (v_1, . . . , v_m) and Q' = (v'_1, . . . , v'_m), where the switch vertices have index k and k', respectively. Finish time, makespan, etc., w.r.t. the new speed vector s' are denoted by f'(·), M'(·), etc. We prove that |P_i| ≥ |P'_i|.

We start with a simple observation. Since we decreased a machine speed, it follows from the definition of makespan that for any path R = (v_1, v_2, . . . , v_r) within level V_I or within level V_II, and for any m-path, M'(R) ≥ M(R). Similarly, for any vertex v, opt'(v) ≥ opt(v), and for any v ∈ V_II, M'(v) ≥ M(v) (cf. procedure OptPath). Obviously, also the optimum makespan over all m-paths could not decrease.

Case 1. i < k. In this case, λ_i = P_i by Lemma 5.3(a), so the machine receives exactly |λ_i| work with speed s_i.

If M'(Q) > M(Q), that means that machine i with the new speed s'_i = 1 becomes a bottleneck in path Q, that is, M'(Q) = f'(v_i) = |λ_i|/1. Thus, Q is a path with makespan |λ_i|, and by the optimality of Q' we obtain |P'_i| = |P'_i|/s'_i ≤ M'(Q') ≤ M'(Q) = |λ_i| = |P_i|.

If M'(Q) = M(Q) and Q' = Q, then P'_i = λ_i = P_i. Suppose M'(Q) = M(Q), but Q' ≠ Q. Since the optimum makespan could not decrease by decreasing a speed, Q remains an optimal path for the new speeds as well, that is, M'(Q') = M'(Q). We claim that also k' = k. Assume for contradiction that k' > k; then Q' would have been preferred to Q for input s as well, because M(Q') ≤ M'(Q') = M'(Q) = M(Q). In turn, also v'_k = v_k (meaning double vertices if k = m − 2); otherwise Q' would have been preferred for input s as well.

Now, since Q' ≠ Q, a maximum h < k exists so that v'_h ≠ v_h. This means that pred(v_{h+1}) ≠ pred'(v_{h+1}). Recall that opt(v_h) is the optimum makespan over all paths leading to v_h from layer 1. If v'_h was preferred to v_h in Q' because of the secondary criterion of OptPath, then opt(v_h) < opt(v'_h) ≤ opt'(v'_h) ≤ opt'(v_h). The first inequality holds; otherwise v'_h = pred(v_{h+1}) would have been the choice for input s. The second holds for every vertex. The third holds; otherwise v_h = pred'(v_{h+1}) would have been the choice of OptPath for input s'. Similarly, if v'_h was preferred in Q' because opt'(v'_h) < opt'(v_h), then an analogous chain of inequalities applies. In both cases we obtain a contradiction with the choices of OptPath, so Q' coincides with Q up to the switch vertex, and P'_i = P_i.

Case 2. k < m − 2 and i > k. According to Lemma 5.3(b), with speed s_i = 1 + ε, machine i has a workload of at least |P_i| ≥ (1 − 12δ) · M(Q) · (1 + ε). Note that |P_i| ≥ |L^λ_i| also holds, since |Q_h| ≥ |L^λ_i| for all h ≥ i, and this remains true for the sorted sets P_h for h ≥ i.

We modify the path Q and construct a new path Q*, where, with input speed s'_i = 1, machine i is a bottleneck (just like in path Q), and that has a makespan M'(Q*) of at most |P_i|. Therefore, the optimal path makespan M'(Q') is also at most |P_i|. Since the optimal path makespan is an upper bound on the makespan of the output schedule, this will prove |P'_i| = |P'_i|/s'_i ≤ M'(Q') ≤ |P_i|.

We construct Q* as follows. Assuming i < m, we put small jobs from S_i (tiny or nontiny) onto machine i + 1 until either the moved jobs have a total weight of at least 13δ · M(Q) · (1 + ε) or S_i becomes empty. In the latter case the new finish time of i is at most |L^λ_i|/s'_i = |L^λ_i| ≤ |P_i|; in the former case (using also (3.1) in section 3) it is at most M(Q) · (1 + ε) · (1 − 13δ) + δw_i ≤ M(Q) · (1 + ε) · (1 − 12δ) ≤ |P_i|. If i = m, then we put only tiny blocks of the common magnitude w_{m−1} onto m − 1, and


use |λ_m| instead of |L^λ_i| in the calculation. The job-moving procedure is the same as described in Proposition 5.1.

It remains to show that the new finish time is, indeed, the makespan of Q*, i.e., machine i is a bottleneck in Q*. By Propositions 5.1 and 5.2, |Q_i| = (1 − O(δ)) · M(Q) · (1 + ε), and from this workload we removed total job size of O(δ) · M(Q). Since δ is small compared to ε, the (work and) finish time of i is still at least M(Q)(1 + 13δ), which is an upper bound on all other machines' finish times.

Case 3. k < m − 2 and i = k. By Lemma 5.3(c), for k either P_k = Q_k or (1 − 12δ) · M(Q) ≤ |P_i|/s_i holds. In the latter case, we can include the case i = k in Case 2 above.

Assume that i = k and P_k = Q_k. If high k holds for Q, meaning |λ_k| > (1 − δ/2) · M(Q) · (1 + ε) > (1 + ε/3) · M(Q), then the proof is analogous to Case 2: from Partition it is clear that |P_k| = |Q_k| ≥ max{|L^λ_k|, |λ_k| − δw_k}. By putting one tiny block onto the next machine (if there are any blocks), we still obtain a path Q* with makespan M'(Q*) = max{|L^λ_k|, |λ_k| − δw_k} > M(Q), and we are done.

If low k is the case, then with input s_k the machine receives at least |λ_k| work in Partition. If machine k still becomes a bottleneck, that is, M'(Q) > M(Q), then the new makespan |λ_k|/s'_k = |λ_k| is an upper bound on the new optimum makespan and therewith on the new workload |P'_k|.

Finally, if the makespan of Q does not increase, then Q remains the output path: assuming the contrary, that some Q' ≠ Q would be preferred by any of the optimization criteria, this would have been the case also with the original input speeds. It follows now from the Partition procedure that k cannot get more workload (only less, if low k changes to high k).

Case 4. k = m − 2 and i ≥ k. It follows from the definition (V3) of double vertices and of Partition that in this case the |Q_i| are nondecreasing, so no permutation in step 5 of Ptas takes place. We show |P'_i| ≤ |Q_i|.

In cases (X)(i) and (Y)(i) (concerning the path Q), all tiny jobs (i.e., of size at most δw_{m−2} or δw_{m−1}, respectively) are on the last two machines. Importantly, these cases are trivially recognizable from the double configuration v^D_{m−2}. Moreover, the tiny jobs can be allocated to machines {m − 1, m} by any fixed rule (say, m gets all tiny jobs), and the makespan can be computed exactly during the path optimization. It can also be assumed that |λ_{m−2}| ≤ |λ_{m−1}| ≤ |λ_m|. The proof is therefore analogous to that of Case 1.

Next, suppose that (X)(ii) holds for path Q. Let M = max{ |λ_{m−2}|/s_{m−2}, |λ_{m−1}|/s_{m−1}, |λ_m|/s_m }. Assume first that machine i ≥ m − 2 with speed s_i was a nonlow machine in 3b of Partition, meaning |λ_i| > (1 − 2δ/3) · M · (1 + ε). Then, with speed s'_i = 1, machine i becomes a bottleneck in the subpath (v_{m−2}, v_{m−1}, v_m) (and possibly in the path Q). Moreover, since δ is small compared to ε, machine i is still a bottleneck in a modified (sub)path Q*, where we put 8 tiny blocks, or all tiny blocks, from i to the other two of the last three machines (obeying the sorted order and the restrictions of (X)(ii)). That is, the makespan of Q* (resp., the local makespan on (m − 2, m − 1, m)) is max{|L^λ_i|, |λ_i| − 8δw_i}, which is an upper bound on |P'_i| and a lower bound on |Q_i|, by Proposition 5.2.

Assume that i was a low machine in 3b of Partition.

If there was another nonhigh machine i', then only i and i' have tiny blocks in Q (cf. 3(iv)(b) of OptPath). Having changed the speed to s'_i, we construct a path Q* by putting tiny blocks (when necessary) from i to i' so that their maximum finish time is minimized. If, with the optimized tiny blocks, the local makespan remains M, then Q* is the new output solution. (Any path preferred to Q* would have been


preferred with speed s_i as well.) Obviously, in Partition, i receives a subset of the tiny jobs that it received with speed s_i. If in Q* the local makespan M increases, then i becomes a (local) bottleneck, so it has at most 6 blocks in Q*. (Any further block would be put on i'.) Thus, |λ_i| + 6δw_i is an upper bound on the new local makespan and also on |P'_i|, whereas, by Partition 3b(ii), it is a lower bound on |P_i|.

Suppose that i was a low machine with speed s_i, and the other two were both high machines. Then i received all but 6 tiny blocks (see 3(iv) of OptPath), whereas |Q_i| ≥ |λ_i| (see 3b(i) of Partition). Consider now the same path Q with speed s'_i. If i becomes a (local) bottleneck, then |λ_i| is an upper bound on the optimal (local) makespan, so that |P'_i| ≤ |λ_i|, and we are done. If i has the second highest finish time among {m − 2, m − 1, m}, then the output path might change only by (possibly) optimizing the second finish time by removing blocks from i; in this case i is not a low machine anymore, and |P'_i| ≤ |λ_i|. If i still has the lowest finish time, then the output path is the same, and in Partition, i gets the same set, or a subset of his previous jobs in case he becomes a nonlow machine.

Finally, we turn to the case when for path Q (Y)(ii) holds. We claim that, by the optimality of Q, machines m − 1 and m have finish time of at least (1 − 6δ)M(Q). Moreover, the allocation of Partition is essentially the same as in the case k < m − 2 (with m − 2 being a low or a nonlow machine), and the monotonicity proof is analogous.

In the rest of the proof, we show that the claim holds. Assume that m − 1 or m is not filled to at least (1 − 6δ) · M(Q). In the case (Y)(ii), OptPath maximizes |λ_{m−1}| + |λ_m|. If machine m − 2 had at least one small job, then we could shift a small job from m − 2 to the last two machines, increasing |λ_{m−1}| + |λ_m| without increasing M(Q), a contradiction. Assume that m − 2 has no small jobs, that is, λ_{m−2} = L_{m−2}. We show that the whole job set L_{m−2} is small enough to form at most one tiny block. By definition of case (Y), 2w_{m−1} > w_{m−2}. Since w_{m−2} is the maximum job size in L_{m−2}, property (D2) of the divisions, together with δ < ε/3 (for small ε), bounds |L_{m−2}| accordingly. We define a path Q* as follows. We put the whole L_{m−2} to m − 1 or to m in the form of at most one tiny block of size δw_{m−1}, as imposed by (S5) (see Appendix A). Also, we shift every workload λ_i = L_i to the next machine for machines i < m − 2. The new partition remains canonical, and we found a path better than the optimum Q, a contradiction.

6.1. Computing the payments. In order to get a truthful mechanism, we extend the monotone algorithm with the payment scheme, as defined in [14, 3]. For every machine the formula for the payment P_i, as a function of the reported speeds and the allocation, is essentially uniquely determined:^{13} it is composed of the cost (reported finish time) of i and of the integral of the work curve on the interval [1/s_i, ∞). The work curve is the workload assigned to machine i by Ptas, as a function of his reported inverse speed, for fixed reported speeds of the other machines. This is a step function, having breaks at integer powers of (1 + ε) and at the reported inverse speeds of the other m − 1 machines. Moreover, the curve vanishes when 1/s_i ≥ n · p_max/(p_min · s_m), where s_m stands for the maximum speed of the other machines; that is, i gets no work if he reports a very low speed; cf. also [3, 1, 8]. The calculation requires running the algorithm Ptas O(log_{1+ε}(n · p_max · s_i/(p_min · s_m))) times, which is polynomial in the input size.

^{13}That is, this is the case if we demand that all machines achieve nonnegative utility, and machines doing no work get zero payment.
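The following Python sketch illustrates this payment rule. The callable work_of is a hypothetical stand-in for rerunning Ptas with machine i's inverse speed set to a given value and reading off its workload; the breakpoint list and the variable names are likewise chosen only for illustration.

def payment(inv_speed_i, cost_i, inv_speed_breaks, work_of):
    """Sketch of the one-parameter (Archer-Tardos-style) payment for machine i.

    inv_speed_i:       the reported inverse speed 1/s_i.
    cost_i:            the reported finish time |P_i| / s_i.
    inv_speed_breaks:  breakpoints of the step-shaped work curve (powers of
                       1 + eps and the other machines' inverse speeds), ending
                       at the point where the curve vanishes.
    work_of(b):        workload of machine i when its inverse speed is b.
    """
    breaks = sorted(b for b in inv_speed_breaks if b >= inv_speed_i)
    if not breaks or breaks[0] > inv_speed_i:
        breaks = [inv_speed_i] + breaks
    area = 0.0
    # integrate the piecewise-constant work curve over [1/s_i, infinity);
    # beyond the last breakpoint the curve is identically zero.
    for left, right in zip(breaks, breaks[1:]):
        area += work_of(left) * (right - left)
    return cost_i + area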


Appendix A. The definition of Scale(λ, λ').

We use the notation φ = log_{1+δ}(δw), ψ = log_{1+δ} w, φ' = log_{1+δ}(δw'), and ψ' = log_{1+δ} w'.

Definition A.1 (Scale(λ, λ')). Let λ = (w, μ, n°, n¹) and λ' = (w', μ', n'°, n'¹) be two configurations, where n¹ = n = (n_φ, . . . , n_ψ), respectively n'° = n' = (n'_{φ'}, . . . , n'_{ψ'}); then Scale(λ, λ') iff the following hold:

(S1) w ≤ w' and μ ≤ μ'.
(S2) If μ' = μ, then n'_μ = n_μ and n'_{μ+1} = n_{μ+1}; if μ' > μ, then n'_μ = n^m_μ and n'_{μ+1} = n^s_{μ+1}; if μ' = μ + 1, then n'_{μ'} = n_{μ+1}; if μ' > μ + 1, then n'_{μ+1} = n^m_{μ+1} and n'_μ = n^s_μ.

For the sake of a concise presentation, in the next three requirements we write n_μ for n^s_μ, and n_{μ+1} for n^s_{μ+1}, whenever these entries are pairs (that is, whenever μ < μ', respectively μ + 1 < μ', holds).

(S3) If ψ < l ≤ ψ', then n'_l = 0.
(S4) If φ' < l ≤ ψ, then n'_l = n_l.
(S5) Let Δ = n_φ · δw + Σ_{l=φ+1}^{φ'} |C_l[n_l]|. If Δ = 0, then let n'_{φ'} = 0. Otherwise let n'_{φ'} be the nearest positive integer to Δ/(δw') (breaking ties to smaller).

Disregarding (S2), the conditions above are the natural scaling requirements, as they also appear in [9]. By (S3) and (S4), in n' a 0 stands for job classes not appearing in n, whereas those appearing explicitly in both n and n' must be represented by the same number in both size vectors. Less obvious is (S5). By the first condition we want to achieve that n'_{φ'} = 0 if and only if no jobs of size at most δw' have been allocated in A_i. In general, the total size of tiny jobs allocated so far lies in

R_+ ∩ (Δ − δw, Δ + δw) ⊆ ((n'_{φ'} − 1) · δw', (n'_{φ'} + 1) · δw').

That is, up to a rounding error of δw', we keep track of the total size of tiny jobs allocated in the cumulative job set. Here we exploit that the magnitudes w and w' are exact powers of 2, and in particular, if the magnitude increases during scaling, then it at least doubles.

Finally, we turn to the meaning of (S2). If μ is smaller than the new middle size μ', that means that the jobs of C_μ[n^m_μ], which have to be allocated as large jobs, are already in A_i, that is, n'_μ = n^m_μ, and so C_μ[n^m_μ] = C_μ[n'_μ] ⊆ A_i. Moreover, C_μ[n^s_μ] are now all the jobs allocated from this class, so we can define (the scalar) n_μ := n^s_μ. Similarly, if μ' is not a middle size in the old vector n, then no jobs of class C_{μ'} have been allocated as small jobs in the set A_i, and this is expressed by n^m_{μ'} = n^s_{μ'} (and the corresponding scalar notation).
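As a small concrete illustration of the rounding rule (S5), the following Python sketch computes the new tiny-block count. The function and argument names are chosen here for exposition only and are not taken from the original definition.

def scale_tiny_count(n_phi, old_block, between_class_sizes, new_block):
    """Sketch of (S5): recompute the tiny-block count for a larger magnitude.

    n_phi * old_block is the tiny amount recorded so far; between_class_sizes
    lists the total sizes |C_l[n_l]| of the classes that become tiny for the
    new magnitude; new_block is the new tiny-block size (delta * w').
    Returns 0 iff nothing tiny was allocated, otherwise the nearest positive
    integer to Delta / new_block, with ties broken downwards.
    """
    delta_total = n_phi * old_block + sum(between_class_sizes)
    if delta_total == 0:
        return 0
    ratio = delta_total / new_block
    # nearest integer, ties to smaller; a positive total never rounds to 0
    nearest = int(ratio) + (1 if ratio - int(ratio) > 0.5 else 0)
    return max(1, nearest)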

    Appendix B. The proof of Theorem 4.4.

Theorem 4.4. For every input I = (P_I, s) of the scheduling problem, the optimal makespan over all m-paths in H_I is at most OPT · (1 + O(ε)), where OPT denotes the optimum makespan of the scheduling problem.

Proof. We define an appropriate m-path in H_I by the following steps:

1. We collect tiny jobs from P_I into a set T, starting from the smallest job class and proceeding in increasing order of job classes, but in decreasing order within each job class. We stop collecting if either |T| ≥ 18δw_max or the next job has size larger than δw_max. Let P := P_I \ T.
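A Python sketch of this greedy collection follows. The constant c and the exact stopping threshold c · δ · w_max are our reading of the (garbled) original and should be treated as an assumption; the job order is assumed to be prepared as described in step 1.

def collect_tiny_jobs(jobs, delta, w_max, c=18):
    """Sketch of step 1: greedily collect the smallest jobs into T.

    jobs is assumed sorted by increasing job class, and in decreasing order
    inside each class; delta and w_max are parameters of the construction.
    """
    T = []
    total = 0.0
    for p in jobs:
        if total >= c * delta * w_max or p > delta * w_max:
            break
        T.append(p)
        total += p
    remaining = jobs[len(T):]   # this is the set P := P_I \ T
    return T, remaining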


2. According to Theorem 2.6, a canonical allocation P_1, . . . , P_m, P_i = (L_i, S_i), of P exists with makespan of at most OPT · (1 + 3δ). We start from this allocation.

3. We define w_i for each i < m to be the magnitude of the largest job in ∪_{h=1}^{i} P_h. Since the |L_i| are increasing, now we have w_i/2 < |L_i|; together with δ < ε/3 this implies

(B.1)  (3/2) · δ · w_i < ε · |L_i|.

In turn, by (D2), jobs of size at most δw_i can only be in S_i (and not in L_i). Let these be the set T_i of tiny jobs in P_i. (We note that the largest job in ∪_{h=1}^{i} P_h might appear in some S_i, but it cannot belong to T_i.) For m let w_m := w_{m−1}.

4. We put the set T of tiny jobs to machine m, increasing its workload by a factor of at most O(ε). Assume w_{m−2} > 2w_{m−1}. Now either T contains all jobs tiny for w_{m−2}, so that (X)(i) holds, or T has enough tiny jobs so that (X)(ii) holds. In the latter case we redistribute T over machines m − 1 and m. We also set w_m = w_{m−1} = w_{m−2}.

If w_{m−2} ≤ 2w_{m−1}, then either all machines i ≤ m − 2 are empty, and (Y)(i) holds, or machine m − 2 is nonempty, meaning that T has at least 18δw_max ≥ 6δw_{m−1} amount of jobs of size at most δw_{m−2}, which are jobs tiny for w_{m−1}, so after creating tiny blocks (Y)(ii) will hold.

We update the sets T_{m−1}, T_m, S_m, etc., as implied by the changes above.

5. In order to facilitate a transparent definition of our m-path, we modify the current partition of P_I so that the sets contain tiny blocks instead of the tiny jobs. The jobs in T_i make |T_i|/(δw_i) (fractional) blocks. We build integral blocks out of these for every i < m by a simple procedure (cf. [8]) which packs (fractional) tiny jobs from a fractional block on some machine i into a fractional block on machine h > i, until one of them gets rounded to an integer number of blocks. Note that (the full size of) any repacked job remains tiny on its new machine. If there is just one machine i < m left with a fractional block, we put this fractional block on machine m. Finally, we put one block of size δw (if it exists) for each valid magnitude w < w_m onto machine m from an arbitrary machine (with this we enforce (C3) on the configurations). The workload on every machine i increased by at most 2δw_i.
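The effect of this packing, namely that fractional remainders are pushed to the right until each machine holds an integral number of blocks and at most one leftover fraction ends up on machine m, can be illustrated by the simplified Python sketch below (illustrative names; a simplification of the pairwise packing described above, not the procedure itself).

def build_tiny_blocks(tiny_totals, block_sizes):
    """Sketch of step 5: turn per-machine tiny-job totals into integral blocks.

    tiny_totals[i] plays the role of |T_i| and block_sizes[i] that of the
    block size delta * w_i on machine i; both are illustrative inputs.
    """
    blocks = []
    carry = 0.0                      # fractional amount pushed rightward
    for total, size in zip(tiny_totals, block_sizes):
        amount = total + carry
        whole = int(amount // size)  # integral number of blocks kept here
        blocks.append(whole)
        carry = amount - whole * size
    if carry > 0:
        blocks[-1] += 1              # leftover fraction goes to machine m
    return blocks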

6. We shift small jobs in S_{m−2} ∪ S_{m−1} ∪ S_m to the right so that |P_{m−2}| ≤ |P_{m−1}| ≤ |P_m| holds, increasing the makespan by at most 3ε · OPT. Finally, we redistribute tiny blocks when needed, so that |T_{m−2}| ≤ |T_{m−1}| ≤ |T_m| also holds.

The resulting job partition is denoted by Q_1, Q_2, . . . , Q_m, Q_i = (L_i, S_i). It is easy to verify that during the whole process we did not confuse the order of jobs within the classes. (The ordering within tiny blocks is irrelevant.)

7. Let φ_i = log_{1+δ}(δw_i) and ψ_i = log_{1+δ} w_i. Given |L_i|, the property (C5) of configurations uniquely determines μ_i for i < m; moreover, the μ_i are increasing since the same holds for the |L_i|. By (D1) and (C5), the jobs in L_i have rounded size of at least (1 + δ)^{μ_i} (we exploit the strict inequality in (D1)), and by definition, at most w_i = (1 + δ)^{ψ_i}. Thus, μ_i ≤ ψ_i. By (D2) and (C5), the jobs in S_i have rounded size of at most (1 + δ)^{μ_i+1} (we exploit the strict inequality in (C5)). For small ε, inequality (B.1) above and (C5) imply φ_i < μ_i. For m we define μ_m := μ_{m−1}.^{14}

^{14}As a technical consequence of μ_{m−1} = μ_m, we can only require (1 + δ)^{μ_m+1} ≤ |L_m| instead of property (C5) of configurations. That is, on the last machine the division (L_m, S_m) does not fulfill (D1). This relaxation for m does not affect the rest of the argument.


8. It is now straightforward to define a configuration λ_i for each Q_i in a recursive manner. If Q_i = ∅, then let λ_i be the empty configuration. Otherwise let the n° of λ_i be the null vector for i = 1, and be the second size vector in λ_{i−1} scaled to w_i for i > 1. Based on the calculations of step 7, we can construct the n¹ of λ_i so that L^λ_i = L_i and S^λ_i = S_i. In the end, we can define a double configuration (λ_{m−2}, λ_{m−1}) consistent with (V3), since the partition fulfills (X) or (Y).

Clearly, the vertices v_i = (II, i, λ_i) exist in V_II and form an m-path in level II, as easily follows from the graph definition. We increased every workload by O(ε) · OPT, so the theorem follows.

    REFERENCES

[1] N. Andelman, Y. Azar, and M. Sorani, Truthful approximation mechanisms for scheduling selfish related machines, Theory Comput. Syst., 40 (2007), pp. 423–436.

[2] A. Archer, Mechanisms for Discrete Optimization with Rational Agents, Ph.D. thesis, Cornell University, Ithaca, NY, 2004.

[3] A. Archer and E. Tardos, Truthful mechanisms for one-parameter agents, in Proceedings of the 42nd Annual Symposium on Foundations of Computer Science (FOCS), Las Vegas, NV, 2001, pp. 482–491.

[4] I. Ashlagi, S. Dobzinski, and R. Lavi, An optimal lower bound for anonymous scheduling mechanisms, in Proceedings of the 10th ACM Conference on Electronic Commerce (EC 2009), Stanford, CA, 2009, pp. 169–176.

    [5] V. Auletta, R. D. P