
On the Smoothness of Paging Algorithms

Jan Reineke1 and Alejandro Salinger2⋆

1 Department of Computer Science, Saarland University, Saarbrücken, Germany
[email protected]
2 SAP SE, Walldorf, Germany
[email protected]

Abstract. We study the smoothness of paging algorithms. How much can the number of page faults increase due to a perturbation of the request sequence? We call a paging algorithm smooth if the maximal increase in page faults is proportional to the number of changes in the request sequence. We also introduce quantitative smoothness notions that measure the smoothness of an algorithm. We derive lower and upper bounds on the smoothness of deterministic and randomized demand-paging and competitive algorithms. Among strongly-competitive deterministic algorithms, LRU matches the lower bound, while FIFO matches the upper bound. Well-known randomized algorithms like Partition, Equitable, or Mark are shown not to be smooth. We introduce two new randomized algorithms, called Smoothed-LRU and LRU-Random. Smoothed-LRU allows one to sacrifice competitiveness for smoothness, where the trade-off is controlled by a parameter. LRU-Random is at least as competitive as any deterministic algorithm while being smoother.

    1 Introduction

Due to their strong influence on system performance, paging algorithms have been studied extensively since the 1960s. Early studies were based on probabilistic request models [1,2,3]. In their seminal work, Sleator and Tarjan [4] introduced the notion of competitiveness, which relates the performance of an online algorithm to that of the optimal offline algorithm. By now, the competitiveness of well-known deterministic and randomized paging algorithms is well understood, and various optimal online algorithms [5,6] have been identified.

In this paper, we study the smoothness of paging algorithms. We seek to answer the following question: How strongly may the performance of a paging algorithm change when the sequence of memory requests is slightly perturbed? This question is relevant in various domains: Can the cache performance of an algorithm suffer significantly due to the occasional execution of interrupt handling code? Can the execution time of a safety-critical real-time application be safely and tightly bounded in the presence of interference on the cache? Can secret-dependent memory requests have a significant influence on the number of cache misses of a cryptographic protocol and thus give rise to a timing side-channel attack?

We formalize the notion of smoothness by identifying the performance of a paging algorithm with the number of page faults and the magnitude of a perturbation with the edit distance between two request sequences.

We show that for any deterministic, demand-paging or competitive algorithm, a single additional memory request may cause k + 1 additional faults, where k is the size of the cache. Least-recently-used (LRU) matches this lower bound, indicating that there is no trade-off between competitiveness and smoothness for deterministic algorithms. In contrast, First-in first-out (FIFO) is shown to be least smooth among all strongly-competitive deterministic algorithms.

Randomized algorithms have been shown to be more competitive than deterministic ones. We derive lower bounds for the smoothness of randomized, demand-paging and randomized strongly-competitive algorithms that indicate that randomization might also help with smoothness. However, we show that none of the well-known randomized algorithms Mark, Equitable, and Partition is smooth. The simple randomized algorithm that evicts one of the cached pages uniformly at random is shown to be as smooth as LRU, but not more.

⋆ Most of the reported work was carried out while this author was a postdoctoral researcher at Saarland University.


We then introduce a new parameterized randomized algorithm, Smoothed-LRU, that allows one to sacrifice competitiveness for smoothness. For some parameter values, Smoothed-LRU is smoother than any randomized strongly-competitive algorithm can possibly be, indicating a trade-off between smoothness and competitiveness for randomized algorithms. This leaves the question whether there is a randomized algorithm that is smoother than any deterministic algorithm without sacrificing competitiveness. We answer this question in the affirmative by introducing LRU-Random, a randomized version of LRU that evicts older pages with a higher probability than younger ones. We show that LRU-Random is smoother than any deterministic algorithm for k = 2. While we conjecture that this is the case as well for general k, this remains an open problem.

The notion of smoothness we present is not meant to be an alternative to competitive analysis for the evaluation of the performance of a paging algorithm; rather, it is a complementary quantitative measure that provides guarantees about the performance of an algorithm under uncertainty of the input. In general, smoothness is useful in both testing and verification:

– In testing: if a system is smooth, then a successful test run is indicative of the system's correct behavior not only on the particular test input, but also in its neighborhood.
– In verification, systems are shown to behave correctly under some assumption on their environment. Due to incomplete environment specifications, operator errors, faulty implementations, or other causes, the environment assumption may not always hold completely. In such a case, if the system is smooth, "small" violations of the environment assumptions will, in the worst case, result in "small" deviations from correct behavior.

An example of the latter case that motivates our present work appears in safety-critical real-time systems, where static analyses are employed to derive guarantees on the worst-case execution time (WCET) of a program on a particular microarchitecture [7]. While state-of-the-art WCET analyses are able to derive fairly precise bounds on execution times, they usually only hold for the uninterrupted execution of a single program with no interference from the environment whatsoever. These assumptions are increasingly difficult to satisfy with the adoption of preemptive scheduling or even multi-core architectures, which may introduce interference on shared resources such as caches and buses. Given a smooth cache hierarchy, it is possible to separately analyze the effects of interference on the cache, e.g., due to interrupts, preemptions, or even co-running programs on other cores. Our results may thus inform the design and analysis of microarchitectures for real-time systems [8].

Interestingly, our model shows a significant difference between LRU and FIFO, two algorithms whose theoretical performance has proven difficult to separate.

Our results are summarized in Table 1. An algorithm A is (α, β, δ)-smooth if the number of page faults A(σ′) of A on request sequence σ′ is bounded by α · A(σ) + β whenever σ can be transformed into σ′ by at most δ insertions, deletions, or substitutions of individual requests. Often, our results apply to a generic value of δ. In such cases, we express the smoothness of a paging algorithm by a pair (α, β), where α and β are functions of δ, and A is (α(δ), β(δ), δ)-smooth for every δ. Usually, the smoothness of an algorithm depends on the size of the cache, which we denote by k. As an example, under LRU the number of faults may increase by at most δ(k + 1), where δ is the number of changes in the sequence. A precise definition of these notions is given in Section 3.

    For readability, we place some of the proofs of our results in the appendix.

    2 Related Work

    2.1 Notions of Smoothness

Robust control is a branch of control theory that explicitly deals with uncertainty in its approach to controller design. Informally, a controller designed for a particular set of parameters is said to be robust if it would also work well under a slightly different set of assumptions. In computer science, the focus has long been on the binary property of correctness, as well as on average- and worst-case performance. Lately, however, various notions of smoothness have received increasing attention: Chaudhuri et al. [9] develop analysis techniques to determine whether a given program computes a Lipschitz-continuous function. Lipschitz continuity is a special case of our notion of smoothness. Continuity is also strongly related to differential privacy [10], where the result of a query may not depend strongly on the information about any particular individual. Differential privacy proofs with respect to cache side channels [11] may be achievable in a compositional manner for caches employing smooth paging algorithms.


Table 1. Upper and lower bounds on the smoothness of paging algorithms. In the table, k is the size of the cache, δ is the distance between input sequences, Hk denotes the k-th harmonic number, and γ is an arbitrary constant.

| Algorithm                                   | Lower bound                | Upper bound                |
|---------------------------------------------|----------------------------|----------------------------|
| Deterministic, demand-paging                | (1, δ(k + 1))              | ∞                          |
| Det. c-competitive with additive constant β | (1, δ(k + 1))              | (c, 2δc + β)               |
| Deterministic, strongly-competitive         | (1, δ(k + 1))              | (k, 2δk)                   |
| Optimal offline                             | (1, 2δ)                    | (1, 2δ)                    |
| LRU                                         | (1, δ(k + 1))              | (1, δ(k + 1))              |
| FWF                                         | (1, 2δk)                   | (1, 2δk)                   |
| FIFO                                        | (k, γ, 1)                  | (k, 2δk)                   |
| Randomized, demand-paging                   | (1, Hk + 1/k, 1)           | ∞                          |
| Randomized, strongly-competitive            | (1, δHk)                   | (Hk, 2δHk)                 |
| Equitable, Partition                        | (1 + ε, γ, 1)              | (Hk, 2δHk)                 |
| Mark                                        | (Ω(Hk), γ, 1)              | (2Hk − 1, δ(4Hk − 2))      |
| Random                                      | (1, δ(k + 1))              | (1, δ(k + 1))              |
| Evict-On-Access                             | (1, δ(1 + k/(2k−1)))       | (1, δ(1 + k/(2k−1)))       |
| Smoothed-LRUk,i                             | (1, δ((k+i)/(2i+1) + 1))   | (1, δ((k+i)/(2i+1) + 1))   |


Doyen et al. [12] consider the robustness of sequential circuits. They determine how long into the future a single disturbance in the inputs of a sequential circuit may affect the circuit's outputs. Much earlier, but in a similar vein, Kleene [13], Perles, Rabin, Shamir [14], and Liu [15] developed the theory of definite events and definite automata. The outputs of a definite automaton are determined by a fixed-length suffix of its inputs. Definiteness is a sufficient condition for smoothness.

The work of Reineke and Grund [16] is closest to ours: they study the maximal difference in the number of page faults on the same request sequence starting from two different initial states for various deterministic paging algorithms. In contrast, here, we study the effect of differences in the request sequences on the number of faults. Also, in addition to only studying particular deterministic algorithms as in [16], in this paper we determine smoothness properties that apply to classes of algorithms, such as all demand-paging or strongly-competitive ones, as well as to randomized algorithms. One motivation to consider randomized algorithms in this work is recent efforts to employ randomized caches in the context of hard real-time systems [17].

    2.2 The Paging Problem

Paging models a two-level memory system with a small fast memory known as cache, and a large but slow memory, usually referred to simply as memory. During a program's execution, data is transferred between the cache and memory in units of data known as pages. The size of the cache in pages is usually referred to as k. The size of the memory can be assumed to be infinite. The input to the paging problem is a sequence of page requests which must be made available in the cache as they arrive. When a request for a page arrives and this page is already in the cache, then no action is required. This is known as a hit. Otherwise, the page must be brought from memory to the cache, possibly requiring the eviction of another page from the cache. This is known as a page fault or miss. A paging algorithm must decide which pages to keep in the cache in order to minimize the number of faults.

A paging algorithm is said to be demand paging if it only evicts a page from the cache upon a fault with a full cache. Any non-demand-paging algorithm can be made demand paging without sacrificing performance [18].


In general, paging algorithms must make decisions as requests arrive, with no knowledge of future requests. That is, paging is an online problem. The most prevalent way to analyze online algorithms is competitive analysis [4]. In this framework, the performance of an online algorithm is measured against an algorithm with full knowledge of the input sequence, known as optimal offline or OPT. We denote by A(σ) the number of misses of an algorithm when processing the request sequence σ. A paging algorithm A is said to be c-competitive if for all sequences σ, A(σ) ≤ c · OPT(σ) + β, where β is a constant independent of σ. The competitive ratio of an algorithm is the infimum over all possible values of c satisfying the inequality above. An algorithm is called competitive if it has a constant competitive ratio and strongly competitive if its competitive ratio is the best possible [5].

Traditional paging algorithms are Least-recently-used (LRU)—evict the page in the cache that has been requested least recently—and First-in first-out (FIFO)—evict the page in the cache that was brought into cache the earliest. Another simple algorithm often considered is Flush-when-full (FWF)—empty the cache if the cache is full and a fault occurs. These algorithms are k-competitive, which is the best ratio that can be achieved for deterministic online algorithms [4]. An optimal offline algorithm for paging is Furthest-in-the-future, also known as Longest-forward-distance and Belady's algorithm [1]. This algorithm evicts the page in the cache that will be requested at the latest time in the future.
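As a concrete reference for the policies just described, the following sketch (our own Python, not part of the paper; function names are ours) counts the faults of LRU, FIFO, FWF, and Furthest-in-the-future on a request sequence with cache size k:

```python
from collections import OrderedDict

def lru_faults(sigma, k):
    """LRU: evict the page requested least recently."""
    cache, faults = OrderedDict(), 0
    for p in sigma:
        if p in cache:
            cache.move_to_end(p)           # refresh recency on a hit
        else:
            faults += 1
            if len(cache) >= k:
                cache.popitem(last=False)  # least recently used is up front
            cache[p] = True
    return faults

def fifo_faults(sigma, k):
    """FIFO: evict the page brought into the cache earliest."""
    cache, order, faults = set(), [], 0
    for p in sigma:
        if p not in cache:
            faults += 1
            if len(cache) >= k:
                cache.discard(order.pop(0))  # hits do not change the order
            cache.add(p)
            order.append(p)
    return faults

def fwf_faults(sigma, k):
    """FWF: on a fault with a full cache, flush everything."""
    cache, faults = set(), 0
    for p in sigma:
        if p not in cache:
            faults += 1
            if len(cache) >= k:
                cache.clear()
            cache.add(p)
    return faults

def belady_faults(sigma, k):
    """Furthest-in-the-future: evict the cached page whose next request
    is latest, or that is never requested again."""
    cache, faults = set(), 0
    for i, p in enumerate(sigma):
        if p not in cache:
            faults += 1
            if len(cache) >= k:
                nxt = lambda q: next((j for j in range(i + 1, len(sigma))
                                      if sigma[j] == q), len(sigma))
                cache.discard(max(cache, key=nxt))
            cache.add(p)
    return faults
```

For example, lru_faults([1, 2, 3, 1, 2, 3], 2) returns 6: with k = 2, this cyclic sequence faults on every request under LRU.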

A competitive ratio less than k can be achieved by the use of randomization. Important randomized paging algorithms are Random—evict a page chosen uniformly at random—and Mark [19]—mark a page when it is unmarked and requested, and upon a fault evict a page chosen uniformly at random among unmarked pages (unmarking all pages first if no unmarked pages remain). Random achieves a competitive ratio of k, while Mark's competitive ratio is 2Hk − 1, where Hk = ∑_{i=1}^{k} 1/i is the k-th harmonic number. The strongly-competitive algorithms Partition [5] and Equitable [6] achieve the optimal ratio of Hk.
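The randomized policies can be sketched in the same style. In the code below (again our own, not the paper's; expected fault counts are estimated by Monte Carlo averaging rather than computed exactly), sorting the candidate set merely makes the random choice reproducible for a fixed seed:

```python
import random
from statistics import mean

def random_policy_faults(sigma, k, rng):
    """Random: on a fault with a full cache, evict a uniformly chosen page."""
    cache, faults = set(), 0
    for p in sigma:
        if p not in cache:
            faults += 1
            if len(cache) >= k:
                cache.discard(rng.choice(sorted(cache)))
            cache.add(p)
    return faults

def mark_faults(sigma, k, rng):
    """Mark: mark requested pages; on a fault evict uniformly among the
    unmarked cached pages, unmarking all pages first if none remains."""
    cache, marked, faults = set(), set(), 0
    for p in sigma:
        if p not in cache:
            faults += 1
            if len(cache) >= k:
                if not (cache - marked):
                    marked.clear()        # new phase: all pages unmarked
                cache.discard(rng.choice(sorted(cache - marked)))
            cache.add(p)
        marked.add(p)
    return faults

def expected_faults(alg, sigma, k, trials=1000, seed=0):
    """Monte Carlo estimate of the expected number of faults."""
    rng = random.Random(seed)
    return mean(alg(sigma, k, rng) for _ in range(trials))
```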

    3 Smoothness of Paging Algorithms

We now formalize the notion of smoothness of paging algorithms. We are interested in answering the following question: How does the number of misses of a paging algorithm vary as its inputs vary? We quantify the similarity of two request sequences by their edit distance:

Definition 1 (Distance). Let σ = x1, . . . , xn and σ′ = x′1, . . . , x′m be two request sequences. Then we denote by ∆(σ, σ′) their edit distance, defined as the minimum number of substitutions, insertions, or deletions to transform σ into σ′.

This is also referred to as the Levenshtein distance.
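For reference, this distance can be computed with the standard dynamic program below (our own Python sketch; the function name edit_distance is ours):

```python
def edit_distance(s, t):
    """Levenshtein distance between sequences s and t."""
    n, m = len(s), len(t)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i                    # delete all of s[:i]
    for j in range(m + 1):
        d[0][j] = j                    # insert all of t[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution / match
    return d[n][m]
```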

Based on this notion of distance we define (α, β, δ)-smoothness:

Definition 2 ((α, β, δ)-smoothness). Given a paging algorithm A, we say that A is (α, β, δ)-smooth if for all pairs of sequences σ, σ′ with ∆(σ, σ′) ≤ δ,

A(σ′) ≤ α · A(σ) + β.

For randomized algorithms, A(σ) denotes the algorithm's expected number of faults when serving σ. An algorithm that is (α, β, δ)-smooth may also be (α′, β′, δ)-smooth for α′ > α and β′ < β. As the multiplicative factor α dominates the additive constant β in the long run, when analyzing the smoothness of an algorithm, we first look for the minimal α such that the algorithm is (α, β, δ)-smooth for some β.

We say that an algorithm is smooth if it is (1, β, 1)-smooth for some β. In this case, the maximal increase in the number of page faults is proportional to the number of changes in the request sequence. This is called Lipschitz continuity in mathematical analysis. For smooth algorithms, we also analyze the Lipschitz constant, i.e., the additive part β, in detail; otherwise we concentrate the analysis on the multiplicative factor α.

We use the above notation when referring to a specific distance δ. For a generic value of δ we omit this parameter and express the smoothness of a paging algorithm with a pair (α, β), where both α and β are functions of δ.


Definition 3 ((α, β)-smoothness). Given a paging algorithm A, we say that A is (α, β)-smooth, if for all pairs of sequences σ, σ′,

A(σ′) ≤ α(δ) · A(σ) + β(δ),

where α and β are functions, and δ = ∆(σ, σ′).

Often, it is enough to determine the effects of one change in the inputs to characterize the smoothness of an algorithm A.

Lemma 1. If A is (α, β, 1)-smooth, then A is (α^δ, β · ∑_{i=0}^{δ−1} α^i)-smooth.

Proof. By induction on δ. The case δ = 1 is trivial. Assume the hypothesis is true for 1 < δ ≤ h. Let σh+1 and σ be any pair of sequences such that ∆(σ, σh+1) = h + 1. Then there exists a sequence σh such that ∆(σ, σh) = h and ∆(σh, σh+1) = 1. Since A is (α, β, 1)-smooth, A(σh+1) ≤ α · A(σh) + β. By the inductive hypothesis, A(σh) ≤ α^h · A(σ) + β · ∑_{i=0}^{h−1} α^i. Therefore, A(σh+1) ≤ α(α^h · A(σ) + β · ∑_{i=0}^{h−1} α^i) + β = α^{h+1} · A(σ) + β · ∑_{i=0}^{h} α^i, and thus A is (α^δ, β · ∑_{i=0}^{δ−1} α^i)-smooth. ⊓⊔

    Corollary 1. If A is (1, β, 1)-smooth, then A is (1, δβ)-smooth.
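Corollary 1 also suggests a simple experimental sanity check, sketched below with our own hypothetical helper (worst_single_edit_increase is ours; lru_faults is from the sketch in Section 2.2): enumerate all single-edit perturbations of σ over a small page universe and record the worst increase in faults. For LRU this maximum never exceeds k + 1, in line with the results of the next section.

```python
def worst_single_edit_increase(alg, sigma, k, universe):
    """Maximum increase in faults over all sequences at edit distance 1."""
    base = alg(sigma, k)
    variants = []
    for i in range(len(sigma) + 1):                      # insertions
        variants += [sigma[:i] + [p] + sigma[i:] for p in universe]
    for i in range(len(sigma)):
        variants.append(sigma[:i] + sigma[i + 1:])       # deletions
        variants += [sigma[:i] + [p] + sigma[i + 1:] for p in universe]
    return max(alg(v, k) - base for v in variants)

# e.g. worst_single_edit_increase(lru_faults, [1, 2, 3, 1, 2, 3], 2, range(1, 5))
# is at most k + 1 = 3.
```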

    4 Smoothness of Deterministic Paging Algorithms

    4.1 Bounds on the Smoothness of Deterministic Paging Algorithms

Before considering particular deterministic online algorithms, we determine upper and lower bounds for several important classes of algorithms. Many natural algorithms are demand paging.

Theorem 1 (Lower bound for deterministic, demand-paging algorithms). No deterministic, demand-paging algorithm is (1, δ(k + 1 − ε))-smooth for any ε > 0.

Proof. Let A be any deterministic, demand-paging algorithm. Using k + 1 distinct pages, we can construct a sequence σA(δ) of length k + δ(k + 1) such that A faults on every request: first request the k + 1 distinct pages in any order; then arbitrarily extend the sequence by requesting the page that A has just evicted. Let p be the page that occurs least frequently in σA(δ). By removing all requests to p from σA(δ), we obtain a sequence σ′A(δ) that consists of k distinct pages only. By assumption A is demand paging. Thus, A incurs only k page faults on the entire sequence. Assume for a contradiction that A is (1, δ(k + 1 − ε))-smooth for some ε > 0. Then, we have by definition:

A(σA(δ)) ≤ 1 · A(σ′A(δ)) + ∆(σ′A(δ), σA(δ)) · (k + 1 − ε).

Clearly, p occurs at most ⌊(k + δ(k + 1))/(k + 1)⌋ = δ times in σA(δ). So ∆(σ′A(δ), σA(δ)) ≤ δ, and we get:

k + δ(k + 1) ≤ k + δ · (k + 1 − ε) ⇔ ε ≤ 0,

which contradicts the assumption that ε > 0. ⊓⊔
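The adversarial sequence from this proof is easy to realize against any concrete deterministic implementation. The sketch below (our code) assumes a hypothetical interface step(page) that serves one request and returns the evicted page, or None when nothing is evicted:

```python
def nemesis_sequence(step, k, delta):
    """Build σ_A(δ) of length k + δ(k + 1): after k + 1 cold misses, keep
    requesting exactly the page the policy just evicted."""
    sigma, evicted = [], None
    for p in range(k + 1):
        sigma.append(p)
        out = step(p)                  # the (k+1)-th distinct page forces
        if out is not None:            # the first eviction
            evicted = out
    for _ in range(delta * (k + 1) - 1):
        sigma.append(evicted)          # the page evicted on the last fault
        evicted = step(evicted)        # is guaranteed to miss now
    return sigma
```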

While most algorithms are demand paging, it is not a necessary condition for an algorithm to be competitive, as demonstrated by FWF. However, we obtain the same lower bound for competitive algorithms as for demand-paging ones.

Theorem 2 (Lower bound for deterministic, competitive paging algorithms). No deterministic, competitive paging algorithm is (1, δ(k + 1 − ε))-smooth for any ε > 0.


Proof. Let A be any c-competitive deterministic online paging algorithm. The proof is essentially the same as the one for Theorem 1, with the number of faults of A on σ′A(δ) being at most ck + β, for some constant β. This follows from the competitiveness of A and the fact that OPT makes at most k faults on σ′A(δ). Assuming for a contradiction that A is (1, δ(k + 1 − ε))-smooth for some ε, we get

A(σA(δ)) ≤ 1 · A(σ′A(δ)) + ∆(σ′A(δ), σA(δ)) · (k + 1 − ε)
⇒ k + δ(k + 1) ≤ ck + β + δ · (k + 1 − ε)
⇒ δε ≤ (c − 1)k + β.

For any ε > 0, there is a δm such that any δ > δm contradicts the above inequality. By a slight generalization of Corollary 1, this implies that the algorithm is not (1, δ(k + 1 − ε), δ)-smooth for any δ and ε > 0. ⊓⊔

By contraposition of Corollary 1, the two previous theorems show that no deterministic, demand-paging or competitive algorithm is (1, k + 1 − ε, 1)-smooth for any ε > 0.

Intuitively, the optimal offline algorithm should be very smooth, and this is indeed the case as we show next:

    Theorem 3 (Smoothness of OPT). OPT is (1, 2δ)-smooth. This is tight.

Proof. For the lower bound consider the following two sequences σδ and σ′δ with ∆(σδ, σ′δ) = δ: σδ = (1, . . . , k, 1, . . . , k)^{δ+1} and σ′δ = (1, . . . , k, 1, . . . , k, x)^δ (1, . . . , k, 1, . . . , k), where σ^l denotes the concatenation of l copies of σ and x ∉ {1, . . . , k}. Clearly, OPT(σδ) = k, as σδ contains only k distinct pages. Further, under optimal replacement every request to x faults in σ′δ and it replaces one of the pages 1, . . . , k, which results in an additional fault later on. So, OPT(σ′δ) ≥ k + 2δ.

We show that OPT is (1, 2, 1)-smooth, which implies the theorem by Corollary 1. Let σ and σ′ be two sequences such that ∆(σ, σ′) = 1. We will show that there exists an algorithm A such that A(σ′) ≤ OPT(σ) + 2, from which the theorem follows since OPT(σ′) ≤ A(σ′). Let A be an offline paging algorithm serving σ′. On the equal prefix of σ′ and σ, A will act exactly as OPT does on σ. This implies that right before the difference the caches COPT of OPT and CA of A hold the same pages. No matter what the difference between the sequences is, the differing request can only make the caches of A and OPT differ by at most one page: if it is an insertion of p in σ′ that results in a hit for A, then COPT = CA; otherwise A evicts a page q and fetches p. If the difference is a deletion of p from σ′, then if p is a hit, COPT = CA, and otherwise OPT evicts some page q and fetches p. If the difference is a substitution of p in σ by r in σ′, then if either of the requests is a hit, this is equivalent to the cases above, and if they are both misses, A evicts what OPT evicts. In all cases, after the difference either CA = COPT, or CA = (COPT \ {q}) ∪ {p} for some pages p and q, with p ≠ q. At this point the number of faults of A exceeds those of OPT by at most one. We now show that A can manage to incur at most one more fault than OPT in the rest of the sequence.

Let ρ be the suffix of σ and σ′ after the difference and let Ai(ρ) and OPTi(ρ) denote the number of faults of the algorithms on the suffix up to request i. We now claim that A can be such that after every request ρi either (1) Ai(ρ) ≤ OPTi(ρ) and either CA = COPT or CA = (COPT \ {q}) ∪ {p}, for some pages p and q with p ≠ q, or (2) Ai(ρ) = OPTi(ρ) + 1 and CA = COPT. If at any point CA = COPT, then A acts like OPT for the rest of the suffix and the claim is true.

We show that this invariant holds after every request ρi, which implies that A(ρ) ≤ OPT(ρ) + 1 and hence that A(σ′) ≤ OPT(σ) + 2.

Initially Ai(ρ) = OPTi(ρ); assume that CA = (COPT \ {q}) ∪ {p}. A acts as follows on ρi:

– If ρi ∈ CA and ρi ∈ COPT, A does nothing and the invariant holds.
– If ρi ∈ CA and ρi ∉ COPT, then ρi = p and A does nothing. It holds that Ai(ρ) ≤ OPTi(ρ). If OPT evicts a page q′ ≠ q, then CA = (COPT \ {q}) ∪ {q′} and the invariant holds. If OPT evicts q, then both caches are equal and the claim is true.
– If ρi ∉ CA and ρi ∉ COPT, Ai(ρ) ≤ OPTi(ρ) still holds. If OPT evicts q, A evicts p, and the caches are equal. Otherwise A evicts the same page as OPT and CA = (COPT \ {q}) ∪ {p}.
– If ρi ∉ CA and ρi ∈ COPT, then ρi = q. Ai(ρ) = OPTi(ρ) + 1, A evicts p, and CA = COPT. ⊓⊔


With Theorem 3 it is easy to show the following upper bound on the smoothness of any competitive algorithm:

Theorem 4 (Smoothness of competitive algorithms). Let A be any paging algorithm such that for all sequences σ, A(σ) ≤ c · OPT(σ) + β. Then A is (c, 2δc + β)-smooth.

Proof. Let σ′ be a sequence such that ∆(σ, σ′) = δ. By Theorem 3, OPT(σ′) ≤ OPT(σ) + 2δ. Therefore, A(σ′) ≤ c · (OPT(σ) + 2δ) + β ≤ c · A(σ) + 2δc + β. ⊓⊔

Note that the above theorem applies to both deterministic and randomized algorithms. Given that every competitive algorithm is (α, β)-smooth for some α and β, the natural question to ask is whether the converse also holds. Below, we answer this question in the affirmative for deterministic, bounded-memory, demand-paging algorithms. By bounded memory we mean algorithms that, in addition to the contents of their fast memory, only have a finite amount of additional state. For a more formal definition consult [18, page 93]. Paging algorithms implemented in hardware caches are bounded memory. Our proof requires the notion of a k-phase partition:

Definition 4 (k-phase partition). The k-phase partition of a sequence σ is a partition of σ into contiguous subsequences called k-phases, or simply phases, such that the first phase starts with the first request of σ and a new phase starts when (k + 1) distinct pages have been requested since the beginning of the previous phase.
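In code, the k-phase partition of Definition 4 might be computed as follows (our own sketch):

```python
def k_phase_partition(sigma, k):
    """Split σ into k-phases: a phase ends just before the request that
    would be the (k+1)-th distinct page since the phase began."""
    phases, current, distinct = [], [], set()
    for p in sigma:
        if p not in distinct and len(distinct) == k:
            phases.append(current)          # (k+1)-th distinct page seen
            current, distinct = [], set()
        current.append(p)
        distinct.add(p)
    if current:
        phases.append(current)
    return phases

# e.g. k_phase_partition([1, 2, 3, 1, 4, 5], 2) == [[1, 2], [3, 1], [4, 5]]
```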

Theorem 5 (Competitiveness of smooth algorithms). If algorithm A is deterministic, bounded-memory, demand-paging, and (α, β)-smooth for some α and β, then A is also competitive.

Proof. Assume algorithm A is non-competitive. Then, there is no bound on the number of misses in a single k-phase for A: otherwise, if r is a bound on the number of misses of A in every phase, then A is competitive with competitive ratio c ≤ r.

Let n be the number of states of A. Within a k-phase, a demand-paging algorithm can reach at most (2k)^k different configurations: each of the k slots can either contain one of the k "old" pages cached at the start of the k-phase, or one of the up to k "new" pages requested within the phase. Let σ be a sequence of minimal length that ends on a phase in which A misses more than n · (2k)^k times. By the pigeon-hole principle, A must assume the same state and configuration pair twice within that phase. Due to the minimality of the sequence, A must fault at least once between those two occurrences. By repeating the sequence of requests between the two occurrences, we can thus pump up the sequence and the number of faults arbitrarily without increasing the number of phases. By removing the finite prefix of the sequence that comprises all but the final phase, we can construct a sequence σ′ containing at most k distinct pages. Any demand-paging algorithm, in particular A, will fault at most k times on this sequence. The edit distance between σ′ and σ is finite, but the difference in faults is unbounded. This shows that A is not (α, β)-smooth for any α and β. ⊓⊔

    4.2 Smoothness of Particular Deterministic Algorithms

Now let us turn to the analysis of three well-known deterministic algorithms: LRU, FWF, and FIFO. We show that both LRU and FWF are smooth. On the other hand, FIFO is not smooth, as a single change in the request sequence may increase the number of misses by a factor of k.

Theorem 6 (Smoothness of Least-recently-used). LRU is (1, δ(k + 1))-smooth. This is tight.

Proof. We show that LRU is (1, k + 1, 1)-smooth. Corollary 1 then immediately implies that LRU is (1, δ(k + 1))-smooth. Tightness follows from Theorem 1, as LRU is demand paging. To analyze LRU, it is convenient to introduce the notion of age. The age of page p is the number of distinct pages that have been requested since the previous request to p. Before their first request, all pages have age ∞. A request to page p results in a fault if and only if p's age is greater than or equal to k, the size of the cache. Finite ages are unique, i.e., no two pages have the same age less than ∞. At any time at most k pages are cached, and at most k pages have an age less than k.

Let us now consider how the insertion of one request may affect ages and the number of faults. By definition, the age of any page is only affected from the point of insertion up to its next request. Only the next request to a page may thus turn from a hit into a miss. At any time at most k pages have an age less than k. So at most k requests may turn from hits into misses. As the inserted request itself may also introduce a fault, the overall number of faults may thus increase by at most k + 1.

Substitutions are similar to insertions: they turn at most k succeeding hits into misses, and the substituted request itself may introduce one additional fault. The deletion of a request to page p does not increase the ages of other pages. Only the next request to p may turn from a hit into a miss. ⊓⊔
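The age-based characterization used in this proof also yields an eviction-free way of counting LRU faults, sketched below (our own code; the quadratic scan is enough for illustration):

```python
def lru_faults_by_age(sigma, k):
    """Count LRU faults without simulating evictions: a request hits iff
    the page's age (distinct pages since its previous request) is < k."""
    last_seen, faults = {}, 0
    for i, p in enumerate(sigma):
        if p not in last_seen or len(set(sigma[last_seen[p] + 1 : i])) >= k:
            faults += 1                  # age is ∞ (first request) or ≥ k
        last_seen[p] = i
    return faults

# agrees with direct simulation, e.g. lru_faults_by_age([1,2,3,1,2,3], 2) == 6
```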

So LRU matches the lower bound for both demand-paging and competitive paging algorithms. We now show that FWF is also smooth, with a factor that is almost twice that of LRU. The smoothness of FWF follows from the fact that it always misses k times per phase, and the number of phases can only change marginally when perturbing a sequence, as we show in Lemma 2.

    For a sequence σ, let Φ(σ) denote the number of phases in its k-phase partition.

Lemma 2. Let σ and σ′ be two sequences such that ∆(σ, σ′) = 1. Then Φ(σ′) ≤ Φ(σ) + 2. Furthermore, let ℓ and ℓ′ be the number of distinct pages in the last phase of σ and σ′, respectively. If Φ(σ′) = Φ(σ) + 2, then ℓ′ ≤ ℓ.

Theorem 7 (Smoothness of Flush-when-full). FWF is (1, 2δk)-smooth. This is tight.

Proof. Let σ and σ′ be two sequences such that ∆(σ, σ′) = 1. Let Φ(σ) (resp. Φ(σ′)) be the number of phases in the k-phase partition of σ (resp. σ′), and let ℓ (resp. ℓ′) be the number of distinct pages in the last phase of the partition of σ (resp. σ′). FWF misses exactly k times in any phase of a sequence, except possibly for the last one, in which it misses a number of times equal to the number of distinct pages in the phase. Then, FWF(σ) = k · (Φ(σ) − 1) + ℓ, and FWF(σ′) = k · (Φ(σ′) − 1) + ℓ′. By Lemma 2, if Φ(σ′) = Φ(σ) + 2, then ℓ′ ≤ ℓ, and thus FWF(σ′) ≤ k(Φ(σ) + 2 − 1) + ℓ = FWF(σ) + 2k. Otherwise, if Φ(σ′) ≤ Φ(σ) + 1, then FWF(σ′) ≤ k(Φ(σ) + 1 − 1) + ℓ′ = FWF(σ) − ℓ + k + ℓ′. Since ℓ ≥ 0 and ℓ′ ≤ k, FWF(σ′) ≤ FWF(σ) + 2k. The upper bound in the theorem follows by Corollary 1. To see that this upper bound is tight, let σ = (x1, . . . , xk)^{2δ+1}, where xi ≠ xj for all i ≠ j. Let σ′ = x1, . . . , xk (xk+1, x1, . . . , xk, x1, . . . , xk)^δ, where xk+1 ≠ xi for all i ≤ k. Thus, ∆(σ, σ′) = δ. Clearly FWF(σ) = k, while FWF(σ′) = k + 2δk, and hence FWF(σ′) = FWF(σ) + 2δk. ⊓⊔

We now show that FIFO is not smooth. In fact, we show that with only a single difference in the sequences, the number of misses of FIFO can be k times higher than the number of misses in the original sequence. On the other hand, since FIFO is strongly competitive, the multiplicative factor k is also an upper bound for FIFO's smoothness.

Theorem 8 (Smoothness of First-in first-out). FIFO is (k, 2δk)-smooth. FIFO is not (k − ε, γ, 1)-smooth for any ε > 0 and γ.

Proof. The upper bound follows from the competitiveness of FIFO and Theorem 4.

For the lower bound, we show how to construct two sequences σk and σ′k for each cache size k, with ∆(σ′k, σk) = 1, that yield configurations c = [1, . . . , k] and c′ = [k, . . . , 1], where pages are sorted from last-in to first-in from left to right. Then, the sequence 0, 1, 2, . . . , k − 1 yields k misses starting from configuration c′ and only one miss starting from c. The resulting configurations are [0, . . . , k − 1] and [k − 1, . . . , 0], which are equal to c and c′ up to renaming. So we can construct an arbitrarily long sequence that yields k times as many misses starting from configuration c′ as it does from configuration c.

For k = 2, σ2 = 2, 1 and σ′2 = 1, 2, 1 = 1 ◦ σ2 have edit distance ∆(σ′2, σ2) = 1 and yield configurations [1, 2] and [2, 1], respectively. For k = 3, σ3 = 2, 3, 1, 4, 2, 1, 5, 1, 4 and σ′3 = 1 ◦ σ3 yield configurations [4, 1, 5] and [5, 1, 4], respectively, which are equal up to renaming to [1, 2, 3] and [3, 2, 1].
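These small cases can be checked mechanically. The sketch below (our code; fifo_state is a hypothetical helper returning the final configuration from last-in to first-in) reproduces the k = 3 example:

```python
def fifo_state(sigma, k):
    """Final FIFO configuration, listed from last-in to first-in."""
    cache, order = set(), []        # order: oldest arrival first
    for p in sigma:
        if p not in cache:
            if len(cache) >= k:
                cache.discard(order.pop(0))
            cache.add(p)
            order.append(p)
    return order[::-1]

sigma3 = [2, 3, 1, 4, 2, 1, 5, 1, 4]
assert fifo_state(sigma3, 3) == [4, 1, 5]
assert fifo_state([1] + sigma3, 3) == [5, 1, 4]
```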


For k > 3, we present a recursive construction of σk and σ′k based on σk−1 and σ′k−1. Notice that σ′2 = 1 ◦ σ2 and σ′3 = 1 ◦ σ3. We will maintain that σ′k = 1 ◦ σk in the recursive construction. As σk−1 and σ′k−1 are constructed for a cache of size k − 1, they will behave differently on a larger cache of size k. However, we can pad σk−1 and σ′k−1 with requests to one additional page x that fills up the additional space in the cache. This can be achieved as follows: Add a request to x at the start of the two sequences (following the request to 1 in σ′k). Also, whenever x is evicted in either of the two sequences, in a cache of size k, add a request to x in both sequences. By construction, the additional requests do not increase the edit distance between the two sequences. Further, the additional requests ensure that every request that belongs to the original sequences faults in the new sequence on a cache of size k if and only if it faults in the original sequence on a cache of size k − 1. In this way, we obtain σk,pre and σ′k,pre. The two sequences yield configurations c = [1, . . . , i′, x, i′+1, . . . , k−1] and c′ = [k−1, . . . , j′+1, x, j′, . . . , 1], respectively, which, unless i′ = j′, are almost solutions to the original problem.

Observe that c = [1, . . . , i′, x, i′+1, . . . , k−1] and c′ = [k−1, . . . , j′+1, x, j′, . . . , 1] are equal up to renaming to d = [1, . . . , k] and d′ = [k, . . . , j+1, i, j, . . . , i+1, i−1, . . . , 1] for some i, j with 1 ≤ i < j ≤ k. We distinguish five cases depending on the values of i and j:

Case 1: 1 < i < j < k. Below we build a suffix σk,post that finishes the construction:

| Requests | State for prefix σ′k,pre | State for prefix σk,pre |
|---|---|---|
|  | [1, . . . , k] | [k, . . . , j+1, i, j, . . . , i+1, i−1, . . . , 1] |
| v | [v, 1, . . . , k−1] | [v, k, . . . , j+1, i, j, . . . , i+1, i−1, . . . , 2] |
| k, . . . , i+1 | [i+1, . . . , k, v, 1, . . . , i−1] | [v, k, . . . , j+1, i, j, . . . , i+1, i−1, . . . , 2] |
| 1, . . . , i−1 | [i+1, . . . , k, v, 1, . . . , i−1] | [i−1, . . . , 1, v, k, . . . , j+1, i, j, . . . , i+2] |
| w | [w, i+1, . . . , k, v, 1, . . . , i−2] | [w, i−1, . . . , 1, v, k, . . . , j+1, i, j, . . . , i+3] |
| i+1, . . . , k, v, 1, . . . , i−2 | [w, i+1, . . . , k, v, 1, . . . , i−2] | [i−2, . . . , 1, v, k, . . . , i+1, w] |

Case 2: 1 = i < j < k − 1. Consider the following suffix:

| Requests | State for prefix σ′k,pre | State for prefix σk,pre |
|---|---|---|
|  | [1, . . . , k] | [k, . . . , j+1, 1, j, . . . , 2] |
| y | [y, 1, . . . , k−1] | [y, k, . . . , j+1, 1, j, . . . , 3] |
| 2, . . . , j | [y, 1, . . . , k−1] | [j, . . . , 2, y, k, . . . , j+1] |
| 1 | [y, 1, . . . , k−1] | [1, j, . . . , 2, y, k, . . . , j+2] |
| j+1, . . . , k−1 | [y, 1, . . . , k−1] | [k−1, . . . , j+1, 1, j, . . . , 2, y] |

The final pair of states is equal up to renaming to the pair d = [1, . . . , k] and d′ = [k, . . . , j+2, 2, j+1, . . . , 3, 1], and so it fulfills the conditions under which the suffix σk,post constructed in Case 1 finishes the construction.

Case 3: 1 = i < j = k − 1. Consider the following suffix:

| Requests | State for prefix σ′k,pre | State for prefix σk,pre |
|---|---|---|
|  | [1, . . . , k] | [k, 1, k−1, . . . , 2] |
| x | [x, 1, . . . , k−1] | [x, k, 1, k−1, . . . , 3] |
| 2, . . . , k−1 | [x, 1, . . . , k−1] | [k−1, . . . , 2, x, k] |
| y, z | [z, y, x, 1, . . . , k−3] | [z, y, k−1, . . . , 2] |
| x, 1, . . . , k−3 | [z, y, x, 1, . . . , k−3] | [k−3, . . . , 1, x, z, y] |

The final pair of states is equal up to renaming to the pair d = [1, . . . , k] and d′ = [k, . . . , 3, 1, 2], and so it fulfills the conditions under which Case 2 continues the construction.

Case 4: 1 < i < j = k. Exchanging σk,pre and σ′k,pre yields states that fulfill the conditions of either Case 2 or Case 3.

Case 5: 1 = i < j = k. Consider the following suffix:

| Requests | State for prefix σ′k,pre | State for prefix σk,pre |
|---|---|---|
|  | [1, . . . , k] | [1, k, . . . , 2] |
| x | [x, 1, . . . , k−1] | [x, 1, k, . . . , 3] |
| 2, . . . , k−1 | [x, 1, . . . , k−1] | [k−1, . . . , 2, x, 1] |


The resulting pair of states is equal up to renaming to [1, . . . , k] and [k, . . . , 3, 1, 2], which corresponds to Case 2. ⊓⊔

FIFO matches the upper bound for strongly-competitive deterministic paging algorithms. With the result for LRU, this demonstrates that the upper and lower bounds for the smoothness of strongly-competitive algorithms are tight.

    5 Smoothness of Randomized Paging Algorithms

    5.1 Bounds on the Smoothness of Randomized Paging Algorithms

Similarly to deterministic algorithms, we can show a lower bound on the smoothness of any randomized demand-paging algorithm. Notice that the lower bound only applies to δ = 1, and so additional disturbances might have a smaller effect than the first one.

Theorem 9 (Lower bound for randomized, demand-paging algorithms). No randomized, demand-paging algorithm is (1, Hk + 1/k − ε, 1)-smooth for any ε > 0.

Proof. For a given randomized, demand-paging algorithm A, we show how an oblivious adversary can construct two sequences, a "bad" sequence σ′A and a "good" sequence σA, with edit distance 1, such that A(σ′A) is at least k + Hk + 1/k and A(σA) is exactly k. The existence of such sequences immediately implies the theorem. The construction is inspired by the nemesis sequence devised by Fiat et al. [19] in their proof of a lower bound for the competitiveness of randomized algorithms.

The sequence σ′A consists of requests to n = k + 1 distinct pages. During the construction of the sequence, the adversary maintains for each of the n pages its probability pi of not being in the cache. This is possible because the adversary knows the probability distribution used by A. We have ∑i pi ≥ 1, as only k of the n = k + 1 pages can be in the fast memory.

The "bad" sequence σ′A begins with n requests, a single request to each of the n pages in an arbitrary order. Initially, the fast memory is empty, and so these requests will result in k + 1 faults. After those requests, as pn = 0, there will be at least one page i with pi ≥ 1/k. The next request in σ′A is to such a page. We will later refer to this page as m. The remainder of σ′A is composed of k − 1 subphases, the ith of which will contribute an expected 1/(k − i + 1) page faults. By linearity of expectation, we can sum up the expected faults on the entire sequence and obtain A(σ′A) ≥ k + 1 + 1/k + ∑_{i=1}^{k−1} 1/(k − i + 1) = k + 1 + 1/k + Hk − 1 = k + Hk + 1/k. It remains to show how to construct the remaining k − 1 subphases and the "good" sequence σA.

Each of the k − 1 subphases consists of zero or more requests to marked pages followed by exactly one request to an unmarked page. A page is marked at the start of subphase i if it is page m or if it has been requested in at least one of the preceding subphases 1 ≤ j < i. Let M be the set of marked pages at the start of the jth subphase. Then the number of marked pages is |M| = j and the number of unmarked pages is u = k + 1 − j. Let pM = ∑_{i∈M} pi. If pM = 0, then there must be an unmarked page n with pn ≥ 1/u, and the adversary can pick this page to end the subphase. Otherwise, if pM > 0, there must be a marked page l with pl > 0. The first request of subphase j is to page l. Let ε = pl. The adversary can now generate requests to marked pages using the following loop:

While the expected number of faults in subphase j is less than 1/u, and while pM > ε, request the page l such that l = arg max_{i∈M} pi.

Note that the loop must terminate, as each iteration will contribute pl ≥ pM/|M| > ε/|M| expected faults. If the loop terminates due to the first condition, the adversary can request an arbitrary unmarked page to end the subphase. Otherwise, the adversary requests the unmarked page i with the highest probability value. Clearly, pi ≥ (1 − pM)/u > (1 − ε)/u. The total expected number of faults of the subphase is then ε + pi > 1/u. This concludes the construction of σ′A.

Notice that there is one unmarked page that has only been requested in the initial n requests of σ′A. We obtain the "good" sequence σA by deleting the request to this unmarked page from σ′A. By construction, σA contains requests to only k distinct pages. As A is by assumption demand paging, σA will thus incur k page faults only. ⊓⊔


For strongly-competitive randomized algorithms we can show a similar statement using a similar yet more complex construction:

Theorem 10 (Lower bound for strongly-competitive randomized paging algorithms). No strongly-competitive, randomized paging algorithm is (1, δ(Hk − ε))-smooth for any ε > 0.

In contrast to the deterministic case, this lower bound only applies to strongly-competitive algorithms, as opposed to simply competitive ones. So with randomization there might be a trade-off between competitiveness and smoothness. There might be competitive algorithms that are smoother than all strongly-competitive ones.

    5.2 Smoothness of Particular Randomized Algorithms

Two known strongly-competitive randomized paging algorithms are Partition, introduced by McGeoch and Sleator [5], and Equitable, introduced by Achlioptas, Chrobak, and Noga [6]. We show that neither of the two algorithms is smooth.

Theorem 11 (Smoothness of Partition and Equitable). For any cache size k ≥ 2, there is an ε > 0 such that neither Partition nor Equitable is (1 + ε, γ, 1)-smooth for any γ. Also, Partition and Equitable are (Hk, 2δHk)-smooth.

The lower bound in the theorem above is not tight, but it shows that neither of the two algorithms matches the lower bound from Theorem 10. This leaves open the question whether the lower bound from Theorem 10 is tight.

Note that the lower bound for Equitable applies equally to OnlineMin [20], as OnlineMin has the same expected number of faults as Equitable on all request sequences.

Mark [19] is a simpler randomized algorithm that is (2Hk − 1)-competitive. We show that it is not smooth either.

Theorem 12 (Smoothness of Mark). Let α = max… . Mark is not (α − ε, γ, 1)-smooth for any ε > 0 and any γ. Also, Mark is (2Hk − 1, δ(4Hk − 2))-smooth.

We conjecture that the lower bound for Mark is tight, i.e., that Mark is (α, β)-smooth for α as defined in Theorem 12 and some β.

We now prove that Random achieves the same bounds for smoothness as LRU, and the best possible for any deterministic, demand-paging or competitive algorithm. Intuitively, the additive term k + 1 in the smoothness of Random is explained by the fact that a single difference between two sequences can make the caches of both executions differ by one page p. Since Random evicts a page with probability 1/k, the expected number of faults until p is evicted is k.

Theorem 13 (Smoothness of Random). Random is (1, δ(k + 1))-smooth. This is tight.

Proof. For this theorem we use a non-demand-paging definition of Random that, upon a fault, evicts a page with probability 1/k, even if the cache is not yet full. This modification with respect to the demand-paging version does not change the competitiveness of the algorithm, and it allows us to avoid the analysis of special cases when proving properties about smoothness. In fact, for the non-demand-paging version of the algorithm, given any pair of sequences σ, σ′, it is possible to construct two sequences ρ and ρ′ with ∆(ρ, ρ′) = ∆(σ, σ′) such that the number of faults on σ and σ′ starting from an empty cache equals the number of faults on ρ and ρ′ starting with a full cache containing an arbitrary set of pages. This can be achieved by renaming in σ and σ′ any occurrences of the pages in the initial cache so that these pages do not appear in the rest of the sequences. This implies that any property derived on the smoothness of the algorithm starting with an empty cache can also be achieved when the cache is assumed to be initially full. Note that the same property holds for LRU-Random, which is introduced in Section 5.4.


For the lower bound, we use a construction similar to the one used for the lower bound on Random's competitiveness in [21]. Consider the sequences σ = σ1...k · σ1...k and σ′ = σ1...k · xk+1 · σ1...k with σ1...k = (x1, x2, . . . , xk)^n. The sequences are identical but for the insertion of xk+1 in σ′, and thus ∆(σ, σ′) = 1. For any ε > 0, the number of faults in the second half of σ is less than ε for a sufficiently large n. On σ′, Random faults on xk+1 and evicts one of x1, . . . , xk. Then, on each of the n subsequences of k requests in the second part of σ′, and while xk+1 is still in its cache, Random will incur a fault. If in one of these faults Random evicts xk+1, then it does not incur any faults for the rest of the sequence. Since on every fault Random evicts xk+1 with probability 1/k, the expected number of faults until this happens exceeds k − ε for any ε > 0 for sufficiently large n. This, plus the initial request to xk+1, yields Random(σ′) ≥ Random(σ) + k + 1 − 2ε for any ε and sufficiently large n. Now, for general δ we follow the same idea: instead of one, we have δ subsequences (x1 . . . xk)^n in σ and δ subsequences yi(x1 . . . xk)^n in σ′, where yi (1 ≤ i ≤ δ) is a new page not requested so far, and it is distinct in every repetition. The number of expected faults in each repetition in σ′ is at least k + 1 − ε for any ε and sufficiently large n, while Random does not incur extra faults on σ. Thus, Random(σ′) ≥ Random(σ) + δ(k + 1) − ε(δ + 1).
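This construction is easy to observe empirically. The sketch below (our own code, reusing random_policy_faults from the Section 2.2 sketch, which we assume to be in scope) estimates the gap for δ = 1; it approaches k + 1 for large n:

```python
import random
from statistics import mean

k, n, trials = 4, 1000, 200
block = list(range(1, k + 1)) * n              # (x1 ... xk)^n
sigma   = block + block                        # σ  = σ_{1..k} · σ_{1..k}
sigma_p = block + [k + 1] + block              # σ′ inserts x_{k+1} in between
rng = random.Random(1)
gap = mean(random_policy_faults(sigma_p, k, rng)
           - random_policy_faults(sigma, k, rng) for _ in range(trials))
print(gap)                                     # ≈ k + 1 = 5
```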

In order to prove the upper bound, we look at the state distributions of Random when serving two sequences σ and σ′ with ∆(σ, σ′) = 1. We use a potential function defined as the distance between two state distributions. For this distance, we define a version of the earth mover's distance. Let D and D′ be two probability distributions over cache states. We define the distance between D and D′ as the minimum cost of transforming D into D′ by means of transferring probability mass from the states of D to the states of D′.

Let s and s′ be two cache states in D and D′ with probabilities ps and ps′, respectively. Let α be a function that denotes the amount of probability mass to be transferred from states in D to states in D′. The earth mover's distance between D and D′ is defined as

∆(D, D′) := min_α ∑_{s,s′} α(s, s′) · d(s, s′),

where for all s, ∑_{s′} α(s, s′) = ps; for all s′, ∑_s α(s, s′) = ps′; and d(s, s′) is the distance between states s and s′. We define d(s, s′) = k · H_{c(s,s′)}, where c(s, s′) = max{|s \ s′|, |s′ \ s|}, and Hℓ is the ℓth harmonic number. Note that |s \ s′| might not equal |s′ \ s| if either state does not represent a full cache. For convenience we let H0 = 0. We now prove the following claim:

Claim. Let D and D′ be two probability distributions over cache states. Let σ be any request sequence and let MD(σ) and MD′(σ) be two random variables equal to the number of misses on σ by Random when starting from distributions D and D′, respectively. Then, E[MD′(σ)] − E[MD(σ)] ≤ ∆(D, D′).

Let us assume that the claim is true. Then, we prove the theorem by considering two sequences ρ and ρ′ such that δ = ∆(ρ, ρ′) = 1 and arguing that ∆(D, D′) ≤ k for any pair of distributions D and D′ that can be reached, respectively, by serving prefixes of ρ and ρ′ starting from an empty cache. If this prefix includes the single difference between both sequences, then the theorem follows by applying the claim above to the maximal suffix σ shared by both sequences.

Let j be the minimum index such that ρ[(j + 1)..|ρ|] = ρ′[(j + 1)..|ρ′|] = σ. Then, ρj ≠ ρ′j (one of the two might be empty) and ρ[1..(j − 1)] = ρ′[1..(j − 1)]. Since Random(ρ) and Random(ρ′) both start with an empty cache, their distributions and expected misses before serving ρj and ρ′j coincide. We now argue that after serving ρj and ρ′j the distance between the resulting distributions D and D′ is at most k.

Let F be the state distribution of both executions before serving ρj and ρ′j. Suppose first that ρ′j is empty

and thus D′ = F (the case when ρj is empty is symmetric). We look at the minimum cost to transfer the probability mass from each state of F to D. Let si be a state in F with probability pi. If ρj ∈ si, then si has probability at least pi in D, and hence we can transfer pi mass between these states in F and D at cost zero. Otherwise, if ρj ∉ si, D contains k states s′i1, . . . , s′ik resulting from the eviction of each of the k pages of si, with c(si, s′ir) = 1 and hence d(si, s′ir) = kH1 = k for all 1 ≤ r ≤ k. Moreover, the probability of each of these states is at least pi/k, and hence we can transfer all the mass of si to these states at a total cost of ∑_{r=1}^{k} k(pi/k) = kpi. Adding up over all states si ∈ F, we can transfer all probability mass of F to D at a cost of at most k∑pi = k, since ∑pi = 1. Since the distance between D and F is the minimum cost of


transferring the probability mass from F to D, this cost is at most k. For the case when ρj ≠ ρ′j and neither is empty, we apply a similar argument. Let si be a state in F with probability pi. If both ρj and ρ′j are in si, then this state is also in D and D′, and we can transfer pi from D to D′ at cost zero. Assume that ρj ∈ si but ρ′j ∉ si. Then, as we argued above, in D′ there are k states with probability at least pi/k with distance 1 to si. Since si ∈ D, we can transfer a mass of pi to these states at a cost of kpi. Now, if ρj ∉ si but ρ′j ∈ si, si is in D′ and there are k states in D with distance 1 to si. We can transfer pi/k mass from each of these states in D to si in D′ at a cost of kpi. Finally, if ρj ∉ si and ρ′j ∉ si, then there are k pairs of states (s, s′) with s ∈ D and s′ ∈ D′ resulting from the replacement of the same page in si by ρj and ρ′j, respectively, and thus c(s, s′) = 1. In the distance ∆(D, D′) we can transfer pi/k from s to s′ at a cost of k. Since there are k such pairs for each si, the total cost contributed by these pairs is pik. Since in all cases the cost contributed by a state si ∈ F when transferring mass from D to D′ is at most kpi, the distance ∆(D, D′) is at most k∑pi = k.

Since serving ρj and ρ′j can add at most 1 to the difference in expected misses, and by the claim above the difference in expected misses on the suffix σ is at most ∆(D, D′) ≤ k, it follows that E[Random(ρ′) − Random(ρ)] ≤ k + 1. The theorem follows by Corollary 1.

We now prove the claim. Let MD(σi) denote the number of misses of Random when σi is requested and the state distribution of Random is D. Let D0 = D and D′0 = D′. Then, it is sufficient to prove that for every request σi ∈ σ, for 1 ≤ i ≤ |σ|,

E[M_{Di−1}(σi)] − E[M_{D′i−1}(σi)] ≤ ∆(Di−1, D′i−1) − ∆(Di, D′i). (1)

This implies that E[MD(σ)] − E[MD′(σ)] ≤ ∆(D0, D′0) − ∆(Df, D′f) ≤ ∆(D0, D′0) = ∆(D, D′), since ∆(·, ·) ≥ 0 for any pair of distributions.

Let Di−1 and D′i−1 be the distributions before the request to σi. Then

∆(Di−1, D′i−1) = ∑_{su,sv} α(su, sv) · d(su, sv),

where α(su, sv) is the amount of mass transferred from su to sv (which could be zero). We look at two states su ∈ D with probability pu and sv ∈ D′ with probability pv and construct a valid assignment α′ after the request to σi for the distance ∆(Di, D′i).

We separate the analysis into the following cases:

    1. [σi ∈ su, sv] In this case su ∈ Di with probability at least pu and sv ∈ D′i with probability at least pv.Hence, since α(su, sv) ≤ pu, pv we can make α′(su, sv) = α(su, sv). The contribution of this pair of statesto ∆(Di, D

    ′i) is α(su, sv)d(su, sv) = α(su, sv)kHc(su,sv).

    2. [σi /∈ su, sv] There are k states r = {r1, . . . , rk} in Di and t = {t1, . . . , tk} in D′i resulting from theeviction of each page of su and sv, respectively. The probability of each state of r and t is at least pu/kand pv/k, respectively. Let c = c(su, sv) and α = α(su, sv). If c = 0, then we pair states in r and t suchthat rj1 = tj2 and we make α

    ′(rj1 , tj2) = α/k. Otherwise, there are c pages that su and sv do not have incommon. We sort the states in r and s such that the first c states are those that result from evicting apage from su that is not in sv and vice versa, while the rest of the states are the ones resulting fromevicting a common page. We pair the states in order and set α′(rj , tj) = α/k. Note that c(rj , tj) = c− 1for all j ≤ c and c(rj , tj) = c for all j > c. The contribution of this pair of states to ∆(Di, D′i) is at most

    (α/k)(ckHc−1 + (k − c)kHc) = α(kHc + c(Hc−1 −Hc)) = α(kHc − 1)3. [σi ∈ su, σi /∈ sv] We transfer α(su, sv)/k to the k states in D′i resulting from evictions from sv. As in

    case 2. there are c = c(su, sv) states that result from evicting a non-common page with su and the restevict a common page. Each of the first c states has c− 1 non-common pages with su, while the rest havec non-common pages. Hence, the contribution of these states to ∆(Di, D

    ′i) is α(su, sv)(kHc(su,sv) − 1).

4. [σi ∉ su, σi ∈ sv] This case is analogous to case 3. We transfer α(su, sv)/k mass to sv ∈ D′ from each of the k states in D that result from evictions from su. The contribution of these states is α(su, sv)(kH_{c(su,sv)} − 1).


Since in the cases above we account for all the probability mass of all possible states in Di and D′_i, the described mass transfer is a valid assignment between the two distributions, and its cost bounds the distance:

∆(Di, D′_i) ≤ ∑_{su,sv : σi∈su, σi∈sv} α(su, sv)·kH_{c(su,sv)} + ∑_{su,sv : σi∈su, σi∉sv} α(su, sv)(kH_{c(su,sv)} − 1)
            + ∑_{su,sv : σi∉su, σi∈sv} α(su, sv)(kH_{c(su,sv)} − 1) + ∑_{su,sv : σi∉su, σi∉sv} α(su, sv)(kH_{c(su,sv)} − 1)
          = ∆(D_{i−1}, D′_{i−1}) − ∑_{su,sv : σi∉su ∨ σi∉sv} α(su, sv)
          ≤ ∆(D_{i−1}, D′_{i−1}) − ∑_{su,sv : σi∉su} α(su, sv)

Therefore, ∆(D_{i−1}, D′_{i−1}) − ∆(Di, D′_i) ≥ ∑_{su,sv : σi∉su} α(su, sv). On the other hand, E[M_{D_{i−1}}(σi)] − E[M_{D′_{i−1}}(σi)] ≤ E[M_{D_{i−1}}(σi)] = ∑_{su,sv : σi∉su} α(su, sv), and hence E[M_{D_{i−1}}(σi)] − E[M_{D′_{i−1}}(σi)] ≤ ∆(D_{i−1}, D′_{i−1}) − ∆(Di, D′_i). ⊓⊔
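For intuition, the transfer distance ∆(D, D′) used in this proof is a minimum-cost mass transfer (an earth mover's distance) between two finite state distributions, here with ground cost d(su, sv) = kH_{c(su,sv)}. The following sketch, in Python with SciPy and with illustrative names of our choosing, computes such a distance as a small linear program over the transfer amounts α(su, sv):

    import numpy as np
    from scipy.optimize import linprog

    def transfer_distance(p, q, d):
        # p: probabilities of m states in D; q: probabilities of n states in D';
        # d: m-by-n matrix of ground costs, e.g. d[u][v] = k * H(c(s_u, s_v)).
        # Minimize the total cost sum of alpha[u,v] * d[u][v] subject to row
        # sums p, column sums q, and alpha >= 0 (the default bounds of linprog).
        m, n = len(p), len(q)
        c = np.asarray(d, dtype=float).ravel()
        A_eq, b_eq = [], []
        for u in range(m):                 # mass leaving state u equals p[u]
            row = np.zeros(m * n)
            row[u * n:(u + 1) * n] = 1.0
            A_eq.append(row); b_eq.append(p[u])
        for v in range(n):                 # mass arriving at state v equals q[v]
            col = np.zeros(m * n)
            col[v::n] = 1.0
            A_eq.append(col); b_eq.append(q[v])
        return linprog(c, A_eq=np.vstack(A_eq), b_eq=b_eq).fun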

    5.3 Trading Competitiveness for Smoothness

We have seen that none of the well-known randomized algorithms are particularly smooth. Random is the only known randomized algorithm that is (1, δc)-smooth for some c. However, it is neither smoother nor more competitive than LRU, the smoothest deterministic algorithm. In this section we show that greater smoothness can be achieved at the expense of competitiveness. First, as an extreme example of this, we show that Evict-on-access (EOA) [17], the policy that evicts each page with probability 1/k upon every request (not only on faults but also on hits), beats the lower bounds of Theorems 9 and 10 and is strictly smoother than OPT. This policy is non-demand paging and it is obviously not competitive. We then introduce Smoothed-LRU, a parameterized randomized algorithm that trades competitiveness for smoothness.

Theorem 14 (Smoothness of EOA). EOA is (1, δ(1 + k/(2k − 1)))-smooth. This is tight.
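As a simulation sketch, one step of EOA could look as follows in Python; this encodes one reading of the policy (one uniformly chosen victim per request, so each of the k cached pages is evicted with probability 1/k, even on hits), and all names are illustrative:

    import random

    def eoa_step(cache, page, k):
        # cache: a set of at most k pages; returns True on a fault
        miss = page not in cache
        if len(cache) == k:
            # evict on *every* request, not only on faults (non-demand paging);
            # each cached page is the victim with probability 1/k
            cache.remove(random.choice(list(cache)))
        cache.add(page)  # (re)insert the requested page
        return miss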

Smoothed-LRU We now describe Smoothed-LRU. The main idea of this algorithm is to smooth out the transition from the hit to the miss case.

Recall the notion of age that is convenient in the analysis of LRU: the age of page p is the number of distinct pages that have been requested since the previous request to p. LRU faults if and only if the requested page's age is greater than or equal to k, the size of the cache. An additional request may increase the ages of k cached pages by one. At the next request to each of these pages, the page's age may thus increase from k − 1 to k, and turn the request from a hit into a miss, resulting in k additional misses.
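To make the notion concrete, ages can be computed directly from the request sequence; a small Python helper (illustrative, not from the paper):

    def ages(sigma):
        # age of the requested page at each position: the number of distinct
        # pages requested since its previous request (None on first requests)
        last, out = {}, []
        for t, p in enumerate(sigma):
            out.append(len(set(sigma[last[p] + 1:t])) if p in last else None)
            last[p] = t
        return out

    # with k = 3, LRU faults exactly where the age is None or >= 3:
    print(ages([1, 2, 3, 4, 1]))  # [None, None, None, None, 3]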

By construction, under Smoothed-LRU, the hit probability of a request decreases only gradually with increasing age. The speed of the transition from definite hit to definite miss is controlled by a parameter i, with 0 ≤ i < k. Under Smoothed-LRU, the hit probability P(hit_{Smoothed-LRUk,i}(a)) of a request to a page with age a is:

P(hit_{Smoothed-LRUk,i}(a)) = 1 if a < k − i,  (k + i − a)/(2i + 1) if k − i ≤ a < k + i,  0 if a ≥ k + i,   (2)

where k is the size of the cache. Figure 1 illustrates this graphically in relation to LRU for cache size k = 8 and i = 4.


[Figure 1: hit probability (0 to 1) as a function of the age of the requested page (0 to 15), for LRU and for Smoothed-LRU.]

Fig. 1. Hit probabilities of LRU and Smoothed-LRU in terms of the age of the requested page
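Since the hit probabilities in (2) depend only on the requested page's age, they determine the expected number of misses on any sequence by linearity of expectation (as the paper notes for Step-LRU below). A minimal Python sketch with names of our choosing:

    def expected_misses_smoothed_lru(sigma, k, i):
        def hit_prob(a):                  # equation (2)
            if a < k - i:
                return 1.0
            return (k + i - a) / (2 * i + 1) if a < k + i else 0.0
        last, misses = {}, 0.0
        for t, p in enumerate(sigma):
            if p in last:
                age = len(set(sigma[last[p] + 1:t]))
                misses += 1.0 - hit_prob(age)
            else:
                misses += 1.0             # compulsory miss on a first request
            last[p] = t
        return misses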

Theorem 15 (Smoothness of Smoothed-LRU). Smoothed-LRUk,i is (1, δ((k + i)/(2i + 1) + 1))-smooth. This is tight.

Proof. The proof of the upper bound is similar to that for LRU. The key difference is that, in contrast to LRU, an age increase may only increase the miss probability of a page by 1/(2i + 1). We show that Smoothed-LRUk,i is (1, (k + i)/(2i + 1) + 1, 1)-smooth. Corollary 1 then implies that Smoothed-LRUk,i is (1, δ((k + i)/(2i + 1) + 1))-smooth.

Let us first consider how the insertion of one request may affect ages and the expected number of faults. By definition, the age of any page is only affected from the point of insertion up to its next request. Only the hit probability of the next request to a page may thus change due to an additional request.

Under Smoothed-LRUk,i, at most k + i pages have a non-zero hit probability at any time. Only subsequent requests to these pages may increase the expected number of misses. By construction, increasing the age of a request by one may only decrease the hit probability by 1/(2i + 1). As the inserted request itself may also introduce a fault, the overall number of faults may thus increase by at most (k + i)/(2i + 1) + 1.

Substitutions are similar to insertions: they may increase the ages of at most k + i pages, and the substituted request itself may introduce one additional fault. The deletion of a request to page p does not increase the ages of other pages. Only the next request to p may turn from a hit into a miss.

For tightness, consider the two sequences σ = 1, 2, . . . , k + i, 1, 2, . . . , k + i, y and σ′ = 1, 2, . . . , k + i, x, 1, 2, . . . , k + i, y, with ∆(σ′, σ) = 1, where y > x > k + i. The difference between the expected number of faults on the two sequences is exactly (k + i)/(2i + 1) + 1. For δ > 1, consider the sequences σδ and σ′δ obtained by concatenating δ copies of σ and σ′, respectively. ⊓⊔

For i = 0, Smoothed-LRU is identical to LRU and (1, δ(k + 1))-smooth. At the other extreme, for i = k − 1, Smoothed-LRU is (1, 2δ)-smooth, like the optimal offline algorithm. However, for larger i, Smoothed-LRU is less competitive than LRU:

Lemma 3 (Competitiveness of Smoothed-LRU). For any sequence σ and l ≤ k − i,

Smoothed-LRUk,i(σ) ≤ ((k − i)/(k − i − l + 1)) · OPTl(σ) + l,

where OPTl(σ) denotes the number of faults of the optimal offline algorithm processing σ on a fast memory of size l. For l > k − i and any α and β there is a sequence σ such that Smoothed-LRUk,i(σ) > α · OPTl(σ) + β.

Proof. Clearly, Smoothed-LRUk,i(σ) ≤ LRUk−i(σ) for any sequence σ, as Smoothed-LRUk,i caches all pages younger than k − i with probability one. From Sleator and Tarjan [4], we know that LRUk−i(σ) ≤ ((k − i)/(k − i − l + 1)) · OPTl(σ) + l.

For the second part of the lemma, consider the sequence σn = (1, . . . , l)^n, which contains l distinct pages. The optimal offline algorithm misses exactly l times on this sequence, independently of n. For k − i < l, on the other hand, Smoothed-LRUk,i has a non-zero miss probability of at least 1/(2i + 1) on every request. For every α and β there is an n such that Smoothed-LRUk,i(σn) > α · OPTl(σn) + β. ⊓⊔

So far we have analyzed Smoothed-LRU based on the hit probabilities given in (2). We have yet to show that a randomized algorithm satisfying (2) can be realized. In the following, we construct a probability distribution on the set of all deterministic algorithms using a fast memory of size k that satisfies (2). This is commonly referred to as a mixed strategy.

First, we decompose an instance of Smoothed-LRU into i + 1 instances of a simpler algorithm called Step-LRU. Then we show how Step-LRU can be realized as a mixed strategy. Like Smoothed-LRU, Step-LRU is parameterized by i, and it exhibits the following hit probabilities in terms of the age of a requested page:

P(hit_{Step-LRUk,i}(a)) = 1 if a < k − i,  1/2 if k − i ≤ a < k + i,  0 if a ≥ k + i.   (3)

Lemma 4 (Decomposition of Smoothed-LRU in terms of Step-LRU). For all ages a,

P(hit_{Smoothed-LRUk,i}(a)) = (1/(2i + 1)) · (1 · P(hit_{Step-LRUk,0}(a)) + ∑_{j=1}^{i} 2 · P(hit_{Step-LRUk,j}(a))).

As a consequence, we can realize Smoothed-LRU as a mixed strategy if we can realize Step-LRU as a mixed strategy. While the hit probabilities P(hit_{Step-LRUk,i}(a)) do not fully define Step-LRU, by linearity of expectation they are sufficient to determine the expected number of faults on any sequence σ, which we denote by Step-LRUk,i(σ).
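The decomposition can be checked mechanically; a small Python sketch with illustrative names, encoding (2), (3), and the mixture of Lemma 4:

    def hit_step(a, k, j):                # equation (3)
        return 1.0 if a < k - j else (0.5 if a < k + j else 0.0)

    def hit_smoothed(a, k, i):            # equation (2)
        if a < k - i:
            return 1.0
        return (k + i - a) / (2 * i + 1) if a < k + i else 0.0

    def mixture(a, k, i):                 # right-hand side of Lemma 4
        s = hit_step(a, k, 0) + sum(2 * hit_step(a, k, j) for j in range(1, i + 1))
        return s / (2 * i + 1)

    # the two sides agree for all ages, e.g. for k = 8 and every 0 <= i < k:
    k = 8
    assert all(abs(mixture(a, k, i) - hit_smoothed(a, k, i)) < 1e-12
               for i in range(k) for a in range(2 * k + 2))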

Proposition 1 (Step-LRU as a mixed strategy). There is a probability distribution d : A → R over a finite set of deterministic paging algorithms A using a fast memory of size k, such that for all sequences σ,

Step-LRUk,i(σ) = ∑_{A∈A} d(A) · A(σ).

Corollary 2 (Smoothed-LRU as a mixed strategy). There is a probability distribution d : A → R over a finite set of deterministic paging algorithms A using a fast memory of size k, such that for all sequences σ,

Smoothed-LRUk,i(σ) = ∑_{A∈A} d(A) · A(σ).

Proof. This follows immediately from Lemma 4 and Proposition 1. ⊓⊔

    5.4 A Competitive and Smooth Randomized Paging Algorithm: LRU-Random

In this section we introduce and analyze LRU-Random, a competitive randomized algorithm that is smoother than any competitive deterministic algorithm. LRU-Random orders the pages in the fast memory by their recency of use, like LRU. Upon a miss, LRU-Random evicts older pages with a higher probability than younger pages. More precisely, the ith oldest page in the cache is evicted with probability 1/(i·Hk). By construction, the eviction probabilities sum up to 1: ∑_{i=1}^{k} 1/(i·Hk) = (1/Hk) · ∑_{i=1}^{k} 1/i = 1. LRU-Random is not demand paging: if the cache is not yet entirely filled, it may still evict cached pages according to the probabilities mentioned above.
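A sketch of the eviction step in Python (illustrative names; the cache is passed ordered from oldest to youngest):

    import random

    def harmonic(n):
        return sum(1.0 / j for j in range(1, n + 1))

    def lru_random_victim(cache_oldest_first, k):
        # the i-th oldest page (i = 1 is the least recently used) is evicted
        # with probability 1/(i * H_k); the weights sum to 1 as shown above
        Hk = harmonic(k)
        weights = [1.0 / (i * Hk) for i in range(1, k + 1)]
        return random.choices(cache_oldest_first, weights=weights)[0]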

    LRU-Random is at least as competitive as strongly-competitive deterministic algorithms:


Theorem 16 (Competitiveness of LRU-Random). For any sequence σ,

LRU-Random(σ) ≤ k · OPT(σ).

Proof. We actually prove a stronger statement, namely that LRU-Random is k-competitive against any adaptive online adversary [22]. Our proof is based on a potential argument.

Let SADV and SLRUR be the sets of pages contained in the adversary's and LRU-Random's fast memory, respectively. Further, let age(p) be the age of page p ∈ SLRUR, i.e., age(p) is 0 for the most-recently-used page and k − 1 for the least-recently-used one among those pages that are in SLRUR. Based on age(p), we define s(p) = k − age(p). In other words, s(p) is 1 for the oldest cached page, and k for the youngest, most-recently-used one. Using these notions we define the following potential function:

Φ = Hk · ∑_{p ∈ SLRUR\SADV} s(p)/H_{s(p)}.
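As a sketch, the potential can be evaluated directly from the two cache contents (Python, with names of our choosing):

    def harmonic(n):
        return sum(1.0 / j for j in range(1, n + 1))

    def phi(lrur_youngest_first, adv_pages, k):
        # lrur_youngest_first: LRU-Random's pages by recency, age 0 first;
        # a page p in S_LRUR \ S_ADV contributes s(p)/H_{s(p)}, s(p) = k - age(p)
        total = 0.0
        for age, p in enumerate(lrur_youngest_first):
            if p not in adv_pages:
                s = k - age
                total += s / harmonic(s)
        return harmonic(k) * total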

We will show that for any page x and any decision of the adversary to evict a page from its memory, we have

LRU-Random(x) + ∆Φ(x) ≤ k · ADV(x),   (4)

where LRU-Random(x) and ADV(x) denote the cost of the request, and ∆Φ(x) is the expected change in the potential function. Note that the potential function is initially zero, given that both caches are initially empty. Further, it is never negative. From this and (4) the k-competitiveness of LRU-Random against an adaptive online adversary follows. To prove (4), we distinguish four cases upon a request to page x:

1. LRU-Random hits and ADV hits. Then, LRU-Random(x) = ADV(x) = 0. The request may not decrease the ages of pages in SLRUR \ SADV and so the potential may not increase, as s/Hs is monotone in s.

2. LRU-Random hits and ADV misses. As LRU-Random(x) = 0 and ADV(x) = 1, we have to show that ∆Φ(x) ≤ k. The contribution of each page p ∈ SLRUR \ SADV to the potential drops or stays the same, as the ages of these pages may not decrease. The potential may only increase if ADV chooses to evict a page in SLRUR ∩ SADV. The maximal increase is achieved by evicting the youngest such page p. After the request, p's age is at least 1, as it was not the requested page. Therefore it contributes at most Hk · (k − 1)/H_{k−1} to the potential, which is

Hk · (k − 1)/H_{k−1} = (H_{k−1} + 1/k) · (k − 1)/H_{k−1} = k − 1 + (k − 1)/(k·H_{k−1}) < k.

3. LRU-Random misses and ADV hits. Then, we have to show that the potential drops by at least 1 in expectation. Again, the contribution of no page p ∈ SLRUR \ SADV may increase. Further, as ADV does not evict a page, no new page may contribute to the potential. We show that the contribution of each page p ∈ SLRUR \ SADV drops by at least 1 in expectation. There are three possible cases for a page p with s(p) = s:

(a) A younger page is replaced, and p's contribution to the potential does not change. This happens with probability ∑_{i=s+1}^{k} 1/(i·Hk) = 1 − Hs/Hk.

(b) Page p gets replaced. This happens with probability 1/(s·Hk) and it reduces the potential by Hk · s/Hs.

(c) An older page is replaced, and p's age increases by one. This happens with probability ∑_{i=1}^{s−1} 1/(i·Hk) = H_{s−1}/Hk and it reduces the potential by Hk · (s/Hs − (s − 1)/H_{s−1}) = Hk · (Hs − 1)/(Hs·H_{s−1}).

So the expected change in potential due to page p is

(1/(s·Hk)) · (−Hk · s/Hs) + (H_{s−1}/Hk) · Hk · (1 − Hs)/(Hs·H_{s−1}) = −1/Hs + (1 − Hs)/Hs = −1.


[Figure 2: competitiveness (horizontal axis: 1, Hk, 2Hk − 1, k, γ, ∞) against smoothness (vertical axis: (1, 2δ), (1, (1 + k/(2k − 1))δ), (1, δHk), (1, δ(k + 1)), (1, δβ), (O(Hk), γ2), (k, γ1)), locating OPT, EOA, Random, LRU, FWF, FIFO, Mark, Partition, Equitable, LRU-Random (k = 2), and Smoothed-LRU_{k+i,i}, together with bounds for the classes of deterministic demand-paging or competitive algorithms, deterministic strongly-competitive algorithms, randomized strongly-competitive algorithms, bounded-memory (α, β)-smooth deterministic demand-paging algorithms, and smooth algorithms.]

Fig. 2. Schematic view of the smoothness and competitiveness landscape. Crosses indicate tight results, whereas ellipses indicate upper bounds. Braces denote upper and lower bounds on the smoothness or competitiveness of classes of algorithms. For simplicity of exposition, γ1 and γ2 are left unspecified; γ can be chosen arbitrarily. More precise statements are provided in the respective theorems.

4. LRU-Random misses and ADV misses. If, before the request, SLRUR \ SADV ≠ ∅, then we can combine the arguments from cases 2 and 3 to show that the potential increases by at most k − 1. This does not cover the case where SLRUR = SADV. In this case, the potential is increased maximally if the adversary chooses to evict the most-recently-used page. If LRU-Random replaces a different page, the potential increases by Hk · (k − 1)/H_{k−1}. However, with probability 1/(k·Hk), LRU-Random also replaces the most-recently-used page (in which case the potential remains the same). The expected change in potential is thus bounded by

(1 − 1/(k·Hk)) · Hk · (k − 1)/H_{k−1} = ((k·Hk − 1)/(k·Hk)) · Hk · (k − 1)/H_{k−1} = (k·Hk − 1)(k − 1)/(H_{k−1}·k) ≤ k·H_{k−1}·(k − 1)/(H_{k−1}·k) = k − 1. ⊓⊔

The proof of Theorem 16 applies to an adaptive online adversary. An analysis for an oblivious adversary might yield a lower competitive ratio.

For k = 2, we also show that LRU-Random is (1, δc)-smooth, where c is less than k + 1, which is the best possible among deterministic, demand-paging or competitive algorithms. Specifically, c is 1 + 11/6 = 2.83̄. Although our proof technique does not scale beyond k = 2, we conjecture that this algorithm is in fact smoother than (1, δ(k + 1)) for all k.

Theorem 17 (Smoothness of LRU-Random). Let k = 2. LRU-Random is (1, (17/6)δ)-smooth.

Conjecture 1 (Smoothness of LRU-Random). LRU-Random is (1, Θ(H_k^2)δ)-smooth.

    6 Discussion

We have determined fundamental limits on the smoothness of deterministic and randomized paging algorithms. No deterministic competitive algorithm can be smoother than (1, δ(k + 1))-smooth. Under the restriction to bounded-memory algorithms, which is natural for hardware implementations of caches, smoothness implies competitiveness. LRU is strongly competitive, and it matches the lower bound for deterministic competitive algorithms, while FIFO matches the upper bound. There is no trade-off between smoothness and competitiveness for deterministic algorithms.

In contrast, among randomized algorithms, we have identified Smoothed-LRU, an algorithm that is very smooth, but not competitive. In particular, it is smoother than any strongly-competitive randomized algorithm may be. The well-known randomized algorithms Mark, Partition, and Equitable are not smooth. It is an open question whether there is a randomized "LRU sibling" that is both strongly competitive and (1, δHk)-smooth. With LRU-Random we introduce a randomized algorithm that is at least as competitive as any deterministic algorithm, yet provably smoother, at least for k = 2. Its exact smoothness remains open. Figure 2 schematically illustrates many of our results.

Acknowledgments. This work was partially supported by the German Research Council (DFG) as part of the Transregional Collaborative Research Center "Automatic Verification and Analysis of Complex Systems" (SFB/TR 14 AVACS).

    References

1. Belady, L.A.: A study of replacement algorithms for virtual-storage computer. IBM Systems Journal 5(2) (1966) 78–101
2. Mattson, R.L., Gecsei, J., Slutz, D.R., Traiger, I.L.: Evaluation techniques for storage hierarchies. IBM Systems Journal 9(2) (1970) 78–117
3. Aho, A., Denning, P., Ullman, J.: Principles of optimal page replacement. Journal of the ACM 18(1) (1971) 80–93
4. Sleator, D.D., Tarjan, R.E.: Amortized efficiency of list update and paging rules. Commun. ACM 28(2) (1985) 202–208
5. McGeoch, L., Sleator, D.: A strongly competitive randomized paging algorithm. Algorithmica 6 (1991) 816–825
6. Achlioptas, D., Chrobak, M., Noga, J.: Competitive analysis of randomized paging algorithms. Theoretical Computer Science 234(1–2) (2000) 203–218
7. Wilhelm, R., et al.: The worst-case execution-time problem—overview of methods and survey of tools. ACM Trans. Embed. Comput. Syst. 7(3) (2008) 36:1–36:53
8. Axer, P., et al.: Building timing predictable embedded systems. ACM Trans. Embed. Comput. Syst. 13(4) (March 2014) 82:1–82:37
9. Chaudhuri, S., Gulwani, S., Lublinerman, R.: Continuity and robustness of programs. Commun. ACM 55(8) (August 2012) 107–115
10. Dwork, C.: Differential privacy. In Bugliesi, M., Preneel, B., Sassone, V., Wegener, I., eds.: ICALP 2006, Part II. Volume 4052 of LNCS, Springer (2006) 1–12
11. Doychev, G., et al.: CacheAudit: A tool for the static analysis of cache side channels. ACM Trans. Inf. Syst. Secur. 18(1) (June 2015) 4:1–4:32
12. Doyen, L., Henzinger, T., Legay, A., Nickovic, D.: Robustness of sequential circuits. In: ACSD '10 (June 2010) 77–84
13. Kleene, S.: Representation of events in nerve nets and finite automata. In: Automata Studies. Princeton University Press, Princeton, NJ, USA (1956)
14. Perles, M., Rabin, M., Shamir, E.: The theory of definite automata. IEEE Transactions on Electronic Computers 12(3) (June 1963) 233–243
15. Liu, C.L.: Some memory aspects of finite automata. Technical Report 411, Massachusetts Institute of Technology (May 1963)
16. Reineke, J., Grund, D.: Sensitivity of cache replacement policies. ACM Trans. Embed. Comput. Syst. 12(1s) (March 2013) 42:1–42:18
17. Cazorla, F.J., et al.: PROARTIS: Probabilistically analyzable real-time systems. ACM Trans. Embed. Comput. Syst. 12(2s) (May 2013) 94:1–94:26
18. Borodin, A., El-Yaniv, R.: Online Computation and Competitive Analysis. Cambridge University Press, New York, NY, USA (1998)
19. Fiat, A., Karp, R.M., Luby, M., McGeoch, L.A., Sleator, D.D., Young, N.E.: Competitive paging algorithms. J. Algorithms 12(4) (1991) 685–699
20. Brodal, G.S., Moruz, G., Negoescu, A.: OnlineMin: A fast strongly competitive randomized paging algorithm. Theor. Comp. Sys. 56(1) (January 2015) 22–40
21. Raghavan, P., Snir, M.: Memory versus randomization in on-line algorithms (extended abstract). In Ausiello, G., Dezani-Ciancaglini, M., Rocca, S.R.D., eds.: ICALP 1989. Volume 372 of LNCS, Springer (1989) 687–703
22. Motwani, R., Raghavan, P.: Randomized Algorithms. Cambridge University Press, New York, NY, USA (1995)
23. Koutsoupias, E., Papadimitriou, C.: Beyond competitive analysis. SIAM Journal on Computing 30(1) (2000) 300–317

    Appendix

Proposition 2. For a sequence σ, let Φ(σ) denote the number of phases in its k-phase partition. Let σ be a sequence, let ρ be a suffix of σ, and let ℓ and ℓ′ denote the number of distinct pages in the last phase of σ and ρ, respectively. Then Φ(ρ) ≤ Φ(σ). Furthermore, if Φ(ρ) = Φ(σ) then ℓ′ ≤ ℓ.

Proof. Let i_j and i′_j denote the indices in σ of the first request of the jth phase in σ and ρ, respectively, with i_j = |σ| + 1 for j > Φ(σ) and i′_j = |ρ| + 1 for j > Φ(ρ). Then, for all j it holds that i_j ≤ i′_j. We prove this by induction on j. The case j = 1 is trivially true, as i_1 = 1 and i′_1 ≥ 1 since ρ is a suffix of σ. Suppose that the hypothesis holds for 1 < j ≤ n. It is easy to see that it holds for j = n + 1: since there are at most k distinct pages between i_j and i_{j+1} − 1, inclusive, and by the inductive hypothesis i′_j ≥ i_j, the request that ends the jth phase in ρ cannot be earlier than i_{j+1}, and hence i′_{j+1} ≥ i_{j+1}. Since this is true for all phases including the last one, Φ(ρ) ≤ Φ(σ). Now, assume that Φ(ρ) = Φ(σ) = m. Then i′_m < |ρ| + 1 and by the proof above i_m ≤ i′_m, which implies that ℓ′ ≤ ℓ. ⊓⊔

Lemma 2. Let σ and σ′ be two sequences such that ∆(σ, σ′) = 1. Then Φ(σ′) ≤ Φ(σ) + 2. Furthermore, let ℓ and ℓ′ be the number of distinct pages in the last phase of σ and σ′, respectively. If Φ(σ′) = Φ(σ) + 2, then ℓ′ ≤ ℓ.

Proof. Let i_j and i′_j be the indices of the requests that mark the first page of the jth phase in σ and σ′, respectively, with i_j = |σ| + 1 for j > Φ(σ) and i′_j = |σ′| + 1 for j > Φ(σ′). Let Φ(σ, j) denote the number of phases of σ starting from the jth phase (with Φ(σ, j) = 0 if j > Φ(σ)). Let h − 1 be the phase in σ′ where the difference between both sequences occurs. For simplicity, assume that if the difference is an insertion (deletion) at σ′_i, then i refers to an empty page in σ (σ′), i.e., unaffected requests have equal indices in both sequences. If the difference is a deletion, then i_h ≤ i′_h and by Proposition 2, Φ(σ′, h) ≤ Φ(σ, h), which implies the lemma. If it is a substitution, suppose that q in σ is changed to p in σ′. Then consider σ′′ resulting from the deletion of q from σ. By the argument above, Φ(σ′′) ≤ Φ(σ). Hence, showing that Φ(σ′) ≤ Φ(σ′′) + 2 implies as well that Φ(σ′) ≤ Φ(σ) + 2. Since σ′ is the result of inserting p into σ′′, it suffices to consider the insertion case (we argue later that if Φ(σ′) = Φ(σ′′) + 2, then ℓ′′ < ℓ′ also holds).

Let p be the page that is added to σ to make σ′. We analyze Φ(σ) in terms of Φ(σ′). We have the following cases:

– [p is not the first page of phase h − 1]. If p occurs again in the phase, then Φ(σ) = Φ(σ′). This is also the case if h − 1 is the last phase of σ′. Otherwise, i′_h < i_h ≤ i′_{h+1} (i′_h cannot be larger than i_{h+1}, as in this case phase h − 1 in σ′ would include the k + 1 distinct pages in σ[i_h..i_{h+1}]). Then by Proposition 2, Φ(σ′, h + 1) ≤ Φ(σ, h), and therefore Φ(σ′) ≤ Φ(σ) + 1.

– [p is the first page of phase h − 1]. Then i_{h−1} > i′_{h−1}. We have two cases:
  • If i_{h−1} ≤ i′_h then we have the same case as above but with i_{h−1} and i′_h. Thus, Φ(σ′) ≤ Φ(σ) + 1.
  • If i′_h < i_{h−1} ≤ i′_{h+1} (again, i_{h−1} cannot be greater than i′_{h+1}, as in this case the (h − 2)th phase of σ would include all k + 1 distinct pages in σ′[i′_h..i′_{h+1}]), then by Proposition 2, Φ(σ′, h + 1) ≤ Φ(σ, h − 1). If Φ(σ′, h + 1) = Φ(σ, h − 1), then ℓ′ ≤ ℓ and Φ(σ′) = Φ(σ) + 2. Otherwise, Φ(σ′, h + 1) < Φ(σ, h − 1) and Φ(σ′) ≤ Φ(σ) + 1.


In all cases above, either Φ(σ′) ≤ Φ(σ) + 1, or Φ(σ′) = Φ(σ) + 2 with ℓ′ ≤ ℓ. If the difference is a substitution, let q be the page in σ that is replaced by p in σ′. Note that the case Φ(σ′) = Φ(σ) + 2 can only happen if q is requested earlier in the same phase in σ. Then, removing the request to q would not change the k-phase partition of σ, and hence the same analysis as above for an insertion applies, and thus ℓ′ ≤ ℓ as well. ⊓⊔

Theorem 10 (Lower bound for strongly-competitive randomized paging algorithms). No strongly-competitive randomized paging algorithm is (1, δ(Hk − ε))-smooth for any ε > 0.

Proof. For any Hk-competitive algorithm A and any ε > 0, we can construct two sequences σ′_A and σ_A such that A(σ′_A) − A(σ_A) > ∆(σ′_A, σ_A) · (Hk − ε), which proves the theorem.

Fiat et al. [19] show how to construct a "bad" sequence consisting of an arbitrary number of k-phases, in each of which any randomized algorithm incurs at least Hk misses, while the optimal offline algorithm incurs only one miss. To follow this proof, it is helpful to be familiar with the proof of Theorem 4 in [19]. We adapt their construction in the following way: whenever it is possible to incur a cost of Hk − ε/2 in a k-phase by requesting only unmarked pages, we do so. We call such phases type I. Whenever this is not possible, we can construct a type II phase that results in more than Hk + δ misses, where δ > 0 depends on ε. Type II phases may only happen a finite number of times for each occurrence of a type I phase: otherwise, the sequence would be a counterexample to the Hk-competitiveness of A. Thus, for any d we can construct a sequence σ′_A(d) that includes exactly d type I phases.

From the resulting "bad" sequence σ′_A(d) we obtain the "good" sequence σ_A(d) by deleting one request from each of the type I phases. By construction, each page is only requested once in a type I phase. By deleting the one request to the page that is not requested in the following phase, we reduce the number of k-phases in the sequence by one. As a consequence, the resulting "good" sequence contains n − d phases, where n is the number of phases of σ′_A(d). Including compulsory misses, the optimal offline algorithm incurs (k + n − 1) − d misses. As A is strongly competitive, it incurs at most Hk · (k + n − 1 − d) misses on σ_A. On the other hand, by construction, the number of expected misses on σ′_A is at least k + (n − 1)·Hk − d·ε/2, and we get:

A(σ′_A(d)) − A(σ_A(d)) ≥ k · (1 − Hk) + d · (Hk − ε/2),

which is greater than d · (Hk − ε) for a large enough value of d. As, by construction, d = ∆(σ′_A, σ_A), this proves the theorem.

It remains to show how to construct type I and type II phases with the properties discussed above. Let us first discuss how to adapt the construction of the ith subphase. We introduce an additional parameter εi that controls the reduction in expected faults we are willing to pay for a type I subphase.

Let M be the set of marked pages at the start of the subphase and pM = ∑_{i∈M} pi the probability that a marked page is not cached. Let u = k + 1 − i denote the number of unmarked pages. If there is an unmarked page j with pj ≥ 1/u − εi, the adversary requests this page to end the subphase. We call such a subphase type I, analogously to the convention for phases. Other subphases are called type II.

Note that in the first subphase there is always such an unmarked page, as pm = 0 for the only marked page m, as it has just been requested. Otherwise, pM > 0, and the adversary requests the marked page l with l = arg max_{i∈M} pi. Let µ = pl. As there are i marked pages and pM > εi·u, µ must be at least εi·u/i = εi·(k + 1 − i)/i ≥ εi/k for 1 < i ≤ k. The adversary can now generate requests to marked pages using the following loop:

While the expected number of faults in subphase i is less than 1/u + εi/(2k), and while pM > εi·u/(2k), request a marked page l such that l = arg max_{i∈M} pi.

This loop is guaranteed to terminate, as each iteration adds at least εi·u/(2ki) to the expected number of faults in the subphase. If the total expected number of faults ends up exceeding 1/u + εi/(2k), an arbitrary request is made to an unmarked page. Otherwise, the unmarked page with the highest fault probability is requested. Its fault probability is at least (1 − εi·u/(2k)) · 1/u = 1/u − εi/(2k). Taking into account the initial request to a marked page that contributed at least εi/k faults, the subphase has an expected number of at least 1/u + εi/(2k) faults.

For a given ε, we can choose the parameters εi of the subphases in a way that if all subphases end up as type I, and thus the phase itself ends up as type I, the expected number of faults is at least Hk − ε: the sum of the εi needs to be less than or equal to ε. Further, by picking εi such that εi/(2k) > δ + ∑ […]

[…] There is an ε > 0 such that neither Partition nor Equitable is (1 + ε, γ, 1)-smooth for any γ. Also, Partition and Equitable are (Hk, 2δHk)-smooth.

Proof. The fact that the two policies are (Hk, 2δHk)-smooth follows immediately from Theorem 4 and the fact that the two policies are Hk-competitive with additive constant 0.

Koutsoupias and Papadimitriou [23] introduced the layer representation, a sequence (L1, . . . , Lk) of k sets of pages that compactly represents the current work function. For both Partition and Equitable, the probability of being in a particular configuration is determined by the current work function and thus its layer representation. See Achlioptas et al. [6] for more details.

Let ω = (L1, . . . , Lk) and let p be the next page to be requested. Then the layer representation of the work function is updated as follows:

ωp = (p, L1, . . . , L_{j−1}, Lj ∪ L_{j+1} − {p}, L_{j+2}, . . . , Lk)   if p ∈ Lj ∧ j < k
ωp = (p, L1, . . . , L_{k−1})                                           if p ∈ Lk
ωp = (p, L1 ∪ L2, L3, . . . , Lk)                                       if p ∉ ⋃_{i=1}^{k} Li
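A direct transcription of this update rule into Python (the function name is ours; layers is a list of k ≥ 2 sets):

    def update_layers(layers, p):
        k = len(layers)
        for j in range(k):
            if p in layers[j]:
                if j < k - 1:  # p in L_{j+1} with j+1 < k (1-indexed)
                    merged = (layers[j] | layers[j + 1]) - {p}
                    return [{p}] + layers[:j] + [merged] + layers[j + 2:]
                return [{p}] + layers[:k - 1]          # p in the last layer L_k
        return [{p}] + [layers[0] | layers[1]] + layers[2:]  # p in no layer

Iterating update_layers over the requests of the two prefixes reproduces the layer columns of the table below.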

In the following, to save space, we will omit braces in the representation of the sets Li, i.e., (abc, d, e) = ({a, b, c}, {d}, {e}).

Starting from an empty cache, and the corresponding empty layer representation, the sequence σ = y, x, k − 1, k − 2, . . . , 1, 0, k − 2, k − 3, . . . , 1, 0 yields the following layers:

(0, 1, . . . , k − 2, (k − 1)xy).

The sequence σ′ = y, x, k − 1, k − 2, . . . , 1, 0, k − 1, k − 2, . . . , 1, 0, on the other hand, with ∆(σ′, σ) = 1, yields the following layers, in which all pages are revealed:

(0, 1, . . . , k − 2, k − 1).

In the following, we show how to extend these two sequences to yield an unbounded difference in the expected number of faults. A row p: ⟨h1, h2⟩ in the table below indicates a request to page p with a hit probability of h1 under prefix σ′ and a hit probability of h2 under prefix σ.


Request ⟨hit prob. under σ′, under σ⟩ | Layers for prefix σ′ | Layers for prefix σ
(initially) | (0, 1, 2, . . . , k − 1) | (0, 1, 2, . . . , (k − 1)xy)
x: ⟨0, 1/3⟩ | (x, 01, 2, 3, . . . , k − 1) | (x, 0, 1, . . . , k − 2)
0: ⟨(k − 1)/k, 1⟩ | (0, x, 12, 3, . . . , k − 1) | (0, x, 1, 2, . . . , k − 2)
1: ⟨(k − 2)/(k − 1), 1⟩ | (1, 0, x, 23, 4, . . . , k − 1) | (1, 0, x, 2, . . . , k − 2)
. . .
i: ⟨(k − i − 1)/(k − i), 1⟩ | (i, i − 1, . . . , 1, 0, x, (i + 1)(i + 2), . . . , k − 1) | (i, i − 1, . . . , 1, 0, x, i + 1, . . . , k − 2)
. . .
k − 3: ⟨2/3, 1⟩ | (k − 3, k − 4, . . . , 1, 0, x, (k − 2)(k − 1)) | (k − 3, k − 4, . . . , 1, 0, x, k − 2)
y: ⟨0, 0⟩ | (y, (k − 3)(k − 4), . . . , 1, 0, x, (k − 2)(k − 1)) | (y, (k − 3)(k − 4), . . . , 1, 0, x, k − 2)
k − 1: ⟨(1/2)·(k − 1)/k, 0⟩ | (k − 1, y, (k − 3)(k − 4), . . . , 1, 0, x) | (k − 1, y(k − 3)(k − 4), . . . , 1, 0, x, k − 2)
k − 3: ⟨(k − 2)/(k − 1), (k − 1)/(k + 1)⟩ | (k − 3, k − 1, y, (k − 4)(k − 5), . . . , 1, 0, x) | (k − 3, k − 1, y(k − 4)(k − 5), . . . , 1, 0, x, k − 2)
. . .
i: ⟨(i + 1)/(i + 2), (i + 2)/(i + 4)⟩ | (i, . . . , k − 3, k − 1, y, (i − 1)(i − 2), . . . , 1, 0, x) | (i, . . . , k − 3, k − 1, y(i − 1)(i − 2), . . . , 1, 0, x, k − 2)
. . .
0: ⟨1/2, 2/4⟩ | (0, 1, . . . , k − 3, k − 1, y) | (0, 1, . . . , k − 3, k − 1, yx(k − 2))

Observe that the layers at the end of the above sequence are equal to the layers at the beginning of the sequence up to renaming. So we can extend the sequence arbitrarily, achieving the same hit probabilities on both sides.

It remains to compute the expected number of hits and misses on the sequence above. Summing up the hit probabilities for prefix σ′, we get:

0 + ∑_{i=0}^{k−3} (k − i − 1)/(k − i) + 0 + (1/2)·(k − 1)/k + ∑_{i=0}^{k−3} (i + 1)/(i + 2)
  = (k − Hk − 1/2) + (1/2)·(k − 1)/k + (k − H_{k−1} − 1)
  = 2k − 2Hk + 1/k + (1/2)·(k − 1)/k − 3/2
  = 2k − 2Hk − 1 + 1/(2k).

    As there are 2k − 1 requests on the sequence, the expected number of misses for prefix σ′ is