
Dynacache: Dynamic Cloud Caching

Asaf Cidon¹, Assaf Eisenman¹, Mohammad Alizadeh², and Sachin Katti¹

¹Stanford University, ²MIT CSAIL

ABSTRACT

Web-scale applications are heavily reliant on memory cache systems such as Memcached to improve throughput and reduce user latency. Small performance improvements in these systems can result in large end-to-end gains; for example, a marginal increase in hit rate of 1% can reduce the application layer latency by over 25%. Yet, surprisingly, many of these systems use generic first-come-first-serve designs with simple fixed-size allocations that are oblivious to the application's requirements. In this paper, we use detailed empirical measurements from a widely used caching service, Memcachier [3], to show that these simple default policies can lead to significant performance penalties, in some cases increasing the number of cache misses by as much as 3×.

Motivated by these empirical analyses, we propose Dynacache, a cache controller that significantly improves the hit rate of web applications by profiling applications and dynamically tailoring memory resources and eviction policies. We show that for certain applications in our real-world traces from Memcachier, Dynacache reduces the number of misses by more than 65% with a minimal overhead on the average request performance. We also show that Memcachier would need to more than double the number of Memcached servers in order to achieve the same reduction of misses that is achieved by Dynacache. In addition, Dynacache allows Memcached operators to better plan their resource allocation and manage server costs, by estimating the cost of cache hits as a function of memory.

1. INTRODUCTION

Memory caches like Memcached [11] and Redis [4] have become a vital component of cloud infrastructure.

Major web service providers such as Facebook, Twitter, Pinterest, Box and Airbnb have large deployments of Memcached, while smaller providers utilize caching-as-a-service solutions like Amazon ElastiCache [1] and Memcachier [3]. These applications rely heavily on caching to improve the latency of web requests, reduce load on backend databases and lower operating costs.

Even modest improvements to the cache hit rate have a significant impact on user-perceived performance in modern web services, because reading data from a disk-based database (like MySQL) is orders of magnitude slower than reading from an in-memory cache. For instance, the hit rate of one of Facebook's main Memcached pools has been reported to be 98.2% [6]. Assuming the latency for a typical cache hit and MySQL read is 200µs and 10ms, respectively, increasing the hit rate by just 1% would reduce the average read latency by over 25% (from 376.4µs at a 98.2% hit rate to 278.4µs at a 99.2% hit rate).
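These figures follow directly from the hit-rate-weighted average of the hit and miss latencies:

    t_avg = h · t_hit + (1 − h) · t_miss
    at h = 98.2%:  0.982 · 200µs + 0.018 · 10,000µs = 376.4µs
    at h = 99.2%:  0.992 · 200µs + 0.008 · 10,000µs = 278.4µs

a relative reduction of (376.4 − 278.4) / 376.4 ≈ 26%. The end-to-end benefits can be substantially larger for user queries, which often must wait on hundreds of reads [17].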

Memory caching systems are very simple and are roughly modeled on CPU caches, with tables of nearly equal-sized items and Least Recently Used (LRU) evictions. Importantly, current memory caches are oblivious to application request patterns and requirements. Memory allocation across slab classes¹ and across different applications sharing a cache server is based on fixed policies like first-come-first-serve or static reservations. The eviction policy, LRU, is also fixed.

In this paper, we argue that this one-size-fits-all cache behavior is poorly suited to cloud environments. Our position is that cloud memory caches must be dynamically tuned based on application access patterns, hit rate requirements, and operator policy. Like any other critical resource (e.g., compute, storage, network bandwidth), memory caches require intelligent application-aware resource management and policy.

We observe that cloud memory caches are very different from CPU caches in terms of workloads and capabilities. On the one hand, cloud application request patterns are significantly more variable than CPU workloads, which access fixed-size cache lines, often sequentially due to inherent program structures like loops. On the other hand, cloud caches are much more capable than CPU caches.

¹To avoid memory fragmentation, Memcached divides its memory into several slabs. Each slab stores items with sizes in a specific range (e.g., < 128B, 128-256B, etc.) [2]


Unlike CPU caches, which are constrained by hardware implementations running at GHz frequencies, a memory caching system is free to use sophisticated application profiling and optimization to adapt to the unique characteristics of each application.

While a dynamic application-aware cache is conceptually appealing, is it worth the additional complexity in practice? We answer this in the affirmative by analyzing a week-long trace of over 490 web applications at Memcachier [3], a popular Memcached-as-a-service platform. We find significant room for improvement over the application-oblivious baseline cache system. Specifically, Memcached's default first-come-first-serve slab memory allocation performs very poorly for some applications. In some cases, the default slab class allocations result in over 3× more misses than could have been achieved with an optimal allocation of the same amount of memory. For the same hit rate, default Memcached sometimes requires over 2.5× more memory.

Our trace analysis demonstrates the need for a cache that can dynamically adjust its memory allocation to satisfy the varying demands of different applications. To this end, we develop Dynacache, a lightweight, minimally invasive, application-aware cache controller for cloud environments. Dynacache detects the applications that are most likely to benefit from optimization using a simple, easy-to-compute entropy metric. It then profiles the request pattern of the selected applications and optimizes their slab allocation to maximize hit rate.

We have built a prototype implementation of Dynacache in C for Memcached. Our preliminary evaluation with realistic workloads based on measurements at Facebook [6] shows that Dynacache's application profiling is feasible and has a low overhead on the performance of other applications.

Our current design focuses on slab memory allocation to demonstrate how rigid cache policies hurt performance. However, we believe that other parameters, such as the eviction policy and memory allocation across applications, also hold great promise for improvement. Further, we expect that our application profiling may help developers and operators make more intelligent decisions about their cache needs and resource-sharing policy, which today are mostly determined arbitrarily. We intend to explore these directions in future work.

2. MEMCACHIER TRACE ANALYSIS

In this section, we provide empirical evidence that the one-size-fits-all resource allocation policies of Memcached are not ideal for real web applications that exhibit variable behavior. We collected a trace of 490 applications on Memcachier, a multi-tenant Memcached service. Each application reserves a certain amount of memory in advance, which is uniformly allocated across multiple Memcached servers comprising the Memcachier cache.

App | % of Trace | Old Hit Rate | New Hit Rate | % Miss Reduction | Additional Memory Required with Default Alloc. | Miss Entropy
----|------------|--------------|--------------|------------------|------------------------------------------------|-------------
1   | 36.8%      | 69.6%        | 75.3%        | 18.6%            | 20%                                            | 0.5
2   | 8.6%       | 99.9%        | 99.9%        | 0%               | 0%                                             | 0
3   | 5.7%       | 97.5%        | 98.7%        | 51.1%            | 133%                                           | 2.7
4   | 4.2%       | 97.5%        | 97.7%        | 8.4%             | 14%                                            | 0
5   | 3.6%       | 98.3%        | 99.4%        | 65.2%            | 145%                                           | 2.4
6   | 3.4%       | 99.8%        | 99.8%        | 0%               | 0%                                             | 0.2
7   | 2.9%       | 29%          | 32.6%        | 5%               | 70%                                            | 0.6
8   | 2.7%       | 99.9%        | 99.9%        | 0%               | 0%                                             | 0.2
9   | 2.4%       | 93.9%        | 93.9%        | 0%               | 0%                                             | 1.6
10  | 2%         | 99.6%        | 99.6%        | 11.9%            | 14%                                            | 0.1

Table 1: Analysis of the effect of optimizing the slab class allocation for the top 10 applications in the Memcachier trace. The sixth column depicts the amount of additional memory required to achieve the improved hit rate using the default slab class allocation of Memcachier, and Miss Entropy is a metric for classifying which applications will benefit most from optimizing their slab class allocation.

We obtained packet captures on one of the Memcachier servers for a week. In all, the raw captures amount to 220 GB of compressed data. We analyzed the packet captures to reconstruct all Memcached requests and responses, including the type of request (e.g., GET, SET), key and value sizes, whether the request hit or missed, and the slab class for each request.

In particular, we focus on one dimension of cache resource allocation, namely, the allocation of memory resources across slab classes for each application. In Memcached, memory is divided into 1MB pages, each assigned to one of several slab classes. Each slab class stores items whose sizes are within a specific range (e.g., between 1KB and 2KB). For example, a 1MB page that belongs to the 1-2KB slab class would be split into 512 chunks of 2KB in size. Each slab class has its own LRU eviction queue per application.
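As a concrete illustration (our own sketch, not Memcachier code; it assumes the 64 bytes · 2^SlabId class sizing quoted with Table 2 below), the mapping from item size to slab class and chunk count looks roughly like this:

    # Sketch: Memcachier-style slab sizing, per the rule quoted in Table 2.
    PAGE_BYTES = 1 << 20                     # Memcached pages are 1MB

    def slab_class(item_size, base=64):
        # Smallest class c whose chunk size (base * 2^c) fits the item.
        c = 0
        while base * (2 ** c) < item_size:
            c += 1
        return c

    def chunks_per_page(c, base=64):
        # e.g., class 5 has 2KB chunks, so a 1MB page yields 512 chunks.
        return PAGE_BYTES // (base * (2 ** c))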

In Memcachier, memory is allocated to slab classes greedily, based on the sizes of the initial items at the beginning of the workload. Since each slab class has its own eviction queue, the length of the queues can vary greatly among slab classes, and some slab classes may be under- or over-provisioned.

Table 1 presents the hit rate achieved by Memcachier for the top 10 applications (sorted by number of requests) in our trace, and the hit rate that could be achieved with an optimal allocation of memory to slab classes (we will show how to derive the optimal slab class allocation in Section 3).

The results show that some applications can benefit greatly from better slab class allocation, while others do not.


App | Slab Class | % GETs | Original Hit Rate | Original Allocation | Optimal Hit Rate | Optimal Allocation
----|------------|--------|-------------------|---------------------|------------------|-------------------
4   | 0          | 9%     | 99.98%            | 1.38MB              | 98.11%           | 0.72MB
4   | 1          | 91%    | 97.32%            | 4.39MB              | 97.66%           | 5.05MB
3   | 0          | 9%     | 98.18%            | 0.002MB             | 99.99%           | 0.25MB
3   | 1          | 22%    | 96.71%            | 0.002MB             | 99.98%           | 0.56MB
3   | 3          | 3%     | 97.27%            | 0.017MB             | 99.99%           | 0.67MB
3   | 4          | 45%    | 98.45%            | 0.25MB              | 99.31%           | 2.27MB
3   | 5          | 6%     | 99.70%            | 0.02MB              | 100%             | 0.55MB
3   | 6          | 1%     | 97.08%            | 0.07MB              | 99.95%           | 1.06MB
3   | 7          | 5%     | 98.89%            | 0.18MB              | 99.97%           | 2.22MB
3   | 8          | 6%     | 98.90%            | 0.42MB              | 99.97%           | 4.24MB
3   | 9          | 1%     | 69.35%            | 3.75MB              | 17.16%           | 0MB
3   | 10         | 1%     | 82.62%            | 11.13MB             | 94.51%           | 15.72MB
3   | 11         | 2%     | 93.12%            | 32.75MB             | 77.09%           | 21.05MB

Table 2: Hit rate and slab class allocations of two applications in the Memcachier trace on a single server. Each application's workload is spread uniformly across 12 servers. In Memcachier, each slab class size is 64 bytes · 2^SlabId; for example, slab class 0 is for items with values between 0-64 bytes, while slab class 1 is for values between 64-128 bytes. The slab class optimization improved the overall hit rate of application 4 from 97.5% to 97.7%, and of application 3 from 97.5% to 98.7%.

For example, the number of misses in applications 3 and 5 is reduced by 51% and 65%, respectively. About half of the applications in the trace do not benefit at all from optimized slab allocation, because applications are typically well over-provisioned. In some applications (like application 7), the optimal slab class allocation does improve the hit rate, but the improvement is small. The table also shows the amount of memory that would be required to achieve the improved hit rate using the default slab allocation. For applications 3 and 5, Memcachier must more than double the amount of allocated memory to achieve the same hit rate with the default policy.

The problem with Memcachier's first-come-first-serve memory allocation to slab classes is that if applications change their request distribution, some slab classes may not have enough memory and their eviction queues will be too short. Some Memcached implementations have tried to solve this problem by periodically evicting an entire slab of a slab class and reassigning it to the corresponding slab class of new incoming requests [17, 18]. However, even these improved schemes may suffer from sub-optimal slab class allocation. For example, they may tend to favor large slab classes over small ones: a request for a large item is treated the same as a request for a small item, even though the large item occupies much more memory in the Memcached server.

This is demonstrated in Table 2, which details the default hit rate and the slab class allocation of two Memcachier applications.

Figure 1: High-level architecture of Dynacache.

In both applications, Memcachier allocates disproportionately more memory to the larger slab classes relative to the number of requests they receive.

Intuitively, we expect this problem to be more severe in application 3, since its requests are more evenly distributed among the slab classes. In application 4, since the vast majority of the requests belong to slab class 1, the default slab allocation performs reasonably well. This is the reason some applications' performance can be improved more than others with optimal slab class allocation.

3. DESIGN

In this section we describe the design of Dynacache. In order to tailor the resource allocation of Memcached for different applications, Dynacache first classifies and profiles the applications' access patterns, and then computes and sets their optimal slab class allocation. The architecture of Dynacache is depicted in Figure 1. Dynacache runs as a module that integrates with the Memcached server. We describe its three components below.

3.1 Slab Class Allocation Optimization

We first describe how to compute the optimal memory allocation to each slab class for a single application. The problem can be expressed as an optimization:

\begin{aligned}
\underset{m}{\text{maximize}} \quad & \sum_{i=1}^{s} f_i \, h_i(m_i, e) \\
\text{subject to} \quad & \sum_{i=1}^{s} m_i \leq M
\end{aligned} \tag{1}

where s is the number of slab classes, f_i is the frequency of GETs for each slab class, h_i(m_i, e) is the hit rate of each slab class as a function of its available memory (m_i) and the cache eviction policy (e), and M is the amount of memory reserved by the application on the Memcached server.

In order to solve this optimization problem, we need to compute h_i(m_i, e), the hit rate curve for each slab class. Stack distances [16] provide a convenient means of computing the hit rate curve beyond the allocated memory size for a given eviction policy. The stack distance of a requested item is its rank in the cache, counted from the top of the eviction queue. For example, if the requested item is at the top of the eviction queue, its stack distance is equal to 1. If an item has never been requested before, its stack distance is infinite.
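For illustration, a naïve exact computation looks like the following sketch (our own code, not the paper's); the linear scan over the shadow queue is the O(n) cost that Section 3.2 is designed to avoid:

    # Sketch: exact stack distances via a shadow LRU queue.
    def stack_distances(requests):
        shadow = []                          # most recently used key first
        for key in requests:
            if key in shadow:
                yield shadow.index(key) + 1  # rank from the top of the queue
                shadow.remove(key)
            else:
                yield float('inf')           # first access: infinite distance
            shadow.insert(0, key)

For example, list(stack_distances(['a', 'b', 'a'])) yields [inf, inf, 2]: the second access to 'a' finds it in the second position of the queue.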

The stack distances can be computed offline for each incoming request in the Memcachier trace (e.g., once every 6-12 hours). This allows us to compute the hit rate curve as a function of the memory allocated to each slab class. For example, Figure 2 depicts the hit rate curve of slab class 9 of application 3.
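Concretely, the hit rate at a queue of k items is the fraction of requests whose stack distance is at most k, so the hit rate curve is the empirical CDF of the stack distances. A sketch of this construction (ours, under the same caveats as above):

    # Sketch: build a hit rate curve from per-request stack distances.
    import numpy as np

    def hit_rate_curve(distances, max_items):
        finite = np.asarray([d for d in distances if d != float('inf')])
        total = len(distances)
        sizes = np.arange(1, max_items + 1)
        hits = np.array([(finite <= k).sum() for k in sizes])
        return sizes, hits / total           # hit rate at each queue size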

To solve the optimization in Equation (1) efficiently, the hit rate curves need to be approximated as concave functions. Fortunately, hit rate curves in the Memcachier traces are concave or nearly concave. We use convex piecewise-linear fitting [15] to approximate the hit rate curves. The piecewise-linear hit rate curves allow us to solve optimization (1) using a simple LP solver.
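As a minimal sketch of this step (our own formulation and naming, assuming SciPy's linprog; not the Dynacache code): each concave curve h_i is the pointwise minimum of its fitted lines, so introducing one hypograph variable z_i per slab class, with one constraint per fitted line, turns optimization (1) into a standard LP:

    # Sketch: solve Equation (1) once each hit rate curve has been fitted by
    # concave piecewise-linear pieces, h_i(m) ~= min_j (a_ij * m + b_ij).
    import numpy as np
    from scipy.optimize import linprog

    def optimize_slabs(pieces, freqs, M):
        # pieces[i]: list of (a, b) lines for slab class i; freqs[i]: fraction
        # of GETs to class i; M: memory reserved by the application.
        s = len(pieces)
        # Variables x = [m_0..m_{s-1}, z_0..z_{s-1}]; maximize sum f_i * z_i.
        c = np.concatenate([np.zeros(s), -np.asarray(freqs, dtype=float)])
        A_ub, b_ub = [], []
        for i, lines in enumerate(pieces):
            for a, b in lines:               # enforce z_i <= a * m_i + b
                row = np.zeros(2 * s)
                row[s + i], row[i] = 1.0, -a
                A_ub.append(row)
                b_ub.append(b)
        cap = np.concatenate([np.ones(s), np.zeros(s)])   # sum m_i <= M
        A_ub.append(cap)
        b_ub.append(M)
        res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
                      bounds=[(0, None)] * (2 * s))
        return res.x[:s]                     # optimized memory per slab class

Because the objective pushes each z_i up against its line constraints, z_i equals min_j(a_ij · m_i + b_ij) at the optimum, which is exactly the fitted concave hit rate curve.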

3.2 Practical Profiling

Computing the exact stack distance for each incoming request requires maintaining a "shadow" eviction queue and tracking the position of every incoming request in the queue. This can be computationally prohibitive, especially when the application accesses a large number of unique keys (the computation is O(n), where n is the size of the shadow queue).

Instead, Dynacache uses a bucketing scheme similar to Mimir [19] (similar schemes were described in CRAMM [21] and Path [7]). In this bucketing scheme, instead of keeping track of a shadow eviction queue, there is a linked list of buckets, each containing a fixed number of items. Each incoming request enters the top bucket, and when the top bucket fills up, we remove the bucket at the end of the queue. We maintain a hash function that maps each item to the bucket in which it is stored. We can estimate the stack distance of an incoming request by summing the sizes of all buckets that appear in the queue before its bucket, and adding half the size of its own bucket. This stack distance computation is much faster than the naïve method, since its complexity is O(B), where B is the number of buckets. For Memcachier, we utilized 100 buckets with 100 keys each.

[Figure 2: Hit rate curve over an entire week for Application 3, Slab Class 9 (item sizes of 16-32 KB). The plot shows hit rate (0-1) as a function of the number of items in the LRU queue (0-1000).]

The difference in hit rate improvement between the optimized slab class allocation computed with the bucketing algorithm and the one computed with exact stack distances is less than 10% for all the applications we analyzed.
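A simplified sketch of such a bucket-based estimator (our own code and naming, loosely following the scheme described above, not the Dynacache implementation):

    # Sketch: estimate stack distances with a queue of fixed-size buckets.
    # Each estimate costs O(B) for B buckets instead of O(n) unique keys.
    from collections import deque

    class BucketProfiler:
        def __init__(self, num_buckets=100, bucket_size=100):
            self.max_buckets = num_buckets
            self.bucket_size = bucket_size
            self.buckets = deque([set()])   # newest bucket at index 0
            self.where = {}                 # key -> bucket currently holding it

        def access(self, key):
            est = float('inf')              # never-seen keys: infinite distance
            b = self.where.pop(key, None)
            if b is not None:
                b.discard(key)
                above = 0                   # items in buckets newer than key's
                for bk in self.buckets:
                    if bk is b:
                        break
                    above += len(bk)
                est = above + len(b) // 2 + 1   # midpoint of the key's bucket
            top = self.buckets[0]
            if len(top) >= self.bucket_size:    # top bucket full: rotate
                if len(self.buckets) == self.max_buckets:
                    for k in self.buckets.pop():    # drop the oldest bucket
                        self.where.pop(k, None)
                self.buckets.appendleft(set())
                top = self.buckets[0]
            top.add(key)
            self.where[key] = top
            return est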

3.3 Which Applications to Optimize?

The naïve approach would be to simply calculate the hit rate curves of all the applications and run the optimization for each. However, estimating the hit rate curves for hundreds of applications on each server is costly. As our results in Table 1 have shown, not all applications benefit from optimized slab class allocation.

Dynacache should ideally optimize the slab class allocation only for applications that are likely to benefit. To this end, we derive a metric that is simple to compute and that predicts, with a high degree of certainty, the hit rate improvement due to optimized slab class allocation.

Intuitively, the default slab class allocation performs poorly when unique requests are distributed relatively uniformly across slab classes. When all the requests are concentrated in a small number of adjacent slab classes (or in a single slab class), there is little difference between a first-come-first-serve slab allocation policy and the optimal slab allocation policy.

Entropy provides a measure of the uniformity of a distribution: when a probability distribution is completely uniform its entropy is maximal, and when it is deterministic its entropy is 0. The last column in Table 1 shows the Miss Entropy across slab classes for the different applications in the trace. The Miss Entropy is calculated by treating the misses per slab class as a probability density function and computing its entropy [9].
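A sketch of this computation (our own code):

    # Sketch: Miss Entropy, i.e., the Shannon entropy of the per-slab-class
    # miss counts normalized into a probability distribution.
    import math

    def miss_entropy(misses_per_class):
        total = sum(misses_per_class)
        if total == 0:
            return 0.0
        probs = [m / total for m in misses_per_class if m > 0]
        return -sum(p * math.log2(p) for p in probs)

For example, misses spread uniformly over 8 slab classes give an entropy of 3, while misses concentrated in a single class give 0.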

We have found that Miss Entropy is a good indicator of which applications would benefit from optimal slab class allocation. The reason is that high Miss Entropy implies that accesses to unique keys are evenly distributed across slab classes, and the default slab class allocation tends to disproportionately prioritize large slab classes in such workloads.

[Figure 3: Throughput overhead of the Dynacache profiler as a function of request load (4,000-40,000 requests per second). The overhead remains 6% after 40,000 requests per second, because the Memcached server becomes saturated.]

The only exception to this rule is application 9. This is because Memcachier allocated a very small amount of memory (only 1.6MB) to application 9. The optimization does not work with a very small number of data points, and therefore cannot improve the slab allocation given the memory constraints. If application 9 had been allocated more memory, it would have benefited significantly from an improved slab class allocation, similar to applications 3 and 5.

4. EVALUATION

We implemented a prototype of the Dynacache profiler in C and integrated it with Memcached 1.4.22. Our implementation consists of about 170 lines of code. In order to measure the overhead added by Dynacache, we leveraged Mutilate [14], a load generator that emulates Memcached workloads from the 2012 Facebook study [6]. We used the same key, value and read/write distributions as described in the Facebook paper. We used the Facebook workload because it is much more CPU-intensive than the applications measured in the Memcachier trace. We ran our experiments on an Intel Xeon E5-2670 system with 32 GB of RAM and an SSD drive, using 5-minute runs. We measured the achieved throughput and latency overhead while running the Dynacache profiler under different request loads.

We examined the throughput and latency achieved under different request loads, ranging from 4,000 requests per second (Memcachier's average load) to 100,000 requests per second. Our evaluation shows an average latency slowdown of 5.8% for read queries (GETs) and 9.6% for write queries (SETs), with negligible deviations between experiments.

Figure 3 presents the throughput overhead. In order to measure throughput, we generated a series of requests at the client and measured the number of requests returned during a 5-minute period.

The figure shows that throughput was not affected at lower loads, but had an overhead of 6% compared to the default Memcached implementation once Memcached became CPU bound. The overhead remained at 6% beyond 40,000 requests per second, because the Memcached server becomes saturated. Note that in the case of Memcachier, Memcached is memory bound rather than CPU bound, and therefore Dynacache would not impose any overhead on throughput. Initial profiling shows that our implementation can be further optimized by reducing the number of hash computations for updates and bucket deletions, and by running the profiler asynchronously (i.e., outside the critical path of incoming requests).

5. RELATED WORK

This work is related to previous work on improving and profiling the performance of Memcached. Mimir [19] and Blaze [8] also profile cache hit rate curves to enforce QoS guarantees in multi-tenant web caches. Similarly, Wires et al. profile hit rate curves using Counter Stacks [20] in order to better provision flash-based storage resources. In addition, Hwang et al. have proposed a dynamic hashing system [13] that can evenly distribute requests across servers, taking into account varying item sizes. A recent study of the Facebook photo cache demonstrates that modifying LRU can significantly improve web cache performance [12]. Twitter [18] and Facebook [17] have tried to improve Memcached slab class allocation to better adjust for varying item sizes, by periodically shifting pages from slabs with a high hit rate to those with a low hit rate. Both of these schemes are far from optimal, since they do not take into account the hit rate curves across all the slab classes; we plan to conduct a quantitative comparison with them in future work. There is a body of prior work on algorithms for calculating stack distances in the context of CPU caches [5, 10, 16, 22]. These techniques may be applicable to Dynacache to enhance the performance of the profiler.

6. CONCLUSION

By analyzing a multi-tenant Memcached cluster, we demonstrated that a web cache can be improved significantly by dynamically tuning its behavior to the requirements of different applications. We showed that the performance of certain applications can be significantly improved simply by better allocating slab memory within the Memcached servers, without interfering with the data path of the cache. Our next step is to generalize dynamic tuning to other parameters in cache systems, such as relative allocations across applications and eviction policies.


References

[1] Amazon ElastiCache. aws.amazon.com/elasticache/.

[2] Memcached. code.google.com/p/memcached/wiki/NewUserInternals.

[3] Memcachier. www.memcachier.com.

[4] Redis. redis.io.

[5] G. Almási, C. Cascaval, and D. A. Padua. Calculating stack distances efficiently. In ACM SIGPLAN Notices, volume 38, pages 37–43. ACM, 2002.

[6] B. Atikoglu, Y. Xu, E. Frachtenberg, S. Jiang, and M. Paleczny. Workload analysis of a large-scale key-value store. In ACM SIGMETRICS Performance Evaluation Review, volume 40, pages 53–64. ACM, 2012.

[7] R. Azimi, L. Soares, M. Stumm, T. Walsh, and A. D. Brown. Path: page access tracking to improve memory management. In Proceedings of the 6th International Symposium on Memory Management, pages 31–42. ACM, 2007.

[8] H. Bjornsson, G. Chockler, T. Saemundsson, and Y. Vigfusson. Dynamic performance profiling of cloud caches. In Proceedings of the 4th Annual Symposium on Cloud Computing, page 59. ACM, 2013.

[9] T. M. Cover and J. A. Thomas. Elements of Information Theory. John Wiley & Sons, 2012.

[10] C. Ding and Y. Zhong. Predicting whole-program locality through reuse distance analysis. In ACM SIGPLAN Notices, volume 38, pages 245–257. ACM, 2003.

[11] B. Fitzpatrick. Distributed caching with Memcached. Linux Journal, 2004(124):5, 2004.

[12] Q. Huang, K. Birman, R. van Renesse, W. Lloyd, S. Kumar, and H. C. Li. An analysis of Facebook photo caching. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles, pages 167–181. ACM, 2013.

[13] J. Hwang and T. Wood. Adaptive performance-aware distributedmemory caching. In ICAC, pages 33–43, 2013.

[14] J. Leverich. Mutilate. github.com/leverich/mutilate/.

[15] A. Magnani and S. P. Boyd. Convex piecewise-linear fitting. Optimization and Engineering, 10(1):1–17, 2009.

[16] R. L. Mattson, J. Gecsei, D. R. Slutz, and I. L. Traiger. Evaluation techniques for storage hierarchies. IBM Systems Journal, 9(2):78–117, 1970.

[17] R. Nishtala, H. Fugal, S. Grimm, M. Kwiatkowski, H. Lee, H. C. Li, R. McElroy, M. Paleczny, D. Peek, P. Saab, D. Stafford, T. Tung, and V. Venkataramani. Scaling Memcache at Facebook. In Proceedings of the 10th USENIX Symposium on Networked Systems Design and Implementation (NSDI 13), pages 385–398, Lombard, IL, 2013. USENIX.

[18] M. Rajashekhar and Y. Yue. Twemcache. blog.twitter.com/2012/caching-with-twemcache.

[19] T. Saemundsson, H. Bjornsson, G. Chockler, and Y. Vigfusson. Dynamic performance profiling of cloud caches. In Proceedings of the ACM Symposium on Cloud Computing, pages 1–14. ACM, 2014.

[20] J. Wires, S. Ingram, Z. Drudi, N. J. Harvey, and A. Warfield. Characterizing storage workloads with counter stacks. In Proceedings of the 11th USENIX Conference on Operating Systems Design and Implementation, pages 335–349. USENIX Association, 2014.

[21] T. Yang, E. D. Berger, S. F. Kaplan, and J. E. B. Moss. CRAMM: Virtual memory support for garbage-collected applications. In Proceedings of the 7th Symposium on Operating Systems Design and Implementation, pages 103–116. USENIX Association, 2006.

[22] Y. Zhong, S. G. Dropsho, and C. Ding. Miss rate prediction across all program inputs. In Proceedings of the 12th International Conference on Parallel Architectures and Compilation Techniques (PACT 2003), pages 79–90. IEEE, 2003.
