Scaling Your Cache And Caching At Scale

Scaling Your Cache & Caching at Scale, by Alex Miller (@puredanger)

Description

Caching has been an essential strategy for greater performance in computing since the beginning of the field. Nearly all applications have data access patterns that make caching an attractive technique, but caching also has hidden trade-offs related to concurrency, memory usage, and latency. As we build larger distributed systems, caching continues to be a critical technique for building scalable, high-throughput, low-latency applications. Large systems tend to magnify the caching trade-offs and have created new approaches to distributed caching. There are unique challenges in testing systems like these as well. Ehcache and Terracotta provide a unique way to start with simple caching for a small system and grow that system over time with a consistent API while maintaining low-latency, high-throughput caching.

Transcript of Scaling Your Cache And Caching At Scale

Page 1: Scaling Your Cache And Caching At Scale

Scaling Your Cache & Caching at Scale

Alex Miller (@puredanger)

Page 2: Scaling Your Cache And Caching At Scale

Mission

• Why does caching work?
• What's hard about caching?
• How do we make choices as we design a caching architecture?
• How do we test a cache for performance?

Page 3: Scaling Your Cache And Caching At Scale

What is caching?

Page 4: Scaling Your Cache And Caching At Scale

Lots of data

Page 5: Scaling Your Cache And Caching At Scale

Memory Hierarchy

[Chart: clock cycles to access each level of the memory hierarchy, log scale from 1E+00 to 1E+09.]

Register        1
L1 cache        3
L2 cache        15
RAM             200
Disk            10,000,000
Remote disk     1,000,000,000

Clock cycles to access

Page 6: Scaling Your Cache And Caching At Scale

Facts of Life

[Diagram: the memory hierarchy from Register, L1 Cache, L2 Cache, Main Memory, Local Disk, to Remote Disk. The top is fast, small, and expensive; the bottom is slow, big, and cheap.]

Page 7: Scaling Your Cache And Caching At Scale

Caching to the rescue!

Page 8: Scaling Your Cache And Caching At Scale

Temporal Locality

[Animation: an access stream replayed against an empty cache; with no reuse yet, hits: 0%.]

Page 9: Scaling Your Cache And Caching At Scale

Temporal Locality

[Animation continued: as the stream re-references recently used elements, the cache retains the hot set and the hit rate climbs. Hits: 65%.]
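Those hit rates fall out of basic cache mechanics. A minimal sketch (not from the slides): an LRU cache built on java.util.LinkedHashMap's access order, replaying a small skewed stream; this toy stream of 10 accesses against a 4-entry cache yields a 50% hit rate.

import java.util.LinkedHashMap;
import java.util.Map;

public class LruDemo {
    // Minimal LRU cache: an access-ordered LinkedHashMap that evicts the
    // least-recently-used entry once capacity is exceeded.
    static class LruCache<K, V> extends LinkedHashMap<K, V> {
        private final int capacity;

        LruCache(int capacity) {
            super(16, 0.75f, true); // true = order by access, not insertion
            this.capacity = capacity;
        }

        @Override
        protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
            return size() > capacity;
        }
    }

    public static void main(String[] args) {
        LruCache<Integer, String> cache = new LruCache<>(4);
        int[] stream = {1, 2, 1, 3, 1, 2, 4, 1, 2, 5}; // skewed access stream
        int hits = 0;
        for (int key : stream) {
            if (cache.containsKey(key)) hits++;
            cache.put(key, "page-" + key); // the put refreshes recency
        }
        System.out.printf("hit rate: %.0f%%%n", 100.0 * hits / stream.length); // 50%
    }
}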

Page 10: Scaling Your Cache And Caching At Scale

Non-uniform distribution

[Chart: web page hits ordered by rank. Pageviews per rank fall off steeply (scale 0 to 3200) while the cumulative % of total hits per rank climbs quickly to 100%: a small number of top-ranked pages accounts for most of the traffic.]

Page 11: Scaling Your Cache And Caching At Scale

Temporal locality + non-uniform distribution

Page 12: Scaling Your Cache And Caching At Scale

17,000 pageviews; assume avg page load = 250 ms.

Cache 17 pages / 80% of views; cached page load = 10 ms.

New avg load = 58 ms.

Trade memory for latency reduction.
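A quick check of that arithmetic as a runnable sketch (numbers from the slide): the new average is a weighted mix of cached and uncached load times.

public class AvgLoad {
    public static void main(String[] args) {
        double hitRate = 0.80;     // 80% of views served from cache
        double cachedMs = 10.0;    // cached page load
        double uncachedMs = 250.0; // uncached page load
        double avgMs = hitRate * cachedMs + (1 - hitRate) * uncachedMs;
        System.out.println(avgMs + " ms"); // prints 58.0 ms
    }
}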

Page 13: Scaling Your Cache And Caching At Scale

The hidden benefit: reduces database load.

[Diagram: as demand grows, the cache (memory) absorbs reads that would otherwise hit the database, keeping the database below the line of over-provisioning.]

Page 14: Scaling Your Cache And Caching At Scale

A brief aside...

• What is Ehcache?

• What is Terracotta?

Page 15: Scaling Your Cache And Caching At Scale

Ehcache Example

import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Ehcache;
import net.sf.ehcache.Element;

// Create a cache manager, look up a configured cache, and do a put/get.
CacheManager manager = new CacheManager();
Ehcache cache = manager.getEhcache("employees");
cache.put(new Element(employee.getId(), employee));
Element element = cache.get(employee.getId());

<cache name="employees"
       maxElementsInMemory="1000"
       memoryStoreEvictionPolicy="LRU"
       eternal="false"
       timeToIdleSeconds="600"
       timeToLiveSeconds="3600"
       overflowToDisk="false"/>

Page 16: Scaling Your Cache And Caching At Scale

Terracotta

[Diagram: many App Nodes, each connected to a pair of Terracotta Servers.]

Page 17: Scaling Your Cache And Caching At Scale

But things are not always so simple...

Page 18: Scaling Your Cache And Caching At Scale

Pain of Large Data Sets

• How do I choose which elements stay in memory and which go to disk?

• How do I choose which elements to evict when I have too many?

• How do I balance cache size against other memory uses?

Page 19: Scaling Your Cache And Caching At Scale

Eviction

When cache memory is full, what do I do?

• Delete - Evict elements

• Overflow to disk - Move to slower, bigger storage

• Delete local - But keep remote data

Page 20: Scaling Your Cache And Caching At Scale

Eviction in Ehcache

Evict with the "Least Recently Used" policy:

<cache name="employees"
       maxElementsInMemory="1000"
       memoryStoreEvictionPolicy="LRU"
       eternal="false"
       timeToIdleSeconds="600"
       timeToLiveSeconds="3600"
       overflowToDisk="false"/>

Page 21: Scaling Your Cache And Caching At Scale

Spill to Disk in Ehcache

Spill to disk:

<diskStore path="java.io.tmpdir"/>

<cache name="employees"
       maxElementsInMemory="1000"
       memoryStoreEvictionPolicy="LRU"
       eternal="false"
       timeToIdleSeconds="600"
       timeToLiveSeconds="3600"
       overflowToDisk="true"
       maxElementsOnDisk="1000000"
       diskExpiryThreadIntervalSeconds="120"
       diskSpoolBufferSizeMB="30"/>

Page 22: Scaling Your Cache And Caching At Scale

Terracotta Clustering

Terracotta configuration:

<terracottaConfig url="server1:9510,server2:9510"/>

<cache name="employees"
       maxElementsInMemory="1000"
       memoryStoreEvictionPolicy="LRU"
       eternal="false"
       timeToIdleSeconds="600"
       timeToLiveSeconds="3600"
       overflowToDisk="false">
  <terracotta/>
</cache>

Page 23: Scaling Your Cache And Caching At Scale

Pain of Stale Data

• How tolerant am I of seeing values changed on the underlying data source?

• How tolerant am I of seeing values changed by another node?

Page 24: Scaling Your Cache And Caching At Scale

Expiration

[Timeline 0-9: with TTI=4, an element expires once 4 ticks pass without an access; with TTL=4, it expires 4 ticks after creation regardless of access.]
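TTI and TTL can also be overridden per element. A small sketch continuing the Page 15 example, assuming the per-element setters in the Ehcache 2.x Element API:

import net.sf.ehcache.Element;

Element element = new Element(employee.getId(), employee);
element.setTimeToIdle(600);   // TTI: expire 600 s after the last access
element.setTimeToLive(3600);  // TTL: expire 3600 s after creation
cache.put(element);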

Page 25: Scaling Your Cache And Caching At Scale

TTI and TTL in Ehcache

<cache name="employees"
       maxElementsInMemory="1000"
       memoryStoreEvictionPolicy="LRU"
       eternal="false"
       timeToIdleSeconds="600"
       timeToLiveSeconds="3600"
       overflowToDisk="false"/>

Page 26: Scaling Your Cache And Caching At Scale

Replication in Ehcache

<cacheManagerPeerProviderFactory
    class="net.sf.ehcache.distribution.RMICacheManagerPeerProviderFactory"
    properties="hostName=fully_qualified_hostname_or_ip,
                peerDiscovery=automatic,
                multicastGroupAddress=230.0.0.1,
                multicastGroupPort=4446,
                timeToLive=32"/>

<cache name="employees" ...>
  <cacheEventListenerFactory
      class="net.sf.ehcache.distribution.RMICacheReplicatorFactory"
      properties="replicateAsynchronously=true,
                  replicatePuts=true,
                  replicatePutsViaCopy=false,
                  replicateUpdates=true,
                  replicateUpdatesViaCopy=true,
                  replicateRemovals=true,
                  asynchronousReplicationIntervalMillis=1000"/>
</cache>

Page 27: Scaling Your Cache And Caching At Scale

Terracotta Clustering

Still use TTI and TTL to manage stale data between the cache and the data source.

Coherent by default, but reads can be relaxed with coherentReads="false".

Page 28: Scaling Your Cache And Caching At Scale

Pain of Loading

• How do I pre-load the cache on startup?

• How do I avoid re-loading the data on every node?

Page 29: Scaling Your Cache And Caching At Scale

Persistent Disk Store

<diskStore path="java.io.tmpdir"/>

<cache name="employees"
       maxElementsInMemory="1000"
       memoryStoreEvictionPolicy="LRU"
       eternal="false"
       timeToIdleSeconds="600"
       timeToLiveSeconds="3600"
       overflowToDisk="true"
       maxElementsOnDisk="1000000"
       diskExpiryThreadIntervalSeconds="120"
       diskSpoolBufferSizeMB="30"
       diskPersistent="true"/>

Page 30: Scaling Your Cache And Caching At Scale

Bootstrap Cache Loader

Bootstrap a new cache node from a peer:

<bootstrapCacheLoaderFactory
    class="net.sf.ehcache.distribution.RMIBootstrapCacheLoaderFactory"
    properties="bootstrapAsynchronously=true,
                maximumChunkSizeBytes=5000000"
    propertySeparator=","/>

On startup, a background thread pulls the existing cache data from another peer.

Page 31: Scaling Your Cache And Caching At Scale

Terracotta Persistence

Nothing needed beyond setting up Terracotta clustering.

Terracotta will automatically bootstrap:
• the cache key set on startup
• cache values on demand

Page 32: Scaling Your Cache And Caching At Scale

Pain of Duplication

• How do I get failover capability while avoiding excessive duplication of data?

Page 33: Scaling Your Cache And Caching At Scale

Partitioning + Terracotta Virtual Memory

• Each node (mostly) holds data it has seen
• Use a load balancer to get app-level partitioning (a sketch follows below)
• Use fine-grained locking to get concurrency
• Use memory flush/fault to handle memory overflow and availability
• Use causal ordering to guarantee coherency
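A hypothetical sketch of the load-balancer side of app-level partitioning (all names invented for illustration): route each session to a stable node, so each node's cache mostly holds the data it has already seen.

import java.util.Arrays;
import java.util.List;

public class SessionRouter {
    private final List<String> appNodes;

    public SessionRouter(List<String> appNodes) {
        this.appNodes = appNodes;
    }

    // A stable hash sends the same session to the same node every time,
    // giving app-level partitioning without a cluster-wide directory.
    public String nodeFor(String sessionId) {
        int bucket = (sessionId.hashCode() & 0x7fffffff) % appNodes.size();
        return appNodes.get(bucket);
    }

    public static void main(String[] args) {
        SessionRouter router = new SessionRouter(
            Arrays.asList("app1:8080", "app2:8080", "app3:8080"));
        System.out.println(router.nodeFor("session-42")); // always the same node
    }
}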

Page 34: Scaling Your Cache And Caching At Scale

Scaling Your Cache

Page 35: Scaling Your Cache And Caching At Scale

Scalability Continuum

[Diagram, reconstructed: scale grows left to right with the number of JVMs; causal ordering is not available in the standalone and RMI tiers, and is available in the Terracotta-backed tiers.]

1 JVM              -> Ehcache
2 or more JVMs     -> Ehcache RMI
2 or more big JVMs -> Ehcache disk store
2 or more JVMs     -> Terracotta OSS
Lots of JVMs       -> Ehcache FX / Terracotta FX

Management and control: Ehcache DX; Ehcache EX and FX.

Page 36: Scaling Your Cache And Caching At Scale

Caching at Scale

Page 37: Scaling Your Cache And Caching At Scale

Know Your Use Case

• Is your data partitioned (sessions) or not (reference data)?
• Do you have a hot set or a uniform access distribution?
• Do you have a very large data set?
• Do you have a high write rate (50%)?
• How much data consistency do you need?

Page 38: Scaling Your Cache And Caching At Scale

Types of caches

Name                     Communication    Advantage
Broadcast invalidation   multicast        low latency
Replicated               multicast        offloads db
Datagrid                 point-to-point   scalable
Distributed 2-tier       point-to-point   all of the above

Page 39: Scaling Your Cache And Caching At Scale

Common Data Patterns

I/O pattern        Locality   Hot set   Rate of change
Catalog/customer   low        low       low
Inventory          high       high      high
Conversations      high       high      low

Catalogs/customers:
• warm all the data into cache
• high TTL

Inventory:
• fine-grained locking
• write-behind to DB

Conversations:
• sticky load balancer
• disconnect conversations from DB

Page 40: Scaling Your Cache And Caching At Scale

Build a Test

• As realistic as possible
• Use real data (or good fake data)
• Verify the test does what you think
• Ideal test run is 15-20 minutes (a minimal harness sketch follows)
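As a concrete starting point, a minimal harness sketch, not a full benchmark: it assumes the "employees" cache from the Page 15 configuration is on the classpath, and hard-codes a 90/10 read/write mix, an 80/20 hot-set skew, and the 15-minute run length; every one of those numbers is a knob to vary (see Page 44).

import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Ehcache;
import net.sf.ehcache.Element;

public class CacheLoadTest {
    public static void main(String[] args) throws InterruptedException {
        Ehcache cache = new CacheManager().getEhcache("employees");
        int threads = 8;
        long endAt = System.nanoTime() + TimeUnit.MINUTES.toNanos(15);
        AtomicLong ops = new AtomicLong();
        AtomicLong totalNanos = new AtomicLong();

        ExecutorService pool = Executors.newFixedThreadPool(threads);
        for (int t = 0; t < threads; t++) {
            pool.execute(() -> {
                Random rnd = ThreadLocalRandom.current();
                while (System.nanoTime() < endAt) {
                    // Skewed keys: 80% of operations go to 20% of the key space.
                    int key = rnd.nextInt(100) < 80 ? rnd.nextInt(2000)
                                                    : rnd.nextInt(10000);
                    long start = System.nanoTime();
                    if (rnd.nextInt(100) < 90) {
                        cache.get(key);                         // 90% reads
                    } else {
                        cache.put(new Element(key, "v" + key)); // 10% writes
                    }
                    totalNanos.addAndGet(System.nanoTime() - start);
                    ops.incrementAndGet();
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(20, TimeUnit.MINUTES);
        System.out.printf("ops=%d, avg latency=%.1f us%n",
            ops.get(), totalNanos.get() / 1000.0 / Math.max(1, ops.get()));
    }
}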

Page 41: Scaling Your Cache And Caching At Scale

Cache Warming

Page 42: Scaling Your Cache And Caching At Scale

Cache Warming

• Explicitly record cache warming or loading as a testing phase

• Possibly multiple warming phases

Page 43: Scaling Your Cache And Caching At Scale

Lots o’ Knobs

Page 44: Scaling Your Cache And Caching At Scale

Things to Change

• Cache size
• Read / write / other mix
• Key distribution
• Hot set
• Key / value size and structure
• # of nodes

Page 45: Scaling Your Cache And Caching At Scale

Lots o’ Gauges

Page 46: Scaling Your Cache And Caching At Scale

Things to Measure

• Application throughput (TPS)
• Application latency
• OS: CPU, memory, disk, network
• JVM: heap, threads, GC

Page 47: Scaling Your Cache And Caching At Scale

Benchmark and Tune

• Create a baseline
• Run and modify parameters
• Test, observe, hypothesize, verify
• Keep a run log

Page 48: Scaling Your Cache And Caching At Scale

Bottleneck Analysis

Page 49: Scaling Your Cache And Caching At Scale

Pushing It

• If CPUs are not all busy...
  • Can you push more load?
  • Waiting for I/O or resources?
• If CPUs are all busy...
  • Latency analysis

Page 50: Scaling Your Cache And Caching At Scale

I/O Waiting

• Database
  • Connection pooling
  • Database tuning
  • Lazy connections
• Remote services

Page 51: Scaling Your Cache And Caching At Scale

Locking and Concurrency

[Diagram: a cache of keys 1-16 with a column of locks; concurrent threads issue get 2, get 2, put 8, put 12 against it.]

Page 52: Scaling Your Cache And Caching At Scale

Locking and Concurrency

[Diagram continued: the same operations (get 2, get 2, put 8, put 12) shown with finer-grained locks, so operations on different keys proceed without contending.]
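The diagrams illustrate why fine-grained locking matters: operations on different keys need not contend for one global lock. A minimal lock-striping sketch (the same idea pre-Java-8 ConcurrentHashMap implemented with segments): keys map to one of 16 independently locked segments.

import java.util.HashMap;
import java.util.Map;

public class StripedCache<K, V> {
    private static final int STRIPES = 16;
    private final Map<K, V>[] segments; // one map (and monitor) per stripe

    @SuppressWarnings("unchecked")
    public StripedCache() {
        segments = new Map[STRIPES];
        for (int i = 0; i < STRIPES; i++) segments[i] = new HashMap<>();
    }

    private Map<K, V> segmentFor(K key) {
        return segments[(key.hashCode() & 0x7fffffff) % STRIPES];
    }

    public V get(K key) {
        Map<K, V> seg = segmentFor(key);
        synchronized (seg) { return seg.get(key); } // lock only one stripe
    }

    public void put(K key, V value) {
        Map<K, V> seg = segmentFor(key);
        synchronized (seg) { seg.put(key, value); }
    }
}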

Page 53: Scaling Your Cache And Caching At Scale

Objects and GC

• Unnecessary object churn
• Tune GC
  • Concurrent vs. parallel collectors
  • Max heap
  • ...and so much more
• Watch your GC pauses!!! (one in-process way to watch them is sketched below)
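One in-process way to watch collections and accumulated pause time is the platform GC MXBeans; a small sketch to poll from inside a test harness:

import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcWatch {
    public static void main(String[] args) {
        // Cumulative collection counts and total collection time per collector.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: collections=%d time=%dms%n",
                gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
    }
}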

Page 54: Scaling Your Cache And Caching At Scale

Cache Efficiency

• Watch hit rates and latencies (a measurement sketch follows)
• Cache hit - should be fast
  • Unless there is a concurrency issue
• Cache miss
  • Miss local vs. miss disk / cluster
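A small sketch of reading those numbers, assuming the Ehcache 2.x Statistics API and the "employees" cache from Page 15:

import net.sf.ehcache.CacheManager;
import net.sf.ehcache.Ehcache;
import net.sf.ehcache.Statistics;

public class HitRate {
    public static void main(String[] args) {
        Ehcache cache = new CacheManager().getEhcache("employees");
        Statistics stats = cache.getStatistics();
        long hits = stats.getCacheHits();
        long misses = stats.getCacheMisses();
        // Overall hit rate, plus where the hits were served from.
        System.out.printf("hit rate %.1f%% (in-memory hits %d, on-disk hits %d)%n",
            100.0 * hits / Math.max(1, hits + misses),
            stats.getInMemoryHits(), stats.getOnDiskHits());
    }
}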

Page 55: Scaling Your Cache And Caching At Scale

Cache Sizing

• Expiration and eviction tuning
  • TTI - manage the moving hot set
  • TTL - manage max staleness
  • Max in memory - keep the hot set resident
  • Max on disk / cluster - manage total disk / clustered cache

Page 56: Scaling Your Cache And Caching At Scale

Cache Coherency

• No replication (fastest)
• RMI replication (loose coupling)
• Terracotta replication (causal ordering) - way faster than strict ordering

Page 57: Scaling Your Cache And Caching At Scale

Latency Analysis

• Profilers
• Custom timers (sketched below)
• Tier timings
• Tracer bullets
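A custom timer can be as small as bracketing a call with System.nanoTime(); a sketch around the Page 15 get, to separate cache time from database time:

import java.io.Serializable;
import net.sf.ehcache.Ehcache;
import net.sf.ehcache.Element;

public class CacheTimer {
    // Time a single cache read; feed the result to a log or histogram.
    static Element timedGet(Ehcache cache, Serializable key) {
        long start = System.nanoTime();
        Element element = cache.get(key);
        long micros = (System.nanoTime() - start) / 1000;
        System.out.printf("cache.get(%s) took %d us%n", key, micros);
        return element;
    }
}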

Page 58: Scaling Your Cache And Caching At Scale

mumble-mumble*

It’s time to add it to Terracotta.

* lawyers won’t let me say more

Page 59: Scaling Your Cache And Caching At Scale

Thanks!

• Twitter - @puredanger
• Blog - http://tech.puredanger.com
• Terracotta - http://terracotta.org
• Ehcache - http://ehcache.org