MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing
Transcript of MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing
MemC3: Compact and Concurrent MemCache with Dumber Caching and Smarter Hashing
Bin Fan, David G. Andersen, Michael Kaminsky
Presenter: Son Nguyen
Memcached internals
• LRU caching using a chaining hash table and a doubly linked list
Goals
• Reduce space overhead (bytes/key)
• Improve throughput (queries/sec)
• Target read-intensive workloads with small objects
• Result: 3x throughput, 30% more objects
Doubly linked list's problems
• At least two pointers per item -> expensive
• Both reads and writes change the list's structure -> locking needed between threads (no concurrency)
Solution: CLOCK-based LRU
• Approximate LRU
• Multiple readers / single writer
• Circular queue instead of a linked list -> less space overhead
CLOCK example

Originally:
entry:   (ka,va) (kb,vb) (kc,vc) (kd,vd) (ke,ve)
recency:    1       0       1       1       0

Read(kd):
entry:   (ka,va) (kb,vb) (kc,vc) (kd,vd) (ke,ve)
recency:    1       0       1       0       0

Write(kf, vf):
entry:   (ka,va) (kb,vb) (kf,vf) (kd,vd) (ke,ve)
recency:    1       1       0       0       0

Write(kg, vg):
entry:   (kg,vg) (kb,vb) (kf,vf) (kd,vd) (ke,ve)
recency:    0       1       0       1       1
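The CLOCK scheme can be sketched in a few lines. This is a minimal, single-threaded sketch using the common CLOCK convention (a read sets the slot's recency bit; the sweeping hand clears bits and evicts the first slot whose bit is already 0); the class and field names are illustrative, not MemC3's actual layout:

```python
class ClockCache:
    """Approximate LRU: a circular buffer of (key, value) slots with one
    recency bit per slot, replacing memcached's doubly linked list."""

    def __init__(self, capacity):
        self.slots = [None] * capacity   # (key, value) tuples or None
        self.recency = [0] * capacity    # one recency bit per slot
        self.index = {}                  # key -> slot position
        self.hand = 0                    # the clock hand used for eviction

    def get(self, key):
        pos = self.index.get(key)
        if pos is None:
            return None                  # cache miss
        self.recency[pos] = 1            # mark the slot as recently used
        return self.slots[pos][1]

    def set(self, key, value):
        pos = self.index.get(key)
        if pos is None:
            pos = self._evict()          # claim a free or evictable slot
            old = self.slots[pos]
            if old is not None:
                del self.index[old[0]]   # drop the evicted key
        self.slots[pos] = (key, value)
        self.index[key] = pos
        self.recency[pos] = 1

    def _evict(self):
        # Sweep the hand, clearing recency bits, until it reaches a slot
        # whose bit is already 0 (or an empty slot) -- approximate LRU.
        while True:
            pos = self.hand
            self.hand = (self.hand + 1) % len(self.slots)
            if self.slots[pos] is None or self.recency[pos] == 0:
                return pos
            self.recency[pos] = 0
```

Because reads only flip a per-slot bit instead of relinking list nodes, multiple readers can proceed without taking the lock a doubly linked list would require.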
Chaining hash table's problems
• Uses linked lists -> costly space overhead for pointers
• Pointer dereferences are slow (no benefit from the CPU cache)
• Reads are not constant time (chains can grow long)
Solution: Cuckoo Hashing
• Use 2 hash tables
• Each bucket has exactly 4 slots (fits in the CPU cache)
• Each (key, value) object can therefore reside at one of 8 possible slots
Cuckoo Hashing

[Diagram: key (ka,va) hashes to two candidate buckets, one via HASH1(ka) and one via HASH2(ka)]
Cuckoo Hashing
• Read: always 8 lookups (constant, fast)
• Write: write(ka, va)
– Find an empty slot among the 8 possible slots for ka
– If all are full, randomly kick some (kb, vb) out
– Now find an empty slot for (kb, vb)
– Repeat up to 500 times or until an empty slot is found
– If still not found, do a table expansion
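The read and write procedures above can be sketched as follows. This is a single-threaded sketch: the hash functions, table size, and slot layout are illustrative assumptions, and MemC3's concurrency machinery is omitted:

```python
import hashlib
import random

SLOTS_PER_BUCKET = 4   # 4-way buckets, sized to fit in the CPU cache
NUM_BUCKETS = 1024     # buckets per table; illustrative size
MAX_KICKS = 500        # give up (trigger table expansion) after this

# Two hash tables; every bucket holds 4 (key, value) slots.
tables = [[[None] * SLOTS_PER_BUCKET for _ in range(NUM_BUCKETS)]
          for _ in range(2)]

def bucket_indexes(key):
    """One candidate bucket per table (illustrative hash functions)."""
    d = hashlib.sha256(key.encode()).digest()
    return (int.from_bytes(d[:4], "little") % NUM_BUCKETS,
            int.from_bytes(d[4:8], "little") % NUM_BUCKETS)

def read(key):
    """Probe at most 2 buckets x 4 slots = 8 locations: constant time."""
    for table, h in zip(tables, bucket_indexes(key)):
        for slot in table[h]:
            if slot is not None and slot[0] == key:
                return slot[1]
    return None

def write(key, value):
    """Insert (key, value); if all 8 candidate slots are taken, kick a
    random victim out and re-insert it, up to MAX_KICKS times."""
    for _ in range(MAX_KICKS):
        hs = bucket_indexes(key)
        for table, h in zip(tables, hs):
            bucket = table[h]
            for i, slot in enumerate(bucket):
                if slot is None or slot[0] == key:
                    bucket[i] = (key, value)   # empty slot or update
                    return True
        # All 8 slots are full: displace a random entry, then loop to
        # find a new home for the displaced victim instead.
        t = random.randrange(2)
        i = random.randrange(SLOTS_PER_BUCKET)
        victim = tables[t][hs[t]][i]
        tables[t][hs[t]][i] = (key, value)
        key, value = victim
    return False   # no slot found after MAX_KICKS -> expand the table
```

The 4-slot buckets are the key design choice: a whole bucket is scanned with no pointer chasing, so a read touches at most two cache-friendly buckets regardless of load.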
Cuckoo Hashing

[Diagram sequence: inserting (ka,va) — both candidate buckets, HASH1(ka) and HASH2(ka), are full, so entry b is kicked out; re-inserting b kicks out c; c then finds an empty slot. Done!]
Cuckoo Hashing
• Problem: after (kb, vb) is kicked out, a reader might look up (kb, vb) and get a false cache miss
• Solution: compute the kick-out path (the cuckoo path) first, then move items backward along it
• Before: (b, c, Null) -> (a, c, Null) -> (a, b, Null) -> (a, b, c)
• Fixed: (b, c, Null) -> (b, c, c) -> (b, b, c) -> (a, b, c)
Cuckoo path

[Diagram: Insert a — the full path of displacements (a kicks b, b kicks c, c reaches an empty slot) is computed first, before anything moves]

Cuckoo path backward insert

[Diagram: items move starting from the empty end of the path (c first, then b), and (ka,va) is written into its slot last]
Cuckoo's advantages
• Concurrency: multiple readers / single writer
• Read-optimized (entries fit in the CPU cache)
• Still O(1) amortized time for writes
• 30% less space overhead
• 95% table occupancy
Evaluation
• 68% throughput improvement in the all-hit case, 235% in the all-miss case
• 3x throughput on a “real” workload
Discussion
• Writes are slower than with a chaining hash table
– Chaining hash table: 14.38 million keys/sec
– Cuckoo: 7 million keys/sec
• Idea: find the cuckoo path in parallel
– Benchmarks don't show much improvement
• Can we make it write-concurrent?