Concurrent Tries with Efficient Non-blocking Snapshots
Aleksandar Prokopec, Phil Bagwell, Martin Odersky (École Polytechnique Fédérale de Lausanne)
Nathan Bronson (Stanford)
Motivation
val numbers = getNumbers()
// compute square roots
numbers foreach { entry =>
  val x = entry.root
  val n = entry.number
  entry.root = 0.5 * (x + n / x)
  if (abs(entry.root - x) < eps) numbers.remove(entry)
}
Hash Array Mapped Tries (HAMT)
[Figure sequence: keys are inserted one at a time and placed according to the bits of their hash codes.]
0 = 000000₂
16 = 010000₂
4 = 000100₂
12 = 001100₂
[Further inserts (33, 48, 37, 3, ...) add new levels wherever keys collide in the same slot.]
Immutable HAMT
• used as immutable maps in functional languages
• updates rewrite path from root to leaf
[Figure: insert(11) allocates new copies of the nodes on the path from the root to the leaf 8 9, ending in the new leaf 8 9 11; the original trie is left unchanged.]
• efficient updates - log_k(n)
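The observable effect of a path-rewriting update can be sketched with Scala's standard immutable `HashMap`, which is a hash-trie-based persistent map. This is only an illustration of the behaviour, not of the internals: the object name `PersistentUpdate` and the string values are made up here, and the keys are borrowed from the figure.

```scala
import scala.collection.immutable.HashMap

object PersistentUpdate {
  // Keys taken from the slide's figure; the values are invented for illustration.
  val before: HashMap[Int, String] = HashMap(0 -> "a", 1 -> "b", 8 -> "c", 9 -> "d")

  // An insert returns a new map; `before` stays valid and unchanged.
  val after: HashMap[Int, String] = before + (11 -> "e")
}
```

Readers holding `before` never see key 11, which is the property the snapshot technique later in the talk builds on.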
Node compression
[Figure: a node holding 48 and 57 in a sparse array with occupancy bits 1 0 1 0 is stored as the bitmap 10 (1010₂) plus the dense array 48 57.]
Position of an entry in the dense array:
BITPOP(((1 << ((hc >> lev) & 0x1F)) - 1) & BMP)
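The BITPOP formula can be sketched in Scala with `Integer.bitCount`. The names `hc`, `lev` and `bmp` follow the formula; the helper names `flag`, `isPresent` and `position` are assumptions made for this sketch, as is the choice of 5 hash bits per level implied by the mask `0x1F`.

```scala
object BitmapIndex {
  // Bit in the bitmap that corresponds to the 5 hash bits used at level `lev`.
  def flag(hc: Int, lev: Int): Int = 1 << ((hc >>> lev) & 0x1f)

  // True if the node has an entry for this hash code at this level.
  def isPresent(hc: Int, lev: Int, bmp: Int): Boolean = (bmp & flag(hc, lev)) != 0

  // Index into the dense array: number of occupied slots below our bit.
  def position(hc: Int, lev: Int, bmp: Int): Int =
    Integer.bitCount((flag(hc, lev) - 1) & bmp)
}
```

For example, with `bmp = 10` (bits 1 and 3 set) a hash code whose low bits are 3 maps to position 1, because exactly one lower bit is set in the bitmap.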
Ctrie
Can a mutable HAMT be modified to be thread-safe?
Ctrie insert
Insert 17 = 010001₂:
1) allocate the updated node 16 17
2) CAS
Insert 18 = 010010₂:
1) allocate the updated node 16 17 18
2) CAS
Unless… T1 inserts 18 = 010010₂ while T2 inserts 28 = 011100₂:
T1-1) allocate 16 17 18
T2-1) allocate 20 25 28
T2-2) CAS
T1-2) CAS
Lost insert!
Ctrie insert – 2nd attempt
Solution: I-nodes
T1 inserts 18 = 010010₂ while T2 inserts 28 = 011100₂:
T1-1) allocate 16 17 18
T2-1) allocate 20 25 28
T1-2) CAS
T2-2) CAS
[Figure: both 16 17 18 and 20 25 28 end up in the Ctrie.]
Idea: once added to the Ctrie, I-nodes remain present.
Remove operation supported as well - details in the paper.
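The allocate-then-CAS steps on an I-node can be sketched as follows. This is a deliberately flattened illustration, not the Ctrie from the paper: the class name `INode`, the use of an immutable `Set[Int]` in place of a real trie node, and the retry loop are assumptions made for this sketch.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec

object INodeSketch {
  // An I-node is an indirection cell that stays in place once created.
  // The node it points to is immutable and is always replaced as a whole.
  final class INode(initial: Set[Int]) {
    private val main = new AtomicReference[Set[Int]](initial)

    def read: Set[Int] = main.get()

    @tailrec def insert(key: Int): Unit = {
      val old = main.get()
      val updated = old + key                    // 1) allocate an updated copy
      if (!main.compareAndSet(old, updated))     // 2) CAS it into the I-node
        insert(key)                              // another thread won: retry
    }
  }
}
```

Because every thread CASes the I-node it reached rather than a slot in a node that may already have been replaced, a failed CAS is retried against the current contents instead of being silently lost.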
Ctrie size
[Figure sequence: a traversal counts the keys one at a time while other threads modify the Ctrie.]
size = 5, actual size = 12
concurrent remove of 3 (CAS): actual size = 11
size = 6, actual size = 11
concurrent insert of 19 (CAS): actual size = 12
size = 13, actual size = 12
But the size was never 13!
Global state information
• size
• find
• filter
• iterator
These operations need a snapshot.
Snapshot using locks
• copy expensive
• not lock-free
• can insert or remove remain lock-free?
[Figure: a concurrent insert of 2 tries to CAS the node 0 1 to 0 1 2.]
Snapshot using logs
• keep a linked list of previous values in each I-node
• when is it safe to delete old entries?
[Figure: after an insert of 2, the I-node above 0 1 keeps both 0 1 2 and the previous value 0 1.]
Snapshot using immutability
[Figure sequence: every I-node carries a generation stamp, initially #1.]
snapshot!
1) create new I-node at #2
2) set snapshot (snapshot #1 keeps the old root)
3) CAS root to new I-node
subsequent insert (of 2), checking each I-node on the way down:
• generation #2 - ok!
• generation #1 - not ok, too old!
1) create updated node at #2
2) CAS to the updated node
The next I-node is again #1, too old!
1) create updated node at #2
2) CAS
finally, create a new leaf and CAS: 0 1 becomes 0 1 2
another insert (of 3): 0 1 2 becomes 0 1 2 3
But... this won't really work... why?
T2: remove 19 (CAS from 16 17 18 19 to 16 17 18)
How to fail this last CAS?
• DCAS - software based... creates intermediate objects
GCAS - generation-compare-and-swap
T2: remove 19 (replace 16 17 18 19 with 16 17 18)
1) set prev field
2) CAS
3) read root generation
4) if root generation changed, CAS prev to FailedNode(prev)
5) CAS to previous value
or
4) if root generation unchanged, CAS prev to null
To use it in the Ctrie:
1) Replace all CAS with GCAS
2) Replace all READ with GCAS_READ (which checks if prev field is null)
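The GCAS steps above can be sketched in Scala as follows. This is a simplified reading of the slides, not the implementation from the paper or from any library: all type and method names (`Gen`, `MainNode`, `Failed`, `INode`, `Trie`, `gcas`, `commit`, `gcasRead`) are assumptions, main nodes are reduced to a `Set[Int]`, and the root is read with a plain atomic read.

```scala
import java.util.concurrent.atomic.AtomicReference
import scala.annotation.tailrec

object GcasSketch {
  final class Gen                                      // generation stamp, compared by reference
  sealed trait Prev
  final case class Failed(prev: MainNode) extends Prev // marks an aborted update
  final class MainNode(val keys: Set[Int]) extends Prev {
    val prev = new AtomicReference[Prev](null)         // null means "committed"
  }
  final class INode(val gen: Gen, init: MainNode) {
    val main = new AtomicReference[MainNode](init)
  }

  final class Trie(init: Set[Int]) {
    val root = new AtomicReference(new INode(new Gen, new MainNode(init)))

    def gcas(in: INode, old: MainNode, n: MainNode): Boolean = {
      n.prev.set(old)                                  // 1) set prev field
      if (in.main.compareAndSet(old, n)) {             // 2) CAS
        commit(in, n)
        n.prev.get() == null                           // committed only if prev was cleared
      } else false
    }

    @tailrec def commit(in: INode, m: MainNode): MainNode = m.prev.get() match {
      case null => m                                   // already committed
      case p: MainNode =>
        if (root.get().gen eq in.gen) {                // 3) read root generation
          if (m.prev.compareAndSet(p, null)) m         // 4) unchanged: CAS prev to null
          else commit(in, m)
        } else {
          m.prev.compareAndSet(p, Failed(p))           // 4) changed: CAS prev to a failed marker
          commit(in, in.main.get())
        }
      case fn: Failed =>
        if (in.main.compareAndSet(m, fn.prev)) fn.prev // 5) CAS back to the previous value
        else commit(in, in.main.get())
    }

    // Readers help finish (or roll back) a pending GCAS before returning.
    def gcasRead(in: INode): MainNode = {
      val m = in.main.get()
      if (m.prev.get() == null) m else commit(in, m)
    }
  }
}
```

The point of the design is that the write becomes visible with a single CAS, but only counts once `prev` is cleared, so an update that raced with a snapshot can be undone by any thread that encounters it.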
Snapshot-based iterator
def iterator =
  if (isSnapshot) new Iterator(root)
  else snapshot().iterator()
Snapshot-based size
def size = {
  var sz = 0
  val it = iterator
  while (it.hasNext) {
    it.next()
    sz += 1
  }
  sz
}
Above is O(n). But, by caching size in nodes - amortized O(log_k n)! (see source code)
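Snapshot-based operations can be tried out with `scala.collection.concurrent.TrieMap`, the Ctrie-based concurrent map in the Scala standard library. The sketch below only uses its `readOnlySnapshot()`, `put`, `remove` and `size` methods; the object name `SnapshotDemo` and the sample entries are made up for illustration.

```scala
import scala.collection.concurrent.TrieMap

object SnapshotDemo {
  val numbers: TrieMap[Int, String] = TrieMap(1 -> "a", 2 -> "b", 3 -> "c")

  // Taking the snapshot does not copy the entries.
  val frozen: scala.collection.Map[Int, String] = numbers.readOnlySnapshot()

  // Later updates go to the live map only.
  numbers.put(4, "d")
  numbers.put(5, "e")
  numbers.remove(1)
}
```

Here `frozen` keeps reporting the three original entries while `numbers` moves on, so a size or iteration over `frozen` always describes a state the map really was in.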
Snapshot-based atomic clear
def clear(): Unit = {
  val or = READ(root)
  val nr = new INode(new Gen)
  if (!CAS(root, or, nr)) clear()
}
(roughly)
Evaluation - quad core i7
Evaluation – UltraSPARC T2
Evaluation – 4x 8-core i7
Evaluation – snapshot
Conclusion
• snapshots are linearizable and lock-free
• snapshots take constant time
• snapshots are horizontally scalable
• snapshots add a non-significant overhead to the algorithm if they aren't used
• the approach may be applicable to tree-based lock-free data-structures in general (intuition)
Thank you!