Post on 27-Mar-2015
Spin Locks and Contention
Companion slides forThe Art of Multiprocessor Programming
by Maurice Herlihy & Nir Shavit
Art of Multiprocessor Programming 2
Focus so far: Correctness and Progress
• Models– Accurate (we never lied to you)
– But idealized (so we forgot to mention a few things)
• Protocols– Elegant– Important– But naïve
Art of Multiprocessor Programming 3
New Focus: Performance
• Models– More complicated (not the same as complex!)
– Still focus on principles (not soon obsolete)
• Protocols– Elegant (in their fashion)
– Important (why else would we pay attention)
– And realistic (your mileage may vary)
Art of Multiprocessor Programming 4
Kinds of Architectures
• SISD (Uniprocessor)– Single instruction stream– Single data stream
• SIMD (Vector)– Single instruction– Multiple data
• MIMD (Multiprocessors)– Multiple instruction– Multiple data.
Art of Multiprocessor Programming 5
Kinds of Architectures
• SISD (Uniprocessor)– Single instruction stream– Single data stream
• SIMD (Vector)– Single instruction– Multiple data
• MIMD (Multiprocessors)– Multiple instruction– Multiple data.
Our space
(1)
Art of Multiprocessor Programming 6
MIMD Architectures
• Memory Contention• Communication Contention • Communication Latency
Shared Bus
memory
Distributed
Art of Multiprocessor Programming 7
Today: Revisit Mutual Exclusion
• Performance, not just correctness• Proper use of multiprocessor
architectures• A collection of locking algorithms…
(1)
Art of Multiprocessor Programming 8
What Should you do if you can’t get a lock?
• Keep trying– “spin” or “busy-wait”– Good if delays are short
• Give up the processor– Good if delays are long– Always good on uniprocessor
(1)
Art of Multiprocessor Programming 9
What Should you do if you can’t get a lock?
• Keep trying– “spin” or “busy-wait”– Good if delays are short
• Give up the processor– Good if delays are long– Always good on uniprocessor
our focus
Art of Multiprocessor Programming 10
Basic Spin-Lock
CS
Resets lock upon exit
spin lock
critical section
...
Art of Multiprocessor Programming 11
Basic Spin-Lock
CS
Resets lock upon exit
spin lock
critical section
...
…lock introduces sequential bottleneck
Art of Multiprocessor Programming 12
Basic Spin-Lock
CS
Resets lock upon exit
spin lock
critical section
...
…lock suffers from contention
Art of Multiprocessor Programming 13
Basic Spin-Lock
CS
Resets lock upon exit
spin lock
critical section
...Notice: these are distinct phenomena
…lock suffers from contention
Art of Multiprocessor Programming 14
Basic Spin-Lock
CS
Resets lock upon exit
spin lock
critical section
...Seq Bottleneck no parallelism
…lock suffers from contention
Art of Multiprocessor Programming 15
Basic Spin-Lock
CS
Resets lock upon exit
spin lock
critical section
...Contention ???
…lock suffers from contention
Art of Multiprocessor Programming 16
Review: Test-and-Set
• Boolean value
• Test-and-set (TAS)– Swap true with current value– Return value tells if prior value was true or
false
• Can reset just by writing false
• TAS aka “getAndSet”
Art of Multiprocessor Programming 17
Review: Test-and-Set
public class AtomicBoolean { boolean value; public synchronized boolean getAndSet(boolean newValue) {
boolean prior = value; value = newValue; return prior; }}
(5)
Art of Multiprocessor Programming 18
Review: Test-and-Set
public class AtomicBoolean { boolean value; public synchronized boolean getAndSet(boolean newValue) {
boolean prior = value; value = newValue; return prior; }}
Packagejava.util.concurrent.atomic
Art of Multiprocessor Programming 19
Review: Test-and-Set
public class AtomicBoolean { boolean value; public synchronized boolean getAndSet(boolean newValue) {
boolean prior = value; value = newValue; return prior; }}
Swap old and new values
Art of Multiprocessor Programming 20
Review: Test-and-Set
AtomicBoolean lock = new AtomicBoolean(false)…boolean prior = lock.getAndSet(true)
Art of Multiprocessor Programming 21
Review: Test-and-Set
AtomicBoolean lock = new AtomicBoolean(false)…boolean prior = lock.getAndSet(true)
(5)
Swapping in true is called “test-and-set” or TAS
Art of Multiprocessor Programming 22
Test-and-Set Locks
• Locking– Lock is free: value is false– Lock is taken: value is true
• Acquire lock by calling TAS– If result is false, you win– If result is true, you lose
• Release lock by writing false
Art of Multiprocessor Programming 23
Test-and-set Lock
class TASlock { AtomicBoolean state = new AtomicBoolean(false);
void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); }}
Art of Multiprocessor Programming 24
Test-and-set Lock
class TASlock { AtomicBoolean state = new AtomicBoolean(false);
void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); }}
Lock state is AtomicBoolean
Art of Multiprocessor Programming 25
Test-and-set Lock
class TASlock { AtomicBoolean state = new AtomicBoolean(false);
void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); }}
Keep trying until lock acquired
Art of Multiprocessor Programming 26
Test-and-set Lock
class TASlock { AtomicBoolean state = new AtomicBoolean(false);
void lock() { while (state.getAndSet(true)) {} } void unlock() { state.set(false); }}
Release lock by resetting state to false
Art of Multiprocessor Programming 27
Space Complexity
• TAS spin-lock has small “footprint”
• N thread spin-lock uses O(1) space
• As opposed to O(n) Peterson/Bakery
• How did we overcome the (n) lower bound?
• We used a RMW operation…
Art of Multiprocessor Programming 28
Performance
• Experiment– n threads– Increment shared counter 1 million times
• How long should it take?
• How long does it take?
Art of Multiprocessor Programming 29
Graph
ideal
time
threads
no speedup because of sequential bottleneck
Art of Multiprocessor Programming 30
Mystery #1tim
e
threads
TAS lock
Ideal
What is going on?
Art of Multiprocessor Programming 31
Test-and-Test-and-Set Locks
• Lurking stage– Wait until lock “looks” free– Spin while read returns true (lock taken)
• Pouncing state– As soon as lock “looks” available– Read returns false (lock free)– Call TAS to acquire lock– If TAS loses, back to lurking
Art of Multiprocessor Programming 32
Test-and-test-and-set Lock
class TTASlock { AtomicBoolean state = new AtomicBoolean(false);
void lock() { while (true) { while (state.get()) {} if (!state.getAndSet(true)) return; }}
Art of Multiprocessor Programming 33
Test-and-test-and-set Lock
class TTASlock { AtomicBoolean state = new AtomicBoolean(false);
void lock() { while (true) { while (state.get()) {} if (!state.getAndSet(true)) return; }} Wait until lock looks free
Art of Multiprocessor Programming 34
Test-and-test-and-set Lock
class TTASlock { AtomicBoolean state = new AtomicBoolean(false);
void lock() { while (true) { while (state.get()) {} if (!state.getAndSet(true)) return; }}
Then try to acquire it
Art of Multiprocessor Programming 35
Mystery #2
TAS lock
TTAS lock
Ideal
time
threads
Art of Multiprocessor Programming 36
Mystery
• Both– TAS and TTAS– Do the same thing (in our model)
• Except that– TTAS performs much better than TAS– Neither approaches ideal
Art of Multiprocessor Programming 37
Opinion
• Our memory abstraction is broken
• TAS & TTAS methods– Are provably the same (in our model)
– Except they aren’t (in field tests)
• Need a more detailed model …
Art of Multiprocessor Programming 38
Bus-Based Architectures
Bus
cache
memory
cachecache
Art of Multiprocessor Programming 39
Bus-Based Architectures
Bus
cache
memory
cachecache
Random access memory (10s of cycles)
Art of Multiprocessor Programming 40
Bus-Based Architectures
cache
memory
cachecache
Shared Bus•Broadcast medium•One broadcaster at a time•Processors and memory all “snoop”
Bus
Art of Multiprocessor Programming 41
Bus-Based Architectures
Bus
cache
memory
cachecache
Per-Processor Caches•Small•Fast: 1 or 2 cycles•Address & state information
Art of Multiprocessor Programming 42
Jargon Watch
• Cache hit– “I found what I wanted in my cache”– Good Thing™
Art of Multiprocessor Programming 43
Jargon Watch
• Cache hit– “I found what I wanted in my cache”– Good Thing™
• Cache miss– “I had to shlep all the way to memory for
that data”– Bad Thing™
Art of Multiprocessor Programming 44
Cave Canem
• This model is still a simplification– But not in any essential way– Illustrates basic principles
• Will discuss complexities later
Art of Multiprocessor Programming 45
Bus
Processor Issues Load Request
cache
memory
cachecache
data
Art of Multiprocessor Programming 46
Bus
Processor Issues Load Request
Bus
cache
memory
cachecache
data
Gimmedata
Art of Multiprocessor Programming 47
cache
Bus
Memory Responds
Bus
memory
cachecache
data
Got your data right
here data
Art of Multiprocessor Programming 48
Bus
Processor Issues Load Request
memory
cachecachedata
data
Gimmedata
Art of Multiprocessor Programming 49
Bus
Processor Issues Load Request
Bus
memory
cachecachedata
data
Gimmedata
Art of Multiprocessor Programming 50
Bus
Processor Issues Load Request
Bus
memory
cachecachedata
data
I got data
Art of Multiprocessor Programming 51
Bus
Other Processor Responds
memory
cachecache
data
I got data
datadata
Bus
Art of Multiprocessor Programming 52
Bus
Other Processor Responds
memory
cachecache
data
datadata
Bus
Art of Multiprocessor Programming 53
Modify Cached Data
Bus
data
memory
cachedata
data
(1)
Art of Multiprocessor Programming 54
Modify Cached Data
Bus
data
memory
cachedata
data
data
(1)
Art of Multiprocessor Programming 55
memory
Bus
data
Modify Cached Data
cachedata
data
Art of Multiprocessor Programming 56
memory
Bus
data
Modify Cached Data
cache
What’s up with the other copies?
data
data
Art of Multiprocessor Programming 57
Cache Coherence
• We have lots of copies of data– Original copy in memory – Cached copies at processors
• Some processor modifies its own copy– What do we do with the others?– How to avoid confusion?
Art of Multiprocessor Programming 58
Write-Back Caches
• Accumulate changes in cache
• Write back when needed– Need the cache for something else– Another processor wants it
• On first modification– Invalidate other entries– Requires non-trivial protocol …
Art of Multiprocessor Programming 59
Write-Back Caches
• Cache entry has three states– Invalid: contains raw seething bits– Valid: I can read but I can’t write– Dirty: Data has been modified
• Intercept other load requests• Write back to memory before using cache
Art of Multiprocessor Programming 60
Bus
Invalidate
memory
cachedatadata
data
Art of Multiprocessor Programming 61
Bus
Invalidate
Bus
memory
cachedatadata
data
Mine, all mine!
Art of Multiprocessor Programming 62
Bus
Invalidate
Bus
memory
cachedatadata
data
cache
Uh,oh
Art of Multiprocessor Programming 63
cache
Bus
Invalidate
memory
cachedata
data
Other caches lose read permission
Art of Multiprocessor Programming 64
cache
Bus
Invalidate
memory
cachedata
data
Other caches lose read permission
This cache acquires write permission
Art of Multiprocessor Programming 65
cache
Bus
Invalidate
memory
cachedata
data
Memory provides data only if not present in any cache, so no need to
change it now (expensive)
Art of Multiprocessor Programming 66
cache
Bus
Another Processor Asks for Data
memory
cachedata
data
Bus
Art of Multiprocessor Programming 67
cache data
Bus
Owner Responds
memory
cachedata
data
Bus
Here it is!
Art of Multiprocessor Programming 68
Bus
End of the Day …
memory
cachedata
dataReading OK, no writing
data data
Art of Multiprocessor Programming 69
Mutual Exclusion
• What do we want to optimize?– Bus bandwidth used by spinning threads– Release/Acquire latency– Acquire latency for idle lock
Art of Multiprocessor Programming 70
Simple TASLock
• TAS invalidates cache lines
• Spinners– Miss in cache– Go to bus
• Thread wants to release lock– delayed behind spinners
Art of Multiprocessor Programming 71
Test-and-test-and-set
• Wait until lock “looks” free– Spin on local cache– No bus use while lock busy
• Problem: when lock is released– Invalidation storm …
Art of Multiprocessor Programming 72
Local Spinning while Lock is Busy
Bus
memory
busybusybusy
busy
Art of Multiprocessor Programming 73
Bus
On Release
memory
freeinvalidinvalid
free
Art of Multiprocessor Programming 74
On Release
Bus
memory
freeinvalidinvalid
free
miss miss
Everyone misses, rereads
(1)
Art of Multiprocessor Programming 75
On Release
Bus
memory
freeinvalidinvalid
free
TAS(…) TAS(…)
Everyone tries TAS
(1)
Art of Multiprocessor Programming 76
Problems
• Everyone misses– Reads satisfied sequentially
• Everyone does TAS– Invalidates others’ caches
• Eventually quiesces after lock acquired– How long does this take?
Measuring Quiescence Time
• Acquire lock• Pause without using bus• Use bus heavily
Art of Multiprocessor Programming 77
P1
P2
Pn
If pause > quiescence time,critical section duration independent of number of threads
If pause < quiescence time,critical section duration slower with more threads
Art of Multiprocessor Programming 78
Quiescence Time
Increses linearly with the number of processors for bus architecturetim
e
threads
Art of Multiprocessor Programming 79
Mystery Explained
TAS lock
TTAS lock
Ideal
time
threads
Better than TAS but still not as good as ideal
Art of Multiprocessor Programming 80
Solution: Introduce Delay
spin locktimedr1dr2d
• If the lock looks free• But I fail to get it
• There must be contention• Better to back off than to collide again
Art of Multiprocessor Programming 81
Dynamic Example: Exponential Backoff
timed2d4d spin lock
If I fail to get lock– wait random duration before retry– Each subsequent failure doubles expected wait
Art of Multiprocessor Programming 82
Exponential Backoff Lock
public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}}
Art of Multiprocessor Programming 83
Exponential Backoff Lock
public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Fix minimum delay
Art of Multiprocessor Programming 84
Exponential Backoff Lock
public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} Wait until lock looks free
Art of Multiprocessor Programming 85
Exponential Backoff Lock
public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}} If we win, return
Art of Multiprocessor Programming 86
Exponential Backoff Lock
public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}}
Back off for random duration
Art of Multiprocessor Programming 87
Exponential Backoff Lock
public class Backoff implements lock { public void lock() { int delay = MIN_DELAY; while (true) { while (state.get()) {} if (!lock.getAndSet(true)) return; sleep(random() % delay); if (delay < MAX_DELAY) delay = 2 * delay; }}}
Double max delay, within reason
Art of Multiprocessor Programming 88
Spin-Waiting Overhead
TTAS Lock
Backoff lock
time
threads
Art of Multiprocessor Programming 89
Backoff: Other Issues
• Good– Easy to implement– Beats TTAS lock
• Bad– Must choose parameters carefully– Not portable across platforms
Art of Multiprocessor Programming 90
Idea
• Avoid useless invalidations– By keeping a queue of threads
• Each thread– Notifies next in line– Without bothering the others
Art of Multiprocessor Programming 91
Anderson Queue Lock
flags
next
T F F F F F F F
idle
Art of Multiprocessor Programming 92
Anderson Queue Lock
flags
next
T F F F F F F F
acquiring
getAndIncrement
Art of Multiprocessor Programming 93
Anderson Queue Lock
flags
next
T F F F F F F F
acquiring
getAndIncrement
Art of Multiprocessor Programming 94
Anderson Queue Lock
flags
next
T F F F F F F F
acquired
Mine!
Art of Multiprocessor Programming 95
Anderson Queue Lock
flags
next
T F F F F F F F
acquired acquiring
Art of Multiprocessor Programming 96
Anderson Queue Lock
flags
next
T F F F F F F F
acquired acquiring
getAndIncrement
Art of Multiprocessor Programming 97
Anderson Queue Lock
flags
next
T F F F F F F F
acquired acquiring
getAndIncrement
Art of Multiprocessor Programming 98
acquired
Anderson Queue Lock
flags
next
T F F F F F F F
acquiring
Art of Multiprocessor Programming 99
released
Anderson Queue Lock
flags
next
T T F F F F F F
acquired
Art of Multiprocessor Programming 100
released
Anderson Queue Lock
flags
next
T T F F F F F F
acquired
Yow!
Art of Multiprocessor Programming 101
Anderson Queue Lock
class ALock implements Lock { boolean[] flags={true,false,…,false}; AtomicInteger next = new AtomicInteger(0); ThreadLocal<Integer> mySlot;
Art of Multiprocessor Programming 102
Anderson Queue Lock
class ALock implements Lock { boolean[] flags={true,false,…,false}; AtomicInteger next = new AtomicInteger(0); ThreadLocal<Integer> mySlot;
One flag per thread
Art of Multiprocessor Programming 103
Anderson Queue Lock
class ALock implements Lock { boolean[] flags={true,false,…,false}; AtomicInteger next = new AtomicInteger(0); ThreadLocal<Integer> mySlot;
Next flag to use
Art of Multiprocessor Programming 104
Anderson Queue Lock
class ALock implements Lock { boolean[] flags={true,false,…,false}; AtomicInteger next = new AtomicInteger(0); ThreadLocal<Integer> mySlot;
Thread-local variable
Art of Multiprocessor Programming 105
Anderson Queue Lock
public lock() {
mySlot = next.getAndIncrement();
while (!flags[mySlot % n]) {};
flags[mySlot % n] = false;}
public unlock() { flags[(mySlot+1) % n] = true;}
Art of Multiprocessor Programming 106
Anderson Queue Lock
public lock() {
mySlot = next.getAndIncrement();
while (!flags[mySlot % n]) {};
flags[mySlot % n] = false;}
public unlock() { flags[(mySlot+1) % n] = true;} Take next slot
Art of Multiprocessor Programming 107
Anderson Queue Lock
public lock() {
mySlot = next.getAndIncrement();
while (!flags[mySlot % n]) {};
flags[mySlot % n] = false;}
public unlock() { flags[(mySlot+1) % n] = true;} Spin until told to go
Art of Multiprocessor Programming 108
Anderson Queue Lock
public lock() {
myslot = next.getAndIncrement();
while (!flags[myslot % n]) {};
flags[myslot % n] = false;}
public unlock() { flags[(myslot+1) % n] = true;} Prepare slot for re-use
Art of Multiprocessor Programming 109
Anderson Queue Lock
public lock() {
mySlot = next.getAndIncrement();
while (!flags[mySlot % n]) {};
flags[mySlot % n] = false;}
public unlock() { flags[(mySlot+1) % n] = true;}
Tell next thread to go
Art of Multiprocessor Programming 110
released
Local Spinning
flags
next
T F F F F F F F
acquiredSpin on my bit
Unfortunately many bits share cache line
111
released
False Sharing
flags
next
T F F F F F F F
acquiredSpin on my bit
Line 1 Line 2
Spinning thread gets cache
invalidation on account of store by threads it is not waiting for
Result: contention
Art of Multiprocessor Programming
112
released
The Solution: Padding
flags
next
T / / / F / / /
acquired
Line 1 Line 2Art of Multiprocessor Programming
Spin on my line
Art of Multiprocessor Programming 113
Performance
• Shorter handover than backoff
• Curve is practically flat• Scalable performance
queue
TTAS
Art of Multiprocessor Programming 114
Anderson Queue Lock
Good– First truly scalable lock
– Simple, easy to implement
– Back to FIFO order (like Bakery)
Art of Multiprocessor Programming 115
Anderson Queue LockBad
– Space hog…
– One bit per thread one cache line per thread• What if unknown number of threads?• What if small number of actual
contenders?
Art of Multiprocessor Programming 116
CLH Lock
• FIFO order
• Small, constant-size overhead per thread
Art of Multiprocessor Programming 117
Initially
false
tail
idle
Art of Multiprocessor Programming 118
Initially
false
tail
idle
Queue tail
Art of Multiprocessor Programming 119
Initially
false
tail
idle
Lock is free
Art of Multiprocessor Programming 120
Initially
false
tail
idle
Art of Multiprocessor Programming 121
Purple Wants the Lock
false
tail
acquiring
Art of Multiprocessor Programming 122
Purple Wants the Lock
false
tail
acquiring
true
Art of Multiprocessor Programming 123
Purple Wants the Lock
false
tail
acquiring
true
Swap
Art of Multiprocessor Programming 124
Purple Has the Lock
false
tail
acquired
true
Art of Multiprocessor Programming 125
Red Wants the Lock
false
tail
acquired acquiring
true true
Art of Multiprocessor Programming 126
Red Wants the Lock
false
tail
acquired acquiring
true
Swap
true
Art of Multiprocessor Programming 127
Red Wants the Lock
false
tail
acquired acquiring
true true
Art of Multiprocessor Programming 128
Red Wants the Lock
false
tail
acquired acquiring
true true
Art of Multiprocessor Programming 129
Red Wants the Lock
false
tail
acquired acquiring
true true
ImplicitLinked list
Art of Multiprocessor Programming 130
Red Wants the Lock
false
tail
acquired acquiring
true true
Art of Multiprocessor Programming 131
Red Wants the Lock
false
tail
acquired acquiring
true true
trueActually, it spins on cached copy
Art of Multiprocessor Programming 132
Purple Releases
false
tail
release acquiring
false true
false Bingo!
Art of Multiprocessor Programming 133
Purple Releases
tail
released acquired
true
Art of Multiprocessor Programming 134
Space Usage
• Let– L = number of locks– N = number of threads
• ALock– O(LN)
• CLH lock– O(L+N)
Art of Multiprocessor Programming 135
CLH Queue Lock
class Qnode {
AtomicBoolean locked =
new AtomicBoolean(true);
}
Art of Multiprocessor Programming 136
CLH Queue Lock
class Qnode {
AtomicBoolean locked =
new AtomicBoolean(true);
}
Not released yet
Art of Multiprocessor Programming 137
CLH Queue Lockclass CLHLock implements Lock {
AtomicReference<Qnode> tail;
ThreadLocal<Qnode> myNode
= new Qnode();
public void lock() {
Qnode pred
= tail.getAndSet(myNode);
while (pred.locked) {}
}}
Art of Multiprocessor Programming 138
CLH Queue Lockclass CLHLock implements Lock {
AtomicReference<Qnode> tail;
ThreadLocal<Qnode> myNode
= new Qnode();
public void lock() {
Qnode pred
= tail.getAndSet(myNode);
while (pred.locked) {}
}}Queue tail
Art of Multiprocessor Programming 139
CLH Queue Lockclass CLHLock implements Lock {
AtomicReference<Qnode> tail;
ThreadLocal<Qnode> myNode
= new Qnode();
public void lock() {
Qnode pred
= tail.getAndSet(myNode);
while (pred.locked) {}
}}Thread-local Qnode
Art of Multiprocessor Programming 140
CLH Queue Lockclass CLHLock implements Lock {
AtomicReference<Qnode> tail;
ThreadLocal<Qnode> myNode
= new Qnode();
public void lock() {
Qnode pred
= tail.getAndSet(myNode);
while (pred.locked) {}
}}
Swap in my node
Art of Multiprocessor Programming 141
CLH Queue Lockclass CLHLock implements Lock {
AtomicReference<Qnode> tail;
ThreadLocal<Qnode> myNode
= new Qnode();
public void lock() {
Qnode pred
= tail.getAndSet(myNode);
while (pred.locked) {}
}}
Spin until predecessorreleases lock
Art of Multiprocessor Programming 142
CLH Queue LockClass CLHLock implements Lock {
…
public void unlock() {
myNode.locked.set(false);
myNode = pred;
}
}
Art of Multiprocessor Programming 143
CLH Queue LockClass CLHLock implements Lock {
…
public void unlock() {
myNode.locked.set(false);
myNode = pred;
}
}
Notify successor
Art of Multiprocessor Programming 144
CLH Queue LockClass CLHLock implements Lock {
…
public void unlock() {
myNode.locked.set(false);
myNode = pred;
}
}
Recycle predecessor’s node
Art of Multiprocessor Programming 145
CLH Queue LockClass CLHLock implements Lock {
…
public void unlock() {
myNode.locked.set(false);
myNode = pred;
}
}
(we don’t actually reuse myNode. Code in book shows how it’s done.)
Art of Multiprocessor Programming 146
CLH Lock
• Good– Lock release affects predecessor only– Small, constant-sized space
• Bad– Doesn’t work for uncached NUMA
architectures
Art of Multiprocessor Programming 147
NUMA Architecturs
• Acronym:– Non-Uniform Memory Architecture
• Illusion:– Flat shared memory
• Truth:– No caches (sometimes)– Some memory regions faster than others
Art of Multiprocessor Programming 148
NUMA Machines
Spinning on local memory is fast
Art of Multiprocessor Programming 149
NUMA Machines
Spinning on remote memory is slow
Art of Multiprocessor Programming 150
CLH Lock
• Each thread spins on predecessor’s memory
• Could be far away …
Art of Multiprocessor Programming 151
MCS Lock
• FIFO order
• Spin on local memory only
• Small, Constant-size overhead
Art of Multiprocessor Programming 152
Initially
falsefalse
idle
tail
Art of Multiprocessor Programming 153
Acquiring
falsefalse
true
acquiring
(allocate Qnode)
tail
Art of Multiprocessor Programming 154
Acquiring
false
tail false
true
acquired
swap
Art of Multiprocessor Programming 155
Acquiring
false
tail false
true
acquired
Art of Multiprocessor Programming 156
Acquired
false
tail false
true
acquired
Art of Multiprocessor Programming 157
Acquiring
tail
false
acquiredacquiring
trueswap
Art of Multiprocessor Programming 158
Acquiring
tail
acquiredacquiring
true
false
Art of Multiprocessor Programming 159
Acquiring
tail
acquiredacquiring
true
false
Art of Multiprocessor Programming 160
Acquiring
tail
acquiredacquiring
true
false
Art of Multiprocessor Programming 161
Acquiring
tail
acquiredacquiring
true
true
false
Art of Multiprocessor Programming 162
Acquiring
tail
acquiredacquiring
true
trueYes!
false
Art of Multiprocessor Programming 163
MCS Queue Lock
class Qnode {
boolean locked = false;
qnode next = null;
}
Art of Multiprocessor Programming 164
MCS Queue Lockclass MCSLock implements Lock {
AtomicReference tail;
public void lock() {
Qnode qnode = new Qnode();
Qnode pred = tail.getAndSet(qnode);
if (pred != null) {
qnode.locked = true;
pred.next = qnode;
while (qnode.locked) {}
}}}
Art of Multiprocessor Programming 165
MCS Queue Lockclass MCSLock implements Lock {
AtomicReference tail;
public void lock() {
Qnode qnode = new Qnode();
Qnode pred = tail.getAndSet(qnode);
if (pred != null) {
qnode.locked = true;
pred.next = qnode;
while (qnode.locked) {}
}}}
Make a QNode
Art of Multiprocessor Programming 166
MCS Queue Lockclass MCSLock implements Lock {
AtomicReference tail;
public void lock() {
Qnode qnode = new Qnode();
Qnode pred = tail.getAndSet(qnode);
if (pred != null) {
qnode.locked = true;
pred.next = qnode;
while (qnode.locked) {}
}}}
add my Node to the tail of queue
Art of Multiprocessor Programming 167
MCS Queue Lockclass MCSLock implements Lock {
AtomicReference tail;
public void lock() {
Qnode qnode = new Qnode();
Qnode pred = tail.getAndSet(qnode);
if (pred != null) {
qnode.locked = true;
pred.next = qnode;
while (qnode.locked) {}
}}}
Fix if queue was non-empty
Art of Multiprocessor Programming 168
MCS Queue Lockclass MCSLock implements Lock {
AtomicReference tail;
public void lock() {
Qnode qnode = new Qnode();
Qnode pred = tail.getAndSet(qnode);
if (pred != null) {
qnode.locked = true;
pred.next = qnode;
while (qnode.locked) {}
}}}
Wait until unlocked
Art of Multiprocessor Programming 169
MCS Queue Unlockclass MCSLock implements Lock {
AtomicReference tail;
public void unlock() {
if (qnode.next == null) {
if (tail.CAS(qnode, null)
return;
while (qnode.next == null) {}
}
qnode.next.locked = false;
}}
Art of Multiprocessor Programming 170
MCS Queue Lockclass MCSLock implements Lock {
AtomicReference tail;
public void unlock() {
if (qnode.next == null) {
if (tail.CAS(qnode, null)
return;
while (qnode.next == null) {}
}
qnode.next.locked = false;
}}
Missingsuccessor
?
Art of Multiprocessor Programming 171
MCS Queue Lockclass MCSLock implements Lock {
AtomicReference tail;
public void unlock() {
if (qnode.next == null) {
if (tail.CAS(qnode, null)
return;
while (qnode.next == null) {}
}
qnode.next.locked = false;
}}
If really no successor, return
Art of Multiprocessor Programming 172
MCS Queue Lockclass MCSLock implements Lock {
AtomicReference tail;
public void unlock() {
if (qnode.next == null) {
if (tail.CAS(qnode, null)
return;
while (qnode.next == null) {}
}
qnode.next.locked = false;
}}
Otherwise wait for successor to catch up
Art of Multiprocessor Programming 173
MCS Queue Lockclass MCSLock implements Lock {
AtomicReference queue;
public void unlock() {
if (qnode.next == null) {
if (tail.CAS(qnode, null)
return;
while (qnode.next == null) {}
}
qnode.next.locked = false;
}}
Pass lock to successor
Art of Multiprocessor Programming 174
Purple Release
false
releasing swap
false
Art of Multiprocessor Programming 175
Purple Release
false
releasing swap
false
By looking at the queue, I see another thread is active
Art of Multiprocessor Programming 176
Purple Release
false
releasing swap
false
I have to wait for that thread to finish
By looking at the queue, I see another thread is active
Art of Multiprocessor Programming 177
Purple Release
false
releasing prepare to spin
true
Art of Multiprocessor Programming 178
Purple Release
false
releasing spinning
true
Art of Multiprocessor Programming 179
Purple Release
false
releasing spinning
truefalse
Art of Multiprocessor Programming 180
Purple Release
false
releasing
true
Acquired lock
false
Art of Multiprocessor Programming 181
Abortable Locks
• What if you want to give up waiting for a lock?
• For example– Timeout– Database transaction aborted by user
Art of Multiprocessor Programming 182
Back-off Lock
• Aborting is trivial– Just return from lock() call
• Extra benefit:– No cleaning up– Wait-free– Immediate return
Art of Multiprocessor Programming 183
Queue Locks
• Can’t just quit– Thread in line behind will starve
• Need a graceful way out
Art of Multiprocessor Programming 184
Queue Locks
spinning
true
spinning
truetrue
spinning
Art of Multiprocessor Programming 185
Queue Locks
spinning
true
spinning
truefalse
locked
Art of Multiprocessor Programming 186
Queue Locks
spinning
true
locked
false
Art of Multiprocessor Programming 187
Queue Locks
locked
false
Art of Multiprocessor Programming 188
Queue Locks
spinning
true
spinning
truetrue
spinning
Art of Multiprocessor Programming 189
Queue Locks
spinning
truetruetrue
spinning
Art of Multiprocessor Programming 190
Queue Locks
spinning
truetruefalse
locked
Art of Multiprocessor Programming 191
Queue Locks
spinning
truefalse
Art of Multiprocessor Programming 192
Queue Locks
pwned
truefalse
Art of Multiprocessor Programming 193
Abortable CLH Lock
• When a thread gives up– Removing node in a wait-free way is hard
• Idea:– let successor deal with it.
Art of Multiprocessor Programming 194
Initially
tail
idlePointer to
predecessor (or null)
A
Art of Multiprocessor Programming 195
Initially
tail
idleDistinguished available node means lock is
free
A
Art of Multiprocessor Programming 196
Acquiring
tail
acquiring
A
Art of Multiprocessor Programming 197
Acquiringacquiring
A
Null predecessor means lock not
released or aborted
Art of Multiprocessor Programming 198
Acquiringacquiring
A
Swap
Art of Multiprocessor Programming 199
Acquiringacquiring
A
Art of Multiprocessor Programming 200
Acquiredlocked
A
Reference to AVAILABLE means
lock is free.
spinningspinninglocked
Art of Multiprocessor Programming 201
Normal Case
Null means lock is not free & request
not aborted
Art of Multiprocessor Programming 202
One Thread Aborts
spinningTimed outlocked
Art of Multiprocessor Programming 203
Successor Notices
spinningTimed outlocked
Non-Null means predecessor
aborted
Art of Multiprocessor Programming 204
Recycle Predecessor’s Node
spinninglocked
Art of Multiprocessor Programming 205
Spin on Earlier Node
spinninglocked
Art of Multiprocessor Programming 206
Spin on Earlier Node
spinningreleased
A
The lock is now mine
Art of Multiprocessor Programming 207
Time-out Lockpublic class TOLock implements Lock {
static Qnode AVAILABLE
= new Qnode();
AtomicReference<Qnode> tail;
ThreadLocal<Qnode> myNode;
Art of Multiprocessor Programming 208
Time-out Lockpublic class TOLock implements Lock {
static Qnode AVAILABLE
= new Qnode();
AtomicReference<Qnode> tail;
ThreadLocal<Qnode> myNode;
AVAILABLE node signifies free lock
Art of Multiprocessor Programming 209
Time-out Lockpublic class TOLock implements Lock {
static Qnode AVAILABLE
= new Qnode();
AtomicReference<Qnode> tail;
ThreadLocal<Qnode> myNode;
Tail of the queue
Art of Multiprocessor Programming 210
Time-out Lockpublic class TOLock implements Lock {
static Qnode AVAILABLE
= new Qnode();
AtomicReference<Qnode> tail;
ThreadLocal<Qnode> myNode;
Remember my node …
Art of Multiprocessor Programming 211
Time-out Lockpublic boolean lock(long timeout) {
Qnode qnode = new Qnode();
myNode.set(qnode);
qnode.prev = null;
Qnode myPred = tail.getAndSet(qnode);
if (myPred== null
|| myPred.prev == AVAILABLE) {
return true;
}
…
Art of Multiprocessor Programming 212
Time-out Lockpublic boolean lock(long timeout) {
Qnode qnode = new Qnode();
myNode.set(qnode);
qnode.prev = null;
Qnode myPred = tail.getAndSet(qnode);
if (myPred == null
|| myPred.prev == AVAILABLE) {
return true;
}
Create & initialize node
Art of Multiprocessor Programming 213
Time-out Lockpublic boolean lock(long timeout) {
Qnode qnode = new Qnode();
myNode.set(qnode);
qnode.prev = null;
Qnode myPred = tail.getAndSet(qnode);
if (myPred == null
|| myPred.prev == AVAILABLE) {
return true;
}
Swap with tail
Art of Multiprocessor Programming 214
Time-out Lockpublic boolean lock(long timeout) {
Qnode qnode = new Qnode();
myNode.set(qnode);
qnode.prev = null;
Qnode myPred = tail.getAndSet(qnode);
if (myPred == null
|| myPred.prev == AVAILABLE) {
return true;
}
...If predecessor absent or
released, we are done
Art of Multiprocessor Programming 215
Time-out Lock…
long start = now();
while (now()- start < timeout) {
Qnode predPred = myPred.prev;
if (predPred == AVAILABLE) {
return true;
} else if (predPred != null) {
myPred = predPred;
}
}
…
spinningspinninglocked
Art of Multiprocessor Programming 216
Time-out Lock…
long start = now();
while (now()- start < timeout) {
Qnode predPred = myPred.prev;
if (predPred == AVAILABLE) {
return true;
} else if (predPred != null) {
myPred = predPred;
}
}
…
Keep trying for a while …
Art of Multiprocessor Programming 217
Time-out Lock…
long start = now();
while (now()- start < timeout) {
Qnode predPred = myPred.prev;
if (predPred == AVAILABLE) {
return true;
} else if (predPred != null) {
myPred = predPred;
}
}
…
Spin on predecessor’s prev field
Art of Multiprocessor Programming 218
Time-out Lock…
long start = now();
while (now()- start < timeout) {
Qnode predPred = myPred.prev;
if (predPred == AVAILABLE) {
return true;
} else if (predPred != null) {
myPred = predPred;
}
}
…Predecessor released lock
Art of Multiprocessor Programming 219
Time-out Lock…
long start = now();
while (now()- start < timeout) {
Qnode predPred = myPred.prev;
if (predPred == AVAILABLE) {
return true;
} else if (predPred != null) {
myPred = predPred;
}
}
…
Predecessor aborted, advance one
Art of Multiprocessor Programming 220
Time-out Lock…
if (!tail.compareAndSet(qnode, myPred))
qnode.prev = myPred;
return false;
}
}
What do I do when I time out?
Art of Multiprocessor Programming 221
Time-out Lock…
if (!tail.compareAndSet(qnode, myPred))
qnode.prev = myPred;
return false;
}
}
Do I have a successor?If CAS fails, I do.
Tell it about myPred
Art of Multiprocessor Programming 222
Time-out Lock…
if (!tail.compareAndSet(qnode, myPred))
qnode.prev = myPred;
return false;
}
}
If CAS succeeds: no successor, simply return false
Art of Multiprocessor Programming 223
Time-Out Unlockpublic void unlock() {
Qnode qnode = myNode.get();
if (!tail.compareAndSet(qnode, null))
qnode.prev = AVAILABLE;
}
Art of Multiprocessor Programming 224
public void unlock() {
Qnode qnode = myNode.get();
if (!tail.compareAndSet(qnode, null))
qnode.prev = AVAILABLE;
}
Time-out Unlock
If CAS failed:successor exists,notify it can enter
Art of Multiprocessor Programming 225
public void unlock() {
Qnode qnode = myNode.get();
if (!tail.compareAndSet(qnode, null))
qnode.prev = AVAILABLE;
}
Timing-out Lock
CAS successful: set tail to null, no clean up since no
successor waiting
Art of Multiprocessor Programming 226
One Lock To Rule Them All?
• TTAS+Backoff, CLH, MCS, ToLock…
• Each better than others in some way
• There is no one solution
• Lock we pick really depends on:– the application– the hardware– which properties are important
Art of Multiprocessor Programming 227
This work is licensed under a Creative Commons Attribution-ShareAlike 2.5 License.
• You are free:– to Share — to copy, distribute and transmit the work – to Remix — to adapt the work
• Under the following conditions:– Attribution. You must attribute the work to “The Art of
Multiprocessor Programming” (but not in any way that suggests that the authors endorse you or your use of the work).
– Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under the same, similar or a compatible license.
• For any reuse or distribution, you must make clear to others the license terms of this work. The best way to do this is with a link to– http://creativecommons.org/licenses/by-sa/3.0/.
• Any of the above conditions can be waived if you get permission from the copyright holder.
• Nothing in this license impairs or restricts the author's moral rights.