Consistency Oblivious Programming
Hillel Avni, Tel Aviv University
Agenda
Transactional Memory and Locking
Consistency Oblivious Programming (COP)
COP with STM
COP with HTM
Future Work
Global Lock
Easy to use
Composable - Concatenate critical sections
Not scalable
Fine Grain Locking
Hard to use
Not Composable
Scalable
Lazy linked list is a good example…
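Lazy Traversal
[Figure: sorted list a, b, d, e. add(c) traverses the list without locks and finds its position between b and d: "Aha!"]
Lock and Validate
[Figure: add(c) locks and validates: "Yes, b still points to d"]
Perform Updates and Release Locks
[Figure: add(c) links c between b and d, then releases the locks.]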
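The three steps above (lazy traversal, lock and validate, update and release) can be sketched in C++ as follows. This is a minimal illustration, not code from the talk: the `Node` layout, the `marked` flag, the sentinel keys and the `LazyList` class are assumptions made for the example, and keys are assumed to lie strictly between INT_MIN and INT_MAX.

```cpp
#include <atomic>
#include <cassert>
#include <climits>
#include <mutex>

// Hypothetical node layout: key, next pointer, deletion mark, per-node lock.
struct Node {
    const int key;
    std::atomic<Node*> next;
    std::atomic<bool> marked{false};   // true once the node is logically deleted
    std::mutex lock;
    Node(int k, Node* n) : key(k), next(n) {}
};

class LazyList {
    Node* tail_ = new Node(INT_MAX, nullptr);   // sentinel
    Node* head_ = new Node(INT_MIN, tail_);     // sentinel

    // Both nodes are still in the list and pred still points to curr.
    static bool validate(Node* pred, Node* curr) {
        return !pred->marked.load() && !curr->marked.load() &&
               pred->next.load() == curr;
    }

public:
    ~LazyList() {
        for (Node* n = head_; n != nullptr;) {
            Node* next = n->next.load();
            delete n;
            n = next;
        }
    }

    bool add(int key) {
        while (true) {
            // 1. Lazy traversal: no locks are taken.
            Node* pred = head_;
            Node* curr = pred->next.load();
            while (curr->key < key) {
                pred = curr;
                curr = curr->next.load();
            }
            // 2. Lock in list order, then validate what the traversal saw.
            std::lock_guard<std::mutex> lock_pred(pred->lock);
            std::lock_guard<std::mutex> lock_curr(curr->lock);
            if (!validate(pred, curr)) continue;   // list changed: retry
            if (curr->key == key) return false;    // already present
            // 3. Perform the update; both locks are released on return.
            pred->next.store(new Node(key, curr));
            return true;
        }
    }

    bool contains(int key) const {
        Node* curr = head_->next.load();
        while (curr->key < key) curr = curr->next.load();
        return curr->key == key && !curr->marked.load();
    }
};
```

With a list holding a, b, d, e, a call to `add(c)` locks only b and d, so operations on other parts of the list can proceed in parallel. The price is the hand-written validation logic, which is what makes fine-grained locking hard to use and hard to compose.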
Transactional Memory
Easy to use
Composable
Scalable
How is it done?
Java (Deuce)
bool CAS(int location, int expected, int new_val)
{
  atomic {
    if (location != expected) return false;
    location = new_val;
  }
  return true;
}
C/C++ (GCC-4.7)
bool CAS(int *location, int expected, int new_val)
{
  __transaction_atomic {
    if (*location != expected) return false;
    *location = new_val;
  }
  return true;
}
Software Transactional Memory
Different algorithms are used for:
• consistency checking
• rollback
Compiler recognizes shared accesses.
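STM Problem - Overhead
template <typename V> static V load(const V* addr, ls_modifier mod)
{
  // Read-for-write: take the write path, then read the location directly.
  if (unlikely(mod == RfW))
  {
    pre_write(addr, sizeof(V));
    return *addr;
  }
  // Read-after-write: the location was already written by this transaction.
  if (unlikely(mod == RaW))
    return *addr;
  gtm_thread *tx = gtm_thr();
  // Log the ownership records (orecs) that cover the location.
  gtm_rwlog_entry* log = pre_load(tx, addr, sizeof(V));
  V v = *addr;
  atomic_thread_fence(memory_order_acquire);
  // Re-check the logged orecs after the data load.
  post_load(tx, log);
  return v;
}
load function from GCC 4.8.1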
STM Problem - Overhead
static gtm_rwlog_entry* pre_load(gtm_thread *tx, const void* addr, size_t len)
{
  size_t log_start = tx->readlog.size();
  gtm_word snapshot = tx->shared_state.load(memory_order_relaxed);
  gtm_word locked_by_tx = ml_mg::set_locked(tx);
  size_t orec = ml_mg::get_orec(addr);
  size_t orec_end = ml_mg::get_orec_end(addr, len);
  do
  {
    gtm_word o = o_ml_mg.orecs[orec].load(memory_order_acquire);
    if (likely (!ml_mg::is_more_recent_or_locked(o, snapshot)))
    {
      success:
      // Record the orec and the value observed for later validation.
      gtm_rwlog_entry *e = tx->readlog.push();
      e->orec = o_ml_mg.orecs + orec;
      e->value = o;
    }
    else if (!ml_mg::is_locked(o))
    {
      // Unlocked but newer than the snapshot: try to extend the snapshot.
      snapshot = extend(tx);
      goto success;
    }
    else
    {
      // Locked: restart unless this transaction holds the lock.
      if (o != locked_by_tx)
        tx->restart(RESTART_LOCKED_READ);
    }
    orec = o_ml_mg.get_next_orec(orec);
  }
  while (orec != orec_end);
  return &tx->readlog[log_start];
}
load always calls pre_load
STM Problem - Overhead
static void post_load(gtm_thread *tx, gtm_rwlog_entry* log)
{
  // Restart if any orec logged by pre_load has changed since it was logged.
  for (gtm_rwlog_entry *end = tx->readlog.end(); log != end; log++)
  {
    gtm_word o = log->orec->load(memory_order_relaxed);
    if (log->value != o)
      tx->restart(RESTART_VALIDATE_READ);
  }
}
and post_load
Compare to mov eax, [ebx] on x86
Hardware Transactional Memory
Exploit native cache coherence for:
• consistency checking
• rollback
HTM Problem – Resources
A transaction cannot commit if it is:
• too big: cache size limits data footprint
• too slow: quantum size limits duration
All TM Problem – False Conflicts
Any address that was encountered during the transaction is monitored until the end of that transaction.
An address may abort a transaction long after it is no longer relevant…
COP Operation
• In non-transactional mode:
  – Execute the read-only prefix of the operation and record its output.
• In transactional mode:
  – Verify output is correct.
  – Perform updates.
COP Example – RB Tree
[Figure: red-black tree with keys 10, 20, 25, 27, 28, 30 and 40, rooted at 20.]
Add 26 – Tree Unbalanced
[Figure: 26 is added to the tree rooted at 20 while "TM Search 26" is in progress.]
Tree Balanced
[Figure: the rebalanced tree, now rooted at 27.]
TM Search continues from 27
Conflict and Abort
Add 26 – Tree Unbalanced
[Figure: 26 is added to the tree rooted at 20 while "COP Search 26" is in progress.]
Tree Balanced
[Figure: the rebalanced tree, now rooted at 27.]
COP Search continues from 27
Found
COP RB-Tree Verify
To facilitate verification:
• All nodes in the RB-Tree are connected in a successor-predecessor doubly linked list, and each node has a live mark.
• Search returns a node n with k or a leaf with k’s successor or predecessor.
COP RB-Tree Suffix
• Resume a transaction
• Verify:
  – k found and n is live – done.
  – k not found, check:
    • (n.k>k>n.pred.k && !n.right) or (n.k<k<n.succ.k && !n.left)
• If verification failed – abort the transaction.
• Complete updates, add / remove / rebalance, using n.
COP Template for op
start-transaction
  any-code
  suspend-transaction
  output = op-rop();
  resume-transaction
  if (not(op-verify(output)))
    abort-transaction
  op-complete(output)
  any-code
end-transaction
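The template can be illustrated with a small C++ sketch. It rests on several assumptions that are not from the talk: a global mutex (`tx_lock`) stands in for the transaction, the read-only prefix runs before that stand-in transaction rather than inside a suspended one, an abort is modeled as a retry, and the data structure is a sorted linked list with live marks. The names `cop_execute`, `CopList` and `Node` are hypothetical, and keys are assumed to lie strictly between INT_MIN and INT_MAX.

```cpp
#include <atomic>
#include <cassert>
#include <climits>
#include <mutex>
#include <utility>

struct Node {
    const int key;
    std::atomic<Node*> next;
    std::atomic<bool> live{true};    // cleared when the node is removed
    Node(int k, Node* n) : key(k), next(n) {}
};

std::mutex tx_lock;                  // stand-in for a real transaction

// op-rop runs without synchronization; op-verify and op-complete run
// inside the stand-in transaction. A failed verification retries.
template <class Rop, class Verify, class Complete>
auto cop_execute(Rop rop, Verify verify, Complete complete) {
    while (true) {
        auto output = rop();                       // read-only prefix
        std::lock_guard<std::mutex> tx(tx_lock);   // transactional mode
        if (!verify(output)) continue;             // "abort": retry
        return complete(output);                   // perform updates
    }
}

struct CopList {
    Node* tail = new Node(INT_MAX, nullptr);       // sentinel
    Node* head = new Node(INT_MIN, tail);          // sentinel

    ~CopList() {
        for (Node* n = head; n != nullptr;) {
            Node* next = n->next.load();
            delete n;
            n = next;
        }
    }

    bool add(int key) {
        using Out = std::pair<Node*, Node*>;       // (pred, curr)
        return cop_execute(
            // op-rop: find the first node whose key is >= key.
            [&]() -> Out {
                Node* pred = head;
                Node* curr = pred->next.load();
                while (curr->key < key) {
                    pred = curr;
                    curr = curr->next.load();
                }
                return Out(pred, curr);
            },
            // op-verify: the output is still a valid insertion point.
            [&](const Out& o) {
                return o.first->live.load() && o.second->live.load() &&
                       o.first->next.load() == o.second;
            },
            // op-complete: perform the update using the verified output.
            [&](const Out& o) {
                if (o.second->key == key) return false;
                o.first->next.store(new Node(key, o.second));
                return true;
            });
    }
};
```

Only the verify and complete steps run under the stand-in transaction, so the addresses touched by the traversal never enter its read set. That is the property COP relies on to reduce overhead and false conflicts.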
COP Correctness
The underlying TM:
• Transactional Regular Registers
The COP algorithm:
• Obliviousness
• Verifiability
• Separation
We prove that if the TM yields transactional regular registers, and the COP algorithm demonstrates obliviousness, verifiability, and separation, then the COP operation is linearizable.
STM Algorithm
• GCC default STM algorithm is the one that proved to be the most efficient and scalable in most scenarios:
  – Write Through (WT)
  – Encounter Time Locking (ETL)
  – Multi Lock (ML)
STM: WT – ETL – ML
1. RV = Shared Version Clock
2. On read: check unlocked and v# <= RV, then add to read-set
3. On write: check v# <= RV, lock, and add to undo-set
4. WV = F&I(VClock)
5. Validate that in the read-set each v# <= RV
6. Release locks with v# = WV
[Figure: memory words and their versioned locks. A transaction with RV = 100 locks two words, X and Y; at commit the shared version clock reaches 121 and the locks are released with version 121.]
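The six steps can be sketched in C++ as below. This is a simplified illustration and not the libitm implementation: the `Cell`, `Tx` and `TxAbort` names are hypothetical, each memory word gets its own versioned lock instead of hashing addresses to a shared orec table, and the lock word is assumed to encode `(version << 1) | lock bit`.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <vector>

std::atomic<std::uint64_t> vclock{0};        // shared version clock

struct Cell {                                // one word plus its versioned lock
    std::atomic<std::uint64_t> orec{0};      // (version << 1) | lock bit
    std::atomic<int> value{0};
};

struct TxAbort {};                           // thrown when the transaction fails

class Tx {
    struct Undo { Cell* cell; int old_value; std::uint64_t old_orec; };
    std::uint64_t rv_;                       // step 1: RV
    std::vector<Cell*> read_set_;
    std::vector<Undo> undo_set_;

    bool owns(const Cell* c) const {
        for (const Undo& u : undo_set_)
            if (u.cell == c) return true;
        return false;
    }
    static bool locked_or_too_new(std::uint64_t o, std::uint64_t rv) {
        return (o & 1) != 0 || (o >> 1) > rv;
    }
    [[noreturn]] void fail() { rollback(); throw TxAbort{}; }

public:
    Tx() : rv_(vclock.load()) {}

    // Step 2: check unlocked and v# <= RV, then add to the read set.
    int read(Cell& c) {
        if (owns(&c)) return c.value.load();
        std::uint64_t o = c.orec.load();
        if (locked_or_too_new(o, rv_)) fail();
        int v = c.value.load();
        if (c.orec.load() != o) fail();      // lock word changed during the read
        read_set_.push_back(&c);
        return v;
    }

    // Step 3: check v# <= RV, lock, log the old value, then write through.
    void write(Cell& c, int v) {
        if (!owns(&c)) {
            std::uint64_t o = c.orec.load();
            if (locked_or_too_new(o, rv_)) fail();
            if (!c.orec.compare_exchange_strong(o, o | 1)) fail();
            undo_set_.push_back({&c, c.value.load(), o});
        }
        c.value.store(v);
    }

    // Steps 4 to 6: fetch-and-increment the clock, validate, release locks.
    void commit() {
        std::uint64_t wv = vclock.fetch_add(1) + 1;
        for (Cell* c : read_set_) {
            std::uint64_t o = c->orec.load();
            if ((o >> 1) > rv_ || ((o & 1) != 0 && !owns(c))) fail();
        }
        for (const Undo& u : undo_set_) u.cell->orec.store(wv << 1);
        read_set_.clear();
        undo_set_.clear();
    }

    // Undo the write-through stores and restore the previous lock words.
    void rollback() {
        for (auto it = undo_set_.rbegin(); it != undo_set_.rend(); ++it) {
            it->cell->value.store(it->old_value);
            it->cell->orec.store(it->old_orec);
        }
        read_set_.clear();
        undo_set_.clear();
    }
};
```

Every transactional read pays for a lock-word check before and after the data load plus a read-set entry, which is the per-access overhead that the read-only prefix of a COP operation avoids.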
GCC Constructs
__transaction_atomic {}: Mark the transaction.
__transaction_cancel: Explicit abort.
__attribute__((transaction_safe)): Instrument the code.
__attribute__((transaction_pure)): Do not instrument the code.
We will show this attribute can be used efficiently as __transaction_suspend with the default WT-ETL-ML STM algorithm in GCC.
pure = suspend
• Transactional Regular Registers – All values up to one architecture-word size are written and read atomically. The rollback may use memcpy, but the memcpy is optimized to write at maximal alignment.
• Now we will compare the future Power architecture HTM suspended mode to transaction_pure with the WT-ETL-ML STM algorithm.
Power tsuspend - tresume
1. Until failure occurs, load instructions that access memory locations that were transactionally written by the same thread will return the transactionally written data.
2. In the event of transaction failure, failure recording is performed, but failure handling is deferred until transactional execution is resumed.
3. The initiation of a new transaction is prevented.
4. Store instructions that access memory locations that have been accessed transactionally (due to load or store) by the same thread will cause the transaction to fail.
[Chart: RB-Tree, 1M size, 20% updates, 10 operations per transaction]
[Chart: RB-Tree, 1K size, 8 threads, 20% updates]
Haswell HTM with COP
There is no suspend mode, so to compose COP operations, we execute all ROPs before the transaction. This limits composition to at most one writing COP operation per transaction.
Capacity and Cache Associativity
Packed Memory Array (PMA) search is done by divide and conquer. Assume the PMA size is 0x800000 and it starts at address 0. A search for an item located in addresses 0x0…0x7FFF must go through the addresses:
0x400000 0x200000 0x100000 0x80000
0x40000 0x20000 0x10000 0x8000
As the cache size in Haswell is 0x8000, all these addresses have the same cache index (0), and the transaction will always abort.
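The arithmetic behind this claim can be checked with a short C++ sketch. The cache geometry is an assumption added for the example: a 0x8000-byte (32 KiB) L1 data cache with 64-byte lines and 8 ways, which gives 64 sets. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed geometry: 32 KiB cache, 64-byte lines, 8-way set associative.
constexpr std::uint64_t kCacheSize = 0x8000;
constexpr std::uint64_t kLineSize = 64;
constexpr std::uint64_t kWays = 8;
constexpr std::uint64_t kSets = kCacheSize / (kLineSize * kWays);

// Cache set that a byte address maps to.
constexpr std::uint64_t set_index(std::uint64_t addr) {
    return (addr / kLineSize) % kSets;
}

// Midpoints probed by a divide-and-conquer search over [0, size) that
// always descends into the lower half until the range is `block` bytes.
std::vector<std::uint64_t> probe_addresses(std::uint64_t size,
                                           std::uint64_t block) {
    std::vector<std::uint64_t> probes;
    for (std::uint64_t mid = size / 2; mid >= block; mid /= 2)
        probes.push_back(mid);
    return probes;
}
```

Under these assumptions, `probe_addresses(0x800000, 0x8000)` yields the eight addresses listed above, and `set_index` maps every one of them to set 0. Eight lines fill all eight ways of that set, so any further transactional access that maps to set 0 has no free way left.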
[Chart: PMA]
[Chart: RB-Tree Capacity Aborts]
[Chart: RB-Tree Conflict Aborts]
Data Structures
We already have COP versions of:
• RB-Tree
• Linked list
• PMA
• Cache Oblivious B-Tree
• Leaplist (k-ary skip list, tailored for range queries)
Can we design more COP data structures?
Applications
Use COP in applications.
Many applications use shared data structures, so it is interesting to see the impact of COP on their performance.
Infrastructure
Add statistics (transactional accesses, conflicts) to GCC.
Add a real suspend mode to GCC and to hardware.
Theory
How can the transformation to COP be made automatic?
Is COP applicable outside the data-structures area?
Bounds on the number of transactional accesses?
Bounds on the number of false conflicts?
Thank You