Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.
![Page 1: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/1.jpg)
Software Transactional Memory
TiC 2010
Adam Welc
Programming Systems Lab, Intel Labs
![Page 2: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/2.jpg)
2
Agenda
Part 1: STM Overview
– Introduction
– Language Constructs and Semantics
– Design space
Part 2: STM Implementation
– Runtime
– Compiler
– Performance
![Page 3: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/3.jpg)
3
Concurrent Programming Today
•Mutual exclusion locks (Java monitors, pthread locks, etc.) used for concurrency control
– Coarse-grained locking limits concurrency
– Fine-grained locking is hard: composability, possibility of deadlocks, etc.
•Transactional Memory (TM) offers an alternative
![Page 4: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/4.jpg)
4
Designing Map Structure
•Operations
get(Key k) { seqGet(k); }
put(Key k, Value v) { seqPut(k, v); }
remove(Key k) { seqRemove(k); }
•Concurrent callers
T1: m.get(k);
T2: m.put(k, v);
T3: m.remove(k);
• How to make it thread-safe?
![Page 5: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/5.jpg)
5
ConcurrentMap Class
synchronized
Value get(Key k) {
  return seqGet(k);
}
synchronized
void put(Key k, Value v) {
  seqPut(k, v);
}
synchronized
void remove(Key k) {
  seqRemove(k);
}
What if the workload is mostly read-only?
![Page 6: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/6.jpg)
6
Refined ConcurrentMap Class
Value get(Key k) {
// try unsynchronized
Value tmp = seqGet(k);
if (tmp != null) return tmp;
else synchronized(this) {
// possible interference
return seqGet(k);
} }
void put(Key k, Value v) {
synchronized(this) {
seqPut(k, v);
} }
void remove(Key k) {
synchronized(this) {
seqRemove(k);
} }
![Page 7: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/7.jpg)
7
Actual Code
public Object get(Object key) {
  int hash = hash(key);
  // Try first without locking...
  Entry[] tab = table;
  int index = hash & (tab.length - 1);
  Entry first = tab[index];
  Entry e;
  for (e = first; e != null; e = e.next) {
    if (e.hash == hash && eq(key, e.key)) {
      Object value = e.value;
      if (value != null) return value;
      else break;
    }
  }
  …
  // Recheck under synch if key not there or interference
  Segment seg = segments[hash & SEGMENT_MASK];
  synchronized(seg) {
    tab = table;
    index = hash & (tab.length - 1);
    Entry newFirst = tab[index];
    if (e != null || first != newFirst) {
      for (e = newFirst; e != null; e = e.next) {
        if (e.hash == hash && eq(key, e.key)) return e.value;
      }
    }
    return null;
  }
}
DO YOU REALLY WANT TO WRITE THIS KIND OF CODE?
![Page 8: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/8.jpg)
8
Composition
•Simple concurrent accesses work
•Consider concurrent value deposit, starting from map.get(k) == 100
T1:
int v1 = map.get(k);   // == 100
v1 += 10;              // == 110
map.put(k, v1);        // map.get(k) == 110
T2:
int v2 = map.get(k);   // == 100
v2 += 20;              // == 120
map.put(k, v2);        // map.get(k) == 120
•If both threads read 100 and T1's put lands after T2's put, the map ends at 110: T2's update IS LOST
•Fix: wrap each sequence in synchronized(map) { … }
Back to coarse-grained locking
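The lost update on this slide can be replayed deterministically. The following is a minimal C++ sketch, not code from the talk: `SyncMap`, `deposit`, and the two schedule functions are hypothetical names, and the interleaving of T1 and T2 is written out by hand in a single thread so that the outcome does not depend on timing.

```cpp
#include <cassert>
#include <map>
#include <mutex>
#include <string>

// Hypothetical stand-in for the slide's thread-safe map: each single
// operation takes the lock, so get() and put() are individually atomic.
class SyncMap {
public:
    int get(const std::string& k) {
        std::lock_guard<std::mutex> g(m_);
        return data_[k];
    }
    void put(const std::string& k, int v) {
        std::lock_guard<std::mutex> g(m_);
        data_[k] = v;
    }
    // Composite operation: the lock is held across the read and the write,
    // which is the coarse-grained fix the slide falls back to.
    void deposit(const std::string& k, int amount) {
        assert(amount >= 0);
        std::lock_guard<std::mutex> g(m_);
        data_[k] += amount;
    }
private:
    std::mutex m_;
    std::map<std::string, int> data_;
};

// Replays the problematic schedule: both "threads" read 100 before either writes.
int lost_update_schedule() {
    SyncMap map;
    map.put("k", 100);
    int v1 = map.get("k");   // T1 reads 100
    int v2 = map.get("k");   // T2 reads 100
    v2 += 20;
    map.put("k", v2);        // T2 writes 120
    v1 += 10;
    map.put("k", v1);        // T1 writes 110, overwriting T2's deposit
    return map.get("k");
}

// Same two deposits, each performed as one atomic step.
int atomic_deposit_schedule() {
    SyncMap map;
    map.put("k", 100);
    map.deposit("k", 10);
    map.deposit("k", 20);
    return map.get("k");
}
```

Individually atomic operations do not compose: the first schedule ends at 110, the second at 130.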
![Page 9: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/9.jpg)
9
TM Approach
get(Key k) { __tm_atomic { seqGet(k); } }
put(Key k, Value v) { __tm_atomic { seqPut(k, v); } }
remove(Key k) { __tm_atomic { seqRemove(k); } }
__tm_atomic {
  int v = map.get(k);
  v += amount;
  map.put(k, v);
}
Let TM system take care of the rest
![Page 10: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/10.jpg)
10
Agenda
Part 1: STM Overview
– Introduction
– Language Constructs and Semantics
– Design space
Part 2: STM Implementation
– Runtime
– Compiler
– Performance
![Page 11: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/11.jpg)
11
Managed vs. Unmanaged STM
• Same core semantics and language constructs (and algorithms)
• Managed (e.g. Java, .NET)
– Controlled execution of native code
– Dynamic compilation
• Unmanaged (e.g. C, C++)
– Problem with legacy binaries
– Have to know upfront if code is executed transactionally
![Page 12: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/12.jpg)
12
Atomic Blocks == Transactions
•Originally a database concept
•Transactional executions
– Atomic
– Consistent
– Isolated
– Durable
•Serial vs. serializable
•Serializable: appearance of serial execution
![Page 13: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/13.jpg)
13
Serial Execution
int x = 0; int y = 0;
T1:
__tm_atomic {
  int tmp1 = x;
  int tmp2 = y;
}
T2:
__tm_atomic {
  x = 42;
  y = 42;
}
T1 before T2: tmp1 == 0, tmp2 == 0
T2 before T1: tmp1 == 42, tmp2 == 42
BOTH RESULTS CORRECT
![Page 14: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/14.jpg)
14
Serializable Execution
int x = 0; int y = 0;
T1:
__tm_atomic {
  int tmp1 = x;
  int tmp2 = y;
}
T2:
__tm_atomic {
  x = 42;
  y = 42;
}
The operations of T1 and T2 interleave, yet x == 42, y == 42, tmp1 == 42, tmp2 == 42
BOTH RESULTS THE SAME DESPITE INTERLEAVING
![Page 15: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/15.jpg)
15
Non-Serializable Execution
int x = 0; int y = 0;
T1: __tm_atomic { int tmp1 = x; int tmp2 = y; }
T2: __tm_atomic { x = 42; y = 42; }
Interleaving:
T1: int tmp1 = x;   // == 0
T2: x = 42;
T2: y = 42;
T1: int tmp2 = y;   // == 42
tmp1 == 0 and tmp2 == 42: DIFFERENT FROM ANY SERIAL execution
! CONFLICT ! ROLL BACK
TM’s role is to “fix” conflicting executions
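The distinction between the last three slides can be checked mechanically for this two-transaction example. The sketch below is an illustration under stated assumptions, not part of the talk: `Outcome`, `serial_outcomes`, and `is_serializable` are hypothetical names, and the check only covers the reader T1 and writer T2 used on the slides.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Outcome observed by the reader T1: the values it saw for x and y.
using Outcome = std::pair<int, int>;

// All outcomes reachable by running T1 and T2 one after the other.
// T2 is the writer { x = 42; y = 42; }; x and y start at 0.
std::vector<Outcome> serial_outcomes() {
    std::vector<Outcome> out;
    int x = 0, y = 0;
    out.push_back({x, y});   // T1 before T2: reads the initial values
    x = 42; y = 42;          // T2 commits
    out.push_back({x, y});   // T1 after T2: reads T2's values
    return out;
}

// An interleaved execution is acceptable only if T1's observation
// matches what some serial order would have produced.
bool is_serializable(Outcome observed) {
    for (const Outcome& o : serial_outcomes())
        if (o == observed) return true;
    return false;
}
```

With this check, (0, 0) and (42, 42) are accepted while the mixed observation (0, 42) is rejected, which is the case the TM system has to detect and roll back.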
![Page 16: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/16.jpg)
16
Transaction Nesting
•Required for composability
•Open nesting
– Results exposed upon inner transaction commit
– Compensating actions used upon outer transaction abort
– May lead to serializability violations
•Closed nesting
– Computation results exposed only upon outermost transaction commit
– Transactions can be flattened: inner transaction is semantically a no-op
![Page 17: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/17.jpg)
17
Open Nesting
int x = 0;
void inc() { x++; }
void dec() { x--; }
T1:
__tm_atomic {
  __tm_atomic { inc(); }   // register dec()
}
dec();                     // compensating action, run if the outer transaction aborts
T2:
__tm_atomic { if (x == 1) { … } }
• Conditional can be entered after inner commit
• Effect is undone but T2 has seen the result!
![Page 18: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/18.jpg)
18
Closed Nesting
int x = 0;
void inc() { x++; }
void dec() { x--; }
T1:
__tm_atomic {
  __tm_atomic { inc(); }
}
T2:
__tm_atomic { if (x == 1) { … } }
• Conditional can be entered only after outermost commit
![Page 19: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/19.jpg)
19
Flatten Or Not To Flatten?
__tm_atomic {
…
…
}
__tm_atomic {
}
potential conflict
ROLL BACK
ROLL BACK
![Page 20: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/20.jpg)
More on Execution Semantics
• Transactions are serializable, but
• The notion comes from database world where all actions are transactional
• What about non-transactional code?
20
![Page 21: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/21.jpg)
Problematic Behavior
int * p = &x;
T1:
__tm_atomic {
  if (p != NULL)   // true
    tmp = *p;      // p == null by now: NULL POINTER
}
T2 (non-transactional):
p = null;
Should this behavior be allowed?
Yes: this program is buggy, p = null should be inside a transaction
No: transactions should be atomic no matter what
21
![Page 22: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/22.jpg)
Two Points of View on Atomicity
•Weak atomicity
– Transactions serializable with respect to other transactions
•Strong atomicity
– Transactions serializable with respect to all memory accesses
STRENGTH (increasing): WEAK ATOMICITY, STRONG ATOMICITY
22
![Page 23: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/23.jpg)
Weak Atomicity
• Non-transactional accesses bypass STM access protocol
– Non-transactional code remains un-instrumented
– Most STMs behave this way
• Requires segregation of transactional and non-transactional data
– Hard to enforce
• Otherwise, behavior depends on implementation
– Unexpected results can be observed
23
![Page 24: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/24.jpg)
Non-Repeatable Read
int x = 0;
T1:
__tm_atomic {
  tmp1 = x;   // == 0
  tmp2 = x;   // == 0, or == 42 if T2 writes in between
}
T2 (non-transactional):
x = 42;
Expected: tmp1 == tmp2
Possible under weak atomicity: tmp1 != tmp2
•Non-txn code can affect transactional computation
24
![Page 25: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/25.jpg)
Dirty Read
int x = 0;
T1:
__tm_atomic {
  x++;   // x == 1
  x++;   // x == 2
}
T2 (non-transactional):
tmp = x;
Expected: tmp is even (== 0 or == 2)
Possible under weak atomicity: tmp is odd (== 1)
•Txn code can leak intermediate results to non-transactional computation
25
![Page 26: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/26.jpg)
Strong Atomicity
•Non-transactional accesses turned into micro-transactions
– Reads and writes block until write gets committed
– Interleaved writes can invalidate a transaction
•Avoids all undesirable behaviors of weak atomicity, but
•All code needs to be instrumented
26
![Page 27: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/27.jpg)
Non-Repeatable Read
int x = 0;
T1:
__tm_atomic {
  tmp1 = x;   // == 0
  tmp2 = x;   // T2's write intervenes: ROLL BACK
}
T2 (non-transactional write, executed as a micro-transaction):
__tm_atomic { x = 42; }
•Write by T2 invalidates T1’s transaction
27
![Page 28: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/28.jpg)
Dirty Read
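int x = 0;
T1:
__tm_atomic {
  x++;   // x == 1
  x++;   // x == 2
}
T2 (non-transactional read, executed as a micro-transaction):
__tm_atomic { tmp = x; }   // BLOCK until T1 commits, then tmp == 2
•Blocking effectively reschedules and serializes non-transactional operations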
28
![Page 29: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/29.jpg)
Are We Done?
•Overhead of strong atomicity can be huge (up to 10x slowdown)
•Non-txn code instrumentation may be problematic (precompiled libraries, system calls, etc.)
•Can we find an in-between solution?
WEAK ATOMICITY
STRENGTH
STRONG ATOMICITY
SGLA
29
![Page 30: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/30.jpg)
Single Global Lock Atomicity
• Transactions execute as if protected by a single global lock
__tm_atomic { S; }   behaves like   synchronized(m) { S; }
•Matches intuition of weakly atomic STM
– Transactions are serialized w.r.t. each other
– And, no surprises compared to locks
• STM must provide additional guarantees
– Consistency
– Privatization safety
30
![Page 31: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/31.jpg)
31
Consistency
int *ptr = NULL;
int x = 0; int y = 1;
T1:
__tm_atomic {
  int t1 = x;      // == 0
  …
  int t2 = x;      // == 1 if T2 commits in between
  if (t1 != t2)    // cannot happen under locks
    *ptr = x;      // NULL POINTER
}
T2:
__tm_atomic {
  x = y;           // x == 1
}
Lock-based equivalent of T1:
lock(mutex);
int t1 = x;
…
int t2 = x;
if (t1 != t2)
  *ptr = x;
unlock(mutex);
Lock-based equivalent of T2:
lock(mutex);
x = y;
unlock(mutex);
![Page 32: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/32.jpg)
32
Privatization Safety
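Initially head points to a node with x == 0, y == 0 and a next pointer
T1:
__tm_atomic {
  t1 = head;
  if (t1)
    t1->x = t1->y = 1;
}
T2:
__tm_atomic {
  t2 = head;
  head = t2->next;
  t2->next = NULL;
}
priv = t2->x;
…
assert (priv == t2->y);
Lock-based equivalent of T1:
lock(mutex);
t1 = head;
if (t1)
  t1->x = t1->y = 1;
unlock(mutex);
Lock-based equivalent of T2:
lock(mutex);
t2 = head;
head = t2->next;
t2->next = NULL;
unlock(mutex);
priv = t2->x;
…
assert (priv == t2->y);
Without privatization safety T2 can see t2->x and t2->y disagree (one == 1, the other == 0) while T1 is still writing the privatized node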
![Page 33: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/33.jpg)
33
Agenda
Part 1: STM Overview
– Introduction
– Language Constructs and Semantics
– Design space
Part 2: STM Implementation
– Runtime
– Compiler
– Performance
![Page 34: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/34.jpg)
34
Transactional Execution Modes
Optimistic
– Lock data on write (exclusive write locks)
– Record reads
– Release write locks and validate reads on commit
– Pros: cache effects; no read locking cost
– Cons: providing privatization and consistency incurs extra cost; no filtering
Pessimistic
– Lock data on write (exclusive write locks)
– Lock data on read (shared read locks)
– Release read and write locks on commit
– Pros: privatization-safety and consistency for free; filtering
– Cons: cache effects; additional read locking cost
•Obstinate: pessimistic transaction that wins all conflicts
![Page 35: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/35.jpg)
35
Write Buffering vs. In-Place Update
Write Buffering (a.k.a. Lazy Versioning)
– Write to private buffer
– Copy to memory on commit
– Lazy Locking (acquire locks on commit) or Eager Locking (acquire locks on access)
– Pros: fast abort
– Cons: slow commit; reads have to search buffer
In-Place Update (a.k.a. Eager Versioning)
– Directly write shared memory
– Record old values in an undo log
– Eager Locking: acquire write-locks on write
– Pros: fast commit; direct reads
– Cons: slow abort
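The trade-off between the two versioning schemes shows up directly in where the work lands. The following C++ sketch is illustrative only: `UndoLogTxn` and `WriteBufferTxn` are hypothetical classes, they handle `int` locations only, and all locking and conflict detection is deliberately left out.

```cpp
#include <cassert>
#include <unordered_map>
#include <utility>
#include <vector>

// In-place update (eager versioning): write shared memory directly and
// remember the old value so that an abort can roll back.
class UndoLogTxn {
public:
    void write(int* addr, int value) {
        assert(addr != nullptr);
        undo_.push_back({addr, *addr});  // log the old value first
        *addr = value;                   // then update in place
    }
    int read(const int* addr) const { return *addr; }  // direct read
    void commit() { undo_.clear(); }                   // fast: drop the log
    void abort() {                                     // slow: replay the log backwards
        for (auto it = undo_.rbegin(); it != undo_.rend(); ++it)
            *it->first = it->second;
        undo_.clear();
    }
private:
    std::vector<std::pair<int*, int>> undo_;
};

// Write buffering (lazy versioning): keep writes private until commit.
class WriteBufferTxn {
public:
    void write(int* addr, int value) { buffer_[addr] = value; }
    int read(int* addr) const {          // must search the buffer first
        auto it = buffer_.find(addr);
        return it != buffer_.end() ? it->second : *addr;
    }
    void commit() {                      // slow: copy the buffer to memory
        for (const auto& entry : buffer_) *entry.first = entry.second;
        buffer_.clear();
    }
    void abort() { buffer_.clear(); }    // fast: discard the buffer
private:
    std::unordered_map<int*, int> buffer_;
};
```

Rolling back the undo log in reverse order matters when the same location is written more than once, since the oldest logged value must be the one that survives.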
![Page 36: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/36.jpg)
36
Conflict Detection Granularity
Object-based (Java/C#)
class Foo { int x; int y; }
Object layout: vtbl, metadata, x, y
Word-based (cacheline-based) (C/C++)
struct Foo { int x; int y; }
Data layout: x, y
Owner Table: separate table of metadata entries
37
Agenda
Part 1: STM Overview
– Introduction
– Language Constructs and Semantics
– Design space
Part 2: STM Implementation
– Runtime
– Compiler
– Performance
![Page 38: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/38.jpg)
38
Intel C/C++ STM http://whatif.intel.com (NEW RELEASE IN Q3 2010)
•Based on Intel’s product compiler
•Features
• Consistency and privatization safety preserving closed-nested atomic blocks (__tm_atomic) to support SGLA semantics
• User abort (__tm_abort) for failure atomicity
• Transaction retry (__tm_retry) for condition synchronization
• Multiple transactional execution modes: optimistic and pessimistic STM, obstinate
• Serial execution mode (for I/O and calls to legacy binaries)
• TM support for C++ : virtual functions, (multiple) inheritance, function and class templates, exceptions
![Page 39: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/39.jpg)
39
System Architecture
transactional C/C++
Intel C/C++ compiler
multicore system
C/C++ support
APPLICATION
LANGUAGE SUPPORT
TM RUNTIME
HARDWARE
![Page 40: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/40.jpg)
Runtime Overview
• In-place updates
• Cacheline-level conflict detection granularity
• Information for rollback recorded in undo log
• Reads recorded in read set:
– For validation (optimistic mode)
– For locking/unlocking (pessimistic and obstinate modes)
• Writes recorded in write set for locking/unlocking (all transactional modes)
• Two-phase locking (2PL) protocol
40
![Page 41: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/41.jpg)
Per thread metadata
•Transaction Descriptor
–Read set: validation or unlocking
–Write set: unlocking
–Undo log: rollback
–… local timestamp, execution mode …
•Transaction Memento
–Checkpoint of machine and transaction state
–For nesting & partial rollback
41
![Page 42: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/42.jpg)
Transaction Record (TxnRec)
•Tracks transactional state of shared data
–For optimistic transactions (OptTxnRec)
• Unlocked: contains timestamp (more on this later!)
• Write-locked: contains transaction descriptor of lock owner
–For pessimistic transactions (PessTxnRec)
• Unlocked: contains special mark
• Read-locked: contains info about all readers
• Write-locked: contains info about single writer
•Stored in the owner table mapping each memory word to a single transaction record
42
![Page 43: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/43.jpg)
Optimistic STM Algorithm
•Timestamp-based
–Global Timestamp (G_TS): incremented every time a writing transaction commits
–Local Timestamp (L_TS): records last time transaction was valid
–On transactional read of shared data, record the timestamp associated with its OptTxnRec in the transaction’s read set
–On transaction termination, update local timestamps and write them to the OptTxnRec-s of all data updated by this transaction
•Validation for serializability and consistency
•Quiescence for privatization safety
43
![Page 44: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/44.jpg)
44
Consistency
int *ptr = NULL;
int x = 0; int y = 1;
T1:
__tm_atomic {
  int t1 = x;      // == 0
  …
  int t2 = x;      // == 1 if T2 commits in between
  if (t1 != t2)    // cannot happen under locks
    *ptr = x;      // NULL POINTER
}
T2:
__tm_atomic {
  x = y;           // x == 1
}
Lock-based equivalent of T1:
lock(mutex);
int t1 = x;
…
int t2 = x;
if (t1 != t2)
  *ptr = x;
unlock(mutex);
Lock-based equivalent of T2:
lock(mutex);
x = y;
unlock(mutex);
![Page 45: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/45.jpg)
Validation
•For every entry in read set, abort transaction if recorded timestamp greater than local timestamp
•Performed on commit to guarantee serializability
•Performed on read to guarantee consistency (when data’s OptTxnRec > local timestamp)
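The validation rule can be sketched in a few lines of C++. This is one possible reading of the slide, not the runtime's actual code: `OptTxn`, `validate`, and `read` are hypothetical names, each OptTxnRec is modelled as a bare timestamp, the sketch re-reads each record's current timestamp during validation, and the step that advances L_TS to G_TS after a successful validation is an assumption based on the description of L_TS as the last time the transaction was valid. Lock bits and concurrency are omitted.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Timestamp = std::uint64_t;

struct OptTxn {
    Timestamp local_ts = 0;                   // L_TS
    std::vector<const Timestamp*> read_set;   // OptTxnRec-s read so far

    // True if no record in the read set carries a timestamp newer than L_TS.
    bool validate() const {
        for (const Timestamp* rec : read_set)
            if (*rec > local_ts) return false;  // overwritten since we were last valid
        return true;
    }

    // Read barrier; returns false if the transaction must abort.
    bool read(const Timestamp* rec, Timestamp global_ts) {
        assert(rec != nullptr);
        if (*rec > local_ts) {              // record is newer than L_TS
            if (!validate()) return false;  // an earlier read is stale
            local_ts = global_ts;           // still valid: advance L_TS (assumption)
        }
        read_set.push_back(rec);
        return true;
    }
};
```

In the consistency example, T1 reads x at L_TS = 0, T2 commits and stamps x with 1, and T1's second read of x triggers validation, finds 1 > 0, and aborts before the inconsistent branch can run.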
45
![Page 46: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/46.jpg)
Validation
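T1:
__tm_atomic {
  int t1 = x;
  …
  int t2 = x;
  if (t1 != t2)   // cannot happen
    *ptr = x;     // NULL POINTER
}
T2:
__tm_atomic {
  x = y;
}
G_TS: 0, then 1 after T2 commits
OptTxnRec-s: x = 0, then 1 after T2 commits; y = 0
T1: L_TS = 0, R_SET = <&x>
T2: L_TS = 0, W_SET = <&x>, R_SET = <&y>; L_TS = 1 after commit
T1's second read of x finds OptTxnRec 1 > L_TS 0, validation fails: ABORT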
46
![Page 47: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/47.jpg)
47
Privatization Safety
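Initially head points to a node with x == 0, y == 0 and a next pointer
T1:
__tm_atomic {
  t1 = head;
  if (t1)
    t1->x = t1->y = 1;
}
T2:
__tm_atomic {
  t2 = head;
  head = t2->next;
  t2->next = NULL;
}
priv = t2->x;
…
assert (priv == t2->y);
Lock-based equivalent of T1:
lock(mutex);
t1 = head;
if (t1)
  t1->x = t1->y = 1;
unlock(mutex);
Lock-based equivalent of T2:
lock(mutex);
t2 = head;
head = t2->next;
t2->next = NULL;
unlock(mutex);
priv = t2->x;
…
assert (priv == t2->y);
Without privatization safety T2 can see t2->x and t2->y disagree (one == 1, the other == 0) while T1 is still writing the privatized node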
![Page 48: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/48.jpg)
Quiescence
•Maintain list of active transactions containing their current local timestamp
•Implicit infinite timestamp for pessimistic transactions
•Committing transaction waits for all active transactions whose timestamp is smaller than its own timestamp
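The waiting condition reduces to a comparison over the list of active transactions. A minimal sketch follows, assuming the list is available as a plain vector of local timestamps; `must_wait` and `kInfinite` are hypothetical names, and a real runtime would poll a shared, concurrently updated list rather than a snapshot.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <vector>

using Timestamp = std::uint64_t;

// Pessimistic transactions carry an implicit infinite timestamp,
// so a committer never waits for them.
constexpr Timestamp kInfinite = std::numeric_limits<Timestamp>::max();

// True while some active transaction has a timestamp smaller than the
// committing transaction's own timestamp.
bool must_wait(Timestamp commit_ts, const std::vector<Timestamp>& active) {
    for (Timestamp ts : active)
        if (ts < commit_ts) return true;
    return false;
}
```

In the privatization example this is what holds T2 back: T2 commits with a newer timestamp while T1 is still active with an older one, so T2 does not reach its non-transactional reads until T1 has finished or advanced its timestamp.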
48
![Page 49: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/49.jpg)
Quiescence
T1:
__tm_atomic {
  t1 = head;
  if (t1)
    t1->x = t1->y = 1;
}
T2:
__tm_atomic {
  t2 = head;
  head = t2->next;
  t2->next = NULL;
}
priv = t2->x;
…
assert (priv == t2->y);
G_TS = 0 1
T1 T2
L_TS = L_TS =
T1 T2
01
WAIT
0
2
49
![Page 50: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/50.jpg)
50
Unified STM
• Both optimistic and pessimistic readers can co-exist
• Owner table is shared and contains both OptTxnRec and PessTxnRec
• Read barriers:
– Optimistic: reads only OptTxnRec
– Pessimistic: reads only PessTxnRec
• Write barriers need to write both TxnRec-s
![Page 51: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/51.jpg)
51
Owner Table for Unified STM
typedef uintptr_t TxnRec;
typedef struct OwnerTableEntryS {
  TxnRec optimistic;
  TxnRec pessimistic;
} OwnerTableEntry;
Owner Table: array of entries, each holding a PessTxnRec and an OptTxnRec
![Page 52: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/52.jpg)
52
OptTxnRec
Bit 0: Lock bit
  0: Write-locked (Exclusive)
  1: Unlocked (Shared)
Bits 31 … 1: Upper bits
  Owner TxnDesc upper bits (when write-locked)
  Or timestamp upper bits (when unlocked)
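The OptTxnRec layout lends itself to a handful of bit helpers. The sketch below is an assumption-laden illustration: the helper names and the `TxnDesc` struct are hypothetical, and storing the timestamp shifted left by one bit is only one way to keep bit 0 free for the lock bit; the slide does not say how the runtime encodes it.

```cpp
#include <cassert>
#include <cstdint>

using TxnRec = std::uintptr_t;
struct TxnDesc { int id; };  // hypothetical; its alignment keeps bit 0 of its address at 0

constexpr TxnRec kLockBit = 1;  // 1: unlocked (shared), 0: write-locked (exclusive)

inline bool is_unlocked(TxnRec r) { return (r & kLockBit) != 0; }

// Unlocked record: timestamp in the upper bits, lock bit set.
inline TxnRec make_unlocked(std::uintptr_t ts) { return (ts << 1) | kLockBit; }
inline std::uintptr_t timestamp_of(TxnRec r) {
    assert(is_unlocked(r));
    return r >> 1;
}

// Write-locked record: the owner's descriptor address, lock bit clear.
inline TxnRec make_locked(const TxnDesc* owner) {
    auto r = reinterpret_cast<TxnRec>(owner);
    assert((r & kLockBit) == 0);
    return r;
}
inline const TxnDesc* owner_of(TxnRec r) {
    assert(!is_unlocked(r));
    return reinterpret_cast<const TxnDesc*>(r);
}
```

Packing both states into one word lets a single load tell a reader whether the location is locked and, if not, which timestamp to compare against.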
![Page 53: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/53.jpg)
53
PessTxnRec
Bit 0: Lock bit
  0: Write-locked (Exclusive)
  1: Unlocked (Shared)
Bit 1: Upgrading bit
  0: no upgrading request
  1: upgrading requested
Bits 31 … 2: Owner bits
  Each bit represents a pessimistic transaction
  Locked if non-zero
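A possible encoding of the PessTxnRec fields is sketched below in C++. The constant and function names are hypothetical, a 32-bit record is assumed from the bit numbering on the slide, and treating the value with only the lock bit set as the unlocked "special mark" is an assumption. Atomic compare-and-swap updates, which a real runtime would need, are left out.

```cpp
#include <cassert>
#include <cstdint>

using PessTxnRec = std::uint32_t;

constexpr PessTxnRec kLockBit      = 1u << 0;  // 1: unlocked/shared, 0: write-locked
constexpr PessTxnRec kUpgradingBit = 1u << 1;  // 1: upgrade to write lock requested
constexpr int kOwnerShift = 2;                 // owner bits occupy bits 31..2
constexpr int kMaxTxns = 30;                   // one owner bit per pessimistic transaction

inline PessTxnRec owner_bit(int txn) {
    assert(txn >= 0 && txn < kMaxTxns);
    return 1u << (kOwnerShift + txn);
}
inline PessTxnRec owners(PessTxnRec r) { return r >> kOwnerShift; }

// Shared (read) lock: the lock bit stays 1 and the reader's owner bit is set.
inline PessTxnRec add_reader(PessTxnRec r, int txn) { return r | owner_bit(txn); }
inline PessTxnRec remove_reader(PessTxnRec r, int txn) { return r & ~owner_bit(txn); }

inline bool is_read_locked(PessTxnRec r)  { return (r & kLockBit) != 0 && owners(r) != 0; }
inline bool is_write_locked(PessTxnRec r) { return (r & kLockBit) == 0 && owners(r) != 0; }

// Exclusive lock: lock bit clear, a single owner bit identifies the writer.
inline PessTxnRec make_write_locked(int txn) { return owner_bit(txn); }
```

One bit per reader keeps read locking to a single bitwise update, at the price of capping the number of concurrent pessimistic transactions at the number of owner bits.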
![Page 54: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/54.jpg)
54
xxx … xxxxx0000 … 0000111110
Unified STM Algorithm
T1 (PESS)
__tm_atomic { r1 = x; r3 = x;}
T2 (OPT)
__tm_atomic {
r2 = x;
x = r2 +1;
}
0
x T1
PessTxnRec OptTxnRec
T2
0 000 … 000001 000 … 000
![Page 55: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/55.jpg)
Agenda
Part 1: STM Overview
• Introduction
• Language Constructs and Semantics
• Design space
Part 2: STM Implementation
• Runtime
• Compiler
• Performance
55
![Page 56: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/56.jpg)
56
Compiler/Runtime Interaction
• Decouple compiler from the runtime
– Enables use of different library implementations with the same compiler (e.g. in-place updates vs. write-buffering)
– Enables use of different algorithms within the library itself (e.g. optimistic vs. pessimistic)
• Calls to the runtime realized through a vtable-like mechanism
• Compiler/runtime ABI:
– General: same code used for different algorithms
– Rich: to enable additional optimizations
![Page 57: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/57.jpg)
57
ABI: Txn Begin and Commit
_ITM_transaction * _ITM_getTransaction()
– Returns (creates if necessary) a transaction descriptor
uint32 _ITM_beginTransaction(_ITM_transaction* td, uint32 props)
– Saves machine state
– Passes information to the runtime via props (e.g. pr_multiwayCode: both instrumented and uninstrumented code is available)
– Can return more than once (e.g. on abort); possible return values: a_saveLiveVariables, a_restoreLiveVariables
void _ITM_commitTransaction(_ITM_transaction *td)
![Page 58: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/58.jpg)
58
ABI: Read and Write Barriers
• Templates:
void _ITM_Wtypesig(_ITM_transaction* td, type *addr, type val)
type _ITM_Rtypesig(_ITM_transaction* td, type *addr)
typesig:
  U[1248]: unsigned int
  [FDE]: float, double, long…
•Examples:
_ITM_WF(_ITM_transaction *td, float *addr, float val);
_ITM_RU4(_ITM_transaction *td, uint32 *addr);
![Page 59: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/59.jpg)
59
Simple Atomic Block Translated
Source:
__tm_atomic {
  uint32Val = 42;
}
Translated:
uint32 props = pr_multiwayCode;
_ITM_transaction *td = _ITM_getTransaction();
uint32 doWhat = _ITM_beginTransaction(td, props);
if (doWhat & a_restoreLiveVariables) {
  /* code to restore live local variables */
}
if (doWhat & a_saveLiveVariables) {
  /* code to save live local variables */
}
_ITM_WU4(td, &uint32Val, 42);
_ITM_commitTransaction(td);
! CONFLICT !: _ITM_beginTransaction returns again and the block re-executes
![Page 60: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/60.jpg)
60
User Abort and Retry Translated
uint32Val = 42;
}
uint32 props = pr_multiwayCode;
_ITM_transaction *td = _ITM_getTransaction();
uint32 doWhat = _ITM_beginTransaction(td, props);
if (doWhat & a_restoreLiveVariables) {
/* code to restore live local variables */
}
if (doWhat & a_saveLiveVariables) {
/* code to save live local variables */
}
_ITM_WU4(td, &uint32Val, 42);
_ITM_commitTransaction(td);
__tm_atomic {
if (!_ITM_RU(td, &cond))
_ITM_abortTransaction(td, userRetry);
if (error) __tm_abort;
if (cond) __tm_retry;
if (_ITM_RU(td, &error))
_ITM_abortTransaction(td, userAbort);
if (doWhat & a_abortTransaction) goto ABORT_TXN;
ABORT_TXN:
![Page 61: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/61.jpg)
61
Optimizations for Transactions
•Standard optimizations
– Careful IR design enables existing optimizations
• Partial redundancy elimination (PRE), dead code elimination, …
– Subtle in presence of nesting
•STM-specific optimizations
– No instrumentation when executing in serial mode
– Conversion of generic STM read/write barriers to cheaper variants
– Also:
• Flattening nested transactions if no user abort is inside
• Barrier elimination for __thread (thread local) or const data
![Page 62: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/62.jpg)
Un-instrumented Serial Mode
Source:
__tm_atomic {
  if (flag) {
    printf("Hello!");
  }
}
Translated:
uint32 props = pr_multiwayCode;
_ITM_transaction *td = _ITM_getTransaction();
uint32 doWhat = _ITM_beginTransaction(td, props);
if (doWhat & a_restoreLiveVariables) {
  /* code to restore live local variables */
}
if (doWhat & a_saveLiveVariables) {
  /* code to save live local variables */
}
if (doWhat & a_instrumentedCode) {
  if (_ITM_RU4(td, &flag)) {
    _ITM_changeTransactionMode(td, modeSerialIrrevocable);
    printf("Hello!");
  }
} else {
  if (flag) printf("Hello!");
}
_ITM_commitTransaction(td);
62
![Page 63: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/63.jpg)
ABI: Optimized Barrier Templates
•After read or after write (e.g. eliminate redundant locking operations)
void _ITM_W{aRW}typesig(_ITM_transaction* td, type *addr, type val)
type _ITM_R{aRW}typesig(_ITM_transaction* td, type *addr)
•Read-for-write (e.g. acquire write lock early and eliminate read lock)
type _ITM_RfWtypesig(_ITM_transaction* td, type *addr)
63
![Page 64: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/64.jpg)
Barrier Optimization Example
__tm_atomic {
  if (x < N) {
    x++;
  }
}

…
t1 = _ITM_RU4(td, &x);
if (t1 < N) {
  t2 = _ITM_RU4(td, &x);
  _ITM_WU4(td, &x, t2+1);
}
…

…
t1 = _ITM_RU4(td, &x);
if (t1 < N) {
  _ITM_WU4(td, &x, t1+1);
}
…

…
t1 = _ITM_RU4(td, &x);
if (t1 < N) {
  _ITM_WaRU4(td, &x, t1+1);
}
…

…
t1 = _ITM_RfWU4(td, &x);
if (t1 < N) {
  _ITM_WaWU4(td, &x, t1+1);
}
…
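The four generated versions above differ only in which barrier flavor each access uses. One rough way to see why that matters is a toy cost model, sketched here in C. Everything in it is an assumption made for illustration: the `toy_*` names are hypothetical stand-ins (not the real `_ITM_*` entry points), and the model charges one ownership lookup per generic barrier and one lock operation per acquisition or upgrade. A real runtime's costs will differ.

```c
#include <stdint.h>

/* Toy cost model for a single shared variable. Hypothetical names, not the ABI. */
typedef struct {
    int read_locked;   /* transaction holds the read lock  */
    int write_locked;  /* transaction holds the write lock */
    int lookups;       /* ownership lookups performed      */
    int lock_ops;      /* lock acquisitions and upgrades   */
} toy_txn;

/* Generic read barrier: look up ownership, take the read lock if needed. */
uint32_t toy_R(toy_txn *t, const uint32_t *addr) {
    t->lookups++;
    if (!t->read_locked && !t->write_locked) { t->read_locked = 1; t->lock_ops++; }
    return *addr;
}

/* Generic write barrier: look up ownership, take or upgrade to the write lock. */
void toy_W(toy_txn *t, uint32_t *addr, uint32_t val) {
    t->lookups++;
    if (!t->write_locked) { t->write_locked = 1; t->lock_ops++; }
    *addr = val;
}

/* Write-after-read: read lock known to be held, so upgrade without a lookup. */
void toy_WaR(toy_txn *t, uint32_t *addr, uint32_t val) {
    t->write_locked = 1;
    t->lock_ops++;
    *addr = val;
}

/* Read-for-write: take the write lock up front instead of a read lock. */
uint32_t toy_RfW(toy_txn *t, const uint32_t *addr) {
    t->lookups++;
    if (!t->write_locked) { t->write_locked = 1; t->lock_ops++; }
    return *addr;
}

/* Write-after-write: write lock known to be held, only the store remains. */
void toy_WaW(toy_txn *t, uint32_t *addr, uint32_t val) {
    (void)t;
    *addr = val;
}

/* The four versions of: if (x < n) x++; */
void v1_naive(toy_txn *t, uint32_t *x, uint32_t n) {
    uint32_t t1 = toy_R(t, x);
    if (t1 < n) { uint32_t t2 = toy_R(t, x); toy_W(t, x, t2 + 1); }
}
void v2_reuse_read(toy_txn *t, uint32_t *x, uint32_t n) {
    uint32_t t1 = toy_R(t, x);
    if (t1 < n) toy_W(t, x, t1 + 1);
}
void v3_write_after_read(toy_txn *t, uint32_t *x, uint32_t n) {
    uint32_t t1 = toy_R(t, x);
    if (t1 < n) toy_WaR(t, x, t1 + 1);
}
void v4_read_for_write(toy_txn *t, uint32_t *x, uint32_t n) {
    uint32_t t1 = toy_RfW(t, x);
    if (t1 < n) toy_WaW(t, x, t1 + 1);
}
```

Under this model all four versions leave the same value in `x`, while the lookup count drops from 3 to 1 and the lock-operation count from 2 to 1 between the first and last version.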
![Page 65: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/65.jpg)
ABI: Undo and Commit Functions
• Programmers may register actions executed by the runtime on transaction termination
void _ITM_addUserCommitAction(_ITM_transaction *td, _ITM_userCommitFunction fn, _ITM_transactionId tid, void *arg)
void _ITM_addUserUndoAction(_ITM_transaction *td, _ITM_userUndoFunction fn, void *arg)
• Current transaction id
_ITM_transactionId _ITM_getTransactionId(_ITM_transaction *td)
(1: non-txn, 2: outer txn begin, ++: inner txn begin)
• Undo and commit actions can be used inside of function wrappers
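The idea of registering actions that the runtime runs on transaction termination can be sketched with a small registry. This is a simplified model, not the runtime's implementation: the `toy_*` names and the fixed capacity are hypothetical, nesting and transaction ids are ignored, and running undo actions newest-first is an assumption.

```c
#include <stddef.h>

#define TOY_MAX_ACTIONS 8   /* arbitrary capacity for this sketch */

typedef void (*toy_action_fn)(void *arg);

typedef struct { toy_action_fn fn; void *arg; } toy_action;

typedef struct {
    toy_action commit[TOY_MAX_ACTIONS]; size_t n_commit;
    toy_action undo[TOY_MAX_ACTIONS];   size_t n_undo;
} toy_actions;

/* Register fn(arg) to run if the transaction commits. Returns 0 on success. */
int toy_add_commit_action(toy_actions *t, toy_action_fn fn, void *arg) {
    if (t->n_commit == TOY_MAX_ACTIONS) return -1;
    t->commit[t->n_commit].fn = fn;
    t->commit[t->n_commit].arg = arg;
    t->n_commit++;
    return 0;
}

/* Register fn(arg) to run if the transaction aborts. Returns 0 on success. */
int toy_add_undo_action(toy_actions *t, toy_action_fn fn, void *arg) {
    if (t->n_undo == TOY_MAX_ACTIONS) return -1;
    t->undo[t->n_undo].fn = fn;
    t->undo[t->n_undo].arg = arg;
    t->n_undo++;
    return 0;
}

/* Commit: run commit actions in registration order, discard undo actions. */
void toy_commit(toy_actions *t) {
    for (size_t i = 0; i < t->n_commit; i++) t->commit[i].fn(t->commit[i].arg);
    t->n_commit = 0;
    t->n_undo = 0;
}

/* Abort: run undo actions newest first (assumed), discard commit actions. */
void toy_abort(toy_actions *t) {
    while (t->n_undo > 0) {
        t->n_undo--;
        t->undo[t->n_undo].fn(t->undo[t->n_undo].arg);
    }
    t->n_commit = 0;
}

/* Example action: increment the int counter passed through arg. */
void toy_bump(void *arg) { ++*(int *)arg; }
```

A wrapper for an allocator could, for instance, register an undo action that releases memory obtained inside the transaction, so that an abort does not leak it.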
![Page 66: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/66.jpg)
Transactional Function Wrappers
•Transparently replace a call to a non-transactional function with a call to its transactional version
•Transactional wrapper’s code:
– Un-instrumented
– Can use explicit calls to the runtime
•Intended use: implementation of library functions (e.g. transaction-aware memory management)
__declspec (tm_wrap(foo)) void fooTxn();
![Page 67: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/67.jpg)
Memory Management Risks
•Txn allocation, non-txn de-allocation
– Re-executions leading to multiple allocations but only one de-allocation operation
•Non-txn allocation, txn de-allocation
– Re-executions leading to the same region being de-allocated more than once
•Txn allocation, txn de-allocation
– Combination of two previous cases depending on when re-execution gets triggered
![Page 68: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/68.jpg)
Memory Management Algorithm
•Uses the function wrapper mechanism to take advantage of the existing allocators
•Allocation and de-allocation sites marked with tid
•Allocation creates an allocation record
– If allocation record exists on outer commit – remove it
– On abort – de-allocate and remove allocation record
•De-allocation removes allocation record
– De-allocate immediately if txn_id(de-alloc) <= txn_id(alloc)
– Otherwise, de-allocate on commit at the nesting level where condition holds
![Page 69: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/69.jpg)
Safe Memory Management
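p1 = malloc(size);      /* allocation record: p1, txn_id 1 */
tm_atomic {             /* txn_id 2 */
  p2 = malloc(size);    /* allocation record: p2, txn_id 2 */
  tm_atomic {           /* txn_id 3 */
    free(p2);           /* 3 > 2: defer until txn_id <= 2 */
    p3 = malloc(size);  /* allocation record: p3, txn_id 3 */
    p4 = malloc(size);  /* allocation record: p4, txn_id 3 */
  }
  free(p1);             /* 2 > 1: defer until txn_id <= 1 */
  free(p3);             /* 2 < 3: execute */
  tm_atomic {           /* txn_id 4 */
    free(p4);           /* 4 > 3: defer until txn_id <= 3 */
  }
}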
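The de-allocation rule (free immediately if txn_id(de-alloc) <= txn_id(alloc), otherwise defer to the nesting level where that condition holds) can be sketched as a small decision function. The type and function names are hypothetical, and the sketch covers only the decision, not the allocation records or the commit and abort handling.

```c
/* Outcome of a free() call made at a given transaction nesting id. */
typedef enum { TOY_FREE_NOW, TOY_FREE_DEFERRED } toy_free_decision;

typedef struct {
    toy_free_decision decision;
    unsigned defer_until;   /* free once txn_id <= this; equals the current id when immediate */
} toy_free_plan;

/* Apply the rule: immediate if txn_id(de-alloc) <= txn_id(alloc), else defer. */
toy_free_plan toy_plan_free(unsigned dealloc_txn_id, unsigned alloc_txn_id) {
    toy_free_plan p;
    if (dealloc_txn_id <= alloc_txn_id) {
        p.decision = TOY_FREE_NOW;
        p.defer_until = dealloc_txn_id;
    } else {
        p.decision = TOY_FREE_DEFERRED;
        p.defer_until = alloc_txn_id;   /* level at which the condition starts to hold */
    }
    return p;
}
```

Reading the slide's annotations as (de-alloc id, alloc id) pairs, `toy_plan_free(3, 2)`, `toy_plan_free(2, 1)` and `toy_plan_free(4, 3)` defer until txn_id <= 2, 1 and 3 respectively, while `toy_plan_free(2, 3)` executes immediately.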
![Page 70: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/70.jpg)
Functions Code Generation
•tm_callable
– Generate two copies, instrumented (transactional) and uninstrumented (non-transactional)
•tm_pure
– Only generate uninstrumented code – does not cause transaction to go serial
•tm_unknown
– Switch to serial mode before a call is made inside a transaction
– May be promoted to tm_callable or tm_pure by compiler
![Page 71: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/71.jpg)
Code Generation for tm_callable
__declspec(tm_callable)
int inc (int *p)
{
return ++(*p);
}
inc:
jmp inc_$nontxn
mov eax, MAGIC
jmp inc_$txn
inc_$nontxn:
…
inc_$txn:
…
![Page 72: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/72.jpg)
Code Generation for tm_pure
__declspec(tm_pure)
int peek(int *p)
{
return *p;
}
peek:
jmp peek_$nontxn
mov eax, MAGIC
jmp peek_$nontxn
peek_$nontxn:
…
![Page 73: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/73.jpg)
Indirect Calls
if (*(fp + MAGIC_OFFSET) == MAGIC) {
call fp + TXN_TWIN_OFFSET;
} else {
switchToSerialMode();
call fp;
}
•No overhead for indirect calls outside of transactions
•Same execution mode available across inheritance hierarchy thanks to virtual function overriding rules
•No annotation on function pointers
– Indirect call to non-recompiled tm_pure function causes switch to serial mode
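The magic-marker check for indirect calls can be modeled in portable C by replacing the marker embedded in the function's code with an explicit descriptor struct. This is only a model of the dispatch logic: the struct layout, `TOY_MAGIC` value and `toy_*` names are hypothetical, and the counter stands in for `switchToSerialMode()`.

```c
#include <stdint.h>

#define TOY_MAGIC 0x7A11C0DEu   /* arbitrary marker value for this sketch */

typedef int (*toy_fn)(int *);

/* Stand-in for the code layout: marker plus the two entry points. */
typedef struct {
    uint32_t magic;    /* TOY_MAGIC when a transactional twin was generated */
    toy_fn   nontxn;   /* uninstrumented version                            */
    toy_fn   txn_twin; /* instrumented version, valid only with TOY_MAGIC   */
} toy_entry;

int toy_serial_switches = 0;    /* counts simulated switches to serial mode */

/* Indirect call made from inside a transaction. */
int toy_call_in_txn(const toy_entry *e, int *arg) {
    if (e->magic == TOY_MAGIC) {
        return e->txn_twin(arg);   /* transactional twin exists: stay concurrent */
    }
    toy_serial_switches++;         /* unknown code: go serial, then call as-is   */
    return e->nontxn(arg);
}

/* Example target functions; a real twin would access memory through barriers. */
int toy_inc_nontxn(int *p) { return ++*p; }
int toy_inc_txn(int *p)    { return ++*p; }
```

A function compiled without transactional support carries no marker, so the call through its pointer falls into the serial path, which matches the note that a non-recompiled function forces a switch to serial mode.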
![Page 74: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/74.jpg)
Agenda
Part 1: STM Overview
– Introduction
– Language Constructs and Semantics
– Design space
Part 2: STM Implementation
– Runtime
– Compiler
– Performance
![Page 75: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/75.jpg)
TM in Real World
• Realistic workloads: STAMP, SPLASH, and PARSEC benchmark suites (fluid dynamics, raytracing, etc.)
• Performance bottlenecks
– Sometimes we use a single global lock (GLOCK) as a baseline
– Bottleneck discovery performed on optimistic STM only
![Page 76: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/76.jpg)
False Conflicts
•Poor scalability due to conflicts: more than 90% of them are false conflicts
•The same STM had no problems on SPLASH-2
[Figure: execution time (s) vs. number of threads (1, 2, 4, 8) for Genome and Vacation; series: GLOCK, STM]
![Page 77: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/77.jpg)
Mapping to TxnRec-s
[Figure: bits 6–19 of a 32-bit address index the ownership table (entries 0x0000 to 0x3FFF); each entry holds a transaction record, with space reserved to avoid cache line ping-ponging]
•Addresses map to a transaction record via a hash function
• Different addresses can map to the same record
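A minimal sketch of such a mapping, assuming the bit layout suggested by the diagram labels: the low 6 bits (offset within a 64-byte cache line) are dropped and the next 14 bits select one of the 0x4000 ownership table entries. The function and macro names are hypothetical.

```c
#include <stdint.h>

#define TOY_LINE_BITS  6u                      /* bits 0-5: offset within a 64-byte line */
#define TOY_TABLE_BITS 14u                     /* bits 6-19: ownership table index       */
#define TOY_TABLE_SIZE (1u << TOY_TABLE_BITS)  /* 0x4000 entries: 0x0000..0x3FFF         */

/* Map a 32-bit address to the index of its transaction record. */
uint32_t toy_txnrec_index(uint32_t addr) {
    return (addr >> TOY_LINE_BITS) & (TOY_TABLE_SIZE - 1u);
}
```

With this layout, 0x00001040 and 0x0000107F share a record because they lie in the same cache line, and 0x00001040 and 0x00101040 share one because they differ only in bits above bit 19. The second kind of sharing is what produces false conflicts between unrelated data.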
![Page 78: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/78.jpg)
Refined Hash Function
• 4 additional bits to index into transaction record
• Reduces false conflicts at the cost of potentially increased cache line ping-ponging
[Figure: bits 6–19 of the address still index the ownership table (entries 0x0000 to 0x3FFF); bits 20–23 additionally select the transaction record]
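One possible reading of the refined hash, sketched under stated assumptions: bits 6–19 still pick the ownership table entry, and 4 further address bits (taken here to be bits 20–23) pick one of 16 records inside that entry. The names and the exact bit assignment are assumptions for illustration, not a confirmed description of the runtime.

```c
#include <stdint.h>

/* Location of a transaction record under the assumed refined layout. */
typedef struct {
    uint32_t entry;  /* ownership table entry, 0x0000..0x3FFF */
    uint32_t slot;   /* record within the entry, 0..15        */
} toy_txnrec_ref;

toy_txnrec_ref toy_refined_hash(uint32_t addr) {
    toy_txnrec_ref r;
    r.entry = (addr >> 6) & 0x3FFFu;  /* bits 6-19: table index                 */
    r.slot  = (addr >> 20) & 0xFu;    /* bits 20-23: 4 additional bits (assumed) */
    return r;
}
```

Under this reading, 0x00001040 and 0x00101040 now get distinct records (slots 0 and 1 of entry 0x41), which removes that false conflict. If the 16 slots of an entry share a cache line, two threads updating neighboring slots can still bounce the line between cores, which is the trade-off the slide mentions.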
![Page 79: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/79.jpg)
False Conflicts Reduced
[Figure: execution time (s) vs. number of threads (1, 2, 4, 8) for Genome and Vacation; series: GLOCK, STM (old hash), STM (new hash)]
![Page 80: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/80.jpg)
Over-Instrumentation
•Compiler generates more barriers than necessary
– Thread-local memory accesses
– Objects alternating between modification and constant phase
– Constant global objects

Benchmark   TxLD (optimal)   TxLD (compiler)   TxST (optimal)   TxST (compiler)   TxLD overhead   TxST overhead
Genome      58,701,959       624,073,490       2,252,291        19,078,705        10.63x          8.60x
Kmeans      86,666,710       255,662,754       86,666,710       86,666,711        2.95x           1.00x
Vacation    785,775,435      925,584,125       26,300,714       122,543,905       1.18x           4.66x
Transactional Barrier Counts for STAMP
![Page 81: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/81.jpg)
__tm_waiver
•No instrumentation for a block or function marked with __tm_waiver
• Allows incremental optimizations but should be used with caution
__tm_atomic {
  y = ++x;        // instrumented
  __tm_waiver {
    ++local;      // no instrumentation
  }
}
![Page 82: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/82.jpg)
Over-Instrumentation Reduced
•__tm_waiver used for
– thread-local object allocation routines
– quasi-static shared objects
[Figure: execution time (s) vs. number of threads (1, 2, 4, 8) for Genome and Vacation; series: GLOCK, STM (new hash), STM (new hash + __tm_waiver)]
![Page 83: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/83.jpg)
Quiescence Overhead
•Only some programs use the privatization idiom
•Provide API to let programmer selectively disable privatization safety
[Figure: speedup (axis 0 to 2) for sphinx, genome, kmeans, vacation, and their average at 2, 4, and 8 threads]
![Page 84: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/84.jpg)
Other Issues
•Small transactions overwhelmed by fixed costs
– Fluidanimate: ~1 load and ~1 store per transaction
– Different code for small transactions
•Atomic blocks make porting of some benchmarks (e.g., BerkeleyDB) difficult but are more amenable to compiler optimizations
•Annotating transactional functions can be a burden (40% of functions in vacation)
•Many workloads require condition synchronization
![Page 85: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/85.jpg)
Finding the Bottlenecks
•Many workloads would not scale at first
•Cumulative stats would shed no light - low contention, no false conflicts, …
•And then we remembered … the devil is in the details …
![Page 86: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/86.jpg)
Per Critical Section Statistics
Only critical section 601 suffers from high abort rate and prevents scaling
critical section   tx_begin   commit    abort   abort %   code size (lines)
602                1314       1312      2       0.15%     O(1)
542                222481     221043    1438    0.65%     O(1)
559                220908     220908    0       0.00%     O(1)
601                12306      6194      6112    49.67%    O(1000)
571                42917      42889     28      0.07%     O(1)
588                42770      42770     0       0.00%     O(1)
301                1313       1312      1       0.08%     O(1)
Transactional Statistics for Sphinx
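The abort % column follows directly from the counters: aborts divided by transaction begins. A small sketch of computing it per critical section and picking out the worst offender is shown below; the struct and function names are hypothetical, and the rows are copied from the Sphinx table.

```c
#include <stddef.h>

/* Counters collected for one critical section (atomic block). */
typedef struct {
    int  id;           /* critical section identifier    */
    long tx_begin;     /* transactions started           */
    long commit;       /* transactions committed         */
    long abort_count;  /* transactions aborted           */
} toy_cs_stats;

/* Abort rate in percent; 0 if the section never ran. */
double toy_abort_pct(const toy_cs_stats *s) {
    if (s->tx_begin == 0) return 0.0;
    return 100.0 * (double)s->abort_count / (double)s->tx_begin;
}

/* Id of the section with the highest abort rate, or -1 if n == 0. */
int toy_worst_section(const toy_cs_stats *s, size_t n) {
    int worst = -1;
    double worst_pct = -1.0;
    for (size_t i = 0; i < n; i++) {
        double p = toy_abort_pct(&s[i]);
        if (p > worst_pct) { worst_pct = p; worst = s[i].id; }
    }
    return worst;
}

/* Rows from the Sphinx table: id, tx_begin, commit, abort. */
const toy_cs_stats toy_sphinx[] = {
    {602, 1314, 1312, 2},
    {542, 222481, 221043, 1438},
    {559, 220908, 220908, 0},
    {601, 12306, 6194, 6112},
    {571, 42917, 42889, 28},
    {588, 42770, 42770, 0},
    {301, 1313, 1312, 1},
};
```

Summed over all sections the abort rate looks small, because sections 542 and 559 dominate the transaction count; only the per-section breakdown exposes section 601.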
![Page 87: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/87.jpg)
Overall Performance
[Figure: STM speedup vs. single-thread GLOCK (axis 0 to 8) at 1, 2, 4, and 8 threads for genome, kmeans/low, kmeans/high, vacation/low, vacation/high, cholesky, fft, lu/cont., lu/non cont., radix, barnes, fmm, ocean/cont., ocean/non cont., radiosity, raytrace, volrend, water-nsquared, water-spatial, fluidanimate]
![Page 88: Software Transactional Memory TiC 2010 Adam Welc Programming Systems Lab Intel Labs.](https://reader036.fdocuments.net/reader036/viewer/2022062519/5697bff81a28abf838cbf182/html5/thumbnails/88.jpg)