ATLAS Software Development Environment for Hardware Transactional Memory
description
Transcript of ATLAS Software Development Environment for Hardware Transactional Memory
![Page 1: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/1.jpg)
April 15 2008 Thesis Defense Talk
ATLASSoftware Development Environment for Hardware Transactional Memory
Sewook Wee
Computer Systems LabStanford University
![Page 2: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/2.jpg)
2
The Parallel Programming Crisis
Multi-cores for scalable performance No faster single core any more
Parallel programming is a must, but still hard Multiple threads access shared memory Correct synchronization is required
Conventional: lock-based synchronization Coarse-grain locks: serialize system Fine-grain locks: hard to be correct
![Page 3: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/3.jpg)
3
Alternative: Transactional Memory (TM) Memory transactions [Knight’86][Herlihy & Moss’93]
An atomic & isolated sequence of memory accesses Inspired by database transactions
Atomicity (all or nothing) At commit, all memory updates take effect at once On abort, none of the memory updates appear to take
effect Isolation
No other code can observe memory updates before commit
Serializability Transactions seem to commit in a single serial order
![Page 4: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/4.jpg)
4
Advantages of TM
As easy to use as coarse-grain locks Programmer declares the atomic region No explicit declaration or management of locks
As good performance as fine-grain locks System implements synchronization Optimistic concurrency [Kung’81] Slow down only on true conflicts (R-W or W-W) Fine-grain dependency detection
No trade-off between performance & correctness
![Page 5: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/5.jpg)
5
Implementation of TM
Software TM [Harris’03][Saha’06][Dice’06] Versioning & conflict detection in software No hardware change, flexible Poor performance (up to 8x)
Hardware TM [Herlihy & Moss’93] [Hammond’04][Moore’06] Modifying data cache hardware High performance Correctness: strong isolation
![Page 6: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/6.jpg)
6
Software Environment for HTM
Programming language [Carlstrom’07] Parallel programming interface
Operating system Provides virtualization, resource management, … Challenges for TM
Interaction of active transaction and OS
Productivity tools Correctness and performance debugging tools Build up on TM features
![Page 7: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/7.jpg)
7
Contributions
An operating system for hardware TM
Productivity tools for parallel programming
Full-system prototyping & evaluation
![Page 8: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/8.jpg)
8
Agenda
Motivation
Background
Operating System for HTM
Productivity Tools for Parallel Programming
Conclusions
![Page 9: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/9.jpg)
9
TCC: Transactional Coherence/Consistency A hardware-assisted TM
implementation Avoids overhead of software-only
implementation Semantically correct TM implementation
A system that uses TM for coherence & consistency Use TM to replace MESI coherence
Other proposals build TM on top of MESI All transactions, all the time
![Page 10: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/10.jpg)
10
TCC Execution Model
CPU 0
CPU 1
CPU 2
Commit
Arbitrate
Execute
Code
Commit
Arbitrate
Execute
Code
Undo
Execute
Code
ld 0xccccRe-
Execute
Code
...
ld 0x1234
ld 0x5678
...
ld 0xcccc...
...
0xcccc0xcccc
st 0xcccc
...
ld 0xabdc
ld 0xe4e4
...
See [ISCA’04] for details
tim
e
![Page 11: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/11.jpg)
11
Processor
W7:0TAG
(2-ported)Data
Cache
ViolationLoad/Store
Address
Snoop Control
Commit Address
CommitControl
CommitData
StoreAddress
FIFO
RegisterCheckpoint
Commit Bus
Refill Bus
CommitAddress In
CommitData Out
CommitAddress Out
DATA(single-ported)
R7:0V
CMP Architecture for TCC
Transactionally Read Bits:
ld 0xdeadbeef
Transactionally Written Bits:
st 0xcafebabe
Conflict Detection:Compare incoming address to R bits
Commit:Read pointers from Store Address FIFO, flush addresses with W bits set
See [PACT’05] for details
![Page 12: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/12.jpg)
12
ATLAS Prototype Architecture
Goal Convinces a proof-of-concept of TCC Experiments with software issues
Main memory & I/O
Coherent bus with commit token arbiter
CPU0
TCCCache
CPU1
TCCCache
CPU2
TCCCache
CPU7
TCCCache
…
![Page 13: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/13.jpg)
13
Mapping to BEE2 Board
CPU
TCCcache
CPU
TCCcache
switch
CPU
TCCcache
CPU
TCCcache
switch
CPU
TCCcache
CPU
TCCcache
switch
CPU
TCCcache
CPU
TCCcache
switch
Arb-iter
switchmemory
![Page 14: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/14.jpg)
14
Agenda
Motivation
Background
Operating System for HTM
Productivity Tools for Parallel Programming
Conclusions
![Page 15: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/15.jpg)
15
What should we do if OS needs to run
in the middle of transaction?
Challenges in OS for HTM
![Page 16: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/16.jpg)
16
Challenges in OS for HTM
Loss of isolation at exception Exception info is not visible to OS until commit I.e. faulting address in TLB miss
Loss of atomicity at exception Some exception services cannot be undone I.e. file I/O
Performance OS preempts user thread in the middle of
transaction I.e. interrupts
![Page 17: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/17.jpg)
17
Practical Solutions
Performance A dedicated CPU for operating system No need to preempt user thread in the
middle of transaction
Loss of isolation at exception Mailbox: separate communication layer
between application and OS
Loss of atomicity at exception Serialize system for irrevocable exceptions
![Page 18: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/18.jpg)
18
CPU
TCCcache
CPU
TCCcache
switch
CPU
TCCcache
CPU
TCCcache
switch
CPU
TCCcache
CPU
TCCcache
switch
CPU
TCCcache
CPU
TCCcache
switch
Arbiter
switch
memory
CPU
$ M
CPU
$ M
switch
CPU
$ MArbiter
switch
memory
CPU
$ M
CPU
$ M
switch
CPU
$ M
CPU
$ M
switch
CPU
$ M
CPU
$ M
switch
Architecture Update
Linux
proxy kernel
![Page 19: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/19.jpg)
19
P
$M
P
$M
Application CPU
$Mailbox
OS CPU
$ Mailbox
Operating system BootloaderATLAS core
Initialcontext
TM application
Execution overview (1) - Start of an application
ATLAS core A user-level program runs on OS CPU Same address space as TM application Start application & listen to requests from apps
Initial context Registers, PC, PID, …
![Page 20: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/20.jpg)
20
P
$M
P
$M
Application CPU
$Mailbox
OS CPU
$ Mailbox
ATLAS core
ExceptionInformation
TM application
Execution overview (2) - Exception
Proxy kernel forward the exception information to OS CPU Fault address for TLB misses Syscall number and arguments for syscalls
OS CPU services the request and returns the result TLB mapping for TLB misses Return value and error code for syscalls
Operating system
ExceptionResult
Proxy kernel
![Page 21: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/21.jpg)
21
Operating System Statistics
Strategy: Localize modifications Minimize the work needed to track main stream kernel
development
Linux kernel (version 2.4.30) Device driver that provides user-level access to
privilege-level information ~1000 lines (C, ASM)
Proxy kernel Runs on application CPU ~1000 lines (C, ASM)
A full workstation for programmer’s perspective
![Page 22: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/22.jpg)
22
System Performance
Total execution time scales OS time scales, too
Scalability in average of 10 benchmarks
0.0
0.2
0.4
0.6
0.8
1.0
1.2
1p 2p 4p 8p
Number of processors
Normalized execution time
OS
user
![Page 23: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/23.jpg)
23
Scalability of OS CPU
Single CPU for operating system Eventually, it will become a bottleneck as
system scales Multiple CPUs for OS will need to run SMP
OS
Micro-benchmark experiment Simultaneous TLB miss requests Controlled injection ratio Looking for the number of application CPUs
that saturates OS CPU
![Page 24: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/24.jpg)
24
Experiment results
Average TLB miss rate = 1.24% Start to congest from 8 CPUs
With victim TLB (Average TLB miss rate = 0.08%) Start to congest from 64 CPUs
![Page 25: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/25.jpg)
25
Agenda
Motivation
Background
Operating System for HTM
Productivity Tools for Parallel Programming
Conclusions
![Page 26: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/26.jpg)
26
Challenges in Productivity Tools for Parallel Programming Correctness
Nondeterministic behavior Related to a thread interleaving
Need to track an entire interleaving Very expensive in time/space
Performance Detailed information of the performance
bottleneck events Light-weight monitoring
Do not disturb the interleaving
![Page 27: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/27.jpg)
27
Opportunities with HTM
TM already tracks all reads/writes Cheaper to record memory access
interleaving
TM allows non-intrusive logging Software instrumentation in TM system Not in user’s application
All transactions, all the time Everything in transactional granularity
![Page 28: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/28.jpg)
Thesis Defense Talk
Tool 1: ReplayTDeterministic Replay
![Page 29: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/29.jpg)
29
Deterministic Replay
Challenges in recording an interleaving Record every single memory access Intrusive Large footprint
ReplayT’s approach Record only a transaction interleaving Minimally overhead: 1 event per transaction Footprint: 1 byte per transaction (thread ID)
![Page 30: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/30.jpg)
30
ReplayT Runtime
Log Phase Replay Phase
Commit
time time
LOG:
T0
T1
T2 T2
Commit
T0
T1
T2 T2
Commit protocolreplays loggedcommit order
T0 T1 T2
![Page 31: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/31.jpg)
31
Runtime Overhead
Minimal time & space overhead
B: baselineL: log modeR: replay mode
Average on 10 benchmarks
7 STAMP, 3 SPLASH/SPLASH2
Less than 1.6% overhead for logging
More overhead in replay mode
longer arbitration time
1B per 7119 insts.
![Page 32: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/32.jpg)
Thesis Defense Talk
Tool 2. AVIO-TMAtomicity Violation Detection
![Page 33: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/33.jpg)
33
Atomicity Violation
Problem: programmer breaks an atomic task into two transactions
ATMDeposit:ATMDeposit: atomic {atomic { t = Balancet = Balance Balance = t + $100Balance = t + $100 }}
atomic {atomic { Balance = t + $100Balance = t + $100 }}
ATMDeposit:ATMDeposit: atomic {atomic { t = Balancet = Balance }} directDeposit:directDeposit:
atomic {atomic { t = Balancet = Balance Balance = t + $1,000Balance = t + $1,000 }}
![Page 34: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/34.jpg)
34
Atomicity Violation Detection
AVIO [Lu’06] Atomic region = No unserializable interleavings Extracts a set of atomic region from correct runs Detects unserializable interleavings in buggy runs
Challenges of AVIO Need to record all loads/stores in global order
Slow (28x) Intrusive - software instrumentation Storage overhead
Slow analysis Due to the large volume of data
![Page 35: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/35.jpg)
35
My Approach: AVIO-TM
Data collection in deterministic rerun Captures original interleavings
Data collection at transaction granularity Eliminate repeated loggings for same address
(10x) Lower storage overhead
Data analysis in transaction granularity Less possible interleavings faster extraction Less data faster analysis More accurate with complementary detection tools
![Page 36: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/36.jpg)
Thesis Defense Talk
Tool 3. TAPE
Performance Bottleneck Monitor
![Page 37: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/37.jpg)
37
TM Performance Bottlenecks
Dependency conflicts Aborted transactions waste useful cycles
Buffer overflows Speculative states may not fit into cache Serialization
Workload imbalance
Transaction API overhead
![Page 38: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/38.jpg)
38
Dependency Conflicts
Write XWrite X
Useful Arbitration Commit Abort
Time
T0
Read XRead XT1
Useful cycles are wasted in T1
![Page 39: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/39.jpg)
39
TAPE on ATLAS
TAPE [Chafi, ICS2005] Light weight runtime monitor for performance
bottlenecks
Hardware Tracks information of performance bottleneck
events
Software Collects information from hardware for events Manages them through out the execution
![Page 40: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/40.jpg)
40
TAPE Conflict
Commit X from Thread 1
T0
Read X
Object: X
Writing Thread: 1
Wasted cycles: 82,402
Restart
Read X
Read PC: 0x100037FC
Per Thread Read PC: 0x100037FC … Occurrence: 34
Per Transaction
![Page 41: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/41.jpg)
41
Read_PC Object_Addr Occurence Loss Write_Proc Read in source line 10001390 100830e0 30 6446858 1 ..//vacation/manager.c:13410001500 100830e0 32 1265341 3 ..//vacation/manager.c:13410001448 100830e0 29 766816 4 ..//vacation/manager.c:13410005f4c 304492e4 3 750669 6 ..//lib/rbtree.c:105
TAPE Conflict Report
Now, programmers know, Where the conflicts are What the conflicting objects are Who the conflicting threads are How expensive the conflicts are
Productive performance tuning!
![Page 42: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/42.jpg)
42
Runtime Overhead
Base overhead 2.7% for 1p
Overhead from real conflicts More CPU
configuration has higher chance of conflicts
Max. 5% in total
![Page 43: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/43.jpg)
43
Conclusion
An operating system for hardware TM A dedicated CPU for the operating system Proxy kernel on application CPU Separate communication channel between them
Productivity tools for parallel programming ReplayT: Deterministic replay AVIO-TM: Atomicity violation detection TAPE: Runtime performance bottleneck monitor
Full-system prototyping & evaluation Convincing proof-of-concept
![Page 44: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/44.jpg)
44
RAMP Tutorial
ISCA 2006 and ASPLOS 2008
Audience of >60 people (academia & industry) Including faculties from Berkeley, MIT, and UIUC
Parallelized, tuned, and debugged apps with ATLAS From speedup of 1 to ideal speedup in a few minutes Hands-on experience with real system
“most successful hands-on tutorial in last several decades”
- Chuck Thacker (Microsoft Research)
![Page 45: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/45.jpg)
45
Acknowledgements
My wife So Jung and our baby (coming soon) My parents who have supported me for last 30
years My advisors: Christos Kozyrakis and Kunle Olukotun My committee: Boris Murmann and Fouad A. Tobagi Njuguna Njoroge, Jared Casper, Jiwon Seo, Chi Cao
Minh, and all other TCC group members RAMP community and BEE2 developers Shan Lu from UIUC Samsung Scholarship All of my friends at Stanford & my Church
![Page 46: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/46.jpg)
Thesis Defense Talk
Backup Slides
![Page 47: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/47.jpg)
47
Single core’s Performance Trend
1
10
100
1000
10000
1978 1980 1982 1984 1986 1988 1990 1992 1994 1996 1998 2000 2002 2004 2006
Performance (vs. VAX-
25%/year
52%/year
??%/year?
From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th edition, Sept. 15, 2006
![Page 48: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/48.jpg)
48
TAPE Conflict
ObjectShooting Thread ID
Read PCOccurrence
Wasted cycles
XX
T0T0
0x100037FC0x100037FC
442,4532,453
TCC cache PowerPC Software counter
Write XWrite X
Time
T0
Read XRead XT1
![Page 49: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/49.jpg)
49
Memory transaction vs. Database transaction
![Page 50: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/50.jpg)
50
TLB miss handling
![Page 51: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/51.jpg)
51
Syscall Handing
![Page 52: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/52.jpg)
52
ReplayT Extensions
Unique replay Problem: maximize usefulness of test runs Approach: shuffle commit order to generate unique
scenarios
Replay with monitoring code Problem: replay accuracy after recompilation Approach: faithfully repeat commit order if binary
changes E.g., printf statements inserted for monitoring purposes
Cross-platform replay Problem: debugging on multiple platforms Approach: support for replaying log across platforms &
ISAs
![Page 53: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/53.jpg)
53
Integration with GDB
Breakpoints Software breakpoint == self-modifying code Breakpoints may be buffered in the TCC $ by the end of
transactions be better to set it in OS core
Traps Stop all threads - controlling token arbiter Debug only committable transaction - acquiring commit
token
Stepping Backward stepping using abort & restart
Data-watch
![Page 54: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/54.jpg)
54
Intermediate Write Analyzer
Intermediate write A write that is overwritten by a local or remote
thread before it was read by a remote thread
Intermediate writes in the correct runs Potential bugs, it can be read by remote thread at
some point. Analyze the buggy run, if there’s any intermediate
write that is read by remote threads.
Why in TM? In every single memory access base, there will be
too many of intermediate writes which are actually safe. Too high false positive rate
![Page 55: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/55.jpg)
55
Buffer Overflow
Computation Arbitration Commit Token Hold
Time
T0
T1
Miss-speculation wastes computation cycles in T1
Overflow Overflow Commit
Commit
![Page 56: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/56.jpg)
56
TAPE Overflow
Overflowed PC
Type
Occurrence
Duration (cycles)
OverflowOverflow
0x10004F180x10004F18
LRU overflowLRU overflow
35,07235,072
44
CommitCommit
TCC cache Software counterPowerPC
![Page 57: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/57.jpg)
57
ATLAS’ Contribution on TAPE
Evaluation on real hardware In theory, there is no difference in theory
and practice. But, in pratice, there is. - Jan van de Snepscheut
Optimization Minimizes HW modification from original
proposal Eliminates some information to track
Runtime overhead vs. Usefulness of the
information
![Page 58: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/58.jpg)
58
Why not SMP kernel?
![Page 59: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/59.jpg)
59
What is strong isolation?
![Page 60: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/60.jpg)
60
TCC vs. SLE
Speculative Lock Elision (SLE)[Rajwar & Goodman’01] Speculate through locks If a conflict is detected, it aborts ALL involved
threads No guarantee to forward progress
TLR: Transactional Lock Removal [above’02] Extended from SLE Guarantee to forward progress by giving a priority
to the oldest thread
![Page 61: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/61.jpg)
61
TCC vs. TLS
TLS (Thread-level speculation) Maintains serial execution order Forward speculative states from less
speculative threads to more speculative threads
![Page 62: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/62.jpg)
62
void deposit(account, amount) synchronized(account) {
int t = bank.get(account);t = t + amount;bank.put(account, t);
}
void withdraw(account, amount) synchronized(account) {
int t = bank.get(account);t = t – amount;bank.put(account, t);
}
void deposit(account, amount) atomic {
int t = bank.get(account);t = t + amount;bank.put(account, t);
}
void withdraw(account, amount) atomic {
int t = bank.get(account);t = t – amount;bank.put(account, t);
}
Programming with TM
Declarative synchronization Programmers say what but not how No explicit declaration or management of locks
System implements synchronization Typically with optimistic concurrency Slow down only on true conflicts (R-W or W-W)
![Page 63: ATLAS Software Development Environment for Hardware Transactional Memory](https://reader035.fdocuments.net/reader035/viewer/2022062301/56814575550346895db2452d/html5/thumbnails/63.jpg)
63
AVIO’s serializability analysis
R
R
RR
R
WR
W
RR
W
W
W
R
RW
R
WW
W
RW
W
W
OK OK
OK OK
BUG1
BUG3 BUG4
BUG2
* OK, if interleaved access is serializable* Possibly atomicity violation, if unserializable