LogTM: Log-Based Transactional Memory
Kevin E. Moore, Jayaram Bobba, Michelle J. Moravan, Mark D. Hill, & David A. Wood
Presented by Colleen Lewis
Credits
Animations from the original LogTM HPCA presentation
Original graphs modified for readability
Design Decisions
• Version Management
– Eager: write in place
– Lazy: write on commit
• Conflict Detection
– Eager: detect at read/write time
– Lazy: detect at commit time
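To make the version-management choice concrete, here is a minimal software sketch of the two options. The class names `EagerVM` and `LazyVM` and the dictionary used as memory are hypothetical and chosen only for illustration; they are not part of the presentation.

```python
class EagerVM:
    """Eager version management: write in place, remember old values."""

    def __init__(self, memory):
        self.memory = memory
        self.undo_log = []                      # (address, old value) pairs

    def write(self, addr, value):
        self.undo_log.append((addr, self.memory[addr]))
        self.memory[addr] = value               # new value lands in memory now

    def commit(self):
        self.undo_log.clear()                   # nothing to copy on commit

    def abort(self):
        while self.undo_log:                    # restore newest-first
            addr, old = self.undo_log.pop()
            self.memory[addr] = old


class LazyVM:
    """Lazy version management: buffer writes, apply them on commit."""

    def __init__(self, memory):
        self.memory = memory
        self.write_buffer = {}

    def write(self, addr, value):
        self.write_buffer[addr] = value         # memory keeps the old value

    def commit(self):
        self.memory.update(self.write_buffer)   # commit does the copying
        self.write_buffer.clear()

    def abort(self):
        self.write_buffer.clear()               # nothing to undo on abort
```

In this sketch the eager variant makes commit cheap and abort expensive, while the lazy variant does the opposite.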
Transaction Logs
• Pointer to the beginning of the log
• Pointer to the end of the log
• Read and Write bits for each cache line
2/15/06 HPCA-12
Transaction Log Example
• Initial State
• LogBase = LogPointer
• TM count > 0
VA    Data Block          R W
00    12--------------    0 0
40    --------------23    0 0
c0    34--------------    0 0
Log (at 1000, 1040, 1080): empty
Log Base = 1000, Log Ptr = 1000, TM count = 1
Transaction Log Example
• Store r2, (c0) /* r2 = 56 */
– Set W bit for block (c0)
– Store address (c0) and old data on the log
– Increment Log Ptr to 1048
– Update memory
VA    Data Block          R W
00    12--------------    0 0
40    --------------23    0 0
c0    56--------------    0 1
Log (at 1000): c0 34------------
Log Base = 1000, Log Ptr = 1048, TM count = 1
Transaction Log Example
• Commit transaction
– Clear R & W for all blocks
– Reset Log Ptr to Log Base (1000)
– Clear TM count
VA    Data Block          R W
00    12--------------    0 0
40    --------------23    0 0
c0    56--------------    0 0
Log (at 1000): c0 34------------
Log Base = 1000, Log Ptr = 1000 (was 1048), TM count = 0 (was 1)
Transaction Log Example
• Abort transaction
– Replay log entries to “undo” the transaction
– Reset Log Ptr to Log Base (1000)
– Clear R & W bits for all blocks
– Clear TM count
VA    Data Block          R W
00    12--------------    0 0
40    --------------23    0 0
c0    34--------------    0 0    (was 56--------------)
Log (at 1000): c0 34------------
Log Base = 1000, Log Ptr = 1000, TM count = 0
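The log walkthrough (initial state, store, commit, abort) can be traced with a small software model. This is a sketch under stated assumptions, not the hardware design: the class `LogTMThread` is hypothetical, addresses are read as hexadecimal, a log entry is assumed to be an 8-byte address plus a 64-byte block (which matches the Log Ptr moving from 1000 to 1048), and a block is assumed to be logged only on its first write.

```python
BLOCK = 0x40            # 64-byte blocks, as the addresses 00/40/c0 suggest
ENTRY = 0x08 + BLOCK    # assumed entry size: 8-byte address + old block data


class LogTMThread:
    def __init__(self, memory, log_base=0x1000):
        self.memory = memory          # block address -> block contents
        self.log_base = log_base
        self.log_ptr = log_base       # initial state: Log Ptr == Log Base
        self.tm_count = 0
        self.log = {}                 # log address -> (block address, old data)
        self.r_bits = set()
        self.w_bits = set()

    def begin(self):
        self.tm_count += 1

    def load(self, addr):
        self.r_bits.add(addr)         # set R bit
        return self.memory[addr]

    def store(self, addr, value):
        if addr not in self.w_bits:   # assumption: log only the first write
            self.w_bits.add(addr)     # set W bit
            self.log[self.log_ptr] = (addr, self.memory[addr])
            self.log_ptr += ENTRY
        self.memory[addr] = value     # update memory in place

    def _reset(self):
        self.r_bits.clear()
        self.w_bits.clear()
        self.log_ptr = self.log_base
        self.tm_count = 0

    def commit(self):
        self._reset()                 # old log entries are simply abandoned

    def abort(self):
        ptr = self.log_ptr
        while ptr > self.log_base:    # replay the log newest-first
            ptr -= ENTRY
            addr, old = self.log[ptr]
            self.memory[addr] = old
        self._reset()
```

Storing 56 to block c0 in this model advances `log_ptr` to 0x1048, and `abort()` restores 34, mirroring the slides.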
Conflict Detection
• Checked at every read/write
• Directory forwards read requests
• Directory can have “sticky” data
• Individual nodes responsible for detecting conflicts
• Needs
– Transaction mode bit
– Overflow bit
Conflict Detection (example)
• P0 store
– P0 sends get exclusive (GETX) request
– Directory responds with data (old)
– P0 executes store
Messages: GETX (P0 to Directory), DATA (Directory to P0)
Directory: I [old] → M@P0 [old]
P0: I (--) [none] → M (--) [old] → M (-W) [new]; TM mode = 1, Overflow = 0
P1: I (--) [none]; TM mode = 0, Overflow = 0
Conflict Detection (example)
• In-cache transaction conflict
– P1 sends get shared (GETS) request
– Directory forwards to P0
– P0 detects conflict and sends NACK
Messages: GETS (P1 to Directory), Fwd_GETS (Directory to P0), NACK (P0 to P1)
Directory: M@P0 [old]
P0: M (-W) [new]; TM mode = 1, Overflow = 0
P1: I (--) [none]; TM mode = 0, Overflow = 0
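The in-cache check that P0 performs on a forwarded request can be sketched as a function over its R and W bits. The function name `handle_forwarded` and its return strings are hypothetical. The GETX case (a remote write conflicting with a local read or write) is the usual read/write conflict rule and is an assumption here, since the slides only show GETS.

```python
def handle_forwarded(request, addr, r_bits, w_bits, tm_mode):
    """Return "NACK" if the forwarded request conflicts, else "OK"."""
    if not tm_mode:
        return "OK"                       # no transaction, nothing to protect
    if request == "GETS":                 # remote read vs. local write
        conflict = addr in w_bits
    elif request == "GETX":               # remote write vs. local read or write
        conflict = addr in r_bits or addr in w_bits
    else:
        raise ValueError(f"unknown request: {request}")
    return "NACK" if conflict else "OK"
```

For the slide's case, `handle_forwarded("GETS", 0xc0, set(), {0xc0}, True)` yields a NACK because P0 holds the W bit for that block.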
Conflict Detection (example)
• Cache overflow
– P0 sends put exclusive (PUTX) request
– Directory acknowledges
– P0 sets overflow bit
– P0 writes data back to memory
Messages: PUTX (P0 to Directory), ACK (Directory to P0), DATA (P0 to Directory)
Directory: M@P0 [old] → Msticky@P0 [new]
P0: M (-W) [new] → I (--) [none]; TM mode = 1, Overflow = 0 → 1
P1: I (--) [none]; TM mode = 0, Overflow = 0
Conflict Detection (example)
• Out-of-cache conflict
– P1 sends GETS request
– Directory forwards to P0
– P0 detects a (possible) conflict
– P0 sends NACK
Messages: GETS (P1 to Directory), Fwd_GETS (Directory to P0), NACK (P0 to P1)
Directory: Msticky@P0 [new]
P0: I (--) [none]; TM mode = 1, Overflow = 1
P1: I (--) [none]; TM mode = 0, Overflow = 0
Conflict Detection (example)
• Commit
– P0 clears TM mode and Overflow bits
Directory: Msticky@P0 [new]
P0: I (--) [none]; TM mode = 1 → 0, Overflow = 1 → 0
P1: I (--) [none]; TM mode = 0, Overflow = 0
Conflict Detection (example)
• Lazy cleanup
– P1 sends GETS request
– Directory forwards request to P0
– P0 detects no conflict, sends CLEAN
– Directory sends Data to P1
Messages: GETS (P1 to Directory), Fwd_GETS (Directory to P0), CLEAN (P0 to Directory), DATA (Directory to P1)
Directory: Msticky@P0 [new] → S(P1) [new]
P0: I (--) [none]; TM mode = 0, Overflow = 0
P1: I (--) [none] → S (--) [new]; TM mode = 0, Overflow = 0
False Positive Example
• Cache overflow
– P0 sends put exclusive (PUTX) request
– Directory acknowledges
– P0 sets overflow bit
– P0 writes data back to memory
Messages: PUTX (P0 to Directory), ACK (Directory to P0), DATA (P0 to Directory)
Directory: M@P0 [old] → Msticky@P0 [new]
P0: M (-W) [new] → I (--) [none]; TM mode = 1, Overflow = 0 → 1
P1: I (--) [none]; TM mode = 0, Overflow = 0
False Positive Example
• Commit
– P0 clears TM mode and Overflow bits
• Start New Transaction
– P0 sets TM mode
– Eventually overflows
– P0 sets Overflow bit
Directory: Msticky@P0 [new]
P0: I (--) [none]; TM mode = 1 → 0 → 1, Overflow = 1 → 0 → 1
P1: I (--) [none]; TM mode = 0, Overflow = 0
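Conflict Detection (example)
• Out-of-cache conflict
– P1 sends GETS request
– Directory forwards to P0
– P0 detects a (possible) conflict
– P0 sends NACK
Messages: GETS (P1 to Directory), Fwd_GETS (Directory to P0), NACK (P0 to P1)
Directory: Msticky@P0 [new]
P0: I (--) [none]; TM mode = 1, Overflow = 1
P1: I (--) [none]; TM mode = 0, Overflow = 0
This NACK is the false positive: the sticky state was left by P0's earlier, already committed transaction, but P0's Overflow bit is set again, so it cannot rule out a conflict.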
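The overflow, sticky-state, lazy-cleanup, and false-positive slides can be tied together in one small model. This is a sketch under simplifying assumptions: the classes `Core` and `Directory`, their method names, and the single-owner directory entry are all hypothetical, and only GETS requests are modeled.

```python
class Core:
    def __init__(self, name):
        self.name = name
        self.tm_mode = False
        self.overflow = False
        self.w_bits = set()              # W bits of blocks still in the cache

    def begin(self):
        self.tm_mode = True

    def store(self, addr, directory):
        self.w_bits.add(addr)
        directory.state[addr] = ("M", self)

    def evict(self, addr, directory):    # PUTX of a transactional block
        self.w_bits.discard(addr)
        self.overflow = True
        directory.state[addr] = ("Msticky", self)

    def commit(self):
        self.tm_mode = False
        self.overflow = False
        self.w_bits.clear()

    def on_fwd_gets(self, addr):
        if addr in self.w_bits:
            return "NACK"                # in-cache conflict
        if self.tm_mode and self.overflow:
            return "NACK"                # possible out-of-cache conflict
        return "CLEAN"


class Directory:
    def __init__(self):
        self.state = {}                  # addr -> (state name, core)

    def gets(self, addr, requester):
        kind, owner = self.state.get(addr, ("I", None))
        if kind in ("M", "Msticky") and owner.on_fwd_gets(addr) == "NACK":
            return "NACK"
        self.state[addr] = ("S", requester)   # sticky state cleaned up lazily
        return "DATA"


def demo():
    d, p0, p1 = Directory(), Core("P0"), Core("P1")
    replies = []
    p0.begin(); p0.store(0xc0, d)
    replies.append(d.gets(0xc0, p1))     # in-cache conflict
    p0.evict(0xc0, d)
    replies.append(d.gets(0xc0, p1))     # out-of-cache conflict
    p0.commit()
    p0.begin(); p0.store(0x40, d); p0.evict(0x40, d)
    replies.append(d.gets(0xc0, p1))     # false positive: c0 is not in this transaction
    p0.commit()
    replies.append(d.gets(0xc0, p1))     # lazy cleanup
    return replies
```

The design choice this illustrates is that commit touches only two local bits; stale sticky entries are cleaned up later, at the price of occasional conservative NACKs.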
Conflict Resolution and Deadlock Avoidance
• Options
– Wait: risk deadlock?
– Abort: risk livelock?
• Current Behavior
– Wait
– Abort if waiting on a logically younger process
• Future Behavior?
– Software contention manager
Evaluation
• 32 SPARC processors
• Solaris 9 OS
• SIMICS: full system simulator
• Magic no-ops
• Tests
– Micro-benchmarks
– SPLASH suite
Microbenchmarks
• High Contention / Short Transactions
• Comparing:
– EXP: TTS locks with exponential backoff
– MCS: SW queue-based locks
BEGIN_TRANSACTION();
  new_total = total.count + 1;
  private_data[id].count++;
  total.count = new_total;
COMMIT_TRANSACTION();
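For readers who want to run something shaped like this microbenchmark, here is a sketch that uses an ordinary lock as a stand-in for `BEGIN_TRANSACTION()` / `COMMIT_TRANSACTION()`. It does not model LogTM or measure anything; the thread count, iteration count, and the `worker` and `run` helpers are hypothetical.

```python
import threading

NUM_THREADS = 4          # hypothetical sizes, small enough for a quick run
ITERATIONS = 1000

total = {"count": 0}
private_data = [{"count": 0} for _ in range(NUM_THREADS)]
critical = threading.Lock()   # stand-in for the transaction boundaries


def worker(thread_id):
    for _ in range(ITERATIONS):
        with critical:                              # BEGIN_TRANSACTION()
            new_total = total["count"] + 1
            private_data[thread_id]["count"] += 1
            total["count"] = new_total
        # leaving the with-block plays the role of COMMIT_TRANSACTION()


def run():
    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(NUM_THREADS)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total["count"]
```

Every thread updates the shared `total`, which is what makes the workload high contention, while the critical section itself is only three statements long.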
[Figure: Execution Time (in millions of cycles) vs. Threads (on 32 Processors), for EXP, MCS, and LogTM]
SPLASH2 Benchmark Results
[Figure: bar chart by Benchmark (OCEAN, RADIOSITY, CHOLESKY, RT-OPT, RT-BASE, BARNES, WATER); vertical axis from 0 to 2, labeled "Execution Time (in millions of cycles)"; two off-scale bars labeled 4.18 and 2.68]
Data presented as: (PARMACS locks execution time) / (LogTM execution time)
Modified version: (LogTM execution time) / (PARMACS locks execution time)
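The two presentations are reciprocals of each other, so a bar in one form converts directly to the other. The helper below is a hypothetical illustration; it uses the two off-scale bar labels that survive from the original graph (4.18 and 2.68), without assuming which benchmarks they belong to.

```python
def to_normalized_time(speedup):
    """Convert (locks time / LogTM time) to (LogTM time / locks time), in percent."""
    return 100.0 / speedup


for ratio in (4.18, 2.68):
    print(f"{ratio:.2f} -> {to_normalized_time(ratio):.1f}%")
# prints:
# 4.18 -> 23.9%
# 2.68 -> 37.3%
```

In the modified form, values below 100% mean LogTM ran faster than the lock-based version.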
[Figure: bar chart by Benchmark (OCEAN, RADIOSITY, CHOLESKY, RT-OPT, RT-BASE, BARNES, WATER); vertical axis from 0 to 100, labeled "Speedup"; bar labels in extraction order: 62.7%, 24.6%, 10.9%, 18.6%, 18.3%, 4.3%, 76.1%]
Conclusions
• Optimize commits
• Aborts handled by software
• Stall to avoid wasting work
• Allow sticky data because overflow is rare
• Good performance on microbenchmark
• False sharing has a big impact on LogTM