On the Performance of Window-Based Contention Managers for Transactional Memory
![Page 1: On the Performance of Window-Based Contention Managers for Transactional Memory](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816374550346895dd44ffe/html5/thumbnails/1.jpg)
On the Performance of Window-Based Contention Managers for Transactional Memory
Gokarna Sharma and Costas Busch
Louisiana State University
![Page 2: On the Performance of Window-Based Contention Managers for Transactional Memory](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816374550346895dd44ffe/html5/thumbnails/2.jpg)
Agenda
• Introduction and Motivation
• Previous Studies and Limitations
• Execution Window Model
  Theoretical Results
  Experimental Results
• Conclusions and Future Directions
![Page 3: On the Performance of Window-Based Contention Managers for Transactional Memory](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816374550346895dd44ffe/html5/thumbnails/3.jpg)
Retrospective
• 1993
  A seminal paper by Maurice Herlihy and J. Eliot B. Moss: “Transactional Memory: Architectural Support for Lock-Free Data Structures”
• Today
  Several STM/HTM implementation efforts by Intel, Sun, IBM; growing attention
• Why TM?
  Many drawbacks of traditional approaches using locks and monitors: error-prone, difficult, poor composability, …
Lock: only one thread can execute
  lock data
  modify/use data
  unlock data
TM: many threads can execute
  atomic { modify/use data }
![Page 4: On the Performance of Window-Based Contention Managers for Transactional Memory](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816374550346895dd44ffe/html5/thumbnails/4.jpg)
Transactional Memory
• Transactions perform a sequence of read and write operations on shared resources and appear to execute atomically
• TM may allow transactions to run concurrently but the results must be equivalent to some sequential execution
• ACI(D) properties to ensure correctness
Example:
Initially, x == 1, y == 2
T1: atomic { x = 2; y = x+1; }
T2: atomic { r1 = x; r2 = y; }
T1 then T2: r1 == 2, r2 == 3
T2 then T1: r1 == 1, r2 == 2
Interleaved: T2 reads x (r1 == 1), T1 runs x = 2; y = 3;, then T2 reads y (r2 == 3)
Incorrect: r1 == 1, r2 == 3
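The correctness condition in this example can be checked mechanically. The following is a minimal sketch, not part of the original slides: it runs T1 and T2 in both serial orders and tests whether an observed (r1, r2) pair matches one of them. The function names `serial_outcomes` and `is_serializable` are illustrative only.

```python
from itertools import permutations


def t1(state):
    # T1: atomic { x = 2; y = x + 1; }
    state["x"] = 2
    state["y"] = state["x"] + 1


def t2(state):
    # T2: atomic { r1 = x; r2 = y; }
    state["r1"] = state["x"]
    state["r2"] = state["y"]


def serial_outcomes():
    """Return every (r1, r2) pair produced by running T1 and T2 one after the other."""
    outcomes = set()
    for order in permutations([t1, t2]):
        # Initial values from the slide: x == 1, y == 2.
        state = {"x": 1, "y": 2, "r1": None, "r2": None}
        for txn in order:
            txn(state)
        outcomes.add((state["r1"], state["r2"]))
    return outcomes


def is_serializable(observed):
    """An observed result is acceptable only if some serial order produces it."""
    return observed in serial_outcomes()


print(serial_outcomes())
print(is_serializable((1, 3)))
```

The interleaved result (1, 3) is rejected because neither serial order yields it.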
![Page 5: On the Performance of Window-Based Contention Managers for Transactional Memory](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816374550346895dd44ffe/html5/thumbnails/5.jpg)
Software TM Systems
Conflicts:
  A contention manager decides whether to abort or delay a transaction
Centralized or Distributed:
  Each thread may have its own CM
Example:
Initially, x == 1, y == 1
T1: atomic { … x = 2; }
T2: atomic { y = 2; … x = 3; }
T1 and T2 conflict on x
  Option 1: abort T1, undo its changes (set x == 1) and restart
  Option 2: abort T2 (set y == 1) and restart, OR T2 waits and retries
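The abort-or-wait decision can be pictured as a small policy function. The sketch below is an assumption made for illustration: it uses an "older transaction wins" timestamp rule, which is only loosely in the spirit of timestamp-based managers and is neither the published Greedy rule nor the DSTM2 interface. The names `Txn`, `Decision`, and `resolve` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ABORT_OTHER = "abort the transaction that holds the object"
    WAIT = "delay the requesting transaction and retry later"


@dataclass
class Txn:
    name: str
    start_time: int  # smaller value means the transaction started earlier


def resolve(requester: Txn, holder: Txn) -> Decision:
    """Hypothetical policy: the older transaction wins the conflict."""
    if requester.start_time < holder.start_time:
        return Decision.ABORT_OTHER
    return Decision.WAIT


# T1 started first and already wrote x; T2 now tries to write x.
t1 = Txn("T1", start_time=0)
t2 = Txn("T2", start_time=1)
print(resolve(requester=t2, holder=t1))
```

With these assumed start times T2 is told to wait; swapping the roles makes T1 abort T2 instead.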
![Page 6: On the Performance of Window-Based Contention Managers for Transactional Memory](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816374550346895dd44ffe/html5/thumbnails/6.jpg)
Transaction Scheduling
The most common model:
  m concurrent transactions on m cores that share s objects
  Each transaction is a sequence of operations, and an operation takes one time unit
  Duration is fixed
Throughput Guarantees:
  Makespan: the time needed to commit all m transactions
  Competitive Ratio: $\frac{\text{Makespan of my CM}}{\text{Makespan of optimal CM}}$
Problem Complexity:
  NP-Hard (related to vertex coloring)
Challenge:
  How to schedule transactions so that makespan is minimized?
[Figure: conflict graph on 8 transactions, nodes 1 to 8]
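The link between makespan, vertex coloring, and the competitive ratio can be illustrated on a tiny conflict graph. This is a sketch under stated assumptions: every transaction takes one slot of length `tau`, conflicting transactions may not share a slot, and the 4-node graph is made up for the example (it is not the 8-node graph on the slide). The greedy first-fit scheduler stands in for "my CM", and the optimum is found by brute force, which is feasible only for very small inputs.

```python
from itertools import product


def greedy_schedule(conflicts, num_txns):
    """Give each transaction, in index order, the earliest slot free of conflicts."""
    slot = {}
    for t in range(num_txns):
        taken = {slot[u] for u in conflicts.get(t, ()) if u in slot}
        s = 0
        while s in taken:
            s += 1
        slot[t] = s
    return slot


def makespan(slot, tau=1):
    """Time to commit everything: number of slots used times the slot length."""
    return tau * (max(slot.values()) + 1)


def optimal_makespan(conflicts, num_txns, tau=1):
    """Brute force: smallest number of slots that separates all conflicting pairs."""
    for k in range(1, num_txns + 1):
        for assignment in product(range(k), repeat=num_txns):
            if all(assignment[t] != assignment[u]
                   for t in conflicts for u in conflicts[t]):
                return tau * k
    return tau * num_txns


# Hypothetical conflict graph: a path 0 - 2 - 3 - 1.
conflicts = {0: [2], 1: [3], 2: [0, 3], 3: [2, 1]}
mine = makespan(greedy_schedule(conflicts, 4))
best = optimal_makespan(conflicts, 4)
print(mine, best, mine / best)
```

On this instance the first-fit order needs three slots while two suffice, so the ratio is above 1.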
![Page 7: On the Performance of Window-Based Contention Managers for Transactional Memory](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816374550346895dd44ffe/html5/thumbnails/7.jpg)
Literature
• Lots of proposals
  Polka, Priority, Karma, SizeMatters, …
• Drawbacks
  Some need globally shared data (e.g., a global clock)
  Workload dependent
  Many have no theoretical provable properties (e.g., Polka, which nonetheless has good overall empirical performance)
• Mostly empirical evaluation using different benchmarks
  Choice of a contention manager significantly affects the performance
  Do not perform well in the worst case (i.e., as contention, system size, and number of threads increase)
![Page 8: On the Performance of Window-Based Contention Managers for Transactional Memory](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816374550346895dd44ffe/html5/thumbnails/8.jpg)
Literature on Theoretical Bounds
Guerraoui et al. [PODC’05]: First contention manager GREEDY with $O(s^2)$ competitive bound
Attiya et al. [PODC’06]: Bound of GREEDY improved to $O(s)$
Schneider and Wattenhofer [ISAAC’09]: RandomizedRounds with $O(C \cdot \log m)$ (C is the maximum degree of a transaction in the conflict graph)
Attiya et al. [OPODIS’09]: Bimodal scheduler with $O(s)$ bound for read-dominated workloads
Sharma and Busch [OPODIS’10]: Two algorithms with O() and O() bounds for balanced workloads
![Page 9: On the Performance of Window-Based Contention Managers for Transactional Memory](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816374550346895dd44ffe/html5/thumbnails/9.jpg)
Objectives
Scalable transactional memory scheduling:
Design contention managers that exhibit both good theoretical and empirical performance guarantees
Design contention managers that scale well with the system size and complexity
![Page 10: On the Performance of Window-Based Contention Managers for Transactional Memory](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816374550346895dd44ffe/html5/thumbnails/10.jpg)
Execution Window Model
• Collection of n sets of m concurrent transactions that share s objects
[Figure: m × n execution window; m threads (rows), each issuing transactions 1, 2, 3, …, n (columns)]
Assuming maximum degree C in the conflict graph and execution time duration τ:
  Serialization upper bound: $\tau \cdot \min(Cn, mn)$
  One-shot bound: $O(sn)$ [Attiya et al., PODC’06]
  Using RandomizedRounds: $O(\tau \cdot Cn \log m)$
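To get a feel for how these bounds behave, the sketch below plugs numbers into the two expressions that depend on C. It is an illustration only: the parameter values are invented, the natural logarithm is an assumed choice of base, and the constant hidden by the O(·) notation is ignored, so the second function gives a growth term rather than an actual running time.

```python
import math


def serialization_bound(tau, C, m, n):
    """Trivial upper bound from the slide: tau * min(C*n, m*n)."""
    return tau * min(C * n, m * n)


def randomized_rounds_term(tau, C, m, n):
    """Growth term tau * C * n * log(m); the O(.) constant is not modelled."""
    return tau * C * n * math.log(m)


# Hypothetical window: m = 8 threads, n = 10 transactions each, tau = 1.
for C in (3, 20):
    print(C, serialization_bound(1, C, 8, 10), randomized_rounds_term(1, C, 8, 10))
```

When C is small the serialization bound is governed by C·n, and once C exceeds m it is capped at m·n.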
![Page 11: On the Performance of Window-Based Contention Managers for Transactional Memory](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816374550346895dd44ffe/html5/thumbnails/11.jpg)
Theoretical Results
• Offline Algorithm: (maximal independent sets)
  For scheduling-with-conflicts environments, e.g., traffic intersection control, dining philosophers problem
  Makespan: $O(\tau \cdot (C + n \log(mn)))$ (C is the conflict measure)
  Competitive ratio: $O(s + \log(mn))$ with high probability (whp)
• Online Algorithm: (random priorities)
  For online scheduling environments
  Makespan: $O(\tau \cdot (C \log(mn) + n \log^2(mn)))$
  Competitive ratio: $O(s \log(mn) + \log^2(mn))$ whp
• Adaptive Algorithm
  Conflict graph and maximum degree C both not known
  Adaptively guesses C starting from 1
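The adaptive idea of guessing C from 1 upward can be sketched as a simple search loop. The slide does not say how the guess grows, so the doubling step here is an assumption, as are the callback `run_with_guess` (which stands in for "execute with this guess and report whether it was large enough") and the cap `max_guess`.

```python
def adapt_guess(run_with_guess, max_guess):
    """Start from a guess of 1 and double it until the run reports success.

    Returns the first guess that succeeded (or None) and the list of guesses tried.
    """
    guess = 1
    attempts = []
    while guess <= max_guess:
        attempts.append(guess)
        if run_with_guess(guess):
            return guess, attempts
        guess *= 2  # assumed growth rule, not stated on the slide
    return None, attempts


# Hypothetical workload whose true (unknown) conflict degree is 5.
true_C = 5
print(adapt_guess(lambda g: g >= true_C, max_guess=64))
```

Under the doubling assumption the number of attempts grows only logarithmically in the true value of C.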
![Page 12: On the Performance of Window-Based Contention Managers for Transactional Memory](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816374550346895dd44ffe/html5/thumbnails/12.jpg)
Intuition (1)
• Introduce random delays at the beginning of the execution window
[Figure: the m × n execution window before and after each thread's row is shifted by a random delay; labels: Transactions 1, 2, 3, …, n; Random interval; n; n']
• Random delays help conflicting transactions shift, avoiding many conflicts
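A toy model makes the effect of the random delays concrete. The assumptions here are deliberately crude and are not the model analysed in the talk: transaction j of each thread conflicts with transaction j of every other thread, each transaction takes one time step, and thread i runs its j-th transaction at time `delays[i] + j`. Two threads then collide only if they drew the same delay.

```python
import random
from itertools import combinations


def same_step_conflicts(delays, n):
    """Count conflicting pairs that execute in the same time step (toy model)."""
    pairs_with_equal_delay = sum(
        1 for a, b in combinations(delays, 2) if a == b
    )
    # Each pair of threads with equal delay collides once per column of the window.
    return n * pairs_with_equal_delay


def random_delays(m, interval, seed=0):
    """Draw one delay per thread uniformly from {0, ..., interval - 1}."""
    rng = random.Random(seed)
    return [rng.randrange(interval) for _ in range(m)]


m, n = 8, 10
print(same_step_conflicts([0] * m, n))                 # no delays
print(same_step_conflicts(random_delays(m, 16), n))    # random delays
```

In this model the collision count with delays can never exceed the count without them, and it drops as the random interval grows.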
![Page 13: On the Performance of Window-Based Contention Managers for Transactional Memory](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816374550346895dd44ffe/html5/thumbnails/13.jpg)
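Intuition (2)
• Frame-based execution to handle conflicts
[Figure: m threads (rows); thread i starts after an initial delay q_i (q1, q2, q3, q4, …) and then executes its frames F_i1, F_i2, …, F_in, all of the same frame size]
Makespan: max {q_i} + (No of frames × frame size)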
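The makespan expression on this slide is simple arithmetic, shown below with made-up numbers. The sketch assumes the delays q_i and the frame size are measured in the same time unit; the function name `window_makespan` is illustrative.

```python
def window_makespan(delays, num_frames, frame_size):
    """Slide formula: max{q_i} + (number of frames x frame size)."""
    return max(delays) + num_frames * frame_size


# Hypothetical values: four threads with delays q1..q4, 4 frames of 6 time units.
print(window_makespan([3, 0, 5, 2], num_frames=4, frame_size=6))
```

The largest initial delay is paid once, while the frame size is paid once per frame, so the frame term dominates for long windows.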
![Page 14: On the Performance of Window-Based Contention Managers for Transactional Memory](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816374550346895dd44ffe/html5/thumbnails/14.jpg)
Experimental Results (1)
• Platform used
  Intel i7 (4-core processor) with 8GB RAM and hyperthreading on
• Implemented window algorithms in DSTM2, an eager conflict management STM implementation
• Benchmarks used
  List, RBTree, SkipList, and Vacation from STAMP suite
• Experiments were run for 10 seconds and the data plotted are the average of 6 experiments
• Contention managers used for comparison
  Polka: Published best CM but no theoretical provable properties
  Greedy: First CM with both theoretical and empirical properties
  Priority: Simple priority-based CM
![Page 15: On the Performance of Window-Based Contention Managers for Transactional Memory](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816374550346895dd44ffe/html5/thumbnails/15.jpg)
Experimental Results (2)
Performance throughput:
  No of txns committed per second
  Measures the useful work done by a CM each time step
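As a small worked example of the throughput metric, the sketch below divides commit counts by the run length and averages over runs. The commit counts are invented for illustration; only the 10-second run length and the averaging over 6 experiments come from the experimental setup described earlier.

```python
def throughput(commits, seconds):
    """Committed transactions per second for a single run."""
    return commits / seconds


def mean_throughput(commit_counts, seconds=10):
    """Average the per-run throughput over all runs."""
    runs = [throughput(c, seconds) for c in commit_counts]
    return sum(runs) / len(runs)


# Hypothetical commit counts from 6 runs of 10 seconds each.
print(mean_throughput([100, 200, 300, 100, 200, 300]))
```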
[Figure: List Benchmark and SkipList Benchmark; committed transactions/sec vs. No of threads (0 to 35) for Polka, Greedy, Priority, Online, Adaptive]
![Page 16: On the Performance of Window-Based Contention Managers for Transactional Memory](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816374550346895dd44ffe/html5/thumbnails/16.jpg)
Experimental Results (3)
[Figure: RBTree Benchmark and Vacation Benchmark; committed transactions/sec vs. No of threads (0 to 35) for Polka, Greedy, Priority, Online, Adaptive]
Performance throughput:
Conclusion #1: Window CMs always improve throughput over Greedy and Priority
Conclusion #2: Throughput is comparable to Polka (outperforms Polka in Vacation)
![Page 17: On the Performance of Window-Based Contention Managers for Transactional Memory](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816374550346895dd44ffe/html5/thumbnails/17.jpg)
Experimental Results (4)
Aborts per commit ratio:
  No of txns aborted per txn commit
  Measures efficiency of a CM in utilizing computing resources
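The ratio is a direct quotient of two counters. The sketch below uses invented counter values and a hypothetical function name, and it makes one design choice explicit: the ratio is treated as undefined when nothing committed.

```python
def aborts_per_commit(aborts, commits):
    """Number of aborted attempts per committed transaction."""
    if commits == 0:
        raise ValueError("ratio is undefined when no transaction committed")
    return aborts / commits


# Hypothetical counters: 30 aborts while committing 10 transactions.
print(aborts_per_commit(30, 10))
```

A value of 3.0 here would mean that, on average, three attempts were thrown away for each transaction that committed.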
[Figure: List Benchmark and SkipList Benchmark; No of aborts/commit vs. No of threads (0 to 35) for Polka, Greedy, Priority, Online, Adaptive]
![Page 18: On the Performance of Window-Based Contention Managers for Transactional Memory](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816374550346895dd44ffe/html5/thumbnails/18.jpg)
Experimental Results (5)
Aborts per commit ratio:
[Figure: Vacation Benchmark and RBTree Benchmark; No of aborts/commit vs. No of threads (0 to 35) for Polka, Greedy, Priority, Online, Adaptive]
Conclusion #3: Window CMs always reduce the no of aborts over Greedy and Priority
Conclusion #4: No of aborts is comparable to Polka (outperforms Polka in Vacation)
![Page 19: On the Performance of Window-Based Contention Managers for Transactional Memory](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816374550346895dd44ffe/html5/thumbnails/19.jpg)
Experimental Results (6)
Execution time overhead:
  Total time needed to commit all transactions
  Measures scalability of a CM in different contention scenarios
[Figure: List Benchmark and SkipList Benchmark; total execution time (in seconds) vs. amount of contention (Low, Medium, High) for Polka, Greedy, Priority, Online, Adaptive]
![Page 20: On the Performance of Window-Based Contention Managers for Transactional Memory](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816374550346895dd44ffe/html5/thumbnails/20.jpg)
Experimental Results (7)
Execution time overhead:
[Figure: RBTree Benchmark and Vacation Benchmark; total execution time (in seconds) vs. amount of contention (Low, Medium, High) for Polka, Greedy, Priority, Online, Adaptive]
Conclusion #5: Window CMs generally reduce execution time over Greedy and Priority (except SkipList)
Conclusion #6: Window CMs are good at high contention due to randomization overhead
![Page 21: On the Performance of Window-Based Contention Managers for Transactional Memory](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816374550346895dd44ffe/html5/thumbnails/21.jpg)
Future Directions
• Encouraging theoretical and practical results
• Plan to explore (experimental)
  Wasted Work
  Repeat Conflicts
  Average Response Time
  Average committed transactions durations
• Plan to do experiments using more complex benchmarks
  E.g., STAMP, STMBench7, and other STM implementations
• Plan to explore (theoretical)
  Other contention managers with both theoretical and empirical guarantees
![Page 22: On the Performance of Window-Based Contention Managers for Transactional Memory](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816374550346895dd44ffe/html5/thumbnails/22.jpg)
Conclusions
• TM contention management is an important online scheduling problem
• Contention managers should scale with the size and complexity of the system
• Theoretical as well as practical performance guarantees are essential for design decisions
• Need to explore mechanisms that scale well in other multi-core architectures:
  ccNUMA and hierarchical multilevel cache architectures
  Large scale distributed systems
![Page 23: On the Performance of Window-Based Contention Managers for Transactional Memory](https://reader036.fdocuments.net/reader036/viewer/2022062520/56816374550346895dd44ffe/html5/thumbnails/23.jpg)
Thank you for your attention!!!