UNIVERSITY OF MASSACHUSETTS, AMHERST • School of Computer Science
PREDATOR: Predictive False Sharing Detection
Tongping Liu*, Chen Tian, Ziang Hu, Emery Berger*
*University of Massachusetts Amherst, Huawei US Research Center
Parallelism: Expectation is Awesome
[Figure: runtime (s), 0 to 90, vs. number of threads (1, 2, 4, 8); series: Expectation]
Parallel Program
int count[8];
int W;

void increment(int S) {
  for (int in = S; in < S + W; in++)
    for (int j = 0; j < 1000000; j++)  // 1M iterations
      count[in]++;
}

int main(int THREADS) {
  W = 8 / THREADS;
  for (int i = 0; i < 8; i += W)
    spawn(increment, i);
}
Parallelism: Reality is Awful
[Figure: runtime (s), 0 to 140, vs. number of threads (1, 2, 4, 8); series: Expectation, Reality]
False sharing slows the program by 13X
Parallel Program (same code as on the previous slide), annotated "False sharing"
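A minimal runnable sketch of the counter program above, assuming C++11 `std::thread` in place of the slide's `spawn` and a 64-byte cache line (the line size is an assumption at this point in the talk). `PackedCounter`, `PaddedCounter`, and `run` are hypothetical names introduced for illustration. The padded variant gives each counter its own line, which is a common way to remove this kind of false sharing.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Assumption: 64-byte cache lines.
constexpr std::size_t kLineSize = 64;

// Packed layout: adjacent counters share a cache line, like count[8] on the slide.
struct PackedCounter {
    volatile int value = 0;
};

// Padded layout: alignas rounds each counter up to a full cache line.
struct alignas(kLineSize) PaddedCounter {
    volatile int value = 0;
};

// Spawns one thread per counter; each thread increments only its own counter.
template <typename Counter, std::size_t N>
void run(Counter (&counters)[N], int iterations) {
    assert(iterations >= 0);
    std::vector<std::thread> workers;
    for (std::size_t t = 0; t < N; ++t) {
        workers.emplace_back([&counters, t, iterations] {
            for (int j = 0; j < iterations; ++j) {
                counters[t].value = counters[t].value + 1;
            }
        });
    }
    for (std::thread& worker : workers) {
        worker.join();
    }
}
```

Both layouts compute the same result. Timing `run` on a `PackedCounter` array against a `PaddedCounter` array on a multicore machine is one way to observe the gap, whose size depends on the hardware.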
False Sharing in Real Applications
False sharing slows MySQL by 50%
Cache Line
False Sharing vs. True Sharing
False Sharing vs. True Sharing
[Diagram: one cache line; False Sharing panel with Task 1, Task 2, Task 3, Task 4; True Sharing panel with Task 1, Task 2]
Resource Contention at Cache Line Level
False Sharing Causes Performance Problems
[Diagram: Thread 1 on Core 1 and Thread 2 on Core 2, each core with its own cache, above Main Memory; labeled Invalidate]
Cache line: basic unit of data transfer
False Sharing Causes Performance Problems
[Diagram: Thread 1 on Core 1 and Thread 2 on Core 2, each core with its own cache, above Main Memory; labeled Invalidate]
Interleaved accesses cause cache invalidations
False Sharing is Everywhere
me = 1; you = 1;                  // globals
me = new Foo; you = new Bar;      // heap
class X { int me; int you; };     // fields
array[me] = 12; array[you] = 13;  // array indices
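All four cases above come down to two addresses landing in the same cache line. A small sketch of that check follows, assuming a 64-byte line; `line_index`, `same_cache_line`, `Fields`, and `PaddedFields` are hypothetical names, and the structs mirror the slide's `class X`.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 64-byte cache lines.
constexpr std::uintptr_t kLineSize = 64;

// Index of the cache line that contains address p.
inline std::uintptr_t line_index(const void* p) {
    return reinterpret_cast<std::uintptr_t>(p) / kLineSize;
}

// True when both addresses fall into the same cache line.
inline bool same_cache_line(const void* a, const void* b) {
    assert(a != nullptr && b != nullptr);
    return line_index(a) == line_index(b);
}

// Adjacent fields: both ints sit in the first line of the object.
struct alignas(64) Fields {
    int me;
    int you;
};

// Padded fields: the second int is pushed to the start of the next line.
struct alignas(64) PaddedFields {
    int me;
    alignas(64) int you;
};
```

If one thread writes `me` while another writes `you`, the `Fields` layout is a false sharing candidate and the `PaddedFields` layout is not, at the cost of a larger object.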
False Sharing is Hard to Diagnose
Multiple experts worked together to diagnose MySQL scalability issue (1.5M LOC)
Problems of Existing Tools
• No precise information/false positives
  – WIBA’09, VEE’11, EuroSys’13, SC’13
• Accurate & Precise
  – OOPSLA’11 (cannot detect read-write false sharing)
Shared problem: only detect observed false sharing
False Sharing Causes Performance Problems
[Diagram: Task 1 on Core 1 and Task 2 on Core 2, each core with its own cache, above Main Memory; labeled Invalidate]
Interleaved accesses → Cache invalidations → Performance problems
Detect false sharing causing performance problems
Find cache lines with many cache invalidations
Find Lines with Many Invalidations
[Diagram: Memory (Global, Heap) shown as a sequence of cache lines]
Track cache invalidations on each cache line
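One way to picture per-line tracking is a table keyed by cache line index. The sketch below is illustrative only: `InvalidationCounts`, `add`, `count`, and `hot_lines` are hypothetical names, the 64-byte line size is an assumption, and the slides do not say which data structure PREDATOR uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <unordered_map>
#include <utility>
#include <vector>

// Assumption: 64-byte cache lines.
constexpr std::uintptr_t kLineSize = 64;

class InvalidationCounts {
public:
    // Charges one invalidation to the cache line containing `address`.
    void add(std::uintptr_t address) { ++counts_[address / kLineSize]; }

    // Invalidations recorded so far for the line containing `address`.
    std::uint64_t count(std::uintptr_t address) const {
        auto it = counts_.find(address / kLineSize);
        return it == counts_.end() ? 0 : it->second;
    }

    // Start addresses of lines with at least `threshold` invalidations, hottest first.
    std::vector<std::uintptr_t> hot_lines(std::uint64_t threshold) const {
        assert(threshold > 0);
        typedef std::pair<std::uint64_t, std::uintptr_t> CountAndLine;
        std::vector<CountAndLine> hot;
        for (const auto& entry : counts_) {
            if (entry.second >= threshold) {
                hot.emplace_back(entry.second, entry.first * kLineSize);
            }
        }
        std::sort(hot.begin(), hot.end(),
                  [](const CountAndLine& a, const CountAndLine& b) { return a.first > b.first; });
        std::vector<std::uintptr_t> lines;
        for (const CountAndLine& item : hot) {
            lines.push_back(item.second);
        }
        return lines;
    }

private:
    std::unordered_map<std::uintptr_t, std::uint64_t> counts_;
};
```

Addresses `0x1000` and `0x1008` are charged to the same entry because they share a line, so only lines that cross the threshold need closer inspection.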
Track Cache Invalidations
• Hardware-based approach
  – Needs hardware support
  – No portability
• Simulation-based approach
  – Needs hardware info such as cache hierarchy, cache capacity
  – Very slow
• Conservative Assumptions
  – Each thread runs on a different core with its own private cache.
  – Infinite cache capacity.
PREDATOR: based on memory access history of each cache line
Track Cache Invalidations
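[Animation: threads T1 and T2 issue reads (r) and writes (w) to one cache line over time; a small per-line history table is updated on each access, and the number of invalidations is counted as the accesses arrive]
Each Entry: { Thread ID, Access Type }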
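The slide only states that each history entry holds a thread ID and an access type. The sketch below fills in one plausible set of update rules for a two-entry table under the conservative assumptions (private cache per thread, infinite capacity). The rules, the table size, and the names `LineHistory`, `Entry`, and `record` are assumptions for illustration, not a transcription of PREDATOR's code.

```cpp
#include <cassert>
#include <cstdint>

enum class Access { Read, Write };

struct Entry {
    int tid;
    Access type;
};

// Access history of one cache line (assumed to hold at most two entries).
class LineHistory {
public:
    void record(int tid, Access type) {
        assert(tid >= 0);
        if (type == Access::Read) {
            // Keep a read only if there is room and it adds a thread not yet recorded.
            if (used_ == 0 || (used_ == 1 && entries_[0].tid != tid)) {
                entries_[used_++] = Entry{tid, type};
            }
            return;
        }
        // A write invalidates the line if any other thread has touched it since
        // the last invalidation.
        for (int i = 0; i < used_; ++i) {
            if (entries_[i].tid != tid) {
                ++invalidations_;
                break;
            }
        }
        // After the write, only the writer holds a valid copy.
        entries_[0] = Entry{tid, type};
        used_ = 1;
    }

    std::uint64_t invalidations() const { return invalidations_; }

private:
    Entry entries_[2] = {};
    int used_ = 0;
    std::uint64_t invalidations_ = 0;
};
```

Under these rules, repeated accesses by a single thread never count as invalidations; only a write that follows another thread's access does.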
PREDATOR Components
• Compiler Instrumentation: instruments every memory read/write access
• Runtime System: collects memory accesses and reports false sharing
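To make the split between the two components concrete, here is a hand-written sketch of what instrumented code could look like: every load and store first calls a hook in a runtime object. The names `Runtime`, `on_access`, `instrumented_load`, and `instrumented_store` are hypothetical, and a real compiler pass would insert such calls automatically rather than through wrapper functions.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <vector>

struct AccessRecord {
    std::uintptr_t address;
    bool is_write;
};

// Hypothetical runtime: collects every reported access in a log.
class Runtime {
public:
    void on_access(const void* p, bool is_write) {
        assert(p != nullptr);
        std::lock_guard<std::mutex> lock(mutex_);
        log_.push_back({reinterpret_cast<std::uintptr_t>(p), is_write});
    }

    std::vector<AccessRecord> snapshot() {
        std::lock_guard<std::mutex> lock(mutex_);
        return log_;
    }

private:
    std::mutex mutex_;
    std::vector<AccessRecord> log_;
};

// Single shared runtime instance.
inline Runtime& runtime() {
    static Runtime instance;
    return instance;
}

// Stand-ins for the calls that instrumentation would place before each access.
template <typename T>
T instrumented_load(const T* p) {
    runtime().on_access(p, false);
    return *p;
}

template <typename T>
void instrumented_store(T* p, T value) {
    runtime().on_access(p, true);
    *p = value;
}
```

A global log behind a mutex is far too slow for real use; it is chosen here only to keep the sketch short.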
Detect Problems Correctly & Precisely
• Correctly
  – No false alarms
  [Diagram: False Sharing (Task 1, Task 2, Task 3, Task 4) vs. True Sharing (Task 1, Task 2)]
  Track memory accesses on each word
• Precisely
  – Global variables
  – Heap objects: pinpoint the line of memory allocation
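Word-level tracking is what separates false sharing from true sharing: threads that touch the same word really share data, while threads that touch different words of one line do not. The sketch below assumes a 64-byte line split into eight 8-byte words and at most 32 threads. `LineWords`, `WordUsers`, and `SharingKind` are hypothetical names, and the classification rule is a simplification.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumption: 64-byte line, 8-byte words.
constexpr int kWordsPerLine = 8;

enum class SharingKind { NotShared, FalseSharing, TrueSharing };

// Bitmasks of the thread IDs (0..31) that read or wrote one word.
struct WordUsers {
    std::uint32_t readers = 0;
    std::uint32_t writers = 0;
};

inline bool more_than_one_bit(std::uint32_t mask) {
    return (mask & (mask - 1)) != 0;
}

class LineWords {
public:
    void record(int word, int tid, bool is_write) {
        assert(word >= 0 && word < kWordsPerLine);
        assert(tid >= 0 && tid < 32);
        const std::uint32_t bit = std::uint32_t{1} << tid;
        if (is_write) {
            words_[word].writers |= bit;
        } else {
            words_[word].readers |= bit;
        }
    }

    // Simplification: a line with any truly shared word is reported as true sharing.
    SharingKind classify() const {
        std::uint32_t line_users = 0;
        std::uint32_t line_writers = 0;
        for (const WordUsers& w : words_) {
            const std::uint32_t users = w.readers | w.writers;
            // A written word used by several threads is genuinely shared data.
            if (w.writers != 0 && more_than_one_bit(users)) {
                return SharingKind::TrueSharing;
            }
            line_users |= users;
            line_writers |= w.writers;
        }
        // Several threads, at least one writer, but no common word.
        if (line_writers != 0 && more_than_one_bit(line_users)) {
            return SharingKind::FalseSharing;
        }
        return SharingKind::NotShared;
    }

private:
    std::array<WordUsers, kWordsPerLine> words_{};
};
```

Reporting only the `FalseSharing` case is what avoids false alarms on lines whose invalidations come from data the threads actually share.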
PREDATOR’s Report
Why do we need prediction?
Necessity of False Sharing Prediction
[Diagram: accesses by Thread 1 and Thread 2 under three layouts: Cache line 1 and Cache line 2; Cache line 1 and Cache line 2, labeled False Sharing; a single Cache line 1, labeled False Sharing]
Properties Affecting False Sharing Occurrence
• Change of memory layout
  – 32-bit platform vs. 64-bit platform
  – Different memory allocator
  – Different compiler or optimization
  – Different allocation order by changing the code, e.g., printf
• Run on hardware with different cache line size
Example of False Sharing Sensitivity
[Diagram: memory layouts at Offset = 0, Offset = 8, …, Offset = 56; colors represent threads]
Cache line size = 64 bytes
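A small calculation shows why the starting offset matters. The sketch assumes per-thread objects laid out back to back, starting `offset` bytes past a cache line boundary, and counts the lines that hold bytes of more than one object. `shared_lines` and its parameters are hypothetical, and whether a shared line actually hurts depends on which bytes are hot.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Cache line size from the slide.
constexpr std::size_t kLineSize = 64;

// Number of cache lines that contain bytes of more than one object, for `count`
// objects of `size` bytes placed contiguously, `offset` bytes past a line boundary.
std::size_t shared_lines(std::size_t offset, std::size_t size, std::size_t count) {
    assert(size > 0 && count > 0 && offset < kLineSize);
    const std::size_t end = offset + size * count;  // one past the last byte
    const std::size_t num_lines = (end + kLineSize - 1) / kLineSize;
    std::size_t shared = 0;
    for (std::size_t line = 0; line < num_lines; ++line) {
        // Byte range of this line that is covered by the objects.
        const std::size_t lo = std::max(line * kLineSize, offset);
        const std::size_t hi = std::min(line * kLineSize + kLineSize - 1, end - 1);
        const std::size_t first_object = (lo - offset) / size;
        const std::size_t last_object = (hi - offset) / size;
        if (first_object != last_object) {
            ++shared;
        }
    }
    return shared;
}
```

With eight 64-byte objects, `shared_lines(0, 64, 8)` is 0 while `shared_lines(8, 64, 8)` is 7: the same program can be free of false sharing or full of it depending only on where the allocation happens to start.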
Example of False Sharing Sensitivity
[Figure: runtime (seconds), 0 to 6, at Offset = 0, 8, 16, 24, 32, 40, 48, 56]
PREDATOR predicts false sharing problems without occurrence
Prediction Based on Virtual Cache Lines
[Diagram: accesses by Thread 1 and Thread 2 under three layouts. Real case: Cache line 1, Cache line 2. Prediction 1: Virtual cache line 1, Virtual cache line 2, labeled False Sharing. Prediction 2: Virtual cache line 1, labeled False Sharing.]
Track Invalidations on Virtual Cache Lines
[Diagram: hot accesses X and Y at distance d; the tracked virtual line extends (sz-d)/2 before X and (sz-d)/2 after Y; the other virtual lines are non-tracked]
d < sz (the cache line size), (X, Y) from different threads && one of them is a write
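The condition and the centered virtual line can be written down directly. The sketch below takes two hot accesses, applies the test from the slide (distance below the line size, different threads, at least one write), and places a virtual line with (sz-d)/2 bytes of slack on each side. `HotAccess`, `VirtualLine`, and `predict_virtual_line` are hypothetical names, access widths are ignored, and sz = 64 is assumed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// sz on the slide; 64 bytes is an assumption.
constexpr std::uintptr_t kLineSize = 64;

struct HotAccess {
    std::uintptr_t address;
    int tid;
    bool is_write;
};

// Half-open byte range [begin, end) of a virtual cache line.
struct VirtualLine {
    std::uintptr_t begin;
    std::uintptr_t end;
};

// Returns true and fills `out` when the pair could falsely share a line
// under some other memory layout.
bool predict_virtual_line(const HotAccess& a, const HotAccess& b, VirtualLine* out) {
    assert(out != nullptr);
    const std::uintptr_t x = std::min(a.address, b.address);
    const std::uintptr_t y = std::max(a.address, b.address);
    const std::uintptr_t d = y - x;
    if (d == 0 || d >= kLineSize) return false;     // same word, or can never share a line
    if (a.tid == b.tid) return false;               // one thread: no sharing
    if (!a.is_write && !b.is_write) return false;   // reads alone do not invalidate
    const std::uintptr_t slack = (kLineSize - d) / 2;
    assert(x >= slack);
    out->begin = x - slack;
    out->end = out->begin + kLineSize;
    return true;
}
```

For X at `0x1038` and Y at `0x1040`, which sit in different physical lines, d is 8 and the virtual line is `[0x101C, 0x105C)`, covering both accesses.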
Benchmark Results
| Benchmarks | Unknown Problem | Without Prediction | With Prediction | Improvement |
|---|---|---|---|---|
| Histogram | ✔ | ✔ | ✔ | 46% |
| Linear_regression | | | ✔ | 1207% |
| Reverse_index | | ✔ | ✔ | 0.09% |
| Word_count | | ✔ | ✔ | 0.14% |
| Streamcluster-1 | ✔ | ✔ | ✔ | 4.77% |
| Streamcluster-2 | | ✔ | ✔ | 7.52% |
Real Applications Results
• MySQL
  – Problem: False sharing occurs when different threads update the shared bitmap simultaneously.
  – Performance improves 180% after fixes.
• Boost library
  – Problem: “there will be 16 spinlocks per cache line”
  – Performance improves about 100%.
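Sixteen locks per line is what a 64-byte line holds if each lock takes 4 bytes (both sizes are assumptions here). A common remedy for this pattern is to align each lock to its own line. The sketch below illustrates that idea with hypothetical names (`PaddedSpinLock`, `lock_at`); it is not the actual Boost patch.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Assumption: 64-byte cache lines.
constexpr std::size_t kLineSize = 64;

// A spinlock that occupies a whole cache line, so neighbors in an array never share one.
struct alignas(kLineSize) PaddedSpinLock {
    std::atomic<bool> locked{false};

    bool try_lock() { return !locked.exchange(true, std::memory_order_acquire); }

    void lock() {
        while (!try_lock()) {
            // spin until the holder releases the lock
        }
    }

    void unlock() { locked.store(false, std::memory_order_release); }
};

static_assert(sizeof(PaddedSpinLock) == kLineSize, "one lock per cache line");

// Picks one lock from a fixed pool by index (hypothetical helper).
template <std::size_t N>
PaddedSpinLock& lock_at(PaddedSpinLock (&pool)[N], std::size_t index) {
    assert(index < N);
    return pool[index];
}
```

The trade-off is memory: a pool of padded locks is sixteen times larger than a packed pool under these assumptions.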
Performance Overhead of PREDATOR
[Figure: Execution Time Overhead. Normalized runtime (0 to 15) of Original, PREDATOR-NP, and PREDATOR for Phoenix (histogram, kmeans, linear_regression, matrix_multiply, pca, reverse_index, string_match, word_count), PARSEC (blackscholes, bodytrack, dedup, ferret, fluidanimate, streamcluster, swaptions, x264), Real Applications (aget, Boost, Memcached, MySQL, pbzip2, pfscan), and AVERAGE. Annotations: "2326", "5.6X"]
[Summary slide: PREDATOR components (Compiler Instrumentation, Runtime System); cache invalidation diagram (Thread 1 on Core 1, Thread 2 on Core 2, caches, Main Memory, Invalidate); Precise report; prediction on virtual cache lines (Real case, Prediction 1, Prediction 2)]
Detailed Prediction Algorithm
1. Find suspected cache lines
2. Track detailed memory accesses
3. Predict based on hot accesses
[Diagram: hot accesses X and Y at distance d]
d < sz && (X, Y) from different threads: potential false sharing
4. Tracking Cache Invalidations on the Virtual Line
[Diagram: hot accesses X and Y at distance d; the tracked virtual line extends (sz-d)/2 before X and (sz-d)/2 after Y; the other virtual lines are non-tracked]