Inter-Iteration Scalar Replacement in the Presence of Control-Flow
![Page 1: Inter-Iteration Scalar Replacement in the Presence of Control-Flow](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813dde550346895da7aa36/html5/thumbnails/1.jpg)
Inter-Iteration Scalar Replacement
in the Presence of Control-Flow
Mihai Budiu – Microsoft Research, Silicon Valley
Seth Copen Goldstein – Carnegie Mellon University
ODES 2005
![Page 2: Inter-Iteration Scalar Replacement in the Presence of Control-Flow](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813dde550346895da7aa36/html5/thumbnails/2.jpg)
Summary
• What: compiler optimization
• Where: dense regular matrix codes
  – FORTRAN
  – some media processing
• Goal: reduce number of memory accesses
• How: allocate array elements to registers
• New: optimal algorithm based on predication
![Page 3: Inter-Iteration Scalar Replacement in the Presence of Control-Flow](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813dde550346895da7aa36/html5/thumbnails/3.jpg)
Outline
• Scalar Replacement
• Predicated PRE
• Combining the two
• Results
![Page 4: Inter-Iteration Scalar Replacement in the Presence of Control-Flow](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813dde550346895da7aa36/html5/thumbnails/4.jpg)
Scalar Replacement
Before:
    a[i] = a[i] + 2;
    a[i] <<= 4;
After (front-end rewrite):
    tmp = a[i];
    tmp += 2;
    tmp <<= 4;
    a[i] = tmp;
Memory operations (back-end):
    before: ld a[i], arith ..., st a[i], ld a[i], arith ..., st a[i]
    after:  ld a[i], arith ..., arith ..., st a[i]
![Page 5: Inter-Iteration Scalar Replacement in the Presence of Control-Flow](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813dde550346895da7aa36/html5/thumbnails/5.jpg)
Inter-Iteration Scalar Replacement
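for (i=0; i < N; i++)
    a[i] += a[i+1];

Runtime:
    i=0: ld a[0], ld a[1], st a[0]
    i=1: ld a[1], ld a[2], st a[1]

tmp0 = a[0];
for (i=0; i < N; i++) {
    tmp1 = a[i+1];
    a[i] = tmp0 + tmp1;
    tmp0 = tmp1;
}

Runtime:
    before the loop: ld a[0]
    i=0: ld a[1], st a[0]
    i=1: ld a[2], st a[1]
The value of a[1] loaded into tmp1 at i=0 is carried into i=1 (as tmp0), so it is not reloaded.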
![Page 6: Inter-Iteration Scalar Replacement in the Presence of Control-Flow](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813dde550346895da7aa36/html5/thumbnails/6.jpg)
Rotating Scalars
for (i=0; i < N; i++)
    a[i] += a[i+3];

Invariant:
    tmp0 = a[i+0]
    tmp1 = a[i+1]
    tmp2 = a[i+2]
    tmp3 = a[i+3]

for (…) {
    ….
    tmp0 = tmp1;
    tmp1 = tmp2;
    tmp2 = tmp3;
    tmp3 = a[i+4];
}
Itanium has hardware support for rotating registers.
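As a hedged illustration of the rotation above, the following C sketch compares the original loop with a rotating-scalar version. The function names and the assumption that `a` holds at least `n + 3` elements are ours, not the slide's. The sketch also loads `a[i+3]` at the top of the body rather than `a[i+4]` at the bottom, which avoids reading one element past the range the original loop touches.

```c
#include <assert.h>

/* Original loop: two loads and one store per iteration.
   Assumes a has at least n + 3 elements. */
void add_shifted_ref(int *a, int n) {
    for (int i = 0; i < n; i++)
        a[i] += a[i + 3];
}

/* Rotating-scalar version: one load and one store per iteration. */
void add_shifted_rot(int *a, int n) {
    assert(n >= 0);
    if (n == 0)
        return;
    /* Prelude: establish tmpK == a[i+K] for K = 0..2 at i = 0. */
    int tmp0 = a[0], tmp1 = a[1], tmp2 = a[2];
    for (int i = 0; i < n; i++) {
        int tmp3 = a[i + 3];   /* the only load in the body */
        a[i] = tmp0 + tmp3;    /* tmp0 still equals the old a[i] */
        /* Rotate so the invariant holds for iteration i + 1. */
        tmp0 = tmp1;
        tmp1 = tmp2;
        tmp2 = tmp3;
    }
}
```

Note that the prelude reads `a[1]` and `a[2]` unconditionally, so this plain version is only safe without control flow in the loop body.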
![Page 7: Inter-Iteration Scalar Replacement in the Presence of Control-Flow](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813dde550346895da7aa36/html5/thumbnails/7.jpg)
Control-Flow
for (i=0; i < N; i++)
    if (i & 1)
        a[i] += a[i+3];
![Page 8: Inter-Iteration Scalar Replacement in the Presence of Control-Flow](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813dde550346895da7aa36/html5/thumbnails/8.jpg)
Outline
• Scalar Replacement
• Predicated PRE
• Combining the two
• Results
![Page 9: Inter-Iteration Scalar Replacement in the Presence of Control-Flow](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813dde550346895da7aa36/html5/thumbnails/9.jpg)
Availability
y = a[i];
...
if (x) {
    ...
    ... = a[i];    /* a[i] is available here: use y */
}
![Page 10: Inter-Iteration Scalar Replacement in the Presence of Control-Flow](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813dde550346895da7aa36/html5/thumbnails/10.jpg)
Conservative Analysis
if (x) {
    ...
    y = a[i];
}
...
... = a[i];    /* y? a[i] is available in y only if x held */
![Page 11: Inter-Iteration Scalar Replacement in the Presence of Control-Flow](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813dde550346895da7aa36/html5/thumbnails/11.jpg)
Predicated PRE
flag = false;
if (x) {
    ...
    y = a[i];
    flag = true;
}
...
... = flag ? y : a[i];
Invariant: flag = true ⇒ y = a[i]
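The flag-guarded reuse above can be sketched in runnable C. This is a minimal illustration, not the authors' implementation: the `load` helper and `load_count` counter are hypothetical instrumentation added only to make the number of memory reads observable, and the function name is ours.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical instrumentation: counts how often memory is read. */
static int load_count = 0;

static int load(const int *p) {
    load_count++;
    return *p;
}

/* Reads a[i] under the condition x, then unconditionally.
   The flag records whether y already holds a[i]. */
int read_twice_predicated(const int *a, int i, bool x) {
    assert(a && i >= 0);
    bool flag = false;
    int y = 0;
    int first = 0;
    if (x) {
        y = load(&a[i]);
        flag = true;        /* invariant: flag implies y == a[i] */
        first = y;
    }
    int second = flag ? y : load(&a[i]);  /* load only when needed */
    return first + second;
}
```

On either path `a[i]` is read from memory exactly once: when `x` holds the second use takes `y`, and otherwise the fallback load runs.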
![Page 12: Inter-Iteration Scalar Replacement in the Presence of Control-Flow](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813dde550346895da7aa36/html5/thumbnails/12.jpg)
Outline
• Scalar Replacement
• Predicated PRE
• Combining the two
• Results
![Page 13: Inter-Iteration Scalar Replacement in the Presence of Control-Flow](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813dde550346895da7aa36/html5/thumbnails/13.jpg)
Scalars and Flags
for (i=0; i < N; i++)
    if (i & 1)
        a[i] += a[i+3];

Invariant (bool ⇒ scalar):
    (valid0 = true) ⇒ tmp0 = a[i+0]
    (valid1 = true) ⇒ tmp1 = a[i+1]
    (valid2 = true) ⇒ tmp2 = a[i+2]
    (valid3 = true) ⇒ tmp3 = a[i+3]
![Page 14: Inter-Iteration Scalar Replacement in the Presence of Control-Flow](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813dde550346895da7aa36/html5/thumbnails/14.jpg)
Scalar Replacement Algorithm
ld a[i+k] becomes:
    if (!validk) {
        tmpk = a[i+k];
        validk = true;
    }
    (Can be implemented with predication or conditional moves.)
st a[i+k], v becomes:
    tmpk = v;
    validk = true;
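To make the load rule concrete, here is a hedged C sketch that applies it to a guarded loop of the form used on the earlier slides. The predicate array `cond`, the function names, and the `ld` counting helper are assumptions introduced for illustration; `cond[i]` stands in for any condition the compiler cannot analyze. For simplicity the sketch stores `a[i]` immediately, since that location is never accessed again after iteration `i`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical instrumentation: counts reads of the array. */
static int loads = 0;
static int ld(const int *p) { loads++; return *p; }

/* Original loop; cond[i] stands in for an arbitrary predicate.
   Assumes a has at least n + 3 elements. */
void guarded_add_ref(int *a, const bool *cond, int n) {
    for (int i = 0; i < n; i++)
        if (cond[i])
            a[i] = ld(&a[i]) + ld(&a[i + 3]);
}

/* Scalarized loop: tmpK caches a[i+K] whenever validK is true. */
void guarded_add_flags(int *a, const bool *cond, int n) {
    assert(n >= 0);
    int tmp0 = 0, tmp1 = 0, tmp2 = 0, tmp3 = 0;
    bool valid0 = false, valid1 = false, valid2 = false, valid3 = false;
    for (int i = 0; i < n; i++) {
        if (cond[i]) {
            if (!valid0) { tmp0 = ld(&a[i]);     valid0 = true; }
            if (!valid3) { tmp3 = ld(&a[i + 3]); valid3 = true; }
            tmp0 += tmp3;
            a[i] = tmp0;  /* stored at once: a[i] is never touched again */
        }
        /* Rotate scalars and flags so the invariant holds for i + 1. */
        tmp0 = tmp1; valid0 = valid1;
        tmp1 = tmp2; valid1 = valid2;
        tmp2 = tmp3; valid2 = valid3;
        valid3 = false;
    }
}
```

When every `cond[i]` is true and `n` is 6, the reference loop performs 12 loads and the scalarized loop 9, because `a[3]`, `a[4]`, and `a[5]` are each read from memory only once.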
![Page 15: Inter-Iteration Scalar Replacement in the Presence of Control-Flow](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813dde550346895da7aa36/html5/thumbnails/15.jpg)
Optimality
• No scalarized memory location is read or written more than once
• The resulting program touches exactly the same memory locations as the original program
• Proof: trivial based on valid flags invariant
[given perfect dependence analysis and enough registers]
![Page 16: Inter-Iteration Scalar Replacement in the Presence of Control-Flow](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813dde550346895da7aa36/html5/thumbnails/16.jpg)
Additional Details
• Initialize validk to false
• Rotate scalars and valid flags
• Use ‘dirtyk’ flags to avoid extra stores
• Postlude for missing stores:
    if (validk) a[N+k] = tmpk
• Lift loop-invariant accesses (finding loop-invariant predicates)
• Hardware support (for rotating registers and flags)
(see paper)
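A hedged C sketch of how several of these details might fit together: flags initialized to false, rotation of scalars and flags, dirty flags that defer stores, and a postlude. The loop, the predicate arrays `c` and `d`, the function names, and the `st` counting helper are all hypothetical and chosen only so that a store becomes redundant. Because this sketch tracks dirty flags, its postlude tests the dirty flag rather than the valid flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical instrumentation: counts writes to the array. */
static int stores = 0;
static void st(int *p, int v) { stores++; *p = v; }

/* Original loop; c and d stand in for arbitrary predicates.
   Assumes a has at least n + 1 elements. */
void update_ref(int *a, const bool *c, const bool *d, int n) {
    for (int i = 0; i < n; i++) {
        if (c[i]) st(&a[i + 1], i);
        if (d[i]) st(&a[i], a[i] * 2);
    }
}

/* Scalarized loop: tmpK caches a[i+K]; dirtyK marks a pending store. */
void update_dirty(int *a, const bool *c, const bool *d, int n) {
    assert(n >= 0);
    int tmp0 = 0, tmp1 = 0;
    bool valid0 = false, valid1 = false;   /* flags start out false */
    bool dirty0 = false, dirty1 = false;
    for (int i = 0; i < n; i++) {
        if (c[i]) { tmp1 = i; valid1 = true; dirty1 = true; }
        if (d[i]) {
            if (!valid0) { tmp0 = a[i]; valid0 = true; }
            tmp0 *= 2;
            dirty0 = true;
        }
        /* a[i] leaves the window: write it back only if modified. */
        if (dirty0) st(&a[i], tmp0);
        /* Rotate scalars, valid flags, and dirty flags. */
        tmp0 = tmp1; valid0 = valid1; dirty0 = dirty1;
        valid1 = false; dirty1 = false;
    }
    /* Postlude: flush the store still pending for a[n]. */
    if (dirty0) st(&a[n], tmp0);
}
```

Each location is written at most once, when it rotates out of the window or in the postlude, so a store that a later iteration overwrites never reaches memory.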
![Page 17: Inter-Iteration Scalar Replacement in the Presence of Control-Flow](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813dde550346895da7aa36/html5/thumbnails/17.jpg)
Outline
• Scalar Replacement
• Predicated PRE
• Combining the two
• Results
![Page 18: Inter-Iteration Scalar Replacement in the Presence of Control-Flow](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813dde550346895da7aa36/html5/thumbnails/18.jpg)
Redundant Stores
[Bar chart. Y-axis: % reduction, ticks 0 to 30, with one value labeled 53. Series: % st promo, % st PRE.]
Benchmarks (x-axis): adpcm_e, adpcm_d, gsm_e, gsm_d, epic_e, epic_d, mpeg2_e, mpeg2_d, jpeg_e, jpeg_d, pegwit_e, pegwit_d, g721_e, g721_d, pgp_e, pgp_d, rasta, mesa, 099.go, 124.m88ksim, 129.compress, 130.li, 132.ijpeg, 134.perl, 147.vortex, 183.equake, 188.ammp, 164.gzip, 175.vpr, 176.gcc, 181.mcf, 197.parser, 254.gap, 300.twolf
![Page 19: Inter-Iteration Scalar Replacement in the Presence of Control-Flow](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813dde550346895da7aa36/html5/thumbnails/19.jpg)
Redundant Loads
[Bar chart. Y-axis: % reduction, ticks 0 to 45. Series: % ld promo, % ld PRE.]
Benchmarks (x-axis): adpcm_e, adpcm_d, gsm_e, gsm_d, epic_e, epic_d, mpeg2_e, mpeg2_d, jpeg_e, jpeg_d, pegwit_e, pegwit_d, g721_e, g721_d, pgp_e, pgp_d, rasta, mesa, 099.go, 124.m88ksim, 129.compress, 130.li, 132.ijpeg, 134.perl, 147.vortex, 183.equake, 188.ammp, 164.gzip, 175.vpr, 176.gcc, 181.mcf, 197.parser, 254.gap, 300.twolf
![Page 20: Inter-Iteration Scalar Replacement in the Presence of Control-Flow](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813dde550346895da7aa36/html5/thumbnails/20.jpg)
Performance Impact
[Chart. Y-axis: % reduction in running time. Target: Spatial Computation.]
Removed accesses tend to be cache hits: small contribution to running time.
![Page 21: Inter-Iteration Scalar Replacement in the Presence of Control-Flow](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813dde550346895da7aa36/html5/thumbnails/21.jpg)
Conclusions
• Use predicates to dynamically detect redundant memory accesses
• Simple algorithm gives “optimal” result even with un-analyzable control flow
• Can dramatically reduce memory accesses
![Page 22: Inter-Iteration Scalar Replacement in the Presence of Control-Flow](https://reader036.fdocuments.net/reader036/viewer/2022062517/56813dde550346895da7aa36/html5/thumbnails/22.jpg)
Related Work
• Carr & Kennedy, PLDI 1990: Scalar Replacement (arrays, no control flow)
• Carr & Kennedy, SPE 1994: Generalized Scalar Replacement (restricted control-flow)
• Scholz, Europar 2003: Predicated PRE (single iteration, no writes)
• This work, ODES 2005: PPRE across iterations (optimal)
• Morel & Renvoise, CACM 1979: Partial Redundancy Elimination (not across remote iterations)
Other labels on the slide: Non-speculative promotion; Speculative promotion