Post on 10-Aug-2020
PollyPolyhedral Transformations for LLVM
Tobias Grosser - Hongbin Zheng
November 4, 2010
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 1 / 19
Outline
1 The Polyhedral Model
2 Research Projects
3 Polly
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 2 / 19
The SCoP - Static Control Part
for i = 1 to (5n + 3)
for j = n to (4i + 3n + 4)
A[i-j] = A[i]
if i < (n - 20)
A[i+20] = j
Structured control flowI Regular for loopsI Conditions
Affine expressions in induction variables and parameters for:I Loop bounds, conditions, access functions
Side effect free
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 3 / 19
The Iteration Domain / Schedule
for (i = 1; i <= n; i++) {
for (j = 1; j <= i + m; j++)
A[i][j] = A[i-1][j] + A[i][j-1]
A[i][i+m+1] = A[i-1][i+m] + A[i][i+m]
}
Domains
1 ≤ i ≤ n1 ≤ j ≤ i + m
1 ≤ i ≤ n
Schedules
s1 = is2 = 0s3 = j
s1 = is2 = 1
1
1
i
j
m
n
1 in
i = 1j = 1
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 4 / 19
The Iteration Domain / Schedule
for (i = 1; i <= n; i++) {
for (j = 1; j <= i + m; j++)
A[i][j] = A[i-1][j] + A[i][j-1]
A[i][i+m+1] = A[i-1][i+m] + A[i][i+m]
}
Domains
1 ≤ i ≤ n1 ≤ j ≤ i + m
1 ≤ i ≤ n
Schedules
s1 = is2 = 0s3 = j
s1 = is2 = 1
1
1
i
j
m
n
1 in
i = 1j = 2 = m
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 4 / 19
The Iteration Domain / Schedule
for (i = 1; i <= n; i++) {
for (j = 1; j <= i + m; j++)
A[i][j] = A[i-1][j] + A[i][j-1]
A[i][i+m+1] = A[i-1][i+m] + A[i][i+m]
}
Domains
1 ≤ i ≤ n1 ≤ j ≤ i + m
1 ≤ i ≤ n
Schedules
s1 = is2 = 0s3 = j
s1 = is2 = 1
1
1
i
j
m
n
1 in
i = 1j = 3 = (i + m)
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 4 / 19
The Iteration Domain / Schedule
for (i = 1; i <= n; i++) {
for (j = 1; j <= i + m; j++)
A[i][j] = A[i-1][j] + A[i][j-1]
A[i][i+m+1] = A[i-1][i+m] + A[i][i+m]
}
Domains
1 ≤ i ≤ n1 ≤ j ≤ i + m
1 ≤ i ≤ n
Schedules
s1 = is2 = 0s3 = j
s1 = is2 = 1
1
1
i
j
m
n
1 in
i = 1j = n/a
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 4 / 19
The Iteration Domain / Schedule
for (i = 1; i <= n; i++) {
for (j = 1; j <= i + m; j++)
A[i][j] = A[i-1][j] + A[i][j-1]
A[i][i+m+1] = A[i-1][i+m] + A[i][i+m]
}
Domains
1 ≤ i ≤ n1 ≤ j ≤ i + m
1 ≤ i ≤ n
Schedules
s1 = is2 = 0s3 = j
s1 = is2 = 1
1
1
i
j
m
n
1 in
i = 2j = 1
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 4 / 19
The Iteration Domain / Schedule
for (i = 1; i <= n; i++) {
for (j = 1; j <= i + m; j++)
A[i][j] = A[i-1][j] + A[i][j-1]
A[i][i+m+1] = A[i-1][i+m] + A[i][i+m]
}
Domains
1 ≤ i ≤ n1 ≤ j ≤ i + m
1 ≤ i ≤ n
Schedules
s1 = is2 = 0s3 = j
s1 = is2 = 1
1
1
i
j
m
n
1 in
i = 2j = 2
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 4 / 19
The Iteration Domain / Schedule
for (i = 1; i <= n; i++) {
for (j = 1; j <= i + m; j++)
A[i][j] = A[i-1][j] + A[i][j-1]
A[i][i+m+1] = A[i-1][i+m] + A[i][i+m]
}
Domains
1 ≤ i ≤ n1 ≤ j ≤ i + m
1 ≤ i ≤ n
Schedules
s1 = is2 = 0s3 = j
s1 = is2 = 1
1
1
i
j
m
n
1 in
i = 2j = 3
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 4 / 19
The Iteration Domain / Schedule
for (i = 1; i <= n; i++) {
for (j = 1; j <= i + m; j++)
A[i][j] = A[i-1][j] + A[i][j-1]
A[i][i+m+1] = A[i-1][i+m] + A[i][i+m]
}
Domains
1 ≤ i ≤ n1 ≤ j ≤ i + m
1 ≤ i ≤ n
Schedules
s1 = is2 = 0s3 = j
s1 = is2 = 1
1
1
i
j
m
n
1 in
i = 2j = 4 = (i + m)
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 4 / 19
The Iteration Domain / Schedule
for (i = 1; i <= n; i++) {
for (j = 1; j <= i + m; j++)
A[i][j] = A[i-1][j] + A[i][j-1]
A[i][i+m+1] = A[i-1][i+m] + A[i][i+m]
}
Domains
1 ≤ i ≤ n1 ≤ j ≤ i + m
1 ≤ i ≤ n
Schedules
s1 = is2 = 0s3 = j
s1 = is2 = 1
1
1
i
j
m
n
1 in
i = 2j = n/a
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 4 / 19
The Iteration Domain / Schedule
for (i = 1; i <= n; i++) {
for (j = 1; j <= i + m; j++)
A[i][j] = A[i-1][j] + A[i][j-1]
A[i][i+m+1] = A[i-1][i+m] + A[i][i+m]
}
Domains
1 ≤ i ≤ n1 ≤ j ≤ i + m
1 ≤ i ≤ n
Schedules
s1 = is2 = 0s3 = j
s1 = is2 = 1
1
1
i
j
m
n
1 in
i = 4j = n/a
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 4 / 19
The Iteration Domain / Schedule
for (i = 1; i <= n; i++) {
for (j = 1; j <= i + m; j++)
A[i][j] = A[i-1][j] + A[i][j-1]
A[i][i+m+1] = A[i-1][i+m] + A[i][i+m]
}
Domains
1 ≤ i ≤ n1 ≤ j ≤ i + m
1 ≤ i ≤ n
Schedules
s1 = is2 = 0s3 = j
s1 = is2 = 1
1
1
i
j
m
n
1 in
i = 4j = n/a
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 4 / 19
The Iteration Domain / Schedule
for (i = 1; i <= n; i++) {
for (j = 1; j <= i + m; j++)
A[i][j] = A[i-1][j] + A[i][j-1]
A[i][i+m+1] = A[i-1][i+m] + A[i][i+m]
}
Domains
1 ≤ i ≤ n1 ≤ j ≤ i + m
1 ≤ i ≤ n
Schedules
s1 = is2 = 0s3 = j
s1 = is2 = 1
1
1
i
j
m
n
1 in
i = 4j = n/a
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 4 / 19
The Iteration Domain / Schedule
for (i = 1; i <= n; i++) {
for (j = 1; j <= i + m; j++)
A[i][j] = A[i-1][j] + A[i][j-1]
A[i][i+m+1] = A[i-1][i+m] + A[i][i+m]
}
Domains
1 ≤ i ≤ n1 ≤ j ≤ i + m
1 ≤ i ≤ n
Schedules
s1 = is2 = 0s3 = j
s1 = is2 = 1
1
1
i
j
m
n
i
i = 4j = n/a
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 4 / 19
Scheduling Transformation
1
1
s1/i
s3/j
m
n
s1/is2 = 1
s2 = 0
Original Schedules
s1 = is2 = 0s3 = j
s1 = is2 = 1
1
1
t
p
m
n
Transformed Schedules
p =
i +
jt = i + j
p = i + m + 1t = 2i + m + 1
j
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 5 / 19
Code Generation
1
1
i
j
m
n
i
i = 4j = n/a
Original Code
for (i = 1; i <= n; i++) {
for (j = 1; j <= i + m; j++)
A[i][j] = A[i-1][j] + A[i][j-1]
A[i][i+m+1] = A[i-1][i+m] + A[i][i+m]
}
1
1
t
p
m
n
Transformed Codeparfor (p = 1; p <= m+n+1; p++) {if (p >= m+2)
A[p-m-1][p] = A[p-m-2][p-1]
for (t = max(p+1, 2*p-m); t <= p+n; t++)
A[-p+t][p] = A[-p+t-1][p] + A[-p+t][p-1]
}
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 6 / 19
Outline
1 The Polyhedral Model
2 Research Projects
3 Polly
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 7 / 19
Source to source frameworks
Suif | Omega | LooPo | Pluto | ...More than 20 years of research
Work onI Tiling / Parallelization / PrevectorizationI CUDA / GPGPUI Coarse grain parallelism / Grid computing
Advanced algorithms / Proven performance
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 8 / 19
Speed comparison
Compile mode GCC 4.5.1 clang 2.8 ICC 11.1
-O3 1m 22.0s 1m 22.0s 0m 5.8spluto-tiled -O3 0m 6.1s 0m 5.8s 0m 2.5s
Table: Matrix multiplication on Intel i5 M 520 - double 2048 x 2048
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 9 / 19
PoCC - SC10
0
1
2
3
4
5
6
7
2mm
3mm
adiatax
bicgcorrel
covar
doitgen
gemmgem
ver
gesumm
v
gramschm
idt
jacobi-2d
lu ludcmp
seidel
Per
f. Im
p / I
CC
-pa
ralle
l
Performance Improvement - AMD Opteron 8380 (16 threads)
pluto-smartfusepocc-maxfuse
pocc-smartfuseiterative
Figure: Matrix Multiplication
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 10 / 19
Source to source frameworks
Restrictions
Limited to subset of C/C++
Require annotated C code
Only canonical code
Correct? (Integer overflow, Operator overloading, ...)
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 11 / 19
Outline
1 The Polyhedral Model
2 Research Projects
3 Polly
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 12 / 19
The Architecture
LLVM IR LLVM IRLLVM Poly
SCoP Detection& LLVM to Poly Code Generation
Vectorizer Backend
OpenMP parallel backend
OpenSCoP Import/Export
* Classical loop transformations (Blocking, Interchange, Fusion, ...)* Expose parallelism* Dead instruction elimination / Constant propagation
Transformations
LooPo / Pluto / PoCC / Manual Optimization
DependencyAnalysis
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 13 / 19
Why optimize on LLVM-IR?
Frontend independent
Fully automatic
High SCoP coverage
Integration with vectorizer/ OpenMP code generator
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 14 / 19
The SCoP - Revised
Thanks to
Scalar evolution
Loop/Region detection
LLVM canonicalization passes
SCoP - The LLVM way
Limited Scalars Any scalars
Structured control flowI Regular for loops Anything that acts like a regular for loopI Conditions
Affine expressions Expressions that calculate an affine result
Side effect free known
Memory accesses only through arrays
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 15 / 19
Valid SCoPs
do..while loop
i = 0;
do {
int b = 2 * i;
int c = b * 3 + 5 * i;
A[c] = i;
i += 2;
} while (i < N);
pointer loop
int A[1024];
void pointer_loop () {
int *B = A;
while (B < &A[1024]) {
*B = 1;
++B;
}
}
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 16 / 19
The Architecture
LLVM IR LLVM IRLLVM Poly
SCoP Detection& LLVM to Poly Code Generation
Vectorizer Backend
OpenMP parallel backend
OpenSCoP Import/Export
* Classical loop transformations (Blocking, Interchange, Fusion, ...)* Expose parallelism* Dead instruction elimination / Constant propagation
Transformations
LooPo / Pluto / PoCC / Manual Optimization
DependencyAnalysis
Usable for experiments
Planned
Under Construction
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 17 / 19
Open Topics
Multi dimensional arrays - Delinearization
Integer modulo arithmetics
Optimal type selection
Polly alone is not yet improving performance ofyour code and may even apply incorrecttransformations on the LLVM-IR.
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 18 / 19
Open Topics
Multi dimensional arrays - Delinearization
Integer modulo arithmetics
Optimal type selection
Polly alone is not yet improving performance ofyour code and may even apply incorrecttransformations on the LLVM-IR.
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 18 / 19
Thanks to Qualcomm for sponsoring this talk.
Thank you.Any Questions?
http://wiki.llvm.org/Polly
Tobias Grosser - Hongbin Zheng Polly November 4, 2010 19 / 19