University of Houston
Extending Global Optimizations in the OpenUH
Compiler for OpenMP
Open64 Workshop, CGO '08
Goals
• Exploit the compiler analysis and optimizations for OpenMP programs
• Enable high level optimizations by taking OpenMP semantics into consideration
• Build a general framework for OpenMP compiler optimizations
OpenUH Compiler based on Open64
[Figure: OpenUH compilation flow. Components shown:]
• Input: source code w/ OpenMP directives
• FRONTENDS (C/C++, Fortran 90, OpenMP)
• Open64 compiler infrastructure:
  – IPA (Inter Procedural Analyzer)
  – OMP_PRELOWER (Preprocess OpenMP)
  – LNO (Loop Nest Optimizer)
  – LOWER_MP (Transformation of OpenMP)
  – WOPT (global scalar optimizer)
  – WHIRL2C & WHIRL2F (IR-to-source for non-Itanium)
  – CG (code for IA-32, IA-64, Opteron)
• Source code with runtime library calls, compiled by a native compiler
• Object files
• Linking with a portable OpenMP runtime library
• Executables
Motivation
Compiler flags    -O3       -O3 -mp3
PRE-example       7.42      46.8
NAS FT            18.45     26.17
NAS UA            130.31    220.15
Why different performance?
A PRE Example
[Figure: code listing not recovered. Surviving annotations: "copy propagation" and "no copy propagation!"]
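The listing from this slide did not survive extraction. As a generic illustration of what partial redundancy elimination (PRE) does, and not the example from the talk, the sketch below shows an expression that is redundant on one path only, next to a hand-transformed version that computes it once per path. The function names `before_pre` and `after_pre` are hypothetical.

```c
#include <assert.h>

/* Before PRE: a + b is evaluated twice on the path where cond is true. */
static int before_pre(int a, int b, int cond) {
    int x = 0;
    if (cond) {
        x = a + b;   /* first evaluation, only on this path */
    }
    int y = a + b;   /* partially redundant: already computed if cond */
    return x + y;
}

/* After PRE (applied by hand): the computation is inserted on the path
   that lacked it, so the later use reads a temporary instead. */
static int after_pre(int a, int b, int cond) {
    int x = 0;
    int t;
    if (cond) {
        t = a + b;
        x = t;
    } else {
        t = a + b;   /* inserted so that t is available on both paths */
    }
    int y = t;       /* reuse; a + b is no longer recomputed */
    return x + y;
}
```

Both functions return the same value for any input. If `a` and `b` were shared variables in an OpenMP region, this reuse would presumably be valid only when no flush of `a` or `b` separates the two evaluations.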
Parallel Data Flow Analysis
• Compilers need to further optimize OpenMP codes
• Most current OpenMP compilers perform optimizations after OpenMP constructs have been lowered to threaded code
  – Have to restrict the traditional optimizations inside an OpenMP construct, not crossing synchronizations
    • Need to enable global optimizations
  – Missed opportunity to perform high-level OpenMP optimizations
    • Such as barrier elimination
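As an illustration of the kind of high-level optimization listed above, the sketch below shows a barrier removed by hand. The two worksharing loops touch disjoint arrays, so the implicit barrier after the first loop can be dropped with `nowait`. This is a hypothetical example, not one from the talk; the function name `scale_both` is an assumption. When compiled without OpenMP support, the pragmas are ignored and the code runs serially with the same result.

```c
#include <assert.h>

/* Hypothetical helper: doubles every x[i] and increments every y[i]. */
static void scale_both(double *x, double *y, int n) {
    #pragma omp parallel
    {
        /* The implicit barrier at the end of this loop is unnecessary,
           because the next loop neither reads nor writes x. */
        #pragma omp for nowait
        for (int i = 0; i < n; i++)
            x[i] = 2.0 * x[i];

        /* This loop keeps its implicit barrier before the region ends. */
        #pragma omp for
        for (int i = 0; i < n; i++)
            y[i] = y[i] + 1.0;
    }
}
```

A compiler that understands OpenMP semantics could in principle derive the `nowait` itself. Once the constructs have been lowered to runtime calls, that information is much harder to recover.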
Solution Method
• Based on the OpenMP Memory Model
  – Relaxed Consistency
  – Flush is the key operation!
• Design a Parallel Control Flow Graph (PCFG) to represent an OpenMP program
A: an OpenMP section example

    a = 0; b = 0;
    #pragma omp parallel sections
    {
      #pragma omp section
      {
        a = 1;
        #pragma omp flush(a,b)
        if (b == 0) {
          Critical1;
          a = 0;
          #pragma omp flush(a)
        } else
          else1;
      }
      #pragma omp section
      {
        b = 1;
        #pragma omp flush(a,b)
        if (a == 0) {
          Critical2;
          b = 0;
          #pragma omp flush(b)
        } else
          else2;
      }
    }

B: The corresponding PCFG
[Figure: PCFG for the example above. Nodes shown: Entry; for the first section: a=1, Flush(a,b), If (b == 0), a=0, Flush(a), Else…; for the second section: b=1, Flush(a,b), If (a == 0), b=0, Flush(b), Else…; Barrier. Legend: super node, composite node, basic node, parallel edge, sequential edge, conflict edge.]
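To make the idea of conflict edges concrete, here is a minimal sketch of how they could be derived for the two sections above. It assumes that a conflict edge links a write of a shared variable in one section to a read of the same variable in a concurrent section. The slides do not spell out the definition, and the types `BasicNode` and `ConflictEdge` and the function `find_conflicts` are hypothetical, not OpenUH data structures.

```c
#include <assert.h>
#include <stddef.h>

enum { VAR_A = 1 << 0, VAR_B = 1 << 1 };  /* one bit per shared variable */

typedef struct {
    const char *label;  /* text of the basic node, for readability */
    int section;        /* which omp section the node belongs to */
    unsigned defs;      /* shared variables written by the node */
    unsigned uses;      /* shared variables read by the node */
} BasicNode;

typedef struct { int from, to; } ConflictEdge;

/* Basic nodes of the two sections that touch a or b (flush nodes omitted). */
static const BasicNode nodes[] = {
    {"a=1",       0, VAR_A, 0},
    {"if (b==0)", 0, 0, VAR_B},
    {"a=0",       0, VAR_A, 0},
    {"b=1",       1, VAR_B, 0},
    {"if (a==0)", 1, 0, VAR_A},
    {"b=0",       1, VAR_B, 0},
};
#define NUM_NODES (sizeof nodes / sizeof nodes[0])

/* Records an edge from every write to every read of the same variable in a
   different section. Returns the number of conflicts found; at most cap
   edges are stored in out. */
static size_t find_conflicts(const BasicNode *n, size_t count,
                             ConflictEdge *out, size_t cap) {
    size_t found = 0;
    for (size_t i = 0; i < count; i++) {
        for (size_t j = 0; j < count; j++) {
            if (n[i].section == n[j].section) continue;  /* same thread */
            if ((n[i].defs & n[j].uses) == 0) continue;  /* no shared var */
            if (found < cap) {
                out[found].from = (int)i;
                out[found].to = (int)j;
            }
            found++;
        }
    }
    return found;
}
```

For this example the sketch finds four edges: both writes of `a` reach the test `if (a==0)` in the other section, and both writes of `b` reach `if (b==0)`. Write-write conflicts are left out for brevity.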
[Figure: two optimizer pipelines, each from an input WHIRL tree to an output WHIRL tree]

Original pipeline: CFG, HSSA, IVR, CP, DCE, Emit
• Construct CFG; Control Flow Analyses; Flow Free Alias Analysis
• Construct HSSA representation; Points-to and Pointer Alias Analysis; Create CODEMAP representation
• PREOPT SSA-based optimizations; "Flow free copy propagation"
• Emit new WHIRL from optimized CFG/SSA

Extended pipeline: PCFG, HSSA, IVR, CP, DCE, SSAPRE, Emit
• Construct CFG; Control Flow Analyses; Parallel Control Flow Analysis; Flow Free Alias Analysis
• Construct HSSA representation; Phi insertion for conflict edges; Points-to and Pointer Alias Analysis; Create CODEMAP representation
• SSA-based optimizations; "Flow free copy propagation"
• SSAPRE: Perform PRE on OpenMP code
• Emit new WHIRL from optimized CFG/SSA
Conclusion
• Implementing in the OpenUH compiler
• Improve the scalability of OpenMP programs
• A framework for conducting more aggressive optimizations for Cluster OpenMP
• Can be used in conjunction with data race detection tools