University of Houston Extending Global Optimizations in the OpenUH Compiler for OpenMP Open64...

University of Houston

Extending Global Optimizations in the OpenUH Compiler for OpenMP

Open64 Workshop, CGO ’08


Goals

• Exploit the compiler analysis and optimizations for OpenMP programs

• Enable high level optimizations by taking OpenMP semantics into consideration

• Build a general framework for OpenMP compiler optimizations



OpenUH Compiler based on Open64

[Figure: OpenUH compiler architecture. Labels recovered from the diagram:]

• Source code w/ OpenMP directives
• FRONTENDS (C/C++, Fortran 90, OpenMP)
• Open64 compiler infrastructure (label alongside the compiler components)
• IPA (Inter Procedural Analyzer)
• OMP_PRELOWER (Preprocess OpenMP)
• LNO (Loop Nest Optimizer)
• LOWER_MP (Transformation of OpenMP)
• WOPT (global scalar optimizer)
• WHIRL2C & WHIRL2F (IR-to-source for non-Itanium)
• CG (code for IA-32, IA-64, Opteron)
• Source code with runtime library calls
• A native compiler
• Object files
• Linking
• A portable OpenMP runtime library
• Executables


Motivation

Compiler flags    -O3       -O3 -mp3
PRE-example       7.42      46.8
NAS FT            18.45     26.17
NAS UA            130.31    220.15

Why different performance?


A PRE Example


A PRE Example (continued)

[Figure: two versions of the PRE example, one annotated "copy propagation" and the other "no copy propagation!"]
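As a reminder of what partial redundancy elimination (PRE) does, here is a minimal sketch in C. It is not the example from the slide, whose figure is not reproduced here; the function names `before_pre`, `after_pre`, and `check_pre_equivalence` are hypothetical, and the "after" version is written by hand to show the shape of the transformation rather than taken from compiler output.

```c
#include <assert.h>

/* Before PRE: a + b is computed on the taken branch and again after the
   join, so the second evaluation is redundant on that path only. */
int before_pre(int a, int b, int flag)
{
    int x = 0;
    if (flag)
        x = a + b;
    int y = a + b;          /* partially redundant */
    return x + y;
}

/* After PRE (hand-written): an evaluation is inserted on the path that
   lacked one, and the later use reads the temporary t instead. */
int after_pre(int a, int b, int flag)
{
    int x = 0;
    int t;
    if (flag) {
        t = a + b;
        x = t;
    } else {
        t = a + b;          /* inserted evaluation */
    }
    int y = t;              /* now fully redundant, replaced by t */
    return x + y;
}

/* Both versions must agree for any input. */
void check_pre_equivalence(int a, int b, int flag)
{
    assert(before_pre(a, b, flag) == after_pre(a, b, flag));
}
```

Each path now evaluates `a + b` exactly once, which is the property PRE aims for.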


Parallel Data Flow Analysis

• Compilers need to further optimize OpenMP codes

• Most current OpenMP compilers perform optimizations after OpenMP constructs have been lowered to threaded codes
  – Have to restrict the traditional optimizations inside an OpenMP construct, not crossing synchronizations
    • Need to enable global optimizations
  – Missed opportunity to perform high-level OpenMP optimizations
    • Such as barrier elimination
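Barrier elimination can be illustrated with a small hand-written sketch. It assumes two worksharing loops whose data do not overlap; the function names and the array size `N` are hypothetical, and the `nowait` clause is added by hand here to show the effect a compiler would aim for once it has proved the barrier redundant.

```c
#include <assert.h>

#define N 8

/* Two worksharing loops in one parallel region. The second loop reads only
   src and writes only b, so it never touches what the first loop writes. */
void fill_with_barrier(int *a, int *b, const int *src)
{
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < N; i++)
            a[i] = 2 * src[i];
        /* implicit barrier: every thread waits for the whole first loop */

        #pragma omp for
        for (int i = 0; i < N; i++)
            b[i] = src[i] + 1;
    }
}

/* Same computation with the redundant barrier removed via nowait. */
void fill_without_barrier(int *a, int *b, const int *src)
{
    #pragma omp parallel
    {
        #pragma omp for nowait
        for (int i = 0; i < N; i++)
            a[i] = 2 * src[i];

        #pragma omp for
        for (int i = 0; i < N; i++)
            b[i] = src[i] + 1;
    }
}
```

Proving that the two loops are independent requires analysis that understands the OpenMP constructs, which is harder once they have been lowered to runtime calls.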


Solution Method

• Based on the OpenMP Memory Model
  – Relaxed Consistency
  – Flush is the key operation!

• Design a Parallel Control Flow Graph (PCFG) to represent an OpenMP program


A: an OpenMP section example

    a = 0; b = 0;
    #pragma omp parallel sections
    {
      #pragma omp section
      {
        a = 1;
        #pragma omp flush(a,b)
        if (b == 0) {
          Critical1;
          a = 0;
          #pragma omp flush(a)
        } else
          else1;
      }
      #pragma omp section
      {
        b = 1;
        #pragma omp flush(a,b)
        if (a == 0) {
          Critical2;
          b = 0;
          #pragma omp flush(b)
        } else
          else2;
      }
    }

B: The corresponding PCFG

[Figure: PCFG for the example above. Node labels: Entry; a=1; b=1; Flush(a,b) in each section; If (b == 0); If (a == 0); a=0; Flush(a); b=0; Flush(b); Else…; Barrier. Legend: super node, composite node, basic node, parallel edge, sequential edge, conflict edge.]
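One way to picture the PCFG's conflict edges is the following minimal C sketch. The slides do not spell out the exact rule, so this assumes that a conflict edge joins two flush nodes from different sections whenever their flush lists share a variable. All type and function names (`BasicNode`, `Edge`, `add_conflict_edges`) are hypothetical and are not OpenUH data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Shared variables of the example, one bit each. */
enum { VAR_A = 1, VAR_B = 2 };

typedef enum { EDGE_SEQUENTIAL, EDGE_PARALLEL, EDGE_CONFLICT } EdgeKind;

typedef struct {
    int section;          /* section (composite node) this node belongs to */
    unsigned flush_vars;  /* bitmask of flushed variables, 0 if not a flush */
} BasicNode;

typedef struct {
    int from, to;
    EdgeKind kind;
} Edge;

/* Assumed rule: link every pair of flush nodes that sit in different
   sections and flush at least one common variable. Returns the number
   of conflict edges written to out. */
size_t add_conflict_edges(const BasicNode *nodes, size_t n,
                          Edge *out, size_t cap)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        for (size_t j = i + 1; j < n; j++) {
            if (nodes[i].section == nodes[j].section)
                continue;   /* same thread: ordered by sequential edges */
            if ((nodes[i].flush_vars & nodes[j].flush_vars) == 0)
                continue;   /* no common variable, no conflict */
            assert(count < cap);
            out[count++] = (Edge){ (int)i, (int)j, EDGE_CONFLICT };
        }
    }
    return count;
}
```

For the sections example, the four flush nodes are `flush(a,b)` and `flush(a)` in the first section and `flush(a,b)` and `flush(b)` in the second. Under the assumed rule, `flush(a)` and `flush(b)` are the only cross-section pair left unconnected.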


[Figure: optimizer phases built on the sequential CFG]

Input WHIRL tree → CFG → HSSA → IVR → CP → DCE → Emit → Output WHIRL tree

• CFG
  – Construct CFG
  – Control Flow Analyses
  – Flow Free Alias Analysis
• HSSA
  – Construct HSSA representation
  – Points-to and Pointer Alias Analysis
  – Create CODEMAP representation
• IVR, CP, DCE
  – PREOPT SSA-based optimizations
  – “Flow free copy propagation”
• Emit
  – Emit new WHIRL from optimized CFG/SSA

[Figure: optimizer phases built on the PCFG]

Input WHIRL tree → PCFG → HSSA → IVR → CP → DCE → Emit → Output WHIRL tree

• PCFG
  – Construct CFG
  – Control Flow Analyses
  – Parallel Control Flow Analysis
  – Flow Free Alias Analysis
• HSSA
  – Construct HSSA representation
  – Phi insertion for conflict edges
  – Points-to and Pointer Alias Analysis
  – Create CODEMAP representation
• IVR, CP, DCE
  – SSA-based optimizations
  – “Flow free copy propagation”
• Emit
  – Emit new WHIRL from optimized CFG/SSA
• SSAPRE
  – Perform PRE on OpenMP code
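The effect of inserting a phi at a flush node reached by a conflict edge can be sketched as a legality check for copy propagation. This is a simplified illustration over a straight-line statement list, not the HSSA implementation. The types `Stmt` and `StmtKind`, the function `can_propagate`, and the `has_conflict` flag are all hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum { VAR_A = 1, VAR_B = 2 };

typedef enum { STMT_ASSIGN, STMT_FLUSH, STMT_USE } StmtKind;

typedef struct {
    StmtKind kind;
    unsigned vars;      /* bitmask of variables assigned, flushed, or used */
    bool has_conflict;  /* flush only: a conflict edge reaches this node */
} Stmt;

/* May the value assigned to `var` at stmts[def] be forwarded to
   stmts[use]? Assumption: a flush of `var` with an incoming conflict
   edge acts like a phi that may bring in another thread's value. */
bool can_propagate(const Stmt *stmts, size_t def, size_t use, unsigned var)
{
    for (size_t i = def + 1; i < use; i++) {
        if ((stmts[i].vars & var) == 0)
            continue;                       /* statement ignores var */
        if (stmts[i].kind == STMT_ASSIGN)
            return false;                   /* var is redefined */
        if (stmts[i].kind == STMT_FLUSH && stmts[i].has_conflict)
            return false;                   /* phi merges a remote value */
    }
    return true;
}
```

Applied to the first section of the earlier example (`b = 0; a = 1; flush(a,b); if (b == 0)`), the check refuses to forward `b = 0` into the test when the flush carries a conflict edge. A sequential analysis without that edge would allow the propagation.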


Conclusion

• Being implemented in the OpenUH compiler

• Improve the scalability of OpenMP programs

• A framework for conducting more aggressive optimizations for Cluster OpenMP

• Can be used in conjunction with data race detection tools
