A Compiler-Based Tool for Array Analysis in HPC Applications
Presenter: Ahmad Qawasmeh
Advisor: Dr. Barbara Chapman
2013 PhD Showcase Event
Outline
1. Motivation
2. Related Work
3. Array Analysis Techniques
4. Array Analysis Module in OpenUH
5. Our Integrated System
Motivation
A. Identify and fix inefficiencies in defining arrays
B. Reduce data movement
C. Identify auto-parallelization opportunities
D. Enhance code analysis
Parallelization/Reduce Data Movement
[Figure: Host (host cores, main memory, application data) and GPU (GPU cores, GPU memory, application data), annotated with the sub-array A[lb:ub]]
!$acc region copyin(A(1:100,1:100))
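As a rough illustration of why a sub-array clause reduces traffic, the sketch below builds the directive above from a list of inclusive bounds and compares the number of elements moved against a whole-array copy. The helper names and the 1000 x 1000 declared shape are assumptions made for the example; the slides give neither.

```python
def copyin_clause(name, region):
    """Format a Fortran-style sub-array copyin directive from inclusive bounds."""
    dims = ",".join(f"{lb}:{ub}" for lb, ub in region)
    return f"!$acc region copyin({name}({dims}))"


def element_count(region):
    """Number of elements covered by a list of inclusive (lb, ub) bounds."""
    total = 1
    for lb, ub in region:
        total *= ub - lb + 1
    return total


accessed = [(1, 100), (1, 100)]      # region taken from the slide's directive
declared = [(1, 1000), (1, 1000)]    # hypothetical declared shape, not from the slides

directive = copyin_clause("A", accessed)
fraction = element_count(accessed) / element_count(declared)

print(directive)
print(f"fraction of the array transferred: {fraction:.2%}")
```

With these assumed shapes, only one element in a hundred would need to cross the host-to-GPU link.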
Access Density/Array Region
start
Declare char A[20]
for i = 0 to 19
    A[i] = …
…
for i = 0 to 10
    … = A[i]
for i = 10 to 15
    … = A[i]
…
for i = 10 to 15
    … = A[i]
…
for i = 15 to 17
    … = A[i]
end

[Figure: array positions touched by each loop, labeled DEF and USE; A is used 4 times at different positions. Annotations: Access Density, Region]
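The pseudocode above can be summarized in a few lines of Python. This is a minimal sketch, not the tool's implementation: the slides do not define access density, so it is assumed here to mean how many times each element is read, and the region is taken as the smallest index range covering all accesses of one kind. All names are hypothetical.

```python
from collections import Counter

# (mode, lower bound, upper bound) for each loop in the pseudocode.
# Bounds are inclusive, matching "for i = 0 to 19" over char A[20].
accesses = [
    ("DEF", 0, 19),
    ("USE", 0, 10),
    ("USE", 10, 15),
    ("USE", 10, 15),
    ("USE", 15, 17),
]


def use_counts(accesses):
    """Count how many times each index is read (assumed notion of density)."""
    counts = Counter()
    for mode, lb, ub in accesses:
        if mode == "USE":
            counts.update(range(lb, ub + 1))
    return counts


def bounding_region(accesses, mode):
    """Smallest inclusive (lb, ub) range covering every access of this mode."""
    bounds = [(lb, ub) for m, lb, ub in accesses if m == mode]
    return min(lb for lb, _ in bounds), max(ub for _, ub in bounds)


counts = use_counts(accesses)
use_region = bounding_region(accesses, "USE")
def_region = bounding_region(accesses, "DEF")

for i in range(20):
    print(f"A[{i:2d}] read {counts[i]} time(s)")
print("USE region:", use_region, "DEF region:", def_region)
```

Under these assumptions, A[18] and A[19] are defined but never read, which is the kind of inefficiency the analysis is meant to expose.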
Related Work
A. The PGI accelerator compiler applies array region analysis to reduce memory transfers [1].
B. The Par4All compiler manages data transfers between host and accelerator using array region analysis [2].
C. CAPO relies on interprocedural data dependence information to insert compiler directives that facilitate parallelism [3].
D. The HPM toolkit, PAPI, and OProfile provide facilities to instrument programs, record hardware counter (HWC) data, and analyze results [4], [5], [6].
E. Dragon was previously developed, with some limitations [7].
F. Array regrouping has been targeted [9].
Array Access Analysis Techniques
A. What is array region analysis?
B. Its importance for optimizations in parallel compilers
C. It is usually impractical to simply list the elements referenced
Array Access Analysis Techniques
Methods in terms of efficiency and precision:
Triplet-based (RS)
Linear-based (Region)
Reference-based (Atom)
[Figure: the methods placed along Precision and Efficiency axes; one label reads "Classic"]
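The trade-off between precision and efficiency can be seen in a small sketch of the triplet form. It assumes a triplet is an inclusive [lb:ub:stride] section with a positive stride and that two sections are summarized by a single enclosing triplet; the class and function names are hypothetical and are not taken from OpenUH.

```python
from dataclasses import dataclass
from math import gcd


@dataclass(frozen=True)
class Triplet:
    lb: int
    ub: int
    stride: int = 1  # assumed to be positive

    def elements(self):
        """Explicit element list, i.e. what a reference-based method would keep."""
        return set(range(self.lb, self.ub + 1, self.stride))


def merge(a, b):
    """Over-approximate the union of two triplets with one triplet."""
    lb = min(a.lb, b.lb)
    ub = max(a.ub, b.ub)
    # The merged stride must divide both strides and the offset between
    # the two lower bounds so that no original element is lost.
    stride = gcd(gcd(a.stride, b.stride), abs(a.lb - b.lb))
    return Triplet(lb, ub, stride)


a = Triplet(0, 10)
b = Triplet(15, 17)
merged = merge(a, b)

exact = a.elements() | b.elements()
spurious = merged.elements() - exact

print("merged triplet:", merged)
print("elements reported but never accessed:", sorted(spurious))
```

The merged triplet is constant-size and cheap to combine, but here it also reports indices 11 to 14, which neither section touches. Keeping the explicit element sets is exact but grows with the number of references.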
Our Integrated System
[Figure: system workflow. Labels: HPC Application, OpenUH IPA Phase Extension, HL-Whirl-Tree, ARA Module, Lowering, .rgn file, Dragon, Array Analysis Graph]
Conclusion
A. We present an interactive tool that finds the hotspot portions of interprocedural arrays in HPC applications.
B. We show that this information can be critical for better parallelization and for better cache and memory utilization.
C. Data transfers can be reduced by exploiting the sub-array offloading functionality supported by directive-based GPU programming models.
D. Our tool has been tested on some HPC benchmarks.
Future Work
A. Combine the Array Analysis and Data Dependency modules in OpenUH to enhance memory and cache utilization
B. Extend our array analysis tool to support the analysis and visualization of remote array accesses in a PGAS context
C. Enrich our tool's features by supporting high-performance 3D visualization via the Qt OpenGL module
Bibliography
[1] The Portland Group. (2008) PGI compilers, GPUs and you! pgi presentation sc08.pdf. [Online].
Available: http://www.pgroup.com/lit/presentations/
[2] M. Amini, F. Coelho, F. Irigoin, and R. Keryell, “Static compilation analysis for host-accelerator
communication optimization,” in The 24th International Workshop on
Languages and Compilers for Parallel Computing, Fort Collins, Colorado, Sep. 2011.
[3] (2001) Code parallelization with CAPO – a user manual. [Online]. Available:
http://people.nas.nasa.gov/hjin/CAPO/nas-01-008-abstract.html
[4] (2008) Hardware performance monitor (HPM) toolkit users guide. [Online]. Available:
https://wiki.alcf.anl.gov/images/5/59/HPM ug.pdf
[5] P. J. Mucci, S. Browne, C. Deane, and G. Ho. (1999, Sep.) PAPI: A portable interface
to hardware performance counters. dodugc99-papi.pdf. [Online]. Available:
http://web.eecs.utk.edu/ mucci/latest/pubs/
[6] W. E. Cohen. (2004) Tuning programs with OProfile. Oprofile.pdf. [Online]. Available:
http://people.redhat.com/wcohen/
[7] O. Hernandez, C. Liao, and B. Chapman, “Dragon: A static and dynamic tool for
OpenMP,” in Workshop on OpenMP Applications and Tools (WOMPAT 2004), 2005,
pp. 53–66.
[8] A. Qawasmeh, B. Chapman, and A. Banerjee, “A Compiler-Based Tool for Array
Analysis in HPC Applications,” in Proceedings of the 41st International Conference
on Parallel Processing Workshops, Pittsburgh, PA, USA, Sep. 2012, pp. 454–463.
[9] X. Shen, Y. Gao, C. Ding, and R. Archambault, “Lightweight reference affinity
analysis,” in Proceedings of the 19th ACM International Conference on
Supercomputing, Boston, MA, USA, Jun. 2005, pp. 131–140.
[10] (2012) High Performance Computing and Tools Research Group. [Online]. Available:
http://www2.cs.uh.edu/~hpctools/