A Compiler-Based Tool for Array Analysis in HPC Applications

Presenter: Ahmad Qawasmeh

Advisor: Dr. Barbara Chapman

2013 PhD Showcase Event

Outline

1. Motivation

2. Related Work

3. Array Analysis Techniques

4. Array Analysis Module in OpenUH

5. Our Integrated System

6. Dragon Tool

7. Conclusion

8. Future Work

Motivation

A. Identify and fix inefficiencies in defining arrays

B. Reduce data movement

C. Identify auto-parallelization opportunities

D. Enhance code analysis

Parallelization/Reduce Data Movement

[Figure: Host (Main Memory holding application data, host cores) and GPU (GPU Memory holding application data, GPU cores), with the array section A[lb:ub].]

!$acc region copyin(A(1:100,1:100))
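As a rough illustration of why copying only the accessed section matters, the sketch below counts the elements moved by the copyin clause above and compares them with a full-array copy. The declared shape of A (1000 x 1000) and the 8-byte element size are assumptions made for the example; the slides do not state them.

```python
from math import prod


def region_elements(region):
    """Count elements in a rectangular region.

    `region` is a list of inclusive (lower, upper) bounds, one pair per dimension.
    """
    return prod(ub - lb + 1 for lb, ub in region)


# Hypothetical declared shape of A; not given in the slides.
DECLARED = [(1, 1000), (1, 1000)]
# Section named in the directive: copyin(A(1:100,1:100)).
ACCESSED = [(1, 100), (1, 100)]
# Assumed element size in bytes (e.g. a double-precision value).
ELEM_BYTES = 8

full_bytes = region_elements(DECLARED) * ELEM_BYTES
sub_bytes = region_elements(ACCESSED) * ELEM_BYTES

print(f"full array: {full_bytes} bytes, sub-array: {sub_bytes} bytes")
```

Under these assumed sizes the sub-array copy moves one hundredth of the data; the actual saving depends on the real array shape and the region the analysis reports.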

Access Density/Array Region

start
Declare char A[20]
for i = 0 to 19
    A[i] = …
……….
for i = 0 to 10
    … = A[i]
for i = 10 to 15
    … = A[i]
……….
for i = 10 to 15
    … = A[i]
……….
for i = 15 to 17
    … = A[i]
end

[Figure: plot labeled Access Density and Region, with series DEF, USE, USE, USE and the annotation "4 times at different positions".]
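The pseudocode above can be replayed in a few lines to see which elements are touched and how often. This is a minimal sketch, not the tool's implementation: it assumes the loop bounds are inclusive and, purely for illustration, treats the per-element access count as a stand-in for access density; the tool's own definition may differ.

```python
from collections import Counter

# Loops from the slide's pseudocode as (kind, lower, upper).
# Bounds are assumed inclusive ("for i = 0 to 19" covers A[0]..A[19]).
ACCESSES = [
    ("DEF", 0, 19),
    ("USE", 0, 10),
    ("USE", 10, 15),
    ("USE", 10, 15),
    ("USE", 15, 17),
]


def per_element_counts(accesses, kind):
    """Count how many times each index is touched by accesses of `kind`."""
    counts = Counter()
    for k, lb, ub in accesses:
        if k == kind:
            counts.update(range(lb, ub + 1))
    return counts


def bounding_region(accesses, kind):
    """Return the inclusive (lower, upper) hull of all accesses of `kind`, or None."""
    bounds = [(lb, ub) for k, lb, ub in accesses if k == kind]
    if not bounds:
        return None
    return min(lb for lb, _ in bounds), max(ub for _, ub in bounds)


def_counts = per_element_counts(ACCESSES, "DEF")
use_counts = per_element_counts(ACCESSES, "USE")
use_region = bounding_region(ACCESSES, "USE")

print("USE region:", use_region)
print("most-used elements:", use_counts.most_common(3))
```

In this replay, elements 18 and 19 are defined but never used, which is the kind of inefficiency in array definitions that the motivation slide mentions.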

Related Work

A. The PGI accelerator compiler applies array region analysis to reduce memory transfers.

B. The Par4All compiler manages data transfers between host and accelerator using array region analysis.

C. CAPO relies on interprocedural data dependence information to insert compiler directives that facilitate parallelism.

D. The HPM toolkit, PAPI, and OProfile provide facilities to instrument programs, record hardware counter (HWC) data, and analyze results.

E. Dragon was previously developed, with some limitations.

F. Array regrouping has also been targeted.

Array Access Analysis Techniques

A. What is array region analysis?

B. It is important for optimizations in a parallelizing compiler.

C. It is usually impractical to simply list every element referenced.

Methods in terms of efficiency and precision:

Reference-based (Atom)

Triplet-based (RS)

Linear-based (Region)

[Figure: the three methods arranged along Precision and Efficiency axes; the label "Classic" also appears.]
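One common way to avoid listing every referenced element is to summarize a strided access as a lower:upper:stride triplet. The sketch below is a hypothetical illustration of that idea, with class and function names invented for the example; it is not taken from OpenUH. It also shows the precision cost: the union of two triplets is approximated by a single enclosing triplet, which may include elements that neither access touches.

```python
from dataclasses import dataclass
from math import gcd


@dataclass(frozen=True)
class Triplet:
    """A 1-D section lb:ub:stride with inclusive bounds (illustrative only).

    Assumes lb <= ub and stride >= 1.
    """
    lb: int
    ub: int
    stride: int = 1

    def last(self):
        # Largest index actually reached when stepping from lb by stride.
        return self.lb + ((self.ub - self.lb) // self.stride) * self.stride

    def elements(self):
        # Explicit element set, used here only to check the summary.
        return set(range(self.lb, self.ub + 1, self.stride))


def hull(a, b):
    """Return one triplet covering both a and b; it may over-approximate."""
    lb = min(a.lb, b.lb)
    ub = max(a.last(), b.last())
    # A stride dividing both strides and the offset between the lower
    # bounds guarantees every element of a and b lies on the new lattice.
    stride = gcd(gcd(a.stride, b.stride), abs(a.lb - b.lb))
    return Triplet(lb, ub, stride)


# Two interleaved stride-4 accesses merge exactly into one stride-2 triplet.
exact = hull(Triplet(0, 8, 4), Triplet(2, 10, 4))
# Strides 5 and 3 share no common lattice, so the hull falls back to stride 1.
loose = hull(Triplet(0, 10, 5), Triplet(3, 9, 3))

print(exact, loose)
```

The second case trades precision for a compact summary: six elements are actually accessed, but the single enclosing triplet describes eleven.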

Our Integrated System

[Figure: system workflow. Labels: HPC Application; OpenUH IPA Phase Extension; ARA Module; HL-Whirl-Tree; Lowering; .rgn file; Dragon; Array Analysis Graph.]

Dragon Array Analysis Graph

Dragon Call Graph for NAS LU Benchmark

Dragon Array Graph for NAS LU Benchmark

Conclusion

A. We present an interactive tool that finds the hotspot portions of arrays across procedures in HPC applications.

B. We show that this information can be critical for better parallelization and for better cache and memory utilization.

C. The tool helps reduce data transfers by exploiting the sub-array offloading functionality supported by directive-based (D-B) GPU programming models.

D. Our tool has been tested on HPC benchmarks, including NAS LU.

Future Work

A. Combine the Array Analysis and Data Dependency modules in OpenUH to enhance memory and cache utilization.

B. Extend our array analysis tool to support the analysis and visualization of remote array accesses in a PGAS context.

C. Enrich our tool's features by supporting high-performance 3D visualization via the Qt OpenGL module.

Bibliography

[1] P. Group. (2008) PGI compilers, GPUs and you! pgi presentation sc08.pdf. [Online]. Available: http://www.pgroup.com/lit/presentations/

[2] M. Amini, F. Coelho, F. Irigoin, and R. Keryell, “Static compilation analysis for host-accelerator communication optimization,” in The 24th International Workshop on Languages and Compilers for Parallel Computing, Fort Collins, Colorado, Sep. 2011.

[3] (2001) Code parallelization with CAPO – a user manual. [Online]. Available: http://people.nas.nasa.gov/hjin/CAPO/nas-01-008-abstract.html

[4] (2008) Hardware performance monitor (HPM) toolkit users guide. [Online]. Available: https://wiki.alcf.anl.gov/images/5/59/HPM ug.pdf

[5] P. J. Mucci, S. Browne, C. Deane, and G. Ho. (1999, Sep.) PAPI: A portable interface to hardware performance counters. dodugc99-papi.pdf. [Online]. Available: http://web.eecs.utk.edu/ mucci/latest/pubs/

[6] W. E. Cohen. (2004) Tuning programs with OProfile. Oprofile.pdf. [Online]. Available: http://people.redhat.com/wcohen/

[7] O. Hernandez, C. Liao, and B. Chapman, “Dragon: A static and dynamic tool for OpenMP,” in Workshop on OpenMP Applications and Tools (WOMPAT 2004), 2005, pp. 53–66.

[8] A. Qawasmeh, B. Chapman, and A. Banerjee, “A Compiler-Based Tool for Array Analysis in HPC Applications,” in Proceedings of the 41st International Conference on Parallel Computing Workshops, Pittsburgh, PA, USA, Sep. 2012, pp. 454–463.

[9] X. Shen, Y. Gao, C. Ding, and R. Archambault, “Lightweight reference affinity analysis,” in Proceedings of the 19th ACM International Conference on Supercomputing, Boston, MA, USA, Jun. 2005, pp. 131–140.

[10] (2012) High Performance Computing and Tools Research Group. [Online]. Available: http://www2.cs.uh.edu/~hpctools/