OpenMP in a Heterogeneous World
OpenMP in a Heterogeneous World
Ayodunni Aribuki
Advisor: Dr. Barbara Chapman
HPCTools Group
University of Houston
2
Top 10 Supercomputers (June 2011)
3
Why OpenMP
• Shared memory parallel programming model
– Extends C, C++, Fortran
• Directives-based
– Single code for sequential and parallel versions
• Incremental parallelism
– Little code modification
• High-level
– Leaves multithreading details to the compiler and runtime
• Widely supported by major compilers
– Open64, Intel, GNU, IBM, Microsoft, …
– Portable
www.openmp.org
4
OpenMP Example
#pragma omp parallel
{
  int i;
  #pragma omp for
  for (i = 0; i < 100; i++) {
    // do stuff
  }
  // do more stuff
}
[Figure: fork/join execution of the loop. The master thread forks a team; the 100 iterations are split into chunks 0-24, 25-49, 50-74, and 75-99; an implicit barrier ends the for construct; each thread then runs the "more stuff" code before the join.]
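A runnable version of the pattern above can make the work division concrete. This is a sketch, not code from the talk; the function name `square_all` is hypothetical, and with four threads the for construct typically hands out the chunks shown in the figure.

```c
/* Runnable version of the slide's fork/join pattern: the parallel
 * region forks a team, the for construct splits the n iterations
 * among the threads (e.g. 0-24, 25-49, 50-74, 75-99 for n=100 and
 * four threads), and the implicit barrier guarantees the whole loop
 * has finished before any thread continues. */
void square_all(int *out, int n) {
    #pragma omp parallel
    {
        #pragma omp for
        for (int i = 0; i < n; i++) {
            out[i] = i * i;   /* "do stuff": each thread gets a chunk */
        }
        /* implicit barrier here */
        /* "do more stuff": every thread in the team executes this */
    }
}
```

Compile with `gcc -fopenmp`; without that flag the pragmas are ignored and the loop simply runs sequentially, which is the "single code for sequential and parallel versions" property from the previous slide.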
5
Present/Future Architectures & Challenges they pose
[Figure: two cluster diagrams. Left: four nodes (Node 0 to Node 3), each with its own memory, and many more CPUs, illustrating the Location and Scalability challenges. Right: the same nodes, some of which attach an accelerator with its own separate memory, illustrating the Heterogeneity challenge.]
6
Heterogeneous Embedded Platform
7
Heterogeneous High-Performance Systems
Each node has multiple CPU cores, and some of the nodes are equipped with additional computational accelerators, such as GPUs.
www.olcf.ornl.gov/wp-content/uploads/.../Exascale-ASCR-Analysis.pdf
8
Programming Heterogeneous Multicore: Issues
• Must map data/computations to specific devices
• Usually involves a substantial rewrite of code
• Verbose code
– Move data to/from device x
– Launch kernel on device
– Wait until y is ready/done
• Portability becomes an issue
– Multiple versions of the same code
– Hard to maintain
Always hardware-specific!
9
Programming Models? Today’s Scenario
// Run one OpenMP thread per device per MPI node
#pragma omp parallel num_threads(devCount)
if (initDevice()) {
  // Block and grid dimensions
  dim3 dimBlock(12, 12);
  kernel<<<1, dimBlock>>>();
  cudaThreadExit();
} else {
  printf("Device error on %s\n", processor_name);
}
MPI_Finalize();
return 0;
}
www.cse.buffalo.edu/faculty/miller/Courses/CSE710/heavner.pdf
10
OpenMP in the Heterogeneous World
• All threads are equal
– No vocabulary for heterogeneity or separate devices
• All threads must have access to the memory
– Distributed memories are common in embedded systems
– Memories may not be coherent
• Implementations rely on the OS and threading libraries
– e.g. Linux, Pthreads for memory allocation and synchronization
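The last point is worth making concrete: on a general-purpose OS, an OpenMP runtime typically maps a parallel region onto Pthreads and shared, coherent memory. The toy sketch below (names like `parallel_fill` are invented for illustration) shows that mapping, and also why it breaks down when the "threads" live on devices with separate memories.

```c
#include <pthread.h>

/* Toy sketch of what an OpenMP runtime does under the hood: fork a
 * team of Pthreads, hand each a chunk of the iteration space, and
 * join them (the implicit barrier). Every step assumes shared,
 * coherent memory -- exactly the assumption that fails on
 * distributed-memory embedded platforms. */
#define NTHREADS 4

typedef struct { int lo, hi; int *out; } chunk_t;

static void *worker(void *arg) {
    chunk_t *c = (chunk_t *)arg;
    for (int i = c->lo; i < c->hi; i++)
        c->out[i] = i + 1;          /* "do stuff" on this thread's chunk */
    return NULL;
}

/* Runtime-style fork/join over out[0..n); returns 0 on success. */
int parallel_fill(int *out, int n) {
    pthread_t tid[NTHREADS];
    chunk_t ck[NTHREADS];
    int step = (n + NTHREADS - 1) / NTHREADS;
    for (int t = 0; t < NTHREADS; t++) {
        ck[t].lo = t * step;
        ck[t].hi = (t + 1) * step < n ? (t + 1) * step : n;
        ck[t].out = out;            /* shared memory: all threads see out */
        if (pthread_create(&tid[t], NULL, worker, &ck[t]) != 0)
            return -1;
    }
    for (int t = 0; t < NTHREADS; t++)
        pthread_join(tid[t], NULL); /* join == implicit barrier */
    return 0;
}
```

If `out` lived in a device's private memory, the plain pointer hand-off in `ck[t].out` would no longer work; the runtime would instead need explicit data movement, which is the motivation for the extensions on the next slides.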
11
Extending OpenMP Example
#pragma omp parallel for target(dsp)
for (j = 0; j < m; j++)
  for (i = 0; i < n; i++)
    c(i,j) = a(i,j) + b(i,j)
[Figure: offload model. General-purpose processor cores operate on application data in main memory; the HWA's device cores operate on a copy of the application data in device memory. Execution uploads remote data, performs a remote procedure call, and downloads remote data back.]
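The proposed `target(dsp)` clause above anticipates what later became the standard OpenMP `target` construct. As an assumption about how the slide's matrix add maps onto OpenMP 4.0+ syntax (not code from the talk), it might look like this, with the `map` clauses playing the upload/download roles from the figure:

```c
#define N 8
#define M 8

/* Sketch of the slide's matrix add with the OpenMP 4.0 target
 * construct: map(to:) uploads a and b to the device, map(from:)
 * downloads c, and the offloaded region is the remote procedure
 * call. With no device attached, the region falls back to the host,
 * preserving single-source portability. */
void matrix_add(float a[N][M], float b[N][M], float c[N][M]) {
    #pragma omp target map(to: a[0:N], b[0:N]) map(from: c[0:N])
    #pragma omp parallel for collapse(2)
    for (int j = 0; j < M; j++)
        for (int i = 0; i < N; i++)
            c[i][j] = a[i][j] + b[i][j];
}
```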
12
Heterogeneous OpenMP Solution Stack
[Figure: OpenMP Parallel Computing Solution Stack.
User layer: OpenMP Application.
Prog. layer (OpenMP API): Directives, Compiler | OpenMP library | Environment variables.
System layer: Runtime library; OS/system support for shared memory; Core 1, Core 2, …, Core n; MCAPI, MRAPI, MTAPI.]
• Language extensions
• Efficient code generation
• Target Portable Runtime Interface
13
Summarizing My Research
• OpenMP on heterogeneous architectures
– Expressing heterogeneity
– Generating efficient code for GPUs/DSPs
• Managing memories
– Distributed
– Explicitly managed
• Enabling portable implementations
14
Backup
15
MCA: Generic Multicore Programming
• Solve portability issue in embedded multicore programming
• Defining and promoting open specifications for
– Communication: MCAPI
– Resource Management: MRAPI
– Task Management: MTAPI
(www.multicore-association.org)
16
Heterogeneous Platform: CPU + Nvidia GPU