DISCS'12 Workshop 1

A Coarray Fortran Implementation to Support Data-Intensive Application Development

Deepak Eachempati (1), Alan Richardson (2), Terrence Liao (3), Henri Calandra (3), Barbara Chapman (1)

Data-Intensive Scalable Computing Systems 2012 (DISCS’12) Workshop, November 16, 2012

(1) Department of Computer Science, University of Houston
(2) Department of Earth, Atmospheric, and Planetary Sciences, MIT
(3) Total E&P

Oil and Gas Industry: Compute Needs

Industry is looking for faster and more cost-effective ways to process massive amounts of data:
• more powerful hardware
• more productive programming models
• innovative software techniques

Outline

• Fortran 2008 parallel processing additions (CAF)
• CAF Implementation in OpenUH Fortran compiler
• Application port to CAF and Results
• Further extensions for Parallel I/O
• Closing Remarks

Coarray Model in Fortran 2008

• Derives from Co-Array Fortran (CAF)
• SPMD execution model, PGAS memory model
  – execution entities called images
  – coarrays: globally accessible, symmetric data objects
• additional intrinsic subroutines/functions for querying process and data information
• additional statements in the language for synchronization
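The points above can be illustrated with a minimal sketch, assuming a Fortran 2008 compiler with coarray support (for example, gfortran accepts `-fcoarray=single` for a one-image build). The program name and the variable `total` are illustrative and do not come from the talk. It uses the intrinsics `this_image()` and `num_images()`, a scalar coarray, a one-sided remote read, and `sync all`.

```fortran
program coarray_basics
  implicit none
  integer :: total[*]      ! scalar coarray: one copy on every image
  integer :: me, n, i, expected

  me = this_image()        ! index of this image, 1..num_images()
  n  = num_images()        ! number of images in this run

  total = me               ! each image stores its own index locally
  sync all                 ! all local stores complete before remote reads

  if (me == 1) then
     ! Image 1 accumulates the other images' values with one-sided reads.
     do i = 2, n
        total = total + total[i]
     end do
     expected = n * (n + 1) / 2
     if (total /= expected) error stop 'unexpected sum'
     print '(a,i0,a,i0)', 'sum over ', n, ' images = ', total
  end if
end program coarray_basics
```

Every image executes the same program (SPMD); only the branch guarded by `me == 1` differs, and no image other than image 1 issues any communication.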

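Working with Distributed Data using Coarrays

[Figure: images arranged in a logical grid with M rows (1, 2, 3, 4, ..., M) and an open-ended number of columns (1, 2, 3, 4, ..., *)]

real :: B[M, *]

Viewed from the image with cosubscripts [3,4]:
• B references the local B
• B[3,4] references the local B
• B[3,3] references B in the left neighbor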

Working with Distributed Data using Coarrays

[Figure: images arranged in a logical grid with M rows (1, 2, 3, 4, ..., M) and an open-ended number of columns (1, 2, 3, 4, ..., *)]

real :: B(10,10)[M, *]

• B(2:4,2:4) references local subarray of B
• B(2:4,2:4)[3,4] references local subarray of B
• B(2:4,2:4)[3,3] references subarray of B in left neighbor
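As a hedged illustration of the cosubscript notation, the sketch below checks on each image that a co-indexed reference to its own cosubscripts names the local data, and that decrementing the second cosubscript reaches the left neighbor. The choice `M = 2`, the program name, and the fill values are assumptions made for the example. `this_image(B)` and `image_index` are standard Fortran 2008 intrinsics.

```fortran
program cosubscript_demo
  implicit none
  integer, parameter :: M = 2          ! rows of the image grid (assumed value)
  real :: B(10,10)[M,*]
  integer :: me(2), left

  me = this_image(B)                   ! cosubscripts [row, col] of this image
  B = real(this_image())               ! tag the local data with the image index
  sync all                             ! all images initialized before remote reads

  ! A co-indexed reference to our own cosubscripts is the local subarray.
  if (any(B(2:4,2:4)[me(1),me(2)] /= B(2:4,2:4))) &
       error stop 'self reference mismatch'

  if (me(2) > 1) then
     ! Same row, previous column: the left neighbor in the logical grid.
     left = image_index(B, [me(1), me(2)-1])
     if (any(B(2:4,2:4)[me(1),me(2)-1] /= real(left))) &
          error stop 'left neighbor mismatch'
  end if

  if (this_image() == 1) print *, 'cosubscript checks passed'
end program cosubscript_demo
```

Cosubscripts map to image indices in column-major order, so with `M = 2` the left neighbor of image `k` is image `k - 2` whenever it exists.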

2D Halo Exchange Example with CAF

real :: a(0:R+1, 0:C+1)[pR,*]
…
a(R+1,1:C)[top(1),top(2)] = a(1,1:C)
a(0,1:C)[bottom(1),bottom(2)] = a(R,1:C)
a(1:R,0)[right(1),right(2)] = a(1:R,C)
a(1:R,C+1)[left(1),left(2)] = a(1:R,1)
sync all
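The fragment above can be placed in a complete program as follows. This is a sketch under stated assumptions, not the code used in the talk: the sizes `R`, `C`, and `pR`, the periodic neighbor computation, and the verification step are all added for illustration. The slide does not show how `top`, `bottom`, `left`, and `right` are obtained.

```fortran
program halo_exchange_2d
  implicit none
  integer, parameter :: R = 4, C = 3   ! interior size per image (assumed)
  integer, parameter :: pR = 1         ! rows of the image grid (assumed)
  real :: a(0:R+1, 0:C+1)[pR,*]
  integer :: me(2), pC
  integer :: top(2), bottom(2), left(2), right(2)

  if (mod(num_images(), pR) /= 0) error stop 'num_images() must be a multiple of pR'
  pC = num_images() / pR               ! columns of the image grid
  me = this_image(a)                   ! this image's cosubscripts [row, col]

  ! Periodic neighbors in the pR x pC image grid (an assumption of this sketch).
  top    = [ modulo(me(1)-2, pR) + 1, me(2) ]
  bottom = [ modulo(me(1),   pR) + 1, me(2) ]
  left   = [ me(1), modulo(me(2)-2, pC) + 1 ]
  right  = [ me(1), modulo(me(2),   pC) + 1 ]

  a = -1.0                             ! halo cells start with a sentinel value
  a(1:R,1:C) = real(this_image())      ! interior holds the owner's image index
  sync all                             ! initialization done before any remote write

  ! One-sided puts into the neighbors' halo cells, as on the slide.
  a(R+1,1:C)[top(1),top(2)]     = a(1,1:C)
  a(0,1:C)[bottom(1),bottom(2)] = a(R,1:C)
  a(1:R,0)[right(1),right(2)]   = a(1:R,C)
  a(1:R,C+1)[left(1),left(2)]   = a(1:R,1)
  sync all                             ! all puts complete before halos are read

  ! Each halo strip should now hold the index of the neighbor on that side.
  if (any(a(0,1:C)   /= real(image_index(a, top))))    error stop 'bad top halo'
  if (any(a(R+1,1:C) /= real(image_index(a, bottom)))) error stop 'bad bottom halo'
  if (any(a(1:R,0)   /= real(image_index(a, left))))   error stop 'bad left halo'
  if (any(a(1:R,C+1) /= real(image_index(a, right))))  error stop 'bad right halo'
  if (this_image() == 1) print *, 'halo exchange verified'
end program halo_exchange_2d
```

The first `sync all` matters as much as the second: without it, a neighbor's put could arrive before the target image has finished initializing its array and then be overwritten.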

2D Halo Exchange with MPI

real :: a(0:R+1, 0:C+1)
…
call mpi_isend( a(1,1:C), C, mpi_real, &
                top(myp), TAG, ...)
call mpi_irecv( a(R+1,1:C), C, mpi_real, &
                bottom(myp), TAG, ...)
call mpi_isend( a(R,1:C), C, mpi_real, &
                bottom(myp), TAG, ...)
call mpi_irecv( a(0,1:C), C, mpi_real, &
                top(myp), TAG, ...)
call mpi_isend( a(1:R,C), R, mpi_real, &
                right(myp), TAG, ...)
call mpi_irecv( a(1:R,0), R, mpi_real, &
                left(myp), TAG, ...)
call mpi_isend( a(1:R,1), R, mpi_real, &
                left(myp), TAG, ...)
call mpi_irecv( a(1:R,C+1), R, mpi_real, &
                right(myp), TAG, ...)
call mpi_waitall( 8, ...)

Implementation of CAF

• OpenUH compiler
  – an industry-quality, optimizing compiler based on Open64
  – features: dependence and data-flow analysis, interprocedural analysis, OpenMP
  – backend supports multiple targets (x86_64, IA64, IA32, MIPS, PTX)

[Figure: OpenUH compiler flow, with boxes for CAF source code, Fortran front end with coarray support, coarray translation phase, loop optimizer, global optimizer, code generation, the OpenUH CAF runtime library, and the resulting executable]

Runtime Support for CAF

Runtime Interface (libcaf)

1-sided Communication

PGAS Memory Allocation

Synchronization

Collectives Support (e.g. reductions)

Atomics

Portable Communication Substrate: GASNet or ARMCI

Comparison with other Implementations

Compiler        Commercial/Free                        Fortran 2008 Coarray Support?
OpenUH          Free                                   Yes
G95             Partially free, no longer supported    Missing locks support
Gfortran        Free                                   In progress
Rice CAF 2.0    Free                                   Partially, but adds different features
Cray Fortran    Commercial                             Yes
Intel Fortran   Commercial                             Yes

Seismic Subsurface Imaging: Reverse Time Migration

• A source wave is emitted per shot
• Reflected waves are captured by an array of sensors
• RTM (in the time domain) uses the finite difference method to numerically solve the wave equation and reconstruct the subsurface image (in parallel, with domain decomposition)

RTM Implementations

• Isotropic
  – simplest model
  – assumes reflected waves propagate at the same speed in every direction from a point
  – only swaps faces (8 swaps in halo exchange)
• Tilted Transverse Isotropy (TTI)
  – assumes waves may propagate at different speeds
  – swaps faces and edges (18 swaps in halo exchange)

Typical Data Usage

• Generally several thousand shots
  – data-parallel problem, where each shot can be processed independently in parallel
  – each shot handles several GB of data
  – so the total data to analyze is in the terabyte range
• Handling I/O
  – C I/O reads in velocity and coefficient models
  – Shot headers are read by the master and distributed
  – Each processor writes to a distinct file, and the files are merged in a post-processing step

Results for CAF RTM port

Total Domain Size: 1024 x 768 x 512 (3.0 GB per shot)

Forward Shot
• Isotropic case: up to 32% faster compared to the corresponding MPI implementation
• TTI case: competitive performance with MPI

Backward Shot
• Isotropic case: performance hit at 256 procs
• TTI case: lagging a bit behind MPI

Extending Fortran for Parallel I/O

• We are currently designing a prototype implementation for a parallel I/O language extension

• Fortran I/O has not yet been extended to facilitate cooperative I/O to shared files
  – the original Co-Array Fortran specified a simple extension to Fortran I/O
  – parallel I/O may be added in a future version of the standard

Fortran I/O

• Fortran provides interfaces for formatted and unformatted I/O

open( 10, file='fn', action='write', &
      access='direct', recl=k )
…
write (10, rec=3) A

[Figure: file 'fn' connected to unit 10, shown as a sequence of records (record 1 through record 4); the write statement stores A in record 3]
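For reference, the direct-access pattern on this slide can be completed into a small self-checking program using only standard Fortran. This is a sketch: the file name `fn_demo.dat`, the array size, and the read-back check are assumptions added for the example. The record length is obtained with `inquire(iolength=...)`, since the units of `recl` are processor-dependent.

```fortran
program direct_access_demo
  implicit none
  integer, parameter :: n = 8          ! size of A (assumed for the example)
  real :: A(n), check(n)
  integer :: k, u, i

  A = [(real(i), i = 1, n)]
  inquire(iolength=k) A                ! record length that holds A, in recl units

  open(newunit=u, file='fn_demo.dat', action='readwrite', access='direct', &
       form='unformatted', status='replace', recl=k)
  write(u, rec=3) A                    ! store A in record 3; records 1-2 stay unwritten
  read(u, rec=3) check                 ! records can be read back in any order
  close(u, status='delete')            ! remove the scratch file

  if (any(check /= A)) error stop 'record 3 round trip failed'
  print *, 'record 3 round trip ok'
end program direct_access_demo
```

Each `write` and `read` addresses exactly one record by its number, which is the one-record-at-a-time behavior of standard direct access.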

Current limitations of I/O

• Issues:
  1. no defined, legal way for multiple images to access the same file
  2. a file is a 1-dimensional sequence of records
  3. records are read/written one at a time
  4. no mechanism for collective accesses to a shared file among multiple images

Proposed Extension for Parallel I/O

• Allow a file to be “share-opened”, e.g. OPEN( 10, file='fn', TEAM='yes', …)
  – all images form a team with shared access to the same file
  – implicit synchronization
• recommended only for direct access mode
• FLUSH statement used to ensure changes by one image are visible to other images in team
• CLOSE statement has implicit image synchronization

Further extensions we’re exploring

• Multi-dimensional view of records
• Read/write multiple records at a time
• Collective read/write operations on shared files

open( 10, file='fn', action='write', &
      access='direct', ndim=2, &
      dims=(/M/), team='yes', recl=k )
…

[Figure: file 'fn' connected to unit 10, viewed as a 2-D grid of records with M rows: (1,1), (1,2), (1,3), … through (M,1), (M,2), (M,3), …]

write (10, rec_lb=(/ 2,2 /), rec_ub=(/ 4,3 /) ) &
       A(1:4, 1:2)

[Figure: the write statement stores A(1:4,1:2) into the block of records from (2,2) to (4,3) in the 2-D record view of file 'fn' connected to unit 10]

type(T) :: A(2,2)[3,*]
…
my_rec_lbs = get_rec_lbs( this_image() )
my_rec_ubs = get_rec_ubs( this_image() )
write_team( 10, rec_lb=my_rec_lbs, &
            rec_ub=my_rec_ubs ) &
            A(:,:)

[Figure: file 'fn' connected to unit 10, viewed as a 6 x 4 grid of records, (1,1) through (6,4); write_team stores each image's A(1:2,1:2) into its own 2 x 2 block of records, for images [1,1], [2,1], [3,1], [1,2], [2,2], and [3,2]]
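The proposed `rec_lb`/`rec_ub` specifiers and `write_team` are not standard Fortran, so they cannot be compiled today. The serial sketch below only emulates the record layout in the figure with standard direct access, under several assumptions that are not stated in the talk: each record holds one `real` element, the 6 x 4 record grid is linearized in column-major order, and image [r,c] owns the 2 x 2 block starting at record (2(r-1)+1, 2(c-1)+1). The function `linear_rec`, the file name, and the fill values are hypothetical.

```fortran
program record_grid_sketch
  implicit none
  integer, parameter :: nrow = 6       ! rows of the 6 x 4 record grid in the figure
  integer, parameter :: pr = 3, pc = 2 ! image grid for A(2,2)[3,*] on 6 images
  real :: A(2,2), v
  integer :: u, k, r, c, i, j, lb(2), ub(2)

  inquire(iolength=k) v                ! one real element per record (assumed)
  open(newunit=u, file='fn_grid_demo.dat', access='direct', action='readwrite', &
       form='unformatted', status='replace', recl=k)

  ! Loop over the cosubscripts [r,c] that the six images would have.
  do c = 1, pc
     do r = 1, pr
        lb = [2*(r-1) + 1, 2*(c-1) + 1]   ! assumed lower record bounds of the block
        ub = lb + 1                       ! assumed upper record bounds of the block
        A = real(10*r + c)                ! stand-in for image [r,c]'s data
        do j = lb(2), ub(2)
           do i = lb(1), ub(1)
              write(u, rec=linear_rec(i, j)) A(i - lb(1) + 1, j - lb(2) + 1)
           end do
        end do
     end do
  end do

  read(u, rec=linear_rec(5, 3)) v      ! record (5,3) lies in image [3,2]'s block
  close(u, status='delete')
  if (v /= 32.0) error stop 'unexpected value in record (5,3)'
  print *, 'record (5,3) =', v

contains

  ! Column-major mapping from a 2-D record index to a record number (assumed).
  pure integer function linear_rec(i, j)
    integer, intent(in) :: i, j
    linear_rec = (j - 1)*nrow + i
  end function linear_rec

end program record_grid_sketch
```

The nested loops issue one `write` per record, which is the overhead that a multi-record, collective write is intended to remove.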

Leverage Global Arrays as memory buffers for I/O

• Implementation in progress which utilizes global arrays (GA) as I/O buffers in memory

[Figure: diagram labeled with compute nodes, I/O nodes, I/O requests, and asynchronous disk updates]

In Summary

• Fortran coarray model may be used for processing large data sets

• Developed implementation that’s freely available and used it to develop RTM application

• Fortran’s I/O model doesn’t support parallel I/O for large-scale, multi-dimensional array data sets, and we are working on addressing this

Thanks