1 ProActive performance evaluation with NAS benchmarks and optimization of OO SPMD Brian...

Post on 24-Dec-2015


1

ProActive performance evaluation with NAS benchmarks

and optimization of OO SPMD

Brian Amedro Vladimir Bodnartchouk

2

Outline

• TimIt : A profiling tool for ProActive

• OO SPMD model in ProActive

• Performance evaluation with NAS benchmarks

• Optimizing group communications

3

TimIt : A profiling tool for ProActive

A ProActive feature to time and analyze applications

4

OO SPMD model

• A parallel programming model

• Flexibility and high level of abstraction

• Heavily used in the NAS benchmark implementations

Figure panels: one-to-all, scattering, reduce operation
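The three group operations named on this slide can be sketched in plain Python. This is a minimal single-process illustration, not ProActive code: the `Worker` class and the `broadcast`, `scatter` and `reduce_group` helpers are hypothetical names introduced here, and real group members would run on separate nodes.

```python
from functools import reduce
import operator


class Worker:
    """Hypothetical SPMD group member: every worker runs the same code on its own data."""

    def __init__(self, rank):
        self.rank = rank
        self.data = None

    def receive(self, data):
        self.data = data

    def local_sum(self):
        return sum(self.data)


def broadcast(group, value):
    """One-to-all: every member receives the same value."""
    for worker in group:
        worker.receive(value)


def scatter(group, values):
    """Scattering: split values into contiguous chunks, one chunk per member."""
    size, extra = divmod(len(values), len(group))
    start = 0
    for worker in group:
        # The first `extra` members take one additional element
        end = start + size + (1 if worker.rank < extra else 0)
        worker.receive(values[start:end])
        start = end


def reduce_group(group, op):
    """Reduce: combine one partial result per member into a single value."""
    return reduce(op, (worker.local_sum() for worker in group))


group = [Worker(rank) for rank in range(4)]
scatter(group, list(range(10)))
total = reduce_group(group, operator.add)
```

Each worker sums only its own chunk, and the reduce step combines the four partial sums into the sum of the whole list.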

5

NAS Parallel Benchmarks

• Designed by NASA to evaluate the benefits of high performance systems

• Largely derived from computational fluid dynamics (CFD) applications

• 5 benchmarks (kernels) to test different aspects of a system

• Easy to implement thanks to the OO SPMD pattern

• Tests performed on Sun Java 1.5 with RMI for ProActive, and the PGI 6.0 compiler for MPI

6

CG Kernel (Conjugate Gradient)

• Floating point operations

• Eigenvalue computation

• High number of unstructured communications

• 12000 calls

• 570 MB sent

• 1 min 32

• 65 % comms
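For readers unfamiliar with the method, the conjugate gradient iteration can be sketched as follows. This is a small sequential illustration on a dense 2x2 system, not the benchmark code; the function names are chosen here for illustration. In a distributed version, the dot products and the matrix-vector product are the steps that would typically require communication between workers.

```python
def mat_vec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def conjugate_gradient(A, b, tol=1e-10, max_iter=100):
    """Solve A x = b for a symmetric positive-definite matrix A."""
    x = [0.0] * len(b)
    r = list(b)          # residual b - A x, with x = 0 initially
    p = list(r)          # first search direction
    rs_old = dot(r, r)
    for _ in range(max_iter):
        if rs_old ** 0.5 < tol:
            break
        Ap = mat_vec(A, p)
        alpha = rs_old / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        rs_new = dot(r, r)
        # New direction: residual plus a correction along the old direction
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = conjugate_gradient(A, b)
```

For this system the exact solution is x = (1/11, 7/11), which the iteration reaches to within the tolerance.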

7

MG Kernel (Multi Grid)

• Floating point operations

• Solving Poisson problem

• Structured communications

• 600 calls

• 45 MB sent

• 1 min 32

• 80 % comms

8

IS Kernel (Integer Sort)

• Key ranking operations

• Bucket sort

• Large arrays in memory

• 65 calls

• 22 MB sent

• 4 min 32

• 60 % comms
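Key ranking by bucket sort can be sketched as below. This is a sequential illustration under the assumption that keys are integers in the range 0 to `max_key`; the function name `rank_keys` is hypothetical. In a parallel setting, each bucket would typically be owned by a different worker, which is where the key exchange traffic comes from.

```python
def rank_keys(keys, max_key, num_buckets):
    """Return ranks such that ranks[i] is the position of keys[i] in sorted order.

    Assumes every key is an integer in the range 0..max_key.
    """
    # Each bucket covers a contiguous range of key values
    width = -(-(max_key + 1) // num_buckets)  # ceiling division
    buckets = [[] for _ in range(num_buckets)]
    for index, key in enumerate(keys):
        buckets[key // width].append(index)

    ranks = [0] * len(keys)
    next_rank = 0
    for bucket in buckets:
        # Sort inside the bucket; Python's sort is stable, so ties keep input order
        for index in sorted(bucket, key=lambda i: keys[i]):
            ranks[index] = next_rank
            next_rank += 1
    return ranks


keys = [5, 1, 7, 3, 1, 6]
ranks = rank_keys(keys, max_key=7, num_buckets=2)
```

Writing each key to the position given by its rank yields the sorted sequence.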

9

EP Kernel (Embarrassingly Parallel)

• Random number generation

• Almost no communications

• 6 calls

• 246 bytes sent

• 7 min 32

• 2 % comms
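The embarrassingly parallel pattern can be sketched as follows: each worker generates its own random numbers with no interaction, and only a small table of counts is combined at the end. The polar method for Gaussian pairs, the ten bins and the per-worker seeds are assumptions chosen for illustration, and `ep_worker` is a hypothetical name.

```python
import math
import random


def ep_worker(seed, num_pairs):
    """Generate Gaussian pairs independently and tally them by max(|x|, |y|)."""
    rng = random.Random(seed)  # private generator: no shared state between workers
    counts = [0] * 10
    for _ in range(num_pairs):
        u = 2.0 * rng.random() - 1.0
        v = 2.0 * rng.random() - 1.0
        t = u * u + v * v
        if t > 1.0 or t == 0.0:
            continue  # reject points outside the unit circle
        factor = math.sqrt(-2.0 * math.log(t) / t)
        x, y = u * factor, v * factor
        bin_index = int(max(abs(x), abs(y)))
        if bin_index < len(counts):
            counts[bin_index] += 1
    return counts


# Workers run independently; the only "communication" is this final sum
partials = [ep_worker(seed, 10_000) for seed in range(4)]
totals = [sum(column) for column in zip(*partials)]
```

Only the ten counters per worker need to be exchanged, which is consistent with the very small message volume reported on this slide.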

10

FT Kernel (Fourier Transformation)

• Floating point operations

• Big messages : 8 MB per call

• 22 calls

• 180 MB sent

• 1 min 32

• 40 % comms

11

Optimizing group communications

Implement efficient group communication

• Minimize TCP traffic

• Decrease network congestion

Use clustering techniques to choose the most suitable algorithm

12

Ring all-to-all algorithm

• Best for large size communications

• Takes n-1 steps

Figure: ring exchange, steps 1 to 3
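The n-1 step count can be checked with a small simulation. This sketch assumes that "all-to-all" here means every member ends up holding every member's block (an all-to-all broadcast); the function name and the in-memory data layout are illustrative only.

```python
def ring_all_to_all(blocks):
    """Simulate a ring exchange; returns (data held by each member, number of steps)."""
    n = len(blocks)
    received = [{rank: blocks[rank]} for rank in range(n)]
    # Index of the block each member forwards next; initially its own block
    in_flight = list(range(n))
    steps = 0
    for _ in range(n - 1):
        # All members send at once to their right-hand neighbour,
        # so member `rank` receives what its left-hand neighbour held
        arriving = [in_flight[(rank - 1) % n] for rank in range(n)]
        for rank in range(n):
            received[rank][arriving[rank]] = blocks[arriving[rank]]
        in_flight = arriving
        steps += 1
    return received, steps


received, steps = ring_all_to_all(["a", "b", "c", "d"])
```

With four members the exchange completes in three steps, and each step moves only one block per link, which is why the scheme suits large messages.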

13

Recursive doubling all-to-all algorithm

• Best for small size communications

• Takes log(n) steps

Figure: recursive doubling exchange, steps 1 and 2
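The logarithmic step count can be illustrated with a small simulation. This sketch assumes a power-of-two group size and, as an interpretation, that every member must end up with every member's block; the function name is hypothetical.

```python
def recursive_doubling_all_to_all(blocks):
    """Simulate recursive doubling; returns (data held by each member, number of steps)."""
    n = len(blocks)
    if n == 0 or (n & (n - 1)) != 0:
        raise ValueError("this sketch assumes a power-of-two group size")
    received = [{rank: blocks[rank]} for rank in range(n)]
    steps = 0
    distance = 1
    while distance < n:
        # Snapshot so that all pairwise exchanges of a step happen simultaneously
        snapshot = [dict(held) for held in received]
        for rank in range(n):
            partner = rank ^ distance  # partner differs in exactly one bit of the rank
            received[rank].update(snapshot[partner])
        distance *= 2
        steps += 1
    return received, steps


received, steps = recursive_doubling_all_to_all(list("abcdefgh"))
```

With eight members the exchange finishes in three steps, but the amount of data sent doubles at every step, which is why the scheme favours small messages.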

14

Conclusion

• TimIt : easy and helpful profiling tool

• NAS benchmarks are easy to implement with ProActive and the OO SPMD pattern

http://www-sop.inria.fr/oasis/proactive/nas

• Good performance expected with the upcoming Sun Java 6 and the use of Ibis RMI

15

Questions?

16

MPI / ProActive

MPI                       ProActive
mpirun                    deployment
MPI_Init, MPI_Finalize    activities creation
MPI_Comm_size             getMyGroupSize
MPI_Comm_rank             getMyRank
MPI_*Send, MPI_*Recv      method call (setter and getter)
MPI_Barrier               barrier
MPI_Bcast                 method call
MPI_Scatter               method call with a scatter group as parameter
MPI_Gather                result of a group communication
MPI_Reduce                programmer's method
