Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan Santhanaraman Jiesheng Wu D.K.Panda Network Based Computing Lab The Ohio State University


Page 1: Zero Copy MPI Derived Datatype Communication Over InfiniBandmvapich.cse.ohio-state.edu/static/media/... · Zero Copy MPI Derived Datatype Communication Over InfiniBand Gopalakrishnan

Zero Copy MPI Derived Datatype Communication Over InfiniBand

Gopalakrishnan Santhanaraman, Jiesheng Wu, D. K. Panda

Network Based Computing Lab, The Ohio State University


Presentation Layout

• Introduction
• Background and Existing Approaches
• Motivation for New Scatter/Gather (SGRS) Approach
• Design and Implementation Issues
• Performance Evaluation
• Conclusions and Future Work


Introduction

• Non-contiguous data communication is common in scientific applications
  • Decomposition of multi-dimensional volumes, FFT, finite element codes
  • NAS benchmarks, LINPACK
• MPI provides a derived datatype interface to facilitate this kind of data movement
• Current implementations of derived datatypes are not very efficient
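As a rough illustration of what a derived datatype describes, the sketch below flattens a vector-style layout into (byte offset, byte length) segments. The function name `vector_segments` and the 4x6 example array are made up for illustration; the parameters mirror the count, blocklength and stride arguments of `MPI_Type_vector`, with the stride counted in elements.

```python
def vector_segments(count, blocklength, stride, elem_size, base=0):
    """Flatten a vector-style layout into (byte_offset, byte_length) pairs.

    count       -- number of blocks
    blocklength -- elements per block
    stride      -- distance between block starts, in elements
    elem_size   -- size of one element in bytes
    """
    return [
        (base + i * stride * elem_size, blocklength * elem_size)
        for i in range(count)
    ]


# One column of a 4x6 row-major array of 4-byte integers:
# 4 blocks of 1 element, each 6 elements apart.
segs = vector_segments(count=4, blocklength=1, stride=6, elem_size=4)
print(segs)  # [(0, 4), (24, 4), (48, 4), (72, 4)]
```

Each pair is one contiguous piece of memory; the more pieces a message has, the more non-contiguous it is.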


Related Work

• Improving datatype processing
• Optimized packing and unpacking procedures
• Taking advantage of network features to improve non-contiguous datatype communication


InfiniBand Overview

• Emerging interconnect based on open standards
• Provides low latency and high bandwidth
• Several novel features
  • RDMA
  • Scatter/Gather
  • Atomic operations
• VAPI: low-level interface (API) over InfiniBand


Our Previous Work

• Different approaches
  • Pack/unpack based approach
    • Copy on both sides
    • Pipeline packing, network communication and unpacking
  • Reduced copy
    • RDMA write with gather on sender side
    • RDMA read with scatter on receiver side
  • Zero copy
    • Multiple RDMA writes on sender side (Multi-W scheme)

Jiesheng Wu, Pete Wyckoff, and Dhabaleswar K. Panda. High Performance Implementation of MPI Datatype Communication over InfiniBand. In Int'l Parallel and Distributed Processing Symposium (IPDPS 04), April, 2004
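The pack/unpack approach can be pictured with a minimal sketch, assuming a layout is a list of (offset, length) byte segments. `pack` and `unpack` are illustrative helper names, not MVAPICH functions, and the `wire` buffer stands in for the network transfer. The point is only that the data is copied once on each side.

```python
def pack(buf, segs):
    """Copy the listed segments of buf into one contiguous staging buffer."""
    return b"".join(bytes(buf[off:off + n]) for off, n in segs)


def unpack(packed, buf, segs):
    """Copy a contiguous staging buffer back out into the listed segments."""
    pos = 0
    for off, n in segs:
        buf[off:off + n] = packed[pos:pos + n]
        pos += n


src = bytearray(range(16))
dst = bytearray(16)
send_segs = [(0, 2), (8, 2)]   # two 2-byte pieces on the sender
recv_segs = [(4, 4)]           # one 4-byte piece on the receiver

wire = pack(src, send_segs)    # copy 1: sender side
unpack(wire, dst, recv_segs)   # copy 2: receiver side
print(list(dst[4:8]))          # [0, 1, 8, 9]
```

The reduced copy and zero copy schemes listed above aim to remove one or both of these copies.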


Conclusions of Previous Work

• For small messages with eager protocol, segment pack/unpack is best.
• For messages in rendezvous protocol range, zero copy schemes are beneficial.
  • Multi-W zero copy scheme was proposed.


Limitations of Earlier Approaches

• RDMA write/gather, RDMA read/scatter
  • Need a copy in order to handle non-contiguity on both sides
• Multi-W
  • For a large number of small segments, performance degrades
    • Overhead of a large number of RDMA operations
    • Poor network utilization
• Motivation to explore other zero copy schemes
• Problem statement: How can we utilize the advanced features provided by modern interconnects like InfiniBand to handle non-contiguous data communication efficiently and overcome the above limitations?
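To see why the number of operations grows with non-contiguity, the sketch below counts the pieces of a message that are contiguous on both the sender and the receiver. It assumes, as a simplification that the slides do not spell out, that a Multi-W style transfer issues one RDMA write per such piece; `cut_points` and `count_pieces` are hypothetical helper names.

```python
def cut_points(lengths):
    """Positions in the message byte stream where a segment ends."""
    points, total = set(), 0
    for n in lengths:
        total += n
        points.add(total)
    return points, total


def count_pieces(send_lengths, recv_lengths):
    """Number of pieces that are contiguous on both sides.

    Assumption: one RDMA write per piece. Segment lengths must be positive.
    """
    s_cuts, s_total = cut_points(send_lengths)
    r_cuts, r_total = cut_points(recv_lengths)
    if s_total != r_total:
        raise ValueError("sender and receiver layouts describe different sizes")
    return len(s_cuts | r_cuts)


size = 64 * 1024  # fixed message size, contiguous receive buffer
for blocks in (4, 64, 1024):
    writes = count_pieces([size // blocks] * blocks, [size])
    print(blocks, "blocks ->", writes, "writes")
```

For a fixed message size, the count rises with the number of blocks while each write carries fewer bytes, which matches the two limitations listed for Multi-W.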


Semantics of send/gather, receive/scatter feature

• Based on send/receive channel semantics
• Handles non-contiguity on both the send and receive sides, which is the most general case
• Implementing datatype communication with this feature requires a synchronization phase, so it applies to messages that fall under the rendezvous protocol
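The channel semantics above can be modelled in a few lines: the bytes named by the sender's gather list form one stream, and that stream fills the receiver's scatter list in order, with no staging buffer in between. This is a software model only, not VAPI code; `gather_scatter` is a hypothetical name and the (offset, length) lists are an assumed representation of the descriptors.

```python
def gather_scatter(src, gather, dst, scatter):
    """Move bytes from src's gather list straight into dst's scatter list."""
    s_view, d_view = memoryview(src), memoryview(dst)
    j = 0
    d_off, d_left = scatter[0] if scatter else (0, 0)
    for s_off, s_left in gather:
        while s_left:
            while d_left == 0:          # current receive entry is full
                j += 1
                if j >= len(scatter):
                    raise ValueError("scatter list is smaller than the message")
                d_off, d_left = scatter[j]
            n = min(s_left, d_left)     # piece contiguous on both sides
            d_view[d_off:d_off + n] = s_view[s_off:s_off + n]
            s_off, s_left = s_off + n, s_left - n
            d_off, d_left = d_off + n, d_left - n


src = bytearray(b"AAxxBBxxCCxx")
dst = bytearray(b"............")
gather_scatter(src, [(0, 2), (4, 2), (8, 2)], dst, [(1, 3), (6, 3)])
print(dst.decode())  # .AAB..BCC...
```

Note that the sender's three 2-byte pieces and the receiver's two 3-byte pieces do not line up, yet neither side needs to know the other's layout to post its own list.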


VAPI Level Comparison: Multi-W vs. SGRS

Observations

• For a fixed number of segments, the SGRS approach outperforms the Multi-W approach for different message sizes
• For a fixed message size with increasing degree of non-contiguity:
  • SGRS scheme degradation is negligible
  • Multi-W degradation is significant


MVAPICH Overview

• High performance implementation of MPI over InfiniBand
• Design based on MPICH and MVICH
  • Eager protocol for small messages
  • Rendezvous protocol for large messages
• Datatype implementation currently uses the generic packing and unpacking scheme
  • Small datatype messages are packed/unpacked
  • For large datatype messages, both sides allocate pack/unpack buffers dynamically

MVAPICH Software Distribution

• Open source (current version is 0.9.4, released last week)
• Has been directly downloaded by more than 119 organizations and industry
• Available in the software stack distributions of IBA vendors

MVAPICH Users

[List of user organizations: text not recoverable from the source.]

Larger IBA Clusters using MVAPICH and Top500 Rankings

• 1105-node cluster at Virginia Tech
  • 3rd in Nov. ’03 ranking
• 192-node cluster at Mississippi State University
  • 150th in June ’04 ranking
• 128-node cluster at Sandia/Livermore
  • 111th in Nov. ’03 ranking and 211th in June ’04 ranking
• 256-node cluster at Los Alamos
  • 116th in Nov. ’03 ranking and 218th in June ’04 ranking
• 128-node cluster at Ohio Supercomputer Center (OSC)
  • 272nd in June ’04 ranking
• More are being installed


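Framework For Handling Datatypes

[Figure: datatype handling framework between the MPI interface and the InfiniBand layer. Small messages use the Eager protocol and large messages use the Rendezvous protocol. Schemes shown: Pack, Pipeline, Reduced Copy, Zero Copy.]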

Basic Idea

[Figure: diagram text not recoverable from the source.]

Design Issues

• Exchanging layout information
  • MPI datatype has only local semantics
  • Optimizing layout exchange
• Layout matching decision needs to be conveyed
• Registration and deregistration of user datatype message buffers
  • Unique issue due to non-contiguity in buffers
• Posting descriptors
  • Upper limit on number of scatter/gather descriptors
  • Needs a secondary connection for transmitting non-contiguous data

SGRS Communication Protocol

[Figure: message exchange between sender and receiver]
• Sender to receiver: request control message + layout (primary connection)
• Receiver: post scatter
• Receiver to sender: reply control message + decision info (primary connection)
• Sender: post gather
• Sender to receiver: data (second connection)

Layout Exchange and Matching Decision

• Take advantage of the handshake messages in the rendezvous protocol to achieve this
  • Sender’s datatype layout is appended to the Rendezvous start control message
  • The matching decision information is conveyed in the Rendezvous reply/clear-to-send message
• A layout cache mechanism is implemented to reduce the overhead of layout transfer
  • Datatype information is exchanged only once
  • Only the index needs to be sent for future messages
  • Datatype cache mechanism proposed by Traff et al.
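The caching idea can be sketched as a pair of small tables, one per side of a connection. This is a minimal model under stated assumptions: the class names `LayoutCache` and `LayoutTable` and the tuple-shaped control payloads are hypothetical, and eviction, per-peer state and datatype freeing are ignored.

```python
class LayoutCache:
    """Sender side: remembers which layouts this peer has already received."""

    def __init__(self):
        self._index = {}

    def encode(self, layout):
        key = tuple(layout)
        if key in self._index:
            # Layout already known to the peer: send only its index.
            return ("index", self._index[key])
        idx = len(self._index)
        self._index[key] = idx
        # First use: send the full layout together with its new index.
        return ("layout", idx, key)


class LayoutTable:
    """Receiver side: stores layouts by the index the sender assigned."""

    def __init__(self):
        self._layouts = {}

    def decode(self, payload):
        if payload[0] == "layout":
            _, idx, layout = payload
            self._layouts[idx] = layout
            return layout
        return self._layouts[payload[1]]


cache, table = LayoutCache(), LayoutTable()
layout = [(0, 4), (24, 4), (48, 4)]
first = cache.encode(layout)    # carries the whole layout
second = cache.encode(layout)   # carries only an index
print(first[0], second)         # layout ('index', 0)
print(table.decode(first) == table.decode(second))  # True
```

The saving grows with the size of the layout description, since the index has a fixed size no matter how many segments the datatype has.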


Registration

• Registration and deregistration of user datatype message buffers
  • Common issue in both zero copy schemes
  • Unique issue due to non-contiguity in buffers
• Use the Optimistic Group Registration scheme

J. Wu, P. Wyckoff, and D. K. Panda. “Supporting Efficient Noncontiguous Access in PVFS over InfiniBand”. IEEE Cluster Computing 2003, Dec. 2003


Posting Descriptors

• Needs a separate queue pair connection
  • Ordering
  • Scalability
• Upper limit on the number of gather/scatter descriptors
  • Message might need to be chopped into multiple gather/scatter descriptors
  • Number of posted gather descriptors must equal the number of posted scatter descriptors
  • Needs a negotiation phase
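One way to picture the chopping constraint is a greedy split that walks both layouts together and starts a new descriptor pair whenever either list would exceed its entry limit. This is a sketch under assumptions, not the scheme from the talk: it assumes the limit applies to the entries in one descriptor's list, that both sides can see both layouts (which is what the negotiation phase would provide), and that all segment lengths are positive. `chop` and `max_sge` are hypothetical names.

```python
def _grows(entries, off):
    """1 if a piece starting at off needs a new list entry, else 0."""
    return 0 if entries and entries[-1][0] + entries[-1][1] == off else 1


def _add(entries, off, n):
    if _grows(entries, off):
        entries.append([off, n])
    else:
        entries[-1][1] += n      # extends the previous entry contiguously


def chop(send_segs, recv_segs, max_sge):
    """Split a message into matching (gather_list, scatter_list) pairs."""
    if max_sge < 1:
        raise ValueError("max_sge must be at least 1")
    if sum(n for _, n in send_segs) != sum(n for _, n in recv_segs):
        raise ValueError("layouts describe different sizes")
    send = [list(s) for s in send_segs]
    recv = [list(r) for r in recv_segs]
    pairs, gather, scatter = [], [], []
    i = j = 0
    while i < len(send) and j < len(recv):
        s_off, s_len = send[i]
        r_off, r_len = recv[j]
        if (len(gather) + _grows(gather, s_off) > max_sge
                or len(scatter) + _grows(scatter, r_off) > max_sge):
            pairs.append((gather, scatter))   # close this descriptor pair
            gather, scatter = [], []
            continue
        n = min(s_len, r_len)
        _add(gather, s_off, n)
        _add(scatter, r_off, n)
        send[i] = [s_off + n, s_len - n]
        recv[j] = [r_off + n, r_len - n]
        i += send[i][1] == 0
        j += recv[j][1] == 0
    if gather:
        pairs.append((gather, scatter))
    return pairs


pairs = chop([(0, 4), (16, 4), (32, 4), (48, 4)], [(0, 8), (100, 8)], max_sge=2)
for g, s in pairs:
    print("gather", g, "scatter", s)
```

Each pair carries the same number of bytes on both sides, so the i-th posted gather descriptor matches the i-th posted scatter descriptor and the two counts are equal by construction.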


Experimental Evaluation

• Experimental test bed
  • Cluster of 8 Supermicro nodes
    • Dual Xeon 3.0 GHz processors
    • 512 KB L2 cache, PCI-X 64-bit 133 MHz bus
    • InfiniHost SDK version 3.0.1
    • 1 GB DDR-SDRAM physical memory
• Experiments conducted
  • Latency and bandwidth with vector datatype
  • Collective latency (MPI_Alltoall)
  • CPU overhead tests
  • Impact of layout cache


Vector Datatype Test

• A vector test: multiple columns in a 64x4096 integer array


MPI Level Vector Latency

[Figure: two plots of latency (usec) against message size (2k to 512k bytes). One plot compares SGRS-128, Multi-W-128, Generic-128, and Contiguous; the other compares SGRS-64, MultiW-64, Generic-64, and Contiguous.]

• SGRS scheme reduces latency by up to 62% as compared to Multi-W


MPI Level Vector Bandwidth

[Figure: bandwidth (megabytes/sec) against message size (2k to 512k bytes). Series: SGRS-64, SGRS-128, MultiW-64, MultiW-128, Contiguous, Generic-128, Generic-64.]

• SGRS scheme gives the best performance
• For large messages, the bandwidth is close to the contiguous bandwidth

CPU Overhead

[Figure: two plots, "Sender side Overhead" and "Receiver side Overhead", of CPU overhead (usec) against message size (2k to 512k bytes). Series in each plot: MultiW-64 segments, MultiW-128 segments, SGRS-64 segments, SGRS-128 segments.]

• The CPU overhead associated with the SGRS protocol is relatively low

MPI_Alltoall Latency

• The Alltoall latency test shows significant improvement for the SGRS approach

[Figure: latency (usec) against message size (4k to 512k bytes). Series: Multi-W-64, Multi-W-128, SGRS-64, SGRS-128.]

Synthetic Benchmark to Measure Impact of Layout Caching

• Need to transfer the two diagonals of a square matrix
• Diagonal elements are actually blocks
• Needs a significant layout size to describe it
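A small sketch shows why this pattern needs a large layout description. It assumes a row-major n x n matrix whose elements are contiguous blocks of `block_bytes` bytes, and it assumes, purely for illustration, that each segment costs 16 bytes to describe (an 8-byte offset plus an 8-byte length). The actual encoding used in the benchmark is not given in the slides; `two_diagonals` and `layout_vs_data` are hypothetical names.

```python
LAYOUT_ENTRY_BYTES = 16  # assumption: 8-byte offset + 8-byte length per segment


def two_diagonals(n, block_bytes):
    """Segments for the main and anti-diagonal of an n x n block matrix."""
    offsets = set()
    for i in range(n):
        offsets.add((i * n + i) * block_bytes)            # main diagonal
        offsets.add((i * n + (n - 1 - i)) * block_bytes)  # anti-diagonal
    # The set drops the shared centre block when n is odd.
    # Adjacent blocks are left unmerged to keep the sketch short.
    return [(off, block_bytes) for off in sorted(offsets)]


def layout_vs_data(n, block_bytes):
    segs = two_diagonals(n, block_bytes)
    layout_bytes = len(segs) * LAYOUT_ENTRY_BYTES
    data_bytes = sum(length for _, length in segs)
    return layout_bytes, data_bytes


print(two_diagonals(4, 4)[:3])   # [(0, 4), (12, 4), (20, 4)]
print(layout_vs_data(4, 4))      # (128, 32)
print(layout_vs_data(4, 16))     # (128, 128)
```

Under these assumptions the description is larger than the data for small blocks, which is the situation where sending only a cached index instead of the layout should help most.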

Effect of Layout Cache

[Figure: percentage of overhead against number of blocks (500 to 2000). Series: blocksize 4 bytes, blocksize 8 bytes, blocksize 16 bytes.]

• Layout cache shows benefits for certain scenarios
• The layout itself is contiguous, unlike the data that it describes


Conclusions and Future Work

• Provided a new zero-copy scheme for datatype communication over InfiniBand
• The new scheme outperforms the existing schemes
  • Latency can be improved by up to 62%
  • Bandwidth can be increased by up to 400%
  • Collective communication like Alltoall can derive potential benefits
  • Layout cache is shown to be beneficial for some scenarios
• Future work
  • Evaluate the effectiveness of this scheme at application level
  • Provide a comprehensive solution that internally uses multiple schemes to achieve best performance


Thank You!



BACKUP SLIDES


VAPI Level Bandwidth Comparison: SGRS vs. Multi-W

• SGRS scheme consistently outperforms Multi-W

[Figure: bandwidth (megabytes/sec) against message size (2k to 512k bytes). Series: Multi-W-Bw, SGRS-Bw.]

Effect of Degree of Non-contiguity

• SGRS scheme fares better with increased non-contiguity

[Figure: bandwidth (megabytes/sec) against number of blocks (4 to 64). Series: Multi-W-128k, Multi-W-256k, Multi-W-512k, SGRS-128k, SGRS-256k, SGRS-512k.]