A Brief Look At MPI’s Point To Point Communication


Page 1: A Brief Look At MPI’s Point To Point Communication

A Brief Look At MPI’s Point To Point Communication

Brian T. Smith

Professor, Department of Computer Science

Director, Albuquerque High Performance Computing Center (AHPCC)

Page 2: A Brief Look At MPI’s Point To Point Communication

Point To Point Communication

What is meant by this concept? There is a sender and a receiver.

The sender prepares a message in a package from the application storage area.

The sender has a protocol on how it contacts and communicates with the receiver:

– The protocol is an agreement on how the communication is set up

– The sender and receiver agree on whether and how to communicate

The receiver receives the message package per its agreement with the sender.

The receiver processes the packet and installs the data in the application storage area.

Page 3: A Brief Look At MPI’s Point To Point Communication

Communication Models

Many models are feasible and have been implemented in various environments, past and current.

MPI’s goal is to be portable across all of the reasonable models. This means that essentially NO assumptions can be made, either

– by the implementation, or

– by the user,

as to which model is or can be used.

Let’s talk about two possible models. Models like these were actually used, informally and differently, by individual “CPUs” in our recent trial communications amongst the three institutions.

Page 4: A Brief Look At MPI’s Point To Point Communication

MPI’s Conventions

Messages have a format or a template:

– Message container, called a buffer, which is frequently assumed to be specified in user space (the storage set up by the user’s code)

– Length in terms of the number of objects of the message type

– The type of objects in the message (basic type or user-defined type)

– A message tag: a user-specified integer id for the message

– Destination (for the sender) or source (for the receiver) of the message; the destination is the rank of the process in the process group

– Communication world or group: a named arrangement established by calls to MPI
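To make these conventions concrete, here is a minimal sketch of a two-process exchange; it is not from the slides, and the names values and tag and the count 100 are illustrative assumptions:

program envelope_demo
   use mpi                              ! MPI module: constants and interfaces
   implicit none
   real    :: values(100)               ! message buffer in user space
   integer :: rank, tag, ierr
   integer :: status(MPI_STATUS_SIZE)
   tag = 17                             ! user-chosen integer message id
   call MPI_Init( ierr )
   call MPI_Comm_rank( MPI_COMM_WORLD, rank, ierr )
   if( rank == 0 ) then
      values = 1.0
      ! buffer, length in objects, object type, destination rank, tag, communicator
      call MPI_Send( values, 100, MPI_REAL, 1, tag, MPI_COMM_WORLD, ierr )
   else if( rank == 1 ) then
      ! the receiver names the same envelope, plus a status object
      call MPI_Recv( values, 100, MPI_REAL, 0, tag, MPI_COMM_WORLD, status, ierr )
   endif
   call MPI_Finalize( ierr )
end program envelope_demo

Run with two or more processes; ranks other than 0 and 1 simply fall through.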

Page 5: A Brief Look At MPI’s Point To Point Communication

MPI’s Conventions Continued

Kinds of communication:

Blocking

– Sender does not return from an MPI call until the message buffer (the user’s container for the message) can be reused without corrupting the message that is being sent

– Receiver does not return until the receiving message buffer contains all of the message

Non-blocking

– Sender call returns after sufficient processing has been performed to allow the processor, in a separate and independent thread, to complete sending the message; in particular, changes to the sending task’s message buffer may change the message sent

– Receiver call returns after sufficient processing has been performed to allow the processor, in a separate and independent thread, to complete receiving the message; in particular, the receiving task’s message buffer may still change after the receive call returns to the user’s code

Other MPI procedures test or wait for the completion of sends and receives
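As a hedged sketch of the non-blocking forms (not from the slides; variable names are assumptions): MPI_ISEND and MPI_IRECV return request handles, and MPI_WAIT (or MPI_TEST) completes them:

   ! Sketch: non-blocking send/receive with explicit completion.
   ! Assumes dest, src, tag, and comm have been set elsewhere.
   real    :: sendbuf(100), recvbuf(100)
   integer :: dest, src, tag, comm, ierr
   integer :: req_send, req_recv
   integer :: status(MPI_STATUS_SIZE)

   call MPI_Isend( sendbuf, 100, MPI_REAL, dest, tag, comm, req_send, ierr )
   call MPI_Irecv( recvbuf, 100, MPI_REAL, src,  tag, comm, req_recv, ierr )
   ! ... unrelated computation may overlap the communication here ...
   call MPI_Wait( req_send, status, ierr )   ! only now may sendbuf be reused
   call MPI_Wait( req_recv, status, ierr )   ! only now does recvbuf hold the message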

Page 6: A Brief Look At MPI’s Point To Point Communication

MPI Conventions Continued

Modes of communication (contact protocols and assumptions)

These are assumptions that may be made by the user; the implementation must follow these assumptions.

Modes are determined by the name of the MPI SEND procedure used, e.g., MPI_BSEND specifies a buffered send.

Standard (no letter)

– Assumes no particular protocol is used; see the later modes for typical protocols

– Because no protocol is assumed, the programmer must assume the most restrictive one is used, namely “Ready” mode

– Non-local operation: another process may have ‘to do something’ before this operation completes

Buffered (B letter)

– Buffers created and used by the protocol are allocated in user space

– Send can be started whether or not a receive has been posted

– Local operation: another process does not have to do anything before this operation completes
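A minimal sketch of buffered mode (not from the slides; sizes and names are assumptions, and 4-byte REALs are assumed): the user attaches buffer space, in user space, before the first MPI_BSEND:

   ! Sketch: buffered ("B") send. Because the user supplies the buffer,
   ! the send is local and may start before any receive is posted.
   ! Assumes dest, tag, and comm have been set elsewhere.
   real :: message(1000)
   character(len=1), allocatable :: workspace(:)
   integer :: nbytes, dest, tag, comm, ierr

   nbytes = 4*1000 + MPI_BSEND_OVERHEAD          ! one message plus bookkeeping
   allocate( workspace(nbytes) )
   call MPI_Buffer_attach( workspace, nbytes, ierr )
   call MPI_Bsend( message, 1000, MPI_REAL, dest, tag, comm, ierr )
   ! ... later, once the buffered traffic has drained ...
   call MPI_Buffer_detach( workspace, nbytes, ierr )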

Page 7: A Brief Look At MPI’s Point To Point Communication

Modes Continued

Synchronous (S letter)

– Rendezvous semantics implemented: the sender starts but does not complete until the receiver has posted a receive

» The buffer may be created in the receiver’s space, or the transfer may be direct

– Non-local operation

Ready (R letter)

– Sender starts only if the matching receive has been posted

– Erroneous if the receive has not been posted; the result is undefined

– Non-local operation

– Highest performance, as it can be a direct transfer with no buffer
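The mode is selected purely by the procedure name; the argument lists match MPI_SEND. A hedged sketch (variable names assumed):

   ! Synchronous: does not complete until the matching receive is posted
   call MPI_Ssend( sendbuf, count, MPI_REAL, dest, tag, comm, ierr )
   ! Ready: erroneous unless the matching receive is ALREADY posted
   call MPI_Rsend( sendbuf, count, MPI_REAL, dest, tag, comm, ierr )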

Page 8: A Brief Look At MPI’s Point To Point Communication

MPI Conventions Continued

Communication “worlds” or communicators

– Specifies the domain of the processes within the group

– A processor may be in more than one process group

– Each processor has a rank in each group; the rank of a particular process may be different in each group

The purpose of the groups is to arrange the processors so that it is convenient to send/receive messages within the particular group, and other processors do not see the messages. Examples:

– Processors in a grid (north-south-east-west communication)

– Processors distributed in a line, or in a row or column of a grid

– Processors in a circle

– Processors in a hypercube configuration
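As an illustrative sketch of such groupings (not from the slides), MPI_COMM_SPLIT can carve MPI_COMM_WORLD into row and column communicators; the grid width q = 4 and the row-major rank numbering are assumptions:

program grid_comms
   use mpi
   implicit none
   integer, parameter :: q = 4          ! assumed grid width (ranks per row)
   integer :: rank, row, col, row_comm, col_comm, rank_in_row, ierr
   call MPI_Init( ierr )
   call MPI_Comm_rank( MPI_COMM_WORLD, rank, ierr )
   row = rank / q                       ! assumes row-major rank numbering
   col = mod( rank, q )
   ! processes with the same "color" (2nd argument) land in the same new
   ! communicator; the "key" (3rd argument) orders the ranks within it
   call MPI_Comm_split( MPI_COMM_WORLD, row, col, row_comm, ierr )
   call MPI_Comm_split( MPI_COMM_WORLD, col, row, col_comm, ierr )
   call MPI_Comm_rank( row_comm, rank_in_row, ierr )   ! may differ from world rank
   call MPI_Finalize( ierr )
end program grid_comms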

Page 9: A Brief Look At MPI’s Point To Point Communication

Pictures of Implementation Models

[Diagram: sender and receiver, each with user data and a buffer. Left: send buffer used, no receive buffer used. Right: send buffer used, receive buffer used.]

Page 10: A Brief Look At MPI’s Point To Point Communication

Pictures of Implementation Models

[Diagram: sender and receiver, each with user data and a buffer. Left: no send buffer used, no receive buffer used. Right: no send buffer used, receive buffer used.]

Page 11: A Brief Look At MPI’s Point To Point Communication

Blocking Communication Operations

MPI_SEND and MPI_RECV

Let’s look at 3 reasonable ways to perform communication between 2 processors which exchange messages:

– One always works

– One always deadlocks (that is, both processors hang waiting for the other to communicate)

– One may or may not work, depending on the actual protocols used by the MPI implementation

Page 12: A Brief Look At MPI’s Point To Point Communication

This One Always Works

Steps: Determine what rank the process is.

If rank == 0:

– Send a message from send_buffer to the process with rank 1

– Receive a message into recv_buffer from the process with rank 1

Else if rank == 1:

– Receive a message into recv_buffer from the process with rank 0

– Send a message from send_buffer to the process with rank 0

Pattern of communication (it does not matter who, 0 or 1, executes first):

Processor 0: Send first, Receive next
Processor 1: Receive first, Send next

Page 13: A Brief Look At MPI’s Point To Point Communication

Example Code – Always Works

Call MPI_Comm_rank( comm, rank, ierr )
If( rank == 0 ) then
   call MPI_Send( sendbuf, count, MPI_REAL, &
                  1, tag, comm, ierr )
   call MPI_Recv( recvbuf, count, MPI_REAL, &
                  1, tag, comm, status, ierr )
Else if( rank == 1 ) then
   call MPI_Recv( recvbuf, count, MPI_REAL, &
                  0, tag, comm, status, ierr )
   call MPI_Send( sendbuf, count, MPI_REAL, &
                  0, tag, comm, ierr )
Endif

Page 14: A Brief Look At MPI’s Point To Point Communication

This One Always Deadlocks

Steps: Determine what rank the process is.

If rank == 0:

– Receive a message into recv_buffer from the process with rank 1

– Send a message from send_buffer to the process with rank 1

Else if rank == 1:

– Receive a message into recv_buffer from the process with rank 0

– Send a message from send_buffer to the process with rank 0

Pattern of communication (it does not matter who, 0 or 1, executes first):

Processor 0: Receive first, Send next
Processor 1: Receive first, Send next

Page 15: A Brief Look At MPI’s Point To Point Communication

Example Code – Always Deadlocks

Call MPI_Comm_rank( comm, rank, ierr )
If( rank == 0 ) then
   call MPI_Recv( recvbuf, count, MPI_REAL, &
                  1, tag, comm, status, ierr )
   call MPI_Send( sendbuf, count, MPI_REAL, &
                  1, tag, comm, ierr )
Else if( rank == 1 ) then
   call MPI_Recv( recvbuf, count, MPI_REAL, &
                  0, tag, comm, status, ierr )
   call MPI_Send( sendbuf, count, MPI_REAL, &
                  0, tag, comm, ierr )
Endif

Page 16: A Brief Look At MPI’s Point To Point Communication

This One May Or May Not Work – The Worst Of All Possibilities

That is, it may work on one implementation and not work on another. Whether it works may depend on the size of the message or other unknown features of the implementation. It relies on buffering of the messages that the code never asks for: no MPI_BSEND is used and no MPI_Buffer_attach is called.

Pattern of communication (it does not matter who, 0 or 1, executes first):

Processor 0: Send first, Receive next
Processor 1: Send first, Receive next

Page 17: A Brief Look At MPI’s Point To Point Communication

Example Code – May Fail

Call MPI_Comm_rank( comm, rank, ierr )
If( rank == 0 ) then
   call MPI_Send( sendbuf, count, MPI_REAL, &
                  1, tag, comm, ierr )
   call MPI_Recv( recvbuf, count, MPI_REAL, &
                  1, tag, comm, status, ierr )
Else if( rank == 1 ) then
   call MPI_Send( sendbuf, count, MPI_REAL, &
                  0, tag, comm, ierr )
   call MPI_Recv( recvbuf, count, MPI_REAL, &
                  0, tag, comm, status, ierr )
Endif

Page 18: A Brief Look At MPI’s Point To Point Communication

An Application Showing These Issues – Very Close To Your Code

Consider a 2-D Jacobi iteration (an n × n matrix) using a 5-point stencil. The data structure used here is a 1-D partition of the matrix:

– The coding illustrations are simpler this way

– However, this code does not scale well when the ratio of the problem size n to the number of processors is large (the practical case): the communication overhead is too large in this case

The algorithm or computation is:

– Given initial data for the matrix A, compute the average of the E-W-N-S neighbors of each point and assign it to the matrix B

– Assign matrix B to A and repeat the process until it has converged

Page 19: A Brief Look At MPI’s Point To Point Communication

Serial Code

real A(0:n+1,0:n+1), B(1:n,1:n)

! Main loop
do while( .NOT. Converged(A) )
   do j = 1, n
      b(1:n,j) = 0.25*(a(0:n-1,j)+a(2:n+1,j)+ &
                       a(1:n,j-1)+a(1:n,j+1))
   enddo
   a(1:n,1:n) = b(1:n,1:n)
enddo

Page 20: A Brief Look At MPI’s Point To Point Communication

Partitioning A and B Amongst The Processors

For simplicity of explaining the SEND/RECV commands, we use a 1-D partition.

[Diagram: A and B are split column-wise into equal blocks, one block per process (Process 0, 1, …). Each process’s block of A spans local columns 0 to m+1 within the global range 0 to n+1, the extra columns 0 and m+1 holding neighbor data; each block of B spans local columns 1 to m within the global range 1 to n.]

Page 21: A Brief Look At MPI’s Point To Point Communication

Code For This -- Unsafe

real A(0:n+1,0:n+1), B(1:n,1:n)
! Call MPI to return p (the number of processors) and myrank
! Assume n is an integral multiple of p
! Main loop
do while( .NOT. Converged(A) )
   ! Compute with A and store in B as in the serial code
   …
   if( myrank > 0 ) then
      ! Send first column of B to the last column of A of myrank-1
   endif
   if( myrank < p-1 ) then
      ! Send last column of B to the first column of A of myrank+1
   endif
   if( myrank > 0 ) then
      ! Receive the last column of B of myrank-1 into the first column of A
   endif
   if( myrank < p-1 ) then
      ! Receive the first column of B of myrank+1 into the last column of A
   endif
enddo

Page 22: A Brief Look At MPI’s Point To Point Communication

Unsafe: Why?

All the sends are executed before any receive is posted. As before, this assumes that the messages are buffered, and this should not be assumed in standard mode.

Solution:

– Divide the processors into two groups: even and odd processors

– The odd processors send to the even processors first; then the odd processors receive from the even processors

– The even processors receive from the odd processors first; then the even processors send to the odd processors

– The effect is to interleave the send and receive commands so that no buffers are required to complete the communication (buffers may, of course, still be used)

Page 23: A Brief Look At MPI’s Point To Point Communication

Safe Communication

do while( .NOT. Converged(A) )
   ! Compute with A and store in B as in the serial code
   …
   if( mod(myrank,2) == 1 ) then   ! Odd ranked processors
      ! Send first column of B to the last column of A of myrank-1
      ! If not the last processor, send the last column of B to
      !    processor myrank+1
      ! Receive into the first column of A from processor myrank-1
      ! If not the last processor, receive into the last column of A
      !    from processor myrank+1
   else                            ! Even ranked processors
      ! If not the first processor, receive the last column of B of
      !    myrank-1 into the first column of A
      ! If not the last processor, receive the first column of B of
      !    myrank+1 into the last column of A
      ! If not the first processor, send the first column of B to
      !    processor myrank-1
      ! If not the last processor, send the last column of B to
      !    processor myrank+1
   endif
enddo
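Filling in the comments above with actual calls, a hedged sketch (not the author’s code) under the layout assumptions of the partitioning picture: local arrays A(0:n+1,0:m+1) and B(1:n,1:m), with n, m, p, myrank, tag, and comm set elsewhere:

   ! Sketch: odd ranks send before receiving; even ranks receive before
   ! sending, so every blocking send meets a posted receive and no
   ! buffering is required.
   integer :: status(MPI_STATUS_SIZE), ierr
   if( mod(myrank,2) == 1 ) then                 ! odd ranks (myrank-1 always exists)
      call MPI_Send( B(1,1), n, MPI_REAL, myrank-1, tag, comm, ierr )
      if( myrank < p-1 ) &
         call MPI_Send( B(1,m), n, MPI_REAL, myrank+1, tag, comm, ierr )
      call MPI_Recv( A(1,0), n, MPI_REAL, myrank-1, tag, comm, status, ierr )
      if( myrank < p-1 ) &
         call MPI_Recv( A(1,m+1), n, MPI_REAL, myrank+1, tag, comm, status, ierr )
   else                                          ! even ranks
      if( myrank > 0 ) &
         call MPI_Recv( A(1,0), n, MPI_REAL, myrank-1, tag, comm, status, ierr )
      if( myrank < p-1 ) &
         call MPI_Recv( A(1,m+1), n, MPI_REAL, myrank+1, tag, comm, status, ierr )
      if( myrank > 0 ) &
         call MPI_Send( B(1,1), n, MPI_REAL, myrank-1, tag, comm, ierr )
      if( myrank < p-1 ) &
         call MPI_Send( B(1,m), n, MPI_REAL, myrank+1, tag, comm, ierr )
   endif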

Page 24: A Brief Look At MPI’s Point To Point Communication

Safe And Simpler Communications

Use the send/receive commands for all but the first and last processors.

Use null processes to avoid special-casing the first and last processors.
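A hedged sketch of that approach (same layout assumptions as the previous sketch): MPI_SENDRECV pairs each send with a receive, and setting a neighbor to MPI_PROC_NULL makes the corresponding transfer a no-op, so ranks 0 and p-1 run the same code as everyone else:

   ! Sketch: combined send/receive with null processes at the ends.
   ! Assumes local arrays A(0:n+1,0:m+1) and B(1:n,1:m), and n, m, p,
   ! myrank, tag, comm set elsewhere.
   integer :: left, right, status(MPI_STATUS_SIZE), ierr
   left  = myrank - 1
   right = myrank + 1
   if( myrank == 0   ) left  = MPI_PROC_NULL   ! no neighbor: transfer is a no-op
   if( myrank == p-1 ) right = MPI_PROC_NULL
   ! first column of B goes left; right halo column of A arrives from the right
   call MPI_Sendrecv( B(1,1),   n, MPI_REAL, left,  tag, &
                      A(1,m+1), n, MPI_REAL, right, tag, comm, status, ierr )
   ! last column of B goes right; left halo column of A arrives from the left
   call MPI_Sendrecv( B(1,m), n, MPI_REAL, right, tag, &
                      A(1,0), n, MPI_REAL, left,  tag, comm, status, ierr )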