
Parallel Processing (CS 676)

Lecture 7: Message Passing using MPI*

Jeremy R. Johnson

*Parts of this lecture were derived from Chapters 3-5 and 11 of Pacheco


Introduction

• Objective: To introduce distributed-memory parallel programming using message passing, and the MPI standard that supports it.

• Topics
  – Introduction to MPI
    • hello.c
    • hello.f
  – Example Problem (numeric integration)
  – Collective Communication
  – Performance Model


MPI

• Message Passing Interface
• Distributed Memory Model
  – Single Program Multiple Data (SPMD)
  – Communication using message passing
    • Send/Recv
  – Collective Communication
    • Broadcast
    • Reduce (AllReduce)
    • Gather (AllGather)
    • Scatter
    • Alltoall


Benefits/Disadvantages

• No new language is required

• Portable

• Good performance

• Explicitly forces programmer to deal with local/global access

• Harder to program than shared memory; requires larger program/algorithm changes


Further Information

• http://www-unix.mcs.anl.gov/mpi/
• en.wikipedia.org/wiki/Message_Passing_Interface
• www.mpi-forum.org
• www.open-mpi.org
• www.mcs.anl.gov/research/projects/mpich2

• Textbook
  – Peter S. Pacheco, Parallel Programming with MPI, Morgan Kaufmann, 1997.


Basic MPI Functions

int MPI_Init(
    int*    argc  /* in/out */,
    char*** argv  /* in/out */)

int MPI_Finalize(void)

int MPI_Comm_size(
    MPI_Comm communicator          /* in  */,
    int*     number_of_processors  /* out */)

int MPI_Comm_rank(
    MPI_Comm communicator  /* in  */,
    int*     my_rank       /* out */)


Send

• Must package the message in an envelope containing the destination, the size, an identifying tag, and the communicator (the set of processes participating in the communication).

int MPI_Send(
    void*        message       /* in */,
    int          count         /* in */,
    MPI_Datatype datatype      /* in */,
    int          dest          /* in */,
    int          tag           /* in */,
    MPI_Comm     communicator  /* in */)


Receive

int MPI_Recv(
    void*        message       /* out */,
    int          count         /* in  */,
    MPI_Datatype datatype      /* in  */,
    int          source        /* in  */,
    int          tag           /* in  */,
    MPI_Comm     communicator  /* in  */,
    MPI_Status*  status        /* out */)


Status

• status->MPI_SOURCE
• status->MPI_TAG
• status->MPI_ERROR

int MPI_Get_count(
    MPI_Status*  status     /* in  */,
    MPI_Datatype datatype   /* in  */,
    int*         count_ptr  /* out */)
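To make the status fields concrete, here is a small receive fragment (not from the original slides) meant to sit between MPI_Init and MPI_Finalize: it accepts a message from any source with any tag, then uses the status object and MPI_Get_count to discover who sent it and how many characters arrived. The buffer size of 100 is an arbitrary choice for illustration.

/* Sketch: receive from any sender, then inspect the envelope. */
char buf[100];
MPI_Status status;
int actual_count;

MPI_Recv(buf, 100, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG,
         MPI_COMM_WORLD, &status);

/* Who sent it, and with what tag? */
int sender = status.MPI_SOURCE;
int tag    = status.MPI_TAG;

/* How many MPI_CHAR elements were actually received? */
MPI_Get_count(&status, MPI_CHAR, &actual_count);

printf("got %d chars from process %d (tag %d)\n",
       actual_count, sender, tag);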


hello.c

#include <stdio.h>
#include <string.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int my_rank;          /* rank of process           */
    int p;                /* number of processes       */
    int source;           /* rank of sender            */
    int dest;             /* rank of receiver          */
    int tag = 0;          /* tag for messages          */
    char message[100];    /* storage for message       */
    MPI_Status status;    /* return status for receive */

    /* Start up MPI */
    MPI_Init(&argc, &argv);

    /* Find out process rank */
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    /* Find out number of processes */
    MPI_Comm_size(MPI_COMM_WORLD, &p);


hello.c (continued)

    if (my_rank != 0) {
        /* create message */
        sprintf(message, "Greetings from process %d!\n", my_rank);
        dest = 0;
        /* use strlen + 1 so that '\0' gets transmitted */
        MPI_Send(message, strlen(message)+1, MPI_CHAR,
                 dest, tag, MPI_COMM_WORLD);
    } else {
        for (source = 1; source < p; source++) {
            MPI_Recv(message, 100, MPI_CHAR, source, tag,
                     MPI_COMM_WORLD, &status);
            printf("%s\n", message);
        }
    }

    /* Shut down MPI */
    MPI_Finalize();
}


AnySource

    if (my_rank != 0) {
        /* create message */
        sprintf(message, "Greetings from process %d!\n", my_rank);
        dest = 0;
        /* use strlen + 1 so that '\0' gets transmitted */
        MPI_Send(message, strlen(message)+1, MPI_CHAR,
                 dest, tag, MPI_COMM_WORLD);
    } else {
        for (source = 1; source < p; source++) {
            MPI_Recv(message, 100, MPI_CHAR, MPI_ANY_SOURCE, tag,
                     MPI_COMM_WORLD, &status);
            printf("%s\n", message);
        }
    }


Ring Communication


[Figure: eight processes 0-7 arranged in a ring; each process sends to its successor and receives from its predecessor.]


First Attempt

    sprintf(message, "Greetings from process %d!\n", my_rank);
    dest = (my_rank + 1) % p;
    MPI_Send(message, strlen(message)+1, MPI_CHAR,
             dest, tag, MPI_COMM_WORLD);
    source = (my_rank - 1 + p) % p;   /* add p so the result is not negative for rank 0 */
    MPI_Recv(message, 100, MPI_CHAR, source, tag,
             MPI_COMM_WORLD, &status);
    printf("PE %d received: %s\n", my_rank, message);


Deadlock

    sprintf(message, "Greetings from process %d!\n", my_rank);
    dest = (my_rank + 1) % p;
    source = (my_rank - 1 + p) % p;
    MPI_Recv(message, 100, MPI_CHAR, source, tag,
             MPI_COMM_WORLD, &status);
    MPI_Send(message, strlen(message)+1, MPI_CHAR,
             dest, tag, MPI_COMM_WORLD);
    printf("PE %d received: %s\n", my_rank, message);


Buffering Assumption

• The previous code is not safe: it relies on sufficient system buffer space being available, and deadlocks when that assumption fails.

• MPI_Sendrecv can be used to guarantee that deadlock does not occur.


SendRecv

int MPI_Sendrecv(
    void*        send_buf      /* in  */,
    int          send_count    /* in  */,
    MPI_Datatype send_type     /* in  */,
    int          dest          /* in  */,
    int          send_tag      /* in  */,
    void*        recv_buf      /* out */,
    int          recv_count    /* in  */,
    MPI_Datatype recv_type     /* in  */,
    int          source        /* in  */,
    int          recv_tag      /* in  */,
    MPI_Comm     communicator  /* in  */,
    MPI_Status*  status        /* out */)


Correct Version with SendReceive

    sprintf(omessage, "Greetings from process %d!\n", my_rank);
    dest = (my_rank + 1) % p;
    source = (my_rank - 1 + p) % p;
    MPI_Sendrecv(omessage, strlen(omessage)+1, MPI_CHAR, dest, tag,
                 imessage, 100, MPI_CHAR, source, tag,
                 MPI_COMM_WORLD, &status);
    printf("PE %d received: %s\n", my_rank, imessage);


Lower Level Implementation

    sprintf(smessage, "Greetings from process %d!\n", my_rank);
    dest = (my_rank + 1) % p;
    source = (my_rank - 1 + p) % p;

    if (my_rank % 2 == 0) {
        MPI_Send(smessage, strlen(smessage)+1, MPI_CHAR,
                 dest, tag, MPI_COMM_WORLD);
        MPI_Recv(dmessage, 100, MPI_CHAR, source, tag,
                 MPI_COMM_WORLD, &status);
    } else {
        MPI_Recv(dmessage, 100, MPI_CHAR, source, tag,
                 MPI_COMM_WORLD, &status);
        MPI_Send(smessage, strlen(smessage)+1, MPI_CHAR,
                 dest, tag, MPI_COMM_WORLD);
    }

    printf("PE %d received: %s\n", my_rank, dmessage);


Compiling and Executing MPI Programs with OpenMPI

• To compile a C program with MPI calls
  – mpicc hello.c -o hello

• To run an MPI program
  – mpirun -np PROCS hello
  – You can provide a hostfile with --hostfile NAME (see the man page for details)


dot.c

#include <stdio.h>

float Serial_dot(float x[] /* in */, float y[] /* in */, int n /* in */)
{
    int i;
    float sum = 0.0;

    for (i = 0; i < n; i++)
        sum = sum + x[i]*y[i];

    return sum;
}


Parallel Dot

float Parallel_dot(float local_x[] /* in */, float local_y[] /* in */, int n_bar /* in */)
{
    float local_dot;
    float dot = 0.0;

    local_dot = Serial_dot(local_x, local_y, n_bar);
    MPI_Reduce(&local_dot, &dot, 1, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);

    /* only process 0 receives the reduced value */
    return dot;
}
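As a usage sketch (not part of the original slides), each process might fill its own block of x and y and then call Parallel_dot; only process 0 receives the reduced result. The fragment assumes my_rank and p have been obtained as in hello.c, and the global length n = 1024 and the all-ones contents are illustrative choices.

/* Sketch: calling Parallel_dot from an SPMD program (illustrative only). */
int n = 1024;                 /* global vector length, assumed divisible by p */
int n_bar = n / p;            /* block length owned by each process           */
float local_x[1024], local_y[1024];
float dot;
int i;

for (i = 0; i < n_bar; i++) {  /* fill local blocks; here simply with 1.0 */
    local_x[i] = 1.0;
    local_y[i] = 1.0;
}

dot = Parallel_dot(local_x, local_y, n_bar);
if (my_rank == 0)
    printf("dot product = %f\n", dot);   /* expect n * 1.0 = 1024.0 */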


Parallel All Dot

float Parallel_all_dot(float local_x[] /* in */, float local_y[] /* in */, int n_bar /* in */)
{
    float local_dot;
    float dot = 0.0;

    local_dot = Serial_dot(local_x, local_y, n_bar);
    /* MPI_Allreduce takes no root argument; every process receives the result */
    MPI_Allreduce(&local_dot, &dot, 1, MPI_FLOAT, MPI_SUM, MPI_COMM_WORLD);

    return dot;
}


Reduce

int MPI_Reduce(
    void*        operand       /* in  */,
    void*        result        /* out */,
    int          count         /* in  */,
    MPI_Datatype datatype      /* in  */,
    MPI_Op       operator      /* in  */,
    int          root          /* in  */,
    MPI_Comm     communicator  /* in  */)

• Operators
  – MPI_MAX, MPI_MIN, MPI_SUM, MPI_PROD, MPI_LAND, MPI_BAND,
    MPI_LOR, MPI_BOR, MPI_LXOR, MPI_BXOR, MPI_MAXLOC, MPI_MINLOC
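As a small illustration of a different operator (not from the original slides), MPI_Reduce with MPI_MAX finds the largest value held by any process; here each process simply contributes its own rank as a stand-in for a real per-process value.

/* Sketch: global maximum of one value per process, delivered to root 0. */
double local_max = (double) my_rank;   /* stand-in for a real per-process value */
double global_max;

MPI_Reduce(&local_max, &global_max, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);

if (my_rank == 0)
    printf("global maximum = %f\n", global_max);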


Reduce

[Figure: tree-structured reduction over 8 processes. Initially process i holds x_i; pairs combine first (x0+x4, x1+x5, x2+x6, x3+x7), then fours (x0+x4+x2+x6 and x1+x5+x3+x7), until after log2(8) = 3 steps process 0 holds x0+x1+x2+x3+x4+x5+x6+x7.]


AllReduce

int MPI_Allreduce(
    void*        operand       /* in  */,
    void*        result        /* out */,
    int          count         /* in  */,
    MPI_Datatype datatype      /* in  */,
    MPI_Op       operator      /* in  */,
    MPI_Comm     communicator  /* in  */)

  (Unlike MPI_Reduce, there is no root parameter: every process receives the result.)

• Operators
  – MPI_MAX, MPI_MIN, MPI_SUM, MPI_PROD, MPI_LAND, MPI_BAND,
    MPI_LOR, MPI_BOR, MPI_LXOR, MPI_BXOR, MPI_MAXLOC, MPI_MINLOC


AllReduce

[Figure: all-to-all combining pattern among processes 0-7; after log2(8) = 3 exchange stages every process holds the complete result.]


Broadcast

int MPI_Bcast(
    void*        message       /* in/out */,
    int          count         /* in     */,
    MPI_Datatype datatype      /* in     */,
    int          root          /* in     */,
    MPI_Comm     communicator  /* in     */)
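A common use, shown here as a hedged sketch rather than something taken from the slides, is distributing an input parameter read on process 0 to every other process; the variable n is an illustrative choice, and my_rank is assumed as in hello.c.

/* Sketch: process 0 reads a problem size and broadcasts it to everyone. */
int n;

if (my_rank == 0) {
    printf("Enter n: ");
    scanf("%d", &n);
}

/* After the call, every process in MPI_COMM_WORLD has the same n. */
MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);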


Broadcast

[Figure: tree-structured broadcast from root 0. The set of processes holding the message doubles each step: {0}, then {0,1}, then {0,1,2,3}, then {0,...,7}.]


Gather

int MPI_Gather(
    void*        send_data     /* in  */,
    int          send_count    /* in  */,
    MPI_Datatype send_type     /* in  */,
    void*        recv_data     /* out */,
    int          recv_count    /* in  */,
    MPI_Datatype recv_type     /* in  */,
    int          root          /* in  */,
    MPI_Comm     communicator  /* in  */)

[Figure: processes 0-3 each hold one block x0, x1, x2, x3; MPI_Gather collects them, in rank order, into the receive buffer on the root.]
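A minimal sketch (not from the slides): every process contributes one value and the root assembles them in rank order. Note that recv_count is the number of items received from each process, not the total; my_rank and p are assumed as in hello.c, and the array size 128 is an arbitrary upper bound.

/* Sketch: gather one float per process onto root 0. */
float local_a = (float) my_rank;   /* each process contributes one value */
float a[128];                      /* big enough for up to 128 processes */

/* recv_count is the number of items received from EACH process (here 1). */
MPI_Gather(&local_a, 1, MPI_FLOAT, a, 1, MPI_FLOAT, 0, MPI_COMM_WORLD);

if (my_rank == 0)
    printf("a[0] = %f, a[p-1] = %f\n", a[0], a[p-1]);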


Scatter

int MPI_Scatter(
    void*        send_data     /* in  */,
    int          send_count    /* in  */,
    MPI_Datatype send_type     /* in  */,
    void*        recv_data     /* out */,
    int          recv_count    /* in  */,
    MPI_Datatype recv_type     /* in  */,
    int          root          /* in  */,
    MPI_Comm     communicator  /* in  */)

[Figure: the root holds x0 x1 x2 x3 in a single buffer; MPI_Scatter sends block x_i to process i.]
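A minimal sketch (again not from the slides), assuming the global array lives on process 0, that 4*p fits in it, and that my_rank and p are available as in hello.c: the root scatters equal-sized blocks to all processes, including itself.

/* Sketch: scatter a 4-element block of x to each of p processes. */
float x[1024];        /* significant only on the root (process 0) */
float local_x[4];     /* each process receives 4 elements         */
int i;

if (my_rank == 0)
    for (i = 0; i < 4*p; i++)
        x[i] = (float) i;

/* send_count is the number of items sent to EACH process. */
MPI_Scatter(x, 4, MPI_FLOAT, local_x, 4, MPI_FLOAT, 0, MPI_COMM_WORLD);

printf("PE %d got local_x[0] = %f\n", my_rank, local_x[0]);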


AllGather

int MPI_Allgather(
    void*        send_data     /* in  */,
    int          send_count    /* in  */,
    MPI_Datatype send_type     /* in  */,
    void*        recv_data     /* out */,
    int          recv_count    /* in  */,
    MPI_Datatype recv_type     /* in  */,
    MPI_Comm     communicator  /* in  */)

[Figure: processes 0-3 each hold one block x0, x1, x2, x3; after MPI_Allgather every process holds the full vector x0 x1 x2 x3.]
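A minimal sketch (not from the slides) of the call used in the matrix-vector product below: every process contributes its block of x, and every process ends up with the whole vector. The block length local_n = 4 and the global_x capacity are illustrative assumptions; my_rank is assumed as in hello.c.

/* Sketch: assemble the full vector x on every process. */
int local_n = 4;                 /* block length owned by each process      */
float local_x[4];                /* this process's block of x               */
float global_x[1024];            /* holds all p*local_n elements afterwards */
int i;

for (i = 0; i < local_n; i++)
    local_x[i] = (float)(my_rank*local_n + i);

/* recv_count is the number of items contributed by EACH process. */
MPI_Allgather(local_x, local_n, MPI_FLOAT,
              global_x, local_n, MPI_FLOAT, MPI_COMM_WORLD);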


Matrix-Vector Product (block cyclic storage)

• y = Ax, where y_i = Σ_{0 ≤ j < n} A_ij · x_j for 0 ≤ i < m
  – Store blocks of A, x, y in local memory
  – Gather local blocks of x in each process
  – Compute chunks of y in parallel

[Figure: row blocks of A and the corresponding blocks of x and y assigned to processes 0-3; A x = y.]


Parallel Matrix-Vector Product

void Parallel_matrix_vector_product(LOCAL_MATRIX_T local_A, int m, int n,
                                    float local_x[], float global_x[],
                                    float local_y[], int local_m, int local_n)
{
    /* local_m = m/p, local_n = n/p */
    int i, j;

    MPI_Allgather(local_x, local_n, MPI_FLOAT,
                  global_x, local_n, MPI_FLOAT, MPI_COMM_WORLD);

    for (i = 0; i < local_m; i++) {
        local_y[i] = 0.0;
        for (j = 0; j < n; j++)
            local_y[i] = local_y[i] + local_A[i][j]*global_x[j];
    }
}


Embarrassingly Parallel Example

• Numerical integration (trapezoid rule)

∫ from t=0 to 1 of f(t) dt ≈ (h/2)·[f(x_0) + 2f(x_1) + … + 2f(x_{n-1}) + f(x_n)], where h = 1/n and x_i = i·h

[Figure: trapezoid rule on an interval [a, b] with interior points x_1, …, x_{n-1}.]
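To close the loop, here is a hedged sketch, in the spirit of Pacheco's trapezoid program rather than copied from these slides, of how each process integrates its own subinterval and MPI_Reduce combines the pieces. The function f is user-supplied, and the limits a = 0, b = 1 and n = 1024 trapezoids are illustrative; my_rank and p are assumed as in hello.c.

/* Sketch: parallel trapezoid rule over [a, b] with n trapezoids. */
float Trap(float left, float right, int count, float base_len) {
    float estimate = (f(left) + f(right)) / 2.0;  /* f is a user-supplied function */
    int i;
    for (i = 1; i < count; i++)
        estimate += f(left + i*base_len);
    return estimate * base_len;
}

/* Inside main, after MPI_Init / rank / size have been obtained: */
float a = 0.0, b = 1.0, h, local_a, local_b, local_int, total;
int   n = 1024, local_n;

h = (b - a) / n;            /* base length of every trapezoid          */
local_n = n / p;            /* assume p divides n                      */
local_a = a + my_rank * local_n * h;   /* this process's subinterval   */
local_b = local_a + local_n * h;

local_int = Trap(local_a, local_b, local_n, h);

/* Sum the per-process estimates onto process 0. */
MPI_Reduce(&local_int, &total, 1, MPI_FLOAT, MPI_SUM, 0, MPI_COMM_WORLD);

if (my_rank == 0)
    printf("Integral from %f to %f = %f\n", a, b, total);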