Programming using MPI - ccds.iitkgp.ac.in
Programming using MPI
Preeti Malakar, Indian Institute of Technology Kanpur
National Supercomputing Mission
Centre for Development of Advanced Computing
SKA India
IIT Kharagpur
High Performance Computing for Astronomy and Astrophysics
FLOPS
Programming Supercomputers
PARAM Shakti, IIT Kharagpur
Top 500 Supercomputers
Parallel Programming Models
Shared memory programming – OpenMP, Pthreads
[Figure: one thread per core, threads sharing memory through a cache]
Message Passing Interface
• Standard for message passing in a distributed memory environment (most widely used programming model in supercomputers)
• Efforts began in 1991 by Jack Dongarra, Tony Hey, and David W. Walker
• MPI Forum formed in 1993
• Version 1.0: 1994
• Version 2.0: 1997
• Version 3.0: 2012
Parallel Programming Models
Distributed memory programming – MPI (Message Passing Interface)
Distributed memory (each process has its own address space)

[Figure: each process runs on its own core with its own memory]
Adapted from Neha Karanjkar's slides
What is MPI?
Enables parallel programming using message passing
[Figure: Process 0 and Process 1 exchanging messages]
Distinct Process Address Space
Program order, each process executing independently:

Process 0: x = 1, y = 2; ...; x += 2; ...; print x, y   → prints 3, 2
Process 1: x = 1, y = 2; ...; y++;   ...; print x, y   → prints 1, 3

Process 0: x = 1, y = 2;   ...; x += 1; ...; print x, y → prints 2, 2
Process 1: x = 10, y = 20; ...; x++;   ...; print x, y → prints 11, 20
Parallel Execution – Single Node
[Figure: one node, cores sharing a cache; four processes each running a copy of parallel.x, each executing its own instruction stream]
Parallel Execution – Multiple Nodes
[Figure: multiple nodes, each with its own memory; processes on every node run copies of parallel.x, each executing its own instruction stream]
Message Passing
[Figure: Processes 0-3 exchanging messages]
Getting Started
Initialization
• gather information about the parallel job
• set up internal library state
• prepare for communication
Finalization
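The slide does not show code; a minimal sketch, assuming the C bindings, where initialization and finalization correspond to MPI_Init and MPI_Finalize:

```c
#include <mpi.h>
#include <stdio.h>

int main (int argc, char **argv)
{
    MPI_Init (&argc, &argv);   /* set up library state, prepare communication */
    printf ("Hello from an MPI process\n");
    MPI_Finalize ();           /* clean up library state before exit */
    return 0;
}
```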
Process Identification
• MPI_Comm_size: get the total number of processes
• MPI_Comm_rank: get my rank
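A minimal sketch using both identification calls (a sketch, not from the slides):

```c
#include <mpi.h>
#include <stdio.h>

int main (int argc, char **argv)
{
    int rank, size;
    MPI_Init (&argc, &argv);
    MPI_Comm_size (MPI_COMM_WORLD, &size);   /* total processes in the job */
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);   /* my id: 0 .. size-1 */
    printf ("I am rank %d of %d\n", rank, size);
    MPI_Finalize ();
    return 0;
}
```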
MPI Code Execution Steps (On cluster)
• Compile
  mpicc -o program.x program.c
• Execute
  mpirun -np 1 ./program.x (or mpiexec in place of mpirun)
    Runs 1 process on the launch node
  mpirun -np 6 ./program.x
    Runs 6 processes on the launch node
  mpirun -np 6 -f hostfile ./program.x
    Runs 6 processes on the nodes specified in the hostfile
<hostfile>
Host1:3
Host2:2
Host3:1
…
Multiple Processes on Multiple Cores and Nodes
mpiexec -np 8 -f hostfile ./program.x
Hostfile:
Node1:2
Node2:2
Node3:2
Node4:2

[Figure: ranks 0-7 placed two per node across Nodes 1-4, one process per core]
How?
Multiple Processes on Multiple Cores and Nodes
mpiexec -np 8 -f hostfile ./program.x
[Figure: the launch node starts a proxy on each node listed in the hostfile; each proxy spawns the local ranks, two per node]
MPI Process Spawning
The child process and the parent process run in separate memory spaces. At the time of fork() both memory spaces have the same content. The entire virtual address space of the parent is replicated in the child.
https://man7.org/linux/man-pages/man2/fork.2.html
https://linux.die.net/man/3/execvp
The exec() family of functions replaces the current process image with a new process image.
SIMD Parallelism (Recap)
[Figure: every process runs the same program parallel.x on its own core and memory]

NO centralized server/master (typically)
Sum of Squares of N numbers
Serial:
for i = 1 to N
    sum += a[i] * a[i]

Parallel (P processes; each process computes a partial sum over its own N/P elements):
for i = N/P * rank; i < N/P * (rank+1); i++
    localsum += a[i] * a[i]

Collect localsum, add up at one of the ranks
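A sketch of the parallel version in MPI (assumes N is divisible by the number of processes; the partial sums are combined at rank 0 with MPI_Reduce, introduced later):

```c
#include <mpi.h>
#include <stdio.h>

#define N 16

int main (int argc, char **argv)
{
    int rank, P, i;
    double a[N], localsum = 0.0, sum = 0.0;

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    MPI_Comm_size (MPI_COMM_WORLD, &P);

    for (i = 0; i < N; i++)          /* every rank fills the same array here */
        a[i] = i;

    /* each rank squares its own block of N/P elements */
    for (i = N/P * rank; i < N/P * (rank+1); i++)
        localsum += a[i] * a[i];

    /* add the partial sums at rank 0 */
    MPI_Reduce (&localsum, &sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf ("sum of squares = %g\n", sum);

    MPI_Finalize ();
    return 0;
}
```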
MPI Installation
1. sudo apt-get install mpich
OR
1. Download source code from https://www.mpich.org/downloads/
2. ./configure --prefix=<INSTALL_PATH>
3. make
4. make install
MPI Code
[Figure: a complete MPI program listing, with MPI_COMM_WORLD highlighted as the global communicator]
MPI_COMM_WORLD
Required in every MPI communication
Communication Scope
Communicator (communication handle)
• Defines the scope
• Specifies communication context
Process
• Belongs to a group
• Identified by a rank within a group
Identification
• MPI_Comm_size – total number of processes in the communicator
• MPI_Comm_rank – rank of the calling process within the communicator
MPI_COMM_WORLD
[Figure: five processes in MPI_COMM_WORLD, with ranks 0-4]
Process identified by rank/id
MPI Message
• Data and header/envelope
• Typically, MPI communications send/receive messages
Message envelope:
• Source: origin of message
• Destination: receiver of message
• Communicator
• Tag (0 to MPI_TAG_UB)
MPI Communication Types
• Point-to-point
• Collective
Point-to-point Communication
• MPI_Send
• MPI_Recv
int MPI_Send (const void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)
int MPI_Recv (void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)
Tags should match
Blocking send and receive
MPI_Datatype
• MPI_BYTE
• MPI_CHAR
• MPI_INT
• MPI_FLOAT
• MPI_DOUBLE
Example 1
MPI_Comm_rank (MPI_COMM_WORLD, &myrank);

// Sender process
if (myrank == 0)   /* code for process 0 */
{
    strcpy (message, "Hello, there");
    MPI_Send (message, strlen(message)+1, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
}
// Receiver process
else if (myrank == 1)   /* code for process 1 */
{
    MPI_Recv (message, 20, MPI_CHAR, 0, 99, MPI_COMM_WORLD, &status);
    printf ("received :%s\n", message);
}

(99 is the message tag; it must match on both sides)
MPI_Status
• Source rank
• Message tag
• Message length (via MPI_Get_count)
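A sketch of inspecting the status on the receiver (a two-rank program, not from the slides; the message contents are arbitrary):

```c
#include <mpi.h>
#include <stdio.h>
#include <string.h>

int main (int argc, char **argv)
{
    int rank, count;
    char message[20];
    MPI_Status status;

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        strcpy (message, "Hi");
        MPI_Send (message, strlen(message)+1, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv (message, 20, MPI_CHAR, MPI_ANY_SOURCE, MPI_ANY_TAG,
                  MPI_COMM_WORLD, &status);
        MPI_Get_count (&status, MPI_CHAR, &count);   /* elements actually received */
        printf ("source=%d tag=%d count=%d\n",
                status.MPI_SOURCE, status.MPI_TAG, count);
    }
    MPI_Finalize ();
    return 0;
}
```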
MPI_ANY_*
• MPI_ANY_SOURCE• Receiver may specify wildcard value for source
• MPI_ANY_TAG• Receiver may specify wildcard value for tag
Example 2
MPI_Comm_rank (MPI_COMM_WORLD, &myrank);

// Sender process
if (myrank == 0)   /* code for process 0 */
{
    strcpy (message, "Hello, there");
    MPI_Send (message, strlen(message)+1, MPI_CHAR, 1, 99, MPI_COMM_WORLD);
}
// Receiver process
else if (myrank == 1)   /* code for process 1 */
{
    MPI_Recv (message, 20, MPI_CHAR, MPI_ANY_SOURCE, 99, MPI_COMM_WORLD,
              &status);
    printf ("received :%s\n", message);
}
MPI_Send (Blocking)
• Does not return until buffer can be reused
• Message buffering
• Implementation-dependent
• Standard communication mode
Safety
Each scenario lists the calls of process 0 and process 1 in program order:

• P0: Send, Send   P1: Recv, Recv   → Safe
• P0: Send, Recv   P1: Send, Recv   → Unsafe (both send first; deadlocks if neither message is buffered)
• P0: Send, Recv   P1: Recv, Send   → Safe
• P0: Recv, Send   P1: Recv, Send   → Unsafe (both block in receive first; always deadlocks)
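A sketch of the safe ordering where the two ranks exchange one integer (variable names are hypothetical): process 0 sends then receives, process 1 receives then sends, so the calls match and no deadlock is possible.

```c
#include <mpi.h>
#include <stdio.h>

int main (int argc, char **argv)
{
    int rank, mine, theirs;
    MPI_Status status;

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    mine = rank * 100;

    if (rank == 0) {            /* P0: Send, then Recv */
        MPI_Send (&mine, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        MPI_Recv (&theirs, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &status);
    } else if (rank == 1) {     /* P1: Recv, then Send -- matches P0's order */
        MPI_Recv (&theirs, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        MPI_Send (&mine, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize ();
    return 0;
}
```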
Other Send Modes
• MPI_Bsend Buffered• May complete before matching receive is posted
• MPI_Ssend Synchronous• Completes only if a matching receive is posted
• MPI_Rsend Ready
• May be started only if the matching receive is already posted
Non-blocking Point-to-Point
• MPI_Isend (buf, count, datatype, dest, tag, comm, request)
• MPI_Irecv (buf, count, datatype, source, tag, comm, request)
• MPI_Wait (request, status) – blocks until the operation completes
• P0: Isend, Recv   P1: Isend, Recv   → Safe (nonblocking sends return immediately, so both receives are reached)
Computation Communication Overlap
[Figure timeline: rank 0 posts MPI_Isend, computes while the message is in flight, then calls MPI_Wait; rank 1 computes, then calls MPI_Recv]
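A sketch of the overlap (the compute loop is a hypothetical placeholder for useful work done between MPI_Isend and MPI_Wait):

```c
#include <mpi.h>
#include <stdio.h>

int main (int argc, char **argv)
{
    int rank, data = 42, recvd, i;
    double work = 0.0;
    MPI_Request request;
    MPI_Status status;

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        MPI_Isend (&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &request);
        for (i = 0; i < 1000; i++)       /* compute while the send progresses */
            work += i * 0.5;
        MPI_Wait (&request, &status);    /* only now is data safe to reuse */
        printf ("work = %g\n", work);
    } else if (rank == 1) {
        MPI_Recv (&recvd, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &status);
        printf ("received %d\n", recvd);
    }
    MPI_Finalize ();
    return 0;
}
```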
Collective Communications
• Must be called by all processes that are part of the communicator
Types
• Synchronization (MPI_Barrier)
• Global communication (MPI_Bcast, MPI_Gather, …)
• Global reduction (MPI_Reduce, MPI_Allreduce, …)
Barrier
• Synchronization across all group members
• Collective call
• Blocks until all processes have entered the call
• MPI_Barrier (comm)
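A minimal sketch of synchronizing all ranks (not from the slides):

```c
#include <mpi.h>
#include <stdio.h>

int main (int argc, char **argv)
{
    int rank;
    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);

    printf ("rank %d before the barrier\n", rank);
    MPI_Barrier (MPI_COMM_WORLD);     /* no rank continues until all arrive */
    printf ("rank %d after the barrier\n", rank);

    MPI_Finalize ();
    return 0;
}
```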
Broadcast
• Root process sends message to all processes
• Any process can be the root, but all processes must specify the same root
• int MPI_Bcast (buffer, count, datatype, root, comm)
• Number of elements in buffer – count
• buffer – Input or output
[Figure: the root's buffer X is copied into every process's buffer]
Example 3
int rank, size, color;
MPI_Status status;

MPI_Init (&argc, &argv);
MPI_Comm_rank (MPI_COMM_WORLD, &rank);
MPI_Comm_size (MPI_COMM_WORLD, &size);

color = rank + 2;
int oldcolor = color;
MPI_Bcast (&color, 1, MPI_INT, 0, MPI_COMM_WORLD);

printf ("%d: %d color changed to %d\n", rank, oldcolor, color);
Gather
• Gathers values from all processes to a root process
• int MPI_Gather (sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm)
• Arguments recv* not relevant on non-root processes
[Figure: process i contributes Ai; after the call the root holds A0 A1 A2]
Example 4
MPI_Comm_rank (MPI_COMM_WORLD, &rank);
MPI_Comm_size (MPI_COMM_WORLD, &size);

color = rank + 2;

int colors[size];
MPI_Gather (&color, 1, MPI_INT, colors, 1, MPI_INT, 0, MPI_COMM_WORLD);

if (rank == 0)
    for (i = 0; i < size; i++)
        printf ("color from %d = %d\n", i, colors[i]);
Scatter
• Scatters values to all processes from a root process
• int MPI_Scatter (sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, root, comm)
• Arguments send* not relevant on non-root processes
• Output parameter – recvbuf
[Figure: the root holds A0 A1 A2; after the call process i holds Ai]
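The slides give no scatter example; a hedged sketch mirroring Example 4 in reverse (variable names are hypothetical):

```c
#include <mpi.h>
#include <stdio.h>

int main (int argc, char **argv)
{
    int rank, size, mine, i;
    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);
    MPI_Comm_size (MPI_COMM_WORLD, &size);

    int values[size];
    if (rank == 0)                  /* only the root's send buffer matters */
        for (i = 0; i < size; i++)
            values[i] = 100 + i;

    /* process i receives values[i] into mine */
    MPI_Scatter (values, 1, MPI_INT, &mine, 1, MPI_INT, 0, MPI_COMM_WORLD);
    printf ("rank %d received %d\n", rank, mine);

    MPI_Finalize ();
    return 0;
}
```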
Allgather
• All processes gather values from all processes
• int MPI_Allgather (sendbuf, sendcount, sendtype, recvbuf, recvcount, recvtype, comm)
[Figure: process i holds Ai; after the call every process holds A0 A1 A2]
Reduce
• MPI_Reduce (inbuf, outbuf, count, datatype, op, root, comm)
• Combines elements in inbuf of each process
• Combined value in outbuf of root
• op: MIN, MAX, SUM, PROD, …
Rank 0 inbuf: 0 1 2 3 4 5 6
Rank 1 inbuf: 2 1 2 3 2 5 2
Rank 2 inbuf: 0 1 1 0 1 1 0
Elementwise MAX at root: 2 1 2 3 4 5 6
Allreduce
• MPI_Allreduce (inbuf, outbuf, count, datatype, op, comm)
• op: MIN, MAX, SUM, PROD, …
• Combines elements in inbuf of each process
• Combined value in outbuf of each process
Rank 0 inbuf: 0 1 2 3 4 5 6
Rank 1 inbuf: 2 1 2 3 2 5 2
Rank 2 inbuf: 0 1 1 0 1 1 0
Elementwise MAX at every rank: 2 1 2 3 4 5 6
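A sketch computing the elementwise MAX above at every rank; the input rows are taken from the slide, and each rank contributes one row (run with 3 processes for the result shown):

```c
#include <mpi.h>
#include <stdio.h>

int main (int argc, char **argv)
{
    int rank, i;
    int in[3][7] = { {0,1,2,3,4,5,6},
                     {2,1,2,3,2,5,2},
                     {0,1,1,0,1,1,0} };
    int out[7];

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);

    /* every rank contributes one row and every rank receives the MAX */
    MPI_Allreduce (in[rank % 3], out, 7, MPI_INT, MPI_MAX, MPI_COMM_WORLD);

    printf ("rank %d:", rank);
    for (i = 0; i < 7; i++)
        printf (" %d", out[i]);
    printf ("\n");

    MPI_Finalize ();
    return 0;
}
```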
DEMO