MPI Training Course Part 1 Basic
KISTI Supercomputing Center http://www.ksc.re.kr http://edu.ksc.re.kr
SUPERCOMPUTING EDUCATION CENTER
Objectives
After successfully completing the tutorial in this module, you will be able to:
- Understand parallel programming concepts
- Compile and run MPI programs using the MPICH implementation
- Write MPI programs using the core library
Syllabus (Basic Course)

09:30 – 10:30  Motivation / Introduction / A First MPI Program
10:30 – 10:40  Break
10:40 – 12:00  MPI Basics I – Six-function MPI
12:00 – 13:30  Lunch
13:30 – 14:50  MPI Basics II – P2P: Blocking Communication
14:50 – 15:00  Break
15:00 – 16:30  MPI Basics III – P2P: Non-Blocking Comm. & Hands-on
16:30 – 17:00  Summary
Reference & Useful Sites 1. MPI Tutorials
<MPI online course> http://www.citutor.org/login.php
<MPI Tutorial> https://computing.llnl.gov/tutorials/mpi/
<Domain Decomposition tutorial> http://www.nccs.nasa.gov/tutorials/mpi_tutorial2/mpi_II_tutorial.html
<MPI Korean-language reference> http://incredible.egloos.com/3755171
<SP Parallel Programming Workgroup> http://www.itservices.hku.hk/sp2/workshop/html/samples/exercises.html
<PDC Center for HPC> http://www.pdc.kth.se/education/tutorials/summer-school/mpi-exercises/mpi-labs/mpi-lab-1-program-structure-and-point-to-point-communication-in-mpi/mpi-lab-1-program-structure-and-point-to-point-communication-in-mpi
Reference & Useful Sites 2. MPI example sites
<MPI standard> http://www.mcs.anl.gov/research/projects/mpi/tutorial/mpiexmpl/contents.html
<Hands-on Session> http://www.cse.iitd.ernet.in/~dheerajb/MPI/Document/hos_cont.html
<A simple Jacobi iteration> http://www.cecalc.ula.ve/HPCLC/slides/day_04/mpi-exercises/src/jacobi/C/main.html
3. MPI Books
<RS/6000 SP: Practical MPI Programming> http://www.redbooks.ibm.com/redbooks/pdfs/sg245380.pdf
<MPI Parallel Programming: essentials for the multicore era (Korean)> http://www.yes24.com/24/goods/3931112
2012-08-30
MPI Introduction and Concept
Why Parallelization?
Parallelization is another optimization technique.
The goal is to reduce execution time; to this end, multiple processors, or cores, are used.
Using 4 cores, the execution time is ideally 1/4 of the single-core time.
Current HPC Platforms: COTS-Based Clusters
COTS = Commercial off-the-shelf
A typical cluster: login node(s) with access control, compute nodes (e.g. Nehalem, Gulftown), and file server(s).
Evolution of Intel Multi-core CPUs
Penryn, Bloomfield, Gulftown, Beckton (multi-core)
How To Program A Parallel Program?
Shared and Distributed Memory
Within a node: multi-core CPUs (CPU1 and CPU2, each with core1/core2) share RAM through a memory controller over the FSB, giving shared memory on each node.
Across nodes: node1, node2, node3 connected by a cluster interconnect, giving distributed memory across the cluster.
Multi-core CPUs in clusters mean two types of parallelism to consider.
Shared & Distributed Memory (continued)
Shared memory on each node: Pthreads, OpenMP, automatic parallelization
Distributed memory across the cluster: Sockets, PVM, MPI
How To Program A Parallel Program?
A Single System ("Shared Memory")
– Native threading model (standardized, low level)
– OpenMP (de-facto standard)
– Automatic parallelization (the compiler does it for you)
A Cluster of Systems ("Distributed Memory")
– Sockets (standardized, low level)
– PVM – Parallel Virtual Machine (obsolete)
– MPI – Message Passing Interface (de-facto standard)
Parallel Models Compared
Criteria for comparing MPI, threads, and OpenMP*:
– Portable
– Scalable
– Performance oriented
– Supports data parallelism
– Incremental parallelism
– High level
– Serial code intact
– Verifiable correctness
– Distributed memory
What is MPI?
Message Passing Interface. MPI is a library, not a language: a library for inter-process communication and data exchange, used for distributed memory.
History
– MPI-1 standard (MPI Forum): 1994
  • http://www.mcs.anl.gov/mpi/index.html
  • MPI-1.1 (1995), MPI-1.2 (1997)
– MPI-2 announced: 1997
  • http://www.mpi-forum.org/docs/docs.html
  • MPI-2.1 (2008), MPI-2.2 (2009)
Common MPI Implementations
MPICH (Argonne National Laboratory)
– Most common MPI implementation
– Derivatives:
  • MPICH-GM – Myrinet support (available from Myricom)
  • MVAPICH – InfiniBand support (available from Ohio State University)
  • Intel MPI – version tuned to Intel Architecture systems
Open MPI (Indiana University/LANL) – contains many MPI 2.0 features
  • FT-MPI: University of Tennessee (data types, process fault tolerance, high performance)
  • LA-MPI: Los Alamos (point-to-point, data fault tolerance, high performance, thread safety)
  • LAM/MPI: Indiana University (component architecture, dynamic processes)
  • PACX-MPI: HLRS, Stuttgart (dynamic processes, distributed environments, collectives)
Scali MPI Connect – provides native support for most high-end interconnects
MPI/Pro (MPI Software Technology)
How To Run MPI
MPI Basic Steps
– Write a program using "mpi.h" and a few essential function calls
– Compile your program using a compilation script
– Specify the machine file
“Hello, World” in MPI
#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    /* Initialize the MPI library */
    MPI_Init(&argc, &argv);

    /* Do some work! */
    printf("Hello world\n");

    /* Wrap it up: return the resources */
    MPI_Finalize();
    return 0;
}

Demonstrates how to create, compile and run a simple MPI program on the lab cluster using the Intel MPI implementation.
Compiling an MPI Program
Most MPI implementations supply compilation scripts, e.g.:

Language    Command Used to Compile
Fortran 77  mpif77 mpi_prog.f
Fortran 90  mpif90 mpi_prog.f90
C           mpicc mpi_prog.c
C++         mpiCC or mpicxx mpi_prog.C

Manual compilation/linking is also possible:

cc -o hello -L/applic/mpi/mvapich2/gcc/lib -I/applic/mpi/mvapich2/gcc/include -lmpich hello.c
MPI Machine File
A text file telling MPI where to launch processes; put each host name on a separate line. Example:

host0
host1
host2
host3

Check your implementation's documentation for multi-processor node formats. A default file is found in the MPI installation.
Parallel Program Execution
Launch scenario for mpirun:
– Find the machine file (to know where to launch)
– Use SSH or RSH to execute a copy of the program on each node in the machine file
– Once launched, each copy establishes communication with the local MPI library (MPI_Init)
– Each copy ends its MPI interaction with MPI_Finalize
Starting an MPI Program
Execute on node1:
mpirun -np 4 -hostfile mpihosts mpi_prog

Check the MPICH hostfile:
node1
node2
node3
node4
node5
node6
node7
node8

mpi_prog then runs as rank 0 on node1, rank 1 on node2, rank 2 on node3, and rank 3 on node4.
Activity 1: A First MPI Program – "Hello, World" in MPI

#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    MPI_Init(&argc, &argv);   /* Initialize the library */
    printf("Hello World \n");
    MPI_Finalize();           /* Wrap it up */
    return 0;
}

MPI hostfile:
s0001
s0002
s0003
s0004

Compile : mpicc -g -o hello -Wall hello.c
Run     : mpirun -np 4 -hostfile mpihosts hello
Run MPI Program
Six-function MPI
The Whole Library
MPI is large and complex:
– MPI 1.0 has 125 functions
– MPI 2.0 is even larger
But many MPI features are rarely used:
– Inter-communicators
– Topologies
– Persistent communication
– Functions designed for library developers
The Absolute Minimum: Six MPI Functions
– Many parallel algorithms can be implemented efficiently with only these functions

Fortran            C
MPI_INIT()         MPI_Init()
MPI_COMM_SIZE()    MPI_Comm_size()
MPI_COMM_RANK()    MPI_Comm_rank()
MPI_SEND()         MPI_Send()
MPI_RECV()         MPI_Recv()
MPI_FINALIZE()     MPI_Finalize()
Starting MPI
MPI_Init prepares the system for MPI execution.
The call to MPI_Init may update its arguments in C (implementation dependent).
No MPI functions may be called before MPI_Init.
After MPI_Init, each process has a rank (rank 0, rank 1, rank 2, rank 3, ...) in MPI_COMM_WORLD; the processes run on CPUs, each with its own memory, connected by an interconnection network.

Fortran: CALL MPI_INIT(ierr)
C:       int MPI_Init(int *argc, char ***argv)
Exiting MPI
MPI_Finalize frees any memory allocated by the MPI library.
No MPI function may be called after calling MPI_Finalize (exception: MPI_Initialized, which may be called at any time).

Fortran: CALL MPI_FINALIZE(ierr)
C:       int MPI_Finalize(void);

If any one process does not reach the finalization statement, the program will appear to hang.
MPI Communicator
A handle representing a group of processes that can communicate with each other (more about communicators later).
All MPI communication calls have a communicator argument.
Most often you will use MPI_COMM_WORLD:
– Defined when you call MPI_Init
– It contains all of your processes (e.g. ranks 0 through 6)
Communicator Size
How many processes are contained within a communicator?
MPI_Comm_size returns the number of processes in the specified communicator.
The communicator type, MPI_Comm, is defined in mpi.h.

Fortran: CALL MPI_COMM_SIZE(comm, size, ierr)
C:       int MPI_Comm_size(MPI_Comm comm, int *size)
Process Rank
The process ID number within a communicator.
MPI_Comm_rank returns the rank of the calling process within the specified communicator.
Processes are numbered from 0 to N-1.

Fortran: CALL MPI_COMM_RANK(comm, rank, ierr)
C:       int MPI_Comm_rank(MPI_Comm comm, int *rank)
Lab 2: “Hello, World” with IDs
#include <stdio.h>
#include "mpi.h"

int main(int argc, char* argv[])
{
    int numProc, myRank;

    MPI_Init(&argc, &argv);                   /* Initialize the library */
    MPI_Comm_rank(MPI_COMM_WORLD, &myRank);   /* Who am I? */
    MPI_Comm_size(MPI_COMM_WORLD, &numProc);  /* How many? */
    printf("Hello. Process %d of %d here.\n", myRank, numProc);
    MPI_Finalize();                           /* Wrap it up. */
    return 0;
}
Basic Communication
Sending and Receiving Messages
Basic message passing requires specifying:
– On the sender: where to send, what to send, how many elements to send
– On the receiver: where to receive from, what to receive, how many elements to receive
Message Organization in MPI
A message is divided into data and envelope:
• data: buffer, count, datatype
• envelope: process identifier (source/destination rank), message tag, communicator
MPI Data Types (1/2)

MPI Data Type                              C Data Type
MPI_CHAR (1-byte character)                signed char
MPI_SHORT (2-byte integer)                 signed short int
MPI_INT (4-byte integer)                   signed int
MPI_LONG (4-byte integer)                  signed long int
MPI_UNSIGNED_CHAR (1-byte unsigned char)   unsigned char
MPI_UNSIGNED_SHORT (2-byte unsigned int)   unsigned short int
MPI_UNSIGNED (4-byte unsigned int)         unsigned int
MPI_UNSIGNED_LONG (4-byte unsigned int)    unsigned long int
MPI_FLOAT (4-byte floating point)          float
MPI_DOUBLE (8-byte floating point)         double
MPI_LONG_DOUBLE (8-byte floating point)    long double
MPI Data Types (2/2)

MPI Data Type                                 Fortran Data Type
MPI_INTEGER (4-byte integer)                  INTEGER
MPI_REAL (4-byte floating point)              REAL
MPI_DOUBLE_PRECISION (8-byte floating point)  DOUBLE PRECISION
MPI_COMPLEX (4-byte real complex)             COMPLEX
MPI_LOGICAL (4-byte logical)                  LOGICAL
MPI_CHARACTER (1-byte character)              CHARACTER(1)

Example: an INTEGER arr(5) holding 2345, 654, 96574, -12, 7676 is sent with count=5 and datatype=MPI_INTEGER.
Communication Envelope
Message = Data + Envelope
The envelope carries: (1) the communicator, (2) the destination rank ("To:"), (3) the source rank ("From:"), and (4) the tag. The data is "count" elements: item-1, item-2, ..., item-n.
MPI Blocking Send & Receive
Blocking Recv – Status
Status information:
– Sending process (rank)
– Tag
– Data size: via MPI_GET_COUNT

Information  Fortran              C
source       status(MPI_SOURCE)   status.MPI_SOURCE
tag          status(MPI_TAG)      status.MPI_TAG
error        status(MPI_ERROR)    status.MPI_ERROR
count        MPI_GET_COUNT()      MPI_Get_count()
MPI Send & Receive Example • To send an integer x from process 0 to process 1
Ping Pong Example

Fortran (pingpong.f90):

PROGRAM Pingpong
  INCLUDE 'mpif.h'
  INTEGER nrank, nprocs, ierr, status(MPI_STATUS_SIZE)
  INTEGER :: tag = 55, ROOT = 0
  INTEGER :: send = -1, recv = -1

  CALL MPI_INIT(ierr)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, nrank, ierr)
  IF (nrank == ROOT) THEN
    PRINT *, 'Before : nrank=', nrank, 'send=', send, 'recv=', recv
    send = 7
    CALL MPI_SEND(send, 1, MPI_INTEGER, 1, tag, MPI_COMM_WORLD, ierr)
  ELSE
    CALL MPI_RECV(recv, 1, MPI_INTEGER, ROOT, tag, MPI_COMM_WORLD, status, ierr)
    PRINT *, 'After : nrank=', nrank, 'send=', send, 'recv=', recv
  END IF
  CALL MPI_FINALIZE(ierr)
END

C (pingpong.c):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int nrank, nprocs, tag = 55, ROOT = 0;
    int send = -1, recv = -1;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &nrank);
    if (nrank == ROOT) {
        printf("Before : nrank(%d) send = %d, recv = %d\n", nrank, send, recv);
        send = 7;
        MPI_Send(&send, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);
    } else {
        MPI_Recv(&recv, 1, MPI_INT, ROOT, tag, MPI_COMM_WORLD, &status);
        printf("After : nrank(%d) send = %d, recv = %d\n", nrank, send, recv);
    }
    MPI_Finalize();
    return 0;
}

$ mpif90 -o pingpong.x pingpong.f90
$ mpirun -np 2 -hostfile hosts ./pingpong.x
$ mpicc -o pingpong pingpong.c
$ mpirun -np 2 -hostfile hosts ./pingpong

(Run with -np 2: with more processes, the extra ranks would block in MPI_RECV waiting for a message that is never sent.)
P2P: Blocking Communications
Point-to-Point Communication
The simplest form of message passing: one process sends a message to another.
Different types of point-to-point communication: synchronous send, and buffered (asynchronous) send.
Synchronous Sends
A synchronous communication does not complete until the message has been received, analogous to the beep or OK-sheet of a fax machine.
Buffered (Asynchronous) Sends
An asynchronous communication completes as soon as the message is on its way, like a postcard or e-mail.
Blocking Operations
Some sends/receives may block until another process acts:
– a synchronous send blocks until the matching receive is issued;
– a receive blocks until the message is sent.
A blocking subroutine returns only when the operation has completed.
Non-Blocking Operations
Non-blocking operations return immediately and allow the sub-program to perform other work.
Communication Modes

Mode         Blocking    Non-Blocking
Synchronous  MPI_SSEND   MPI_ISSEND
Ready        MPI_RSEND   MPI_IRSEND
Buffered     MPI_BSEND   MPI_IBSEND
Standard     MPI_SEND    MPI_ISEND
Receive      MPI_RECV    MPI_IRECV
Blocking Send: Standard

Fortran: MPI_SEND(buf, count, datatype, dest, tag, comm, ierr)
C:       int MPI_Send(void *buf, int count, MPI_Datatype datatype, int dest, int tag, MPI_Comm comm)

(CHOICE) buf     : starting address of the send buffer (IN)
INTEGER count    : number of elements to send (IN)
INTEGER datatype : MPI datatype of each element (handle) (IN)
INTEGER dest     : rank of the receiving process (IN); MPI_PROC_NULL if no communication is needed
INTEGER tag      : message tag (IN)
INTEGER comm     : MPI communicator (handle) (IN)

Example: MPI_SEND(a, 50, MPI_REAL, 5, 1, MPI_COMM_WORLD, ierr)
Blocking Receive

Fortran: MPI_RECV(buf, count, datatype, source, tag, comm, status, ierr)
C:       int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag, MPI_Comm comm, MPI_Status *status)

(CHOICE) buf     : starting address of the receive buffer (OUT)
INTEGER count    : number of elements to receive (IN)
INTEGER datatype : MPI datatype of each element (handle) (IN)
INTEGER source   : rank of the sending process (IN); MPI_PROC_NULL if no communication is needed
INTEGER tag      : message tag (IN)
INTEGER comm     : MPI communicator (handle) (IN)
INTEGER status(MPI_STATUS_SIZE) : information about the received message (OUT)

Example: MPI_RECV(a, 50, MPI_REAL, 0, 1, MPI_COMM_WORLD, status, ierr)
Precautions for Successful Communication
– The receiver specifies the exact rank of the sender
– The sender specifies the exact rank of the receiver
– Both use the same communicator
– Both use the same message tag
– The receiver's buffer is large enough

Otherwise you may see errors such as:
[s0001:20135] *** An error occurred in MPI_Recv
[s0001:20135] *** on communicator MPI_COMM_WORLD
[s0001:20135] *** MPI_ERR_TRUNCATE: message truncated
[s0001:20135] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
Lab #4: Parallel Sum
<Problem>
– Compute a sum in parallel, e.g. 1 to 100
<Hint & Requirement>
– Decide the sum range of each rank, using the number of processes, the rank, and the total count (100)
– ROOT (rank 0) takes part in the calculation itself and gathers the total sum
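The range split in the hint can be sketched and checked serially before any MPI is added. The helper name below is illustrative (not part of the lab code), mirroring the lab's scheme of giving the remainder to the last rank:

```c
/* Compute the inclusive range [start, end] that a given rank sums:
   each rank gets number/nProcs values, and the last rank also takes
   the remainder, mirroring the lab's decomposition. */
void sum_range(int number, int nProcs, int rank, int *start, int *end)
{
    int quotient  = number / nProcs;
    int remainder = number % nProcs;

    *start = rank * quotient + 1;
    *end   = *start + quotient - 1;
    if (rank == nProcs - 1)
        *end += remainder;   /* last rank absorbs the leftover values */
}
```

With 6 processes and a total of 100, this reproduces the ranges printed in the sample run: rank 0 sums 1..16 and rank 5 sums 81..100.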
Lab #4 Parallel Sum – C (1/2)

/*
  Example Name : parallel_sum.c
  Compile : $ mpicc -g -o parallel_sum -Wall parallel_sum.c
  Run     : $ mpirun -np 6 -hostfile hosts parallel_sum
*/
#include <stdio.h>
#include <mpi.h>

#define NUMBER 100

int main(int argc, char *argv[])
{
    int i, nRank, nProcs, ROOT = 0;
    int nQuotient, nRemainder, nMyStart, nMyEnd, nMySum = 0, nTotal;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nProcs);
    MPI_Comm_rank(MPI_COMM_WORLD, &nRank);

    nQuotient  = NUMBER / nProcs;
    nRemainder = NUMBER % nProcs;
    nMyStart = nRank * nQuotient + 1;
    nMyEnd   = nMyStart + nQuotient - 1;
Lab #4 Parallel Sum – C (2/2)

    if (nRank == nProcs-1) nMyEnd += nRemainder;
    printf("nRank (%d) : nMyStart = %3d ~ nMyEnd = %3d\n", nRank, nMyStart, nMyEnd);

    /* Sum each rank's range */
    for (i = nMyStart; i <= nMyEnd; i++) nMySum += i;

    if (nRank == ROOT) {
        nTotal = nMySum;
        for (i = 1; i < nProcs; i++) {
            MPI_Recv(&nMySum, 1, MPI_INT, i, 10, MPI_COMM_WORLD, &status);
            nTotal += nMySum;
        }
        printf("\nTotal sum (1 ~ %d) = %d.\n", NUMBER, nTotal);
    }
    else
        MPI_Send(&nMySum, 1, MPI_INT, ROOT, 10, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
Lab #4 Parallel Sum - Compile & Run
$ mpicc -g -o parallel_sum -Wall parallel_sum.c
$ mpirun -np 6 -hostfile hosts parallel_sum
nRank (3) : nMyStart = 49 ~ nMyEnd = 64
nRank (4) : nMyStart = 65 ~ nMyEnd = 80
nRank (0) : nMyStart = 1 ~ nMyEnd = 16
nRank (1) : nMyStart = 17 ~ nMyEnd = 32
nRank (5) : nMyStart = 81 ~ nMyEnd = 100
nRank (2) : nMyStart = 33 ~ nMyEnd = 48
Total sum (1 ~ 100) = 5050.
Lab #4 Parallel Sum – Fortran (1/2)

! Example Name : parallel_sum.f90
! Compile : $ mpif90 -g -o parallel_sum.x -Wall parallel_sum.f90
! Run     : $ mpirun -np 6 -hostfile hosts parallel_sum.x
PROGRAM parallel_sum
  IMPLICIT NONE
  INCLUDE 'mpif.h'
  INTEGER i, nRank, nProcs, iErr, status(MPI_STATUS_SIZE)
  INTEGER :: ROOT = 0, nMySum = 0, nTotal
  INTEGER nQuotient, nRemainder, nMyStart, nMyEnd
  INTEGER NUMBER
  PARAMETER (NUMBER = 100)

  CALL MPI_INIT(iErr)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nProcs, iErr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, nRank, iErr)

  nQuotient  = NUMBER / nProcs
  nRemainder = MOD(NUMBER, nProcs)
  nMyStart = nRank * nQuotient + 1
  nMyEnd   = nMyStart + nQuotient - 1
  IF (nRank == nProcs-1) nMyEnd = nMyEnd + nRemainder
Lab #4 Parallel Sum – Fortran (2/2)

  PRINT *, 'nRank=', nRank, 'nMyStart=', nMyStart, 'nMyEnd=', nMyEnd

  ! Sum each rank's range
  DO i = nMyStart, nMyEnd
    nMySum = nMySum + i
  END DO

  IF (nRank == ROOT) THEN
    nTotal = nMySum
    DO i = 1, nProcs-1
      CALL MPI_RECV(nMySum, 1, MPI_INTEGER, i, 55, MPI_COMM_WORLD, status, iErr)
      nTotal = nTotal + nMySum
    END DO
    PRINT *
    PRINT *, 'Total sum (1 ~ ', NUMBER, ') = ', nTotal
  ELSE
    CALL MPI_SEND(nMySum, 1, MPI_INTEGER, ROOT, 55, MPI_COMM_WORLD, iErr)
  END IF

  CALL MPI_FINALIZE(iErr)
END
Lab #4 Parallel Sum – Compile & Run
$ mpif90 -g -o parallel_sum.x -Wall parallel_sum.f90
$ mpirun -np 6 -hostfile hosts parallel_sum.x
nRank= 3 nMyStart= 49 nMyEnd= 64
nRank= 4 nMyStart= 65 nMyEnd= 80
nRank= 1 nMyStart= 17 nMyEnd= 32
nRank= 2 nMyStart= 33 nMyEnd= 48
nRank= 5 nMyStart= 81 nMyEnd= 100
nRank= 0 nMyStart= 1 nMyEnd= 16
Total sum (1 ~ 100 ) = 5050
P2P : Non-Blocking Communication
Non-Blocking Communication
Communication is split into three stages:
1. Initiate the non-blocking communication: post a send or a receive
2. Do other work that does not touch the transfer buffer
   • Overlap communication with computation
3. Complete the communication: wait or test
This removes the possibility of deadlock and reduces communication overhead.
Initiating Non-Blocking Communication
INTEGER request : used to identify the initiated communication (handle) (OUT)
A non-blocking receive has no status argument.
Completing Non-Blocking Communication
Waiting or testing:
– Waiting: the routine blocks the process until the communication completes
  • non-blocking communication + wait = blocking communication
– Testing: the routine returns true or false depending on whether the communication has completed
MPI_WAIT

Fortran: MPI_WAIT(request, status, ierr)
C:       int MPI_Wait(MPI_Request *request, MPI_Status *status)

INTEGER request : identifies the posted communication (handle) (IN)
INTEGER status(MPI_STATUS_SIZE) : information about the received message, or the error code of the send routine (OUT)
MPI_TEST

Fortran: MPI_TEST(request, flag, status, ierr)
C:       int MPI_Test(MPI_Request *request, int *flag, MPI_Status *status)

INTEGER request : identifies the posted communication (handle) (IN)
LOGICAL flag    : returns true if the communication has completed, false otherwise (OUT)
INTEGER status(MPI_STATUS_SIZE) : information about the received message, or the error code of the send routine (OUT)
Non-Blocking Standard Send/Receive Example: C

/* Example name : isend.c */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int nrank, nprocs, i;
    float data[100], value[200];
    MPI_Request req;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &nrank);
    if (nrank == 1) {
        for (i = 0; i < 100; i++) data[i] = i;
        MPI_Isend(data, 100, MPI_FLOAT, 0, 55, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, &status);
    } else if (nrank == 0) {
        for (i = 0; i < 200; i++) value[i] = 0.0;
        MPI_Irecv(value, 200, MPI_FLOAT, 1, 55, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, &status);
        printf("rank(%d) value[5] = %f.\n", nrank, value[5]);
    }
    MPI_Finalize();
    return 0;
}

$ mpicc -o isend -Wall isend.c
$ mpirun -np 2 -hostfile hosts isend
rank(0) value[5] = 5.000000.
Non-Blocking Example

Fortran (non_p2p.f90):

PROGRAM non_p2p
  INCLUDE 'mpif.h'
  INTEGER ierr, nrank, req
  INTEGER status(MPI_STATUS_SIZE)
  INTEGER :: send = -1, recv = -1, ROOT = 0

  CALL MPI_INIT(ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, nrank, ierr)
  IF (nrank == ROOT) THEN
    PRINT *, 'Before : nrank = ', nrank, 'send = ', send, 'recv = ', recv
    send = 7
    CALL MPI_ISEND(send, 1, MPI_INTEGER, 1, 55, MPI_COMM_WORLD, req, ierr)
    PRINT *, 'Other job calculating'
    CALL MPI_WAIT(req, status, ierr)
  ELSE
    CALL MPI_RECV(recv, 1, MPI_INTEGER, ROOT, 55, MPI_COMM_WORLD, status, ierr)
    PRINT *, 'After : nrank = ', nrank, 'send = ', send, 'recv = ', recv
  ENDIF
  CALL MPI_FINALIZE(ierr)
END

C (non_p2p.c):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int nrank, nprocs, tag = 55, ROOT = 0;
    int send = -1, recv = -1;
    MPI_Request req;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &nrank);
    if (nrank == ROOT) {
        printf("Before : nrank(%d) send = %d, recv = %d\n", nrank, send, recv);
        send = 7;
        MPI_Isend(&send, 1, MPI_INT, 1, tag, MPI_COMM_WORLD, &req);
        printf("Other job calculating.\n\n");
        MPI_Wait(&req, &status);
    } else {
        MPI_Recv(&recv, 1, MPI_INT, ROOT, tag, MPI_COMM_WORLD, &status);
        printf("After : nrank(%d) send = %d, recv = %d\n", nrank, send, recv);
    }
    MPI_Finalize();
    return 0;
}

$ mpif90 -o non_p2p.x non_p2p.f90
$ mpirun -np 2 -hostfile hosts ./non_p2p.x
$ mpicc -o non_p2p non_p2p.c
$ mpirun -np 2 -hostfile hosts ./non_p2p
II. MPI Programming Basic
Advanced Exercise
Advanced Lab #1: PI (Monte Carlo Simulation)
<Problem>
– Monte Carlo simulation using random numbers
– PI = 4 x Ac/As, where Ac is the number of points falling inside the quarter circle of radius r and As is the total number of points in the square
<Requirement>
– Use N processes (ranks)
– Use point-to-point communication
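The estimator itself can be sanity-checked serially before parallelizing. The sketch below swaps rand() for a small hand-rolled linear congruential generator (an assumption made here purely for reproducibility; the lab's code uses rand() with a per-rank seed):

```c
/* Serial Monte Carlo estimate of pi = 4 * Ac/As: the fraction of random
   points in the unit square that land inside the quarter circle, times 4. */

static unsigned int lcg_state = 12345u;

/* Minimal LCG (Numerical Recipes constants) returning a double in [0, 1). */
static double lcg_uniform(void)
{
    lcg_state = 1664525u * lcg_state + 1013904223u;
    return lcg_state / 4294967296.0;   /* divide by 2^32 */
}

double estimate_pi(int trials)
{
    int i, count = 0;
    for (i = 0; i < trials; i++) {
        double x = lcg_uniform();
        double y = lcg_uniform();
        if (x * x + y * y <= 1.0)
            count++;                   /* point fell inside the circle */
    }
    return 4.0 * count / trials;
}
```

In the MPI version each rank runs this loop independently and sends its count to ROOT, which forms the final ratio.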
Advanced Lab #1 PI MC Simulation – C (1/3) /*
Example Name : pi_monte.c
Compile : $ mpicc -g -o pi_monte -Wall pi_monte.c
Run : $ mpirun -np 4 -hostfile hosts pi_monte
*/
#include <stdio.h>
#include <stdlib.h>
#include <math.h>
#include <time.h>
#include <mpi.h>
#define SCOPE 100000000
/*
** PI (3.14) Monte Carlo Method
*/
int main(int argc, char *argv[])
{
int nProcs, nRank, proc, ROOT = 0;
MPI_Status status;
int nTag = 55;
int i, nCount = 0, nMyCount = 0;
double x, y, z, pi, z1;
Advanced Lab #1 PI MC Simulation – C (2/3)

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nProcs);
    MPI_Comm_rank(MPI_COMM_WORLD, &nRank);

    srand(time(NULL)*(nRank+1));
    for (i = 0; i < SCOPE; i++) {
        x = (double)rand() / (double)(RAND_MAX);
        y = (double)rand() / (double)(RAND_MAX);
        z = x*x + y*y;
        z1 = sqrt(z);
        if (z1 <= 1) nMyCount++;
    }
    if (nRank == ROOT) {
        nCount = nMyCount;
        for (proc = 1; proc < nProcs; proc++) {
            MPI_Recv(&nMyCount, 1, MPI_INT, proc, nTag, MPI_COMM_WORLD, &status);
            nCount += nMyCount;
        }
        pi = (double)4*nCount/(SCOPE * nProcs);
Advanced Lab #1 PI MC Simulation – C (3/3)

        printf("Processor %d sending results = %d to ROOT processor\n", nRank, nMyCount);
        printf("\n # of trials (cpu#: %d, time#: %d) = %d, estimate of pi is %f\n",
               nProcs, SCOPE, SCOPE*nProcs, pi);
    }
    else {
        printf("Processor %d sending results = %d to ROOT processor\n", nRank, nMyCount);
        MPI_Send(&nMyCount, 1, MPI_INT, ROOT, nTag, MPI_COMM_WORLD);
    }
    printf("\n");
    MPI_Finalize();
    return 0;
}
Advanced Lab #1 PI MC Simulation - Compile & Run
$ mpicc -g -o pi_monte -Wall pi_monte.c
$ mpirun -np 4 -hostfile hosts pi_monte
Processor 2 sending results = 78541693 to ROOT processor
Processor 1 sending results = 78532183 to ROOT processor
Processor 3 sending results = 78540877 to ROOT processor
Processor 0 sending results = 78540877 to ROOT processor
# of trials (cpu#: 4, time#: 100000000) = 400000000, estimate of pi is 3.141516
Advanced Lab #1 PI MC Simulation – Fortran (1/3)

! Example Name : pi_monte.f90
! Compile : $ mpif90 -g -o pi_monte.x -Wall pi_monte.f90
! Run     : $ mpirun -np 4 -hostfile hosts pi_monte.x
PROGRAM pi_monte
  IMPLICIT NONE
  INCLUDE 'mpif.h'
  INTEGER nRank, nProcs, iproc, iErr
  INTEGER :: ROOT = 0, scope = 100000000
  INTEGER :: i, nMyCount = 0
  REAL :: x, y, z, pi, z1, nCount = 0
  INTEGER status(MPI_STATUS_SIZE)

  CALL MPI_INIT(iErr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, nRank, iErr)
  CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nProcs, iErr)

  ! Seed the random number generator per rank
  CALL INIT_RANDOM_SEED(nRank)
  DO i = 0, scope-1
    CALL RANDOM_NUMBER(x)
    CALL RANDOM_NUMBER(y)
Advanced Lab #1 PI MC Simulation – Fortran (2/3)

    z = x*x + y*y
    z1 = SQRT(z)
    IF (z1 <= 1) nMyCount = nMyCount + 1
  END DO

  IF (nRank == ROOT) THEN
    nCount = nMyCount
    DO iproc = 1, nProcs-1
      CALL MPI_RECV(nMyCount, 1, MPI_INTEGER, iproc, 55, MPI_COMM_WORLD, status, iErr)
      nCount = nCount + nMyCount
    END DO
    pi = 4 * nCount / (scope * nProcs)
    WRITE (*, '(A, I2, A, I10, A)') 'Processor=', nRank, ' sending results = ', nMyCount, ' to ROOT process'
    WRITE (*, '(A, I2, A, F15.10)') '# of trial is ', nProcs, ' estimate of PI is ', pi
  ELSE
    CALL MPI_SEND(nMyCount, 1, MPI_INTEGER, ROOT, 55, MPI_COMM_WORLD, iErr)
    WRITE (*, '(A, I2, A, I10)') 'Processor=', nRank, ' sending results = ', nMyCount
Advanced Lab #1 PI MC Simulation – Fortran (3/3)
END IF
CALL MPI_FINALIZE(iErr)
CONTAINS
SUBROUTINE INIT_RANDOM_SEED(rank)
IMPLICIT NONE
INTEGER rank
INTEGER :: i, n, clock
INTEGER, DIMENSION(:), ALLOCATABLE :: seed
CALL RANDOM_SEED(size = n)
ALLOCATE(seed(n))
CALL SYSTEM_CLOCK(COUNT=clock)
seed = clock + 37 * (/ (i - 1, i = 1, n) /)
seed = seed * (rank + 1)   ! multiply by (rank+1) so rank 0 does not zero the seed
CALL RANDOM_SEED(PUT = seed)
DEALLOCATE(seed)
END SUBROUTINE
END
Advanced Lab #1 PI MC Simulation - Compile & Run
$ mpif90 -g -o pi_monte.x -Wall pi_monte.f90
$ mpirun -np 4 -hostfile hosts pi_monte.x
Processor= 3 sending results = 78540734
Processor= 2 sending results = 78537818
Processor= 0 sending results = 78540734 to ROOT process
Processor= 1 sending results = 78542358
# of trial is 4 estimate of PI is 3.1415631771
Advanced Lab #2 PI (Numerical Integration)
<Problem>
– Get PI using numerical integration
<Requirement>
– Point-to-point communication

Midpoint rule: split [0, 1] into n subintervals and evaluate f(x) = 4 / (1 + x^2) at the midpoints x_i = (i - 0.5) / n:

    ∫[0,1] 4 / (1 + x^2) dx = π ≈ (1/n) Σ_{i=1..n} 4 / (1 + ((i - 0.5) / n)^2)
Advanced Lab #2 PI Numerical Integration – C(1/3)
/*
Example Name : pi_integral.c
Compile : $ mpicc -g -o pi_integral -Wall pi_integral.c
Run : $ mpirun -np 4 -hostfile hosts pi_integral
*/
#include <stdio.h>
#include <math.h>
#include <mpi.h>
#define SCOPE 100000000
int main(int argc, char *argv[])
{
int i, n = SCOPE;
double sum, step, pi, x, tsum;
int nRank, nProcs, ROOT = 0;
MPI_Status status;
MPI_Init(&argc, &argv);
MPI_Comm_size(MPI_COMM_WORLD, &nProcs);
MPI_Comm_rank(MPI_COMM_WORLD, &nRank);
if (nRank == ROOT) {
for (i = 1; i < nProcs; i++)
Advanced Lab #2 PI Numerical Integration – C(2/3)
MPI_Send(&n, 1, MPI_INT, i, 55, MPI_COMM_WORLD);
}
else
MPI_Recv(&n, 1, MPI_INT, ROOT, 55, MPI_COMM_WORLD, &status);
step = 1.0 / (double)n;
sum = 0.0;
tsum = 0.0;
for (i = nRank; i < n; i += nProcs) {
x = ((double)i + 0.5)*step;   /* midpoint of subinterval i (i starts at 0 here) */
sum = sum + 4 /(1.0 + x*x);
}
if (nRank == ROOT) {
tsum = sum;
for (i = 1; i < nProcs; i++) {
MPI_Recv(&sum, 1, MPI_DOUBLE, i, 56, MPI_COMM_WORLD, &status);
tsum = tsum + sum;
}
pi = step * tsum;
Advanced Lab #2 PI Numerical Integration – C(3/3)
printf("---------------------------------------------\n");
printf("PI = %.15f (Error = %E)\n", pi, fabs(acos(-1.0) - pi));
printf("---------------------------------------------\n");
}
else
MPI_Send(&sum, 1, MPI_DOUBLE, ROOT, 56, MPI_COMM_WORLD);
MPI_Finalize();
return 0;
}
Advanced Lab #2 PI Compile & Run
$ mpicc -g -o pi_integral -Wall pi_integral.c
$ mpirun -np 4 -hostfile hosts pi_integral
---------------------------------------------
PI = 3.141592673590217 (Error = 2.000042E-08)
---------------------------------------------
Advanced Lab #2 PI Numerical Integration – Fortran(1/2)
! Example Name : pi_integral.f90
! Compile : $ mpif90 -g -o pi_integral.x -Wall pi_integral.f90
! Run : $ mpirun -np 4 -hostfile hosts pi_integral.x
PROGRAM pi_integral
IMPLICIT NONE
INCLUDE "mpif.h"
INTEGER*8 :: i, n = 1000000, ROOT = 0
DOUBLE PRECISION sum, step, mypi, x, tsum
INTEGER nRank, nProcs, iErr
INTEGER status(MPI_STATUS_SIZE)
CALL MPI_INIT(iErr)
CALL MPI_COMM_SIZE(MPI_COMM_WORLD, nProcs, iErr)
CALL MPI_COMM_RANK(MPI_COMM_WORLD, nRank, iErr)
IF (nRank .EQ. ROOT) THEN
DO i=1, nProcs-1
CALL MPI_SEND(n, 1, MPI_INTEGER8, i, 55, MPI_COMM_WORLD, iErr)
END DO
ELSE
CALL MPI_RECV(n, 1, MPI_INTEGER8, ROOT, 55, MPI_COMM_WORLD, status, iErr)
END IF
step = (1.0d0 / dble(n))
sum = 0.0d0
Advanced Lab #2 PI Numerical Integration – Fortran(2/2)
tsum = 0.0d0
DO i=nRank+1, n, nProcs
x = (dble(i) - 0.5d0) * step
sum = sum + 4.d0 / (1.d0 + x*x)
END DO
IF (nRank .EQ. ROOT) THEN
tsum = sum
DO i=1, nProcs-1
CALL MPI_RECV(sum, 1, MPI_DOUBLE_PRECISION, i, 56, MPI_COMM_WORLD, status, iErr)
tsum = tsum + sum
END DO
mypi = step * tsum
WRITE (*, 400)
WRITE (*, 100) mypi, dabs(dacos(-1.d0) - mypi)
WRITE (*, 400)
100 FORMAT ('mypi = ', F17.15, ' (Error = ', E11.5, ')')
400 FORMAT ('----------------------------------------')
ELSE
CALL MPI_SEND(sum, 1, MPI_DOUBLE_PRECISION, ROOT, 56, MPI_COMM_WORLD, iErr)
END IF
CALL MPI_FINALIZE(iErr)
END
Advanced Lab #2 PI Compile & Run
$ mpif90 -g -o pi_integral.x -Wall pi_integral.f90
$ mpirun -np 4 -hostfile hosts pi_integral.x
----------------------------------------
mypi = 3.141592653589903 (Error = 0.11013E-12)
----------------------------------------
Advanced Lab #3 2D Array Msg Communication
<Problem>
– Global 2D array (10 x 10)
– Local 2D array (5 x 5)
– Use 4 processors
<Requirement>
– Initialize each local 2D array with numbers:
• Rank 0: 101 ~ 125
• Rank 1: 201 ~ 225
• Rank 2: 301 ~ 325
• Rank 3: 401 ~ 425
– Collect the four local 2D arrays (5x5) into one global 2D array (10x10)
– Use MPI_Isend & MPI_Recv
– Print the global 2D array (10x10) on the screen

Rank layout:
0 1
2 3
Advanced Lab #3 2D Array Msg Communication – C (1/4)
/*
Example Name : lab5.c
Compile : mpicc -g -o lab5 -Wall lab5.c
Run : $ mpirun -np 4 -hostfile hosts lab5
*/
#include <stdio.h>
#include <math.h>
#include <mpi.h>
#define LOCAL_ROW 5
#define LOCAL_COL 5
#define GLOBAL_ROW 10
#define GLOBAL_COL 10
#define MAX_PROCS 16
int main (int argc, char *argv[])
{
int nRank, nProcs, ROOT = 0;
int i, j, row, col;
int nLocalGrid[LOCAL_ROW][LOCAL_COL];
int nGlobalGrid[GLOBAL_ROW][GLOBAL_COL];
MPI_Status status[MAX_PROCS];
MPI_Request req[MAX_PROCS];
Advanced Lab #3 2D Array Msg Communication – C (2/4)
MPI_Init(&argc, &argv); // MPI start
MPI_Comm_rank(MPI_COMM_WORLD, &nRank); // Get current processor rank id
MPI_Comm_size(MPI_COMM_WORLD, &nProcs); // Get number of processors
// Init Local Array
for (i = 0; i < LOCAL_ROW; i++)
for (j = 0; j < LOCAL_COL; j++)
nLocalGrid[i][j] = (nRank+1)*100 + i*LOCAL_COL + j + 1;
MPI_Barrier(MPI_COMM_WORLD);
printf("nRank : %d\n", nRank);
printf("-------------------\n");
for (i = 0; i < LOCAL_ROW; i++) {
for (j = 0; j < LOCAL_COL; j++) {
printf("%d ", nLocalGrid[i][j]);
}
printf("\n");
}
printf("-------------------\n");
if (nRank != ROOT) {
MPI_Isend(nLocalGrid[0], LOCAL_COL, MPI_INT, ROOT, 11, MPI_COMM_WORLD, &req[0]);
MPI_Isend(nLocalGrid[1], LOCAL_COL, MPI_INT, ROOT, 12, MPI_COMM_WORLD, &req[1]);
MPI_Isend(nLocalGrid[2], LOCAL_COL, MPI_INT, ROOT, 13, MPI_COMM_WORLD, &req[2]);
MPI_Isend(nLocalGrid[3], LOCAL_COL, MPI_INT, ROOT, 14, MPI_COMM_WORLD, &req[3]);
MPI_Isend(nLocalGrid[4], LOCAL_COL, MPI_INT, ROOT, 15, MPI_COMM_WORLD, &req[4]);
Advanced Lab #3 2D Array Msg Communication – C (3/4)
for (i = 0; i < 5; i++)
MPI_Wait(&req[i], &status[i]);
}
else {
for (i = 0; i < LOCAL_ROW; i++)
for (j = 0; j < LOCAL_COL; j++)
nGlobalGrid[i][j] = nLocalGrid[i][j];
for (i = 1; i < nProcs; i++) {
row = i / (int)sqrt(nProcs);
col = i % (int)sqrt(nProcs);
MPI_Recv(&nGlobalGrid[row*LOCAL_ROW][col*LOCAL_COL], LOCAL_COL, MPI_INT,
i, 11, MPI_COMM_WORLD, &status[0]);
MPI_Recv(&nGlobalGrid[row*LOCAL_ROW+1][col*LOCAL_COL], LOCAL_COL, MPI_INT,
i, 12, MPI_COMM_WORLD, &status[1]);
MPI_Recv(&nGlobalGrid[row*LOCAL_ROW+2][col*LOCAL_COL], LOCAL_COL, MPI_INT,
i, 13, MPI_COMM_WORLD, &status[2]);
MPI_Recv(&nGlobalGrid[row*LOCAL_ROW+3][col*LOCAL_COL], LOCAL_COL, MPI_INT,
i, 14, MPI_COMM_WORLD, &status[3]);
MPI_Recv(&nGlobalGrid[row*LOCAL_ROW+4][col*LOCAL_COL], LOCAL_COL, MPI_INT,
i, 15, MPI_COMM_WORLD, &status[4]);
}
Advanced Lab #3 2D Array Msg Communication – C (4/4)
printf("nRank : %d (Total)\n", nRank);
printf("-------------------------------------------\n");
for (i = 0; i < GLOBAL_ROW; i++) {
for (j = 0; j < GLOBAL_COL; j++) {
printf("%d ", nGlobalGrid[i][j]);
if (j == (GLOBAL_COL/2)-1)
printf(" | ");
}
if (i == (GLOBAL_ROW/2)-1) {
printf("\n");
for (j = 0; j <= GLOBAL_COL; j++) {
printf(" - ");
}
}
printf("\n");
}
printf("-------------------------------------------\n");
}
MPI_Finalize();
return 0;
}
Advanced Lab #3 2D Array Compile & Run - C
$ mpicc -g -o lab5 -Wall lab5.c
$ mpirun -np 4 -hostfile hosts ./lab5
nRank : 0
-------------------
101 102 103 104 105
106 107 108 109 110
111 112 113 114 115
116 117 118 119 120
121 122 123 124 125
-------------------
nRank : 0 (Total)
-------------------------------------------
101 102 103 104 105 | 201 202 203 204 205
106 107 108 109 110 | 206 207 208 209 210
111 112 113 114 115 | 211 212 213 214 215
116 117 118 119 120 | 216 217 218 219 220
121 122 123 124 125 | 221 222 223 224 225
- - - - - - - - - - -
301 302 303 304 305 | 401 402 403 404 405
306 307 308 309 310 | 406 407 408 409 410
311 312 313 314 315 | 411 412 413 414 415
316 317 318 319 320 | 416 417 418 419 420
321 322 323 324 325 | 421 422 423 424 425
-------------------------------------------
Advanced Lab #3 2D Array Msg – Fortran (1/4)
! Example Name : lab5.f90
! Compile : $ mpif90 -g -o lab5.x -Wall lab5.f90
! Run : $ mpirun -np 4 -hostfile hosts lab5.x
PROGRAM lab5
IMPLICIT NONE
INCLUDE 'mpif.h'
INTEGER nRank, nProcs, nProcs_Sqrt, iErr
INTEGER status(16), req(16)
INTEGER :: ROOT = 0, i, j, row, col
INTEGER nLocalROW, nLocalCOL, nGlobalROW, nGlobalCOL
PARAMETER (nLocalROW=5, nLocalCOL=5, nGlobalROW=10, nGlobalCOL=10)
INTEGER nLocalGrid(0:nLocalROW-1, 0:nLocalCOL-1)
INTEGER nGlobalGrid(0:nGlobalROW-1, 0:nGlobalCOL-1)
CALL MPI_Init(iErr)
CALL MPI_Comm_size(MPI_COMM_WORLD, nProcs, iErr)
CALL MPI_Comm_rank(MPI_COMM_WORLD, nRank, iErr)
DO i = 0, nLocalROW-1
DO j = 0, nLocalCOL-1
nLocalGrid(i, j) = ((nRank+1) * 100) + i*nLocalCOL + j + 1
END DO
END DO
Advanced Lab #3 2D Array Msg – Fortran (2/4)
CALL MPI_Barrier(MPI_COMM_WORLD, iErr)
WRITE(*, 100) 'nRank: ', nRank
WRITE(*, *) '------------------'
DO i = 0, nLocalROW-1
DO j = 0, nLocalCOL-1
WRITE (*, 200, advance='no') nLocalGrid(i, j)
END DO
WRITE(*, *)
END DO
WRITE(*, *) '------------------'
100 FORMAT(A, I2)
200 FORMAT(I3, 1X)
IF (nRank /= ROOT) THEN
CALL MPI_Isend(nLocalGrid(0, 0), nLocalROW, MPI_INTEGER, ROOT, 11, MPI_COMM_WORLD, req(1), ierr)
CALL MPI_Isend(nLocalGrid(0, 1), nLocalROW, MPI_INTEGER, ROOT, 12, MPI_COMM_WORLD, req(2), ierr)
CALL MPI_Isend(nLocalGrid(0, 2), nLocalROW, MPI_INTEGER, ROOT, 13, MPI_COMM_WORLD, req(3), ierr)
CALL MPI_Isend(nLocalGrid(0, 3), nLocalROW, MPI_INTEGER, ROOT, 14, MPI_COMM_WORLD, req(4), ierr)
CALL MPI_Isend(nLocalGrid(0, 4), nLocalROW, MPI_INTEGER, ROOT, 15, MPI_COMM_WORLD, req(5), ierr)
Advanced Lab #3 2D Array Msg – Fortran (3/4)
DO i = 1, 5
CALL MPI_WAIT(req(i), status(i), ierr)
END DO
ELSE
DO i = 0, nLocalCOL-1
DO j = 0, nLocalROW-1
nGlobalGrid(j, i) = nLocalGrid(j, i)
END DO
END DO
DO i = 1, nProcs-1
nProcs_Sqrt = SQRT(REAL(nProcs))
row = i / nProcs_Sqrt
col = MOD(i, nProcs_Sqrt)
CALL MPI_Recv(nGlobalGrid(row*nLocalROW, col*nLocalCOL), nLocalROW, MPI_INTEGER, i, 11, MPI_COMM_WORLD, status(1), ierr)
CALL MPI_Recv(nGlobalGrid(row*nLocalROW, col*nLocalCOL+1), nLocalROW, MPI_INTEGER, i, 12, MPI_COMM_WORLD, status(2), ierr)
CALL MPI_Recv(nGlobalGrid(row*nLocalROW, col*nLocalCOL+2), nLocalROW, MPI_INTEGER, i, 13, MPI_COMM_WORLD, status(3), ierr)
CALL MPI_Recv(nGlobalGrid(row*nLocalROW, col*nLocalCOL+3), nLocalROW, MPI_INTEGER, i, 14, MPI_COMM_WORLD, status(4), ierr)
CALL MPI_Recv(nGlobalGrid(row*nLocalROW, col*nLocalCOL+4), nLocalROW, MPI_INTEGER, i, 15, MPI_COMM_WORLD, status(5), ierr)
END DO
Advanced Lab #3 2D Array Msg – Fortran (4/4)
WRITE(*, 100) 'nRank(Total): ', nRank
WRITE(*, *) '--------------------------------------'
DO i = 0, nGlobalROW-1
DO j = 0, nGlobalCOL-1
WRITE (*, 200, advance='no') nGlobalGrid(i, j)
END DO
WRITE(*, *)
END DO
WRITE(*, *) '--------------------------------------'
END IF
CALL MPI_FINALIZE(iErr)
END
Advanced Lab #3 2D Array Msg – Compile & Run
$ mpif90 -g -o lab5.x lab5.f90
$ mpirun -np 4 -hostfile hosts ./lab5.x
nRank(Total): 0
--------------------------------------
101 102 103 104 105 201 202 203 204 205
106 107 108 109 110 206 207 208 209 210
111 112 113 114 115 211 212 213 214 215
116 117 118 119 120 216 217 218 219 220
121 122 123 124 125 221 222 223 224 225
301 302 303 304 305 401 402 403 404 405
306 307 308 309 310 406 407 408 409 410
311 312 313 314 315 411 412 413 414 415
316 317 318 319 320 416 417 418 419 420
321 322 323 324 325 421 422 423 424 425
--------------------------------------
Summary
– MPI is best for distributed-memory parallel systems
– There are several MPI implementations
– The MPI library is large, but the core is small
– MPI communication:
• Point-to-point, blocking
• Point-to-point, non-blocking
• Collective
Core MPI: 11 Out of 125 Total Functions
Program startup and shutdown
– MPI_Init, MPI_Finalize
– MPI_Comm_size, MPI_Comm_rank
Point-to-point communication
– MPI_Send, MPI_Recv
Non-blocking communication
– MPI_Isend, MPI_Irecv, MPI_Wait
Collective communication
– MPI_Bcast, MPI_Reduce
Course Outline (KISTI)
Part 1 – Introduction
– Basics of Parallel Computing
– Six-function MPI
– Basic Communication
– Point-to-Point Communications
– Non-blocking Communication
Part 2 – High-Performance MPI
– Review of MPI Basics
– Collective Communication
– User-defined datatypes
– Topologies
– The Rest of MPI
Q&A