12.1 Parallel Programming. 12.2 Types of Parallel Computers Two principal types: 1.Single computer...
-
Upload
jewel-nelson -
Category
Documents
-
view
217 -
download
0
Transcript of 12.1 Parallel Programming. 12.2 Types of Parallel Computers Two principal types: 1.Single computer...
12.1
Parallel Programming
12.2
Types of Parallel Computers
Two principal types:
1. Single computer containing multiple processors - main memory is shared, hence called “Shared memory multiprocessor”
2. Interconnected multiple computer systems
12.3
Conventional ComputerConsists of a processor executing a program stored in a (main) memory:
Each main memory location located by its address. Addresses start at 0 and extend to 2b - 1 when there are b bits (binary digits) in address.
Main memory
Processor
Instructions (to processor)Data (to or from processor)
12.4
Shared Memory Multiprocessor• Extend single processor model - multiple
processors connected to a single shared memory with a single address space:
Memory
Processors
A real system will have cache memory associated with each processor
12.5
Examples
• Dual Pentiums
• Quad Pentiums
12.6
Quad Pentium Shared Memory MultiprocessorProcessor
L2 Cache
Bus interface
L1 cache
Processor
L2 Cache
Bus interface
L1 cache
Processor
L2 Cache
Bus interface
L1 cache
Processor
L2 Cache
Bus interface
L1 cache
Memory controller
Memory
I/O interface
I/O bus
Processor/memorybus
Shared memory
12.7
Programming Shared Memory Multiprocessors
• Threads - programmer decomposes program into parallel sequences (threads), each being able to access variables declared outside threads.
Example: Pthreads
• Use sequential programming language with preprocessor compiler directives, constructs, or syntax to declare shared variables and specify parallelism. Examples: OpenMP (an industry standard), UPC
(Unified Parallel C) -- needs compilers.
12.8
• Parallel programming language with syntax to express parallelism. Compiler creates executable code -- not now common.
• Use parallelizing compiler to convert regular sequential language programs into parallel executable code - also not now common.
12.9
Message-Passing Multicomputer
Complete computers connected through an interconnection network:
Processor
Interconnectionnetwork
Local
Computers
Messages
memory
12.10
Dedicated cluster with a master node
UserExternal network
Master node
Compute nodes
Switch
2nd Ethernet interface
Ethernet interface
Cluster
12.11
Programming Clusters
• Usually based upon explicit message-passing.
• Common approach -- a set of user-level libraries for message passing. Example:– Parallel Virtual Machine (PVM) - late 1980’s.
Became very popular in mid 1990’s. – Message-Passing Interface (MPI) - standard
defined in 1990’s and now dominant.
12.12
MPI(Message Passing Interface)
• Message passing library standard developed by group of academics and industrial partners to foster more widespread use and portability.
• Defines routines, not implementation.
• Several free implementations exist.
12.13
MPI designed:
• To address some problems with earlier message-passing system such as PVM.
• To provide powerful message-passing mechanism and routines - over 126 routines(although it is said that one can write reasonable MPI
programs with just 6 MPI routines).
12.14
Message-Passing Programming using User-level Message Passing Libraries
Two primary mechanisms needed:
1. A method of creating separate processes for execution on different computers
2. A method of sending and receiving messages
12.15
Multiple program, multiple data (MPMD) model
Sourcefile
Executable
Processor 0 Processor p - 1
Sourcefile
12.16
Single Program Multiple Data (SPMD) model.
Basic MPI way
Sourcefile
Executables
Processor 0 Processor p - 1
Different processes merged into one program. Control statements select different parts for each processor to execute.
12.17
Multiple Program Multiple Data (MPMD) Model
Process 1
Process 2spawn();
Time
Start executionof process 2
Separate programs for each processor. One processor may execute as master process. Other processes started from within master process - dynamic process creation.
Can be done with MPI version 2
12.18
Communicators• Defines scope of a communication operation.
• Processes have ranks associated with communicator.
• Initially, all processes enrolled in a “universe” called MPI_COMM_WORLD, and each process is given a unique rank, a number from 0 to p - 1, with p processes.
• Other communicators can be established for groups of processes.
12.19
Using SPMD Computational Model
main (int argc, char *argv[]) {MPI_Init(&argc, &argv);
.
.MPI_Comm_rank(MPI_COMM_WORLD,&myrank); /*find rank */
if (myrank == 0)master();
elseslave();..
MPI_Finalize();}
where master() and slave() are to be executed by master process and slave process, respectively.
12.20
Basic “point-to-point”Send and Receive Routines
Process 1 Process 2
send(&x, 2);
recv(&y, 1);
x y
Movementof data
Generic syntax (actual formats later)
Passing a message between processes using send() and recv() library calls:
12.21
Message Tag
• Used to differentiate between different types of messages being sent.
• Message tag is carried within message.
• If special type matching is not required, a wild card message tag is used, so that the recv() will match with any send().
12.22
Message Tag Example
Process 1 Process 2
send(&x,2, 5);
recv(&y,1, 5);
x y
Movementof data
Waits for a message from process 1 with a tag of 5
To send a message, x, with message tag 5 from a source process, 1, to a destination process, 2, and assign to y:
12.23
Synchronous Message Passing
Routines return when message transfer completed.
Synchronous send routine• Waits until complete message can be accepted by
the receiving process before sending the message.
Synchronous receive routine• Waits until the message it is expecting arrives.
12.24
Synchronous send() and recv() using 3-way protocol
Process 1 Process 2
send();
recv();Suspend
Time
processAcknowledgment
MessageBoth processescontinue
(a) When send() occurs before recv()
Request to send
12.25
Synchronous send() and recv() using 3-way protocol
Process 1 Process 2
recv();
send();Suspend
Time
process
Acknowledgment
MessageBoth processescontinue
(b) When recv() occurs before send()
Request to send
12.26
• Synchronous routines intrinsically perform two actions:– They transfer data and – They synchronize processes.
12.27
Asynchronous Message Passing
• Routines that do not wait for actions to complete before returning. Usually require local storage for messages.
• In general, they do not synchronize processes but allow processes to move forward sooner. Must be used with care.
12.28
MPI Blocking and Non-Blocking• Blocking - return after their local actions
complete, though the message transfer may not have been completed.– For example, returns before message written to OS
buffer.
• Non-blocking - return immediately. Assumes that data storage used for transfer not modified by subsequent statements prior to being used for transfer, and it is left to the programmer to ensure this.
12.29
How message-passing routines return before message transfer completed
Process 1 Process 2
send();
recv();
Message buffer
Readmessage buffer
Continueprocess
Time
Message buffer needed between source and destination to hold message:
12.30
Asynchronous routines changing to synchronous routines
• Buffers only of finite length and a point could be reached when send routine held up because all available buffer space exhausted.
• Then, send routine will wait until storage becomes re-available - i.e then routine behaves as a synchronous routine.
12.31
Parameters of MPI blocking send
MPI_Send(buf, count, datatype, dest, tag, comm)
Address of send buffer
Number of items to send
Datatype of each item
Rank of destination process
Message tag
Communicator
12.32
Parameters of MPI blocking receive
MPI_Recv(buf,count,datatype,dest,tag,comm,status)
Address of receive buffer Max. number of
items to receive
Datatype of each item
Rank of source process
Message tag
Communicator
Status after operation
12.33
Example
To send an integer x from process 0 to process 1,
MPI_Comm_rank(MPI_COMM_WORLD,&myrank); /* find rank */
if (myrank == 0) {int x;MPI_Send(&x,1,MPI_INT,1,msgtag,MPI_COMM_WORLD);
} else if (myrank == 1) {int x;MPI_Recv(&x,1,MPI_INT,0,msgtag,MPI_COMM_WORLD,status);
}
12.34
MPI Nonblocking Routines
• Nonblocking send - MPI_Isend() - will return “immediately” even before source location is safe to be altered.
• Nonblocking receive - MPI_Irecv() - will return even if no message to accept.
12.35
Detecting when message receive if sent with non-blocking send routine
Completion detected by MPI_Wait() and MPI_Test().
MPI_Wait() waits until operation completed and returns then.
MPI_Test() returns with flag set indicating whether operation completed at that time.
Need to know which particular send you are waiting for.
Identified with request parameter.
12.36
Example
To send an integer x from process 0 to process 1 and allow process 0 to continue,
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);/* find rank */if (myrank == 0) {
int x;MPI_Isend(&x,1,MPI_INT,1,msgtag,MPI_COMM_WORLD, req1);compute();MPI_Wait(req1, status);
} else if (myrank == 1) {int x;MPI_Recv(&x,1,MPI_INT,0,msgtag, MPI_COMM_WORLD, status);
}
12.37
Collective message passing routines
Have routines that send message(s) to a group of processes or receive message(s) from a group of processes
Higher efficiency than separate point-to-point routines although not absolutely necessary.
12.38
BroadcastSending same message to a group of processes.(Sometimes “Multicast” - sending same message to defined group of processes, “Broadcast” - to all processes.)
MPI_bcast();
buf
MPI_bcast();
data
MPI_bcast();
datadata
Process 0 Process p - 1Process 1
Action
Code
MPI form
12.39
MPI Broadcast routine
int MPI_Bcast(void *buf, int count, MPI_Datatype datatype, int root, MPI_Comm comm)
Actions:Broadcasts message from root process to all processes in comm and itself.
Parameters:*buf message buffercount number of entries in bufferdatatype data type of bufferroot rank of root
12.40
Scatter
MPI_scatter();
buf
MPI_scatter();
data
MPI_scatter();
datadata
Process 0 Process p - 1Process 1
Action
Code
MPI form
Sending each element of an array in root process to a separate process. Contents of ith location of array sent to ith process.
12.41
Gather
MPI_gather();
buf
MPI_gather();
data
MPI_gather();
datadata
Process 0 Process p - 1Process 1
Action
Code
MPI form
Having one process collect individual values from set of processes.
12.42
Reduce
MPI_reduce();
buf
MPI_reduce();
data
MPI_reduce();
datadata
Process 0 Process p - 1Process 1
+
Action
Code
MPI form
Gather operation combined with specified arithmetic/logical operation.
Example: Values could be gathered and then added together by root:
12.43
Collective Communication
Involves set of processes, defined by an intra-communicator. Message tags not present. Principal collective operations:
• MPI_Bcast() - Broadcast from root to all other processes• MPI_Gather() - Gather values for group of processes• MPI_Scatter() - Scatters buffer in parts to group of processes• MPI_Alltoall() - Sends data from all processes to all
processes• MPI_Reduce() - Combine values on all processes to single
value• MPI_Reduce_scatter() - Combine values and scatter results• MPI_Scan() - Compute prefix reductions of data on processes
12.44
ExampleTo gather items from group of processes into process 0, using dynamically allocated memory in root process:
int data[10];/*data to be gathered from processes*/
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);/* find rank */if (myrank == 0) {MPI_Comm_size(MPI_COMM_WORLD,&grp_size);/*find size*/
/*allocate memory*/buf = (int *)malloc(grp_size*10*sizeof (int));
}MPI_Gather(data,10,MPI_INT,buf,grp_size*10,MPI_INT,0, MPI_COMM_WORLD);
MPI_Gather() gathers from all processes, including root.
12.45
Sample MPI program
#include “mpi.h”#include <stdio.h>#include <math.h>#define MAXSIZE 1000
void main(int argc, char *argv) {int myid, numprocs;int data[MAXSIZE], i, x, low, high, myresult, result;char fn[255];char *fp;MPI_Init(&argc,&argv);//requiredMPI_Comm_size(MPI_COMM_WORLD,&numprocs);MPI_Comm_rank(MPI_COMM_WORLD,&myid);
12.46
Sample MPI program
if (myid == 0) { /* Open input file and initialize data */
strcpy(fn,getenv(“HOME”));strcat(fn,”/MPI/rand_data.txt”);if ((fp = fopen(fn,”r”)) == NULL) {
printf(“Can’t open the input file: %s\n\n”, fn);
exit(1);}for(i = 0; i < MAXSIZE; i++)
fscanf(fp,”%d”, &data[i]);} //All processes execute the rest of code.
MPI_Bcast(data, MAXSIZE, MPI_INT, 0, MPI_COMM_WORLD); /* broadcast data */
12.47
Sample MPI program
x = n/nproc; /* Add my portion Of data */
low = myid * x;
high = low + x;
for(i = low; i < high; i++)myresult += data[i];
printf(“I calculated %d from %d\n”, myresult, myid); /* Compute global sum */
MPI_Reduce(&myresult, &result, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
if (myid == 0) printf(“The sum is %d.\n”, result);MPI_Finalize();
}
12.48
Message-Passing on a Grid
• VERY expensive, sending data across network costs millions of cycles
• Bandwidth shared with other users
• Links unreliable
12.49
Computational Strategies
• As a computing platform, a grid favors situations with absolute minimum communication between computers.
12.50
StrategiesWith no/minimum communication:
• “Embarrassingly Parallel” Computations– those computations which obviously can be
divided into parallel independent parts. Parts executed on separate computers.
• Separate instance of the same problem executing on each system, each using different data
12.51
Embarrassingly Parallel Computations
A computation that can obviously be divided into a number of completely independent parts, each of which can be executed by a separate process(or).
Processes
Results
Input data
No communication or very little communication between processes.Each process can do its tasks without any interaction with other processes
12.52
Monte Carlo Methods
• An embarrassingly parallel computation.
• Monte Carlo methods use of random selections.