Parallel Programming


Page 1: Parallel Programming

Parallel Programming

Page 2: Parallel Programming

Introduction

• Idea has been around since the 1960s
  – pseudo-parallel systems on multiprogrammable computers
• True parallelism – many processors connected to run in concert
  – Multiprocessor system
  – Distributed system: stand-alone systems connected; more complex with high-speed networks

Page 3: Parallel Programming

Programming Languages

• Used to express algorithms to solve problems presented by parallel processing systems

• Used to write OSs that implement these solutions

• Used to harness the capabilities of multiple processors efficiently

• Used to implement and express communication across networks

Page 4: Parallel Programming

Two kinds of parallelism

• Parallelism existing in the underlying hardware
• Parallelism as expressed in a programming language
  – May not result in actual parallel processing
  – Could be implemented with pseudo-parallelism
  – Concurrent programming expresses only the potential for parallelism

Page 5: Parallel Programming

Some Basics

• Process – an instance of a program or program part that has been scheduled for independent execution
• Heavy-weight process – a full-fledged independent entity with all the memory and other resources ordinarily allocated by the OS
• Light-weight process, or thread – shares resources with the program it came from

Page 6: Parallel Programming

Primary requirements for organization

• There must be a way for processors to synchronize their activities
  – e.g., a 1st processor inputs and sorts data; a 2nd processor waits to perform computations on the sorted data
• There must be a way for processors to communicate data among themselves
  – e.g., the 2nd processor needs the sorted data

Page 7: Parallel Programming

Architectures

• SIMD (single-instruction, multiple-data)
  – One processor is the controller
  – All processors execute the same instructions on their respective registers or data sets
  – Multiprocessing
  – Synchronous (all processors operate at the same speed)
  – Implicit solution to the synchronization problem
• MIMD (multiple-instruction, multiple-data)
  – All processors act independently
  – Multiprocessor or distributed-processor systems
  – Asynchronous (synchronization is a critical problem)

Page 8: Parallel Programming

OS requirements for Parallelism

• Means of creating and destroying processes

• Means of managing the number of processors used by processes

• Mechanism for ensuring mutual exclusion on shared-memory systems

• Mechanism for creating and maintaining communication channels between processors on distributed-memory systems

Page 9: Parallel Programming

Language requirements

• Machine independence
• Adhere to language design principles

• Some languages use shared-memory model and provide facilities for mutual exclusion through a library

• Some assume distributed-memory model and provide communication facilities

• A few include both

Page 10: Parallel Programming

Common mechanisms

• Threads

• Semaphores

• Monitors

• Message passing

Page 11: Parallel Programming

2 common sample problems

• Bounded buffer problem
  – similar to the producer-consumer problem
• Parallel matrix multiplication
  – an N³ algorithm
  – assign a process to compute each element, each process on a separate processor: N steps each
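The bounded buffer problem named above never gets complete code in these slides. As a hedged sketch (the class names, the int payload, and the use of counting semaphores are my own choices, not from the slides), it can be written in Java with java.util.concurrent.Semaphore:

```java
import java.util.concurrent.Semaphore;

// Sketch of the bounded buffer problem with counting semaphores:
// 'slots' counts free slots (producer blocks when the buffer is full),
// 'items' counts filled slots (consumer blocks when it is empty),
// and 'mutex' gives mutual exclusion on the buffer itself.
class BoundedBuffer {
    private final int[] buf;
    private int in = 0, out = 0;
    private final Semaphore slots, items, mutex = new Semaphore(1);

    BoundedBuffer(int capacity) {
        buf = new int[capacity];
        slots = new Semaphore(capacity);
        items = new Semaphore(0);
    }

    void put(int x) throws InterruptedException {
        slots.acquire();          // wait for a free slot
        mutex.acquire();
        buf[in] = x; in = (in + 1) % buf.length;
        mutex.release();
        items.release();          // signal: one more item
    }

    int take() throws InterruptedException {
        items.acquire();          // wait for an item
        mutex.acquire();
        int x = buf[out]; out = (out + 1) % buf.length;
        mutex.release();
        slots.release();          // signal: one more free slot
        return x;
    }
}

public class BoundedBufferDemo {
    // Producer thread inserts 1..10; the main thread consumes and sums them.
    public static int sumDemo() {
        BoundedBuffer b = new BoundedBuffer(4);
        Thread producer = new Thread(() -> {
            try { for (int i = 1; i <= 10; i++) b.put(i); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        producer.start();
        int sum = 0;
        try {
            for (int i = 0; i < 10; i++) sum += b.take();
            producer.join();
        } catch (InterruptedException e) { return -1; }
        return sum;   // 1+2+...+10 = 55
    }

    public static void main(String[] args) {
        System.out.println(sumDemo());
    }
}
```

The semaphores make the producer stall when the buffer is full and the consumer stall when it is empty, which is exactly the synchronization requirement of slide 6.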

Page 12: Parallel Programming

Without explicit language facilities

• One approach is not to be explicit
  – Possible in some functional, logical, and OO languages
  – Certain inherent parallelism is implicit
• Language translators use optimization techniques to make use of OS utilities automatically, assigning different processors to different parts of the program

• Suboptimal

Page 13: Parallel Programming

Another alternative without explicit language facilities

• Translator offers compiler options that let the programmer explicitly indicate areas where parallelism is called for
• Most effective in nested loops
• Example: Fortran

Page 14: Parallel Programming

integer a(100,100), b(100,100), c(100,100)
integer i, j, k, numprocs, err

numprocs = 10

C     code to read in a and b goes here

err = m_set_procs(numprocs)

C$doacross share(a, b, c), local(j, k)
      do 10 i = 1, 100
      do 10 j = 1, 100
        c(i,j) = 0
        do 10 k = 1, 100
          c(i,j) = c(i,j) + a(i,k) * b(k,j)
10    continue

call m_kill_procs

C     code to write out c goes here

end

• C$doacross – compiler directive: iterations of the do loop that follows are divided among the processes
• share – variables accessed by all processes; local – variables local to each process
• m_set_procs – sets the number of processes
• at the end of the loop the processes are synchronized: all processes wait for the entire loop to finish; one process continues after the loop

Page 15: Parallel Programming

3rd way with explicit constructs

• Provide a library of functions
• This passes facilities provided by the OS directly to the programmer
• (This is essentially the same as providing them in the language)
• Example: C with the library parallel.h

Page 16: Parallel Programming

#include <parallel.h>

#define SIZE 100
#define NUMPROCS 10

shared int a[SIZE][SIZE], b[SIZE][SIZE], c[SIZE][SIZE];

void multiply(void)
{ int i, j, k;
  for (i = m_get_myid(); i < SIZE; i += NUMPROCS)
    for (j = 0; j < SIZE; j++)
      for (k = 0; k < SIZE; k++)
        c[i][j] += a[i][k] * b[k][j];
}

int main()
{ /* code to read in a and b goes here */
  m_set_procs(NUMPROCS);
  m_fork(multiply);
  m_kill_procs();
  /* code to write out c goes here */
  return 0;
}

m_set_procs – sets the number of processes; m_fork(multiply) then creates them, all instances of multiply

Page 17: Parallel Programming

4th and final alternative

• Simply rely on the OS
• Example: pipes in Unix
  – ls | grep "java"
  – runs ls and grep in parallel
  – the output of ls is piped to grep
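The same OS-level pipeline can be driven from Java 9+ via ProcessBuilder.startPipeline, which connects processes the way the shell's | does. This sketch assumes a Unix-like system with printf and grep on the PATH; printf stands in for ls so the output is predictable:

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.util.List;

public class PipeDemo {
    // Build the pipeline  printf ... | grep java  and return grep's output.
    // The two processes run in parallel, as with a shell pipe.
    public static String run() {
        try {
            List<Process> procs = ProcessBuilder.startPipeline(List.of(
                new ProcessBuilder("printf", "Main.java\nREADME.md\n"),
                new ProcessBuilder("grep", "java")));
            Process grep = procs.get(procs.size() - 1);  // last stage of the pipe
            try (BufferedReader r = new BufferedReader(
                     new InputStreamReader(grep.getInputStream()))) {
                String line = r.readLine();   // the only matching line
                grep.waitFor();
                return line;
            }
        } catch (Exception e) { return null; }
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

As on the slide, the parallelism here comes entirely from the OS scheduler; the language only sets up the plumbing.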

Page 18: Parallel Programming

Language with explicit mechanism

• 2 basic ways to create new processes
  – SPMD (single program, multiple data)
    • split the current process into 2 or more that execute copies of the same program
  – MPMD (multiple program, multiple data)
    • a segment of code is associated with each new process
    • typical case: the fork-join model, in which a process creates several child processes, each with its own code (a fork), and then waits for the children to complete their execution (a join)
• the last example is similar, but m_kill_procs takes the place of the join
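The fork-join model just described can be sketched in Java threads (the work split, a sum of 1..limit striped across N children, is my own illustration, not from the slides):

```java
// Fork-join sketch: the parent creates several child threads, each with
// its own slice of work (the fork), then waits for all of them (the join).
public class ForkJoinSketch {
    static final int N = 4;   // number of child threads

    public static long sumTo(int limit) {
        long[] partial = new long[N];        // one result slot per child
        Thread[] children = new Thread[N];
        for (int p = 0; p < N; p++) {
            final int id = p;
            children[p] = new Thread(() -> { // fork: child with its own code
                long s = 0;
                for (int i = id + 1; i <= limit; i += N) s += i;
                partial[id] = s;             // each child writes only its own slot
            });
            children[p].start();
        }
        long total = 0;
        try {
            for (int p = 0; p < N; p++) {
                children[p].join();          // join: wait for each child to finish
                total += partial[p];
            }
        } catch (InterruptedException e) { return -1; }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sumTo(100));
    }
}
```

Because each child writes a distinct slot and the parent reads only after join, no further synchronization is needed; Thread.join establishes the necessary ordering.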

Page 19: Parallel Programming

Granularity

• Size of code assignable to separate processes
  – fine-grained: statement-level parallelism
  – medium-grained: procedure-level parallelism
  – large-grained: program-level parallelism
• Can be an issue in program efficiency
  – too fine-grained: process overhead dominates
  – too large-grained: may not exploit all opportunities for parallelism

Page 20: Parallel Programming

Thread

• allows fine-grained or medium-grained parallelism without the overhead of full-blown process creation

Page 21: Parallel Programming

Issues

• Does parent suspend execution while child processes are executing, or does it continue to execute alongside them?

• What memory, if any, does a parent share with its children or the children share among themselves?

Page 22: Parallel Programming

Answers in Last example

• the parent process suspends execution while the children execute

• global variables shared by all processes are indicated explicitly (shared)

Page 23: Parallel Programming

Process Termination

• Simplest case
  – a process executes its code to completion and then ceases to exist
• Complex case
  – a process may need to continue executing until a certain condition is met and then terminate

Page 24: Parallel Programming

Statement-Level Parallelism (parbegin)

parbegin
  S1;
  S2;
  ...
  Sn;
parend;

• parbegin/parend is a classic pseudocode construct (due to Dijkstra), not actual Ada syntax

Page 25: Parallel Programming

Statement-Level Parallelism (Fortran95)

FORALL (I = 1:100, J=1:100)

C(I,J) = 0;

DO 10 K = 1,100

C(I,J) = C(I,J) + A(I,k) * B(K,j)

10 CONTINUE

END FORALL

Page 26: Parallel Programming

Procedure-Level Parallelism (pseudocode)

x = newprocess(p);
...
killprocess(x);

• where p is a declared procedure and x is a process designator

• similar to tasks in Ada

Page 27: Parallel Programming

Program-Level Parallelism (Unix)

• fork creates a process that is an• exact copy of calling process

if (fork ( ) == 0) { /*..child executes this part */}else { /* ..parent executes this part */}

• a returned 0-value indicates process is the child

Page 28: Parallel Programming

Java threads

• built into Java
• Thread class is part of the java.lang package
• reserved word synchronized
  – establishes mutual exclusion
• create an instance of a Thread object
• define its run method, which executes when the thread starts

Page 29: Parallel Programming

Java threads

• 2 ways (I'll show you the second, more versatile way)

• Define a class that implements Runnable interface (define run method)

• Then pass an object of this class to the Thread constructor

• Note: Every Java program is already executing inside a thread whose run method is main.

Page 30: Parallel Programming

Java Thread Example

class MyRunner implements Runnable
{ public void run()
  { ... }
}

MyRunner m = new MyRunner();
Thread t = new Thread(m);
t.start(); // t will now execute the run method

Page 31: Parallel Programming

Destroying threads

• let each thread run to completion
• wait for other threads to finish

t.start();
// do some other work
t.join(); // wait for t to finish

• interrupt it

t.start();
// do some other work
t.interrupt(); // tell t we are waiting
t.join(); // wait for t to finish
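Put together as a runnable sketch (the sleeping worker loop is my own illustration of "do some other work"):

```java
// Sketch: a worker thread loops until interrupted; the parent does
// other work, then interrupts the worker and joins it.
public class StopDemo {
    public static int runWorker() {
        final int[] count = {0};
        Thread t = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                count[0]++;                              // the "work"
                try { Thread.sleep(1); }
                catch (InterruptedException e) {
                    Thread.currentThread().interrupt();  // restore flag so loop exits
                }
            }
        });
        t.start();
        try {
            Thread.sleep(20);   // do some other work
            t.interrupt();      // tell t we are waiting
            t.join();           // wait for t to finish
        } catch (InterruptedException e) { return -1; }
        return count[0];
    }

    public static void main(String[] args) {
        System.out.println("worker ran " + runWorker() + " iterations");
    }
}
```

Note that sleeping threads receive the interrupt as an InterruptedException, which clears the interrupt flag; the catch block re-sets it so the while test sees it and the loop terminates.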

Page 32: Parallel Programming

Mutual exclusion

class Queue
{ ...
  synchronized public Object dequeue()
  { if (empty()) throw ...
    ...
  }
  synchronized public Object enqueue(Object obj)
  { ... }
  ...
}

Page 33: Parallel Programming

Mutual exclusion

class Remover implements Runnable
{ public Remover(Queue q) { ... }
  public void run() { ... q.dequeue() ... }
}

class Inserter implements Runnable
{ public Inserter(Queue q) { ... }
  public void run() { ... q.enqueue(...) ... }
}

Page 34: Parallel Programming

Mutual exclusion

Queue myqueue = new Queue(...);
Remover r = new Remover(myqueue);
Inserter i = new Inserter(myqueue);
Thread t1 = new Thread(r);
Thread t2 = new Thread(i);
t1.start();
t2.start();

Page 35: Parallel Programming

Manually stalling a thread and then reawakening it

class Queue
{ ...
  synchronized public Object dequeue()
  { try
    { while (empty()) wait();
    }
    catch (InterruptedException e) // reset interrupt
    { ... }
    ...
  }
  synchronized public Object enqueue(Object obj)
  { ...
    notifyAll();
  }
  ...
}
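The fragment above, assembled into a complete, compilable sketch (the LinkedList backing store, the String element, and the class names are my own choices):

```java
import java.util.LinkedList;

// Complete sketch of the stalling queue: dequeue waits while the queue
// is empty; enqueue wakes all stalled threads with notifyAll.
class WaitingQueue {
    private final LinkedList<Object> data = new LinkedList<>();

    synchronized public Object dequeue() throws InterruptedException {
        while (data.isEmpty())
            wait();                  // stall until an item is available
        return data.removeFirst();
    }

    synchronized public void enqueue(Object obj) {
        data.addLast(obj);
        notifyAll();                 // reawaken any stalled consumers
    }
}

public class WaitingQueueDemo {
    public static Object demo() {
        WaitingQueue q = new WaitingQueue();
        Thread inserter = new Thread(() -> {
            try { Thread.sleep(10); }
            catch (InterruptedException e) { Thread.currentThread().interrupt(); }
            q.enqueue("hello");      // arrives after the consumer has stalled
        });
        inserter.start();
        try {
            Object item = q.dequeue();   // blocks until the inserter runs
            inserter.join();
            return item;
        } catch (InterruptedException e) { return null; }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

The while (not if) around wait() matters: notifyAll wakes every stalled thread, and each must re-check the condition because another thread may have emptied the queue first.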