Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W...

31
U U NIVERSITY OF NIVERSITY OF M M ASSACHUSETTS ASSACHUSETTS A A MHERST MHERST Department of Computer Science Department of Computer Science Parallel & Concurrent Programming: OpenMP Emery Berger CMPSCI 691W Spring 2006

Transcript of Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W...

Page 1: Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W Spring 2006. UNIVERSITY OF MASSACHUSETTS AMHERST ...

UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS AAMHERST MHERST •• Department of Computer ScienceDepartment of Computer Science

Parallel & Concurrent Programming:

OpenMPEmery BergerCMPSCI 691WSpring 2006

Page 2: Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W Spring 2006. UNIVERSITY OF MASSACHUSETTS AMHERST ...

UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS AAMHERST MHERST •• Department of Computer ScienceDepartment of Computer Science 2

Outline

Last time(s):MPI – point-to-point & collective

Library calls

Today:OpenMP - parallel directives

Language extensions to Fortran/C/C++

Page 3: Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W Spring 2006. UNIVERSITY OF MASSACHUSETTS AMHERST ...

UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS AAMHERST MHERST •• Department of Computer ScienceDepartment of Computer Science 3

Motivation

Take vectors a & b (100 ints)Distribute across all processorsEach processor:

Compute sum of all a[i] * b[i]Print overall sum

MPI: Use MPI_Scatter, MPI_Gather orMPI_Reduce

MPI_Scatter/Gather(sendbuf, cnt, type, recvbuf, recvcnt, type, root, comm)MPI_Reduce(sendbuf, recvbuf, cnt, type, op, root, comm)

Page 4: Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W Spring 2006. UNIVERSITY OF MASSACHUSETTS AMHERST ...

UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS AAMHERST MHERST •• Department of Computer ScienceDepartment of Computer Science 4

MPI SolutionMPI_Init (&argc, &argv);MPI_Comm_rank (MPI_COMM_WORLD, &rank);MPI_Comm_size (MPI_COMM_WORLD, &size);

// Distribute a and bMPI_Scatter (a, 100, MPI_INT, a1, 100 / size, MPI_INT, 0, MPI_COMM_WORLD);MPI_Scatter (b, 100, MPI_INT, b1, 100 / size, MPI_INT, 0, MPI_COMM_WORLD);

// Multiply each chunkfor (int i = 0; i < 100/size; i++) {

z += a[i] *b1[i];}

// Reduce by summingif (rank == 0) {z1 = new int[size]; }MPI_Reduce (&z, &z, 1, MPI_INT, MPI_OP_PLUS, 0, MPI_COMM_WORLD);

// Output resultif (rank == 0) {

cout << z << endl; }

Page 5: Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W Spring 2006. UNIVERSITY OF MASSACHUSETTS AMHERST ...

UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS AAMHERST MHERST •• Department of Computer ScienceDepartment of Computer Science 5

Ideal Solution

int z = 0;parallel for (i = 0; i < nProcessors; i++) {z += a[i] * b[i];

}cout << z << endl;

Page 6: Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W Spring 2006. UNIVERSITY OF MASSACHUSETTS AMHERST ...

UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS AAMHERST MHERST •• Department of Computer ScienceDepartment of Computer Science 6

OpenMP Solution

int z = 0;#pragma omp forfor (int i = 0; i < 100; i++) {z += a[i] * b1[i];

}cout << z << endl;

OpenMP pragma directivesOmit = sequential programMore declarative styleAdd more pragmas for more efficiency

Page 7: Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W Spring 2006. UNIVERSITY OF MASSACHUSETTS AMHERST ...

UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS AAMHERST MHERST •• Department of Computer ScienceDepartment of Computer Science 7

OpenMP Concepts

Fork-join modelOne thread executes sequential codeUpon reaching parallel directive:

Start new team of work-sharing threadsWait until all done (usually barrier)Can be nested!

Apparent global shared memory but relaxed consistency model

Page 8: Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W Spring 2006. UNIVERSITY OF MASSACHUSETTS AMHERST ...

UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS AAMHERST MHERST •• Department of Computer ScienceDepartment of Computer Science 8

Consistency

Consistency =ordering of reads & writes

In same thread, across threads

Most “intuitive” consistency model = sequential consistency (Lamport)

Behaves like some sequential executionBUT: seriously limits parallelism

Must synchronize frequently

Page 9: Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W Spring 2006. UNIVERSITY OF MASSACHUSETTS AMHERST ...

UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS AAMHERST MHERST •• Department of Computer ScienceDepartment of Computer Science 9

OpenMP Consistency

OpenMP: consistency across flushesWrites set of variables to memoryIf two flushes have intersecting sets, flushes must be seen in some sequential order by all threads

Page 10: Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W Spring 2006. UNIVERSITY OF MASSACHUSETTS AMHERST ...

UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS AAMHERST MHERST •• Department of Computer ScienceDepartment of Computer Science 10

Parallel Execution

#pragma omp parallelExecutes next chunk of code across all or some number of threads

num_threads(n)

Only “master thread” continues after parallel section completes

Page 11: Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W Spring 2006. UNIVERSITY OF MASSACHUSETTS AMHERST ...

UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS AAMHERST MHERST •• Department of Computer ScienceDepartment of Computer Science 11

Dynamic Threads

Page 12: Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W Spring 2006. UNIVERSITY OF MASSACHUSETTS AMHERST ...

UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS AAMHERST MHERST •• Department of Computer ScienceDepartment of Computer Science 12

Parallel + nowait

Implicit barrier unless nowaitBarrier = flush operation

Page 13: Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W Spring 2006. UNIVERSITY OF MASSACHUSETTS AMHERST ...

UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS AAMHERST MHERST •• Department of Computer ScienceDepartment of Computer Science 13

Parallel + Memory

Memory model:Heap objects sharedStack objects private

Includes loop iterators

unless indicated otherwise...

Page 14: Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W Spring 2006. UNIVERSITY OF MASSACHUSETTS AMHERST ...

UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS AAMHERST MHERST •• Department of Computer ScienceDepartment of Computer Science 14

Parallel Example

Page 15: Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W Spring 2006. UNIVERSITY OF MASSACHUSETTS AMHERST ...

UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS AAMHERST MHERST •• Department of Computer ScienceDepartment of Computer Science 15

Data-Sharing Attributes

sharedprivate

Each thread gets own private copyUndefined value

firstprivateCopies in original value

lastprivateCopies out private value

Page 16: Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W Spring 2006. UNIVERSITY OF MASSACHUSETTS AMHERST ...

UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS AAMHERST MHERST •• Department of Computer ScienceDepartment of Computer Science 16

Lastprivate Example

Page 17: Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W Spring 2006. UNIVERSITY OF MASSACHUSETTS AMHERST ...

UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS AAMHERST MHERST •• Department of Computer ScienceDepartment of Computer Science 17

Threadprivate Example

Can also declare variables as alwaysthread-private

Page 18: Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W Spring 2006. UNIVERSITY OF MASSACHUSETTS AMHERST ...

UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS AAMHERST MHERST •• Department of Computer ScienceDepartment of Computer Science 18

Reduce

reductionprivate value per threadinitialized “appropriately”

uses predefined operators

copies out to originalreduction(+:a)

initializes a = 0reduction(*:1)

initializes a = 1

Page 19: Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W Spring 2006. UNIVERSITY OF MASSACHUSETTS AMHERST ...

UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS AAMHERST MHERST •• Department of Computer ScienceDepartment of Computer Science 19

OpenMP Solution

int z = 0;#pragma omp for reduction(+:z)for (int i = 0; i < 100; i++) {z += a[i] * b1[i];

}cout << z << endl;

OpenMP pragma directivesOmit = sequential programMore declarative styleAdd more pragmas for more efficiency

Page 20: Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W Spring 2006. UNIVERSITY OF MASSACHUSETTS AMHERST ...

UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS AAMHERST MHERST •• Department of Computer ScienceDepartment of Computer Science 20

All Together

Page 21: Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W Spring 2006. UNIVERSITY OF MASSACHUSETTS AMHERST ...

UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS AAMHERST MHERST •• Department of Computer ScienceDepartment of Computer Science 21

But Still Races...

Page 22: Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W Spring 2006. UNIVERSITY OF MASSACHUSETTS AMHERST ...

UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS AAMHERST MHERST •• Department of Computer ScienceDepartment of Computer Science 22

Master & Synchronization

masterAlways run by master thread

criticalDeclares critical section (one thread at a time)Can add names for greater concurrency

barrieratomic

Updated atomically (a++, a--, etc.)ordered

Executes loop body sequentially

Page 23: Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W Spring 2006. UNIVERSITY OF MASSACHUSETTS AMHERST ...

UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS AAMHERST MHERST •• Department of Computer ScienceDepartment of Computer Science 23

Atomic Example

Page 24: Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W Spring 2006. UNIVERSITY OF MASSACHUSETTS AMHERST ...

UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS AAMHERST MHERST •• Department of Computer ScienceDepartment of Computer Science 24

The End

Page 25: Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W Spring 2006. UNIVERSITY OF MASSACHUSETTS AMHERST ...

UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS AAMHERST MHERST •• Department of Computer ScienceDepartment of Computer Science 25

Single Example

Page 26: Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W Spring 2006. UNIVERSITY OF MASSACHUSETTS AMHERST ...

UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS AAMHERST MHERST •• Department of Computer ScienceDepartment of Computer Science 26

Page 27: Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W Spring 2006. UNIVERSITY OF MASSACHUSETTS AMHERST ...

UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS AAMHERST MHERST •• Department of Computer ScienceDepartment of Computer Science 27

Ordered For

Page 28: Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W Spring 2006. UNIVERSITY OF MASSACHUSETTS AMHERST ...

UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS AAMHERST MHERST •• Department of Computer ScienceDepartment of Computer Science 28

Copyin Example

Page 29: Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W Spring 2006. UNIVERSITY OF MASSACHUSETTS AMHERST ...

UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS AAMHERST MHERST •• Department of Computer ScienceDepartment of Computer Science 29

Copyprivate Example

Page 30: Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W Spring 2006. UNIVERSITY OF MASSACHUSETTS AMHERST ...

UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS AAMHERST MHERST •• Department of Computer ScienceDepartment of Computer Science 30

Page 31: Parallel & Concurrent Programming: OpenMPemery/classes/cmpsci... · OpenMP Emery Berger CMPSCI 691W Spring 2006. UNIVERSITY OF MASSACHUSETTS AMHERST ...

UUNIVERSITY OF NIVERSITY OF MMASSACHUSETTSASSACHUSETTS AAMHERST MHERST •• Department of Computer ScienceDepartment of Computer Science 31

The End

Next time:OpenMP