OpenMP
Technische Universität München (Gerndt), Parallel Programming

Transcript of lecture slides: OpenMP (76 pages)

Page 1

OpenMP

Page 2

Shared Memory Architecture

(Diagram: several processors connected via a bus to a shared memory.)

Page 3

OpenMP

• Portable programming of shared memory systems.

• It is a quasi-standard.

• OpenMP-Forum

• Started in 1997

• Current standard OpenMP 4.0 from July 2013

• API for Fortran and C/C++

• directives

• runtime routines

• environment variables

• www.openmp.org

Page 4

Example

Program:

#include <omp.h>
#include <stdio.h>

int main() {
  #pragma omp parallel
  {
    printf("Hello world\n");
  }
}

Compilation:

> icc -O3 -openmp openmp.c

Execution:

> export OMP_NUM_THREADS=3
> a.out
Hello world
Hello world
Hello world

> export OMP_NUM_THREADS=2
> a.out
Hello world
Hello world

Page 5

Execution Model

#pragma omp parallel
{
  printf("Hello world %d\n", omp_get_thread_num());
}

(Diagram: the master thread T0 reaches the PARALLEL construct and creates a team T0, T1, T2; each thread executes the print; at the end of the region the team is destroyed and T0 continues alone.)

Page 6

Fork/Join Execution Model

1. An OpenMP program starts as a single thread (master thread).

2. Additional threads (a team) are created when the master hits a parallel region.

3. When all threads have finished the parallel region, the additional threads are given back to the runtime or operating system.

• A team consists of a fixed set of threads executing the parallel region redundantly.

• All threads in the team are synchronized at the end of a parallel region via a barrier.

• The master continues after the parallel region.

Page 7

Work Sharing in a Parallel Region

int main() {
  int n = 100;
  int a[100];
  #pragma omp parallel
  {
    #pragma omp for
    for (int i = 1; i < n; i++)
      a[i] = i;
  }
}

Page 8

Shared and Private Data

• Shared data are accessible by all threads. A reference a[5] to a shared array accesses the same address in all threads.

• Private data are accessible only by a single thread. Each thread has its own copy.

• The default is shared.

Page 9

Private clause for parallel loop

int main() {
  int n = 100;
  int a[100], t;
  #pragma omp parallel
  {
    #pragma omp for private(t)
    for (int i = 1; i < n; i++) {
      t = f(i);
      a[i] = t;
    }
  }
}

Page 10

Example: Private Data

i = 3;
#pragma omp parallel private(i)
{
  i = 17;
}
printf("Value of i=%d\n", i);

(Diagram: before the region the shared copy holds i = 3; inside the region every thread works on its own private copy and sets it to 17; after the region the printed value may be 3 or 17, since the value of the shared copy is undefined.)

Page 11

Example

int main() {
  int iam, nthreads;
  #pragma omp parallel private(iam, nthreads)
  {
    iam = omp_get_thread_num();
    nthreads = omp_get_num_threads();
    printf("ThreadID %d, out of %d threads\n", iam, nthreads);
    if (iam == 0)            // different control flow
      printf("Here is the Master Thread.\n");
    else
      printf("Here is another thread.\n");
  }
}

Page 12

Private Data

• A new copy is created for each thread.

• One thread may reuse the global shared copy.

• The private copies are destroyed after the parallel region.

• The value of the shared copy is undefined.

Page 13

Example: Shared Data

i = 77;
#pragma omp parallel shared(i)
{
  i = omp_get_thread_num();
}
printf("Value of i=%d\n", i);

(Diagram: inside the parallel region all threads write their thread number to the same shared copy of i; after the region i holds whichever thread number was written last, e.g. 0, 1, or 2.)

Page 14

Syntax of Directives and Pragmas

C / C++

#pragma omp directive name [parameters]

int main() {
  #pragma omp parallel default(shared)
  {
    printf("hello world\n");
  }
}

Fortran

!$OMP directive name [parameters]

!$OMP PARALLEL DEFAULT(SHARED)
  write(*,*) 'Hello world'
!$OMP END PARALLEL

Page 15

Directives

Directives can have continuation lines.

• Fortran

!$OMP directive name first_part &
!$OMP continuation_part

• C / C++

#pragma omp parallel private(i) \
                     private(j)

Page 16

#pragma omp parallel [parameters]
{
  parallel region
}

Parallel Region

• The statements enclosed lexically within a region define the lexical extent of the region.

• The dynamic extent further includes the routines called from within the construct.

Page 17

Lexical and Dynamic Extent

int main() {
  int a[100];
  #pragma omp parallel
  {
    sub(a);                 /* the work-sharing construct below lies in the dynamic extent */
  }
}

void sub(int a[])
{
  #pragma omp for
  for (int i = 1; i < 100; i++)
    a[i] = i;
}

• Local variables of a subroutine called in a parallel region are by default private.

Page 18

Work-Sharing Constructs

• Work-sharing constructs distribute the specified work to all threads within the current team.

• Types
  • Parallel loop
  • Parallel section
  • Master region
  • Single region
  • General work-sharing construct (only Fortran)

Page 19

#pragma omp for [parameters]
for ...

Parallel Loop

• The iterations of the loop are distributed to the threads.

• The scheduling of loop iterations is determined by one of the scheduling strategies static, dynamic, guided, and runtime.

• There is no synchronization at the beginning.

• All threads of the team synchronize at an implicit barrier unless the parameter nowait is specified (see the sketch below).

• The loop variable is by default private. It must not be modified in the loop body.

• The expressions in the for-statement are very restricted.
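To make the nowait parameter concrete, here is a minimal sketch (not from the slides): two work-shared loops inside one parallel region, where the first loop carries nowait so threads that finish early start the second, independent loop without waiting at a barrier. Array names and sizes are chosen only for illustration.

#include <omp.h>

#define N 1000

int main() {
  double a[N], b[N];
  #pragma omp parallel
  {
    // No implicit barrier after this loop: threads move on immediately.
    #pragma omp for nowait
    for (int i = 0; i < N; i++)
      a[i] = 2.0 * i;

    // Safe only because this loop does not depend on the first one.
    #pragma omp for
    for (int i = 0; i < N; i++)
      b[i] = 1.0;
  }
  return 0;
}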

Page 20

Scheduling Strategies

• Schedule clause

  schedule (type [,size])

• Scheduling types:

  • static: Chunks of the specified size are assigned in a round-robin fashion to the threads.

  • dynamic: The iterations are broken into chunks of the specified size. When a thread finishes the execution of a chunk, the next chunk is assigned to that thread.

  • guided: Similar to dynamic, but the size of the chunks is exponentially decreasing. The size parameter specifies the smallest chunk. The initial chunk size is implementation dependent.

  • runtime: The scheduling type and the chunk size are determined via environment variables.

Page 21

Example: Dynamic Scheduling

int main() {
  int a[1000];
  #pragma omp parallel
  {
    #pragma omp for schedule(dynamic, 4)
    for (int i = 0; i < 1000; i++)
      a[i] = omp_get_thread_num();

    #pragma omp for schedule(guided)
    for (int i = 0; i < 1000; i++)
      a[i] = omp_get_thread_num();
  }
}

Page 22

Reductions

reduction(operator: list)

• This clause performs a reduction on the variables that appear in list, with the operator operator.

• Variables must be shared scalars.

• operator is one of the following:
  • +, *, -, &, ^, |, &&, ||

• A reduction variable may only appear in statements of the following form:
  • x = x operator expr
  • x binop= expr
  • x++, ++x, x--, --x

Page 23

Example: Reduction

#pragma omp parallel for reduction(+: a)

for (i=0; i<n; i++) {

a = a + b[i];

}
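A self-contained version of the reduction above, with declarations added so it compiles; the array contents are made up for illustration.

#include <omp.h>
#include <stdio.h>

int main() {
  int n = 100;
  double a = 0.0, b[100];
  for (int i = 0; i < n; i++)
    b[i] = 1.0;                          // sample data

  // Each thread accumulates into its own copy of a; the copies are
  // combined with + into the shared a at the end of the loop.
  #pragma omp parallel for reduction(+: a)
  for (int i = 0; i < n; i++) {
    a = a + b[i];
  }

  printf("sum = %f\n", a);               // 100.000000
  return 0;
}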

Page 24

Classification of Variables

• private(var-list)
  • Variables in var-list are private.

• shared(var-list)
  • Variables in var-list are shared.

• default(private | shared | none)
  • Sets the default for all variables in this region.

• firstprivate(var-list)
  • Variables are private and are initialized with the value of the shared copy before the region.

• lastprivate(var-list)
  • Variables are private, and the value of the thread executing the last iteration of a parallel loop in sequential order is copied to the variable outside of the region.

Page 25

Scoping Variables with Private Clause

• The values of the shared copies of i and j are undefined on exit from the parallel region.

• The private copies of j are initialized in the parallel region to 2.

int i, j;

i = 1;

j = 2;

#pragma omp parallel private(i) firstprivate(j)

{

i = 3;

j = j + 2;

printf("%d %d\n", i, j);

}

Page 26

Parallel Section

• Each section of a sections construct is executed once by one thread of the team.

• Threads that finished their section wait at the implicit barrier at the end of the sections construct.

#pragma omp sections [parameters]

{

[#pragma omp section]

block

[#pragma omp section

block ]

}

Page 27

Example: Parallel Section

int main() {
  int a[1000], b[1000];
  #pragma omp parallel
  {
    #pragma omp sections
    {
      #pragma omp section
      for (int i = 0; i < 1000; i++)
        a[i] = 100;

      #pragma omp section
      for (int i = 0; i < 1000; i++)
        b[i] = 200;
    }
  }
}

Page 28

OMP Workshare (Fortran only)

• The WORKSHARE directive divides the work of executing the enclosed code into separate units of work and distributes the units amongst the threads.

• An implementation of the WORKSHARE directive must insert any synchronization that is required to maintain standard Fortran semantics.

• There is an implicit barrier at the end of the workshare region.

!$OMP WORKSHARE [parameters]
  block
!$OMP END WORKSHARE [NOWAIT]

Page 29

Sharing Work in a Fortran 90 Array Statement

A(1:N) = B(2:N+1) + C(1:N)

• Each evaluation of an array expression for an individual index is a unit of work.

• The assignment to an individual array element is also a unit of work.

Page 30

Master / Single Region

• A master or single region enforces that only a single thread executes the enclosed code within a parallel region.

• Common:
  • No synchronization at the beginning of the region.

• Different:
  • The master region is executed by the master thread, while the single region can be executed by any thread.
  • The master region is skipped by the other threads, while all threads are synchronized at the end of a single region (see the sketch below).

#pragma omp master
  block

#pragma omp single [parameters]
  block
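A small sketch (not from the slides) contrasting the two constructs: the single block is executed by one arbitrary thread and is followed by an implicit barrier, while the master block is executed only by thread 0 and has no synchronization.

#include <omp.h>
#include <stdio.h>

int main() {
  #pragma omp parallel
  {
    #pragma omp single
    {
      // Executed by exactly one (arbitrary) thread;
      // all threads wait at the implicit barrier at the end.
      printf("single, executed by thread %d\n", omp_get_thread_num());
    }

    #pragma omp master
    {
      // Executed only by the master thread (thread 0);
      // the other threads skip the block and do not wait here.
      printf("master, executed by thread %d\n", omp_get_thread_num());
    }
  }
  return 0;
}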

Page 31

Combined Work-Sharing and Parallel Constructs

• #pragma omp parallel for

• #pragma omp parallel sections

• !$OMP PARALLEL WORKSHARE

Page 32

#pragma omp barrier

Barrier

• The barrier synchronizes all the threads in a team.

• When encountered, each thread waits until all of the other threads in that team have reached this point.

Page 33

#pragma omp critical [(Name)]

{ ... }

Critical Section

• Mutual exclusion
  • A critical section is a block of code that can be executed by only one thread at a time.

• Critical section name
  • A thread waits at the beginning of a critical section until no other thread is executing a critical section with the same name.
  • All unnamed critical directives map to the same name.
  • Critical section names are global entities of the program. If a name conflicts with any other entity, the behavior of the program is unspecified.

• Avoid long critical sections.

Page 34

Example: Critical Section

#define N 1000                 /* array size, assumed */
int a[N], b[N];

int main() {
  int i, ia = 0, ib = 0, itotal = 0;

  for (i = 0; i < N; i++) {
    a[i] = i;
    b[i] = N - i;
  }

  #pragma omp parallel private(i)
  {
    #pragma omp sections
    {
      #pragma omp section
      {
        for (i = 0; i < N; i++)
          ia = ia + a[i];
        #pragma omp critical (c1)
        { itotal = itotal + ia; }
      }

      #pragma omp section
      {
        for (i = 0; i < N; i++)
          ib = ib + b[i];
        #pragma omp critical (c1)
        { itotal = itotal + ib; }
      }
    }
  }
}

Page 35

#pragma omp atomic
  expression-stmt

Atomic Statements

• The atomic directive ensures that a specific memory location is updated atomically.

• The statement has to have one of the following forms:
  – x binop= expr
  – x++ or ++x
  – x-- or --x

  where x is an lvalue expression with scalar type and expr does not reference the object designated by x.

• All parallel assignments to the location must be protected with the atomic directive (see the sketch below).
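A minimal sketch of an atomic update (not from the slides): every parallel increment of the shared counter is protected, as the last bullet requires.

#include <omp.h>
#include <stdio.h>

int main() {
  int counter = 0;
  #pragma omp parallel
  {
    for (int i = 0; i < 1000; i++) {
      // The load-modify-store of counter is performed atomically.
      #pragma omp atomic
      counter++;
    }
  }
  printf("counter = %d\n", counter);   // 1000 * number of threads
  return 0;
}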

Page 36

Translation of Atomic

#pragma omp atomic

x += expr

can be rewritten as

xtmp = expr

!$OMP CRITICAL (name)

x = x + xtmp

!$OMP END CRITICAL (name)

• Only the load and store of x are protected.

Page 37

Simple Locks

• A lock can be held by only one thread at a time.

• A lock is represented by a lock variable of type omp_lock_t.

• The thread that obtained a simple lock cannot set it again.

• Operations
  • omp_init_lock(&lockvar): initialize a lock
  • omp_destroy_lock(&lockvar): destroy a lock
  • omp_set_lock(&lockvar): set the lock
  • omp_unset_lock(&lockvar): free the lock
  • logicalvar = omp_test_lock(&lockvar): check the lock and possibly set it; returns true if the lock was set by the executing thread.

Page 38

Example: Simple Lock

#include <omp.h>
#include <stdio.h>

int id;
omp_lock_t lock;

int main() {
  omp_init_lock(&lock);

  #pragma omp parallel shared(lock) private(id)
  {
    id = omp_get_thread_num();

    omp_set_lock(&lock);               // only a single thread writes at a time
    printf("My Thread num is: %d\n", id);
    omp_unset_lock(&lock);

    while (!omp_test_lock(&lock))
      other_work(id);                  // lock not obtained
    real_work(id);                     // lock obtained
    omp_unset_lock(&lock);             // lock freed
  }

  omp_destroy_lock(&lock);
}

Page 39

Nestable Locks

• Unlike simple locks, nestable locks can be set multiple times by a single thread (see the sketch below).

• Each set operation increments a lock counter.

• Each unset operation decrements the lock counter.

• If the lock counter is 0 after an unset operation, the lock can be set by another thread.
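A minimal sketch (not from the slides) of a nestable lock: a routine that already holds the lock calls another routine that sets the same lock again, which would self-deadlock with a simple lock. The routine names are chosen only for illustration.

#include <omp.h>
#include <stdio.h>

omp_nest_lock_t nlock;

void increment(int *value) {
  omp_set_nest_lock(&nlock);    // second acquisition by the same thread: counter goes to 2
  (*value)++;
  omp_unset_nest_lock(&nlock);  // counter goes back to 1
}

int main() {
  int value = 0;
  omp_init_nest_lock(&nlock);
  #pragma omp parallel
  {
    omp_set_nest_lock(&nlock);  // first acquisition: counter goes to 1
    increment(&value);
    omp_unset_nest_lock(&nlock);
  }
  omp_destroy_nest_lock(&nlock);
  printf("value = %d\n", value);   // equals the number of threads
  return 0;
}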

Page 40

Ordered Construct

• The construct must be within the dynamic extent of an omp for construct with an ordered clause.

• Ordered constructs are executed strictly in the order in which they would be executed in a sequential execution of the loop.

#pragma omp for ordered

for (...)

{ ...

#pragma omp ordered

{ ... }

...

}

Page 41

Example with ordered clause

#pragma omp for ordered
for (...)
{
  S1
  #pragma omp ordered
  { S2 }
  S3
}

(Diagram: for iterations i=1, i=2, i=3, ..., i=N the statements S1 and S3 run concurrently on different threads, but the ordered block S2 is executed strictly in iteration order; all threads meet at the barrier at the end of the loop. A concrete sketch follows below.)
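As a concrete sketch of the pattern above (not from the slides): the loop bodies may run in parallel, but the ordered block prints the indices strictly in 0, 1, 2, ... order. The chunk size in the schedule clause is chosen only for illustration.

#include <omp.h>
#include <stdio.h>

int main() {
  #pragma omp parallel for ordered schedule(static, 1)
  for (int i = 0; i < 10; i++) {
    int x = i * i;                  // S1/S3 part: may execute out of order across threads
    #pragma omp ordered
    printf("i=%d x=%d\n", i, x);    // S2 part: output appears in iteration order
  }
  return 0;
}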

Page 42

Flush

• The flush directive synchronizes copies in registers or caches of the executing thread with main memory.

• It synchronizes those variables in the given list or, if no list is specified, all shared variables accessible in the region.

• It does not update implicit copies at other threads.

• Loads/stores executed before the flush in program order have to be finished.

• Loads/stores following the flush in program order are not allowed to be executed before the flush.

• A flush is executed implicitly for some constructs, e.g. begin and end of a parallel region, end of work-sharing constructs, ...

#pragma omp flush [(list)]

Page 43

Example: Flush

#define MAXTHREAD 100
int iam, neigh, isync[MAXTHREAD+1];

isync[0] = 1;                       // isync[1..MAXTHREAD] initialized to 0
#pragma omp parallel private(iam, neigh)
{
  iam = omp_get_thread_num() + 1;
  neigh = iam - 1;

  // Wait for neighbor
  while (isync[neigh] == 0) {
    #pragma omp flush(isync)
  }

  // Do my work
  work();

  isync[iam] = 1;                   // I am done
  #pragma omp flush(isync)
}

(Diagram: isync starts as 1, 0, 0, ..., 0; each thread waits until its left neighbor's flag becomes 1.)

Page 44

Lastprivate example

k = 0;
#pragma omp parallel
{
  #pragma omp for lastprivate(k)
  for (i = 0; i < 100; i++) {
    a[i] = b[i] + b[i+1];
    k = 2*i;
  }
}
// The value of k is 198

Page 45

Copyprivate Example

• Copyprivate

• Clause only for single region.

• Variables must be private in enclosing parallel region.

• Value of executing thread is copied to all other threads.

#pragma omp parallel private(x)

{

#pragma omp single copyprivate(x)

{

getValue(x);

}

useValue(x);

}

Page 46

Other Copyprivate Example

float read_next( ) {

float * tmp;

float return_val;

#pragma omp single copyprivate(tmp)

{

tmp = (float *) malloc(sizeof(float));

}

#pragma omp master

{

get_float( tmp );

}

#pragma omp barrier

return_val = *tmp;

#pragma omp barrier

#pragma omp single

{

free(tmp);

}

return return_val;

}

Page 47

Runtime Routines for Threads (1)

• Set the number of threads for parallel regions
  • omp_set_num_threads(count)

• Query the maximum number of threads for team creation
  • numthreads = omp_get_max_threads()

• Query the number of threads in the current team
  • numthreads = omp_get_num_threads()

• Query the own thread number (0..n-1)
  • iam = omp_get_thread_num()

• Query the number of processors
  • numprocs = omp_get_num_procs()

(A short usage sketch follows below.)
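A short sketch (not from the slides) that exercises these query routines.

#include <omp.h>
#include <stdio.h>

int main() {
  omp_set_num_threads(4);                       // request 4 threads for parallel regions
  printf("max threads: %d\n", omp_get_max_threads());
  printf("processors:  %d\n", omp_get_num_procs());

  #pragma omp parallel
  {
    int iam = omp_get_thread_num();             // own thread number, 0..n-1
    int nthreads = omp_get_num_threads();       // size of the current team
    printf("thread %d of %d\n", iam, nthreads);
  }
  return 0;
}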

Page 48

Runtime Routines for Threads (2)

• Query state
  • logicalvar = omp_in_parallel()

• Allow the runtime system to determine the number of threads for team creation
  • omp_set_dynamic(logicalexpr)

• Query whether the runtime system can determine the number of threads
  • logicalvar = omp_get_dynamic()

• Allow nesting of parallel regions
  • omp_set_nested(logicalexpr)

• Query nesting of parallel regions
  • logicalvar = omp_get_nested()

Page 49

Environment Variables

• OMP_NUM_THREADS=4
  • Number of threads in a team of a parallel region.

• OMP_SCHEDULE="dynamic"
  OMP_SCHEDULE="GUIDED,4"
  • Selects the scheduling strategy to be applied at runtime (see the example below).

• OMP_DYNAMIC=TRUE
  • Allow the runtime system to determine the number of threads.

• OMP_NESTED=TRUE
  • Allow nesting of parallel regions.
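A possible shell session (in the style of the earlier example) showing how these variables are set before running an OpenMP program; the program name a.out is just a placeholder, and OMP_SCHEDULE only takes effect for loops declared with schedule(runtime).

> export OMP_NUM_THREADS=4
> export OMP_SCHEDULE="guided,4"
> export OMP_DYNAMIC=TRUE
> export OMP_NESTED=TRUE
> a.out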

Page 50

OpenMP 3.0

• Introduced May 2008

• OpenMP 3.1, July 2011

Page 51

Explicit Tasking

• Explicit creation of tasks

#pragma omp parallel
{
  #pragma omp single
  {
    for (elem = l->first; elem; elem = elem->next)
      #pragma omp task
        process(elem);
  }
  // all tasks are complete by this point
}

• Task scheduling
  • Tasks can be executed by any thread in the team.

• Barrier
  • All tasks created in the parallel region have to be finished.

Page 52

#pragma omp task [clause list]
{ ... }

Tasks

Clauses

• if (scalar-expression)
  • FALSE: execution starts immediately by the creating thread.
  • The suspended task may not be resumed until the new task is finished.

• untied
  • The task is not tied to the thread starting its execution. It might be rescheduled to another thread.

• default(shared | none), private, firstprivate, shared
  • If no default clause is present, the implicit data-sharing attribute is firstprivate.

Binding

• The binding thread set of the task region is the current team.

• A task region binds to the innermost enclosing parallel region.

Page 53

Example: Tree Traversal

struct node {

struct node *left;

struct node *right;

};

void traverse( struct node *p ) {

if (p->left)

#pragma omp task // p is firstprivate by default

traverse(p->left);

if (p->right)

#pragma omp task // p is firstprivate by default

traverse(p->right);

process(p);

}

Page 54

#pragma omp taskwait

Task Wait

• Waits for the completion of immediate child tasks (see the sketch below).

• Child tasks: tasks generated since the beginning of the current task.
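A common sketch of taskwait (not from the slides): a recursive Fibonacci in which each parent task waits for its two child tasks before combining their results.

#include <omp.h>
#include <stdio.h>

int fib(int n) {
  int x, y;
  if (n < 2) return n;
  #pragma omp task shared(x)
  x = fib(n - 1);
  #pragma omp task shared(y)
  y = fib(n - 2);
  #pragma omp taskwait          // wait for the two child tasks before using x and y
  return x + y;
}

int main() {
  int result;
  #pragma omp parallel
  {
    #pragma omp single          // one thread creates the top-level task tree
    result = fib(10);
  }
  printf("fib(10) = %d\n", result);   // 55
  return 0;
}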

Page 55

OpenMP 4

• Task dependencies via the new depend clause

  depend(dependence-type: list)

  where dependence-type = in | out | inout

• Dependencies are to previously generated sibling tasks.

• in: The generated task will be a dependent task of all previously generated sibling tasks that reference at least one of the list items in an out or inout dependence-type list.

• out and inout: The generated task will be a dependent task of all previously generated sibling tasks that reference at least one of the list items in an in, out, or inout dependence-type list (see the sketch below).
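A small sketch (not from the slides) of the depend clause: the second task reads x, so it becomes dependent on the first task, which writes x. Variable names are chosen only for illustration.

#include <omp.h>
#include <stdio.h>

int main() {
  int x = 0, y = 0;
  #pragma omp parallel
  #pragma omp single
  {
    #pragma omp task depend(out: x)
    x = 42;                       // producer: writes x

    #pragma omp task depend(in: x) depend(out: y)
    y = x + 1;                    // consumer: runs only after the producer finished

    #pragma omp taskwait
    printf("y = %d\n", y);        // 43
  }
  return 0;
}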

Page 56

#pragma omp taskyield

Taskyield

• The taskyield construct specifies that the current task can be suspended in favor of execution of a different task.

• Explicit task scheduling point

• Implicit task scheduling points
  • Task creation
  • End of a task
  • Taskwait
  • Barrier synchronization

Page 57

Switch task while waiting

void foo ( omp_lock_t * lock, int n )

{

int i;

for ( i = 0; i < n; i++ )

#pragma omp task

{

something_useful();

while ( !omp_test_lock(lock) ) {

#pragma omp taskyield

}

something_critical();

omp_unset_lock(lock);

}

}

Pages 58-64: no transcribed content.

Page 65

Terms

• tied task: A task that, when its task region is suspended, can be resumed only by the same thread that suspended it. That is, the task is tied to that thread.

• untied task (untied clause): A task that, when its task region is suspended, can be resumed by any thread in the team. That is, the task is not tied to any thread.

• undeferred task (if clause is false): A task for which execution is not deferred with respect to its generating task region. That is, its generating task region is suspended until execution of the undeferred task is completed.

Page 66

Terms

• included task: A task for which execution is sequentially included in the generating task region. That is, it is undeferred and executed immediately by the encountering thread. It has its own data environment.

• merged task (mergeable clause): A task whose data environment is the same as that of its generating task region.

• final task (final clause): A task that forces all of its child tasks to become final and included tasks (see the sketch below).
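A sketch (not from the slides) of the final clause: below a cutoff depth all descendant tasks become final and included, so the recursion continues sequentially inside the encountering thread. The cutoff value and the routine are made up for illustration.

#include <omp.h>
#include <stdio.h>

void process(int *data, int lo, int hi, int depth) {
  if (hi - lo < 2) { data[lo] += 1; return; }      // base case: single element
  int mid = (lo + hi) / 2;
  // Below the (arbitrary) cutoff depth the generated tasks are final:
  // their child tasks become included and run sequentially in the same thread.
  #pragma omp task final(depth > 3)
  process(data, lo, mid, depth + 1);
  #pragma omp task final(depth > 3)
  process(data, mid, hi, depth + 1);
  #pragma omp taskwait
}

int main() {
  int data[64] = {0};
  #pragma omp parallel
  #pragma omp single
  process(data, 0, 64, 0);
  printf("data[0] = %d\n", data[0]);   // 1
  return 0;
}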

Page 67

Mergeable tasks

#include <stdio.h>

void foo()
{
  int x = 2;
  #pragma omp task mergeable
  {
    x++;
  }
  #pragma omp taskwait
  printf("%d\n", x);   // prints 2 or 3
}

Page 68

Mergeable tasks

#include <stdio.h>

void foo()
{
  int x = 2;
  #pragma omp task shared(x) mergeable
  {
    x++;
  }
  #pragma omp taskwait
  printf("%d\n", x);   // prints 3
}

Page 69

Synchronization in Tasks - Potential Deadlock

void work()
{
  #pragma omp task
  {                                  // Task 1
    #pragma omp task
    {                                // Task 2
      #pragma omp critical           // Critical region 1
      { /* do work here */ }
    }
    #pragma omp critical             // Critical region 2
    {                                // capture data for the following task
      #pragma omp task
      { /* do work here */ }         // Task 3
    }
  }
}

• Both critical constructs are unnamed and therefore use the same lock. Creating Task 3 is a task scheduling point: if the thread holding critical region 2 switches there to execute Task 2, that task tries to enter the same unnamed critical region and the thread deadlocks on itself.

Page 70

Collapsing of loops

• Handles multi-dimensional perfectly nested loops.

• The larger iteration space is ordered according to sequential execution.

• The schedule clause applies to the new iteration space.

#pragma omp parallel for collapse(2)

for (i=0; i<n; i++)

for (j=0; j<n; j++)

for (k=0; k<n; k++)

{

.....

}

Page 71

Guaranteed Scheduling

• The same work distribution is guaranteed if
  • both loops have the same number of iterations and use schedule static with the same chunk size, and
  • both regions bind to the same parallel region.

!$omp do schedule(static)
do i=1,n
  a(i) = ....
end do
!$omp end do nowait

!$omp do schedule(static)

do i=1,n

.... = a(i)

end do

Page 72

Scheduling strategy auto for parallel loops

• New scheduling strategy auto

• It is up to the compiler to determine the scheduling.

Page 73

Nested Parallelism

• Before OpenMP 3.0 there was only a single copy of the control variable specifying the number of threads in a team.
  • omp_set_num_threads() can be called only outside of parallel regions.
  • This value is also applied to nested parallelism, so all teams have the same size.
  • The num_threads clause of a parallel region can override it.

• OpenMP 3.0 supports individual copies (see the sketch below).
  • There is one copy per task.
  • Teams might have different sizes.
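A sketch (not from the slides) of nested parallelism with per-region team sizes via the num_threads clause; nesting must be enabled first. The team sizes are chosen only for illustration.

#include <omp.h>
#include <stdio.h>

int main() {
  omp_set_nested(1);                       // allow nested parallel regions
  #pragma omp parallel num_threads(2)      // outer team of 2 threads
  {
    int outer = omp_get_thread_num();
    #pragma omp parallel num_threads(3)    // each outer thread creates an inner team of 3
    {
      printf("outer %d, inner %d\n", outer, omp_get_thread_num());
    }
  }
  return 0;
}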

Page 74

OpenMP 4

• SIMD support (see the sketch below)
  • Directive for loops: guarantees that the loop can be executed in a SIMD fashion.
  • Directive for omp loops: iterations are distributed over the threads, and the iterations assigned to a thread are executed with SIMD instructions.

• Target construct for accelerators

• User-defined reductions

• Cancellation of a parallel region

• Affinity
  • Places: thread, core, socket
  • Affinity policies: spread, close, master
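A minimal sketch (not from the slides) of the two SIMD directives mentioned above; array names and sizes are chosen only for illustration.

#include <omp.h>

#define N 1024

int main() {
  float a[N], b[N], c[N];
  for (int i = 0; i < N; i++) { b[i] = i; c[i] = 1.0f; }   // sample data

  // Loop is vectorized: iterations are executed with SIMD instructions.
  #pragma omp simd
  for (int i = 0; i < N; i++)
    a[i] = b[i] + c[i];

  // Iterations are first distributed across the threads of the team,
  // and each thread executes its chunk with SIMD instructions.
  #pragma omp parallel for simd
  for (int i = 0; i < N; i++)
    c[i] = 2.0f * a[i];

  return 0;
}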

Page 75: no transcribed content.

Page 76

Summary

• OpenMP is a quasi-standard for shared memory programming.

• Based on the fork/join model.

• Parallel region and work-sharing constructs.

• Declaration of private or shared variables.

• Reduction variables.

• Scheduling strategies.

• Synchronization via barrier, critical section, atomic, locks, nestable locks.

• Task concept.

• SIMD and accelerator support.