OpenMP - Introduction


Page 1: OpenMP - Introduction

OpenMP - Introduction

Süha TUNA, Bilişim Enstitüsü (Informatics Institute)

UHeM Yaz Çalıştayı (Summer Workshop) - 21.06.2012

Page 2: OpenMP - Introduction

Outline

• What is OpenMP?
  – Introduction (Code Structure, Directives, Threads etc.)
  – Limitations
  – Data Scope Clauses
    • Shared, Private
  – Work-sharing constructs
  – Synchronization

Page 3: OpenMP - Introduction

What is OpenMP?

• An Application Program Interface (API) that may be used to explicitly direct multithreaded, shared memory parallelism

• Three main API components:
  – Compiler directives
  – Runtime library routines
  – Environment variables

• Portable & standardized
  – APIs exist for both C/C++ and Fortran 90/77
  – Multi-platform support (Unix, Linux, etc.)

Page 4: OpenMP - Introduction

OpenMP Specifications

• Version 3.1, Complete Specifications, July 2011

• Version 3.0, May 2008

• Version 2.5, May 2005 (C/C++ & Fortran)

• Version 2.0
  – C/C++, March 2002
  – Fortran, November 2000

• Version 1.0
  – C/C++, October 1998
  – Fortran, October 1997

Detailed Info: http://www.openmp.org/wp/openmp-specifications/

Page 5: OpenMP - Introduction

Intel & GNU OpenMP

• Intel Compilers
  – OpenMP 2.5 conforming
  – Nested parallelism
  – Workqueuing extension to OpenMP
  – Interoperability with POSIX and Windows threads
  – OMP_DYNAMIC support

• GNU OpenMP (OpenMP + gcc)
  – OpenMP 3.0 support (gcc 4.4 and later)

Page 6: OpenMP - Introduction

OpenMP Programming Model

• Explicit parallelism

• Thread-based parallelism; the program runs with a user-specified number of threads

• Uses the fork & join model

Synchronization points: "barrier", "critical region", "single processor region"

Page 7: OpenMP - Introduction

Limitations of OpenMP

• Shared Memory Model
  – Each thread must be able to reach the shared memory (SMP)

• Intel compilers use the POSIX threads library to implement OpenMP.

Page 8: OpenMP - Introduction

Terminology and Behavior

• OpenMP Team = Master + Workers

• A Parallel Region is a block of code executed by all threads simultaneously (it has an implicit barrier at the end)
  – The master thread always has thread id 0
  – Parallel regions can be nested
  – An IF clause can be used to guard the parallel region

• A Work-Sharing construct divides the execution of the enclosed code region among the members of the team. (Loop, Section etc.)

Page 9: OpenMP - Introduction

OpenMP Code Structure

C/C++:

#include <omp.h>

main () {
  int var1, var2, var3;

  /* Serial code */
  ...

  /* Beginning of parallel section. Fork a team of threads.
     Specify variable scoping. */
  #pragma omp parallel private(var1, var2) shared(var3)
  {
    /* Parallel section executed by all threads */
    ...
    /* All threads join the master thread and disband */
  }

  /* Resume serial code */
  ...
}

Fortran:

PROGRAM MYCODE
  USE omp_lib              ! or INCLUDE "omp_lib.h"
  INTEGER var1, var2, var3

! Serial code
  ...

! Beginning of parallel section. Fork a team of threads.
! Specify variable scoping.
!$OMP PARALLEL PRIVATE(var1, var2) SHARED(var3)
  ! Parallel section executed by all threads
  ...
!$OMP BARRIER
  ...
  ! All threads join the master thread and disband
!$OMP END PARALLEL

! Resume serial code
  ...
END

Page 10: OpenMP - Introduction

OpenMP Directives

Format in C/C++:

  #pragma omp directive-name [clause, ...] \

Format in Fortran 77:

  C$OMP directive-name [clause, ...] &

Format in Fortran 90:

  !$OMP directive-name [clause, ...] &

#pragma omp: required for all OpenMP C/C++ directives.

directive-name: a valid OpenMP directive. Must appear after the pragma and before any clauses.

[clause, ...]: optional. Clauses can be in any order, and repeated as necessary unless otherwise restricted.

Page 11: OpenMP - Introduction

OpenMP Directives

General rules:
  – Directives follow conventions of the C/C++ standards for compiler directives.
  – Case sensitive.
  – Only one directive-name may be specified per directive.
  – Each directive applies to at most one succeeding statement, which must be a structured block.
  – Long directive lines can be "continued" on succeeding lines by escaping the newline character with a backslash ("\") at the end of a directive line.

Example:

  #pragma omp parallel default(shared) private(beta,pi)

Page 12: OpenMP - Introduction

OpenMP Directives

PARALLEL region construct: a parallel region is a block of code that will be executed by multiple threads. This is the fundamental OpenMP parallel construct.

#pragma omp parallel [clause ...] newline
    if (scalar_expression)
    private (list)
    shared (list)
    default (shared | none)
    firstprivate (list)
    reduction (operator: list)
    copyin (list)

  structured_block

Page 13: OpenMP - Introduction

OpenMP Directives

C/C++ OpenMP structured block definition:

#pragma omp parallel [clause ...]
{
  structured_block
}

Fortran OpenMP structured block definition:

!$OMP PARALLEL [clause ...]
  structured_block
!$OMP END PARALLEL

Page 14: OpenMP - Introduction

OpenMP Directives

• Parallel region construct: supported clauses (listed in a table on the original slide)

Page 15: OpenMP - Introduction

When a thread reaches a PARALLEL directive

• It creates a team of threads and becomes the master of the team

• The master is a member of the team and has thread number 0 within it (THREAD ID)

• Starting from the beginning of this parallel region, the code is duplicated and all threads execute it (though each thread may follow a different path of execution)

• There is an implied barrier at the end of a parallel section

• Only the master thread continues execution past this point

Page 16: OpenMP - Introduction

OpenMP Constructs

Page 17: OpenMP - Introduction

Data Scope Attribute Clauses

• SHARED clause:
  – Declares the variables in its list to be shared among all threads in the team.
  – Behavior:
    • Only one instance of the object exists; no per-thread copy is made
    • All threads reference the same, original object
    • SHARED is the default scope for variables in OpenMP

C/C++: shared (list)
Fortran: SHARED (list)

Page 18: OpenMP - Introduction

Data Scope Attribute Clauses

• PRIVATE clause:
  – Declares the variables in its list to be private to each thread.
  – Behavior:
    • A new object of the same type is declared once for each thread in the team
    • All references to the original object are replaced with references to the new object
    • Variables declared PRIVATE are uninitialized for each thread (FIRSTPRIVATE can be used to initialize them)

C/C++: private (list)
Fortran: PRIVATE (list)
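To make the two scopes concrete, here is a minimal sketch (an addition, not from the slides; variable names are illustrative). Each thread writes its own private tid, while all threads update one shared counter:

#include <omp.h>
#include <stdio.h>

int main(void) {
    int shared_count = 0;   /* one instance, visible to all threads */
    int tid;                /* each thread gets its own (uninitialized) copy */

    #pragma omp parallel private(tid) shared(shared_count)
    {
        tid = omp_get_thread_num();   /* writes only this thread's copy */

        #pragma omp critical          /* introduced later in these slides */
        shared_count++;               /* all threads update the same object */

        printf("thread %d done\n", tid);
    }

    /* shared_count now equals the number of threads in the team */
    printf("team size was %d\n", shared_count);
    return 0;
}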

Page 19: OpenMP - Introduction

Data Scope Attribute Clauses

• DEFAULT clause:
  – Declares the default scope attribute for the variables in the parallel region.
  – If not declared, the default is SHARED.
  – If declared, the given default applies within that specific data scope only.
  – You are discouraged from changing the default to PRIVATE: giving every variable a per-thread copy adds overhead to the parallelization.

C/C++: default (shared | none)
Fortran: DEFAULT (PRIVATE | SHARED)

(Note: in C/C++ only shared and none are allowed in the default clause; DEFAULT(PRIVATE) is available in Fortran.)
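A small sketch (an addition, not from the slides) of the commonly recommended default(none): it forces every variable's scope to be stated explicitly, so the compiler catches scoping mistakes:

#include <omp.h>
#include <stdio.h>

int main(void) {
    int n = 8, sum = 0;

    /* default(none): compilation fails unless n and sum are scoped explicitly */
    #pragma omp parallel default(none) shared(n, sum)
    {
        #pragma omp critical
        sum += n;
    }

    printf("sum = %d\n", sum);
    return 0;
}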

Page 20: OpenMP - Introduction

Lab: Helloworld

• INTEL:

bash: $ ifort -openmp hi-omp.f -o hi-omp.x
hi-omp.f(3) : (col. 6) remark: OpenMP DEFINED REGION WAS PARALLELIZED.

• GCC:

bash: $ gcc -fopenmp hi-omp.c -o hi-omp.x

• LSF submission:

bash: $ bsub -a openmp -q short -o %J.out -e %J.err -n 4 -x ./hi-omp.x

Page 21: OpenMP - Introduction

Lab: Helloworld

• Set environment variables (setenv, export):

bash: $ export OMP_NUM_THREADS=4

• Run your OpenMP executable:

bash: $ ./hi-omp.x
Hello OpenMP!
Hello OpenMP!
Hello OpenMP!
Hello OpenMP!

Optional exercise:
1 - Set OMP_NUM_THREADS to a higher value (such as 10)
2 - Uncomment the critical section
3 - Repeat the example
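The lab's hi-omp.c source is not reproduced in this transcript; a minimal C version consistent with the output shown above, including the commented-out critical section mentioned in the optional exercise, might look like:

#include <omp.h>
#include <stdio.h>

int main(void) {
    /* Each thread in the team prints one greeting. */
    #pragma omp parallel
    {
        /* Uncomment for the optional exercise: serialize the prints. */
        /* #pragma omp critical */
        printf("Hello OpenMP!\n");
    }
    return 0;
}

With OMP_NUM_THREADS=4 this prints the greeting four times, once per thread.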

Page 22: OpenMP - Introduction

Work-Sharing Constructs

A work-sharing construct divides the execution of the enclosed code region among the members of the team that encounter it.

It must be enclosed in a parallel region; otherwise it is simply ignored.

Work-sharing constructs do not launch/create new threads.

There is no implied barrier upon entry to a work-sharing construct. However, there is an implicit barrier at the end of a work-sharing construct.

Page 23: OpenMP - Introduction

Work-Sharing Constructs

• Types (shown in a figure on the original slide):
  – Fortran only (the WORKSHARE construct): parallelizes array operations, for example A(:,:) = B(:,:) + C(:,:)

Page 24: OpenMP - Introduction

Work-Sharing Constructs

– DO/for: shares iterations of a loop across the team. Represents a type of "data parallelism".

– SECTIONS: breaks work into separate, discrete sections. Each section is executed by a thread. Can be used to implement a type of "functional parallelism".

– SINGLE: serializes a section of code.

Page 25: OpenMP - Introduction

Work-Sharing Constructs

• DO directive (Fortran)

!$OMP DO [clause ...]
    SCHEDULE (type [,chunk])
    ORDERED
    PRIVATE (list)
    FIRSTPRIVATE (list)
    LASTPRIVATE (list)
    SHARED (list)
    REDUCTION (operator | intrinsic : list)

  do_loop

!$OMP END DO [NOWAIT]

Page 26: OpenMP - Introduction

Work-Sharing Constructs

• for directive (C/C++)

#pragma omp for [clause ...] newline
    schedule (type [,chunk])
    ordered
    private (list)
    firstprivate (list)
    lastprivate (list)
    shared (list)
    reduction (operator: list)
    nowait

  for_loop
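As an illustration (an addition, not from the slides; variable names are assumed), a short runnable sketch combining the for directive with the private and reduction clauses listed above:

#include <omp.h>
#include <stdio.h>

int main(void) {
    double sum = 0.0;   /* reduction variable: shared in the enclosing region */
    int i;

    #pragma omp parallel shared(sum) private(i)
    {
        /* iterations are divided among the team; each thread accumulates a
           partial sum, and the reduction clause combines them into sum */
        #pragma omp for reduction(+:sum)
        for (i = 0; i < 1000; i++)
            sum += i * 0.5;
    }

    printf("sum = %f\n", sum);
    return 0;
}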

Page 27: OpenMP - Introduction

Work-Sharing Constructs

• schedule clause: schedule(kind [,chunk_size])
  – static: less overhead; the default on many OpenMP compilers
  – dynamic & guided: useful for poorly balanced and unpredictable workloads. With guided, the chunk size decreases over time.
  – runtime: the kind is selected according to the value of the environment variable OMP_SCHEDULE.
  – Larger chunks are desirable because they reduce the overhead.
  – Load balancing is often more of an issue toward the end of the computation.

Page 28: OpenMP - Introduction

Work-Sharing Constructs

• schedule clause:
  – Describes how iterations of the loop are divided among the threads in the team:
  – static: loop iterations are divided into pieces of size chunk and assigned to threads statically.
  – dynamic: when a thread finishes one chunk, it is dynamically assigned another. The default chunk size is 1.
  – guided: the chunk size is exponentially reduced with each dispatched piece of the iteration space. The default chunk size is 1.
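A small sketch (an addition to the slides) that makes the scheduling visible: each iteration reports which thread ran it, so switching the kind between static, dynamic, and guided shows the different assignments:

#include <omp.h>
#include <stdio.h>

int main(void) {
    int i;

    #pragma omp parallel
    {
        /* dynamic, chunk size 2: each thread grabs 2 iterations at a time,
           taking a new chunk as soon as it finishes the previous one */
        #pragma omp for schedule(dynamic, 2)
        for (i = 0; i < 16; i++)
            printf("iteration %2d run by thread %d\n", i, omp_get_thread_num());
    }
    return 0;
}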

Page 29: OpenMP - Introduction

Work-Sharing Constructs

• schedule clause:
  – runtime: if this schedule is selected, the decision regarding scheduling kind is made at run time. The schedule and (optional) chunk size are set through the OMP_SCHEDULE environment variable.

• NOWAIT (Fortran) / nowait (C/C++) clause:
  – If specified, threads do not synchronize at the end of the parallel loop; they proceed directly to the statements after the loop.

Page 30: OpenMP - Introduction

Work-Sharing Lab 1

• Example steps:
  – Examine the code with the static schedule, compile, and run.
  – Change to a dynamic schedule. What changed? The iterations of the loop are now distributed dynamically in chunk-sized pieces.
  – Add nowait to the end of the omp for directive. What changed? Threads no longer synchronize upon completing their individual pieces of work (nowait).

bash: $ icc -openmp omp_workshare1.c -o omp_workshare1.x
bash: $ ./omp_workshare1.x

Page 31: OpenMP - Introduction

Work-Sharing Lab 2

• SECTIONS construct:
  – The easiest way to get different threads to carry out different kinds of work
  – Each section must be a structured block of code that is independent of the other sections
  – If there are fewer code blocks than threads, the remaining threads will be idle
  – If there are fewer threads than code blocks, some or all of the threads execute multiple code blocks
  – Depending on the type of work, this construct might lead to a load-balancing problem

Page 32: OpenMP - Introduction

Work-Sharing Lab 2

• SECTIONS construct for 2 functions (or threads):

#pragma omp parallel
{
  #pragma omp sections
  {
    #pragma omp section
    { FUNCTION_1(MAX); }

    #pragma omp section
    { FUNCTION_2(MIN); }
  } // sections end here
} // parallel ends here

Page 33: OpenMP - Introduction

Work-Sharing Lab 2

• This example demonstrates use of the OpenMP SECTIONS work-sharing construct. Note how the PARALLEL region is divided into separate sections, each of which is executed by one thread.

• Run the program several times and observe any differences in output. Because there are only two sections, you should notice that some threads do not do any work.

• You may or may not notice that which threads do the work can vary. For example, the first time thread 0 and thread 1 may do the work, and the next time it may be thread 0 and thread 3.

bash: $ icc -openmp omp_workshare2.c -o omp_workshare2.x
bash: $ ./omp_workshare2.x
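omp_workshare2.c itself is not reproduced in this transcript; a minimal sketch in the same spirit, where each section reports the thread that executed it:

#include <omp.h>
#include <stdio.h>

int main(void) {
    #pragma omp parallel
    {
        #pragma omp sections
        {
            #pragma omp section
            printf("section 1 run by thread %d\n", omp_get_thread_num());

            #pragma omp section
            printf("section 2 run by thread %d\n", omp_get_thread_num());
        } /* implicit barrier at the end of the sections construct */
    }
    return 0;
}

With more than two threads in the team, some threads get no section, which is exactly the behavior the lab asks you to observe.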

Page 34: OpenMP - Introduction

Work-Sharing Constructs

• SINGLE construct:
  – Specifies that the enclosed code is to be executed by only one thread in the team.
  – The thread chosen could vary from one run to another.
  – Threads that are not executing in the SINGLE directive wait at the END SINGLE directive unless NOWAIT is specified.

C/C++:
#pragma omp single [clause ...]
  structured_block

Fortran:
!$OMP SINGLE [clause ...]
  structured_block
!$OMP END SINGLE [NOWAIT]

Page 35: OpenMP - Introduction

Work-Sharing Constructs

• SINGLE construct example (shown as a figure on the original slide): only one thread initializes the shared variable a.
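A minimal sketch of that pattern (the variable name a is from the slide; the rest is assumed). The implicit barrier at the end of SINGLE ensures a is initialized before any thread reads it:

#include <omp.h>
#include <stdio.h>

int main(void) {
    int a = 0;

    #pragma omp parallel shared(a)
    {
        #pragma omp single
        {
            a = 42;   /* executed by exactly one thread */
            printf("initialized by thread %d\n", omp_get_thread_num());
        }
        /* implicit barrier: every thread sees a == 42 from here on */
        printf("thread %d sees a = %d\n", omp_get_thread_num(), a);
    }
    return 0;
}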


Page 38: OpenMP - Introduction

Synchronization (BARRIER)

• BARRIER directive:
  – Synchronizes all threads in the team.
  – When a BARRIER directive is reached, a thread will wait at that point until all other threads have reached that barrier.
  – All threads then resume executing in parallel the code that follows the barrier.
  – BARRIER is a stand-alone directive; it applies to no statement or block.

C/C++:
#pragma omp barrier newline

Fortran:
!$OMP BARRIER

Example: check the barrier.f and barrier.c example code.
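barrier.c is not reproduced in this transcript; a minimal sketch of the behavior:

#include <omp.h>
#include <stdio.h>

int main(void) {
    #pragma omp parallel
    {
        int tid = omp_get_thread_num();
        printf("thread %d: before barrier\n", tid);

        #pragma omp barrier   /* nobody passes until everyone arrives */

        printf("thread %d: after barrier\n", tid);
        /* every "before" line is printed before any "after" line */
    }
    return 0;
}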

Page 39: OpenMP - Introduction

Synchronization (BARRIER)

• BARRIER directive: important restrictions
  – Each barrier must be encountered by all threads in a team, or by none at all.
  – The sequence of work-sharing regions and barrier regions encountered must be the same for every thread in the team.

• Without these rules, some threads wait forever (or until somebody kills the process) for other threads to reach a barrier.

Page 40: OpenMP - Introduction

Synchronization (MASTER)

• MASTER directive:
  – Specifies a region that is to be executed only by the master thread of the team.
  – All other threads in the team skip this section of code.
  – It is similar to the SINGLE construct, but there is no implied barrier at the end.

C/C++:
#pragma omp master newline
  statement_or_expression

Fortran:
!$OMP MASTER
  statement_or_expression
!$OMP END MASTER
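A small illustrative sketch (an addition, not from the slides):

#include <omp.h>
#include <stdio.h>

int main(void) {
    #pragma omp parallel
    {
        #pragma omp master
        {
            /* only thread 0 executes this; the other threads skip it
               immediately (no barrier), unlike SINGLE */
            printf("team size: %d\n", omp_get_num_threads());
        }
    }
    return 0;
}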

Page 41: OpenMP - Introduction

Synchronization (ORDERED)

• ORDERED directive:
  – Allows a structured block within a parallel loop to be executed in sequential order.
  – The code outside this block runs in parallel.
  – If threads finish out of order, there may be an additional performance penalty, because some threads might have to wait.

C/C++:
#pragma omp ordered newline
  structured_block

Fortran:
!$OMP ORDERED
  structured_block
!$OMP END ORDERED

Example: check the ordered.c example code.
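ordered.c is not reproduced in this transcript; a minimal sketch of the pattern:

#include <omp.h>
#include <stdio.h>

int main(void) {
    int i;

    #pragma omp parallel
    {
        /* the loop itself must carry the ordered clause for the
           ordered region inside it to be legal */
        #pragma omp for ordered
        for (i = 0; i < 8; i++) {
            /* work placed here still runs in parallel */
            #pragma omp ordered
            printf("iteration %d\n", i);   /* printed in sequential order */
        }
    }
    return 0;
}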

Page 42: OpenMP - Introduction

Synchronization (CRITICAL)

• CRITICAL directive:
  – Provides a means to ensure that multiple threads do not attempt to update the same shared data simultaneously (a race condition).
  – An optional name can be given to a critical construct. The name must be global and unique.
  – When a thread encounters a critical construct, it waits until no other thread is executing a critical region with the same name.

C/C++:
#pragma omp critical (name)
  structured_block

Fortran:
!$OMP CRITICAL (name)
  structured_block
!$OMP END CRITICAL (name)

Example: correct the race condition in the critical.F90 and critical.c example code.
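critical.c is not reproduced in this transcript; a minimal sketch of the corrected pattern (the region name update_counter is illustrative):

#include <omp.h>
#include <stdio.h>

int main(void) {
    int counter = 0;

    #pragma omp parallel shared(counter)
    {
        /* without the critical construct, counter++ is a race condition:
           two threads can read the same old value and lose an update */
        #pragma omp critical (update_counter)
        counter++;
    }

    printf("counter = %d\n", counter);   /* equals the team size */
    return 0;
}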

Page 43: OpenMP - Introduction

Synchronization (ATOMIC)

• ATOMIC directive:
  – Specifies that a specific memory location must be updated atomically, rather than letting multiple threads attempt to write to it.
  – In essence, this directive provides a mini-CRITICAL section. It is an efficient alternative to the critical region.

C/C++:
#pragma omp atomic newline
  expression_statement

Fortran:
!$OMP ATOMIC
  expression_statement

Example: check the atomic.c example code.
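atomic.c is not reproduced in this transcript; a minimal sketch:

#include <omp.h>
#include <stdio.h>

int main(void) {
    int hits = 0;
    int i;

    #pragma omp parallel
    {
        #pragma omp for
        for (i = 0; i < 1000; i++) {
            /* only the update itself is made atomic; for simple updates
               this is cheaper than a full critical region */
            #pragma omp atomic
            hits++;
        }
    }

    printf("hits = %d\n", hits);   /* always 1000 */
    return 0;
}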

Page 44: OpenMP - Introduction

TEŞEKKÜRLER! (Thank you!)