Thinking in Parallel (transcript)
SPONSORED BY
HPC & GPU Supercomputing Groups
Non-profit, free-to-join groups hosted on www.meetup.com.
A group for the application of cutting-edge HPC & GPU supercomputing technology to cutting-edge business problems.
Started in January 2011 with the New York group; today there are 1,000 members across groups in Boston, Silicon Valley, Chicago, New Mexico, Denver, Seattle, Austin, Washington D.C., South Florida, and Tokyo.
Please visit www.SupercomputingGroup.com for the South Florida group.
PARALLEL THINKING (THINKING IN PARALLEL)
BY ADNAN BOZ
Krasnoarmeysky Prospekt 25, November 19, 2011
Many thanks to Andrew Sheppard for providing supporting content for this presentation.
Andrew is the organizer of the New York meetup group and a financial consultant with extensive experience in quantitative financial analysis, trading-desk software development, and technical management. He is also the author of the forthcoming book “Programming GPUs”, to be published by O’Reilly (www.oreilly.com).
“Thinking in Parallel” is the term for making the conceptual leap that takes a developer from writing programs that run on hardware with little real parallelism to writing programs that execute efficiently on massively parallel hardware, with hundreds and thousands of cores, leading to very substantial speedups (x10, x100, and beyond).
“[A]nd in this precious phial is the power to think twice as fast, move twice as quickly, do twice as much work in a given time as you could otherwise do.”
—H. G. Wells, “The New Accelerator” (1901)
Serial programs are traditional (most programs are serial), sequential (just a sequence of tasks), and their flow is relatively easy to reason about.
For example: Prefix Sum (Scan)

data[] = {5, 1, 8, 11, 4}
forall i from 1 to n-1 do
    data[i] = data[i - 1] + data[i]

5    1        8            11                4
5    5 + 1    5 + 1 + 8    5 + 1 + 8 + 11    5 + 1 + 8 + 11 + 4
5    6        14           25                29

where the binary associative operator is summation.
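As a concrete illustration, here is a minimal serial inclusive scan in C++ (a sketch of the pseudocode above; the variable names are mine):

#include <cstdio>
#include <vector>

int main() {
    std::vector<int> data = {5, 1, 8, 11, 4};

    // Each element becomes the sum of itself and every element before it.
    for (std::size_t i = 1; i < data.size(); ++i)
        data[i] = data[i - 1] + data[i];

    for (int v : data) std::printf("%d ", v);  // prints: 5 6 14 25 29
    std::printf("\n");
}

Note that every iteration depends on the previous one, which is exactly what makes this loop hard to parallelize naively.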
But sequential thinking is about to change, because annual serial performance improvement has slowed from 50% to 20% since 2002, and we cannot expect huge improvements in serial performance anymore. Therefore, programming is going parallel.
Multi- and many-core computing is hitting the mainstream. Today we have 4-12 cores, in a few years 32 cores, and Intel is predicting that in 2015 we will have 100 cores.
AMD Opteron (12 cores), IBM Power 7 (8), Intel Xeon (12), Sun UltraSPARC T3 (16), Cell (9), NVIDIA GeForce (1024), Adapteva (4096), Tilera Tile-Gx (100)
There is a lot of effort going into developing good runtimes, compilers, debuggers, and OS support:
MS TPL, Intel TBB, MPI, PVM, pthreads, PLINQ, OpenMP, MS Concurrency Runtime, MS Dryad, MS C++ AMP, NVIDIA CUDA C, ATI APP, OpenCL, Microsoft DirectCompute, Brooks, Shaders, PGI CUDA Fortran, GPU.Net, HMPP, Thrust etc.
More than one hundred parallel programming languages in 2008 ( http://perilsofparallel.blogspot.com/2008/09/101-parallel-languages-part-1.html or http://tinyurl.com/3p4a8to )
What are some problems moving into a multi-core world? A lot of companies have a huge code base developed with little or no parallelism. Converting those great products to multi-core will take time.
We haven’t been teaching much about parallelism for many years. Most students we educated in the last 10 years know very little about parallelism.
Engineers need to understand parallelism, and all the issues that come with it, in order to utilize all these cores.
Parallel thinking is not the latest API, library or hardware. Parallel thinking is a set of core ideas we have to identify and teach our students and workforce.
Writing good serial software was hard; writing good parallel software is harder: it requires new tools, new techniques, and a new “Thinking in Parallel” mindset.
[Chart: Performance vs. Time. Multi-core reaches the desktop in 2004; from there, hardware potential keeps climbing, parallel applications track it while serial applications level off, and the widening gap is the competitive advantage.]
Parallel Prefix Sum
Parallel Prefix Sum (Scan) with CUDA (NVIDIA) (http://tinyurl.com/3s9as2j)
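A classic data-parallel formulation is the Hillis-Steele scan, where every pass adds in the element one power-of-two stride to the left. The following C++ sketch is illustrative only (it is written serially and is not NVIDIA's implementation); on a GPU, each element of the inner loop would be handled by its own thread in the same pass:

#include <cstdio>
#include <vector>

std::vector<int> inclusive_scan(std::vector<int> cur) {
    const std::size_t n = cur.size();
    std::vector<int> next(n);
    for (std::size_t stride = 1; stride < n; stride *= 2) {
        for (std::size_t i = 0; i < n; ++i)  // on a GPU: one thread per i
            next[i] = (i >= stride) ? cur[i - stride] + cur[i] : cur[i];
        cur.swap(next);
    }
    return cur;
}

int main() {
    for (int v : inclusive_scan({5, 1, 8, 11, 4}))
        std::printf("%d ", v);  // prints: 5 6 14 25 29
    std::printf("\n");
}

Instead of n - 1 sequential additions, the scan now finishes in about log2(n) parallel passes.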
Where to start?
Concurrency: a programming issue. Single processor. Goal: running multiple interleaved threads. Only one thread executes at any given time.
Parallelism: a property of the machine. Multi-processor. Goal: speedup. Threads are executed simultaneously.
[Diagram: two timelines of Tasks A and B. Under concurrency, Thread 1 and Thread 2 alternate on a single processor; under parallelism, Thread 1 and Thread 2 run at the same time.]
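A minimal C++ illustration of the distinction (my sketch, not from the slides): on a multi-core machine the two threads below may truly run simultaneously, while on a single core the OS merely interleaves them.

#include <cstdio>
#include <thread>

void task(const char* name) {
    for (int i = 0; i < 3; ++i)
        std::printf("%s, step %d\n", name, i);  // interleaving varies run to run
}

int main() {
    std::thread a(task, "Task A");  // both threads are started...
    std::thread b(task, "Task B");
    a.join();                       // ...and run concurrently until joined
    b.join();
}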
Flynn’s Taxonomy of Architectures
Single Instruction/Single Data (SISD)
Single Instruction/Multiple Data (SIMD)
Multiple Instruction/Single Data (MISD)
Multiple Instruction/Multiple Data (MIMD)
SISD vs. SIMD
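To make the SISD/SIMD distinction concrete, here is a hedged C++ sketch (assuming a compiler with OpenMP SIMD support, e.g. -fopenmp-simd): the first loop operates on one data element per instruction, while the pragma in the second asks the compiler to apply one instruction to several adjacent elements at once.

#include <cstddef>

// SISD: one instruction, one data element at a time.
void add_sisd(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

// SIMD: the vectorized loop adds several elements per instruction.
void add_simd(const float* a, const float* b, float* out, std::size_t n) {
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}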
Parallel Programming Methodology
An iterative cycle: Measure, Design, Code, Test, Analyze, Proceed.
Analyzing Parallelism
Amdahl’s Law helps to predict the theoretical maximum speedup on a fixed problem size:
S = 1 / (rs + rp / n)
Gustafson’s Law proposes that larger problems can be solved by scaling the parallel computing power:
S = rs + n · rp
where rs is the serial fraction of the work, rp = 1 - rs is the parallel fraction, and n is the number of processors.
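As an illustrative worked example (the numbers are mine, not from the slides): with rs = 0.1, rp = 0.9, and n = 100 processors, Amdahl’s Law gives S = 1 / (0.1 + 0.9/100) ≈ 9.2, while Gustafson’s Law gives S = 0.1 + 100 · 0.9 = 90.1. The fixed-size view is capped near 1/rs = 10 no matter how many cores you add, but growing the problem keeps the extra cores busy.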
Design Patterns
Finding Concurrency Design Space
Algorithm Structure Design Space
Supporting Structures Design Space
Implementation Mechanism Design Space
Finding Concurrency Design Space
Decomposition: Task Decomposition, Data Decomposition, Data-Flow Decomposition
Dependency Analysis: Group Task, Order Task, Data Sharing
Design Evaluation
Algorithm Structure Design Space
Organize by Tasks: Task Parallelism, Divide and Conquer
Organize by Data Decomposition: Geometric Decomposition, Recursive Data
Organize by Flow of Data: Pipeline, Event-Based Coordination
Supporting Structures Design Space
Program Structures: SPMD, Master/Worker, Loop Parallelism, Fork/Join
Data Structures: Shared Data, Shared Queue, Distributed Array
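As one concrete pairing of these structures, here is a hedged C++ sketch of the Master/Worker pattern built on a Shared Queue (my illustration, not from the slides): the master enqueues work items, and the workers pull from the queue concurrently until it is empty.

#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// A minimal thread-safe shared queue (the "Shared Queue" structure).
class SharedQueue {
    std::queue<int> items_;
    std::mutex m_;
public:
    void push(int v) {
        std::lock_guard<std::mutex> lock(m_);
        items_.push(v);
    }
    bool try_pop(int& v) {
        std::lock_guard<std::mutex> lock(m_);
        if (items_.empty()) return false;
        v = items_.front();
        items_.pop();
        return true;
    }
};

int main() {
    SharedQueue queue;
    for (int i = 0; i < 16; ++i) queue.push(i);  // the master enqueues work

    auto worker = [&queue](int id) {
        int item;
        while (queue.try_pop(item))              // workers drain the queue
            std::printf("worker %d processed item %d\n", id, item);
    };

    std::vector<std::thread> workers;
    for (int id = 0; id < 4; ++id) workers.emplace_back(worker, id);
    for (auto& t : workers) t.join();
}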
8 Rules for “Thinking in Parallel”
1. Identify truly independent computations.
2. Implement parallelism at the highest level possible.
3. Plan early for scalability to take advantage of increasing numbers of cores.
4. Hide parallelization in libraries.
5. Use the right parallel programming model.
6. Never assume a particular order of execution.
7. Use non-shared storage whenever possible (rules 6 and 7 are illustrated in the sketch after this list).
8. Dare to change the algorithm for a better chance of parallelism.
And a bonus rule: be creative and pragmatic.
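To illustrate rules 6 and 7, here is a hedged C++ sketch (my example, not from the slides): each thread sums its own slice into private storage, so the result depends neither on shared mutable state nor on which thread finishes first.

#include <cstdio>
#include <thread>
#include <vector>

int main() {
    std::vector<int> data(1000, 1);
    const int num_threads = 4;
    // Rule 7: each thread writes only to its own slot; no locks needed.
    std::vector<long> partial(num_threads, 0);

    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&, t] {
            const std::size_t begin = data.size() * t / num_threads;
            const std::size_t end = data.size() * (t + 1) / num_threads;
            for (std::size_t i = begin; i < end; ++i)
                partial[t] += data[i];
        });
    }
    // Rule 6: join all threads before reading, so the answer does not
    // depend on any particular order of execution.
    for (auto& th : threads) th.join();

    long total = 0;
    for (long p : partial) total += p;
    std::printf("total = %ld\n", total);  // prints: total = 1000
}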
Pragmatic Parallelization
Programming, in practice, is pragmatic. Most people prefer a practical “good enough” solution over an “ideal” solution.
[Slide: a spectrum running from Chaotic through Pragmatic to Bureaucratic as the importance of rules increases.]
Parallel Programming Support
CPU: MS TPL, Intel TBB, MPI, PVM, pthreads, PLINQ, OpenMP, MS Concurrency Runtime, MS Dryad, MS C++ AMP, etc.
GPU: NVIDIA CUDA C, ATI APP, OpenCL, Microsoft DirectCompute, Brooks, Shaders, PGI CUDA Fortran, GPU.Net, HMPP, Thrust, etc.
Links and References
Patterns for Parallel Programming. Mattson, Timothy G.; Sanders, Beverly A.; Massingill, Berna L. (2004). Addison-Wesley Professional.
An Introduction to Parallel Programming. Pacheco, Peter (2011). Morgan Kaufmann.
The Art of Concurrency. Breshears, Clay (2009). O’Reilly Media.
Wikipedia
http://newsroom.intel.com/community/intel_newsroom/blog/2011/09/15/the-future-accelerated-multi-core-goes-mainstream-computing-pushed-to-extremes
http://perilsofparallel.blogspot.com/2011/09/conversation-with-intels-james-reinders.html
Q&A