Thinking in Parallel (transcript)
SPONSORED BY
HPC & GPU Supercomputing Groups
Non-profit, free-to-join groups hosted on www.meetup.com.
A group for the application of cutting-edge HPC & GPU supercomputing technology to cutting-edge business problems.
Started in January 2011 with the New York group; today there are 1,000 members across groups in Boston, Silicon Valley, Chicago, New Mexico, Denver, Seattle, Austin, Washington D.C., South Florida, and Tokyo.
Please visit www.SupercomputingGroup.com for the South Florida group.
PARALLEL THINKING (THINKING IN PARALLEL)
BY ADNAN BOZ
Krasnoarmeysky Prospekt 25, November 19, 2011
Many thanks to Andrew Sheppard for providing supporting content for this presentation.
Andrew is the organizer of the New York meetup group and a financial consultant with extensive experience in quantitative financial analysis, trading-desk software development, and technical management. He is also the author of the forthcoming book “Programming GPUs”, to be published by O’Reilly (www.oreilly.com).
“Thinking in Parallel” is the term for making the conceptual leap that takes a developer from writing programs that run on hardware with little real parallelism to writing programs that execute efficiently on massively parallel hardware, with hundreds and thousands of cores, leading to very substantial speedups (x10, x100, and beyond).
“[A]nd in this precious phial is the power to think twice as fast, move twice as quickly, do twice as much work in a given time as you could otherwise do.”
—H. G. Wells, “The New Accelerator” (1901)
Serial programs are traditional (most programs are serial), sequential (just a sequence of tasks), and their flow is relatively easy to reason about.
For example: Prefix Sum (Scan)

data[] = {5, 1, 8, 11, 4}
forall i from 1 to n-1 do
    data[i] = data[i - 1] + data[i]

5    1        8            11                4
5    5 + 1    5 + 1 + 8    5 + 1 + 8 + 11    5 + 1 + 8 + 11 + 4
5    6        14           25                29

where the binary associative operator is summation.
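As a concrete illustration, here is a minimal serial inclusive scan in C++ (a sketch of the pseudocode above; the variable names are mine):

#include <cstdio>
#include <vector>

int main() {
    std::vector<int> data = {5, 1, 8, 11, 4};

    // Each element becomes the sum of itself and every element before it.
    for (std::size_t i = 1; i < data.size(); ++i)
        data[i] = data[i - 1] + data[i];

    for (int v : data) std::printf("%d ", v);  // prints: 5 6 14 25 29
    std::printf("\n");
}

Note that every iteration depends on the previous one, which is exactly what makes this loop hard to parallelize naively.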
But sequential thinking is about to change, because annual serial performance improvement has slowed from 50% to 20% since 2002, and we cannot expect huge improvements in serial performance anymore. Therefore, programming is going parallel.
Multi- and many-core computing is hitting the mainstream. Today we have 4-12 cores, in a few years 32 cores, and Intel is predicting that in 2015 we will have 100 cores.
AMD Opteron (12 cores), IBM Power 7 (8), Intel Xeon (12), Sun UltraSPARC T3 (16), Cell (9), NVIDIA GeForce (1024), Adapteva (4096), Tilera Tile-Gx (100)
There is a lot of effort going into developing good runtimes, compilers, debuggers, and OS support:
MS TPL, Intel TBB, MPI, PVM, pthreads, PLINQ, OpenMP, MS Concurrency Runtime, MS Dryad, MS C++ AMP, NVIDIA CUDA C, ATI APP, OpenCL, Microsoft DirectCompute, Brooks, Shaders, PGI CUDA Fortran, GPU.Net, HMPP, Thrust etc.
More than one hundred parallel programming languages in 2008 ( http://perilsofparallel.blogspot.com/2008/09/101-parallel-languages-part-1.html or http://tinyurl.com/3p4a8to )
What are some problems moving into a multi-core world? A lot of companies have a huge code base developed with little or no parallelism. Converting those great products to multi-core will take time.
We haven’t been teaching much about parallelism for many years. Most students we educated in the last 10 years know very little about parallelism.
Engineers need to understand parallelism, and all the issues that come with it, in order to utilize all these cores.
Parallel thinking is not the latest API, library or hardware. Parallel thinking is a set of core ideas we have to identify and teach our students and workforce.
Writing good serial software was hard; writing good parallel software is harder: it requires new tools, new techniques, and a new “Thinking in Parallel” mindset.
[Chart: Performance vs. Time. Multi-core reaches the desktop in 2004; from there, hardware potential keeps climbing, parallel applications track it while serial applications level off, and the widening gap is the competitive advantage.]
Parallel Prefix Sum
Parallel Prefix Sum (Scan) with CUDA (NVIDIA) (http://tinyurl.com/3s9as2j)
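A classic data-parallel formulation is the Hillis-Steele scan, where every pass adds in the element one power-of-two stride to the left. The following C++ sketch is illustrative only (it is written serially and is not NVIDIA's implementation); on a GPU, each element of the inner loop would be handled by its own thread in the same pass:

#include <cstdio>
#include <vector>

std::vector<int> inclusive_scan(std::vector<int> cur) {
    const std::size_t n = cur.size();
    std::vector<int> next(n);
    for (std::size_t stride = 1; stride < n; stride *= 2) {
        for (std::size_t i = 0; i < n; ++i)  // on a GPU: one thread per i
            next[i] = (i >= stride) ? cur[i - stride] + cur[i] : cur[i];
        cur.swap(next);
    }
    return cur;
}

int main() {
    for (int v : inclusive_scan({5, 1, 8, 11, 4}))
        std::printf("%d ", v);  // prints: 5 6 14 25 29
    std::printf("\n");
}

Instead of n - 1 sequential additions, the scan now finishes in about log2(n) parallel passes.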
Where to start?
Concurrency: a programming issue. Single processor. Goal: running multiple interleaved threads. Only one thread executes at any given time.
Parallelism: a property of the machine. Multi-processor. Goal: speedup. Threads are executed simultaneously.
[Diagram: two timelines of Tasks A and B. Under concurrency, Thread 1 and Thread 2 alternate on a single processor; under parallelism, Thread 1 and Thread 2 run at the same time.]
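A minimal C++ illustration of the distinction (my sketch, not from the slides): on a multi-core machine the two threads below may truly run simultaneously, while on a single core the OS merely interleaves them.

#include <cstdio>
#include <thread>

void task(const char* name) {
    for (int i = 0; i < 3; ++i)
        std::printf("%s, step %d\n", name, i);  // interleaving varies run to run
}

int main() {
    std::thread a(task, "Task A");  // both threads are started...
    std::thread b(task, "Task B");
    a.join();                       // ...and run concurrently until joined
    b.join();
}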
Flynn’s Taxonomy of Architectures
Single Instruction/Single Data (SISD)
Single Instruction/Multiple Data (SIMD)
Multiple Instruction/Single Data (MISD)
Multiple Instruction/Multiple Data (MIMD)
SISD vs. SIMD
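To make the SISD/SIMD distinction concrete, here is a hedged C++ sketch (assuming a compiler with OpenMP SIMD support, e.g. -fopenmp-simd): the first loop operates on one data element per instruction, while the pragma in the second asks the compiler to apply one instruction to several adjacent elements at once.

#include <cstddef>

// SISD: one instruction, one data element at a time.
void add_sisd(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}

// SIMD: the vectorized loop adds several elements per instruction.
void add_simd(const float* a, const float* b, float* out, std::size_t n) {
    #pragma omp simd
    for (std::size_t i = 0; i < n; ++i)
        out[i] = a[i] + b[i];
}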
Parallel Programming Methodology
An iterative cycle: Measure, Design, Code, Test, Analyze, Proceed.
Analyzing Parallelism
Amdahl’s Law helps to predict the theoretical maximum speedup on a fixed problem size:
S = 1 / (rs + rp / n)
Gustafson’s Law proposes that larger problems can be solved by scaling the parallel computing power:
S = rs + n · rp
where rs is the serial fraction of the work, rp = 1 - rs is the parallel fraction, and n is the number of processors.
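As an illustrative worked example (the numbers are mine, not from the slides): with rs = 0.1, rp = 0.9, and n = 100 processors, Amdahl’s Law gives S = 1 / (0.1 + 0.9/100) ≈ 9.2, while Gustafson’s Law gives S = 0.1 + 100 · 0.9 = 90.1. The fixed-size view is capped near 1/rs = 10 no matter how many cores you add, but growing the problem keeps the extra cores busy.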
Design Patterns
Finding Concurrency Design Space
Algorithm Structure Design Space
Supporting Structures Design Space
Implementation Mechanism Design Space
Finding Concurrency Design Space
Decomposition: Task Decomposition, Data Decomposition, Data-Flow Decomposition
Dependency Analysis: Group Task, Order Task, Data Sharing
Design Evaluation
Algorithm Structure Design Space
Organize by Tasks: Task Parallelism, Divide and Conquer
Organize by Data Decomposition: Geometric Decomposition, Recursive Data
Organize by Flow of Data: Pipeline, Event-Based Coordination
Supporting Structures Design Space
Program Structures: SPMD, Master/Worker, Loop Parallelism, Fork/Join
Data Structures: Shared Data, Shared Queue, Distributed Array
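As one concrete pairing of these structures, here is a hedged C++ sketch of the Master/Worker pattern built on a Shared Queue (my illustration, not from the slides): the master enqueues work items, and the workers pull from the queue concurrently until it is empty.

#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// A minimal thread-safe shared queue (the "Shared Queue" structure).
class SharedQueue {
    std::queue<int> items_;
    std::mutex m_;
public:
    void push(int v) {
        std::lock_guard<std::mutex> lock(m_);
        items_.push(v);
    }
    bool try_pop(int& v) {
        std::lock_guard<std::mutex> lock(m_);
        if (items_.empty()) return false;
        v = items_.front();
        items_.pop();
        return true;
    }
};

int main() {
    SharedQueue queue;
    for (int i = 0; i < 16; ++i) queue.push(i);  // the master enqueues work

    auto worker = [&queue](int id) {
        int item;
        while (queue.try_pop(item))              // workers drain the queue
            std::printf("worker %d processed item %d\n", id, item);
    };

    std::vector<std::thread> workers;
    for (int id = 0; id < 4; ++id) workers.emplace_back(worker, id);
    for (auto& t : workers) t.join();
}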
8 Rules for “Thinking in Parallel”
1. Identify truly independent computations.
2. Implement parallelism at the highest level possible.
3. Plan early for scalability to take advantage of increasing numbers of cores.
4. Hide parallelization in libraries.
5. Use the right parallel programming model.
6. Never assume a particular order of execution.
7. Use non-shared storage whenever possible (rules 6 and 7 are illustrated in the sketch after this list).
8. Dare to change the algorithm for a better chance of parallelism.
And a bonus rule: be creative and pragmatic.
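To illustrate rules 6 and 7, here is a hedged C++ sketch (my example, not from the slides): each thread sums its own slice into private storage, so the result depends neither on shared mutable state nor on which thread finishes first.

#include <cstdio>
#include <thread>
#include <vector>

int main() {
    std::vector<int> data(1000, 1);
    const int num_threads = 4;
    // Rule 7: each thread writes only to its own slot; no locks needed.
    std::vector<long> partial(num_threads, 0);

    std::vector<std::thread> threads;
    for (int t = 0; t < num_threads; ++t) {
        threads.emplace_back([&, t] {
            const std::size_t begin = data.size() * t / num_threads;
            const std::size_t end = data.size() * (t + 1) / num_threads;
            for (std::size_t i = begin; i < end; ++i)
                partial[t] += data[i];
        });
    }
    // Rule 6: join all threads before reading, so the answer does not
    // depend on any particular order of execution.
    for (auto& th : threads) th.join();

    long total = 0;
    for (long p : partial) total += p;
    std::printf("total = %ld\n", total);  // prints: total = 1000
}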
Pragmatic Parallelization
Programming, in practice, is pragmatic. Most people prefer a practical “good enough” solution over an “ideal” solution.
[Slide: a spectrum running from Chaotic through Pragmatic to Bureaucratic as the importance of rules increases.]
Parallel Programming Support
CPU: MS TPL, Intel TBB, MPI, PVM, pthreads, PLINQ, OpenMP, MS Concurrency Runtime, MS Dryad, MS C++ AMP, etc.
GPU: NVIDIA CUDA C, ATI APP, OpenCL, Microsoft DirectCompute, Brooks, Shaders, PGI CUDA Fortran, GPU.Net, HMPP, Thrust, etc.
Links and References
Patterns for Parallel Programming. Mattson, Timothy G.; Sanders, Beverly A.; Massingill, Berna L. (2004). Addison-Wesley Professional.
An Introduction to Parallel Programming. Pacheco, Peter (2011). Morgan Kaufmann.
The Art of Concurrency. Breshears, Clay (2009). O’Reilly Media.
Wikipedia
http://newsroom.intel.com/community/intel_newsroom/blog/2011/09/15/the-future-accelerated-multi-core-goes-mainstream-computing-pushed-to-extremes
http://perilsofparallel.blogspot.com/2011/09/conversation-with-intels-james-reinders.html
Q&A