Using Software Refactoring to Form Parallel Programs: the ParaPhrase Approach
description
Transcript of Using Software Refactoring to Form Parallel Programs: the ParaPhrase Approach
Using Software Refactoring to Form Parallel Programs: the ParaPhrase Approach
Kevin Hammond, Chris BrownUniversity of St Andrews, Scotland
http://www.paraphrase-ict.eu
@paraphrase_fp7
The Dawn of a New Multicore Age
2AMD Opteron Magny-Cours , 6-Core (source: wikipedia)
The Near Future: Scaling toward Manycore
3
The Future: “megacore” computers?
Hundreds of thousands, or millions, of cores
4
CoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCore
CoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCore
CoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCore
CoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCoreCore
What will “megacore” computers look like?
Probably not just scaled versions of today’s multicore Perhaps hundreds of dedicated lightweight integer units Hundreds of floating point units (enhanced GPU designs) A few heavyweight general-purpose cores Some specialised units for graphics, authentication, network etc possibly soft cores (FPGAs etc) Highly heterogeneous
Probably not uniform shared memory NUMA is likely, even hardware distributed shared memory or even message-passing systems on a chip shared-memory will not be a good abstraction
5
The Implications for Programming
We must program heterogeneous systems in an integrated way
it will be impossible to program each kind of core differently
it will be impossible to take static decisions about placement etc
it will be impossible to know what each thread does
6
The Challenge
“Ultimately, developers should start thinking about tens, hundreds, and thousands of cores now in their algorithmic development and deployment pipeline.”
Anwar Ghuloum, Principal Engineer, Intel Microprocessor Technology Lab
“The dilemma is that a large percentage of mission-critical enterprise applications will not ``automagically'' run faster on multi-core servers. In fact, many will actually run slower. We must make it as easy as possible for applications programmers to exploit the latest developments in multi-core/many-core architectures, while still making it easy to target future (and perhaps unanticipated) hardware developments.”
Patrick Leonard, Vice President for Product DevelopmentRogue Wave Software
Programming Issues
We can muddle through on 2-8 cores maybe even 16 or so modified sequential code may work we may be able to use multiple programs to soak up cores BUT larger systems are much more challenging
typical concurrency techniques will not scale
How to build a wall
(with apologies to Ian Watson, Univ. Manchester)
How to build a wall faster
How NOT to build a wall
Task identification is not the only problem…Must also consider Coordination, communication, placement,scheduling, …
12
We need structureWe need abstraction
We don’t need another brick in the wall
Parallelism in the Mainstream
Mostly procedural do this, do that
Parallelism is a “bolt-on” afterthought: Threads Message passing Mutexes Shared Memory
Results in lots of pain Deadlocks race conditions synchronization non-determinism etc. etc.
A critique of typical current approaches
Applications programmers must be systems programmers insufficient assistance with abstraction too much complexity to manage
Difficult/impossible to scale, unless the problem is simple
Difficult/impossible to change fundamentals scheduling task structure migration
Many approaches provide libraries they need to provide abstractions 14
Thinking in Parallel
Fundamentally, programmers must learn to “think parallel” this requires new high-level programming constructs you cannot program effectively while worrying about deadlocks
etc they must be eliminated from the design!
you cannot program effectively while fiddling with communication etc this needs to be packaged/abstracted!
A Solution?
“The only thing that works for parallelism is functional programming”
Bob Harper, Carnegie Mellon
The ParaPhrase Project (ICT-2011-288570)
€3.5M FP7 STReP Project9 partners in 5 countries3 yearsStarts 1/10/11Coordinated from St Andrews
Project Consortium
ParaPhrase Aims
Our overall aim is to produce a new pattern-based approach to programming parallel systems.
19
ParaPhrase Aims (2)
Specifically,
1. develop high-level design and implementation patterns
2. develop new dynamic mechanisms to support adaptivity for heterogeneous multicore/manycore systems
20
ParaPhrase Aims (2)
3. verify that these patterns and adaptivity mechanisms can be used easily and effectively.
4. ensure that there is scope for widespread takeup
We are applying our work in two main language settingsErlang Commercial FunctionalC/C++ Imperative
21
Thinking in Parallel, Revisited
Direct programming using e.g. spawn
Parallel stream-based approaches
Coordination approaches
Pattern-based approaches
Avoid issues such as deadlock etc…
Parallelism by Construction!
Patterns…
Patterns Abstract generalised expressions of common algorithms
Map, Fold, Function Composition, Divide and Conquer, etc.
map(F, XS) -> [ F(X) || X <- XS].
23
The ParaPhrase Model
Refactorer
Erlang C/C++ Haskell
Patterns
Erlang C/C++ Haskell
Costing/profiling
Refactoring
Refactoring is about changing the structure of a program’s source code… while preserving the semantics
RefactorReview
Refactoring = Condition + Transformation
26
ParaPhrase Approach
Start bottom-up identify (strongly hygienic) components using refactoring
Think about the PATTERN of parallelism
Structure the components into a parallel program using refactoring
Restructure if necessary! using refactoring
Refactoring from Patterns
Static Mapping
28
Dynamic Re-Mapping
29
30
But will this scale??
31
THANK YOU!
http://www.paraphrase-ict.eu
@paraphrase_fp7
32
Generating Parallel Erlang Programs from High-Level Patterns using Refactoring
Chris Brown, Kevin HammondUniversity of St Andrews
May 2012
Wrangler: the Erlang Refactorer
1. Developed at the University of Kent Simon Thompson and Huiqing Li
2. Embedded in common IDEs: (X)Emacs, Eclipse.3. Handles full Erlang language4. Faithful to layout and comments5. Undo6. Built in Erlang, and applies to the tool itself
34
35
Sequential Refactoring
1. Renaming2. Inlining3. Changing scope4. Adding arguments5. Generalising Definitions6. Type Changes
36
Parallel Refactoring!
New approach to parallel programming Tool support allows programmers to think in parallel
Guides the programmer step by step Database of transformations Warning messages Costing/profiling to give parallel guidance
More structured than using e.g. spawn directly Helps us get it “Just Right”
Patterns…
Patterns Abstract generalised expressions of common algorithms
Map, Fold, Function Composition, Divide and Conquer, etc.
map(F, XS) -> [ F(X) || X <- XS].
38
…and Skeletons
Skeletons Implementations of patterns
Parallel Map, Farm, Workpool, etc.
39
40
Example Pattern: parallel Map
map(F, List) -> [ F(X) || (X) <- List ]
map(fun(X) -> X + 1 end, [ 1..10 ]) -> [ 1 + 1, 2 + 1, ... 10 + 1 ]
map (Complexfun, Largelist) -> [ Complexfun(X1), ...
Can be executed in parallel provided the results are independent
41
Example Implementation: Data Parallel
42
Implementation: Task Farm
[t1, …, tn]
[t1, t5, t9, …] [t2, t6, t10, …] [t3, t7, t11, …] [t4, t8, t12, …]
w w w w
[r1, r5, r9, …] [r2, r6, r10, …] [r3, r7, r11, …] [r4, r8, r12, …]
[r1, …, rn]
43
Implementation: Workpool
44
Implementation: mapReduce
mapF
… reduceF
input data
reduceF
…
partially-reducedresults
partitionedinput data
……
mapF
…
intermediate data sets
resultsreduceF
localreduce function
overallreduce function
mappingfunction
Map Skeleton
45
worker(Pid, F, [X]) -> Pid ! F(X).
accumulate(0, Result) -> Result;accumulate(N, Result) -> receive Element -> accumulate(N-1, [Element|Result]) end.
parallel_map(F, List) -> lists:foreach( fun (Task) -> spawn(skeletons2, worker, [self(), F, [Task]]) end, List ), lists:reverse(accumulate(length(List), [])).
Fibonacci
46
fib(0) -> 0;fib(1) -> 1;fib(N) -> fib(N - 1) + fib(N - 2).
47
Divide-and-Conquer (wikipedia)
Classical Divide and Conquer
1. Split the input into N tasks2. Compute over the N tasks3. Combine the results
48
Fibonacci
49
fib(0) -> 0;fib(1) -> 1;fib(N) -> fib(N - 1) + fib(N - 2).
Introduce N-1 as a local definition
Fibonacci
50
fib(0) -> 0;fib(1) -> 1;fib(N) ->
L = N-1, fib(L) + fib(N - 2).
Fibonacci
51
fib(0) -> 0;fib(1) -> 1;fib(N) ->
L = N-1, fib(L) + fib(N - 2).
Introduce N-2 as a local definition
Fibonacci
52
fib(0) -> 0;fib(1) -> 1;fib(N) ->
{L,R} = {N-1, N-2}, fib(L) + fib(R).
Fibonacci
53
fib(0) -> 0;fib(1) -> 1;fib(N) ->
{L,R} = {N-1, N-2}, fib(L) + fib(R).
Introduce task parallelism for the left branch of the fib…
Fibonacci
54
fib(0) -> 0;fib(1) -> 1;fib(N) ->
{L,R} = {N-1, N-2},spawn(?MODULE, fib_worker,
[self(),L],S1 = receive R1 -> R1 end,
S1 + fib(R).
fib_worker(Pid, N) -> Pid ! fib(N).
Fibonacci
55
fib(0) -> 0;fib(1) -> 1;fib(N) ->
{L,R} = {N-1, N-2},spawn(?MODULE, fib_worker,
[self(),L],S1 = receive R1 -> R1 end,
S1 + fib(R).Introduce task parallelism for the
right branch of the fib…
Fibonacci
56
fib(0) -> 0;fib(1) -> 1;fib(N) ->
{L,R} = {N-1, N-2},spawn(?MODULE, fib_worker,
[self(),L],
spawn(?MODULE, fib_worker, [self(),R],
S1 = receive R1 -> R1 end,S2 = receive R2 -> R2 end,
S1 + S2.
Fibonacci – Adding a Threshold
57
fib(0) -> 0;fib(1) -> 1;fib(N) ->
{L,R} = {N-1, N-2},spawn(?MODULE, fib_worker,
[self(),L],
spawn(?MODULE, fib_worker, [self(),R],
S1 = receive R1 -> R1 end,S2 = receive R2 -> R2 end,
S1 + S2.
Fibonacci – Adding a Threshold
58
fib(0, T) -> 0;fib(1, T) -> 1;fib(N, T) -> {L,R} = {N-1, N-2}, case T(N) of true -> fib(L, T) + fib(R, T); false ->
spawn(?MODULE, fib_worker, [self(),L, T],
spawn(?MODULE, fib_worker, [self(),R, T],
S1 = receive R1 -> R1 end,S2 = receive R2 -> R2 end,
S1 + S2 end.
Fibonacci
59
fib_worker(Pid, N, T) -> Pid ! fib(N, T).
fib (200, fun(X) -> X < 20 end).
Speedups for Fibonacci
60
1 2 3 4 5 6 7 80
1
2
3
4
5
6
7
8
9
LinearSpeedup
No Threshold
Conclusions
Functional Programming First Class Concurrency Refactoring
makes parallelism much easiergood speedups are possible
Conclusions
Refactoring tool support: Enhances creativity Guides a programmer through steps to achieve parallelism Warns the user if they are going wrong Avoids common pitfalls Helps with understanding and intuition
Brown. Loidl and Hammond“ParaForming Forming Parallel Haskell Programs using Novel Refactoring Techniques”To Appear in Proc. 2011 Trends in Functional Programming (TFP), Madrid, Spain, 2012
Future Work
More parallel refactorings Database of parallel skeleton templates Refactoring support for parallel patterns
New (unified) framework for C/C++/Erlang/etc. Refactoring language (DSL) for expressing transformations +
conditions Language for expressing patterns?
Cost directed refactoring Proving refactorings correct
Funded by
• SCIEnce (EU FP6), Grid/Cloud/Multicore coordination• €3.2M, 2005-2012
• Advance (EU FP7), Multicore streaming• €2.7M, 2010-2013
• HPC-GAP (EPSRC), Legacy system on thousands of cores• £1.6M, 2010-2014
• Islay (EPSRC), Real-time FPGA streaming implementation• £1.4M, 2008-2011
• ParaPhrase (EU FP7), Patterns for heterogeneous multicore• €2.6M, 2011-2014
64
Industrial Connections
SAP GmbH, KarlsrüheBAe SystemsSelex GalileoBioId GmbH, StuttgartPhilips HealthcareSoftware Competence Centre, HagenbergMellanox Inc.Erlang Solutions LtdMicrosoft ResearchWell-Typed
65
THANK YOU!
http://www.paraphrase-ict.eu
@paraphrase_fp7
67