Post on 11-Jan-2016
Parallel Programming
Philippas TsigasChalmers University of TechnologyComputer Science and Engineering
Department
© Philippas Tsigas
2
WHY PARALLEL PROGRAMMING IS ESSSENTIAL IN DISTRIBUTED SYSTEMS AND NETWORKING
© Philippas Tsigas
3tsigas@cs.chalmers.se Philippas Tsigas
How did we reach there?
Picture from Pat Gelsinger, IntelDeveloper Forum, Spring 2004 (Pentium at 90W)
4
3GHz
Concurrent Software Becomes Essential
3GHz
6GH
z
3GHz
3GH
z12G
Hz
24GH
z
1 Core
2 Cores4 Cores
8 CoresOur work is to help the programmer to develop efficient parallel programs but also survive the multicore
transition. © Philippas Tsigas
1) Scalability becomes an issue for all software.
2) Modern software development relies on the ability to compose libraries into larger programs.
5
DISTRIBUTED APPLICATIONS
© Philippas Tsigas
6
Data Sharing: Gameplay Simulation as an example
This is the hardest problem… 10,000’s of objects Each one contains mutable state Each one updated 30 times per second Each update touches 5-10 other objects
Manual synchronization (shared state concurrency) is
hopelessly intractable here. Solutions?
Slide:Tim SweeneyCEO Epic GamesPOPL 2006
© Philippas Tsigas
7
Distributed Applications Demand
Quite High Level Data Sharing:
Commercial computing (media and information processing)
Control Computing (on board flight-control system)
© Philippas Tsigas
8
NETWORKING
© Philippas Tsigas
9© Philippas Tsigas
40 multithreaded packet-processing engines
http://www.cisco.com/assets/cdc_content_elements/embedded-video/routers/popup.html On chip, there are 40 32-bit,
1.2-GHz packet-processing engines. Each engine works on a packet from birth to death within the Aggregation Services Router.
each multithreaded engine handles four threads (each thread handles one packet at a time) so each QuantumFlow Processor chip has the ability to work on 160 packets concurrently
10
DATA SHARING
© Philippas Tsigas
11tsigas@cs.chalmers.se Philippas Tsigas
Blocking Data Sharing
class Counter { int next = 0;
synchronized int getNumber () { int t;
t = next; next = t + 1;
return t; }
}
next = 0
A typical Counter Impl: Thread1:getNumber()
t = 0
Thread2:getNumber()
result=0
Lock released
Lock acquired
result=1next = 1next = 2
12tsigas@cs.chalmers.se Philippas Tsigas
Do we need Synchronization?
class Counter { int next = 0;
int getNumber () { int t;
t = next; next = t + 1;
return t; }
}
What can go wrong here?
next = 0
Thread1:getNumber()
t = 0
Thread2:getNumber()
t = 0
result=0
next = 1
result=0
13
Blocking Synchronization = Sequential Behavior
© Philippas Tsigas
14
BS ->Priority Inversion
A high priority task is delayed due to a low priority task holding a shared resource. The low priority task is delayed due to a medium priority task executing.
Solutions: Priority inheritance protocols Works ok for single processors, but for multiple processors …
Task H:
Task M:
Task L:
© Philippas Tsigas
15
Critical Sections + Multiprocessors
Reduced Parallelism. Several tasks with overlapping critical sections will cause waiting processors to go idle.
Task 1:
Task 2:
Task 3:
Task 4:
© Philippas Tsigas
16© Philippas Tsigas
The BIGEST Problem with Locks?
Blocking Locks are not composable
All code that accesses a piece of shared state must know and obey the locking convention, regardless of who wrote the code or where it resides.
17
Interprocess Synchronization = Data Sharing
Synchronization is required for concurrency Mutual exclusion (Semaphores, mutexes, spin-locks, disabling
interrupts: Protects critical sections)- Locks limits concurrency- Busy waiting – repeated checks to see if lock has been
released or not- Convoying – processes stack up before locks- Blocking Locks are not composable- All code that accesses a piece of shared state must
know and obey the locking convention, regardless of who wrote the code or where it resides.
A better approach is to use data structuresthat are …
18tsigas@cs.chalmers.se Philippas Tsigas
A Lock-free Implementation
19
LET US START FROM THE BEGINING
© Philippas Tsigas
20
Object in shared memory- Supports some set of operations (ADT)- Concurrent access by many processes/threads- Useful to e.g.
Exchange data between threads Coordinate thread activities
Shared MemoryData-structures
P1 P2
P3
Op A
Op B
P4
Op B
Op B
Op B
Op A
21Borrowed from H. Attiya
21
Executing Operations
P1
invocation response
P2
P3
2222
Interleaving Operations
Concurrent execution
2323
Interleaving Operations
(External) behavior
2424
Interleaving Operations, or Not
Sequential execution
25
May 1, 2008
Concurrent algorithms @ COVA 25
Interleaving Operations, or Not
Sequential behavior: invocations & response alternate and match (on process & object)
Sequential specification: All the legal sequential behaviors, satisfying the semantics of the ADT- E.g., for a (LIFO) stack: pop returns the last item
pushed
26
May 1, 2008
Concurrent algorithms @ COVA 26
Correctness: Sequential consistency
[Lamport, 1979]
For every concurrent execution there is a sequential execution that- Contains the same operations- Is legal (obeys the sequential specification)- Preserves the order of operations by the same
process
2727
Sequential Consistency: Examples
push(4)
pop():4push(7)
Concurrent (LIFO) stack
push(4)
pop():4push(7)
Last In First Out
2828
Sequential Consistency: Examples
push(4)
pop():7push(7)
Concurrent (LIFO) stack
Last In First Out
29
Safety: Linearizability
Linearizable data structure - Sequential specification defines legal sequential
executions- Concurrent operations allowed to be interleaved - Operations appear to execute atomically
External observer gets the illusion that each operation takes effect instantaneously at some point between its invocation and its response
time
push(4)
pop():4push(7)
push(4)
pop():4push(7)
Last In First Out
concurrent LIFO stack
T1
T2
30
Safety II
An accessible node is never freed.
31
Liveness
Non-blocking implementations- Wait-free implementation of a DS
[Lamport, 1977]Every operation finishes in a finite number of
its own steps.- Lock-free (≠ FREE of LOCKS)
implementation of a DS [Lamport, 1977]
At least one operation (from a set of concurrent operation) finishes in a finite number of steps (the data structure as a system always make progress)
32
Liveness II
every garbage node is eventually collected
33
Abstract Data Types (ADT)
Cover most concurrent applications At least encapsulate their data needs An object-oriented programming point of view
Abstract representation of data& set of methods (operations) for accessing it Signature Specification data
34
Implementing High-Level ADT
data
data
-------------------------------------------------------
-----------------------------------------------
-------------------------------------
-------------------------------------------------------
-----------------------------------------------
-------------------------------------
Using lower-level ADTs
& procedures
35
Lower-Level Operations
High-level operations translate into primitives on base objects that are available on H/W Obvious: read, write Common: compare&swap
(CAS), LL/SC, FAA
36
Our Research: Non-Blocking Synchronization for Accessing Shared Data
We are trying to design efficient non-blocking implementations of building blocks that are used in concurrent software design for data sharing.