Parallel Programming Philippas Tsigas Chalmers University of Technology Computer Science and...

Parallel Programming

Philippas TsigasChalmers University of TechnologyComputer Science and Engineering

Department

WHY PARALLEL PROGRAMMING IS ESSSENTIAL IN DISTRIBUTED SYSTEMS AND NETWORKING

3tsigas@cs.chalmers.se Philippas Tsigas

How did we reach there?

Picture from Pat Gelsinger, IntelDeveloper Forum, Spring 2004 (Pentium at 90W)

Concurrent Software Becomes Essential

1 Core

2 Cores4 Cores

8 CoresOur work is to help the programmer to develop efficient parallel programs but also survive the multicore

transition. © Philippas Tsigas

1) Scalability becomes an issue for all software.

2) Modern software development relies on the ability to compose libraries into larger programs.

DISTRIBUTED APPLICATIONS

Data Sharing: Gameplay Simulation as an example

This is the hardest problem… 10,000’s of objects Each one contains mutable state Each one updated 30 times per second Each update touches 5-10 other objects

Manual synchronization (shared state concurrency) is

hopelessly intractable here. Solutions?

Slide:Tim SweeneyCEO Epic GamesPOPL 2006

Distributed Applications Demand

Quite High Level Data Sharing:

Commercial computing (media and information processing)

Control Computing (on board flight-control system)

NETWORKING

9© Philippas Tsigas

40 multithreaded packet-processing engines

http://www.cisco.com/assets/cdc_content_elements/embedded-video/routers/popup.html On chip, there are 40 32-bit,

1.2-GHz packet-processing engines. Each engine works on a packet from birth to death within the Aggregation Services Router.

each multithreaded engine handles four threads (each thread handles one packet at a time) so each QuantumFlow Processor chip has the ability to work on 160 packets concurrently

DATA SHARING

Blocking Data Sharing

class Counter { int next = 0;

synchronized int getNumber () { int t;

t = next; next = t + 1;

return t; }

next = 0

A typical Counter Impl: Thread1:getNumber()

Thread2:getNumber()

result=0

Lock released

Lock acquired

result=1next = 1next = 2

Do we need Synchronization?

class Counter { int next = 0;

int getNumber () { int t;

t = next; next = t + 1;

return t; }

What can go wrong here?

next = 0

Thread1:getNumber()

Thread2:getNumber()

result=0

next = 1

result=0

Blocking Synchronization = Sequential Behavior

BS ->Priority Inversion

A high priority task is delayed due to a low priority task holding a shared resource. The low priority task is delayed due to a medium priority task executing.

Solutions: Priority inheritance protocols Works ok for single processors, but for multiple processors …

Task H:

Task M:

Task L:

Critical Sections + Multiprocessors

Reduced Parallelism. Several tasks with overlapping critical sections will cause waiting processors to go idle.

Task 1:

Task 2:

Task 3:

Task 4:

16© Philippas Tsigas

The BIGEST Problem with Locks?

Blocking Locks are not composable

All code that accesses a piece of shared state must know and obey the locking convention, regardless of who wrote the code or where it resides.

Interprocess Synchronization = Data Sharing

Synchronization is required for concurrency Mutual exclusion (Semaphores, mutexes, spin-locks, disabling

interrupts: Protects critical sections)- Locks limits concurrency- Busy waiting – repeated checks to see if lock has been

released or not- Convoying – processes stack up before locks- Blocking Locks are not composable- All code that accesses a piece of shared state must

know and obey the locking convention, regardless of who wrote the code or where it resides.

A better approach is to use data structuresthat are …

A Lock-free Implementation

LET US START FROM THE BEGINING

Object in shared memory- Supports some set of operations (ADT)- Concurrent access by many processes/threads- Useful to e.g.

Exchange data between threads Coordinate thread activities

Shared MemoryData-structures

21Borrowed from H. Attiya

Executing Operations

invocation response

Interleaving Operations

Concurrent execution

Interleaving Operations

(External) behavior

Interleaving Operations, or Not

Sequential execution

May 1, 2008

Concurrent algorithms @ COVA 25

Interleaving Operations, or Not

Sequential behavior: invocations & response alternate and match (on process & object)

Sequential specification: All the legal sequential behaviors, satisfying the semantics of the ADT- E.g., for a (LIFO) stack: pop returns the last item

pushed

May 1, 2008

Concurrent algorithms @ COVA 26

Correctness: Sequential consistency

[Lamport, 1979]

For every concurrent execution there is a sequential execution that- Contains the same operations- Is legal (obeys the sequential specification)- Preserves the order of operations by the same

process

Sequential Consistency: Examples

push(4)

pop():4push(7)

Concurrent (LIFO) stack

push(4)

pop():4push(7)

Last In First Out

Sequential Consistency: Examples

push(4)

pop():7push(7)

Concurrent (LIFO) stack

Last In First Out

Safety: Linearizability

Linearizable data structure - Sequential specification defines legal sequential

executions- Concurrent operations allowed to be interleaved - Operations appear to execute atomically

External observer gets the illusion that each operation takes effect instantaneously at some point between its invocation and its response

push(4)

pop():4push(7)

push(4)

pop():4push(7)

Last In First Out

concurrent LIFO stack

Safety II

An accessible node is never freed.

Liveness

Non-blocking implementations- Wait-free implementation of a DS

[Lamport, 1977]Every operation finishes in a finite number of

its own steps.- Lock-free (≠ FREE of LOCKS)

implementation of a DS [Lamport, 1977]

At least one operation (from a set of concurrent operation) finishes in a finite number of steps (the data structure as a system always make progress)

Liveness II

every garbage node is eventually collected

Abstract Data Types (ADT)

Cover most concurrent applications At least encapsulate their data needs An object-oriented programming point of view

Abstract representation of data& set of methods (operations) for accessing it Signature Specification data

Implementing High-Level ADT

-------------------------------------------------------

-----------------------------------------------

-------------------------------------

-------------------------------------------------------

-----------------------------------------------

-------------------------------------

Using lower-level ADTs

& procedures

Lower-Level Operations

High-level operations translate into primitives on base objects that are available on H/W Obvious: read, write Common: compare&swap

(CAS), LL/SC, FAA

Our Research: Non-Blocking Synchronization for Accessing Shared Data

We are trying to design efficient non-blocking implementations of building blocks that are used in concurrent software design for data sharing.

Parallel Programming Philippas Tsigas Chalmers University of Technology Computer Science and...

Documents

Transcript of Parallel Programming Philippas Tsigas Chalmers University of Technology Computer Science and...

Zhang Fu, Marina Papatriantafilou, Philippas Tsigas Chalmers University of Technology, Sweden 1 ACM SAC 2010 ACM SAC 2011.

Lock-free Cuckoo Hashing Nhan Nguyen & Philippas Tsigas ICDCS 2014 Distributed Computing and Systems Chalmers University of Technology Gothenburg, Sweden.

Reactive Multi-word Synchronization for Multiprocessors · Reactive Multi-word Synchronization for Multiprocessors∗ Phuong Hoai Ha phuong@cs.chalmers.se Philippas Tsigas tsigas@cs.chalmers.se

Allis Chalmers - Steiner Tractor · Allis Chalmers. Allis Chalmers - Agco. A brief history . about Allis Chalmers. Allis Chalmers farm tractors were in production from . 1914 through

Chalmers Universty

Scalable and Lock-Free Concurrent Dictionaries Håkan Sundell Philippas Tsigas.

DISTRIBUTED SYSTEMS II REPLICATION –QUORUM CONSENSUS Prof Philippas Tsigas Distributed Computing and Systems Research Group.

DISTRIBUTED SYSTEMS II A POLYNOMIAL LOCAL SOLUTION TO MUTUAL EXCLUSION Prof Philippas Tsigas Distributed Computing and Systems Research Group.

DISTRIBUTED SYSTEMS II REPLICATION CNT. II Prof Philippas Tsigas Distributed Computing and Systems Research Group.

Chalmers Hyperintensionality

by Thomas W. Hertel and Marinos E. Tsigas I. … · by Thomas W. Hertel and Marinos E. Tsigas ... We deal in turn with production, ... The difference between this and the cif-based

Dynamic Load Balancing Using Work-Stealing 35 - …tsigas/papers/Work-Stealing-GPUs-Chapter.pdf · CHAPTER Dynamic Load Balancing Using Work-Stealing 35 Daniel Cederman and Philippas

Good Chalmers

DISTRIBUTED SYSTEMS II FAULT-TOLERANT BROADCAST Prof Philippas Tsigas Distributed Computing and Systems Research Group.

DISTRIBUTED SYSTEMS II AGREEMENT (2-3 PHASE COM.) Prof Philippas Tsigas Distributed Computing and Systems Research Group.

project chalmers

Fast and Lock-Free Concurrent Priority Queues for Multi-Thread Systems Håkan Sundell Philippas Tsigas.

Self-tuning Reactive Distributed Trees for Counting and Balancing Phuong Hoai Ha Marina Papatriantafilou Philippas Tsigas OPODIS ’04, Grenoble, France.

DISTRIBUTED SYSTEMS II REPLICATION Prof Philippas Tsigas Distributed Computing and Systems Research Group.

Understanding Performance of Concurrent Data Structures on Graphics Processors Daniel Cederman, Bapi Chatterjee, Philippas Tsigas Distributed Computing.