Lectures on Parallel and Distributed
Algorithms
COMP 523: Advanced Algorithmic Techniques
Lecturer: Dariusz Kowalski
Overview
These lectures:
• Parallel machine
– Prefix computation
• Distributed computing
– Consensus problem
Parallel machine - model
• Set of n processors and m memory cells
• Computation in synchronized rounds:
– during one round each processor performs either a local computation step (constant local cache) or a read/write to shared memory
• Minimize:
– Time
– Work (total number of processor steps)
– Number of processors
– Additional memory
Types of parallel machines
• EREW: Exclusive Read Exclusive Write
• CREW: Concurrent Read Exclusive Write
• ERCW: Exclusive Read Concurrent Write
• CRCW: Concurrent Read Concurrent Write
• In each round a cell can be either read or written
• Exclusive Read/Write: only one processor can read/write to a memory cell during one round
• Concurrent Read/Write: many processors can read/write to a memory cell during one round
• Concurrent Write conflict resolution: arbitrary, maximum, sum, etc.
Problem - prefix computation
• Input: m memory cells with integers
• Goal: for each cell i compute a function F(1, i), where F(·,·) is such that
– F(i, k) can be computed in constant time from F(i, j) and F(j+1, k) for any j between i and k
– F(i, i) is the value stored originally in cell i
• Examples:
– Computing a maximum (for every prefix)
– Computing a sum (for every prefix)
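A sequential reference implementation may help pin down the definition; this is only a sketch (the name `prefix` and the sample values are illustrative, not from the slides), computing each F(1, i) from F(1, i-1) and F(i, i):

```python
# Sequential reference for prefix computation: any associative combine
# F works, e.g. max or addition from the examples above.

def prefix(cells, combine):
    """Return [F(1,1), F(1,2), ..., F(1,m)]."""
    out, acc = [], None
    for v in cells:
        # F(1, i) computed in constant time from F(1, i-1) and F(i, i)
        acc = v if acc is None else combine(acc, v)
        out.append(acc)
    return out
```

For example, `prefix([3, 1, 4, 1, 5], max)` returns `[3, 3, 4, 4, 5]`, and with addition as the combine it returns `[3, 4, 8, 9, 14]`.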
CRCW - simple solution
• Let the result of concurrent writing by several processors be the combination of the written values according to the function F(·,·)
• m memory cells, m additional memory cells, m² processors
Algorithm:
• The processor with ID i·m + j, for 1 ≤ i ≤ j ≤ m, reads cell i and then writes the value to additional cell j
Time: 2   Memory: m   Work: O(m²)
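The two rounds above can be simulated sequentially; a minimal sketch, assuming the concurrent-write conflict on a cell is resolved by combining all written values with F (the function name is illustrative):

```python
# Simulating the CRCW simple solution: the m^2 "processors" become
# loop iterations, and the write conflict on each additional cell j
# is resolved by combining all written values with F.
from functools import reduce

def crcw_prefix(cells, combine):
    m = len(cells)
    extra = [None] * m                     # the m additional cells
    # Round 1: processor (i, j), for i <= j, reads cell i.
    # Round 2: all processors with the same j concurrently write to
    # additional cell j; the CRCW rule combines the values with F.
    for j in range(m):
        written = [cells[i] for i in range(j + 1)]
        extra[j] = reduce(combine, written)
    return extra
```

Each processor takes only 2 steps, but about m²/2 processors take part, which gives the O(m²) work bound.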
EREW - algorithm
• m memory cells, n = m/log m processors
• Additional array M[1…n]
Recursive Algorithm:
• Parallel preprocessing: each processor i sequentially computes the functions F(i log m + 1, i log m + 1), …, F(i log m + 1, (i+1) log m), then writes M[i] := F(i log m + 1, (i+1) log m)
• Parallel recursion (pointer jumping): in step t, 1 ≤ t ≤ log n, if i - 2^(t-1) > 0 then the processor with ID i reads M[i - 2^(t-1)] and combines it with its current value M[i] -- as if M[i - 2^(t-1)] corresponded to F((i - 2^t) log m + 1, (i - 2^(t-1)) log m) and M[i] corresponded to F((i - 2^(t-1)) log m + 1, i log m) -- and writes the result to M[i]
• Parallel post-processing: each processor i sequentially computes the functions F(1, i log m + 1), …, F(1, (i+1) log m) using the value F(1, i log m) stored in M[i] and the previously computed (in the preprocessing part) values F(i log m + 1, i log m + 1), …, F(i log m + 1, (i+1) log m)
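The three phases can be sketched as a sequential simulation (a sketch with illustrative names; the "processors" are loop iterations, the block size is taken as log m, and array indices are 0-based):

```python
# Sequential simulation of the EREW prefix algorithm: local prefixes
# per block, pointer jumping over the block totals, then fix-up.
import math

def erew_prefix(cells, combine):
    m = len(cells)
    b = max(1, int(math.log2(m)))            # block size ~ log m
    n = math.ceil(m / b)                     # n = m / log m processors
    blocks = [cells[i*b:(i+1)*b] for i in range(n)]

    # Parallel preprocessing: processor i computes the prefixes inside
    # its own block and stores the block total in M[i]
    local = []
    for blk in blocks:
        acc, pref = None, []
        for v in blk:
            acc = v if acc is None else combine(acc, v)
            pref.append(acc)
        local.append(pref)
    M = [pref[-1] for pref in local]

    # Parallel recursion (pointer jumping): after step t, M[i] covers
    # blocks max(0, i - 2^t + 1) .. i
    t = 1
    while (1 << (t - 1)) < n:
        jump = 1 << (t - 1)
        # all reads logically precede all writes in a round (EREW-safe)
        M = [combine(M[i - jump], M[i]) if i - jump >= 0 else M[i]
             for i in range(n)]
        t += 1

    # Parallel post-processing: prepend the prefix over preceding blocks
    out = []
    for i in range(n):
        for p in local[i]:
            out.append(p if i == 0 else combine(M[i - 1], p))
    return out
```

Preprocessing and post-processing take O(log m) sequential steps per processor; the jumping loop runs O(log n) rounds, matching the analysis on the next slide.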
Analysis
• Correctness: it is sufficient to show that after step t of the recursive part each location M[i] contains the computed value F(max{1, (i - 2^t) log m + 1}, i log m). Proof by induction: for t = 1 it follows from the initialization of M in the preprocessing part; the inductive step follows immediately from the recursive algorithm
• Memory: O(n) for the additional array M used during recursion, or none if we modify the original values in place
• Time: O(log m) -- parallel preprocessing and post-processing take O(log m), and the parallel recursion takes O(log m)
• Work: O(m) -- time O(log m) times the number of processors O(m/log m)
Conclusions
• Prefix computation
– Finding maximum/minimum
– Computing sums
for all m prefixes, in optimal logarithmic time and linear work
Textbook and Questions
• How can the prefix algorithms be modified for a smaller/larger number of processors?
• Given an expression containing brackets of the types ( ) and [ ], how can one check in parallel, in logarithmic time, whether it is a proper expression (each opening bracket has a corresponding closing counterpart)? Is it easier if there is only one kind of bracket in the expression?
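For the single-bracket variant, one standard reduction (a sketch, not from the slides; `balanced` is an illustrative name) maps '(' to +1 and ')' to -1 and runs a prefix-sum computation, which the EREW algorithm computes in logarithmic time; the expression is proper iff no prefix sum is negative and the total is zero:

```python
# Bracket matching for one bracket type via prefix sums.
def balanced(expr):
    depth, prefix = 0, []
    for c in expr:
        depth += 1 if c == '(' else -1   # '(' -> +1, ')' -> -1
        prefix.append(depth)             # in parallel: one prefix-sum instance
    # proper iff no prefix goes negative and the total is zero
    return all(d >= 0 for d in prefix) and (not prefix or prefix[-1] == 0)
```

Here `balanced("(()())")` is `True` and `balanced("())(")` is `False`; the two-bracket case needs an extra matching step on top of the sums.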
Distributed message-passing model
• Set of n processors/processes with distinct IDs {p1, …, pn}
• In each step each processor can (depending on the algorithm):
– send a message to any subset of the other processors
– receive incoming messages
– perform local computation
• Computation can be either (depending on the adversary):
– in synchronized rounds: in a round every processor performs three steps: local computation, sending and receiving, e.g., (p1, p2, p3), (p1, p2, p3), (p1, p2, p3), ...
– in an asynchronous pattern: steps are taken in some arbitrary order unknown to the processors, e.g., p1, p2, p2, p3, p2, p3, p2, p1, ...
Fault-tolerance
Failures in the system:
• Lack of synchrony: an unknown order of steps is generated by the adversary
• Processor crashes: the adversary decides which processors crash and chooses the steps at which these events occur
• Lost messages (not properly sent or received): the malicious processors/links are selected by the adversary
• Byzantine failures: processors may cheat, e.g., behave in any of the ways described above, mess up the content of messages, pretend to have a different ID, etc.
Analysis of distributed algorithms
Designing an algorithm, our goal is to prove:
• Correctness: nontrivial because of the lack of central information and because of failures
• Termination: nontrivial because of the lack of central control
• Efficiency:
– Time
– Work (total number of processor steps)
– Number of messages sent
– Total size of messages sent
Consensus in synchronous crash model
Consensus:
• Each processor has its initial value
• Goal: processors decide on the same value among the initial ones
• We require from the algorithm:
– Agreement: no two processors decide on different values
– Termination: each processor eventually decides, unless it fails
– Validity: if all initial values are the same, then this value is the decision
Model for consensus problem
We consider the model with crash failures (easier than others, e.g., Byzantine failures): a crashed processor stops every activity, and messages sent during the crash are delivered or lost arbitrarily (depending on the adversary)
• Asynchronous: impossible to solve even if only one processor can crash
• Synchronous: requires at least f + 1 rounds if f processors can crash
Consensus can be viewed as a kind of maximum-finding problem: let us agree on the largest initial value (although it could be easier, since we could agree on any initial value)
Flooding algorithm for consensus
• f-resilient algorithm: an algorithm that solves the consensus problem if at most f crashes occur
Flooding Algorithm:
• During each round j, 1 ≤ j ≤ f + 1, each processor sends to all other processors all the initial values about which it has already learnt
• Decision of a processor: if the set of collected initial values is a singleton, then decide on this value; otherwise decide on the default value (e.g., the maximum)
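A minimal sketch simulating the algorithm (assumptions: the crash schedule is given explicitly, and a processor crashing in round r sends nothing in that round -- one admissible adversarial choice, since the model also allows partial delivery):

```python
# Flooding Algorithm under crash failures, simulated sequentially.

def flooding_consensus(init, f, crash_round=None):
    """init: initial values; f: crash resilience; crash_round maps a
    processor index to the round in which it crashes. Returns the
    decisions, with None for crashed processors (they never decide)."""
    crash_round = crash_round or {}
    n = len(init)
    known = [{v} for v in init]              # values each processor learnt
    for rnd in range(1, f + 2):              # rounds 1 .. f + 1
        sent = [known[p].copy() if crash_round.get(p, f + 2) > rnd else None
                for p in range(n)]           # crashed this round: nothing sent
        for p in range(n):
            if crash_round.get(p, f + 2) <= rnd:
                continue                     # crashed: stops every activity
            for q in range(n):
                if q != p and sent[q] is not None:
                    known[p] |= sent[q]
        # after a clean round, all alive processors hold the same set
    decisions = []
    for p in range(n):
        if p in crash_round and crash_round[p] <= f + 1:
            decisions.append(None)
        else:
            vals = known[p]
            decisions.append(next(iter(vals)) if len(vals) == 1 else max(vals))
    return decisions
```

With no crashes and mixed inputs every processor collects both values and decides the default (the maximum); a processor that crashes before spreading its value simply drops out of the agreement.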
Flooding algorithm - example
4 processors, f = 2 crashes, default: maximum
(sets of collected values after each round; --- marks a crashed processor)

       Init   R1    R2    R3    Decision
p1 :   1      ---   ---   ---   ---
p2 :   0      0,1   ---   ---   ---
p3 :   0      0     0,1   0,1   1
p4 :   0      0     0     0,1   1

(Here p1 crashes during round 1 after its message reaches only p2, and p2 crashes during round 2 after its message reaches only p3.)
Analysis of Flooding algorithm
• Agreement: there is a round j (a clean round) in which no crash occurs (there are f + 1 rounds and at most f crashes, so some round must be clean). During this round all non-faulty processors exchange messages, hence the sets of collected values are the same after this round. Obviously they do not change after this round, and consequently all non-faulty processors decide the same
• Termination: after round f + 1
• Validity: if all initial values are the same, the set of collected initial values is always a singleton, and the decision is this value; otherwise the decision is the maximum among the received values
• Message complexity (total number of messages sent): O(f n²)
Decreasing message complexity
Modification of the algorithm:
• A processor sends messages to all processors during the first round, and during a round j > 1 only if in the previous round it has learnt about a new initial value
• Termination and Validity remain the same
• Agreement: similar argument; the only difference is that the message exchange may not happen within a clean round, but it completes by the end of the clean round: all previously learnt values were sent before this round, and new ones are sent during it
• Communication: there is a constant number of different values, and each of them is sent as a newly learnt value at most n times, each time to at most n - 1 processors, hence O(n²) messages in total
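The message count of the modified algorithm can be checked on small instances; a sketch assuming no crashes and a binary value domain (matching the constant number of different values above; the function name is illustrative):

```python
# Modified flooding: broadcast in round 1, afterwards only after
# learning something new; count the messages sent.

def flooding_message_count(init, f):
    n = len(init)
    known = [{v} for v in init]
    send_now = [True] * n                    # round 1: everybody broadcasts
    messages = 0
    for _ in range(f + 1):                   # rounds 1 .. f + 1
        outgoing = [known[p].copy() if send_now[p] else None for p in range(n)]
        messages += (n - 1) * sum(send_now)  # each sender -> n - 1 receivers
        learnt_new = [False] * n
        for p in range(n):
            for q in range(n):
                if q != p and outgoing[q] is not None and not outgoing[q] <= known[p]:
                    known[p] |= outgoing[q]  # a genuinely new value arrived
                    learnt_new[p] = True
        send_now = learnt_new                # send again only after learning
    return messages
```

With 4 processors, inputs [1, 0, 0, 0] and f = 3, every processor learns the other value in round 1 and rebroadcasts once, giving 24 messages instead of the unmodified 4 rounds × 12 = 48.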
Conclusion and Reading
• Distributed models
– Message-passing
– Synchronous/asynchronous
– Fault-tolerance
• Distributed problems and algorithms
– Consensus in synchronous crash setting
Textbook:
• Johnsonbaugh, Schaefer: Algorithms, Chapter 12
• Attiya, Welch: Distributed Computing, Chapter 5