Lectures on Parallel and Distributed
Algorithms
COMP 523: Advanced Algorithmic Techniques
Lecturer: Dariusz Kowalski
Overview
These lectures:
• Parallel machine
– Prefix computation
• Distributed computing
– Consensus problem
Parallel machine - model
• Set of n processors and m memory cells
• Computation in synchronized rounds:
– during one round each processor performs either a local computation step (constant local cache) or a read/write to shared memory
• Minimize:
– Time
– Work (total number of processor steps)
– Number of processors
– Additional memory
Types of parallel machines
• EREW: Exclusive Read Exclusive Write
• CREW: Concurrent Read Exclusive Write
• ERCW: Exclusive Read Concurrent Write
• CRCW: Concurrent Read Concurrent Write
• In each round a cell can be either read or written
• Exclusive Read/Write: only one processor can read/write to a memory cell during one round
• Concurrent Read/Write: many processors can read/write to a memory cell during one round
• Concurrent Write conflict resolution: arbitrary, maximum, sum, etc.
Problem - prefix computation
• Input: m memory cells with integers
• Goal: for each cell i compute a function F(1, i), where F(·,·) is such that
– F(i, k) can be computed in constant time from F(i, j) and F(j+1, k) for any j between i and k
– F(i, i) is the value stored originally in cell i
• Examples:
– Computing a maximum (for every prefix)
– Computing a sum (for every prefix)
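A sequential reference implementation may help pin down the definition; this is only a sketch (the name `prefix` and the sample values are illustrative, not from the slides), computing each F(1, i) from F(1, i-1) and F(i, i):

```python
# Sequential reference for prefix computation: any associative combine
# F works, e.g. max or addition from the examples above.

def prefix(cells, combine):
    """Return [F(1,1), F(1,2), ..., F(1,m)]."""
    out, acc = [], None
    for v in cells:
        # F(1, i) computed in constant time from F(1, i-1) and F(i, i)
        acc = v if acc is None else combine(acc, v)
        out.append(acc)
    return out
```

For example, `prefix([3, 1, 4, 1, 5], max)` returns `[3, 3, 4, 4, 5]`, and with addition as the combine it returns `[3, 4, 8, 9, 14]`.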
CRCW - simple solution
• Let the result of concurrent writing by several processors be the combination of the written values according to the function F(·,·)
• m memory cells, m additional memory cells, m² processors
Algorithm:
• The processor with ID i·m + j, for 1 ≤ i ≤ j ≤ m, reads cell i and then writes the value to additional cell j
Time: 2   Memory: m   Work: O(m²)
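The two rounds above can be simulated sequentially; a minimal sketch, assuming the concurrent-write conflict on a cell is resolved by combining all written values with F (the function name is illustrative):

```python
# Simulating the CRCW simple solution: the m^2 "processors" become
# loop iterations, and the write conflict on each additional cell j
# is resolved by combining all written values with F.
from functools import reduce

def crcw_prefix(cells, combine):
    m = len(cells)
    extra = [None] * m                     # the m additional cells
    # Round 1: processor (i, j), for i <= j, reads cell i.
    # Round 2: all processors with the same j concurrently write to
    # additional cell j; the CRCW rule combines the values with F.
    for j in range(m):
        written = [cells[i] for i in range(j + 1)]
        extra[j] = reduce(combine, written)
    return extra
```

Each processor takes only 2 steps, but about m²/2 processors take part, which gives the O(m²) work bound.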
EREW - algorithm
• m memory cells, n = m/log m processors
• Additional array M[1…n]
Recursive Algorithm:
• Parallel preprocessing: each processor i sequentially computes the functions F(i log m + 1, i log m + 1), …, F(i log m + 1, (i+1) log m), then writes M[i] := F(i log m + 1, (i+1) log m)
• Parallel recursion (pointer jumping): in step t, 1 ≤ t ≤ log n, if i - 2^(t-1) > 0 then the processor with ID i reads M[i - 2^(t-1)] and combines it with its current value M[i] -- as if M[i - 2^(t-1)] corresponded to F((i - 2^t) log m + 1, (i - 2^(t-1)) log m) and M[i] corresponded to F((i - 2^(t-1)) log m + 1, i log m) -- and writes the result to M[i]
• Parallel post-processing: each processor i sequentially computes the functions F(1, i log m + 1), …, F(1, (i+1) log m) using the value F(1, i log m) stored in M[i] and the previously computed (in the preprocessing part) values F(i log m + 1, i log m + 1), …, F(i log m + 1, (i+1) log m)
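The three phases can be sketched as a sequential simulation (a sketch with illustrative names; the "processors" are loop iterations, the block size is taken as log m, and array indices are 0-based):

```python
# Sequential simulation of the EREW prefix algorithm: local prefixes
# per block, pointer jumping over the block totals, then fix-up.
import math

def erew_prefix(cells, combine):
    m = len(cells)
    b = max(1, int(math.log2(m)))            # block size ~ log m
    n = math.ceil(m / b)                     # n = m / log m processors
    blocks = [cells[i*b:(i+1)*b] for i in range(n)]

    # Parallel preprocessing: processor i computes the prefixes inside
    # its own block and stores the block total in M[i]
    local = []
    for blk in blocks:
        acc, pref = None, []
        for v in blk:
            acc = v if acc is None else combine(acc, v)
            pref.append(acc)
        local.append(pref)
    M = [pref[-1] for pref in local]

    # Parallel recursion (pointer jumping): after step t, M[i] covers
    # blocks max(0, i - 2^t + 1) .. i
    t = 1
    while (1 << (t - 1)) < n:
        jump = 1 << (t - 1)
        # all reads logically precede all writes in a round (EREW-safe)
        M = [combine(M[i - jump], M[i]) if i - jump >= 0 else M[i]
             for i in range(n)]
        t += 1

    # Parallel post-processing: prepend the prefix over preceding blocks
    out = []
    for i in range(n):
        for p in local[i]:
            out.append(p if i == 0 else combine(M[i - 1], p))
    return out
```

Preprocessing and post-processing take O(log m) sequential steps per processor; the jumping loop runs O(log n) rounds, matching the analysis on the next slide.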
Analysis
• Correctness: it is sufficient to show that after step t of the recursive part each location M[i] contains the computed value F(max{1, (i - 2^t) log m + 1}, i log m). Proof by induction: for t = 1 it follows from the initialization of M in the preprocessing part; the inductive step follows immediately from the recursive algorithm
• Memory: O(n) for the additional array M used during recursion, or none if we modify the original values in place
• Time: O(log m) -- parallel preprocessing and post-processing take O(log m), and the parallel recursion takes O(log m)
• Work: O(m) -- time O(log m) times the number of processors O(m/log m)
Conclusions
• Prefix computation
– Finding maximum/minimum
– Computing sums
for all m prefixes, in optimal logarithmic time and linear work
Textbook and Questions
• How can the prefix algorithms be modified for a smaller/larger number of processors?
• Given an expression containing brackets of the types ( ) and [ ], how can one check in parallel, in logarithmic time, whether it is a proper expression (each opening bracket has a corresponding closing counterpart)? Is it easier if there is only one kind of bracket in the expression?
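For the single-bracket variant, one standard reduction (a sketch, not from the slides; `balanced` is an illustrative name) maps '(' to +1 and ')' to -1 and runs a prefix-sum computation, which the EREW algorithm computes in logarithmic time; the expression is proper iff no prefix sum is negative and the total is zero:

```python
# Bracket matching for one bracket type via prefix sums.
def balanced(expr):
    depth, prefix = 0, []
    for c in expr:
        depth += 1 if c == '(' else -1   # '(' -> +1, ')' -> -1
        prefix.append(depth)             # in parallel: one prefix-sum instance
    # proper iff no prefix goes negative and the total is zero
    return all(d >= 0 for d in prefix) and (not prefix or prefix[-1] == 0)
```

Here `balanced("(()())")` is `True` and `balanced("())(")` is `False`; the two-bracket case needs an extra matching step on top of the sums.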
Distributed message-passing model
• Set of n processors/processes with distinct IDs {p1, …, pn}
• In each step each processor can (depending on the algorithm):
– send a message to any subset of the other processors
– receive incoming messages
– perform local computation
• Computation can be either (depending on the adversary):
– in synchronized rounds: in a round every processor performs three steps: local computation, sending and receiving, e.g., (p1, p2, p3), (p1, p2, p3), (p1, p2, p3), ...
– in an asynchronous pattern: steps are taken in some arbitrary order unknown to the processors, e.g., p1, p2, p2, p3, p2, p3, p2, p1, ...
Fault-tolerance
Failures in the system:
• Lack of synchrony: an unknown order of steps is generated by the adversary
• Processor crashes: the adversary decides which processors crash and chooses the steps at which these events occur
• Lost messages (not properly sent or received): the malicious processors/links are selected by the adversary
• Byzantine failures: processors may cheat, e.g., behave in any of the ways described above, mess up the content of messages, pretend to have a different ID, etc.
Analysis of distributed algorithms
Designing an algorithm, our goal is to prove:
• Correctness: nontrivial because of the lack of central information and because of failures
• Termination: nontrivial because of the lack of central control
• Efficiency:
– Time
– Work (total number of processor steps)
– Number of messages sent
– Total size of messages sent
Consensus in synchronous crash model
Consensus:
• Each processor has its initial value
• Goal: processors decide on the same value among the initial ones
• We require from the algorithm:
– Agreement: no two processors decide on different values
– Termination: each processor eventually decides, unless it fails
– Validity: if all initial values are the same, then this value is the decision
Model for consensus problem
We consider the model with crash failures (easier than others, e.g., Byzantine failures): a crashed processor stops every activity, and messages sent during the crash are delivered or lost arbitrarily (depending on the adversary)
• Asynchronous: impossible to solve even if only one processor can crash
• Synchronous: requires at least f + 1 rounds if f processors can crash
Consensus can be viewed as a kind of maximum-finding problem: let us agree on the largest initial value (although it could be easier, since we could agree on any initial value)
Flooding algorithm for consensus
• f-resilient algorithm: an algorithm that solves the consensus problem if at most f crashes occur
Flooding Algorithm:
• During each round j, 1 ≤ j ≤ f + 1, each processor sends to all other processors all the initial values about which it has already learnt
• Decision of a processor: if the set of collected initial values is a singleton, then decide on this value; otherwise decide on the default value (e.g., the maximum)
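A minimal sketch simulating the algorithm (assumptions: the crash schedule is given explicitly, and a processor crashing in round r sends nothing in that round -- one admissible adversarial choice, since the model also allows partial delivery):

```python
# Flooding Algorithm under crash failures, simulated sequentially.

def flooding_consensus(init, f, crash_round=None):
    """init: initial values; f: crash resilience; crash_round maps a
    processor index to the round in which it crashes. Returns the
    decisions, with None for crashed processors (they never decide)."""
    crash_round = crash_round or {}
    n = len(init)
    known = [{v} for v in init]              # values each processor learnt
    for rnd in range(1, f + 2):              # rounds 1 .. f + 1
        sent = [known[p].copy() if crash_round.get(p, f + 2) > rnd else None
                for p in range(n)]           # crashed this round: nothing sent
        for p in range(n):
            if crash_round.get(p, f + 2) <= rnd:
                continue                     # crashed: stops every activity
            for q in range(n):
                if q != p and sent[q] is not None:
                    known[p] |= sent[q]
        # after a clean round, all alive processors hold the same set
    decisions = []
    for p in range(n):
        if p in crash_round and crash_round[p] <= f + 1:
            decisions.append(None)
        else:
            vals = known[p]
            decisions.append(next(iter(vals)) if len(vals) == 1 else max(vals))
    return decisions
```

With no crashes and mixed inputs every processor collects both values and decides the default (the maximum); a processor that crashes before spreading its value simply drops out of the agreement.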
Flooding algorithm - example
4 processors, f = 2 crashes, default: maximum
(sets of collected values after each round; --- marks a crashed processor)

       Init   R1    R2    R3    Decision
p1 :   1      ---   ---   ---   ---
p2 :   0      0,1   ---   ---   ---
p3 :   0      0     0,1   0,1   1
p4 :   0      0     0     0,1   1

(Here p1 crashes during round 1 after its message reaches only p2, and p2 crashes during round 2 after its message reaches only p3.)
Analysis of Flooding algorithm
• Agreement: there is a round j (a clean round) in which no crash occurs (there are f + 1 rounds and at most f crashes, so some round must be clean). During this round all non-faulty processors exchange messages, hence the sets of collected values are the same after this round. Obviously they do not change after this round, and consequently all non-faulty processors decide the same
• Termination: after round f + 1
• Validity: if all initial values are the same, the set of collected initial values is always a singleton, and the decision is this value; otherwise the decision is the maximum among the received values
• Message complexity (total number of messages sent): O(f n²)
Decreasing message complexity
Modification of the algorithm:
• A processor sends messages to all processors during the first round, and during a round j > 1 only if in the previous round it has learnt about a new initial value
• Termination and Validity remain the same
• Agreement: similar argument; the only difference is that the message exchange may not happen within a clean round, but it completes by the end of the clean round: all previously learnt values were sent before this round, and new ones are sent during it
• Communication: there is a constant number of different values, and each of them is sent as a newly learnt value at most n times, each time to at most n - 1 processors, hence O(n²) messages in total
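The message count of the modified algorithm can be checked on small instances; a sketch assuming no crashes and a binary value domain (matching the constant number of different values above; the function name is illustrative):

```python
# Modified flooding: broadcast in round 1, afterwards only after
# learning something new; count the messages sent.

def flooding_message_count(init, f):
    n = len(init)
    known = [{v} for v in init]
    send_now = [True] * n                    # round 1: everybody broadcasts
    messages = 0
    for _ in range(f + 1):                   # rounds 1 .. f + 1
        outgoing = [known[p].copy() if send_now[p] else None for p in range(n)]
        messages += (n - 1) * sum(send_now)  # each sender -> n - 1 receivers
        learnt_new = [False] * n
        for p in range(n):
            for q in range(n):
                if q != p and outgoing[q] is not None and not outgoing[q] <= known[p]:
                    known[p] |= outgoing[q]  # a genuinely new value arrived
                    learnt_new[p] = True
        send_now = learnt_new                # send again only after learning
    return messages
```

With 4 processors, inputs [1, 0, 0, 0] and f = 3, every processor learns the other value in round 1 and rebroadcasts once, giving 24 messages instead of the unmodified 4 rounds × 12 = 48.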
Conclusion and Reading
• Distributed models
– Message-passing
– Synchronous/asynchronous
– Fault-tolerance
• Distributed problems and algorithms
– Consensus in synchronous crash setting
Textbook:
• Johnsonbaugh, Schaefer: Algorithms, Chapter 12
• Attiya, Welch: Distributed Computing, Chapter 5