Distributed Algorithms
(22903)
Lecturer: Danny Hendler
Shared objects: linearizability, wait-freedom and simulations
Most of this presentation is based on the book “Distributed Computing” by Hagit Attiya and Jennifer Welch.
Some slides are based on presentations of Nir Shavit.
Back to shared memory: Shared Objects
[Figure: a shared memory containing several objects.]
Shared Objects (cont’d)
• Each object has a state
  – Usually given by a set of shared memory fields
• Objects may be implemented from simpler base objects.
• Each object supports a set of operations
  – Only way to manipulate state
  – E.g., a shared counter supports the fetch&increment operation.
Shared Objects Correctness
Correctness of a sequential counter
• fetch&increment, applied to a counter with value v, returns v and increments the counter’s value to (v+1).
• Values returned by consecutive operations: 0, 1, 2, …
But how do we define the correctness of a shared counter?
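Shared Objects Correctness (cont’d)
There is only a partial order between operations!
[Figure: two timelines. The first shows the queue operations q.enq(x), q.enq(y), q.deq(x), q.deq(y); the second shows four fetch&inc operations. Each operation is drawn as an interval from its invocation to its response.]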
Shared Objects Correctness (cont’d)
An invocation calls an operation on an object.
Example: c.f&i(), where c is the object, f&i is the method, and () holds the (here empty) arguments.
Shared Objects Correctness (cont’d)
An object returns the response of the operation.
Example: c:12, where c is the object and 12 is the response.
Shared Objects Correctness (cont’d)
A sequential object history is a sequence of matching invocations and responses on the object.
Example: a sequential history of a queue
q.enq(3) q:void
q.enq(7) q:void
q.deq() q:3
q.deq() q:7
Shared Objects Correctness (cont’d)
Sequential specification
The correct behavior of the object in the absence of concurrency: a set of legal sequential object histories.
Example: the sequential spec of a counter
H0: (the empty history)
H1: c.f&i() c:0
H2: c.f&i() c:0 c.f&i() c:1
H3: c.f&i() c:0 c.f&i() c:1 c.f&i() c:2
H4: c.f&i() c:0 c.f&i() c:1 c.f&i() c:2 c.f&i() c:3
...
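The counter's sequential specification can be checked mechanically. The sketch below is only an illustration: the function names `counter_history` and `is_legal_counter_history` are hypothetical, and a history is represented simply by the list of values returned by consecutive f&i() calls.

```python
def counter_history(k):
    """Build the legal sequential history H_k as (invocation, response) pairs."""
    return [("c.f&i()", f"c:{i}") for i in range(k)]


def is_legal_counter_history(responses):
    """True if consecutive f&i() calls on a counter starting at 0 return exactly these values."""
    return list(responses) == list(range(len(responses)))


h2 = counter_history(2)
legal = is_legal_counter_history([0, 1, 2])
illegal = is_legal_counter_history([0, 2])
```

Here `legal` is true because the responses match H3, while `illegal` is false because no history in the specification returns 2 right after 0.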
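Shared Objects Correctness (cont’d)
Linearizability
An execution is linearizable if, for each object o, there exists a permutation π of the operations on o such that
• π is a legal sequential history of o (it belongs to o’s sequential specification), and
• π preserves the partial order of the execution: if the response of one operation precedes the invocation of another, the two appear in that order in π.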
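For very small histories the definition can be tested by brute force: try every permutation and keep one that is both legal for a FIFO queue and consistent with real-time order. This is a minimal sketch, not an efficient checker. The tuple layout (kind, value, invocation time, response time), the function names and the concrete time stamps are all assumptions made for the illustration.

```python
from collections import deque
from itertools import permutations


def is_legal_queue_order(order):
    """Replay the operations sequentially against a FIFO queue."""
    queue = deque()
    for kind, value, _inv, _res in order:
        if kind == "enq":
            queue.append(value)
        elif not queue or queue.popleft() != value:
            return False  # a deq returned something other than the head
    return True


def preserves_real_time(order):
    """Reject orders that place an operation after one it really preceded."""
    for pos, op in enumerate(order):
        for earlier in order[:pos]:
            if op[3] < earlier[2]:  # op responded before `earlier` was invoked
                return False
    return True


def is_linearizable(ops):
    return any(
        preserves_real_time(order) and is_legal_queue_order(order)
        for order in permutations(ops)
    )


# enq(x) completes before enq(y) starts, yet the dequeue returns y.
sequential_enqs = [("enq", "x", 0, 1), ("enq", "y", 2, 3), ("deq", "y", 4, 5)]
# The two enqueues overlap, so either may be ordered first.
overlapping_enqs = [("enq", "x", 0, 3), ("enq", "y", 1, 2), ("deq", "y", 4, 5)]

bad = is_linearizable(sequential_enqs)
good = is_linearizable(overlapping_enqs)
```

The first history has no valid permutation, while the second does (enq(y), enq(x), deq returning y).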
Example
[Figure: timeline of the queue operations q.enq(x), q.enq(y), q.deq(x), q.deq(y). The execution is linearizable.]
Example
[Figure: timeline of the queue operations q.enq(x), q.enq(y), q.deq(y). The execution is not linearizable.]
Example
[Figure: timeline with the operation labels q.enq(x), q.deq(x), q.enq(x), q.deq(x). The execution is linearizable.]
Example
[Figure: timeline of the queue operations q.enq(x), q.enq(y), q.deq(y), q.deq(x). The execution is linearizable, and multiple linearization orders are OK.]
Wait-freedom
An algorithm is wait-free if every operation terminates after performing some finite number of events.
Wait-freedom implies that there is no use of locks (no mutual exclusion).
Thus the problems inherent to locks are avoided:
• Deadlock
• Priority inversion
Wait-free linearizable implementations
Example: the sequential spec of a register
H0: (the empty history)
H1: r.read() r:init
H2: r.write(v1) r:ack
H3: r.write(v1) r:ack r.read() r:v1 r.read() r:v1
H4: r.write(v1) r:ack r.write(v2) r:ack r.read() r:v2
...
Read returns the value written by the last Write (or the initial value if there were no preceding writes).
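The register rule can be written as a short legality check over a sequential history. This is a sketch under assumptions: the name `is_legal_register_history` and the encoding of a history as (operation, value) pairs are hypothetical choices for the illustration.

```python
def is_legal_register_history(history, init="init"):
    """Check that every read returns the value of the last preceding write (or init)."""
    current = init
    for op, value in history:
        if op == "write":
            current = value
        elif op == "read":
            if value != current:
                return False
        else:
            raise ValueError(f"unknown operation: {op}")
    return True


h3 = [("write", "v1"), ("read", "v1"), ("read", "v1")]
stale = [("write", "v1"), ("write", "v2"), ("read", "v1")]

h3_ok = is_legal_register_history(h3)
stale_ok = is_legal_register_history(stale)
```

`h3` corresponds to H3 above and is legal, whereas `stale` reads v1 after v2 was written and is not.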
Wait-free (linearizable) register simulations
Binary single-reader/single-writer register
(Multi-valued) single-reader/single-writer register
multi-reader/single-writer register
multi-reader/multi-writer register
A wait-free (linearizable) implementation of a single-reader/single-writer (SRSW) multi-valued register from binary SRSW registers
Would the following implementation of a k-valued register R work?
Initially B[0]…B[k-1]=0, except B[i]=1 (i is the initial value of R)
Read(R)
  Return the index of the single entry of B that equals 1
Write(R, v)
  Write 1 to B[v], then clear the entry corresponding to the previous value (if other than v).
No!
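An example of a non-linearizable execution
Initially B[0]…B[2]=0, B[3]=1
The writer performs Write(1) followed by Write(2); the reader performs two Reads. The steps interleave as follows:
1. Reader (first Read): Read B[0], returns 0; Read B[1], returns 0
2. Writer (Write(1)): Write 1 to B[1], Ack; Write 0 to B[3], Ack; Write(1) returns Ack
3. Writer (Write(2)): Write 1 to B[2], Ack
4. Reader (first Read): Read B[2], returns 1; the Read returns 2
5. Reader (second Read): Read B[0], returns 0; Read B[1], returns 1; the Read returns 1
6. Writer (Write(2)): Write 0 to B[1], Ack; Write(2) returns Ack
Write(1) precedes Write(2) AND Read(2) precedes Read(1).
This is not linearizable!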
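The bad interleaving can be replayed in a few lines. In this sketch each operation is modelled as a Python generator that pauses after every access to a binary register, so a schedule can be driven by hand. That modelling device and the names `naive_read`, `naive_write` and `finish` are assumptions made for the illustration, not part of the original algorithm.

```python
def naive_read(B):
    """Scan upward; each yield marks one read of a binary register."""
    for i in range(len(B)):
        bit = B[i]
        yield
        if bit == 1:
            return i
    raise RuntimeError("no entry equals 1")


def naive_write(B, v, prev):
    """Set B[v], then clear the previous value's entry; yield after each write."""
    B[v] = 1
    yield
    if prev != v:
        B[prev] = 0
        yield
    return "ack"


def finish(op):
    """Run an operation to completion and return its response."""
    while True:
        try:
            next(op)
        except StopIteration as stop:
            return stop.value


B = [0, 0, 0, 1]                      # the register initially holds 3
r1 = naive_read(B)
next(r1)                              # first Read reads B[0] = 0
next(r1)                              # first Read reads B[1] = 0
finish(naive_write(B, 1, prev=3))     # Write(1) runs to completion
w2 = naive_write(B, 2, prev=1)
next(w2)                              # Write(2) sets B[2] but has not cleared B[1]
first = finish(r1)                    # first Read sees B[2] = 1
second = finish(naive_read(B))        # second Read sees the stale B[1] = 1
finish(w2)                            # Write(2) clears B[1] and returns
```

With this schedule `first` is 2 and `second` is 1, which reproduces the new-old inversion: a later read returns the older value.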
A Wait-free Linearizable Implementation
Initially B[v]=1 and all other entries equal 0, where v is the initial value of R.

Read(R)
  i := 0
  while B[i]=0 do i := i+1
  up := i, v := i
  for i := up-1 downto 0 do
    if B[i]=1 then v := i
  return v

Write(R,v)
  B[v] := 1
  for i := v-1 downto 0 do B[i] := 0
  return ack
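A direct transcription of this pseudocode into Python may help when tracing it by hand. This is a sequential sketch only: a Python list stands in for the array of binary SRSW registers, and the class name `MultiValuedRegister` is a hypothetical choice. It does not model concurrency.

```python
class MultiValuedRegister:
    """k-valued register built from k binary entries B[0..k-1]."""

    def __init__(self, k, initial):
        self.B = [0] * k
        self.B[initial] = 1

    def read(self):
        i = 0
        while self.B[i] == 0:           # scan up to the first 1
            i += 1
        v = i
        for j in range(i - 1, -1, -1):  # scan back down
            if self.B[j] == 1:
                v = j                   # a smaller value was written meanwhile
        return v

    def write(self, v):
        self.B[v] = 1
        for j in range(v - 1, -1, -1):  # clear only the entries below v
            self.B[j] = 0
        return "ack"


reg = MultiValuedRegister(k=4, initial=3)
before = reg.read()
reg.write(1)
after_first = reg.read()
reg.write(2)
after_second = reg.read()
```

Note that a write never clears entries above v, so after both writes the array is [0, 0, 1, 1] and the read still returns 2, the lowest index holding 1.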
The linearization order
[Figure: an execution in which the writer performs Write1(R,1), Write2(R,4), Write3(R,3), Write4(R,1) and the reader performs Read1(R, init), Read2(R, 4), Read3(R, 4), Read4(R, 3), Read5(R, 1).]
• Writes linearized first: Write1(R, 1), Write2(R, 4), Write3(R, 3), Write4(R, 1)
• All reads from a specific write linearized after it, in their real-time order: Read1(R, init), Read2(R, 4), Read3(R, 4), Read4(R, 3), Read5(R, 1)
Correctness proof for the SRSW multi-valued register simulation
Illustration for Lemma 1
[Figure: the array B with entries 0 and 1, and the indices v, u and v1 marked.]
Illustration for Lemma 1
[Figure: the array B with entries 0 and 1, and the indices v, u, v1 and v2 marked.]
A wait-free Implementation of a (multi-valued) multi-reader register from (multi-valued) SRSW registers.
Illustration for Lemma 1
[Figure: the array B with entries 0 and 1 and the indices v, u and v1 marked; cells are annotated “Written by W” and, at v1, “Written by W1”.]
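Illustration for Lemma 1
[Figure: the array B with entries 0 and 1 and the indices v, u, v1 and v2 marked; cells are annotated “Written by W”, “Written by W1” and, at v2, “Written by W2”.]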
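Illustration for Lemma 2
[Figure: an execution E and an order π over the writes W(v) and W’(v’) and a read R, which is labelled with v’.]
Case 1: v’ ≤ v
[Figure: the array B with a 1 at index v’ written by W’, a 1 at index v written by W, and 0 in the other entries shown.]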
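Illustration for Lemma 2 (cont’d)
[Figure: an execution E and an order π over the writes W(v) and W’(v’) and a read R, which is labelled with v’.]
Case 2: v’ > v
[Figure: the array B with a 1 at index v’ written by W’, a 1 at index v written by W, and a 0 written by W’’.]
From Lemma 1, R returns a value written by an operation later than W’’!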
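Illustration for Lemma 3
[Figure: an execution E with the reads R1 and R2, and an order π containing R2, R1, W1(v1) and W2(v2).]
Case 1: v1 = v2
[Figure: the entry at index v1=v2 holds 1, annotated “Written by W2” and “Written by W1”.]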
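Illustration for Lemma 3 (cont’d)
[Figure: an execution E with the reads R1 and R2, and an order π containing R2, R1, W1(v1) and W2(v2).]
Case 2: v1 > v2
[Figure: the array B with a 1 at index v1 written by W1 and a 1 at index v2 written by W2.]
Since R1 precedes R2 and R2 reads from W2, R1 sees 1 in v2 when scanning down.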
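Illustration for Lemma 3 (cont’d)
[Figure: an execution E with the reads R1 and R2, and an order π containing R2, R1, W1(v1) and W2(v2).]
Case 3: v1 < v2
[Figure: the array B with a 1 at index v2 written by W2, a 1 at index v1 written by W1, and a 0 written by W3.]
From Lemma 1, R2 returns a value written by an operation later than W3!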
A wait-free Implementation of a (multi-valued) multi-reader register from (multi-valued) SRSW registers.
Would this work?
SRSW Val[i]: The value written by the writer for reader pi

Read(R) by pi
  return Val[i]

Write(R,v)
  for i := 0 to n-1 do Val[i] := v
  return ack

Is the algorithm wait-free? Yes
Is the algorithm linearizable? Nope
An example of a non-linearizable execution
Initially Val[0]=Val[1]=0
The writer Pw performs Write(1) while the readers P0 and P1 each perform one Read:
1. Pw (Write(1)): Write 1 to Val[0], Ack
2. P0 (Read): Read Val[0], returns 1; the Read returns 1
3. P1 (Read): Read Val[1], returns 0; the Read returns 0
4. Pw (Write(1)): Write 1 to Val[1], Ack; Write(1) returns Ack
Read(1) precedes Read(0).
This is not linearizable!
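The same schedule can be replayed as straight-line code. This is only a sketch of that single interleaving: a Python list stands in for the SRSW registers Val[0] and Val[1], and the function name `replay` is hypothetical.

```python
def replay():
    """Replay the schedule: the writer is interrupted between its two base writes."""
    val = [0, 0]          # Val[0] and Val[1], both initially 0
    val[0] = 1            # Pw: first step of Write(1)
    read_p0 = val[0]      # P0 reads its own copy and returns
    read_p1 = val[1]      # P1 starts after P0 finished, reads its own copy
    val[1] = 1            # Pw: second step of Write(1)
    return read_p0, read_p1, val


read_p0, read_p1, final_val = replay()
```

P0's read completes before P1's read begins, yet P0 returns the new value 1 and P1 returns the old value 0.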
A proof that no such implementation is possible, unless the readers… write!
A wait-free Implementation of a (multi-valued) multi-reader register from (multi-valued) SRSW registers.
Data structures used
• Values are pairs of the form <val, sequence-number>.
• Sequence-numbers are ever increasing.
Val[i]: The value written by pw for reader pi, for 1 ≤ i ≤ n.
Report[i,j]: The value returned by the most recent read operation performed by pi; written by pi and read by pj, for 1 ≤ i,j ≤ n.
A wait-free Implementation of a multi-reader register from SRSW registers (cont’d)
Initially Report[i,j]=Val[i]=(v0, 0), where v0 is R’s initial value.

Read(R) ; performed by process pr
  (v[0],s[0]) := Val[r] ; most recent value written by the writer
  for i := 1 to n do
    (v[i],s[i]) := Report[i,r] ; most recent value reported to pr by reader pi
  Let j be such that s[j]=max{s[0], s[1], …, s[n]}
  for i := 1 to n do Report[r,i] := (v[j],s[j]) ; pr reports to all readers
  return v[j]

Write(R,v) ; performed by the single writer
  seq := seq+1
  for i := 1 to n do Val[i] := (v,seq)
  return ack
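The effect of the Report matrix shows up even in a sequential sketch. Below, Python lists stand in for the SRSW registers, indices run from 1 to n as on the slide, and the class name `MRSWRegister` is hypothetical. The half-finished write is simulated by assigning to one Val entry directly, which is an assumption of this demo rather than part of the algorithm.

```python
class MRSWRegister:
    """Multi-reader register; val[i] and report[i][j] model SRSW registers."""

    def __init__(self, n, initial):
        self.n = n
        self.seq = 0
        self.val = [(initial, 0)] * (n + 1)                 # index 0 unused
        self.report = [[(initial, 0)] * (n + 1) for _ in range(n + 1)]

    def read(self, r):
        # Collect the writer's value and everything reported to reader r.
        candidates = [self.val[r]]
        candidates += [self.report[i][r] for i in range(1, self.n + 1)]
        v, s = max(candidates, key=lambda pair: pair[1])    # highest sequence number
        for i in range(1, self.n + 1):
            self.report[r][i] = (v, s)                      # report to all readers
        return v

    def write(self, v):
        self.seq += 1
        for i in range(1, self.n + 1):
            self.val[i] = (v, self.seq)
        return "ack"


reg = MRSWRegister(n=2, initial=0)
reg.seq = 1
reg.val[1] = (1, 1)       # demo only: the writer has reached reader p1 but not p2
first = reg.read(1)       # p1 sees the new value and reports it
second = reg.read(2)      # p2 learns it from Report[1,2], not from Val[2]
```

Both reads return 1 even though Val[2] still holds the old pair, which is exactly the case the naive version got wrong.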
The linearization order
[Figure: an execution in which the writer performs Write(v1, 1), Write(v2, 2), Write(v3, 3), Write(v4, 4) and the readers perform Read1(init, 0), Read2(v1, 1), Read3(v2, 2), Read4(v2, 2), Read5(v4, 4).]
• Writes linearized first: Write(v1, 1), Write(v2, 2), Write(v3, 3), Write(v4, 4)
• Reads linearized according to increasing order of response, and put after the write with the same sequence number: Read1(init, 0), Read2(v1, 1), Read3(v2, 2), Read4(v2, 2), Read5(v4, 4)
A wait-free Implementation of a multi-reader-multi-writer register from multi-reader-single-writer registers
A wait-free Implementation of a MRMW register from MRSW registers.
Data structures used
• Values are pairs of the form <val, sequence-number>.
• Sequence-numbers are ever increasing.
TS[i]: The vector timestamp of writer pi, for 0 ≤ i ≤ m-1. Written by pi and read by all writers.
Val[i]: The latest value written by writer pi, for 0 ≤ i ≤ m-1, together with the vector timestamp associated with that value. Written by pi and read by all n readers.
Concurrent timestamps
• Provide a total order for write operations
• The total order respects the partial order of write operations
• Timestamps are implemented as vectors
• Ordered by lexicographic order
• Each writer increments its own vector entry
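Lexicographic order on vectors is what Python's built-in tuple comparison computes, so the ordering can be illustrated without extra code. The vectors below are sample values chosen for the illustration.

```python
# Sample vector timestamps for three writers, in arbitrary order.
timestamps = [(1, 1, 0), (0, 0, 0), (1, 2, 1), (1, 0, 0), (1, 1, 1)]

ordered = sorted(timestamps)   # tuples compare entry by entry, left to right
latest = max(timestamps)       # the lexicographically largest timestamp
```

Any two distinct vectors are comparable this way, which is why the timestamps yield a total order on writes.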
Concurrent timestamps example
Writer 1: TS[1] = <0,0,0>
Writer 2: TS[2] = <0,0,0>
Writer 3: TS[3] = <0,0,0>
Order: <0,0,0>
Timestamp shown being computed: <1,0,0>
Concurrent timestamps example
Writer 1: TS[1] = <1,0,0>
Writer 2: TS[2] = <0,0,0>
Writer 3: TS[3] = <0,0,0>
Order: <0,0,0>, <1,0,0>
Timestamps shown being computed: <1,0,0>, <1,1,0>
Concurrent timestamps example
Writer 1: TS[1] = <1,0,0>
Writer 2: TS[2] = <1,1,0>
Writer 3: TS[3] = <0,0,0>
Order: <0,0,0>, <1,0,0>, <1,1,0>, <1,1,1>, <1,2,1>
Timestamps shown being computed: <1,0,0>, <1,1,0>, <1,1,1>, <1,2,1>
A wait-free Implementation of a MRMW register from MRSW registers.
Initially TS[i]=<0,0,…,0> and Val[i] equals the initial value of R

Read(R) ; performed by reader pr
  for i := 0 to m-1 do (v[i], t[i]) := Val[i] ; v and t are local
  Let j be such that t[j]=max{t[0],…,t[m-1]} ; lexicographic max
  return v[j]

Write(R,v) ; performed by the writer pw
  ts := NewCTS() ; writer pw obtains a new vector timestamp
  Val[w] := (v,ts)
  return ack

Procedure NewCTS() ; called by writer pw
  for i := 0 to m-1 do
    lts[i] := TS[i].i ; extract the i’th entry from TS of the i’th writer
  lts[w] := lts[w]+1 ; increment own entry
  TS[w] := lts ; write pw’s new timestamp
  return lts
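A sequential Python sketch of these three procedures makes the timestamp flow easy to trace. Lists stand in for the MRSW registers, tuples for vector timestamps, and the class and method names (`MRMWRegister`, `new_cts`) are hypothetical. The writer's index is passed explicitly, which is an assumption of this sketch.

```python
class MRMWRegister:
    """Multi-writer register; TS[i] and Val[i] model MRSW registers."""

    def __init__(self, m, initial):
        self.m = m
        zero = (0,) * m
        self.TS = [zero for _ in range(m)]
        self.Val = [(initial, zero) for _ in range(m)]

    def new_cts(self, w):
        # Take the i'th entry from writer i's timestamp, then bump our own entry.
        lts = [self.TS[i][i] for i in range(self.m)]
        lts[w] += 1
        self.TS[w] = tuple(lts)
        return self.TS[w]

    def write(self, w, v):
        ts = self.new_cts(w)
        self.Val[w] = (v, ts)
        return "ack"

    def read(self):
        # Tuples compare lexicographically, matching the timestamp order.
        v, _ts = max(self.Val, key=lambda pair: pair[1])
        return v


reg = MRMWRegister(m=2, initial="init")
r0 = reg.read()
reg.write(0, "v1")        # timestamp <1,0>
reg.write(1, "v2")        # timestamp <1,1>
r1 = reg.read()
reg.write(1, "v3")        # timestamp <1,2>
reg.write(0, "v4")        # timestamp <2,2>
r2 = reg.read()
```

Run sequentially with two writers, the four writes receive the timestamps <1,0>, <1,1>, <1,2> and <2,2>, and each read returns the value carrying the largest timestamp at that moment.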
The linearization order
[Figure: Writer 1 performs Write(v1, <1,0>) and Write(v4, <2,2>); Writer 2 performs Write(v2, <1,1>) and Write(v3, <1,2>); Reader 1 and Reader 2 perform Read1(init, <0,0>), Read2(init, <0,0>), Read3(v2, <1,1>), Read4(v2, <1,1>), Read5(v4, <2,2>).]
• Writes linearized first by timestamp order: Write(v1, <1,0>), Write(v2, <1,1>), Write(v3, <1,2>), Write(v4, <2,2>)
• Reads considered according to increasing order of response, and put after the write with the same timestamp: Read1(init, <0,0>), Read2(init, <0,0>), Read3(v2, <1,1>), Read4(v2, <1,1>), Read5(v4, <2,2>)