Post on 20-Feb-2019
Distributed AlgorithmsTime, clocks and the ordering of events
Alberto Montresor
University of Trento, Italy
2017/05/19
This work is licensed under a Creative CommonsAttribution-ShareAlike 4.0 International License.
references
O. Babaoglu and K. Marzullo.Consistent global states of distributed systems: Fundamentalconcepts and mechanisms.In S. Mullender, editor, Distributed Systems (2nd ed.).Addison-Wesley, 1993.http://www.disi.unitn.it/~montreso/ds/papers/
ConsistentGlobalStates.pdf.
Contents
1 Modeling DistributedExecutions
HistoriesHappen-beforeGlobal states and cuts
2 Global predicate evaluationProblem definitionExample: deadlock detection
3 Real clocks vs logical clocksLogical clocksScalar logical clocksVector logical clocks
4 Passive monitoringPassive monitoring, v.1Passive monitoring, v.2Passive monitoring, v.3
5 Proactive monitoringSnapshot protocol, v.1Snapshot protocol, v.2Snapshot protocol, v.3
6 Stable predicatesDeadlock detectionDeadlock detection throughactive monitoringDeadlock detection throughpassive monitoring
7 Non-stable predicates
Contents
1 Modeling DistributedExecutions
HistoriesHappen-beforeGlobal states and cuts
2 Global predicate evaluationProblem definitionExample: deadlock detection
3 Real clocks vs logical clocksLogical clocksScalar logical clocksVector logical clocks
4 Passive monitoringPassive monitoring, v.1Passive monitoring, v.2Passive monitoring, v.3
5 Proactive monitoringSnapshot protocol, v.1Snapshot protocol, v.2Snapshot protocol, v.3
6 Stable predicatesDeadlock detectionDeadlock detection throughactive monitoringDeadlock detection throughpassive monitoring
7 Non-stable predicates
Modeling Distributed Executions
Distributed Execution
Definition (Distributed algorithm)
A distributed algorithm is a collection of distributed automata, one perprocess
Definition (Distributed execution)
The execution of a distributed algorithm is a sequence of events executedby the processes
Partial execution: a finite sequence of eventsInfinite execution: a infinite sequence of events
Possible events
send(m, p): sends a message m to process preceive(m): receives a message mlocal events that change the local state
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 1 / 96
Modeling Distributed Executions Histories
Histories
Definition (Local history)
The local history of process pi is a (possibly infinite) sequence of eventshi = e0i e
1i e
2i . . . e
mii (canonical enumeration)
Definition (Partial history)
The partial history up to event eki is denoted hki and is given by theprefix of the first k events of hi
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 2 / 96
Modeling Distributed Executions Histories
Histories
Local histories do not specify any relative timing between eventsbelonging to different processes.
We need a notion of ordering between events, that could help us indeciding whether:
I one event occurs before anotherI they are actually concurrent
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 3 / 96
Modeling Distributed Executions Happen-before
Happen-Before
Definition (Happen-before)
We say that an event e happens-before an event e′, and write e→ e′, ifone of the following three cases is true:
1 ∃pi ∈ Π : e = eri , e′ = esi , r < s(e and e′ are executed by the same process, e before e′)
2 e = send(m, ∗) ∧ e′ = receive(m)(e is the send event of a message m and e′ is the correspondingreceive event)
3 ∃e′′ : e→ e′′ → e′)(in other words, → is transitive)
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 4 / 96
Modeling Distributed Executions Happen-before
Space-Time Diagram of a Distributed Computation
e1 1 e2
1 e3 1 e4
1 e5 1 e6
1
e1 2 e2
2 e3 2
e1 3 e2
3 e3 3 e4
3 e5 3 e6
3
p1
p2
p3
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 5 / 96
Modeling Distributed Executions Happen-before
Meaning of Happen-Before
If e→ e′, this means that we can find a series of events e1e2e3 . . . en,where e1 = e and en = e′, such that for each pair of consecutive eventsei and ei+1:
1 ei and ei+1 are executed on the same process, in this order
2 ei = send(m, ∗) and ei+1 = receive(m)
Notes:
happen-before captures the concept of potential causal ordering
happen-before captures a flow of data between two events.
Two events e, e′ that are not related by the happen-before relation(e 6→ e′ ∧ e′ 6→ e) are concurrent, and we write e||e′.
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 6 / 96
Modeling Distributed Executions Happen-before
Homework
Prove or disprove that the || relation is transitive.
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 7 / 96
Modeling Distributed Executions Happen-before
Reality check
Forse non tutti sanno che...
The multi-threaded memory model of Java is based on thehappen-before relation, where communication between threads is basedon the acquisition and release of locks.
http://download.oracle.com/javase/6/docs/api/java/util/
concurrent/package-summary.html
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 8 / 96
Modeling Distributed Executions Global states and cuts
Global States
Definition (Local state)
The local state of process pi after the execution of event eki isdenoted σkiThe local state contains all data items accessible by that process
Local state is completely private to the process
σ0i is the initial state of process pi
Definition (Global state)
The global state of a distributed computation is an n-tuple of localstates Σ = (σ1, . . . , σn), one for each process.
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 9 / 96
Modeling Distributed Executions Global states and cuts
Cut
Definition (Cut)
A cut of a distributed computation is the union of n partial histories,one for each process:
C = hc11 ∪ h
c22 ∪ . . . ∪ h
cnn
A cut may be described by a tuple (c1, c2, . . . , cn), identifying thefrontier of the cut, i.e. the set of last events, one per process.
Each cut (c1, . . . , cn) has a corresponding global state(σc1
1 , σc22 , . . . , σ
cnn ).
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 10 / 96
Modeling Distributed Executions Global states and cuts
Cuts
e1 1 e2
1 e3 1 e4
1 e5 1 e6
1
e1 2 e2
2 e3 2
e1 3 e2
3 e3 3 e4
3 e5 3 e6
3C C’
p1
p2
p3
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 11 / 96
Modeling Distributed Executions Global states and cuts
Consistent cut
Consider cuts C ′ and C in the previous figure.
Is it possible that cut C correspond to a “real” state in theexecution of a distributed algorithm?
Is it possible that cut C ′ correspond to a “real” state in theexecution of a distributed algorithm?
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 12 / 96
Modeling Distributed Executions Global states and cuts
Consistent cut
Definition (Consistent cut)
A cut C is consistent, if for all events e and e′,
(e ∈ C) ∧ (e′ → e)⇒ e′ ∈ C
Definition (Consistent global state)
A global state is consistent if the corresponding cut is consistent.
In other words:
A consistent cut is left-closed w.r.t. the happen-before relation
All messages that have been received must have been sent before
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 13 / 96
Modeling Distributed Executions Global states and cuts
Consistent cut
In the previous figures, C is consistent and C ′ is not.
In the space-time diagram, a cut C is consistent if all the arrowsstart on the left of the cut and finish on the right of the cut.
Consistent cuts represent the concept of scalar time in distributedcomputation: it is possible to distinguish between a “before” andan “after”.
Predicates can be evaluated in consistent cuts, because theycorrespond to potential global states that could have taken placeduring an execution.
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 14 / 96
Contents
1 Modeling DistributedExecutions
HistoriesHappen-beforeGlobal states and cuts
2 Global predicate evaluationProblem definitionExample: deadlock detection
3 Real clocks vs logical clocksLogical clocksScalar logical clocksVector logical clocks
4 Passive monitoringPassive monitoring, v.1Passive monitoring, v.2Passive monitoring, v.3
5 Proactive monitoringSnapshot protocol, v.1Snapshot protocol, v.2Snapshot protocol, v.3
6 Stable predicatesDeadlock detectionDeadlock detection throughactive monitoringDeadlock detection throughpassive monitoring
7 Non-stable predicates
Global predicate evaluation Problem definition
Introduction
Definition (Global Predicate Evaluation)
The problem of detecting whether the global state of a distributedsystem satisfies some predicate Φ.
Motivation
Many problems in distributed systems require a reaction when theglobal state of the system satisfies some condition.
I Monitoring: Notify an administrator in case of failuresI Debugging: Verify whether an invariant is respected or notI Deadlock detection: can the computation continue?I Garbage collection: like Java, but distributed
Thus, the ability to construct a global state and evaluate apredicate over it is a core problem in distributed computing.
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 15 / 96
Global predicate evaluation Problem definition
Examples
Garbagecollection
Deadlock
p1
p1
p2
wait-for
wait-for
p1
Message
objectreference
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 16 / 96
Global predicate evaluation Problem definition
Why GPE is difficult
A global state obtained through remote observations could be
obsolete: represent an old state of the system.Solution: build the global state more frequently
inconsistent: capture a global state that could never have occurredin realitySolution: build only consistent global states
incomplete: not “capture” every moment of the systemSolution: build all possible consistent global states
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 17 / 96
Global predicate evaluation Example: deadlock detection
Space-Time Diagram of a Distributed Computation
e1 1 e2
1 e3 1 e4
1 e5 1 e6
1
e1 2 e2
2 e3 2
e1 3 e2
3 e3 3 e4
3 e5 3 e6
3
reqreq
respreq
resp
req
p1
p2
p3
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 18 / 96
Global predicate evaluation Example: deadlock detection
Example – Deadlock detection on a multi-tier system
Processes in the previous figures use RPCs:
Client sends a request for method execution; blocks.
Server receives request.
Server executes method; may invoke methods on other servers,acting as a client.
Server sends reply to client
Clients receives reply; unblocks.
Such a system can deadlock, as RPCs are blocking. It is important tobe able to detect when a deadlock occurs.
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 19 / 96
Global predicate evaluation Example: deadlock detection
Runs and consistent runs
Definition (Run)
A run of global computation is a total ordering R that includes all theevents in the local histories and that is consistent with each of them.
In other words, the events of pi appear in R in the same order inwhich they appear in hi.
A run corresponds to the notion that events in a distributedcomputation actually occur in a total order
A distributed computation may correspond to many runs
Definition (Consistent run)
A run R is said to be consistent if for all events e and e′, e→ e′ impliesthat e appears before e′ in R.
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 20 / 96
Global predicate evaluation Example: deadlock detection
Runs and consistent runs
e11 e21 e
31 e
41 e
51 e
61 e
12 e
22 e
32 e
13 e
23 e
33 e
43 e
53 e
63 ?
e11 e12 e
13 e
21 e
23 e
33 e
31 e
41 e
43 e
22 e
32 e
51 e
53 e
61 e
63 ?
e1 1 e2
1 e3 1 e4
1 e5 1 e6
1
e1 2 e2
2 e3 2
e1 3 e2
3 e3 3 e4
3 e5 3 e6
3
reqreq
respreq
resp
req
p1
p2
p3
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 21 / 96
Global predicate evaluation Example: deadlock detection
Monitoring Distributed Computations
Assumptions:I There is a single process p0 called monitor which is responsible for
evaluating ΦI We assume that the monitor p0 is distinct from the observed
processes p1 . . . pnI Events executed on behalf of monitoring do not alter canonical
enumeration of “real” events
In general, observed processes send notifications about local eventsto the monitor, which builds an observation.
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 22 / 96
Global predicate evaluation Example: deadlock detection
Observations
Definition (Observation)
The sequence of events corresponding to the order in which notificationmessages arrive at the monitor is called an observation.
Given the asynchronous nature of our distributed system, anypermutation of a run R is a possible observation of it.
Definition (Consistent observation)
An observation is consistent if it corresponds to a consistent run.
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 23 / 96
Contents
1 Modeling DistributedExecutions
HistoriesHappen-beforeGlobal states and cuts
2 Global predicate evaluationProblem definitionExample: deadlock detection
3 Real clocks vs logical clocksLogical clocksScalar logical clocksVector logical clocks
4 Passive monitoringPassive monitoring, v.1Passive monitoring, v.2Passive monitoring, v.3
5 Proactive monitoringSnapshot protocol, v.1Snapshot protocol, v.2Snapshot protocol, v.3
6 Stable predicatesDeadlock detectionDeadlock detection throughactive monitoringDeadlock detection throughpassive monitoring
7 Non-stable predicates
Real clocks vs logical clocks
How to obtain consistent observations
The happen-before relation captures the concept of potentialcausality
In the “day-to-day” life, causality/concurrency are tracked usingphysical time
I We use loosely synchronized watches;I Example: I have withdrawn money from an ATM in Trento at 13.00
on 17th May 2006, so I can prove that I’ve not withdrawn money onthe same day at 13.20 in Paris
In distributed computing systems:I the rate of occurrence of events is several magnitudes higherI event execution time is several magnitudes smaller
If physical clocks are not precisely synchronized, thecausality/concurrency relations between events may not beaccurately captured
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 24 / 96
Real clocks vs logical clocks Logical clocks
Logical clocks
Instead of using physical clocks, which are impossible to synchronize,we use logical clocks.
Every process has a logical clock that is advanced using a set ofrules
Its value is not required to have any particular relationship to anyphysical clock.
Every event is assigned a timestamp, taken from the logical clock
The causality relation between events can be generally inferredfrom their timestamps
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 25 / 96
Real clocks vs logical clocks Logical clocks
Logical clocks
Definition (Logical clock)
A logical clock LC is a function that maps an event e from the historyH of a distributed system execution to an element of a time domain T :
LC : H → T
Definition (Clock Condition)
e→ e′ ⇒ LC (e) < LC (e′)
Definition (Strong Clock Condition)
e→ e′ ⇔ LC (e) < LC (e′)
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 26 / 96
Real clocks vs logical clocks Scalar logical clocks
Scalar logical clocks
Definition (Scalar logical clocks)
Lamport’s scalar logical clock is a monotonically increasingsoftware counter
Each process pi keeps its own logical clock LC i
The timestamp of event e executed by process pi is denoted LCi(e)
Messages carry the timestamp of their send event
Logical clocks are initialized to 0
Update rule
Whenever an event e is executed by process pi, its local logical clock isupdated as follows:
LC i =
{LCi + 1 If ei is an internal or send eventmax{LCi,TS (m)}+ 1 If ei = receive(m)
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 27 / 96
Real clocks vs logical clocks Scalar logical clocks
Scalar logical clocks
1 2 4 5 6 7
1 5 6
1 2 43 5 7
p1
p2
p3
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 28 / 96
Real clocks vs logical clocks Scalar logical clocks
Properties
Theorem
Scalar logical clocks satisfy Clock condition, i.e.
e→ e′ ⇒ LC(e) < LC(e′)
Proof.
This immediately follows from the update rules of the clock.
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 29 / 96
Properties
Theorem
Scalar logical clocks satisfy Clock condition, i.e.
e→ e′ ⇒ LC(e) < LC(e′)
Proof.
This immediately follows from the update rules of the clock.
2018-1
2-1
6
DS - Time,clocks,events
Real clocks vs logical clocks
Scalar logical clocks
Properties
Total ordering: It is also possible to provide a total order for events, bysorting based on the process identifier for events with the same timestamp.
Real clocks vs logical clocks Scalar logical clocks
Scalar logical clocks
Theorem
Scalar logical clocks do not satisfy Strong clock condition, i.e.
LC(e) < LC(e′) 6⇒ e→ e′
1 2 4 5 6 7
1 5 6
1 2 43 5 7
p1
p2
p3
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 30 / 96
Scalar logical clocks
Theorem
Scalar logical clocks do not satisfy Strong clock condition, i.e.
LC(e) < LC(e′) 6⇒ e→ e′
1 2 4 5 6 7
1 5 6
1 2 43 5 7
p1
p2
p3
2018-1
2-1
6
DS - Time,clocks,events
Real clocks vs logical clocks
Scalar logical clocks
Scalar logical clocks
LC(p12) < LC(p2
3), yet p12 6→ p2
3
Real clocks vs logical clocks Vector logical clocks
Causal histories clocks
Definition (Causal History)
The causal history of an event e is the set of events that happen-beforee, plus e itself.
θ(e) = {e′ ∈ H | e′ → e} ∪ {e}
Theorem
Causal histories satisfy Strong clock condition
Proof.
∀e 6= e′ : LC(e) < LC(e′)def⇔ θ(e) ⊂ θ(e′)⇔ e ∈ θ(e′)⇔ e→ e′
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 31 / 96
Real clocks vs logical clocks Vector logical clocks
Example
Problem:Causal histories tend to grow too much; they cannot be used as“timestamps” for messages.
e1 1 e2
1 e3 1 e4
1 e5 1 e6
1
e1 2 e2
2 e3 2
e1 3 e2
3 e3 3 e4
3 e5 3 e6
3
reqreq
respreq
resp
req
p1
p2
p3
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 32 / 96
Real clocks vs logical clocks Vector logical clocks
Vector clocks
Causal history projection: θi(e) = θ(e) ∩ hi = hciiθ(e) = θ1(e) ∪ θ2(e) ∪ . . . ∪ θn(e) = hc1
1 ∪ hc22 ∪ . . . ∪ hcnn
In other words, θ(e) is a cut, which happens to be consistent.
Cuts can be represented by their frontiers: θ(e) = (c1, c2, . . . , cn)
Definition
The vector clock associated to event e is a n-dimensional vector VC (e)such that
VC (e)[i] = ci where θi(e) = hcii
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 33 / 96
Real clocks vs logical clocks Vector logical clocks
Vector clocks: implementation
Each process pi maintains a vector clock VC i, initially all zeroes;
When event ei is executed, its vector clock is updated and assumesthe value of VC (ei);
If ei = send(m, ∗), the timestamp of m is TS(m) = VC (ei);
Update rule
When event ei is executed by process pi, VC i is updated as follows:
If ei is an internal or send event:
VC i[i] = V Ci[i] + 1
If ei = receive(m):
VC i[j] = max{V Ci[j], TS(m)[j]} ∀j 6= iVC i[i] = V Ci[i] + 1
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 34 / 96
Real clocks vs logical clocks Vector logical clocks
Example
(1,0,0) (2,1,0) (3,1,3) (4,1,3) (5,1,3) (6,1,3)
(0,1,0) (1,2,4)
(4,3,4)
(0,0,1) (1,0,2) (1,0,3)(1,0,4) (1,0,5) (5,1,6)
p1
p2
p3
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 35 / 96
Real clocks vs logical clocks Vector logical clocks
Properties of Vector clocks
“Less than” relation for Vector clocks
V < V ′ ⇔ (V 6= V ′) ∧ (∀k : 1 ≤ k ≤ n : V [k] ≤ V ′[k])
Strong Clock Condition
e→ e′ ⇔ VC (e) < V C(e′)⇔ θ(e) ⊂ θ(e′)
Simple Strong Clock Condition
ei → ej ⇔ VC (ei)[i] ≤ V C(ej)[i]
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 36 / 96
Real clocks vs logical clocks Vector logical clocks
Properties of Vector clocks
Definition (Concurrent events)
Events ei and ej are concurrent (i.e. ei||ej) if and only if:
(V C(ei)[i] > V C(ej)[i]) ∧ (V C(ej)[j] > V C(ei)[j])
In other words, event ei does not happen-before ej , and ej does nothappen before ei.
Example: ?
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 37 / 96
Real clocks vs logical clocks Vector logical clocks
Concurrent events
(V C(ei)[i] > V C(ej)[i]) ∧ (V C(ej)[j] > V C(ei)[j])
(1,0,0) (2,1,0) (3,1,3) (4,1,3) (5,1,3) (6,1,3)
(0,1,0) (1,2,4)
(4,3,4)
(0,0,1) (1,0,2) (1,0,3)(1,0,4) (1,0,5) (5,1,6)
p1
p2
p3
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 38 / 96
Concurrent events
(V C(ei)[i] > V C(ej)[i]) ∧ (V C(ej)[j] > V C(ei)[j])
(1,0,0) (2,1,0) (3,1,3) (4,1,3) (5,1,3) (6,1,3)
(0,1,0) (1,2,4)
(4,3,4)
(0,0,1) (1,0,2) (1,0,3)(1,0,4) (1,0,5) (5,1,6)
p1
p2
p32018-1
2-1
6
DS - Time,clocks,events
Real clocks vs logical clocks
Vector logical clocks
Concurrent events
Example of Concurrent Events: Example: (3,1,3) and (1,2,4)
Real clocks vs logical clocks Vector logical clocks
Properties of vector clocks
Definition (Pairwise Inconsistent)
Events ei and ej with i 6= j are pairwise inconsistent if and only if
(V C(ei)[i] < V C(ej)[i]) ∨ (V C(ej)[j] < V C(ei)[j])
In other words, two events are pairwise inconsistent if they cannotbelong to the frontier of the same consistent cut. The formulacharacterize the fact that the cut include a receive event withoutincluding a send event.
Example: ?
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 39 / 96
Real clocks vs logical clocks Vector logical clocks
Pairwise inconsistent
(V C(ei)[i] < V C(ej)[i]) ∨ (V C(ej)[j] < V C(ei)[j])
(1,0,0) (2,1,0) (3,1,3) (4,1,3) (5,1,3) (6,1,3)
(0,1,0) (1,2,4)
(4,3,4)
(0,0,1) (1,0,2) (1,0,3)(1,0,4) (1,0,5) (5,1,6)
p1
p2
p3
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 40 / 96
Pairwise inconsistent
(V C(ei)[i] < V C(ej)[i]) ∨ (V C(ej)[j] < V C(ei)[j])
(1,0,0) (2,1,0) (3,1,3) (4,1,3) (5,1,3) (6,1,3)
(0,1,0) (1,2,4)
(4,3,4)
(0,0,1) (1,0,2) (1,0,3)(1,0,4) (1,0,5) (5,1,6)
p1
p2
p32018-1
2-1
6
DS - Time,clocks,events
Real clocks vs logical clocks
Vector logical clocks
Pairwise inconsistent
Example of Pairwise Inconsistent Events: Example: (3,1,3) and (1,0,2)
Real clocks vs logical clocks Vector logical clocks
Properties of Vector Clocks
Definition (Consistent Cut)
A cut defined by (c1, . . . , cn) is consistent if and only if:
∀i, j ∈ [1 . . . n] : VC (ecii )[i] ≥ VC (ecjj )[i])
In other words, a cut is consistent if its frontier does not contain anypairwise inconsistent pair of events.
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 41 / 96
Contents
1 Modeling DistributedExecutions
HistoriesHappen-beforeGlobal states and cuts
2 Global predicate evaluationProblem definitionExample: deadlock detection
3 Real clocks vs logical clocksLogical clocksScalar logical clocksVector logical clocks
4 Passive monitoringPassive monitoring, v.1Passive monitoring, v.2Passive monitoring, v.3
5 Proactive monitoringSnapshot protocol, v.1Snapshot protocol, v.2Snapshot protocol, v.3
6 Stable predicatesDeadlock detectionDeadlock detection throughactive monitoringDeadlock detection throughpassive monitoring
7 Non-stable predicates
Contents
1 Modeling DistributedExecutions
HistoriesHappen-beforeGlobal states and cuts
2 Global predicate evaluationProblem definitionExample: deadlock detection
3 Real clocks vs logical clocksLogical clocksScalar logical clocksVector logical clocks
4 Passive monitoringPassive monitoring, v.1Passive monitoring, v.2Passive monitoring, v.3
5 Proactive monitoringSnapshot protocol, v.1Snapshot protocol, v.2Snapshot protocol, v.3
6 Stable predicatesDeadlock detectionDeadlock detection throughactive monitoringDeadlock detection throughpassive monitoring
7 Non-stable predicates
Passive monitoring
A Passive Approach to GPE
How it works
At each (relevant) event, each process sends a notification to themonitor describing it local state
The monitor collects the sequence of notifications, i.e. anobservation of the distributed run.
An observation taken in this way can correspond to:
A consistent run
A run which is not consistent
No run at all
Can you find example of the three cases?Can you explain why this happen?
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 42 / 96
Passive monitoring
Observations which are not runs
Problem
Observations may not correspond to a run because the notificationssent by a single process to the monitor may be delayed arbitrarily andthus arrive in any possible order
Solution
To adopt communication channels between the processes and themonitor that guarantee that notifications are never re-ordered
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 43 / 96
Passive monitoring
Notification ordering
Definition (FIFO Order)
Two notifications sent by pi to p0 must be delivered in the same orderin which they were sent:
∀m,m′ : send i(m, p0)→ send i(m′, p0)⇒ deliver0(m)→ deliver0(m
′)
Uh? What is “deliver”?
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 44 / 96
Passive monitoring
Delivery Rules
How to order notifications?
To be ordered, each notification m carries a timestamp TS (m)containing “ordering” information
The act of providing the monitor with a notification in the desiredorder is called delivery; the event deliver(m) is thus distinct fromreceive(m).
The rule describing which notifications can be delivered amongthose received is called delivery rule
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 45 / 96
Passive monitoring
FIFO Order - Implementation
Each process maintains a local sequence number incremented ateach notification sent
The timestamp of a notification corresponds to the local sequencenumber of the sender at the time of sending
Definition (FIFO Delivery Rule)
If the last notification delivered by p0 from pj has timestamp s, p0 maydeliver “any” notification m received from pj with TS (m) = s+ 1.
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 46 / 96
Passive monitoring
Observations which are not consistent runs
Problem
If we use FIFO order between all processes and p0, all the observationstaken by p0 will be runs; but there is no guarantee that they areconsistent runs.
Solution
To adopt communication channels that guarantee that notifications aredelivered in an order that respects the happen-before relation.
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 47 / 96
Passive monitoring
Notification ordering
Definition (Causal Order)
Two notifications sent by pi and pj to p0 must be delivered followingthe happen-before relation:
∀m,m′ : send i(m, p0)→ send j(m′, p0)⇒ deliver0(m)→ deliver0(m
′)
Question
FIFO order among all channels...Is it sufficient to obtain Causal delivery?
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 48 / 96
Passive monitoring
Example
p1
p2
p3
m
m’
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 49 / 96
Example
p1
p2
p3
m
m’
2018-1
2-1
6
DS - Time,clocks,events
Passive monitoring
Example
• Given that each communication channels is associated with just onemessage, this distributed system satisfies FIFO order.
• However, it does not satisfy Causal order: the deliver of message m′
should be delivered after message m, because the send event of m ispotentially caused by m′.
• Silly example: p1 sends an invitation for a party to p2 and p3 by SMS;p2 sends a message to p3 saying “Are you going to participate to theparty of p1?” p3 gets offended because it has not been invited.
Passive monitoring
Causal delivery and consistent observations
Theorem
If p0 uses a delivery rule satisfying Causal Order, then all of itsobservations will be consistent.
Proof.
Definition of Causal Order ≡ definition of a consistent observation
Three implementations of the causal delivery rule:
Real-time clocks
Logical (scalar) clocks
Vector clocks
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 50 / 96
Causal delivery and consistent observations
Theorem
If p0 uses a delivery rule satisfying Causal Order, then all of itsobservations will be consistent.
Proof.
Definition of Causal Order ≡ definition of a consistent observation
Three implementations of the causal delivery rule:
Real-time clocks
Logical (scalar) clocks
Vector clocks
2018-1
2-1
6
DS - Time,clocks,events
Passive monitoring
Causal delivery and consistent observations
• Suppose, by contradiction, that the monitor obtains an observationwhich is not consistent.
• This means that an event e′ appears before e in the observation, bute→ e′.
• This means that the notification of e′ has been delivered before thenotification of e
• But this is impossible, because send(me, p0)→ send(me′ , p0) and bythe Casual Delivery rule deliver(me)→ deliver(me′), a contradiction.
Passive monitoring Passive monitoring, v.1
Passive monitoring with real-time
Initial assumptions
All processes have access to a real-time clock RC
Let RC (e) be the real-time at which e is executed
All notifications are delivered within a time δ
The timestamp of notification m corresponding to an event e isTS (m) = RC (e).
Definition (DR1: Real-time delivery rule)
At any time, deliver all received notifications in increasing timestamporder.
Theorem
Observation O constructed by p0 using DR1 is guaranteed to beconsistent (?)
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 51 / 96
Passive monitoring Passive monitoring, v.1
Passive monitoring with real-time
Initial assumptions
All processes have access to a real-time clock RC
Let RC (e) be the real-time at which e is executed
All notifications are delivered within a time δ
The timestamp of notification m corresponding to an event e isTS (m) = RC (e).
Definition (DR1: Real-time delivery rule)
At any time, deliver all received notifications in increasing timestamporder.
Theorem
Observation O constructed by p0 using DR1 is guaranteed to beconsistent
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 51 / 96
Passive monitoring Passive monitoring, v.1
Stability of messages
Definition (Stability)
A notification m received by p0 is stable at p0 if no notification m′ withTS (m′) < TS (m) can be received in the future by p0
Definition (DR1: Delivery rule for RC )
Deliver all received notifications that are stable at p0, in increasingtimestamp order
Theorem
Observation O constructed by p0 using DR1 is guaranteed to beconsistent
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 52 / 96
Passive monitoring Passive monitoring, v.1
Proof
Safety: Clock Condition for RC
e→ e′ ⇒ RC (e) < RC (e′)
Note that RC (e) < RC (e′) 6⇒ e→ e′, but this rule is sufficient toobtain consistent observations, as two notifications e→ e′ are neverdelivered in the incorrect order.
Liveness: Stability
At time t, any message sent by time t− δ is stable.
Note that real-time clocks do not support stability; it is the maximumdelay of messages that enables it.
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 53 / 96
Passive monitoring Passive monitoring, v.2
Passive monitoring with logical clocks
Initial assumptions
All processes have access to a logical clock LC ;let LC (e) be the logical clock at which e is executed
The timestamp of notification m corresponding to an event e isTS (m) = LC (e)
Definition (DR2: Deliver Rule for LC )
Deliver all received notifications that are stable at p0 in increasingtimestamp order.
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 54 / 96
Passive monitoring Passive monitoring, v.2
Passive monitoring with logical clocks
Safety: Clock Condition for LC
e→ e′ ⇒ LC (e) < LC (e′)
Note that LC (e) < LC (e′) 6⇒ e→ e′, but this rule is sufficient toobtain consistent observations, as two events e→ e′ are never orderedincorrectly
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 55 / 96
Passive monitoring Passive monitoring, v.2
Passive monitoring with logical clocks
Liveness: Stability
We need a way to reproduce the concept of δ in an asynchronoussystem, otherwise no notification will be ever delivered.
Solution
Each process communicates with p0 using FIFO delivery
When p0 receives a notification from pi describing an event e withtimestamp TS (e), it is sure that it will never receive a messagefrom pi describing an event e′ with TS(e′) ≤ TS(e)
Stability of notification m at p0 can be guaranteed when p0 hasreceived at least one notification from all other processes with atimestamp greater or equal than TS (m)
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 56 / 96
Passive monitoring Passive monitoring, v.2
Problems of Logical Clocks
They add unnecessary delays to observations
They require a constant flux of notifications from all processes
1 2 3 6
2 3 4p1
p2
p01 2 3 5
5
Stable messages
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 57 / 96
Passive monitoring Passive monitoring, v.3
Passive Monitoring with Vector Clocks
Variables maintained at p0
M : the set of notifications received but not yet delivered by p0
D: an array, initialized to 0’s, such that D[k] contains TS(m)[k]where m is the last notification delivered by p0 from process pk.
When a notification is deliverable by p0?A notification m ∈M from process pj is deliverable as soon as p0 canverify that there is no other notification m′ (neither in M nor in thechannels) such that send(m′, p0)→ send(m, p0).
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 58 / 96
Passive monitoring Passive monitoring, v.3
Implementing Causal Delivery with Vector Clocks
m ∈M : a notification sent by pj to p0
m′: the last notification delivered from process pk, k 6= j
Definition (Weak Gap Detection)
If TS (m′)[k] < TS (m)[k] for some k 6= j, then there exists eventsendk(m′′) such that
sendk(m′, p0)→ sendk(m′′, p0)→ send j(m, p0)
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 59 / 96
Passive monitoring Passive monitoring, v.3
Implementing Causal Delivery with Vector Clocks
Two conditions to be verified to check if m can be delivered:
There is no earlier message from pj that has not been delivered yet.
Causal Delivery Rule, Part 1 D[j] == TS (m)[j]− 1
∀k 6= j, let m′ be the last message from pk delivered by p0(D[k] = TS (m′)[k]); we must be sure that no message m′′ from pkexists such that: sendk(m′, p0)→ sendk(m′′, p0)→ send j(m, p0)
Causal Delivery Rule, Part 2: ∀k 6= j : D[k] ≥ TS (m)[k]It follows from Weak Gap Detection
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 60 / 96
Passive monitoring Passive monitoring, v.3
Example
(D[j] == TS (m)[j]− 1
)∧(∀k 6= j : D[k] ≥ TS (m)[k]
)
p0
p1
p2
VC1 [0,0] [1,0]
VC2 [0,0] [1,1]
D [0,0] [1,0] [1,1]
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 61 / 96
Passive monitoring Passive monitoring, v.3
Final Comments
We have seen how to implement Causal Delivery “many-to-one”
The same rules apply if we implement a mechanism forimplementing “one-to-many” (reliable broadcast)
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 62 / 96
Contents
1 Modeling DistributedExecutions
HistoriesHappen-beforeGlobal states and cuts
2 Global predicate evaluationProblem definitionExample: deadlock detection
3 Real clocks vs logical clocksLogical clocksScalar logical clocksVector logical clocks
4 Passive monitoringPassive monitoring, v.1Passive monitoring, v.2Passive monitoring, v.3
5 Proactive monitoringSnapshot protocol, v.1Snapshot protocol, v.2Snapshot protocol, v.3
6 Stable predicatesDeadlock detectionDeadlock detection throughactive monitoringDeadlock detection throughpassive monitoring
7 Non-stable predicates
Proactive monitoring
Snapshot Protocol
Problem
Goal: To build the global state “on demand” of the monitor.
How: By taking “pictures” (snapshots) of the local state wheninstructed
Challenge: To build a consistent global state.
Applications
Failure recovery: a global state (checkpoint) is periodically saved;to recover from a failure, the system is restored to the last savedcheckpoint.
Distributed garbage collection
Monitoring distributed events (e.g., industrial process control)
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 63 / 96
Proactive monitoring
Chandy-Lamport Snapshot Algorithm
This particular protocol enables to reason about “channel states”
GPE can be delayed with respect to passive monitoring
Assumption: processes communicate through FIFO channels
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 64 / 96
Proactive monitoring
Snapshot Protocol
Channel State
For each channel between pi and pjI xi,j contains the messages sent by pi but not received yet by pjI xj,i contains the messages sent by pj but not received yet by pi
Note: channel state can be obtained by storing appropriateinformation in the local state, but it is complicated.
Recorded information
Each process pi will record its local state σi and the content of itsincoming channels xj,i.
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 65 / 96
Proactive monitoring
Snapshot Protocol
Assumptions:
Communication channels satisfy FIFO order
Assumptions to be relaxed later:
Access to a real time clock RC ;
Message delays are bounded by some known value δ;
Relative process speeds are bounded;
Message m is tagged with a timestamp TS (m) = RC (e), wheree = send(m).
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 66 / 96
Proactive monitoring Snapshot protocol, v.1
Snapshot Protocol, v.1
1 p0 chooses a time ts far enough in the future that a messagecontaining ts sent by p0 will be received by all processes by ts:
ts = now() + δ
2 p0 sends a message “take a snapshot at ts” to all processes in Π
3 When RC i = ts do:1 pi records its local state2 pi sends an empty message to all processes in Π3 pi starts to record all messages received over each incoming channel
4 When pi receives a message m from j such that TS (m) ≥ ts, itstops recording messages for incoming channel j
5 Each pi sends its recorded local state and channel states to p0
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 67 / 96
Proactive monitoring Snapshot protocol, v.1
Snapshot Protocol, v.1
Liveness The empty messages at 3.2 guarantee liveness.Safety This protocol constructs a consistent global state; actually, thisglobal state did in fact occur. Formally:Let Cs be the cut associated to the constructed global state;
(e ∈ Cs) ∧ (e′ → e) ⇒(e ∈ Cs) ∧ (RC (e′) < RC (e)) ⇒ C.C. : e′ → e⇒ RC (e′) < RC (e)
(e′ ∈ Cs) Cs Def. : e ∈ Cs ⇔ RC (e) < ts
Can we use logical clocks instead of a real time clock?
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 68 / 96
Proactive monitoring Snapshot protocol, v.2
Snapshot Protocol, v.2
From real time clocks to logical clocks:
The construct “ when LC = ts do statement” makes no senseI LC s are not continuous (e.g., they can jump from ts − 1 to ts + 1)I When LC = ts, the event that caused the clock update is already
done.
Solution: before pi executes an event e:I If e is an internal or send event, and LC = ts − 2:
F pi executes eF pi executes statement
I If e = receive(m) ∧ TS (m) ≥ ts and LC < ts − 1:F pi puts LC = ts − 1F pi executes statementF pi executes e
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 69 / 96
Proactive monitoring Snapshot protocol, v.2
Snapshot Protocol, v.2
From real time clocks to logical clocks:
We assume that we can found a logical time ts large enough that amessage containing it sent by p0 will be received by all otherprocesses before ts
Impossible in an asynchronous distributed systems
We will relax this assumption later
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 70 / 96
Proactive monitoring Snapshot protocol, v.2
Snapshot Protocol, v.2
1 p0 chooses a logical time ts large enough
2 p0 sends a message “take a snapshot at ts” to all processes in Π
3 p0 sets its logical clock to ts;
4 When LC i = ts do:1 pi records its local state;2 pi sends an empty message to all processes in Π;3 pi starts to record all messages received over each incoming channel.
5 When pi receives a message m from j such that TS (m) ≥ ts, itstops recording messages for incoming channel j
6 Each pi sends its recorded local state and channel states to p0.
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 71 / 96
Proactive monitoring Snapshot protocol, v.3
Snapshot Protocol, v.3
We now remove the need for ts:
A process may receive an ”empty message” from a node before the“take snapshot at ts” is actually received
In other words, it may be already aware of the snapshot protocol
We remove the “at ts” and we use snapshot messages instead ofempty ones
We can now remove logical clocks completely, as messages are nottimestamped any more
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 72 / 96
Proactive monitoring Snapshot protocol, v.3
Snapshot Protocol, v.3
1 p0 sends a message snapshot to itself ;
2 when pi receives snapshot for the first time:I let pj be the sender of this messageI pi records its local state σi;I sends snapshot to all processes in Π;I xk,i ← ∅ ∀k 6= i;
3 when pi receives message m 6= snapshot from pk, k 6= jI xk,i ← xk,i ∪ {m}
4 when pi receives a snapshot from pk, k 6= j beyond the first time:I pi stops recording messages in xk,i;
5 when pi has received a snapshot from all processesI pi then sends its recorded local state and the channel states to p0.
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 73 / 96
Proactive monitoring Snapshot protocol, v.3
Example
e1 1 e2
1 e3 1 e4
1 e5 1
e1 2 e2
2 e3 2 e4
2 e5 2
p0
p1
p2
SnapshotsRecorded states
Normal messages State p1
State p2
Incoming channel p1
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 74 / 96
Proactive monitoring Snapshot protocol, v.3
Proof of correctness
Theorem
A global state built using the Chandy-Lamport snapshot algorithm isconsistent.
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 75 / 96
Proof of correctness
Theorem
A global state built using the Chandy-Lamport snapshot algorithm isconsistent.
2018-1
2-1
6
DS - Time,clocks,events
Proactive monitoring
Snapshot protocol, v.3
Proof of correctness
In order for the global state to be inconsistent, there should be a receiveevent included in the global state for which the corresponding send event isnot included. Let ei = receivei(m) and let ej = send j(m).In order for ej to not belong to the global state, process pj must have re-ceived the first snapshot message before ej . Thus, it has sent a messageto pi containing a snapshot message to pi before sending m. For the FIFOproperty of the channel, pi must have received the snapshot message beforeei. So, it must have stored the local state before receiving m, a contradiction.
Contents
1 Modeling DistributedExecutions
HistoriesHappen-beforeGlobal states and cuts
2 Global predicate evaluationProblem definitionExample: deadlock detection
3 Real clocks vs logical clocksLogical clocksScalar logical clocksVector logical clocks
4 Passive monitoringPassive monitoring, v.1Passive monitoring, v.2Passive monitoring, v.3
5 Proactive monitoringSnapshot protocol, v.1Snapshot protocol, v.2Snapshot protocol, v.3
6 Stable predicatesDeadlock detectionDeadlock detection throughactive monitoringDeadlock detection throughpassive monitoring
7 Non-stable predicates
Stable predicates
Stable Predicates
Problem
Let Σ be a global state built by one of the presented methods
It represents a state of the past, potentially with no bearing to thepresent
Does it make sense to evaluate predicate Φ on it?
A special case: stable predicates
Many systems properties have the characteristic that once they becometrue, they remain true.
Deadlock
Garbage collection
Termination
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 76 / 96
Stable predicates
A distributed computation may have many runs
Definition (Leads-to relation)
A consistent run R = e1e2 . . . results in a sequence of consistentglobal states Σ0Σ1Σ2 . . ., where Σ0 denotes the initial global state.
We say that a global state Σ leads to to a global state Σ′, denotedΣ R Σ′ in a consistent run R if:
I R results in a sequence of global states Σ0Σ1Σ2 . . .;I Σ = Σi,Σ′ = Σj , i < j.
We write Σ Σ′ if there is a run R such that Σ R Σ′.
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 77 / 96
Stable predicates
A distributed computation may have many runs
Definition (Lattice)
The set of all consistent global states of a computation along withthe leads-to relation defines a lattice;
n orthogonal axis, one per process;
Σk1...kn shorthand for the global state (σk11 , . . . , σ
knn );
The level of Σk1...kn is equal to k1 + · · ·+ kn.
A path in the lattice is a sequence of global states of increasinglevels that corresponds to a consistent run.
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 78 / 96
Stable predicates
6 Consistency
52
12
22
42
32
61
2
11
41
1
51
31
21
11
21
132231
3241
3342
43
4453
455463
5564
65
10
00
12
23
01
02
14
24
34
35
03
04
Figure 3. A Distributed Computation and the Lattice of its Global States
UBLCS-93-1 9
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 79 / 96
Stable predicates
Stable Predicates
Consider a global state construction protocol:
Let Σa be the global state in which the protocol is initiated;
Let Σf be the global state in which the protocol terminates;
Let Σs be the global state constructed by the protocol
Since Σa Σs Σf , if Φ is stable, then:
Φ(Σs) = true ⇒ Φ(Σf ) = true
Φ(Σs) = false ⇒ Φ(Σa) = false
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 80 / 96
Stable predicates Deadlock detection
Deadlock Detection
Code
Server code
Server code, modified for snapshot protocol
Monitor code for snapshot protocol
Server code, modified for passive protocol
Monitor code for passive protocol
Notes
No need to store channel state in this case
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 81 / 96
Stable predicates Deadlock detection
Server code
Process pi
Queue pending← new Queue()boolean working← falsewhile true do
while working or pending.size() = 0 doreceive 〈m, pj〉if m.type = request then
pending.enqueue(〈m, pj〉)else if m.type = response then〈m′, pk〉 ← nextState(m, pj)working← (m′.type = request)send m′ to pk
while not working and pending.size() > 0 do〈m, pj〉 ← pending.dequeue()〈m′, pk〉 ← nextState(m, pj)working← (m′.type = request)send m′ to pk
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 82 / 96
Stable predicates Deadlock detection through active monitoring
Deadlock Detection through Snapshot
Approach
All channels are based on FIFO delivery
Add code to deal with Snapshot messages
Pros and Cons
Generates overhead only when deadlock is suspected
Introduces a delay between deadlock and detection
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 83 / 96
Stable predicates Deadlock detection through active monitoring
Server code, modified for active monitoring (1)
Process pi
Queue pending← new Queue()boolean working← falseboolean[] blocking← {false, ..., false}while true do
while working or pending.size() = 0 doreceive 〈m, pj〉if m.type = request then
blocking[j]← truepending.enqueue(〈m, pj〉)
else if m.type = response then〈m′, pk〉 ← nextState(m, pj)working← (m′.type = request)send m′ to pkif m′.type = response then
blocking[k]← false
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 84 / 96
Stable predicates Deadlock detection through active monitoring
Server code, modified for active monitoring (2)
Process pi
else if m.type = snapshot thenif s = 0 then
send 〈snapshot, blocking〉 to p0send 〈snapshot〉 to Π− {pi}
s← (s+ 1) mod n
while not working and pending.size() > 0 do〈m, pj〉 ← pending.dequeue()〈m′, pk〉 ← nextState(m, pj)working← (m′.type = request)send m′ to pkif m′.type = response then
blocking[k]← false
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 85 / 96
Stable predicates Deadlock detection through active monitoring
Monitor code, for active monitoring
Process p0
boolean[ ][ ] wfg← new boolean[1 . . . n][1 . . . n]while true do{ Wait until deadlock is suspected }send 〈snapshot〉 to Πfor k ← 1 to n do
receive 〈m, j〉wfg[j]← m.data
if there is a cycle in wfg then{ the system is deadlocked }
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 86 / 96
Stable predicates Deadlock detection through passive monitoring
Deadlock detection through passive monitoring
Approach
Sends a message to p0 for each relevant event
Communication with p0 based on causal delivery
Pros and Cons
Simpler approach, but complexity is hidden by the causal deliverymechanism
Latency limited to message delays
Higher overhead
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 87 / 96
Stable predicates Deadlock detection through passive monitoring
Server code, modified for passive monitoring (1)
Process pi
Queue pending← new Queue()boolean working← falsewhile true do
while working or pending.size() = 0 doreceive 〈m, pj〉if m.type = request then
send 〈requested, j, i〉 to p0pending.enqueue(〈m, pj〉)
else if m.type = response then〈m′, pk〉 ← nextState(m, pj)working← (m′.type = request)send m′ to pkif m′.type = response then
send 〈responded, i, k〉 to p0
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 88 / 96
Stable predicates Deadlock detection through passive monitoring
Server code, modified for passive monitoring (2)
Process pi
while not working and pending.size() > 0 do〈m, pj〉 ← pending.dequeue()〈m′, pk〉 ← nextState(m, pj)working← (m′.type = request)send m′ to pkif m′.type = response then
send 〈responded, i, k〉 to p0
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 89 / 96
Stable predicates Deadlock detection through passive monitoring
Monitor code, for passive monitoring
Process p0
boolean[ ][ ] wfg← new boolean[1 . . . n][1 . . . n]while true do
receive 〈m, pj〉if m.type = responded then
wfg[m.from,m.to]← falseelse
wfg[m.from,m.to]← true
if there is a cycle in wfg then{ the system is deadlocked }
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 90 / 96
Contents
1 Modeling DistributedExecutions
HistoriesHappen-beforeGlobal states and cuts
2 Global predicate evaluationProblem definitionExample: deadlock detection
3 Real clocks vs logical clocksLogical clocksScalar logical clocksVector logical clocks
4 Passive monitoringPassive monitoring, v.1Passive monitoring, v.2Passive monitoring, v.3
5 Proactive monitoringSnapshot protocol, v.1Snapshot protocol, v.2Snapshot protocol, v.3
6 Stable predicatesDeadlock detectionDeadlock detection throughactive monitoringDeadlock detection throughpassive monitoring
7 Non-stable predicates
Non-stable predicates
Non-stable predicates
Problems of non-stable predicates
The condition encoded by the predicate may not persist longenough for it to be true when the predicate is evaluated
If a predicate Φ is found to be true by the monitor, we do notknow whether Φ ever held during the actual run.
Conclusions
Evaluating a non-stable predicate over a single computation makesno sense
The evaluation must be extended to the entire lattice of thecomputation
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 91 / 96
Non-stable predicates
Predicates over entire computations
It is possible to evaluate a predicate over an entire computation usingan observation obtained by a passive monitor.
Possibly(Φ): There exists a consistent observation O of thecomputation such that Φ holds in a global state of O.
Definitely(Φ): For every consistent observation O of thecomputation, there exists a global state of O in which Φ holds.
Example (Debugging)
If Possibly(Φ) is true, and Φ identifies some erroneous state of thecomputation, than there is a bug, even if it is not observed during anactual run.
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 92 / 96
Non-stable predicates
Example: Possibly((y − x) = 2), Definitely(x = y)
14 Properties of Global Predicates
12
22
42
32
: 6 : 4
52
: 2
2
1
: 3 : 4 : 5
0 10
11
31
21
41
51
61
0211
0321
132231
3241
3342
43
4453
4554
5564
65
10
00
01
12
23
63
2
Figure 16. Global States Satisfying Predicates and 2
UBLCS-93-1 33
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 93 / 96
Non-stable predicates
Predicates over entire computations
Theorem
Possibly and Definitely are not duals:
¬Possibly(Φ) 6⇔ Definitely(¬Φ)
¬Definitely(Φ) 6⇔ Possibly(¬Φ)
Example
Possibly(x 6= y), Definitely(x = y)
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 94 / 96
Non-stable predicates
Algorithms for detecting Possibly and Definitely
We use the passive approach in which processes send notificationsof events relevant to Φ to the monitor p0;
Events are tagged with vector clocks;
The monitor collects all the events and builds the lattice of globalstates.How?
To detect Possibly(Φ): if there exists one global state in which Φis true, then return true, otherwise false.
To detect Definitely(Φ): mark nodes where Φ is true with a value1, the other nodes with value 0. If the cost of the shortest pathbetween the initial state and the final state is larger than 0, returntrue, otherwise false.
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 95 / 96
Non-stable predicates
Algorithms for detecting Possibly and Definitely
Problems
The number of states grows exponentially with the number oftotal events.
Techniques can be used to reduce the number of eventsI Only those relevant to ΦI Forcing periodic synchronizationI Reducing the complexity of predicates (conjunction of local
predicates)
Alberto Montresor (UniTN) DS - Time,clocks,events 2017/05/19 96 / 96
Reading Material
O. Babaoglu and K. Marzullo. Consistent global states of distributed systems:Fundamental concepts and mechanisms.
In S. Mullender, editor, Distributed Systems (2nd ed.). Addison-Wesley, 1993.
http:
//www.disi.unitn.it/~montreso/ds/papers/ConsistentGlobalStates.pdf
Reality Check: Interesting links
Clocks are bad, or welcome to distributed systemsWhy vector clocks are easyWhy vector clocks are hardWhy Cassandra doesn’t need vector clocks