Failure detectors, shared memory/message passing
Hugues Fauconnier LIAFA, Université Denis Diderot
Outline
Introduction: goals and context
Objects and shared memory
    Shared memory, linearizability
    Wait-free implementation
    Universality of consensus
Message passing
    Failure detectors
    Implementing shared memory
    Implementing shared objects
    Consensus hierarchy and failure detectors
Conclusion(s)
Introduction and context
Possible vs. impossible (FLP); shared memory vs. communication by message passing
Shared objects: comparison and hierarchy: is test-and-set more powerful than compare-and-swap?
Towards transactions
Introduction (continued)
Failure detectors: minimal detector and comparison (necessary and sufficient knowledge about failures)
Hierarchy of problems: consensus (agreement on a value), registers, mutual exclusion, the weakest of the weakest
k-set consensus (agreement on at most k values)
Shared memory
A set of processes p1, …, pn (process = sequential thread).
Processes are asynchronous: a step can take an arbitrary (finite) time.
Processes communicate through shared data structures (objects), for example shared memory, test-and-set, queues...
Objects: an object is defined by its type.
For example, the type of R is atomic register. The type of an object defines a set of possible states and a set of primitive operations; e.g., the state of a register is the value stored, and its primitives are read() and write(v). Processes access objects through their primitive operations.
Objects: we consider here only atomic objects.
A sequential specification defines the behavior of the object (a transition system).
Linearizability (= atomicity): operations of concurrent processes may overlap, but each operation appears to take effect instantaneously at some point between its invocation and its response: the operation appears to be atomic.
Crashes: if a process crashes between an invocation and the corresponding response, the operation either completes or aborts.
Every invocation by a correct process terminates.
Example: atomic register
States: the value stored (with some initial value).
Operations: read() and write(v).
Sequential specification: read() returns the value stored; write(v) changes the state of the register (the new state is v).
Linearizability: the time interval between the invocation and the response of each operation can be shrunk to a single point such that the resulting sequence of reads and writes satisfies the sequential specification.
Atomic register
With only one writer, linearizability is here equivalent to:
a read returns the last value written;
if a read is concurrent with a write, the read returns either the previously written value or the value of the concurrent write();
if a read operation r precedes another read operation r', then r' cannot return a value written before the one returned by r.
This can be generalized to multi-writer atomic registers.
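Linearizable
[Figure: two timelines with Write 1 followed by Write 0. In the first, a concurrent Read returns 0; in the second, a concurrent Read returns 1. Both are linearizable.]
Linearizable?
[Figure: Write 1, concurrent with two successive reads that return 1 and then 0: impossible, this history is not linearizable.]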
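The timelines above can be checked mechanically. The sketch below is a brute-force linearizability checker for a single read/write register: it tries every total order of the operations that respects real time (if one operation responds before another is invoked, it must come first) and replays the order against the sequential specification. The `Op` record, the times, and the initial value 0 are assumptions made for illustration; the search is exponential, so it only suits tiny histories like these.

```python
from itertools import permutations
from typing import NamedTuple


class Op(NamedTuple):
    kind: str      # "write" or "read"
    value: int     # value written, or value returned by the read
    start: float   # invocation time
    end: float     # response time


def respects_real_time(order):
    """If b responds before a is invoked, b must not be placed after a."""
    for i, a in enumerate(order):
        for b in order[i + 1:]:
            if b.end < a.start:
                return False
    return True


def is_legal(order, initial):
    """Replay the order against the sequential specification of a register."""
    state = initial
    for op in order:
        if op.kind == "write":
            state = op.value
        elif op.value != state:   # a read must return the current state
            return False
    return True


def is_linearizable(history, initial=0):
    """Brute force over all total orders (exponential: small histories only)."""
    return any(respects_real_time(order) and is_legal(order, initial)
               for order in permutations(history))
```

With the register initially 0, `is_linearizable([Op("write", 1, 0, 10), Op("read", 1, 1, 3), Op("read", 0, 4, 6)])` is false: the read returning 1 must follow Write 1, the read returning 0 must precede it, yet the first read precedes the second in real time.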
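Another example
Consensus: sequential specification
[Figure: transition system. From the initial state, propose(0) leads to a state in which every propose(*) returns decide(0); propose(1) leads to a state in which every propose(*) returns decide(1).]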
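The transition system above fits in a few lines: the first propose(v) moves the object to the state "decided v", and every later propose(*) returns that same value. This is only the sequential specification (what a linearizable consensus object must look like), not a distributed implementation; using `None` to mean "nothing proposed yet" is an assumption of this sketch.

```python
class ConsensusSpec:
    """Sequential specification of consensus: the first propose fixes the decision."""

    def __init__(self):
        self.decision = None          # None: initial state, nothing proposed yet

    def propose(self, v):
        if self.decision is None:     # initial state -> "decided v"
            self.decision = v
        return self.decision          # every propose(*) returns decide(decision)
```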
Another example: read-modify-write (RMW)
RMW(r: register, f: function) returns value, executed atomically:
    previous := r
    r := f(r)
    return previous
From RMW we get test-and-set, swap, and compare-and-swap (by choosing f).
Implementation
Given some objects O1, …, Om and processes p1, …, pn, is it possible to implement another object O?
Wait-free implementation:
the implementation is correct (in an intuitive sense);
every invocation by a correct process terminates;
moreover, a correct process can always complete its invocation using only its own steps (on the objects O1, …, Om).
Wait-free
Wait-free implementation: as each process can always finish the work alone, a wait-free implementation tolerates any number of crash failures.
A very strong requirement!
Wait-free implementations
Consider k-consensus (i.e., consensus among k processes). The consensus number of an object X is the largest k such that k-consensus can be (wait-free) implemented with X and atomic registers.
(Clearly, if the consensus number of O is strictly greater than the consensus number of O', there is no implementation of O using only O' and registers.)
Wait-free implementations
Results:
registers have consensus number 1 (FLP);
test-and-set has consensus number 2;
…
for each n, there are objects with consensus number n.
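To see why test-and-set reaches consensus number 2, here is the classic two-process protocol as a sketch: each process announces its proposal in its own register, then applies test-and-set; the process that gets 0 wins and decides its own value, the other adopts the winner's announced value. Threads and a lock-based `TestAndSet` class are assumptions standing in for two asynchronous processes and a hardware object.

```python
import threading


class TestAndSet:
    """Test-and-set object; the lock only models the object's atomicity."""

    def __init__(self):
        self._flag = 0
        self._lock = threading.Lock()

    def test_and_set(self):
        with self._lock:
            old, self._flag = self._flag, 1
            return old


def two_consensus_tas(values):
    """Process i in {0, 1} proposes values[i]; returns the two decisions."""
    prefer = [None, None]          # prefer[i] is written only by process i
    tas = TestAndSet()
    decisions = [None, None]

    def decide(i):
        prefer[i] = values[i]               # announce the proposal first
        if tas.test_and_set() == 0:         # first to set the flag: winner
            decisions[i] = prefer[i]
        else:                               # loser: winner announced before its test-and-set
            decisions[i] = prefer[1 - i]

    threads = [threading.Thread(target=decide, args=(i,)) for i in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decisions
```

Each process takes a bounded number of its own steps, so the protocol is wait-free; the same idea does not extend to three processes, since a loser cannot tell which of the two others won.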
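Example
2-consensus with a FIFO queue q (q initially holds a distinguished first item, written WIN here, followed by another item). Code of process P, where Q is the other process:
decide(v) returns value
    prefer[P] := v
    if deq(q) = WIN then return prefer[P]
    else return prefer[Q]
With FIFO queues and registers it is possible to solve 2-consensus but not 3-consensus.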
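The queue-based protocol above can be run as a sketch with two threads: the item names `WIN` and `LOSE` are hypothetical, and Python's thread-safe `queue.Queue` stands in for the shared linearizable FIFO queue. Whoever dequeues the first item wins; the other process returns the winner's announced preference.

```python
import queue
import threading

WIN, LOSE = "WIN", "LOSE"   # hypothetical names for the two initial queue items


def two_consensus_fifo(values):
    """Process i in {0, 1} proposes values[i]; returns the two decisions."""
    q = queue.Queue()        # thread-safe FIFO, standing in for the shared queue
    q.put(WIN)
    q.put(LOSE)
    prefer = [None, None]    # prefer[i] is written only by process i
    decisions = [None, None]

    def decide(i):
        prefer[i] = values[i]              # announce before touching the queue
        if q.get_nowait() == WIN:          # dequeued the first item: winner
            decisions[i] = prefer[i]
        else:                              # loser: the winner has already announced
            decisions[i] = prefer[1 - i]

    threads = [threading.Thread(target=decide, args=(i,)) for i in (0, 1)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return decisions
```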
Results
Universality of consensus (Herlihy): n-consensus is universal in a system of n processes: every object shared by n processes can be wait-free implemented with n-consensus objects and registers.
(Principle of the proof: with the help of n-consensus, processes agree on the history of the object.)
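The proof principle (agreeing on the history of the object) can be sketched as a log of one-shot consensus objects: to apply an operation, a process proposes it slot by slot until it wins a slot, replaying every decided operation on a local copy of the state. This is a simplified lock-free variant: Herlihy's wait-free construction adds a helping mechanism (processes also propose pending operations announced by others), which is omitted here. The `Consensus` class, the `apply(state, op)` convention, and the fixed number of slots are assumptions of the sketch.

```python
import threading


class Consensus:
    """One-shot consensus object (modeled with a lock; assumed to be given)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._decision = None

    def propose(self, v):
        with self._lock:
            if self._decision is None:
                self._decision = v
            return self._decision


class Universal:
    """Shared object built from its sequential specification `apply`."""

    def __init__(self, initial_state, apply, slots=1000):
        self.log = [Consensus() for _ in range(slots)]   # the agreed history
        self.initial_state = initial_state
        self.apply = apply          # apply(state, op) -> (new_state, result)

    def invoke(self, pid, seq, op):
        """(pid, seq) must identify the invocation uniquely."""
        state = self.initial_state
        for slot in self.log:
            entry = slot.propose((pid, seq, op))     # try to append our op here
            state, result = self.apply(state, entry[2])
            if entry[:2] == (pid, seq):              # our op won this slot
                return result
        raise RuntimeError("log exhausted")
```

For example, `Universal(0, lambda s, op: (s + 1, s))` gives a shared fetch-and-increment counter: every invocation returns a distinct value because all processes replay the same agreed sequence of operations.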
Outline
Objects
    Shared memory model, linearizability, wait-free implementation
    Main results: universality of consensus
Message passing
    Failure detectors, shared memory implementation, object implementation
    Consensus hierarchy with failure detectors
Conclusion
Message passing
The previous results show that, in general, (at least) objects with consensus number > 1 cannot be implemented with only registers.
Instead of sharing data structures, it is interesting to consider message-passing models: processes don't share data but can send and receive messages. (Note that message passing could be described in the previous general framework: communication channels are then the shared data structures.)
Message-passing model
Processes communicate by messages.
Communication is asynchronous (no bound on communication delays).
Communication is point-to-point and reliable.
Processes can fail by crashing.
Message-passing models are suitable and natural for networks (shared-object models are more suitable for hardware).
Message passing
In message passing it is interesting to implement objects: objects are easier to work with, and some objects are natural in message-passing models (e.g., registers, consensus).
Atomic register: practical point of view
A data server ensuring safety properties:
if a value is written, it is available (even if the writer disappears);
when a process ends its write(), all subsequent read()s return this value (or a value written later). Note that the writer knows when the write ends.
Shared register implementation (sketch), with only one reader, one writer, and a majority of correct processes. For the k-th write:
to write(v): the writer sends (v,k) to all processes and waits for an "ack" from a majority of processes;
to read(): the reader queries all processes and waits for answers (v,k) from a majority of processes; the value read is the one with the greatest k;
when a process receives (v,k) from the writer, it stores (v,k) and then sends an "ack" to the writer;
when a process receives a query from the reader, it answers with its stored (v,k).
It works because:
by the majority assumption, there is always at least one process that participates in both the last write and the read, so the read returns the last written value. (But this implementation is not quite atomic: if the writer crashes during a write, subsequent reads could return either the previous value or the new one, in any order. It is not very difficult to fix: the reader always returns the value with the maximal timestamp it has seen, including in its previous reads.)
(Some classical algorithms implement general atomic registers from atomic registers with one reader and one writer.)
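The single-writer, single-reader algorithm above can be simulated sequentially as a sketch. Asynchrony is modeled by letting each write and each read reach an arbitrary majority of the live replicas (messages to the other replicas are simply dropped, which is a simplification); the reader keeps the pair with the maximal timestamp it has ever returned, which is the fix for atomicity described above. Replicas ignoring an older timestamp is an added assumption that guards against reordered messages. All class and function names are hypothetical.

```python
import random


class Replica:
    """A process storing the pair (v, k) last received from the writer."""

    def __init__(self):
        self.value, self.k = None, 0   # k = 0: nothing written yet
        self.crashed = False

    def on_write(self, v, k):
        if k > self.k:                 # assumption: ignore an older, delayed message
            self.value, self.k = v, k
        return "ack"

    def on_query(self):
        return self.value, self.k


def some_majority(replicas):
    """The first answers come from an arbitrary majority of live replicas.
    Raises ValueError if a majority has crashed (the real algorithm would block)."""
    alive = [r for r in replicas if not r.crashed]
    return random.sample(alive, len(replicas) // 2 + 1)


class Writer:
    def __init__(self, replicas):
        self.replicas, self.k = replicas, 0

    def write(self, v):
        self.k += 1                    # k-th write
        for r in some_majority(self.replicas):
            r.on_write(v, self.k)      # the write ends once a majority acked


class Reader:
    def __init__(self, replicas):
        self.replicas = replicas
        self.last = (None, 0)          # highest (v, k) this reader has returned

    def read(self):
        answers = [r.on_query() for r in some_majority(self.replicas)]
        # Maximal timestamp, including past reads: no new/old inversion.
        self.last = max(answers + [self.last], key=lambda pair: pair[1])
        return self.last[0]
```

With five replicas and one crash, any write majority and any read majority (three of five each) share at least one replica, so every read sees the last completed write.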
Implementation issues
In asynchronous message passing there is no implementation of consensus (even if at most one process can crash).
The implementation of registers needs a majority of correct processes.
Then… failure detectors
The impossibility results come from crashes (without failures, all these problems are easy to solve).
Hence: add oracles giving (possibly unreliable) information about crashes.
What information about crashes enables processes to solve the problem?
What information about crashes is needed?
Failure detectors
A distributed "oracle" F: at each time t, a process can query the failure detector and get an answer (generally the answer is a list of processes suspected to have crashed).
The output is not necessarily the same at each process.
The output of failure detector F depends only on the history of crashes (not on the states of processes).
Example: perfect failure detector
Output: lists of suspected processes.
If p is in the list of q, then p has crashed.
If p has crashed, then p eventually belongs (forever) to the list of suspected processes of every correct process q.
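The two properties of the perfect failure detector can be checked on a recorded trace. In this sketch, the trace format (a crash time per crashed process, and one suspicion set per process and per time step) is an assumption, and because a finite trace cannot show "eventually, forever", completeness is approximated by "suspected at the last recorded time".

```python
def check_perfect_fd(crash_time, suspects):
    """
    crash_time: dict process -> time of its crash (correct processes are absent).
    suspects:   dict process q -> list of sets; suspects[q][t] is q's output at time t.
    """
    # Accuracy: q may suspect p at time t only if p has crashed by time t.
    for q, outputs in suspects.items():
        for t, suspected in enumerate(outputs):
            for p in suspected:
                if p not in crash_time or crash_time[p] > t:
                    return False
    # Completeness (finite approximation): every correct q suspects every
    # crashed process at the end of the trace.
    for q, outputs in suspects.items():
        if q not in crash_time and not set(crash_time) <= outputs[-1]:
            return False
    return True
```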
Failure detector comparison
Reduction: failure detector F is weaker than failure detector F' (F ≤ F') if F can be implemented from F'.
≤ defines a partial order.
Minimal failure detector
Given a problem P, F is a minimal failure detector for P if and only if:
with the help of F, P can be solved;
if F' enables solving P, then F ≤ F'.
Then, if F is a minimal failure detector for P, F encapsulates the information about crashes needed to solve P.
Minimal failure detector: why look for it?
To find the information about crashes that is needed.
To compare problems: if the minimal failure detector for P is weaker than the minimal failure detector for P', then P is easier than P'.
(From a practical point of view, knowing the minimal failure detector helps to find the assumptions on the underlying system needed to solve the problem.)
Then, to implement objects in message passing:
for each object O, find the minimal failure detector to implement O;
comparing these failure detectors gives a hierarchy on these objects.
We then get 2 hierarchies on objects: the consensus number as defined before, and the minimal failure detector needed for the object.
S-register
Begin with registers (consensus number = 1). An S-register is an atomic register that only processes in S can read or write (but all processes may participate in its implementation).
Weakest failure detector
With a majority of correct processes, atomic registers can be implemented without any failure detector.
But without a majority of correct processes? Failure detector Σ.
Failure detector ΣS
ΣS(p,t) (the output of ΣS at process p at time t) is a list of trusted processes (q ∈ ΣS(p,t) means that p considers q not crashed at time t).
Intersection: for all processes p, q in S and all times t, t': ΣS(p,t) ∩ ΣS(q,t') ≠ ∅ (at least one process is trusted by both p and q).
Completeness: there is a time t such that for every correct process p in S and every time t' > t, ΣS(p,t') contains only correct processes.
Remarks
With a majority of correct processes, ΣS can be implemented in asynchronous systems.
ΣS provides a kind of quorum system (a quorum system is a family of sets in which any two elements always have a non-empty intersection).
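The first remark rests on the quorum property of majorities: if each query of the emulated ΣS outputs the set of the first ⌊n/2⌋+1 processes that answered (one standard construction, assumed here), any two outputs are majorities and therefore intersect. The sketch checks the intersection property for a family of sets and contrasts majorities with smaller subsets.

```python
from itertools import combinations


def is_quorum_system(sets):
    """Every two sets of the family (including a set with itself) intersect."""
    return all(a & b for a in sets for b in sets)


def majorities(n):
    """All subsets of {0, ..., n-1} with floor(n/2) + 1 elements."""
    return [set(c) for c in combinations(range(n), n // 2 + 1)]
```

`is_quorum_system(majorities(5))` holds, while the family of all 2-element subsets of four processes fails ({0, 1} and {2, 3} are disjoint): without a majority of correct processes, outputs built this way could violate the intersection property, which is why Σ is needed.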
Theorem
ΣS is the weakest failure detector to implement an S-register.
Sufficient part: adapt the previous algorithm.
Necessary part: more difficult…
S-consensus
S is a set of processes. In S-consensus, processes in S propose values and have to (irrevocably) decide. The decision must ensure:
Validity: the decided value has been proposed.
Agreement: if p and q decide, they decide the same value.
Termination: every correct process in S eventually decides.
ΩS
ΩS(p,t) (the output of ΩS at process p at time t) is a process (the leader).
Eventual leader election: there is a time t and a correct process l such that for every correct process p in S and all times t' > t, ΩS(p,t') = l.
Intuitively: after some time, all processes agree forever on the same leader.
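One classical way to obtain an eventual leader, shown as a sketch under a stronger assumption than Ω itself: suppose each process has a local suspicion list that eventually contains exactly the crashed processes (an eventually perfect detector). If every process outputs the smallest identity it does not suspect, all correct processes eventually output the same correct process. The function name and the convention of returning `None` when everything is suspected are hypothetical.

```python
def omega_from_suspicions(processes, suspected):
    """Leader = smallest identity that the local detector does not suspect."""
    trusted = sorted(set(processes) - set(suspected))
    return trusted[0] if trusted else None   # None: no candidate yet


# Before stabilization, outputs may differ; once every correct process
# suspects exactly the crashed process 1, they all output 2.
early = [omega_from_suspicions([1, 2, 3], s) for s in ({1}, set())]
late = [omega_from_suspicions([1, 2, 3], s) for s in ({1}, {1})]
```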
Theorem
ΣS*ΩS is the weakest failure detector for S-consensus.
(ΣS*ΩS outputs both ΣS and ΩS)
For the proof
(Necessary condition) Adaptation of the proof of Chandra, Hadzilacos and Toueg: from an S-consensus algorithm using a failure detector, implement ΩS.
With reliable broadcast and S-consensus, implement an S-register (then use the previous theorem).
For the proof: sufficient condition
Code of a process in S, with estimate v (initially its proposal), forever, for rounds r = 1, 2, ...:
    C := 1 + (r mod n)
    send (Coord, v, r) to C
    wait until (One, *, r) is received from C or C is suspected according to ΩS
    if (One, w, r) was received then FromCoord := w else FromCoord := undef
    send (Keep, FromCoord, r) to all
    wait until (Two, *, r) is received from all processes in ΣS
    if only one value v ≠ undef is received:
        decide v; send (Decide, v) to all; stop
    else if exactly two values are received (w and undef): v := w
Code of all processes:
    when (Coord, *, k) is received for the first time (let (Coord, x, k) be this message):
        send (One, x, k) to all processes in S
    when (Keep, *, k) is received for the first time (let (Keep, x, k) be this message):
        send (Two, x, k) to all processes in S
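The end of a round in the pseudocode above reduces to a small decision rule on the values carried by the (Two, x, r) messages received from the processes in ΣS. The sketch isolates that rule, with `None` standing for undef; it assumes, as the pseudocode suggests, that a round carries at most one value other than undef (the coordinator relays a single value in its One messages).

```python
UNDEF = None   # stands for "undef" in the pseudocode


def end_of_round(current, values):
    """values: the x fields of the (Two, x, r) messages received from ΣS.
    Returns (decided, new_estimate)."""
    distinct = set(values)
    if len(distinct) == 1 and UNDEF not in distinct:
        (v,) = distinct                   # a single value v: decide it
        return True, v
    if len(distinct) == 2 and UNDEF in distinct:
        (w,) = distinct - {UNDEF}         # w and undef: adopt w as the estimate
        return False, w
    return False, current                 # only undef: keep the current estimate
```

Adopting w when it is mixed with undef is what protects agreement: because any two outputs of ΣS intersect, a process that saw only w (and decided) forces every other process to see w at least once in the same round.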
k-consensus
k-consensus = consensus among any subset of k processes.
Result: for 2 ≤ k ≤ n, the weakest failure detector for k-consensus is Σ*Ω.
Proof (idea):
consider the case k = 2. From the previous results, the weakest failure detector for 2-consensus is the set of ΣS*ΩS for all subsets S with 2 elements.
Proof
From these ΣS (S ranging over the subsets with two elements), atomic registers can be implemented, hence we get Σ.
From these ΩS (S ranging over the subsets with two elements), it is possible to implement Ω: let G = (X,E) be the graph where X is the set of processes and (p,q) ∈ E if there is x such that q is an eventual leader for Ω{p,x}. Consider the strongly connected components of G: there is a unique sink component, and this sink contains (eventually) only correct processes.
[Figure: an edge from p to q means that p has q as leader; the sink component is marked.]
Proof (sketch)
From this we deduce an algorithm for Ω: all processes approximate this graph and compute the sink; the output of the emulated failure detector is this sink. Eventually, this sink contains only correct processes (then extract the same leader from this sink).
Hence we get Ω.
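The graph step of this proof can be sketched with Tarjan's algorithm for strongly connected components: compute the components of the leader graph, keep those with no outgoing edge, and, when there is exactly one, output a deterministic member of it. The graph encoding (a dict from each process to the processes it has as leader) and the choice of the smallest identity as "the same leader" are assumptions of this sketch; the real emulation works on each process's evolving approximation of the graph.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm; graph maps every node to the list of its successors."""
    index, low, on_stack, stack, comps = {}, {}, set(), [], []

    def visit(v):
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in graph[v]:
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:            # v is the root of a component
            comp = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.add(w)
                if w == v:
                    break
            comps.append(comp)

    for v in graph:
        if v not in index:
            visit(v)
    return comps


def sink_components(graph):
    """Components with no edge leaving them."""
    return [c for c in strongly_connected_components(graph)
            if all(w in c for v in c for w in graph[v])]


def emulated_leader(graph):
    sinks = sink_components(graph)
    if len(sinks) != 1:
        return None                       # approximation not yet stabilized
    return min(sinks[0])                  # one deterministic choice in the sink
```

For instance, with edges 1 → 2, 2 → 3, 3 → 2 and 4 → 2, the components are {2, 3}, {1} and {4}; only {2, 3} is a sink, so every process that computes the same graph outputs leader 2.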
Corollary
If the consensus number of an atomic object T is 2, then:
the weakest failure detector for T is Σ*Ω;
every failure detector implementing T implements any object (in other words, T is universal for all n).
Corollary
For message-passing models with failure detectors, there are only two classes of objects: no consensus, k = 1 (atomic registers, Σ); k > 1, and then consensus for every n (Σ*Ω).
Conclusion
In shared memory, objects are given (by hardware) and can be compared using the consensus number; for example, there is no implementation of compare-&-swap with test-&-set and registers.
In message passing with failure detectors, objects are implemented and there are (essentially) two classes (with consensus and without consensus): all objects with consensus number > 1 are equivalent!
Implementation with shared objects and implementation in message passing with failure detectors are not the same!
Conclusion (continued)
Transactional memory: abortable objects; making sequences of code atomic.
The hierarchy of failure detectors: shared memory vs. message passing.
k-set agreement: the weakest of the weakest?