Failure detectors, shared memory / message passing

Hugues Fauconnier, LIAFA, Université Denis Diderot

Plan

Introduction
  Objectives and context
Objects and shared memory
  Shared memory
  Linearizability
  Wait-free implementation
  Universality of consensus
Message passing
  Failure detectors
  Implementation of shared memory
  Implementation of shared objects
  Consensus hierarchy and failure detectors
Conclusion(s)

Introduction and context

Possible vs. impossible (FLP)
Shared memory vs. communication by message exchange
Shared objects:
  Comparison and hierarchy: is a test-and-set more powerful than a compare-and-swap?
  Towards transactions

Introduction…

Failure detectors:
  Minimal failure detector and comparison (necessary and sufficient knowledge about failures)
  Hierarchy of problems:
    Consensus (agreement on a value)
    Registers
    Mutual exclusion
    The weakest of the weakest
    k-set consensus (agreement on at most k values)

Shared memory

Set of processes p1, …, pn (process = sequential thread)
Processes are asynchronous: a step can take an arbitrary (finite) time
Processes communicate through shared data structures (objects)
  examples: shared memory, test-and-set, queue, ...

Objects: an object is defined by its type

  e.g.: the type of R is atomic register
The type of the object defines a set of possible states and a set of primitive operations
  e.g.: the state of the register is the value stored, the primitives are read() and write(v)
Processes access objects through primitive operations

Objects: we consider here only atomic objects

A sequential specification defines the behavior of the object (a transition system)
Linearizability (= atomicity): operations of concurrent processes may overlap, but each operation appears to take effect instantaneously at some point between its invocation and its response: the operation appears to be atomic
Crashes: if a process crashes between an invocation and the corresponding response, the operation either completes or aborts
Every invocation by a correct process terminates

Example: atomic register

States: the value stored (with some initial value)
Operations: read() and write(v)
Sequential specification:
  read() returns the value stored
  write(v) changes the state of the register (the new state is v)
Linearizability: the time interval between the request and the answer of each operation can be reduced to a point such that the resulting sequential history of reads and writes satisfies the specification

Atomic register

With only one writer, linearizability is here equivalent to:
  a read returns the last value written
  if a read is concurrent with a write, the read returns either the previously written value or the value of a concurrent write()
  if a read operation r precedes another read operation r', then r' cannot return a value written before the one returned by r
This can be generalized to multi-writer atomic registers

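Linearizable

[Figure: timing diagrams of two register histories with writes of 1 and 0; the overlapping read returns 0 in the first history and 1 in the second. Both are linearizable.]

Linearizable?

[Figure: timing diagram of a register history with a write of 1, in which a Read returning 1 is followed by a Read returning 0. Verdict: impossible.]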
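The histories above can be checked mechanically. The following is a minimal brute-force sketch, not an efficient algorithm: it tries every ordering of the operations, keeps those that respect real-time precedence, and tests them against the sequential specification of a register. The tuple layout `(name, kind, value, start, end)`, the function names, and the two sample histories are assumptions made for illustration; they are not taken from the slides.

```python
from itertools import permutations

# An operation is a tuple (name, kind, value, start, end).
# kind is "write" or "read"; for a read, value is the value it returned.
# The layout and the sample histories below are hypothetical.

def respects_real_time(order):
    """Reject orders that place an operation before one that ended before it began."""
    for i, a in enumerate(order):
        for b in order[i + 1:]:
            if b[4] < a[3]:  # b finished before a started, yet a comes first
                return False
    return True

def satisfies_register_spec(order, initial):
    """Sequential specification: each read returns the last value written."""
    current = initial
    for _name, kind, value, _start, _end in order:
        if kind == "write":
            current = value
        elif value != current:
            return False
    return True

def is_linearizable(history, initial=0):
    """Brute force: some permutation is both real-time consistent and legal."""
    return any(
        respects_real_time(order) and satisfies_register_spec(order, initial)
        for order in permutations(history)
    )

# A read overlapping a write may return the old or the new value.
overlap_old = [("w", "write", 1, 0, 10), ("r", "read", 0, 2, 4)]
overlap_new = [("w", "write", 1, 0, 10), ("r", "read", 1, 2, 4)]
# A read of 1 followed (in real time) by a read of 0: new/old inversion.
inversion = [("w", "write", 1, 0, 10), ("r1", "read", 1, 1, 3), ("r2", "read", 0, 4, 6)]
```

The search is factorial in the number of operations, so it is only meant for slide-sized histories.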

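Another example

Consensus: sequential specification

[Figure: transition system. From the initial state, propose(0) leads to a state in which every propose(*) returns decide(0); propose(1) leads to a state in which every propose(*) returns decide(1).]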

Another example

RMW(r register, f function) returns value
    previous := r
    r := f(r)
    return previous

From RMW we get test-and-set, swap, compare-and-swap.

Implementation

Given some objects O1, …, Om and processes p1, …, pn, is it possible to implement another object O?
Wait-free implementation:
  the implementation is correct (in an intuitive sense)
  every invocation from a correct process terminates
  moreover, a correct process can always complete its invocation using only its own steps (on objects O1, …, Om)

Wait-free implementation

As each process can always finish the work alone, a wait-free implementation tolerates any number of crash failures
A very strong requirement!

Wait-free implementations

Consider k-consensus (i.e. consensus between k processes)
Let the consensus number of object X be the largest k such that k-consensus can be implemented with X and atomic registers
(clearly, if the consensus number of O is strictly greater than the consensus number of O', there is no implementation of O using only O' and registers)

Wait-free implementations

Results:
  registers have consensus number 1 (FLP)
  test-and-set has consensus number 2
  …
  for each n there are objects with consensus number n

Example

FIFO queue: decide(v) returns val
    prefer[P] := v
    if deq(q) = [first item of the initial queue]
        then return prefer[P]
        else return prefer[Q]

With a FIFO queue and registers it is possible to get 2-consensus but not 3-consensus
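The queue-based decide() above can be sketched in runnable form. This sketch assumes the queue is pre-filled with two tokens, here called `WIN` and `LOSE`. Those names, the class `TwoConsensus`, and the helper `run_two_processes` are hypothetical; the slide does not show how the queue is initialized. Python's thread-safe `queue.Queue` plays the role of the atomic FIFO object.

```python
import queue
import threading

WIN, LOSE = "win", "lose"  # hypothetical tokens placed in the queue up front

class TwoConsensus:
    """2-process consensus from one FIFO queue and two registers (sketch)."""

    def __init__(self):
        self.q = queue.Queue()
        self.q.put(WIN)
        self.q.put(LOSE)
        self.prefer = [None, None]  # one register per process

    def decide(self, me, v):
        other = 1 - me
        self.prefer[me] = v          # announce the proposal first
        if self.q.get() == WIN:      # first dequeuer wins
            return self.prefer[me]
        # The loser dequeued second, so the winner has already written prefer.
        return self.prefer[other]

def run_two_processes(v0, v1):
    """Run both processes as threads and return their decisions."""
    c = TwoConsensus()
    results = [None, None]

    def work(i, v):
        results[i] = c.decide(i, v)

    threads = [threading.Thread(target=work, args=(i, v))
               for i, v in enumerate((v0, v1))]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

The argument breaks down with a third process: a loser can no longer tell which of the two other processes dequeued first.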

Results

Universality of consensus (Herlihy): n-consensus is universal in a system of n processes: every object shared by n processes can be (wait-free) implemented with n-consensus and registers
(principle of the proof: with the help of n-consensus, the processes agree on the history of the object)
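The proof principle, agreeing on the history of the object, can be illustrated with a simplified sketch. It assumes a log of one-shot consensus objects, one per position in the history; each process proposes its operation for the next free position and replays the decided operations on a local copy of the state. This version is only lock-free: it omits the helping mechanism that the wait-free construction needs. The names `Consensus`, `Universal`, `invoke` and `apply_op` are illustrative, and the locks are Python conveniences standing in for atomic objects.

```python
import threading

class Consensus:
    """One-shot consensus object: the first proposal is decided."""

    def __init__(self):
        self._lock = threading.Lock()
        self._decided = None

    def propose(self, v):
        with self._lock:
            if self._decided is None:
                self._decided = v
            return self._decided

class Universal:
    """Any sequentially specified object, built from consensus objects."""

    def __init__(self, initial, apply_op):
        self.initial = initial
        self.apply_op = apply_op      # (state, op) -> (new_state, result)
        self.slots = []               # shared log, one consensus per position
        self._grow = threading.Lock()

    def _slot(self, i):
        with self._grow:              # lazily extend the log
            while len(self.slots) <= i:
                self.slots.append(Consensus())
            return self.slots[i]

    def invoke(self, pid, seq, op):
        """(pid, seq) must be unique per invocation."""
        state, i = self.initial, 0
        while True:
            winner = self._slot(i).propose((pid, seq, op))
            state, result = self.apply_op(state, winner[2])
            if winner[:2] == (pid, seq):
                return result         # our operation entered the history here
            i += 1

# Hypothetical usage: a fetch-and-increment counter.
def fetch_and_inc(state, _op):
    return state + 1, state
```

Replaying the log from the start on every call keeps the sketch short; a real construction would cache the position reached.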

Plan

Objects
  shared memory model
  linearizability
  wait-free implementation
  main results: universality of consensus
Message passing
  failure detectors
  shared memory implementation
  object implementation
  consensus hierarchy with failure detectors
Conclusion

Message passing

The previous results prove that, in general, objects with consensus number > 1 cannot be implemented with registers only
Instead of sharing data structures, it is interesting to consider message passing models
  message passing: processes don't share data but can send and receive messages
(Note that message passing could be defined in the previous general framework: communication channels are then the shared data structures)

Message passing model

Processes communicate by messages
Communication is asynchronous (no bound on communication delays)
Communication is point-to-point and reliable
Processes can fail by crashing
Message passing models are suitable and natural for networks
(shared object models are more suitable for hardware)

Message passing

In message passing it is interesting to implement objects:
  objects are easier to work with
  some objects are natural in message passing models (e.g. registers, consensus)

Atomic register: practical point of view

Data server
Ensures safety properties:
  if a value is written, it remains available (even if the writer disappears)
  when a process ends its write(), all subsequent read() operations return this value (or a value written later); note that the writer knows when the write ends

Shared register implementation

With only one reader and one writer and a majority of correct processes (sketch):
  write(v), for the k-th write: the writer sends (v,k) to all processes and waits for an "ack" from a majority of processes
  read(): the reader queries all processes and waits for an answer (v,k) from a majority of processes; the value read is the value with the greatest k
  when a process receives (v,k) from the writer, it stores (v,k) and then sends an "ack" to the writer
  when a process receives a query from the reader, it answers with the stored (v,k)

It works… because:

By the majority assumption there is always at least one process that participated in both the last write and the read; the read therefore returns the last written value
(but this implementation is not really atomic: if the writer crashes during a write, subsequent reads could return either the previous value or the new one. It is not very difficult to fix: the reader always keeps the value with the maximal timestamp seen so far)
(some classical algorithms make it possible to implement general atomic registers from atomic registers with one reader and one writer)
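The majority argument can be tried out on a small in-memory simulation. This is a sketch under simplifying assumptions: messages are modelled as direct method calls, and the `reachable` parameter (hypothetical) lists the replicas whose answers arrive in time. The guard `k > stored k` in `on_write` is an addition that protects against reordered messages; the sketch on the slide simply stores (v,k). The class names are illustrative.

```python
class Replica:
    """A process storing the latest (value, k) pair it has received."""

    def __init__(self):
        self.stored = (None, 0)

    def on_write(self, v, k):
        if k > self.stored[1]:        # added guard, not in the slide's sketch
            self.stored = (v, k)
        return "ack"

    def on_query(self):
        return self.stored

def _require_majority(count, n):
    if count <= n // 2:
        raise ValueError("no majority of answers")

class Writer:
    def __init__(self, replicas):
        self.replicas, self.k = replicas, 0

    def write(self, v, reachable):
        """Send (v, k) and wait for acks from a majority."""
        self.k += 1
        acks = [self.replicas[i].on_write(v, self.k) for i in reachable]
        _require_majority(len(acks), len(self.replicas))

class Reader:
    def __init__(self, replicas):
        self.replicas = replicas

    def read(self, reachable):
        """Query a majority and return the value with the greatest k."""
        answers = [self.replicas[i].on_query() for i in reachable]
        _require_majority(len(answers), len(self.replicas))
        return max(answers, key=lambda vk: vk[1])[0]
```

With five replicas, a write acknowledged by replicas 0, 1, 2 and a read answered by replicas 2, 3, 4 share replica 2, which is enough for the read to see the written value.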

Implementation issues

in message passing there is no implementation of consensus (even if at most one process can crash)

the implementation of registers needs to have a majority of correct processes

Then … failure detectors

The impossibility results come from crashes (without failures all these problems are easy to solve)
Then: add oracles giving (possibly unreliable) information about crashes
  what information about process crashes is sufficient to solve the problem?
  what information about crashes is needed?

Failure detectors

Distributed "oracle" F:
  at each time t a process can query the failure detector and gets an answer (generally the answer is a list of processes suspected to be dead)
  the output is not necessarily the same at each process
  the output of failure detector F depends only on the history of crashes (not on the states of processes)

Example: perfect failure detector

Output: lists of suspected processes
  if p is in the list of q, then p has crashed
  if p has crashed, then p will eventually belong to the list of suspected processes of every correct q
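The two properties of the perfect failure detector can be phrased as checks on a recorded trace. This is a sketch with hypothetical data structures: `crash_time` maps each process to its crash time (or `None` if it is correct), and `outputs` maps `(process, time)` to the set it suspects at that time. An eventual property cannot be verified on a finite trace, so the second check only tests whether completeness already holds at a chosen final time.

```python
def strong_accuracy(crash_time, outputs):
    """No process is suspected before it has crashed."""
    for (_q, t), suspected in outputs.items():
        for p in suspected:
            crashed_at = crash_time[p]
            if crashed_at is None or crashed_at > t:
                return False
    return True

def completeness_at(crash_time, outputs, t_final):
    """At t_final, every correct process suspects every crashed process."""
    crashed = {p for p, c in crash_time.items() if c is not None}
    correct = set(crash_time) - crashed
    return all(crashed <= outputs.get((q, t_final), set()) for q in correct)

# Hypothetical trace: p crashes at time 3; q and r are correct.
crash_time = {"p": 3, "q": None, "r": None}
good = {("q", 1): set(), ("q", 5): {"p"}, ("r", 5): {"p"}}
bad = {("q", 1): {"p"}, ("q", 5): {"p"}, ("r", 5): {"p"}}  # p suspected too early
```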

Failure detector comparison

Reduction: failure detector F is weaker than failure detector F' (F ≤ F') if F can be implemented from F'
≤ defines a preorder (a partial order on classes of equivalent failure detectors)

Minimal Failure Detector

Given a problem P, F is a minimal failure detector for P if and only if:
  with the help of F, P can be solved
  if F' makes it possible to solve P, then F ≤ F'
Then, if F is a minimal failure detector for P, F encapsulates the information about crashes needed to solve P

Minimal Failure Detector

Why look for the minimal failure detector?
  to find the needed information about crashes
  to compare problems: if the minimal failure detector for P is weaker than the minimal failure detector for P', then P is easier than P'
(from a practical point of view, knowing the minimal failure detector helps to find the assumptions on the underlying system needed to solve the problem)

Then to implement objects

In message passing:
  for each object O, find the minimal failure detector to implement O
  from the comparison between these failure detectors we get a hierarchy on these objects
Then we get 2 hierarchies on objects:
  the consensus number, as defined before
  the minimal failure detector needed for the object

S-register

Begin with registers (consensus number = 1)
An S-register is an atomic register that only processes in S can read or write (but all processes may participate in its implementation)

Weakest failure detector

With a majority of correct processes, atomic registers can be implemented without any failure detector
But without a majority of correct processes? Failure detector Σ

Failure detector ΣS

ΣS(p,t) (output for process p of failure detector ΣS at time t) is a list of trusted processes (q ∈ ΣS(p,t) means that p considers that q is not dead at time t)
Intersection: for all processes p, q in S and all times t, t': ΣS(p,t) ∩ ΣS(q,t') ≠ ∅ (at least one process is trusted by both p and q)
Completeness: there is a time t such that for each correct process p in S and each time t' > t, ΣS(p,t') contains only correct processes

Remarks

With a majority of correct processes, ΣS can be implemented in asynchronous systems
ΣS gives a kind of quorum system (a quorum system is a family of sets such that any two members of the family have a non-empty intersection)
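The link between majorities and the intersection property can be verified exhaustively for small systems. The sketch below enumerates every majority subset of a process set and checks that any two of them intersect; the function names are illustrative.

```python
from itertools import combinations

def majorities(processes):
    """All subsets containing strictly more than half of the processes."""
    n = len(processes)
    return [set(c)
            for size in range(n // 2 + 1, n + 1)
            for c in combinations(processes, size)]

def pairwise_intersecting(family):
    """Quorum property: any two sets of the family share an element."""
    return all(a & b for a, b in combinations(family, 2))
```

The check succeeds for majorities because two sets that each hold more than n/2 of the n processes cannot be disjoint. A family such as `[{0, 1}, {2, 3}]` fails it.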

Theorem

ΣS is the weakest failure detector to implement S-register

sufficient part: adapt the previous algorithm

necessary part: more difficult…

S-Consensus

S is a set of processes
S-consensus: processes in S propose values and have to (irrevocably) decide. The decision has to ensure:
  Validity: the decision value has been proposed
  Agreement: if p and q decide, they decide the same value
  Termination: every correct process eventually decides

ΩS

ΩS(p,t) (output for p of failure detector ΩS at time t) is a process (the leader)
Eventual leader election: there is a time t and a correct process l such that for every correct process p in S and every time t' > t, ΩS(p,t') = l
Intuitively: after some time all processes agree on the same leader forever

Theorem

ΣS*ΩS is the weakest failure detector for S-consensus.

(ΣS*ΩS outputs both ΣS and ΩS)

For the proof (necessary condition)

Adaptation of the proof of Chandra, Hadzilacos and Toueg: from an S-consensus algorithm using a failure detector, implement ΩS
With reliable broadcast and S-consensus, implement an S-register (then use the previous theorem, which yields ΣS)

For the proof (sufficient condition)

Process in S, forever:
    C := 1 + r mod n
    send (Coord, v, r) to C
    wait until (One, *, r) is received from C, or C is suspected according to ΩS
    if (One, w, r) was received then FromCoord := w else FromCoord := undef
    send (Keep, FromCoord, r) to all
    wait until (Two, *, r) is received from all processes in ΣS
    if only one value v was received then
        decide this value v
        send (decide, v) to all
        stop
    else if only 2 values (w and undef) were received then
        v := w

All processes:
    when (Coord, *, k) is received for the first time (let (Coord, x, k) be this message):
        send (One, x, k) to all processes in S
    when (Keep, *, k) is received for the first time (let (Keep, x, k) be this message):
        send (Two, x, k) to all processes in S

k-consensus

k-consensus = consensus between any subset of k processes
Result: for 2 ≤ k ≤ n, the weakest failure detector for k-consensus is Σ*Ω

Proof (idea)

Consider the case k = 2
From the previous results: the weakest failure detector for 2-consensus is the set of ΣS*ΩS for all subsets S with 2 elements

Proof

From these ΣS (S ranging over the subsets with two elements) atomic registers can be implemented; then we get Σ
From these ΩS (S ranging over the subsets with two elements) it is possible to implement Ω: let G = (X, E) be the graph where X is the set of processes and (p,q) ∈ E if there is x such that q is an eventual leader for Ω{p,x}. Consider the strongly connected components of G: there is a unique sink component, and this sink contains (eventually) only correct processes.

[Figure: graph of processes; an edge from p to q means that p has q as leader; the sink component is marked.]

Proof

(sketch) From this we deduce an algorithm for Ω: all processes approximate this graph and compute the sink; the output of the emulated failure detector is this sink. Eventually, this sink contains only correct processes (then extract the same leader from this sink).
Then we get Ω
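The graph step of this proof can be sketched as a small computation, assuming the eventual leader of every pair is already known. A real emulation only approximates the graph over time; that part is not modelled here. The input format (a dict from pairs to leaders), the function names, and the final tie-break by smallest identifier are assumptions of the sketch. Components are found by plain reachability, which is adequate for small process sets.

```python
def leader_graph(leaders):
    """Edge p -> q when q is the leader of some pair containing p."""
    graph = {}
    for pair, leader in leaders.items():
        for p in pair:
            graph.setdefault(p, set()).add(leader)
    return graph

def reachable(graph, start):
    """Set of nodes reachable from start (start included)."""
    seen, stack = {start}, [start]
    while stack:
        for nxt in graph.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def sink_components(graph):
    """Strongly connected components that no edge leaves."""
    nodes = set(graph) | {q for targets in graph.values() for q in targets}
    reach = {v: reachable(graph, v) for v in nodes}
    sinks = set()
    for v in nodes:
        component = frozenset(u for u in reach[v] if v in reach[u])
        if reach[v] == component:     # nothing reachable outside the component
            sinks.add(component)
    return sinks

def extract_leader(leaders):
    """Pick one leader from the unique sink (smallest id: an arbitrary rule)."""
    (sink,) = sink_components(leader_graph(leaders))
    return min(sink)

# Hypothetical stable outputs of the pairwise detectors.
star = {("a", "b"): "a", ("a", "c"): "a", ("a", "d"): "a",
        ("b", "c"): "b", ("b", "d"): "b", ("c", "d"): "d"}
cycle = {("a", "b"): "b", ("b", "c"): "c", ("a", "c"): "a"}
```

In `star`, process a leads every pair it belongs to, so the sink is {a}. In `cycle`, the three processes form one strongly connected component, which is then the sink.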

Corollary

If the consensus number of atomic object T is 2, then:
  the weakest failure detector for T is Σ*Ω
  every failure detector implementing T implements any object (in other words, T is universal for all n)

Corollary

In message passing models with failure detectors there are only two classes of objects:
  no consensus, k = 1 (atomic registers, Σ)
  k > 1, hence consensus for every n (Σ*Ω)

Conclusion

In shared memory, objects are given (by hardware) and can be compared by consensus number
  for example: no implementation of compare-&-swap with test-&-set and registers
In message passing with failure detectors, objects are implemented and there are (essentially) two classes (with consensus and without consensus)
  all objects with consensus number > 1 are equivalent!
Implementation with shared objects and implementation in message passing with failure detectors are not the same!

Conclusion…

Transactional memory:
  Abortable objects
  Making sequences of code atomic
The hierarchy of failure detectors
  Shared memory vs. message passing
  k-set agreement
  The weakest of the weakest?
