Dynamic Atomic Storage Without Consensus

Alex Shraer (Technion)

Joint work with: Marcos K. Aguilera (MSR), Idit Keidar (Technion), Dahlia Malkhi (MSR)
Date posted: 20-Dec-2015

The Goal

• Reliable replicated storage
• Using unreliable components
• Asynchrony: tolerate unpredictable network delays

[Figure labels: client, server (process)]

Designing an Asynchronous Replicated System

• State machine replication (e.g., Paxos)
  – Any object
  – Impossible in asynchronous systems
• Atomic R/W Register [Attiya, Bar-Noy, Dolev 95]
  – Simple object: read(), write(v)
  – Possible in asynchronous systems
  – Atomic (linearizable)
  – Liveness: if #failures < #servers/2, then every operation invoked on a correct server eventually completes
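A minimal in-memory sketch of the majority-quorum idea behind such a register. It is an illustration only, not the published algorithm: the names `Replica`, `QuorumRegister` and `crashed` are hypothetical, message passing is replaced by direct calls, and each phase contacts a randomly chosen majority of the non-crashed servers.

```python
import random
from dataclasses import dataclass
from typing import Any, List, Tuple

Timestamp = Tuple[int, int]  # (sequence number, writer id)


@dataclass
class Replica:
    """One server holding a timestamped value (hypothetical model)."""
    ts: Timestamp = (0, 0)
    value: Any = None
    crashed: bool = False


class QuorumRegister:
    def __init__(self, n: int) -> None:
        self.replicas: List[Replica] = [Replica() for _ in range(n)]

    def _some_majority(self) -> List[Replica]:
        live = [r for r in self.replicas if not r.crashed]
        majority = len(self.replicas) // 2 + 1
        if len(live) < majority:
            # In a real asynchronous system the operation would simply never complete.
            raise RuntimeError("fewer than a majority of servers are correct")
        return random.sample(live, majority)

    def _query(self) -> Tuple[Timestamp, Any]:
        # Phase 1: ask a majority and keep the value with the highest timestamp.
        newest = max(self._some_majority(), key=lambda r: r.ts)
        return newest.ts, newest.value

    def _store(self, ts: Timestamp, value: Any) -> None:
        # Phase 2: install the value at a majority, never overwriting newer data.
        for r in self._some_majority():
            if ts > r.ts:
                r.ts, r.value = ts, value

    def write(self, value: Any, writer_id: int = 0) -> str:
        (seq, _), _ = self._query()
        self._store((seq + 1, writer_id), value)
        return "OK"

    def read(self) -> Any:
        ts, value = self._query()
        self._store(ts, value)  # write back so later reads cannot see older data
        return value
```

Because any two majorities of the same server set intersect, a read always meets at least one replica that holds the latest completed write, whichever majorities happen to be chosen.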

Breaking the Minority Barrier

• Over a long period of time, #failures < #servers/2 is not good enough
• Reconfiguration!
  – Increasing resilience by changing the set of servers
  – Example: 3 failures out of 5
• Semantics of Reconfigurable R/W register:
  – Atomic (linearizable)
  – Liveness: ?

[Figure: servers A, B, C, D, E]

Our first contribution: first "black box" definition (in terms of user interface)

Reconfigurable Register: User Interface

• read() (returns a value)
• write(value) (returns OK)
• reconfig(c) (returns OK)
  – c is a set of changes (relative to the current config.)
  – Each change is either (Add, pid) or (Remove, pid)
  – Example: c = {+C, +E, –D}
• Only processes that were successfully added can invoke ops
• Universe of processes (servers):
  – Unknown, unbounded, possibly infinite
  – At any given time, only a finite number have been added
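A small sketch of how a change set c could be applied to a configuration. The function name `apply_changes`, the tuple encoding of a change, and the starting configuration {A, B, D} are assumptions made for illustration; the slides only fix the shape (Add, pid) / (Remove, pid). Letting a Remove win over an Add of the same pid is also an arbitrary choice here.

```python
from typing import FrozenSet, Iterable, Tuple

Change = Tuple[str, str]  # ("Add" | "Remove", pid)


def apply_changes(config: FrozenSet[str], changes: Iterable[Change]) -> FrozenSet[str]:
    """Return the configuration obtained by applying `changes` to `config`."""
    adds, removes = set(), set()
    for kind, pid in changes:
        if kind == "Add":
            adds.add(pid)
        elif kind == "Remove":
            removes.add(pid)
        else:
            raise ValueError(f"unknown change kind: {kind!r}")
    # Assumption: if a pid is both added and removed, the removal wins.
    return frozenset((set(config) | adds) - removes)


# c = {+C, +E, –D} applied to a hypothetical current configuration {A, B, D}
c = {("Add", "C"), ("Add", "E"), ("Remove", "D")}
new_config = apply_changes(frozenset("ABD"), c)
```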

Definitions

• Current(t): servers in the system at time t (the “current configuration”)
• AddPending(t): servers whose Add is pending at t
• RemovePending(t): servers whose Remove is pending at t
• Faulty(t): servers that have crashed by t
• p_i is active in an execution if:
  – During the execution, p_i does not crash
  – Some process invokes reconfig adding p_i
  – No process invokes reconfig removing p_i

Dynamic System Liveness

• Static system: operations complete if #failures < #servers/2
• What should this be in a dynamic system?
• Try #1: for every t, a minority of Current(t) is in Faulty(t)
  – What if processes crash while others are removed?
  – No operation is guaranteed to complete in the new configuration!
• Try #2: for every t, a minority of Current(t) is in Faulty(t) ∪ RemovePending(t)

[Figure: servers A, B, C; reconfig({–A}) is invoked and returns OK]

Adding Servers

Q: At time t0, who can crash from {A, B, ..., G}?
A: A minority of {A, B, ..., E}, and in addition:
  – in this scenario, G can crash
  – in a different scenario, F can crash

• Simple condition: any 2 servers can fail (fewer than |Current(t)|/2)

[Figure: servers A to G, with reconfig({+F}) returning OK, reconfig({+G}) returning OK, and time t0 marked]

Dynamic Service Liveness

If:
• the number of reconfigs invoked in the execution is finite, and
• at every time t in the execution, fewer than |Current(t)|/2 processes out of Current(t) ∪ AddPending(t) are in Faulty(t) ∪ RemovePending(t)

Then:
• Eventually, every active process that was successfully added can invoke operations
• Every operation invoked by an active process eventually completes
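The failure condition at a single time t can be checked mechanically. The sketch below is only an illustration: the function name `liveness_condition_holds` and the concrete sets are assumptions, chosen to mirror the three-server removal case and the five-server case with F and G being added.

```python
from typing import FrozenSet


def liveness_condition_holds(
    current: FrozenSet[str],
    add_pending: FrozenSet[str],
    remove_pending: FrozenSet[str],
    faulty: FrozenSet[str],
) -> bool:
    """True if fewer than |Current|/2 processes of Current ∪ AddPending
    are in Faulty ∪ RemovePending (evaluated at one point in time)."""
    candidates = current | add_pending
    unavailable = (faulty | remove_pending) & candidates
    return len(unavailable) < len(current) / 2


# Five current servers, F and G being added (assumed snapshot at some time t).
current = frozenset("ABCDE")
pending_adds = frozenset("FG")

two_failures = liveness_condition_holds(current, pending_adds, frozenset(), frozenset("AG"))
three_failures = liveness_condition_holds(current, pending_adds, frozenset(), frozenset("ABG"))

# Three servers, A being removed while B has crashed: the condition is violated.
removal_case = liveness_condition_holds(frozenset("ABC"), frozenset(), frozenset("A"), frozenset("B"))
```

With five current servers any two failures are tolerated, while a third failure, or a crash combined with a pending removal in the three-server case, breaks the condition.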

Reconfigurable Solutions

Many previous solutions: all use consensus (or similar)

• State machine replication (Paxos)
  – Use state machine to agree on the set of servers
• Virtual Synchrony based solutions
  – e.g., [Yeger-Lotem, Keidar, Dolev 97]
• R/W register + reconfiguration service
  – [Lynch, Shvartsman 97], [Englert, Shvartsman 00]
  – Rambo [Lynch, Shvartsman 02]
  – Rambo II [Gilbert, Lynch, Shvartsman 03]
  – Long Lived Rambo [Georgiou, Musial, Shvartsman 04]
• Is consensus really necessary?

Callouts on the slide:
  – consensus to agree on next configuration
  – one designated “reconfigurer”
  – membership service stronger than consensus (equivalent to P)

Our second contribution: consensus is NOT needed! DynaStore: an algorithm for a completely asynchronous system

“Old” and “New” Configurations

• A reconfiguration transfers the state from a majority of the old config. to a majority of the new config.
• What if there are concurrent reconfigurations?
• Suppose that the initial configuration is {A, B, C, D}
  – A invokes reconfig({+E}); C invokes reconfig({–D})
  – A writes to {A, D, E}, a majority of {A, B, C, D, E}
  – C reads from {B, C}, a majority of {A, B, C}
  – No intersection ⇒ atomicity is violated!
• Simple solution: consensus on the sequence of configurations
• But how can we do this without consensus?
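The disjoint-quorum scenario above can be verified with plain set arithmetic. This is a sketch; `is_majority` is a hypothetical helper name.

```python
from typing import FrozenSet


def is_majority(quorum: FrozenSet[str], config: FrozenSet[str]) -> bool:
    """True if `quorum` is a subset of `config` containing more than half of it."""
    return quorum <= config and len(quorum) > len(config) / 2


initial = frozenset("ABCD")
after_add = initial | {"E"}      # A's view after reconfig({+E})
after_remove = initial - {"D"}   # C's view after reconfig({–D})

write_quorum = frozenset("ADE")  # where A writes
read_quorum = frozenset("BC")    # where C reads

both_are_majorities = (
    is_majority(write_quorum, after_add) and is_majority(read_quorum, after_remove)
)
overlap = write_quorum & read_quorum  # empty: the read can miss the write
```

Each quorum is a majority of its own configuration, yet the two share no server, so the read is not guaranteed to observe the write.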

The approach in DynaStore

• For each configuration c, we use a (weak) snapshot nextConfig(c) to store the next configuration
• (Weak) snapshot objects are (easily) implemented in an asynchronous environment
• Processes update nextConfig(c) to suggest the next configuration after c (concurrent updates possible)
• Sequence of Established Configurations (simplified):
  – The initial configuration is established
  – If c is established, then the first snapshot update to nextConfig(c) is the next established configuration after c
  – Callout: the first update is included in every scan from nextConfig(c)
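A hypothetical single-machine model of the update/scan interface, meant only to make the behaviour concrete. The class name `WeakSnapshotSketch`, the per-process slots, and the rule "write only if a collect finds nothing" are assumptions for illustration and may differ from DynaStore's actual construction, where the slots would be shared registers accessed over the network.

```python
from typing import Dict, FrozenSet


class WeakSnapshotSketch:
    """Sequential toy model of one nextConfig(c) object."""

    def __init__(self) -> None:
        self._slots: Dict[str, str] = {}  # one slot per process id

    def collect(self) -> FrozenSet[str]:
        """Read every slot and return the set of proposals found."""
        return frozenset(self._slots.values())

    def write_slot(self, pid: str, proposal: str) -> None:
        self._slots[pid] = proposal

    def update(self, pid: str, proposal: str) -> str:
        # Assumed rule: only propose if no proposal is visible yet.
        if not self.collect():
            self.write_slot(pid, proposal)
        return "OK"

    def scan(self) -> FrozenSet[str]:
        first = self.collect()
        if not first:
            return frozenset()
        # Collect again so proposals written concurrently with the first one are seen.
        return self.collect()


# Sequential updates: only the first proposal is ever stored.
sequential = WeakSnapshotSketch()
sequential.update("A", "C1")
sequential.update("C", "C2")

# Concurrent updates, modelled by interleaving the two steps of update().
concurrent = WeakSnapshotSketch()
a_saw_nothing = not concurrent.collect()
c_saw_nothing = not concurrent.collect()
if a_saw_nothing:
    concurrent.write_slot("A", "C1")
if c_saw_nothing:
    concurrent.write_slot("C", "C2")
```

In the sequential run a scan returns only C1; in the interleaved run it returns both C1 and C2, which is how a process learns that another update was concurrent with its own.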

Transferring the State

• A scan of nextConfig(c) returns a set of configs that follow c
  – If c is established, one config in the returned set is the next established config after c
• Scanning nextConfig for each returned config returns a further set, etc. This creates a DAG of configurations
  – This DAG contains the sequence of established configs
• A reconfiguration transfers state along all paths in the DAG
  – This guarantees that state is transferred along the sequence of established configurations
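The repeated scanning described above amounts to a graph traversal. In this sketch a dictionary stands in for the results of scanning each nextConfig object; the function name `traverse` and the four sample configurations are assumptions for illustration.

```python
from collections import deque
from typing import Dict, FrozenSet, List, Set

Config = FrozenSet[str]


def traverse(start: Config, next_config: Dict[Config, Set[Config]]) -> List[Config]:
    """Return every configuration reachable from `start`, breadth first.

    `next_config[c]` plays the role of the set returned by scanning nextConfig(c).
    """
    seen = {start}
    order = [start]
    queue = deque([start])
    while queue:
        c = queue.popleft()
        # Sort successors by member list for a deterministic visiting order.
        for nxt in sorted(next_config.get(c, set()), key=sorted):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


# Hypothetical DAG: two proposals follow c0, and both lead to c3.
c0 = frozenset("ABCD")
c1 = frozenset("ABCDE")
c2 = frozenset("ABC")
c3 = frozenset("ABCE")
dag = {c0: {c1, c2}, c1: {c3}, c2: {c3}}

reachable = traverse(c0, dag)
```

Every configuration on any path is visited exactly once, so whichever proposal turns out to be the established one, it is covered.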

Example

• Suppose that the initial configuration is {A, B, C, D}
• A invokes reconfig({+E}); C invokes reconfig({–D})
• A updates nextConfig(C0) to C1
• A scans nextConfig(C0) to check for concurrent updates. The scan returns {C1}, i.e., no concurrent updates detected
  – C1 is the next established config after C0
• A’s state transfer:
  – Read from maj. of C0 and maj. of C1
  – Write latest value found to maj. of C1

[Figure: C0 = {A, B, C, D} with an edge to C1 = {A, B, C, D, E}]

Example

• Suppose that the initial configuration is {A, B, C, D}
• A invokes reconfig({+E}); C invokes reconfig({–D})
• Concurrently, C updates nextConfig(C0) to C2 and scans it. The scan returns {C1, C2}, implying that A’s update was concurrent
• C updates nextConfig(C1) and nextConfig(C2) to C3. No concurrent updates detected
  – C3 is an established configuration
• C’s state transfer:
  – Read from maj. of each config on every path found from C0 to C3
  – Write latest value found to maj. of C3

[Figure: configuration DAG with C0 = {A, B, C, D}, C1 = {A, B, C, D, E}, C2 = {A, B, C}, C3 = {A, B, C, E}; edges from C0 to C1, C0 to C2, C1 to C3, C2 to C3]

Example

• Suppose that the initial configuration is {A, B, C, D}
• A invokes reconfig({+E}); C invokes reconfig({–D})
• A invokes a write(newValue) operation in C1
• In this scenario, DynaStore guarantees:
  1. Either C’s state transfer finds newValue in C1, or A’s write op discovers C3 and ends after writing newValue to maj. of C3
  2. Read operations also traverse the DAG, and will find newValue on the path of established configurations, intersecting the write

[Figure: configuration DAG with C0 = {A, B, C, D}, C1 = {A, B, C, D, E}, C2 = {A, B, C}, C3 = {A, B, C, E}; edges from C0 to C1, C0 to C2, C1 to C3, C2 to C3]
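A toy illustration of the first case, where C’s state transfer finds newValue in C1. Reading from a majority of a configuration is abstracted into a dictionary lookup; the names `all_paths`, `state_transfer` and `stored`, and the integer timestamps, are assumptions made for this sketch rather than part of DynaStore.

```python
from typing import Dict, FrozenSet, List, Tuple

Config = FrozenSet[str]
Stamped = Tuple[int, str]  # (timestamp, value)


def all_paths(dag: Dict[Config, List[Config]], src: Config, dst: Config) -> List[List[Config]]:
    """Enumerate every path from `src` to `dst` in the configuration DAG."""
    if src == dst:
        return [[src]]
    return [[src] + rest for nxt in dag.get(src, []) for rest in all_paths(dag, nxt, dst)]


def state_transfer(
    dag: Dict[Config, List[Config]],
    src: Config,
    dst: Config,
    stored: Dict[Config, Stamped],
) -> Stamped:
    """Read the value held in each config on every path, then write the latest to `dst`.

    `stored[c]` abstracts "the latest stamped value a majority of c returns".
    """
    visited = {c for path in all_paths(dag, src, dst) for c in path}
    latest = max(stored[c] for c in visited if c in stored)
    stored[dst] = latest
    return latest


c0 = frozenset("ABCD")
c1 = frozenset("ABCDE")
c2 = frozenset("ABC")
c3 = frozenset("ABCE")
dag = {c0: [c1, c2], c1: [c3], c2: [c3]}

# A's write has already reached a majority of C1 when C's transfer runs.
stored = {c0: (0, "initialValue"), c1: (1, "newValue")}
transferred = state_transfer(dag, c0, c3, stored)
```

Because the transfer reads along both paths, through C1 and through C2, it cannot miss a value that was written only in C1.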

Conclusions

• First “black box” definition of dynamic R/W register
  – In terms of events visible to the user
  – A natural failure model: resilience changes dynamically
  – Possibly useful for specifying other dynamic problems
• DynaStore: first asynch. dynamic storage protocol
  – Implements a Reconfigurable Atomic MWMR register
  – In a completely asynchronous system (consensus impossible)
  – Proves that R/W storage is really easier than consensus (not only in a static system)