
Idit Keidar, Principles of Reliable Distributed Systems, Technion EE

Principles of Reliable Distributed Systems

Lecture 11: Atomic Shared Memory Objects & Shared Memory Emulations

Idit Keidar


Material

• Attiya and Welch, Distributed Computing
  – Ch. 9 & 10

• Nancy Lynch, Distributed Algorithms
  – Ch. 13 & 17

• Linearizability slides adapted from Maurice Herlihy


Shared Memory Model

• All communication through shared memory!
  – No message-passing.

• Shared memory registers/objects.

• Accessed by processes with ids 1,2,…

• Note: we have two types of entities: objects and processes


Motivation

• Multiprocessors with shared memory

• Multi-threaded programs

• Distributed shared memory (DSM)

• Abstraction for message passing systems; we will see how to:
  – Emulate shared memory in message passing systems
  – Use shared memory for consensus and state machine replication


Linearizability (Atomicity): Semantics for Concurrent Objects


FIFO Queue: Enqueue Method

[Figure: a process calls q.enq( ) on a FIFO queue]


FIFO Queue: Dequeue Method

[Figure: a process calls q.deq() on a FIFO queue and receives an item]


Sequential Objects

• Each object has a state
  – Usually given by a set of fields
  – Queue example: sequence of items

• Each object has a set of methods
  – Only way to manipulate state
  – Queue example: enq and deq methods


Methods Take Time

[Figure: timeline of one method call. Invocation of q.enq(...) at 12:00; response void at 12:01.]


Split Method Calls into Two Events

• Invocation
  – Method name & args
  – q.enq(x)

• Response
  – Result or exception
  – q.enq(x) returns void
  – q.deq() returns x
  – q.deq() throws empty


A Single Process (Thread)

• Sequence of events

• First event is an invocation

• Alternates matching invocations and responses

• This is called a well-formed interaction
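The well-formedness condition for a single process can be sketched in a few lines of Python. The event encoding (a pair of kind and method name) and the function name is_well_formed are assumptions made for this illustration; they do not come from the lecture.

```python
from typing import List, Tuple

# Hypothetical encoding: ("inv", "q.enq") is an invocation,
# ("res", "q.enq") is the matching response.
Event = Tuple[str, str]

def is_well_formed(events: List[Event]) -> bool:
    """Check that one process's events alternate invocation/response, starting with an invocation."""
    pending = None  # method whose response has not arrived yet
    for kind, method in events:
        if kind == "inv":
            if pending is not None:   # a second invocation before the response
                return False
            pending = method
        elif kind == "res":
            if pending != method:     # a response with no matching invocation
                return False
            pending = None
        else:
            return False              # unknown event kind
    return True  # a trailing pending invocation is still well formed
```

A sequence that starts with a response, or that issues two invocations in a row, is rejected; a sequence that ends with a pending invocation is accepted.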


Concurrent Methods Take Overlapping Time

[Figure: timeline with three method calls whose intervals overlap]


Concurrent Objects

• What does it mean for a concurrent object to be correct?

• What is a concurrent FIFO queue?
  – FIFO means strict temporal order
  – Concurrent means ambiguous temporal order

• Help!


Sequential Specifications

• Precondition, say for q.deq(…)
  – Queue is non-empty

• Postcondition:
  – Returns & removes first item in queue

• You got a problem with that?


Concurrent Specifications

• Naïve approach
  – Object has n methods
  – Must specify O(n²) possible interactions
  – Maybe more

If the queue is empty and then enq begins and deq begins after enq(x) begins but before enq(x) ends and then enq returns before deq then…

• Linearizability: same as it ever was


Linearizability

• Each method should
  – “Take effect”
    • Effect defined by the sequential specification
  – Instantaneously
    • Take 0 time
  – Between its invocation and response events
    • Real-time order
    • Pending method (invocation and no response) can either occur after its invocation or not at all


Linearization

• A linearization of a concurrent execution is
  1. A sequential execution
    • Each invocation is immediately followed by its response
    • Satisfies the object’s sequential specification
  2. Looks like the concurrent execution
    • Responses to all invocations are the same as in the concurrent execution
    • Responses to pending invocations in the concurrent execution may be added
  3. Preserves real-time order
    • Each invocation-response pair occurs between the corresponding invocation and response in the concurrent execution
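The three conditions can be checked by brute force for small histories. The sketch below is illustrative only: it assumes a FIFO queue, a history made of complete operations (no pending invocations), and numeric timestamps; the names Op and find_linearization are invented here. It tries every sequential order, so it is practical only for a handful of operations.

```python
from collections import deque
from itertools import permutations
from typing import NamedTuple, Optional, Sequence

class Op(NamedTuple):
    method: str    # "enq" or "deq"
    value: object  # argument of enq, or the value that deq returned
    inv: float     # invocation time
    res: float     # response time

def respects_real_time(order: Sequence[Op]) -> bool:
    # Violation: b is placed after a although b responded before a was invoked.
    return not any(b.res < a.inv
                   for i, a in enumerate(order) for b in order[i + 1:])

def satisfies_queue_spec(order: Sequence[Op]) -> bool:
    # Replay the candidate order against a sequential FIFO queue.
    q = deque()
    for op in order:
        if op.method == "enq":
            q.append(op.value)
        elif not q or q.popleft() != op.value:
            return False
    return True

def find_linearization(history: Sequence[Op]) -> Optional[Sequence[Op]]:
    """Return some linearization of the history, or None if there is none."""
    for order in permutations(history):
        if respects_real_time(order) and satisfies_queue_spec(order):
            return order
    return None
```

For instance, two overlapping enqueues followed by dequeues in either order are accepted, while enq(x) completing strictly before enq(y) begins, followed by a dequeue that returns y, is rejected.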


Linearizability and Atomicity

• A concurrent execution that has a linearization is linearizable

• An object that has only linearizable executions is atomic


Why Linearizability?

• “Religion”, not science

• Scientific justification:
  – Facilitates reasoning
  – Nice mathematical properties

• Common-sense justification
  – Preserves real-time order
  – Matches my intuition (sorry about yours)


Example

[Figure: timeline with operations q.enq(x), q.enq(y), q.deq(x), q.deq(y)]


Example

[Figure: timeline with operations q.enq(x), q.enq(y), q.deq(y)]


Example

[Figure: timeline with operations q.enq(x), q.deq(x)]


Example

[Figure: timeline with operations q.enq(x), q.enq(y), q.deq(y), q.deq(x)]


Read/Write Variable Example

[Figure: timeline with operations read(1), write(0), write(1), read(0). Annotation: write(1) happened after write(0).]


Read/Write Variable Example

[Figure: timeline with operations read(1), write(0), write(1), write(2), read(1). Annotation: write(1) already happened.]


Read/Write Variable Example

[Figure: timeline with operations read(1), write(0), write(1), write(2), read(2)]


Concurrency

• How much concurrency does linearizability allow?

• When must a method invocation block?

• Focus on total methods
  – Defined in every state
  – Why?


Concurrency

• Question: when does linearizability require a method invocation to block?

• Answer: never!

• Linearizability is non-blocking


Non-Blocking Theorem

If method invocation A q.invoc() is pending in linearizable history H, then there exists a response A q:resp() such that H + A q:resp() is legal.


Note on Non-Blocking

• A given implementation of linearizability may be blocking

• The property itself does not mandate it
  – For every pending invocation, there is always a possible return value that does not violate linearizability
  – The implementation may not always know it…


Atomic Objects

• An object is atomic if all of its concurrent executions are linearizable

• What if we want an atomic operation on multiple objects?


Serializability

• A transaction is a finite sequence of method calls

• A history is serializable if transactions appear to execute serially
  – It is strictly serializable if the order is also compatible with real-time

• Used in databases and, more recently, in transactional memory


Serializability is Blocking

[Figure: two transactions. Operations shown: x.read(0), y.read(0), x.write(1), y.write(1). Outcome: deadlock.]


Comparison

• Serializability appropriate for
  – Fault-tolerance
  – Multi-step transactions

• Linearizability appropriate for
  – Single objects
  – Multiprocessor synchronization


Critical Sections

• Easy way to implement linearizability
  – Take sequential object
  – Make each method a critical section

• Like synchronized methods in Java™

• Problems?
  – Blocking
  – No concurrency
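As an illustration of the critical-section approach, here is a minimal Python sketch that wraps a sequential queue in a single lock. The class name LockedQueue is an assumption of this example; the slides mention Java's synchronized methods, and Python's threading.Lock is used here only as a comparable mechanism.

```python
import threading
from collections import deque

class LockedQueue:
    """A sequential FIFO queue in which every method is one critical section."""

    def __init__(self):
        self._items = deque()
        self._lock = threading.Lock()

    def enq(self, x):
        with self._lock:              # the whole method body is the critical section
            self._items.append(x)

    def deq(self):
        with self._lock:
            if not self._items:
                raise IndexError("empty")   # plays the role of "throws empty"
            return self._items.popleft()
```

Each call can be linearized at some point while it holds the lock. The cost is the one named above: a thread that stalls inside the lock blocks every other caller, and no two calls ever run concurrently.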


Linearizability Summary

• Linearizability
  – Operation takes effect instantaneously between invocation and response

• Uses sequential specification
  – No O(n²) interactions

• Non-Blocking
  – Never required to pause method call

• Granularity matters


Atomic Register Emulation in a Message-Passing System

[Attiya, Bar-Noy, Dolev]


Distributed Shared Memory (DSM)

• Can we provide the illusion of atomic shared-memory registers in a message-passing system?

• In an asynchronous system?

• Where processes can fail?


Liveness Requirement

• Wait-freedom: every operation by a correct process p eventually completes
  – In a finite number of p’s steps

• Regardless of steps taken by other processes
  – In particular, the other processes may fail or take any number of steps between p’s steps
  – But p must be given a chance to take as many steps as it needs (fairness).


Register

• Holds a value

• Can be read

• Can be written

• Interface:
  – int read(); /* returns a value */
  – void write(int v); /* returns ack */


Take I: Failure-Free Case

• Each process keeps a local copy of the register

• Let’s try state machine replication
  – Step 1: Implement atomic broadcast
  – How?

• Recall: atomic broadcast service interface:
  – broadcast(m)
  – deliver(m)


Emulation with Atomic Broadcast (Failure-Free)

• Upon client request (read/write)
  – Broadcast (abcast) the request

• Upon deliver write request
  – Write to local copy of register
  – If from local client, return ack to client

• Upon deliver read request
  – If from local client, return local register value to client

• Homework questions:
  – Show that the emulated register is atomic
  – Is broadcasting reads required for atomicity?
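The handlers above can be simulated in a single Python program. This is only a sketch under strong assumptions: a shared list stands in for the failure-free atomic broadcast service (append is abcast, and list order is the agreed delivery order), the register starts at 0, and the names Process, request and deliver_pending are invented for the example.

```python
class Process:
    def __init__(self, pid: int, log: list):
        self.pid = pid
        self.log = log        # shared list: stand-in for the atomic broadcast service
        self.register = 0     # local copy of the register (initial value assumed)
        self.delivered = 0    # number of log entries delivered so far
        self.results = {}     # request index -> value returned to the local client

    def request(self, kind: str, value=None) -> int:
        """Upon client request: abcast it. Returns its position in the total order."""
        self.log.append((self.pid, kind, value))
        return len(self.log) - 1

    def deliver_pending(self) -> None:
        """Deliver every request not yet seen, in the agreed total order."""
        while self.delivered < len(self.log):
            sender, kind, value = self.log[self.delivered]
            if kind == "write":
                self.register = value                    # write to the local copy
                if sender == self.pid:
                    self.results[self.delivered] = "ack"
            elif sender == self.pid:                     # a read from the local client
                self.results[self.delivered] = self.register
            self.delivered += 1
```

A client operation completes only when its own request is delivered at its own process, which is why every process applies writes in the same order.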


What If Processes Can Crash?

• Does the same solution work?


ABD: Fault-Tolerant Emulation [Attiya, Bar-Noy, Dolev]

• Assumes up to f < n/2 processes can fail

• Main ideas:
  – Store value at majority of processes before write completes
  – Read from majority
  – Read intersects write, hence sees latest value
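The intersection argument rests on simple counting: two sets that each contain more than n/2 of the n processes must share at least one member. The Python sketch below checks this exhaustively for small n; the helper names are assumptions made for the example.

```python
from itertools import combinations

def majority(n: int) -> int:
    """Smallest number of processes that forms a majority of n."""
    return n // 2 + 1

def majorities_intersect(n: int) -> bool:
    """Exhaustively check that every two majorities of n processes share a member."""
    quorums = [set(c) for c in combinations(range(n), majority(n))]
    return all(a & b for a in quorums for b in quorums)

def max_failures(n: int) -> int:
    """Largest f with f < n/2, so that the n - f correct processes still form a majority."""
    return (n - 1) // 2
```

With n = 5, for example, a majority is 3 processes and up to 2 may fail, which still leaves 3 correct processes to answer.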


Take II: 1-Reader 1-Writer (SRSW)

• Single-reader – there is only one process that can read from the register

• Single-writer – there is only one process that can write to the register

• The reader and writer are just 2 processes
  – The other n-2 processes are there to help


Trivial Solution?

• Writer simply sends message to reader
  – When does it return ack?
  – What about failures?

• We want a wait-free solution:
  – If the reader (writer) fails, the writer (reader) should be able to continue writing (reading)


SRSW Algorithm: Variables

• At each process:
  – x, a copy of the register
  – t, initially 0, unique tag associated with latest write


SRSW Algorithm: Emulating Write

• To perform write(x,v)
  – choose tag > t
  – set x ← v; t ← tag
  – send (“write”, v, t) to all

• Upon receive (“write”, v, tag)
  – if (tag > t) then set x ← v; t ← tag fi
  – send (“ack”, v, tag) to writer

• When writer receives (“ack”, v, t) from majority (counting an ack from itself too)
  – return ack to client


SRSW Algorithm: Emulating Read

• To perform read(x,v)
  – send (“read”) to all

• Upon receive (“read”)
  – send (“read-ack”, x, t) to reader

• When reader receives (“read-ack”, v, tag) from majority (including local values of x and t)
  – choose value v associated with largest tag
  – store these values in x,t
  – return x
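The single-reader single-writer write and read steps can be tried out in a small single-threaded Python simulation. This is a sketch, not the lecture's code: message passing is replaced by direct calls, the responders argument (an assumption of this example) names the processes whose replies arrive, tags are taken to be integers chosen as t + 1, and a missing majority raises an error where the real algorithm would keep waiting.

```python
from typing import List, Sequence

class Replica:
    """State kept at each process: x (register copy) and t (tag of latest write seen)."""
    def __init__(self):
        self.x = None
        self.t = 0

def _need_majority(count: int, n: int) -> None:
    if count <= n // 2:
        raise RuntimeError("no majority responded; the real algorithm would keep waiting")

def srsw_write(replicas: List[Replica], writer: int, v, responders: Sequence[int]) -> str:
    w = replicas[writer]
    tag = w.t + 1                      # choose tag > t
    w.x, w.t = v, tag                  # writer updates its own copy first
    acks = 1                           # the writer counts its own ack
    for i in set(responders) - {writer}:
        r = replicas[i]
        if tag > r.t:                  # upon receive ("write", v, tag)
            r.x, r.t = v, tag
        acks += 1                      # ("ack", v, tag) goes back to the writer
    _need_majority(acks, len(replicas))
    return "ack"

def srsw_read(replicas: List[Replica], reader: int, responders: Sequence[int]):
    me = replicas[reader]
    replies = [(me.t, me.x)]           # local values count toward the majority
    for i in set(responders) - {reader}:
        replies.append((replicas[i].t, replicas[i].x))   # ("read-ack", x, t)
    _need_majority(len(replies), len(replicas))
    tag, v = max(replies, key=lambda reply: reply[0])    # largest tag wins
    me.x, me.t = v, tag                # store the chosen pair locally
    return me.x
```

With five replicas, a write acknowledged by processes {0, 1, 2} and a later read answered by {2, 3, 4} overlap at process 2, so the read returns the written value.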


Does This Work?

• Only possible overlap is between read and write
  – why?

• When a read does not overlap any write
  – It reads at least one copy that was written by the latest write (why?)
  – This copy has the highest tag (why?)

• What is the linearization order when there is overlap between read and write?

• What if 2 reads overlap the same write?


Example

[Figure: timeline with write(1) and two reads, read(1) and read(?). Annotations: "write(1) already happened"; "finds a copy that was written"; "does not find a written copy but local copy written by read".]


Wait-Freedom

• Only waiting is for majority of responses

• There is a correct majority

• All correct processes respond to all requests
  – Respond even if the tag is smaller


Take III: n-Reader 1-Writer (MRSW)

• n-reader – all the processes can read

• Does the previous solution work?

• What if 2 reads by different processes overlap the same write?


Example

[Figure: timeline with write(1) and two reads by different processes, read(1) and read(?). Annotations: "write(1) already happened"; "finds a copy that was written"; "does not find a written copy, returns 0".]


MRSW Algorithm: Extending the Read

• When reader receives (“read-ack”, v, tag) from majority
  – choose value v associated with largest tag
  – store these values in x,t
  – send (“propagate”, x, t) to all (except writer)

• Upon receive (“propagate”, v, tag) from process i
  – if (tag > t) then set x ← v; t ← tag fi
  – send (“prop-ack”, x, t) to process i

• When reader receives (“prop-ack”, v, tag) from majority (including itself)
  – return x
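The two-phase read can be sketched as a single Python function over in-memory copies. Everything here is an illustrative assumption: copies[i] holds the pair [x, t] of process i, the phase1 and phase2 arguments name the processes whose replies arrive in each phase, and a missing majority raises an error instead of waiting.

```python
from typing import List, Sequence

def mrsw_read(copies: List[list], reader: int,
              phase1: Sequence[int], phase2: Sequence[int]):
    """Read with write-back. copies[i] is the pair [x, t] held by process i."""
    n = len(copies)
    # Phase 1: collect ("read-ack", x, t) from a majority, own copy included.
    ids1 = set(phase1) | {reader}
    if len(ids1) <= n // 2:
        raise RuntimeError("phase 1: no majority")
    v, tag = max((tuple(copies[i]) for i in ids1), key=lambda pair: pair[1])
    copies[reader][:] = [v, tag]
    # Phase 2: send ("propagate", v, tag); receivers adopt the pair if its tag is newer.
    ids2 = set(phase2) | {reader}
    if len(ids2) <= n // 2:
        raise RuntimeError("phase 2: no majority")
    for i in ids2:
        if tag > copies[i][1]:
            copies[i][:] = [v, tag]
    return v
```

Suppose write(1) has so far reached only the writer. A reader that sees the new value pushes it to a majority before returning, so a later read by another process cannot miss it and return the old value.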


The Complete Read

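[Figure: message diagram for read() between S1 and S1, S2, …, Sn. Phase 1 (Read): S1 sends (“read”) to all and collects (“read-ack”, v, t). Phase 2 (Write-Back, multi-reader only): S1 sends (“propagate”, v, t) to all and collects (“prop-ack”), then read() returns.]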


Take IV: n-Reader n-Writer (MRMW)

• n-writer – all the processes can write to the register

• Does the previous solution work?


Playing Tag

• What if two writers use the same tag for writing different values?

• Need to ensure unique tags
  – That’s easy: break ties, e.g., by process id

• What if a later write uses a smaller tag than an earlier one?
  – Must be prevented (why?)


MRMW Algorithm: Extending the Write

• To perform write(x,v)
  – send (“query”) to all

• Upon receive (“query”) from i
  – send (“query-ack”, t) to i

• When writer receives (“query-ack”, tag) from majority (counting its own tag)
  – choose unique tag > all received tags
  – continue as in 1-writer algorithm

• What if another writer chooses a higher tag before write completes?
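One common way to obtain unique, increasing tags is to pair a counter with the writer's process id and compare pairs lexicographically. The Python sketch below assumes that representation; the slides say only that ties can be broken by process id, so the (counter, pid) layout and the name choose_tag are assumptions of this example.

```python
from typing import Iterable, Tuple

Tag = Tuple[int, int]  # (counter, process id), compared lexicographically

def choose_tag(received: Iterable[Tag], pid: int) -> Tag:
    """Pick a tag larger than every tag in the query-acks, unique to this writer."""
    highest = max(counter for counter, _ in received)
    return (highest + 1, pid)
```

Two writers that see the same query-acks pick the same counter but different process ids, so their tags differ and are still totally ordered.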


The Complete Write

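[Figure: message diagram for write(v) between S1 and S1, S2, …, Sn. Phase 1 (Read, multi-writer only): S1 sends (“query”) to all and collects (“query-ack”, t). Phase 2 (Write): S1 sends (“write”, v, t) to all and collects (“ack”), then write(v) returns ack.]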


How Long Does it Take?

• The write emulation
  – Single-writer: 2 rounds (steps)
  – Multi-writer: 4 rounds (steps)

• The read emulation
  – Single-reader: 2 rounds (steps)
  – Multi-reader: 4 rounds (steps)


What if A Majority Can Fail?

• You guessed it!

• Homework question


Can We Emulate Every Atomic Object the Same Way?

• We only emulated a read/write object

• Think of a general object type, with some data members and some methods

• Can we support it the same way?


R/W Registers vs. Consensus

• ABD works even if the system is completely asynchronous

• In Paxos, there is no progress when there are multiple leaders

• Here, there is always progress
  – Multiple writers can write concurrently
  – One will prevail (which?)