Distributed Systems Replication. Why Replication Replication is Maintenance of copies at multiple...

Distributed Systems

Replication

Why Replication

• Replication is the maintenance of copies of data at multiple sites

• Enhancing services by replicating data

– Performance enhancement
  • Example: the workload is shared between the servers by binding all the server IP addresses to the site’s DNS name. A DNS lookup of the site returns one of the servers’ IP addresses, in round-robin fashion.

– Fault tolerance
  • Under the fail-stop model, if up to f of f+1 servers crash, at least one remains to supply the service.

– Increased availability
  • The service may not be available when servers fail or when the network is partitioned.

Object Replication

• Organization of a distributed remote object shared by two different clients.

State Machine

[Figure: Client, Server]

Two Subproblems

• Your boss says to you, “Our system is too slow, make it faster.”

• You decide that replication of servers is the answer. What do you do next? What are the questions that need to be answered?

– Where to place servers?

– Where to place content?

Placing Servers

• Given a set of N locations, how do you place the K servers?
– What are the goals?
– What is the metric that is being optimized?

• One algorithm: each time you place a server, minimize the average remaining distance to clients.
– What is “distance”?

• Can we ignore the client locations?
– Yes, if they are uniformly distributed.

• Other ideas for algorithms?
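The greedy algorithm above can be sketched as follows. This is a minimal illustration, not a definitive method: the function name `greedy_placement`, the one-dimensional sites and clients, and the use of absolute difference as "distance" are all assumptions made for the example, and K is assumed to be no larger than N.

```python
def greedy_placement(locations, clients, k, distance):
    """Pick k server locations one at a time, each time choosing the
    candidate that minimizes the average client-to-nearest-server distance."""
    chosen = []
    for _ in range(k):
        best_site, best_cost = None, float("inf")
        for site in locations:
            if site in chosen:
                continue
            trial = chosen + [site]
            # Each client is served by its nearest server in the trial placement.
            cost = sum(min(distance(c, s) for s in trial) for c in clients) / len(clients)
            if cost < best_cost:
                best_site, best_cost = site, cost
        chosen.append(best_site)
    return chosen


# Hypothetical example: sites and clients are points on a line,
# and "distance" is the absolute difference between two points.
sites = [0, 5, 10]
clients = [0, 1, 9, 10]
placement = greedy_placement(sites, clients, 2, lambda a, b: abs(a - b))
print(placement)
```

In this example the first greedy step picks the middle site 5, so the result is [5, 0] with an average distance of 2.5, whereas placing servers at 0 and 10 would give 0.5. Greedy placement is simple but not guaranteed to be optimal, which is one reason to ask about other algorithms.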

Permanent Replicas

• Initial set of replicas
– Other replicas can be created from them

• Example: Web site horizontal distribution
– Replicate the Web site on a limited number of machines on a LAN
  • Distribute requests in round-robin fashion
– Replicate the Web site on a limited number of machines on a WAN (mirroring)
  • Clients choose which sites to talk to

System model for replication

• Each logical object is implemented by a collection of physical copies called replicas

– the replicas are not necessarily consistent all the time (some may have received updates, not yet conveyed to the others)

• Replica Managers

– an RM contains replicas on a computer and accesses them directly

– RMs apply operations to replicas recoverably

– objects are copied at all RMs unless stated otherwise

– static systems are based on a fixed set of RMs

Basic Model of Replication

• Replication transparency: the user need not know that multiple physical copies of data exist.

• Replication consistency: data is consistent on all of the replicas.

[Figure: clients connect through front ends to a service made up of replica managers (RMs), each on its own server]

A basic architectural model for the management of replicated data

[Figure: clients (C) exchange requests and replies with front ends (FE), which communicate with the replica managers (RM) that make up the service]

• Clients see a service that gives them access to logical objects, which are in fact replicated at the RMs

• Clients request operations: those without updates are called read-only requests; the others are called update requests (they may include reads)

• Clients’ requests are handled by front ends. A front end makes replication transparent.

Replication Management

• Request communication
– Requests can be made to a single RM or to multiple RMs

• Coordination: the RMs decide
– whether the request is to be applied
– the order of requests
  • FIFO ordering: if a front end issues request r and then r’, then any correct RM that handles r’ handles r before it.
  • Causal ordering: if the issue of r happened before the issue of r’, then any correct RM that handles r’ handles r before it.

Data-Centric Consistency Models

• The general organization of a logical data store, physically distributed and replicated across multiple processes.

Consistency Model

• Contract between data store & processes

– Rules for processes to obey

– Data store’s expected behavior

A consistency model effectively restricts the values that a read operation can return …

Notation

• Processes execute to the right as time progresses.
• The notation W1(x)a means that process P1 wrote the value ‘a’ to the variable x.
• The notation R2(x)a means that process P2 read the value ‘a’ from the variable x.
• The subscript is often dropped.

Strict Consistency

Any read on a data item x returns a value corresponding to the result of the most recent write on x

a. A strictly consistent store.
b. A store that is not strictly consistent.

Impossible to implement in a distributed system! (It assumes all writes are instantaneously visible to all processes.)

Strict Consistency

• Assumes that the determination of “most recent” is unambiguous
• Suppose that x is stored only on host B
• At time t1, a process on host A “reads” x
– … thereby sending a message to host B
• At time t2 > t1, a process on B “writes” x
• Strict consistency: the process on host A must obtain the previous value of x
– What happens if t2 – t1 = 3 nsec, and A-to-B is a 3-meter segment of optical fiber?
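The arithmetic behind the 3 nsec question can be checked with a few lines. The fiber signal speed used here (about two thirds of the speed of light in vacuum) is an assumed, illustrative figure that does not come from the slides. The conclusion is the same even at the vacuum speed of light, where 3 meters still takes about 10 ns.

```python
# Assumed signal speed in optical fiber: roughly two thirds of the
# speed of light in vacuum (an illustrative figure, not from the slides).
SPEED_IN_FIBER_M_PER_S = 2.0e8

fiber_length_m = 3.0   # A-to-B distance from the slide
gap_s = 3e-9           # t2 - t1 from the slide

# Time for A's read request to reach B.
one_way_delay_s = fiber_length_m / SPEED_IN_FIBER_M_PER_S
read_arrives_before_write = one_way_delay_s < gap_s

print(f"one-way delay: {one_way_delay_s * 1e9:.0f} ns")
print("read request reaches B before the write:", read_arrives_before_write)
```

Under this assumption the read request needs about 15 ns to reach B, so it arrives after the write at t2. To return the previous value, B would have to know about a read it has not yet received, which is why strict consistency cannot be implemented.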

Linearizability

• A replicated shared object service is linearizable if, for any execution, there is some interleaving of the operations issued by all clients that:
– meets the specification of a single correct copy of the objects
– is consistent with the real times at which each operation occurred during the execution

• The real-time requirement means clients should receive up-to-date information
– but may not be practical due to difficulties of synchronizing clocks

Sequential Consistency

• A less strict criterion is sequential consistency

• A replicated shared object service is sequentially consistent if for any execution, there is some interleaving of clients’ operations that:

– meets the specification of a single correct copy of objects

– is consistent with the program order in which each individual client executes those operations.

• The criterion does not require absolute time or total order. Only that for each client the order in the sequence be consistent with that client’s program order.

Sequential consistency

The following execution is sequentially consistent but not linearizable:

Client 1:              Client 2:
setBalanceB(x,1)
                       getBalanceA(y)
                       getBalanceA(x)
setBalanceA(y,2)

• This is possible under a naive replication strategy, even if neither A nor B fails: the update at B has not yet been propagated to A when Client 2 reads it

• It is not linearizable because Client 2’s getBalance is after Client 1’s setBalance in real time.

Sequential Consistency Diagrams

a) A sequentially consistent data-store – the “first” write occurred after the “second” on all replicas.

b) A data-store that is not sequentially consistent – it appears the writes have occurred in a non-sequential order, and this is NOT allowed.

In other words: all processes see the same interleaving set of operations, regardless of what that interleaving is.

Sequential Consistency

• Three concurrently executing processes.

Process P1        Process P2        Process P3
x = 1;            y = 1;            z = 1;
print (y, z);     print (x, z);     print (x, y);

Sequential Consistency

• Four valid execution sequences for the processes of the previous slide. The vertical axis is time.

(a)               (b)               (c)               (d)
x = 1;            x = 1;            y = 1;            y = 1;
print (y, z);     y = 1;            z = 1;            x = 1;
y = 1;            print (x, z);     print (x, y);     z = 1;
print (x, z);     print (y, z);     print (x, z);     print (x, z);
z = 1;            z = 1;            x = 1;            print (y, z);
print (x, y);     print (x, y);     print (y, z);     print (x, y);

Prints: 001011    Prints: 101011    Prints: 010111    Prints: 111111
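The four sequences are only a sample of the valid interleavings. The following sketch enumerates all of them for the three processes above and collects the printed outputs. The encoding of each statement as a tuple and the names `PROGRAMS` and `run` are assumptions made for this illustration; all variables are taken to start at 0.

```python
from itertools import permutations

# Each process is a list of steps: ("write", var) or ("print", var1, var2).
PROGRAMS = {
    "P1": [("write", "x"), ("print", "y", "z")],
    "P2": [("write", "y"), ("print", "x", "z")],
    "P3": [("write", "z"), ("print", "x", "y")],
}


def run(schedule):
    """Execute one interleaving; schedule lists which process takes each step."""
    memory = {"x": 0, "y": 0, "z": 0}
    next_step = {name: 0 for name in PROGRAMS}
    output = ""
    for name in schedule:
        step = PROGRAMS[name][next_step[name]]
        next_step[name] += 1
        if step[0] == "write":
            memory[step[1]] = 1
        else:
            output += f"{memory[step[1]]}{memory[step[2]]}"
    return output


# Every distinct ordering of two P1 steps, two P2 steps and two P3 steps
# is one interleaving that respects each process's program order.
schedules = set(permutations(["P1", "P1", "P2", "P2", "P3", "P3"]))
outputs = {run(s) for s in schedules}

print(len(schedules), "interleavings,", len(outputs), "distinct outputs")
```

There are 90 such interleavings. The outputs of (a) through (d) all appear in `outputs`, while an output such as 000000 does not, because no interleaving lets every print run before the writes it reads.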

Causal Consistency

• Writes that are potentially causally related must be seen by all processes in the same order.

• Concurrent writes may be seen in a different order on different machines.

Causal Consistency

• A sequence allowed with a causally consistent store
– … but not with a sequentially or strictly consistent store.

W2(x)b & W1(x)c are concurrent events

Order of operations is non-deterministic… but all processes agree what it is.

Causal Consistency Example

a) Violation of causal-consistency – P2’s write is related to P1’s write due to the read on ‘x’ giving ‘a’ (all processes must see them in the same order).

b) A causally-consistent data-store: the read has been removed, so the two writes are now concurrent. The reads by P3 and P4 are now OK.

FIFO Consistency

• Necessary condition: writes done by a single process are seen by all other processes in the order in which they were issued,

• … but writes from different processes may be seen in a different order by different processes.

Pipelined RAM (PRAM): a process does not have to stall waiting for a write to complete

(Process, Seq#) tag for each write
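One way a replica could use the (Process, Seq#) tag is sketched below: a write is applied only once all earlier writes from the same process have been applied, and writes that arrive early are held back. The class `FifoReplica` and its fields are hypothetical names for this illustration, and sequence numbers are assumed to start at 0 for each process.

```python
class FifoReplica:
    """Applies writes from each process in sequence-number order,
    holding back any write that arrives ahead of its predecessors."""

    def __init__(self):
        self.store = {}       # variable -> value
        self.next_seq = {}    # process id -> next expected sequence number
        self.held_back = {}   # (process id, seq) -> (variable, value)
        self.applied = []     # log of applied (process id, seq) tags

    def receive(self, process, seq, variable, value):
        self.held_back[(process, seq)] = (variable, value)
        expected = self.next_seq.get(process, 0)
        # Drain every write from this process that is now in order.
        while (process, expected) in self.held_back:
            var, val = self.held_back.pop((process, expected))
            self.store[var] = val
            self.applied.append((process, expected))
            expected += 1
        self.next_seq[process] = expected


replica = FifoReplica()
replica.receive("P1", 1, "x", "b")   # arrives early, held back
replica.receive("P2", 0, "x", "c")   # different process, applied at once
replica.receive("P1", 0, "x", "a")   # unblocks P1's write 1 as well
print(replica.applied, replica.store)
```

Writes from P1 are applied in the order 0, 1 regardless of arrival order, while the relative position of P2's write depends only on when it arrives. Another replica may therefore interleave the two processes differently, which FIFO consistency permits.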

FIFO Consistency

• A valid sequence of events of FIFO consistency

FIFO Consistency

• Statement execution as seen by the three processes from previous slide.

(a)               (b)               (c)
x = 1;            x = 1;            y = 1;
print (y, z);     y = 1;            print (x, z);
y = 1;            print (x, z);     z = 1;
print (x, z);     print (y, z);     print (x, y);
z = 1;            z = 1;            x = 1;
print (x, y);     print (x, y);     print (y, z);

Prints: 00        Prints: 10        Prints: 01

Different order seen by each process.

FIFO Consistency

• Two concurrent processes

Process P1                    Process P2
x = 1;                        y = 1;
if (y == 0) kill (P2);        if (x == 0) kill (P1);

FIFO: both processes can be killed!

Sequential: 6 interleavings … but at most 1 process is killed
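The claim about the sequentially consistent case can be checked by enumerating the six interleavings. This is a small sketch: the function `run` and the rule that a killed process executes no further steps are assumptions of the illustration, and x and y are taken to start at 0.

```python
from itertools import permutations


def run(schedule):
    """Run one sequentially consistent interleaving of P1 and P2.
    Returns the set of processes that were killed."""
    memory = {"x": 0, "y": 0}
    step = {"P1": 0, "P2": 0}
    killed = set()
    for name in schedule:
        if name in killed:
            continue  # a killed process runs no more steps
        if name == "P1":
            own, other_var, other = "x", "y", "P2"
        else:
            own, other_var, other = "y", "x", "P1"
        if step[name] == 0:
            memory[own] = 1          # x = 1 (or y = 1)
        elif memory[other_var] == 0:
            killed.add(other)        # if (y == 0) kill(P2), and symmetrically
        step[name] += 1
    return killed


schedules = set(permutations(["P1", "P1", "P2", "P2"]))
results = {s: run(s) for s in schedules}

for schedule, killed in sorted(results.items()):
    print(schedule, "->", sorted(killed))
```

In every interleaving at least one write precedes both tests, so the two kill conditions cannot both hold. Under FIFO consistency each process may see its own write before the other's, so both tests can succeed.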

Weak Consistency

• Accesses to synchronization variables associated with a data store are sequentially consistent
– All processes see them in the same order

• No operation on a synchronization variable is allowed to be performed until all previous writes have been completed everywhere
– Synchronization “flushes the pipeline” …
– After an update, a process can force others to see it

• No read or write operation on data items is allowed to be performed until all previous operations on synchronization variables have been performed.
– Sync. before a read guarantees that a process sees the most recent values

Weak Consistency

a) A valid sequence of events for weak consistency.
b) An invalid sequence for weak consistency.

P2 and P3 have not yet been synchronized …

Release Consistency

• A valid event sequence for release consistency.

Distinguish entry to and exit from a critical section:
• acquire
• release
• barrier

On release, updates of protected data are propagated
– Does not necessarily import updates from other copies

Protected shared data

Release Consistency

• Before a read or write operation on shared data is performed, all previous acquires done by the process must have completed successfully.

• Before a release is allowed to be performed, all previous reads and writes by the process must have completed.

• Accesses to synchronization variables are FIFO consistent (sequential consistency is not required).

Processes must use acquire/release pairs properly!

Eager vs Lazy (timestamp-based) implementation

Release Consistency

• Simple-minded implementation:
– Central sync. manager
– After a process is granted ‘Acquire’:
  • It can perform reads & writes locally
  • … without propagating the updates
– After ‘Release’, the updates are propagated
– … and each recipient responds with an ACK
– After all processes ACK, the sync. manager is informed of the ‘Release’
– Acquire & Release operations on different locks occur independently of one another

Eager release consistency

Release Consistency

• Upon Release, nothing is sent out …
• Upon Acquire, the process has to get the most recent values of the required data
– Timestamps are used to determine which data items actually have to be transmitted

Lazy release consistency

Repeated acquire-release pairs by the same process are free … in the absence of outside competition
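A rough sketch of the lazy scheme follows. It is a simplification under stated assumptions, not a real protocol: a single lock, a global logical clock for write timestamps, and the acquirer pulling data directly from the previous releaser. The names `Process`, `Lock` and `transferred` are hypothetical.

```python
import itertools

# Hypothetical global logical clock used to timestamp writes.
_clock = itertools.count(1)


class Process:
    def __init__(self, name):
        self.name = name
        self.local = {}   # item -> (value, timestamp)

    def write(self, item, value):
        # Local write only; nothing is propagated here or on release.
        self.local[item] = (value, next(_clock))

    def read(self, item):
        return self.local[item][0]


class Lock:
    def __init__(self):
        self.last_releaser = None
        self.transferred = []   # log of (item, receiving process) actually sent

    def acquire(self, proc):
        prev = self.last_releaser
        if prev is not None and prev is not proc:
            for item, (value, ts) in prev.local.items():
                # Transmit only items newer than what the acquirer holds.
                if item not in proc.local or proc.local[item][1] < ts:
                    proc.local[item] = (value, ts)
                    self.transferred.append((item, proc.name))

    def release(self, proc):
        self.last_releaser = proc   # lazy: no data is sent on release


lock = Lock()
p1, p2 = Process("P1"), Process("P2")

lock.acquire(p1)
p1.write("x", 1)
p1.write("y", 2)
lock.release(p1)

lock.acquire(p1)          # repeated pair by the same process: no traffic
p1.write("x", 3)
lock.release(p1)
traffic_after_p1 = list(lock.transferred)

lock.acquire(p2)          # pulls the newer x and y from P1
value_seen = p2.read("x")
lock.release(p2)
print(traffic_after_p1, lock.transferred, value_seen)
```

P1's two acquire-release pairs cause no transfers at all. Data moves only when P2 acquires the lock, and then only the items whose timestamps are newer than P2's copies.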

Entry Consistency

• A valid event sequence for entry consistency.

Each shared data item needs to be associated with a sync. variable (lock)

Current owner per sync. variable

Acquire makes visible all remote changes to the guarded data

Can be done implicitly (by the run-time system)

Only data guarded by a lock are kept consistent

Entry Consistency

• An acquire access of a synchronization variable is not allowed to perform with respect to a process until all updates to the guarded shared data have been performed with respect to that process.
– All remote changes to the guarded data are made visible

• Before an exclusive mode access to a synchronization variable by a process is allowed to perform with respect to that process, no other process may hold the synchronization variable, not even in nonexclusive mode.

• After an exclusive mode access to a synchronization variable has been performed, any other process's next nonexclusive mode access to that synchronization variable may not be performed until it has performed with respect to that variable's owner.

Entry Consistency

• Associating a set of shared data items with a lock reduces the overhead
– … since only a few items need to be synchronized
– … multiple critical sections with disjoint shared data can execute concurrently

• Each lock has a current owner
– … which may enter & leave a critical section repeatedly without any network messages
– Other processes have to send messages to the owner

• Exclusive vs non-exclusive mode

Proper association of data items with locks?

Distributed objects + associated sync. variables
Implicit acquire/release operations

Summary of Consistency Models

a) Consistency models not using synchronization operations.

b) Models with synchronization operations.

(a)

Consistency       Description
Strict            Absolute time ordering of all shared accesses matters.
Linearizability   All processes must see all shared accesses in the same order. Accesses are furthermore ordered according to a (nonunique) global timestamp.
Sequential        All processes see all shared accesses in the same order. Accesses are not ordered in time.
Causal            All processes see causally-related shared accesses in the same order.
FIFO              All processes see writes from each other in the order they were issued. Writes from different processes may not always be seen in that order.

(b)

Consistency       Description
Weak              Shared data can be counted on to be consistent only after a synchronization is done.
Release           Shared data are made consistent when a critical region is exited.
Entry             Shared data pertaining to a critical region are made consistent when a critical region is entered.

Eventual Consistency

• The principle of a mobile user accessing different replicas of a distributed database.

If no updates take place for some time, all replicas gradually converge to a consistent state …

Monotonic Reads

If a process reads the value of a data item x, any successive read on x by that process will always return that same value or a more recent value.

• The read operations performed by a single process P at two different local copies of the same data store.

a) A monotonic-read consistent data store
b) A data store that does not provide monotonic reads.

If a process has seen a value of x at time t, it will never see an older value at a later time.

Example: replicated mailboxes with on-demand propagation of updates

WS(x1) is part of WS(x2)
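A client-side check for monotonic reads might look like the sketch below. It uses a per-copy version number as a stand-in for the write set WS(x), which is a simplifying assumption. The classes `Replica` and `Session` are hypothetical, and a real store would more likely forward the read or wait for propagation than raise an error.

```python
class Replica:
    def __init__(self):
        self.version = 0    # number of writes to x applied at this copy
        self.value = None

    def apply(self, version, value):
        if version > self.version:
            self.version, self.value = version, value


class Session:
    """Client-side session that enforces monotonic reads on one item x."""

    def __init__(self):
        self.last_seen = 0  # highest version this process has read

    def read(self, replica):
        if replica.version < self.last_seen:
            raise RuntimeError("replica is stale for this session")
        self.last_seen = replica.version
        return replica.value


l1, l2 = Replica(), Replica()
l1.apply(1, "a")
l1.apply(2, "b")
l2.apply(1, "a")            # L2 has not yet received the second write

session = Session()
first = session.read(l1)    # sees version 2
try:
    session.read(l2)        # would go back to version 1
    stale_read_allowed = True
except RuntimeError:
    stale_read_allowed = False

l2.apply(2, "b")            # the update is propagated to L2
second = session.read(l2)
print(first, stale_read_allowed, second)
```

The read at L2 is refused until L2 has caught up with everything the process already saw at L1, which corresponds to requiring that WS(x1) is part of WS(x2).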

Monotonic Writes

• The write operations performed by a single process P at two different local copies of the same data store

a) A monotonic-write consistent data store.
b) A data store that does not provide monotonic-write consistency.

If an update is made to a copy, all preceding updates must have been completed first.

Example: s/w library

FIFO propagation of updates by each process

A write may affect only part of the state of a data item

No guarantee that x at L2 has the same value as x at L1 at the time W(x1) completed

Read Your Writes

a) A data store that provides read-your-writes consistency.

b) A data store that does not.

A write is completed before a successive read, no matter where the read takes place

Negative examples:
– updates of Web pages
– changes of passwords

The effects of the previous write at L1 have not yet been propagated!

Read Your Writes

• A data store is said to provide read-your-writes consistency if the following condition holds:
– The effect of a write operation by a process on data item x will always be seen by a successive read operation on x by the same process.

• Suppose your web browser has a cache.
– You update your web page on the server.
– You refresh your browser.
– Do you have read-your-writes consistency?
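For the browser question, one possible way to get read-your-writes behavior is for the client to remember the version produced by its own last write and bypass any older cache entry. The sketch below illustrates that idea only. `Server`, `CachingClient` and the version counter are hypothetical, and real browsers and HTTP caches do not necessarily work this way.

```python
class Server:
    def __init__(self):
        self.version, self.page = 0, ""

    def update(self, page):
        self.version += 1
        self.page = page
        return self.version


class CachingClient:
    """Hypothetical browser-like client with a local cache of one page."""

    def __init__(self, server):
        self.server = server
        self.cached = None       # (version, page) or None
        self.last_written = 0    # version produced by this client's last write

    def write(self, page):
        self.last_written = self.server.update(page)

    def read(self):
        # Serve from the cache only if it reflects this client's own writes.
        if self.cached is None or self.cached[0] < self.last_written:
            self.cached = (self.server.version, self.server.page)
        return self.cached[1]


server = Server()
client = CachingClient(server)
client.write("v1")
before = client.read()   # fills the cache with "v1"
client.write("v2")
after = client.read()    # cache is older than the write, so it is refreshed
print(before, after)
```

Without the `last_written` check the second read would return the cached "v1", which is the situation the slide's question points at.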

Writes Follow Reads

a) A writes-follow-reads consistent data store
b) A data store that does not provide writes-follow-reads consistency

Any successive write will be performed on a copy that is up-to-date with the value most recently read by the process.

Example: updates of a newsgroup
– Responses are visible only after the original posting has been received

Server-Initiated Replicas

• Counting access requests from different clients.

• Deletion threshold: del(S, F)
• Replication threshold: rep(S, F)

Routing DB to determine the “closest” server for client C

P := closest server for both C1 & C2

CntQ(P, F)

At each server:
• Count of accesses for each file
• Originating clients

Extra care to ensure that at least one copy remains!

Dynamic decisions to delete/migrate/replicate file F to server S
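The slides name the thresholds and counters but not the exact decision rule. The sketch below fills that in with one commonly described policy, stated here as an assumption: delete below the deletion threshold, and otherwise move or copy the file toward a server P when more than half of the requests come from clients closest to P. The function `decide` and its parameters are hypothetical, and the deletion threshold is assumed to be lower than the replication threshold.

```python
def decide(total_requests, requests_via, deletion_threshold,
           replication_threshold, is_last_copy):
    """Return the action server Q takes for file F as (action, target).

    total_requests: accesses to F counted at Q
    requests_via:   dict mapping a server P to CntQ(P, F), the accesses to F
                    at Q from clients whose closest server is P
    """
    if total_requests < deletion_threshold:
        # Never delete the last remaining copy.
        return ("keep", None) if is_last_copy else ("delete", None)
    # Candidate target: the server closest to the largest share of clients.
    target, count = max(requests_via.items(), key=lambda kv: kv[1],
                        default=(None, 0))
    # Assumed rule: act only if P accounts for more than half the requests.
    if target is not None and count > total_requests / 2:
        if total_requests > replication_threshold:
            return ("replicate", target)
        return ("migrate", target)
    return ("keep", None)


print(decide(2, {"P": 1}, 5, 20, is_last_copy=False))
print(decide(2, {}, 5, 20, is_last_copy=True))
print(decide(30, {"P": 20}, 5, 20, is_last_copy=False))
print(decide(10, {"P": 8}, 5, 20, is_last_copy=False))
```

The `is_last_copy` flag corresponds to the "extra care" note: a lightly used file is still kept if deleting it would remove the only remaining copy.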
