Synchronization Tanenbaum Chapter 5
Date posted: 21-Dec-2015

Synchronization

Tanenbaum Chapter 5

Synchronization

• Multiple processes sometimes need to agree on the order of a sequence of events.

• This requires synchronization, which is more elaborate in distributed systems than in centralized ones.

• Synchronization may be based on time (absolute or relative) or on leader election.

• The aim is to make it global…

Clock Synchronization

• Execution of the make utility in a distributed system: the source file is edited after the object file was created, but because the local clocks of the two machines disagree, the edited version can carry an earlier time stamp than the object file, so make does not recompile it.

– When each machine has its own clock, an event that occurred after another event may nevertheless be assigned an earlier time.

Time

Physical Clocks (1)

Computation of the mean solar day.

• The period of the earth's rotation is not constant.

• Starting in 1958, International Atomic Time (TAI) was accepted. It counts transitions of the cesium-133 atom: 9,192,631,770 transitions = 1 second, chosen to match the mean solar second. One solar second is 1/86400 of a solar day, which is the interval between two consecutive moments at which the sun reaches its peak in the sky. TAI is averaged over about 50 labs.

• The length of the solar day changes slowly because of tidal friction and atmospheric drag.

Physical Clocks (2)

• TAI seconds are of constant length, unlike solar seconds. The mean solar day is now about 3 msec longer than 86,400 TAI seconds, so leap seconds are introduced when necessary to keep in phase with the sun: 1 sec whenever the discrepancy grows to 800 msec. So far, since 1958, about 30 leap seconds have been introduced…

• The resulting time scale is known as Universal Coordinated Time, or UTC.

Clock Synchronization Algorithms

• The relation between clock time and UTC when clocks tick at different rates.

• In a perfect world, C(t) = t on all machines, where t is UTC and C(t) is the value of the local clock. With modern timer chips, the relative error is about 10^-5.

• Two clocks need to be resynchronized according to the maximum drift rate ρ of each clock.

• If the difference between two clocks is to be limited to δ, then a resynchronization is required at least every δ/(2ρ) seconds, where ρ is the maximum drift rate. The factor 2ρ arises when the two clocks drift in opposite directions.
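The resynchronization bound above can be checked with a few lines of Python. This is a minimal sketch; the function name `resync_interval` and the sample values are illustrative assumptions, not part of the slides.

```python
def resync_interval(max_skew: float, max_drift_rate: float) -> float:
    """Longest time (seconds) between resynchronizations that keeps two
    clocks within max_skew of each other.

    Two clocks drifting in opposite directions diverge at a rate of up to
    2 * max_drift_rate, hence the factor 2 in the denominator.
    """
    if max_skew <= 0 or max_drift_rate <= 0:
        raise ValueError("max_skew and max_drift_rate must be positive")
    return max_skew / (2 * max_drift_rate)


# Assumed example: keep clocks within 1 msec with a drift rate of 10^-5.
interval = resync_interval(max_skew=0.001, max_drift_rate=1e-5)
print(f"resynchronize at least every {interval:.0f} seconds")
```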

Cristian's Algorithm

Getting the current time from a time server.

• The time should never be set to a smaller value, as that would cause consistency problems. So, a large discrepancy should be absorbed slowly, by adjusting the number of msec added per clock interrupt.

• (T1-T0-I)/2 is the estimated one-way propagation time, accounting for the server's request (interrupt) handling time I. Cristian suggests taking an average of the delays in the system… Note that the time server is passive.
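The estimate can be written out as a small Python sketch. The function name and arguments are assumptions chosen for illustration: `t0` and `t1` are the client's send and receive times, `server_time` is the value returned by the server, and `handling_time` is I.

```python
def cristian_estimate(t0: float, t1: float, server_time: float,
                      handling_time: float = 0.0) -> float:
    """Estimate the current time at the moment the reply arrives.

    The one-way propagation time is taken as (T1 - T0 - I) / 2, which
    assumes the request and the reply take equally long.
    """
    one_way = (t1 - t0 - handling_time) / 2
    if one_way < 0:
        raise ValueError("handling time exceeds the measured round trip")
    return server_time + one_way


# Assumed measurement: 20 msec round trip, 4 msec spent in the server.
estimate = cristian_estimate(t0=10.000, t1=10.020, server_time=50.0,
                             handling_time=0.004)
print(estimate)
```

A client would then move its clock toward `estimate` gradually rather than jumping, so that local time never runs backward.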

The Berkeley Algorithm: the time server is active and polls the clients.

a) The time daemon sends its time and asks all the other machines for their clock discrepancy values

b) The answers from the machines are received and an average time discrepancy is computed, for each computer…

c) Then, the time daemon tells everyone else how to adjust their clock

d) The daemon's time needs to be set periodically by the operator or from radio time servers…
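Steps a) to c) amount to averaging the reported discrepancies and sending each machine its own correction. A minimal sketch, assuming clock values are plain numbers (minutes here) and ignoring message delay; the function name is hypothetical.

```python
def berkeley_adjustments(daemon_time, client_times):
    """Return the correction for every machine, daemon first.

    Each machine reports its discrepancy relative to the daemon; the
    daemon averages the discrepancies (its own is 0) and tells each
    machine how far to move to reach the average.
    """
    clocks = [daemon_time] + list(client_times)
    offsets = [clock - daemon_time for clock in clocks]
    average = sum(offsets) / len(offsets)
    return [average - offset for offset in offsets]


# Assumed clocks in minutes since midnight: daemon 3:00, clients 2:50 and 3:25.
adjustments = berkeley_adjustments(180, [170, 205])
print(adjustments)  # [5.0, 15.0, -20.0]
```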

Distributed Clock synchronization

• Cristian's and Berkeley's algorithms are centralized.

• In decentralized distributed algorithms, every machine periodically broadcasts its time and collects the times of its peers.

• Every peer reaches its own conclusion about the average time, running the same algorithm in a distributed manner and taking the communication latencies into account…

• In the Internet, the Network Time Protocol (NTP) is used, which is assumed to achieve 1-50 msec accuracy.

Network Time Protocol-NTP

• RFC 1305 defines NTP

• The recent implementations provide accuracy of up to 1 microsecond

• It is designed to execute on top of IP and UDP

• NTP is organized into multiple tree structures, with primary servers at the root and secondary servers at the internal nodes

• NTP design goals: accurate UTC synchronization, survival despite losses of connectivity, allowing frequent resynchronization, protection against malicious interference

• NTP communicates clock offset (difference between two clocks), round-trip delay, and dispersion (max error)

• A statistical technique is used, based on multiple comparisons of the timing information exchanged

• It may operate in three modes: multicast, client/server, symmetric

• SNTP (Simple NTP) is also defined, in RFC 1769, with no fault tolerance
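The clock offset and round-trip delay that NTP exchanges are conventionally derived from four time stamps of one request/reply pair. The sketch below shows that calculation only; the variable names are assumptions, and real NTP additionally filters many such samples statistically.

```python
def offset_and_delay(t0, t1, t2, t3):
    """Compute (offset, delay) from one request/reply exchange.

    t0: client clock when the request is sent
    t1: server clock when the request is received
    t2: server clock when the reply is sent
    t3: client clock when the reply is received
    offset is how far the server clock is ahead of the client clock,
    assuming both directions take equally long.
    """
    offset = ((t1 - t0) + (t2 - t3)) / 2
    delay = (t3 - t0) - (t2 - t1)
    return offset, delay


# Assumed scenario: server is 5 units ahead, 2 units each way, 1 unit of processing.
offset, delay = offset_and_delay(t0=100, t1=107, t2=108, t3=105)
print(offset, delay)  # 5.0 4
```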

Use of Synchronized Clocks

• Used in the implementation of at-most-once message delivery:

– Every message is sent with a connection number and a time stamp

– For each connection, the most recent time stamp is recorded

– If a message on a connection carries a time stamp lower than the recorded one, the message is discarded.

• To remove old entries,

– The server removes all recorded time stamps older than G = CurrentTime - MaxLifeTime - MaxClockSkew

– MaxLifeTime is the max time a message can live in the system…

– MaxClockSkew is the maximum distance of a clock from UTC.

• To recover from a crash, G needs to be written to the hard disk every T time units, to be processed later, during the recovery phase….
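The bookkeeping above can be sketched as follows. The class and method names are hypothetical, the current time is passed in explicitly to keep the example deterministic, and a repeated time stamp is treated as a duplicate (an assumption that goes slightly beyond "lower than" in the bullets). Crash recovery is left out.

```python
class AtMostOnceServer:
    """Rejects duplicate or stale messages using per-connection time stamps."""

    def __init__(self, max_lifetime, max_clock_skew):
        self.max_lifetime = max_lifetime
        self.max_clock_skew = max_clock_skew
        self.latest = {}  # connection number -> most recent time stamp seen

    def accept(self, connection, timestamp, now):
        # G = CurrentTime - MaxLifeTime - MaxClockSkew
        g = now - self.max_lifetime - self.max_clock_skew
        # Forget recorded time stamps older than G.
        self.latest = {c: ts for c, ts in self.latest.items() if ts >= g}
        if timestamp <= g:
            return False  # too old to still be legitimately in the system
        if timestamp <= self.latest.get(connection, float("-inf")):
            return False  # not newer than the last message on this connection
        self.latest[connection] = timestamp
        return True


server = AtMostOnceServer(max_lifetime=10, max_clock_skew=2)
print(server.accept("c1", 100, now=101))  # True: first message
print(server.accept("c1", 100, now=102))  # False: duplicate
```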

Coordinator or Leader Election Algorithms

• Bully Algorithm

– A process holds an election for the coordinator if it thinks the coordinator has failed:

• It sends an election message to all the processes with higher id numbers

• If no one responds, the process declares itself coordinator

• If one of the higher-ups answers, it withdraws from the contest

• Ring Algorithm

– The processes are logically or physically ordered

• The process detecting the missing coordinator sends an election message down the ring, and each live process adds its id. When the message comes back to the sender, the process with the highest id in the message is announced as the coordinator…

The Bully Algorithm (1)

The bully election algorithm

a) Process 4 holds an election

b) Processes 5 and 6 respond, telling 4 to stop

c) Now 5 and 6 each hold an election

The Bully Algorithm (2)

d) Process 6 tells 5 to stop

e) Process 6 wins and tells everyone
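The outcome of steps a) to e) can be reproduced with a sequential simulation. This is only a sketch: messages and timeouts are not modeled, elections that would run in parallel are followed one at a time, and `bully_election` is a hypothetical name.

```python
def bully_election(initiator, alive):
    """Simulate a bully election among the live process ids in `alive`.

    Returns (coordinator, holders), where holders lists the processes
    that held an election along the path followed by the simulation.
    """
    if initiator not in alive:
        raise ValueError("initiator must be a live process")
    holders = []
    current = initiator
    while True:
        holders.append(current)
        higher = sorted(p for p in alive if p > current)
        if not higher:
            # Nobody with a higher id answered: current wins and tells everyone.
            return current, holders
        # A higher process answered, so current withdraws; follow that
        # process as it holds its own election.
        current = higher[0]


# Processes 0..6 are alive; the old coordinator 7 has crashed.
coordinator, holders = bully_election(4, {0, 1, 2, 3, 4, 5, 6})
print(coordinator, holders)  # 6 [4, 5, 6]
```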

A Ring Algorithm

• Election algorithm using a ring. Both 5 and 2 detect the failure of the coordinator at about the same time. Both election messages make a full trip round the ring.
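A sketch of one trip of an election message around the ring, assuming the variant in which each live process appends its id and the highest collected id becomes coordinator. The function name and the list-based ring are illustrative assumptions.

```python
def ring_election(ring, alive, initiator):
    """Circulate an election message once around `ring`.

    ring: process ids in ring order; alive: set of live ids.
    Crashed processes are skipped. Returns (coordinator, collected ids).
    """
    if initiator not in alive:
        raise ValueError("initiator must be a live process")
    n = len(ring)
    collected = [initiator]
    i = (ring.index(initiator) + 1) % n
    while ring[i] != initiator:
        if ring[i] in alive:
            collected.append(ring[i])
        i = (i + 1) % n
    return max(collected), collected


ring = [0, 1, 2, 3, 4, 5, 6, 7]
alive = {0, 1, 2, 3, 4, 5, 6}  # coordinator 7 has crashed
print(ring_election(ring, alive, 5))
print(ring_election(ring, alive, 2))
```

Both initiators collect the same set of ids, so the two concurrent elections agree on the same coordinator.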

Mutual Exclusion:

• Mutual exclusion ensures that critical sections are executed by one process at a time.

• In centralized systems this is achieved using semaphores, monitors, and similar constructs…

• How to establish mutual exclusion in distributed systems:

– Centralized approach

– Distributed approach

Mutual Exclusion: A Centralized Algorithm

a) Process 1 asks the coordinator for permission to enter a critical region. Permission is granted.

b) Process 2 then asks permission to enter the same critical region. The coordinator does not reply.

c) When process 1 exits the critical region, it tells the coordinator, which then replies to 2…
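The coordinator's behaviour in a) to c) can be sketched with a queue of deferred requests. The class below is a hypothetical, single-threaded model: "no reply" is represented by returning False from `request`.

```python
from collections import deque


class Coordinator:
    """Grants a single critical region to one process at a time."""

    def __init__(self):
        self.holder = None      # process currently in the critical region
        self.waiting = deque()  # requests that have not been answered yet

    def request(self, pid):
        """Return True if permission is granted immediately."""
        if self.holder is None:
            self.holder = pid
            return True
        self.waiting.append(pid)  # no reply: the requester stays blocked
        return False

    def release(self, pid):
        """Called on exit; returns the process that is granted next, if any."""
        if pid != self.holder:
            raise ValueError("only the current holder can release")
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder


coordinator = Coordinator()
print(coordinator.request(1))  # True: process 1 enters
print(coordinator.request(2))  # False: process 2 waits
print(coordinator.release(1))  # 2: coordinator now replies to process 2
```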

MX:A Distributed Algorithm

a) Two processes want to enter the same critical region at the same moment. Processes 0 and 2 contend for the CR, so each sends a time-stamped “MX access to the resource” message to everyone else.

b) Process 0 has the lowest timestamp, so it wins.

c) When process 0 is done, it sends an OK also, so 2 can now enter the critical region.
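The decision each process makes when a request arrives can be sketched as below. The state names and the `(timestamp, pid)` tuples are assumptions for illustration; ties on the timestamp are broken by process id through tuple comparison.

```python
def on_request(state, my_request, incoming):
    """Decide how to answer an incoming request for the critical region.

    state: "RELEASED" (not interested), "WANTED" (own request pending)
           or "HELD" (inside the critical region).
    my_request, incoming: (timestamp, pid) tuples; my_request is ignored
    unless state is "WANTED".
    Returns "OK" to reply at once, or "DEFER" to queue the request until
    this process leaves the critical region.
    """
    if state == "HELD":
        return "DEFER"
    if state == "WANTED" and my_request < incoming:
        return "DEFER"  # own request is older, so it wins
    return "OK"


# Assumed time stamps: process 0 requests at 8, process 2 at 12.
print(on_request("WANTED", (8, 0), (12, 2)))   # process 0 defers process 2
print(on_request("WANTED", (12, 2), (8, 0)))   # process 2 sends OK to 0
print(on_request("RELEASED", None, (8, 0)))    # process 1 sends OK
```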

MX:A Token Ring Algorithm

a) An unordered group of processes on a network, logically numbered.

b) A logical ring constructed in software, where a token is released by one of the nodes, initially 0.

– Token loss must be handled properly, with a token generation algorithm.

– Node failure must be handled too…
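A sketch of the token circulating around the logical ring, assuming no token loss and no node failure. `circulate_token` is a hypothetical helper that returns the order in which waiting processes enter the critical region and how many times the token was passed.

```python
def circulate_token(n, wants, start=0):
    """Pass a token around processes 0..n-1 until all requests are served.

    wants: ids of processes waiting to enter the critical region.
    A process may enter only while it holds the token.
    """
    pending = set(wants)
    if not pending <= set(range(n)):
        raise ValueError("unknown process id in wants")
    order, hops, holder = [], 0, start
    while pending:
        if holder in pending:
            order.append(holder)      # holder enters and leaves the region
            pending.discard(holder)
        else:
            holder = (holder + 1) % n  # pass the token to the next node
            hops += 1
    return order, hops


print(circulate_token(5, {3, 1}))  # ([1, 3], 3)
```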

Comparison

A comparison of three mutual exclusion algorithms for n nodes: number of messages per process to enter/exit a critical region, delay before entry, and failure or loss problems.

Algorithm     Messages per entry/exit   Delay before entry (in message times)   Problems
Centralized   3                         2                                       Coordinator crash
Distributed   2(n – 1)                  2(n – 1)                                Crash of any process
Token ring    1 to ∞                    0 to n – 1                              Lost token, process crash

The Transaction Model

• The transaction model is an all-or-nothing model.

• An analogy can be made with negotiations on a project leading up to the signing of a contract. Until the contract is signed, any party can withdraw with no harm.

• Programming with txs requires special primitives supplied by the OS, the language, or a middleware layer. The exact list of primitives may differ between application or system environments.

The Transaction Model (1)

Updating a daily master inventory tape is fault tolerant. If something goes wrong, everything is redone from the beginning, i.e., the tapes are rewound to the beginning and the process is restarted: all or nothing.

The Transaction Model (2)

Typical examples of primitives for transactions. Either everything or nothing between the begin and end is executed.

Primitive            Description
BEGIN_TRANSACTION    Mark the start of a transaction
END_TRANSACTION      Terminate the transaction and try to commit
ABORT_TRANSACTION    Kill the transaction and restore the old values
READ                 Read data from a file, a table, or otherwise
WRITE                Write data to a file, a table, or otherwise

The Transaction Model (3)

Reserving flight seats from New York to Malindi, Kenya, via the capital city, Nairobi.

a) Transaction to reserve three flights commits, as three different operations

b) Transaction aborts when the third flight is unavailable, during the same booking, as if nothing has happened

BEGIN_TRANSACTION
    reserve WP -> JFK;
    reserve JFK -> Nairobi;
    reserve Nairobi -> Malindi;
END_TRANSACTION

(a)

BEGIN_TRANSACTION
    reserve WP -> JFK;
    reserve JFK -> Nairobi;
    reserve Nairobi -> Malindi full => ABORT_TRANSACTION

(b)
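The all-or-nothing behaviour of cases (a) and (b) can be imitated in a few lines of Python. This is a toy sketch: the seat table, its keys, and `reserve_itinerary` are assumptions, and updates are staged in a private copy that replaces the original only on commit.

```python
def reserve_itinerary(seats, legs):
    """Reserve one seat on every leg, or none at all.

    seats: dict mapping a leg name to the number of free seats.
    Returns True on commit, False on abort (seats is then unchanged).
    """
    working = dict(seats)            # BEGIN_TRANSACTION: private copy
    for leg in legs:
        if working.get(leg, 0) <= 0:
            return False             # ABORT_TRANSACTION: discard the copy
        working[leg] -= 1
    seats.update(working)            # END_TRANSACTION: commit all changes
    return True


seats = {"WP->JFK": 1, "JFK->Nairobi": 1, "Nairobi->Malindi": 0}
legs = ["WP->JFK", "JFK->Nairobi", "Nairobi->Malindi"]
print(reserve_itinerary(seats, legs), seats)  # False, seats untouched
```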

The Transaction Model (4)Transaction properties

a) Atomicity: indivisibility of the tx

b) Consistency: no violation of the invariants

c) Isolation: no interference between concurrent txs

d) Durability: changes are made permanent once committed

…ACID properties of txs

Classification of Txs

a) Flat Txs: txs with the ACID properties discussed so far; not practical for most distributed tx applications…

b) Nested Txs: a number of logically related, complementary sub-transactions form one nested tx. One problem is the level at which ACID applies: if the top-level parent aborts, every child that has already completed must be undone. When a child commits, its universe becomes the universe of the parent…

c) Distributed Txs: a flat, indivisible tx that operates on data distributed across multiple computers.

Nested and Distributed Transactions

a) A nested transaction

b) A distributed transaction

Implementation

How to implement the all-or-nothing principle for distributed txs?

a) Private workspace: updates are made in a private copy so that they can be discarded without affecting the original data, depending on commit/abort

b) Writeahead log: a log of changes is created throughout execution, so that commit/abort can be taken care of…

Private Workspace

a) The file index and disk blocks for a three-block file

b) The situation after a transaction has modified block 0 and appended block 3

c) After committing

Writeahead Log

a) An example transaction that changes x and y

b) – d) The log before each statement is executed. The first value is before the change, the second value is after the change

x = 0;

y = 0;

BEGIN_TRANSACTION;

x = x + 1;

y = y + 2;

x = y * y;

END_TRANSACTION;

(a)

Log

[x = 0 / 1]

(b)

Log

[x = 0 / 1]

[y = 0/2]

(c)

Log

[x = 0 / 1]

[y = 0/2]

[x = 1/4]

(d)
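The log entries in (b) to (d) and their use on abort can be reproduced with a small sketch. `LoggedStore` is a hypothetical in-memory stand-in; a real writeahead log is forced to stable storage before the data is changed.

```python
class LoggedStore:
    """Variables updated in place, with an undo log written first."""

    def __init__(self, **values):
        self.values = dict(values)
        self.log = []  # entries of (name, old value, new value)

    def write(self, name, new_value):
        # Log the change before applying it.
        self.log.append((name, self.values[name], new_value))
        self.values[name] = new_value

    def commit(self):
        self.log.clear()

    def abort(self):
        # Roll back by undoing the log from the end to the start.
        while self.log:
            name, old_value, _ = self.log.pop()
            self.values[name] = old_value


store = LoggedStore(x=0, y=0)
store.write("x", store.values["x"] + 1)   # x = x + 1
store.write("y", store.values["y"] + 2)   # y = y + 2
store.write("x", store.values["y"] ** 2)  # x = y * y
print(store.log)   # [('x', 0, 1), ('y', 0, 2), ('x', 1, 4)]
store.abort()
print(store.values)  # {'x': 0, 'y': 0}
```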

Concurrency Control (1)

General organization of managers for handling transactions. Top level ensures atomicity, middle level ensures consistency, bottom level ensures execution

Concurrency Control (2)

General organization of managers for handling distributed transactions.

Serializability

The final result of concurrent tx execution should be the same for different runs, as if the txs were executed sequentially… Concurrency control algorithms should synchronize tx executions…

a) – c) Three transactions T1, T2, and T3

d) Possible schedules

BEGIN_TRANSACTION x = 0; x = x + 1; END_TRANSACTION

(a)

BEGIN_TRANSACTION x = 0; x = x + 2; END_TRANSACTION

(b)

BEGIN_TRANSACTION x = 0; x = x + 3; END_TRANSACTION

(c)

Schedule 1   x = 0; x = x + 1; x = 0; x = x + 2; x = 0; x = x + 3;   Legal

Schedule 2   x = 0; x = 0; x = x + 1; x = x + 2; x = 0; x = x + 3;   Legal

Schedule 3   x = 0; x = 0; x = x + 1; x = 0; x = x + 2; x = x + 3;   Illegal

(d)
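The Legal/Illegal labels in (d) can be checked by comparing the final value of x with the values produced by the serial orders. Note that this sketch tests final-state equivalence only, which is a simplification of full serializability checking; the tuple encoding of the operations is an assumption.

```python
from itertools import permutations

# Each operation is ("set", v) for x = v or ("add", v) for x = x + v.
T1 = [("set", 0), ("add", 1)]
T2 = [("set", 0), ("add", 2)]
T3 = [("set", 0), ("add", 3)]


def run(ops):
    """Execute a sequence of operations on x and return the final value."""
    x = None
    for kind, value in ops:
        x = value if kind == "set" else x + value
    return x


def is_legal(schedule, txs):
    """True if the schedule ends with a value some serial order produces."""
    serial = {run([op for tx in order for op in tx])
              for order in permutations(txs)}
    return run(schedule) in serial


schedule1 = T1 + T2 + T3
schedule2 = [("set", 0), ("set", 0), ("add", 1), ("add", 2), ("set", 0), ("add", 3)]
schedule3 = [("set", 0), ("set", 0), ("add", 1), ("set", 0), ("add", 2), ("add", 3)]
for s in (schedule1, schedule2, schedule3):
    print(run(s), is_legal(s, [T1, T2, T3]))
```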

Concurrency Control Methods

• Two-phase locking

• Pessimistic time-stamp ordering

• Optimistic time-stamp ordering

Two-phase locking-2PL-1

• Acquire all the locks during the growing phase, release them during the shrinking phase.

– On conflict, the operation is delayed

– A lock is never released before the operation on the data for which the lock is set is complete

– Once a lock is released on behalf of a transaction, no other lock can be granted to the same transaction

• In strict 2PL, all the acquired locks are released at the same time… This avoids cascaded aborts

• 2PL can easily cause deadlocks to happen

• Centralized and distributed versions of 2PL are possible
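The growing/shrinking rule can be enforced with a flag per transaction. The sketch below is a single-threaded model with exclusive locks only; the class name and the dict used as a lock table are assumptions, and a conflicting request is reported by returning False instead of blocking.

```python
class TwoPhaseTx:
    """Tracks the locks of one transaction and enforces the two phases."""

    def __init__(self, name, lock_table):
        self.name = name
        self.table = lock_table   # shared: item -> name of the lock owner
        self.held = set()
        self.shrinking = False

    def acquire(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after a release")
        owner = self.table.get(item)
        if owner not in (None, self.name):
            return False          # conflict: the operation must be delayed
        self.table[item] = self.name
        self.held.add(item)
        return True

    def release(self, item):
        self.shrinking = True     # the growing phase is over
        self.held.remove(item)
        del self.table[item]

    def release_all(self):
        """Strict 2PL: give up every lock together, at commit or abort."""
        for item in list(self.held):
            self.release(item)


table = {}
t1, t2 = TwoPhaseTx("T1", table), TwoPhaseTx("T2", table)
print(t1.acquire("x"))  # True
print(t2.acquire("x"))  # False: T2 is delayed
t1.release_all()
print(t2.acquire("x"))  # True
```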

Two-Phase Locking (2)

Two-phase locking.

Two-Phase Locking (3)

Strict two-phase locking.

Pessimistic time-stamp ordering-1

• Every operation of a Tx carries the time stamp ts assigned to the Tx by an appropriate algorithm (Lamport's algorithm)

• Every data item in the system is time-stamped with the ts of the last transaction that read it (tsR) and the last transaction that wrote it (tsW)

• If two operations on a data item x conflict, the data manager grants the operation to the Tx with the earlier ts

Pessimistic time-stamp ordering-2

• Read operation of a Tx with time-stamp ts

– If ts < tsW, abort the Tx

– If ts > tsW, allow execution and set tsR to max(ts, tsR)

• Write operation of a Tx with time-stamp ts

– If ts < tsR, abort the Tx

– If ts > tsR, allow execution and set tsW to max(ts, tsW)
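The two rules translate almost directly into code. This sketch follows the rules exactly as stated on the slide; `Item`, `Abort`, `read`, and `write` are hypothetical names, and aborting is modeled by raising an exception.

```python
class Abort(Exception):
    """Raised when an operation arrives too late and its Tx must abort."""


class Item:
    def __init__(self, value=None):
        self.value = value
        self.ts_read = 0    # tsR: time stamp of the latest reader
        self.ts_write = 0   # tsW: time stamp of the latest writer


def read(item, ts):
    if ts < item.ts_write:
        raise Abort("a younger Tx has already written the item")
    item.ts_read = max(ts, item.ts_read)
    return item.value


def write(item, ts, value):
    if ts < item.ts_read:
        raise Abort("a younger Tx has already read the item")
    item.ts_write = max(ts, item.ts_write)
    item.value = value


x = Item()
write(x, 5, "a")     # Tx with ts = 5 writes x
print(read(x, 7))    # Tx with ts = 7 reads "a"
try:
    write(x, 6, "b")  # ts = 6 is older than the last reader (7)
except Abort as reason:
    print("abort:", reason)
```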

Pessimistic Timestamp Ordering-3

Concurrency control using timestamps.

Optimistic time-stamp ordering

• Go ahead and do whatever you want; if there is a conflict at commit time, handle it then. If conflicts are rare, most commits take place without any problem

• This requires recording all read and write time stamps on the data items, to check whether any of the items have been changed when deciding on a commit…

• Abort if a change is detected, commit otherwise

• This scheme has not been researched much for distributed systems…
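The validate-at-commit idea can be sketched as follows. As an assumption made for brevity, per-item version counters stand in for the recorded read and write time stamps, and all names are hypothetical.

```python
class OptimisticTx:
    """Buffers writes and validates at commit time."""

    def __init__(self, store, versions):
        self.store = store        # shared data: key -> value
        self.versions = versions  # shared: key -> number of committed writes
        self.seen = {}            # version of each item when first touched
        self.writes = {}          # buffered updates, invisible to others

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        self.seen.setdefault(key, self.versions.get(key, 0))
        return self.store.get(key)

    def write(self, key, value):
        self.seen.setdefault(key, self.versions.get(key, 0))
        self.writes[key] = value

    def commit(self):
        # Abort if any item touched here was changed by another commit.
        if any(self.versions.get(k, 0) != v for k, v in self.seen.items()):
            return False
        for key, value in self.writes.items():
            self.store[key] = value
            self.versions[key] = self.versions.get(key, 0) + 1
        return True


store, versions = {"x": 1}, {}
a, b = OptimisticTx(store, versions), OptimisticTx(store, versions)
a.read("x")
b.read("x")
b.write("x", 2)
print(b.commit())  # True: no conflict detected
a.write("x", 10)
print(a.commit())  # False: x changed after a first read it
```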

Snapshot Protocols

• Snapshot Protocol 2

1. Process p0 sends “take snapshot at ω” to all processes and then sets its clock to ω

2. When its logical clock (LC) reaches ω, pi

• records its local state σi and immediately

• sends an empty message along each outgoing channel

• starts recording messages received over each of its incoming channels

3. pi stops recording messages from pj the first time a message with TS > ω is received from pj… pi declares the messages recorded from pj as the channel state χj,i

• Instead of using a “take snapshot at ω” message, a process can record its state the first time it receives a special empty message serving as a tag message.

• This is protocol 3…

Supplementary for Mullender’s book


Properties of Snapshots

• Any state constructed by the distributed snapshot algorithm is guaranteed to be consistent. However, the actual run may not have passed through the constructed state,

• yet the relations that involve the constructed state hold in general…

• The order of two events in a run can be swapped to put them in pre-recording, post-recording order.

Properties of Global Predicates

• Stability criterion for a predicate: once the predicate becomes true, it remains true… (figure 4.16).