Fault Tolerance. Agenda Overview Introduction to Fault Tolerance Process Resilience Reliable...

Fault Tolerance


Agenda

• Overview
• Introduction to Fault Tolerance
• Process Resilience
• Reliable Client-Server communication
• Reliable group communication
• Distributed commit
• Recovery
• Summary


Overview

• Introduction to Fault Tolerance
  – Basic Concepts
  – Failure Modes
  – Failure Masking
• Process Resilience
  – Design Issues
• Reliable Communication
  – P2P Communication
  – Client Server Communication (RPC, RMI)
  – Group Communication (Multicasting)
• Distributed Commit
  – Multi Phase Commit (Two & Three Phase)
• Recovery Techniques
  – Check Pointing, Message Logging


Basic Concepts (1/3)

• What is a failure?
  – A system is said to fail when it cannot meet its promises.
• Why do failures occur?
  – A failure occurs because the system is in an error state.
• What causes an error?
  – The cause of an error is called a fault.
• Is there such a thing as a ‘partial failure’?
• Faults can be prevented, removed, and forecast.
• Can faults also be tolerated by a system?


Basic Concepts (2/3)

• What characteristics make a system fault tolerant?
  – Availability: the system is ready to be used immediately.
  – Reliability: the system can run continuously without failure.
  – Safety: nothing catastrophic happens if the system temporarily fails.
  – Maintainability: how easily a failed system can be repaired.
  – Dependability: ???
• What are the reliability and availability of the following systems?
  – A system that goes down for one millisecond every hour
  – A system that never crashes but is shut down for two weeks every August
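The two systems in the question can be compared numerically. The sketch below is only an illustration: the helper name `availability` and the choice of a 365-day year are assumptions, not part of the slides.

```python
def availability(downtime_s: float, period_s: float) -> float:
    """Fraction of a period during which the system is ready for use."""
    return 1.0 - downtime_s / period_s


HOUR = 3600.0
YEAR = 365 * 24 * HOUR  # assumption: a non-leap year

# System A: goes down for one millisecond every hour.
system_a = availability(0.001, HOUR)

# System B: never crashes, but is shut down for two weeks every August.
system_b = availability(14 * 24 * HOUR, YEAR)

print(f"A: availability {system_a:.5%}, yet it fails every hour")
print(f"B: availability {system_b:.2%}, yet it has no unplanned failures")
```

System A is available more than 99.9999% of the time but cannot run for long without a failure, so it is highly available and still unreliable. System B is available only about 96% of the time but is highly reliable.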


Basic Concepts (3/3)

• Classification of Faults
  – Transient: occurs once and then disappears
    -- A flying bird obstructing a transmitted signal
  – Intermittent: occurs, vanishes of its own accord, then reappears, and so on
    -- A loosely connected power plug
  – Permanent: occurs and does not vanish until fixed manually
    -- Burnt-out chips


Faults in Distributed Systems

• If a fault occurs in a distributed system, the error may be in
  – the collection of servers,
  – the communication channel, or
  – both.

• Dependency relations appear in abundance in DS.

• Hence, we need to classify failures to know how serious a failure actually is.


Failure Models

Type of Failure         Description
Crash failure           A server halts, but is working correctly until it halts
Omission failure        A server fails to respond to incoming requests
  • Receive omission    A server fails to receive incoming messages
  • Send omission       A server fails to send messages
Timing failure          A server's response lies outside the specified time interval
Response failure        The server's response is incorrect
  • Value               The value of the response is wrong
  • State transition    The server deviates from the correct flow of control
Arbitrary failure       A server may produce arbitrary responses at arbitrary times


Failure Masking by Redundancy (1/3)

• For a system to be fault tolerant, the best it can do is try to hide the occurrence of failures from other processes.
• The key technique for masking faults is redundancy.
  – Information redundancy: extra bits are added to allow recovery from garbled bits.
  – Time redundancy: an action is performed and then, if need be, performed again.
  – Physical redundancy: extra equipment or processes are added.


Failure Masking by Redundancy (2/3)

• Some Examples of Redundancy Schemes
  – Hamming code
  – Transactions
  – Replicated processes or components
  – An aircraft has four engines and can fly with only three
  – A sports game has an extra referee


Failure Masking by Redundancy (3/3)

• Triple modular redundancy:
  – If two or three of the inputs are the same, the output is equal to that input.
  – If all three inputs are different, the output is undefined.

Figure: Fault Tolerance in Electronic Circuits


• Suppose that element A2 fails. Each of the voters, V1, V2, and V3, gets two good (identical) inputs and one rogue input, and each of them outputs the correct value to the second stage.

• In essence, the effect of A2 failing is completely masked, so that the inputs to B1, B2, and B3 are exactly the same as they would have been had no fault occurred.

• Now consider what happens if B3 and C1 are also faulty, in addition to A2. These effects are also masked, so the three final outputs are still correct.
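The masking argument can be replayed in a few lines of code. This is a minimal sketch under stated assumptions: the stage functions (+1, *2, -3), the helper names `vote`, `replicas`, and `tmr_chain`, and the single `vote` call standing in for three identical nonfaulty voters are all illustrative choices, not taken from the figure.

```python
from collections import Counter


def vote(inputs):
    """Majority voter: return the value shared by at least two of the three
    inputs, or None when all three differ (the output is undefined)."""
    value, count = Counter(inputs).most_common(1)[0]
    return value if count >= 2 else None


def replicas(name, func, x, faulty):
    """Run three copies of one stage; copies listed in `faulty` emit garbage."""
    out = []
    for i in (1, 2, 3):
        if f"{name}{i}" in faulty:
            out.append(f"garbage-{name}{i}")  # a faulty element emits a wrong value
        elif x is None:
            out.append(None)                  # an undefined input stays undefined
        else:
            out.append(func(x))
    return out


def tmr_chain(x, faulty=frozenset()):
    """Three stages A, B, C, each triplicated and followed by voters."""
    a = replicas("A", lambda v: v + 1, x, faulty)
    b = replicas("B", lambda v: v * 2, vote(a), faulty)
    c = replicas("C", lambda v: v - 3, vote(b), faulty)
    return vote(c)


print(tmr_chain(5))                      # fault-free run
print(tmr_chain(5, {"A2", "B3", "C1"}))  # one fault per stage is masked
print(tmr_chain(5, {"A1", "A2"}))        # two faults in one stage are not
```

With one faulty element per stage the chain produces the same result as the fault-free run. Two faulty elements in the same stage leave the voters without a majority, so the output becomes undefined (`None`).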


Process Resilience

• Problem:
  – How is fault tolerance achieved in a distributed system, especially against process failures?
• Solution:
  – Replicate processes into groups.
  – Groups are analogous to social organizations.
  – Treat a collection of processes as a single abstraction.
  – All members of the group receive the same message; if one process fails, the others can take over for it.
  – Process groups are dynamic, and a process can be a member of several groups.
  – Hence we need some management scheme for groups.


Process Groups (1/2)

• Flat Group
  • Advantage: symmetrical, with no single point of failure
  • Disadvantage: decision making is more complicated (voting is required)
• Hierarchical Group
  • Advantage: the coordinator can make decisions without bothering the others
  • Disadvantage: if the coordinator is lost, the entire group halts

Flat Group vs. Hierarchical Group


Process Groups (2/2): Group Membership

• Group Server (Client Server Model)
  – Straightforward, simple, and easy to implement
  – Major disadvantage: single point of failure
• Distributed Approach (P2P Model)
  – Broadcast a message to join or leave the group
  – In case of a fault, how do we distinguish a member that is really dead from one that is merely very slow?
  – Joining and leaving must be synchronized; on joining, all previous messages are sent to the new member
  – Another issue is how to create a new group


Failure Masking & Replication

• Replicate processes and organize them into groups.
• Replace a single vulnerable process with a whole fault-tolerant group.
• A system is said to be K fault tolerant if it can survive faults in K components and still meet its specifications.
• How much replication is needed to support K fault tolerance?
  – K+1 or 2K+1?
• Cases:
  1) If K processes simply stop, the answer from the one remaining process can be used, so K+1 processes suffice.
  2) With Byzantine failures, 2K+1 processes are needed, so that the K+1 correct answers outvote the K faulty ones. Problem?


Agreement in Faulty Systems

• Why do we need agreement?
• Goal of agreement
  – Make all the non-faulty processes reach consensus on some issue
  – Establish that consensus within a finite number of steps
• Two problem cases
  – Good processes, but unreliable communication
    • Example: Two-army problem
  – Good communication, but faulty processes
    • Example: Byzantine generals problem


Byzantine generals problem

The Byzantine generals problem for 3 loyal generals and 1 traitor.
a) The generals announce their troop strengths (in units of 1 thousand soldiers).
b) The vectors that each general assembles based on (a).
c) The vectors that each general receives in step 3.


• In Fig. 8-5 we illustrate the working of the algorithm for the case of N = 4 and k = 1.

• For these parameters, the algorithm operates in four steps. In step 1, every nonfaulty process i sends v_i to every other process using reliable unicasting.

• Faulty processes may send anything. Moreover, because we are using unicasting, they may send different values to different processes. Let v_i = i.

• In Fig. 8-5(a) we see that process 1 reports 1, process 2 reports 2, process 3 lies to everyone, giving x, y, and z, respectively, and process 4 reports a value of 4.

• In step 2, the results of the announcements of step 1 are collected together in the form of the vectors of Fig. 8-5(b).


• Step 3 consists of every process passing its vector from Fig. 8-5(b) to every other process.

• In this way, every process gets three vectors, one from every other process. Here, too, process 3 lies, inventing 12 new values, a through l. The results of step 3 are shown in Fig. 8-5(c).

• Finally, in step 4, each process examines the ith element of each of the newly received vectors.

• If any value has a majority, that value is put into the result vector. If no value has a majority, the corresponding element of the result vector is marked UNKNOWN. From Fig. 8-5(c) we see that 1, 2, and 4 all come to agreement on the values for v_1, v_2, and v_4, which is the correct result. What these processes conclude regarding v_3 cannot be decided, but is also irrelevant. The goal of Byzantine agreement is that consensus is reached on the values of the nonfaulty processes only.
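The four steps can be traced with a small simulation. This is a sketch, not the textbook's code: the function name `byzantine_agreement`, the `lie` helper, and the modelling of a faulty process as one that sends a fresh, distinct value in every message are all assumptions made for illustration.

```python
from collections import Counter
from itertools import count

UNKNOWN = "UNKNOWN"


def byzantine_agreement(n, faulty):
    """Run the four-step exchange for processes 1..n and return the result
    vector computed by each nonfaulty process."""
    procs = range(1, n + 1)
    fresh = count()

    def lie():
        # Assumption: every lie is a new value, different from all others.
        return f"lie{next(fresh)}"

    # Steps 1 and 2: process i sends v_i = i to everyone, and process j
    # collects what it heard into a vector. A faulty sender tells each
    # receiver something different.
    vector = {
        j: {i: (lie() if i in faulty and i != j else i) for i in procs}
        for j in procs
    }

    results = {}
    for j in procs:
        if j in faulty:
            continue
        # Step 3: j receives one vector from every other process; a faulty
        # process sends a vector of invented values.
        received = [
            {p: lie() for p in procs} if i in faulty else vector[i]
            for i in procs
            if i != j
        ]
        # Step 4: keep the majority value of each element, if there is one.
        result = {}
        for p in procs:
            value, votes = Counter(v[p] for v in received).most_common(1)[0]
            result[p] = value if votes > len(received) / 2 else UNKNOWN
        results[j] = result
    return results


for process, result in byzantine_agreement(4, {3}).items():
    print(process, result)
```

For N = 4 with process 3 faulty, processes 1, 2, and 4 each end up with 1, 2, UNKNOWN, 4. Calling `byzantine_agreement(3, {3})` leaves every element UNKNOWN, because two received vectors can never produce a majority once one of them is invented.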


Go forward one more step

The same as in previous slide, except now with 2 loyal generals and one traitor.

Lamport proved that in a system with m faulty processes, agreement can be achieved only if 2m+1 correctly functioning processes are present, for a total of 3m+1.

More than two-thirds agreement


Failure Detection

• Failure detection is one of the cornerstones of fault tolerance in distributed systems.
• What it all boils down to is that for a group of processes, nonfaulty members should be able to decide who is still a member, and who is not.
• When it comes to detecting process failures, there are essentially only two mechanisms. Either processes actively send "are you alive?" messages to each other (for which they obviously expect an answer), or passively wait until messages come in from different processes.
• The latter approach makes sense only when it can be guaranteed that there is enough communication between processes.
• In practice, actively pinging processes is the approach usually followed.
• A timeout mechanism is used to check whether a process has failed.
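A timeout-based detector can be sketched as follows. The class and method names (`TimeoutFailureDetector`, `on_reply`, `suspected`) are hypothetical, and the injectable `clock` parameter is an assumption added so the example can run without real waiting.

```python
import time


class TimeoutFailureDetector:
    """Suspect a process once no reply has arrived within `timeout` seconds."""

    def __init__(self, members, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock
        now = clock()
        # Every member is given the benefit of the doubt at start-up.
        self.last_heard = {member: now for member in members}

    def on_reply(self, member):
        """Record an answer to an "are you alive?" probe."""
        self.last_heard[member] = self.clock()

    def suspected(self):
        """Return the members whose last reply is older than the timeout."""
        now = self.clock()
        return {m for m, t in self.last_heard.items() if now - t > self.timeout}


# Usage with a fake clock, so the timeline is explicit.
fake_time = [0.0]
detector = TimeoutFailureDetector(["p1", "p2"], timeout=2.0,
                                  clock=lambda: fake_time[0])
fake_time[0] = 1.5
detector.on_reply("p1")      # p1 answers a probe, p2 stays silent
fake_time[0] = 3.0
print(detector.suspected())  # only p2 has been silent for more than 2 s
```

A detector of this kind only ever suspects a process: a timeout cannot tell a crashed process from one that is merely slow or unreachable.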


Reliable client-server communication

• TCP masks omission failures
  – … by using ACKs & retransmissions
• … but it does not mask crash failures!
  – E.g.: when a connection is broken, the client is only notified via an exception

What about reliable point-to-point transport protocols?


Five classes of failures in RPC

• Client is unable to locate server
  – Binding exception
    • … at the expense of transparency
• Request message is lost
  – Is it safe to retransmit?
    • Allow server to detect it is dealing with a retry
• Server crashes after receiving a request
• Reply message is lost
• Client crashes after sending a request


Server Crashes (I)

A server in client-server communication
a) Normal case
b) Crash after execution
c) Crash before execution


Server Crashes (II)

• At-least-once semantics
  – Client keeps retransmitting until it gets a response
• At-most-once semantics
  – Give up immediately & report failure
• Guarantee nothing

Ideal would be exactly-once semantics
• … no general way to arrange this!


Server Crashes (III)

• Print server scenario:
  – M: server’s completion message
    • Server may send M either before or after printing
  – P: server’s print operation
  – C: server’s crash
• Possible event orderings (events in parentheses never happen because of the crash):
  – M -> P -> C
  – M -> C (-> P)
  – P -> M -> C
  – P -> C (-> M)
  – C (-> P -> M)
  – C (-> M -> P)


Server Crashes (IV)

Different combinations of client & server strategies in the presence of server crashes.

Client                 Server

                       Strategy M -> P             Strategy P -> M
Reissue strategy       MPC    MC(P)   C(MP)        PMC    PC(M)   C(PM)
Always                 DUP    OK      OK           DUP    DUP     OK
Never                  OK     ZERO    ZERO         OK     OK      ZERO
Only when ACKed        DUP    OK      ZERO         DUP    OK      ZERO
Only when not ACKed    OK     ZERO    OK           OK     DUP     OK

OK = text is printed once, DUP = text is printed twice, ZERO = text is not printed at all

No combination of client & server strategy is correct for all cases !
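The table can be derived mechanically from two facts per event ordering: whether the client saw the completion message M, and whether the text was printed before the crash. The sketch below does that; the names `ORDERINGS`, `REISSUE`, `outcome`, and `table` are illustrative assumptions.

```python
# For each server strategy: (event ordering, client saw M?, printed before crash?)
ORDERINGS = {
    "M -> P": [("MPC", True, True), ("MC(P)", True, False), ("C(MP)", False, False)],
    "P -> M": [("PMC", True, True), ("PC(M)", False, True), ("C(PM)", False, False)],
}

# Client reissue strategies: does the client resend, given whether it saw M?
REISSUE = {
    "Always": lambda acked: True,
    "Never": lambda acked: False,
    "Only when ACKed": lambda acked: acked,
    "Only when not ACKed": lambda acked: not acked,
}


def outcome(reissue, acked, printed):
    """Count how often the text gets printed: once before the crash (maybe)
    plus once more if the client reissues after the server recovers."""
    copies = int(printed) + int(reissue(acked))
    return ("ZERO", "OK", "DUP")[copies]


def table():
    return {
        (client, ordering): outcome(rule, acked, printed)
        for client, rule in REISSUE.items()
        for cases in ORDERINGS.values()
        for ordering, acked, printed in cases
    }


results = table()
for client in REISSUE:
    row = [results[client, o] for cases in ORDERINGS.values() for o, _, _ in cases]
    print(f"{client:<20}", *row)
```

Every row contains at least one DUP or ZERO under each server strategy, which is the point of the slide: no client strategy gives exactly-once printing in all cases.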


Lost Reply Messages

• Is it safe to retransmit the request?
  – Idempotent requests
    • Example: read a file’s first 1024 bytes
    • Counterexample: money transfer order
• Assign a sequence number to each request
  – Server keeps track of each client’s most recently received sequence #
  – … additionally, set a RETRANSMISSION bit in the request header
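Sequence numbers let a server recognize a retransmission of a non-idempotent request. The following is a minimal sketch, assuming each client has at most one outstanding request at a time; the class name `DedupServer`, the `handle` method, and the deposit operation are hypothetical.

```python
class DedupServer:
    """Executes each (client, sequence number) request at most once."""

    def __init__(self):
        self.balance = 0
        self.last = {}  # client id -> (sequence number, cached reply)

    def handle(self, client, seq, amount):
        """Deposit `amount` unless this request was already executed."""
        if client in self.last and self.last[client][0] == seq:
            # Retransmission: the original reply was lost, so replay it
            # instead of executing the operation a second time.
            return self.last[client][1]
        self.balance += amount  # non-idempotent operation
        reply = f"ok balance={self.balance}"
        self.last[client] = (seq, reply)
        return reply


server = DedupServer()
first = server.handle("client-1", seq=1, amount=100)
retry = server.handle("client-1", seq=1, amount=100)  # reply was lost, client retries
print(first, retry, server.balance)
```

The retry receives the same reply as the original request, and the balance is only increased once.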


Client Crashes (I)

• Orphan computation:
  – No process waiting for the result
    • Waste of resources (CPU cycles, locks)
    • Possible confusion upon client’s recovery
• 4 alternative strategies proposed by Nelson (1981)
• Extermination:
  – Client keeps log of requests to be issued
    • Upon recovery, explicitly kill orphans
  – Overhead of logging (for every RPC)
  – Problems with grand-orphans
  – Problems with network partitions


Client Crashes (II)

• Reincarnation:
  – Divide time up into sequentially numbered epochs (periods of time)
  – Upon reboot, client broadcasts start-of-epoch
    • Upon receipt, all remote computations on behalf of this client are killed
    • After a network partition, an orphan’s response will contain an obsolete epoch number, which is easily detected
• Gentle reincarnation:
  – Upon receipt of start-of-epoch, each server checks to see if it has any remote computations
    • If the owner cannot be found, the computation is killed
• Expiration:
  – Each RPC is given a time quantum T to complete
    • … must explicitly ask for another if it cannot finish in time
    • After reboot, client only needs to wait a time T …
    • How to select a reasonable value for T?


Reliable group communication


Basic Reliable-Multicasting Schemes

A simple solution to reliable multicasting when all receivers are known & are assumed not to fail

a) Message transmission
b) Reporting feedback


Scalability in Reliable Multicasting

• The scheme described above cannot support large numbers of receivers.
• Reason: feedback implosion; receivers are spread across a wide-area network
• Solution: reduce the number of feedback messages that are returned to the sender
• Model: feedback suppression


Nonhierarchical Feedback Control

Several receivers have scheduled a request for retransmission, but the first retransmission request leads to the suppression of others.


Hierarchical Feedback Control

The essence of hierarchical reliable multicasting:
a) Each coordinator forwards the message to its children.
b) A coordinator handles retransmission requests.


Atomic Multicast

• We need to achieve reliable multicasting in the presence of process failures.
• Atomic multicast problem:
  – a message is delivered to either all processes or to none at all
  – all messages are delivered in the same order to all processes
• Virtually synchronous reliable multicasting offering totally-ordered delivery of messages is called atomic multicasting.


Virtual Synchrony (I)

The logical organization of a distributed system to distinguish between message receipt and message delivery


Virtual Synchrony (II)

• Reliable multicast guarantees that a message multicast to group view G is delivered to each nonfaulty process in G.
• If the sender of the message crashes during the multicast, the message may either be delivered to all remaining processes, or ignored by each of them.
• A reliable multicast with this property is said to be virtually synchronous.
• All multicasts take place between view changes. A view change acts as a barrier across which no multicast can pass.


Virtual Synchrony (III)

The principle of virtual synchronous multicast.


Implementing Virtual Synchrony

a) Process 4 notices that process 7 has crashed, sends a view change

b) Process 6 sends out all its unstable messages, followed by a flush message

c) Process 6 installs the new view when it has received a flush message from everyone else


Message Ordering (I)

1. Unordered multicast

Process P1     Process P2     Process P3
sends m1       receives m1    receives m2
sends m2       receives m2    receives m1

2. FIFO-ordered multicast

Process P1     Process P2     Process P3     Process P4
sends m1       receives m1    receives m3    sends m3
sends m2       receives m3    receives m1    sends m4
               receives m2    receives m2
               receives m4    receives m4
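FIFO-ordered delivery is commonly implemented with per-sender sequence numbers and a hold-back queue. The sketch below assumes that senders number their messages from 1; the class name `FifoReceiver` and its fields are illustrative, not part of the slides.

```python
class FifoReceiver:
    """Delivers the messages of each sender in the order they were sent."""

    def __init__(self):
        self.next_seq = {}   # sender -> next sequence number to deliver
        self.held = {}       # sender -> {seq: message} received too early
        self.delivered = []  # messages handed to the application, in order

    def receive(self, sender, seq, message):
        expected = self.next_seq.get(sender, 1)
        if seq < expected:
            return  # duplicate of a message that was already delivered
        queue = self.held.setdefault(sender, {})
        queue[seq] = message
        # Deliver every consecutive message now available from this sender.
        while expected in queue:
            self.delivered.append(queue.pop(expected))
            expected += 1
        self.next_seq[sender] = expected


p3 = FifoReceiver()
p3.receive("P4", 1, "m3")
p3.receive("P1", 2, "m2")  # arrives before m1, so it is held back
p3.receive("P1", 1, "m1")  # releases m1 and then m2
p3.receive("P4", 2, "m4")
print(p3.delivered)
```

Here m2 is received before m1 but delivered after it, giving the delivery order m3, m1, m2, m4 shown for P3 in the table. Messages from different senders may still interleave differently at different receivers, which is why FIFO ordering alone is not total ordering.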


Message Ordering (II)

3. Reliable causally-ordered multicast delivers messages so that potential causality between different messages is preserved

4. Total-ordered delivery

Multicast                  Basic Message Ordering     Total-ordered Delivery?
Reliable multicast         None                       No
FIFO multicast             FIFO-ordered delivery      No
Causal multicast           Causal-ordered delivery    No
Atomic multicast           None                       Yes
FIFO atomic multicast      FIFO-ordered delivery      Yes
Causal atomic multicast    Causal-ordered delivery    Yes


Agenda

• Introduction to Fault Tolerance
• Process Resilience
• Reliable Client-Server communication
• Reliable group communication
• Distributed commit
• Recovery
• Summary


Two-phase Commit (I)

• If a process crashes, other processes may wait indefinitely for a message, so this protocol can easily fail. Timeout mechanisms are therefore used.

a) The finite state machine for the coordinator in 2PC.
b) The finite state machine for a participant.


Failure handling in 2PC

• A participant times out waiting for the coordinator’s Request-to-prepare
  – It decides to abort.
• The coordinator times out waiting for a participant’s vote
  – It decides to abort.
• A participant that voted Prepared times out waiting for the coordinator’s decision
  – It’s blocked.
  – Use a termination protocol to decide what to do.
  – Naive termination protocol: wait until the coordinator recovers.
• The coordinator times out waiting for an ACK message
  – It must resolicit the ACKs, so it can forget the decision.


• Participant waits in INIT: timeout -> abort


Participant waits in Ready State


Actions by coordinator

write START_2PC to local log;
multicast VOTE_REQUEST to all participants;
while not all votes have been collected {
    wait for any incoming vote;
    if timeout {
        write GLOBAL_ABORT to local log;
        multicast GLOBAL_ABORT to all participants;
        exit;
    }
    record vote;
}
if all participants sent VOTE_COMMIT and coordinator votes COMMIT {
    write GLOBAL_COMMIT to local log;
    multicast GLOBAL_COMMIT to all participants;
} else {
    write GLOBAL_ABORT to local log;
    multicast GLOBAL_ABORT to all participants;
}


Actions by participant

write INIT to local log;
wait for VOTE_REQUEST from coordinator;
if timeout {
    write VOTE_ABORT to local log;
    exit;
}
if participant votes COMMIT {
    write VOTE_COMMIT to local log;
    send VOTE_COMMIT to coordinator;
    wait for DECISION from coordinator;
    if timeout {
        multicast DECISION_REQUEST to other participants;
        wait until DECISION is received; /* remain blocked */
        write DECISION to local log;
    }
    if DECISION == GLOBAL_COMMIT
        write GLOBAL_COMMIT to local log;
    else if DECISION == GLOBAL_ABORT
        write GLOBAL_ABORT to local log;
} else {
    write VOTE_ABORT to local log;
    send VOTE_ABORT to coordinator;
}
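The coordinator and participant actions can be exercised together in a small single-process simulation. This is a sketch under simplifying assumptions: messages are plain method calls, a missing vote (`None`) stands in for a timeout, the coordinator's own vote is omitted, and every reachable participant logs the global decision. The names `Participant` and `run_2pc` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Participant:
    name: str
    can_commit: bool = True
    responds: bool = True  # False models a crashed or unreachable participant
    log: list = field(default_factory=list)

    def on_vote_request(self):
        if not self.responds:
            return None  # the coordinator will time out on this vote
        vote = "VOTE_COMMIT" if self.can_commit else "VOTE_ABORT"
        self.log.append(vote)  # log the vote before sending it
        return vote

    def on_decision(self, decision):
        if self.responds:
            self.log.append(decision)


def run_2pc(participants):
    """Run one round of two-phase commit; return the decision and the
    coordinator's log."""
    log = ["START_2PC"]
    # Phase 1: collect votes. A missing vote counts as a timeout.
    votes = [p.on_vote_request() for p in participants]
    if all(v == "VOTE_COMMIT" for v in votes):
        decision = "GLOBAL_COMMIT"
    else:
        decision = "GLOBAL_ABORT"
    log.append(decision)  # write the decision before announcing it
    # Phase 2: announce the decision to all participants.
    for p in participants:
        p.on_decision(decision)
    return decision, log


print(run_2pc([Participant("a"), Participant("b")]))
print(run_2pc([Participant("a"), Participant("b", can_commit=False)]))
print(run_2pc([Participant("a"), Participant("b", responds=False)]))
```

Only the first run commits. A single VOTE_ABORT or a single missing vote forces a global abort, matching the coordinator's timeout branch in the pseudocode.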


Three-phase Commit Protocol (I)

• Problem of 2PC
  – In some cases, participants cannot reach a final decision
  – What case?
• Solution: the states of the coordinator and of each participant satisfy two conditions
  1. There is no single state from which it is possible to make a transition directly to either COMMIT or ABORT.
  2. There is no state in which it is not possible to make a final decision and from which a transition to COMMIT can be made.


Three-phase Commit Protocol (II)

• Introduction of another Phase

a) Finite state machine for the coordinator in 3PC
b) Finite state machine for a participant


Failure handling in 3PC (I)

• Coordinator Timeout/Recovery
  – Coordinator
    • WAIT: timeout -> ABORT
    • PRECOMMIT: timeout -> COMMIT
  – Recovering participant?


Failure handling in 3PC (II)

• Participant Timeout/Recovery
  – Participant
    • INIT: timeout -> ABORT
    • PRECOMMIT: timeout -> COMMIT
    • READY: timeout -> check the state of another participant Q
      – Q in INIT -> ABORT
      – Q in ABORT -> ABORT
      – Q in PRECOMMIT -> COMMIT
      – Q in READY -> ABORT