Chapter 2
Building Dependable Distributed Systems

Wenbing Zhao
Department of Electrical and Computer Engineering
Cleveland State University
wenbing@ieee.org

Building Dependable Distributed Systems, Copyright Wenbing Zhao

Outline

- Checkpointing and logging
- System models
- Checkpoint-based protocols
  - Uncoordinated checkpointing
  - Coordinated checkpointing
- Logging-based protocols
  - Pessimistic logging
  - Optimistic logging
  - Causal logging

Checkpointing and Logging

- Checkpointing and logging are the most essential techniques to achieve dependability
  - By themselves, they provide rollback recovery
  - They are used in more sophisticated dependability schemes
- Checkpoint: a copy of the system state
  - Can be used to recover the system to the state when the checkpoint was taken
- Checkpointing: the action of taking a copy of the system state, typically periodically
- Logging: log incoming/outgoing messages, etc.

Rollback Recovery vs. Rollforward Recovery

System Models

- Distributed system model
- Global state: consistent, inconsistent
- Distributed system model redefined
- Piecewise deterministic assumption
- Output commit
- Stable storage

System Models

- Distributed system
  - A DS consists of N processes
  - A process may interact with other processes only by means of sending and receiving messages
  - A process may interact with another process within the DS, or a process in the outside world
- Fault model: fail-stop

System Models

- Process state
  - Defined by its entire address space in the OS
  - Relevant info can be captured by user-supplied APIs
- Global state
  - The state of the entire distributed system
  - Not a simple aggregation of the states of the processes

Capturing Global State

- Global state can be captured using a set of individual checkpoints
- Inconsistent state: the checkpoints reflect a message that was received but not sent

Capturing Global State: Example

- P0: bank account A, P1: bank account B
- m0: deposit $100 to B (after P0 has debited A)
- P0 takes checkpoint C0 before the debit operation
- P1 takes checkpoint C1 after depositing $100
- Scenario: P0 crashes after sending m0, and P1 crashes after taking C1
- If the global state is reconstructed based on C0 and C1, it would appear that P1 got $100 from nowhere

Typos: p17, section 2.1.2 & p18, example 2.1: Figure 2.2(a) => Figure 2.2(c)

Capturing Global State: Example

- P0 takes checkpoint C0 after sending m0 (reflects the debit of $100)
- P1 takes checkpoint C1 after depositing $100
- The dependency between P0 and P1 is captured by C0 and C1
- The global state can be reconstructed correctly based on C0 and C1

Typos: p19, example 2.1: Figure 2.2(b) => Figure 2.2(a); Figure 2.2(c) => Figure 2.2(b)

Capturing Global State: Example

- P0 takes checkpoint C0 after sending m0 (reflects the debit of $100)
- P1 takes checkpoint C1 before receiving m0 but after sending m1
- P2 takes checkpoint C2 before receiving m1
- If C0, C1, and C2 are used to reconstruct the global state, it would appear that m0 was sent but not received
  - $100 is debited from A, but not deposited to B
- However, the reconstructed global state is still regarded as consistent because this state could have happened: m0 and m1 are still in transit => channel state
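The three bank-account examples can be checked mechanically. The following is a minimal sketch, assuming each checkpoint records the ids of the messages its process had sent and received when the checkpoint was taken. The `Checkpoint` type and the `is_consistent` and `channel_state` helpers are hypothetical names introduced for illustration, not part of the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Hypothetical checkpoint record: message ids sent/received before it was taken."""
    sent: set = field(default_factory=set)
    received: set = field(default_factory=set)


def is_consistent(checkpoints):
    """Inconsistent if some message is recorded as received but not as sent."""
    all_sent = set().union(*(c.sent for c in checkpoints))
    all_received = set().union(*(c.received for c in checkpoints))
    return all_received <= all_sent


def channel_state(checkpoints):
    """Messages sent but not yet received are in transit: the channel state."""
    all_sent = set().union(*(c.sent for c in checkpoints))
    all_received = set().union(*(c.received for c in checkpoints))
    return all_sent - all_received


# Example 1: C0 taken before the debit, C1 after the deposit.
inconsistent = [Checkpoint(), Checkpoint(received={"m0"})]
# Example 2: C0 taken after sending m0, C1 after the deposit.
consistent = [Checkpoint(sent={"m0"}), Checkpoint(received={"m0"})]
# Example 3: m0 and m1 were sent but not yet received when C1 and C2 were taken.
in_transit = [Checkpoint(sent={"m0"}), Checkpoint(sent={"m1"}), Checkpoint()]
```

In the third case `is_consistent` holds and `channel_state` returns both messages, which is why the channel state must be saved together with the checkpoints.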

Distributed System Model Redefined

A distributed system consists of the following:
- A set of N processes
  - Each process consists of a set of states and a set of events
  - One of the states is the initial state
  - The change of states is caused by an event
- A set of channels
  - Each channel is a uni-directional reliable communication channel between two processes
  - The state of a channel consists of the set of messages in transit in the channel
  - A pair of neighboring processes are connected by a pair of channels, one in each direction
- An event (such as the sending or receiving of a message) at a process may change the state of the process and the state of the channel it is associated with, if any

Back on the Global State Example

- Global state consists of C0, C1, and C2
- Channel state from P0 to P1: m0
- Channel state from P1 to P2: m1

Piecewise Deterministic Assumption

- Using checkpoints to restore system state (after a crash) would mean that any execution after a checkpoint is lost
- Logging of events in between two checkpoints would ensure full recovery
- Piecewise deterministic assumption:
  - All nondeterministic events can be identified
  - Sufficient information (referred to as a determinant) that can be used to recreate the event deterministically must be logged for each event
  - Examples: receiving of a message, system calls, timeouts, etc.
  - Note that the sending of a message is not a nondeterministic event (it is determined by another nondeterministic event or the initial state)
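A small sketch of what the assumption buys: if every nondeterministic event leaves a determinant in the log, re-delivering the log in order reproduces the same state. The `Process` class, its update rule, and the use of random integers to stand in for message arrivals are all assumptions made for illustration.

```python
import random


class Process:
    """Hypothetical process whose state changes only when an event is delivered."""

    def __init__(self, state=0):
        self.state = state

    def handle(self, value):
        # Deterministic once the delivered value is known.
        self.state = self.state * 31 + value


def run_and_log(process, events, log):
    """Normal execution: record each event's determinant, then execute it."""
    for value in events:
        log.append(value)      # the determinant: enough to recreate the event
        process.handle(value)


def replay(log, checkpoint_state=0):
    """Recovery: start from the checkpoint and re-deliver the determinants in order."""
    recovered = Process(checkpoint_state)
    for value in log:
        recovered.handle(value)
    return recovered


# Nondeterministic inputs (for example, message arrivals), simulated with random values.
events = [random.randint(0, 9) for _ in range(5)]
log = []
original = Process()
run_and_log(original, events, log)
recovered = replay(log)
```

Whatever values the run happens to see, `recovered.state` matches `original.state`, because the only source of nondeterminism was captured in `log`.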

Output Commit

- Once a message is sent to the outside world, the state of the distributed system may be exposed to the outside world
- Should a failure occur, the outside world cannot be relied upon for recovery
- Output commit problem: to ensure that the recovered state is consistent with the external view, sufficient recovery information must be logged prior to the sending of a message to the outside world
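One way to picture the rule is a process that buffers determinants in memory and forces them to stable storage before anything leaves the system. This is only a sketch: `OutputCommitProcess` and its fields are hypothetical, and a plain list stands in for stable storage.

```python
class OutputCommitProcess:
    """Hypothetical process that flushes its log before any external output."""

    def __init__(self):
        self.volatile_log = []      # buffered determinants, lost on a crash
        self.stable_log = []        # stands in for stable storage
        self.external_outputs = []  # what the outside world has seen

    def receive(self, msg):
        # The determinant is only buffered here; it is not yet crash-safe.
        self.volatile_log.append(msg)

    def flush(self):
        # Move all buffered determinants to stable storage.
        self.stable_log.extend(self.volatile_log)
        self.volatile_log.clear()

    def send_external(self, output):
        # Output commit: log first, then expose state to the outside world.
        self.flush()
        self.external_outputs.append(output)


p = OutputCommitProcess()
p.receive("m0")
p.receive("m1")
p.send_external("reply to client")
```

After `send_external` returns, everything that influenced the reply is in `stable_log`, so a recovered process can reach a state consistent with what the client saw.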

Stable Storage

- Checkpoints and events must be logged to stable storage that can survive failures, so that they are available for recovery
- Various forms of stable storage
  - Redundant disks: RAID-1, RAID-5
  - Replicated file systems: GFS
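Redundant hardware is only part of the picture; the write itself should also not leave a half-written checkpoint behind. The sketch below shows one common pattern on a single file system (write to a temporary file, `fsync`, then rename atomically). The function names and the JSON encoding are assumptions for illustration, and this does not by itself protect against the loss of the disk.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write to a temporary file, force it to disk, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())    # ask the OS to push the data to the device
    os.replace(tmp_path, path)  # atomic: readers see the old or the new checkpoint


def load_checkpoint(path):
    """Read back the most recently completed checkpoint."""
    with open(path) as f:
        return json.load(f)


with tempfile.TemporaryDirectory() as d:
    checkpoint_path = os.path.join(d, "checkpoint.json")
    save_checkpoint(checkpoint_path, {"balance": 100})
    save_checkpoint(checkpoint_path, {"balance": 200})
    restored = load_checkpoint(checkpoint_path)
```

A crash in the middle of the second `save_checkpoint` would leave the first checkpoint readable, since the rename happens only after the new file is complete.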

Checkpoint-Based Protocols

- Uncoordinated protocols
- Coordinated protocols

Uncoordinated Checkpointing

- Uncoordinated checkpointing: full autonomy, appears to be simple. However, we do not recommend it for two reasons
  - Checkpoints taken might not be useful to reconstruct a consistent global state
  - Cascading rollback to the initial state (domino effect)
- To enable the selection of a set of consistent checkpoints during a recovery, the dependency of checkpoints has to be determined and recorded together with each checkpoint
  - Extra overhead and complexity => not simple after all

Cascading Rollback Problem

- Last checkpoint: C1,1 by P1, before P1 crashed
- Cannot use C0,1 at P0 because it is inconsistent with C1,1 => P0 rolls back to C0,0
- Cannot use C2,1 at P2 because it fails to reflect the sending of m6 => P2 rolls back to C2,0
- Cannot use C3,1 and C3,0 as a result => P3 rolls back to initial state

Cascading Rollback Problem

- The rollback of P3 to initial state would invalidate C2,0 => P2 rolls back to initial state
- P1 rolls back to C1,0 due to the rollback of P2 to initial state
- This would invalidate the use of C0,0 at P0 => P0 rolls back to initial state
- The rollback of P0 to initial state would invalidate the use of C1,0 at P1 => P1 rolls back to initial state
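The cascading effect can be reproduced with a small rollback-propagation loop. This is a sketch on a made-up two-process scenario, not the four-process figure from the slides; the `recovery_line` function, the interval numbering, and the message tuples are assumptions for illustration. A message is an orphan when its receipt is kept but its sending has been rolled back, and each orphan forces the receiver further back.

```python
def recovery_line(latest, messages):
    """Propagate rollbacks until no orphan message remains.

    latest:   process -> index of its latest checkpoint (0 means the initial state)
    messages: (sender, send_interval, receiver, recv_interval) tuples, where
              interval k is the execution between checkpoint k and checkpoint k + 1
    """
    line = dict(latest)
    changed = True
    while changed:
        changed = False
        for sender, send_iv, receiver, recv_iv in messages:
            send_undone = send_iv >= line[sender]
            receive_kept = recv_iv < line[receiver]
            if send_undone and receive_kept:
                # Orphan message: the receiver must roll back past the receipt.
                line[receiver] = recv_iv
                changed = True
    return line


# Hypothetical ping-pong execution in which every checkpoint separates a
# receive from the reply it triggers.
messages = [
    ("P0", 0, "P1", 0),  # m0
    ("P1", 1, "P0", 0),  # m1
    ("P0", 1, "P1", 1),  # m2
    ("P1", 2, "P0", 1),  # m3, sent after P1's latest checkpoint
]
domino = recovery_line({"P0": 2, "P1": 2}, messages)
no_domino = recovery_line({"P0": 2, "P1": 2}, messages[:3])
```

With m3 present, the loss of P1's last interval drags both processes back to the initial state; without it, the latest checkpoints already form a consistent set.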

Tamir and Sequin Global Checkpointing Protocol

- One of the processes is designated as the coordinator
- Others are participants
- The coordinator uses a two-phase commit protocol for consistency on the checkpoints
  - Global checkpointing is carried out atomically: all or nothing
  - First phase: create a quiescent point of the distributed system
  - Second phase: ensure the atomic switchover from the old checkpoint to the new one

Tamir and Sequin Global Checkpointing Protocol

- Control messages for coordination
  - CHECKPOINT message: initiates a global checkpoint and creates a quiescent point
  - SAVED message: informs the coordinator that the local checkpoint is done by a participant
  - FAULT message: a timeout occurred, global checkpointing should abort
  - RESUME message: informs participants that it is time to resume normal operation
- CHECKPOINT certificate: keeps track of whether a CHECKPOINT msg has been received from each incoming channel
  - Certificate complete: when a CHECKPOINT msg is received from every incoming channel

Tamir and Sequin Global Checkpointing Protocol

- Sending control messages
  - CHECKPOINT message: send to every outgoing channel
  - SAVED message: only to upstream link, i.e., the process from which one receives the CHECKPOINT msg the first time
  - FAULT message:
    - If originated from the process, send to every outgoing channel
    - If received from a process, send to all outgoing channels except the one that connects to the process from which it receives the FAULT msg
  - RESUME message:
    - For the coordinator (msg originator): send to all outgoing channels
    - For a participant, send to all outgoing channels except the one that connects to the process from which it receives the RESUME msg
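The sending rules above can be summarized as a single routing function. This is a hedged sketch: `forward_targets` and its parameters are hypothetical, and each outgoing channel is identified simply by the name of the neighbor it leads to.

```python
def forward_targets(msg_type, outgoing, received_from=None, upstream=None):
    """Return the neighbors a control message is sent to.

    outgoing:      neighbors reachable over outgoing channels
    received_from: neighbor the message came from (None if originated locally)
    upstream:      neighbor that delivered the first CHECKPOINT msg
    """
    if msg_type == "CHECKPOINT":
        return set(outgoing)
    if msg_type == "SAVED":
        return {upstream}
    if msg_type in ("FAULT", "RESUME"):
        # An originator has no received_from, so nothing is excluded.
        return set(outgoing) - {received_from}
    raise ValueError(f"unknown control message: {msg_type}")


# A participant with neighbors P0 and P2 that got RESUME from P0 forwards it only to P2.
resume_targets = forward_targets("RESUME", ["P0", "P2"], received_from="P0")
```

Excluding the channel a FAULT or RESUME message arrived on avoids echoing the message straight back to its sender.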

Tamir and Sequin Global Checkpointing Protocol

Typos: p24, figure 2.4, p25, figure 2.5: Final state machine => Finite state machine

Tamir and Sequin Global Checkpointing Protocol

SAVED: send to upstream node

Tamir and Sequin Global Checkpointing Protocol: Example

- P0 channel state: m0
- P1 channel state: m1
- P2 channel state: empty

Tamir and Sequin Global Checkpointing Protocol: Proof of Correctness

- The protocol produces a consistent global state
- Proof: a consistent global state consists of only two scenarios:
  - All msgs sent by one process prior to its taking a local checkpoint have been received prior to the other process taking its local checkpoint
    - This is the case if no process sends any msg after the global checkpoint is initiated
  - Some msgs sent by one process prior to its taking a local checkpoint might arrive after the other process has checkpointed its state, but they are logged for replay
    - Msgs received after the initiation of global checkpointing are logged, but not executed, ensuring this property
- Note that if a process fails, the global checkpointing would abort

Chandy and Lamport Distributed Snapshot Protocol

- CL snapshot protocol is a nonblocking protocol
  - TS checkpointing protocol is blocking
  - CL protocol is more desirable for applications that do not wish to suspend normal operation
  - However, CL protocol is concerned only with how to obtain a consistent global checkpoint
- CL protocol: no coordinator, any node may initiate a global checkpointing
- Data structure
  - Marker message: equivalent to the CHECKPOINT message
  - Marker certificate: keeps track of whether a marker has been received from every incoming channel

CL Distributed Snapshot Protocol

Example

- P0 channel state: m0 (P1 to P0 channel)
- P1 channel state: m1 (P2 to P1 channel)
- P2 channel state: empty
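The example can be replayed with a small simulation of the marker rules. This is a sketch under several assumptions that are not in the slides: three fully connected processes, FIFO channels, money transfers as application messages (m0 is a transfer of 10, m1 a transfer of 5), and a hand-picked delivery order. All names are hypothetical.

```python
from collections import deque

NAMES = ["P0", "P1", "P2"]
MARKER = "MARKER"

# One FIFO channel per ordered pair of processes.
channels = {(s, d): deque() for s in NAMES for d in NAMES if s != d}
balance = {"P0": 100, "P1": 100, "P2": 100}
recorded_state = {}                        # process -> balance when it recorded its state
open_channels = {p: set() for p in NAMES}  # incoming channels still being recorded
channel_state = {}                         # (src, dst) -> recorded in-transit messages


def transfer(src, dst, amount):
    """Application message: debit the sender and put the amount in transit."""
    balance[src] -= amount
    channels[(src, dst)].append(amount)


def record(p, first_marker_from=None):
    """Record p's state, send markers on all outgoing channels, start recording incoming ones."""
    recorded_state[p] = balance[p]
    for q in NAMES:
        if q != p:
            channels[(p, q)].append(MARKER)
            channel_state[(q, p)] = []
            if q != first_marker_from:  # the channel the first marker came on is empty
                open_channels[p].add(q)


def deliver(src, dst):
    """Deliver the next message on the channel from src to dst."""
    msg = channels[(src, dst)].popleft()
    if msg == MARKER:
        if dst not in recorded_state:
            record(dst, first_marker_from=src)
        else:
            open_channels[dst].discard(src)  # stop recording this channel
    else:
        balance[dst] += msg
        if src in open_channels[dst]:
            channel_state[(src, dst)].append(msg)


transfer("P1", "P0", 10)  # m0, in transit from P1 to P0
transfer("P2", "P1", 5)   # m1, in transit from P2 to P1
record("P0")              # P0 initiates the snapshot
for src, dst in [("P0", "P1"), ("P2", "P1"), ("P1", "P0"), ("P1", "P0"),
                 ("P0", "P2"), ("P1", "P2"), ("P2", "P0"), ("P2", "P1")]:
    deliver(src, dst)

snapshot_total = sum(recorded_state.values()) + sum(sum(v) for v in channel_state.values())
```

Normal operation is never suspended: the transfers are executed as they arrive, yet the recorded process states plus the recorded channel states still add up to the 300 that was in the system.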

Comparison of TS & CL Protocols

- Similarity
  - Both rely on control msgs to coordinate checkpointing
  - Both capture channel state in virtually the same way
    - Start logging channel state upon receiving the 1st checkpoint msg from another channel
    - Stop logging channel state after receiving the checkpoint msg on the incoming channel
  - Communication overhead is similar

Comparison of TS & CL Protocols

- Differences: strategies in producing a global checkpoint
  - TS protocol suspends normal operation upon 1st checkpoint msg while CL does not
  - TS protocol captures channel state prior to taking a checkpoint, while CL captures channel state after taking a checkpoint
- TS protocol is more complete and robust than CL
  - Has fault handling mechanism

Log Based Protocols

- Work might be lost upon recovery using checkpoint-based protocols
- By logging messages, we may be able to recover the system to where it was prior to the failure
- System model: the execution of a process is modeled as a set of consecutive state intervals
  - Each interval is initiated by a nondeterministic event or the initial state
  - We assume the only type of nondeterministic event is the receiving of a message

Log Based Protocols

- In practice, logging is always used together with checkpointing
  - Limits the recovery time: start with the latest checkpoint instead of from the initial state
  - Limits the size of the log: after taking a checkpoint, previously logged events can be purged
- Logging protocol types:
  - Pessimistic logging: msgs are logged prior to execution
  - Optimistic logging: msgs are logged asynchronously
  - Causal logging: nondeterministic events that are not yet logged (to stable storage) are piggybacked with each msg sent
- For optimistic and causal logging, dependency of processes has to be tracked => more complexity, longer recovery time

Pessimistic Logging

- Synchronously log every incoming message to stable storage prior to execution
- Each process periodically checkpoints its state: no need for coordination
- Recovery: a process restores its state using the last checkpoint and replays all logged incoming msgs
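A minimal sketch of these three steps, assuming the process state is a running sum and that Python attributes can stand in for stable storage. `LoggedProcess` and its methods are hypothetical names for illustration.

```python
class LoggedProcess:
    """Hypothetical process using pessimistic logging with local checkpoints."""

    def __init__(self):
        self.state = 0        # volatile state, lost on a crash
        self.checkpoint = 0   # last checkpoint (stands in for stable storage)
        self.stable_log = []  # msgs logged since the last checkpoint

    def receive(self, msg):
        self.stable_log.append(msg)  # log synchronously first ...
        self.execute(msg)            # ... and only then execute

    def execute(self, msg):
        self.state += msg

    def take_checkpoint(self):
        self.checkpoint = self.state
        self.stable_log.clear()      # earlier msgs are covered by the checkpoint

    def crash(self):
        self.state = None            # volatile state is gone

    def recover(self):
        # Restore the last checkpoint, then replay the logged msgs in order.
        self.state = self.checkpoint
        for msg in self.stable_log:
            self.execute(msg)


proc = LoggedProcess()
proc.receive(1)
proc.receive(2)
proc.take_checkpoint()
proc.receive(3)
proc.receive(4)
proc.crash()
proc.recover()
```

Recovery touches only this process's own checkpoint and log, which is why no coordination with other processes is needed.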

Pessimistic Logging: Example

Pessimistic logging can cope with concurrent failures and the recovery of two or more processes

Benefits of Pessimistic Logging

- Processes do not need to track their dependencies
  - Logging mechanism is easy to implement and less error prone
- Output commit is automatically ensured
- No need to carry out coordinated global checkpointing
  - By replaying the logged msgs, a process can always bring itself to be consistent with other processes
- Recovery can be done completely locally
  - Only impact to other processes: duplicate msgs (can be discarded)